# Capstone Project: Criminal Case Database

### Overall Contents:
- [Background](#1.-Background) **(In this notebook)**
- [Webscraping](#2.-Webscraping) **(In this notebook)**
- Exploratory Data Analysis
- Modeling 1 Logistic Regression
- Modeling 2 k-Nearest Neighbours
- Modeling 3 Random Forest
- Evaluation
- Conclusion and Recommendation

## 1. Background

Singapore follows the common law legal system, in which judicial precedent carries significant weight. This means that judges decide cases based on past decisions of the courts, and the decisions of higher courts such as the Supreme Court are binding on the lower courts.
Beyond past decisions, criminal law is also governed by the Penal Code and the Criminal Procedure Code, which create a statutory framework for investigations, trials, and sentencing in criminal cases.  

The start of legal research tends to be slow, manual, and inefficient. Given the facts of the case at hand, the lawyer must first determine the relevant area of law before research can begin.  
According to a 2013 survey by the ALL-SIS Task Force on Identifying Skills and Knowledge for Legal Practice, more than half of the respondents frequently started their legal research by looking through either statutes or a case law database, while slightly more than a third frequently started by consulting a subject-specific guide.[1]  
This starting point can currently take a long time, as statutes and subject-specific guides tend to be wordy, and case law databases contain many judgments that must be inspected further to narrow them down to those relevant to the case at hand.  


### 1.1 Datasets

I will create the datasets myself by webscraping legal websites such as LawNet, which hosts the judgments, and Singapore Statutes Online, which hosts digital copies of the statutes of Singapore.  
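
As a rough illustration of the webscraping step, the sketch below collects links and titles from a listing page using only the standard library. The markup in `sample_html`, the `LinkCollector` class, and the example titles are all hypothetical stand-ins; the real LawNet and Singapore Statutes Online pages are structured differently, and this is not the scraper used for this project.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical listing markup, used only to make the example self-contained
sample_html = """
<ul>
  <li><a href="/judgments/1">Example Case One</a></li>
  <li><a href="/judgments/2">Example Case Two</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(sample_html)
links = collector.links
print(links)
```

In practice, each collected link would be fetched in turn and the judgment text saved for later processing.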

The datasets created are as follows:

* subordinatecourt.csv
* subordinatecourt_compiled.csv 
* statecourt.csv 
* statecourt_compiled.csv
* statutes_crimes.csv
* database.csv
* database_temp.csv

## 2. Judgment Processing

### 2.1 Libraries Import

In [5]:
# Import libraries
import numpy as np
import pandas as pd
from court import Court, Database

### 2.2 Data Import

In [6]:
database = Database()

In [7]:
database.create_database('supreme')

Current progress: DataFrame created.
Current progress: Completed judgment processing and export.


In [8]:
database.create_database('subordinate')

Current progress: DataFrame created.
Current progress: Completed judgment processing and export.


In [9]:
database.database_df.head()

| Unnamed: 0 | aggravated_discussed | case_name | citations | court | decision date | link | mitigation_discussed | offences | statutes | tribunal/court |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Chander Kumar a/l Jayagaran v Public Prosecuto... | Perumal v Public Prosecutor and another,Syed S... | supreme | 18 January 2021 | https://www.lawnet.sg/lawnet/web/lawnet/free-r... | 1 | Unsure | 394 Criminal Procedure Code,8 Criminal Procedu... | Court of Appeal |
| 1 | 0 | Public Prosecutor v Teo Ghim Heng [2021] SGHC 13 | Public Prosecutor v BNO,and Osman bin Ali v Pu... | supreme | 22 January 2021 | https://www.lawnet.sg/lawnet/web/lawnet/free-r... | 0 | Unsure | 304 Criminal Procedure Code,304 The Criminal P... | General Division of the High Court |
| 2 | 1 | GCM v Public Prosecutor and another appeal [20... | Public Prosecutor v GCM,AQW v Public Prosecuto... | supreme | 25 January 2021 | https://www.lawnet.sg/lawnet/web/lawnet/free-r... | 1 | Unsure | 376 Criminal Law Reform Act | High Court |
| 3 | 1 | Public Prosecutor v Salzawiyah bte Latib and o... | Joseph v Public Prosecutor,Public Prosecutor v... | supreme | 26 January 2021 | https://www.lawnet.sg/lawnet/web/lawnet/free-r... | 1 |  |  | General Division of the High Court |
| 4 | 0 | Public Prosecutor v Salzawiyah bte Latib and o... | Chai Chien Wei Kelvin v Public Prosecutor,Publ... | supreme | 26 January 2021 | https://www.lawnet.sg/lawnet/web/lawnet/free-r... | 0 | Unsure | 2 Misuse of Drugs Act,2 Penal Code,2 Criminal ... | General Division of the High Court |


There are missing values in certain columns such as citations, offences, and statutes.  
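
One way to quantify the missing values is to count them per column. The sketch below does this on a small toy DataFrame whose rows are hypothetical stand-ins for `database.database_df`; only the column names are taken from the output above.

```python
import pandas as pd

# Toy stand-in for database.database_df (hypothetical rows, real column names)
toy_df = pd.DataFrame({
    "case_name": ["Case A", "Case B", "Case C"],
    "citations": ["Case X,Case Y", None, "Case Z"],
    "offences": ["Unsure", None, None],
    "statutes": ["2 Penal Code", None, "8 Misuse of Drugs Act"],
})

# Count missing values per column, keeping only columns that have any
missing_counts = toy_df.isna().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)
```

Note that placeholder strings such as `Unsure` are ordinary values and are not counted by `isna()`, so they would need to be tallied separately if they should be treated as missing.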

This suggests that the rule-based processing is imperfect and sometimes fails to extract details from the judgments.  
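
To illustrate how a rule-based approach can miss details, here is a minimal sketch built around a single regular expression. The pattern `STATUTE_RULE`, the helper `extract_statutes`, and the sample sentences are hypothetical; they are not the rules implemented in the `court` module.

```python
import re

# Hypothetical rule: "section <number> of the <Title Case words> Act/Code"
STATUTE_RULE = re.compile(
    r"[Ss]ection\s+(\d+[A-Z]?)\s+of\s+the\s+((?:[A-Z][a-z]+\s+)+(?:Act|Code))"
)


def extract_statutes(text):
    """Return '<number> <title>' for every match of the rule in text."""
    return [f"{num} {title}" for num, title in STATUTE_RULE.findall(text)]


samples = [
    # Matches the rule
    "The accused was charged under section 304 of the Penal Code.",
    # Missed: abbreviated "s" and a lower-case word inside the title
    "The charge was brought under s 8 of the Misuse of Drugs Act.",
]

results = [extract_statutes(s) for s in samples]
for sentence, found in zip(samples, results):
    print(found, "<-", sentence)
```

The second sentence yields an empty list even though it names a statute, which is the kind of gap that shows up as a missing value in the extracted columns.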

However, I am not dropping the affected rows, as these cases can still be found by their case name.

### 2.4 Summary

**For train_df, test_df, weather_df and spray_df:**
* The column names have been changed to lower case.
* The date dtype will be converted to datetime in the exploratory data analysis section.

**For train_df and test_df:**
* There are no missing values. The selected columns have been dropped and the column values have been changed to lower case.

**For weather_df:**
* The missing values are indicated as 'M' and '-'.
* The missing values in tavg, heat and cool columns have been calculated.
* The water1, depart, depth, snowfall, sunset, and sunrise columns have been removed because most of their values are missing or because they will not be used in our analysis.
* The missing values in sealevel, stnpressure, wetbulb, avgspeed and preciptotal have been removed.
* The trace value in preciptotal has been converted to 0.00.
* The numerical columns have been converted to int/float dtype.

**For spray_df:**
* The time column will not be used in our analysis and has been removed.
* Spray locations that fall beyond the trap locations have been removed.

## Exporting Data

In [None]:
# Export calls are commented out to avoid executing them again
#train_df.to_csv("../data/train_df_clean.csv", index = False)
#test_df.to_csv("../data/test_df_clean.csv", index = False)
#weather_df.to_csv("../data/weather_df_clean.csv", index = False)
#spray_df.to_csv("../data/spray_df_clean.csv", index = False)

## References

[1] "A Study of Attorneys' Legal Research Practices and Opinions of New Associates' Research Skills," *ALL-SIS Task Force on Identifying Skills and Knowledge for Legal Practice*, June 2013. [Online]. Available: [https://www.aallnet.org/allsis/wp-content/uploads/sites/4/2018/01/final_report_07102013.pdf](https://www.aallnet.org/allsis/wp-content/uploads/sites/4/2018/01/final_report_07102013.pdf) [Accessed: May 6, 2021].