# Capstone Project: Criminal Case Database

## Problem Statement

The process of beginning legal research is currently inefficient. This project builds a proof of concept of an Information Retrieval system: a database of Criminal Law cases, with statistical summaries and legal citations, intended to improve the speed and efficiency of legal research.  

This project will use Natural Language Processing (NLP) to capture the relevant details of each judgment and store them in a database which can be filtered. Besides returning the matching cases and other relevant cases, the database will provide a statistical summary of the cases cited in each judgment and of other factors, such as mitigating or aggravating factors, so that research is both faster and more data-driven.  

The project is intended to be a proof of concept on using data science to increase the efficiency of legal research through the use of statistical summaries and data analysis.

### Overall Contents:
- [Background](#1.-Background) **(In this notebook)**
- [Webscraping Lawnet](#2.-Webscraping-Lawnet) **(In this notebook)**
- [Webscraping Singapore Statutes](#3.-Webscraping-Singapore-Statutes) **(In this notebook)**
- Natural Language Processing
- Search Function
- Flask and Google App Engine
- Conclusion and Recommendation

## 1. Background

Singapore uses the Common Law legal system, which places great importance on judicial precedents. This means that judges decide cases based on past decisions of the courts. The decisions of higher courts such as the Supreme Court are binding on the lower courts.
In addition to past decisions, Criminal Law has the Penal Code and the Criminal Procedure Code, which create a statutory framework for investigation, trials, and sentencing in Criminal Law cases.  

The start of legal research tends to be a slow, manual, and inefficient process. Given the facts of the case at hand, the lawyer first analyzes and determines the relevant area of law to start the research.  
According to a survey done by the ALL-SIS Task Force on Identifying Skills and Knowledge for Legal Practice in 2013, more than half the respondents frequently started their legal research by either looking through statutes or through a case law database, while slightly more than a third would frequently start their research through consulting a subject-specific guide.[1]  
In the current state of the industry, this starting point can take a long time as the statutes and subject-specific guides tend to be wordy, and the case law databases contain many judgments which require further inspection to narrow down according to the case at hand.  


### 1.1 Datasets

I will create the datasets myself by webscraping legal websites: LawNet, which contains the judgments, and Singapore Statutes Online, which contains digital copies of the statutes of Singapore.  

The dataset that I will use as input is as follows:

* statutes.csv  

The datasets that I will create are as follows:

* subordinatecourt.csv
* subordinatecourt_compiled.csv 
* supremecourt.csv 
* supremecourt_compiled.csv
* statutes_crimes.csv  

The information extracted from webscraping Lawnet will be presented in a database in the following format:  

|Name|Type|Dataset|Description|
|:---|:---|:---|:---|
|**court**|*object*|subordinatecourt.csv, subordinatecourt_compiled.csv, supremecourt.csv, supremecourt_compiled.csv|Court instance|
|**date**|*datetime*|subordinatecourt.csv, subordinatecourt_compiled.csv, supremecourt.csv, supremecourt_compiled.csv|Decision date of the judgment|
|**title**|*object*|subordinatecourt.csv, subordinatecourt_compiled.csv, supremecourt.csv, supremecourt_compiled.csv|Case name|
|**link**|*url*|subordinatecourt.csv, subordinatecourt_compiled.csv, supremecourt.csv, supremecourt_compiled.csv|URL of the judgment|


The information extracted from webscraping statutes will be presented in a database in the following format:  

|Name|Type|Dataset|Description|
|:---|:---|:---|:---|
|**section**|*object*|statutes_crimes.csv|The statute section|
|**title**|*object*|statutes_crimes.csv|The statute title|
|**link**|*url*|statutes_crimes.csv|url of the statute section|
|**statute**|*object*|statutes_crimes.csv|Name of the Statute|
|**section_statute**|*object*|statutes_crimes.csv|Combination of `section` and `statute`|





## 2. Webscraping Lawnet

For the webscraping of judgments from LawNet, I will be using a custom class `Court` whose methods do the following:  
1. Identify the last page of the website, as there are multiple pages of entries
2. Create a dataframe of all cases which belong to criminal law, including their links
3. Identify new cases which are not in my database
4. Update the database
5. Save each judgment as a .html file (as the free resources are subject to a 3-month limit, archiving lets me keep more data for the project)
  
  
![lawnet sample](../images/lawnet_sample.png)  

A sample of the Lawnet website[2]. The red box shows an example of a criminal case, while the blue box shows the page numbers.



### 2.1 Court class creation  

Going through the custom class, there are two main parts to it:  
1. Creating a database of criminal cases found on Lawnet  
2. Archiving the html file for each new criminal case in the database  

The first step is to create the class Court, and initialize it.  

<details> 
    <summary> <b> Click here for code </b></summary>
    
```python
# Create the Court class
class Court:

    def __init__(self, name):
        """
        Create a court. Only accepts "subordinate" and "supreme".
        """
        # Set the name of the instance
        self.name = name
        
        # Raise an error if the user tries to create an instance of a court that does not exist
        if self.name != 'subordinate' and self.name != 'supreme':
            raise CourtNameError("There is only the Subordinate (State) or Supreme Court!")
```  
</details>  

An error is raised if any instance other than 'subordinate' or 'supreme' is created as there are only two court categories on Lawnet.
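
The snippet above raises `CourtNameError`, whose definition is not shown in this notebook. The following is a minimal sketch, assuming it is a plain subclass of `Exception` defined alongside the class; the actual definition in `criminalcasedatabase` may differ, and `VALID_NAMES` is a hypothetical name used only for illustration.

```python
class CourtNameError(Exception):
    """Assumed definition: raised when an unknown court name is given."""


class Court:
    # Hypothetical constant: the two court categories listed on Lawnet
    VALID_NAMES = ('subordinate', 'supreme')

    def __init__(self, name):
        # Validate before storing the name
        if name not in self.VALID_NAMES:
            raise CourtNameError("There is only the Subordinate (State) or Supreme Court!")
        self.name = name


# Creating an instance with any other name raises the custom error
try:
    Court('high')
except CourtNameError as error:
    message = str(error)

print(message)
```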

### 2.2 Creating a database of criminal cases found on Lawnet  

#### 2.2.1 Lawnet information
Next, the class needs a few key pieces of information before it can iterate through and scrape the Lawnet website for new judgments:  
* The target url for requests  
* The number of pages to iterate through  

Hence, I created two private methods which capture these pieces of information.  
  
<details> 
    <summary> <b> Click here for code </b></summary>
    
```python
    def __set_soup(self):
        """
        Sets the target Court's Lawnet page using `name`.
        """
        # Set the url for API requests
        self.url = "https://www.lawnet.sg/lawnet/web/lawnet/free-resources?p_p_id=freeresources_WAR_lawnet3baseportlet&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_pos=2&p_p_col_count=3&_freeresources_WAR_lawnet3baseportlet_action="+self.name
        
        # Set the variables for BeautifulSoup parsing
        self.response = requests.get(self.url)
        self.html = self.response.text
        self.court = BeautifulSoup(self.html, 'lxml')

    def __get_num_pages(self):
        """
        Fetches number of pages from the website as an int
        """
        # Check for the last page from the html code
        self.page = str(self.court.find_all('li', {'class': 'lastPageActive'}))
        self.page = ''.join(filter(str.isdigit, self.page))
        
        # Replace the first "3" as it comes from the url
        self.last_page = int(self.page.replace("3","",1))
``` 
</details>  

The first method lets the class know what the url for the particular court instance is, and the second lets the class know what the last page of entries is.
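
To illustrate why the first "3" has to be dropped: the portlet name `lawnet3baseportlet` appears in the pagination link, so keeping only the digits of the tag picks up that "3" ahead of the page number. The sketch below uses a hypothetical `lastPageActive` element (the real Lawnet markup may differ). It also shows a possible alternative, which is an assumption and not the method used in the class: reading only the number that follows the `_freeresources_WAR_lawnet3baseportlet_page=` parameter.

```python
import re

# Hypothetical markup for the last-page element; the real page may differ
sample = ('<li class="lastPageActive"><a href="/free-resources?'
          '_freeresources_WAR_lawnet3baseportlet_page=6">Last</a></li>')

# Approach used in the class: keep all digits, then drop the "3" from "lawnet3baseportlet"
digits = ''.join(filter(str.isdigit, sample))
last_page_from_digits = int(digits.replace("3", "", 1))

# Alternative sketch: read only the digits that follow the page parameter
match = re.search(r'_freeresources_WAR_lawnet3baseportlet_page=(\d+)', sample)
last_page_from_regex = int(match.group(1))

print(last_page_from_digits, last_page_from_regex)
```

The regular expression does not depend on how many other digits appear in the tag, which may make it less fragile if the markup changes.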

#### 2.2.2 Pulling urls  

With the key information set, I wrote code to crawl through the Lawnet pages and create a dataframe of judgments and their respective links.  

This private method `__fetch_urls` first creates a variable for the base domain for all lawnet judgments, then iterates through the pages until it completes the last page which was previously identified, capturing the court category as `court`, decision date as `date`, case name as `title`, and judgment link as `link` for each entry in a dictionary.  

Each dictionary is then appended to a list which is finally converted to a pandas dataframe which is sorted by date, with the newest entries at the bottom.
<details>
    <summary> <b> Click here for code (warning: long code) </b></summary>  
    
```python
    def __fetch_urls(self):
        """
        Uses `name` to scrape lawnet for the cases and urls and store it in a dataframe `court_df`.
        """
        # Create an empty list for results
        self.results_list = []
        
        # Set the base domain for the urls
        self.domain = "https://www.lawnet.sg/lawnet/web/lawnet/free-resources?p_p_id=freeresources_WAR_lawnet3baseportlet&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_pos=2&p_p_col_count=3&_freeresources_WAR_lawnet3baseportlet_action=openContentPage&_freeresources_WAR_lawnet3baseportlet_docId="
        
        # Create counter for current page
        self.current_page = 1
        
        # Loop while it is not the last page
        while self.current_page <= self.last_page:
            
            # Set the full url for the current page
            url1 = self.url+"&_freeresources_WAR_lawnet3baseportlet_page="+str(self.current_page)
            
            # Establishing the connection to the web page:
            response1 = requests.get(url1)
            
            # Pull the HTML string out of requests and convert it to a Python string.
            html1 = response1.text
            court1 = BeautifulSoup(html1, 'lxml')
            
            # Get the relevant elements (date, name, link)
            search_results = court1.find_all('ul', {'class': 'searchResultsHolder'})
            
            # Iterate through the search results to find the list elements
            for li in search_results:
                li_list = li.find_all('li')
                for element in li_list:
                    # start a dictionary to store this item's data
                    result = {}
                    
                    # Set the court
                    result['court'] = self.name
                    
                    # get the date
                    result['date'] = element.find('p', {'class': 'resultsDate'}).text
                    
                    # get the title and full link/url
                    a_href = element.find('a')
                    if a_href:
                        result['title'] = a_href.text.strip()   # element text
                        link = str(a_href['href']) # href link
                        
                        # Remove unnecessary parts of the link
                        link = link.replace("javascript:viewContent","")
                        link = link.strip("')(")
                        
                        # Add the result including the domain
                        result['link'] = self.domain+link
                        
                        # only store "full" rows of data
                        if len(result) == 4:
                            self.results_list.append(result)
                            
                    # Clear cell output
                    clear_output(wait=True)

                    # Print current progress
                    print(f'Current progress: page {self.current_page}/{self.last_page}.')

            # Raise page counter for the loop to pull results from the next page
            self.current_page += 1
            
        # Create a dataframe with all the links, sorted by date
        self.court_df = pd.DataFrame(self.results_list)
        self.court_df = self.court_df.sort_values(by='date')
        self.court_df = self.court_df.reset_index(drop=True)
        
        # Print current progress
        print(f'Current progress: DataFrame created.')
```
</details>
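
The link clean-up step in `__fetch_urls` can be looked at in isolation. The sketch below wraps it in a hypothetical helper `clean_link` and applies it to a made-up `href` value; the real document ids on Lawnet will look different.

```python
def clean_link(href):
    """Extract the document id from a `javascript:viewContent(...)` href."""
    link = href.replace("javascript:viewContent", "")
    # str.strip removes any of these characters from both ends,
    # not the exact sequence "')("
    return link.strip("')(")


# Hypothetical href value, for illustration only
example_href = "javascript:viewContent('/Judgment/12345-SSP.xml')"
doc_id = clean_link(example_href)
print(doc_id)
```

The cleaned id is what gets appended to `self.domain` to form the full judgment link.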

#### 2.2.3 Filtering to only criminal cases  

With this dataframe created, the results are filtered to those where "Public Prosecutor" appears in the `title`, as the names of criminal cases contain this pattern.  

<details>
    <summary> <b> Click here for code </b></summary>
    
```python
   
    def __only_crim(self):
        """
        Narrows results to only Criminal cases
        """
        # Filters to cases where "Public Prosecutor" is mentioned - these are criminal cases
        self.court_df = self.court_df[self.court_df['title'].str.contains("Public Prosecutor")]
        self.court_df = self.court_df.reset_index(drop=True)

        # Print current progress
        print(f'Current progress: Narrowed to only Criminal cases.')
    
``` 
</details>  
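
The filter relies on `Series.str.contains`. The sketch below is an illustration under assumptions, not the code used in the class: it adds `regex=False` so the text is matched literally, and `na=False` so that a missing title, should one occur, counts as a non-match. The second and third rows are made up for the example.

```python
import pandas as pd

titles = pd.DataFrame({
    'title': [
        'Tan Kok Meng v Public Prosecutor - [2021] SGCA 55',  # appears in the sample output in this notebook
        'ABC Pte Ltd v XYZ Pte Ltd',                          # hypothetical civil case
        None,                                                 # hypothetical missing title
    ]
})

# regex=False matches the literal text; na=False treats missing titles as non-matches
is_criminal = titles['title'].str.contains('Public Prosecutor', regex=False, na=False)
criminal_only = titles[is_criminal].reset_index(drop=True)
print(criminal_only)
```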

#### 2.2.4 Comparison against existing database  

The dataframe is then compared against the existing database to identify new entries and merge them into the existing database.  

This step has two purposes:  
1. I do not want duplicate entries in the database  
2. For the archival step, we only want to archive new judgments which have not been previously archived  

Once the comparison is done, the database is sorted by date, and both the new entries and updated database are saved in separate .csv files.  

<details>
    <summary> <b> Click here for comparison code </b></summary>
    
```python
    def __compare_csv(self):
        """
        Loads the complete csv database and compares data with the dataframe to identify new entries.
        Returns new entries as `court_df` and all entries as `court_full`.
        """
        # Convert dates to datetime
        self.court_df['date'] = pd.to_datetime(self.court_df['date'], dayfirst=True)
        
        # Load the full dataset .csv and convert dates to datetime
        self.court_full = pd.read_csv(f'../data/{self.name}court_compiled.csv')
        self.court_full['date'] = pd.to_datetime(self.court_full['date'], format='%Y-%m-%d')
        
        # Filter to entries which are not in the full dataset
        self.court_df = self.court_df[~self.court_df['link'].isin(self.court_full['link'])]
        
        # Merge new entries to the full dataset, sorted by date
        self.court_full = self.court_full.merge(self.court_df, how='outer')
        self.court_full = self.court_full.sort_values(by='date')
        self.court_full = self.court_full.reset_index(drop=True)


        # Print current progress
        print(f'Current progress: {len(self.court_df.date)} New entries identified and saved.')   
    
``` 
</details>  

<details>
    <summary> <b> Click here for exporting code </b></summary>
    
```python
    def __export_csv(self):
        """
        Exports dataframes to the respective .csv files
        """
        # Save the new entries in a .csv
        self.court_df.to_csv(path_or_buf=f'../data/{self.name}court.csv', index=False)
        
        # Save the new full dataset in a .csv
        self.court_full.to_csv(path_or_buf=f'../data/{self.name}court_compiled.csv', index=False)
   
    
``` 
</details>  
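
The comparison step boils down to keeping only the scraped rows whose `link` is not yet in the database, then adding them to it. A minimal sketch with small in-memory dataframes and placeholder titles and links; it uses `pd.concat` instead of the outer `merge` in the class, which is a substitution made here for brevity.

```python
import pandas as pd

# Placeholder data standing in for the saved database and a fresh scrape
existing = pd.DataFrame({
    'date': pd.to_datetime(['2021-05-18', '2021-06-02']),
    'title': ['Case A', 'Case B'],
    'link': ['https://example.com/a', 'https://example.com/b'],
})
scraped = pd.DataFrame({
    'date': pd.to_datetime(['2021-06-02', '2021-06-03']),
    'title': ['Case B', 'Case C'],
    'link': ['https://example.com/b', 'https://example.com/c'],
})

# Keep only the rows whose link is not already in the database
new_entries = scraped[~scraped['link'].isin(existing['link'])]

# Add the new rows to the database and sort by date, newest at the bottom
updated = (pd.concat([existing, new_entries], ignore_index=True)
             .sort_values(by='date')
             .reset_index(drop=True))
print(updated)
```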

#### 2.2.5 Public method to pull the urls  

The above methods were all private methods. To actually pull the urls from Lawnet, I created a method containing a pipeline that joins together the above steps.  

This method also creates or updates a log of when I last pulled urls from Lawnet.  

<details>
    <summary> <b> Click here for code </b></summary>
    
```python
    def pull_urls(self):
        """
        Call command to pull urls and export to csv database
        """
        # Call the functions required to pull the urls from Lawnet
        self.__set_soup()
        self.__get_num_pages()
        self.__fetch_urls()
        self.__only_crim()
        self.__compare_csv()
        self.__export_csv()
        
        # Write a log of the last pull date; the `with` block closes the file afterwards
        with open('../logs/pull_log.txt', 'a', encoding='utf_8') as log_file:
            log_file.write(f'list last updated on: {datetime.today()}; \n')
        
        # Print current progress
        print(f'Current progress: Completed url pull and export.')   
    
``` 
</details>  

Hence, with the above methods, I was able to scrape and create a database of judgments from Lawnet which can be updated with no duplicates by running it again.

### 2.3 Archiving the judgments from Lawnet  

#### 2.3.1 Loading the database  

For the archival of judgments from Lawnet, I first created a method to load the database .csv files as pandas dataframes.  
I did this as I wanted to prevent any clashes between any dataframes which may be stored in memory and the actual database which has been saved.  

The date column of the dataframe is converted to datetime format to prevent any potential errors in sorting.  

At the same time, this method creates a dictionary from the `link` column of the database. The `link` of each new entry is looked up in this dictionary to assign it a `key1`, which is its index in the database and is used for the export file name.  

<details>
    <summary> <b> Click here for code </b></summary>
    
```python
    def load_csv(self):
        """
        Loads the .csv files for `name`: the links as dictionaries and the full database as a dataframe.
        """
        # Load the links of the new entries (wrapped in a list) and of the full database as {index: link} dictionaries
        self.court_link_list = [pd.read_csv(f'../data/{self.name}court.csv').link.to_dict()]
        self.court_link_dict = pd.read_csv(f'../data/{self.name}court_compiled.csv').link.to_dict()
        # Load the full database as a dataframe and convert dates to datetime
        self.court_full = pd.read_csv(f'../data/{self.name}court_compiled.csv')
        self.court_full['date'] = pd.to_datetime(self.court_full['date'], format='%Y-%m-%d')   
    
``` 
</details>  

This is not a private method as I want to be able to load the database at any time without having to archive the files.
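
The `key1` lookup described above searches the values of the `{index: link}` dictionary for each new link. As an optional sketch (not what the class does), the dictionary can be inverted once so that each lookup becomes a direct dictionary access. This assumes every link in the database is unique, which the duplicate check is meant to ensure. The links below are placeholders.

```python
# Placeholder {index: link} dictionary, shaped like `court_link_dict`
court_link_dict = {
    0: 'https://example.com/a',
    1: 'https://example.com/b',
    2: 'https://example.com/c',
}
new_link = 'https://example.com/c'

# Lookup as done in the class: find the position of the value, then the matching key
key1_original = list(court_link_dict.keys())[list(court_link_dict.values()).index(new_link)]

# Alternative sketch: invert the dictionary once, then look up directly
index_by_link = {link: index for index, link in court_link_dict.items()}
key1_inverted = index_by_link[new_link]

print(key1_original, key1_inverted)
```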

#### 2.3.2 Saving the judgments as .html  

For the actual archival, I first set a base file name for each file which consists of the `court` name of the instance and the string `court`.  

Next, the method iterates through each link from the new entries dataset to send a request to the url, returning a BeautifulSoup parsed response.  

This response is then saved as a .html file in the judgments folder, with the file name according to their assigned key1.  

To avoid overloading the servers, a random sleep timer is set for between 1 and 4 seconds between each archival.  

If there are any errors, the archival for the judgment is skipped and an error log is created.  


<details>
    <summary> <b> Click here for code </b></summary>
    
```python
    def __save_html(self):
        """
        Iterates through the lists of dictionaries to save judgments from Lawnet as a .html file.
        """
        # Create a counter for the progress bar
        self.count = 1

        # Sets the file name for the judgment
        self.file_name = self.name+'court'
            
        # Iterate through the list of new link entries
        for item in self.court_link_list:
            for key, value in item.items():            
                # Sets the request for the link
                url = value
                response = requests.get(url)
                
                # Pull the HTML string out of requests and convert it to a Python string.
                html = response.text
                soup = BeautifulSoup(html, 'lxml')
                
                # Match value to key in full list for file name
                key1 = list(self.court_link_dict.keys())[list(self.court_link_dict.values()).index(value)]
                try:
                    # Try to save as html file; the `with` block closes the file after writing
                    with open(f'../judgments/{self.name}_court/{self.file_name}_{key1}.html', 'x', encoding='utf_8') as file:
                        file.write(str(soup))
                    sleep(randint(1,4))

                    # Clear cell output
                    clear_output(wait=True)

                    # Print current progress
                    print(f'Current progress: {self.count}/{len(self.court_link_list[0])}.')
                    self.count += 1

                except Exception:
                    # Skip if error, write an error log entry with the index
                    with open('../logs/error_log.txt', 'a', encoding='utf_8') as file:
                        file.write(f'{self.file_name}_list error: {key1}.html : {datetime.today()}; \n')
                    
                    # Clear cell output
                    clear_output(wait=True)

                    # Print current progress
                    print(f'Current progress: {self.count}/{len(self.court_link_list[0])}.')
                    self.count += 1   
    
``` 
</details>  
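
Opening a file with mode `'x'` creates it exclusively and raises `FileExistsError` if it already exists, which is one of the errors the `try` block above can run into. The sketch below isolates that behaviour in a hypothetical helper `save_once`, writing to a temporary directory rather than the project's judgments folder.

```python
import tempfile
from pathlib import Path


def save_once(path, text):
    """Write `text` to `path` only if the file does not exist yet; return True if written."""
    try:
        with open(path, 'x', encoding='utf_8') as file:
            file.write(text)
        return True
    except FileExistsError:
        # The judgment was archived before, so it is left untouched
        return False


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / 'supremecourt_45.html'
    first_attempt = save_once(target, '<html>judgment</html>')
    second_attempt = save_once(target, '<html>changed</html>')
    content = target.read_text(encoding='utf_8')

print(first_attempt, second_attempt)
```

Catching the specific exception makes it possible to tell an already-archived judgment apart from other failures in the error log.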

#### 2.3.3 Public method to archive the judgments  

Similar to the pulling of cases from Lawnet, I created a pipeline method which calls the functions in order to archive the new entries from Lawnet.  

<details>
    <summary> <b> Click here for code </b></summary>
    
```python
    def archive(self):
        """
        Call command to archive the urls as .html files.
        """
        # Call the functions required to archive the new html files from Lawnet
        self.load_csv()
        self.__save_html()
        
        # Update the log file for the last archival date; the `with` block closes the file afterwards
        with open('../logs/archival_log.txt', 'a', encoding='utf_8') as log_file:
            log_file.write(f'last archived on: {datetime.today()}; \n')
        
        # Print current progress
        print(f'Current progress: {len(self.court_link_list[0])} HTMLs archived.')   
    
``` 
</details>  

With the methods for pulling the urls and archiving the files split, I am able to perform each task separately in case I do not want to perform it all at once, or in case any errors occur.

### 2.4 Testing the custom class  

#### 2.4.1 Libraries Import  

I will import the custom class `Court` and pandas to explore the results of the code.

In [1]:
# Import libraries
import pandas as pd
from criminalcasedatabase import Court

### 2.5 Create an instance for the Supreme Court and pull the urls

In [2]:
supreme_court = Court('supreme')
supreme_court.pull_urls()

Current progress: page 6/6.
Current progress: DataFrame created.
Current progress: Narrowed to only Criminal cases.
Current progress: 1 New entries identified and saved.
Current progress: Completed url pull and export.


Exploring the new entries dataframe:

In [3]:
supreme_court.court_df.head()

| |court|date|title|link|
|:---|:---|:---|:---|:---|
|1|supreme|2021-06-03|Mohammad Farid bin Batra v Public Prosecutor -...|https://www.lawnet.sg/lawnet/web/lawnet/free-r...|


Exploring the full database:

In [4]:
supreme_court.court_full.tail()

| |court|date|title|link|
|:---|:---|:---|:---|:---|
|41|supreme|2021-05-14|Syed Suhail bin Syed Zin v Public Prosecutor -...|https://www.lawnet.sg/lawnet/web/lawnet/free-r...|
|42|supreme|2021-05-18|Tan Kok Meng v Public Prosecutor - [2021] SGCA 55|https://www.lawnet.sg/lawnet/web/lawnet/free-r...|
|43|supreme|2021-05-19|Public Prosecutor v Mangalagiri Dhruva Kumar -...|https://www.lawnet.sg/lawnet/web/lawnet/free-r...|
|44|supreme|2021-06-02|Sulaiman bin Mohd Hassan v Public Prosecutor -...|https://www.lawnet.sg/lawnet/web/lawnet/free-r...|
|45|supreme|2021-06-03|Mohammad Farid bin Batra v Public Prosecutor -...|https://www.lawnet.sg/lawnet/web/lawnet/free-r...|


The newest entries are stored at the bottom to prevent the overwriting of keys during judgment archival.  

If the newest entries are at the top, the index of all the rows will keep changing.
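
A small sketch of why the ordering matters for the archive file names, which follow the `{court}court_{index}.html` pattern used in `__save_html`. The helper `file_names` and the case labels are hypothetical.

```python
def file_names(cases, prefix='supremecourt'):
    """Map each case to the archive file name derived from its position."""
    return {case: f'{prefix}_{index}.html' for index, case in enumerate(cases)}


# Placeholder case labels, for illustration only
database = ['case_a', 'case_b']
before = file_names(database)

appended = file_names(database + ['case_c'])    # newest entry at the bottom
prepended = file_names(['case_c'] + database)   # newest entry at the top

print(before['case_a'], appended['case_a'], prepended['case_a'])
```

Appending keeps the file name of every existing case unchanged, while prepending shifts them all by one. Since the database is re-sorted by date after each merge, this holds as long as new judgments carry decision dates no earlier than the existing entries.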

### 2.6 Archive the htmls

In [5]:
supreme_court.archive()

Current progress: 1/1.
Current progress: 1 HTMLs archived.


The code ran fine and the html has now been archived in its respective project subfolder.  

![archived](../images/archived.png)

### 2.7 Create an instance for the Subordinate Courts and pull the urls

Repeating the steps for the other court:

In [6]:
state_court = Court('subordinate')
state_court.pull_urls()

Current progress: page 4/4.
Current progress: DataFrame created.
Current progress: Narrowed to only Criminal cases.
Current progress: 0 New entries identified and saved.
Current progress: Completed url pull and export.


### 2.8 Archive the htmls

In [7]:
state_court.archive()

Current progress: 0 HTMLs archived.


The code has worked without any errors, and I now have the judgments archived as .html files for processing in the next notebook.

## 3. Webscraping Singapore Statutes

Criminal offences in Singapore are not listed in a central database or statute, but are found within many different statutes. Given the vast number of statutes where criminal offences can be found, I will limit the scope to some of the statutes in Singapore, based on the criminal offences listed in [Gloria James-Civetta & Co's article on Singapore Crimes and Punishment](https://www.singaporecriminallawyer.com/crimes-punishment/)[3].

As each statute is different and full of nuances, I will manually create databases of crimes from each of the statutes which are available from [Singapore Statutes Online](https://sso.agc.gov.sg).  

This database of possible criminal offences will be used for the information retrieval, as the criminal offence in each judgment is one of the items that the NLP step will identify.

### Additional Libraries Import

For the webscraping of Singapore Statutes, I will use `requests`, `BeautifulSoup`, and `re`.

In [7]:
# Imports

import requests
from bs4 import BeautifulSoup
import re

### Statutes 

The following are the Singapore Statutes which I will be using and their links, manually compiled into a .csv file:

In [9]:
statutes = pd.read_csv('../data/statutes.csv')

In [10]:
statutes

| |statute|link|
|:---|:---|:---|
|0|Misuse of Drugs Act|https://sso.agc.gov.sg/Act/MDA1973|
|1|Sedition Act|https://sso.agc.gov.sg/Act/SA1948|
|2|Vandalism Act|https://sso.agc.gov.sg/Act/VA1966|
|3|Children and Young Persons Act|https://sso.agc.gov.sg/Act/CYPA1993|
|4|Immigration Act|https://sso.agc.gov.sg/Act/IA1959|
|5|Income Tax Act|https://sso.agc.gov.sg/Act/ITA1947|
|6|Companies Act|https://sso.agc.gov.sg/Act/CoA1967|
|7|Prevention of Corruption Act|https://sso.agc.gov.sg/Act/PCA1960|
|8|Customs Act|https://sso.agc.gov.sg/Act/CA1960|
|9|Road Traffic Act|https://sso.agc.gov.sg/Act/RTA1961|


### 3.1 Scraping code creation

Starting with the Penal Code, I will identify the relevant data from the html in order to create a database containing the section, crime name, and link.  
This code will then be turned into a function that can be run on the other statutes.

#### 3.1.1 Penal Code
##### Fetch the Penal Code content by URL


In [8]:
# Target Penal Code Statute page:
url = statutes[statutes['statute'] == 'Penal Code'].link.values[0]

headers = requests.utils.default_headers()
headers.update(
    {
        'User-Agent': 'Mozilla/5.0',
    }
)

# Establishing the connection to the web page:
response = requests.get(url, headers=headers)

# You can use status codes to understand how the target server responds to your request.
# Ex., 200 = OK, 400 = Bad Request, 403 = Forbidden, 404 = Not Found.
print(response.status_code)

# Pull the HTML string out of requests and convert it to a Python string.
html = response.text

200


##### Parse the HTML document with Beautiful Soup

This step allows me to access the elements of the webpage.

In [9]:
penal_code = BeautifulSoup(html, 'lxml')

In [13]:
# This code collects the sections and links from the statute and puts it into a database 

# List to store results
results_list = []

# Get the relevant elements (name, link) from all 'ul' with class 'nav' (exact match)
search_results = penal_code.find_all(lambda tag: tag.name == 'ul' and 
                                   tag.get('class') == ['nav'])
# Iterate through the unordered lists
for ul in search_results:
    # Find the lists within
    li_list = ul.find_all('li')
    # Iterate through the lists to get each section
    for element in li_list:
        # start a dictionary to store this item's data
        result = {}
        # get the title and full link/url
        a_href = element.find('a')
        # Skip list items without a link, and sections that are repealed
        if a_href is None or 'Repealed' in a_href.text.strip():
            continue
        # Split text to section and title
        else:
            text = a_href.text.strip().split(' ', 1)
            result['section'] = text[0]
            result['title'] = text[1]   # element text
        # Add the link
        result['link'] = url+str(a_href['href']) # href link
        result['statute'] = 'Penal Code'
        results_list.append(result)

penal_code_df = pd.DataFrame(results_list)

##### Narrow to sections containing offences

Currently, this has to be done manually based on my domain knowledge. Within the scope of the project, I will not be creating an NLP algorithm to automate this due to a lack of time, but this should be improved on in the future through machine learning.

In [14]:
# Limit the sections to those containing offences (crimes)
penal_code_df = penal_code_df.iloc[104:,:]
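
The cut-off `iloc[104:,:]` depends on the row position of section 107 in the scraped list, which could shift if sections are added or repealed. As a hedged alternative sketch (not the approach used in this notebook), the same cut can be made by looking up the section label. The dataframe below is a small stand-in with placeholder rows; only sections 107 and 108 are taken from the output of this notebook.

```python
import pandas as pd

# Stand-in for `penal_code_df`; the first two rows are placeholders
sections = pd.DataFrame({
    'section': ['1', '2', '107', '108'],
    'title': ['(placeholder)', '(placeholder)',
              'Abetment of the doing of a thing', 'Abettor'],
})

# Find the row label of the first offence section, then keep everything from there on
first_offence = sections[sections['section'] == '107'].index[0]
offences = sections.loc[first_offence:]
print(offences)
```

The choice of section 107 as the starting point is still a manual decision based on domain knowledge; the sketch only changes how that decision is written down.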

In [15]:
penal_code_df

| |section|title|link|statute|
|:---|:---|:---|:---|:---|
|104|107|Abetment of the doing of a thing|https://sso.agc.gov.sg/Act/PC1871#pr107-|Penal Code|
|105|108|Abettor|https://sso.agc.gov.sg/Act/PC1871#pr108-|Penal Code|
|106|108A|Abetment in Singapore of an offence outside Si...|https://sso.agc.gov.sg/Act/PC1871#pr108A-|Penal Code|
|107|108B|Abetment outside Singapore of an offence in Si...|https://sso.agc.gov.sg/Act/PC1871#pr108B-|Penal Code|
|108|109|Punishment of abetment if the act abetted is c...|https://sso.agc.gov.sg/Act/PC1871#pr109-|Penal Code|
|...|...|...|...|...|
|522|505|Statements conducing to public mischief|https://sso.agc.gov.sg/Act/PC1871#pr505-|Penal Code|
|523|506|Punishment for criminal intimidation|https://sso.agc.gov.sg/Act/PC1871#pr506-|Penal Code|
|524|507|Criminal intimidation by an anonymous communic...|https://sso.agc.gov.sg/Act/PC1871#pr507-|Penal Code|
|525|511|Attempt to commit offence|https://sso.agc.gov.sg/Act/PC1871#pr511-|Penal Code|


#### 3.1.2 Putting the scraping into a function

Having tested the code on the Penal Code, I will proceed to place it into a function to be used for the other Statutes.  

This code will create a dataframe containing the `section`, `title` of each section, their internal `link`, and the name of the `statute`.

In [16]:
def get_statute_sections(statute):
    """
    Input: Statute as a string
    Output: Database of sections, titles, and urls for the statute
    """
    print(statute)
    url = statutes[statutes['statute'] == statute].link.values[0]
    print(url)
    headers = requests.utils.default_headers()
    headers.update(
        {
            'User-Agent': 'Mozilla/5.0',
        }
    )
    # Establishing the connection to the web page:
    response = requests.get(url, headers=headers)

    # You can use status codes to understand how the target server responds to your request.
    # Ex., 200 = OK, 400 = Bad Request, 403 = Forbidden, 404 = Not Found.
    print(f'response: {response.status_code}')

    # Pull the HTML string out of requests and convert it to a Python string.
    html = response.text
    statute_html = BeautifulSoup(html, 'lxml')
    # This code collects the sections and links from the statute and puts it into a database 
    # List to store results
    results_list = []

    # Get the relevant elements (name, link) from all 'ul' with class 'nav' (exact match)
    search_results = statute_html.find_all(lambda tag: tag.name == 'ul' and 
                                       tag.get('class') == ['nav'])
    # Iterate through the unordered lists
    for ul in search_results:
        # Find the list items within
        li_list = ul.find_all('li')
        # Iterate through the list items to get each section
        for element in li_list:
            # start a dictionary to store this item's data
            result = {}
            # get the title and full link/url
            a_href = element.find('a')
            # Skip list items without a link, and sections that are repealed
            if a_href is None or 'Repealed' in a_href.text:
                continue
            # Split text into section number and title
            text = a_href.text.strip().split(' ', 1)
            result['section'] = text[0]
            # Guard against entries with no text after the section number
            result['title'] = text[1] if len(text) > 1 else ''
            # Add the link
            result['link'] = url+str(a_href['href']) # href link
            result['statute'] = statute
            # Append the section to the results_list
            results_list.append(result)
        
    # Create the dataframe from the results_list
    statute_df = pd.DataFrame(results_list)
    print('no errors, task completed!')
    return statute_df
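
The text-splitting and link-building steps inside `get_statute_sections` can be checked offline, without sending a request to the site. The sketch below is an illustration rather than part of the notebook: `parse_entry` is a hypothetical helper, and it uses `urllib.parse.urljoin` in place of the string concatenation used in the function (for fragment-only links such as `#pr511-`, both should produce the same URL).

```python
from urllib.parse import urljoin


def parse_entry(text, href, base_url):
    """Split a table-of-contents entry into section and title, and build its link.

    Hypothetical helper for illustration; returns None for repealed entries.
    """
    text = text.strip()
    if 'Repealed' in text:
        return None
    # Split once on the first space, so that
    # '511 Attempt to commit offence' becomes ['511', 'Attempt to commit offence'].
    parts = text.split(' ', 1)
    return {
        'section': parts[0],
        # Entries with no text after the section number get an empty title.
        'title': parts[1] if len(parts) > 1 else '',
        'link': urljoin(base_url, href),
    }


entry = parse_entry('511 Attempt to commit offence', '#pr511-',
                    'https://sso.agc.gov.sg/Act/PC1871')
print(entry)
```

Keeping this logic in a small pure function makes the edge cases (repealed sections, entries without a title) easy to test separately from the network call.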

### 3.1.3 Misuse of Drugs Act  

Now that I have a function to scrape the sections and links for a statute, I will apply it to the other statutes within the project's scope and use my domain knowledge to narrow each statute down to the sections that may create criminal offences.

For each statute, the dataframe is built by keeping only the rows that I have manually identified as containing offences.
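
Because the relevant rows are picked by position, each statute needs a mix of single positions and contiguous ranges. One way to keep such a selection in a single readable place is sketched below. `expand_positions` is a hypothetical helper, not part of the notebook, and the positions shown are illustrative only; it assumes ranges are written as `(start, stop)` tuples with the same half-open meaning as `iloc[start:stop]`.

```python
def expand_positions(*parts):
    """Flatten single positions and (start, stop) ranges into one list.

    Hypothetical helper: ints are kept as-is, tuples are expanded like
    range(start, stop), i.e. the stop position is excluded.
    """
    positions = []
    for part in parts:
        if isinstance(part, tuple):
            start, stop = part
            positions.extend(range(start, stop))
        else:
            positions.append(part)
    return positions


# Illustrative selection: rows 4-6, row 13, and rows 15-16.
positions = expand_positions((4, 7), 13, (15, 17))
print(positions)
```

The resulting list could then be passed to `statute_df.iloc[positions]`, which returns the rows in the order listed.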

In [17]:
statute_df = get_statute_sections('Misuse of Drugs Act')

Misuse of Drugs Act
https://sso.agc.gov.sg/Act/MDA1973
response: 200
no errors, task completed!


In [18]:
statute_df

Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/MDA1973#pr1-,Misuse of Drugs Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/MDA1973#pr2-,Misuse of Drugs Act
2,3,Appointment of Director and other officers of ...,https://sso.agc.gov.sg/Act/MDA1973#pr3-,Misuse of Drugs Act
3,4,Advisory committees,https://sso.agc.gov.sg/Act/MDA1973#pr4-,Misuse of Drugs Act
4,5,Trafficking in controlled drugs,https://sso.agc.gov.sg/Act/MDA1973#pr5-,Misuse of Drugs Act
...,...,...,...,...
75,56,Use of weapons,https://sso.agc.gov.sg/Act/MDA1973#pr56-,Misuse of Drugs Act
76,57,Employment of auxiliary police officers as esc...,https://sso.agc.gov.sg/Act/MDA1973#pr57-,Misuse of Drugs Act
77,58,Regulations,https://sso.agc.gov.sg/Act/MDA1973#pr58-,Misuse of Drugs Act
78,58A,Specifying drugs as temporarily listed drugs i...,https://sso.agc.gov.sg/Act/MDA1973#pr58A-,Misuse of Drugs Act


In [19]:
misuse_drugs_act_df = statute_df.iloc[4:22,:]

In [20]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
misuse_drugs_act_df = pd.concat([misuse_drugs_act_df, statute_df.iloc[44:47,:]])

In [21]:
# Assign the result so that the reset index is kept
misuse_drugs_act_df = misuse_drugs_act_df.reset_index(drop=True)
misuse_drugs_act_df

Unnamed: 0,section,title,link,statute
0,5,Trafficking in controlled drugs,https://sso.agc.gov.sg/Act/MDA1973#pr5-,Misuse of Drugs Act
1,6,Manufacture of controlled drugs,https://sso.agc.gov.sg/Act/MDA1973#pr6-,Misuse of Drugs Act
2,7,Import and export of controlled drugs,https://sso.agc.gov.sg/Act/MDA1973#pr7-,Misuse of Drugs Act
3,8,Possession and consumption of controlled drugs,https://sso.agc.gov.sg/Act/MDA1973#pr8-,Misuse of Drugs Act
4,8A,Consumption of drug outside Singapore by citiz...,https://sso.agc.gov.sg/Act/MDA1973#pr8A-,Misuse of Drugs Act
5,9,"Possession of pipes, utensils, etc.",https://sso.agc.gov.sg/Act/MDA1973#pr9-,Misuse of Drugs Act
6,10,"Cultivation of cannabis, opium and coca plants",https://sso.agc.gov.sg/Act/MDA1973#pr10-,Misuse of Drugs Act
7,10A,"Manufacture, supply, possession, import or exp...",https://sso.agc.gov.sg/Act/MDA1973#pr10A-,Misuse of Drugs Act
8,10B,"Regulations on controlled equipment, material ...",https://sso.agc.gov.sg/Act/MDA1973#pr10B-,Misuse of Drugs Act
9,11,"Responsibilities of owners, tenants, etc.",https://sso.agc.gov.sg/Act/MDA1973#pr11-,Misuse of Drugs Act


### 3.1.4 Sedition Act

In [22]:
statute_df = get_statute_sections(statutes['statute'].values[1])

Sedition Act
https://sso.agc.gov.sg/Act/SA1948
response: 200
no errors, task completed!


In [23]:
sedition_act_df = statute_df.iloc[3:4,:].reset_index(drop=True)
sedition_act_df

Unnamed: 0,section,title,link,statute
0,4,Offences,https://sso.agc.gov.sg/Act/SA1948#pr4-,Sedition Act


### 3.1.5 Vandalism Act

In [24]:
statute_df = get_statute_sections(statutes['statute'].values[2])

Vandalism Act
https://sso.agc.gov.sg/Act/VA1966
response: 200
no errors, task completed!


In [25]:
statute_df

Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/VA1966#pr1-,Vandalism Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/VA1966#pr2-,Vandalism Act
2,3,Penalty for acts of vandalism,https://sso.agc.gov.sg/Act/VA1966#pr3-,Vandalism Act
3,4,Written authority or written consent to be pro...,https://sso.agc.gov.sg/Act/VA1966#pr4-,Vandalism Act
4,5,Power to seize article or thing in respect of ...,https://sso.agc.gov.sg/Act/VA1966#pr5-,Vandalism Act
5,6,Offences to be arrestable and non-bailable,https://sso.agc.gov.sg/Act/VA1966#pr6-,Vandalism Act
6,7,Presumption,https://sso.agc.gov.sg/Act/VA1966#pr7-,Vandalism Act
7,8,Revocation of secondhand goods dealer’s licenc...,https://sso.agc.gov.sg/Act/VA1966#pr8-,Vandalism Act


In [26]:
vandalism_act_df = statute_df.iloc[1:4,:].reset_index(drop=True)
vandalism_act_df

Unnamed: 0,section,title,link,statute
0,2,Interpretation,https://sso.agc.gov.sg/Act/VA1966#pr2-,Vandalism Act
1,3,Penalty for acts of vandalism,https://sso.agc.gov.sg/Act/VA1966#pr3-,Vandalism Act
2,4,Written authority or written consent to be pro...,https://sso.agc.gov.sg/Act/VA1966#pr4-,Vandalism Act


### 3.1.6 Children and Young Persons Act

In [27]:
statute_df = get_statute_sections(statutes['statute'].values[3])

Children and Young Persons Act
https://sso.agc.gov.sg/Act/CYPA1993
response: 200
no errors, task completed!


In [28]:
statute_df.head(50)

Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/CYPA1993#pr1-,Children and Young Persons Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/CYPA1993#pr2-,Children and Young Persons Act
2,3,Administration and enforcement of Act,https://sso.agc.gov.sg/Act/CYPA1993#pr3-,Children and Young Persons Act
3,3A,Principles,https://sso.agc.gov.sg/Act/CYPA1993#pr3A-,Children and Young Persons Act
4,4,When child or young person in need of care or ...,https://sso.agc.gov.sg/Act/CYPA1993#pr4-,Children and Young Persons Act
5,5,Ill-treatment of child or young person,https://sso.agc.gov.sg/Act/CYPA1993#pr5-,Children and Young Persons Act
6,6,Contribution to delinquency of child or young ...,https://sso.agc.gov.sg/Act/CYPA1993#pr6-,Children and Young Persons Act
7,7,Sexual exploitation of child or young person,https://sso.agc.gov.sg/Act/CYPA1993#pr7-,Children and Young Persons Act
8,8,Power to obtain and communicate information,https://sso.agc.gov.sg/Act/CYPA1993#pr8-,Children and Young Persons Act
9,8A,Power to order child or young person to be pro...,https://sso.agc.gov.sg/Act/CYPA1993#pr8A-,Children and Young Persons Act


In [29]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
children_young_persons_act_df = pd.concat([
    statute_df.iloc[5:8,:],
    statute_df.iloc[[13, 25],:],
    statute_df.iloc[15:17,:],
])
children_young_persons_act_df = children_young_persons_act_df.reset_index(drop=True)
children_young_persons_act_df

Unnamed: 0,section,title,link,statute
0,5,Ill-treatment of child or young person,https://sso.agc.gov.sg/Act/CYPA1993#pr5-,Children and Young Persons Act
1,6,Contribution to delinquency of child or young ...,https://sso.agc.gov.sg/Act/CYPA1993#pr6-,Children and Young Persons Act
2,7,Sexual exploitation of child or young person,https://sso.agc.gov.sg/Act/CYPA1993#pr7-,Children and Young Persons Act
3,11,Restrictions on children and young persons tak...,https://sso.agc.gov.sg/Act/CYPA1993#pr11-,Children and Young Persons Act
4,22,Offences and penalties,https://sso.agc.gov.sg/Act/CYPA1993#pr22-,Children and Young Persons Act
5,12,"Unlawful transfer of possession, custody or co...",https://sso.agc.gov.sg/Act/CYPA1993#pr12-,Children and Young Persons Act
6,13,Importation of child or young person by false ...,https://sso.agc.gov.sg/Act/CYPA1993#pr13-,Children and Young Persons Act


### 3.1.7 Immigration Act

In [30]:
statute_df = get_statute_sections(statutes['statute'].values[4])
statute_df.tail(25)

Immigration Act
https://sso.agc.gov.sg/Act/IA1959
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
66,46,"Masters, owners, etc., liable for expenses",https://sso.agc.gov.sg/Act/IA1959#pr46-,Immigration Act
67,47,Obligation to afford free passage,https://sso.agc.gov.sg/Act/IA1959#pr47-,Immigration Act
68,47A,Seizure of moneys for purposes of repatriation...,https://sso.agc.gov.sg/Act/IA1959#pr47A-,Immigration Act
69,48,Power to detain vessel,https://sso.agc.gov.sg/Act/IA1959#pr48-,Immigration Act
70,49,"Power to seize, detain and forfeit vessels bel...",https://sso.agc.gov.sg/Act/IA1959#pr49-,Immigration Act
71,50,Power of interrogation,https://sso.agc.gov.sg/Act/IA1959#pr50-,Immigration Act
72,51,Power of search and arrest for offence under Act,https://sso.agc.gov.sg/Act/IA1959#pr51-,Immigration Act
73,51AA,Power of search and arrest for offences commit...,https://sso.agc.gov.sg/Act/IA1959#pr51AA-,Immigration Act
74,51A,Provision of information by Housing and Develo...,https://sso.agc.gov.sg/Act/IA1959#pr51A-,Immigration Act
75,52,Person registered under Enlistment Act,https://sso.agc.gov.sg/Act/IA1959#pr52-,Immigration Act


In [31]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
immigration_act_df = pd.concat([
    statute_df.iloc[4:11,:],
    statute_df.iloc[[17,45,50,53,57,62],:],
    statute_df.iloc[21:44,:],
    statute_df.iloc[64:66,:],
    statute_df.iloc[80:86,:],
])
immigration_act_df = immigration_act_df.reset_index(drop=True)
immigration_act_df

Unnamed: 0,section,title,link,statute
0,5,Entry into and departure from Singapore,https://sso.agc.gov.sg/Act/IA1959#pr5-,Immigration Act
1,5A,Person entering or leaving Singapore to produc...,https://sso.agc.gov.sg/Act/IA1959#pr5A-,Immigration Act
2,5B,Facilities at authorised areas,https://sso.agc.gov.sg/Act/IA1959#pr5B-,Immigration Act
3,6,Control of entry into and departure from Singa...,https://sso.agc.gov.sg/Act/IA1959#pr6-,Immigration Act
4,6A,Non-citizens born in Singapore,https://sso.agc.gov.sg/Act/IA1959#pr6A-,Immigration Act
5,7,Right of entry,https://sso.agc.gov.sg/Act/IA1959#pr7-,Immigration Act
6,8,Prohibited immigrants,https://sso.agc.gov.sg/Act/IA1959#pr8-,Immigration Act
7,11A,Persons ceasing to be citizens of Singapore,https://sso.agc.gov.sg/Act/IA1959#pr11A-,Immigration Act
8,31A,Power of Controller to remove prohibited immig...,https://sso.agc.gov.sg/Act/IA1959#pr31A-,Immigration Act
9,36,Unlawful return after removal,https://sso.agc.gov.sg/Act/IA1959#pr36-,Immigration Act


### 3.1.8 Income Tax Act

In [32]:
statute_df = get_statute_sections(statutes['statute'].values[5])
statute_df.head(5)

Income Tax Act
https://sso.agc.gov.sg/Act/ITA1947
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/ITA1947#pr1-,Income Tax Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/ITA1947#pr2-,Income Tax Act
2,3,Appointment of Comptroller and other officers,https://sso.agc.gov.sg/Act/ITA1947#pr3-,Income Tax Act
3,3A,Assignment of function or power to public body,https://sso.agc.gov.sg/Act/ITA1947#pr3A-,Income Tax Act
4,4,Powers of Comptroller,https://sso.agc.gov.sg/Act/ITA1947#pr4-,Income Tax Act


In [33]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
income_tax_act_df = pd.concat([
    statute_df.iloc[[216,218,247],:],
    statute_df.iloc[265:272,:],
])
income_tax_act_df = income_tax_act_df.reset_index(drop=True)
income_tax_act_df

Unnamed: 0,section,title,link,statute
0,65C,"Failure to comply with section 64, 65, 65A or 65B",https://sso.agc.gov.sg/Act/ITA1947#pr65C-,Income Tax Act
1,65E,Section 65B notice may be subject to confident...,https://sso.agc.gov.sg/Act/ITA1947#pr65E-,Income Tax Act
2,86,Recovery of tax from persons leaving Singapore,https://sso.agc.gov.sg/Act/ITA1947#pr86-,Income Tax Act
3,94,General penalties,https://sso.agc.gov.sg/Act/ITA1947#pr94-,Income Tax Act
4,94A,Penalty for failure to make return,https://sso.agc.gov.sg/Act/ITA1947#pr94A-,Income Tax Act
5,95,"Penalty for incorrect return, etc.",https://sso.agc.gov.sg/Act/ITA1947#pr95-,Income Tax Act
6,96,Tax evasion and wilful action to obtain PIC bonus,https://sso.agc.gov.sg/Act/ITA1947#pr96-,Income Tax Act
7,96A,Serious fraudulent tax evasion and action to o...,https://sso.agc.gov.sg/Act/ITA1947#pr96A-,Income Tax Act
8,97,Penalties for offences by authorised and unaut...,https://sso.agc.gov.sg/Act/ITA1947#pr97-,Income Tax Act
9,98,Penalty for obstructing Comptroller or officers,https://sso.agc.gov.sg/Act/ITA1947#pr98-,Income Tax Act


### 3.1.9 Companies Act

In [34]:
statute_df = get_statute_sections(statutes['statute'].values[6])
statute_df.iloc[:50,:]

Companies Act
https://sso.agc.gov.sg/Act/CoA1967
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/CoA1967#pr1-,Companies Act
1,2,Division into Parts,https://sso.agc.gov.sg/Act/CoA1967#pr2-,Companies Act
2,3,Repeals,https://sso.agc.gov.sg/Act/CoA1967#pr3-,Companies Act
3,4,Interpretation,https://sso.agc.gov.sg/Act/CoA1967#pr4-,Companies Act
4,5,Definition of subsidiary and holding company,https://sso.agc.gov.sg/Act/CoA1967#pr5-,Companies Act
5,5A,Definition of ultimate holding company,https://sso.agc.gov.sg/Act/CoA1967#pr5A-,Companies Act
6,5B,Definition of wholly owned subsidiary,https://sso.agc.gov.sg/Act/CoA1967#pr5B-,Companies Act
7,6,When corporations deemed to be related to each...,https://sso.agc.gov.sg/Act/CoA1967#pr6-,Companies Act
8,7,Interests in shares,https://sso.agc.gov.sg/Act/CoA1967#pr7-,Companies Act
9,7A,Solvency statement and offence for making fals...,https://sso.agc.gov.sg/Act/CoA1967#pr7A-,Companies Act


In [35]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
companies_act_df = pd.concat([
    statute_df.iloc[[9,14,18,42,50,85,96,109,113,135,141,144,147,152,155,219,221,232,243,248,253,277,288,296,312,325,334,341,345,356,363],:],
    statute_df.iloc[11:13,:],
    statute_df.iloc[44:46,:],
    statute_df.iloc[64:67,:],
    statute_df.iloc[71:73,:],
    statute_df.iloc[88:91,:],
    statute_df.iloc[124:126,:],
    statute_df.iloc[127:129,:],
    statute_df.iloc[149:151,:],
    statute_df.iloc[159:168,:],
    statute_df.iloc[172:181,:],
    statute_df.iloc[188:190,:],
    statute_df.iloc[192:196,:],
    statute_df.iloc[201:205,:],
    statute_df.iloc[208:210,:],
    statute_df.iloc[211:213,:],
    statute_df.iloc[234:236,:],
    statute_df.iloc[237:241,:],
    statute_df.iloc[250:252,:],
    statute_df.iloc[259:263,:],
    statute_df.iloc[265:267,:],
    statute_df.iloc[268:270,:],
    statute_df.iloc[273:275,:],
    statute_df.iloc[279:282,:],
    statute_df.iloc[293:295,:],
    statute_df.iloc[314:316,:],
    statute_df.iloc[347:349,:],
    statute_df.iloc[368:378,:],
    statute_df.iloc[393:396,:],
    statute_df.iloc[399:407,:],
])
companies_act_df

Unnamed: 0,section,title,link,statute
9,7A,Solvency statement and offence for making fals...,https://sso.agc.gov.sg/Act/CoA1967#pr7A-,Companies Act
14,8D,"Destruction, mutilation, etc., of company docu...",https://sso.agc.gov.sg/Act/CoA1967#pr8D-,Companies Act
18,8H,Security of information,https://sso.agc.gov.sg/Act/CoA1967#pr8H-,Companies Act
42,26,General provisions as to alteration of constit...,https://sso.agc.gov.sg/Act/CoA1967#pr26-,Companies Act
50,32,Default in complying with requirements as to p...,https://sso.agc.gov.sg/Act/CoA1967#pr32-,Companies Act
...,...,...,...,...
402,404,Fraudulently inducing persons to invest money,https://sso.agc.gov.sg/Act/CoA1967#pr404-,Companies Act
403,405,Penalty for carrying on business without regis...,https://sso.agc.gov.sg/Act/CoA1967#pr405-,Companies Act
404,406,Frauds by officers,https://sso.agc.gov.sg/Act/CoA1967#pr406-,Companies Act
405,407,General penalty provisions,https://sso.agc.gov.sg/Act/CoA1967#pr407-,Companies Act
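
Positional selection depends on the row order of the scraped table, so the positions may need rechecking if sections are later inserted into or repealed from a statute. An alternative, sketched here under that assumption, is to select by section label instead. `select_sections` is a hypothetical helper that works on a list of dicts shaped like `results_list` in `get_statute_sections`; the sample rows reuse sections and titles shown in the Companies Act outputs.

```python
def select_sections(rows, wanted):
    """Return the rows whose 'section' label is in `wanted`, in table order,
    plus a sorted list of any wanted labels that were not found.

    Hypothetical helper for illustration; labels are compared as strings.
    """
    wanted = set(wanted)
    selected = [row for row in rows if row['section'] in wanted]
    missing = wanted - {row['section'] for row in selected}
    return selected, sorted(missing)


rows = [
    {'section': '1', 'title': 'Short title'},
    {'section': '404', 'title': 'Fraudulently inducing persons to invest money'},
    {'section': '406', 'title': 'Frauds by officers'},
    {'section': '407', 'title': 'General penalty provisions'},
]

# '999' is a deliberately absent label, to show how missing sections surface.
selected, missing = select_sections(rows, ['404', '406', '999'])
print([row['section'] for row in selected], missing)
```

With a dataframe, a comparable filter would be `statute_df[statute_df['section'].isin(wanted)]`. Reporting the labels that were not found makes it easier to notice when a statute has changed since the list was drawn up.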


### 3.1.10 Prevention of Corruption Act

In [36]:
statute_df = get_statute_sections(statutes['statute'].values[7])
statute_df.iloc[:50,:]

Prevention of Corruption Act
https://sso.agc.gov.sg/Act/PCA1960
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/PCA1960#pr1-,Prevention of Corruption Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/PCA1960#pr2-,Prevention of Corruption Act
2,3,Appointment of Director and officers,https://sso.agc.gov.sg/Act/PCA1960#pr3-,Prevention of Corruption Act
3,4,Director and officers deemed to be public serv...,https://sso.agc.gov.sg/Act/PCA1960#pr4-,Prevention of Corruption Act
4,4A,Establishment of Occupational Superannuation S...,https://sso.agc.gov.sg/Act/PCA1960#pr4A-,Prevention of Corruption Act
5,4B,"Benefits not as of right, etc.",https://sso.agc.gov.sg/Act/PCA1960#pr4B-,Prevention of Corruption Act
6,4C,"Non-assignability or attachment of benefits, etc.",https://sso.agc.gov.sg/Act/PCA1960#pr4C-,Prevention of Corruption Act
7,4D,Recovery of benefits granted in ignorance of d...,https://sso.agc.gov.sg/Act/PCA1960#pr4D-,Prevention of Corruption Act
8,4E,Effect of bankruptcy and conviction on Scheme ...,https://sso.agc.gov.sg/Act/PCA1960#pr4E-,Prevention of Corruption Act
9,4F,Scheme to be met out of INVEST Fund,https://sso.agc.gov.sg/Act/PCA1960#pr4F-,Prevention of Corruption Act


In [37]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
prevention_corruption_act_df = pd.concat([
    statute_df.iloc[10:13,:],
    statute_df.iloc[14:19,:],
    statute_df.iloc[[24,27,32],:],
    statute_df.iloc[34:39,:],
])
prevention_corruption_act_df = prevention_corruption_act_df.reset_index(drop=True)
prevention_corruption_act_df

Unnamed: 0,section,title,link,statute
0,5,Punishment for corruption,https://sso.agc.gov.sg/Act/PCA1960#pr5-,Prevention of Corruption Act
1,6,Punishment for corrupt transactions with agents,https://sso.agc.gov.sg/Act/PCA1960#pr6-,Prevention of Corruption Act
2,7,Increase of maximum penalty in certain cases,https://sso.agc.gov.sg/Act/PCA1960#pr7-,Prevention of Corruption Act
3,9,Acceptor of gratification to be guilty notwith...,https://sso.agc.gov.sg/Act/PCA1960#pr9-,Prevention of Corruption Act
4,10,Corruptly procuring withdrawal of tenders,https://sso.agc.gov.sg/Act/PCA1960#pr10-,Prevention of Corruption Act
5,11,Bribery of Member of Parliament,https://sso.agc.gov.sg/Act/PCA1960#pr11-,Prevention of Corruption Act
6,12,Bribery of member of public body,https://sso.agc.gov.sg/Act/PCA1960#pr12-,Prevention of Corruption Act
7,13,When penalty to be imposed in addition to othe...,https://sso.agc.gov.sg/Act/PCA1960#pr13-,Prevention of Corruption Act
8,18,Special powers of investigation,https://sso.agc.gov.sg/Act/PCA1960#pr18-,Prevention of Corruption Act
9,21,Public Prosecutor’s powers to obtain information,https://sso.agc.gov.sg/Act/PCA1960#pr21-,Prevention of Corruption Act


### 3.1.11 Customs Act

In [38]:
statute_df = get_statute_sections(statutes['statute'].values[8])
statute_df.iloc[:50,:]

Customs Act
https://sso.agc.gov.sg/Act/CA1960
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/CA1960#pr1-,Customs Act
1,2,Scope of Act,https://sso.agc.gov.sg/Act/CA1960#pr2-,Customs Act
2,3,Interpretation,https://sso.agc.gov.sg/Act/CA1960#pr3-,Customs Act
3,4,Appointment of Director-General and other offi...,https://sso.agc.gov.sg/Act/CA1960#pr4-,Customs Act
4,5,Powers of Director-General to delegate,https://sso.agc.gov.sg/Act/CA1960#pr5-,Customs Act
5,6,Officers of customs to be public servants,https://sso.agc.gov.sg/Act/CA1960#pr6-,Customs Act
6,7,Powers of police officers,https://sso.agc.gov.sg/Act/CA1960#pr7-,Customs Act
7,8,Authority card to be produced,https://sso.agc.gov.sg/Act/CA1960#pr8-,Customs Act
8,9,Persons employed on customs duty to be deemed ...,https://sso.agc.gov.sg/Act/CA1960#pr9-,Customs Act
9,10,Levying of duties,https://sso.agc.gov.sg/Act/CA1960#pr10-,Customs Act


In [39]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
customs_act_df = pd.concat([
    statute_df.iloc[[16,75,100,103],:],
    statute_df.iloc[97:99,:],
    statute_df.iloc[126:151,:],
])
customs_act_df = customs_act_df.reset_index(drop=True)
customs_act_df

Unnamed: 0,section,title,link,statute
0,17,"Tax on motor vehicles using heavy fuel oil, etc.",https://sso.agc.gov.sg/Act/CA1960#pr17-,Customs Act
1,82,Duty free shops for tourists,https://sso.agc.gov.sg/Act/CA1960#pr82-,Customs Act
2,104,Power to search vessels and aircraft,https://sso.agc.gov.sg/Act/CA1960#pr104-,Customs Act
3,107,Road barrier,https://sso.agc.gov.sg/Act/CA1960#pr107-,Customs Act
4,102,"Power of Magistrate, etc., to enter and search",https://sso.agc.gov.sg/Act/CA1960#pr102-,Customs Act
5,103,When search may be made without warrant,https://sso.agc.gov.sg/Act/CA1960#pr103-,Customs Act
6,128,Offences in relation to making and signing unt...,https://sso.agc.gov.sg/Act/CA1960#pr128-,Customs Act
7,128A,Offences in relation to falsifying documents,https://sso.agc.gov.sg/Act/CA1960#pr128A-,Customs Act
8,128B,Offences in relation to failure to make declar...,https://sso.agc.gov.sg/Act/CA1960#pr128B-,Customs Act
9,128C,Offences in relation to failure to produce tra...,https://sso.agc.gov.sg/Act/CA1960#pr128C-,Customs Act


### 3.1.12 Road Traffic Act

In [40]:
statute_df = get_statute_sections(statutes['statute'].values[9])
statute_df.iloc[:50,:]

Road Traffic Act
https://sso.agc.gov.sg/Act/RTA1961
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/RTA1961#pr1-,Road Traffic Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/RTA1961#pr2-,Road Traffic Act
2,3,Vehicles to which this Part applies,https://sso.agc.gov.sg/Act/RTA1961#pr3-,Road Traffic Act
3,4,Classification of motor vehicles,https://sso.agc.gov.sg/Act/RTA1961#pr4-,Road Traffic Act
4,5,Prohibition of vehicles not complying with rul...,https://sso.agc.gov.sg/Act/RTA1961#pr5-,Road Traffic Act
5,5A,"No riding of personal mobility devices, etc., ...",https://sso.agc.gov.sg/Act/RTA1961#pr5A-,Road Traffic Act
6,5B,"No riding of personal mobility device, etc., w...",https://sso.agc.gov.sg/Act/RTA1961#pr5B-,Road Traffic Act
7,6,Rules as to use and construction of vehicles,https://sso.agc.gov.sg/Act/RTA1961#pr6-,Road Traffic Act
8,6A,Alteration of fuel-measuring equipment,https://sso.agc.gov.sg/Act/RTA1961#pr6A-,Road Traffic Act
9,6B,Leaving Singapore in motor vehicle with altere...,https://sso.agc.gov.sg/Act/RTA1961#pr6B-,Road Traffic Act


In [41]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
road_traffic_act_df = pd.concat([
    statute_df.iloc[4:7,:],
    statute_df.iloc[8:10,:],
    statute_df.iloc[[12,16,26,31,47,57,66,70,73,75,83,91,121,137,149,153,173,175,182],:],
    statute_df.iloc[19:24,:],
    statute_df.iloc[35:37,:],
    statute_df.iloc[38:43,:],
    statute_df.iloc[52:54,:],
    statute_df.iloc[61:64,:],
    statute_df.iloc[80:82,:],
    statute_df.iloc[93:96,:],
    statute_df.iloc[97:100,:],
    statute_df.iloc[101:104,:],
    statute_df.iloc[108:110,:],
    statute_df.iloc[111:116,:],
    statute_df.iloc[117:120,:],
    statute_df.iloc[125:131,:],
    statute_df.iloc[132:134,:],
    statute_df.iloc[143:146,:],
    statute_df.iloc[156:158,:],
    statute_df.iloc[159:167,:],
    statute_df.iloc[168:172,:],
    statute_df.iloc[177:179,:],
    statute_df.iloc[193:195,:],
])
road_traffic_act_df = road_traffic_act_df.reset_index(drop=True)
road_traffic_act_df

Unnamed: 0,section,title,link,statute
0,5,Prohibition of vehicles not complying with rul...,https://sso.agc.gov.sg/Act/RTA1961#pr5-,Road Traffic Act
1,5A,"No riding of personal mobility devices, etc., ...",https://sso.agc.gov.sg/Act/RTA1961#pr5A-,Road Traffic Act
2,5B,"No riding of personal mobility device, etc., w...",https://sso.agc.gov.sg/Act/RTA1961#pr5B-,Road Traffic Act
3,6A,Alteration of fuel-measuring equipment,https://sso.agc.gov.sg/Act/RTA1961#pr6A-,Road Traffic Act
4,6B,Leaving Singapore in motor vehicle with altere...,https://sso.agc.gov.sg/Act/RTA1961#pr6B-,Road Traffic Act
...,...,...,...,...
86,127B,Powers of search on omnibuses and within bus i...,https://sso.agc.gov.sg/Act/RTA1961#pr127B-,Road Traffic Act
87,131B,"Offences by bodies corporate, etc.",https://sso.agc.gov.sg/Act/RTA1961#pr131B-,Road Traffic Act
88,132,Ticketing of prescribed offences,https://sso.agc.gov.sg/Act/RTA1961#pr132-,Road Traffic Act
89,143,Regulation of traffic in connection with event...,https://sso.agc.gov.sg/Act/RTA1961#pr143-,Road Traffic Act


### 3.1.13 Environmental Public Health Act

In [42]:
statute_df = get_statute_sections(statutes['statute'].values[10])
statute_df.iloc[:50,:]

Environmental Public Health Act
https://sso.agc.gov.sg/Act/EPHA1987
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/EPHA1987#pr1-,Environmental Public Health Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/EPHA1987#pr2-,Environmental Public Health Act
2,3,Appointment of Director-General and authorised...,https://sso.agc.gov.sg/Act/EPHA1987#pr3-,Environmental Public Health Act
3,4,Delegation of power by Director-General,https://sso.agc.gov.sg/Act/EPHA1987#pr4-,Environmental Public Health Act
4,5,"Director-General to cause public streets, etc....",https://sso.agc.gov.sg/Act/EPHA1987#pr5-,Environmental Public Health Act
5,6,Duty of owner and occupier to keep clean priva...,https://sso.agc.gov.sg/Act/EPHA1987#pr6-,Environmental Public Health Act
6,7,Dustbins in streets,https://sso.agc.gov.sg/Act/EPHA1987#pr7-,Environmental Public Health Act
7,8,Director-General may apply systems for collect...,https://sso.agc.gov.sg/Act/EPHA1987#pr8-,Environmental Public Health Act
8,9,"Removal of industrial waste, stable refuse, etc.",https://sso.agc.gov.sg/Act/EPHA1987#pr9-,Environmental Public Health Act
9,10,Director-General may require owner and occupie...,https://sso.agc.gov.sg/Act/EPHA1987#pr10-,Environmental Public Health Act


In [43]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
environmental_public_health_act_df = pd.concat([
    statute_df.iloc[11:13,:],
    statute_df.iloc[16:22,:],
    statute_df.iloc[[23,32,42,74,87,100,103,105,107,119,130,134,141,155],:],
    statute_df.iloc[27:29,:],
    statute_df.iloc[34:36,:],
    statute_df.iloc[45:48,:],
    statute_df.iloc[51:53,:],
    statute_df.iloc[59:70,:],
    statute_df.iloc[110:113,:],
    statute_df.iloc[126:128,:],
    statute_df.iloc[145:147,:],
])
environmental_public_health_act_df = environmental_public_health_act_df.reset_index(drop=True)
environmental_public_health_act_df

Unnamed: 0,section,title,link,statute
0,12,Occupier of house to remove refuse,https://sso.agc.gov.sg/Act/EPHA1987#pr12-,Environmental Public Health Act
1,13,Prohibition on use of nightsoil or human excre...,https://sso.agc.gov.sg/Act/EPHA1987#pr13-,Environmental Public Health Act
2,17,"Prohibition against throwing refuse, etc., in ...",https://sso.agc.gov.sg/Act/EPHA1987#pr17-,Environmental Public Health Act
3,18,"Building works constituting danger to life, he...",https://sso.agc.gov.sg/Act/EPHA1987#pr18-,Environmental Public Health Act
4,19,"Prohibition against dropping, scattering, etc....",https://sso.agc.gov.sg/Act/EPHA1987#pr19-,Environmental Public Health Act
5,20,Prohibition against dumping and disposing,https://sso.agc.gov.sg/Act/EPHA1987#pr20-,Environmental Public Health Act
6,21,Notice to attend Court,https://sso.agc.gov.sg/Act/EPHA1987#pr21-,Environmental Public Health Act
7,21A,Corrective work order,https://sso.agc.gov.sg/Act/EPHA1987#pr21A-,Environmental Public Health Act
8,21C,Breach of corrective work order,https://sso.agc.gov.sg/Act/EPHA1987#pr21C-,Environmental Public Health Act
9,29,Dangerous substance or toxic industrial waste ...,https://sso.agc.gov.sg/Act/EPHA1987#pr29-,Environmental Public Health Act


### 3.1.14 Computer Misuse Act

In [44]:
statute_df = get_statute_sections(statutes['statute'].values[11])
statute_df.iloc[:50,:]

Computer Misuse Act
https://sso.agc.gov.sg/Act/CMA1993
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/CMA1993#pr1-,Computer Misuse Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/CMA1993#pr2-,Computer Misuse Act
2,3,Unauthorised access to computer material,https://sso.agc.gov.sg/Act/CMA1993#pr3-,Computer Misuse Act
3,4,Access with intent to commit or facilitate com...,https://sso.agc.gov.sg/Act/CMA1993#pr4-,Computer Misuse Act
4,5,Unauthorised modification of computer material,https://sso.agc.gov.sg/Act/CMA1993#pr5-,Computer Misuse Act
5,6,Unauthorised use or interception of computer s...,https://sso.agc.gov.sg/Act/CMA1993#pr6-,Computer Misuse Act
6,7,Unauthorised obstruction of use of computer,https://sso.agc.gov.sg/Act/CMA1993#pr7-,Computer Misuse Act
7,8,Unauthorised disclosure of access code,https://sso.agc.gov.sg/Act/CMA1993#pr8-,Computer Misuse Act
8,8A,"Supplying, etc., personal information obtained...",https://sso.agc.gov.sg/Act/CMA1993#pr8A-,Computer Misuse Act
9,8B,"Obtaining, etc., items for use in certain offe...",https://sso.agc.gov.sg/Act/CMA1993#pr8B-,Computer Misuse Act


In [45]:
computer_misuse_act_df = statute_df.iloc[2:12,:]
computer_misuse_act_df = computer_misuse_act_df.reset_index(drop=True)
computer_misuse_act_df

Unnamed: 0,section,title,link,statute
0,3,Unauthorised access to computer material,https://sso.agc.gov.sg/Act/CMA1993#pr3-,Computer Misuse Act
1,4,Access with intent to commit or facilitate com...,https://sso.agc.gov.sg/Act/CMA1993#pr4-,Computer Misuse Act
2,5,Unauthorised modification of computer material,https://sso.agc.gov.sg/Act/CMA1993#pr5-,Computer Misuse Act
3,6,Unauthorised use or interception of computer s...,https://sso.agc.gov.sg/Act/CMA1993#pr6-,Computer Misuse Act
4,7,Unauthorised obstruction of use of computer,https://sso.agc.gov.sg/Act/CMA1993#pr7-,Computer Misuse Act
5,8,Unauthorised disclosure of access code,https://sso.agc.gov.sg/Act/CMA1993#pr8-,Computer Misuse Act
6,8A,"Supplying, etc., personal information obtained...",https://sso.agc.gov.sg/Act/CMA1993#pr8A-,Computer Misuse Act
7,8B,"Obtaining, etc., items for use in certain offe...",https://sso.agc.gov.sg/Act/CMA1993#pr8B-,Computer Misuse Act
8,9,Enhanced punishment for offences involving pro...,https://sso.agc.gov.sg/Act/CMA1993#pr9-,Computer Misuse Act
9,10,Abetments and attempts punishable as offences,https://sso.agc.gov.sg/Act/CMA1993#pr10-,Computer Misuse Act


### 3.1.15 Employment Agencies Act

In [46]:
statute_df = get_statute_sections(statutes['statute'].values[13])
statute_df.iloc[:50,:]

Employment Agencies Act
https://sso.agc.gov.sg/Act/EAA1958
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/EAA1958#pr1-,Employment Agencies Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/EAA1958#pr2-,Employment Agencies Act
2,3,Appointment of officers,https://sso.agc.gov.sg/Act/EAA1958#pr3-,Employment Agencies Act
3,4,Application,https://sso.agc.gov.sg/Act/EAA1958#pr4-,Employment Agencies Act
4,5,Other laws not affected,https://sso.agc.gov.sg/Act/EAA1958#pr5-,Employment Agencies Act
5,6,Requirement for licence,https://sso.agc.gov.sg/Act/EAA1958#pr6-,Employment Agencies Act
6,7,Application for licence,https://sso.agc.gov.sg/Act/EAA1958#pr7-,Employment Agencies Act
7,8,Security,https://sso.agc.gov.sg/Act/EAA1958#pr8-,Employment Agencies Act
8,10,Period of validity of licence,https://sso.agc.gov.sg/Act/EAA1958#pr10-,Employment Agencies Act
9,11,Suspension or revocation of licence,https://sso.agc.gov.sg/Act/EAA1958#pr11-,Employment Agencies Act


In [47]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; row order is unchanged
employment_agencies_act_df = pd.concat([
    statute_df.iloc[[5,8,15],:],
    statute_df.iloc[10:13,:],
    statute_df.iloc[28:33,:],
    statute_df.iloc[38:40,:],
]).reset_index(drop=True)
employment_agencies_act_df

Unnamed: 0,section,title,link,statute
0,6,Requirement for licence,https://sso.agc.gov.sg/Act/EAA1958#pr6-,Employment Agencies Act
1,10,Period of validity of licence,https://sso.agc.gov.sg/Act/EAA1958#pr10-,Employment Agencies Act
2,15,"Offer of fees, etc., prohibited",https://sso.agc.gov.sg/Act/EAA1958#pr15-,Employment Agencies Act
3,12,Effect of suspension or revocation of licence,https://sso.agc.gov.sg/Act/EAA1958#pr12-,Employment Agencies Act
4,12A,Registration of employment agency personnel,https://sso.agc.gov.sg/Act/EAA1958#pr12A-,Employment Agencies Act
5,12B,Registration cards,https://sso.agc.gov.sg/Act/EAA1958#pr12B-,Employment Agencies Act
6,22,Furnishing false information,https://sso.agc.gov.sg/Act/EAA1958#pr22-,Employment Agencies Act
7,22A,Offence for persons to engage unlicensed persons,https://sso.agc.gov.sg/Act/EAA1958#pr22A-,Employment Agencies Act
8,22B,Offence for licensed employment agencies to ma...,https://sso.agc.gov.sg/Act/EAA1958#pr22B-,Employment Agencies Act
9,22C,Disqualification of key appointment holders or...,https://sso.agc.gov.sg/Act/EAA1958#pr22C-,Employment Agencies Act
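
The mix of individual row positions and row ranges selected above can also be written as a single `iloc` call. The following is a minimal sketch, assuming NumPy is available; `toy_df` and `positions` are hypothetical names, and the toy table simply stands in for the scraped `statute_df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the scraped table: 40 rows whose "section" is just the row position
toy_df = pd.DataFrame({"section": [str(i) for i in range(40)]})

# np.r_ joins individual positions and slices into one index array,
# keeping them in the order in which they are written
positions = np.r_[[5, 8, 15], 10:13, 28:33, 38:40]

# One positional selection instead of several chained appends
selected = toy_df.iloc[positions].reset_index(drop=True)
print(selected["section"].tolist())
```

Because the positions are kept in the order written, the resulting rows follow the same order as the chained version.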


### 3.1.16 Employment of Foreign Manpower Act

In [48]:
statute_df = get_statute_sections(statutes['statute'].values[14])
statute_df.iloc[:51,:]

Employment of Foreign Manpower Act
https://sso.agc.gov.sg/Act/EFMA1990
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/EFMA1990#pr1-,Employment of Foreign Manpower Act
1,2,Interpretation,https://sso.agc.gov.sg/Act/EFMA1990#pr2-,Employment of Foreign Manpower Act
2,2A,"Meaning of ""personal identifier""",https://sso.agc.gov.sg/Act/EFMA1990#pr2A-,Employment of Foreign Manpower Act
3,3,Appointment of Controller of Work Passes and e...,https://sso.agc.gov.sg/Act/EFMA1990#pr3-,Employment of Foreign Manpower Act
4,3A,Controller and employment inspectors to be pub...,https://sso.agc.gov.sg/Act/EFMA1990#pr3A-,Employment of Foreign Manpower Act
5,4,Exemption,https://sso.agc.gov.sg/Act/EFMA1990#pr4-,Employment of Foreign Manpower Act
6,5,Prohibition of employment of foreign employee ...,https://sso.agc.gov.sg/Act/EFMA1990#pr5-,Employment of Foreign Manpower Act
7,6,Presumption of employment,https://sso.agc.gov.sg/Act/EFMA1990#pr6-,Employment of Foreign Manpower Act
8,6A,Prohibition of foreigner without work pass ent...,https://sso.agc.gov.sg/Act/EFMA1990#pr6A-,Employment of Foreign Manpower Act
9,7,Application for work pass,https://sso.agc.gov.sg/Act/EFMA1990#pr7-,Employment of Foreign Manpower Act


In [49]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; row order is unchanged
employment_foreign_manpower_act_df = pd.concat([
    statute_df.iloc[[6,8,22,39],:],
    statute_df.iloc[12:14,:],
    statute_df.iloc[31:35,:],
]).reset_index(drop=True)
employment_foreign_manpower_act_df

Unnamed: 0,section,title,link,statute
0,5,Prohibition of employment of foreign employee ...,https://sso.agc.gov.sg/Act/EFMA1990#pr5-,Employment of Foreign Manpower Act
1,6A,Prohibition of foreigner without work pass ent...,https://sso.agc.gov.sg/Act/EFMA1990#pr6A-,Employment of Foreign Manpower Act
2,20,"Offences by bodies corporate, etc.",https://sso.agc.gov.sg/Act/EFMA1990#pr20-,Employment of Foreign Manpower Act
3,25B,Directions,https://sso.agc.gov.sg/Act/EFMA1990#pr25B-,Employment of Foreign Manpower Act
4,10,Self-employed foreigners to apply for work passes,https://sso.agc.gov.sg/Act/EFMA1990#pr10-,Employment of Foreign Manpower Act
5,11,Levy in respect of foreign employee or self-em...,https://sso.agc.gov.sg/Act/EFMA1990#pr11-,Employment of Foreign Manpower Act
6,22,General offences,https://sso.agc.gov.sg/Act/EFMA1990#pr22-,Employment of Foreign Manpower Act
7,22A,"Restrictions on receipt, etc., of moneys in co...",https://sso.agc.gov.sg/Act/EFMA1990#pr22A-,Employment of Foreign Manpower Act
8,22B,Proscribed manpower-related practices,https://sso.agc.gov.sg/Act/EFMA1990#pr22B-,Employment of Foreign Manpower Act
9,23,Abetment of offences,https://sso.agc.gov.sg/Act/EFMA1990#pr23-,Employment of Foreign Manpower Act


### 3.1.17 Workplace Safety and Health Act

In [50]:
statute_df = get_statute_sections(statutes['statute'].values[15])
statute_df.iloc[:50,:]

Workplace Safety and Health Act
https://sso.agc.gov.sg/Act/WSHA2006
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/WSHA2006#pr1-,Workplace Safety and Health Act
1,2,Application of Act,https://sso.agc.gov.sg/Act/WSHA2006#pr2-,Workplace Safety and Health Act
2,3,Application of Act to Government,https://sso.agc.gov.sg/Act/WSHA2006#pr3-,Workplace Safety and Health Act
3,4,General interpretation,https://sso.agc.gov.sg/Act/WSHA2006#pr4-,Workplace Safety and Health Act
4,5,"Meanings of ""workplace"" and ""factory""",https://sso.agc.gov.sg/Act/WSHA2006#pr5-,Workplace Safety and Health Act
5,6,"Meanings of ""employee"" and ""employer""",https://sso.agc.gov.sg/Act/WSHA2006#pr6-,Workplace Safety and Health Act
6,7,Appointment of Commissioner for Workplace Safe...,https://sso.agc.gov.sg/Act/WSHA2006#pr7-,Workplace Safety and Health Act
7,8,"Commissioner, Deputy Commissioners, inspectors...",https://sso.agc.gov.sg/Act/WSHA2006#pr8-,Workplace Safety and Health Act
8,9,Identification of inspectors and authorised of...,https://sso.agc.gov.sg/Act/WSHA2006#pr9-,Workplace Safety and Health Act
9,10,Duties according to different capacities,https://sso.agc.gov.sg/Act/WSHA2006#pr10-,Workplace Safety and Health Act


In [51]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; row order is unchanged
workplace_safety_health_act_df = pd.concat([
    statute_df.iloc[9:22,:],
    statute_df.iloc[[23,70],:],
    statute_df.iloc[25:27,:],
    statute_df.iloc[33:35,:],
    statute_df.iloc[38:41,:],
    statute_df.iloc[45:48,:],
    statute_df.iloc[49:58,:],
]).reset_index(drop=True)
workplace_safety_health_act_df

Unnamed: 0,section,title,link,statute
0,10,Duties according to different capacities,https://sso.agc.gov.sg/Act/WSHA2006#pr10-,Workplace Safety and Health Act
1,11,Duty of occupier of workplace,https://sso.agc.gov.sg/Act/WSHA2006#pr11-,Workplace Safety and Health Act
2,12,Duties of employers,https://sso.agc.gov.sg/Act/WSHA2006#pr12-,Workplace Safety and Health Act
3,13,Duties of self-employed persons,https://sso.agc.gov.sg/Act/WSHA2006#pr13-,Workplace Safety and Health Act
4,14,Duties of principals,https://sso.agc.gov.sg/Act/WSHA2006#pr14-,Workplace Safety and Health Act
5,14A,Additional duties of principals in relation to...,https://sso.agc.gov.sg/Act/WSHA2006#pr14A-,Workplace Safety and Health Act
6,15,Duties of persons at work,https://sso.agc.gov.sg/Act/WSHA2006#pr15-,Workplace Safety and Health Act
7,16,Duties of manufacturers and suppliers of machi...,https://sso.agc.gov.sg/Act/WSHA2006#pr16-,Workplace Safety and Health Act
8,17,"Duties of persons who erect, install or modify...",https://sso.agc.gov.sg/Act/WSHA2006#pr17-,Workplace Safety and Health Act
9,18,Other related duties of occupiers and employers,https://sso.agc.gov.sg/Act/WSHA2006#pr18-,Workplace Safety and Health Act


### 3.1.18 Work Injury Compensation Act

In [52]:
statute_df = get_statute_sections(statutes['statute'].values[16])
statute_df.iloc[:50,:]

Work Injury Compensation Act
https://sso.agc.gov.sg/Act/WICA2019
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title and commencement,https://sso.agc.gov.sg/Act/WICA2019#pr1-,Work Injury Compensation Act
1,2,General interpretation,https://sso.agc.gov.sg/Act/WICA2019#pr2-,Work Injury Compensation Act
2,3,"Meanings of ""employee"" and ""employer""",https://sso.agc.gov.sg/Act/WICA2019#pr3-,Work Injury Compensation Act
3,4,Meaning of incapacity,https://sso.agc.gov.sg/Act/WICA2019#pr4-,Work Injury Compensation Act
4,5,Purpose of Act,https://sso.agc.gov.sg/Act/WICA2019#pr5-,Work Injury Compensation Act
5,6,"Assistant Commissioners, investigation officer...",https://sso.agc.gov.sg/Act/WICA2019#pr6-,Work Injury Compensation Act
6,7,Employer’s liability to compensate for work in...,https://sso.agc.gov.sg/Act/WICA2019#pr7-,Work Injury Compensation Act
7,8,Certain accidents deemed to be in course of em...,https://sso.agc.gov.sg/Act/WICA2019#pr8-,Work Injury Compensation Act
8,9,Accidents outside Singapore,https://sso.agc.gov.sg/Act/WICA2019#pr9-,Work Injury Compensation Act
9,10,Employer’s liability to compensate for diseases,https://sso.agc.gov.sg/Act/WICA2019#pr10-,Work Injury Compensation Act


In [53]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; row order is unchanged
work_injury_compensation_act_df = pd.concat([
    statute_df.iloc[[24,29,34,49,81],:],
    statute_df.iloc[60:62,:],
    statute_df.iloc[71:73,:],
]).reset_index(drop=True)
work_injury_compensation_act_df

Unnamed: 0,section,title,link,statute
0,25,Offences by employer in relation to insurance,https://sso.agc.gov.sg/Act/WICA2019#pr25-,Work Injury Compensation Act
1,30,Offences relating to provision of insurance,https://sso.agc.gov.sg/Act/WICA2019#pr30-,Work Injury Compensation Act
2,35,Deemed claim when employer has notice of accident,https://sso.agc.gov.sg/Act/WICA2019#pr35-,Work Injury Compensation Act
3,50,Directions by Commissioner,https://sso.agc.gov.sg/Act/WICA2019#pr50-,Work Injury Compensation Act
4,82,Regulations,https://sso.agc.gov.sg/Act/WICA2019#pr82-,Work Injury Compensation Act
5,61,Offence for failing to pay or deposit compensa...,https://sso.agc.gov.sg/Act/WICA2019#pr61-,Work Injury Compensation Act
6,62,Offence of false or misleading information to ...,https://sso.agc.gov.sg/Act/WICA2019#pr62-,Work Injury Compensation Act
7,72,Offences by corporations,https://sso.agc.gov.sg/Act/WICA2019#pr72-,Work Injury Compensation Act
8,73,Offences by unincorporated associations or par...,https://sso.agc.gov.sg/Act/WICA2019#pr73-,Work Injury Compensation Act


### 3.1.19 Corruption, Drug Trafficking and Other Serious Crimes (Confiscation of Benefits) Act

In [54]:
statute_df = get_statute_sections(statutes['statute'].values[17])
statute_df.iloc[:50,:]

Corruption, Drug Trafficking and Other Serious Crimes (Confiscation of Benefits) Act
https://sso.agc.gov.sg/Act/CDTOSCCBA1992
response: 200
no errors, task completed!


Unnamed: 0,section,title,link,statute
0,1,Short title,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr1-,"Corruption, Drug Trafficking and Other Serious..."
1,2,Interpretation,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr2-,"Corruption, Drug Trafficking and Other Serious..."
2,2A,"Meaning of ""item subject to legal privilege""",https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr2A-,"Corruption, Drug Trafficking and Other Serious..."
3,3,Application,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr3-,"Corruption, Drug Trafficking and Other Serious..."
4,3A,Suspicious Transaction Reporting Office,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr3A-,"Corruption, Drug Trafficking and Other Serious..."
5,4,Confiscation orders,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr4-,"Corruption, Drug Trafficking and Other Serious..."
6,5,Confiscation orders for benefits derived from ...,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr5-,"Corruption, Drug Trafficking and Other Serious..."
7,5A,Confiscation order unaffected by confiscation ...,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr5A-,"Corruption, Drug Trafficking and Other Serious..."
8,6,Live video or live television links,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr6-,"Corruption, Drug Trafficking and Other Serious..."
9,7,Assessing benefits of drug dealing,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr7-,"Corruption, Drug Trafficking and Other Serious..."


In [55]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; row order is unchanged
corruption_drug_serious_crimes_act_df = pd.concat([
    statute_df.iloc[38:40,:],
    statute_df.iloc[42:45,:],
    statute_df.iloc[49:55,:],
    statute_df.iloc[[56,59,76,79,84],:],
    statute_df.iloc[61:63,:],
    statute_df.iloc[66:70,:],
]).reset_index(drop=True)
corruption_drug_serious_crimes_act_df

Unnamed: 0,section,title,link,statute
0,33,Failure to comply with production order,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr33-,"Corruption, Drug Trafficking and Other Serious..."
1,34,Authority for search,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr34-,"Corruption, Drug Trafficking and Other Serious..."
2,37,Retention of records by financial institutions,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr37-,"Corruption, Drug Trafficking and Other Serious..."
3,38,Register of original documents,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr38-,"Corruption, Drug Trafficking and Other Serious..."
4,39,Duty to disclose knowledge or suspicion,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr39-,"Corruption, Drug Trafficking and Other Serious..."
5,43,Assisting another to retain benefits of drug d...,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr43-,"Corruption, Drug Trafficking and Other Serious..."
6,44,Assisting another to retain benefits from crim...,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr44-,"Corruption, Drug Trafficking and Other Serious..."
7,45,Restriction on revealing disclosure under sect...,https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr45-,"Corruption, Drug Trafficking and Other Serious..."
8,46,"Acquiring, possessing, using, concealing or tr...",https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr46-,"Corruption, Drug Trafficking and Other Serious..."
9,47,"Acquiring, possessing, using, concealing or tr...",https://sso.agc.gov.sg/Act/CDTOSCCBA1992#pr47-,"Corruption, Drug Trafficking and Other Serious..."


## 3.2 Combining the database of crimes  

Now that I have a dataframe for each statute, I will combine them into a single database of offences, which the NLP step will use as its reference list.

In [56]:
dataframes = [
    penal_code_df, misuse_drugs_act_df, sedition_act_df, vandalism_act_df,
    children_young_persons_act_df, immigration_act_df, income_tax_act_df,
    companies_act_df, prevention_corruption_act_df, customs_act_df,
    road_traffic_act_df, environmental_public_health_act_df, computer_misuse_act_df,
    employment_agencies_act_df, employment_foreign_manpower_act_df,
    workplace_safety_health_act_df, work_injury_compensation_act_df,
    corruption_drug_serious_crimes_act_df,
]

In [57]:
statutes_crimes = pd.concat(dataframes).reset_index(drop=True)

To let the Natural Language Processing step identify each offence unambiguously, I will combine the section number and the statute name into a new column, `section_statute`. This identifier is needed because sections from different statutes can share the same number.

In [60]:
statutes_crimes['section_statute'] = statutes_crimes['section'] + " " + statutes_crimes['statute']
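
As a small illustration of why the combined identifier is needed, the sketch below uses a hypothetical three-row table `toy` (not the real `statutes_crimes` data). The `astype(str)` call is an added assumption to guard against section numbers that were loaded as integers.

```python
import pandas as pd

# Hypothetical sample: section 3 exists in two different statutes
toy = pd.DataFrame({
    "section": ["3", "3", "8A"],
    "statute": ["Computer Misuse Act", "Employment Agencies Act", "Computer Misuse Act"],
})

# Section numbers on their own are ambiguous
has_duplicate_sections = toy["section"].duplicated().any()

# Section plus statute name yields one distinct identifier per row
toy["section_statute"] = toy["section"].astype(str) + " " + toy["statute"]
identifiers_unique = toy["section_statute"].is_unique

print(toy["section_statute"].tolist())
```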

In [61]:
statutes_crimes.to_csv('../data/statutes_crimes.csv', index=False)

## Observations  

Webscraping to archive the .html files of the judgments was less complicated than I had first imagined. Identifying the structure of the HTML, iterating through each listing page to save the cases and URLs, and then iterating through each URL to save the HTML file was an enjoyable exercise in logical thinking.

However, having filtered the sections from the statutes manually, it has become all the more apparent how useful machine learning could be in the legal industry.  

It took me roughly 6 hours over 3 days to manually filter the sections which contained offences out of the 18 statutes and add them into dataframes.  

It would be interesting, and worthy of a project in itself, to explore using NLP to identify the sections that prescribe criminal offences and the sections that contain their corresponding punishments, in order to create a comprehensive database of criminal offences in the statutes of Singapore.  
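
As a very rough starting point for such a project, a keyword filter on section titles could serve as a baseline before any real NLP is applied. The sketch below is only an assumption-laden illustration: the keyword list is hypothetical, and the titles are a handful taken from the tables above.

```python
import pandas as pd

# A few section titles taken from the statutes scraped above
titles = pd.DataFrame({"title": [
    "Short title",
    "Interpretation",
    "Unauthorised access to computer material",
    "Offences by corporations",
    "Furnishing false information",
    "Duties of employers",
]})

# Hypothetical keywords hinting that a section creates an offence
offence_hints = r"offence|unauthorised|prohibit|false"

# Case-insensitive regex match on the title text only
titles["flagged"] = titles["title"].str.contains(offence_hints, case=False, regex=True)
print(titles)
```

A title-only filter like this misses sections such as "Duties of employers", which were selected manually above, so the body text of each section would likely need to be analysed as well.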

The webscraping portion of the code achieved a 100% retrieval and archival rate: no judgments were missed.

## References

[1] ALL-SIS Task Force on Identifying Skills and Knowledge for Legal Practice, *'A Study of Attorneys' Legal Research Practices and Opinions of New Associates' Research Skills'*, June 2013. [Online]. Available: [https://www.aallnet.org/allsis/wp-content/uploads/sites/4/2018/01/final_report_07102013.pdf](https://www.aallnet.org/allsis/wp-content/uploads/sites/4/2018/01/final_report_07102013.pdf) [Accessed: May 6, 2021].

[2] LawNet, a service provided by the Singapore Academy of Law, *'Latest Singapore Judgments - Supreme Court Judgments of the last 3 months'*, 2021. [Online]. Available: [https://www.lawnet.sg/lawnet/web/lawnet/free-resources?p_p_id=freeresources_WAR_lawnet3baseportlet&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_pos=2&p_p_col_count=3&_freeresources_WAR_lawnet3baseportlet_action=supreme&_freeresources_WAR_lawnet3baseportlet_page=1](https://www.lawnet.sg/lawnet/web/lawnet/free-resources?p_p_id=freeresources_WAR_lawnet3baseportlet&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_pos=2&p_p_col_count=3&_freeresources_WAR_lawnet3baseportlet_action=supreme&_freeresources_WAR_lawnet3baseportlet_page=1) [Accessed: May 28, 2021].

[3] Gloria James-Civetta & Co, Advocates & Solicitors, *'Crimes Punishment'*, 2020. [Online]. Available: [https://www.singaporecriminallawyer.com/crimes-punishment/](https://www.singaporecriminallawyer.com/crimes-punishment/) [Accessed: May 8, 2021].