# Capstone Project: Criminal Case Database

### Overall Contents:
- Background
- Webscraping Lawnet
- Webscraping Singapore Statutes
- Natural Language Processing
- Search Function
- [Flask and Google App Engine](6.-Flask-and-Google-App-Engine) **(In this notebook)**
- Conclusion and Recommendation

## Datasets

For the search function, I will use the database which I created previously. 

The dataset that I will use is as follows:

* database.csv


## 6. Flask and Google App Engine 

Once again, the goal of the project is to create a database where users can search for and be shown judgments based on different criteria such as statutes, crimes, or case names.  

Thus, I decided to implement a web app where other users can test the search functions.  

There are many ways to put Python code on the web, but I decided to use `Flask` and `Google App Engine (GAE)` for the deployment of my code.

### 6.1 Flask setup  

In order to use Flask, I created a new folder for my project, `app`, with the following file structure:  

app/  
├── static/  
│   └── images/  
│       ├── ______  
│       └── ______     
├── data/  
│   ├── criminalcasedatabase-e080a527a0e8.json  
│   └── database.csv  
├── templates/  
│   ├── form.html  
│   ├── error.html  
│   └── results.html  
├── app.yaml  
├── main.py  
└── requirements.txt  

The `main.py` file contains the code for the Flask app, while the `templates` folder holds the HTML for the index page (`form.html`), the output page (`results.html`), and the custom error page. The `database.csv` file contains the database used for the search function, while the rest of the files are required for the Google App Engine deployment. 

### 6.1.1 `main.py`

#### 6.1.1.1 Imports and initialization

For the code to run through Flask, it has to be kept in `main.py`.  

First, the file imports the required modules.  

<details> 
    <summary> <b> Click here for imports and settings </b></summary>
    
```python
# Imports
from flask import Flask, render_template, send_file, make_response, url_for, Response
from flask import request, escape  # `request` and `escape` are used by the routes below
from google.cloud import storage
import pandas as pd
import re
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import io
import requests
pd.set_option('display.max_colwidth', 300)
plt.ioff()
        
```  
</details>  

Next, the flask app has to be initialized:  

```python
# Initialize flask
app = Flask(__name__)
```

The `Home` page has to be created next:  

```python
# Route 1: Home
@app.route("/")
def index():
    # Render the search form as the landing page
    return render_template('form.html')
```
This returns a home page which displays `form.html`.  


#### 6.1.1.2 `form.html`  

The home page, `form.html`, contains simple HTML which displays the title of the project and a short summary.  
It also includes a search box and button, with instructions on how to use the search.  

![form.png](../images/form.png)  
Sample `form.html` page

<details> 
    <summary> <b> Click here for `form.html` </b></summary>
    
```html
<!DOCTYPE html>
<html>
<head>
	<title>Criminal Case Database</title>
</head>
<body>
  <h1>Criminal Case Database alpha test</h1>
  <p>This is a proof of concept for my GA DSI21 capstone project. </p>
  <p>This search box should help to provide some summary statistics for recent judgments in the Singapore Courts!</p>
  <p>Try an example search from the following (case-insensitive):</p>
  <ul>
    <li>Section 33 Criminal Procedure Code</li>
    <li>Misuse of Drugs Act</li>
    <li>forgery</li>
    <li>Tang Keng Lai v Public Prosecutor</li>
  </ul>

	<form action="/submit">
		<p>
			<label for="input_string">Case Name / Offence / Statute</label><br>
			<input type="text" id="input_string" name="input_string">
		</p>
		<p><button type="submit">Search</button></p>
	</form>
  <p> *Note that not all searches may yield results as the database is still small. </p>
  <p> Further, the search format should match the examples above.</p>
</body>
</html>

        
```  
</details>  


#### 6.1.1.3 Define functions

Next, the main search functions from the previous notebook (Search Function) are defined in `main.py`:  
* `classify_search`  
* `search_search`  

However, the rest of the functions have now been split up to return the outputs individually:  
* `aggravating`  

<details> 
    <summary> <b> Click here for `aggravating` function code </b></summary>
    
```python
def aggravating(input_string):
    """
    Input: `input_string` as dtype string.
    Output: `results1` as dtype string containing `aggravated_rate` for this search
    """
    results = search_search(input_string)
    results = results.reset_index(drop=True)
    aggravated_rate = results.aggravation_discussed.mean()
    results1 = f'Aggravating factors were discussed in {round(aggravated_rate*100,1)}% of the cases for this search.'
    return results1
        
```  
</details>  

* `mitigating`  


<details> 
    <summary> <b> Click here for `mitigating` function code </b></summary>
    
```python
def mitigating(input_string):
    """
    Input: `input_string` as dtype string.
    Output: `results1` as dtype string containing `mitigation_rate` for this search
    """
    results = search_search(input_string)
    results = results.reset_index(drop=True)
    mitigation_rate = results.mitigation_discussed.mean()
    results1 = f'Mitigating factors were discussed in {round(mitigation_rate*100,1)}% of the cases for this search.'
    return results1
        
```  
</details>  
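
Both `aggravating` and `mitigating` follow the same pattern: take the mean of a boolean column and format it as a sentence. If a search returns no rows, a pandas `.mean()` typically yields `NaN`, which would be shown as `nan%`. The following is a minimal sketch of the same calculation in plain Python with a guard for that case. The helper names `discussion_rate` and `summarise` are hypothetical and are not part of the original `main.py`.

```python
from typing import Iterable, Optional


def discussion_rate(flags: Iterable[bool]) -> Optional[float]:
    """Return the percentage of True flags, or None when there are no rows."""
    flags = list(flags)
    if not flags:
        return None
    return round(100 * sum(flags) / len(flags), 1)


def summarise(label: str, flags: Iterable[bool]) -> str:
    """Build the sentence shown on the results page for one factor type."""
    rate = discussion_rate(flags)
    if rate is None:
        # No matching cases: avoid printing a meaningless "nan%"
        return f"No cases matched this search, so no {label.lower()} rate is available."
    return f"{label} factors were discussed in {rate}% of the cases for this search."


# Made-up flags, standing in for the `aggravation_discussed` column
print(summarise("Aggravating", [True, False, True, True]))
print(summarise("Mitigating", []))
```

In the app itself, the list of flags would come from the relevant column of the filtered dataframe, for example `results.aggravation_discussed.tolist()`.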

* `create_figure`, which contains the plotting code, modified so that the figure can be generated and served on GAE

<details> 
    <summary> <b> Click here for `create_figure` function code </b></summary>
    
```python
def create_figure(input_string):
    """
    Input: `input_string` as dtype string.
    Output: `fig` as plot of top 10 citations for this search
    """
    # Takes the `input_string` and performs a search, returning a filtered dataframe
    results = search_search(input_string)
    results = results.reset_index(drop=True)
    
    # Create citations dataframe
    citations = pd.DataFrame(results['citations'])
    
    # Split the values of the `citations` column
    citations['citations'] = citations['citations'].apply(lambda x: x.split(','))
    
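    # Dummify the split results, then sum the dummies back to one row per case.
    # (`groupby(level=0).sum()` replaces `.sum(level=0)`, which was removed in pandas 2.0.)
    citations2 = pd.get_dummies(citations['citations'].apply(pd.Series).stack()).groupby(level=0).sum()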
    
    # Create a fig for the plot
    fig, ax = plt.subplots(figsize = (10,8))
    
    # Set colour of the plot background
    fig.patch.set_facecolor('#E8E5DA')

    # Set the x and y values of the plot
    x = citations2.sum().sort_values(ascending=False).head(10)[::-1].index
    y = citations2.sum().sort_values(ascending=False).head(10)[::-1]
    
    # Plot a horizontal bar plot of the data
    ax.barh(x, y, color = "#304C89")
    
    # Set plot title
    plt.title(f'Top citations for {input_string}', size = 15)
    
    # Set plot x_ticks size
    plt.xticks(rotation = 0, size = 12)
    
    # Set plot layout to tight
    plt.tight_layout()

    # Return the plot as output
    return fig
        
```  
</details>  
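
The split-and-count step inside `create_figure` can also be expressed without dummy columns. The sketch below is one possible alternative using `collections.Counter`, not the method used in the app. The helper name `top_citations` and the sample rows are made up for illustration, and stripping whitespace around each citation is an assumption about how the `citations` column is formatted.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_citations(citation_cells: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count every citation across all rows and return the n most common."""
    counts: Counter = Counter()
    for cell in citation_cells:
        # Each cell holds one row's citations as a comma-separated string
        counts.update(name.strip() for name in cell.split(",") if name.strip())
    return counts.most_common(n)


# Made-up sample rows standing in for results['citations']
rows = ["A v B, C v D", "A v B", "C v D, E v F, A v B"]
for name, count in top_citations(rows, n=2):
    print(name, count)
```

The resulting names and counts could then be reversed and passed to `ax.barh` in the same way as `x` and `y` above.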



#### 6.1.1.4 Second route for search output

The second route I created for the flask app is the `/submit` page, which returns the results of the search.  

```python
@app.route('/submit')
def submission():
    """
    Input: `input_string` as dtype string which is given from the search box in `form.html`.
    Output: `results.html` with the information from the searches `mitigation rate` as `results2`, `aggravated rate` as `results1`, `search_results` as `results`
    """
    # Load in the form data from the incoming request
    user_input = request.args
    
    # Manipulate data into a format that we pass to our model
    data = str(escape(user_input['input_string']))
    
    # Perform searches and return the results
    results = search_search(data).reset_index(drop=True)
    results1 = aggravating(data)
    results2 = mitigating(data)
    
    # Convert `input_string` to a suitable format for calling the plot to display as an image
    query = str(data).replace(" ", "+")
    
    # Render `results.html` as the resulting page containing the search results and statistical summary.
    # Also makes a call to `/plot.png` to create a plot for the search results and display it.
    return render_template("results.html", column_names=results.columns.values, row_data=list(results.values.tolist()),
                           link_column="link", zip=zip, plot_name=f"Top citations for '{str.title(data)}':", url=f'/plot.png?input_string={query}', aggravating=results1, mitigating=results2)

```  

Here, the input from the search box in `form.html` is used to perform the search functions.  

The resulting outputs from the functions are then passed to `results.html` which will be displayed.
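
The route builds the plot URL by replacing spaces with `+`. That handles spaces, but not characters such as `&` or `#`, which have their own meaning in a URL. A minimal sketch of a more general approach, using the standard library's `urllib.parse.urlencode`, is shown below. The helper name `plot_url` is hypothetical and is not part of the original `main.py`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def plot_url(search_text: str, endpoint: str = "/plot.png") -> str:
    """Return the plot URL with the search text encoded as a query string."""
    return f"{endpoint}?{urlencode({'input_string': search_text})}"


url = plot_url("Tang Keng Lai v Public Prosecutor")
print(url)

# Decoding the query string recovers the original text, including the spaces
decoded = parse_qs(urlsplit(url).query)["input_string"][0]
print(decoded)
```

Because `+` in a query string is decoded back to a space when the query string is parsed, the receiving route gets the original search text without any extra replacement step.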


#### 6.1.1.5 `results.html`  

The results page, `results.html`, contains some simple HTML with a little formatting. Once again, it has the name of the project and a short description.  

Next, it displays the summary statistics for the search.  

A plot of the top 10 citations is displayed next as an image.  

Finally, we get a table of the search results.  

![results.html](../images/results.png)  
Sample `results.html`  

The links in the table are clickable and direct the user to the Lawnet page of that judgment.

<details> 
    <summary> <b> Click here for `results.html` </b></summary>
    
```html
<!DOCTYPE html>
<html>
<head>
<style>

table {
    font-family: arial, sans-serif;
    border-collapse: collapse;
    width: 100%;
}

td, th {
    border: 1px solid #dddddd;
    text-align: left;
    padding: 8px;
}

tr:nth-child(even) {
    background-color: #dddddd;
}

h1 {
  font-size: 40px;
  text-align: center;
}

h2 {
  font-size: 20px
}

p {
  font-size: 15px
}

</style>
</head>

<body>
  <h1>Criminal Case Database alpha test</h1>
  <p>This is a proof of concept for my GA DSI21 capstone project. </p>

  <h2> Summary Statistics </h2>
  <p> {{mitigating}} </p>
  <p> {{aggravating}} </p>
  <h2>{{ plot_name }}</h2>

  <img src="{{ url }}" alt="Top Citations for the search" height="600" width="800">

  <h2> Search results </h2>
  <p> *Please note that not all Lawnet links will work as free resources are only available for 3 months. </p>
  <p> *Possible offences and possible statutes were extracted and permutated from the judgment text. </p>
  <table>
      <tr>
          {% for col in column_names %}
          <th>{{col}}</th>
          {% endfor %}
      </tr>
      {% for row in row_data %}
      <tr>
          {% for col, row_ in zip(column_names, row) %}
          {% if col == link_column %}
          <td>
              <a href="{{ row_ }}"><button>Link to Lawnet</button></a>
          </td>
          {% else %}
          <td>{{row_}}</td>
          {% endif %}
          {% endfor %}
      </tr>
      {% endfor %}

  </table>
</body>
</html>

        
```  
</details>  


#### 6.1.1.6 Third route for plotting  

The third route in the Flask app is `/plot.png`. Users would not normally navigate to it directly; instead, `results.html` requests it as the source of the plot image.  

It contains a function which returns the top 10 citations as a horizontal bar plot.  

```python  
@app.route('/plot.png')
def plot_png():
    # Input arguments
    user_input = request.args
    
    # Manipulate data into a format that we pass to our model
    data = str(escape(user_input['input_string'])).replace("+", " ")

    # Create a figure with the input
    fig = create_figure(data)

    # Store the figure as bytes in an in-memory buffer
    output = io.BytesIO()
    FigureCanvas(fig).print_png(output)

    # Return the figure as an output
    return Response(output.getvalue(), mimetype='image/png')

```

The end of `main.py` also contains the standard `if __name__ == "__main__":` guard, which starts Flask's built-in server only when the file is run directly (for example, during local testing) and binds the app to a host and port.  

```python

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080, debug=True)
    
```

#### 6.1.1.7 Custom error page  

If an invalid search string is given, it is possible that the app will not display the expected results.  

Hence, I followed [Lakshya Srivastava's simple guide to creating custom error pages in Flask](https://www.codementor.io/@lakshyasri/custom-error-pages-in-flask-xrgye5l5e)[1].  

This was a simple process of adding in the following code to my `main.py`:  

```python  
@app.errorhandler(500)
def invalid_search(e):
    return render_template('invalid_search.html'), 500  
```
This code returns the 500 error response as a custom html page where I placed a custom error message.  

![error](../images/error.png)  
The custom error page

<details> 
    <summary> <b> Click here for `invalid_search.html` </b></summary>
    
```html
<!DOCTYPE html>
<html>
<head>
<style>

* {
  position: relative;
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

.centered {
  height: 100vh;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
}

h1 {
  margin-bottom: 50px;
  font-size: 50px;
}

.message {
  font-size: 18px;
}


</style>
</head>

<body>
<section class="centered">
  <h1>500 Server Error</h1>
  <div class="container">
    <div><span class="message">No results found.</span></div>
    <div><span class="message">Please ensure your search is in the following format:</span></div>
    <div><span class="message">Case Name (e.g. John v Smith),</span></div>
    <div><span class="message">Part of offence name (e.g. Forgery - try to avoid), or</span></div>
    <div><span class="message">Statute name (e.g. Section 33 Criminal Procedure Code)</span></div>
    <div><span class="message">If your search input was correct, it's probably me, sorry!</span></div>
  </div>
</section>
</body>
</html>
```
    
</details>
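
The 500 handler catches failures after they occur. A complementary approach is to reject unusable input before the search runs. The following is a minimal sketch of such a check, under assumptions: the helper name `clean_query` is hypothetical and the length limit is an arbitrary value chosen only for illustration. Neither is part of the original `main.py`.

```python
from typing import Optional

MAX_QUERY_LENGTH = 200  # assumed limit, chosen only for illustration


def clean_query(raw: Optional[str]) -> Optional[str]:
    """Return a trimmed search string, or None if it should not be searched."""
    if raw is None:
        return None
    # Collapse runs of whitespace and strip the ends
    query = " ".join(raw.split())
    if not query or len(query) > MAX_QUERY_LENGTH:
        return None
    return query


print(clean_query("  Misuse   of Drugs Act "))
print(clean_query("   "))
```

A route could call this helper on the incoming `input_string` and render the custom error template whenever it returns `None`, so that an empty search box does not reach the search functions at all.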

### 6.1.2 Google App Engine (GAE)  

In order to deploy the code to Google App Engine, I first created a Google Cloud account.  

Next, I installed the gcloud SDK, which allowed me to create and manage App Engine apps.  

From the Google App Engine dashboard, I then created a new project for `criminalcasedatabase` and set it up locally by running `gcloud auth login` and `gcloud config set project (project id)` on the command line.  

After setting up my `app.yaml` and `requirements.txt`, as well as uploading the necessary files on the Google Cloud Storage bucket for my project, the app was then ready for deployment.

#### 6.1.2.1 `requirements.txt`  

The first required file for GAE is `requirements.txt`.  

This lists all the packages required for the app to run.  

<details> 
    <summary> <b> Click here for `requirements.txt` </b></summary>
    
```
Flask==1.1.2
beautifulsoup4
lxml
matplotlib
pandas
requests
numpy
google-cloud-storage
gcsfs

    
```  
</details>  

#### 6.1.2.2 `app.yaml`  

The second required file for GAE is `app.yaml`.  

This file contains the App Engine app's settings, as well as the information about the app's code, such as runtime and latest version identifier.  

My `app.yaml` was very simple as there were no settings to worry about.  


<details> 
    <summary> <b> Click here for `app.yaml` </b></summary>
    
```
runtime: python38

    
```  
</details>  


#### 6.1.2.3 Google Cloud Storage  

Finally, the required files such as `database.csv` were uploaded to the Google Cloud Storage bucket for the project.  

This was a simple process of creating folders, and dragging and dropping to upload the files.  

![gcs.png](../images/gcs.png)  

The Google Cloud Storage page

#### 6.1.2.4 GAE deployment  

With the required files set up, deployment to GAE was as simple as running `gcloud app deploy` on the command line from within the app's folder.  

When asked for the region, I chose the appropriate one for me, `[6] asia-southeast2`, and answered `Y` when asked if I wanted to continue deploying my app.  

After waiting a short while for the app to deploy, it was online and could be accessed at:  

[Criminal Case Database](https://criminalcasedatabase.et.r.appspot.com)

## Observations  

Setting up the Python code as a Flask app, and subsequently as a GAE app, was challenging in the beginning as I did not understand the core concepts. However, after a bit of tinkering and following online guides such as [Martin Breuss's guide on RealPython](https://realpython.com/python-web-applications/#build-a-basic-python-web-application)[2] and [James Asher's guide to showing matplotlib plots](https://towardsdatascience.com/how-to-easily-show-your-matplotlib-plots-and-pandas-dataframes-dynamically-on-your-website-a9613eff7ae3)[3], I managed to deploy the app to GAE successfully.  

Overall, this was a fun learning experience for me, and I gained valuable skills in deploying Python code as an app.

## References

[1] Lakshya Srivastava, *'Custom Error Pages in Flask,'* Aug 12 2019. [Online]. Available: [https://www.codementor.io/@lakshyasri/custom-error-pages-in-flask-xrgye5l5e](https://www.codementor.io/@lakshyasri/custom-error-pages-in-flask-xrgye5l5e) [Accessed: June 6, 2021].

[2] Martin Breuss, *'Python Web Applications: Deploy Your Script as a Flask App,'* Feb 1 2021. [Online]. Available: [https://realpython.com/python-web-applications/#build-a-basic-python-web-application](https://realpython.com/python-web-applications/#build-a-basic-python-web-application) [Accessed: May 4, 2021].

[3] James Asher, *'How to Easily Show Your Matplotlib Plots and Pandas Dataframes Dynamically on Your Website,'* Feb 10 2021. [Online]. Available: [https://towardsdatascience.com/how-to-easily-show-your-matplotlib-plots-and-pandas-dataframes-dynamically-on-your-website-a9613eff7ae3](https://towardsdatascience.com/how-to-easily-show-your-matplotlib-plots-and-pandas-dataframes-dynamically-on-your-website-a9613eff7ae3) [Accessed: May 4, 2021].