# DSCI 525 - Web and Cloud Computing

***Milestone 4:*** In this milestone, you will deploy the machine learning model you trained in milestone 3.

Milestone 4 checklist:

- [x] Use an EC2 instance. 
> Please note that our group created a new EC2 instance (Owner: student84) for deploying our API in `task 2`.
- [x] Develop your API here in this notebook.
- [x] Copy it to the `app.py` file on the EC2 instance.
- [x] Run your API for other consumers and test it with your colleagues.
- [x] Summarize your journey.

<hr>

## 1. Develop your API

**Script: https://github.com/UBC-MDS/525_group11/blob/main/scripts/app.py**

```python
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# 1. Load the trained model once, when the app starts
model = joblib.load("model.joblib")

# 2. Define a prediction function
def return_prediction(data):
    # model.predict() expects a 2D array of shape (n_samples, n_features),
    # so wrap the single list of features in another list to get one row
    return model.predict(np.array([data]))

# 3. Set up the home page using basic HTML
@app.route("/")
def index():
    return """
    <h1>Welcome to our rain prediction service</h1>
    To use this service, make a JSON POST request to the /predict URL with 25 climate model outputs.
    """

# 4. Define a route that accepts POST requests and returns model predictions
@app.route("/predict", methods=["POST"])
def rainfall_prediction():
    content = request.get_json(silent=True)  # parsed JSON body, or None if missing/invalid
    if not isinstance(content, dict) or "data" not in content:
        return jsonify({"error": 'Send a JSON body of the form {"data": [...]}'}), 400
    inputs = content["data"]
    try:
        prediction = return_prediction(inputs)
    except ValueError as err:  # e.g. wrong number of features or non-numeric values
        return jsonify({"error": str(err)}), 400
    # Return the input alongside the prediction so consumers can check what was scored
    results = {"Input": inputs, "Prediction": prediction.tolist()}
    return jsonify(results)

if __name__ == "__main__":
    # Listen on all interfaces so the server is reachable at the EC2 instance's IP on port 8080
    app.run(host="0.0.0.0", port=8080)
```
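`app.py` expects a `model.joblib` file next to it. The sketch below shows one way such a file can be produced and round-tripped, assuming a scikit-learn `RandomForestRegressor`. The model type, the hyperparameter values, the synthetic training data, and the helper names `train_and_save` and `predict_one` are all placeholders for illustration, not necessarily what we used in Milestone 3. It also shows why `return_prediction` wraps the input list: `predict()` expects a 2D array with one row per sample.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

N_FEATURES = 25  # one column per climate model output, as described in the summary


def train_and_save(X, y, path="model.joblib"):
    # Placeholder hyperparameters: substitute the best settings found during tuning
    model = RandomForestRegressor(n_estimators=50, max_depth=5, random_state=0)
    model.fit(X, y)
    joblib.dump(model, path)  # the file app.py loads with joblib.load()
    return model


def predict_one(model, features):
    # Mirrors return_prediction(): one request becomes one row of shape (1, N_FEATURES)
    row = np.asarray(features, dtype=float)
    if row.ndim != 1 or row.size != N_FEATURES:
        raise ValueError(f"expected a flat list of {N_FEATURES} numbers")
    return model.predict(row.reshape(1, -1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, N_FEATURES))  # synthetic stand-in data, not the rainfall dataset
    y = X.sum(axis=1)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.joblib")
        trained = train_and_save(X, y, path)
        loaded = joblib.load(path)
        sample = X[0].tolist()
        # The reloaded model should give exactly the same prediction
        print(predict_one(trained, sample), predict_one(loaded, sample))
```

Saving with `joblib` rather than plain `pickle` matches the `joblib.load` call in `app.py`, and it handles the large NumPy arrays inside tree ensembles efficiently.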

<hr>

## 2. Deploy your API

**Screenshot**

![](../img/m4_task_2.png)
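Consumers can call the deployed endpoint from any HTTP client. Below is a minimal sketch using only Python's standard library (`urllib.request`), assuming the `{"data": [...]}` request body that `app.py` reads. The helper names are hypothetical, and the host must be replaced with the EC2 instance's public IP. The `Content-Type: application/json` header matters because Flask only parses the body as JSON when that header is present.

```python
import json
import urllib.request


def build_prediction_request(url, features):
    # Encode the 25 feature values in the shape app.py expects: {"data": [...]}
    payload = json.dumps({"data": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_prediction(url, features, timeout=10):
    # Send the request and decode the JSON response ({"Input": ..., "Prediction": ...})
    req = build_prediction_request(url, features)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_prediction("http://<EC2-public-IP>:8080/predict", [0.1] * 25)` would return a dict with `Input` and `Prediction` keys while the server is running. A malformed body produces an HTTP 400, which `urlopen` raises as `urllib.error.HTTPError`.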


<hr>

## 3. Summarize your journey from Milestone 1 to Milestone 4

**Throughout these 4 milestones, our group took on the roles of 1. Data Engineers, 2. Infrastructure Engineers, 3. Data Scientists, and 4. DevOps in turn to perform prediction tasks on the large `daily rainfall in Australia` [dataset](https://figshare.com/articles/dataset/Daily_rainfall_over_NSW_Australia/14096681) (5.7 GB), and we achieved the following objectives in each milestone.**

* **`Milestone 1`: We downloaded the data from the web using an API, processed it locally, and converted it to an efficient file format. Specifically,**

    * We recognized that loading big data (CSV files) with `Pandas` is costly in both memory and time.
    * As a result, we tried 3 different approaches to reduce memory usage when performing EDA in Python:
        1. changing the columns' data types in the dataframe;
        2. loading only the columns we are interested in;
        3. loading data in chunks.
    * Additionally, we found that the `Feather` file format was the fastest and most flexible way to transfer a dataframe from Python to R in our case.
    
    
* **`Milestone 2`: We moved the data to the cloud on AWS, set up cloud infrastructure, and prepared the data for a Machine Learning task. Specifically,**

    * We set up our own `EC2` instance, `S3` bucket, and `TLJH` (The Littlest JupyterHub).
    * We moved the `daily rainfall in Australia` CSV file from the `Milestone 1` data folder to our `S3` bucket.
    * We then loaded the data from `S3` into our `TLJH` and wrangled it into a format suitable for training a Machine Learning model.
    * Finally, we wrote the ready-to-use data back to `S3`.


* **`Milestone 3`: We set up a distributed `EMR` cluster running `Spark` in the cloud and developed a Machine Learning model with `Spark` using the ready-to-use data stored in `S3` from `Milestone 2`. Specifically,**

    * We set up our `EMR` cluster with `Spark` and `TLJH`.
    * We set up Firefox as the web browser for accessing `EMR` and configured the FoxyProxy extension for Firefox.
    * On `TLJH` running the `Spark` engine, we developed a Machine Learning model using `scikit-learn`, with the best hyperparameter settings found using `Spark`'s `MLlib`.


* **`Milestone 4`: We deployed the Machine Learning model trained in `Milestone 3` to the cloud using `Flask` so that other consumers can use it. Specifically,**

    * We developed our API, `app.py`, using `Flask` on `TLJH`, adding a `/predict` endpoint that accepts a POST request containing the 25 features from the user and returns a prediction from the trained Machine Learning model.
    * We deployed our API successfully and made our server available at our `EC2` instance's IP address on port 8080.
    * Furthermore, we made our server persistent (still running after we log out) by launching it inside a `screen` session.
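The three memory-saving approaches listed under `Milestone 1` can be sketched with `pandas` as follows. This is a minimal sketch, assuming a combined CSV with the columns shown in `SAMPLE_CSV` (the column names and values are illustrative assumptions, not taken from our notebook). The same `source` argument works with a file path on the real 5.7 GB file.

```python
import io

import pandas as pd

# Hypothetical sample of the combined CSV; column names and values are assumptions
SAMPLE_CSV = """time,lat_min,lat_max,lon_min,lon_max,rain (mm/day),model
2014-01-01,-36.0,-35.0,141.0,142.0,0.5,ACCESS-CM2
2014-01-02,-36.0,-35.0,141.0,142.0,1.25,ACCESS-CM2
2014-01-01,-36.0,-35.0,141.0,142.0,3.0,BCC-CSM2-MR
"""

NUMERIC_COLS = ["lat_min", "lat_max", "lon_min", "lon_max", "rain (mm/day)"]


def load_with_smaller_dtypes(source):
    # Approach 1: float32 instead of the default float64 halves numeric memory,
    # and "category" stores each repeated model name only once
    dtypes = {col: "float32" for col in NUMERIC_COLS}
    dtypes["model"] = "category"
    return pd.read_csv(source, dtype=dtypes)


def load_selected_columns(source, columns=("model", "rain (mm/day)")):
    # Approach 2: parse only the columns needed for the analysis
    return pd.read_csv(source, usecols=list(columns))


def count_rows_per_model(source, chunksize=100_000):
    # Approach 3: stream the file in chunks and keep only a running aggregate in memory
    counts = {}
    for chunk in pd.read_csv(source, usecols=["model"], chunksize=chunksize):
        for model, n in chunk["model"].value_counts().items():
            counts[model] = counts.get(model, 0) + int(n)
    return counts
```

For instance, `count_rows_per_model(io.StringIO(SAMPLE_CSV), chunksize=2)` gives `{"ACCESS-CM2": 2, "BCC-CSM2-MR": 1}`. The approaches combine naturally: chunked reads can also pass `dtype=` and `usecols=` to shrink each chunk further.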

**Through this course and these 4 milestones, our team has gained a better understanding of how to work with large datasets, store and access them in the cloud, and build and deploy Machine Learning models in the cloud.**

<hr>