# Lab 10: Laboratory Notes - Week 10: API Example

You should have some idea, or at least have heard, of how computers communicate with each other.  Application programs can access web services through what is termed an Application Programming Interface (API).  Common API styles include
* REST API
* Web API
* SOAP API
* gRPC

For this week's laboratory, we will demonstrate a diabetes prediction service through an API.  We will start with the de-serialisation of some complex data types (a model), write some code to compute the probability of someone having diabetes, and then we will put it into an API service, which we will then access via our browser.

## Diabetic Prediction Model

This week, we continue from what we learned last week about serialisation and de-serialisation.  Let's start by coding this into our Jupyter Notebook.  We only need to import the pickle library.

<span style="color:red">import pickle</span>  

This diabetic prediction model is based on data from the Pima tribe of Native Americans, who live in the central and southern parts of the state of Arizona, US, and also in the northwestern states of Sonora and Chihuahua in Mexico.  The tribe has one of the highest incidences of diabetes, and the dataset used for this modelling comes from the National Institute of Diabetes and Digestive and Kidney Diseases.  The objective of the model is to predict whether a patient has diabetes.  Do note that all patients here are females of at least 21 years of age and of Pima Indian heritage.  This dataset is also commonly used in many Data Science teaching materials; you can access it [here](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database), and it can also be found at the [UCI Machine Learning Repository](https://archive.ics.uci.edu).

A model has been built (I have to admit that it is a weak model) with a part of the dataset.  We have serialised and saved it into a file called "<span style="color:red">final_model.sav</span>".  Let's load the model that was already built.

<span style="color:red">classifier = pickle.load(open('final_model.sav', 'rb'))</span>
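To see the save-and-load round trip in isolation, here is a minimal sketch that does not need "final_model.sav".  The dictionary `stand_in_model` and the file name `demo_model.sav` are made up for illustration; any picklable object behaves the same way.  The `with` blocks simply make sure the file is closed afterwards.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (hypothetical): any picklable object will do.
stand_in_model = {"name": "demo", "weights": [0.1, 0.2, 0.3]}

# Write to a temporary folder so we do not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), "demo_model.sav")

# Serialise: 'wb' means write in binary mode.
with open(path, "wb") as f:
    pickle.dump(stand_in_model, f)

# De-serialise: 'rb' means read in binary mode.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == stand_in_model)  # True
```

Do keep in mind that unpickling can execute code, so only load pickle files from sources that you trust.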

Now that we have the model "<span style="color:red">classifier</span>" (just a name that we gave our model), we can test it by using the <span style="color:red">classifier.predict()</span> function.  What do we need to input to the function?  There are 8 features (variables) that were used, and they are:

* Pregnancies - Number of times pregnant (ranges from 0 - 17)
* Glucose - Plasma glucose concentration over 2 hours in an oral glucose tolerance test (ranges from 0 - 199)
* BloodPressure - Diastolic blood pressure (mm Hg)  (ranges from 0 - 122, logically it would not be 0)
* SkinThickness - Triceps skin fold thickness (mm) (ranges from 0 to 99, generally not thicker than 50)
* Insulin - 2-Hour serum insulin (mu U/ml) (ranges from 0 - 846, normally lower than 300)
* BMI - Body mass index (weight in kg/(height in m)^2) (ranges from 0 - 67, usually between 20 and 45)
* DiabetesPedigreeFunction - Diabetes pedigree function (ranges from 0 - 2.42, most below 1.25)
* Age - Age (years), the model was trained for ages between 21 and 81.

The outcome is either a 0 or 1, where 1 indicates the person is clinically diagnosed as having diabetes.  Let's say that we have the following readings for a patient,

* Pregnancies - 2
* Glucose - 100
* BloodPressure - 80
* SkinThickness - 20
* Insulin - 200
* BMI - 35
* DiabetesPedigreeFunction - 1.0
* Age - 30
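Before passing readings like these to the model, it can be useful to check that they fall within the ranges listed above.  The following is only a sketch: the names `FEATURE_RANGES` and `out_of_range` are made up for illustration, and the limits are simply the ranges quoted in the feature list.

```python
# Ranges as quoted in the feature list above: (lowest, highest).
FEATURE_RANGES = {
    "Pregnancies": (0, 17),
    "Glucose": (0, 199),
    "BloodPressure": (0, 122),
    "SkinThickness": (0, 99),
    "Insulin": (0, 846),
    "BMI": (0, 67),
    "DiabetesPedigreeFunction": (0, 2.42),
    "Age": (21, 81),
}

# The example patient from the notes.
patient = {
    "Pregnancies": 2,
    "Glucose": 100,
    "BloodPressure": 80,
    "SkinThickness": 20,
    "Insulin": 200,
    "BMI": 35,
    "DiabetesPedigreeFunction": 1.0,
    "Age": 30,
}

def out_of_range(readings):
    """Return the names of the features whose value lies outside the quoted range."""
    return [
        name
        for name, (low, high) in FEATURE_RANGES.items()
        if not low <= readings[name] <= high
    ]

print(out_of_range(patient))  # []
```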

We then need to put this into an array, in our case a numpy array.

<span style="color:red">import numpy as np</span>

<span style="color:red">patientData = np.array([2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0])  
print(patientData)</span>

You will notice that the patient data is enclosed in a single set of "<span style="color:red">[ ]</span>", meaning that it is a single one-dimensional array.  The model, however, expects a two-dimensional input with one row per patient, i.e., a list of arrays of the form "[[ ], [ ], ... ,[ ]]".  We therefore need to put the data into the right shape, which is called a reshape.  You can try to run the prediction with patientData as it is, but you should encounter an error, e.g.,

<span style="color:red">y_pred = classifier.predict(patientData)</span>

Understanding that the input is of the wrong format, let's reshape it, and call it <span style="color:red">patientData_reshaped</span>.

<span style="color:red">patientData_reshaped = patientData.reshape(1,-1)</span>

and proceed to do a prediction.

<span style="color:red">y_pred = classifier.predict(patientData_reshaped)</span>

You should get either a <span style="color:red">0.</span> or a <span style="color:red">1.</span>, where 1 indicates a positive prediction.  As mentioned above regarding the "<span style="color:red">[[ ], [ ], ... ,[ ]]</span>" form, if we provide the readings of several patients, the model can predict for each of them in a single call.  We have included a file "lab11_data.csv" which consists of data not seen by the model during training, together with the respective labels.
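The difference between the two shapes can be checked without the model at all.  In this small sketch, the second row of `batch` is an invented patient used only to show what a multi-patient input looks like.

```python
import numpy as np

# One patient as a flat, one-dimensional array of 8 readings.
one_patient = np.array([2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0])
print(one_patient.shape)  # (8,)

# reshape(1, -1): one row, and let NumPy work out the number of columns.
as_row = one_patient.reshape(1, -1)
print(as_row.shape)  # (1, 8)

# Several patients at once: one row per patient (second row is made up).
batch = np.array([
    [2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0],
    [1.0, 120.0, 70.0, 25.0, 150.0, 30.0, 0.5, 45.0],
])
print(batch.shape)  # (2, 8)
```

A two-dimensional array like `batch` is the form that `classifier.predict()` accepts, and it returns one prediction per row.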

#### Exercise 10.1:

Read the file "lab11_data.csv", subset the features and the label, run the prediction using the model loaded above, and display the confusion matrix.

## Code Preparation

As we are building a tool to accept an external input and respond accordingly, we code a function.  In Python, a function is a section of code that performs a specific, and usually repeatedly used, task.  It typically takes some input (not always, e.g., a function that returns the current time needs none), processes the input and returns some output.  In Python this is written as follows

<span style="color:red">def predict_diabetes():  
  """ The task you want to do"""</span>

It is called in the Python code later using

<span style="color:red">predict_diabetes()</span>

You can provide the input to the function in between the parentheses and specify its type as well.  Let's define our function and call it predict_diabetes(), with an input of type "str" and an output of type "str".

<span style="color:red">def predict_diabetes(name: str) -> str:</span>

We then may want to check the input, to ensure that there is an input provided and it is of the type "str".  (Note, we have indented it).

<p style="margin-left: 40px;"><span style="color:red">if not name or not isinstance(name, str):</span></p>  
<p style="margin-left: 80px;"><span style="color:red">return "Name must contain letters and be of type string"</span></p>
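One detail about this kind of check is worth trying out for yourself: in Python an empty string does not compare equal to `False`, so it is the truthiness test `not name` that catches a missing input.  The helper name `check_input` below is made up for this sketch.

```python
def check_input(name) -> str:
    """Return an error message for unusable input, or an empty string if it looks fine."""
    if not name or not isinstance(name, str):
        return "Name must contain letters and be of type string"
    return ""

# An empty string is "falsy", but it is not equal to False.
print("" == False)  # False
print(not "")       # True

print(check_input(""))            # error message: nothing was provided
print(check_input(123))           # error message: not a string
print(check_input("2.0, 100.0"))  # empty string: input accepted
```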

Once the function has determined that it has been passed a string (note that this implementation assumes the content of the string is well formed, which cannot be relied upon in practice), we need to convert the input string into a NumPy array.  We use a new library called <span style="color:red">ast</span> to simplify our work here.  Here is the full code:

<span style="color:red">import pickle  
import numpy as np  
import ast</span>

<span style="color:red">def predict_diabetes(name):</span>
<p style="margin-left: 40px;"><span style="color:red">if not name or not isinstance(name, str):</span></p>
<p style="margin-left: 80px;"><span style="color:red">return "Name must contain letters and be of type string"</span></p>
<p style="margin-left: 40px;"><span style="color:red">classifier = pickle.load(open('final_model.sav', 'rb'))</span></p>
<p style="margin-left: 40px;"><span style="color:red">X = np.array(ast.literal_eval(name)).reshape(1,-1)</span></p>
<p style="margin-left: 40px;"><span style="color:red">y_pred = classifier.predict(X)</span></p>
<p style="margin-left: 40px;"><span style="color:red">if y_pred == [1.0]:</span></p>
<p style="margin-left: 80px;"><span style="color:red">return "Positive, patient is diabetic"</span></p>
<p style="margin-left: 40px;"><span style="color:red">else:</span></p>
<p style="margin-left: 80px;"><span style="color:red">return "Negative, patient is not diabetic"</span></p>

Now that we have this function, we can call the function.

<span style="color:red">predict_diabetes("2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0")</span>
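If you are curious about what `ast.literal_eval` does with the input string, the short sketch below runs it on its own.  It turns a string of comma-separated numbers into a Python tuple, and it refuses anything that is not a plain literal, which is why it is a safer choice than `eval` here.  The second string is just an invented example of input that should be rejected.

```python
import ast

raw = "2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0"

# Comma-separated literals are parsed as a tuple of floats.
values = ast.literal_eval(raw)
print(type(values).__name__, len(values))  # tuple 8

# Anything that is not a literal (e.g. a function call) is rejected.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except (ValueError, SyntaxError) as err:
    rejected = True
    print("rejected:", type(err).__name__)
```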

Note: In an actual implementation, we will usually send in input in JSON format, parse it and then create the numpy array for the model.  Upon return, we will take the "<span style="color:red">0.</span>" or "<span style="color:red">1.</span>" and return False or True instead.  However, this is just to illustrate it in the simplest form possible and you can expand your code from here if you like.
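As a rough sketch of the JSON variant mentioned in the note, the code below parses a JSON object of readings into the feature order used in these notes and maps the model output to True or False.  The names `FEATURE_ORDER`, `parse_patient` and `to_flag`, and the layout of the JSON payload, are assumptions made for illustration; the model itself is left out.

```python
import json

# Order of the 8 features, as listed earlier in these notes.
FEATURE_ORDER = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

def parse_patient(payload: str) -> list:
    """Turn a JSON object of readings into a list in the order the model expects."""
    data = json.loads(payload)
    return [float(data[name]) for name in FEATURE_ORDER]

def to_flag(prediction: float) -> bool:
    """Map the model's 0. / 1. output to False / True."""
    return prediction == 1.0

# The keys of a JSON object may arrive in any order.
payload = (
    '{"Age": 30, "BMI": 35, "Glucose": 100, "Pregnancies": 2, '
    '"BloodPressure": 80, "SkinThickness": 20, "Insulin": 200, '
    '"DiabetesPedigreeFunction": 1.0}'
)

print(parse_patient(payload))
print(to_flag(1.0), to_flag(0.0))  # True False
```

The resulting list can then be turned into a NumPy array and reshaped exactly as before.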

## Serving it as an API using Flask

Now, you can save the code above as a standard Python file (without the final call to "predict_diabetes()").  In your Jupyter Notebook, go to "Download as" and select "Python (.py)".

![P1](picture/P1.png)

It will save it as the name of your Notebook.  Do change the name appropriately, let's call it "app.py".  You would now have a Python application script.

### Install flask

Your Python environment will not have flask installed by default, so you will need to install it.  Other than flask, many commercial applications use Django and some prototyping uses bottle.  However, flask seems to be gaining traction lately and we will use it here.  Go to your command line and simply install flask using pip (or other tools, depending on your Python installation or your IDE).  If you have issues installing this (which is common), don't fret, it's not examinable for this course but do try to get it done for your own experience.

<span style="color:red">pip install flask</span>

### Flask Python Code

Edit your app.py script using your IDE, or simply use Notepad (MS-Windows) or TextEdit (macOS).  You will need to add some lines before and after your existing code.

<span style="color:red">from flask import Flask, request  
import numpy as np  
import pickle  
import ast</span>

<span style="color:red">app = Flask(__name__)</span>

<span style="color:red">def predict_diabetes(name):</span>
<p style="margin-left: 40px;"><span style="color:red">if not name or not isinstance(name, str):</span></p>
<p style="margin-left: 80px;"><span style="color:red">return "Name must contain letters and be of type string"</span></p>
<p style="margin-left: 40px;"><span style="color:red">classifier = pickle.load(open('final_model.sav', 'rb'))</span></p>
<p style="margin-left: 40px;"><span style="color:red">X = np.array(ast.literal_eval(name)).reshape(1,-1)</span></p>
<p style="margin-left: 40px;"><span style="color:red">y_pred = classifier.predict(X)</span></p>
<p style="margin-left: 40px;"><span style="color:red">if y_pred == [1.0]:</span></p>
<p style="margin-left: 80px;"><span style="color:red">return "Positive, patient is diabetic"</span></p>
<p style="margin-left: 40px;"><span style="color:red">else:</span></p>
<p style="margin-left: 80px;"><span style="color:red">return "Negative, patient is not diabetic"</span></p>

<span style="color:red">""" This is a HTTP method, you can read about it on your own"""</span>

<span style="color:red">@app.route('/predict', methods=['GET'])</span>

<span style="color:red">def predict():</span>
<p style="margin-left: 40px;"><span style="color:red">return predict_diabetes(str(request.query_string, 'utf-8'))</span></p>

Once you have done that, save the file and go to your command line and run

<span style="color:red">flask run</span>

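(You can also run it using <span style="color:red">python app.py</span>, but this only starts the server if the script itself calls <span style="color:red">app.run()</span>, which the code above does not.) You should get something like the image below.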

![P2](picture/P2.png)

In short, you now have a web service running on your personal computer (notebook) which you can access using the HTTP protocol.  Programmers can call this service from different programming languages.  In Python, you would typically <span style="color:red">import requests</span> (a third-party library), and there are plenty of tutorials on it.  For simplicity, let's use a web browser, which also makes its calls via the HTTP protocol.  In your browser, key in

<span style="color:red">http://localhost:5000/predict?2.0,100.0,80.0,20.0,200.0,35.0,1.0,30.0</span>

* localhost means your own PC
* 5000 is the port number, which is Flask's default; it is also shown in the output when you start the server.
* predict is the path that was specified in your app.py script.
* ? indicates that what follows are the parameters for the GET request.  For simplicity today, we will just use the raw entry after the ?
* 2.0,100.0,80.0,20.0,200.0,35.0,1.0,30.0 is an example parameter of the 8 features used to predict.
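The parts of the URL described above can also be pulled apart in Python with the standard library, which may help to see what the server receives.  This is only an illustration of the URL structure; it does not contact the service.

```python
from urllib.parse import urlsplit

url = "http://localhost:5000/predict?2.0,100.0,80.0,20.0,200.0,35.0,1.0,30.0"
parts = urlsplit(url)

print(parts.hostname)  # localhost
print(parts.port)      # 5000
print(parts.path)      # /predict
print(parts.query)     # 2.0,100.0,80.0,20.0,200.0,35.0,1.0,30.0

# The raw query string is what our predict() route passes on to predict_diabetes().
features = [float(value) for value in parts.query.split(",")]
print(len(features))   # 8
```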

There are other ways to do this.  Do note that you won't be assessed on this part of the course, this is to assist in your understanding of deploying a model as an API service. 

This is the last laboratory session for F78DS. We hope that you have gained lots of practice with Python and a good idea of the flow of a typical data science project, and we hope that this has been useful.

## My code part

#### Diabetic Prediction Model

In [4]:
# import libraries
import pickle
import numpy as np

classifier = pickle.load(open('data/final_model.sav', 'rb'))

patientData = np.array([2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0])
print(patientData)

[  2. 100.  80.  20. 200.  35.   1.  30.]


In [5]:
y_pred = classifier.predict(patientData)

ValueError: Expected 2D array, got 1D array instead:
array=[  2. 100.  80.  20. 200.  35.   1.  30.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

In [7]:
patientData_reshaped = patientData.reshape(1,-1)

y_pred = classifier.predict(patientData_reshaped)

#### Exercise 10.1:

In [13]:
import pandas as pd
from sklearn.metrics import confusion_matrix

# Read the data
df = pd.read_csv('data/lab11_data.csv')

# Subset the features and the label
X = df.iloc[:, 1:-1].values  
y_true = df.iloc[:, -1].values  

# Predict the label
y_pred = classifier.predict(X)

# Compute the confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)
print("Confusion matrix :\n", conf_matrix)

Confusion matrix :
 [[388  88]
 [136 118]]
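To read this matrix: with `confusion_matrix(y_true, y_pred)` the rows are the true labels and the columns are the predicted labels, so for labels 0 and 1 the layout is [[TN, FP], [FN, TP]].  The short sketch below works out a few summary figures from the numbers above by hand; it only does arithmetic and does not need the model or the data file.

```python
# Confusion matrix from the output above: rows = true label, columns = predicted label.
conf_matrix = [[388, 88],
               [136, 118]]

(tn, fp), (fn, tp) = conf_matrix
total = tn + fp + fn + tp

accuracy = (tn + tp) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of predicted positives that are truly positive
recall = tp / (tp + fn)        # share of true positives that the model finds

print(total)                            # 730
print(f"accuracy  = {accuracy:.3f}")    # accuracy  = 0.693
print(f"precision = {precision:.3f}")   # precision = 0.573
print(f"recall    = {recall:.3f}")      # recall    = 0.465
```

The low recall, with more than half of the diabetic patients missed, is consistent with the remark in the notes that this is a weak model.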


#### Code Preparation

In [17]:
import pickle
import numpy as np
import ast

def predict_diabetes(name):

    if not name or not isinstance(name, str):
        return "Name must contain letters and be of type string"
    classifier = pickle.load(open('data/final_model.sav', 'rb'))
    X = np.array(ast.literal_eval(name)).reshape(1,-1)
    y_pred = classifier.predict(X)
    if y_pred == [1.0]:
        return "Positive, patient is diabetic"
    else:
        return "Negative, patient is not diabetic"
    
predict_diabetes("2.0, 100.0, 80.0, 20.0, 200.0, 35.0, 1.0, 30.0")

'Positive, patient is diabetic'