# A notebook to test my API Service

In [62]:
import pandas as pd

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import f1_score

from urllib import request, parse
import json
import pickle

***
## Quick Start  
  
1. **To test the FastAPI service, run the following commands in a terminal from the _course_project_ directory:**  
  (make sure Docker is installed on your machine first)  
  
  <code>sudo docker build -t course_project .</code>  
  <code>sudo docker run -d -p 8080:8080 -v ~/PROGRAMMING/machine-learning-in-business/models:/app/models course_project</code>  
  
  The <code>docker run</code> command prints the ID of the new container.  
  <br>
2. Check whether your container is running or has already exited:  
  
  <code>sudo docker ps -a</code>  
  
  **Expected output** (check the STATUS column; it must start with 'Up'):  
  ```
  CONTAINER ID   IMAGE            COMMAND                  CREATED              STATUS                           PORTS                                       NAMES
  a918114e1522   course_project   "uvicorn main:app --…"   About a minute ago   Up About a minute                0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   youthful_banach
  ```  
  <br>
3. Now check the logs of your container:  
  
  <code>sudo docker logs [your container ID]</code>  
  
  If everything was built correctly, you will see the following **output**:  
  ```
  INFO:     Started server process [6966]
  INFO:     Waiting for application startup.
  INFO:     Application startup complete.
  INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
  ```  
  
4. **Now you can execute the cells below**  
***
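
Before running the cells, it can help to confirm from Python that something is listening on the published port. The sketch below is not part of the original project: `wait_for_port` is a hypothetical helper, and the host `127.0.0.1` and port `8080` are assumed from the `-p 8080:8080` mapping above. It only checks that a TCP connection can be opened, not that the model itself has loaded.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Under those assumptions, `wait_for_port("127.0.0.1", 8080, timeout=10)` should return `True` once Uvicorn is running inside the container.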

### Loading the data

In [73]:
DATA_PATH = "./data/train.csv"

In [74]:
df = pd.read_csv(DATA_PATH)
df.head(3)

Unnamed: 0,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income_>50K
0,67,Private,366425,Doctorate,16,Divorced,Exec-managerial,Not-in-family,White,Male,99999,0,60,United-States,1
1,17,Private,244602,12th,8,Never-married,Other-service,Own-child,White,Male,0,0,15,United-States,0
2,31,Private,174201,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,40,United-States,1


#### A bit of data preprocessing

In [75]:
# Drop rows with missing values
df.dropna(inplace=True)

# Normalize column names first ('educational-num' -> 'educational_num'),
# because the drop list below uses the underscore names
df.rename(columns=lambda name: name.replace('-', '_'), inplace=True)

columns_to_drop = ['fnlwgt', 'educational_num', 'workclass', 'race', 'native_country', 'gender']
df_new = df.drop(columns=columns_to_drop)

# The target is the last column; everything before it is a feature
X_new = df_new.iloc[:, :-1]
y_new = df_new['income_>50K']

X_train, X_test, y_train, y_test = train_test_split(X_new, y_new, random_state=42, test_size=0.2)

### The function to get model predictions via API

In [76]:
def get_prediction(df, url="http://0.0.0.0:8080/predict"):
    # If 0.0.0.0 is not reachable from your client, try http://localhost:8080/predict
    # Serialize as {column: {row_index: value}}, the layout of orient='columns'
    payload = df.to_json(orient='columns').encode('utf-8')
    req = request.Request(
        url,
        data=payload,
        headers={'Content-Type': 'application/json; charset=utf-8'},
    )
    # Passing data makes this a POST; urllib fills in Content-Length itself
    with request.urlopen(req) as response:
        return json.loads(response.read())['predictions']
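
To make the request format concrete, the sketch below sends a hand-written payload in the `orient='columns'` layout to a stand-in server built with the standard library. Everything here is illustrative: `StubHandler` and `post_json` are hypothetical names, the stub simply returns one `0` per row, and the real service's validation and model are not reproduced. The only things taken from the notebook are the JSON layout and the `predictions` key in the response.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the /predict endpoint: answers with one 0 per input row."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        columns = json.loads(self.rfile.read(length))
        # orient='columns' gives {column: {row_index: value}}, so the number
        # of rows is the size of any one column's mapping
        n_rows = len(next(iter(columns.values())))
        body = json.dumps({"predictions": [0] * n_rows}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def post_json(url, payload):
    """POST a JSON-serializable object and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with request.urlopen(req) as response:
        return json.loads(response.read())


# Two hand-written rows in the layout that df.to_json(orient='columns') produces
payload = {
    "age": {"0": 67, "1": 17},
    "hours_per_week": {"0": 60, "1": 15},
}

server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_port}/predict"
    result = post_json(url, payload)
finally:
    server.shutdown()
    server.server_close()

print(result)  # {'predictions': [0, 0]}
```

Because the stub runs in-process on a free port, the client side can be exercised without Docker; swapping `url` for the real endpoint is the only change needed to hit the running container.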

## Getting model predictions

In [84]:
y_pred = get_prediction(X_test)

### Comparing the results

In [81]:
filename = '../models/model02.pkl'
# Use a context manager so the file handle is closed after loading
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

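Cross-validated score on the test split. Note that `cross_val_score` clones the estimator and refits it on each fold, so this scores a retrained copy rather than the fitted model loaded from the pickle: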

In [82]:
cv_score = cross_val_score(loaded_model, X_test, y_test, cv=3, scoring='f1_weighted')
cv_score.mean()

0.8471090916003515

Score of the predictions returned by the API service on the same test split:

In [83]:
f1_score(y_test, y_pred, average='weighted')

0.8567749043423722
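
A more direct check that the service returns what the pickled model would return is to compare the two prediction lists element by element. The sketch below assumes local predictions are available, for instance from `loaded_model.predict(X_test)` (an assumption: the pickle is taken to hold a fitted scikit-learn estimator). `agreement` is a hypothetical helper, and the sample lists are made up for illustration.

```python
def agreement(api_pred, local_pred):
    """Return (fraction of equal predictions, positions where they differ)."""
    if len(api_pred) != len(local_pred):
        raise ValueError("prediction lists must have the same length")
    if len(api_pred) == 0:
        raise ValueError("prediction lists must not be empty")
    # Collect the row positions where the two sources disagree
    mismatches = [
        i for i, (a, b) in enumerate(zip(api_pred, local_pred))
        if a != b
    ]
    return 1 - len(mismatches) / len(api_pred), mismatches


# Made-up predictions, for illustration only
rate, diff = agreement([1, 0, 1, 1], [1, 0, 0, 1])
print(rate, diff)  # 0.75 [2]
```

With the notebook's variables this would be `agreement(y_pred, loaded_model.predict(X_test))`; a rate of 1.0 would mean the API and the local model agree on every row.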