# Building a Web App to Use an ML Model


Filepath of notebook: davidbrici/my_portfolio/Machine-Learning/Projects/1_Python-Basics/notebooks/

Filepath of data: davidbrici/my_portfolio/Machine-Learning/Projects/1_Python-Basics/data/raw/ufos.csv


## Import data

In [5]:
import pandas as pd
import numpy as np

ufos = pd.read_csv('../data/raw/ufos.csv')
ufos.head()

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700.0,45 minutes,This event took place in early fall around 194...,4/27/2004,29.883056,-97.941111
1,10/10/1949 21:00,lackland afb,tx,,light,7200.0,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,10/10/1955 17:00,chester (uk/england),,gb,circle,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
3,10/10/1956 21:00,edna,tx,us,circle,20.0,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.978333,-96.645833
4,10/10/1960 20:00,kaneohe,hi,us,light,900.0,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.418056,-157.803611


Next, reduce the DataFrame to the columns of interest and give them shorter names:

In [None]:
ufos = pd.DataFrame({'Seconds': ufos['duration (seconds)'], 'Country': ufos['country'], 'Latitude': ufos['latitude'], 'Longitude': ufos['longitude']})

In [13]:
ufos.head()

Unnamed: 0,Seconds,Country,Latitude,Longitude
0,2700.0,us,29.883056,-97.941111
1,7200.0,,29.38421,-98.581082
2,20.0,gb,53.2,-2.916667
3,20.0,us,28.978333,-96.645833
4,900.0,us,21.418056,-157.803611


In [14]:
ufos.Country.unique()

array(['us', nan, 'gb', 'ca', 'au', 'de'], dtype=object)

In [15]:
ufos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80332 entries, 0 to 80331
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Seconds    80332 non-null  float64
 1   Country    70662 non-null  object 
 2   Latitude   80332 non-null  float64
 3   Longitude  80332 non-null  float64
dtypes: float64(3), object(1)
memory usage: 2.5+ MB


## Clean data
Remove null values and only include sightings lasting between 1 and 60 seconds.

In [18]:
ufos.dropna(inplace=True)
ufos = ufos[(ufos['Seconds']>=1) & (ufos['Seconds']<=60)]

In [21]:
ufos.count()

Seconds      25863
Country      25863
Latitude     25863
Longitude    25863
dtype: int64
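
The cleaning step can be tried on a small frame first. The sketch below uses made-up values (not rows from ufos.csv) and a hypothetical frame named `toy`. It also takes an explicit `.copy()` after filtering, which is one common way to make sure later column assignments act on an independent DataFrame rather than on a slice of the original.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the UFO data (values are made up for illustration)
toy = pd.DataFrame({
    "Seconds": [2700.0, 20.0, 0.5, 30.0, 45.0],
    "Country": ["us", "gb", "us", np.nan, "ca"],
    "Latitude": [29.9, 53.2, 40.0, 35.8, 45.6],
    "Longitude": [-97.9, -2.9, -75.0, -80.3, -122.4],
})

# Drop rows with missing values, keep sightings of 1 to 60 seconds
# (between() is inclusive on both ends by default), then take an
# explicit copy so later assignments act on an independent frame
clean = toy.dropna()
clean = clean[clean["Seconds"].between(1, 60)].copy()

print(clean)
```

Only the 20-second and 45-second rows survive here: the others are either outside the 1 to 60 second range or have a missing country.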

Convert the text values in Country to numbers. This uses the LabelEncoder object from scikit-learn, which assigns integer labels to the classes in sorted (alphabetical) order; pd.get_dummies() is an alternative way to do the encoding.

In [22]:
from sklearn.preprocessing import LabelEncoder
ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])
ufos.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])


Unnamed: 0,Seconds,Country,Latitude,Longitude
2,20.0,3,53.2,-2.916667
3,20.0,4,28.978333,-96.645833
14,30.0,4,35.823889,-80.253611
23,60.0,4,45.582778,-122.352222
24,3.0,3,51.783333,-0.783333
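
To see which integer stands for which country, it helps to keep a reference to the fitted encoder instead of discarding it. The sketch below is a minimal illustration on a hand-written list of the five country codes that remain after dropping nulls; the variable names `encoder` and `mapping` are my own and do not appear in the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# The five country codes present in the dataset once nulls are dropped
countries = ["us", "gb", "ca", "au", "de", "us"]

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ is sorted, so the position of each class is its integer label
mapping = {code: str(label) for code, label in enumerate(encoder.classes_)}
print(mapping)  # {0: 'au', 1: 'ca', 2: 'de', 3: 'gb', 4: 'us'}

# inverse_transform turns integer labels back into the original strings
decoded = [str(c) for c in encoder.inverse_transform([3, 4])]
print(decoded)  # ['gb', 'us']
```

This agrees with the encoded table above, where the gb row has label 3 and the us rows have label 4.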


## Build Model

Select your target variable and feature variables

In [23]:
from sklearn.model_selection import train_test_split

Selected_features = ['Seconds', 'Latitude', 'Longitude']

X = ufos[Selected_features]
y = ufos['Country']

In [24]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Train model using logistic regression

In [26]:
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(classification_report(y_test, predictions))
print('Predicted labels: ', predictions)
print('Accuracy: ', accuracy_score(y_test, predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        41
           1       0.82      0.40      0.54       250
           2       1.00      0.88      0.93         8
           3       0.99      1.00      1.00       131
           4       0.97      1.00      0.98      4743

    accuracy                           0.97      5173
   macro avg       0.96      0.85      0.89      5173
weighted avg       0.96      0.97      0.96      5173

Predicted labels:  [4 4 4 ... 3 4 4]
Accuracy:  0.9665571235260004


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


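The accuracy of the logistic regression model is 96.7%, which looks excellent. This isn't surprising, however, as coordinates are strongly correlated with country. Note also that the classes are heavily imbalanced: 4,743 of the 5,173 test samples (about 91.7%) belong to class 4, so always predicting that class would already score about 91.7%, and recall for class 1 is only 0.40.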
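
The convergence warning printed above suggests raising max_iter or scaling the data. The following is a minimal sketch of both, not the notebook's actual model: it uses synthetic data with a made-up labelling rule, and the names `X_demo`, `y_demo` and `scaled_model` are assumptions for illustration. A pipeline keeps the scaler and the classifier together, so the same scaling is applied at prediction time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three features on very different scales,
# loosely mimicking Seconds, Latitude and Longitude (values are made up)
rng = np.random.default_rng(0)
n = 600
X_demo = np.column_stack([
    rng.uniform(1, 60, n),     # "Seconds"
    rng.uniform(20, 60, n),    # "Latitude"
    rng.uniform(-160, 10, n),  # "Longitude"
])
# Hypothetical rule: label 1 for eastern longitudes, 0 otherwise
y_demo = (X_demo[:, 2] > -30).astype(int)

# Standardize the features, then fit with a higher iteration cap
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

train_accuracy = scaled_model.score(X_demo, y_demo)
print("Training accuracy:", train_accuracy)
```

Applied to the real data, the pipeline would be fitted on X_train and y_train in place of the synthetic arrays; whether it changes the reported metrics would need to be checked by rerunning the notebook.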


## Pickle your model
Once pickled, I will load the model back and test it against a sample input containing values for seconds, latitude, and longitude.

In [27]:
import pickle

In [29]:
model_filename = 'ufo-model.pkl'
# Use a context manager so the file is closed after writing
with open(model_filename, 'wb') as f:
    pickle.dump(model, f)

In [34]:
with open('ufo-model.pkl', 'rb') as f:
    model = pickle.load(f)
input_data = pd.DataFrame([[50, 44, -12]], columns=['Seconds', 'Latitude', 'Longitude'])
print(model.predict(input_data))

[1]


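So for an unseen input of 50 seconds at coordinates (44, -12), the predicted label is [1]. LabelEncoder assigns integer labels in alphabetical order of the classes (au=0, ca=1, de=2, gb=3, us=4, consistent with gb=3 and us=4 in the encoded table above), so label 1 corresponds to ca (Canada), not the USA.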
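
One way to avoid guessing what a predicted label means is to store the fitted encoder together with the model. The sketch below is only an illustration under stated assumptions: the tiny training set is made up, the dictionary keys "model" and "encoder" are my own naming, and the bundle is serialized in memory with pickle.dumps rather than written to ufo-model.pkl.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Tiny made-up training set: [Seconds, Latitude, Longitude] and a country code
X_demo = np.array([
    [20, 53.2, -2.9],
    [30, 51.8, -0.8],
    [20, 29.0, -96.6],
    [60, 45.6, -122.4],
])
countries = ["gb", "gb", "us", "us"]

encoder = LabelEncoder()
y_demo = encoder.fit_transform(countries)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Serialize both objects together (in memory here; a file works the same way)
blob = pickle.dumps({"model": clf, "encoder": encoder})

# Later: restore the bundle and decode a prediction into a country code
bundle = pickle.loads(blob)
label = bundle["model"].predict([[25, 52.0, -1.5]])
decoded = bundle["encoder"].inverse_transform(label)
print(decoded)
```

With the encoder restored alongside the model, the web app can show a country code instead of a bare integer.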

## Build a Flask App
Start by creating a folder called web-app next to the notebook.ipynb file where your ufo-model.pkl file resides.

In that folder, create three more folders: static, with a folder css inside it, and templates.

1. Create these folders in the working directory of the .ipynb file. You should end up with the following files and directories:
web-app/
  static/
    css/
  templates/
notebook.ipynb
ufo-model.pkl

2. Create a requirements.txt file inside web-app with this content:
scikit-learn
pandas
numpy
flask

3. Make sure a venv or Conda environment is active. To activate a Conda environment from the terminal:
    conda activate ml_env
Check whether the required packages are installed by running:
    conda list

4. The tutorial says to install the requirements by running "pip install -r requirements.txt" in the terminal. Since my dependencies are managed by conda, I don't need to do this, but it's good to know there is an alternative.
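
Steps 1 and 2 above can also be scripted rather than done by hand. The sketch below is one possible way to do it with pathlib; the function name `scaffold_web_app` is hypothetical, and the demo runs in a temporary directory so it does not touch the real project folder.

```python
import tempfile
from pathlib import Path

# Package list taken from step 2
REQUIREMENTS = ["scikit-learn", "pandas", "numpy", "flask"]


def scaffold_web_app(base_dir):
    """Create the web-app folder tree and requirements.txt under base_dir."""
    web_app = Path(base_dir) / "web-app"
    # parents=True creates intermediate folders; exist_ok makes reruns harmless
    (web_app / "static" / "css").mkdir(parents=True, exist_ok=True)
    (web_app / "templates").mkdir(parents=True, exist_ok=True)
    (web_app / "requirements.txt").write_text("\n".join(REQUIREMENTS) + "\n")
    return web_app


# Demonstrate in a throwaway directory; pass the notebook's folder to do it for real
with tempfile.TemporaryDirectory() as tmp:
    web_app = scaffold_web_app(tmp)
    created = sorted(p.relative_to(tmp).as_posix() for p in web_app.rglob("*"))
    requirements_text = (web_app / "requirements.txt").read_text()
    print(created)
```

Calling `scaffold_web_app(".")` from the directory that holds notebook.ipynb and ufo-model.pkl would create the layout shown in step 1.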

