Import pandas and numpy as you did in previous lessons, then load the ufos spreadsheet and take a look at the first few rows:

In [2]:
import pandas as pd
import numpy as np

ufos = pd.read_csv('./data/ufos.csv')
ufos.head()

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700.0,45 minutes,This event took place in early fall around 194...,4/27/2004,29.883056,-97.941111
1,10/10/1949 21:00,lackland afb,tx,,light,7200.0,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,10/10/1955 17:00,chester (uk/england),,gb,circle,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
3,10/10/1956 21:00,edna,tx,us,circle,20.0,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.978333,-96.645833
4,10/10/1960 20:00,kaneohe,hi,us,light,900.0,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.418056,-157.803611


Convert the ufos data to a smaller dataframe with fresh column names. Then check the unique values in the `Country` field.

In [3]:
ufos = pd.DataFrame({'Seconds': ufos['duration (seconds)'], 'Country': ufos['country'],'Latitude': ufos['latitude'],'Longitude': ufos['longitude']})

ufos.Country.unique()

array(['us', nan, 'gb', 'ca', 'au', 'de'], dtype=object)

Now you can reduce the amount of data you need to deal with by dropping any null values and keeping only sightings that lasted between 1 and 60 seconds:

In [4]:
ufos.dropna(inplace=True)

ufos = ufos[(ufos['Seconds'] >= 1) & (ufos['Seconds'] <= 60)]

ufos.info()

<class 'pandas.core.frame.DataFrame'>
Index: 25863 entries, 2 to 80330
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Seconds    25863 non-null  float64
 1   Country    25863 non-null  object 
 2   Latitude   25863 non-null  float64
 3   Longitude  25863 non-null  float64
dtypes: float64(3), object(1)
memory usage: 1010.3+ KB
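
The same cleaning step can be tried on a tiny frame to see exactly which rows survive. This is a minimal sketch with made-up values; `sample` and `cleaned` are illustrative names, and `Series.between` is used here as an equivalent way to write the two comparisons above (it is inclusive on both ends by default).

```python
import pandas as pd

# Tiny stand-in for the ufos frame; these values are made up for illustration.
sample = pd.DataFrame({
    "Seconds": [0.5, 20.0, 60.0, 7200.0, 30.0],
    "Country": ["us", "gb", None, "us", "ca"],
})

# Drop rows with any missing value, then keep durations from 1 to 60 seconds.
cleaned = sample.dropna()
cleaned = cleaned[cleaned["Seconds"].between(1, 60)]

print(cleaned)
```

Only the rows labelled 1 and 4 remain, and they keep their original index labels. That is why `ufos.info()` above reports an index running from 2 to 80330 rather than a fresh 0-based range.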


Import Scikit-learn's `LabelEncoder` class to convert the text values for countries to numbers:

✅ LabelEncoder assigns ids in alphabetical order of the labels, so here 'au' becomes 0 and 'us' becomes 4

In [5]:
from sklearn.preprocessing import LabelEncoder

ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])

ufos.head()

Unnamed: 0,Seconds,Country,Latitude,Longitude
2,20.0,3,53.2,-2.916667
3,20.0,4,28.978333,-96.645833
14,30.0,4,35.823889,-80.253611
23,60.0,4,45.582778,-122.352222
24,3.0,3,51.783333,-0.783333
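
To see which id stands for which country, you can keep a reference to the encoder instead of discarding it. The sketch below is a small illustration rather than part of the lesson's pipeline: the `countries` list is a hand-written stand-in for the `Country` column, and `mapping` is an illustrative name.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written stand-in for the Country column (the five codes seen above).
countries = ["us", "gb", "ca", "au", "de", "us"]

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the sorted labels; each label's position is its id.
mapping = {label: idx for idx, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns ids back into country codes.
print(list(encoder.inverse_transform([3, 4])))
```

This agrees with the table above, where the sighting over England has id 3 and the US sightings have id 4. Keeping the fitted encoder around is one way to translate a predicted id back into a country code later.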


# Build the model

Select the three features you want to train on as your X vector; the y vector will be `Country`. The goal is to input `Seconds`, `Latitude` and `Longitude` and get a country id back.

In [7]:
from sklearn.model_selection import train_test_split

selected_features = ['Seconds', 'Latitude', 'Longitude']

X = ufos[selected_features]
y = ufos['Country']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
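
The size of each split can be worked out by hand. The sketch below assumes that `train_test_split` rounds the test count up and gives the remaining rows to training; the row count comes from the `ufos.info()` output above, and the variable names are illustrative.

```python
import math

n_rows = 25863      # rows left after cleaning, per ufos.info()
test_size = 0.20    # the fraction passed to train_test_split

# Assumption: the test count is rounded up, the rest goes to training.
n_test = math.ceil(n_rows * test_size)
n_train = n_rows - n_test

print("train rows:", n_train)
print("test rows:", n_test)
```

The test count of 5173 matches the total support that the classification report shows for the test set.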

Train your model using logistic regression:

In [14]:
from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, r2_score
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(classification_report(y_test, predictions))
print('Predicted labels:', predictions)
print('Accuracy: ', accuracy_score(y_test, predictions))

# RMSE (computed on the encoded country ids, so it depends on the arbitrary id order)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print("RMSE:", rmse)

accuracy = accuracy_score(y_test, predictions)
print("Accuracy (decimal):", accuracy)
print("Accuracy (%):", accuracy * 100, "%")

# R² (also computed on the encoded ids; read it with the same caution)
r2 = r2_score(y_test, predictions)
print("R²:", r2)

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        41
           1       0.83      0.32      0.47       250
           2       0.88      0.88      0.88         8
           3       0.99      1.00      1.00       131
           4       0.97      1.00      0.98      4743

    accuracy                           0.96      5173
   macro avg       0.93      0.83      0.86      5173
weighted avg       0.96      0.96      0.96      5173

Predicted labels: [4 4 4 ... 3 4 4]
Accuracy:  0.9636574521554224
RMSE: 0.5697103492828886
Accuracy (decimal): 0.9636574521554224
Accuracy (%): 96.36574521554225 %
R²: 0.4111634026240699
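
Accuracy and the classification report are the natural metrics here, because the country ids are category labels rather than quantities. RMSE and R² treat those ids as numbers, so their values change if the classes are numbered differently. The following sketch uses made-up labels (not the lesson's data) to show this: renumbering the classes leaves accuracy untouched but changes RMSE.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Made-up labels: five test rows with one mistake (a true 1 predicted as 4).
y_true = np.array([0, 1, 4, 4, 3])
y_pred = np.array([0, 4, 4, 4, 3])

# Renumber the same classes by swapping ids 1 and 3. The predictions are
# exactly as right or wrong as before; only the arbitrary ids changed.
swap = {0: 0, 1: 3, 2: 2, 3: 1, 4: 4}
y_true_renumbered = np.array([swap[v] for v in y_true.tolist()])
y_pred_renumbered = np.array([swap[v] for v in y_pred.tolist()])

acc_original = accuracy_score(y_true, y_pred)
acc_renumbered = accuracy_score(y_true_renumbered, y_pred_renumbered)

rmse_original = np.sqrt(mean_squared_error(y_true, y_pred))
rmse_renumbered = np.sqrt(mean_squared_error(y_true_renumbered, y_pred_renumbered))

print("accuracy:", acc_original, "vs", acc_renumbered)
print("RMSE:", rmse_original, "vs", rmse_renumbered)
```

In the report above, the per-class recall is the more informative number: class 1 has a recall of only 0.32, which the overall accuracy of 0.96 hides because class 4 dominates the test set.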


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=100).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
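
The warning above suggests two remedies: raising `max_iter` and scaling the features. One possible way to apply both is a pipeline that standardizes the inputs before the logistic regression. This is a hedged sketch on synthetic data, not the lesson's result: the feature ranges, the two class ids and the value `max_iter=1000` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for (Seconds, Latitude, Longitude); values are made up.
seconds = rng.uniform(1, 60, n)
latitude = rng.uniform(25, 55, n)
longitude = np.concatenate([
    rng.uniform(-125, -70, n // 2),   # first half: far-west longitudes
    rng.uniform(-8, 2, n // 2),       # second half: longitudes near zero
])
X_demo = np.column_stack([seconds, latitude, longitude])
y_demo = np.array([4] * (n // 2) + [3] * (n // 2))

# Standardize each feature, then fit with a higher iteration limit.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

print("training accuracy:", scaled_model.score(X_demo, y_demo))
```

With the real data you would fit the pipeline on `X_train` and `y_train` in place of the synthetic arrays. Because the scaler lives inside the pipeline, pickling the pipeline would save the scaling step together with the model.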


# Pickle the model

In [15]:
import pickle

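model_filename = 'ufo-model.pkl'

# Save the trained model; the with-block closes the file afterwards
with open(model_filename, 'wb') as f:
    pickle.dump(model, f)

# Load it back and try a prediction for (Seconds, Latitude, Longitude)
with open(model_filename, 'rb') as f:
    model = pickle.load(f)
print(model.predict([[50, 44, -12]]))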

[3]
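
A quick way to check that pickling preserved the model is to compare predictions before and after the round trip. The sketch below uses a throwaway model trained on four made-up rows and a temporary folder, so it does not touch `ufo-model.pkl`; the file name `demo-model.pkl` is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Four made-up rows of (Seconds, Latitude, Longitude) with two country ids.
X_demo = np.array([
    [10.0, 30.0, -97.0],
    [20.0, 29.0, -98.0],
    [15.0, 53.0, -2.9],
    [5.0, 51.8, -0.8],
])
y_demo = np.array([4, 4, 3, 3])
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Write the model to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo-model.pkl"
    with open(path, "wb") as f:
        pickle.dump(demo_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

same = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print("predictions match after round trip:", same)
```

Only unpickle files you trust: loading a pickle can run arbitrary code from the file.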




# Build the Flask app

In [29]:
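# `!cd` runs in a separate subshell and does not change the notebook's working directory,
# so the requirements file is referenced by its path from the current folder instead
!pip install -r web-app/requirements.txt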


Collecting flask (from -r web-app/requirements.txt (line 4))
  Downloading flask-3.1.2-py3-none-any.whl.metadata (3.2 kB)
Collecting blinker>=1.9.0 (from flask->-r web-app/requirements.txt (line 4))
  Downloading blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Collecting click>=8.1.3 (from flask->-r web-app/requirements.txt (line 4))
  Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting itsdangerous>=2.2.0 (from flask->-r web-app/requirements.txt (line 4))
  Downloading itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
Collecting jinja2>=3.1.2 (from flask->-r web-app/requirements.txt (line 4))
  Downloading jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting markupsafe>=2.1.1 (from flask->-r web-app/requirements.txt (line 4))
  Using cached MarkupSafe-3.0.2-cp311-cp311-win_amd64.whl.metadata (4.1 kB)
Collecting werkzeug>=3.1.0 (from flask->-r web-app/requirements.txt (line 4))
  Using cached werkzeug-3.1.3-py3-none-any.whl.metadata (3.7 kB)
Downloading fla




In [32]:
!python app.py

 * Serving Flask app 'app'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 275-041-379
 * Detected change in 'c:\\Users\\pc\\Documents\\ML-For-Beginners\\3-Web-App\\1-Web-App\\Model to Pickle\\app.py', reloading
 * Restarting with stat
Traceback (most recent call last):
  File "c:\Users\pc\Documents\ML-For-Beginners\3-Web-App\1-Web-App\Model to Pickle\app.py", line 7, in <module>
    model = pickle.load(open(r"3-Web-App\1-Web-App\Model to Pickle\ufo-model.pkl", "rb"))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '3-Web-App\\1-Web-App\\Model to Pickle\\ufo-model.pkl'
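
The traceback shows `app.py` opening the pickle through a relative path, which is resolved against the directory the process was started from, not against the folder that contains `app.py`. One possible fix is to build the path from the script's own location. The helpers below are a hedged sketch: `model_path_next_to` and `load_model` are hypothetical names, and the sketch assumes `ufo-model.pkl` sits in the same folder as `app.py`.

```python
import pickle
from pathlib import Path


def model_path_next_to(script_file, name="ufo-model.pkl"):
    """Return the path of `name` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / name


def load_model(model_path):
    """Load a pickled model, with a clear error if the file is missing."""
    model_path = Path(model_path)
    if not model_path.is_file():
        raise FileNotFoundError(f"Model file not found: {model_path.resolve()}")
    with open(model_path, "rb") as f:
        return pickle.load(f)


# Inside app.py (assumption: the pickle is stored beside the script):
# model = load_model(model_path_next_to(__file__))

# Illustration with a made-up script location:
print(model_path_next_to("some-folder/app.py").name)
```

With the path anchored to `__file__`, the app no longer depends on which directory `python app.py` is launched from.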
