# Machine Learning V

We now have a function that, as developers, we can call from anywhere in our code base to take inputs and return predictions. For example, we could use it to flag NSFW images.

But at the moment we don't have much data. We've only used a dataset that has 150 rows. In real life, training a complicated model can involve millions of rows.

At that scale the training step (the call to `knn.fit()` below) takes a long time. Most of the time we run this kind of operation on a GPU, sometimes in the cloud, because a regular computer wouldn't be able to handle it.

In [10]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_predict = knn.predict(X_test)

This part is very time consuming. We wouldn't want every user who submits a snake image to trigger this training step. The fix is to train once and keep the result, which we call model persistence.

We save the trained model to a file, and the next time we want to make a prediction we load that file instead of training again.

When we're on an iPhone and we're using a machine learning feature, the phone isn't training a model. It runs a model that was trained beforehand and is already there on our phone. __Scikit-Learn__ models can be handled the same way through __model persistence__.

In [11]:
from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data  # our input data
y = iris.target # our labels

feature_names = iris.feature_names # column names (pre-defined)
target_names = iris.target_names # target names (pre-defined)
feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [12]:
from sklearn.model_selection import train_test_split

from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data  # our input data
y = iris.target # our labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # 80% training and 20% test
print(X_train.shape)
print(X_test.shape)

(120, 4)
(30, 4)


In [13]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_predict = knn.predict(X_test)

In [14]:
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_predict))

0.9333333333333333


In [15]:
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
predictions = knn.predict(sample)
predict_species = [iris.target_names[p] for p in predictions]
print("Predictions: ", predict_species)

Predictions:  ['versicolor', 'virginica']


Now we store the trained model in a binary file:

In [18]:
import joblib
joblib.dump(knn, 'mlbrain.joblib')  # write the fitted model to disk

['mlbrain.joblib']

We now have the `joblib` file created. Instead of retraining the model in the future, we can load it:

In [19]:
model = joblib.load('mlbrain.joblib')
model.predict(X_test)

array([1, 0, 1, 0, 1, 1, 2, 2, 0, 2, 1, 0, 0, 2, 2, 2, 1, 0, 2, 1, 2, 1,
       1, 2, 1, 2, 0, 2, 1, 1])

This gives us our predictions. If we reuse our sample data with the loaded model, it still works:

In [21]:
model = joblib.load('mlbrain.joblib')  # reload the saved model; no training needed
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
predictions = model.predict(sample)
predict_species = [iris.target_names[p] for p in predictions]
print("Predictions: ", predict_species)

Predictions:  ['versicolor', 'virginica']
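One way to convince ourselves that nothing was lost when saving is to compare the original model and the reloaded one on the same test rows. The sketch below is self-contained and makes a few assumptions of its own: it fixes `random_state=0` for repeatability, and it writes to a temporary file with the made-up name `roundtrip_check.joblib` so it doesn't touch `mlbrain.joblib`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# random_state is an assumption added here so the split is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Save and reload inside a temporary directory (hypothetical file name)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_check.joblib")
    joblib.dump(knn, path)
    restored = joblib.load(path)

# The reloaded model should give exactly the same predictions as the original
same_predictions = np.array_equal(knn.predict(X_test), restored.predict(X_test))
print("Loaded model matches original:", same_predictions)
```

If the two prediction arrays ever differed, that would point to a problem with the saved file or a mismatch between library versions at save time and load time.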


We were able to save our model so that we don't have to retrain it for every prediction, since training takes a lot of computing power.
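The save-once, load-later idea can be wrapped in a small helper. This is only a sketch: `load_or_train` and the file name `mlbrain_demo.joblib` are hypothetical names invented for illustration, and for brevity the helper trains on all 150 iris rows rather than on a train/test split.

```python
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier


def load_or_train(model_path):
    """Return a fitted classifier, training only when no saved file exists."""
    model_path = Path(model_path)
    if model_path.exists():
        # Fast path: reuse the model saved by an earlier run
        return joblib.load(str(model_path))

    # Slow path: train once, then save for next time
    iris = load_iris()
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(iris.data, iris.target)
    joblib.dump(model, str(model_path))
    return model


model = load_or_train("mlbrain_demo.joblib")  # hypothetical file name
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
target_names = load_iris().target_names
species = [target_names[p] for p in model.predict(sample)]
print("Predictions:", species)
```

The first run trains and writes the file; every later run skips training and goes straight to `joblib.load`.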