# The Model-Fitting Paradigm in Python

To fit a machine learning model in Python, we typically follow a common workflow: split the data, instantiate a model, fit it, then predict and evaluate. The details vary by library and model, so always check the documentation.

Here's an example using kNN classification with a fruit dataset, following [Susan Li's Medium post titled "Solving A Simple Classification Problem with Python — Fruits Lovers’ Edition"](https://towardsdatascience.com/solving-a-simple-classification-problem-with-python-fruits-lovers-edition-d20ab6b071d2).

Import libraries and load the fruit data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
# Import kNN tools:
from sklearn.neighbors import KNeighborsClassifier
# Import accuracy calculator:
from sklearn.metrics import accuracy_score
# Import train-test-split tool:
from sklearn.model_selection import train_test_split
# Import data (a tab-separated text file):
fruit = pd.read_table('https://raw.githubusercontent.com/susanli2016/Machine-Learning-with-Python/master/fruit_data_with_colors.txt')
```


Take a peek at the fruit data:

```python
fruit.head()
```

|   | fruit_label | fruit_name | fruit_subtype | mass | width | height | color_score |
|---|---|---|---|---|---|---|---|
| 0 | 1 | apple | granny_smith | 192 | 8.4 | 7.3 | 0.55 |
| 1 | 1 | apple | granny_smith | 180 | 8.0 | 6.8 | 0.59 |
| 2 | 1 | apple | granny_smith | 176 | 7.4 | 7.2 | 0.6 |
| 3 | 2 | mandarin | mandarin | 86 | 6.2 | 4.7 | 0.8 |
| 4 | 2 | mandarin | mandarin | 84 | 6.0 | 4.6 | 0.79 |


First, extract the response as a pandas Series and the predictors as a pandas DataFrame. We'll choose `fruit_name` as the response, and `mass`, `width`, `height`, and `color_score` as our predictors.

```python
y = fruit["fruit_name"]
X = fruit[["mass", "width", "height", "color_score"]]
```
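The single vs. double bracket distinction is easy to miss, so here is a minimal sketch that rebuilds only the five rows printed by `fruit.head()` (so it runs without downloading anything) and checks what each selection returns. Single brackets give a Series, double brackets give a DataFrame. scikit-learn accepts either, and `X.to_numpy()` is one way to get a plain NumPy array if you specifically want one.

```python
import pandas as pd

# Rebuild only the five rows printed by fruit.head(), so this runs offline
fruit = pd.DataFrame({
    "fruit_label": [1, 1, 1, 2, 2],
    "fruit_name": ["apple", "apple", "apple", "mandarin", "mandarin"],
    "fruit_subtype": ["granny_smith"] * 3 + ["mandarin"] * 2,
    "mass": [192, 180, 176, 86, 84],
    "width": [8.4, 8.0, 7.4, 6.2, 6.0],
    "height": [7.3, 6.8, 7.2, 4.7, 4.6],
    "color_score": [0.55, 0.59, 0.60, 0.80, 0.79],
})

y = fruit["fruit_name"]                                 # single brackets: a Series
X = fruit[["mass", "width", "height", "color_score"]]  # double brackets: a DataFrame

print(type(y).__name__, y.shape)  # Series (5,)
print(type(X).__name__, X.shape)  # DataFrame (5, 4)

# If a plain NumPy array is needed, convert explicitly
X_array = X.to_numpy()
print(X_array.shape)  # (5, 4)
```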

Now, split the data into training and test sets using `train_test_split()`. Here, 20% of the rows are held out for testing:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
```
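The call above sets no `random_state`, so each run draws a different random split, and a rerun may not reproduce the score shown at the end. The sketch below is one way to make the split repeatable. It uses scikit-learn's bundled iris dataset as a stand-in for the fruit data so it runs offline; `random_state=42` and `stratify=y` are illustrative choices, not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn, so it stands in for the fruit data here
X, y = load_iris(return_X_y=True)  # 150 rows, 4 features, 3 classes of 50

# random_state fixes the shuffle; stratify keeps class proportions equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]

# Repeating the call with the same random_state reproduces the identical split
_, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.array_equal(X_test, X_test_again))  # True
```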

Second, instantiate a model by calling the estimator's class. For kNN classification, that's `KNeighborsClassifier()`.

```python
model = KNeighborsClassifier()
```
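Hyperparameters are set at this step, before the model sees any data. The sketch below shows that the no-argument call uses the defaults that appear in the fitted model's printout below (`n_neighbors=5`, `weights='uniform'`), and how to override them. The values `n_neighbors=3` and `weights="distance"` are arbitrary illustrations.

```python
from sklearn.neighbors import KNeighborsClassifier

# With no arguments, kNN uses its default hyperparameters
default_model = KNeighborsClassifier()
print(default_model.get_params()["n_neighbors"])  # 5
print(default_model.get_params()["weights"])      # uniform

# Hyperparameters are chosen at creation time, before the model sees any data
model = KNeighborsClassifier(n_neighbors=3, weights="distance")
print(model.get_params()["n_neighbors"])  # 3
print(model.get_params()["weights"])      # distance
```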

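Now, fit the model by calling the `.fit()` method on the instantiated model. This modifies the `model` object in place! `.fit()` also returns the model itself, which is why the notebook echoes its parameters: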

```python
model.fit(X_train, y_train)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
```
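To make "this modifies the `model` object" concrete, the sketch below checks the model's state before and after fitting. It again uses scikit-learn's bundled iris data as an offline stand-in for the fruit data. An unfitted model raises `NotFittedError` on `.predict()`; after `.fit()`, learned attributes such as `classes_` exist, and the returned object is the very same `model`.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.neighbors import KNeighborsClassifier

# Iris stands in for the fruit data so the sketch runs offline
X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier()

# Before .fit(), the model has learned nothing, so .predict() raises an error
raised_before_fit = False
try:
    model.predict(X[:1])
except NotFittedError:
    raised_before_fit = True
print(raised_before_fit)  # True

# .fit() stores what it learned on the object itself and returns that same object
returned = model.fit(X, y)
print(returned is model)  # True
print(model.classes_)     # [0 1 2]
```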

Now we can make predictions and evaluate accuracy by calling methods on `model`. These _do not_ modify the `model` object!

```python
print(model.predict(X_test))
```

```
['mandarin' 'apple' 'orange' 'orange' 'lemon' 'lemon' 'orange' 'apple'
 'orange' 'apple' 'orange' 'apple']
```
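The raw prediction array is hard to judge on its own. A minimal sketch for lining predictions up against the true labels is shown below, again with scikit-learn's bundled iris data standing in for the fruit data and an arbitrary `random_state=0`. The column names `actual`, `predicted`, and `correct` are just illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris stands in for the fruit data; random_state=0 is an arbitrary choice
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier().fit(X_train, y_train)

# .predict() returns one label per row of X_test
predictions = model.predict(X_test)

# Line the predictions up against the true labels to spot the mistakes
comparison = pd.DataFrame({"actual": y_test, "predicted": predictions})
comparison["correct"] = comparison["actual"] == comparison["predicted"]
print(comparison.head())
print(comparison["correct"].sum(), "of", len(comparison), "correct")
```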


```python
model.score(X_test, y_test)
```

```
0.5
```
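For classifiers, `.score()` returns the mean accuracy, so 0.5 means half of the 12 test fruits (6 of them) were labeled correctly. This is also where the otherwise unused `accuracy_score` import fits: the sketch below checks that both give the same number and takes the error rate as its complement. It uses scikit-learn's bundled iris data as an offline stand-in for the fruit data, with an arbitrary `random_state=0`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris stands in for the fruit data; random_state=0 is an arbitrary choice
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier().fit(X_train, y_train)

# For classifiers, .score() is the fraction of correctly predicted labels
score = model.score(X_test, y_test)
manual = accuracy_score(y_test, model.predict(X_test))
print(score == manual)  # True

# The misclassification (error) rate is the complement of accuracy
error_rate = 1 - score
print(error_rate)
```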