# Machine Learning 2: Creating a Model

How do we create a model? We start from the data we've already loaded and split:

In [1]:
from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data  # our input data
y = iris.target # our labels

feature_names = iris.feature_names # column names (pre-defined)
target_names = iris.target_names # target names (pre-defined)
feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [3]:
from sklearn.model_selection import train_test_split

from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data  # our input data
y = iris.target # our labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # 80% training and 20% test
print(X_train.shape)
print(X_test.shape)

(120, 4)
(30, 4)
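`train_test_split` shuffles the rows at random by default, so the exact flowers that land in each set, and any score computed from them, can differ from run to run. The following is a minimal sketch, assuming we want a repeatable split; the seed value `42` is an arbitrary choice for illustration, not something the original notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# random_state=42 is an arbitrary seed chosen for this sketch
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)

# The same seed selects the same rows in the same order
print((split_a[0] == split_b[0]).all())
```

Leaving `random_state` out, as the cell above does, is fine for exploring; fixing it mainly helps when we want to compare results between runs.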


scikit-learn comes with many different algorithms for building a model. We'll choose [Nearest Neighbors](https://scikit-learn.org/stable/modules/neighbors.html), which classifies a flower by looking at the training samples closest to it. There are plenty of other algorithms to choose from, and deciding which one suits a problem is a large part of what ML engineers do.

`KNeighborsClassifier` is a class: we create an instance and tell it how many neighbors to consult through the `n_neighbors` parameter. This is the number of nearby training points that vote on each prediction, not the number of classes. We use 3 in the following example, which happens to match the three types of Iris flowers, but the two numbers are independent.

We then fit the model by giving it the training inputs `X_train` and the training labels `y_train`.

In [4]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=3)
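To make the role of `n_neighbors` concrete, here is a minimal sketch that asks the fitted model which training points it consults for each test flower and then takes a majority vote by hand. It uses the `kneighbors` method of `KNeighborsClassifier`; the seed `random_state=0` and the helper names `neighbor_labels` and `votes` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# random_state=0 is an arbitrary seed, used only to make the sketch repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# For each test flower: distances to, and row indices of, its 3 nearest training flowers
distances, indices = knn.kneighbors(X_test)
neighbor_labels = y_train[indices]  # shape (30, 3): the labels of those neighbors

# Majority vote among each flower's 3 neighbors
votes = np.array([np.bincount(row, minlength=3).argmax() for row in neighbor_labels])

print(neighbor_labels[0], "->", iris.target_names[votes[0]])
# The hand-made vote should agree with the model's own predictions
print((votes == knn.predict(X_test)).all())
```

Each row of `neighbor_labels` holds three labels because `n_neighbors=3`; changing that value changes how many training points vote, not how many species the model can predict.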

The fitted model on its own doesn't tell us how good it is. We've created the model, but we now need to check the quality of its predictions.

We can use `predict` to try the model out on our test data, then compare the output, `y_predict`, with the actual answers. We created four different data sets:

- `X_train`
- `X_test` - the test inputs we give our new model
- `y_train`
- `y_test`

We need to compare the prediction with the `y_test` variable.



In [5]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_predict = knn.predict(X_test)

scikit-learn has a nice way for us to do this comparison. Now we can print the actual accuracy of our model:

In [6]:
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_predict))

0.9666666666666667


Our accuracy is about 97%: the model classified 29 of the 30 test flowers correctly.
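The accuracy score is simply the fraction of test labels the model predicted correctly. As a rough sketch of that arithmetic, we can count the matches ourselves and compare the result with `metrics.accuracy_score`. The seed `random_state=0` and the names `matches` and `manual_accuracy` are assumptions for this illustration, so the score here may differ from the one printed above.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# random_state=0 is an arbitrary seed, used only to make the sketch repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_predict = knn.predict(X_test)

# True where the prediction equals the real label
matches = y_predict == y_test
manual_accuracy = matches.sum() / len(y_test)

print(matches.sum(), "of", len(y_test), "correct")
# The hand-computed fraction should agree with scikit-learn's metric
print(np.isclose(manual_accuracy, metrics.accuracy_score(y_test, y_predict)))
```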

So far, we've done:

- imported the data
- split the data
- created a `KNeighborsClassifier` model in scikit-learn
- checked the output

Now we can __improve__ our ML algorithm.
