# Getting some data

Scikit-learn ships with several small datasets that we can use for examples.

In [1]:
from sklearn import datasets

In [2]:
import pandas as pd
import numpy as np

In [3]:
iris = datasets.load_iris()
iris_features = iris.data
iris_target = iris.target

In [4]:
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target_names[iris.target]
iris_df.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [None]:
iris.target_names

If we want to predict the species of a flower from its features, we are solving a **classification** problem.
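What makes this a classification problem is that the target takes only a small set of discrete values. A quick sketch to check this on the iris data (the variable names `class_labels` and `class_counts` are chosen here for illustration):

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Distinct values that appear in the target array
class_labels = np.unique(iris.target)

# Number of samples per class; bincount works because the labels are small non-negative integers
class_counts = np.bincount(iris.target)

for label, count in zip(class_labels, class_counts):
    print(label, iris.target_names[label], count)
```

Each of the three species should appear the same number of times, so the dataset is balanced.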

# Estimator objects

The main API implemented by scikit-learn is that of the **estimator**. An estimator is an object that holds a model and learns its parameters from data through its `fit` method.
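As a rough illustration of that shared interface, the sketch below trains two unrelated classifiers with exactly the same calls. The choice of `DecisionTreeClassifier` as the second model and the name `first_predictions` are assumptions made only for this example.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()

# Two different models, picked only to show that the calls are identical
estimators = [
    KNeighborsClassifier(n_neighbors=3),
    DecisionTreeClassifier(random_state=0),
]

first_predictions = {}
for estimator in estimators:
    # Every estimator learns from data with fit(X, y)
    estimator.fit(iris.data, iris.target)
    # Every classifier then predicts labels with predict(X)
    name = type(estimator).__name__
    first_predictions[name] = estimator.predict(iris.data[:5])
    print(name, first_predictions[name])
```

Because the interface is the same, swapping one model for another usually means changing only the import and the constructor call.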

## 1. Import the estimator (model)

You need to know which estimator you would like to use. The scikit-learn [documentation](https://scikit-learn.org/stable/tutorial/machine_learning_map/) has a very helpful map for choosing one.

In [5]:
from sklearn.neighbors import KNeighborsClassifier

## 2. Create an instance of the estimator

In [6]:
flower_classifier = KNeighborsClassifier(n_neighbors=3)
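The arguments passed to the constructor are the model's hyperparameters; anything not set explicitly keeps its default. One way to inspect them, sketched here with the `get_params` method that scikit-learn estimators provide (the variable name `params` is just for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

flower_classifier = KNeighborsClassifier(n_neighbors=3)

# get_params() returns a dict with every constructor argument and its current value
params = flower_classifier.get_params()

print(params["n_neighbors"])  # the value we passed in
print(params["weights"])      # a default we did not set
```

Nothing has been learned at this point: the instance only stores its configuration until `fit` is called.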

## 3. Use the data to train the estimator

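Remember:
1. Scikit-learn estimators expect numeric features, so non-numeric values must be encoded as numbers first
2. The object containing the features must be a two-dimensional `np.array` with one row per sample and one column per feature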
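The two points above can be sketched as follows. The measurements and species strings are made-up sample values, and `LabelEncoder` is shown as one possible way to turn names into integer codes like those in `iris.target`, not necessarily how the dataset itself was prepared.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# A single flower stored as a flat array has shape (4,), which is one-dimensional
single_flower = np.array([5.1, 3.5, 1.4, 0.2])

# reshape(1, -1) turns it into one row with four columns: shape (1, 4)
single_flower_2d = single_flower.reshape(1, -1)
print(single_flower.shape, single_flower_2d.shape)

# Text labels can be mapped to integer codes; classes are sorted alphabetically
species = np.array(["setosa", "virginica", "versicolor", "setosa"])
encoder = LabelEncoder()
species_codes = encoder.fit_transform(species)
print(species_codes)     # [0 2 1 0]
print(encoder.classes_)  # ['setosa' 'versicolor' 'virginica']
```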

In [7]:
iris_features[:10,:]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 4.6,  3.4,  1.4,  0.3],
       [ 5. ,  3.4,  1.5,  0.2],
       [ 4.4,  2.9,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1]])

In [8]:
iris_target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

0 ==> setosa

1 ==> versicolor

2 ==> virginica

In [9]:
flower_classifier.fit(X=iris_features, y=iris_target)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')

## 4. Evaluate the model

We will skip this important step here.
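For reference only, here is a minimal sketch of what such an evaluation might look like, assuming a simple hold-out split. The names `eval_classifier` and `accuracy`, and the choices `test_size=0.25` and `random_state=0`, are assumptions for this example and not part of the walkthrough.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()

# Hold out a quarter of the flowers; stratify keeps the species proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

# Train a fresh classifier on the training part only
eval_classifier = KNeighborsClassifier(n_neighbors=3)
eval_classifier.fit(X_train, y_train)

# score() returns the mean accuracy on data the model has not seen
accuracy = eval_classifier.score(X_test, y_test)
print(f"Accuracy on held-out flowers: {accuracy:.2f}")
```

Measuring accuracy on the same data used for training would give an overly optimistic number, which is why the test rows are kept separate.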

## 5. Use the model to make predictions

In [10]:
# The features must be a two-dimensional array, even for a single flower
new_flower1 = np.array([[5.1, 3.0, 1.1, 0.5]])
new_flower2 = np.array([[6.0, 2.9, 4.5, 1.1]])

0 ==> setosa

1 ==> versicolor

2 ==> virginica

In [11]:
flower_classifier.predict(new_flower1)

array([0])

In [12]:
flower_classifier.predict(new_flower2)

array([1])

In [13]:
new_flowers = np.array([[5.1, 3.0, 1.1, 0.5],[6.0, 2.9, 4.5, 1.1]])
predictions = flower_classifier.predict(new_flowers)
predictions

array([0, 1])
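The predictions are integer codes. A short sketch of turning them back into species names by indexing `iris.target_names`, the same trick used earlier to build the `target` column (the name `predicted_species` is chosen here for illustration):

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()

# Recreate the classifier from the steps above so this block runs on its own
flower_classifier = KNeighborsClassifier(n_neighbors=3)
flower_classifier.fit(X=iris.data, y=iris.target)

new_flowers = np.array([[5.1, 3.0, 1.1, 0.5], [6.0, 2.9, 4.5, 1.1]])
predictions = flower_classifier.predict(new_flowers)

# Indexing the array of names with the integer codes gives one name per prediction
predicted_species = iris.target_names[predictions]
print(predicted_species)
```

With the predictions shown above, this should print `setosa` for the first flower and `versicolor` for the second.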