# My first `scikit-learn` notebook

In [13]:
import pandas as pd
import numpy as np
from random import choices
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

Load a dataset

In [14]:
forecast = pd.read_csv('data/Forecast.csv')
forecast.head()

Unnamed: 0,Temperature,Humidity,Wind_Speed,Go-Out
0,6,85,30,0
1,14,90,35,0
2,15,86,8,1
3,21,56,15,1
4,17,67,9,1


Setup the `numpy` arrays to use to train classifiers

In [17]:
y = forecast.pop('Go-Out').values  # target feature
X = forecast.values                # training data
type(X),type(y)

KeyError: 'Go-Out'

Train a *k*-NN classifier

In [5]:
kNN = KNeighborsClassifier(n_neighbors=3) 
kNN.fit(X,y)
X_test = np.array([[8,70,11],
                   [8,69,15]])
kNN.predict(X_test)

array([1, 0])

All `sklearn` classifiers implement the `Estimator` API.

In [6]:
tree = DecisionTreeClassifier()
tree.fit(X,y)
tree.predict(X_test)

array([1, 1])

In [7]:
lr = LogisticRegression()
lr.fit(X,y)
lr.predict(X_test)

array([0, 0])

Swapping between classifiers (Estimators) makes model selection easy. 

In [8]:
cfrs = [kNN,tree,lr]
for cfr in cfrs:
    cfr.fit(X,y)
    print(cfr.predict(X_test))

[1 0]
[1 1]
[0 0]


## Preprocessing
All preprocessing modules implement the `Transformer`  API.

In [9]:
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X)   # standardise to zero mean and unit variance
X_scaled = scaler.transform(X)
X_test_scaled = scaler.transform(X_test)
X_test_scaled

array([[-1.59094327, -0.05406252, -0.79537086],
       [-1.59094327, -0.10040182, -0.37117307]])

In [10]:
mm_scaler = preprocessing.MinMaxScaler()        # standardise to range [0,1]
mm_scaler.fit(X)
X_scaled = mm_scaler.transform(X)
X_test_scaled = mm_scaler.transform(X_test)
X_test_scaled

array([[0.125     , 0.6875    , 0.17241379],
       [0.125     , 0.675     , 0.31034483]])