## 2.1 An introduction to machine learning with scikit-learn 

### 2.1.1 Machine learning: the problem setting

- supervised learning
    - classification
    - regression
- unsupervised learning
    - clustering
    - dimensionality reduction
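
As a rough illustration of the split above, the sketch below fits one supervised and one unsupervised estimator on the iris data. The choice of LogisticRegression and KMeans, and their parameter values, are assumptions made for this example only; they are not prescribed by the outline.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised (classification): the estimator sees the features X and the labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
supervised_pred = clf.predict(X[:3])

# Unsupervised (clustering): the estimator sees only X and groups similar samples.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_[:3]

print(supervised_pred, cluster_ids)
```

The cluster ids are arbitrary integers assigned by KMeans and need not line up with the class labels in y.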

### 2.1.2 Loading an example dataset

In [None]:
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

In [None]:
print(digits.data.shape)

In [None]:
print(digits.target)
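
The digits dataset also exposes the original 8x8 images. A minimal sketch of how they relate to digits.data, assuming only the standard load_digits attributes:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each sample is stored as an 8x8 grayscale image ...
print(digits.images.shape)

# ... and digits.data holds the same pixels flattened to 64 features per sample.
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)
print(np.array_equal(flattened, digits.data))
```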

### 2.1.3 Learning and predicting 

In [None]:
from sklearn import svm
clf = svm.SVC(gamma=0.01, C=100.)
# Train on every sample except the last one
clf.fit(digits.data[:-1], digits.target[:-1])
# Predict the digit shown in the held-out last sample
clf.predict(digits.data[-1:])

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(digits.data[-1].reshape(8,8),cmap='Greys')
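
A single held-out sample says little about how well the model generalizes. The sketch below holds out the last 100 samples instead and reports the fraction predicted correctly. The hold-out size is a hypothetical choice for illustration, and the hyper-parameters are simply copied from the cell above.

```python
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
n_test = 100  # hypothetical hold-out size, chosen only for illustration

# Same hyper-parameters as the SVC fitted earlier in this section
clf = svm.SVC(gamma=0.01, C=100.)
clf.fit(digits.data[:-n_test], digits.target[:-n_test])

# Compare predictions on the unseen samples with their true labels
predicted = clf.predict(digits.data[-n_test:])
accuracy = np.mean(predicted == digits.target[-n_test:])
print(accuracy)
```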

### 2.1.4 Model persistence 
A scikit-learn model can be saved with Python's built-in persistence module, pickle. pickle.dumps() serializes the fitted model to a bytes object, and pickle.loads() restores it.

In [None]:
import pickle
from sklearn import datasets, svm
clf = svm.SVC(gamma='scale')
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)
# Serialize the fitted model to an in-memory bytes object
s = pickle.dumps(clf)
print(type(s))

In [None]:
clf2 = pickle.loads(s)
print(clf2.predict(X[:1]))
print(y[0])

joblib can also pickle the model to disk, which is typically more efficient for models that carry large NumPy arrays.

In [None]:
from joblib import dump, load
dump(clf,'c022141.joblib')
clf = load('c022141.joblib')
clf
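
To check that a model survives the round trip to disk, one option is to compare predictions before and after reloading. This is a minimal sketch: the temporary directory and the file name model.joblib are hypothetical choices that keep the example from leaving files behind.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(gamma='scale').fit(X, y)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'model.joblib')  # hypothetical file name
    dump(clf, path)          # write the fitted model to disk
    restored = load(path)    # read it back into a new object

# The restored model should predict exactly what the original does
same = np.array_equal(clf.predict(X), restored.predict(X))
print(same)
```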

### 2.1.5 Conventions

#### type casting
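Unless otherwise specified, input will be cast to float64. Recent scikit-learn releases preserve float32 input where possible, so the output dtype printed below may depend on the installed version.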

In [None]:
import numpy as np
from sklearn import random_projection
rng = np.random.RandomState(0)
X = rng.rand(10,2000)
X = np.array(X,dtype = 'float32')
print(X.dtype)
transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(X)  # may cast to float64, depending on the scikit-learn version
print(X_new.dtype)
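
Because the casting behaviour can differ between scikit-learn versions, a hedged way to see what the installed version does is to print the version next to the resulting dtypes. The sketch below reuses the same transformer and adds a float64 input for comparison; it does not assume a particular float32 outcome.

```python
import numpy as np
import sklearn
from sklearn import random_projection

rng = np.random.RandomState(0)
X32 = rng.rand(10, 2000).astype('float32')
X64 = X32.astype('float64')

transformer = random_projection.GaussianRandomProjection()
X32_new = transformer.fit_transform(X32)
X64_new = transformer.fit_transform(X64)

print(sklearn.__version__)
print(X32.dtype, '->', X32_new.dtype)  # float32 or float64, depending on the version
print(X64.dtype, '->', X64_new.dtype)  # float64 input stays float64
```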

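Regression targets are cast to float64, while classification targets are kept as given. Below, the first predict() returns integers because the classifier was fit on integer labels; the second returns strings because it was refit on the class names.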

In [None]:
from sklearn import datasets
from sklearn.svm import SVC
iris = datasets.load_iris()
clf = SVC(gamma='scale')
clf.fit(iris.data, iris.target)
print(list(clf.predict(iris.data[:3])))
clf.fit(iris.data, iris.target_names[iris.target])
print(list(clf.predict(iris.data[:3])))

#### Refitting and updating parameters
Hyper-parameters of an estimator can be updated after it has been constructed via the set_params() method. Calling fit() more than once will overwrite what was learned by any previous fit().

In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X, y = load_iris(return_X_y = True)
clf = SVC()
clf.set_params(kernel='linear').fit(X,y)

In [None]:
clf.predict(X[:5])

In [None]:
clf.set_params(kernel='rbf',gamma='scale').fit(X,y)

In [None]:
clf.predict(X[:5])
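
One way to confirm that set_params() took effect and that the second fit() replaced the first is to inspect the estimator after each step. This is a sketch using get_params() and the fitted support_vectors_ attribute; the variable names are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC()

# First fit: linear kernel
clf.set_params(kernel='linear').fit(X, y)
linear_kernel = clf.get_params()['kernel']
linear_n_sv = clf.support_vectors_.shape[0]

# Second fit on the same object: rbf kernel, previous fit is discarded
clf.set_params(kernel='rbf', gamma='scale').fit(X, y)
rbf_kernel = clf.get_params()['kernel']
rbf_n_sv = clf.support_vectors_.shape[0]

print(linear_kernel, linear_n_sv)
print(rbf_kernel, rbf_n_sv)
```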

#### Multiclass vs. multilabel fitting
When using multiclass classifiers, the learning and prediction task that is performed depends on the format of the target data fit upon.

In [None]:
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

In [None]:
classif = OneVsRestClassifier(estimator=SVC(gamma='scale',random_state=0))
classif.fit(X,y).predict(X)

In the above case, the classifier is fit on a 1d array of multiclass labels and the predict() method therefore provides corresponding multiclass predictions. It is also possible to fit upon a 2d array of binary label indicators:

In [None]:
y = LabelBinarizer().fit_transform(y)
classif.fit(X,y).predict(X)

Here, the classifier is fit() on a 2d binary label representation of y, using the LabelBinarizer. In this case predict() returns a 2d array representing the corresponding multilabel predictions.

Note that the fourth and fifth instances returned all zeroes, indicating that they matched none of the three labels fit upon. With multilabel outputs, it is similarly possible for an instance to be assigned multiple labels:

In [None]:
from sklearn.preprocessing import MultiLabelBinarizer
y = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
y = MultiLabelBinarizer().fit_transform(y)
classif.fit(X,y).predict(X)
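
To read an indicator matrix like the one above, it can help to look at which label each column stands for and to map rows back to label sets. A minimal sketch using MultiLabelBinarizer's classes_ attribute and inverse_transform() method; the variable names are illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

label_sets = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

print(mlb.classes_)  # the label that each column stands for
print(Y)             # one row per sample, one column per label

# inverse_transform maps each indicator row back to a tuple of labels
recovered = [tuple(int(v) for v in row) for row in mlb.inverse_transform(Y)]
print(recovered)
```

The same inverse_transform() call can be applied to the 2d output of predict() to turn predicted indicator rows back into label tuples.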