# Getting started with scikit-learn

Author: Alexandre Gramfort

This notebook aims to get you started with scikit-learn. It assumes you
understand basic machine learning concepts such as training (fitting),
predicting, and cross-validation.

Here we do classification on the iris demo dataset, using logistic
regression or a linear SVM.

We encourage you to read:

https://scikit-learn.org/stable/tutorial/index.html

to learn more.

Reference:
Scikit-learn: Machine Learning in Python,
Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

In [None]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

In [None]:
type(X)

In [None]:
X.ndim

In [None]:
X.shape

In [None]:
y.shape

In [None]:
y
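The arrays above carry no labels, so it can help to look at the names stored on the dataset object. The following is a minimal sketch, assuming the `feature_names` and `target_names` attributes of the object returned by `load_iris`; the variable names `feature_names`, `target_names` and `class_counts` are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Names of the four measured features (the columns of X)
feature_names = list(iris.feature_names)
# Names of the three species encoded as 0, 1, 2 in y
target_names = list(iris.target_names)
# Number of samples in each class
class_counts = np.bincount(y)

print(feature_names)
print(target_names)
print(class_counts)
```

The class counts show whether the dataset is balanced, which matters when reading an accuracy score later on.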

In [None]:
# Plot the first two features against each other, one color per class
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='r');
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='g');
plt.scatter(X[y == 2, 0], X[y == 2, 1], color='b');

# Let's do some machine learning

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
# multi_class is left at its default ('auto'); passing it explicitly is
# deprecated in recent scikit-learn releases
clf = LogisticRegression(C=1., solver='liblinear')

In [None]:
clf.fit(X, y)

In [None]:
clf.coef_.shape

In [None]:
y_pred = clf.predict(X)

In [None]:
y_pred

In [None]:
np.mean(y == y_pred)
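The fraction of correct predictions computed above is the accuracy, and scikit-learn exposes it directly. As a rough sketch, the three ways below should agree. Note the assumptions: the classifier here uses the default solver with `max_iter=1000` rather than the `liblinear` setup of this notebook, and the names `manual`, `via_score` and `via_metric` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = datasets.load_iris(return_X_y=True)

# Assumption: default solver with a generous iteration budget
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
y_pred = clf.predict(X)

manual = np.mean(y == y_pred)            # fraction of matching labels
via_score = clf.score(X, y)              # classifier's built-in accuracy
via_metric = accuracy_score(y, y_pred)   # metric function on the predictions

print(manual, via_score, via_metric)
```

Keep in mind that this is accuracy on the data used for fitting, so it tends to be optimistic.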

In [None]:
# Hold out every other sample: even indices for training, odd for testing
X_train = X[::2]
y_train = y[::2]
X_test = X[1::2]
y_test = y[1::2]

In [None]:
X_train.shape, X_test.shape

In [None]:
# fit returns the estimator itself, so fit and predict can be chained
y_pred = clf.fit(X_train, y_train).predict(X_test)

In [None]:
np.mean(y_pred == y_test)
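Taking every other sample works here because iris is sorted by class with equal class sizes. For data without that property, a shuffled split is usually safer. A minimal sketch using `train_test_split`, where `test_size=0.5`, `stratify=y` and `random_state=0` are choices made for this example rather than part of the original notebook:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Shuffle, then split half/half while keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```

Fixing `random_state` makes the split reproducible between runs.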

In [None]:
from sklearn.model_selection import cross_val_score, StratifiedKFold
# multi_class is left at its default ('auto'); passing it explicitly is
# deprecated in recent scikit-learn releases
clf = LogisticRegression(C=1., solver='liblinear')
cv = StratifiedKFold(n_splits=3)
scores = cross_val_score(clf, X, y, cv=cv)
print(scores)

In [None]:
print("CV Accuracy : %s (std : %s)" % (np.mean(scores), np.std(scores)))
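To see what `cross_val_score` does with the folds, the loop can be written out by hand. This is only a sketch of the idea, not the library's internal implementation. It assumes the default solver with `max_iter=1000` instead of `liblinear`, and the names `manual_scores` and `reference_scores` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
clf = LogisticRegression(C=1.0, max_iter=1000)  # assumption: default solver
cv = StratifiedKFold(n_splits=3)

manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    # Fit a fresh copy on the training folds, score on the held-out fold
    fold_clf = clone(clf)
    fold_clf.fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_clf.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The helper should give the same per-fold accuracies
reference_scores = cross_val_score(clf, X, y, cv=cv)
print(manual_scores)
print(reference_scores)
```

Each sample ends up in the held-out fold exactly once, so every score is measured on data the model did not see during fitting.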

# Doing the same with a support vector machine

In [None]:
from sklearn.svm import SVC
clf = SVC(C=1., kernel='linear')

In [None]:
y_pred = clf.fit(X_train, y_train).predict(X_test)
np.mean(y_pred == y_test)

In [None]:
scores = cross_val_score(clf, X, y, cv=cv)
print(scores)

In [None]:
print("CV Accuracy : %s (std : %s)" % (np.mean(scores), np.std(scores)))
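Since both classifiers share the same `fit`/`predict` interface, they can be compared under identical folds in one loop. A hedged sketch: the `models` and `results` dictionaries are illustrative names, and the logistic regression again uses the default solver with `max_iter=1000` as an assumption.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=3)  # same folds for both models

models = {
    "logistic regression": LogisticRegression(C=1.0, max_iter=1000),
    "linear SVM": SVC(C=1.0, kernel="linear"),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = (np.mean(scores), np.std(scores))
    print("%s: CV accuracy %.3f (std %.3f)" % (name, *results[name]))
```

With only three folds on 150 samples, small differences between the two means should not be over-interpreted.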

# To learn more

- https://scikit-learn.org/stable/tutorial/index.html
- https://jakevdp.github.io/PythonDataScienceHandbook/05.00-machine-learning.html