## Scikit-learn
### In this tutorial, we will learn how to use Scikit-learn.
#### You can find more information on the official website: https://scikit-learn.org/stable/

```python
import numpy as np
import pandas as pd

import seaborn as sns
%matplotlib inline
```

```python
df = pd.read_csv('https://raw.githubusercontent.com/bulutmf/data-science-toolbox/master/data-wrangling/datasets/googleplaystore.csv')
```

```python
df.head(2)
```

| Unnamed: 0 | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |


### Linear Regression

Linear regression takes one or more input variables and predicts an output that is a linear combination of the inputs plus an intercept. Scikit-learn's `LinearRegression` fits the coefficients by ordinary least squares.
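
Written out for $p$ input features, in notation introduced here ($w_0$ is the intercept, stored by scikit-learn in `intercept_`; $w_1, \dots, w_p$ are the weights, stored in `coef_`), the model and the least-squares objective over $n$ training samples are:

$$
\hat{y} = w_0 + w_1 x_1 + \dots + w_p x_p,
\qquad
\min_{w_0, \dots, w_p} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$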

```python
from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
```

```text
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
```

```python
# Let's look at the fitted coefficients
reg.coef_
```

```text
array([0.5, 0.5])
```
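
To make "linear combination of the inputs" concrete, the sketch below rebuilds the predictions by hand as `X @ coef_ + intercept_` and compares them with `predict()`. It also hints at why both coefficients came out as 0.5: the two input columns are identical, so many weight pairs fit equally well, and a least-squares solver such as `np.linalg.lstsq` returns the minimum-norm one, which splits the weight evenly. That `LinearRegression` picks the same solution is an observation for this data, not a documented guarantee.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)

reg = LinearRegression().fit(X, y)

# Predictions are the weighted sum of the inputs plus the intercept
manual = X @ reg.coef_ + reg.intercept_
print(np.allclose(manual, reg.predict(X)))

# The columns are identical, so the problem is rank-deficient.
# lstsq on centered data returns the minimum-norm coefficients.
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()
coef, *_ = np.linalg.lstsq(X_centered, y_centered, rcond=None)
print(coef, reg.coef_)
```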

### Logistic Regression

Contrary to what its name suggests, logistic regression is a classification method: it models the probability of each class and predicts the most probable one. Classification can be binary (two classes) or multinomial (more than two classes).
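
In notation introduced here ($w$, $b$ for weights and intercept, $K$ for the number of classes), the binary model passes a linear score through the logistic (sigmoid) function, and the multinomial model generalizes this with a softmax over one linear score per class:

$$
P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^\top x + b)}},
\qquad
P(y = k \mid x) = \frac{e^{w_k^\top x + b_k}}{\sum_{j=1}^{K} e^{w_j^\top x + b_j}}
$$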

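```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# With the lbfgs solver, multiclass targets are fit with a multinomial (softmax) model by default,
# so the deprecated multi_class='multinomial' argument is no longer needed
clf = LogisticRegression(random_state=0, solver='lbfgs').fit(X, y)
print(clf.predict(X[:2, :]))        # predicted class labels
print(clf.predict_proba(X[:2, :]))  # probability of each class
clf.score(X, y)                     # mean accuracy on the training data
```

```text
[0 0]
[[9.81801790e-01 1.81981959e-02 1.43556907e-08]
 [9.71727348e-01 2.82726221e-02 3.00307256e-08]]
0.9733333333333334
```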
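
The sketch below checks the softmax formulation against the fitted model: applying a hand-written softmax to `decision_function()` (one linear score per class) should reproduce `predict_proba()`, and the highest-probability class should match `predict()`. Raising `max_iter` to 1000 is an assumption made here only so lbfgs converges on the unscaled iris features; it is not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# max_iter raised (assumption) so lbfgs converges without a warning
clf = LogisticRegression(max_iter=1000).fit(X, y)

def softmax(z):
    # Subtract each row's max before exponentiating for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

scores = clf.decision_function(X)  # shape (n_samples, n_classes)
proba = softmax(scores)

print(np.allclose(proba, clf.predict_proba(X)))
print(np.array_equal(clf.classes_[proba.argmax(axis=1)], clf.predict(X)))
```

Because softmax is monotonic, the predicted class is simply the class with the largest linear score.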

### Perceptron

The perceptron does not require a learning rate, is not regularized by default, and updates its weights only when the model makes a mistake.
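
A minimal from-scratch sketch of the classic binary perceptron rule in NumPy illustrates these properties. It is not scikit-learn's own implementation, and the helper names `train_perceptron` and `predict` and the logical-AND data are hypothetical, chosen for illustration. The weights change only on a misclassified sample, and the step is the sample itself, so there is no learning rate to tune.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=100):
    """Classic binary perceptron (hypothetical helper); labels y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # Update only when the sample is misclassified or lies on the boundary
            if yi * (xi @ w + b) <= 0:
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: the data is separated
            break
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b > 0, 1, -1)

# Logical AND, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])

w, b = train_perceptron(X, y)
print(w, b, predict(X, w, b))
```

On linearly separable data like this, the loop stops after a finite number of mistakes; on non-separable data it would simply run until `max_epochs`.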

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron

X, y = load_digits(return_X_y=True)
clf = Perceptron(tol=1e-3, random_state=0)
# In the scikit-learn version used here, fit() displayed the full parameter list:
# Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
#            fit_intercept=True, max_iter=1000, n_iter_no_change=5, n_jobs=None,
#            penalty=None, random_state=0, shuffle=True, tol=0.001,
#            validation_fraction=0.1, verbose=0, warm_start=False)
clf.fit(X, y)
clf.score(X, y)  # mean accuracy on the training data
```

```text
0.9393433500278241
```

### Passive Aggressive Classifier

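Passive-aggressive algorithms are designed for large-scale learning, i.e. many training samples and many features. Like the perceptron, they do not require a learning rate; unlike it, they include a regularization parameter `C`.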
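
The name describes the update. The scikit-learn documentation associates `loss='hinge'` with the PA-I variant, and the sketch below is a from-scratch version of one PA-I step for a linear classifier without intercept (the helper `pa1_update` and the toy vector are hypothetical). If a sample already has margin at least 1, the model stays passive; otherwise it takes the smallest step that restores margin 1, capped by `C`.

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """One PA-I step (hypothetical helper, no intercept); y is -1 or +1."""
    loss = max(0.0, 1.0 - y * (w @ x))  # hinge loss on this sample
    if loss == 0.0:
        return w  # passive: margin is already >= 1
    # aggressive: step size needed to reach margin 1, capped at C
    tau = min(C, loss / (x @ x))
    return w + tau * y * x

x = np.array([1.0, 2.0])
w0 = np.zeros(2)

w1 = pa1_update(w0, x, +1, C=1.0)       # uncapped step: margin becomes 1
w2 = pa1_update(w1, x, +1, C=1.0)       # margin already 1: weights unchanged
w_small = pa1_update(w0, x, +1, C=0.1)  # step capped by C: margin only reaches 0.5
print(w1 @ x, np.allclose(w1, w2), w_small @ x)
```

A smaller `C` limits how far a single noisy sample can move the weights, which is the regularizing role of that parameter.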

```python
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_features=4, random_state=0)
clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0, tol=1e-3)
r = clf.fit(X, y)
print(r)
print(clf.coef_)
print(clf.intercept_)
print(clf.predict([[0, 2, 4, -6]]))
```

```text
PassiveAggressiveClassifier(C=1.0, average=False, class_weight=None,
                            early_stopping=False, fit_intercept=True,
                            loss='hinge', max_iter=1000, n_iter_no_change=5,
                            n_jobs=None, random_state=0, shuffle=True,
                            tol=0.001, validation_fraction=0.1, verbose=0,
                            warm_start=False)
[[0.26642044 0.45070924 0.67251877 0.64185414]]
[1.84127814]
[1]
```
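
For large-scale learning, the estimator can also be trained incrementally with `partial_fit()`, which updates the model one chunk at a time instead of loading all data at once. The sketch below simulates a stream by slicing an in-memory array into batches of 100; the sample count and batch size are arbitrary choices for illustration. The full list of class labels must be passed via `classes` because a single batch may not contain every class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
classes = np.unique(y)  # every label the model will ever see

clf = PassiveAggressiveClassifier(random_state=0)
# Feed the data in mini-batches, as you would when it does not fit in memory
batch_size = 100
for start in range(0, len(X), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.coef_.shape, clf.score(X, y))
```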


### Polynomial Regression

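A polynomial model is still linear in its coefficients, so polynomial regression can be done by expanding the inputs into polynomial features and fitting an ordinary linear model on them.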
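
For two inputs and degree 2, with coefficients $w_0, \dots, w_5$ introduced here for illustration, the model is quadratic in $x$ but every term is a fixed function of the inputs multiplied by a single coefficient, so it is linear in $w$:

$$
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2
$$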

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# PolynomialFeatures expands the inputs into all polynomial terms up to the given degree.
# For degree=2, [x1, x2] becomes [1, x1, x2, x1^2, x1*x2, x2^2], which can be used with any linear model.
X = np.arange(6).reshape(3, 2)
print(X)
poly = PolynomialFeatures(degree=2)
poly.fit_transform(X)
```

```text
[[0 1]
 [2 3]
 [4 5]]

array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
```
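
The feature expansion and the linear model can be chained in a pipeline to get polynomial regression end to end. The sketch below fits noise-free samples of a known quadratic, $y = 1 + 2x + 3x^2$ (a target chosen for illustration), so the recovered coefficients can be checked by eye. The first coefficient belongs to the constant column added by `PolynomialFeatures`; since `LinearRegression` fits its own intercept, that coefficient should stay near zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noise-free samples of an illustrative quadratic: y = 1 + 2x + 3x^2
x = np.linspace(-3, 3, 20).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2

# Expand x into [1, x, x^2], then fit an ordinary linear model on those features
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

lin = model.named_steps["linearregression"]
print(lin.intercept_, lin.coef_)  # roughly 1.0 and [0, 2, 3]
print(model.predict([[4.0]]))     # roughly 1 + 8 + 48 = 57
```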