<a href="https://colab.research.google.com/github/Zabiullahkhan/Data_Science/blob/main/Scikit_Learn_Cheat_Sheet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Scikit-Learn Cheat Sheet
Scikit-learn (sklearn) is a free, open-source machine learning library for Python. It provides
classification, regression, and clustering algorithms, along with tools for preprocessing, model selection, and evaluation.

### Getting Started
The code below demonstrates the basic steps of using sklearn to create and run a model
on a set of data.
The steps include loading the data, splitting it into training and test sets, scaling
both sets, creating the model, fitting the model on the training data, using the trained model to
make predictions on the test set, and finally evaluating the model's performance.

In [None]:
from sklearn import neighbors, datasets, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset and keep only the first two features
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Fit the scaler on the training set only, then scale both sets with it
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Create and fit a k-nearest-neighbors classifier
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Predict on the test set and evaluate
y_pred = knn.predict(X_test)
accuracy_score(y_test, y_pred)

### Loading the Data
The data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices.
Other numeric array-like objects, such as pandas DataFrames, are also accepted.

In [None]:
import numpy as np
# Feature matrix X: shape (n_samples, n_features), here 10 samples x 5 features
X = np.random.random((10, 5))
# Target vector y: one label per row of X
y = np.array(['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'])
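To show that sparse input is accepted, here is a minimal sketch that builds a small SciPy CSR matrix and fits a k-nearest-neighbors classifier on it. The toy arrays (`X_dense`, `X_sparse`, `y_toy`) are made up for illustration. With `n_neighbors=1` and distinct rows, each training point's nearest neighbor is itself, so predicting on the training data returns the original labels.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 4 samples, 3 features, mostly zeros
X_dense = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0]])
y_toy = np.array(['A', 'B', 'A', 'B'])

# Compressed Sparse Row (CSR) format stores only the non-zero entries
X_sparse = sparse.csr_matrix(X_dense)

clf = KNeighborsClassifier(n_neighbors=1).fit(X_sparse, y_toy)
print(clf.predict(X_sparse))  # ['A' 'B' 'A' 'B']
```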

### Training and Test Data

In [None]:
from sklearn.model_selection import train_test_split
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
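Fixing `random_state` makes the split reproducible. The sketch below uses a made-up 10-sample array (`X_demo`, `y_demo` are hypothetical) to show the resulting shapes. When neither `test_size` nor `train_size` is given, 25% of the rows go to the test set (rounded up), so 10 samples split into 7 training rows and 3 test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(np.array_equal(X_te, X_te2))  # True
```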

### Preprocessing the Data

###### Standardization
Standardizes the features by removing the mean and scaling to unit variance.

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)
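Per feature, standardization computes z = (x - mean) / std, using the mean and population standard deviation of the training data. A minimal check against plain NumPy on a made-up array (`X_demo` is hypothetical):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler_demo = StandardScaler().fit(X_demo)

# Manual version: subtract the column means, divide by the column std (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
print(np.allclose(scaler_demo.transform(X_demo), manual))  # True
print(scaler_demo.mean_)  # the learned column means
```

The scaler is fitted on the training set only and then reused on the test set. This keeps test-set statistics from leaking into preprocessing.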

###### Normalization
Normalizes each sample (row of the data matrix) with at least one non-zero component independently of the other samples, so that its norm (L2 by default) equals one.

In [None]:
from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
normalized_X = scaler.transform(X_train)
normalized_X_test = scaler.transform(X_test)
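Unlike `StandardScaler`, `Normalizer` works row by row and learns nothing from the data during `fit`: each row is divided by its own norm. In this made-up example, the row `[3, 4]` has L2 norm 5, so it becomes `[0.6, 0.8]`.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X_demo = np.array([[3.0, 4.0], [1.0, 0.0]])
normalized_demo = Normalizer(norm='l2').fit_transform(X_demo)
print(normalized_demo)

# Every row now has unit L2 norm
print(np.linalg.norm(normalized_demo, axis=1))
```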

###### Binarization
Binarize data (set feature values to 0 or 1) according to a threshold.

In [None]:
from sklearn.preprocessing import Binarizer
binarizer = Binarizer(threshold = 0.0).fit(X)
binary_X = binarizer.transform(X_test)
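`Binarizer` maps values strictly greater than the threshold to 1 and all other values, including those equal to the threshold, to 0. A small sketch on made-up data:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X_demo = np.array([[-1.0, 0.0, 2.0],
                   [0.5, -0.3, 0.0]])
# 0.0 itself is not above the threshold, so it maps to 0
binary_demo = Binarizer(threshold=0.0).fit_transform(X_demo)
print(binary_demo)
```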

###### Encoding Categorical Features
Encode target labels as integers between 0 and n_classes - 1.

In [None]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)  # LabelEncoder expects a 1-D array of labels, e.g. the target y
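`LabelEncoder` sorts the unique labels, stores them in `classes_`, and replaces each label with its index in that list. `inverse_transform` maps codes back to labels. The labels below are made up. `LabelEncoder` is meant for 1-D targets; for encoding input feature columns, scikit-learn offers `OneHotEncoder` and `OrdinalEncoder`.

```python
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
codes = le_demo.fit_transform(['B', 'A', 'C', 'A'])
print(le_demo.classes_)  # ['A' 'B' 'C']  (sorted unique labels)
print(codes)             # [1 0 2 0]

# inverse_transform maps integer codes back to the original labels
print(le_demo.inverse_transform([2, 0]))  # ['C' 'A']
```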

###### Imputing Missing Values
Replace missing values (here, entries equal to 0) with the mean of their column.

In [None]:
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=0, strategy='mean')  # treat 0 as missing, fill with the column mean
imp.fit_transform(X_train)
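Here is a worked example on a made-up array in which 0 marks a missing entry. The column means are computed from the non-missing values only: (1 + 3) / 2 = 2 for the first column and (2 + 4) / 2 = 3 for the second.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data where 0 marks a missing entry
X_demo = np.array([[1.0, 2.0],
                   [0.0, 4.0],
                   [3.0, 0.0]])
imp_demo = SimpleImputer(missing_values=0, strategy='mean')
filled = imp_demo.fit_transform(X_demo)

print(imp_demo.statistics_)  # [2. 3.]  (learned fill value per column)
print(filled)
```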

###### Generating Polynomial Features
Generate all polynomial and interaction features up to the given degree (here 5).

In [None]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(5)
poly.fit_transform(X)
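For one sample `[a, b]`, degree 2 produces `[1, a, b, a^2, a*b, b^2]`. The number of output columns grows quickly with the degree: 5 input features at degree 5 give 252 columns, including the constant term. A sketch with made-up inputs:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Degree 2 on the single sample [2, 3]
poly2 = PolynomialFeatures(degree=2)
print(poly2.fit_transform(np.array([[2.0, 3.0]])))  # [[1. 2. 3. 4. 6. 9.]]

# Degree 5 on 5 features
X_demo = np.random.random((10, 5))
print(PolynomialFeatures(5).fit_transform(X_demo).shape)  # (10, 252)
```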

## Create Your Model

### Supervised Learning Models

Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()  # the `normalize` parameter was removed in scikit-learn 1.2
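`normalize` was deprecated in scikit-learn 1.0 and removed in 1.2. The deprecation message suggested putting a `StandardScaler` in a pipeline instead. Below is a minimal sketch on made-up data that follows y = 2x + 1 exactly. For ordinary least squares, rescaling the features changes the coefficients but not the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 2 * X_demo.ravel() + 1  # [1, 3, 5, 7]

plain = LinearRegression().fit(X_demo, y_demo)
print(plain.coef_, plain.intercept_)  # approximately [2.] and 1.0

# Scaling inside a pipeline: the scaler is fitted on whatever data is passed to fit
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)
print(scaled.predict(np.array([[4.0]])))  # approximately [9.]
```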

Support Vector Machines (SVM)

In [None]:
from sklearn.svm import SVC
svc = SVC(kernel = 'linear')
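With a linear kernel, `SVC` finds the separating hyperplane with the widest margin and exposes it through `coef_` and `intercept_`. A sketch on made-up, linearly separable points, where the boundary falls between the two groups:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable data
X_demo = np.array([[0, 0], [1, 1], [3, 3], [4, 4]])
y_demo = np.array([0, 0, 1, 1])

svc_demo = SVC(kernel='linear').fit(X_demo, y_demo)
print(svc_demo.predict(np.array([[0.5, 0.5], [3.5, 3.5]])))  # [0 1]

# Hyperplane parameters and the training points that define the margin
print(svc_demo.coef_, svc_demo.intercept_)
print(svc_demo.support_vectors_)
```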

Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
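`GaussianNB` models each feature within each class as a normal distribution and applies Bayes' rule to pick the most probable class. A sketch with made-up one-feature data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X_demo = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

gnb_demo = GaussianNB().fit(X_demo, y_demo)
print(gnb_demo.theta_.ravel())  # per-class feature means (2 and 11)
print(gnb_demo.predict(np.array([[2.5], [10.5]])))  # [0 1]
```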

KNN

In [None]:
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=5)

### Unsupervised Learning Models

Principal Component Analysis (PCA)

In [None]:
from sklearn.decomposition import PCA
pca = PCA(n_components = 0.95)
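If `n_components` is a float between 0 and 1, `PCA` keeps the smallest number of components whose cumulative explained variance exceeds that fraction. Here is a sketch on the iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_iris = load_iris().data
pca_demo = PCA(n_components=0.95).fit(X_iris)

print(pca_demo.n_components_)  # number of components actually kept
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print(cumulative)  # the last value is at least 0.95
```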

K-Means

In [None]:
from sklearn.cluster import KMeans
k_means = KMeans(n_clusters = 3, random_state = 0)
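K-means assigns each point to the nearest of `n_clusters` centroids and moves each centroid to the mean of its points until the assignments stop changing. The sketch below uses two made-up, well-separated groups, so it sets `n_clusters=2` instead of the 3 used above. It passes `n_init=10` explicitly because the default changed across versions (it became `'auto'` in 1.4).

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of three points
X_demo = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
km_demo = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)

print(km_demo.labels_)           # cluster index per point (the numbering is arbitrary)
print(km_demo.cluster_centers_)  # one centroid per cluster
print(km_demo.predict(np.array([[0.5, 0.5]])))  # same cluster as the first group
```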

### Model Fitting

Fitting supervised and unsupervised learning models onto data.

###### Supervised Learning

In [None]:
lr.fit(X, y) #Fit the model to the data
knn.fit(X_train,y_train)
svc.fit(X_train,y_train)

###### Unsupervised Learning

In [None]:
k_means.fit(X_train) #Fit the model to the data
X_train_pca = pca.fit_transform(X_train)  # Fit to data, then transform it (returns the projected data)
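`fit_transform` returns the transformed data, not the fitted model. It is a shortcut for `fit` followed by `transform` on the same data; some transformers implement it more efficiently, but the result is the same. A quick check with `StandardScaler` on made-up data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_demo = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]])
two_step = StandardScaler().fit(X_demo).transform(X_demo)
one_step = StandardScaler().fit_transform(X_demo)
print(np.allclose(two_step, one_step))  # True
```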

### Prediction

Predict Labels

In [None]:
y_pred = lr.predict(X_test) #Supervised Estimators
y_pred = k_means.predict(X_test) #Unsupervised Estimators

Estimate the probability of each label

In [None]:
y_proba = knn.predict_proba(X_test)  # one column per class; each row sums to 1
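`predict_proba` returns one row per sample and one column per class, with columns in the order of `classes_`. For a k-nearest-neighbors classifier with uniform weights, each probability is the fraction of the k neighbors that belong to that class. A sketch on the iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
knn_demo = KNeighborsClassifier(n_neighbors=5).fit(X_iris, y_iris)

proba = knn_demo.predict_proba(X_iris[:3])
print(knn_demo.classes_)  # column order of proba
print(proba.shape)        # (3, 3): 3 samples, 3 classes
print(proba.sum(axis=1))  # each row sums to 1
```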

## Evaluate Your Model’s Performance

**Classification Metrics**

Accuracy Score

In [None]:
knn.score(X_test,y_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
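Accuracy is the fraction of predictions that exactly match the true labels. For classifiers, `score` returns this same mean accuracy on the given data. A worked example with made-up labels, where 3 of 4 predictions are correct:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true_demo = np.array([0, 1, 1, 0])
y_pred_demo = np.array([0, 1, 0, 0])
print(accuracy_score(y_true_demo, y_pred_demo))  # 0.75
print(np.mean(y_true_demo == y_pred_demo))       # 0.75
```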

Classification Report

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
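The report lists precision, recall, F1 score, and support for each class. Passing `output_dict=True` returns the same numbers as a nested dict keyed by the class labels as strings. A worked example with made-up labels: the single prediction of class 1 is correct (precision 1.0), and it finds 1 of the 2 true class-1 samples (recall 0.5).

```python
from sklearn.metrics import classification_report

y_true_demo = [0, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0]
print(classification_report(y_true_demo, y_pred_demo))

report = classification_report(y_true_demo, y_pred_demo, output_dict=True)
print(report['1']['precision'], report['1']['recall'])  # 1.0 0.5
```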

Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test,y_pred))
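In the confusion matrix, rows are true classes and columns are predicted classes, both in sorted label order. For a binary problem, `ravel()` unpacks the matrix as true negatives, false positives, false negatives, and true positives. A worked example with made-up labels:

```python
from sklearn.metrics import confusion_matrix

y_true_demo = [0, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0]
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)
# [[2 0]
#  [1 1]]
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 2 0 1 1
```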

**Regression Metrics**

Mean Absolute Error

In [None]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test,y_pred)
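Mean absolute error is the average of |y_true - y_pred|. With the made-up values below, the absolute errors are 0.5, 0.5, 0, and 1, so the mean is 0.5:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])
print(mean_absolute_error(y_true_demo, y_pred_demo))  # 0.5
print(np.mean(np.abs(y_true_demo - y_pred_demo)))     # 0.5
```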

Mean Squared Error

In [None]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test,y_pred)
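Mean squared error averages the squared errors, so it penalizes large errors more heavily than MAE does. With the same made-up values as in the MAE example, the squared errors are 0.25, 0.25, 0, and 1, so MSE = 1.5 / 4 = 0.375. Its square root (RMSE) is in the units of the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])
mse = mean_squared_error(y_true_demo, y_pred_demo)
print(mse)           # 0.375
print(np.sqrt(mse))  # RMSE
```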

R2 Score

In [None]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
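R² is 1 - SS_res / SS_tot. SS_res is the sum of squared residuals, and SS_tot is the sum of squared deviations of y_true from its mean. A perfect model scores 1.0, and always predicting the mean scores 0.0. A manual check on made-up values:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)         # 1.5
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)  # 29.1875
r2 = r2_score(y_true_demo, y_pred_demo)
print(r2)                   # approximately 0.9486
print(1 - ss_res / ss_tot)  # same value
```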

**Clustering Metrics**

Adjusted Rand Index

In [None]:
from sklearn.metrics import adjusted_rand_score
adjusted_rand_score(y_test,y_pred)
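The adjusted Rand index compares two labelings pair by pair and corrects for chance. It ignores how clusters are numbered, so a relabeled but identical partition scores 1.0, while an assignment worse than chance can score below 0. A sketch on made-up labels:

```python
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 1, 1]
# Same grouping with the cluster ids swapped
print(adjusted_rand_score(labels_true, [1, 1, 0, 0]))  # 1.0
# Every true pair split up
print(adjusted_rand_score(labels_true, [0, 1, 0, 1]))  # -0.5
```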

Homogeneity

In [None]:
from sklearn.metrics import homogeneity_score
homogeneity_score(y_test,y_pred)
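A clustering is homogeneous when every cluster contains members of only one class. Splitting a class across several clusters does not hurt homogeneity, but merging classes into one cluster does. A sketch on made-up labels:

```python
from sklearn.metrics import homogeneity_score

labels_true = [0, 0, 1, 1]
# Class 1 split into two clusters: each cluster is still pure
print(homogeneity_score(labels_true, [0, 0, 1, 2]))  # close to 1.0
# Everything in one cluster: the cluster mixes both classes
print(homogeneity_score(labels_true, [0, 0, 0, 0]))  # 0.0
```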

V-measure

In [None]:
from sklearn.metrics import v_measure_score
v_measure_score(y_test,y_pred)
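V-measure combines homogeneity (h) and completeness (c). With the default `beta=1`, it is their harmonic mean, 2hc / (h + c). In the made-up example below, h is 1 but class 1 is split across two clusters, so c = 2/3 and v = 0.8:

```python
from sklearn.metrics import homogeneity_completeness_v_measure, v_measure_score

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]
h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
print(h, c, v)
print(v_measure_score(labels_true, labels_pred))  # equals v
print(2 * h * c / (h + c))                         # harmonic mean, also v
```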

### Tune Your Model

Grid Search

In [None]:
from sklearn.model_selection import GridSearchCV
params = {'n_neighbors':np.arange(1,3),'metric':['euclidean','cityblock']}
grid = GridSearchCV(estimator = knn, param_grid = params)
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_estimator_.n_neighbors)
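`GridSearchCV` fits and cross-validates the estimator for every combination in `param_grid`. By default it then refits the best combination on all the data passed to `fit`. The self-contained sketch below runs the same grid on the iris data (`grid_demo` is a hypothetical name). The grid holds 2 × 2 = 4 candidates.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
params = {'n_neighbors': np.arange(1, 3), 'metric': ['euclidean', 'cityblock']}
grid_demo = GridSearchCV(KNeighborsClassifier(), param_grid=params, cv=5)
grid_demo.fit(X_iris, y_iris)

print(len(grid_demo.cv_results_['params']))  # 4 candidates, each scored with 5-fold CV
print(grid_demo.best_params_)
print(grid_demo.best_score_)  # mean cross-validated accuracy of the best candidate
```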