# Scikit-learn
**Scikit-learn** is an open-source Python library that implements a range of **machine learning, preprocessing, cross-validation and visualization** algorithms behind a unified interface.

**Contents**   
[1. Basic Example](#1)  
&nbsp;&nbsp;&nbsp;[1.1 Import modules](#1.1)  
&nbsp;&nbsp;&nbsp;[1.2 Import data](#1.2)  
&nbsp;&nbsp;&nbsp;[1.3 Set features and target](#1.3)  
&nbsp;&nbsp;&nbsp;[1.4 Split train and test](#1.4)  
&nbsp;&nbsp;&nbsp;[1.5 Create model](#1.5)  
&nbsp;&nbsp;&nbsp;[1.6 Train model](#1.6)  
&nbsp;&nbsp;&nbsp;[1.7 Predict test](#1.7)  
&nbsp;&nbsp;&nbsp;[1.8 Calculate accuracy](#1.8)  
[2. Load data](#2)  
[3. Train and Test Data](#3)  
[4. Preprocessing Data](#4)  
&nbsp;&nbsp;&nbsp;[4.1 Standardization](#4.1)  
&nbsp;&nbsp;&nbsp;[4.2 Normalization](#4.2)  
&nbsp;&nbsp;&nbsp;[4.3 Binarization](#4.3)  
&nbsp;&nbsp;&nbsp;[4.4 Encoding Categorical Features](#4.4)  
&nbsp;&nbsp;&nbsp;[4.5 Imputing Missing Values](#4.5)  
&nbsp;&nbsp;&nbsp;[4.6 Generating Polynomial Features](#4.6)  
[5. Create Model](#5)  
&nbsp;&nbsp;&nbsp;[5.1 Supervised Learning Estimators](#5.1)  
&nbsp;&nbsp;&nbsp;[5.2 Unsupervised Learning Estimators](#5.2)  
[6. Fit Model](#6)  
[7. Prediction](#7)  
[8. Evaluate Model Performance](#8)  
&nbsp;&nbsp;&nbsp;[8.1 Classification Metrics](#8.1)  
&nbsp;&nbsp;&nbsp;[8.2 Regression Metrics](#8.2)  
&nbsp;&nbsp;&nbsp;[8.3 Clustering Metrics](#8.3)  
&nbsp;&nbsp;&nbsp;[8.4 Cross-Validation](#8.4)  
[9. Tune Model](#9)  
&nbsp;&nbsp;&nbsp;[9.1 Grid Search](#9.1)  
&nbsp;&nbsp;&nbsp;[9.2 Randomized Parameter Optimization](#9.2)  


## <a id="1">1. Basic Example </a>
### <a id="1.1">1.1 Import modules </a>

In [9]:
from sklearn import neighbors, datasets, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

### <a id="1.2">1.2 Import data </a>

In [10]:
iris = datasets.load_iris()

### <a id="1.3">1.3 Set features and target </a>

In [11]:
X, y = iris.data[:,:2], iris.target

### <a id="1.4">1.4 Split train and test </a>

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=33)

### <a id="1.4.1">1.4.1 Preprocess data </a>

In [13]:
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

### <a id="1.5">1.5 Create model </a>

In [14]:
knn = neighbors.KNeighborsClassifier(n_neighbors=5)

### <a id="1.6">1.6 Train model </a>

In [15]:
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')

### <a id="1.7">1.7 Predict test </a>

In [16]:
y_pred = knn.predict(X_test)

### <a id="1.8">1.8 Calculate accuracy </a>

In [17]:
accuracy_score(y_test,y_pred)

0.631578947368421

## <a id="2">2. Load data </a>
Data must be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that can be converted to numeric arrays, such as a pandas DataFrame, are also acceptable.

In [18]:
import numpy as np
X = np.random.random((10,5))
# One label per row of X (10 samples)
y = np.array(['M','M','F','F','F','M','F','M','M','F'])
X[X<0.7]=0

## <a id="3">3. Train and Test Data </a>

In [20]:
X, y = iris.data[:,:2], iris.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=0)

## <a id="4">4. Preprocessing Data</a>
### <a id="4.1">4.1 Standardization</a>
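A minimal sketch of standardization, assuming a small made-up feature matrix (the values and the names `standardized_X` and `standardized_X_test` are illustrative, not from the original notebook). `StandardScaler` rescales each feature to zero mean and unit variance, using statistics learned from the training data only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 training samples and 1 test sample, 2 features each
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0]])

# Learn the per-feature mean and standard deviation from the training set
scaler = StandardScaler().fit(X_train)

# Apply the same transformation to both sets
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)
```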

### <a id="4.2">4.2 Normalization </a>
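A short sketch of normalization, assuming illustrative input values. Unlike standardization, `Normalizer` works row by row: each sample is rescaled to unit norm (L2 by default).

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical samples; each row is scaled independently
X = np.array([[3.0, 4.0], [1.0, 0.0], [6.0, 8.0]])

normalizer = Normalizer()  # default norm='l2'
normalized_X = normalizer.fit_transform(X)

# Every row now has Euclidean length 1
row_norms = np.linalg.norm(normalized_X, axis=1)
```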

### <a id="4.3">4.3 Binarization </a>
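A sketch of binarization with an assumed threshold of 0.5 (the threshold and data are illustrative). `Binarizer` maps values strictly greater than the threshold to 1 and everything else to 0:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical feature values in [0, 1]
X = np.array([[0.2, 0.7], [0.5, 0.9]])

binarizer = Binarizer(threshold=0.5)
binary_X = binarizer.fit_transform(X)  # values > 0.5 become 1, the rest 0
```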

### <a id="4.4">4.4 Encoding Categorical Features </a>
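A sketch of two common encoders, assuming made-up labels and a hypothetical `colors` feature. `LabelEncoder` maps target classes to integers; `OneHotEncoder` expands a categorical feature column into one indicator column per category.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Encode string target labels as integers (classes are sorted alphabetically)
y = np.array(['M', 'M', 'F', 'F', 'M', 'F'])
enc = LabelEncoder()
y_encoded = enc.fit_transform(y)

# One-hot encode a single categorical feature column (hypothetical data)
colors = np.array([['red'], ['green'], ['blue'], ['green']])
onehot = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it for inspection
colors_encoded = onehot.fit_transform(colors).toarray()
```

`enc.inverse_transform(y_encoded)` recovers the original string labels.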

### <a id="4.5">4.5 Imputing Missing Values </a>
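A sketch of mean imputation, assuming missing entries are marked as `np.nan` and a recent scikit-learn release where the imputer lives in `sklearn.impute` (older releases exposed a different class under `sklearn.preprocessing`). The data is illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan]])

# Replace each NaN with the mean of the observed values in its column
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imputed_X = imp.fit_transform(X)
```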

### <a id="4.6">4.6 Generating Polynomial Features </a>
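A sketch of polynomial feature generation, assuming degree 2 and a single illustrative sample. For two input features `x0` and `x1`, `PolynomialFeatures(2)` produces the columns `1, x0, x1, x0^2, x0*x1, x1^2`:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical sample with two features: x0 = 2, x1 = 3
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
poly_X = poly.fit_transform(X)  # bias term, linear terms, then degree-2 terms
```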

## <a id="5">5. Create Model </a>

### <a id="5.1">5.1 Supervised Learning Estimators </a>
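A sketch that instantiates a few supervised estimators. The choice of estimators and hyperparameters here is illustrative; all of them share the same `fit` / `predict` interface.

```python
from sklearn import neighbors
from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

lr = LinearRegression()                               # linear regression
svc = SVC(kernel='linear')                            # support vector classifier
gnb = GaussianNB()                                    # Gaussian naive Bayes
knn = neighbors.KNeighborsClassifier(n_neighbors=5)   # k-nearest neighbors
```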

### <a id="5.2">5.2 Unsupervised Learning Estimators </a>
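A sketch that instantiates two unsupervised estimators, with illustrative hyperparameters. Passing a float to `n_components` asks PCA to keep enough components to explain that fraction of the variance.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Keep as many principal components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)

# Partition the data into 3 clusters; random_state fixes the initialization
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
```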

## <a id="6">6. Fit Model </a>
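A sketch of fitting, reusing the iris split from section 3 and illustrative estimators. Supervised estimators are fitted on features and labels; unsupervised ones are fitted on features alone.

```python
from sklearn import datasets, neighbors
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised learning: fit on features and labels
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Unsupervised learning: fit on features only
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X_train)

# Fit and transform in one step
pca = PCA(n_components=0.95)
pca_model = pca.fit_transform(X_train)
```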

## <a id="7">7. Prediction </a>
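A sketch of prediction with fitted estimators, again assuming the iris split from section 3. Classifiers return class labels (and, where supported, class probabilities); clustering estimators return the index of the nearest cluster.

```python
from sklearn import datasets, neighbors
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = neighbors.KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Supervised: predicted labels and per-class probability estimates
y_pred = knn.predict(X_test)
y_proba = knn.predict_proba(X_test)

# Unsupervised: cluster index assigned to each test sample
cluster_pred = k_means.predict(X_test)
```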

## <a id="8">8. Evaluate Model Performance </a>
### <a id="8.1">8.1 Classification Metrics </a>
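A sketch of common classification metrics on small hand-written label vectors (the labels are illustrative, not model output from this notebook):

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels for 6 samples
y_test = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# Fraction of correctly classified samples
acc = accuracy_score(y_test, y_pred)

# Precision, recall, F1-score and support per class, as formatted text
report = classification_report(y_test, y_pred)
print(report)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
```

Estimators also expose a `score` method, which for classifiers returns the mean accuracy on the given data.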

### <a id="8.2">8.2 Regression Metrics </a>
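A sketch of three regression metrics on illustrative values: mean absolute error, mean squared error, and the coefficient of determination (R²).

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted target values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # average of |error|
mse = mean_squared_error(y_true, y_pred)   # average of error squared
r2 = r2_score(y_true, y_pred)              # 1.0 would be a perfect fit
```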

### <a id="8.3">8.3 Clustering Metrics </a>
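A sketch of clustering metrics that compare a clustering against known labels, using illustrative assignments. These scores ignore the actual label values and look only at how samples are grouped.

```python
from sklearn.metrics import adjusted_rand_score, homogeneity_score, v_measure_score

# Hypothetical ground-truth classes and cluster assignments
labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]  # the second class is split across two clusters

ari = adjusted_rand_score(labels_true, labels_pred)    # chance-corrected agreement
homog = homogeneity_score(labels_true, labels_pred)    # each cluster holds one class only?
v_measure = v_measure_score(labels_true, labels_pred)  # harmonic mean of homogeneity and completeness
```

Here every cluster contains a single class, so homogeneity is perfect, while splitting one class across clusters lowers the other two scores.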

### <a id="8.4">8.4 Cross-Validation </a>
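A sketch of k-fold cross-validation, assuming the same two iris features used earlier and an illustrative choice of 4 folds. `cross_val_score` refits a clone of the estimator on each training fold and scores it on the held-out fold.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

knn = neighbors.KNeighborsClassifier(n_neighbors=5)

# One accuracy score per fold
scores = cross_val_score(knn, X, y, cv=4)
print(scores.mean())
```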

## <a id="9">9. Tune Model </a>
### <a id="9.1">9.1 Grid Search </a>
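A sketch of an exhaustive grid search over k-nearest-neighbors hyperparameters. The parameter grid and the 5-fold setting are illustrative assumptions; every combination in the grid is evaluated by cross-validation.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Hypothetical grid: 3 values of k times 2 distance metrics = 6 candidates
params = {'n_neighbors': [1, 3, 5],
          'metric': ['euclidean', 'cityblock']}

grid = GridSearchCV(estimator=neighbors.KNeighborsClassifier(),
                    param_grid=params,
                    cv=5)
grid.fit(X, y)

print(grid.best_score_)   # mean cross-validated score of the best candidate
print(grid.best_params_)  # the parameter combination that achieved it
```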

### <a id="9.2">9.2 Randomized Parameter Optimization </a>
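A sketch of randomized search, which samples a fixed number of candidates instead of trying every combination. The parameter ranges, `n_iter`, `cv`, and `random_state` values are illustrative assumptions.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Hypothetical search space: 4 values of k times 2 weighting schemes
params = {'n_neighbors': list(range(1, 5)),
          'weights': ['uniform', 'distance']}

rsearch = RandomizedSearchCV(estimator=neighbors.KNeighborsClassifier(),
                             param_distributions=params,
                             n_iter=5,         # number of sampled candidates
                             cv=4,
                             random_state=5)   # makes the sampling reproducible
rsearch.fit(X, y)

print(rsearch.best_score_)
print(rsearch.best_params_)
```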