### What is Scikit-Learn (Sklearn)?

Scikit-learn (sklearn) is one of the most widely used libraries for machine learning in Python.
It provides efficient tools for machine learning and statistical modeling,
including classification, regression, clustering and dimensionality reduction, through a
consistent interface. The library is largely written in Python and is built
upon NumPy, SciPy and Matplotlib.

Before we start using scikit-learn, we require the following:
- Python (>= 3.5)
- NumPy (>= 1.11.0)
- SciPy (>= 0.17.0)
- Joblib (>= 0.11)
- Matplotlib (>= 1.5.1), required for scikit-learn's plotting capabilities
- Pandas (>= 0.18.0), required for some of the scikit-learn examples that use its data structures and analysis tools


In [14]:
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib as mpl  # "plt" conventionally refers to matplotlib.pyplot
print(mpl.__version__)
import scipy as sci
print(sci.__version__)
import seaborn as sb
print(sb.__version__)

1.19.2
1.1.3
3.3.1
1.5.2
0.11.0
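The cell above prints the versions of the supporting libraries but not of scikit-learn itself or of Joblib, which is also on the requirements list. A minimal sketch of that check, assuming both packages are installed in the current environment, might look like this:

```python
import sklearn
import joblib

# Both packages expose their version as a plain string attribute.
sklearn_version = sklearn.__version__
joblib_version = joblib.__version__

print("scikit-learn:", sklearn_version)
print("joblib:", joblib_version)
```

The printed values depend on the local installation, so no expected output is shown here.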


### Modelling Process
- Dataset Loading
- Splitting the dataset
- Train the Model
- Test the Model
- Evaluate the Model

![image-2.png](attachment:image-2.png)



## Dataset Loading
- Features: The variables of the data are called its features. They are also known as predictors, inputs or attributes.
    - Feature matrix: The collection of features, when there is more than one.
    - Feature names: The list of the names of all the features.
- Response: The output variable, which depends on the feature variables. It is also known as the target, label or output.
    - Response vector: Represents the response column. Generally, there is just one response column.
    - Target names: The possible values taken by the response vector.
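
To make these terms concrete, the following sketch inspects the shapes of the feature matrix and the response vector for the Iris dataset used throughout this notebook. The variable `label_to_name` is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # feature matrix: one row per sample, one column per feature
y = iris.target    # response vector: one integer class label per sample

print("Feature matrix shape:", X.shape)    # (150, 4)
print("Response vector shape:", y.shape)   # (150,)
print("Distinct labels:", np.unique(y))    # [0 1 2]

# Each integer label in y is an index into target_names.
label_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
print(label_to_name)
```

Each of the 4 columns of `X` corresponds to one entry of `feature_names`, and each value in `y` corresponds to one entry of `target_names`.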


In [19]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nFirst 10 rows of X:\n", X[:10])

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

First 10 rows of X:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]]


## Splitting the dataset


In [21]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
from sklearn.model_selection import train_test_split
# Hold out 30% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)


(105, 4)
(45, 4)
(105,)
(45,)
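
The `random_state` argument is what makes the split repeatable. As a small sketch (the names `split_a`, `split_b` and `same_split` are illustrative only), calling `train_test_split` twice with the same seed should produce identical partitions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# The same random_state should give the same shuffle on every call.
split_a = train_test_split(X, y, test_size=0.3, random_state=1)
split_b = train_test_split(X, y, test_size=0.3, random_state=1)
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Reproducible:", same_split)

# test_size=0.3 of 150 samples leaves 105 training rows and 45 test rows.
X_train, X_test, y_train, y_test = split_a
print(len(X_train), len(X_test))
```

Omitting `random_state` would generally give a different split on each run, which makes results harder to compare between runs.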


## Train the Model

In [22]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
from sklearn.model_selection import train_test_split
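# Hold out 40% of the samples for testing (60 of the 150 rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
# k-nearest-neighbours classifier that votes among the 3 closest training samples
classifier_knn = KNeighborsClassifier(n_neighbors=3)
classifier_knn.fit(X_train, y_train)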

KNeighborsClassifier(n_neighbors=3)

## Test the Model


In [25]:
y_pred = classifier_knn.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9833333333333333
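
Accuracy is simply the fraction of test samples whose predicted label matches the true label. With 60 test samples, the value 0.9833 reported above corresponds to 59 correct predictions. The sketch below recomputes the metric by hand and compares it with `metrics.accuracy_score`; the names `manual_accuracy`, `library_accuracy` and `n_correct` are illustrative, and the exact count may differ across library versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)
classifier_knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = classifier_knn.predict(X_test)

# Accuracy by hand: mean of the element-wise "prediction is correct" mask.
manual_accuracy = np.mean(y_pred == y_test)
library_accuracy = metrics.accuracy_score(y_test, y_pred)

n_correct = int(np.sum(y_pred == y_test))
print(f"{n_correct} of {len(y_test)} test samples classified correctly")
print("Manual:", manual_accuracy, "Library:", library_accuracy)
```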


## Evaluate the Model

In [31]:
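# One sample, in feature order: sepal length, sepal width, petal length, petal width (cm)
sample = [[0, 5, 3, 2]]
preds = classifier_knn.predict(sample)
# Map each predicted integer label to its species name
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)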

Predictions: ['setosa']
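
To see where such a prediction comes from, one option is to look at the neighbours that voted for it. The sketch below is an illustration rather than part of the original workflow: it uses the classifier's `predict_proba` and `kneighbors` methods, and the names `vote_fractions` and `neighbour_species` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=1)
classifier_knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

sample = np.array([[0, 5, 3, 2]])
preds = classifier_knn.predict(sample)

# With the default uniform weights, each entry is the fraction of the
# 3 nearest neighbours that belong to that class.
vote_fractions = classifier_knn.predict_proba(sample)

# Distances to the 3 nearest training samples and their row indices in X_train.
distances, indices = classifier_knn.kneighbors(sample)
neighbour_species = iris.target_names[y_train[indices[0]]]

print("Predicted:", iris.target_names[preds[0]])
print("Vote fractions:", vote_fractions[0])
print("Neighbour species:", list(neighbour_species))
```

The predicted class is the one with the largest vote fraction among the nearest neighbours.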


In [8]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
classifier_knn = KNeighborsClassifier(n_neighbors=3)
classifier_knn.fit(X_train, y_train)
y_pred = classifier_knn.predict(X_test)

# Compute accuracy by comparing the actual response values (y_test) with the predicted response values (y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Provide new sample data and let the model predict the species for each row
sample = [[5, 5, 3, 2], [2, 4, 3, 5]]
preds = classifier_knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)



Accuracy: 0.9833333333333333
Predictions: ['versicolor', 'virginica']
