___

<a href='https://github.com/ai-vithink'> <img src='https://avatars1.githubusercontent.com/u/41588940?s=200&v=4' /></a>
___


# Machine Learning with Python

**This notebook gives an overview of how we use Python for machine learning and how we use the scikit-learn package.**

**Scikit-learn** is one of the most popular machine learning packages for Python and has a lot of algorithms already built in.

## Installation

* **conda install scikit-learn**
* **pip install scikit-learn**
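
A quick way to confirm the installation worked is to import the package and print its version. This is only a minimal check, and it assumes scikit-learn was installed into the same environment that the notebook runs in:

```python
import sklearn

# Prints the installed version string; the exact value depends on your environment
print(sklearn.__version__)
```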


## Basic Structure of Scikit-Learn (sklearn)

![image.png](attachment:image.png)

* As we can see, the machine learning process starts with data, followed by data cleaning and formatting so that the ML model can accept it.
* Before giving the data to the model, we split it into a training set and a test set. The goal is to train the model on the training set and then test it on the test set.
* We then iterate on the model and tune its parameters until it is ready to deploy.
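
The workflow above can be sketched end to end as follows. This is an illustrative sketch, not a recipe from this lecture: the synthetic data from `make_regression`, the `Ridge` model, the list of candidate `alpha` values, and the use of `cross_val_score` for tuning are all assumptions chosen to keep the example short.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for data that has already been cleaned and formatted
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out a test set that is used only for the final check
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Iterate: try a few values of one parameter, scoring each with
# cross-validation on the training set only
candidate_alphas = [0.01, 0.1, 1.0, 10.0]
cv_scores = {
    alpha: cross_val_score(Ridge(alpha=alpha), X_train, y_train, cv=5).mean()
    for alpha in candidate_alphas
}
best_alpha = max(cv_scores, key=cv_scores.get)

# Refit with the chosen parameter and evaluate once on the test set
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("best alpha:", best_alpha)
print("test R^2:", final_model.score(X_test, y_test))
```

Tuning against cross-validation scores rather than the test set keeps the test set as an honest estimate of how the model behaves on unseen data.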


## An example of using Scikit-Learn

Do not worry about memorizing these steps. We will see them again in subsequent lectures, and you will get used to them once you start doing them on your own. This is just an overview.

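1. Every algorithm is exposed in scikit-learn via an "Estimator". First you import the model. The general form is shown in the comment below, followed by a concrete example:

```python
# General form (family and Model are placeholders, not real names):
#     from sklearn.family import Model
# For example:
from sklearn.linear_model import LinearRegression
```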
2. The next step is to instantiate the model, in our case `LinearRegression`.

**Estimator parameters:** All the parameters of an estimator can be set when it is instantiated, and each has a suitable default value. In Jupyter, pressing Shift+Tab inside the parentheses shows the available parameters.

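For example:
```python
# Note: the `normalize` argument seen in older tutorials was removed in scikit-learn 1.2
model = LinearRegression(fit_intercept=True)
print(model)
# Recent scikit-learn versions print only the non-default parameters, so this shows:
# LinearRegression()
```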

The defaults are usually a reasonable starting point. We only need to change the parameters when we are looking for more specific behavior.
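
Outside of Jupyter's Shift+Tab help, the parameters of an estimator can also be inspected and changed from code. The sketch below uses `get_params()` and `set_params()`, which scikit-learn estimators provide; the choice of `fit_intercept` as the parameter to change is just for illustration.

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()

# get_params() returns a dict of every constructor parameter and its current value
params = model.get_params()
print(params)

# set_params() changes parameters on an existing estimator
model.set_params(fit_intercept=False)
print(model.get_params()["fit_intercept"])  # False
```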

3. Once the model is created, we fit it to some data. Remember that the data should first be split into a training set and a test set. An example follows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# `normalize=True` was removed in scikit-learn 1.2, so the defaults are used here
model = LinearRegression()

# Create dummy data: X holds the features and y holds the label for each row of X
X, y = np.arange(10).reshape((5, 2)), range(5)
X
```

```
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
```

```python
# train_test_split takes X, y and test_size, and returns the training and
# test sets: X_train and y_train, then X_test and y_test.
# The split is random, so the rows selected will vary from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
X_train
```

```
array([[0, 1],
       [8, 9],
       [2, 3]])
```

```python
# We now have labels as well as features for both the training and the test data
y_train
```

```
[0, 4, 1]
```
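
Because the split is random, the rows shown above will differ between runs. If a repeatable split is wanted, `train_test_split` accepts a `random_state` argument. The sketch below is an addition to the lecture example, and the seed value 42 is arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

# The same random_state gives the same split every time
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(np.array_equal(X_train_a, X_train_b))  # True
print(y_train_a == y_train_b)                # True
print(len(X_train_a), len(X_test_a))         # 3 2
```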

```python
# After splitting the data, we train (fit) the model on the training data
model.fit(X_train, y_train)
```

```
LinearRegression()
```

`model.fit()` takes the training features `X_train` and the training labels `y_train`. Once the model has been fit on the training data, it is ready to predict labels or values for the test set using the `predict` method. This is the supervised learning process; the process is different for unsupervised methods.

```python
# Call predict on the test features X_test; the result can be compared with y_test
predictions = model.predict(X_test)
```

* Now we can evaluate our model by comparing our predictions to the correct values. How we evaluate depends on the kind of ML algorithm we are using, for example regression, classification, or clustering.
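
As a rough illustration of comparing predictions to correct values, the sketch below uses three functions from `sklearn.metrics`. The small hand-written arrays are made up for the example and are not results from the model above; clustering metrics are not shown.

```python
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

# Regression: compare predicted values with the true values
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.5, 5.0, 8.0]
print(mean_squared_error(y_true_reg, y_pred_reg))  # (0.25 + 0 + 1) / 3, about 0.4167
print(r2_score(y_true_reg, y_pred_reg))            # 1 - 1.25 / 8 = 0.84375

# Classification: fraction of labels predicted correctly
y_true_clf = [0, 1, 1, 0]
y_pred_clf = [0, 1, 0, 0]
print(accuracy_score(y_true_clf, y_pred_clf))      # 0.75
```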

## Recap

* Scikit-learn aims to have a uniform interface across all methods.
* Given a scikit-learn estimator object named `model`, the following methods are available:

### Available in all estimators

* `model.fit()`: fit training data.
    * For supervised learning applications, this accepts two arguments: the data `X` and the labels `y`, e.g. `model.fit(X, y)`.
    * For unsupervised learning applications, this accepts only a single argument, the data or features `X`, e.g. `model.fit(X)`, because unsupervised learning works with unlabelled data.
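
The two calling conventions for `fit` can be sketched side by side. The tiny dataset below and the choice of `KMeans` as the unsupervised estimator are assumptions made for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Six one-feature samples forming two well-separated groups
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = 2 * X.ravel() + 1  # labels, used only by the supervised model

# Supervised: fit takes both the features and the labels
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # close to [2.] and 1.0

# Unsupervised: fit takes only the features
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # one cluster index per row of X
```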
    
### Available in supervised estimators:

* `model.predict()`: given a trained model, predict the label of a new set of data. This method accepts one argument, the new data `X_new`, e.g. `model.predict(X_new)`, and returns the predicted label for each object in the array.

* `model.predict_proba()`: for classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. The label with the highest probability is the one returned by `model.predict()`.

* `model.score()`: for classification or regression problems, most estimators implement a score method, where a larger score indicates a better fit. For classifiers the default score is the mean accuracy, which lies between 0 and 1. For regressors it is the R² coefficient of determination, which is at most 1 and can be negative for a poorly fitting model.
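
The three supervised methods can be tried together on a small classification problem. This sketch assumes the bundled iris dataset and a `LogisticRegression` classifier, neither of which appears elsewhere in this lecture; `max_iter=1000` is set only to give the solver room to converge.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = clf.predict(X_test)       # one predicted class per test row
proba = clf.predict_proba(X_test)  # one probability per class, per test row
print(proba.shape)                 # (45, 3)

# predict() returns the class whose probability is highest
print(np.array_equal(labels, clf.classes_[proba.argmax(axis=1)]))

# For a classifier, score() is the mean accuracy on the given data
print(clf.score(X_test, y_test))
```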
    
### Available in unsupervised estimators:

* `model.predict()`: predict labels in clustering algorithms.

* `model.transform()`: given an unsupervised model, transform new data into the new basis. This also accepts one argument, `X_new`, and returns the new representation of the data based on the unsupervised model.

* `model.fit_transform()`: some estimators implement this method, which more efficiently performs a fit and a transform on the same input data.
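
A short sketch of the unsupervised methods follows. The random data, `PCA` as the transformer, and `KMeans` as the clustering algorithm are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))     # data used for fitting
X_new = rng.normal(size=(5, 4))  # new data seen after fitting

# fit_transform: learn the new basis and project X onto it in one call
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# transform: project new data onto the basis learned from X
X_new_reduced = pca.transform(X_new)
print(X_reduced.shape, X_new_reduced.shape)  # (50, 2) (5, 2)

# predict: assign each new row to one of the clusters learned from X
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.predict(X_new))  # one cluster index (0, 1 or 2) per new row
```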

## Choosing an algorithm
![image.png](attachment:image.png)