<h2> Estimators </h2>

The Scitkit-learn library provides a very large variety of pre-built algorithms to perform both supervised and unsupervised machine learning.

The estimator you choose for your project will depend on the data set you have and the problem that you are trying to solve. What makes Scikit-learn so straight forward to use is that regardless of the model or algorithm you are using, the code structure for model training and prediction is the same.
<img src='ml_map.png' width="600" height="900">
Letâ€™s say you are working on a regression problem and want to train a linear regression algorithm and use the resulting model to make predictions.

The code examples throughout this notebook are run on the classic wine data set which can be imported directly from the Scikit-learn API.

In [31]:
from sklearn.datasets import load_wine

#X,y = load_wine(return_X_y=True)
# Data
# Raw Data
wine_data = load_wine()

# Unscaled Data
y = wine_data.target
X = wine_data.data

<h3> Steps</h3>

<ol>
<li>Call the <b>estimator</b> & save it as an object</li>
<li>Split data into train & test datae</li>
<li>Perform data pre-processing</li>
<li>Perform estimator fit on features & target data and save this model as an object</li>
<li>Perform prediction on unseen data</li>
<li>Evaluate prediction model</li>
<li>Model optimisation - where possible</li>
</ol> 

In [32]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

In [37]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [40]:
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')
X_train_clean = imputer.fit_transform(X_train)

# Z-Scored Data
sc_z =StandardScaler(with_mean=True, with_std=True)

#y_z = (y_train - y_train.mean())/y_train.std()
X_z = sc_z.fit_transform(X_train_clean)

In [59]:
model = lr.fit(X_z, y_train)

In [60]:
predictions = model.predict(sc_z.fit_transform(X_test))

In [61]:
from sklearn.metrics import classification_report

print(classification_report(predictions, y_test))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.95      1.00      0.98        20
           2       1.00      0.89      0.94         9

    accuracy                           0.98        45
   macro avg       0.98      0.96      0.97        45
weighted avg       0.98      0.98      0.98        45



In [68]:
from sklearn.model_selection import GridSearchCV

param_grid = { 
    'solver': ['newton-cg', 'lbfgs', 'sag'],
    'max_iter': [75, 100, 125],
    'multi_class' : ['auto', 'ovr', 'multinomial'],
    }

CV = GridSearchCV(lr, param_grid, n_jobs= 1)
                  
CV.fit(X_z, y_train)  
print(CV.best_params_)    
print(CV.best_score_)

{'max_iter': 75, 'multi_class': 'ovr', 'solver': 'newton-cg'}
0.9626780626780626


In [53]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf_model = rf.fit(X_z, y_train)
rf_predictions = rf_model.predict(sc_z.fit_transform(X_test))

param_grid = { 
    'n_estimators': [20, 50],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth' : [4,5,6],
    'criterion' :['gini', 'entropy']
}

CV = GridSearchCV(rf, param_grid, n_jobs= 1)
                  
CV.fit(X_train_clean, y_train)  
print(CV.best_params_)    
print(CV.best_score_)

{'criterion': 'gini', 'max_depth': 4, 'max_features': 'auto', 'n_estimators': 20}
0.9851851851851852


<h3> Pipelines </h3>

In [72]:
from sklearn.pipeline import Pipeline
pipe = Pipeline([('imputer', SimpleImputer()), ('rf', RandomForestClassifier(criterion='gini', max_depth=4, max_features= 'auto', n_estimators= 20))])
pipeline_model = pipe.fit(X_train, y_train)
pipeline_model.score(X_test, y_test)

0.9777777777777777

#### References:
__<a href="https://towardsdatascience.com/a-beginners-guide-to-scikit-learn-14b7e51d71a4">Link to Reference [1]</a>__