# Tuning Pipeline

In [0]:
from sklearn import set_config; set_config(display='diagram')

👇 Consider the following dataset.

In [0]:
import pandas as pd

data = pd.read_csv("data.csv")

data.head()

- Each observation represents a player
- Each column is a characteristic of performance

The target indicates whether the player lasts fewer than 5 years (`0`) or 5 years or more (`1`) as a professional.

In [0]:
X = data.drop(columns="target_5y")
y = data['target_5y']

## Pipeline

👇 Here is a simple pipeline to start from.

In [28]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy.stats import uniform

X = data.drop(columns='target_5y')
y = data['target_5y']

pipe = Pipeline([
    ('imputer', SimpleImputer()),
    ('scaler', MinMaxScaler()),
    ('model', SVC())
])

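# Compare imputation strategies. With 'constant' and no fill_value,
# SimpleImputer fills numeric columns with 0.
grid_search = GridSearchCV(
    pipe,
    param_grid={
        'imputer__strategy': ['mean', 'median', 'most_frequent', 'constant'],
    },
    scoring='precision',
)

grid_search.fit(X, y)
grid_search.best_params_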

{'imputer__strategy': 'constant'}

## Fine Tuning

In [25]:
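# scipy's uniform(loc, scale) samples from [loc, loc + scale],
# so C is drawn from [0.1, 10.1] here.
random_search = RandomizedSearchCV(
    pipe,
    param_distributions={
        'model__C': uniform(0.1, 10),
        'model__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    },
    scoring='precision',
)

random_search.fit(X, y)
best_model = random_search.best_estimator_  # pipeline refit on all of X, y
random_search.best_params_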

{'model__C': 2.678177601894749, 'model__kernel': 'poly'}
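The `uniform(0.1, 10)` distribution used for `model__C` above is parameterized by `loc` and `scale`, not by a lower and an upper bound. The short check below is a sketch of that behavior. The names `c_dist`, `low`, `high`, and `samples`, and the choice of `random_state=0`, are illustrative assumptions and not part of the original notebook.

```python
from scipy.stats import uniform

# scipy.stats.uniform(loc, scale) is uniform on [loc, loc + scale]
c_dist = uniform(0.1, 10)

# The 0% and 100% quantiles give the ends of the sampling range
low = c_dist.ppf(0)
high = c_dist.ppf(1)

# Draw a reproducible sample to confirm every value stays in that range
samples = c_dist.rvs(size=1000, random_state=0)

print(low, high)
print(samples.min(), samples.max())
```

To restrict `C` to the range 0.1 to 10, `uniform(0.1, 9.9)` would be the matching call.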

❓ **Fine-tune this pipeline to maximize your objective**

- Use the `scoring` metric appropriate for the task
- Search for the optimal:
    - imputing `strategy`
    - `kernel`
    - regularization factor `C`...
- Store your random search results in a variable named `search`
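One possible way to cover all three hyperparameters in a single randomized search is sketched below. It runs on synthetic data from `make_classification` instead of `data.csv`, so the scores say nothing about the player dataset. The names `X_demo`, `y_demo`, and `pipe_demo`, and the settings `n_iter=10`, `cv=5`, and `random_state=0`, are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the player data (assumption: not the real data.csv)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# Blank out roughly 5% of the values so the imputer has work to do
rng = np.random.default_rng(0)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

pipe_demo = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", MinMaxScaler()),
    ("model", SVC()),
])

# Lists are sampled uniformly; uniform(0.1, 10) samples C from [0.1, 10.1]
search = RandomizedSearchCV(
    pipe_demo,
    param_distributions={
        "imputer__strategy": ["mean", "median", "most_frequent"],
        "model__kernel": ["linear", "poly", "rbf", "sigmoid"],
        "model__C": uniform(0.1, 10),
    },
    n_iter=10,
    scoring="precision",
    cv=5,
    random_state=0,
)

search.fit(X_demo, y_demo)
print(search.best_params_)
```

Swapping `X_demo` and `y_demo` for the notebook's `X` and `y`, and `pipe_demo` for `pipe`, would apply the same search to the player data.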

In [29]:
import pickle

# Export pipeline as pickle file
with open("pipeline.pkl", "wb") as file:
    pickle.dump(best_model, file)
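To check that an exported pipeline can be reused, it can be loaded back with `pickle.load` and asked for predictions. The round trip below is a minimal sketch on synthetic data. It writes to a temporary directory so it does not overwrite `pipeline.pkl`, and `demo_model`, `loaded_model`, `X_demo`, and `y_demo` are hypothetical names, not objects from the notebook.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data and a fitted pipeline with the same three steps
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
demo_model = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", MinMaxScaler()),
    ("model", SVC()),
]).fit(X_demo, y_demo)

# Dump to a temporary file, then load the pipeline back from disk
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    with open(path, "wb") as file:
        pickle.dump(demo_model, file)
    with open(path, "rb") as file:
        loaded_model = pickle.load(file)

# The reloaded pipeline should reproduce the original predictions
same_predictions = bool((loaded_model.predict(X_demo) == demo_model.predict(X_demo)).all())
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.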