<a href="https://colab.research.google.com/github/cagBRT/Data/blob/main/Cost_Sensitive_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Grid Search Weighted Logistic Regression**

In [None]:
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from collections import Counter

**Create an imbalanced dataset**

The dataset has two classes: about 99% of the samples fall in class 0 and about 1% in class 1.

In [None]:
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)
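
Before tuning any weights, it can help to confirm how skewed the generated labels really are. The sketch below rebuilds the same dataset and summarizes it with `Counter`, which the notebook imports but only uses later for the multiclass data. The name `minority_share` is introduced here for illustration only.

```python
from collections import Counter
from sklearn.datasets import make_classification

# Same call as in the notebook cell above
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)

# Count the samples per class label
counter = Counter(y)
print(counter)

# Fraction of samples in the minority class (class 1)
minority_share = counter[1] / len(y)
print('Minority share: %.3f' % minority_share)
```

A heavily skewed count here is the reason the grid below explores unequal class weights.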

**Define the model**

In [None]:
model = LogisticRegression(solver='lbfgs')

**Use grid search to determine the best weight for the dataset**

In [None]:
# define grid
balance = [{0:100,1:1}, {0:10,1:1}, {0:1,1:1}, {0:1,1:10}, {0:1,1:100}] 
param_grid = dict(class_weight=balance)
print(param_grid)
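
Each dictionary in the grid maps a class label to the weight given to that class. As far as I understand scikit-learn's behavior, a `class_weight` dictionary scales each sample's contribution to the loss by the weight of its class, so it should match passing the same values through `sample_weight`. The sketch below checks this on a smaller dataset; the dataset size, the `{0: 1, 1: 10}` weights, and the `*_demo` names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Smaller illustrative dataset (an assumption; not the notebook's data)
X_demo, y_demo = make_classification(n_samples=2000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.95], flip_y=0, random_state=2)

# Fit once with a class_weight dictionary
by_class = LogisticRegression(solver='lbfgs', class_weight={0: 1, 1: 10})
by_class.fit(X_demo, y_demo)

# Fit again with the same weights spelled out per sample
per_sample = np.where(y_demo == 1, 10.0, 1.0)
by_sample = LogisticRegression(solver='lbfgs')
by_sample.fit(X_demo, y_demo, sample_weight=per_sample)

# The two fits are expected to give (numerically) the same coefficients
print(by_class.coef_, by_class.intercept_)
print(by_sample.coef_, by_sample.intercept_)
```

Read this way, `{0:1, 1:100}` asks the model to treat one minority-class error as roughly as costly as 100 majority-class errors.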

**Use cross-validation and grid search to determine the best balance**

In [None]:
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=cv,
scoring='roc_auc')
# execute the grid search
grid_result = grid.fit(X, y)

In [None]:
# report the best configuration
print('Best: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
# report all configurations
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
# mean_score avoids shadowing the `mean` imported from numpy
for mean_score, stdev, param in zip(means, stds, params):
  print('%f (%f) with: %r' % (mean_score, stdev, param))
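
The report above lists configurations in grid order. A possible variation is to sort them by `rank_test_score`, which `cv_results_` provides alongside the mean and standard deviation. The sketch below is self-contained and uses a smaller dataset, a shorter grid, and plain `StratifiedKFold` so it runs quickly; those choices are assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Smaller illustrative setup (assumed values, chosen for speed)
X_demo, y_demo = make_classification(n_samples=2000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.95], flip_y=0, random_state=2)
param_grid = dict(class_weight=[{0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 100}])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

grid = GridSearchCV(estimator=LogisticRegression(solver='lbfgs'),
                    param_grid=param_grid, cv=cv, scoring='roc_auc')
grid_result = grid.fit(X_demo, y_demo)

# Collect one row per configuration and sort by rank (1 = best)
results = grid_result.cv_results_
rows = sorted(zip(results['rank_test_score'], results['mean_test_score'],
                  results['std_test_score'], results['params']),
              key=lambda row: row[0])
for rank, mean_score, stdev, param in rows:
    print('#%d  %f (%f) with: %r' % (rank, mean_score, stdev, param))
```

Sorting makes it easier to see whether the top configurations differ by more than their standard deviations.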

**Assignment** 

1. Use the following code to create a dataset with 3 imbalanced classes
2. Do a grid search to find the best weights for 3 classes

Create an imbalanced multiclass dataset

In [None]:
weight_of_classes = [0.98, 0.01, 0.01]
X_multi, y_multi = make_classification(n_classes=3, n_samples=10000, n_features=2, n_redundant=0,
      n_clusters_per_class=1, weights=weight_of_classes, flip_y=0, random_state=2)
# summarize class distribution
counter = Counter(y_multi)
print(counter)

Set the weights to be explored

In [None]:
# define grid
balance = [{0:98,1:1,2:1}, {0:10,1:1,2:1}, {0:1,1:1,2:1}, {0:1,1:10,2:1}, {0:1,1:98,2:1}] 
param_grid = dict(class_weight=balance)

Change the scoring for multiclass

In [None]:
# define evaluation procedure
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score

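# plain 'precision' assumes binary targets; use a macro average for 3 classes
scoring = {'accuracy': make_scorer(accuracy_score),
           'prec': 'precision_macro'}
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid search; with several metrics, refit names the one used to pick the best model
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=cv,
scoring=scoring, refit='prec')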
# execute the grid search
grid_result = grid.fit(X_multi, y_multi)