# SWMAL Exercise


## Hyperparameters and Gridsearch 


### Qa Explain GridSearchCV

There are two code cells below: 1) function setup, 2) the actual grid-search.

Review the code cells and write a __short__ summary. Mainly focus on __cell 2__, but dig into cell 1 if you find it interesting (notice the use of local functions, a nifty feature in Python).
  
In detail, examine the lines:  
  
```python
grid_tuned = GridSearchCV(model, tuning_parameters, ..
grid_tuned.fit(X_train, y_train)
..
FullReport(grid_tuned , X_test, y_test, time_gridsearch)
```
and write a short description of how `GridSearchCV` works: explain how the search parameter set is created and how the overall search mechanism functions (without going into too much detail).

What role does the parameter `scoring='f1_micro'` play in the `GridSearchCV`, and what does `n_jobs=-1` mean? 


"Cell 1: function setup
This cell contains functions and setup related to training and evaluating machine learning models. Key points include:

The SearchReport function generates a report based on the results of grid or randomized search. It includes a GetBestModelCTOR function that constructs a string representation of the best model and its parameters.

The ClassificationReport function generates a detailed classification report based on the trained model and test data.

The FullReport function combines the results of SearchReport and ClassificationReport and prints them.

The LoadAndSetupData function loads and sets up the dataset based on the specified mode.

There is a TryKerasImport function that checks if the import of Keras or TensorFlow.keras is successful.

Cell 2: the actual grid-search
This cell performs the actual grid search:

Data is loaded using the LoadAndSetupData function for the 'iris' mode.

An SVM model (svm.SVC) is defined with a fixed gamma and a set of tuning parameters (tuning_parameters) to be searched through. It also specifies the number of cross-validation folds (CV), verbose level (VERBOSE), and n_jobs=-1 to utilize all available processors.

A GridSearchCV object (grid_tuned) is created with the model, tuning parameters, cross-validation strategy, scoring metric ('f1_micro'), verbose level, and the number of parallel jobs.

The grid search is executed by calling the fit method on grid_tuned with the training data (X_train and y_train).

Results are reported using the FullReport function, which prints the best parameters, best score, and generates a classification report for the test dataset.

In summary, the GridSearchCV is used to systematically search through a specified parameter grid for the best hyperparameters of a given model. The parameter set is created based on the provided tuning parameters. The scoring='f1_micro' parameter indicates that the model's performance is evaluated using the F1 micro score. The n_jobs=-1 parameter means that the search is performed in parallel using all available processors, which can significantly speed up the process."
Answered by ChatGPT
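
How the search set is built can be illustrated without scikit-learn. The sketch below expands the `tuning_parameters` dictionary from cell 2 into its Cartesian product using only the standard library. It mirrors the idea behind the grid search but is not the library's own implementation, and the helper name `expand_grid` is hypothetical.

```python
from itertools import product

# Same grid as in cell 2 of this notebook.
tuning_parameters = {
    'kernel': ('linear', 'rbf'),
    'C': [0.1, 1, 10],
}
CV = 5  # number of cross-validation folds, as in cell 2


def expand_grid(param_grid):
    """Return every combination of the listed values (hypothetical helper)."""
    names = sorted(param_grid)  # fixed key order so the listing is reproducible
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = expand_grid(tuning_parameters)
for index, params in enumerate(candidates):
    print(f"[{index:2d}] {params}")

# Each candidate is fitted once per fold; the final refit of the best
# candidate on the whole training set is not counted here.
n_fits = len(candidates) * CV
print(f"{len(candidates)} candidates x {CV} folds = {n_fits} fits")
```

The six candidates correspond to the six entries listed under "Grid scores" in the output of cell 2, where index 2 is the winning combination.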

In [4]:
# TODO: Qa, code review..cell 1) function setup

from time import time
import numpy as np
import sys

from sklearn import svm
from sklearn.linear_model import SGDClassifier

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.metrics import classification_report, f1_score
from sklearn import datasets

import os
sys.path.append(os.path.expanduser('../'))
from libitmal import dataloaders as itmaldataloaders # Needed for load of iris, moon and mnist

currmode="N/A" # GLOBAL var!

def SearchReport(model): 
    
    def GetBestModelCTOR(model, best_params):
        def GetParams(best_params):
            ret_str=""          
            for key in sorted(best_params):
                value = best_params[key]
                temp_str = "'" if str(type(value))=="<class 'str'>" else ""
                if len(ret_str)>0:
                    ret_str += ','
                ret_str += f'{key}={temp_str}{value}{temp_str}'  
            return ret_str          
        try:
            param_str = GetParams(best_params)
            return type(model).__name__ + '(' + param_str + ')' 
        except:
            return "N/A(1)"
        
    print("\nBest model set found on train set:")
    print()
    print(f"\tbest parameters={model.best_params_}")
    print(f"\tbest '{model.scoring}' score={model.best_score_}")
    print(f"\tbest index={model.best_index_}")
    print()
    print(f"Best estimator CTOR:")
    print(f"\t{model.best_estimator_}")
    print()
    try:
        print(f"Grid scores ('{model.scoring}') on development set:")
        means = model.cv_results_['mean_test_score']
        stds  = model.cv_results_['std_test_score']
        i=0
        for mean, std, params in zip(means, stds, model.cv_results_['params']):
            print("\t[%2d]: %0.3f (+/-%0.03f) for %r" % (i, mean, std * 2, params))
            i += 1
    except:
        print("WARNING: the random search does not provide means/stds")
    
    global currmode                
    assert "f1_micro"==str(model.scoring), f"come on, we need to fix the scoring to be able to compare model-fits! Your scoring={str(model.scoring)}...remember to add scoring='f1_micro' to the search"
    return f"best: dat={currmode}, score={model.best_score_:0.5f}, model={GetBestModelCTOR(model.estimator,model.best_params_)}", model.best_estimator_ 

def ClassificationReport(model, X_test, y_test, target_names=None):
    assert X_test.shape[0]==y_test.shape[0]
    print("\nDetailed classification report:")
    print("\tThe model is trained on the full development set.")
    print("\tThe scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, model.predict(X_test)                 
    print(classification_report(y_true, y_pred, target_names=target_names))
    print()
    
def FullReport(model, X_test, y_test, t):
    print(f"SEARCH TIME: {t:0.2f} sec")
    beststr, bestmodel = SearchReport(model)
    ClassificationReport(model, X_test, y_test)    
    print(f"CTOR for best model: {bestmodel}\n")
    print(f"{beststr}\n")
    return beststr, bestmodel
    
def LoadAndSetupData(mode, test_size=0.3):
    assert test_size>=0.0 and test_size<=1.0
    
    def ShapeToString(Z):
        n = Z.ndim
        s = "("
        for i in range(n):
            s += f"{Z.shape[i]:5d}"
            if i+1!=n:
                s += ";"
        return s+")"

    global currmode
    currmode=mode
    print(f"DATA: {currmode}..")
    
    if mode=='moon':
        X, y = itmaldataloaders.MOON_GetDataSet(n_samples=5000, noise=0.2)
        itmaldataloaders.MOON_Plot(X, y)
    elif mode=='mnist':
        X, y = itmaldataloaders.MNIST_GetDataSet(load_mode=0)
        if X.ndim==3:
            X=np.reshape(X, (X.shape[0], -1))
    elif mode=='iris':
        X, y = itmaldataloaders.IRIS_GetDataSet()
    else:
        raise ValueError(f"could not load data for that particular mode='{mode}', only 'moon'/'mnist'/'iris' supported")
        
    print(f'  org. data:  X.shape      ={ShapeToString(X)}, y.shape      ={ShapeToString(y)}')

    assert X.ndim==2
    assert X.shape[0]==y.shape[0]
    assert y.ndim==1 or (y.ndim==2 and y.shape[1]==0)    
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0, shuffle=True
    )
    
    print(f'  train data: X_train.shape={ShapeToString(X_train)}, y_train.shape={ShapeToString(y_train)}')
    print(f'  test data:  X_test.shape ={ShapeToString(X_test)}, y_test.shape ={ShapeToString(y_test)}')
    print()
    
    return X_train, X_test, y_train, y_test

def TryKerasImport(verbose=True):
    
    kerasok = True
    try:
        import keras as keras_try
    except:
        kerasok = False

    tensorflowkerasok = True
    try:
        import tensorflow.keras as tensorflowkeras_try
    except:
        tensorflowkerasok = False
        
    ok = kerasok or tensorflowkerasok
    
    if not ok and verbose:
        if not kerasok:
            print("WARNING: importing 'keras' failed", file=sys.stderr)
        if not tensorflowkerasok:
            print("WARNING: importing 'tensorflow.keras' failed", file=sys.stderr)

    return ok
    
print("OK(function setup" + ("" if TryKerasImport() else ", hope the MNIST load works, because the installation of Keras or TensorFlow seems to be missing!") + ")")

OK(function setup)


In [5]:
# TODO: Qa, code review..cell 2) the actual grid-search

# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData(
    'iris')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = svm.SVC(
    gamma=0.001
)  # NOTE: gamma="scale" does not work in older scikit-learn versions,
# so a fixed value is used instead: model = svm.SVC(gamma=0.001)

tuning_parameters = {
    'kernel': ('linear', 'rbf'), 
    'C': [0.1, 1, 10]
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)

start = time()
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)
print('OK(grid-search)')

DATA: iris..
  org. data:  X.shape      =(  150;    4), y.shape      =(  150)
  train data: X_train.shape=(  105;    4), y_train.shape=(  105)
  test data:  X_test.shape =(   45;    4), y_test.shape =(   45)

SEARCH TIME: 4.11 sec

Best model set found on train set:

	best parameters={'C': 1, 'kernel': 'linear'}
	best 'f1_micro' score=0.9714285714285715
	best index=2

Best estimator CTOR:
	SVC(C=1, gamma=0.001, kernel='linear')

Grid scores ('f1_micro') on development set:
	[ 0]: 0.962 (+/-0.093) for {'C': 0.1, 'kernel': 'linear'}
	[ 1]: 0.371 (+/-0.038) for {'C': 0.1, 'kernel': 'rbf'}
	[ 2]: 0.971 (+/-0.047) for {'C': 1, 'kernel': 'linear'}
	[ 3]: 0.695 (+/-0.047) for {'C': 1, 'kernel': 'rbf'}
	[ 4]: 0.952 (+/-0.085) for {'C': 10, 'kernel': 'linear'}
	[ 5]: 0.924 (+/-0.097) for {'C': 10, 'kernel': 'rbf'}

Detailed classification report:
	The model is trained on the full development set.
	The scores are computed on the full evaluation set.

              precision    recall  f1-score  

### Qb Hyperparameter Grid Search using an SGD classifier

The `svm.SVC` model has been replaced with an `SGDClassifier`, and the search grid has been adapted to that model's hyperparameters. The grid now varies the loss function (`hinge`, `squared_hinge`, `perceptron`), the maximum number of iterations (`max_iter`), and the regularization: `penalty` selects the type of regularization (`l1` or `l2`) and `alpha` sets its strength. Stronger regularization mitigates overfitting, at the cost of a less flexible model. The parameter `eta0` is the initial learning rate, which scikit-learn uses for the `constant` and `invscaling` schedules but not for `optimal`.
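
A rough count of the search space helps put the run time of this grid in context. The sketch below copies the grid from the cell that follows and counts both the raw candidates and those that are effectively distinct. It relies on statements in the scikit-learn documentation that `validation_fraction` is only used when `early_stopping=True`, that `power_t` is only used by the `invscaling` schedule, and that `eta0` is not used by the `optimal` schedule. The helper `effective_key` is hypothetical and only serves this illustration.

```python
from itertools import product

# Grid copied from the Qb cell of this notebook.
tuning_parameters = {
    'loss': ['hinge', 'squared_hinge', 'perceptron'],
    'max_iter': [300, 900, 1400],
    'alpha': [0.0001, 0.001, 0.01],
    'eta0': [0.01, 0.1],
    'penalty': ['l1', 'l2'],
    'learning_rate': ['constant', 'optimal', 'invscaling'],
    'power_t': [0.1, 0.5],
    'early_stopping': [False],
    'validation_fraction': [0.1, 0.2],
    'n_iter_no_change': [5, 10, 20],
}


def effective_key(params):
    """Drop settings that are documented as unused for this combination
    (hypothetical helper, based on the assumptions stated above)."""
    p = dict(params)
    if not p['early_stopping']:
        p.pop('validation_fraction')
    if p['learning_rate'] != 'invscaling':
        p.pop('power_t')
    if p['learning_rate'] == 'optimal':
        p.pop('eta0')
    return tuple(sorted(p.items()))


names = sorted(tuning_parameters)
candidates = [
    dict(zip(names, values))
    for values in product(*(tuning_parameters[name] for name in names))
]
distinct = {effective_key(candidate) for candidate in candidates}

print(f"candidates in the grid:        {len(candidates)}")
print(f"effectively distinct settings: {len(distinct)}")
print(f"fits with 5-fold CV:           {len(candidates) * 5}")
```

Under these assumptions, a sizeable share of the grid repeats settings that train the same model, so trimming the inactive parameters would shorten the search without changing what it can find.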

In [32]:
from sklearn.linear_model import SGDClassifier
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData(
    'iris')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = SGDClassifier(
    loss='hinge',
    random_state=42
)  

tuning_parameters = {
    'loss': ['hinge', 'squared_hinge', 'perceptron'],
    'max_iter': [300, 900, 1400],
    'alpha': [0.0001, 0.001, 0.01],
    'eta0': [0.01, 0.1],
    'penalty': ['l1', 'l2'],
    'learning_rate': ['constant', 'optimal', 'invscaling'],
    'power_t': [0.1, 0.5],
    'early_stopping': [False],
    'validation_fraction': [0.1, 0.2],
    'n_iter_no_change': [5, 10, 20],
}


CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)

start = time()
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)
print('OK(grid-search)')

DATA: iris..
  org. data:  X.shape      =(  150;    4), y.shape      =(  150)
  train data: X_train.shape=(  105;    4), y_train.shape=(  105)
  test data:  X_test.shape =(   45;    4), y_test.shape =(   45)

SEARCH TIME: 22.03 sec

Best model set found on train set:

	best parameters={'alpha': 0.001, 'early_stopping': False, 'eta0': 0.01, 'learning_rate': 'optimal', 'loss': 'perceptron', 'max_iter': 300, 'n_iter_no_change': 20, 'penalty': 'l1', 'power_t': 0.1, 'validation_fraction': 0.1}
	best 'f1_micro' score=0.9904761904761905
	best index=1672

Best estimator CTOR:
	SGDClassifier(alpha=0.001, eta0=0.01, loss='perceptron', max_iter=300,
              n_iter_no_change=20, penalty='l1', power_t=0.1, random_state=42)

Grid scores ('f1_micro') on development set:
	[ 0]: 0.905 (+/-0.085) for {'alpha': 0.0001, 'early_stopping': False, 'eta0': 0.01, 'learning_rate': 'constant', 'loss': 'hinge', 'max_iter': 300, 'n_iter_no_change': 5, 'penalty': 'l1', 'power_t': 0.1, 'validation_fraction': 0

### Qc Hyperparameter Random Search using an SGD classifier


The default parameters suggested for the random search were inserted into the `RandomizedSearchCV` call below.


In [37]:
from sklearn.model_selection import RandomizedSearchCV

# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData('iris')

# Setup search parameters for RandomizedSearchCV
model = SGDClassifier(loss='hinge', random_state=42)

random_tuning_parameters = {
    'loss': ['hinge', 'squared_hinge', 'perceptron'],
    'max_iter': [300, 900, 1400, 2000],
    'alpha': [0.0001, 0.001, 0.01, 0.1],
    'eta0': [0.01, 0.1, 0.2],
    'penalty': ['l1', 'l2'],
    'learning_rate': ['constant', 'optimal', 'invscaling'],
    'power_t': [0.1, 0.5, 0.8],
    'early_stopping': [False],
    'validation_fraction': [0.1, 0.2, 0.3],
    'n_iter_no_change': [5, 10, 20],
}

CV = 5
VERBOSE = 0

# Run RandomizedSearchCV for the model
random_tuned = RandomizedSearchCV(
    model,
    random_tuning_parameters,
    n_iter=20,
    random_state=42,
    cv=CV,
    scoring='f1_micro',
    verbose=VERBOSE,
    n_jobs=-1
)

start_random = time()
random_tuned.fit(X_train, y_train)
t_random = time() - start_random

# Report result for RandomizedSearchCV
b_random, m_random = FullReport(random_tuned, X_test, y_test, t_random)
print('OK(randomized-search)')


DATA: iris..
  org. data:  X.shape      =(  150;    4), y.shape      =(  150)
  train data: X_train.shape=(  105;    4), y_train.shape=(  105)
  test data:  X_test.shape =(   45;    4), y_test.shape =(   45)

SEARCH TIME: 0.17 sec

Best model set found on train set:

	best parameters={'validation_fraction': 0.3, 'power_t': 0.5, 'penalty': 'l1', 'n_iter_no_change': 5, 'max_iter': 2000, 'loss': 'hinge', 'learning_rate': 'optimal', 'eta0': 0.1, 'early_stopping': False, 'alpha': 0.01}
	best 'f1_micro' score=0.9619047619047618
	best index=11

Best estimator CTOR:
	SGDClassifier(alpha=0.01, eta0=0.1, max_iter=2000, penalty='l1',
              random_state=42, validation_fraction=0.3)

Grid scores ('f1_micro') on development set:
	[ 0]: 0.571 (+/-0.447) for {'validation_fraction': 0.1, 'power_t': 0.1, 'penalty': 'l2', 'n_iter_no_change': 10, 'max_iter': 300, 'loss': 'squared_hinge', 'learning_rate': 'constant', 'eta0': 0.2, 'early_stopping': False, 'alpha': 0.01}
	[ 1]: 0.886 (+/-0.177) for {

Grid search is systematic: it evaluates every combination in the grid, which makes it exhaustive but computationally expensive, especially for large hyperparameter spaces.
Random search evaluates only a random subset of the combinations (here `n_iter=20`), which makes it far cheaper when the hyperparameter space is vast, but it does not guarantee finding the best combination in that space.

Both grid search (`GridSearchCV`) and random search (`RandomizedSearchCV`) are run on the same `SGDClassifier` with the same hyperparameter names, but the value lists are not identical: the random search adds extra values for `max_iter`, `alpha`, `eta0`, `power_t`, and `validation_fraction`. The comparison therefore illustrates the trade-off between an exhaustive search (22.03 sec) and a randomized one (0.17 sec) rather than a strict like-for-like benchmark.
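
The contrast between an exhaustive and a random search can be sketched with the standard library alone. The code below draws 20 random combinations from the value lists used in the Qc cell and reports how small a share of the full grid that is. This is a simplified stand-in and not the sampler that `RandomizedSearchCV` uses, so the draws will not match the notebook output; the seed only makes the sketch repeatable.

```python
import random
from math import prod

# Value lists copied from the Qc cell of this notebook.
random_tuning_parameters = {
    'loss': ['hinge', 'squared_hinge', 'perceptron'],
    'max_iter': [300, 900, 1400, 2000],
    'alpha': [0.0001, 0.001, 0.01, 0.1],
    'eta0': [0.01, 0.1, 0.2],
    'penalty': ['l1', 'l2'],
    'learning_rate': ['constant', 'optimal', 'invscaling'],
    'power_t': [0.1, 0.5, 0.8],
    'early_stopping': [False],
    'validation_fraction': [0.1, 0.2, 0.3],
    'n_iter_no_change': [5, 10, 20],
}

# Size of the full grid that an exhaustive search would have to cover.
space_size = prod(len(values) for values in random_tuning_parameters.values())

rng = random.Random(42)  # seed for this sketch only
n_iter = 20

# One candidate = one independently chosen value per hyperparameter.
sampled = [
    {name: rng.choice(values) for name, values in random_tuning_parameters.items()}
    for _ in range(n_iter)
]

print(f"full grid size: {space_size}")
print(f"sampled: {len(sampled)} ({len(sampled) / space_size:.4%} of the grid)")
print("first draw:", sampled[0])
```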

__Result of GridSearchCV__
best: dat=iris, score=0.99048, model=SGDClassifier(alpha=0.001,early_stopping=False,eta0=0.01,learning_rate='optimal',loss='perceptron',max_iter=300,n_iter_no_change=20,penalty='l1',power_t=0.1,validation_fraction=0.1)

__Result of RandomizedSearchCV__
best: dat=iris, score=0.96190, model=SGDClassifier(alpha=0.01,early_stopping=False,eta0=0.1,learning_rate='optimal',loss='hinge',max_iter=2000,n_iter_no_change=5,penalty='l1',power_t=0.5,validation_fraction=0.3)

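As the results show, GridSearchCV achieves the higher cross-validated score, 0.99048 compared to 0.96190, so the hyperparameter combination it found performs better on the `f1_micro` scoring metric. Note that both numbers are mean cross-validation scores on the training set, and that the grid search evaluated far more candidates than the 20 drawn by the random search.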
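
Because both searches are ranked by `f1_micro`, it may help to see what that metric computes. Micro-averaging pools the true positives, false positives, and false negatives over all classes before computing F1; for single-label multiclass data such as iris this coincides with plain accuracy. The sketch below is a minimal pure-Python version written for illustration, not scikit-learn's implementation, and the labels are made up.

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1  # predicted this class, but the truth differs
            elif truth == label:
                fn += 1  # missed a sample of this class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration only (not taken from the notebook output).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

print(f"f1_micro = {f1_micro(y_true, y_pred):.4f}")
print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
```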

### Qd MNIST Search Quest II

For the MNIST dataset, we chose the `RandomForestClassifier`, a versatile model known for its effectiveness in classification tasks. We expect it to suit the complexity of MNIST because it combines multiple decision trees, which improves overall predictive accuracy and mitigates overfitting.

The RandomForestClassifier was configured with a set of hyperparameters defined in the tuning_parameters. These parameters cover various aspects such as the number of trees (n_estimators), tree depth (max_depth), minimum samples required to split an internal node (min_samples_split), and other crucial settings.

To efficiently search through this hyperparameter space, we employed RandomizedSearchCV.

To ensure reproducibility, we set the `random_state` parameter. A higher number of iterations (`n_iter=100`) lets `RandomizedSearchCV` sample more hyperparameter combinations, which increases the likelihood of finding good settings.
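
A common back-of-the-envelope argument for why more iterations help: if a fraction `p` of all combinations counts as "good", then `n` independent random draws contain at least one of them with probability `1 - (1 - p)**n`. The sketch below evaluates this for the two `n_iter` values used in this notebook. It assumes independent, uniform draws, which only approximates what the search does, and `top_fraction=0.05` is an arbitrary illustrative choice.

```python
def prob_at_least_one_hit(n_iter, top_fraction):
    """Probability that n_iter independent uniform draws include at least
    one candidate from the best `top_fraction` of the search space."""
    return 1.0 - (1.0 - top_fraction) ** n_iter


# n_iter=20 was used in Qc, n_iter=100 in Qd.
for n_iter in (20, 100):
    p = prob_at_least_one_hit(n_iter, top_fraction=0.05)
    print(f"n_iter={n_iter:3d}: P(at least one draw in the top 5%) = {p:.3f}")
```

The estimate says nothing about how good the "top 5%" actually is for a given model and dataset; it only shows how quickly the chance of landing in that region grows with `n_iter`.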


In [None]:
from sklearn.ensemble import RandomForestClassifier

# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData(
    'mnist')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = RandomForestClassifier(
    random_state=83
)  

tuning_parameters = {
    'n_estimators': [10, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'class_weight': [None, 'balanced', 'balanced_subsample'],
    'max_leaf_nodes': [None, 5, 10, 20],
    'min_impurity_decrease': [0.0, 0.1, 0.2],
    'oob_score': [True, False],
    'max_features': ['sqrt', 'log2'],
    'bootstrap': [True],
    'criterion': ['gini', 'entropy'],
    'warm_start': [True, False],
    'ccp_alpha': [0.0, 0.1, 0.2]
}

CV = 5
VERBOSE = 0

# Run RandomizedSearchCV for the model
grid_tuned = RandomizedSearchCV(model, 
                                tuning_parameters, 
                                n_iter=100, 
                                random_state=66, 
                                cv=CV, 
                                scoring='f1_micro', 
                                verbose=VERBOSE, 
                                n_jobs=-1)

start = time()
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)
print('OK(randomized-search)')


__Conclusion__

We made various adjustments, tweaking parameters such as the number of iterations (`n_iter`), `max_depth`, `min_samples_split`, and others. Despite these changes, the score stayed close to 0.96, differing by only about 0.001 after tuning. This suggests that, within this search space, the random forest has reached a plateau on MNIST, so a significantly higher score would probably require a different model rather than further tuning of these hyperparameters. The best result is as follows:

Best Result: Dataset=MNIST, Score=0.96553, Model=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight='balanced_subsample', criterion='entropy', max_depth=20, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, min_samples_leaf=1, min_samples_split=5, n_estimators=200, oob_score=False, warm_start=True).