### Recognizing handwritten digits

* Example adapted from [scikit-learn](https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html#sphx-glr-auto-examples-classification-plot-digits-classification-py)

* Train an SVM classifier to recognize handwritten digits (0-9)

* The dataset consists of `8x8` pixel images of digits
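Before an image reaches the classifier, each `8x8` grid is flattened row by row into a 64-element feature vector; in scikit-learn's digits dataset these vectors live in `dataset.data`, while `dataset.images` keeps the 2D form. A minimal pure-Python sketch of that flattening (the helper `flatten_image` and the synthetic image are hypothetical, for illustration only):

```python
def flatten_image(image):
    """Flatten a 2D image (a list of rows) into a 1D feature list, row by row."""
    return [pixel for row in image for pixel in row]


# Hypothetical synthetic 8x8 "image" whose pixel values are 0..63.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
features = flatten_image(image)
print(len(features))  # 64
```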

In [None]:
import covalent as ct
from sklearn import datasets, svm, metrics
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

### Break the training/model building into discrete steps

1. Load and prepare the dataset

2. Instantiate a base classifier

3. Split the entire dataset into `training` and `test` sets

4. Train the classifier using the `training` set

5. Get predictions using the `test` set

6. Generate classification report
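Before bringing in Covalent, the six steps above can be sketched as ordinary functions whose outputs feed the next step. To stay self-contained, this sketch swaps the SVM for a hypothetical nearest-centroid classifier (`NearestCentroidToy`), uses a tiny made-up dataset, and replaces the classification report with a plain accuracy score; only the shape of the pipeline mirrors the notebook, and all `toy_*` names are hypothetical.

```python
from math import dist


class NearestCentroidToy:
    """Hypothetical stand-in for svm.SVC: predicts the class whose mean is closest."""

    def fit(self, features, targets):
        sums, counts = {}, {}
        for x, y in zip(features, targets):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids_ = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self  # scikit-learn convention: fit() returns the fitted estimator

    def predict(self, features):
        return [min(self.centroids_, key=lambda label: dist(x, self.centroids_[label]))
                for x in features]


def toy_load_dataset():
    # Step 1: two well-separated 2D classes, interleaved so an unshuffled split sees both.
    data = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10], [1, 1], [11, 11]]
    target = [0, 1, 0, 1, 0, 1, 0, 1]
    return data, target


def toy_build_classifier():
    return NearestCentroidToy()  # Step 2


def toy_split_data(features, targets, test_set_size):
    # Step 3: unshuffled split, leading samples train and trailing samples test.
    n_train = len(features) - round(len(features) * test_set_size)
    return features[:n_train], features[n_train:], targets[:n_train], targets[n_train:]


def toy_train_classifier(clf, features, targets):
    return clf.fit(features, targets)  # Step 4


def toy_get_predictions(clf, test_features):
    return clf.predict(test_features)  # Step 5


def toy_get_accuracy(y_test, predictions):
    # Step 6 stand-in: a single accuracy number instead of a full report.
    return sum(t == p for t, p in zip(y_test, predictions)) / len(y_test)


features, targets = toy_load_dataset()
clf = toy_build_classifier()
x_train, x_test, y_train, y_test = toy_split_data(features, targets, test_set_size=0.5)
clf = toy_train_classifier(clf, x_train, y_train)
predictions = toy_get_predictions(clf, x_test)
print(toy_get_accuracy(y_test, predictions))  # 1.0
```

Each function depends only on the outputs of earlier steps, which is exactly the dependency structure a Covalent lattice captures when these functions become electrons.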

### Load the dataset


In [None]:
@ct.electron
def load_dataset():
    return datasets.load_digits()

### Instantiate an SVM classifier

In [None]:
@ct.electron
def build_classifier(gamma: float):
    # SVC uses the RBF kernel by default; gamma controls how far each sample's influence reaches
    return svm.SVC(gamma=gamma)
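`svm.SVC` uses the radial basis function (RBF) kernel by default, $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$, so `gamma` sets how quickly similarity decays with distance: larger values make each training sample's influence more local. A small pure-Python sketch of the kernel value (the name `rbf_similarity` is hypothetical, not scikit-learn's API):

```python
from math import exp


def rbf_similarity(x, z, gamma):
    """RBF kernel value exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * squared_distance)


x, z = [0.0, 0.0], [3.0, 4.0]  # squared distance = 25
for gamma in (0.001, 0.01, 0.1):
    # Similarity between the same two points shrinks as gamma grows.
    print(gamma, rbf_similarity(x, z, gamma))
```

Identical points always have similarity 1; for any fixed pair of distinct points, increasing `gamma` drives the similarity toward 0.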

### Split the dataset into train/test splits

In [None]:
@ct.electron
def split_data(features, targets, test_set_size):
    # shuffle=False keeps the original sample order: leading samples train, trailing samples test
    x_train, x_test, y_train, y_test = train_test_split(
        features, targets, test_size=test_set_size, shuffle=False
    )
    return x_train, x_test, y_train, y_test
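With `shuffle=False`, `train_test_split` does not reorder the samples: the leading samples form the training set and the trailing ones the test set. As far as we understand scikit-learn's behaviour, a float `test_size` is converted to a sample count by rounding up, so the 1,797 digit images with `test_set_size=0.5` should give 898 training and 899 test samples. A pure-Python sketch of that split (`sequential_split` is a hypothetical helper, not part of scikit-learn):

```python
from math import ceil


def sequential_split(items, test_size):
    """Split a sequence without shuffling; test_size is a fraction in (0, 1)."""
    n_test = ceil(test_size * len(items))  # round the test-set size up
    n_train = len(items) - n_test
    return items[:n_train], items[n_train:]


train, test = sequential_split(list(range(1797)), test_size=0.5)
print(len(train), len(test))  # 898 899
```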

### Train the classifier

In [None]:
@ct.electron
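def train_classifier(clf, features, targets):
    # scikit-learn's fit() returns the fitted estimator itself, so it can be passed to the next electron
    return clf.fit(features, targets)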

### Get model predictions

In [None]:
@ct.electron
def get_predictions(clf, test_features):
    return clf.predict(test_features)

### Generate classification report

In [None]:
@ct.electron
def get_classification_report(y_test, predictions):
    return metrics.classification_report(y_test, predictions)
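`classification_report` summarizes, for each class, precision $TP / (TP + FP)$, recall $TP / (TP + FN)$, their harmonic mean (the F1 score), and support, the number of true samples in that class. A pure-Python sketch of those per-class numbers on toy labels (`per_class_metrics` is a hypothetical name; scikit-learn's report also adds accuracy and macro/weighted averages, which are omitted here):

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1, tp + fn)
    return report


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
for label, (p, r, f, s) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```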

### Build the ML workflow

In [None]:
@ct.lattice
def classify_digits(gamma: float):
    dataset = load_dataset()
    clf = build_classifier(gamma)
    x_train, x_test, y_train, y_test = split_data(features=dataset.data, targets=dataset.target, test_set_size=0.5)
    clf = train_classifier(clf, features=x_train, targets=y_train)
    predictions = get_predictions(clf, x_test)
    clf_report = get_classification_report(y_test, predictions)
    return y_test, predictions, clf_report

#### Workflow graph

![SVM workflow](./assets/svm_ml_workflow.png)
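In broad terms, this graph follows the data dependencies inside `classify_digits`: an electron can run once every electron whose output it consumes has finished. The sketch below illustrates that ordering with Python's standard `graphlib.TopologicalSorter` (Python 3.9+); it is only an analogy and not how Covalent schedules tasks internally.

```python
from graphlib import TopologicalSorter

# Each electron in classify_digits, mapped to the electrons whose outputs it consumes.
dependencies = {
    "load_dataset": set(),
    "build_classifier": set(),
    "split_data": {"load_dataset"},
    "train_classifier": {"build_classifier", "split_data"},
    "get_predictions": {"train_classifier", "split_data"},
    "get_classification_report": {"split_data", "get_predictions"},
}

# One valid execution order; load_dataset and build_classifier are independent of each other.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```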

### Dispatch the training workflow

In [None]:
dispatch_id = ct.dispatch(classify_digits)(gamma=0.001)

In [None]:
result = ct.get_result(dispatch_id=dispatch_id, wait=True)

In [None]:
y_test, predictions, clf_report = result.result

#### Classification report

In [None]:
print(clf_report)

In [None]:
metrics.ConfusionMatrixDisplay.from_predictions(y_test, predictions)
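The confusion matrix counts, for every pair of true and predicted labels, how many test samples land in that cell; the diagonal holds the correct predictions. A pure-Python sketch of the underlying counts on toy labels (`confusion_counts` is a hypothetical helper; rows are true labels and columns predicted labels, the same orientation scikit-learn's `confusion_matrix` uses):

```python
def confusion_counts(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts true label i predicted as j."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


# Toy labels for illustration only.
labels, matrix = confusion_counts([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
for label, row in zip(labels, matrix):
    print(label, row)
```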

## Sublattices

In the above example, we arbitrarily chose $\gamma = 0.001$ for the classifier; however, this may not be the optimal choice for this parameter. In machine learning, hyperparameter optimization is a crucial step in improving a model's performance: it searches for the hyperparameter values that yield the best predictor/classifier.

With `Covalent`, iterating on existing workflows and composing them into larger ones is simple. We can build larger, reusable workflows by additionally decorating an existing lattice with the `electron` decorator. This turns the lattice into a node that can be embedded within a larger workflow, called a *sublattice*.

In the context of optimizing our SVM classifier, we can turn `classify_digits` into a sublattice and use it as a node in a larger workflow.

In [None]:
@ct.electron
def get_model_accuracy_score(y_test, predictions):
    # accuracy_score returns a fraction in [0, 1]; scale it to a percentage
    return accuracy_score(y_test, predictions) * 100
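`accuracy_score` returns the fraction of predictions that match the true labels, so multiplying by 100 gives a percentage. The same computation in plain Python makes the formula explicit (`accuracy_percent` is a hypothetical name, and the labels are toy values):

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100 * correct / len(y_true)


print(accuracy_percent([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))  # 80.0
```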

In [None]:
@ct.electron  # stacking electron on top of lattice turns this lattice into a sublattice
@ct.lattice
def classify_digits(gamma: float):
    dataset = load_dataset()
    clf = build_classifier(gamma)
    x_train, x_test, y_train, y_test = split_data(features=dataset.data, targets=dataset.target, test_set_size=0.5)
    clf = train_classifier(clf, features=x_train, targets=y_train)
    predictions = get_predictions(clf, x_test)
    clf_report = get_classification_report(y_test, predictions)
    return y_test, predictions, clf_report

### Workflow for tuning $\gamma$

In [None]:
@ct.lattice
def hyperparameter_tune_gamma(gamma_values):
    results = {}
    for gamma in gamma_values:
        results[f"{gamma}"] = {}
        y_test, predictions, clf_report = classify_digits(gamma)
        results[f"{gamma}"]["accuracy"] = get_model_accuracy_score(y_test, predictions)
    return results

In [None]:
import numpy as np
gamma_values = np.linspace(0.001, 0.003, 10)
dispatch_id = ct.dispatch(hyperparameter_tune_gamma)(gamma_values)
print(dispatch_id)
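`np.linspace(0.001, 0.003, 10)` returns 10 evenly spaced values with both endpoints included, so consecutive values of $\gamma$ differ by $(0.003 - 0.001) / 9 \approx 0.000222$. A pure-Python sketch of the same grid (this `linspace` helper is illustrative and not NumPy's implementation; like NumPy with its default `endpoint=True`, it sets the last value to `stop` exactly):

```python
def linspace(start, stop, num):
    """Return num evenly spaced floats from start to stop, both endpoints included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    # Build all but the last value from the step, then pin the endpoint exactly.
    return [start + i * step for i in range(num - 1)] + [float(stop)]


gamma_grid = linspace(0.001, 0.003, 10)
print(len(gamma_grid), gamma_grid[0], gamma_grid[-1])
```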

In [None]:
results = ct.get_result(dispatch_id=dispatch_id, wait=True)

In [None]:
for gamma in gamma_values:
    print(f"Gamma: {gamma}", f"score: {results.result[f'{gamma}']['accuracy']}")
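The printed scores can also be reduced to a single best value. The sketch below assumes the `{gamma_string: {"accuracy": score}}` layout that `hyperparameter_tune_gamma` returns; `best_gamma` is a hypothetical helper, and the sample dictionary holds made-up scores purely to show the call.

```python
def best_gamma(results):
    """Return (gamma, accuracy) for the entry with the highest accuracy."""
    gamma_key = max(results, key=lambda key: results[key]["accuracy"])
    return float(gamma_key), results[gamma_key]["accuracy"]


# Made-up scores, only to illustrate the dictionary layout.
example = {
    "0.001": {"accuracy": 90.0},
    "0.002": {"accuracy": 95.0},
    "0.003": {"accuracy": 92.5},
}
print(best_gamma(example))  # (0.002, 95.0)
```

With the real dispatch, the same helper would be called as `best_gamma(results.result)`; if several values tie, `max` keeps the first one it encounters.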