# Template Model Development Notebook

The instructions in this notebook guide you through setting up your ML project and connecting the pieces of the automated deployment workflow between this ML repository and Algorithmia.

## 1. Train, evaluate and test

In [1]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


if __name__ == "__main__":
    X, y = make_classification(
        n_samples=1000, n_features=10, class_sep=0.1, random_state=42
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    clf = RandomForestClassifier(n_estimators=10, max_depth=8, random_state=42)
    clf.fit(X_train, y_train)
    rfc_predict = clf.predict(X_test)

    # Split the test samples by confidence: a class-0 probability strictly
    # between 0.40 and 0.60 counts as "not confident".
    confuseds = []
    confuseds_y = []
    confidents = []

    raw_proba = clf.predict_proba(X_test)
    probs = raw_proba[:, 0]
    not_confident = 0
    for i in range(X_test.shape[0]):
        proba = probs[i]
        if 0.40 < proba < 0.60:
            confuseds.append(X_test[i])
            confuseds_y.append(y_test[i])
            not_confident += 1
        else:
            confidents.append(X_test[i])
    print(f"not confident: {not_confident}")

    np_confidents = np.array(confidents)
    np_confuseds = np.array(confuseds)
    np_confuseds_y = np.array(confuseds_y)

    # Sanity check: re-scoring only the confident samples with the same model
    # is expected to report 0 "not confident" predictions.
    raw_proba = clf.predict_proba(np_confidents)
    probs = raw_proba[:, 0]
    not_confident = 0
    for i in range(len(confidents)):
        proba = probs[i]
        if 0.40 < proba < 0.60:
            not_confident += 1
    print(f"not confident: {not_confident}")

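    # Retrain with the uncertain samples added to the training set.
    # Note: these samples come from X_test, so any score measured on X_test
    # after this step is optimistic.
    if len(confuseds) > 0:
        X_train = np.concatenate((X_train, np_confuseds))
        y_train = np.concatenate((y_train, np_confuseds_y))
    clf = RandomForestClassifier(n_estimators=10, max_depth=8, random_state=42)
    clf.fit(X_train, y_train)
    rfc_predict = clf.predict(X_test)

    # Count the uncertain predictions of the retrained model. The counter is
    # reset explicitly so the result does not depend on the previous check.
    raw_proba = clf.predict_proba(X_test)
    probs = raw_proba[:, 0]
    not_confident = 0
    for i in range(X_test.shape[0]):
        proba = probs[i]
        if 0.40 < proba < 0.60:
            not_confident += 1
    print(f"not confident: {not_confident}")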


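The per-sample loops in the training cell can also be written with a boolean mask. The following is a minimal sketch, not part of the original template: `split_by_confidence` is a hypothetical helper, and its 0.40/0.60 defaults simply mirror the thresholds used in the cell above.

```python
import numpy as np


def split_by_confidence(probs, low=0.40, high=0.60):
    """Return a boolean mask marking probabilities strictly inside (low, high).

    `probs` is a 1-D sequence of class probabilities, for example one column
    of `predict_proba`. The helper name and default bounds are illustrative.
    """
    probs = np.asarray(probs, dtype=float)
    return (probs > low) & (probs < high)


# Small hand-made example instead of real model output.
example_probs = np.array([0.05, 0.45, 0.59, 0.60, 0.93])
uncertain = split_by_confidence(example_probs)

print(f"not confident: {int(uncertain.sum())}")
print(example_probs[~uncertain])  # the confident entries
```

With the notebook's variables this would read `mask = split_by_confidence(clf.predict_proba(X_test)[:, 0])`, after which `X_test[mask]` and `y_test[mask]` give the uncertain samples and their labels.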

## 2. Saving model object to a file
When your model is ready, save it locally. By default, the GitHub Action expects the model file to be named **`model.pkl`**. If your model file has a different name, set that name in your **`.github/workflows/algorithmia_deploy.yml`**.

You do not need to check in your model file to this repository. During the workflow, this notebook will be executed on a GitHub worker machine and the resulting model file will be uploaded to Algorithmia by our GitHub Action.

In [None]:
import joblib

# `clf` is the classifier trained in section 1.
joblib.dump(clf, "model.pkl", compress=True)
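
Before relying on the saved file, it can be worth confirming that it loads back and predicts the same values. This is a minimal sketch under stated assumptions, not part of the template: `check_model_file` is a hypothetical helper, and the demo trains a small throwaway model and writes to a temporary directory so that no real `model.pkl` is overwritten.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def check_model_file(model, X_sample, path="model.pkl"):
    """Dump `model` to `path`, reload it, and compare predictions.

    Hypothetical helper: returns True when the reloaded object predicts
    exactly the same values as the in-memory one on `X_sample`.
    """
    joblib.dump(model, path, compress=True)
    reloaded = joblib.load(path)
    return bool(np.array_equal(model.predict(X_sample), reloaded.predict(X_sample)))


# Demo on a small synthetic problem (illustrative sizes only).
X_demo, y_demo = make_classification(n_samples=100, n_features=10, random_state=42)
demo_model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    round_trip_ok = check_model_file(
        demo_model, X_demo, os.path.join(tmp_dir, "model.pkl")
    )

print(f"round trip ok: {round_trip_ok}")
```

In this notebook the same check could be run on the trained classifier and a few rows of `X_test`.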

## 3. Testing serving (Algorithm) code

In [None]:
%run synthetic_binaryclassifier/src/synthetic_binaryclassifier.py

## Final checks before committing this notebook

- Once you are happy with your tests, you can remove the **`if __name__ == "__main__"`** snippet from your algorithm script. Remember that all of its contents will be pushed to Algorithmia. Don't stress though, you can always edit!

- Make sure that you do not commit/push your Algorithmia API Key!
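
One way to keep the key out of the notebook and the algorithm script is to read it from an environment variable at run time. The sketch below is an assumption rather than part of this template: the variable name `ALGORITHMIA_API_KEY` and the helper `get_api_key` are hypothetical, so adapt them to however your environment stores secrets.

```python
import os


def get_api_key(var_name="ALGORITHMIA_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    Raises a RuntimeError with a short hint when the variable is unset or
    empty, so a missing key fails early. The variable name is a
    hypothetical choice, not something this template defines.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running."
        )
    return api_key
```

The returned value can then be passed to the Algorithmia client, for example `Algorithmia.client(get_api_key())`, without the key ever appearing in a committed file.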