# Algorithm Tuning

Algorithm tuning tests several models on a given dataset and identifies which one achieves the highest value of a user-defined performance metric on that dataset.

Clone the repository containing the notebooks and their data.

In [None]:
!git clone https://github.com/TurboML-Inc/colab-notebooks.git

Set up the environment and install TurboML's SDK. 

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()
!bash colab-notebooks/install_turboml.sh

The kernel should now be restarted with TurboML's SDK installed.

In [None]:
%cd colab-notebooks

Import the necessary modules and log in to your TurboML instance.

In [None]:
import pandas as pd
import turboml as tb
from sklearn import metrics

# BACKEND_URL and API_KEY must already hold the values for your TurboML instance.
tb.init(backend_url=BACKEND_URL, api_key=API_KEY)

Read the transactions and labels data. Calling `reset_index()` adds an `index` column, which serves as the primary key in the cells that follow.

In [None]:
transactions_df = pd.read_csv("data/transactions.csv").reset_index()
labels_df = pd.read_csv("data/labels.csv").reset_index()
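The `reset_index()` call is what produces the `index` column later passed as `key_field`. A minimal sketch with a hypothetical two-row frame (the values are made up for illustration and are not from `transactions.csv`) shows the effect:

```python
import pandas as pd

# Hypothetical toy frame standing in for the real transactions data
toy_df = pd.DataFrame({"transactionAmount": [12.5, 99.0]})

# reset_index() moves the default RangeIndex into a new column named "index"
toy_with_key = toy_df.reset_index()

print(list(toy_with_key.columns))  # ['index', 'transactionAmount']
```

Because the new column is just the row position, it is unique per row, which is what a primary key needs.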

## Dataset

We use the `PandasDataset` class to create the datasets used for tuning, and specify the primary-key column through `key_field`.

For this example, we use the first 100k rows.

In [None]:
transactions_100k = tb.PandasDataset(
    dataframe=transactions_df[:100000], key_field="index", streaming=False
)
labels_100k = tb.PandasDataset(
    dataframe=labels_df[:100000], key_field="index", streaming=False
)

In [None]:
numerical_fields = [
    "transactionAmount",
]
categorical_fields = ["digitalItemCount", "physicalItemCount", "isProxyIP"]
inputs = transactions_100k.get_input_fields(
    numerical_fields=numerical_fields, categorical_fields=categorical_fields
)
label = labels_100k.get_label_field(label_field="is_fraud")

## Training/Tuning

We will compare a neural network (`NeuralNetwork`) against a `HoeffdingTreeClassifier`, and the metric we optimize is `accuracy`.

Configure the neural network for the dataset by appending a layer with `output_size=2`, matching the two classes.

In [None]:
new_layer = tb.NNLayer(output_size=2)

nn = tb.NeuralNetwork()
nn.layers.append(new_layer)

The `algorithm_tuning` function takes the models to test as a list, along with the metric to optimize. As used here, it returns a list of (model, score) pairs, and the first entry is taken as the best-scoring model.

In [None]:
model_score_list = tb.algorithm_tuning(
    models_to_test=[
        tb.HoeffdingTreeClassifier(n_classes=2),
        nn,
    ],
    metric_to_optimize="accuracy",
    input=inputs,
    labels=label,
)
best_model, best_score = model_score_list[0]
best_model
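To make the idea concrete, here is a rough sketch of the same compare-and-rank pattern using scikit-learn. It is not TurboML's implementation: the batch estimators, the synthetic data, the held-out split, and the names `candidates`, `scored`, `best_model_sketch`, and `best_score_sketch` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate models to compare (stand-ins, not TurboML models)
candidates = [
    DecisionTreeClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
]

# Fit each candidate and record its accuracy on the held-out split
scored = []
for model in candidates:
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    scored.append((model, score))

# Sort so the highest-scoring model comes first
scored.sort(key=lambda pair: pair[1], reverse=True)
best_model_sketch, best_score_sketch = scored[0]
print(type(best_model_sketch).__name__, best_score_sketch)
```

The result has the same shape as `model_score_list` above: a list of (model, score) pairs whose first element is the winner.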

## Testing

Once the best-performing model is identified, we can use it as usual for inference on the entire dataset and evaluate it on additional performance metrics.

In [None]:
transactions_full = tb.PandasDataset(
    dataframe=transactions_df, key_field="index", streaming=False
)
features = transactions_full.get_input_fields(
    numerical_fields=numerical_fields, categorical_fields=categorical_fields
)

outputs = best_model.predict(features)

In [None]:
print(
    "Accuracy: ",
    metrics.accuracy_score(labels_df["is_fraud"], outputs["predicted_class"]),
)
print("F1: ", metrics.f1_score(labels_df["is_fraud"], outputs["predicted_class"]))
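Reporting F1 alongside accuracy is useful because fraud labels are often imbalanced (whether this dataset is imbalanced is not stated here). The sketch below uses hypothetical labels, not the notebook's data, to show how a model that never predicts fraud can still score high accuracy:

```python
from sklearn import metrics

# Hypothetical labels: 1 fraud case among 10 transactions
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A degenerate model that always predicts "not fraud"
y_pred = [0] * 10

accuracy = metrics.accuracy_score(y_true, y_pred)
# zero_division=0 returns 0 instead of warning when a ratio is undefined
recall = metrics.recall_score(y_true, y_pred, zero_division=0)
f1 = metrics.f1_score(y_true, y_pred, zero_division=0)

print("Accuracy:", accuracy)  # 9 of 10 predictions are correct
print("Recall:", recall)      # the single fraud case is missed
print("F1:", f1)
```

Here accuracy is 0.9 while recall and F1 are both 0, so comparing several metrics gives a fuller picture of the selected model.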