# Tabular Classification Example

## Introduction

In this notebook, we'll walk through a detailed example of how you can use Velour to evaluate classifications made on a tabular dataset. We'll use `sklearn`'s breast cancer dataset to make a binary prediction about whether a tumor is malignant or benign based on a table of descriptive features (e.g., mean radius, mean texture).

For a conceptual introduction to Velour, [check out our project overview](https://striveworks.github.io/velour/). For a higher-level example notebook, [check out our "Getting Started" notebook](https://github.com/Striveworks/velour/blob/main/examples/getting_started.ipynb).


## Defining Our Datasets

We start by fetching our dataset, dividing it into test/train splits, and uploading both sets to Velour.

In [6]:
%pip install scikit-learn

from velour import Dataset, Model, Datum, Annotation, GroundTruth, Prediction, Label
from velour.enums import TaskType
from velour.client import Client

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# connect to Velour API
client = Client("http://localhost:8000")

Note: you may need to restart the kernel to use updated packages.
Successfully connected to host at http://localhost:8000/


In [7]:
# load data from sklearn
dset = load_breast_cancer()
dset.feature_names

array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error',
       'fractal dimension error', 'worst radius', 'worst texture',
       'worst perimeter', 'worst area', 'worst smoothness',
       'worst compactness', 'worst concavity', 'worst concave points',
       'worst symmetry', 'worst fractal dimension'], dtype='<U23')

In [8]:
# split datasets
X, y, target_names = dset["data"], dset["target"], dset["target_names"]
X_train, X_test, y_train, y_test = train_test_split(X, y)

# show an example input
X_train.shape, y_train[:4], target_names

((426, 30), array([1, 1, 1, 0]), array(['malignant', 'benign'], dtype='<U9'))
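
The training shape of `(426, 30)` follows from `train_test_split`'s default `test_size` of 0.25 applied to the dataset's 569 rows. Here is a minimal sketch of that arithmetic, assuming scikit-learn's default test fraction and its behavior of rounding the test count up. The helper name `split_sizes` is ours for illustration and is not part of sklearn.

```python
import math

def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test size, rounding the test set up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# the breast cancer dataset has 569 rows
n_train, n_test = split_sizes(569)
print(n_train, n_test)
```

Because no `random_state` is passed in the cell above, the specific rows in each split (and therefore the exact metrics later in this notebook) may differ from run to run, while the split sizes stay the same.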

In [9]:
# create train dataset in Velour
velour_train_dataset = Dataset(client, "breast-cancer-train", reset=True)

# create test dataset in Velour
velour_test_dataset = Dataset(client, "breast-cancer-test", reset=True)

### Adding GroundTruths to our Dataset

Now that our two datasets exist in Velour, we can add `GroundTruths` to each dataset.

In [10]:
# format training groundtruths
training_groundtruths = [
    GroundTruth(
        datum=Datum(
            uid=f"train{i}",
        ),
        annotations=[
            Annotation(
                task_type=TaskType.CLASSIFICATION,
                labels=[Label(key="class", value=target_names[t])]
            )
        ]
    )
    for i, t in enumerate(y_train)
]

# format testing groundtruths
testing_groundtruths = [
    GroundTruth(
        datum=Datum(
            uid=f"test{i}",
        ),
        annotations=[
            Annotation(
                task_type=TaskType.CLASSIFICATION,
                labels=[Label(key="class", value=target_names[t])]
            )
        ]
    )
    for i, t in enumerate(y_test)
]

# add the training groundtruths
for gt in training_groundtruths:
    velour_train_dataset.add_groundtruth(gt)

# add the testing groundtruths
for gt in testing_groundtruths:
    velour_test_dataset.add_groundtruth(gt)
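
Two details in the cell above are worth spelling out: `target_names[t]` turns sklearn's integer targets into readable label values, and the `uid` gives each row a stable identifier that predictions can later reference. The sketch below illustrates both with plain dictionaries standing in for Velour objects. The dictionary layout and the helper `to_records` are hypothetical and are not Velour's API.

```python
target_names = ["malignant", "benign"]  # same order as sklearn's dset["target_names"]
y_example = [1, 1, 1, 0]                # the first four training targets shown earlier

def to_records(targets, prefix):
    """Pair a stable uid with a human-readable class label for each integer target."""
    return [
        {"uid": f"{prefix}{i}", "key": "class", "value": target_names[t]}
        for i, t in enumerate(targets)
    ]

records = to_records(y_example, "train")
print(records[0])
print(records[3])
```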

### Finalizing Our Datasets

Lastly, we finalize both datasets to prep them for evaluation.

In [11]:
velour_train_dataset.finalize()
velour_test_dataset.finalize()

<Response [200]>

## Defining Our Model

Now that our `Datasets` have been defined, we can describe our model in Velour using the `Model` object.

In [12]:
# fit an sklearn model to our data
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# get predictions on both of our datasets
y_train_probs = pipe.predict_proba(X_train)
y_test_probs = pipe.predict_proba(X_test)

# show an example output
y_train_probs[:4]

array([[6.69431274e-03, 9.93305687e-01],
       [9.53969288e-05, 9.99904603e-01],
       [2.06847800e-01, 7.93152200e-01],
       [9.99999993e-01, 6.66395645e-09]])

In [13]:
# create our model in Velour
velour_model = Model(client, "breast-cancer-linear-model")

### Adding Predictions to Our Model

With our model defined in Velour, we can post predictions for each of our `Datasets` to our `Model` object. Each `Prediction` should contain a list of `Labels` describing the prediction and its associated confidence score. Since we're running a classification task, the confidence scores over all prediction classes should sum to (approximately) 1.
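
Before posting predictions, it can be useful to confirm that each row of scores sums to roughly 1. The sketch below does this for the example `predict_proba` rows shown earlier. The helper name `scores_sum_to_one` and the tolerance of `1e-6` are our own choices, not requirements stated by Velour.

```python
import math

# rows copied from the example predict_proba output: [P(malignant), P(benign)]
rows = [
    [6.69431274e-03, 9.93305687e-01],
    [9.53969288e-05, 9.99904603e-01],
    [2.06847800e-01, 7.93152200e-01],
    [9.99999993e-01, 6.66395645e-09],
]

def scores_sum_to_one(scores, tol=1e-6):
    """Return True if the scores sum to 1 within an absolute tolerance."""
    return math.isclose(sum(scores), 1.0, abs_tol=tol)

all_valid = all(scores_sum_to_one(row) for row in rows)
print(all_valid)
```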

In [14]:

# define our predictions
training_predictions = [
    Prediction(
        datum=Datum(
            dataset=velour_train_dataset.name,
            uid=f"train{i}",
        ),
        annotations=[
            Annotation(
                task_type=TaskType.CLASSIFICATION,
                labels=[
                    Label(
                        key="class", 
                        value=target_names[j],
                        score=p,
                    )                        
                    for j, p in enumerate(prob)
                ]
            )
        ]
    )
    for i, prob in enumerate(y_train_probs)
]

testing_predictions = [
    Prediction(
        datum=Datum(
            dataset=velour_test_dataset.name,
            uid=f"test{i}",
        ),
        annotations=[
            Annotation(
                task_type=TaskType.CLASSIFICATION,
                labels=[
                    Label(
                        key="class",
                        value=target_names[j],
                        score=p,
                    )                        
                    for j, p in enumerate(prob)
                ]
            )
        ]
    )
    for i, prob in enumerate(y_test_probs)
]

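# add the train predictions
# (the loop variable is named `pred` to avoid shadowing the conventional pandas alias `pd`)
for pred in training_predictions:
    velour_model.add_prediction(pred)

# add the test predictions
for pred in testing_predictions:
    velour_model.add_prediction(pred)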

ClientException: cannot edit inferences for model`breast-cancer-linear-model` on dataset `breast-cancer-train` since it has been finalized

### Finalizing Our Model

Finally, we finalize our `Model` to prep it for evaluation.

In [None]:
velour_model.finalize_inferences(velour_train_dataset)
velour_model.finalize_inferences(velour_test_dataset)

## Evaluating Performance

With our `Dataset` and `Model` defined, we're ready to evaluate the model's performance and display the results in a `pd.DataFrame`. All evaluations run as a postgres `BackgroundTask`, so we call the `wait_for_completion` method to ensure that the evaluation finishes before we display the results.
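
Velour's client takes care of the waiting for us, and we have not inspected how it does so. The general pattern behind a blocking wait on a background job is to poll its status until it reports completion, which the generic sketch below illustrates. `FakeJob`, its `status()` method, and the `"pending"`/`"done"` states are hypothetical stand-ins, not Velour's API or implementation.

```python
import time

class FakeJob:
    """Hypothetical background job that reports 'done' after a few status checks."""

    def __init__(self, checks_until_done):
        self._remaining = checks_until_done

    def status(self):
        if self._remaining > 0:
            self._remaining -= 1
            return "pending"
        return "done"

def wait_until_done(job, interval=0.01, timeout=5.0):
    """Poll job.status() until it reports 'done'; raise TimeoutError if the deadline passes."""
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        if job.status() == "done":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval)

polls = wait_until_done(FakeJob(checks_until_done=3))
print(polls)
```

A timeout keeps the caller from blocking forever if the job never completes, which is a reasonable default for any polling loop.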

In [None]:
train_eval_job = velour_model.evaluate_classification(velour_train_dataset)
train_eval_job.wait_for_completion()
results = train_eval_job.results()

results.dataframe

| type | parameters | label | value (dataset: breast-cancer-train) |
|---|---|---|---|
| Accuracy | {"label_key": "class"} | | 0.988263 |
| F1 | "n/a" | class: benign | 0.991055 |
| F1 | "n/a" | class: malignant | 0.982935 |
| Precision | "n/a" | class: benign | 0.989286 |
| Precision | "n/a" | class: malignant | 0.986301 |
| ROCAUC | {"label_key": "class"} | | 0.997294 |
| Recall | "n/a" | class: benign | 0.992832 |
| Recall | "n/a" | class: malignant | 0.979592 |


In [None]:
results.confusion_matrices

[{'label_key': 'class',
  'entries': [{'prediction': 'benign', 'groundtruth': 'benign', 'count': 277},
   {'prediction': 'benign', 'groundtruth': 'malignant', 'count': 3},
   {'prediction': 'malignant', 'groundtruth': 'benign', 'count': 2},
   {'prediction': 'malignant', 'groundtruth': 'malignant', 'count': 144}]}]
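
The confusion matrix is enough to recompute the accuracy, precision, recall, and F1 values reported in the results table above. The sketch below does so in plain Python using the standard definitions of those metrics. The helper `per_class_metrics` is ours for illustration and is not how Velour necessarily computes them.

```python
# entries copied from results.confusion_matrices above
entries = [
    {"prediction": "benign", "groundtruth": "benign", "count": 277},
    {"prediction": "benign", "groundtruth": "malignant", "count": 3},
    {"prediction": "malignant", "groundtruth": "benign", "count": 2},
    {"prediction": "malignant", "groundtruth": "malignant", "count": 144},
]

def per_class_metrics(entries, label):
    """Return (precision, recall, f1) for one label from confusion-matrix counts."""
    tp = sum(e["count"] for e in entries
             if e["prediction"] == label and e["groundtruth"] == label)
    fp = sum(e["count"] for e in entries
             if e["prediction"] == label and e["groundtruth"] != label)
    fn = sum(e["count"] for e in entries
             if e["prediction"] != label and e["groundtruth"] == label)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

total = sum(e["count"] for e in entries)
correct = sum(e["count"] for e in entries if e["prediction"] == e["groundtruth"])
accuracy = correct / total

benign = per_class_metrics(entries, "benign")
malignant = per_class_metrics(entries, "malignant")
print(round(accuracy, 6))
print([round(m, 6) for m in benign])
print([round(m, 6) for m in malignant])
```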

As a brief sanity check, we can compare Velour's outputs against `sklearn`'s own classification report. The per-class precision, recall, and F1 values, as well as the accuracy, match.

In [None]:
y_train_preds = pipe.predict(X_train)
print(classification_report(y_train, y_train_preds, digits=6, target_names=target_names))

              precision    recall  f1-score   support

   malignant   0.986301  0.979592  0.982935       147
      benign   0.989286  0.992832  0.991055       279

    accuracy                       0.988263       426
   macro avg   0.987794  0.986212  0.986995       426
weighted avg   0.988256  0.988263  0.988253       426
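
The `macro avg` and `weighted avg` rows of the classification report can be reproduced from the per-class rows: the macro average is the unweighted mean across classes, and the weighted average weights each class by its support. Below is a small sketch using the rounded values printed in the report, so the results agree only to about six decimal places. The helper names are ours for illustration.

```python
# per-class rows copied from the classification report: (precision, recall, f1, support)
report = {
    "malignant": (0.986301, 0.979592, 0.982935, 147),
    "benign": (0.989286, 0.992832, 0.991055, 279),
}

def macro_avg(report, idx):
    """Unweighted mean of one metric column across classes."""
    return sum(row[idx] for row in report.values()) / len(report)

def weighted_avg(report, idx):
    """Mean of one metric column weighted by each class's support."""
    total_support = sum(row[3] for row in report.values())
    return sum(row[idx] * row[3] for row in report.values()) / total_support

for name, idx in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(name, round(macro_avg(report, idx), 6), round(weighted_avg(report, idx), 6))
```

Note that the support-weighted recall equals the overall accuracy, which is why both appear as 0.988263 in the report.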

