# Classification Example

## Introduction

In this notebook, we'll walk through a detailed example of how to use Velour to evaluate classification models. The concepts explored here can be adapted to most supervised classification problems.

For a conceptual introduction to Velour, [check out our project overview](https://striveworks.github.io/velour/). For a higher-level example notebook, [check out our "Getting Started" notebook](https://github.com/Striveworks/velour/blob/main/examples/getting_started.ipynb).

In [6]:
%pip install pandas

import copy
import pandas as pd
from typing import List
from random import random
from tqdm import tqdm

from velour import (
    Client,
    Dataset,
    Model,
    Datum,
    Annotation,
    GroundTruth,
    Prediction,
    Label,
)
from velour.enums import TaskType

client = Client("http://localhost:8000")

Note: you may need to restart the kernel to use updated packages.
Successfully connected to host at http://localhost:8000/


## Build the Toy Dataset

We start by defining the possible label sets that can be associated with a classification.

In [7]:
dog_labels = [
    Label(key="class", value="dog"),
    Label(key="superclass", value="animal")
]

cat_labels = [
    Label(key="class", value="cat"),
    Label(key="superclass", value="animal")
]

car_labels = [
    Label(key="class", value="car"),
    Label(key="superclass", value="vehicle")
]

truck_labels = [
    Label(key="class", value="truck"),
    Label(key="superclass", value="vehicle")
]

labels = [dog_labels, cat_labels, car_labels, truck_labels]

Now let's define some functions to get labels and add scores to our predictions.

In [8]:
def get_labels(i: int) -> List[Label]:
    # cycle through the four label sets; copy so callers cannot mutate the originals
    return copy.deepcopy(labels[i % 4])


def get_scored_labels_perfect_model(i: int) -> List[Label]:
    # score the label set matching datum i at 0.97 and every other set at 0.01
    label_list = []
    for idx, group in enumerate(labels):
        for label_ in group:
            label = copy.deepcopy(label_)
            if i % 4 == idx:
                label.score = 0.97
            else:
                label.score = 0.01
            label_list.append(label)
    return label_list


def get_scored_labels_random_model() -> List[Label]:
    return get_scored_labels_perfect_model(int(random() * 100))
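To make the scoring scheme concrete, here is a minimal sketch that reproduces the logic of `get_scored_labels_perfect_model` without a running Velour instance. The `Label` dataclass below is a hypothetical stand-in for `velour.Label`, and `score_sum_by_key` is a helper introduced only for this illustration. It shows that the scores for each label key add up to 1.0.

```python
import copy
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Label:
    """Hypothetical stand-in for velour.Label, for illustration only."""
    key: str
    value: str
    score: Optional[float] = None


labels = [
    [Label("class", "dog"), Label("superclass", "animal")],
    [Label("class", "cat"), Label("superclass", "animal")],
    [Label("class", "car"), Label("superclass", "vehicle")],
    [Label("class", "truck"), Label("superclass", "vehicle")],
]


def get_scored_labels_perfect_model(i: int) -> List[Label]:
    # Same logic as the notebook function, applied to the stand-in class.
    label_list = []
    for idx, group in enumerate(labels):
        for label_ in group:
            label = copy.deepcopy(label_)
            label.score = 0.97 if i % 4 == idx else 0.01
            label_list.append(label)
    return label_list


def score_sum_by_key(scored: List[Label]) -> Dict[str, float]:
    # Add up the scores of all labels that share a key.
    totals: Dict[str, float] = defaultdict(float)
    for label in scored:
        totals[label.key] += label.score
    return dict(totals)


scored = get_scored_labels_perfect_model(2)  # datum 2 maps to the "car" label set
totals = score_sum_by_key(scored)
best_class = max((l for l in scored if l.key == "class"), key=lambda l: l.score)
print(len(scored), best_class.value, {k: round(v, 6) for k, v in totals.items()})
```

Note that the function emits eight labels per datum, so each superclass value (`animal`, `vehicle`) appears twice because two label sets share it.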

Finally, let's define some functions that generate both ground truth and prediction annotations for `classification`.

In [9]:
def generate_groundtruth_annotations(i: int) -> List[Annotation]:
    return [
        Annotation(
            task_type=TaskType.CLASSIFICATION,
            labels=get_labels(i),
        )
    ]

def generate_correct_prediction_annotations(i: int) -> List[Annotation]:
    return [
        Annotation(
            task_type=TaskType.CLASSIFICATION,
            labels=get_scored_labels_perfect_model(i),
        )
    ]

def generate_random_prediction_annotations() -> List[Annotation]:
    return [
        Annotation(
            task_type=TaskType.CLASSIFICATION,
            labels=get_scored_labels_random_model(),
        )
    ]

## Create Dataset and Models

To showcase `evaluate_classification`, let's define a model that produces "perfect" predictions and compare its results to a model that produces randomized predictions.

In [10]:
n_samples = 100

dataset = Dataset(client, "my_dataset", delete_if_exists=True)
perfect_model = Model(client, "perfect_model", delete_if_exists=True)
random_model = Model(client, "random_model", delete_if_exists=True)

for i in tqdm(range(n_samples)):
    
    # create datum
    datum = Datum(uid=f"uid{i}", dataset=dataset)

    # create groundtruth
    groundtruth = GroundTruth(
        datum=datum,
        annotations=generate_groundtruth_annotations(i),
    )

    # create "perfect" predictions
    perfect_prediction = Prediction(
        datum=datum,
        annotations=generate_correct_prediction_annotations(i),
    )
    
    # create randomized predictions
    random_prediction = Prediction(
        datum=datum,
        annotations=generate_random_prediction_annotations(),
    )

    # upload groundtruths and predictions 
    dataset.add_groundtruth(groundtruth)
    perfect_model.add_prediction(dataset, perfect_prediction)
    random_model.add_prediction(dataset, random_prediction)

# finalize
dataset.finalize()
perfect_model.finalize_inferences(dataset)
random_model.finalize_inferences(dataset)
    

100%|██████████| 100/100 [00:12<00:00,  7.96it/s]
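Before evaluating, it helps to know roughly what the random model should score. The sketch below enumerates every value that `int(random() * 100)` can produce and counts how often the resulting label set matches the ground truth. It assumes each integer from 0 to 99 is equally likely and that the predicted label for a key is the one scored 0.97; the helper names are introduced only for this illustration.

```python
from fractions import Fraction
from typing import Tuple

N_GROUPS = 4   # dog, cat, car, truck
N_DRAWS = 100  # int(random() * 100) yields an integer in 0..99


def superclass(idx: int) -> str:
    # Label sets 0 and 1 (dog, cat) are animals; 2 and 3 (car, truck) are vehicles.
    return "animal" if idx < 2 else "vehicle"


def expected_accuracies(i: int) -> Tuple[Fraction, Fraction]:
    """Chance that a random draw matches datum i on 'class' and on 'superclass'."""
    true_idx = i % N_GROUPS
    class_hits = 0
    super_hits = 0
    for j in range(N_DRAWS):
        pred_idx = j % N_GROUPS
        class_hits += pred_idx == true_idx
        super_hits += superclass(pred_idx) == superclass(true_idx)
    return Fraction(class_hits, N_DRAWS), Fraction(super_hits, N_DRAWS)


for i in range(N_GROUPS):
    class_acc, super_acc = expected_accuracies(i)
    print(i, float(class_acc), float(super_acc))
```

Under these assumptions every datum has a 1-in-4 chance of a correct `class` prediction and a 1-in-2 chance of a correct `superclass` prediction, so with 100 samples the random model's accuracies should land near 0.25 and 0.5.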


## Perform the Evaluations

With everything defined, we're ready to evaluate both models and display the results. Evaluations run as background tasks, so we call the `wait_for_completion` method to make sure each evaluation has finished before we read its results.

In [11]:
eval_perfect = perfect_model.evaluate_classification(dataset)
eval_perfect.wait_for_completion()

eval_random = random_model.evaluate_classification(dataset)
eval_random.wait_for_completion()
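For readers unfamiliar with the pattern, blocking on a background task usually amounts to polling its status until it reports completion. The following is a generic sketch of that idea and not Velour's implementation; `wait_until_done` and `FakeJob` are hypothetical names used only here.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    interval: float = 0.5,
    timeout: float = 60.0,
) -> None:
    """Poll is_done() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval)


class FakeJob:
    """Hypothetical job that reports completion on its third status check."""

    def __init__(self, polls_needed: int) -> None:
        self.polls_needed = polls_needed
        self.polls = 0

    def done(self) -> bool:
        self.polls += 1
        return self.polls >= self.polls_needed


job = FakeJob(polls_needed=3)
wait_until_done(job.done, interval=0.01)
print(job.polls)
```

A timeout is worth including in any loop of this kind so that a stalled job raises an error instead of hanging the notebook.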

## Compare Results

Let's export the results into `pandas.DataFrame` objects and concatenate the tables for easier comparison.

In [12]:
perfect_results = eval_perfect.results().to_dataframe(("generator","perfect"))
random_results = eval_random.results().to_dataframe(("generator", "random"))
pd.concat([perfect_results, random_results], axis=1, names=["bbox", "raster"])

| type | parameters | label | value (generator: perfect) | value (generator: random) |
|---|---|---|---|---|
| Accuracy | {"label_key": "class"} | | 1.0 | 0.23 |
| Accuracy | {"label_key": "superclass"} | | 1.0 | 0.47 |
| F1 | "n/a" | class: car | 1.0 | 0.26087 |
| F1 | "n/a" | class: cat | 1.0 | 0.192308 |
| F1 | "n/a" | class: dog | 1.0 | 0.186047 |
| F1 | "n/a" | class: truck | 1.0 | 0.271186 |
| F1 | "n/a" | superclass: animal | 1.0 | 0.442105 |
| F1 | "n/a" | superclass: vehicle | 1.0 | 0.495238 |
| Precision | "n/a" | class: car | 1.0 | 0.285714 |
| Precision | "n/a" | class: cat | 1.0 | 0.185185 |
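The F1 and Precision rows are linked by the standard definition F1 = 2PR / (P + R), where P is precision and R is recall. As a sanity check, the sketch below takes the random model's `class: car` precision and F1 from the output above and solves for the recall they imply. The recall value and the raw counts mentioned afterwards are inferences from those two numbers, not values shown in the output.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from(precision: float, f1: float) -> float:
    # Rearranging F1 = 2PR / (P + R) gives R = F1 * P / (2P - F1).
    return f1 * precision / (2 * precision - f1)


# Values for "class: car" under the random model, copied from the output above.
precision_car = 0.285714
f1_car = 0.26087

recall_car = recall_from(precision_car, f1_car)
print(round(recall_car, 3))

# Round-trip check: plugging the inferred recall back in recovers the F1 score.
print(round(f1_score(precision_car, recall_car), 5))
```

The implied recall is about 0.24. That is consistent with 6 correct car predictions out of 21 predicted cars (6/21 ≈ 0.285714) and 25 actual cars (6/25 = 0.24), since a quarter of the 100 datums carry the car label.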
