# The Sparse Tensor Classifier Hyperparameters

In this tutorial you'll learn how to configure the `SparseTensorClassifier` hyperparameters.


## Colab 

This tutorial and the rest in [this sequence](https://github.com/SparseTensorClassifier/tutorial) can be run in Google Colab. To open this notebook in Colab, click [here](https://colab.research.google.com/github/SparseTensorClassifier/tutorial/blob/main/Quickstart_Hyperparameters.ipynb).

![](https://colab.research.google.com/assets/colab-badge.svg)

## Setup

Uncomment and run the following cell to install the packages. Then, import the modules.

In [1]:
# !pip install stc pandas scikit-learn

In [2]:
import numpy as np
import pandas as pd
from stc import SparseTensorClassifier
from sklearn.metrics import accuracy_score

np.random.seed(0)

## Read the dataset

The dataset consists of 101 animals from a zoo. There are 16 variables with various traits to describe the animals. The 7 Class Types are: Mammal, Bird, Reptile, Fish, Amphibian, Bug and Invertebrate. Let's read and shuffle the data.

In [3]:
zoo = pd.read_csv('./data/zoo/zoo.csv')
zoo = zoo.sample(frac=1, random_state=42)
zoo

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,class_type
84,squirrel,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,0,Mammal
55,oryx,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,Mammal
66,porpoise,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,Mammal
67,puma,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,Mammal
45,lion,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,Mammal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,pike,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,Fish
71,rhea,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,1,Bird
14,crab,0,0,1,0,0,1,1,0,0,0,0,0,4,0,0,0,Invertebrate
92,tuna,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,Fish


## Add Noise

Let's convert the data to JSON and add noise to better understand the impact of the STC hyperparameters. Each animal is now represented by a set of categorical features such as `hair=1` (has hair), `eggs=0` (does not lay eggs), ..., as well as 1000 random values that are meant to confound the classifier.

In [4]:
items = []
for i, (_, row) in enumerate(zoo.iterrows()):
    item = {}
    item['features'] = [f"{f}={str(row[f])}" for f in zoo.columns[1:] if f not in ['class_type']] 
    item['features'] += list(np.random.binomial(10, 0.5, 1000))
    item['class_type'] = [row['class_type']]
    items.append(item)

print(items[0])

{'features': ['hair=1', 'feathers=0', 'eggs=0', 'milk=1', 'airborne=0', 'aquatic=0', 'predator=0', 'toothed=1', 'backbone=1', 'breathes=1', 'venomous=0', 'fins=0', 'legs=2', 'tail=1', 'domestic=0', 'catsize=0', 5, 6, 5, 5, 5, 6, 5, 7, 8, 5, 6, 5, 5, 7, 3, 3, 2, 7, 6, 7, 8, 6, 5, 6, 3, 6, 3, 7, 5, 5, 4, 6, 5, 5, 2, 5, 5, 5, 7, 6, 4, 5, 6, 3, 6, 6, 4, 3, 4, 4, 5, 5, 8, 3, 4, 3, 6, 4, 5, 4, 3, 3, 6, 3, 4, 4, 6, 3, 7, 3, 8, 5, 8, 5, 6, 2, 4, 3, 4, 3, 4, 5, 3, 6, 5, 4, 5, 3, 5, 7, 4, 6, 3, 6, 4, 4, 5, 2, 7, 1, 6, 4, 6, 8, 4, 5, 5, 5, 4, 8, 5, 7, 6, 4, 6, 5, 7, 5, 7, 6, 6, 5, 8, 6, 5, 5, 2, 4, 6, 4, 5, 5, 3, 4, 5, 5, 5, 6, 6, 5, 7, 4, 5, 7, 6, 6, 3, 7, 6, 9, 3, 7, 3, 5, 3, 7, 6, 5, 5, 3, 6, 5, 6, 7, 8, 7, 2, 4, 6, 3, 5, 2, 4, 2, 6, 4, 4, 7, 6, 2, 3, 5, 5, 4, 7, 5, 5, 5, 6, 4, 5, 4, 4, 7, 6, 5, 4, 4, 3, 5, 4, 6, 5, 4, 2, 3, 6, 5, 5, 7, 9, 4, 6, 4, 2, 6, 4, 5, 5, 7, 6, 7, 4, 6, 4, 8, 6, 4, 8, 6, 4, 4, 5, 2, 4, 5, 4, 5, 4, 5, 7, 3, 5, 3, 6, 5, 5, 4, 3, 5, 4, 7, 6, 6, 7, 3, 5, 5, 8, 4, 4, 3, 2, 
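To get a feel for the noise, the sketch below draws the same kind of values in isolation. Each draw from `np.random.binomial(10, 0.5, ...)` is an integer between 0 and 10, generated independently of the animal's class, so these values carry no information about `class_type`. The sketch uses its own generator (`np.random.default_rng`) so that it does not disturb the random state of the notebook; that choice is an assumption of this example, not part of the tutorial code.

```python
import numpy as np
from collections import Counter

# Separate generator, so the notebook's global seed is left untouched.
rng = np.random.default_rng(0)

# Same distribution as the noise added to each animal: Binomial(n=10, p=0.5).
noise = rng.binomial(10, 0.5, 1000)

# Count how often each value occurs among the 1000 draws.
counts = Counter(noise.tolist())
distinct = sorted(counts)

print("distinct values:", distinct)
print("sample mean:", round(float(noise.mean()), 2))
```

The 1000 noise entries therefore take at most 11 distinct values, centred around 5, which is consistent with the printed item above.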

## Initialize Sparse Tensor Classifier

Let's instruct STC to predict `class_type` based on the animal's `features`.

In [5]:
STC = SparseTensorClassifier(targets=['class_type'], features=['features'])

## Fit the training data

In [6]:
STC.fit(items[0:70])



## Predict the test data

Sparse Tensor Classifier can be tuned with the following hyperparameters:
- `balance` $b\geq0$: Controls how STC deals with imbalanced data. Setting $b=1$ artificially balances the sample. For $0<b<1$ the sample is semi-balanced: the weight of the less frequent classes increases, but not enough to balance the sample. For $b>1$ the sample is super-balanced: the less frequent classes weigh more than the most frequent ones.
- `entropy` $h\geq0$: Controls the intensity of the entropic weights. They are applied as is with $h=1$, dropped with $h=0$, and can be tuned to any other value $h\geq 0$. Higher values of $h$ lead to predictions based on fewer but more relevant features, and are thus more robust to noise.
- `power` $p>0$: Controls the probability amplitude. Smaller values of $p$ give similar weight to all the features regardless of their distribution.
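The effect of `balance` described above can be pictured with a toy reweighting. The sketch below is an illustration only: it assumes that each sample of a class with $n_c$ examples is weighted by $n_c^{-b}$, which reproduces the qualitative behaviour in the list (unbalanced, semi-balanced, balanced, super-balanced). The function `class_weights` is hypothetical and is not part of the `stc` package, whose internal formula may differ.

```python
from collections import Counter

def class_weights(labels, balance):
    """Share of total weight per class under an assumed n_c ** (-balance) reweighting."""
    counts = Counter(labels)
    # Total weight of a class = number of samples * per-sample weight.
    totals = {c: n * n ** (-balance) for c, n in counts.items()}
    norm = sum(totals.values())
    return {c: t / norm for c, t in totals.items()}

# Toy imbalanced sample: 8 mammals and 2 fish.
labels = ["Mammal"] * 8 + ["Fish"] * 2

for b in (0, 0.5, 1, 2):
    shares = class_weights(labels, b)
    print(b, {c: round(s, 3) for c, s in shares.items()})
```

Under this assumption, $b=0$ keeps the raw class proportions, $b=1$ gives both classes the same total weight, and $b=2$ makes the rarer class outweigh the more frequent one.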

The hyperparameters can be set on the fly at prediction time, because fitting does not depend on them. Some special cases are illustrated below; a standard cross-validation strategy can also be used to select the best values.
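A hyperparameter search could look like the following minimal sketch. The helper `grid_search` and the scorer `toy_score` are hypothetical names introduced for this example. To keep the block self-contained, the scorer is a stand-in that does not call STC at all; its peak was placed arbitrarily and says nothing about which configuration works best on real data.

```python
from itertools import product

def grid_search(score, grid):
    """Return the parameter dict with the highest score over a full grid."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

# Candidate values for each hyperparameter (arbitrary choices for illustration).
grid = {
    "entropy": [0, 0.5, 1, 2],
    "balance": [0, 0.5, 1],
    "power": [0.25, 0.5, 1],
}

def toy_score(params):
    # Stand-in for a validation accuracy; NOT computed by STC.
    return -(abs(params["entropy"] - 1)
             + abs(params["balance"] - 1)
             + abs(params["power"] - 0.5))

best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

In the notebook, `score` could instead wrap the calls used in this tutorial: `STC.set(params)`, then `STC.predict(...)` on validation items held out from the training split, then `accuracy_score`. Because fitting is independent of the hyperparameters, the model would only need to be fitted once per training split, and only the prediction step would be repeated for each configuration.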

**The particular choice**
```py 
{"entropy": 0, "balance": 0, "power": 1}
```
corresponds to using classical probability and classifying with Bayes' rule.

In [7]:
STC.set({"entropy": 0, "balance": 0, "power": 1})
labels, probability, explain = STC.predict(items[70:])
accuracy_score(zoo['class_type'][70:], labels)



0.25806451612903225

**The particular choice**
```py 
{"entropy": 1, "balance": 0, "power": 1}
```
corresponds to using a robust version of classical probability and classifying with Bayes' rule.

In [8]:
STC.set({"entropy": 1, "balance": 0, "power": 1})
labels, probability, explain = STC.predict(items[70:])
accuracy_score(zoo['class_type'][70:], labels)



0.9032258064516129
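One way to picture why switching on `entropy` helps with the noisy features is the sketch below. It assumes that the weight of a feature is one minus the normalized Shannon entropy of its distribution across classes, raised to the power $h$. This definition is an assumption made for illustration, and `entropic_weight` is a hypothetical helper rather than a function of the `stc` package.

```python
import numpy as np

def entropic_weight(class_counts, h=1.0):
    """Assumed weight: (1 - normalized entropy of the class distribution) ** h."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    nonzero = p[p > 0]                          # 0 * log(0) is treated as 0
    entropy = -(nonzero * np.log(nonzero)).sum()
    max_entropy = np.log(len(p))                # entropy of the uniform distribution
    base = max(0.0, 1.0 - entropy / max_entropy)  # clip tiny negative rounding errors
    return float(base ** h)

# A feature seen in one class only (informative) versus one spread evenly (noise).
informative = [20, 0, 0]
noise_like = [10, 10, 10]

for h in (0, 1, 2):
    print(h, entropic_weight(informative, h), entropic_weight(noise_like, h))
```

Under this assumption, a feature that is spread evenly across classes gets a weight close to zero when $h=1$, while $h=0$ gives every feature the same weight of 1, so the random values count as much as the genuine traits.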

**The particular choice**
```py 
{"entropy": 0, "balance": 1, "power": 0.5}
```
corresponds to using quantum probability and classifying with Born's rule, a key postulate of quantum mechanics.

In [9]:
STC.set({"entropy": 0, "balance": 1, "power": 0.5})
labels, probability, explain = STC.predict(items[70:])
accuracy_score(zoo['class_type'][70:], labels)



0.9354838709677419

**The particular choice**
```py 
{"entropy": 1, "balance": 1, "power": 0.5}
```
corresponds to using a robust version of quantum probability and classifying with Born's rule, a key postulate of quantum mechanics (**default configuration**).

In [10]:
STC.set({"entropy": 1, "balance": 1, "power": 0.5})
labels, probability, explain = STC.predict(items[70:])
accuracy_score(zoo['class_type'][70:], labels)



0.967741935483871

# Congratulations! 

You have completed this tutorial notebook! If you enjoyed working through it and want to continue with Sparse Tensor Classifier, we encourage you to finish the rest of the tutorials in [this series](https://github.com/SparseTensorClassifier/tutorial). Don't forget to [star the repository](https://github.com/SparseTensorClassifier/stc)!

![GitHub Repo stars](https://img.shields.io/github/stars/SparseTensorClassifier/stc?style=social)

<div>
    Thanks by <a href="https://sparsetensorclassifier.org">https://sparsetensorclassifier.org</a>  
    <span style="float:right">
        Questions? Open an <a href="https://github.com/SparseTensorClassifier/tutorial/issues">issue</a>
    </span> 
</div>