<a href="https://colab.research.google.com/github/ambwhl/datasci_223/blob/exercise-4/exercises/4-classification/exercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification on `emnist`

## 1. Create `Readme.md` to document your work

Explain your choices, process, and outcomes.

## 2. Classify all symbols

### Choose a model

Your choice of model! Choose wisely...

### Train away!

Is do you need to tune any parameters? Is the model expecting data in a different format?

### Evaluate the model

Evaluate the models on the test set, analyze the confusion matrix to see where the model performs well and where it struggles.

### Investigate subsets

On which classes does the model perform well? Poorly? Evaluate again, excluding easily confused symbols (such as 'O' and '0').

### Improve performance

Brainstorm for improving the performance. This could include trying different architectures, adding more layers, changing the loss function, or using data augmentation techniques.

## 2. Classify digits vs. letters model showdown

Perform a full showdown classifying digits vs letters:

1. Create a column for whether each row is a digit or a letter
2. Choose an evaluation metric
3. Choose several candidate models to train
4. Divide data to reserve a validation set that will NOT be used in training/testing
5. K-fold train/test
    1. Create train/test splits from the non-validation dataset
    2. Train each candidate model (best practice: use the same split for all models)
    3. Apply the model the the test split
    4. (*Optional*) Perform hyper-parametric search
    5. Record the model evaluation metrics
    6. Repeat with a new train/test split
6. Promote winner, apply model to validation set
7. (*Optional*) Perform hyper-parametric search, if applicable
8. Report model performance

In [21]:
# Uncomment and install below packages if not already installed
%pip install -q numpy pandas scikit-learn emnist matplotlib


In [22]:
%reset -f

In [23]:
# Import packages
import os
import string
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns
import emnist
from IPython.display import display, Markdown

# Random Forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score


In [24]:
##help function
def int_to_char(label):
    if label < 10:
        return str(label)
    elif label < 36:
        return chr(label - 10 + ord('A'))
    else:
        return chr(label - 36 + ord('a'))

##Display performance metrics and confusion matrix for a model.
def display_metrics(task, model_name, metrics_dict):
    metrics = metrics_dict[task][model_name]
    acc = metrics['accuracy']
    prec = metrics['precision']
    rec = metrics['recall']
    f1 = metrics['f1']
    cm = metrics['confusion_matrix']
    metrics_df = pd.DataFrame({
        'Accuracy': [acc],
        'Precision': [prec],
        'Recall': [rec],
        'F1 Score': [f1]
    })
    cm_df = pd.DataFrame(cm, index=['Actual Class {}'.format(i) for i in range(len(cm))],
                         columns=['Predicted Class {}'.format(i) for i in range(len(cm[0]))])

    # Display performance metrics and confusion matrix
    display(Markdown(f"### Performance Metrics for {model_name}"))
    display(metrics_df)
    display(Markdown(f"### Confusion Matrix for {model_name}"))
    display(cm_df)

metrics_dict = {}
metrics_dict['letter_vs_digit'] = {}
metrics_dict['all symbols'] = {}


In [25]:
# Load train data, 16 secs use google colab CPU
image, label = emnist.extract_training_samples('byclass')
train = pd.DataFrame()
train['image'] = list(image)
train['image_flat'] = train['image'].apply(lambda x: np.array(x).reshape(-1))
train['label'] = label

# Add a column with the character corresponding to the label
class_label = np.array([int_to_char(l) for l in label])
train['class'] = class_label
train = train[:1000]

# load test set
imaget, labelt = emnist.extract_test_samples('byclass')
class_labelt = np.array([int_to_char(l) for l in labelt])
valid = pd.DataFrame()
valid['image'] = list(imaget)
valid['image_flat'] = valid['image'].apply(lambda x: np.array(x).reshape(-1))
valid['label'] = labelt
valid['class'] = class_labelt
valid = valid[:1000]

In [26]:
#### Task1: Classify all symbols ####

# train in RandomForest，use T4 GPU provided by google colab
task = 'all symbols'
model_name = 'random_forest'
metrics_dict[task] = {model_name: {}}

# Initialize random forest classifier
rf_clf = RandomForestClassifier(n_estimators=50, random_state=42)##n_estimators more than 50 collapses

# Train model
rf_clf.fit(train['image_flat'].tolist(), train['label'])


In [27]:
##evaluate in test set
y_pred = rf_clf.predict(valid['image_flat'].tolist())

In [28]:
#analyze the confusion matrix
acc = accuracy_score(valid['label'], y_pred)
prec = precision_score(valid['label'], y_pred,average = 'weighted')
rec = recall_score(valid['label'], y_pred,average = 'weighted')
f1 = f1_score(valid['label'], y_pred,average = 'weighted')
cm = confusion_matrix(valid['label'], y_pred)

metrics_dict[task][model_name] = {'accuracy': acc,
                                  'precision': prec,
                                  'recall': rec,
                                  'f1': f1,
                                  'confusion_matrix': cm}

display_metrics(task, model_name, metrics_dict)

### Performance Metrics for random_forest

Unnamed: 0,Accuracy,Precision,Recall,F1 Score
0,0.549,0.506411,0.549,0.493616


### Confusion Matrix for random_forest

Unnamed: 0,Predicted Class 0,Predicted Class 1,Predicted Class 2,Predicted Class 3,Predicted Class 4,Predicted Class 5,Predicted Class 6,Predicted Class 7,Predicted Class 8,Predicted Class 9,...,Predicted Class 52,Predicted Class 53,Predicted Class 54,Predicted Class 55,Predicted Class 56,Predicted Class 57,Predicted Class 58,Predicted Class 59,Predicted Class 60,Predicted Class 61
Actual Class 0,26,0,0,0,0,0,0,0,2,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 1,0,40,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 2,0,0,41,0,1,1,1,3,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 3,0,0,3,53,0,1,0,1,2,1,...,0,0,0,0,0,0,0,0,0,0
Actual Class 4,0,0,0,0,37,0,3,0,0,2,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Actual Class 57,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,3,0,0,0,0
Actual Class 58,0,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 59,0,0,2,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 60,0,0,0,0,3,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [29]:
# Subset `train` and `valid` to only include digits
symbols_list = ['0', '1',' 2', '3', '4', '5', '6', '7', '8', '9']

mask_train = train['class'].apply(lambda x: x in symbols_list)
train_01 = train[mask_train]
train_01.reset_index(drop=True, inplace=True)

mask_valid = valid['class'].apply(lambda x: x in symbols_list)
valid_01 = valid[mask_valid]
valid_01.reset_index(drop=True, inplace=True)

In [30]:
# train subsets in RandomForest，use CPU provided by google colab
rf_clf.fit(train_01['image_flat'].tolist(), train_01['label'])
y_pred = rf_clf.predict(valid_01['image_flat'].tolist())
acc = accuracy_score(valid_01['label'], y_pred)
prec = precision_score(valid_01['label'], y_pred,average = 'weighted')
rec = recall_score(valid_01['label'], y_pred,average = 'weighted')
f1 = f1_score(valid_01['label'], y_pred,average = 'weighted')
cm = confusion_matrix(valid_01['label'], y_pred)

metrics_dict[task][model_name] = {'accuracy': acc,
                                  'precision': prec,
                                  'recall': rec,
                                  'f1': f1,
                                  'confusion_matrix': cm}

display_metrics(task, model_name, metrics_dict)


### Performance Metrics for random_forest

Unnamed: 0,Accuracy,Precision,Recall,F1 Score
0,0.875576,0.878843,0.875576,0.875821


### Confusion Matrix for random_forest

Unnamed: 0,Predicted Class 0,Predicted Class 1,Predicted Class 2,Predicted Class 3,Predicted Class 4,Predicted Class 5,Predicted Class 6,Predicted Class 7,Predicted Class 8
Actual Class 0,40,0,0,0,1,0,0,2,1
Actual Class 1,0,44,0,0,1,0,1,0,0
Actual Class 2,0,0,51,0,2,1,4,3,1
Actual Class 3,0,0,0,39,0,3,0,0,0
Actual Class 4,0,1,2,0,34,0,0,1,0
Actual Class 5,0,0,0,0,3,39,0,0,0
Actual Class 6,0,0,0,1,1,0,51,3,0
Actual Class 7,0,0,7,2,0,0,1,47,3
Actual Class 8,0,0,0,1,2,0,6,0,35


In [31]:
##initial RandomForest with higher n_estimators
rf_clf_1 = RandomForestClassifier(n_estimators=100, random_state=42)##

##train and test new model in subsets, use CPU provided by google colab, 8 min 31sec
rf_clf_1.fit(train_01['image_flat'].tolist(), train_01['label'])
y_pred = rf_clf_1.predict(valid_01['image_flat'].tolist())
acc = accuracy_score(valid_01['label'], y_pred)
prec = precision_score(valid_01['label'], y_pred,average = 'weighted')
rec = recall_score(valid_01['label'], y_pred,average = 'weighted')
f1 = f1_score(valid_01['label'], y_pred,average = 'weighted')
cm = confusion_matrix(valid_01['label'], y_pred)

metrics_dict[task][model_name] = {'accuracy': acc,
                                  'precision': prec,
                                  'recall': rec,
                                  'f1': f1,
                                  'confusion_matrix': cm}

display_metrics(task, model_name, metrics_dict)

### Performance Metrics for random_forest

Unnamed: 0,Accuracy,Precision,Recall,F1 Score
0,0.887097,0.891604,0.887097,0.88746


### Confusion Matrix for random_forest

Unnamed: 0,Predicted Class 0,Predicted Class 1,Predicted Class 2,Predicted Class 3,Predicted Class 4,Predicted Class 5,Predicted Class 6,Predicted Class 7,Predicted Class 8
Actual Class 0,41,0,0,0,1,0,0,2,0
Actual Class 1,0,43,0,0,1,0,1,1,0
Actual Class 2,0,0,54,0,2,1,3,1,1
Actual Class 3,0,0,0,39,0,3,0,0,0
Actual Class 4,0,1,1,0,35,0,0,1,0
Actual Class 5,0,0,0,0,3,39,0,0,0
Actual Class 6,0,0,0,1,0,0,52,3,0
Actual Class 7,0,0,5,1,0,0,4,47,3
Actual Class 8,0,0,0,1,2,0,6,0,35


In [32]:
###Task2: Classify digits vs. letters model showdown

#1.Create a column for whether each row is a digit or a letter
train['is_letter'] = train['label'] >= 10
valid['is_letter'] = valid['label'] >= 10

# Display the first few rows of the dataset
display(train.head())



Unnamed: 0,image,image_flat,label,class,is_letter
0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",35,Z,True
1,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",36,a,True
2,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",6,6,False
3,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",3,3,False
4,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",22,M,True


In [33]:
## new display function
def display_metrics_bi(task, model_name, metrics_dict):
    """Display performance metrics and confusion matrix for a model."""
    metrics_df = pd.DataFrame()
    cm_df = pd.DataFrame()
    for key, value in metrics_dict[task][model_name].items():
        if type(value) == np.ndarray:
            cm_df = pd.DataFrame(value, index=['actual 0', 'actual 1'], columns=['predicted 0', 'predicted 1'])
        else:
            metrics_df[key] = [value]
    display(Markdown(f'# Performance Metrics: {model_name}'))
    display(metrics_df)
    display(Markdown(f'# Confusion Matrix: {model_name}'))
    display(cm_df)

In [37]:
#2.Choose an Letter vs Digit evaluation metric: Logistic Regression

task = 'letter_vs_digit'
model_name = 'logistic_regression'

# load Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Initialize logistic regression classifier
lr_clf = LogisticRegression(max_iter=1000, random_state=42)

# Scale the data, use CPU on Google colab, 11sec
# When running without scaling the data, the model does not converge
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train['image_flat'].tolist())
valid_scaled = scaler.transform(valid['image_flat'].tolist())

# Train and evaluate model,use CPU on Google colab, 6min48sec
lr_clf.fit(train_scaled, train['is_letter'])
y_pred = lr_clf.predict(valid_scaled)

# Calculate performance metrics
acc = accuracy_score(valid['is_letter'], y_pred)
prec = precision_score(valid['is_letter'], y_pred)
rec = recall_score(valid['is_letter'], y_pred)
f1 = f1_score(valid['is_letter'], y_pred)
cm = confusion_matrix(valid['is_letter'], y_pred)

# Store performance metrics in dictionary
metrics_dict[task][model_name] = {'accuracy': acc,
                                  'precision': prec,
                                  'recall': rec,
                                  'f1': f1,
                                  'confusion_matrix': cm}

# Display performance metrics
display_metrics_bi(task, model_name, metrics_dict)


# Performance Metrics: logistic_regression

Unnamed: 0,accuracy,precision,recall,f1
0,0.675,0.705757,0.638996,0.670719


# Confusion Matrix: logistic_regression

Unnamed: 0,predicted 0,predicted 1
actual 0,344,138
actual 1,187,331


In [38]:
#3.Choose several candidate models:random forest
task = 'letter_vs_digit'
model_name = 'random forest'

# Initialize random forest classifier and train, use CPU on Google colab, 9min7sec
rf_clf_2 = RandomForestClassifier(n_estimators=50, random_state=42)
rf_clf_2.fit(train['image_flat'].tolist(), train['is_letter'])

##evaluate in test set
y_pred = rf_clf_2.predict(valid['image_flat'].tolist())

# Calculate performance metrics
acc = accuracy_score(valid['is_letter'], y_pred)
prec = precision_score(valid['is_letter'], y_pred)
rec = recall_score(valid['is_letter'], y_pred)
f1 = f1_score(valid['is_letter'], y_pred)
cm = confusion_matrix(valid['is_letter'], y_pred)

# Store performance metrics in dictionary
metrics_dict[task][model_name] = {'accuracy': acc,
                                  'precision': prec,
                                  'recall': rec,
                                  'f1': f1,
                                  'confusion_matrix': cm}

# Display performance metrics
display_metrics(task, model_name, metrics_dict)



### Performance Metrics for random forest

Unnamed: 0,Accuracy,Precision,Recall,F1 Score
0,0.78,0.819742,0.737452,0.776423


### Confusion Matrix for random forest

Unnamed: 0,Predicted Class 0,Predicted Class 1
Actual Class 0,398,84
Actual Class 1,136,382


In [39]:
#4.Divide data to reserve a validation set that will NOT be used in training/testing
byclass = pd.concat([train, valid], ignore_index=True)
byclass = byclass.sample(frac=1).reset_index(drop=True)
byclass['is_letter'] = train['label'] >= 10
valid_n = byclass[:500].reset_index(drop=True)
non_valid= byclass[500:2000].reset_index(drop=True)

In [40]:
#5.3 fold train/test, 48sec google colab CPU
import warnings
from sklearn.exceptions import UndefinedMetricWarning
warnings.filterwarnings("ignore", category=UndefinedMetricWarning)

rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
lr_clf = LogisticRegression(max_iter=1000, random_state=42)
scaler = StandardScaler()


##split 3-fold data
metricsrecord = []
train_n1 = non_valid[500:1500].reset_index(drop=True)
test_n1 = non_valid[:500].reset_index(drop=True)

train_n2 = non_valid[:1000].reset_index(drop=True)
test_n2 = non_valid[1000:1500].reset_index(drop=True)

train_n3 = pd.concat([non_valid.head(500), non_valid.tail(500)], ignore_index=True)
test_n3 = non_valid[500:1000].reset_index(drop=True)

###round ONE
task = "round 1："
##random forest
model_name = 'random_forest'
rf_clf.fit(train_n1['image_flat'].tolist(), train_n1['label'])
y_pred = rf_clf.predict(test_n1['image_flat'].tolist())

acc = accuracy_score(test_n1['label'], y_pred)
prec = precision_score(test_n1['label'], y_pred,average = 'macro')
rec = recall_score(test_n1['label'], y_pred,average = 'macro')
f1 = f1_score(test_n1['label'], y_pred,average = 'macro')
cm = confusion_matrix(test_n1['label'], y_pred)

metricsrecord.append(task)
metricsrecord.append(model_name)
metricsrecord.append(acc)

## ##logistic model
model_name = 'logistic_regression'
train_scaled1 = scaler.fit_transform(train_n1['image_flat'].tolist())
test_scaled1 = scaler.transform(test_n1['image_flat'].tolist())

lr_clf.fit(train_scaled1, train_n1['label'])
y_pred = lr_clf.predict(test_scaled1)

acc = accuracy_score(test_n1['label'], y_pred)
prec = precision_score(test_n1['label'], y_pred,average = 'macro')
rec = recall_score(test_n1['label'], y_pred,average = 'macro')
f1 = f1_score(test_n1['label'], y_pred,average = 'macro')
cm = confusion_matrix(test_n1['label'], y_pred)

metricsrecord.append(model_name)
metricsrecord.append(acc)
#print(metricsrecord)

###round TWO
task = "round 2："
##random forest
model_name = 'random_forest'
rf_clf.fit(train_n2['image_flat'].tolist(), train_n2['label'])
y_pred = rf_clf.predict(test_n2['image_flat'].tolist())

acc = accuracy_score(test_n2['label'], y_pred)
prec = precision_score(test_n2['label'], y_pred,average = 'macro')
rec = recall_score(test_n2['label'], y_pred,average = 'macro')
f1 = f1_score(test_n2['label'], y_pred,average = 'macro')
cm = confusion_matrix(test_n2['label'], y_pred)

metricsrecord.append(task)
metricsrecord.append(model_name)
metricsrecord.append(acc)

## ##logistic model
model_name = 'logistic_regression'
train_scaled2 = scaler.fit_transform(train_n2['image_flat'].tolist())
test_scaled2 = scaler.transform(test_n2['image_flat'].tolist())

lr_clf.fit(train_scaled2, train_n2['label'])
y_pred = lr_clf.predict(test_scaled2)

acc = accuracy_score(test_n2['label'], y_pred)
prec = precision_score(test_n2['label'], y_pred,average = 'macro')
rec = recall_score(test_n2['label'], y_pred,average = 'macro')
f1 = f1_score(test_n2['label'], y_pred,average = 'macro')
cm = confusion_matrix(test_n2['label'], y_pred)

metricsrecord.append(model_name)
metricsrecord.append(acc)

###round Three
task = "round 3："
##random forest
model_name = 'random_forest'
rf_clf.fit(train_n2['image_flat'].tolist(), train_n2['label'])
y_pred = rf_clf.predict(test_n2['image_flat'].tolist())

acc = accuracy_score(test_n2['label'], y_pred)
prec = precision_score(test_n2['label'], y_pred,average = 'macro')
rec = recall_score(test_n2['label'], y_pred,average = 'macro')
f1 = f1_score(test_n2['label'], y_pred,average = 'macro')
cm = confusion_matrix(test_n2['label'], y_pred)

metricsrecord.append(task)
metricsrecord.append(model_name)
metricsrecord.append(acc)

## ##logistic model
model_name = 'logistic_regression'
train_scaled2 = scaler.fit_transform(train_n2['image_flat'].tolist())
test_scaled2 = scaler.transform(test_n2['image_flat'].tolist())

lr_clf.fit(train_scaled2, train_n2['label'])
y_pred = lr_clf.predict(test_scaled2)

acc = accuracy_score(test_n2['label'], y_pred)
prec = precision_score(test_n2['label'], y_pred,average = 'macro')
rec = recall_score(test_n2['label'], y_pred,average = 'macro')
f1 = f1_score(test_n2['label'], y_pred,average = 'macro')
cm = confusion_matrix(test_n2['label'], y_pred)

metricsrecord.append(model_name)
metricsrecord.append(acc)

print("\n".join(map(str, metricsrecord)))



round 1：
random_forest
0.546
logistic_regression
0.52
round 2：
random_forest
0.564
logistic_regression
0.544
round 3：
random_forest
0.564
logistic_regression
0.544


In [41]:
#6.Promote winner, apply model to validation set

#Based on the accuracies above, I would choose random forest model
task = 'all symbols'
model_name = 'random_forest'

y_pred = rf_clf.predict(valid_n['image_flat'].tolist())
acc = accuracy_score(valid_n['label'], y_pred)
prec = precision_score(valid_n['label'], y_pred,average = 'macro')
rec = recall_score(valid_n['label'], y_pred,average = 'macro')
f1 = f1_score(valid_n['label'], y_pred,average = 'macro')
cm = confusion_matrix(valid_n['label'], y_pred)

metrics_dict[task][model_name] = {'accuracy': acc,
                                  'precision': prec,
                                  'recall': rec,
                                  'f1': f1,
                                  'confusion_matrix': cm}

display_metrics(task, model_name, metrics_dict)

#8.Report model performance
##The performance metrics of the random forest classifier suggest that the model is struggling,
##likely due to the small size of the dataset and the possible absence of certain labels from the training set.


### Performance Metrics for random_forest

Unnamed: 0,Accuracy,Precision,Recall,F1 Score
0,0.596,0.394614,0.34944,0.340807


### Confusion Matrix for random_forest

Unnamed: 0,Predicted Class 0,Predicted Class 1,Predicted Class 2,Predicted Class 3,Predicted Class 4,Predicted Class 5,Predicted Class 6,Predicted Class 7,Predicted Class 8,Predicted Class 9,...,Predicted Class 48,Predicted Class 49,Predicted Class 50,Predicted Class 51,Predicted Class 52,Predicted Class 53,Predicted Class 54,Predicted Class 55,Predicted Class 56,Predicted Class 57
Actual Class 0,12,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 1,0,15,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 2,0,0,26,0,0,0,0,3,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 3,0,0,2,24,0,1,0,2,2,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 4,0,0,0,0,17,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0
Actual Class 5,0,0,1,1,0,16,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
Actual Class 6,0,0,0,0,1,0,19,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Actual Class 7,0,0,0,0,0,0,0,28,0,0,...,0,0,0,0,0,1,0,0,0,0
Actual Class 8,0,0,0,2,0,0,0,0,23,1,...,0,0,0,0,0,0,0,0,0,0
Actual Class 9,0,1,0,0,1,1,0,1,0,17,...,0,0,0,0,0,0,0,0,0,0
