# Tutorial

This guide shows how to use the functions in this repository. We assume that the working directory is the folder Hierarchicl_reject of the repository. If you download this folder and call the functions from elsewhere, you will have to adapt the import paths accordingly.

For the following analyses, we assume that the AMB dataset has been downloaded from [https://zenodo.org/records/3357167] and that the paths to the data files are specified below.

In [None]:
AMBPath = ...
LabelsAMBPath = ...

## Preprocessing
Preprocessing is performed by calling the preprocessing function that matches the dataset, here Preprocessing_AMB.

In [None]:
from Preprocessing.Preprocessing_AMB import Preprocessing_AMB
Data, Labels = Preprocessing_AMB(AMBPath, LabelsAMBPath)

## Flat annotation and evaluation
The AMB data is not stored in a sparse format (this is only the case for the Azimuth PBMC dataset), so the non-sparse functions can be used. Suppose we want to run flat annotation with 5-fold cross-validation and highly variable gene (HVG) selection, using the logistic regression classifier of sklearn, and we also want to tune its regularization strength parameter ('C').

In [None]:
from Run.General_analyses import Run_Flat_KF, SaveResultsKF
from sklearn.linear_model import LogisticRegression

## Define the classifier and parameters
clf = LogisticRegression(penalty = 'l2', multi_class = 'multinomial', n_jobs = 1)
params = {'C': [1, 100, 10000], 'top_genes': [10000, 30000, 50000]}

## Run the analyses
Predictions, Probs, Act, Acc, Bestparams, Classifiers, Xtest, ytests = Run_Flat_KF(clf, 5, Data, Labels, params, Norm = True, HVG = True, save_clf = True)

## (Optional) save the results
dir_ = ...
name = ...

SaveResultsKF(Predictions, Act, Acc, Bestparams, dir_, name)
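
To get a feeling for the size of the search defined by params, the grid can be expanded by hand. This is only a sketch: it assumes that every combination of 'C' and 'top_genes' is tried once per fold, which this tutorial does not state, so the count below is indicative rather than a description of what Run_Flat_KF does internally.

```python
from itertools import product

# Parameter grid copied from the cell above
params = {'C': [1, 100, 10000], 'top_genes': [10000, 30000, 50000]}

# Expand the grid into one dictionary per candidate setting
names = list(params)
grid = [dict(zip(names, values)) for values in product(*(params[n] for n in names))]

n_folds = 5
print(len(grid))            # number of candidate settings
print(len(grid) * n_folds)  # fits if each setting is tried once per fold (assumption)
print(grid[0])              # first candidate setting
```

Adding more values to either list multiplies the number of candidates, which is worth keeping in mind before enlarging the grid.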

Based on these results, metrics such as the accuracy score or the balanced accuracy score can be calculated. To construct accuracy-rejection curves, the Evaluate_AR_Flat function can be used. The AMB label hierarchy is balanced (unlike all the other datasets).
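
Before moving on to the rejection curves, here is a minimal sketch of the two metrics mentioned above, written in plain Python on hypothetical toy labels. It assumes you have one flat list of true labels and one of predicted labels; how Act and Predictions are structured per fold is not shown in this tutorial, so they may need to be concatenated first. sklearn.metrics also provides accuracy_score and balanced_accuracy_score for the same purpose.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so rare classes weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical toy labels, not taken from the AMB dataset
y_true = ['A', 'A', 'A', 'A', 'B', 'B']
y_pred = ['A', 'A', 'A', 'A', 'A', 'B']

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(acc, bal_acc)
```

In this toy case the plain accuracy is higher than the balanced accuracy, because the only mistake falls in the smaller class.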

In [None]:
from Evaluation.Functions_Accuracy_Reject import Evaluate_AR_Flat

results = Evaluate_AR_Flat(Classifiers, Xtest, ytests, Predictions, Probs, b = True, scores = False)


In [None]:
import matplotlib.pyplot as plt

# accuracy rejection curves
plt.plot(results['steps'], results['acc'])

# rejection percentage curves
plt.plot(results['steps'], results['perc'])
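
To make the idea behind these curves concrete, the sketch below builds an accuracy-rejection curve by hand. It is an illustration, not the implementation of Evaluate_AR_Flat: the function accuracy_rejection_curve and the toy inputs are hypothetical. The returned keys mirror the names used above, but the assumptions that 'steps' holds probability thresholds and 'perc' a percentage of rejected cells are ours.

```python
def accuracy_rejection_curve(y_true, y_pred, confidence, thresholds):
    """For each threshold, reject predictions whose confidence is below it."""
    acc, perc = [], []
    n = len(y_true)
    for t in thresholds:
        kept = [(yt, yp) for yt, yp, c in zip(y_true, y_pred, confidence) if c >= t]
        perc.append(100.0 * (n - len(kept)) / n)
        # Accuracy is undefined when everything is rejected; use None as a placeholder
        acc.append(sum(yt == yp for yt, yp in kept) / len(kept) if kept else None)
    return {'steps': list(thresholds), 'acc': acc, 'perc': perc}

# Hypothetical toy predictions with the probability of the predicted label
y_true = ['A', 'B', 'A', 'B']
y_pred = ['A', 'B', 'B', 'B']
confidence = [0.9, 0.8, 0.55, 0.6]

curve = accuracy_rejection_curve(y_true, y_pred, confidence, [0.0, 0.5, 0.7, 0.95])
print(curve['acc'])
print(curve['perc'])
```

As the threshold rises, more cells are rejected and the accuracy on the remaining cells typically goes up, which is the trade-off the two plots above visualise.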

## Hierarchical annotation and evaluation

In [None]:
from Run.General_analyses import Run_H_KF, SaveResultsKF
from Evaluation.Functions_Accuracy_Reject import Evaluate_AR
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt

## Define the classifier and parameters
clf = LogisticRegression(penalty = 'l2', multi_class = 'multinomial', n_jobs = 1)
params = {'C': [1, 100, 10000], 'top_genes': [10000, 30000, 50000]}

## Run the analyses
Predictions, Probs, Act, Acc, Bestparams, Classifiers, Xtests, ytests = Run_H_KF(clf, 5, Data, Labels, params, 1, None, greedy_ = False, Norm = True, HVG = True, save_clf = True)
# Note on the number of cores: up to n_jobs (classifier) * n_jobsHCL cores can be in use at once
# If you don't want to make accuracy-rejection curves but just perform partial rejection directly, modify reject_thresh.
# Full rejection can easily be applied afterwards through simple thresholding, based on the entire label

## (Optional) save the results
dir_ = ...
name = ...

SaveResultsKF(Predictions, Act, Acc, Bestparams, dir_, name)

results = Evaluate_AR(Classifiers, Xtests, ytests, Predictions, greedy = False)

# accuracy rejection curves
plt.plot(results['steps'], results['acc'])

# rejection percentage curves
plt.plot(results['steps'], results['perc'])
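
The comment in the cell above notes that full rejection can be applied afterwards by simple thresholding on the entire label. A minimal sketch of that step follows. The helper full_rejection, the label names, and the 'Rejected' placeholder are all hypothetical, and the sketch assumes you have one probability per predicted full label; how to obtain that probability from the outputs of Run_H_KF is not covered here.

```python
def full_rejection(labels, probabilities, threshold, reject_label='Rejected'):
    """Replace a prediction by reject_label when its probability is below threshold."""
    return [lab if p >= threshold else reject_label
            for lab, p in zip(labels, probabilities)]

# Hypothetical predictions (full labels) with the probability of each label
labels = ['type_a', 'type_b', 'type_c']
probabilities = [0.95, 0.40, 0.70]

final = full_rejection(labels, probabilities, threshold=0.5)
print(final)  # ['type_a', 'Rejected', 'type_c']
```

Unlike partial rejection, which can stop at an internal node of the hierarchy, this discards the whole label whenever the threshold is not met.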