# N-ACT Tutorial: Evaluating NACT's Supervised Automatic Cell Type Identification (ACTI)

In this notebook, we will go over how to load a pre-trained NACT model for classifying pre-labeled cell types.

In [2]:
try:
    from nbproject import header
    header(filepath="/home/aheydari/SindiLabTutorials/N-ACT/Supervised ACTI/"
           "NACT_tutorial_supervisedACTI_GSE154567.ipynb")

except ModuleNotFoundError:
    print("Please install nbproject (pip install nbproject) to see header "
          "dependencies.")

id,OU1PGCOxUshr
version,0
time_init,2023-01-27 00:13
time_run,2023-01-27 00:13
pypackage,nact==0.1.0 nbproject==0.8.1 numpy==1.23.5 pandas==1.5.2 scanpy==1.9.1 torch==1.13.1


In [2]:
import nact
from nact.utilities import *
from nact import scanpy_to_dataloader
from nact import AttentionQuery
import numpy as np
import os
import pandas as pd
import scanpy as sc
import torch

In [3]:
print(f"NACT Version: {nact.__version__}")
print(f"Scanpy Version: {sc.__version__}")

NACT Version: 0.1.0
Scanpy Version: 1.9.1


In [4]:
%load_ext autoreload
%autoreload 2

## Setting Up Result Folder and Data Paths

In [5]:
abs_path = "/home/aheydari/"
local_path = "data/NACT_Data/Supervised Benchmarking/"

In [6]:
# label for the dataset folder we want to make
dataset_name = "GSE154567"

## Load in pre-trained NACT model

Since our implementation is in PyTorch, we can use the `load` function that PyTorch provides (`torch.load`). Our model is stored as a dict, with `epoch` holding the epoch at which the model was saved and `Saved_Model` holding the model itself. The loading itself happens in a later cell, after the data has been read in.
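To make the checkpoint layout concrete, here is a minimal sketch of saving and restoring a dict with the same two keys. The toy `nn.Linear` model, the file name `toy_checkpoint.pth`, and the epoch value are hypothetical stand-ins, not part of NACT. The `weights_only` check is an assumption about newer PyTorch releases, where loading a fully pickled module may require turning that option off.

```python
import inspect
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for a trained network (not the NACT architecture).
toy_model = nn.Linear(4, 2)

# Build the loading arguments once. Newer PyTorch versions expose a
# `weights_only` flag; a checkpoint that pickles the whole module needs it
# set to False, so we pass it only when the installed version supports it.
load_kwargs = {"map_location": torch.device("cpu")}
if "weights_only" in inspect.signature(torch.load).parameters:
    load_kwargs["weights_only"] = False

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "toy_checkpoint.pth")

    # Save a dict with the two keys described in the text.
    torch.save({"epoch": 10, "Saved_Model": toy_model}, checkpoint_path)

    # Restore the dict on the CPU, regardless of where it was trained.
    checkpoint = torch.load(checkpoint_path, **load_kwargs)

restored_model = checkpoint["Saved_Model"]
restored_model.eval()  # switch to inference mode (e.g. disables dropout)

print(f"Checkpoint keys: {sorted(checkpoint.keys())}")
print(f"Saved at epoch: {checkpoint['epoch']}")
```

Only load checkpoints from sources you trust: a file that stores a whole pickled module can execute arbitrary code when it is unpickled.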

## Load in the data that we want

For example, here we load cluster 1.

In [7]:
path_to_data = (f"{abs_path}{local_path}{dataset_name}"
                "_qc_hvg_anno_5k_raw_train_split.h5ad")

In [8]:
train_data_loader, test_data_loader = scanpy_to_dataloader(
    path_to_data,
    test_no_valid=True,
    verbose=False,
    raw_x=True,
)

==> Reading in Scanpy/Seurat AnnData
    -> Trying adata.raw.X instead of adata.X!
    -> Splitting Train and Validation Data
==> Using cluster info for generating train and validation labels
==> Checking if we have sparse matrix into dense


In [9]:
model_dict = torch.load(f"{abs_path}data/NACT_Trained_Models/"
                              "NACT_Jan2023Benchmarks/NACT-Pojections+Attention"
                              f"-{dataset_name}.pth",
                              map_location=torch.device('cpu'))

trained_nact_model = model_dict["Saved_Model"]
trained_nact_model.eval()

NACTProjectionAttention(
  (masking_layer): Identity()
  (attention_module): Linear(in_features=5000, out_features=5000, bias=True)
  (projection_block1): Projection(
    (projection): Linear(in_features=5000, out_features=5000, bias=True)
    (output_dropout): Dropout(p=0.0, inplace=False)
    (normalization): LayerNorm((5000,), eps=1e-05, elementwise_affine=True)
  )
  (projection_block2): Projection(
    (projection): Linear(in_features=5000, out_features=5000, bias=True)
    (output_dropout): Dropout(p=0.0, inplace=False)
    (normalization): LayerNorm((5000,), eps=1e-05, elementwise_affine=True)
  )
  (pwff): PointWiseFeedForward(
    (first_layer): Sequential(
      (0): Linear(in_features=5000, out_features=128, bias=True)
      (1): ReLU()
    )
    (second_layer): Linear(in_features=128, out_features=5000, bias=True)
    (normalization): LayerNorm((5000,), eps=1e-05, elementwise_affine=True)
  )
  (task_module): Sequential(
    (0): Linear(in_features=5000, out_features=9, bia

In [10]:
if torch.cuda.is_available():
    device = "cuda"
    print('==> Using GPU (CUDA)')

elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print('==> Using M1 GPUs')

else:
    device = "cpu"
    print('==> Using CPU')
    print('    -> Warning: Running on CPUs will be slower than running on GPUs')

# Record the selected device on the model so that later cells can read it back
trained_nact_model.device = device

==> Using GPU (CUDA)


### Checking the Accuracy of ACTI on Test Set

In [11]:
# This utility function returns several classification-related arrays; since we
# do not need them here, we discard them by assigning each one to `_`.

_, _, _, _, _ = evaluate_classifier(test_data_loader,
                                    trained_nact_model,
                                    classification_report=True,
                                    device=trained_nact_model.device)

==> Evaluating on Validation Set:
    -> Accuracy of classifier network on validation set:92.6698 %
    -> Non-Weighted F1 Score on validation set: 0.8986
    -> Weighted F1 Score on validation set: 0.9265
              precision    recall  f1-score   support

         0.0       0.97      0.98      0.98      1862
         1.0       0.94      0.95      0.95      5103
         2.0       0.86      0.88      0.87      1812
         3.0       0.93      0.87      0.90       800
         4.0       0.94      0.91      0.93      1096
         5.0       0.97      0.97      0.97      1238
         6.0       0.93      0.89      0.91        63
         7.0       0.89      0.70      0.78        79
         8.0       0.82      0.80      0.81       907

    accuracy                           0.93     12960
   macro avg       0.92      0.88      0.90     12960
weighted avg       0.93      0.93      0.93     12960
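The non-weighted (macro) F1 score is lower than the weighted one because the rarest classes (6.0 and 7.0, with 63 and 79 cells) count as much as the largest class in a plain average. The sketch below recomputes both averages from the per-class values printed in the report above. Since those values are rounded to two decimals, the results should only be expected to agree with the logged 0.8986 and 0.9265 to about two decimal places.

```python
# Per-class F1 scores and supports copied from the classification report above.
f1_scores = [0.98, 0.95, 0.87, 0.90, 0.93, 0.97, 0.91, 0.78, 0.81]
supports = [1862, 5103, 1812, 800, 1096, 1238, 63, 79, 907]

total_cells = sum(supports)

# Macro average: every class contributes equally, regardless of its size.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in zip(f1_scores, supports)) / total_cells

print(f"Total cells: {total_cells}")
print(f"Macro (non-weighted) F1: {macro_f1:.2f}")
print(f"Weighted F1: {weighted_f1:.2f}")
```

On imbalanced datasets like this one, the macro average is the more conservative summary, since poor performance on a small cell population is not masked by the large ones.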

