# Trial Summary 

Data preparation:
- Filter out genes that have non-zero expression in fewer than 10 spots
- Apply a log transformation to the expression values
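
The two preparation steps can be sketched in NumPy as follows. This is only an illustration, not the code in `data_ae`: the function name `filter_and_log` and the toy data are hypothetical, and the use of `log1p` is an assumption suggested by values such as 0.693147 (= ln 2) in the test data printed in the Results Analysis section.

```python
import numpy as np

def filter_and_log(counts, min_spots=10):
    """counts: 2-D array of raw counts, shape (n_spots, n_genes)."""
    counts = np.asarray(counts, dtype=float)
    # Number of spots in which each gene has a non-zero count.
    spots_per_gene = (counts > 0).sum(axis=0)
    keep = spots_per_gene >= min_spots
    # log1p maps 0 to 0, so zeros stay zeros after the transform (assumed transform).
    return np.log1p(counts[:, keep]), keep

rng = np.random.default_rng(0)
toy = rng.poisson(0.5, size=(50, 8))   # synthetic counts, not the real dataset
toy[:, 0] = 0                          # a gene that is never expressed is dropped
logged, keep = filter_and_log(toy, min_spots=10)
print(logged.shape, keep)
```

Keeping the threshold as a parameter makes it easy to match whatever value the loader is actually called with.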

Model:
- Autoencoder (AE) that encodes genes
- RMSE training loss computed over non-zero entries only
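
To make "RMSE excluding zeros" concrete, here is a minimal NumPy sketch of the metric. It is not the `NON_ZERO_RMSELoss_AE` class from `loss`, whose implementation is not shown in this notebook. The function name `non_zero_rmse` and the convention of returning 0.0 when no entry is observed are assumptions made for the example.

```python
import numpy as np

def non_zero_rmse(pred, target):
    """RMSE over the entries where the target is non-zero."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = target != 0
    if not mask.any():
        return 0.0  # no observed entries; a convention chosen for this sketch
    return float(np.sqrt(np.mean((pred[mask] - target[mask]) ** 2)))

target = np.array([[0.0, 1.0], [2.0, 0.0]])
pred = np.array([[5.0, 1.5], [1.5, -3.0]])
# Only the two non-zero targets count: errors of 0.5 and -0.5.
print(non_zero_rmse(pred, target))  # 0.5
```

Because zero targets never enter the loss, predictions at those positions are unconstrained, which is worth keeping in mind when reading predicted values for zero entries.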

Results:
- Valid RMSE: 
- Test RMSE:

# Imports

In [14]:
from os import path, listdir
from copy import deepcopy
import stlearn as st
import numpy as np
import pandas as pd
import seaborn as sns
import torch
import torch.optim as optim
import matplotlib.pyplot as plt

%load_ext autoreload
%autoreload 2

import trainer_ae as trainer
import data_ae as get_data
from models import get_model
import tester_ae as tester
from loss import *
from results_analysis import *

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [2]:
plt.rcParams.update({'font.size': 12})
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

Using device: cuda


# Load Data 

In [3]:
min_counts = 500
min_cells = 177
apply_log = True
batch_size = 128

In [4]:
dl_train, dl_valid, dl_test, _ = get_data.main(
    min_counts=min_counts,
    min_cells=min_cells,
    apply_log=apply_log, 
    batch_size=batch_size, 
    device=device
)


# spots: 1185 | # genes: 32285
New shape after filtering: (1185, 6279)
Log transformation step is finished in adata.X
Data shape: (7440615, 3)
Number of genes: 6279
Number of spots: 1185
Train shape:(7440615, 3)
Valid shape:(7440615, 4)
Test shape:(7440615, 4)
Finish loading the data


# Modelling

## Set Hyperparameters

In [7]:
model_name = 'AE'
max_epochs = 150
early_stopping = 5
model_params = {
    'learning_rate': 0.1,
    'optimizer': "SGD",
    'latent_dim': 40,
    'batch_size': batch_size
}
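
The `early_stopping = 5` setting acts as a patience value: training stops once the validation loss has failed to improve for that many consecutive epochs. The training log suggests the trainer relies on ignite's `EarlyStopping` handler. The plain-Python sketch below only illustrates the patience logic and is not ignite's implementation. The function name `stopping_epoch` and the loss values are made up for the example.

```python
def stopping_epoch(valid_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # patience was never exhausted

# Hypothetical validation losses: the best value (0.94) is reached at epoch 3.
losses = [0.99, 0.97, 0.94, 0.95, 0.95, 0.96, 0.95, 0.94, 0.93]
print(stopping_epoch(losses, patience=5))  # 8
```

Note that the improvement at epoch 9 is never seen, because the run has already stopped at epoch 8. That is the usual trade-off of a small patience value.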

## Build Model 

In [8]:
model = get_model(model_name, model_params, dl_train)
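# Look up the optimizer class by name (e.g. optim.SGD) and instantiate it
optimizer_cls = getattr(optim, model_params['optimizer'])
optimizer = optimizer_cls(model.parameters(), lr=model_params['learning_rate'])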
criterion = NON_ZERO_RMSELoss_AE()

## Train Model 

In [10]:
model, valid_loss = trainer.train(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    max_epochs=max_epochs,
    early_stopping=early_stopping,
    dl_train=dl_train,
    dl_test=dl_valid, 
    device=device
)

Training Results - Epoch[1] Avg loss: 0.99
Validation Results - Epoch[1] Avg loss: 0.99
Training Results - Epoch[2] Avg loss: 0.96
Validation Results - Epoch[2] Avg loss: 0.97
Training Results - Epoch[3] Avg loss: 0.94
Validation Results - Epoch[3] Avg loss: 0.94
Training Results - Epoch[4] Avg loss: 0.92
Validation Results - Epoch[4] Avg loss: 0.93
Training Results - Epoch[5] Avg loss: 0.91
Validation Results - Epoch[5] Avg loss: 0.91
Training Results - Epoch[6] Avg loss: 0.88
Validation Results - Epoch[6] Avg loss: 0.89
Training Results - Epoch[7] Avg loss: 0.87
Validation Results - Epoch[7] Avg loss: 0.87
Training Results - Epoch[8] Avg loss: 0.86
Validation Results - Epoch[8] Avg loss: 0.86
Training Results - Epoch[9] Avg loss: 0.84
Validation Results - Epoch[9] Avg loss: 0.85
Training Results - Epoch[10] Avg loss: 0.83
Validation Results - Epoch[10] Avg loss: 0.84
Training Results - Epoch[11] Avg loss: 0.82
Validation Results - Epoch[11] Avg loss: 0.82
Training Results - Epoch[12]

2022-09-27 14:22:08,552 ignite.handlers.early_stopping.EarlyStopping INFO: EarlyStopping: Stop training


Training Results - Epoch[39] Avg loss: 0.73
Validation Results - Epoch[39] Avg loss: 0.74


In [12]:
# Final-epoch (epoch 39) losses, copied by hand from the training log above
train_res = 0.73
valid_res = 0.74
print(f'Train final results (after log transform) = {train_res}')
print(f'Train final results = {np.exp(train_res)}')
print(f'Valid final results (after log transform) = {valid_res}')
print(f'Valid final results = {np.exp(valid_res)}')

Train final results (after log transform) = 0.73
Train final results = 2.0750806076741224
Valid final results (after log transform) = 0.74
Valid final results = 2.0959355144943643
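
A note on reading the exponentiated values: `exp` of an RMSE measured on log values is best read as a rough multiplicative error factor, and it generally differs from the RMSE measured on raw counts. The sketch below illustrates the gap on synthetic data. It assumes a `log1p` transform, and the noise level and count distribution are arbitrary, so the numbers it prints say nothing about this model.

```python
import numpy as np

rng = np.random.default_rng(0)
true_counts = rng.poisson(5.0, size=1000) + 1            # synthetic, all non-zero
true_log = np.log1p(true_counts)
pred_log = true_log + rng.normal(0.0, 0.7, size=1000)    # synthetic prediction noise

# Error measured on the log scale, as in the notebook's loss.
rmse_log = np.sqrt(np.mean((pred_log - true_log) ** 2))
# Invert log1p on the predictions, then measure the error on the count scale.
rmse_counts = np.sqrt(np.mean((np.expm1(pred_log) - true_counts) ** 2))

print(f"log-space RMSE      = {rmse_log:.2f}")
print(f"exp(log-space RMSE) = {np.exp(rmse_log):.2f}")
print(f"count-space RMSE    = {rmse_counts:.2f}")
```

If an error in original count units is needed, one option is to back-transform the predictions first and compute the RMSE afterwards, as in the last step of the sketch.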


## Test 

In [17]:
test_loss, df_test_preds = tester.test(
    model=model,
    criterion=criterion,
    dl_test=dl_test,
    device=device
)
print(f'Test loss = {test_loss}')
print(f'Test loss after exponent = {torch.exp(test_loss)}')

Test loss = 0.0058470191434025764
Test loss after exponent = 1.005864143371582


# Results Analysis 

In [24]:
df_test = pd.DataFrame(dl_test.dataset.data.to('cpu'))
df_test.head(1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1175,1176,1177,1178,1179,1180,1181,1182,1183,1184
0,0.0,0.693147,0.0,0.693147,0.693147,0.0,0.0,0.693147,0.0,0.0,...,0.0,1.098612,0.693147,0.0,0.0,0.0,0.693147,0.0,0.0,0.693147


In [19]:
df_test_preds.head(1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1175,1176,1177,1178,1179,1180,1181,1182,1183,1184
0,0.59935,0.974985,0.978322,0.97757,0.974359,0.741697,0.941436,0.860837,0.922654,0.94961,...,0.895405,0.96693,0.94286,0.723971,0.903497,0.76264,0.881921,0.895229,0.908059,0.934601
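
The per-gene and per-spot error breakdowns in the sections that follow can be computed from these two frames. A minimal NumPy sketch is given below, under two assumptions: rows are genes and columns are spots (inferred from the 1185 columns matching the spot count), and zeros in the target are excluded as in the training loss. The helper name `masked_rmse_along` and the toy arrays are hypothetical and are not taken from `results_analysis`.

```python
import numpy as np

def masked_rmse_along(pred, target, axis):
    """Non-zero-masked RMSE along `axis` (NaN where nothing is observed)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = target != 0
    # Squared errors at zero targets are replaced by 0 so they do not contribute.
    sq_err = np.where(mask, (pred - target) ** 2, 0.0)
    n_obs = mask.sum(axis=axis)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.sqrt(sq_err.sum(axis=axis) / n_obs)

# Toy stand-ins for df_test / df_test_preds: rows are genes, columns are spots.
target = np.array([[0.0, 1.0, 2.0],
                   [3.0, 0.0, 0.0]])
pred = np.array([[0.9, 1.0, 1.0],
                 [3.0, 0.8, 0.7]])

per_gene = masked_rmse_along(pred, target, axis=1)  # one value per row (gene)
per_spot = masked_rmse_along(pred, target, axis=0)  # one value per column (spot)
print(per_gene, per_spot)
```

With real data, the same function could be applied to `df_test.to_numpy()` and `df_test_preds.to_numpy()`, and the resulting vectors plotted as histograms.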


## Errors Distribution 

## Spots Errors Distribution 

## Genes Errors Distribution 

## Errors Heat Map 