## Individual evaluation/interpretation notebook

Inspect individual predictions and metrics per cell type and across cell types.
Use the Python modules (evaluation.py and interpret.py) to run evaluations across the entire test set.

In [1]:
!module load cuDNN/8.7.0.84-CUDA-11.8.0  # change to your CUDA version
!module load cluster/wice/dedicated_big_gpu

In [1]:
import tensorflow as tf
import numpy as np
import pyfaidx
import os
from utils.one_hot_encoding import get_hot_encoding_table, regions_to_hot_encoding
from utils.plot import plot_predictions_vs_groundtruth

2023-12-22 12:05:48.830866: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-22 12:05:48.879183: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-22 12:05:48.879210: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-22 12:05:48.879243: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-22 12:05:48.889792: I tensorflow/core/platform/cpu_feature_g

In [3]:
# Model directory, genome FASTA and data split to evaluate
MODEL_DIR = '../../checkpoints/mouse/2023-12-21_10:19/'  # change this
GENOME_FASTA_PATH = '../../data/raw/genome.fa'  # change this
SPLIT = 'test'  # change this

In [4]:
# Genomic window used to find the region indices to inspect
CHR = 'chr18'  # change this
REGION_START = 61106000  # change this
REGION_END = 61110000  # change this

In [5]:
regions_bed_file = os.path.join(MODEL_DIR, "regions.bed")
one_hot_encoding_table = get_hot_encoding_table()
genomic_pyfasta = pyfaidx.Fasta(GENOME_FASTA_PATH, sequence_always_upper=True)
targets = np.load(os.path.join(MODEL_DIR, "targets.npz"))[SPLIT]
region_split_ids = np.load(os.path.join(MODEL_DIR, "region_split_ids.npz"))[SPLIT]

classnames = []
with open(
    os.path.join(MODEL_DIR, "cell_type_mapping.tsv"), "r"
) as cell_mapping:
    for line in cell_mapping:
        classnames.append(line.strip().split("\t")[1])
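
The cell above indexes each `.npz` archive with the split name (`SPLIT`). A minimal sketch of that access pattern, assuming each archive stores one array per split. The in-memory buffer, the `train` key and the array shapes are made up for illustration; only the `test` key appears in this notebook.

```python
import io

import numpy as np

# Hypothetical stand-in for targets.npz: one named array per split.
buffer = io.BytesIO()
np.savez(buffer, train=np.zeros((3, 2)), test=np.ones((2, 2)))
buffer.seek(0)

archive = np.load(buffer)
available_splits = sorted(archive.files)  # names of the arrays stored in the archive
test_targets = archive["test"]            # same indexing as np.load(...)[SPLIT]
print(available_splits, test_targets.shape)
```

Listing `archive.files` first is a quick way to check which split names a given archive actually contains before indexing it.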

In [6]:
# One hot encode regions of interest
regions_of_interest = {}

with open(regions_bed_file, "r") as f:
    for i, line in enumerate(f):
        line = line.strip()
        if line:  # Check if the line is not empty
            chrom, start, end = line.split()[:3]
            start, end = int(start), int(end)

            if chrom == CHR and REGION_START <= start <= REGION_END and REGION_START <= end <= REGION_END:
                regions_of_interest[i] = line.split('\t')

idx_of_interest = regions_of_interest.keys()

seqs_one_hot = regions_to_hot_encoding(
    regions_bed_filename=regions_bed_file,
    genomic_pyfasta=genomic_pyfasta,
    hot_encoding_table=one_hot_encoding_table,
    idx=idx_of_interest,
)

One hot encoding sequences...


100%|██████████| 2/2 [00:00<00:00, 1070.79it/s]


In [7]:
regions_of_interest

{250544: ['chr18', '61106697', '61108811', 'chr18:61106697-61108811'],
 250545: ['chr18', '61107570', '61109684', 'chr18:61107570-61109684']}
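
The selection above keeps only regions that lie entirely inside the window, keyed by their line index in `regions.bed`. The same logic can be sketched as a standalone function that runs without the BED file. `select_regions` and the example lines are hypothetical (only the second example line is taken from the output above), and the sketch assumes `start <= end` for every interval.

```python
def select_regions(bed_lines, chrom, region_start, region_end):
    """Return {line_index: fields} for intervals fully inside the window."""
    selected = {}
    for i, line in enumerate(bed_lines):
        line = line.strip()
        if not line:  # skip empty lines but keep counting their index
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        # Fully contained: both ends fall inside [region_start, region_end].
        if fields[0] == chrom and region_start <= start and end <= region_end:
            selected[i] = fields
    return selected


example_bed = [
    "chr18\t61105000\t61107114\tchr18:61105000-61107114",  # starts before the window
    "chr18\t61106697\t61108811\tchr18:61106697-61108811",  # fully inside
    "chr1\t61106697\t61108811\tchr1:61106697-61108811",    # wrong chromosome
]
selected = select_regions(example_bed, "chr18", 61106000, 61110000)
print(selected)
```

Regions that only partially overlap the window are dropped, so widening `REGION_START` and `REGION_END` is the way to pick up neighbouring regions.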

In [8]:
print("One hot shape:", seqs_one_hot.shape)

One hot shape: (2, 2114, 4)
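
The shape reads as (number of regions, region length in bp, nucleotide channels): both selected regions span 2114 bp (for example 61108811 - 61106697). A toy sketch of how such an array can be built follows. It is not the implementation in `utils.one_hot_encoding`; the A, C, G, T channel order and the all-zero encoding of unknown bases are assumptions.

```python
import numpy as np

# Assumed channel order; get_hot_encoding_table may use a different one.
BASES = "ACGT"


def one_hot_encode(sequences):
    """Encode equal-length DNA strings as an array of shape (n, length, 4)."""
    encoded = np.zeros((len(sequences), len(sequences[0]), len(BASES)), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for j, base in enumerate(seq.upper()):
            k = BASES.find(base)
            if k >= 0:  # bases outside ACGT (such as N) stay all-zero
                encoded[i, j, k] = 1.0
    return encoded


toy = one_hot_encode(["ACGT", "NNAC"])
print(toy.shape)
```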


### Inspect individual predictions vs ground truth across multiple models

In [9]:
# Compare checkpoints saved at different training epochs.
# You can also replace this with a manual list of absolute model paths to compare.
model_paths = [os.path.join(MODEL_DIR, 'checkpoints', name) for name in os.listdir(os.path.join(MODEL_DIR, 'checkpoints'))]
model_paths
model_paths

['../../checkpoints/mouse/2023-12-21_10:19/checkpoints/30.keras',
 '../../checkpoints/mouse/2023-12-21_10:19/checkpoints/28.keras',
 '../../checkpoints/mouse/2023-12-21_10:19/checkpoints/29.keras',
 '../../checkpoints/mouse/2023-12-21_10:19/checkpoints/27.keras']
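
`os.listdir` returns entries in arbitrary order, which is why the checkpoints above are not listed by epoch. To follow training in order, the paths can be sorted numerically, sketched here under the assumption that every checkpoint is named `<epoch>.keras` as in the output above. `epoch_of` is a hypothetical helper, and the `9.keras` entry is invented to show why a plain string sort would not be enough.

```python
import os
import re


def epoch_of(path):
    """Extract the integer epoch from a checkpoint name such as '30.keras'."""
    match = re.fullmatch(r"(\d+)\.keras", os.path.basename(path))
    if match is None:
        raise ValueError(f"Unexpected checkpoint name: {path}")
    return int(match.group(1))


example_paths = [
    "checkpoints/30.keras",
    "checkpoints/28.keras",
    "checkpoints/9.keras",  # hypothetical: a string sort would place this last
    "checkpoints/29.keras",
    "checkpoints/27.keras",
]
sorted_paths = sorted(example_paths, key=epoch_of)
print(sorted_paths)
```

In the notebook this would amount to `model_paths = sorted(model_paths, key=epoch_of)` before looping over the models.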

In [10]:
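for model_path in model_paths:
    # compile=False: the model is only used for inference here
    model = tf.keras.models.load_model(model_path, compile=False)
    prediction = model.predict(seqs_one_hot)
    print(prediction)
    break  # only the first checkpoint is inspected for now

# TODO: finish. The arguments of plot_predictions_vs_groundtruth are still to be filled in.
plot_predictions_vs_groundtruth(...)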

INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: NVIDIA A100-SXM4-80GB, compute capability 8.0


2023-12-22 11:50:19.322695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 78913 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:18:00.0, compute capability: 8.0
2023-12-22 11:50:20.771953: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
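
Until the plotting step is finished, predictions for individual regions can be compared with the ground truth numerically. The sketch below computes a Pearson correlation per region across cell types. It assumes predictions and targets are both 2D arrays of shape (regions, cell types), which this notebook does not confirm; the metric is an illustrative choice, not necessarily the one used in evaluation.py, and the toy arrays are made up.

```python
import numpy as np


def per_region_pearson(predictions, targets):
    """Pearson r between predicted and observed values, one value per row."""
    pred_centered = predictions - predictions.mean(axis=1, keepdims=True)
    targ_centered = targets - targets.mean(axis=1, keepdims=True)
    numerator = (pred_centered * targ_centered).sum(axis=1)
    denominator = np.sqrt(
        (pred_centered ** 2).sum(axis=1) * (targ_centered ** 2).sum(axis=1)
    )
    return numerator / denominator


# Toy data: 2 regions x 3 cell types (hypothetical values).
toy_targets = np.array([[0.0, 1.0, 2.0], [3.0, 1.0, 0.0]])
toy_predictions = np.array([[0.0, 2.0, 4.0], [0.0, 1.0, 3.0]])
scores = per_region_pearson(toy_predictions, toy_targets)
print(scores)
```

A row with constant predictions or constant targets gives a zero denominator, so such regions would need separate handling on real data.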

