# Testing the `train` module

**Authorship:**
Adam Klie, *07/12/2022*
***
**Description:**
Notebook for testing the `train` module: it builds a small random dataset, instantiates a `DeepBind` model, and runs a one-epoch fit.

In [1]:
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload
%autoreload 2

import os
import torch
import numpy as np
import pandas as pd
import eugene as eu
import matplotlib.pyplot as plt

Global seed set to 13


GPU is available: True
Number of GPUs: 1
Current GPU: 0
GPUs: Quadro RTX 5000


In [2]:
sdata = eu.datasets.random1000()
eu.pp.ohe_seqs_sdata(sdata)
eu.pp.train_test_split_sdata(sdata)

One-hot encoding sequences:   0%|          | 0/1000 [00:00<?, ?it/s]

SeqData object modified:
	ohe_seqs: None -> 1000 ohe_seqs added
SeqData object modified:
    seqs_annot:
        + train_val
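
For readers unfamiliar with the preprocessing step, one-hot encoding of a DNA sequence can be sketched in plain NumPy. This is an illustrative sketch, not EUGENe's implementation: the function name `one_hot_encode`, the `ACGT` channel order, and the `(4, length)` layout are assumptions, chosen to match the 4 input channels of the `Conv1d` layer in the model printed further down.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order; EUGENe's actual order may differ


def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (4, len(seq)) float32 array; unknown bases (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    ohe = np.zeros((len(ALPHABET), len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        row = index.get(base)
        if row is not None:
            ohe[row, pos] = 1.0
    return ohe


encoded = one_hot_encode("ACGTN")
print(encoded.shape)  # (4, 5)
```

Leaving ambiguous bases as all-zero columns is one common convention; a library may instead spread probability mass across channels or raise an error.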


In [4]:
model = eu.models.DeepBind(input_len=100, output_dim=10)

In [5]:
model

DeepBind(
  (train_metric): R2Score()
  (val_metric): R2Score()
  (test_metric): R2Score()
  (conv1d_tower): Conv1DTower(
    (layers): Sequential(
      (0): Conv1d(4, 16, kernel_size=(16,), stride=(1,), padding=valid)
      (1): ReLU()
      (2): Dropout(p=0.25, inplace=False)
    )
  )
  (max_pool): MaxPool1d(kernel_size=85, stride=85, padding=0, dilation=1, ceil_mode=False)
  (avg_pool): AvgPool1d(kernel_size=(85,), stride=(85,), padding=(0,))
  (dense_block): DenseBlock(
    (layers): Sequential(
      (0): Linear(in_features=32, out_features=32, bias=True)
      (1): ReLU()
      (2): Dropout(p=0.25, inplace=False)
      (3): Linear(in_features=32, out_features=10, bias=True)
    )
  )
)
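
The pooling and dense-layer sizes in the printout follow from the constructor arguments. The arithmetic below is a rough check, assuming the max-pooled and average-pooled features are concatenated before the dense block (which would explain `in_features=32` from 16 filters). The helper name `conv1d_out_len` is hypothetical.

```python
def conv1d_out_len(input_len: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution or pooling layer (dilation 1)."""
    return (input_len + 2 * padding - kernel_size) // stride + 1


input_len, n_filters, kernel_size = 100, 16, 16

# "valid" padding means no padding is added.
conv_len = conv1d_out_len(input_len, kernel_size)
# Pooling with kernel_size == conv_len collapses each filter to a single value.
pooled_len = conv1d_out_len(conv_len, kernel_size=conv_len, stride=conv_len)
# Assumption: max- and avg-pooled features are concatenated along the channel axis.
dense_in = 2 * n_filters * pooled_len

print(conv_len, pooled_len, dense_in)  # 85 1 32
```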

In [9]:
from torch.profiler import profile, record_function, ProfilerActivity

In [7]:
sdataset = sdata.to_dataset(target_keys=[f"activity_{i}" for i in range(10)])
sdataloader = sdataset.to_dataloader(batch_size=32, shuffle=True, num_workers=2)

No transforms given, assuming just need to tensorize.


In [8]:
# Lightning releases that predate `pytorch_lightning.profilers` expose the profiler under the singular module name
from pytorch_lightning.profiler import PyTorchProfiler

In [None]:
from pytorch_lightning import Trainer
profiler = PyTorchProfiler()
trainer = Trainer(profiler=profiler)

In [9]:
def test_fit(sdata, model):
    # Fit for one epoch and check that a checkpoint directory was written
    eu.settings.logging_dir = "../../_logs/"
    target_keys = [f"activity_{i}" for i in range(10)]
    eu.train.fit(model, sdata, target_keys=target_keys, epochs=1, name="test_fit", version="v0")
    assert os.path.exists(os.path.join(eu.settings.logging_dir, "test_fit", "v0", "checkpoints"))

test_fit(sdata, model)

Global seed set to 13
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name      | Type                      | Params
--------------------------------------------------------
0 | hp_metric | R2Score                   | 0     
1 | convnet   | BasicConv1D               | 1.0 K 
2 | max_pool  | MaxPool1d                 | 0     
3 | avg_pool  | AvgPool1d                 | 0     
4 | fcn       | BasicFullyConnectedModule | 1.4 K 
--------------------------------------------------------
2.4 K     Trainable params
0         Non-trainable params
2.4 K     Total params
0.010     Total estimated model params size (MB)


Dropping 0 sequences with NaN targets.
No transforms given, assuming just need to tensorize.
No transforms given, assuming just need to tensorize.


  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")


Validation sanity check: 0it [00:00, ?it/s]

  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Global seed set to 13
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"


Training: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]
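
The assertion in `test_fit` only checks that a checkpoint directory exists, and the warning above shows that the directory may already be populated by an earlier run. A slightly stricter check, sketched here with the standard library only, also confirms that the directory contains at least one checkpoint file. The helper name `has_checkpoint` is hypothetical, the `.ckpt` suffix is assumed to be the checkpoint extension in use, and the temporary directory stands in for the real logging directory.

```python
import os
import tempfile


def has_checkpoint(logging_dir: str, name: str, version: str) -> bool:
    """True if <logging_dir>/<name>/<version>/checkpoints holds a .ckpt file."""
    ckpt_dir = os.path.join(logging_dir, name, version, "checkpoints")
    if not os.path.isdir(ckpt_dir):
        return False
    return any(f.endswith(".ckpt") for f in os.listdir(ckpt_dir))


# Demonstration on a throwaway directory that mimics the layout asserted in test_fit.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = os.path.join(tmp, "test_fit", "v0", "checkpoints")
    os.makedirs(ckpt_dir)
    empty_result = has_checkpoint(tmp, "test_fit", "v0")  # directory exists but is empty
    open(os.path.join(ckpt_dir, "epoch=0.ckpt"), "w").close()
    filled_result = has_checkpoint(tmp, "test_fit", "v0")

print(empty_result, filled_result)  # False True
```

Note that this still cannot tell a fresh checkpoint from a stale one; clearing the version directory before the test, or comparing file modification times, would be needed for that.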

---