# Welcome to the EUGENE project!

**Authorship:**
Adam Klie, *07/07/2022*
***
**Description:**
Excited to have you try out the tool. This notebook is meant to be a starting place for using and developing EUGENE. It has not been converted to a tutorial yet, but it should exercise the core functionality of EUGENE to confirm that your compute environment is set up properly. If you run into any bugs, please report them to the EUGENE team via email (aklie@eng.ucsd.edu) or on GitHub issues.

The dataset used in this notebook is small enough to run without a GPU.
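
Before running the cells below, it can help to confirm that the packages this notebook imports are installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of EUGENE.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages imported in this notebook.
required = ["numpy", "pandas", "eugene"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that each package can be located; it does not verify versions or that the import itself succeeds.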

# Set-up

In [1]:
import numpy as np
import pandas as pd

# Autoreload extension
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload
%autoreload 2

In [2]:
import eugene as eu

Global seed set to 13


# Load data

In [3]:
# Loads a random dataset of 1000 sequences of length 66
sdata = eu.datasets.random1000()
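
To give a feel for what a dataset of this kind contains, here is a small sketch that generates 1000 random DNA strings of length 66 with the standard library. It is an illustration only: `random_dna` is a hypothetical helper, the seed is borrowed from the log above for convenience, and the result will not reproduce the sequences returned by `eu.datasets.random1000()`.

```python
import random


def random_dna(n_seqs, length, seed=13):
    """Generate n_seqs random DNA strings, each with the given length."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n_seqs)]


seqs = random_dna(n_seqs=1000, length=66)
print(len(seqs), len(seqs[0]))
print(seqs[0])
```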

# Preprocess data

In [4]:

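# Preprocess the data for training. Judging by the output below, prepare_data
# wraps reverse complementing, one-hot encoding, and adding a TRAIN split annotation.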
eu.pp.prepare_data(sdata)
sdata

  0%|          | 0/3 [00:00<?, ?it/s]

SeqData object modified:
	rev_seqs: None -> 1000 rev_seqs added
	ohe_seqs: None -> 1000 ohe_seqs added
    seqs_annot:
        + TRAIN


SeqData object with = 1000 seqs
seqs = (1000,)
names = (1000,)
rev_seqs = (1000,)
ohe_seqs = (1000, 66, 4)
ohe_rev_seqs = None
    seqs_annot: 'TARGETS', 'TRAIN'
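
The `rev_seqs` and `ohe_seqs` fields reported above correspond to two standard transformations: reverse complementing and one-hot encoding. The sketch below shows one way to compute them with NumPy. The helper names and the `ACGT` channel order are assumptions made for illustration and may differ from what EUGENE does internally.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def one_hot_encode(seqs):
    """Encode equal-length DNA strings as an array of shape (n_seqs, length, 4)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    ohe = np.zeros((len(seqs), len(seqs[0]), len(ALPHABET)), dtype=np.float32)
    for n, seq in enumerate(seqs):
        for pos, base in enumerate(seq):
            ohe[n, pos, index[base]] = 1.0
    return ohe


seqs = ["ACGTAC", "TTGACA"]
rev_seqs = [reverse_complement(s) for s in seqs]
ohe_seqs = one_hot_encode(seqs)
print(rev_seqs)        # ['GTACGT', 'TGTCAA']
print(ohe_seqs.shape)  # (2, 6, 4)
```

With 1000 sequences of length 66, the same encoding yields the `(1000, 66, 4)` shape shown in the output above.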

# Instantiate model

In [5]:
# Loads the default DeepBind architecture
eugene = eu.models.DeepBind(input_len=66)
eu.models.base.init_weights(eugene)

# Prepare Dataloader

In [6]:
sdataset = sdata.to_dataset(label="TARGETS", seq_transforms=["one_hot_encode"], transform_kwargs={"transpose": True})
sdataloader = sdataset.to_dataloader()
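
Conceptually, this step reshapes each encoded sequence for a 1D convolution and then serves the data in batches. The sketch below assumes that `"transpose": True` swaps the length and channel axes so each sequence becomes `(4, 66)`; that reading, the `iter_batches` helper, and the batch size of 32 are assumptions for illustration, not confirmed EUGENE behaviour.

```python
import numpy as np

# Stand-in for the one-hot encoded data: 1000 sequences, length 66, 4 bases.
ohe_seqs = np.zeros((1000, 66, 4), dtype=np.float32)

# Swap the last two axes so each sequence is laid out as (channels, length).
channels_first = np.transpose(ohe_seqs, (0, 2, 1))


def iter_batches(array, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]


batch_size = 32  # hypothetical value chosen for illustration
batches = list(iter_batches(channels_first, batch_size))
print(channels_first.shape)              # (1000, 4, 66)
print(len(batches), batches[-1].shape)   # 32 (8, 4, 66)
```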

# Train the model

In [7]:
eu.train.fit(eugene, sdata=sdata, epochs=5, log_dir="../_logs")

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name      | Type                      | Params
--------------------------------------------------------
0 | max_pool  | MaxPool1d                 | 0     
1 | avg_pool  | AvgPool1d                 | 0     
2 | convnet   | BasicConv1D               | 272   
3 | fcn       | BasicFullyConnectedModule | 146 K 
4 | r_squared | R2Score                   | 0     
--------------------------------------------------------
147 K     Trainable params
0         Non-trainable params
147 K     Total params
0.588     Total estimated model params size (MB)


Validation sanity check: 0it [00:00, ?it/s]

  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Global seed set to 13
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"


Training: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."


Predicting: 0it [00:00, ?it/s]

Predicting: 0it [00:00, ?it/s]

SeqData object modified:
    seqs_annot:
        + PREDICTIONS


In [8]:
sdata.seqs_annot

| | TARGETS | TRAIN | PREDICTIONS |
|---|---|---|---|
| seq001 | 1.0 | True | 0.359638 |
| seq002 | 0.0 | True | 0.410613 |
| seq003 | 1.0 | True | 0.396085 |
| seq004 | 0.0 | False | 0.405854 |
| seq005 | 0.0 | False | 0.401179 |
| ... | ... | ... | ... |
| seq996 | 0.0 | True | 0.382790 |
| seq997 | 0.0 | True | 0.397430 |
| seq998 | 0.0 | True | 0.380698 |
| seq999 | 0.0 | True | 0.418305 |
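
The model summary lists an `r_squared` metric, so it may be useful to see how a coefficient of determination is computed from a table like this one. The sketch below is a plain-Python version applied to the nine rows visible above; because it uses only those rows, the value it prints is not the model's actual training or validation score.

```python
def r2_score(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Rows visible in the table above (seq001-seq005 and seq996-seq999).
targets = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
predictions = [0.359638, 0.410613, 0.396085, 0.405854, 0.401179,
               0.382790, 0.397430, 0.380698, 0.418305]

r2_visible = r2_score(targets, predictions)
print(f"R^2 on the visible rows: {r2_visible:.3f}")
```

A value at or below zero means the predictions explain the targets no better than predicting the mean, which would not be surprising for a random dataset trained for five epochs.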


# Predict with model 

In [9]:
eu.predict.predictions(eugene, sdataloader=sdataloader, target_label="TARGETS", batch_size=1, num_workers=0, out_dir="../_out/test_")

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."


Predicting: 0it [00:00, ?it/s]

  t = np.concatenate(np.array(trainer.predict(model, sdataloader)), axis=0)


| | 0 |
|---|---|
| seq001 | 0.35963845 |
| seq002 | 0.41061306 |
| seq003 | 0.39608538 |
| seq004 | 0.40585387 |
| seq005 | 0.40117908 |
| ... | ... |
| seq996 | 0.3827901 |
| seq997 | 0.39743 |
| seq998 | 0.38069797 |
| seq999 | 0.41830504 |
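
The values returned here should match the `PREDICTIONS` column that `eu.train.fit` added to `sdata.seqs_annot` earlier, up to display precision. A minimal sanity check on the rows visible in both outputs, using only the standard library:

```python
import math

# Rows visible in the two outputs (seq001-seq005 and seq996-seq999).
from_fit = [0.359638, 0.410613, 0.396085, 0.405854, 0.401179,
            0.382790, 0.397430, 0.380698, 0.418305]
from_predict = [0.35963845, 0.41061306, 0.39608538, 0.40585387, 0.40117908,
                0.3827901, 0.39743, 0.38069797, 0.41830504]

# The first list was printed with six decimals, so allow for rounding error.
agree = all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(from_fit, from_predict))
print("Predictions agree:", agree)
```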


# Interpret the model

In [10]:
eu.interpret.feature_attribution(eugene, sdata)

  0%|          | 0/31 [00:00<?, ?it/s]

array([[[ 0.        ,  0.        ,  0.00641138, ...,  0.00272099,
          0.        ,  0.        ],
        [-0.        ,  0.        , -0.        , ..., -0.        ,
          0.        ,  0.00364729],
        [ 0.00399181, -0.        , -0.        , ..., -0.        ,
          0.00248242,  0.        ],
        [-0.        ,  0.00491963,  0.        , ...,  0.        ,
          0.        ,  0.        ]],

       [[-0.        ,  0.00451582,  0.        , ...,  0.        ,
          0.        ,  0.00158742],
        [-0.        ,  0.        ,  0.        , ...,  0.        ,
         -0.        ,  0.        ],
        [ 0.        , -0.        ,  0.        , ..., -0.        ,
          0.        ,  0.        ],
        [ 0.00161232,  0.        ,  0.00241995, ...,  0.00435137,
          0.00183829, -0.        ]],

       [[-0.00056602,  0.00576064,  0.        , ...,  0.        ,
          0.        , -0.        ],
        [ 0.        , -0.        ,  0.        , ...,  0.01109879,
          0.
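
In the (truncated) output above, each position appears to have at most one nonzero value across the four base channels. That pattern is consistent with attributions that are masked by the one-hot input, as in an input-times-gradient method. Which method `eu.interpret.feature_attribution` uses by default is an assumption here, not something the output confirms. The toy sketch below illustrates the idea with a linear model in NumPy, where the gradient is simply the weight matrix.

```python
import numpy as np

rng = np.random.default_rng(13)

# One-hot input with shape (4 bases, 6 positions): exactly one 1 per column.
bases = rng.integers(0, 4, size=6)
x = np.zeros((4, 6))
x[bases, np.arange(6)] = 1.0

# Toy linear model: score = sum(w * x), so d(score)/dx is just w.
w = rng.normal(size=(4, 6))
score = float((w * x).sum())
gradient = w

# Input-times-gradient keeps the gradient only where the input is 1.
attribution = x * gradient
print(np.count_nonzero(attribution, axis=0))  # at most one nonzero per position
```

For this linear toy model the attributions sum exactly to the score; for a nonlinear network that property does not hold in general.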

In [12]:
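# NOTE: this call raises AttributeError in this version of EUGENE (see the output below);
# get_pfms is not available at the top level of the package.
eu.get_pfms(eugene, sdata)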

AttributeError: module 'eugene' has no attribute 'get_pfms'

---

# Scratch