# Welcome to the EUGENE project!

**Authorship:**
Adam Klie, *07/07/2022*
***
**Description:**
We're excited to have you try out the tool. This notebook is meant to be a starting point for using and developing EUGENE. It hasn't been converted into a tutorial yet, but it exercises the core functionality of EUGENE to make sure your compute environment is set up properly. If you run into any bugs, please report them to the EUGENE team via email (aklie@eng.ucsd.edu) or on GitHub issues.

The dataset used in this notebook is small enough to run without a GPU.

# Set-up

In [1]:
import numpy as np
import pandas as pd

# Autoreload extension
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload
%autoreload 2

In [2]:
import eugene as eu

Global seed set to 13


# Load data

In [8]:
# Loads a random dataset of 1000 sequences of length 66
sdata = eu.datasets.random1000()
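
For intuition about what a dataset of this shape looks like, the sketch below draws 1000 random DNA strings of length 66 with NumPy. The helper `random_dna` is hypothetical and is not how `eu.datasets.random1000` is implemented; the seed of 13 only mirrors the global seed printed above.

```python
import numpy as np

# Hypothetical helper (not part of EUGENE): uniform random DNA strings.
def random_dna(n_seqs=1000, seq_len=66, seed=13):
    rng = np.random.default_rng(seed)
    alphabet = np.array(list("ACGT"))
    # One integer in [0, 4) per position, mapped to a nucleotide.
    idx = rng.integers(0, 4, size=(n_seqs, seq_len))
    return np.array(["".join(row) for row in alphabet[idx]])

seqs = random_dna()
print(seqs.shape)    # (1000,)
print(len(seqs[0]))  # 66
```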

# Preprocess data

In [9]:

# Preprocess the data for training (prepare_data wraps these)
eu.pp.reverse_complement_data(sdata)
eu.pp.one_hot_encode_data(sdata)
eu.pp.train_test_split_data(sdata)
sdata

SeqData object modified:
	rev_seqs: None -> 1000 rev_seqs added
SeqData object modified:
	ohe_seqs: None -> 1000 ohe_seqs added
	ohe_rev_seqs: None -> 1000 ohe_rev_seqs added
800
SeqData object modified:
    seqs_annot:
        + TRAIN


SeqData object with = 1000 seqs
seqs = (1000,)
names = (1000,)
rev_seqs = (1000,)
ohe_seqs = (1000, 66, 4)
ohe_rev_seqs = (1000, 66, 4)
    seqs_annot: 'TARGETS', 'TRAIN'
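
The three preprocessing steps above can be sketched in plain NumPy to show where the `(1000, 66, 4)` shapes and the `800` in the output plausibly come from. This is a minimal illustration, not EUGENE's implementation: the function names, the `ACGT` channel order, and the random choice of training rows are assumptions. The 80/20 ratio is inferred from the printed `800`.

```python
import numpy as np

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    # Complement each base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]

def one_hot_encode(seq, alphabet="ACGT"):
    # Assumed channel order A, C, G, T; result has shape (len(seq), 4).
    lookup = {base: i for i, base in enumerate(alphabet)}
    ohe = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, base in enumerate(seq):
        ohe[pos, lookup[base]] = 1.0
    return ohe

seq = "ACGTTG"
rev = reverse_complement(seq)
print(rev)                        # CAACGT
print(one_hot_encode(seq).shape)  # (6, 4)

# Assumed split: mark 800 of 1000 sequences as training examples.
rng = np.random.default_rng(13)
train_mask = np.zeros(1000, dtype=bool)
train_mask[rng.choice(1000, size=800, replace=False)] = True
print(train_mask.sum())           # 800
```

Applied to 1000 sequences of length 66, stacking the per-sequence encodings gives an array of shape `(1000, 66, 4)`, matching `ohe_seqs` above.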

# Instantiate model

In [10]:
# Loads the default DeepBind architecture
eugene = eu.models.DeepBind(input_len=66)
eu.models.base.init_weights(eugene)

# Prepare Dataloader

In [11]:
sdataset = sdata.to_dataset(label="TARGETS", seq_transforms=["one_hot_encode"], transform_kwargs={"transpose": True})
sdataloader = sdataset.to_dataloader()
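
One-dimensional convolution layers in PyTorch expect input shaped `(batch, channels, length)`, whereas the encoded sequences above are stored as `(length, channels)`. The sketch below shows the axis swap that `"transpose": True` presumably performs; that interpretation is an assumption, and the batch size of 8 is arbitrary.

```python
import numpy as np

# Build one random one-hot sequence of shape (66, 4).
rng = np.random.default_rng(13)
ohe = np.zeros((66, 4), dtype=np.float32)
ohe[np.arange(66), rng.integers(0, 4, size=66)] = 1.0

# Swap to channels-first: (4, 66).
channels_first = ohe.T
print(channels_first.shape)  # (4, 66)

# Stacking sequences yields a (batch, channels, length) array.
batch = np.stack([channels_first] * 8)
print(batch.shape)           # (8, 4, 66)
```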

# Train the model

In [12]:
eu.train.fit(eugene, sdata=sdata, epochs=5, log_dir="../_logs")

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name      | Type                      | Params
--------------------------------------------------------
0 | max_pool  | MaxPool1d                 | 0     
1 | avg_pool  | AvgPool1d                 | 0     
2 | convnet   | BasicConv1D               | 272   
3 | fcn       | BasicFullyConnectedModule | 146 K 
4 | r_squared | R2Score                   | 0     
--------------------------------------------------------
147 K     Trainable params
0         Non-trainable params
147 K     Total params
0.588     Total estimated model params size (MB)


Validation sanity check: 0it [00:00, ?it/s]

  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Global seed set to 13
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"


Training: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]
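
As a quick sanity check on the model summary above, the reported size of 0.588 MB is consistent with roughly 147 K parameters stored at 4 bytes each. The sketch assumes 32-bit floats and uses the rounded parameter count, since the exact count is not shown in the summary.

```python
# Rounded "147 K" from the summary; the exact count is not shown.
total_params = 147_000
# Assumes float32 weights (4 bytes per parameter).
bytes_per_param = 4

size_mb = total_params * bytes_per_param / 1e6
print(round(size_mb, 3))  # 0.588
```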


# Predict with the model

In [14]:
eu.predict.predictions(eugene, sdataloader=sdataloader, target_label="TARGETS", batch_size=1, num_workers=0, out_dir="../_out/test_")

Global seed set to 13
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs


# Interpret the model

---

# Scratch