# How to train your DragoNN
## Exploring convolutional neural network (CNN) architectures for simulated genomic data. 

Make sure you are using the **Tutorial 2** kernel. This will (hopefully) ensure you have the necessary packages to run everything in the notebook. You can change the kernel by clicking on the button showing the active kernel at the top-right of the notebook or with the "Kernel" drop down menu on the menu bar.

As you run this, Tensorflow will spit out warnings and errors all over the place, which you can safely ignore.

## Outline<a name='outline'>
<ol>
    <li><a href=#2>Key properties of regulatory DNA sequences</a></li>
    <li><a href=#3>Learning to localize homotypic motif density</a></li>
    <li><a href=#4>Getting simulation data</a></li>  
    <li><a href=#4.5>Running DragoNN on your own data: starting with FASTA files</a></li>
    <li><a href=#5>Defining CNN architecture</a></li>
    <li><a href=#6>Single layer, multiple filter model</a></li>
    <li><a href=#7>Model Interpretation</a></li>    
    <li><a href=#8>A multi-layer DragoNN model</a></li>
</ol>

We start by loading DragoNN's tutorial utilities and reviewing the properties of the regulatory sequences that transcription factors bind.

In [None]:
import warnings
warnings.filterwarnings('ignore')
from notebook.services.config import ConfigManager
cm = ConfigManager().update('notebook', {'limit_output': 250})
# Making sure our results are reproducible
import numpy as np
from numpy.random import seed
seed(1234)
import tensorflow as tf
tf.set_random_seed(1234)

%reload_ext autoreload
%autoreload 2
%matplotlib inline

## Key properties of regulatory DNA sequences <a name='2'>

![sequence properties 1](https://github.com/kundajelab/dragonn/blob/master/paper_supplement/primer_tutorial_images/sequence_properties_1.jpg?raw=1)
![sequence properties 2](https://github.com/kundajelab/dragonn/blob/master/paper_supplement/primer_tutorial_images/sequence_properties_2.jpg?raw=1)

## Learning to localize homotypic motif density <a name='3'>

In this tutorial we will learn how to localize a homotypic motif cluster. We will simulate a positive set of sequences with multiple instances of a motif in the center and a negative set of sequences with multiple motif instances positioned anywhere in the sequence:

![homotypic motif density localization](https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/homotypic_motif_density_localization.jpg?raw=1)
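To make the positive/negative design concrete, here is a minimal numpy sketch of simulating one positive and one negative sequence. It is not DragoNN's implementation: DragoNN samples motif instances from the `TAL1_known4` position weight matrix, whereas this sketch inserts a fixed, hypothetical placeholder string (`"CAGCTG"`). All function names here are illustrative.

```python
import numpy as np

def random_sequence(rng, length, gc_fraction):
    """Draw a random DNA string whose expected GC content is gc_fraction."""
    at, gc = (1 - gc_fraction) / 2, gc_fraction / 2
    return "".join(rng.choice(list("ACGT"), size=length, p=[at, gc, gc, at]))

def embed_motif(rng, seq, motif, num_instances, start, end):
    """Overwrite num_instances copies of motif at random offsets inside seq[start:end]."""
    chars = list(seq)
    for _ in range(num_instances):
        offset = int(rng.integers(start, end - len(motif) + 1))
        chars[offset:offset + len(motif)] = motif
    return "".join(chars)

def simulate_pair(rng, motif, seq_length=1500, center_size=150,
                  min_counts=2, max_counts=4, gc_fraction=0.4):
    """Return one positive (instances in the center) and one negative (instances anywhere)."""
    center_start = (seq_length - center_size) // 2   # 675 for the defaults
    n_pos, n_neg = rng.integers(min_counts, max_counts + 1, size=2)
    positive = embed_motif(rng, random_sequence(rng, seq_length, gc_fraction), motif,
                           n_pos, center_start, center_start + center_size)
    negative = embed_motif(rng, random_sequence(rng, seq_length, gc_fraction), motif,
                           n_neg, 0, seq_length)
    return positive, negative

rng = np.random.default_rng(1234)
positive, negative = simulate_pair(rng, motif="CAGCTG")  # hypothetical placeholder motif
```

Note that instances may overlap and partially overwrite each other; the sketch ignores this, since the point is only where the motifs are allowed to land.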

We will then train a binary classification model to classify the simulated sequences. To solve this task, the model will need to learn the motif pattern and whether instances of that pattern are present in the central part of the sequence.

![classification task](https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/homotypic_motif_density_localization_task.jpg?raw=1)

We start by getting the simulation data.

## Getting simulation data <a name='4'>

DragoNN provides a set of simulation functions. We will use the `simulate_motif_density_localization()` function to simulate homotypic motif density localization. First, we obtain documentation for the simulation parameters.

In [None]:
from dragonn.simulations import *
print_simulation_info("simulate_motif_density_localization")

Next, we define parameters for TAL1 motif density localization in 1500 bp sequences with a GC fraction of 0.4, with 2-4 instances of the motif in the central 150 bp of each positive sequence. We simulate a total of 3000 positive and 3000 negative sequences.

In [None]:
motif_density_localization_simulation_parameters = {
    "motif_name": "TAL1_known4",
    "seq_length": 1500,
    "center_size": 150,
    "min_motif_counts": 2,
    "max_motif_counts": 4, 
    "num_pos": 3000,
    "num_neg": 3000,
    "GC_fraction": 0.4}

We get the simulation data by calling the `get_simulation_data()` function with the simulation name and the simulation parameters as inputs. 1000 sequences are held out for a test set, 1000 sequences for a validation set, and the remaining 4000 sequences are in the training set.
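Conceptually, this split is a partition of the 6000 sequence indices into three disjoint sets. A minimal numpy sketch, assuming a plain random permutation (`get_simulation_data` may split differently, e.g. with stratification by label); `split_indices` is a hypothetical helper:

```python
import numpy as np

def split_indices(n, valid_size, test_size, seed=1234):
    """Randomly partition range(n) into train/validation/test index arrays."""
    perm = np.random.default_rng(seed).permutation(n)
    test_idx = perm[:test_size]
    valid_idx = perm[test_size:test_size + valid_size]
    train_idx = perm[test_size + valid_size:]
    return train_idx, valid_idx, test_idx

# 3000 positives + 3000 negatives = 6000 simulated sequences
train_idx, valid_idx, test_idx = split_indices(6000, valid_size=1000, test_size=1000)
print(len(train_idx), len(valid_idx), len(test_idx))  # 4000 1000 1000
```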

In [None]:
simulation_data = get_simulation_data("simulate_motif_density_localization",
                                      motif_density_localization_simulation_parameters,
                                      validation_set_size=1000, test_set_size=1000)

simulation_data provides training, validation, and test sets of input sequences X and sequence labels y. The inputs X are matrices with a one-hot-encoding of the sequences:
<img src="https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/one_hot_encoding.png?raw=1" width="500">
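A minimal sketch of one-hot encoding into the 4-dimensional layout used in this notebook (samples, dummy dimension of size 1, position, channels A/C/G/T), together with its inverse. How DragoNN itself treats ambiguous bases such as `N` is an assumption here: this sketch leaves them as all-zero columns.

```python
import numpy as np

BASES = "ACGT"

def one_hot_encode(seqs):
    """Encode equal-length DNA strings as an (N, 1, L, 4) array with channels A, C, G, T."""
    X = np.zeros((len(seqs), 1, len(seqs[0]), 4), dtype=np.float32)
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq.upper()):
            if base in BASES:  # ambiguous bases such as N stay all-zero
                X[i, 0, j, BASES.index(base)] = 1.0
    return X

def decode_one_hot(X):
    """Map an (N, 1, L, 4) one-hot array back to strings; all-zero positions become N."""
    return ["".join(BASES[int(np.argmax(pos))] if pos.sum() > 0 else "N" for pos in x[0])
            for x in X]

X = one_hot_encode(["ACGTN", "ggcat"])
print(X.shape)            # (2, 1, 5, 4)
print(decode_one_hot(X))  # ['ACGTN', 'GGCAT']
```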



`simulation_data` is an object. Its attribute `X_train` is a 4-dimensional numpy array, and the array's `shape` attribute gives its dimensions.

In [None]:
simulation_data.X_train.shape

Here are the first 10bp of a sequence in our training data:

In [None]:
#The first dimension indicates the index of the training samples. 
# The second dimension is 1, and is only necessary because we are 
# performing 2D convolutions. We could omit this "dummy" dimension if
# we used 1D convolutions. 
# The third dimension indicates the base index. 
# The fourth dimension indicates the base pair channels: A,C,G,T. 

simulation_data.X_train[0, :, :10, :]

We can convert this one-hot-encoded matrix back into a DNA string:

In [None]:
from dragonn.utils import *
get_sequence_strings(simulation_data.X_train)[0][0:10]

Let's examine the shape of training, validation, and test matrices: 

In [None]:
print(simulation_data.X_train.shape)
print(simulation_data.y_train.shape)

In [None]:
print(simulation_data.X_valid.shape)
print(simulation_data.y_valid.shape)

In [None]:
print(simulation_data.X_test.shape)
print(simulation_data.y_test.shape)

## Running DragoNN on your own data: Starting with FASTA files <a name='4.5'>

If you are running DragoNN on your own data, you can provide data in FASTA sequence format. We recommend generating six FASTA files for model training:
* Training positives 
* Training negatives 
* Validation positives 
* Validation negatives 
* Test positives 
* Test negatives 

To indicate how this could be done, we export the one-hot-encoded matrices from **simulation_data** to FASTA files (one per split and class), and then show how these FASTA files can be loaded back into one-hot-encoded matrices.
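The gzipped FASTA format itself is simple: a `>` header line followed by sequence lines for each record. A minimal sketch using only Python's standard `gzip` module; the record names (`seq0`, `seq1`, ...) are hypothetical and need not match what `fasta_from_onehot` writes.

```python
import gzip
import os
import tempfile

def write_fasta_gz(seqs, path, prefix="seq"):
    """Write DNA strings to a gzipped FASTA file, one record per sequence."""
    with gzip.open(path, "wt") as f:
        for i, seq in enumerate(seqs):
            f.write(f">{prefix}{i}\n{seq}\n")

def read_fasta_gz(path):
    """Read a gzipped FASTA file (sequences may span several lines) into a list of strings."""
    seqs, current = [], []
    with gzip.open(path, "rt") as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs

path = os.path.join(tempfile.mkdtemp(), "example.fasta.gz")
write_fasta_gz(["ACGT", "GGCC"], path)
print(read_fasta_gz(path))  # ['ACGT', 'GGCC']
```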

In [None]:
from dragonn.utils import fasta_from_onehot

#get the indices of positive and negative sequences in the training, validation, and test sets 
train_pos=np.nonzero(simulation_data.y_train==True)
train_neg=np.nonzero(simulation_data.y_train==False)
valid_pos=np.nonzero(simulation_data.y_valid==True)
valid_neg=np.nonzero(simulation_data.y_valid==False)
test_pos=np.nonzero(simulation_data.y_test==True)
test_neg=np.nonzero(simulation_data.y_test==False)

#Generate gzipped  fasta files -- it is always a good idea to gzip your fasta files. This is less 
# important for our tiny example files, but becomes more relevant as the size of the files increases. 
# The fasta_from_onehot function gzips output fasta files. 
fasta_from_onehot(np.expand_dims(simulation_data.X_train[train_pos],axis=1),"X.train.pos.fasta.gz")
fasta_from_onehot(np.expand_dims(simulation_data.X_valid[valid_pos],axis=1),"X.valid.pos.fasta.gz")
fasta_from_onehot(np.expand_dims(simulation_data.X_test[test_pos],axis=1),"X.test.pos.fasta.gz")

fasta_from_onehot(np.expand_dims(simulation_data.X_train[train_neg],axis=1),"X.train.neg.fasta.gz")
fasta_from_onehot(np.expand_dims(simulation_data.X_valid[valid_neg],axis=1),"X.valid.neg.fasta.gz")
fasta_from_onehot(np.expand_dims(simulation_data.X_test[test_neg],axis=1),"X.test.neg.fasta.gz")

Let's examine "X.train.pos.fasta.gz" to verify that it's in the standard gzipped FASTA format. 

In [None]:
! zcat X.train.pos.fasta.gz | head

We can then load fasta format data to generate training, validation, and test splits for our models:

In [None]:
from dragonn.utils import encode_fasta_sequences
X_train_pos=encode_fasta_sequences("X.train.pos.fasta.gz")
X_train_neg=encode_fasta_sequences("X.train.neg.fasta.gz")
X_valid_pos=encode_fasta_sequences("X.valid.pos.fasta.gz")
X_valid_neg=encode_fasta_sequences("X.valid.neg.fasta.gz")
X_test_pos=encode_fasta_sequences("X.test.pos.fasta.gz")
X_test_neg=encode_fasta_sequences("X.test.neg.fasta.gz")

X_train=np.concatenate((X_train_pos,X_train_neg),axis=0)
X_valid=np.concatenate((X_valid_pos,X_valid_neg),axis=0)
X_test=np.concatenate((X_test_pos,X_test_neg),axis=0)


In [None]:
y_train=np.concatenate((np.ones(X_train_pos.shape[0]),
                        np.zeros(X_train_neg.shape[0])))
y_valid=np.concatenate((np.ones(X_valid_pos.shape[0]),
                        np.zeros(X_valid_neg.shape[0])))
y_test=np.concatenate((np.ones(X_test_pos.shape[0]),
                        np.zeros(X_test_neg.shape[0])))


Now, having read in the FASTA files, converted them to one-hot-encoded matrices, and defined label vectors, we are ready to train our model. 

# Defining the convolutional neural network model architecture  <a name='5'>

A locally connected linear unit in a CNN model can represent a [position-specific scoring matrix (PSSM)](https://en.wikipedia.org/wiki/Position_weight_matrix) (a). A sequence PSSM score is obtained by sliding the PSSM across the sequence (taking a dot product with the one-hot-encoded bases at each offset), thresholding the resulting PSSM scores, and taking the max (b). A PSSM score can also be computed by a CNN model with tiled, locally connected linear units, amounting to a convolutional layer with a single convolutional filter representing the PSSM, followed by ReLU thresholding and maxpooling (c).
![dragonn vs pssm](https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/dragonn_and_pssm.jpg?raw=1)
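Steps (a)-(c) can be written out directly in numpy. The sketch below uses a hypothetical toy PSSM (+1 for each base of "CAT", -1 otherwise) rather than a real motif, and the threshold plays the role of a negative bias before the ReLU.

```python
import numpy as np

def one_hot(seq):
    """(L, 4) one-hot matrix with columns A, C, G, T."""
    return np.array([[float(base == b) for b in "ACGT"] for base in seq])

def pssm_sequence_score(X, pssm, threshold=0.0):
    """Slide the PSSM along X (a single convolutional filter: one dot product per offset),
    threshold the scores (ReLU after subtracting threshold), then take the max over
    positions (max pooling). Returns (sequence score, per-offset scores)."""
    width = pssm.shape[0]
    scores = np.array([np.sum(X[i:i + width] * pssm) for i in range(len(X) - width + 1)])
    return float(np.maximum(scores - threshold, 0.0).max()), scores

# Hypothetical toy PSSM: +1 for the bases of "CAT", -1 otherwise
toy_pssm = np.where(one_hot("CAT") == 1, 1.0, -1.0)
best, scores = pssm_sequence_score(one_hot("GGCATGG"), toy_pssm)
print(scores.tolist())  # [-3.0, -3.0, 3.0, -3.0, -3.0]
print(best)             # 3.0
```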

By utilizing multiple convolutional layers with multiple convolutional filters, CNNs can represent a wide range of sequence features in a compositional fashion:
![dragonn model figure](https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/dragonn_model_figure.jpg?raw=1)

We will use the deep learning library [keras](http://keras.io/) with the [TensorFlow](https://github.com/tensorflow/tensorflow) backend to generate and train the CNN models. 

In [None]:
#To prepare for model training, we import the necessary functions and submodules from keras
from keras.models import Sequential
from keras.layers.core import Dropout, Reshape, Dense, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta, SGD, RMSprop
import keras.losses
from keras.constraints import maxnorm
from keras.layers.normalization import BatchNormalization
from keras.regularizers import l1, l2
from keras.callbacks import EarlyStopping, History
from keras import backend as K 
K.set_image_data_format('channels_last')


# Single layer, multi-filter model <a name='6'>

We define a simple DragoNN model with one convolutional layer with 15 convolutional filters, followed by maxpooling of width 35. 

The model parameters are: 

* Input sequence length: 1500
* 15 filters: these are neurons that act as local pattern detectors on the input profile.
* Convolutional filter width = 10: this parameter defines the width of the filter weights; the model scans the entire input profile for the particular pattern encoded by the weights of each filter.
* Max pool of width 35: computes the maximum value per channel in non-overlapping windows of size 35 (the stride is also 35). We add the pooling layer because DNA sequences are typically sparse in terms of the number of positions that harbor TF motifs. The pooling layer reduces the size of the convolutional layer's output profile by summarizing each window with its maximum.

![simArch1Layer](https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/SimArch1Layer.png?raw=1)
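A minimal numpy sketch of what these layers do to the length of the profile, assuming Keras' default `'valid'` padding (no padding, incomplete tail windows dropped). Random numbers stand in for the 15 filter activations.

```python
import numpy as np

def conv_output_length(length, filter_width):
    """Length of a 'valid' (no padding) convolution with stride 1."""
    return length - filter_width + 1

def max_pool_1d(profile, pool_size, stride):
    """Per-channel max over windows of pool_size; profile has shape (length, channels)."""
    n_windows = (profile.shape[0] - pool_size) // stride + 1
    return np.stack([profile[i * stride:i * stride + pool_size].max(axis=0)
                     for i in range(n_windows)])

conv_len = conv_output_length(1500, 10)                          # 1491 positions
profile = np.random.default_rng(0).normal(size=(conv_len, 15))   # stand-in activations
pooled = max_pool_1d(profile, pool_size=35, stride=35)
print(conv_len, pooled.shape)  # 1491 (42, 15)
```

So the 1491-long convolutional profile is summarized by 42 values per filter, which is what the `Flatten` and `Dense` layers then see.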


In [None]:
#Define the model architecture in keras
multi_filter_keras_model=Sequential() 
multi_filter_keras_model.add(Conv2D(filters=15,kernel_size=(1,10),input_shape=simulation_data.X_train.shape[1::]))
multi_filter_keras_model.add(BatchNormalization(axis=-1))
multi_filter_keras_model.add(Activation('relu'))
multi_filter_keras_model.add(MaxPooling2D(pool_size=(1,35), strides=35))
multi_filter_keras_model.add(Flatten())
multi_filter_keras_model.add(Dense(1))
multi_filter_keras_model.add(Activation("sigmoid"))

##compile the model, specifying the Adam optimizer, and binary cross-entropy loss. 
multi_filter_keras_model.compile(optimizer='adam',
                               loss='binary_crossentropy')

In [None]:
multi_filter_keras_model.summary()
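
"Non-trainable params" refers to the Batch Normalization layer's moving mean and moving variance. These are updated during training as running averages of the batch statistics rather than by gradient descent, and are used in place of the batch statistics at prediction time.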

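The counts reported by `summary()` can be reproduced by hand. A sketch of the arithmetic, assuming Keras defaults (biases enabled, `BatchNormalization` with both `center` and `scale`, `'valid'` padding); `count_single_layer_params` is a hypothetical helper:

```python
def count_single_layer_params(seq_len=1500, n_filters=15, width=10, pool=35, n_channels=4):
    """Return (total, trainable, non-trainable) parameter counts for the model above."""
    conv = width * n_channels * n_filters + n_filters     # kernel (1, width, 4, filters) + biases
    bn_trainable = 2 * n_filters                           # gamma and beta
    bn_non_trainable = 2 * n_filters                       # moving mean and moving variance
    pooled_len = (seq_len - width + 1 - pool) // pool + 1  # 1491 conv outputs -> 42 pooled
    dense = pooled_len * n_filters + 1                     # one weight per flattened unit + bias
    trainable = conv + bn_trainable + dense
    return trainable + bn_non_trainable, trainable, bn_non_trainable

print(count_single_layer_params())  # (1306, 1276, 30)
```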

We train the model for up to 150 epochs with an early stopping criterion: if the loss on the validation set does not improve for 3 consecutive epochs, training is halted. In each epoch, the model makes a complete pass over the training data and updates its parameters to minimize the loss, which quantifies the error in the model's predictions. After each epoch, the performance metrics for the model on the validation data are computed and printed.

The performance metrics include balanced accuracy, area under the receiver operating characteristic curve ([auROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)), area under the precision-recall curve ([auPRC](https://en.wikipedia.org/wiki/Precision_and_recall)), and recall at multiple false discovery rates (Recall at [FDR](https://en.wikipedia.org/wiki/False_discovery_rate)).
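To make these definitions concrete, here is a minimal numpy sketch of each metric on a toy example. It follows the textbook definitions; DragoNN's `ClassificationResult` may differ in details such as tie handling or how the precision-recall curve is integrated.

```python
import numpy as np

def auroc(y_true, y_score):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    diff = y_score[y_true == 1][:, None] - y_score[y_true == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision)."""
    hits = y_true[np.argsort(-y_score)] == 1
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

def recall_at_fdr(y_true, y_score, fdr):
    """Largest recall over score cutoffs whose FDR = FP / (TP + FP) is at most fdr."""
    hits = y_true[np.argsort(-y_score)] == 1
    tp, fp = np.cumsum(hits), np.cumsum(~hits)
    ok = fp / (tp + fp) <= fdr
    return tp[ok].max() / hits.sum() if ok.any() else 0.0

def balanced_accuracy(y_true, y_score, threshold=0.5):
    """Mean of the true positive rate and the true negative rate at a fixed threshold."""
    pred = y_score >= threshold
    return (np.mean(pred[y_true == 1]) + np.mean(~pred[y_true == 0])) / 2

y = np.array([1, 1, 0, 1, 0, 0])
s = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.2])
print(round(auroc(y, s), 3))               # 0.889
print(round(average_precision(y, s), 3))   # 0.917
print(round(recall_at_fdr(y, s, 0.1), 3))  # 0.667
print(round(balanced_accuracy(y, s), 3))   # 0.833
```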

In [None]:
from dragonn.callbacks import * 
#We define a custom callback to print training and validation metrics while training. 
metrics_callback=MetricsCallback(train_data=(simulation_data.X_train,simulation_data.y_train),
                                 validation_data=(simulation_data.X_valid,simulation_data.y_valid))


We now proceed to train the model. We do this with the keras "fit" function. The "fit" function has a few key parameters: 

* **batch_size** -- the number of training and validation samples to be propagated through the network simultaneously. 
* **epochs** -- the maximum number of complete passes over the training data. Within an epoch, the weights are updated once per mini-batch, so with `batch_size=128` and 4000 training sequences each epoch consists of 32 weight updates (31 full batches plus one batch of the remaining 32 sequences).
* **callbacks** -- Keras callbacks return information from a training algorithm while training is taking place. A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training.
* **EarlyStopping** -- a Keras callback that is checked at the end of each epoch. If the monitored quantity (by default the validation loss) has not improved for n consecutive epochs, where n is referred to as the patience, training is interrupted. With `restore_best_weights=True`, the weights from the best epoch are restored when training stops early.
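The patience logic can be replayed on a list of validation losses. This is a simplified sketch of the rule described above, not Keras' source: it ignores `min_delta` and other `EarlyStopping` options, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (index of the best epoch, number of epochs actually run)."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:                           # patience exhausted: stop here
                return best_epoch, epoch + 1
    return best_epoch, len(val_losses)

losses = [0.69, 0.55, 0.48, 0.47, 0.49, 0.48, 0.50, 0.40]  # made-up validation losses
print(early_stopping_epoch(losses))  # (3, 7)
```

Training stops after the seventh epoch even though the eighth would have improved; with `restore_best_weights=True`, the weights from epoch index 3 would be restored.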


In [None]:
## use the keras fit function to train the model for up to 150 epochs, stopping early if the validation loss does not improve for 3 consecutive epochs
history_multi_filter=multi_filter_keras_model.fit(x=simulation_data.X_train,
                                  y=simulation_data.y_train,
                                  batch_size=128,
                                  epochs=150,
                                  verbose=1,
                                  callbacks=[EarlyStopping(patience=3,restore_best_weights=True),
                                            metrics_callback],
                                  validation_data=(simulation_data.X_valid,
                                                   simulation_data.y_valid))


### Evaluate the model on the held-out test set 

In [None]:
## Use the keras predict function to get model predictions on held-out test set. 
test_predictions=multi_filter_keras_model.predict(simulation_data.X_test)
## Generate a ClassificationResult object to print performance metrics on held-out test set 
print(ClassificationResult(simulation_data.y_test,test_predictions))

### Visualize the model's performance

In [None]:
# import functions for visualization of data
%matplotlib inline
from dragonn.vis import *

plot_learning_curve(history_multi_filter)

We can see that the training and validation loss decrease, but the validation loss is somewhat higher than the training loss. This is indicative of over-fitting to the training data. 

## Visualize the learned parameters 

Next, let's visualize the filters learned by this model.

### Dense layer

In [None]:
plot_model_weights(multi_filter_keras_model)

### Convolutional layer 

In [None]:
W_conv, b_conv = multi_filter_keras_model.layers[0].get_weights()

In [None]:
W_conv.shape

In [None]:
b_conv.shape

In [None]:
plot_filters(multi_filter_keras_model, simulation_data)

# Model Interpretation <a name='7'>
    
As you can see, the filters and other model parameters are difficult to interpret directly. However, there are alternative approaches for interpreting what the model has learned about individual sequences.
    
Let's examine a positive and negative example from our simulation data:

In [None]:
#get the indices of the first positive and negative examples in the validation data split
pos_indx=np.flatnonzero(simulation_data.y_valid==1)[0]
print(pos_indx)
pos_X=simulation_data.X_valid[pos_indx:pos_indx+1]

neg_indx=np.flatnonzero(simulation_data.y_valid==0)[0]
print(neg_indx)
neg_X=simulation_data.X_valid[neg_indx:neg_indx+1]

### Motif Scores

In [None]:
from dragonn.utils import * 
pos_motif_scores=get_motif_scores(pos_X,simulation_data.motif_names,return_positions=True)
neg_motif_scores=get_motif_scores(neg_X,simulation_data.motif_names,return_positions=True)

In [None]:
from dragonn.vis import * 
plot_motif_scores(pos_motif_scores,title="Positive example",ylim=(0,20))
plot_motif_scores(neg_motif_scores,title="Negative example",ylim=(0,20))

In the positive example, the motif scan yields a cluster of high-scoring motif alignments in the central 150 bp of the sequence. In the negative example, the high-scoring motif alignments are placed randomly along the sequence.

Note: if your negative example resembles a positive one (i.e. the randomly placed motifs happen to fall in the central region of the sequence), feel free to provide another index value to select a different negative example.

For example, you can change the code to select a negative example to the below: 


In [None]:
neg_indx=np.flatnonzero(simulation_data.y_valid==0)[10]
print(neg_indx)
neg_X=simulation_data.X_valid[neg_indx:neg_indx+1]

### *In silico* mutagenesis 

To determine how much each position in the input sequence contributes to the model's prediction, we can perform saturation mutagenesis on the sequence: for each position in the input sequence, we introduce each of the four possible bases A, C, G, T and quantify the effect on the model's prediction.

*In silico* mutagenesis (ISM) thus measures the effect of each base pair on the model's prediction. The following algorithm is used:

1. At each position in the input sequence, the reference allele is mutated to each of three possible alternate alleles, and the model predictions with the alternate alleles are obtained. 

2. The logit values for the reference allele are subtracted from the logit values for each of the 4 alleles. (This means that a difference of 0 will be obtained for the reference allele). We refer to these differences in logit at each position between the reference and alternate alleles as the ISM values. ISM values are computed in logit space to avoid any saturation effects from passing the logits through a sigmoid function. 

3. For each position, subtract the mean ISM value for that position from each of the 4 ISM values. 

4. Plot the 4xL heatmap of mean-normalized ISM values 

5. Plot the reference sequence bases weighted by the highest magnitude ISM score. 
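Steps 1-3 can be sketched in numpy for any function that maps a one-hot sequence to a logit. The toy linear "model" below (random weights `W`) is hypothetical and only there to make the sketch runnable; DragoNN's `in_silico_mutagenesis` works on the Keras model instead.

```python
import numpy as np

def ism_sketch(predict_logit, X):
    """Mean-normalized ISM scores for one one-hot sequence X of shape (L, 4);
    predict_logit maps an (L, 4) array to a scalar logit."""
    ref_logit = predict_logit(X)
    ism = np.zeros(X.shape)
    for i in range(X.shape[0]):
        for b in range(4):
            mutant = X.copy()
            mutant[i] = 0.0
            mutant[i, b] = 1.0                           # the reference base reproduces X
            ism[i, b] = predict_logit(mutant) - ref_logit  # step 2: difference in logit space
    return ism - ism.mean(axis=1, keepdims=True)          # step 3: subtract per-position mean

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                    # hypothetical weights of a toy linear model
toy_logit = lambda x: float(np.sum(W * x))
X = np.eye(4)[rng.integers(0, 4, size=8)]      # random one-hot sequence, shape (8, 4)
ism = ism_sketch(toy_logit, X)
print(np.allclose(ism, W - W.mean(axis=1, keepdims=True)))  # True
```

For a linear model the mean-normalized ISM scores reduce to the weights minus their per-position mean, which is a handy sanity check.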

In [None]:
pos_X.shape

In [None]:
from dragonn.interpret.ism import *
ism_pos=in_silico_mutagenesis(multi_filter_keras_model,pos_X,0)
ism_neg=in_silico_mutagenesis(multi_filter_keras_model,neg_X,0)

In [None]:
%matplotlib inline
from dragonn.vis import * 

# create discrete colormap of ISM scores 
#zoom into the central 150 bases of the sequence 
plot_ism(ism_pos,pos_X,title="Positive Example",xlim=(675,825))

In [None]:

plot_ism(ism_neg,neg_X,title="Negative Example",xlim=(675,825))

We see clear TAL1 motif patterns emerging for the positive example in the central 150 bases of the input sequence; we do not see clear TAL1 patterns in the central 150 bases of the sequence for the negative example. 

### Gradient x Input 

Consider a neural network as a function: $f(x_1, ..., x_N; w) = y$

One way to tell whether the input feature is important is to compute the gradient of the function with respect to (w.r.t.) model input: $\frac{\partial f}{\partial x_i}$

This approach is known as a saliency map: "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", by Karen Simonyan, Andrea Vedaldi and Andrew Zisserman https://arxiv.org/pdf/1312.6034.pdf

In genomics, we typically visualize only gradients for bases observed in the sequence (called input masked gradients or input*grad).
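A minimal sketch of gradient x input for an arbitrary scalar function. It estimates the gradient with central finite differences so that it needs only numpy; deep learning frameworks compute the same gradient exactly by backpropagation. The toy model is hypothetical.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-5):
    """Central finite-difference estimate of df/dX for a scalar function f."""
    grad = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X, dtype=float)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

def gradient_x_input(f, X):
    """Input-masked gradient: keeps only the gradient entries of bases present in X."""
    return numerical_gradient(f, X) * X

rng = np.random.default_rng(0)
w = rng.normal(size=4)                          # hypothetical per-base weights
toy_model = lambda x: float(np.sum(np.tanh(x @ w)))
X = np.eye(4)[rng.integers(0, 4, size=6)]       # random one-hot sequence, shape (6, 4)
scores = gradient_x_input(toy_model, X)         # nonzero only at observed bases
```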

In [None]:
from dragonn.interpret.input_grad import * 
gradinput_pos=input_grad(multi_filter_keras_model,pos_X)
gradinput_neg=input_grad(multi_filter_keras_model,neg_X)

In [None]:
from dragonn.vis import plot_seq_importance
plot_seq_importance(gradinput_pos,pos_X,title="Positive GradXInput",xlim=(675,825))
plot_seq_importance(gradinput_neg,neg_X,title="Negative GradXInput",xlim=(675,825))

This confirms what we observed with the ISM analysis -- the positive example contains TAL1 motifs in the central 150 base pairs; the negative example does not. 


### DeepLIFT

DeepLIFT is another approach to infer the contribution, or importance, of individual nucleotides in a specific input sequence to its predicted output. While gradients measure the sensitivity of the output to infinitesimal changes in the input, DeepLIFT scores quantify the sensitivity of the output to finite changes in the input. Specifically, the DeepLIFT algorithm backpropagates a score (analogous to gradients) which is based on comparing the activations of all the neurons in the network for the actual input sequence to those obtained when using neutral ‘reference’ sequences. We use dinucleotide-shuffled versions of any input sequence as reference sequences.

[DeepLIFT](https://arxiv.org/pdf/1605.01713v2.pdf) assigns each position in a specific sequence a score indicating its importance. DeepLIFT can accept a custom reference; for our purposes, we provide a dinucleotide-shuffled reference.
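For a purely linear model, DeepLIFT contributions relative to a reference `r` reduce to `W * (x - r)`, and they sum exactly to the change in output, `f(x) - f(r)` ("summation-to-delta"). The sketch below illustrates this and averages over 10 shuffled references. Two simplifications are assumptions of this sketch: the model is a hypothetical linear one, and the references are plain position shuffles, which preserve only mononucleotide composition, not the dinucleotide-preserving shuffles DragoNN uses.

```python
import numpy as np

def linear_deeplift(W, X, references):
    """DeepLIFT scores for f(x) = sum(W * x) + b, averaged over the given references."""
    return np.stack([W * (X - r) for r in references]).mean(axis=0)

def shuffled_references(X, n_refs, rng):
    """Simplified references: random permutations of the positions of X."""
    return [X[rng.permutation(X.shape[0])] for _ in range(n_refs)]

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 4))                  # hypothetical weights of a toy linear model
f = lambda x: float(np.sum(W * x) + 0.5)
X = np.eye(4)[rng.integers(0, 4, size=20)]    # random one-hot sequence
refs = shuffled_references(X, n_refs=10, rng=rng)
scores = linear_deeplift(W, X, refs)
print(np.isclose(scores.sum(), f(X) - np.mean([f(r) for r in refs])))  # True
```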

Let's look at the documentation for DragoNN's `deeplift` function:

In [None]:
from dragonn.interpret.deeplift import * 
#note that the defaults for the deeplift function use 10 shuffled references per input sequence 
help(deeplift)

### Saving a keras model 

We save the single-layer, multi-filter keras model to an HDF5 file that contains both the model weights and architecture.

In [None]:
multi_filter_keras_model.save("multi_filter_keras_model.hdf5")

We can now load the saved model for use in other applications or for further fine-tuning:

In [None]:
from keras.models import load_model
model=load_model("multi_filter_keras_model.hdf5")

We first use the saved model to obtain the DeepLIFT scoring function. We use a shuffled reference, with 10 shuffled reference sequences for each example. 

In [None]:
import dragonn
from dragonn.interpret import * 
help(get_deeplift_scoring_function)

In [None]:
#target_layer_idx refers to the second-to-last model layer, which is the input to the sigmoid function
#task_idx indicates that the first task in a multi-tasked model should be interpreted with DeepLIFT. Because 
# this is a single-tasked model, the task index will be 0 in all cases. 
dl_score_func=get_deeplift_scoring_function('multi_filter_keras_model.hdf5',
                                           target_layer_idx=-2,
                                           task_idx=0,
                                           num_refs_per_seq=10,
                                           reference='shuffled_ref',
                                           one_hot_func=None)

In [None]:
#We use the scoring function to calculate deepLIFT scores for the positive and negative examples 
dl_pos=deeplift(dl_score_func,pos_X)
dl_neg=deeplift(dl_score_func,neg_X)

In [None]:
plot_seq_importance(dl_pos,pos_X,title="DeepLift positives",xlim=(675,825))
plot_seq_importance(dl_neg,neg_X,title="DeepLift negatives",xlim=(675,825)) 

# A multi-layer DragoNN model <a name='8'>

Next, we train a three-layer model for this task. By adding layers, we allow the model to learn more complex features in the sequence data, such as the spatial constraint on where the TAL1 motifs are located.
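One way to see why stacking layers helps is the receptive field: the number of input bases a single unit can "see". A short sketch of the arithmetic for stride-1 convolutions, optionally followed by max pooling (`receptive_field` is a hypothetical helper):

```python
def receptive_field(filter_widths, pool_size=1):
    """Input bases seen by one unit after stacked stride-1 convolutions and a max pool."""
    rf = 1
    for width in filter_widths:
        rf += width - 1          # each stride-1 convolution widens the view by width - 1
    return rf + pool_size - 1    # pooling over pool_size adjacent outputs adds pool_size - 1

print(receptive_field([10]))                        # 10
print(receptive_field([10], pool_size=35))          # 44
print(receptive_field([10, 10, 10]))                # 28
print(receptive_field([10, 10, 10], pool_size=35))  # 62
```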

However, adding layers also makes it more likely that the model will overfit to the training data. We already saw that the validation loss was higher than the training loss for the single-layer, multi-filter model, and we anticipate that this will be more pronounced with a multi-layer model, so we add some regularization: dropout with a rate of 0.2 after every convolutional layer.
![MultiLayerTraining](https://github.com/kundajelab/dragonn/blob/master/tutorials/tutorial_images/MultiLayerTraining.png?raw=1)
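A minimal numpy sketch of dropout in its "inverted" form, which is how Keras' `Dropout` layer behaves as far as we know: during training each activation is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, so the expected activation is unchanged; at prediction time the layer passes its input through untouched.

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Randomly zero activations during training and rescale the survivors."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate    # keep each unit with probability 1 - rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((1000, 15))                 # stand-in for ReLU outputs of 15 filters
out = inverted_dropout(a, rate=0.2, rng=rng)
print(round(float(out.mean()), 2))      # close to 1.0 on average
```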

In [None]:
#Define the model architecture in keras

regularized_keras_model=Sequential() 
regularized_keras_model.add(Conv2D(filters=15,kernel_size=(1,10),input_shape=simulation_data.X_train.shape[1::]))
regularized_keras_model.add(Activation('relu'))
regularized_keras_model.add(Dropout(0.2))

regularized_keras_model.add(Conv2D(filters=15,kernel_size=(1,10)))
regularized_keras_model.add(Activation('relu'))
regularized_keras_model.add(Dropout(0.2))

regularized_keras_model.add(Conv2D(filters=15,kernel_size=(1,10)))
regularized_keras_model.add(Activation('relu'))
regularized_keras_model.add(Dropout(0.2))
regularized_keras_model.add(MaxPooling2D(pool_size=(1,35)))


regularized_keras_model.add(Flatten())
regularized_keras_model.add(Dense(1))
regularized_keras_model.add(Activation("sigmoid"))

##compile the model, specifying the Adam optimizer, and binary cross-entropy loss. 
regularized_keras_model.compile(optimizer='adam',
                               loss='binary_crossentropy')

regularized_keras_model.summary() 

In [None]:
## use the keras fit function to train the model for up to 150 epochs, stopping early if the validation loss does not improve for 3 consecutive epochs
history_regularized=regularized_keras_model.fit(x=simulation_data.X_train,
                                  y=simulation_data.y_train,
                                  batch_size=128,
                                  epochs=150,
                                  verbose=1,
                                  callbacks=[EarlyStopping(patience=3,restore_best_weights=True),
                                            History(),
                                            metrics_callback],
                                  validation_data=(simulation_data.X_valid,
                                                   simulation_data.y_valid))


In [None]:
## Use the keras predict function to get model predictions on held-out test set. 
test_predictions=regularized_keras_model.predict(simulation_data.X_test)
## Generate a ClassificationResult object to print performance metrics on held-out test set 
print(ClassificationResult(simulation_data.y_test,test_predictions))

In [None]:
## Visualize the model's performance 
from dragonn.vis import * 
plot_learning_curve(history_regularized)

In [None]:
regularized_keras_model.save("TAL1.Simulation.Regularized.3ConvLayers.hdf5")

DragoNN provides a single utility function to interpret and plot model predictions with all the methods we have examined:

* Plots the motif scores (when available) to serve as a "gold standard" for interpretation
* Plots the ISM heatmap and sequence importance track
* Plots the gradient x input importance track
* Plots the DeepLIFT importance track

In [None]:
from dragonn.interpret import *
help(multi_method_interpret)

In [None]:
#obtain the deepLIFT scoring function for interpretation 
dl_score_func_multi_layer_regularized=get_deeplift_scoring_function("TAL1.Simulation.Regularized.3ConvLayers.hdf5")

In [None]:
from dragonn.vis import * 
pos_interpretations=multi_method_interpret(regularized_keras_model,
                                           pos_X,
                                           0,
                                           dl_score_func_multi_layer_regularized,
                                           motif_names=simulation_data.motif_names)

In [None]:
neg_interpretations=multi_method_interpret(regularized_keras_model,
                                           neg_X,
                                           0,
                                           dl_score_func_multi_layer_regularized,
                                           motif_names=simulation_data.motif_names)

We now plot the interpretation scores for pos_X and neg_X in the central 150 bp of the sequence.

In [None]:
plot_all_interpretations([pos_interpretations],pos_X,xlim=(675,825))

In [None]:
plot_all_interpretations([neg_interpretations],neg_X,xlim=(675,825))

As expected, additional layers in combination with regularization via dropout should improve test-set auPRC and reduce overfitting to the training set; compare the test-set metrics and learning curve above with those of the single-layer model.