In [None]:
%matplotlib inline

## Recursive Network Exercise

In this exercise, you will train a recursive neural network that estimates the _free energy_ of an _RNA secondary structure_. In biology, RNA sequences fold into so-called secondary structures, and structures with low free energy are assumed to be preferred. Free energy is minimized when bases are joined in stable pairs.

For this task, though, you do not need to know anything about the actual biological specifics. You can just train a recursive neural net which infers the correct energy (a simple scalar) from a given tree.

### Report

For the report, please describe the architecture that you used to solve the task and generate the following plot. After training the network, generate 100 further trees and record, for each tree, its size (using the `recursive_oracle.tree_size` function) and the error `abs(y - y_predicted)`. Then plot the error against the tree size in a scatter plot.
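The evaluation loop for this plot could be sketched as follows. This is only a sketch under assumptions: `collect_errors` and `plot_errors` are hypothetical helper names, and `predict` stands for whatever function your trained model exposes to map a tree to a predicted energy.

```python
def collect_errors(generate, predict, size_fn, n_trees=100):
    """Return (sizes, errors) for n_trees freshly generated trees.

    generate: returns a (tree, energy) pair, e.g. generate_rna_tree
    predict:  maps a tree to a predicted energy (your trained model)
    size_fn:  maps a tree to its size, e.g. recursive_oracle.tree_size
    """
    sizes, errors = [], []
    for _ in range(n_trees):
        x, y = generate()
        sizes.append(size_fn(x))
        errors.append(abs(y - predict(x)))
    return sizes, errors


def plot_errors(sizes, errors):
    """Scatter plot of absolute error against tree size."""
    # Imported here so that collect_errors also works without matplotlib.
    import matplotlib.pyplot as plt
    plt.scatter(sizes, errors)
    plt.xlabel('tree size')
    plt.ylabel('absolute error')
    plt.show()
```

In the notebook, this would presumably be called as `plot_errors(*collect_errors(generate_rna_tree, predict, recursive_oracle.tree_size))`, with `predict` replaced by your own model's prediction function.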

<strong>Note:</strong> Please use `exercise_sheet_template.tex` to generate your report. Your report is due on *Friday, March 15th, 10am* as a single-page PDF to [aschulz@techfak.uni-bielefeld.de](mailto:aschulz@techfak.uni-bielefeld.de). Please start your e-mail subject with the words *[Deep Learning]*.

### Advice

Do not try to map trees directly to the energy, because a one-dimensional encoding may carry too little information. Instead, first use a recursive neural network to map each tree into a low-dimensional encoding space, and then apply a second neural network that predicts the free energy from the encoding.
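To make the two-stage idea concrete, here is a minimal forward-pass sketch in plain Python. It rests on several assumptions that are not part of the exercise material: trees are written as `(symbol, children)` tuples (the oracle's actual tree type may differ), the encoding dimension `DIM = 4` is an arbitrary choice, the second network is reduced to a single linear layer, and only a subset of the symbols is included. Training is omitted entirely; in practice you would build the same structure in an automatic differentiation framework.

```python
import math
import random

# Assumption: a tree is a (symbol, children) tuple, e.g. ('a', []).
ARITIES = {'pair': 3, 'hairpin': 2, 'hairpin_end': 1,
           'a': 0, 'c': 0, 'g': 0, 'u': 0}
DIM = 4  # size of the encoding space (hypothetical choice)

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# One DIM x DIM weight matrix per child slot and one bias vector per symbol.
params = {
    sym: {'W': [rand_matrix(DIM, DIM) for _ in range(arity)],
          'b': [rng.gauss(0.0, 0.1) for _ in range(DIM)]}
    for sym, arity in ARITIES.items()
}
# Second stage: here just a linear layer from the encoding to a scalar.
readout = {'w': [rng.gauss(0.0, 0.1) for _ in range(DIM)], 'b': 0.0}


def encode(tree):
    """Recursively map a tree to a DIM-dimensional encoding."""
    sym, children = tree
    p = params[sym]
    pre = list(p['b'])  # leaves are encoded by their bias alone
    for W, child in zip(p['W'], children):
        h = encode(child)
        for i in range(DIM):
            pre[i] += sum(W[i][j] * h[j] for j in range(DIM))
    return [math.tanh(v) for v in pre]


def predict_energy(tree):
    """Predict a scalar energy from the encoding of the root."""
    h = encode(tree)
    return sum(w * v for w, v in zip(readout['w'], h)) + readout['b']


example = ('hairpin', [('g', []), ('hairpin_end', [('a', [])])])
print(predict_energy(example))
```

The point of the design is that the recursion passes a vector, not a scalar, from children to parents, so the root encoding can carry more information than a single energy value before the final mapping reduces it to one number.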

Further, this prediction task is not easy, so do not expect to reach perfect error values. If you manage to stay consistently below an error of 1, this is already a good result.

In [1]:
# For this exercise, we already provide a data generation function (an 'oracle')
# that we can use.
from recursive_oracle import generate_rna_tree

# let's have a look at an example tree and its energy value.
# Executing this cell multiple times will yield different trees.
x, y = generate_rna_tree()
print('the tree %s has energy value %g' % (str(x), y))

the tree pair(c, pair(a, hairpin(g, hairpin(g, hairpin(g, hairpin_end(a)))), u), g) has energy value 2.16355


In [2]:
# The oracle also provides us with the arity alphabet for the RNA trees
from recursive_oracle import rna_arity_alphabet

print(rna_arity_alphabet)

{'dangle': 2, 'dangle_end': 1, 'split': 2, 'pair': 3, 'branch': 2, 'hairpin': 2, 'hairpin_end': 1, 'c': 0, 'g': 0, 'a': 0, 'u': 0}
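The arity alphabet states how many children each symbol must have. As an illustration, the sketch below checks a tree against the alphabet and counts its nodes. It assumes a hypothetical `(symbol, children)` tuple representation rather than the oracle's actual tree type, and `check_arities` and `count_nodes` are made-up helper names; in particular, `count_nodes` need not agree with `recursive_oracle.tree_size`, whose definition is not shown here. The example tree is the one printed by the first cell above, transcribed by hand.

```python
# Copied from the printed output of rna_arity_alphabet.
rna_arity_alphabet = {'dangle': 2, 'dangle_end': 1, 'split': 2, 'pair': 3,
                      'branch': 2, 'hairpin': 2, 'hairpin_end': 1,
                      'c': 0, 'g': 0, 'a': 0, 'u': 0}


def leaf(symbol):
    return (symbol, [])


def check_arities(tree, alphabet):
    """Return True if every node has as many children as its arity demands."""
    symbol, children = tree
    if alphabet[symbol] != len(children):
        raise ValueError('%s expects %d children, got %d'
                         % (symbol, alphabet[symbol], len(children)))
    for child in children:
        check_arities(child, alphabet)
    return True


def count_nodes(tree):
    """Count all nodes of the tree, including leaves."""
    _, children = tree
    return 1 + sum(count_nodes(child) for child in children)


# pair(c, pair(a, hairpin(g, hairpin(g, hairpin(g, hairpin_end(a)))), u), g)
h_end = ('hairpin_end', [leaf('a')])
h3 = ('hairpin', [leaf('g'), h_end])
h2 = ('hairpin', [leaf('g'), h3])
h1 = ('hairpin', [leaf('g'), h2])
inner = ('pair', [leaf('a'), h1, leaf('u')])
example = ('pair', [leaf('c'), inner, leaf('g')])

print(check_arities(example, rna_arity_alphabet))  # True
print(count_nodes(example))  # 14
```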
