# Constructing a 2D ensemble of HIV-1 TAR

* **The Goal**:
    * Complete this notebook to generate a 2D ensemble of HIV-1 TAR. 
    * **The Input files are**:
        * Experimental chemical shifts: ```data/chemical_shifts/measured_shifts_7JU1.dat```
        * 200 2D structures of the HIV-1 TAR (see: ```RNA Folding HIV-1 TAR```): ```data/states_HIV_TAR.sav```    
* **Tasks**:
    1. Train a neural network model to predict chemical shifts from 2D structures
    2. Determine the uncertainty of the model by applying it to the testing data
    3. Load the saved states of the HIV-1 TAR
    4. Predict and store chemical shifts for each state
    5. Format and write the files needed for BME
    6. Use BME to determine weights
    7. Determine the conformer with the highest weight.

* **Question**:
    * What is the dominant state? How does it compare to the native 2D structure of TAR? If different, explain why.

* **How to submit**:
    * **Please rename this file** as ``Final_Assignment_2D_Ensemble_firstname_lastname.ipynb`` before you submit it to Canvas.

In [4]:
%%capture
from machine_learning import *
from PyRNA import *
import pandas as pd
import plotly.express as px
import numpy as np
from sklearn.metrics import mean_absolute_error
import os

## 1. Train a neural network model to predict chemical shifts from 2D structures

Use the function ``talos_model()`` to train your model. In addition to the training and testing (validation) data, you will need to pass a set of hyperparameters to this function.

``best_talos_parameters()`` returns a set of optimized parameters you could use to train your model. 

In [5]:
# Inspect the database: it is a dictionary containing all the data you need to train your model
database = load_entire_database()

In [None]:
history, model = #add code here

## 2. Determine the uncertainty of the model by applying it to the testing data

Notebook 5 provides some hints on how to do this. Ultimately, for BME, you'll need to write out a chemical shift uncertainty file with the following format:
```
C1' 0.621
C2' 0.704
...
H6 5.346
H8 3.560
```
where the columns are:
1. the nucleus type or atom name
2. the value of the uncertainty
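
As a rough illustration of the file format above, the sketch below computes one uncertainty per nucleus as the mean absolute error between measured and predicted shifts. This choice of error measure is an assumption, suggested only by the ``mean_absolute_error`` import at the top of this notebook. The records, the variable names, and the output filename are all hypothetical placeholders, not data from the trained model.

```python
import os
import tempfile
from collections import defaultdict

# Toy (nucleus, measured, predicted) triples; placeholders, not real data.
records = [
    ("C1'", 90.0, 90.5),
    ("C1'", 92.0, 91.0),
    ("H8", 7.5, 7.8),
    ("H8", 8.0, 7.9),
]

# Collect the absolute errors for each nucleus type.
abs_errors = defaultdict(list)
for nucleus, measured, predicted in records:
    abs_errors[nucleus].append(abs(measured - predicted))

# Mean absolute error per nucleus, used here as the uncertainty (assumption).
uncertainty = {n: sum(errs) / len(errs) for n, errs in abs_errors.items()}

# One "nucleus value" pair per line, matching the two-column format above.
lines = [f"{nucleus} {value:.3f}" for nucleus, value in sorted(uncertainty.items())]

# Written to a temporary directory so the sketch leaves the data folder untouched.
out_path = os.path.join(tempfile.gettempdir(), "uncertainty_example.dat")
with open(out_path, "w") as handle:
    handle.write("\n".join(lines) + "\n")
```

In the assignment itself, the triples would come from applying the trained model to the testing data rather than from a hand-written list.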

## 3. Load the saved states of the HIV-1 TAR

In [None]:
import joblib
filename = 
loaded_model = joblib.load(filename)

## 4. Predict and store chemical shifts for each state of the HIV-1 TAR

Hint: see ``state2CT()`` and ``cT2features()``.

Ultimately, you'll need to write out a file with the following format:
```
1 1 GUA C5' 66.5033 7JU1
1 1 GUA C4' 81.6784 7JU1
1 1 GUA C3' 74.0789 7JU1
...
200 29 CYT H1' 5.67906 7JU1
200 29 CYT H5 5.45511 7JU1
200 29 CYT H6 7.75069 7JU1
```
where the columns are:
1. the state or model number
2. the residue number
3. the residue name
4. the nucleus type or atom name
5. the chemical shift value
6. an arbitrary label
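
A minimal sketch of how rows in this layout could be assembled is shown below. The ``predictions`` dictionary and its values are hypothetical stand-ins for whatever the trained model returns for each state; only the order of the fields is taken from the format above.

```python
# Toy predictions: state -> list of (residue number, residue name, nucleus, shift).
# Values are placeholders, not model output.
predictions = {
    1: [(1, "GUA", "C5'", 66.5033), (1, "GUA", "C4'", 81.6784)],
    2: [(1, "GUA", "C5'", 66.1000), (1, "GUA", "C4'", 81.9000)],
}
label = "7JU1"  # arbitrary label used in the last field of the format above

# Build one space-separated line per predicted shift, states in ascending order.
rows = []
for state in sorted(predictions):
    for resid, resname, nucleus, shift in predictions[state]:
        rows.append(f"{state} {resid} {resname} {nucleus} {shift:g} {label}")

# The joined text can then be written to disk with an ordinary file handle.
file_text = "\n".join(rows) + "\n"
```

The ``:g`` format keeps up to six significant digits, which is in line with values such as ``5.67906`` in the example file.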

In [None]:
# add code here

## 5. Format and write the files needed for BME

In [None]:
# add code here

## 6. Use BME to determine weights
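
The BME call itself is not sketched here, because its interface is not described in this notebook. For illustration only, the code below shows the kind of quantity a reweighting compares against experiment: the weighted-average predicted shifts and their reduced chi-square relative to the measured shifts. All numbers, and the helper name ``chi_square``, are hypothetical.

```python
# Toy predicted shifts: one row per state, one column per measured nucleus.
predicted = [
    [7.0, 150.0],  # state 1
    [8.0, 152.0],  # state 2
]
measured = [7.5, 151.0]  # placeholder "experimental" shifts
sigma = [0.5, 1.0]       # placeholder per-nucleus uncertainties


def chi_square(weights):
    """Reduced chi-square between weighted-average and measured shifts."""
    total = 0.0
    for j, (obs, err) in enumerate(zip(measured, sigma)):
        # Ensemble average of observable j under the given state weights.
        average = sum(w * state[j] for w, state in zip(weights, predicted))
        total += ((average - obs) / err) ** 2
    return total / len(measured)


uniform = [0.5, 0.5]  # both states equally populated
skewed = [1.0, 0.0]   # only state 1 populated

chi2_uniform = chi_square(uniform)
chi2_skewed = chi_square(skewed)
```

In this toy case the uniform weights reproduce the measured values exactly, while putting all the weight on state 1 does not; a reweighting method looks for weights that improve this agreement without straying too far from the initial ones.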

In [None]:
# add code here

## 7. Plot or list the weights of each conformer and determine the conformer with the highest weight.
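
Once a list of weights is available, picking the dominant conformer is a matter of finding the largest entry. The sketch below assumes the weights are stored in state order and that states are numbered from 1, as in the predicted-shift file; the weights themselves are made-up placeholders.

```python
# Toy weights, one per state; placeholders rather than BME output.
weights = [0.05, 0.60, 0.25, 0.10]

# Index of the largest weight; states are numbered from 1, so shift by one.
best_index = max(range(len(weights)), key=weights.__getitem__)
dominant_state = best_index + 1

# Rank all states by weight, highest first, for a quick listing.
ranking = sorted(enumerate(weights, start=1), key=lambda pair: pair[1], reverse=True)
for state, weight in ranking:
    print(f"state {state}: weight {weight:.2f}")
```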

In [None]:
# add code here

## **Question**:
What is the dominant state? How does it compare to the native 2D structure of TAR? If different, explain why.

Add your answer here