# Gen559 dictionaries practice notebook with solutions
### 2020.11.04

The variable 'mw_info' below holds the molecular weights (g/mol) of the RNA bases (A, C, G, U) and DNA bases (dA, dC, dG, dT) when incorporated into ssRNA and ssDNA oligos, respectively. It also lists the weights of the 5' monophosphate (P) and triphosphate (3P) groups.

*Unless otherwise indicated, you can assume there are no 5' monophosphate (P) or triphosphate (3P) groups to consider when calculating the molecular weight of an oligo.* ***To make your calculations fully accurate, subtract 61.96 g/mol from the final tally of the MW of any ssDNA or ssRNA oligo. This accounts for the removal of HPO2 from the first base (-63.98 g/mol) and the addition of 2 H atoms to the last base (+2.02 g/mol).***

[Source](https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html)
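
The correction described above can be written out as a small worked example. This is only a sketch: the names `END_CORRECTION`, `oligo_mw`, and `toy_weights` are hypothetical and do not appear elsewhere in the notebook, and the two-base oligo 'AT' is an illustration using the dA and dT values from the table below.

```python
# End correction from the text: remove HPO2 from the first base (-63.98)
# and add two hydrogens to the last base (+2.02).
HPO2_REMOVED = -63.98
TWO_H_ADDED = 2.02
END_CORRECTION = HPO2_REMOVED + TWO_H_ADDED  # about -61.96 g/mol


def oligo_mw(sequence, weights, five_prime=0.0):
    """Sum the per-base weights, apply the end correction,
    and add the weight of any 5' group (0 if there is none)."""
    return sum(weights[base] for base in sequence) + END_CORRECTION + five_prime


# Hypothetical two-base DNA oligo, keyed by plain letters for brevity.
toy_weights = {'A': 313.2, 'T': 304.2}
print('%0.2f' % oligo_mw('AT', toy_weights))  # prints 555.44
```

The optional `five_prime` argument is where a P (79) or 3P (159) weight would go when a problem states that such a group is present.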

In [1]:
mw_info = 'A\t329.2\nC\t305.2\nG\t345.2\nU\t306.2\ndA\t313.2\ndC\t289.2\ndG\t329.2\ndT\t304.2\nP\t79\n3P\t159'

In [2]:
print (mw_info)

A	329.2
C	305.2
G	345.2
U	306.2
dA	313.2
dC	289.2
dG	329.2
dT	304.2
P	79
3P	159


### Practice problem 1a 

In the cell below, write out in words the major operational steps required to solve *Practice problem 1b*.
>**Practice problem 1b** In the cell below, create and print a dictionary containing the nucleotides and molecular weights for the 'A', 'C', 'G', and 'T' DNA bases.

*  Parse out nucleotide and MW values from the string
*  Convert MW values to floats
*  Build dictionary
* *With only 4 values the dictionary could also be built by hand, but as the data grow more complex it is better to parse the string directly*

### Practice problem 1b  
In the cell below, create and print a dictionary containing the nucleotides and molecular weights for the 'A', 'C', 'G', and 'T' DNA bases. 

In [3]:
# Split mw_info on \n character and assign to new variable.
mw_split = mw_info.split('\n')

# Make list to hold info about DNA bases
DNA_vals = []

# Populate DNA_vals list
for i in range(0, len(mw_split), 1):
    
    # Check if entry is a DNA base
    if mw_split[i][0] == "d":
        
        # If so, add info to list
        DNA_vals.append(mw_split[i])

# Create an empty dictionary to hold DNA MW info.
dna_weights = {}

# Populate dna_weights dictionary. Split each entry on \t to access key and value separately.
for i in range(0, len(DNA_vals), 1):
    dna_weights[DNA_vals[i].split('\t')[0]] = float(DNA_vals[i].split('\t')[1])
                                                  
# Print dictionary.
print (dna_weights)

{'dA': 313.2, 'dC': 289.2, 'dG': 329.2, 'dT': 304.2}
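
The same parse can also be sketched more compactly with a dictionary comprehension. This is an alternative to the solution above, not a replacement for it; the name `dna_weights_alt` is hypothetical, and `mw_info` is repeated so the cell runs on its own.

```python
mw_info = 'A\t329.2\nC\t305.2\nG\t345.2\nU\t306.2\ndA\t313.2\ndC\t289.2\ndG\t329.2\ndT\t304.2\nP\t79\n3P\t159'

# Split each line into (name, weight) and keep only names that start with 'd'.
dna_weights_alt = {
    name: float(weight)
    for name, weight in (line.split('\t') for line in mw_info.split('\n'))
    if name.startswith('d')
}

print(dna_weights_alt)
# prints {'dA': 313.2, 'dC': 289.2, 'dG': 329.2, 'dT': 304.2}
```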


### Practice problem 2a  
In the cell below, write out in words the major operational steps required to solve *Practice problem 2b*.
>**Practice problem 2b** In the cell below, calculate the molecular weight of the DNA oligo whose sequence is stored in the 'my_DNA1' variable.

*  Count number of each base in string and store in base-specific variable
*  Multiply MW of each base by the number of times it appears in the string and sum totals together
*  Subtract 61.96 from total
*  *Could also be done with a **for** loop*

### Practice problem 2b  
In the cell below, calculate the molecular weight of the DNA oligo whose sequence is stored in the 'my_DNA1' variable.

In [27]:
my_DNA1 = 'TTAGGGTTAGGGTTAGGGTTAGGG'

# Count the number of times each base appears in the string. Assign to a base-specific variable.
a_counts = my_DNA1.count('A')
c_counts = my_DNA1.count('C')
g_counts = my_DNA1.count('G')
t_counts = my_DNA1.count('T')

# Calculate the MW, subtracting 61.96 from the final tally.
dna_mw  = a_counts * dna_weights['dA'] + c_counts * dna_weights['dC'] + g_counts * dna_weights['dG'] \
+ t_counts * dna_weights['dT'] - 61.96

# Print result of calculation.
print ('%0.2f' % dna_mw)

7574.84
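
The plan in problem 2a notes that a for loop would also work. A minimal sketch of that variant follows, assuming the dictionary from problem 1b (repeated here so the cell runs on its own); `dna_mw_loop` is a hypothetical name. Because the dictionary keys carry a 'd' prefix, each base letter is prefixed before the lookup.

```python
dna_weights = {'dA': 313.2, 'dC': 289.2, 'dG': 329.2, 'dT': 304.2}
my_DNA1 = 'TTAGGGTTAGGGTTAGGGTTAGGG'

# Walk the sequence once, adding the weight of each base to a running total.
total = 0.0
for base in my_DNA1:
    total += dna_weights['d' + base]

# Apply the end correction described at the top of the notebook.
dna_mw_loop = total - 61.96

print('%0.2f' % dna_mw_loop)  # prints 7574.84
```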


### Practice problem 3  
In the cell below, calculate the molecular weight of the RNA oligo whose sequence is stored in the 'my_RNA1' variable. Your code should utilize a dictionary.

In [28]:
my_RNA1 = 'CAUCGGACAUCACACA'

# Make list to hold info about RNA bases
RNA_vals = []

# Populate RNA_vals list, referencing 'mw_split' variable from solution 1b.
for i in range(0, len(mw_split), 1):
    
    # Check if entry is an RNA base. Done differently from the DNA problem for variety.
    if mw_split[i][0] in ['A', 'C', 'G', 'U']:
        
        # If so, add info to list
        RNA_vals.append(mw_split[i])

# Create an empty dictionary to hold RNA MW info.
rna_weights = {}

# Populate rna_weights dictionary. Split each entry on \t to access key and value separately.
for i in range(0, len(RNA_vals), 1):
    rna_weights[RNA_vals[i].split('\t')[0]] = float(RNA_vals[i].split('\t')[1])

## Calculate molecular weight. Doing a different way than the DNA problem for variety.

# Make list to tally weights
MW_list = []

# Process RNA sequence and tally MW.
for i in range(0, len(my_RNA1), 1):
    base = my_RNA1[i]
    MW_list.append(rna_weights[base])

# Calculate MW by summing list and subtracting 61.96.
final_weight = sum(MW_list) - 61.96

# Print result.
print ('%0.2f' % final_weight)

5047.24
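
A count-based variant of the same calculation is sketched below. It uses `collections.Counter` from the standard library and adds a check for characters missing from the dictionary, which the loop above would report only as a `KeyError`. The names `counts`, `unknown`, and `rna_mw_alt` are hypothetical, and `rna_weights` is written out by hand so the cell runs on its own.

```python
from collections import Counter

rna_weights = {'A': 329.2, 'C': 305.2, 'G': 345.2, 'U': 306.2}
my_RNA1 = 'CAUCGGACAUCACACA'

# Count how many times each base appears in the sequence.
counts = Counter(my_RNA1)

# Stop early with a readable message if the sequence holds anything
# other than the four RNA bases in the dictionary.
unknown = set(counts) - set(rna_weights)
if unknown:
    raise ValueError('Unexpected bases: %s' % sorted(unknown))

# Multiply each count by its weight, sum, and apply the end correction.
rna_mw_alt = sum(n * rna_weights[base] for base, n in counts.items()) - 61.96

print('%0.2f' % rna_mw_alt)  # prints 5047.24
```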


### Practice problem 4 (challenge)  
In the cell below, calculate the molecular weight of the DNA/RNA heteroduplex of the 'my_DNA1' DNA oligo paired with its RNA reverse complement. Assume the DNA oligo has a single 5' phosphate group ("P" in the weights table above) and the RNA oligo has a 5' triphosphate group ("3P" in the table above).

In [33]:
# Define DNA:RNA complementary relationships
comps = {'A': 'U', 'C' : 'G', 'G': 'C', 'T' : 'A'}

# Calculate the MW of the DNA strand in the heteroduplex (with its 5' phosphate), referencing the solution to problem 2b above.
DNA_with_p = dna_mw + 79

## Determine the sequence of the RNA complementary to my_DNA1.
## Base order does not affect the MW, so the complement is not reversed here.

# Make list to hold RNA bases.
rna_base_list = []

# Populate rna_base_list using comps dictionary.
for i in range(0, len(my_DNA1), 1):
    rna_base_list.append(comps[my_DNA1[i]])

# Create string from populated rna_base_list.
comp_rna = ''.join(rna_base_list)

## Calculate molecular weight of comp_rna as in problem 3.

# Make list to tally weights
comp_MW_list = []

# Process RNA sequence and tally MW.
for i in range(0, len(comp_rna), 1):
    comp_MW_list.append(rna_weights[comp_rna[i]])

# Calculate MW by summing comp_MW_list, subtracting 61.96, and adding 159 for the 3P.
comp_final_weight = sum(comp_MW_list) - 61.96 + 159

# Sum the DNA and RNA weights.
heteroduplex_mw = DNA_with_p + comp_final_weight

# Print result.
print ('%0.2f' % heteroduplex_mw)


15271.68
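
The problem asks for the RNA reverse complement, while the solution above builds the complement without reversing it. Because the molecular weight depends only on base composition, both orderings give the same tally. A minimal sketch of the reversal and of that check follows; `rev_comp_rna` and `summed_weight` are hypothetical names, and the dictionaries are repeated so the cell runs on its own.

```python
comps = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}
rna_weights = {'A': 329.2, 'C': 305.2, 'G': 345.2, 'U': 306.2}
my_DNA1 = 'TTAGGGTTAGGGTTAGGGTTAGGG'

# Complement each DNA base, then reverse the string with a slice.
comp_rna = ''.join(comps[base] for base in my_DNA1)
rev_comp_rna = comp_rna[::-1]


def summed_weight(seq):
    """Sum of per-base RNA weights, before any end correction."""
    return sum(rna_weights[base] for base in seq)


print(rev_comp_rna)  # prints CCCUAACCCUAACCCUAACCCUAA

# The two orderings contain the same bases, so their summed weights agree
# (compared with a small tolerance because of floating-point rounding).
print(abs(summed_weight(comp_rna) - summed_weight(rev_comp_rna)) < 1e-6)  # prints True
```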
