# Homework E: Writing your own functions

Make sure you have worked through **workbook_E.ipynb** before tackling this homework.

> ### Reminder: saving your work
>
> As you work through the workbook it is important to save your work regularly. Notice that once you have made changes, the top line of the Jupyter window warns you in small text that there are `(unsaved changes)`. To save your work in this notebook, either select the menu item `File` `Save` or hit the save button:
> 
> ![Jupyter Notebook Save Button](https://aru-bioinf-ibds.github.io./images/save_button.png)
>
> 
> ### Reminder: getting help 
> Please see the page:
> [Help with programming](https://canvas.anglia.ac.uk/courses/12178/pages/help-with-programming)
> on ARU Canvas.

## Function for finding the percentage GC content in a DNA sequence
As we have already seen in workbook A, DNA with a higher proportion of GC base pairs than AT base pairs has a higher melting temperature, and parts of the genome with a higher GC content tend to code for proteins (see https://en.wikipedia.org/wiki/GC-content).

Use the `len()` function and the `.count()` method to find the percentage GC content:

$$
percent_{GC} = 100*\frac{N_G + N_C}{N_{total}}
$$

where $N_G$ is the number of G's in the sequence, $N_C$ is the number of C's in the sequence and $N_{total}$ is the total length of the sequence.
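
As a minimal sketch of the formula above, the counts can be taken with `.count()` and the total with `len()`. The name `gc_percent_by_count` is hypothetical, and the sketch assumes a non-empty, upper-case sequence.

```python
def gc_percent_by_count(seq):
    """Return the percentage of bases in seq that are G or C.

    Hypothetical helper: assumes seq is a non-empty, upper-case DNA string.
    """
    n_g = seq.count("G")   # N_G
    n_c = seq.count("C")   # N_C
    n_total = len(seq)     # N_total
    return 100 * (n_g + n_c) / n_total


print(gc_percent_by_count("GGGAAACCCTTT"))  # 6 of the 12 bases are G or C
```

A lower-case or empty sequence would need extra handling (for example `seq.upper()` and an explicit length check) before this calculation is safe.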

**Now it is your turn!**

Write a function that returns the percentage of G and C bases in a given DNA sequence.

* First think what would be a good name for the function. The Python convention is
  that function names are lower case with words separated by underscores, so let's call
  it `percent_gc`.
* What arguments should `percent_gc` have?
* What will `percent_gc` return?

In [2]:
# Instruction: write a function that returns the percentage of G and C bases in a given sequence.
def percent_gc(inputseq):
    """This function returns a percentage of a sequence that is G and C (hopefully)"""
    gc_count = 0

    # Count every G or C, accepting upper- or lower-case letters
    for letter in inputseq:
        if letter in "GgCc":
            gc_count += 1

    gc_perc = 100 * (gc_count / len(inputseq))
    return gc_perc


Now let's first review the help message for the function:

In [3]:
# Instruction: run this cell to see the help message for the function:
help(percent_gc)

Help on function percent_gc in module __main__:

percent_gc(inputseq)
    This function returns a percentage of a sequence that is G and C (hopefully)



**Question:** is the help message understandable without reading the function?
If not, revise the docstring in your `percent_gc` function above.

Now we need to test that the `percent_gc` function works.

First complete this table.

> You will need to edit this Markdown cell to do this: double-click on this cell 
> and then replace the ? in the table below. When you have finished, run the cell
> with the run button above (and save your work).

| **test sequence** | **Percentage GC content** |
| ----------------- | :-----------------------: |
| G                 |   100%   |
| AC                |   50%    |   
| CAT               |   33.333%|
| C                 |   100%   |
| A                 |   0%     |
| AAA               |   0%     |
| GGG               |   100%   |
| GAC               |   66.667%|
| GGGAAACCCTTT      |   50%    |

A good way to store this data is as a list of tuples, where each tuple is (a test sequence, manually calculated %GC content):

In [4]:
# Idea: complete this test of the percent_gc function with
#       the data from the completed table above.
test_seq_percent_gc = [('G', 100.),
                       ('AC', 50.),
                       ('CAT', 33.33333),
                       ('C', 100),
                       ('A', 0),
                       ('AAA', 0),
                       ('GGG', 100),
                       ('GAC', 66.66667),
                       ('GGGAAACCCTTT', 50)
                      ]
for test_seq, expect_percent in test_seq_percent_gc:
    print('test sequence ' + test_seq)
    print('  expect%:' + str(expect_percent))

    actual_percent = percent_gc(test_seq)

    print('actual%: ' + str(actual_percent))

    # Instruction: now check that expect_percent and actual_percent agree to 3 decimal places.
    #              If they agree print 'PASS' otherwise print 'FAIL'
    #              (see the Hint in the next cell for comparing floating-point numbers).
    if round(actual_percent, 3) == round(expect_percent, 3):
        print("PASS")
    else:
        print("FAIL")

test sequence G
  expect%:100.0
actual%: 100.0
PASS
test sequence AC
  expect%:50.0
actual%: 50.0
PASS
test sequence CAT
  expect%:33.33333
actual%: 33.33333333333333
PASS
test sequence C
  expect%:100
actual%: 100.0
PASS
test sequence A
  expect%:0
actual%: 0.0
PASS
test sequence AAA
  expect%:0
actual%: 0.0
PASS
test sequence GGG
  expect%:100
actual%: 100.0
PASS
test sequence GAC
  expect%:66.66667
actual%: 66.66666666666666
PASS
test sequence GGGAAACCCTTT
  expect%:50
actual%: 50.0
PASS


**Hint** Testing whether floating-point numbers are exactly equal can lead to problems because of the limited precision of numerical calculations. There are many ways around the problem (see https://stackoverflow.com/questions/5595425/what-is-the-best-way-to-compare-floats-for-almost-equality-in-python).

A reasonable approach here is to use the `round` function to compare the numbers to 3 decimal places:

In [5]:
# Instruction: run this cell to see help on the round function
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



In [6]:
# Instruction: run this example to see how round can be used to compare numbers
a_num = 6.0
b_num = 5.999
print('b_num rounded to 2 d.p. ' + str(round(b_num, ndigits=2)))
if round(a_num, ndigits=2) == round(b_num, ndigits=2):
    print('the two numbers agree to 2 decimal places')
else:
    print('the two numbers do not agree to 2 d.p.')

b_num rounded to 2 d.p. 6.0
the two numbers agree to 2 decimal places
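
Rounding both values can occasionally disagree for two numbers that are very close but sit either side of a rounding boundary. One alternative, sketched here as an option rather than the approach this homework asks for, is the standard-library function `math.isclose` with an absolute tolerance. The variable names `x_num` and `y_num` are made up for the illustration.

```python
import math

a_num = 6.0
b_num = 5.999

# abs_tol is the largest absolute difference that still counts as "close"
print(math.isclose(a_num, b_num, abs_tol=0.005))    # True: the difference is about 0.001
print(math.isclose(a_num, b_num, abs_tol=0.0001))   # False: tolerance is tighter than the difference

# Two values only 0.0002 apart that round to different numbers at 2 d.p.
x_num = 1.0049
y_num = 1.0051
print(round(x_num, 2) == round(y_num, 2))           # False: 1.0 versus 1.01
print(math.isclose(x_num, y_num, abs_tol=0.001))    # True
```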


## Predicting the melting temperature of short DNA sequences

The GC content of DNA is an important property. Although DNA forms double-helical structures, it is
often necessary in both biology and biotechnology to separate the two strands. One way of separating the strands is to warm the DNA solution so that the double helix 'melts' into separate strands. 
See the video https://www.youtube.com/watch?v=OblPqlOOgew for further background. If the temperature is reduced, the
DNA will 'anneal' back to the double-stranded form.

DNA melting and annealing are crucial for [polymerase chain reaction (PCR)](https://www.nature.com/scitable/definition/polymerase-chain-reaction-pcr-110).
PCR is a widely used molecular biology technique to reproduce (amplify) selected 
sections of DNA. Knowing the DNA melting temperature for given DNA sequences is
important for PCR.

For short DNA sequences it is possible to predict the melting temperature (the temperature at which 50% of the DNA is double-helical and 50% is in single strands) from the sequence (from http://biotools.nubic.northwestern.edu/OligoCalc.html). For sequences of fewer than 14 bases the formula is:

$$
T_m = 2(N_A + N_T) + 4(N_G + N_C)
$$

where:
* $T_m$ is the 'basic' melting temperature in $^{\circ}C$
* $N_A$ is the number of A's in the sequence
* $N_T$ is the number of T's in the sequence
* $N_G$ is the number of G's in the sequence
* $N_C$ is the number of C's in the sequence

This formula is from http://biotools.nubic.northwestern.edu/OligoCalc.html, which cites Marmur, J., and Doty, P. (1962) *J Mol Biol* **5**:109-118 https://doi.org/10.1016/S0022-2836(62)80066-7

As we would expect, sequences of the same length with a higher GC content have a higher predicted melting temperature.
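
To illustrate that point, the sketch below applies the formula to two sequences of the same length. The helper name `short_tm` and the two example sequences are hypothetical, and the code assumes upper-case sequences of fewer than 14 bases that contain only A, T, G and C.

```python
def short_tm(seq):
    """Basic Tm in degrees C: 2 for each A or T plus 4 for each G or C."""
    n_at = seq.count("A") + seq.count("T")
    n_gc = seq.count("G") + seq.count("C")
    return 2 * n_at + 4 * n_gc


low_gc = "AATTAATT"    # 8 bases, none of them G or C
high_gc = "GGCCAATT"   # 8 bases, half of them G or C
print(short_tm(low_gc), short_tm(high_gc))  # 16 24
```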

Write a function `basic_tm` to calculate the basic melting temperature according to the equation.

In [7]:
# Instruction: write a basic_tm function
def basic_tm(sequence):
    """This function accepts a string that contains a short (<14 bases) DNA sequence (As, Ts, Gs, and Cs only)
    and returns its melting temperature in degrees C. 
    
    The formula used can be found at http://biotools.nubic.northwestern.edu/OligoCalc.html who cite 
    Marmur,J., and Doty,P. (1962) *J Mol Biol* **5**:109-118 https://doi.org/10.1016/S0022-2836(62)80066-7
    """
    tm = 0 
    atcount = 0
    gccount = 0
    if len(sequence) >= 14:
        raise ValueError("The string is too long - the output may not be accurate")

    for letter in sequence:

        if letter in "AaTt":
            atcount += 1

        elif letter in "GgCc":
            gccount += 1

        elif letter in "Uu":
            raise ValueError("Whoops! Looks like that might have uracil in there, or you've just written a word with a U in. Either way, try a different sequence with only A, T, G, and C")
        
        else:
            raise ValueError("Doesn't look like that's a valid string of DNA. Try inputting a sequence that contains only As, Ts, Gs and Cs!")
    tm = (2 * atcount) + (4 * gccount)
    return tm 

Now let's review the help message:

In [8]:
# Instruction: run this cell to see the help message for basic_tm:
help(basic_tm)

Help on function basic_tm in module __main__:

basic_tm(sequence)
    This function accepts a string that contains a short (<14 bases) DNA sequence (As, Ts, Gs, and Cs only)
    and returns its melting temperature in degrees C. 
    
    The formula used can be found at http://biotools.nubic.northwestern.edu/OligoCalc.html who cite 
    Marmur,J., and Doty,P. (1962) *J Mol Biol* **5**:109-118 https://doi.org/10.1016/S0022-2836(62)80066-7



**Questions:** 
* Is the help message understandable? 
* Could you use the function without reading its code?
* Have you credited where the formula came from?
  It is really important as a scientist to cite
  others' work correctly.
  
If not, revise the docstring in `basic_tm` above.

#### Testing the function:

Does your function work as it should? We need some data to test it.
    
Complete the following table by running the basic Tm calculation for 
each sequence at http://biotools.nubic.northwestern.edu/OligoCalc.html 

| **test sequence**   | **Basic Tm (degrees C) from OligoCalc** |
| -----------------   | :-------------------------------------: |
| G                   |  4    |
| GG                  |  8    |   
| A                   |  2    |
| GA                  |  6    |
| TTTTCCCC            |  24   |
| AAGGGCTCTATAA       |  36   |
| AAAAAAAAAA          |  20   |

The formula applies for sequences less than 14 bases. So when choosing sequences
it would be a good idea to include two or three sequences that are 13 bases long
with different (high, low, medium) percentage GC.
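
One way to build such boundary cases is sketched below. The variable names are hypothetical, and the expected values are computed from the formula rather than taken from OligoCalc, so they should still be checked against the web tool.

```python
# Three 13-base sequences with low, medium and high GC content.
boundary_seqs = [
    13 * "A",          # 0 of 13 bases are G or C
    "AAGGGCTCTATAA",   # 5 of 13 bases are G or C
    13 * "G",          # 13 of 13 bases are G or C
]

boundary_tests = []
for seq in boundary_seqs:
    assert len(seq) == 13   # still inside the "less than 14 bases" range
    n_gc = seq.count("G") + seq.count("C")
    n_at = len(seq) - n_gc  # valid because these sequences contain only A, T, G and C
    boundary_tests.append((seq, 2 * n_at + 4 * n_gc))

print(boundary_tests)
```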

In [31]:
# create a list of (test sequence, expected Tm) tuples like we did above
test_sequences = [("G", 4),
                  ("GG", 8),
                  ("A", 2),
                  ("GA", 6),
                  ("TTTTCCCC", 24),
                  ("AAAAAAAAAA", 20),
                  ("AAGGGCTCTATAA", 36)]

In [32]:
# now test your function with test_sequences
# FIRST cut and paste and adapt code from the example above
for test, temp in test_sequences:
    print("The test sequence is " + test)
    print("The expected value is " + str(temp))
    found_value = basic_tm(test)
    print("The found calculated value is " + str(found_value))
    if found_value == temp:
        print("SUCCESS")
    else:
        print("NO. DO BETTER.")

The test sequence is G
The expected value is 4
The found calculated value is 4
SUCCESS
The test sequence is GG
The expected value is 8
The found calculated value is 8
SUCCESS
The test sequence is A
The expected value is 2
The found calculated value is 2
SUCCESS
The test sequence is GA
The expected value is 6
The found calculated value is 6
SUCCESS
The test sequence is TTTTCCCC
The expected value is 24
The found calculated value is 24
SUCCESS
The test sequence is AAAAAAAAAA
The expected value is 20
The found calculated value is 20
SUCCESS
The test sequence is AAGGGCTCTATAA
The expected value is 36
The found calculated value is 36
SUCCESS


In [33]:
# Advanced programmers only! Avoid code duplication
# in the testing by writing a function that can test
# both the percent_gc and basic_tm functions against the
# appropriate test data set.

# The last entry in each list is an incorrect value, so the tester reports a mismatch for it
gc_test_data_ex = [("A", 0),
                   ("G", 100),
                   ("ATGC", 50),
                   ("AGG", 66.6666667),
                   ("AAAA", 43)]

temp_test_data_ex = [("A", 2),
                     ("AGGT", 12),
                     ("TGGACT", 18),
                     ("AAAA", 23)]
def temp_and_gc_tester(gc_test_data, temp_test_data):

    """This function runs a set of test data for the calculation of GC percentage,
    and a different set of test data for the calculation of melting temperature, in a single call.
    It prints a sentence telling the observer whether the output of each function is the same
    as the value contained in the test data."""

    #Testing the GC content function against the test data
    for gc_test_item, gc_test_result in gc_test_data:
        gc_test_result = round(gc_test_result, 3)
        print("The test sequence is " + gc_test_item)
        print("The test GC percentage is " + str(gc_test_result))
        calculated_gc = round(percent_gc(gc_test_item), 3)
        if gc_test_result == calculated_gc:
            print("The calculated GC percentage is also " + str(calculated_gc) + " - well done!")
        else:
            print("The calculated GC percentage is " + str(calculated_gc) + " rather than " + str(gc_test_result) + ". Not quite right!")
    
    #Testing the temperature function against the test data
    for temp_test_item, temp_test_result in temp_test_data:
        print("The test sequence is " + temp_test_item)
        temp_test_result = round(temp_test_result, 3)
        print("The test temperature is " + str(temp_test_result))
        calculated_temp = round(basic_tm(temp_test_item), 3)
        if temp_test_result == calculated_temp:
            print("The calculated temperature is also " + str(calculated_temp) + " - well done!")
        else:
            print("The calculated temperature is " + str(calculated_temp) + " rather than " + str(temp_test_result) + ". Not quite right!")

temp_and_gc_tester(gc_test_data_ex, temp_test_data_ex)

The test sequence is A
The test GC percentage is 0
The calculated GC percentage is also 0.0 - well done!
The test sequence is G
The test GC percentage is 100
The calculated GC percentage is also 100.0 - well done!
The test sequence is ATGC
The test GC percentage is 50
The calculated GC percentage is also 50.0 - well done!
The test sequence is AGG
The test GC percentage is 66.667
The calculated GC percentage is also 66.667 - well done!
The test sequence is AAAA
The test GC percentage is 43
The calculated GC percentage is 0.0 rather than 43. Not quite right!
The test sequence is A
The test temperature is 2
The calculated temperature is also 2 - well done!
The test sequence is AGGT
The test temperature is 12
The calculated temperature is also 12 - well done!
The test sequence is TGGACT
The test temperature is 18
The calculated temperature is also 18 - well done!
The test sequence is AAAA
The test temperature is 23
The calculated temperature is 8 rather than 23. Not quite right!
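
One possible way to remove the remaining duplication, offered as a sketch rather than the expected answer, is to pass the function under test as an argument. The names `run_tests`, `gc_demo` and `tm_demo` are hypothetical; the two demo functions are stand-ins so that the block runs on its own, and in the notebook `percent_gc` and `basic_tm` would be passed instead.

```python
def run_tests(func, test_data, ndigits=3):
    """Call func on each test sequence and compare with the expected value.

    Prints PASS or FAIL for every case and returns the number of failures.
    """
    failures = 0
    for test_seq, expected in test_data:
        actual = func(test_seq)
        if round(actual, ndigits) == round(expected, ndigits):
            print("PASS " + test_seq)
        else:
            failures += 1
            print("FAIL " + test_seq + ": expected " + str(expected)
                  + " but got " + str(actual))
    return failures


# Stand-in functions so that the sketch is self-contained.
def gc_demo(seq):
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def tm_demo(seq):
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


gc_failures = run_tests(gc_demo, [("A", 0), ("ATGC", 50), ("AAAA", 43)])
tm_failures = run_tests(tm_demo, [("A", 2), ("AGGT", 12), ("AAAA", 23)])
print(gc_failures, tm_failures)  # 1 1: the deliberately wrong AAAA entries fail
```

Returning the failure count, rather than only printing, lets a later cell check the result with a single comparison.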


### Dealing with longer sequences

Returning to the prediction of the DNA melting temperature, the formula:

$$
T_m = 2(N_A + N_T) + 4(N_G + N_C)
$$

only applies to sequences that are shorter than 14 nucleotides. 
For sequences of 14 nucleotides and longer the OligoCalc tool
http://biotools.nubic.northwestern.edu/OligoCalc.html 
uses the equation:

$$
T_m = 64.9 + 41 \times \frac{N_G + N_C - 16.4}{N_A + N_T + N_G + N_C}
$$

This equation is taken from
Sambrook, J., and Russell, D.W. (2001) *Molecular Cloning: A Laboratory Manual*. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY. [CSHL Press](http://www.molecularcloning.com/).
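
As a quick check of the arithmetic, the sketch below evaluates this equation for two 14-base sequences. The helper name `long_tm` is hypothetical, and the code assumes an upper-case sequence of at least 14 bases that contains only A, T, G and C.

```python
def long_tm(seq):
    """Tm in degrees C for sequences of 14 or more bases."""
    n_gc = seq.count("G") + seq.count("C")
    n_total = len(seq)  # equals N_A + N_T + N_G + N_C when only A, T, G, C are present
    return 64.9 + 41 * (n_gc - 16.4) / n_total


print(round(long_tm(14 * "G"), 1))  # 64.9 + 41 * (14 - 16.4) / 14 gives 57.9
print(round(long_tm(14 * "A"), 1))  # 64.9 + 41 * (0 - 16.4) / 14 gives 16.9
```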

First extend your test set to include sequences of 14 bases and upwards.

Complete the following table by running the basic Tm calculation for 
each sequence at http://biotools.nubic.northwestern.edu/OligoCalc.html 

| **test sequence**   | **Basic Tm (degrees C) from OligoCalc** |
| -----------------   | :-------------------------------------: |
| `14*'G'`            |  57.9   |
| `14*'A'`            |  16.9  |   
| AGAGTATACCCCCCCTTTTTTTGCTCGGGGGTCAT |   66.8   |
| TATATAACGGTTTTTAATTTTTTGCCGTGGTCAA |  57.2    |
| ACTTTATAACTAGGGCGTCCATTAGACCTAAGAGA |  62.1    |

*Note that OligoCalc quotes Basic Tm to one decimal place. It might
have been more appropriate to round to the nearest whole number to
show users that these are ball-park predictions. In any case you will
need to take this into account in your comparisons.*
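
One way to allow for the one-decimal-place quoting, sketched here under the assumption that the quoted value is the calculated value rounded to the nearest 0.1, is to accept any difference of up to half a unit in the last quoted digit. The name `agrees_with_quoted` is hypothetical.

```python
def agrees_with_quoted(calculated, quoted, decimals=1):
    """True if calculated could round to quoted at the given number of decimals."""
    half_unit = 0.5 * 10 ** (-decimals)
    # The tiny extra margin absorbs floating-point error in the subtraction.
    return abs(calculated - quoted) <= half_unit + 1e-9


calculated = 64.9 + 41 * (14 - 16.4) / 14   # the long-sequence equation for 14 G's
print(agrees_with_quoted(calculated, 57.9))  # True: the difference is about 0.03
print(agrees_with_quoted(calculated, 57.8))  # False: the difference is about 0.07
```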

In [34]:
# instruction: append this test data to your test_data list for the basic_tm function
# NOTE: the table above gives 57.9 for 14 G's; the 57.8 entered here does not match it
new_test_data = test_sequences + [(14 * "G", 57.8),
                                  (14 * "A", 16.9),
                                  ("AGAGTATACCCCCCCTTTTTTTGCTCGGGGGTCAT", 66.8),
                                  ("TATATAACGGTTTTTAATTTTTTGCCGTGGTCAA", 57.2),
                                  ("ACTTTATAACTAGGGCGTCCATTAGACCTAAGAGA", 62.1)]



In [35]:
# instruction check that the current implementation of basic_tm fails for new test data
for testseq, testresult in new_test_data:
    print(basic_tm(testseq))

4
8
2
6
24
20
36


ValueError: The string is too long - the output may not be accurate

In [36]:
# instruction - replace basic_tm with a new implementation that deals with longer sequences
def less_basic_tm(longsequence):
    """This function accepts a string that contains a DNA sequence (As, Ts, Gs, and Cs) and returns
    the melting temperature of that sequence.
    
    The formula used for strings longer than or equal to 14 bases is:
    Tm = 64.9 + 41 * (N_G + N_C - 16.4) / (N_A + N_T + N_G + N_C)
    This equation is taken from Sambrook,J., and Russell,D.W. (2001) Molecular Cloning: 
    A Laboratory Manual. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY. 
    [CSHL Press](http://www.molecularcloning.com/).

    The formula used for strings shorter than 14 bases can be found at http://biotools.nubic.northwestern.edu/OligoCalc.html who cite 
    Marmur,J., and Doty,P. (1962) *J Mol Biol* **5**:109-118 https://doi.org/10.1016/S0022-2836(62)80066-7
    """
    if len(longsequence) == 0:
        raise ValueError("Where's the sequence? Please provide an input!")

    atcount = 0
    gccount = 0

    # Count the bases in the argument itself (not a global variable),
    # rejecting anything that is not A, T, G or C
    for letter in longsequence:

        if letter in "AaTt":
            atcount += 1

        elif letter in "GgCc":
            gccount += 1

        elif letter in "Uu":
            raise ValueError("Whoops! Looks like that might have uracil in there, or you've just written a word with a U in. Either way, try a different sequence with only A, T, G, and C")
        
        else:
            raise ValueError("Doesn't look like that's a valid string of DNA. Try inputting a sequence that contains only As, Ts, Gs and Cs!")

    if len(longsequence) < 14:
        tm = (2 * atcount) + (4 * gccount)
    else:
        tm = 64.9 + (41 * ((gccount - 16.4)/ (atcount + gccount)))
    return tm 


In [37]:
# instruction - check the help message for the new implementation of basic_tm
help(less_basic_tm)

Help on function less_basic_tm in module __main__:

less_basic_tm(longsequence)
    This function accepts a string that contains a DNA sequence (As, Ts, Gs, and Cs) and returns
    the melting temperature of that sequence.
    
    The formula used for strings longer than or equal to 14 bases is:
    Tm = 64.9 + 41 * (N_G + N_C - 16.4) / (N_A + N_T + N_G + N_C)
    This equation is taken from Sambrook,J., and Russell,D.W. (2001) Molecular Cloning: 
    A Laboratory Manual. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY. 
    [CSHL Press](http://www.molecularcloning.com/).
    
    The formula used for strings shorter than 14 bases can be found at http://biotools.nubic.northwestern.edu/OligoCalc.html who cite 
    Marmur,J., and Doty,P. (1962) *J Mol Biol* **5**:109-118 https://doi.org/10.1016/S0022-2836(62)80066-7



In [39]:
# instruction - rerun tests and check that basic_tm works.
for sequence, temp in new_test_data:
    print("The test sequence is " + sequence)
    temp = round(temp, 1)
    calculated = round(less_basic_tm(sequence), 1)  # call the function once and reuse the result
    print("The test temperature is " + str(temp))
    print("The calculated temperature is " + str(calculated))
    if temp == calculated:
        print("Good")
    else:
        print("Bad")

The test sequence is G
The test temperature is 4
The calculated temperature is 4
Good
The test sequence is GG
The test temperature is 8
The calculated temperature is 8
Good
The test sequence is A
The test temperature is 2
The calculated temperature is 2
Good
The test sequence is GA
The test temperature is 6
The calculated temperature is 6
Good
The test sequence is TTTTCCCC
The test temperature is 24
The calculated temperature is 24
Good
The test sequence is AAAAAAAAAA
The test temperature is 20
The calculated temperature is 20
Good
The test sequence is AAGGGCTCTATAA
The test temperature is 36
The calculated temperature is 36
Good
The test sequence is GGGGGGGGGGGGGG
The test temperature is 57.8
The calculated temperature is 57.9
Bad
The test sequence is AAAAAAAAAAAAAA
The test temperature is 16.9
The calculated temperature is 16.9
Good
The test sequence is AGAGTATACCCCCCCTTTTTTTGCTCGGGGGTCAT
The test temperature is 66.8
The calculated temperature is 66.8
Good
The test sequence is TATATA

## Optional: Adapting `basic_tm` to deal with ambiguous sequence X

> Have a go at this if you have time.

* What happens if you are presented with sequences where there is ambiguity and unknown nucleotides are marked with the code `X`?
* There are a number of possibilities:
  * We could ignore the `X` completely - but is this sensible?
  * Or we could work out the maximum basic_tm by assuming that all `X` are `G` or `C`
  * Conversely we could find the minimum basic_tm by assuming that all `X` are `A` or `T`
  * Or we could take a 'half-way house' that produces answers averaging the maximum and minimum.
  
Adapt your functions by introducing an optional argument:
```
ambiguity='mid'

# docstring description for ambiguity
"""
ambiguity controls how X in sequence are treated. 
If ambiguity is set to 'max' then X's in sequence are treated 
as G/C to give the maximum basic Tm. If set to 'min' then
they are treated as A/T to give the minimum. The default 'mid'
averages the 'max' and 'min' values.
"""
```

In [73]:
# instruction - introduce the ambiguity='mid' into basic_tm
def even_less_basic_tm(longsequence, ambiguity="mid"):
    """This function accepts a string that contains a DNA sequence (As, Ts, Gs, and Cs, with
    unknown bases marked X) and returns the melting temperature of that sequence.

    ambiguity controls how X in sequence are treated.
    If ambiguity is set to 'max' then X's in sequence are treated
    as G/C to give the maximum basic Tm. If set to 'min' then
    they are treated as A/T to give the minimum. The default 'mid'
    averages the 'max' and 'min' values.

    The formula used for strings longer than or equal to 14 bases is:
    Tm = 64.9 + 41 * (N_G + N_C - 16.4) / (N_A + N_T + N_G + N_C)
    This equation is taken from Sambrook,J., and Russell,D.W. (2001) Molecular Cloning: 
    A Laboratory Manual. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY. 
    [CSHL Press](http://www.molecularcloning.com/).

    The formula used for strings shorter than 14 bases can be found at http://biotools.nubic.northwestern.edu/OligoCalc.html who cite 
    Marmur,J., and Doty,P. (1962) *J Mol Biol* **5**:109-118 https://doi.org/10.1016/S0022-2836(62)80066-7
    """
    if ambiguity == "mid":
        # Average the two extreme cases
        minambi = even_less_basic_tm(longsequence, ambiguity="min")
        maxambi = even_less_basic_tm(longsequence, ambiguity="max")
        return (minambi + maxambi) / 2
    if ambiguity not in ("min", "max"):
        raise ValueError("ambiguity must be 'min', 'mid' or 'max'")
    if len(longsequence) == 0:
        raise ValueError("Where's the sequence? Please provide an input!")

    atcount = 0
    gccount = 0
    xcount = 0

    for letter in longsequence:
        if letter in "AaTt":
            atcount += 1
        elif letter in "GgCc":
            gccount += 1
        elif letter in "Xx":
            xcount += 1
        elif letter in "Uu":
            raise ValueError("Whoops! Looks like that might have uracil in there, or you've just written a word with a U in. Either way, try a different sequence with only A, T, G, C and X")
        else:
            raise ValueError("Doesn't look like that's a valid string of DNA. Try inputting a sequence that contains only As, Ts, Gs, Cs and Xs!")

    # Assign the ambiguous bases to whichever group gives the requested extreme
    if ambiguity == "min":
        atcount += xcount
    else:
        gccount += xcount

    if len(longsequence) < 14:
        return (2 * atcount) + (4 * gccount)
    return 64.9 + (41 * ((gccount - 16.4)/ (atcount + gccount)))




In [80]:
# test it (manual will do)
testingambiguity = ""
for ambi in ["min", "mid", "max"]:
    print(even_less_basic_tm(testingambiguity, ambiguity=ambi))

ValueError: Where's the sequence? Please provide an input!

Extend the ambiguity optional argument to have an additional value `'range'` that instead of returning a single basic_tm value returns a tuple of the minimum to maximum values.

In [97]:
# instruction - introduce the additional value 'range' for ambiguity
def even_even_less_basic_tm(sequence, ambiguity="mid", range=False):
    """Wrap even_less_basic_tm, optionally reporting all three ambiguity modes.

    If range is True, return a tuple of (mode, Tm) pairs for 'min', 'mid' and 'max';
    otherwise return the single Tm for the requested ambiguity.
    Note that the parameter name range hides the built-in range() inside this function.
    """
    if not range:
        return even_less_basic_tm(sequence, ambiguity)
    list_of_ambis = ("min", "mid", "max")
    return tuple((ambi, even_less_basic_tm(sequence, ambiguity=ambi)) for ambi in list_of_ambis)


In [98]:
# test it (manual will do)
testsequence = "AAXGGG"
even_even_less_basic_tm(testsequence, ambiguity="mid", range=True)

(('min', 18), ('mid', 19.0), ('max', 20))
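
The instruction above asks for `'range'` as a further value of `ambiguity` rather than as a separate argument. A minimal sketch of that reading follows. The name `tm_with_ambiguity` is hypothetical, and the block re-implements the two Tm formulas compactly (upper-case input only) so that it runs on its own.

```python
def tm_with_ambiguity(seq, ambiguity="mid"):
    """Basic Tm in degrees C for a DNA sequence that may contain X.

    ambiguity: 'min' treats X as A/T, 'max' treats X as G/C, 'mid' averages
    the two, and 'range' returns the tuple (minimum, maximum).
    """
    if ambiguity not in ("min", "max", "mid", "range"):
        raise ValueError("ambiguity must be 'min', 'max', 'mid' or 'range'")
    if len(seq) == 0:
        raise ValueError("empty sequence")
    n_at = seq.count("A") + seq.count("T")
    n_gc = seq.count("G") + seq.count("C")
    n_x = seq.count("X")
    if n_at + n_gc + n_x != len(seq):
        raise ValueError("sequence may only contain A, T, G, C and X")

    def tm_from_counts(at, gc):
        # Short-sequence rule below 14 bases, long-sequence equation otherwise
        if at + gc < 14:
            return 2 * at + 4 * gc
        return 64.9 + 41 * (gc - 16.4) / (at + gc)

    tm_min = tm_from_counts(n_at + n_x, n_gc)  # every X counted as A/T
    tm_max = tm_from_counts(n_at, n_gc + n_x)  # every X counted as G/C
    if ambiguity == "min":
        return tm_min
    if ambiguity == "max":
        return tm_max
    if ambiguity == "range":
        return (tm_min, tm_max)
    return (tm_min + tm_max) / 2


print(tm_with_ambiguity("AAXGGG", ambiguity="range"))  # (18, 20)
```

Keeping `'range'` inside `ambiguity` means there is a single switch to document, and the built-in `range()` is no longer hidden by a parameter of the same name.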

# Real world prediction of DNA melting temperatures using Python

* For real-world prediction of DNA melting temperatures [BioPython](https://biopython.org/)
  includes a MeltingTemp module:

  http://biopython.org/DIST/docs/api/Bio.SeqUtils.MeltingTemp-module.html 

  that implements a wide range of algorithms for Tm prediction. 
  
  An alternative is https://pypi.org/project/melt/
  
Web tools:
* http://biotools.nubic.northwestern.edu/OligoCalc.html
* https://www.ebi.ac.uk/biomodels-static/tools/melting/melt.php

## Optional: Try out the BioPython MeltingTemp module

This is for the keen! 
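
A minimal sketch of a first experiment, assuming Biopython is installed (for example with `pip install biopython`): `Tm_Wallace` in `Bio.SeqUtils.MeltingTemp` applies the same rule of thumb as the basic formula above (2 for each A or T, 4 for each G or C), so its result can be compared directly with `basic_tm`. The wrapper name `wallace_tm_or_none` is hypothetical and exists only so that the block still runs when Biopython is missing.

```python
try:
    from Bio.SeqUtils import MeltingTemp as mt
except ImportError:
    mt = None  # Biopython is not installed


def wallace_tm_or_none(seq):
    """Return Biopython's Tm_Wallace value, or None if Biopython is missing."""
    if mt is None:
        return None
    return mt.Tm_Wallace(seq)


print(wallace_tm_or_none("TTTTCCCC"))
```

The other methods in the module take more parameters (salt and primer concentrations, for example), so their documentation is worth reading before comparing results with OligoCalc.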

## Once you have completed this homework
Please see the ARU Canvas page
[Reflection on Practical E: writing your own functions](https://canvas.anglia.ac.uk/courses/12178/discussion_topics/109556)
for the follow-up.