# A first look at DNA

Python will be our tool for exploring biology at the level of sequence information: DNA, RNA, and protein. There are other ways to explore biology through bioinformatics, but these tools are a great place to start.

## Representing DNA

To work with DNA in Python, we need to represent it virtually, as a model:
![](./img/rotating-dna.gif)
The rotating image of DNA is of course not DNA, but a model that illustrates some properties we attribute to DNA (e.g. double-stranded, anti-parallel symmetry, about 10 paired nucleotides per turn). We could use numerical data to represent and model many of these properties, but for now let's start with one of the most abstract representations: the DNA sequence itself.



## DNA as a string

In Python, there are basic "types" of information (data structures) that can be manipulated. One of these is called a string. 

In [None]:
# Create a string
string = "5"

This new variable we have created called 'string' is a string because we set its value by giving it information in quotes. We can check if it is a string by using the `type( )` function. To use the type function, place the value you want to check inside the parentheses. 

In [None]:
# check the type of 'string'


Remember, putting quotes ("") around a value when you set a variable causes Python to treat that value as a string.

In [None]:
# Which of the following are strings (use the type function) 

variable_one = "a"
variable_two = ""
variable_three = 3
variable_four = "4"

# place the type() lines below
type(variable_three)

## Why do we need types?

You will notice that a number without quotes is another Python type called 'int', which is short for integer. We need different 'types' of data in Python (like string or integer) because once Python "knows" (to speak anthropomorphically) the type of a variable, it knows what properties that variable has and what can be done with it.

For example, if I tell you that in my fictitious language all numbers ending in "a" are positive (sjushda, duhuda, ygbbda, khsia) and all numbers ending in "o" are negative (suduuo, jnjuho, makkhuso, shuso), you still don't know what those numbers are in English. You do, however, know several things: adding two "a" numbers will generate a sum that moves in a positive direction along a number line, adding an "o" number to an "a" number will move in the negative direction along the number line, and multiplying an "a" number by an "o" number will generate a number ending in "o".

Even though you don't know what the numbers are, you can manipulate them, and the same goes for Python working with our Arabic numerals. For more on this line of logic, see the works of the philosopher [Searle](https://en.wikipedia.org/wiki/Chinese_room).
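In Python terms, the type decides what an operator does. The short sketch below uses two made-up variables (`text_five` and `number_five` are illustrative names, not part of the exercises) to show the same `+` behaving differently for a string and an integer.

```python
# Hypothetical values chosen only for illustration
text_five = "5"    # a string, because of the quotes
number_five = 5    # an integer, because there are no quotes

# The same + operator behaves differently depending on the type
joined = text_five + text_five        # strings are joined end to end
summed = number_five + number_five    # integers are added

print(type(text_five), joined)
print(type(number_five), summed)
```

Python never needs to "understand" what 5 means; it only needs the type to decide whether `+` joins or adds.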

## A string of DNA

Let's examine a string of DNA and some of its properties.

In [21]:
# a DNA string

my_DNA = 'tagctgttcgtacccgtgatcgtttcag'

print(my_DNA)

tagctgttcgtacccgtgatcgtttcag


Because it is a string, there are several things Python can do with it. For example, a string can contain zero characters (the empty string "") or more, and those characters can be letters, numbers, or other symbols. We can get the length of a string using the `len()` function.

In [22]:
# Check the length of the my_DNA string
len(my_DNA)

28

Another thing we may want to do is count how many times an individual character appears in a string. To do this, we will use the `count()` method\*.

\* we call `count()` a method because in order to use it, we must type a variable name, followed by a '.', followed by the function name. 
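The difference in syntax between a function and a method can be seen side by side in this small sketch. The sequence `short_dna` is a hypothetical example, not one of the notebook's variables.

```python
# A short, made-up sequence used only to compare the two call styles
short_dna = "atgca"

# Function style: the name comes first, the value goes inside the parentheses
n_chars = len(short_dna)

# Method style: the variable comes first, then a '.', then the method name
n_a = short_dna.count("a")

print(n_chars, n_a)
```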

In [None]:
# How many 'a' nucleotides are in 'my_DNA'
# notice that what we want to count goes inside quotes, and the quotes go inside the parentheses

my_DNA.count('a')



In [None]:
# calculate the count of all 4 nucleotides

## Challenge question

One of the properties of a piece of DNA is its [melting temperature](http://www.biophp.org/minitools/melting_temperature/demo.php?formula=basic) (Tm). The melting temperature is the temperature at which double-stranded DNA separates into two single strands. This temperature is important to know because when we replicate DNA in the lab, we need it to set up the replication reaction.


For DNA fragments shorter than 14 nucleotides, we can simplify the formula as:

melting_temp = (#A + #T) * 2 + (#G + #C) * 4

\# stands for the count of the nucleotide in the string
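To see the arithmetic without giving away the challenge, here is a minimal sketch that plugs hand-written counts into the formula above. The counts belong to a hypothetical 12-nucleotide fragment, and the variable names are illustrative only; counting the nucleotides in a real string is left for the exercise.

```python
# Counts written by hand for a hypothetical 12-nucleotide fragment
count_a = 4
count_t = 2
count_g = 3
count_c = 3

# (#A + #T) * 2 + (#G + #C) * 4
example_temp = (count_a + count_t) * 2 + (count_g + count_c) * 4

print(example_temp)  # 6 * 2 + 6 * 4 = 36
```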


In [None]:
# Using what you know about Python, calculate the melting temperature for my_DNA. Create as many variables as you
# need to hold the intermediate values (e.g. the counts of the individual nucleotides). Store the final value
# in a new variable 'melting_temp'. A line that prints this variable and a message is at the bottom of this cell






print("The melting temp of the strand is:", melting_temp, "°C")


If you complete the challenge, you now know how to use Python to model the thermodynamic properties of a DNA sequence. If you want to go further, upgrade your code to better model longer DNA sequences using this formula, which is more accurate for DNA sequences longer than 14 nucleotides:

Tm = 64.9 + 41 * (yG + zC - 16.4) / (wA + xT + yG + zC)

Here w, x, y, and z are the counts of A, T, G, and C in the sequence.
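As a rough sketch of how the longer-sequence formula translates into Python, the function below takes the four nucleotide counts as arguments. The name `long_fragment_tm` and the example counts are assumptions made for illustration; obtaining the counts from a real sequence is left to you.

```python
def long_fragment_tm(count_a, count_t, count_g, count_c):
    """Estimate Tm from nucleotide counts using the longer-sequence formula."""
    total = count_a + count_t + count_g + count_c
    # The parentheses matter: subtract 16.4 from the G + C count before multiplying by 41
    return 64.9 + 41 * (count_g + count_c - 16.4) / total

# Hypothetical 20-nucleotide fragment with 5 of each nucleotide
example_tm = long_fragment_tm(5, 5, 5, 5)
print(round(example_tm, 2))
```

Because the result is a decimal number, rounding before printing keeps the output readable.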