# Notebook 2: Data types

This notebook will introduce basic Python data types and data structures:
-  Numbers (integers, floats, bools)
-  Strings
-  Lists and Tuples
-  Dictionaries

Six exercises are given (in two groups of three).

In [None]:
# Import the usual stuff first
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt        
%matplotlib inline

## Numbers

In [None]:
# Integers

In [None]:
# Floating point numbers


In [None]:
# You can also do math on numbers
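
The three cells above can be illustrated with a minimal sketch. The variable names `n` and `x` are illustrative choices, not part of the original notebook.

```python
# Integers are whole numbers (type int)
n = 42
# Floating point numbers carry a fractional part (type float)
x = 3.5
print(type(n), type(x))

# Standard arithmetic operators
print(n + x)    # addition; mixing int and float gives a float
print(7 / 2)    # true division always returns a float
print(7 // 2)   # floor division
print(7 % 2)    # remainder
print(2 ** 10)  # exponentiation
```

Note that `/` returns a float even when both operands are integers; use `//` when an integer quotient is wanted.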


In [None]:
# Boolean variables represent quantities that are True or False
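
A short sketch of Boolean values and operators follows. The names `is_circular` and `is_linear` are hypothetical and chosen only for illustration.

```python
is_circular = True
is_linear = False

# Logical operators combine Booleans
print(is_circular and is_linear)  # False
print(is_circular or is_linear)   # True
print(not is_circular)            # False

# Comparisons evaluate to Booleans
print(3 < 5)                      # True

# bool is a subclass of int, so True behaves like 1 in arithmetic
print(True + True)                # 2
```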

## Strings

In [None]:
# Strings can be defined using either single or double quotes
first_name = 'Barbara'
last_name = "McClintock"
print(first_name, last_name)

In [1]:
# Multiline strings can be defined using triple quotes (single or double)
address = """
Cold Spring Harbor Laboratory
1 Bungtown Rd.
Cold Spring Harbor, NY 11724
"""
print(address)


Cold Spring Harbor Laboratory
1 Bungtown Rd.
Cold Spring Harbor, NY 11724



In [None]:
# The '+' sign concatenates strings


In [None]:
# It is simple to test if one string is contained within another


In [None]:
# The len() function tells you the length of a string
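
The three string operations above (concatenation, containment, and length) might look like this in practice. The example redefines `first_name` and `last_name` with the same values used earlier so that it runs on its own; `full_name` is an illustrative name.

```python
first_name = 'Barbara'
last_name = "McClintock"

# '+' concatenates strings
full_name = first_name + ' ' + last_name
print(full_name)

# 'in' tests whether one string is contained within another (case-sensitive)
print('Clint' in full_name)   # True
print('clint' in full_name)   # False

# len() gives the number of characters, including the space
print(len(full_name))         # 18
```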


In [None]:
# The characters in a string can be indexed using brackets.
# Strings are indexed starting at 0.
# s[-n] returns the n'th character from the end.
# s[start:stop] returns the characters from start up to (but not including) stop. This is called a 'slice'.
# s[start:stop:stride] returns every stride'th character of that slice.
# Strings can be reversed using a stride of -1.
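
The indexing rules listed above can be sketched as follows. The variable `s` is an illustrative name.

```python
s = 'McClintock'

print(s[0])     # first character: 'M'
print(s[-1])    # last character: 'k'
print(s[2:7])   # slice from index 2 up to (not including) 7: 'Clint'
print(s[::2])   # every second character: 'MCitc'
print(s[::-1])  # reversed: 'kcotnilCcM'
```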

In [None]:
# You can convert from a string to an integer ...

# ... and from an integer to a string
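
A minimal sketch of both conversions, using the built-in `int()` and `str()` functions. The names `zip_str`, `zip_num`, and `back_to_str` are illustrative.

```python
# From a string to an integer
zip_str = '11724'
zip_num = int(zip_str)
print(zip_num + 1)        # arithmetic now works: 11725

# From an integer to a string
back_to_str = str(zip_num)
print(back_to_str + '!')  # concatenation now works: '11724!'
```

Calling `int()` on a string that does not represent a whole number (for example `'3.5'` or `'abc'`) raises a `ValueError`.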


'String formatting' allows strings to be built up from numbers, other strings, etc. More information is available here: https://docs.python.org/3/library/string.html

In [None]:
# String formatting
print('An int: %d'%np.pi)
print('A float: %.2f'%np.pi)
print('Two strings and a number: %s %s loves %f'%(first_name, last_name, np.pi))

In [None]:
# Make a string uppercase
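
One way to do this is with the `.upper()` string method, sketched below alongside its counterpart `.lower()`. The example redefines `last_name` so that it runs on its own.

```python
last_name = "McClintock"

print(last_name.upper())  # 'MCCLINTOCK'
print(last_name.lower())  # 'mcclintock'

# Strings are immutable: these methods return new strings
print(last_name)          # still 'McClintock'
```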

## Exercises, part 1 of 2

Here is the DNA sequence of the multiple cloning site (MCS) on the plasmid [pcDNA5](https://www.addgene.org/vector-database/2132/), a popular vector for mammalian gene expression.

In [None]:
# Note how to define a long string over multiple lines
seq = 'GAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATCCACTA' \
      'GTCCAGTGTGGTGGAATTCTGCAGATATCCAGCACAGTGGCGGCCGCTCGAGTCTAG' \
      'AGGGCCCGTTTAAACCCGCTGATCAGCCT'
print(seq)

**E2.1**: Does this MCS contain a restriction site for NheI (GCTAGC)? How about for MscI (TGGCCA)? 

In [None]:
# Answer here

**E2.2**: Using the string method `.find()`, find the location(s) of the above restriction sites within the MCS.

In [None]:
# Answer here

**E2.3**: Using the string method `.replace()`, compute the RNA sequence transcribed from the GFP gene sequence (given below). 

In [None]:
gfp_seq = 'ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATG' \
          'TTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTAC' \
          'CCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTC' \
          'GCGTATGGTCTTCAATGCTTTGCGAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAGA' \
          'GTGCCATGCCCGAAGGTTATGTACAGGAAAGAACTATATTTTTCAAAGATGACGGGAACTACAA' \
          'GACACGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATAGAATCGAGTTAAAAGGTATT' \
          'GATTTTAAAGAAGATGGAAACATTCTTGGACACAAATTGGAATACAACTATAACTCACACAATG' \
          'TATACATCATGGCAGACAAACAAAAGAATGGAATCAAAGTTAACTTCAAAATTAGACACAACAT' \
          'TGAAGATGGAAGCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCT' \
          'GTCCTTTTACCAGACAACCATTACCTGTCCACACAATCTGCCCTTTCGAAAGATCCCAACGAAA' \
          'AGAGAGACCACATGGTCCTTCTTGAGTTTGTAACAGCTGCTGGGATTACACATGGCATGGATGA' \
          'ACTATACAAATAA'

# Answer here

## Lists and tuples

In [None]:
# Define a list using brackets and commas.
v = [1, 'two', 3.0, 'four', 5]
v

In [None]:
# Lists can be indexed using brackets just like strings can.


In [None]:
# Use 'in' to test whether an element is contained in a list.
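
List indexing and membership testing can be sketched as follows. The list `v` is redefined with the same contents as above so that the example runs on its own.

```python
v = [1, 'two', 3.0, 'four', 5]

# Indexing and slicing work as they do for strings
print(v[0])     # 1
print(v[-1])    # 5
print(v[1:3])   # ['two', 3.0]

# 'in' tests whether an element is contained in the list
print('two' in v)  # True
print(2 in v)      # False
```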


In [None]:
# Change an element in a list.


In [None]:
# Append an element to the end of a list.
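
A minimal sketch of changing and appending list elements. The list `v` is redefined here so that the example is self-contained; the new values are arbitrary.

```python
v = [1, 'two', 3.0, 'four', 5]

# Change an element by assigning to its index
v[1] = 2

# Append an element to the end of the list (modifies the list in place)
v.append('six')

print(v)  # [1, 2, 3.0, 'four', 5, 'six']
```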


In [None]:
# You get an error if you try to access an index that doesn't exist.


In [None]:
# You also get an error if you pass a non-integer as an index.
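
Both errors described above can be demonstrated without halting the notebook by wrapping each access in `try`/`except`. This is only a sketch; the list `caught` is a hypothetical helper that records which error types were raised.

```python
v = [1, 'two', 3.0, 'four', 5]
caught = []  # hypothetical helper: names of the errors that were raised

# Accessing an index that doesn't exist raises an IndexError
try:
    v[10]
except IndexError as err:
    caught.append('IndexError')
    print('IndexError:', err)

# Passing a non-integer (here a float) as an index raises a TypeError
try:
    v[1.0]
except TypeError as err:
    caught.append('TypeError')
    print('TypeError:', err)
```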


In [None]:
# To create a list of numbers from 0 to n-1, use list(range(n))


In [None]:
# Sort a list of numbers
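
The two cells above might be filled in along these lines. The names `n`, `numbers`, and `scores` are illustrative, and the values are arbitrary.

```python
# range(n) starts at 0 and stops before n
n = 5
numbers = list(range(n))
print(numbers)         # [0, 1, 2, 3, 4]

# sorted() returns a new sorted list and leaves the original unchanged
scores = [3, 1, 4, 1, 5]
print(sorted(scores))  # [1, 1, 3, 4, 5]

# .sort() sorts the list in place; reverse=True gives descending order
scores.sort(reverse=True)
print(scores)          # [5, 4, 3, 1, 1]
```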


In [None]:
# Tuples are like lists, though they are defined using parentheses instead of brackets.
# Functions often pass tuples (not lists) back to the user.


In [None]:
# The key difference is that, while lists are "mutable", tuples are "immutable"
# i.e., you cannot change an element in a tuple after it has been created.
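
A short sketch of tuples and their immutability. The names `t`, `q`, and `r` are illustrative; the built-in `divmod()` is used as one example of a function that returns a tuple.

```python
# Define a tuple using parentheses and commas
t = (1, 'two', 3.0)
print(t[0])  # indexing works as for lists

# divmod() returns a tuple (quotient, remainder), which can be unpacked
q, r = divmod(17, 5)
print(q, r)  # 3 2

# Tuples are immutable: assigning to an element raises a TypeError
try:
    t[0] = 100
except TypeError as err:
    print('TypeError:', err)
```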


In [None]:
# You can join multiple strings together, separating them with a specified character,
# using the .join() string method


In [None]:
# Use the split() string method to chop a string into a list at a specific character
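
The `.join()` and `.split()` methods are roughly inverse operations, as the sketch below suggests. The names `words`, `joined`, and `parts` are illustrative.

```python
# Join a list of strings, separating them with '-'
words = ['Cold', 'Spring', 'Harbor']
joined = '-'.join(words)
print(joined)  # 'Cold-Spring-Harbor'

# Split a string into a list at each comma
parts = 'Cold Spring Harbor, NY 11724'.split(',')
print(parts)   # ['Cold Spring Harbor', ' NY 11724']

# Splitting the joined string at '-' recovers the original list
print(joined.split('-') == words)  # True
```

The separator itself is removed by `.split()`, but surrounding whitespace is kept, which is why the second element above begins with a space.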


## Dictionaries

Dictionaries are one of Python's most useful data types. They can be thought of as a collection of key-value pairs that allows values to be rapidly looked up via keys. Keys can be any immutable (more precisely, hashable) object, such as a string or a number. Values can be anything.

In [None]:
# Dictionaries are defined using braces, colons, and commas


In [None]:
# Access dictionary elements using a "key" enclosed in brackets


In [None]:
# You can replace and add elements to a dictionary after it is created.
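
The three cells above (defining, accessing, and updating a dictionary) can be sketched as follows. The dictionary `person` and its keys are hypothetical and chosen only for illustration.

```python
# Define a dictionary using braces, colons, and commas
person = {'first': 'Barbara', 'last': 'McClintock'}

# Access a value using its key in brackets
print(person['last'])  # 'McClintock'

# Replace the value of an existing key
person['first'] = 'B.'

# Add a new key-value pair
person['city'] = 'Cold Spring Harbor'

print(person)
```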


In [None]:
# From a dictionary, you can get a list of both the keys and the values.


In [None]:
# If you pass a key that doesn't exist, you get an error.


In [None]:
# It is sometimes useful to get a default value instead of an error when a key doesn't exist
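
The last three cells can be illustrated with a small dictionary of the restriction sites mentioned in the exercises. This is a sketch; the name `sites` and the default value `'unknown'` are illustrative choices.

```python
sites = {'NheI': 'GCTAGC', 'MscI': 'TGGCCA'}

# Get lists of the keys and of the values
print(list(sites.keys()))    # ['NheI', 'MscI']
print(list(sites.values()))  # ['GCTAGC', 'TGGCCA']

# Looking up a key that doesn't exist raises a KeyError
try:
    sites['EcoRI']
except KeyError as err:
    print('KeyError:', err)

# .get() returns a default value instead of raising an error
print(sites.get('EcoRI', 'unknown'))  # 'unknown'
print(sites.get('NheI', 'unknown'))   # 'GCTAGC'
```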
    

In [None]:
# You can create a dictionary from a list of keys and values by using 'dict' and 'zip'
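
A minimal sketch of this pattern, again using the restriction sites from the exercises. The names `enzymes`, `recognition_seqs`, and `site_dict` are illustrative.

```python
enzymes = ['NheI', 'MscI']
recognition_seqs = ['GCTAGC', 'TGGCCA']

# zip() pairs up corresponding elements; dict() turns the pairs into a dictionary
site_dict = dict(zip(enzymes, recognition_seqs))
print(site_dict)  # {'NheI': 'GCTAGC', 'MscI': 'TGGCCA'}
```

If the two lists differ in length, `zip()` stops at the end of the shorter one, so the extra elements are silently dropped.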


## Exercises, part 2 of 2

**E2.4**: Create a dictionary called `rc_dict` that maps DNA bases to their complementary bases. I.e., A -> T, C -> G, etc. 

In [1]:
# Answer here

In [4]:
# To compute the reverse complement, we need to create a 'translation table',
# which is also a dictionary, but takes integer character codes (Unicode code points)
# as keys instead of single-character strings
rc_table = str.maketrans(rc_dict)
rc_table

**E2.5**: By passing `rc_table` to the string method `.translate()`, then using indexing with a step of -1, compute the reverse complement of the MCS sequence given above.

In [None]:
# Answer here

**E2.6**: We have not yet discussed sets. Using Google, figure out what `set` objects are and explain what they represent. In particular, explain why Python evaluates `{2,3,3} < {1,2,3}` as True.

In [None]:
# Answer here