# Lecture 15 2018-10-09: Comprehension, design, testing, hardening, efficiency

From specs to deliverables: development, debugging, hardening, and making code efficient.

This worksheet accompanies the lecture notes.

## Comprehension

Let's do some comprehension practice. 


### Squares

Write two code chunks, one with a for loop and one with a list comprehension, that return a list of the first n squares, then convert each into a function. Which of the four versions is better?

#### Write the code

In [1]:
# warm-up (not squares yet): even numbers below n, with a loop
mylist = []
n = 10
for i in range(n):
    if i % 2 == 0:
        mylist.append(i)
print(mylist)

[0, 2, 4, 6, 8]


In [2]:
[2*i for i in range(n)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [4]:
[i for i in range(n) if i%2 ==0]

[0, 2, 4, 6, 8]

In [5]:
# squares in comprehension
[i*i for i in range(n)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [3]:
# square loop function

In [4]:
# square comprehension function

#### Answers and more

In [5]:
n=100

In [6]:
# squares in a loop
return_value = []
for i in range(n):
    return_value.append((i+1)**2)
x_1 = return_value

In [7]:
# squares in comprehension
x_2 = [(x+1)**2 for x in range(n)]

In [8]:
# square loop function

In [9]:
def squares(n):
    return_value = []
    for i in range(n):
        return_value.append((i+1)**2)
    return return_value

In [10]:
# square comprehension function

In [11]:
def c_squares(n):
    return [(x+1)**2 for x in (list(range(n)))]

#### Efficiency

In [13]:
%timeit squares(n)

37.2 µs ± 695 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [14]:
%timeit c_squares(n)

32.3 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [15]:
%timeit [(x+1)**2 for x in list(range(n))]

32.9 µs ± 592 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [16]:
%timeit parm_list =[x for x in list(range(n))]; [x**2 for x in parm_list]

38.9 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
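The `%timeit` magic only exists in IPython. The standard-library `timeit` module can run a similar comparison from a plain script. This is a minimal sketch; timings depend on the machine, so no numbers are shown. One deliberate change from the cells above: `c_squares` iterates over `range(n)` directly rather than `list(range(n))`, since the comprehension can consume the range without building an intermediate list.

```python
import timeit

def squares(n):
    """Loop version: the first n squares, 1**2 .. n**2."""
    return_value = []
    for i in range(n):
        return_value.append((i + 1) ** 2)
    return return_value

def c_squares(n):
    """Comprehension version of the same computation."""
    return [(x + 1) ** 2 for x in range(n)]

n = 100
# Timing two functions only makes sense if they compute the same thing.
assert squares(n) == c_squares(n)

NUMBER = 1_000  # calls per measurement
for func in (squares, c_squares):
    # timeit.repeat returns one total time per repeat; keep the fastest.
    best = min(timeit.repeat(lambda: func(n), number=NUMBER, repeat=5))
    print(f"{func.__name__}: {best / NUMBER * 1e6:.1f} µs per call")
```

Reporting the minimum over repeats, rather than a mean, follows the `timeit` documentation's advice: the lowest value is the one least disturbed by other activity on the machine.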


### Codons in a string

Given a string *dna*, the codons are the substrings `dna[i:i+3]` (from index i up to, but not including, i+3) for every i that is a multiple of three.

In [17]:
dna = 'accgttggcaaaaaaggtc'
dna = dna.upper()

#### Write the code

#### Answers

In [18]:
[dna[i:i+3] for i in range(len(dna)-2) if i%3 == 0]

['ACC', 'GTT', 'GGC', 'AAA', 'AAA', 'GGT']
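The filter `if i%3 == 0` can also be expressed by giving `range` a step of 3, so only the multiples of three are generated in the first place. A minimal sketch; `codons_of` is a hypothetical helper name, not something defined in the notebook.

```python
def codons_of(dna):
    """Return the in-frame codons of dna (hypothetical helper).

    range(0, len(dna) - 2, 3) visits 0, 3, 6, ... directly, and stopping
    at len(dna) - 2 drops an incomplete trailing codon.
    """
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]

dna = 'accgttggcaaaaaaggtc'.upper()
print(codons_of(dna))  # ['ACC', 'GTT', 'GGC', 'AAA', 'AAA', 'GGT']
```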

### Translation of a string
Get the codons, then look up the amino acid each one encodes.

In [19]:
codons = ['TAG', 'CCT', 'TAT', 'CTT', 'CAG', 'GTA', 
          'GGT', 'ATT', 'TGT', 'ACC', 'GTC', 'CGT', 
          'AGG', 'GCA', 'TTG', 'AAG', 'AGT', 'CCC', 
          'ACG', 'GGC', 'TCG', 'AAC', 'GAC', 'GAT', 
          'ATA', 'TCC', 'TAC', 'GTT', 'ACA', 'ATC', 
          'CCA', 'CTG', 'GAA', 'TCA', 'CGG', 'AGC', 
          'CAA', 'CAC', 'GCC', 'TGC', 'CGC', 'TTA', 
          'GTG', 'ATG', 'CTC', 'ACT', 'TTT', 'GCT', 
          'CAT', 'TCT', 'AAA', 'TAA', 'GCG', 'CCG', 
          'GAG', 'GGA', 'TGA', 'GGG', 'TTC', 'TGG', 
          'AAT', 'AGA', 'CTA', 'CGA']
amino_acids = ['_', 'P', 'Y', 'L', 'Q', 'V', 'G', 'I', 'C', 
               'T', 'V', 'R', 'R', 'A', 'L', 'K', 'S', 'P', 
               'T', 'G', 'S', 'N', 'D', 'D', 'I', 'S', 'Y', 
               'V', 'T', 'I', 'P', 'L', 'E', 'S', 'R', 'S', 
               'Q', 'H', 'A', 'C', 'R', 'L', 'V', 'M', 'L', 
               'T', 'F', 'A', 'H', 'S', 'K', '_', 'A', 'P', 
               'E', 'G', '_', 'G', 'F', 'W', 'N', 'R', 'L', 'R']

codon_translation = dict(zip(codons, amino_acids))  # map each codon to its amino acid ('_' = stop codon)

#### Write the code

#### Answers

In [20]:
[codon_translation[c] for c in
     [dna[i:i+3] for i in
          range(len(dna)-2)
          if i%3 == 0
     ]
]

['T', 'V', 'G', 'K', 'K', 'G']
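The lookup `codon_translation[c]` raises `KeyError` for any codon outside the table, for example one containing an `N`. A hardened variant can use `dict.get` with a placeholder. This is a minimal sketch under stated assumptions: `translate` is a hypothetical name, `'X'` as the placeholder for unknown codons is an arbitrary choice, and the table is the standard genetic code (the same mapping as `codon_translation`) rebuilt compactly so the block runs on its own.

```python
from itertools import product

# Standard genetic code: codons in TCAG order, one letter each; '_' = stop codon,
# matching the notebook's convention.
CODON_TABLE = dict(zip(
    (''.join(p) for p in product('TCAG', repeat=3)),
    'FFLLSSSSYY__CC_W' 'LLLLPPPPHHQQRRRR' 'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG',
))

def translate(dna, table=CODON_TABLE, unknown='X'):
    """Translate dna codon by codon (hypothetical, hardened sketch).

    - lower-case input is accepted
    - an incomplete trailing codon is ignored
    - a codon missing from the table maps to `unknown` instead of raising
    """
    dna = dna.upper()
    return [table.get(dna[i:i + 3], unknown) for i in range(0, len(dna) - 2, 3)]

print(translate('accgttggcaaaaaaggtc'))  # ['T', 'V', 'G', 'K', 'K', 'G']
print(translate('ATGNNNTAA'))            # ['M', 'X', '_']
```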

### Compute code degeneracy

For each amino acid, collect the codons that encode it; the length of each list is how many codons encode that amino acid. We want a dict
>{ amino_acid : [codon1, ...], ... }

#### Write the code

#### Answers

In [21]:
reverse = {}
# want reverse[aa] = [c1, c2,...]
for codon, aa in codon_translation.items():
    reverse[aa] = reverse.get(aa,[])
    reverse[aa].append(codon)

print(reverse)

{'H': ['CAT', 'CAC'], 'D': ['GAC', 'GAT'], 'R': ['AGA', 'CGG', 'AGG', 'CGA', 'CGC', 'CGT'], 'M': ['ATG'], 'T': ['ACT', 'ACC', 'ACG', 'ACA'], 'L': ['CTT', 'CTA', 'TTG', 'CTC', 'CTG', 'TTA'], 'E': ['GAA', 'GAG'], 'F': ['TTT', 'TTC'], 'K': ['AAG', 'AAA'], 'Y': ['TAC', 'TAT'], '_': ['TAA', 'TAG', 'TGA'], 'Q': ['CAG', 'CAA'], 'G': ['GGG', 'GGC', 'GGT', 'GGA'], 'C': ['TGT', 'TGC'], 'W': ['TGG'], 'N': ['AAC', 'AAT'], 'P': ['CCT', 'CCA', 'CCC', 'CCG'], 'I': ['ATT', 'ATC', 'ATA'], 'V': ['GTC', 'GTT', 'GTA', 'GTG'], 'S': ['TCA', 'AGC', 'AGT', 'TCT', 'TCG', 'TCC'], 'A': ['GCT', 'GCC', 'GCG', 'GCA']}
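The `reverse.get(aa, [])` step can be delegated to `collections.defaultdict(list)`, which creates the empty list the first time a key is used. A minimal sketch, assuming the standard genetic code (the same mapping as `codon_translation`, rebuilt compactly so the block is self-contained); `degeneracy` is a hypothetical helper name, and sorting each codon list is an added choice that makes the result independent of dict iteration order.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code: codons in TCAG order, one letter each; '_' = stop codon.
CODON_TABLE = dict(zip(
    (''.join(p) for p in product('TCAG', repeat=3)),
    'FFLLSSSSYY__CC_W' 'LLLLPPPPHHQQRRRR' 'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG',
))

def degeneracy(table):
    """Invert a codon -> amino acid dict into amino acid -> sorted codons."""
    by_aa = defaultdict(list)  # a new amino acid starts as an empty list
    for codon, aa in table.items():
        by_aa[aa].append(codon)
    return {aa: sorted(codons) for aa, codons in by_aa.items()}

by_aa = degeneracy(CODON_TABLE)
print(by_aa['M'])  # ['ATG']
print(by_aa['_'])  # ['TAA', 'TAG', 'TGA']

# Number of codons per amino acid, most degenerate first.
sizes = sorted(((len(c), aa) for aa, c in by_aa.items()), reverse=True)
print(sizes[:3])   # [(6, 'S'), (6, 'R'), (6, 'L')]
```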


### Count codons

#### Write the code

#### Answers

In [22]:
[(codon, dna.count(codon)) for codon in
    [dna[i:i+3] for i in 
        range(len(dna)-2) 
            if i%3 == 0
    ]
]

[('ACC', 1), ('GTT', 1), ('GGC', 1), ('AAA', 2), ('AAA', 2), ('GGT', 1)]
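Two things about the answer above are worth noting: a repeated codon shows up once per occurrence (`('AAA', 2)` appears twice), and `str.count` is not frame-aware, so it also counts matches that start at an offset that is not a multiple of three. A sketch using `collections.Counter` over the in-frame codons avoids both; `count_codons` is a hypothetical helper name.

```python
from collections import Counter

def count_codons(dna):
    """Count in-frame codons, i.e. those starting at a multiple of three."""
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))

dna = 'accgttggcaaaaaaggtc'.upper()
codon_counts = count_codons(dna)
print(codon_counts['AAA'], codon_counts['ACC'])  # 2 1
print(len(codon_counts))                         # 5 distinct codons

# str.count finds 'AAA' in 'CAAAT' at offset 1, although the string's
# only in-frame codon is 'CAA'. A Counter returns 0 for absent keys.
print('CAAAT'.count('AAA'), count_codons('CAAAT')['AAA'])  # 1 0
```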

### Count codes
Count the number of times each code (a nucleotide or an ambiguity symbol) appears.

From these counts, get the percent ambiguity and the GC content.

In [23]:
nucleotides = {'A', 'C', 'G', 'T'}
dna = 'aayy?cccgtnnyymngg'.upper()

#### Write the code

#### Answers

In [24]:
# iterate over set(dna) so each distinct code is counted only once
counts = {code: dna.count(code) for code in set(dna)}
counts

{'?': 1, 'A': 2, 'C': 3, 'G': 3, 'M': 1, 'N': 3, 'T': 1, 'Y': 4}

In [25]:
# counts.get(c, 0): a sequence with no C or G gives 0 instead of a KeyError
gc_content = sum(counts.get(c, 0) for c in {'C', 'G'}) / len(dna)
gc_content

0.3333333333333333

In [26]:
# fraction (0 to 1) of positions that are not A, C, G or T
percent_ambiguity = 1 - sum(counts.get(c, 0) for c in nucleotides) / len(dna)
percent_ambiguity

0.5
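The cells above still fail on an empty string (division by zero). One way to harden them is to wrap both computations in a function that uses `collections.Counter`, which returns 0 for missing letters, and rejects empty input explicitly. A minimal sketch; `composition` is a hypothetical name, and raising `ValueError` on empty input is an assumed design choice (returning `None` would be another). Both values are fractions between 0 and 1; multiply by 100 for a percentage.

```python
from collections import Counter

NUCLEOTIDES = {'A', 'C', 'G', 'T'}

def composition(dna):
    """Return (gc_content, ambiguity) as fractions of len(dna).

    Raises ValueError for an empty sequence instead of dividing by zero.
    """
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    counts = Counter(dna)  # counts[x] is 0 for letters that never occur
    gc = (counts['G'] + counts['C']) / len(dna)
    unambiguous = sum(counts[n] for n in NUCLEOTIDES)
    return gc, 1 - unambiguous / len(dna)

print(composition('aayy?cccgtnnyymngg'))  # (0.3333333333333333, 0.5)
print(composition('ATAT'))                # (0.0, 0.0)
```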

## Software design

Go through HW 3 and 4 and discuss design ideas, functions, hardening, etc.