<a href="https://colab.research.google.com/github/SchachtmanLab/Transgenic-sorghum-sorgoleone/blob/master/%5BSTUDENT_COPY%5DMonday_afternoon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome back!

Let's go ahead and dive in to our afternoon session. Before we start, we highly recommend you pull up the cheat sheet from Day 1, which is available [here](https://drive.google.com/drive/folders/1JTZ_sJijmZXH1d5OxviX1gnxYIoSNBlz).

# Intro to loops

## `for` loops

As we discussed this morning, **iterables** are a special class of objects that can be accessed using **indexing**. Iterables are special because they can *return* their elements separately: for example, one at a time, or by *slices*.

Lists and strings are two examples of iterables.

In [1]:
# a quick review of creating and nesting lists
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
weekends = ['Saturday', 'Sunday']
week = [weekdays, weekends]

print('Weekdays:', weekdays)
print('Weekends:', weekends)

Weekdays: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
Weekends: ['Saturday', 'Sunday']


We can access elements in iterables using *indexing*. The first position of an iterable starts at the `0`th index due to *zero indexing*. For example, consider `weekdays`, which contains five string elements.

| iterable element | index position |
|------------------|----------------|
| Monday           | 0              |
| Tuesday          | 1              |
| Wednesday        | 2              |
| Thursday         | 3              |
| Friday           | 4              |

In [4]:
# first element in weekdays
print(weekdays[0])

# last element in weekdays
print(weekdays[-1])

# slice from the second element through the end of the list
print(weekdays[1:])

# slice from the start of the list up to (but not including) index 1
print(weekdays[:1])

# second sub-list in week
print(week[1])

# get first element from the second sub-list in week
print(week[1][0])

Monday
Friday
['Tuesday', 'Wednesday', 'Thursday', 'Friday']
['Monday']
['Saturday', 'Sunday']
Saturday


Although indexing allows us to access elements in iterable objects, it doesn't give us a way to step through every element in order.

For example, it would be really inefficient to call `len()` for each element of `weekdays`. How inefficient? Let's try it out.

In [5]:
# try it out:
# use len() and print() to count the length of each element in weekdays



Annoying, right? Thankfully, we can use something called a **`for` loop** to **iterate** through elements of a list or string.

The anatomy of a `for` loop is as follows:
```
for <ITEM> in <ITERABLE>:
	<EXECUTE CODE HERE>
```

First, we use the `for` keyword to begin our loop. `<ITEM>` is a **placeholder** variable that we use to represent each element in the iterable during the loop. You can name `<ITEM>` whatever you want: we suggest using placeholder names that make sense to you, given the iterable that you're looping through.

Next, we use the `:` operator to signal that we're transitioning into the code we want to be executed during the loop. The code that you want executed during the loop **must** go on the next line in an **indented block**: any code that is not in the indented block will be run normally (just once).

Python will diligently go through each element in your list or string, perform the code in the indented line(s), then move on to the next element: rinse and repeat.

Below, we're going to use the placeholder `day` to represent each element in `weekdays`. We'll print the length of each element once.

In [6]:
for day in weekdays:
    print(len(day)) # this is part of the loop

print("Now we are outside of the loop!")
print(len(day))
print('The weekdays list: ', weekdays) # this is NOT part of the loop

6
7
9
8
6
Now we are outside of the loop!
6
The weekdays list:  ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']


After the loop has gone through every value in the list, the placeholder `day` still holds the final element of the list:

In [7]:
print(day)

Friday


With just two lines, we can avoid needing to use indices to access each element of `weekdays`.
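
The same pattern works for strings, which a `for` loop visits one character at a time. The sketch below uses a made-up string, `codon`, and a made-up counter, `n_letters`; neither is part of the lesson's data.

```python
# `codon` and `n_letters` are hypothetical names used only for this example
codon = 'ATG'
n_letters = 0

for letter in codon:
    print(letter)              # runs once per character: A, then T, then G
    n_letters = n_letters + 1  # keep a running tally of the characters seen

print(n_letters)  # 3
```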


Let's move on to *nested* `for` loops. Nested `for` loops allow us to iterate over multiple iterables, which can be useful if we need to evaluate combinations of iterable elements.

![96 well plate](https://upload.wikimedia.org/wikipedia/commons/thumb/0/07/96-Well_plate.svg/453px-96-Well_plate.svg.png?20201120040527)

If you work in a lab without multichannels, you may be familiar with the tedious task of having to pipette across a 96-well plate. Imagine that you have to pipette across this whole plate, going row by row from A1 to H12: the following code roughly approximates the strategy you would take.

In [8]:
# warning: this is going to print a lot of text to output

# rows and columns of a 96 well plate
rows = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
columns = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

for row in rows:
	print('Start of row', row)
	for col in columns:
		print(row+col) # the nested for statement code needs to be indented one more level

Start of row A
A1
A2
A3
A4
A5
A6
A7
A8
A9
A10
A11
A12
Start of row B
B1
B2
B3
B4
B5
B6
B7
B8
B9
B10
B11
B12
Start of row C
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
C11
C12
Start of row D
D1
D2
D3
D4
D5
D6
D7
D8
D9
D10
D11
D12
Start of row E
E1
E2
E3
E4
E5
E6
E7
E8
E9
E10
E11
E12
Start of row F
F1
F2
F3
F4
F5
F6
F7
F8
F9
F10
F11
F12
Start of row G
G1
G2
G3
G4
G5
G6
G7
G8
G9
G10
G11
G12
Start of row H
H1
H2
H3
H4
H5
H6
H7
H8
H9
H10
H11
H12


Above, our nested `for` loop generated all possible combinations of `rows` and `columns` in only four lines of code.
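
As a rough check on that claim, the sketch below counts the combinations instead of printing them. The counter name `well_count` is our own and is not part of the original example.

```python
rows = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
columns = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

well_count = 0  # hypothetical counter

for row in rows:
    for col in columns:
        well_count = well_count + 1  # runs once per (row, column) pair

print(well_count)  # 8 rows x 12 columns = 96
```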

## Ranges

We can combine `for` loops with the built-in `range()` function to iterate over a specific numeric range.

The `range()` function takes up to three integer inputs: a start value (`0` by default), a stop value (which is *not* included in the range), and a step value (`1` by default).

In [None]:
# a range from 2 to 10, exclusive of 10
(range(2, 10))

As you can see, this iterable `range` object doesn't show all the values in its range by default: if we want to access each value in its range, we need to iterate over it.
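
For a quick look at the values without writing a loop, one option is to convert the range to a list with the built-in `list()` function. The sketch below also shows what happens when the start and step values are left out; the variable names are our own.

```python
two_to_nine = list(range(2, 10))    # start=2, stop=10 (10 itself is excluded)
zero_to_four = list(range(5))       # with one input, the start defaults to 0
every_fourth = list(range(2, 10, 4))  # step of 4

print(two_to_nine)   # [2, 3, 4, 5, 6, 7, 8, 9]
print(zero_to_four)  # [0, 1, 2, 3, 4]
print(every_fourth)  # [2, 6]
```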

In [9]:
print("First loop (all numbers)")
for number in range(5, 12):
    print(number)

print("Second loop (every third number)")
for number in range(5, 12, 3):
    print(number)

First loop (all numbers)
5
6
7
8
9
10
11
Second loop (every third number)
5
8
11


In [12]:
# try it out:
# print all even numbers between 20 and 30, including 20 and 30
for number in range(20, 32, 2):
    print(number)



20
22
24
26
28
30


One common strategy is to use `range()` in combination with `len()` to create a numeric range over the length of an existing iterable.

In [16]:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
print(len(alphabet))
for index in range(0, len(alphabet)):
    print(alphabet[index])

26
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z


`range()` can be useful for instances in which we want to use the *indices* of an iterable, rather than the iterable itself. This can be useful for obtaining interval-spaced elements or slices of an iterable.

In [17]:
# every other letter of the alphabet

alphabet = 'abcdefghijklmnopqrstuvwxyz'

for index in range(1, len(alphabet), 2):
  print(alphabet[index])

b
d
f
h
j
l
n
p
r
t
v
x
z


In [18]:
# three-letter interval slices of the alphabet

alphabet = 'abcdefghijklmnopqrstuvwxyz'

for index in range(0, len(alphabet), 3):
  print(alphabet[index:index+3])  # the slice stops just before index+3

abc
def
ghi
jkl
mno
pqr
stu
vwx
yz


# Logic & control flow

## Boolean logic checks
**Boolean logic** refers to a simple logic system where there are only two possible values: "true" and "false". In Python, Boolean logic is used to check for *equality* (ex. "is 5 + 3 equal to 8?"), *membership* ("is 5 in the range of integers between 1 and 7?"), and *conditionality* ("are there fewer than 8 people in the line for coffee?").

### Logical operators

In [19]:
# is 5 + 3 equal to 8?
print(5 + 3 == 8)
print(type(5 + 3 == 8))

True
<class 'bool'>


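The first thing you'll notice here is the **`==` operator**: this is the *equality* operator, which checks whether the values on its left and right sides are equal to each other. (Strictly speaking, Python uses the word *identity* for a different operator, `is`, which checks whether two names refer to the very same object.) The `==` operator is one of several operators that we can use to perform **logic checks**.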

Although `True` looks like it could be a string, it isn't: it's a special type of its own called a `bool`, which is short for Boolean.

*Tip: you can distinguish strings from booleans in 2 ways: booleans are not surrounded by quotation marks and show up in blue in Colab, while strings are surrounded by quotes and show up in orange in Colab.*

In [21]:
# is the string 'True' equal to the Boolean value True?

print('True' == (True))
print('True' == 'True')

False
True


As you might guess, `False` is also a `bool` type object. `True` and `False` are the only two Boolean values in Python: `True` indicates that the logic check has passed, and `False` indicates that the logic check has failed.

We can use several more familiar operators to perform logic checks with numeric values:
- `>`: Checks if the left value is greater than the right value.
- `<`: Checks if the left value is less than the right value.
- `>=`: Checks if the left value is greater than OR equal to the right value.
- `<=`: Checks if the left value is less than OR equal to the right value.
- `!=`: Checks if the left and right values are *not* equal to each other. (This should return the *opposite* value of what `==` returns.)
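
A minimal sketch of each operator in action is shown below. The variable names `left` and `right` are placeholders chosen for this example only.

```python
# hypothetical values for illustration
left = 3
right = 7

print(left > right)   # False: 3 is not greater than 7
print(left < right)   # True: 3 is less than 7
print(left >= 3)      # True: 3 is equal to 3
print(left <= right)  # True: 3 is less than 7
print(left != right)  # True: 3 and 7 are different values
print(left == right)  # False: the opposite of the != check above
```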

### Membership checks
Now that we understand how logical operators work, checking membership of an object in a data structure is a comically simple task: you just use the `in` operator.

In [22]:
mixed_list = ['one', 2, 3.0, 'four', 5]

print('four' in mixed_list) # Is 'four' in mixed_list?

print('seven' in mixed_list) # Is 'seven' in mixed_list?

True
False


Simple, right? Python interprets the `in` operator as a logic check for the *membership* of the element. If the element is present, Python returns `True`, and if not, Python returns `False`.
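
The `in` operator is not limited to lists. As a small sketch, the checks below apply it to a string (where it looks for a substring) and to a `range` object; the variable names are our own.

```python
has_an = 'an' in 'banana'        # substring check
has_nab = 'nab' in 'banana'      # these letters never appear in this order
five_in_range = 5 in range(1, 7)   # integers 1 through 6
seven_in_range = 7 in range(1, 7)  # the stop value is excluded

print(has_an)          # True
print(has_nab)         # False
print(five_in_range)   # True
print(seven_in_range)  # False
```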

In [35]:
# try it out:
# replace the comment ### REPLACE ME ### with the appropriate logic check
# for the following operations

# the \n is a newline character that we are using to make the strings print
# in a more human-readable fashion: you can read more in the [Optional] string
# formatting section

print("Is 9 equal to '9'?\n", 9 == '9')

print('Is 5 greater than 3?\n', 5 > 3)

print('Is 9 less than 4?\n', 9 < 4 )

print('Is 12 greater than or equal to 12?\n', 12 >= 12)

print('Is 4.333 less than or equal to 4.33333333?\n', 4.333 <= 4.33333333)

print('Is 5 equal to 5?\n', 5 == 5)

print('Is 5 not equal to 5?\n', (5 != 5))

myList = [1, 2, 3, 4, 5]
print('Is 3 in the list [1, 2, 3, 4, 5]?\n', (3 in myList))

print("Is the substring 'an' present in the string 'banana'?\n", 'an' in 'banana')



Is 9 equal to '9'?
 False
Is 5 greater than 3?
 True
Is 9 less than 4?
 False
Is 12 greater than or equal to 12?
 True
Is 4.333 less than or equal to 4.33333333?
 True
Is 5 equal to 5?
 True
Is 5 not equal to 5?
 False
Is 3 in the list [1, 2, 3, 4, 5]?
 True
Is the substring 'an' present in the string 'banana'?
 True


## Control flow

Now that we've learned how Python evaluates equality and membership, we can move on to how Python interprets *conditionality*. You can use **conditional statements** to control the operation of code that you may not always want to run. The use of conditional statements is an essential part of a programming strategy called **control flow**.

### Conditional statements

Let's start with the three main conditional statements, which will form the bulk of your control flow. The first conditional statement is the workhorse of conditionals:

- `if`: Code will be executed *if* the logic check passes (returns `True`).

The following two conditional statements can be used to create additional complexity in your conditionals. These **cannot** be used on their own: they must always follow an `if` statement.
- `else`: An optional secondary statement that must follow an `if` statement. Code will be executed if the preceding `if` statement's logic check fails (returns `False`).
- `elif` ("else if"): An optional secondary statement that must follow an `if` statement, but come *before* a closing `else` statement. If the preceding `if` statement's logic check fails, then the `elif` statement's logic check will be executed in the same manner as an `if` statement.

Below, we'll start with the simplest possible conditional execution.

In [36]:
mixed_list = ['one', 2, 3.0, 'four', 5]

# we want to iterate through each element in mixed_list
# and print a sentence if the element is a string

for element in mixed_list:
    if type(element) == str:  # str represents the string type
        print(element, "is a string.")

one is a string.
four is a string.


We directed Python to only run `print()` if the element was a string: given that there were only two strings in `mixed_list`, Python only printed text twice.

Now, let's throw in an `else` statement to print something else if the element is *not* a string.

In [37]:
mixed_list = ['one', 2, 3.0, 'four', 5, 6]

# we want to go through each element in mixed_list
# and print different sentences depending on the element type

for element in mixed_list:
    if type(element) == str:
        print(element, "is a string.")
    else:
        print(element, "is not a string, it is a", type(element))

one is a string.
2 is not a string, it is a <class 'int'>
3.0 is not a string, it is a <class 'float'>
four is a string.
5 is not a string, it is a <class 'int'>
6 is not a string, it is a <class 'int'>


Let's add another layer of complexity with the `elif` statement. `elif` statements come into play when the preceding `if` logic check fails: `elif` provides alternative "paths" for the code to take before defaulting to the catch-all `else` statement.

In [39]:
mixed_list = ['one', 2, 3.0, 'four', 5, 6]

for element in mixed_list:
    if type(element) == str:
        print(element, "is a string.")
    elif element % 2 == 0: # remember that % returns the remainder of the division
        print(element, "is an even number.")
    else:
        print(element, "is not a string, it is a", type(element))

one is a string.
2 is an even number.
3.0 is not a string, it is a <class 'float'>
four is a string.
5 is not a string, it is a <class 'int'>
6 is an even number.
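
One detail worth noting about an `if`/`elif`/`else` chain: only the *first* branch whose logic check passes is run, even if a later check would also pass. The sketch below uses a hypothetical value, `number`, and a hypothetical variable, `label`, to show this.

```python
number = 10  # hypothetical value: divisible by both 2 and 5

if number % 2 == 0:
    label = 'divisible by 2'   # this branch runs
elif number % 5 == 0:
    label = 'divisible by 5'   # skipped, even though 10 % 5 == 0
else:
    label = 'neither'

print(label)  # divisible by 2
```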


### Multiple logic checks
Sometimes logic checks are more complex than a simple one-off check. Perhaps you have multiple conditions that you need to be fulfilled, or maybe just one in a set: that's where complex conditional operators come in.

- `and`: Indicates that *all* of the logic checks must be passed to yield `True`.
- `or`: Indicates that *at least one* logic check must be passed to yield `True`.
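
As a minimal sketch, the two Boolean variables below (names made up for this example) show how `and` and `or` differ when only one of the checks passes.

```python
# hypothetical values for illustration
on_sale = True
is_favorite = False

print(on_sale and is_favorite)  # False: only one of the two checks passes
print(on_sale or is_favorite)   # True: at least one check passes
print(on_sale and on_sale)      # True: both checks pass
print(is_favorite or is_favorite)  # False: neither check passes
```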

In [40]:
# Decision making in the fruit section: it's complicated!

fruits = ['apple', 'orange', 'banana', 'grapes', 'kiwi']
colors = ['red', 'yellow', 'orange']
sale = ['apple', 'banana']
favorite = ['apple', 'grapes']

for fruit in fruits:
    print("Let me take a look at this", fruit, "...")
    if (fruit in fruits) and (fruit in colors):
        print("Isn't it funny that a(n)", fruit, "is both a fruit and a color?")
    elif (fruit in sale) or (fruit in favorite):
        print("I'll take this today.")
    else:
        print("Not this one.")

Let me take a look at this apple ...
I'll take this today.
Let me take a look at this orange ...
Isn't it funny that a(n) orange is both a fruit and a color?
Let me take a look at this banana ...
I'll take this today.
Let me take a look at this grapes ...
I'll take this today.
Let me take a look at this kiwi ...
Not this one.


# Exercises: Set A

**A1**:
Write a series of nested loops that will print every combination of the departmental titles, prefixes, and suffixes given below.

In [42]:
# how many generic department names can we create?

dept_titles = ['The Department of', 'The Division of', 'The Center for']
dept_prefixes = ['Molecular', 'Physical', 'Quantitative', 'Computational']
dept_suffixes = ['Biology', 'Chemistry', 'Biology', 'Chemistry']

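for title in dept_titles:
    for prefix in dept_prefixes:  # loop over the prefixes, not the suffixes
        for suffix in dept_suffixes:
            print(title, prefix, suffix)

### write your code below ###




The Department of Molecular Biology
The Department of Molecular Chemistry
The Department of Molecular Biology
The Department of Molecular Chemistry
The Department of Physical Biology
The Department of Physical Chemistry
The Department of Physical Biology
The Department of Physical Chemistry
The Department of Quantitative Biology
The Department of Quantitative Chemistry
The Department of Quantitative Biology
The Department of Quantitative Chemistry
The Department of Computational Biology
The Department of Computational Chemistry
The Department of Computational Biology
The Department of Computational Chemistry
The Division of Molecular Biology
The Division of Molecular Chemistry
The Division of Molecular Biology
The Division of Molecular Chemistry
The Division of Physical Biology
The Division of Physical Chemistry
The Division of Physical Biology
The Division of Physical Chemistry
The Division of Quantitative Biology
The Division of Quantitative Chemistry
The Division of Quantitative Biology
The Division of Quantitative Chemistry
The Division of Computational Biology
The Division of Computational Chemistry
The Division of Computational Biology
The Division of Computational Chemistry
The Center for Molecular Biology
The Center for Molecular Chemistry
The Center for Molecular Biology
The Center for Molecular Chemistry
The Center for Physical Biology
The Center for Physical Chemistry
The Center for Physical Biology
The Center for Physical Chemistry
The Center for Quantitative Biology
The Center for Quantitative Chemistry
The Center for Quantitative Biology
The Center for Quantitative Chemistry
The Center for Computational Biology
The Center for Computational Chemistry
The Center for Computational Biology
The Center for Computational Chemistry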

In [43]:
# tip: in python you can use the += operator to simultaneously add a value to
# a variable and update the variable in place

vowel_count = 0

for char in 'The quick brown fox jumped over the lazy dog':
    if char in 'aeiou':  # logic check for vowels
        vowel_count += 1

print(vowel_count)

12


**A2**: Let's consider some other examples of operations that you can do with a `for` loop. One common operation is counting the instances of an element in an iterable.

Below, write code that will loop through `sample_seq` and count the number of each nucleotide by updating the corresponding variable. Afterwards, print the percentage of each nucleotide in `sample_seq`.

In [45]:
sample_seq = 'TACGTAGCGTGGTCGCACAAGCACAGTAGATCCTCCCCGCGCATCCTATTTATTAAGTTAATTCT'

### write your code below ###
A_count = 0
T_count = 0
G_count = 0
C_count = 0

for nucleotide in sample_seq:
    if nucleotide == "A":
        A_count += 1
    elif nucleotide == "T":
        T_count += 1
    elif nucleotide == "G":
        G_count += 1
    elif nucleotide == "C":
        C_count += 1
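
# print the percentage of each nucleotide (:.1f rounds to one decimal place)
print(f'A: {A_count / len(sample_seq) * 100:.1f}%')
print(f'T: {T_count / len(sample_seq) * 100:.1f}%')
print(f'G: {G_count / len(sample_seq) * 100:.1f}%')
print(f'C: {C_count / len(sample_seq) * 100:.1f}%')

A: 24.6%
T: 29.2%
G: 18.5%
C: 27.7%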


**A3**: Let's put together Boolean logic and control flow to solve the "Hello, world" of control flow: a classic programming question called `fizzbuzz`.

Below, we've provided a list of integers called `fb_numbers`. Your task is to write code that will do the following things.
1. Loop through each element in `fb_numbers`.
2. For each element:
  * If the element is cleanly divisible by 2 (no remainder), print `'fizz'`
  * If the element is cleanly divisible by 5 (no remainder), print `'buzz'`
  * If the element is cleanly divisible by both 2 AND 5 (no remainder), print `'fizzbuzz'`

Only *one* of the above `'fizz'`/`'buzz'`/`'fizzbuzz'` should print per number.


In [207]:
fb_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# hints: refer to the numeric operations we learned about this morning
#        the order of your conditionals matters

### write your code below ###
fb_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for num in fb_numbers:
    if num % 2 == 0 and num % 5 == 0:  # Check divisibility by both 2 and 5 first
        print("fizzbuzz")
    elif num % 2 == 0:  # Then check divisibility by 2
        print("fizz")
    elif num % 5 == 0:  # Finally, check divisibility by 5
        print("buzz")


fizz
fizz
buzz
fizz
fizz
fizzbuzz


# More about functions

Earlier this morning, we learned to use built-in functions, like `print()`, `len()`, and `type()`. Over the course of this afternoon, we'll be extending our knowledge of functions to incorporate two new kinds of functions, which we'll broadly describe as:

- **Methods**: functions that are specific to their object type
- **Custom functions**: functions written by you!

Writing and/or using these functions will allow you to avoid repeating large chunks of code that could otherwise make your analyses tedious and unmanageable.

## Methods

**Methods** are a type of function that can be called directly on certain object types and data structures in Python. We can think of methods as useful built-in shortcuts that we can use to get information or manipulate our object/data.


### Methods for lists
We'll prime our intuition about methods by practicing some methods for lists. Lists are a useful data structure because they can be altered using methods: elements can be added and removed, and lists can even be combined with each other.

In [None]:
num_list = [5, 9, 2.3, 14, 3, 2, 10]
num_list.append(18)  # .append() adds a single value to the end of the list

print(num_list)

Notice a couple of things here about the *syntax* of using a method.
1. `.append()` goes *after* the variable name, with the `.` acting as a link between the object and the method. <br>This is because methods act upon the specific object that they're pointing to (in this case, `num_list`). You can say that methods operate *in reference to* the object that they're linked to.
2. `num_list` did not have to be updated for `18` to show up at the end of the list. <br>This is because methods for lists work **in place**, meaning that they directly modify the list contained within the variable and save it in place. Not all methods will operate in place, and it can be useful to note this on your cheat sheet as you go along.

Here are some of the key methods for modifying lists. **Do not try to memorize these methods, as they are available on your cheat sheet.** We have many data structures to learn about, so pace yourself 😊
- `.append()`: Adds a single element to the end of the list. Sometimes described as **pushing** a value to the list.
- `.extend()`: Adds 1+ elements contained in a data structure (like a list) to the end of the referenced list.
- `.sort()`: Sorts the list from min to max: strings will be sorted alphabetically, numerics will be sorted by value.
- `.reverse()`: Reverses the list.
- `.remove()`: Removes an element from the list.

> For a complete list of methods, click [here](https://www.programiz.com/python-programming/methods/list).

Again, all of the methods listed modify the list **in place**, meaning that you do *not* need to update the variable storing the list. This means that you can apply one method after another to the same list, each on its own line (we'll loosely call this **chaining** methods).
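
One consequence of working in place is that these methods do not hand back a new list. As a small sketch, the variable `result` below (a name chosen for this example) captures what `.sort()` returns: the special value `None`, which is covered later in this session.

```python
num_list = [5, 9, 2.3, 14, 3, 2, 10]

result = num_list.sort()  # sorts num_list itself and returns nothing useful

print(result)    # None
print(num_list)  # [2, 2.3, 3, 5, 9, 10, 14]
```

For this reason, writing `num_list = num_list.sort()` would replace the list with `None`, which is rarely what you want.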

In [48]:
# an example of chaining methods

num_list = [5, 9, 2.3, 14, 3, 2, 10]
print('Original list:', num_list)

# use three methods, one after another
num_list.extend([1, 1, 52, 4, 8, 0])
print('Extended:', num_list)
num_list.sort()
print('Sorted:', num_list)
num_list.reverse()
print('Reversed:', num_list)

print('Final version of list:', num_list)

Original list: [5, 9, 2.3, 14, 3, 2, 10]
Extended: [5, 9, 2.3, 14, 3, 2, 10, 1, 1, 52, 4, 8, 0]
Sorted: [0, 1, 1, 2, 2.3, 3, 4, 5, 8, 9, 10, 14, 52]
Reversed: [52, 14, 10, 9, 8, 5, 4, 3, 2.3, 2, 1, 1, 0]
Final version of list: [52, 14, 10, 9, 8, 5, 4, 3, 2.3, 2, 1, 1, 0]


In [55]:
# try it out:
# use print() to view the list at each step

num_list = [5, 9, 2.3, 14, 3, 2, 10]
print(num_list)
# use .reverse() to reverse num_list
num_list.reverse()


# use .extend() to add [1, 8, 1] to the list
num_list.extend([1, 8, 1])
print(num_list)

# use .sort() to sort num_list



# use .remove() to remove 2.3 from the list



# print the final version of num_list: is it what you expect?


[5, 9, 2.3, 14, 3, 2, 10]
[10, 2, 3, 14, 2.3, 9, 5, 1, 8, 1]


Not all methods change the contents of a list: some can just be used to access useful information about the contents of a list.

- `.count()`: Counts the instances of a certain element in the list.
- `.index()`: Returns the index of the first instance of the element in the list.
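
Here is a minimal sketch of both methods on a made-up list called `bases` (not part of the lesson's data). Note that the list is left unchanged afterwards.

```python
bases = ['A', 'T', 'G', 'A', 'C', 'A']  # hypothetical example list

a_count = bases.count('A')        # how many times 'A' appears
first_a = bases.index('A')        # position of the first 'A'
first_g = bases.index('G')        # position of the first 'G'

print(a_count)  # 3
print(first_a)  # 0
print(first_g)  # 2
print(bases)    # ['A', 'T', 'G', 'A', 'C', 'A']
```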

In [60]:
# try it out:

# print the number of 1 (ones) that exist in num_list
num_list.count(1)
print(num_list.count(1))
# print the index of the first instance of 1


2


### Methods for strings
Now that we've got the hang of how methods are called in relation to objects, we can try out some methods that are specific to strings. There are [far too many methods](https://www.w3schools.com/python/python_ref_string.asp) for strings for us to cover in one session alone, so we'll only get through a few today and leave the rest for if/when they become important for our purposes. (You don't need to memorize these, so just enjoy the ride!)

- `.count()`: Counts the instances of a certain **substring** (a short string of at least one character, contained within the main string).
- `.replace()`: Given a "target" substring and a replacement string, it will replace each instance of the target.
- `.split()`: Given a "target" substring, it will split the input string at each instance of the target, returning a list of split strings.
- `.join()`: Given a "connector" string, it will join an input *list* of strings with the connector string, returning a singular conjoined string.
- `.find()`: Given a substring, it will return the index of the *first* instance of the substring. If the substring is not found, it will return the integer `-1`.

Unlike methods for lists, methods for strings **do not** modify the string in place. Because of this distinction, string methods can be a little more complex than list methods, so we'll go through them one by one so you can understand the quirks. Again, you don't need to have all of these memorized, so just follow along!



In [61]:
# a very imaginary and very short mRNA string

imaginary_seq = 'AGGCATTTAGCATGCATGTAACGATGCTGCGCGTTCA'

In [62]:
# let's start by replacing all the T's with U's
# using .replace()

# 'T' is the target, 'U' is the replacement
print(f"T replaced with U: {imaginary_seq.replace('T', 'U')}")

print(f"But imaginary_seq is *not* modified in place: {imaginary_seq}")  # checks the value of imaginary_seq

T replaced with U: AGGCAUUUAGCAUGCAUGUAACGAUGCUGCGCGUUCA
But imaginary_seq is *not* modified in place: AGGCATTTAGCATGCATGTAACGATGCTGCGCGTTCA


In [63]:
# we have to *manually* update imaginary_seq if we want the string saved

imaginary_seq = imaginary_seq.replace('T', 'U')
print(f'Once imaginary_seq is updated: {imaginary_seq}')

Once imaginary_seq is updated: AGGCAUUUAGCAUGCAUGUAACGAUGCUGCGCGUUCA


In [64]:
# next, let's try to see if there are any start codons (AUG) in this sequence
# using .find()

imaginary_seq.find('AUG')  # if no AUG exists, it'll return -1

11

In [65]:
# now that we know that at least one start codon exists
# let's count how many instances there are
# using .count()

print(f"There are {imaginary_seq.count('AUG')} instances of AUG.")

There are 3 instances of AUG.


In [67]:
# we can break apart the sequence at each instance of AUG
# using .split()

split_seqs = imaginary_seq.split('AUG')
print(f"Substrings after splitting: {split_seqs}")
print(f"imaginary_seq is not modified in place: {imaginary_seq}")

Substrings after splitting: ['AGGCAUUUAGC', 'C', 'UAACG', 'CUGCGCGUUCA']
imaginary_seq is not modified in place: AGGCAUUUAGCAUGCAUGUAACGAUGCUGCGCGUUCA


In [68]:
# we can also reconstruct the sequence, joining the strings in split_seqs
# together, using AUG as our "connector" string
# using .join()

reconstructed_seq = 'AUG'.join(split_seqs)

# syntax note: .join() is a method for *strings*
# that means that the target of .join() must be a string
# split_seqs is a list, which means that 'AUG' must be our target

print('After joining:', reconstructed_seq)

After joining: AGGCAUUUAGCAUGCAUGUAACGAUGCUGCGCGUUCA


In [70]:
# try it out:
# use a logic check to see if reconstructed_seq is the same as imaginary_seq

reconstructed_seq == imaginary_seq

True

In [73]:
# resetting our strings:
imaginary_seq = 'AGGCATTTAGCATGCATGTAACGATGCTGCGCGTTCA'
catless = 'AGGGGTAACGATGCTGCGCGTTCATTAG'

# try it out:
# count each instance of the substring 'CAT'
print(f"There are {imaginary_seq.count('CAT')} instances of CAT in imaginary_seq.")
print(f"There are {catless.count('CAT')} instances of CAT.")


There are 3 instances of CAT in imaginary_seq.
There are 1 instances of CAT.


In [76]:
# try it out:
# split imaginary_seq at each instance of 'CAT'
# and save the resultant list to a variable called imaginary_cats
imaginary_cats = imaginary_seq.split('CAT')
imaginary_cats

['AGG', 'TTAG', 'G', 'GTAACGATGCTGCGCGTTCA']

In [78]:
# try it out:
# sort imaginary_cats alphabetically
# (this will require a method for lists, not strings!)

imaginary_cats.sort()
imaginary_cats

['AGG', 'G', 'GTAACGATGCTGCGCGTTCA', 'TTAG']

In [81]:
# try it out:
# join imaginary_cats using an *empty* string ('')
# and save the result in a variable called no_cats
no_cats = ''.join(imaginary_cats)

In [88]:
# try it out:
# let's mix in some control flow and logic checks:
# 1) check if no_cats is equal to catless
# 2) if no_cats is equal to catless, print the string "No cats!"
if no_cats == catless:
    print("No cats!")
else:
    print("yes, cats")

catless

No cats!


'AGGGGTAACGATGCTGCGCGTTCATTAG'

## Custom functions

In Python, custom functions must be **defined** by using a `def` statement to signal that we are defining a new function. We follow `def` by our desired function name, then parentheses for the function's inputs, if it has any. Just like with `for` loops, the names provided for inputs are simply placeholder names that represent your input in the function's code.

After that, we use the `:` operator to signal that we're transitioning into the code we want to be executed upon running the function.

```
def function_name(input_1, input_2... input_n):
  # code goes here
  # each line is indented
  # just like a for loop
```

Last but not least, any desired output of a function must be specified with a `return` statement. Without a `return` statement, the code will run, but the function itself will not output any objects or values. This is easier shown than explained, so let's go ahead and run the cell below:

In [90]:
def sum_then_square(a, b):
    # This function will coerce a and b to floats, sum the numbers a and b, then square the sum.
    a = float(a)
    b = float(b)
    squared_sum = (a + b)**2
    print(squared_sum)

You'll notice that running the above code cell does not show any output, similar to defining a variable.

Let's go ahead and test our function, using the values `2` and `5`.

In [93]:
# try it out:
# run the function sum_then_square with 2 and 5
sum_then_square(2, 5)


49.0


Nice! Now let's try running the function and saving its output in a new variable called `summed_square`.

In [94]:
# try it out:
# save the output of sum_then_square(2, 5) to a variable called summed_square
# then print the value of summed_square



Although the squared sum was *printed*, the value of `summed_square` is `None`, not `49.0`. This is because `sum_then_square()` only calls `print()`: it does *not* have a `return` statement.

If you don't return something explicitly, Python will default to returning the special object `None`, which has its own special `NoneType`. This is why nested `print()` calls don't work: `print()` doesn't have a defined return, so it returns `None` by default. Thus, printed values cannot be stored or interacted with: they're for display only.
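
The same behavior can be seen with any function that lacks a `return` statement. The sketch below uses a hypothetical function, `shout()`, and a hypothetical variable, `result`, purely for illustration.

```python
def shout(word):
    # hypothetical function: it prints, but has no return statement
    print(word + '!')

result = shout('hello')  # prints hello!

print(result)        # None
print(type(result))  # <class 'NoneType'>
```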

In [95]:
print(print("What happens with a nested print?"))

What happens with a nested print?
None


If we want to actually save or interact with values generated in the function's code, we need to specify that with a `return` statement.

In [99]:
# we can update functions by re-defining them, just like updating a variable by re-assigning it

def sum_then_square(a, b):
    # This function will coerce a and b to floats, sum the numbers a and b, then square the sum.
    a = float(a)
    b = float(b)
    squared_sum = (a + b)**2
    print(squared_sum)
    return squared_sum

In [98]:
# try it out:
# save the output of sum_then_square(2, 5) to a variable called summed_square
# then print the value of summed_square
# you can use the same code as you did before!



While somewhat annoying to new learners, it can be quite useful to use `print()` to show human-readable information in the output while retaining "useful" raw values in the returned output. When in doubt, make sure to use a `return` statement to output the information that you wish to store for future use.
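
To make the difference between printing and returning concrete, here is a minimal sketch. The two function names (`show_only` and `show_and_return`) are made up for this illustration and are not used elsewhere in the notebook.

```python
def show_only(a, b):
    # prints the sum, but has no return statement
    print(a + b)

def show_and_return(a, b):
    # prints the sum and also returns it
    total = a + b
    print(total)
    return total

printed = show_only(2, 5)         # displays 7
returned = show_and_return(2, 5)  # displays 7

print(printed is None)  # True: nothing was returned, so Python used None
print(returned + 1)     # 8: the returned value can be stored and reused
```

Both calls display the same thing, but only the second one hands a value back that can be saved in a variable.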

Let's do a quick exercise to make sure you recall the major points of custom functions.

The code cell below contains *imperfect* code that will prevent it from running and/or generating the desired output, which is:

```
String A is  9  characters long.
String B is  11  characters long.
String C is  7  characters long.
The actual saved value of shortest was:  AATGATC
```

In [101]:
# try it out:
# 1) run this cell to see the error message
# 2) correct the error(s) to generate the desired output (see above)

def min_seq(a, b, c):
    # Takes three string sequences and returns the shortest sequence in the trio.

    print(f'String A is {len(a)} characters long.')
    print(f'String B is {len(b)} characters long.')
    print(f'String C is {len(c)} characters long.')

    # find the shortest length of all sequences
    length = [len(a), len(b), len(c)]
    shortest_length = int(min(length))

    # return the sequence with the shortest length
    if len(a) == shortest_length:
        return a
    elif len(b) == shortest_length:
        return b
    elif len(c) == shortest_length:
        return c
    else:
        return None

### do not modify below ###
shortest = min_seq('TAGGATGAA', 'TTATAGCTACG', 'AATGATC')
print('The actual saved value of shortest was: ', shortest)
### do not modify above ###

String A is 9 characters long.
String B is 11 characters long.
String C is 7 characters long.
The actual saved value of shortest was:  AATGATC


# Exercises: Set B



**B1**: Edit the skeleton code below to create a nested list called `nested_plate` that contains one list for each row. The rows will be labelled A, B, or C, and the columns will be labelled 0, 1 ... 9. The output should be:
```
[['A0', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9'],
['B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9'],
['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9']]
```

In [None]:
# edit the skeleton code below to create a nested list called nested_plate
# that contains one list for each row

nested_plate = []  # an empty list: append lists to this to create a nested list

rows = ['A','B','C']
columns = list(range(10))

for (write your loop here):  # loop through rows
	sub_row = []
	for (write your loop here):  # loop through columns
		sub_row.append()  # build up the sub_row
	nested_plate.append()  # build up the full plate

nested_plate

In [103]:
### Solution ###
nested_plate = []  # an empty list: append lists to this to create a nested list

rows = ['A','B','C']
columns = list(range(10))

for row in rows:  # loop through rows
	sub_row = []
	for column in columns:  # loop through columns
		sub_row.append(row + str(column))  # build up the sub_row
	nested_plate.append(sub_row)  # build up the full plate

nested_plate

[['A0', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9'],
 ['B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9'],
 ['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9']]

## B2

Write a function called `list_mean()` that takes a list of numbers and does the following:
1. Prints the text `'The mean value is'`, followed by the mean value.
2. Returns the mean value.

In [107]:
num_list = [5, 9, 2.3, 14, 3, 2, 10]

# hints:
# 1. you can use sum() to add all elements in a list
# 2. you can use len() to determine the length of an object with multiple elements
# 3. remember that you can print multiple things with one print() call
#    just separate your inputs with commas!

### write your code below ###
def list_mean(x):
    b = sum(x)
    t = b/ len(x)
    print("The mean value is", t)
    return t

list_mean(num_list)

The mean value is 6.471428571428571


6.471428571428571

## B3

Write a function called `list_median()` that takes a list and does four things:

1. Sorts the list using the `.sort()` method.
2. Obtains the median value of the list using indexing, and assigns it to a variable called `median_value`. <br>(For now, assume that *we are only going to work with lists that have odd numbers of elements*.)
3. Prints the text `'The median value is'`, followed by the median value.
4. Returns the median value.

In [117]:
# hints:
# the median index can be found using the formula ((n+1)//2)-1
# use // (integer division) so that the result is an int: a float can't be used as an index
# the extra -1 accounts for zero-indexing :)

num_list = [5, 9, 2.3, 14, 3, 2, 10]

### write your code below ###



Using a `for` loop, apply `list_median()` to each list inside `nested_nums`.

In [118]:
nested_nums = [[8, 4, 2], [7, 21, -3], [6.8, 3, 9]]

### write your code below ###


# More data structures

Let's take a quick step back to data structures. This morning we taught you about lists, which are a simple data structure that can contain a variety of data types (strings, integers, floats).

We're now going to move on to some more complex data structures: each of these data structures sacrifices some of the flexibility afforded by a list in order to gain a boost in speed and/or accessibility. We'll broadly describe these gains as **efficiency**, in order to get you to start thinking about picking *efficient* data structures to use in your code.

## Tuples
Our first new data structure is called a **tuple** (pronounced `tuhp-pull`). We can consider tuples to be sort of like lists, because there are a lot of similarities between the two.
1. **Both can contain multiple values.** They don't *have* to, but they can.
2. **Both can contain multiple types.** Again, they don't *have* to.
3. **Both can be accessed using indexing.** Remember zero-indexing though!
4. **Both can be nested.** You can make a tuple of tuples, or even combine tuples and lists: a list of tuples, a tuple of lists!
5. **Both are iterable.** You can loop over the contents of a tuple just like you would a list.
6. **Both have handy `.count()` and `.index()` methods.** (In fact, these are the [only two methods](https://www.programiz.com/python-programming/methods/tuple) for tuples. We'll discuss why that is in a second.)

In [120]:
num_tuple = (5, 0, 9.2, 8, 0)  # tuples are defined using parentheses
print('The tuple:', num_tuple)

print(num_tuple[0])  # first element
print(num_tuple[-1])  # last element

The tuple: (5, 0, 9.2, 8, 0)
5
0


The key difference between a list and a tuple is that a tuple is **immutable**: tuples can't be changed once they're created. This means that you can't use methods to append, extend, sort, or pop elements from a tuple: the only two methods available are `.count()` and `.index()`, neither of which changes the tuple.
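
As a quick illustration of immutability, the sketch below tries to overwrite one element of a tuple. The variable names (`point`, `new_point`) are made up for this example.

```python
point = (5, 0, 9.2)

try:
    point[0] = 99  # item assignment is not allowed on a tuple
except TypeError as err:
    print('TypeError:', err)

# to "change" a tuple, build a new one instead
new_point = (99,) + point[1:]
print(point)      # the original is untouched
print(new_point)
```

The same assignment would work without complaint on a list, which is exactly the flexibility that tuples give up.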

> **"Why would you choose to use a tuple instead of a list if you can't actually do anything to them?"**<br>
It turns out that tuples are **more efficient** than lists, specifically *because* they can't be modified. As a general rule for Python's built-in data structures, the more flexibility a structure permits, the less computationally efficient it is to create the structure and access its elements. In other words, your computer can allocate less memory for a tuple because it knows that the tuple's size will not change. This **efficiency-flexibility tradeoff** is a key motivator for selecting appropriate data structures in your work.

Beyond efficiency reasons, tuples are important for you to know because they're the default data structure for returning multiple values in functions.

In [121]:
def summarize_tuple(input_list):
    # Takes a list of numbers (requires list_mean() from B2 and list_median() from B3)
    # and returns three key summary values:
    # 1. The length
    # 2. The median
    # 3. The mean
    print('Output is in order: length, median, mean')

    return len(input_list), list_median(input_list), list_mean(input_list)

summarize_tuple(num_list)

Output is in order: length, median, mean


NameError: name 'list_median' is not defined

In [122]:
out = summarize_tuple(num_list)

Output is in order: length, median, mean


NameError: name 'list_median' is not defined

In [None]:
print(type(out))  # as promised, the output of elements from a function is a tuple by default
print(out)

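Once `list_median()` from exercise B3 is defined, the values are returned in a 3-member tuple: separating values with commas in a `return` statement packs them into a tuple automatically. This default makes sense if you consider *efficiency*: it's more efficient to return values from a function as a tuple, unless otherwise specified. It's like Occam's Razor, but for efficiency in your functions.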
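
Here is a self-contained sketch of the same idea that does not depend on the exercise functions. The helper name `summarize` is hypothetical, and it assumes a list with an odd number of elements, as in B3.

```python
def summarize(numbers):
    # hypothetical helper: returns the length, median, and mean of an odd-length list
    ordered = sorted(numbers)            # sorted() leaves the input list unchanged
    median = ordered[len(ordered) // 2]  # middle element (odd length assumed)
    mean = sum(numbers) / len(numbers)
    return len(numbers), median, mean    # the commas pack these into one tuple

values = [5, 9, 2.3, 14, 3, 2, 10]

out = summarize(values)
print(type(out))  # <class 'tuple'>

# tuple unpacking: one variable per returned value
n, median, mean = summarize(values)
print(n, median, mean)
```

Unpacking is often more readable than indexing into `out`, because each returned value gets its own descriptive name.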

To sum it up, tuples are good data structures if you value efficiency and don't anticipate needing to add, remove, or otherwise alter the structure of your data.

## Sets
Let's move on to sets. **Sets** are a type of data structure that only contains unique values.

You can create a set directly using curly brackets (`{}`), or coerce an existing object into a set. In the latter case, any duplicate values will be dropped, leaving only unique values in the set.

In [123]:
# Method 1: Create a set using curly brackets
groceries = {'grapes', 'milk', 'eggs', 'apple', 'salmon', 'bread'}  # initialized with curly brackets

print(groceries)
print(type(groceries))

{'bread', 'milk', 'eggs', 'grapes', 'salmon', 'apple'}
<class 'set'>


In [127]:
# Method 2: Create a set from an existing object

# example: tuple with duplicate elements
fruit = ('apple', 'orange', 'banana', 'apple', 'grapes', 'orange')
print('The original tuple:', fruit)
print('The coerced set:', set(fruit))  # duplicate values dropped


The original tuple: ('apple', 'orange', 'banana', 'apple', 'grapes', 'orange')
The coerced set: {'banana', 'grapes', 'apple', 'orange'}


In [128]:
# Coercing a single string into a set yields interesting results...
print('Turning a single string into a set:', set('apple'))

Turning a single string into a set: {'l', 'e', 'a', 'p'}


Once you've created a set, you can modify it *in place* using set methods, just like a list: in this aspect, both sets and lists are **mutable**.
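
A minimal sketch of in-place modification, using the built-in `.add()` and `.discard()` set methods. The `basket` set is made up for this example.

```python
basket = {'apple', 'milk'}

basket.add('eggs')      # add one element in place
basket.add('apple')     # already present, so nothing changes
basket.discard('milk')  # remove an element if it is present
print(basket)           # order may differ: sets are unordered

# careful: empty curly brackets create a dictionary, not a set
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```

Because sets are unordered, they do not support indexing: you can loop over a set or check membership with `in`, but not ask for "element 0".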

Set methods are super useful for performing comparisons or other **set operations** between sets. They're more concise compared to looping through logic and membership checks, and they're also remarkably efficient due to the way that sets are created/stored by Python. Say goodbye to the days of having to use `Ctrl/Cmd + F` to manually check one list of things against another!

Here are some of the more useful set methods:

- `.difference()`: Returns a set containing objects that are found in the target set but *not* in the input set.
- `.intersection()`: Returns a set containing objects found in both sets.
- `.union()`: Returns a set with all objects in both sets.
- `.issubset()`: Performs a logic check to see if the target set is a *subset* of the input set.
- `.issuperset()`: Performs a logic check to see if the target set is a *superset* of the input set.

A full list of set methods is available [here](https://www.programiz.com/python-programming/methods/set).
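
The methods listed above can be sketched on two small sets. The set names and contents (`lab_a`, `lab_b`) are illustrative only.

```python
lab_a = {'eIF1', 'eIF3', 'eRF1'}
lab_b = {'eIF3', 'eRF1', 'eEF2'}

print(lab_a.difference(lab_b))    # in lab_a but not in lab_b
print(lab_b.difference(lab_a))    # the order of the two sets matters
print(lab_a.intersection(lab_b))  # in both sets
print(lab_a.union(lab_b))         # in either set

print({'eIF3'}.issubset(lab_a))    # True
print(lab_a.issuperset({'eIF3'}))  # True
```

Note that `.difference()` is the only one of these where swapping the target and input sets changes the answer.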

## Dictionaries

So far, we've discussed lists and tuples, which allow us to create flat or nested structures to contain our data. We've also discussed sets, which allow us to efficiently curate and store unique values.

```
sample_list = ['element1', 'element2', 'element3'] # square brackets

sample_tuple = ('element1', 'element2', 'element3') # parentheses

sample_set = {'element1', 'element2', 'element3'} # curly brackets
```

A **dictionary** is a data structure that efficiently stores values that are associated with each other. These values are stored as **key-value pairs**.
- A *key* must be an immutable (more precisely, *hashable*) object: it can be a string, integer, float, or even a tuple of such objects, but not a mutable data structure (lists, sets, dictionaries). **Each key in a dictionary must be unique.**
- A *value* can be any object: it can be a single object or a data structure containing multiple objects, even multiple objects of different types. **Values do not need to be unique.**

It may help to think of a dictionary as a set that has objects associated with each unique item in the set.
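
The rules for keys can be checked with a short sketch. Keys must be hashable, which in practice means immutable, so a tuple of strings and numbers qualifies while a list does not. All of the names and values below are made up for illustration.

```python
# repeating a key keeps only the last value assigned to it
codon_dict = {'ATG': 'start', 'TAA': 'stop', 'ATG': 'Met'}
print(codon_dict)  # {'ATG': 'Met', 'TAA': 'stop'}

# a tuple of immutable objects can serve as a key
well_dict = {('A', 1): 'control', ('A', 2): 'treated'}
print(well_dict[('A', 2)])

# a list cannot, because lists are mutable
try:
    bad_dict = {['A', 1]: 'control'}
except TypeError as err:
    print('TypeError:', err)
```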

Just like with the previous data structures, you can construct dictionaries in a couple of different ways. The first way is using curly brackets, which we just used to define sets.

In [130]:
# Method 1: defining a dictionary from scratch
# dictionaries are defined using curly brackets, like sets
# each key-value pair is defined as key: value

eukaryotic_dict = {
    'eIF1': 'initiation factor',
    'eIF3': 'initiation factor',
    'eIF4': 'initiation factor',
    'eIF5': 'initiation factor',
    'eRF1': 'release factor',
    'eRF3': 'release factor'
}

eukaryotic_dict

{'eIF1': 'initiation factor',
 'eIF3': 'initiation factor',
 'eIF4': 'initiation factor',
 'eIF5': 'initiation factor',
 'eRF1': 'release factor',
 'eRF3': 'release factor'}

If we have two coordinated lists of the same length (for example, x-y coordinate pairs), we can efficiently create a dictionary by using the built-in `zip()` and `dict()` functions.

In [134]:
# Method 2: defining a dictionary from existing lists
# `zip()` pairs the lists together
# `dict()` coerces the paired lists into a dictionary

eukaryotic_genes = [
    'eIF1',
    'eRF1',
    'eRF3',
    'eIF3',
    'eIF4',
    'eIF5'
]  # gene names

eukaryotic_desc = [
    'initiation factor',
    'release factor',
    'release factor',
    'initiation factor',
    'initiation factor',
    'initiation factor'
]  # gene descriptions

eukaryotic_dict = dict(zip(eukaryotic_genes, eukaryotic_desc))
eukaryotic_dict


{'eIF1': 'initiation factor',
 'eRF1': 'release factor',
 'eRF3': 'release factor',
 'eIF3': 'initiation factor',
 'eIF4': 'initiation factor',
 'eIF5': 'initiation factor'}
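
To see what `zip()` produces before `dict()` gets involved, the sketch below coerces the pairs into a list. The second list is deliberately one entry short to show that `zip()` stops at the end of the shorter input; the list names are made up for this example.

```python
genes = ['eIF1', 'eRF1', 'eRF3']
descriptions = ['initiation factor', 'release factor']  # one entry short on purpose

pairs = list(zip(genes, descriptions))
print(pairs)        # [('eIF1', 'initiation factor'), ('eRF1', 'release factor')]
print(dict(pairs))  # 'eRF3' is silently dropped
```

Since no error is raised for mismatched lengths, it can be worth comparing `len()` of both lists before zipping them.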

Once the dictionary is constructed, you can access the value associated with a key by simply using the key as an index.

In [135]:
# using eIF3 as the key index will return the associated value
print(eukaryotic_dict['eIF3'])

initiation factor
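
Indexing with a key that is not in the dictionary raises a `KeyError`. The sketch below shows two common ways to guard against that: a membership check with `in`, and the built-in `.get()` method. The dictionary and the key `'eEF9'` are made up for this example.

```python
factor_dict = {'eIF1': 'initiation factor', 'eRF1': 'release factor'}

print('eIF1' in factor_dict)  # True: membership checks look at the keys
print('eEF9' in factor_dict)  # False

try:
    factor_dict['eEF9']  # indexing with a missing key raises an error
except KeyError as err:
    print('KeyError:', err)

print(factor_dict.get('eEF9'))             # None
print(factor_dict.get('eEF9', 'unknown'))  # a fallback value of your choice
```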


Like lists and sets, dictionaries are also mutable and can be modified *in place*. The easiest way to add or change a key-value pair is by **explicit assignment**.

In [138]:
# Using explicit assignment to add a single key-value pair
eukaryotic_dict['eIF5B'] = 'initiation factor'
print('After eIF5B is added:')
eukaryotic_dict

After eIF5B is added:


{'eIF1': 'initiation factor',
 'eRF1': 'release factor',
 'eRF3': 'release factor',
 'eIF3': 'initiation factor',
 'eIF4': 'initiation factor',
 'eIF5': 'initiation factor',
 'eIF5B': 'initiation factor'}

In cases where you want to add or update multiple values, it's easier to use the `.update()` method.

In [139]:
# Using .update() to add multiple key-value pairs
more_genes = {
    'eIF6': 'initiation factor',
    'eEF1': 'elongation factor',
    'eEF2': 'elongation factor',
    'eRF1': 'release factor',
    'eRF3': 'release factor'
}  # a new dictionary with more entries

eukaryotic_dict.update(more_genes) # updates the dict in place
print('After multiple entries are added:')
eukaryotic_dict


After multiple entries are added:


{'eIF1': 'initiation factor',
 'eRF1': 'release factor',
 'eRF3': 'release factor',
 'eIF3': 'initiation factor',
 'eIF4': 'initiation factor',
 'eIF5': 'initiation factor',
 'eIF5B': 'initiation factor',
 'eIF6': 'initiation factor',
 'eEF1': 'elongation factor',
 'eEF2': 'elongation factor'}
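
One detail worth knowing: when `.update()` receives a key that already exists, it overwrites the old value rather than adding a duplicate. A small sketch with a made-up `status` dictionary:

```python
status = {'eIF1': 'unknown', 'eRF1': 'release factor'}

# 'eIF1' already exists, so its value is replaced; 'eEF1' is new, so it is added
status.update({'eIF1': 'initiation factor', 'eEF1': 'elongation factor'})
print(status)
# {'eIF1': 'initiation factor', 'eRF1': 'release factor', 'eEF1': 'elongation factor'}
```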

Last but not least, dictionaries have three methods for accessing their contents "in bulk".

* `.items()`: Returns the key-value pairs as an iterable of tuples.
* `.keys()`: Returns the keys as an iterable.
* `.values()`: Returns the values as an iterable.

Note that we refer to the returned objects as generic "iterables". Although you *can* use these iterables in `for` loops or other operations where you want the elements one at a time, you can't retrieve individual elements by index unless you coerce the iterable into a structure that supports indexing (list, tuple).

In [140]:
print('The .keys() iterable object:')
print(eukaryotic_dict.keys())


The .keys() iterable object:
dict_keys(['eIF1', 'eRF1', 'eRF3', 'eIF3', 'eIF4', 'eIF5', 'eIF5B', 'eIF6', 'eEF1', 'eEF2'])


In [141]:
print('We can use the iterable in a loop:')
for key in eukaryotic_dict.keys():
  print(f'The key {key} is a {type(key)}')

We can use the iterable in a loop:
The key eIF1 is a <class 'str'>
The key eRF1 is a <class 'str'>
The key eRF3 is a <class 'str'>
The key eIF3 is a <class 'str'>
The key eIF4 is a <class 'str'>
The key eIF5 is a <class 'str'>
The key eIF5B is a <class 'str'>
The key eIF6 is a <class 'str'>
The key eEF1 is a <class 'str'>
The key eEF2 is a <class 'str'>


In [142]:
# in a list, index 0 would be 'eIF1'
# but it won't work for this generic iterable

eukaryotic_dict.keys()[0]

TypeError: 'dict_keys' object is not subscriptable

In [154]:
# try it out:
# coerce the generic iterable generated by .keys() into a structure
# that supports indexing, then see if you can access index 0

print(list(eukaryotic_dict.keys())[0])
print(tuple(eukaryotic_dict.keys())[0])

# .items() returns a similar generic iterable of (key, value) tuples
eukaryotic_dict.items()

eIF1
eIF1


dict_items([('eIF1', 'initiation factor'), ('eRF1', 'release factor'), ('eRF3', 'release factor'), ('eIF3', 'initiation factor'), ('eIF4', 'initiation factor'), ('eIF5', 'initiation factor'), ('eIF5B', 'initiation factor'), ('eIF6', 'initiation factor'), ('eEF1', 'elongation factor'), ('eEF2', 'elongation factor')])

In [158]:
# the same coercion works for .items(): each element of the list is a (key, value) tuple
dict_items_list = list(eukaryotic_dict.items())
print(dict_items_list)
#print(dict_items_list[0])


[('eIF1', 'initiation factor'), ('eRF1', 'release factor'), ('eRF3', 'release factor'), ('eIF3', 'initiation factor'), ('eIF4', 'initiation factor'), ('eIF5', 'initiation factor'), ('eIF5B', 'initiation factor'), ('eIF6', 'initiation factor'), ('eEF1', 'elongation factor'), ('eEF2', 'elongation factor')]


That wraps it up for the need-to-know operations with dictionaries for the remainder of the bootcamp. If you'd like to look at the complete list of methods available for dictionaries, you can review them [here](https://www.programiz.com/python-programming/methods/dictionary).

# Exercises: Set C

That's it for new content: time for some wrap-up exercises. As we mentioned before, the solutions to these exercises are available in the Day 1 Solutions notebook. We'll also post the lecturer's copy of the notebook after the end of each day.

**C1**: Write code that will perform and print the result of the set operation.

In [181]:
groceries = {'grapes', 'milk', 'eggs', 'apple', 'salmon', 'bread'}
fruits = set(('apple', 'orange', 'banana', 'apple', 'grapes', 'orange'))
refrigerator = {'milk', 'salmon', 'eggs', 'grapes'}
favorites = {'apple', 'grapes'}

# check the difference between groceries and fruits
print(groceries.difference(fruits))

# check the intersection between groceries and fruits
print(groceries.intersection(fruits))

# check the union between groceries and fruits
print(groceries.union(fruits))

# check for a set that is a subset of groceries
print("fruits is a subset of groceries:", fruits.issubset(groceries))

# check for a set that is a superset of favorites
print("groceries is a superset of favorites:", groceries.issuperset(favorites))
print("fruits is a superset of refrigerator:", fruits.issuperset(refrigerator))


{'bread', 'milk', 'eggs', 'salmon'}
{'grapes', 'apple'}
{'banana', 'bread', 'orange', 'grapes', 'salmon', 'apple', 'milk', 'eggs'}
fruits is a subset of groceries: False
groceries is a superset of favorites: True
fruits is a superset of refrigerator: False


**C2**: Using sets, count how many unique gene names and unique gene descriptions exist in `eukaryotic_dict`.

In [201]:
### your code goes here ###
print(eukaryotic_dict.keys())  # gene names
print(eukaryotic_dict.values())  # gene descriptions

# coercing to a set drops duplicates, so len() counts the unique entries
unique_name_set = set(eukaryotic_dict.keys())
unique_des_set = set(eukaryotic_dict.values())

print("Number of unique gene names:", len(unique_name_set))
print("Number of unique gene descriptions:", len(unique_des_set))
sorted(unique_des_set)

dict_keys(['eEF1', 'eEF2', 'eIF1', 'eIF3', 'eIF4', 'eIF5', 'eIF5B', 'eIF6', 'eRF1', 'eRF3'])
dict_values(['elongation factor', 'elongation factor', 'initiation factor', 'initiation factor', 'initiation factor', 'initiation factor', 'initiation factor', 'initiation factor', 'release factor', 'release factor'])
Number of unique gene names: 10
Number of unique gene descriptions: 3


['elongation factor', 'initiation factor', 'release factor']

**C3**: Use a `for` loop to iterate over the key-value pairs in `eukaryotic_dict`, printing the following sentence for each key-value pair:
```
The gene <KEY> is a <VALUE>
```

Your result should look like:
```
The gene eEF1 is a elongation factor
The gene eEF2 is a elongation factor
The gene eIF1 is a initiation factor
The gene eIF3 is a initiation factor
The gene eIF4 is a initiation factor
The gene eIF5 is a initiation factor
The gene eIF5B is a initiation factor
The gene eIF6 is a initiation factor
The gene eRF1 is a release factor
The gene eRF3 is a release factor
```

In [204]:
### do not change below ###
eukaryotic_dict = {
    'eEF1': 'elongation factor',
    'eEF2': 'elongation factor',
    'eIF1': 'initiation factor',
    'eIF3': 'initiation factor',
    'eIF4': 'initiation factor',
    'eIF5': 'initiation factor',
    'eIF5B': 'initiation factor',
    'eIF6': 'initiation factor',
    'eRF1': 'release factor',
    'eRF3': 'release factor'
}
### do not change above ###

# hint:
# .items() generates an iterable of key-value pairs in a tuple structure
# each key-value tuple can be indexed

### write your code below ###
# first solution: index into each (key, value) tuple
# for item in eukaryotic_dict.items():
#     gene_name = item[0]
#     gene_descr = item[1]
#     print('The gene', gene_name, 'is a', gene_descr)

# second solution: unpack each tuple directly in the for statement
for gene_name, gene_descr in eukaryotic_dict.items():
    print('The gene', gene_name, 'is a', gene_descr)

# each item is a (key, value) tuple
for item in eukaryotic_dict.items():
    print(item)

# after the loop ends, item still holds the last pair
print(item[0])
print(item[1])

The gene eEF1 is a elongation factor
The gene eEF2 is a elongation factor
The gene eIF1 is a initiation factor
The gene eIF3 is a initiation factor
The gene eIF4 is a initiation factor
The gene eIF5 is a initiation factor
The gene eIF5B is a initiation factor
The gene eIF6 is a initiation factor
The gene eRF1 is a release factor
The gene eRF3 is a release factor
('eEF1', 'elongation factor')
('eEF2', 'elongation factor')
('eIF1', 'initiation factor')
('eIF3', 'initiation factor')
('eIF4', 'initiation factor')
('eIF5', 'initiation factor')
('eIF5B', 'initiation factor')
('eIF6', 'initiation factor')
('eRF1', 'release factor')
('eRF3', 'release factor')
eRF3
release factor


**C4**: Write a function called `revcomp()` that takes a string and does the following:
1. Prints the reverse complement of the string.
2. Returns a tuple of the original string and its reverse complement.

In [205]:
# tip: You can use the following indexing trick (`[::-1]`) to reverse a string, a list, or a tuple.

abc_string = 'ABC'
abc_tuple = ('A', 'B', 'C')

print('Reversed string:', abc_string[::-1])
print('Reversed tuple:', abc_tuple[::-1])

Reversed string: CBA
Reversed tuple: ('C', 'B', 'A')


In [206]:
imaginary_seq = 'AGGCATTTAGCATGCATGTAACGATGCTGCGCGTTCA'

# hints: the += operator *also* works for adding characters to strings

# this dictionary contains the complement for every nucleotide
nuc_comp_dict = {"A":"T", "T":"A", "G":"C", "C":"G"}

### write your code below ###

def revcomp(my_sequence):
    # 1. reverse the sequence (my_sequence itself is kept so it can be returned)
    reversed_seq = my_sequence[::-1]

    # 2. build the complement as a string, one nucleotide at a time
    complemented_seq = ''
    for nuc in reversed_seq:
        complemented_seq += nuc_comp_dict[nuc]

    # 3. print the reverse complement, then return both sequences as a tuple
    print(complemented_seq)
    return my_sequence, complemented_seq


revcomp(imaginary_seq)


TGAACGCGCAGCATCGTTACATGCATGCTAAATGCCT


('AGGCATTTAGCATGCATGTAACGATGCTGCGCGTTCA', 'TGAACGCGCAGCATCGTTACATGCATGCTAAATGCCT')


**C5**: Write a function called `p_nucleotides()` that uses the `.count()` method to count the number of nucleotides in the input sequence. The function should return a dictionary of nucleotides and their corresponding frequencies.

In [None]:
### write your code below ###

def p_nucleotides(my_sequence):
    ... code goes here ...

p_nucleotides(imaginary_seq)

**Challenge 1**: Write a function called `list_joiner()`, which should take a list containing *any* types and:
1. Check the type of each element in the input list.
   - If the element is an integer *or* a float, coerce the element into a string.
   - If the element contains a period, replace the period with an empty string (`''`).
2. After both of the above are checked and/or done, append the element to `to_be_joined`.
3. Sort `to_be_joined` in place.
4. Use `.join()` on `to_be_joined`, using a dash (`-`) as a connector. Save the resultant string to a new variable called `has_been_joined`.
5. Print `to_be_joined` and return `has_been_joined`.

When you're done, try out `list_joiner()` on `jumbled_list`.



In [None]:
jumbled_list = ['5', 9, '8.0', 15, 3.2, '0', 1]

### write your code below ###
def list_joiner(my_list):
   ... code goes here ...

list_joiner(jumbled_list)

**Challenge 2**: Read over the section at the bottom of the notebook titled `[Optional] List comprehensions`. (This is an optional section because we won't use them much in the future, but they *are* a useful technique for lists!)

1. Use a list comprehension instead of a `for` loop to get the maximum value of each sub-list in `nested_nums` AND coerce said value into a string. Save this to a list called `max_nums`.
2. Join `max_nums` using an empty string (`''`), updating `max_nums` with the resultant string.
3. Use coercion to update `max_nums` into a float.

In [None]:
nested_nums = [[8, 4, 2], [7, 21, -3], [6.8, 3, 9]]

### write your code below ###


# [Resource] Python Code Visualizer

If you're having trouble figuring out how exactly code gets executed (example: how things like nested function calls work), the [Python Code Visualizer](http://www.pythontutor.com/visualize.html) is an excellent resource. You can enter the contents of any code cell to see how Python executes the code step-by-step.

# [Optional] String formatting
`.format()` is an interesting method that allows us to form-fill parts of strings. We may occasionally use `.format()` to generate and print more readable descriptions of things we're doing during the next days of the bootcamp.

The target string for `.format()` is a special "blueprint" string: when writing your blueprint string, place curly braces `{}` where you'd like variables to appear. This is akin to providing a blank space in a form.

```
cat_format = 'I have a {}, {} of them. Does this make me a hoarder?'
```

When you want to form-fill the string, call `.format()` on it and provide the values/variables as inputs, in the order in which they should appear. Values/variables that are not strings will be coerced to string type.

In [None]:
cat_format = 'I have a {}, {} of them. Does this make me a hoarder?'
print("The 'blueprint' string:", cat_format)
print("Formatted string:", cat_format.format('cat', 13))
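
Two related variations, sketched with made-up variables (`animal`, `count`): numbered placeholders, which let you reuse or reorder the inputs to `.format()`, and f-strings, which fill the braces directly from variables that already exist.

```python
animal = 'cat'
count = 13

# numbered placeholders refer to the inputs by position (starting at 0)
print('{1} {0}s? That is a lot of {0}s.'.format(animal, count))

# an f-string (note the f before the quote) reads the variables directly
print(f'I have a {animal}, {count} of them.')
```

Which style to use is largely a matter of taste: `.format()` is handy when the blueprint string is defined once and filled in many times, while f-strings are convenient for one-off messages.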

You can provide additional information about the desired placement of values by assigning a *placeholder name* within the curly braces. These placeholder names work like named inputs in a function call: each value is matched to its placeholder by name, so the order in which you provide the values no longer matters. This can be quite useful for formatting long strings with many variables, as shown in the RefSeq information below.

In [None]:
# below, we provide seven placeholder names:
# 1) refseq
# 2) INSDC
# 3) size
# 4) GC
# 5) protein
# 6) datatype
# 7) name

ref_format = '''Type\tName\tRefSeq\tINSDC\tSize (Mb)\tGC%\tProtein
{datatype}\t{name}\t{refseq}\t{INSDC}\t{size}\t{GC}\t{protein}'''

# format the string with data for one sequence...
print('Formatted string #1:\n',
      ref_format.format(refseq = 'NC_000913.3',
                        INSDC = 'U00096.3',
                        size = 4.64,
                        GC = 50.8,
                        protein = 4242,
                        datatype = 'Chr',
                        name = '-'))

# and then do the same with data for another sequence
print('Formatted string #2:\n',
      ref_format.format(datatype = 'Chr',
                        name = '-',
                        refseq = 'NC_003197.2',
                        INSDC = 'AE006468.2',
                        size = 4.86,
                        GC = 52.2,
                        protein = 4446))

# [Optional] List comprehensions

`for` loops are useful for performing certain tasks without needing to manually write code to access each element of the iterable.

For example, we can use a `for` loop to create a new list based on the contents of an existing list: we simply need to create an empty (container) list, then append elements to the container as we loop through our origin list.

In [None]:
# let's say that we want to count the length of each string in weekdays

print(weekdays)

weekday_len = [] # empty (container) list
for day in weekdays:
  weekday_len.append(len(day)) # appending updates the list in place

print(weekday_len)

This isn't the *worst*, but it could be more concise. A **list comprehension** is a shortcut that simultaneously creates a list and fills it in using a modified `for` loop, compressing what was previously three lines of code into one line of code.

In [None]:
# now we'll use a list comprehension

weekday_len = [len(day) for day in weekdays] # no need to append: just use len() directly

print(weekdays)
print(weekday_len)

Given the desired task of applying a function to each element of a list and saving the resultant values to a new list, list comprehensions are both more concise and usually somewhat faster than the equivalent `for` loop.
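
If you'd like to check the speed claim yourself, the sketch below uses the standard-library `timeit` module to time both approaches on a different task (doubling numbers). The function names are made up for this example, and the exact timings will vary from machine to machine, so treat the numbers as a rough comparison only.

```python
import timeit

numbers = list(range(1000))

def with_loop():
    # build the list by appending inside a for loop
    doubled = []
    for n in numbers:
        doubled.append(n * 2)
    return doubled

def with_comprehension():
    # build the same list with a list comprehension
    return [n * 2 for n in numbers]

# both approaches produce the same list
print(with_loop() == with_comprehension())

# time 1,000 runs of each function (results are in seconds)
print('for loop:     ', timeit.timeit(with_loop, number=1000))
print('comprehension:', timeit.timeit(with_comprehension, number=1000))
```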

In [None]:
num_list = [5, 9, 2.3, 14, 3, 2, 10]

# try it out:
# create a container list called for_squared
# square the value of each element in num_list using a for loop, appending
# each squared value to for_squared




In [None]:
# try it out:
# use a list comprehension to square each value of num_list, saving the new
# list to a variable called comp_squared



# [Optional] `while` loops

Earlier, we learned that `for` loops are used to iterate over each element of an iterable. This is the most common type of loop you'll use in the bootcamp (and likely beyond). Another type of loop is the `while` loop, which loops a code block *while* a logic check returns `True`. We won't use it much in the coming days, which is why this subject is optional: nevertheless, it may be useful to you.

For example, the below `while` loop continually shortens an input string by slicing it. The `while` loop will only terminate when the string contains one character or fewer.

In [None]:
while_string = 'this will loop until the string is only a period.'

while len(while_string) > 1:
  while_string = while_string[1:] # shortens it by one character
  print(while_string)
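
The same pattern works with a numeric counter. In this minimal sketch (the variable name `countdown` is made up), the key point is that something inside the loop must eventually make the logic check `False`; otherwise the loop never ends.

```python
countdown = 3

while countdown > 0:
    print('T minus', countdown)
    countdown -= 1  # without this update, the check would stay True forever

print('Liftoff!')
```

If you ever do start an infinite loop by accident in a notebook, interrupting the kernel will stop it.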