Helper Syntax
---

In my notes and questions, you will come across some syntax you should be aware of:

1. `<Shift-Enter>`: This is a keystroke. Hold `Shift`, press `Enter`. Keys will always be capitalized (e.g. `<Tab>` means press `Tab`).
2. `<var1> <operator> <var2>`: This means I want you to enter something, or I am demonstrating a pattern.
    - E.g. `hello + world`. I have replaced `<var1>` with `hello`, `<operator>` with `+`, and `<var2>` with `world`

Documentation
---

By now, you probably realize what `print` is doing. However, if you aren't sure, Jupyter has built-in documentation! Run this cell:

In [1]:
?print

If you can't get the documentation from Jupyter:

1. Google it with the search string "python <thing you are looking for>". Places you might end up:
    - [Stack Overflow](https://stackoverflow.com) - Best place for programming questions
    - [Python Documentation](https://docs.python.org/3) - Good place for Python

Data Structures (Containers)
---

- Lists (`list`):
    - e.g. `[1, 2, 3]`
- Tuple (`tuple`):
    - e.g. `(1, 2, 3)`
- Dictionary (`dict`):
    - `<key>: <value>` store e.g. `{'a': 1, 'b': 2}`
- Set (`set`)
    - e.g. `{1, 2, 3}`
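
As a quick sketch of the four containers above, the cell below builds one of each and prints its type and length. The variable names (`codon_list`, `codon_tuple`, and so on) are illustrative only and are not part of the tasks.

```python
# Illustrative names only; one literal per container type
codon_list = ["AAA", "GGG", "CCC"]               # list
codon_tuple = ("AAA", "GGG", "CCC")              # tuple
codon_dict = {"codon0": "AAA", "codon1": "GGG"}  # dict
codon_set = {"AAA", "GGG", "CCC"}                # set

for container in (codon_list, codon_tuple, codon_dict, codon_set):
    # type(...).__name__ gives the short type name, e.g. "list"
    print(type(container).__name__, len(container))
```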
    
#### Definitions

- Ordered:

Consider a container called `a`. If the elements can be accessed using `[n]` syntax, where `0 <= n < len(a)`, the container is ordered.
Let's consider the following code:

```python
a = [10, 9, 8]
print("Length: {}".format(len(a)))
print("a[0] is: {}".format(a[0]))
```

Above we access `a` with an _index_. The first element `10` is accessed with the index 0, e.g.

```text
list   [10, 9, 8]
         ^  ^  ^
index    0  1  2
```

As an example, try the following cell:

In [None]:
a = [10, 9, 8]
print("Length: {}".format(len(a)))
print("a[0] is: {}".format(a[0]))

`0` is in the range `0 <= n < len(a)` and I can access it, so `a` is ordered.

- Mutable:

Consider a container called `a`. If the indexed item can be changed, the container is mutable. As an example, try the following cell:

In [None]:
a = [0]
a[0] = 1
print("a[0] is: {}".format(a[0]))

We know `a` is ordered, therefore I can access the elements. I can also change the elements which means `a` is mutable.

#### Tasks

1. Create a `list` called `a` which contains the numbers 4, 4, 8, and 12.
    - Are `list`s ordered?
    - Are they mutable?
    - Try accessing the element `a[len(a)]`, why doesn't this work? How do you access the last element?
2. Create a `tuple` called `b` which contains the numbers 2, 3, and 4.
    - Are `tuple`s ordered? mutable?
    - When would you want to use a `tuple` over a `list`?
3. Create a `dict` called `c` which contains the following key-value pairs `'one': 1`, `'three': 3`, `'five': 5`.
    - Note: Dictionaries are accessed by their keys, e.g. `c['one']` would yield 1
    - Are `dict`s ordered? mutable?
4. Create a `set` called `d` using the same data as `a`
    - Are `set`s ordered?
    - Print `a` and `d`, what is the difference? What does this imply about `set`s
    - Are they mutable?
        - try adding an integer to the set with `<set>.add(<value>)`
5. Keep the definitions of `a`, `b`, `c`, and `d`, but comment out any code which you used to test order and mutability. Rerun the cell.

- Pro tip: You can access the last element of a list with `<list>[-1]`

Membership Operators
---

- Return `True` or `False` based on membership, e.g.

``` python
1 in [1, 2, 3] # evaluates to True
0 in [4, 5, 6] # evaluates to False

'key' in {'key': 'value'} # True
'value' in {'key': 'value'} # False

# by values:
'value' in {'key': 'value'}.values() # True
```

Membership-based `for` Loops
---

#### Syntax

```python
for <temporary> in <iterable variable>:
    # Do something with <temporary>, e.g.
    print(<temporary>)
```

- An `iterable variable` could be a `list`, `set`, `dict`, etc.
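
A minimal concrete instance of the pattern above, assuming a small list of codons (the names `codons`, `codon`, and `bases` are placeholders chosen for this example):

```python
codons = ["AAA", "AGA", "CGT"]

# <temporary> is `codon`, <iterable variable> is `codons`
for codon in codons:
    print(codon)

# Strings are iterable too: this visits one character at a time
bases = []
for base in "AGA":
    bases.append(base)
print(bases)
```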

#### Tasks

1. Write membership-based `for` loops for `a`, `b`, `c`, and `d` which `print` each element on a separate line
    - Notice anything interesting?
2. Use the `range` function to print the even integers from 2 to 20 (hint: `?range` will bring up documentation)
    - Definitions: In mathematics, inclusivity (edges are included) is denoted with brackets `[]` and exclusivity (edges are not included) with parentheses `()`
    - Do all of the following: `[2, 20]`, `(2, 20)`, `[2, 20)`, `(2, 20]`
    - What is the default in the above notation?

Unpacking Complicated Data Structures
---

You probably noticed that your loop over the dictionary only printed the keys. What if you wanted both keys and values?

Python allows you to "unpack" dictionaries (using the method `items`) as well as lists of lists, e.g.

```python
for key, value in {'codon0': "GTG", 'codon1': "ATG"}.items():
    print("{}: {}".format(key, value))
```

or,

```python
for i, j in [["AAA", "AGA"], ["GTA", "GGT"]]:
    print("{} {}".format(i, j))
```

#### Tasks

1. Write a membership based for loop to print the keys and values of the following dictionary: `{"seq001": "AAGTAGGAATAATATCTTATCATTATAGATAAAAACCTTCTGAATTTGCTTAGTGTGTATACGACTAGACATATATCAGCTCGCCGATTATTTGGATTATTCCCTG", "seq002": "CTTCAATTACCCTGCTGACGCGAGATACCTTATGCATCGAAGGTAAAGCGATGAATTTATCCAAGGTTTTAATTTG"}`
2. Generate the following list of lists: `[["ATC", "GAT"], ["CTT", "GTT"], ["TTG", "TCG"]]`. Write a membership-based `for` loop which adds the codons, e.g. "ATC" + "GAT" = "ATCGAT"

Flow Control
---

1. Conditional Statements (`if-elif-else`)
    - Use comparison operators and boolean operators to build a condition; when it evaluates to `True`, the corresponding block of code runs.
    - `elif` stands for `else if`
  
#### Tasks

```python
x = 12

if x == 0:
    print("{} is zero".format(x))
elif x > 0:
    print("{} is negative".format(x))
elif x > 0:
    print("{} is positive".format(x))
else:
    print("I don't recognize {}".format(x))
```

1. Given the code above, what do you think is printed?
2. Copy and paste the code into the cell below, what is actually printed? Why?
3. Will the `else` block ever trigger?

#### Bugs vs Errors

- "Bugs" are unintended consequences of running code. For example, the original code tells us that 12 is negative
- "Errors" are produced by the interpreter because something can't be completed
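
To make the distinction concrete, here is a small sketch. The function `mean_length` is hypothetical and deliberately buggy, and the `try`/`except` is only there so the cell keeps running after the error is raised.

```python
def mean_length(seqs):
    # Bug: divides by a hard-coded 2 instead of len(seqs).
    # The code runs without complaint, but the answer is wrong.
    return sum(len(s) for s in seqs) / 2

buggy = mean_length(["AAA", "GGG", "CCC"])
print("Bug, no error raised: {}".format(buggy))  # 4.5, but 3.0 was intended

try:
    # Error: the interpreter refuses to add a str and an int
    "AAA" + 3
except TypeError as err:
    error_message = str(err)
    print("Error raised: {}".format(error_message))
```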

#### Tasks

1. Fix the bugs to print whether a number is positive, negative, or zero

- Unfortunately, we don't have time to talk about error handling now. Check out this documentation at a later time: https://docs.python.org/3/tutorial/errors.html.

Functions
---

Let's start with a definition for a function which doubles the input:

```python
def double_codon(x):
    return x + x
```

Notes:

- `def` is the start of the function definition
- `double_codon` is the name of the function. Example use: `a = double_codon("AAA")`
- `return` passes back information to the caller of the function (i.e. `a` above is `"AAAAAA"`)
- Functions don't have to return anything (behind the scenes they return `None`)
- When the interpreter evaluates a `return`, it exits the function immediately; any code after it in the function is not run
- Meaningful function names help you and your readers understand what the code does
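
A short usage sketch of the function above; the variable names `a` and `b` are only for illustration:

```python
def double_codon(x):
    # `x + x` concatenates a string with itself
    return x + x

a = double_codon("AAA")
print(a)

# The result of one call can be passed straight into another
b = double_codon(double_codon("AG"))
print(b)
```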

#### Tasks

1. Verify that a function without a `return` statement returns `None`
2. Verify that a function with multiple `return` statements only returns the first instance
3. Write the definition of a `triple_codon` function and apply it to the codon `"AGT"`
4. Write the definition of `codon_is_in_sequence`. It will take 2 parameters! Hint: Try `"A" in "AAA"`
    - Given the sequence: `"CTTCAATTACCCTGCTGACGCGAGATACCTTATGCATCGAAGGTAAAGCGATGAATTTATCCAAGGTTTTAATTTG"`, use your function to determine if it contains the following codons:
        - `"AAT"`
        - `"GGG"`

Anonymous Functions
---

These are unnamed functions which are referred to as "syntactic sugar". You will see this term a lot. It means that the code is translated into a more verbose form by the interpreter, therefore:

```python
double_codon = lambda x: x + x
```

is equivalent in behavior to the `double_codon` function we wrote before, but in a more compact syntax. Note that we did name our `lambda`, but this isn't necessary. We will see examples of unnamed lambdas in later sections.
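
One way to convince yourself of this is to compare the two forms side by side. In this sketch the names `double_codon_def` and `double_codon_lambda` are made up so that both versions can coexist in one cell:

```python
def double_codon_def(x):
    return x + x

double_codon_lambda = lambda x: x + x

codons = ["AAA", "AGT", "GGC"]
for codon in codons:
    # Both calls produce the same string
    print(double_codon_def(codon), double_codon_lambda(codon))

same = all(double_codon_def(c) == double_codon_lambda(c) for c in codons)
print("Same results: {}".format(same))
```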

#### Tasks

1. Write a lambda for the `triple_codon` function
2. Write a lambda for a `squared` function
3. Do you have to input a codon into the `triple_codon` lambda?

Iterators
---

Previously, I called `for value in iterator` a membership-based `for` loop. However, I didn't explain the `iterator` part at all. Some iterators operate a little differently. For example, `range` doesn't actually build a `list` in memory, even though it looked very similar to our earlier `for value in list` example. Instead, a `range` operates in constant space, i.e. the space it occupies in memory never changes. For example,

```python
n = 10 ** 12 # a really huge number
x = range(n)
```

If the above code did build the entire `x` object to store all of the numbers from 0 to `10 ** 12` it wouldn't fit into RAM, but iterators are smart and only allocate what they actually need.
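
A rough way to see this, assuming CPython: `sys.getsizeof` reports the size of the object itself in bytes. The exact numbers vary by platform and Python version, so none are shown here.

```python
import sys

small_range = range(10)
huge_range = range(10 ** 12)

# A range stores only start, stop, and step, so both report the same size
print(sys.getsizeof(small_range), sys.getsizeof(huge_range))

# A list has to hold a reference to every element, so it grows with length
small_list = list(range(10))
bigger_list = list(range(10000))
print(sys.getsizeof(small_list), sys.getsizeof(bigger_list))
```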

Other iterators:

- `enumerate`, e.g.

```python
L = ["AAA", "AGA", "CGT", "GGG"]
for index in range(len(L)):
    print(index, L[index])
    
for index, item in enumerate(L):
    print(index, item)
```

- `zip`:

```python
L = ["AGT", "AGC", "ACT"]
R = [3, 6, 9]
for l, r in zip(L, R):
    print(l, r)  # l is a str and r is an int, so l + r would raise a TypeError
```

- `map`:

```python
for doubles in map(lambda x: x + x, ["AAA", "GGG", "CCC", "TTT"]):
    print(doubles)
```

- `filter`:

```python
for with_a in filter(lambda x: 'A' in x, ["AGT", "CTC", "TCA"]):
    print(with_a)
```

#### Tasks

1. Use `enumerate` to print the index and value of the list of codons: `["GAT", "GAA", "TTT", "ATC"]`. Preface the index with "codon".
2. Use `enumerate` and `zip` to sum the two lists into a new list `c`:
    - `a = ["TAC", "CCT", "GCT"]`
    - `b = ["GAC", "GCG", "CAT"]`
    - Initialize `c = [None] * len(a)`
    - Try `list(enumerate(zip(a, b)))` to see the shape of what to unpack
3. Use `map` to print the doubles of the triples of the codons `["GAC", "GCG", "CAT"]` using only anonymous functions
4. Use `filter` to print the doubles of codons (same as in "3.") containing the base pair `'C'` using only anonymous functions

Comprehensions
---

In task 2 of the previous section we had to build a list prior to the loop. We could have also used `c = []` and `c.append(x + y)`, but repeated calls to `append` in a loop are more verbose and typically a bit slower than the alternative below. Python provides some syntactic sugar for this case, e.g.

```python
a = ["TAC", "CCT", "GCT"]
b = ["GAC", "GCG", "CAT"]
c = [x + y for x, y in zip(a, b)]
```

This is extremely terse and readable. This is called a "List Comprehension" and is probably my favorite Python feature. You can also append filter clauses! e.g.

```python
a = ["TAC", "CCT", "GCT"]
b = ["GAC", "GCG", "CAT"]
c = [x + y for x, y in zip(a, b) if 'C' in x]
```
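
For comparison, here is roughly what the two comprehensions above expand to when written as explicit loops. The names `c_loop` and `c_filtered_loop` are illustrative only:

```python
a = ["TAC", "CCT", "GCT"]
b = ["GAC", "GCG", "CAT"]

# Equivalent of [x + y for x, y in zip(a, b)]
c_loop = []
for x, y in zip(a, b):
    c_loop.append(x + y)

# Equivalent of [x + y for x, y in zip(a, b) if 'C' in x]
c_filtered_loop = []
for x, y in zip(a, b):
    if 'C' in x:
        c_filtered_loop.append(x + y)

print(c_loop)
print(c_filtered_loop)
```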

There are also dictionary comprehensions! e.g.

```python
codons_dict = {"codon0": "AAA", "codon1": "GGG", "codon2": "CCC"}
double_codons_dict = {k: v + v for k, v in codons_dict.items()}

# or

labels = ["seq0", "seq1", "seq2"]
seqs = ["TAGGTGTGC", "ACGTTAACC", "TACTAGTCA"]
seq_dict = {k: v for k, v in zip(labels, seqs)}
```

#### Tasks

1. Using the `labels` and `seqs`, create the following dictionaries:
    - keys: `labels`, values: `seqs`, only if the `seqs` contain the codon `"TAG"`
    - keys: `labels`, values: `seqs`, where `seqs` are split into a list of base pairs. Hint: try `list("HELLO")`

Modules and Packages
---

Python has a rich ecosystem of packages for all kinds of research domains!

- Syntax variants
``` python
import <package>
from <package> import <member>
```
- A nice feature of Jupyter Notebooks is that they autocomplete member names
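
As an illustration of the two import forms, here is a sketch using the standard-library `math` module (chosen only as an example; the names `root_a` and `root_b` are placeholders):

```python
# Form 1: import the whole module, then access members with a dot
import math
root_a = math.sqrt(16)

# Form 2: import a single member and use it directly
from math import sqrt
root_b = sqrt(16)

print(root_a, root_b)
```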

#### Tasks

1. Try to import just the `sub` function from the `re` module.

Reading and Writing Files
---

It turns out we can use `open` to generate a file iterator. We can work on a file line by line and read just the content into a list in memory (`strip` removes leading and trailing whitespace, including the newline character):

```python
with open(<filename>, "r") as f:
    lines = [line.strip() for line in f]
```

or, maybe you need to process each line and write a new file (by default `'w'` overwrites the file):

```python
with open(<input_file>, "r") as inp, open(<output_file>, 'w') as out:
    for line in inp:
        # write() doesn't append a newline automatically, so you may need to write one yourself
        out.write(<modify line in some way>)
        out.write('\n')  # Or, on one line: out.write("{}\n".format(<modify line in some way>))
```

To append to a file use `open(<filename>, 'a')`.
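
A runnable sketch of the read, modify, and write pattern above. It first writes a small input file into a temporary directory so that nothing in your working directory is touched. The file names `input.txt` and `output.txt` and the uppercase modification are assumptions made for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    input_file = os.path.join(tmp, "input.txt")    # hypothetical name
    output_file = os.path.join(tmp, "output.txt")  # hypothetical name

    # Create a small input file to work with
    with open(input_file, "w") as f:
        f.write("aaa\nggg\nccc\n")

    # Read line by line, modify each line, and write it to a new file
    with open(input_file, "r") as inp, open(output_file, "w") as out:
        for line in inp:
            out.write("{}\n".format(line.strip().upper()))

    # Read the result back into a list
    with open(output_file, "r") as f:
        lines = [line.strip() for line in f]

print(lines)
```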

Project 1
---

#### Description

Given a file of codons (`codons.csv`), for example:

```text
AAA, GGG, CCC, ...
TTT, AAA, GGG,
...
```

Generate:

1. A new file (`subsequences.txt`) combining all codons in each line into a subsequence
2. A new file (`sequence.txt`) combining all of the lines in `subsequences.txt` on one line

Determine:

1. How many of each base pair ('A', 'G', 'C', 'T') are in this sequence?
2. What is the unique set of codons which makes up this sequence?
3. What is the longest chain of repeating characters?

Hints:

- Reminder: `<str>.replace(<this>, <with_this>)`
- Try: `"ACT,GTC".split(",")`
- Try: `''.join(["ACT", "GTC"])`
- Try:
```python
from itertools import groupby
[list(g) for k, g in groupby(sorted("AAAGTCAA"))]
[list(g) for k, g in groupby("AAAGTCAA")]
```

#### Additional Resources

1. Me! Don't hesitate to email me (bmooreii@pitt.edu) and we can make an appointment
2. This workshop is loosely modeled after [A Whirlwind Tour of Python](http://nbviewer.jupyter.org/github/jakevdp/WhirlwindTourOfPython/blob/master/Index.ipynb)
3. [Stack Overflow](https://stackoverflow.com). If you "Google it", the answer is probably coming from Stack Overflow. Pro tip: search for `python <search phrase>`

#### [Python Docs](https://docs.python.org/3/library/index.html)

#### Useful Python Packages

1. [docopt](http://docopt.org) - command line arguments (not helpful in a Jupyter Notebook)
2. [pandas](http://pandas.pydata.org) - great for processing data
3. [matplotlib](https://matplotlib.org) - plotting tool, can view inside Jupyter Notebooks!
4. [requests](http://docs.python-requests.org/en/master) - HTTP Requests
5. [numba](https://numba.pydata.org) - just-in-time compilation for Python
6. [subprocess](https://docs.python.org/3/library/subprocess.html) - run external commands

#### Quick Survey

In [None]:
from IPython.display import IFrame
IFrame("", width=760, height=500)