# Intermediate Python for Biologists
## Functional Programming

### Lambda Functions
Lambda functions provide a way to create small anonymous functions on the fly.

The syntax looks like: 
`lambda parameter_list: expression`

Here's how we would write a function using the `def` keyword to reverse a sequence (e.g. a list or string).
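A minimal sketch of such a function, assuming the input supports slicing (lists, strings, and tuples do). The name `reverse` is our own choice for this example.

```python
def reverse(seq):
    """Return a reversed copy of a sequence such as a list or string."""
    # A slice with a step of -1 walks the sequence from the end to the start.
    return seq[::-1]


print(reverse("ATGC"))     # CGTA
print(reverse([1, 2, 3]))  # [3, 2, 1]
```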

Now let's write the same function as a lambda expression. 
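One way to write it, again assuming the input supports slicing. We bind the lambda to the name `reverse` only so we can call it here; normally a lambda is passed straight to another function without being named.

```python
# A lambda is a single expression: parameters before the colon, result after it.
reverse = lambda seq: seq[::-1]

print(reverse("ATGC"))     # CGTA
print(reverse([1, 2, 3]))  # [3, 2, 1]
```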

(Usually we will use these within higher-order functions, so this will make more sense later, in combination with other functions.)

### map

This is a built-in higher-order function in Python that applies a function to each item of an iterable (like a list) and returns the results. Use this when you would otherwise have looped over a list and applied a function to each item.

The syntax is: `map(function, iterable)`

Let's use this list of animals and get the first letter of each string in the list.

animals = ['sloth', 'cicada', 'daphnia', 'cat', 'sponge']

First, let's do it with a for loop (like we learned in the intro course).
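A sketch of the loop version. The name `first_letters` is our own choice.

```python
animals = ['sloth', 'cicada', 'daphnia', 'cat', 'sponge']

first_letters = []
for animal in animals:
    # Index 0 is the first character of each string.
    first_letters.append(animal[0])

print(first_letters)  # ['s', 'c', 'd', 'c', 's']
```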

Now, let's do it with a map function.
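One possible version, using a lambda to pick out the first character and a `for` loop to print each result. The variable names are our own.

```python
animals = ['sloth', 'cicada', 'daphnia', 'cat', 'sponge']

# The lambda is applied to every item of animals in turn.
first_letters = map(lambda animal: animal[0], animals)

for letter in first_letters:
    print(letter)
```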

`map()` doesn’t return a list. It returns an iterator called a map object. To obtain the values from the iterator, you need to either iterate over it or use `list()`.

I recommend using `list()`.
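A short sketch of wrapping the map object in `list()`. It also shows one thing to watch for: a map object can only be consumed once, so a second `list()` call on the same object comes back empty.

```python
animals = ['sloth', 'cicada', 'daphnia', 'cat', 'sponge']

letters = map(lambda animal: animal[0], animals)

print(list(letters))  # ['s', 'c', 'd', 'c', 's']
print(list(letters))  # [] because the iterator has already been used up
```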

In the example above, the lambda functions served as **callback functions** in our higher-order function. Callback functions are functions that are passed in as arguments to other functions. 

### filter

`filter()` keeps only the items of an iterable for which a test function returns `True`. Use this when you want to select items from a list based on some condition. Like `map()`, it returns an iterator, so wrap it in `list()` to see the values.

The syntax is: `filter(function, iterable)`

Let's filter out the numbers in this list that are divisible by 3.

numbers = [3, 45, 29, 81, 9, 34, 2, 10]

First, we'll do it with a for loop (as we have done it before).
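A sketch of the loop version, assuming the goal is to keep the numbers that are divisible by 3 (rather than discard them). The name `divisible_by_3` is our own.

```python
numbers = [3, 45, 29, 81, 9, 34, 2, 10]

divisible_by_3 = []
for number in numbers:
    # A remainder of 0 means the number divides evenly by 3.
    if number % 3 == 0:
        divisible_by_3.append(number)

print(divisible_by_3)  # [3, 45, 81, 9]
```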

Now, let's do it with a filter.
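One possible version, again assuming we want to keep the numbers divisible by 3. The lambda is the test applied to each item, and `list()` collects the items that pass.

```python
numbers = [3, 45, 29, 81, 9, 34, 2, 10]

# Items are kept only when the lambda returns True for them.
divisible_by_3 = list(filter(lambda number: number % 3 == 0, numbers))

print(divisible_by_3)  # [3, 45, 81, 9]
```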

### sorted

The `sorted()` function is another built-in higher-order function. It returns a new sorted list built from the items of an iterable, optionally using a custom ordering.

The syntax is: `sorted(iterable, key=function)`

The `key` argument is optional. It is a function that is applied to each item to produce the value used for sorting. Without it, the function sorts strings alphabetically and numbers numerically.

Let's use a key function as an argument to sort the list of animals by the length of the string.
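A sketch using the built-in `len` as the key function. Items with the same length keep their original relative order.

```python
animals = ['sloth', 'cicada', 'daphnia', 'cat', 'sponge']

# len is called on each string, and the strings are ordered by the results.
by_length = sorted(animals, key=len)

print(by_length)  # ['cat', 'sloth', 'cicada', 'sponge', 'daphnia']
print(animals)    # the original list is unchanged
```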

Setting the `reverse` argument to `True` sorts the list in descending order according to the key function.
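For example, sorting the animals from the longest name to the shortest might look like this:

```python
animals = ['sloth', 'cicada', 'daphnia', 'cat', 'sponge']

# reverse=True flips the ordering produced by the key function.
longest_first = sorted(animals, key=len, reverse=True)

print(longest_first)  # ['daphnia', 'cicada', 'sponge', 'sloth', 'cat']
```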

### reduce

`reduce()` is a higher-order function that applies a two-argument function to the items in a list two at a time, progressively combining them to produce a single result.

This function is no longer a built-in function in Python 3, but we can import it from `functools`.

The syntax is: `reduce(function, iterable)`

Let's write a program using reduce that adds up the elements of a list.
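A minimal sketch, reusing the list of numbers from the filter section. The lambda receives the running total and the next item, and returns the new running total.

```python
from functools import reduce

numbers = [3, 45, 29, 81, 9, 34, 2, 10]

# Steps: 3 + 45 = 48, then 48 + 29 = 77, and so on through the list.
total = reduce(lambda running_total, number: running_total + number, numbers)

print(total)  # 213
```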

Python's built-in `sum()` function does the same thing.

We can also write a reduce function that finds the maximum value in a list.
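One way to do this is with a lambda that compares two values and returns the larger one. The names below are our own.

```python
from functools import reduce

numbers = [3, 45, 29, 81, 9, 34, 2, 10]

# At each step, keep whichever of the two values is larger.
largest = reduce(lambda a, b: a if a > b else b, numbers)

print(largest)  # 81
```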

(This does the same thing as the built-in `max()` function.)

### zip

`zip()` doesn't really fit into the same family of higher-order functions as map, filter, and reduce, but it is super useful when you are working with two (or more) lists. 

It creates an iterator of tuples from the iterables it is passed.

It returns a zip object (lazy evaluation), so we'll need to use `list()` to see its contents.

numbers = [1, 2, 3]
words = ['one', 'two', 'three']
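A sketch of zipping these two lists together. The name `pairs` is our own.

```python
numbers = [1, 2, 3]
words = ['one', 'two', 'three']

# Each tuple pairs up the items found at the same position in both lists.
pairs = list(zip(numbers, words))

print(pairs)  # [(1, 'one'), (2, 'two'), (3, 'three')]
```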

To unzip a list, you can use the `*` operator with the `zip()` function. 

We use something called **tuple unpacking** to assign the variables.
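A sketch of unzipping, assuming we start from a list of pairs. The `*` spreads the pairs out as separate arguments to `zip()`, and tuple unpacking assigns the two results to two variables in one line. Note that the results are tuples, not lists.

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three')]

# zip(*pairs) is the same as zip((1, 'one'), (2, 'two'), (3, 'three')).
numbers, words = zip(*pairs)

print(numbers)  # (1, 2, 3)
print(words)    # ('one', 'two', 'three')
```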

## Independent Work

### DNA Length

Below is a list of DNA sequences. Write a program that gives you a list of their lengths.

1. First, write this program using a `for` loop.
2. Next, write a program that does the same thing using a built-in higher-order function in Python.

Feel free to use the example list below or create your own. 

dna_list = ['ATG', 'TAGC', 'ACGTATGC', 'ACGGCTAG', 'GATCGCGC', 'TCGCGCAAAAAA']

### Check for A
Using the same DNA list as above (copied below), write a program that creates a list of the sequences that start with 'A'.

1. First, write this program using a `for` loop.
2. Next, write a program that does the same thing using a built-in higher-order function in Python.

Feel free to use the example list below or create your own. 

dna_list = ['ATG', 'TAGC', 'ACGTATGC', 'ACGGCTAG', 'GATCGCGC', 'TCGCGCAAAAAA']

### Sort by GC content

Write a program that uses a higher-order function to sort a list of DNA sequences by their proportion of GC content. Have your program order the sequences from lowest GC content to highest.

Feel free to use the example list below or create your own. 

dna_list = ['ATG', 'TAGC', 'ACGTATGC', 'ACGGCTAG', 'GATCGCGC', 'TCGCGCAAAAAA']

### Bonus: Sort Poly-A tail

Write a program that will sort a list of DNA sequences with poly-A tails. (A poly-A tail means the sequence ends in a run of A's, and the length of the run varies between sequences.) Have your program return the sequences sorted from the longest A tail to the shortest.

Feel free to use the example list below or create your own. 

*hint: this problem will likely require regular expressions*

poly_a_list = ['ATCGA', 'ACGG', 'CGTAAA', 'ATCGAA']

### Bonus: Processing a BLAST file

In the same folder as the notebooks, there is a file called `blast_example.txt`. This contains a BLAST result in tabular format (a tab-separated file).

Each row represents a hit and the fields, in order, show you:
1. name of the query sequence
2. name of the subject sequence
3. percentage of positions that are identical between the two sequences
4. alignment length
5. number of mismatches
6. number of gap opens
7. position of the start of the match on the query sequence
8. position of the end of the match on the query sequence
9. position of the start of the match on the subject sequence
10. position of the end of the match on the subject sequence
11. E-value for the hit
12. bit score for the hit

Using a combination of map, filter, and sorted, answer the following questions about the file:


- How many hits have fewer than 20 mismatches?
- List the subject sequence names for the ten matches with the lowest percentage of identical positions
- For matches where the subject sequence name includes the string "COX1", list the start position on the query as a proportion of the length of the match