### Review

* functions can call other functions and pass arguments to the other functions. Functions can ALSO BE PASSED AS OBJECTS TO OTHER FUNCTIONS (everything in Python is an object)

* list comprehensions (Lecture_2Cii) compared to for loops
Example: 

In [1]:
# **list comprehension**
evens_to_50=[i for i in range(51) if i%2==0]

print(evens_to_50)

# **imperative programming solution**
evens_to_50=[]
for i in range(51):
    if i%2==0:
        evens_to_50.append(i)
        
print(evens_to_50)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]


## Functional Programming

-----------------------------------------------
##### Tip: remember how I told you about rosettacode.org in an earlier lecture (Lecture_2E, I showed you the monty hall program solution)? You may find it useful to explore that website and compare the solutions for common programming problems in Haskell versus Python to get a feel for the difference between functional programming languages and Python. 
-----------------------------------------------
* Preamble: You can break down programming problems in three major ways in Python (…yet another reason why python is so flexible):

        1. imperative - what we have learned about so far in this course (I have also called it the outdated term: procedural). This is the "top to bottom" executed programming. 
        
        2. OOP – manipulates collections of objects which have methods associated with them.
        
        3. Functional - decomposes a problem into a pipeline of functions which **only take inputs and return outputs, and whose output depends only on the specific input that is fed to the function** (this means that there should be no ‘side effects’).
      
* Usually you will get a mix of these styles since it is challenging to program in only one of them

* Other languages have other ways of decomposing problems (We will see SQLite/MySQL in module 4; SQL adopts the philosophy of declaration: you craft a query describing the data set you want to retrieve)

* Map, Filter, Sorted and Reduce are fairly universal (Perl, Java, C++, R etc)
    * Guido van Rossum (inventor of Python) is not a fan of functional programming. lambda, map, filter and reduce were contributed to early Python by another developer, and Guido later proposed dropping them in Python 3 in favour of list comprehensions. Users found them useful and pushed back, so map and filter stayed as built-ins and reduce moved to the functools module.  
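To make the three styles from the preamble concrete, here is a small sketch that totals the lengths of a few DNA sequences in each style. The names `seqs` and `SequenceBag` are made up for this illustration; they are not part of the course code.

```python
seqs = ["TAGC", "ACGTATCGC", "ATG"]

# 1. Imperative: a running total that is updated step by step
total_imperative = 0
for s in seqs:
    total_imperative += len(s)

# 2. OOP: an object holds the data and exposes a method that works on it
class SequenceBag:  # hypothetical class, for illustration only
    def __init__(self, sequences):
        self.sequences = list(sequences)

    def total_length(self):
        return sum(len(s) for s in self.sequences)

total_oop = SequenceBag(seqs).total_length()

# 3. Functional: data flows through functions, nothing is updated in place
total_functional = sum(map(len, seqs))

print(total_imperative, total_oop, total_functional)
```

All three give the same answer; what differs is where the "work" lives (a loop variable, an object, or a chain of function calls).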

--------------------------
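* In Python 3, map and filter return iterator objects rather than lists, so you will also need to use the list() function in conjunction with the higher order functions that we will learn about whenever you want to see or index their results. Sorry!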
--------------------------

### (some) features of functional programming
1. removing side effects
2. liberal use of anonymous functions
3. functions being treated as objects and being passed to other functions
4. Higher order functions (functions that take other functions as arguments): map, filter, sorted, reduce

__________________________________________________________________________________________

#### Removing "side effects" 
##### What is a 'side effect'? 
* Anything a function does besides returning a value is a side effect. Modifying internal state or making changes to the outside world can mean:
         * Printing something
         * Modifying an 'out of scope' object (such as a list or a mutable object)
         * Saving something
* You can understand side effect as "modifying the outside world". 
__________________________________________________________________________________________

* Functional style discourages functions with side effects that modify internal state or make other changes that aren’t visible in the return value of the function. 

***The best metaphor I have read describing no side effects: If your function code could be switched with a table (even an infinite one) then it has no side effects.***
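The table metaphor can be tried out directly. In this minimal sketch (the names `square`, `square_table`, `noisy_square` and `call_log` are invented for the example), a pure function is swapped for a dictionary lookup without anyone noticing, while an impure one cannot be swapped that way.

```python
def square(n):
    # pure: the result depends only on n, and nothing outside is touched
    return n * n

# A lookup table built once from the function
square_table = {n: square(n) for n in range(5)}

# For every input in the table, the function call and the lookup agree
for n in range(5):
    assert square(n) == square_table[n]

call_log = []

def noisy_square(n):
    # impure: appending to call_log is a side effect a table cannot reproduce
    call_log.append(n)
    return n * n

noisy_square(3)
print(square_table[3], call_log)
```

A table lookup returns 9 for the input 3 just like `noisy_square(3)` does, but it would never add anything to `call_log`, so the two are not interchangeable.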

* Theoretical and technical: Functional programming can be considered the opposite of object-oriented programming. Objects are little capsules containing some internal state along with a collection of method calls that let you modify this state, and programs consist of making the right set of state changes. Functional programming wants to avoid state changes as much as possible and works with data flowing between functions. In Python you might combine the two approaches by writing functions that take and return instances representing objects in your application (e-mail messages, transactions, etc.).

* Practical application: Functional programming means that you can treat functions as variables and pass them to other functions in the pipeline (Not all languages allow you to pass functions as variables so we should appreciate that Python allows us to do this!)
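Here is a small sketch of what "treating functions as variables" can look like in practice. The helper names (`reverse`, `to_rna`, `apply_pipeline`, `shout`) are made up for the illustration.

```python
def reverse(dna):
    return dna[::-1]

def to_rna(dna):
    return dna.replace("T", "U")

def apply_pipeline(value, steps):
    # steps is a list of functions; each one receives the previous result
    for step in steps:
        value = step(value)
    return value

shout = str.upper                      # a function stored in a variable
pipeline = [shout, reverse, to_rna]    # functions stored in a list

print(apply_pipeline("atgc", pipeline))
```

Note that we write `str.upper`, `reverse` and `to_rna` without parentheses: we are handing over the function itself, not the result of calling it.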

In [3]:
# one of many examples we'll see of an imperative programming function and a 
# companion functional answer for comparison
# imperative program
def increment(a):
    a+=1
    return a
# b is a variable with state here: it is set to 0 now, but any other part of the
# program could give b a different value before increment(b) is called, so the call
# alone does not tell us what comes back. Functional programming avoids such state. 
b=0
print(increment(b))

# same intention as above but FUNCTIONAL programming 
print("-----------------")
def increment_func(a):
    return a+1
print(increment_func(0))

1
-----------------
1


### Let's return to "STATE"
* In contrast to “Object-oriented programming” (in which we create objects that have attributes and we manipulate those attributes), functional programming is:
    * An approach to programming which tries to avoid the use of state
    * State <-- a variable whose value changes during the execution of a program

<div class="alert alert-block alert-warning">
    
* Example of program with state:

        x=0
        print(x)
        for i in range(10):
            x=x+i
        print(x)
        
* We can tell that x has state because the value stored in x changes as the program runs (the name x is re-assigned to a new value on every pass through the loop)

* Let's run the program and see what happens **AND THEN RE-WRITE THE PROGRAM SO THAT IT IS STATELESS BUT DOES THE SAME THING**

    * Value of variable y (instead of x) is set once and never changed



In [28]:
x=0
#this is a variable with state - x is initialized with value =0 and then it is changed
#within the loop to be 45 - permanently.
print(x)
for i in range(10):
    print(x)
    x=x+i
    print(x)
    print(str(i)+" ------- ")
print(x)
print("____FIN_____")

0
0
0
0 ------- 
0
1
1 ------- 
1
3
2 ------- 
3
6
3 ------- 
6
10
4 ------- 
10
15
5 ------- 
15
21
6 ------- 
21
28
7 ------- 
28
36
8 ------- 
36
45
9 ------- 
45
____FIN_____


In [1]:
# Re-write of the above program using y instead of x, making
# the program "stateless"
# Built-in function sum takes any iterable (a list, a range, ...) as argument and
# returns the sum of the elements, so list() is not needed here
print(sum(range(10)))
# the value is set once and never changed - you can think of this as
# the following (but we are fudging functional programming a bit by printing the variable)
y=sum(range(10))
print(y)

45
45


### Return to "SIDE EFFECTS"
* Functional programming avoids writing functions with side effects (remember: a "side effect" means that state outside the function is changed when the function runs. We don't want 'side effects' in functional programming!)

* In the following example a list is passed as an argument and three more elements are added to it with the list extend method, so the function changes the list that was given as an argument --> That is a side effect!

* Beneath it is a different solution to the same problem but without side effects.

* **Pure functions** have no side effects
    * Easier to reason about what a program is doing
    * Ensures the program is more predictable
    * Makes it easier to 'parallelize' the program

* What’s the big deal about side effects? 
    * If you have a function that calls other functions and a variable is modified within those functions, **you have no idea what the value of the variable is at different points in your program!** The program may not be doing what you believe it should be doing!
    * A function should always return the same value if given the same input. When a function references variables other than its arguments, it breaks this rule because its result depends on the value of those other variables at the time the function is called.
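The "same input, same output" rule is easy to break without noticing. This sketch uses invented names (`threshold`, `is_long_impure`, `is_long_pure`) to show a function whose answer changes even though its argument does not.

```python
threshold = 5  # a module-level variable that any code can change

def is_long_impure(dna):
    # reads threshold from outside, so the answer depends on hidden state
    return len(dna) > threshold

def is_long_pure(dna, min_length):
    # everything the function needs arrives through its arguments
    return len(dna) > min_length

first = is_long_impure("ACGTAC")   # threshold is 5 at this point
threshold = 10
second = is_long_impure("ACGTAC")  # same input, but threshold is now 10

print(first, second)
print(is_long_pure("ACGTAC", 5), is_long_pure("ACGTAC", 5))
```

The impure version gives two different answers for `"ACGTAC"`, while the pure version gives the same answer every time it is called with the same two arguments.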

In [7]:
# Example WITH POTENTIAL SIDE EFFECTS:
# this function permanently changes the list- from outside the function - that is 
# passed to it. That is a SIDE EFFECT. 
def my_function(i):
    i.extend(["a","b","c"])
    return(i)

greetings_list=["Hello","Hola","Bonjour"]
print("The initial list is set to be: "+str(greetings_list))
print("my_function is now called and uses the initial list as an argument. What is the initial list after the call? ")
print(my_function(greetings_list))
print("*******")
print("We are back in the main body of the program. What is the initial list now? ")
print(greetings_list)
print("The original list passed to the function has been permanently changed. D'oh!")
print("___FIN___")

The initial list is set to be: ['Hello', 'Hola', 'Bonjour']
my_function is now called and uses the initial list as an argument. What is the initial list after the call? 
['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
*******
We are back in the main body of the program. What is the initial list now? 
['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
The original list passed to the function has been permanently changed. D'oh!
___FIN___


In [16]:
# NOW WITHOUT SIDE EFFECTS! (different variable names though so it doesn't confuse anyone)
# new_greetings_list is not changed outside of the function (as you can see by the print statements)
def my_new_function(m):
    #using the concatenate operator doesn't permanently change the list like extend 
    # does (remember this little difference?)
    m=m+["D","E","F"]
    return m

new_greetings_list=["Guten Tag","Namaste","Ciao"]
print("The initial list is set to be: "+str(new_greetings_list))
print("my_new_function is now called and uses the initial list as an argument.\
 What is the initial list after the call? ")
print(my_new_function(new_greetings_list))
print("*******")
print("We are back in the main body of the program. What is the initial list now? ")
print(new_greetings_list)
print("Great! No side effects! The original list is still intact. ")

The initial list is set to be: ['Guten Tag', 'Namaste', 'Ciao']
my_new_function is now called and uses the initial list as an argument. What is the initial list after the call? 
['Guten Tag', 'Namaste', 'Ciao', 'D', 'E', 'F']
*******
We are back in the main body of the program. What is the initial list now? 
['Guten Tag', 'Namaste', 'Ciao']
Great! No side effects! The original list is still intact. 


__________________________________________________________________________________________
#### Anonymous Functions:  allow us to skip defining a named function when the function body is a single expression
* Use ‘lambda’ in place of a full function definition
* Useful for a small, quick function, not a detailed function that you expect to use more than once or twice
* In the cell below, we are passing the **lambda** function to ‘filter’. In this case, the lambda function is equivalent to the following code:
     
         def by_three(x):
             return x % 3 == 0

In [2]:
my_boring_list = list(range(10))

#-------------------------------------------------
# In Python 3, range() does not build a list: it returns a range object (its own 
# type) that produces its values on demand, whereas range() in Python 2 built the 
# whole list at once. Use the list() function when you want to see the values.
#-------------------------------------------------
# You can see that by uncommenting the following code. Note: you can still
# pass the result of range() as an argument, but printing it
# to the screen shows range(0, 10) rather than the numbers
#my_boring_list = range(10)
print(my_boring_list)
#what is this doing?
print(list(filter(lambda x: x % 3 == 0, my_boring_list)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 3, 6, 9]


In [30]:
#here is the same code but in imperative coding
def by_three(x):
    for i in range(x): 
        if i%3==0:
            only_threes.append(i)

only_threes=[]
by_three(10)
print(only_threes)

[0, 3, 6, 9]


In [2]:
# The names are pretty big hints, but can you tell what the following will print out? 
threes_and_fives=list(filter(lambda x: x%3==0 or x%5==0, range(1,16)))
print(threes_and_fives)

[3, 5, 6, 9, 10, 12, 15]


In [2]:
#we could write a full loop to compare to the above cell: 
three_fives=[]
for i in range(1,16):
    if i%3==0 or i%5==0:
        three_fives.append(i)
print(three_fives)

[3, 5, 6, 9, 10, 12, 15]


### functions can be objects
* We haven’t officially discussed objects yet (we will in Lecture_3F), although we have already seen string objects, match objects, file objects etc.
* VERY Briefly: functions can be thought of as discrete objects that can be passed around which means that they can be stored in variables, passed to other functions and returned from other functions as return values
* Remember: if you have a function that calls other functions, if the variable is modified within functions, you have no idea what the value of the variable is at different points in your function!

* Example of higher order function: 
    * a higher order function is one that takes another function as one of its arguments (it can take built-in functions or ones that you create yourself)

What is the following program doing?
--------------------------------------------------
  <div class="alert alert-block alert-warning">      
       
       def print_list_fun(my_list,my_function):         
            for element in my_list:
                print(my_function(element))


In [24]:
def get_second(word):
    print(word)
    if len(word)>1:
        return(word[1])
    else:
        return("there is no second element")

#def print_list_fun(my_list,get_second):
def print_list_fun(my_list):
    for element in my_list:
        print(element)
        print("********")
        print(get_second(element))

def my_function(i):
    i.extend(["a","b","c"])
    return(i)

my_list=["Hello","Hola","Bonjour"]
print(my_list)
print("~"*20)
print(my_function(my_list))
#we have altered my_list to include three additional elements
print("Drat! We have changed the original list: "+ str(my_list))
print("**********")
print_list_fun(my_list)

['Hello', 'Hola', 'Bonjour']
~~~~~~~~~~~~~~~~~~~~
['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
Drat! We have changed the original list: ['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
**********
Hello
********
Hello
e
Hola
********
Hola
o
Bonjour
********
Bonjour
o
a
********
a
there is no second element
b
********
b
there is no second element
c
********
c
there is no second element


In [29]:
# a more straightforward example of what we were doing above that you can play with yourself
def my_function(i):
    #extended the list to include 3 more elements
    i.extend(["a","b","c"])
    return(i)

def print_list_fun(my_list,func):# <-- any function can be passed in; below we pass the BUILT IN FUNCTION LEN
    print(my_list)
    print("I'm hanging out in this function, now!")
    for element in my_list:
        #A SIMPLE USE OF THE FUNCTION THAT WAS PASSED IN
        print(func(element))
    return("Done!")

my_list=["Hello","Hola","Bonjour"]
print(my_list)
print(my_function(my_list))
print(my_list)
#my_function seems to permanently modify my_list so that new longer list is passed to print_list_fun
print(print_list_fun(my_list,len))
#we are applying the len function to each element of this list

['Hello', 'Hola', 'Bonjour']
['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
['Hello', 'Hola', 'Bonjour', 'a', 'b', 'c']
I'm hanging out in this function, now!
5
4
7
1
1
1
Done!


In [30]:
#More biologically relevant example (but same principles!)
# We will build on this example as we work through the lambda
# and built in higher order functions
dna_list=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
lengths=[]
for dna in dna_list:
    lengths.append(len(dna))
print(lengths)

[4, 9, 3, 8]


## LAMBDA EXPRESSIONS
* lambda is a reserved keyword that creates a small function. The expression after the colon is what the function returns, so it behaves like an implicit "return".
* Very short functions can be compacted into a single line (Like list comprehensions)
* ALMOST ALWAYS USED WITH MAP, FILTER or REDUCE. If you print a lambda on its own, you will see a function object rather than a result. Likewise, with Python 3, if you print the result of map or filter without wrapping it in list(), you will see a map or filter object rather than the values, as shown in the next cell. 
--------------------------------------------------------------------------------------------------
* For instance, this function: 

        def fun_lam_ex (x,y):
            return x+y

* could be replaced by the following: 

        fun_lam_ex=lambda x,y:x+y
--------------------------------------------------------------------------------------------------  
 * **Anonymous Lambda functions** are not assigned to a variable (in the example above, the lambda function is assigned to the name fun_lam_ex, so it is NOT anonymous)
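A quick way to convince yourself that `def` and `lambda` produce the same kind of thing is to build both and compare them. This is only a sketch; `add_def`, `add_lambda` and `by_length` are names chosen for the example.

```python
def add_def(x, y):
    return x + y

add_lambda = lambda x, y: x + y   # bound to a name, so not anonymous

# Both are ordinary function objects and are called the same way
print(add_def(2, 3), add_lambda(2, 3))
print(type(add_def) is type(add_lambda))

# An anonymous lambda is created and used in place, never given a name
by_length = sorted(["ATG", "TAGCA", "AC"], key=lambda s: len(s))
print(by_length)
```

In everyday code, a lambda that needs a name is usually better written with `def`; the lambda form pays off when the function is used once, right where it is written.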


In [35]:
# Here is the list
my_list = list(range(16))
print(my_list)

#here is the anonymous lambda function that prints out only numbers divisible by 3 from my_list
print(list(filter(lambda x: x % 3 == 0, my_list)))

# You need to use the list() in front of the higher order functions with Python 3.0+
# if you fail to use list(), you will simply get a filter object (a lazy iterator), not the values
print(filter(lambda x: x % 3 == 0, my_list))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[0, 3, 6, 9, 12, 15]
<filter object at 0x10b417550>


# Lecture_2Dii begins here!

## Higher Order Functions
* The following website will help you understand the higher order functions discussed here: https://www.python-course.eu/python3_lambda.php
* It will likely help you a lot as you learn about lambda, map, filter and sort (which are challenging concepts). 
1. **Map**
* Allows you to take a list and apply the same operation/function on all elements of the list resulting in the creation of a new list
* **map(function, series)**
* One-to-one ‘mapping’ between elements of old list and new list
* We could do this with for loops as well but it is such a common procedure that python has a built in function 
    * Promotes encapsulation 
* A BIG difference btw python 2 and python 3 for this function <-- be careful!
    * In python 2, the map function sets up an empty list, iterates over the original list, runs the transformation function on each element and returns the finished list.
    * In python 3, the map function returns a map object which you can iterate over; no element is computed until you ask for it.

**This is an important distinction because it means that the elements are generated AS NEEDED in python 3 - called “lazy” evaluation. So if we are dealing with a very large list, python 3 can be more memory-efficient.**
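Lazy evaluation can be observed directly. In this sketch the function deliberately records each call in a list (a side effect, used here only so we can watch when the work happens); `recording_len`, `calls` and the other names are invented for the demonstration.

```python
seqs = ["TAGC", "ACGTATCGC", "ATG"]
calls = []

def recording_len(dna):
    calls.append(dna)   # record each call so we can see when work happens
    return len(dna)

lazy = map(recording_len, seqs)
calls_before = list(calls)      # snapshot: nothing has been computed yet

first = next(lazy)              # computes only the first element
calls_after_one = list(calls)

rest = list(lazy)               # computes whatever is left
leftover = list(lazy)           # a map object is used up after one pass

print(calls_before, calls_after_one)
print(first, rest, leftover)
```

Two things to take away: creating the map object does no work at all, and once you have walked through it, it is empty. If you need the values more than once, store `list(map(...))` in a variable.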

Why are map and reduce better than loops?:

* they are often one-liners. The important parts of the iteration - the collection, the operation and the return value - are always in the same places in every map and reduce.

* The code in a loop may affect variables defined before it or code that runs after it. By convention, maps and reduces are functional.

* map and reduce are elemental operations. Every time a person reads a for loop, they have to work through the logic line by line. There are few structural regularities they can use to create a scaffolding on which to hang their understanding of the code.

* map and reduce have many friends that provide useful, tweaked versions of their basic behaviour. For example: filter (<-- we'll discuss), and all, any and find (<-- we won't discuss these ones).
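Although we won't cover them in detail, here is a brief sketch of those "friends" in Python. `any` and `all` are built-ins. Python has no built-in `find` for lists, so the last line uses `next` over a `filter` as one common stand-in; that choice, and the variable names, are assumptions made for this example.

```python
seqs = ["TAGC", "ACGTATCGC", "ATG", "ACGGCTAG"]

# any: is the condition true for at least one element?
has_start_codon = any(s.startswith("ATG") for s in seqs)

# all: is the condition true for every element?
all_valid = all(set(s) <= set("ACGT") for s in seqs)

# a "find": the first element that satisfies the condition, or None
first_long = next(filter(lambda s: len(s) > 5, seqs), None)

print(has_start_codon, all_valid, first_long)
```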

In [2]:
# An example of the use of map function: 
squares = list(map(lambda x: x * x, [0, 1, 2, 3, 4]))
#if you fail to use list()
square_no_List=map(lambda x: x * x, [0, 1, 2, 3, 4])
print(squares)
print(square_no_List)

[0, 1, 4, 9, 16]
<map object at 0x103d48610>


In [38]:
# Using a plain old vanilla for loop that takes nt sequences
# and puts their length into a new list called lengths
dna_list=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
lengths=[]
for dna in dna_list:
    lengths.append(len(dna))
print(lengths)
print("_____FIN_____")

[4, 9, 3, 8]
_____FIN_____


In [39]:
# Here we have translated the above loop into functional python: 
dna_list_2=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
lengths_2=list(map(len,dna_list_2))
print(lengths_2)
print("----------")

#If we wanted a user defined function instead of a built in
# function like len, we could use a lambda function instead
# defining the function. 
# Same thing but using a map function and a lambda function
at_contents=list(map(lambda dna : round((dna.count("A")+dna.count("T"))/len(dna),2),dna_list_2))
print(at_contents)

#another example
Is_terminal_C=list(map(lambda x: x.endswith("C"),dna_list_2))
print(Is_terminal_C)

[4, 9, 3, 8]
----------
[0.5, 0.44, 0.67, 0.38]
[True, True, False, False]


In [33]:
%%time
#One last example to discuss. This example ILLUSTRATES THE DIFFERENCE BETWEEN PYTHON 2 
# AND PYTHON 3 
#l=list(range(100000))
l=list(range(1000))
print(l)
# In Python 2, map builds the entire result list at once, which costs time and memory
# for a very big input. 
# In python 3, map is 'lazy' - it computes each element only when asked, so a really 
# looooooong input can still be processed one item at a time. 
# Here we wrap map in list() so that we can print it, which forces every element to be
# computed. In our case jupyter (which has to display it all) is the limiting factor. 
m=list(map(lambda x:x**2,l))
print(m)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,

2. Filter
    * __we have a list from which we want to select only elements that satisfy a particular condition__
    * once again, we could do this through a for loop (1st example in next cell) or with the higher order function, filter
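Before the longer examples, here is the smallest version of the idea: the same selection written with a named function, with a lambda, and with the list comprehension from the review at the top of this lecture. The names are chosen for the illustration only.

```python
numbers = list(range(10))

def is_even(n):
    return n % 2 == 0

evens_filter = list(filter(is_even, numbers))               # named function
evens_lambda = list(filter(lambda n: n % 2 == 0, numbers))  # anonymous function
evens_comprehension = [n for n in numbers if n % 2 == 0]    # list comprehension

print(evens_filter)
print(evens_filter == evens_lambda == evens_comprehension)
```

Whatever function you hand to filter must answer True or False (or something Python treats as true or false) for a single element.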


In [42]:
# OOOOOH! This combines a bunch of stuff that we have been investigating lately.
# a list of anonymous dictionaries. 
human_genes = [{'gene': 'foxp2', 'length': 2145},{'gene': 'EDAR', 'length': 1344},{'gene': 'BRCA1'}]

length_total = 0
length_count = 0
for gene in human_genes:
    print(gene)
    print("------")
    if 'length' in gene:
        print("*******")
        print(gene['length'])
        length_total += gene['length']
        length_count += 1
        print(length_count)

if length_count > 0:
    average_length = length_total / length_count

    print(average_length)
 

{'gene': 'foxp2', 'length': 2145}
------
*******
2145
1
{'gene': 'EDAR', 'length': 1344}
------
*******
1344
2
{'gene': 'BRCA1'}
------
1744.5


In [41]:
# Now we can re-write this with functional programming
# This is a challenging way of thinking. It takes a lot of 
# practice hours to make it stick in your brain...
# or at least in mine...YMMV. 

# ------------------
# We will use the reduce function - we'll see it later - for now, we need to import it
# ------------------

from functools import reduce

def add(x,y):
    return x+y

human_genes = [{'gene': 'foxp2', 'length': 2145},{'gene': 'EDAR', 'length': 1344},{'gene': 'BRCA1'}]

#what are we actually doing in this code?
# Take some time to think about what is happening here!
#syntax of map is: map(function, list)
lengths = list(map(lambda x: x['length'],filter(lambda x: 'length' in x, human_genes)))
print("**********")
print(lengths)
print("~~~~~~~~~~")
if len(lengths) > 0:
    average_length = reduce(add, lengths)/len(lengths)
    print(average_length)

**********
[2145, 1344]
~~~~~~~~~~
1744.5


In [5]:
# For loop way: 
dna_list=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
long_dna=[]
for dna in dna_list:
    if len(dna) >5:
        long_dna.append(dna)
        
print(long_dna)
print("__FIN__")

['ACGTATCGC', 'ACGGCTAG']
__FIN__


In [34]:
#"Now with the mighty FILTER function"
#a more concise way of handling this code is the following
def is_long(dna):
    return len(dna)>5

dna_list_2=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]

long_dna_2=list(filter(is_long,dna_list_2))

print(long_dna_2)
#this hasn't changed the original list - see? We can print it out!
print(dna_list_2)
print("__FIN__")

['ACGTATCGC', 'ACGGCTAG']
['TAGC', 'ACGTATCGC', 'ATG', 'ACGGCTAG']
__FIN__


3. sorted
    * The most complicated of the higher order functions
    * Note that it is a different function than the sort method (which can only be used by lists)
    * We supply sorted function with another function that tells python the property of each element that we want to sort on
    * Doesn’t change the original list
    * Can use any kind of custom ordering
        * need to use key= something. The key function manages the sort; it must take a single argument and return the value that we want to sort on (in the second example below, key=len, which means that the criterion for the sort is the length of the sequence. In the first example, no key is specified, so it will use the default ordering: alphabetical for strings, ascending for numbers)
        * you can also combine previous ideas that we have used - such as regular expressions - in a sorted call. 
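The points in the list above can be checked in a few lines. This is a minimal sketch with made-up variable names; it contrasts the sorted function with the list sort method and shows a lambda used as the key.

```python
seqs = ["TAGC", "ACGTATCGC", "ATG", "ACGGCTAG"]

# sorted returns a NEW list; the key here is the number of G's in each sequence
by_g_count = sorted(seqs, key=lambda s: s.count("G"))

# sorted accepts any iterable, not just lists (here, a tuple)
from_tuple = sorted(("b", "c", "a"))

# the sort method exists only on lists, changes the list in place, returns None
in_place = list(seqs)              # work on a copy so seqs stays intact
result = in_place.sort(key=len)

print(by_g_count)
print(from_tuple)
print(in_place, result)
print(seqs)
```

A classic mistake is writing `my_list = my_list.sort()`, which replaces the list with `None`.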

In [4]:
# basic sorted example with just the built in criteria
dna_list_sorted=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
# without an explicit order given, it will sort sequences alphabetically and numbers in ascending order
#sorted takes an iterable as the first argument and can take a key function but here we just see an iterable
#in this case, a list
sorted_dna=sorted(dna_list_sorted)
print(sorted_dna)
print("sorted doesn't change the original list: ") 
print(dna_list_sorted)
print("__FIN__")

['ACGGCTAG', 'ACGTATCGC', 'ATG', 'TAGC']
sorted doesn't change the original list: 
['TAGC', 'ACGTATCGC', 'ATG', 'ACGGCTAG']
__FIN__


In [36]:
print("A more complicated example of sorting sequences by length instead of the default of alphabetical ")
dna_list_sorted_2=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
# we are still providing an iterable as the first argument but a built-in KEY FUNCTION, len, as the sort criteria
sorted_dna=sorted(dna_list_sorted_2, key=len) 
print(sorted_dna)

A more complicated example of sorting sequences by length instead of the default of alphabetical 
['ATG', 'TAGC', 'ACGGCTAG', 'ACGTATCGC']


In [37]:
print("We can also reverse the sort")
dna_list_sorted_2b=["TAGC","ACGTATCGC","ATG","ACGGCTAG"]
sorted_dna_2b=sorted(dna_list_sorted_2b, key=len, reverse=True)
print(sorted_dna_2b)
print("Here is the sorted list: ")
print(sorted_dna)
print("Here is the above criteria reversed")
print(sorted_dna_2b)
print("****FIN 2****")

We can also reverse the sort
['ACGTATCGC', 'ACGGCTAG', 'TAGC', 'ATG']
Here is the sorted list: 
['ATG', 'TAGC', 'ACGGCTAG', 'ACGTATCGC']
Here is the above criteria reversed
['ACGTATCGC', 'ACGGCTAG', 'TAGC', 'ATG']
****FIN 2****


In [41]:
#What about the following code? What does it do? What are we searching for in this case? 
import re
def poly_a_length(dna):
    poly_a_match=re.search(r"A+$",dna)
    if poly_a_match:
        a_len=len(poly_a_match.group())
        return a_len
    else:
        return 0

another_dna_list=["GTAGCA","CCGTATCGCAAAA","AATG","ACGGCTAGAA"]
# -------------------------------------------
# We can sort based on a function that we create ourselves
# instead of relying on built-in functions
# -------------------------------------------
print(sorted(another_dna_list,key=poly_a_length))
print("___FIN3___")

['AATG', 'GTAGCA', 'ACGGCTAGAA', 'CCGTATCGCAAAA']
___FIN3___


In [42]:
# What are we doing here with this sort? 
def get_cg(dna):
    return (dna.count("G")+dna.count("C"))/len(dna)

dna_list_4=["TAGC", "ACGTATCG", "ATG", "ACGGTACG"]
sorted_dna=sorted(dna_list_4,key=get_cg)
print(sorted_dna)

['ATG', 'TAGC', 'ACGTATCG', 'ACGGTACG']


4. reduce
    * In Python 3, reduce is no longer a built-in function: you need to import it from a module called 'functools' in order to use it. 
    * takes two arguments (function, list) and it uses the function to reduce the list to a single value
    * example:
            * What is happening? Reduce calls multiply using 2 and 6 as arguments to get 12. Then it calls multiply with 12 and 3 to get 36, then with 36 and 8 to get 288, and so on until the list is used up. 
            * Last common ancestor and phylogenies might use this strategy. 
--------------------------------------------------
<div class="alert alert-block alert-warning">  
    
    from functools import reduce
    def multiply(x,y):
        
        return x*y
	
    numbers=[2,6,3,8,5,4]
    
    print(reduce(multiply, numbers))


In [9]:
from functools import reduce
def multiply(x,y):
    return x*y

numbers=[2,6,3,8,5,4]
print("--------")
print(reduce(multiply,numbers))

--------
5760
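If you want to watch reduce work through the list, `itertools.accumulate` keeps every intermediate result instead of only the last one. This sketch swaps our own multiply for `operator.mul` from the standard library, which does the same thing; the variable names are invented for the example.

```python
from functools import reduce
from itertools import accumulate
from operator import mul   # standard-library equivalent of our multiply function

numbers = [2, 6, 3, 8, 5, 4]

# every intermediate product, in the order reduce would compute them
running = list(accumulate(numbers, mul))
print(running)

# reduce keeps only the final value
final = reduce(mul, numbers)
print(final)

# an optional third argument gives a starting value, which also makes
# reduce safe to call on an empty list
empty_product = reduce(mul, [], 1)
print(empty_product)
```

Without the starting value, calling reduce on an empty list raises a TypeError, so supplying one is a sensible habit when the list might be empty.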


### We can mix and match our higher order functions and throw in some anonymous lambda functions as well. 

In [6]:
# In this example, we have a list of tuples, each with chromosome number, position of gene and gene name. 
# This script applies two rounds of sorting
# 1. Chromosome number
# 2. Within each chromosome the base position 

Loci=[(4,9200,"gene 1"),(6,63788,"gene 2"),(4,8766,"gene 3")]
def get_chromosome(locus):
    return locus[0]
def get_base_position(locus):
    return locus[1]

sorted_by_base=sorted(Loci,key =get_base_position)

final_sort=sorted(sorted_by_base,key =get_chromosome)
print(final_sort)
print("__FIN1__")

#you can combine anonymous lambda functions with higher order functions
# Use map to process characters in a string and then pass the resulting
# iterable object straight to filter

first=list(map(lambda x:x.upper(),"abcdef"))
Second=list(filter(lambda x:x in ["A","B"],first))
print(first)
print("NOW SECOND")
print(Second)

result= list(filter(lambda x:x in ["C","A"],map(lambda x:x.upper(),"atgcag")))
print(result)


[(4, 8766, 'gene 3'), (4, 9200, 'gene 1'), (6, 63788, 'gene 2')]
__FIN1__
['A', 'B', 'C', 'D', 'E', 'F']
NOW SECOND
['A', 'B']
['A', 'C', 'A']
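The two rounds of sorting in the cell above rely on sorted being "stable" (elements that tie keep their earlier order). As an alternative sketch, the same ordering can be had in one pass by returning a tuple from the key function; Python compares tuples element by element. The variable names below are chosen for the illustration.

```python
from operator import itemgetter

loci = [(4, 9200, "gene 1"), (6, 63788, "gene 2"), (4, 8766, "gene 3")]

# two passes: sort by the minor key first, then by the major key
two_pass = sorted(sorted(loci, key=lambda locus: locus[1]),
                  key=lambda locus: locus[0])

# one pass: the key is a (chromosome, position) tuple
one_pass = sorted(loci, key=lambda locus: (locus[0], locus[1]))

# itemgetter(0, 1) builds the same tuple key without a lambda
with_itemgetter = sorted(loci, key=itemgetter(0, 1))

print(one_pass)
print(two_pass == one_pass == with_itemgetter)
```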


### Function Factories
* 	How to write higher order functions
* __Example:__ 

* We want to write a script that creates a list of overlapping 4-mers for a DNA sequence
--------------------------------------------------
    def get_4mers(dna):
        four_mers=[]
        for i in range(len(dna)-3):
            four_mers.append(dna[i:i+4])
        return four_mers
--------------------------------------------------


* Now for 6-mers: 
--------------------------------------------------
    def get_6mers(dna):
        six_mers=[]
        for i in range(len(dna)-5):
            six_mers.append(dna[i:i+6])
        return six_mers


* Abstract it out to get a flexible function that can have any size mer specified by the user:   
--------------------------------------------------
    def get_kmers(dna,k):
        kmers=[]
        for i in range(len(dna)-k+1):
            kmers.append(dna[i:i+k])
        return kmers
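One possible next step, in keeping with the section title, is a function that builds and returns k-mer functions for us. This is only a sketch of the idea: `make_kmer_getter` is a name invented here, and the inner function uses a list comprehension in place of the loop above.

```python
def make_kmer_getter(k):
    # the inner function "remembers" the k it was created with
    def get_kmers(dna):
        return [dna[i:i + k] for i in range(len(dna) - k + 1)]
    return get_kmers          # return the function itself, not a result

get_4mers = make_kmer_getter(4)
get_6mers = make_kmer_getter(6)

print(get_4mers("ATGCATGC"))

# the manufactured functions take one argument, so they fit straight into map
print(list(map(get_6mers, ["ATGCATG", "ACG"])))
```

Because each manufactured function takes a single argument, it can be handed to map, filter or sorted, which the two-argument `get_kmers(dna,k)` cannot do directly.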

