## Notebook 2: Arrays, Lists and Tuples Revisited

### Arrays, Lists, and Tuples

One of the most fundamental data structures in any language is the array, yet Python has no built-in array type. Arrays are usually supplied by the NumPy package.


In principle, an array can be seen as a list with the following differences:
* All elements have to be of the same type, e.g. integer, float (real) or complex numbers.
* The number of elements has to be known a priori, i.e. when the array is created. It can't be changed afterwards.



Besides the array, Python has several other data types that can store a sequence of values. The first is called a `list` and is entered between square brackets. The second is a `tuple` (you are right, strange name), and it is entered with parentheses. The difference is that you can change the values of a list after you create it, and you cannot do that with a tuple. Other than that, for now you just need to remember that they exist, and that you *cannot* do math with either lists or tuples. When you do `2 * alist`, where `alist` is a list, you don't multiply all values in `alist` by 2. Instead, you create a new list that contains `alist` twice (back to back). The same holds for tuples. That can be very useful, but not when your intent is to multiply all values by 2. In the `alist` and `btuple` example later in this notebook, the first value in a list is modified. Try to modify one of the values in `btuple` there and you will see that you get an error message.
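The repetition behavior and tuple immutability described above can be checked with a short sketch. The variable names other than `alist` and `btuple` are illustrative, and the code uses single-argument `print(...)` calls so that it should run under both Python 2 and Python 3.

```python
alist = [1, 2, 3]
btuple = (10, 20, 30)

# Multiplying a sequence by an integer repeats it; it does not scale the values.
repeated_list = 2 * alist        # [1, 2, 3, 1, 2, 3]
repeated_tuple = 2 * btuple      # (10, 20, 30, 10, 20, 30)

# To multiply every value by 2, loop over the list and build a new one.
doubled = []
for value in alist:
    doubled.append(2 * value)    # doubled ends up as [2, 4, 6]

# Tuples are immutable: item assignment raises a TypeError.
try:
    btuple[0] = 100
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(repeated_list)
print(doubled)
print(tuple_changed)
```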

**List basics**

A list in Python is just an ordered collection of items which can be of any type. By comparison an array is an ordered collection of items of a single type - so in principle a list is more flexible than an array but it is this flexibility that makes things slightly harder when you want to work with a regular structure. A list is also a dynamic mutable type and this means you can add and delete elements from the list at any time.
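As a rough sketch of what "dynamic and mutable" means in practice, the following adds and deletes elements using standard list methods. The list `samples` and its contents are made up for illustration.

```python
# Hypothetical example data, not taken from the notebook.
samples = ['liver', 'kidney']

samples.append('heart')       # add to the end
samples.insert(0, 'brain')    # add at a given position
samples.remove('kidney')      # delete the first matching item
del samples[0]                # delete by index (removes 'brain')

# Items do not have to share a type.
samples.append(3.5)

print(samples)                # ['liver', 'heart', 3.5]
```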

Recall that to define a list you simply write a comma-separated list of items in square brackets:

In [None]:
myList=[1,2,3,4,5,6]

This looks like an array because you can use index notation to pick out an individual element. Indexes start from 0. For example:

In [None]:
print myList[2]

Similarly, to change the third element you can assign directly to it:

In [None]:
myList[2]=100 

The slicing notation looks like array indexing but it is a lot more flexible. For example

In [None]:
myList[2:5] # sublist from the third element to the fifth 

myList[:5] # the list up to and not including myList[5]

myList[5:] # the list from myList[5] to the end

myList[:] # entire list

myList[0:2]=[0,1]

Finally, it is worth knowing that the list you assign to a slice doesn't have to be the same size as the slice. It simply replaces the slice, so the list grows or shrinks accordingly.
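A minimal sketch of slice assignment with mismatched sizes follows. The replacement values and the helper names `grown` and `shrunk` are chosen only for illustration.

```python
myList = [1, 2, 3, 4, 5, 6]

# Replace a two-element slice with four elements: the list grows.
myList[0:2] = [10, 20, 30, 40]
grown = list(myList)           # copy: [10, 20, 30, 40, 3, 4, 5, 6]

# Replace a three-element slice with one element: the list shrinks.
myList[1:4] = [99]
shrunk = list(myList)          # copy: [10, 99, 3, 4, 5, 6]

# Assigning an empty list deletes the slice.
myList[2:] = []

print(grown)
print(shrunk)
print(myList)                  # [10, 99]
```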

In [None]:
#different data types
gene_exp=['gene',5.16e-08,.000138511,7.33e-08]

In [None]:
print gene_exp[1]
print gene_exp[-2]

You can reassign values:

In [None]:
gene_exp[0]='Lif'
print gene_exp

In [None]:
motif='acggggtc' 
print motif,"\n",type(motif)

We convert a string to a list of its characters by using the **list function**:

In [None]:
nt=list(motif)
print nt
print nt[1]

In [None]:
# concatenate lists
gene_exp+=[5.16e-08, 0.000138511] # '+' concatenates lists (in both Python 2 and 3); '+=' extends gene_exp in place
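The difference between `+` and `+=` matters when two names refer to the same list. A small sketch with made-up values (the names `first`, `alias` and `combined` are illustrative):

```python
first = [1, 2, 3]
alias = first                 # a second name for the same list object

combined = first + [4, 5]     # '+' builds a new list; first is unchanged
first += [6]                  # '+=' extends the existing list in place

print(combined)               # [1, 2, 3, 4, 5]
print(first)                  # [1, 2, 3, 6]
print(alias)                  # [1, 2, 3, 6], because alias is the same object
```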

### List Comprehension
(PFCB p.168)
Python supports a concept called "list comprehensions", which can be used to construct lists in a very natural way. Essentially, it is Python's way of implementing the set-builder notation used by mathematicians. List comprehension is an elegant way to define and create lists in Python. These lists often have the qualities of sets, but they are not sets in all cases: a list keeps its order and may contain duplicates.
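As an illustration, the mathematical set {x² | x ∈ {0, ..., 9}, x even} translates almost word for word into a comprehension. This is only a sketch, and the variable names are made up.

```python
# Squares of the even numbers below 10, written like set-builder notation.
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)           # [0, 4, 16, 36, 64]

# Unlike a set, the resulting list keeps order and duplicates.
remainders = [x % 3 for x in range(7)]
print(remainders)             # [0, 1, 2, 0, 1, 2, 0]

# Converting to a set removes the duplicates.
unique_remainders = set(remainders)
```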

**Example: Cross product of two sets:**

In [None]:
organisms = [ "mice", "yeast", "E.coli", "Human" ]
phenotypes = [ "nude", "sterile", "diploid" ]
organism_phenotypes = [ (x,y) for x in organisms for y in phenotypes ]
print organism_phenotypes
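The two `for` clauses in the comprehension above behave like nested loops, with the first clause as the outer loop. The sketch below spells that out and, as an alternative not used in the notebook, compares it with `itertools.product` from the standard library.

```python
from itertools import product

organisms = ["mice", "yeast", "E.coli", "Human"]
phenotypes = ["nude", "sterile", "diploid"]

# Explicit nested loops, equivalent to the comprehension.
pairs = []
for x in organisms:           # outer loop: first `for` in the comprehension
    for y in phenotypes:      # inner loop: second `for`
        pairs.append((x, y))

via_comprehension = [(x, y) for x in organisms for y in phenotypes]
via_itertools = list(product(organisms, phenotypes))

print(len(pairs))             # 12 pairs: 4 organisms times 3 phenotypes
print(pairs[:2])
```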

In [None]:
#What do you predict the result of the following would be?
myList=range(0,5)
print myList
print myList * 2

Modules such as NumPy provide element-wise operations on whole sequences.
Before we get to that, you can apply an operation to every element of a list with a for loop:

In [None]:
myList=range(0,5)
print myList
squares =[]
for value in myList:
    squares.append(value **2)
    
print squares

We can accomplish the same using a *list comprehension* approach:

In [None]:
myList=range(0,5)
print myList
squares =[value**2 for value in myList]
print squares


The list comprehension loops through the list `myList`, performs some operation (\*\*2 in this case) on each `value`, and returns the list of results.

Pretty nice

In [None]:
geneList = ['attcagaat', 'tgtgaagt','tgtatcgcg','atgtctcta']
firstCodons= [seq[0:3] for seq in geneList]
firstCodons

What if you want to know how often a given base (here `'a'`) occurs in each sequence in `geneList`?

In [None]:
[seq.count('a') for seq in geneList]
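To count all four bases rather than just `'a'`, one option is to nest a dictionary comprehension inside the list comprehension. This is a sketch, not the notebook's own approach, and the name `base_counts` is made up.

```python
geneList = ['attcagaat', 'tgtgaagt', 'tgtatcgcg', 'atgtctcta']

# One dictionary per sequence, mapping each base to its count.
base_counts = [{base: seq.count(base) for base in 'acgt'} for seq in geneList]

print(base_counts[0])                    # counts for 'attcagaat'
print([d['a'] for d in base_counts])     # [4, 2, 1, 2]
```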

In [None]:
alist = [1,2,3]
print 'alist', alist
btuple = (10,20,30)
print 'btuple', btuple
alist[0] = 7  # Since alist is a list, you can change values 
print 'modified alist', alist
#btuple[0] = 100  # Will give an error
#print 2*alist

Lists and tuples are versatile data types in Python. We already used lists without knowing it when we created our first array with the command `array([1,7,2,12])`: we gave the `array` function one input argument, the list `[1,7,2,12]`, and it returned a one-dimensional array with those values. Lists and tuples can consist of a sequence of pretty much anything, not just numbers. In the example given below, `alist` contains 5 *things*: the integer 1, the float 20.0, the string `'python'`, an array with the values 1,2,3, and finally, the function `len`. This means that `alist[4]` is actually the function `len`, which can be called to determine the length of the array as, for example, `alist[4](alist[3])`. That may be a bit confusing, but it is cool behavior if you take the time to think about it.

In [None]:
import numpy as np  # needed for np.array below
alist = [1, 20.0, 'python', np.array([1,2,3]), len]
print alist
print alist[0]
print alist[2]
print alist[4]( alist[3] )  # same as len( np.array([1,2,3]) )
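Because functions are ordinary objects, a list of functions can be applied to the same data in one comprehension. The sketch below uses only built-in functions, and the names `values`, `funcs` and `results` are illustrative.

```python
values = [3, 1, 4, 1, 5]
funcs = [len, min, max, sum]   # a list whose items are functions

# Call each function in turn on the same list.
results = [f(values) for f in funcs]

print(results)                 # [5, 1, 5, 14]
```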