# Lists

This chapter deals with lists. Lists are something that you will use a lot with Python because they allow you to store related values in a single variable. For example:

In [None]:
supported_species = [ "H. sapiens", "M. musculus", "R. norvegicus", "D. melanogaster" ]

Lists are defined with square brackets (`[]`) and the elements in the list are separated by commas (`,`). In the above example we have a list of strings, but lists can contain values of any type.
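As a small illustration of a list holding values of different types, here is a sketch with a made-up list named `sample_record` (the name and the values are only for this example):

```python
# a hypothetical record mixing a string, an integer, a float and a boolean
sample_record = [ "A1", 95, 0.5, True ]

for value in sample_record:
    # type( value ).__name__ is the name of the value's type, e.g. "str" or "int"
    print( value, type( value ).__name__ )
```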

You can also use the `in` operator to check if an element is in a list:

In [None]:
current_species = 'C. elegans'

if current_species in supported_species:
    print( "Species is supported by this program" )
else:
    print( "This species is unfortunately not supported by this program" )

There are various things you can do with Python lists. For example, you can add another element to the end of the list:

In [None]:
supported_species.append( "C. elegans" )

In [None]:
supported_species

Please note that `list.append` updates the list *in-place*. That means you will not get a new list; instead, the existing list gets longer.
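One practical consequence, sketched here with a made-up list named `wells`: because `append` changes the list itself, it returns `None`. Assigning its result to a variable is therefore a common mistake.

```python
wells = [ "A1", "A2" ]

# append modifies `wells` directly and returns None
result = wells.append( "A3" )

print( wells )   # the list itself now has three elements
print( result )  # None
```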

I have copied the `in` check from above into the next cell without changing anything. Because the list has changed in the meantime, you will now get a different result:

In [None]:
current_species = 'C. elegans'

if current_species in supported_species:
    print( "Species is supported by this program" )
else:
    print( "This species is unfortunately not supported by this program" )

The nice thing about lists is that you can extract individual elements by using their position:

In [None]:
supported_species[ 1 ]

The species at position 1 is "M. musculus". But wait, what about Homo sapiens? Well, Homo sapiens can be found here:

In [None]:
supported_species[ 0 ]

Python lists use zero-based indexing. The *first* entry will *always* be at position zero. This might seem strange to you, but this is actually very convenient once you have to do calculations with these indices.
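A minimal sketch of such an index calculation, using a made-up list named `samples`: the valid positions run from 0 to `len( samples ) - 1`, so the last element sits at the length of the list minus one. Asking for position `len( samples )` itself would raise an `IndexError`.

```python
samples = [ "A1", "A2", "A3", "B1" ]

# the list has 4 elements, so the valid positions are 0, 1, 2 and 3
last_index = len( samples ) - 1

print( last_index )
print( samples[ last_index ] )  # the last element
```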

You can also use negative numbers with lists:

In [None]:
supported_species[ -1 ]

Negative numbers give you the elements from the *end* of the list. This way you can access the elements of a list from two directions: either from the beginning or from the end.

In [None]:
supported_species[ -1 ] == supported_species[ 4 ] # C. elegans is equal to C. elegans

Lists also work wonderfully with `for` loops:

In [None]:
print( "The supported species are:" )
for species in supported_species:
    print( species )

If you also want to print the positions in the list, you can use the `enumerate` function that counts up the entries:

In [None]:
print( "The supported species are:" )
for i, species in enumerate( supported_species ):
    print( i, species )

There are two things to notice here:

* We get two variables (`i` and `species`) from the `enumerate` function, because `enumerate` delivers a pair of position and element for each entry of the list.
* `enumerate` automatically starts from index 0 because this is what you usually want.
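To see what `enumerate` actually produces, the sketch below converts its output to a list. The list `wells` is a made-up example. Each entry of the result is a pair (a tuple) of position and element, and the `for` loop unpacks such a pair into two variables.

```python
wells = [ "A1", "A2", "A3" ]

# list() collects everything that enumerate produces
pairs = list( enumerate( wells ) )
print( pairs )

# a pair can also be unpacked into two variables by hand
position, well = pairs[ 0 ]
print( position, well )
```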

<span style="color:teal">Task:</span> Sometimes you do not want to start from index 0. Fortunately, the `enumerate` function takes two parameters. The second parameter is the starting number and if you do not provide it yourself, it is automatically set to 0. Change the starting number to 1 and run the code again.

<span style="color:teal">Task:</span> If you want to remove elements from a list, you can use the `pop` method. `pop` takes one parameter which is the position of the element that should be removed. What happens when you leave out the parameter and what is the return value of `pop`?

Something else that you often do with lists is sorting. There are two ways to do it:

* use `list.sort` to sort the list *in-place*
* use `sorted( list )` to get a new list that has all elements sorted.

If the list is big, the *in-place* sort is probably the better choice. But if it is not so big, getting a second list with `sorted` will probably be the right thing because it will leave the original list untouched.
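The difference between the two approaches can be sketched with a small made-up list named `readings`:

```python
readings = [ 30, 99, 8 ]

# sorted returns a new list and leaves `readings` untouched
ordered_copy = sorted( readings )
print( readings, ordered_copy )

# list.sort reorders `readings` itself and returns None
return_value = readings.sort()
print( readings, return_value )
```

As with `append`, the in-place method returns `None`, so writing `readings = readings.sort()` would lose the list.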

In [None]:
# restore the list to make sure the examples below work as intended
supported_species = [ "H. sapiens", "M. musculus", "R. norvegicus", "D. melanogaster", "C. elegans" ]

sorted_species = sorted( supported_species )
sorted_species

Python automatically knows how to sort things most of the time. If you have a list of strings, it will sort the values alphabetically, with uppercase letters ordered before lowercase ones.

In [None]:
list_of_numbers = [ 4, 2, 41, 400, 19, 42 ]
sorted( list_of_numbers )

That all looks fine, right? Well, let's take a look at another example:

In [None]:
list_of_string_numbers = [ "4", "2", "41", "400", "19", "42" ]
sorted( list_of_string_numbers )

That's not quite what we want: the strings are compared character by character, so "19" comes before "2". We need to tell the `sorted` function what it should do with each element before it compares it to the other elements:

In [None]:
sorted( list_of_string_numbers, key = int )
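The `key` parameter is not limited to number conversion; it accepts any function that takes one element and returns the value to compare. As a sketch with a made-up list named `gene_names`, passing `str.lower` sorts strings without regard to upper and lower case, and passing `len` sorts them by length:

```python
gene_names = [ "tp53", "BRCA1", "actb", "Myc" ]

by_default = sorted( gene_names )                   # uppercase letters come first
ignoring_case = sorted( gene_names, key = str.lower )
by_length = sorted( gene_names, key = len )         # ties keep their original order

print( by_default )
print( ignoring_case )
print( by_length )
```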

`int` is another Python function. It converts a value, such as a string of digits, to an integer. Integers are whole numbers, that is, numbers that do not have any digits after the decimal point. For example, `3.4` is not an integer, because there is a `4` after the point. If you actually want numbers that have digits after the decimal point, you need to use the `float` function instead:

In [None]:
float( "3.141592653589793" )
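A few more conversions, as a sketch (the variable names are made up for this example): `int` accepts strings that look like whole numbers and drops the fractional part of a float, but it rejects a string such as "3.4". Converting that string to an integer takes a detour through `float`.

```python
whole = int( "42" )                  # string to integer
truncated = int( 3.9 )               # float to integer: the fractional part is dropped
fraction = float( "3.4" )            # string to float
detour = int( float( "3.4" ) )       # string to float to integer

print( whole, truncated, fraction, detour )

# int( "3.4" ) would raise a ValueError because "3.4" is not a whole number
```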

<span style="color:teal">Task:</span> Create a list of all seven days of the week and then sort it alphabetically in-place.

<span style="color:teal">Task:</span> Write a `for` loop that goes through all days of the week and uses `print` to display the name of each day. However, when the day is part of the weekend, apply the `upper` method to its name before printing to emphasize that it is part of the weekend.

## Strings and lists

Strings and lists have several things in common. Most importantly, both of them are *sequences*. And all sequences can be used with `for` loops:

In [None]:
text = "This is some text"
for letter in text:
    print( letter )
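Being sequences also means that strings support the indexing and the `in` operator shown for lists earlier in this chapter. A short sketch with a made-up string named `species_name`:

```python
species_name = "M. musculus"

print( species_name[ 0 ] )       # first character
print( species_name[ -1 ] )      # last character
print( len( species_name ) )     # number of characters
print( "mus" in species_name )   # for strings, `in` checks for a substring
```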

Besides, you can easily convert a string to a list:

In [None]:
letters = list( text )
letters

And you can also go the other direction:

In [None]:
"".join( letters )

The above example probably looks a bit exotic, but let us try something slightly different to illustrate how it works:

In [None]:
"*".join( letters )

When you want to combine several strings from a list together you have two options:

* Glue them together without anything in between
* Put something in between when you glue them together

In the first example we used an empty string ("") to glue the letters together. In the second example we used the string "*" to glue the letters together.
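The glue can be any string, not just a single character. The sketch below uses a made-up list named `row` and joins its elements once with a tab and once with a comma followed by a space. Note that `join` only accepts lists of strings, which is why the numbers in `row` are written as strings.

```python
row = [ "0", "A1", "95" ]

tab_line = "\t".join( row )
comma_line = ", ".join( row )

print( tab_line )
print( comma_line )
```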

Another common operation to go from a string to a list is the `split` method:

In [None]:
words = text.split()
words

`split` takes an optional parameter that determines where the string should be split. If you do not provide this parameter, the string is split at every run of whitespace (spaces, tabs and line breaks), which usually gives you the individual words. For our purposes we will most likely use the tab character as the split parameter:

In [None]:
some_text = "This is some text\tcontaining tab characters\tthese tab characters are often used\tto separate columns\tin datasets"
print( some_text ) # shows how a tab character (\t) is displayed
some_text.split( "\t" )
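One detail worth knowing, sketched here with a made-up string named `line`: without an argument, `split` treats any run of whitespace as a single separator, whereas an explicit separator is applied literally, so two consecutive tabs produce an empty string. This matters for datasets that contain empty columns.

```python
line = "0\tD2\t12\t\tlow quality"

by_whitespace = line.split()     # splits on every run of whitespace, including the space in "low quality"
by_tab = line.split( "\t" )      # keeps the empty column and keeps "low quality" in one piece

print( by_whitespace )
print( by_tab )
```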

<span style="color:teal">Challenging Task:</span> The next cell defines the string variable `dataset`. First `print` the variable to get an impression of the data. Then `split` it using "\n" (which stands for the *end of line symbol*) so that you have a list of dataset rows. Next, use a `for` loop to go through the rows, split each row by "\t" and extract the column with index 2. Sum up all values from column 2. Keep in mind that the first row contains the column names and not the expression values (you might want to use `enumerate` to determine if you are in the first row).

In [None]:
dataset = 'time\twell\texpression_level\tcomment\n0\tA1\t95\t\n0\tA2\t53\t\n0\tA3\t83\t\n0\tB1\t70\t\n0\tB2\t14\t\n0\tB3\t77\t\n0\tC1\t42\t\n0\tC2\t58\t\n0\tC3\t31\t\n0\tD1\t86\t\n0\tD2\t12\tlow quality\n0\tD3\t47\t\n1\tA1\t30\t\n1\tA2\t99\t\n1\tA3\t8\t\n1\tB1\t77\t\n1\tB2\t12\t\n1\tB3\t63\t\n1\tC1\t13\t\n1\tC2\t7\t\n1\tC3\t35\t\n1\tD1\t1\t\n1\tD2\t44\t\n1\tD3\t16\t\n2\tA1\t96\t\n2\tA2\t85\t\n2\tA3\t20\t\n2\tB1\t83\tlow quality\n2\tB2\t9\t\n2\tB3\t75\t\n2\tC1\t31\t\n2\tC2\t20\t\n2\tC3\t47\t\n2\tD1\t94\t\n2\tD2\t79\t\n2\tD3\t92\t\n3\tA1\t5\t\n3\tA2\t15\t\n3\tA3\t80\t\n3\tB1\t63\t\n3\tB2\t23\t\n3\tB3\t68\t\n3\tC1\t46\tlow quality\n3\tC2\t20\t\n3\tC3\t62\t\n3\tD1\t57\t\n3\tD2\t49\t\n3\tD3\t20\t\n4\tA1\t32\t\n4\tA2\t43\t\n4\tA3\t11\t\n4\tB1\t44\t\n4\tB2\t5\t\n4\tB3\t49\t\n4\tC1\t1\t\n4\tC2\t10\t\n4\tC3\t74\t\n4\tD1\t20\t\n4\tD2\t16\t\n4\tD3\t11\t\n5\tA1\t17\t\n5\tA2\t18\tpotential contamination\n5\tA3\t63\t\n5\tB1\t10\t\n5\tB2\t52\t\n5\tB3\t38\t\n5\tC1\t8\t\n5\tC2\t49\t\n5\tC3\t96\t\n5\tD1\t96\t\n5\tD2\t14\t\n5\tD3\t79\t'
#you can print `dataset` here

In [None]:
# I recommend summing up column index 2 in this cell

The total sum of all expression values from column index 2 should be 3168.

In [None]:
# what is missing here?

from IPython.display import display, HTML
parts = ['<', 'a', ' ', 'h', 'r', 'e', 'f', '=', '"', '1', '0', ' ', 'D', 'i',
         'c', 't', 'i', 'o', 'n', 'a', 'r', 'i', 'e', 's', '.', 'i', 'p', 'y',
         'n', 'b', '"', ' ', 't', 'a', 'r', 'g', 'e', 't', '=', '"', '_', 'b',
         'l', 'a', 'n', 'k', '"', '>', 'C', 'o', 'n', 't', 'i', 'n', 'u', 'e',
         ' ', 'w', 'i', 't', 'h', ' ', 'D', 'i', 'c', 't', 'i', 'o', 'n', 'a',
         'r', 'i', 'e', 's', '<', '/', 'a', '>']
display( HTML( parts ) )