[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PyGIS222/Fall2019/blob/master/LessonM33_Lists.ipynb)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/PyGIS222/Fall2019/master?filepath=LessonM33_Lists.ipynb)

### Notebook Lesson 3.3

# Object Type: Lists

This Jupyter Notebook is part of Module 3 of the course GIS222 (Fall2019).

In this lesson we will revisit some data types, learn how data can be stored in Python ***lists***, and deepen our understanding of objects in programming. Carefully study the content of this Notebook and use the interactive examples to reflect on the material.

<div class="alert alert-info">

**Note**

For any cell in the notebook to run correctly, you have to run all previous Python code cells that define the respective variables. Press **Shift**-**Enter** to run a cell, or click the "Run" button in the toolbar at the top of the notebook. A cell has been executed if a running number appears in the square brackets at the beginning of the cell; if those brackets are empty, the cell has not been executed. To start over, you can restart the Kernel (Menu item *Kernel* > *Restart*).

</div>

### Sources
This Notebook is an adaptation of Lesson 2 of the [Geo-Python](https://geo-python.github.io/site/2018/index.html) course, which is licensed under CC (Attribution-ShareAlike 4.0 International).

---

# Part A: Let's start with some data

We saw a bit about variables and their values in the last lesson, and we continue here with some variables related to [observation stations from the Finnish Meteorological Institute (FMI)](http://en.ilmatieteenlaitos.fi/observation-stations). For each station, a number of pieces of information are given, including the name of the station, an FMI station ID number (FMISID), its latitude, its longitude, and the station type. We can store this information and some additional information for a given station in Python as follows:

In [5]:
stationName = 'Helsinki Kaivopuisto'

In [6]:
stationID = 132310

In [7]:
stationLat = 60.15

In [8]:
stationLong = 24.96

In [9]:
stationType = 'Mareographs'

Here we have 5 values assigned to variables related to a single observation station. Each variable has a unique name and they can store different types of data: numbers and strings.

### Reminder: Data types and their compatibility

We can explore the different types of data stored in variables using the `type()` function.

In [10]:
type(stationName)

str

In [11]:
type(stationID)

int

In [12]:
type(stationLat)

float

As expected, we see that the `stationName` is a character string, the `stationID` is an integer, and the `stationLat` is a floating point number.

<div class="alert alert-info">

**Note**

We haven't mentioned it explicitly yet, but the variable names in this lesson use another popular variable format called *camelCase*.
In camelCase the words in the variable name are not separated by underscores or any other character, but rather the first letter is capitalized for all words in the name other than the first one.

</div>

<div class="alert alert-info">

**Note**

Remember, the data types are important because some are not compatible with one another.

</div>

In [13]:
stationName + stationID

TypeError: must be str, not int

Here we get a `TypeError` because Python does not know how to combine a string of characters (`stationName`) with an integer value (`stationID`).

### Converting data from one type to another

It is not the case that things like the `stationName` and `stationID` cannot be combined at all, but in order to combine a character string with a number we need to perform a data type conversion to make them compatible. For example, we can convert the `stationID` integer value into a character string using the `str()` function.

In [14]:
stationIDStr = str(stationID)

In [15]:
type(stationIDStr)

str

In [16]:
print(stationIDStr)

132310


As you can see, `str()` converts a numerical value into a character string with the same numbers as before.

<div class="alert alert-info">

**Note**

Similar to using `str()` to convert numbers to character strings, `int()` can be used to convert strings or floating point numbers to integers and `float()` can be used to convert strings or integers to floating point numbers.

</div>
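
The conversions mentioned in the note can be tried out with a minimal sketch like the one below. The variable names and text values are made up for illustration. Note that `int()` drops the decimal part of a floating point number rather than rounding it.

```python
# Hypothetical example values, chosen to resemble the station data above
idText = '132310'
latText = '60.15'

idNumber = int(idText)           # string -> integer
latNumber = float(latText)       # string -> floating point number
idAsFloat = float(idNumber)      # integer -> floating point number
latTruncated = int(latNumber)    # float -> integer (decimal part is dropped)

print(idNumber, latNumber, idAsFloat, latTruncated)
```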

### Combining text and numbers

Although most mathematical operations operate on numerical values, a common way to combine character strings is using the addition operator `+`.

In [17]:
stationNameAndID = stationName + ": " + str(stationID)

In [18]:
print(stationNameAndID)

Helsinki Kaivopuisto: 132310


Note that here we are converting `stationID` to a character string using the `str()` function within the assignment to the variable `stationNameAndID`. Alternatively, we could have simply added `stationName` and `stationIDStr`.
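
As an alternative that is not part of the original lesson material, formatted string literals (f-strings, available since Python 3.6) combine text and numbers without an explicit call to `str()`. The following is just a short sketch of the idea:

```python
stationName = 'Helsinki Kaivopuisto'
stationID = 132310

# Values inside the curly braces are converted to text automatically
stationNameAndID = f"{stationName}: {stationID}"
print(stationNameAndID)
```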

# Part B: Lists and Indices

Above we have seen a bit of data related to one of several FMI observation stations in the Helsinki area. Rather than having individual variables for each of those stations, we can store many related values in a *collection*. The simplest type of collection in Python is a **list**. Like strings, lists are sequences, so their content is accessible through indices. However, while strings are not mutable, lists are, which means their content can also be altered and managed through those indices. Now, let's use lists for storing the FMI station data.

### Creating a list

Let’s first create a list of selected stationName values.

In [2]:
stationNames = ['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

In [3]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


In [21]:
type(stationNames)

list

Here we have a list of 4 `stationName` values in a list called `stationNames`. As you can see, the `type()` function recognizes this as a list. Lists can be created using the square brackets (`[` and `]`), with commas separating the values in the list.

<div class="alert alert-info">

**Note**

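Similar to using `str()`, `int()` and `float()`, the function `list()` can be used to convert other sequences, such as strings, into a list. Single numbers such as integers or floating point numbers cannot be converted this way, because `list()` needs an object it can step through item by item.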

</div>
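
A short sketch of what `list()` does with different inputs may help here. The variable names are chosen for illustration only, and the `try`/`except` construct is used simply to show the error message without stopping the cell.

```python
letters = list('FMI')        # a string is split into its characters
numbers = list(range(3))     # a range becomes a list of integers
print(letters)
print(numbers)

# A single number is not a sequence, so list() cannot convert it
try:
    list(132310)
except TypeError as error:
    print('TypeError:', error)

singleItemList = [132310]    # square brackets wrap a single value in a list
print(singleItemList)
```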

### Index values

To access an individual value in the list we need to use an **index value**. An index value is a number that refers to a given position in the list. Let’s check out the first value in our list as an example:

In [22]:
print(stationNames[1])

Helsinki Kaisaniemi


Wait, what? This is the second value in the list we’ve created, what is wrong? As it turns out, Python (and many other programming languages) start values stored in collections with the index value 0. Thus, to get the value for the first item in the list, we must use index 0.

In [23]:
print(stationNames[0])

Helsinki Harmaja


OK, that makes sense, but it may take some getting used to...

### A useful analogy - Bill the vending machine

As it turns out, index values are extremely useful and very commonly used in many programming languages, yet they are often a point of confusion for new programmers. Thus, we need a trick for remembering what index values are and how they are used. For this, we need to be introduced to Bill.

<img src="M33_Image_BillTheVendingMachine.png" alt="Illustrating indexing: Bill the vending machine." title="Bill the vending machine" width="600" />

Figure 1: *Bill, the vending machine.*


As you can see, Bill is a vending machine that contains 6 items. Like Python lists, the list of items available from Bill starts at 0 and increases in increments of 1.

The way Bill works is that you insert your money, then select the location of the item you wish to receive. In an analogy to Python, we could say Bill is simply a list of food items and the buttons you push to get them are the index values. For example, if you would like to buy a taco from Bill, you would push button `3`. An equivalent operation in Python could simply be

```python
print(Bill[3])
Taco
```
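
To make the analogy runnable, Bill can be written as an actual list. In the sketch below only `'Taco'` at position 3 is taken from the text; the other five items are placeholders invented for this example.

```python
# Only 'Taco' at index 3 comes from the lesson text;
# the remaining items are hypothetical placeholders.
Bill = ['Chips', 'Cookie', 'Apple', 'Taco', 'Sandwich', 'Juice']

print(Bill[3])      # the item behind button 3
print(len(Bill))    # Bill holds 6 items, reachable with buttons 0 to 5
```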

### Number of items in a list

We can find the length of a list using the `len()` function.

In [24]:
len(stationNames)

4

Just as expected, there are 4 values in our list and `len(stationNames)` returns a value of `4`.

### Index value tips

If we know the length of the list, we can now use it to find the value of the last item in the list, right?

In [25]:
print(stationNames[4])

IndexError: list index out of range

What, an `IndexError`?!? That’s right, since our list starts with index 0 and has 4 values, the index of the last item in the list is `len(stationNames) - 1`. That isn’t ideal, but fortunately there’s a nice trick in Python to find the last item in a list.

In [26]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


In [27]:
print(stationNames[-1])

Helsinki Kumpula


In [28]:
print(stationNames[-4])

Helsinki Harmaja


Yes, in Python you can go backwards through lists by using negative index values. Index `-1` gives the last value in the list and index `-len(stationNames)` would give the first. Of course, you still need to keep the index values within their ranges.

In [29]:
print(stationNames[-5])

IndexError: list index out of range
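
The relationship between positive and negative index values can be checked with a small sketch like the following, which re-creates the four-item list so that it runs on its own. The helper variable `lastIndex` is introduced only for this example.

```python
stationNames = ['Helsinki Harmaja', 'Helsinki Kaisaniemi',
                'Helsinki Kaivopuisto', 'Helsinki Kumpula']

lastIndex = len(stationNames) - 1    # 3 for a list with 4 items

# Both comparisons print True: the two index styles reach the same items
print(stationNames[lastIndex] == stationNames[-1])
print(stationNames[0] == stationNames[-len(stationNames)])
```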

### Modifying list values

Another nice feature of lists is that they are *mutable*, meaning that the values in a list that has been defined can be modified. Consider a list of the observation station types corresponding to the station names in the `stationNames` list.

In [30]:
stationTypes = ['Weather stations', 'Weather stations', 'Weather stations', 'Weather stations']
print(stationTypes)

['Weather stations', 'Weather stations', 'Weather stations', 'Weather stations']


Now as we saw before, the station type for Helsinki Kaivopuisto should be ‘Mareographs’, not ‘Weather stations’. Fortunately, this is an easy fix. We simply replace the value at the corresponding location in the list with the correct one.

In [31]:
stationTypes[2] = 'Mareographs'
print(stationTypes)

['Weather stations', 'Weather stations', 'Mareographs', 'Weather stations']


### Data types in lists

Lists can also store more than one type of data. Let’s consider that in addition to having a list of each station name, FMISID, latitude, etc. we would like to have a list of all of the values for station ‘Helsinki Kaivopuisto’.

In [32]:
stationHelKaivo = [stationName, stationID, stationLat, stationLong, stationType]
print(stationHelKaivo)

['Helsinki Kaivopuisto', 132310, 60.15, 24.96, 'Mareographs']


Here we have one list with 3 different types of data in it. We can confirm this using the `type()` function.

In [33]:
type(stationHelKaivo)

list

In [34]:
type(stationHelKaivo[0])    # The station name

str

In [35]:
type(stationHelKaivo[1])    # The FMISID

int

In [36]:
type(stationHelKaivo[2])    # The station latitude

float

### Adding and removing values from lists

Finally, we can add and remove values from lists to change their lengths. Let’s consider that we no longer want to include the first value in the `stationNames` list.

In [37]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


In [38]:
del stationNames[0]

In [39]:
print(stationNames)

['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


`del` allows values in lists to be removed. It can also be used to delete variables from memory in Python. If we would instead like to add a few stations to the `stationNames` list, we can do so as follows.

In [40]:
stationNames.append('Helsinki lighthouse')
stationNames.append('Helsinki Malmi airfield')

In [41]:
print(stationNames)

['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula', 'Helsinki lighthouse', 'Helsinki Malmi airfield']


As you can see, we add values one at a time using `stationNames.append()`. `list.append()` is called a method in Python, which is a function that works for a given data type (a list in this case). We’ll see a bit more about these below.

### The concept of objects

Python is one of a number of computer programming languages that are called ‘object-oriented languages’, and we will focus on this topic in Module 5 of the course. It may take quite some time to understand what this means, but the simple explanation is that we can consider the variables that we define to be ‘objects’ that can contain both data known as **attributes** and a specific set of functions known as **methods**. The previous sentence could also take some time to understand by itself, but with an example the concept of ‘objects’ is much easier to understand.

### A (bad) example of methods

Let’s consider our list `stationNames`. As we know, we already have data in the list `stationNames`, and we can modify that data using built-in methods such as `stationNames.append()`. In this case, the method `append()` is something that exists for lists, but not for other data types. It is intuitive that you might like to add (or append) things to a list, but perhaps it does not make sense to append to other data types.

In [42]:
stationNameLength = len(stationNames)

In [43]:
print(stationNameLength)

5


In [44]:
type(stationNameLength)

int

In [45]:
stationNameLength.append(1)

AttributeError: 'int' object has no attribute 'append'

Here we get an `AttributeError` because there is no method built in to the `int` data type to append to `int` data. While `append()` makes sense for `list` data, it is not sensible for `int` data, which is the reason no such method exists for `int` data.
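
If you are unsure whether an object offers a certain method, one option (not covered in the original lesson) is the built-in function `hasattr()`, which reports whether an attribute or method with a given name exists. A minimal sketch, using a shortened version of the list:

```python
stationNames = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto']
stationNameLength = len(stationNames)

# hasattr() returns True if the object has an attribute or method of that name
listCanAppend = hasattr(stationNames, 'append')
intCanAppend = hasattr(stationNameLength, 'append')

print(listCanAppend, intCanAppend)
```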

### Some other useful list methods

With lists we can do a number of useful things, such as count the number of times a value occurs in a list or where it occurs.

In [46]:
stationNames.count('Helsinki Kumpula')    # The count method counts the number of occurrences of a value

1

In [47]:
stationNames.index('Helsinki Kumpula')    # The index method gives the index value of an item in a list

2

The good news here is that our selected station name is only in the list once. Should we need to modify it for some reason, we also now know where it is in the list (index `2`).
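
One detail worth knowing: while `.count()` simply returns `0` for a value that is not in the list, `.index()` raises a `ValueError` in that case. The sketch below shows one possible way to guard against this with the `in` operator; the variable names `wanted` and `position` are made up for the example.

```python
stationNames = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']
wanted = 'Helsinki Harmaja'    # this station is not in the list

print(stationNames.count(wanted))    # counting a missing value gives 0

# Only ask for the index if the value is actually present
if wanted in stationNames:
    position = stationNames.index(wanted)
else:
    position = None

print(position)
```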

### Reversing a list

There are two other common methods for lists that we need to see. First, there is the `.reverse()` method, used to reverse the order of items in a list.

In [12]:
stationNames.reverse()

In [13]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


Yay, it works!

<div class="alert alert-warning">

**Caution**

A common mistake when reversing lists is to do something like `stationNames = stationNames.reverse()`. **Do not do this!** Lists are mutable and the method `.reverse()` mutates the list. Also, when reversing lists with `.reverse()` the `None` value is returned (this is why there is no screen output when running `stationNames.reverse()`). If you then assign the output of `stationNames.reverse()` to `stationNames` you will reverse the list, but then overwrite its contents with the returned value `None`. This means you’ve deleted the list contents (!).

</div>
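
The behaviour described in the caution box can be observed safely by storing the return value under a different name. The following sketch uses a shortened list and the made-up names `returnValue` and `reversedCopy`. It also shows slicing with a step of `-1` as one possible way to get a reversed copy without touching the original.

```python
stationNames = ['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kumpula']

returnValue = stationNames.reverse()    # the list itself is reversed in place
print(returnValue)                      # the method returns None
print(stationNames)

# Slicing with a step of -1 builds a new, reversed list instead
reversedCopy = stationNames[::-1]
print(reversedCopy)
```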

In addition to the info in the warning box above, be aware that copying a list variable to a new variable name first will not save you from this mistake. If you assign a list to a new variable name, you do not copy the object, just the reference to the object. See the example below and think about the result:

In [15]:
stationNames_copy = stationNames
stationNames_copy.reverse()

print(stationNames)
print(stationNames_copy)

['Helsinki Kumpula', 'Helsinki Kaivopuisto', 'Helsinki Kaisaniemi', 'Helsinki Harmaja']
['Helsinki Kumpula', 'Helsinki Kaivopuisto', 'Helsinki Kaisaniemi', 'Helsinki Harmaja']


Reversing the copy `stationNames_copy` of the list `stationNames` reverses the object that both variable names refer to. Hence, both names return a reversed list. To make an actual duplicate of a variable, you have to generate a new object, not just a new variable name. This is especially important for lists, which have many methods that mutate the list object. To achieve that, you have to make a so-called explicit copy of the list, which works the following way:

`newlist = oldlist[:]`.

Now try this with the list `stationNames`, below.

In [17]:
stationNames_copy2 = stationNames[:]
stationNames_copy2.reverse()

print(stationNames)
print(stationNames_copy2)

['Helsinki Kumpula', 'Helsinki Kaivopuisto', 'Helsinki Kaisaniemi', 'Helsinki Harmaja']
['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


Now the two names refer to two separate list objects: reversing `stationNames_copy2` did not change `stationNames`.
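
Whether two names refer to the same object can be checked with the `is` operator. The sketch below compares a plain assignment with two ways of making an explicit copy, the slice `[:]` and the list method `.copy()`. The variable names are chosen for illustration. Both copies are so-called shallow copies, so lists nested inside the list would still be shared.

```python
stationNames = ['Helsinki Harmaja', 'Helsinki Kaisaniemi']

sameObject = stationNames           # a second name for the same list object
sliceCopy = stationNames[:]         # a new list object made by slicing
methodCopy = stationNames.copy()    # a new list object made with .copy()

print(sameObject is stationNames)   # True: one object, two names
print(sliceCopy is stationNames)    # False: a separate object
print(sliceCopy == stationNames)    # True: but with equal contents
```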

### Sorting a list

The `.sort()` method works the same way.

In [50]:
stationNames.sort()   # Notice no output here...

In [51]:
print(stationNames)

['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula', 'Helsinki Malmi airfield', 'Helsinki lighthouse']


As you can see, the list has been sorted alphabetically using the `.sort()` method, but there is no screen output when this occurs. Again, the method `.sort()` mutates the list variable. And if you were to assign that output to `stationNames` the list would get sorted, but the contents would then be assigned `None`.

<div class="alert alert-info">

**Note**

As you may have noticed, `Helsinki Malmi airfield` comes before `Helsinki lighthouse` in the sorted list. This is because alphabetical sorting in Python places capital letters before lowercase letters.

</div>
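
If the capital-before-lowercase ordering is not wanted, one common approach is to pass a `key` function, so that items are compared by their lowercase form. The sketch below uses the built-in function `sorted()`, which returns a new sorted list and leaves the original untouched. The variable names are made up for this example.

```python
stationNames = ['Helsinki lighthouse', 'Helsinki Malmi airfield', 'Helsinki Kumpula']

defaultOrder = sorted(stationNames)                   # capitals sort first
ignoreCase = sorted(stationNames, key=str.lower)      # compare lowercase forms

print(defaultOrder)
print(ignoreCase)
print(stationNames)    # sorted() did not change the original list
```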

### Summary of important list methods
Below is a list of important list methods. Take the time to try these methods and practice their functionality.

Table 1: *Important List Methods*

| Method          | Description |
|-----------------|-------------|
| `.append(x)`    | Add item x at the end of the list |
| `.remove(x)`    | Remove the first item that is equal to x from the list |
| `.count(x)`     | Return the number of items that are equal to x |
| `.index(x)`     | Return the index of the first item that is equal to x |
| `.reverse()`    | Reverse the order of items in a list |
| `.sort()`       | Sort items in a list in ascending order |
| `.pop([i])`     | Remove and return item at position i (last item if i is not provided) |
| `.insert(i, x)` | Insert item x at position i |
| `zip(a, b)`     | Built-in function (not a list method) that pairs up the items of several lists |

More examples and methods can be studied in this tutorial on list methods: https://www.digitalocean.com/community/tutorials/how-to-use-list-methods-in-python-3.
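
Three of the methods from Table 1 that have not been demonstrated so far are `.insert()`, `.pop()` and `.remove()`. A minimal sketch, using a shortened station list and the made-up variable name `lastStation`:

```python
stationNames = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

stationNames.insert(0, 'Helsinki Harmaja')     # put a value at position 0
lastStation = stationNames.pop()               # remove and return the last value
stationNames.remove('Helsinki Kaivopuisto')    # remove the first matching value

print(lastStation)
print(stationNames)
```

Note that `.pop()` is the only one of the three that returns the removed value; `.insert()` and `.remove()` return `None`, just like `.reverse()` and `.sort()`.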

In addition, any sequence operation that you have learned for strings can also be applied to lists. A great summary of available list (sequence) operations and methods can be found here:
https://www.tutorialspoint.com/python/python_lists.htm.

### Lists of lists
A list cannot only contain and mix numbers and strings, it can also contain other lists. For example, we could generate a list `databaseFIM`, which contains all the FMI information at once. Let's first define the latitudes, longitudes and station ID numbers for all stations in the list and then build the database:

In [1]:
stationLats  = [60.18, 60.15, 60.20, 60.25, 59.95]
stationLongs = [24.94, 24.96, 24.96, 25.05, 24.93]
stationIDs   = [100971, 132310, 101004, 101009, 101003]
databaseFIM  = [stationNames, stationIDs, stationTypes, stationLats, stationLongs]

However, simply printing such a nested list is not very illustrative, since the inner lists are just printed one after another:

In [2]:
print(databaseFIM)

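[['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula', 'Helsinki Malmi airfield', 'Helsinki lighthouse'], [100971, 132310, 101004, 101009, 101003], ['Weather stations', 'Weather stations', 'Mareographs', 'Weather stations'], [60.18, 60.15, 60.2, 60.25, 59.95], [24.94, 24.96, 24.96, 25.05, 24.93]]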

To display such a nested list more clearly, `for` loops and list comprehensions are very useful coding strategies.

### List comprehensions

List comprehensions allow item-by-item operations on a sequence. A list comprehension builds a new list by running an expression on each item in a sequence, one at a time, from left to right.

<img src="M33_Image_ListComprehension.png" alt="Concept of List Comprehensions." title="List Comprehensions" width="400" />

Figure 2: *Concept of List Comprehensions.*

A list, or any other iterable object, provides the input sequence (in the example: `num`). Each item in the sequence is assigned in turn to a variable (in the example: `x`). For that, the `in` keyword is used in a `for` clause to iterate over the sequence.

An optional predicate (in the example: `x>0`) can be used to set a condition under which an item is passed on to the output expression (in the example: `x**2`).
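
Written out as code, the parts described above fit together as in the following sketch. The contents of `num` are assumed example values, since the figure does not specify them. The loop underneath is the longer, equivalent form of the same operation.

```python
# Assumed example values for the input sequence
num = [-2, -1, 0, 1, 2, 3]

# output expression: x**2 | variable: x | input sequence: num | predicate: x > 0
squares = [x**2 for x in num if x > 0]
print(squares)

# The same result, built step by step with a for loop and .append()
squaresLoop = []
for x in num:
    if x > 0:
        squaresLoop.append(x**2)
print(squaresLoop)
```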


Using the same `for ... in` iteration, our FMI database can be printed row by row:

In [57]:
for datarow in databaseFIM: print(datarow)

['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula', 'Helsinki Malmi airfield', 'Helsinki lighthouse']
[100971, 132310, 101004, 101009, 101003]
['Weather stations', 'Weather stations', 'Mareographs', 'Weather stations']
[60.18, 60.15, 60.2, 60.25, 59.95]
[24.94, 24.96, 24.96, 25.05, 24.93]


To access one element of the nested list, index references are chained together:

In [58]:
databaseFIM[1][1]

132310

At this point, it is important to understand that the previous expression does not directly access the second item in the second row. It actually selects the second list in the variable and then the second item in that list.
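
The two steps hidden in a chained index can be made visible by storing the intermediate result. The sketch below uses a shortened, two-row stand-in for the database; the names `database`, `secondList` and `secondItem` are introduced only for this illustration.

```python
stationNames = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto']
stationIDs = [100971, 132310]
database = [stationNames, stationIDs]    # shortened stand-in for databaseFIM

secondList = database[1]       # step 1: select the second list
secondItem = secondList[1]     # step 2: select the second item in that list

print(secondList)
print(secondItem == database[1][1])    # True: both routes reach the same value
```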

Knowing that, a column of the database (the entry of one selected station) is returned the following way:

In [59]:
for datarow in databaseFIM: print(datarow[1])  

Helsinki Kaivopuisto
132310
Weather stations
60.15
24.96


Alternatively:

In [60]:
[datarow[1] for datarow in databaseFIM]

['Helsinki Kaivopuisto', 132310, 'Weather stations', 60.15, 24.96]

Now, one might say that the database should have been structured the other way around (columns and rows inverted). That might be a valid point, depending on the tasks that still need to be solved. The way lists are nested should be chosen wisely by the programmer.
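
If the inverted structure is needed later, one possible way to obtain it is the built-in function `zip()` combined with a list comprehension. The sketch below is only an illustration with three stations and three attributes; the names `byAttribute` and `byStation` are made up, and the `*` in front of the list (which passes the inner lists to `zip()` as separate arguments) has not been covered in this lesson. Be aware that `zip()` stops at the shortest inner list, so all inner lists should have the same length.

```python
stationNames = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']
stationIDs = [100971, 132310, 101004]
stationLats = [60.18, 60.15, 60.20]

byAttribute = [stationNames, stationIDs, stationLats]    # one list per attribute

# zip() pairs up the first items, the second items, and so on
byStation = [list(row) for row in zip(*byAttribute)]     # one list per station

for station in byStation: print(station)
```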

### Another simpler example for list comprehensions: Printing a nested list like a matrix
In the following example nested lists are used to build a 3x3 matrix. Then list comprehension is applied to print the matrix as well as selected items, rows and columns.

In [61]:
num = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

The variable `num` is now a list of lists and it could be treated like a matrix. The sublist `num[0]` is a list (and so are `num[1]` and `num[2]`). To view the entire matrix:

In [62]:
for x in num: print(x)

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]


Single matrix elements can also be accessed:

In [63]:
num[1][1]

5

To retrieve the second row, type:

In [64]:
num[1]

[4, 5, 6]

How can you retrieve the second column? Try to code this in the following cell.

And now, write a list comprehension that returns only the even numbers from 0 to 10. Tip: Investigate and use the built in command `range()`.

If you would like to explore list comprehensions further, this page provides some useful examples: https://www.digitalocean.com/community/tutorials/understanding-list-comprehensions-in-python-3