# Session 2: Collections (Lists, etc.), List Comprehensions and Loops
This notebook introduces Python lists and shows how to process them using loops and list comprehensions. If you have opened the \*.IPYNB file, it has an associated Python kernel, so you can change and rerun the Python code below at any time.

**Note:** to run a cell, click the play button in the toolbar or press `<Shift>` and `<Enter>` together. You can insert a new cell after the current one by pressing `<Alt>` and `<Enter>`. You can also insert cells above and below by using the 'Insert' menu or by pressing 'a' (above) or 'b' (below) when a cell is selected but not being edited (i.e. the cursor is not in the cell itself).

In [1]:
# Import libraries and print the version numbers for troubleshooting
import sys ; print(f'Python version is {sys.version}')
from matplotlib import __version__ as mpl_ver ; print(f'MatPlotLib version is {mpl_ver}')
from ipympl import __version__ as ipympl_ver ; print(f'IPyMPL version is {ipympl_ver}')

Python version is 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:01:37) [MSC v.1935 64 bit (AMD64)]
MatPlotLib version is 3.8.2
IPyMPL version is 0.9.3


## Introduction
Collections are groups of data items, which may themselves be of different types. They let us carry out operations on whole groups of data instead of on individual items one at a time, which is very powerful. The nearest equivalent in Excel is the `Array`, but Excel arrays are quite limited compared with Python collections. Excel VBA provides collections similar to those in Python, so the general principles you learn here also apply to Excel VBA. Note, however, that Excel VBA does not have list comprehensions.


![Excel Arrays](Resources/ExcelImages/Arrays.png)

**Lists** are an ordered, mutable 'collection' data type with fast access to any item by its position and fast adding/removing of items at the end. Other collections include: 
* **tuples** - an immutable list that can be used as a key in a dictionary, provided its elements are also immutable (immutable means that items cannot be changed once the tuple has been created)
* **dictionaries** (or keyed collections) - for storage and rapid lookup of data based on keys
* **set** - an unordered collection where all entries are unique (adding a duplicate has no effect)
* **deque** - a double-ended queue where adding or removing items at either end is fast (accessing items in the middle is slower than for a list)

There are also variations on these, such as:
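* **SortedDict** (from the third-party `sortedcontainers` package) - a dictionary that keeps its keys in sorted order (adding items is slower than for a plain dictionary, but items can also be addressed by their position in the sorted order)
* **namedtuple** - each element of the tuple has a name as well as a position, so it can be accessed like an attribute (e.g. `point.x`)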

Reference for special collections:
* https://docs.python.org/3/library/collections.html
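A minimal sketch of some of the other collection types listed above, using only the standard library (`deque` and `namedtuple` come from the `collections` module). The variable names and values are illustrative only.

```python
from collections import deque, namedtuple

# tuple: immutable, so it can be used as a dictionary key
point = (2, 3)
heights = {point: 10.5}            # dictionary keyed by a tuple
print(heights[(2, 3)])             # 10.5

# set: duplicates are ignored and order is not guaranteed
colours = {"red", "green", "red"}
print(len(colours))                # 2

# deque: fast adding/removing at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)                # add to the front
queue.append(4)                    # add to the back
print(queue)                       # deque([0, 1, 2, 3, 4])

# namedtuple: each element has a name as well as a position
Vector = namedtuple("Vector", ["x", "y", "z"])
v = Vector(1.0, 2.0, 3.0)
print(v.y, v[1])                   # 2.0 2.0
```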

## Creating Lists

In [None]:
# create a list of squares, assigning it to the variable 'squares'
squares = [1, 4, 9, 16, 25]

In [None]:
# To display the result
squares

In [None]:
# we can also use the list() function - note that it expects a single argument, so we put the numbers 
#    in parentheses (which actually creates a data type called a 'tuple')
list((1, 4, 9, 16, 25))

In [None]:
# we can add an item to the end of the list (note that if you run this cell twice, it will add '36' to the list twice, 
#   since 'squares' already contains the '36' that you added the first time)
squares.append(36)
print(squares)

In [None]:
# We can append multiple items by using the 'extend' method
squares.extend([49, 64])

In [None]:
# You will note that the cell did not provide the changed variable as an 'Out'.
# To see the result, we can type the variable name into a cell
squares
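The difference between `append` and `extend` matters when the argument is itself a list: `append` adds it as a single (nested) item, while `extend` adds each of its elements. A short sketch using throwaway lists `a` and `b`, so the `squares` variable above is left unchanged:

```python
a = [1, 4, 9]
a.append([16, 25])     # adds ONE item, which is itself a list
print(a)               # [1, 4, 9, [16, 25]]

b = [1, 4, 9]
b.extend([16, 25])     # adds each element separately
print(b)               # [1, 4, 9, 16, 25]
print(len(a), len(b))  # 4 5
```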

In [None]:
# You can also 'add' lists together 
squares + [81, 100]

In [None]:
# note that the previous operation did not change the value of the variable 'squares'
# we can see this by typing the name into a cell
squares

In [None]:
# We can change the value of 'squares' by using the '+=' operator 
#  - this works in the same way as 'x = 2', then 'x += 3' leads to 'x = 5'
squares += [81, 100]

In [None]:
# to see the result we can type the name again (alternatively we can use the 'print' function explained later)
squares

In [None]:
# You can make lists of strings
fruit = ["apple", "banana", "rambutan", "durian", "papaya"]
fruit

In [None]:
# You can add lists with different data types
[1, 4, 9] + fruit

In [None]:
# You can make lists of lists
[[1, 3, 5],[2, 4, 6]]

In [None]:
# we can also start by creating an empty list and appending to it
emptylist = []
emptylist.append([1, 3, 5])
emptylist.append([2, 4, 6])
emptylist

## Operations on Lists
There are two kinds of functions that operate on lists.

The first are functions that take a list and produce a result. We have already met the add operation ('+'). Key ones include 'len()' for length, 'sum()' for summation and 'sorted()' for a sorted copy of the list. We will meet 'print()' in the next section. 

The second are methods that modify the list object itself, such as 'append' and 'extend', which we saw earlier. Methods are written after the variable name with a dot, e.g. `squares.append(36)`.
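A short sketch of the difference, using a new illustrative list `values`: functions such as `len()`, `sum()` and `sorted()` return a result and leave the list unchanged, whereas the method `.sort()` changes the list in place and returns `None`.

```python
values = [3, 1, 2]

print(len(values))      # 3
print(sum(values))      # 6
print(sorted(values))   # [1, 2, 3]  (a new list)
print(values)           # [3, 1, 2]  (unchanged)

result = values.sort()  # sorts the list in place
print(values)           # [1, 2, 3]
print(result)           # None, so avoid writing values = values.sort()
```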

In [None]:
numbers = [2, 5, 7]
sum(numbers)

In [None]:
sorted(fruit)

In [None]:
# note that the original list is not affected - we would need to save as a new variable e.g. sortedfruit = sorted(fruit)
fruit

In [None]:
# If we want to change the list itself, then we need to use the 'sort' method. Note that we are using 
# the print function so that we can see the values before and after we carry out the operation; otherwise we 
# would only see the end result. Note that 'sort' is a method that appears after the list variable name because
# it acts on the object itself - i.e. list.sort() rather than sort(list).
newfruit = ['cherry', 'blackcurrant', 'pear', 'melon']
print(newfruit)
newfruit.sort()
print(newfruit)

## Editing Lists
Lists are editable ('mutable'), so you can change their elements after the list has been created. The elements of a list are accessed using square-bracket notation, e.g. `list[3]`. Note that list numbering (indexing) starts at zero, so `fruit[0]` is `"apple"`.
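To illustrate what 'mutable' means, here is a minimal sketch comparing a list with a tuple (the immutable collection described earlier). Assigning to an element works for the list but raises a `TypeError` for the tuple. The variable names are illustrative.

```python
colours_list = ["red", "green", "blue"]
colours_list[0] = "orange"          # allowed: lists are mutable
print(colours_list)                 # ['orange', 'green', 'blue']

colours_tuple = ("red", "green", "blue")
try:
    colours_tuple[0] = "orange"     # not allowed: tuples are immutable
except TypeError as err:
    print("TypeError:", err)
```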

In [None]:
# This changes the item at index 1 (the second item) to 'raspberry'. We can view the revised contents with print(fruit)
fruit[1] = "raspberry"
print(fruit)

In [None]:
# We can find the index of an element in the list using the 'index' method
fruit.index('durian')

In [None]:
# You can change elements by counting from the end as well - use negative integers
fruit[-1] = "blueberry"
print(fruit)

In [None]:
# We can insert items into lists using list.insert()
fruit.insert(3, 'tomato')
print(fruit)

In [None]:
# We can remove items from lists using 'list.remove()' or 'del list[i]'
print(fruit)
fruit.remove('rambutan')
print(fruit)
del fruit[0]
print(fruit)

## List Addressing and List Unpacking
Data can be extracted from a list either by indexing or by assigning the list to a group of variables in a single statement (list unpacking).

In [None]:
# List indexing (note that the first item in the list is item '0')
berries = ['raspberry', 'redcurrant']
a = berries[0]
b = berries[1]
print('first berry is', a, ', second berry is', b)
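The same indexing works for lists of lists, such as the ones created earlier: the first pair of square brackets picks the inner list and the second picks the element within it. A sketch using an illustrative variable `grid`:

```python
grid = [[1, 3, 5], [2, 4, 6]]
print(grid[1])                  # [2, 4, 6]  (the second inner list)
print(grid[1][2])               # 6          (third element of the second inner list)
print(len(grid), len(grid[0]))  # 2 3
```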

In [None]:
# List unpacking - Python can extract the contents of a list or tuple directly into variables
# Several variables can be assigned at once by providing one variable to receive each item
a, b = 3, 5
print('a is', a, '  b is', b)

c, d = berries
print('first berry is', c, ', second berry is', d)

In [None]:
# If the length of the list is unknown, the remaining items can be collected into a list by putting an asterisk before a variable name
a_list = [0,1,2,3,4,5]
e, f, *g = a_list    # note the asterisk '*' before the g
print(e)
print(f)
print(len(g), 'elements remaining:', g)
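The starred variable does not have to come last. Without a starred variable, the number of variables must match the number of items, otherwise Python raises a `ValueError`. A short sketch with illustrative names:

```python
a_list = [0, 1, 2, 3, 4, 5]

first, *middle, last = a_list     # the starred name collects everything in between
print(first, middle, last)        # 0 [1, 2, 3, 4] 5

try:
    x, y = a_list                 # 2 names but 6 values
except ValueError as err:
    print("ValueError:", err)
```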

## List Slicing
Slicing is the process of extracting a range of elements from a list. It extends the square-bracket notation we used earlier to the form `list[start:stop:step]`, where the item at `stop` is not included.

In [None]:
print(fruit)      # let's remind ourselves what the list contains
print(fruit[2])   # prints the fruit at location 2 - i.e. the third in the list
print(fruit[1:2])   # prints the fruit from location 1 up to but not including location 2 (a one-item list)
print(fruit[:2])  # prints the fruit *up to but not including* location 2
print(fruit[2:])  # prints the fruit *including and up from* location 2 - note that fruit = fruit[:2] + fruit[2:]
print(fruit[:-1])  # prints all fruit apart from the last item
print(fruit[:-3])  # prints all fruit apart from the last three items

In [None]:
# You can also specify the steps that should be taken between each item
print(fruit[::2])

In [None]:
# This step can also be negative, which can give you a reversed list
print(fruit[::-1])
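Because the contents of `fruit` depend on which cells have already been run, here is a sketch of the general `list[start:stop:step]` pattern on a fixed, illustrative list of numbers:

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(nums[2:5])      # [2, 3, 4]        start at 2, stop before 5
print(nums[1:8:3])    # [1, 4, 7]        every third item from 1 up to (not including) 8
print(nums[-3:])      # [7, 8, 9]        the last three items
print(nums[::-2])     # [9, 7, 5, 3, 1]  every second item, backwards
print(nums[8:20])     # [8, 9]           slices past the end do not raise an error

copy_of_nums = nums[:]         # a slice of the whole list is a (shallow) copy
print(copy_of_nums is nums)    # False
```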

## Looping
Python provides a structure for looping through lists using the ***for*** keyword, in the form `for list_item in list:`, followed by indented commands that are carried out for each item in the list.

In [None]:
numbers = [0, 1, 2, 3, 4, 5]
for n in numbers:
    print(2 * n + 1, end=", ")
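A loop can also build up a result step by step. The sketch below (with illustrative names) collects the same `2 * n + 1` values into a new list and keeps a running total:

```python
numbers = [0, 1, 2, 3, 4, 5]
odds = []          # start with an empty list
total = 0          # running total
for n in numbers:
    value = 2 * n + 1
    odds.append(value)
    total += value
print(odds)        # [1, 3, 5, 7, 9, 11]
print(total)       # 36
```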

## List Comprehension
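A list comprehension builds a new list by applying an expression to each item of an existing list (or other iterable), in the form `[expression for item in list]`. This is a powerful way of processing lists, with no need to carry out indexed sweeps through the data (e.g. `for i = 0 to 22...`).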

In [None]:
numbers = [0, 1, 2, 3, 4, 5]
print(numbers)

In [None]:
# To generate the squares of the numbers
[i * i for i in numbers]

In [None]:
# incidentally, we can also do this using the range() function (range(6) gives the numbers 0 to 5):
number_range = range(6)
print(number_range)
print(list(number_range))
[i * i for i in number_range]

In [None]:
# we can also add conditions - in this case filtering for odd numbers
[i * i for i in number_range if i % 2 == 1]
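A list comprehension with a condition is a compact way of writing a `for` loop that appends to a list. The sketch below (with illustrative names) shows that both give the same result:

```python
numbers = [0, 1, 2, 3, 4, 5]

# loop version
odd_squares_loop = []
for i in numbers:
    if i % 2 == 1:
        odd_squares_loop.append(i * i)

# list comprehension version: [expression for item in iterable if condition]
odd_squares_comp = [i * i for i in numbers if i % 2 == 1]

print(odd_squares_loop)                       # [1, 9, 25]
print(odd_squares_loop == odd_squares_comp)   # True
```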

## Multi-factor List Comprehension
List comprehension can be extended to multiple lists, by zipping them together and then extracting corresponding pairs of data using unpacking (in this case `f, b = ...`):

In [None]:
# Using a 'for' loop on two lists (zip() stops once the shorter list is exhausted)
for f, b in zip(numbers, fruit):
    print(f, b)

In [None]:
# Using a list comprehension on the same two lists (again, it stops once the shorter list is exhausted)
# note that the value returned by print() is always None (after it has printed to console)
[print(f, b) for f, b in zip(numbers, fruit)]

For any collection that can be looped over, the `enumerate` function pairs up an index number (counting from 0) with each item, returning each pair as a tuple that can be unpacked.

In [None]:
# We can hide the 'Nones' by assigning them to a variable that we then ignore
# (when you only want the printed output, a plain 'for' loop is usually clearer).
ignore = [print(i, b) for i, b in enumerate(fruit)]
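A sketch of `zip` and `enumerate` returning data rather than printing it. The `start` argument of `enumerate` sets the first index, and `dict(zip(...))` pairs keys with values. The lists here are illustrative only.

```python
names = ["apple", "banana", "cherry"]
prices = [0.5, 0.25, 3.0, 9.99]     # one more item than names

pairs = list(zip(names, prices))    # zip stops at the shorter list
print(pairs)                        # [('apple', 0.5), ('banana', 0.25), ('cherry', 3.0)]

price_lookup = dict(zip(names, prices))
print(price_lookup["banana"])       # 0.25

numbered = [f"{i}. {name}" for i, name in enumerate(names, start=1)]
print(numbered)                     # ['1. apple', '2. banana', '3. cherry']
```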

## Practical Example
In this example we are going to read in data from csv data files (you could also read in data from Excel, but we will look at that later).
1. Read in data from a *comma-separated values* file (csv) using the `csv` library.
2. Clean data - convert from text to floats
3. Plot data (using `matplotlib`)

In [None]:
import csv                       # import the library for reading CSV files
with open("data1.csv", newline="") as f:   # 'f' is the file object (newline="" is recommended by the csv docs)
    reader = csv.reader(f)       # 'reader' yields the file contents row by row
    #next(reader)                # uncomment to skip first line if it contains titles
    data1 = []                   # creating an empty list ('data1')
    for row in reader:           # iterating through the file contents, line by line
        data1.append(row)        # adding a line to the list ('data1')

In [None]:
# This data is a list of lists (i.e. nested lists)
# Note that the data is interpreted as text, not as numbers
print(data1)

In [None]:
# We can use a list comprehension to convert the text to numbers (giving a nested list of floats)
# This uses unpacking to extract the three values from each row (each top-level item in the list).
data1f = [[float(x), float(y), float(z)] for x,y,z in data1]
data1f

In [None]:
# Reading the file can also be done using a list comprehension:
import csv
with open("data2.csv", newline="") as f:
    reader = csv.reader(f)
    #next(reader)    # uncomment to skip first line if it contains titles
    data2 = [r for r in reader]

In [None]:
print(data2)

In [None]:
data2f = [[float(x), float(y), float(z)] for x,y,z in data2]
data2f
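To try the read-and-convert steps without the CSV files, the sketch below feeds `csv.reader` an in-memory text buffer (`io.StringIO`) instead of a file. The sample text, including its header line, is made up for illustration; `next(reader)` skips the header, as in the commented-out lines above.

```python
import csv
import io

# Hypothetical sample data standing in for a file such as data1.csv
sample = "x,y1,y2\n0,1.0,2.0\n1,1.5,2.5\n2,2.0,3.5\n"

reader = csv.reader(io.StringIO(sample))
next(reader)                     # skip the header row
rows = [r for r in reader]       # each row is a list of strings
print(rows[0])                   # ['0', '1.0', '2.0']

# convert each row of text to floats using unpacking
rows_f = [[float(x), float(y), float(z)] for x, y, z in rows]
print(rows_f)                    # [[0.0, 1.0, 2.0], [1.0, 1.5, 2.5], [2.0, 2.0, 3.5]]
```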

### Plotting Data using MatPlotLib
Python has a [plotting library called `matplotlib`](https://matplotlib.org/) that is based on the MATLAB plotting paradigm. We will use the [pyplot functions](https://matplotlib.org/api/pyplot_api.html) to plot the data we have imported. More on this later. 

In [None]:
import matplotlib.pyplot as plt        # importing the plotting library with an alias 'plt'
# a 'magic' command to define how the plot is displayed (three of the options are 'widget', 'inline' & 'notebook' )
%matplotlib widget                   

In [None]:
# extract three sequences of data from a list of data triplets
x, y1, y2 = zip(*data1f)               # using unpacking to transpose the triplets into 3 tuples (x, y1 and y2 values)
plt.plot(x, y1, x, y2)                 # generating plot with two pairs of data
plt.show()                             # displaying the plot
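The `zip(*...)` line is worth examining. The `*` passes each triplet to `zip` as a separate argument, so `zip` groups all the first elements together, all the second elements together, and so on (a transpose). A sketch with a small, illustrative list of triplets:

```python
triplets = [[0.0, 1.0, 2.0],
            [1.0, 1.5, 2.5],
            [2.0, 2.0, 3.5]]

# zip(*triplets) is the same as zip(triplets[0], triplets[1], triplets[2])
x, y1, y2 = zip(*triplets)
print(x)     # (0.0, 1.0, 2.0)
print(y1)    # (1.0, 1.5, 2.0)
print(y2)    # (2.0, 2.5, 3.5)
```

Note that `zip` produces tuples; wrap them in `list()` if you need lists.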

## List Data Exercise
Read in the data from the two files (`data3.csv` and `data4.csv`), which contain vector information as x, y, z, and calculate the angle between the two vectors on each line using looping and/or list comprehension.
1. Start by creating a function that calculates the angle (hint: use the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) relationship, `cos(angle) = (A dot B) / (|A| * |B|)`, noting that `A dot B = Ax * Bx + Ay * By + Az * Bz`).
2. Apply that function over the list using either list comprehension or simple loops.

```python
def vector_angle(vec_1, vec_2):
    # your code goes here
    return angle
```
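As a hint for step 1 (not a full solution), the dot product and the length of a vector can each be written in one line using `zip`, `sum` and a list comprehension. The helper names `dot` and `length` are hypothetical; turning them into an angle is left to you.

```python
import math

def dot(vec_1, vec_2):
    # A dot B = Ax * Bx + Ay * By + Az * Bz
    return sum([a * b for a, b in zip(vec_1, vec_2)])

def length(vec):
    # |A| = sqrt(A dot A)
    return math.sqrt(dot(vec, vec))

print(dot([1, 2, 3], [4, 5, 6]))   # 32
print(length([3, 4, 0]))           # 5.0
```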

## Useful Resources
* The Python tutorial - https://docs.python.org/3.6/tutorial/index.html
>* Python Lists https://docs.python.org/3.6/tutorial/introduction.html#lists
* Google's Python class - https://developers.google.com/edu/python/
>* Lists https://developers.google.com/edu/python/lists
* The Python Course - http://www.python-course.eu/python3_course.php
>* Sequential data types - http://www.python-course.eu/python3_sequential_data_types.php
>* List manipulation - http://www.python-course.eu/python3_list_manipulation.php 
>* List comprehension - http://www.python-course.eu/python3_list_comprehension.php
