# SAO/LIP Python Primer Course Lecture 4

In this notebook, you will learn about:
- List Comprehension
- Dictionaries
- The `numpy` Library
- `numpy` Array Generation
- Properties of Arrays

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/acorreia61201/SAOPythonPrimer/blob/main/lectures/Lecture4.ipynb)

By now, you've gotten a decent amount of experience with lists and iteration. Let's go over some more robust techniques for generating sequences of data, including with the `numpy` library, another widely-used library throughout scientific computing. 

## List Comprehension

First, let's go over *list comprehension*, a compact syntax for generating lists with iteration. In the last exercise set, you were asked to generate a list of numbers from $-\pi$ to $\pi$ using only what we've covered so far:

In [1]:
import math as m

test = list(range(-10, 11)) # list of 21 integers from -10 to 10 (recall the stop arg is excluded)

for i in range(len(test)):
    test[i] *= m.pi/10 # scale [-10, 10] to [-pi, pi]
    
test

[-3.141592653589793,
 -2.827433388230814,
 -2.5132741228718345,
 -2.199114857512855,
 -1.8849555921538759,
 -1.5707963267948966,
 -1.2566370614359172,
 -0.9424777960769379,
 -0.6283185307179586,
 -0.3141592653589793,
 0.0,
 0.3141592653589793,
 0.6283185307179586,
 0.9424777960769379,
 1.2566370614359172,
 1.5707963267948966,
 1.8849555921538759,
 2.199114857512855,
 2.5132741228718345,
 2.827433388230814,
 3.141592653589793]

We can actually combine the list generation and iteration into one statement using list comprehension. In fact, we can rewrite the first line altogether using list comprehension; let's do so down below:

In [2]:
test2 = [-10 + i for i in range(21)]
test2

[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You can see we get the same result as if we used the syntax above:

In [4]:
test = list(range(-10, 11))
test

[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Here, we can see the basic syntax for list comprehension. When defining a list, you use the format `new_list = [expression for item in iterable]`, where `expression` defines how to calculate each element and `item` represents whatever you're iterating over in your `iterable`. (Avoid naming the result `list`, since that would overwrite the built-in `list()` function.) Essentially, the syntax can be rewritten as a `for` loop as follows:

In [6]:
test2 = []
for i in range(21):
    test2.append(-10 + i)
    
test2

[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Let's rewrite all the code from above in one line using list comprehension:

In [8]:
test2 = [i * m.pi/10 for i in range(-10, 11)]
test2

[-3.141592653589793,
 -2.827433388230814,
 -2.5132741228718345,
 -2.199114857512855,
 -1.8849555921538759,
 -1.5707963267948966,
 -1.2566370614359172,
 -0.9424777960769379,
 -0.6283185307179586,
 -0.3141592653589793,
 0.0,
 0.3141592653589793,
 0.6283185307179586,
 0.9424777960769379,
 1.2566370614359172,
 1.5707963267948966,
 1.8849555921538759,
 2.199114857512855,
 2.5132741228718345,
 2.827433388230814,
 3.141592653589793]

Here, we iterate over the range $[-10, 10]$, and we generate a list where each element is a value in that range multiplied by $\pi/10$.

We can also apply `if` conditions when using list comprehension. For example, let's say I have the following list:

In [10]:
fruits = ['apples', 'blueberries', 'cherries', 'dates', 'eggplants']

Let's say I wanted to filter this list so that it only contains fruits that contain the letter a. We can do this by iterating over this list and saving them to a new list only if they contain an a:

In [11]:
a_fruits = [x for x in fruits if 'a' in x]
a_fruits

['apples', 'dates', 'eggplants']

(Note the `if` statement; the syntax `'substring' in string` is another condition that returns `True` if the string contains `'substring'`.) Here, we've placed the filter after the `for` statement that iterates over `fruits`. We can also include an `if`/`else` within the expression at the start to control what gets output when certain conditions are met. For example, let's say I wanted to make all the values in `test2` non-negative. I can write:

In [14]:
test3 = [x if x >= 0. else -x for x in test2]
test3

[3.141592653589793,
 2.827433388230814,
 2.5132741228718345,
 2.199114857512855,
 1.8849555921538759,
 1.5707963267948966,
 1.2566370614359172,
 0.9424777960769379,
 0.6283185307179586,
 0.3141592653589793,
 0.0,
 0.3141592653589793,
 0.6283185307179586,
 0.9424777960769379,
 1.2566370614359172,
 1.5707963267948966,
 1.8849555921538759,
 2.199114857512855,
 2.5132741228718345,
 2.827433388230814,
 3.141592653589793]

Here, I generate a new list `test3` using `test2`. If the value in `test2` is non-negative (i.e. greater than or equal to zero), I simply append that value to `test3`. Otherwise, if the value is less than zero, I negate it and add it to `test3`. There are many other uses for list comprehension; knowing when and how to use them can greatly simplify generating lists when doing numerical analyses.
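The two placements of `if` can also be combined in one comprehension. The cell below is a minimal sketch of that idea; the list `values` is made up for illustration and isn't part of the exercises.

```python
# Hypothetical sample data for illustration
values = [-3, -2, -1, 0, 1, 2, 3]

# Trailing `if`: a filter that decides WHICH items are kept
evens = [x for x in values if x % 2 == 0]

# Leading `if ... else`: decides WHAT each kept item becomes
magnitudes = [x if x >= 0 else -x for x in values]

# Both at once: keep only the even items, then make each one non-negative
even_magnitudes = [x if x >= 0 else -x for x in values if x % 2 == 0]

print(evens)            # [-2, 0, 2]
print(magnitudes)       # [3, 2, 1, 0, 1, 2, 3]
print(even_magnitudes)  # [2, 0, 2]
```

Note that the filter never needs an `else`, while the expression at the start always does.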

## Dictionaries

So far, all of the data structures we've encountered have been indexed entirely by ordered integer values. Sometimes, it's convenient to store values based not on arbitrary integer values but by some more descriptive or otherwise related value. In Python, there's another data type used exactly for this purpose: the *dictionary*. Unlike lists, whose elements are indexed by integers, dictionaries store *values* that can be referred to by *keys*.

Let's set up an analogy: Say I wanted to store the names of students in a class I'm teaching. I could easily do this using a list:

In [15]:
student_list = ['Alice', 'Ben', 'Carlos']

As you know, I can call the name of a student using their index in the list. Conversely, I could use `index()` to get what index the student's name takes up:

In [17]:
student_list[1]

'Ben'

In [18]:
student_list.index('Carlos')

2

This is trivial for three students, but what if I had 50 students? It would get annoying if I had to run `index()` every time I wanted to pull up a new student. Furthermore, what if I wanted to simultaneously store information about their grades with their names? I could just use tuples, but recall that tuples are immutable; I'd have to redefine the elements every time their grade updated, and I'd have to remember both their name and their grade to use `index()`.

We can simplify these problems using a dictionary:

In [19]:
student_dict = {'Alice': 92, 'Ben': 74, 'Carlos': 83}

This shows the basic syntax of a dictionary; the elements form a comma-separated list encased in braces `{}` with entries of the format `key: value`. The keys serve the same role as indices in tuples or lists, and calling them will refer to the values assigned to them. For example, if I want to read a value (e.g. the student's grade), I can search the dictionary for the key (e.g. the student's name):

In [20]:
student_dict['Alice']

92
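What happens if the key isn't in the dictionary? The sketch below shows three common ways of dealing with that case. It uses a separate copy of the data called `grades`, and the name `'Dana'` is a made-up example of a student who isn't in the class.

```python
# Same data as student_dict above, under a different name
grades = {'Alice': 92, 'Ben': 74, 'Carlos': 83}

# `in` checks whether a KEY exists (it does not search the values)
print('Ben' in grades)    # True
print('Dana' in grades)   # False

# Indexing with a missing key raises a KeyError
try:
    grades['Dana']
except KeyError:
    print('Dana is not in the dictionary')

# get() returns None, or a default of your choice, instead of raising
missing = grades.get('Dana')       # None
fallback = grades.get('Dana', 0)   # 0
print(missing, fallback)
```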

I can add new elements to the dictionary by key:

In [21]:
student_dict['Emma'] = 79
student_dict

{'Alice': 92, 'Ben': 74, 'Carlos': 83, 'Emma': 79}

Be careful when using this, however. Duplicate keys aren't allowed in a `dict`; using the code above for an existing key will replace its value:

In [22]:
student_dict['Alice'] = 72
student_dict

{'Alice': 72, 'Ben': 74, 'Carlos': 83, 'Emma': 79}

There are a couple of techniques I can use to list the keys in a `dict`. Converting a dictionary to a list with `list()` will return a list of the keys:

In [23]:
dict_list = list(student_dict)
dict_list

['Alice', 'Ben', 'Carlos', 'Emma']

There's also a method `items()` that will return the `key: value` pairs as an iterable of tuples:

In [24]:
student_dict.items()

dict_items([('Alice', 72), ('Ben', 74), ('Carlos', 83), ('Emma', 79)])

We can use this to print out the `key: value` pairs in a loop:

In [27]:
for name, grade in student_dict.items():
    print('{0} got a {1} on the last assignment'.format(name, grade))

Alice got a 72 on the last assignment
Ben got a 74 on the last assignment
Carlos got a 83 on the last assignment
Emma got a 79 on the last assignment
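Alongside `items()`, dictionaries also provide `keys()` and `values()`. As a rough sketch, here is how they could be used to summarize the grades; the dictionary `grades` simply copies the state of `student_dict` at this point in the lecture.

```python
# Copy of student_dict as it stands at this point
grades = {'Alice': 72, 'Ben': 74, 'Carlos': 83, 'Emma': 79}

names = list(grades.keys())      # the keys, in insertion order
scores = list(grades.values())   # the values, in the same order

# Class average from the values alone
average = sum(scores) / len(scores)

# Key with the largest value: max() iterates over the keys and
# compares them by the value that grades.get returns for each one
top_student = max(grades, key=grades.get)

print(names)        # ['Alice', 'Ben', 'Carlos', 'Emma']
print(average)      # 77.0
print(top_student)  # Carlos
```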


If we want to delete an existing `key: value` pair from a dictionary, we can use the `del` keyword:

In [29]:
del student_dict['Emma']
student_dict

{'Alice': 72, 'Ben': 74, 'Carlos': 83}

When generating `student_dict`, I wrote exactly what gets printed to screen. However, there are some other ways to generate dictionaries that will produce the same result. I can use the `dict()` function and input a list of tuples representing the `key: value` pairs:

In [30]:
student_dict2 = dict([('Alice', 92), ('Ben', 74), ('Carlos', 83)])
student_dict2

{'Alice': 92, 'Ben': 74, 'Carlos': 83}

We can also use `dict()` with keyword arguments of the form `key=value`. The argument names will then be interpreted as string keys when the dictionary is created (so this only works for keys that are valid Python names):

In [31]:
student_dict3 = dict(Alice=92, Ben=74, Carlos=83)
student_dict3

{'Alice': 92, 'Ben': 74, 'Carlos': 83}

We can even use a method similar to list comprehension to automatically generate dictionaries. For example, let's say I wanted to create a dictionary where the square roots of integers are keyed by the integers themselves. I can write:

In [28]:
from math import sqrt

sqrt_dict = {x: sqrt(x) for x in range(11)}
sqrt_dict

{0: 0.0,
 1: 1.0,
 2: 1.4142135623730951,
 3: 1.7320508075688772,
 4: 2.0,
 5: 2.23606797749979,
 6: 2.449489742783178,
 7: 2.6457513110645907,
 8: 2.8284271247461903,
 9: 3.0,
 10: 3.1622776601683795}

Using dictionaries is invaluable in scientific programming, such as when you need to keep track of multiple parameter values during a simulation. Deciding between using tuples, lists, and dictionaries is another of the many skills you'll develop as you become more familiar with Python.
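To make the simulation use case concrete, here is a small sketch that stores the parameters of a toy projectile calculation in a dictionary. The parameter names (`g`, `v0`, `dt`, `n_steps`) and their values are assumptions chosen for illustration, not part of any particular library.

```python
# Hypothetical parameters for a toy projectile calculation
params = {'g': 9.81, 'v0': 20.0, 'dt': 0.5, 'n_steps': 4}

# Times at which to evaluate the height
times = [i * params['dt'] for i in range(params['n_steps'])]

# Height at each time: y = v0*t - g*t^2/2
heights = [params['v0'] * t - 0.5 * params['g'] * t**2 for t in times]

for t, y in zip(times, heights):
    print('t = {0} s, y = {1:.2f} m'.format(t, y))

# Changing a parameter only requires updating one entry
params['dt'] = 0.25
```

Because every parameter is looked up by name, the formula reads close to the physics, and there's no need to remember which position in a list holds which quantity.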

## The `numpy` Library

There are even more types for storing multiple data elements that we can retrieve from libraries. One of these libraries is `numpy`, used throughout scientific programming for doing numerical linear algebra and advanced mathematics.

This library, along with many others, may not come prepackaged with all Python installations. To install packages, we can use `pip`, the standard package installer for Python. We can install `numpy` with the cell below. We use an operator `!` at the start, which in Jupyter Notebook runs the rest of the line as a shell command, as if we had typed it on a command line. If `numpy` is already installed, we'll get a line that says `Requirement already satisfied: numpy in <path>`.

In [32]:
!pip install numpy



Now that we're sure that `numpy` is installed, we can import it like any other library:

In [33]:
import numpy as np

### The `ndarray` Type

One of the main features of `numpy` is the introduction of a new data type: the *array*. The simplest way of creating an array is by using the function `numpy.array()` to convert a list to this new data type:

In [41]:
vec = np.array([1, 2, 3])
vec

array([1, 2, 3])

We can see that, when printing out `vec`, we get the same list back, albeit encased in this new `array()` type. We can use `type` to see this further.

In [42]:
type(vec)

numpy.ndarray

The internal type for `numpy` arrays is `numpy.ndarray`, which stands for *n-dimensional array*. This name implies an important difference over Python lists and dictionaries: we can generate multidimensional data structures. For example, all of the data types we've seen so far have been either zero-dimensional (i.e. single values like strings and numbers) or one-dimensional (i.e. lists, tuples, dictionaries). In `numpy`, we can generate a two-dimensional array using `numpy.array()`:

In [45]:
mat = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

We can see that this is essentially what we'd consider a matrix. The syntax for generating this array is a list of lists, with each list representing a row in the matrix. The function requires that all of the nested lists are the same *shape*:

In [47]:
mat2 = np.array([[1, 2, 3, 4], [4, 5], [6, 7, 8, 9]])
mat2

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

Shape is a generalized term for the length of a list or 1D array. We can use `numpy.shape()` to look at the shapes of our arrays `vec` and `mat` from above:

In [48]:
np.shape(vec)

(3,)

In [49]:
np.shape(mat)

(3, 4)

In the case of `vec`, there's only one dimension to consider, called an *axis*. Since this axis has three elements, the array has a shape of `(3,)`. In the case of `mat`, there are two axes to consider. The first axis counts the number of rows in the array, while the second axis counts the number of columns. So, the 3 by 4 matrix above has a shape of `(3, 4)`, just as you'd expect from its matrix dimensions.

Let's look at a three-dimensional array:

In [53]:
tensor = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], 
                  [[21, 22, 23, 24], [25, 26, 27, 28], [29, 30, 31, 32]]])
tensor

array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]],

       [[21, 22, 23, 24],
        [25, 26, 27, 28],
        [29, 30, 31, 32]]])

This new object `tensor` can now be thought of as two 3x4 matrices stacked on top of each other. (In linear algebra and physics, this is known as a *rank 3 tensor*, a generalization of vectors and matrices. On this note, vectors are rank 1 tensors and matrices are rank 2 tensors.) What do you think its shape will be?

In [54]:
np.shape(tensor)

(2, 3, 4)

You can think of the axes as "counting down" from the highest order. In this case, the shape of a 3-dimensional array will first measure how many 2D arrays there are, then how many 1D arrays there are per 2D array, then how many 0D arrays (aka elements) there are per 1D array.

There are two other attributes for arrays that can tell you about their properties. The attribute `ndim` returns the number of dimensions in an array:

In [77]:
tensor.ndim

3

In [78]:
mat.ndim

2

The attribute `size` returns how many elements there are in total in the array. This should be equivalent to the product of the axis shapes (i.e. `tensor.size` should equal `2*3*4 = 24`, which we can check below):

In [79]:
tensor.size

24

In [80]:
mat.size

12
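The relationships between `shape`, `ndim`, and `size` can be checked directly. The sketch below uses the `shape` attribute, which gives the same tuple as `numpy.shape()`, and builds a stand-in array with `np.zeros()` that has the same shape as `tensor` (the values don't matter here).

```python
import math
import numpy as np

# Stand-in with the same shape as `tensor` above
tensor = np.zeros((2, 3, 4))

print(tensor.shape)                             # (2, 3, 4)
print(np.shape(tensor) == tensor.shape)         # True
print(len(tensor.shape) == tensor.ndim)         # True: one entry per axis
print(math.prod(tensor.shape) == tensor.size)   # True: 2*3*4 = 24

# len() of an array only counts along the first axis
print(len(tensor))                              # 2
```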

If we want to call elements from arrays, we can use a syntax similar to Python lists. For simplicity, we'll look at the matrix `mat`, which I'll copy down below for reference:

In [85]:
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

If I want to read off a single element, I need to specify its position in all axes. Keep in mind that indexing starts from zero, just like with Python lists:

In [86]:
mat[2, 3]

12

In [87]:
mat[0, 2]

3

If I want to call an entire subarray, I can simply use one index indicating its position along the axis:

In [88]:
mat[1]

array([5, 6, 7, 8])

You can also put a condition inside the brackets to select only the values that satisfy it. For example, say I wanted only the even elements:

In [90]:
mat[mat%2 == 0]

array([ 2,  4,  6,  8, 10, 12])
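It can help to look at what the condition produces on its own. As a sketch: the comparison is evaluated element by element and yields an array of `True`/`False` values (often called a *mask*) with the same shape as `mat`, and indexing with that mask returns the selected elements as a 1D array.

```python
import numpy as np

mat = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# The condition alone gives a Boolean array shaped like mat
mask = mat % 2 == 0
print(mask.shape)   # (3, 4)
print(mask.dtype)   # bool

# Indexing with the mask flattens the selected values into a 1D array
evens = mat[mask]
print(evens)

# Conditions are combined with & (and) and | (or); the parentheses are required
middle = mat[(mat > 3) & (mat < 9)]
print(middle)
```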

Slicing also works similarly to lists, as long as you do so over the correct axes:

In [94]:
mat[1:2]

array([[5, 6, 7, 8]])

In [100]:
mat[1:2, :3]

array([[5, 6, 7]])

In [102]:
mat[:, 1:2]

array([[ 2],
       [ 6],
       [10]])
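One detail worth noticing in the outputs above is that a slice keeps its axis even when it selects a single row or column, whereas a plain integer index removes that axis. A short sketch comparing the resulting shapes:

```python
import numpy as np

mat = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

col_slice = mat[:, 1:2]   # slice: still 2D, a single column
col_index = mat[:, 1]     # integer index: the axis is dropped, result is 1D

print(col_slice.shape)    # (3, 1)
print(col_index.shape)    # (3,)

# Slicing both axes picks out a rectangular block
block = mat[:2, 1:3]
print(block.shape)        # (2, 2)
```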

### Generating Arrays

There are several functions in `numpy` that can be used to generate arrays automatically. We've already looked at `numpy.array()`, the "by hand" method of generating arrays using nested lists. We can also generate arrays full of zeroes or ones using the functions `numpy.zeros()` and `numpy.ones()`:

In [60]:
zero_mat = np.zeros(4)
zero_mat

array([0., 0., 0., 0.])

In [61]:
one_mat = np.ones((4, 3))
one_mat

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

From the examples above, each array generation function can take two types of inputs. Inputting an integer returns a 1D array with that number of elements. Inputting a tuple of n integers returns an n-dimensional array with the shape implied by the tuple.
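A quick sketch of the two input types, along with a common slip: the shape of a multidimensional array has to be passed as a single tuple. Writing `np.zeros(4, 3)` instead makes `numpy` try to read the `3` as a data type, which fails.

```python
import numpy as np

a = np.zeros(4)      # integer: 1D array with 4 elements
b = np.zeros((4,))   # one-element tuple: the same thing
print(a.shape == b.shape)   # True

c = np.ones((2, 3, 4))      # tuple of 3 integers: 3D array
print(c.ndim, c.size)       # 3 24

# Forgetting the inner parentheses is an error, not a 4x3 array
try:
    np.zeros(4, 3)
    raised = False
except TypeError:
    raised = True
print(raised)               # True
```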

There's also a function `numpy.empty()` used for creating an "empty" array. Why is it "empty"? Let's see below:

In [62]:
empty_mat = np.empty(4)
empty_mat

array([0., 0., 0., 0.])

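It's not actually an empty array; every element of a `numpy` array has to hold some value. What `empty()` skips is *initialization*: it reserves memory for the array but doesn't set the values, so the contents are whatever happened to be in that memory (here they happen to be zeros, but you shouldn't rely on that). This makes `empty()` somewhat faster than `zeros()` or `ones()`, which can become important when considering arrays with millions of elements. It uses the same amount of memory, however, and you should assign every element before reading from the array.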
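A typical pattern, sketched below, is to create the array with `empty()` and then immediately fill in every element. The name `squares` and the choice of values are just for illustration.

```python
import numpy as np

n = 5
squares = np.empty(n)   # placeholder values; do not read them yet

# Assign every element before using the array
for i in range(n):
    squares[i] = i**2

print(squares)          # the values 0, 1, 4, 9, 16 stored as floats
```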

There are some functions that allow you to construct special matrices that are useful in linear algebra. One of these is `numpy.eye()`, which generates the *identity matrix* (i.e. a matrix whose only non-zero elements are ones along the diagonal) with the specified number of rows and columns.

In [81]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

This is a special case of a *diagonal matrix*, whose only non-zero elements also lie along the diagonal but can take any value. We can generate a matrix like this using `numpy.diag()`:

In [82]:
np.diag([1, 2, 3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

The first argument (i.e. the only argument above) gives the diagonal elements, and the function automatically generates a matrix with a shape that fits the diagonal. We can offset the diagonal with an optional second integer argument:

In [83]:
np.diag([1, 2, 3], 1)

array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

In [84]:
np.diag([1, 2, 3], -2)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0]])
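Offsets become useful when building banded matrices. As a sketch, the cell below assembles a tridiagonal matrix from three `diag()` calls. It adds the arrays together element by element with `+`, which is a preview of the array arithmetic covered in the next lecture; the size `n = 4` and the entries `2` and `-1` are arbitrary choices.

```python
import numpy as np

n = 4
main = np.diag([2] * n)               # main diagonal, shape (4, 4)
upper = np.diag([-1] * (n - 1), 1)    # one above the main diagonal
lower = np.diag([-1] * (n - 1), -1)   # one below the main diagonal

# All three are 4x4, so they can be added element by element
tri = main + upper + lower
print(tri)

# Given a 2D array, diag() works in reverse and extracts the diagonal
print(np.diag(tri))
```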

There are even some functions that serve similar purposes to those we've seen in base Python. One of these is `numpy.arange()`; from the name you can probably guess that this generates an array of evenly spaced values similar to how we can use `list(range())`. The syntax is exactly the same as that you've probably become familiar with for `range()`:

In [64]:
rng = np.arange(11)
rng

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [65]:
rng2 = np.arange(2, 9, 2)
rng2

array([2, 4, 6, 8])

One limitation of `range()` is that it only works with integers. `arange()` does accept `float` arguments, but with a non-integer step, rounding errors can make the number of elements hard to predict. For evenly spaced `float`s, it's usually better to use `numpy.linspace()`. The syntax is slightly different from `range()` and `arange()`:

In [68]:
lin_grid = np.linspace(0., 10., 5)
lin_grid

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

The arguments are `linspace(start, stop, num)`, where `stop` is now included in the range and `num` represents the number of elements in the resultant array.
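As a sketch of how this ties back to the start of the lecture, `linspace()` can produce the grid from $-\pi$ to $\pi$ in a single call. The comparison uses `numpy.allclose()`, which checks that two sets of values agree to within a small tolerance. The last line uses the optional `endpoint` keyword to exclude `stop`, mimicking the behavior of `range()`.

```python
import math
import numpy as np

# 21 evenly spaced points from -pi to pi, both ends included
grid = np.linspace(-math.pi, math.pi, 21)

# The list comprehension version from earlier in the lecture
from_list = [i * math.pi / 10 for i in range(-10, 11)]

print(len(grid))                      # 21
print(np.allclose(grid, from_list))   # True

# endpoint=False leaves out the stop value
half_open = np.linspace(0., 10., 5, endpoint=False)
print(half_open)                      # the values 0, 2, 4, 6, 8 as floats
```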

For completeness, let me highlight one more point: the values in these arrays aren't of the type `float` or `int` as you'd expect. We can see this by selecting one of the elements from `lin_grid` and `rng`:

In [69]:
type(lin_grid[1])

numpy.float64

In [70]:
type(rng[2])

numpy.int64

`numpy` has its own data types for numbers, such as `numpy.float64` for floating-point values and `numpy.int64` for integers. For most purposes these behave like regular Python numbers. In particular, `numpy.float64` uses the same 64-bit floating-point representation as Python's `float`, so it's subject to the same rounding errors. Let's recreate an example we did in an earlier lecture, where we added 0.1 ten times:

In [74]:
test = np.array([0.1 for i in range(10)])
test

array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

There's a method `sum()` for arrays that adds up its elements:

In [75]:
test.sum()

1.0
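As a side check, the sketch below compares a plain Python loop, `math.fsum()` from the standard library, and the array `sum()`. It suggests that the different result comes from how the additions are carried out rather than from a different kind of number. The tolerance `1e-12` is an arbitrary choice for illustration.

```python
import math
import numpy as np

# numpy.float64 is a subclass of Python's float and stores 0.1 identically
x = np.float64(0.1)
print(isinstance(x, float))   # True
print(x == 0.1)               # True

# Adding 0.1 ten times, one after another, accumulates rounding error
total = 0.0
for _ in range(10):
    total += 0.1
print(total)                  # 0.9999999999999999

# math.fsum() tracks the lost digits and returns a correctly rounded sum
print(math.fsum([0.1] * 10))  # 1.0

# The array sum groups the additions differently, so its error is smaller
arr_total = np.array([0.1] * 10).sum()
print(abs(arr_total - 1.0) < 1e-12)   # True
```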

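Note that `test.sum()` returning exactly `1.0` doesn't mean the floating-point errors have gone away: `sum()` adds the elements in a different order than a simple loop (it uses pairwise summation), which tends to reduce the accumulated rounding error without eliminating it. When using `arange()` or `linspace()`, we can force the elements to be of a certain type by using the `dtype` keyword argument: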

In [76]:
np.linspace(0, 10, 5, dtype=np.int64)

array([ 0,  2,  5,  7, 10])

There are many more ways to generate arrays for more specialized purposes. I encourage you to refer to the official `numpy` documentation at https://numpy.org/doc/stable/user/index.html#user if you're curious. We'll get into manipulating `numpy` arrays with transformations and arithmetic in the next lecture.