Linear algebra in Python

Topics: the dot product and the norm.
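As a first taste of the topics listed above, here is a minimal sketch of the dot product and the Euclidean norm of a vector, written once with plain lists and once with NumPy's `np.dot` and `np.linalg.norm`. The helper names `dot` and `norm` are our own, chosen for illustration.

```python
import math

import numpy as np


def dot(a, b):
    """Dot product of two equal-length sequences: the sum of pairwise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean (L2) norm: the square root of the dot product of a with itself."""
    return math.sqrt(dot(a, a))


u = [1, 2, 3]
v = [4, 5, 6]

print(dot(u, v))     # 1*4 + 2*5 + 3*6 = 32
print(norm([3, 4]))  # 5.0

# The same operations using NumPy
print(np.dot(u, v))            # 32
print(np.linalg.norm([3, 4]))  # 5.0
```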



## Iterables

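Iterables are fundamental in data processing: an iterable is any object we can loop over one item at a time, such as a list, a string or a file.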

## The Python `list`

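A list is written with square brackets: `[]` is an empty list, and `[1, 'a', 2.5]` is a list of three items of different types.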

We can use the Python built-in function `range` to generate a sequence of numbers, and convert it into a list with `list`:

In [1]:
import sys

x = list(range(100000))

x[:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can see how many items are in our list using the Python built-in `len`:

In [33]:
len(x)

100000

We can see if an item is in our list:

In [34]:
420 in x

True

We can see how many bytes the list object takes up with `sys.getsizeof`:

In [2]:
sys.getsizeof(x)

900112

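The Python list can hold objects of different types:
- this gives the programmer flexibility
- the cost is memory: the list stores a pointer to each item, and every item is a separate Python object carrying its own type information (`sys.getsizeof` counts only the list's pointers, not the objects they point to)

Lists also over-allocate: they reserve space for more items than they currently hold, so that `append` is fast on average.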
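A minimal sketch of both points, assuming CPython (the exact byte counts and the growth pattern are implementation details that vary between versions):

```python
import sys

# sys.getsizeof reports only the list's own storage (its array of pointers),
# not the size of the objects the list refers to
small = [1]
big = ['a' * 10_000]
print(sys.getsizeof(small) == sys.getsizeof(big))  # True in CPython

# Record the list's size after each append: because CPython over-allocates,
# the size stays flat for several appends and then grows in a step
grow = []
sizes = []
for i in range(20):
    grow.append(i)
    sizes.append(sys.getsizeof(grow))
print(sizes)
```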

Another common Python iterable is the NumPy array, which uses less memory:
- every element has the same type and size, so the values can be stored side by side in one contiguous block of memory, without a separate pointer and object for each item

https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

In [3]:
import numpy as np

sys.getsizeof(np.array(x))

800096
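`sys.getsizeof` on an array includes a small fixed header. The memory used by the values themselves is available from the array's `nbytes` attribute, which equals the number of elements (`size`) times the bytes per element (`itemsize`). A minimal sketch; the default integer dtype depends on the platform (commonly `int64` on 64-bit Linux and macOS).

```python
import numpy as np

arr = np.array(list(range(100000)))

print(arr.dtype)     # the single element type shared by every value
print(arr.itemsize)  # bytes per element
print(arr.nbytes)    # total bytes used by the element data
print(arr.nbytes == arr.size * arr.itemsize)  # True
```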

NumPy arrays are also faster, because their operations run as compiled C loops rather than in the Python interpreter:
- NumPy is, at its core, C code with Python bindings
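A rough way to see the speed difference is to time the same computation (the sum of squares of 100,000 numbers) as a Python loop and as a NumPy operation, using the standard-library `timeit` module. Timings vary by machine, so this sketch only prints them; the dtype is pinned to `int64` as an assumption, so the squares cannot overflow on platforms with a 32-bit default integer.

```python
import timeit

import numpy as np

values = list(range(100000))
arr = np.array(values, dtype=np.int64)


def python_sum_of_squares():
    # Every iteration is executed by the Python interpreter
    return sum(v * v for v in values)


def numpy_sum_of_squares():
    # The multiplication and the sum both run as compiled C loops
    return int((arr * arr).sum())


print(python_sum_of_squares() == numpy_sum_of_squares())  # True

print('python:', timeit.timeit(python_sum_of_squares, number=20))
print('numpy: ', timeit.timeit(numpy_sum_of_squares, number=20))
```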

### Iterating over a list

In [4]:
x = [2, 4, 6]

for item in x:
    print(item)

2
4
6


## List comprehensions

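Assigning to `_` is a convention showing that we don't intend to use the value. Here that value is a list of the `None`s returned by `print`, since we only want the printing side effect: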

In [5]:
_ = [print(item) for item in x]

2
4
6


The list comprehension will return a new list:
- we can make new lists from lists

In [6]:
double = [item*2 for item in x]

double

[4, 8, 12]
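A comprehension can also filter items with an `if` clause, keeping only the items for which the condition is true. A short sketch using the same `x` as above:

```python
x = [2, 4, 6]

# Keep only the items greater than 3, then double them
big_doubled = [item * 2 for item in x if item > 3]
print(big_doubled)  # [8, 12]
```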

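## Iteration turns an integral into a sum

We can approximate the integral of x² from 0 to 1 (exactly 1/3) by evaluating the integrand on a fine grid of points and adding up the areas of thin rectangles of width `step` (a Riemann sum). First, `scipy.integrate.quad` gives us a reference value: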

In [7]:
from scipy.integrate import quad

def integrand(x):
    return x**2

ans, err = quad(integrand, 0, 1)
ans

0.33333333333333337

In [8]:
import numpy as np

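x = np.linspace(0, 1, 1000000)  # 1,000,000 evenly spaced points on [0, 1]

step = x[1] - x[0]  # width of each rectangle

f = [integrand(v) for v in x]  # height of each rectangle

area = sum([step*v for v in f])  # total area: sum of width * height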
area

0.3333338333339938
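The Python loop above can be replaced by NumPy's element-wise operations. A minimal sketch using the trapezoidal rule (each strip's height is the average of the function values at its two edges, rather than a single value), written out by hand rather than relying on a particular NumPy integration function:

```python
import numpy as np


def integrand(x):
    return x**2


x = np.linspace(0, 1, 1000000)
step = x[1] - x[0]

# integrand works on the whole array at once: no Python-level loop
f = integrand(x)

# Trapezoidal rule: strips of width `step`, each with a height equal to
# the average of the function values at its left and right edges
area = step * (f[:-1] + f[1:]).sum() / 2
print(area)
```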

## Common patterns with looping

Appending to an empty list:

In [9]:
from random import gauss

data = []
for _ in range(5):
    data.append(gauss(0, 1))

data

[-0.9063632069079919,
 -0.6395199980692353,
 0.3156168957571385,
 -0.37433938900291447,
 0.24873817435934317]
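The same pattern can be written as a list comprehension, using `_` for the loop variable we don't need:

```python
from random import gauss

# Five draws from a standard normal distribution (mean 0, standard deviation 1)
data = [gauss(0, 1) for _ in range(5)]
print(data)
```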

Appending dicts to lists and making a pandas `DataFrame`:

In [10]:
from random import random as uniform

import pandas as pd

data = []
for _ in range(5):
    data.append(
        {'standard-normal': gauss(0, 1),
         'uniform': uniform()}
    )

pd.DataFrame(data)

|   | standard-normal | uniform |
|---|---|---|
| 0 | 1.49307 | 0.453665 |
| 1 | 0.353291 | 0.056607 |
| 2 | 1.502091 | 0.020804 |
| 3 | -0.395325 | 0.715927 |
| 4 | 2.427868 | 0.846765 |


Looping over two lists at the same time with `zip`:

In [11]:
f = list(range(0, 6))
s = list(range(6, 12))

for first, second in zip(f, s):
    print(first, second)

0 6
1 7
2 8
3 9
4 10
5 11
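`zip` stops as soon as the shortest input runs out, silently dropping any leftover items. A short sketch; `itertools.zip_longest` from the standard library is one way to keep them, padding the shorter input with a fill value:

```python
from itertools import zip_longest

shorter = [0, 1, 2]
longer = [6, 7, 8, 9, 10]

pairs = list(zip(shorter, longer))
print(pairs)  # [(0, 6), (1, 7), (2, 8)]

padded = list(zip_longest(shorter, longer, fillvalue=None))
print(padded)  # [(0, 6), (1, 7), (2, 8), (None, 9), (None, 10)]
```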


`enumerate` gives us an integer index alongside each item as we iterate:

In [12]:
x = list(range(100, 105))

for idx, item in enumerate(x):
    print(idx, item)

0 100
1 101
2 102
3 103
4 104


We can also start the index at a value other than zero by passing a second argument (`start`):

In [13]:
for idx, item in enumerate(x, 2):
    print(idx, item)

2 100
3 101
4 102
5 103
6 104


## Exercise

Create a **Cartesian product** - all the pairs formed by taking one item from each of the two lists:

In [14]:
colors = ['white', 'black']
sizes = ['small', 'medium', 'large']

## Indexing

Python uses **zero-based indexing**.

The first element is at index `0`:

In [15]:
x = list(range(100000))

x[0]

0

Negative indices count back from the end, so the last element is at `-1`:

In [16]:
x[-1]

99999
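A negative index `-n` is shorthand for `len(x) - n`. Indexing past either end raises an `IndexError`, which this sketch catches to show the message:

```python
x = list(range(100000))

print(x[-1] == x[len(x) - 1])  # True
print(x[-2])                   # 99998

try:
    x[100000]  # one past the last valid index
except IndexError as err:
    print('IndexError:', err)  # IndexError: list index out of range
```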

## Slicing

We can select a slice with `x[start:stop]`, which includes the item at `start` but stops before `stop`:

In [17]:
x[4:8]

[4, 5, 6, 7]
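The full form is `x[start:stop:step]`, and any of the three parts can be left out. A few variations on the slice above:

```python
x = list(range(100000))

print(x[4:8])     # [4, 5, 6, 7] - start included, stop excluded
print(x[:3])      # [0, 1, 2] - an omitted start means 0
print(x[-3:])     # [99997, 99998, 99999] - an omitted stop means the end
print(x[0:10:2])  # [0, 2, 4, 6, 8] - every second item
print(x[9::-3])   # [9, 6, 3, 0] - a negative step walks backwards
```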

## Strings are iterables

We can slice them:

In [18]:
gita = 'The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves - KRISHNA'

gita[:38]

'The ignorant work for their own profit'
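Because a string is an iterable of characters, the looping tools from earlier work on it too. A short sketch using a word from the quote:

```python
name = 'Arjuna'

# A comprehension over a string visits one character at a time
letters = [char for char in name]
print(letters)  # ['A', 'r', 'j', 'u', 'n', 'a']

# For strings, `in` checks for a substring
print('jun' in name)  # True

# enumerate works the same way as it does with lists
for idx, char in enumerate(name[:3]):
    print(idx, char)
```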

We can also add (concatenate) strings with `+` - below I use a comma followed by a space, `', '`, to separate the two quotes:

In [19]:
bohr = 'Prediction is very difficult, especially if it is about the future - NEILS BOHR'

quotes = gita + ', ' + bohr
quotes

'The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves - KRISHNA, Prediction is very difficult, especially if it is about the future - NEILS BOHR'

The result is a CSV (comma-separated values) string, so we can try parsing it with the `csv` module:

In [20]:
import csv

list(csv.reader([quotes]))

[['The ignorant work for their own profit',
  ' Arjuna: the wise work for the welfare of the world',
  ' without thought for themselves - KRISHNA',
  ' Prediction is very difficult',
  ' especially if it is about the future - NEILS BOHR']]

Above we can see a problem: the quotes themselves contain commas, so the CSV reader splits our two quotes into five fields.
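One way around this is to let the `csv` module write the string as well as read it: by default, `csv.writer` wraps any field containing the delimiter in double quotes, and `csv.reader` then treats each quoted field as a single value. A minimal sketch, using an in-memory `io.StringIO` buffer in place of a file:

```python
import csv
import io

gita = 'The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves - KRISHNA'
bohr = 'Prediction is very difficult, especially if it is about the future - NEILS BOHR'

# Write one row with two fields; fields that contain commas get quoted
buffer = io.StringIO()
csv.writer(buffer).writerow([gita, bohr])
print(buffer.getvalue())

# Reading it back gives the two original fields, with their commas intact
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows[0]))  # 2
```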

## Writing data to files

First - see [using an API]()
- this assumes familiarity with context managers (the `with` statement)

We can use Python's built-in `open` to write to a file; the `with` block closes the file automatically when it ends:

In [21]:
quotes = [bohr, gita]
with open('./quotes.txt', 'w') as dump:
    for line in quotes:
        dump.write(line)
        dump.write('\n')
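An open file is itself an iterable of lines, so reading the data back uses the same `for` loop as before. A self-contained sketch that writes into a temporary directory (via the standard-library `tempfile` module) so it doesn't touch `quotes.txt`:

```python
import os
import tempfile

quotes = [
    'Prediction is very difficult, especially if it is about the future - NEILS BOHR',
    'The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves - KRISHNA',
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'quotes.txt')

    # 'w' creates the file, or truncates it if it already exists
    with open(path, 'w') as dump:
        for line in quotes:
            dump.write(line + '\n')

    # Iterating over a file object yields one line at a time, including
    # the trailing newline, which rstrip removes
    with open(path) as source:
        loaded = [line.rstrip('\n') for line in source]

print(loaded == quotes)  # True
```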

In a Jupyter notebook, a leading `!` runs a shell command. Here we print the file (`cat`) and then remove it (`rm`):

In [22]:
!cat quotes.txt
!rm quotes.txt

Prediction is very difficult, especially if it is about the future - NEILS BOHR
The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves - KRISHNA


## Using a better data structure - the `namedtuple`

Above, each quote was a single string holding both the quote text and its author.

We could separate the two by splitting the string on `' - '`. Note that `split` returns a list, which is an iterable:

In [23]:
gita.split(' - ')

['The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves',
 'KRISHNA']

Let's instead try a different data structure - a Python `tuple`:

In [24]:
lewis = (
    'Friendship is born at the moment when one person says to another: ‘What? You too? I thought I was the only one.',
    'C. S. Lewis'
)

turing = (
    'In order to be a perfect and beautiful computing machine, it is not requisite to know what arithmetic is',
    'Alan Turing'
)

These tuples are acting as records - we can index them:

In [25]:
turing[0]

'In order to be a perfect and beautiful computing machine, it is not requisite to know what arithmetic is'

In [26]:
turing[1]

'Alan Turing'

The `namedtuple` offers the ability to name the fields:

In [27]:
from collections import namedtuple

Quote = namedtuple('Quote', ['text', 'author'])

Below we use the `*` notation to **unpack** (explode) each tuple or list into the positional arguments of `Quote`:

In [31]:
dataset = [
    Quote(*turing),
    Quote(*lewis),
    Quote(*gita.split(' - ')),
    Quote(*bohr.split(' - '))
]

dataset

[Quote(text='In order to be a perfect and beautiful computing machine, it is not requisite to know what arithmetic is', author='Alan Turing'),
 Quote(text='Friendship is born at the moment when one person says to another: ‘What? You too? I thought I was the only one.', author='C. S. Lewis'),
 Quote(text='The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves', author='KRISHNA'),
 Quote(text='Prediction is very difficult, especially if it is about the future', author='NEILS BOHR')]
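With a `namedtuple`, fields can be read by name as well as by position, and `_asdict` converts a record to a dictionary. Namedtuples are immutable, so `_replace` returns a modified copy instead of changing the original. A short sketch:

```python
from collections import namedtuple

Quote = namedtuple('Quote', ['text', 'author'])

turing = Quote(
    'In order to be a perfect and beautiful computing machine, it is not requisite to know what arithmetic is',
    'Alan Turing',
)

print(turing.author)               # Alan Turing
print(turing.author == turing[1])  # True
print(turing._asdict()['author'])  # Alan Turing

# Fields can't be reassigned; _replace builds a new Quote instead
shouted = turing._replace(author='ALAN TURING')
print(shouted.author, '/', turing.author)  # ALAN TURING / Alan Turing
```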

## Exercise - write a JSON file

Let's try another data format - JSON.

```json
[
    {
        "author": "KRISHNA",
        "text": "The ignorant work for their own profit, Arjuna: the wise work for the welfare of the world, without thought for themselves"
    },
    ...
]
```

Hints
- create a JSON string from a single namedtuple (loop over its fields and values), for example:
    - `{"author": "KRISHNA", ...}`
- create a list of these JSON strings
- dump them to a file

## List algebra

Lists support `+` (concatenation) and `*` (repetition), which is quite different from the element-wise addition and scalar multiplication of vectors in linear algebra:

In [1]:
data = [
    0, 1, 0, 1, 1
]

data * 2

[0, 1, 0, 1, 1, 0, 1, 0, 1, 1]

In [2]:
data + data

[0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
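This is where lists and vectors part ways: on lists, `*` and `+` repeat and concatenate, whereas on NumPy arrays they act element by element, as in linear algebra. A short comparison:

```python
import numpy as np

data = [0, 1, 0, 1, 1]
vec = np.array(data)

print(data * 2)   # [0, 1, 0, 1, 1, 0, 1, 0, 1, 1] - repetition
print(vec * 2)    # [0 2 0 2 2] - scalar multiplication
print(vec + vec)  # [0 2 0 2 2] - element-wise addition
```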