# Lab 2: Revision (2)

In today's lab we'll revise some more Python fundamentals as well as learn how to plot simple diagrams.

## Importing from modules

Python allows programmers to bundle up collections of variables, functions, and other objects such as classes into *modules* that can easily be imported into other projects. One reason Python is such an attractive programming language is the enormous library of modules to help with almost any imaginable type of calculation.

There are a few variations on the syntax to import part or all of a module. These make more sense when discussing a particular example, so let's take the case of the square root function `sqrt()`. As we saw last lab, this isn't included by default when Python is loaded, but is available in the `math` module.

* `import math` will import the whole `math` module. The `sqrt()` function will be available by typing `math.sqrt()`.
* `import math as m` will do the same, but calling the module `m` instead: *i.e.*, you should type `m.sqrt()`. This is convenient if you will use many parts of the module and/or it has a long name.
* `from math import sqrt` will import *only* the `sqrt()` function. It will be available without any prefix (*i.e.*, you should just type `sqrt()`). We say that this function has been imported into the *main namespace*.
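The three import forms above can be compared side by side. This is a minimal sketch using only the standard `math` module; the final line checks that all three names refer to the same module and function objects.

```python
import math                 # whole module, accessed with the math. prefix
import math as m            # the same module under a shorter alias
from math import sqrt       # just sqrt, placed directly in the main namespace

print(math.sqrt(16.0))      # 4.0
print(m.sqrt(16.0))         # 4.0
print(sqrt(16.0))           # 4.0
print(m is math, sqrt is math.sqrt)  # True True
```

Whichever form you choose, the function itself is identical; only the name you type changes.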

You can type `from math import *` to import *all* of the parts of a module into the main namespace. However, this isn't recommended usage, because it can have unintended consequences. To see what can happen, **evaluate the following code.** Can you explain the output?

In [37]:
d = 2
e = 3
from math import *
d*e

5.43656365691809

Now go to the `Kernel` menu at the top of the page and click `Restart` to clear the effects of that `import`!

## Visualisation

Modern computers have made visualisation so easy that it should be the first step of almost any data analysis: it is far easier to see what is going on from a picture than from, say, a table of numbers.

We will use the `matplotlib` package to make pictures, which plays very nicely with Jupyter notebooks. In fact in this lab we will specifically be using the `pylab` module: this is designed to behave similarly to MATLAB plotting, and contains all the functionality we will need for now.

Let's start by evaluating a "magic" command that allows plots to be written directly to our notebook:

In [79]:
%matplotlib notebook

Note that we only need to do this once: then plots will be displayed “inline” for the rest of the session.

Next, we need to import the `plot` and `figure` functions:

In [39]:
from pylab import figure, plot

Now we just need some data to plot! We will generate this at the same time as revising `while` loops by investigating the [*Collatz conjecture*](https://en.wikipedia.org/wiki/Collatz_conjecture). This involves a simple rule for generating a sequence of integers:

- If $n_0$ is even, halve it: $n_1 = n_0/2$.
- If $n_0$ is odd, multiply by three and add one: $n_1 = 3n_0 + 1$.

We generate $n_2$ from $n_1$ in the same way, and so on. The conjecture is that no matter what the starting value, the sequence will eventually reach 1. This has never been proven true, nor has any counterexample been found.
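Written as a single recurrence (a restatement of the two rules above, with $n_k$ denoting the $k$-th term of the sequence):

$$
n_{k+1} = \begin{cases} n_k / 2 & \text{if } n_k \text{ is even,} \\ 3 n_k + 1 & \text{if } n_k \text{ is odd.} \end{cases}
$$

For example, starting from $n_0 = 6$ gives $6 \to 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1$, which reaches 1 after eight steps.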

**Complete the following code for a function to perform one Collatz step.**

In [1]:
def collatz_step(n):
    """Returns the next step in the Collatz sequence, starting from n."""
    if n % 2 == 0: # the % operator on integers gives the remainder after division.
        return n // 2 # // performs *integer* division. / would give a float answer in Python 3.
    else: # n is odd
        return (3 * n) + 1
print(collatz_step(5))
print(collatz_step(7))

16
22

Now we can use a `while` loop to see a sample sequence. **Write a `while` loop that will set `n` to the next value given by `collatz_step` until it reaches 1.**

In [42]:
n = 12
while n > 1:
    n = collatz_step(n)
    print(n)

6
3
10
5
16
8
4
2
1


You can see that, because the numbers can either increase or decrease, it's not obvious in advance how many steps will be taken to reach 1.

**Experiment with starting with some different values.**

**Find the value of $n \leq 1000$ that takes the greatest number of steps to reach 1.**


In [56]:
step_counts = []  # step_counts[i] holds the number of steps taken from n = i + 1
for start in range(1, 1001):
    n = start
    steps = 0
    while n > 1:
        n = collatz_step(n)
        steps += 1
    step_counts.append(steps)

max_steps = max(step_counts)
best_start = step_counts.index(max_steps) + 1  # list index 0 corresponds to n = 1
print(best_start, max_steps)

871 178

Before discussing a checkpoint with a demonstrator, please do the following:
- ensure the code is working properly, especially for the questions in boldface
- ensure you understand how the code you have written works (e.g., the structure, data types, algorithm, etc.). Experiment with editing and running code to build intuition about how it functions.

When you are ready, call over a demonstrator to do your checkpoint.

▶ **CHECKPOINT 1**

This sequence would be easier to understand if we could visualise its trajectory. Let’s record each number that the sequence passes through on its way to 1.

**Modify your `while` loop to create a list of every number in the Collatz sequence, starting from the value of $n$ you determined above.**

*Hint:* Start by creating a list `collatz_history = [n]`, where `n` is the starting number; then use the `append` function to add an item to this list for every cycle through the loop.

In [57]:
n = 871
collatz_history = [n]
while n > 1:
    n = collatz_step(n)
    collatz_history.append(n)

Let’s examine that list:

In [58]:
print(collatz_history)

[871, 2614, 1307, 3922, 1961, 5884, 2942, 1471, 4414, 2207, 6622, 3311, 9934, 4967, 14902, 7451, 22354, 11177, 33532, 16766, 8383, 25150, 12575, 37726, 18863, 56590, 28295, 84886, 42443, 127330, 63665, 190996, 95498, 47749, 143248, 71624, 35812, 17906, 8953, 26860, 13430, 6715, 20146, 10073, 30220, 15110, 7555, 22666, 11333, 34000, 17000, 8500, 4250, 2125, 6376, 3188, 1594, 797, 2392, 1196, 598, 299, 898, 449, 1348, 674, 337, 1012, 506, 253, 760, 380, 190, 95, 286, 143, 430, 215, 646, 323, 970, 485, 1456, 728, 364, 182, 91, 274, 137, 412, 206, 103, 310, 155, 466, 233, 700, 350, 175, 526, 263, 790, 395, 1186, 593, 1780, 890, 445, 1336, 668, 334, 167, 502, 251, 754, 377, 1132, 566, 283, 850, 425, 1276, 638, 319, 958, 479, 1438, 719, 2158, 1079, 3238, 1619, 4858, 2429, 7288, 3644, 1822, 911, 2734, 1367, 4102, 2051, 6154, 3077, 9232, 4616, 2308, 1154, 577, 1732, 866, 433, 1300, 650, 325, 976, 488, 244, 122, 61, 184, 92, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1]


Impressive, but it would be much easier to understand if we plotted it...

In [83]:
from matplotlib.pyplot import plot, figure
figure()
plot(collatz_history) 

[<matplotlib.lines.Line2D at 0x461872a048>]

That gives a much better picture of what is going on: you can see how the value gradually rises until the sequence hits a number that is a multiple of some high power of 2, when the "halving" step makes it decrease rapidly. You can **try zooming in to the plot using the "rectangle" tool and returning to the original view using the "home" button. When you are finished, press the "power" button in the top right of the figure (next to the title)** to indicate that you no longer wish to change the figure. 

(You don’t actually even need the `figure` function above; the `plot` function will make a new figure if none is open, or plot to the most recent figure if it is still open for input: that is, if you haven’t yet pressed that “power” button. Because this can be confusing if the most recent figure is a long way back in the notebook, I recommend always explicitly creating a new figure with `figure` if you want one.)

You can see that the `plot` function called with just a list as argument plots that list against the sequence number (that is, the first element is plotted at $x = 0$, the second at $x = 1$, and so forth).
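To make those implicit $x$-values concrete, here is a small sketch in plain Python. No plotting is involved, and the list `values` is made-up illustrative data. It pairs each element with the index that `plot` would use as its $x$-coordinate.

```python
values = [12, 6, 3, 10, 5]  # illustrative data only

# plot(values) places each element at x = its position in the list,
# which is the same pairing that enumerate() produces.
points = list(enumerate(values))
print(points)  # [(0, 12), (1, 6), (2, 3), (3, 10), (4, 5)]

# The x-values on their own, as you would pass them explicitly: plot(x_values, values)
x_values = list(range(len(values)))
print(x_values)  # [0, 1, 2, 3, 4]
```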

The `plot` function can also take a string argument that specifies how to format the plot. Some codes you can use are listed in the table below (full list in the [documentation](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html)); you can specify the colour, marker style, line style, or any combination of these.

 character | meaning 
--------| ------
`'b'` |	blue
`'g'` |	green
`'r'` |	red
`'k'` |	black
`'.'` |	point marker
`'o'` |	circle marker
`'s'` |	square marker
`'*'` |	star marker
`'+'` |	plus marker
`'x'` |	x marker
`'-'` |	solid line style
`'--'`| 	dashed line style
`'-.'`| 	dash-dot line style
`':'` |	dotted line style

Try using this to plot the Collatz sequence starting from 12, using red circles joined by a dashed line. The syntax you want is `plot(list, format_string)`.

In [82]:
n = 12
collatz_history = [n]
while n > 1:
    n = collatz_step(n)
    collatz_history.append(n)
print(collatz_history)
figure()
plot(collatz_history, 'ro--') # red ('r') circles ('o') joined by a dashed line ('--')

[12, 6, 3, 10, 5, 16, 8, 4, 2, 1]


[<matplotlib.lines.Line2D at 0x4618438f60>]

Let's now consider how to plot a mathematical function. By far the easiest way to do this involves a new data type, provided by the `numpy` package, called an `array`. We will generate the array using the `linspace()` function; the function we want to plot is, say, $\cos(x)$, or `cos()`. For convenience, all of these are available in the `pylab` module and can be imported into the main namespace as follows:

In [84]:
from pylab import linspace, pi, cos

Suppose we want to plot $\cos x$ from $-2\pi$ to $2\pi$. The first step is to generate a set of $x$ values in this range, which is the job of `linspace`. We can get 50 evenly spaced $x$ points by:

In [85]:
x = linspace(-2*pi, 2*pi, 50)

If you evaluate a cell containing simply `x`, you will see what type `x` is and what it contains.  In this case you see that `x` is an `array`: 

In [86]:
x

array([-6.28318531, -6.02672876, -5.77027222, -5.51381568, -5.25735913,
       -5.00090259, -4.74444605, -4.48798951, -4.23153296, -3.97507642,
       -3.71861988, -3.46216333, -3.20570679, -2.94925025, -2.6927937 ,
       -2.43633716, -2.17988062, -1.92342407, -1.66696753, -1.41051099,
       -1.15405444, -0.8975979 , -0.64114136, -0.38468481, -0.12822827,
        0.12822827,  0.38468481,  0.64114136,  0.8975979 ,  1.15405444,
        1.41051099,  1.66696753,  1.92342407,  2.17988062,  2.43633716,
        2.6927937 ,  2.94925025,  3.20570679,  3.46216333,  3.71861988,
        3.97507642,  4.23153296,  4.48798951,  4.74444605,  5.00090259,
        5.25735913,  5.51381568,  5.77027222,  6.02672876,  6.28318531])

One key feature of this data type is that *we can apply mathematical functions to every element at once.* So to calculate our `y` values, we simply type:

In [87]:
y = cos(x)

Then plotting is as simple as

In [88]:
figure()
plot(x,y)

[<matplotlib.lines.Line2D at 0x4618a09a20>]

**Using this functionality, plot $y = \tanh(x)$ from –5 to 5. Put gridlines on your graph.** 

*Hint:* you can import almost any mathematical function you can think of, including `tanh()`, from `pylab`, and the `grid()` function from the same module will apply gridlines.

In [89]:
from pylab import grid, tanh
x = linspace(-5, 5)
y = tanh(x)
figure()
grid()
plot(x,y)

[<matplotlib.lines.Line2D at 0x4618cc2518>]

The `numpy` package and `array` type are very useful in scientific analysis and we will repeatedly return to them during this module.
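To see what `linspace` and the element-wise `cos(x)` call did above, here is a rough pure-Python sketch of the same calculation. The helper `my_linspace` is hypothetical, written only for illustration. It is not how `numpy` implements `linspace`, and it returns an ordinary list rather than an `array`.

```python
from math import cos, pi

def my_linspace(start, stop, num=50):
    """Return num evenly spaced floats from start to stop inclusive (illustrative helper)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

x_list = my_linspace(-2 * pi, 2 * pi, 50)

# With a plain list we must apply cos to each element ourselves;
# with an array, cos(x) does this for every element in one call.
y_list = [cos(value) for value in x_list]

print(len(x_list), x_list[0], x_list[-1])
print(y_list[0])
```

The array version is shorter to write and, for large data sets, usually much faster, which is why the rest of this module uses `numpy` arrays.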

▶ **CHECKPOINT 2**

We now return to further features of the Python standard library.

## Data types, continued

### Dicts

A `dict` (dictionary) is a set of *keys*, each of which has a *value*. Syntactically, these are written using curly brackets `{}`. The keys can be used to index the dict in the same way that numbers are used to index lists and tuples. This can be useful if we want to store information associated with, say, a particular name or ID number:

In [54]:
bank_account = {'Anthony': -0.42, 'Basil': 109287.29}
print(bank_account['Anthony'])

-0.42


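Note that dicts are indexed by key, *not* by position, and therefore cannot be indexed by number (unless of course we have specifically defined that number as a key). Since Python 3.7 a dict does remember the order in which its keys were inserted, but that order still cannot be used as an index: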

In [55]:
bank_account[0]

KeyError: 0

If we try to loop through a dict in the same way as for a list, our variable runs through the *keys* of the dict:

In [56]:
day_length = { # planetary day length in hours
"Mercury": 1408, 
"Venus": 5832, 
"Earth": 24, 
"Mars": 25, 
"Jupiter": 10, 
"Saturn": 11, 
"Uranus": 17, 
"Neptune": 16 
}

for planet in day_length:
    print("The day on", planet, "is", day_length[planet], "Earth hours long.")

The day on Mercury is 1408 Earth hours long.
The day on Venus is 5832 Earth hours long.
The day on Earth is 24 Earth hours long.
The day on Mars is 25 Earth hours long.
The day on Jupiter is 10 Earth hours long.
The day on Saturn is 11 Earth hours long.
The day on Uranus is 17 Earth hours long.
The day on Neptune is 16 Earth hours long.
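Besides looping over the keys, a dict also offers the `items()` and `values()` methods. The following is a short sketch on a cut-down copy of the dict above; the name `short_days` is just for illustration.

```python
short_days = {"Earth": 24, "Mars": 25, "Jupiter": 10}  # subset of the values above

# items() yields (key, value) pairs, so no second lookup is needed in the loop
for planet, hours in short_days.items():
    print("The day on", planet, "is", hours, "Earth hours long.")

# values() yields only the values
print(min(short_days.values()))  # 10

# The key with the smallest value: min() over the keys, compared by their values
shortest = min(short_days, key=short_days.get)
print(shortest)  # Jupiter
```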


You can add new keys to a dict simply by referring to them as if they already exist:

In [57]:
day_length['Pluto'] = 153 # Maintain the rage!

To see if a key exists in a dictionary, you can use the `in` operator:

In [58]:
if "Haumea" in day_length:
    print("Yes, the 'Haumea' key exists in the dict")
else:
    print("No, the 'Haumea' key does not exist in the dict")

No, the 'Haumea' key does not exist in the dict


Note that this is a logical test whose outcome is `True` or `False`, as shown below.

In [59]:
print("Haumea" in day_length)
print("Jupiter" in day_length)

False
True
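A related idiom, sketched here on a small stand-alone dict (`hours` is an illustrative name), is the `get` method. It looks up a key and returns a fallback value instead of raising `KeyError` when the key is missing.

```python
hours = {"Earth": 24, "Mars": 25}

print(hours.get("Mars"))         # 25
print(hours.get("Haumea"))       # None: the default fallback
print(hours.get("Haumea", -1))   # -1: an explicit fallback value

# get() does not add the missing key to the dict
print("Haumea" in hours)         # False
```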


Comprehensions work in a similar way: for instance, we could create a (rather trivial) dict of even and odd numbers using the following dict comprehension:

In [60]:
even = {i: i % 2 == 0 for i in range(10)} # the % operator gives the remainder after division by an integer
print(even)

{0: True, 1: False, 2: True, 3: False, 4: True, 5: False, 6: True, 7: False, 8: True, 9: False}


**Write a function to create a `dict` whose keys are the characters that occur in a string, and whose values are the number of times those characters occur.**

*Hint*: Using a `for` loop to loop through a string will set the loop variable to each character in turn. That is, if I start a loop by `for ch in "Hello":`, then the first time through the loop `ch` will be `'H'`, the second time `'e'`, and so forth.

*Example input*: `count_characters("The rain in Spain falls mainly on the plain.")` should return

    {' ': 8,
     '.': 1,
     'S': 1,
     'T': 1,
     'a': 5,
     'e': 2,
     'f': 1,
     'h': 2,
     'i': 5,
     'l': 4,
     'm': 1,
     'n': 6,
     'o': 1,
     'p': 2,
     'r': 1,
     's': 1,
     't': 1,
     'y': 1}

In [72]:
def count_characters(statement):
    """Return a dict mapping each character in statement to the number of times it occurs."""
    counts = {}  # created inside the function, so every call starts from an empty dict
    for ch in statement:
        if ch in counts:
            counts[ch] += 1
        else:
            counts[ch] = 1
    return counts

print(count_characters("The rain in Spain falls mainly on the plain."))

{'T': 1, 'h': 2, 'e': 2, ' ': 8, 'r': 1, 'a': 5, 'i': 5, 'n': 6, 'S': 1, 'p': 2, 'f': 1, 'l': 4, 's': 1, 'm': 1, 'y': 1, 'o': 1, 't': 1, '.': 1}
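Once you have written the loop yourself, it is worth knowing that the standard library already provides this behaviour. The sketch below uses `collections.Counter`, a `dict` subclass, on the same sentence; the variable names are illustrative.

```python
from collections import Counter

sentence = "The rain in Spain falls mainly on the plain."
counts = Counter(sentence)  # counts every character in one call

print(counts[' '], counts['n'], counts['a'])  # 8 6 5
print(counts.most_common(1))                  # [(' ', 8)]

# Cross-check against str.count() for each distinct character
print(dict(counts) == {ch: sentence.count(ch) for ch in set(sentence)})  # True
```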


### Sets

A `set` can only have one copy of each element, and has no defined order. Sets are also made using curly brackets `{}`, but without any keys:

In [73]:
favourite_letters = {'a', 'b', 'a', 'a', 'c'}
print(favourite_letters)

{'c', 'a', 'b'}


One reason to use them where appropriate is that sets are quicker to search through than lists:

In [74]:
my_list = ['a', 'b', 'a', 'a', 'c']
my_set = set(my_list) # another way of making a set

%timeit 'c' in my_list
%timeit 'c' in my_set

102 ns ± 0.862 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
42.3 ns ± 0.858 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


You should find that it takes less time to search for `'c'` in the set than in the list; this time difference becomes more significant when the size of the set grows.
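The effect of size can be checked with a larger collection. `%timeit` is an IPython "magic", so this sketch uses the standard-library `timeit` module instead, which also works outside a notebook. The collection size and repeat count are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

size = 10_000
big_list = list(range(size))
big_set = set(big_list)
target = size - 1  # worst case for the list: the last element

# A list is scanned element by element; a set uses hashing to go
# (almost) straight to the item, however large the set is.
list_time = timeit.timeit(lambda: target in big_list, number=1000)
set_time = timeit.timeit(lambda: target in big_set, number=1000)

print("list: {:.6f} s   set: {:.6f} s".format(list_time, set_time))
```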

## Format strings

This final section is really a matter of aesthetics – but it's nice to be able to make output look well-laid out and professional!

The `format()` function is a string *method* – a function that applies only to strings. It replaces curly brackets `{}` in the string with the values of its arguments. Consider the following simple example:

In [75]:
"{} is a moon of {}".format("Io", "Jupiter")

'Io is a moon of Jupiter'

You can include information in the curly brackets to specify exactly how a value should be converted to a string. This information always comes after a colon `:` and ends with a letter specifying the type of value that should be printed. Common examples include:

Letter | Meaning
------ | -------
   `f` | floating-point number, usual notation
   `e` | floating-point number, scientific notation
   `d` | integer, decimal
   `s` | string

In between, we can specify the number of spaces that should be left to print the number and, where relevant, the number of decimal places that should be used:

In [76]:
from numpy import pi

print("pi  = {}".format(pi))         # no special format information given
print("pi  = {:10.5f}".format(pi))   # float, 10 spaces, 5 decimal places
print("pi  = {:8.2f}".format(pi))    # float, 8 spaces, 2 decimal places
print("pi  = {:8.2e}".format(pi))    # float, scientific notation, 8 spaces, 2 decimal places
print("pi  > {:8d}".format(3))       # decimal integer, 8 spaces
print("pie = {:8s}".format('apple')) # string, 8 spaces (by default left-aligned)

pi  = 3.141592653589793
pi  =    3.14159
pi  =     3.14
pi  = 3.14e+00
pi  >        3
pie = apple   
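The same format specifications also work in *f-strings* (Python 3.6 and later), where the value is written inside the curly brackets, before the colon. This is a brief sketch, with `fruit` as an illustrative variable; it also shows the alignment characters `>` (right) and `^` (centre).

```python
from math import pi

fruit = 'apple'

print(f"pi  = {pi:10.5f}")    # same result as "pi  = {:10.5f}".format(pi)
print(f"pi  = {pi:8.2e}")     # scientific notation, 8 spaces, 2 decimal places
print(f"pie = {fruit:>8s}|")  # > right-aligns within 8 spaces
print(f"pie = {fruit:^9s}|")  # ^ centres within 9 spaces
```

The `|` at the end of the last two lines is there only to make the padding visible.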


Put this all together: use an appropriate loop and the `format` function to **print the following data set** so that the data lines up attractively.

In [166]:
element_densities = { # densities in g.cm^-3
    'Li': 0.534,
    'Be': 1.8477,
    'B' : 2.34,
    'C' : 2.267, 
    'Na': 0.968,
    'Mg': 1.738,
    'Al': 2.70
}   
for element in element_densities:
    # {:2s} pads the symbol to 2 characters; {:6.4f} prints 4 decimal places in 6 spaces
    print("Element {:2s} has density {:6.4f} g/cm^3".format(element, element_densities[element]))

Element Li has density 0.5340 g/cm^3
Element Be has density 1.8477 g/cm^3
Element B  has density 2.3400 g/cm^3
Element C  has density 2.2670 g/cm^3
Element Na has density 0.9680 g/cm^3
Element Mg has density 1.7380 g/cm^3
Element Al has density 2.7000 g/cm^3


In [100]:
geopoint = {'latitude':41.123,'longitude':71.091}
print('{latitude} {longitude}'.format(**geopoint))

41.123 71.091


▶ **CHECKPOINT 3**

**Extension:** return to your Collatz conjecture code and **find the value of $n \leq 1000$ that reaches the *highest number*** before returning to 1. What is this highest number?

In [None]:
703
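One way to check an answer like this is sketched below. It is an illustrative approach rather than the only one: `collatz_peak` is a hypothetical helper name, and `collatz_step` is redefined so that the cell runs on its own.

```python
def collatz_step(n):
    """Return the next term of the Collatz sequence after n."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def collatz_peak(start):
    """Return the largest value visited by the Collatz sequence from start (illustrative helper)."""
    n = start
    peak = n
    while n > 1:
        n = collatz_step(n)
        peak = max(peak, n)
    return peak

# Take the starting value whose peak is largest. max() returns the first
# maximal item, so ties go to the smallest starting value.
best_start = max(range(1, 1001), key=collatz_peak)
print(best_start, collatz_peak(best_start))
```

Passing `key=collatz_peak` to `max` avoids building a separate list of peaks, at the cost of recomputing the peak once more for the final `print`.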