# SCI1022 Python workshop 2: The Collatz conjecture

Welcome to the second workshop of the Python stream! 

Some miscellaneous instructions to begin with:

* Use your two hours in this workshop to work through this Jupyter notebook. 

* Read the instructions as you go. Execute the example code cells provided, and look carefully at the output. 

* Recall that to execute a code cell, you can left-click on it (either inside the cell or in its left-hand margin) and then press `Shift-Enter`, that is, hold down the `Shift` key and press the `Enter` key.

* If you have trouble using the Jupyter notebook, consult the usage instructions and links available on the SCI1022 Python Moodle page.

* <span style="color:red">**Tasks**</span> are marked in <span style="color:red">**red**</span> and displayed in indented blocks. You solve the tasks by writing code in the blank code cells provided.

* Ask your instructors for help at any time.

---

## Introducing the Collatz conjecture

While last week we focused on biology (taxonomy in particular), this week we turn to pure mathematics in order to learn new coding skills.

But do not worry: we will consider a problem that anyone who understands the basics of whole-number arithmetic can grasp, namely addition, multiplication, and the quotient and remainder of integer division. (From now on, we will use the terms "whole number" and "integer" interchangeably.) Despite its simplicity, the problem remains an unsolved mystery of mathematics.

We are going to see how a very simple function, applied to its result over and over ("mathematical iteration"), can lead to very complicated behaviour. Computers are ideal for exploring such mathematics.

><span style="color:red">**Task 1.**</span> Define a function named `collatz` that takes a number as its only argument, and:
> * If given an even number, **returns** half the number.
> * If given an odd number, **returns** three times the number plus one.
>
> For simplicity, let us assume for the time being that your function will always be given a number of Python type `int` (i.e., a whole number), and that we will not guard against the user providing an argument of a different type, e.g., a `float`, or something more complex such as a `list`. Thus, the argument will **always** be either even or odd; it can never be neither. Once you have written the function, test it on a few integers using the second code cell provided below.
>
> *Hint*: Python provides the modulus operator `%`, which divides two numbers and returns the remainder. For example, `3 % 3` is equal to `0`, and `4 % 3` is equal to `1`.
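A few more evaluations of `%` may help before you start. The snippet below is only an illustration (the numbers are arbitrary): it shows that the remainder on division by `2` is `0` for even numbers and `1` for odd ones, which is one common way to tell them apart.

```python
# Remainders of integer division, computed with the % operator
print(3 % 3)    # 0: 3 divides 3 exactly
print(4 % 3)    # 1: 4 = 1*3 + 1
print(17 % 5)   # 2: 17 = 3*5 + 2

# The remainder on division by 2 separates even numbers from odd ones
for n in [10, 7, 0, 21]:
    print(n, "remainder when divided by 2:", n % 2)
```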

In [None]:
# Solution to Task 1 goes here

In [None]:
# Tests of your solution to Task 1 go here

><span style="color:red">**Task 2.**</span> Describe in words what happens if `collatz` is given a list of integers, for example, `[1, 7, 9]`.

In [None]:
# Write any code that you need to answer Task 2 here

<span style="color:red">**Write your answers to Task 2 in this cell.**</span>

### Integer division: a refresher

><span style="color:red">**Task 3.**</span> What is the data type of the value that your `collatz` function returns when it is given a value of type `int` as its argument? And when it is given a `float` with zero fractional part (for example, `7.0`)?

In [None]:
# Write any code that you need to answer Task 3 here

<span style="color:red">**Write your answers to Task 3 in this cell.**</span>

It would be convenient for `collatz` to always return a value of type `int`, i.e., an integer.

There are a few ways to achieve this in Python, for example, using the `int` function.

However, here we are going to explore Python's support for *integer division*.

You probably learnt integer division in school, before you learnt about decimals or fractions.

In Python, integer division is denoted by the `//` operator. It is also known as the **floor division** operator, because it divides two numbers and rounds the result down (towards negative infinity) to the nearest integer.
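The short comparison below contrasts ordinary division `/`, floor division `//`, and the `int` function mentioned above. It is a sketch for experimentation; the numbers are arbitrary. Note that "rounding down" matters for negative results, where `//` and `int()` disagree, and that `//` returns a `float` whenever one of its operands is a `float`.

```python
# True division (/) always produces a float, even when the result is whole
print(8 / 2, type(8 / 2))     # 4.0 <class 'float'>

# Floor division (//) of two ints produces an int
print(7 // 2, type(7 // 2))   # 3 <class 'int'>

# "Floor" means rounding down, towards negative infinity
print(-7 // 2)                # -4, not -3

# int() truncates towards zero instead, so it differs for negative values
print(int(-7 / 2))            # -3

# If either operand is a float, // still rounds down but returns a float
print(7.0 // 2)               # 3.0
```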

Execute the following cells to remind yourself about integer quotients and remainders, and see how these can be computed in Python.

In [None]:
numerator   = 7
denominator = 3

quotient  = numerator // denominator
remainder = numerator %  denominator

print(f"Integer division of {numerator} by {denominator} gives quotient={quotient} and remainder={remainder}")

Note that in the previous cell we used some new syntax in the call to `print`, known as *f-strings*. (Note the `f` prefix right before the opening double quote.) Do not worry about this for now: it does what you think it does!
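If you are curious, here is a minimal illustration of how f-strings behave. The variable names are arbitrary; the point is that each expression written inside braces is evaluated and its value is inserted into the string.

```python
# An f-string substitutes the value of each expression written in braces
name = "Collatz"
year = 1937
message = f"{name} stated his conjecture in {year}."
print(message)

# Any expression can go inside the braces, not only variable names
print(f"7 // 3 = {7 // 3} and 7 % 3 = {7 % 3}")
```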

Let us now look at the data types of `quotient` and `remainder`.

In [None]:
type(quotient)

In [None]:
type(remainder)
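Quotient and remainder are tied together by one identity: numerator = denominator × quotient + remainder, and when the denominator is positive the remainder satisfies 0 ≤ remainder < denominator. The cell below re-creates the values used above (so it runs on its own), checks the identity, and also shows the built-in `divmod` function, which returns the quotient and the remainder together as a pair.

```python
numerator   = 7
denominator = 3

quotient  = numerator // denominator
remainder = numerator %  denominator

# The defining identity of integer division
print(numerator == denominator * quotient + remainder)   # True
print(0 <= remainder < denominator)                      # True

# divmod returns the pair (quotient, remainder) in one call
print(divmod(numerator, denominator))                    # (2, 1)
```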

><span style="color:red">**Task 4.**</span> Modify your definition of `collatz` so that it returns a value of type `int`. Again, you can assume that your revised `collatz` function will always be called with an argument of type `int`.

In [None]:
# Solution to Task 4 goes here

## Hailstone sequences

On its own, the `collatz` function does not seem remarkable. Let us do something more interesting: let us iterate it!

><span style="color:red">**Task 5.**</span> Write code that, given an initial integer, applies the `collatz` function over and over again *indefinitely*, printing the result of each iteration on screen. For now, your code does not need to be a function. Try your code on several initial values. Do you notice any patterns? Some hints:
> * Use a `while True:` loop.
> * It is legal to use the same variable on both the left and the right of an assignment statement.
> * Remember that you can interrupt Python with `Kernel->Interrupt` from the menus at the top of the Jupyter notebook (or by pressing the `i` key twice in command mode).
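The first two hints can be illustrated with a loop that has nothing to do with `collatz`. The sketch below repeatedly doubles a number, reusing the same variable `x` on both sides of `=`. Unlike Task 5, it includes a `break` statement so that it stops by itself once the value exceeds 100; without those two lines it would run until interrupted.

```python
x = 1
while True:
    x = 2 * x        # the new x is computed from the old x
    print(x)
    if x > 100:      # stopping condition: leave the loop
        break
```

This prints 2, 4, 8, ..., 128, one number per line.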

In [None]:
# Code to solve Task 5 goes here

<span style="color:red">**Write your answer to the question in Task 5 in this cell.**</span>

## Verifying Collatz's conjecture programmatically

In 1937, the mathematician Lothar Collatz conjectured that this iterated sequence eventually reaches 1, whatever positive integer it starts from.

No one has been able to prove this, despite many mathematicians trying!
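For reference, the conjecture can be written compactly. Here $f$ is simply our name for the function that `collatz` computes:

$$
f(n) =
\begin{cases}
n/2 & \text{if } n \text{ is even},\\
3n+1 & \text{if } n \text{ is odd}.
\end{cases}
$$

The conjecture states that for every positive integer $n$ there is some $k \ge 0$ such that $f^{k}(n) = 1$, where $f^{k}$ means $f$ applied $k$ times.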

><span style="color:red">**Task 6.**</span> Modify the code above so that it stops when the sequence reaches 1. Change your print statement so that it writes the numbers onto one long line (the Jupyter notebook will wrap the line for you). To do this, pass the keyword argument `end=' '` to the `print` function, i.e., `print(something, end=' ')`, so that each number is followed by a space rather than a line break. Then try the following, and answer the last question:
> * Try your code for some small and large starting values.
> * Some quite small starting values produce long sequences. Try 27.
> * Some large numbers have very short sequences. Can you think of an example?
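The effect of the `end` keyword argument can be seen on any list of values. In this sketch (the list is arbitrary), `end=' '` puts a space after each number instead of the default newline, whereas `end=''` puts nothing at all, so the digits of consecutive numbers run together. A final bare `print()` ends the line.

```python
values = [3, 1, 4, 1, 5, 9]      # arbitrary example numbers

for v in values:
    print(v, end=' ')            # a space instead of the default newline
print()                          # finish the line

for v in values:
    print(v, end='')             # nothing between values: prints 314159
print()
```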

In [None]:
# Code to solve Task 6 goes here

<span style="color:red">**Write your answer to the question in Task 6 in this cell.**</span>

## Computing stopping times

We will follow the mathematicians and call the starting value the "seed", and the number of steps the sequence takes to reach one the "stopping time".
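As a worked example, assuming (as in Task 7 below) that the stopping time counts how many times `collatz` must be applied to reach one, here is the sequence for the seed 6:

$$
6 \to 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1
$$

There are 8 arrows, so the stopping time of 6 is 8, while the sequence itself contains 9 numbers (the seed and the final 1 included).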

While you can probably think of examples of seeds with short stopping times, we have to search for seeds with long stopping times. 

Let's do it.

><span style="color:red">**Task 7.**</span>
> * Write a function named `stoptime` that, given a seed, computes its hailstone sequence and returns the stopping time (the number of steps needed to reach one). Your function must use the `collatz` function you already wrote, and it should not print anything on screen.
> * Test your function with several seeds, and write a `for` loop that prints a table of the stopping times of the first 20 positive integers.
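If you have not printed a table with a `for` loop before, here is a sketch of the pattern using squares instead of stopping times. Note that `range(1, 6)` stops *before* 6, so it produces 1 to 5; by the same logic, `range(1, 21)` produces the first 20 positive integers. The format specification `:>4` in the f-string (right-align in a field 4 characters wide) is optional and only keeps the columns tidy.

```python
# Header line, then one row per value of n
print("   n  n**2")
for n in range(1, 6):
    print(f"{n:>4}  {n**2:>4}")
```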

In [None]:
# Code to solve Task 7 goes here: stoptime function definition statement

In [None]:
# Code to solve Task 7 goes here: add cells for your tests, and for your table-generating code.

## Revisiting the Hailstone sequences using lists

The code written in the previous section finds the length of a Hailstone sequence (the stopping time).

What if we wanted to find the highest value reached instead?

We could change the code and rerun it.

What if we wanted to plot the sequence?

What we really want is to store the sequence as a list. Then we can find its length (with the built-in `len` function; try it!), its maximum (with the built-in `max` function; try it!), plot it, or do anything else we can do with a list.
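For a first try, `len` and `max` can be applied to any list of numbers; the list below is arbitrary.

```python
data = [3, 41, 15, 9, 26]   # an arbitrary example list

print(len(data))   # 5: the number of elements
print(max(data))   # 41: the largest element
```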

><span style="color:red">**Task 8.**</span>
> Write a function named `hailstone` that, given a seed, returns a *list* containing the full sequence, starting from the seed and continuing until it reaches one.
>
> *Hint*: to add an element named `x` to the end of a list named `mylist`, use `mylist.append(x)`. Recall that here you are using a method named `append` associated with the object `mylist`. A method is similar to a function, but it is called with a slightly different syntax: a period (or dot) between the variable (object) and the method name. Calling a method is like making a request: we are asking `mylist` to add an element to its end.
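The usual pattern is to start from the empty list `[]` and call `append` once per loop iteration. Here is a sketch with squares (the name `squares` is just for illustration):

```python
squares = []                 # start from the empty list
for k in range(1, 6):
    squares.append(k * k)    # add one element to the end
print(squares)               # [1, 4, 9, 16, 25]
```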

In [None]:
# Code to solve Task 8 goes here

At first, we just printed out our sequences. This is fine for looking at the numbers, but we cannot do anything more with them. Now that we have a sequence as a list, we can compute with it, for example:

In [None]:
longseq = hailstone(6171)

In [None]:
len(longseq)

In [None]:
max(longseq)

## Plotting Hailstone sequences

Once we have the sequence as a list, we can plot it.

Plotting is not built into Python; it is provided by external modules (here, Matplotlib) that we import into our program.

Evaluate the cells below to visualise a short and a long hailstone sequence.

If you do not fully understand the code below, do not worry: plotting data will be covered in detail in later workshops.


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.dpi'] = 100
plt.rcParams['figure.figsize'] = [10, 6]

In [None]:
shortseq = hailstone(22)
plt.plot(shortseq)

In [None]:
longseq = hailstone(6171)
plt.plot(longseq)

In [None]:
plt.plot(longseq)
plt.yscale('log')

## Plotting stopping times

Individual hailstone sequences look fairly random.

But there is much more pattern and mystery in the stopping times!

><span style="color:red">**Task 9.**</span>
> * Use the functions developed so far to create a list `times` of all stopping times for seeds from `1` to `10000`.
> * Before iterating, initialise `times` to an empty list, which is written `[]` in Python.
> * Plot this list.
> * Use `plt.plot(times, '.')` to get a scatterplot rather than points joined by lines.

In [None]:
# Code to solve Task 9 goes here: code to build the list `times`

In [None]:
# Code to solve Task 9 goes here: code to plot the stopping times

That's the end of our exploration of the Collatz conjecture.

Hopefully you have seen that writing short functions, and using them together, enables you to solve increasingly complex problems without getting bogged down in the complexity.

Indeed, at this point you may think of programming as the process of breaking a large, complex task into smaller and smaller subtasks, until these are simple enough to be performed by a set of coherent, concise, and well-defined functions.

And lists are critically important for storing, analysing and displaying data! We will see much more of lists, and of other data types closely related to them.


## Learning outcomes

In this workshop you built a basic mathematical function `collatz`, and iterated it. Then you built functions that called `collatz`, ultimately collecting the output into a list for further processing and plotting.

These are most, if not all, of the ingredients of processing data with code.

The Collatz conjecture remains unsolved, and mathematicians consider it unlikely to be solved any time soon.

The coding skills you have learnt include:
* Conditionally-defined mathematical functions.
* Modular arithmetic (quotient and remainder operators).
* Awareness of implicit type conversions during calculations.
* Iteration of function calls with stopping condition.
* Iteration over a fixed range.
* Initialising lists, including the empty list `[]`.
* Creating lists by appending items.
* Operations on lists such as `len()` and `max()`.
* Plotting list data, as lines or points.