In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Notebook 1: Advanced Python

In this notebook, we refresh some of the ideas you have seen before and perhaps introduce some new ones.

## Strings

Okay, you have seen strings before, but let's make sure you know about f-strings. If we want to print a message with a value inside, we can do that using Python 3's f-strings:

In [None]:
pi = 3.14159265359
print(f"pi is equal to {pi}")


Note the `f` in front of the string: this makes it an f-string. You can also define a format. Say you wanted to print only 2 decimal places:

In [None]:
print(f"pi is equal to {pi:0.2f}")

Or, if you have a big number and want to use exponent notation:

In [None]:
big_number = 18947598428945.945
print(f"My big number is {big_number:0.4g}")

You can learn more about these formats in the [documentation](https://docs.python.org/3/library/string.html#formatspec). 
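As a brief illustration of that format mini-language, the sketch below shows a few other specifiers described in the documentation: a thousands separator, a fixed field width with alignment, and a percentage. The variable names and values are made up for this example.

```python
population = 67081000
fraction = 0.4567

# A comma in the format spec inserts a thousands separator
with_commas = f"{population:,}"

# ">12" right-aligns the value in a field 12 characters wide
right_aligned = f"{fraction:>12.3f}"

# "%" multiplies by 100 and appends a percent sign
as_percent = f"{fraction:.1%}"

print(with_commas)
print(right_aligned)
print(as_percent)
```

Fixed widths are particularly handy when printing several rows of numbers that should line up in columns.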

<div class="alert alert-block alert-danger">
<b>Challenge 1.1:</b> Use f-strings and the "Scientific notation" discussed in the documentation above to print the value of the gravitational constant in SI units, $6.67430\times10^{-11}$ N m$^2$/kg$^2$, to 3 significant figures.
</div>

## Functions and Errors

By now, you will no doubt have seen functions in python, but let's give an example anyway:

In [None]:
def my_function(x):
    return x.split("@")[0]

This is a function, but it is a badly written function. As a user, I have no idea what it does, what the inputs are, or what it should return! Let's fix that by adding a `docstring`:

In [None]:
def get_username(email_address):
    """ Returns the username from an email address
     
    Parameters
    ----------
    email_address: str
        The user's email address, e.g. user123@rhul.ac.uk

    Returns
    -------
    username: str
        The user's username

    Examples
    --------
    >>> get_username("user123@rhul.ac.uk")
    'user123'
    
    """
    return email_address.split("@")[0]

Okay, that is better. I changed the function name and added a docstring which tells me what the function does (even giving a nice example!). Here, we use the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) format.

But the function itself is still a bit weird. What happens if the user gives it an `int` instead?

In [None]:
get_username(123)

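Oh, that isn't very useful: the error complains that an `int` has no attribute `split`, rather than telling the user what they did wrong. Let's improve things by telling the user when the input is not valid: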

In [None]:
def get_username(email_address):
    """ Returns the username from an email address
     
    Parameters
    ----------
    email_address: str
        The user's email address, e.g. user123@rhul.ac.uk

    Returns
    -------
    username: str
        The user's username

    Examples
    --------
    >>> get_username("user123@rhul.ac.uk")
    'user123'
    
    """
    if isinstance(email_address, str) and "@" in email_address:
        return email_address.split("@")[0]
    else:
        raise ValueError(f"The input {email_address} is not a valid email address")

Okay, let's check that works as expected

In [None]:
get_username("user123@rhul.ac.uk")

In [None]:
get_username(123)

Okay, that is much better. 
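Raising an error also lets the calling code decide how to respond. As a minimal sketch (the list of inputs is made up, and the function is repeated without its docstring so that the cell is self-contained), we can catch the `ValueError` with `try`/`except` and carry on:

```python
def get_username(email_address):
    # Same logic as above, docstring omitted for brevity
    if isinstance(email_address, str) and "@" in email_address:
        return email_address.split("@")[0]
    else:
        raise ValueError(f"The input {email_address} is not a valid email address")


inputs = ["user123@rhul.ac.uk", 123, "not-an-email"]
usernames = []
for item in inputs:
    try:
        usernames.append(get_username(item))
    except ValueError as error:
        # The error message is available from the exception instance
        print(f"Skipping bad input: {error}")

print(usernames)
```

Only the valid address ends up in `usernames`; the two bad inputs are reported and skipped instead of stopping the program.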

<div class="alert alert-block alert-danger">
<b>Challenge 1.2:</b> Write a function which accepts as input a number and returns a string formatted to 2 decimal places. If the magnitude of the number is greater than $10^3$ or less than $10^{-3}$, it should use scientific notation and give 2 significant figures.
</div>

Docstrings come in very handy when we are using a package. For example, let's say I want to know how to use the `numpy` function `sin()`. I can run the commands

In [None]:
import numpy as np
help(np.sin)

If I am using `IPython` or `Jupyter` notebooks, I can get the same information by adding a question mark to the end of the function, e.g. 
```
[1] np.sin?
```

You can also add two question marks to take a look at the source code. This can be very useful if the function is not behaving as you expect it to!
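Outside of IPython or Jupyter, a similar result is available from the standard-library `inspect` module. The sketch below prints the first few lines of the source of `json.dumps`, chosen here only because it is a function written in pure Python; `inspect.getsource` cannot show the source of functions implemented in C.

```python
import inspect
import json

# Retrieve the source code of a function as a single string
source = inspect.getsource(json.dumps)

# Print only the first few lines to keep the output short
first_lines = source.splitlines()[:3]
print("\n".join(first_lines))
```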

<div class="alert alert-block alert-danger">
<b>Challenge 1.3:</b> You have already seen the Zen of Python in PH2150. But did you know you can print it at any time by running `import this`? Use the Jupyter question mark magic to look up the source code for the Zen of Python.
</div>

Perhaps the most fun one can have with functions is with recursive functions, that is, functions which call themselves. For example, imagine we have a nested data structure of the names of students in a school. It is given as a dictionary of dictionaries where the top level is the year group and the next level is the class name (they use animal names for each class). It might look something like this:

In [None]:
roster = dict(
    year_1=dict(
        tigers=["greg", "surabhi", "jamil", "nicolo"],
        elephants=["kayan", "casper", "emily"]
    ),
    year_2=dict(
        gazelle=["robert", "woody", "charlotte"],
        elephants=["robyn", "rory"]
    )
)

How can we count the total number of students? We can use a recursive function which calls itself!

In [None]:
def count_students(inputs, total=0):
    if isinstance(inputs, dict):
        for key in inputs:
            total = count_students(inputs[key], total)
    elif isinstance(inputs, list):
        total += len(inputs)
    return total
        
count_students(roster)

The nice thing is, we don't need to know in advance how deeply nested the dictionary is. If teachers add sub-classes, then it will still work.
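As a quick check of that claim, here is a sketch with a made-up roster in which one class has been split into two sub-classes. The function is repeated unchanged so that the cell is self-contained.

```python
def count_students(inputs, total=0):
    # Dictionaries are walked recursively; lists are counted directly
    if isinstance(inputs, dict):
        for key in inputs:
            total = count_students(inputs[key], total)
    elif isinstance(inputs, list):
        total += len(inputs)
    return total


# Hypothetical roster in which the tigers have been split into sub-classes
deeper_roster = dict(
    year_1=dict(
        tigers=dict(
            red=["greg", "surabhi"],
            blue=["jamil", "nicolo"],
        ),
        elephants=["kayan", "casper", "emily"],
    ),
)

n_students = count_students(deeper_roster)
print(n_students)
```

The extra level of nesting needs no change to the function: each dictionary it meets simply triggers another call.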

<div class="alert alert-block alert-danger">
<b>Challenge 1.4:</b> Write a recursive function to calculate the factorial function $f(N) = N\times(N-1)\times(N-2)\ldots1$ and use it to calculate $52!$.
</div>

## Classes

Classes are a powerful way to tie together data and the methods which act on that data. They also enable *inheritance*, that is, the idea that we can build classes from other classes, inheriting their methods. As an example, here is a set of classes which add dimensional units to Python floats.

In [None]:
class Unit(float):
    def __init__(self, value, si_base):
        """ Generic base class for units
        
        This base class should not be used directly, but all Unit classes should inherit from it
        
        Parameters
        ----------
        value: float
            The value of the float
        si_base: str
            The si base unit of the quantity
        """
        self.value = value
        self.si_base = si_base
        
    def __str__(self):
        """ When str() is called on the instance, return a string with units attached """
        return f"{self.value} [{self.units}]"
    
    @property
    def units(self):
        """ A units property, returns the si_base unit string
        
        Example
        -------
        >>> x = Unit(value=2, si_base="m")
        >>> x.units
        "m"
        """
        return self.si_base
    
    def __add__(u1, u2):
        """ Method to add two Unit instances together
        
        Note: this does not check that the units are the same!
        """
        return u1.__class__(u1.value + u2.value)
    
    def __sub__(u1, u2):
        """ Method to subtract one Unit instance from another
        
        Note: this does not check that the units are the same!
        """
        return u1.__class__(u1.value - u2.value)
    
    def __mul__(u1, u2):
        """ Method to multiply Unit instances together"""
        derived_si_base = f"{u1.si_base}*{u2.si_base}"
        return DerivedUnit(value=u1.value * u2.value, si_base=derived_si_base)
    
    def __truediv__(u1, u2):
        """ Method to divide Unit instances"""
        derived_si_base = f"{u1.si_base}/{u2.si_base}"
        # Divide (not multiply) the values to match the derived unit
        return DerivedUnit(value=u1.value / u2.value, si_base=derived_si_base)
    

class DerivedUnit(Unit):
    def __init__(self, value, si_base):
        """ A class for derived units, e.g. the product/division of two Unit classes """
        super().__init__(value, si_base)
        
class Distance(Unit):
    def __init__(self, value):
        """ SI units for distance """
        super().__init__(value, si_base="m")

        
class Time(Unit):
    def __init__(self, value):
        """ SI units for time """
        super().__init__(value, si_base="s")      

Okay, there is a lot going on in that cell. Skim over the documentation then take a look at the examples below.

We can define a distance and get a representation including units

In [None]:
x = Distance(10) 
x_as_a_string = str(x)
x_as_a_string

This conversion gets done automatically when we `print` the variable as well

In [None]:
print(x)

We can also combine quantities together

In [None]:
x = Distance(10) 
t = Time(3)

print(x / t)

Note this is a fairly limited implementation. The `Unit` classes above have one crucial issue: I can add distances and times together!

In [None]:
print(x + t)

<div class="alert alert-block alert-danger">
<b>Challenge 1.5:</b> Add a check to the `Unit` class so that a `ValueError` is raised when you try to add together quantities which don't have the same units.
</div>

There are many nice packages which implement units (for example [units](https://pypi.org/project/units/)). These include all sorts of clever features. But it is still fun to implement things from scratch to see how they work.
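To see the mechanism in isolation, here is a minimal sketch of operator overloading and properties in a much smaller class. The `Vector2D` class is hypothetical and unrelated to the `Unit` classes above. Python translates `a + b` into `a.__add__(b)`, and `print(a)` uses `a.__str__()`.

```python
import math


class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called by Python when evaluating `self + other`
        return Vector2D(self.x + other.x, self.y + other.y)

    def __str__(self):
        # Called by str() and print()
        return f"({self.x}, {self.y})"

    @property
    def length(self):
        # Accessed like an attribute (no brackets), but computed on demand
        return math.hypot(self.x, self.y)


a = Vector2D(1, 2)
b = Vector2D(2, 2)
c = a + b

print(c)
print(c.length)
```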

## Data

In computational physics, we frequently need to store data. This might be the output of a simulation, or data from a telescope or collider. Inside a Python program, it is often a good idea to collect data into a class. For example, let's say we have a time series of data recorded from a voltmeter. We could store it in a class like this:

In [None]:
class VoltageTimeSeries(object):
    def __init__(self, times, data):
        """ An object to store data 
        
        Parameters
        ----------
        times: array
            The array of times in seconds
        data: array
            The array of recorded voltages in Volts
        """
        self.times = times
        self.data = data
        
        
x = np.linspace(0, 1, 5)
y = np.sin(x)
timeseries = VoltageTimeSeries(x, y)

<div class="alert alert-block alert-danger">
<b>Challenge 1.6:</b> Add a `plot` method to the `VoltageTimeSeries` class which plots the data. Your method should add axis labels which can be set by the user or have default values.
</div>

## Storing and Reading Data

Now, you may want to store the time series data to disk for later analysis, or perhaps to publish it. There are many ways to do this, and you should choose the best method for the problem. Here we give a quick overview of some common read/write formats, implemented as functions which accept a `VoltageTimeSeries` instance.

### Comma-separated values (CSV) files

A very common format is the CSV file. For our example below, each row stores one time and one voltage, separated by a comma. We store these to disk by using the `print()` function with the optional argument `file`, which sends the output to the file instead of the screen.

In [None]:
def use_print_to_write_a_csv(timeseries, filename):
    # Open the file for writing ("w" creates it, or overwrites it if it exists)
    with open(filename, "w") as file:
        # Add a header to the file so people know what each column is
        print("#time[s],voltage[V]", file=file)
        
        # Loop over the rows and print them to the file
        for t, d in zip(timeseries.times, timeseries.data):
            print(f"{t},{d}", file=file)
        
use_print_to_write_a_csv(timeseries, "use_print_to_write_a_csv_example.csv")

Okay, let's have a look at what the file looks like (the `cat` command is a UNIX command to print the contents of a file)

In [None]:
! cat use_print_to_write_a_csv_example.csv

Now, we can read that data back by opening the file, looping over the elements, and converting them:

In [None]:
times = []
voltages = []
with open("use_print_to_write_a_csv_example.csv", "r") as file:
    for line in file:
        if line[0] == "#":
            pass
        else:
            t, v = line.split(',')
            times.append(float(t))
            voltages.append(float(v))

But this is a lot of code for a simple task! Fortunately, many packages provide a convenient way to read in files. We can use the `numpy` function [genfromtxt](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html), which is very powerful. Here we give it the filename and the "delimiter" (a comma); by default it skips lines starting with `#`, such as our header. It returns an array with one row per line of the file, so we transpose it (the `.T` bit) and unpack the two resulting rows into the time and voltage arrays:

In [None]:
time, voltage = np.genfromtxt("use_print_to_write_a_csv_example.csv", delimiter=",").T

In fact, `numpy` also provides a function to write such files, `savetxt`:

In [None]:
X = np.array([timeseries.times, timeseries.data]).T
np.savetxt("use_numpy_to_write_a_csv_example.csv", X, header="time[s],voltages[V]", delimiter=",")

In [None]:
! cat use_numpy_to_write_a_csv_example.csv
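As a quick sanity check, we can confirm that data written by `savetxt` is recovered by `genfromtxt`. This sketch uses a temporary directory and made-up file and variable names so that it does not touch the files written above.

```python
import os
import tempfile

import numpy as np

times_out = np.linspace(0, 1, 5)
voltages_out = np.sin(times_out)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "round_trip_example.csv")

    # Write two columns: time and voltage
    X_out = np.array([times_out, voltages_out]).T
    np.savetxt(path, X_out, header="time[s],voltage[V]", delimiter=",")

    # Read them back; lines starting with "#" are treated as comments
    times_in, voltages_in = np.genfromtxt(path, delimiter=",").T

# allclose compares within a small tolerance, which is safer than ==
# for floats that have been through a text file
round_trip_ok = np.allclose(times_out, times_in) and np.allclose(voltages_out, voltages_in)
print(round_trip_ok)
```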

<div class="alert alert-block alert-danger">
<b>Challenge 1.7:</b> Add a `save_data` method to the `VoltageTimeSeries` class which writes the data to a CSV file. Your method should take as input the filename to use.
</div>

### Binary files

Comma-separated files (and by analogy, space-separated files) are a great way to store data because anyone can read and write them (e.g., you can open your CSV files in Excel). But if you have several GB of data, they are very wasteful, since every number is written out as text. To this end, there are alternative ways to save data. One of them is the numpy `.npy` file format, which works like this:

In [None]:
X = np.array([timeseries.times, timeseries.data])
np.save("use_numpy_to_write_a_npy_example.npy", X)
X_loaded = np.load("use_numpy_to_write_a_npy_example.npy")

# Check they are identical
X == X_loaded
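To get a feel for the size difference, the sketch below writes the same made-up array of random numbers in both formats and compares the file sizes. The exact ratio depends on the text format used by `savetxt`, so no particular numbers are claimed here. It also uses `np.array_equal`, which reduces the element-wise comparison to a single `True` or `False`.

```python
import os
import tempfile

import numpy as np

# Made-up data: 10,000 rows of two float64 columns
rng = np.random.default_rng(seed=0)
X_big = rng.normal(size=(10_000, 2))

with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = os.path.join(tmpdir, "size_example.csv")
    npy_path = os.path.join(tmpdir, "size_example.npy")

    np.savetxt(csv_path, X_big, delimiter=",")
    np.save(npy_path, X_big)

    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)

    # A single True/False answer rather than an array of booleans
    identical = np.array_equal(X_big, np.load(npy_path))

print(f"CSV: {csv_size} bytes, npy: {npy_size} bytes")
print(identical)
```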

<div class="alert alert-block alert-danger">
<b>Challenge 1.8:</b> Add an optional argument "format" to your "VoltageTimeSeries.save_data" method. If the user gives "format" as "csv", it should write a CSV file. If instead they give "format" as "npy", it should write a binary file.
</div>

## User input

Most programs need some level of user input. This could be to tell the program where to store the data, set the free parameters of the simulation, or even some optional arguments to change the simulation. As a concrete example, here is a Python program using the BeautifulSoup module to scrape the BBC's On This Day archive site:

In [None]:
%%writefile on_this_day_script.py

import requests
from bs4 import BeautifulSoup as bs

def get_bbc_on_this_day_headline(month, day):
    url = f"http://news.bbc.co.uk/onthisday/hi/dates/stories/{month}/{day}/default.stm"
    article = requests.get(url)
    soup = bs(article.content, "html.parser")
    print("Headlines at the BBC today:")
    print(soup.body.find(class_="h1").text)
    
get_bbc_on_this_day_headline("february", 20)

In [None]:
! python on_this_day_script.py

But, the user may want to set the date themselves!

As with all things in Python, there are many ways a user can provide these inputs (you could even program a website where they go and enter details!). Here we will discuss two.

### Command-line interface

Perhaps most frequently, your user may be running the program via a script in a terminal. Let's modify the On This Day script so the user can specify the month and day:

In [None]:
%%writefile on_this_day_script.py

import argparse

import requests
from bs4 import BeautifulSoup as bs

def get_bbc_on_this_day_headline(month, day, print_URL=False):
    url = f"http://news.bbc.co.uk/onthisday/hi/dates/stories/{month}/{day}/default.stm"
    article = requests.get(url)
    soup = bs(article.content, "html.parser")
    print("Headlines at the BBC today:")
    print(soup.body.find(class_="h1").text)
    if print_URL:
        print(f"Find more information at: {url}")
    

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--month", type=str, help="The month, given without abbreviation", required=True)
    parser.add_argument("--day", type=str, help="The day of the month, given as a numerical value", required=True)
    parser.add_argument("--print-URL",  action="store_true", help="If given, also print the URL so the user can follow the link")
    args = parser.parse_args()
    return args
    
    
# This line tells python to run the command if it is being called from the command line
if __name__ == "__main__":
    args = get_args()
    get_bbc_on_this_day_headline(month=args.month, day=args.day, print_URL=args.print_URL)

In [None]:
! python on_this_day_script.py --month december --day 28 --print-URL

You may notice that there are required arguments and optional arguments (here, `--print-URL`). The line `if __name__ == "__main__":` is arguably one of the uglier parts of Python. You can read more about it [here](https://docs.python.org/3/library/__main__.html), and we'll see why it is useful in Notebook 3.

The command-line interface also enables the user to ask for help with the program:

In [None]:
! python on_this_day_script.py --help
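When experimenting in a notebook, it can be handy to know that `parse_args` also accepts a list of strings in place of the real command line. The parser below is a made-up example (the `--n-steps`, `--seed`, and `--verbose` options are hypothetical) showing a required integer argument, an optional argument with a default, and a flag. Note that dashes in option names become underscores in the attribute names.

```python
import argparse

parser = argparse.ArgumentParser(description="A hypothetical simulation script")
parser.add_argument("--n-steps", type=int, required=True, help="Number of steps to simulate")
parser.add_argument("--seed", type=int, default=42, help="Random seed (optional)")
parser.add_argument("--verbose", action="store_true", help="If given, print extra output")

# Pass the arguments as a list instead of reading them from the terminal
args = parser.parse_args(["--n-steps", "100", "--verbose"])

print(args.n_steps, args.seed, args.verbose)
```

Using `type=int` means `argparse` converts the string `"100"` for us and reports an error to the user if the conversion fails.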

<div class="alert alert-block alert-danger">
<b>Challenge 1.9:</b> Write a script with a command-line interface. The script should generate a random password for the user. The user should be able to set the length of the password (required) and if it should include upper-case letters (optional). There is a hint below.
</div>

*Hint*: We can convert an integer to a letter of the alphabet with

In [None]:
import string

N = 10
# Python indexing starts at 0, so the Nth letter is at index N - 1
print(f"The {N}th letter of the alphabet is {string.ascii_lowercase[N - 1]}")

*Hint*: We can convert a letter to uppercase with

In [None]:
letter = "s"
letter.upper()

*Hint:* We can generate random numbers with

In [None]:
import random
random.randint(0, 25)