## The basics of Python

### Interpreted vs. compiled languages

Python is an interpreted language. This means that when you write code in Python, it isn't compiled to machine code before runtime. Instead, an interpreter executes the code at runtime, which makes it easier to run the same code on multiple platforms. Languages like C are compiled languages, and need to be compiled for each platform they run on. Typically, compiled languages are quicker to execute.
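Strictly speaking, CPython does compile your source code before running it, but to an intermediate bytecode rather than to machine code, and its interpreter then executes that bytecode. As a small illustration, the standard library `dis` module can display the bytecode for a function. The instruction names it prints vary between Python versions, and `add_one` is just a made-up example.

```python
import dis

def add_one(x):
    """Return x + 1 (a tiny function to inspect)."""
    return x + 1

# Print the bytecode instructions that CPython's interpreter executes for add_one
dis.dis(add_one)

print(add_one(41))  # 42
```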

One disadvantage is that Python is dynamically typed, so it doesn't do static type checking, which can help pick up programming errors before runtime. In newer versions of Python, though, you can add type annotations (type hints) to your variables and functions. However, the Python interpreter itself won't enforce them. Hence, to statically check these type annotations, you need to use an IDE like PyCharm or run a type checker like mypy.
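As a minimal sketch of what this means in practice, the annotations below are stored but not enforced when the code runs. The function `add_prices` is a hypothetical example.

```python
def add_prices(a: float, b: float) -> float:
    """Add two prices; the annotations say both should be floats."""
    return a + b

spread: float = 0.5  # a variable annotation (Python 3.6+)

# The interpreter doesn't enforce the annotations: passing strings still runs,
# and + concatenates them instead of adding numbers
result = add_prices('1', '2')
print(result)  # 12

# The annotations are simply kept as metadata on the function
print(add_prices.__annotations__)
```

A type checker such as mypy would flag the `add_prices('1', '2')` call as passing `str` where `float` is expected, even though Python runs it happily.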

### Imperative vs. functional programming

Python is predominantly an imperative programming language, like most common programming languages such as C. A series of instructions is executed in a specific order, updating the internal state of the program. Typically, code is organised into loops.

Python also includes some elements of functional programming. In functional programming languages, you break a problem down into a number of functions. Rather than updating internal state, functions take inputs and return outputs, and repetition is typically expressed through recursion rather than loops. Haskell and Lisp are examples of well known functional languages.
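To make the contrast concrete, here is a small sketch that computes the sum of squares of a list in an imperative style, a functional style and a recursive style (the names are just for illustration):

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Imperative style: a loop repeatedly updates a variable (the program's state)
total = 0
for n in numbers:
    total += n * n

# Functional style: combine functions without mutating any variable
functional_total = sum(map(lambda n: n * n, numbers))
reduced_total = reduce(lambda acc, n: acc + n * n, numbers, 0)

# Recursive style, as favoured in languages like Haskell and Lisp
def sum_of_squares(xs):
    return 0 if not xs else xs[0] ** 2 + sum_of_squares(xs[1:])

print(total, functional_total, reduced_total, sum_of_squares(numbers))  # 30 30 30 30
```

In practice, recursion is less idiomatic in Python than in Haskell or Lisp. CPython doesn't optimise tail calls and has a default recursion limit of 1,000 calls, so loops or built-ins like `sum` are usually preferred.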

### What is the current version of Python?

The current major version of Python is 3, and this course uses Python 3.8. We would strongly recommend using Python 3 where possible. However, there are still developers using the older Python 2 (the final version is 2.7, which reached end of life in January 2020), mainly because they have a lot of legacy code written in Python 2 which hasn't yet been migrated to 3. The differences between 2 and 3 aren't huge, but generally speaking, migrating code isn't an automatic process.
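If you're not sure which version of Python you're running, a quick check from within Python is to use the standard `sys` module, as in this sketch:

```python
import sys

# Full version string; the exact text depends on your installation
print(sys.version)

# version_info can be compared with tuples, which is handy for guarding version-specific code
if sys.version_info >= (3, 0):
    print('Running Python 3')
else:
    print('Running Python 2')
```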

### Why use Python?

Python code isn't the fastest to execute. Furthermore, it hasn't historically been the most widely used language. Lower level languages like C++ (and also Java) have historically been used much more, and are also much faster. However, Python excels at being a relatively easy language to use. When we think about speed, we need to consider not only how fast code executes, but also how long it takes to write. Python has become much more popular in recent years (see https://hackernoon.com/top-3-most-popular-programming-languages-in-2018-and-their-annual-salaries-51b4a7354e06).

### Who uses Python?

Python is a general purpose language, hence it can be used in many different domains (see https://www.python.org/about/apps/), such as:

* web development
* desktop GUIs
* business applications
* data science 

Our focus in this course is using Python to solve data science problems, specifically in the financial domain. Historically, Matlab has been a popular language for data science problems. In recent years, many Matlab users have migrated to Python and R. One major reason for this shift is that both Python and R are open source. R is geared towards statistical computing, and has many packages related to statistical methods, but it is less of a general purpose language than Python. Python has also caught up with support for statistical methods, and in some cases has overtaken R, in particular when it comes to cutting edge machine learning libraries.

### Python use cases for financial firms

There are many use cases for Python in financial firms, and we list a small number of them below. We'll go through many of these later in the course, and show some demos too. Can you think of any further use cases?

* Automating processes to replace heavily manual Excel processes
    * Downloading market data
    * Creating reports and e-mailing them
* Risk management calculations
* Option pricing
* Trading strategy backtests
* Web dashboards to display and process market data
* Creating fun visualisations of financial markets
* Doing transaction cost analysis
* Crunch alternative data sources like text and images

Python can interface with Excel and C/C++, so you can keep using existing resources you might already have, like trader spreadsheets or C/C++ option pricing libraries.

## What are the Python distributions?
As well as different Python versions, there are also many different distributions of Python. Anaconda is one such distribution. The main reference Python implementation is CPython (https://www.python.org). It doesn't have all the bells and whistles of other distributions, and only includes a basic set of tools alongside the standard library. It does include pip for installing additional Python libraries, which you are likely to need.
 
PyPy (https://pypy.org/) is an alternative to the CPython interpreter. It uses just-in-time (JIT) compilation to increase the speed of execution. If you have pure Python code, it can work very well. However, many Python libraries also interact with C code underneath, and not all of these are compatible with PyPy. NumPy, an important scientific library, is now on PyPy's compatible list, but some of the other libraries you use might not be.

There's also Jython (https://www.jython.org/), which runs Python code on the Java virtual machine (JVM), making it easy to interact with other code running on the JVM. IronPython (https://ironpython.net/) is similar, but integrates with Microsoft's .NET Framework. In both cases, however, they might not support all the Python libraries you need. At the time of writing, Jython still only supports Python 2.7, and IronPython only released a Python 3 version (IronPython 3.4) in 2022.
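To check which implementation is actually running your code (handy if you try out PyPy, say), the standard `platform` module reports it, as in this minimal sketch:

```python
import platform

# Name of the interpreter running this code, eg. 'CPython', 'PyPy', 'Jython' or 'IronPython'
print(platform.python_implementation())

# The Python language version it implements, as a string like '3.8.10'
print(platform.python_version())
```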

### Anaconda: Python distribution for data analysis
Anaconda (https://www.anaconda.com) is a Python distribution designed for scientific computing, and it's the one I typically use. It comes with important data science packages like NumPy and pandas already installed, saving you from having to install them yourself. It also includes conda, a package and virtual environment manager. The main disadvantage is that the distribution can be very large. In general, I would recommend using Anaconda for most data science work, as it can simplify the installation of Python packages. Details on how to install the required environment are in a separate notebook.


### conda and pip to install new packages
We noted that CPython includes pip for installing new Python packages into your local Python distribution. These can be packages fetched from PyPI (https://pypi.org/) or ones which have been downloaded manually. To install the pandas library, for example, we simply run the following in the command prompt. pip can also remove or update installed packages (eg. `pip uninstall pandas` or `pip install --upgrade pandas`).

    pip install pandas

We can also specify the version of the library we want to install. We often need to be careful about which versions of libraries we install: over time, their APIs can change and break existing code which depends on them. For example, with pandas, the syntax for quite a few of the calls has changed over the years. We can either update our code so it works with newer versions of pandas, or try to support multiple versions of pandas.

    pip install pandas==1.2.3
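To check from within Python which version of a package is actually installed, one option on Python 3.8+ (as used in this course) is the standard `importlib.metadata` module. The helper `installed_version` below is a hypothetical convenience wrapper, not part of any library:

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

def installed_version(package_name):
    """Return the installed version string of a package, or None if it isn't installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

print(installed_version('pandas'))  # eg. '1.2.3' after the pinned install above, or None
```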

Anaconda also includes conda, which can be used to install new Python packages (most packages can be installed via conda, although some are only available via pip). Unlike pip, conda checks which other packages have been installed in the current Python environment, and tries to work out a set of compatible dependencies. pip, by contrast, will install whatever dependency versions the new package asks for, even if they break libraries you have already installed (recent versions of pip will at least warn you about such conflicts). Anaconda Navigator is a graphical tool to help manage your conda packages. From the command line, conda can be used as follows (note that conda uses `=`, as opposed to `==` with pip).

    conda install pandas=1.2.3

conda can also be used to manage virtual environments. You may want to have several different Python environments on the same machine, with different versions of Python and different packages installed in them. You can think of these virtual environments as sandboxes, so you can experiment with them without breaking your main Python environment. The command below will create a new conda environment called "py38class" with Python 3.8 on your computer (the trailing `anaconda` also installs the full set of Anaconda packages into it).

    conda create -n py38class python=3.8 anaconda

We can activate that environment by running the following command in the prompt on Windows/Linux/Mac:

    conda activate py38class
    
We can also remove conda environments as follows:

    conda remove --name py38class --all

If you are not running Anaconda, you can still create virtual environments by installing the virtualenv package by running the following command in your command prompt.

    pip install virtualenv

We can then create a virtual environment with virtualenv by running

    virtualenv py27

On Windows we can activate it in the command prompt by running

    py27\Scripts\activate

On Linux (and Mac), the command is slightly different:

    source py27/bin/activate
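Once you've activated an environment, it can be useful to confirm from Python which environment you're actually in. The sketch below assumes Python 3 (`sys.base_prefix` doesn't exist in Python 2), and `in_virtualenv` is a hypothetical helper name:

```python
import os
import sys

def in_virtualenv():
    # venv (and recent versions of virtualenv) point sys.prefix at the environment
    # folder, while sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix

print('Python executable:', sys.executable)
print('Inside a venv/virtualenv:', in_virtualenv())

# conda sets this environment variable when an environment is activated (None otherwise)
print('Active conda environment:', os.environ.get('CONDA_DEFAULT_ENV'))
```

Conda environments are separate Python installations rather than virtualenvs, so `in_virtualenv()` will typically return `False` inside one, which is why the sketch also checks `CONDA_DEFAULT_ENV`.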

### mamba drop-in replacement for conda

conda generally does a better job than pip at resolving conflicts between libraries. However, the process can be lengthy, and conda can sometimes hang for a long time (eg. hours!) if you have a large number of libraries installed. [mamba](https://github.com/mamba-org/mamba) is a very fast drop-in replacement for conda, and it is used in some of the scripts in this course to create the `py38class` conda environment on your computer. To install mamba, we can run the following in the Anaconda Prompt:
    
    conda activate
    conda install mamba -n base -c conda-forge
    
We can install libraries with mamba in the Anaconda Prompt, just by replacing `conda install` with `mamba install` eg.

    mamba install pandas

## To install Anaconda and packages for this course

See https://github.com/cuemacro/teaching/blob/master/pythoncourse/installation/installing_anaconda_and_pycharm.ipynb for full instructions on how to install Anaconda Python and all the Python packages you'll need for this course. It describes several different methods for creating the `py38class` environment, in case you need to troubleshoot.

It also has a section for running the course in Google Colab notebooks.

## Using the Python shell vs. Notepad-like editors and IDEs vs. Jupyter notebooks to develop Python

### Hello World example!

Traditionally, the first bit of code you write in a new language prints out "Hello World!", which is what we'll do here. Brian Kernighan gave the first "Hello World" example in 1972, in a tutorial on the B language, the predecessor of C (https://blog.hackerrank.com/the-history-of-hello-world/). To write Python code, we have several options.

## Interactively in the Python shell

In Windows, we can go to Start / Run and then type "cmd" to get to the command prompt. In Linux, we can usually right click on the desktop and click "Open Terminal". We then type in "python" to interact with the Python interpreter directly. This is fine if we want to run something quickly. However, it's not convenient when we have a lot of code to run, or when we want to distribute code.

## Jupyter Notebook (https://jupyter.org/)

We can also use a Jupyter notebook, like this one, to execute Python code. The advantage of a Jupyter notebook is that it allows us to mix text, graphics and code in a nice way, which makes it great for tutorials. It works with a number of different languages, not just Python. JupyterLab can be seen as a cross between a traditional Jupyter notebook and an IDE. You can run Jupyter notebooks on your own local Anaconda installation (Jupyter is installed by default) or in the cloud, with a service like Google Colab.

To start a Jupyter notebook in your conda environment (assumed to be py38class here), first start the Anaconda Prompt, then type in the following (you might need to change the `--notebook-dir` folder; if it is omitted, Jupyter will usually start in the current working folder):

    conda activate py38class
    jupyter notebook --notebook-dir="e:/cuemacro/pythoncourse/pythoncourse/notebooks"

We can type in Python code and run it in a Jupyter notebook too!

In [1]:
print('Hello world')

Hello world


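There are also many notebook extensions (nbextensions) you can add to Jupyter. These include the `ExecuteTime` extension, which shows how long each cell took to run, and a table of contents. Note that they are designed for the classic Jupyter Notebook interface (before Notebook 7), rather than JupyterLab. We can install them via conda (assuming we have activated the appropriate environment) or via pip.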

    conda install -c conda-forge jupyter_contrib_nbextensions jupyter_nbextensions_configurator
    jupyter contrib nbextension install --user
    
We can then enable these through the Jupyter GUI or through the command line. For example, to enable the `ExecuteTime` and table of contents extensions, we can run:

    jupyter nbextension enable execute_time/ExecuteTime
    jupyter nbextension enable toc2/main
    
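You can also run magic commands in Jupyter, which start with `%` (line magics, which apply to a single line) or `%%` (cell magics, which apply to the whole cell). For example, you can run conda directly in the notebook using `%conda`, eg. `%conda install pandas`. There is also `%%timeit` for more accurate benchmarking, which runs a cell many times and reports the average time. We can see that its timings can be quite different from those shown by ExecuteTime.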

In [5]:
%%timeit

2 + 2;

9.49 ns ± 0.0554 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)


In [2]:
2 + 2

4
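Outside Jupyter, the standard library `timeit` module does a similar job to `%%timeit`. This is a rough sketch that mimics its output format. Unlike `%%timeit`, which chooses the number of loops automatically, here the number of loops is fixed by hand:

```python
import statistics
import timeit

loops = 100_000  # how many times to run the statement in each run
runs = timeit.repeat('2 + 2', number=loops, repeat=7)  # total seconds for each of 7 runs
per_loop_ns = [run / loops * 1e9 for run in runs]      # convert to nanoseconds per loop

print(f'{statistics.mean(per_loop_ns):.2f} ns ± {statistics.stdev(per_loop_ns):.2f} ns per loop '
      f'(mean ± std. dev. of {len(runs)} runs, {loops} loops each)')
```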

## Notepad like editors

In many cases, we will want to use an editor to create a Python script and run it separately. In Windows, we can use an editor such as Notepad. In Linux, some common editors include vim, emacs and pico, which run in text mode. There are also a number of GUI based Linux editors, like NEdit and GEdit. We can then invoke Python from the command prompt to run the script. We'll do a quick demo now of a few editors in Windows and Linux.
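As an example of the kind of script you might write in such an editor, here is a minimal sketch. The file name `hello.py` is just an assumption:

```python
# hello.py: save this in a text editor, then run it from the command prompt with
#     python hello.py
import sys

def main():
    print('Hello World!')
    print('Running on Python', sys.version.split()[0])

# This block runs only when the file is executed as a script,
# not when it is imported as a module by other code
if __name__ == '__main__':
    main()
```

The `if __name__ == '__main__':` guard is a common convention, so the same file can later be imported by other code without immediately printing anything.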

## IDE (Integrated Development Environment)

The problem with a text editor like Notepad is that it won't pick up basic mistakes in your code, like typos, which could be caught before runtime. An IDE adds a lot of extra features to make it easier to develop your code. IDEs typically consist of several components, such as:

* source code editor 
    * includes syntax highlighting to make it easier to read code
    * autocomplete hints for code
    * picks up basic typos eg. in variable names, lack of brackets/spacing etc.

* debugging
    * enables you to easily pause your code during runtime within the editor to help find mistakes
    * can often remotely debug or deploy your code (eg. write on a Windows machine and run it on Linux), which is particularly useful when you're writing stuff on the cloud (eg. AWS or Google Cloud)

* code profiling
    * so you can work out which parts of your code are the slowest, and need work

* version control
    * integrates with tools like Git to keep track of different versions of code

* refactoring
    * it can automatically change variable names, functions etc.

* support for multiple languages
    * many IDEs can support multiple languages, which is useful when you are mixing several in the same project (eg. Python and HTML)
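The standard library `cProfile` and `pstats` modules give a basic, text-based version of the code profiling that IDEs provide. Here is a minimal sketch, with two hypothetical functions to compare:

```python
import cProfile
import pstats

def loop_sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def builtin_sum_of_squares(n):
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
loop_sum_of_squares(200_000)
builtin_sum_of_squares(200_000)
profiler.disable()

# Print the 5 entries with the highest cumulative time
pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)
```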

As to which IDE is the best, that largely comes down to personal choice! Some people prefer to use Notepad-like tools, but I personally think it's much easier to use an IDE. When learning, though, it might be good to use a Notepad-like tool, just for the experience! You can often customise the way these IDEs look, so they fit your preferences precisely, like font type and size. You can also often configure hotkeys to do common operations, rather than using the mouse all the time.

### PyCharm (https://www.jetbrains.com/pycharm/)

PyCharm is my preferred IDE when working with Python. It makes it easy to see all the code for your project in one window. It includes typical IDE features like debugging, syntax highlighting, version control etc. There is a community edition available for free download. The professional edition also includes features like a code profiler. The main drawback is that it can sometimes be a bit slow to index all your code, particularly when you first open a specific code project. Usually, it starts up much quicker on subsequent occasions. It is available for Windows, Linux and Mac.

### VSCode (https://code.visualstudio.com)
Microsoft offers several different IDEs. The Visual Studio IDE comes in several flavours: Community, Professional and Enterprise. VSCode is a separate, free and lightweight editor for editing and debugging code, which has gained in popularity in recent years. It supports many languages besides Python, and includes Git integration, syntax highlighting etc. It is available for Windows, Linux and Mac.

### PyDev (https://www.pydev.org)
PyDev is built on the popular Eclipse IDE, which is most commonly used for Java development. It can have a steep learning curve, but it is very customisable, and a good choice if you have already used Eclipse a lot. It is available for Windows, Linux and Mac.

### Atom (https://atom.io)
Atom bills itself as a "hackable" code editor, with lots of add-on features that can be bolted on to customise it for your own use. It also has some cool features, like the ability for several people to edit code together in real time, similar to the way multiple users can work on Google Docs at the same time. It is available for Windows, Linux and Mac, although note that GitHub retired Atom in December 2022, so it no longer receives updates.

### Repl.it - online Python in your browser (https://repl.it/languages/python3)

You can also use Repl.it to run Python in your browser. However, it isn't as well featured as most other Python development environments (it can be tricky to install some Python libraries on it). It can be an easy way to get started, though, and should be ok for the first tutorial (https://repl.it/languages/python3) if you don't need many additional Python libraries. For future tutorials, I would recommend installing Python on your own laptop/computer, as they use some libraries which don't work on Repl.it.

### Google Colab - online Jupyter notebooks (https://colab.research.google.com/)

Google's Colab service hosts Jupyter notebooks in the cloud, and its interface is very similar to a Jupyter notebook that you run locally. The appeal of this, as with Repl.it, is that you don't need to install Python on your own machine. You do need a Google account to save files (you can create one if you don't already have one).

There is a free tier, although there are limits on how much memory you can use. There are also paid tiers, which give you access to more powerful machines, with more memory and faster GPUs and TPUs. You can also configure your environment to include libraries which are not in the default installation (eg. by running `!pip install` in a notebook cell).

## Basic types

Let's say we want to store something in the computer's memory. We can assign it to a variable, typically using `=`. The variable can then be manipulated by our computer program. There are many different types of variable. Variable names can't have spaces in them, and can't be reserved keywords which Python uses elsewhere (like `if` or `class`). It is a good idea to use descriptive names for variables to make code readable. It is also good practice to include comments in code. In Python, this can be done by writing text after the `#` symbol.
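Here is a short sketch of these naming rules, using the standard `keyword` module and `str.isidentifier()` to check which names are allowed (`burger_price` is just an example name):

```python
import keyword

burger_price = 4.5  # a descriptive variable name, assigned with =

# Invalid names, commented out because they would raise a SyntaxError:
# burger price = 4.5       (names can't contain spaces)
# class = 'cheeseburger'   (class is a reserved keyword)

print('burger_price'.isidentifier())  # True
print('burger price'.isidentifier())  # False
print(keyword.iskeyword('class'))     # True
print(keyword.kwlist)                 # the full list of reserved keywords
```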

Python is dynamically typed, so we don't have to specify the type of a variable. The type belongs to the value, and is worked out at runtime. Python also uses duck typing: what matters is whether an object has the methods and properties an operation needs, rather than its exact type, ie. "if it walks like a duck and it quacks like a duck, then it must be a duck".
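Here is a small sketch of both ideas. `Duck`, `RobotDuck` and `make_it_quack` are made-up names for illustration:

```python
class Duck:
    def quack(self):
        return 'Quack!'

class RobotDuck:
    def quack(self):
        return 'Beep... quack!'

def make_it_quack(thing):
    # No type check: any object with a quack() method will do
    return thing.quack()

for thing in (Duck(), RobotDuck()):
    print(make_it_quack(thing))

# Dynamic typing: the same name can refer to values of different types
x = 1
print(type(x))  # <class 'int'>
x = 'one'
print(type(x))  # <class 'str'>
```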

Below we list the numerical types:

* `int` - whole numbers and these can be as big as you want (providing they fit in memory)
* `float` - floating point (decimal) numbers, stored as 64-bit "double precision" numbers with a maximum value of around 1.8 × 10^308
* `complex` - complex numbers, written as `<real-part>+<imaginary-part>j`, eg. `1+2j`
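Here is a quick sketch exploring these types and their limits, using the standard `sys` module for the largest float:

```python
import sys

big_int = 2 ** 100               # ints grow as large as needed
print(big_int)                   # 1267650600228229401496703205376

print(sys.float_info.max)        # 1.7976931348623157e+308, the largest float
print(sys.float_info.max * 10)   # inf: the result overflows to infinity

a_complex = 3 + 4j
print(a_complex.real, a_complex.imag, abs(a_complex))  # 3.0 4.0 5.0
```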

Let's try assigning some numerical variables. Note that Python detects the type automatically. These types are defined in a similar way in many programming languages.

In [3]:
an_int = 1
type(an_int)

int

In [4]:
a_float = 2.0
type(a_float)

float

In [5]:
a_complex=(1+1j)
type(a_complex)

complex

If we want to store text based data, we can store it as a string. We simply need to use quotation marks to indicate a string, which differentiates it from variable names. We can use single or double quotation marks, but the same type must be used at both the start and the end of the string. Triple quotes (`'''` or `"""`) let a string span several lines.

In [8]:
a_malformed_string = 'a

SyntaxError: EOL while scanning string literal (<ipython-input-8-e9c37ad5a0f5>, line 1)

In [9]:
a_string = 'hello'
type(a_string)

str

In [12]:
a_double_quoted_string = "hello"
type(a_double_quoted_string)

str

In [10]:
a_multiline_string = '''A
Large
Burger'''

Special characters can be included in our strings using escape sequences, which start with a backslash `\`. These include the newline character `\n` and the tab character `\t`.

In [11]:
an_escaped_string = 'A\nlarge\tburger'

We can display the contents of a variable using the print function

In [12]:
print(an_int)
print(a_multiline_string)
print(an_escaped_string)

1
A
Large
Burger
A
large	burger
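Python also has raw strings, written with an `r` prefix, in which backslashes are kept exactly as typed rather than being treated as escape sequences. Here is a minimal sketch comparing the two (raw strings also turn up later with regular expressions):

```python
escaped = 'A\nlarge\tburger'   # \n and \t are turned into a newline and a tab
raw = r'A\nlarge\tburger'      # the r prefix keeps every backslash as typed

print(len(escaped), len(raw))  # 14 16
print(raw)                     # A\nlarge\tburger

# Raw strings are handy for Windows paths, which would otherwise contain \n and \t escapes
path = r'C:\new_folder\test.csv'
print(path)                    # C:\new_folder\test.csv
```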


We also have the boolean type, `bool`, whose values are either `True` or `False`. We can combine booleans and evaluate them using logical operations, which we'll look at shortly.

In [13]:
a_bool = True
type(a_bool)

bool

## Basic arithmetic and string operations

### Arithmetic operations

We now know how to store variables and know the basic types in Python. The next step is doing some operations on them. An error is caused if we try to add an integer and a string!

In [14]:
an_int = 3; a_float = 2.5; a_string = 'burger'
an_int + a_string

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Let's give some examples of the arithmetic operators.

In [15]:
an_int + an_int # Addition

6

In [16]:
an_int * an_int # Multiplication

9

In [17]:
an_int / an_int # Division (returns a float in Python 3, even for two ints; in Python 2 this did integer division)

1.0

In [18]:
a_float = an_int / a_float # Dividing an integer by a float results in a float
type(a_float)

float

In [19]:
3 // 2 # Floor division (divides, then rounds down to a whole number)

1

In [20]:
3 % 2 # Modulo operator (ie. what's the remainder when dividing 3 by 2?)

1

In [21]:
3**2 # Exponent

9
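Note that `//` is floor division: it rounds down towards minus infinity, which matters for negative numbers. Here is a short sketch, which also uses the built-in `divmod` to get the quotient and remainder together:

```python
print(7 // 2, 7 % 2)     # 3 1
print(-7 // 2, -7 % 2)   # -4 1: // rounds down (towards minus infinity), not towards zero
print(int(-7 / 2))       # -3: int() truncates towards zero instead
print(divmod(-7, 2))     # (-4, 1): quotient and remainder in one call

# // and % always satisfy: (a // b) * b + (a % b) == a
a, b = -7, 2
print((a // b) * b + (a % b) == a)  # True
```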

We can also combine the arithmetic `+, -, *, /, %, //, **` operations with = to do an assignment and arithmetic operation in one line.

In [22]:
a = 2
a += 2 # ie. a = a + 2
print(a)

4


Python also includes lots of other functions that can be applied to numbers.

In [23]:
abs(-8) # Absolute value

8

In [24]:
max(3,5) # Maximum

5

In [25]:
min(5,9) # Minimum

5

In [26]:
sum([5,6]) # Sums an iterable (eg. list)

11

### String operations

Python is great for manipulating strings, which is why it is often a good choice for natural language processing work. We'll give some examples of the operations you can perform on strings.

In [27]:
'burger' + 'king' # Concatenating strings

'burgerking'

In [28]:
'burgerking'.replace('king', 'queen') # Replacing a substring

'burgerqueen'

In [29]:
a_burger_chain = 'burger king'
print(a_burger_chain[0]) # Print the first character (note indexing from 0, like C)
print(a_burger_chain[-1]) # Print the last character (note the use of negative indexing from the end)
print(a_burger_chain[1:-1]) # Print a substring (uses both positive and negative indexing)
print(a_burger_chain[2:]) # Print from the third character onwards

b
g
urger kin
rger king


In [30]:
print(a_burger_chain[50]) # Will cause an error indexing beyond last character

IndexError: string index out of range

In [31]:
len(a_burger_chain) # what is the length of our string?

11

In [32]:
print(a_burger_chain.upper()) # make all the characters uppercase

BURGER KING


In [33]:
print(a_burger_chain.lower()) # make all the characters lowercase

burger king


In [23]:
split_burgers = a_burger_chain.split()
print(split_burgers) # split at the whitespaces

['burger', 'king']


Rather than concatenating strings, we can create formatted strings using the `%` operator, as follows. We can pass in a tuple of values, or pass in the list directly, in which case the whole list is formatted as a single value.

In [24]:
fmt_string = "We split the string into %s and %s" % (split_burgers[0], split_burgers[1])
print(fmt_string)

fmt_string = "We split the string into %s" % split_burgers
print(fmt_string)

We split the string into burger and king
We split the string into ['burger', 'king']
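In Python 3, `str.format()` and (from Python 3.6) f-strings are generally more readable alternatives to `%` formatting. Here is a minimal sketch using the same list (the price is made up):

```python
split_burgers = ['burger', 'king']

# str.format() fills each {} placeholder in order
print("We split the string into {} and {}".format(split_burgers[0], split_burgers[1]))

# f-strings evaluate the expressions inside {} directly
print(f"We split the string into {split_burgers[0]} and {split_burgers[1]}")

price = 4.5
print(f"A {split_burgers[0]} costs ${price:.2f}")  # :.2f formats to 2 decimal places
```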


#### Regular expressions on strings

Python also supports regular expressions for searching and altering text, through its built-in `re` module. These give us very powerful methods for manipulating text. It can be tricky to learn all the patterns, and it generally takes some trial and error to get them to work. However, the syntax is largely shared with other languages which support regular expressions, so you mostly only need to learn them once! A full reference can be found at https://docs.python.org/3/howto/regex.html.

In [25]:
import re

txt = "The 1 cheese burger"
found = re.search("^The.*burger$", txt) # Does the text start (^) with 'The' and contain 'burger' at the end ($) (match)

print(found)

# Does the text start (^) with 'the' (match) - will return None given no matches
found = re.search("^the", txt) 

print(found)

# Replace 'burger' with 'and avocado burger'
txt = re.sub("burger", "and avocado burger", txt) 
print(txt)

# Search for a word which starts with c (\b indicates a word boundary, \w+ matches one or more word characters)
x = re.search(r"\bc\w+", txt) 
print(x.group())

# Search for all the numbers (returns a list)
x = re.compile(r'\d+') 
print(x.findall(txt))

# Search for all the words (returns a list)
x = re.compile(r'\w+') 
print(x.findall(txt))

# Search for all the runs of whitespace (\s) (returns a list)
x = re.compile(r'\s+')
print(x.findall(txt))

<re.Match object; span=(0, 19), match='The 1 cheese burger'>
None
The 1 cheese and avocado burger
cheese
['1']
['The', '1', 'cheese', 'and', 'avocado', 'burger']
[' ', ' ', ' ', ' ', ' ']
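Parentheses in a pattern create groups, which let us pull out specific parts of a match. Here is a sketch that parses a made-up FX quote string using named groups `(?P<name>...)`:

```python
import re

quote = 'EURUSD bid 1.1850 ask 1.1852'  # a made-up quote string

# Named groups capture parts of the match, which we can then read back by name
pattern = re.compile(r'(?P<ticker>[A-Z]{6}) bid (?P<bid>\d+\.\d+) ask (?P<ask>\d+\.\d+)')
match = pattern.search(quote)

if match:  # search returns None when there is no match
    ticker = match.group('ticker')
    bid = float(match.group('bid'))
    ask = float(match.group('ask'))
    print(ticker, bid, ask)  # EURUSD 1.185 1.1852
```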


### Comparing between variables

It is often the case that we might want to compare between variables. For numbers, this can include finding if they are bigger or smaller, as well as being equal. Below we give some examples of comparison operators.

In [18]:
print(2 == 2) # Equality
print(2 != 2) # Inequality
print(1 > 2) # Greater than
print(2 >= 2) # Greater than or equal
print(1 < 2) # Smaller than
print(2 <= 2) # Smaller than or equal

True
False
False
True
True
True


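We can also compare objects using the keywords `is` and `is not`. These check whether two names refer to the same object (identity), rather than whether their values are equal, so use `==` to compare values. `is` is typically used to check whether something is `None`.

In [19]:
a_list = ['burger']
another_list = ['burger']
print(a_list == another_list) # Same values
print(a_list is another_list) # But they are two different objects
print(a_list is not another_list)

x = None
print(x is None) # The idiomatic way to check for None

True
False
True
True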


### Logical operations

Logical operators can be used to combine conditional statements and evaluate them. We give some examples below.

In [20]:
True and False

False

In [21]:
True and True

True

In [22]:
True or False

True

In [23]:
True and not False

True

We can also combine these with the comparison operators discussed earlier.

In [24]:
(2 > 1) and (3 > 2)

True

In [25]:
('burger' != 'king') and (4 > 3)

True

In [26]:
x = 'burger'; y = 'king'
len(x) > len(y) and x != y

True
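`and` and `or` also short-circuit: Python stops evaluating as soon as the result is known. Here is a minimal sketch, where `is_positive` is a hypothetical helper that prints when it's called:

```python
def is_positive(x):
    print(f'checking {x}')  # shows whether the function actually got called
    return x > 0

result_and = False and is_positive(5)  # `and` stops at the first False: is_positive isn't called
result_or = True or is_positive(5)     # `or` stops at the first True: is_positive isn't called
result_both = True and is_positive(5)  # here is_positive must run, and prints 'checking 5'

# A common use: guarding against an error, eg. dividing by zero
denominator = 0
is_large_ratio = denominator != 0 and 10 / denominator > 1  # the division never runs

print(result_and, result_or, result_both, is_large_ratio)  # False True True False
```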

### Converting between types

We often want to convert between data types. In the example below, we convert an integer to a float, and then to a string, so we can concatenate it with other strings.

In [36]:
x = 1 # This is an int
print(type(x))

x = float(x) # Cast as a float
print(type(x))

x = str(x) # Cast as a str
print(type(x))

x = x + ' burger is less than ' + x + x # Concatenate strings
print(x)

<class 'int'>
<class 'float'>
<class 'str'>
1.0 burger is less than 1.01.0
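Conversions work in the other direction too, from strings to numbers, but only if the string contains a valid number. Here is a short sketch:

```python
print(int('3') + 1)       # 4: parse a string into an int
print(float('2.5') * 2)   # 5.0: parse a string into a float
print(int(2.9))           # 2: converting a float to an int truncates towards zero
print(bool(0), bool(''), bool('burger'))  # False False True: zero and empty values are False

try:
    int('2.5')            # '2.5' isn't a valid integer literal
except ValueError as e:
    print('ValueError:', e)

print(int(float('2.5')))  # 2: go via float first if needed
```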


## Data structures

We have seen how we can store variables. However, in practice, we often want to group several variables together.

### List
One of the most common data structures in Python is a list. A string can be thought of as a sequence of characters, a bit like a list (although, unlike a list, a string can't be modified in place). However, lists can hold elements of any data type.

In [37]:
lst_of_numbers = [1, 2, 3]
another_lst_of_numbers = [4, 5, 6]
a_big_lst_of_numbers = another_lst_of_numbers + lst_of_numbers # We can combine the lists together
print(a_big_lst_of_numbers)

[4, 5, 6, 1, 2, 3]


We can do operations on the list as well

In [38]:
print(5 in a_big_lst_of_numbers) # Is 5 in our list?
print(0 in a_big_lst_of_numbers) # Is 0 in our list?

True
False


In [39]:
a_big_lst_of_numbers.sort() # Sort the list in place
print(a_big_lst_of_numbers)

[1, 2, 3, 4, 5, 6]


In [40]:
a_big_lst_of_numbers.insert(0, -1) # Put a -1 at the start of the list
print(a_big_lst_of_numbers)

[-1, 1, 2, 3, 4, 5, 6]


In [41]:
a_big_lst_of_numbers.append('burger') # Add a 'burger' in the list (we can have mixed types in a list too)
print(a_big_lst_of_numbers)

[-1, 1, 2, 3, 4, 5, 6, 'burger']


We can index lists like we indexed strings earlier

In [42]:
print(a_big_lst_of_numbers[1]) # Get the second element
print(a_big_lst_of_numbers[-1]) # Get the last element
print(a_big_lst_of_numbers[5::]) # Drop the first 5 elements

1
burger
[5, 6, 'burger']


In [43]:
a_big_lst_of_numbers.remove('burger') # Remove the first occurrence of 'burger' from the list
print(a_big_lst_of_numbers)

[-1, 1, 2, 3, 4, 5, 6]


In [44]:
print(a_big_lst_of_numbers.count(6)) # Count the number of occurrences of 6 in the list

1


We can do various mathematical operations on lists too:

In [45]:
print(max(a_big_lst_of_numbers))
print(min(a_big_lst_of_numbers))

6
-1
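Note that `.sort()` changes the list in place (and returns `None`), whereas the built-in `sorted()` returns a new sorted list and leaves the original untouched. Here is a minimal sketch with some made-up prices:

```python
prices = [3.5, 1.25, 2.0]

ascending = sorted(prices)  # returns a new list; prices is unchanged
print(prices, ascending)    # [3.5, 1.25, 2.0] [1.25, 2.0, 3.5]

returned = prices.sort(reverse=True)  # sorts prices itself, in descending order
print(prices, returned)     # [3.5, 2.0, 1.25] None

print(sum(prices) / len(prices))  # 2.25: the average price
```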


### Dict

Dicts in Python map keys to values. In older versions of Python, dicts did not remember the order in which elements were added. From Python 3.7 onwards, they are guaranteed to preserve insertion order (CPython 3.6 already did so, as an implementation detail). To ensure dicts are ordered in older versions of Python, you need to use `OrderedDict` from the `collections` module. Using dicts is very easy.

In [46]:
a_dict = {} # Create an empty dict
a_dict['Burger King'] = 'whopper' # Add value whopper for key 'Burger King'
a_dict['McDonalds'] = 'big mac'
print(a_dict)

{'Burger King': 'whopper', 'McDonalds': 'big mac'}


In [48]:
print(a_dict['Burger King']) # What burger is at Burger King?

whopper


In [49]:
print(a_dict.keys()) # Get all the keys

dict_keys(['Burger King', 'McDonalds'])


In [50]:
print(a_dict.values()) # Get all the values

dict_values(['whopper', 'big mac'])


In [51]:
print(list(a_dict.values())) # Get the values and convert it to a list

['whopper', 'big mac']


In [52]:
a_dict.pop("McDonalds") # Let's say we don't like McDonalds, and want to remove it
print(a_dict)

{'Burger King': 'whopper'}


In [53]:
print('McDonalds' in a_dict) # Is McDonalds in our dict?
print('Burger King' in a_dict) # Is Burger King in our dict?

False
True


In [54]:
a_dict.clear() # Remove all the items in the dict
print(a_dict)

{}
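A few more dict operations come up often. Here is a sketch using the same burger data:

```python
from collections import OrderedDict

menu = {'Burger King': 'whopper', 'McDonalds': 'big mac'}

# Indexing a missing key raises a KeyError, whereas .get() can return a default instead
try:
    menu['Wendys']
except KeyError as e:
    print('KeyError:', e)                     # KeyError: 'Wendys'
print(menu.get('Wendys', 'no burger found'))  # no burger found

# Loop over the key/value pairs together
for chain, burger in menu.items():
    print(chain, '->', burger)

# Before Python 3.7, use OrderedDict when insertion order matters
ordered_menu = OrderedDict([('Burger King', 'whopper'), ('McDonalds', 'big mac')])
print(list(ordered_menu))                     # ['Burger King', 'McDonalds']
```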


### Tuples

Tuples are similar to lists, except their contents are immutable (they can't be changed), so we can't add, remove or replace items after creating a tuple. We use round brackets `()` to denote tuples.


In [56]:
a_tuple = ("burger", "cheeseburger", "big mac")
print(a_tuple)

('burger', 'cheeseburger', 'big mac')


In [59]:
print(a_tuple[1]) # Get the second element of the tuple (indexing starts at 0)

cheeseburger
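Here is a short sketch of what immutability means in practice, along with tuple unpacking and the one-element tuple syntax (the price is made up):

```python
a_tuple = ("burger", "cheeseburger", "big mac")

try:
    a_tuple[0] = "veggie burger"  # tuples can't be modified after creation
except TypeError as e:
    print('TypeError:', e)

first, second, third = a_tuple  # tuple unpacking into separate variables
print(second)                   # cheeseburger

one_item = ("burger",)          # a trailing comma is needed for a one-element tuple
not_a_tuple = ("burger")        # without it, the brackets just group an expression
print(type(one_item), type(not_a_tuple))  # <class 'tuple'> <class 'str'>

# Being immutable, tuples can be used as dict keys
prices = {("Burger King", "whopper"): 6.5}
print(prices[("Burger King", "whopper")])  # 6.5
```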


### Sets

Sets are unordered, unlike lists, which preserve their ordering (so the order in which a set's elements are printed may vary). Sets also can't contain duplicate elements.

In [61]:
a_set = {'burger', 'burger', 'cheeseburger'} # Final set will only have one burger
print(a_set)

a_dessert_set = {'cheesecake', 'milkshake'}
a_meal = a_set.union(a_dessert_set) # Create the union of the sets (or can use |)
print(a_meal)

an_empty_meal = a_set.intersection(a_dessert_set) # Create the intersection of the sets (or can use &)
print(an_empty_meal)

{'burger', 'cheeseburger'}
{'burger', 'cheesecake', 'cheeseburger', 'milkshake'}
set()


We can use functions like `len()`, and the operators `in` and `not in`, with sets.

In [62]:
print('burger' in a_set)

True
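Sets are also a quick way to remove duplicates from a list. Here is a minimal sketch with a made-up list of orders:

```python
orders = ['burger', 'fries', 'burger', 'milkshake', 'fries']

unique_orders = set(orders)           # duplicates are dropped
print(len(unique_orders))             # 3
print('fries' not in unique_orders)   # False
print(sorted(unique_orders))          # ['burger', 'fries', 'milkshake']
print(unique_orders - {'fries'})      # set difference (printed order may vary)
```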


Frozensets are similar to sets, but can't be changed once they are defined. Because of this, we can use frozensets as elements within a set, or as keys in a dict.

In [63]:
a_frozen_set = frozenset({'burger', 'king'})
a_big_set = {1, a_frozen_set}
print('burger' in a_frozen_set)
print(a_big_set)

True
{1, frozenset({'burger', 'king'})}


## Spaces for indentations

In many languages, curly brackets (or similar) are used to mark out blocks of code, such as the body of a loop or an if statement. The key difference in Python is that there are no curly brackets for this. Instead, blocks are defined by indentation, and a colon `:` marks the start of each block. We also don't use the word 'then' in Python code. The standard convention (from the PEP 8 style guide) is to indent each level with 4 spaces, rather than with tab characters.

Mixing tabs and spaces can cause confusion, and Python 3 raises an error (a `TabError`) when indentation mixes them inconsistently. IDEs will usually insert spaces automatically when you press the tab key (or they can be configured to do so). However, when using a simple text editor to edit your code, it is advisable simply to use spaces.

When working in a team make sure everyone is using spaces, otherwise your code will look a mess if some people are using tabs and others are using spaces!
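
We can check the point about mixing tabs and spaces by asking Python to compile a small code string whose body is indented with a tab on one line and eight spaces on the next. This is a minimal sketch; we use the built-in `compile` so that the error is raised and caught, rather than stopping our program.

```python
# The first body line is indented with a tab, the second with 8 spaces
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<mixed-indentation>", "exec")
    error_name = None
except IndentationError as e:  # TabError is a subclass of IndentationError
    error_name = type(e).__name__
    print(error_name + ":", e)
```

In CPython 3 we would expect this to report a `TabError` about inconsistent use of tabs and spaces.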

## Conditionals and loops

We have already touched on conditional statements. We can use them to decide which parts of our code should or should not be executed.

### if.. then.. else statements

The if.. then.. else statements are common to many programming languages.

In [65]:
if 3 > 2: # If 3 is bigger than 2, execute the indented code below
    print('Yes it is!')

# We can type on the same line, but it's more difficult to read
if 3 > 2: print('Yes it is on the same line')

Yes it is!
Yes it is on the same line


We can also add an else statement, which will execute if the condition in the if statement is not satisfied.

In [66]:
if 3 < 2: # If 3 is smaller than 2, execute the code below, otherwise execute what's under else
    print("This isn't true")
else:
    print("Of course 3 isn't smaller than 2") # Use double quotes so can use single quotes inside the string
    
# A conditional expression puts the if and else on one line (often used in list comprehensions)
print('Yes it is on the same line..!') if 3 > 2 else print('No it is not!')

Of course 3 isn't smaller than 2
Yes it is on the same line..!


We can add an elif statement if we want to test multiple conditions in turn.

In [68]:
if 3 < 2: # If 3 is smaller than 2, execute the code below, otherwise check the elif condition
    print("This isn't true")
elif 3 > 2:
    print("But 3 is bigger than 2")
else:
    pass # A block can't be empty, so we use pass as a placeholder that does nothing

But 3 is bigger than 2


We can make our conditions as complicated as we want and nest if statements.

In [70]:
x = 3; y = 2; z = 3;
if x > y or x == z: 
    print('Both of these are true')

    if y == z: 
        print('This will not run...')
    else:
        print('Well %s does not equal %s!' % (y, z))

Both of these are true
Well 2 does not equal 3!
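
Conditions can also use chained comparisons. In the sketch below, which reuses the values of `x`, `y` and `z` from above, `y < x <= z` is Python shorthand for `y < x and x <= z`.

```python
x = 3
y = 2
z = 3

# A chained comparison: equivalent to (y < x) and (x <= z)
if y < x <= z:
    print('x is bigger than y, but no bigger than z')

# Combine separate conditions with and
if x == z and x != y:
    print('x equals z, but not y')
```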


### For loops

Let's say we have a list of elements, and we want to execute an operation on each element in that list. One way to do this is to use a for loop.

In [69]:
lst_numbers = range(0, 6) # Creates a range object covering [0, 6)
print(lst_numbers)

add_numbers = []

for l in lst_numbers: # Iterate through
    add_numbers.append(l + 100) # Add 100 to each number

print(add_numbers)

range(0, 6)
[100, 101, 102, 103, 104, 105]


Alternatively we can go through a list by index numbers. This code is more difficult to read and more complicated. However, sometimes we might find it useful to have the index number.

In [27]:
lst_numbers = list(range(0, 12, 2)) # Creates a list [0, 2, ..., 10] jumping by 2 each time
print(lst_numbers)

add_numbers = []

for l in range(0, len(lst_numbers)): # Go through by index of the list [0, .., 5]
    add_numbers.append(lst_numbers[l] + 100) # Add 100 to each number
    print("Index is " + str(l))
    print("Element is " + str(lst_numbers[l]))

print(add_numbers)

[0, 2, 4, 6, 8, 10]
Index is 0
Element is 0
Index is 1
Element is 2
Index is 2
Element is 4
Index is 3
Element is 6
Index is 4
Element is 8
Index is 5
Element is 10
[100, 102, 104, 106, 108, 110]
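
A more idiomatic way to get both the index and the element is the built-in `enumerate` function, which yields `(index, element)` pairs. Here is the same calculation rewritten with it.

```python
lst_numbers = list(range(0, 12, 2))
add_numbers = []

# enumerate yields (index, element) pairs, so we don't need range(len(...))
for index, element in enumerate(lst_numbers):
    add_numbers.append(element + 100)
    print("Index is " + str(index) + ", element is " + str(element))

print(add_numbers)
```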


What if we want to iterate through two ranges at the same time? We can use the zip function to create an iterator of tuples, formed from two ranges or lists. We also show how to use 'break' to exit a for loop, if certain conditions are satisfied.

In [28]:
first_numbers = range(0, 6) # Creates a range object covering [0, 6)
second_numbers = range(6, 12) # Creates a range object covering [6, 12)

add_numbers = []

for l, m in zip(first_numbers, second_numbers): # Iterate through both ranges in step
    add_numbers.append(l + m) # Add the 1st, .., 6th elements of each range together

print(add_numbers)

break_numbers = []

for l, m in zip(first_numbers, second_numbers):
    if l > 10 or m > 10: # Exit the loop as soon as either number exceeds 10
        break

    break_numbers.append(l + m)

print(break_numbers)

[6, 8, 10, 12, 14, 16]
[6, 8, 10, 12, 14]
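
One thing to watch with `zip` is that it stops as soon as the shortest input runs out. If we need to keep going, `itertools.zip_longest` fills in the gaps instead. The burger prices below are hypothetical.

```python
from itertools import zip_longest

burgers = ['burger', 'cheeseburger', 'big mac']
prices = [3.5, 4.0]  # hypothetical prices: one fewer than there are burgers

# zip stops as soon as the shorter input runs out
paired = list(zip(burgers, prices))
print(paired)

# zip_longest keeps going, filling the gaps with fillvalue
paired_longest = list(zip_longest(burgers, prices, fillvalue=None))
print(paired_longest)
```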


### While loops

While loops keep running as long as a certain condition is true, and stop once it becomes false. Below we give an illustration, which also includes examples of the `continue` and `break` keywords.

In [72]:
start_number = 2

while start_number < 10:
    # Multiply by 2 and reassign to start_number
    start_number *= 2 

    # If it's 8 then skip the print statement
    if start_number == 8:
        continue
    elif start_number == 16:
        print('Reached 16, so break entirely from while loop')
        break
    
    print('start_number is now ' + str(start_number))

start_number is now 4
Reached 16, so break entirely from while loop
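
Because the condition is checked before each pass through the loop, it can help to count the iterations explicitly. This sketch runs the same doubling loop without `continue` or `break`, using a hypothetical `count` variable, so it stops only when `start_number < 10` becomes false.

```python
start_number = 2
count = 0  # hypothetical counter of how many times the loop body runs

# The condition is checked before each pass through the loop body
while start_number < 10:
    start_number *= 2
    count += 1

print('Loop ran ' + str(count) + ' times, ending with ' + str(start_number))
```

Here `start_number` goes 2, 4, 8, 16, so the body runs three times and the final value overshoots 10, since the check only happens at the top of the loop.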


### List comprehensions

We can use list comprehensions to create lists in a single line, rather than using a for loop to iterate through a sequence and apply an operation to each element.

In [58]:
lst = [x + 2 for x in range(0, 6)] # Add 2 to every element in the range (that was easier than a for loop!)
print(lst)

[2, 3, 4, 5, 6, 7]


We can also add if statements (and also else) within a list comprehension.

In [59]:
# If x is odd, add 2 to it and include it in the list, otherwise leave it out
lst = [x + 2 for x in range(0, 6) if x % 2 == 1]
print(lst)

# If x is even, add 2 to it, otherwise include x in the list as it is
# Note that the if.. else moves before the for in this second example
lst = [x + 2 if x % 2 == 0 else x for x in range(0, 6)]
print(lst)

[3, 5, 7]
[2, 1, 4, 3, 6, 5]
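
The same comprehension syntax also works for dicts and sets, using curly brackets. A short sketch:

```python
# A dict comprehension mapping each number to its square
squares = {x: x ** 2 for x in range(0, 6)}
print(squares)

# A set comprehension: duplicate values are dropped automatically
remainders = {x % 3 for x in range(0, 6)}
print(remainders)
```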


## Functions, exceptions, modules and packages

So far we've generally only looked at executing a small number of lines of code. In practice, a program might have thousands or millions of lines of code. These lines of code need to be organised properly, rather than be in one monolithic code base. Furthermore, parts of the code might be reused repeatedly. We clearly do not want to be copying and pasting this code, and making our codebase much larger than it needs to be. CTRL-C and CTRL-V is not code reuse!

### Functions

Functions are a building block for grouping lines of code together. They often take in input variables, which are termed parameters, manipulate them and then return an output. We have already used some built-in Python functions like `max` and `min`. Below we give an example of a function and show how to document it with a docstring. We call the function at the end and print the output. Just like with if statements and while and for loops, we don't use brackets to denote the start and end of the function body. Instead, we indent it.

In [75]:
def make_a_cheeseburger(burger):
    """This function adds 'cheese to burger'. 
    
    Note the use of a multiline comments here too! It's good to comment your function to describe what it does
    and also what paramters it takes, and what are their expected types. Our comments here use NumPy style
    function comments. There are many other comment styles which are used.
    
    Parameters
    ----------
    burger : str
        Type of burger
    
    Returns
    -------
    str

    """
    
    # None is a special type of value for a variable similar to null which is used in other languages
    if burger is None:
        return "Sorry that was an empty burger"
    
    return "cheese " + burger

print(make_a_cheeseburger('Burger King Whopper'))

cheese Burger King Whopper


Functions don't always need to return a value or indeed take any parameters, as we illustrate below.

In [76]:
def print_burger():
    print("I just made a burger!")
    
print_burger()

I just made a burger!
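
A function with no `return` statement still returns something: the special value `None`. We can see this by capturing the result of `print_burger`.

```python
def print_burger():
    print("I just made a burger!")

# The function prints its message, then implicitly returns None
result = print_burger()
print(result)
```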


Sometimes we might want to specify a default value in case the user doesn't actually give a value themselves.

In [80]:
def make_an_impossible_burger(burger, topping='cheese'):
    return burger + ' topped with ' + topping

print(make_an_impossible_burger('Beyond Meat'))
print(make_an_impossible_burger('wagyu burger'))

Beyond Meat topped with cheese
wagyu burger topped with cheese
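
The default only applies when we leave the argument out. This sketch overrides it, first positionally and then by naming the parameter (a keyword argument), which is often clearer to read. The toppings are just illustrative.

```python
def make_an_impossible_burger(burger, topping='cheese'):
    return burger + ' topped with ' + topping

# Override the default positionally...
print(make_an_impossible_burger('Beyond Meat', 'avocado'))

# ...or by naming the parameter, which is often clearer
print(make_an_impossible_burger('wagyu burger', topping='truffle'))
```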


We can also define functions within functions. The inner function can access the variables of the enclosing function.

In [81]:
def make_a_burger(burger):
    
    # We can access the burger variable within scope
    def print_a_burger():
        print(burger)
        
    print_a_burger()
    
make_a_burger("KFC tower burger")

KFC tower burger


We can also return our nested function and call it later. Because the returned function remembers the `burger` variable from its enclosing scope, this is known as a closure. We also demonstrate how variables defined outside the function are also available within it.

In [84]:
outside_variable = 'all functions can access this'

def make_a_burger(burger):
    
    def print_a_burger():
        print(burger)
        print(outside_variable)
        
    return print_a_burger
    
x = make_a_burger("KFC tower burger")
x()
x()

KFC tower burger
all functions can access this
KFC tower burger
all functions can access this
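
Closures can also keep state between calls. The sketch below, with hypothetical names, uses the `nonlocal` keyword, which lets the inner function rebind a variable from the enclosing function rather than creating a new local one.

```python
def make_a_burger_counter():
    burgers_made = 0

    def make_another():
        # nonlocal lets us rebind burgers_made from the enclosing scope
        nonlocal burgers_made
        burgers_made += 1
        return burgers_made

    return make_another

counter = make_a_burger_counter()
counter()
counter()
print(counter())  # 3: this counter has been called three times

another_counter = make_a_burger_counter()
print(another_counter())  # 1: each counter keeps its own running total
```

Without `nonlocal`, the line `burgers_made += 1` would raise an `UnboundLocalError`, because Python would treat `burgers_made` as a new local variable.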


### Modules

What if we want to save several functions together in a single file to use later? We can group them together into a module, which is essentially a Python file containing definitions such as functions and variables. Python also has lots of built-in modules, which we can use. To use a module, we simply use the import statement.

In [92]:
import datetime # import datetime module

print(datetime.datetime.utcnow())

2022-01-27 19:55:29.511632


We can also choose to import specific names, such as classes or functions, from a module. This saves us from typing the full module name every time we use them. Note that Python still loads the whole module behind the scenes, so this does not make the import itself any faster. If we import many different names from a module this way, however, our code can become quite messy.

In [93]:
from datetime import timedelta # timedelta can be used to add and subtract dates

today = datetime.datetime.utcnow()
yesterday = today - timedelta(days=1)

print(yesterday)

2022-01-26 19:55:30.680965
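
When we write our own module, it is common to guard any code that should run only when the file is executed directly (rather than imported) with a check on the special variable `__name__`. A minimal sketch, with a hypothetical `main` function:

```python
def main():
    print('Running as a script, so making burgers...')

# __name__ is "__main__" when this file is run directly,
# but is set to the module's name when the file is imported
if __name__ == "__main__":
    main()
```

If another file imports this module, `main` is defined but not called. This guard is also typically needed when using `multiprocessing` on Windows.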


### Packages

A package can help us organise a large number of modules together.

We have created `pythoncourse` as our top-level package. Below that, inside `coursecode`, we have created a subpackage `packagedemo`. Under that directory we have created `typechecker.py`, with functions for checking the type of variables. We could also create further nested levels of packaging. In each package folder we create an `__init__.py` file so Python knows to treat that folder as a (regular) package. We can also specify what submodules should be imported in the `__init__.py` file.

Note that when importing we do not include the .py file extension. We can also use the `as` keyword to create an alias to use later, saving us from typing in the full path. Below we illustrate several ways to import packages and the functions within them.

In [95]:
import sys

# You can download this code from https://github.com/cuemacro/teaching
# You'll need to change the path below to wherever you saved the code
sys.path.append("e:/cuemacro/teaching/")

import pythoncourse.coursecode.packagedemo.typechecker as typechecker

print(typechecker)

typechecker.print_is_a_float('1')
typechecker.print_is_a_string('a string')

# Let's import a specific function
from pythoncourse.coursecode.packagedemo.typechecker import print_is_a_float

print_is_a_float(2.0)

# Or import everything! (generally discouraged, as it becomes unclear where names come from)
from pythoncourse.coursecode.packagedemo.typechecker import *
print_is_a_string('This is a tasty cheeseburger')

<module 'pythoncourse.coursecode.packagedemo.typechecker' from 'e:\\cuemacro\\pythoncourse\\pythoncourse\\coursecode\\packagedemo\\typechecker.py'>
1 is not a float
a string is a str
2.0 is a float
This is a tasty cheeseburger is a str
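
To see how `__init__.py` can expose functions from a submodule, here is a self-contained sketch that builds a tiny throwaway package in a temporary directory. The package `burgerpkg`, its module `toppings.py` and the function `add_cheese` are all hypothetical names, used purely for illustration.

```python
import pathlib
import sys
import tempfile

# Create burgerpkg/__init__.py and burgerpkg/toppings.py in a temporary folder
root = pathlib.Path(tempfile.mkdtemp())
package_dir = root / "burgerpkg"
package_dir.mkdir()

(package_dir / "toppings.py").write_text(
    "def add_cheese(burger):\n    return 'cheese ' + burger\n"
)
# __init__.py re-exports add_cheese, so it can be imported from the package directly
(package_dir / "__init__.py").write_text("from .toppings import add_cheese\n")

# Make the temporary folder importable, then import the package
sys.path.insert(0, str(root))
import burgerpkg

print(burgerpkg.add_cheese('burger'))
```

Thanks to the import in `__init__.py`, users can write `burgerpkg.add_cheese` rather than `burgerpkg.toppings.add_cheese`.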


### Errors and Exceptions

One of the most common errors you might encounter is a syntax error. Basically, this is where you have typed in code which isn't valid Python. For example, you might have missed a colon or a closing bracket, or misspelt a keyword. You might also not have indented your code correctly. However, these errors are relatively easy to fix, and in many instances an IDE will flag them for you.

However, even if your code has no syntax errors, it might still cause an error at runtime, which you need to handle. These are known as exceptions. There are numerous examples of code which can throw an exception. For example, you might wish to create a file, but the hard disk is full. Or you might try to add two variables which are of incompatible types.

In [98]:
# This will throw a type error

2 + 's'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

If we want to handle this `TypeError`, we can use `try.. except` statements. We can return a more meaningful error message to the user and handle the error gracefully.

In [99]:
add_1 = 1; add_2 = 'x'

try:
    add_1 + add_2
except TypeError:
    print("You shouldn't add variables of different types!")

You shouldn't add variables of different types!
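
A single `except` clause can catch several exception types by listing them in a tuple, and an optional `else` clause runs only when the `try` block raised no exception. A minimal sketch, with a hypothetical `to_number` helper:

```python
def to_number(value):
    """Convert value to a float, returning None if that isn't possible."""
    try:
        number = float(value)
    except (TypeError, ValueError):  # catch several exception types at once
        print("Could not convert " + repr(value) + " to a number")
        return None
    else:
        # The else block only runs if the try block raised no exception
        return number

print(to_number('2.5'))
print(to_number('burger'))  # float('burger') raises a ValueError
print(to_number(None))      # float(None) raises a TypeError
```

Keeping only the risky line inside `try` (and the rest in `else`) avoids accidentally catching exceptions from code we didn't mean to protect.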


We can also raise our own exceptions (and create our own types of exception too), so that they can be handled elsewhere in the code. A `finally` clause can be added at the end of a `try` statement. It runs regardless of whether an exception was raised or the `except` clause was triggered.

In [100]:
class BurgerException(Exception):
    """Base class for exceptions in this module."""
    pass

def create_a_burger(ingredients):
    for i in ingredients:
        if 'lamb' in i:
            raise BurgerException("Lamb is too expensive for a burger")
            
try:
    create_a_burger(['cheese', 'lettuce', 'lamb'])
except BurgerException as e:
    print(str(e))
finally:
    print('Let us clean up at the end, no matter whether a burger was cooked')

Lamb is too expensive for a burger
Let us clean up at the end, no matter whether a burger was cooked


### Recursion and iteration

Earlier we discussed how Python is predominantly an imperative language. One consequence is that code is typically written in an iterative fashion. This contrasts with functional languages, where recursion is a major feature. Here we illustrate the differences between recursion and iteration. We also demonstrate the use of a Python lambda function, which is an anonymous function, i.e. one which doesn't have a name.

In [101]:
# Here we sum a list in an iterative fashion 
# (in practice, we would likely use built-in functions to do this!)
def sum_iter(lst):
    no = 0
    
    for i in lst:
        no += i

    return no

def sum_rec(lst):
    # Base case
    if len(lst) == 1:
        return lst[0]
    
    # Add the first element and run the same function on the rest of the list
    return lst[0] + sum_rec(lst[1:])

def sum_rec_one_line(lst):
    return lst[0] + sum_rec_one_line(lst[1:]) if len(lst) > 1 else lst[0]

# We can use a Python lambda (we've also changed the if statement, so an empty list returns 0)
sum_lambda = lambda lst: lst[0] + sum_lambda(lst[1:]) if lst else 0

lst = [0, 1, 2, 3, 4]

print(sum_iter(lst))
print(sum_rec(lst))
print(sum_rec_one_line(lst))
print(sum_lambda(lst))

10
10
10
10
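
One practical limitation of recursion in Python is that the interpreter caps the recursion depth (CPython's default limit is typically 1000). The sketch below uses a hypothetical index-based variant, `sum_rec_index`, to show that deep recursion raises a `RecursionError`, whereas iteration (or the built-in `sum`) has no such limit.

```python
import sys

def sum_rec_index(lst, i=0):
    # Base case: we've run past the end of the list
    if i == len(lst):
        return 0
    # Add the current element to the sum of the remaining elements
    return lst[i] + sum_rec_index(lst, i + 1)

print(sys.getrecursionlimit())
print(sum_rec_index([0, 1, 2, 3, 4]))

big_list = list(range(100000))
try:
    sum_rec_index(big_list)
    hit_limit = False
except RecursionError:
    hit_limit = True
    print('Recursion went too deep')

print(sum(big_list))  # the built-in sum iterates, so it copes fine
```

Using an index rather than slicing (`lst[1:]`) also avoids copying the rest of the list at every step.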


There are several functions in Python which can be helpful if you wish to write your code in a functional style, namely `map`, `filter` and `reduce` (`map` and `filter` are built-ins, whereas `reduce` needs to be imported from `functools`). Let's start with `map`, which applies the same function to every element in our input.

In [72]:
def double_it(x):
    return x * 2

lst = [0, 1, 2, 3, 4]

double_them = list(map(double_it, lst))
print(double_them)

[0, 2, 4, 6, 8]


`filter` meanwhile keeps only the elements of a list which satisfy a certain condition. Here we pass a `lambda` function to `filter` as our condition, to keep everything which isn't a burger.

In [73]:
lst = ['burger', 'veggie', 'chicken']

no_burgers = list(filter(lambda x: 'burger' not in x, lst))
print(no_burgers)

['veggie', 'chicken']
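
In practice, many Python programmers write the same operations as list comprehensions, which some find easier to read. Here are comprehension equivalents of the `map` and `filter` examples above.

```python
def double_it(x):
    return x * 2

numbers = [0, 1, 2, 3, 4]
# Equivalent to list(map(double_it, numbers))
doubled = [double_it(x) for x in numbers]

foods = ['burger', 'veggie', 'chicken']
# Equivalent to list(filter(lambda x: 'burger' not in x, foods))
no_burgers = [x for x in foods if 'burger' not in x]

print(doubled)
print(no_burgers)
```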


`reduce` repeatedly applies a function to combine the elements of a list into a single result. In this instance, we first use it to do a rolling concatenation of the elements of the list. Then we use it to find the shortest word in a list (note, it will only return a single word, even if multiple words have the same length).

In [74]:
lst = ['burger', 'tomato', 'lettuce']

from functools import reduce
final_burger = reduce((lambda x, y: x + ', ' + y), lst)

print(final_burger)

print(reduce(lambda x, y: x if len(x) < len(y) else y, lst))

burger, tomato, lettuce
tomato
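
`reduce` also accepts an optional initial value, which is used as the starting point and is returned as-is for an empty list. For common reductions there are often simpler built-ins: for example, `min` with `key=len` finds the shortest word, returning the first one it meets in the event of a tie (whereas the `reduce` version above returned the later one, `'tomato'`).

```python
import operator
from functools import reduce

numbers = [1, 2, 3, 4]
# Multiply the numbers together, starting from an initial value of 1
product = reduce(operator.mul, numbers, 1)
print(product)

# With an initial value, reducing an empty list returns that value instead of raising an error
print(reduce(operator.add, [], 0))

lst = ['burger', 'tomato', 'lettuce']
# min returns the first of the shortest words when there's a tie
print(min(lst, key=len))
```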


## Coding conventions: PEP8 Python style guide

Python gives you a lot of freedom in how you write your code. However, in general, it is a good idea to be relatively consistent in how you code. If you are working in a team, it is also important that everyone uses the same conventions to make the code more readable.

The PEP8 Python style guide, co-authored by Guido van Rossum, who created the Python language, defines a number of coding conventions which we recommend that you follow, to make your code easier to read and understand. Here we go through some of the main points in the style guide. We recommend you also read it in full at https://www.python.org/dev/peps/pep-0008/. Source code editors in IDEs will often flag if your code doesn't adhere to parts of the PEP8 standard. It's also worth looking on GitHub to see how some popular libraries are written, e.g. pandas or NumPy, to give you an idea of the types of conventions used.

### Indentations

Indentation in your Python code should be 4 spaces per level, used consistently. Code might still run if you are not fully consistent, but it will be difficult to read. On the question of tabs or spaces, do not mix them. Only use tabs if they are already used in the code you are working on. Python 3 also disallows mixing tabs and spaces inconsistently for indentation.

In [102]:
# Bad (will still execute, but looks awful!)
def a_func(x):
        
        
        if x == 'burger':
            print('that is a burger')
        else:
                print('this is not a burger')

# Good.. nicer formatting!
def a_func(x): 
    
    if x == 'burger':
        print('that is a burger')
    else:
        print('this is not a burger')


### Maximum line length

Limit all lines to a maximum of 79 characters (PEP8 suggests 72 characters for comments and docstrings). Very long lines are difficult to read.

### Imports

Put each import statement on its own line, rather than combining several imports on one line. Generally, import standard library modules first, then third-party libraries and then imports related to your local project, with a blank line between each group.
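
A short sketch of this ordering, using only standard library modules. The third-party and local imports are left as comments, since `numpy` and the local package path here are just placeholders.

```python
# Standard library imports, one per line (rather than "import math, os")
import math
import os

# Third-party imports would form a second group, e.g.
# import numpy as np

# Local project imports would form a third group, e.g.
# from pythoncourse.coursecode.packagedemo import typechecker

print(math.sqrt(16), os.path.basename('/tmp/burgers.txt'))
```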

### String quotes

Single and double quotation marks can both be used. Try to stick to a consistent approach in your code. You might have to use one or the other if your strings actually contain single or double quotation marks.

In [76]:
print('"burger"')
print("'burger'")

"burger"
'burger'


### Whitespace

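Try to avoid unnecessary whitespace, for example immediately before the opening bracket of a function call. For variable assignments, use a single space either side of the `=`, but don't add lots of extra spaces to line things up. For default parameter values and keyword arguments, don't put spaces around the `=`.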

In [103]:
# Bad, with many unnecessary whitespaces
def add(x, y = 1, z = 2):
    return x + y +    z

print(add (1))

x       = 1
y       = 1

# Good
def add(x, y=1, z=2):
    return x + y + z

print(add(1))

# For variable assignment use a space
x = 1
y = 1

4
4


### Other tips

Try to avoid squeezing everything onto the same line when writing if statements or other similar statements.

In [104]:
# Bad
if 3 > 2: print('Well, you must be Sherlock Holmes to know that...')
else: print('OK, then')
    
# Good
if 3 > 2: 
    print('Well, you must be Sherlock Holmes to know that...')
else: 
    print('OK, then')


Well, you must be Sherlock Holmes to know that...
Well, you must be Sherlock Holmes to know that...


### Comments

Comments are very important to give further explanation about how your code works. Make sure your comments are up to date, and reflect any changes you've made to your code.

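Comments should be complete sentences. Try to write comments in English (unless you are very sure that everyone reading the code will know the language you are writing in).

Use inline comments sparingly and prefer writing comments on separate lines. Block comments can span multiple lines; PEP8 recommends starting each line of a block comment with `#`. A triple-quoted string is sometimes used for this instead, but strictly speaking it is a string literal rather than a comment.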

In [79]:
x = 1  # Inline comment (separated from the code by at least two spaces)

# Assign x as 1 (ok this is obvious...)
x = 1

# This is a big block comment.
#
# It can have multiple lines. Next, we'll double a number...
x = x * 2

print(x)

2


Make sure to write documentation strings (i.e. docstrings) for all public modules, functions, classes and methods. We gave an example of NumPy style docstrings earlier (https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html). Google style docstrings are also popular (https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html#example-google).
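
For comparison with the NumPy style example earlier, here is a sketch of the same cheeseburger function documented in the Google style. Docstrings are stored on the function object, which is how tools like `help()` find them.

```python
def make_a_cheeseburger(burger):
    """Adds cheese to a burger.

    Args:
        burger (str): Type of burger.

    Returns:
        str: The burger with cheese added.
    """
    return "cheese " + burger

# The docstring is available at runtime via __doc__ (and help())
print(make_a_cheeseburger.__doc__.splitlines()[0])
```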

### Naming conventions

When coding we have many choices to make when we create names. We need to name variables, classes, functions etc. We should have a consistent way to name all these things.

Below we define a number of different naming styles.

* `b` (single lowercase letter)
* `B` (single uppercase letter)
* `lowercase`
* `lower_case_with_underscores`
* `UPPERCASE`
* `UPPER_CASE_WITH_UNDERSCORES`
* `CapitalizedWords` (or CapWords, or CamelCase) - when using acronyms in CapWords, capitalize all the letters of the acronym. Thus `HTTPServerError` is better than `HttpServerError`.
* `mixedCase` (differs from CapitalizedWords by initial lowercase character!)
* `Capitalized_Words_With_Underscores` (ugly!)

Try to avoid names which are difficult to read because some characters look similar in certain fonts. For example, `l` (the lowercase letter l) looks very much like `1` (the number one). Another example is the confusion between `O` (the uppercase letter O) and `0` (zero). Also avoid using characters that aren't ASCII compatible in names.

* Package names should be all lowercase and short. Try not to include underscores in them, unless absolutely necessary.
* Class names should be written in CamelCase.
* Exception names should end with the suffix `Error` (if the exception actually is an error), e.g. `BurgerError`.
* Function and variable names should be in all lowercase letters, with underscores if necessary to separate words.
* In instance methods, `self` should always be the first parameter. In class methods, `cls` should be the first parameter.
* For method names and instance variables, use the same convention as function and variable names. 
* For non-public methods and instance variables, use a single leading underscore `_` as a prefix.
* Constants should be written in `UPPER_CASE_WITH_UNDERSCORES`.
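
To pull these conventions together, here is a sketch of a small module using each of them. All the names (`MAX_BURGERS_PER_ORDER`, `BurgerError`, `BurgerShop` and so on) are hypothetical, chosen just to illustrate the naming styles.

```python
MAX_BURGERS_PER_ORDER = 10  # constant: UPPER_CASE_WITH_UNDERSCORES


class BurgerError(Exception):  # exception: CapWords ending in Error
    """Raised when a burger order can't be fulfilled."""


class BurgerShop:  # class: CapWords
    def __init__(self, shop_name):  # instance method: self comes first
        self.shop_name = shop_name  # public instance variable
        self._orders = []  # non-public: single leading underscore

    def place_order(self, burger_type):  # method: lowercase_with_underscores
        if len(self._orders) >= MAX_BURGERS_PER_ORDER:
            raise BurgerError("Too many burgers in one order")
        self._orders.append(burger_type)

    @classmethod
    def from_town(cls, town_name):  # class method: cls comes first
        return cls(town_name + " Burger Shop")


shop = BurgerShop.from_town("Springfield")
shop.place_order("cheeseburger")
print(shop.shop_name, len(shop._orders))
```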

## Python standard library

We have already talked about a number of modules. Here we shall go through a few more modules and give examples of their usage. The full Python standard library is described in https://docs.python.org/3/library/ and we strongly recommend going through that in more detail.

### datetime

We have already briefly seen this. This enables us to store and manipulate dates and times. Below we give some simple examples of using datetime.

In [105]:
import datetime
from datetime import timedelta

today = datetime.datetime.utcnow()

# We can extract the date and time separately
print(today)
print(today.date())
print(today.time())

# Tomorrow is one day after today
tomorrow = today + timedelta(days=1)

print(tomorrow)

# We can compare dates like we compare numbers
if tomorrow > today: 
    print('Yes ' + str(tomorrow) + ' is after ' + str(today))

2022-01-27 20:18:04.632708
2022-01-27
20:18:04.632708
2022-01-28 20:18:04.632708
Yes 2022-01-28 20:18:04.632708 is after 2022-01-27 20:18:04.632708
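
We often need to convert between dates and strings. `strftime` formats a `datetime` as a string and `strptime` parses a string back into a `datetime`, both using format codes such as `%Y` (year) and `%d` (day). This sketch uses a fixed date so that the output doesn't depend on when it is run.

```python
import datetime

# A fixed date and time, so the output doesn't depend on when we run this
a_date = datetime.datetime(2022, 1, 27, 19, 55)

# strftime formats a datetime as a string using format codes
print(a_date.strftime('%Y-%m-%d %H:%M'))

# strptime does the reverse, parsing a string into a datetime
parsed = datetime.datetime.strptime('28/01/2022', '%d/%m/%Y')
print(parsed)

# Subtracting two datetimes gives a timedelta
gap = parsed - a_date
print(gap)
```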


### functools

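The `functools` module provides higher-order functions, i.e. functions which act on or return other functions. It includes `reduce`, which we discussed earlier (`map` and `filter`, by contrast, are built-in functions, so they don't need to be imported from `functools`).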
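
Besides `reduce`, `functools` contains other useful tools. Here is a minimal sketch of two of them, `partial` and `lru_cache`; the function names `make_burger` and `fib` are hypothetical examples.

```python
from functools import lru_cache, partial

def make_burger(burger, topping):
    return burger + ' topped with ' + topping

# partial creates a new function with some arguments already filled in
make_cheeseburger = partial(make_burger, topping='cheese')
print(make_cheeseburger('beef burger'))

# lru_cache remembers previous results, which turns this naive recursion
# into something fast enough to compute fib(50) almost instantly
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))
```

Without the cache, the naive recursive `fib(50)` would recompute the same values an enormous number of times.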

### math

The `math` module provides a number of additional mathematical functions, of the sort you'd typically find on a scientific calculator. We illustrate a few of these below. The `cmath` module provides versions of these functions for complex numbers.

In [106]:
import math

print(math.ceil(.9)) # Round up to the nearest whole number
print(math.floor(.9)) # Round down to the nearest whole number
print(math.factorial(6)) # Implements the mathematical factorial function
print(math.isnan(float('nan'))) # Is this a Not-A-Number
print(math.sqrt(4)) # Square root

1
0
720
True
2.0
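
A related pitfall is comparing floating point numbers directly. Since floats can't represent numbers like 0.1 exactly, `==` can give surprising answers, and `math.isclose` is usually a safer comparison. The module also provides constants such as `math.pi` and `math.e`.

```python
import math

# Floating point arithmetic is not exact, so == can give surprising answers
print(0.1 + 0.2 == 0.3)
# math.isclose compares floats within a small relative tolerance instead
print(math.isclose(0.1 + 0.2, 0.3))

print(math.pi)  # mathematical constant pi
print(math.log(math.e))  # natural logarithm of e
```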


### random

This module generates pseudorandom numbers using the Mersenne Twister algorithm. It shouldn't be used for cryptographic purposes; the `secrets` module is designed for those instead.

In [107]:
import random

print(random.randint(2, 10)) # Generate a random integer between 2 and 10 (both inclusive)
print(random.random()) # Generate a random float in the range [0, 1)
print(random.uniform(1, 12)) # Generate a random float between 1 and 12 using a uniform distribution
print(random.normalvariate(0, 1)) # Generate a random float from the normal distribution (with a mean of 0 and standard deviation of 1)

4
0.89173261699067
5.086920869156231
-2.330254231559188
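
Because these numbers are pseudorandom, seeding the generator makes the sequence reproducible, which is handy for debugging and for repeatable simulations. A sketch (the seeds 42 and 7 are arbitrary):

```python
import random

# Seeding the generator makes the "random" sequence reproducible
random.seed(42)
first_rolls = [random.randint(1, 6) for _ in range(5)]

random.seed(42)
second_rolls = [random.randint(1, 6) for _ in range(5)]

print(first_rolls == second_rolls)

# A separate Random instance has its own state, independent of the module-level one
rng = random.Random(7)
print(rng.choice(['burger', 'fries', 'milkshake']))
```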


### io

Python's built-in `open` function (which is built on top of the `io` module) lets us read and write files on disk. We can open files for read-only, write-only or read-write access. We can read text files, or we can read files in binary mode (e.g. image files).

In [108]:
# Open a file for write access ('w') and write some text to it
with open('burgers.txt', 'w', encoding='utf-8') as file:
    file.write('burgers!!!\n')
    file.write('great burgers!!!\n')
    file.write('more great burgers!!!\n')
    
# Opening with 'w' again overwrites the file; writelines writes a list in one go
with open('burgers.txt', 'w', encoding='utf-8') as file:
    file.writelines(['one\n', 'more\n', 'burger\n'])

# Read the whole file in one go into a list (read access only)
with open("burgers.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
    
print(lines)

# Read line by line till the end and print each line separately (read access only)
with open("burgers.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.rstrip('\n'))

['one\n', 'more\n', 'burger\n']
one
more
burger
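
The `io` module itself is also handy for file-like objects held in memory. For example, `io.StringIO` behaves like a text file, which can be useful for testing code that expects a file without touching the disk. A minimal sketch:

```python
import io

# StringIO behaves like a text file, but lives entirely in memory
buffer = io.StringIO()
buffer.write('one\n')
buffer.write('more\n')
buffer.write('burger\n')

# Move back to the start before reading, just as we would with a real file
buffer.seek(0)
lines = buffer.readlines()
print(lines)

# getvalue returns the whole contents as a single string
print(repr(buffer.getvalue()))
```

`io.BytesIO` plays the same role for binary data.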



### pickle

Sometimes we want to write more complex objects to disk. In particular, we can persist Python objects to disk using the `pickle` library. However, note that not every Python object can be pickled. There is also a third-party library, `dill`, which is a bit more flexible than `pickle`. Note that you should never unpickle files from untrusted sources, as they might contain malicious content. Newer versions of Python support newer pickle protocols, so a pickle file written with a newer protocol might not be readable by an older version of Python.

In [109]:
import pickle

# Create a dictionary with lots of different types of objects
# We could have also used a class
data = {
    'How many burgers eaten': [1.0, 1, 3, 1+6j],
    'Types of burgers': ("burger", b"Big Mac burger byte string"),
    'How truthful was the burger': {None, True, False}
}

# Dump from memory to disk
with open('burger.pkl', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available 
    # Note: when reading protocol doesn't need to be specified
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

# Read back from disk to memory
with open('burger.pkl', 'rb') as f:
    data = pickle.load(f)

# Should look like the original dictionary we made
print(data)

{'How many burgers eaten': [1.0, 1, 3, (1+6j)], 'Types of burgers': ('burger', b'Big Mac burger byte string'), 'How truthful was the burger': {None, True, False}}
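
`pickle` can also serialise to bytes in memory with `dumps` and `loads`, which is essentially what `multiprocessing` does when sending objects between processes. This sketch also shows an object which can't be pickled: a lambda, since `pickle` stores functions by name and a lambda has no importable name.

```python
import pickle

data = {'How many burgers eaten': [1.0, 1, 3]}

# dumps/loads work with bytes in memory rather than with a file
as_bytes = pickle.dumps(data)
restored = pickle.loads(as_bytes)
print(restored == data)

# Not every object can be pickled, e.g. a lambda function
try:
    pickle.dumps(lambda x: x * 2)
    pickled_lambda = True
except (pickle.PicklingError, AttributeError, TypeError) as e:
    pickled_lambda = False
    print('Could not pickle the lambda:', type(e).__name__)
```

The exact exception type can vary between Python versions, which is why the sketch catches a few possibilities.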


### urllib.request

There is a huge amount of information on the web. Python's standard library allows us to read webpages directly with relatively few problems. Note that actually extracting the information we want from a webpage is trickier. We shall discuss libraries that help with this later, such as BeautifulSoup. Below we give a very simple example loading the BBC homepage. Note that the start of the webpage is filled with a large amount of HTML markup. The library can also be used to send POST requests to a webpage.

In [110]:
import urllib.request

# Load up the BBC homepage
with urllib.request.urlopen('https://www.bbc.com/') as response:
    html = response.read()

# Print the first 1000 bytes (slicing is safe even if the page is shorter)
print(html[:1000])

b'    <!DOCTYPE html>\n<html class="b-header--black--white b-pw-1280 b-reith-sans-font">\n\n    <head>\n        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n        <meta name="description" content="Breaking news, sport, TV, radio and a whole lot more.\n        The BBC informs, educates and entertains - wherever you are, whatever your age.">\n        <meta name="keywords" content="BBC, bbc.co.uk, bbc.com, Search, British Broadcasting Corporation, BBC iPlayer, BBCi">\n        <title>BBC - Homepage</title>\n\n        <script>\n            window.orb_fig_blocking = true;\n            window.bbcredirection = {geo: true};\n        </script>\n\n        <!-- WWHPv: 20210923-1449-37491ec2b6e5b4c43bda3673e521e8164a789b87 -->\n        <!-- Webapp: WWHP international homepage -->\n        <meta property="fb:page_id" content="228735667216" />\n        <meta property="fb:admins" content="297814326937641" />\n        <meta property="fb:app_id" content="187214818032936" />\n  
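
As a sketch of preparing a POST request, we can build a `urllib.request.Request` with some encoded form data. Here we only construct the request and inspect it rather than sending it; the URL `https://example.com/search` and the `User-Agent` value are placeholders. To actually send it, we would pass `request` to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

# urlencode builds a query string, escaping special characters for us
params = urllib.parse.urlencode({'q': 'cheese burger', 'page': 2})
print(params)

# Supplying data turns the request into a POST (we only build it here, we don't send it)
request = urllib.request.Request(
    'https://example.com/search',
    data=params.encode('utf-8'),
    headers={'User-Agent': 'burger-bot'},
)
print(request.get_method(), request.full_url)
```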

### Concurrency

Very often we might wish to run multiple tasks in parallel, because it can be much quicker! However, it can cause complications, particularly if several tasks are reading/writing the same variables. This can result in some very subtle bugs which are difficult to find. In this instance, we need to be careful to lock shared variables, so only one thread can read/write to them at any one time.
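
Here is a minimal sketch of locking a shared variable with `threading.Lock`. Four threads each increment a shared `counter` many times; holding the lock around `counter += 1` ensures no two threads interleave their read-modify-write steps. The variable and function names are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time can hold the lock, so updates can't interleave
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

print(counter)
```

With the lock, the final count is always 4 x 10000 = 40000.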

If your operation is "embarrassingly parallel" and doesn't require reading/writing shared memory, it is much easier. This could, for example, involve reading from many different webpages or generating many Monte Carlo paths independently.

Python (or more precisely, the standard CPython interpreter) has the GIL, the global interpreter lock, which means that only one thread can execute Python bytecode at any particular time. The `threading` (see https://docs.python.org/3/library/threading.html) and `asyncio` modules allow us to kick off concurrent code, but note that they will mainly be useful for IO-bound operations (as in practice they don't allow more than one thread to execute Python code at the same time). IO-bound operations are things like reading/writing from disk or fetching webpages, where most of the time is spent waiting for the IO operation to complete.
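
To see why threads help with IO-bound work despite the GIL, here is a sketch which simulates downloads with `time.sleep` (a hypothetical `fake_download` standing in for waiting on the network). Sleeping releases the GIL, so the threads can all wait at the same time.

```python
import concurrent.futures
import time

def fake_download(url):
    time.sleep(0.2)  # stand-in for waiting on the network (an IO-bound wait)
    return len(url)

urls = ['page' + str(i) for i in range(5)]  # hypothetical URLs

start = time.perf_counter()
results = [fake_download(u) for u in urls]
print('Sequential: %.1f seconds' % (time.perf_counter() - start))

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    threaded_results = list(executor.map(fake_download, urls))
print('Threaded: %.1f seconds' % (time.perf_counter() - start))
```

We would expect roughly 1 second for the sequential version and roughly 0.2 seconds for the threaded one, although exact timings will vary by machine.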

By contrast, heavy number crunching is limited by the CPU, rather than by IO. If we want to parallelise CPU-bound operations, we should use a library like `multiprocessing` (see https://docs.python.org/3/library/multiprocessing.html). Given that `multiprocessing` actually kicks off separate Python processes to do the computation, underneath it uses `pickle` to serialise/deserialise objects to send them back and forth between the various processes.

There are also third-party libraries that are worth looking at, such as `multiprocess` and `pathos`, which use `dill` rather than `pickle` for serialisation and are a bit more flexible. In all these instances, there's obviously quite a bit of overhead when doing the serialisation/deserialisation, hence we should try to limit the amount of data which goes back and forth between processes.

We shall now do a demo comparing `threading` vs `multiprocessing`. We shall use the `concurrent.futures` library (see https://docs.python.org/3/library/concurrent.futures.html), which is an abstraction on top of `threading` and `multiprocessing`. Note that we shall run this script separately, as the `multiprocessing` library does not work when using Python interactively on Windows (it should work on Linux), although the code is below for your reference. We shall examine how these various libraries can be used to speed up the downloading of webpages.

We'll find that downloading the webpages in a single thread is much slower than the threaded (or multiprocess) versions.

    import urllib.request
    import concurrent.futures

    from multiprocessing import Pool

    import time

    ## URLS to download
    URLS = ['http://www.foxnews.com/',
            'http://www.cnn.com/',
            'http://europe.wsj.com/',
            'http://www.bbc.co.uk/'] * 20

    def time_func(func):
        """Wrap a function with a timer

        Parameters
        ----------
        func : func
            Function to decorate

        Returns
        -------
        func
        """

        # *args/**kwargs let the wrapper time functions that take arguments too
        def wrapper(*args, **kwargs):
            start_time = time.perf_counter()
            x = func(*args, **kwargs)
            duration = round(time.perf_counter() - start_time, 1)

            print("Function ran in %s seconds" % duration)

            return x

        return wrapper

    def load_url(url, timeout=15):
        """Loads the raw content from a URL

        Parameters
        ----------
        url : str
            URL to download

        timeout : int (optional)
            Number of seconds to timeout

        Returns
        -------
        bytes
        """
        # the with block makes sure the connection is closed after reading
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()

    @time_func
    def run_single_thread():
        """Loads URLs in a single threaded way.

        Returns
        -------
        list of bytes
        """
        print('--- Single thread ---')
        return [load_url(x) for x in URLS]


    @time_func
    def run_concurrent_futures_threadpool():
        """Loads URLs using concurrent.futures (abstraction on top of threading)

        Returns
        -------
        list of bytes
        """

        print('--- Concurrent futures threadpool ---')
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            return list(executor.map(load_url, URLS))

    @time_func
    def run_concurrent_futures_processpool():
        """Loads URLs using concurrent.futures (abstraction on top of multiprocessing)

        Returns
        -------
        list of bytes
        """

        print('--- Concurrent futures processpool ---')
        with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
            return list(executor.map(load_url, URLS))

    @time_func
    def run_multiprocessing():
        """Loads URLs using multiprocessing directly, rather than via concurrent.futures

        Returns
        -------
        list of bytes
        """

        print('--- Multiprocessing ---')
        with Pool(5) as pool:
            return pool.map(load_url, URLS)

    if __name__ == "__main__":
        # kick off each example and compare timings!

        # The code inside this if statement only runs when the file is executed
        # directly; it does not run when the file is imported as a module.

        # multiprocessing code must be executed under the __main__ block, because
        # worker processes may re-import this file (e.g. on Windows)

        single_threaded_webpages = run_single_thread()
        print(len(single_threaded_webpages))

        concurrent_futures_webpages_threadpool = run_concurrent_futures_threadpool()
        print(len(concurrent_futures_webpages_threadpool))

        concurrent_futures_webpages_processpool = run_concurrent_futures_processpool()
        print(len(concurrent_futures_webpages_processpool))

        multiprocessing_webpages = run_multiprocessing()
        print(len(multiprocessing_webpages))


## Objects and classes

Python also supports object oriented programming. You can create a class, which contains both data and ways of manipulating that data, and then create many instances of that class. Each instance has attributes which describe its current state. The class's methods can read and modify the state of the instance they are called on.

Python supports concepts like inheritance, which enable us to create similar related classes which share functionality.

### Burger example - object oriented example

Below we give a simple example of how to create a class.

    class Burger(object):

        food = 'burger'  # This class attribute is shared by all the instances!

        def __init__(self, burger_type):  # __init__ gets called when the object is instantiated
            self.burger_type = burger_type  # This instance attribute is different for each instance

        def print_burger_type(self):  # Note: for methods, we have to add 'self' as the first parameter
            print(self.burger_type)

Let's instantiate a couple of different burgers and call their methods.

    cheese_burger = Burger('cheese')
    veggie_burger = Burger('veggie')

    print(cheese_burger.food)

    cheese_burger.print_burger_type()

    print(veggie_burger.food)

    veggie_burger.print_burger_type()

    burger
    cheese
    burger
    veggie

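The claim that `food` is shared by all instances can be checked directly. This sketch uses a cut-down copy of `Burger` so that it runs on its own. Rebinding the attribute on the class is seen by every instance. Assigning to the same name on one instance instead creates an instance attribute, which shadows the class attribute for that instance only.

```python
class Burger:
    food = 'burger'  # class attribute, shared by all instances

    def __init__(self, burger_type):
        self.burger_type = burger_type  # instance attribute


cheese_burger = Burger('cheese')
veggie_burger = Burger('veggie')

# Changing the class attribute is visible through every instance
Burger.food = 'sandwich'
print(cheese_burger.food, veggie_burger.food)  # sandwich sandwich

# Assigning via an instance creates a new instance attribute on that object only
cheese_burger.food = 'wrap'
print(cheese_burger.food, veggie_burger.food, Burger.food)  # wrap sandwich sandwich
```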

We can use inheritance to make a similar class (`CheeseBurger`), which inherits the methods and attributes of `Burger` and to which we can add more methods and attributes. Note that we can also choose to override the methods of the superclass (in this case `Burger`). The built-in `super()` function allows us to access the methods and attributes of the superclass explicitly.

    class CheeseBurger(Burger):

        def __init__(self, burger_type, cheese_type):
            # super(CheeseBurger, self) is the older, explicit form of super()
            super(CheeseBurger, self).__init__(burger_type)

            self.cheese_type = cheese_type

        def print_cheese_type(self):
            print(self.cheese_type)

        def print_burger_type(self):
            print('Calling the super class Burger...')

            # Super enables us to call methods in the superclass
            super().print_burger_type()

        def another_way_to_print_cheese_type(self):

            # We can call other methods in the same class using self
            self.print_cheese_type()

    special_cheese_burger = CheeseBurger('whopper', 'brie')
    special_cheese_burger.print_burger_type()
    special_cheese_burger.print_cheese_type()
    special_cheese_burger.another_way_to_print_cheese_type()

    Calling the super class Burger...
    whopper
    brie
    brie


If we want to change an attribute, we can just get and set it directly.

    special_cheese_burger.cheese_type = 'manchego'
    special_cheese_burger.print_cheese_type()

    manchego


However, we might want to make an attribute private, so that outside callers can't change it directly. A more Pythonic way to get and set attributes is to use properties. Properties also let us run additional code whenever an object's attributes are read or set. Using them involves adding decorators (lines starting with the `@` symbol) before each method. Prefixing an attribute name with a double underscore (e.g. `__cheese_type`) marks it as private. Unlike languages such as Java, though, Python doesn't actually prevent outside callers from changing it. The name is only mangled, and we rely on callers recognising which attributes are meant to be private.

    class CheeseBurgerWithProperties(CheeseBurger):

        def __init__(self, burger_type, cheese_type):
            # the superclass __init__ assigns self.burger_type and self.cheese_type,
            # which now go through the property setters defined below
            super(CheeseBurgerWithProperties, self).__init__(burger_type, cheese_type)

        @property
        def cheese_type(self):
            return self.__cheese_type

        @cheese_type.setter
        def cheese_type(self, cheese_type):
            print('Set cheese type')
            self.__cheese_type = cheese_type

        @property
        def burger_type(self):
            return self.__burger_type

        @burger_type.setter
        def burger_type(self, burger_type):
            print('Set burger type')
            self.__burger_type = burger_type

    burger = CheeseBurgerWithProperties('big mac', 'brie')
    print(burger.cheese_type)

    Set burger type
    Set cheese type
    brie

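The double underscore prefix triggers name mangling rather than true privacy. In the sketch below, `Account` is a hypothetical class. Inside the class, `self.__owner` is stored as `_Account__owner`. The plain name `__owner` is therefore not found from outside, but the mangled name can still be read and changed. The `owner` property has no setter, which is one way to make an attribute read-only through its public name.

```python
class Account:
    def __init__(self, owner):
        # Inside the class body, __owner is rewritten to _Account__owner
        self.__owner = owner

    @property
    def owner(self):
        # Read-only property: there is no setter, so assigning to owner fails
        return self.__owner


acct = Account('Saeed')
print(acct.owner)                # Saeed
print(hasattr(acct, '__owner'))  # False: stored under its mangled name
print(vars(acct))                # {'_Account__owner': 'Saeed'}

# Nothing stops an outside caller who uses the mangled name
acct._Account__owner = 'Alex'
print(acct.owner)                # Alex

try:
    acct.owner = 'Jeff'
except AttributeError:
    print('owner has no setter')
```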

We can also create our own decorators: functions which wrap around other functions to add behaviour before or after they are called.

    def burger_maker(func):

        def wrapper():
            print("About to order a burger")
            func()
            print("Burger has been ordered")

        return wrapper

    def go_to_burger_joint():
        print("Going to Burger King")

    # We can use "syntactic sugar" to apply our decorator function
    @burger_maker
    def go_to_decorated_burger_joint():
        print("Going to Burger King")

    go_to_decorated_burger_joint()

    # Decorating with @burger_maker is equivalent to wrapping the undecorated
    # function by hand: burger_maker(go_to_burger_joint)() prints the same output

    About to order a burger
    Going to Burger King
    Burger has been ordered

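A `wrapper` with no parameters only works for functions that take no arguments, and it discards their return value. A more general sketch fixes both problems. It uses `*args`/`**kwargs` to pass any arguments through and returns the wrapped function's result. It also applies `functools.wraps`, so the decorated function keeps its original name and docstring. `log_order` and `order_burgers` are hypothetical names.

```python
import functools


def log_order(func):
    @functools.wraps(func)  # copy func's __name__, __doc__ etc. onto wrapper
    def wrapper(*args, **kwargs):
        print("About to order:", args, kwargs)
        result = func(*args, **kwargs)  # pass every argument straight through
        print("Order placed")
        return result  # hand the wrapped function's return value back

    return wrapper


@log_order
def order_burgers(burger_type, quantity=1):
    """Return a short description of a burger order."""
    return f"{quantity} x {burger_type}"


order = order_burgers('cheese', quantity=2)
print(order)                   # 2 x cheese
print(order_burgers.__name__)  # order_burgers (would be 'wrapper' without functools.wraps)
```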

### Student example - object oriented programming

Variables such as integers and strings let us store many different kinds of basic information. We tend to think of these as primitive types. We also have abstract data types such as lists, sets and dictionaries, which let us aggregate many of these more basic data types.

We can, for example, have a list of integers, or a dictionary with string keys and integer values.

    a_sample_list = [1, 2, 3]
    a_dict = {'Saeed': 1, 'Alex': 3}

But let's say we want to represent something much more complex, which cannot be described with just an integer or a string. Let's say we want to describe a group of students. Each student has many different properties that describe them. We list some examples below.

* Unique ID number (integer)
* First name (string)
* Second name (string)
* Age (integer)
* Modules they are studying
* Graduated (boolean)

Some of these attributes are unlikely to change, e.g. their first name. However, other attributes may well change. Students might want to register for new courses, or we might want to change their status to graduated at the end of the course. How do we store their details in Python?

We can create more customised types, known as classes, to hold all these various properties. Within the class, we also want methods (functions that belong to the class) which carry out tasks for students, such as registering them for new courses. We can create a single Student class, and then create (or instantiate) an instance of it for each student. Each student is then represented by its own Student object.

This approach, where classes store properties together with the methods that manipulate them, is known as object oriented programming. Python, like many computer languages, supports object oriented programming.

#### How to code up our Student class

Let's create a very simple class for storing student details. To keep it simple, for each student we only store the first name, surname, age and whether they have graduated. These are the only properties we will store. We write specific methods to set and get each property. We can also write methods that use the properties to do some calculation or output. The comments in the code explain each part.

    # We use the class keyword
    class Student(object):

        # The __init__ method is called when a new Student object is created
        # Let's assume that when each Student object is created, we are given
        # - first name (str)
        # - surname (str)
        # - age (int)
        # - graduated (bool)

        # note the use of "self" as the first parameter of every method;
        # it refers to the particular instance the method is called on
        def __init__(self, first_name, surname, age, graduated):

            # Set the properties for first_name, surname, age and graduated.
            # These assignments call the property setters defined below
            # (age must be set before graduated, as the graduated setter checks it)

            self.first_name = first_name
            self.surname = surname
            self.age = age
            self.graduated = graduated

        # note the use of __ (double underscore): it denotes that the underlying
        # attribute shouldn't be accessed directly by calling code;
        # we should only access it via these property methods
        @property
        def first_name(self):
            return self.__first_name

        @first_name.setter
        def first_name(self, first_name):
            self.__first_name = first_name

        @property
        def surname(self):
            return self.__surname

        @surname.setter
        def surname(self, surname):
            self.__surname = surname

        @property
        def age(self):
            return self.__age

        @age.setter
        def age(self, age):
            self.__age = age

        @property
        def graduated(self):
            return self.__graduated

        @graduated.setter
        def graduated(self, graduated):

            # we can also make the property setter method more complicated
            # to check the inputs
            if graduated:
                if self.age > 21:
                    self.__graduated = graduated
                else:
                    print('Student needs to be more than 21 years old to have graduated, setting student to not graduated')
                    self.__graduated = False
            else:
                self.__graduated = graduated

        # print the student details
        def print_student_details(self):
            if self.graduated:
                has_graduated = "graduated"
            else:
                has_graduated = "have not graduated"

            print(self.first_name + " " + self.surname + " is " + str(self.age) + ". They have " + has_graduated)

OK, that class looks like a lot of code! However, in practice most of the property get/set code is fairly repetitive. Note also the method `print_student_details`, which uses the properties to produce its output. We are holding the properties and the methods that manipulate them in the same body of code: the class.

We can then instantiate Student objects to represent different students.

    saeed = Student('Saeed', 'Amen', 22, True)

We can retrieve the properties using dot notation (`.`).

    age = saeed.age

    print(age)

    22

    surname = saeed.surname

    print(surname)

    Amen

If we want to change the age, we can set the age property like this.

    saeed.age = 23

    print(saeed.age)

    23

    saeed.age = 22  # set it back, so the outputs further down match


We can instantiate other students too!

    jeff = Student('Jeff', 'Bezos', 22, True)

Let's try instantiating a student who is 20, and say they have graduated. The property setter will complain!

    saeed_jr = Student('Saeed Jr', 'Amen', 20, True)

    Student needs to be more than 21 years old to have graduated, setting student to not graduated


Just like with primitive types, we can combine classes we create, such as Student, with abstract data types such as lists and dictionaries.

    student_lst = [saeed, jeff, saeed_jr]

We can iterate through this list in the same way that we could with a list of strings, for example.

    for s in student_lst:
        s.print_student_details()

    Saeed Amen is 22. They have graduated
    Jeff Bezos is 22. They have graduated
    Saeed Jr Amen is 20. They have have not graduated


We could also create a dictionary of Student objects. We could make the key a string representing the student's surname. However, we have two students with the same surname, so this isn't a good idea, because keys have to be unique. The second `'Amen'` entry overwrites the first, which is why only two keys are printed below. In practice, you might use a unique identifier such as an ID number instead.

    student_dict = {'Amen': saeed, 'Bezos': jeff, 'Amen': saeed_jr}

    print(student_dict.keys())

    dict_keys(['Amen', 'Bezos'])

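The overwriting of duplicate keys can be seen with plain values. This sketch stores student names as simple tuples and uses hypothetical ID numbers. It shows that the later `'Amen'` entry silently replaces the earlier one, while keying by a unique ID keeps every student. If we do want to look students up by surname, one option is to map each surname to a list of students.

```python
students = [('Saeed', 'Amen'), ('Jeff', 'Bezos'), ('Saeed Jr', 'Amen')]

# Keyed by surname: the second 'Amen' overwrites the first
by_surname = {surname: first for first, surname in students}
print(by_surname)  # {'Amen': 'Saeed Jr', 'Bezos': 'Jeff'}

# Keyed by a unique (hypothetical) ID number: nothing is lost
by_id = {student_id: student for student_id, student in enumerate(students, start=1)}
print(len(by_id))  # 3

# Grouping by surname: each key maps to a list of all matching students
by_surname_group = {}
for first, surname in students:
    by_surname_group.setdefault(surname, []).append(first)
print(by_surname_group)  # {'Amen': ['Saeed', 'Saeed Jr'], 'Bezos': ['Jeff']}
```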

#### Inheritance and creating MastersStudent subclass

In practice there are many different types of students. They share properties! They all have first names, surnames etc., but there may be small differences between them. Let's say we take a master's student, who will also have an undergraduate degree. How could we record this property? We could just create a totally new MastersStudent class, basically copying and pasting all the code. However, this does not seem very efficient!

Instead, we can use inheritance to create a MastersStudent class which inherits the properties of the Student class and adds a new one, undergraduate_degree.

    # We are extending the Student class
    # Student is the superclass of MastersStudent
    # MastersStudent is the subclass of Student
    class MastersStudent(Student):

        def __init__(self, first_name, surname, age, graduated, undergraduate_degree):

            # note: use of super, to call the __init__ method in the superclass
            super(MastersStudent, self).__init__(first_name, surname, age, graduated)

            self.undergraduate_degree = undergraduate_degree

        @property
        def undergraduate_degree(self):
            return self.__undergraduate_degree

        @undergraduate_degree.setter
        def undergraduate_degree(self, undergraduate_degree):
            self.__undergraduate_degree = undergraduate_degree

Let's create a new Master's student object.

    bill_clinton = MastersStudent('Bill', 'Clinton', 24, True, 'Law')

We can retrieve properties like age, which come from the superclass!

    print(bill_clinton.age)

    24

And also properties specific to the subclass, like the undergraduate degree.

    print(bill_clinton.undergraduate_degree)

    Law

As well as methods from the superclass.

    bill_clinton.print_student_details()

    Bill Clinton is 24. They have graduated

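Because a `MastersStudent` is also a `Student`, the built-in `isinstance` and `issubclass` functions reflect the inheritance relationship. The sketch below uses hypothetical cut-down `Student`/`MastersStudent` classes (plain attributes, no properties) so that it runs on its own.

```python
class Student:
    def __init__(self, first_name, surname):
        self.first_name = first_name
        self.surname = surname

    def full_name(self):
        return self.first_name + " " + self.surname


class MastersStudent(Student):
    def __init__(self, first_name, surname, undergraduate_degree):
        super().__init__(first_name, surname)  # reuse the superclass set-up
        self.undergraduate_degree = undergraduate_degree


bill = MastersStudent('Bill', 'Clinton', 'Law')
print(bill.full_name())                     # Bill Clinton (method inherited from Student)
print(isinstance(bill, Student))            # True: a MastersStudent is a Student
print(issubclass(MastersStudent, Student))  # True
print(isinstance(Student('Saeed', 'Amen'), MastersStudent))  # False: not the other way round
```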

We can end up building quite complicated structures, for example by nesting objects inside other objects. We could create a University class which contains a list of Student objects, with a method that returns the students at the university who have a certain surname.

    class University(object):

        def __init__(self, student):
            self.student = student

        @property
        def student(self):
            return self.__student

        @student.setter
        def student(self, student):
            # allow a single Student to be passed in, storing it as a one-element list
            if isinstance(student, Student):
                student = [student]

            self.__student = student

        def find_student_surname(self, surname):
            matching_student = []

            for s in self.student:
                # use == to compare string values; "is" only checks whether
                # both names refer to the very same object
                if s.surname == surname:
                    matching_student.append(s)

            return matching_student

Let's create a University filled with our previously populated list of students. First, let's recall the list.

    for s in student_lst:
        s.print_student_details()

    Saeed Amen is 22. They have graduated
    Jeff Bezos is 22. They have graduated
    Saeed Jr Amen is 20. They have have not graduated

    uni = University(student_lst)

Now, let's retrieve all the students with surname 'Amen' and print out their details.

    student_amen = uni.find_student_surname('Amen')

    for s in student_amen:
        s.print_student_details()

    Saeed Amen is 22. They have graduated
    Saeed Jr Amen is 20. They have have not graduated

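When filtering by surname, `==` is the right comparison. It checks whether two strings have the same characters. By contrast, `is` checks whether two names refer to the very same object, and it can give `False` for two equal strings that were built in different ways. The sketch below shows the difference. It also gives a more compact list-comprehension version of the surname filtering, using plain `(first name, surname)` tuples as hypothetical stand-ins for Student objects.

```python
surname_literal = "Amen"
surname_built = "".join(["Am", "en"])  # equal text, built at runtime as a new object

print(surname_literal == surname_built)  # True: same characters
print(surname_literal is surname_built)  # identity check; typically False in CPython

students = [('Saeed', 'Amen'), ('Jeff', 'Bezos'), ('Saeed Jr', 'Amen')]

# List comprehension version of a loop that collects matching surnames
matching = [s for s in students if s[1] == surname_built]
print(matching)  # [('Saeed', 'Amen'), ('Saeed Jr', 'Amen')]
```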

## Tutorial: Introduction to Python

Below we have a few Python code writing exercises to try, using what you have learnt. 

### Install Anaconda and PyCharm

Before doing this, it's worth trying to install Anaconda and PyCharm (see the separate notebook with full instructions), along with all the other Python libraries you'll need for this course. Alternatively, do this before the next class and just use https://repl.it/languages/python3 for this tutorial.

### Get familiar with various code editors

* Try Jupyter Notebook or a similar tool
* Also try an IDE like PyCharm or VSCode if available

### Create a 'Hello World' application

* Print a "Hello world" string
* Print a number

### Variables and basic operators

* Create integer, float and string variables and check their types 
* What does 3 > 2 evaluate to?
* What does 3 > 3 evaluate to?
* Multiply 10 by 10 and print output
* Concatenate two strings together and print output

### Working with lists

* Create a list of numbers from 1-5
* Add an element on the end
* Remove the first element
* Remove the last element


### Using dictionaries

* Create a dictionary with surnames as keys and first names as values for the members of The Beatles
* Print all the keys
* Print all the values
* Try looking up some values for given keys and printing them

### Conditionals and loops

* Print random numbers with a pause
* Print all the elements of a list of numbers, each on their own line with a for loop
* Use an if statement which prints "True" if a > b (where a and b are integer variables of your choice), and prints "False" otherwise


### Basic functions

* Create a function which takes in two numbers, multiplies them and returns the result
* Test it out with two numbers
* Write a function which takes in a list, and print the first and last elements on different lines
* Test out this function with some lists (careful, edge cases!)

### Writing simple function to do a mathematical operation

* Compute factorials (i.e. n!) both recursively and iteratively (don't just use the built-in factorial function, although it can be used to check your work!)
* Use multiprocessing to calculate factorial of several numbers

### Writing a function to sort a list

* Use the built-in `sorted()` function (or the list's `sort()` method) on your list to check your algorithm
* Hint: look at sorting algorithms like bubblesort, quicksort
* If you end up implementing multiple sorting algorithms, which one is quickest?

### Downloading a webpage

* Write code to read in a URL
* For a list of URLs, print out the title of each page (i.e. without any other content or HTML tags)
* If you have time write a for loop that can take multiple webpages in one go, to read their titles
* Hint: use some of the string manipulation functions in Python (there is no need to use other libraries!)
