## What is Python

- dynamically and strongly typed
- interpreted
- object oriented
- open source
- wide range of use cases
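
A minimal sketch of what the typing bullet means in practice: a name can be rebound to objects of any type at runtime, while mixing incompatible types raises an error rather than converting silently. The variable names are made up for illustration.

```python
# a name can be rebound to objects of different types (dynamic typing)
value = 42
first_type = type(value).__name__
value = 'forty-two'
second_type = type(value).__name__

# mixing incompatible types raises an error instead of converting silently
try:
    '1' + 1
    mixed = 'converted silently'
except TypeError:
    mixed = 'TypeError'

print(first_type, second_type, mixed)
```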

## Guido van Rossum

<img src="../assets/guido.jpeg" alt="drawing" width="300"/>

Python was named after Monty Python - see the [Argument Clinic](https://youtu.be/xpAvcGcEc0k?t=84) (my favourite Python skit).

For more on the development of Python:
- [The A-Z of Programming Languages](https://www.computerworld.com.au/article/255835/a-z_programming_languages_python/)
- [Lex Fridman interview](https://www.youtube.com/watch?v=ghwaIiE3Nd8)

## Why Python

The motivation behind Python is **programmer productivity**:

> So I set out to come up with a language that made programmers more productive, and if that meant that the programs would run a bit slower, well, that was an acceptable trade-off - Van Rossum

Python is **not** the language of choice for applications where speed is the most important concern

Python is the language of choice for data science - also popular with web developers

Central to the language is the idea of being **pythonic**

```python
#  not pythonic
for i in range(mylist_length):
   do_something(mylist[i])

#  pythonic
for element in mylist:
   do_something(element)
```

## The Zen of Python

Python has a philosophy built into the language

In [None]:
import this

## Running the Python interpreter

If you are using the recommended Jupyter Lab, then you can easily access a terminal in the same window as this notebook

We can start an interactive Python **interpreter** session by typing the following:

```bash
#  the $ indicates the command is run in a shell
$ python
```

We can see where this Python executable is located on our machine using the bash program `which`:

In [None]:
#  use ! to run bash/shell code in the notebook directly
!which python

And what version of Python we are using by running Python with a command line argument:

In [None]:
#  use ! to run bash/shell code in the notebook directly
!python --version
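
The same information is also available from inside Python. The sketch below uses the standard library `sys` module and assumes Python 3.6 or later (it uses an f-string).

```python
import sys

# version_info is a named tuple: (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info.major, sys.version_info.minor
print(f'Running Python {major}.{minor}')

# the first token of sys.version is the full version number
print(sys.version.split()[0])
```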

# Python development environment

## Which Python should I use?

Use Python 3.6

3.6 already has f-strings (easier string formatting); 3.7 adds data classes (a better namedtuple)
- 3.7 is not compatible with some data science libraries (e.g. TensorFlow)
- if you use 3.7 features now, your colleagues will need to install a new virtual environment

## System Python

macOS and most Unix operating systems come with a version of Python installed by default. 

This is the **system Python**, and is used by the OS.  You want to avoid using this - breaking this can be painful.

Keep your system Python as clean as possible.

You can do this using a **virtual environment**.  

## Virtual environments

Virtual environments are ignored by most beginners - using them is part of becoming an intermediate level Python programmer.  It is a best practice.

Idea = it is cheaper and simpler to copy the whole Python installation and to customise it than to try to manage a single installation that satisfies all the requirements. It’s the same advantage we have when using virtual machines, but on a smaller scale.
- similar to the idea that it is easier to do a clean install of a buggy OS than to fix it

A virtual environment allows you to **isolate** different installations of Python
- a directory (with many subdirectories) that mirrors a Python installation like the one that you can find in your operating system

Makes it easy to access different installations of Python with different packages
- managing fast moving libraries like TensorFlow, pandas
- reproducibility

Using Python virtual environments (usually one per project) is **computational hygiene**.

```bash
#  create a new environment called dsr
$ conda create --name dsr python=3.6

#  activate the dsr environment
$ conda activate dsr

$ pip install jupyterlab
```
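
One way to check which environment a notebook or script is actually running in is to ask the interpreter itself. This is a small sketch using the standard library `sys` module; the paths printed depend on your machine.

```python
import sys

# path of the Python executable running this code
# (with an environment activated, this typically points inside that environment)
print(sys.executable)

# root directory of the current Python installation or environment
print(sys.prefix)
```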

## Anaconda

A distribution of Python ([installers here](https://www.anaconda.com/distribution/)) for data science
- precompiles a lot of the C code used in libraries like `numpy` - useful on Windows

Also has a **virtual environment manager**

```bash
$ conda info --envs

$ conda create --name dsr python=3.6

$ conda env remove -n dsr
```

## IPython

IPython = Interactive Python 
- command shell for interactive computing
- IPython is what runs in Jupyter

We can use `?` to see information about Python objects:

Use two (`??`) to see the source:

In [None]:
def example_func():
    """ 
    My docstring
    """
    print('')

example_func?

In [None]:
example_func??

## Built-in functions

The Python interpreter has a number of functions and types built into it that are always available - [see the documentation for a complete list](https://docs.python.org/3/library/functions.html)

- `len`
- `any`
- `all`
- `reversed`
- `range`
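
A short sketch of the five built-ins listed above, applied to a made-up list:

```python
numbers = [3, 0, 7]

length = len(numbers)                  # number of elements
any_truthy = any(numbers)              # True if at least one element is truthy
all_truthy = all(numbers)              # False here, because 0 is falsy
backwards = list(reversed(numbers))    # reversed returns an iterator
indices = list(range(len(numbers)))    # range is lazy, list() materialises it

print(length, any_truthy, all_truthy, backwards, indices)
```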

## dir()

See your current **namespace** using the Python builtin `dir()`

In [None]:
dir()

We can also use `dir` to see what *methods* and *attributes* an object has.

Let's look at the `dir` of a `float` object:

In [None]:
dir(float(64))
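
The output of `dir` is dominated by double-underscore names. A possible way to focus on the public methods and attributes is to filter the list, as sketched here (`public_names` is just an illustrative name):

```python
# keep only the names that do not start with an underscore
public_names = [name for name in dir(float(64)) if not name.startswith('_')]
print(public_names)
```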

## Primitives - numbers

In [None]:
float(16)

In [None]:
int(16.0)

We can do exponentiation:

In [None]:
float(4)**20

In [None]:
float(4)**0.5

The modulo (`%`) operator gives us the remainder after division:

In [None]:
20 % 6

This can be used to check if a number is even:

In [None]:
20 % 2 == 0

Another use case for this is only printing something at certain frequencies when iterating (such as batches or epochs when training neural nets):

In [None]:
for i in range(20):
    if i % 3 == 0:
        print(i)
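
The remainder from `%` pairs naturally with floor division (`//`), which gives the whole-number part of the division. The sketch below shows how the two relate, along with the `divmod` builtin that returns both at once:

```python
quotient = 20 // 6      # whole-number part of the division
remainder = 20 % 6      # what is left over

# the two results always recombine into the original number
recombined = quotient * 6 + remainder

print(quotient, remainder, recombined)
print(divmod(20, 6))    # both values in a single call
```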

## Primitives - booleans

In [None]:
True

In [None]:
False

**Truthy, Falsy** 
- values that evaluate to `True` or `False` when used in a boolean context (such as an `if` statement)

In [None]:
bool(100)

In [None]:
bool(0)

In [None]:
bool([0, 1])

In [None]:
bool(None)

In [None]:
True * True

In [None]:
True * False

In [None]:
False * False

In [None]:
if []:
    pass
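
The cells above can be summarised in one sketch: zero, empty containers, the empty string and `None` are falsy, and booleans behave like the integers 1 and 0 in arithmetic. The names `falsy_values`, `items` and `message` are illustrative only.

```python
# common falsy values: zero, empty containers, the empty string and None
falsy_values = [0, 0.0, '', [], {}, (), set(), None]
print([bool(value) for value in falsy_values])

# booleans behave like the integers 1 and 0 in arithmetic
print(True + True)

# an empty list is falsy, so the else branch runs
items = []
if items:
    message = 'list has items'
else:
    message = 'list is empty'
print(message)
```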

## Conditionals

In [None]:
import random

c1 = random.randint(0, 1)
c2 = random.randint(0, 1)

if c1:
    print('c1')
    
elif c2:
    print('not c1, c2')
    
else:
    print('not c1, not c2')

## Comparisons

In [None]:
5 == 6

In [None]:
5 == 3 + 2

In [None]:
x = 1
y = 2

x == y  # ... x is equal to y
x != y  # ... x is not equal to y
x > y   # ... x is greater than y
x < y   # ... x is less than y
x >= y  # ... x is greater than or equal to y
x <= y  # ... x is less than or equal to y

## Logical operators

`and`, `or`, `not`

In [None]:
True and False

In [None]:
True or False

In [None]:
not True

In [None]:
not False
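
Logical operators are most often used to combine comparisons. A small sketch, with `x` and `y` chosen arbitrarily:

```python
x = 1
y = 2

both = x < y and y < 3        # True only if both comparisons hold
either = x > y or y == 2      # True if at least one comparison holds
negated = not x == y          # == is evaluated first, so this is not (x == y)

print(both, either, negated)
```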

## Variables & objects

In Python (unlike some other languages) there is a difference between **objects** and **variables**:
- object = the actual data in memory
- variable = a label that refers to an object

Objects have an identity, type and value.  Only the value can change over time (and only for mutable objects such as lists).

In Python, variables **refer** to objects.  They are labels for objects - not the objects themselves.
- one object can have many labels
- one label = only one object

Below we create two objects

In [None]:
first = [2, 4, 8]

second = [2, 4, 8]

We can use two different operators to compare these variables.

The `==` operator checks if the two objects have the same values:

In [None]:
first == second

The `is` operator checks whether both variables refer to the same object:

In [None]:
first is second

In [None]:
third = first

first == third

In [None]:
first is third

Under the hood, `is` compares the `id` of each object - a value that is unique to an object for as long as it exists:

In [None]:
id(first)

Most of the time we only care about comparing values, meaning `==` is the operator we usually want.

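Because `first` and `third` are labels for the same object, changing the object through one label is visible through the other. This can lead to strange effects: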

In [None]:
third.append(16)

first
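
If a separate object is what you want, one option is to copy the list rather than assign a second label. The sketch below contrasts the two; `alias` and `clone` are illustrative names.

```python
first = [2, 4, 8]

alias = first          # a second label for the same list
clone = first.copy()   # a new list object with the same values

alias.append(16)

print(first)   # changed, because alias and first are the same object
print(clone)   # unchanged, because clone is a separate object
print(alias is first, clone is first)
```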

# Loops

### `for`

The `range` builtin provides a convenient way to create an iterable:

In [None]:
range?

In [None]:
for item in range(0, 3, 1):
    print(item)

You can control the start, stop & step:

In [None]:
for item in range(1, 6, 2):
    print(item)
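
Wrapping a `range` in `list` is a quick way to see the values it will produce without writing a loop. A brief sketch:

```python
# start defaults to 0 and step to 1; stop is never included
print(list(range(3)))          # same values as range(0, 3, 1)
print(list(range(1, 6, 2)))
print(list(range(5, 0, -1)))   # a negative step counts down
```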

### `while`

A common pattern is to use a condition to break out of a loop:

In [None]:
done = False

while not done:
    done = True
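
A slightly fuller sketch of the same pattern, where the flag is only set once some condition is met (here an arbitrary counter reaching 3):

```python
count = 0
done = False

while not done:
    count += 1
    # the loop condition is re-checked after each pass through the body
    if count >= 3:
        done = True

print(count)
```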

## Exercises

Write a program to print out:

```
*****                                                                  
  *                                                                    
  *                                                                    
  *                                                                    
  *                                                                    
  *                                                                    
  *  
```

It might be useful to know you can do

`'*' * 2 == '**'`

`'*' + ' ' == '* '`

Write a program to print:
```
1
22
333
4444
55555
666666
7777777
88888888
999999999
```

## Datetimes

Working with datetimes is a common part of the data scientist's workflow.

### ISO 8601

A standard for formatting datetime strings:

`2019-09-23T17:45:18+00:00`

The bit after the `+` represents the offset from UTC 

`2019-09-23T17:45:18Z` is equivalent to the above (`Z` = Zulu = UTC)
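
As a sketch of how such a string can be produced in Python, the standard library `datetime` module (covered in the next section) can build a timezone-aware datetime in UTC and format it with `isoformat()`:

```python
from datetime import datetime, timezone

# a timezone-aware datetime in UTC
stamp = datetime(2019, 9, 23, 17, 45, 18, tzinfo=timezone.utc)

# isoformat() writes the UTC offset as +00:00
iso_string = stamp.isoformat()
print(iso_string)
```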

## datetime

The `datetime` library offers an object (also called `datetime`) for handling dates in Python

In [None]:
from datetime import datetime

datetime?

In [None]:
datetime(2019, 1, 1)

Also useful is the `timedelta`:

In [None]:
from datetime import timedelta

timedelta?

These can be used together in intuitive ways:

In [None]:
datetime(2019, 1, 1) - timedelta(days=10)

In [None]:
datetime(2019, 1, 1) > datetime(2018, 1, 1)
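
Subtracting one `datetime` from another gives a `timedelta`, and adding a `timedelta` to a `datetime` gives a new `datetime`. A short sketch with arbitrary dates:

```python
from datetime import datetime, timedelta

# the difference between two datetimes is a timedelta
gap = datetime(2019, 1, 1) - datetime(2018, 1, 1)
print(gap.days)

# adding a timedelta moves a datetime forward in time
later = datetime(2019, 1, 1) + timedelta(hours=36)
print(later)
```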

Attributes on the `datetime` object show us the day, year etc:

In [None]:
datetime(2019, 1, 1).year

We can also use the `datetime` class to get the current time:

In [None]:
datetime.now()

We can use the `strftime` method to print the datetime in a format we want ([codes for day, week are given here](http://strftime.org/)):

In [None]:
datetime(2019, 1, 1).strftime('%d.%m.%Y')
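
The reverse operation is `strptime`, which parses a string into a `datetime` using the same format codes. A minimal round-trip sketch:

```python
from datetime import datetime

# datetime -> string
text = datetime(2019, 1, 1).strftime('%d.%m.%Y')
print(text)

# string -> datetime, using the same format codes
parsed = datetime.strptime(text, '%d.%m.%Y')
print(parsed)
```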

## Exercise

Create a list of datetimes at a 5 minute frequency between `2019-09-23T17:45:00+00:00` and `2019-09-25T07:05:00+00:00`

Run a **while** loop that:
- starts at 1989-09-09
- increments in 5 day increments
- stops when the date exceeds 1990-10-03
- prints the remaining time until the next 5 day point

Run a **for** loop that:
- starts at 1988-02-28
- iterates in 30 day increments
- prints the month when the month changes (only the month!)
- stops when the date reaches (or exceeds) 1989-09-09
- prints the remaining time until the next 30 day point