# Python Basics 

Python is an easy-to-learn, powerful programming language. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for data science.

## Hello world 

In [None]:
print("Hello Python world!")

That's it! This simplicity makes Python very quick to develop in. 

### Python 2 vs Python 3

There were two incompatible versions of Python in common use: Python 2 and 3. This notebook is written for Python 3. 

You should use Python 3. Python 2 reached EOL (End Of Life) status on 1 January 2020 and no longer receives official support, updates or bugfixes. This end-of-life date had been planned for nearly a decade: it was originally scheduled for 2015 and later pushed back to 2020. Nearly all popular libraries have also ported their code, so Python 2.x is effectively obsolete. As such, we can only recommend learning and teaching Python 3.

Be aware: Python 3 broke backward compatibility, and much Python 2 code does not run unmodified on Python 3.

In [None]:
# Python 2 code: the `print` statement raises a SyntaxError in Python 3
print "Hello Python world!"
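Division is another incompatibility that often catches people out. This example is an extra illustration and was not part of the original notebook. In Python 3, `/` always performs true division and returns a float. In Python 2, dividing two integers truncated the result. Python 3 uses `//` for floor division:

```python
true_div = 7 / 2     # `/` between two ints returns a float in Python 3
floor_div = 7 // 2   # `//` is floor division
neg_floor = -7 // 2  # floor division rounds down, towards negative infinity

print(true_div)   # 3.5 (Python 2 would have given 3)
print(floor_div)  # 3
print(neg_floor)  # -4
```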

### Basic data types

#### Scalars

In [None]:
a = 1

`a` is an integer. 

In [None]:
type(a)  # `type` is a useful built-in function.

In [None]:
b = 1.0

`b` is a float. 

In [None]:
type(b)

In [None]:
c = "faculty"

`c` is a string. 

In [None]:
type(c)

Python is dynamically typed. Dynamic typing means that runtime objects (values) have a type, as opposed to static typing where variables have a type.

In [None]:
# This is fine!
a = 1
a = "faculty"

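Python is also (mostly) strongly typed. Strong typing means that the type of a value doesn't change in unexpected ways: a string containing only digits doesn't magically become a number. Changing type generally requires an explicit conversion. The main exception is between numeric types: an `int` combined with a `float` is automatically converted to a `float`.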

In [None]:
a = 1
b = "2"
a + b  # TypeError: an int and a str can't be added without converting one of them

In [None]:
a = 1
b = 2.0
a + b  # the int is promoted to a float, giving 3.0
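To combine the string and the integer from the first example, you have to convert one of them explicitly. Here is a short sketch using the built-in `int()` and `str()` functions:

```python
a = 1
b = "2"

total = a + int(b)   # convert the string to an int first
joined = str(a) + b  # or convert the int to a string (string concatenation)

print(total, type(total))    # 3 <class 'int'>
print(joined, type(joined))  # 12 <class 'str'>

# Mixing int and float needs no conversion: the int is promoted to a float
mixed = a + 2.0
print(mixed, type(mixed))    # 3.0 <class 'float'>
```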

#### Sequences

In [None]:
l = [1, 2, 3, 4]
l

`l` is a list: an ordered, mutable collection of elements. It's probably the most common sequence type in Python. Elements are accessed by integer indexing, starting from 0:

In [None]:
l[0]

In [None]:
type(l)

In [None]:
t = 1, 2, 3, 4
t

`t` is a tuple. The parentheses are not required, but it is conventional to enclose tuples in them. A tuple is like a list, but immutable: its elements cannot be changed once it has been created.

In [None]:
t[0] = 10  # TypeError: 'tuple' object does not support item assignment

A tuple can be used to group any number of items into a single compound value.

Python has a very powerful tuple assignment feature that allows a tuple of variables on the left of an assignment to be assigned values from a tuple on the right of the assignment (tuple packing/unpacking).

In [None]:
t = (1, 2, 3, 4)  # tuple packing
(one, two, three, four) = t  # tuple unpacking
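Tuple packing and unpacking show up in a few everyday idioms. This sketch covers three of them: swapping variables, collecting the rest of a sequence with a starred name, and returning several values from a function. The `min_max` helper is a made-up example:

```python
t = (1, 2, 3, 4)

# Swapping: the right-hand side is packed into a tuple, then unpacked
x, y = 10, 20
x, y = y, x
print(x, y)          # 20 10

# A starred name collects any remaining items into a list
first, *rest = t
print(first, rest)   # 1 [2, 3, 4]


# A function can "return multiple values" by returning a tuple
def min_max(values):
    return min(values), max(values)


low, high = min_max([3, 1, 4, 1, 5])
print(low, high)     # 1 5
```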

In [None]:
s = set([1, 2, 3, 4, 4, 4])
s

`s` is a set. It's an unordered collection of unique elements. 

In [None]:
type(s)
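Because sets hold unique elements, they support fast membership tests and the usual set-algebra operations. Here is a short illustration with two made-up sets. The values in the comments are set contents, and the printed order may differ:

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

union = evens | small         # elements in either set: {1, 2, 3, 4, 6, 8}
intersection = evens & small  # elements in both sets: {2, 4}
difference = evens - small    # elements in evens but not in small: {6, 8}
print(union, intersection, difference)

print(3 in small)             # membership test: True

# A common idiom: drop duplicates from a list (the original order is not kept)
unique = set([1, 2, 2, 3, 3, 3])
```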

In [None]:
d = {"a": 1, "b": 2}
d

`d` is a dictionary. It maps keys to values.

In [None]:
d["a"]
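Looking up a key that isn't present, e.g. `d["z"]`, raises a `KeyError`. The example below applies a few other common dictionary operations to the same `d`:

```python
d = {"a": 1, "b": 2}

d["c"] = 3               # add a new key (or overwrite an existing one)
missing = d.get("z", 0)  # .get returns a default instead of raising KeyError
print(missing)           # 0

for key, value in d.items():  # iterate over (key, value) pairs
    print(key, value)

print("a" in d)          # `in` checks the keys: True
```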

### Control flow 

Python has the usual set of control flow statements (`for`, `while`, `if`, etc.).

In [None]:
l = []  # empty list
for i in range(10):
    l.append(i)
l

There is no `end for` statement and there are no curly brackets. Instead, Python uses indentation (whitespace) to denote blocks. Four spaces per indentation level is the convention, so it's best to stick to that.
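The same indentation rule applies to `if`/`elif`/`else` and `while` blocks. The example below is a small sketch on made-up numbers:

```python
labels = []
for n in [-2, 0, 5]:
    # Each block is delimited by its indentation, not by braces or `end` keywords
    if n < 0:
        labels.append("negative")
    elif n == 0:
        labels.append("zero")
    else:
        labels.append("positive")
print(labels)  # ['negative', 'zero', 'positive']

# A while loop repeats its indented block as long as the condition is True
countdown = 3
while countdown > 0:
    countdown -= 1
print(countdown)  # 0
```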

In [None]:
animals = ["cat", "dog", "snake"]

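A `for ... in` loop can iterate directly over any iterable (lists, tuples, strings, dictionaries, and so on), so there is no need to index into the sequence: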

In [None]:
# 'pythonic'
for animal in animals:
    print(animal)

In [None]:
# 'un-pythonic'
for i in range(len(animals)):
    print(animals[i])
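When you need the index as well as the element, the built-in `enumerate` keeps the loop pythonic. `zip` walks over several sequences in step. The `legs` list below is made-up data for illustration:

```python
animals = ["cat", "dog", "snake"]
legs = [4, 4, 0]

# enumerate yields (index, element) pairs
for i, animal in enumerate(animals):
    print(i, animal)

# zip pairs up elements from several iterables
pairs = list(zip(animals, legs))
print(pairs)  # [('cat', 4), ('dog', 4), ('snake', 0)]
```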

Because using a `for` loop to append to a list is such a common operation, Python has a **list comprehension** syntax to accomplish the same thing more concisely.

In [None]:
l = [i for i in range(10)]
l
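Comprehensions can also filter elements with an `if` clause, and the same syntax builds dictionaries and sets. A few small examples:

```python
# Squares of the even numbers only
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Dictionary comprehension: map each number to its square
squares_by_number = {i: i ** 2 for i in range(5)}
print(squares_by_number)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Set comprehension: the distinct word lengths
word_lengths = {len(word) for word in ["cat", "dog", "snake"]}
```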

It's also typically faster than the equivalent loop that calls `append` repeatedly:

In [None]:
%timeit l = [i for i in range(10)]

In [None]:
%%timeit
l2 = []
for i in range(10):
    l2.append(i)

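It is worth noting that appending to a list and fetching an element by index are both fast ($O(1)$, amortised for `append`). *Searching* a list for a value, e.g. `x in l`, is slow, $O(n)$, because every element may need to be checked. Looking up a key in a dictionary uses a hash table and is much faster, $O(1)$ on average.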

Compare the time taken to add elements to a dictionary here with the list timings above:

In [None]:
%%timeit
d3 = {}
for i in range(10):
    d3[i] = i
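Lists and dictionaries differ most when you search them. `x in some_list` checks elements one by one, which is $O(n)$. `x in some_dict` (or a set) uses a hash lookup, which is $O(1)$ on average. The sketch below uses the standard-library `timeit` module to show this, and the exact timings will depend on your machine:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = {i: i for i in range(n)}

target = n - 1  # worst case for the list: the last element

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)

print(f"list: {list_time:.5f} s, dict: {dict_time:.5f} s")
```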

### Functions  

Functions are defined with the `def` keyword.

In [None]:
def f(x):
    """
    This is a function that returns its input 
    """
    return x


print(f(10))
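Function arguments can also have default values, and you can pass arguments by keyword. Here is a small sketch with a made-up `greet` function:

```python
def greet(name, greeting="Hello"):
    """Return a greeting for `name`; `greeting` defaults to "Hello"."""
    return f"{greeting}, {name}!"


print(greet("Python"))                      # Hello, Python!
print(greet("Python", greeting="Hi"))       # Hi, Python!
print(greet(greeting="Hey", name="world"))  # keyword arguments in any order: Hey, world!
```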

Python also has classes, but we won't talk about these right now.

### Docstrings

Docstrings are useful for keeping track of what functions do and what their inputs and outputs are.

They are essential if you're sharing code!

In a notebook, you can view them by pressing `<SHIFT>-<TAB>` with the cursor inside a function call, or by running `function_name?` in a cell. Running `function_name??` also shows the source code.
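Outside a notebook, the built-in `help()` function shows the same documentation. You can also get it programmatically from a function's `__doc__` attribute. `inspect.getdoc` returns the docstring with its indentation and surrounding blank lines cleaned up. Here is a minimal sketch with a made-up `add` function:

```python
import inspect


def add(x, y):
    """
    Return the sum of x and y.
    """
    return x + y


print(repr(add.__doc__))    # the raw docstring, including the surrounding newlines
print(inspect.getdoc(add))  # Return the sum of x and y.
```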

In [None]:
import pandas as pd

pd.DataFrame.sample??

In [None]:
pd.DataFrame.sample()  # Put the cursor inside the brackets and press <SHIFT>-<TAB>; running this cell raises a TypeError

In [None]:
import numpy as np


def probsample(ids, buy_probabilities, select_num, power=1):
    """
    Probabilistic sampling of customers, weighted according to the
    buy_probabilities argument.

    Parameters
    ----------
    ids : array-like
        Specifies the customer ids to be sampled.

    buy_probabilities : array-like
        Predicted probability to purchase generated by the model.

    select_num : int
        The number of customers to sample (without replacement).

    power : int, optional
        Used to increase the weighting of the sampling process on the
        model probabilities. Defaults to 1.

    Returns
    -------
    ids : numpy.ndarray
        Specifies the customer ids selected by probabilistic sampling.
    """

    # Raise the probabilities to `power` (power > 1 sharpens the weighting),
    # then rescale so the weights sum to 1, as np.random.choice requires.
    normalised = np.power(buy_probabilities, power)
    normalised = normalised / np.sum(normalised)
    # Sample without replacement, so each customer is selected at most once.
    ids = np.random.choice(ids, select_num, p=normalised, replace=False)

    return ids
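To see what the `power` argument does, you can apply the two normalisation steps from `probsample` to a few made-up probabilities. Raising the probabilities to a higher power before rescaling moves more of the weight onto the customers the model rates most likely to buy:

```python
import numpy as np

buy_probabilities = np.array([0.1, 0.3, 0.6])  # made-up model outputs

weights_by_power = {}
for power in (1, 2, 4):
    # Same two steps as in probsample: raise to `power`, then rescale to sum to 1
    weights = np.power(buy_probabilities, power)
    weights = weights / np.sum(weights)
    weights_by_power[power] = weights
    print(power, np.round(weights, 3))
```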

### Packages

Python has a rich and versatile standard library that is available without installing anything (sys, os, time, shutil, glob, re, random, functools, itertools). This is sometimes referred to as __batteries included__.
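As a small taste of the standard library, the sketch below runs `re`, `collections.Counter` and `itertools` on a made-up sentence, with nothing to install:

```python
import itertools
import re
from collections import Counter

text = "the cat sat on the mat with the hat"

# re: extract all the words made of lowercase letters
words = re.findall(r"[a-z]+", text)

# collections.Counter: count how often each word appears
counts = Counter(words)
print(counts.most_common(1))  # [('the', 3)]

# itertools: all unordered pairs from a small list
pairs = list(itertools.combinations(["a", "b", "c"], 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```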

In addition, Python has many extremely useful third-party packages for scientific analysis; there is a package for almost everything. This is a key reason for the rapid adoption of Python in data science. Many are available by default on the Faculty Platform. If a package isn't available, you can install it using `conda` or `pip`.

In [None]:
# available packages
!conda list

In the remainder of this session, we focus on:

#### NumPy

NumPy is the fundamental library for numerical computing in Python. It gives us *fast* and *powerful* tools for numerical operations on large, multi-dimensional arrays of data, which, as you can imagine, is useful for much of data science!
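NumPy's array operations apply arithmetic and reductions to a whole array at once, so you don't need to write a Python loop. Here is a small sketch on a made-up 3×2 array:

```python
import numpy as np

# A 2-dimensional array with 3 rows and 2 columns
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

print(x.shape)         # (3, 2)
print(x * 10)          # element-wise arithmetic on every entry
print(x.mean(axis=0))  # column means: [3. 4.]
print(x.sum())         # 21.0
```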

#### Pandas 

Pandas is a library built on top of NumPy that makes analysing messy, real-world datasets more intuitive. It adds more functionality and a wonderfully useful two-dimensional data structure known as a `DataFrame`.
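Here is a minimal `DataFrame` sketch built from a made-up table of purchases. It uses `groupby` to total the spend per customer:

```python
import pandas as pd

# A small made-up table of purchases
df = pd.DataFrame(
    {
        "customer": ["a", "b", "a", "c", "b"],
        "amount": [10.0, 5.0, 20.0, 7.5, 2.5],
    }
)

print(df.head())  # the first rows of the table

# Total spend per customer
totals = df.groupby("customer")["amount"].sum()
print(totals)
```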

Knowing how to use these libraries will make the slog of understanding your data and getting it into a usable state much easier.