# Introduction to Python

Python is a general-purpose programming language with a simple, easy-to-learn syntax. It has a rich suite of libraries for analytics and machine learning, hence its popularity among data scientists.

In this tutorial we will introduce you to the basic concepts in Python, just enough for you to understand and be productive in the next sections.

This tutorial is adapted from [Learn Python in Y Minutes](https://learnxinyminutes.com/docs/python/) for Jupyter Notebook.


### How to use Jupyter Notebook
In a Jupyter notebook, each code block can be executed independently, and the result of evaluating the code block is printed directly below it in the 'Out[n]' field.

To execute a code block, click on it and press `CTRL+Enter`.

To execute a code block and immediately move to the next code block, press `Shift+Enter`.

Try using `Shift+Enter` to execute every code block as you follow along with this tutorial.

### Basics

In [1]:
# Single line comments start with a # symbol.

The hash (`#`) symbol denotes a comment. This means Python will not evaluate any text following the hash symbol.

In [2]:
# Python has a print function
print("I'm Python. Nice to meet you!")  # => I'm Python. Nice to meet you!

I'm Python. Nice to meet you!


In [3]:
# In a Jupyter notebook, the result of the last expression in a cell
# is automatically printed to the `Out` field.

"I'm Python. Nice to meet you!"  # => "I'm Python. Nice to meet you!"

"I'm Python. Nice to meet you!"

# 1. Primitive Datatypes and Operators

### Number Types

You have numbers, which can either be integers...

In [4]:
# Integer
3  # => 3

3

or floating-point (i.e. floats).

In [5]:
# Floats
3.2  # => 3.2

3.2

Floats are commonly used in mathematical calculations where the fractional part of a number matters.

### Number Operators

Math is what you would expect.

In [6]:
1 + 1  # => 2

2

In [7]:
8 - 1  # => 7

7

In [8]:
10 * 2  # => 20

20

In [9]:
35 / 5  # => 7.0

7.0

Division is a bit tricky. The `/` operator always returns a float, even when both operands are integers and the result is a whole number.

In [10]:
5 / 2  # => 2.5

2.5

You can do integer division by using `//` instead, where the result is rounded down and, when both operands are integers, kept as an int.

In [11]:
5 // 2 # => 2

2
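
As a small sketch of what "rounded down" means: `//` rounds toward negative infinity, not toward zero, which only shows up with negative numbers. The variable names below are illustrative and not part of the original tutorial.

```python
# Floor division rounds toward negative infinity, not toward zero.
positive = 7 // 2       # 3
negative = -7 // 2      # -4 (not -3)

# divmod returns the floor-division quotient and the remainder together.
quotient, remainder = divmod(7, 2)   # (3, 1)

# With a float operand the result is still floored, but stays a float.
float_floor = 7.0 // 2  # 3.0

print(positive, negative, quotient, remainder, float_floor)
```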

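WARNING: This behaviour is different in Python 2.

In Python 3 (our version), `/` is always float division and `//` is integer (floor) division.

In Python 2, `/` between two integers performs integer division (so `5 / 2` gives `2`), while `//` is floor division just as in Python 3. To get a float result in Python 2, at least one operand must be a float (e.g. `5 / 2.0`).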

In [12]:
# Modulo operation (remainder of)
7 % 3  # => 1

1

In [13]:
# Exponentiation (x to the yth power)
2 ** 4  # => 16

16

In [14]:
# Enforce precedence with parentheses
(1 + 3) * 2  # => 8

8

### Boolean Types and Operators

A boolean can only take one of two values: `True` or `False`.

In [15]:
True

True

In [16]:
False

False

Python reserves special keywords **`and`**, **`or`**, and **`not`** for boolean logic:

In [17]:
# Note these keywords are case-sensitive
True and False  # => False

False

**`and`** will only evaluate to **`True`** if both input values are **`True`**. If either input value is **`False`** then it will evaluate to **`False`**.

Try changing the above code block to **`True and True`** and press `CTRL+Enter` to re-run the block.

In [18]:
True or False  # => True

True

**`or`** will evaluate to **`True`** if either input value is **`True`**, i.e. only one input value needs to be **`True`**. It will evaluate to **`False`** only if both input values are **`False`**.

Try changing the above code block to **`False or False`** and press `CTRL+Enter` to re-run the block.

In [19]:
not True  # => False

False

In [20]:
not False  # => True

True

`not` simply flips **`True`** to **`False`**, and **`False`** to **`True`**.

A quick overview of boolean logic:

![BooleanLogic](assets/booleanOperators-VennDiagram.png)

In simple terms, 

- **`and`** checks whether both statements are **`True`**
- **`or`** checks whether at least one statement is **`True`**
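
The summary above can also be checked by listing every combination of inputs. This is only a sketch; `truth_table` is an illustrative name, and `itertools.product` is used just to generate the four input pairs.

```python
from itertools import product

# Every combination of two boolean inputs, with the results of `and` and `or`.
truth_table = [(a, b, a and b, a or b)
               for a, b in product([True, False], repeat=2)]

for a, b, and_result, or_result in truth_table:
    print("{!s:5} {!s:5} and={!s:5} or={!s:5}".format(a, b, and_result, or_result))
```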

### Strings

Strings are created with `"` or `'`

In [21]:
"This is a string."

'This is a string.'

In [22]:
'This is also a string.'

'This is also a string.'

In [23]:
""" 
Multiline strings can be written
using three "s, and are often used
as comments
"""

' \nMultiline strings can be written\nusing three "s, and are often used\nas comments\n'

In [24]:
# Strings can be added too!
"Hello " + "world!"  # => "Hello world!"

'Hello world!'

In [25]:
# ... or multiplied
"Hello" * 3  # => "HelloHelloHello"

'HelloHelloHello'

In [26]:
# You can find the length of a string
len("This is a string")  # => 16

16

#### String Formatting

You can build a new string from a template string using either the `%` operator or the `format` method.

In [27]:
# String formatting with %
x = 'apples'
y = 'lemons'
z = "The items in the basket are %s and %s" % (x, y)

z

'The items in the basket are apples and lemons'

In [28]:
# A newer way to format strings is the `format` method.
# It is generally preferred over `%` formatting.
"{} is a {}".format("This", "placeholder")

'This is a placeholder'

In [29]:
# You can also use keywords in format strings.
"{name} wants to eat {food}".format(name="Bob", food="lasagna")

'Bob wants to eat lasagna'
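
If you are on Python 3.6 or later, formatted string literals (f-strings) are a third option that you will often see in newer code. The sketch below assumes Python 3.6+; the variable names are illustrative.

```python
name = "Bob"
food = "lasagna"
price = 7.5

# Expressions inside {} are evaluated and inserted into the string.
sentence = f"{name} wants to eat {food}"

# Format specifiers work the same way as with `format`.
formatted_price = f"{food} costs {price:.2f}"

print(sentence)
print(formatted_price)
```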

### None

Null type in Python is represented using the keyword `None`.

In [30]:
# None is an object
None  # => None

In [31]:
# Don't use the equality "==" symbol to compare objects to None
# Use "is" instead
"etc" is None  # => False

False

In [32]:
None is None  # => True

True

The 'is' operator tests for object identity. This isn't very useful when dealing with primitive values, but is very useful when dealing with objects.
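
A minimal sketch of the difference between `==` (equal values) and `is` (same object), using lists. All names here are illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate list object
alias = first        # a second name for the same object

equal_values = first == second   # True: the contents match
same_object = first is second    # False: two distinct objects
alias_is_same = alias is first   # True: both names refer to one object

# None is a single shared object, so `is` is the idiomatic test for it.
result = None
is_missing = result is None      # True

print(equal_values, same_object, alias_is_same, is_missing)
```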

# 2. Variables and Collections

In [33]:
# No need to declare variables before assigning to them.
some_var = 5  # Convention is to use lower_case_with_underscores
some_var  # => 5

5

In [34]:
# Accessing a previously unassigned variable raises an exception.
# See Control Flow to learn more about exception handling.
some_other_var  # Raises a name error

NameError: name 'some_other_var' is not defined

## List (or array in other languages)

In [35]:
# Lists store sequences
li = []

li

[]

In [36]:
# You can start with a prefilled list
other_li = [4, 5, 6]

other_li

[4, 5, 6]

In [37]:
# Add stuff to the end of a list with append
li = []
li.append(1)  # li is now [1]
li.append(2)  # li is now [1, 2]
li.append(3)  # li is now [1, 2, 3]
li.append(4)  # li is now [1, 2, 3, 4]

li

[1, 2, 3, 4]

In [38]:
# Access a list like you would any array
li[0]  # => 1

1

Note that Python uses 0-based indexing, i.e. sequence indexes start at 0 instead of 1.

In [39]:
# Assign new values to indexes that have already been initialized with =
li[0] = 42
li[0]  # => 42

42

In [40]:
li # => [42, 2, 3, 4]

[42, 2, 3, 4]

In [41]:
li[0] = 1  # Note: setting it back to the original value
li[0]  # => 1

1

In [42]:
li # => [1, 2, 3, 4]

[1, 2, 3, 4]

In [43]:
# Look at the last element
li[-1]  # => 4

4

In [44]:
# Looking out of bounds is an IndexError
li[4]  # Raises an IndexError

IndexError: list index out of range

In [45]:
# You can look at ranges with slice syntax.
# li[x:y] means get elements at index x all through to index y-1
li[1:3]  # => [2, 3]

[2, 3]

In [46]:
# You can slice to the end of the list by leaving out the index following the colon
li[2:]  # => [3, 4]

[3, 4]

In [47]:
# Or slice from the beginning of the list by leaving out the index before the colon
li[:3]  # => [1, 2, 3]

[1, 2, 3]
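
Slices also accept an optional third value, the step, and negative indexes. The sketch below recreates `li` so that it runs on its own; the other names are illustrative.

```python
li = [1, 2, 3, 4]

every_second = li[::2]    # [1, 3]: start to end, taking every 2nd element
reversed_copy = li[::-1]  # [4, 3, 2, 1]: a step of -1 walks backwards
last_two = li[-2:]        # [3, 4]: negative indexes count from the end
shallow_copy = li[:]      # a new list with the same elements

print(every_second, reversed_copy, last_two, shallow_copy)
```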

In [48]:
# Check for existence in a list with "in"
1 in li  # => True

True

In [49]:
# Examine the length with "len()"
len(li)  # => 4

4

## Dictionary (associative array or hashmap in other languages)

In [50]:
# Dictionaries store mappings
empty_dict = {}

empty_dict

{}

In [51]:
# Here is a prefilled dictionary
filled_dict = {"one": 1, "two": 2, "three": 3}

filled_dict # note that dictionary ordering is not guaranteed before Python 3.7

{'one': 1, 'three': 3, 'two': 2}

In [52]:
# Look up values with []
filled_dict["one"]  # => 1

1

In [53]:
# Get all keys with "keys()". This returns a view; wrap it in list() to get a list
filled_dict.keys()  # => dict_keys(['one', 'two', 'three'])

dict_keys(['one', 'two', 'three'])

Note - Since Python 3.7, dictionaries preserve insertion order. On older versions the key ordering is not guaranteed, so your results might not match this exactly.

In [54]:
# Get all values with "values()". This also returns a view rather than a list
filled_dict.values()  # => dict_values([1, 2, 3])

dict_values([1, 2, 3])

Note - Same as above regarding key ordering.

In [55]:
# Get all key-value pairs as a view of tuples with "items()"
filled_dict.items()  # => dict_items([('one', 1), ('two', 2), ('three', 3)])

dict_items([('one', 1), ('two', 2), ('three', 3)])
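
The objects returned by `keys()`, `values()` and `items()` can be converted to lists or looped over directly. The following is a sketch using a separate, illustrative dictionary named `word_to_number` so that `filled_dict` is left untouched.

```python
word_to_number = {"one": 1, "two": 2, "three": 3}

# Wrap a view in list() when you need an actual list.
keys_as_list = list(word_to_number.keys())
values_as_list = list(word_to_number.values())

# items() yields (key, value) tuples, which can be unpacked in a for loop.
for key, value in word_to_number.items():
    print("{} -> {}".format(key, value))

# The same unpacking works when building a new dictionary.
doubled = {key: value * 2 for key, value in word_to_number.items()}
print(doubled)
```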

In [56]:
# Check for existence of keys in a dictionary with "in"
"one" in filled_dict  # => True

True

In [57]:
1 in filled_dict  # => False

False

In [58]:
# Looking up a non-existing key is a KeyError
filled_dict["four"]  # KeyError

KeyError: 'four'

In [59]:
# Use "get()" method to avoid the KeyError
filled_dict.get("one")  # => 1

1

In [60]:
filled_dict.get("four")  # => None

In [61]:
# The get method supports a default argument when the value is missing
filled_dict.get("one", 4)  # => 1

1

In [62]:
filled_dict.get("four", 4)  # => 4

4

In [63]:
# note that get doesn't set the value in the dictionary, hence
filled_dict.get("four") # is still => None

In [64]:
# set the value of a key with a syntax similar to lists
filled_dict["four"] = 4  # now, filled_dict["four"] => 4

filled_dict["four"]

4

# 3. Control Flow

In [65]:
# Let's just make a variable
some_var = 5

In [66]:
# Here is an if statement. Indentation is significant in Python!
# prints "some_var is smaller than 10"
if some_var > 10:
    print("some_var is totally bigger than 10.")
elif some_var < 10:  # This elif clause is optional.
    print("some_var is smaller than 10.")
else:  # This is optional too.
    print("some_var is indeed 10.")

some_var is smaller than 10.


In [67]:
"""
For loops iterate over lists
prints:
    dog is a mammal
    cat is a mammal
    mouse is a mammal
"""
for animal in ["dog", "cat", "mouse"]:
    # You can use {} to interpolate formatted strings. (See above.)
    print("{} is a mammal".format(animal))

dog is a mammal
cat is a mammal
mouse is a mammal


In [68]:
"""
"range(number)" returns an iterable of numbers
from zero up to (but not including) the given number
prints:
    0
    1
    2
    3
"""
for i in range(4):
    print(i)

0
1
2
3


In [69]:
"""
"range(lower, upper)" returns an iterable of numbers
from the lower number up to (but not including) the upper number
prints:
    4
    5
    6
    7
"""
for i in range(4, 8):
    print(i)

4
5
6
7
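
In Python 3, `range` produces its numbers lazily rather than building a list, and it accepts an optional step. A short sketch (variable names are illustrative):

```python
numbers = range(4)
print(numbers)                      # shows the range object, not its contents

as_list = list(range(4))            # [0, 1, 2, 3]
evens = list(range(0, 10, 2))       # [0, 2, 4, 6, 8]: third argument is the step
countdown = list(range(3, 0, -1))   # [3, 2, 1]: a negative step counts down

print(as_list, evens, countdown)
```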


In [70]:
"""
While loops go until a condition is no longer met.
prints:
    0
    1
    2
    3
"""
x = 0
while x < 4:
    print(x)
    x += 1  # Shorthand for x = x + 1

0
1
2
3


In [71]:
# Handle exceptions with a try/except block

try:
    # Use "raise" to raise an error
    raise IndexError("This is an index error")
except IndexError as e:
    pass  # Pass is just a no-op. Usually you would do recovery here.
except (TypeError, NameError):
    pass  # Multiple exceptions can be handled together, if required.
else:  # Optional clause to the try/except block. Must follow all except blocks
    print("All good!")  # Runs only if the code in try raises no exceptions
finally:  # Execute under all circumstances
    print("We can clean up resources here")

We can clean up resources here
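
To see the same clauses handle an error that Python raises by itself, here is a sketch built around a hypothetical helper called `safe_divide` (not part of the original tutorial).

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as error:
        # Runs only when the division fails.
        print("Cannot divide: {}".format(error))
        return None
    else:
        # Runs only when no exception was raised.
        return result
    finally:
        # Runs in both cases, before the function returns.
        print("Finished dividing {} by {}".format(numerator, denominator))


ok_result = safe_divide(6, 3)      # 2.0
failed_result = safe_divide(1, 0)  # None
```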


# 4. Functions

In [72]:
# Use "def" to create new functions
def add(x, y):
    print("x is {} and y is {}".format(x, y))
    return x + y  # Return values with a return statement

In [73]:
# Calling functions with parameters
add(5, 6)  # => prints out "x is 5 and y is 6" and returns 11

x is 5 and y is 6


11

In [74]:
# Another way to call functions is with keyword arguments
add(y=6, x=5)  # Keyword arguments can arrive in any order.

x is 5 and y is 6


11

### Function Scope

In [75]:
x = 5

def set_x(num):
    # Local var x not the same as global variable x
    x = num  # => 43
    print(x)  # => 43

set_x(43)

43
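
The assignment inside `set_x` creates a new local variable, so the global `x` keeps its value. The sketch below is a variant of the cell above that returns the local value instead of printing it; `returned_value` and `global_after_call` are illustrative names.

```python
x = 5

def set_x(num):
    x = num      # creates a local x; the global x is not touched
    return x

returned_value = set_x(43)   # 43
global_after_call = x        # still 5

# To change the global, assign the returned value explicitly.
x = set_x(43)
print(returned_value, global_after_call, x)
```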


# 5. Modules

In [76]:
# You can import modules
import math

In [77]:
math.sqrt(16)  # => 4.0

4.0

In [78]:
# You can get specific functions from a module
from math import ceil, floor

In [79]:
ceil(3.7)  # => 4

4

In [80]:
floor(3.7)  # => 3

3

In [81]:
# You can shorten module names
import math as m

In [82]:
math.sqrt(16) == m.sqrt(16)  # => True

True

# 6. Introduction to Numpy

Numpy is a widely-used numerical computing library for Python. It enables us to do matrix and vector computations efficiently.

Most of the machine-learning tutorials will be written using numpy. For this reason we will also give a basic introduction to Numpy.

### Basics

First let's import numpy, giving it a shorter alias `np`.

In [83]:
import numpy as np

Let’s create a Python list and a Numpy array.

In [84]:
# Python list
a_li = list(range(10000)) # create a list from 0 to 9999

# You can check the type of a variable using the function `type()`
type(a_li) # => list

list

We will not print out `a_li` as this is a very large list.

Numpy has a function called **`arange`** that works much like Python's built-in **`range`**. The main difference is that it returns a numpy array instead of a `range` object.

In [85]:
# numpy array
a_np = np.arange(10000)

a_np # => array([   0,    1,    2, ..., 9997, 9998, 9999])

array([   0,    1,    2, ..., 9997, 9998, 9999])
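
Like `range`, `arange` accepts a start, stop and step; unlike `range`, it also accepts non-integer steps. A brief sketch with illustrative variable names:

```python
import numpy as np

stepped = np.arange(0, 10, 2)        # array([0, 2, 4, 6, 8])
fractional = np.arange(0, 1, 0.25)   # array([0.  , 0.25, 0.5 , 0.75])

# An existing Python list can be converted with np.array.
from_list = np.array([0, 1, 2, 3, 4])

print(stepped, fractional, from_list)
```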

### Slicing

Slicing operations on numpy arrays work the same way as with Python lists.

In [86]:
# get first 10 elements of a_li
a_li[:10] # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [87]:
# get first 10 elements of a_np
a_np[:10]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [88]:
# get last 10 elements of a_li
a_li[-10:] # => [9990, 9991, 9992, 9993, 9994, 9995, 9996, 9997, 9998, 9999]

[9990, 9991, 9992, 9993, 9994, 9995, 9996, 9997, 9998, 9999]

In [89]:
# get last 10 elements of a_np
a_np[-10:]

array([9990, 9991, 9992, 9993, 9994, 9995, 9996, 9997, 9998, 9999])

### Speed

Numpy operations on numpy arrays are much faster than equivalent built-in functions on Python lists.

In the next cells we use the magic function **`%timeit`** that comes with Jupyter Notebook. **`%timeit`** runs the code that follows it many times (it picks the number of loops automatically, as the output shows) and reports the mean runtime per loop. You can identify magic functions by the **`%`** prefix.

In [90]:
# get the sum of `a_li` using native Python `sum()` function
%timeit sum(a_li)

59.3 µs ± 380 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [91]:
# get the sum of `a_np` using specialised numpy `np.sum()` function
%timeit np.sum(a_np)

7.68 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
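
Outside Jupyter, a similar comparison can be sketched with the standard-library `timeit` module. The loop count of 1000 is an arbitrary choice for illustration, and the timings you get will depend on your machine.

```python
import timeit

import numpy as np

a_li = list(range(10000))
a_np = np.arange(10000)

loops = 1000  # arbitrary number of repetitions for this sketch

# timeit.timeit returns the total time in seconds for `loops` executions.
list_seconds = timeit.timeit("sum(a_li)", globals={"a_li": a_li}, number=loops)
numpy_seconds = timeit.timeit("np.sum(a_np)", globals={"np": np, "a_np": a_np},
                              number=loops)

print("sum(a_li):    {:.2f} microseconds per loop".format(list_seconds / loops * 1e6))
print("np.sum(a_np): {:.2f} microseconds per loop".format(numpy_seconds / loops * 1e6))
```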


### From arrays to matrices

In [92]:
# Let's check the type of `a_np`
type(a_np)

numpy.ndarray

The type of `a_np` is `np.ndarray`. 

`ndarray` stands for *n*-dimensional array. This means a numpy array can have an arbitrary number of dimensions.

Let's try creating a 2-dimensional array, or more conventionally, a matrix.

In [93]:
b_np = np.array([[1,2,3],
                 [4,5,6]])

b_np

array([[1, 2, 3],
       [4, 5, 6]])

### Shapes and sizes

You can get the size of the ndarray along each dimension by reading the `shape` attribute of the associated `np.ndarray`.

In [94]:
b_np.shape # => (2,3)

(2, 3)

This means `b_np` has 2 rows in the first dimension and 3 columns in the second dimension.

You can get the total number of elements in the ndarray using the property `size`.

In [95]:
b_np.size # => 6

6

### Reshape

You can create a matrix with the same shape as `b_np` using a combination of **`arange`** and **`reshape`**.

In [96]:
np.arange(6)

array([0, 1, 2, 3, 4, 5])

In [97]:
np.arange(6).reshape(2,3)

array([[0, 1, 2],
       [3, 4, 5]])

Note here that just like **`range`**, **`arange`** starts counting from 0, not 1.

**`reshape`** is a very powerful function that can convert ndarrays to different shapes.

You can even get it to infer the size of a certain dimension by passing `-1` to the dimension you don't want to explicitly specify.

In [98]:
np.arange(6).reshape(-1,3)

array([[0, 1, 2],
       [3, 4, 5]])

In [99]:
np.arange(6).reshape(-1,2)

array([[0, 1],
       [2, 3],
       [4, 5]])

In [100]:
np.arange(6).reshape(6,-1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
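
The inferred dimension only works when the total number of elements divides evenly. The sketch below shows one shape that works, how to flatten back to one dimension, and one shape that fails; the variable names are illustrative.

```python
import numpy as np

flat = np.arange(6)

inferred = flat.reshape(-1, 3)     # 6 elements / 3 columns -> shape (2, 3)
flattened = inferred.reshape(-1)   # back to a 1-D array of 6 elements

# 6 elements cannot be split into 4 equal rows, so numpy raises a ValueError.
reshape_failed = False
try:
    flat.reshape(4, -1)
except ValueError as error:
    reshape_failed = True
    print("reshape failed: {}".format(error))

print(inferred.shape, flattened.shape)
```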

### Transpose

You can transpose a matrix using **`np.transpose`** or **`np.ndarray.T`**.

In [101]:
b_np

array([[1, 2, 3],
       [4, 5, 6]])

In [102]:
# transpose `b_np`
np.transpose(b_np)

array([[1, 4],
       [2, 5],
       [3, 6]])

In [103]:
b_np.T

array([[1, 4],
       [2, 5],
       [3, 6]])

### Matrix Operations

Addition, subtraction, multiplication, and division on numpy arrays are element-wise by default.

In [104]:
b_np + b_np

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [105]:
b_np - b_np

array([[0, 0, 0],
       [0, 0, 0]])

In [106]:
b_np * b_np

array([[ 1,  4,  9],
       [16, 25, 36]])

In [107]:
b_np / b_np

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

### Dot Product

For vector-vector multiplications, you need to use `np.dot`, which stands for dot-product.

In [108]:
# first create a 3-element vector / array
c_np = np.array([1,2,3])

c_np

array([1, 2, 3])

In [109]:
# get shape of `c_np`
c_np.shape

(3,)

In [110]:
# calculate dot product between `c_np` and itself
np.dot(c_np, c_np)

14

Remember that the dot-product between two vectors is calculated by:
    
$$ \boldsymbol{a} \cdot \boldsymbol{b} = a_1 \cdot b_1 + a_2 \cdot b_2 + \cdots + a_n \cdot b_n $$

The two vectors need to have an identical number of elements $n$.

In our case, the calculation was $1\cdot1 + 2\cdot2 + 3\cdot3 = 14$.
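
The formula can be written out in plain Python and compared with `np.dot`. This is only a sketch to make the arithmetic concrete; `u`, `v`, `manual` and `with_numpy` are illustrative names.

```python
import numpy as np

u = [1, 2, 3]
v = [1, 2, 3]

# Multiply matching elements, then add up the products.
manual = sum(a * b for a, b in zip(u, v))      # 1*1 + 2*2 + 3*3 = 14

with_numpy = np.dot(np.array(u), np.array(v))  # 14

print(manual, with_numpy)
```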

### Matrix Multiplication

As a quick linear-algebra refresher, you can matrix-multiply an $n \times x$ matrix with an $x \times m$ matrix to get an $n \times m$ matrix.

The inner dimension $x$ must be the same in both matrices.

For a quick overview of matrix-multiplication, see [this link](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

There are several ways to do matrix multiplications in Numpy.

- `np.dot`
- `np.matmul`
- the `@` operator

For 1-dimensional and 2-dimensional arrays, all three give identical results.

In [111]:
# calculate dot product between `b_np` and its transpose.
np.dot(b_np, b_np.T)

array([[14, 32],
       [32, 77]])

In [112]:
np.matmul(b_np, b_np.T)

array([[14, 32],
       [32, 77]])

In [113]:
b_np @ b_np.T

array([[14, 32],
       [32, 77]])

Because we are multiplying `b_np` (a $2 \times 3$ matrix) with its transpose (a $3 \times 2$ matrix), we get as output a $2 \times 2$ matrix.

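In higher dimensions, prefer `np.matmul`: it treats the inputs as stacks of matrices and multiplies them pairwise, whereas `np.dot` sums over the last axis of the first array and the second-to-last axis of the second, which produces a result with a different shape.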

The `@` operator is just syntactic sugar for `np.matmul`, i.e. it will evaluate to `np.matmul` under the hood.
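
A small sketch of how the two functions differ on 3-dimensional inputs. The arrays of ones are arbitrary placeholders chosen for illustration; only the resulting shapes matter here.

```python
import numpy as np

a = np.ones((2, 3, 4))   # a stack of two 3 x 4 matrices
b = np.ones((2, 4, 5))   # a stack of two 4 x 5 matrices

# matmul multiplies the matrices pairwise: two (3 x 4) @ (4 x 5) products.
matmul_shape = np.matmul(a, b).shape   # (2, 3, 5)

# dot combines every matrix in `a` with every matrix in `b`.
dot_shape = np.dot(a, b).shape         # (2, 3, 2, 5)

print(matmul_shape, dot_shape)
```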

### Ones and Zeros

You can initialise a matrix of ones using `np.ones()`, giving the function a tuple containing the matrix size as the argument.

In [114]:
# create a 3 x 4 matrix of ones
np.ones((3,4))

array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

In [115]:
# you can do the same with `np.zeros()`
np.zeros((3,4))

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [116]:
# do a matrix-vector multiplication using `np.dot()`
np.dot(b_np, np.ones((3,1)))

array([[  6.],
       [ 15.]])

### More Operations

Numpy comes with standard mathematical functions such as:

- summary statistics
    - **`sum`**
    - **`mean`**
    - **`std`** for standard deviation
- element-wise functions
    - **`abs`** for absolute value
    - **`log`** for natural logarithm
    - **`exp`** for the exponential function
- etc.

In [117]:
# Get the sum of the array [1,2,3]
np.sum(np.array([1,2,3])) # => 6

6

In [118]:
# Get the average of the array [1,2,3]
np.mean(np.array([1,2,3])) # => 2.0

2.0

In [119]:
# Get the standard deviation of the array [1,2,3]
np.std(np.array([1,2,3])) # => 0.81649658092772603

0.81649658092772603

In [120]:
# Get the absolute value of the array [-1,2,-3]
np.abs(np.array([-1,2,-3])) # => array([1, 2, 3])

array([1, 2, 3])

In [121]:
# Get the natural logarithm of the array [1,2,3]
np.log(np.array([1,2,3])) # => array([0., 0.69314718, 1.09861229])

array([ 0.        ,  0.69314718,  1.09861229])

In [122]:
# Get the exponential (e to the power of x) of the array [1,2,3]
np.exp(np.array([1,2,3])) # => array([2.71828183, 7.3890561 , 20.08553692])

array([  2.71828183,   7.3890561 ,  20.08553692])
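
On matrices, the summary functions also take an optional `axis` argument that controls which dimension is collapsed. The sketch below recreates `b_np` so that it runs on its own; the other names are illustrative.

```python
import numpy as np

b_np = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = np.sum(b_np)                 # 21: sum over all elements
column_sums = np.sum(b_np, axis=0)   # array([5, 7, 9]): collapse the rows
row_sums = np.sum(b_np, axis=1)      # array([ 6, 15]): collapse the columns
row_means = np.mean(b_np, axis=1)    # array([2., 5.])

print(total, column_sums, row_sums, row_means)
```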

... and that's it, folks! Have fun with the next section :)