# Contents

[1. Setting up your programming environment](#settingup)

[2. Variables, control flow and functions](#variables)

[3. Data structures](#datastructures)

# 1. Setting up your programming environment <a class="anchor" id="settingup"/>

## 1.1 Jupyter Setup

Install JupyterLab with the following command.

    $ pip3 install jupyterlab

Start JupyterLab with the following command.

    $ jupyter lab

The interface lets you create individual .py files or entire notebooks made up of cells. Each cell holds a small set of commands, and its output is displayed inline directly below it. A cell can contain either code or markdown.

## 1.2 Python Basics

Python uses whitespace indentation to structure code, and comments start with the # sign. A code block is introduced by a colon, after which the contents of the block are indented one level.


In [1]:
# This is a simple Python script

array = [1,2,3,4,5,6]
pivot = 3
greater = []
less = []

for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)


print(less)
print(greater)

[1, 2]
[3, 4, 5, 6]


Everything in Python is an **object**, so methods are available on variables, strings, data structures, modules and even functions. In a notebook, typing an object's name followed by a dot and pressing “Tab” lists its available methods.

Python assigns variables by reference, i.e. assignment binds a name to an object. When `b = a` is executed, both names refer to the same object, so changes made to that object through `a` (for example, appending to a list) are also visible through `b`. Rebinding `a` to a different object, however, does not affect `b`.
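
A minimal sketch of this binding behaviour, using a list because lists are mutable (the names `a` and `b` are only for illustration):

```python
# Bind a second name to the same list object
a = [1, 2, 3]
b = a

# Mutating the object through `a` is visible through `b`
a.append(4)
print(b)        # [1, 2, 3, 4]
print(a is b)   # True: both names refer to one object

# Rebinding `a` to a new object leaves `b` unchanged
a = [0]
print(b)        # [1, 2, 3, 4]
```
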

Python is strongly typed, but it makes implicit type conversions where they are unambiguous, such as converting an `int` to a `float` in mixed arithmetic. To check the type of a variable, use the `isinstance` or `type` functions.

In [2]:
a = 15.4

print(isinstance(a, int))

print(isinstance(a, (int, float)))

print(type(a))

False
True
<class 'float'>


`isinstance` returns `True` if the variable is an instance of the given type. In the second call a tuple of types is passed, and the function returns `True` if the variable is an instance of any of them. `type` returns the variable's class.
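
To illustrate what "strongly typed" means here, a short sketch with arbitrary values: mixing `int` and `float` in arithmetic implicitly promotes the result to `float`, whereas adding a number to a string raises a `TypeError` rather than guessing a conversion.

```python
# int + float: the int is converted implicitly and the result is a float
total = 3 + 2.5
print(total, type(total))  # 5.5 <class 'float'>

# str + int: no implicit conversion, so Python raises a TypeError
try:
    "5" + 5
except TypeError as err:
    print("TypeError:", err)

# The conversion has to be made explicitly instead
fixed = int("5") + 5
print(fixed)  # 10
```
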

**Duck typing** refers to checking if an object has certain behaviours, regardless of its actual type. You can check if an object is iterable as follows.

In [3]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False
    
isiterable('a string')

True

Python can import other Python files as modules. A module can be imported under an alias, or individual objects can be imported from it directly.

In [4]:
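# Import an entire module under an alias; its contents are accessed through the alias
import numpy as np
print(np.array([1, 2, 3]))

# Import a specific object, which can then be used without the module prefix.
# Note that this rebinds the name `array`, replacing the list defined in In [1].
from numpy import array
print(array)

[1 2 3]
<built-in function array>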


Python supports all the standard arithmetic and logical operators. One operator worth highlighting is the `is` operator, which tests object identity rather than equality.

In [5]:
a = ["Hello","Goodbye"]
b = ["Hello","Goodbye"]

# Returns True if both variables reference the same object
print(a is b)
print(a is not b)

# NOTE: `is` is not the same as `==`, which compares the objects' values
print(a == b)
print(a != b)

False
True
True
False


The standard scalar types are `int`, `float`, `str`, `bytes`, `bool`, and `None`.

While strings can be written with single or double quotes, in Python we can use triple quotes for strings spanning multiple lines.

In [6]:
c = '''
This is a multi-line
string that has a lot
of text in it
'''

len(c)

58

Python strings are immutable: you cannot modify a string by assigning to one of its indices. They do, however, behave like other sequences such as lists, so they can be indexed, sliced and iterated over.
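
A short sketch of both points, using an arbitrary string `s`:

```python
s = "hello"

# Item assignment is not allowed on strings
try:
    s[0] = "H"
except TypeError:
    print("strings are immutable")

# Strings can be indexed, sliced and iterated over like lists
print(s[0], s[1:3])      # h el
print([ch for ch in s])  # ['h', 'e', 'l', 'l', 'o']

# To "modify" a string, build a new one
s2 = "H" + s[1:]
print(s2)                # Hello
```
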

Strings also support templating: a template string contains placeholders in curly braces, which are filled in by the `format` method.

In [7]:
# .2f represents a 2-decimal float, s is a string and d is an integer

template = "{0:.2f} {1:s} are worth US${2:d}"

template.format(4.5560, 'Argentine Pesos', 1)

'4.56 Argentine Pesos are worth US$1'
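
Since Python 3.6, formatted string literals (f-strings) offer an alternative to `format`: the same format specifiers are used, but the values are written directly inside the braces. A sketch reproducing the output above (the variable names are illustrative):

```python
amount = 4.5560
currency = 'Argentine Pesos'
usd = 1

# Same format specifiers as the template above: .2f, s and d
result = f"{amount:.2f} {currency:s} are worth US${usd:d}"
print(result)  # 4.56 Argentine Pesos are worth US$1
```
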

# 2. Variables, Control Flow and Functions <a class="anchor" id="variables" />

## 2.1 Control Flow

### Conditionals

In [8]:
x = -5

if x < 0:
    print("Smaller")
elif x > 0:
    print("Larger")
else:
    pass

# Conditional expression ("ternary operator"); unlike the block above, it yields "Larger" when x == 0
value = "Smaller" if x < 0 else "Larger"

Smaller


The `pass` statement does nothing: it is a placeholder wherever a statement is syntactically required. Unlike `continue`, it does not skip the rest of a loop iteration. The conditional expression (often called the ternary operator) is a more concise way to express a simple `if`/`else`.
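
The difference between `pass` and `continue` can be seen in a small sketch that records which values each loop reaches:

```python
passed = []
for x in [1, 2, 3]:
    if x == 2:
        pass      # no-op: execution carries on with the next line
    passed.append(x)

skipped = []
for x in [1, 2, 3]:
    if x == 2:
        continue  # jumps straight to the next iteration
    skipped.append(x)

print(passed)   # [1, 2, 3]
print(skipped)  # [1, 3]
```
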

### Loops

`for` loops iterate over the items of any iterable object, similar to range-based loops in other languages. `while` loops evaluate a condition and execute as long as it is true.

In [9]:
# for loop
some_list = [1,2,3,4,5]
for x in some_list:
    print(x)
    
# while loop
i = 0
while i < 5:
    print(some_list[i])
    i += 1


1
2
3
4
5
1
2
3
4
5


The `range` function returns a `range` object, a lazy sequence of evenly spaced integers. Converting it to a list shows its values.

In [10]:
# Returns a list [0, 1, 2, 3, 4, 5]

list(range(6))

[0, 1, 2, 3, 4, 5]
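
`range` also accepts an optional start and step. A few illustrative calls:

```python
# range(start, stop, step): start is included, stop is excluded
up = list(range(2, 6))
stepped = list(range(0, 10, 3))
down = list(range(5, 0, -1))
print(up)       # [2, 3, 4, 5]
print(stepped)  # [0, 3, 6, 9]
print(down)     # [5, 4, 3, 2, 1]

# A range object stores only start, stop and step, yet supports len() and indexing
r = range(1_000_000)
print(len(r), r[10])  # 1000000 10
```
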

Most [operators] in Python map to a function in the standard library's `operator` module.

[operators]: https://docs.python.org/3.5/library/operator.html?highlight=operators#mapping-operators-to-functions
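
For example, the `operator` module exposes `+` as `operator.add` and `*` as `operator.mul`, and these functions can be passed to other functions. A brief sketch, also using `functools.reduce` and `operator.itemgetter` from the standard library:

```python
import operator
from functools import reduce

# a + b is equivalent to operator.add(a, b)
total = operator.add(2, 3)
print(total)    # 5

# Operator functions can be passed wherever a function is expected
product = reduce(operator.mul, [1, 2, 3, 4])
print(product)  # 24

# itemgetter(1) behaves like lambda t: t[1]
pairs = [('b', 2), ('a', 1), ('c', 3)]
ordered = sorted(pairs, key=operator.itemgetter(1))
print(ordered)  # [('a', 1), ('b', 2), ('c', 3)]
```
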

## 2.2 Basic Data Structures

### 2.2.1 Tuples

A tuple is a fixed-length, immutable sequence of Python objects. Tuples can also be concatenated to create longer tuples. Functions in Python can return multiple values using tuples.

In [11]:
tup = 1,2,3
nested_tup = (1,2,3),(4,5)

print(tup)
print(nested_tup)

print(tuple([1,2,3]))
print(tuple("Hello"))

print(tuple("Hello")[0])

(1, 2, 3)
((1, 2, 3), (4, 5))
(1, 2, 3)
('H', 'e', 'l', 'l', 'o')
H
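
Concatenation, repetition and returning multiple values can be sketched as follows (`min_max` is a hypothetical helper written for this example):

```python
# + concatenates tuples; * repeats them
longer = (4, None, 'foo') + (6, 0) + ('bar',)
print(longer)          # (4, None, 'foo', 6, 0, 'bar')
print(('a', 'b') * 2)  # ('a', 'b', 'a', 'b')

# Tuples are immutable, so item assignment raises TypeError
try:
    longer[0] = 5
except TypeError:
    print("tuples cannot be modified in place")

# Returning several comma-separated values packs them into a tuple
def min_max(values):
    return min(values), max(values)

result = min_max([3, 1, 4, 1, 5])
print(result)          # (1, 5)
```
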


Tuples can be __unpacked__ in an assignment operation.

In [12]:
tup = 4,5,6

a,b,c = tup

print("b is", b)

# Swap two variables using unpacking
a, b = b, a

print("a is", a,", and b is", b)

b is 5
a is 5 , and b is 4


When iterating over a sequence of tuples, such as a list of tuples, each tuple can be unpacked directly in the `for` statement.

In [13]:
tup = [(1,2,3),(4,5,6),(7,8,9)]
for a, b, c in tup:
    print(a, b, c)

1 2 3
4 5 6
7 8 9


### 2.2.2 Lists

Lists are variable-length, and their contents can be modified in place. Ranges of list elements can be selected with slices: a slice `start:stop` includes the element at index `start` and excludes the element at index `stop`.

In [14]:
mylist = [1,2,3,4,5]
print(mylist[1:3])

[2, 3]
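
Slices can omit their bounds, take an optional step and, because lists are mutable, be assigned to. A sketch on a fresh list:

```python
nums = [1, 2, 3, 4, 5]

# Omitted bounds default to the start or end of the list
print(nums[:2])    # [1, 2]
print(nums[2:])    # [3, 4, 5]

# An optional third value sets the step; a negative step walks backwards
print(nums[::2])   # [1, 3, 5]
print(nums[::-1])  # [5, 4, 3, 2, 1]

# Lists are mutable, so a slice can be replaced in place
nums[1:3] = [20, 30]
print(nums)        # [1, 20, 30, 4, 5]
```
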


The `list` function creates a new list from any iterable, such as a `range` object.

In [15]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

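You can append, insert, pop and remove items. `pop` removes (and returns) the item at a given index, while `remove` deletes the first occurrence of a given value.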

In [16]:
mylist = [1,2,3,4,5]

mylist.append(6)
print(mylist)

mylist.insert(1, 'hello')
print(mylist)

mylist.pop(1)
print(mylist)

mylist.remove(2)
print(mylist)

[1, 2, 3, 4, 5, 6]
[1, 'hello', 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[1, 3, 4, 5, 6]


Lists can be sorted, binary-searched, sliced and concatenated.

In [17]:
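mylist.sort() # sorts in place
print(mylist)

import bisect
# Index at which 4 could be inserted while keeping the list sorted;
# requires a list sorted in ascending order
print(bisect.bisect(mylist, 4))

print(mylist[3:4])
print(mylist[-2:])
print(mylist + [7, 8])  # concatenation creates a new list

[1, 3, 4, 5, 6]
3
[5]
[5, 6]
[1, 3, 4, 5, 6, 7, 8]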


### 2.2.3 Sequence Functions

#### enumerate

In [18]:
# Keep track of the index in a loop

for i, value in enumerate(mylist):
    print(i, value)

0 1
1 3
2 4
3 5
4 6


#### zip

In [19]:
seq1 = ['foo', 'bar', 'baz']
seq2 = [1,2,3]

list(zip(seq1,seq2))

[('foo', 1), ('bar', 2), ('baz', 3)]
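
`zip` can also work in reverse: unpacking a list of pairs with `*` and passing it to `zip` separates the pairs back into sequences. A sketch:

```python
pairs = [('foo', 1), ('bar', 2), ('baz', 3)]

# zip(*pairs) is zip(('foo', 1), ('bar', 2), ('baz', 3)), which regroups by position
names, numbers = zip(*pairs)
print(names)    # ('foo', 'bar', 'baz')
print(numbers)  # (1, 2, 3)

# zip stops as soon as the shortest input is exhausted
short = list(zip(['a', 'b', 'c'], [1, 2]))
print(short)    # [('a', 1), ('b', 2)]
```
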

### 2.2.4 dict

In [20]:
mydict = {'name': 'Arjun', 'age': 32}

print(mydict['name'])

print('name' in mydict)

print(list(mydict.keys()))

Arjun
True
['name', 'age']
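
A few other common `dict` operations, sketched on a fresh dictionary (the `'city'` key and its value are made up for illustration):

```python
person = {'name': 'Arjun', 'age': 32}

# Assignment adds a new key or updates an existing one
person['city'] = 'Pune'
person['age'] = 33

# get() returns a default instead of raising KeyError for a missing key
print(person.get('email', 'unknown'))  # unknown

# items() yields (key, value) pairs
for key, value in person.items():
    print(key, value)

# pop() removes a key and returns its value
age = person.pop('age')
print(age, person)  # 33 {'name': 'Arjun', 'city': 'Pune'}
```
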


### 2.2.5 set 

In [21]:
myset = set([1,1,2,2,2,0,-1,5,5])
myset

{-1, 0, 1, 2, 5}

Standard set-theory operations such as union, intersection, difference and symmetric difference are available both as methods and as operators.
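
For example, with two small illustrative sets, the operators `|`, `&`, `-` and `^` correspond to the methods `union`, `intersection`, `difference` and `symmetric_difference`:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

union = a | b              # same as a.union(b)
common = a & b             # same as a.intersection(b)
only_a = a - b             # same as a.difference(b)
either_not_both = a ^ b    # same as a.symmetric_difference(b)

# Subset test: is every element of the left set also in the right one?
print({3, 4} <= a)         # True
```
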

### 2.2.6 Comprehensions

List comprehensions build a new list from an iterable, optionally filtering the elements with a condition and transforming each one. They are written in square brackets: `[expression for x in iterable if condition]`.

In [22]:
print([x for x in mylist if x > 3])

# Transform x before adding it to the new list

[x**(1/2) for x in mylist if x > 2]

[4, 5, 6]
[1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178]
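
The same syntax also works with curly braces to build dictionaries and sets, and a comprehension can contain more than one `for` clause. A sketch with an arbitrary word list:

```python
words = ['apple', 'bat', 'car', 'dove']

# Dict comprehension: {key: value for ...}
lengths = {w: len(w) for w in words}
print(lengths)  # {'apple': 5, 'bat': 3, 'car': 3, 'dove': 4}

# Set comprehension: {expr for ...} keeps only the distinct results
unique_lengths = {len(w) for w in words}

# Multiple for clauses are nested in the order they are written
pairs = [(x, y) for x in range(3) for y in range(x)]
print(pairs)    # [(1, 0), (2, 0), (2, 1)]
```
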

## 2.3 Functions

Functions are declared using the `def` keyword and return a value (or a tuple) using the `return` keyword.

In [23]:
def myFunc(arg):
    return (arg, arg+1)

a, b = myFunc(1) # Unpack the return tuple into variables

print(a + b)

3


Functions are __objects__, so they can be stored in variables and data structures and passed as arguments to other functions. For example, you can keep a list of functions to apply, then loop over that list to apply every function to a piece of data.

In [24]:
clean_ops = [str.title, str.strip]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

clean_strings(["maChine learning", "data sCiEnce "], clean_ops)

['Machine Learning', 'Data Science']

The [map](https://docs.python.org/3/library/functions.html#map) function applies a function to every item of an iterable and returns an iterator over the results. It takes the function as its first argument, another example of Python functions being treated as objects.


### 2.3.1 Lambda functions

Anonymous or "lambda" functions can be created and stored in variables. They can be passed as arguments to any function that accepts another function, such as `map`.

In [25]:
# Calculate squares
nums = range(10)
squareit = lambda x: x**2

list(map(squareit, nums))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
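
Lambdas are often passed as the `key` argument of `sorted`, or as the predicate of `filter`. A short sketch with an arbitrary word list:

```python
words = ['banana', 'fig', 'apple', 'kiwi']

# Sort by word length instead of alphabetically
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'kiwi', 'apple', 'banana']

# Sort by the last character of each word
by_last = sorted(words, key=lambda w: w[-1])
print(by_last)    # ['banana', 'apple', 'fig', 'kiwi']

# Keep only the even numbers
evens = list(filter(lambda n: n % 2 == 0, range(10)))
print(evens)      # [0, 2, 4, 6, 8]
```
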

### 2.3.2 Generators

Generators are a powerful way to create an iterable object from a function. Instead of building a complete list and returning it, a generator function produces one value at a time with the `yield` keyword, computing each value "lazily" only when it is requested.

In [26]:
# Using the 'yield' keyword

def squares(n):
    for i in range(n+1):
        yield i**2
        
        
sum(squares(10))

385

In [27]:
# Using the generator shorthand
squares = (num**2 for num in range(11))
sum(squares)

385

In [28]:
# Combining with lambda functions: the lambda returns a generator expression
squareIt = lambda n: (num**2 for num in range(n + 1))
sum(squareIt(10))

385
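
The laziness can be observed by requesting values one at a time with `next`. In this sketch, `squares_gen` is a hypothetical variant of `squares` that prints a message whenever it computes a value:

```python
def squares_gen(n):
    for i in range(n + 1):
        print("computing", i)  # shows when each value is actually produced
        yield i ** 2

gen = squares_gen(3)   # creating the generator runs none of its body yet
first = next(gen)      # prints "computing 0"
second = next(gen)     # prints "computing 1"
rest = list(gen)       # computes the remaining values for i = 2 and 3
print(first, second, rest)  # 0 1 [4, 9]

# Once exhausted, a generator produces no further values
print(list(gen))       # []
```
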

### 2.3.3 Exception Handling

Python provides exception handling with `try`/`except`/`finally` blocks. An `except` clause can catch specific error types, with multiple types given as a tuple, and the `finally` block runs whether or not an exception occurred.

In [29]:
def try_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # Return the input unchanged if it cannot be converted
        return x
    finally:
        print("This always executes before returning.")
        
try_float("Hello")

This always executes before returning.


'Hello'

## 2.4 Handling Files

In [30]:
path = 'sample-text-file.txt'

f = open(path)

lines = [x.rstrip() for x in f]

f.close()

lines

['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,',
 'quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.',
 'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.',
 'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.']

In [31]:

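# Split each line into words and count how often each word occurs
word_lines = [x.split(" ") for x in lines]
word_counts = {}  # avoid naming this `dict`, which would shadow the built-in type
for line in word_lines:
    for word in line:
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

word_counts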
    

{'Lorem': 1,
 'ipsum': 1,
 'dolor': 2,
 'sit': 1,
 'amet,': 1,
 'consectetur': 1,
 'adipiscing': 1,
 'elit,': 1,
 'sed': 1,
 'do': 1,
 'eiusmod': 1,
 'tempor': 1,
 'incididunt': 1,
 'ut': 2,
 'labore': 1,
 'et': 1,
 'dolore': 2,
 'magna': 1,
 'aliqua.': 1,
 'Ut': 1,
 'enim': 1,
 'ad': 1,
 'minim': 1,
 'veniam,': 1,
 'quis': 1,
 'nostrud': 1,
 'exercitation': 1,
 'ullamco': 1,
 'laboris': 1,
 'nisi': 1,
 'aliquip': 1,
 'ex': 1,
 'ea': 1,
 'commodo': 1,
 'consequat.': 1,
 'Duis': 1,
 'aute': 1,
 'irure': 1,
 'in': 3,
 'reprehenderit': 1,
 'voluptate': 1,
 'velit': 1,
 'esse': 1,
 'cillum': 1,
 'eu': 1,
 'fugiat': 1,
 'nulla': 1,
 'pariatur.': 1,
 'Excepteur': 1,
 'sint': 1,
 'occaecat': 1,
 'cupidatat': 1,
 'non': 1,
 'proident,': 1,
 'sunt': 1,
 'culpa': 1,
 'qui': 1,
 'officia': 1,
 'deserunt': 1,
 'mollit': 1,
 'anim': 1,
 'id': 1,
 'est': 1,
 'laborum.': 1}
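
The standard library's `collections.Counter` performs the same kind of counting in a single call. A sketch on two short made-up lines (in the notebook, the `lines` list read above could be used instead):

```python
from collections import Counter

sample_lines = ["dolor sit amet", "in dolor in"]

# Count every word on every line, splitting on single spaces as above
counts = Counter(word for line in sample_lines for word in line.split(" "))
print(counts["dolor"], counts["in"], counts["sit"])  # 2 2 1

# Missing words count as 0 instead of raising KeyError
print(counts["lorem"])  # 0

# most_common(n) returns the n highest counts as (word, count) pairs
print(counts.most_common(2))
```
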

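A more succinct way to work with a file is the `with` statement, which closes the file automatically when the block ends. Here only the first line is read and split into words.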

In [32]:
with open('sample-text-file.txt') as f:
    words = f.readline().split(' ')
words

['Lorem',
 'ipsum',
 'dolor',
 'sit',
 'amet,',
 'consectetur',
 'adipiscing',
 'elit,',
 'sed',
 'do',
 'eiusmod',
 'tempor',
 'incididunt',
 'ut',
 'labore',
 'et',
 'dolore',
 'magna',
 'aliqua.',
 'Ut',
 'enim',
 'ad',
 'minim',
 'veniam,',
 '\n']

## 2.5 Important Modules

Two of the most widely used modules for data science are `numpy`, for numerical arrays, and `scipy`, which builds on NumPy with scientific routines such as statistics.

In [33]:
import numpy as np
x = np.array([1,2,3,4,5])
print(x)
print(np.mean(x))

[1 2 3 4 5]
3.0


In [34]:
from scipy import stats
stats.describe(x)

DescribeResult(nobs=5, minmax=(1, 5), mean=3.0, variance=2.5, skewness=0.0, kurtosis=-1.3)

# 3. Data Structures <a class="anchor" id="datastructures"/>

## 3.1 NumPy

NumPy is a library for numerical computing. In many cases it is much faster than equivalent pure-Python code, because its core routines are written in C and Python makes it easy to wrap such low-level functions and call them from Python. NumPy arrays store values of a single type in a contiguous block of memory, so they use less memory than Python lists and can be processed without per-element type checking.

NumPy can perform complex operations on arrays __without the need for loops__.

In [35]:
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

%time for _ in range(10): my_arr2 = my_arr * 2

%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 11.8 ms, sys: 2.85 ms, total: 14.6 ms
Wall time: 14.7 ms
CPU times: user 493 ms, sys: 110 ms, total: 603 ms
Wall time: 604 ms


### 3.1.1 NumPy `ndarray`

NumPy provides `ndarray`, a fast, flexible data structure for large datasets. It allows us to operate on whole blocks of data with syntax as if they were scalar elements.

In [36]:
# Create an ndarray of random numbers
data = np.random.randn(2,3)

# Multiply each array value with 10
data * 10


array([[20.48917648, 21.32067442, -4.08299396],
       [-2.11159852, -1.17030544,  2.59444752]])

In [37]:
# Add each array value of two arrays
data + data

array([[ 4.0978353 ,  4.26413488, -0.81659879],
       [-0.4223197 , -0.23406109,  0.5188895 ]])
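
Operations between arrays of the same shape are applied element-wise, and NumPy *broadcasts* scalars and compatible smaller arrays across larger ones. A sketch with a small hand-written array (values chosen only for illustration):

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Same shape: the operation is applied element-wise
squared = arr * arr

# A scalar is broadcast to every element
halved = arr / 2

# A 1-D array of column means (shape (3,)) is broadcast across both rows
col_means = arr.mean(axis=0)
centered = arr - col_means
print(centered)
```
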

All elements of an `ndarray` must be of the same type. We can inspect the array's element type (`dtype`), its _shape_ and its number of dimensions (`ndim`).

In [38]:
data.dtype

dtype('float64')

In [39]:
data.shape

(2, 3)

In [40]:
data.ndim

2

Any sequence-like object can be converted into a NumPy array using the `np.array()` function.

In [41]:
nested_list = [[1,2,3,4],[5,6,7,8]]  # avoid naming this `list`, which would shadow the built-in

nparr = np.array(nested_list)

nparr

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

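Elements of a multi-dimensional array can be accessed with chained indexing, as in `nparr[0][1]`, or with comma-separated indices similar to matrix notation, as in `nparr[0, 1]`, where the first index selects the row and the second the column.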

In [42]:
nparr[0][1]

# or 

nparr[0,1] 

2
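
Slices and boolean masks work along each axis as well. A sketch on the same 2×4 array:

```python
import numpy as np

nparr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# A colon selects every index along that axis: here, the second column
second_col = nparr[:, 1]
print(second_col)  # [2 6]

# Slices can be combined with integer indices: row 1, columns 1 and 2
row_part = nparr[1, 1:3]
print(row_part)    # [6 7]

# A boolean mask selects the elements that satisfy a condition
large = nparr[nparr > 4]
print(large)       # [5 6 7 8]
```
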

## 3.2 Pandas

Pandas provides additional data structures for working with large datasets in tabular form. The most important are `Series` and `DataFrame`.

### 3.2.1 Series 

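A `Series` is a one-dimensional array-like object with an associated array of labels, called its index. It behaves somewhat like an ordered `dict` that maps index labels to values, but also supports array-style operations such as filtering and arithmetic.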

In [43]:
import pandas as pd

obj = pd.Series([4,7,-5,3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [44]:
obj.values

array([ 4,  7, -5,  3])

In [45]:
obj = pd.Series([4,7,-5,3], index=['d','b','a','c'])
obj

d    4
b    7
a   -5
c    3
dtype: int64

In [46]:
obj['b']

7

In [47]:
obj[obj>0]

d    4
b    7
c    3
dtype: int64

In [48]:
obj*2

d     8
b    14
a   -10
c     6
dtype: int64

In [49]:
'a' in obj

True

A `dict` can be passed to `pd.Series()` to convert it into a Series; the keys become the index and the values become the data.
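
A sketch of this conversion (the state names and figures are made up for illustration). If an explicit `index` is also passed, values are looked up by label, and labels missing from the `dict` become `NaN`:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000}

# Keys become the index, values become the data
s = pd.Series(sdata)
print(s)

# With an explicit index, labels absent from the dict ('Utah') become NaN
s2 = pd.Series(sdata, index=['Texas', 'Ohio', 'Utah'])
print(s2)
```
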

### 3.2.2 DataFrame

A DataFrame is a rectangular table of data with an ordered collection of columns, each of which can have a different value type. It has both a row index and a column index. It is commonly created from a `dict` whose keys are the column headings and whose values are lists of row entries for that column.

In [50]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
       }
frame = pd.DataFrame(data)
frame

|   | state | year | pop |
|---|-------|------|-----|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |


In [51]:
# Customize the column order and the row index labels
frame = pd.DataFrame(data,  columns=['year', 'state', 'pop'], index=['One', 'Two', 'Three','Four','Five','Six'])
frame

|       | year | state | pop |
|-------|------|-------|-----|
| One   | 2000 | Ohio | 1.5 |
| Two   | 2001 | Ohio | 1.7 |
| Three | 2002 | Ohio | 3.6 |
| Four  | 2001 | Nevada | 2.4 |
| Five  | 2002 | Nevada | 2.9 |
| Six   | 2003 | Nevada | 3.2 |
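
If `columns` names a column that does not exist in the data, pandas creates it and fills it with missing values (`NaN`). A sketch on a shortened, illustrative copy of the data:

```python
import pandas as pd

small_data = {'state': ['Ohio', 'Ohio', 'Nevada'],
              'year': [2000, 2001, 2001],
              'pop': [1.5, 1.7, 2.4]}

# 'debt' is not a key of small_data, so the column is created with NaN values
frame2 = pd.DataFrame(small_data, columns=['year', 'state', 'pop', 'debt'])
print(frame2)
print(frame2['debt'].isna().all())  # True
```
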


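A single row can be retrieved by its index label with `loc`, and a single column can be retrieved as a Series, either as an attribute (`frame.year`) or with dict-style indexing (`frame['year']`).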

In [52]:
# Access a single row by index
frame.loc['One']

year     2000
state    Ohio
pop       1.5
Name: One, dtype: object

In [53]:
# Access a column
frame.year

One      2000
Two      2001
Three    2002
Four     2001
Five     2002
Six      2003
Name: year, dtype: int64

In [54]:
# Assign values to a column. If it doesn't exist, it will be created.
frame['debt'] = np.arange(6.)
frame

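|       | year | state | pop | debt |
|-------|------|-------|-----|------|
| One   | 2000 | Ohio | 1.5 | 0.0 |
| Two   | 2001 | Ohio | 1.7 | 1.0 |
| Three | 2002 | Ohio | 3.6 | 2.0 |
| Four  | 2001 | Nevada | 2.4 | 3.0 |
| Five  | 2002 | Nevada | 2.9 | 4.0 |
| Six   | 2003 | Nevada | 3.2 | 5.0 |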
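
Assigned values are aligned on the row index, and columns can also be derived from boolean expressions and removed with `del`. A sketch on a small, self-contained frame (`df` and its values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Nevada'],
                   'pop': [1.5, 1.7, 2.4]},
                  index=['One', 'Two', 'Three'])

# Assigning a Series aligns it on the index; 'One' is missing, so it gets NaN
df['debt'] = pd.Series([-1.2, -1.5], index=['Two', 'Three'])

# A boolean expression produces a new column of True/False values
df['eastern'] = df['state'] == 'Ohio'
print(df)

# del removes a column in place
del df['eastern']
print(list(df.columns))  # ['state', 'pop', 'debt']
```
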
