# Lecture 1: The fundamentals of Python and Jupyter Notebooks

*by Eduard Silantyev* 

Python is an interpreted high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991.

*Why is Python a de-facto tool of choice of data scientists?* 

Please give **three** reasons / intuitions.

The research environment for the workshop is powered by Jupyter Notebooks, which allow one to perform a great deal of data wrangling, data cleaning and data analysis. This lecture will demonstrate the power of Python in combination with Jupyter notebooks to solve common problems that we come across when analysing data. 

This lecture introduces Python and Jupyter Notebooks as craftsmanship tools of a data scientist.

## Virtual Environment

For this course you are required to have the Anaconda 3 distribution installed. If you have problems with the installation, contact one of the instructors.

## Cell Types

As you can see, each cell can be either code or text. To select between them, choose from the 'Cell Type' dropdown menu on the top left.

## Executing a Command

A code cell will be evaluated when you press play, or when you press the shortcut, Shift + Enter. Evaluating a cell evaluates each line of code in sequence, and prints the results of the last line below the cell.

In [1]:
a = [5,6,7,8,9]

In [2]:
b = "Hello"
d = ['aa', 32434, True]
d.remove('aa')

Sometimes there is no result to be printed, as is the case with assignment.

In [9]:
X = 2

Remember that only the result from the last line is printed.

In [10]:
X

2

In [11]:
2 + 2
3 + 7

10

However, you can print whichever lines you want using the `print` function.

In [12]:
print(2 + 2)
3 + 3

4


6

## Knowing When a Cell is Running

While a cell is running, a `[*]` will display on the left. When a cell has yet to be executed, `[ ]` will display. When it has been run, a number such as `[5]` will display, indicating the order in which it was run during the execution of the notebook. Try it on this cell and watch it happen.

In [13]:
#Take some time to run something
a = 0
for i in range(10000000):
    a = a + i
a

49999995000000

## Importing Libraries

The vast majority of the time, you'll want to use functions from pre-built libraries. Here we import `numpy` and `pandas`, the two most common and useful libraries in quant finance and trading. We recommend copying this import statement to every new notebook.

Notice that you can rename libraries to whatever you want after importing. The `as` statement allows this. Here we use `np` and `pd` as aliases for `numpy` and `pandas`. This is a very common aliasing and will be found in most code snippets around the web. The point behind this is to allow you to type fewer characters when you are frequently accessing these libraries.

In [14]:
import numpy as np
import pandas as pd

## Tab Autocomplete

Pressing tab will give you a list of Jupyter's best guesses for what you might want to type next. This is incredibly valuable and will save you a lot of time. If there is only one possible option for what you could type next, Jupyter will fill that in for you. Try pressing tab very frequently: it will seldom fill in anything you don't want, because a list is shown whenever there is ambiguity. This is a great way to see what functions are available in a library.

Try placing your cursor after the `.` and pressing tab. Further, you can view a method signature by pressing shift + tab.

In [15]:
np.random.

SyntaxError: invalid syntax (<ipython-input-15-1a778a4e80a5>, line 1)

## Getting Documentation Help

Placing a `?` after a function or a module and executing that line of code will give you the documentation Jupyter has for that function. It's often best to do this in a new cell, as you avoid re-executing other code and running into bugs.

In [16]:
pd.DataFrame?
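
The `?` syntax is specific to Jupyter. As a minimal sketch of an approach that also works in plain Python, the documentation text of most objects is available through their `__doc__` attribute (the built-in `help()` function prints a longer, formatted version). The objects chosen here, `len` and `math.sqrt`, are only examples:

```python
import math

# __doc__ holds the documentation string of a function or module.
len_doc = len.__doc__
sqrt_doc = math.sqrt.__doc__

print(len_doc)
print(sqrt_doc)
```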

## Introduction to Python

This section addresses the main features and functionalities of Python in the domain of data science. As data scientists, we care a lot about data structures that let us store large collections of data. Python has a number of such data structures built in. Further, libraries such as `pandas` define very convenient ways of interacting with data.

# Variables

The basic variable types that we will cover in this section are `integer`s, `float`s, `boolean`s, and `string`s. 

## Numerics

Variables provide names for values in programming. If you want to save a value for later or repeated use, you give the value a name, storing the contents in a variable. Variables in programming work in a fundamentally similar way to variables in algebra, but in Python they can take on various different data types. Python is dynamically typed: a variable name is not tied to a single type, so, unlike in statically typed languages such as Java and C++, the same name can later be bound to an object of a different type. A related idea, **duck typing**, means that whether an object suits an operation depends on the methods and attributes it has rather than on its declared type.
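
As a minimal sketch of dynamic typing (the name `value` is purely illustrative), the same variable name can be bound to objects of different types in turn:

```python
# Illustrative only: 'value' is a hypothetical variable name.
value = 42                  # the name is bound to an int
first_type = type(value)

value = "forty-two"         # the same name now refers to a str
second_type = type(value)

print(first_type, second_type)
```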

An `integer` in programming is the same as in mathematics: a whole number with no fractional part. We use the built-in `print` function here to display the values of our variables as well as their types!

In [40]:
my_int = 42
print(my_int, type(my_int))

42 <class 'int'>


Variable assignment is expressed via the equals sign `=`, irrespective of the type of the value that we are assigning. Variable names are case sensitive, so make sure that you match them exactly when using them after assignment:

In [41]:
print(My_int)

NameError: name 'My_int' is not defined

A `float` is the type we use for floating point numbers, which approximate the real numbers of mathematics. The `float` type is used throughout data science to represent numerical values and do arithmetic on them. To define a float we need to include a decimal point `.` or explicitly force the number to be a float by using the `float()` function:

In [45]:
my_float = 42.
print(my_float, type(my_float))
my_float = float(my_int)
print(my_float, type(my_float))

42.0 <class 'float'>
42.0 <class 'float'>


We can use the `int()` function to coerce a `float` variable back to `int` as follows (note that any digits after the decimal point will be truncated):

In [49]:
my_int = int(65.23)
print(my_int, type(my_int))

65 <class 'int'>
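
Truncation is not the same as rounding. The short sketch below, with illustrative variable names, contrasts `int()`, which drops the fractional part and so moves toward zero, with the built-in `round()`:

```python
# int() discards the fractional part (truncation toward zero).
truncated_positive = int(65.99)
truncated_negative = int(-65.99)

# round() goes to the nearest integer instead.
rounded = round(65.99)

print(truncated_positive, truncated_negative, rounded)
```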


## Booleans

A `boolean` is a binary variable type that can be either `True` or `False`. In later sections we will see how significant `boolean`s are in programming. Here is some basic usage:

In [53]:
my_bool = True
print(my_bool, type(my_bool))

True <class 'bool'>


## Strings 

The `string` type allows us, as programmers, to define text variables that we can perform various operations on. To define a `string`, use single `'` or double `"` quotation marks. In data science the `string` type is essential when working with textual data.

In [54]:
my_str = "foo"
print(my_str, type(my_str))

foo <class 'str'>


Python strings provide a `format()` method that allows us to format strings to our liking:

In [55]:
sentence = "Hello. My name is {}"
name = "Ed"
sentence = sentence.format(name)
print(sentence)

Hello. My name is Ed
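
An alternative to `format()`, assuming Python 3.6 or later, is the formatted string literal (f-string), where the variable is written directly inside the braces. A minimal sketch producing the same sentence:

```python
name = "Ed"

# The f prefix makes Python evaluate the expression inside {}.
sentence = f"Hello. My name is {name}"
print(sentence)
```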


You can also access individual `string` characters via `[]` indexing, which uses, for the most part, the same syntax as accessing `list` elements, as we shall see in the next sections. A point to remember for now is that indexing in Python starts from `0`:

In [59]:
print(sentence[0])
print(sentence[7])

H
M


We will see more ways to slice and dice strings when we cover `list` type.

## Basic Math and `math` library

Python ships with the standard arithmetic operators:

In [62]:
print('Addition: ', 5+2)
print('Subtraction: ', 5-2)
print('Multiplication: ', 5*2)
print('Division: ', 5/2)
print('Exponentiation: ', 5**2)
print('Modulo: ', 5%2)

Addition:  7
Subtraction:  3
Multiplication:  10
Division:  2.5
Exponentiation:  25
Modulo:  1
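
Note that `/` always produces a `float` in Python 3. When a whole-number quotient is wanted, the floor division operator `//` and the built-in `divmod()` are available. A small sketch with illustrative names:

```python
quotient = 5 // 2             # floor division discards the remainder
remainder = 5 % 2             # modulo gives the remainder
both = divmod(5, 2)           # returns (quotient, remainder) in one call

print('Floor division: ', quotient)
print('Quotient and remainder: ', both)
print('Reconstructed: ', quotient * 2 + remainder)
```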


In addition, we can perform an array of other common operations on numerics:  

In [68]:
print('Absolute value: ', abs(-5))
print('Rounding: ', round(3.14))
print('Maximum value: ', max(3,2,8,10,2,5))
print('Minimum value: ', min(3,2,8,10,2,5))

Absolute value:  5
Rounding:  3
Maximum value:  10
Minimum value:  2


Additionally, the `math` library provides many utilities that a data scientist would find handy:

In [69]:
import math

We can start making use of these utilities right out of the box:

In [75]:
print('Euler: ', math.e)
print('Pi: ', math.pi)

Euler:  2.718281828459045
Pi:  3.141592653589793
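
Beyond constants, `math` also provides functions. The sketch below shows a few that exist in the standard library; the input values are arbitrary examples:

```python
import math

square_root = math.sqrt(16)        # returns a float
natural_log = math.log(math.e)     # natural logarithm
floored = math.floor(3.7)          # largest integer <= 3.7
ceiled = math.ceil(3.2)            # smallest integer >= 3.2

print('Square root: ', square_root)
print('Natural log of e: ', natural_log)
print('Floor: ', floored)
print('Ceiling: ', ceiled)
```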


# Collections

## Lists

Lists are arguably the most used data structure in Python. Their core function is to allow storage and access of various elements. Financial data in particular is often represented as time-series, which are, at their core, collections of observed values over time. To define a `list` we use square brackets `[]`:

In [77]:
my_list = [1,5,6,3]
print(my_list, type(my_list))

[1, 5, 6, 3] <class 'list'>


Python allows us to combine elements of different types into the same `list`:

In [78]:
my_list = [1, "hello world", True, math.pi]
print(my_list)

[1, 'hello world', True, 3.141592653589793]


Accessing elements of a `list` is performed using `[]`. Remember that the index of the first element of the list is `0`, which means that in order to access the *n*th element of a `list` you have to pass *n-1* as an index:

In [79]:
print(my_list[0])
print(my_list[1])
print(my_list[2])
print(my_list[3])

1
hello world
True
3.141592653589793


You may also access elements from the end of a `list` by using negative indexing:

In [80]:
print(my_list[-1])
print(my_list[-2])
print(my_list[-3])
print(my_list[-4])

3.141592653589793
True
hello world
1


We may set an element in a `list` to a new value:

In [81]:
my_list[-1] = 4
print(my_list)

[1, 'hello world', True, 4]


`list`s may also be sliced and diced in many different ways. We can select a range from the list:

In [90]:
my_list = ['problem','worthy','of','attack','proves','its','worth','by','fighting','back']
print(my_list[3:6])

['attack', 'proves', 'its']


We may also select ranges without lower / upper bounds:

In [89]:
print(my_list[3:])
print(my_list[:5])

['attack', 'proves', 'its', 'worth', 'by', 'fighting', 'back']
['problem', 'worthy', 'of', 'attack', 'proves']


You can select a step:

In [91]:
my_list[::2]

['problem', 'of', 'proves', 'worth', 'fighting']

Reverse the order by setting negative step size:

In [94]:
my_list[::-1]

['back',
 'fighting',
 'by',
 'worth',
 'its',
 'proves',
 'attack',
 'of',
 'worthy',
 'problem']

And combine these slicers in various ways:

In [93]:
my_list[2:10:3]

['of', 'its', 'fighting']
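
Two properties of slicing are worth seeing in action: a slice is a new list, so changing it leaves the original untouched, and slice bounds that run past the end are clipped rather than raising an error. A minimal sketch, using a shorter hypothetical list called `words`:

```python
words = ['problem', 'worthy', 'of', 'attack']   # hypothetical shorter list

first_two = words[:2]        # a new list holding the first two elements
first_two[0] = 'PROBLEM'     # modifies only the slice, not 'words'

clipped = words[2:100]       # the upper bound is clipped to the list length

print(words[0], first_two[0])
print(clipped)
```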

Python gives us a few handy functions to generate lists to a common specification, such as ordered numbers. We use `range()` to do just that; just don't forget to wrap it in `list()` to get an actual list back instead of a `range` object:

In [105]:
range(10)

range(0, 10)

In [98]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Conveniently, we can define a step as well:

In [119]:
list(range(0,10,2))

[0, 2, 4, 6, 8]
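
The step may also be negative, in which case `range()` counts downwards. As with positive steps, the stop value itself is excluded. A small sketch:

```python
# Start at 10, stop before 0, moving down by 2 each time.
countdown = list(range(10, 0, -2))
print(countdown)
```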

We can add elements to the end of the list via `append()` method:

In [129]:
my_list = list(range(0,10))
my_list.append(25)
my_list.append(25)
print(my_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 25, 25]


We may remove specific elements by calling `remove()` and supplying it with the value of an element we'd like to remove. Note that only the first instance of an element will be removed: 

In [130]:
my_list.remove(25)
my_list.remove(4)
print(my_list)

[0, 1, 2, 3, 5, 6, 7, 8, 9, 25]
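
Two related behaviours are sketched below on a small hypothetical list called `numbers`: `remove()` raises a `ValueError` when the value is absent, and `pop()` removes and returns the last element (or the element at a given index):

```python
numbers = [0, 1, 2, 25]   # hypothetical small list

try:
    numbers.remove(99)    # 99 is not in the list
except ValueError:
    print('99 is not in the list')

last = numbers.pop()      # removes and returns the final element
print(last, numbers)
```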


Python lists are essential for many aspects of scientific computation; knowing how to manipulate data using Python lists is thus a very important skill to have in your toolbox. For a complete reference on the methods available on Python `list`s, refer to the official Python documentation. Common collection operations such as filtering, mapping and sorting are very convenient to execute using anonymous functions (lambdas):

In [138]:
list(filter(lambda x: x > 5, my_list))

[6, 7, 8, 9, 25]

In [132]:
list(map(lambda x: x*2, my_list))

[0, 2, 4, 6, 10, 12, 14, 16, 18, 50]

In [145]:
sorted(my_list, reverse=True)

[25, 9, 8, 7, 6, 5, 3, 2, 1, 0]
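
The same filtering and mapping can also be written as list comprehensions, which many Python programmers find easier to read than `filter()` and `map()` with lambdas. A minimal sketch that rebuilds the list used above so that it runs on its own:

```python
my_list = [0, 1, 2, 3, 5, 6, 7, 8, 9, 25]

# Keep only the elements greater than 5.
filtered = [x for x in my_list if x > 5]

# Double every element.
doubled = [x * 2 for x in my_list]

print(filtered)
print(doubled)
```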

## Dictionaries
Python dictionaries are powerful abstractions that let us define key-value pairs. In other programming languages, such abstractions are also known as maps. We define dictionaries by using `{}`, separating each key from its value with `:` and separating the pairs with `,`. Elements can be accessed via the `get()` method or via `[]` notation. Let's see some examples:

In [155]:
my_dict = {1:'one',2:'two',3:'three'}
print(my_dict.get(1))
print(my_dict[1])
my_dict[4] = 'four'
print(my_dict)

one
one
{1: 'one', 2: 'two', 3: 'three', 4: 'four'}
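
The two access styles differ when a key is missing: `get()` returns `None` (or a default that you supply), whereas `[]` raises a `KeyError`. A short sketch, with `'unknown'` as an arbitrary default value:

```python
my_dict = {1: 'one', 2: 'two', 3: 'three'}

missing = my_dict.get(5)               # no such key: returns None
fallback = my_dict.get(5, 'unknown')   # returns the supplied default

try:
    my_dict[5]                         # square brackets raise KeyError
except KeyError:
    print('key 5 is not present')

print(missing, fallback)
```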


Since Python is dynamically typed, we can define very convenient data structures. Let's see how we would define a dummy data set of financial time-series:

In [154]:
my_dict = {
    'AAPL':[200,201,200.1,205],
    'GOOG':[700,750,640,720],
    'AMZN':[900,850,920,910]
}

Here, each value corresponds to a list of dummy prices. As we go forth, we will see how powerful such data structures are.
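
As a rough illustration of how such a structure can be queried, the sketch below combines dictionary lookup with list indexing and a few built-in functions. The dictionary is repeated under the hypothetical name `prices` so that the example is self-contained:

```python
prices = {
    'AAPL': [200, 201, 200.1, 205],
    'GOOG': [700, 750, 640, 720],
    'AMZN': [900, 850, 920, 910],
}

latest_aapl = prices['AAPL'][-1]                          # last price in the list
highest_goog = max(prices['GOOG'])                        # largest dummy price
average_amzn = sum(prices['AMZN']) / len(prices['AMZN'])  # arithmetic mean

print(latest_aapl, highest_goog, average_amzn)
```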

# Logic

## Logical Operators
Logical operators deal with boolean values, as we briefly covered before. If you recall, a bool takes on one of two values, `True` or `False` (or 1 or 0). The basic logical statements that we can make are defined using the built-in comparators. These are `==` (equal), `!=` (not equal), `<` (less than), `>` (greater than), `<=` (less than or equal to), and `>=` (greater than or equal to).

In [156]:
6 == 6

True

In [157]:
6 != 6

False

In [158]:
6 > 6

False

In [160]:
6 >= 6

True

Each data type can also be evaluated to `True` or `False`. As a general rule, objects like `string` will evaluate to `False` if they do not contain anything and `True` otherwise:

In [169]:
print(bool())
print(bool(''))
print(bool(' '))
print(bool(1))

False
False
True
True
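
The same rule applies to other built-in types: empty collections, zero and `None` evaluate to `False`. A minimal sketch with illustrative names:

```python
empty_list = bool([])       # empty list
nonempty_list = bool([0])   # one element, even though that element is 0
empty_dict = bool({})       # empty dictionary
zero = bool(0)              # numeric zero
nothing = bool(None)        # None

print(empty_list, nonempty_list, empty_dict, zero, nothing)
```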


## Flow of control

We can control the flow of our programs using the basic logic operators and `if` statements. The body of an `if` statement executes if the boolean expression that it evaluates is `True`, and the `else` block executes if that expression is not `True`. Else-if, or `elif`, lets us supply a further boolean expression to evaluate if the preceding condition is not `True`.

In [171]:
#If-else statement skeleton
if 'Condition':
    #do something if the 'Condition' is True
    print('in the if block')
else:
    #do something otherwise
    print('in the else block')
#note that 'Condition' will evaluate to True as it is a non-empty string

in the if block


In [173]:
a = 6
if a == 6:
    print('a is equal to 6')

a is equal to 6
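
A minimal sketch of a full `if` / `elif` / `else` chain follows. The conditions are checked from top to bottom and only the first matching block runs; the value of `a` and the name `result` are arbitrary choices for illustration:

```python
a = 4

if a > 6:
    result = 'a is greater than 6'
elif a == 6:
    result = 'a is equal to 6'
else:
    result = 'a is less than 6'

print(result)
```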


We can also have nested `if-else` statements:

In [176]:
a = 9

if a % 2 == 0:
    print('a is divisible by 2')
    if a % 3 == 0:
        print('a is divisible by 2 and 3')
elif a % 3 == 0:
    print('a is divisible by 3')
else:
    print('a is divisible by neither 2 nor 3')

a is divisible by 3


We can use readable operators such as `in` and `and` to check whether elements are members of some structure, writing one `in` test per element. For example:

In [177]:
string = 'I love Python'
if 'y' in string and 'o' in string:
    print('This is my string!')

This is my string!
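
A common pitfall is worth spelling out: an expression such as `'y' and 'o' in string` is parsed as `'y' and ('o' in string)`, so only `'o'` is actually tested, because a non-empty string like `'y'` is always truthy. The sketch below contrasts this with explicit tests and with the built-in `all()`; the variable names are illustrative:

```python
string = 'I love Python'

# Explicit membership test for each character.
has_both = 'y' in string and 'o' in string

# Equivalent check for any number of characters using all().
has_all = all(ch in string for ch in 'yo')

# Pitfall: 'z' is never tested here, only 'o' is.
misleading = 'z' and 'o' in string
correct = 'z' in string and 'o' in string

print(has_both, has_all, misleading, correct)
```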


# Iteration
Going through a collection of elements and performing operations on each of them is an essential skill for any programmer. For a data scientist, this skill matters across many stages of the process: data collection, cleaning and analysis.

## While loop

A `while` loop is a basic loop that keeps executing as long as some condition is `True` and stops as soon as the condition becomes `False`:

In [178]:
a = True
while a:
    print('inside while loop')
    a = False

inside while loop


We can escape the loop by issuing a `break` command:

In [181]:
a = 0
while True:
    a += 1
    print(a)
    if a == 10:
        break

1
2
3
4
5
6
7
8
9
10


## For loop

The `for` loop is one of the most convenient iteration techniques out there. We use `for` loops to iterate through lists, dictionaries, ranges and other data structures. Unlike a `while` loop, a `for` loop will by default go through every element in the collection and perform a given task.

In [182]:
a = 0
for i in range(10):
    a += i
print(a)

45
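
A `for` loop works directly on a list as well. When the position of each element is also needed, the built-in `enumerate()` yields index and value together. A small sketch using a hypothetical list called `words`:

```python
words = ['problem', 'worthy', 'of', 'attack']   # hypothetical list
lengths = []

# enumerate() yields (index, element) pairs, starting from index 0.
for position, word in enumerate(words):
    print(position, word)
    lengths.append(len(word))

print(lengths)
```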


There is more than one way to iterate through dictionaries:

In [183]:
my_dict = {
    'AAPL':[200,201,200.1,205],
    'GOOG':[700,750,640,720],
    'AMZN':[900,850,920,910]
}

In [190]:
for i in my_dict:
    print(i)

AAPL
GOOG
AMZN


Calling `items()` on a dictionary gives access to both keys and values:

In [189]:
for key, value in my_dict.items():
    print(key, value[1])

AAPL 201
GOOG 750
AMZN 850
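
When only the values are needed, `values()` can be used instead. The sketch below, with illustrative names `highs` and `observations`, uses both methods to build a dictionary of the highest dummy price per ticker and to count all observations:

```python
my_dict = {
    'AAPL': [200, 201, 200.1, 205],
    'GOOG': [700, 750, 640, 720],
    'AMZN': [900, 850, 920, 910],
}

# items() gives (key, value) pairs.
highs = {}
for ticker, series in my_dict.items():
    highs[ticker] = max(series)

# values() gives only the values, here the price lists.
observations = 0
for series in my_dict.values():
    observations += len(series)

print(highs, observations)
```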


# Functions

Functions have a huge significance for programmers and data scientists alike. Functions allow us to describe a piece of logic that we can reuse in the future. This lets us leverage the power of computation and perform an arbitrary task on large sets of data. Functions are defined using the `def` keyword, and anonymous functions can be defined via the `lambda` keyword. We use `return` to pass a value back from the function. To run a function, we use `()` and pass the corresponding arguments.

In [191]:
#function that does not return anything:
def printer(a):
    print(a)

printer('hello world')

hello world


In [192]:
def area(length, width):
    return length * width

In [193]:
double = lambda x: x*2
double(10)

20

In [194]:
type(area)

function

In [195]:
type(double)

function
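
To round this off, here is a short sketch of calling the functions defined above. Arguments can be passed by position or by name, and a function without a `return` statement gives back `None`. The definitions are repeated so that the example runs on its own, and the argument values are arbitrary:

```python
def area(length, width):
    return length * width

def printer(a):
    print(a)

rectangle = area(3, 4)                    # positional arguments
same_rectangle = area(width=4, length=3)  # keyword arguments, any order
nothing = printer('hello world')          # no return statement, so None

print(rectangle, same_rectangle, nothing)
```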

We are only scratching the surface of functions here and highlighting details of a programming skillset of a data scientist. Python has a wide array of other functionalities that make it useful beyond data science and mathematical computation.  

This concludes the lecture / tutorial on Python programming for data science. Please be sure to ask any questions!