# What is this Notebook about?

This Notebook is part of the Data Science for Management course I'm teaching at the University of Firenze (Italy) during the 2016/2017 academic year. 

A companion tutorial for the same course is [R Course for Data Science](https://github.com/andrgig/Data-Science-for-Management/blob/master/R%20Course%20for%20Data%20Science.ipynb).

For suggestions you can contact me on [Linkedin](https://it.linkedin.com/in/agigli) or [Twitter](https://twitter.com/andrgig).

*Andrea Gigli, PhD in Applied Statistics, MSc in Big Data Analytics and Social Mining*

---


## References & Acknowledgments

This Notebook is inspired by contents from [Learn Python!](http://python.berkeley.edu/learn/), [Python for Social Science](http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/index.html), [Software Carpentry](https://bids.github.io/2016-01-14-berkeley/python/), [Pandas Cookbook](https://github.com/jvns/pandas-cookbook) and [other useful links](http://python.berkeley.edu/resources/). A special thanks to Eric Reynolds for having shared with me his Notebook on Introductory Python.

Now we are ready to start a quick tour of the Python programming language and start writing Python scripts and functions. I won't cover OOP in this Notebook and I'll mention only a few things about classes. 

## What is Python

At its core, Python is an *interpreter*: it takes code (a sequence of commands), reads it, and executes it. This is different from programming languages like C and C++, which *compile* code into a language that the computer can understand directly. As a result, Python is essentially an *interactive* programming language: you can program and see the results at the same time.

## Which one should I download?

Python comes in different *versions* and *flavors*.

The latest version is 3.6; however, you'll still see version 2.7 floating around. This is because, while 3.6 is similar to 3.5 and 2.7 is similar to 2.6, there are some big differences between Python 2 and Python 3. There is still a sizeable proportion of users on Python 2, which is why it hasn't been abandoned yet; however, if you're learning it now, it's worth starting with Python 3. A couple of key differences are worth mentioning for novice programmers:

- In Python 2, you can print with `print 42` or `print(42)`. In Python 3, you need to use parentheses, as in `print(42)`.
- In Python 2, division of two integers like `5/2` will evaluate to `2`. (Python will drop the remainder if both numbers are integers.) Python 3 does exact division (`2.5`, in this example). If you use Python 2 and do not want this behavior, add this line at the top of each program:

`from __future__ import division`.
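
The difference is easy to check for yourself. The sketch below assumes Python 3 and uses only the standard `sys` module; the variable names are illustrative.

```python
import sys

# Confirm which major version of Python is running
major_version = sys.version_info.major
print("Running Python", major_version)

# In Python 3, / always performs true division ...
true_division = 5 / 2
# ... while // drops the fractional part (floor division)
floor_division = 5 // 2

print(true_division, floor_division)
```

If the first line printed reports Python 2, the `from __future__ import division` line above is what makes `/` behave the same way.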

The flavor of Python refers to the way it's packaged.

The standard Python interpreter, available on http://www.python.org/, comes with a standard set of libraries, and if you want more functionality, you need to download them separately.

One of the most popular Python distributions, especially for data science, is Anaconda Python, which comes with a large set of popular libraries built in. Some examples are:
* Numpy - which provides matrix algebra functionality
* Scipy - which provides a whole series of scientific computing functions
* Pandas - which provides R-like dataframe objects for manipulating time series
* Matplotlib - for plotting graphs
* Jupyter - for notebooks like this one.

I recommend that you [download the Anaconda installer](http://continuum.io/downloads) (Python 3).

Choose *Install for me only*.

By default, Anaconda will prepend itself to your PATH – leave this as is.

When Anaconda has finished installing, open a terminal (Linux, OSX), or the Anaconda Prompt (Windows).

Type ```conda update conda```, hit enter, and then type ```y``` (and hit enter).

Type ```conda update anaconda```, hit enter, and then type ```y``` (and hit enter).


## How do I use Python?

Once you've downloaded a Python distribution, there are various ways of actually using it.

* The most immediate way is to just execute `python.exe` on the command line to get a Python console for interacting with the interpreter.

* If you're looking to learn Python or do data analysis, Jupyter notebooks allow you to see the results of your code as you write it, as well as make notes, plot graphs and so on. Alternatively, you can use an interactive development environment with advanced editing, interactive testing, debugging and introspection features like [Spyder](https://pythonhosted.org/spyder/).

* If you're a programmer or want to do slightly more complex things, you'll usually want to write one or more scripts, perhaps linked together, and execute one of them, again using `python.exe`.

For this last use case, an integrated development environment (IDE) can be very useful. An IDE is a graphical user interface which makes writing complex programs easier by providing a text editor, a file browser, a debugger (for stepping through code line by line to figure out why your code isn't doing what it's supposed to) and many other tools, all in one software application. A good one is [PyCharm](https://www.jetbrains.com/pycharm/), which has a free version called 'Community Edition'.

#### Run the Jupyter Notebook

With Jupyter, you intersperse code, output, explanatory text, and figures in one big file called a "notebook." Notebooks are a convenient format to explore a language and to share examples of code.

To run a notebook, open the Terminal (Linux, OSX) or Anaconda Prompt (Windows) and type ```jupyter notebook```.
The notebook will open in a new tab in your default browser. Do not close the terminal, as this will also shut down the notebook.
When it has loaded, click on "New" (at the top right) and then "Python 3" to create a new notebook.

#### Text editors and IDEs

You may need to use a text editor in combination with the Jupyter notebook. Popular choices include [Sublime Text](http://sublimetext.com/) and [Atom](https://atom.io/).



## What you should do now

First of all, retype (don't use Ctrl-C Ctrl-V) the following code and play with it in IPython or Spyder ... line by line, without cheating :-P 

If you want to learn Python you need to practice; my recommendation is 10 to 15 minutes of exercises every day. You can find tons of them on the Web. For example, here's a link to a [Jupyter Notebook](https://bids.github.io/2016-01-14-berkeley/python/00-python-intro.ipynb) and its [Web Page](https://bids.github.io/2016-01-14-berkeley/python/00-python-intro.html) version.






# Python basics

Now let's start writing some code. I suggest you create a new file in Spyder (Python 3.x) and write the code from the grey boxes in it, then run the commands selection by selection. This way you'll quickly understand how things work, and you'll have something to review and play with at home.


In [1]:
#write something like a number
2

2

In [1]:
hello #it raises an error

NameError: name 'hello' is not defined

In [3]:
"hello"

'hello'

In [4]:
print("Hello world!")

Hello world!


In [5]:
print("Hello")
print('Goodbye')

Hello
Goodbye


In [6]:
# the next line prints Ciao
print("Ciao")  # this line prints Ciao
# this is a comment too

Ciao


## Variables

In [7]:
#things can be stored as variables
x = 9
a = 2
c = "hello"
d = True

In [8]:
#and printed out
print(x)

9


In [9]:
print(x,a,c,d)

9 2 hello True


## Mathematical operators

In [12]:
1 + 2

3

In [3]:
40 - 5 + 7 - 8 * 5

2

In [14]:
x * 20

180

In [15]:
x / 4

2.25

In [16]:
x // 4  # integer division - result will be rounded down to the nearest integer

2

In [17]:
y = 3
x ** y  # x to the power of y

729

In [18]:
3 * (x + y)

36

In [19]:
# the result of the expression can be saved to another variable, or even back to the same variable
x = x + 1
x

10

In [20]:
x += 1  # alternative 'shorthand' syntax, it means the same as x = x + 1
x

11

## Boolean operators

The expressions we saw so far evaluate to a number. Boolean expressions are expressions which evaluate to `True` or `False`.

In [21]:
1 == 2

False

In [22]:
1 != 2

True

In [23]:
2 < 2

False

In [24]:
2 <= 2   # the other way is written '>='

True

In [25]:
print(x)
15 <= x and x <= 20   # this particular expression occurs very often and for simplicity can be written: 15 <= x <= 20

11


False

In [26]:
15 <= x or x <= 20

True

In [27]:
not False

True

In [28]:
15 <= x <= 20

False
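
As a small check of the shorthand mentioned above, the sketch below evaluates both forms for a sample value inside the range. The variable name `value` and the number 18 are arbitrary choices for the example.

```python
value = 18

# explicit form: two comparisons joined with 'and'
explicit = 15 <= value and value <= 20

# chained form: reads like the mathematical notation
chained = 15 <= value <= 20

print(explicit, chained)
```

Both expressions give the same result for any `value`; the chained form is simply shorter to write.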

## String operators

A 'string' is a sequence of characters (letters, digits, spaces, punctuation, new lines, etc.)

In [29]:
"abc" + "def"

'abcdef'

In [30]:
"The number " + 3.4 + " is my favorite number" # this raises an error

TypeError: Can't convert 'float' object to str implicitly

In [31]:
"The number " + str(3.4) + " is my favorite number"

'The number 3.4 is my favorite number'

In [32]:
type(3.4)

float

In [33]:
type(str(3.4))

str

In [34]:
# string interpolation

template = "From %s to %s"
print(template % ('a', 'z'))
print(template % (1, 10))

From a to z
From 1 to 10
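
The `%` operator is the oldest of Python's formatting styles. As a sketch of two alternatives, the `str.format()` method and, assuming Python 3.6 or later, formatted string literals (f-strings) produce the same text. The variable names are illustrative.

```python
start, end = 'a', 'z'

# str.format() fills the {} placeholders in order
with_format = "From {} to {}".format(start, end)

# f-strings (Python 3.6+) evaluate the expressions inside the braces
with_fstring = f"From {start} to {end}"

print(with_format)
print(with_fstring)
```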


## Functions

These will be very familiar to anyone who has programmed in any language, and work like you would expect.

In [5]:
# There are thousands of functions that operate on things
print(type(3)) #type() function
print(len('hello')) #len() function
print(round(3.3)) #round() function

<class 'int'>
5
3


In [36]:
#to find out what a function does you can type its name followed by a question mark
round?

In [37]:
# Not every function is available until you have imported it
log?

Object `log` not found.


In [38]:
# this will cause an error because the log function has not been 'imported' yet
log(3)

NameError: name 'log' is not defined

Many useful functions are not built into the language itself: they live in modules of the Python 'standard library' (such as `math`) or in external packages which you can download or build yourself. These need to be imported into your Python notebook (or program) before they can be used. 

In [7]:
# log() function is defined in math module

import math

#to list the names (functions and constants) defined in the module
dir(math)

['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'hypot',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'trunc']

In [8]:
math.log? # a help pane appears

In [41]:
math.log(3)

1.0986122886681098

In [42]:
# we don't want to write 'math.' every time
# we can use the 'from <module> import <function1>, <function2>, ...' syntax

from math import log, exp
print(log(3))
print(exp(3))

1.0986122886681098
20.085536923187668


## Methods

Before we get any further into the Python language, we have to say a word about "objects". We will not be teaching object oriented programming in this Notebook, but you will encounter objects throughout Python (in fact, even seemingly simple things like ints and strings are actually objects in Python).

An object is like a bundled "thing" that contains within itself both *data* and *functions* that operate on that data. For example, strings in Python are objects that contain a set of characters and also various functions that operate on the set of characters. When bundled in an object, these functions are called "methods".

Instead of the "normal" function(arguments) syntax, methods are called using the syntax variable.method(arguments).

In [43]:
# A string is actually an object
a = 'hello, world!'
print(a)
print(type(a))

hello, world!
<class 'str'>


In [44]:
#Objects have bundled methods
print(a.capitalize())
print(a.replace('l', 'X'))

Hello, world!
heXXo, worXd!


## Collections
### Lists

In [45]:
# Lists are created with square bracket syntax
a = ['blueberry', 'strawberry', 'pineapple']
print(a, type(a))

['blueberry', 'strawberry', 'pineapple'] <class 'list'>


In [46]:
a = [21, 32, 15]

In [47]:
a

[21, 32, 15]

In [48]:
len(a)

3

In [49]:
a[0]  # list indexing is zero-based

21

In [50]:
a[1]

32

In [51]:
a[2]

15

In [52]:
# you can use the same syntax to set elements
a[1] = 33
a

[21, 33, 15]

In [53]:
a[3]

IndexError: list index out of range

In [54]:
a[-1]

15

In [55]:
a[1:]  # elements starting from and including item 1

[33, 15]

In [56]:
a[:2]  # elements up to but excluding item 2

[21, 33]

In [57]:
# Lists are objects, like everything else, and have methods such as append
# add an element to the end of the list
a.append(9)

In [58]:
a

[21, 33, 15, 9]

In [59]:
# insert an element within a list
a.insert(2, 85)

In [60]:
a

[21, 33, 85, 15, 9]

In [61]:
# iterating over a list
for x in a:
    print(x)

21
33
85
15
9


In [62]:
sorted(a)

[9, 15, 21, 33, 85]

In [63]:
a  # sorted function doesn't change a, it returns a new list

[21, 33, 85, 15, 9]

In [64]:
a.sort()

In [65]:
a  # a.sort method changes a

[9, 15, 21, 33, 85]

In [66]:
# careful! lists are objects, and variables *point* to objects
a = [4, 5, 6]
b = a

In [67]:
a

[4, 5, 6]

In [68]:
b

[4, 5, 6]

In [69]:
a == [4, 5, 6]   # a and [4, 5, 6] are equal, i.e. they contain the same items

True

In [70]:
a is [4, 5, 6]   # but a and this newly constructed list are two different objects

False

In [71]:
# a and b are not just equal they are the same object
a is b

True

In [72]:
b.append(7)

In [73]:
b

[4, 5, 6, 7]

In [74]:
a

[4, 5, 6, 7]

In [75]:
# clone a list

b = list(a)

In [76]:
b is a

False

In [77]:
b.append(8)

In [78]:
b

[4, 5, 6, 7, 8]

In [79]:
a

[4, 5, 6, 7]
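
The same aliasing rule can be sketched in a few lines. Besides `list(a)`, a full slice and the `copy()` method also create a new list; the variable names below are only illustrative. Note that all of these are shallow copies: nested lists inside would still be shared.

```python
original = [4, 5, 6]

alias = original             # same object, two names
by_slice = original[:]       # new list with the same items
by_method = original.copy()  # new list with the same items

alias.append(7)              # also visible through 'original'
by_slice.append(8)           # only changes the copy

print(original)
print(by_slice)
print(alias is original, by_slice is original, by_method is original)
```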

---
### Exercise 
Go to https://try-python.appspot.com/ and play with it for 15 minutes

### Exercise 

Do these tasks:
1. Ask five people around you for their heights (in metres).
2. Store these in a list called heights.
3. Append your own height, stored in a variable called metres, to the list.
4. Get the first height from the list and print it.
5. Extract the last value in at least two different ways.

----

### Tuples

Tuples create a bit of confusion for beginners because they're very similar to lists, but they have some subtle conceptual differences. Nonetheless, tuples do appear when programming in Python, so it's important to know about them.

Like lists, tuples are sequences of any type of object. **Unlike lists, they are immutable**. This means that:
- Once constructed, they cannot be changed - i.e. you cannot append, insert or delete elements
- Because they are immutable, they can be used as dictionary keys (lists cannot)

You declare tuples using () instead of []

In general, they're often used instead of lists:
- to group items when the position in the collection is critical, such as coord = (x,y)
- when you want to prevent accidental modification of the items, e.g. shape = (12,23)



In [80]:
xy = (23, 45)
print(xy[0])
xy[0] = "this won't work with a tuple" #this raises an error

23


TypeError: 'tuple' object does not support item assignment

In [81]:
# accessing elements is the same as with a list
t1 = (1, 2, 3)
print("Length: %s" % (len(t1)))
print("First element: %s" % (t1[0]))
print("Last element: %s" % (t1[-1]))

Length: 3
First element: 1
Last element: 3


In [82]:
# another way of accessing the elements is to 'unpack' the tuple
# this works with lists too, by the way
x, y, z = (1, 2, 3)
print(x)
print(y)
print(z)

1
2
3


In [83]:
# to add elements, you need to build a new tuple
t2 = t1 + (4, 5)
t2

(1, 2, 3, 4, 5)

In [84]:
# they can be used as dictionary keys
d = {
    ('Finance', 1): 'Room 8',
    ('Finance', 2): 'Room 3',
    ('Math', 1): 'Room 6',
    ('Programming', 1): 'IT room'
}
d

{('Finance', 1): 'Room 8',
 ('Finance', 2): 'Room 3',
 ('Math', 1): 'Room 6',
 ('Programming', 1): 'IT room'}
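
Looking up a value then requires the whole tuple as the key. The sketch below rebuilds a smaller version of the dictionary above (the name `rooms` is an assumption made for the example) and also shows that a list is rejected as a key. It uses `try`/`except`, which this Notebook does not cover, only to keep the error from stopping the cell.

```python
rooms = {
    ('Finance', 1): 'Room 8',
    ('Math', 1): 'Room 6',
}

# The tuple as a whole is the key
finance_room = rooms[('Finance', 1)]
print(finance_room)

# Lists are mutable, so Python refuses to use them as keys
try:
    rooms[['Finance', 2]] = 'Room 3'
    list_key_allowed = True
except TypeError as error:
    list_key_allowed = False
    print("TypeError:", error)
```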

----------------
### Exercise

Do these tasks:
1. Ask five people around you for their ages.
2. Store these in a tuple called ages.
3. Add your own age, stored in a variable called myage, to the tuple. Remember that tuples are immutable, so you will need to build a new tuple.
4. Get the second age from the tuple and print it.
5. Extract the last value in at least two different ways.

---------------

### Dictionaries

You can think of lists as objects which map (sequential) *indices* to values:

$$
0 \mapsto 4 \\ 1 \mapsto 5 \\ 2 \mapsto 6
$$

Dictionaries are objects which map *keys* to values. Keys can be (almost) any kind of object (strings, numbers, etc.) and they do not have to be sequential.

$$
\mathtt{"k"} \mapsto 3 \\
\mathtt{"q"} \mapsto 4
$$

Dictionaries are the collection to use when you want to store and retrieve things by their names (or some other kind of key) instead of by their position in the collection. A good example is a set of model parameters, each of which has a name and a value. Dictionaries are declared using {}.

In [85]:
a = {'k': 3, 'q': 4}

In [86]:
# look up the values contained in the dict

a['k']

3

In [87]:
a['q']

4

In [88]:
# an error is thrown if the requested key is missing

a['f'] #this raises an error

KeyError: 'f'

In [89]:
'f' in a

False

In [90]:
a['f'] = 56

In [91]:
a

{'f': 56, 'k': 3, 'q': 4}

In [92]:
'f' in a

True

In [93]:
# how many key-value pairs are there in the dict?

len(a)

3

In [94]:
# to overwrite an existing value, just set it again

a['f'] = 123

len(a), a

(3, {'f': 123, 'k': 3, 'q': 4})

In [95]:
# get the collection of keys
a.keys()

dict_keys(['f', 'k', 'q'])

In [96]:
# get the collection of values
a.values()

dict_values([123, 3, 4])
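
If a missing key should not stop the program, the `get()` method is a common alternative to square brackets: it returns `None`, or a default of your choice, instead of raising `KeyError`. A minimal sketch, with illustrative names:

```python
params = {'k': 3, 'q': 4}

present = params.get('k')      # key exists: returns its value
missing = params.get('f')      # key missing: returns None
fallback = params.get('f', 0)  # key missing: returns the default 0

print(present, missing, fallback)
```

Note that `get()` does not add the missing key to the dictionary; `params` still holds only `'k'` and `'q'` afterwards.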

### Exercise

- Build the following dictionary

$$
\mathtt{"uno"} \mapsto 1 \\
\mathtt{"due"} \mapsto 2 \\
\mathtt{"tre"} \mapsto 3 \\
\mathtt{"quattro"} \mapsto 4 \\
\mathtt{"cinque"} \mapsto 5
$$

- Add "sei" as key and 6 as value using update() method
- Build

$$
\mathtt{"sette"} \mapsto 7 \\
\mathtt{"otto"} \mapsto 8 \\
\mathtt{"nove"} \mapsto 9 
$$

- Concatenate the two dictionaries
- Write an `if` statement to check whether a key already exists in a dictionary

### Numpy Array

Even though numpy arrays (often written as ndarrays, for n-dimensional arrays) are not part of the core Python libraries, they are so useful in scientific Python that we'll include them here in the core lesson. Numpy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far). The most important differences are:

1. You can easily perform elementwise operations (and matrix algebra) on arrays
2. Arrays can be n-dimensional
3. There is no equivalent to append, although arrays can be concatenated

Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.

When getting started with scientific Python, you will probably want to try to use ndarrays whenever possible, saving the other types of collections for those cases when you have a specific reason to use them.
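
To make the first and third differences concrete, the sketch below applies the same operators to a list and to an array. It assumes numpy is installed (it is included in Anaconda); the variable names are illustrative.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# On lists, + concatenates and * repeats
list_sum = numbers_list + numbers_list
list_times_two = numbers_list * 2

# On arrays, the same operators work element by element
array_sum = numbers_array + numbers_array
array_times_two = numbers_array * 2

# Arrays have no append method; np.concatenate builds a new, longer array
joined = np.concatenate([numbers_array, np.array([4, 5])])

print(list_sum, list_times_two)
print(array_sum, array_times_two, joined)
```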

In [97]:
# We need to import the numpy library to have access to it 
# We can also create an alias for a library, this is something you will commonly see with numpy
import numpy as np


In [98]:
# Make an array from a list
alist = [2, 3, 4]
blist = [5, 6, 7]
a = np.array(alist)
b = np.array(blist)
print(a, type(a))
print(b, type(b))

[2 3 4] <class 'numpy.ndarray'>
[5 6 7] <class 'numpy.ndarray'>


In [99]:
# Do arithmetic on arrays
print(a**2)
print(np.sin(a))
print(a * b)
print(a.dot(b), np.dot(a, b))

[ 4  9 16]
[ 0.90929743  0.14112001 -0.7568025 ]
[10 18 28]
56 56


In [100]:
# Boolean operators work on arrays too, and they return boolean arrays
print(a > 2)
print(b == 6)

c = a > 2
print(c)
print(type(c))
print(c.dtype)

[False  True  True]
[False  True False]
[False  True  True]
<class 'numpy.ndarray'>
bool


In [101]:
# Indexing arrays
print(a[0:2])

c = np.random.rand(3,3)
print(c)
print('\n')
print(c[1:3,0:2])

c[0,:] = a
print('\n')
print(c)

[2 3]
[[ 0.95348347  0.22544004  0.65226686]
 [ 0.757638    0.0042517   0.26898069]
 [ 0.43068185  0.99836348  0.56987177]]


[[ 0.757638    0.0042517 ]
 [ 0.43068185  0.99836348]]


[[ 2.          3.          4.        ]
 [ 0.757638    0.0042517   0.26898069]
 [ 0.43068185  0.99836348  0.56987177]]


In [102]:
# Arrays can also be indexed with other boolean arrays
print(a)
print(b)
print(a > 2)
print(a[a > 2])
print(b[a > 2])

b[a == 3] = 77
print(b)

[2 3 4]
[5 6 7]
[False  True  True]
[3 4]
[6 7]
[ 5 77  7]


In [103]:
# ndarrays have attributes in addition to methods
# type c. and press Tab to list them
print(c.shape)
print(c.prod())
c.prod?

(3, 3)
0.00509540423657


In [104]:
# There are handy ways to make arrays full of ones and zeros
print(np.zeros(5), '\n')
print(np.ones(5), '\n')
print(np.identity(5), '\n')

[ 0.  0.  0.  0.  0.] 

[ 1.  1.  1.  1.  1.] 

[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]] 



In [105]:
# You can also easily make arrays of number sequences
print(np.arange(0, 10, 2))

[0 2 4 6 8]


----
### Exercise

Revisit your list of heights

1. turn it into an array
2. calculate the mean
3. create a mask of all heights greater than a certain value (your choice)
4. find the mean of the masked heights
5. find the number of heights greater than your threshold
6. mean() can take an optional argument called axis, which allows you to calculate the mean across different axes, e.g. across rows or across columns. Create an array with two dimensions (not equal sized) and calculate the mean across rows and mean across columns. Use 'shape' to understand how the means are calculated.

----

### Dates

Dates are not usually included in standard beginner Python tutorials, however for finance they're pretty essential.

In Python, the standard `date` class lives in the `datetime` module. We're also going to import `relativedelta` from the `dateutil.relativedelta` module, which allows us to add/subtract days/months/years to dates.

In [139]:
from datetime import date
from dateutil.relativedelta import relativedelta

In [140]:
date.today()

datetime.date(2016, 12, 28)

In [141]:
date.today() + relativedelta(months=2)

datetime.date(2017, 2, 28)

In [142]:
date.today() - relativedelta(days=3)

datetime.date(2016, 12, 25)

In [143]:
one_day = relativedelta(days=1)
date.today() - 3 * one_day

datetime.date(2016, 12, 25)

In [144]:
one_month = relativedelta(months=1)
date.today() + 2 * one_month

datetime.date(2017, 2, 28)

In [145]:
# we can also subtract dates to calculate the number of days between them

d1 = date(2017, 1, 1)
d2 = date(2017, 1, 10)
(d2 - d1).days

9

In [146]:
# converting a date to string
str(d1)

'2017-01-01'

In [147]:
# if we want to control the way it looks - check out the docs for more details
d1.strftime("%Y-%b-%d (%a)")

'2017-Jan-01 (Sun)'

In [148]:
# going the other way is a bit more complicated but still not too bad
# to do it we need to use another object, 'datetime'
from datetime import datetime
datetime.strptime('11 Feb 2018', "%d %b %Y").date()

datetime.date(2018, 2, 11)

In [149]:
d1.weekday()  # 0 = Monday, ..., 6 = Sunday

6
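
For shifts measured in days or weeks, the standard library's `timedelta` is enough and needs no extra package; `relativedelta` remains the convenient choice for calendar months and years. A minimal sketch, using a fixed date so the result does not depend on when it is run:

```python
from datetime import date, timedelta

start = date(2016, 12, 28)

three_days_later = start + timedelta(days=3)
two_weeks_earlier = start - timedelta(weeks=2)

print(three_days_later)
print(two_weeks_earlier)
```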

## If, If... else, If... elif... else
Often we want to check if a condition is True and take one action if it is, and another action if the condition is False. We can achieve this in Python with an if statement.

**TIP**: You can use any expression that returns a boolean value (True or False) in an if statement. Common boolean operators are ==, !=, <, <=, >, >=. You can also use is and is not if you want to check if two variables are identical in the sense that they are stored in the same location in memory.
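
The difference between `==` and `is` can be sketched with two lists that hold the same items. The variable names are illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]
same = first

if first == second:
    print("first and second hold equal values")

if first is not second:
    print("but they are two separate objects")

if first is same:
    print("first and same are one object with two names")

equal_values = first == second
identical = first is second
```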

In [150]:
x = 11  # set x explicitly: earlier cells have reassigned it
if x != 1:
    print("x does not equal 1")

x does not equal 1


In [151]:
if 1 == 2:
    print("This will not be printed")
elif 1 > 3:
    print("This will not be printed either")
else:
    print("This *will* be printed")

This *will* be printed


## Loops

So far, everything that we've done could, in principle, be done by hand calculation. In this section and the next, we really start to take advantage of the power of programming languages to do things for us automatically.

We start here with ways to repeat yourself. The two most common ways of doing this are known as for loops and while loops. For loops in Python are useful when you want to cycle over all of the items in a collection (such as all of the elements of an array), and while loops are useful when you want to cycle for an indefinite amount of time until some condition is met.
The basic examples below will work for looping over lists, tuples, and arrays. Looping over dictionaries is a bit different, since there is a key and a value for each item in a dictionary. Have a look at the Python docs for more information.



In [152]:
# A basic for loop - don't forget the white space!
wordlist = ['hi', 'hello', 'bye']
for word in wordlist:
    print(word + '!')

hi!
hello!
bye!


**Note on indentation**: Notice the indentation once we enter the for loop. Every indented statement after the for loop declaration is part of the for loop. This rule holds true for while loops, if statements, functions, etc. Required indentation is one of the reasons Python is such a beautiful language to read.

If you do not have consistent indentation you will get an IndentationError. Fortunately, most code editors will ensure your indentation is correct.

**NOTE** In Python the convention is to use four (4) spaces for each level of indentation; most editors can be configured to follow this guideline.

In [153]:
for i in range(5):
    print(i)

0
1
2
3
4


In [154]:
for i in range(25, 30):
    print(i)

25
26
27
28
29


In [155]:
for i in range(30, 25, -1):
    print(i)

30
29
28
27
26


In [9]:
# building a new list from an existing list

a = [4, 5, 6, 7]

# create a new empty list
b = []

# add items to b
for x in a:
    b.append(x ** 2)

# let's see what's in b now
b

[16, 25, 36, 49]

In [157]:
# list comprehensions - a powerful feature of Python
b = [x ** 2 for x in a]

b

[16, 25, 36, 49]

In [158]:
# Sum all of the values in a collection using a for loop
numlist = [1, 4, 77, 3]

total = 0
for num in numlist:
    total = total + num
    
print("Sum is", total)

Sum is 85


In [159]:
# Often we want to loop over the indexes of a collection, not just the items
print(wordlist)

for i, word in enumerate(wordlist):
    print(i, word, wordlist[i])

['hi', 'hello', 'bye']
0 hi hi
1 hello hello
2 bye bye
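
When two collections need to be walked through side by side, the built-in `zip()` pairs up their items. The lists below are made up for the example.

```python
names = ['Anna', 'Luca', 'Marta']
heights = [1.65, 1.80, 1.72]

pairs = []
for name, height in zip(names, heights):
    pairs.append((name, height))
    print(name, height)
```

`zip()` stops at the end of the shorter collection, so it is worth checking that both have the same length.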


In [160]:
x = 1
while x ** 2 < 50:
    print(x)
    x = x + 1  # this can also be written as x += 1

1
2
3
4
5
6
7


In [161]:
# While loops are useful when you don't know how many steps you will need,
# and want to stop once a certain condition is met.
step = 0
prod = 1
while prod < 100:
    step = step + 1
    prod = prod * 2
    print(step, prod)
    
print('Reached a product of', prod, 'at step number', step)

1 2
2 4
3 8
4 16
5 32
6 64
7 128
Reached a product of 128 at step number 7


In [162]:
# iterate over a dictionary (keys, by default)

a = {'k': 3, 'q': 4}

for k in a:
    print(k)

k
q


In [163]:
# iterate over a dictionary by key and value
for k, v in a.items():
    print(k, v)

k 3
q 4


In [164]:
# like lists, dictionaries can be specified by comprehensions too

times_ten = {
    key: key*10
    for key in range(5)    
}
times_ten

{0: 0, 1: 10, 2: 20, 3: 30, 4: 40}
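
Comprehensions can also filter items with an `if` clause. A short sketch, with illustrative names:

```python
numbers = range(10)

# keep only the even numbers and square them
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

# the same idea for a dictionary: number -> square, for numbers above 6
large_squares = {n: n ** 2 for n in numbers if n > 6}

print(even_squares)
print(large_squares)
```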

### Exercise
We can now calculate the variance of the heights we collected before.

As a reminder, the sample variance is calculated from the sum of squared differences of each observation from the mean, divided by the number of observations minus one.
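
Written as a formula, for $n$ observations $x_1, \dots, x_n$ with mean $\bar{x}$ (standard notation, not used elsewhere in this Notebook):

$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$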

1. First, we need to calculate the mean: create a variable total for the sum of the heights. Using a for loop, add each height to total. Find the mean by dividing this by the number of measurements, and store it as mean. Note: To get the number of things in a list, use len(the_list).
2. Now we'll use another loop to calculate the variance: Create a variable sum_diffsq for the sum of squared differences. Make a second for loop over heights. At each step, subtract the mean from the height and call it diff. Square this and call it diffsq. Add diffsq on to sum_diffsq. After the loop, divide sum_diffsq by n-1 (the number of measurements minus one) to get the variance.
3. Display the variance.

Note: To square a number in Python, use **.

### Exercise
Write a Python program to iterate over the dictionary d = {'x': 10, 'y': 20, 'z': 30} using a for loop and print its key-value pairs.


## Functions, Modules and Classes

One way to write a program is to simply string together commands, like the ones described above, in a long file, and then to run that file to generate your results. This may work, but it can be cognitively difficult to follow the logic of programs written in this style. Also, it does not allow you to reuse your code easily: for example, what if we wanted to repeat the same calculation for several different choices of input values?

The most important way to "chunk" code into more manageable pieces is to create functions and then to gather these functions into modules, and eventually packages. Below we will discuss how to create functions and modules. 

A third common type of "chunk" in Python is classes, but we will not be covering object-oriented programming in this Notebook and we'll only mention them briefly later on.

### Functions



In [165]:
# We've been using functions all day
x = 3.333333
print(round(x, 2))
print(np.sin(x))

3.33
-0.190567635651


In [166]:

# It's very easy to write your own functions
def multiply(x, y):
    return x*y

In [167]:
# Once a function is "run" and saved in memory, it's available just like any other function
print(type(multiply))
print(multiply(4, 3))

<class 'function'>
12


In [168]:
# It's useful to include docstrings to describe what your function does
def say_hello(time, people):
    '''
    Function says a greeting. Useful for engendering goodwill
    '''
    return 'Good ' + time + ', ' + people

**Docstrings**: A docstring is a special type of comment that tells you what a function does. You can see them when you ask for help about a function.
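
A docstring is stored on the function itself, in its `__doc__` attribute, and `help()` prints it. The function below is a hypothetical example written only to show this.

```python
def rectangle_area(width, height):
    '''
    Return the area of a rectangle with the given width and height.
    '''
    return width * height

# help() prints the signature together with the docstring
help(rectangle_area)

# the raw text is also available as an attribute
docstring = rectangle_area.__doc__
print(docstring.strip())
```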

In [169]:
say_hello('afternoon', 'friends')

'Good afternoon, friends'

In [170]:
# Keyword arguments can be used to make some arguments optional by giving them a default value
# All mandatory arguments must come first, in order
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people


say_hello('afternoon')

'Good afternoon, friends'
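
Arguments can also be passed by name, which makes calls easier to read and lets you override a default. The sketch repeats the function above so that it runs on its own; the greetings are made up for the example.

```python
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people

default_greeting = say_hello('morning')                 # uses the default
custom_greeting = say_hello('evening', people='class')  # overrides it
named_greeting = say_hello(people='all', time='night')  # order is free when named

print(default_greeting)
print(custom_greeting)
print(named_greeting)
```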

---
### Exercise

Finally, let's turn our variance calculation into a function that we can use over and over again. Copy your code from the variance exercise above into the box below, and do the following:
1. Turn your code into a function called calculate_variance that takes a list of values and returns their variance.
2. Write a nice docstring describing what your function does.
3. In a subsequent cell, call your function with different sets of numbers to make sure it works.
4. Refactor your function by pulling out the section that calculates the mean into another function, and calling that inside your calculate_variance function.
5. Make sure it works properly when all the data are integers as well.
6. Give a better error message when it's passed an empty list. Use the web to find out how to raise exceptions in Python.

### Exercise

We can make our functions more easily reusable by placing them into modules that we can import, just like we have been doing with numpy. It's pretty simple to do this.
1. Copy your function(s) into a new text file, in the same directory as this notebook, called stats.py.
2. In the cell below, type import stats to import the module. Type stats. and hit tab to see the available functions in the module. Try calculating the variance of a number of samples of heights (or other random numbers) using your imported module.

---

## Classes

Let's start with some terminology:
* A class is a collection of related functions, and these are called the methods of the class.
* The methods act on instances of the class.
* An instance is basically a collection of related data.
* Each data item has a name, and those names are called the attributes of the class.

In summary, classes are collections of functions that operate on a data set, and instances of that class represent individual data sets.

Methods always take the instance, conventionally named self, as the first argument, and fall into two categories:
* Normal methods which use or modify the instance attributes
* Special methods, which define the class's behaviour: you can spot these because they start and end with two underscores __

There are lots of other things you can do with classes, but this is enough for now.

Let's take a look at an example:


In [171]:
from datetime import date

class Person(object):
    
    def __init__(self, name, date_of_birth):
        self.name = name
        self.date_of_birth = date_of_birth
        
    def age(self):
        today = date.today()
        age = today.year - self.date_of_birth.year
        # subtract one year if this year's birthday has not happened yet
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            age -= 1
        return age


In [172]:
p = Person("Andrea", date(1999, 12, 19))

In [173]:
p

<__main__.Person at 0x61eb668>

In [174]:
print(p.name)
print(p.date_of_birth)
print(p.age())

Andrea
1999-12-19
17


In [175]:
def print_age(person):
    print("%s is %s years old right now" % (person.name, person.age()))

print_age(p)

Andrea is 17 years old right now
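
The output <__main__.Person at 0x61eb668> shown earlier is the default representation of an instance. As a sketch of how a special method changes the class's behaviour, the hypothetical subclass DescribedPerson below (not part of the original notebook) adds a __repr__ method. A minimal Person is repeated so the block is self-contained.

```python
from datetime import date


class Person(object):

    def __init__(self, name, date_of_birth):
        self.name = name
        self.date_of_birth = date_of_birth


class DescribedPerson(Person):

    # __repr__ is the special method used when an instance is displayed
    def __repr__(self):
        return "DescribedPerson(%r, %r)" % (self.name, self.date_of_birth)


q = DescribedPerson("Andrea", date(1999, 12, 19))
print(repr(q))
```

Typing q alone in a notebook cell would now display the string returned by __repr__ instead of a memory address.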


## File Input/Output

Reading and writing files are key tasks in data analysis. The Python Standard Library provides two methods for this: write() and read(). These methods belong to the file object returned by the built-in open() function, which takes the file name and some additional arguments. Two arguments worth mentioning are the mode argument and the encoding argument. The mode is built from the following characters:

  Mode  | Meaning
  ------------- | -------------
  'r' | open for reading (default)
  'w' | open for writing, truncating the file first
  'x' | open for exclusive creation, failing if the file already exists
  'a' | open for writing, appending to the end of the file if it exists
  'b' | binary mode
  't' | text mode (default)
  '+' | open a disk file for updating (reading and writing)
  
Encoding is the name of the encoding used to decode or encode the file. The default encoding is platform dependent; to check it, run the following lines:

In [28]:
import locale
locale.getpreferredencoding()

'cp1252'

**BE AWARE**: Nowadays UTF-8 is the recommended encoding (we are using a Linux Virtual Machine, and on Linux UTF-8 is the default encoding). You may still come across text files in a different encoding and need to convert them to UTF-8 so that they are represented consistently on your system. The module [codecs](https://docs.python.org/3/library/codecs.html#module-codecs) is very helpful in this case. 

I don't want to go into too much detail on this topic; you can read a gentle [introduction to character sets here](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/).
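
The mode and encoding arguments described above can be tried out safely in a temporary directory. The following is a minimal sketch, not part of the original notebook: the file name modes_demo.txt and the variable names are hypothetical, and the encoding is passed explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")  # hypothetical file name

# 'w' creates the file, or truncates it if it already exists
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")

# 'a' appends to the end instead of truncating
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# 'x' refuses to open a file that already exists
try:
    open(path, "x", encoding="utf-8")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# 'r' is the default mode, so it can be omitted when reading
with open(path, encoding="utf-8") as f:
    content = f.read()

print(content)
print(exclusive_failed)
```

Reading a file with the same encoding that was used to write it is what keeps non-ASCII characters intact.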

In [26]:
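file = open("newfile.txt", "w")
# write() does not add a line terminator, so include "\n" where a line should end
file.write("hello world in the new file\n")
file.write("and another line")
print(type(file))
file.close()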

<class '_io.TextIOWrapper'>


In [5]:
file = open("newfile.txt", "r")
print(file.read())
file.close()

hello world in the new file
and another line


In [4]:
file = open("newfile.txt", "w")
file.write("hello world in the new file\n")
file.write("and another line")
file.close()

In [7]:
file = open("counts_file.txt", "w")
for i in range(10):
    file.write("%s\n" % i)
file.close()

In [8]:
file = open("counts_file.txt", "r")
print(file.read())
file.close()

0
1
2
3
4
5
6
7
8
9



In [9]:
file = open("multicols_file.txt", "w")
for i in range(10):
    for j in range(5):
        file.write("%s\t" % j)
    file.write("\n")
file.close()

In [10]:
file = open("multicols_file.txt", "r")
print(file.read())
file.close()

0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	
0	1	2	3	4	



### Exercise
Add a column containing the row count to the file

In [21]:
# readline() method

file = open('counts_file.txt', 'r')
print(file.readline())

0



In [23]:
# each call to readline() returns the next line of the file
print(file.readline())
print(file.readline())
print(file.readline())

1

2

3



In [24]:
print(file.readlines()) #read the remaining lines and return a list
file.close()

['4\n', '5\n', '6\n', '7\n', '8\n', '9\n']

In [29]:
file = open('counts_file.txt', 'r')

for line in file: #file is iterable
    print(line)

file.close()

0

1

2

3

4

5

6

7

8

9
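
The blank lines in the output above appear because each line read from the file still ends with a newline character and print() adds one of its own. Below is a minimal sketch of two common ways to avoid this. It builds a small stand-in file in a temporary directory (the name counts_demo.txt is hypothetical) so that it does not depend on counts_file.txt.

```python
import os
import tempfile

# Build a small stand-in for counts_file.txt in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "counts_demo.txt")  # hypothetical name
with open(path, "w") as f:
    for i in range(3):
        f.write("%s\n" % i)

stripped = []
with open(path) as f:
    for line in f:
        # Option 1: tell print() not to add its own newline
        print(line, end="")
        # Option 2: remove the trailing newline from the string itself
        stripped.append(line.rstrip("\n"))

print(stripped)
```

The second option is usually the more convenient one when the lines are going to be processed further rather than just displayed.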



In [31]:
with open("multicols_file.txt") as f:
    for line in f:
        print(line)
#you don't need to close the file explicitly 

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	

0	1	2	3	4	
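
The remark that the file does not need to be closed explicitly can be checked through the closed attribute of the file object. The sketch below is an addition for illustration; the file name with_demo.txt and the variable names are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical name

with open(path, "w") as f:
    f.write("some text\n")
    closed_inside = f.closed   # the file is still open inside the block

closed_after = f.closed        # it is closed automatically on exit

# The file is closed even if the block raises an exception
try:
    with open(path) as g:
        raise ValueError("something went wrong")
except ValueError:
    closed_after_error = g.closed

print(closed_inside, closed_after, closed_after_error)
```

This is the main reason to prefer the with statement over calling close() by hand: the file is released even when an error interrupts the code.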



In [32]:
with open('multicols_file.txt', 'r') as f:
    lines = f.readlines()

    for line in lines:
        numbers = line.split() #create a list of strings
        print(numbers)

['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
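
Since split() returns strings, numeric work needs a conversion step. The sketch below assumes a tab-separated file like the one above. It recreates a shorter version in a temporary directory (the name multicols_demo.txt is hypothetical) so that the block runs on its own.

```python
import os
import tempfile

# Recreate a small tab-separated file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "multicols_demo.txt")  # hypothetical name
with open(path, "w") as f:
    for i in range(3):
        f.write("\t".join(str(j) for j in range(5)) + "\n")

rows = []
with open(path) as f:
    for line in f:
        # split() with no argument splits on any whitespace, tabs included
        rows.append([int(value) for value in line.split()])

row_sums = [sum(row) for row in rows]
print(rows)
print(row_sums)
```

Once the values are integers, they can be passed to functions such as sum() or to the variance function from the earlier exercise.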




## Python Magic Commands
For an overview of the IPython magic commands, see [this notebook](http://nbviewer.jupyter.org/github/jdwittenauer/ipython-notebooks/blob/master/notebooks/language/IPythonMagic.ipynb).