---

_This notebook will guide you through learning the basics of Python and Python Notebooks (Jupyter)._

_IF you are reading this using the course's JupyterHub, remember that you can always update to the newest set of material for the course by opening the following URL while you are logged in to your account:_

[Click here to update if you are logged in to the course's JupyterHub](https://tinyurl.com/ds2020-21)

_IF you are using your own installation, then download the material manually from the Campus Virtual of the course:_

[Campus virtual](https://e-aules.uab.cat/2020-21/course/view.php?id=45842)

---

# Introduction to Python and Python Notebooks

<a href="http://python.org"> Python </a> is a high-level, general-purpose programming language and it offers several tools that are broadly used by the scientific community.

<a href="http://ipython.org/">IPython</a> is an interactive interface, letting you quickly process data and test ideas. 

The Jupyter (formerly IPython) **notebook** works in your web browser and lets you document your computations in an easily reproducible form. It is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media.

## Why is it so popular?

Python is well suited to quick prototyping: it is interpreted and dynamically typed, and its high-level syntax is easy to read.

Python incorporates documentation directly into the language itself. Moreover, since there is a large community of users and developers, it is easy to find help, recipes, and code snippets online.

It is Free software / Open source.

It runs natively on Windows, macOS, Linux, and others.

![Programming is fun again!](python_comic.png "Programming is fun again!")

<!---
See ... <a href="http://www.xkcd.com/353/">Programming is fun again!</a>
-->

# Installing Python

Use <a href="https://www.anaconda.com/distribution/">Anaconda</a>: A free Python distribution which supports Linux, Windows and Mac.

Anaconda includes Python and analytic Python packages such as NumPy, SciPy, Matplotlib, and IPython that we will be using in this course.

# Using the Jupyter notebook

The Jupyter notebook allows you to include text, code, and plots in the same document.

This makes it ideal, for example, for writing up a report about a Python project and sharing it with others.

**Starting up**
If you installed Anaconda locally, you can launch the Jupyter Notebook App from the Jupyter Notebook icon in the start menu (Windows), through the Anaconda Navigator graphical interface, or by typing in a terminal:

<code> jupyter notebook </code>

Running the Jupyter Notebook application will automatically start a browser and open the localhost URL of the notebook interface.

**First steps**
At first glance, a notebook looks like a fairly typical application - it has a menubar (File, Edit, View, etc.) and a tool bar with icons. Below this, you will see an empty cell, in which you can type any Python code. You can write several lines of code, and once it is ready to run, you can press Shift+Enter and it will get executed:

In [None]:
print("Hello world! This is the Data Science!")

You can then click on that cell, change the Python code, and press Shift+Enter again to re-execute the code.

You can insert/copy/paste cells at any point of your notebook using the "Insert" and "Edit" menu entries.

There are basically two types of cells: code cells and text cells. You can change the type of a cell by using the "Cell" menu. Text cells can be just raw text or formatted using Markdown syntax. Markdown syntax allows you to render formatted text (e.g. headings, bold text, lists, etc.).

**Tips**

Save often! The notebook autosaves periodically, but changes made since the last save can be lost if you close the browser window. You can save your notebook at any time using the "File" menu or by pressing Ctrl+S.

If you need more info about the Jupyter notebook, check the official documentation <a href="http://ipython.org/notebook.html">here</a>.

# The Python Programming Language

_Remember that code cells are run by pressing Shift+Enter or using the play button in the toolbar._

## Variables

Let's define some variables and give them a value

In [None]:
x = 1
y = 2
x + y

Typing just the name of a variable as the last line of a cell will display its contents

In [None]:
x

Although it is better practice to use the `print` function for that

In [None]:
print(x)

Note that we never told Python what kind of value we wanted to store in each variable. Python is a *dynamically typed* language, which means that it decides on the type of the contents of a variable at run time. Let's see what Python decides to use in different situations

Use `type()` to return the object's type

In [None]:
x = 1.0
type(x)

In [None]:
x = 1
type(x)

In [None]:
type(1.0)

In [None]:
type(1)

In [None]:
type('This is a string')

In [None]:
type(None)

### Operators

In [None]:
1 + 2, 1 - 2, 1 * 2, 1 / 2

Floor division: the `//` operator divides and rounds the result down to the nearest whole number. With float operands the result is still a float (here `1.0`); with two integers the result is an integer

In [None]:
3.0 // 2.0
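
As a small sketch of how `//` behaves with different operand types (the variable names are only for illustration), note that the result is rounded down, and that its type follows the operands:

```python
# Floor division rounds the quotient down (towards negative infinity).
int_result = 7 // 2         # both operands are int, so the result is an int
float_result = 7.0 // 2     # one operand is a float, so the result is a float
negative_result = -7 // 2   # rounds down to -4, not towards zero

print(int_result, type(int_result))
print(float_result, type(float_result))
print(negative_result)
```

If you need a true integer from float operands, you can wrap the result in `int()`.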

Raise to a power

In [None]:
2 ** 2

Boolean and comparison operators

In [None]:
True and False

In [None]:
not False

In [None]:
True or False

In [None]:
2 > 1, 2 < 1, 2 > 2, 2 < 2, 2 >= 2, 2 <= 2

Equality

In [None]:
[1,2] == [1,2]

## Using Python modules

We can extend Python's functionality by importing modules and using the functions in them. Let's import a standard Python module called *math*

In [None]:
import math

Once imported, we can call the functions inside the module by preceding them with the name of the module like this:

In [None]:
x = math.cos(2 * math.pi)
print(x)

We can give a different name to the module we import if we want to

In [None]:
import math as m

In [None]:
x = m.cos(2 * m.pi)
print(x)

We can import just the function(s) we need, or import the whole module (all functions) into the current namespace instead. If we do this, then we do not need to precede function calls with the name of the module.

In [None]:
from math import *
x = cos(2 * pi)
print(x)
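
The wildcard form above brings every name of the module into the current namespace, which can silently overwrite names you already use. A more conservative sketch is to import only the names you need:

```python
# Import just the two names we use instead of everything in the module.
from math import cos, pi

x = cos(2 * pi)
print(x)
```

This keeps it obvious where `cos` and `pi` come from when you read the code later.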

There are several ways to look at documentation for a module. To get a list of the names (functions and constants) defined in a module we can use the <code>dir()</code> function

In [None]:
print(dir(math))

To get specific help on a function we can use the <code>help()</code> function

In [None]:
help(math.cos)

## Control Flow

In [None]:
statement1 = False
statement2 = False

if statement1:
    print("statement1 is True")
elif statement2:
    print("statement2 is True")
else:
    print("statement1 and statement2 are False")

## Defining our own Functions

Let's define a new function that takes two numbers and adds them together. Let's call it `add_numbers`

In [None]:
def add_numbers(x, y):
    return x + y

add_numbers(1, 2)

Note how we used **indentation** to define what is inside the body of the function.

Python programs get structured through indentation, i.e. code blocks are defined by their indentation. In Python, using indentation correctly is a language requirement, not just a matter of style. This principle makes it easier to read and understand other people's Python code.

So, how does it work? All statements with the same distance to the right belong to the same block of code, i.e. the statements within a block line up vertically. The block ends at a line less indented or the end of the file. If a block has to be more deeply nested, it is simply indented further to the right.

There is another aspect of structuring in Python which you can see in the example. The header line of a function definition (as well as of loops, conditional statements and other structures introducing blocks) ends with a **colon** ":". This signals that a block of code is expected to follow.

So, in Python we structure code by colons and indentation.
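
To make the nesting rules concrete, here is a minimal sketch with three levels of indentation. The function `count_even` is a made-up example, not part of any library:

```python
def count_even(numbers):
    # Level 1: everything indented once belongs to the function body.
    count = 0
    for n in numbers:
        # Level 2: the body of the for loop.
        if n % 2 == 0:
            # Level 3: runs only when the condition holds.
            count += 1
    # Back at level 1: the loop has ended, but we are still inside the function.
    return count

result = count_even([1, 2, 3, 4, 6])
print(result)
```

Each header line (`def`, `for`, `if`) ends with a colon, and the block it introduces is the run of lines indented one step further.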

Let's update `add_numbers` to take an optional third parameter. The way to indicate that a parameter is optional is to give it a default value (here we specify that the default value of <code>z</code> is `None`).

In [None]:
def add_numbers(x,y,z=None):
    if z is None:
        return x+y
    else:
        return x+y+z

print(add_numbers(1, 2))
print(add_numbers(1, 2, 3))
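
Optional parameters can also be passed by name. The sketch below redefines the same function and checks for the default with `is None`, the usual idiom for comparing against `None`, so that a legitimate value such as `0` is not mistaken for "not given":

```python
def add_numbers(x, y, z=None):
    # `z is None` is true only when the caller did not supply a third value.
    if z is None:
        return x + y
    return x + y + z

two_args = add_numbers(1, 2)          # z falls back to its default
three_args = add_numbers(1, 2, z=3)   # z passed by keyword
zero_third = add_numbers(1, 2, z=0)   # 0 is a real value, not "missing"

print(two_args, three_args, zero_third)
```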

We can assign the function `add_numbers` to a variable `a`. Note that this assigns the function itself, not the result of calling it, so we can then use `a` as an alias for the function

In [None]:
def add_numbers(x,y):
    return x+y

a = add_numbers
a(1,2)

In [None]:
type(a)

Look how this is different from assigning the *result* of a function call to a variable

In [None]:
def add_numbers(x,y):
    return x+y

a = add_numbers(1,2)
a

In [None]:
type(a)

Remember how we used the `help()` function to get some help about a function? We can enable this for our function as well, by adding a doc string

In [None]:
def add_numbers(x,y):
    """
    Adds two numbers together
    """
    
    return x+y

In [None]:
help(add_numbers)

## Tuples

Tuples are an immutable data structure (they cannot be altered). We use tuples to group objects together. Note that they can hold objects of different types. We define tuples using parentheses

In [None]:
x = (1, 'a', 2, 'b')

print(x)
type(x)

To access one of the elements of a tuple, use square brackets. Remember that the indexing in Python starts from zero.

In [None]:
print(x[0])
print(x[3])

But we cannot change the contents of a tuple once a tuple has been created. The following line will raise an error.

In [None]:
x[1] = 10

Another useful operation is "unpacking", which refers to assigning the items in a tuple to individual variables 

In [None]:
point = (10, 20)

x, y = point

print("x =", x)
print("y =", y)
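
Unpacking is also a common way to return several values from a function, or to swap two variables. A small sketch, where `min_and_max` is a hypothetical helper written for this example:

```python
def min_and_max(values):
    # The comma builds a tuple, so the function returns (smallest, largest).
    return min(values), max(values)

# Unpack the returned tuple into two variables.
lowest, highest = min_and_max([4, 1, 7, 3])
print(lowest, highest)

# Swap two variables without a temporary one.
a, b = 1, 2
a, b = b, a
print(a, b)
```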

## Lists

Lists are a mutable data structure. Note that they can hold values of different types. We define lists using square brackets

In [None]:
x = [1, 'a', 2, 'b']
print(x)
type(x)

<br>
To access one of the elements of a list, use square brackets. Remember that the indexing in Python starts from zero.

In [None]:
print(x[0])
print(x[3])

<br>
In the case of lists we can change the values of individual elements

In [None]:
print(x)

x[2] = True

print(x)

<br>

Use `append()` to append an object to a list.

In [None]:
x.append(3.3)
print(x)

<br>

Another way to add elements to a list is by using the `insert()` method, specifying at what position to add each new object

In [None]:
l = ['s', 'c', 'i', 'e', 'n', 'c', 'e']
print(l)

l.insert(0, 'd')
l.insert(1, "a")
l.insert(2, "t")
l.insert(3, "a")
l.insert(4, " ")

print(l)

<br>

The function `range()` is a handy way to generate a range of integers that you can then use to fill in a list.

(The `range` function does not directly return a list; it returns a lazy `range` object that produces its values on demand. This is why we use the `list()` function to turn it into a list.)

In [None]:
start = 10
stop = 30
step = 2

# build a list from the values produced by range
l = list(range(start, stop, step))

print(l)
type(l)
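
As a quick sketch of what the object returned by `range` can do on its own, without being converted to a list first:

```python
r = range(10, 30, 2)

print(type(r))       # a range object, not a list
print(len(r))        # the number of values it will produce
print(r[0], r[-1])   # it supports indexing like a list
print(24 in r)       # and membership tests

# Convert to a list only when you actually need one.
as_list = list(r)
print(as_list)
```

Because the values are produced on demand, even a very large range takes very little memory.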

<br>

We can create an empty list like this

In [None]:
l = []
print(l)

<br>

Use `+` to concatenate lists.

In [None]:
[1,2] + [3,4]

<br>

Use `*` to repeat lists.

In [None]:
[1]*3

<br>

Use the `in` operator to check if something is inside a list.

In [None]:
1 in [1, 2, 3]

## Loops

This is a simple loop in Python. In every iteration `x` gets the next value in the range

In [None]:
for x in range(4):
    print(x)

<br>

In the same way we went through the numbers of a range, we can go through the items of any list (or any other iterable). This is an example of how to loop through each item in a list

In [None]:
x = [1, 'd', 5, True, 5.3]

for item in x:
    print(item)

<br>

Here is a different way, using the indexing operator and a while statement. The function `len()` returns the length (number of items) of a list

In [None]:
i=0
length = len(x)

while i < length:
    print(x[i])
    i = i + 1
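
When you need both the index and the item, a third option is the built-in `enumerate()`, which avoids managing the counter by hand. A minimal sketch (the `pairs` list exists only to keep the results for inspection):

```python
x = [1, 'd', 5, True, 5.3]

pairs = []
for i, item in enumerate(x):
    # enumerate yields (index, item) tuples, unpacked here into i and item.
    print(i, item)
    pairs.append((i, item))
```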

## Slicing

Imagine you have the list `['a', 'b', 'c', 'd', 'e']` and want to cut out a part of this list, e.g. the middle three elements from 'b' to 'd'. One way to do it would be with a for loop

In [None]:
full_list = ['a', 'b', 'c', 'd', 'e']
new_list = []

for x in range(1, 4):
    new_list.append(full_list[x])

print(new_list)

That works, but it is verbose. The Pythonic way of doing this is to use slicing, like this

In [None]:
new_list = full_list[1:4]

print(new_list)

The two numbers inside the square brackets indicate the start and end index for the slicing. In the above case, this means to start slicing from element 1 and end at element 4, but not include it.

The colon in the middle is how Python understands that we want to use slicing to get objects in the list.
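
One detail worth knowing is that slicing a list creates a new list rather than a view on the original. The following sketch illustrates this by modifying the slice and checking that the original is untouched:

```python
full_list = ['a', 'b', 'c', 'd', 'e']
new_list = full_list[1:4]

# Changing the slice does not affect the list it was taken from.
new_list[0] = 'X'

print(new_list)
print(full_list)
```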

We can do the same with tuples

In [None]:
myTuple = (1, 2, 3, 4, 5, 6, 7)

print(myTuple[2:4])

There is also an optional third value, the step, which we can add to a slice after a second colon. It sets by how much the index increments between the start and end indices that we've indicated

In [None]:
myList = list(range(0, 20))
print(myList)

#now slice from the beginning until the end, but select one every two elements
newList = myList[0:len(myList):2]
print(newList)

<br>

Slices have default values. If there is no value before the first colon, the slice starts at the beginning of the list. If there is no value after the first colon, the slice goes all the way to the end of the list. This saves us from having to manually specify `len(myList)` as the ending index

In [None]:
#now slice from the beginning until the end, but select one every two elements
newList = myList[::2]
print(newList)

<br>

We can also slice in reverse, starting from the end. This is achieved by specifying a negative step. The -1 in the following example means to increment the index every time by -1, which effectively will traverse the list by going backwards.

In [None]:
#now reverse the list
newList = myList[::-1]
print(newList)

## Strings

<br>

Now let's look at strings. You can use either single quotes or double quotes to define strings

In [None]:
s = 'Hello world'
type(s)

<br>

Use bracket notation to slice a string, much as you would do with lists or tuples. You can think of a string as a list of characters

In [None]:
x = 'This is a string'
print(x[0]) #first character
print(x[0:1]) #first character, but we have explicitly set the end character
print(x[0:2]) #first two characters


<br>
This will return the last element of the string.

In [None]:
x[-1]

<br>
This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end.

In [None]:
x[-4:-2]

<br>
This is a slice from the beginning of the string, stopping before index 3 (i.e. the first three characters).

In [None]:
x[:3]

<br>
And this is a slice starting from index 3 (the 4th character) and going all the way to the end.

In [None]:
x[3:]

<br>

This will select one every two characters

In [None]:
x[::2]

<br>

Print out all the characters in a string

In [None]:
for character in x:
    print(character)

<br>

You can concatenate strings using `+` and repeat strings using `*`, just as you would with lists

In [None]:
word1 = 'Data'
word2 = 'Science'

courseName = word1 + ' ' + word2
print (courseName)

print (word1 * 3)

You can check whether a substring appears in a string using the `in` keyword, like in lists

In [None]:
print('Data' in courseName)

<br>

`split` returns a list of all the words in a string, or a list split on a specific character.

In [None]:
aString = 'Data Science is cool'
listOfWords = aString.split(' ')

print(listOfWords[0])
print(listOfWords[-1])
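
The counterpart of `split` is the string method `join`, which glues a list of strings back together using the string it is called on as the separator. A short sketch:

```python
aString = 'Data Science is cool'
listOfWords = aString.split(' ')

# Join the words back with a space, or with any other separator.
rejoined = ' '.join(listOfWords)
dashed = '-'.join(listOfWords)

print(rejoined)
print(dashed)
```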

<br>

To replace a substring with another, use `replace()`

In [None]:
aString = 'Data Engineering is cool'

s2 = aString.replace("Engineering", "Science")
print(s2)

<br>

Make sure you convert objects to strings before concatenating. The following line will cause an error

In [None]:
'Data Science - Lecture ' + 1

In [None]:
'Data Science - Lecture ' + str(1)

Strings can be formatted before printing, either using C-style formatting with the `%` operator

In [None]:
aString = 'value1 = %f, value2 = %d' % (3.1415, 1)
print(aString) 

Or with the `format()` method, an alternative, more intuitive way of formatting a string that is specific to Python

In [None]:
aString = 'value1 = {0}, value2 = {1}'.format(3.1415, 1)
print(aString)
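
A third option, assuming you are running Python 3.6 or newer, is the formatted string literal (f-string), where expressions are written directly inside the braces. The variable names below are only for illustration:

```python
value1 = 3.1415
value2 = 1

# The f prefix lets us embed expressions; :.2f rounds to two decimals.
aString = f'value1 = {value1:.2f}, value2 = {value2}'
print(aString)
```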

## Dictionaries

<br>

Dictionaries associate keys with values. In the following example, names are the keys, and emails are the associated values

In [None]:
x = {'Dimosthenis Karatzas': 'dimos@cvc.uab.es', 'Guybrush Threepwood': 'guybrushg@monkey.island.com'}
print(x)

<br>

Retrieve a value by using the indexing operator

In [None]:
print(x['Dimosthenis Karatzas'])
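
Indexing with a key that is not in the dictionary raises a `KeyError`. A minimal sketch of two ways to guard against that, using a separate dictionary `contacts` and a made-up missing name so that it does not interfere with `x`:

```python
contacts = {'Guybrush Threepwood': 'guybrushg@monkey.island.com'}
missing_name = 'Elaine Marley'  # hypothetical key that is not in the dictionary

# Option 1: test for the key with the `in` operator.
is_known = missing_name in contacts

# Option 2: use get(), which returns a default instead of raising KeyError.
email = contacts.get(missing_name, 'no email on record')

print(is_known)
print(email)
```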

<br>

You can add new elements in a dictionary in the following way

In [None]:
x['Nils Holgersson'] = 'nils@gmail.com'
print(x['Nils Holgersson'])

<br>
Iterate over all of the keys and print the values associated with each key:

In [None]:
for name in x:
    print(x[name])

<br>
Iterate directly over all of the values:

In [None]:
for email in x.values():
    print(email)

<br>
Iterate over all of the items (key-value pairs) in the dictionary:

In [None]:
for name, email in x.items():
    print('Name: {0}, email: {1}'.format(name, email))

## List Comprehensions

List comprehensions provide a concise way to create new lists by selecting and transforming items from existing lists. Instead of a cumbersome `for` statement like this:

<code>
new_list = []
for i in old_list:
    if filter(i):
        new_list.append(expression(i))
</code>

<br>

You can obtain the same thing using list comprehension:

<code>
new_list = [expression(i) for i in old_list if filter(i)]
</code>

<br>

Here is an example. Imagine you want to square all numbers in a list

In [None]:
l = [1, 13, 3, 6, 17, 3, 20]

lSquare = [x**2 for x in l]
print(lSquare)

<br>

Suppose now you want to select and square only the numbers above 10. We can add a filter in the list comprehension 

In [None]:
lSquare = [x**2 for x in l if x>10]
print(lSquare)
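
To connect this back to the general pattern, the sketch below writes the same filter-and-square operation both ways and checks that the results agree (the names `loop_result` and `comprehension_result` are just for this comparison):

```python
l = [1, 13, 3, 6, 17, 3, 20]

# Explicit loop: filter, transform, append.
loop_result = []
for x in l:
    if x > 10:
        loop_result.append(x**2)

# Equivalent list comprehension.
comprehension_result = [x**2 for x in l if x > 10]

print(loop_result)
print(loop_result == comprehension_result)
```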

<br>

The function `zip` takes N lists (typically of the same length) like

`a: a1 a2 a3 a4 a5 a6 a7...`<br>
`b: b1 b2 b3 b4 b5 b6 b7...`<br>
`c: c1 c2 c3 c4 c5 c6 c7...`

and "zips" them into a sequence whose entries are N-tuples (ai, bi, ci); wrapping the result in `list()` gives a list. Imagine drawing a zipper horizontally from left to right.

In [None]:
l = [1, 13, 3, 6, 17, 3, 20]

lSquare = [x**2 for x in l]

zipped = list(zip(l, lSquare))

print(l)
print(lSquare)
print(zipped)

<br>

`zip()` in conjunction with the `*` operator can be used to unzip a list

In [None]:
l2, lSquare2 = zip(*zipped)

print(l2)
print(lSquare2)
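
Two further uses of `zip` that often come up, sketched here with made-up data: building a dictionary from a list of keys and a list of values, and zipping lists of different lengths, in which case `zip` stops at the shortest one.

```python
names = ['a', 'b', 'c']
values = [1, 2, 3]

# Pair each name with its value and turn the pairs into a dictionary.
lookup = dict(zip(names, values))
print(lookup)

# With unequal lengths, the extra items of the longer list are ignored.
truncated = list(zip([1, 2, 3], ['x', 'y']))
print(truncated)
```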

## Dates and Times

To work with dates and times, we need to import the corresponding modules

In [None]:
import datetime as dt
import time as tm

<br>

`time()` returns a *timestamp* of the current time: the current time in seconds since the Epoch (January 1st, 1970)

In [None]:
tm.time()

<br>
Convert the timestamp to datetime.

In [None]:
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

<br>
Handy datetime attributes:

In [None]:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second # get year, month, day, etc. from a datetime

<br>

`timedelta()` represents a duration, expressing the difference between two dates

In [None]:
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta

<br>

`date.today` returns the current local date

In [None]:
today = dt.date.today()
print(today)

We can add and subtract dates

In [None]:
print(today - delta) # the date 100 days ago
print(today + delta) # the date 100 days from now

And we can compare dates

In [None]:
today > today - delta # compare dates
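
Subtracting one date from another gives a `timedelta`, and `strftime()` turns a date into a formatted string. The sketch below uses two fixed, arbitrarily chosen dates so that the result does not depend on the day you run it:

```python
import datetime as dt

start = dt.date(2020, 9, 14)   # arbitrary example dates
end = dt.date(2020, 12, 23)

# The difference between two dates is a timedelta.
elapsed = end - start
print(elapsed.days)

# Format a date as day/month/year.
formatted = end.strftime('%d/%m/%Y')
print(formatted)
```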

# Reading and Writing CSV files

<br>

The datafile *mpg.csv* contains fuel economy data for 234 cars. Feel free to open the file with any text editor to see what the contents look like.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

Let's import our datafile. To do this, we will use the `csv` module and its `DictReader()` class, which reads each row of the file into a dictionary whose keys are given by the values in the first row of the file (the *header* row). Wrapping it in `list()` gives us a list of dictionaries, one per row.

In [None]:
import csv

with open('mpg.csv', newline='') as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[:3] # The first three dictionaries in our list, corresponding to the first three lines
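
If you want to experiment with `DictReader` without the datafile at hand, the following sketch reads from an in-memory string instead. The two data rows are invented for illustration and are not meant to reproduce the contents of *mpg.csv*:

```python
import csv
import io

# A tiny made-up CSV: the first line is the header row.
sample = io.StringIO(
    "manufacturer,model,cty,hwy\n"
    "audi,a4,18,29\n"
    "ford,mustang,15,23\n"
)

rows = list(csv.DictReader(sample))

print(rows[0]['model'])        # look up a column by its header name
print(type(rows[0]['cty']))    # every value is read as a string
```

Note that numeric columns come back as strings, so they need converting before doing arithmetic.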

<br>
`csv.DictReader` has read in each row of our csv file as a dictionary. `len` shows that our list comprises 234 dictionaries (234 rows in the file).

In [None]:
len(mpg)

<br>
`keys` gives us the column names of our csv. The keys are repeated in every row, so we can check any of the rows in this case to see them.

In [None]:
mpg[0].keys()

<br>
This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we need to convert to float.

In [None]:
sum(float(d['cty']) for d in mpg) / len(mpg)

<br>
Similarly this is how to find the average hwy fuel economy across all cars.

In [None]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

<br>

Use `set` to return the unique values for the number of cylinders the cars in our dataset have.

In [None]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

<br>
Here's a more complex example where we group the cars by number of cylinders and find the average cty mpg for each group.

In [None]:
CtyMpgByCyl = []

for c in cylinders: # iterate over all the unique cylinder levels we found before
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl

<br>

The same example, but with the inner `for` written as a list comprehension

In [None]:
CtyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    a = [float(d['cty']) for d in mpg if d['cyl'] == c]
    CtyMpgByCyl.append((c, sum(a)/len(a))) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl
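
Both versions above scan the whole dataset once per cylinder level. As an alternative sketch, a dictionary of running totals lets us group in a single pass. The `rows` list below is a small made-up stand-in for `mpg`, so that the example runs on its own:

```python
# Made-up rows with the same string-valued layout as the csv data.
rows = [
    {'cyl': '4', 'cty': '20'},
    {'cyl': '4', 'cty': '22'},
    {'cyl': '6', 'cty': '16'},
]

# Map each cylinder level to a (running sum, count) pair.
totals = {}
for d in rows:
    total, count = totals.get(d['cyl'], (0.0, 0))
    totals[d['cyl']] = (total + float(d['cty']), count + 1)

# Turn the running totals into averages.
averages = {cyl: total / count for cyl, (total, count) in totals.items()}
print(averages)
```

With the real data you would loop over `mpg` instead of `rows`.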

<br>
Use `set` to return the unique values for the class types in our dataset.

In [None]:
vehicleclass = set(d['class'] for d in mpg) # what are the class types
vehicleclass

<br>
And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.

In [None]:
HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass

Can you rewrite this, but with the inner for written as a list comprehension?