# Python basics 2

This notebook contains more basics of Python. Use it as a reference whenever needed.

In [1]:
## Python Syntax

In [None]:
### The Significant Whitespace

Most programming languages use characters (e.g. `{...}`) or keywords (e.g. `begin ... end`) to delimit blocks of code.

In [None]:
#### When writing Python code, you rely on INDENTATION to structure your programs. 

All programming languages allow you to indent (and you should!), but in Python you **have to.**

Otherwise, you'll receive an `IndentationError` and your code won't work.
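
A minimal sketch of what this error looks like. It passes a badly indented snippet to the built-in `compile()` function, so that the error can be caught instead of stopping the notebook; the names `bad_source` and `caught` are made up for this illustration.

```python
# a snippet whose print line is NOT indented under the if
bad_source = "if True:\nprint('hello')\n"

caught = False
try:
    # compile() only parses the code, it does not run it
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    caught = True
    print("IndentationError:", err.msg)
```

Indenting the `print` line by four spaces makes the same snippet compile without complaints.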

In [None]:
#### How Indentation Works

- All statements with the same distance from the left border belong to the same block of code.


- Sub-blocks are indented further; a block ends at the first line that is indented less.


- You should use **4 spaces** per indentation level.


- When a statement is too long (it's good practice to avoid lines of code longer than 80 characters), it can be split with a backslash (`\`)


- **Never mix** spaces and tabs in a single source file
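
As a small sketch of the line-splitting rule above, the following cell writes the same sum twice: once split with a backslash, and once split inside parentheses, which Python also accepts without any backslash. The variable names are arbitrary.

```python
# splitting a long statement with a backslash
total = 1 + 2 + 3 + \
        4 + 5 + 6

# the same statement split inside parentheses (no backslash needed)
total_parens = (1 + 2 + 3 +
                4 + 5 + 6)

print(total, total_parens)
```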


In [None]:
> #### Recommended Reading:
>
> [PEP 8 - Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)

In [None]:
##### The code is way more readable:


```python
# input() reads from standard input (e.g. keyboard)
# (in Python 2 this function was called raw_input())
n_string = input('enter a number, please: ')

if not n_string.isdigit():
    print("this isn't a number...")
else:
    n = int(n_string)
    if n == 0:
        print("zero? why zero?")
    elif n % 2 == 0:
        print("even")
    else:
        print("odd")
```

In [None]:
### Conditional Statements

A lot of programming has to do with executing a block of code only if a certain condition is verified. 

In Python, the `if-then-else` construct has the form:

```python
if condition1:
    statements
elif condition2:
    statements
elif condition3:
    statements
else:
    statements
```

Note that the `elif` and `else` clauses are optional. A conditional statement can contain a single `if` block, and nothing else.
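
A minimal sketch of a conditional made of a single `if` block; the variable `temperature` and its value are hypothetical.

```python
temperature = 31  # hypothetical value

# a single if block, with no elif or else
message = "nothing to report"
if temperature > 30:
    message = "it's hot"
print(message)
```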

In [None]:
import random
n = random.randrange(0, 100)  # random integer between 0 and 99 (the upper bound is excluded)
if n == 0:
    print("zero? why zero?")
elif n % 2 == 0:
    print("even")
else:
    print("odd")

In [None]:
## Flow control

### For Loops

Programming is of little use if we cannot repeat an instruction for an intended number of times. 

The `for` statement allows us to define **iterations** (i.e. taking items one by one from an iterable) by following this template:

```python
for variable in sequence:
    statement
else:
    statement
```

The code in the optional `else` clause is executed if and only if the loop terminates successfully (i.e. without a **`break`**)

In [None]:
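# let's iterate over our demo_list
# (assumed to be defined earlier; the next line is only a stand-in so that this cell runs on its own)
demo_list = ["a", "b", "c"]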
for el in demo_list:
    print(el)

In [None]:
#### The enumerate function

`enumerate()` is probably the most widely used of the functions that support iterating over an iterable. It returns each item together with **its index** in the iteration process.

In [None]:
# use enumerate in the iteration over our demo_list
for i, el in enumerate(demo_list):
    print (i, "-->", el)

In [None]:
#### The range construct

The `range()` construct can be used to control the iteration. It generates a sequence of integers on the basis of the following three arguments:

- `start`: the first integer of the sequence (default is 0)
- `stop`: the upper bound of the sequence, which is never included (with a step of 1, the last integer is `stop - 1`)
- `step`: the increment between consecutive integers (default is 1)

In [None]:
# let's play with range
# (in Python 3 range() is lazy: wrap it in list() to see the numbers)
print(list(range(0, 10)))
print(list(range(10)))
print(list(range(1, 10, 2)))

In [None]:
# let's use range in a for loop
for el in range(1, 10, 2):
    print(el)

In [None]:
### While Loops

The `while` statement allows us to control a loop on the basis of a condition. 

A `while` loop runs as long as a condition is verified. 

It has the following general form:

```python
while condition:
    statement
else:
    statement
```

The code in the optional `else` clause is executed if and only if the loop terminates normally (i.e., without a **`break`**).
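
A minimal sketch of a `while` loop with an `else` clause. The loop ends because its condition becomes false, so the `else` block runs; the names `n` and `steps` are arbitrary.

```python
n = 10
steps = []
while n > 0:
    steps.append(n)
    n = n - 3
else:
    # reached because the condition became False (no break was executed)
    steps.append("done")
print(steps)
```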

In [None]:
n = 1
while n % 2 != 0:
    n = random.randrange(0,99)
    print(n)

In [None]:
### Break and Continue

`break` and `continue` are two statements that allow for a more flexible control of a loop. Intuitively:

- `continue` is used to pass to the next iteration of the loop
- `break` is used to interrupt the loop abruptly

In [None]:
# when we encounter 7 we skip to the next step
for el in range(1, 10, 2):
    if el == 7:
        continue
    print(el)

In [None]:
# when we encounter 7 we stop our loop 
for el in range(1, 10, 2):
    if el == 7:
        break
    print(el)

In [None]:
`break` influences the execution of the loop in yet another way: when a loop terminates due to a `break` statement, the code in the optional `else` clause is skipped.

In [None]:
# the continue statement does not influence the execution of the else block
for el in range(1, 10, 2):
    if el == 7:
        print ("(let's ignore the " + str(el) + ")")
        continue
    print(el)
else:
    print (">>> the iteration ended with the number " + str(el))

In [None]:
# what if we replace continue with break
for el in range(1, 10, 2):
    if el == 7:
        print ("(we encountered the number " + str(el) + ", let's break the loop)")
        break
    print(el)
else:
    print (">>> the iteration ended with the number " + str(el))

In [None]:
### The Pass Statement

Given the importance of indentation for Python, sometimes we may need a placeholder that allows us to write down a condition for an `if-then-else` construct or for a `while` loop without writing any statement (maybe just a comment). This is the case in which the `pass` statement comes in handy. 

In what follows, **nothing happens**:

```python
if condition1:
    pass
else:
    pass
```
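
The same placeholder works wherever Python expects an indented block. The sketch below uses `pass` in a function that has yet to be written and in one branch of a conditional; `to_do` and `kept` are hypothetical names.

```python
def to_do():
    # hypothetical function that we plan to write later
    pass

kept = []
for el in range(5):
    if el % 2 == 0:
        pass  # nothing to do for even numbers (yet)
    else:
        kept.append(el)

print(kept)
print(to_do())
```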


In [None]:
### List Comprehensions  (OPT)

A list comprehension is a syntactic construct that allows us to create lists by applying a function on another list, in just **one line** of code. 

List comprehensions can always be rewritten as (more verbose) loops, even though the reverse isn't always true. We will exploit this family resemblance to introduce the construct.

In [None]:
In what follows, we start with a list of numbers and we want to square all of its elements and save our final values in a new list.

In [None]:
# our source list
source_list = [1,2,3,4,5,6,7,8,9]

In [None]:
# we can solve this problem with a for loop...
final_list = []
for el in source_list:
    final_list.append(el ** 2)
print(final_list)

In [None]:
# ... or by using list comprehension
final_list = [el ** 2 for el in source_list]
print(final_list)

In [None]:
**Conditional statements may be implemented**

In what follows we want to ignore all the odd numbers

In [None]:
# we can solve this problem with a for loop...
final_list = []
for el in source_list:
    if el % 2 == 0:
        final_list.append(el ** 2)
print(final_list)

In [None]:
# ... or by using list comprehension
final_list = [el ** 2 for el in source_list if el % 2 == 0]
print(final_list)

In [None]:
**If you want to implement an else clause the syntax changes slightly**

In what follows we want to leave the odd numbers unchanged

In [None]:
# we can solve this problem with a for loop...
final_list = []
for el in source_list:
    if el % 2 == 0:
        final_list.append(el ** 2)
    else:
        final_list.append(el)
print(final_list)

In [None]:
# ... or by using list comprehension
final_list = [el ** 2 if el % 2 == 0 else el for el in source_list]
print(final_list)

In [None]:
### Quiz

The following list contains 100 random extractions (with replacement) of numbers between 1 and 15. 

Find the number that has never been extracted

In [None]:
random_numbers = [1, 2, 1, 1, 9, 13, 15, 5, 9, 8, 12, 14, 3, 2, 8, 10, 3, 12, 15, 13, 5, 3, 7, 5, 2, 13, 12, 8, 10, 5, 15, 8, 2, 8, 5, 12, 9, 2, 3, 5, 1, 4, 5, 9, 13, 2, 12, 5, 10, 8, 1, 15, 15, 6, 12, 3, 1, 3, 7, 14, 15, 10, 15, 7, 10, 12, 1, 2, 13, 7, 9, 6, 6, 7, 4, 12, 10, 8, 8, 3, 8, 4, 6, 14, 10, 5, 2, 3, 15, 4, 9, 3, 7, 7, 2, 4, 4, 1, 7, 15]

In [None]:
# your code here

# Input / Output

This notebook gives you some intuition about functions and Python packages, and then discusses input/output.

## Functions

Functions are constructs that allow us to organize portions of code so that they can be used more than once in a program.

The alternative way to obtain the same results without functions would be to copy the same portion of code every time it is needed. 

Functions in Python are defined by a `def` statement, following this template:

```python
def function_name(parameters):
    """
    docstring
    """
    function_body
    return result
```

> The list of the parameters required by the function is reported between round brackets right after the name of the function. Each function may have **zero or more** parameters. The values passed to a function when it is called are known as **arguments**.
>
> The (optional) documentation string should be placed immediately after the function definition. There are many ways to format your **docstring**: [PEP 257](https://www.python.org/dev/peps/pep-0257/) describes the general conventions and PEP 287 recommends reStructuredText, but more formats are available. See [this tutorial](http://daouzli.com/blog/docstring.html) for an introduction to the topic.
>
> The **indented** function body contains all the statements that are executed every time the function is called. When a `return` statement is executed, the function exits and its output is the argument of the `return` statement. 
>
> When there is no `return` statement in the function body, or when a `return` statement with no arguments is executed, the function returns `None`.

For instance, the following function calculates the number of characters in a string:

In [2]:
def chars(s):
    """
    Calculates the number of characters in a string
    """
    if not isinstance(s, str):
        return "This is not a string!"
    r = len(s)
    return r

The docstring is saved into a  `__doc__` variable and can be accessed by using the `help()` function or the IPython `?`

In [3]:
# don't use this, it is just to make the point
print(chars.__doc__)


    Calculates the number of characters in a string
    


In [1]:
# use one of these two
help(chars)
chars?


In order to execute the code included in a function, you have to **call the function**, either in your script or in the interactive shell. For instance:

In [5]:
chars("voodoo")

6

In [6]:
chars(1979)

'This is not a string!'
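
The rule about functions without a `return` value can be checked directly. In this sketch, `greet` and `stop_early` are hypothetical functions written only for the illustration.

```python
def greet(name):
    # hypothetical function with no return statement
    print("hello", name)

def stop_early(n):
    # hypothetical function with a bare return
    if n < 0:
        return
    return n * 2

result = greet("Arthur")
print(result)          # None: greet() has no return statement
print(stop_early(-1))  # None: the bare return was executed
print(stop_early(4))
```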

### Parameters

A function can receive any number of parameters:

In [7]:
def higher(n1, n2, n3):
    """
    find the highest of three numbers
    """
    # n1 wins only if it is at least as large as BOTH other numbers
    if n1 >= n2 and n1 >= n3:
        return n1
    if n2 >= n3:
        return n2
    else:
        return n3

In [8]:
# a parameter can be passed either by position
higher(4, 2, 8)

8

In [9]:
# or by name
higher(n3 = 8, n1 = 4, n2 = 2)

8

#### Optional Parameters

In some situations it may be useful to give a parameter a default value, which is used when a call leaves the argument **unspecified**.

In [10]:
def higher(n1, n2 = 0, n3 = 0):
    """
    find the highest of three numbers
    """
    # n1 wins only if it is at least as large as BOTH other numbers
    if n1 >= n2 and n1 >= n3:
        return n1
    if n2 >= n3:
        return n2
    else:
        return n3

In [11]:
higher(9,4)

9

In [12]:
higher(-6)

0

#### Arbitrary Number of Parameters

A different situation is when we want our function to have an unspecified number of parameters. Python functions admit the so-called "tuple references", marked by an asterisk `*` in front of the last parameter  (that becomes a tuple)

In [13]:
def print_params(*params):
    print ("your input:")
    print (params)

In [14]:
print_params("Down from my ceiling", "Drips great noise", "It drips on my head through a hole in the roof") 

your input:
('Down from my ceiling', 'Drips great noise', 'It drips on my head through a hole in the roof')
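
Because the starred parameter is an ordinary tuple, it can be looped over like any other sequence. A minimal sketch, with a hypothetical function `add_all`:

```python
def add_all(*numbers):
    # numbers is a tuple holding all the positional arguments
    result = 0
    for n in numbers:
        result += n
    return result

print(add_all())         # no arguments: numbers is an empty tuple
print(add_all(1, 2, 3))
```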


#### Quiz

* Write a function that takes a string as input and returns a dictionary of tokens (sequences of characters separated by whitespace) as keys, and the number of times they occur as values. The `split()` method for string might be useful.

In [35]:
# your code here

---

## Modules and Packages

Python modules are groupings of related code, structured so as to facilitate reuse.

Physically, modules are `.py` files implementing a set of **functions, classes or variables**, as well as **executable statements**, that can be accessed from other modules by using the `import` command.

The `import` command can be used both to import **the whole code** of a module, using the following syntax:

```python
import module
```

or just **specific attributes** (one or more functions, variables, classes or a combination of these) with the following syntax:

```python
from module import name1, name2, name3
```

For example, in order to find out our current working directory, we can use the function `getcwd()` available in the `os` module (see below) in two different ways:

In [15]:
import os
os.getcwd()

'/Users/giovannicolavizza/Dropbox/db_projects/Teaching/UvA_CDH_2020/notebooks'

In [16]:
from os import getcwd
getcwd()

'/Users/giovannicolavizza/Dropbox/db_projects/Teaching/UvA_CDH_2020/notebooks'

You can think of a **package** as a structured collection of Python modules.
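
As a sketch, the standard library's `urllib` is one such package: it groups several modules, among them `urllib.parse`. Both import styles shown earlier work on the modules inside a package; the URL is just sample input.

```python
# import a module that lives inside the urllib package
import urllib.parse
# or import the module by name from the package
from urllib import parse

url = "https://www.python.org/dev/peps/pep-0008/"
parts = urllib.parse.urlparse(url)
print(parts.netloc)
print(parse.urlparse(url).path)
```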

## File Input/Output

A huge portion of our input data will come from files on disk, and a lot of our work will be saved to disk as well. So mastering the art of reading and writing files is crucial in programming too.

The following code opens a file in our filesystem, prints the first 10 lines and closes the file:

In [17]:
infile = open('data/adams-hhgttg.txt', 'r')
for i, line in enumerate(infile):
    if i == 10:
        break
    print(line)
infile.close()

The Hitch Hiker's Guide to the Galaxy 



for Jonny Brock and Clare Gorst 

and all other Arlingtoniansfor tea, sympathy, and a sofa







Far out in the uncharted backwaters of the unfashionable  end  of

the  western  spiral  arm  of  the Galaxy lies a small unregarded

yellow sun.



The key step here is the call to the `open()` function, which opens a file and returns a **file object**. It is commonly used with the following two parameters:

- **filename**: the name of the file to open

- **mode**: the mode in which we want to open the file. The most commonly used values are `r` for **reading** (default), `w` for **writing** (overwriting existing files), and `a` for **appending**. (Note that [the documentation](https://docs.python.org/3/library/functions.html#open) reports further mode values that may be necessary in some exceptional cases.)

>**IMPORTANT**: every opened file should be **closed** by using the method `close()` before the end of the program, or the file could be unavailable for subsequent manipulations or to other programs.

There are other ways to read a text file, among them the methods `read()` and `readlines()`, which would simplify the code above to something like the following (note that it prints the first 10 characters rather than the first 10 lines):

```python
infile = open('data/adams-hhgttg.txt', 'rt')
text = infile.read()
print(text[:10])
infile.close()
```

However, these methods **read the whole file at once**, thus creating huge problems when working with big corpora.

In the solution we adopt here the input file is read line by line, so that at any given moment **only one line of text** is loaded into memory. 
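
The three approaches can be compared side by side. This sketch first writes a small throwaway file with the `tempfile` module so that it does not depend on the course data; the file name and its content are made up.

```python
import os
import tempfile

# write a small throwaway file so that the example is self-contained
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
outfile = open(tmp_path, "w")
outfile.write("first line\nsecond line\nthird line\n")
outfile.close()

# read(): the whole file as ONE string
infile = open(tmp_path, "r")
text = infile.read()
infile.close()

# readlines(): the whole file as a LIST of lines
infile = open(tmp_path, "r")
lines = infile.readlines()
infile.close()

# iteration: one line at a time
n_lines = 0
infile = open(tmp_path, "r")
for line in infile:
    n_lines += 1
infile.close()

print(len(text), len(lines), n_lines)
```

Only the last approach keeps memory usage independent of the size of the file.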

---

Writing an output file in Python has a structure that is close to the one we used in our reading examples above. The main differences are:

- the specification of the **mode** `w`


- the use of the method `write()` for each line of text

In [18]:
outfile = open('stuff/output-test-1.txt', 'w')
outfile.write("My name is:")
outfile.write("John")
outfile.close()

11

4

> When writing line by line, it's up to you to take care of the **newlines** by appending `\n` to each line

In [19]:
outfile = open('stuff/output-test-2.txt', 'w')
outfile.write("My name is:\n")
outfile.write("Alexander")
outfile.close()

12

9

### The with statement 

A `with` statement is used to wrap the execution of a block of code.

Using this construction to open files has three major advantages:

- there is no need to explicitly  close the file (the file is automatically closed as soon as the nested code exits)


- the file is closed automatically even when unhandled errors cause the program to crash


- the code is way clearer (it is trivial to identify where in the code a file is opened)
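
The first two points can be verified through the `closed` attribute of file objects. This is a minimal sketch that works on a throwaway file created with `tempfile`; the file name and the simulated error are invented for the illustration.

```python
import os
import tempfile

tmp_path = os.path.join(tempfile.mkdtemp(), "with-demo.txt")

with open(tmp_path, "w") as outfile:
    outfile.write("Don't panic\n")
print(outfile.closed)  # the file is closed once the block exits

try:
    with open(tmp_path, "r") as infile:
        raise ValueError("something went wrong")  # simulated error
except ValueError:
    pass
print(infile.closed)  # closed even though the block raised an error
```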

In [21]:
# that's how I usually open files
with open("data/adams-hhgttg.txt", "r") as infile:
    for i, line in enumerate(infile):
        if i == 10:
            break
        print(line)

The Hitch Hiker's Guide to the Galaxy 



for Jonny Brock and Clare Gorst 

and all other Arlingtoniansfor tea, sympathy, and a sofa







Far out in the uncharted backwaters of the unfashionable  end  of

the  western  spiral  arm  of  the Galaxy lies a small unregarded

yellow sun.



### Looping through folders and files

We can use the `os` module to loop through a folder and load multiple files in memory.

In [22]:
gutenberg_books = dict()

for root, dirs, files in os.walk("data/gutenberg-extension"):
    for file in files:
        # the with statement closes each file as soon as it has been read
        with open(os.path.join(root, file)) as infile:
            gutenberg_books[file] = infile.read()

In [23]:
gutenberg_books.keys()

dict_keys(['doyle-sherlock.txt', 'stoker-dracula.txt', 'README', 'shelley-frankestein.txt', 'austen-pride.txt', 'joyce-dubliners.txt'])

In [26]:
print(gutenberg_books['doyle-sherlock.txt'][:300])

﻿[The Adventures of Sherlock Holmes by Arthur Conan Doyle]

ADVENTURE I. A SCANDAL IN BOHEMIA

I.

To Sherlock Holmes she is always THE woman. I have seldom heard
him mention her under any other name. In his eyes she eclipses
and predominates the whole of her sex. It was not that he felt
any emotion


---

### Exercise 1.

The [factorial](https://en.wikipedia.org/wiki/Factorial) of an integer $n$, defined as:

$$
n! = \begin{cases}
               1               & n = 1\\
               n * (n-1)! & n > 1
           \end{cases}
$$

is the product of all positive integers less than or equal to $n$. For example:

$$4! = 4 * 3 * 2 * 1$$

$$3! = 3 * 2 * 1$$

The factorial operation can be implemented in Python both as a recursive function and as an iterative function.

Write one factorial function picking the approach you prefer.

In [12]:
# your code here

### Exercise 2.

Read the file `data/adams-hhgttg.txt` and:

- Count the number of lines in the file

- Count the number of non-empty lines

- Read each line of the input file, remove its newline character and write it to file `stuff/adams-output.txt`

- Compute the average number of alphanumeric characters per line

- Identify all the unique words used in the text (no duplicates!) and write them in a text file called `stuff/lexicon.txt` (one word per line)

In [None]:
# your code here

with open("stuff/lexicon.txt", "w") as outfile:
    outfile.write("something")

---