# Session 5: Data structures, files, and errors

*Data Structures and Algorithms*

*Achyuthuni Sri Harsha*

------------------------------------------------------------------------

This session will introduce you to the following Python concepts: *dictionaries, sets and tuples, error handling, input and output, and debugging*.

We have previously covered many key Python concepts, including the use of variables and functions, and the list data structure. In this session we'll learn to use some of Python's other compound data structures: dictionaries, sets, and tuples. Each of these has its own benefits, and we'll learn to pick the appropriate one for the kind of problem and data we're facing. We will also read data from files into Python. As our programs are starting to grow in size and complexity, we'll also go over some ways to debug code and handle errors that may occur.

------------------------------------------------------------------------

## Preparation:

**Readings:**

Guttag: Chapters 4.5-4.6, 5.6, 6.2, 7.

OR

Sweigart, Al. Automate the Boring Stuff with Python.

-   Chapter 5 – Dictionaries
    <https://automatetheboringstuff.com/chapter5/>

-   Chapter 8 – Files <https://automatetheboringstuff.com/chapter8/>

-   Chapter 10 – Debugging
    <https://automatetheboringstuff.com/chapter10/>

***Optional Readings:***

Miller and Ranum. Problem Solving with Algorithms and Data Structures.

-   <https://interactivepython.org/runestone/books/published/pythonds/index.html>

-   Section 6.5: Hashing (how dictionaries work)

Sweigart, Al. Automate the Boring Stuff with Python.

-   Chapter 11 - Web scraping
    <https://automatetheboringstuff.com/chapter11/>

Murray, Scott. Interactive Data Visualization on the Web.

-   <http://chimera.labs.oreilly.com/books/1230000000345/index.html>

-   Chapter 3 only, until Javascript

Berlind, David. What Are APIs and How Do They Work?

-   <https://www.programmableweb.com/api-university/what-are-apis-and-how-do-they-work>

**Questions:**

Please read the material above, and think about how you would explain to
your classmates:

-   When is a dictionary useful compared to a list?

-   When and why do we use exceptions?

-   What are HTML, DOM, and CSS? Using these terms, how would you
    briefly explain how a browser renders a web page on your screen?

-   What are APIs, and what are their main benefits?

------------------------------------------------------------------------

## Built-in data structures

Python has several data structures built into its core. You
have already been introduced to one of them: the list. Today we'll cover
the `dictionary`, `set`, and `tuple` structures.

### Dictionaries

We have been using lists, where we could access the list contents by
their **position** in the list. Lists are flexible and may contain
elements of different data types. For example, we might want to keep
track of people and their email addresses. One way to do it would be as
follows.

    >>> l = [['Sally', 'sally[at]gmail.com'], ['Fred', 'fred[at]gmail.com'], 
             ['John', 'john92[at]yahoo.com'], ['Ellie', 'ellie[at]hotmail.com']]

Let's say we want to look up Ellie's email in the list. How do we do
this? Our list is not sorted, but we could do a linear search over the
first item of each sublist to find her name, and then access her email.
We would look up index `[0][0]`, see that it is not Ellie, then
`[1][0]`, see that it is not Ellie, and so on until we find Ellie, and
then read her email from the same sublist.
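The search described above can be sketched as a short function. This is only an illustration: the name `find_email` and the variable `contacts` are hypothetical and do not appear in the course files.

```python
# Hypothetical helper: linear search through a list of [name, email] pairs.
def find_email(contacts, name):
    """Return the email stored for name, or None if name is not found."""
    for entry in contacts:
        if entry[0] == name:      # compare the first item of each sublist
            return entry[1]       # the second item is the email
    return None


contacts = [['Sally', 'sally[at]gmail.com'], ['Fred', 'fred[at]gmail.com'],
            ['John', 'john92[at]yahoo.com'], ['Ellie', 'ellie[at]hotmail.com']]
print(find_email(contacts, 'Ellie'))
```

In the worst case the loop has to visit every entry before it finds the name or gives up.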

This is cumbersome and slow, especially if we need to do it repeatedly.
If we keep the list sorted, we can then use binary search for a faster
lookup, but is there a way we can do even better?

The answer is **yes**, using a *dictionary*.

A dictionary is a mapping of **keys** to **values**. Here, we want to
associate each name to an email address, or in other words *map* names
to email addresses. So, the **names** are our **keys**, and the
**emails** are the **values**.

Once we've set up a dictionary, we can access the values directly using
the **key** instead of an integer index. This is much simpler than going
through the list indices to find the right name.

Once our dictionary is ready, we will be able to do the following (but
not yet!):

    In[1]: email_dict['Ellie']
    Out[1]: 'ellie[at]hotmail.com'

How do we create a dictionary? We put key-value pairs inside curly
braces. Each key is followed by a colon and then its value, and the
pairs are separated by commas. This creates a new object of dictionary
type, which Python calls just "dict".


In [1]:
email_dict = {
    'Sally': 'sally[at]gmail.com',
    'Ellie': 'ellie[at]hotmail.com'
}
email_dict

{'Sally': 'sally[at]gmail.com', 'Ellie': 'ellie[at]hotmail.com'}

In [2]:
type(email_dict)

dict

To access a value, we can now do what we wanted.

In [3]:
email_dict['Ellie']

'ellie[at]hotmail.com'

If we try to access a key that does not exist in the dictionary, we get an error, just like we would get one if we tried to access a list outside its index range.

In [4]:
email_dict['Fred']

KeyError: 'Fred'

We can create an empty dictionary with the function `dict()`, or just curly braces. An empty dictionary is also displayed as empty curly braces, just like an empty list is shown as empty square brackets. Let's restart our dictionary from scratch.

In [5]:
email_dict = dict() # or: email_dict = {}
email_dict

{}

To update a value associated with a key, we use syntax very similar to lists. The difference is that we don't update the value at an integer index, but the value at a dictionary key.

If the key already exists in the dictionary, the value is updated. If the key does not exist, it is added with the specified value. So, to add all of our four friends to the empty dictionary, we run these statements.

In [6]:
email_dict['Sally'] = 'sally[at]gmail.com'
email_dict['Fred'] = 'fred[at]gmail.com'
email_dict['John'] = 'john92[at]yahoo.com'
email_dict['Ellie'] = 'ellie[at]hotmail.com'

Now, if Ellie changes her email address, we can update it with a similar command.

In [7]:
email_dict['Ellie'] = 'ellie[at]gmail.com'

If we want to remove an entry altogether, we use the `del` command, giving the dictionary name and the key.

In [8]:
del email_dict['Fred']

To test whether a key exists in the dictionary, we can use the `in` keyword. This will give a Boolean value, `True` or `False`.

In [9]:
'Fred' in email_dict

False

In [10]:
'Ellie' in email_dict

True
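The membership test is handy for avoiding a `KeyError` when a key might be missing. The sketch below shows two ways to do a safe lookup. The second uses `get()`, a standard dictionary method that is not covered above; the fallback string `'unknown'` is just an assumption for illustration.

```python
email_dict = {'Sally': 'sally[at]gmail.com', 'Ellie': 'ellie[at]gmail.com'}

# Option 1: test for the key before indexing.
if 'Fred' in email_dict:
    fred_email = email_dict['Fred']
else:
    fred_email = 'unknown'

# Option 2: get() returns its second argument when the key is missing.
ellie_email = email_dict.get('Ellie', 'unknown')
fred_email_again = email_dict.get('Fred', 'unknown')

print(fred_email, ellie_email, fred_email_again)
```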

This is the basic syntax of dictionaries.

To recap, the dictionary is a data structure that associates keys to values. You might have observed that because of this key-value association, each dictionary key must be *unique*. So, if we want to add another friend called Ellie, we'll need to give her a different key, perhaps with her surname included.

A word on the data we can use as dictionary keys and values. Dictionary values can be any Python objects (numbers, strings, lists, other dictionaries...). The data types we can use as keys, however, are more restricted. We can use immutable types like strings and numbers, but not mutable types like lists or other dictionaries. This is because with a mutable key like a list, the value of the key might change, and this would confound our lookups.

| Keys                       | Values                               |
|----------------------------|--------------------------------------|
| must be unique             | could be the same for different keys |
| immutable (hashable) types | any Python objects                   |
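To see the restriction on keys in practice, here is a small sketch. The dictionary `grid` and its contents are made up for illustration. The `try`/`except` syntax used to capture the error is explained later in this session.

```python
# Tuples are immutable, so they can serve as dictionary keys.
grid = {(0, 0): 'origin', (1, 2): 'treasure'}
print(grid[(1, 2)])

# Lists are mutable and therefore unhashable: using one as a key fails.
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```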

Let's pause for a moment and think about why the dictionary structure is
useful. As we saw above, we can achieve the same functionality using
lists. But the dictionary is much more **intuitive** to use when looking
up elements. With a list, we would need to write a loop to go through
the data. With a dictionary, it's also much easier to check whether a
contact is in our address book already, and avoid accidentally adding
the same contact twice.

However, besides convenience, there is another crucial reason for using
a dictionary: it is blazingly **fast**. When we look up 'Ellie' in a
list, this is a linear time operation as we would have to loop through
the list. In a dictionary this only takes constant time on average. This
may surprise you: how does the dictionary avoid the looping? This has to
do with the ingenious design of the data structure. Python dictionaries
are implemented as hash tables. We will not go into detail
here on how they work, but you will find more information on hash tables
in the readings for this session.

To sum up, dictionaries are both convenient and fast when our data
involves **mappings from keys to values**. In some ways, you can think
of a list as a similar mapping, but of integers to values, and
dictionary as a mapping with more general keys. Notice however that
**dictionaries are not sorted**, and **we cannot access the contents by
index**. For tasks that require the ordering of items, we often prefer
lists over dictionaries.

### Looping through dictionaries

Many dictionary applications involve looping. Recall the general
structure of looping through a sequence, like a list or a string:

    for item in sequence:
        statements

A dictionary contains key-value pairs, so each element has two parts.
How does looping work then? A simple loop will use the **keys** as the
loop variable:

In [11]:
for item in email_dict:
    print(item)

Sally
John
Ellie


Notice that a dictionary is not sorted: the keys do not come out alphabetically. Since Python 3.7, looping follows the order in which the keys were added; in older versions the order was arbitrary and could vary between systems.

If we want to print out both the key and the value, we can either access the value at each key inside the loop, or call the `items()` method of
dictionaries:

In [12]:
for key in email_dict:
    print(key, email_dict[key])

Sally sally[at]gmail.com
John john92[at]yahoo.com
Ellie ellie[at]gmail.com


In [13]:
for key, value in email_dict.items():
    print(key, value)

Sally sally[at]gmail.com
John john92[at]yahoo.com
Ellie ellie[at]gmail.com


If we want to loop just through the values, we can similarly use the `values()` method.

In [15]:
for value in email_dict.values():
    print(value)

sally[at]gmail.com
john92[at]yahoo.com
ellie[at]gmail.com


Now, let's practice working with dictionaries with the following
exercises.

a\. Complete the function `print_dict_values` in the script file
`ses05.py`. The function accepts a dictionary as argument, and prints
all its key/value pairs as specified in the file. The function returns
the number of keys in the dictionary.

**Hint**: you can use `sorted()` to sort the keys of the dictionary, and
then loop through them. You can get the keys of a dictionary `d` as a
list with

    >>> keys = list(d.keys()) # get keys as list
    >>> sorted_keys = sorted(keys) # sort keys

You'll then need to loop through the keys to print the key-value pairs.

b\. Complete the function `count_characters` that accepts as argument a
string and counts the number of each character in the string. The
function should return a dictionary with characters as keys and their
occurrence counts as values.

This works as follows:

    >>> count_characters('hello')
    {'h': 1, 'e': 1, 'l': 2, 'o': 1}

**Hint**: you'll need to loop through the characters of a string, and
update the dictionary's values as you go along. You may assume the input
string is lower case. If this were not the case, what string method would
you need?

### Sets

A set is an unordered collection of unique items. A set is created using
the function `set()`. For example, if we have a list with some duplicate
items, passing it to `set()` will create a set that only contains the
unique elements of the original list.

In [16]:
a = [0, 1, 1, 0, 2, 3]
set(a)

{0, 1, 2, 3}

In [17]:
set([0, 1, 1, 0, 2, 3, 'a', 'b', 'a'])

{0, 1, 2, 3, 'a', 'b'}

Alternatively, we can create a set like a list by replacing `[` and `]` by `{` and `}` respectively:

In [18]:
a = {0, 1, 1, 0, 2, 3}
type(a)

set

In [19]:
a

{0, 1, 2, 3}

The standard operations on a set include adding and removing elements, as well as union and intersection. Adding and removing works with the `add` and `remove` methods. 

In [21]:
a = {0, 1, 2}
a.add(3)
a

{0, 1, 2, 3}

In [22]:
b = {2, 3, 4, 5, 6}
b.remove(6)
b

{2, 3, 4, 5}

A union between two sets means taking all items that appear in at least one of the sets. An intersection means taking only the items that appear in both sets.

In [23]:
a.union(b)

{0, 1, 2, 3, 4, 5}

In [24]:
set.union(a, b) # Equivalent way of giving previous command

{0, 1, 2, 3, 4, 5}

In [25]:
a.intersection(b)

{2, 3}
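Python also offers operators as shorthand for these set methods, along with two further operations not shown above: difference and symmetric difference. A brief sketch, redefining `a` and `b` so that it runs on its own:

```python
a = {0, 1, 2, 3}
b = {2, 3, 4, 5}

print(a | b)   # union, same as a.union(b)
print(a & b)   # intersection, same as a.intersection(b)
print(a - b)   # difference: items in a that are not in b
print(a ^ b)   # symmetric difference: items in exactly one of the sets
```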

Looping through a set works much like looping through a list.

In [26]:
nordic_countries = {'Sweden', 'Finland', 'Norway', 'Denmark', 'Iceland'}
for country in nordic_countries:
    print(country)

Norway
Finland
Iceland
Sweden
Denmark


We can check membership with `in`:

In [27]:
'Estonia' in nordic_countries

False

In [28]:
'Finland' in nordic_countries

True

As with dictionaries, the inner workings of sets involve hashing, and lookups take constant time on average, $O(1)$.

### Tuples

A tuple is an ordered sequence. We have already seen two ordered
sequences: lists and strings. Recall that a list was flexible and
*mutable*: its elements could be any Python objects, and we could change
the values of these elements. A string, however, was an *immutable*
sequence of *characters* only, which we could not change in place.

A tuple is flexible in contents, like a list, but immutable, like a
string. That is, a tuple has a fixed size and fixed contents.

We create tuples using parentheses instead of brackets.

In [29]:
cities = ('Helsinki', 'St. Petersburg', 'Stockholm')
type(cities)

tuple

We can access, slice, and loop through tuples like lists.

In [30]:
cities[2]

'Stockholm'

In [31]:
cities[0:2]

('Helsinki', 'St. Petersburg')

In [32]:
for city in cities:
    print(city)

Helsinki
St. Petersburg
Stockholm


Since tuples are immutable, if we try to change a value, we will get an error:

In [33]:
cities[0] = 'London'

TypeError: 'tuple' object does not support item assignment

Due to the fixed size of the tuple, it is slightly faster than a list.
However, this becomes noticeable only when you have many thousands
of elements, while a list offers a much more dynamic structure. Tuples
are useful when we need an ordered sequence and want to be sure that the
elements cannot be mutated.

Here's one common example where you will see tuples. If you have a
function that returns multiple values, they are returned as a tuple by
default. Let's define such a function:

In [34]:
def squared_and_cubed(x):
    """
    Returns x squared and x cubed.
    """
    return x**2, x**3

Now, if we call this function, we get a tuple. If we wish to assign the values to different variables, we can use multiple assignment.

In [35]:
squared_and_cubed(2)

(4, 8)

In [36]:
x, y = squared_and_cubed(2)
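The same unpacking works with any function that returns a tuple, including built-in ones. As a small sketch, `divmod()` is a built-in function that returns a quotient and a remainder together, and multiple assignment also lets us swap two variables in one line:

```python
# divmod() is a built-in that returns a (quotient, remainder) tuple.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)

# Unpacking also allows swapping two variables without a temporary one.
x, y = 4, 8
x, y = y, x
print(x, y)
```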

Now practice working with sets and tuples with the following exercises.

### Comprehensions

We have now gone through the most important standard Python data
structures, and looping through them. Now, let's add a useful way of
augmenting looping to our toolbox: **comprehensions**.

We've learned to use loops to create lists. For instance, to create the
squared values of integers from 0 to 9, we can do:

In [37]:
l = []
for i in range(10):
    l.append(i**2)
print(l)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


However, there is another possible syntax, more concise and more
*pythonic*, which is a slightly geeky way to say that the code follows
the conventions and best practices of the Python coding community. We
can do this in one line of code using what is called a *list
comprehension*.

Here we create a list using square brackets, and inside we define
the action we wish to perform for each item in the range sequence.

In [38]:
l = [i**2 for i in range(10)]
print(l)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


Comprehensions are a concise way to program the creation of a list. Here's another example:

In [39]:
l = [i * 'a' for i in range(10)]
print(l)

['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa', 'aaaaaaa', 'aaaaaaaa', 'aaaaaaaaa']


We can add a condition to filter out odd numbers:

In [40]:
l = [i for i in range(10) if i%2 == 0]
print(l)

[0, 2, 4, 6, 8]


We can generally think of the comprehension structure as consisting of
mapping and filtering on data. The mapping could be any function
that works on the items in the sequence.

    [mapping_of_item for item in sequence if filtering_condition]
    
We can also create other data structures by comprehensions, for example sets and dictionaries. Here's an example:

In [41]:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
letter_numbers = {alphabet[i]:i for i in range(len(alphabet))}

We can accomplish the same results with standard loops, but comprehensions can be very convenient for simplifying our code.
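For comparison, here is a short sketch of three comprehension forms side by side. The dictionary version uses the built-in `enumerate()` as an alternative to indexing with `range(len(...))`; the word `'mississippi'` and the variable names `unique_letters` and `even_squares` are only illustrative.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# Dictionary comprehension using enumerate() instead of indexing.
letter_numbers = {letter: i for i, letter in enumerate(alphabet)}

# Set comprehension: the distinct letters in a word.
unique_letters = {ch for ch in 'mississippi'}

# Mapping and filtering together: squares of the even numbers below 10.
even_squares = [i**2 for i in range(10) if i % 2 == 0]

print(letter_numbers['c'], sorted(unique_letters), even_squares)
```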

## Errors and debugging

By now, you may have become quite familiar with some of Python's error
messages. You may also have spent time trying to figure out why
your program produces a specific error, or does not produce the correct
result. These are called bugs and the process of finding and fixing them
is called debugging.

Fortunately, Python is quite expressive in dealing with errors,
telling us what went wrong and where this happened in our code. Together
with the fact that Python code is interpreted line by line, this
expressiveness makes finding and dealing with code errors convenient
compared to many other languages.

When an error occurs, the Python interpreter tries to give us as much
information as possible about the error. For instance, consider the
following code:

In [42]:
text = 'hello'
print(txt)

NameError: name 'txt' is not defined

The error output shows that our instruction `print(txt)` raised a
*NameError*, which quite explicitly tells us that something is wrong
with a variable name. The full *Traceback* also points to the line at
which the error occurred. Here the code was run in the console. If we
were running a complex program, it would point to the specific line in
the script file. Finally, details about the `NameError` are given, here,
`name 'txt' is not defined`.

This is a fairly simple case: the name is misspelt (`txt` instead of
`text`). Of course, if we had defined a variable `txt` earlier for
some other purpose, the error wouldn't have been raised, and we would
have printed the value of that variable instead.

Another kind of error occurs when we attempt an invalid operation, such
as adding an integer to a string:

In [43]:
s = 'a little string of size '
'This is ' + s + '.'

'This is a little string of size .'

In [44]:
s + 26

TypeError: can only concatenate str (not "int") to str

Again, Python is explicit about the problem: the types of the two
operands do not match.

These kinds of bugs are easy to spot as the program crashes. But often
bugs are more difficult to spot: the program may run through but produce
an incorrect result. Or it may perhaps work in the most common input
cases, but fail spectacularly for other less common cases. Such glitches
can potentially cause a lot of damage, as examples of stock-trading
algorithm malfunctions show.

Debugging is connected to the broader process of *testing* your program.
The purpose of testing is to find any bugs that might exist. This
involves running your program for various inputs and checking whether
the behaviour is as intended. When designing complex programs, this
process starts with the design of the program, and plays an important
role in the modularization of the program into smaller parts, such that
the functioning of each can be tested separately from the rest.

Program testing is a deep topic and we will not go into much detail
here; you can read more in Guttag's Chapter 6.1. The key, for our
purposes, is finding a set of inputs that you believe would be very
likely to catch most problems in your program, much like our OK tests.
So for example, if calculating the absolute value of a number, you might
require tests with both positive and negative numbers, and perhaps zero,
and reasonably believe that this would capture most errors.

    def abs_value(x):
        """
        Returns absolute value of parameter x

        Example use:
        >>> abs_value(2)
        2
        >>> abs_value(-1)
        1
        >>> abs_value(0)
        0
        """
        # code here

> **Advanced** Tests of this type, written inside the function's
> docstring, are called *doctests*. You can run doctests for the file
> `file_name.py` using the following command on the command line:
>
>     python -m doctest -v file_name.py
>
> In Spyder, you can quickly run a single doctest by placing the cursor
> on a line starting with `>>>` and hitting `F9`. (Remember to define
> the function itself first!)
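Doctests can also be run from within Python using the standard `doctest` module. The following is a minimal sketch with a made-up function `double` (it is not part of `ses05.py`), so as not to give away the `abs_value` code.

```python
import doctest


def double(x):
    """
    Returns twice the parameter x (a hypothetical example function).

    >>> double(2)
    4
    >>> double(-1)
    -2
    >>> double(0)
    0
    """
    return 2 * x


# Run the examples in the docstring; nothing is printed if they all pass.
doctest.run_docstring_examples(double, {'double': double})
```

If one of the examples produced a different result from the one written in the docstring, the call would print a report showing the expected and the actual value.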

When working on a program and testing it on some input, we may then run
into a problem: either there's an error, as in the above simple
examples, or the program produces the wrong result.

Let's look at an example. Suppose we're working on a piece of code to
calculate percentage changes. Here we haven't written any tests for our
code yet.

This code looks like it should work:

In [46]:
def pct_change(L):
    """
    Calculates per-period percentage changes (returns)

    Parameters:
        L: a list of numbers 

    Returns a new list of percentage changes
    """
    pct_change_list = []
    for ind in range(len(L)):
        change = (L[ind]-L[ind-1])/L[ind-1]
        pct_change_list.append(change)

    return pct_change_list

# test with a specific input
pct_change([4, 2, 1, 2])

[1.0, -0.5, -0.5, 1.0]

What should happen with our test? If we calculate the percentage changes
by hand, the first change should be -50% (from 4 to 2), then the second
again -50% (from 2 to 1), and then 100% (from 1 to 2).

But the result we now get is this:

    [1.0, -0.5, -0.5, 1.0]

We have four values instead of three, and the first value is 100% (1.0
is equal to 100%)!

Let's try to debug the problem. Often, the first debugging step is
**adding a print statement** to understand what our program is actually
doing. Here, it looks like the variable `change` is somehow calculated
wrong. So, let's print out some details on how it is calculated.

In [47]:
def pct_change(L):
    """
    Calculates per-period percentage changes (returns)

    Parameters:
        L: a list of numbers 

    Returns a new list of percentage changes
    """
    pct_change_list = []
    for ind in range(len(L)):
        print('ind', ind, 'L[ind-1]', L[ind-1], 'L[ind]', L[ind])
        change = (L[ind]-L[ind-1])/L[ind-1]
        pct_change_list.append(change)

    return pct_change_list

At index 0, the calculation is wrong: it calculates the difference from
2 to 4. That is, when we access `L[ind-1]` at `ind=0`, we're accessing
`L[-1]`, which is the *last value of the list*. So, we're calculating
the change from the last to the first value, which does not make sense
here.

One way to fix this is by a condition. We'll say that our program should
not make the calculation at index zero, but produce the value `None`
instead. Then, it should work.

In [48]:
def pct_change(L):
    """
    Calculates per-period percentage changes (returns)

    Parameters:
        L: a list of numbers 

    Returns a new list of percentage changes
    """
    pct_change_list = []
    for ind in range(len(L)):
        if ind == 0:
            change = None
        else:
            change = (L[ind]-L[ind-1])/L[ind-1]
        pct_change_list.append(change)
    return pct_change_list
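Another possible fix, sketched below, is to start the loop at index 1, so that the function returns only the three changes we calculated by hand. The name `pct_change_from_second` is made up for this illustration. Which version is preferable depends on whether the output needs to have the same length as the input.

```python
def pct_change_from_second(L):
    """
    Variant of pct_change that returns len(L) - 1 changes.
    """
    pct_change_list = []
    for ind in range(1, len(L)):  # start at 1 so L[ind-1] is never L[-1]
        change = (L[ind] - L[ind-1]) / L[ind-1]
        pct_change_list.append(change)
    return pct_change_list


print(pct_change_from_second([4, 2, 1, 2]))
```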

### Debugging in Spyder

Sometimes adding print statements is not enough to figure out the
problem in the code. We can then use Spyder's built-in debugging
facilities.

Let's try this by creating a new empty Python script file and copying
the code above to the file. Then save the file and give it a name of
your choice, for example `debug_test.py`.

Spyder has a debugging menu: we can start debugging the file by clicking
on the blue play/pause icon or using `Ctrl+F5`. Notice the change in the
console, which now prompts you with `ipdb>` to signify we've entered the
debugger.

We can then run the file line-by-line using `Ctrl+F10`, or the next
button in the toolbar. Start the debugger and use this command to run the
next line. The color of the line where the debugger is will change as
you go along. When you reach the line

    pct_change([4, 2, 1, 2])

instead of stepping to the next line, let's step *into* the function.
This moves the execution point inside the function, where we can move
line by line with `Ctrl+F10`. This will define the variables as we go
along. Stepping into functions like this is useful for seeing what
happens inside our code and what may be causing an error. The Spyder
variable explorer on the top right hand side of the window will show you
the values and types of the variables at any given point in time. This
can be very useful for spotting problems. We can stop the debugger with
`Ctrl+Shift+F12` or the blue stop button.

If we want to skip to a specific point of code to debug a problem, we
can also set a breakpoint on that line (using `F12`), and run the code
until that breakpoint in the debugger.

Alternatively, if we run code in Spyder and our program crashes, we can
write `%debug` into the console. This will bring the execution back to
the point where the problem happened and we can inspect the variables at
that point. This is useful for "post-mortem" debugging.

The value of debugging using Spyder is being able to step through the
code and seeing how the variables change and where the problem occurs.

Debugging is a skill that takes a while to learn. Some common types of
bugs include:

-   Referring to a wrong variable name
-   Passing arguments to a function in the wrong order
-   List indexing
-   Mixing list values and indices when looping
-   Forgetting parentheses () when calling a function - this will just
    refer to the function object

**Exercise.** Consider the following function `fact(n)`. Given an integer
$n$, it should return the factorial $n!$, that is, multiply together
the integers from 1 to $n$.

    def fact(n):
        """ A buggy factorial"""
        accumulator = 1 # fact(0) = 1
        for i in range(1, n):
            accumulator *= i
        return accumulator

However, the code does not quite work. Copy the code into Spyder and use
the debugger to find the mistake.

## Handling exceptions

In addition to bugs in our code, a program may run into an error for
various reasons. It may try to open a file or scrape a website that does
not exist. Or it may rely on user input that may come in the wrong
format. Up to now, we have simply said that if an error comes up, this
is fatal for the program, causing it to stop executing. But this is
usually not very convenient: we don't want a website or program to crash
at the simplest error. Instead of crashing, we can *handle* errors
within programs.

Errors can be alternatively thought of as *exceptions*. Most often, an
error is a situation where a statement is outside Python's appropriate
syntax or semantics. When we started defining functions, we used the
example of converting miles to kilometers.

In [49]:
def miles_to_km(miles):
    """
    Convert miles to kilometers.
    """
    return miles * 1.609

Now, suppose that instead of using a number, we accidentally call the
function with a string input `'9'`. The interpreter finds this operation
is not well defined, and *raises an exception* (error) called TypeError.

In [50]:
miles_to_km('9')

TypeError: can't multiply sequence by non-int of type 'float'

The error causes our program to terminate. But we might expect that the
function is sometimes called with this kind of bad input. If we
anticipate this error, we can try to *handle* it so our program can
continue executing. This is often also called *catching* errors. The
idea is that instead of stopping execution, we tell Python what to do
when the exception/error occurs. Our program execution can then
continue.

You can see this type of behaviour every day when you use a computer.
For example, if you're filling out a form requiring a date on a website,
and type a date in illegal format, the program does not crash but tells
you what happened and prompts you again for a valid date.

Catching exceptions in Python is done with the following syntax:

    try:
        # block of code
    except ExceptionName:
        # code in case of error

Let's return to our error-producing example:
we tried to multiply a string by a float, which produced an error.

With the try-except statement, we can catch such errors and respond to
them. Suppose that we anticipate the input to sometimes be a string
instead of a number.

We can change the code as follows, using the above syntax to catch the
`TypeError` and print a clearer message:

In [51]:
def miles_to_km(miles):
    """
    Convert miles to kilometers.
    """
    try:
        return miles * 1.609
    except TypeError:
        print('The input should be an integer or a float.')

Now, if we try a string input, we get the message instead of an error:

In [52]:
miles_to_km('9')

The input should be an integer or a float.


Depending on how we want to design the program, we could go further and try to convert strings to integer values within the function.

In [53]:
def miles_to_km(miles):
    """
    Convert miles to kilometers.

    Try to convert string inputs into integers.
    """
    try:
        if isinstance(miles, str):
            miles = int(miles)
        return miles * 1.609
    except TypeError:
        print('The input should be a float, an integer, or an integer string.')

Now our program works for integers in string form.

In [54]:
miles_to_km('9')

14.481

Notice however that how we want to handle exceptions depends on the goal of the program. What would happen with the following function calls?

In [55]:
miles_to_km([9])

The input should be a float, an integer, or an integer string.


In [56]:
miles_to_km('9.1')

ValueError: invalid literal for int() with base 10: '9.1'

We haven't specified what to do in the latter case, which produces a
`ValueError`, and we would need to handle it differently.
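One way to handle both cases is to add a second `except` clause. The sketch below is a hypothetical variant called `miles_to_km_safe`. Note that it converts the input with `float()` rather than `int()`, which is a design choice made here for illustration, so that strings such as `'9.1'` are accepted.

```python
def miles_to_km_safe(miles):
    """
    Hypothetical variant of miles_to_km that handles both error types.
    Returns None when the input cannot be converted to a number.
    """
    try:
        return float(miles) * 1.609
    except TypeError:
        # float() cannot convert this type at all, e.g. a list
        print('The input should be a number or a numeric string.')
    except ValueError:
        # the input is a string, but it does not look like a number
        print('Could not interpret the string as a number:', miles)
    return None
```

With this version, `miles_to_km_safe('9.1')` returns a number, while `miles_to_km_safe([9])` and `miles_to_km_safe('nine')` each print a message and return `None`.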

Apart from the specific error `TypeError` that we caught above,
Python includes many other built-in exception types. Additionally, many
libraries define their own exception classes, built on the standard
ones, to provide errors that suit their needs.

We can always use the most generic form, a bare `except:` with no
specified type, to catch all errors regardless of their type.
However, in general it is good practice to be as specific as possible
about the exceptions we catch, so that we do not accidentally hide
unrelated errors and problems in the code.

Exceptions are useful more generally than for catching errors, as they
can be used to control program flow. It is also possible to define and
raise our own types of exceptions in specific situations. For more
information, see Guttag's chapter 7.2. It is also useful to learn to use
*assertions* to confirm results of computations, see Guttag 7.3.
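As a rough sketch of both ideas, the hypothetical function below raises a built-in `ValueError` for an input that makes no sense, and uses an assertion to confirm its own result. The function name and the rule that distances must be non-negative are assumptions made for this example.

```python
def checked_miles_to_km(miles):
    """
    Hypothetical variant of miles_to_km that rejects negative distances.
    """
    if miles < 0:
        raise ValueError('miles must be non-negative')
    km = miles * 1.609
    assert km >= 0, 'a non-negative distance should stay non-negative'
    return km


try:
    checked_miles_to_km(-5)
except ValueError as err:
    message = str(err)
    print('Caught:', message)
```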

Let's assume that you are given the code below:

    >>> def divide(a, b):
    ...     """divides a with b"""
    ...     return a / b

Now, if you execute `divide(10, 2)`, it should work. However, in the
case of `divide(10, 0)`, what does it do? Can you correct it with an
exception?

Complete the function `divide` in `ses05.py`.

## Reading data from files

Most algorithms in real applications require auxiliary input data from
files, or need to export results or parameters into files. Python offers
a convenient built-in way to parse files. Later on in the course, we'll
look at some application-specific libraries which include their own ways
of parsing files for their purposes.

We can use the command `open(filename, flags)` to open a file. Here,
`flags` (called the *mode* in Python's documentation) is `'r'` if we are
reading the file and `'w'` if we are writing to it (there are also some
more advanced options available, which we won't go into here).

The zip file contains the text file `odyssey.txt`, downloaded from
[Project Gutenberg](https://www.gutenberg.org/). You can open the file
by typing the following commands into Spyder's console:

In [57]:
filename = 'odyssey.txt'
f = open(filename, 'r')

The variable `f` that we have just defined contains the required
information so that we can parse the text if we want to. However, we
have just opened the file, not actually read its contents. Let's start
reading the data:

In [58]:
l = f.readline()
print(l)

ï»¿Project Gutenberg Etext The Odyssey, by Homer, Butcher & Lang Tr



We have just read one line from the file we opened. Each further call to
`f.readline()` returns the next line, until the end of the file is
reached, at which point it returns an empty string. After we're done, we
should clean up by closing the file with the command

In [59]:
f.close()

Files often contain thousands of lines, so typically we loop through
them and parse as many lines as we wish. Furthermore, we can have the
file close automatically by using the `with ... as` statement. After
the indented block following `with`, the file is closed automatically.

In [60]:
# Print every line of the file; only the beginning of the output is shown below.
with open(filename, 'r') as f:
    for line in f:
        print(line)

ï»¿Project Gutenberg Etext The Odyssey, by Homer, Butcher & Lang Tr

#3 in our series by Homer





[Please note:  this is version 08a:  it needs more proofreading

and is based on different source than version dyssy10.txt/.zip]



Copyright laws are changing all over the world, be sure to check

the copyright laws for your country before posting these files!!



Please take a look at the important information in this header.

We encourage you to keep this file on your own disk, keeping an

electronic path open for the next readers.  Do not remove this.





**Welcome To The World of Free Plain Vanilla Electronic Texts**



**Etexts Readable By Both Humans and By Computers, Since 1971**



*These Etexts Prepared By Hundreds of Volunteers and Donations*



Information on contacting Project Gutenberg to get Etexts, and

further information is included below.  We need your donations.





THE ODYSSEY OF HOMER



DONE INTO ENGLISH PROSE



by S. H. BUTCHER, M.A.



AND



A. LANG, M.A.





As an alternative to looping through the text file, we can use
`f.read()` to read the entire text into a single string, or
`f.readlines()` to read it into a list of strings, one for each line of
the text.
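
The difference between the two is easiest to see on a tiny file. The sketch below first creates a throwaway file (the name `sample.txt` and the temporary directory are assumptions made so the example is self-contained) and then reads it both ways.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

with open(path, 'r') as f:
    whole_text = f.read()        # one string containing everything

with open(path, 'r') as f:
    all_lines = f.readlines()    # a list with one string per line

print(repr(whole_text))
print(all_lines)
```

Note that each string returned by `readlines()` keeps its trailing newline character, which is also why `print(line)` in the loop above produces an extra blank line after every line.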

The above approach for reading files works in many cases. But text files
come in many shapes and forms: in particular, they can be *encoded* in
different ways, that is, use different conventions for storing
characters as bytes. A common encoding is UTF-8 (written `'utf8'` or
`'utf-8'` in Python), which can represent every character in the Unicode
standard. To read a text file with a specific encoding, we can include
it as an option:

In [61]:
filename = 'odyssey.txt'
with open(filename, 'r', encoding='utf8') as f:
    text = f.readlines()
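
Stray characters such as `ï»¿` at the start of the first line, as in the output earlier in this section, typically mean that the file begins with a UTF-8 *byte order mark* (BOM) that was decoded with a different encoding. The sketch below reproduces this on a throwaway file (the file name and its content are made up for illustration) and shows that Python's `'utf-8-sig'` codec recognises and removes the mark. Whether `odyssey.txt` needs this on your machine depends on your system's default encoding.

```python
import os
import tempfile

# Write a tiny file that starts with a UTF-8 byte order mark (BOM).
path = os.path.join(tempfile.mkdtemp(), 'bom_sample.txt')
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbfProject Gutenberg\n')

with open(path, 'r', encoding='cp1252') as f:
    wrong = f.readline()    # BOM bytes decoded as three odd characters

with open(path, 'r', encoding='utf-8-sig') as f:
    right = f.readline()    # BOM recognised and removed

print(repr(wrong))
print(repr(right))
```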

**Exercise.** What is the first word of the 16th line of the text?


Let's write a function to count words in a book. Complete the function
`count_words` in the file `ses05.py`.

**Exercise.** Use the function on the Odyssey. How many times is the word `Odysseus` used? How would you find
the most common words? The most common words with at least 10 characters?

### Writing to a file

Writing to a file works similarly to reading from a file. When we open a
file for writing data into it, we use the flag `'w'` instead of `'r'`:

In [62]:
with open('test.txt', 'w') as f:
    f.write('hey there')

This will write the string `'hey there'` to the file `test.txt`.

Note that we must be very careful when writing to files: opening an
existing file with the flag `'w'` erases its previous contents.
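
The following sketch illustrates the overwriting behaviour and contrasts it with `'a'` (append), one of the further modes that `open` accepts. It writes to a temporary directory, which is an assumption made here so that no existing file is touched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w') as f:
    f.write('hey there\n')

with open(path, 'w') as f:      # 'w' starts from an empty file again
    f.write('second write\n')

with open(path, 'a') as f:      # 'a' adds to the end of the file
    f.write('appended line\n')

with open(path, 'r') as f:
    content = f.read()

print(content)
```

The text `'hey there'` is gone after the second `open(path, 'w')`, while the appended line is added after the existing content.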

## Modules and libraries; Trump tweets revisited

There are optional exercises on analysing Trump tweets from JSON data in
the Jupyter Notebook file `trump_tweets.ipynb`. You can find
instructions for opening the file in the session materials.

There are optional exercises on web scraping in the Jupyter Notebook
file `web_scraping.ipynb`. You can find instructions for opening the
file in the session materials.

## All done!

## Review questions

Here are some questions to review the material we've covered today:

-   Can you think of situations when we'd like to use tuples, sets, and
    dictionaries instead of lists?
-   Why and when do we use exceptions?
-   Can you think of a real-world scenario where 'user input' is useful?
-   How would you explain to your neighbour what 'debugging' is?

To learn more about how dictionaries work, read this chapter on hashing
in Miller and Ranum's online book:

-   <https://interactivepython.org/runestone/books/published/pythonds/SortSearch/Hashing.html>

### Encryption

Let's build one of the most widely known encryption techniques,
*Caesar's cipher*. It works as follows: we start with a string
`str_to_encrypt` and an encoding integer `n`. We then replace each
letter in the string with an encrypted one by shifting the letter by `n`
positions in the alphabet. For example, if we set `n = 1`, each `a`
becomes `b`, each `b` becomes `c`, and so forth. We'll write a function
that performs this encryption.
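
As a starting hint rather than a solution, the built-in functions `ord` and `chr` convert between a character and its integer code point, which gives one way to think about shifting a letter. How to treat letters near the end of the alphabet, upper-case letters, and characters that are not letters is left for you to decide.

```python
letter = 'a'
code = ord(letter)        # integer code point of the letter
shifted = chr(code + 1)   # character one position later
print(letter, code, shifted)
```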

Complete the functions `caesar_cipher_encrypt` and
`caesar_cipher_decrypt` in `ses05_extra.py`. 

Once you've implemented the encryption, how can you decrypt most easily?