# Python for Computational Linguists 1.2: Introduction to Python

# Python for Linguists

Hello and welcome to the first module of Python for Computational Linguistics!

This module will introduce you to Python, the programming language that we will use throughout this course. We recommend that you go through this module even if you're already familiar with Python: a refresher never hurts.

## What is Python?

Python is a simple programming language whose aim is to help programmers write clear, readable code quickly, and its simple syntax makes it particularly suitable for beginners. However, don't be fooled by its simplicity: Python is a full-fledged programming language which, later in this course, will allow us to build complex language models and train huge neural networks!

> **<h3>💻 Try it yourself!</h3>**

To begin your career as a programmer, you can run your first Python code in the box below: click inside the box with your mouse and press the `Control` and `Enter` keys on your keyboard at the same time.

In [None]:
2 + 2

Did the number `4` appear just below the cell? Congratulations! You have just solved a very difficult mathematical problem with Python.

# Jupyter Notebooks

The web page you are visiting right now is a *Jupyter Notebook*. A Jupyter Notebook is an environment which allows you to run Python code within your browser.

Actually, the code does not run *in* your browser, but on a [*server*](https://en.wikipedia.org/wiki/Client%E2%80%93server_model); if you're running this notebook on Azure or Colab, for example, the instructions you write run on Microsoft's or Google's servers somewhere in the world, which then send the results back to this page. However, you can also run a Jupyter server on your own machine, for example your laptop; in this case, the code will effectively run on your computer.

We write code in *cells*, which are the grey boxes you see on this page.

> **<h3>💻 Try it yourself!</h3>**

Try writing

`print('Hello world!')`

in the cell below. To run the code, press `Control` and `Enter` on your keyboard, as you did before.

In [1]:
print('Hello world!')

Hello world!


You should see the string `Hello world!` appearing under the cell!

Now that we know how to run code, let's begin with some programming basics. Whenever you encounter a code cell, please assume that you should run it using the usual `Control` and `Enter` key combination, unless specified otherwise.

# Basic programming concepts

## Statements

A computer program is usually divided into many *instructions*, called *statements*, which tell the computer what operations to perform. Please note that from now on we will use the terms *statement* and *instruction* interchangeably.

For example, run the cell below:

In [2]:
print('This is a statement')
print('This is another statement')

This is a statement
This is another statement


As you can see, in Python each line corresponds to a statement. If needed, we can break an instruction across multiple lines, but this works only in special cases. For example, if you run the cell below, you will see that

In [3]:
print(
    'A multiline instruction'
)

A multiline instruction


is a valid instruction. On the other hand, if you run the cell below, you will get an **`Error`**:

In [4]:
2 +
2

SyntaxError: invalid syntax (<ipython-input-4-61f4b0ef3787>, line 1)

This happens because Python does not know how to read the code in the cell, so it *raises* an error and tries to display an informative message about what went wrong.

As a rule of thumb, statements inside parentheses can be split across multiple lines to improve the readability of the code. However, until you are more confident with the language, we advise you to write one instruction per line.
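As a small illustration of the rule of thumb above, here is a minimal sketch of a sum split across several lines inside parentheses; the variable name `total` is just an illustrative choice. Because the expression is wrapped in parentheses, Python keeps reading until the closing `)`, so the cell runs without errors, unlike the `2 +` example above.

```python
# Inside parentheses, Python joins the lines automatically,
# so this expression can be split across several lines.
total = (
    134 +
    25
)
print(total)  # prints 159
```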

Please note that when you encounter errors, you should **always** read the error type and message to understand what is happening; however, we will delve into this topic later.

We will start by using Python as a calculator. In Python, numbers support the basic mathematical operations, such as `+`, `-`, `*`, and `/`.

For example, we can sum two numbers by writing:

In [5]:
134 + 25

159

And we can multiply them by writing:

In [6]:
13.4 * 4

53.6

However, Python supports more complex operations as well, such as exponentiation (raising a number to a power). For example, $2^3$ can be written as:

In [7]:
2 ** 3

8

We can also group operations using parentheses:

In [8]:
(2 ** 3) / 3

2.6666666666666665

> **<h3>💻 Try it yourself!</h3>**

Play with basic maths operations in the cell below.

In [13]:
45//6

7

## Variables

Playing with a calculator is surely amusing, but if all we could do in Python were mathematical operations, it would obviously be quite limited. For example, what if we wanted to store the result of an operation, to reuse it later?

For this reason, virtually all programming languages offer the possibility of storing data in **variables**. This way, we can store a value in the computer's memory and reuse it later, e.g. to build more complex expressions, to pass it to other programs, and so on.

The syntax for assigning variables is

`variable_name = value`

For example, if we write and run:

In [14]:
a = 5

We saved the value `5` in the variable `a`. To see the value stored in a variable, we can simply write its name in a cell, e.g.:

In [15]:
a

5

> **<h3>💻 Try it yourself!</h3>**

Let's play with some variables! Can you guess the results of the cells below before running them?

In [16]:
a = 5
b = 2

In [17]:
a ** b

25

In [18]:
(a + b) / 3

2.3333333333333335

In [19]:
(b * 2) / a - 3

-2.2

In [20]:
string1 = 'hello'
string2 = 'world'
print(string1 + ' ' + string2)

hello world


Did you guess everything correctly? Good! If you are puzzled by the concept of *summing texts*, don't worry: we'll explain that later.

You can also **update** the content of a variable:

In [21]:
d = a + 3
d

8

In [22]:
a = 1
d = a + 3
d

4
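One detail of the two cells above is worth making explicit: `d` is computed once, from the value `a` has at that moment, and it does not follow later changes to `a`. The sketch below repeats the same steps in a single cell, with comments showing when `d` actually changes.

```python
a = 5
d = a + 3   # d is computed once, using the current value of a
a = 1       # changing a afterwards...
print(d)    # ...does not change d: prints 8
d = a + 3   # d changes only when we assign it again
print(d)    # prints 4
```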

Please note that in Python variable names **can't** contain spaces. In fact, a variable name can contain only alphanumeric characters (i.e. letters and numbers) and underscores, i.e. the character `_`. Moreover, a variable name can't begin with a number.

> **<h3>💻 Try it yourself!</h3>**

Which of the following instructions will run, and which of them will fail?

In [23]:
a_number = 2

In [24]:
_a_number = 2

In [25]:
2_number = 22

SyntaxError: invalid decimal literal (<ipython-input-25-b149423fb2f5>, line 1)

In [26]:
a_number =
2

SyntaxError: invalid syntax (<ipython-input-26-3cdda63118ad>, line 1)

In [27]:
twoIs__anumber = (
    2)
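If you want to check a name without running an assignment, strings have an `isidentifier()` method that tells you whether some text follows the naming rules above. It checks the spelling only: reserved words such as `for` or `if` pass this check but still can't be used as variable names, which is what `keyword.iskeyword` from the standard `keyword` module reports. The names tested here are just illustrative examples.

```python
import keyword

print("a_number".isidentifier())    # True
print("_a_number".isidentifier())   # True
print("2_number".isidentifier())    # False: starts with a digit
print("my number".isidentifier())   # False: contains a space
print("for".isidentifier())         # True: the spelling is valid...
print(keyword.iskeyword("for"))     # True: ...but "for" is a reserved word
```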

> **<h3>💻 Try it yourself!</h3>**

Knowing that the area of a rectangle is base $\times$ height, calculate the area of a rectangle with base 4 and height 9 in the cell below.

1. Create a variable `base` with value 4;
1. Create a variable `height` with value 9;
1. Multiply the two variables to see the result.

In [29]:
base = 4
height = 9
base*height

36

## Comments

Sometimes, when you write code, you may want to describe what a line does, to help your future self understand what that code means. This is achieved using **comments**, pieces of text which are *ignored* by Python when you run the code.

In particular, everything written after a `#` is ignored **until the end of the line**. For example, if we write:

In [30]:
# 2 + 3

Nothing happens! Other examples of comments are:

In [31]:
3 + 2  # this is a comment

5

However, **you can't break a comment across multiple lines**: each line of a comment needs its own `#`.

In [32]:
3 + 2 # this is
a broken comment

SyntaxError: invalid syntax (<ipython-input-32-f506ed4d4dba>, line 2)

In [33]:
3 + 2 # this is
      # a valid comment

5

**Pro tip**: you can use *multiline strings* as comments. A multiline string is a string enclosed in triple quotes, such as:

In [34]:
""" This is
another comment
"""

3 + 2

5

Note that Python didn't display the string: in a notebook, only the value of the last expression in a cell is shown, which here is `3 + 2`. However, this kind of comment has its limits as well:

In [35]:
3 + 2 """ This comment style works only on non-code lines """

SyntaxError: invalid syntax (<ipython-input-35-75ac46bdd079>, line 1)

In [36]:
3 + """ For example, we can't split a code line like this """ 2

SyntaxError: invalid syntax (<ipython-input-36-aebdf69d22f0>, line 1)

## Errors

So now we have seen what happens when we write bad code: Python refuses to run it and raises an **`Error`**. However, one can (and will) make many, many kinds of errors while writing code, and it's always important to understand them, so that you can fix them and, ideally, avoid repeating them.

For example, the expressions we've seen when talking about variables work because the variables `a`, `b`, and `d` have been previously created. Look what happens if we try to use a variable which has never been assigned:

In [37]:
z + 5

NameError: name 'z' is not defined

This error is different from the ones we encountered before. If we read the messages, the previous errors were `SyntaxError`s, i.e. errors related to the syntax of the code we wrote; now the error is a `NameError`, i.e. Python is telling us that we're referencing the name of a variable that does not exist.

When you encounter an error, you should **always** read the error type and message to know what you're doing wrong. Messages are usually informative and [will help you to solve the problem](https://geekandpoke.typepad.com/geekandpoke/2009/06/the-art-of-bugfixing-chapter-1.html).

There are two kinds of problems that Python can report:
- **Errors** (also called *exceptions*, such as `SyntaxError` or `NameError`) are serious problems that make your program stop, or crash. They often point to bugs in your code.
- **Warnings** are messages that *won't* make your program crash. However, you should always read them, because they may point to potential errors in your code.
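The difference can be seen in a minimal sketch: `warnings.warn`, from Python's standard `warnings` module, issues a warning and the program keeps running, whereas an error stops the cell at the line where it happens. The failing line (`int("ten")`, an example we chose for illustration) is commented out so that the cell runs; uncomment it to see the error.

```python
import warnings

# A warning prints a message but does NOT stop the program.
warnings.warn("This is only a warning")
print("Still running after the warning")

# An error, instead, stops the cell at the line where it happens.
# Uncomment the next two lines: the print is never reached.
# int("ten")
# print("This line is never reached")
```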

# Built-in data types and functions

## Built-in functions

Python offers a set of built-in functions to perform some basic operations. We have already encountered the simplest of them all: the `print` function, which allowed us to write some text below the code cells.

Functions are called this way:

```
function_name(argument_1, argument_2, ... )
```

For example, `print()` accepts some text as an argument and prints it out in the browser (or on a console). Let's see again how it works:

In [38]:
print("Nothing will come of nothing.")

Nothing will come of nothing.


`print` will actually try to, um, print everything you pass to it. For example:

In [39]:
print(a)
print(3 ** 2)
print(a - b ** 7)

1
9
-127


Some functions may also **return** a value and/or accept more than one argument. But what does *return* mean?

Take for example this line:


In [40]:
a = min(2,5)

`min` selected the smaller of the two values, 2 and 5, and *returned* it, allowing us to save the result in the variable `a`. Now, if we look at the value of `a`, we get:



In [41]:
a

2

However, not all functions return a value. For example:

In [42]:
a = print('hello')
print(a)

hello
None


See? While `print` showed `hello` on screen, it did not return any value, so `a` contains `None`, Python's special value meaning "nothing". Another example of a function that does not return a value is `sleep` from the module `time` (we'll explain what modules and packages are later), which pauses your program for a given number of seconds:

In [44]:
import time
print('going to sleep.')
a = time.sleep(5) # halts the execution for 5 seconds.
                  # Notice how the next line is executed after
                  # a 5 second delay
print("I'm awake!")
print(a)

going to sleep.
I'm awake!
None


You can use the return value of a function as you wish: you can store it in a variable, use it in another computation, and so on. For example, can you guess what these functions are doing?

In [45]:
abs(-3)

3

In [46]:
pow(2,3)

8

In [47]:
a = pow(-2,3)
print(a)
b = abs(a) * 2
print(b)

-8
16


> **<h3>💻 Try it yourself!</h3>**

Try to write the following expression in the cell below: $\left| (3 \times (-2))^{-2} \right|$

Note:
- $|a|$ denotes the absolute value of $a$.
- The result should be $0.02\overline{7}$.

In [48]:
abs(pow((3 * (-2)),-2))

0.027777777777777776

<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p><code>abs(pow((3 * (-2)),-2))</code></p>
</details>

Will the code in the cell below work? Can you explain why?

In [49]:
print(2 + print(3))

3


TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p>As explained before, <code>print</code> only shows content on the screen, but it does not return any value. That's why the cell first shows <code>3</code> (the inner <code>print(3)</code> runs first) and then fails: Python is asked to add <code>2</code> to the return value of <code>print(3)</code>, which is <code>None</code> rather than a number (hence the <code>NoneType</code> in the error message).</p>
</details>

## Types

The built-in function `type()` tells us the **type** of a value. For example, run the following cell:

In [50]:
type(2)

int

Do you remember the variables we defined above? Let's check their types by running the cells below:

In [51]:
print(a)
type(a)

-8


int

In [52]:
print(string1)
type(string1)

hello


str

The **type** is a property which tells Python (or, in general, any programming language) how to deal with the object we are giving it, and which operations we can perform on it.


Now, `a` is an **`integer`** (in Python, `int`), i.e. a whole number. Hence, Python knows that it can add, multiply, and divide it. `string1`, instead, is a **string** (`str`), i.e. a piece of text. While it's obvious that we can't "divide" a string, we can perform other kinds of operations on it, e.g.:

In [53]:
print(string1)
print(string1[0])   # get the first character of 'hello'
string1 + string2   # concatenate two strings

hello
h


'helloworld'

We will see later what these operations mean; for now, you can try to guess what we're doing here.

Python offers, among others, these basic built-in types, which we will describe in detail below:

- Numbers, i.e. integers, floating point numbers (i.e. numbers with a decimal part), and complex numbers.
- Strings, i.e. text.
- Booleans, i.e. the truth values `True` and `False` of [Boolean algebra](https://en.wikipedia.org/wiki/Boolean_algebra).

You don't need to know much else about types for now. However, we will sometimes use `type` to see how Python handles data.

## Numbers

We already encountered numbers. Now, we'll see some operations that Python offers to handle them.

In [54]:
print(1 + 2)    # sum
print(3 - 7)    # subtraction
print(2 * 3)    # multiplication
print(5 / 6)    # division
print(2 ** 3)   # power

3
-4
6
0.8333333333333334
8


Python offers several built-in mathematical functions, such as:

In [55]:
print(pow(2,3))             # power
print(abs(-3))              # absolute value
print(round(987.654321,3))  # rounding to the nth decimal
print(round(22/7,2))        # does this ring any bell?
print(max(10,1000))         # maximum value between two numbers
print(min(-1000,-10))       # minimum value between two numbers

8
3
987.654
3.14
1000
-1000


> **<h3>💻 Try it yourself!</h3>**

Can you guess the difference between the two division operators?

In [56]:
7/2

3.5

In [57]:
7//2

3

As we know, computers store information using bits. For this reason, non-integer numbers are stored using the [floating point representation](https://en.wikipedia.org/wiki/Floating-point_arithmetic), which can only approximate most decimal values. You don't need to know the details of how it works; however, you should be aware that `1` and `1.0` are two different things in Python.

In fact, let's see their types:

In [58]:
print(type(1))     # Ask Python to save 1 as an integer
print(type(1.0))   # Ask Python to save 1 as a real number

<class 'int'>
<class 'float'>
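To see what "two different things" means in practice, here is a short sketch: `1` and `1.0` compare as equal but have different types, and the floating point approximation shows up in simple sums. The use of `math.isclose` from the standard `math` module is our suggestion for comparing floats, not something introduced elsewhere in this module.

```python
import math

print(1 == 1.0)                      # True: the two values are equal...
print(type(1) == type(1.0))          # False: ...but their types differ
print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False: a tiny rounding error
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare floats with a tolerance
```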


Now that you know that, what is the difference between `/` and `//`?

<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p><code>/</code> is called <i>true division</i>, and it always returns the result of the division as a floating point number (e.g. <code>7/2</code> is <code>3.5</code>).</p>
    <p><code>//</code> is called <i>integer division</i> or <i>floor division</i>, and it always rounds the result of the division down to the nearest integer. For positive numbers, this is the same as removing the decimal part from the result.</p>
    <p>While this difference may seem meaningless to you, it caused <a href="https://docs.python.org/3/whatsnew/2.2.html#pep-238-changing-the-division-operator">quite a stir</a> when it was introduced into Python!</p>
</details>
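Floor division really does round *down*, which matters for negative numbers: there it differs from simply dropping the decimal part, which is what `int()` does. The sketch below compares the two; the specific numbers are just examples.

```python
print(7 / 2)        # 3.5
print(7 // 2)       # 3
print(-7 // 2)      # -4: rounded down, i.e. toward negative infinity
print(int(-7 / 2))  # -3: int() simply drops the decimal part
print(7.0 // 2)     # 3.0: with a float operand, the result is a float
```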

## Booleans

The Boolean values `True` and `False` are the *truth values* associated with a statement; for example, `Shakespeare was an English poet` is a true statement, and `Claudius is a character in Romeo and Juliet` is a false statement.

Unfortunately, Python can't understand natural language statements. However, [Boolean algebra](https://en.wikipedia.org/wiki/Boolean_algebra) is widely used in programming; for example, we can compare numerical values using the classic comparison operators:

+ less than `<` and greater than `>`
+ less than or equal to `<=` and greater than or equal to `>=`
+ equal to `==` and not equal to `!=`

The results of comparison operations are of type **Boolean**.

In [59]:
2 < 3

True

In [62]:
a = 4
b = 3
c = 3

### ❓ Quiz  

Can you guess the result of these operations before running the cell below?

In [63]:
print(a < b)
print(b > a)
print(b < c)
print(b <= c)

print(a == b)
print(a != b)

d = (a == b)
print(type(d)) # What type is d?

False
False
False
True
False
True
<class 'bool'>


## Strings

*String* is the fancy name used by programmers for sequences of characters, i.e. for *textual* values; in Python, their type is called `str`.

Since we have already encountered them, you should already know how to create a string: you just have to place the text between single (`'`) or double (`"`) quotation marks.

In [64]:
s1 = '' # empty string
s2 = "hello"
s3 = 'world'
s4 = 'hello world'
s5 = "2345"

We can perform a wide range of operations over strings. For example, we can get their length:

In [65]:
print(len(s1))
print(len(s2))

0
5


We can *concatenate* strings using the `+` operator:

In [66]:
print(s2 + s3) # we concatenate s2 and s3
print(s2 + " " + s3)

helloworld
hello world


Notice that `s2 + " " + s3` is equal to the string `"hello world"`:

In [67]:
(s2 + " " + s3) == s4

True

If needed, we can even *multiply* strings, i.e. *repeat* them:

In [68]:
print(s2 * 3)  # we repeat s2 three times

hellohellohello


### Indexing and slicing

Other common operations over strings are *indexing* and *slicing*. **Indexing** allows us to get the $n$-th element of any sequence of elements, using this syntax:
    
```python
variable[index]
```

returns the element at position `index` of the variable `variable`.

In [69]:
s = 'this is an example'

# indexing (to access a single character of the string)
print(s[0]) # print the first character of the string
print(s[1]) # print the second character of the string

t
h


If you're not familiar with programming, you are probably asking yourself why getting the *zeroth* element of our string did not result in an error.

<a id='zero-based'></a>
This happens because in Python indices start at zero. This is called **[Zero-based indexing](https://en.wikipedia.org/wiki/Zero-based_numbering)**, and it is a convention used in most programming languages: an index can be read as the offset from the beginning of the sequence, so the first element is at offset 0.

This also means that, if a string has five characters, e.g. `hello`, its last element will have index 4:

In [70]:
'hello'[4]

'o'

Please note that strings are **immutable**, i.e. you can't change their content. For example, you can't do things like:

In [71]:
s[2] = 'x'

TypeError: 'str' object does not support item assignment

However, since `s` is just a variable, you can assign a completely new string to it. Let's see other examples:

In [72]:
s = 'Romeo and Juliet'
print(s[0])
print(s[15])

R
t


However, strings are not all of the same length. How do we get the last character of a string without having to count by hand, every time, how many characters it contains?

In [73]:
print(s[len(s) - 1])       # we can use len()
print(s[-1])               # or we can use the negative notation

t
t


As we've seen in this last example, **negative indexing** tells Python to count from the end of the string: `-1` is the last character, `-2` the second-to-last, and so on:

In [74]:
'hello'[-4]

'e'

If we need more than one character from a string, we can use **slicing**. The syntax is:
```
variable[start_position:end_position]
```
For example, to get the first two characters of a string, we write:

In [75]:
'hello'[0:2]

'he'

Please note that a slice does *not* include the character at the end position (the right index).

Let's see some other examples:

In [76]:
print(s)
print(s[0:2])
print(s[:2])  # the same as s[0:2]
print(s[5:])  # the same as s[5:len(s)]

Romeo and Juliet
Ro
Ro
 and Juliet


As you have seen in this example, we can **omit** one of the two indices of the slice if we want Python to start from the beginning of the string (omitting the left index) or to go up to its end (omitting the right index).
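Excluding the end position has two handy consequences, shown in the sketch below: `title[:k] + title[k:]` always rebuilds the whole string, and `title[i:j]` contains exactly `j - i` characters. Since slices are new strings, they also let us build a modified copy of a string, which is how we work around the fact that strings are immutable. The variable names `title`, `k`, and `lowered` are just illustrative choices.

```python
title = 'Romeo and Juliet'
k = 5
print(title[:k] + title[k:] == title)  # True: the two slices fit together exactly
print(len(title[2:7]))                 # 5, i.e. 7 - 2

# Slices are new strings, so we can use them to build a modified copy
lowered = title[:10] + 'j' + title[11:]
print(lowered)                         # Romeo and juliet
print(title)                           # Romeo and Juliet: title is unchanged
```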


### ❓ Quiz  

Can you guess the result of these operations before running the cell below?

In [77]:
print(s[1:2])
print(s[-3:])

o
iet


<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p>The first instruction gets the second character of the string, since it prints the span starting from <code>s[1]</code> and ending with (but <b>not including</b>) <code>s[2]</code>.</p>
    <p>The second instruction uses <b>negative indexing</b> to get the slice from the third-to-last character of the string (<code>-3</code>) to its end.</p>
</details>


### Built-in string functions

Now we will introduce a new kind of function, i.e. **object methods**. The syntax for this kind of function is called *dot notation* and works this way:
```python
object.function()
```
This syntax tells Python to apply the method `function()` to `object`. Each type has its own set of methods; for example, it would not make sense to take the square root of a string, or to replace all the threes in a number with the dollar symbol.

For example, we can find specific substrings in a string using `string.find()`:

In [78]:
print(s)
print(s.find("Juliet"))
print(s.find("Othello"))

Romeo and Juliet
10
-1


`find()` returns the index where the given substring starts, or `-1` if the given substring is not present in the input string.
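`find()` also accepts an optional second argument: the position from which to start searching. This is useful to locate a later occurrence of a substring that appears more than once. The example sentence and the variable name `line` are illustrative.

```python
line = "to be or not to be"
print(line.find("be"))      # 3: the first occurrence
print(line.find("be", 4))   # 16: start searching from index 4
print(line.find("Hamlet"))  # -1: not found
```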

`replace()` allows us to find a substring and replace it with something new:

In [79]:
d = s.replace("Romeo", "King Lear")

print(s)
print(d)

Romeo and Juliet
King Lear and Juliet


Please notice how the string assigned to the variable `s` is not modified (strings are immutable): the result of the operation is stored in the new variable `d`.

You should also be aware that `replace` will replace _all_ occurrences of the given substring:

In [80]:
s = s.replace(" ", "_")
print(s)

Romeo_and_Juliet
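If you only want to replace the first occurrences, `replace` takes an optional third argument: the maximum number of replacements to make. The sketch uses a separate variable, `title`, so that it doesn't change `s` for the next cells.

```python
title = 'Romeo and Juliet'
print(title.replace(" ", "_"))     # Romeo_and_Juliet: every space is replaced
print(title.replace(" ", "_", 1))  # Romeo_and Juliet: only the first space
```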


In [81]:
# what happens if the requested string does not exist in the input one?
s.replace("Desdemona","Ophelia")

'Romeo_and_Juliet'

Other useful operations on strings are the following:

In [82]:
x = "This is a nice University"

# Is a substring contained in a string?
print('is' in x)
print('ix' in x)

# convert to upper/lowercase
print(x.upper())
print(x.lower())

# count how many times a substring occurs
print(x.count('i'))
print(x.count('is'))

# join the characters of a string with a given delimiter
print("-".join(x))
print("*".join(x))

# split the string at a delimiter:
# this creates a list (see below) with the obtained substrings
print(x.split("nice"))
print(x.split(" "))     # delimiter found multiple times
print(x.split("x"))     # delimiter not found: the list contains the entire string as its only element

True
False
THIS IS A NICE UNIVERSITY
this is a nice university
5
2
T-h-i-s- -i-s- -a- -n-i-c-e- -U-n-i-v-e-r-s-i-t-y
T*h*i*s* *i*s* *a* *n*i*c*e* *U*n*i*v*e*r*s*i*t*y
['This is a ', ' University']
['This', 'is', 'a', 'nice', 'University']
['This is a nice University']
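In practice, `join` is most often used together with `split`: `split` breaks a sentence into a list of words, and `join` glues a list of words back together with the delimiter of your choice. Called with no argument, `split()` splits on any run of whitespace. The variable name `words` is an illustrative choice.

```python
words = "This is a nice University".split()  # no argument: split on whitespace
print(words)            # ['This', 'is', 'a', 'nice', 'University']
print(" ".join(words))  # This is a nice University
print("_".join(words))  # This_is_a_nice_University
```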


We usually cannot mix strings and numbers. If we do, we may obtain something different than expected:

In [83]:
number142 = 142
string142 = '142'

print(number142)
print(string142)

print(type(number142))
print(type(string142))

print(number142 * 3)
print(string142 * 3)

print(number142 == string142)

142
142
<class 'int'>
<class 'str'>
426
142142142
False


> **<h3>💻 Try it yourself!</h3>**

Play with strings in the cell below.

Given the string we saved in variable `othello`, you should:

- print the string;
- determine if `Desdemona` is a substring;
- find the position of `what`;
- convert the string to uppercase;
- get the first three characters of the string, convert them to lowercase, and print them.

In [85]:
othello = 'Men should be what they seem'

print(othello)
print("Desdemona" in othello)
print(othello.find("what"))
print(othello.upper())
print(othello[:3].lower())



Men should be what they seem
False
14
MEN SHOULD BE WHAT THEY SEEM
men


<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p><code>print(othello)</code></p>
<p><code>print('Desdemona' in othello)</code></p>
<p><code>print(othello.find('what'))</code></p>
<p><code>print(othello.upper())</code></p>
<p><code>print(othello[:3].lower())</code></p>
</details>

## Converting between types

What happens if we want to treat a string like a number, or vice versa?

For example, we may want to add the number contained in a string to an actual number, like
```python
"10" + 10
```

Or, we may want to append a number to a string, like
```python
"Shakespeare wrote " + 17 + " comedies"
```

What happens if we try to do the former?

In [86]:
"10" + 10

TypeError: can only concatenate str (not "int") to str

An error! Python tells us that we can't add a string and a number. So, in order to do that, we need to **convert** our value to the desired type:

In [87]:
int("10") + 10

20

In [88]:
"Shakespeare wrote " + str(17) + " comedies"

'Shakespeare wrote 17 comedies'

As you can see, we can use the name of a type, such as `int` or `str`, as a function, in order to convert a value of one type to another.

It is **very** important to always remember the data type of our variables. If not, the results may be very different than expected:

In [89]:
a = "3"
b = "4"

print(a + b)
print(int(a) + int(b))

34
7
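A few more conversions are worth knowing, sketched below with illustrative values: `float()` reads decimal numbers from strings, `int()` applied to a float drops the decimal part, and surrounding spaces in a numeric string are ignored. A string that does not contain a whole number, such as `"3.5"` or `"ten"`, can't be converted with `int()`; those lines are left as comments because they would raise a `ValueError`.

```python
print(float("3.5"))    # 3.5: from string to float
print(int(3.9))        # 3: from float to int, the decimal part is dropped
print(int("  42  "))   # 42: surrounding spaces are ignored
print(str(3.5) + "!")  # 3.5!
# int("3.5") and int("ten") would both raise a ValueError,
# because neither string contains a whole number.
```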


# Composite data types: tuples, lists, sets, and dictionaries

Now we will see some more complex data types, i.e. tuples, lists, sets, and dictionaries. What these types have in common is that they allow us to store *several values* inside a single variable. While a number or a string holds only *one* value, it is often useful to keep several pieces of information together in one variable.

For example, what if we wanted to keep all the titles of Shakespeare's comedies in a single variable?

## Tuples

Tuples are the most basic composite data type. A tuple is a sequence of elements, much like a string is a sequence of characters. The syntax for defining tuples is
```python
( element_1, element_2, ... element_n )
```

Let's write some example tuples:

In [90]:
(1, 2, 3)

(1, 2, 3)

In [91]:
('hello', 'world')

('hello', 'world')

In [92]:
x = ('One', 2, 'three', 4.0, False)
print(x)

('One', 2, 'three', 4.0, False)
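One detail of the tuple syntax is easy to miss: a tuple with a single element needs a trailing comma, because `(5)` is just the number 5 in parentheses. In fact, it is the commas that make a tuple, and in simple assignments the parentheses can even be omitted. The variable names below are illustrative.

```python
not_a_tuple = (5)         # just the number 5 in parentheses
one_tuple = (5,)          # a tuple with one element: note the trailing comma
print(type(not_a_tuple))  # <class 'int'>
print(type(one_tuple))    # <class 'tuple'>

also_a_tuple = 1, 2, 3    # parentheses are optional here
print(also_a_tuple)       # (1, 2, 3)
```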


As you can see, elements within tuples can be of any type. You can slice and index tuples exactly as you do with strings:

In [93]:
print(x[0])
print(x[0:2])
print(x[-1])

One
('One', 2)
False


You can use some of the operations we have seen for strings on tuples, too:

In [94]:
T = (1, 2, 3, 4, 3, 2, 1)
print(len(T))
print((1,2) + (3,4))
print(T.index(4))  # the index of the first matching 4 in the tuple
print(T.count(2))  # the number of times 2 occurs in the tuple

7
(1, 2, 3, 4)
3
2


Like strings, tuples are immutable: once they are created, we cannot change them (i.e. no item assignment, no appending, and so on).

In [95]:
T[0] = 2

TypeError: 'tuple' object does not support item assignment

## Lists

The simplest way to describe lists is as *mutable tuples*. They are defined this way:

```python
[ element_1, element_2, ... element_n ]
```

Let's look at an example:

In [96]:
L = [1,2,3,4,5]
print(L)
L[3] = 0
print(L)

[1, 2, 3, 4, 5]
[1, 2, 3, 0, 5]


Note to the reader: if you are asking yourself why we wrote `L[3] = 0` and the **fourth** element of the list was modified, go back and review [zero-based indexing](#zero-based)!

Like strings and tuples, we can use indexing, slicing, and other useful operations:

In [97]:
#indexing
print(L[0])

#slicing
print(L[:-1])

#concatenate with another list
L = L + [4, 3, 2]
print(L)

print(L.index(5))
print(L.count(3))

1
[1, 2, 3, 0]
[1, 2, 3, 0, 5, 4, 3, 2]
4
2


Lists also support some useful methods of their own, and work with built-in functions such as `max`, `min`, and `sum`:

In [98]:
L.sort()
print(L)

L.reverse()
print(L)

print(max(L))
print(min(L))
print(sum(L))

[0, 1, 2, 2, 3, 3, 4, 5]
[5, 4, 3, 3, 2, 2, 1, 0]
5
0
20


> **<h3>💻 Try it yourself!</h3>**

Which of the methods explained above work on strings and tuples too? Try it in the cell below!

In [105]:
T = (1,2,3)
L = [1,2,3]
L.sort()
print(L)

# try T.sort(), L.sort() ...

[1, 2, 3]


You may be asking yourself: if tuples are just immutable lists, why bother using them?

Part of the answer is *performance*. Tuples are generally slightly faster to create and require less memory than lists, so they are often used when you just need to store some data; moreover, since they are immutable, you can be sure that these data won't be modified by other parts of the program. On the other hand, if you know that you'll need to update a sequence (change, add, or remove elements), you need a list.
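The memory claim can be checked with `sys.getsizeof` from the standard `sys` module, which returns the size in bytes of the container object itself (not of the elements it holds). This is a sketch for CPython, the standard Python implementation: the exact numbers depend on your Python version and platform, so we don't show them, but the tuple should come out smaller than the list with the same elements.

```python
import sys

numbers_tuple = (1, 2, 3)
numbers_list = [1, 2, 3]

# Size in bytes of each container object (exact values vary by version)
print(sys.getsizeof(numbers_tuple))
print(sys.getsizeof(numbers_list))
```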

You don't need to know more for now. However, since both structures are extensively used in Python, you'll frequently have to deal with them, so it is important to know their differences.

### Updating lists

So, what does it mean that lists are *mutable*?

When we studied tuples, we saw that we can't modify them after they have been created. For example, if you run this code:
```python
T = (1, 2)
T[0] = 2
```
it would raise a `TypeError`.

Lists, on the other hand, can be *updated*: we can modify the elements inside a list, add, and remove them. Let's see some examples:

In [106]:
tragedies = [
    "Antony and Cleopatra",
    "Coriolanus",
    "Cymbeline",
    "Hamlet",
    "Julius Caesar",
    "Othello",
    "Romeo and Julia",
    "Timon of Athens",
    "Titus Andronicus",
    "Troilus and Cressida",
]

print(tragedies)

['Antony and Cleopatra', 'Coriolanus', 'Cymbeline', 'Hamlet', 'Julius Caesar', 'Othello', 'Romeo and Julia', 'Timon of Athens', 'Titus Andronicus', 'Troilus and Cressida']


Do you notice the mistakes in the list? Let's fix them!

In [107]:
# Who's Julia? Let's find the index of the wrong element in the list.
tragedies.index("Romeo and Julia")

6

In [108]:
# Now, let's update it:
tragedies[6] = "Romeo and Juliet"
print(tragedies)

['Antony and Cleopatra', 'Coriolanus', 'Cymbeline', 'Hamlet', 'Julius Caesar', 'Othello', 'Romeo and Juliet', 'Timon of Athens', 'Titus Andronicus', 'Troilus and Cressida']


That's great, but where is Macbeth? We need to insert it in the list!

The method `append` adds an element to the end of a list:

In [109]:
tragedies.append("Macbeth")
print(tragedies)

['Antony and Cleopatra', 'Coriolanus', 'Cymbeline', 'Hamlet', 'Julius Caesar', 'Othello', 'Romeo and Juliet', 'Timon of Athens', 'Titus Andronicus', 'Troilus and Cressida', 'Macbeth']


Let's sort the list now:

In [110]:
tragedies.sort()
print(tragedies)

['Antony and Cleopatra', 'Coriolanus', 'Cymbeline', 'Hamlet', 'Julius Caesar', 'Macbeth', 'Othello', 'Romeo and Juliet', 'Timon of Athens', 'Titus Andronicus', 'Troilus and Cressida']


Lists also allow us to add and remove elements by index, or to remove an exact element (*remove by value*):

In [111]:
L = [1, 'a', True]

# append at the end
L.append('b')
print(L)

# delete the item at index 1 and return it
print(L.pop(1))
# delete the item at index 0
del L[0]
print(L)
# delete the first item that matches a given value
L.remove('b')
print(L)

# insert: L.insert(position, item) inserts item at the given position of L
L.insert(0, 'a')
print(L)

[1, 'a', True, 'b']
a
[True, 'b']
[True]
['a', True]
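Two related details, sketched below with an illustrative list called `letters`: when called without an index, `pop` removes and returns the *last* element, and `remove` raises a `ValueError` if the value is not in the list (that line is commented out so the cell runs).

```python
letters = ['a', 'b', 'c']
last = letters.pop()  # without an index, pop removes and returns the last element
print(last)           # c
print(letters)        # ['a', 'b']
# letters.remove('z') would raise a ValueError, because 'z' is not in the list
```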


Did you notice that we never wrote `L = L.append(...)`, but just `L.append(...)`? This is because, as we've just seen, lists are mutable, so most of the operations we perform on them modify the list in place.
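To see why `L = L.append(...)` would be a mistake, here is a small sketch (the variable names are just for illustration): `append` changes the list in place and returns `None`, so assigning its result back to the variable would throw the list away.

```python
words = ['to', 'be']

# append changes the list in place...
result = words.append('or')
print(words)   # ['to', 'be', 'or']

# ...and returns None, so `words = words.append(...)` would replace the list with None
print(result)  # None
```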

We can also concatenate lists, exactly as we do with strings:

In [112]:
[1,2,3] + [4,5,6]

[1, 2, 3, 4, 5, 6]

Remember that we can also use type names as functions to convert between types, so for example we can convert a tuple to a list and vice versa:

In [113]:
T = (1,2)
L = [3,4]
print(list(T))
print(tuple(L))

[1, 2]
(3, 4)


> **<h3>💻 Try it yourself!</h3>**

Given the lists in the cell below, concatenate them and save the result in a third variable called `L3`. Then, remove all the male characters from the list. Finally, sort the list, and print its last element.

In [115]:
L1 = ['Anthony', 'Othello', 'Lady Macbeth', 'King of France', 'Iago']
L2 = ['Hamlet', 'Cleopatra', 'Ophelia', 'Ariel', 'Agamemnon', 'Rosalind']

In [117]:
# Write your answer here
L3 = L1 + L2
L3.remove("Anthony")
L3.remove("Othello")
L3.remove("King of France")
L3.remove("Iago")
L3.remove("Hamlet")
L3.remove("Agamemnon")
L3.sort()
print(L3[-1])


Rosalind


<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p><code>L3 = L1 + L2</code></p>
    <p><code>print(L3)</code></p>
    <p><code>L3.remove('Anthony')</code></p>
    <p><code>L3.remove('Othello')</code></p>
    <p><code>L3.remove('King of France')</code></p>
    <p><code>L3.remove('Iago')</code></p>
    <p><code>L3.remove('Hamlet')</code></p>
    <p><code>L3.remove('Agamemnon')</code></p>
    <p><code>print(L3)</code></p>
    <p><code>L3.sort()</code></p>
    <p><code>print(L3[-1])</code></p>
</details>

### ❓ Quiz

Can you guess the result of this line of code before running the cell below?

In [118]:
print(list('hello world'))

['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']


<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p>This line converts the string <code>hello world</code> into a list of its characters (the space included). This shows why lists and strings behave so similarly: both are sequences, so indexing, slicing, <code>len</code> and concatenation work on both. Strings, however, are immutable, and they come with extra text-specific methods, like uppercasing, lowercasing, and so on.</p>
</details>
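One important difference, though: strings are *immutable*, while lists are not. Here is a minimal sketch of what happens when we try to change a single character of each:

```python
letters = list('hello')
letters[0] = 'j'          # fine: lists are mutable
print(letters)            # ['j', 'e', 'l', 'l', 'o']

word = 'hello'
try:
    word[0] = 'j'         # strings are immutable, so this raises a TypeError
except TypeError as error:
    print('TypeError:', error)

# to "change" a string, we build a new one instead
word = 'j' + word[1:]
print(word)               # jello
```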

## Sets

Sets are, simply put, collections that do not allow duplicates. Sets are *unordered*, hence we can't access the $n$-th element of a set as we do with lists.

The syntax for sets is:
```
{ element_1, element_2, ..., element_n }
```

As usual, let's see some examples.

In [119]:
# create a set:
a = {1, 'a', 'b'}
print(a)

# we can also create a set from a list: it will only keep the unique items
b = set([1, 'a', 'a', 'b'])
print(b)
print(len(b))

{1, 'a', 'b'}
{1, 'a', 'b'}
3
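A few more things worth knowing about sets, shown in a short sketch: `{}` creates an empty *dictionary*, not an empty set (use `set()` instead); the `in` operator checks whether an element belongs to a set; and, since sets are unordered, trying to index them fails.

```python
empty = set()            # an empty set
not_a_set = {}           # careful: this is an empty dictionary!
print(type(empty))       # <class 'set'>
print(type(not_a_set))   # <class 'dict'>

vowels = {'a', 'e', 'i', 'o', 'u'}
print('e' in vowels)     # True
print('y' in vowels)     # False

try:
    vowels[0]            # sets have no positions, so indexing raises a TypeError
except TypeError as error:
    print('TypeError:', error)
```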


We can also add elements to a set and remove elements from it:


In [120]:
tragedies = {
    "Antony and Cleopatra",
    "Coriolanus",
    "Cymbeline",
    "Hamlet",
    "Julius Caesar",
    "King Lear",
    "Othello",
    "Romeo and Juliet",
    "Timon of Athens",
    "Titus Andronicus",
    "Troilus and Cressida",
    "Two Gentlemen of Verona"
}
tragedies.add("Macbeth")
tragedies.remove("Two Gentlemen of Verona")
print(tragedies)

{'Coriolanus', 'Titus Andronicus', 'Cymbeline', 'Othello', 'Romeo and Juliet', 'Troilus and Cressida', 'Macbeth', 'Hamlet', 'Julius Caesar', 'Antony and Cleopatra', 'Timon of Athens', 'King Lear'}


Did you notice that the elements of the set come out in a different order when we `print` it? That's because, as we said before, a set does not keep track of the order in which we inserted its elements, and the order you see may even differ from one run to the next. Keep this in mind!

For example, when you transform a set into a list, the elements may come out in any order:

In [121]:
list(tragedies)

['Coriolanus',
 'Titus Andronicus',
 'Cymbeline',
 'Othello',
 'Romeo and Juliet',
 'Troilus and Cressida',
 'Macbeth',
 'Hamlet',
 'Julius Caesar',
 'Antony and Cleopatra',
 'Timon of Athens',
 'King Lear']
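If you need the elements in a predictable order, one option is the built-in `sorted()` function: it accepts any collection, sets included, and returns a new list sorted in ascending (here, alphabetical) order. A minimal sketch with a small set of characters from *Hamlet*:

```python
characters = {'Ophelia', 'Hamlet', 'Horatio'}
ordered = sorted(characters)   # a new list; the set itself is unchanged
print(ordered)                 # ['Hamlet', 'Horatio', 'Ophelia']
```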

## Dictionaries

Dictionaries are a mutable, composite data type that *maps* a *key* to its *value*. You can think of them as a telephone directory, where your name is your *key*, and your phone number is your *value*.

The syntax for defining dictionaries is:
```
{ key_1 : value_1 , key_2 : value_2 , ... , key_n : value_n }
```

For example, let's build a dictionary about Shakespeare's bio:

In [122]:
d = {
    'name' : 'William',
    'surname' : 'Shakespeare',
    'year of birth' : 1564,
    'year of death' : 1616,
    'birthplace' : 'Stratford-upon-Avon'
}

print(d)

{'name': 'William', 'surname': 'Shakespeare', 'year of birth': 1564, 'year of death': 1616, 'birthplace': 'Stratford-upon-Avon'}


We can add, modify, and remove information in a dictionary:

In [123]:
d['wife'] = 'Anne'
print(d)
d['wife'] = 'Anne Hathaway'
print(d)
del d['wife']
print(d)

{'name': 'William', 'surname': 'Shakespeare', 'year of birth': 1564, 'year of death': 1616, 'birthplace': 'Stratford-upon-Avon', 'wife': 'Anne'}
{'name': 'William', 'surname': 'Shakespeare', 'year of birth': 1564, 'year of death': 1616, 'birthplace': 'Stratford-upon-Avon', 'wife': 'Anne Hathaway'}
{'name': 'William', 'surname': 'Shakespeare', 'year of birth': 1564, 'year of death': 1616, 'birthplace': 'Stratford-upon-Avon'}


If we try to fetch a key that does not exist, Python raises a `KeyError`:

In [124]:
d['wife']  # now we deleted this information!

KeyError: 'wife'
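To avoid the error, we can check whether a key is present with `in`, or use the dictionary method `get()`, which returns `None` (or a default value of our choice) instead of raising a `KeyError`. A short sketch on a smaller version of the dictionary above:

```python
bio = {'name': 'William', 'surname': 'Shakespeare'}

print('wife' in bio)                 # False
print(bio.get('wife'))               # None
print(bio.get('wife', 'unknown'))    # unknown
print(bio.get('name', 'unknown'))    # William
```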

Dictionary values can be any object, and keys can be of many different types too (even mixed within the same dictionary):

In [125]:
t = 'true'
d = {False: 0, 1: t, t: str}
print(d[False])
print(d[1])
print(d[t])

0
true
<class 'str'>
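That said, keys do have one restriction: they must be *hashable*, which in practice rules out mutable objects such as lists. A tuple works fine as a key, while the equivalent list raises a `TypeError`. A minimal sketch using two bigram counts from the phrase "to be or not to be":

```python
bigram_counts = {('to', 'be'): 2, ('or', 'not'): 1}   # tuples are fine as keys
print(bigram_counts[('to', 'be')])                     # 2

try:
    bad = {['to', 'be']: 2}                            # lists are not hashable
except TypeError as error:
    print('TypeError:', error)
```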


> **<h3>💻 Try it yourself!</h3>**

Create a dictionary whose keys are 'Macbeth', 'The Tempest', and 'Romeo and Juliet', and whose values are the names of each play's main male character, and save it in a variable. Print the dictionary. Then, replace the names of the male characters with the name of a female character from the same play. Finally, print the dictionary again.

In [127]:
plays = {
    'Macbeth' : 'Macbeth',
    'The Tempest' : 'Prospero',
    'Romeo and Juliet' : 'Romeo',
}
print(plays)

{'Macbeth': 'Macbeth', 'The Tempest': 'Prospero', 'Romeo and Juliet': 'Romeo'}


<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p> <pre><code>d = {
    'Macbeth' : 'Macbeth',
    'The Tempest' : 'Prospero',
    'Romeo and Juliet' : 'Romeo',
}
print(d)
d['Macbeth'] = 'Lady Macbeth'
d['The Tempest'] = 'Ariel'
d['Romeo and Juliet'] = 'Juliet'
print(d)</code> </pre></p>
</details>

# Conditional statements and loops

## Statements - refresher

So far, we have only worked with simple statements that don't allow us to do much. As you surely remember, *statements* (or *instructions*) are the commands we give Python to tell it what to do.

In particular, we have mostly used assignments, i.e. statements with the syntax

```python
variable_name = variable_value
```

However, Python allows many more kinds of statements. For example, can you guess what the statements below do? Run them to find out.

In [128]:
count = 0
print(count)

count += 1
print(count)

0
1


In [129]:
name, surname = 'William', 'Shakespeare'
print(name)
print(surname)

William
Shakespeare


In [130]:
the_bard = the_swan_of_avon = 'William Shakespeare'
print(the_bard)
print(the_swan_of_avon)

William Shakespeare
William Shakespeare


Did you guess what the instructions above do?

1. `+=` is the *augmented assignment* operator for addition, and is shorthand for `variable = variable + increment`
(e.g. `a += 1` behaves like `a = a + 1`).
2. This is called *sequence assignment*, i.e. we can write `var_1, ..., var_n = value_1, ..., value_n`,
and each value$_i$ will be assigned to the respective variable$_i$.
3. This is instead called *multiple target assignment*, i.e. by writing `var_1 = ... = var_n = value`, each `var`$_i$ will contain the same value.
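Here is a short sketch of these assignment forms, plus a few details worth knowing. The other arithmetic operators have augmented versions too (`-=`, `*=`, and so on). Sequence assignment gives us a handy way to swap two variables. And with multiple target assignment, all the names refer to the *same* object, which matters for mutable values such as lists.

```python
count = 10
count -= 3        # same as count = count - 3
count *= 2        # same as count = count * 2
print(count)      # 14

# sequence assignment: swap two values in one line
first, second = 'Romeo', 'Juliet'
first, second = second, first
print(first, second)   # Juliet Romeo

# multiple target assignment: both names point to the SAME list
a = b = []
a.append('Hamlet')
print(b)          # ['Hamlet']
```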

You should also remember that some functions *return* a value we can use, while others don't. For example,

In [131]:
l = [1, 2, 3]
len(l)

3

Here, `len` returns the length of `l`. On the other hand,

In [132]:
print('Hello')

Hello


does not return anything useful. In fact:

In [133]:
x = print('hello')
print(x)

hello
None


Nothing (in Python lingo, the value `None`) is stored within `x`.

But what if we wanted to do something more complex? For example, what if we wanted to apply a function to all the elements of a list, or run some code only *when a certain condition holds*?

## `if` statements

For example, let's define the following variables:



In [134]:
name, surname = 'William', 'shakespeare'
print(name)
print(surname)

William
shakespeare


If we typed the content of the variables ourselves, we could be sure that it is always correct. Often, however, we deal with data taken from books, scraped from the Internet, and so on, so we may want to clean it up first.

In our case, we notice that the Bard's name has the first letter capitalised, while his surname doesn't. So, could we write a generic program to let Python capitalise the first letter of a string, *if needed*?

We know that
```
string.islower()
```

returns `True` if all the letters in a string are lowercase (and there is at least one letter), and `False` otherwise. So we can tell Python: if our variable contains an all-lowercase string, replace it with the same string with its first letter capitalised.
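A quick sketch of how `islower()` behaves on a few strings. Note that it only looks at *cased* characters (letters), and that it returns `False` when a string contains no letters at all:

```python
print('shakespeare'.islower())    # True
print('Shakespeare'.islower())    # False
print("o'neill 2".islower())      # True: the apostrophe, space and digit are ignored
print('1616'.islower())           # False: there are no letters at all
```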

The code that runs this operation is the following:

In [135]:
x = surname
print(x)

shakespeare


In [136]:
if x.islower():
    x = x[0].upper() + x[1:]
print(x)

Shakespeare


If you run the cells above, you will see that the first cell prints `shakespeare`, then the second prints `Shakespeare`, as intended. This is because
- The line `if x.islower()` checked whether `shakespeare` was all lowercase. Since it was,
- The line `x = x[0].upper() + x[1:]` ran, and the first letter of `x` was capitalised.

Now let's see what happens when `islower()` returns `False`:

In [137]:
x = 'SHakespeare'
print(x)

SHakespeare


In [138]:
if x.islower():
    x = x[0].upper() + x[1:]
print(x)

SHakespeare


Nothing happened, right? The string `SHakespeare` is still badly formatted: `islower()` returned `False`, so the body of the `if` did not run and the word was not cleaned up.
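If we want to fix *any* capitalisation, not only all-lowercase strings, one option is to drop the check and always rebuild the string: uppercase the first letter and lowercase the rest. The built-in string method `capitalize()` does the same thing. A minimal sketch; be aware that both approaches also lowercase capitals that should stay, as the last line shows:

```python
x = 'SHakespeare'

# uppercase the first letter and lowercase everything else
fixed = x[0].upper() + x[1:].lower()
print(fixed)                      # Shakespeare
print(x.capitalize())             # Shakespeare
print('McDonald'.capitalize())    # Mcdonald: inner capitals are lost too
```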

To handle more complex cases, we often need `if` statements with multiple *clauses*. Let's see another example: we want to write a statement that determines whether a number is positive, zero, or negative.

In [139]:
#check if x is negative, 0 or positive, and print accordingly
x = -224
if x < 0:
    print('x is negative')
elif x == 0:
    print('x is zero')
else:
    print('x is positive')

x is negative


Now try different values for `x` by changing the `x = -224` instruction, and see what happens.

In this case, we have three cases, which are:
1. x is negative
2. x is zero
3. x is positive.

Each case corresponds to a different *branch* of the `if`. The conditions are checked in order: if the first check fails, Python tries the second one; if the second one fails, Python tries the third; and so on. As soon as a condition is `True`, its branch runs and the remaining ones are skipped. If none of the `if` and `elif` (short for `else if`) conditions is `True`, the `else` branch runs, if there is one.
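In practice, this means that only the *first* branch whose condition is `True` runs, even if later conditions would also be `True`. A minimal sketch, where the length thresholds are arbitrary:

```python
word = 'transgression'   # 13 characters

if len(word) > 10:
    size = 'long'
elif len(word) > 5:
    size = 'medium'      # also True for this word, but never reached
else:
    size = 'short'

print(size)  # long
```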

Formally, the syntax of the `if-else` construct is the following:

```python
if expression_1:
    statement_group_1
elif expression_2:
    statement_group_2
# ... any number of further elif clauses ...
else:
    statement_group_n
```

You should have noticed a couple of things.

1. The `else` and `elif` blocks are **optional**: an `if` can have any number of `elif`s and at most one `else`.
2. The code inside the `if`, `elif`, and `else` blocks is **indented**, i.e. it begins four spaces (the conventional amount) after the start of the line. What does this mean?


## Indentation and Code Blocks

When we run an `if` instruction, how does Python know which instructions belong to the body of the `if`, and which belong to the code that follows it?

Let's try with an example. If we write:

In [140]:
x = 1.0
if x > 1:
    x = x / 2
    print(x)

We see that nothing happens. Instead, if we write

In [141]:
x = 1.0
if x > 1:
    x = x / 2
print(x)

1.0


The `print` instruction runs. Why?

This happens because Python uses **indentation**, i.e. the whitespace at the beginning of a line, to tell different code blocks apart. Consecutive lines with the same indentation belong to the same code block, and statements like `if` (as well as `for` and `while`, which we introduce next) decide whether, or how many times, the indented block below them runs.

The syntax is the following:

```python

if expression:
    code_block_1
    code_block_1
    if expression:
        code_block_2
    else:
        code_block_3
else:
    code_block_4
    
code_block_5
```
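Indentation is not just a matter of style: if the body of an `if` is not indented at all, Python refuses to run the code. The sketch below uses the built-in `compile()` function to check a snippet without running it, just so that the resulting `IndentationError` can be caught and its message printed (the exact wording varies between Python versions):

```python
# the print is NOT indented, so it cannot be the body of the if
bad_code = "if True:\nprint('inside the if')\n"

message = None
try:
    compile(bad_code, '<example>', 'exec')
except IndentationError as error:
    message = error.msg
    print('IndentationError:', message)
```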

### ❓ Quiz  

Can you guess the result of the cells below before running them?

In [142]:
x = 5
if x > 0:
    if x % 2 != 0:
        print('odd')
    else:
        print('even')
    print('positive')
else:
    print('zero or negative')

odd
positive


In [143]:
x = 5
if x > 0:
    if x % 2 != 0:
        print('odd')
    else:
        print('even')
        print('positive')
else:
    print('zero or negative')

odd


In [144]:
x = -6
if x % 2 != 0:
    print('odd')
else:
    print('even')

if x > 0:
    print('positive')
else:
    print('zero or negative')

even
zero or negative


In [145]:
x = 3
if x % 2 != 0:
    print('odd')
else:
    print('even')
    if x > 0:
        print('positive')
    else:
        if x < 0:
            print('negative')
        else:
            print('zero')

odd


## Loops: `for` and `while`

`for` and `while` are used to repeat a code block any number of times.

For example, if we want to print five zeros, instead of writing five `print` statements, we could do the following:

In [146]:
count = 0
while count < 5:
    # increment count by 1
    count += 1
    print('0')

0
0
0
0
0


As the name implies, `while` keeps running as long as the specified expression is `True`; in this case, it runs five times, as requested. Formally, the syntax is:

```python
while expression:
    statement_1
    ...
    statement_n
```

Another possible solution would be the following:

In [147]:
for x in [0, 0, 0, 0, 0]:
    print(x)

0
0
0
0
0


Again, this prints five zeros. The `for` loop works like this:

```python
for element in sequence:
    statement_1
    ...
    statement_n
```

This means that Python will **iterate** over the provided sequence: it assigns each element in turn to the loop variable (`element` above) and runs the statements inside the code block once for each element. The statements can use the loop variable to access the current element.
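Lists are not the only thing we can loop over. A minimal sketch: looping over a string gives us one character at a time, while looping over a dictionary gives us its keys, which we can then use to look up the values.

```python
# count the vowels in a word, one character at a time
vowel_count = 0
for letter in 'Stratford':
    if letter in 'aeiou':
        vowel_count += 1
print(vowel_count)  # 2

# looping over a dictionary yields its keys
bio = {'name': 'William', 'surname': 'Shakespeare'}
for key in bio:
    print(key, '->', bio[key])
```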

Let's see other examples of `while` and `for`. Can you guess in advance how these code blocks work?

The first one uses `while`, the most general kind of loop, to keep going as long as a string is not empty.

In [148]:
# a loop that strips the first letter of a string, one at a time
x = 'spam'
while x: # While x is not empty
    print(x)
    x = x[1:] # Strip first character off x

spam
pam
am
m


In [149]:
count = 7
while count > 5:
    # decrement count by 1
    count -= 1
    print('0')

0
0
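In the `while` examples so far, the condition is checked *before* each repetition, including the very first one. So if the condition is `False` from the start, the body never runs at all, as this small sketch shows:

```python
count = 10
runs = 0
while count < 5:    # already False, so the body is skipped entirely
    runs += 1
    count += 1
print(runs)         # 0
```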


In [150]:
for x in [10, 9, 8]:
    print(x)

10
9
8


In [151]:
for x in [10, 9, 8]:
    print(x // 2)

5
4
4


In [152]:
for word in ['Why,', 'such', 'is', "love's", 'transgression', '.']:
    print(len(word))

4
4
2
6
13
1


> **<h3>💻 Try it yourself!</h3>**

Given the list of Shakespeare tragedies below, iterate over the list and print each item in all caps.

In [157]:
tragedies = [
    "Antony and Cleopatra",
    "Coriolanus",
    "Cymbeline",
    "Hamlet",
    "Julius Caesar",
    "Othello",
    "Macbeth",
    "Romeo and Juliet",
    "Timon of Athens",
    "Titus Andronicus",
    "Troilus and Cressida",
]

# write your code here:
for tragedy in tragedies:
    print(tragedy.upper())



ANTONY AND CLEOPATRA
CORIOLANUS
CYMBELINE
HAMLET
JULIUS CAESAR
OTHELLO
MACBETH
ROMEO AND JULIET
TIMON OF ATHENS
TITUS ANDRONICUS
TROILUS AND CRESSIDA


<details>
    <summary>Click <b>here</b> to see the answer.</summary>
    <p><pre><code>for tragedy in tragedies:
    print(tragedy.upper())</code></pre></p>
</details>

### Break

`break` is used inside `for` and `while` to stop the loop before its natural conclusion:

In [158]:
count = 0
while count < 5:
    # increment count by 1
    count += 1
    print(count)

    if count == 2:
        break  # actually stop when count reaches 2

1
2


In [159]:
for x in [1, 2, 3, 4, 5]:
    print(x)
    if x == 2:
        break  # same as before

1
2
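A typical use of `break` is searching: we stop as soon as we find what we are looking for. Here is a short sketch that finds the first word longer than five characters (the threshold is arbitrary):

```python
words = ['Why,', 'such', 'is', "love's", 'transgression', '.']
found = None
for word in words:
    if len(word) > 5:
        found = word
        break   # no need to look at the remaining words
print(found)  # love's
```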


### Continue

`continue` is used inside `for` and `while` to skip the rest of the current iteration and jump straight to the next one. For example, the following loops print only even numbers:

In [160]:
count = 0
while count < 11:

    # increment count by 1
    count += 1

    if count % 2 != 0:
        continue
    else:
        print(count)

2
4
6
8
10


In [161]:
for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    if x % 2 != 0:
        continue  # same as before
    else:
        print(x)

2
4
6
8
10
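`continue` is also handy for skipping items we are not interested in, such as punctuation tokens. A minimal sketch, assuming the tokens have already been split for us and using a small hand-written set of punctuation marks:

```python
tokens = ['Why', ',', 'such', 'is', "love's", 'transgression', '.']
punctuation = {',', '.', ';', ':', '!', '?'}

word_count = 0
for token in tokens:
    if token in punctuation:
        continue        # skip punctuation and move on to the next token
    word_count += 1
print(word_count)  # 5
```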


# Conclusion

Well done! You have successfully completed the first module, and you should now know the basics of Python. In the next modules we'll delve deeper into the NLP world: we'll introduce some Python libraries used in computational linguistics, and we'll start building our first NLP applications.

## ✍️ Final Assessment (Please answer Questions 1 and 2)

[11 marks] Question 1:

The function `input()` is used to display a prompt and ask the user for input. For example, run the following cell:

In [162]:
name = input('What\'s your name? ')
print('Ah, so your name is ' + name + '.')

What's your name? Jay
Ah, so your name is Jay.


Your exercise is to build a Sorting Hat that assigns a House to each student. The Sorting Hat will read names from the user and determine the House as follows:

- If the name is either 'Harry', 'Ron', or 'Hermione', the student will be assigned to House 'Gryffindor';
- If the name is shorter than five letters, the student will be assigned to House 'Ravenclaw';
- If the name contains a space or fewer than three vowels, the student will be assigned to House 'Slytherin';
- Otherwise, the student will be assigned to House 'Hufflepuff'.

You have to save each student in a dictionary that contains the list of students for each House. When the user types the name `exit`, the Sorting Hat stops reading the names and prints the dictionary with the names.


For example, entering the names `Ron`, `Harry`, `Cho`, `Luna`, `Draco`, `Amycus`, `Nymphadora` into the Sorting Hat should print:

```python
{'Gryffindor': ['Ron', 'Harry'], 'Slytherin': ['Draco', 'Amycus'], 'Ravenclaw': ['Cho', 'Luna'], 'Hufflepuff': ['Nymphadora']}
```

Write your code in the cell below.

In [167]:
# Initialise the dictionary that will contain the names: one empty list per House.
houses = {"Gryffindor": [], "Slytherin": [], "Ravenclaw": [], "Hufflepuff": []}

while True:  # repeat until the user types 'exit'

    name = input("What's your name? ")
    while len(name) == 0:  # improvement 1: keep asking until a name is given
        name = input("I said tell me your name!")

    if name == 'exit':
        break  # stops the loop

    # count the vowels (upper- or lowercase) in the name
    vowel_count = 0
    for x in name:
        if x in "aeiouAEIOU":
            vowel_count += 1

    if name == "Harry" or name == "Ron" or name == "Hermione":
        houses["Gryffindor"].append(name)
        print("Congratulations, you have been assigned to Gryffindor!")

    elif len(name) < 5:
        houses["Ravenclaw"].append(name)
        print("Congratulations, you have been assigned to Ravenclaw!")

    elif " " in name or vowel_count < 3:
        houses["Slytherin"].append(name)
        print("Congratulations, you have been assigned to Slytherin!")

    else:
        houses["Hufflepuff"].append(name)
        print("Congratulations, you have been assigned to Hufflepuff!")

print("Ok, here is the register.")
print(houses)


What's your name? 
I said tell me your name!
I said tell me your name!
I said tell me your name!
I said tell me your name!
I said tell me your name!dd
Congratulations, you have been assigned to Ravenclaw!
What's your name? exit
Ok, here is the register.
{'Gryffindor': [], 'Slytherin': [], 'Ravenclaw': ['dd'], 'Hufflepuff': []}


### ✍️ Improvements:

[4 marks] Question 2:
You **must** choose 1 of these improvements:

- If the user does not type any name at the input prompt, display `I said tell me your name!` and keep asking for the name until they provide one. When a name is provided, assign it to the correct House.
- If the user's name contains an invalid character (i.e. a number or a symbol), display `You are expelled!`, and save the name in a separate `expelled` list. You can use the built-in string method `str.isalpha()` ([docs](https://docs.python.org/3.8/library/stdtypes.html)), but please note that in our case spaces **are** permitted in a name; for example, `Dobby the House Elf` is a valid name, while `X Æ A-12 Musk` is not. Print the list after you print the dictionary.

**Hint**: use the keyword `continue`.

**Question**: does the order of the `if...else`s matter? If yes, how?

## References and useful links

- [Jupyter Homepage](https://jupyter.org/index.html)
- [Jupyter Notebook Documentation](https://jupyter-notebook.readthedocs.io/en/stable/)
- [The official Python 3.6 Tutorial](https://docs.python.org/3.6/tutorial/index.html)

# Survey

Please complete the [post-module survey](https://docs.google.com/forms/d/e/1FAIpQLSeLX1N344kBn8q9PTgg455lrzvVzzI5IW9itF4cT_WqeQKaFQ/viewform) when you are finished. Thank you!

# Continue to next module

[Click here](../module_1.3/module_1.3.ipynb) to move to the next module.