# Chapter 1: Getting started

-- *A Python Course for the Humanities by Folgert Karsdorp and Maarten van Gompel*

### Notebook Basics

#### Text cells

The box this text is written in is called a *cell*. It is a *text cell* marked up in a very simple language called 'Markdown'. Here is a useful [Markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). To edit a cell, select it by clicking on it. To see the result, press ctrl+enter to 'run' the cell. Running this text cell produces formatted text.

#### Code cells

The other main kind of cell is a *code cell*. The cell immediately below this one is a code cell. Again, to run it, click on the cell and press ctrl+enter. Running a code cell runs the code in the cell (marked by the **In**) and produces a result (marked by the **Out**). We say the code is **evaluated**.

Try it now.

In [None]:
print("Ready, set, GO!")

In [None]:
# This is a comment in a code cell. Comments start with a # symbol.
# They are ignored by the computer and only serve to explain some aspect of the code for readers.

> **Important!**

> **When running code cells you should run them in order, from top to bottom of the notebook. This is because cells may rely on the results of other cells. Without those earlier results being available you will get an error.**

> **To run all the cells in a notebook at once, in order, choose Cell -> Run All from the menu above. To clear all the results from all the cells, so you can start again, choose Cell -> All Output -> Clear.** 

----------

Python can be used as a calculator. When you enter an expression such as `2 + 2` and run it, you will get the expected result `4`. The following basic arithmetic operators are available:

- `+` addition
- `-` subtraction
- `*` multiplication
- `/` division
- `**` exponentiation; `2 ** 3` is equivalent to `2 * 2 * 2`
- `(...)` change [order of operations](https://en.wikipedia.org/wiki/Order_of_operations);
  compare `2 + 3 * 4` vs. `(2 + 3) * 4`
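As a minimal sketch, here are a few of these operators side by side, including the effect of parentheses. The variable names are only illustrative. Note that in Python 3, `/` always produces a decimal (floating-point) result, even when both numbers are whole.

```python
# Each variable stores the result of one expression
addition = 2 + 2              # 4
exponent = 2 ** 3             # 8, the same as 2 * 2 * 2
without_parens = 2 + 3 * 4    # 14: multiplication happens first
with_parens = (2 + 3) * 4     # 20: the part in parentheses is evaluated first
division = 7 / 2              # 3.5: "/" always gives a decimal number

print(addition, exponent, without_parens, with_parens, division)
```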

#### Quiz!

In the code box below, write a simple program that calculates how many minutes there are in seven weeks.

In [None]:
# insert your code here

## General remarks

* **Edit** any cell and try changing the code, or delete it and write your own.

* Before running a cell, try to **guess** what the output will be by thinking through what will happen.

* If you encounter an **error**, realise this is normal. Errors happen all the time and by reading the error message you will learn something new.

* Remember: you cannot break the notebook or your computer, so **don't be afraid to experiment**.

-----------

Great! You have written your first little program! So, can we now go beyond using our programming language as a simple calculator? Before we ask you to write another program, we will first have to explain something about *assignment*.

## Assignment

We can assign values to variables using the `=` operator. A variable is just a name we give to a particular value: you can imagine it as a box you put a value into, with a name written on it in black marker. The following code block contains two operations. First, we assign the value 2 to the name `x`. After that, `x` holds the value 2; you might say Python stored the value 2 in `x`. Then we print the value using the `print()` function.

In [None]:
x = 2
print(x)

Now that we stored the value 2 in `x`, we can use the variable `x` to do things like the following:

In [None]:
print(x * x)
print(x == x)
print(x > 6)

Can you figure out what is happening here? 

Variables can hold more than numbers. They can also hold text, which Python calls a *string*. For example:

In [None]:
book = "The Lord of the Flies"
print(book)

A string in Python must be enclosed in quotes (either single `'` or double `"` quotes). Without those quotes, Python assumes it is dealing with the name of a variable that has been defined earlier. `book` is a variable to which we assign the string `"The Lord of the Flies"`, but the string itself is not a variable: it is a value!
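To see the difference between a variable and a string value, compare the two `print` calls in this small sketch. It also checks that single and double quotes produce exactly the same string; `same_title` is just an illustrative name.

```python
book = "The Lord of the Flies"

print(book)     # no quotes: Python looks up the variable and prints its value
print("book")   # quotes: this is simply the four-letter string "book"

# Single and double quotes create the same string
same_title = 'The Lord of the Flies'
print(book == same_title)   # True
```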

Variable names can be chosen freely, within a few rules (for example, a name cannot contain spaces or start with a digit). *We* give a certain value a name, and we are free to pick one to our liking. It is, however, recommended to use sensible names. This makes the code easier to understand, for others and for yourself when you read your code later.

In [None]:
# not recommended...
banana = "The Lord of the Flies"
print(banana)

You are free to use the name `banana` to hold the title `"The Lord of the Flies"` but you will agree that this naming is not transparent. 

As their name suggests, variables can vary: we can update them. Say we have counted how many books we have in our office:

In [None]:
number_of_books = 100

Then, when we obtain a new book, we can update the number of books accordingly:

In [None]:
number_of_books = number_of_books + 1
print(number_of_books)

Updates like these happen a lot. Python therefore provides a shortcut, `+=`, which adds a value to a variable and stores the result back in that variable. Here we use it to add 5 more books:

In [None]:
number_of_books += 5
print(number_of_books)
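`+=` has siblings for the other arithmetic operators, such as `-=` and `*=`. A minimal sketch, using an illustrative variable `count`:

```python
count = 10
count += 5    # same as count = count + 5, so count is now 15
count -= 3    # same as count = count - 3, so count is now 12
count *= 2    # same as count = count * 2, so count is now 24
print(count)
```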

The last thing we would like to mention about variables for now is that we can assign the value of one variable to another variable. We will explain more about this later on, but here you just need to understand the basic mechanism. Before you evaluate the following code block, can you predict what Python will print?

In [None]:
book = "The Lord of the Flies"
reading = book
print(reading)

----------

#### Quiz!

Now that you understand all about assigning values to variables, it is time for our second programming quiz. We want you to write some code that defines a variable `name` and assigns to it a string containing your name.

If your first name is shorter than 5 characters, use your last name. If your last name is also shorter than 5 characters, use the combination of your first and last name.

In [None]:
# insert your code here
print(name)

--------------

##### What we have learnt

To finish this section, here is an overview of the concepts you have learnt. Go through the list and make sure you understand all the concepts.

-  variable
-  value
-  assignment of variables
-  difference between variables and values
-  strings
-  integers
-  updating variables

------

## String manipulation

Many disciplines within the humanities work on texts, so programming for the humanities naturally involves a lot of text manipulation. In the last quiz you were asked to define a variable containing a string that represents your name. We have already seen some basic arithmetic in our very first calculation. Not only numbers but also strings can be added, or, more precisely, *concatenated*:

In [None]:
book = "The Lord of the Flies"
print(name + " likes " + book + "?")

Alternatively, we can pass multiple values to `print`, which often makes code simpler. Notice how spaces are automatically added between the values.

In [None]:
print(name, "likes", book)
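A small sketch of two differences between `+` and passing several values to `print`. `+` glues strings together exactly as they are, without adding spaces, and it only works between strings, so a number must first be converted with the built-in `str()` (otherwise Python raises a `TypeError`). `print` with commas adds the spaces and accepts numbers directly. The variable names are illustrative.

```python
first = "Monty"
last = "Python"

glued = first + last            # "MontyPython": + adds no space
spaced = first + " " + last     # "Monty Python": the space is added by hand
print(glued)
print(spaced)

books_read = 3
# "Books read: " + books_read would raise a TypeError,
# so the number is converted to a string with str() first
label = "Books read: " + str(books_read)
print(label)

# print with commas inserts the spaces and handles the number itself
print(first, last, "has read", books_read, "books")
```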

A string consists of a number of characters. We can access the individual characters with the help of *indexing*. For example, to find only the first letter of your name, you can type in:

In [None]:
first_letter = name[0]
print(first_letter)

Notice that to access the first letter, we use the index `0`. This might seem odd, but just remember that indices in Python start at zero.
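A quick way to get used to counting from zero is to index a word whose letters you can see. This sketch uses the fixed string `"Python"` instead of your name:

```python
word = "Python"
print(word[0])   # P: the first character
print(word[1])   # y: the second character
print(word[2])   # t: the third character
```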

----------

#### Quiz!

Now, if you know the length of your name you can ask for the last letter of your name:

In [None]:
# fill in the last index of your name (hint: indices start at 0)
last_letter = name[...]
print(last_letter)

---------

It is rather inconvenient to have to know how long a string is in order to find its last letter. Python provides a simple way of referring to the end of a string: negative indices count backwards from the end, so index `-1` refers to the last character:

In [None]:
last_letter = name[-1]
print(last_letter)
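A sketch with the fixed word `"Python"`, which has six characters, shows negative indices from both ends of the range:

```python
word = "Python"
print(word[-1])   # n: the last character
print(word[-6])   # P: counting six characters back from the end reaches the first one
```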

Alternatively, there is the function `len()` which returns the length of a string we give it:

In [None]:
print(len(name))

Do you understand the following?

In [None]:
print(name[len(name)-1])

---------

#### Quiz!

Now can you write some code that defines a variable `second_to_last_letter` and assigns to it the second-to-last letter of your name?

In [None]:
second_to_last_letter = # insert your code here
print(second_to_last_letter)

------

You're starting to become a real expert in indexing strings. Now what if we would like to find out what the last two or three letters of our name are? In Python we can use *slices* to specify a range of indices. To find the first two letters of our name we type in:

In [None]:
first_two_letters = name[0:2]
print(first_two_letters)

The `0` index can be left out because `0` is assumed by default, so we could just as well use `name[:2]`. This says to take all characters of `name` up to, but not including, index 2. We can also start at index 2 and leave the end index unspecified:

In [None]:
without_first_two_letters = name[2:]
print(without_first_two_letters)

Because we did not specify the end index, Python continues until it reaches the end of our string.
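Because the end index of a slice is not included, a slice that stops at index 2 and a slice that starts at index 2 fit together exactly. A minimal sketch with the fixed word `"Python"`:

```python
word = "Python"
print(word[0:2])   # Py
print(word[:2])    # Py: start index 0 left out
print(word[2:])    # thon: end index left out, so continue to the end
print(word[1:4])   # yth: indices 1, 2 and 3; index 4 is excluded

# Because the end index is excluded, the two halves join up again
print(word[:2] + word[2:] == word)   # True
```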

If we would like to find out what the last two letters of our name are, we can use a shortcut to specify an index relative to the end:

In [None]:
last_two_letters = name[-2:]
print(last_two_letters)

Take a look at the following picture. Do you fully understand it? 
<div style="float: center;"><img style="float: center;" src="http://www.nltk.org/images/string-slicing.png" align=center /></div>

-------

#### Quiz!

Can you define a variable `middle_letters` and assign to it all letters of your name except for the first two and the last two?

In [None]:
middle_letters = # insert your code here
print(middle_letters)

Given the following two words, can you write code that prints out the word *humanities* using only slicing and concatenation? (So, no quotes are allowed in your code.)

In [None]:
word1 = "human"
word2 = "opportunities"
# insert your code here

----------

##### What we have learnt

To finish this section, here is an overview of what we have learnt. Go through the list and make sure you understand all the concepts.

-  concatenation (i.e., `+` with strings)
-  indexing: `a[n]`
-  slicing: `a[n:m]`
-  `len(a)`

-------

## Lists

Consider the sentence below:

In [None]:
sentence = "Python's name is derived from the television series Monty Python's Flying Circus."

Words are made up of characters, and so are string objects in Python. As we will see, it is always preferable to represent our data as naturally as possible. For the sentence above, it may be more natural to describe it in terms of words than in terms of characters. Say we want to access the first word in our sentence. If we type in:

In [None]:
first_word = sentence[0]
print(first_word)

Python only prints the first letter of our sentence. (If you do not understand why, think about it for a moment.)

We can transform our sentence into a `list` of words (represented by strings) using the `split()` method. A method is a function that is associated with an object; it lets us do something to the object in question.

Aside: see below how you can discover what methods an object has.

In [None]:
# Put the cursor after the '.' below and type the <TAB> key
# on your keyboard (below escape on the top left). A menu will appear
# with a list of methods
example = ''
example.

In [None]:
# To see the documentation for a particular method,
# enter its name followed by '?' and run the cell:
example = ''
example.split?

Back to our sentence. Here's an example of using the `split` method:

In [None]:
words = sentence.split()
print(words)

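By calling the `split` method on our sentence, we ask Python to split it on whitespace and return a list of words. Note that punctuation stays attached to the words, so the last item in the list is `'Circus.'`, period included. In many ways a list functions like a string.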
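Called without arguments, `split()` splits on any run of whitespace, so extra spaces do not produce empty words. You can also pass a separator string of your own. A small sketch with illustrative strings:

```python
messy = "to  be   or not"
print(messy.split())       # ['to', 'be', 'or', 'not']: a run of spaces counts as one split

date = "2015-06-30"
print(date.split("-"))     # ['2015', '06', '30']: split on the hyphen instead
```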

Earlier we saw the function `len()`, which returns the number of characters in a string. `len()` can also be applied to lists. Compare the following and explain the results:

In [None]:
print(len(sentence))
print(len(words))

With a string we could get individual characters using indices. Similarly, with a list we can get particular elements using indices and slice indices. Let's try it!

--------

#### Quiz!

Write a small program that defines a variable `first_word` and assigns to it the first word of our word list `words`. Play around a little with the indices to see if you really understand how it works.

In [None]:
first_word = # insert your code here
print(first_word)

---------------

A `list` acts like a container where we can store all kinds of information. We can access a list using indices and slices. We can also add new items to a list. For that you use the method `append`. Let's see how it works. Say we want to keep a list of all our good reads. We start with an empty list and we will add some good books to it:

In [None]:
# start with an empty list
good_reads = []
good_reads.append("The Hunger games")
good_reads.append("A Clockwork Orange")
print(good_reads)

Now, if for some reason we don't like a particular book anymore, we could overwrite it with a different one as follows:

In [None]:
good_reads[0] = "Pride and Prejudice"
print(good_reads)

--------

#### Quiz!

Here's another small quiz! Try to change the title of the second book in our good reads collection.

In [None]:
# insert your code here
print(good_reads)

-----------

We just changed one element in a list. Note that if you do the same thing for a string, you will get an error:

In [None]:
name = "Pythen"
name[4] = "o"

This is because strings (and some other types) are *immutable*: they cannot be changed, as opposed to lists, which *are* mutable. Note that while it is fine to assign a new string to the variable (`name = 'Python'`), the example above shows that an existing string cannot be changed in place.
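Since an existing string cannot be changed, the usual way to "fix" one is to build a new string out of slices and assign it to the variable again. A minimal sketch with the misspelled name from the example above:

```python
name = "Pythen"
# Keep everything before index 4, put in "o", then keep everything after index 4
name = name[:4] + "o" + name[5:]
print(name)   # Python
```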

Let's explore some other ways in which we can manipulate lists.

#### remove()

Let's assume our good reads collection has grown a lot and we would like to remove some of the books from the list. Python provides the method `remove`, which acts upon a list and takes as its argument the item we would like to remove.

In [None]:
good_reads = ["The Hunger games", "A Clockwork Orange", 
              "Pride and Prejudice", "Water for Elephants",
              "The Shadow of the Wind", "Bel Canto"]

good_reads.remove("Water for Elephants")

print(good_reads)
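Note that `remove` deletes only the *first* matching item: if a title appears twice, one copy stays behind. A short sketch with an illustrative list:

```python
shelf = ["Bel Canto", "Emma", "Bel Canto"]
shelf.remove("Bel Canto")   # removes only the first "Bel Canto"
print(shelf)                # ['Emma', 'Bel Canto']
```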

If we try to remove a book that is not in our collection, Python raises an error, namely a `ValueError` (don't be afraid, your computer won't break ;-))

In [None]:
good_reads.remove("White Oleander")

--------

#### Quiz!

Define a variable `good_reads` as an empty list. Now add some of your favorite books to it (at least three) and print the last two books you added. 

In [None]:
# insert your code here

------

Just as with strings, we can concatenate two lists. Here is an example:

In [None]:
# first we specify two lists of strings:
good_reads = ["The Hunger games", "A Clockwork Orange", 
              "Pride and Prejudice", "Water for Elephants",
              "The Shadow of the Wind", "Bel Canto"]

bad_reads = ["Fifty Shades of Grey", "Twilight"]

all_reads = good_reads + bad_reads
print(all_reads)
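As with strings, `+` on lists builds a *new* list, and the two original lists are left unchanged. A minimal check with two illustrative lists:

```python
first_shelf = ["Emma", "Dracula"]
second_shelf = ["Bel Canto"]

combined = first_shelf + second_shelf
print(combined)      # ['Emma', 'Dracula', 'Bel Canto']
print(first_shelf)   # ['Emma', 'Dracula']: unchanged
```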

#### sorted()

It is always nice to organise your bookshelf. We can sort our collection with the following expression:

In [None]:
good_reads = sorted(good_reads)
print(good_reads)
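`sorted()` returns a new list and leaves the original as it was, which is why the example above assigns the result back to `good_reads`. It also accepts `reverse=True` for the opposite order. Strings are compared character by character, and uppercase letters sort before lowercase ones. A sketch with illustrative titles:

```python
shelf = ["Emma", "Bel Canto", "Dracula"]

ordered = sorted(shelf)
print(ordered)   # ['Bel Canto', 'Dracula', 'Emma']
print(shelf)     # ['Emma', 'Bel Canto', 'Dracula']: the original is unchanged

backwards = sorted(shelf, reverse=True)
print(backwards)   # ['Emma', 'Dracula', 'Bel Canto']

# Uppercase letters come before lowercase ones
mixed_case = sorted(["apple", "Banana"])
print(mixed_case)   # ['Banana', 'apple']
```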

#### nested lists

Up to this point, our lists have consisted only of strings. However, a list can contain all kinds of data types, such as integers and even lists! Do you understand what is happening in the following example?

In [None]:
nested_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(nested_list[0])
print(nested_list[0][0])

We can put this to use to enhance our good reads collection with a score for each book we have. An entry in our collection will consist of the title of the book and a score from 1 to 10. The title and score will be stored in a list: `[title, score]`. We first initialize an empty list:

In [None]:
good_reads = []

And add two books to it:

In [None]:
good_reads.append(["Pride and Prejudice", 8])
good_reads.append(["A Clockwork Orange", 9])

---------

#### Quiz!

Update the `good_reads` collection with three of your own books and give them all a score. Can you print out the score you gave to the first book you added? In other words, print only the score, not the title. Hint: you can specify multiple indices.

In [None]:
# insert your code here

-----------

##### What we have learnt

To finish this section, here is an overview of the new concepts and functions you have learnt. Go through them and make sure you understand them all.

-  list
-  *mutable* versus *immutable*
-  `.split()`
-  `.append()`
-  nested lists
-  `.remove()`
-  `sorted()`

-------------

## Conditions

### Simple conditions

A lot of programming involves executing a certain piece of code only when a particular condition holds. We have already seen two conditions earlier in this chapter (`x == x` and `x > 6`). Here we give a brief overview. Can you figure out what each of the conditions below does?

In [None]:
print("2 < 5 =", 2 < 5)
print("3 > 7 =", 3 > 7)
print("3 == 4 =", 3 == 4)
print("school == homework =", "school" == "homework")
print("Python != perl =", "Python" != "perl")
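Besides `<`, `>`, `==` and `!=`, Python has `<=` (less than or equal to) and `>=` (greater than or equal to). String comparisons are exact, so capitalization matters. A small sketch; the variable names are illustrative:

```python
at_least = 3 >= 3                  # True: 3 is greater than or equal to 3
at_most = 2 <= 1                   # False: 2 is not less than or equal to 1
same_case = "Python" == "python"   # False: the capital P makes them different

print("3 >= 3 =", at_least)
print("2 <= 1 =", at_most)
print("Python == python =", same_case)
```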

### if, elif and else

Your `good_reads` should now have 5 books. What happens if we ask for the 10th book, even though there are only 5?

In [None]:
good_reads[10]

We get an error: an `IndexError`, which means that this index is not valid for this list (it is often said to be 'out of bounds'). We will learn more about error handling later, but for now we would like to prevent our program from running into this error in the first place. Let's write a little program that, for a given number `n`, tries to get book number `n`, or tells us that we don't have that many books.

In [None]:
n = 10
if n < len(good_reads):
    print("Book number", n, "is:")
    print(good_reads[n])
else:
    print("We don't have that many books")

There is a lot of new syntax here. Let's go through it step by step. First we ask whether `n` is smaller than the number of books. The part after `if` evaluates to either `True` or `False`. Let's type that in:

In [None]:
n < len(good_reads)

Because `n` is 10 and we have only 5 books in the collection, Python returns `False`. Let's do the same thing with a smaller value:

In [None]:
4 < len(good_reads)

As can be expected, now the condition is `True`.

Note: since indices start counting from 0, index 4 actually refers to the 5th and last book. Therefore we use `<`, meaning strictly less than, when comparing indices to the length of a list.

Back to our `if` statement. If the expression after `if` evaluates to `True`, our program goes on to execute the indented code below it. Let's try that as well:

In [None]:
if 4 < len(good_reads):
    print("Index is valid!")

In [None]:
if n < len(good_reads):
    print("Index is valid!")

Notice that the print statement in the last code block is not executed. That is because the value we assigned to `n` is still `10` and thus the part after `if` did not evaluate to `True`. In our little program above we used another statement besides `if`, namely `else`. It shouldn't be too hard to figure out what's going on here. The part after `else` will be executed if the condition after `if` evaluated to `False`. In English: if the value of `n` is too large, print a message stating this.

#### Indentation!

Before we continue, we must explain that the layout of Python code is not optional. Unlike many other languages, Python does not use curly braces to mark the start and end of code blocks (such as those belonging to `if` statements). Instead, a block is introduced by a colon (`:`) and marked by the indentation of the lines that follow. This indentation must be used consistently throughout your code. The convention is to use 4 spaces per level of indentation (in the notebook, pressing the Tab key usually inserts them for you). This means that after a line ending in a colon (such as our `if` statement), the next line should be indented by four more spaces than the line with the colon.
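A small sketch of how indentation decides which lines belong to which block. The nested `if` is indented twice (8 spaces), and the last line, with no indentation, is outside both blocks and always runs. The list `messages` is illustrative and only records which lines were executed.

```python
messages = []
n = 3
if n > 0:
    messages.append("n is positive")            # 4 spaces: inside the first if
    if n > 2:
        messages.append("n is greater than 2")  # 8 spaces: inside the nested if
messages.append("done")                         # no indentation: always runs
print(messages)
```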

Sometimes we have several conditions that we want to treat differently. For that, Python provides the `elif` statement (short for 'else if'). We use it much like `if` and `else`. Note, however, that `elif` can only follow an `if` statement! Above, we checked the size of our collection. Using `in`, we can also check whether a string contains certain parts, or whether a list contains certain elements. For example, we can test whether the letter 'a' is in the word *banana*:

In [None]:
"a" in "banana"

Likewise the following evaluates to `False`:

In [None]:
"z" in "banana"

Let's use this in an `if-elif-else` combination:

In [None]:
word = "rocket science"
if "a" in word:
    print(word + " contains the letter a")
elif "s" in word:
    print(word + " contains the letter s")
else:
    print("What a weird word!")
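Only the first branch whose condition is `True` runs; once it has, the remaining `elif` and `else` branches are skipped. In this sketch the illustrative word contains both letters, yet only the "a" message is produced:

```python
word = "banana split"   # contains both "a" and "s"
if "a" in word:
    result = word + " contains the letter a"
elif "s" in word:
    result = word + " contains the letter s"   # skipped: the first condition was already True
else:
    result = "What a weird word!"
print(result)   # banana split contains the letter a
```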

--------

#### Quiz!

Let's practice our new condition skills a little. Write a small program that defines a variable `weight`. If the weight is more than 50 pounds, print "There is a $25 charge for luggage that heavy." If it is not, print: "Thank you for your business." Change the value of weight to see both statements. (Hint: make use of the `<` or `>` operators)

In [None]:
# insert your code here

-------

### and, or, not

Up to this point, our conditions have consisted of single expressions. However, quite often we would like to test for multiple conditions before executing a particular piece of code. Python provides a number of ways to do that. The first is the `and` operator. `and` combines two expressions, both of which need to be true for the entire expression to evaluate to `True`. Let's see how that works:

In [None]:
word = "banana"
if "a" in word and "b" in word:
    print("Both a and b are in " + word)

If either of the expressions evaluates to `False`, nothing will be printed:

In [None]:
if "a" in word and "z" in word:
    print("Both a and z are in " + word)

-------

#### Quiz!

Replace `and` with `or` in the `if` statement below. What happens? 

In [None]:
word = "banana"
if "a" in word and "z" in word:
    print("Both a and z are in " + word)

In the code block below, can you add an `else` statement that prints that the two letters were not both found?

In [None]:
if "a" in word and "z" in word:
    print("Both a and z are in " + word)
# insert your code here

----------

Finally we can use `not` to test for conditions that are not true. 

In [None]:
if "z" not in word:
    print("z is not in " + word)
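`not` can also stand in front of any condition and flip it: `not True` is `False`, and vice versa. So `"z" not in word` gives the same result as `not ("z" in word)`; the first form is simply easier to read. A minimal sketch:

```python
word = "banana"
flipped = not True                 # False
missing = "z" not in word          # True: there is no z in banana
also_missing = not ("z" in word)   # True: the same test, written differently
print(flipped, missing, also_missing)
```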

Objects such as strings, integers or lists count as `True` when they are non-empty or non-zero. Empty strings, lists, dictionaries, etc., as well as the number 0, count as `False`. We can use this principle to, for example, execute a piece of code only if a certain list contains any values:

In [None]:
numbers = [1, 2, 3, 4]
if numbers:
    print("I found some numbers!")

Now if our list were empty, Python wouldn't print anything:

In [None]:
numbers = []
if numbers:
    print("I found some numbers!")
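The built-in function `bool()` shows whether Python treats a value as true or false, which is handy for checking the rule above. Note that a list containing a single `0` is *not* empty, so it counts as true:

```python
print(bool([1, 2, 3]))   # True: non-empty list
print(bool([]))          # False: empty list
print(bool(""))          # False: empty string
print(bool("a"))         # True: non-empty string
print(bool(0))           # False: zero
print(bool([0]))         # True: the list is not empty, even though it holds a 0
```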

--------

#### Quiz!

Can you write code that prints "This is an empty list" if the provided list does not contain any values?

In [None]:
numbers = []
# insert your code here

Can you do the same thing, but this time using the function `len()`?

In [None]:
numbers = []
# insert your code here

-----------

##### What we have learnt

To finish this section, here is an overview of the new functions, statements and concepts we have learnt. Go through them and make sure you understand what their purpose is and how they are used.

-  conditions
-  indentation
-  `if`
-  `elif`
-  `else`
-  `True`
-  `False`
-  empty objects are false
-  `not`
-  `in`
-  `and`
-  `or`
-  multiple conditions
-  `==`
-  `<`
-  `>`
-  `!=`
-  `IndexError`

---------

## Loops

Programming enables us to automate things. Once we have specified how to do something, we should be able to do it many times. Loops make it possible to perform a certain action on a range of elements. For example, given a list of words, we would like to know the length of all words, not just one. Now you *could* do this by going through all the indices of a list of words and printing the length of the words one at a time, taking up as many lines of code as you have indices. Needless to say, this is rather cumbersome. What's more, if we don't know the length of the list of words in advance, this strategy doesn't even work.

Python provides the `for` statement, which allows us to iterate through any iterable object and perform actions on its elements. An iterable is an object whose elements can be gone through one by one, such as a string or a list. The basic format of a `for` statement is:

    for X in iterable:
        # do things with X

That reads almost like English. We can print all letters of the word *banana* as follows:

In [None]:
for letter in "banana":
    print(letter)

The code in the loop is executed as many times as there are letters, with a different value for the variable `letter` at each iteration. Read the previous sentence again.

Likewise we can print all the items that are contained in a list:

In [None]:
colors = ["yellow", "red", "green", "blue", "purple"]
for whatever in colors:
    print("This is color " + whatever)

We can iterate through our good reads collection as well. Since our collection is a list of lists, each item is a list:

In [None]:
for book in good_reads:
    print(book)

Since we know that entries in our collection always have the form `[title, score]`, we can take a shortcut and separate these two values directly in the loop:

In [None]:
for book, score in good_reads:
    print(book, "has score", score)

In the example above the variable `book` will get the first element, and the variable `score` will get the second. However, if any item is not a list with exactly two elements, you will get an error.
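The shortcut used in the loop header, writing two names separated by a comma, is called *unpacking*, and it also works in an ordinary assignment. A sketch with an illustrative entry:

```python
entry = ["Pride and Prejudice", 8]
title, score = entry   # title gets entry[0], score gets entry[1]
print(title)   # Pride and Prejudice
print(score)   # 8

# With the wrong number of elements, unpacking fails:
# title, score = ["Emma"]   -> ValueError (not enough values to unpack)
```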

-------

#### Quiz!

The function `len()` returns the length of an iterable item:

In [None]:
len("banana")

We can use this function to print the length of each word in the color list. Write your code in the box below:

In [None]:
colors = ["yellow", "red", "green", "blue", "purple"]
# insert your code here

Now write a small program that iterates through the list `colors` and appends all colors that contain the letter *r* to the list `colors_with_r`. (Hint: use `colors_with_r.append`.)

In [None]:
colors = ["yellow", "red", "green", "blue", "purple"]
colors_with_r = []
# insert your code here

-----------

##### What we have learnt

Here is an overview of the new concepts, statements and functions we have learnt in this section. Again, go through the list and make sure you understand them all.

-  loop
-  `for` statement
-  iterable objects
-  variable assignment in a `for` loop

--------

#### Final Quiz!

We have covered a lot of ground. Now it is time to put everything we have learned together. The following quiz might be quite hard, and we would be very impressed if you get it right!

What we want you to do is write code that counts the number of word tokens in a small corpus that contain the letter 'a'. You need to do this on the basis of a frequency distribution of words, represented as a list of `[word, frequency]` pairs. For example, the word *happening* contains an *a* and occurs 4 times, so this word contributes 4 to the total. Assign the result to the variable `number_of_as`.

In [None]:
frequency_distribution = [['Beg', 1], ["Goddard's", 1], ['I', 3], ['them', 2], ['absent', 1], ['already', 1],
        ['alteration', 1], ['amazement', 2], ['appeared', 1], ['apprehensively', 1], ['associations', 1],
        ['clever', 1], ['clock', 1], ['composedly', 1], ['deeply', 7], ['do', 7], ['encouragement', 1],
        ['entrapped', 1], ['expressed', 1], ['flatterers', 1], ['following', 12], ['gone', 9], ['happening', 4],
        ['hero', 2], ['housekeeper', 1], ['ingratitude', 1], ['like', 1], ['marriage', 15], ['not', 25],
        ['opportunities', 1], ['outgrown', 1], ['playfully', 2], ['remain', 1], ['required', 2],
        ['ripening', 1], ['slippery', 1], ['touch', 1], ['twenty-five', 1], ['ungracious', 2],
        ['unwell', 1], ['verses', 1], ['yards', 5]]
number_of_as = 0
# insert your code here

-------

<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Python Programming for the Humanities</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://fbkarsdorp.github.io/python-course" property="cc:attributionName" rel="cc:attributionURL">http://fbkarsdorp.github.io/python-course</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://github.com/fbkarsdorp/python-course" rel="dct:source">https://github.com/fbkarsdorp/python-course</a>. Some material based on <a href="https://github.com/mchesterkadwell/intro-to-text-mining-with-python">https://github.com/mchesterkadwell/intro-to-text-mining-with-python</a></small></p>