# (Yet Another) Short Introduction to Python

## {'loading': data, 'cleaning': data, 'processing': data, 'storing': data} 
### by Thomas Jurczyk, CERES

***
In this short tutorial, I will introduce you to some basic concepts and use cases of Python. The focus of this tutorial lies on two different aspects:

**Firstly**, I would like to demonstrate why and how some basic programming knowledge can be useful for the daily work of historians, social scientists, people working in the field of religious studies, philologists, and others who are dealing with (preferably already digitized) textual data. Therefore, **I have manually created a simple use case in the form of two different text versions of the same text that we want to compare to each other using Python.** More precisely, our task in this tutorial is

 - _to load the two text versions and make them accessible in Python (**data loading and cleaning**)_
 - _to compare both text versions with each other and to highlight the differences (**data processing**)_
 - _to create, output, and save one single file that merges the two different text versions and informs us about the differences between them (**data storing**)_

**Secondly**, I will introduce some basic concepts of (high level) programming languages such as Python. For instance, in the course of this tutorial we will deal with

 - **variables**
 - _different basic built-in datatypes such as_ **strings**, **integers**, _and_ **floats**
 - _**lists**_
 - _**functions**_ (even though we won't create functions, we will still use them)
 - _some basic_ **loops**
 - _**if-statements**_
 - _and more_
 
<font color=red>Remember:</font> You should be able to finish this tutorial in less than two hours. Naturally, this means that both the use case presented here and the introduction of the concepts need to be simple and short. However, I have tried to include as many references and links to external sources as possible. I highly encourage you to use them in order to deepen your knowledge of the respective topics. You may also find a short bibliography with online resources and literature on my website.  

## Part One: Loading Textual Data into Python
Before we can start working with the two text versions, we first have to make them accessible in Python. This means that we have to load them into the Python environment. To do so, we will already make use of some fundamental concepts of Python (and programming languages in general) such as **functions** and **variables**.

Before we go into the details, let's first look at the code in the following cell and execute it. You can execute the code in each Jupyter cell by pressing `SHIFT + ENTER`.

<font color=red>WARNING</font>: In order for this to work, the two `.txt` files (namely `text_a.txt` and `text_b.txt`) **have to be in the same folder as the Jupyter Python Notebook you are working with right now! (named "Python Tutorial TJ")** If you do not have these files, please go to the [end](#Text-Data-used-in-this-tutorial) of this tutorial and simply copy & paste the two texts in two separate `.txt` files, save them in the same folder as this Jupyter Python Notebook, and name them `text_a.txt` and `text_b.txt`.

Let's start and execute the code in the cell below by pressing `SHIFT + ENTER`:

In [None]:
f = open("text_a.txt", "r")
text_a = f.read()
print(text_a)

Ok, what you should see now is the beginning of Kafka's "The Trial" below the cell that you have just executed. But what exactly did we do here?

Let's start with the first line:
```python
f = open("text_a.txt", "r")
```
What we did here is use the built-in Python **function** `open()` to open a text file. You can usually identify functions in code by the `()` after their name (for instance, `open()`, `print()`, `len()` etc. are all built-in Python functions that perform different operations).

### Functions

Now, **what is a function?**

Functions in programming can be compared to functions in mathematics. Functions take a certain input, do something with this input, and then (usually) create an output. For instance, the function $f(x) = x^2$ takes, let's say, any $x\in\mathbb{R}$ as an input and outputs the square of it. The `open()` function operates in much the same way. It takes a certain input (which is called an "argument" in the case of functions in programming), does some magic with it "behind the scenes", and outputs something (else). In our case, we are passing two arguments to the function. First, we pass the name of the file that we want to open as a **string** object: `"text_a.txt"`. A **string** object is one of the fundamental data types in Python and can basically be interpreted as text. In order to tell Python that we want to create an object of the type string, we have to wrap the data in quotation marks `""` (otherwise, Python will try to interpret the text as code and will most likely throw an error). We will talk about some other important data types in a second, but let's first get back to our code.

The second argument that we pass to the function `open()` is `"r"` which stands for "read" and is also of type string. This argument tells Python that we do only want to read from `text_a.txt` (and not write something into the file).

We then assign the output of the `open()` function (which is a file object that we can now further process in Python) to a **variable** named `f`.
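To make the idea of "input in, output out" a little more concrete, here is a minimal sketch with two other built-in functions. The values `"Josef K."` and `3.14159` are made up for this illustration and have nothing to do with our text files.

```python
# len() takes one argument (here a string) and returns its length as a number
name_length = len("Josef K.")
print(name_length)  # 8

# round() takes two arguments: a number and the number of decimal places to keep
rounded = round(3.14159, 2)
print(rounded)  # 3.14
```

Just as with `f = open("text_a.txt", "r")`, the output of each function call is assigned to a name on the left-hand side of the `=`.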

### Variables

Now, what is a **variable** and why do we need it?

Essentially, a variable is a container that we can use to store data. In this case, we are storing the output of `open()` in a container (**variable**) called `f`. This comes in handy as we want to further process the file object that we have just created with `open()`. Instead of calling the function `open()` over and over again whenever we want to do something with the textual data from `text_a.txt`, we can simply store the output content of `open()` (a file object) in `f`. From now on, we can take this variable `f` whenever we want to access the textual data from `text_a.txt`.

<font color=red>There are two important things to note here</font>: Firstly, variable names in Python are subject to certain restrictions (for instance, a name cannot start with a digit, so `123four` does not work as a variable name in Python), but generally you are free to name your variables however you like. For instance, instead of calling our variable `f`, we could also call it `file` or `Darth_Vader` or `FRUCHTTIGER`. Secondly, Python is **case sensitive** (thus, `file` and `FILE` are different names in Python), and there exist some naming conventions in Python that are beyond the scope of this tutorial. For further reading, I recommend the Wikipedia entry on __[Naming Conventions (programming)](https://en.wikipedia.org/wiki/Naming_convention_(programming))__.
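The following short sketch illustrates these points. The variable names and the book titles are hypothetical and only serve as an example.

```python
# a variable is a name that refers to a value
title = "The Trial"
print(title)

# Python is case sensitive: title and TITLE are two different variables
TITLE = "Der Process"
print(title == TITLE)  # False, because the two variables hold different strings

# assigning a new value to an existing variable overwrites the old value
title = "The Castle"
print(title)
```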

Wow, so much text for only one line of code! Note, however, that these are the basics that we will encounter over and over again. Once you have understood them, we will be able to go through our code much faster.

Let's briefly look at the second line of our code where we use some of the concepts that we've already discussed.

```python
text_a = f.read()
```

If we tried to print our file object `f` right now (for instance, by passing it to the built-in Python function `print()` that we use in the third line of our code), we would see something like `<_io.TextIOWrapper name='text_a.txt' mode='r' encoding='cp1252'>` (the encoding shown depends on your system). This, of course, is not what we want. Before we can access our real text data, we first have to process our file object stored in `f` a little further. We are doing this via the **method** `read()`.

Hold on... why is this a **method** and not a **function**?! For the sake of simplicity, we will leave this problem aside in this tutorial and agree that _**methods**, for us, work in pretty much the same way as **functions**_. However, methods are always connected to an object (which is `f` in this case), and this connection is expressed via the `.` in this example. For further reading, please see [footnote 1](#footnote1).

Now, after we've processed our file `f` via the `read()` method and stored the output (which is of type **string**) in another variable (that we named `text_a`), we can finally print out the text that was stored in the file `text_a.txt` and is now available in our Python environment for further processing!
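If you would like to see the difference between a file object and its content side by side, the following sketch may help. So that it runs on its own, it first creates a small throwaway file; the file name `example.txt`, its content, and the use of the `"w"` (write) mode and the `tempfile` module are assumptions made for this illustration and are not part of the tutorial's own code.

```python
import os
import tempfile

# create a small throwaway file so that this example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")
g = open(path, "w")  # "w" stands for "write"
g.write("Someone must have been telling lies.")
g.close()

f = open(path, "r")
print(type(f))        # a file object, not yet the text itself
content = f.read()    # read() returns the text of the file as a string
print(type(content))
print(content)
f.close()             # close() releases the file once we are done with it
```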

***

### First Task:
After such a long reading, it's now time for some practice.

As you know, the goal of this chapter is to compare two different versions of one text. So far, we have only loaded one text. Why don't you try to load our `text_b.txt` file in the cell below?

Before you start doing this, be aware that a variable can only hold one value at a time. Thus, if you tried to load our second text into the variable called `text_a`, you would overwrite the former content of `text_a` with the new content from `text_b.txt`.

Secondly, why don't you try out some other built-in functions of Python? For instance, you could use the built-in Python function `type()` with one of our loaded texts as an **argument** (for instance, with `text_a`) to display its data type (which, in this case, should be `str`, Python's name for the string type). You could also check how long our `text_a` string is (that is, how many characters, including spaces, it contains) by using the built-in Python function `len()`, again with `text_a` as an argument. Note that you should get used to printing out the outputs of functions like `len()` or `type()`, although in Jupyter notebooks the last expression in a cell is displayed even without the `print()` function. So, instead of writing `len(x)` you should try and always write `print(len(x))`. 

In short:

 - load the second text file `text_b.txt` (in a readable form!) into Python and assign it to an appropriate variable of your choice
 - play around with other built-in Python functions such as `type()` or `len()` by passing the already loaded text files as an argument

Remember that you can execute the code in the code cells by pressing `SHIFT + ENTER`.
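If you are unsure what calling these functions looks like, here is a minimal sketch on a made-up string (the variable name `motto` and its content are just an illustration, so the task itself is still yours to solve):

```python
motto = "Veni, vidi, vici"

# type() tells us what kind of object we are dealing with
print(type(motto))  # <class 'str'>

# len() counts every character, including commas and spaces
motto_length = len(motto)
print(motto_length)  # 16
```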


In [None]:
# your code goes here; by the way, this line is a comment: it starts with a # and everything following the # is ignored by the Python interpreter (which means it won't be executed)
# longer comments can be spread over several lines that each start with a #; you will also see ''' comment ''' used for this, although strictly speaking that creates a string that Python simply discards


Did everything work out? Ideally, we should now have two files stored in two different variables of which one is called `text_a` and the other `text_b` (or whatever name you have given it).

Before we proceed, let's have a **closer look at our data**. <font color=red>This is always a very important step because the structure and functionality of our program code depend heavily on what we actually want it to do with our data. And this, in turn, demands that we know what our data looks like.</font>

In our artificial example in this tutorial, we do not intend to write a user-ready program that can then be installed by others (who might not even know anything about programming and therefore need a _graphical user interface_ etc.). Nor do we want to process just _any_ text, for instance four texts that are completely different from each other and maybe even written in different languages. <font color=red>In this example, we do want to know the differences between two different text versions that are already stored in two separate files.</font> It's always great to write more general programs and functions if you have the time and expertise to do so. However, we as scholars in the field of humanities do usually lack both.

After this healthy portion of self-pity, let's now look at our two data sets by - again - printing them out and comparing them to each other. Why don't we start and see whether there actually are any differences at all between the two strings in `text_a` and `text_b`? Maybe they both look the same and we thus do not need any further processing?

### Boolean Operators

To check whether this is true or not, we can make use of the so-called **Boolean operators** in Python. Boolean operators are shipped with Python and include, among others, the following comparisons:

 - `==` (equal to)
 - `<` or `>` (smaller or bigger than)
 - logical operators such as `and` or `or`
 - and many more (for an overview, see the official Python 3 documentation under this __[link](https://docs.python.org/3/library/stdtypes.html)__)
 
What these operators do is to compare two (or more) arguments to each other and return either the value `True` (which is sometimes represented as 1) or `False` (which can also be represented as 0) as an output.
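As a small sketch of how these operators behave, consider the following lines. The numbers and strings are arbitrary examples; the variable names `both` and `either` are made up for this illustration.

```python
# comparison operators return a Boolean value
print(3 < 5)           # True
print("abc" == "abd")  # False

# logical operators combine Boolean values
both = (3 < 5) and (2 > 7)
either = (3 < 5) or (2 > 7)
print(both)    # False, because only one of the two comparisons is True
print(either)  # True, because at least one of the two comparisons is True

# True and False behave like the integers 1 and 0
print(True == 1)   # True
print(False == 0)  # True
```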

### Second Task:

So, why don't you try and compare the two text files to each other via the `==` operator in the cell below? What is the output? Note that you can also store the output in a variable which you could then print out via the `print()` command as in the following code example:

 ```python
ident_txt = (text_a == text_b) # note that we do not need the () here, but they make the code more readable
print(ident_txt)
```

_Additional task_: What about the lengths of the two texts? Store the output of the `len()` function (which is of type `int` meaning `integer`) of the two texts in two different variables, then create a third variable that takes the Boolean value of the comparison of the two variables and print it out. What happens if you try to compare `text_a` of type `string` to `len(text_a)` (which is of type `int`)?

Good luck! 

In [None]:
# your code goes here


Thanks to the Boolean operator `==`, we now officially know that the two strings `text_a` and `text_b` are not the same! We therefore know that there are differences between the two text versions. In the next part of this tutorial, we will try to figure out what the differences between the two text versions are like.

## Part Two: Data Cleaning and Processing

A first step could be to **check whether both text versions have the same number of sentences**. Before you proceed, try to find a solution for this task yourself. Very often, it is much more advisable to stick to pen & paper first and try to break down and manually formulate the problem that you want to solve with Python before trying to implement any possible solution via code.

Below, you may find a possible solution for our problem. We will discuss the details once you have had a look at the code and executed it in the cell below. 

In [None]:
# splitting the entire string text_a into smaller strings and store them in a list
text_a_sentences = text_a.split(".")
print(text_a_sentences)

At this point, you should already be able to understand at least some parts of the code above. We are again using a Python built-in string method named `split()` that, in this case, takes a string `"."` as an argument, splits up a single string into substrings by the argument string, and stores them in a list. We then store the returned output of `split()` in a variable named `text_a_sentences` and print it out. By looking at the output of `print(text_a_sentences)`, we see that it has been stored in form of a new data object that looks like this:
`["string1", "string 2", etc.]`
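The same behaviour can be observed on a much shorter, made-up string. In this sketch, the variable names `line` and `colours` as well as the separator `","` are assumptions chosen for the illustration:

```python
line = "red,green,blue"

# split the string wherever a comma occurs; the commas themselves are dropped
colours = line.split(",")
print(colours)       # ['red', 'green', 'blue']
print(len(colours))  # 3
```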

### Lists

This data type is called a **list** (you could have also figured this out by using the `type()` function). Lists in Python can store different objects (such as integers, strings, floats etc.) and make them easily accessible. **You can access every single element in a list by using the index of the element's position in that list in square brackets.** Maybe it is easier to demonstrate this via an example:

In [None]:
# accessing elements in a list via indexing
print(text_a_sentences[0]) # this prints out the FIRST element in the list; note that most indices in programming languages start with 0 instead of 1
print(text_a_sentences[1]) # prints out the second element in the list
print(text_a_sentences[2]) # prints out the third element in the list
print(text_a_sentences[-1]) # prints out the LAST element in the list

Ok, great... but wait! What is this? Wasn't our initial idea to split the string into several _sentences_? Instead, we now have substrings such as `", he knew he had done nothing wrong but, one morning, he was arrested"` stored in our list. Go back to our initial `split()` expression and the output of `print()` and figure out what the problems are (yes, there are several!) and what might have caused them.
<br>

Let's formulate the three issues at work here:

 - Firstly, our `split()` method did exactly what it was supposed to do: it split our initial string into substrings whenever it encountered a `.` in the string `text_a` (note: `split()` did not store the `.` characters in our list but removed them!). Great, but... our text does not only use `.` at the end of a sentence but also in abbreviations such as `Mrs.` or `K.`. What can we do to solve this problem?
 - Secondly, we can see that we have at least one empty string in our list (the last one; this is also the reason why our command `print(text_a_sentences[-1])` didn't print anything visible: the last entry in our list is simply empty!).
 - Thirdly, you might have noticed that some of the strings that start a sentence have a space `" "` as their first character; for the moment, this only looks weird but does not really bother us; however, it might become a problem later on, so we should take care of it as soon as possible.
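
All three issues can be reproduced on a tiny, made-up example. The string in `sample` below is hypothetical and not taken from our text files:

```python
sample = "Mrs. Grubach came in. She was tired."
parts = sample.split(".")
print(parts)
# ['Mrs', ' Grubach came in', ' She was tired', '']
# 1. the abbreviation "Mrs." was split off as if it were a sentence
# 2. the last element is an empty string
# 3. two elements start with a space
```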
 
I have to admit that this result is quite disappointing. You might even ask yourself: Why would you create a tutorial with something like this?!

Well, because you will encounter problems like this _over and over and over_ again. Preparing and cleaning your data for the actual analysis is a major part of working with textual data. Annoyingly, dealing with these kinds of problems can quickly become really messy and difficult, depending on both your data and the things you want to do with it. But don't worry, in most cases you will find a solution (at some point) and there are many built-in functions in Python (as well as external libraries that you can load in your project) that will make your life much easier (see [footnote 2](#footnote2) for some suggestions). However, as this is a tutorial, we will not use external libraries to solve our problems but try to cope with them using only the things we know.

Let's start thinking about solutions for our first problem. How can we tell Python to only split the text where a sentence is actually finished? Well, there are several more or less clumsy ways of doing this. We will go for a pretty clumsy one that, however, allows us to introduce another built-in string method, namely `str.replace()`! You call this method on the string that you want to change (this string thus acts as the object from which we call `replace()`, which is a method of Python's built-in `str` class, see [footnote 3](#footnote3)). The method takes two arguments of type string: first, the substring that we want to replace, and second, the string that we want to put in its place. Note that `replace()` does not alter the original string but returns a new one. For example, let's say that we have the string `"I am four meters tall!"` stored in a variable called `height`. This is obviously not true, so we want to change the `"four"` to `"two"` (which isn't true either, by the way). We can easily do that by using our `str.replace()` method:

```python
height = "I am four meters tall!"
new_string = height.replace("four", "two")
print(new_string)
```
If you want, you can copy & paste this code into one of the code cells below to execute it. The output should be `I am two meters tall!`

Voilà! Coming back to our initial problem: Why don't we change all the strings like `K.` or `Mrs.` in our `text_a` string to something without a `.`? As a suggestion, we could write the following code (feel free to execute it and print the results).

In [None]:
text_a_new = text_a.replace("K.", "K")
text_a_new = text_a_new.replace("Mrs.", "Mrs")
print(text_a_new)

This does already look very promising. Now, let's get back to our initial idea to split the string into sentences, store them in a list variable, and print them out.

In [None]:
text_a_list = text_a_new.split(".")
print(text_a_list)

It worked! We now have a list in which each element is a single sentence from our initial text! Great. But... what about the other two problems? Again, there are several different ways of coping with them. In our case, we will simply delete the last element in our list via the `del list[index]` command. But what about the spaces at the beginning of some of the sentences? If we tried to delete them with our `str.replace()` method, we would delete _every_ space in the string, which is something that we, of course, do not want to do. Luckily, Python again provides us with an appropriate built-in method called `strip()`. `strip()` removes all whitespace at the beginning and at the end of a string (that is, spaces as well as characters such as the line break `\n`), but leaves the whitespace between the words untouched.
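Here is a minimal sketch of both tools on made-up data (the variables `padded`, `trimmed`, `squeezed`, and `numbers` are hypothetical and only used in this illustration):

```python
padded = "  one morning \n"

# strip() returns a NEW string without the leading and trailing whitespace
trimmed = padded.strip()
print(repr(trimmed))  # 'one morning'
print(repr(padded))   # the original string has not been changed

# replace() would also remove the space between the two words
squeezed = padded.replace(" ", "")
print(repr(squeezed))  # 'onemorning\n'

# del removes the element at the given index from a list
numbers = [10, 20, 30]
del numbers[0]
print(numbers)  # [20, 30]
```

The built-in function `repr()` is used here only to make the otherwise invisible whitespace visible in the printed output.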

### Third Task:
Alright, I think that this is something that you can do on your own.

 - Delete the **last** element in our list `text_a_list` with the help of the command `del text_a_list[index]` by replacing `index` with the appropriate index of the element we want to delete
 - Strip every element in the list `text_a_list` with the help of the `strip()` method (you do not need to pass any arguments here); in this case, the method does not change the string in the list directly but only returns a new string; this means that you have to assign it to the initial place in the list and thus replace it. Let me give you an example: Let's say that we want to strip the second entry in a list and leave it in the same place. We could do this in the following way:
 
```python
# remember that indexing usually starts with 0! thus, the second element has the index 1
some_list[1] = some_list[1].strip()
```

Now, do the same for each element in our list `text_a_list` and delete the last element!

In [None]:
# your code goes here:


### Fourth Task:

Now, we are finally done preparing the data for our comparison! Wait... we have only cleaned and processed `text_a` so far. What about `text_b`? I think it would be a great exercise for you to do everything that we have learned until now again for `text_b`. Like this, we will have two lists named `text_a_list` and `text_b_list` with the single sentences from `text_a` and `text_b` as string elements. We can then finally start comparing the sentences in each list to the sentences in the other list.

If you lack the time, you can also go to [footnote 4](#footnote4) where I have already provided the code for you (which is different to what we've done in this tutorial, just ignore and execute it). You only have to put it into a code cell and execute it to get our `text_b_list`.

Whatever way you choose, it is important that we have both lists ready to continue with the comparison in the third chapter of this tutorial.

In [None]:
# your code goes here


## Part Three: Comparing the Two Text Versions

OK, we have our two lists ready and would now like to compare them with each other. How could we do that?

For instance, we could first see whether both text versions actually have the same number of sentences. We can do this again by using our `len()` function and the Boolean operator `==`.

In [None]:
print(len(text_a_list) == len(text_b_list))

Luckily, they do! Because if they did not have the same number of sentences, we would have to think of a much more complicated solution than what we are trying to do next.

As a next step, why don't we try and compare the first element of the first list to the first element of the second list, then the second element of the first list to the second element of the second list, and so on and so forth. Like this, we can check whether the single sentences of each version are the same as in the other version. Now... we could do this manually for every single element by again applying the Boolean operator `==`, which is quite repetitive and tiring... I mean, we only have a couple of sentences, but imagine we had to do this for a text of several thousand words or sentences! Or... we could use a loop! **Loops** are very helpful tools in every programming language and if you want to learn more about them, please follow [this link to DataCamp](https://www.datacamp.com/community/tutorials/loops-python-tutorial).

### For Loops

In this tutorial, we will concentrate on **for loops**. Before we explain what a **for loop** exactly does, let's first look at the code below and execute it. Maybe you can already figure out how it works by only looking at the code!

In [None]:
# comparing the strings from text_a_list to the strings of text_b_list

for number in range(len(text_a_list)):
    similarity = (text_a_list[number] == text_b_list[number])
    print(similarity)

Again, let's break this down into several parts and start with the first line `for number in range(len(text_a_list)):`. (You may also see this written as `range(0, len(text_a_list))`; both forms are equivalent because `range()` starts at 0 by default.) This might look a little confusing at first; however, the logic behind it is quite simple. But first, we have to understand **how a for loop works.**

A **for loop** basically consists of two parts and you can best understand it by literally reading it out:

`FOR (every) number IN (the) range from 0 to len(text_a_list) DO something`

As we know that `len(text_a_list) = 4 = len(text_b_list)`, this also reads as

`FOR (every) number IN (the) range from 0 to 4 DO something`.

And what are we doing in this case? We are taking the number from each pass through the loop and using it as an index for our lists to compare the list elements to each other. Note that the numbers we receive from `range()` are 0, 1, 2, 3, because `range(0, X)` includes the first value (0) but not the second one (X). Thus, `range(0, 10)` means "0, 1, 2, 3, 4, 5, 6, 7, 8, 9". This is very annoying in the beginning, I agree, but you will get used to it quickly. You can learn more about for loops in the link that I have provided above.
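You can make the numbers produced by `range()` visible by converting them into a list. This sketch uses the built-in function `list()` for that purpose, which is not otherwise needed in this tutorial; the variable names are made up.

```python
# range(0, 4) yields 0, 1, 2, 3: the start is included, the end is not
indices = list(range(0, 4))
print(indices)  # [0, 1, 2, 3]

# leaving out the start value gives the same result, because range() starts at 0 by default
same_indices = list(range(4))
print(same_indices)  # [0, 1, 2, 3]

ten_numbers = list(range(0, 10))
print(ten_numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```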

If you are struggling with this right now, just accept it. It takes some time to understand all this. You don't learn a language in one day and neither do you learn programming in a day or even in one tutorial. It's important that you know that these concepts exist and that you use them to become familiar with them. And be assured, you will use for loops (as well as other loops) _everywhere_, because they are so powerful.

Now, if you look at the output, you can see a column of `False` `False` `True` `False`. This means that all sentences except the third one differ from the sentences in the other version. This is already quite an insight! Before we proceed and see how we can figure out what the exact differences are all about, let's quickly come back to the for loop again. It is important to understand how loops in general and for loops in particular work. A loop performs exactly like you would expect it to work when you hear that it is called "loop":

The code starts with the first number from our `range(len(text_a_list))` container, assigns the value from the container to the variable `number` (which is then `= 0`), and then goes through the indented instructions behind the `:`. To read more about the fundamental concept of `indentation` in Python, please follow __[this link](https://www.python-course.eu/python3_blocks.php)__. This means that during the first loop, the instructions would look as follows:

```python

similarity = (text_a_list[0] == text_b_list[0]) # because the variable number has the value 0 right now
print(similarity)

```

Once the result is printed out, the loop **returns to its beginning (the `for`)**. But this time, the second element from the `range(len(text_a_list))` iterable container is assigned to `number` (which is then `= 1`). The loop then again proceeds towards the Boolean comparison, prints out the result of the comparison between `text_a_list[1]` and `text_b_list[1]`, and again returns to the starting point of the for loop. It continues like this until the container with iterable elements (here, the `range(len(text_a_list))` function) is "empty". Then, the loop ends and Python continues with the code that follows after the for loop with its indented statements.
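To watch the loop move through its container step by step, it can help to print the current value of `number` in every pass. The following sketch uses two short, made-up lists (`list_a` and `list_b`) instead of our sentence lists:

```python
list_a = ["one", "two", "three"]
list_b = ["one", "TWO", "three"]

for number in range(len(list_a)):
    same = (list_a[number] == list_b[number])
    # print the current index, both elements, and the result of the comparison
    print(number, list_a[number], list_b[number], same)

# this line is not indented, so it runs only once, after the loop has ended
print("loop finished")
```

After the loop has ended, `number` still holds the last value it was assigned (here `2`).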

This was quite intense and useful at the same time. But we haven't finished yet, neither with our analysis nor with looping! In fact, let's again apply everything that we have learned so far, and for loops in particular, to compare _every single word_ of each text to the corresponding word in the other text version and check whether they are identical or not! **Let's also assume that no words are missing but that both text versions have not only the same number of sentences but also the same number of words in each sentence; thus, the only difference between the two text versions lies in the way the single words are written.**

### Fifth Task:

Can you already figure out how this could work? Write it down! Note that loops can be nested, which means that you can have a for loop within a for loop! (or another loop within a for loop; or the inverse... so many possibilities!) So, you could have something like this:

```python

for element in container:
    for entry in element:
        print(entry)
        
```

Do you understand what this **loop nesting** does? Maybe it is easier if you assign concrete values to the `container`: Let's say, for instance, that `container = [[1,2], [2,1]]` would be a list of lists (which is an iterable object just like our `range()` function above)! Yes, lists can again be elements of other lists... we will use this concept in a second. What would the output look like? Try it out yourself.

Can you even write such a nested loop to compare the single words of each text version to the corresponding word in the other text version?


In [None]:
# your code goes here

***

#### Comparing the Words of Both Text Versions
Whether you've been successful or not, let's now look at some code that compares the words of each text version with the help of two nested for loops.

In [None]:
# comparing the words of each list to the words of the other list and printing the boolean result
for number in range(len(text_a_list)):
    # we first have to create two new lists with the single words from each sentence as string elements
    word_list_a = text_a_list[number].split(" ")
    word_list_b = text_b_list[number].split(" ")
    # as we know that the sentences have the same number of words, we can write pretty much the same for loop as before, this time with the word lists
    for num in range(len(word_list_a)):
        similarity = (word_list_a[num] == word_list_b[num])
        print(similarity)
    

OK, this seems to be working and by looking at the (long) list of Boolean values we can already figure out that five words seem to be different from each other. Now, this again is already quite helpful, but still... wouldn't it be better to simply print out the words that differ from each other? Yes, it would indeed. But how can we do that?

You might have already guessed it: **We need yet another tool!** But I promise, this will be the last new concept in this tutorial.

### If-Statements

The last thing I want to introduce in this tutorial is an **if-statement**. If-statements are great to structure the flow of our program. Again, it might be the easiest way to understand if-statements by using "pseudo code" (which is more readable than actual code -  although certain people would even call Python "pseudo code" due to its readability, "magic behind the scenes", and... ah, we don't care).

So, what does an if-statement do? Well, it acts as a kind of doorkeeper by setting up a condition that has to evaluate to `True` in order for the indented code that follows the if-statement to be executed. For instance, look at the following "code":

`IF i am younger than twelve THEN i only have to pay 1€ for a bus ticket"`

Which, more abstractly, could be read as

`IF myAge <= 12 THEN ticketPrice = 1`

Or, in Python:

```python

busTicket = 15 # yes, it's very expensive!
myAge = 13
if myAge <= 12:
    busTicket = 1
print(busTicket)

```
What would the output in this case look like?

I think you get the general idea of if-statements and conditional statements in programming languages such as Python: A certain condition is checked first. If this condition is met (and therefore evaluates to the Boolean `True`), the indented statement(s) below the if-statement are executed. If the condition evaluates to `False`, the indented code is skipped and Python continues with the code after the if-statement. However, there is always more to learn. For instance, you can extend a simple if-statement with further branches (`elif`, which is short for `else if`, and `else`). Or you could create a more complex condition that combines several simpler conditions. See, for example, the code below (what does it do? What is the value of the output?):
```python

busTicket = 0
myAge = 15
if myAge <= 12:
    busTicket = 1
elif (myAge > 12) and (myAge <= 18):
    busTicket = 5
else:
    busTicket = 15
print(busTicket)
```
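
As a small aside, Python also allows chained comparisons, so the `elif` condition above could be written a little more compactly. The sketch below runs the same logic over a few sample ages so that you can check what happens right at the boundaries. Note that the names `sample_ages` and `prices` are made up for this illustration and are not part of the tutorial's use case.

```python
# hypothetical sample ages, chosen to hit both sides of each boundary
sample_ages = [12, 13, 18, 19]
prices = []

for myAge in sample_ages:
    if myAge <= 12:
        busTicket = 1
    elif 12 < myAge <= 18:  # chained comparison, same as (myAge > 12) and (myAge <= 18)
        busTicket = 5
    else:
        busTicket = 15
    # collecting the result for each age and printing age and price side by side
    prices.append(busTicket)
    print(myAge, busTicket)
```

Before running it, try to predict the price for each age.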


<br>To wrap things up:

`IF` you want to know more about if-statements, please see the following __[link](http://anh.cs.luc.edu/handsonPythonTutorial/ifstatements.html)__ `ELSE` please continue with the next part of our tutorial.

***
Now, we have the prerequisites to print out the words that differ from each other. We can even reuse the code that we have already written above and only add some changes to it.

In [None]:
# comparing the words of each list to the words of the other list and only printing out the words that differ from each other
for number in range(len(text_a_list)):
    # we first have to create two new lists with the single words from each sentence as string elements
    word_list_a = text_a_list[number].split(" ")
    word_list_b = text_b_list[number].split(" ")
    # as we know that the sentences have the same number of words, we can create pretty much the same for-loop as before; this time with the word lists
    for num in range(len(word_list_a)):
        similarity = (word_list_a[num] == word_list_b[num])
        if not similarity:
            print("The word in text_a is: " + word_list_a[num]) # note that you can join strings with arithmetical operators such as +; "hello " + "world!" results in one single string "hello world!" 
            print("The word in text_b is: " + word_list_b[num])

Awesome! We have almost achieved what we set up as an idea for our program in the beginning of this tutorial. We have an overview of the words that differ from each other between the two text versions.
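
If you want to experiment without the text files, here is a minimal, self-contained sketch of the same idea. The two short sentences are made up for this example. Instead of index numbers, it uses the built-in `zip()` function (which is not covered in this tutorial) to walk through both word lists in parallel. It also checks first that both sentences have the same number of words, an assumption that the code above silently relies on.

```python
# two made-up sentences that differ in exactly two words (hypothetical sample data)
sentence_a = "he was brought his breakfast by the cook"
sentence_b = "he was brought his breekfast by the cok"

words_a = sentence_a.split(" ")
words_b = sentence_b.split(" ")

# here we collect each differing pair of words as a (word_a, word_b) tuple
differences = []

# comparing word by word only makes sense if both lists are equally long
if len(words_a) == len(words_b):
    # zip() pairs the first word of a with the first word of b, and so on
    for word_a, word_b in zip(words_a, words_b):
        if word_a != word_b:
            differences.append((word_a, word_b))
            print("text_a: " + word_a + " | text_b: " + word_b)
else:
    print("The sentences have a different number of words.")
```

Whether you prefer index numbers or `zip()` is largely a matter of taste; the index version is closer to what we have practiced so far.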

However, we also said that we wanted to create and save a single file that includes a new text version that merges the two text versions. We can now easily do this with all the things we have learned so far. We only have to introduce one last tiny method (no, this is not a new concept, so I didn't lie when I told you that the if-statement was the last concept in this tutorial). To print out our new text as a single string and later save it as a file, we need to merge all the single strings that we have created so far into one output string again. We can do this with the help of the `str.join()` method. Here is an example of how this method works:

In [None]:
# example of the join() method
list_of_strings = ["Hello", "World!", "I", "am", "a", "single", "string!"]
single_string = " ".join(list_of_strings)
print(single_string)
print(type(single_string))

Of course, instead of the `" "` space you could join the strings in the list with whatever string you like. And indeed, in our example we will be joining the sentences with the help of a combined string of `"."` + `" "`.
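
To make this a bit more concrete, here is a small sketch with two made-up sentences (they are not the tutorial's text files). It shows that the separator only ends up *between* the list elements, which is why the last full stop has to be added separately.

```python
# made-up sentences without their final full stops (hypothetical sample data)
sentences = ["That had never happened before", "He rang the bell"]

# ". " is placed between the elements, but not after the last one
joined = ". ".join(sentences)
print(joined)

# so the final full stop has to be added by hand
joined = joined + "."
print(joined)
```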


## Part Four: The Final Step

The only thing that is left for us now is to combine what we have learned so far in order to finally create our merged file. In a second step, we will then save it in our folder as a `.txt` document. Of course, you can try to do this on your own first. However, as we need some additional methods and functions, you might have to search the internet or consult the respective documentation in some situations. If you are in the mood for a challenge, go for it!

***

I have provided the code of one possible solution below. Take your time, go through it step by step, and try to fully understand what is going on. If you've already forgotten what the `open()` function does, please go back to the beginning of this tutorial. In order to save our file, we have to adjust certain arguments and we will also use some new functions and methods. However, by now you should be able to understand what is going on even if you do not know some of the functions. If you don't know what something means or does, please look at the comments or try to find answers on the web (which is something that you will be constantly doing while writing code anyway, so it's good to get used to it as soon as possible). Again, we can reuse some parts of the code that we have written above.

In [None]:
# the final code to solve our problem

# we first create an empty list in which we will store the sentences from the two merged text versions via the .append() method
new_text_list = []

# comparing the words of each list to the words of the other list
for number in range(len(text_a_list)):
    
    # we first have to create two new lists with the single words from each sentence as string elements
    word_list_a = text_a_list[number].split(" ")
    word_list_b = text_b_list[number].split(" ")
    
    # creating an empty word list in which we store the words from the two text versions
    new_word_list = []
    # as we know that the sentences have the same number of words, we can create pretty much the same for-loop as before; this time with the word lists
    for num in range(len(word_list_a)):
        similarity = (word_list_a[num] == word_list_b[num])
        if not similarity:
            new_word_list.append("{ The word in text_a is: " + word_list_a[num] + \
            " | The word in text_b is: " + word_list_b[num] +"}")
        else:
            new_word_list.append(word_list_a[num])
    
    # now we are joining the strings in new_word_list into one single string that we append to the new_text_list
    new_text_list.append(" ".join(new_word_list))

# once both for-loops have finished, we should have a new list new_text_list with the merged sentences;
# let's also join them into one string that we can print out and later store in a single file

new_text = ". ".join(new_text_list)
new_text = new_text + "." # adding the missing full stop at the end of the last sentence
print(new_text)

Finally, let's save the new file as a `.txt` file in our folder and name it `new_text.txt`.

In [None]:
f = open("new_text.txt", "w")
f.write(new_text)
f.close()

Take a look at your current working directory. The new file should be there and you should be able to open it with any kind of text editor!
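
If you would like to check the result from within Python as well, the following sketch writes a string to a file and reads it back again. It uses a `with` statement, a common alternative to calling `f.close()` yourself that is not covered in this tutorial. The placeholder string and the file name `new_text_check.txt` are made up here so that the example runs on its own and does not overwrite your `new_text.txt`; the `encoding` argument is an optional extra that I am adding as an assumption about your files.

```python
# placeholder content so that this sketch runs on its own;
# in the tutorial, new_text comes from the previous code cell
new_text = "This is a { The word in text_a is: test | The word in text_b is: tset}."

# the with-statement closes the file automatically at the end of the indented block
with open("new_text_check.txt", "w", encoding="utf-8") as f:
    f.write(new_text)

# reading the file back to verify that its content matches our string
with open("new_text_check.txt", "r", encoding="utf-8") as f:
    saved_text = f.read()

print(saved_text == new_text)
```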

## The End
Voilà! We are finally done. We have accomplished quite a lot throughout this tutorial. If you feel overwhelmed or have the impression that you've only understood half of what we did: This is totally normal. Again, it takes a lot of time and practice to get at least a little familiar with the different concepts. Just go back to the start of this tutorial and go through the code again. Try to understand what we were doing in each step. Or try out some of the millions of other tutorials that are out there. Just search on the web. If you are not content with one tutorial, try something else. Maybe you prefer video tutorials instead of long text walls. Maybe you need a real book in front of you. Maybe you just need to start to write code and look up different concepts according to your needs. Also, I've set up a short bibliography on my homepage with sources that I personally found and find quite useful. Feel free to have a look at this bibliography if you want.

I hope that this tutorial was at least a little helpful. Please let me know if you have any suggestions, comments, or corrections. You can always contact me at my [email address](mailto:admin@thomas-jurczyk.com).

Thank you and good luck with your further steps in the world of Python!
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>



# Appendix:
# Text Data Used in This Tutorial

**Text A** – simply copy & paste the text below into a .txt file (if you're using Windows, you can create .txt files with Notepad)

`Someone must have been telling lies about Josef K., he knew he had done nothing wrong but, one morning, he was arrested. Every day at eight in the morning he was brought his breakfast by Mrs. Grubach's cook - Mrs. Grubach was his landlady - but today she didn't come. That had never happened before. K. waited a little while, looked from his pillow at the old woman who lived opposite and who was watching him with an inquisitiveness quite unusual for her, and finally, both hungry and disconcerted, rang the bell.`<br>

**Text B** – simply copy & paste the text below into a .txt file (if you're using Windows, you can create .txt files with Notepad)

`Someone must have been telling lies about Josef K., he knew he had dne nothing wrong but, one morning, he was arrested. Every day at eight in the morning he was brought his breekfast by Mrs. Grubach's cook - Mrs. Grubach was his landlady - but today she didn't come. That had never happened before. K. waited a little while, looked from his pilloow at the old woman who lived oposite and who was watching him with an inquiseness quite unusual for her, and finally, both hungry and disconcerted, rang the bell.`<br>




# Footnotes
<br>
<div id="footnote1"> 1 : The following answer from a user on <a href="https://stackoverflow.com/questions/155609/whats-the-difference-between-a-method-and-a-function" target="_blank">Stack Overflow</a> sums up the differences between a <b>function</b> and a <b>method</b>:<br><br>
    <i>A <b>function</b> is a piece of code that is called by name. It can be passed data to operate on (i.e. the parameters) and can optionally return data (the return value). All data that is passed to a function is explicitly passed.

A <b>method</b> is a piece of code that is called by a name that is associated with an object. In most respects it is identical to a function except for two key differences:

A method is implicitly passed the object on which it was called.

A method is able to operate on data that is contained within the class (remembering that an object is an instance of a class - the class is the definition, the object is an instance of that data).</i>
</div><br>
<div id="footnote2"> 2 : For example, you could import the _[NLTK](https://www.nltk.org/)_ library via `import nltk` to make use of its sentence tokenizer (note that you might have to install it first via `pip install nltk` in your command line interface). This tokenizer can deal with all the problems above and will return the text nicely split into a list of sentences. If you want, you can copy & paste the code below into a code cell and execute it to see the result (note again that NLTK has to be installed; if NLTK reports that tokenizer data is missing, follow the download instruction in its error message).</div>

```python
# get a list of sentences using NLTK
from nltk.tokenize import sent_tokenize
text_a_sents_nltk = sent_tokenize(text_a)
print(text_a_sents_nltk)
```
    
<div id="footnote3"> 3 : No, stop. Classes are really nothing that we have to discuss at this point. Just focus on the other stuff that is going on. Still interested? Well, you could take a look at __[this resource](https://www.digitalocean.com/community/tutorials/how-to-construct-classes-and-define-objects-in-python-3)__. 
</div><br>

<div id="footnote4"> 4 : Here is the code you need to create the list `text_b_list`: </div>
    
```python
    
# loading the text
f = open("text_b.txt", "r")
text_b = f.read()
f.close() # closing the file once we have read its content

# replacing the problematic parts of the string
text_b = text_b.replace("K.", "K")
text_b = text_b.replace("Mrs.", "Mrs")

# creating the list
text_b_list = text_b.split(".")

# deleting the last element
del text_b_list[-1]

# strip the strings in the list
text_b_list = [x.strip() for x in text_b_list]

print(text_b_list)
    
```
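
If you are wondering why the last element has to be deleted and why the strings are stripped, the following small sketch shows what `split(".")` does to a made-up two-sentence string (this sample string is not part of the tutorial's data).

```python
# a made-up two-sentence string (hypothetical sample data)
sample = "First sentence. Second sentence."

# splitting at every full stop
parts = sample.split(".")
print(parts)  # look closely at the second and the last element

# the text ends with a full stop, so split() leaves an empty string at the end
del parts[-1]

# the space after each full stop is still attached to the following sentence
parts = [x.strip() for x in parts]
print(parts)
```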
