# Python Bootcamp (oDCM)

*After installing Anaconda and going over the first 3 chapters of the [Introduction to Python](https://learn.datacamp.com/courses/intro-to-python-for-data-science) DataCamp course, you should have an understanding of variables, lists, and functions. Therefore, we assume you know how to load a Jupyter Notebook and perform basic operations in Python. In this tutorial, we fill in the gaps of knowledge required for web scraping and APIs.*

--- 

## Learning Objectives

Students will be able to: 
* Apply conditional logic using if-else statements
* Define and add items to a dictionary
* Loop over a list of elements 
* Write functions using parameters
* Handle common error messages and debug code
* Read and write text files 

--- 

## Acknowledgements
This course draws on a variety of online resources that can be retrieved from the [course website](https://odcm.hannesdatta.com/#student-profile--prerequisites). 


--- 

## Support Needed?
For technical issues outside of scheduled classes, please check the [support section](https://odcm.hannesdatta.com/docs/course/support) on the course website.

---

## 1. Conditional Logic

### 1.1 Comparison Operators
**Importance**  
In one of the very first DataCamp exercises, you were asked to assign the value `100` to a variable `savings` with the statement `savings = 100`. In practice, you often want to make a decision based on whether something is true or false. For example, if we reach a negative account balance (`balance < 0`), we want to transfer money from our savings account to our checking account. 

In Python, we can make such comparisons with comparison operators like the ones you have seen before in your math classes: 

| Operator | Example | What it does | 
| :--- | :--- | :--- |
| > | `a > b` | `True` if **a** is greater than **b** |
| < | `a < b` | `True` if **a** is less than **b** |
| >= | `a >= b` | `True` if **a** is greater than or equal to **b** |
| <= | `a <= b` | `True` if **a** is less than or equal to **b** |



**Let's try it out!**

Now assume both `a` and `b` take on the value `1`. Before running the cell below, try to predict whether each of the four comparisons (`>`, `<`, `>=`, `<=`) evaluates to `True` or `False` (these values are known as booleans). 

In [1]:
a = 1 
b = 1
print(a > b)
print(a < b)
print(a >= b)
print(a <= b)

False
False
True
True


Rather than looking for values greater or smaller than, we can also check whether items take on the same value (`a == b`) or a different value (`a != b`). Note the difference between variable assignment (single `=`) and the comparison operator (double `==`). For example, `savings = 100` means we create a new variable that we assign a value of `100`, whereas `savings == 100` checks whether the variable already contains a value of `100` (if not it will return `False`). 

In [2]:
savings = 100 # variable assignment
print(savings == 100) # comparison 
print(savings != 100) # comparison (False because it is 100!)

True
False
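Because a comparison evaluates to a boolean, its result can also be stored in a variable and reused later. A minimal sketch, assuming an example `balance` of `-25` and a variable name `needs_transfer` that is chosen here purely for illustration:

```python
balance = -25                  # example value (an assumption)
needs_transfer = balance < 0   # the comparison result is stored as a boolean

print(needs_transfer)          # True
print(type(needs_transfer))    # <class 'bool'>
```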


### 1.2 If-statements

**Importance**  
We can use these comparison operators as inputs for if-statements, which tell the computer to choose a different path based on some type of comparison.

Python first checks whether the condition after `if` is met. If not, it moves on to the condition after `elif`. If neither of those conditions is true, it runs the code under the `else` clause.

In the bank account example, our program could look like this:

In [3]:
if balance < 0: 
    print("You should top up your checking account to avoid paying interest")
elif balance == 0: 
    print("Your checking account balance is exactly €0.00, be careful when making new payments!")
else: 
    print("You have a positive balance")

NameError: name 'balance' is not defined

A few remarks:   
* After each comparison there is a colon (`:`). This tells the program that it's the end of the comparison.  
* The statement below each if, elif, else clause needs to be indented (that is, a TAB or 4 spaces to the right). Python uses this indentation to determine which statements belong to which clause, and it also improves readability. 
* There can be multiple `elif` statements. For example, if you want to display another message when the balance is positive though very low (e.g., `balance < 50`).
* Note how we can derive that the `balance` must be positive from the fact that it is neither negative (`balance < 0`) nor equal to zero (`balance == 0`).
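The remark about multiple `elif` clauses can be sketched as follows. This is only an illustration: the example `balance` of `25`, the variable `message`, and the wording of the low-balance message are assumptions, while the threshold of `50` comes from the remark above.

```python
balance = 25  # example value (an assumption); try -10, 0, and 100 as well

if balance < 0:
    message = "You should top up your checking account to avoid paying interest"
elif balance == 0:
    message = "Your checking account balance is exactly €0.00, be careful when making new payments!"
elif balance < 50:
    # only reached when balance is positive but below 50
    message = "You have a positive balance, but it is getting low"
else:
    message = "You have a positive balance"

print(message)
```

Python runs the first clause whose condition is true and skips the rest, so the order of the clauses matters.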

**Let's try it out!**  
Add a variable `balance` to the top of the cell and assign it a value of `-10` and run the cell. Now do it again and change the value to `0` and `10`. Does the output match your expectations? 

**Exercise 1**  
Say that we want to develop a program that advises students on whether they can take the Online Data Collection and Management course. On the [course catalog](https://catalogus.tilburguniversity.edu/osiris_student_tiuprd/OnderwijsCatalogusKiesCursus.do) page we find that the course is taught to Marketing Analytics students. Furthermore, Research Master students can audit this course upon approval of the instructor. 

Use conditional logic to write a program that checks the value of a variable `study` and prints one of the following statements: 
* *You satisfy the course requirements* (for Marketing Analytics students)
* *Please send an email with your motivation to enroll in the course to Hannes Datta* (for Research Master students)
* *You do not satisfy the course requirements. Please contact your educational officer if you want to enroll in the course.* (for all other studies, e.g. "Psychology")

In [4]:
# your code goes here!

In [5]:
# solution
study = "Marketing Analytics" # example value; change it to test the other branches
if study == "Marketing Analytics":
    print("You satisfy the course requirements")
elif study == "Research Master":
    print("Please send an email with your motivation to enroll in the course to Hannes Datta")
else:
    print("You do not satisfy the course requirements. Please contact your educational officer if you want to enroll in the course.")

You satisfy the course requirements

### 1.3 And / Or Operators

**Importance**  
In reality, you may want to check for multiple conditions. For example, employees with a full-time job go to work every workday provided that it's not a holiday. In other words, both conditions must be met (`and`). Alternatively, you can check if at least one of the conditions is satisfied (`or`). For example, you go to bed if you're tired or if it's already bedtime. Lastly, you may require a value to be NOT true: for example, workers go to work on workdays, not during the weekend (`not`). 


| Operator | Example | What it does | 
| :--- | :--- | :--- |
| and | if workday and no_holiday: <br> &nbsp;&nbsp;&nbsp;&nbsp;print("Go to work!") | Truthy if both `workday` and `no_holiday` are true |
| or | if tired or bed_time: <br> &nbsp;&nbsp;&nbsp;&nbsp;print("Go to sleep!") | Truthy if either `tired` or `bed_time` is true (or both) |
| not | if not weekend: <br> &nbsp;&nbsp;&nbsp;&nbsp;print("Go to work!") | Truthy if `weekend` is falsy |

**Let's try it out!**  
Change the boolean values (from `True` to `False` and vice versa), and see how it affects the output! Note that there's no need to write `if workday == True` (`if workday` is shorter and preferred).

In [6]:
workday = True
no_holiday = True

if workday and no_holiday:
    print("Go to work!")

Go to work!


In [7]:
tired = True
bed_time = False

if tired or bed_time:
    print("Go to sleep!")

Go to sleep!


In [8]:
weekend = False

if not weekend:
    print("Go to work!")

Go to work!


**Exercise 2**  
In addition to the study program, the oDCM [course catalog](https://catalogus.tilburguniversity.edu/osiris_student_tiuprd/OnderwijsCatalogusKiesCursus.do) page also describes that students are expected to have acquired a working knowledge in Python. Extend your program of Exercise 1 such that it not only checks whether students have the right study program, but also the required `prior_knowledge` (boolean variable).

In [9]:
# solution (it's not necessary to check whether prior_knowledge == True)
study = "Research Master" # example values; change them to test the other branches
prior_knowledge = True
if study == "Marketing Analytics" and prior_knowledge:
    print("You satisfy the course requirements")
elif study == "Research Master" and prior_knowledge:
    print("Please send an email with your motivation to enroll in the course to Hannes Datta")
else:
    print("You do not satisfy the course requirements. Please contact your educational officer if you want to enroll in the course.")

Please send an email with your motivation to enroll in the course to Hannes Datta

Once you chain `and` and `or` statements things become more complex. Suppose that we want to calculate whether a student passed the course or not. As mentioned in the grading criteria, students pass the course if the total course grade is >= 5.5, and the exam is passed (>= 5.5). Since the student scored a 4.3 in her first attempt, she needed to take the resit for which she scored a 5.6. Still, her final grade was only a 5.4 because her team did not do really well in the team project. 

According to the grading criteria, she therefore did not pass the course. Yet the boolean expression below evaluates to `True`, why is that? 

In [10]:
final_grade = 5.4
exam_grade = 4.3
resit_grade = 5.6

final_grade >= 5.5 and exam_grade >= 5.5 or resit_grade >= 5.5

True

Python gives `and` a higher precedence than `or`, so it first evaluates the `and` part and only then the `or` part. This implies that: 
* The final grade and the exam grade must be greater than or equal to 5.5
* OR the resit grade must be greater than or equal to 5.5 (regardless of the final grade)

We can fix this by explicitly enforcing the structure of the comparisons with parentheses:

In [11]:
final_grade >= 5.5 and (exam_grade >= 5.5 or resit_grade >= 5.5)

False
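To see where the difference comes from, it can help to spell out how Python groups the expression. In Python, `and` binds more tightly than `or`, so an expression without parentheses is grouped around the `and` first. The sketch below reuses the grades from the example; the helper variable names (`without_parentheses`, `grouped_like_python`, `intended`) are only chosen for illustration.

```python
final_grade = 5.4
exam_grade = 4.3
resit_grade = 5.6

# Without parentheses, Python groups the `and` part first
without_parentheses = final_grade >= 5.5 and exam_grade >= 5.5 or resit_grade >= 5.5
grouped_like_python = (final_grade >= 5.5 and exam_grade >= 5.5) or resit_grade >= 5.5

# What the grading criteria actually describe
intended = final_grade >= 5.5 and (exam_grade >= 5.5 or resit_grade >= 5.5)

print(without_parentheses == grouped_like_python)  # True
print(intended)                                    # False
```

When in doubt, adding parentheses makes the intended grouping explicit for both Python and the reader.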

**Exercise 3**  
The minimum age for driving in the Netherlands is 17, but you cannot get a full license until the age of 18. In between 17 and 18, you need to be accompanied by a coach (e.g., parent who sits next to you in the car). 

The code snippet below should reflect this policy but currently has some issues. Add parentheses to the conditional expressions below such that it adheres to the policy.

In [12]:
driver_license = False
age = 17
coach = True

if driver_license and age >= 18 or age == 17 and coach: 
    print("You're allowed to drive!")    
else: 
    print("You're not allowed to drive!")    

You're allowed to drive!


In [13]:
# solution
if driver_license and (age >= 18 or age == 17 and coach): 
    print("You're allowed to drive!")    
else: 
    print("You're not allowed to drive!")      

You're not allowed to drive!


### 1.4 Wrap-up
Congrats! You've just learned the basics of conditional logic which forms the foundation of any programming language and is a powerful tool. As a programmer, it not only gives you control over what happens under which conditions but it's also incredibly helpful in the later stages of data collection and management (e.g., filtering down on rows).

---
## 2. Dictionaries

### 2.1 Creating Dictionaries
**Importance**  
Consider the following list of lists (only the first 2 students are shown): 

In [14]:
students_list = [["Lotte", "Marketing Analytics", True], 
            ["Joep", "Research Master", True]]

One of the major limitations of lists is that it's unclear what each item is (e.g., we have no clue what the `True` and `False` values mean without context). Ideally, we'd like to assign these values a label to encode more information than only the order. A dictionary (`{}`) is a data structure that consists of key-value pairs and thus addresses this need. The keys describe the data (labels) and the values represent the data itself. 

Keys and values are separated by a colon, and we denote the next item with a comma (just like a list). Here are the data of the first student converted into a dictionary: 

In [15]:
student_dict = {"name": "Lotte", 
                "study": "Marketing Analytics", 
                "prior_knowledge": True}

print(student_dict)

{'name': 'Lotte', 'study': 'Marketing Analytics', 'prior_knowledge': True}


**Let's try it out!**   
Add another key to `student_dict` that stores the age of Lotte (23). What happens when you assign a value to a key that already exists?  

### 2.2 Accessing Data in Dictionaries

**Importance**  
To get the value out of a dictionary you pass it the key (just like you use a dictionary: you look up the word and find its definition or translation):

In [16]:
print(student_dict["name"]) 

Lotte


**Let's try it out!**  
Try to get the value of the `study` program. What happens once you pass it a key that does not exist (e.g., `email`)?

Indeed, you get a KeyError if the key does not exist. You can overcome this problem by using `.get()` which returns the value if the key exists and `None` otherwise.

In [17]:
print(student_dict.get("name"))
print(student_dict.get("email"))

Lotte
None
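If `None` is not a convenient fallback, `.get()` also accepts a second argument that is returned when the key is missing. A small sketch, where the fallback text `"unknown"` is just an example value:

```python
student_dict = {"name": "Lotte",
                "study": "Marketing Analytics",
                "prior_knowledge": True}

# the second argument is used only when the key does not exist
email = student_dict.get("email", "unknown")
name = student_dict.get("name", "unknown")

print(email)  # unknown
print(name)   # Lotte
```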


The examples we looked at thus far are relatively straightforward. In reality, there is often a hierarchy of data structures (e.g., a dictionary or list within a dictionary). For example, `enrollments` stores a list of dictionaries with courses. Each course contains a list of students enrolled. 

In [18]:
enrollments = [
    {
        "course": "Online Data Collection & Management",
        "instructor": "Hannes Datta",
        "students": [
            {
                "name": "Lotte",
                "study": "Marketing Analytics",
                "prior_knowledge": True
                
            }, 
            {
                "name": "Joep",
                "study": "Research Master",
                "prior_knowledge": True,
                "honors": "The Societal Challenge of Migration"
            }
        ]
    }
]

**Exercise 4**  
1. Access the `name` key in the `enrollments` list and print the name `Lotte` to the console.
2. Change the key such that it returns the `honors` program of a student enrolled in oDCM. Make sure your code doesn't throw any `KeyErrors`.

In [19]:
# Question 1 
print(enrollments[0]["students"][0]["name"])

Lotte


In [20]:
# Question 2
print(enrollments[0]["students"][0].get("honors"))

None


### 2.3 Adding Elements to a Dictionary

**Importance**  
By now you should be able to handle dictionary keys regardless of whether they are present or not. But how about adding a new key-value pair to a dictionary? 

Contrary to lists, it's not just a matter of appending another item to the end. Rather, you pass the dictionary a key on the left-hand side of the assignment and the corresponding value on the right-hand side. 

For example, we can add Lotte's email to the `student_dict` as follows. 

In [21]:
student_dict["email"] = "lotte.v.veen@tilburguniversity.edu"

**Let's try it out!**  
Make sure the `enrollments` stay up to date by also adding Lotte's email there. Can there be an `email` associated with Lotte without any email registered for `Joep`? What happens once you change Lotte's email to `lotte.v.veen@uvt.nl`? 

**Exercise 5**  
Add another `course` (Data Preparation and Workflow Management) taught by Hannes Datta to the `enrollments` list which stores a dictionary with student enrollment data of Sanne, a Marketing Analytics student, with programming prior knowledge.

In [22]:
# solution
enrollments.append({
    "course": "Data Preparation and Workflow Management",
    "instructor": "Hannes Datta",
    "students": [{
        "name": "Sanne",
        "study": "Marketing Analytics", 
        "prior_knowledge": True
    }]
})

### 2.4 Wrap-up
Although dictionaries may look a little daunting at first, we hope you got the hang of it. In future lessons about APIs, you'll see why this data structure is so important for this course. For now, you should have gained the skills to define and access data in dictionaries and argue when a list or dictionary is preferred to store your data. 

---
## 3. Looping

### 3.1 Iterate over items in list

**Importance**  
Naturally, when programming there are a lot of things that you want to repeat. For example, on an e-commerce site you may want to print the description, price, and photo for all products. For this we use a `for` loop of the form `for item in iterable_object:`, which does something with every item in the `iterable_object` (e.g., a list). The `item` variable refers to the current element and takes on the value of every item in the collection, one at a time. 

For example, here we print the price of all items in the list `prices`.

In [23]:
prices = [9.99, 3.95, 24.95]

for price in prices:
    print(price)

9.99
3.95
24.95


**Let's try it out!**  
Add another product price (8.95) to the `prices` list and run the cell again. What happens?

Given the upcoming sales period, you'd like to apply a 15% discount on all your products. Therefore you use a for-loop to update all prices and store them in a new list called `prices_discount`. 

In [24]:
prices_discount = []

for price in prices: 
    price_discount = price * 0.85
    prices_discount.append(price_discount)

print(prices_discount) # in a future lesson we'll show you how to round numbers!

[8.4915, 3.3575, 21.2075]


**Exercise 6**  
Your manager suggests changing the discount policy so that customers are incentivized to buy multiple items. For each additional purchase they receive a 10% extra discount on the cheapest item: 
* 1 item = 10% discount 
* 2 items = 20% discount on the cheapest item, 10% discount on most expensive item
* 3 items = 30% discount on the cheapest item, 20% on the second most expensive item, 10% discount on the most expensive item, etc.

Write a program that prints out the total price for all products. Your program should work for any number of purchases. Tip: you can sort the list of prices in descending order using `sorted(prices, reverse=True)`.

In [25]:
# solution
total_price = 0
discount_rate = 0.10

prices = sorted(prices, reverse=True)

for price in prices: 
    total_price += price * (1-discount_rate)
    discount_rate += 0.10

print(total_price)

33.211999999999996


**Exercise 7**  
We'll continue with the enrollment evaluation program we discussed in exercise 2. More specifically, we provide you with a list of student names, study program, and prior knowledge (`students`), and we'd like you to print a list of all student names that satisfy the course requirements. 

In [26]:
# first name, study, prior knowledge
students = [["Lotte", "Marketing Analytics", True], 
            ["Joep", "Research Master", True], 
            ["Mirte", "Marketing Analytics", False], 
            ["Dirk", "Economics", True], 
            ["Sanne", "Marketing Analytics", True], 
            ["Roy", "Research Master", False]]

# your code goes here!

In [27]:
# solution (most elegant)
for student in students: 
    if student[1] in ["Marketing Analytics", "Research Master"] and student[2]:
        print(student[0])

Lotte
Joep
Sanne


In [28]:
# solution (alternative 1 - iterator)
for student in students: 
    if (student[1] == "Marketing Analytics" or student[1] == "Research Master") and student[2]: # mind the brackets!
        print(student[0])

Lotte
Joep
Sanne


In [29]:
# solution (alternative 2 - counter)
for counter in range(len(students)): 
    if (students[counter][1] == "Marketing Analytics" or students[counter][1] == "Research Master") and students[counter][2]:
        print(students[counter][0])

Lotte
Joep
Sanne


---
### 3.2 Iterate over indices
**Importance**  
Another common strategy to loop over items in a list is to generate a list of indices with the `range()` function. This function returns a sequence of numbers of which the first number is inclusive and the last number is exclusive. For example, `range(1,10)` generates the numbers 1, 2, ... 9. If you only provide it a single number it assumes you start at zero. 

In [30]:
print(list(range(1,10))) # from 1 to 9
print(list(range(10))) # from 0 to 9

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


**Let's try it out!**  
Change the input values and see how it affects your output. What happens if you choose negative values, or if the first value is larger than the second value? 

As mentioned, we often use these counter variables to loop over a list of items. In that case, we generate a sequence of numbers of the same length as the number of items in the list. For example, for `prices = [9.99, 3.95, 24.95]` we use `range(3)` because there are 3 items in the list with index `0`, `1`, and `2`. 

Another way of writing this is `range(len(prices))` which first determines the length of the list of prices (3) and then passes that value to the `range()` function. The advantage is that this method still works if you add (or remove) items from the prices list (so you don't manually need to change it to `range(4)` if you add another price to the list). Note: you don't see the numbers you generated until you convert the range with the `list()` function.

In [31]:
print(list(range(3)))
print(list(range(len(prices))))

[0, 1, 2]
[0, 1, 2]


Append another item to the `prices` list and run the cell above again. Do both ranges still give the same output? Why (not)? 

---
Next, we combine the concepts of counters and list indexing to loop through all individual elements like this: 

In [32]:
for counter in range(len(prices)):
    print(prices[counter])

24.95
9.99
3.95


**Let's try it out!**  
Can you print the amount of VAT (21%) for each of the prices? Does your program still run if you append a new price to `prices`? 

Alternatively, we can iterate using a `while` loop, which has a different format: the keyword `while`, a condition, and a colon, followed by an indented block of code. 

A `while` loop continues to execute as long as its condition is truthy and ends as soon as the condition becomes falsy. Therefore, you need to make sure yourself that the loop terminates. If the condition never becomes false, your loop will continue forever. 

In the example below, the `while` loop prints the prices until `counter` reaches the value 3 (i.e., the number of items in the list). 

In [33]:
counter = 0
while counter < len(prices):
    print(prices[counter])
    counter += 1

24.95
9.99
3.95
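A `while` loop is especially handy when you don't know upfront how many iterations you need. The sketch below keeps adding a monthly deposit until a savings goal is reached; the variable names and amounts (`target`, a deposit of 20) are made up for illustration.

```python
savings = 100   # starting amount (example value)
target = 150    # hypothetical savings goal
months = 0

while savings < target:
    savings += 20   # deposit 20 each month
    months += 1     # without an update like this one the condition could never change

print(months, savings)  # 3 160
```

The loop stops as soon as `savings < target` is no longer true, which here happens after the third deposit.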


**Let's try it out!**  
What happens once you leave out `counter += 1` and run the cell? Why is that? 

**Exercise 8**  
Did you know that you can insert emojis into your code? It works a bit differently than you're used to, though. You need to pass it a specific [Unicode code](https://unicode.org/emoji/charts/full-emoji-list.html), for example, `\U0001f600` turns into:

In [34]:
"\U0001f600" # on rare occasions you may not see a happy emoji below (in that case you can still do this exercise by replacing the emoji with X)

'😀'

That was easy, wasn't it? Here's a challenge for you: use for-loops and counters to construct the following figures (and no: we don't want you to copy-paste 100 emojis...). You can either use a `for` or `while` loop: 

<img src="images/smiley.png" align="left" width=20%/>

Here's another one if you're up for it (tip: you need to swap the [unicode](https://unicode.org/emoji/charts/full-emoji-list.html) for another one).

<img src="images/triangle.png" align="left" width=20%/>

In [35]:
# solution
for counter in range(10):
    print("\U0001f600" * 10)

😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀
😀😀😀😀😀😀😀😀😀😀


In [36]:
# solution
for counter in range(10):
    print("\U0001f60d" * counter)


😍
😍😍
😍😍😍
😍😍😍😍
😍😍😍😍😍
😍😍😍😍😍😍
😍😍😍😍😍😍😍
😍😍😍😍😍😍😍😍
😍😍😍😍😍😍😍😍😍


---
### 3.3 Iterate over Dictionaries
**Importance**  
Lists are iterable, and so are dictionaries: looping over a dictionary directly goes through its keys. To make explicit what we loop over, we can iterate over the `.keys()` or `.values()`. Let's take another look at the `student_dict` dictionary: 

In [37]:
student_dict

{'name': 'Lotte',
 'study': 'Marketing Analytics',
 'prior_knowledge': True,
 'email': 'lotte.v.veen@tilburguniversity.edu'}

In [38]:
for key in student_dict.keys(): 
    print(key)

name
study
prior_knowledge
email


In [39]:
for value in student_dict.values(): 
    print(value)

Lotte
Marketing Analytics
True
lotte.v.veen@tilburguniversity.edu


In practice, you often want to access both keys and values using `.items()`: 

In [40]:
for key, value in student_dict.items(): 
    print(key, value)

name Lotte
study Marketing Analytics
prior_knowledge True
email lotte.v.veen@tilburguniversity.edu
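Having both the key and the value available is useful when one decides what to do with the other. As a sketch, the loop below collects the names of students whose grade is at least 5.5; the dictionary `grades` and the list `passed` are example names and data introduced only for this illustration.

```python
grades = {"Lotte": 6.7, "Dirk": 5.2, "Sanne": 7.5}  # example data

passed = []
for name, grade in grades.items():
    if grade >= 5.5:          # the value decides ...
        passed.append(name)   # ... whether we keep the key

print(passed)  # ['Lotte', 'Sanne']
```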


**Exercise 9**  
Given a dictionary `students_grades`, calculate the average grade of all six students. Your code should work properly regardless of the number of students (in other words, don't calculate it by hand!).

In [41]:
students_grades = {"Lotte": 6.7, "Joep": 7.2, "Mirte": 9.3, "Dirk": 5.2, "Sanne": 7.5, "Roy": 6.9}
# your code goes here

In [42]:
# solution
total = 0
for value in students_grades.values(): 
    total += value
total/len(students_grades)

7.133333333333334

**Exercise 10**  
In exercise 4 and 5, we looked at a data structure that stores student `enrollments`. We can apply our learnings from this section to easily gather a list of all student names. Write a for-loop to extract all student `names` of *all* courses to a list `students_enrolled`. In other words, make sure your code outputs a list of all student names (even if new courses or students are added).

In [43]:
# solution
students_enrolled = []

for course in enrollments: 
    for student in course["students"]:
        students_enrolled.append(student["name"])
    
print(students_enrolled)

['Lotte', 'Joep', 'Sanne']


---
### 3.4 List and Dictionary Comprehensions
**Importance**  
In programming we usually aim to write concise code. Python offers an elegant way to loop over elements in a list or dictionary in a single line of code, called a comprehension. The outcome is identical, but it requires fewer lines of code and is often slightly faster. 

Rather than calculating the discounted prices like this, ...

In [44]:
prices_discount = []

for price in prices: 
    price_discount = price * 0.85
    prices_discount.append(price_discount)

print(prices_discount)

[21.2075, 8.4915, 3.3575]


... you can achieve the same result in a single line of code: 

In [45]:
[0.85 * price for price in prices]

[21.2075, 8.4915, 3.3575]

In the same vein, looping over a dictionary can also be simplified: 

In [46]:
prices_dict = {"beer": 9.99, "meat": 3.95, "toaster": 24.95}
prices_dict_discount = {}

for product, price in prices_dict.items():
    price_discount = price * 0.85
    prices_dict_discount[product] = price_discount
    
prices_dict_discount

{'beer': 8.4915, 'meat': 3.3575, 'toaster': 21.2075}

In [47]:
{product: price * 0.85 for product, price in prices_dict.items()}

{'beer': 8.4915, 'meat': 3.3575, 'toaster': 21.2075}
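A comprehension can also include a condition at the end, so that only some items are kept. This goes one small step beyond the examples above and is meant as an optional sketch; the variable name `cheap_products` and the threshold of 10 are assumptions for illustration.

```python
prices_dict = {"beer": 9.99, "meat": 3.95, "toaster": 24.95}

# keep only the products that cost less than 10
cheap_products = [product for product, price in prices_dict.items() if price < 10]

print(cheap_products)  # ['beer', 'meat']
```

This is equivalent to a `for` loop with an `if` statement inside that appends `product` to a list.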

**Let's try it out!**  
Revisit your solutions to exercises 9 and 10, and come up with a more concise way to get to the same solution. 

---
## 4. Functions

### 4.1 Syntax

**Importance**  
Without realizing it, you have already used plenty of Python functions: `print()`, `len()`, `range()` just to name a few. Formally, a function is a process for executing a task which, simply put, is a bunch of lines of code wrapped up in a package that you can reuse. In other words, it's useful for executing similar procedures over and over. This helps your code to stay DRY (Don't Repeat Yourself) which is a software development principle to keep your code clean and prevent code duplication. 

For example, let's say you have `book_price_usd`, `tablet_price_usd`, and `laptop_price_usd` variables that store product prices in dollars, which you'd like to convert to euros. Then you could repeat yourself 3 times in a row, like this: 

In [48]:
book_price_usd = 19
book_price_eur = book_price_usd * .82 
print(f"The price in euros is {book_price_eur:.2f}") # :.2f shows two decimals

tablet_price_usd = 349
tablet_price_eur = tablet_price_usd * .82 
print(f"The price in euros is {tablet_price_eur:.2f}")

laptop_price_usd = 699
laptop_price_eur = laptop_price_usd * .82 
print(f"The price in euros is {laptop_price_eur:.2f}")

The price in euros is 15.58
The price in euros is 286.18
The price in euros is 573.18

But what if the USD-EUR exchange rate changes to `.83` tomorrow? Then, we need to update our code in 3 places. Now, imagine what would happen if we were to keep track of the prices of thousands of products... It would take ages! For that reason, it's recommended to not repeat yourself. Functions help a great deal with that. 

Each function starts with `def` followed by the name of the function (usually in snake_case = words separated by underscores; you can give it any name you like!), parentheses `()`, and a colon (`:`). Like if-statements, the next code block is indented. 

You can invoke the function by calling the name of the function followed by parentheses, for example:

In [49]:
def sing_happy_birthday(): 
    print("Happy Birthday To You")
    print("Happy Birthday To You")
    print("Happy Birthday Dear You")
    print("Happy Birthday To You")

sing_happy_birthday()

Happy Birthday To You
Happy Birthday To You
Happy Birthday Dear You
Happy Birthday To You


**Let's try it out!**  
Call the function `sing_happy_birthday()` three more times. Do you see how just three lines of code create another 12 print statements? 

---
### 4.2 Variable Scope
**Importance**  
Within a function, you can assign variables like you're used to. It's, however, important to realize that variables created in functions are scoped in that function. That is to say, you can only refer to these variables inside the body of the function. For example, the `print(instructor)` statement will raise an error. 

In [50]:
def say_hello():
    instructor = "Hannes"
    print("Hello " + instructor)
    
say_hello()
print(instructor) 

Hello Hannes


NameError: name 'instructor' is not defined

**Let's try it out!**  
Change the location of the `instructor` variable such that you solve the `NameError`. Does the `say_hello()` function also have access to the `instructor` variable? 

---
### 4.3 Parameters
**Importance**  
Most functions accept input, perform some transformation, and then output the result. Parameters are temporary input variables you can use within the function. You can think of them as placeholders that get assigned when you call the function. You can name parameters anything you want but it's recommended to use semantic names (that have a meaning). For example, not `string1` but `first_name`. Below the function `say_hello()` takes a single parameter `first_name` that is used within the body of the function. 


In [51]:
def say_hello(first_name):
    print("Hello " + first_name)
    
say_hello("Hannes")

Hello Hannes


**Let's try it out!**  
Call the function `say_hello()` with your own name. Add a second parameter `last_name` and adapt the function such that it prints the full name (first and last name combined). 


By default, the function assumes that the first argument (e.g., `"Hannes"`) corresponds with the first parameter (`first_name`). If you'd like to change the order you need to explicitly specify that as follows: 

In [52]:
def say_goodbye(greeting, person):
    print(greeting + " " + person + "!")
    
say_goodbye(person = "Mom", greeting = "Love you")
# say_goodbye("Mom", "Love you") assumes that greeting = "Mom", person = "Love you"

Love you Mom!


---
### 4.4 Return Keyword
**Importance**   
Rather than solely printing some text to the console, you often want to store the output of a function in another variable which is exactly what the `return` keyword does. It literally "returns" the output which can be a single variable, but also a list or dictionary. In the example below, we add up `a` and `b` and store the outcome in another variable `result`. 




In [53]:
def add(a,b):
    return a + b

result = add(3,4)
print(result)

7
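With parameters and `return` in place, the currency conversion from section 4.1 no longer needs to be repeated for every product. The sketch below is one possible way to write it; the function name `usd_to_eur()` and the dictionary `prices_usd` are hypothetical names, and `.82` is the example exchange rate used earlier.

```python
def usd_to_eur(price_usd, rate):
    # convert a price in dollars to euros using the given exchange rate
    return price_usd * rate

rate = 0.82  # example exchange rate; it now only needs to be changed here
prices_usd = {"book": 19, "tablet": 349, "laptop": 699}

for product, price_usd in prices_usd.items():
    price_eur = usd_to_eur(price_usd, rate)
    print(f"The price of the {product} in euros is {price_eur:.2f}")
```

If the exchange rate changes, only the value of `rate` has to be updated, no matter how many products there are.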


**Let's try it out!**  
Use your new function to add up the numbers `5` and `-2`. Can you reuse the result (3) in another `add()` function? 


**Exercise 11**  
Write a function called `return_day()` that takes one parameter (a number from 1-7) and returns the day of the week (1 = Monday, 2 = Tuesday, etc.). If the number is less than 1 or greater than 7 the function should return `None`. 

In [54]:
# solution
def return_day(num):
    days = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday", 5: "Friday", 6: "Saturday", 7: "Sunday"}
    return days.get(num)

print(return_day(0))
print(return_day(1))
print(return_day(2))
print(return_day(8))

None
Monday
Tuesday
None


It's important to realize that once a program reaches a `return` statement, it exits the function immediately. The `convert_usd_eur()` function takes a list of `usd_amounts` and calculates the equivalent amount in euros. Note that the input includes 3 amounts, but the output has only a single value. Why is that? 

In [55]:
def convert_usd_eur(usd_amounts, currency_rate):
    eur_amounts = []
    for usd_amount in usd_amounts: 
        eur_amounts.append(usd_amount * currency_rate)
        return eur_amounts
        
convert_usd_eur([10, 100, 1000], .82)

[8.2]

**Let's try it out!**   
The `return` statement is indented twice, so it is part of the for-loop and exits the function during the first iteration ($10) rather than after the third one. Fix the code so that it works as expected. 
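
Exiting a function early is not always a mistake: sometimes it is exactly what you want. The sketch below uses a hypothetical function `first_negative()` (not part of the exercises) that deliberately stops as soon as it finds what it is looking for.

```python
def first_negative(amounts):
    for amount in amounts:
        if amount < 0:
            # Stop right away: the remaining amounts are never inspected
            return amount
    # Only reached when the loop finishes without hitting `return`
    return None

print(first_negative([12, -5, 30, -8]))  # -5
print(first_negative([12, 30]))          # None
```

The indentation of each `return` decides when the function stops: inside the loop it can end the function halfway, after the loop it only runs once all elements have been processed.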

**Exercise 12**  
Write a function `add_list()` that adds up all numbers in `num_list` and returns the `total` value. Test your function with the following list `[3, -4, 1, 2, 0]`.

In [56]:
# solution
def add_list(num_list):
    total = 0
    for num in num_list: 
        total += num
    return total

add_list([3, -4, 1, 2, 0])        

2

### 4.5 Documentation
**Importance**   
For built-in functions (e.g., `print()`), you can access the documentation directly by moving your cursor to the function and pressing `Shift` + `Tab` at the same time (doing it twice collapses the menu). You can also add documentation to your own functions with a so-called docstring: put triple quotes (`"""`) at the start of the function body, write a brief description, and close it with triple quotes again. Then, once you press `Shift` + `Tab` on your newly created function (e.g., `add()`), the docstring will appear in the pop-up. 

<img src="images/docstrings.gif" align="left" width=60%/> 

**Let's try it out!**  
Inspect the documentation of `range()` and `len()`, and add an appropriate docstring to the `convert_usd_eur()` function above. 

### 4.6 Wrap-up
Functions are procedures for executing code. They accept inputs and return outputs when the return keyword is used. You can write your own functions, use built-in functions, or - as you'll find out later - import packages to use functions written by others. 
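
As a recap, the building blocks of this chapter (parameters, a docstring, and `return`) can be combined in one short sketch. The function `apply_discount()` and its parameters are hypothetical and only meant as an illustration.

```python
def apply_discount(price, percentage):
    """Return the price after subtracting a discount given in percent."""
    discount = price * percentage / 100
    return price - discount

# Store the returned value so it can be reused later on
sale_price = apply_discount(80, 25)
print(sale_price)  # 60.0

# The docstring is stored on the function itself
print(apply_discount.__doc__)
```

The text printed by the last line is the same description that shows up in the `Shift` + `Tab` pop-up.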

---
## 5. Debugging & Error Handling

### 5.1 Types of Errors

**Importance**  
More likely than not, you have run into some error up to this point. Then you'll know that it breaks the entire application: all lines after the error will not be executed. To catch errors so that the program still runs, we use `try` and `except` blocks. But first, let's take a quick look at the most common errors: 

| Error | Meaning | Example | Explanation |
| :-- | :-- | :-- | :-- |
| `SyntaxError` | Incorrect syntax (e.g., a typo) | `print("hello"` | Missing the closing `)` |
| `NameError` | Variable is not assigned | `print(undeclared_variable)`| `undeclared_variable` has not been defined |
| `TypeError` | Mismatch of data types | `"My age is " + 23`| You cannot add an integer to a string with `+` |
| `IndexError` | Invalid index in a list | `some_list = []` <br> `some_list[2]` | Index 2 does not exist in an empty list | 
| `ValueError` | Right type of input but <br> an inappropriate value | `int("hello")` | It expects a number formatted as string (e.g., `"8"`).|
| `KeyError` | Dictionary does not have <br> a specific key | `some_dict["missing_key"]` | `missing_key` is not defined in `some_dict`.|

**Let's try it out!**  
For each of the code blocks below figure out the type of error (you can check yourself by running the code!)
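
As a minimal sketch, the examples from the table can also be run one by one while printing the message Python gives for each. The sketch relies on `try` and `except`, which are explained in section 5.2, and the list `caught` is a hypothetical helper to keep track of the errors. `SyntaxError` is left out because Python reports it before running any line of the cell, so it cannot be caught this way.

```python
some_list = []
some_dict = {}
caught = []  # collects the name of each error we run into

try:
    print(undeclared_variable)
except NameError as error:
    caught.append("NameError")
    print("NameError:", error)

try:
    "My age is " + 23
except TypeError as error:
    caught.append("TypeError")
    print("TypeError:", error)

try:
    some_list[2]
except IndexError as error:
    caught.append("IndexError")
    print("IndexError:", error)

try:
    int("hello")
except ValueError as error:
    caught.append("ValueError")
    print("ValueError:", error)

try:
    some_dict["missing_key"]
except KeyError as error:
    caught.append("KeyError")
    print("KeyError:", error)

print(caught)
```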

### 5.2 Try and Except Blocks

**Importance**  
Let's have another look at the `convert_usd_eur()` function, which we now pass a list of both numeric and string data. As a result, the function breaks once it reaches `"18,82"` and raises a `TypeError`. In other words, it never converts `3049.49` dollars to euros even though that value is in the right numeric format. Instead, we would prefer to skip over the wrong value (`"18,82"`) and continue with the subsequent values. 


In [57]:
def convert_usd_eur(usd_amounts, currency_rate):
    eur_amounts = []
    for usd_amount in usd_amounts: 
        eur_amounts.append(usd_amount * currency_rate)
    return eur_amounts
        
convert_usd_eur([9.95, "18,82", 3049.49], .82)

TypeError: can't multiply sequence by non-int of type 'float'

We can overcome this issue by attempting (`try`) the multiplication (`usd_amount * currency_rate`). If it succeeds, the operation is performed as before. If it raises an error, Python moves on to the `except` block in which we can describe how to handle the error (e.g., print a custom error message): 

In [58]:
def convert_usd_eur(usd_amounts, currency_rate):
    eur_amounts = []
    for usd_amount in usd_amounts: 
        try: 
            eur_amounts.append(usd_amount * currency_rate)
        except: 
            print(f"The value {usd_amount} is invalid")
    return eur_amounts
        
convert_usd_eur([9.95, "18,82", 3049.49], .82)

The value 18,82 is invalid


[8.158999999999999, 2500.5817999999995]
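
A plain `except:` catches every possible error, including ones you did not anticipate. As a minimal sketch, you can also name the error type you expect so that only that error is skipped. The function name `convert_usd_eur_strict()` is hypothetical; it is a variation on the function above, not a replacement for it.

```python
def convert_usd_eur_strict(usd_amounts, currency_rate):
    eur_amounts = []
    for usd_amount in usd_amounts:
        try:
            eur_amounts.append(usd_amount * currency_rate)
        except TypeError:
            # Only type mismatches are skipped; any other error still stops the program
            print(f"The value {usd_amount} is invalid")
    return eur_amounts

result = convert_usd_eur_strict([9.95, "18,82", 3049.49], .82)
print(result)
```

Naming the error makes it less likely that a typo or another unrelated bug is silently hidden by the `except` block.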

**Let's try it out!**  
Note that the function output only contains 2 values (the EUR amount of `$9.95` and `$3049.49`). What happens once you pass `convert_usd_eur()` a single value rather than a list? And how about a list of booleans? 

**Exercise 13**  
Revise the function `return_day()` (Exercise 11) such that it returns the error message `"Please enter a number between 1 and 7"` when it receives invalid input (rather than `None`). 

In [59]:
# solution
def return_day(num):
    days = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday", 5: "Friday", 6: "Saturday", 7: "Sunday"}
    try: 
        return days[num]
    except: 
        return "Please enter a number between 1 and 7"

print(return_day(0))
print(return_day(1))
print(return_day(2))
print(return_day(8))

Please enter a number between 1 and 7
Monday
Tuesday
Please enter a number between 1 and 7


### 5.3 Wrap-up
The Python functions you have seen thus far are relatively concise and therefore highly predictable. In the real world things are more complex and you cannot always foresee the input. `try` and `except` blocks are, therefore, essential to make sure your code keeps on running properly. 

---

## 6. Reading and Writing Text Files

### 6.1 Reading Text Files

**Importance**  
Similar to R, you can easily read a text file for further analysis. By default, Python looks for files in the directory where this Jupyter Notebook is stored. Therefore, make sure both files are in the same place. Here we `open` the `odcm.txt` file and give it a placeholder name `file` (you can give it any name you want!). Then, we read the file and assign its contents to a variable `odcm`. 

In [60]:
with open("odcm.txt") as file: 
    odcm = file.read()

print(odcm)

Learn how to mine the web
Welcome to the course website of oDCM. This course teaches you the nuts and bolts about collecting data from the web. Unlike most other courses on this topic, this course not only teaches you the technicalities of using web scraping and Application Protocol Interfaces (APIs), but also introduces a comprehensive framework that helps you to think about scraping - specifically with regard to its application in academic marketing research.


**Let's try it out!**  
Add a couple of lines to the `odcm.txt` file, save it, and open the file in this notebook. Did the output change? 
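
`file.read()` gives you the whole file as one string. If you would rather work with separate lines, the sketch below shows two options. It first creates a small, hypothetical file `notes_sketch.txt` in a temporary folder so that it runs on its own (writing files is covered in section 6.3).

```python
import os
import tempfile

# Hypothetical file, created here so the sketch does not depend on odcm.txt
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes_sketch.txt")
with open(path, "w") as file:
    file.write("first line\nsecond line\nthird line\n")

# Option 1: read everything and split the string into a list of lines
with open(path) as file:
    text = file.read()
lines = text.splitlines()
print(lines)

# Option 2: loop over the file object, which yields one line at a time
stripped_lines = []
with open(path) as file:
    for line in file:
        stripped_lines.append(line.strip())  # strip() removes the trailing "\n"
print(stripped_lines)
```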

### 6.2 Reading CSV Files

**Importance**   
Another common data format is Comma Separated Values (CSV), which is used to store tabular data (e.g., data exported from Excel spreadsheets). Although we can use the same procedure as above, we run into an issue: the data appears as one long string, which makes it difficult to access specific elements within the data:

In [61]:
with open("students.csv") as file: 
    odcm = file.read()

print(odcm)

student,study,prior_knowledge
Lotte,Marketing Analytics,TRUE
Joep,Research Master,TRUE
Mirte,Marketing Analytics,FALSE
Dirk,Economics,TRUE
Sanne,Marketing Analytics,TRUE
Roy,Research Master,FALSE


Fortunately, there is a `reader` function in the Python module `csv` that splits up the data into columns. We `import` it so that we can use it and pass `reader` the `file` object. Finally, we transform the result into a list of lists: each inner list is a row, and the elements within a row are the column values that were separated by commas: 

In [62]:
from csv import reader
with open("students.csv") as file:
    csv_reader = reader(file)
    students = list(csv_reader)

print(students)

[['student', 'study', 'prior_knowledge'], ['Lotte', 'Marketing Analytics', 'TRUE'], ['Joep', 'Research Master', 'TRUE'], ['Mirte', 'Marketing Analytics', 'FALSE'], ['Dirk', 'Economics', 'TRUE'], ['Sanne', 'Marketing Analytics', 'TRUE'], ['Roy', 'Research Master', 'FALSE']]


**Let's try it out!**  
How do you access rows and columns in `students`? What is the data type of `TRUE` and `FALSE`? Why is that? 

**Exercise 14**  
Write a for-loop that stores all student names in a list `student_names` (skip the header = first row). 

In [63]:
student_names = []
for row in students[1:]:  # the slice skips the header row
    student_names.append(row[0])

print(student_names)

['Lotte', 'Joep', 'Mirte', 'Dirk', 'Sanne', 'Roy']
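
As an alternative to index-based access, the `csv` module also offers `DictReader`, which uses the header row as dictionary keys. The sketch below assumes a small, hypothetical file `students_sketch.csv` that it creates in a temporary folder, so it runs without `students.csv`.

```python
import csv
import os
import tempfile

# Hypothetical CSV file with the same header as students.csv
folder = tempfile.mkdtemp()
path = os.path.join(folder, "students_sketch.csv")
with open(path, "w", newline="") as file:
    file.write("student,study,prior_knowledge\n")
    file.write("Lotte,Marketing Analytics,TRUE\n")
    file.write("Joep,Research Master,TRUE\n")

with open(path, newline="") as file:
    rows = list(csv.DictReader(file))  # the first row becomes the keys

student_names = []
for row in rows:
    student_names.append(row["student"])  # access a column by its name

print(rows[0]["study"])  # Marketing Analytics
print(student_names)     # ['Lotte', 'Joep']
```

Because every row is a dictionary, you do not need to remember column positions or skip the header yourself.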


### 6.3 Writing to Text Files

**Importance**  
In a similar way, we can write data to `txt` and `csv` files. The `open` function takes different modes: by default it reads (`r`) data, but once you pass it `"w"` it writes to a file. Note that `file.write()` does not start a new line by itself: you need to end the text with `\n` (think of it as pressing ENTER) to move to the next line. 

In [64]:
with open("two_lines.txt", "w") as file: 
    file.write("This is the first line. \n")
    file.write("And this the second one! \n")

**Let's try it out!**  
Run the cell twice and inspect the output. What do you see? Did it overwrite the data? How can we easily write dozens (or even thousands) of lines of text to a file without repeating ourselves? 
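
A minimal sketch of why the `\n` matters: without it, the next `file.write()` simply continues on the same line. The file name `newline_sketch.txt` is hypothetical and the file is stored in a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "newline_sketch.txt")

with open(path, "w") as file:
    file.write("No line break here.")        # no "\n" at the end
    file.write(" Still on the same line.\n")
    file.write("This starts a new line.\n")

# Read the file back to inspect what was written
with open(path) as file:
    content = file.read()

print(content)
print(len(content.splitlines()))  # 2
```

Three calls to `file.write()` produce only two lines, because the first call did not end with `\n`.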

### 6.4 Writing to CSV Files
**Importance**  
Next to the `reader` function, the `csv` module also contains a `writer` function which writes rows of data to a file. We use a for-loop to go over all rows of the `students` list of lists we imported earlier and write them to a new file `students_write.csv`. 

In [65]:
from csv import writer
# newline="" prevents blank lines between rows on Windows
with open("students_write.csv", "w", newline="") as file:
    csv_writer = writer(file)
    for row in students: 
        csv_writer.writerow(row) 

**Let's try it out!**  
Run the cell another time. Can you open up the file `students_write.csv` in Excel? 
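
The writer object also has a `writerows()` method that writes a whole list of lists in one call, which can replace the for-loop. The sketch below uses a hypothetical list `students_sketch` and a temporary file so that it runs on its own, and reads the file back to check the result.

```python
import csv
import os
import tempfile

# Hypothetical data: a header row followed by two students
students_sketch = [
    ["student", "study"],
    ["Lotte", "Marketing Analytics"],
    ["Joep", "Research Master"],
]

folder = tempfile.mkdtemp()
path = os.path.join(folder, "students_sketch.csv")

# newline="" lets the csv module handle line endings itself
with open(path, "w", newline="") as file:
    csv_writer = csv.writer(file)
    csv_writer.writerows(students_sketch)  # writes every row in one call

# Read the file back to verify that nothing got lost
with open(path, newline="") as file:
    written_back = list(csv.reader(file))

print(written_back == students_sketch)  # True
```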

**Exercise 15**  
Write a function called `copy()` which takes a file name and a new file name and copies the contents of the first file to the second file. You can assume the input is a csv-file. 

In [66]:
# solution
from csv import writer, reader 

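def copy(file_name, new_file_name):
    # read all rows of the original file into a list of lists
    with open(file_name, newline="") as file:
        csv_reader = reader(file)
        data = list(csv_reader)

    # write the rows to the new file (newline="" prevents blank lines between rows on Windows)
    with open(new_file_name, "w", newline="") as file:
        csv_writer = writer(file)
        for row in data: 
            csv_writer.writerow(row) 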
            
copy("students.csv", "test.csv")

### 6.5 Wrap-up

You made it! Excellent work! We hope this Python bootcamp gives you a head start and we look forward to teaching you all about the exciting world of web scraping and APIs. 