In [1]:
import random

# Lesson 1

In this lesson you'll get introduced to Python, a very powerful scripting language. It's the most prevalent language in data science, which is why you'll use it this semester. As with any programming/scripting language, Python has a few quirky and a few "yes, please!" features.

## Jupyter notebook

Please note that I'll be using Jupyter Notebooks to teach you. All of this code works perfectly fine if you export it to a Python script (an `xxx.py` file). In fact, a Jupyter Notebook itself is one big JSON file in which both code and output are stored. You can do whatever you want with that JSON structure. There are a few plugins that allow you to separate code and output. There's even a plugin that allows you to write your scripts in a separate file and then run them as a code cell.

There is one big catch when using a Jupyter Notebook: all state is persistent. That means that variables defined in cell 1 will carry over to cell 150,000. It also means that if cell 3 overwrites a variable that was defined in cell 1, the rest of the notebook will run with the value from cell 3. Pretty much every "huh, what, how?" in Jupyter Notebook is caused by the persistent state. To solve this, you can restart the kernel and run the entire notebook again. You can also write better code. The latter option is preferred, as the first one is just fighting symptoms of bad programming. This is especially true in a notebook, where it's very easy to write a big garbled mess of interdependent code that is hard to debug.

I will show the persistent state within this notebook, so pay attention to the commentary.
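A minimal sketch of the overwrite problem described above, written as one script. The `# Cell n` comments stand in for separate notebook cells, and the variable `counter` is a made-up example.

```python
# Cell 1: define a variable
counter = 1

# Cell 3: overwrite it, perhaps while experimenting
counter = 99

# Any later cell sees the overwritten value, not the one from cell 1
print(counter)  # prints 99
```

The same happens if you run the cells out of order: the notebook remembers whatever was executed last, not whatever is written highest on the page.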

## This lesson

When this lesson is over, you will be able to write simple programs with variables, functions, control flow and loops.


## Let's begin!

In [2]:
print("Hello world!")

Hello world!


## Code cells

You can make a new code cell by pressing the "+" sign in the top-left corner. Or press "shift + enter", which runs the current cell and moves to the next one, creating a new cell if you are at the bottom of the notebook. From the dropdown menu to the right in the same row, you can select what you want the cell to render. Usually, you choose either "Code" for Python code or "Markdown" for commentary. Just like this cell!

To run a cell, either press "Run" (the play button) or "ctrl + enter" from within the cell you'd like to run. Under the "Kernel" option in the menu, there are options to run an entire notebook and more.

### When in doubt

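There is also an option "Restart and clear all outputs", which essentially resets the whole notebook: the kernel restarts and every output is removed. To also run all the code cells from top to bottom afterwards, pick the "Restart and run all" option instead (the exact labels differ slightly between Jupyter versions). This can be useful if you make some hard-to-debug mistakes, like the state overwriting we talked about earlier.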

## Variables

Python supports variables just like most other languages*. 

**Assignment (10 min)**: find out what the community standard style for naming is in Python. You can use the internet, for example: https://pep8.org/. When you've figured out how to use variables, create a new code cell and use three variables to describe yourself. For example, age, name and favorite song. Then create another cell and print the contents of the variables on a new line.

*Name a language that does not support variables?

In [3]:
name = "Loek"
age = 26
favorite_song = "Love will tear us apart"

In [4]:
print(name)
print(age)
print(favorite_song)

Loek
26
Love will tear us apart


**Assignment (5 min)**: Create a new code cell and overwrite one of the variables you wrote earlier. Then run the *printing* cell above again. Explain what happens in a Markdown cell and/or chat.

In [5]:
favorite_song = "Welcome to the jungle"

## Arithmetic

Because you're going to have to. It is data ***science*** so...* Good thing: Python was developed by a mathematician, so this should be really intuitive.

*Do you find maths exceptionally hard? Then it's not such a bad idea to take some extra lessons, online MOOCs or anything that helps. Data science eventually boils down to numbers and you need to understand what you can and (probably more important) can't do with those numbers.

In [6]:
# Addition
print(10 + 8)

# Subtraction
print(5 - 40)

# Multiplication
print(5 * 4)

# Division
print(6 / 2)

# Modulo
print(100 % 3)

# Powers
print(10**2)

# Unnecessary but still fun commands
print("na" * 16 + " Batman!")

18
-35
20
3.0
1
100
nananananananananananananananana Batman!
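Note the `3.0` in the output above: division with `/` always gives a float. A short sketch of the related operators, in case you need a whole number instead:

```python
# True division always returns a float, even when the result is whole
print(7 / 2)    # 3.5
print(6 / 2)    # 3.0

# Floor division rounds down to the nearest whole number
print(7 // 2)   # 3
print(-7 // 2)  # -4 (rounds down, not towards zero)

# divmod returns the floor quotient and the remainder in one call
quotient, remainder = divmod(100, 3)
print(quotient, remainder)  # 33 1
```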


**Assignment (5 min)**: Think up an arithmetic exercise for a 12-year-old. Create a new code cell. Save the exercise in a variable, then print the exercise along with the result on a single line. Use the documentation if you can't figure out how to do this: https://docs.python.org/3/

In [7]:
answer = 5 * (2 % 3)
print("5 * (2 % 3) = " + str(answer))

som = 5 * 3 - 6 + 8 / 2 -4 * 3
text = "5 * 3 - 6 + 8 / 2 - 4 * 3"
print(text, "=", som)

5 * (2 % 3) = 10
5 * 3 - 6 + 8 / 2 - 4 * 3 = 1.0


## Quirky

Before we continue, let's look at some stuff that can make Python... frustrating.

**Assignment (5 min)**: Copy the code in the code cell below. Run it. Explain the error.

In [8]:
if (5 > 1) {
    print("Yay!")
} else {
    print("Eeeeh...")
}

SyntaxError: invalid syntax (<ipython-input-8-04879c05e432>, line 1)

## Syntax

Python does a few things differently from other languages. The first thing you will notice is the complete lack of curly brackets around blocks of code. Instead, Python marks a block with a colon ( : ) followed by indented lines. The community standard is 4 spaces per indentation level.

You really need to watch that whitespace. Ever put in 3 spaces instead of 4? Good luck debugging the completely incomprehensible error. Ever not format your code decently? Good luck debugging the completely incomprehensible error.

Also, Python was designed to be as readable as its creator could make it. So most parentheses "()" you need in other languages are not necessary. I'll still use them, however, so that the syntax resembles most other languages.
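To make the whitespace warning concrete, here is a small sketch that feeds Python a block with one line indented by 3 spaces instead of 4. The source is kept in a string (the name `bad_source` is made up) so the example itself still runs.

```python
# Source code with a line indented by 3 spaces inside a 4-space block
bad_source = (
    "if 5 > 1:\n"
    "    print('Yay!')\n"
    "   print('Oops, three spaces')\n"
)

caught = None
try:
    # compile() only parses the code, it does not run it
    compile(bad_source, "<example>", "exec")
except IndentationError as error:
    caught = error

# Show which kind of error was raised and on which line
print(type(caught).__name__, "on line", caught.lineno)
```

When you see an `IndentationError`, count the spaces on the reported line before looking anywhere else.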

In [9]:
if (5 > 1):
    print("Yay!")
else:
    print("Eeeeh...")
    
# You could also write the following. Note the missing parentheses. 
# Both ways are fine. Pick one and stick with it.
if 5 > 1:
    print("Yay!")
else:
    print("Eeeeh...")

Yay!
Yay!


**Assignment (10 min)**: Generate a random number (again, documentation!) between 1 and 100. If the number is below 50, print "Lower". If it's between 50 and 75, print "Better" and if it's higher than 75, print "Bestest!".

In [10]:
n = random.randint(1, 100)
print(n)

if n < 50:
    print("Lower")
# n is at least 50 here, so only the upper bound needs checking
elif n <= 75:
    print("Better")
else:
    print("Bestest!")

96
Bestest!
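One thing to watch out for with "between" checks: Python allows chained comparisons, and a chain only compares neighbours. So `50 <= n <= 75` means `50 <= n and n <= 75`, while something like `n > 50 < 75` means `n > 50 and 50 < 75`, which ignores the upper bound. A sketch, with `classify` as a hypothetical helper name:

```python
def classify(n):
    # Hypothetical helper that returns the label instead of printing it
    if n < 50:
        return "Lower"
    elif 50 <= n <= 75:   # same as: 50 <= n and n <= 75
        return "Better"
    else:
        return "Bestest!"

print(classify(10))   # Lower
print(classify(60))   # Better
print(classify(96))   # Bestest!

# The pitfall: this chain never compares n with 75
n = 96
print(n > 50 < 75)    # True
```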


## Best practices

You probably just imported some module `random`. This is a library, just like the ones you could use in JavaScript, TypeScript, PHP and Java. It is good practice to put all necessary imports in the topmost cell of a notebook.

**Assignment (5 min)**: Create a cell at the top of the document. Place the import statement there. Now restart (and run) the kernel via the "Kernel" options in the menu bar at the top of the screen.

## Comments

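Python supports single-line and multiline comments. Strictly speaking, the multiline variant is a triple-quoted string that is not assigned to anything, not a real comment.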

In [None]:
# This is a single line comment. Use the "pound" sign, or "hashtag"

"""
This is a multiline comment. Note the very awkward notation with
three quotation marks. What's even weirder, is that those comments
go *in* a function, instead of *above* when you use them as function
documentation.
"""

def what_the_comment():
    """
    So this is a valid function doc. Weird right?
    """
    return "lol k"
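There is a reason the function documentation goes *inside* the function: Python stores that string on the function object, so tools can read it back. A small sketch, where `area_of_square` is a made-up example function:

```python
def area_of_square(side):
    """Return the area of a square with the given side length."""
    return side * side

# The docstring is stored on the function object itself
print(area_of_square.__doc__)

# help() reads the same docstring
help(area_of_square)
```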

## Loops

Python understands for loops and while loops just like any other language. The for-loop in Python is usually used as a foreach-loop.

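Sidenote: the `range(start, end)` function produces the numbers from `start` up to and including `end - 1`. In Python 3 it returns a lazy `range` object rather than a list, which you can loop over directly.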
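A quick sketch of what `range` gives you in Python 3. It only produces its numbers when asked, so wrap it in `list()` if you want to see them all at once:

```python
numbers = range(0, 10)

print(numbers)        # range(0, 10)
print(list(numbers))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# An optional third argument sets the step size, which may be negative
print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]
```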

In [11]:
for i in range(0, 10):
    print(i)

0
1
2
3
4
5
6
7
8
9
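The "foreach" flavour mentioned above means you can loop over the values of a collection directly, without an index. A sketch using a list of names (lists are explained further down in this lesson):

```python
lecturers = ["Gert", "Mischa", "Greg", "Loek"]

# Loop over the values directly, no index needed
for lecturer in lecturers:
    print(lecturer)

# enumerate() hands you the index and the value together
for index, lecturer in enumerate(lecturers):
    print(index, lecturer)
```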


**Assignment (5 min)**: Rewrite the for-loop as a while-loop.

In [12]:
i = 0
while i < 10:
    print(i)
    i += 1

0
1
2
3
4
5
6
7
8
9


## Data types

Python is a dynamically but strongly typed language. It will only throw type errors at runtime, but it isn't like JavaScript, where pretty much anything is converted to anything. You can, however, cast types explicitly.
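What "strongly typed" means in practice, as a small sketch: Python refuses to guess how a string and a number should be combined, so you have to cast one of them yourself.

```python
caught = None
try:
    # Python will not silently turn 5 into "5" or "5" into 5
    result = "5" + 5
except TypeError as error:
    caught = error
    print("TypeError:", error)

# Cast explicitly to say what you mean
print(int("5") + 5)    # 10
print("5" + str(5))    # 55
print(float("3.14"))   # 3.14
```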

In [13]:
five = 5
pi = 3.1415

print(type(five))

print(type(pi))

print(int(pi))
print(type(int(pi)))

<class 'int'>
<class 'float'>
3
<class 'int'>


**Assignment (10 min)**: Find all primitive data types in Python. Describe them in a markdown cell and/or chat.

- Integer = whole number (5, -8, 45000)
- Float = decimal number (6.2, 9235.6, 3.29371057)
- String = text, no surprise there
- Boolean = True or False
- Complex = feel free to forget this one again: i = sqrt(-1)

## Functions

Python has functions just as any other decent programming language. The only difference is how you define them.

In [14]:
def my_cool_function():
    print("This is from inside myCoolFunction!")

We're in a notebook and state is global for the whole notebook. Once we run the cell above, we can use the function throughout the notebook!

In [15]:
my_cool_function()

This is from inside myCoolFunction!


**Assignment (5 min)**: Create a code cell somewhere above the function definition and call the function. Does it work? Now, restart and run the cell again. Does it still work? Why does it, or why doesn't it?

## Parameters in functions

Because functions that depend on global variables are dangerous.

In [16]:
def print_full_name(first_name, last_name):
    print(first_name, last_name)

In [17]:
print_full_name("Loek", "van der Linde")

Loek van der Linde


**Assignment (20 minutes, including 10 minute break)**: Write a function with a parameter `getal` (Dutch for "number"). The function should count down from `getal` to 1, printing every number on the way. You cannot assume that `getal` will always be higher than zero!

Test 1: `your_function_name(5)` should print:

```
5
4
3
2
1
```

Test 2: `your_function_name(-2)` should print:

```
"Afraid I can't do that, Dave"
```

In [18]:
def print_numbers(number):
    if (number <= 0):
        print("Afraid I can't do that, Dave")
    else:
        while number > 0:
            print(number)
            number = number - 1
            
print_numbers(5)

5
4
3
2
1


## Best thing evah

Python supports named parameters (officially called keyword arguments). This is heaven. You will use it a lot and there's good reason for it.

**Assignment (1 min)**: What is the output of the following code?

In [19]:
print_full_name(last_name = "van der Linde", first_name = "Loek")

Loek van der Linde
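Named parameters get even more useful together with default values: you only pass what differs from the default, and the name at the call site tells the reader what each value means. A sketch with a made-up function `describe_car`:

```python
def describe_car(name, color="black", speed=100):
    # color and speed fall back to their defaults when not passed
    return name + ": " + color + ", " + str(speed) + " km/h"

print(describe_car("ka"))                # ka: black, 100 km/h
print(describe_car("ka", speed=120))     # ka: black, 120 km/h

# With names, the order of the arguments no longer matters
print(describe_car(speed=220, color="blue", name="impreza"))  # impreza: blue, 220 km/h
```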


## Return of the function

Python supports returns, like any language should.

In [1]:
def my():
    return "Precioooouuuuussssss"
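Returning a value is not the same as printing it. A function without a `return` statement still hands something back, namely `None`. A sketch with two made-up functions:

```python
def add(a, b):
    # Hands the sum back to the caller
    return a + b

def add_and_print(a, b):
    # Only shows the sum; there is no return statement
    print(a + b)

total = add(2, 3)
nothing = add_and_print(2, 3)   # prints 5

print(total)     # 5
print(nothing)   # None
```

Only a returned value can be stored in a variable or passed on to another function.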

## One function to rule them all

> Which way to Mordor, Gandalf? Left or Right? ~ Frodo, 25 December 3018, TA, Rivendell

In [21]:
help(max) # ctrl + i works as well in JupyterLab

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



**Assignment (15 min)**: Use only the `help` function as above. Using the documentation for the `max` function, write two functions. The first function should accept two numbers and return the highest. The second function should accept an *iterable* of numbers and return the highest number as well.

Tip: don't stress. The documentation can look vague at first. Decipher what the different sections mean. Then translate the word "iterable" and you'll probably understand what it means.

In [22]:
def highest(a, b):
    return max(a, b)

print(highest(12, 10))

def highest_iterable(iterable):
    return max(iterable)

print(highest_iterable([1, 6, 3, 4, 5]))

12
6


## Lists, Dictionaries, Tuples and all that stuff

Python is a bit stricter than most scripting languages when it comes to composite data types. Especially when compared to PHP, where you can mostly do anything you want with any data type. JavaScript is a decent comparison, so pay attention.

### Lists

Most people call these arrays, but an array is actually something else*. When we talk about lists, we mean an indexed list of values. Think of JavaScript. You can mash together all kinds of data types in a list; it is up to you to decide how useful that is. Protip: don't.

*https://www.geeksforgeeks.org/difference-between-list-and-array-in-python/

In [2]:
numbers = [1, 2, 3, 4, 5]
lecturers = ["Gert", "Mischa", "Greg", "Loek"]
the_horror = [1, ["haha", "lol", 42], "screw_this", 0x0043, [["nested"], 12345], 8.22]
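A short sketch of the most common things you do with a list: indexing, slicing, asking for the length and appending. The appended name `"Guest"` is made up for the example.

```python
lecturers = ["Gert", "Mischa", "Greg", "Loek"]

print(lecturers[0])     # Gert (indexes start at 0)
print(lecturers[-1])    # Loek (negative indexes count from the end)
print(lecturers[1:3])   # ['Mischa', 'Greg'] (the end index is excluded)
print(len(lecturers))   # 4

# Lists are mutable: you can add to them after creation
lecturers.append("Guest")
print(len(lecturers))        # 5
print("Greg" in lecturers)   # True
```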

**Assignment (1 min)**: Create a list with at least 4 hobbies, passions, things you like to do. Put the list in a variable and print it.

### Dictionary

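Strictly speaking, a dictionary is a hash table (look it up if you want). You can compare it really well with a JSON object. You even write them the same way!

Note that since a dictionary is a hash table, it is incredibly quick at looking up a value by its key. Whereas in PHP you would use arrays for everything, in Python you should think carefully about whether you need a list or a dictionary. Looking up a key in a dictionary takes roughly the same time no matter how big the dictionary is, while searching a list for a value means checking the elements one by one. For large collections, that can make a dictionary orders of magnitude faster.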

**Rule of thumb**: if order is important and index matters --> list. Any other case --> dictionary.
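If you want to see the speed difference for yourself, here is a rough sketch using the standard `timeit` module. The collection size and the number of repetitions are arbitrary choices, and the exact timings will differ per machine.

```python
import timeit

size = 100_000
as_list = list(range(size))
as_dict = {number: True for number in as_list}
target = size - 1  # worst case for the list: the very last element

# Time 100 membership checks on each collection
list_seconds = timeit.timeit(lambda: target in as_list, number=100)
dict_seconds = timeit.timeit(lambda: target in as_dict, number=100)

print(f"list: {list_seconds:.5f} s")
print(f"dict: {dict_seconds:.5f} s")
```

The list has to walk past every element before it finds the target; the dictionary jumps straight to it.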

**Assignment (10 min)**: Revisit the exercise where you needed to describe yourself in variables. Redo the exercise in a new code cell, but now use a dictionary. Save the dictionary in a variable. Use the hobbies variable from above in your dictionary. Finally, print at least 2 properties. Again, feel free to use the docs.

In [24]:
me = {
    "name": "Loek",
    "age": 26,
    "favorite_song": "Love will tear us apart",
    "hobbies": ["Music!", "Teaching", "Gaming"]
}

print(me["name"])
print(me["hobbies"][1])

Loek
Teaching
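Besides reading a value with square brackets, you will often add keys, overwrite values and loop over a dictionary. A sketch with a made-up `car` dictionary:

```python
car = {"name": "ka", "color": "yellow"}

car["speed"] = 120     # a new key is added on assignment
car["color"] = "red"   # an existing key is overwritten

# Square brackets raise a KeyError for a missing key;
# .get() returns a fallback value instead
print(car.get("owner", "unknown"))  # unknown

# .items() yields every key together with its value
for key, value in car.items():
    print(key, "=", value)
```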


### Tuples

This is probably new for all of you. The official definition for a tuple is:

> a tuple is a finite ordered list (sequence) of elements.

Informally, a tuple is a list of elements that "belong" to each other. The most obvious example is a row in a database table. You can't change one of the elements without changing the "meaning" of the rest of the elements.

Let's say we have a database record for the lecturer, with columns `name`, `age`, `course`. Then the tuple would be `("Loek", 26, "Data Science Introduction 1")`. Note the use of normal parentheses `()` instead of square brackets `[]` for a list. Now, if we change one of the values, I would not be the lecturer anymore. This is what we mean by "belonging" to each other.

**Assignment (1 min)**: Think of a tuple you are definitely going to use in the project.
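Python backs up the idea that the values in a tuple belong together: once created, a tuple cannot be changed. You can, however, unpack it into separate variables in one go. A small sketch:

```python
lecturer = ("Loek", 26, "Data Science Introduction 1")

# Unpacking: one variable per element, in order
name, age, course = lecturer
print(name, "teaches", course)

# Tuples are immutable, so assigning to an element fails
try:
    lecturer[1] = 27
except TypeError as error:
    print("Cannot change a tuple:", error)
```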

**Assignment (10 min)**: Create a list of at least 5 tuples. The tuples should contain at least three values. For example: a car with `color`, `speed` and `name`. Then, use a loop to print all the names (or other property) of your tuples.

Keep in mind: tuples should always have the same order, else you will encounter some proper weird stuff...

In [25]:
cars = [
    ("black", 170, "transporter"),
    ("black", 240, "a4"),
    ("blue", 220, "impreza"),
    ("yellow", 120, "ka"),
    ("green", 130, "astra")
]

for car in cars:
    print(car[2])

transporter
a4
impreza
ka
astra


## Putting it all together

**Assignment (+/- 30 min)**: This can be homework if the lecture runs to the time limit.

Create a function `get_best_student`. It takes a list as parameter. The list contains at least 3 dictionaries. The dictionaries contain 3 properties: `name`, `age`, `highest_grades`. The last property is again a list that contains three grades.

The function returns a tuple with the `name`, `age` and `highest_grade` (single grade, not multiple!) of the student that has the highest grade overall. If multiple students have the same highest grade, the function can just return the first student.

So, it should work like this:

In [35]:
def get_best_student(students):
    # Track the highest grade seen so far and the student it belongs to
    highest_grade = 0
    best_student = {}

    for student in students:
        # Best grade of the current student
        student_highest_grade = max(student["highest_grades"])

        # Strictly greater, so the first student wins on a tie
        if student_highest_grade > highest_grade:
            highest_grade = student_highest_grade
            best_student = student

    return (best_student["name"], best_student["age"], highest_grade)

In [36]:
students = [
    {
        "name": "John",
        "age": 30,
        "highest_grades":[5, 5, 8]
    },
    {
        "name": "Paul",
        "age": 29,
        "highest_grades": [7, 10, 4]
    },
    {
        "name": "George",
        "age": 35,
        "highest_grades": [9, 9, 6]
    }
]

print(get_best_student(students))
# Output: ("Paul", 29, 10)

('Paul', 29, 10)
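As the `help(max)` output earlier shows, `max` accepts a `key` function. That allows a shorter variant of the same idea. This is only an alternative sketch, and the name `get_best_student_with_key` is made up to keep it apart from the solution above.

```python
def get_best_student_with_key(students):
    # Let max() pick the student whose best grade is the highest.
    # On a tie, max() returns the first student it encountered.
    best_student = max(students, key=lambda student: max(student["highest_grades"]))
    highest_grade = max(best_student["highest_grades"])
    return (best_student["name"], best_student["age"], highest_grade)

students = [
    {"name": "John", "age": 30, "highest_grades": [5, 5, 8]},
    {"name": "Paul", "age": 29, "highest_grades": [7, 10, 4]},
    {"name": "George", "age": 35, "highest_grades": [9, 9, 6]},
]

print(get_best_student_with_key(students))  # ('Paul', 29, 10)
```

Unlike the loop version, this one raises a `ValueError` on an empty list, because `max` has nothing to choose from.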
