# Machine Learning in Medicine

## Chapter 0: "Introduction to Python"

Welcome to the "Machine Learning in Medicine" seminar. We're glad you've made it here!

Over the course of this chapter, we will give you a quick introduction to Python and to some general tools you will probably find useful. This tutorial is designed to be quick to complete. If you feel uncertain about some aspects of Python (maybe you're new to programming or find Python weird), that's no problem. There are plenty of good online tutorials for Python.

Now for some general rules.

1) As people have different levels of prior knowledge about programming, some of you may find the following tasks to be extremely easy. If that is the case for you, you can skip any and all of the tasks in this notebook that you don't feel you need and immediately jump to "Chapter 1: An introduction to PyTorch".
2) *No question is stupid*. If you don't know how something works, step 1 is to try to formulate your problem as a Google search and see if someone else has asked the same question. Usually they have. If you still cannot find an answer, or the problem is difficult to search for, feel free to talk to any of the PhDs supervising the course. That's what we're here for. Similarly, if you have a question about something in these prepared notebooks, ask it. We are not perfect and neither are the tutorials, so chances are that some things warrant additional explanation by us, or are even a little wrong.
3) Experiment. Try to take apart the code and put it back together. The more you actually do instead of only reading, the better the solution will stick in your memory. Counterintuitively, this is particularly true if you *don't* understand what the code you just executed actually does. One of the best ways to learn is to take apart someone else's code and reassemble it, or to think of a small, private project that you want to do and try to realize it in Python. Learning by doing.
4) If you ever find yourself thinking "This should be a really common use case", it probably is. Try to express what you want in terms of Python lingo and you will almost certainly find what you are looking for on Google and Stack Overflow. Most of the time, it is even already a single function or class in a package you already have. Finding out about features in this way is the norm rather than the exception.

Lastly, don't feel too stressed about grading or presenting your solutions. The things we will put together are complicated and take time to assemble and test. If your solution to a problem mostly works and delivers some result, but there are also a couple of errors, we count that as a success.

### Chapter 0.1: "Variables and Built-ins"

Python is a high-level language whose standard implementation (CPython) is written in C. Being high-level means that it is usually comparatively easy to get started and code *something* that works. The downside is that performance occasionally suffers a little bit. Most machine learning nowadays is done in Python, but the underlying code (that runs "under the hood", so to speak) is implemented in faster languages (C, C++, etc.). Thankfully, some very talented programmers have already done the hard part of creating extremely fast basic math operations that run on GPUs, and we only need to write the comparatively easy Python code. Our Python code then calls their code, which means that we get the best of both worlds - easy code and a fast, efficient implementation.

Let's check out some basics.

You can execute a block of code by selecting it with the mouse and pressing Shift + Enter, or by pressing the little play button on the top left corner of the block.

In [None]:
# In Python, when we declare a variable, all we need to do is give it a name, and tell it what it is.
# Instead of explicitly requiring us to tell it what we have given it, Python infers the so-called type
# of the variable on its own.

# Python immediately knows that a is an integer, that b is a floating point number (in short: float),
# and that the my_pet variable is a string.

a = 1
b = 2.5
my_pet = "dog"

# We can look at almost anything (not images) in Python using the print() function.
# Here, we use it to tell us what the value of the variable is, and what the type of the variable is.

print(a)
print(type(a))

# A good thing to keep in mind is that both you and other people will read your code. This means that
# you want to keep it tidy:
# 1) If there is code that doesn't have anything to do with other code, maybe leave a blank line.
# 2) If you want to tell someone what you're doing, maybe write a comment (start the line with a '#') to
# explain what you are doing.
# 3) And, most importantly, if you create a variable, name it appropriately. For example, if you want to
# have a variable that contains your dog's name, Sparky:

x = "Sparky"        # Not a good name, because you won't remember what x stands for in a few days.
pet_name = "Sparky" # A very good name, because it is short, but tells you what it represents.
the_name_of_my_pet_whomst_i_love = "Sparky" # Not a good name, because it is far too long.

In [None]:
# The print() function is a so-called built-in function. This means that no matter which other things
# you may be working with, it is always available to use. Python also has a number of built-in
# operators which you will find extremely useful. For example, most math is done using a single sign:

c = a + b   # addition
d = b - a   # subtraction
e = a * b   # multiplication
f = a / b   # division
g = a ** b  # power
h = b // a  # floor division
i = b % a   # modulo
j = (a+1)*b # parentheses

# Python also allows easy comparisons. A '==' checks, whether the left and right side evaluate to the
# same value. A '!=' checks whether the left and right side evaluate to different values. 
# You can also use '>', '<', '>=' and '<=' for comparisons.
# When you print the result, or save it to a new variable, it will be either 'True' or 'False'.

print(a == b)
print(a != b)
k = (a == b)
print(k)
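
The three division-related operators are easy to mix up, so here is a minimal sketch of how they relate. The values 7 and 2 and all variable names are chosen purely for illustration and are not part of the notebook's own examples.

```python
# Illustrative values, not taken from the cells above.
x = 7
y = 2

true_division = x / y     # regular division, always gives a float
floor_division = x // y   # division rounded down to a whole number
remainder = x % y         # what is left over after floor division
power = x ** y            # x raised to the power of y

print(true_division, floor_division, remainder, power)

# The pieces fit together: (x // y) * y + (x % y) gives back x.
print(floor_division * y + remainder == x)
```

Floor division and modulo are often used as a pair, for example to convert a number of minutes into hours and leftover minutes.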

In [None]:
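# Python also has the 'is' keyword, which looks similar to '==' but checks something different:
# '==' compares values, while 'is' checks whether both sides are the very same object in memory.
# For comparing numbers or strings, always use '=='. 'is' is mainly used in checks like 'x is None'.

print(a is b)
print("Hello" == "hello")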
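
The difference between comparing values and comparing identity is easiest to see with lists. The following is a small sketch with made-up variable names: two lists with the same contents are equal, but they are not the same object.

```python
first_list = [1, 2, 3]
second_list = [1, 2, 3]   # same contents, but a separate object
alias = first_list        # a second name for the very same object

values_equal = (first_list == second_list)   # compares the contents
same_object = (first_list is second_list)    # compares the identity
alias_is_same = (first_list is alias)

print(values_equal, same_object, alias_is_same)

# The typical use of 'is' is checking for None:
nothing = None
print(nothing is None)
```

As a rule of thumb, use `==` whenever you care about what a value is, and reserve `is` for checks against `None`.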

In [None]:
# Another common built-in function is 'len'. 'len' gives you the length (surprise) of any
# sequence or collection you use it on:

print(len("Hello"))

# The string "Hello" has 5 characters, hence its length is 5.

# Python has a lot of these, and we will discuss those functions as we need them.

### Chapter 0.2: "Functions and conditions"

Next, we will look at functions. A function is a piece of code that you know you will use often, but which you do not want
to constantly rewrite and copy. The only thing you need to know about functions for now, is that they take *some* input and produce
*some* output.

We define functions with the 'def' keyword. In the definition, we tell the function
which inputs to expect and which output to return.
A function can use anything you want as input or output, even other functions! They
are extremely useful.
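
As a minimal sketch of the general shape of a function, consider the following. The names `add_numbers`, `apply_twice` and `add_one` are invented for this illustration.

```python
# 'def', then the function name, then the inputs in parentheses, then a colon.
def add_numbers(first_number, second_number):
    # The indented lines form the body of the function.
    result = first_number + second_number
    # 'return' hands the output back to whoever called the function.
    return result

total = add_numbers(2, 3)
print(total)

# A function can also take another function as input.
def add_one(number):
    return number + 1

def apply_twice(function, value):
    return function(function(value))

twice_incremented = apply_twice(add_one, 5)
print(twice_incremented)
```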

In [None]:
# Let's make our own multiplication function:

a = 2
b = 3

def my_multiplication(input_1, input_2):
    print(f"input_1 = {input_1}")
    print(f"input_2 = {input_2}")
    output = input_1 * input_2
    return output

# We can confirm that this works by *calling* the function. We tell the function to use 'a'
# as input_1 and 'b' as input_2. The function returns our result, which we save in the variable 'c'.

c = my_multiplication(input_1 = a, input_2 = b)
print(c)

In [None]:
# You can omit the names of the inputs that the function wants. If you do this, it assumes that
# you are providing the inputs in the correct order.

d = my_multiplication(a, b)
print(d)
e = my_multiplication(b, a)
print(e)

In [None]:
# Note that all functions require you to indent their code, so Python knows what is part of your
# function and what isn't.

# This function will work just fine:
def f1(a, b):
    c = a * b
    return c

# Even defining this function will cause an error, because the indentation is missing.
def f2(a, b):
c = a * b
return c

In [None]:
# What happens if we give our function some input that will not work? Let's try.

my_name = "Freddy"
my_pets_name = "Sparky"

f = my_multiplication(my_name, my_pets_name)

In [None]:
# Uh oh. Our function just broke, because you can't multiply two words.
# Of course, we knew this would happen: our function was written with numbers in mind.
# Python is a so-called dynamically typed language. What this means is that you can give any function
# any input you like, and Python will go along with it and if it breaks, it breaks. Sometimes this is
# very nice. But just now it broke our function. Therefore, let's introduce some type hints. They do
# not stop anyone from passing the wrong type, but they document what the function expects.

def my_multiplication_2(input_1: int, input_2: int) -> int:
    output = input_1 * input_2
    return output

# Now, if I type 'my_multiplication_2(', my editor will show me what the function wants and what
# I get back. Try it!
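
It is worth knowing that type hints are documentation rather than a safety net: Python itself does not check them when the code runs. The sketch below uses a hypothetical function `multiply` to show this.

```python
def multiply(input_1: int, input_2: int) -> int:
    return input_1 * input_2

# The hints say 'int', but Python does not enforce them at runtime.
# A string times an integer is valid Python and repeats the string:
repeated = multiply("ab", 3)
print(repeated)

# The hints are stored on the function itself, which is where editors
# and external type checkers pick them up.
print(multiply.__annotations__)
```

If you want wrong inputs to be reported before the code runs, you need a separate type checker or an editor that understands the hints.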



In [None]:
# Sometimes, you do not want to write every input for your function, but just assume that something is
# the case. You can give default values for inputs like so:

def pet_a_dog(dog_name: str, pet_twice: bool = True):
    print(f"Hello {dog_name}, who's a good boy?")
    if pet_twice:
        print("I am petting the dog twice.")
    else:
        print("I am petting the dog.")
    return None

# Let's break down what happened here. First, we started defining a function. We tell the function that
# it should expect an input 'dog_name', which is a string, and an input 'pet_twice', which is a boolean
# (a True or a False). By telling the function pet_twice = True in the definition, we are telling the
# function that pet_twice should be True by default. If a user overwrites the value, the user's value
# takes precedence:

# This function will pet the dog twice.
pet_a_dog(dog_name = "Sparky")
# This function will pet the dog only once.
pet_a_dog(dog_name = "Sparky", pet_twice = False)
# We can even omit the input names again, if we want. As long as everything is in the right order, the
# function still works.
pet_a_dog("Sparky", False)

In [None]:
# We have also seen a very useful new concept in our pet_a_dog function - conditions.
# We can make our code do different things, depending on any condition using the 'if', 'elif'
# and 'else' keywords, followed by a colon. Whatever code we want to execute given that
# condition must also be indented again.

apple_price = 0.50

if apple_price < 0.70:
    print("Oh wow, that's a bargain! I will buy some apples.")
else:
    print("Wow, these apples are expensive!")

In [None]:
# We can also combine several conditions into one, by using the 'and', 'or' and 'not' keywords.

apple_price = 0.50

if apple_price < 0.40:
    print("Oh wow, that's a bargain! I will buy some apples.")
elif apple_price >= 0.40 and apple_price < 0.70:
    print("Hmm, that is not cheap, but I will buy a few apples.")
else:
    print("Wow, these apples are expensive! No, thanks.")

In [None]:
# Try writing a function or two of your own and see if they work as you want them to!








### Chapter 0.3: "Import statements"

Often, code that we want to write already exists. In other cases, maybe the code we want to write is just too hard for
us to write ourselves. In such cases, we 'import' code. Typically, this is done once, at the start of any program you
write. As a general rule, it is good to import only the things you need. However, importing things you don't need is
fine, too.

In [None]:
# If we want to import code from a library called 'numpy', we do it like this:
import numpy
import math

# We can even give it a new name, if we are particularly lazy. Now, instead of
# calling a function by the name 'numpy.array', we can simply write 'np.array'.
import numpy as np

# If we only want something very specific, we can import one or several names
# (functions, classes or submodules) like so:
from numpy import array, random

Many of these libraries come pre-installed with Python, but are not always loaded, such as the os or sys libraries. Others must be installed first and can then be used.

We can even make our own code, and import it, given some conditions:
1) The file must be a regular .py file. While it is possible to import from Jupyter Notebooks (the thing we are working in right now), it is not recommended.
2) The computer must know where to look for the files. You can extend the places that Python will search by providing a custom environment, or even at runtime. Below, we show the latter method.
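
Both conditions can be tried out in a self-contained way. The following sketch writes a tiny .py file into a temporary directory, adds that directory to the search path, and imports the file. The module name `my_demo_library` and the function `double` are hypothetical and only exist for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory and write a tiny module into it.
# 'my_demo_library' and 'double' are made-up names for this sketch.
module_dir = tempfile.mkdtemp()
module_file = Path(module_dir) / "my_demo_library.py"
module_file.write_text("def double(x):\n    return 2 * x\n")

# Tell Python to also look for importable files in that directory.
sys.path.append(module_dir)
importlib.invalidate_caches()  # make sure the freshly written file is noticed

import my_demo_library

print(my_demo_library.double(21))
```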

In [None]:
import sys
# Look for importable stuff in the current directory, ...
sys.path.append("./") 
# ... look in directory above, ...
sys.path.append("../") 
# ... and check the course materials provided by us.
sys.path.append("/datashare/MLCourse/Course_Materials")

# Try to make a small file, for example 'my_library.py' by clicking the New File
# button in the top left and renaming the file. Define a function of your choosing
# in the file, and then import and test it here.




### Chapter 0.4: "Data types"

Python has *a lot* of different data structures, many of which we will learn about later.
However, some are particularly useful. Even outside of the context of machine learning, it is good to know these.

#### Chapter 0.4.1 "Lists"

In [None]:
# The most common data type anywhere is probably the List. An explanation of what a List is works best when showing it.
# This is a List:

my_first_list = [2, 4, 6, 8]
print(my_first_list)

# You can have anything you want inside of a List - integer numbers, floating point numbers, strings, booleans, even
# things like functions and other lists! This is a completely valid list:

my_second_list = [2, "4", "six", 8.0, my_first_list]
print(my_second_list)

In [None]:
# You can look at elements of your list by looking at the list at a specific index.
# Note that when we look at anything like a list in Python, we start our indexing at 0, not 1.

# This will print the first element of our list, the integer 2:
print(my_first_list[0])
# This will print the second element of our list, the integer 4:
print(my_first_list[1])

In [None]:
# We can even start counting from the end and go towards the start.
# This will print the last element of our list, the integer 8:
print(my_first_list[-1])
# This will print the second-to-last element of our list, the integer 6:
print(my_first_list[-2])

In [None]:
# We can also look at parts of our list. Let's say that I want the first 2 values of my list.
# This will give me a smaller list, containing the first two elements of my list:
my_shorter_list = my_first_list[0:2]
print(my_shorter_list)
# The notation with the colon means "start at index 0, stop at index 2". Note that the stopping
# point is NOT part of the new list.
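
The colon notation has a few more variations that come up often. Here is a short sketch, using an illustrative list called `numbers`.

```python
numbers = [2, 4, 6, 8]

first_two = numbers[:2]           # a missing start means "from the beginning"
last_two = numbers[2:]            # a missing stop means "until the end"
every_second = numbers[::2]       # a third value sets the step size
reversed_numbers = numbers[::-1]  # a step of -1 walks through the list backwards

print(first_two)
print(last_two)
print(every_second)
print(reversed_numbers)
```

Slicing always creates a new list, so the original `numbers` list stays unchanged.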

In [None]:
# Lists can also do other neat things.
my_first_list = [2, 4, 6, 8]
my_second_list = [2, "4", "six", 8.0]

# They know how long they are:
print(len(my_first_list))

In [None]:
# They know whether something is in them or not:
if "six" in my_second_list:
    print("Yup, 'six' is in my list.")

In [None]:
# You can add new values to a list by 'appending' them:
my_first_list.append(10)
print(my_first_list)

In [None]:
# You can overwrite old elements with new ones:
my_first_list[0] = 3
print(my_first_list)

In [None]:
# You can even stick together two lists, by 'extending' a list:
my_first_list.extend(my_second_list)
print(my_first_list)

#### Chapter 0.4.2: "Dicts"

In [None]:
# Another very common data type is the dict, or dictionary. It is somewhat similar to a list,
# but not quite the same.
# What makes dictionaries special is the way its contents are accessed. A dictionary has so-called
# keys and values. To every key belongs a specific value. Keys are always unique, but values don't
# have to be. 

# A typical dictionary looks like this:
shopping_list = {"apples": 6, "bananas": 4, "strawberries": 20}
print(shopping_list)

# We can look at its contents with almost the same indexing we used for lists. Instead of looking
# for index 0 or index 1 or something like that, we give the dictionary a key and ask it what the
# value for this key is:
num_apples = shopping_list["apples"]
print(num_apples)

# The keys and values in a dictionary can be almost anything you want. However, it is much easier
# and much less dangerous, if you use only strings or integers as keys. For values, you can do
# whatever you want.

In [None]:
# If you want to add something to an existing dictionary, you can do so by providing the key and value.
# Changing an existing value in a dictionary works the same way:
shopping_list["apples"] = 5
shopping_list["pretzels"] = 3
print(shopping_list)

In [None]:
# If you want to extend your dictionary, you can do so using .update:
sweets = {"bonbons": 30, "chocolate_bars": 2}
shopping_list.update(sweets)
print(shopping_list)

In [None]:
# A dictionary knows its own length:
print(len(shopping_list))

In [None]:
# It also knows whether something is in it or not:
if "apples" in shopping_list:
    print("Apples are on my shopping list.")

In [None]:
# You can even delete parts of your dictionary:
del shopping_list["bananas"]
print(shopping_list)
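
One thing to watch out for: asking a dictionary for a key that it does not contain with square brackets raises a `KeyError`. A minimal sketch of two ways around this, using an illustrative dictionary called `groceries`:

```python
groceries = {"apples": 6, "bananas": 4}

# .get() returns the value if the key exists, and a fallback value otherwise.
num_apples = groceries.get("apples", 0)
num_pretzels = groceries.get("pretzels", 0)
print(num_apples, num_pretzels)

# Alternatively, check for the key before accessing it.
if "pretzels" not in groceries:
    print("No pretzels on the list.")
```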

#### Chapter 0.4.3: "Tuples"

In [None]:
# Tuples are a bit like lists. However, they have one key difference: Once you make a tuple,
# you can't change it anymore. They are 'immutable'.
# This may sound strange and plainly less useful than lists, but it can sometimes be faster
# to use tuples over lists. There is also another feature you will find out about in a bit.

# You can declare a tuple like this:
my_tuple = (1, 2)

# Just like lists and dicts, you can put basically anything into a tuple:
another_tuple = ("one", 2.0, 3, "4")

In [None]:
# You can look at parts of your tuple just like with lists:
print(another_tuple[0])
print(another_tuple[0:3])

In [None]:
# You can check if something is in the tuple:
if "4" in another_tuple:
    print("Yup, '4' is in my tuple.")

In [None]:
# Now, what are tuples actually good for? 
# Python comes with a neat feature called tuple unpacking. You can see this feature in action below:

my_tuple = (1, 2, 3)
a, b, c = my_tuple
print(a)

# Python just saw that I wanted to declare three variables. On the other side of the '=' was a tuple
# with three entries, and Python assigned the first entry to my first variable, second to second, etc.

In [None]:
# Why is this useful?
# Let's say I have a function that calculates several things for me. I want to get back all the
# different outputs, and I will be using them for separate things. Tuple unpacking will take the
# output from the function and try to 'unpack' it into however many variables I suggest:

def my_function(a: int, b: int):
    c = a + b
    d = a - b
    return c, d

e = 3
f = 2

# Here I suggest g and h as variables, and tuple unpacking will expect my function to give back
# a tuple of length two.
g, h = my_function(a = e, b = f)
print(g)
print(h)
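
Tuple unpacking also works with functions you did not write yourself. The sketch below uses the built-in `divmod`, which returns a tuple of quotient and remainder, and then shows the common trick of swapping two variables. The variable names are chosen for illustration.

```python
# divmod is a built-in that returns a tuple: (quotient, remainder).
quotient, remainder = divmod(17, 5)
print(quotient, remainder)

# Unpacking also lets us swap two variables without a temporary helper variable.
left = "apples"
right = "bananas"
left, right = right, left
print(left, right)
```

The number of variables on the left must match the length of the tuple on the right, otherwise Python reports an error.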

#### Chapter 0.4.4: "Numpy Arrays"

In [None]:
# Numpy arrays are also a little bit like lists. However, there are some differences.
# Let's make one. We create an array by calling the numpy.array function. The only
# argument this function absolutely needs is a list.

my_array = np.array([2, 4, 6, 8, 10])
print(my_array)

# We can also tell numpy to make us an array, for example by telling it to randomly
# roll us some integers:

my_random_array = np.random.randint(low = 0, high = 5, size = 5, dtype = np.int8)
print(my_random_array)

# Notice the 'dtype'? Every numpy array has a dtype, and every element inside our
# array is definitely of that type. In this case, all elements in our array are 8-bit
# integers.

In [None]:
# We can do some of the things with a numpy array that we could do with lists.
# In fact, it makes sense to think of numpy arrays as lists, which are only used
# for maths - in essence, numpy arrays are matrices.

# They do most of the stuff that lists do, such as ...
# ... accessing elements:
print(my_array[0])

In [None]:
# ... slicing:
print(my_array[0:3])

In [None]:
# ... length:
print(len(my_array))

In [None]:
# They also have a multitude of other features, which we will look at in the future.
# One of them is particularly important however. Numpy arrays can have an arbitrary
# number of dimensions. For example:
my_matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my_other_matrix = np.zeros(shape = (3, 3))
print(my_matrix)
print(my_other_matrix)

# You can check the shape of a numpy array by checking its shape attribute like so:
print(my_matrix.shape)

In [None]:
# Just like real matrices in maths, numpy arrays come with a ton of useful features,
# such as transposing, summing, matrix multiplication, etc. Many of these will also
# be available in PyTorch, which we will ultimately use for our machine learning.

# Here are some examples.
my_matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Transposition
print(my_matrix.T)

In [None]:
# Reshaping
print(np.reshape(my_matrix, (2, 6)))

In [None]:
# Flattening into a vector
print(my_matrix.flatten())

In [None]:
# Operations along a dimension
print(np.max(my_matrix, axis = 0))
print(np.max(my_matrix, axis = 1))

# It is impossible to know all of them by heart. What you need to know is how to
# express what you want to do in as concrete terms as possible. If you can do that,
# you will find any function you need on google or in the numpy documentation.

In [None]:
# Try to create some lists, dictionaries, tuples and numpy arrays of your own.
# If you know a math operation that you want to see, maybe a dot product, try
# to find the function in numpy and see if it works the way you think it should!







### Chapter 0.5: "Loops and Iterables"

Now that we've got some of the building blocks and know how to write code, let's do some problem solving.

When you buy groceries, you put your perishable goods into the fridge (or so I hope). But you don't close
your eyes, take an item out of your bag, put it into the fridge, and repeat this exactly three or exactly
five times. You always take the next thing in the bag and put it into the fridge until the bag is
empty, without thinking about how many things you have actually put into the fridge.

The exact same pattern exists in programming and is one of the most common patterns in existence. This
behaviour is called a 'loop'. Different languages handle loops slightly differently, but in broad strokes,
they all do the same thing. Let's build some.

In [None]:
# The most common loop in Python is the 'for' loop. It consists of two parts: an iterable
# and the loop segment. An iterable, in layman's terms, is something you can iterate over.
# For example, a list like [1, 2, 3] is an iterable. The first element is a 1, the second
# is a 2, and so on. An iterable is capable of giving you (at the very least) the next
# item, and telling you when it has no more items.
# The loop segment is code that is executed for each iteration. The only rules are that you
# should not change your iterable during the iteration, and that your code must be indented.

# Let's look at a practical example:
shopping_list = ["Apples", "Bananas", "Strawberries"]
for item in shopping_list:
    print(item)

# Our loop iterates over the iterable, the shopping list.
# On every iteration, it takes the next item on the list and puts it into the variable 'item'.
# In our loop segment, we simply print the 'item' variable.

In [None]:
# You can (and generally will) also access any other variables in your loop.
# We call the variable 'total', because 'sum' is already the name of a Python built-in:
numbers = [1, 4, 9, 16]
total = 0
for number in numbers:
    total = total + number
print(total)

In [None]:
# After lists, one of the most common iterables is probably the range object:
for n in range(10):
    print(n)

In [None]:
# An alternative to having either the content of an iterable or a number, is using 'enumerate',
# which gives you both of them:
for index, item in enumerate(numbers):
    print(index, item)

In [None]:
# Sometimes, you have two iterables and want to do something with both of them.
# For example, maybe you have two lists of numbers, and want to pair them up to multiply them,
# like in a dot product. For such use cases, 'zip' exists.
vector_a = [1, 2, 3]
vector_b = [3, 0, 2]
dot_product = 0
for ai, bi in zip(vector_a, vector_b):
    dot_product = dot_product + ai * bi
print(dot_product)

In [None]:
# Loops also have some functionality to let you skip an iteration or stop the loop altogether.
# The following function determines the smallest integer divisor (greater than 1) of an integer
# in a very naive fashion. We loop over all potential divisors from 0 up to (but not including)
# our dividend.

# If the divisor is 0 or 1, we want to skip the current iteration. We do so with the keyword
# 'continue'. The next iteration then starts at the top of the for-loop's body.

# If the dividend modulo the divisor is 0, we have found our smallest divisor, and we want to
# stop iterating. We can stop the iteration using the keyword 'break'. In this case, the code
# skips ahead to where the indentation of the for-loop ends (see below).

def find_smallest_divisor(dividend: int):
    for divisor in range(dividend):
        # continue jumps here
        if divisor == 0 or divisor == 1:
            continue
        if dividend % divisor == 0:
            break
        if divisor == dividend - 1:
            print(f"Looks like {dividend} is a prime!")
            return dividend
    # break jumps here
    return divisor

print(find_smallest_divisor(9))
print(find_smallest_divisor(17))

In [None]:
# One final thing that makes iterables cool: Progress bars.
# Who doesn't like a good progress bar steadily filling up?
# tqdm is a package that lets you display progress when
# iterating over an iterable wrapped in tqdm.

from tqdm.auto import tqdm
import time

my_numbers = [1, 2, 3, 4, 5]
my_squares = []
for number in tqdm(my_numbers):
    time.sleep(0.2)  # pause briefly, so that the progress bar is visible
    my_squares.append(number ** 2)

In [None]:
# Try to make yourself a small toy problem and solve it using a for loop.

# If your first thought was "That is not a sensible prime number detector
# up there", maybe that could be your task :)






### Chapter 0.6: "List Comprehensions"

Python contains a nifty little feature called List Comprehension. This allows you to write a piece of code that
would otherwise be a little unwieldy in a single short expression.

In [None]:
# The following two segments of code are equivalent and both create a list containing
# the squares of the digits from 0 to 9:

# Method 1
squared_digits = []
for x in range(10):
    y = x ** 2
    squared_digits.append(y)
print(squared_digits)

# Method 2
squared_digits = [x ** 2 for x in range(10)]
print(squared_digits)

# We have essentially just executed an entire loop inside of the square brackets of
# our list, at the moment of its creation.

In [None]:
# This tool also works for other data types, such as dictionaries (and sets):
squared_digits = {x: x**2 for x in range(10)}
print(squared_digits)
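
Sets and tuples behave slightly differently from lists and dictionaries here, which the following sketch illustrates. The variable names are made up for this example.

```python
# A set comprehension uses curly braces without key-value pairs.
# Sets keep every value only once.
remainders = {x % 3 for x in range(10)}
print(remainders)

# Round brackets do not create a tuple, but a so-called generator.
# To get a tuple, we pass the generator to tuple().
squares_tuple = tuple(x ** 2 for x in range(5))
print(squares_tuple)
```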

In [None]:
# It can even handle more complex instructions, such as conditionals.
# Let's make a list that contains only digits that are even.
only_even_digits = [x for x in range(10) if x % 2 == 0]
print(only_even_digits)

In [None]:
# Let's make one that contains even and odd digits, but even digits are squared.
square_even_digits = [x ** 2 if x % 2 == 0 else x for x in range(10)]
print(square_even_digits)

# List comprehensions are an extremely powerful tool which is worth practicing a little.

In [None]:
# Try to make a list containing the prime numbers up to 100 by using a list comprehension.
# If you want to challenge yourself, try to do so without using any functions to check
# whether a given number is prime. You can find a solution below.






In [None]:
# Solution with functions:
def is_prime(dividend: int):
    # If the dividend is smaller than 2, it can't be prime
    if dividend < 2:
        return False
    # If the dividend is two, it is a prime.
    elif dividend == 2:
        return True
    else:
        # Try all divisors starting at 2 and up to the square root of the dividend
        for divisor in range(2, max(3, int(np.sqrt(dividend)+1))):
            if dividend % divisor == 0:
                return False
            else:
                pass
                # 'pass' means 'do nothing' - instead of writing
                # else: pass, you can also simply write nothing
        # If we have gotten past the loop without finding a divisor
        # (and thus returning False), the number is a prime.
        return True

primesC = [x for x in range(1, 101) if is_prime(x)]
print(primesC)

# Solution without a helper function of our own:
primesC = [x for x in range(2, 101) if len([y for y in range(2, max(3, int(math.sqrt(x)+1))) if (x % y == 0 and x != 2)]) == 0]
print(primesC)

# As a final note, we should mention that even though the second solution is maybe a little more clever
# and shorter, the first solution is actually the better one by far - you always want your code to be readable!

### Chapter 0.7: "OS and files"

When working with any sort of data, we will not only have to work with lists or arrays we made ourselves. Naturally, we will also use pre-existing images or other data. Therefore, we need at least some passing familiarity with how our file system works and how we can open text files or images so that we can work with them.

The most common function used when working with files is open(). In order to open a file, with open() or with any other function, you need to know where the file is. You can refer to files in two ways: relative paths and absolute paths.

A relative path looks like this:
```relative_path = "./example.txt"```.
The ```.``` means "start at the current folder", the one that this notebook lives in.

An absolute path looks like this:
```absolute_path = "/Projects/ML_Course/Course_Materials/Chapter_0/example.txt"```.
The ```/``` at the start means that you start at the very top of the file system, also
called "root", and then go from there.
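
Python can translate between the two kinds of path for you. The following is a minimal sketch using the standard `os.path` module. The file does not need to exist for this to work, because only the path strings are manipulated.

```python
import os

relative_path = "./example.txt"

# Turn the relative path into an absolute one. The starting point is the
# current working directory, which os.getcwd() reports.
absolute_path = os.path.abspath(relative_path)

print(os.getcwd())
print(absolute_path)

# os.path.isabs() tells us which kind of path we are looking at.
print(os.path.isabs(relative_path))
print(os.path.isabs(absolute_path))
```

The printed absolute path depends on where the notebook is running, so it will look different on every machine.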

Let's play around with some files.

In [None]:
# This is how you open a file in read mode:
out = open("./example.txt", "r")

# This is how you read all lines from the text:
lines = out.readlines()

# 'lines' is a list, and each element is one line of text, including the newline at its end:
for line in lines:
    print(line)

# And this is how you close it:
out.close()

In [None]:
# There is also a cleaner way of opening files, the 'with' statement:

# Whatever you encapsulate in a 'with' is created for this context.
with open("./example.txt", "r") as out:
    # I can use 'out' in here:
    lines = out.readlines()

# The context manager has closed 'out' for me again.
# The name 'out' still exists out here, but the file behind it is
# closed, so I can no longer read from it.

# However, variables I created in the 'with' block still exist:
for line in lines:
    print(line)
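
To see what the 'with' statement does to the file, we can ask the file object whether it is closed. This sketch creates its own throwaway file in a temporary directory, so it does not depend on example.txt. The names `demo_path`, `demo_file` and `demo_lines` are illustrative.

```python
import os
import tempfile

# Write a small throwaway file first.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as demo_file:
    demo_file.write("first line\nsecond line\n")

with open(demo_path, "r") as demo_file:
    demo_lines = demo_file.readlines()
    closed_inside = demo_file.closed   # the file is still open in here

# After the 'with' block, the file has been closed automatically.
closed_outside = demo_file.closed

print(closed_inside, closed_outside)
print(demo_lines)
```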

In [None]:
# We can also write to files.
# To do this, we open our file in write mode 'w' (which overwrites)
# or in append mode 'a' (which appends what we write to the end):
with open("./example.txt", "a") as out:
    # String writing allows some controls.
    # For example, '\n' denotes a newline, or '\t' a tab.
    out.write("\n")
    out.write("Programming is easy.\n")
    out.write("I am a god.")

# Let's confirm that writing to the file worked:
with open("./example.txt", "r") as out:
    lines = out.readlines()
for line in lines:
    print(line)

# Both 'w' and 'a' create the file as we open it, if it doesn't yet exist.
# The 'w+' and 'a+' modes additionally allow reading from the same file object.
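
The difference between write mode and append mode is easy to check for yourself. The following sketch works on a throwaway file in a temporary directory, and the name `notes_path` is made up for this example.

```python
import os
import tempfile

notes_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
existed_before = os.path.exists(notes_path)   # the file does not exist yet

with open(notes_path, "w") as notes:   # 'w' creates the file
    notes.write("first\n")
with open(notes_path, "a") as notes:   # 'a' adds to the end
    notes.write("second\n")
with open(notes_path, "r") as notes:
    after_append = notes.read()

with open(notes_path, "w") as notes:   # 'w' again: the old content is gone
    notes.write("third\n")
with open(notes_path, "r") as notes:
    after_overwrite = notes.read()

print(existed_before)
print(repr(after_append))
print(repr(after_overwrite))
```

Because 'w' silently discards the old content, it is worth double-checking the mode before opening a file that holds data you care about.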

In [None]:
# Now it's your turn for a moment.
# Make yourself a shopping list. That list is a dictionary.
# Each key is something you want to buy. Each value the amount.

# Now try to use a loop and your knowledge of open() to make
# a file called 'shopping_list.txt' and neatly write your
# shopping list into the file.






Next, let's check out a format called CSV - comma-separated values. Originally, these were text files that looked something like this:

1,2,3  
4,5,6    
7,8,9    

Nowadays, it's not uncommon to also write text or more esoteric things between the commas.

Reading and writing to those files does not work differently from regular text files. However, there are several nice tools to help use this particular formatting style efficiently. For now, learning the simplest one of them will suffice.

In [None]:
# split() is a simple method that every string is capable of.
# You specify by what delimiter the string is supposed to be split.
print("1,2,3".split(","))

In [None]:
# Try to read in the staff.csv file like you have learned above.
# Using split and loops, make one list each which contains all
# names, ages and genders.








In [None]:
# So, what do we do if we don't know where our files live, or what
# their names are? It is quite common to have so much data, that
# you cannot know all of it, nor manually write it.

# The os package provides us with the tools that we need.
import os

# os.listdir() tells us the contents of a directory. It returns a list.
# The "./" is the name of the directory to search and means 'wherever
# we are right now.'
output = os.listdir("./")
print(output)

In [None]:
# Notice that there are files and directories in the output.
# Also, we have been told their names, but never their full paths.

# Let's get some full paths using os.path.join().
# os.path.join() attaches parts of a file's path to one another,
# adding "/" as needed to make it a correct path.
full_paths = [os.path.join(os.getcwd(), p) for p in output]
print(full_paths)

In [None]:
# We can filter the output to contain only directories or files
# using isdir and isfile:
only_files = [p for p in full_paths if os.path.isfile(p)]
only_dirs = [p for p in full_paths if os.path.isdir(p)]
print(only_files)
print(only_dirs)

Just as with any other package, os offers a ton of utility you will learn about at some point in time. For now, this knowledge will hopefully suffice.
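As a small taste of what else is in there, here is a minimal sketch of `os.walk()`, which also looks inside subdirectories, and `os.path.splitext()`, which splits off a file extension. The folder and file names are made up, and the example builds its own throwaway directory so that it does not depend on what is on your disk.

```python
import os
import tempfile

# Build a small throwaway directory tree so the example is self-contained.
# All file and folder names here are made up.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "scans", "patient_01"))
    for name in ["notes.txt", os.path.join("scans", "patient_01", "image.png")]:
        with open(os.path.join(root, name), "w") as f:
            f.write("placeholder")

    # os.walk() visits the directory and every subdirectory below it.
    found_files = []
    for current_dir, subdirs, files in os.walk(root):
        for file_name in files:
            found_files.append(os.path.join(current_dir, file_name))

    # os.path.basename() strips the directories,
    # os.path.splitext() splits the name into (stem, extension).
    extensions = sorted(os.path.splitext(os.path.basename(p))[1] for p in found_files)

print(len(found_files))
print(extensions)
```

Unlike `os.listdir()`, this finds the image two folders down without us having to know that the folders exist.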

### Chapter 0.8: "Try and Except"

We've seen earlier what happens if we break our code. Python will try to execute anything you write and only upon execution realize that maybe it was complete nonsense from the start. Sometimes such a mistake is easy to spot. Sometimes, it is buried within layers and layers of code and almost invisible. Worse, sometimes it's not even you who made the mistake!

When you work with large amounts of data, you cannot always guarantee that everything goes smoothly. Maybe one of your 3 million images has the wrong shape, or maybe it doesn't have an RGB channel, maybe the file is simply broken or the website you are downloading it from suddenly stops responding. How can you prevent an error that you don't know about in advance?

Enter 'Try and Except'.

Similar to 'If and Else', 'Try and Except' allows you to write code and execute it on some condition. However, for 'Try and Except', this condition is that Python is throwing an Exception. The 'Try' part wraps the code you want to run. The 'Except' part contains code that is executed if the 'Try' code fails.
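To make the order of events concrete, here is a minimal sketch that records which lines actually run. The `steps` list is only there for illustration; the first block fails halfway through, the second one does not.

```python
steps = []

try:
    steps.append("try started")
    number = int("not a number")   # raises a ValueError
    steps.append("try finished")   # skipped, because the line above failed
except ValueError:
    steps.append("except ran")

try:
    steps.append("second try started")
    number = int("42")             # works fine
    steps.append("second try finished")
except ValueError:
    steps.append("second except ran")  # skipped, because nothing failed

print(steps)
```

Note that as soon as a line in the 'Try' part fails, the rest of the 'Try' part is skipped and Python jumps straight to the 'Except' part.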

In [None]:
# Let's make some toy code using a Try/Except block.
# We want to make a new list, which contains the result of a division.

my_cool_number = 6
divisors = [3, 2, 1, 0, "a"]

new_numbers = []
for d in divisors:
    new_numbers.append(my_cool_number / d)
print(new_numbers)

In [None]:
# Uh oh. We tried dividing by zero, which throws a ZeroDivisionError.
# Let's say we don't mind having an infinity in our list:

import numpy as np  # needed for np.inf below

new_numbers = []
for d in divisors:
    try:
        new_numbers.append(my_cool_number / d)
    except:
        new_numbers.append(np.inf)

print(new_numbers)

In [None]:
# We have just intercepted every possible error during the for loop and said:
# "If you see any error, just jot down infinity as the result and carry on".

# Sometimes, this is not ideal. Only when you divide by zero is the result
# going to be infinity. What if we have other errors? We can specify what kinds
# of error we want to intercept and what kinds we let through:

new_numbers = []
for d in divisors:
    try:
        new_numbers.append(my_cool_number / d)
    # We catch the ZeroDivisionError specifically here
    except ZeroDivisionError:
        new_numbers.append(np.inf)
    # We catch multiple kinds of error like this, by putting them into a tuple
    except (ValueError, TypeError):
        pass
    # Any kind of error we have not specifically caught is still raised,
    # crashing your program.

print(new_numbers)

In [None]:
# Try to think of some task that you could wrap in a Try/Except block and
# test it, to see if Try/Except works the way you believe it should!





### Chapter 0.9: "Classes and Inheritance"

Perhaps the most powerful object in Python is the *class*. Classes are extremely versatile and are used in pretty much every program out there, including PyTorch, which we will use for our actual machine learning exercises.

To learn what classes can do, the best way is probably to make some!

In [None]:
# You define a class similarly to a function:
class Dog:
    pass

# This is already a valid class! Of course, it does not do anything yet.

# A class is a little bit like a blueprint. The class itself is not really a
# thing - it is more of a concept.
# To get a so-called instance of a class, we call the class like a function
# and assign the result to a variable:
my_dog = Dog()
print(my_dog)

# The Dog class is the concept of dogs in general.
# my_dog is a specific instance of a dog.

In [None]:
# Let's breathe some life into our dog. The two most important things
# that classes (can) have are called 'attributes' and 'methods'.
# An example may help to illustrate what these things mean:

class Dog:

    def __init__(self):
        self.name = "Sparky"

    def bark(self):
        print("WOOF!")

my_dog = Dog()

In [None]:
# We have just made a new, upgraded Dog class.

# As you can see, we have given it a name (self.name = "Sparky").
# The dog's name is what we call an attribute. It can be accessed
# from outside like this:
print(my_dog.name)

# We can also change it from the outside:
my_dog.name = "Good Boy"
print(my_dog.name)

# We can even add (or delete) attributes from the outside:
my_dog.age = 1
print(my_dog.age)

# Attributes can be anything, really: A string (like a name), a number,
# another class instance, and even really esoteric stuff.

In [None]:
# The dog's bark function is what we call a method. 
# Methods are normal functions, which we happened to define inside of our class.

# They are something that only our class can do, meaning we could write
my_dog.bark()
# but could not just write 'bark()'.

In [None]:
# Functions inside of a class can do the same things as regular functions,
# such as taking input or returning output. They are also called Methods.

class Dog:

    def __init__(self):
        self.name = "Sparky"

    def bark(self):
        print("WOOF!")

    def bark_n_times(self, barks: int):
        for n in range(barks):
            print("WOOF!")

    def is_a_good_boy(self):
        return f"{self.name} smiles. He is a good boy."

my_dog = Dog()
my_dog.bark_n_times(barks = 3)
print(my_dog.is_a_good_boy())

You have probably already noticed a few weird things about the Dog class
which we have been looking at until now, and it's time to talk about them.

1) What does 'self' mean? Why is it everywhere in the class, but seemingly
unnecessary when we call my_dog.bark_n_times(barks = 3) for example?

When we make a class, any function inside of it that we do not specifically
mark to be treated otherwise is a so-called bound method. When we have
created a class instance, that instance is passed as the first parameter
of any bound method. An example may help to understand what this means:

In [None]:
# This
print(my_dog.is_a_good_boy())
# is equivalent to this
print(Dog.is_a_good_boy(my_dog))

# Because of this, we are allowed to do these things in functions inside of a class:
def is_a_good_boy(self):
    return f"{self.name} smiles. He is a good boy."
# This would work inside of the Dog class.

# To put it into simpler words: Methods of a class instance are aware of who they are.
# The dog knows its own name, age, and so on.

2) Why does the '\_\_init\_\_' function look so weird? What does it do and why does it have underscores in its name?

Some function names, like '\_\_init\_\_', are what we call 'privileged' (the Python documentation calls them 'special methods'). If you name a function in a class this way, Python will do special things with it. Here are a few examples:

- If a class contains an '\_\_init\_\_' function, the function is called when an instance of the class is created. 
- If a class contains a '\_\_len\_\_' function, you can use the built-in len() function on an instance of your class, and it will execute its '\_\_len\_\_' function.

Be careful about using privileged names when writing functions, particularly if they are available everywhere and not just inside of a class.

In [None]:
class Dataset:

    def __init__(self):
        self.data = [1, 2, 3, 4]

    def __len__(self):
        # You could write anything here - you could even, as a joke, always return 3,
        # no matter what self.data actually is.
        return len(self.data)

# The '__init__' function is called here, although you don't see it.
# But self.data already exists:
my_dataset = Dataset()
print(my_dataset.data)

# The '__len__' function is called here.
print(len(my_dataset))

Now it's time to talk about a feature called inheritance. Classes can 'inherit' from another class by specifying that class in their definition. The child class can do anything that the parent class can, meaning it has access to all the same methods and attributes, and everything is named the same. You can then add your own things to the child class as usual. If you give something the same name it had in the parent class, you overwrite the parent method or attribute with your new one.

Let's see the concept in action below.

In [None]:
from collections import UserList

class Cool_List(UserList):

    def __init__(self, input_list):
        self.data = input_list

    # We can overwrite methods of our parent class
    def __len__(self):
        print("I am too cool to tell you my real length.")
        return 0

    # Or we can add methods ourselves
    def be_cool(self):
        print("😎")

# Let's make a normal list
my_list = [1, 2, 3]
print(len(my_list))

# Now we make an instance of Cool_List, and we hand it our original list
my_cool_list = Cool_List(my_list)
# Its __len__ method has been overwritten with a new one
print(len(my_cool_list))
# It has a new method, which a regular list does not have
my_cool_list.be_cool()
# Other methods (for example '__repr__', which print() falls back on) are
# inherited from the parent. We can verify this by printing the child class instance:
print(my_cool_list)

In [None]:
# Finally, note that the name of our class, Dog, was written with a capital
# letter at the start. This is not mandatory, but it's a sort of recommendation,
# which can help keep your code clean.
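
One thing the Cool_List example does not show is how a child class can reuse the parent's '\_\_init\_\_' instead of replacing it. Here is a minimal sketch using Python's built-in `super()`. The `Puppy` class and its `age_in_weeks` attribute are made up for this illustration, and `Dog` is a simplified version that takes its name as an input.

```python
class Dog:

    def __init__(self, name: str):
        self.name = name

    def bark(self):
        return "WOOF!"

# Puppy is a hypothetical child class, made up for this illustration.
class Puppy(Dog):

    def __init__(self, name: str, age_in_weeks: int):
        # super() gives us access to the parent class, so Dog.__init__
        # still runs and sets self.name for us.
        super().__init__(name)
        self.age_in_weeks = age_in_weeks

    # Defining bark() again replaces the parent's version for puppies only.
    def bark(self):
        return "yip!"

my_puppy = Puppy(name="Sparky", age_in_weeks=9)
print(my_puppy.name, my_puppy.age_in_weeks, my_puppy.bark())

# A puppy is still a dog:
print(isinstance(my_puppy, Dog))
```

Without the `super().__init__(name)` line, the puppy would never get a name, because the child's '\_\_init\_\_' completely replaces the parent's.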

### Chapter 0.10: "Debugging"

The following code snippets will each contain a mistake. Try executing them, and figure out where the mistake is coming from and how you can fix it.

At the end of the chapter, you can find a solution with an explanation of what was wrong.

In [None]:
# Task 1

# We want to look at the last entry in this list
my_list = [1, 4, 9, 16]

# However, this fails:
print(my_list[4])
# Why, and how do I fix it?

In [None]:
# Task 2

# This is a basic cat class
class Cat:

    def __init__(self, name: str, color: str = "Black"):
        self.name = name
        self.color = color

    def meow(self, loud: bool):
        if loud is True:
            print("MEOW!")
        else:
            print("Meow!")

# This is a cat instance
my_cat = Cat(name = "Findus", color = "Tabby")

In [None]:
# Why doesn't my cat meow when I do this?
my_cat.meow

In [None]:
# Why do I see a "None" when I try to print the output of the meow method?
the_sound_of_a_cat = my_cat.meow(loud = True)
print(the_sound_of_a_cat)

In [None]:
# Task 3

# This function should calculate the factorial
def factorial(n):
    result = 1
    for i in range(1, n):
        result = result * i
    return result

# If we look at the result, however, we notice it is 24 instead of 120.
print(factorial(5))
# Why? Fix the function and try running it again.

In [None]:
# Task 4

class Car:

    def __init__(self, color: str, model: str, honk_sound: str = "HONK!"):
        self.color = color
        self.model = model
        self.honk_sound = honk_sound

    def honk():
        print(self.honk_sound)

# This car should honk. But it doesn't. Instead it throws a TypeError. Why?
my_car = Car(color = "Black", model = "Mercedes")
my_car.honk()

In [7]:
# Task 5

a = 10
def subtract_five(b):
    b = b - 5
subtract_five(b = a)
print(a)

# You'd think that if a is 10, and I give a to the function subtract_five,
# and then the function makes it 5, that a should be 5. Why is it 10?
# And how do I make it so the function actually makes a = 5?


10


In [None]:
# Task 6

from time import sleep

# Let's say we have some numbers. We want to add the sum of each
# pair of neighbors to the list of numbers, as well.

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for i, number in enumerate(numbers):
    sleep(0.5)
    print(numbers[i], numbers[i+1])
    a = numbers[i]
    b = numbers[i+1]
    c = a+b
    numbers.append(c)

# Watch in slow motion what happens. Why does it keep going?
# Will it ever stop? How do you fix this?

### Solutions

#### Task 1

This code snippet throws an IndexError because you are trying to access an index that doesn't exist. In Python, the indices of sequences (for example the list [1, 4, 9, 16]) are counted starting from 0. This means that only the indices 0, 1, 2, and 3 are valid.

An even better solution than calling print(my_list[3]) would be to call print(my_list[-1]). Python considers -1 the last index of a sequence, -2 the second-to-last, and so on.
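
A short sketch of both ways to reach the end of the list:

```python
my_list = [1, 4, 9, 16]

print(my_list[len(my_list) - 1])  # last entry, spelled out the long way
print(my_list[-1])                # the same entry, counted from the end
print(my_list[-2])                # second-to-last entry
```

The negative index keeps working if the list later gets longer or shorter, which is why it is usually the safer choice.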

#### Task 2

`my_cat.meow` is a method of `my_cat`. If you want to call a method, just like other functions, you must use parentheses:
`my_cat.meow(loud = True)`.

Just like regular functions, methods can return something. In fact, they always return something! If you do not tell your function to return anything, it returns None. Writing
```
def do_nothing():
   sleep(1)
   return None
```
is equivalent to writing
```
def do_nothing():
   sleep(1)
```

If you wrote
```
x = do_nothing()
print(x)
```
you would always see "None" get printed. This is what happens in Task 2 as well.

#### Task 3

The factorial function is "forgetting" to multiply by 5 in this example. In fact, it would always "forget" to multiply by the last number it should multiply by. This is because range(a, b) covers the values from a to b, but not b itself. The correct expression for the task would be `range(1, n+1)`.
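
A quick sketch of what `range` actually covers, followed by the corrected function:

```python
print(list(range(1, 5)))      # the stop value 5 is left out
print(list(range(1, 5 + 1)))  # now 5 is included

def factorial(n):
    result = 1
    # n + 1 as the stop value makes sure n itself is multiplied in.
    for i in range(1, n + 1):
        result = result * i
    return result

print(factorial(5))
```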

#### Task 4

```
def honk():
    print(self.honk_sound)
```
should instead be
```
def honk(self):
    print(self.honk_sound)
```

What happens is that the function we defined wants no inputs. But since it's a bound method of the Car class, when we call it, the function is implicitly given self as the input. The error message hints that this is the case. Our function *wanted* 0 arguments, but will always be given 1 inside of the class.

#### Task 5

This concept is called Scope. 

The variable b only exists inside of this function. We say "b exists only in the local scope". Meanwhile, "a exists in the global scope". When we call subtract_five(b = a), b starts out referring to the same value as a, but the line `b = b - 5` makes the local name b point to a new value, 5. It never touches a, and b disappears again when the function ends. If we tried to print b right after printing a, Python would tell us that we're trying to print something that doesn't exist.

Note that it's absolutely possible for a variable to be made available in a different scope. For example, you can bring b from the function's local scope to the global scope, simply by returning b at the end:
```
a = 10
def subtract_five(b):
    b = b - 5
    return b
c = subtract_five(b = a)
print(a)
print(c)
```
You will find that both a and c are printable, because this time, we have brought the b from the function and then said "call this 'new thing' c." If we instead said `a = subtract_five(b = a)`, we would set a to 5.

#### Task 6

We have just created an infinite loop. We append to the end of the list while we are reading from it. During every step, we add one new number and use one old number for the last time, and this will go on forever. Whenever your code seems to not do anything but takes forever, or prints things into the console at incredible speed, without looking like it will stop, something like this might have happened.

How can we stop this?
- Make sure you don't change the iterable you are currently using (such as our original list of numbers), unless that is explicitly what you want.
- Use Dictionaries where they fit. A dictionary raises a RuntimeError if you add or remove keys while you are iterating over it, so this cannot happen unnoticed.
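
Both points can be sketched as follows. This is one possible fix, not the only one: it collects the sums in a separate list (the name `neighbor_sums` is made up) and only extends the original list after the loop. The second half shows how a dictionary reacts when we try to grow it mid-loop, using a made-up shopping list.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Collect the sums in a separate list while looping, and only
# extend the original list once the loop has finished.
# range(len(numbers) - 1) stops one early, so numbers[i + 1] always exists.
neighbor_sums = []
for i in range(len(numbers) - 1):
    neighbor_sums.append(numbers[i] + numbers[i + 1])
numbers.extend(neighbor_sums)
print(numbers)

# A dictionary refuses to grow while we loop over it:
shopping_list = {"apples": 3, "bread": 1}
error_message = None
try:
    for item in shopping_list:
        shopping_list[item + "_extra"] = 1
except RuntimeError as err:
    error_message = str(err)
print("Python stopped us:", error_message)
```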