[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MeyerBender/data_analysis_workshop/blob/main/notebooks/01_intro_to_python.ipynb)

# Introduction to Python

## "Hello World!" - Printing and Variables
This is a Jupyter notebook. It contains code cells, which you can run by either clicking the `Run` button at the top or pressing `Ctrl + Enter` on your keyboard.
The notebook aims to show how programming in Python works and then introduces the packages most commonly used for working with data and images.

In Python, a comment starts with a `#`, so anything that comes after the `#` on the same line will not be executed.

You can print things by calling the `print()` function.

You can also do simple math and assign numbers or strings to variables. You do not need to declare the type of a variable: Python determines the type from the value you assign.
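
As a small sketch of what this means in practice: the built-in `type()` function reports the type of the value a variable currently holds, and the same name can later be bound to a value of a different type. The variable names below are made up for illustration.

```python
# Python attaches the type to the value, not to the variable name
number = 2 + 3 * 4        # multiplication is evaluated before addition
print(number, type(number).__name__)    # 14 int

ratio = 7 / 2             # "/" always returns a float
print(ratio, type(ratio).__name__)      # 3.5 float

label = "seven"           # text in quotes is a string
print(label, type(label).__name__)      # seven str

number = "fourteen"       # rebinding the same name to a string is allowed
print(number, type(number).__name__)    # fourteen str
```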

In [1]:
# this is a comment, and running this cell will not do anything

In [None]:
print("Hello, world!")

In [None]:
print(1 + 2)

In [None]:
a = 2
b = 3
print(a + b)

In [None]:
c = "test1"
d = "test2"
print(c + d)

In [None]:
# it is often useful to print out a message and the value of a variable
# python has f-strings to facilitate this
# simply put an f in front of the quotation marks and put everything that should be
# evaluated in curly brackets
print(f"The value of variable a is {a}")

### Exercise
Create two variables, one string and one number. Write an f-string that prints the variable names and values.

In [None]:
# *** your code here ***

## Data Structures

Numbers and strings are nice, but limited in what they can do. Python offers further data structures that help us organize our data and code.

- Sets: Sets are unordered collections of unique items. They can be constructed with curly brackets.
- Tuples: Tuples are ordered sequences of items. Once instantiated, their values cannot be changed.
- Lists: Probably the most useful. An ordered, changeable sequence of items that can be accessed with `list[i]`.
- Dictionaries: Dictionaries map keys to values. They are also instantiated with curly brackets (or the `dict()` constructor). Each key is separated from its value with `:`.
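
Before going through the structures one by one, here is a compact sketch with one example of each. The names and values are arbitrary and only serve as illustration.

```python
# one small example of each structure (values are arbitrary)
example_set = {3, 1, 2, 3}        # the duplicate 3 is stored only once
example_tuple = (1, 2, 3)         # fixed after creation
example_list = [1, 2, 3]          # can be changed after creation
example_dict = {"a": 1, "b": 2}   # maps keys to values

print(len(example_set))           # 3
print(example_tuple[0])           # 1
example_list.append(4)            # lists can grow
print(example_list)               # [1, 2, 3, 4]
print(example_dict["b"])          # 2
```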

### Sets

In [None]:
set_1 = {1, 2, 3}
print(set_1)

In [None]:
# What do you think happens when we try to access the first element of a set?
# Why does this happen?
# Also note that in Python (as in most programming languages, but unlike R), indexing starts at 0!
print(set_1[0])
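
Because a set has no defined order, it does not support indexing. A minimal sketch of what one might do instead, using a made-up set called `colors`: test membership with `in`, or convert the set to a sorted list when a position is really needed.

```python
colors = {"red", "green", "blue"}

# membership test: the usual way to query a set
print("red" in colors)        # True

# indexing raises a TypeError, which we catch here to show the message
try:
    colors[0]
except TypeError as error:
    print(f"TypeError: {error}")

# sorted() returns a list, and lists do support indexing
sorted_colors = sorted(colors)
print(sorted_colors[0])       # blue
```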

### Tuples

In [None]:
# tuples are instantiated with round brackets
tuple_1 = (1, 2, 3)
print(tuple_1)

In [None]:
# can we access the first element for a tuple?
tuple_1[0]

In [None]:
# how about changing the value?
tuple_1[0] = 5

### Lists

In [None]:
# creating a list
list_1 = [1, 2, 3, 4, 5]
print(list_1)

In [None]:
# can we access the first element?
list_1[0]

In [None]:
# how about changing it?
list_1[0] = 9
print(list_1)

In [None]:
# you can get the length of a list using len()
len(list_1)

### Dictionaries

In [None]:
dict_1 = {"student_1": "Mary", "student_2": "John", "teacher": "Mr. Einstein"}
print(dict_1)

In [None]:
# getting the value for a key
dict_1["student_2"]

In [None]:
# can we use indices to access certain values of the dictionary?
dict_1[0]
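
Dictionaries are accessed by key, not by position, so `0` only works if it is itself a key. The sketch below uses a made-up dictionary `grades` to show two common alternatives: `.get()` with a fallback value, and looping over `.items()`.

```python
grades = {"Mary": 1.3, "John": 2.0}

# .get() returns a fallback value instead of raising a KeyError
print(grades.get("Mary"))              # 1.3
print(grades.get("Caroline", "n/a"))   # n/a

# looping over all key-value pairs
for name, grade in grades.items():
    print(f"{name}: {grade}")

# dictionaries keep insertion order (Python 3.7+), so the keys can be
# turned into a list if a position is really needed
first_key = list(grades)[0]
print(first_key)                       # Mary
```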

In [None]:
# inserting a new key
dict_1["student_3"] = "Caroline"
dict_1

In [None]:
# changing the value for a key
dict_1["teacher"] = "Ms. Curie"
dict_1

### Exercise
Create a list and a dict. Try changing the values within them.

In [None]:
# *** your code here ***

## Control Flow

There are three main ways you can alter the flow of your program.

- if and else: executes a code block only if a certain condition is met
- while: repeats a code block as long as a condition is True. If the condition never becomes False, the loop runs forever, so for loops are usually the safer choice
- for: executes a code block once for each item of a sequence, for example a list or a range of numbers

In [None]:
# if else
# try changing the value of a and see what happens
a = 0

if a == 0:  # checking if a is equal to 0
    print("ZERO")
elif a == 1:  # elif is short for "else if"
    print("ONE")
else:
    print("MULTIPLE")

In [None]:
# while loop
counter = 0

while counter < 3:
    print(counter)
    counter += 1  # this is the same as writing "counter = counter + 1". It basically increases the value of counter by 1

In [None]:
# guess: what value will the variable counter have now?
print(counter)

In [None]:
# for loop

# range(n) creates a sequence of integers that starts at 0 and stops before n.
# It is end-exclusive, so range(3) yields 0, 1 and 2.
for i in range(3):
    print(i)
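
`range()` also accepts a start value and a step size. Wrapping it in `list()` makes the generated numbers visible; the variable names below are only for illustration.

```python
counts = list(range(3))             # stop only
print(counts)                       # [0, 1, 2]

window = list(range(2, 6))          # start and stop
print(window)                       # [2, 3, 4, 5]

every_third = list(range(0, 10, 3)) # start, stop and step
print(every_third)                  # [0, 3, 6, 9]

countdown = list(range(3, 0, -1))   # a negative step counts down
print(countdown)                    # [3, 2, 1]
```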

In [None]:
# using for with lists
list_1 = ["apple", "banana", "pineapple"]
for item in list_1:
    print(item)

In [None]:
# you can use enumerate to get an item's index in the list and its value at the same time
for i, item in enumerate(list_1):
    print(f"Item on position {i} is {item}")

In [None]:
# often, you want to apply some transformation to all elements of a list
# for example, let's say you want to add _x to all elements in the list
for i, item in enumerate(list_1):
    list_1[i] = item + "_x"
    
print(list_1)

In [None]:
# Exercise: how could you make the expression above shorter using the += syntax shown earlier?
# *** your code here ***

In [None]:
# instead of writing a for loop, we can also use a so-called list comprehension
# it accomplishes the same thing, but is more concise
list_1 = [f"{item}_y" for item in list_1]
list_1
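
List comprehensions can also filter elements with an `if` clause, and the same syntax with curly brackets builds a dictionary. A short sketch with a fresh example list (the names are arbitrary):

```python
fruits = ["apple", "banana", "pineapple"]

# transform every element
upper_fruits = [fruit.upper() for fruit in fruits]
print(upper_fruits)    # ['APPLE', 'BANANA', 'PINEAPPLE']

# keep only the elements that satisfy a condition
long_fruits = [fruit for fruit in fruits if len(fruit) > 5]
print(long_fruits)     # ['banana', 'pineapple']

# a dict comprehension maps each element to a computed value
fruit_lengths = {fruit: len(fruit) for fruit in fruits}
print(fruit_lengths)   # {'apple': 5, 'banana': 6, 'pineapple': 9}
```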

### Exercise

Create a list consisting of five zeroes. Create a dictionary that maps the numbers from 0 to 4 to some letters of your choice.
Write a for loop that goes through the numbers from 0 to 4. For each index, change the element in the list at that index to the corresponding value in your dictionary.

Example: if your dict maps from 0-4 to a-e, the output would be `["a", "b", "c", "d", "e"]`.

In [46]:
# *** your code here ***

## Functions

A function can be viewed as a reusable code block. In R, you create a function with the `function` keyword. In Python, we use `def` instead.

In [37]:
# defining a function
def add(a, b):
    # return the output of the function
    return a + b

In [None]:
add(5, 9)

In [None]:
# Python does not fix the argument types in advance, and + also works on strings,
# so we can use the same function to concatenate strings
add("John ", "Cena")
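
The flexibility has limits: `+` must be defined for the combination of types that is passed in. The sketch below redefines `add` so that it runs on its own and shows what happens when a number and a string are mixed.

```python
def add(a, b):
    # works for any two values that support the + operator together
    return a + b

print(add(5, 9))            # 14
print(add([1, 2], [3]))     # [1, 2, 3]

# a number and a string cannot be added, so Python raises a TypeError
try:
    add(5, "9")
except TypeError as error:
    print(f"TypeError: {error}")

# converting the string first makes the call valid
print(add(5, int("9")))     # 14
```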

In [40]:
# we can provide default arguments, so if the function is called without them, they go to their default value
def write_nice_message(greeting, compliment="awesome"):
    print(f"{greeting}, you are {compliment}!")

In [None]:
write_nice_message("Hi", compliment="amazing")

In [None]:
# the compliment argument was not provided, so the function defaults to "awesome"
write_nice_message("Hello there")

### Exercise
Write a function that takes a list of numbers as input and returns the average of those numbers.

In [45]:
# *** your code here ***

## Packages

Python out of the box is already quite powerful, but we can extend its capabilities by using packages. Some of the most important ones for data analysis are:
- numpy: allows you to work with large arrays of numbers much faster than with plain lists
- pandas: built on top of numpy. Allows you to work with labeled data frames, which are similar to what you might be used to from R
- matplotlib.pyplot: enables basic plotting and showing images
- seaborn: built on top of matplotlib. Offers a higher-level interface for statistical plots that works directly with data frames

### Numpy

In [None]:
# importing packages can be done using the import keyword
import numpy as np
test_array = np.zeros(shape=(3, 2))  # here, we pass the desired shape as a tuple
test_array

In [None]:
# adding another array full of random numbers
random_array = np.random.rand(3, 2)
random_array

In [None]:
# access a value like this. The row index comes first, then the column index.
random_array[0, 1]

In [None]:
# we can simply add two arrays together
np.ones(shape=(3, 2)) + random_array

In [None]:
# we can also get the column sums (axis=0) or the row sums (axis=1)
print(np.sum(random_array, axis=0))
print(np.sum(random_array, axis=1))

#### Exercise
Create a 5 by 5 array of random numbers. Apply the log transform to the values (use `np.log()`). Print the output.

In [None]:
# *** your code here ***

### Pandas

In [None]:
import pandas as pd

# note how we can use dictionaries and lists to construct a data frame
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})
df

In [53]:
# you can also read a csv file by using pd.read_csv("/your/file/path.csv"). Try it!
# *** your code here ***

In [None]:
# select a column like this
df["col3"]

In [None]:
# add a column like this
df["col4"] = [0, 1, 1]
df

In [None]:
# selecting by a condition
df[df["col4"] == 1]

In [None]:
# selecting by indices
# row first, column second
# remember that Python starts counting at 0
df.iloc[1, 2]

### Pyplot and Seaborn

In [None]:
df

In [61]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
plt.scatter(df["col1"], df["col2"])

In [None]:
sns.scatterplot(data=df, x="col1", y="col2")

In [None]:
sns.lineplot(data=df, x="col1", y="col2")

In [None]:
# creating a random numpy array of shape 50 by 50
numpy_array = np.random.rand(50, 50)
# showing a numpy array as an image
# not really relevant for our project, but maybe useful for people working with images
plt.imshow(numpy_array)