<a target="_blank" href="https://colab.research.google.com/github/JLDC/Data-Science-Fundamentals/blob/master/notebooks/000_introduction-to-python.ipynb">
    <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Open this notebook in Google Colab
</a>

___

# Data Science Fundamentals

___

Welcome to the **Data Science Fundamentals** course. In this course, you will work through many Jupyter notebooks (we will explain what these are).

We try to keep a similar structure across all the notebooks. In particular, we try to stick to a similar notation, be it mathematical notation or general notation indicating *how you should behave* when reading these notebooks.

Before we explain the notation, let us state the most important thing you should remember when going through these notebooks:

> You can freely modify the notebooks and try things out. **Do it.** There is a well-known adage that states
>
> ***I hear, I know. I see, I remember. I do, I understand.***
>
> I am a firm believer in this mantra. You could simply go through the notebooks and run the code, but you would only learn half as much. Try changing things and adding your custom code. You will run into errors, and that is okay. You will also improve, and in no time, you will become quite good at programming and data science. So do yourself a favor, and try to get your hands dirty as much as possible!

Now that this is out of the way, let's quickly go over the notation you will encounter in the notebooks.



## Notation
___
### Notebook notation

In the notebooks, you will find sections with emojis. The emojis can be interpreted as follows:

➡️ ✏️: You have to either write a piece of code, discuss something with your classmates, or write something on paper. It's your turn to solve a problem.

🙀 🤯: This is a section with complicated concepts. These are either involved mathematical notation or advanced programming concepts. **It's okay if you don't understand the contents on your first readthrough**. However, if you are interested in data science, it's good to re-read those sections and ensure you understand them when you have more knowledge.

🤔: Pause and ponder. When learning, it's good to take a step back and think about what we are doing and why we are doing it this way. If you encounter this emoji, you should try to think about something and ensure you understand it.

___


### Mathematical notation
In general, we try to stick to the following notation in the notebooks; however, there might be some exceptions. If anything is unclear, make sure to ask.

+ $x$: a scalar value, i.e., some real number.
+ $\mathbf{x}$: a vector.
+ $\mathbf{X}$: a matrix.
+ $x^{(i)}$: the $x$*-value* for the $i^\text{th}$ observation in a set of data points. You will also encounter this superscript on vectors, i.e., $\mathbf{x}^{(i)}$.
___

# Introduction to Jupyter
___
In this notebook, you will learn about the basics of Jupyter Lab and Python. This notebook's purpose is to get everybody up to speed on the basic tools used in this course. If you have already taken programming classes with Python, there might not be many new things to learn in here. Nonetheless, we recommend you go through the notebook so you can make sure your knowledge is up-to-date.

We try to present many concepts that we later re-use in the more advanced notebooks. It's okay if you don't understand every detail on the first readthrough. The main idea of introducing this many concepts is that you can come back to a simple example later, when you don't understand what is going on in one of the more advanced notebooks.

## Getting started
___
The document you are reading is not a static web page, but an interactive environment called a **Jupyter notebook** that lets you write and execute code.

For example, here is a **code cell** with a short Python script that computes a value, stores it in a variable, and prints the result:

In [None]:
seconds_in_a_day = 24 * 60 * 60 # Compute the value and store it
seconds_in_a_day # Display the value

To execute the code in the above cell, select it with a click and then either press the play button to the left of the code, or use the keyboard shortcut "Command/Ctrl+Enter" (or "Shift+Enter" to run the cell and move on to the next one). To edit the code, just click the cell and start editing.

Variables that you define in one cell can later be used in other cells:

In [None]:
seconds_in_a_week = 7 * seconds_in_a_day # Use the variable defined in the previous cell
seconds_in_a_week # Display the result

Jupyter notebooks allow you to combine **executable code** and **rich text** in a single document, along with **images**, **HTML**, **LaTeX** and more. 


# Introduction to Python
___
Python is a general-purpose programming language, meaning it can be used to do a lot of different things. Building websites, creating games, analyzing data, programming AI and machine learning pipelines... you name it. Python does it all! Furthermore, Python is considered one of the easiest programming languages there is to learn.

However, this flexibility comes with a cost:
1. **Python is slow**. Writing a highly efficient program in pure Python is difficult, and it will typically be outperformed by languages such as C++, Fortran, or Julia. However, this is not a big problem for us. Python comes with so-called packages (more on them later on), and the numerical ones are largely written in C, C++, and Fortran. These packages let us write code in Python while running at close to C or Fortran speed, combining the best of both worlds. The only caveat is that we depend on those packages to be fast. But don't worry! Python is widely adopted and the community is great, so you will almost always find a package for the task you are aiming to accomplish.
2. **Python is not tailored for data science or statistics**. While this is not a big problem in itself, sometimes other languages such as R or Julia, which were developed with numerical computation in mind, can be slightly more intuitive when it comes to doing data science. This doesn't mean Python is bad in that regard, quite the contrary. But, as you will see, we will first learn about *Python basics* and then have to change the way we think when dealing with arrays in the numerical computation package `numpy`.

With all that being said, let's dive into the basics of Python.


## Variables and types
___
A variable is a name associated with a value. This is useful for storing values and re-using them as we go.

In [None]:
x = 10   # Assign the value 10 to the variable x
x        # Display the value of x

In [None]:
# You can do math with variables
x + 5

In [None]:
# Note that x is still 10, we did NOT assign x + 5 to x, we just used x to do some math
x

In [None]:
# You can overwrite an existing variable
x = 2 * 3 # You can also use math in the assignment of a variable
x

In [None]:
# You can also use variables in the assignment of variables
x = x + 3 # A shorthand for incrementation is x += 3
y = x + 3
x, y

### Variable names
You cannot use just any variable name. In general, there are two things to look out for:
1. Some names are invalid and will raise an error, such as names that start with a digit or contain special characters like `?.\+-*/` (`_` is allowed and very often used, e.g. `my_variable = 3`).
2. There are reserved names. These are used by Python and cannot (or should not) be overwritten, e.g. `from`, `import`, `if`, `and`, `else`, `True`, `False`.

Lastly, in programming there are *conventions*, that is, people agree on how code should be written. Conventions are not enforced, but it's **very good practice** to follow them. The problem is that different people and different programming languages have different conventions, so it can get a bit messy. In Python, the most widely used convention is to follow [PEP 8](https://www.python.org/dev/peps/pep-0008/).

If you pay close attention, you will see that we do not always strictly follow PEP 8. In practice, the conventions you will use greatly depend on the project you are working on and the team you are working with.

As a rule of thumb, try to:
+ **Make your variable names expressive but not too long**, e.g., `dataframe_of_bike_rentals` is too long and `d` is not very informative. A good name for your data of bike rentals could be `rentals`.
+ **Use snake case**, i.e., separate words with an underscore: `this_is_an_example_of_a_variable_name_with_snake_case`, as opposed to camel case: `ThisIsAnExampleOfAVariableWithCamelCase`. Both snake and camel case are fine, but in Python, we generally use snake case for variables and camel case for classes (the templates from which objects are created)! You don't know the difference between objects and variables yet, but we won't be creating our own classes, so just stick to snake case, i.e., write everything in lowercase unless it's supposed to represent a matrix; then you might want to use something like `X` to represent the $\mathbf{X}$ matrix.
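
As a quick check of the two naming rules above, the standard library can tell us whether a string would be accepted as a variable name. This is only a small sketch: it relies on the string method `isidentifier()` and on `keyword.iskeyword()` from the standard-library `keyword` module, and the example names are made up for illustration.

```python
import keyword  # Part of the Python standard library

# str.isidentifier() checks the naming rules (no leading digit, no special characters)
valid_name = "my_variable".isidentifier()
starts_with_digit = "2nd_variable".isidentifier()
has_special_char = "my-variable".isidentifier()

# keyword.iskeyword() checks whether a name is reserved by Python
reserved = keyword.iskeyword("from")

valid_name, starts_with_digit, has_special_char, reserved
```

Only the first name passes the identifier check, and `from` is reported as a reserved keyword.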

### Variable types
A variable can have different types. In the above example, our `x` was always an integer. The most common types are:

|Type|Description|
|----|-----------|
|`int`|Integer, a number in $\mathbb{Z}$|
|`float`| Decimal or floating point number, a number in $\mathbb{R}$|
|`str`| String, a character string, e.g. `"unisg"`|
|`bool`| Boolean, either `True` or `False`|

#### Integers and floating-point numbers

Integers and floating-point numbers (or floats) are the basic numeric types in Python. While it is important to know the difference between `int` and `float`, in most of what we will cover, Python will do the conversion automatically for us and we do not need to think about it much.

**⚠️ This doesn't mean you can just ignore this whole concept. As a matter of fact, it will be necessary/useful in a few cases when using the  numerical computation package `numpy`. ⚠️**

In [None]:
# Here is an example of an int being converted automatically to a float
x = 3
type(x) # Returns the type of an object

In [None]:
x = x / 2 # After division by 2, x will be 1.5 and thus cannot be an integer anymore
x, type(x)
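
To make the difference between `int` and `float` a bit more tangible, here is a small sketch of three situations where it shows up. The variable names are arbitrary, and the comparison operator `==` used in the last assignment is explained later in this notebook.

```python
a = 7 / 2    # True division always returns a float: 3.5
b = 6 / 2    # ... even when the result is a whole number: 3.0
c = 7 // 2   # Floor division of two ints returns an int: 3
d = 0.1 + 0.2 == 0.3   # False, because floats are stored with finite precision

print(a, b, c, d)
print(type(b), type(c))
```

The last comparison is a common source of surprises: floats only approximate real numbers, so exact equality checks on them should be used with care.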

#### Strings

Strings are sequences of characters. They are delimited either by single or double quotes (e.g. `x = 'unisg'` is the same thing as `x = "unisg"`, both assign the value unisg to the variable `x`).


In [None]:
# Character strings are defined between single or double quotes (but not both for the same string)
x = '1' # or x = "1"
x

In [None]:
# Strings are different from numbers and addition / multiplication will work differently
x + x

In [None]:
# Strictly speaking, a variable name and its content need not have any sensible
# relationship, however, using meaningful variable names is good practice. 
x = "my_string" # Assign the value 'my_string' to the variable x
my_string = "x" # Assign the value 'x' to the variable my_string
x, my_string # Notice how x contains the value 'my_string' and my_string contains the value 'x'

In [None]:
x = '1'
x * 3 # Multiplying a string by a number simply repeats the string (3 times in this case)

In [None]:
# When a string represents a number, it can be easily converted
int(x) + 1

In [None]:
# This does not work when the string cannot be interpreted as a number
x = "unisg"
int(x)

#### 🙀 🤯 f-Strings
We have seen above how we can use `+` to combine strings, e.g. `"Hello" + "world"` will result in `"Helloworld"` (notice, no spaces!). However, in practice, this is not always very useful and there are a few other ways to combine variables into strings.

The way we use most often is called *f-Strings*. It's very powerful but slightly complicated to explain. The main idea is that we can write
```python
f"My string {my_variable}"
```
Notice how we have added an `f` in front of the string and something else in curly brackets. There are a few situations where f-Strings come in handy:
+ Annotation of figures based on the underlying variables
+ Formatting text (e.g., left-align, right-align, print only the first few decimals)
+ Creating strings based on some variables

Of course, you never *need* an f-String, you can always do it in another way as well, but it's often a practical way to achieve your goal when working with strings. The best way to understand f-Strings is using examples:


In [None]:
# A simple, not-so-useful example of f-Strings
var1 = "Hello"
var2 = "World"
# Create and output the f-String
f"{var1} {var2}"

'Hello World'

In [None]:
# A more useful example of an f-String
favorite_number = 72
favorite_color = "purple"
# ⚠️ Notice how we can combine different types in an f-String!
f"My favorite number is {favorite_number} and my favorite color is {favorite_color}"

'My favorite number is 72 and my favorite color is purple'

We're not going to go into details on this, but you can also implement formatting directly into an f-String, as you can see below:

In [None]:
# Compute a fraction
numerator = 22
denominator = 7
fraction = numerator / denominator
# A powerful use of f-Strings
f"The fraction {numerator}/{denominator} is equal to {fraction:.2f}."


'The fraction 22/7 is equal to 3.14.'

In [None]:
# Another powerful use of f-Strings
f"{denominator} is {100 * denominator / numerator:.2f}% of {numerator}."

As you can see from the examples above, not only can we do math directly in the f-Strings, we can also display only a specific number of decimal places.
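
Alignment, mentioned earlier as one use of f-Strings, works in a similar way. The following is a minimal sketch with made-up variables `item` and `price`: `<` and `>` followed by a number left- or right-align the value in a field of that width.

```python
item = "coffee"
price = 3.5

left = f"|{item:<10}|"     # Left-align in a field of width 10
right = f"|{item:>10}|"    # Right-align in a field of width 10
rounded = f"{price:.2f}"   # Two decimal places

print(left)
print(right)
print(rounded)
```

The vertical bars are only there to make the padding visible in the output.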

#### Booleans

Booleans originate from [Boolean logic](https://en.wikipedia.org/wiki/Boolean_algebra) but practically, we can view them as a kind of number.

`True` is equivalent to `1` and `False` is equivalent to `0`. See the following truth table for two truth values `x` and `y` and for the basic operations `and` and `or`.

|x|y|x and y|x or y|
|:-:|:-:|:-:|:-:|
|0 | 0 | 0 | 0 |
| 1| 0 | 0 | 1 |
|0|1|0|1|
|1|1|1|1|

As we will see later, booleans are particularly important for control flow.


In [None]:
x = True
y = False
x and y # Corresponds to the 3rd column, 2nd row of the above table

In [None]:
x = False
y = True
x or y # Corresponds to the 4th column, 3rd row of the above table
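
The statement that booleans can be viewed as a kind of number can be checked directly. Here is a small sketch with arbitrary variable names:

```python
two = True + True     # True counts as 1, so this is 2
zero = False * 10     # False counts as 0, so this is 0
same = (True == 1)    # The comparison itself evaluates to True

print(two, zero, same)
```

This behaviour becomes handy later on, for instance when counting how many conditions are satisfied.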

#### Lists

Lists in Python are what other languages might call vectors or arrays. However, in Python, when talking about arrays, we generally mean `numpy` arrays (more on these later). A list represents a sequence of values, which may have different types. Lists behave like strings when we use multiplication or addition. This can cause a bit of confusion, in particular if you make the mistake of thinking of lists of numbers as vectors in $\mathbb{R}^N$.

In [None]:
x = [1, 2, 3] # A list of the first three positive integers
type(x)

list

In [None]:
x * 3 # As with strings, multiplying a list will simply repeat it

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [None]:
y = ["a", 2, 1.5, "b", True] # A list mixing multiple variable types
x + y # As with strings, an addition will simply append one list to the other

[1, 2, 3, 'a', 2, 1.5, 'b', True]
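
If what you actually want is vector-style, element-wise addition, plain lists need a bit more work. The sketch below is one possible way to do it, using the built-in `zip` function (not covered in this notebook) and a list comprehension (explained further down); the lists `u` and `v` are made up for illustration. With `numpy` arrays, this becomes much simpler.

```python
u = [1, 2, 3]
v = [10, 20, 30]

concatenated = u + v                          # [1, 2, 3, 10, 20, 30]
elementwise = [a + b for a, b in zip(u, v)]   # [11, 22, 33]

print(concatenated)
print(elementwise)
```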

In [None]:
# Concatenating two lists is easily done with the + sign, when we want to append
# a single element to a list we can use the following
x.append(4)
x # Display the value of x

[1, 2, 3, 4]

___
#### 🤔 Pause and ponder
Why is `x` now `[1, 2, 3, 4]` and not `[1, 2, 3]`? 

The `.append()` method works **in-place**, i.e., it directly modifies the object. 

⚠️ Be careful, this also means you can use
```python
x.append(my_number)
```
**but not**
```python
x = x.append(my_number)
```
___

In [None]:
l1 = [1, 2, 3] # A first list
l2 = [1, 2, 3] # A second list
l1.append(4) # Correctly append
l2 = l2.append(4) # Incorrectly append

In [None]:
l1 # l1 is the correct list [1, 2, 3, 4]

In [None]:
l2 # l2 is now None ("nothing"), because .append() returns None

You can access a particular element of a list by using square brackets, e.g. if `x` is a list, you can access its elements using `x[0]`, `x[1]`, etc.

Notice that in Python, we start counting at 0. This is customary in many (but not all) programming languages.

In [None]:
x = [1, 2, 3, 4]
# You can access a particular element of a list by using square brackets
x[2] # Notice how this is the 3rd element (we start at zero!)

In [None]:
# In Python we start counting at 0, hence x[0] will give the first element of x
x[0]

Sometimes, we might also want to access multiple elements at once, there are multiple ways to do this.

In [None]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # Instantiate x as a list from 1 to 10
x # Display x

In [None]:
x[:2] # First 2 elements of x (also x[0:2])

In [None]:
x[:-1] # Everything but the last element of x

In [None]:
x[-1] # Only the last element of x

In [None]:
x[3:] # Everything but the first three elements of x

In [None]:
x[::2] # Every second element in x

In [None]:
x[2::3] # Every third element in x, starting from the third one

It's best to play around a bit with list indexing to understand it, but the main idea is `start_value:end_value:step`, where the element at `end_value` itself is not included. When `start_value` is empty, we start at the first item; when `end_value` is empty, we go up to and including the last item; and when `step` is empty, we take steps of size 1.
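
Here is a short sketch that uses all three parts of the slice at once, plus a negative step, which was not shown in the cells above. The list `numbers` is just an example.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

middle = numbers[2:8:2]     # Start at index 2, stop before index 8, step 2
backwards = numbers[::-1]   # A negative step walks through the list in reverse

print(middle)
print(backwards)
```

The first slice picks the elements at indices 2, 4, and 6, i.e. `[3, 5, 7]`; index 8 is never reached because the end value is excluded.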


## Functions
___
### Built-in functions
A function is an object that maps any number of arguments to an output. There are a few *built-in* functions, which are part of the Python language itself (such as `print`, `pow`, or `sum`), and there are a lot of functions that become available once we load specific packages (more on that later). Furthermore, you can create your own functions to streamline your workflow.

Until now, we have executed code cells and Jupyter displayed the value of the last line. But what if we want to display more than just the last line? This can be done with the `print` function.

Notice from the example below that the general structure of a function call is `function(input_to_the_function)`. Some functions take no arguments, such that you would write `function()`, and some take multiple arguments, such that you would write `function(input1, input2, input3)`.


In [1]:
x = "Hello"
print(x) # Display the value of x 
x += " world!" # Add some more words to the variable x (notice how we use the shorthand for incrementation)
print(x) # Display the new value

Hello
Hello world!


We can use `print` to combine multiple strings and variables together, see the example below where we also use the builtin function `sum`

In [2]:
x = [1, 2, 3]
print("The sum of", x, "is", sum(x))

The sum of [1, 2, 3] is 6


If you are not sure how to use a specific function, you can always type a question mark and the function's name, e.g.

In [3]:
?pow

Signature: pow(base, exp, mod=None)
Docstring:
Equivalent to base**exp with 2 arguments or base**exp % mod with 3 arguments

Some types, such as ints, are able to use a more efficient algorithm when
invoked using the three argument form.
Type:      builtin_function_or_method
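
To connect the documentation above to actual calls, here is a brief sketch of `pow` with two and with three arguments. The variable names are arbitrary.

```python
two_args = pow(2, 10)          # Same as 2 ** 10
three_args = pow(2, 10, 1000)  # Same as (2 ** 10) % 1000

print(two_args, three_args)
```

Note that the `?pow` syntax is specific to Jupyter/IPython. In a plain Python session, `help(pow)` displays similar information.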


Above, we always wrote out our full lists, e.g., `x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`. This is cumbersome and Python provides a function to create sequences of integers: `range`. Using `list(range(n))` we can create a list with numbers $0, 1, \dots, n-1$.

⚠️ `range(n)` always starts at zero and goes up to $n-1$ only, i.e. it has $n$ elements! ⚠️

If you want your sequence to start at, e.g., 1, you can use `range(1, n)`. Be careful, however, this will make a sequence from $1$ to $n-1$, so if you want the sequence to go from $1$ to $n$ (included), you will have to use `range(1, n+1)`.

You can also specify a third parameter in the `range` function, and that is the step size. It is one by default, but we can make a list of all odd numbers between $1$ and $10$ by using: `range(1, 11, 2)`.

In [4]:
list(range(1, 11, 2)) # List of all odd numbers between 1 and 10

[1, 3, 5, 7, 9]

#### ➡️ ✏️ Your turn
In the cell below, make a list of all even numbers between 0 and 20. (For this exercise, leave out 0, even though it is mathematically an even number!)

In [None]:
# Enter your code below


### Custom functions
The coolest thing about functions is that you can create your own! Writing your custom function is very easy, it's writing a useful function that can be harder 😉.

The structure to writing your own function is always the same:
1. First, the keyword `def`.
2. Second, the name of the function.
3. Third, the inputs of the function in parentheses, followed by a colon.
4. Indented (by four spaces, following PEP 8), the contents of the function.
5. The last line should contain `return` and the value we want to return.

Let's make some simple functions to get used to this whole structure.

In [None]:
# Create a function that takes no inputs and just returns the string "Hello"
def greet():
    return "Hello"

In [None]:
# Execute the function
greet()

That's it. Our first function.

Of course, it's not very useful, so let's add an input to it.

In [None]:
# Create a function that greets a specific person
def greet_user(name):
    return "Hello " + name

In [None]:
# Use this function to say hello to the creator of Python
greet_user("Guido van Rossum")
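
Functions can also take several inputs, separated by commas. As a small sketch, the hypothetical function `greet_user_politely` below (our own name, chosen for illustration) takes a name and a greeting and combines them with an f-String:

```python
# Create a function with two inputs
def greet_user_politely(name, greeting):
    # Combine both inputs into one string with an f-String
    return f"{greeting}, {name}!"

message = greet_user_politely("Guido van Rossum", "Good morning")
print(message)
```

The order of the arguments in the call matters: the first value is assigned to `name`, the second to `greeting`.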


## Control flow
___
Alright, we now know how to write our own function but you have probably realized that we didn't write any useful function until now. We need another building block: **control flow**!

Control flow lets us add conditional evaluation (e.g., only execute when something happens) and loops (repeat the same piece of code multiple times).

### Conditional evaluation / If-else statements
If-else statements allow us to evaluate only parts of the program, depending on specific conditions. I'm sure you have encountered this type of control flow somewhere... perhaps in Excel! The main structure of a simple if-statement is the following:

```python
if condition:
    ... # This code is executed if the condition is true, otherwise it is not executed
... # This code is executed after the if-statement, independent of whether the if-statement was executed or not
```

⚠️ Notice how, as with functions, **indentation** is key in Python. Everything indented belongs to the if statement, while everything else does not! ⚠️

In [3]:
if True:
  print("This code block will be executed")
if False:
  print("This code block will not be executed")
print("This will also be executed because it is not part of the if-statement above")

This code block will be executed
This will also be executed because it is not part of the if-statement above


Notice how only the first code block was executed. This is because the second condition was false and hence the program never entered the second if-statement. Clearly, the above example is not very practical because it will always evaluate in the same way. In general, we want to specify a certain **boolean condition** instead of using `True` and `False` directly.

#### ➡️ ✏️ Your turn
Try changing the value of `x` and `y` and see how the example below behaves.

In [4]:
x = 10
y = 5
# If-statement, only execute the code within it if the statement (x > y) is true
if x > y:
  print("x is greater than y")
# Only execute the code if the statement (x < y) is true
if x < y:
  print("x is less than y")

x is greater than y


You can also combine multiple if statements that are mutually exclusive in an if-else statement. For instance, the two if-statements above can be written as:

In [None]:
if x > y:
  print("x is greater than y")
elif x < y: # Only executes if the condition is true and the conditions above are all false
  print("x is less than y")
else: # Only executes if no other condition in the statement was true
  print("x is equal to y")

In Python, numeric comparisons are done using the following symbols:
+ `x > y` evaluates as `True` if x is greater than y
+ `x < y` evaluates as `True` if x is less than y
+ `x >= y` evaluates as `True` if x is greater than or equal to y
+ `x <= y` evaluates as `True` if x is less than or equal to y
+ `x == y` evaluates as `True` if x is equal to y
+ `x != y` evaluates as `True` if x is not equal to y
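
Each of these comparisons evaluates to a boolean, which can be stored in a variable and combined with `and` / `or`. A minimal sketch, using the made-up names `x_val` and `y_val` so as not to overwrite the `x` and `y` from the cells above:

```python
x_val = 10
y_val = 5

is_greater = x_val > y_val                 # True
is_equal = x_val == y_val                  # False
in_between = (y_val < 7) and (7 <= x_val)  # True, both comparisons hold

print(is_greater, is_equal, in_between)
```

Be careful not to confuse `==` (comparison) with `=` (assignment).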

### Loops
Loops are useful for repeated evaluation of expressions. There are two main types of loops, `for`-loops and `while`-loops.

In [None]:
# The following code executes the block within for each iteration of i from 0 to 4
for i in range(5):
  print(i)

In [None]:
# The same can also be done using a while loop
i = 0
while i < 5:
  print(i)
  i += 1 # Increment i by 1

When using a `while` loop, make sure you have an exit condition, otherwise your code will never stop and you will have to interrupt it. For instance the block
```python
i = 0
while i >= 0:
  print(i)
```
will never reach an exit condition, as `i` is never changed and therefore always greater than or equal to 0. In general, try using a `for`-loop instead of a `while`-loop. Of course, there are some cases where using a `while`-loop is inevitable...
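
One situation where a `while`-loop is a natural fit is when we do not know in advance how many iterations are needed. The sketch below repeatedly halves a number until it drops below 1. The extra counter limit `max_steps` is our own addition, a simple safeguard against accidentally writing an infinite loop.

```python
value = 100.0
steps = 0
max_steps = 1000   # Safety limit so that the loop cannot run forever

# Halve the value until it drops below 1
while value >= 1 and steps < max_steps:
    value = value / 2
    steps += 1

print(f"Needed {steps} halvings, final value: {value}")
```

Here the loop stops after 7 halvings, because 100 divided by 2 seven times is 0.78125.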

#### ➡️ ✏️ Your turn

Modify the code below (where there is a ✏️) such that it prints the sum of all integers from 1 to 100.

In [None]:
mysum = 0 # A variable that keeps track of the sum

# ✏️ ... modify
n = 0 

# Run the loop
for i in range(1, n):
    if i % 5 == 0: # Every 5th step, print the result
        print(f"At iteration {i:>3}, the sum is {mysum:>5}")
    
    # ✏️ ... modify (⚠️ think about incrementation)
    mysum = 0

# Print the final result
print("-"*37)
print(f"The final sum is {mysum}")

# Warn the user in case the final sum is not correct
if mysum != sum(range(101)):
    print(f"\n⛔ There is an error somewhere. The sum should be {sum(range(101))}. ⛔")

`for`-loops can also be used to iterate over **the elements of a collection**. For instance, instead of iterating over the numbers in a specified range, you might want to iterate over the elements of a list or the characters in a string. A `for`-loop can do this as well!

In [None]:
# Specify a text
text = "Data Science Fundamentals"
# Iterate over the letters in the text and print them
for letter in text:
    print(letter)

In [None]:
# Of course, this also works for a list (strings are just like lists in a sense!)
my_list = [1, 2, "a", text, "b", mysum]
# Iterate and print the elements
for el in my_list:
    print(el)

#### ➡️ ✏️ Your turn

You are given a dictionary, `inventory`, where the keys are the items a shop sells, and the values are the number of items in the inventory. You are also given a list with different items. **For each item in the list**, print out a string indicating the number of items left in the inventory.

E.g., if your list consists of `apples`, `bananas`, `cherries`, your output should be:
```python
"There are 30 apples in the inventory"
"There are 15 bananas in the inventory"
"There are 150 cherries in the inventory"
```
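
Dictionaries have not been introduced so far in this notebook, so here is a minimal sketch of the one operation you need: looking up the value stored under a key with square brackets. The dictionary `prices` and its contents are hypothetical and unrelated to the exercise.

```python
# A hypothetical dictionary: keys are drinks, values are prices
prices = {"coffee": 3.5, "tea": 2.8}

coffee_price = prices["coffee"]   # Look up the value stored under the key "coffee"
print(f"A coffee costs {coffee_price:.2f}")
```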

In [None]:
# Modify the code below where needed
# Dictionary of the quantity of items in the inventory
inventory = {"Xylophones": 10, "Yoga Mats": 25, "Zucchinis": 13}
# This is the list we iterate over, can you figure out what `inventory.keys()` does?
item_list = inventory.keys()

# Enter your code here ✏️ ...

# First a for loop, over the elements in the list
# - For each element, print a string which indicates how many items are left in inventory


#### ➡️ ✏️ Your turn

In the cell below, do the following:

1. Run a for-loop of 100 iterations
2. At each iteration, add the number 1 to your list.

*Hint:* You should use `ones.append( )` within your loop!

In [1]:
ones = [] # Initialize an empty list

# Enter your code here ✏️ ...

# Print whether the result is the expected one
if not (all([o == 1 for o in ones]) and sum(ones) == 100):
    print("⛔ There is an issue in your code somewhere. ⛔")
else:
    print("✅ Well done!")

⛔ There is an issue in your code somewhere. ⛔


#### 🙀 🤯 List comprehensions
What is this strange notation we used above?

```python
[o == 1 for o in ones]
```

This is called **list comprehension** and it is extremely useful in Python. It is a bit hard to understand at first, but in essence, it allows us to compress into a single line what we would otherwise have to write as a `for`-loop.

Consider the example above, where you have added the number 1 to a list multiple times. This is somewhat tedious and not always very practical. Furthermore, we had to instantiate the list `ones` first. Instead, we could have used list comprehension and written:

```python
ones = [1 for _ in range(100)]
```

In [2]:
[i for i in range(5)]

[0, 1, 2, 3, 4]

In [None]:
# Check it for yourself
ones = [1 for _ in range(5)]
ones # Display the result

It takes a while to get used to list comprehension and it's not important that you learn it just yet, but you should ideally be able to read what is going on when you come across it (because we will use it in the notebooks!)

Here are a few examples to get you started...

In [None]:
# Every even number up to 20
[i for i in range(1, 21) if i % 2 == 0]

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

In [None]:
# We can also use functions in list comprehension!
# Every power of 2 up to the 14th
[pow(2, i) for i in range(15)]

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]

In [None]:
# ... and f-Strings
# 🙀 🤯 Double list comprehension !?! 🥴
[f"{number}{letter}" for number in range(3) for letter in 'abc']

['0a', '0b', '0c', '1a', '1b', '1c', '2a', '2b', '2c']
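
If the double list comprehension looks puzzling, it may help to see it written out as two nested `for`-loops. The following sketch should produce the same list; the name `combinations` is our own.

```python
combinations = []
for number in range(3):          # Outer loop, corresponds to the first "for"
    for letter in "abc":         # Inner loop, corresponds to the second "for"
        combinations.append(f"{number}{letter}")

print(combinations)
```

The order of the `for` parts in the comprehension matches the order of the nested loops, from outermost to innermost.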

Okay, okay... that's enough for now. You will have enough time to get used to it later but feel free to play around with it, as we said, list comprehensions are used **a lot** in Python and they are extremely powerful.


## Packages
___
We mentioned above that Python comes with a lot of handy packages for specific tasks such as developing websites, doing numerical computing, visualizing data, etc.

In Python, a package (also sometimes called library or module) is a set of functions (and/or objects) that extend the base Python functions. As Python is open source, anyone can contribute to the Python ecosystem by writing and publishing his or her own package.

On the one hand, this has the advantage that you will find an existing package for nearly any application you are trying to implement and, on the other hand, it also means that sometimes you should be wary of trusting a package... did the author really make no mistake in the code? Is the package doing what you think it is doing? This is not really a problem for major packages, but if you rely on a very specific, lesser-known package, these are things to take into consideration.

In [None]:
# Importing a package is easy
import math
# Calling a function from a package is also easy
math.sqrt(2) # Gives the square root of 2

In [None]:
# Sometimes, you don't want to import the full package
from math import sqrt
sqrt(2) # Now we can call sqrt directly, without prepending math.

In [None]:
# You can also use a shorthand when importing a package
import math as m
m.sqrt(2)

Looking at the above cell, we observe different ways of importing packages and their contents into our code.


|Method|Effect|
|----|----|
|`import package`|will import the contents of the package such that we can call `package.myfunction()`|
|`import package as pk`|will import the contents of the package under a chosen **alias**, we can now use `pk.myfunction()`|
|`from package import myfunction`|only imports the `myfunction` object from the package, all other objects will not be imported, we can now call `myfunction()` without prepending the package name or alias|
|`from package import *`|imports **all** the package objects. <span style="color:#fc0303">This is dangerous and should be avoided as much as possible</span>. As above, we can now use `myfunction()` without any prefix|
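
One reason why `from package import *` is risky is that imported names can silently replace names that already exist. The sketch below does not perform a star import; it only inspects the standard-library `math` package to find names that would clash with Python's built-in functions, and then compares the two versions of `pow`.

```python
import builtins  # The module that holds Python's built-in functions
import math

# Public names in math that also exist as built-ins
clashes = [name for name in dir(math)
           if not name.startswith("_") and hasattr(builtins, name)]
print(clashes)

# The two functions called pow do not behave identically
print(builtins.pow(2, 3))  # Built-in pow keeps integers as integers
print(math.pow(2, 3))      # math.pow always returns a float
```

After `from math import *`, a call to `pow` would use the `math` version without any warning, which is the kind of surprise that explicit imports avoid.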

For this course, we will be working mainly with 4 packages:

+ [NumPy](https://numpy.org/): The numerical computing package for Python. This introduces arrays in the sense of vectors in $\mathbb{R}^N$, allows us to do linear algebra, fast vector computations, etc. When doing data science in Python, it's good to think of everything as a NumPy array. At first it might seem like we are not using this package as much as the others, but this is wrong! This package provides the necessary foundation for all three of the other packages!
+ [pandas](https://pandas.pydata.org/): The Python data analysis library. This package provides tools for working with *tabular data* such as dataframes (you might know them from R). This package is our bread-and-butter for data cleaning and pre-processing; every analysis we do starts by reading our data into a pandas dataframe.
+ [Matplotlib](https://matplotlib.org/): The visualization package for Python. This package provides tools to build many different plots, either static or interactive. We will use it a lot to illustrate our data and results. As the old adage goes: *a picture is worth a thousand words*. Being able to create beautiful visualizations is an important competence of a data scientist!
+ [scikit-learn](https://scikit-learn.org/stable/): The scientific machine learning package for Python. From linear regression to gradient boosted trees and neural networks... you name it, scikit-learn has it! Not only does this package provide the statistical models, it also provides many functionalities to help us tune and improve our models.

All those packages are, without doubt, state-of-the-art when it comes to doing data science and machine learning in Python. It should be mentioned, however, that when it comes to deep learning, scikit-learn is often overshadowed by [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), or [JAX](https://jax.readthedocs.io/en/latest/). For our introduction to neural networks, scikit-learn will be enough; however, it would be dishonest to claim that modern deep learning is being done with scikit-learn. 


## Further resources
___
Programming is a skill that is easy to learn but hard to master. It can be difficult to compress multiple concepts in a short notebook and thus many important concepts such as *objects*, *scope of variables*, and *(im)mutability* were ignored completely throughout this introduction. 

If you understand everything we did in this notebook, you will surely be able to follow what we do in the main course. However, this is only the tip of the programming iceberg. It is definitely worthwhile to check out more complete resources that go over these concepts in more detail. 

Good starting points are the resources listed on the official Python website:
+ [Beginner's Guide](https://wiki.python.org/moin/BeginnersGuide)
+ [Python for Programmers](https://wiki.python.org/moin/BeginnersGuide/Programmers)

If you are more of a hands-on person and enjoy solving programming and mathematical puzzles, you might want to have a look at websites like
+ [Project Euler](https://projecteuler.net/)
+ [HackerRank](https://www.hackerrank.com/)