<a href="https://colab.research.google.com/github/bkellenb/bios0032/blob/main/practicals/00_primer/00b_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# B: Introduction to Python

Now that you know how to use [Google Colab](https://colab.research.google.com/) and
[Jupyter](https://jupyter.org/) notebooks, let us get started with Python!


## Objectives

The aim of this notebook is to introduce you to some of the most important syntax of Python.
Hopefully, it will also convince you that Python is actually a really easy language.😄



## Contents

1. [Basic data types](#1-basic-data-types)
2. [Lists](#2-lists)
3. [Dictionaries](#3-dictionaries)
4. [If-Elif-Else](#4-if-elif-else-statements)
5. [For-loops](#5-for-loops)
6. [Functions](#6-functions)
7. [Try-Except](#7-try-except-statements)
8. [Files](#8-operations-on-files)
9. [Extras](#9-extra-tips-and-tricks)
10. [Conclusion and Further Reading](#10-conclusion-and-further-reading)



## Notes

- If a line starts with the paintbrush symbol (🖌️), it asks you to write some code or
answer a question.
- Lines starting with the light bulb symbol (💡) provide important information or tips and tricks.

----

## 1. Basic data types

Like many other programming languages, Python uses a number of data types:

In [None]:
# data types
x = 2                   # integer
y = 3.532               # float
s = 'Hello'             # string
b = True                # boolean
n = None                # special None value (NULL in R)

# basic operations
print(x + y)            # addition
print(x - y)            # subtraction
print(x * y)            # multiplication
print(x / y)            # division
print(y % x)            # modulo

print(s + ' world!')    # string concatenation
print(b and False)      # logical AND
print(b or False)       # logical OR

💡 The hash symbol `#` starts a comment.

In [None]:
# shorthands for operations with assignment
x += 3                  # equivalent to x = x + 3; same is available for other operators
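
The same shorthand pattern works for the other arithmetic operators. Below is a small sketch; the variable name `counter` is only for illustration, and the two extra operators at the end (`//` and `**`) are additions not used in the cells above.

```python
counter = 10
counter += 3     # counter = counter + 3  -> 13
counter -= 1     # counter = counter - 1  -> 12
counter *= 2     # counter = counter * 2  -> 24
counter /= 4     # division always yields a float -> 6.0
print(counter)

# two further arithmetic operators
print(7 // 2)    # floor division -> 3
print(2 ** 5)    # exponentiation -> 32
```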

## 2. Lists

A list in Python is a mutable, ordered sequence of values.

In [None]:
my_list = ['b', 'a', 'd', 'f', 'e', 'c']

print(len(my_list))     # length (number of elements)

# accessing elements of a list
print(my_list[0])       # index starts at zero!
print(my_list[0:3])     # slice from index 0 up to (but excluding) index 3
print(my_list[-1])      # last element
print(my_list[1:4:2])   # from index 1 up to (excluding) 4, every 2nd element

💡 **Indices in Python start at zero**. Never forget!
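
Slices also work when the start, end, or both are left out. The following sketch uses a fresh list called `letters` (a name chosen just for this example) to show a few common patterns:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = letters[:3]           # omitted start: from the beginning
last_two = letters[-2:]             # omitted end: up to the end of the list
every_other = letters[::2]          # step of 2 over the whole list
reversed_letters = letters[::-1]    # negative step walks backwards

print(first_three)                  # ['a', 'b', 'c']
print(last_two)                     # ['e', 'f']
print(every_other)                  # ['a', 'c', 'e']
print(reversed_letters)             # ['f', 'e', 'd', 'c', 'b', 'a']
```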

Since lists are mutable, we can easily add, remove and reorder elements, concatenate lists, _etc._:

In [None]:
my_list.append('g')         # append element at the end
print(my_list)

my_list.insert(1, 'ab')     # insert element at index (again: they start at zero)
print(my_list)

my_list.remove('b')         # remove element
print(my_list)

my_list.sort()              # sort in ascending order
print(my_list)

Built-in functions:

In [None]:
print(len(my_list))         # length of a list, str, dict, etc.; works on any container

print(sum([1, 2, 3]))
print(max([4, 5, 6]))
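
A few more built-ins follow the same pattern. One detail that may be worth seeing side by side: `sorted(...)` returns a new list, whereas the `.sort()` method used earlier changes the list in place. The list name `numbers` is just an example.

```python
numbers = [4, 1, 3]

ordered = sorted(numbers)   # returns a new, sorted list
print(ordered)              # [1, 3, 4]
print(numbers)              # [4, 1, 3]: the original list is unchanged

print(min(numbers))         # 1
print(len('Hello'))         # 5: len() also works on strings
```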

## 3. Dictionaries

A dictionary (`dict`) contains key-value pairs, giving you instant access to a value if you know its key.

In [None]:
my_dict = {
    'Apple': 1,
    'Banana': 2,
    'Cabbage': 3
}

print(my_dict)
print(len(my_dict))         # also dicts have a length

# check if dict contains an element
print('Apple' in my_dict)
print('Pear' in my_dict)

# accessing elements of a dict
print(my_dict['Apple'])     # accessing value stored under key 'Apple'

# adding a new element
my_dict['Daikon'] = 4       # adding new value 4 under key 'Daikon'
print(my_dict['Daikon'])

# removing elements
del my_dict['Daikon']       # deleting entry with key 'Daikon'
print(my_dict)

# replacing elements
my_dict['Apple'] = -5       # if a key is already present, its value will be replaced with a new one
print(my_dict)
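
Accessing a key that does not exist with square brackets raises a `KeyError`. One way to avoid that is the `.get(...)` method, which returns a fallback value instead. A minimal sketch, using a separate dict named `pantry` for illustration:

```python
pantry = {'Apple': 1, 'Banana': 2}

apples = pantry.get('Apple', 0)     # key exists: returns the stored value
pears = pantry.get('Pear', 0)       # key is missing: returns the fallback 0
print(apples, pears)                # 1 0

# keys and values can also be retrieved separately
keys = list(pantry.keys())
values = list(pantry.values())
print(keys)                         # ['Apple', 'Banana']
print(values)                       # [1, 2]
```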

## 4. If-Elif-Else statements

In [None]:
if 'Daikon' in my_dict:
    print('Cabbage salad tonight!')

elif len(my_dict) > 3:
    print('More than three items in the pantry.')

else:
    my_dict = {}
    print('Just threw out everything from the kitchen.')

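**Note:** Python does not use curly braces (`{}`) to denote nested code blocks, but _indentation_
(leading whitespace). All lines of a block have to be indented by the same amount.

💡 Tip: press the tab key once per indentation level. Notebook editors such as Colab and Jupyter convert it to spaces automatically (four spaces is the common Python convention; the exact width depends on your editor settings).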
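
To make the role of indentation concrete, here is a small sketch with two nested levels. The names `temperatures` and `labels` are made up for this example.

```python
temperatures = [12, 25, 31]
labels = []

for t in temperatures:              # level 0
    if t > 30:                      # level 1: inside the loop
        labels.append('hot')        # level 2: inside the if
    else:
        labels.append('ok')         # level 2: inside the else

print(labels)                       # level 0 again: runs once, after the loop
```

Moving the final `print` one level to the right would make it part of the loop, so it would run once per element instead of once at the end.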

## 5. For loops

In [None]:
for key in my_dict:
    print(key + ':  ' + str(my_dict[key]))
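
Two other common loop patterns are iterating over key-value pairs with `.items()` and over a range of numbers with `range(...)`. A short sketch, using its own example dict `pantry`:

```python
pantry = {'Apple': 1, 'Banana': 2, 'Cabbage': 3}

# loop over keys and values at the same time
total = 0
for name, count in pantry.items():
    print(name + ':  ' + str(count))
    total += count
print(total)                # 6

# loop over the numbers 0, 1, 2, 3
squares = []
for i in range(4):
    squares.append(i * i)
print(squares)              # [0, 1, 4, 9]
```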

## 6. Functions

Functions can be defined with the special keyword `def`...

In [None]:
def select_positives(number_list):
    selected_numbers = []
    for number in number_list:
        if number >= 0:
            selected_numbers.append(number)
    return selected_numbers

...and called as follows:

In [None]:
my_list = list(range(-50, 50, 5))        # one way to define a sequence/range of numbers

print('Original list:')
print(my_list)


# call function, store return value in a new variable
my_list_filtered = select_positives(my_list)

print('Filtered list:')
print(my_list_filtered)

You can also:
- Specify multiple input arguments
- Give default values to them
- Return multiple values

For example:

In [None]:
def split_train_test(data,
                     percentage_train=60,
                     percentage_val=30,
                     stratify_by_species=True):
    '''
        Receives an input dataset (e.g., a list or dataframe) and splits it into training,
        validation, and test sets. Note: "stratify_by_species" is accepted, but not used in
        this simplified example.
    '''
    num_train = int(percentage_train/100 * len(data))
    num_val = int(percentage_val/100 * len(data))
    first_test_index = num_train + num_val      # everything after train and val is the test set
    data_train = data[0:num_train]
    data_val = data[num_train:first_test_index]
    data_test = data[first_test_index:]         # slice to the end so the last element is kept
    return data_train, data_val, data_test


# to call the function and store all of its return values
d_train, d_val, d_test = split_train_test(my_list,
                                          percentage_train=70)

* In the function above, all inputs except for `data` have a default value. This means that `data`
must be provided when calling the function, but all the others are optional. In the example, a
custom value for `percentage_train` is provided (`70`), but not for `percentage_val` or
`stratify_by_species`; for those two, the defaults (`30` and `True`) will be used.
* As you can see, you can name function input arguments explicitly (`percentage_train=70`) or just provide them in the order they are specified in the function definition (`data`).
* The function returns three outputs. Accordingly, we assign the outputs to three new variables
  (`d_train`, `d_val`, `d_test`).

💡 The three quotes (`'''...'''`) denote a multi-line string in Python. Placed right below a function
definition, these can be used to provide documentation about what the function does, what its inputs
are, and what it returns.
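
As a compact illustration of default values, keyword arguments, and documentation strings together, consider the sketch below. The function `greet` and its parameters are hypothetical and exist only for this example.

```python
def greet(name, greeting='Hello', punctuation='!'):
    '''
        Returns a greeting for "name".
    '''
    return greeting + ', ' + name + punctuation


print(greet('Ada'))                     # all defaults: Hello, Ada!
print(greet('Ada', 'Hi'))               # second argument given by position: Hi, Ada!
print(greet('Ada', punctuation='?'))    # skip "greeting", name "punctuation": Hello, Ada?

# the documentation string is stored on the function itself
print(greet.__doc__.strip())
```

In a notebook, `help(greet)` shows the same documentation in a formatted way.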

## 7. Try-Except statements

A try-except statement _tries_ to run a given code block. If any of the commands in that block causes an error (_e.g._, division by zero, access of the 5th element in a list of length 4), Python raises an _exception_; the rest of the `try` block is skipped and the `except` block runs instead:

In [None]:
try:
    # will try to execute this code block, line-by-line
    index = 0
    print(my_list[index])

except Exception as e:
    # if and only if any line above doesn't work (it raises an exception), this block is executed
    print('An error occurred:')
    print(e)

🖌️ The code block above runs just fine. See what happens if you set `index = 100` in the "try"
block.
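
Instead of catching every `Exception`, you can also catch one specific kind of error and let all others surface normally. The sketch below assumes two small helper functions, `safe_divide` and `to_int`, which are invented for this example:

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # only division by zero is handled here
        return None


def to_int(text, default=0):
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError if the text is not a valid whole number
        return default


print(safe_divide(6, 3))    # 2.0
print(safe_divide(1, 0))    # None
print(to_int('42'))         # 42
print(to_int('abc'))        # 0
```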

## 8. Operations on files

### 8.1 Creating file paths

In [None]:
import os           # os: built-in package with useful operating system-level functionality

file_path = os.path.join('data', 'example_files', 'text_file.txt')  # combine paths
print(file_path)

parent_folder, file_name = os.path.split(file_path)     # split into parent folder & file name

os.makedirs(parent_folder, exist_ok=True)               # create folder, don't raise an exception if it already exists
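
The `os.path` module offers a few more helpers for taking paths apart. A brief sketch, using a separate variable `example_path` so that it runs on its own:

```python
import os

example_path = os.path.join('data', 'example_files', 'text_file.txt')

file_name = os.path.basename(example_path)          # last part of the path
stem, extension = os.path.splitext(file_name)       # split off the file extension

print(file_name)                    # text_file.txt
print(stem)                         # text_file
print(extension)                    # .txt
print(os.path.exists(example_path)) # True only if the file exists on your machine
```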

### 8.2 Reading and writing files

Basic files can be read and written in both text and binary mode.

In [None]:
# write a bunch of strings to a text file
with open(file_path, 'w') as file_handle:
    for item in my_list:
        file_handle.write(str(item) + '\n')     # write one item per line


# read file contents
with open(file_path, 'r') as file_handle:
    # read as a single string
    text = file_handle.read()
    print(text)

with open(file_path, 'r') as file_handle:
    # read as a list of strings, one per line
    text_list = file_handle.readlines()
    print(len(text_list))
    for line in text_list:
        print(line.strip())         # .strip() removes leading/trailing white spaces and \r, \n, etc.

Read/write modes (complete list [here](https://www.manpagez.com/man/3/fopen/)):
- `r`: read text file
- `w`: write to text file (overwrites existing content)
- `a`: append to text file
- `rb`: read binary file
- `wb`: write binary file

💡 The `with` statement is useful for working with an object (_e.g._, a file handle) that needs to be closed upon completion. In the examples above, the code inside the `with` block can read from or write to `file_handle`; as soon as the block ends, the file is closed automatically, even if an error occurred inside the block. The name `file_handle` still exists afterwards, but it refers to a closed file that can no longer be read or written.
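
The following sketch shows the automatic closing in action and also demonstrates the append mode (`a`) from the list above. It writes to a temporary folder (via the built-in `tempfile` module) so that it does not touch your own files; the name `demo_path` is chosen just for this example.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(demo_path, 'w') as file_handle:
    file_handle.write('first line\n')
    print(file_handle.closed)       # False: still open inside the block

print(file_handle.closed)           # True: closed as soon as the block ended

# append mode adds to the end instead of overwriting
with open(demo_path, 'a') as file_handle:
    file_handle.write('second line\n')

with open(demo_path, 'r') as file_handle:
    lines = file_handle.read().splitlines()

print(lines)                        # ['first line', 'second line']
```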

## 9. Extra tips and tricks

### 9.1 F-strings

Python provides a really handy way to build formatted strings (f-strings). Anything inside
curly braces (`{...}`) is evaluated as a Python expression and its result is inserted into the
string. This can be a variable, an arithmetic expression, a function call, _etc._

In [None]:
print(f'The length of my_list is {len(my_list)}.')

You can even do things like formatting (rounding) a floating point number to _n_ decimal places:

In [None]:
pi = 3.141592653

print(f'The value of pi is {pi}.\nRounded to three decimal places, it is {pi:.3f}.')
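
The part after the colon is a format specification, and there are several besides `.3f`. Three that tend to be useful are sketched below; the variable names are placeholders for this example.

```python
ratio = 0.256
count = 7
population = 1234567

as_percent = f'{ratio:.1%}'         # multiply by 100, one decimal place, add "%"
zero_padded = f'{count:03d}'        # integer padded with zeros to width 3
with_commas = f'{population:,}'     # thousands separator

print(as_percent)                   # 25.6%
print(zero_padded)                  # 007
print(with_commas)                  # 1,234,567
```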

To use quotes within a string, remember that Python accepts both single (`'`) and double (`"`) variants:

In [None]:
print(f'The value for key "Apple" in the dict is {my_dict["Apple"]}.')

💡 You can also "escape" quotes with a backslash (`\`), _e.g._, `\'` inside a single-quoted string. The backslash also introduces special sequences such as `\n` (new line), `\t` (tab), and `\\` (a literal backslash).
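
A few escape sequences in action, as a quick sketch (the strings are arbitrary examples):

```python
escaped_quote = 'It\'s sunny'       # \' puts a single quote inside single quotes
print(escaped_quote)                # It's sunny

print('Line one\nLine two')         # \n starts a new line
print('A\tB\tC')                    # \t inserts a tab between the letters

windows_path = 'C:\\temp'           # \\ stands for one literal backslash
print(windows_path)                 # C:\temp
print(len(windows_path))            # 7: the backslash counts as one character
```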

### 9.2 Other sequence types

**Tuple**

A tuple is an _immutable_ (unmodifiable) list: once created, you cannot add, remove, or reorder its elements.

In [None]:
my_tuple = ('cat', 'dog', 'parrot')     # define tuples with parentheses

print(len(my_tuple))                    # all read-only operations are the same as for list,
print(my_tuple[-1])                     # but there are no modifying methods like ".append", etc.

my_tuple = ('zebra',)                   # make sure to add a trailing comma for single-valued tuples

my_tuple = tuple(my_list)               # you can also convert a list to a tuple and vice versa,
                                        # using functions tuple(...) and list(...)

💡 If you have multiple return outputs in a function (see above) and only assign them to a single
variable, that variable will be a tuple holding all of the individual outputs.
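
A minimal sketch of this behaviour, using a made-up function `min_and_max`:

```python
def min_and_max(values):
    return min(values), max(values)     # two return values


result = min_and_max([3, 1, 4])         # assigned to a single variable
print(result)                           # (1, 4)
print(type(result))                     # <class 'tuple'>

lowest, highest = result                # a tuple can be unpacked afterwards
print(lowest, highest)                  # 1 4
```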

**Set**

A set is an _unordered_ collection of _unique_ values (every value appears at most once).

In [None]:
my_set = {'cat', 'cat', 'dog', 'parrot'}        # use curly braces to define a set
print(my_set)                                   # did you notice how "cat" only appears once?

my_new_set = set(['dog', 'zebra'])              # convert from list, tuple, etc.

print(my_set.intersection(my_new_set))          # intersection: values that are present in both sets
print(my_set.union(my_new_set))                 # union: values present in either or both sets
print(my_set.difference(my_new_set))            # difference: values present in my_set but not my_new_set
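
The same set operations are also available as operators, and sets can be modified after creation. A short sketch with two example sets, `pets` and `zoo`:

```python
pets = {'cat', 'dog', 'parrot'}
zoo = {'dog', 'zebra'}

common = pets & zoo             # same as pets.intersection(zoo)
everything = pets | zoo         # same as pets.union(zoo)
only_pets = pets - zoo          # same as pets.difference(zoo)

print(common)                   # {'dog'}
print(len(everything))          # 4
print('cat' in only_pets)       # True

pets.add('hamster')             # add a single value
pets.add('cat')                 # adding an existing value changes nothing
print(len(pets))                # 4
```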

### 9.3 Miscellaneous tips

**Enumerate**

This is useful to get the value and an index when iterating over a list, tuple, _etc._

In [None]:
for index, value in enumerate(my_tuple):
    print(f'[index {index}] value: {value}')
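
If you would rather count from one (for example, for a numbered list shown to a reader), `enumerate` accepts an optional `start` argument. A small sketch with an example tuple `animals`:

```python
animals = ('cat', 'dog', 'parrot')

numbered = []
for position, animal in enumerate(animals, start=1):
    numbered.append(f'{position}. {animal}')

print(numbered)     # ['1. cat', '2. dog', '3. parrot']
```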

**Zip**

The `zip(...)` function allows iterating through multiple lists/tuples/_etc._ element-wise.

In [None]:
my_tuple = ('First', 'Second', 'Third')
my_list = ['Apple', 'Banana', 'Mango']

for first, second in zip(my_tuple, my_list):
    print(f'{first}: {second}')

This also allows you to create dicts very quickly from lists of keys and values:

In [None]:
my_dict = dict(zip(my_tuple, my_list))

print(my_dict)
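
One thing to keep in mind: if the inputs differ in length, `zip` stops at the shortest one without raising an error. The sketch below uses its own variables `keys` and `values` to show this:

```python
keys = ('First', 'Second', 'Third')
values = ['Apple', 'Banana']            # one element fewer than keys

pairs = list(zip(keys, values))
print(pairs)                            # [('First', 'Apple'), ('Second', 'Banana')]

short_dict = dict(zip(keys, values))
print('Third' in short_dict)            # False: 'Third' had no partner and was dropped
```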

## 10. Conclusion and further reading

That's all we need to know for now about Python:
- data types: numbers (int, float), str, bool
- list, dict, (tuple, set)
- if / elif / else statements
- for loops
- functions
- try-except statements
- reading and writing files

If you wish to dive deeper, you may want to check out the official [beginner's guide to Python](https://wiki.python.org/moin/BeginnersGuide).


In the next notebook, we will be looking into three of the most useful libraries:
- Numpy (for matrix operations)
- Pandas (tabular data I/O and operations)
- Matplotlib (plotting)