# WetSuite NLP crash course
# Part 1: Getting started

Welcome to the WetSuite NLP crash course! This small crash course aims to get anyone started with deploying NLP-based tools to research governmental documents.

Before you get started, make sure to read the [full introductory README of this crash course](https://github.com/WetSuiteLeiden/example-notebooks/tree/main/wetsuite-nlp-crash-course).

# Purpose of this notebook

In this notebook, we explain how to interact with and run a Python Jupyter notebook such as this one. Once everything is up and running, we take some first steps in the Python programming language.

# Step 1: What is a notebook?
A Python Notebook, such as this file, is an interactive programming environment. It consists of blocks of text (such as this) and blocks of runnable Python code. When you run the notebook on your computer (or online in a service such as Google Colab), you can change the code and run it again, to see how the output changes. The output of each code block is shown below the code block.

# Step 2: How do I run a Python Jupyter notebook?

There are generally two approaches to running a notebook, each with its own advantages and drawbacks. The first approach is to run it locally on your own computer. The second approach is to run it online via a cloud computing provider, such as Google Colab.

When running notebooks locally on your own PC, you will have to install all the required tools, which can be a hassle. You will also have to store downloaded libraries, datasets and results locally. That can be useful, because you can more easily interact with your dataset and results, but also a burden, since you might not have the required storage space available. The speed at which the code runs also depends on the processing power of your PC: a powerful PC can be quite a bit faster than Google Colab, while a less powerful one can be slower.

The primary reason to use a cloud service such as Google Colab is that no local installation is required: everything runs on Google's servers. When you're just starting with programming (for example with this crash course), this can really help lower the barrier to entry! However, Colab might be slower than your own PC and requires a Google account. Furthermore, for some research it might not be acceptable to store your data and/or results on Google's servers.

Below are the instructions to get started with either method. Fear not, however: if you change your mind later, you can always come back and choose the other path!

## Step 2a: Running notebooks in Google Colab (recommended for beginners)

Each part in this series has a Google Colab button that opens the notebook in Google Colab:

<a href="https://colab.research.google.com/github/WetSuiteLeiden/example-notebooks/blob/main/wetsuite-nlp-crash-course/1-getting-started.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Google Colab"/></a>

This is essentially a link to Google Colab, which then creates a copy of this notebook in your Google account. Here you can do whatever you want with the notebook!

## Step 2b: Running notebooks locally on your computer

(TODO: Should be worked out & tested further)

### Windows
- First we need to install Python. Go to [Python releases for Windows](https://www.python.org/downloads/windows/) and download the latest release for your PC. Generally, you will need the "Windows installer (64-bit)".
- Run the Python installer you just downloaded.
- Now that we have installed Python, we need a practical environment to develop our programs: an integrated development environment (IDE). Microsoft's Visual Studio Code is a widely used open-source IDE. [Download Visual Studio Code](https://code.visualstudio.com/Download) and run the installer.
- Run Visual Studio Code and [install the VS Code Python extension](https://code.visualstudio.com/docs/python/python-tutorial).
- Download the [WetSuiteLeiden/example-notebooks](https://github.com/WetSuiteLeiden/example-notebooks/tree/main) repository from GitHub (click the green "Code" button and then click "Download ZIP").
- Extract the downloaded ZIP file to a location of your choice on your PC.
- Open the example-notebooks folder you just downloaded and extracted in Visual Studio Code.
- Create a virtual environment (for instructions, see [this tutorial](https://code.visualstudio.com/docs/python/python-tutorial#_create-a-virtual-environment))
- Open the notebook you want to run.
- Press run. When Visual Studio Code asks to select a kernel, select the virtual environment you've just created.

Done!

### Mac
TODO

### Linux
_The instructions for Linux distributions are a bit more concise as more technical background is expected if you already run a Linux-based OS._
- Ensure a recent version of Python is installed.
- Optionally, install an IDE of your choice.
- Create a Python virtual environment.
- Install the `notebook` package from PyPI using `pip`.
- Run Jupyter and open the notebook in your browser.
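
Whichever platform you use, once the notebook runs you may want to confirm which Python version it is using. The cell below is a minimal sketch of such a check. The `MINIMUM_VERSION` threshold of 3.9 is an assumption made for illustration (it is the first version that accepts the `list[str]` annotation used later in this notebook), not an official requirement of this course.

```python
import sys

# Assumption: 3.9 is chosen as the threshold only because the list[str]
# annotation used later in this notebook needs Python 3.9 or newer.
MINIMUM_VERSION = (3, 9)

# sys.version_info starts with the major and minor version numbers.
current_version = sys.version_info[:2]

# Tuples are compared element by element, so (3, 11) >= (3, 9) is True.
version_ok = current_version >= MINIMUM_VERSION

print(f"Running Python {current_version[0]}.{current_version[1]}")
if version_ok:
    print("This Python version should be recent enough.")
else:
    print("Consider installing a newer Python version.")
```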

# Step 3: Getting to know Python
Now that you have the notebook up and running, we can write and run our first Python code.

In our notebook, we can execute each snippet of code separately. Make sure, however, that you have run all previous code blocks at least once before you run the next one, otherwise some things might not work. If you change any of the code, be sure to re-run all following code blocks. If you get stuck, you can always try to clear all the outputs, restart and run all code blocks again.

A classic first program is just one that prints out "Hello, world!". Here we do just that:

In [None]:
# This is a comment: just a piece of text for the human that reads this source code.
# Python will ignore it!

# We call the print function with the text we want to print as an argument.
print("Hello, world!")

That was easy!

... but not that useful. Software is ultimately all about processing information. To do this effectively, Python offers different types for representing information. The essentials are:

In [None]:
# In Python, any value has a type, for example: an integer
a = 5
print(a, type(a))

# Floating point number
b = 3.14159
print(b, type(b))

# A piece of text
c = "Hello, WetSuite!"
print(c, type(c))

# A boolean: either True or False
boolf = False
boolt = True
print(boolf, type(boolf))
print(boolt, type(boolt))

# A list: an ordered collection of items that may change in length.
d = [1, 2, 3, 4, 5]
print(d, type(d))
# Accessing an item in a list can be done through the index.
# Note that the first element has 0 as index. Beware of off-by-one errors!
print(d[1], type(d[1]))

# Lists can contain any type, such as other lists
e = [[], [1], [1,2]]

# Besides lists, we also have dictionaries: collections which associate
# a specific key with a value (which can be any Python value). Since Python 3.7,
# dictionaries keep their items in insertion order. Values don't have to be
# all of the same type, but consider whether you really need mixed types.
f = {
    "key1": "Hi there!",
    "key2": "Another item",
    "num": 543
}
print(f, type(f))
print(f["key1"], type(f["key1"]))
print(f["num"], type(f["num"]))

# A tuple is an ordered collection of fixed length, whose members may have different types:
t1 = (0, "Hi")
t2 = (1, "Hello")
print(t1, type(t1), "\t", t1[0], type(t1[0]), "\t", t1[1], type(t1[1]))
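
The warning about off-by-one errors is easier to appreciate with a few concrete lookups. The following is a small sketch, re-using the list `d` and a shortened version of the dictionary `f` from above; the variable name `fallback` and the key `"missing"` are made up for illustration.

```python
d = [1, 2, 3, 4, 5]

# len() gives the number of items, so valid indices run from 0 to len(d) - 1.
print(len(d))
print(d[0])           # the first item
print(d[len(d) - 1])  # the last item
print(d[-1])          # negative indices count from the end

# Asking for an index that does not exist raises an IndexError.
try:
    d[5]
except IndexError as error:
    print("IndexError:", error)

f = {"key1": "Hi there!", "num": 543}

# The `in` operator checks whether a key is present in a dictionary.
print("key1" in f)
print("missing" in f)

# .get() returns a fallback value instead of raising a KeyError
# when the key does not exist.
fallback = f.get("missing", "not found")
print(fallback)
```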

Generally, a program will not be just a linear list of statements assigning values to variables. We want to do some things only in specific cases, or iterate over all items in a list or dictionary. This is called _[control flow](https://en.wikipedia.org/wiki/Control_flow)_. In Python, we have [_while_ loops](https://docs.python.org/3/tutorial/introduction.html#first-steps-towards-programming), [_for_ loops](https://docs.python.org/3/tutorial/controlflow.html#for-statements) and [_if/elif/else_ statements](https://docs.python.org/3/tutorial/controlflow.html#if-statements) as basic ways to control the flow of your program. Here are some simple examples:

In [None]:
# While loops: as long as the condition after _while_ is True, the "body" of the loop is repeated.
# So make sure it will end at some point!
i = 1
while i < 5:
    print(f"While iteration #{i}") # We're printing a Python f-string, which is a more practical way to format strings, see: https://fstring.help/
    i += 1 # This is a shorthand for 'i = i + 1'

# A for loop will execute the body of the loop for each item in some iterable collection such as lists, dictionaries and even tuples!
l = [0, 1, 2, 3, 4, 5]
for n in l:
    print(n)

    # However, do note that if you alter the n variable, the underlying list will not be changed.
    n += 1
    print(n)

print(l)

# In an if/elif/else statement, you can perform certain actions once if a condition holds.
# elif means "else if", and both the elif and the else clause can be omitted:

if 1 == 1:
    print("Yep!")

# Each if/elif/else-statement starts with an if, so this means we start with a new if/elif/else-statement.
if 1 == 2:
    print("Hrm")
elif 1 == 3:
    print("Hrm?!")
else:
    print("None were true")
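
The examples above loop over a list only. As a complement, here is a sketch of looping over a dictionary, and of two common ways to obtain changed list values, since altering the loop variable alone does not change the list. The `word_counts` data is invented for illustration and does not come from a WetSuite dataset.

```python
# Hypothetical example data: how often some words occur in a text.
word_counts = {"wet": 3, "besluit": 1, "regeling": 2}

# Looping over a dictionary directly yields its keys.
for word in word_counts:
    print(word)

# .items() yields (key, value) pairs, which we unpack into two variables.
for word, count in word_counts.items():
    print(f"{word} occurs {count} time(s)")

numbers = [0, 1, 2, 3, 4, 5]

# To really change a list while looping, assign through the index.
# enumerate() yields (index, item) pairs.
for index, n in enumerate(numbers):
    numbers[index] = n + 1

print(numbers)

# Alternatively, build a new list with a list comprehension
# and leave the original list untouched.
doubled = [n * 2 for n in numbers]
print(doubled)
```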

Another important building block of programs is the _function_. Functions are named routines of code that process a given input and return some output. Functions allow you to re-use code developed by others (by calling functions that are part of Python, or by importing external libraries), or to re-use your own code. They also help you structure your program in a sensible manner.

In [None]:
# Functions are defined by the def keyword:

# In this example, the types of the parameters (n, text) and the type of the returned result
# are explicitly noted using type annotations. This is however not strictly required,
# and Python will not verify that you give correctly typed inputs when calling the function.
def my_first_function(n: int, text: str) -> list[str]:
    """My first function returns a list with n times the text."""
    # The line above is a docstring: a string that documents the function.
    # Like a comment it is meant for human readers, and it can span multiple lines.

    i = 0
    res = []
    while i < n:
        res.append(text)
        i = i + 1

    return res

print(my_first_function(0, "text"))
print(my_first_function(3, "text"))
print(my_first_function(5, "some text"))
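
To illustrate the remark that Python does not verify type annotations, here is a small sketch. The function names `repeat_text` and `repeat_text_checked` are made up for this example. `repeat_text` is a shorter variant of `my_first_function` that uses list multiplication instead of a while loop. Note that the `list[str]` annotation requires Python 3.9 or newer.

```python
def repeat_text(n: int, text: str) -> list[str]:
    """Return a list containing `text` repeated `n` times."""
    # Multiplying a list by an integer repeats its items.
    return [text] * n


print(repeat_text(3, "text"))

# Annotations are not checked when the code runs: passing an int where
# a str is annotated does not raise an error here.
unchecked = repeat_text(2, 5)
print(unchecked)


def repeat_text_checked(n: int, text: str) -> list[str]:
    """Like repeat_text, but reject a `text` argument that is not a str."""
    # If you want a check, you have to write it yourself.
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    return [text] * n


try:
    repeat_text_checked(2, 5)
except TypeError as error:
    print("TypeError:", error)
```

Whether such an explicit check is worth the extra lines depends on the situation; for small scripts the annotations alone often serve well enough as documentation.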

This was a very short introduction to Python. The language has many more features, some of which we will see in the rest of the course. If you want a more thorough introduction to Python before you continue, check out [the official Python Tutorial](https://docs.python.org/3/tutorial/index.html), or an online introductory course on Python such as [Codecademy's "Learn Python 3" course](https://www.codecademy.com/learn/learn-python-3) or [Kaggle's "Learn Python" course](https://www.kaggle.com/learn/python).

# Done! [Click here to go to part 2](https://github.com/WetSuiteLeiden/example-notebooks/blob/main/wetsuite-nlp-crash-course/2-introduction-to-datasets.ipynb)