# Introduction to the course practicals

**Read the following sections carefully before you start working on the exercises in the other notebooks.** 

The first part of this notebook provides an explanation of the basic steps for installing your Python distribution, configuring your Python environment, and using basic Python commands. It will also give you important information on how to debug your code in case something does not work as expected. The second part describes the general setup of all notebooks, teaches you how to work with Jupyter and how to quickly and efficiently solve problems while doing the exercises.

NB: If you want to work on this notebook simultaneously with your group, i.e. several people working on the same Jupyter notebook from different laptops, you can try out Google Colab by clicking this button:

<a target="_blank" href="https://colab.research.google.com/github/Computational-Biology-TUe/8BB020_Intro-machine-learning/blob/main/practicals/part0_intro.ipynb">
  <img src="assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<div id='prog_skills'></div>

## Setting up the Python environment

<div id='py_install'></div>

In this course, we will be working with Anaconda (a Python distribution). The following instructions give an overview of essential steps prior to using Jupyter notebooks on Windows. 

### Python installation and configuration

Here is how to install your Python distribution platform:

1. [Download](https://www.anaconda.com/products/individual) and install Anaconda (it automatically comes with the latest Python version).

2. Follow the instructions in the dialog window. Make sure to check the box **Add Anaconda to my PATH environment variable** in order to be able to use Jupyter notebooks. 

3. Installation will follow.

4. To check whether the path to Anaconda has been added to your environment variables, go to *Edit the system environment variables* in the start menu, and click the *Environment variables* button in the dialogue window.


<div id='py_terminal'></div>

### Using Python terminal, setting up a Python environment

For the course exercises, you will need to install additional Python packages that are not included in the basic Anaconda Python distribution. It is recommended to install these packages in a dedicated Python environment. A Conda environment is a directory in which you can install files and packages such that their dependencies will not interact with other environments, which is very useful if you develop code for different courses or research projects. These packages can either be installed using a conda .yml file or manually using the `conda` and/or `pip` package managers. To run the complete development environment for this course, you need to install eight additional Python packages: `jupyter, matplotlib, numpy, scikit-learn, scipy, pandas, networkx, pytorch`. 

1. Open the Anaconda terminal from the Start menu on Windows.
2. Create a `conda` environment: In (Anaconda) command prompt, write `conda create --name myenv` (to create an environment with a specific Python version, specify the version at the end of this command line `python=3.8`; and to add specific packages to the environment, specify them afterwards in the same command line, e.g. `conda create -n myenv python=3.8 scipy=0.15.0 numpy`). Check the [requirements file](https://github.com/tueimage/8dc00-mia/blob/master/requirements.txt) for the package versions you need to install.
    
Here is an example you can follow to first create an environment that you will use in this course, then activate it and finally install the required packages.

````bash
conda create --name 8bb020
conda activate 8bb020
conda install matplotlib jupyter numpy scikit-learn scipy pandas networkx pytorch
````
The default destination folder for a newly created Python environment is `C:\<path-to-anaconda>\envs\<environment-name>`, so for the example above it is `C:\<path-to-anaconda>\envs\8bb020`. **Note!** You have to activate the `8bb020` environment with `conda activate 8bb020` every time you start working on the assignments.
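To check from within Python that the `8bb020` environment is actually the one in use, a minimal sketch like the following may help. It assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable, which may not be the case in every setup; the variable names `interpreter_path` and `active_env` are chosen here for illustration only.

```python
import os
import sys

# Path of the Python interpreter that is running this code
interpreter_path = sys.executable

# Name of the active conda environment, if conda has set this variable
# (assumption: the environment was activated with `conda activate`)
active_env = os.environ.get("CONDA_DEFAULT_ENV", "<not set>")

print("Interpreter:", interpreter_path)
print("Active conda environment:", active_env)
```

If the environment is active, the interpreter path should point into the `envs\8bb020` folder.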


<div id='basic_imp_eng_math'></div>

### Prerequisite knowledge and recommended resources

You should already have sufficient prerequisite knowledge of Python programming from previous courses, including familiarity with scientific computing using NumPy and data visualization with Matplotlib. In this course, you will build upon and expand those skills. Therefore, it’s highly recommended to refresh your knowledge by reviewing earlier course materials.

Additionally, we **strongly recommend** that you explore the following resources we've prepared:

- [Python essentials](https://github.com/tueimage/essential-skills/blob/master/python-essentials.md)
- [Numerical and scientific computing in Python](https://github.com/tueimage/essential-skills/blob/master/scientific-computing.md)
- (optional) [Python tips](https://github.com/tueimage/essential-skills/blob/master/python-hints.md) 

<div id='jn_error_search'></div>

### How to efficiently search for solutions to Python coding errors

Bugs, glitches and unexpected behavior occur frequently whenever you develop code snippets, test your implementation or integrate your solution into someone else's code. Searching for solutions to Python errors, which may show an utterly confusing explanation on the screen, is usually time-consuming and demotivating for students. You copy the error text, open your browser, paste it into a search engine, and are presented with a long list of sometimes completely unrelated solutions. Yes, this can be very frustrating. 

The good news is that errors in Python have a very specific form, called a *traceback*. Though intimidating at times, tracebacks inform you broadly about what went wrong in your program, including the line of code where the error occurred and what type of error it was. Tracebacks may have multiple levels (reaching up to 20 levels deep!), which results in long error messages. Note, however, that the length of these error messages does not reflect the severity of the problem, as they list all functions that were called before the error was encountered. You will typically find the actual error at the bottom of the traceback. The most commonly seen error types include:

- `SyntaxError` (describes a "language" issue related to the syntax of the program)
- `IndentationError` (is related to how your code is indented)
- `NameError` (shows up when a variable definition is missing, does not exist, or its name is misspelled)
- `IndexError` (Python indexing starts at $0$; this error occurs when you try to wrongly access list or array elements)
- `FileNotFoundError` (occurs when the file you aim to read is not found in the given destination on your disk)
- `IOError` (in Python 3 an alias of `OSError`; appears when an input/output operation fails, e.g. when you are trying to read a file that is open for writing or vice versa)

You may find examples of traceback errors on this [educational website on errors and exceptions in Python](https://swcarpentry.github.io/python-novice-inflammation/09-errors/index.html).
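Some of the error types listed above can be reproduced on purpose, which is a good way to get used to reading them. The following is a minimal sketch; the function names and the file name `this_file_does_not_exist.txt` are made up for illustration, and the file is assumed not to exist in your working directory.

```python
def trigger_name_error():
    return undefined_variable  # this name was never defined

def trigger_index_error():
    values = [10, 20, 30]
    return values[3]  # valid indices are 0, 1 and 2

def trigger_file_not_found_error():
    with open("this_file_does_not_exist.txt") as f:
        return f.read()

caught = []
for func in (trigger_name_error, trigger_index_error, trigger_file_not_found_error):
    try:
        func()
    except Exception as err:
        # Store and print the error type together with its message
        caught.append(type(err).__name__)
        print(f"{type(err).__name__}: {err}")
```

Removing the `try`/`except` and calling one of the functions directly shows the full traceback, with the error type and message on the last line.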

Sometimes you are referred to the documentation pages of a certain library, where it is clearly described how to use a function and how to fill in its mandatory input parameters. Check for example the [numpy documentation](https://numpy.org/doc/) to understand how Python libraries are documented. Apart from documentation resources, a comprehensive repository of various hacks, solutions, workarounds and tips for programmers can be found on [Stack Overflow](https://stackoverflow.com/), where enthusiastic programmers post solutions to miscellaneous problems and glitches found in the code of users from all around the globe. If you still cannot find your solution, post your question on Stack Overflow, and hopefully an answer will be available for you soon. More recently, large language models (LLMs) such as ChatGPT and Microsoft Copilot have come to be used extensively for coding and debugging purposes. Debugging code and helping you understand traceback messages is one of the most relevant abilities of these models.

Although Google and ChatGPT are friendly debugging and coding assistants (and some programmers have learnt a programming language on a simple trial-and-error basis), prevention is key to obtaining functional code. General advice is to program defensively, i.e. assume errors will arise and write test code first to detect problems at an early stage. Small tests with pre- and postconditions will help you determine what the code is supposed to eventually do.
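As an illustration of pre- and postconditions, here is a minimal sketch of a defensively written function. The function `normalize` is a hypothetical example and is not part of the course code; the `assert` statements state what the input must satisfy before the computation and what the output must satisfy afterwards.

```python
def normalize(values):
    """Rescale a sequence of numbers linearly to the range [0, 1]."""
    values = [float(v) for v in values]

    # Preconditions: the input must not be empty and must not be constant
    assert len(values) > 0, "input must not be empty"
    lowest, highest = min(values), max(values)
    assert highest > lowest, "input must contain at least two distinct values"

    result = [(v - lowest) / (highest - lowest) for v in values]

    # Postconditions: the smallest value maps to 0 and the largest to 1
    assert min(result) == 0.0 and max(result) == 1.0
    return result

scaled = normalize([2, 4, 6])
print(scaled)
```

Calling `normalize([3, 3])` fails immediately with an `AssertionError` and a readable message, instead of a less obvious `ZeroDivisionError` further down.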

<div id='jn_workflow'></div>

## Jupyter notebook workflow

<div id='jn_general'></div>

### General information about notebooks

#### Getting started with Jupyter

We recommend using *Jupyter Notebook* to follow the exercises and run the example code. The use of Jupyter Notebook is covered in the [Python essentials](https://github.com/tueimage/essential-skills/blob/master/python-essentials.md) module. An alternative is [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html), which has somewhat more advanced functionality that some might find useful. It is best to change to the directory containing the code before starting Jupyter Notebook. You can also start the integrated development environment *Spyder* by typing `spyder` in the Anaconda Prompt.

To open a Jupyter notebook editor, you have several options:

1. Open Anaconda Navigator (may take some time to open), and launch Jupyter.
2. Open a Windows command prompt / Windows Powershell, change to the Anaconda environment that you set up for the course and type `jupyter notebook` (note the space in between `jupyter` and `notebook`); this way will only work if you have added Anaconda to your Path.


#### Code and data repository structure

To get started, you have to save the course's GitHub repository to your local machine (either as a ZIP archive or by running the command `git clone <link_to_repository>`), say into a folder named `8BB020` on your machine. Once downloaded or cloned, you will see the following folder and file structure:

```bash
8BB020
.
|____code
| |____tests.py
| |____demos.py
| |____figures.py
|____data
|____notebooks
| |____0 Introduction.ipynb
| |____1 Linear models.ipynb
| |____2 Deep learning.ipynb
|____README.md
```
The code for this course is organized in Python modules (e.g., `tests.py`, `demos.py`, `figures.py`) stored in the `code` folder. These modules contain Python functions written and provided by the instructors, offering various functionalities that simplify tasks and are used within the notebooks. For example, the testing functions (in `tests.py`) can be used to validate the code you develop in the exercises. These tests are pre-integrated into the notebooks, although they will not give correct results until the code for the specified assignment is completed. Similarly, `demos.py` contains the code for the interactive widgets in the notebooks and `figures.py` contains code for generating some of the displayed figures. 

As a student, you only need to focus on the Jupyter notebooks in the `notebooks` folder, which contain all the exercise and project instructions. These notebooks are structured with a narrative, interspersed with references to theory, code snippets, example figures and demo widgets. **You’ll only interact and code within the notebooks themselves**, i.e. changing code in the modules under `code` is not needed. 

In addition to this introductory notebook, we have two more: one on Linear models and one on Deep learning, covering roughly the first and second part of the course. Each notebook contains four exercises and ends with the project assignment. During the practical sessions you can read through the text of the notebooks, go over the examples and work on the exercises. **The exercises are NOT graded** but doing them correctly will give you the prerequisite code and knowledge to complete the project work at the end of the notebook. **The project work IS graded**.

Finally, the `data` folder contains all of the data necessary to complete the exercises and projects. 

#### Notation

The following notation is used in the notebooks. Vectors and matrices are represented by a bold typeface, matrices with uppercase and vectors with lowercase letters, e.g. the matrix $\mathbf{X}$, the vector $\mathbf{w}$ etc. Compare this with the notation for scalars: $X$, $w$. In-line Python function (i.e. definition) names, commands, files and variables are represented in a highlighted monospace font, e.g. `X`, `w`, `imshow(I)`, `some_python_definition()`, `some_file.py` etc.
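To connect the mathematical notation with code, the sketch below shows how a matrix $\mathbf{X}$ and a vector $\mathbf{w}$ might be represented as the NumPy arrays `X` and `w`. The numerical values are arbitrary and only serve as an illustration.

```python
import numpy as np

# Matrix X: a 2D array with 3 rows and 2 columns
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Vector w: a 1D array with 2 elements
w = np.array([0.5, -1.0])

# Matrix-vector product Xw: one value per row of X
y = X @ w

print(X.shape, w.shape, y.shape)
```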

### User interface and useful commands in Jupyter notebooks

Jupyter Notebook is an interactive computing environment. There is comprehensive documentation describing the [Notebook Basics](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html), where you can learn about what happens when you first start the Jupyter notebook server and the dashboard appears in front of you. When working with the notebooks in this course, you will see the [User Interface](https://jupyter-notebook.readthedocs.io/en/stable/ui_components.html), which allows you to run code, work on exercises and answer questions interactively. Instead of using the UI buttons and interactive tools, you may prefer to use keyboard commands optimized for efficient work with the notebooks. Here are some of the most useful commands for this course:

- Basic navigation: `Enter` (enter edit mode), `Esc` (enter command mode), `Shift-Enter` (confirm editing)  
- Saving notebooks: `s` (save)
- Change cell types: `m` (markdown), `y` (code)
- Cell creation: `a` (add cell above), `b` (add cell below)
- Cell editing: `c` (copy cell), `v` (paste cell), `d, d` (delete cell), `z` (undo deletion), `x` (cut cell)

**Note!** It may well be that you cannot view Jupyter notebooks correctly on the GitHub webpage. Therefore, it is essential that you clone the GitHub repository to a local folder of your choice and work locally.

<div id='jn_debugging'></div>

### (Optional) Debugging and editing your Python code directly in Jupyter notebook

Editing code directly in Jupyter, which offers no linking, auto-complete or other comforts of a decent editor, might sometimes be difficult. There are several Integrated Development Environments (IDEs) and code editors that enable fast and efficient software development, code editing and debugging. Examples of these tools are [PyCharm](https://www.jetbrains.com/pycharm/), [MS Visual Studio Code](https://code.visualstudio.com/?wt.mc_id=DX_841432) and [Sublime Text](https://www.sublimetext.com/), to name a few. On the bright side, such code editors offer miscellaneous utilities, such as auto-complete, suggestions for code enhancements, automatic installation of missing Python libraries, etc. While all these features make it much easier to develop your code, setting up an IDE might be cumbersome, especially if you have never worked with any code editing software before. In the end, these IDEs yield larger benefits when working on extensive projects that entail much more code writing, integration, and testing than is necessary in this course. 

While working on your notebooks, unexpected events may occur. If so, the first aid for you may be the documentation page [What to do when things go wrong](https://jupyter-notebook.readthedocs.io/en/stable/troubleshooting.html) describing how to proceed when Jupyter fails to start, your kernel cannot be launched, a notebook does not load or does not work in a browser.

Therefore, it is essential you learn how to debug and edit your Python code directly in Jupyter notebooks (in a web browser). You can do so by making use of the so-called **magic commands**. Magic commands are IPython kernel enhancements of the normal Python code, dedicated to problem solving. An extensive list of magic commands with examples of their use can be found on the website called [28 Jupyter Notebook Tips and Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/). Below, we will mention some of those magic commands which you will see in the Jupyter notebooks of this course.  

#### Cell execution history
As long as your Python kernel is active, there is an input history logging the code execution of each cell. This comes in handy when you have accidentally deleted a cell. 

#### Autoreload
The notebook typically needs to be restarted whenever you edit the code of an already imported module or package. To avoid making it tedious, we use the following two magic commands:

```
%load_ext autoreload
%autoreload 2
```
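For comparison, a module can also be reloaded manually with `importlib.reload` from the Python standard library, which is roughly the step that the two magic commands above save you from repeating. In this sketch the standard-library module `json` is only a stand-in for a course module that you have edited.

```python
import importlib
import json as some_module  # stand-in for a module you have edited on disk

# Re-execute the module's source so that changes made on disk take effect
some_module = importlib.reload(some_module)

reloaded_name = some_module.__name__
print(reloaded_name)
```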

#### %debug and the IPython debugger
For debugging, you can use the `%debug` command. Whenever you encounter an error or exception, just open a new notebook cell, type `%debug` and run the cell. A command line will then open, where you can test code and inspect all variables up to the line which triggered the error. Type `n` and hit `Enter` to run the next line of code (the `→` arrow shows you the current position). Use `c` to continue until the next breakpoint. `q` quits the debugger and code execution.

<center width="100%"><img src="assets/debugging_1.png" width="400"></center>

Another option is to make use of the IPython debugger library. Import the `set_trace` function (```from IPython.core.debugger import set_trace```) and call ```set_trace()``` in any code cell of your notebook to create one or more breakpoints. The executed cell will stop evaluating code at the first breakpoint and open a command line for detailed inspection. In case any of your imported modules or functions do not work, you may also deploy the debugger there.

<center width="100%"><img src="assets/debugging_2.png" width="400"></center>
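A breakpoint placed with `set_trace()` could look like the sketch below. The function `running_mean` and the `DEBUG` flag are hypothetical and only used for illustration; with `DEBUG = False` the code runs without stopping, while setting it to `True` pauses execution in every loop iteration.

```python
DEBUG = False  # set to True to pause inside the loop

def running_mean(values):
    """Return the mean of the first 1, 2, ..., n values."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        if DEBUG:
            # Imported here so that IPython is only needed when debugging
            from IPython.core.debugger import set_trace
            set_trace()  # execution pauses here; inspect `total`, `value`, ...
        total += value
        means.append(total / count)
    return means

means = running_mean([2, 4, 6])
print(means)
```

Inside the debugger, the commands `n`, `c` and `q` work as described for `%debug` above.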

#### JupyterLab extensions for debugging
The Jupyter project is under constant development, and a plethora of extensions for the user interface, including additional notebook viewers, are available as [JupyterLab extensions](https://jupyterlab.readthedocs.io/en/stable/user/extensions.html). Among the various tools JupyterLab offers, the advanced debugging functionalities may come in handy. Nevertheless, installing these extensions is not required for this course.

## Warmup exercises

Let's now do some coding exercises to make sure that your Python setup works correctly, but also to refresh your coding skills.

If you want to work on the exercises together in your group but from separate computers, you can try Google Colab for collaborating on the jupyter notebook:

*Still need to include link to final notebooks* (https://openincolab.com/)



## Exercise 1

Solve the following well-determined system of linear equations using NumPy.

$$
\begin{aligned}
2x_1 + 3x_2 - x_3 + 4x_4 &= 10 \\
-x_1 + 5x_2 + 2x_3 - x_4 &= 7 \\
3x_1 - 2x_2 + 4x_3 + x_4 &= 15 \\
4x_1 + x_2 - 2x_3 + 3x_4 &= 6
\end{aligned}
$$

***Hint***: You can solve this system of linear equations by representing it in matrix form and calling `numpy.linalg.solve()`. You can find the documentation for it [here](https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html).


In [None]:
import numpy as np

# your implementation goes here

Now consider the following system of equations:

$$
\begin{aligned}
2x_1 + 3x_2 - x_3 + 4x_4 &= 10 \\
-x_1 + 5x_2 + 2x_3 - x_4 &= 7 \\
3x_1 - 2x_2 + 4x_3 + x_4 &= 15 \\
4x_1 + x_2 - 2x_3 + 3x_4 &= 6 \\
2x_1 + x_2 + 3x_3 - x_4 &= 8
\end{aligned}
$$

`numpy.linalg.solve()` cannot be used to solve this system. Explain why, find a `np.linalg` function that can solve such a system (you can look in the NumPy documentation) and then implement the solution. 

In [None]:
# your implementation goes here

## Exercise 2

In this exercise we will use the Wisconsin Breast Cancer dataset that can be easily loaded with the widely used Scikit-learn library. The dataset contains 569 instances with 30 numeric, predictive features extracted from digitized images of fine needle aspirates (FNA) of breast masses, including measurements like mean radius, texture, perimeter, area, and more. It can be used to classify tumors as malignant or benign, with 212 malignant and 357 benign cases, and is widely used in machine learning research for developing and testing classification algorithms. A full description can be printed with ```print(breast_cancer.DESCR)```.

In [None]:
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

# uncomment to print full description
# print(breast_cancer.DESCR)

The code below creates an interactive widget that lets you explore the dataset two predictive features at a time. After selecting the two features, you will see a scatter plot of their values, colored by target class (benign or malignant). 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets

X = breast_cancer.data
y = breast_cancer.target

feature_names = breast_cancer.feature_names
class_names = breast_cancer.target_names

def plot_feature_scatter(feature_1, feature_2):
    plt.figure(figsize=(10, 6))
    
    feature_1_idx = np.where(feature_names == feature_1)[0][0]
    feature_2_idx = np.where(feature_names == feature_2)[0][0]
    
    for target_class in np.unique(y):
        plt.scatter(
            X[y == target_class, feature_1_idx],
            X[y == target_class, feature_2_idx],
            alpha=0.5,
            label=class_names[target_class]  # Use class names in the legend
        )
    
    plt.xlabel(feature_1)
    plt.ylabel(feature_2)
    plt.title(f'Scatter Plot: {feature_1} vs {feature_2}')
    plt.legend(title='Diagnosis')
    plt.grid(True)
    plt.show()

feature_selector_1 = widgets.Dropdown(
    options=feature_names,
    description='Feature 1:',
    disabled=False,
)

feature_selector_2 = widgets.Dropdown(
    options=feature_names,
    description='Feature 2:',
    disabled=False,
)

interact(plot_feature_scatter, feature_1=feature_selector_1, feature_2=feature_selector_2);


Find a combination of two features that you think best separates the two classes (benign and malignant). Class separation refers to how distinctly the data points belonging to different classes (benign and malignant) are grouped apart from each other, with minimal overlap, making it easier to distinguish between the two.

Write NumPy code that computes the mean value of each of the two features you have selected, separately for each class (i.e. one mean for the benign instances and one for the malignant instances). 

In [None]:
feature_1 = 'my feature 1'
feature_2 = 'my feature 2'

# your implementation goes here

Now assume that you have a new, unknown sample for which you have the feature values but do not know if it is benign or malignant. How can you use the per-class mean values that you have computed above to make a prediction? Implement this in NumPy. 

In [None]:
# your implementation goes here

## One last thing

Before we end this introduction, let's make sure that the remaining libraries used in this course are correctly installed and can be imported by running the code below.

In [None]:
libraries = ['pandas', 'networkx', 'torch']

for lib in libraries:
    try:
        __import__(lib)
        print(f"{lib} imported successfully.")
    except ImportError:
        print(f"{lib} is not installed.")