<a href="https://colab.research.google.com/github/mahynski/chemometrics_short_course/blob/main/notebooks/Introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Jupyter Notebook

## The Basics

This is a [Jupyter](https://jupyter.org/) notebook.  These notebooks are the *de facto* standard tool for data science, machine learning, and artificial intelligence work.  They are easy to use yet powerful because they blend:

* computations
* outputs
* explanatory text
* mathematics
* images
* media
* *generative AI* for code suggestions

There are 2 types of "cells" in notebooks:
1. Code (defaults to Python)
2. Text (some editors call it [Markdown](https://www.markdownguide.org/) because that is the language used to render the text)

You can add a new cell of either type from the menu (```Insert > Code cell```, for example); in Colab you can also hover over the bottom edge of a cell.

Cells are executed by pressing `Shift`+`Enter` simultaneously.

In [1]:
# Example code cell
pi = 3.14159
2*pi

6.28318

The cells with text use a language called [Markdown](https://www.markdownguide.org/). Markdown can help you organize your thoughts and work by creating all sorts of nice text and structure.  Here is a [cheat sheet](https://www.markdownguide.org/cheat-sheet/) for easy reference.  Some examples include:

```markdown
# Headers

## Subheaders

**bolded words**

Tables are easy, too!

| Header | Column 1 |
| --- | --- |
| Sample 1 | 1.23 |
| Sample 2 | 2.34 |
```

You can also write nice equations with [LaTeX](https://www.overleaf.com/learn/how-to/Writing_Markdown_in_LaTeX_Documents): $E = mc^2$

We will make use of these capabilities throughout the course.

❓ DOUBLE CLICK ON THIS CELL TO LOOK AT TEXT FORMATTING OPTIONS AVAILABLE ON COLAB

You can set up and run Jupyter notebooks from a server on your personal machine, from a remote server, or right from [Google Drive](https://drive.google.com) using [Google Colab](https://colab.research.google.com/).  Each of these setups can be configured to display notebooks in different ways and to include different features.  For simplicity, we will work from Colab in this course.

## Google Colab

[Google Colab](https://colab.research.google.com/) is free for anyone with a Google account.  You can purchase paid tiers of service, which come with access to more powerful GPUs and other features.  At the time of writing, the free tier includes access to CPUs, GPUs, and [tensor processing units](https://cloud.google.com/tpu) (TPUs).

Check out the runtime capabilities in the icon at the top right!

Go to ```Runtime > Change runtime type```.  You can also use this menu to run an R kernel instead of a Python kernel.

The advantage is that you can test code on modest hardware first, then scale up the resources behind your notebook as needed.  The free tier is powerful enough for all the analysis we will do in this course and for many chemometric applications.

You can also change your look and feel from ```Tools > Settings```.

---
Perhaps most importantly, you can directly connect this notebook to your [Google Drive](https://drive.google.com) and store and process data directly in the cloud.

Option 1: Copy and paste the code below into a code cell and run it.
```python
from google.colab import drive
drive.mount('/content/drive')
```

Option 2: Select the "Files" tab on the left and the code will populate automatically.

In [2]:
# You can browse your mounted drive using the "Files" tab or by using Linux commands prefixed by "!"
!ls ./drive/MyDrive

'Colab Notebooks'  'Google Form Template'   installation_notes	 research
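
The `!` prefix hands the command to the shell.  The same listing can be produced in plain Python with the standard library.  This is a minimal sketch: the helper name `list_directory` is hypothetical, and the path assumes the Drive was mounted at `/content/drive` as in the mounting code above.

```python
from pathlib import Path


def list_directory(path):
    """Return the sorted names of the entries in `path`, or [] if it is not a directory."""
    directory = Path(path)
    if not directory.is_dir():
        # Nothing mounted here (or the path is wrong), so report an empty listing
        return []
    return sorted(entry.name for entry in directory.iterdir())


# Assumed mount point; adjust this if you mounted your Drive elsewhere
print(list_directory("/content/drive/MyDrive"))
```

Returning an empty list for a missing directory keeps the cell from failing when the Drive has not been mounted yet.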


Google also provides "code snippets" (see `<>` on the left), which can help you find publicly available examples of code that performs specific tasks.  This can help you write code faster, but be wary of running code you do not fully understand.

When running, this notebook "lives" on a Google server somewhere.  To save your work when done, go to `File > Save a copy in Drive`.

## Runtime Environment

The order of execution matters in your notebook.

In [3]:
a = 1

In [4]:
a = 2

In [5]:
a

2

* If you are unsure what state your notebook is in, you can restart the runtime by going to `Runtime > Restart session` in Colab.  This will wipe all saved variables, calculations, etc.

  * Changing a Colab runtime type also wipes all saved variables and calculations.

* You can also choose `Runtime > Restart and run all`, which restarts the runtime and then executes each cell in order until it reaches the end of the notebook or an error occurs.

* Restarting the session will NOT unmount your Google Drive, nor will it uninstall any packages you have already installed in your runtime.
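
To see why the order matters, here is a small sketch that imitates a runtime with one shared namespace.  The cell labels and the helper `run_cells` are hypothetical and only serve the illustration; a real notebook kernel is more involved.

```python
# Each string stands in for the contents of one notebook cell
cells = {
    "first": "a = 1",
    "second": "a = 2",
    "show": "result = a",
}


def run_cells(order):
    """Execute the named cells in the given order inside one fresh, shared namespace."""
    namespace = {}  # plays the role of a freshly restarted runtime
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]


print(run_cells(["first", "second", "show"]))  # top-to-bottom order
print(run_cells(["second", "first", "show"]))  # same cells, different order
```

The two calls run exactly the same cells but end with different values of `a`, which is why a final restart followed by a top-to-bottom run is the safest check of a notebook.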

## Installing libraries and tools

[pip](https://pip.pypa.io/en/stable/) is the package installer for Python and can be used to install things in your runtime environment.

Packages can be found in the [Python Package Index](https://pypi.org/).

Many scientific and data science packages are automatically installed in Colab so you only need to `import` them, as we will see later.

You can install a new package in a `code` cell using the following command.
```
!pip install name_of_package
```
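
Before installing, it can be useful to check whether a module is already available in the runtime.  The sketch below uses the standard-library function `importlib.util.find_spec`; the helper name `is_installed` is hypothetical.  Note that it checks the *import* name, which is sometimes different from the name used with pip.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "watermark"]:
    status = "available" if is_installed(name) else "missing (install it with pip)"
    print(f"{name}: {status}")
```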

Let's install [watermark](https://github.com/rasbt/watermark) which is a tool that will help us keep track of the versions of libraries and software installed.

In [None]:
!pip install watermark

In [6]:
# Let's try it out by importing the library
import watermark

In [7]:
# We can see a function's signature and "Docstring" by using a single "?" before or after the command
watermark.watermark?

In [None]:
# This is equivalent to calling help() on a function
help(watermark.watermark)

In [9]:
# We can see the source code with 2 question marks
watermark.watermark??
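
The `?` and `??` shortcuts are IPython features, so they only work inside a notebook or IPython session.  In plain Python, the standard-library `inspect` module gives similar information.  A minimal sketch, using a hypothetical function `scale_values` as the thing being inspected:

```python
import inspect


def scale_values(values, factor=2.0):
    """Multiply every entry of `values` by `factor`."""
    return [v * factor for v in values]


print(inspect.signature(scale_values))  # the call signature
print(inspect.getdoc(scale_values))     # the cleaned-up docstring
```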

In [10]:
print(watermark.watermark())

Last updated: 2024-04-22T17:50:55.088318+00:00

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.1.58+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit



In [11]:
# We can also use the watermark magic extension by loading it
%load_ext watermark

In [12]:
# This is a convenient command since it will print basic information about the machine you are running on and what versions of libraries you have loaded
%watermark -t -m -v --iversions

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.1.58+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

watermark: 2.4.3



In [13]:
# After importing a new library it will automatically show up
import numpy as np

%watermark -t -m -v --iversions

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.1.58+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

numpy    : 1.25.2
watermark: 2.4.3



It is good practice to have one cell at the top of your notebook where you load all the libraries you need, then call watermark to make these visible.

```
import numpy as np
import scipy as sp
import pandas as pd

%watermark -t -m -v --iversions
```
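
If watermark is not available, a rough equivalent can be put together from the standard library.  This is only a sketch: the helper name `package_versions` is hypothetical, and it looks packages up by their installed distribution name rather than by what has been imported.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python:", platform.python_version())
print(package_versions(["numpy", "scipy", "pandas"]))
```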

## Saving Code

Things can change over time, especially when writing new code and debugging.  As a result, it is a good idea to restart your runtime and re-run your calculations one last time from start to finish.  In addition, you can export (or just copy and paste) Python code to a `.py` file, then import it.

In [14]:
def Fibonacci(n):
  """
  This is where your Docstring goes.

  Input
  -----
  n : int
    Index of the Fibonacci number to get (n >= 0).

  Returns
  -------
  number : int
    The nth Fibonacci number.
  """

  # Negative input is invalid: print a message
  # (the function then returns None)
  if n < 0:
    print("Incorrect input")

  # The 0th Fibonacci number is 0
  elif n == 0:
    return 0

  # The 1st and 2nd Fibonacci numbers are both 1
  elif n == 1 or n == 2:
    return 1

  # Otherwise, add the two preceding Fibonacci numbers
  else:
    return Fibonacci(n-1) + Fibonacci(n-2)

In [19]:
for i in range(10):
  print(Fibonacci(i))

0
1
1
2
3
5
8
13
21
34


In [33]:
# Add the absolute path of the directory where your file is stored
import sys, os
sys.path.append(
    os.path.join(os.path.abspath('./'), 'drive/MyDrive/Colab Notebooks')
)

In [34]:
# Confirm that the directory now appears in the module search path
sys.path

['/content',
 '/env/python',
 '/usr/lib/python310.zip',
 '/usr/lib/python3.10',
 '/usr/lib/python3.10/lib-dynload',
 '',
 '/usr/local/lib/python3.10/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.10/dist-packages/IPython/extensions',
 '/root/.ipython',
 './drive/MyDrive/Colab Notebooks',
 './drive/MyDrive/Colab Notebooks',
 '/content/drive/MyDrive/Colab Notebooks/']

In [None]:
print(os.path.join(os.path.abspath('./'), 'drive/MyDrive/Colab Notebooks'))

In [35]:
import fibonacci
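
The import above only works if a file named `fibonacci.py` exists in a directory listed in `sys.path`.  The sketch below walks through the whole round trip in one place: write a module file, add its directory to the search path, and import it.  The module name `fib_demo` and the temporary directory are assumptions made so the example runs anywhere; in Colab you would save the file to your mounted Drive instead.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A throwaway directory stands in for the folder on your Drive
module_dir = Path(tempfile.mkdtemp())

# The source code we want to reuse, saved as a .py file
source = '''
def fibonacci(n):
    """Return the nth Fibonacci number (0, 1, 1, 2, ...)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
'''
(module_dir / "fib_demo.py").write_text(source)

# Make the directory importable, then import the new module
sys.path.append(str(module_dir))
importlib.invalidate_caches()  # make sure the newly created file is noticed
import fib_demo

print([fib_demo.fibonacci(i) for i in range(10)])
```

This version computes the numbers with a loop rather than recursion, which is a swap made for brevity; the import mechanics are the same for any code you save.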

# The Python Language

In [None]:
# matplotlib inline, notebook (3D)

# Common Chemometric Problems

# Introductory Statistics

In [None]:
# n-1 vs. N in std dev
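
As a brief sketch of the distinction noted above: the population standard deviation divides the sum of squared deviations by N, while the sample standard deviation divides by n-1.  The example uses the standard-library `statistics` module and made-up numbers.  NumPy's `np.std` divides by N by default and switches to n-1 with `ddof=1`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

population_sd = statistics.pstdev(data)  # divides by N
sample_sd = statistics.stdev(data)       # divides by n - 1

print(population_sd)
print(sample_sd)
```

The sample value is always the larger of the two, and the gap shrinks as the number of observations grows.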