:::{.callout-tip collapse="true"}
## Other formats
Download the workshop as a Jupyter notebook <a href="workshop1_intro.ipynb" download>here</a>.

Open the workshop as a Jupyter notebook in Google Colab [here](https://colab.research.google.com/drive/1rG9j0eWVt4ORoNcEKa89ccnZmYBLeDYb?usp=sharing) (with your NJIT Google account).
:::


## Overview

### Machine Learning and Optimization Seminar
The goal of the Machine Learning and Optimization seminar this year is to expose the participants to topics in machine learning and optimization by:

1. Facilitating hands-on workshops and group discussions to explore and gain experience
2. Inviting speakers to introduce machine learning and optimization concepts

We are excited to get started, but we recognize that since neither machine learning nor optimization is standardized in the department, participants will have varied levels of exposure to different topics.
Our hope is that we can use this disparity of experience to increase collaboration during the workshops in a way that can't be achieved during the talks.
All are encouraged to share their knowledge and experience with one another openly during the workshops and to give feedback to the organizers after.

All workshop material will be available [here](machinelearning_optimization_seminar.qmd) for later reference.

This first workshop is focused on tooling and the basic concepts of machine learning and optimization with the goal that everyone can be on the same footing for later workshops.

## Your Python environment 
It is safe to say that Python is the language of choice for machine learning.
This interpreted language has a very clear and high-level syntax, is extremely convenient for interactive programming and debugging, and has an enormous user base of enterprises, researchers, and hobbyists who have built an almost infinite collection of open-source packages for every topic.
For these three reasons the workshops for this seminar will use Python.

To get started, we will give the basics of the Python programming language. 
Just kidding!
That would take too long.
We will instead show you a convenient way to install Python, help you get started learning it, and then point you to a curated list of far more thorough, high-quality instruction.
A list of some such references as well as documentation for setup and important Python packages can be found in the Appendix [here](#python-resources-and-packages).

### Setting up
:::{.callout-tip}
If you are looking for the easiest and most immediate way to get going, check out the [Google Colab](#google-colab) section.
:::
:::{.callout-note}
This section will require use of a terminal emulator for installing and running Python.
If you are not familiar with the terminal, check out [this quick tutorial](https://mrkaluzny.com/blog/terminal-101-getting-started-with-terminal/) to get started.

If you are using a computer with Windows, the terminal instructions may not apply.
:::

Some version of Python is already installed on most Linux distributions and on many macOS systems.
However, it can be challenging to navigate between the versions and packages that your operating system uses and those needed for other projects.
Thus, there are a variety of version, package, and environment management tools:

- **Version management**: Which version of Python are you using? Can you change versions to run a specific Python program if it requires?
    - `pyenv`
    - `conda`/`mamba`
- **Package management**: How can you install the many amazing Python packages people have created?
    - `pip`
    - `conda`/`mamba`
- **Environment management**: If you have two projects that require different packages (or different versions of the same package), can you switch which packages are available depending on which project you are working on?
    - `venv`
    - `virtualenv`
    - `poetry`
    - `conda`/`mamba`
    - many more
    
The `conda` package manager is the only one that fills all three roles.
It is formally a part of the Anaconda Python distribution which is a favorite in the fields of data science and machine learning.
`mamba` is a newer, faster reimplementation that is used in exactly the same way and is highly recommended.

The best way to get started with `mamba` is to install `mambaforge`.
You can find installer downloads for Windows, MacOS, or Linux [here](https://github.com/conda-forge/miniforge#mambaforge).

For Windows, run the `.exe` file once it is downloaded.

For MacOS and Linux, open a terminal and navigate to the download location:
```bash
cd ~/Downloads
```
Then run the installer as follows:
```bash
# Substitute the name of the installer you downloaded (the Linux x86_64 one is shown)
bash Mambaforge-Linux-x86_64.sh
```
The installer will walk you through a few steps and end by asking if you'd like to "initialize Mambaforge by running conda init?"
Answer yes and restart your terminal.
This final step adds `conda` and `mamba` to your `$PATH` variable, which makes them available in your terminal.
Once restarted, run `mamba -V` to print the version and to verify that the installation worked.

### Environments
The idea of a `conda`/`mamba` environment is that once an environment is created and activated, all new packages installed will be added to that environment and will be accessible to any Python program run while the environment is active.
As an example, the following command creates an environment called `workshop` and installs a specific version of `python` in it:
```bash
mamba create -n workshop python=3.9
```
Once created, we can list our environments via the command
```bash
mamba env list
```
```
# conda environments:
#
base                     /home/user/mambaforge
workshop                 /home/user/mambaforge/envs/workshop
```
Note that there is a "base" environment which is where `conda` and `mamba` themselves are installed as well as their dependencies.
The best practice is to create an environment for each of your projects to minimize dependency issues (when packages require separate versions of the same package).

To activate our new environment:
```bash
mamba activate workshop
```
Running `mamba env list` will now show our active environment via an asterisk:
```
base                     /home/user/mambaforge
workshop              *  /home/user/mambaforge/envs/workshop
```
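
To confirm from inside Python which environment is in use, one option is to ask the interpreter itself. The following is a minimal sketch using only the standard library; the printed paths depend on your installation, and it assumes the `CONDA_DEFAULT_ENV` variable that `conda`/`mamba` set on activation (it will be missing outside an activated environment).

```python
import os
import sys

# Path of the Python interpreter that is currently running
print("interpreter:", sys.executable)
# Root directory of the active installation or environment
print("prefix:", sys.prefix)
# Name of the active conda/mamba environment, if one is activated
env_name = os.environ.get("CONDA_DEFAULT_ENV", "(none)")
print("environment:", env_name)
```

With the `workshop` environment active, the prefix should point to the `envs/workshop` folder shown in the listing above.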

### Installing packages
Now that we have activated the `workshop` `conda` environment, let's install some common machine learning packages in Python.
It is as easy as writing:
```bash
mamba install numpy matplotlib pandas jupyter scipy scikit-learn scikit-image
```
This command will search the [conda-forge](https://conda-forge.org/#page-top) repository of packages (the `forge` in `mambaforge`) and install the most up-to-date compatible versions.

:::{.callout-tip}
Either `conda` or `mamba` could be used for all the commands discussed in this section.
However, `mamba` is significantly faster when installing packages.
:::
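
One way to check that the installation succeeded is to ask Python whether each package can be found, without actually importing it. This is only a sketch: the list of import names is our own assumption about what you want to check, and note that the import name can differ from the install name.

```python
from importlib.util import find_spec

# Import names can differ from install names
# (scikit-learn -> sklearn, scikit-image -> skimage)
modules = ["numpy", "matplotlib", "pandas", "scipy", "sklearn", "skimage", "IPython"]

# find_spec returns None when a top-level module cannot be located
status = {name: find_spec(name) is not None for name in modules}
for name, found in status.items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any package reported as missing can be installed with another `mamba install` command while the environment is active.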

Now that these packages have been installed, we can easily use them in an interactive `ipython` prompt (installed with the `jupyter` package):
```bash
ipython
```
```default
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: x = np.linspace(0,10,100)

In [4]: y = np.sin(x)

In [5]: plt.plot(x,y); plt.show()
```
This should plot a simple $\sin$ curve.

### Cleaning up
After we are done using the environment that has our desired version of Python and the needed packages, we can go back to our regular terminal by deactivating the environment:
```bash
mamba deactivate
```
If we have somehow broken our environment and need to remove it:
```bash
mamba env remove -n workshop
```
There are many more commands and functionalities that `conda` and `mamba` provide that can be found in the [python resources and packages](#python-resources-and-packages) section of the Appendix.

### Google Colab
As an alternative to the entire procedure above, you can use an online [Jupyter Notebook](https://jupyter.org) service hosted by Google called [Colab](https://colab.research.google.com).
This service will get you up and running immediately but cannot save your environment between notebooks and has limited functionality to run scripts or save data.
Thus, if your notebook requires a package that is not installed by default, you will need to add the installation command in one of the first notebook cells.
For example, to install the `reservoirpy` package, we would write in a notebook cell:
```default
!pip install reservoirpy
```
In a notebook, the `!` denotes a terminal command.
The package will now be ready for import and use within the current notebook session:
```default
from reservoirpy.nodes import Input,Reservoir,Ridge
```

## Basic concepts of machine learning 

Machine learning is, at its most basic, automated data analysis, usually with the goal of finding patterns or making predictions.
The "machines" in this analysis are equations or algorithms and the "learning" is usually some form of parameter selection and/or fitting.
Due to the uncertain nature of most data, the majority of these models are probabilistic in nature.
In fact, it can be hard to distinguish the methodological lines between what is termed "machine learning" and the field of statistics.
However, there are some important distinctions between the tools, goals, and terminology of the two areas.
Today, machine learning has emerged as a broad description of almost any data-driven computing which may or may not include classical descriptive and inferential statistics [@murphy2012machine].

At first glance, machine learning can be separated into three main classes: 

- **Supervised learning**: Given dependent and independent variable data, train a model which effectively maps the independent variable data to produce the dependent variable data.
    - Generalized linear models (linear, logistic, etc. regression)
    - Naive Bayes
    - Neural networks (most)
    - Support vector machine (SVM)
    - Random forests
    - etc.
- **Unsupervised learning**: Given data, find patterns (no specified output, though there is still a measure of success)
    - Clustering
    - Mixture models
    - Dimensionality reduction
    - Association rules
    - etc.
- **Reinforcement learning**: Given input data and desired outcomes, simulate and use the results to update a model to improve the simulation's ability to achieve those outcomes
    - Q-learning
    - SARSA
    - etc.

There is currently an enormous amount of interest in machine learning methods, so there is also an enormous amount of high-quality material discussing them.
We will end our introduction here and direct you to established textbooks [@james2013introduction;@hastie2009elements;@murphy2012machine], NJIT classes (Math 478, Math 678, Math 680, CS 675, CS 677), and online resources (too many to even start listing).

### General procedure
In practice, machine learning algorithms often boil down to an optimization problem.
To characterize this in a few steps, consider a problem with data $x$:

1. Select a model representation $f$ with parameters $p$ for the problem:
$$
y = f(x;p)
$$
2. Determine an appropriate objective function $\mathcal{L}$:
$$
\mathcal{L}(f(x;p),x)
$$
3. Use an optimization method $\mathcal{O}$ with parameters $d$ to find parameters $p$ that minimize or maximize the objective for the model:
$$
p^* = \mathcal{O}(\mathcal{L},f,x;d)
$$

In some sense, this is the same procedure as _inverse problems_ in traditional applied mathematics but with a broader set of models $f$ that may or may not be based on first-principles understanding of the problem.
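
The three steps above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not a recommended method: the data are synthetic, the model is a one-parameter line, the objective compares the model output to observed values `y` (the supervised case), and the "optimizer" $\mathcal{O}$ is a plain grid search whose parameter $d$ is the number of candidate values tried.

```python
import numpy as np

# Hypothetical data: a line with slope 2 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 0.05 * rng.normal(size=x.size)

def f(x, p):
    """Step 1: model representation with a single parameter p."""
    return p * x

def L(pred, y):
    """Step 2: objective, here the sum of squared errors."""
    return np.sum((pred - y) ** 2)

def O(L, f, x, y, d):
    """Step 3: optimizer, here a grid search over d candidate values of p."""
    candidates = np.linspace(0.0, 4.0, d)
    losses = [L(f(x, p), y) for p in candidates]
    return candidates[int(np.argmin(losses))]

p_star = O(L, f, x, y, d=401)
print("estimated slope:", p_star)
```

Grid search scales poorly as the number of parameters grows, which is one reason gradient-based optimizers are the usual choice in practice.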

### Incorporating data
There are a variety of choices for models $f$ and objectives $\mathcal{L}$ depending on the class of problem being considered (supervised, unsupervised, or reinforcement).
For supervised learning (the most common), the objective is often to predict or generate the output or dependent variable data of some process.
For this, data is usually separated into three sets:

1. **Training data** ($x$): used to tune the parameters $p$
2. **Validation data** ($x^v$): used to evaluate the generalization of the model $f$ to data not in the training set during training
3. **Testing data** ($x^t$): used to benchmark the predictive or generative ability of the model after training is completed
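
A split like this can be sketched with a random permutation of the sample indices. The 70/15/15 proportions and the dataset below are assumptions chosen for illustration only; appropriate proportions depend on the problem and the amount of data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
data = np.linspace(0, 1, n)  # hypothetical dataset of n samples

# Shuffle the indices so that each set is a random sample of the data
idx = rng.permutation(n)
n_train, n_val = 70, 15

train = data[idx[:n_train]]                # used to tune the parameters
val = data[idx[n_train:n_train + n_val]]   # used to monitor generalization during training
test = data[idx[n_train + n_val:]]         # held out until training is finished
print(len(train), len(val), len(test))
```

Because the three index ranges do not overlap, no sample appears in more than one set.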

## A first machine learning problem
These workshops are about learning by doing, so let's build understanding by fitting a simple "machine" to some data as a supervised problem.
Consider some data $(x,y)$:

```{python}
#| code-fold: true
import numpy as np
import matplotlib.pyplot as plt
def f_known(x):
    part1 = np.exp(-x)*(np.sin((x)**3) + np.sin((x)**2) - x)
    part2 = 1/(1 + np.exp(-1*(x-1)))
    return part1 + part2
xsamples = np.random.uniform(-1/2,5,100)
ysamples = f_known(xsamples)
plt.scatter(xsamples,ysamples)
plt.xlabel("$x$"); plt.ylabel("$y$"); plt.title("Data")
plt.show()
```

We would like to fit a model of the following form to this data:
$$
f(x;p_0,p_1) = e^{-p_0x}(\sin((p_0x)^3) + \sin((p_0x)^2) - p_0x) + \frac{1}{1 + e^{-p_1(x-1)}}
$$
To formulate this as a machine learning/optimization problem, we can consider a simple $L^2$ objective/loss in which we would like to minimize the $L^2$ norm distance between the model output $f(x)$ and the true data $y$:
$$
\mathcal{L}(f(x;\vec{p}),y) = ||f(x;\vec{p}) - y||_2^2
$$
The problem can then be written as the unconstrained optimization problem:
$$
p^* = \underset{\vec{p}}{\text{argmin }} \mathcal{L}(f(x;\vec{p}),y)
$$
We then expect our model $f(x;p^*)$ to represent a "machine" that has accurately "learned" the relationship between $x$ and $y$.
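
As a small worked example of this loss, consider three hypothetical data values and three hypothetical model outputs. The residuals are $(0, 1, 2)$, so the squared $L^2$ distance is $0^2 + 1^2 + 2^2 = 5$. The sketch below computes it in two equivalent ways.

```python
import numpy as np

y_true = np.array([1.0, 1.0, 1.0])   # hypothetical data y
y_model = np.array([1.0, 2.0, 3.0])  # hypothetical model output f(x;p)

residual = y_model - y_true
loss_sum = np.sum(residual**2)            # sum of squared residuals
loss_norm = np.linalg.norm(residual)**2   # squared Euclidean norm, same value up to rounding
print(loss_sum, loss_norm)
```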

There are several ways to approach this problem, but a simple and popular approach for a continuous and unconstrained problem is to use an iterative gradient method.

### Gradient descent
_Gradient descent_ is a straightforward method taught early in an undergraduate numerical methods class.
Its simplicity and cheap computational cost have made it popular for machine learning methods (which can contain so many parameters that computing the Hessian for second-order methods like Newton's method becomes infeasible).
Beginning with an initial parameter guess $\vec{p}^0$, its update procedure can be written as:
$$
\begin{align*}
v^i &= -\alpha^i \nabla_p \mathcal{L}(f(x;\vec{p}^i),y) \\
\vec{p}^{i+1} &= \vec{p}^i + v^i
\end{align*}
$$
where $\alpha^i$ controls the step size in the direction of the negative gradient (usually called a "learning rate" in machine learning).
This method follows the negative gradient of the objective/loss $\mathcal{L}$ until the gradient is sufficiently small or until the iterates reach a steady state.

Simply implemented in Python, this method can be written as:

```{python}
def gradient_descent(fp,x0,lr=.2,tol=1e-12,steps=10000):
    """Minimize a function with gradient `fp`, starting from `x0`.

    Returns the final iterate and the list of all iterates."""
    x = x0
    xs = [x]
    for s in range(steps):
        xnew = x - lr*fp(x) # step against the gradient
        xs.append(xnew)
        if np.linalg.norm(fp(xnew)) < tol:
            print("Converged to objective loss gradient below {} in {} steps.".format(tol,s+1))
            return xnew,xs # return the converged iterate, not the previous one
        elif np.linalg.norm(xnew - x) < tol:
            print("Converged to steady state of tolerance {} in {} steps.".format(tol,s+1))
            return xnew,xs
        x = xnew
    print("Did not converge after {} steps (tolerance {}).".format(steps,tol))
    return x,xs
```

However, this method contains a troublesome parameter $\alpha^i$ which, if chosen too large, could prevent convergence of the solution or, if chosen too small, could require an unreasonable number of steps to converge.
For this reason, "vanilla" (or normal) gradient descent is almost always replaced with a modified method in learning problems [@sebastian_ruder_2020;@john_chen_2020].
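
The effect of the step size can be seen on a toy problem that is separate from our data: minimizing $\mathcal{L}(p) = p^2$, whose gradient is $2p$. Each update multiplies $p$ by $(1 - 2\alpha)$, so the iterates shrink toward the minimizer $p = 0$ only when $|1 - 2\alpha| < 1$. The sketch below uses a fixed step size, and the three values of $\alpha$ are chosen purely for illustration.

```python
def run(lr, p=1.0, steps=50):
    """Fixed-step gradient descent on L(p) = p**2, starting from p."""
    for _ in range(steps):
        p = p - lr * 2.0 * p  # the gradient of p**2 is 2*p
    return p

p_small = run(0.1)   # factor 0.8 per step: converges toward 0
p_large = run(1.1)   # factor -1.2 per step: oscillates and diverges
p_tiny = run(1e-4)   # factor 0.9998 per step: barely moves in 50 steps
print(p_small, p_large, p_tiny)
```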

The following demonstrates an animation of the above gradient descent method applied to our data with two different learning rates, one successful, one not.
It uses the following animation code and the `autograd` automatic differentiation library that will be further discussed later:

```{python}
#| code-fold: true
#| code-summary: Animation code
from matplotlib import animation as anim
from matplotlib import gridspec
from mpl_toolkits.mplot3d import Axes3D # only needed on older matplotlib versions to register the 3d projection

def animate_steps_2d(xs,func,xmin=-.1,xmax=2.5,ymin=-1,ymax=3,interval=50,path=True):
    def anim_func(i):
        ax.clear()
        ax1.clear()
        ax2.clear()
        # Add surface plot
        ax.plot_surface(X,Y,Z,cmap="gist_earth")
        x_loss = func(xs[i]) # use the function passed in rather than the global `loss`
        ax.scatter(xs[i][0],xs[i][1],x_loss,zorder=100,color="red",s=100)
        ax.set_xlabel("$p_0$") # labels match the model parameters p_0 and p_1
        ax.set_ylabel("$p_1$")
        ax.set_zlabel("loss")
        if path:
            # Path taken so far, up to and including the current step
            temp_x1 = [xs[j][0] for j in range(i+1)]
            temp_x2 = [xs[j][1] for j in range(i+1)]
            temp_losses = [func(xs[j]) for j in range(i+1)]
            ax.plot(temp_x1,temp_x2,temp_losses,color="orange")
        loss_fx = [func([fxs[j],xs[i][1]]) for j in range(len(fxs))]
        loss_fy = [func([xs[i][0],fys[j]]) for j in range(len(fys))]
        # Add flat plots for perspective
        ax1.plot(fxs,loss_fx)
        ax1.scatter(xs[i][0],x_loss,color="red",s=100,zorder=100)
        ax1.set_xlabel("$p_0$")
        ax1.set_ylabel("loss")
        ax1.set_xlim(np.min(X),np.max(X))
        ax1.set_ylim(np.min(Z),np.max(Z))
        ax2.plot(fys,loss_fy)
        ax2.scatter(xs[i][1],x_loss,color="red",s=100,zorder=100)
        ax2.set_xlabel("$p_1$")
        ax2.set_ylabel("loss")
        ax2.set_xlim(np.min(Y),np.max(Y))
        ax2.set_ylim(np.min(Z),np.max(Z))

    fig = plt.figure(figsize=(10,6))
    gs = gridspec.GridSpec(12,20)
    ax = fig.add_subplot(gs[0:12,0:16],projection="3d",computed_zorder=False)
    ax1 = fig.add_subplot(gs[0:5,16:20])
    ax2 = fig.add_subplot(gs[7:12,16:20])
    ax.view_init(47,47)

    # Evaluate the loss on a grid for the surface plot
    fxs = np.linspace(xmin,xmax,100)
    fys = np.linspace(ymin,ymax,100)
    X,Y = np.meshgrid(fxs,fys)
    Z = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Z[i,j] = func([X[i,j],Y[i,j]])

    # Keep a reference to the animation so it is not garbage collected
    tanim = anim.FuncAnimation(fig,anim_func,interval=interval,frames=len(xs))
    plt.show()
```

```{python}
#| panel: fill
from autograd import grad
import autograd.numpy as anp
def f_model(p):
    part1 = anp.exp(-p[0]*xsamples)*(anp.sin((p[0]*xsamples)**3) + anp.sin((p[0]*xsamples)**2) - p[0]*xsamples)
    part2 = 1/(1 + anp.exp(-p[1]*(xsamples-1)))
    return part1 + part2
loss = lambda p: anp.sum((f_model(p) - ysamples)**2)
grad_loss = grad(loss) # automatically differentiated

p0 = np.array([.7,.2])
xs = gradient_descent(grad_loss,p0,.01,tol=1e-8,steps=1000)[1]
animate_steps_2d(xs,loss)
```
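
When a gradient comes from an automatic differentiation library, or is derived by hand, a common sanity check is to compare it against a central finite-difference approximation. The sketch below does this with `numpy` only, on a hypothetical test function `g` with a known gradient; the helper `finite_difference_grad` and the step `h` are our own illustrative choices. The same check could in principle be applied to `loss` and `grad_loss` above.

```python
import numpy as np

def g(p):
    """Hypothetical test function with a known gradient."""
    return np.sum(p**2) + np.sin(p[0])

def g_grad(p):
    """Analytic gradient of g."""
    grad = 2.0 * p
    grad[0] += np.cos(p[0])
    return grad

def finite_difference_grad(func, p, h=1e-6):
    """Central-difference approximation of the gradient of func at p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = h  # perturb one coordinate at a time
        grad[k] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

p = np.array([0.7, 0.2])
gradients_agree = np.allclose(finite_difference_grad(g, p), g_grad(p), atol=1e-6)
print("gradients agree:", gradients_agree)
```

Finite differences need two function evaluations per parameter, so they are useful for checking a gradient but too expensive to replace it in models with many parameters.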

### Adaptive steps (adagrad, adadelta, RMSprop)

### With momentum (momentum, Nesterov acceleration)

### Combining ideas (Adam and more)
<!-- TODO: Simple overview of adding momentum to gradient descent -->
<!-- TODO: nesterov acceleration https://www.codingninjas.com/codestudio/library/nesterov-accelerated-gradient-->
<!-- TODO: RMSProp https://optimization.cbe.cornell.edu/index.php?title=RMSProp -->
<!-- TODO: ADAM (combo of momentum and RMSProp averaging) https://medium.com/analytics-vidhya/momentum-rmsprop-and-adam-optimizer-5769721b4b19 -->

## Conclusion

### Future workshop schedule
To keep your interest after this fairly generic start, the workshop schedule for the semester is planned as follows:

## Appendix

### Python resources and packages
**Python**

- [Python documentation (especially sections 3-9)](https://docs.python.org/3.9/tutorial/introduction.html)
- [Quick cheatsheet of general Python knowledge](https://www.pythoncheatsheet.org)
- [Quicker introduction](https://learnxinyminutes.com/docs/python/)

**Conda/mamba**

- [Conda user guide](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html)
- [Mamba website](https://mamba.readthedocs.io/en/latest/index.html)

**Essential packages**

- [`numpy`](https://numpy.org) - Creating, manipulating, and operating (linear algebra, fft, etc.) on multi-dimensional arrays. A list of packages built on `numpy` for a variety of domains can be found on the homepage under the _ECOSYSTEM_ heading.
- [`scipy`](https://scipy.org) - Fundamentals in optimization, integration, interpolation, differential equations, statistics, signal processing, etc.
- [`matplotlib`](https://matplotlib.org) - Static or interactive plots and animations
- [`scikit-learn`](https://scikit-learn.org/stable/index.html) - Standard machine learning tools and algorithms built on `numpy`, `scipy`, and `matplotlib`
- [`pandas`](https://pandas.pydata.org) - Easily represent, manipulate, and visualize structured datasets (matrices with names for columns and rows)
- [`keras`](https://keras.io) - High level neural network framework built on `tensorflow`
- [`tensorflow`](https://www.tensorflow.org) - In depth neural network framework focused on ease and production
- [`pytorch`](https://pytorch.org) - In depth neural network framework focused on facilitating the path from research to production
- [`scikit-image`](https://scikit-image.org) - Image processing algorithms and tools
- [`jupyter`](https://jupyter.org) - Interactive "notebook" style programming

### Julia as an alternative to Python
Julia is a fairly new language that has mainly been proposed as an alternative to Python and Matlab, though it is a general-purpose language.
Its strength and its weakness is that it is "just-in-time" compiled (meaning your code is automatically analyzed and compiled just before it is run).
A clever language design combined with just-in-time compilation makes Julia as clear to read and write as Python while being much faster.
It can even approach the speed of C when written carefully.
However, the just-in-time compilation and type system remove a chunk of the interactive convenience of Python and its young age also means that it does not have the volume of packages that Python does.

Nonetheless, it is an elegant and high-performance language to use and has shown rapid growth recently.
Packages that are concise, simple, and easy to read and contribute to have been emerging quickly, and the language already provides many useful tools.
As a result, it is worth describing its installation process, environment management, and notable packages.

#### Installation
The officially supported method of installation for Julia is now using the `juliaup` version manager.
The [installer](https://github.com/JuliaLang/juliaup#windows) can be downloaded from the Microsoft Store on Windows or run on macOS or Linux with:

```bash
curl -fsSL https://install.julialang.org | sh
```
#### Environments
Julia comes with a standard environment and package manager named [`Pkg`](https://pkgdocs.julialang.org/v1/).
Interestingly, the easiest way to use it is to run the Julia REPL (read-eval-print-loop), i.e. to run `julia` interactively.
You can do so by typing `julia` into the terminal.
You will then be presented with a terminal interface such as:
```default
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0 (2022-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>
```
Typing `]` will put you into "`Pkg` mode":
```default
(@v1.8) pkg>
```
Type `?` and hit enter to get options in this mode.
We can create and activate a new environment called workshop with the command:
```default
(@v1.8) pkg> activate --shared workshop
```
Note that the `--shared` flag will make a "global" environment that can be accessed from any directory.
If we were to leave out this flag, `Pkg` would put a `Project.toml` and `Manifest.toml` file in the current directory that contain the name of the environment, its installed packages, and their dependencies.
This can be useful to easily isolate and share environments.
After running this command, our `Pkg` mode will have changed to represent the active environment:
```default
(@workshop) pkg>
```

#### Installing packages
To install some packages in the active environment, write:
```default
(@workshop) pkg> add Plots MLJ DataFrames Flux Pluto
```
These packages will install and precompile.
To test one of them, press backspace to leave `Pkg` mode and input:
```default
julia> using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
julia> x = range(0,10,100);
julia> y = sin.(x);
julia> plot(x,y)
```
This should show a plot of a simple $\sin$ curve.
Note that the precompilation of `Plots` took some time.
However, this will not need to occur again until the package is updated.
Also note that the call to `plot(x,y)` took some time.
This is due to the just-in-time compilation.
Now that the compilation has been done for inputs of the types of `x` and `y`, if you run `plot(x,y)` again, it should be almost instantaneous.

#### Cleaning up
To deactivate the environment, enter the `Pkg` mode again by pressing `]` on an empty line, then enter:

```default
(@workshop) pkg> activate
```
To delete the environment we created, delete the environment folder at the location that was listed when it was created.
This is usually `~/.julia/environments/workshop` on macOS or Linux.

#### References

**Julia**

- [Julia documentation](https://docs.julialang.org/en/v1/)
- [Quick cheatsheet of Julia](https://juliadocs.github.io/Julia-Cheat-Sheet/)
- [Comparison of the syntax of Julia, Python, and Matlab](https://cheatsheets.quantecon.org)

**Packages**

As compared to Python, Julia has many scientific computing tools built into its standard library.
Thus, a lot of the functionality found in `numpy` is loaded by default.
On the other hand, because of the interoperability of the language and the reduced need for a polyglot codebase (i.e. needing C and Fortran code for a Python package to be fast), packages are usually much smaller modules in Julia.
For example, the functionality of the `scipy` package in Python can be found spread across possibly a dozen different packages in Julia.
This is convenient to only load and use what you need, but inconvenient in that it may require more searching to find and the interfaces may not be standardized.
The following are some packages that roughly recreate the essential Python packages [here](#python-resources-and-packages). 

- `numpy` - [Standard library](https://docs.julialang.org/en/v1/manual/arrays/), [`FFTW.jl`](https://juliamath.github.io/FFTW.jl/latest/)
- `scipy` - `Statistics.jl`
- `matplotlib` - [`Plots.jl`](https://docs.juliaplots.org/latest/)
- `scikit-learn` - [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/dev/)
- `pandas` - [`DataFrames.jl`](https://dataframes.juliadata.org/stable/)
- `keras`,`tensorflow`,`pytorch` - [`Flux.jl`](https://fluxml.ai/Flux.jl/stable/)
- `scikit-image` - [`Images.jl`](https://juliaimages.org/latest/install/)
- `jupyter` - [`Pluto.jl`](https://github.com/fonsp/Pluto.jl) although you can use Julia with Jupyter via [`IJulia.jl`](https://julialang.github.io/IJulia.jl/stable/)