:::{.callout-tip collapse="true"}
## Other formats
Download the workshop as a Jupyter notebook <a href="workshop1_intro.ipynb" download>here</a>.

Open the workshop as a Jupyter notebook in Google Colab [here](https://colab.research.google.com/drive/1rG9j0eWVt4ORoNcEKa89ccnZmYBLeDYb?usp=sharing) (with your NJIT Google account).
:::


## Overview

### Machine Learning and Optimization Seminar
The goal of the Machine Learning and Optimization seminar this year is to expose the participants to topics in machine learning and optimization by:

1. Facilitating hands-on workshops and group discussions to explore and gain experience
2. Inviting speakers to introduce machine learning and optimization concepts

We are excited to get started, but we recognize that since neither machine learning nor optimization is standardized in the department, each participant will have a different level of experience and exposure to the different topics.
Although this will require some additional work on the part of the organizers (and additional commitment from the participants), our hope is that this disparity of experience will increase the collaboration between faculty, students in any year of the program, and even visitors from other departments.
Whether you are a participant or an organizer, please take the opportunity to share your knowledge and experience openly with one another during the workshops.

As a student-run seminar, we also recognize that some participants may have expertise beyond that of the workshop organizer.
If this is the case for you, please engage in discussion of the material during the workshop and give feedback after.
All the workshop material will be available online for later reference.

### This workshop
This first workshop focuses mostly on tooling and the basic concepts of machine learning and optimization, so that everyone starts the later workshops on the same footing.
It will also give each of us a chance to get a view of everyone else's perspective, interests, and experience.

## Your programming environment 
It is safe to say that Python is the language of choice for machine learning.
This interpreted language has a very clear and high-level syntax, is extremely convenient for interactive programming and debugging, and has an enormous user base of enterprises, researchers, and hobbyists who have built an almost infinite array of open-source packages for every topic.
It is exactly for these three reasons that the workshops for this seminar will use Python.

So, to get started, we will give the basics of the Python programming language. 
Just kidding!
That would take too long.
We will instead guide you through the most convenient way to install Python, show you how to get started learning it, and then point you to a curated library of more thorough, high-quality instruction.
A list of some such references as well as documentation for setup and important Python packages can be found in the Appendix [here](#python-resources-and-packages).

### Setting up
:::{.callout-tip}
If you are looking for the easiest and most immediate way to get going, check out the [Google Colab](#google-colab) section.
:::
:::{.callout-note}
This section will require use of a terminal emulator for installing and running Python.
If you are not familiar with the terminal, check out [this quick tutorial](https://mrkaluzny.com/blog/terminal-101-getting-started-with-terminal/) to get started.

If you are using a computer with Windows, the terminal instructions may not apply.
:::

Python is often already installed on MacOS and most Linux distributions.
However, it can get challenging to navigate between the versions and packages that your operating system uses and those needed for other projects.
Thus, there are a variety of version, package, and environment management tools:

- **Version management**: Which version of Python are you using? Can you change versions to run a specific Python program if it requires a different one?
    - `pyenv`
    - `conda`/`mamba`
- **Package management**: How can you install the many amazing Python packages people have created?
    - `pip`
    - `conda`/`mamba`
- **Environment management**: If you have two projects that require different packages (or different versions of the same package), can you switch which packages are available depending on which project you are working on?
    - `venv`
    - `virtualenv`
    - `poetry`
    - `conda`/`mamba`
    - many more
    
The `conda` package manager is the only one that fills all three roles.
It is formally a part of the Anaconda Python distribution which is a favorite for those in the fields of data science and machine learning.
`mamba` is a newer, faster rewrite of `conda` that is used in exactly the same way and is highly recommended.

The best way to get started with `mamba` is to install `mambaforge`.
You can find installer downloads for Windows, MacOS, or Linux [here](https://github.com/conda-forge/miniforge#mambaforge).

For Windows, run the `.exe` file once it is downloaded.

For MacOS and Linux, open a terminal and navigate to the download location:
```bash
cd ~/Downloads
```
Then run the installer as follows:
```bash
# Use the name of the file you downloaded; this one is for 64-bit Linux
bash Mambaforge-Linux-x86_64.sh
```
The installer will walk you through a few steps and end by asking if you'd like to "initialize Mambaforge by running conda init?"
Answer yes to this and restart your terminal.
This final step adds `conda` and `mamba` to your system `$PATH` variable, which makes them available in your terminal.
Once restarted, run `mamba -V` to print the version and to verify that the installation worked.
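
The same `$PATH` check can also be done from Python. The following is a minimal sketch that uses only the standard library; the helper name `find_on_path` is made up for this illustration, and the output depends entirely on your own machine.

```python
import shutil


def find_on_path(commands):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


# Report where (or whether) each tool was found
for name, path in find_on_path(["conda", "mamba", "python"]).items():
    print(f"{name:>6}: {path or 'not found on PATH'}")
```

If `conda` or `mamba` is reported as not found, restarting the terminal or re-running the installer's initialization step is a reasonable first thing to try.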

### Environments
The idea of a `conda`/`mamba` environment is that once an environment is created and activated, all new packages added will be added to that environment and will be accessible to any Python program run while the environment is active.
As an example, let's create an environment called `workshop` with a specific version of Python installed.
The following will create the environment and install the desired `python` version:
```bash
mamba create -n workshop python=3.9
```
Once created, we can list our environments via the command
```bash
mamba env list
```
```
# conda environments:
#
base                     /home/user/mambaforge
workshop                 /home/user/mambaforge/envs/workshop
```
Note that there is a "base" environment which is where `conda` and `mamba` themselves are installed as well as their dependencies.
The best practice is to create an environment for each of your projects to minimize dependency issues (when packages depend on other packages and sometimes require separate versions of the same package).

To activate our new environment:
```bash
mamba activate workshop
```
Running `mamba env list` will now show our active environment via an asterisk:
```
base                     /home/user/mambaforge
workshop              *  /home/user/mambaforge/envs/workshop
```
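
One way to confirm from inside Python that the `workshop` environment is really the active one is to inspect the interpreter itself. The sketch below assumes a standard `conda`/`mamba` setup in which activation sets the `CONDA_DEFAULT_ENV` environment variable; the helper name `describe_interpreter` is made up for this illustration.

```python
import os
import sys


def describe_interpreter():
    """Collect the version, install prefix, and conda environment of this Python."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,  # directory the running interpreter lives in
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None outside conda
    }


for key, value in describe_interpreter().items():
    print(f"{key:>9}: {value}")
```

With the environment created above active, the version should start with `3.9` and the prefix should end in `envs/workshop`.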

### Installing packages
Now that we have an active `conda` environment, let's install some common machine learning packages in Python.
It is as easy as writing:
```bash
mamba install numpy matplotlib pandas jupyter scipy scikit-learn scikit-image
```
This command will search the [conda-forge](https://conda-forge.org/#page-top) repository of packages and install the most up-to-date versions.
Hence the `forge` in `mambaforge`.
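
To check that the installation succeeded, one option is to ask Python whether each package can be found. The following is a minimal sketch using only the standard library; the helper names `is_installed` and `check_packages` are made up for this illustration. Note that the name used to install a package is not always the name used to import it: `scikit-learn` is imported as `sklearn` and `scikit-image` as `skimage`.

```python
import importlib.util

# Install name -> import name (they differ for the scikit-* packages)
IMPORT_NAMES = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
}


def is_installed(import_name):
    """Return True if the module can be located (it is not imported)."""
    return importlib.util.find_spec(import_name) is not None


def check_packages(import_names):
    """Map each install name to whether its module is available."""
    return {pkg: is_installed(mod) for pkg, mod in import_names.items()}


for pkg, ok in check_packages(IMPORT_NAMES).items():
    print(f"{pkg:<13} {'ok' if ok else 'MISSING'}")
```

Any package reported as `MISSING` was not installed into the environment that is running the script, which usually means the wrong environment is active.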

:::{.callout-tip}
Either `conda` or `mamba` could be used for all the commands discussed in this section.
However, when installing packages, `mamba` is significantly faster.
:::

Now that these packages have been installed, we can easily use them in an interactive `ipython` prompt (installed with the `jupyter` package):
```bash
ipython
```
```default
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: x = np.linspace(0,10,100)

In [4]: y = np.sin(x)

In [5]: plt.plot(x,y); plt.show()
```
This should plot a simple $\sin$ curve.

### Cleaning up
After we are done using the environment that has our desired version of Python and the needed packages, we can go back to our regular terminal by deactivating the environment:
```bash
mamba deactivate
```
If we have somehow broken our environment and need to remove it:
```bash
mamba env remove -n workshop
```
`conda` and `mamba` provide many more commands and features; further references can be found in the [Appendix](#python-resources-and-packages).

### Google Colab
As an alternative to the entire procedure above, you can use an online [Jupyter Notebook](https://jupyter.org) service hosted by Google called [Colab](https://colab.research.google.com).
This service will get you up and running immediately but cannot save your environment between notebooks.
Thus, if your notebook requires a package that is not installed by default, you will need to add the installation command in one of the first notebook cells.
For example, to install the `reservoirpy` package, we would write in a notebook cell:
```default
!pip install reservoirpy
```
In a notebook, the `!` denotes a terminal command.
The package will now be ready for import and use within the current notebook session:
```default
from reservoirpy.nodes import Input,Reservoir,Ridge
```
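
If the same notebook is sometimes run locally and sometimes on Colab, it can be convenient to install a package only when it is missing. The following is a minimal sketch of that idea, not a Colab feature; the helper name `ensure_installed` is made up for this illustration, and it assumes `pip` is available to the interpreter running the notebook.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(import_name, pip_name=None):
    """Install a package with pip only if it cannot already be imported.

    Returns True if an installation was attempted, False otherwise.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    # Use the notebook's own interpreter so the package lands in the right place
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or import_name]
    )
    return True


# `json` is part of the standard library, so nothing is installed here
print(ensure_installed("json"))  # prints False
```

In one of the first cells of a notebook, `ensure_installed("reservoirpy")` would then play the role of the `!pip install reservoirpy` line above.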


## Basic concepts of machine learning 

## Warming up
Let's use what we have understood so far to fit a simple "machine" to some data.
Given a prescribed model form and some data, we can fit our model using a simple gradient descent algorithm.

### Gradient descent
<!-- TODO: Come up with simple model in which parameters are non-convex -->
<!-- TODO: Simple overview of gradient descent -->
<!-- TODO: Simple animation code to use to generate plots of parameter iterations, should work for momentum version also -->
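
As a placeholder illustration of fitting a prescribed model form to data with gradient descent, the following is a minimal sketch in plain Python. Everything specific in it is an assumption made for the example: the model is a straight line $y \approx ax + b$, the loss is the mean squared error (a convex problem, chosen for simplicity), and the function name `fit_line`, the learning rate `lr`, and the number of `steps` are hypothetical choices.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # initial guess for the parameters
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every data point
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(residual**2) with respect to a and b
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Synthetic data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On this noise-free data the fit recovers $a \approx 2$ and $b \approx 1$. A learning rate that is too large makes the iteration diverge, so `lr` usually needs some tuning for other data.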

### With momentum
<!-- TODO: Simple overview of adding momentum to gradient descent -->
<!-- TODO: nesterov acceleration https://www.codingninjas.com/codestudio/library/nesterov-accelerated-gradient-->
<!-- TODO: RMSProp https://optimization.cbe.cornell.edu/index.php?title=RMSProp -->
<!-- TODO: ADAM (combo of momentum and RMSProp averaging) https://medium.com/analytics-vidhya/momentum-rmsprop-and-adam-optimizer-5769721b4b19 -->
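
As a placeholder illustration of adding momentum, the following is a minimal sketch of classical ("heavy-ball") momentum only; it does not cover Nesterov acceleration, RMSProp, or ADAM. The test function $f(x, y) = x^2 + 10y^2$, the function names, and the values of `lr` and `beta` are assumptions chosen for the example.

```python
def gradient_descent_momentum(grad, x0, lr=0.02, beta=0.9, steps=500):
    """Minimize a function given its gradient, using classical momentum."""
    x = list(x0)
    v = [0.0] * len(x)  # velocity: a decaying sum of past gradient steps
    for _ in range(steps):
        g = grad(x)
        # Keep a fraction beta of the previous velocity, then add the new step
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def grad_f(p):
    """Gradient of f(x, y) = x**2 + 10*y**2, whose minimum is at (0, 0)."""
    return [2.0 * p[0], 20.0 * p[1]]


print(gradient_descent_momentum(grad_f, [5.0, 5.0]))
```

Setting `beta=0` recovers plain gradient descent, which makes it easy to compare the two on the same problem.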

## Conclusion

### Future workshop schedule
To keep your interest after this fairly generic start, the workshops for the semester are planned as follows:

## Appendix

### Python resources and packages
**Python**

- [Python documentation (especially sections 3-9)](https://docs.python.org/3.9/tutorial/introduction.html)
- [Quick cheatsheet of general Python knowledge](https://www.pythoncheatsheet.org)
- [Quicker introduction](https://learnxinyminutes.com/docs/python/)

**Conda/mamba**

- [Conda user guide](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html)
- [Mamba website](https://mamba.readthedocs.io/en/latest/index.html)

**Essential packages**

- [`numpy`](https://numpy.org) - Creating, manipulating, and operating (linear algebra, fft, etc.) on multi-dimensional arrays. A list of packages built on `numpy` for a variety of domains can be found on the homepage under the _ECOSYSTEM_ heading.
- [`scipy`](https://scipy.org) - Fundamentals in optimization, integration, interpolation, differential equations, statistics, signal processing, etc.
- [`matplotlib`](https://matplotlib.org) - Static or interactive plots and animations
- [`scikit-learn`](https://scikit-learn.org/stable/index.html) - Standard machine learning tools and algorithms built on `numpy`, `scipy`, and `matplotlib`
- [`pandas`](https://pandas.pydata.org) - Easily represent, manipulate, and visualize structured datasets (matrices with names for columns and rows)
- [`keras`](https://keras.io) - High level neural network framework built on `tensorflow`
- [`tensorflow`](https://www.tensorflow.org) - In depth neural network framework focused on ease and production
- [`pytorch`](https://pytorch.org) - In depth neural network framework focused on facilitating the path from research to production
- [`scikit-image`](https://scikit-image.org) - Image processing algorithms and tools
- [`jupyter`](https://jupyter.org) - Interactive "notebook" style programming

### Julia as an alternative to Python
Julia is a fairly new language that has mainly been proposed as an alternative to Python and Matlab, though it is a general-purpose language.
Its strength and its weakness is that it is "just-in-time" compiled (meaning your code is automatically analyzed and compiled just before it is run).
A clever language design combined with just-in-time compilation makes Julia as clear to read and write as Python while being much faster.
It can even approach the speed of C when written carefully.
However, the just-in-time compilation and type system remove a chunk of the interactive convenience of Python and its young age also means that it does not have the volume of packages that Python does.

Nonetheless, it is an elegant and high-performance language to use and has shown rapid growth recently.
Packages that are concise, simple, and easy to read and contribute to have been emerging quickly, and the ecosystem already provides many useful tools.
As a result, it is worth describing its installation process, environment management, and notable packages.

#### Installation
The officially supported method of installation for Julia is now using the `juliaup` version manager.
The [installer](https://github.com/JuliaLang/juliaup#windows) can be downloaded from the Microsoft Store on Windows or run on MacOS or Linux with:

```bash
curl -fsSL https://install.julialang.org | sh
```
#### Environments
Julia comes with a standard environment and package manager named [`Pkg`](https://pkgdocs.julialang.org/v1/).
Interestingly, the easiest way to use it is through the Julia REPL (read-eval-print loop), i.e. by running `julia` interactively.
You can do so by typing `julia` into the terminal.
You will then be presented with a terminal interface such as:
```default
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0 (2022-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>
```
Typing `]` will put you into "`Pkg` mode":
```default
(@v1.8) pkg>
```
Type `?` and hit enter to get options in this mode.
We can create and activate a new environment called `workshop` with the command:
```default
(@v1.8) pkg> activate --shared workshop
```
Note that the `--shared` flag will make a "global" environment that can be accessed from any directory.
If we were to leave out this flag, `Pkg` would put a `Project.toml` and `Manifest.toml` file in the current directory that contain the name of the environment, its installed packages, and their dependencies.
This can be useful to easily isolate and share environments.
After running this command, our `Pkg` mode will have changed to represent the active environment:
```default
(@workshop) pkg>
```

#### Installing packages
To install some packages in the active environment, write:
```default
(@workshop) pkg> add Plots MLJ DataFrames Flux Pluto
```
These packages will install and precompile.
To test one of them, press backspace to leave `Pkg` mode and input:
```default
julia> using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
julia> x = range(0,10,100);
julia> y = sin.(x);
julia> plot(x,y)
```
This should show a plot of a simple $\sin$ curve.
Note that the precompilation of `Plots` took some time.
However, this will not need to occur again until the package is updated.
Also note that the call to `plot(x,y)` took some time.
This is due to the just-in-time compilation.
Now that the compilation has been done for inputs of the types of `x` and `y`, if you run `plot(x,y)` again, it should be almost instantaneous.

#### Cleaning up
To deactivate the environment, enter the `Pkg` mode again by pressing `]` on an empty line, then enter:

```default
(@workshop) pkg> activate
```
To delete the environment we created, delete the environment folder at the location printed when it was created.
This is usually `~/.julia/environments/workshop` on MacOS or Linux.

#### References

**Julia**

- [Julia documentation](https://docs.julialang.org/en/v1/)
- [Quick cheatsheet of Julia](https://juliadocs.github.io/Julia-Cheat-Sheet/)
- [Comparison of the syntax of Julia, Python, and Matlab](https://cheatsheets.quantecon.org)

**Packages**

As compared to Python, Julia has many scientific computing tools built into its standard library.
Thus, much of the functionality found in `numpy` is available by default.
On the other hand, because of the interoperability of the language and the reduced need for a polyglot codebase (i.e. needing C and Fortran code for a Python package to be fast), packages are usually much smaller modules in Julia.
For example, the functionality of the `scipy` package in Python can be found spread across possibly a dozen different packages in Julia.
This makes it convenient to load and use only what you need, but it may take more searching to find the right package, and the interfaces may not be standardized.
The following are some packages that roughly recreate the essential Python packages [here](#python-resources-and-packages). 

- `numpy` - [Standard library](https://docs.julialang.org/en/v1/manual/arrays/), [`FFTW.jl`](https://juliamath.github.io/FFTW.jl/latest/)
- `scipy` - `Statistics.jl`
- `matplotlib` - [`Plots.jl`](https://docs.juliaplots.org/latest/)
- `scikit-learn` - [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/dev/)
- `pandas` - [`DataFrames.jl`](https://dataframes.juliadata.org/stable/)
- `keras`, `tensorflow`, `pytorch` - [`Flux.jl`](https://fluxml.ai/Flux.jl/stable/)
- `scikit-image` - [`Images.jl`](https://juliaimages.org/latest/install/)
- `jupyter` - [`Pluto.jl`](https://github.com/fonsp/Pluto.jl) although you can use Julia with Jupyter via [`IJulia.jl`](https://julialang.github.io/IJulia.jl/stable/)