*This notebook is part of the course materials for CS 345: Machine Learning Foundations and Practice at Colorado State University.
Original versions were created by Asa Ben-Hur with updates by Ross Beveridge.
The content is available [on GitHub](https://github.com/asabenhur/CS345).*

*The text is released under the [CC BY-SA license](https://creativecommons.org/licenses/by-sa/4.0/), and code is released under the [MIT license](https://opensource.org/licenses/MIT).*


<a href="https://colab.research.google.com/github//asabenhur/CS345/blob/master/fall22/notebooks/module00_01_intro.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# CS 345 Course Introduction


### Course Basics

**Instructor:** Asa Ben-Hur

The syllabus is on the [course website](https://www.cs.colostate.edu/~cs345/).

### What is machine learning

**Machine learning:**  the construction and study of systems that learn from data.

It is a different way of solving problems.

* **Traditional programming**:  you have to tell the computer explicitly how to transform an input to the desired output.
* The **machine learning** approach:  you give the computer the ability to learn without being explicitly programmed.

How do we do that?

By showing the machine examples of how to map an input to an output.  The computer is not explicitly programmed to solve the problem.  It learns to do that directly from the data.
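
To make the contrast concrete, here is a minimal sketch in plain Python. The task (labeling temperatures as hot or cold), the data, and all function names are invented for illustration, and the "learning" step is a deliberately simple threshold rule rather than an algorithm from this course.

```python
# Traditional programming: the programmer writes the rule by hand.
def is_hot_rule(temp_c):
    return temp_c >= 25  # threshold chosen by the programmer


# Machine learning: the rule is derived from labeled examples.
def learn_threshold(temps, labels):
    """Return the midpoint between the warmest 'cold' and the coolest 'hot' example."""
    warmest_cold = max(t for t, hot in zip(temps, labels) if not hot)
    coolest_hot = min(t for t, hot in zip(temps, labels) if hot)
    return (warmest_cold + coolest_hot) / 2


# Hypothetical training data: temperatures (Celsius) with hot/cold labels.
temps = [5, 12, 18, 27, 31, 35]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(temps, labels)


def is_hot_learned(temp_c):
    return temp_c >= threshold


print(threshold)           # 22.5
print(is_hot_rule(24))     # False
print(is_hot_learned(24))  # True
```

The hand-written rule and the learned rule disagree on 24 degrees because the learned threshold comes from the examples, not from the programmer.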

The term "machine learning" was coined in 1959 by [Arthur Samuel](https://en.wikipedia.org/wiki/Arthur_Samuel), an early pioneer in this area.


### A Classic Example of Supervised Learning

Example problem:  handwritten digit recognition.

Some examples from the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database):

<img style="padding: 10px; float:center;" alt="MNIST dataset by Josef Steppan CC BY-SA 4.0" src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" width="350">
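
As a rough sketch of what learning from examples looks like for this problem, the code below trains a nearest-neighbor classifier with scikit-learn. It assumes scikit-learn is installed, and it uses the library's small built-in digits dataset (8x8 images) rather than MNIST itself. The choice of classifier, the number of neighbors, and the train/test split are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1797 grayscale images of size 8x8, flattened to 64 features each
X, y = load_digits(return_X_y=True)

# hold out a quarter of the examples to measure accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# the classifier labels a new image by looking at its closest training examples
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(f"accuracy on held-out digits: {accuracy:.3f}")
```

No rule for recognizing a "3" or a "7" appears anywhere in the code; the classifier relies entirely on the labeled examples.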

### Applications of machine learning

* **Computer vision**: character recognition / face recognition
* **Natural language processing**
* **Speech recognition**
* **Recommender systems**: Netflix, Amazon
* **Basic science**: biology, chemistry, physics, math
* **Applied science**:  health care, self-driving cars
* **Games**:  Until the arrival of AlphaGo, the game of Go was "unsolved" by AI; here's a very nice [documentary](https://www.youtube.com/watch?v=WXuK6gekU1Y) that describes the matches between AlphaGo and Lee Sedol.


### Some necessary background

Machine learning is an interdisciplinary field that requires background in multiple areas:

* Linear algebra for working with vectors and matrices
* Statistics and probability for reasoning about uncertainty
* Calculus for optimization
* Programming for efficient implementation of the algorithms


### Course objectives

The machine learning toolbox:

* Formulating a problem as an ML problem
* Understanding a variety of ML algorithms
* Running and interpreting ML experiments
* Understanding what makes ML work – theory and practice


### Python

Why Python?

<img style="float: right;" src="https://www.python.org/static/community_logos/python-logo.png" width="200">

* A concise and intuitive language
* Simple, easy to learn syntax
* Highly readable, compact code
* Supports object-oriented and functional programming
* Strong support for integration with other languages (C, C++, Java)
* Cross-platform compatibility
* Free
* Makes programming fun!

**We assume you already know the basics of Python**. That said, most of us can benefit from a little review, so we will do a quick guided walkthrough of some Python basics in the following [notebook](https://github.com/asabenhur/CS345/blob/master/fall22/notebooks/module00_02_python_intro.ipynb).
You may also want to take advantage of: [A Whirlwind Tour of the Python Language](https://github.com/jakevdp/WhirlwindTourOfPython).

### Class poll - experience with Python

### Why Python for machine learning?

In the earliest days of computer science, there was a competition between the first high-level programming languages, namely [Fortran](https://en.wikipedia.org/wiki/Fortran) and [Lisp](https://en.wikipedia.org/wiki/Lisp_(programming_language)). This history is important because it highlights that the needs of AI programmers have historically differed significantly from those in other areas of CS, and Lisp accommodated these differences.

In recent years Python has emerged as the language of choice for machine learning and data science.
Indeed, for those not fond of many nested parentheses, Python will be an improvement. Here are a few of the aspects of Python that make it a great choice for the needs of machine learning and data science:

* An interpreted language – allows for interactive data analysis
* Libraries for plotting and vector/matrix computation
* Many machine learning packages available:  scikit-learn, TensorFlow, PyTorch
* Language of choice for many ML practitioners (other options: R)

As an example, below is a sample visualization showing different ways to classify/label data using scikit-learn, plotted with Matplotlib.

![image](https://scikit-learn.org/stable/_images/sphx_glr_plot_classifier_comparison_001.png)


### The tools we will cover in this course:

* ``NumPy``:  highly efficient manipulation of vectors and matrices
* ``Matplotlib``: data visualization
* ``scikit-learn``:  the "standard" machine learning package in Python
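
As a small taste of the first of these tools, the sketch below uses NumPy to compute a matrix-vector product and a vector norm. It assumes NumPy is installed, and the array values are made up for illustration.

```python
import numpy as np

# a 2x3 matrix and a vector of length 3 (made-up values)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# matrix-vector product, computed without an explicit Python loop
y = A @ x
print(y)  # [-2. -2.]

# Euclidean length of x
print(np.linalg.norm(x))
```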

### Python version and Anaconda

<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/en/c/cd/Anaconda_Logo.png" alt="drawing" width="150"/>

Use version 3.X of Python.
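
One quick way to check which version a given interpreter or notebook kernel is running is the standard-library `sys` module. This is only a sketch; the wording of the error message is our own.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
print(sys.version_info.major, sys.version_info.minor)

# stop early with a clear message if this is not Python 3
assert sys.version_info.major == 3, "This course requires Python 3"
```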

If setting up Python on your personal machine, we recommend the [Anaconda](https://www.anaconda.com/distribution/) Python distribution, which is a data-science-oriented distribution that includes all the tools we will use in this course. Note that the smaller, lighter [Miniconda](https://docs.conda.io/en/latest/miniconda.html) may serve you just fine.

A discussion thread for help installing Anaconda on your machine of choice is now up on MS Teams.


### IPython and the Jupyter Notebook

The Jupyter notebook is a browser-based interface to the ``IPython`` Python shell.
In addition to executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic visualizations, mathematical equations, and much more. 
**It is the standard way of sharing data science analyses.**

<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/commons/3/38/Jupyter_logo.svg" width="100">


To invoke the Jupyter notebook, use the command:

```bash
jupyter notebook
```

which brings up the Jupyter notebook browser.  To open a specific notebook:

```bash 
jupyter notebook notebook_name.ipynb
```

### Class Poll - have you been to Jupyter before?

### Google Colab

<img style="float: right;" src="https://colab.research.google.com/img/colab_favicon_256px.png" alt="drawing" width="100"/>


Please also consider setting yourself up to use [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb). As I am demonstrating during this lecture, the notebooks for the class can be opened directly from the GitHub repository for the course by clicking the "Open in Colab" badge.  If using Colab, make sure to copy the notebook to your Google Drive so that any changes you make will be saved.
Jupyter notebook and Google Colab are entirely compatible with each other, and have a very similar interface.

### A Taste of Jupyter

There are two primary types of cells in Jupyter:

In [6]:
# this is a code cell
print("Hello world!")

Hello world!


This is a **markdown** cell.  You can type *text* and good looking equations.  Double click on a markdown cell in order to see the markup.

You can have headings of various levels:

### This is a heading

As well as lists (bulleted or numbered):

1. Numbered item
* Bulleted item

Equations can be created using [LaTeX](https://www.latex-project.org/) without the need for an equation editor:

$$f(x) = \frac{1}{2\pi} e^{-2 x^2 / \sigma^2}$$

Here is a nice [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) that lists markdown syntax.

Next let us explore the use of code cells:

In [3]:
2 + 3

5

You can run shell commands:

In [4]:
!ls

module00_01_intro.ipynb
module00_02_python_intro.ipynb
module00_03_numpy.ipynb
module00_04_matplotlib.ipynb
module01_01_intro.ipynb
module01_01_labeled_data.ipynb
module01_02_linalg_meet_geometry.ipynb
module01_03_probability_likelihood.ipynb
module02_01_hyperplanes.ipynb
module02_02_perceptron.ipynb
module02_03_support_vector.ipynb
module02_04_nearest_neighbors.ipynb
module02_05_nn_and_pca.ipynb
module03_01_nn_to_regression.ipynb
module03_02_linear_regression.ipynb
module03_03_derivatives_regression_outliers.ipynb
module03_04_multivariate_linear_regression.ipynb
module03_05_linear_regression_gradient_descent.ipynb
module04_01_polynomial_basis_regression.ipynb
module04_02_regularization.ipynb
module05_01_cross_validation.ipynb
module05_02_quantifying_mistakes.ipynb
module06_01_neural_networks_mlp.ipynb
module06_02_neural_networks_keras.ipynb
module06_03_neural_networks_mnist.ipynb
module06_03a_faiss_knn_mnist.ipynb
module06_04_cnns_cifar10.ipynb
module07_01_decision_trees.ipynb
module0

The `%` sign is used for *magic commands*, which are IPython shell commands.  For example, to find out what other magic commands there are:

In [5]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

The `%timeit` magic is one that we will use quite regularly.  Let's learn what it's for:

In [6]:
%timeit?

Docstring:
Time execution of a Python statement or expression

Usage, in line mode:
  %timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] statement
or in cell mode:
  %%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] setup_code
  code
  code...

Time execution of a Python statement or expression using the timeit
module.  This function can be used both as a line and cell magic:

- In line mode you can time a single-line statement (though multiple
  ones can be chained with using semicolons).

- In cell mode, the statement in the first line is used as setup code
  (executed but not timed) and the body of the cell is timed.  The cell
  body has access to any variables created in the setup code.

Options:
-n<N>: execute the given statement <N> times in a loop. If <N> is not
provided, <N> is determined so as to get sufficient accuracy.

-r<R>: number of repeats <R>, each consisting of <N> loops, and take the
best result.
Default: 7

-t: use time.time to measure the time, which is the default on U
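
Outside a notebook, a similar measurement can be made directly with the standard-library `timeit` module that the docstring above mentions. The sketch below mirrors the `-n` and `-r` options; the statement being timed and the chosen counts are arbitrary examples.

```python
import timeit

# Time an arbitrary statement: 5 repeats (like -r 5), each running
# the statement 1000 times in a loop (like -n 1000).
timings = timeit.repeat("sum(range(1000))", repeat=5, number=1000)

# Each entry is the total time in seconds for one loop of 1000 executions.
best_per_call = min(timings) / 1000
print(f"best of 5: {best_per_call * 1e6:.2f} microseconds per call")
```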

Now try this:

In [7]:
import antigravity

### Mastering the Jupyter notebook

To be more productive in using notebooks, I highly recommend exploring the notebook keyboard shortcuts.
Here is a useful [blog post](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) that provides a detailed overview.
You will also need to know the basics of Markdown syntax.

One of the nice features of the Jupyter notebook is that it supports writing mathematical equations using LaTeX.  Here is another example of what you can do with LaTeX:

$$
\sum_{i=1}^N x_i^2 + \alpha 
$$


And here is the markup that generated this formula:
```latex
$$
\sum_{i=1}^N x_i^2 + \alpha
$$
```
All LaTeX commands are preceded by a `\`, and as you can see, it is quite intuitive!