Available at http://www.comp.nus.edu.sg/~cs3244/1910/02.colab

![Machine Learning](https://www.comp.nus.edu.sg/~cs3244/1910/img/banner-1910.png)
---
See **Credits** below for acknowledgements and rights.  For NUS class credit, you'll need to do the corresponding _Assessment_ in [CS3244 in Coursemology](http://coursemology.org/courses/1677) by the respective deadline (as in Coursemology). 

**You must acknowledge that your submitted Assessment is your independent work, see questions in the Assessment at the end.**


**Learning Outcomes for Week 02**

After finishing these exercises and watching the videos, you should be able to:
* Concept Learning:
  * Have a basic understanding of concepts (as target functions), hypotheses, attributes, and hypothesis spaces.
  * Understand the inductive learning bias and assumption, and how they result in feasible learning.
  * Understand the basic methods for concept learning using the Find-S, Rote-Learner, and Candidate-Elimination algorithms.
* Review of Jupyter notebook
  * Understand the role of Python, Jupyter Notebook, and Google Colab.
  * Understand the documentation methods in notebooks.
  * Be aware of the execution of notebooks and some common mistakes.
* Review of the Mathematical Foundations of Machine Learning
  * Review basic linear algebra and probability needed for this course.

_Welcome to the Week 02 Python notebook._ This week is a special week because we are learning terminology and recapping background for you.  As such, the video lectures and the notebook aren't going to cover the same things.  We introduce **concept learning** in the lecture videos, and will be reviewing this material in Tutorial 2, which will combine both this week (Week 2) and Week 3's material.  

But in this notebook, we'll be doing something completely different.  We'll be examining the fundamentals for the programming / experimental component through Jupyter notebook in the Pre-tutorial Work, and then reviewing your math fundamentals in the Post-tutorial Work.

# Week 02: Pre-tutorial Work

## 1 Introduction to Python notebook

In this section, we will have a brief introduction to the Python notebook environment, so as to level out the expertise on this important technology among all of our students.  This week you don't have a Pre-tutorial Assessment to copy into Coursemology, so the work is a bit lighter this week.

### .a What is Jupyter Notebook?

As a server-client application, the Jupyter Notebook application allows you to edit and run your Python notebooks via a web browser. The application can be executed on a PC without Internet access, or it can be installed on a remote server, where you can access it through the Internet.



Its two main components are the $kernels$ and a $dashboard$.

* A $kernel$ is a program that runs and introspects the user’s code. The Jupyter Notebook App has a kernel for Python code, but there are also kernels available for other programming languages.

* The $dashboard$ of the application not only shows you the notebook documents that you have made and can reopen but can also be used to manage the kernels: you can know which ones are running and shut them down if necessary.

As discussed in W1's class, Jupyter can be slightly tricky to set up on a local computer; so for our class, we let Google do the work for us.  They have set up a web-based environment that replicates the Jupyter notebook environment, which we are heavily using in this class.  This web layer, [Google Colab](https://colab.research.google.com/notebooks/welcome.ipynb), allows us to use Jupyter notebook in web browsers directly without the need for installation **and** take advantage of some of Google's compute resources for free.  

For the more technically savvy or interested, you may want to set up the Jupyter environment on your own computer so you can use Jupyter and Python offline.

### .b Getting started with Python Notebooks

A Python Notebook is made up of a number of **cells**. There are two main types of cells: *text* and *code*. In *text* cells, you can write plain text, `Markdown` or even `LaTeX`. In the *code* cells, you can write code blocks (obviously in Python!). You can execute a code cell by clicking on it and pressing `shift`+`enter`. When you do so, the code in the cell will run, and the output of the cell will be displayed beneath the cell. Let's look at the following images to better understand how you can execute a code block.

#### Step 1

Create a code cell and write any python code you wish to execute. Let's say we write the following code:

<img src="https://www.comp.nus.edu.sg/~cs3244/1910/img/intro_to_colab_01.png" width="800">

#### Step 2
Now, to execute the code block, press the play button as shown in the image below. Once you run the code cell, its output is displayed beneath it. After running, the notebook looks like this:

<img src="https://www.comp.nus.edu.sg/~cs3244/1910/img/intro_to_colab_02.png" width="800">

#### Step 3
If you define a variable at the top level of a cell, it becomes a **Global Variable** once that cell has been executed. Global variables are shared between all cells after execution, including cells that appear above the one where the variable was defined. Executing the second cell thus gives the following result:

<img src="https://www.comp.nus.edu.sg/~cs3244/1910/img/intro_to_colab_03.png" width="800">

#### A Common Mistake
By convention, the notebooks are expected to be run from top to bottom. Forgetting to execute some cells in between or executing them out-of-order can result in $\color{red}{errors}$.  For example, if we don't run the first cell which calculates the value of $x$, then we will get some error like this:

<img src="https://www.comp.nus.edu.sg/~cs3244/1910/img/intro_to_colab_04.png" width="800">

This out-of-order execution is meant to be a "feature" but can cause headaches for many when replicating errors.  You may sometimes want to start over by resetting the runtime (from the `runtime` menu item).
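
A minimal sketch of the same failure, reproducible in a single cell: it reads a name that no cell has defined yet and catches the resulting `NameError`. The variable name `undefined_total` is hypothetical, chosen only for illustration.

```python
# Simulate using a variable before the cell that defines it has been run.
# `undefined_total` is a hypothetical name that has never been assigned.
try:
    print(undefined_total + 1)
except NameError as err:
    message = str(err)
    print("NameError:", message)
```

Running the defining cell first (or re-running the notebook from the top) is usually enough to make this kind of error go away.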

#### Save your notebook
After you have modified a notebook for one of the assignments by editing or executing some of its cells, remember to **save your changes**!

<img src="https://www.comp.nus.edu.sg/~cs3244/1910/img/intro_to_colab_05.png" width="500">

### .c Code

Great!  Now we have the basic idea of how a notebook works. Let's get ourselves better acquainted by running code. Let's follow along on this exercise.

As with plain `.py` scripts, we can import whatever library we want for our task:

In [0]:
## You can start with importing the necessary libraries for your code
# Here we are importing two important libraries, pandas and numpy
# - pandas helps with data analytics and data science, organizing data into _data frames_, often denoted with a variable "df"
# - numpy helps with numerical calculations, especially in our case, with linear algebra and general matrix operations 
import pandas as pd
import numpy as np

In [0]:
## After that, you can add, remove or edit code cells according to your needs
# 
# Here we are creating a data frame using the pandas library on a numpy array that is declared inline.
df = pd.DataFrame(data=np.array([[1,2,3],[4,5,6]],dtype=int), columns=['A','B','C'])

In [0]:
## Show the dataframe
#
# We ask the data frame to print itself out for our inspection
df

In [0]:
# Don't forget to insert explanatory text or titles and subtitles to clarify your code.  
# TAs and staff will be reading your code and assigning grades for clarity

### .d Text

Text cells can either follow regular text format or [`Markdown`](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) format.

**Pro Tip**: Jupyter notebook also supports `LaTeX` math notation through the MathJax library. If you want to insert `LaTeX` in your text cells, put your `LaTeX` math within $ delimiters, just like the below.  **N.B.** You can double-click the formula to view the raw input, just as with any editable Jupyter notebook:

$c = \sqrt{a^2 + b^2}$

You can also choose to `display()` your `LaTeX` output:

In [0]:
from IPython.display import display, Math, Latex
display(Math(r'\sqrt{a^2 + b^2}')) 

### .e 'Magical Commands' in Python Notebook

There are some predefined **‘magic functions’** that will make your work a lot more interactive.

To see which magic commands you have available in your interpreter, you can simply run the following:

In [0]:
%lsmagic

If you're looking for more information on the magic commands or on functions, you can always use the "?" (question mark), just like this:

In [0]:
# Retrieving documentation on the alias_magic command
?%alias_magic

In [0]:
# Retrieving information on the range() function
?range

Note that magic commands come in two forms. A line magic starts with "%" and applies to a single line; a cell magic starts with "%%" and applies to the whole cell, so use it for multi-line expressions. The following example illustrates the difference between the two:

In [0]:
%time x = range(100)

In [0]:
%%timeit x = range(100)
max(x)
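
Magic commands only work inside IPython-based environments such as Jupyter and Colab. As a rough plain-Python counterpart, the sketch below uses the standard-library `timeit` module to time the same statement; the repetition count of 10,000 is an arbitrary choice for illustration.

```python
import timeit

# Setup code runs once and is not timed, like the first line of a %%timeit cell.
setup = "x = range(100)"
# The statement is what gets timed, like the body of the cell.
stmt = "max(x)"

# Run the statement 10,000 times and report the total elapsed time in seconds.
elapsed = timeit.timeit(stmt, setup=setup, number=10_000)
print(f"10,000 runs of max(x) took {elapsed:.4f} s")
```

The measured time will differ from machine to machine and from run to run, so treat it as an estimate rather than an exact figure.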

### .f Tips To Use Python Notebook Effectively and Efficiently

When you write code in a Python notebook:
- Provide comments and documentation to your code, which will improve the readability of your code
- Use a consistent naming scheme, group related code, limit your line length, etc.
- Refactor your code if necessary

When sharing your notebook with others:
- Try to keep the cells of your notebook simple: don't exceed the width of your cell and make sure that you don't put too many related functions in one cell.  
- If possible, import your packages in the first code cell of your notebook.
- Display the graphics inline by using the magic command `%matplotlib inline`.

#### Since Python Notebook will be one of the main tools used in CS3244 assessments, it's highly encouraged that you search online and find out more about it. Make sure you are familiar with this tool 😎😎

**N.B.** The content of this part of the notebook is adapted from the *Jupyter Notebook* tutorial in [DataCamp Community](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook).

---
# Week 02: Post-tutorial Work

You will watch additional videos on **concept learning** in the post-tutorial videos to finish up our understanding of concept learning.

**Your Turn (Question 1)**: What should you do if you find your code running for an unexpectedly long time?

Choose from: _Leave it, stop running this cell, shut down the kernel_

**Your Turn (Question 2)**: What should you do if you find the notebook not responding?

Choose from: _Leave it, close the page, stop running the notebook_

## 2 Mathematical Prerequisites

This week we want you to devote some time to reviewing the prerequisites for this course.  Everyone has slightly different foundations coming into this course, so as with the Jupyter notebook tutorial, it's important to level out everyone's basic expertise with respect to our mathematical foundation as well.  

We've also uploaded a mathematical review .pdf file for your self review.  If you have questions or aren't familiar with some of the concepts there, please come to our help session during the normal class time. 

**Linear Algebra Review**.
* For those who have a good grasp of Linear Algebra already, you may just want to refresh yourself using [Chapter 2 Linear Algebra](http://www.deeplearningbook.org/contents/linear_algebra.html) of the Goodfellow et al. _[Deep Learning](http://www.deeplearningbook.org/)_ book (The book itself is very good, we'd recommend it for any of you wanting to learn more about deep learning, plus it's freely accessible online).
* If you're a little shakier with your LA foundations, you can work your way back up.  You should be familiar with eigendecomposition from your previous linear algebra course.  Have a look at 3Blue1Brown's [Essence of Linear Algebra](http://3b1b.co/eola) playlist.  This is a beautiful series of videos that have been [animated in Python](https://github.com/3b1b/manim) (go Python!) by its creator.  Watch what you need to understand the penultimate [Eigenvectors and eigenvalues](https://www.youtube.com/watch?v=PFDu9oVAE-g&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=14).  As 3Blue1Brown states, Eigen-"things" are not too difficult to understand once you have a solid foundation on linear transformations, determinants, linear systems and change of basis.
* Review the textbook from _[Immersive Math](http://immersivemath.com/ila/)_.  This is a recent, online textbook for linear algebra that is close to finished.  Many of the linear algebra concepts will be much better explained by this textbook through its 3D interactive animations.
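
As a quick self-check on eigendecomposition, here is a minimal sketch using `numpy.linalg.eig`. The 2×2 matrix `A` is an arbitrary example chosen for illustration; it verifies the defining property $A v = \lambda v$ for each eigenpair and then rebuilds $A$ from its eigenvectors and eigenvalues.

```python
import numpy as np

# A small symmetric matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Each column of `eigenvectors` pairs up with the matching entry of `eigenvalues`.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check the defining property A v = lambda v for each eigenpair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, np.allclose(A @ v, lam * v))

# Rebuild A from its eigendecomposition: A = V diag(lambda) V^{-1}.
reconstructed = eigenvectors @ np.diag(eigenvalues) @ np.linalg.inv(eigenvectors)
print(np.allclose(reconstructed, A))
```

If any step here is surprising, that is a good signal for which videos in the playlist to revisit.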

**Probability and Statistics Review**
Probability and statistics are inverse processes between data and a model.  _Probability_ is the process of analyzing a model to discover what data it will generate; _Statistics_ uses data to generate a model.  The second definition is close to what we mean by ML, so it's no surprise that statistics is a primary contributor to the ML world; in fact, statistics treats these ideas in much more rigorous detail than we will in this course.   
* It's good to have a good grasp of the [normal distribution](http://www.statisticshowto.com/probability-and-statistics/normal-distributions/).  I'd suggest being familiar with its properties.  These will be handy when we talk about random noise affecting learned models.
* For this lecture, in which we discuss Bayes' Rule in part, you should understand the different parts that make up the posterior, and how it can be calculated from the data likelihood, the marginal and the prior.
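
To make the pieces of Bayes' Rule concrete, here is a minimal sketch for a single hypothesis $h$ and observed data $D$, using $P(h \mid D) = P(D \mid h)\,P(h) / P(D)$. All of the numbers and variable names below are made up for illustration; they do not come from the lecture.

```python
# Hypothetical numbers, chosen only to illustrate the calculation.
prior = 0.01                 # P(h): belief in h before seeing the data
likelihood = 0.90            # P(D | h): probability of the data if h is true
likelihood_not_h = 0.05      # P(D | not h): probability of the data if h is false

# Marginal P(D): total probability of the data under both cases.
marginal = likelihood * prior + likelihood_not_h * (1 - prior)

# Posterior P(h | D): likelihood times prior, normalised by the marginal.
posterior = likelihood * prior / marginal

print(f"P(D) = {marginal:.4f}")
print(f"P(h | D) = {posterior:.4f}")
```

Try changing the prior while keeping the likelihoods fixed to see how strongly the posterior depends on it.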

**Your Turn (Question 3)**: Did you actually review the resources above to your satisfaction?

Choose from: _Yes I did, No I didn't, I wanted to . . ., I did a cursory review_

### .a Quiz Time (hidden until you open it) 
Now that you think you've done an adequate review, let's have a mock test.    When you're ready, open up the below section and take the mini-quiz.  Resist the temptation to look up answers on the Web.  But don't forget to take this mini-quiz before turning in your post-tutorial notebook, ok?  

Again, you can do the submission as many times as you like until you have a version that you're happy with, then finalize your submission.

**Your Turn (Question 4)**: Is matrix multiplication associative?  Is it commutative?   

Choose from: _Yes to both, No to both, Yes to associative and no to commutative, Yes to commutative and no to associative_

**Your Turn (Question 5)**: Explain what the L2 norm is, in layman terms.

_Replace with your answer_

**The _Monty Hall Problem_**.  

<img src="https://www.comp.nus.edu.sg/~cs3244/1910/img/Monty_open_door.svg" width="600">

(photo credits: in the public domain, authored by user Cepheus @ _Wikipedia_)

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?"  

**Your Turn (Question 6)**: Is it to your advantage to switch your choice?

Choose from: _Yes, No, It depends_

**Your Turn (Question 7)**: From the above Question 6, why or why not? Justify, as you would to a peer in the class.  

You may find it useful to use [LaTeX math notation](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html), which is probably easier learned from examples than from the tutorial linked.  You can see examples of the source for the equations we use in our notebooks simply by double-clicking cells in your copy of the notebook.

_Replace with your answer_

Some of you may already be familiar with this problem from classes before university.  [Monty Hall](https://en.wikipedia.org/wiki/Monty_Hall) was a famous TV personality who hosted the game show _Let's Make a Deal_ from the 1960s to the 1980s.  He offered versions of this problem to contestants.  This problem flummoxed many members of the public, including mathematics Ph.D. holders, many of whom wrote in to [Marilyn vos Savant](https://en.wikipedia.org/wiki/Marilyn_vos_Savant), the columnist who offered the correct solution to the problem, to tell her she was wrong.  Famously, even [Paul Erdős](https://en.wikipedia.org/wiki/Paul_Erd%C5%91s) reportedly remained unconvinced at first.
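
Once you have committed to your own answers, one way to check them empirically is a small simulation. The sketch below is one possible setup, not the only one: it assumes the host always opens a goat door other than your pick, and chooses at random when two such doors are available. The function name `play`, the seed, and the number of trials are all arbitrary choices for illustration.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(3244)  # fixed seed so reruns give the same estimates
trials = 10_000
stay_rate = sum(play(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play(True, rng) for _ in range(trials)) / trials
print(f"stay: {stay_rate:.3f}  switch: {switch_rate:.3f}")
```

A simulation only estimates the win rates; your justification in Question 7 should still explain why they come out the way they do.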


**N.B.** If you got stuck on any of these problems, feel free to look them up on the Web (don't forget to give credit in your _Declaration_), and discuss them on the discussion forum so that everyone can benefit.  You can also come to the in-class help session, where some of our staff will be able to help you through parts of this notebook and the related materials.

# Credits


Authored by Kong Zijin, [Min-Yen Kan](http://www.comp.nus.edu.sg/~kanmy) and Mohammad Neamul Kabir (2019), affiliated with [WING](http://wing.comp.nus.edu.sg), [NUS School of Computing](http://www.comp.nus.edu.sg) and [ALSET](http://www.nus.edu.sg/alset). Inspired in part by Andrew Ng's Coursera course and Yaser S. Abu-Mostafa's Caltech course.
Licensed as: [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0).
Please retain and add to this credits cell if using this material as a whole or in part.   Credits for photos given in their captions.