# Week 1: Introduction to Machine Learning

## 1. What is machine learning?

### 1.1. Common Sense Explanation

Machine-learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction. The algorithms also adapt in response to new data and experiences to improve efficacy over time - <b>McKinsey</b>

### 1.2. Classical Explanations

The "classic" definition is by Tom M. Mitchell (a famous computer scientist):

<i>“A computer program is said to learn from experience <b>E</b> with some class of tasks <b>T</b> and performance measure <b>P</b> if its performance at tasks in <b>T</b>, as measured by <b>P</b>, improves with experience <b>E</b>.”</i>

For example, consider playing Breakout:

* <b>E</b> = the experience of playing many games of Breakout


* <b>T</b> = the task of playing Breakout


* <b>P</b> = the probability that the program will win the next game of Breakout.

Other definitions include:

- “Machine Learning at its most basic is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” – <b>Nvidia</b> 


- “Machine learning is the science of getting computers to act without being explicitly programmed.” – <b>Stanford</b>


- “Machine learning is based on algorithms that can learn from data without relying on rules-based programming.” – <b>McKinsey & Co.</b>


- “Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.” – <b>University of Washington</b>


- “The field of Machine Learning seeks to answer the question ‘How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?’” – <b>Carnegie Mellon University</b>

-----------------

## 2. What does Machine Learning achieve?

Machine Learning provides <b>predictions</b> and <b>prescriptions</b>.  For example, see the diagram below from McKinsey's Executive Guide to AI:

<img src="../Images/desPresPred.png" width=80%/>

-----------------

## 3. Machine Learning vs. Traditional Programming

<img src="../Images/tradvsml.png" width=80%/>

-----------------

## 4. Types of Machine Learning

There are <b>three</b> kinds of machine learning:
    
1. Supervised Learning


2. Unsupervised Learning


3. Reinforcement Learning

-----------------

### 4.1. Supervised Learning

<b>What it is:</b> an algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output (eg, how the inputs “time of year” and “interest rates” predict housing prices).

<img src="../Images\supervisedLearning.png" width=20%>

<b>When to use it:</b> you know how to classify the input data and the type of behavior you want to predict, but you need the algorithm to calculate it for you on new data.

<b>Goal:</b> to train a system how to map a given input (X) to a corresponding output (Y).

<b>Dataset:</b> a dataset of input data (X) and corresponding output data (Y).  For instance, it could be a table summarising that for N number of houses, the square footage (X), and corresponding house price (Y) for each house. This is an example of a Labelled Dataset.  

<b>How:</b> the system passes the Labelled Dataset through an algorithm to iteratively generate the resulting mathematical function that best maps a given input (X) to a corresponding output (Y) with the desired degree of accuracy.  As this process is iterative, a human is often in the loop to add or remove data from the dataset, or correct machine generated outputs to provide additional feedback to improve the mathematical function being approximated by the system and therefore its accuracy.
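As a minimal sketch of the idea (not a definitive implementation), the snippet below fits a straight line to a made-up Labelled Dataset of five houses. The figures and the names `fit_line` and `predict_price` are assumptions for illustration, and it uses the closed-form least-squares solution rather than an iterative procedure to keep the example short.

```python
# Hypothetical Labelled Dataset: square footage (X) and price (Y) for N = 5 houses.
square_feet = [800.0, 1000.0, 1200.0, 1500.0, 2000.0]
prices = [160000.0, 200000.0, 240000.0, 300000.0, 400000.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(square_feet, prices)


def predict_price(sq_ft):
    """Map a new input (X) to a predicted output (Y) using the fitted line."""
    return intercept + slope * sq_ft


print(f"Predicted price for 1,800 sq ft: {predict_price(1800.0):.0f}")
```

In practice the data would be noisy and have many more inputs than square footage, which is where iterative fitting and human feedback come in.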

<b>Sub-types:</b>

1. <b>Regression:</b> where the output variable is a real and continuous value, e.g. price, height, temperature etc.


2. <b>Classification:</b> where the output variable is a discrete value such as a category or group, e.g. "spam vs. not spam", "black vs. white" etc.

<b>Example:</b> the ability to tag friends and family in Facebook photos has been ‘learned’ from users’ tagging of photos (including yours!) to approximate the relationships of pixels in each photo to the tags that users apply, e.g. “Alistair” vs. “Not Alistair”.

------------


### 4.2. Unsupervised Learning

<b>What it is:</b> an algorithm explores input data without being given an explicit output variable (eg, explores customer demographic data to identify patterns).

<img src="../Images\unsupervisedLearning.png" width=20%>

<b>When to use it:</b> you do not know how to classify the data, and you want the algorithm to find patterns and classify the data for you.

<b>Goal:</b> to model the underlying structure and distribution of data to learn more about the data.

<b>Dataset:</b> unlike supervised learning, the dataset contains only the input data (X) and no corresponding output data (Y). 

<b>How:</b> identifies mathematical relationships determined by the input data.  The simplest possible unsupervised learning problem is probably “what’s the mean of the data”, which is “learned” through processing the input to determine the mean.

<b>Sub-types:</b>

1. <b>Association:</b> where you want to discover rules that describe large portions of your data, such as "people who tend to buy X also tend to buy Y".


2. <b>Clustering:</b> where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.

<b>Example:</b> online retailers use unsupervised “clustering” algorithms to discover the inherent groupings of customers by purchasing behaviour or “association” algorithms to describe data, e.g. people that buy books of genre A often also buy books of genre Z.
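To make the clustering case concrete, here is a small sketch of one common clustering algorithm, k-means, applied to one-dimensional data. The spend figures, the starting centroids and the function name `k_means_1d` are all assumptions for illustration; real retailers would cluster on many features at once.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids (k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical annual spend per customer; note that no labels (Y) are supplied.
annual_spend = [12.0, 15.0, 14.0, 10.0, 95.0, 102.0, 99.0, 110.0]
centroids, clusters = k_means_1d(annual_spend, centroids=[0.0, 50.0])

print("Cluster centres:", centroids)
print("Groupings:", clusters)
```

The algorithm is never told which customers are "low" or "high" spenders; the two groups emerge from the input data alone.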

------------

### 4.3. Reinforcement Learning

<b>What it is:</b> an algorithm learns to perform a task simply by trying to maximize rewards it receives for its actions (eg, maximizes points it receives for increasing returns of an investment portfolio).

<img src="../Images\reinf.png" width=40%>

<b>When to use it:</b> you don’t have a lot of training data; you cannot clearly define the ideal end state; or the only way to learn about the environment is to interact with it.

<b>Goal:</b> to learn how to accomplish a particular goal.

<b>Dataset:</b> the variables of the environment in which the goal maximising behaviour needs to take place.

<b>How:</b> 

1. The algorithmic agent takes an action on the environment.

2. The algorithmic agent receives either: (a) a reward if the action brings the agent a step closer to achieving its goal; or (b) a penalty if the action takes the agent a step away from achieving its goal.

3. The algorithmic agent optimises for the best series of actions by correcting itself over time to improve its ability to achieve its long-term goal, e.g. in a game environment this means winning the game rather than winning an individual move.
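The three steps above can be sketched with a deliberately simplified example: an epsilon-greedy agent choosing between two actions in a single-state environment (a "multi-armed bandit"). The reward probabilities, the `epsilon` value and the function name `run_agent` are assumptions for illustration, and because there is only one state this sketch does not capture the long-term, multi-step planning described in step 3.

```python
import random


def run_agent(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: returns its value estimate and pick count per action."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    value_estimates = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(steps):
        # Step 1: the agent takes an action (mostly the best-looking one,
        # occasionally a random one so that it keeps exploring).
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: value_estimates[a])
        # Step 2: the environment responds with a reward (+1) or a penalty (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        # Step 3: the agent corrects its estimate (running average of rewards).
        counts[action] += 1
        value_estimates[action] += (reward - value_estimates[action]) / counts[action]
    return value_estimates, counts


# Hypothetical environment: action 1 is rewarded far more often than action 0.
value_estimates, counts = run_agent(reward_probs=[0.2, 0.8])
print("Estimated value of each action:", value_estimates)
print("Times each action was chosen:", counts)
```

Over many steps the agent should come to favour the action that is rewarded more often, without ever being told which one that is.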

<b>Example:</b> Google's AlphaGo, which learned which combinations of moves, in different contextual arrangements of the environment (i.e. the other player's moves), make it more or less likely that a given move will optimise its chances of winning the overall game.

-----------------

## 5. High Level Mathematical Representation

In most machine learning problems we can describe the high level process as follows:

<img src="../Images\H6qTdZmYEeaagxL7xdFKxA_2f0f671110e8f7446bb2b5b2f75a8874_Screenshot-2016-10-23-20.14.58.png" width=30%>

This is showing that we take a dataset, pass it into the appropriate learning algorithm to generate the necessary hypothesis that is able to accurately produce an output $y$ for a given input $x$.  

### 5.1. The Hypothesis

The starting hypothesis is usually described as:

\begin{align}
h_\theta(x) = \theta_0 + \theta_1 x
\end{align}

where $h_\theta(x)$ is the function of $x$ that should produce the correct output $y$ as closely as possible.  $\theta_0$ and $\theta_1$ are weights (aka coefficients or model parameters - see further below). 

In most machine learning problems our goal is to find the perfect values of $\theta_0$ and $\theta_1$ to make our hypothesis as accurate as possible in terms of producing the correct output $y$ for a given $x$ input.
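One common way to search for good values of $\theta_0$ and $\theta_1$ is gradient descent on the mean squared error. The sketch below is illustrative only: the choice of cost function, the toy data, the learning rate and the function names are assumptions that do not come from the text above.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x


def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Iteratively adjust theta0 and theta1 to reduce the mean squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction error of the current hypothesis on every training example.
        errors = [hypothesis(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        # Gradients of the cost (1 / 2m) * sum(error^2) w.r.t. each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Take a small step downhill.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x, so the ideal parameters are known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print(f"theta0 = {theta0:.4f}, theta1 = {theta1:.4f}")
```

Here `learning_rate` and `steps` are chosen by hand rather than learned from the data.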

### 5.2. Parameters

Parameters are key to machine learning algorithms. They are the part of the model that is <b>learned</b> from historical training data.

#### Machine Learning

In classical machine learning literature, we may think of the model as the hypothesis and the parameters as the tailoring of the hypothesis to a specific set of data.

Often model parameters are estimated using an optimization algorithm, which is a type of efficient search through possible parameter values.

#### Statistics

In statistics, you may assume a distribution for a variable, such as a Gaussian distribution. Two parameters of the Gaussian distribution are the mean (mu) and the standard deviation (sigma). This holds in machine learning, where these parameters may be estimated from data and used as part of a predictive model.
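As a brief illustration of estimating these two parameters from data, the sketch below uses Python's standard `statistics` module on a small made-up sample. The sample values are an assumption, and `pstdev` (the population standard deviation) is used here as one reasonable choice of estimator.

```python
import statistics

# Hypothetical sample assumed to come from a Gaussian distribution.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The two parameters of the Gaussian, estimated from the data.
mu = statistics.mean(sample)       # estimate of the mean
sigma = statistics.pstdev(sample)  # estimate of the standard deviation

# The fitted distribution can then be used as part of a predictive model,
# e.g. to ask how likely it is that a new value falls below 7.
fitted = statistics.NormalDist(mu, sigma)
print(f"mu = {mu}, sigma = {sigma}")
print(f"P(value < 7) = {fitted.cdf(7.0):.3f}")
```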

#### Programming

In programming, you may pass a parameter to a function. In this case, a parameter is a function argument that could have one of a range of values. In machine learning, the specific model you are using is the function and requires parameters in order to make a prediction on new data.

#### Parametric vs. non-parametric

1. Fixed number of parameters (regardless of how much training data there is) = parametric.


2. Variable number of parameters (can grow with the amount of training data) = nonparametric.

### 5.2.1. Model Parameters

A model parameter is a configuration variable that is:

1. <b>internal</b> to the model;  


2. whose value <b>can</b> be estimated or learned from data;


3. <b>not</b> set by the machine learning practitioner;


4. often <b>saved</b> as part of the learned model.

Some examples of model parameters:

* The weights in an artificial neural network.


* The support vectors in a support vector machine.


* The coefficients in a linear regression or logistic regression.

### 5.2.2. Model Hyperparameters

A model hyperparameter is a configuration that is:

1. <b>external</b> to the model;


2. whose value <b>cannot</b> be estimated or learned from data;


3. <b>set</b> by the machine learning practitioner;


4. often <b>set heuristically, i.e. with rules</b>; and


5. often tuned manually to improve the model's performance.

We cannot know in advance the best value for a model hyperparameter on a given problem.  Instead, we may:

* use rules of thumb;


* copy values used on other problems; or


* search for the best value by trial and error.

When a machine learning algorithm is tuned for a specific problem, such as when you are using a grid search or a random search, you are tuning the hyperparameters of the model in order to discover the parameters of the model that result in the most skillful predictions.
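A minimal sketch of a grid search, assuming a tiny one-dimensional k-nearest-neighbours classifier and a held-out validation set. The data, the candidate values of `k` and the function names are all hypothetical and chosen only to show the mechanics.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k training points nearest to query."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points classified correctly for a given k."""
    correct = sum(knn_predict(train, x, k) == label for x, label in validation)
    return correct / len(validation)


# Hypothetical (feature, label) pairs; two training points are deliberately noisy.
train = [
    (1.0, "A"), (2.0, "A"), (3.0, "A"), (3.4, "B"),
    (6.0, "B"), (7.0, "B"), (8.0, "B"), (6.6, "A"),
]
validation = [(3.3, "A"), (6.5, "B"), (1.4, "A"), (7.6, "B")]

# Grid search: try each candidate hyperparameter value and keep the best scorer.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

print("Validation accuracy per k:", scores)
print("Selected k:", best_k)
```

The value of `k` is never learned from the training data itself; it is picked from outside by comparing validation scores, which is what makes it a hyperparameter.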

Model hyperparameters are often referred to as model parameters which can make things confusing. A good rule of thumb to overcome this confusion is as follows:

<center><b>If you have to specify a model parameter manually then it is probably a model hyperparameter.</b></center>

Some examples of model hyperparameters include:

* The learning rate for training a neural network.


* The C and sigma hyperparameters for support vector machines.


* The k in k-nearest neighbors.

-----------------------

## 6. Machine Learning vs. Deep Learning vs. AI

### 6.1. As a hierarchy

<img src="../Images\aihierarchy.png" width=100%/>

### 6.1.1. Artificial Intelligence (AI)

AI is the suitcase word that encompasses the ability of a machine to perform cognitive functions we associate with human minds, such as: (1) perceiving, (2) reasoning, (3) learning, (4) interacting with the environment, (5) problem solving, and even (6) exercising creativity. 

Examples of technologies that enable AI to solve business problems are robotics and autonomous vehicles, computer vision, language, virtual agents, and machine learning.

### 6.1.2. Machine Learning

See above!

### 6.1.3. Deep Learning

#### What is it?

Deep learning is a type of machine learning that can process a wider range of data resources, requires less data preprocessing by humans, and can often produce more accurate results than traditional machine-learning approaches (although it requires a larger amount of data to do so). 

#### How does it work?

In deep learning, interconnected layers of software-based calculators known as “neurons” form a neural network. The network can ingest vast amounts of input data and process them through multiple layers that learn increasingly complex features of the data at each layer. The network can then make a determination about the data, learn if its determination is correct, and use what it has learned to make determinations about new data. For example, once it learns what an object looks like, it can recognize the object in a new image.
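To show what "multiple layers of neurons" means mechanically, here is a toy forward pass through a network with one hidden layer. The weights are set by hand (an assumption for illustration; a real network would learn them from data) so that the network computes XOR, a function that a single layer of this kind cannot represent.

```python
def relu(z):
    """Rectified linear activation: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron takes a weighted sum plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hand-set (not learned) weights for a 2-input, 2-hidden-neuron, 1-output network.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def forward(x1, x2):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = dense([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)[0]


for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(f"XOR({int(a)}, {int(b)}) -> {forward(a, b)}")
```

The hidden layer builds intermediate features from the raw inputs and the output layer combines them, which is the layered structure described above in miniature.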

#### When to use it?

1. Deep Learning outperforms other techniques if the data size is large. But with small data size, traditional Machine Learning algorithms are preferable.


2. Deep Learning techniques need to have high end infrastructure to train in reasonable time.


3. When there is a lack of domain understanding for feature introspection, Deep Learning techniques outshine others as you have to worry less about feature engineering.


4. Deep Learning really shines when it comes to complex problems such as image classification, natural language processing, and speech recognition.

#### Useful Resources

- https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

- https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai - see deep learning sections!

-----------------

## 7. Natural Language Processing ("NLP")

NLP is a <b>subfield</b> of computer science, information engineering and AI concerned with the interactions of computers and human language.

<img src="../Images\NLP2.png" width=100%>

-----------------

## 8. Useful Resources

### 8.1. High Level 

- [McKinsey's Executive Guide to AI](https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai)
- [The above in PDF](https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insights/An%20executives%20guide%20to%20AI/Executives-guide-to-AI)

### 8.2. AI Business Thought Leadership

- [BCG's Reshaping Business with AI Guide](https://www.bcg.com/Images/Reshaping%20Business%20with%20Artificial%20Intelligence_tcm9-177882.pdf)
- [BCG's Is Your Business Ready for AI Guide](https://www.bcg.com/publications/2017/strategy-technology-digital-is-your-business-ready-artificial-intelligence.aspx)
- [BCG's Competing in the AI Age Guide](https://www.bcg.com/publications/2017/competing-in-age-artificial-intelligence.aspx)
- [BCG's Build or Buy AI Dilemma Guide](https://www.bcg.com/en-gb/publications/2018/build-buy-dilemma-artificial-intelligence.aspx)
- [BCG's Ten Things Every Manager Should Know about AI Guide](https://www.bcg.com/publications/2017/technology-digital-strategy-ten-things-every-manager-should-know-about-artificial-intelligence.aspx)
- [BCG's What's Holding Back AI Guide](https://www.bcg.com/en-gb/publications/2017/infographic-what-is-holding-back-artificial-intelligence.aspx)

### 8.3. Cheatsheets

- https://ml-cheatsheet.readthedocs.io/en/latest/