# #1 - Introduction to Machine Learning #

### 1) Why machine learning? ###

**Artificial intelligence** will shape our future more powerfully than any other innovation this century. Anyone who does not understand it will soon find themselves feeling left behind, waking up in a world full of technology that feels more and more like magic.

The rate of acceleration is already astounding. After a couple of AI winters and periods of false hope over the past four decades, rapid advances in data storage and computer processing power have dramatically changed the game in recent years.

##### AI & Machine Learning Achievements #####

- In 2015, Google trained a conversational agent (AI) that could not only convincingly interact with humans as a tech support helpdesk, but also discuss morality, express opinions, and answer general fact-based questions. In 2018, Google demonstrated [Google Duplex](https://www.youtube.com/watch?time_continue=6&v=D5VN56jQMWM), an assistant that makes phone calls on a user's behalf.

- Database mining: large datasets from the growth of automation and the web. E.g., web click data, medical records, biology, engineering.
- Applications we can’t program by hand. E.g., autonomous helicopters, handwriting recognition, most of Natural Language Processing (NLP), computer vision.
- Self-customizing programs. E.g., Amazon and Netflix product recommendations.
- Understanding human learning (brain, real AI).
- Face detection: Identify faces in images (or indicate if a face is present).

- Email filtering: Classify emails into spam and not-spam.

- Medical diagnosis: Diagnose a patient as a sufferer or non-sufferer of some disease.

- Weather prediction: Predict, for instance, whether or not it will rain tomorrow.

- Weaponizing AI: see this [video](https://www.youtube.com/watch?v=TlO2gcs1YvM) and the [CLAIRE](https://claire-ai.org/) initiative.



### 2) What is machine learning? ###

Two definitions of Machine Learning are offered. 

- Arthur Samuel described it as: **"the field of study that gives computers the ability to learn without being explicitly programmed."** This is an older, informal definition.

- Tom Mitchell provides a more modern definition: **"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."**

Example: playing chess.

E = the experience of playing many games of chess

T = the task of playing chess

P = the probability that the program will win the next game.
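Mitchell's definition can be made concrete with a toy learner. The sketch below swaps chess for a much simpler, invented task: deciding whether a number lies above a hidden threshold. The threshold value, the function names, and the data are all assumptions made for illustration. Experience E is the set of labelled examples seen, task T is the classification itself, and performance P is accuracy on a fixed test set.

```python
import random

HIDDEN_THRESHOLD = 0.37  # assumed ground truth, unknown to the learner


def true_label(x):
    """Task T: decide whether x lies at or above the hidden threshold."""
    return x >= HIDDEN_THRESHOLD


def learn_threshold(examples):
    """Estimate the threshold from labelled (x, label) pairs (experience E)."""
    below = [x for x, label in examples if not label]
    above = [x for x, label in examples if label]
    lo = max(below) if below else 0.0
    hi = min(above) if above else 1.0
    return (lo + hi) / 2  # midpoint between the closest examples on each side


def accuracy(threshold, test_points):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((x >= threshold) == true_label(x) for x in test_points)
    return correct / len(test_points)


rng = random.Random(0)  # fixed seed so the run is repeatable
test_points = [i / 1000 for i in range(1000)]
experience = [(x, true_label(x)) for x in (rng.random() for _ in range(1000))]

# P is measured after the learner has seen 1, 10, 100, and 1000 examples.
for n in (1, 10, 100, 1000):
    estimate = learn_threshold(experience[:n])
    print(n, round(accuracy(estimate, test_points), 3))
```

With more examples the estimate is squeezed between ever closer neighbours of the true threshold, so the printed accuracy should generally rise toward 1.0, although a single run is not guaranteed to improve at every step.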

In general, most machine learning problems can be assigned to one of two broad classifications:

**Supervised learning & Unsupervised learning**

A third kind, reinforcement learning, is covered in section 7.

### 3) AI and Machine Learning ###

![AI Tree](https://cdn-images-1.medium.com/max/1000/1*QJG2nMIqWHmLp2j4c0GVuQ.png)

**Artificial intelligence** is the study of agents that perceive the world around them, form plans, and make decisions to achieve their goals. Its foundations include mathematics, logic, philosophy, probability, linguistics, neuroscience, and decision theory. Many fields fall under the umbrella of AI, such as computer vision, robotics, machine learning, and natural language processing.

**Machine learning is a subfield of artificial intelligence.** Its goal is to enable computers to learn on their own. A machine’s learning algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without having explicit pre-programmed rules and models.

By applying AI, we wanted to build better, more intelligent machines. But apart from a few simple tasks, such as finding the shortest path between points A and B, we were unable to hand-program solutions to more complex and constantly evolving challenges. **There was a realisation that the only way to achieve this was to let machines learn for themselves.** This is similar to how a child learns from its own experience. So machine learning was developed as a new capability for computers. Machine learning is now present in so many segments of technology that we don’t even notice it while using it.

Humans can find patterns in data, but when the data is massive, the time needed to work through it becomes impractical. This is where machine learning comes into action, helping people make sense of large amounts of data in minimal time.

Just as big data and cloud computing are gaining importance for their contributions, machine learning is gaining equal importance and recognition as the technology that analyses those big chunks of data, easing the task of data scientists through automation.

The techniques we use for data mining have been around for many years, but they were not effective because we lacked the computing power to run the algorithms. Running deep learning models with enough computing power and access to better data is what has led to the recent dramatic breakthroughs in machine learning.

### 4) Kinds of Machine Learning ###

There are three kinds of Machine Learning Algorithms.

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

### 5) Supervised Learning ###

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Speaking mathematically, supervised learning is where you have both **input variables (X) and output variables (Y)** and can use an algorithm to learn the mapping function from the input to the output.

The mapping function is expressed as Y = f(X).

![](https://cdn-images-1.medium.com/max/1000/1*GE2joNHaJKIKIiwxT5jE4Q.png)

**Example 1:**

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
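As a rough illustration of Example 1, the sketch below fits a straight line to a handful of house sizes and prices using ordinary least squares, then recasts the same data as a classification problem by converting each sale into a "more" or "less" label. All numbers are invented (and deliberately perfectly linear so the result is easy to check), and the names `fit_line`, `predict_price`, and `asking_prices` are assumptions, not part of any particular library.

```python
# Invented, perfectly linear data: size in square metres, price in thousands.
sizes = [50, 80, 100, 120, 150]
sold_prices = [120, 180, 220, 260, 320]
asking_prices = [110, 190, 210, 270, 300]  # hypothetical asking prices


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Regression: the output is a continuous price.
intercept, slope = fit_line(sizes, sold_prices)


def predict_price(size):
    """Predicted price for a house of the given size."""
    return intercept + slope * size


print(predict_price(110))  # 240.0 for this made-up data

# Classification: the output becomes one of two discrete categories.
labels = [
    "more" if sold > asking else "less"
    for sold, asking in zip(sold_prices, asking_prices)
]
print(labels)  # ['more', 'less', 'more', 'less', 'more']
```

The same inputs serve both problems; only the kind of output (a number versus a category) changes.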

**Example 2:**

(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign. 
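The tumor example in (b) can be sketched with a k-nearest-neighbours classifier, which is only one of many possible classification methods and is chosen here for brevity. The two features (size and an "irregularity score"), the data values, and the name `knn_predict` are all hypothetical and carry no medical meaning.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (size_cm, irregularity_score) -> label.
training_examples = [
    ((1.0, 1.0), "benign"),
    ((1.5, 2.0), "benign"),
    ((2.0, 1.5), "benign"),
    ((4.0, 6.0), "malignant"),
    ((5.0, 7.0), "malignant"),
    ((4.5, 8.0), "malignant"),
]


def knn_predict(query, examples, k=3):
    """Return the majority label among the k examples closest to query."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((1.2, 1.8), training_examples))  # benign
print(knn_predict((4.8, 6.5), training_examples))  # malignant
```

The output is one of a fixed set of categories, which is what makes this a classification problem rather than a regression problem.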

### 6) Unsupervised Learning ###

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results. Mathematically, unsupervised learning is when you only have input data (X) and no corresponding output variables.

**Example:**

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
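A minimal clustering sketch, assuming k-means as the algorithm (the text above does not name one) and using six invented 2-D points in place of gene data. The function name `kmeans` and the naive choice of the first `k` points as starting centroids are assumptions made to keep the example short and deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly re-assigning and averaging."""
    centroids = list(points[:k])  # naive, deterministic initialisation
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, clusters


# Invented data: two loose groups, with no labels supplied.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are given to the algorithm; the two groups emerge only from the distances between the points.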

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e., identifying individual voices and music from a mesh of sounds at a [cocktail party](https://en.wikipedia.org/wiki/Cocktail_party_effect)).

Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as “people that buy X also tend to buy Y”.
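One common way to score a rule such as “people that buy X also tend to buy Y” is with two measures usually called support and confidence. The sketch below computes both on a tiny invented set of shopping baskets; the basket contents and the function names are assumptions for illustration.

```python
# Invented shopping baskets; each set is one transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(itemset <= basket for basket in transactions)
    return hits / len(transactions)


def confidence(x, y):
    """Of the transactions containing x, the fraction that also contain y."""
    return support(x | y) / support(x)


print(support({"bread", "butter"}))       # 0.5
print(confidence({"bread"}, {"butter"}))  # 2 of the 3 bread baskets
```

Here bread appears in three of the four baskets and butter accompanies it in two of those, so the rule "bread implies butter" has a confidence of two thirds.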

### 7) Reinforcement Learning ###

A computer program interacts with a dynamic environment in which it must achieve a particular goal (such as playing a game against an opponent or driving a car). The program is given feedback in the form of rewards and punishments as it navigates its problem space.

With this kind of learning, the machine is trained to make specific decisions. It works this way: **the machine is exposed to an environment where it continuously trains itself by trial and error.**

![](https://cdn-images-1.medium.com/max/1000/1*koj-1K7EmiERGx9l_Okjnw.png)
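The trial-and-error loop can be sketched with tabular Q-learning, one of several reinforcement learning algorithms and not one named in the text above. The environment is an invented five-cell corridor: the agent starts in cell 0, receives a small punishment for every step, and a reward on reaching cell 4. All constants and names here are assumptions.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01  # reward at the goal, small punishment otherwise
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Learn a table of action values by repeated trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _move in range(100):  # cap the episode length
            if rng.random() < EPSILON:
                a = rng.randrange(2)  # explore: random action
            else:
                a = max((0, 1), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

After training, the greedy policy read from the table should point right in every non-goal cell, since that is the shortest route to the reward.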

### 8) Model Representation ###

To establish notation for future use, we’ll use $x^{(i)}$ to denote the “input” variables (living area in this example), also called input features, and $y^{(i)}$ to denote the “output” or target variable that we are trying to predict (price). A pair $(x^{(i)}, y^{(i)})$ is called a training example, and the dataset that we’ll be using to learn, a list of $m$ training examples $(x^{(i)}, y^{(i)});\ i = 1, \dots, m$, is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.

![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/H6qTdZmYEeaagxL7xdFKxA_2f0f671110e8f7446bb2b5b2f75a8874_Screenshot-2016-10-23-20.14.58.png?expiry=1543363200000&hmac=ldbXOMU0LejWRR9mV65knhocc6tf_Ydj8fC8GOVP-8s)

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. The figure above shows this process pictorially.

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
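The notation can be mirrored directly in code. In the sketch below the training set is a list of (x, y) pairs, `m` is its length, and `h` is one candidate hypothesis. The data values, the linear form of `h`, and its two hand-picked parameters are assumptions for illustration; a learning algorithm would normally choose the parameters from the data.

```python
# Invented training set: (living area, price) pairs.
training_set = [(50.0, 120.0), (80.0, 180.0), (100.0, 220.0)]
m = len(training_set)  # number of training examples


def h(x, intercept=20.0, slope=2.0):
    """A candidate hypothesis h : X -> Y, here a straight line.

    The parameter values are hand-picked assumptions, not learned.
    """
    return intercept + slope * x


# Index i runs over the training set; it is not an exponent.
for i, (x_i, y_i) in enumerate(training_set, start=1):
    print(f"i={i}  x={x_i}  y={y_i}  h(x)={h(x_i)}")
```

Because the invented prices lie exactly on this line, `h` predicts every training example perfectly here; real data would leave some error for the learning algorithm to reduce.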

### Mathematics Needed? ###
Here is the minimum level of mathematics that is needed for Machine Learning Engineers / Data Scientists.

1. Linear Algebra (Matrix Operations, Projections, Factorisation, Symmetric Matrices, Orthogonalisation)

2. Probability Theory and Statistics (Probability Rules & Axioms, Bayes’ Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions.)

3. Calculus (Differential and Integral Calculus, Partial Derivatives)

4. Algorithms and Complex Optimisations (Binary Trees, Hashing, Heap, Stack)

### Appendix - Next Week ###

-  What are we going to use? Tools and libraries

### Bibliography ###
- Medium Article - [1](https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12)
- Coursera Course Machine Learning - [2](https://www.coursera.org/learn/machine-learning/)
- Posts on introduction - [3](https://towardsdatascience.com/introduction-to-machine-learning-db7c668822c4)