<a href="https://colab.research.google.com/github/BethanyG/ML_Mondays_WWCodePython/blob/master/Maths_for_ML_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



                                            
# **Mathematics for Machine Learning** - Part I





Before we move on to `NumPy`, `Pandas`, & the whole "Scientific Python Stack", we think it would be good to get familiar with some of the underlying **mathematics for ML.**

&nbsp;

_**Please don't worry or panic!**_  Yes - these can be very complex topics, but you don't have to master them in their _entirety_ to get benefit from them, or to use the principles & equations effectively.  Additionally, we can always decide to delve deeper into particular topics, if you need clarification or want more!


&nbsp;

### **At the Heart:  Random Variables**

The "learning" part of Machine Learning leverages `Random Variables`.    

In probability & statistics, a `Random Variable` is a type of _**variable**_ whose value is subject to _**variations**_ due to  mathematical _**chance**_ (in other words probability -- that's the 'random' part). 


&nbsp;

In contrast to other programming or mathematical variables (_where a variable represents a **single** unknown or assigned value or data structure_), a `Random Variable` can take on an entire _**set of possible different values**_ -- each of which carries an associated probability (odds, chance) of it happening.

&nbsp;

A `Random Variable's` _**possible values**_ might represent the _**possible outcomes**_ of a yet-to-be-performed experiment, or the _**possible outcomes**_ of an experiment whose already-existing values are uncertain (for example -- as a result of incomplete information). `Random Variables` can also represent either the results of an _**objectively random**_ process (_think rolling dice, spinning roulette, or a 'random walk'_), or a _**subjectively random process**_ resulting from incomplete knowledge of a quantity or outcome (_What is the probability it will rain today & I'll need my umbrella? If I surf in the Pacific, what are the chances of a shark attack? What are the chances that my Jelly Belly bag will have 53 red Jelly Beans?_).

&nbsp;

`Random Variables` can further be classified as either _**discrete**_ or _**continuous**_.

&nbsp;

A **Discrete Random Variable** takes a _countable_ number of _distinct values_, so its outcomes can be listed one by one.   For example, `Random Variable` **R** can be defined as the number that comes up when you roll a "fair die". So **R** can take the _**possible values**_ of **`[1,2,3,4,5,6]`** (_each of which has a probability of 1/6, or about 0.167_). Each potential outcome is _**distinct**_, & all possible outcomes can be _**enumerated**_ (they are countable).
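
As a small illustration of the fair-die example, here is a sketch in plain Python. The names `outcomes`, `distribution`, and `p_even` are made up for this example; exact fractions are used so that the probabilities add up to exactly 1.

```python
from fractions import Fraction

# Possible values of the random variable R (one roll of a fair six-sided die).
outcomes = [1, 2, 3, 4, 5, 6]

# A fair die gives every outcome the same probability: 1/6.
distribution = {value: Fraction(1, len(outcomes)) for value in outcomes}

# The probabilities of all possible outcomes add up to exactly 1.
total_probability = sum(distribution.values())

# Probability of an event made of several outcomes, e.g. "R is even".
p_even = sum(p for value, p in distribution.items() if value % 2 == 0)

print(distribution[3])    # 1/6
print(total_probability)  # 1
print(p_even)             # 1/2
```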

&nbsp;

In contrast, a **Continuous Random Variable** can take any of an _infinite number_ of possible values (uncountably many). These values are drawn from an **interval** or **collection of intervals**.  


![alt text](https://github.com/BethanyG/ML_Mondays_WWCodePython/blob/master/images/Normal%20Curve.png)


For example, `Random Variable` **R** can represent the potential height of students in a class -- all of which fall within a "bell curve" or "normal curve" shaped set of intervals, but could be any one of an _infinite set_ of specific numbers. The probability (chance) of a specific student having a height that falls in a given interval is represented by the area under a section (interval) of the curve.
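
To make "area under a section of the curve" concrete, here is a hedged sketch using `statistics.NormalDist` from the Python standard library. The mean of 170 cm and standard deviation of 10 cm are invented numbers for illustration, not real class data, and `probability_between` is a helper defined only for this example.

```python
from statistics import NormalDist

# Hypothetical class: heights in cm, assumed to follow a normal curve.
heights = NormalDist(mu=170, sigma=10)

def probability_between(dist, low, high):
    """Area under the curve between low and high (difference of two CDF values)."""
    return dist.cdf(high) - dist.cdf(low)

# Chance that a student's height falls between 160 cm and 180 cm.
p_within_one_sigma = probability_between(heights, 160, 180)

# A single exact height is an interval of zero width, so its probability is 0.
p_exact = probability_between(heights, 170, 170)

print(round(p_within_one_sigma, 4))  # 0.6827
print(p_exact)                       # 0.0
```

This is why, for a continuous variable, we ask about intervals rather than single values.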



&nbsp;

The mathematical function (_formula, equation_) describing the possible values of a discrete or continuous `Random Variable` and the associated probabilities of each outcome is known as a _**probability distribution**_ or _**statistical distribution**_.

&nbsp;

Since all random functions, variables, & operations are based on these **_statistical distributions_**, we're going to cover some of the more common and useful ones in this session.

&nbsp;

We'll also go through some ***derivatives*** (they're one of the most important building blocks of ML algorithms -- used for things like _**cost functions**_ & _**gradient descent**_).

&nbsp;

In the next session we'll dive deeper into statistical concepts and derivatives. In this one, let's unveil equations, matrices, probability, and distributions.

Let's get started!


## Mathematical Refresher 

https://www.simplilearn.com/math-refresher-machine-learning-tutorial

## Where to find help?

1. Khan Academy - best place for clear explanations: https://www.khanacademy.org/math/linear-algebra

2. Review notes from Stanford: http://cs229.stanford.edu/section/cs229-linalg.pdf

3. 3Blue1Brown - https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw

This YouTube channel is a good place to dive deep with great visualizations.

4. Coursera - https://www.coursera.org/specializations/mathematics-machine-learning

A long path, but worth it and beneficial in the long run.




*   Questions?

Well, that's why we have our Slack channel. Drop your question there :)



Advice: Start reading about the topics below and dig in further as required.

# Equations

An equation states that two expressions are equal. Variables (a, b, x, y) are terms that do not have a fixed value, unlike constants (3, 5, 1, 8).

We combine variables and constants using mathematical operations (+, -, *, /) to create equations.

A linear equation is an equation in which every variable appears only to the first power:

$$a_1x_1 + a_2x_2 + \dots + a_nx_n = b$$

A polynomial equation, on the other hand, can contain variables raised to powers greater than 1:

$$a_1x_1^2 + a_2x_2^3 + \dots + a_nx_n = b$$


Here power = degree, and the degree of a polynomial is its highest exponent.
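
A minimal sketch of these ideas in plain Python. The helper names `linear_lhs` and `evaluate_polynomial` and the sample numbers are made up for illustration; the polynomial here uses a single variable `x` to keep the example short.

```python
def linear_lhs(coefficients, values):
    """Evaluate a1*x1 + a2*x2 + ... + an*xn."""
    return sum(a * x for a, x in zip(coefficients, values))

# The linear equation 2*x1 + 3*x2 = 13 is satisfied by x1 = 2, x2 = 3.
left_side = linear_lhs([2, 3], [2, 3])
print(left_side)  # 13

def evaluate_polynomial(coefficients, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... for coefficients [c0, c1, c2, ...]."""
    return sum(c * x ** power for power, c in enumerate(coefficients))

# 1 + 0*x + 4*x**2: the highest exponent is 2, so the degree is 2.
quadratic = [1, 0, 4]
degree = len(quadratic) - 1  # valid because the last coefficient is non-zero
value_at_3 = evaluate_polynomial(quadratic, 3)

print(degree)      # 2
print(value_at_3)  # 37
```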

## Matrix


https://www.khanacademy.org/math/algebra-home/alg-matrices

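A matrix is a rectangular array of numbers arranged in rows and columns. An array is a table of elements that all have the same data type.

Vectors are one-dimensional (1-D) arrays and matrices are two-dimensional (2-D) arrays.

Number of dimensions = Rank (in the array sense; this is different from the linear-algebra "rank" of a matrix)

Size of the array along each dimension = Shape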
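
Here is a small sketch of "number of dimensions" and "shape" using nested Python lists. The `shape` helper is written just for this example and assumes the lists are non-empty and rectangular; NumPy arrays expose the same two ideas as `ndim` and `shape`.

```python
def shape(array):
    """Return the shape of a rectangular, non-empty nested list as a tuple."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]  # assumes every level is non-empty and rectangular
    return tuple(dims)

vector = [1, 2, 3]          # 1-D: three elements
matrix = [[1, 2, 3],
          [4, 5, 6]]        # 2-D: two rows, three columns

# The number of dimensions is simply the length of the shape tuple.
print(shape(vector), len(shape(vector)))  # (3,) 1
print(shape(matrix), len(shape(matrix)))  # (2, 3) 2
```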


Some operations on matrices, such as addition and subtraction, are intuitively pretty straightforward:
every element in one array is combined with the corresponding element of the other array, so both arrays must have the same shape.
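
A minimal sketch of element-by-element addition with nested lists. The function name `add_matrices` is made up for this example.

```python
def add_matrices(a, b):
    """Add two matrices (lists of rows) element by element."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [
        [x + y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]

A = [[1, 2],
     [3, 4]]
B = [[10, 20],
     [30, 40]]

total = add_matrices(A, B)
print(total)  # [[11, 22], [33, 44]]
```

Subtraction follows the same pattern with `x - y` in place of `x + y`.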



You can take a look at addition and subtraction yourself; we will cover multiplication.


## Matrix multiplication

To multiply matrices, we need the dot product.

The "Dot Product" is where we multiply matching members, then sum up:

(1, 2, 3) • (7, 9, 11) = 1×7 + 2×9 + 3×11
    = 58

We match the 1st members (1 and 7), multiply them, likewise for the 2nd members (2 and 9) and the 3rd members (3 and 11), and finally sum them up.
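
The same calculation as a short Python sketch (the function name `dot` is just a convenient label for this example):

```python
def dot(u, v):
    """Multiply matching members of two equal-length vectors, then sum up."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

result = dot([1, 2, 3], [7, 9, 11])
print(result)  # 58
```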



## **Do not forget the order of matrices**

Two matrices can only be multiplied if the number of columns of the first (left) matrix matches the number of rows of the second (right) matrix.

Let's say A has shape (2, 3). Then B must have 3 rows.

For example, B can have shape (3, 4), or (3, anything).

If these dimensions do not match, you cannot multiply the two matrices.

FYI:

There is a matrix called the identity matrix: a square matrix with 1s on the diagonal and 0s everywhere else. Multiplying any matrix by it leaves that matrix unchanged.

**Points to be Noted!**

1. The number of columns in the left matrix must equal the number of rows in the right matrix.
2. The answer matrix always has the same number of rows as the left matrix and the same number of columns as the right matrix.
3. Order matters. Multiplying A•B is not the same as multiplying B•A.
4. Data in the left matrix should be arranged as rows, while data in the right matrix should be arranged as columns.
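
The points above can be sketched in plain Python with nested lists. The function name `matmul` and the example matrices are made up for illustration; each entry of the answer is the dot product of a row from the left matrix and a column from the right matrix.

```python
def matmul(a, b):
    """Multiply matrix a (rows_a x cols_a) by matrix b (rows_b x cols_b)."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    if cols_a != rows_b:
        raise ValueError("columns of the left matrix must equal rows of the right matrix")
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]

A = [[1, 2, 3],
     [4, 5, 6]]      # shape (2, 3)
B = [[7, 8],
     [9, 10],
     [11, 12]]       # shape (3, 2)

AB = matmul(A, B)    # shape (2, 2): rows of A, columns of B
BA = matmul(B, A)    # shape (3, 3): order matters

identity = [[1, 0],
            [0, 1]]  # 2 x 2 identity matrix

print(AB)                       # [[58, 64], [139, 154]]
print(len(BA), len(BA[0]))      # 3 3
print(matmul(identity, A) == A) # True
```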

## Probability 

Again, Stanford's PDF comes to the rescue: http://cs229.stanford.edu/section/cs229-prob.pdf



1. The probability of any specific event is between 0 and 1 (inclusive), that is, 0 <= p(x) <= 1. The probabilities of all possible outcomes add up to exactly 1.

2. Probability is all about the possibility of various outcomes. The set of all possible outcomes is called the sample space. The sample space for a coin flip is {heads, tails}.

3. A random variable x is a variable that randomly takes on values from a sample space. The outcomes do not have to be equally likely: a fair coin gives heads and tails the same probability, but a biased coin does not.
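
A hedged sketch of these three points using only the standard library. The 0.7 / 0.3 biased coin is an invented example, and `is_valid_distribution` is a helper written just for this illustration.

```python
import random

# Sample space for a coin flip, with two possible probability assignments.
fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.7, "tails": 0.3}  # hypothetical biased coin

def is_valid_distribution(dist):
    """Each probability lies in [0, 1] and all of them sum to 1."""
    each_ok = all(0 <= p <= 1 for p in dist.values())
    return each_ok and abs(sum(dist.values()) - 1) < 1e-9

print(is_valid_distribution(fair_coin))    # True
print(is_valid_distribution(biased_coin))  # True

# Draw values of the random variable: flip the biased coin many times.
rng = random.Random(0)  # fixed seed so the run is repeatable
flips = rng.choices(list(biased_coin), weights=list(biased_coin.values()), k=10_000)
heads_frequency = flips.count("heads") / len(flips)

# The observed frequency should land close to the assigned probability of 0.7.
print(heads_frequency)
```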

## Distributions 


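Before that, a quick word on continuous and discrete values:

Values that can be anything within a range are continuous (for example, a height), while values that come from a countable set of separate options are discrete (for example, the face of a die).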

## 1. Bernoulli Distribution 

The simplest distribution, with only 2 possible outcomes or values: 0 and 1
(0 for failure and 1 for success).

The probability mass function is given by: $p^x(1-p)^{1-x}$, where $p$ is the probability of success and $x$ is either 0 or 1.
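
A minimal sketch of that probability mass function in Python. The name `bernoulli_pmf` and the value p = 0.25 are chosen only for illustration.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable with success probability p."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (failure) or 1 (success)")
    return p ** x * (1 - p) ** (1 - x)

p = 0.25
p_success = bernoulli_pmf(1, p)  # the formula reduces to p
p_failure = bernoulli_pmf(0, p)  # the formula reduces to 1 - p

print(p_success)  # 0.25
print(p_failure)  # 0.75
```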

## 2. Uniform Distribution


 Unlike the Bernoulli distribution, where the two outcomes can have different probabilities, all n possible outcomes of a (discrete) uniform distribution are equally likely, each with probability 1/n.
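
As a quick sketch, a fair six-sided die is a discrete uniform distribution with n = 6. The names below are made up for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def discrete_uniform_pmf(n):
    """Every one of the n outcomes has the same probability, 1/n."""
    return Fraction(1, n)

# A fair die: six faces, each with probability 1/6.
die = {face: discrete_uniform_pmf(6) for face in range(1, 7)}

total = sum(die.values())                         # all probabilities together
mean = sum(face * p for face, p in die.items())   # long-run average roll

print(die[1] == die[6])  # True
print(total)             # 1
print(mean)              # 7/2
```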

## 3. Binomial Distribution

It is a generalization of the Bernoulli distribution. When we need the probability of the same event occurring a given number of times over several trials, we head towards the binomial distribution. Here, too, each trial has two possible outcomes, every trial is independent of the others, and the probability of success is the same in every trial.
A binomial distribution with one trial is a Bernoulli distribution.
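
A hedged sketch of the binomial probability mass function, using `math.comb` from the standard library to count the ways of choosing which trials succeed. The function name `binomial_pmf` and the coin-flip numbers are only illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 2 heads in 3 flips of a fair coin: 3 ways * (1/2)^3.
two_heads = binomial_pmf(2, 3, 0.5)

# With a single trial, the binomial reduces to the Bernoulli distribution.
one_trial = binomial_pmf(1, 1, 0.25)

# Probabilities over every possible number of successes sum to 1.
total = sum(binomial_pmf(k, 3, 0.5) for k in range(4))

print(two_heads)  # 0.375
print(one_trial)  # 0.25
print(total)      # 1.0
```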


https://math.stackexchange.com/questions/838107/what-is-the-difference-and-relationship-between-the-binomial-and-bernoulli-distr


## 4. Normal Distribution

It is a bell-shaped distribution in which the mean, median, and mode are the same (time to Google these terms :) ).
The distribution is symmetrical: the left and right halves mirror each other around the mean. Most of the values are clustered around the peak.
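
These properties can be checked with `statistics.NormalDist` from the standard library. This is only a sketch using the standard normal curve (mean 0, standard deviation 1) as an example.

```python
from math import isclose
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Mean, median, and mode all sit at the same point.
print(standard.mean, standard.median, standard.mode)  # 0.0 0.0 0.0

# Symmetry: the curve has the same height at equal distances from the mean.
is_symmetric = isclose(standard.pdf(-1.5), standard.pdf(1.5))

# The peak is at the mean, so the curve is lower anywhere else.
peak_at_mean = standard.pdf(0) > standard.pdf(0.5)

# Most values are near the peak: share within 2 standard deviations.
within_two_sigma = standard.cdf(2) - standard.cdf(-2)

print(is_symmetric)                # True
print(peak_at_mean)                # True
print(round(within_two_sigma, 4))  # 0.9545
```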

We will be using this distribution very often throughout.

## 5. Poisson Distribution

Similar to the Binomial, all the events are independent of each other.
Then what's the difference?

_Remember continuous and discrete_ ? 

Both distributions are discrete, because they count events. The Binomial counts successes in a fixed number of discrete trials, while the Poisson counts events that occur over a continuous interval of time or space.

Breaking it down further -> the Binomial is based on events that have a fixed number of attempts, and the Poisson can be seen as the limit of infinitely many attempts, each with a tiny chance of success.

Poisson distributions are used to model occurrences of events that could happen a very large number of times, but happen rarely.
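
To see the relationship between the two distributions, here is a hedged sketch comparing them for many trials with a small success probability. The numbers (10,000 trials, probability 0.0003) are invented for illustration, and both helper functions are written just for this example.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected on average."""
    return lam ** k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Many attempts, each very unlikely to succeed.
n, p = 10_000, 0.0003
lam = n * p  # expected number of events, about 3

binomial_value = binomial_pmf(2, n, p)
poisson_value = poisson_pmf(2, lam)

# The two values are nearly identical (both about 0.224).
print(binomial_value)
print(poisson_value)
```

The Poisson version only needs the average rate `lam`, which is why it is convenient when the number of attempts is huge or unknown.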

Further reading :



https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/

# **Exercises**

*Practice makes perfect!*


Here's a list of some cool exercises so that you can apply what you learnt!

1. **Some Linear Algebra Fun** : https://www.albert.io/linear-algebra

2. **Remember multiplication of matrices is important?** : https://www.intmath.com/matrices-determinants/4-multiplying-matrices.php