# Probability

## Probability Basics

### Random Experiment

An experiment is random if its outcome cannot be predicted with certainty. An example is a coin toss, where the outcome could be a head or a tail.

### Sample Space

The set of all possible outcomes of an experiment is called the sample space.

Taking the coin toss as the random experiment, the sample space consists of all possible outcomes:
```python
S = {"Head", "Tail"}
```
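
As a minimal sketch, a random experiment can be simulated by drawing outcomes from the sample space. The names `coin_sample_space` and `tosses`, the number of tosses, and the seed value are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

# Sample space of a coin toss, written as a set of outcome labels
coin_sample_space = {"Head", "Tail"}

# A seeded generator keeps the simulated experiment reproducible (the seed is arbitrary)
rng = np.random.default_rng(seed=0)

# Simulate 10 tosses by drawing uniformly from the sample space
tosses = rng.choice(sorted(coin_sample_space), size=10)
print(tosses)

# Every simulated outcome belongs to the sample space
print(set(tosses.tolist()) <= coin_sample_space)
```

The individual tosses cannot be predicted in advance, but each one is always a member of the sample space.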

<img src="../images/sample_space.png" style="width: 350px;">


<br/>
## Exercise:
Generate the sample space of a die throw and assign it to a numpy array, sample_space.

 - Print out the array

In [1]:
import numpy as np

# Modify the code below
sample_space = np.array([])


<p>What are the possible outcomes of a die throw?</p>

In [2]:
sample_space = np.array([1, 2, 3, 4, 5, 6])
print(sample_space)

[1 2 3 4 5 6]


In [3]:
ref_tmp_var = False


try:
    ref_assert_var = False
    sample_space_ = np.array([1, 2, 3, 4, 5, 6])
    
    if np.all(sample_space == sample_space_):
      ref_assert_var = True
    else:
      ref_assert_var = False
    
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
else:
    if ref_assert_var:
        ref_tmp_var = True
    else:
        print('Please follow the instructions given and use the same variables provided in the instructions.')


assert ref_tmp_var






<br/><br/><br/>
## Random Variables

A probabilistic event has various possible outcomes. A random variable is a variable whose value is determined by the outcome of a random experiment; it can take any value from the set of possible outcomes. There are two types of random variables, discrete and continuous:

### Discrete Random Variable

Consider the rolling of a die, where the possible outcomes, as we learnt, are the integers from 1 to 6. Hence a random variable describing the roll takes values in:
```python
p_d = {1, 2, 3, 4, 5, 6}
```

Similarly, a random variable that describes a coin toss (1 for a head, 0 for a tail) takes values in:
```python
p_c = {1, 0}
```
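
One way to picture this, as a sketch: a random variable can be treated as a function from outcomes to numbers. The mapping of Head to 1 and Tail to 0, and the names `coin_to_number` and `X`, are assumptions made for illustration.

```python
# Hypothetical mapping from coin-toss outcomes to numbers (1 for Head, 0 for Tail)
coin_to_number = {"Head": 1, "Tail": 0}


def X(outcome):
    """Random variable: maps an outcome of the experiment to a number."""
    return coin_to_number[outcome]


# The set of values the random variable can take
values = {X(outcome) for outcome in coin_to_number}
print(values)  # {0, 1}
```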

### Continuous Random Variable

A continuous random variable can take on infinitely many values. Commonly the values are bounded by an interval. For example, picking a value on a number line in the interval (0, 1].

<img src="../images/random_variable.png" style="width: 700px;">
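
A short sketch of drawing values from the interval (0, 1], assuming NumPy's `Generator.random`, which samples from [0, 1). The flip `1.0 - x` and the seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

# rng.random() draws from the half-open interval [0, 1);
# subtracting from 1 flips it to (0, 1]
samples = 1.0 - rng.random(size=5)
print(samples)

# All samples fall inside (0, 1]
print(np.all((samples > 0) & (samples <= 1)))
```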


<br/>
## Exercise:

Consider an experiment where two dice are rolled.

 - Let D1 be the number on the first die.
 - Let D2 be the number on the second die.
 - Let Y be the random variable that denotes the sum of the two dice:
  - Y = D1 + D2
 - What values can Y take? Assign these values to the variable Y and print it out.

In [4]:
import numpy as np

Y = []


<p>Use np.arange(min, max+1, step) function</p>

In [5]:
Y = np.arange(2, 13, 1)
print(Y)

[ 2  3  4  5  6  7  8  9 10 11 12]


In [6]:
ref_tmp_var = False


try:
    ref_assert_var = False
    Y_ = np.arange(2, 13, 1)
    
    if np.all(Y == Y_):
      ref_assert_var = True
    else:
      ref_assert_var = False
    
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
else:
    if ref_assert_var:
        ref_tmp_var = True
    else:
        print('Please follow the instructions given and use the same variables provided in the instructions.')


assert ref_tmp_var





<br/><br/><br/>
## Event Types

Probability of an event = Number of outcomes in which the event occurs / Number of outcomes in the sample space, assuming all outcomes are equally likely.
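
As a small worked sketch of this ratio, consider the event "the roll of a fair die is even". The event and the variable names are illustrative assumptions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}  # event: the roll is an even number

# Favourable outcomes divided by all outcomes (valid for equally likely outcomes)
p_even = Fraction(len(even), len(sample_space))
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```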


### Dependent Events

Dependent events are events where the occurrence of an event A is affected by the occurrence of another event B. Hence, the probability of A is written as the conditional probability of A given B:

P(A|B)

For example, after picking a black ball from a box of colored balls without replacing it, the probability of picking a black ball again is smaller, because one fewer black ball remains. In other words, the second event's probability is dependent on the color of the ball picked in the first event.

<img src="../images/dep_event.png" style="width: 700px;">
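
A minimal sketch of the ball example, assuming a hypothetical box with 3 black and 7 white balls drawn without replacement (the counts are not given in the text and are chosen only for illustration).

```python
from fractions import Fraction

black, white = 3, 7  # hypothetical box contents
total = black + white

# Probability that the first ball is black
p_first_black = Fraction(black, total)

# Given the first ball was black and not replaced, one fewer black ball remains
p_second_black_given_first = Fraction(black - 1, total - 1)

# Probability that both draws are black
p_both_black = p_first_black * p_second_black_given_first

print(p_first_black, p_second_black_given_first, p_both_black)  # 3/10 2/9 1/15
```

The second probability (2/9) is smaller than the first (3/10), which is the dependence described above.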


### Independent Events

Independent events are events where the occurrence of one event is not affected by the occurrence of the other events.

Hence, the probability that the events A, B and C all occur together is the product of their individual probabilities.

P(A, B, C) = P(A) x P(B) x P(C)

Since A is independent of B, the probability of A given B is P(A|B) = P(A).

For example, if we pick a card from a deck and replace it before picking the next card, the probability of the second event is unaffected by the first event.

<img src="../images/indep_event.png" style="width: 700px;">
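
The product rule can be checked by enumeration. The sketch below assumes a standard 52-card deck and the event "draw an ace"; both the event and the variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Probability of drawing an ace on a single draw
p_ace = Fraction(sum(1 for rank, _ in deck if rank == "A"), len(deck))

# With replacement, every ordered pair of cards is equally likely
pairs = list(product(deck, repeat=2))
both_aces = sum(1 for first, second in pairs
                if first[0] == "A" and second[0] == "A")
p_both_aces = Fraction(both_aces, len(pairs))

print(p_ace)                         # 1/13
print(p_both_aces)                   # 1/169
print(p_both_aces == p_ace * p_ace)  # True
```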



### Mutually Exclusive Events

Mutually exclusive events are events where the occurrence of one event guarantees the non-occurrence of the others. For example, rolling a 1 on a die throw excludes rolling a 2, 3, 4, 5 or 6 on the same throw.
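
A brief sketch using Python sets: two mutually exclusive events share no outcomes, so the probability that either occurs is the sum of their probabilities. The helper `prob` and the events `A` and `B` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {1}  # event: the die shows 1
B = {2}  # event: the die shows 2


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


print(A & B)                             # set()
print(prob(A & B))                       # 0
print(prob(A | B) == prob(A) + prob(B))  # True
```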


<br/>
## Exercise:

What is the probability of any single outcome when a die is rolled?

 - Assign to variable p_dice and print the value

In [5]:
# Occurrences of the event / No. of outcomes in the sample space

<p>Define the sample space first.</p>

In [7]:
p_dice = 1.0/6.0

In [8]:
ref_tmp_var = False


try:
    ref_assert_var = False
    comp_ = abs(p_dice - 1.0/6.0)
    
    if comp_ < 0.001:
      ref_assert_var = True
    else:
      ref_assert_var = False
    
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
else:
    if ref_assert_var:
        ref_tmp_var = True
    else:
        print('Please follow the instructions given and use the same variables provided in the instructions.')


assert ref_tmp_var






<br/><br/><br/>
## Probability Distributions

### Probability Distributions

A probability distribution is a function that describes how likely a random variable is to take a given value or to lie in a given interval. There are two types:

 - Probability Density Function (pdf)
 - Probability Mass Function (pmf)

### Probability Density Function (PDF)

A PDF is a function where the area under the curve within an interval of values gives the probability that the variable lies in that interval. It applies to continuous random variables.
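
To make "area under the curve" concrete, here is a sketch that approximates the area numerically. The density f(x) = 2x on [0, 1] is a hypothetical example chosen because its areas are easy to check by hand; the helper names `pdf` and `area` are also assumptions.

```python
import numpy as np


def pdf(x):
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return np.where((x >= 0) & (x <= 1), 2.0 * x, 0.0)


def area(a, b, n=10_000):
    """Approximate the area under pdf between a and b with a midpoint Riemann sum."""
    width = (b - a) / n
    midpoints = a + width * (np.arange(n) + 0.5)
    return float(np.sum(pdf(midpoints)) * width)


print(round(area(0.0, 1.0), 6))  # 1.0  (total probability)
print(round(area(0.5, 1.0), 6))  # 0.75 (probability of a value in [0.5, 1])
```

The total area is 1, as it must be for any density, and the area over a sub-interval is the probability of landing in that sub-interval.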

### Probability Mass Function (PMF)

A PMF gives the probability of each value that a discrete random variable can take; plotted, it looks like a histogram of probabilities. This is most useful for our study of Data Science. Let us continue to understand the PMF via the previous example of rolling two dice.

### Two Rolling Dice Problem

We looked at the scenario where two dice were rolled and studied the associated random variable. Mathematically,

 - D1: Random variable for the number rolled on the first die
 - D2: Random variable for the number rolled on the second die
 - Y = D1 + D2, the random variable of interest
 
Suppose our outcome is the number 4. The possible combinations that lead to this outcome are:


<img src="../images/sumofdice.png" style="width: 350px;">

<img src="../images/twodice.png" style="width: 350px;">




You can observe that there are 3 combinations that yield the outcome 4: (1, 3), (2, 2) and (3, 1). Each individual combination occurs with the same probability, 1/36, because the two dice are independent: the number on one die does not affect the number on the other. The combination (2, 2) can occur in only one way, so it is counted once, whereas (1, 3) and (3, 1) are distinct combinations.

If you were to count all possible combinations of (D1, D2) across every value of Y = {2, 3, ..., 12}, there would be 36 combinations.

Hence, the probability of the number 4 = 3 combinations / 36 combinations = 1/12 ≈ 0.083
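
The counting argument above can be sketched by enumerating every (D1, D2) pair and tallying the sums. The names `rolls`, `counts` and `pmf` are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (D1, D2) combinations
rolls = list(product(range(1, 7), repeat=2))

# Count how many combinations produce each sum Y = D1 + D2
counts = Counter(d1 + d2 for d1, d2 in rolls)

# PMF: probability of each value of Y
pmf = {total: Fraction(count, len(rolls))
       for total, count in sorted(counts.items())}

for total, p in pmf.items():
    print(total, p)

print([pair for pair in rolls if sum(pair) == 4])  # [(1, 3), (2, 2), (3, 1)]
print(pmf[4])             # 1/12
print(sum(pmf.values()))  # 1
```

The probabilities over all values of Y sum to 1, which is a quick sanity check on any PMF.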


<br/>
## Exercise:

Scenario: You are in a guessing game where two dice are rolled and you are rewarded if you guess the total correctly.

 - What number would you bet on and why?
 - Assign the combinations (D1, D2) that produce your number as tuples to the list Y.
 - Compute the probability of your number and assign it to p_y.

In [9]:
import numpy as np

Y = []



<p>You would like to guess the number that has the highest probability.</p>

In [10]:
Y = np.array([(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)])
p_y = 6/36

In [11]:
ref_tmp_var = False


try:
    ref_assert_var = False
    Y_ = np.array([(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)])
    p_y_ = 6/36
    
    if np.all(Y == Y_) and (p_y == p_y_):
      ref_assert_var = True
    else:
      ref_assert_var = False
    
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
else:
    if ref_assert_var:
        ref_tmp_var = True
    else:
        print('Please follow the instructions given and use the same variables provided in the instructions.')


assert ref_tmp_var

