## Combinations Of Random Variables 
Let's start with a basic example of combining random variables:<br>
Let's say that, on average, you walk 1.1 hours per day and bike 0.6 hours per day. You collected data for a year, so you have a good grasp of the mean and standard deviation of your daily walking and biking times. Also assume both times are normally distributed and independent of each other (how long you walk on a given day tells you nothing about how long you bike).

* $\mu_w = 1.1$
* $\sigma_w = 0.2$
* $\mu_b = 0.6$
* $\sigma_b = 0.1$

Let's combine the two activities to see how active you were overall, and call the combined quantity total activity (subscript a).<Br>
So what are the mean and standard deviation of total activity time?<br>
<br>
For the mean, add the means of the two activities together:<br>

$\displaystyle \qquad \mu_{total} = \mu_1 + \mu_2$<br><br>
$\displaystyle \qquad \mu_{a} = \mu_w + \mu_b = 1.1 + 0.6 = 1.7$<br><br>

The standard deviation is not as straightforward. First square each standard deviation to get the variances for walking and biking, then add the variances, and finally take the square root of the sum to get the standard deviation:<br><br>

$\displaystyle \qquad \sigma^2_{total} = \sigma^2_1 + \sigma^2_2$<br><br> 
$\displaystyle \qquad \sigma^2_a = \sigma^2_w + \sigma^2_b = 0.2^2 + 0.1^2 = 0.04 + 0.01 = 0.05$<br><br>
$\displaystyle \qquad \sigma_a = \sqrt{0.05} \approx 0.2236$<br><br>
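
Here is a quick check of the arithmetic above as a minimal Python sketch using only the standard library. Like the variance rule itself, it assumes walking and biking times are independent. The variable names are illustrative. It also prints what you would get by (incorrectly) adding the standard deviations.

```python
import math

# Daily means and standard deviations from the example (hours)
mu_w, sigma_w = 1.1, 0.2   # walking
mu_b, sigma_b = 0.6, 0.1   # biking

# Means add directly
mu_a = mu_w + mu_b

# Variances add (valid because walking and biking are assumed independent);
# the standard deviation is the square root of the summed variance
var_a = sigma_w**2 + sigma_b**2
sigma_a = math.sqrt(var_a)

# For contrast: naively adding the standard deviations overstates the spread
naive_sigma = sigma_w + sigma_b

print(f"mean = {mu_a:.1f} h, variance = {var_a:.2f}, std dev = {sigma_a:.4f} h")
print(f"naive sum of std devs = {naive_sigma:.1f} h")
# → mean = 1.7 h, variance = 0.05, std dev = 0.2236 h
# → naive sum of std devs = 0.3 h
```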

To find the mean and standard deviation of the <i>difference</i> between the two activities, the approach is similar.<br>
For the mean, you simply subtract:<br>

$\displaystyle \qquad \mu_{difference} = \mu_1 - \mu_2$<br><br>
For the standard deviation, and yes this is counterintuitive, you still $\color{firebrick} \text{ add }$ the variances:<br><br>
$\displaystyle \qquad \sigma^2_{difference} = \sigma^2_1 {\color{firebrick} +} \sigma^2_2$<br><br>
Negating the second variable does not shrink its spread, because multiplying a variable by $-1$ multiplies its variance by $(-1)^2 = 1$. Then take the square root of the variance to get the standard deviation. For walking minus biking, $\mu_{difference} = 1.1 - 0.6 = 0.5$ and $\sigma_{difference} = \sqrt{0.05} \approx 0.2236$, the same spread as the total.<br><br>
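
Below is a minimal sketch of the difference, with a simulation check of the counterintuitive variance rule. It assumes independent, normally distributed walking and biking times with the parameters above. The seed and sample size are arbitrary choices.

```python
import random
import statistics

# Walking (W) and biking (B) hours per day, assumed independent and normal
mu_w, sigma_w = 1.1, 0.2
mu_b, sigma_b = 0.6, 0.1

# Closed-form results for D = W - B
mu_diff = mu_w - mu_b                               # means subtract
sigma_diff = (sigma_w**2 + sigma_b**2) ** 0.5       # variances still ADD

# Simulation check: draw many independent days and measure the spread directly
rng = random.Random(42)                             # fixed seed for a repeatable run
diffs = [rng.gauss(mu_w, sigma_w) - rng.gauss(mu_b, sigma_b) for _ in range(100_000)]

print(f"formula:   mean = {mu_diff:.3f}, std dev = {sigma_diff:.4f}")
print(f"simulated: mean = {statistics.fmean(diffs):.3f}, std dev = {statistics.stdev(diffs):.4f}")
```

The simulated spread should land close to 0.2236, the same value as for the sum. It should not land near $\sqrt{0.04 - 0.01} \approx 0.1732$, which is what subtracting the variances would give.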
ℹ️ NOTE:<br>
<b>Because walking and biking times are independent and each normally distributed, their sum and their difference are normally distributed as well.</b><Br>
<br>
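
Python's standard-library `statistics.NormalDist` encodes both rules. Adding or subtracting two `NormalDist` instances, which the library treats as independent, returns a new `NormalDist`. The sketch below uses it to check the note above. It compares the normal model's probability that total activity is under 2 hours with a simulation. The 2-hour threshold and the seed are arbitrary illustrations.

```python
import random
from statistics import NormalDist

walk = NormalDist(mu=1.1, sigma=0.2)
bike = NormalDist(mu=0.6, sigma=0.1)

# Adding two NormalDist objects treats them as independent:
# the means add and the sigmas combine as sqrt(sigma_1**2 + sigma_2**2)
total = walk + bike
diff = walk - bike          # subtraction: means subtract, sigmas combine the same way
print(f"total: mean = {total.mean:.2f}, std dev = {total.stdev:.4f}")
print(f"difference: mean = {diff.mean:.2f}, std dev = {diff.stdev:.4f}")

# Check the normality claim: P(total < 2 hours) from the normal model vs. a simulation
rng = random.Random(7)      # fixed seed for a repeatable run
n = 100_000
below = sum(rng.gauss(1.1, 0.2) + rng.gauss(0.6, 0.1) < 2.0 for _ in range(n))
print(f"normal model: {total.cdf(2.0):.4f}, simulated: {below / n:.4f}")
```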

### Example
Let's say a popcorn company sells popcorn in pie-shaped tins, each divided into three equal bins, one per flavor.<br>
We also have historical data on the weight (in pounds) of each flavor in each tin, and we know these weights are normally distributed. We assume the three flavor weights in a tin are independent of each other.<br><br>
<b><i>What is the probability that a packed tin weighs less than 3.25 pounds?</i></b><br>
<br>


1. <b>Document the mean and standard deviation of each flavor's weight (in pounds) from the data</b><br>

* cheddar (D)
 * $\displaystyle \mu_d = 1$
 * $\sigma_d = 0.1$<br><br><br>

* caramel (M)
 * $\displaystyle \mu_m = 1$ 
 * $\sigma_m = 0.1$<br><br><br>

* chocolate (C)
 * $\displaystyle \mu_c = 1$ 
 * $\sigma_c = 0.1$
<br><br>



In [19]:

# importing the modules
from bokeh.plotting import figure, output_notebook, show, curdoc 
from bokeh.models import Div 
from bokeh.layouts import column
import math

# render Bokeh output inline in the notebook
output_notebook(hide_banner=True)

curdoc().theme = 'dark_minimal'
		
# instantiating the figure object
graph = figure(title = "3 Flavors Of Popcorn", width = 400, height = 400)

# name of the sectors
sectors = ["cheddar", "caramel", "chocolate"]

# share of the tin taken by each flavor (percent); the three bins are equal
percentages = [33.3, 33.3, 33.4]

# converting into radians
radians = [math.radians((percent / 100) * 360) for percent in percentages]

# starting angle values
start_angle = [math.radians(0)]
prev = start_angle[0]
for i in radians[:-1]:
	start_angle.append(i + prev)
	prev = i + prev

# ending angle values
end_angle = start_angle[1:] + [math.radians(0)]

# center of the pie chart
x = 0
y = 0

# radius of the glyphs
radius = 1

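# wedge colors, in the same order as sectors (cheddar, caramel, chocolate)
# so that they match the legend text in the Div below
color = ["goldenrod", "chocolate", "saddlebrown"]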

# plotting the graph
for i in range(len(sectors)):
	graph.wedge(x, y, radius
				,start_angle = start_angle[i]
				,end_angle = end_angle[i]
				,color = color[i])
				#,legend_label = sectors[i])
    
# hide the x and y axes and the grid lines; they carry no meaning for a pie chart
graph.axis.visible = False
graph.grid.grid_line_color = None

div = Div(text=r"""
<p>
<br>
$$\color{goldenrod} \large \text{Cheddar (D)}$$<br><br>
$$\color{saddlebrown} \large \text{Chocolate (C)}$$<br><br>
$$\color{chocolate} \large \text{Caramel (M)}$$<br><br>
</p>

""")


# displaying the graph
show(column(graph, div))


<Br>
<p>
<b>
2. Calculate the mean and standard deviation of the total tin weight<br><br>
ℹ️ NOTE:<BR></b>
Because each flavor's weight is normally distributed, and the three weights are assumed independent, the total weight is normally distributed as well<br><br>

We denote the mean and standard deviation of the total tin weight with the subscript <b><i>w</i></b><br>
<br>
$\mu_w$ can be calculated as:<br><br>

$\displaystyle \qquad \mu_w = \mu_d + \mu_m + \mu_c = 1 + 1 + 1 = 3$<br><br>

To calculate $\sigma_w$, we add the variances of D, M, and C and then take the square root of the sum.<br>
To get each variance, we square that flavor's standard deviation. This may seem roundabout since we already know the standard deviations, but simply adding them ($0.1 + 0.1 + 0.1 = 0.3$) does not give the correct $\sigma_w$.<br><br>
In practice, you would often calculate the variance of each of the 3 distributions directly from the data.<br><br>
Here are the calculations<br><br>

$\displaystyle \qquad \sigma^2_d = 0.1^2 = 0.01$<br><br>
$\displaystyle \qquad \sigma^2_m = 0.1^2 = 0.01$<br><br>
$\displaystyle \qquad \sigma^2_c = 0.1^2 = 0.01$<br><br>
<br><br>
Now add the variances and take the square root of the sum to get the standard deviation of w:<br><br>
$\displaystyle \text{the variance of total weight is denoted as }\sigma^2_w$<br><br>
$\displaystyle \text{the standard deviation of total weight is denoted as }\sigma_w$<br><br>
$\displaystyle \qquad \sigma^2_w = \sigma^2_d + \sigma^2_m + \sigma^2_c = 0.01 + 0.01 + 0.01 = 0.03$<br><br>
$\displaystyle \qquad \sigma_w = \sqrt{\sigma^2_w} = \sqrt{0.03} \approx 0.1732$<br><br>
</p>
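
The same calculation for the popcorn tin, as a minimal Python sketch. The dictionary layout is just one convenient way to hold the flavor list above. As in the text, the three flavor weights are assumed independent.

```python
import math

# Flavor weights in pounds as (mean, standard deviation), from the list above
flavors = {
    "cheddar (D)": (1.0, 0.1),
    "caramel (M)": (1.0, 0.1),
    "chocolate (C)": (1.0, 0.1),
}

# Means add directly
mu_w = sum(mu for mu, _ in flavors.values())

# Square each standard deviation to get a variance, add the variances, then take the root
var_w = sum(sigma**2 for _, sigma in flavors.values())
sigma_w = math.sqrt(var_w)

# The tempting shortcut of adding standard deviations gives a different (wrong) answer
naive_sigma = sum(sigma for _, sigma in flavors.values())

print(f"mu_w = {mu_w:.1f} lb, var_w = {var_w:.2f}, sigma_w = {sigma_w:.4f} lb")
print(f"sum of standard deviations = {naive_sigma:.1f} lb")
# → mu_w = 3.0 lb, var_w = 0.03, sigma_w = 0.1732 lb
# → sum of standard deviations = 0.3 lb
```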


In [8]:

import numpy as np 
from IPython.display import display, Math 
import sys
sys.path.insert(0, '..')
import resources.datum as datum 
import resources.glyph as glyph

data = datum.Data()
dash = glyph.Glyph(title='Popcorn Company')

mu = 3          # mean total tin weight (lb)
sigma = 0.1732  # sqrt(0.03), rounded to four decimals
N = 100



x = np.linspace(1, 5, N)
y = data.get_normal_dist(x = x, mu = mu, sigma = sigma)


# tin weight probabilities 
dash.make_line(x = x, y = y, width = 2, label = 'popcorn weights')
dash.make_vert_line(x = x
                    , data_point = mu
                    , mu = mu
                    , sigma = sigma
                    , label = "mean")

# standard deviation times 1
dash.make_vert_line(x = x
                    , data_point = (mu + sigma)
                    , mu = mu
                    , sigma = sigma
                    , label = "+ 1 std dev"
                    , color = 'goldenrod'
                    )

# standard deviation times - 1
dash.make_vert_line(x = x
                    , data_point = (mu - sigma)
                    , mu = mu
                    , sigma = sigma
                    , label = "- 1 std dev"
                    , color = 'goldenrod'
                    )

# tin weight of 3.25 pounds 

tin_weight = 3.25 # for image only

dash.make_vert_line(x = x
                    , data_point = tin_weight
                    , mu = mu
                    , sigma = sigma
                    , label = "3.25 pds"
                    , color = 'red'
                    ,alpha = 0.90
                    )





dash.show()

msg = '\\displaystyle \\mu : %s\\\\~\\\\'
msg = msg + '\\sigma: %s \\\\~\\\\'

display(Math(msg%(mu, sigma)))



<p>
The plot is a useful picture of the distribution, but it doesn't answer the question: what is the probability that a tin weighs less than 3.25 pounds?<br><br>

Notice that $\displaystyle \color{red}3.25$ lies beyond one standard deviation above the mean: $\displaystyle \color{goldenrod} \mu_w + \sigma_w \approx 3 + 0.1732 = 3.1732$ <br><br></p>

<p><b>
3. Calculate the probability that w is less than 3.25 pounds</b><br>
<br><br>
</p>
<b>
a. Calculate the distance between the data point and the mean<br><br>
</b>

$\displaystyle \qquad 3.25 - 3 = 0.25$<br><br>

<b>
b. Convert the distance 0.25 into standard deviations (this is the z-score)<br><br>
</b>

$\displaystyle \qquad z = \dfrac{\text{data point} - \mu_w}{\sigma_w} = \dfrac{0.25}{0.1732} \approx 1.44$<br><br>
&emsp;So 3.25 is about 1.44 standard deviations above the mean<br><br>
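
Steps a and b as a minimal Python sketch. The variable names are illustrative. $\sigma_w$ is computed from the variance 0.03 rather than the rounded 0.1732.

```python
import math

mu_w = 3.0                     # mean total tin weight (lb)
sigma_w = math.sqrt(0.03)      # standard deviation of total tin weight (lb), about 0.1732
tin_weight = 3.25              # weight in question (lb)

distance = tin_weight - mu_w   # step a: distance from the mean
z = distance / sigma_w         # step b: that distance measured in standard deviations

print(f"distance = {distance:.2f} lb, z = {z:.4f}")
# → distance = 0.25 lb, z = 1.4434
```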

<b>
c. Look up the cumulative probability for the z-score 1.44<br>
</b>
&emsp;This is the area under the standard normal curve to the left of $z = 1.44$ (see code below)
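
The notebook cell that follows calls `data.get_z_percentile` from a local `resources.datum` module whose source isn't shown. Presumably it returns the standard normal cumulative probability. If you don't have that module, here is a minimal sketch using the standard library's `statistics.NormalDist`. It reaches the answer either by standardizing first or by querying $N(\mu_w, \sigma_w)$ directly. Both routes give about 0.925; rounding z to 1.44, as a printed z-table does, only shifts the answer in the fourth decimal place.

```python
from statistics import NormalDist

mu_w = 3.0                  # mean total tin weight (lb)
sigma_w = 0.03 ** 0.5       # standard deviation of total tin weight (lb), about 0.1732
tin_weight = 3.25           # weight in question (lb)

# Route 1: standardize, then read the standard normal CDF (what a z-table lookup does)
z = (tin_weight - mu_w) / sigma_w
p_via_z = NormalDist().cdf(z)

# Route 2: ask N(mu_w, sigma_w) directly; mathematically identical to route 1
p_direct = NormalDist(mu_w, sigma_w).cdf(tin_weight)

# A printed z-table works with z rounded to two decimals (1.44 here)
p_table = NormalDist().cdf(round(z, 2))

print(f"z = {z:.4f}")
print(f"P(W < {tin_weight}) = {p_direct:.4f} (via z: {p_via_z:.4f}, z-table with z = {round(z, 2)}: {p_table:.4f})")
```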

In [2]:

from IPython.display import display, Math
import sys
sys.path.insert(0, '..')
import resources.datum as datum 

data = datum.Data()

data_point = 3.25

# z-score of 3.25 lb (0.25 / 0.1732, rounded to two decimals)
z_score = 1.44

# helper from the notebook's local resources module; presumably returns P(Z < z_score)
prob = data.get_z_percentile(z_score)

msg = '\\displaystyle \\text{The z-score %s has a cumulative probability of %s}\\\\~\\\\'
msg = msg + '\\large \\color{forestgreen} \\text{Therefore the probability of a tin weighing less than %s pounds is %s pct}'

display(Math(msg%(
    z_score, round(prob, 4)
    , data_point, round((prob * 100), 2)
)))


In [3]:

import numpy as np 
from IPython.display import display, Math 
import sys
sys.path.insert(0, '..')
import resources.datum as datum 
import resources.glyph as glyph

data = datum.Data()
dash = glyph.Glyph(title='Popcorn Company')

mu = 3          # mean total tin weight (lb)
sigma = 0.1732  # sqrt(0.03), rounded to four decimals
N = 100



x = np.linspace(1, 5, N)
y = data.get_normal_dist(x = x, mu = mu, sigma = sigma)


# tin weight probabilities 
dash.make_line(x = x
               , y = y
               , width = 2
               , color = 'white'
               , label = 'popcorn weights')

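# shade the area under the curve to the left of 3.25 lb
# (np.searchsorted counts how many x values are <= 3.25, since x is sorted)
tin_weight = 3.25
n_shade = int(np.searchsorted(x, tin_weight, side='right'))
x_area = x[ : n_shade]
y1_area = np.zeros(len(x_area))
y2_area = y[ : n_shade]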

dash.make_varea(x = x_area
                , y1 = y1_area
                , y2 = y2_area
                , fill_color = 'forestgreen'
                , fill_alpha = 0.50
                , legend_label = "92 pct probability")


dash.show()

msg = '\\displaystyle \\mu : %s\\\\~\\\\'
msg = msg + '\\sigma: %s \\\\~\\\\'

display(Math(msg%(mu, sigma)))
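
As an independent sanity check of the shaded area, here is a minimal Monte Carlo sketch. It builds many simulated tins from three independent flavor bins, each with weight $N(1, 0.1)$ in pounds, and counts how many weigh under 3.25 pounds. The seed and number of tins are arbitrary choices. The result should land near the roughly 92.5 pct found above.

```python
import random

rng = random.Random(2024)   # fixed seed for a repeatable run
n_tins = 100_000

# Each simulated tin is the sum of three independent flavor bins, each ~ N(1 lb, 0.1 lb)
tins = [sum(rng.gauss(1.0, 0.1) for _ in range(3)) for _ in range(n_tins)]

share_below = sum(w < 3.25 for w in tins) / n_tins
print(f"simulated share of tins under 3.25 lb: {share_below:.4f}")
```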
