# Symbulate Lab 6 - Joint and Conditional Distributions

This Jupyter notebook provides a template for you to fill in.  Read the notebook from start to finish, completing the parts as indicated.  To run a cell, make sure the cell is highlighted by clicking on it, then press SHIFT + ENTER on your keyboard.  (Alternatively, you can click the "play" button in the toolbar above.)

In this lab you will use the Symbulate package.  Many of the commands are discussed in the [Multiple RV Section](https://dlsun.github.io/symbulate/joint.html), the [Conditioning Section](https://dlsun.github.io/symbulate/conditioning.html), or the [Graphics Section](https://dlsun.github.io/symbulate/graphics.html) of the [Symbulate documentation](https://dlsun.github.io/symbulate/index.html). **You should use Symbulate commands whenever possible.**  If you find yourself writing long blocks of Python code, you are probably doing something wrong.  For example, you should not need to write any `for` loops.

There are 3 parts, and at the end of each part there are some reflection questions.  There is no need to type a response to the reflection questions, but you should think about them and discuss them with your partner to try to make sense of your simulation results.

**Warning:** You may notice that many of the cells in this notebook are not editable. This is intentional and for your own safety. We have made these cells read-only so that you don't accidentally modify or delete them. However, you should still be able to execute the code in these cells.

In [1]:
from symbulate import *
%matplotlib inline

# Part I: Two discrete random variables

Roll a fair six-sided die five times and let $X$ be the largest of the five rolls and $Y$ the smallest.

Before proceeding, make some guesses about how the following will behave.
- Joint distribution of $X$ and $Y$
- Conditional distribution of $Y$ given $X=5$.

# a)

Define the random variables $X$ and $Y$.

In [2]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# b)

Simulate 10000 $(X, Y)$ pairs and store the values as `xy`.  Estimate the covariance and the correlation.  ([Hint](https://dlsun.github.io/symbulate/joint.html#ampersand) and [hint](https://dlsun.github.io/symbulate/joint.html#cov) and [hint](https://dlsun.github.io/symbulate/joint.html#corr))

In [3]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# c)

Make a scatterplot of the simulated values.  ([Hint](https://dlsun.github.io/symbulate/joint.html#plot).  Note that using `jitter=True` is recommended when the variables involved are discrete.)

In [4]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# d)

Make a tile plot of the simulated values.  ([Hint](https://dlsun.github.io/symbulate/graphics.html#tile))

In [5]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# e)

Use simulation to approximate the conditional distribution of $Y$ given $X=5$ and approximate the conditional mean $E(Y | X=5)$ and the conditional standard deviation.  ([Hint](https://dlsun.github.io/symbulate/conditioning.html#pipe), but also see all of the [Conditioning Section](https://dlsun.github.io/symbulate/conditioning.html).)

In [6]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# f) Reflection questions

Recall the guesses you made at the start of the problem, and inspect your results from the previous parts.  Can you explain the behavior you observed for the following?

- Joint distribution of $X$ and $Y$
- Conditional distribution of $Y$ given $X=5$.

**TYPE YOUR RESPONSE HERE.**
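
Because five rolls of a die have only $6^5 = 7776$ equally likely outcomes, the simulation results in this part can be compared against exact values. The following is a minimal sketch of such a check in plain Python (standard library only). It is not a Symbulate solution and does not replace the simulation asked for above; the variable names (`joint`, `expect`, `cond_y_given_x5`, and so on) are illustrative choices, not part of the lab.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Enumerate all 6**5 equally likely outcomes and count each (max, min) pair.
counts = Counter((max(r), min(r)) for r in product(range(1, 7), repeat=5))
total = sum(counts.values())

# Exact joint pmf of (X, Y) = (largest roll, smallest roll).
joint = {xy: Fraction(n, total) for xy, n in counts.items()}


def expect(g):
    """Exact expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey
var_x = expect(lambda x, y: x * x) - ex**2
var_y = expect(lambda x, y: y * y) - ey**2
corr = float(cov) / sqrt(float(var_x * var_y))

# Conditional pmf of Y given X = 5: rescale the X = 5 slice of the joint pmf.
p_x5 = sum(p for (x, _), p in joint.items() if x == 5)
cond_y_given_x5 = {y: p / p_x5 for (x, y), p in sorted(joint.items()) if x == 5}
cond_mean = sum(y * p for y, p in cond_y_given_x5.items())
cond_var = sum(y * y * p for y, p in cond_y_given_x5.items()) - cond_mean**2
cond_sd = sqrt(float(cond_var))

print("Cov(X, Y):", float(cov), " Corr(X, Y):", corr)
print("P(Y = y | X = 5):", {y: float(p) for y, p in cond_y_given_x5.items()})
print("E(Y | X = 5):", float(cond_mean), " SD(Y | X = 5):", cond_sd)
```

Your simulated covariance, correlation, conditional mean, and conditional SD should be close to these values, though not identical, since a simulation of 10000 repetitions has sampling variability.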

# Part II: Two continuous random variables

Suppose that the base $U$ and height $V$ of a random rectangle are independent random variables, with each following a Uniform(0, 1) distribution.  Let $X$ be the perimeter of the rectangle and $Y$ its area.  In this part you will investigate the joint distribution of $X$ and $Y$.

Before proceeding, make some guesses about how the following will behave.
- Joint distribution of $X$ and $Y$
- Marginal distribution of $Y$
- Conditional distribution of $Y$ given $X=2$.

# a)

Define appropriate random variables $U, V, X, Y$.  ([Hint](https://dlsun.github.io/symbulate/joint.html#unpack), but also see the [Multiple RV Section](https://dlsun.github.io/symbulate/joint.html) in general.)

In [7]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# b)

Simulate 10000 $(X, Y)$ pairs and store the values as `xy`.  Estimate the covariance and the correlation.  ([Hint](https://dlsun.github.io/symbulate/joint.html#ampersand) and [hint](https://dlsun.github.io/symbulate/joint.html#cov) and [hint](https://dlsun.github.io/symbulate/joint.html#corr))

In [8]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# c)

Make a scatterplot of the simulated values.  ([Hint](https://dlsun.github.io/symbulate/graphics.html#scatter))

In [9]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# d)

Make a two-dimensional histogram of the simulated values.  ([Hint](https://dlsun.github.io/symbulate/graphics.html#hist2d))

In [10]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# e)

Make a two-dimensional density plot of the simulated values.  ([Hint](https://dlsun.github.io/symbulate/graphics.html#density2d))

In [11]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# f)

Use simulation to approximate the marginal distribution of $Y$ and approximate its mean and standard deviation.

In [12]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# g)

Use simulation to approximate the conditional distribution of $Y$ given $X=2$ and approximate the conditional mean $E(Y | X=2)$ and the conditional standard deviation.  (Warning: be careful!  See this [hint](https://dlsun.github.io/symbulate/conditioning.html#pipe) and especially this [hint](https://dlsun.github.io/symbulate/conditioning.html#continuous).)

In [13]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.
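
The warning in part g) arises because $X$ is continuous, so $P(X = 2) = 0$ and a simulation that keeps only repetitions with $X$ exactly equal to 2 would never keep any. A common workaround is to condition on $X$ falling in a small interval around 2. The following is a rough sketch of that idea in plain Python (standard library only), written with an explicit loop purely to show what happens under the hood; in the lab itself you should use Symbulate's conditioning syntax instead. The tolerance `eps`, the seed, and the number of repetitions are arbitrary illustrative choices.

```python
import random
from statistics import mean, stdev

random.seed(12345)  # fixed seed so the sketch is reproducible

n_sims = 200_000    # total number of simulated rectangles (assumed value)
eps = 0.05          # half-width of the window around X = 2 (assumed value)

areas_near_2 = []
for _ in range(n_sims):
    u, v = random.random(), random.random()  # independent Uniform(0, 1) base and height
    perimeter = 2 * (u + v)
    area = u * v
    # Keep the area only when the perimeter is "close enough" to 2.
    if abs(perimeter - 2) < eps:
        areas_near_2.append(area)

cond_mean = mean(areas_near_2)
cond_sd = stdev(areas_near_2)

print("repetitions kept:", len(areas_near_2), "out of", n_sims)
print("approximate E(Y | X = 2):", cond_mean)
print("approximate SD(Y | X = 2):", cond_sd)
```

There is a trade-off in choosing the window: a smaller `eps` approximates the event $X = 2$ more closely but discards more repetitions, so more simulations are needed for a stable estimate.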

# h) Reflection questions

Recall the guesses you made at the start of the problem, and inspect your results from the previous parts.  Can you explain the behavior you observed for the following?

- Joint distribution of $X$ and $Y$
- Marginal distribution of $Y$
- Conditional distribution of $Y$ given $X=2$.

**TYPE YOUR RESPONSE HERE.**

# Part III: Joint Gaussian random variables

Just as Gaussian (Normal) distributions are the most important probability distributions, joint Gaussian (Multivariate Normal) distributions are the most important joint distributions.  In this part you will investigate two random variables that have a joint Gaussian distribution.

Suppose that SAT Math ($M$) and Reading ($R$) scores of Cal Poly students have a Bivariate Normal (joint Gaussian) distribution.
- Math scores have mean 635 and SD 85.
- Reading scores have mean 595 and SD 70.
- The correlation between scores is 0.6.

Let $X = M + R$, the total of the two scores.  Let $Y = M - R$, the difference between Math and Reading scores.

# a)

Define RVs $M, R, X, Y$. ([Hint](https://dlsun.github.io/symbulate/common_joint.html#bvn))

In [14]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# b)

Simulate 10000 $(M, R)$ pairs.  Use the simulation results to approximate $E(M)$, $E(R)$, $SD(M)$, $SD(R)$, and $Corr(M, R)$. 

In [15]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# c)

Make a scatterplot of the simulated values.  Add histograms of the marginal distributions.  (Hint: `.plot(type=["scatter", "marginal"])`.)

In [16]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# d)

Make a density plot of the simulated values.  Add density plots of the marginal distributions.  (Hint: `.plot(type=["density", "marginal"])`.)

In [17]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# e)

Now simulate 10000 values of $X = M+R$.  Plot the approximate distribution of $X$ and estimate $E(X)$ and $SD(X)$.

In [18]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# f)

Now simulate 10000 values of $Y = M - R$.  Plot the approximate distribution of $Y$ and estimate $E(Y)$ and $SD(Y)$.

In [19]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# g)

Use simulation to approximate the distribution of $M$ given $R=700$.  Make a plot of the approximate distribution and estimate the conditional mean $E(M | R = 700)$ and the conditional standard deviation.  (Warning: be careful!  See this [hint](https://dlsun.github.io/symbulate/conditioning.html#pipe) and especially this [hint](https://dlsun.github.io/symbulate/conditioning.html#continuous).)

In [20]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

# h) Reflection questions

Inspect your results from the previous parts.

- How would you describe the shape of the scatterplot/density plot of $M$ and $R$?
- How would you describe the marginal distributions of $M$ and $R$?
- How does the distribution of $M+R$ compare to the distribution of $M-R$?  In particular, how do the SDs compare?  How do the SDs compare to the case when $M$ and $R$ are independent? Can you explain why this makes sense?
- How would you describe the conditional distribution of $M$ given $R=700$?  How does it compare to the marginal distribution of $M$?  Can you explain why this makes sense?  Be sure to consider both the mean and the SD.

**TYPE YOUR RESPONSE HERE.**
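
The simulation results in this part can be compared against theoretical values. The sketch below, in plain Python rather than Symbulate, uses the standard formulas for the variance of a sum or difference, $Var(M \pm R) = Var(M) + Var(R) \pm 2\,Cov(M, R)$, and the standard Bivariate Normal results for the conditional mean and SD of $M$ given $R = r$. The variable names are illustrative and not part of the lab.

```python
from math import sqrt

# Parameters stated in the problem.
mean_m, sd_m = 635, 85   # Math
mean_r, sd_r = 595, 70   # Reading
rho = 0.6                # Corr(M, R)

cov_mr = rho * sd_m * sd_r

# Sum X = M + R and difference Y = M - R.
mean_sum = mean_m + mean_r
mean_diff = mean_m - mean_r
sd_sum = sqrt(sd_m**2 + sd_r**2 + 2 * cov_mr)
sd_diff = sqrt(sd_m**2 + sd_r**2 - 2 * cov_mr)

# For comparison: the SD of either M + R or M - R if M and R were independent.
sd_if_independent = sqrt(sd_m**2 + sd_r**2)

# Conditional distribution of M given R = r0 (Bivariate Normal formulas).
r0 = 700
cond_mean = mean_m + rho * (sd_m / sd_r) * (r0 - mean_r)
cond_sd = sd_m * sqrt(1 - rho**2)

print("E(X), SD(X):", mean_sum, sd_sum)
print("E(Y), SD(Y):", mean_diff, sd_diff)
print("SD of sum or difference under independence:", sd_if_independent)
print("E(M | R = 700), SD(M | R = 700):", cond_mean, cond_sd)
```

Your simulated estimates should be close to these values. For the conditional quantities, the agreement also depends on how narrow an interval around $R = 700$ you condition on.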

## Submission Instructions

Before you submit this notebook, click the "Kernel" drop-down menu at the top of this page and select "Restart & Run All". This will ensure that all of the code in your notebook executes properly. Please fix any errors, and repeat the process until the entire notebook executes without any errors.