:label:sec_random_variables
In :numref:sec_prob
we saw the basics of how to work with discrete random variables, which in our case refer to those random variables which take either a finite set of possible values, or the integers. In this section, we develop the theory of continuous random variables, which are random variables which can take on any real value.
Continuous random variables are a significantly more subtle topic than discrete random variables. A fair analogy to make is that the technical jump is comparable to the jump between adding lists of numbers and integrating functions. As such, we will need to take some time to develop the theory.
To understand the additional technical challenges encountered when working with continuous random variables, let us perform a thought experiment. Suppose that we are throwing a dart at the dart board, and we want to know the probability that it hits exactly $2 \text{cm}$ from the center of the board.

To start with, we imagine measuring a single digit of accuracy, that is to say with bins for $0 \text{cm}$, $1 \text{cm}$, $2 \text{cm}$, and so on. We throw say $100$ darts at the dart board, and if $20$ of them fall into the bin for $2 \text{cm}$, we conclude that $20\%$ of the darts we throw hit the board $2 \text{cm}$ away from the center.

However, when we look closer, this does not match our question! We wanted exact equality, whereas these bins hold all darts that fell between say $1.5 \text{cm}$ and $2.5 \text{cm}$.

Undeterred, we continue further. We measure even more precisely, say with bins at $1.9 \text{cm}$, $2.0 \text{cm}$, $2.1 \text{cm}$, and now see that perhaps $3$ of the $100$ darts hit the board in the $2.0 \text{cm}$ bucket. Thus we conclude the probability is $3\%$.

However, this does not solve anything! We have just pushed the issue down one digit further. Let us abstract a bit. Imagine we know the probability that the first $k$ digits match with $2.00000\ldots$ and we want to know the probability it matches for the first $k+1$ digits. It is fairly reasonable to assume that the $(k+1)^{\mathrm{th}}$ digit is essentially a random choice from the set $\{0, 1, 2, \ldots, 9\}$. At least, we cannot conceive of a physically meaningful process which would force the number of micrometers away from the center to prefer to end in a $7$ rather than a $3$.

What this means is that in essence each additional digit of accuracy we require should decrease the probability of matching by a factor of $10$. Or put another way, we would expect that

$$P(\text{distance is}\; 2.00\ldots, \;\text{to}\; k \;\text{digits} ) \approx p\cdot10^{-k}.$$

The value $p$ essentially encodes what happens with the first few digits, and the $10^{-k}$ handles the rest.

Notice that if we know the position accurate to $k=4$ digits after the decimal, that means we know the value falls within the interval $[1.99995, 2.00005]$, which is an interval of length $2.00005 - 1.99995 = 10^{-4}$. Thus, if we call the length of this interval $\epsilon$, we can say

$$P(\text{distance is in an}\; \epsilon\text{-sized interval around}\; 2 ) \approx \epsilon \cdot p.$$

Let us take this one final step further. We have been thinking about the point $2$ the entire time, but never thinking about other points. Nothing is fundamentally different there, but it is the case that the value $p$ will likely be different: we would at least hope that a dart thrower was more likely to hit a point near the center than one far away. Thus, the value $p$ is not fixed, but rather should depend on the point $x$. This tells us that we should expect

$$P(\text{distance is in an}\; \epsilon \text{-sized interval around}\; x ) \approx \epsilon \cdot p(x).$$
:eqlabel:eq_pdf_deriv

Indeed, :eqref:eq_pdf_deriv precisely defines the *probability density function*. It is a function $p(x)$ which encodes the relative probability of hitting near one point versus another. Let us visualize what such a function might look like.
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
from mxnet import np, npx
npx.set_np()
# Plot the probability density function for some random variable
x = np.arange(-5, 5, 0.01)
p = 0.2*np.exp(-(x - 3)**2 / 2)/np.sqrt(2 * np.pi) + \
    0.8*np.exp(-(x + 1)**2 / 2)/np.sqrt(2 * np.pi)
d2l.plot(x, p, 'x', 'Density')
#@tab pytorch
%matplotlib inline
from d2l import torch as d2l
from IPython import display
import torch
torch.pi = torch.acos(torch.zeros(1)).item() * 2 # Define pi in torch
# Plot the probability density function for some random variable
x = torch.arange(-5, 5, 0.01)
p = 0.2*torch.exp(-(x - 3)**2 / 2)/torch.sqrt(2 * torch.tensor(torch.pi)) + \
    0.8*torch.exp(-(x + 1)**2 / 2)/torch.sqrt(2 * torch.tensor(torch.pi))
d2l.plot(x, p, 'x', 'Density')
#@tab tensorflow
%matplotlib inline
from d2l import tensorflow as d2l
from IPython import display
import tensorflow as tf
tf.pi = tf.acos(tf.zeros(1)).numpy() * 2 # Define pi in TensorFlow
# Plot the probability density function for some random variable
x = tf.range(-5, 5, 0.01)
p = 0.2*tf.exp(-(x - 3)**2 / 2)/tf.sqrt(2 * tf.constant(tf.pi)) + \
    0.8*tf.exp(-(x + 1)**2 / 2)/tf.sqrt(2 * tf.constant(tf.pi))
d2l.plot(x, p, 'x', 'Density')
The locations where the function value is large indicate regions where we are more likely to find the random variable, while the low portions are areas where we are unlikely to find it.
Let us now investigate this further. We have already seen what a probability density function is intuitively for a random variable $X$, namely the density is a function $p(x)$ so that

$$P(X \;\text{is in an}\; \epsilon\text{-sized interval around}\; x ) \approx \epsilon \cdot p(x).$$
:eqlabel:eq_pdf_def

But what does this imply for the properties of $p(x)$?

First, probabilities are never negative, thus we should expect that $p(x) \ge 0$ as well.

Second, let us imagine that we slice up $\mathbb{R}$ into an infinite number of slices which are $\epsilon$ wide, say with slices $(\epsilon\cdot i, \epsilon \cdot (i+1)]$. For each of these, we know from :eqref:eq_pdf_def the probability is approximately

$$P(X \in (\epsilon\cdot i, \epsilon \cdot (i+1)]) \approx \epsilon \cdot p(\epsilon \cdot i),$$

so summed over all of them it should be

$$P(X\in\mathbb{R}) \approx \sum_i \epsilon \cdot p(\epsilon\cdot i).$$

This is nothing more than the approximation of an integral discussed in :numref:sec_integral_calculus, thus we can say that

$$P(X\in\mathbb{R}) = \int_{-\infty}^{\infty} p(x) \; dx.$$

We know that $P(X\in\mathbb{R}) = 1$, since the random variable must take on *some* number, so we can conclude that for any density

$$\int_{-\infty}^{\infty} p(x) \; dx = 1.$$

Indeed, digging into this further shows that for any $a$ and $b$,

$$P(X\in(a, b]) = \int _ {a}^{b} p(x) \; dx.$$
We may approximate this in code by using the same discrete approximation methods as before. In this case we can approximate the probability of falling in the blue region.
# Approximate probability using numerical integration
epsilon = 0.01
x = np.arange(-5, 5, 0.01)
p = 0.2*np.exp(-(x - 3)**2 / 2) / np.sqrt(2 * np.pi) + \
    0.8*np.exp(-(x + 1)**2 / 2) / np.sqrt(2 * np.pi)

d2l.set_figsize()
d2l.plt.plot(x, p, color='black')
d2l.plt.fill_between(x.tolist()[300:800], p.tolist()[300:800])
d2l.plt.show()

f'approximate probability: {np.sum(epsilon*p[300:800])}'
#@tab pytorch
# Approximate probability using numerical integration
epsilon = 0.01
x = torch.arange(-5, 5, 0.01)
p = 0.2*torch.exp(-(x - 3)**2 / 2) / torch.sqrt(2 * torch.tensor(torch.pi)) + \
    0.8*torch.exp(-(x + 1)**2 / 2) / torch.sqrt(2 * torch.tensor(torch.pi))

d2l.set_figsize()
d2l.plt.plot(x, p, color='black')
d2l.plt.fill_between(x.tolist()[300:800], p.tolist()[300:800])
d2l.plt.show()

f'approximate probability: {torch.sum(epsilon*p[300:800])}'
#@tab tensorflow
# Approximate probability using numerical integration
epsilon = 0.01
x = tf.range(-5, 5, 0.01)
p = 0.2*tf.exp(-(x - 3)**2 / 2) / tf.sqrt(2 * tf.constant(tf.pi)) + \
    0.8*tf.exp(-(x + 1)**2 / 2) / tf.sqrt(2 * tf.constant(tf.pi))

d2l.set_figsize()
d2l.plt.plot(x, p, color='black')
d2l.plt.fill_between(x.numpy().tolist()[300:800], p.numpy().tolist()[300:800])
d2l.plt.show()

f'approximate probability: {tf.reduce_sum(epsilon*p[300:800])}'
It turns out that these two properties describe exactly the space of possible probability density functions (or *p.d.f.*'s for the commonly encountered abbreviation). They are non-negative functions $p(x) \ge 0$ such that

$$\int_{-\infty}^{\infty} p(x) \; dx = 1.$$
:eqlabel:eq_pdf_int_one

We interpret this function by using integration to obtain the probability our random variable is in a specific interval:

$$P(X\in(a, b]) = \int _ {a}^{b} p(x) \; dx.$$
:eqlabel:eq_pdf_int_int
In :numref:sec_distributions we will see a number of common distributions, but let us continue working in the abstract.
In the previous section, we saw the notion of the p.d.f. In practice, this is a commonly encountered method to discuss continuous random variables, but it has one significant pitfall: the values of the p.d.f. are not themselves probabilities, but rather a function that we must integrate to yield probabilities. There is nothing wrong with a density being larger than $10$, as long as it is not larger than $10$ for more than an interval of length $1/10$. This can be counter-intuitive, so people often also think in terms of the *cumulative distribution function*, or c.d.f., which *is* a probability.

In particular, by using :eqref:eq_pdf_int_int, we define the c.d.f. for a random variable $X$ with density $p(x)$ by

$$F(x) = \int _ {-\infty}^{x} p(y) \; dy = P(X \le x).$$
Let us observe a few properties.
- $F(x) \rightarrow 0$ as $x\rightarrow -\infty$.
- $F(x) \rightarrow 1$ as $x\rightarrow \infty$.
- $F(x)$ is non-decreasing ($y > x \implies F(y) \ge F(x)$).
- $F(x)$ is continuous (has no jumps) if $X$ is a continuous random variable.
With the fourth bullet point, note that this would not be true if $X$ were a discrete random variable, say taking the values $0$ and $1$ both with probability $1/2$. In that case

$$F(x) = \begin{cases} 0 & x < 0, \\ \frac{1}{2} & 0 \le x < 1, \\ 1 & x \ge 1, \end{cases}$$

and the c.d.f. jumps at the points the random variable can take.
In this example, we see one of the benefits of working with the c.d.f.: the ability to deal with continuous or discrete random variables in the same framework, or indeed mixtures of the two (flip a coin: if heads return the roll of a die, if tails return the distance of a dart throw from the center of a dart board).
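To make the c.d.f. concrete, we can approximate it numerically by accumulating $\epsilon$-sized slices of probability, just as we approximated interval probabilities above. The following is a minimal sketch in plain NumPy and Matplotlib (rather than the tabbed frameworks), reusing the same two-bump density plotted earlier:

# Approximate the c.d.f. by accumulating epsilon-sized slices of density
import numpy as np
import matplotlib.pyplot as plt

epsilon = 0.01
x = np.arange(-5, 5, epsilon)
p = 0.2*np.exp(-(x - 3)**2 / 2) / np.sqrt(2 * np.pi) + \
    0.8*np.exp(-(x + 1)**2 / 2) / np.sqrt(2 * np.pi)
F = np.cumsum(epsilon * p)  # running integral of the density
plt.plot(x, F)
plt.xlabel('x')
plt.ylabel('c.d.f.')
plt.show()

Note how the curve rises from $0$ to $1$ and never decreases, matching the properties listed above.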
Suppose that we are dealing with a random variable $X$. The distribution itself can be hard to interpret, so it is often useful to be able to summarize the behavior of a random variable concisely. Numbers that help us capture this behavior are called *summary statistics*. The most commonly encountered ones are the *mean*, the *variance*, and the *standard deviation*.
The mean encodes the average value of a random variable. If we have a discrete random variable $X$, which takes values $x_i$ with probabilities $p_i$, then the mean is given by the weighted average: sum the values times the probability that the random variable takes on that value:

$$\mu_X = E[X] = \sum_i x_i p_i.$$
:eqlabel:eq_exp_def
The way we should interpret the mean (albeit with caution) is that it tells us essentially where the random variable tends to be located.
As a minimalistic example that we will examine throughout this section, let us take $X$ to be the random variable which takes the value $a-2$ with probability $p$, $a+2$ with probability $p$, and $a$ with probability $1-2p$. We can compute using :eqref:eq_exp_def that, for any possible choice of $a$ and $p$, the mean is

$$\mu_X = E[X] = (a-2)p + a(1-2p) + (a+2)p = a.$$

Thus we see that the mean is $a$. This matches the intuition since $a$ is the location around which we centered our random variable.
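As a quick sanity check (a plain-NumPy sketch; the particular values of $a$ and $p$ are arbitrary choices for illustration), we can confirm that the weighted average returns $a$:

# Verify the mean of the three-point example by direct computation
import numpy as np

a, p = 1.0, 0.3  # arbitrary illustrative choices
values = np.array([a - 2, a, a + 2])
probs = np.array([p, 1 - 2*p, p])
print(np.sum(values * probs))  # prints 1.0, i.e., a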
Because they are helpful, let us summarize a few properties.
- For any random variable $X$ and numbers $a$ and $b$, we have that $\mu_{aX+b} = a\mu_X + b$.
- If we have two random variables $X$ and $Y$, we have $\mu_{X+Y} = \mu_X+\mu_Y$.
Means are useful for understanding the average behavior of a random variable, however the mean is not sufficient to even have a full intuitive understanding. Making a profit of $\$10 \pm \$1$ per sale is very different from making $\$10 \pm \$15$ per sale despite having the same average value. The second one has a much larger degree of fluctuation, and thus represents a much larger risk. Thus, to understand the behavior of a random variable, we will need at minimum one more measure: some measure of how widely a random variable fluctuates.
This leads us to consider the *variance* of a random variable. This is a quantitative measure of how far a random variable deviates from the mean. Consider the expression $X - \mu_X$. This is the deviation of the random variable from its mean. The value can be positive or negative, so we need to do something to make it positive so that we are measuring the magnitude of the deviation.

A reasonable thing to try is to look at $\left|X-\mu_X\right|$, and indeed this leads to a useful quantity called the *mean absolute deviation*, however due to connections with other areas of mathematics and statistics, people often use a different solution.

In particular, they look at $(X-\mu_X)^2$. If we look at the typical size of this quantity by taking the mean, we arrive at the variance

$$\sigma_X^2 = \mathrm{Var}(X) = E\left[(X-\mu_X)^2\right] = E[X^2] - \mu_X^2.$$
:eqlabel:eq_var_def
The last equality in :eqref:eq_var_def holds by expanding out the definition in the middle, and applying the properties of expectation.
Let us look at our example where $X$ takes the value $a-2$ with probability $p$, $a+2$ with probability $p$, and $a$ with probability $1-2p$. In this case $\mu_X = a$, so all we need to compute is $E\left[X^2\right]$. This can readily be done:

$$E\left[X^2\right] = (a-2)^2p + a^2(1-2p) + (a+2)^2p = a^2 + 8p.$$

Thus, we see that by :eqref:eq_var_def our variance is

$$\sigma_X^2 = \mathrm{Var}(X) = E[X^2] - \mu_X^2 = a^2 + 8p - a^2 = 8p.$$

This result again makes sense. The largest $p$ can be is $1/2$, which corresponds to picking $a-2$ or $a+2$ with a coin flip. The variance of this being $4$ corresponds to the fact that both $a-2$ and $a+2$ are $2$ units away from the mean, and $2^2 = 4$. On the other end of the spectrum, if $p=0$, this random variable always takes the value $a$ and so it has no variance at all.
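We can check this formula numerically with the same plain-NumPy sketch as before (again, $a$ and $p$ are arbitrary illustrative choices):

# Verify that the variance of the three-point example is 8p
import numpy as np

a, p = 1.0, 0.3
values = np.array([a - 2, a, a + 2])
probs = np.array([p, 1 - 2*p, p])
mu = np.sum(values * probs)
var = np.sum((values - mu)**2 * probs)  # E[(X - mu)^2]
print(var, 8*p)  # both print 2.4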
We will list a few properties of variance below:
- For any random variable $X$, $\mathrm{Var}(X) \ge 0$, with $\mathrm{Var}(X) = 0$ if and only if $X$ is a constant.
- For any random variable $X$ and numbers $a$ and $b$, we have that $\mathrm{Var}(aX+b) = a^2\mathrm{Var}(X)$.
- If we have two *independent* random variables $X$ and $Y$, we have $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
When interpreting these values, there can be a bit of a hiccup. In particular, let us try imagining what happens if we keep track of units through this computation. Suppose that we are working with the star rating assigned to a product on the web page. Then $X$ is measured in units of stars, and the mean $\mu_X$ is also measured in stars (being a weighted average). However, the variance involves $(X-\mu_X)^2$, which is in units of *squared stars*, so the variance itself is not comparable to the original measurements. To make it interpretable, we will need to return to our original units.

Luckily, such a summary statistic can always be deduced from the variance by taking the square root! Thus we define the *standard deviation* to be

$$\sigma_X = \sqrt{\mathrm{Var}(X)}.$$

In our example, this means the standard deviation is $\sigma_X = 2\sqrt{2p}$. If we are dealing with units of stars for our review example, $\sigma_X$ is again in units of stars.
The properties we had for the variance can be restated for the standard deviation.
- For any random variable $X$, $\sigma_{X} \ge 0$.
- For any random variable $X$ and numbers $a$ and $b$, we have that $\sigma_{aX+b} = |a|\sigma_{X}$.
- If we have two *independent* random variables $X$ and $Y$, we have $\sigma_{X+Y} = \sqrt{\sigma_{X}^2 + \sigma_{Y}^2}$.
It is natural at this moment to ask, "If the standard deviation is in the units of our original random variable, does it represent something we can draw with regards to that random variable?" The answer is a resounding yes! Indeed, much like the mean told us the typical location of our random variable, the standard deviation gives the typical range of variation of that random variable. We can make this rigorous with what is known as Chebyshev's inequality: for any $\alpha > 0$,

$$P\left(X \not\in [\mu_X - \alpha\sigma_X, \mu_X + \alpha\sigma_X]\right) \le \frac{1}{\alpha^2}.$$
:eqlabel:eq_chebyshev

Or to state it verbally in the case of $\alpha=10$: at least $99\%$ of the samples from any random variable fall within $10$ standard deviations of the mean. This gives an immediate interpretation to our standard summary statistics.
To see how this statement is rather subtle, let us take a look at our running example again where $X$ takes the value $a-2$ with probability $p$, $a+2$ with probability $p$, and $a$ with probability $1-2p$. We saw that the mean was $a$ and the standard deviation was $2\sqrt{2p}$. This means, if we take Chebyshev's inequality :eqref:eq_chebyshev with $\alpha = 2$, we see that

$$P\left(X \not\in [a - 4\sqrt{2p}, a + 4\sqrt{2p}]\right) \le \frac{1}{4}.$$

This means that $75\%$ of the time, this random variable will fall within this interval for any value of $p$. Now, notice that as $p \rightarrow 0$, this interval also converges to the single point $a$. But we know that our random variable takes the values $a-2, a$, and $a+2$ only, so eventually we can be certain $a-2$ and $a+2$ will fall outside the interval! The question is, at what $p$ does that happen? We want to solve: for what $p$ does $a+4\sqrt{2p} = a+2$, which is solved when $p=1/8$. This is *exactly* the first $p$ where it could possibly happen without violating our claim that no more than $1/4$ of samples from the distribution fall outside the interval ($1/8$ to the left, and $1/8$ to the right).

Let us visualize this. We will show the probability of getting the three values as three vertical bars with height proportional to the probability. The interval will be drawn as a horizontal line in the middle. The first plot shows what happens for $p > 1/8$ where the interval safely contains all points.
# Define a helper to plot these figures
def plot_chebyshev(a, p):
    d2l.set_figsize()
    d2l.plt.stem([a-2, a, a+2], [p, 1-2*p, p], use_line_collection=True)
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.hlines(0.5, a - 4 * np.sqrt(2 * p),
                   a + 4 * np.sqrt(2 * p), 'black', lw=4)
    d2l.plt.vlines(a - 4 * np.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.vlines(a + 4 * np.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.title(f'p = {float(p):.3f}')
    d2l.plt.show()

# Plot interval when p > 1/8
plot_chebyshev(0.0, 0.2)
#@tab pytorch
# Define a helper to plot these figures
def plot_chebyshev(a, p):
    d2l.set_figsize()
    d2l.plt.stem([a-2, a, a+2], [p, 1-2*p, p], use_line_collection=True)
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.hlines(0.5, a - 4 * torch.sqrt(2 * p),
                   a + 4 * torch.sqrt(2 * p), 'black', lw=4)
    d2l.plt.vlines(a - 4 * torch.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.vlines(a + 4 * torch.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.title(f'p = {float(p):.3f}')
    d2l.plt.show()

# Plot interval when p > 1/8
plot_chebyshev(0.0, torch.tensor(0.2))
#@tab tensorflow
# Define a helper to plot these figures
def plot_chebyshev(a, p):
    d2l.set_figsize()
    d2l.plt.stem([a-2, a, a+2], [p, 1-2*p, p], use_line_collection=True)
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.hlines(0.5, a - 4 * tf.sqrt(2 * p),
                   a + 4 * tf.sqrt(2 * p), 'black', lw=4)
    d2l.plt.vlines(a - 4 * tf.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.vlines(a + 4 * tf.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.title(f'p = {float(p):.3f}')
    d2l.plt.show()

# Plot interval when p > 1/8
plot_chebyshev(0.0, tf.constant(0.2))
The second shows that at $p = 1/8$, the interval exactly touches the two outer points. This shows that the inequality is *sharp*, since no smaller interval could be taken while keeping the inequality true.
# Plot interval when p = 1/8
plot_chebyshev(0.0, 0.125)
#@tab pytorch
# Plot interval when p = 1/8
plot_chebyshev(0.0, torch.tensor(0.125))
#@tab tensorflow
# Plot interval when p = 1/8
plot_chebyshev(0.0, tf.constant(0.125))
The third shows that for $p < 1/8$ the interval only contains the center. This does not invalidate the inequality since we only needed to ensure that no more than $1/4$ of the probability falls outside the interval, which means that once $p < 1/8$, the two points at $a-2$ and $a+2$ can be discarded.
# Plot interval when p < 1/8
plot_chebyshev(0.0, 0.05)
#@tab pytorch
# Plot interval when p < 1/8
plot_chebyshev(0.0, torch.tensor(0.05))
#@tab tensorflow
# Plot interval when p < 1/8
plot_chebyshev(0.0, tf.constant(0.05))
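We can also check Chebyshev's inequality empirically by sampling. The following plain-NumPy sketch (with $a = 0$ and $p = 0.05$, an arbitrary choice in the $p < 1/8$ regime) estimates the fraction of samples falling outside the $\alpha = 2$ interval:

# Empirically check Chebyshev's inequality with alpha = 2
import numpy as np

a, p = 0.0, 0.05
rng = np.random.default_rng(0)
samples = rng.choice([a - 2, a, a + 2], size=100000, p=[p, 1 - 2*p, p])
sigma = 2 * np.sqrt(2 * p)  # standard deviation derived above
outside = np.mean(np.abs(samples - a) > 2 * sigma)
print(outside)  # close to 2p = 0.1, safely below the bound of 1/4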
This has all been in terms of discrete random variables, but the case of continuous random variables is similar. To intuitively understand how this works, imagine that we split the real number line into intervals of length $\epsilon$ given by $(\epsilon i, \epsilon (i+1)]$. Once we do this, our continuous random variable has been made discrete and we can use :eqref:eq_exp_def to say that

$$\begin{aligned} \mu_X & \approx \sum_{i} (\epsilon i)P(X \in (\epsilon i, \epsilon (i+1)]) \\ & \approx \sum_{i} (\epsilon i)\epsilon p(\epsilon i), \end{aligned}$$

where $p(x)$ is the density of $X$. This is an approximation to the integral of $xp(x)$, so we can conclude that

$$\mu_X = \int_{-\infty}^\infty xp(x) \; dx.$$

Similarly, using :eqref:eq_var_def the variance can be written as

$$\sigma^2_X = E[X^2] - \mu_X^2 = \int_{-\infty}^\infty x^2p(x) \; dx - \left(\int_{-\infty}^\infty xp(x) \; dx\right)^2.$$

Everything stated above about the mean, the variance, and the standard deviation still applies in this case. For instance, if we consider the random variable with density

$$p(x) = \begin{cases} 1 & x \in [0,1], \\ 0 & \text{otherwise}, \end{cases}$$

we can compute

$$\mu_X = \int_{-\infty}^\infty xp(x) \; dx = \int_0^1 x \; dx = \frac{1}{2},$$

and

$$\sigma_X^2 = \int_{-\infty}^\infty x^2p(x) \; dx - \left(\frac{1}{2}\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.$$
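These integrals are easy to verify numerically with the same discretization idea (a plain-NumPy sketch):

# Numerically check the mean and variance of the uniform density on [0, 1]
import numpy as np

epsilon = 1e-5
x = np.arange(0, 1, epsilon)
mu = np.sum(x * epsilon)               # ~ 1/2, since p(x) = 1 on [0, 1]
var = np.sum(x**2 * epsilon) - mu**2   # ~ 1/12
print(mu, var)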
As a warning, let us examine one more example, known as the *Cauchy distribution*. This is the distribution with p.d.f. given by

$$p(x) = \frac{1}{\pi(1+x^2)},$$

where the factor of $1/\pi$ normalizes the total area under the curve to one.
# Plot the Cauchy distribution p.d.f.
x = np.arange(-5, 5, 0.01)
p = 1 / (np.pi * (1 + x**2))
d2l.plot(x, p, 'x', 'p.d.f.')
#@tab pytorch
# Plot the Cauchy distribution p.d.f.
x = torch.arange(-5, 5, 0.01)
p = 1 / (torch.pi * (1 + x**2))
d2l.plot(x, p, 'x', 'p.d.f.')
#@tab tensorflow
# Plot the Cauchy distribution p.d.f.
x = tf.range(-5, 5, 0.01)
p = 1 / (tf.pi * (1 + x**2))
d2l.plot(x, p, 'x', 'p.d.f.')
This function looks innocent, and indeed consulting a table of integrals will show it has area one under it, and thus it defines a continuous random variable.

To see what goes astray, let us try to compute the variance of this. This would involve using :eqref:eq_var_def and computing

$$\int_{-\infty}^\infty \frac{x^2}{\pi(1+x^2)} \; dx.$$

The function on the inside looks like this:
# Plot the integrand needed to compute the variance
x = np.arange(-20, 20, 0.01)
p = x**2 / (np.pi * (1 + x**2))
d2l.plot(x, p, 'x', 'integrand')
#@tab pytorch
# Plot the integrand needed to compute the variance
x = torch.arange(-20, 20, 0.01)
p = x**2 / (torch.pi * (1 + x**2))
d2l.plot(x, p, 'x', 'integrand')
#@tab tensorflow
# Plot the integrand needed to compute the variance
x = tf.range(-20, 20, 0.01)
p = x**2 / (tf.pi * (1 + x**2))
d2l.plot(x, p, 'x', 'integrand')
This function clearly has infinite area under it since it is essentially the constant $1/\pi$ with a small dip near zero, and indeed we could show that

$$\int_{-\infty}^\infty \frac{x^2}{\pi(1+x^2)} \; dx = \infty.$$

This means it does not have a well-defined finite variance.

However, looking deeper shows an even more disturbing result. Let us try to compute the mean using :eqref:eq_exp_def. Using the change of variables $u = 1 + x^2$, the contribution from the positive half-line alone is

$$\int_0^{\infty} \frac{x}{\pi(1+x^2)} \; dx = \frac{1}{2\pi}\int_1^\infty \frac{1}{u} \; du.$$

The integral inside is the definition of the logarithm, so this is in essence $\log(\infty) = \infty$, so there is no well-defined average value either!
Machine learning scientists define their models so that we most often do not need to deal with these issues, and will in the vast majority of cases deal with random variables with well-defined means and variances. However, every so often random variables with heavy tails (that is those random variables where the probabilities of getting large values are large enough to make things like the mean or variance undefined) are helpful in modeling physical systems, thus it is worth knowing that they exist.
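To see what heavy tails mean in practice, here is a small plain-NumPy experiment: the running mean of samples from a standard Cauchy distribution never settles down, unlike for distributions with a well-defined mean.

# The running mean of Cauchy samples does not converge
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_cauchy(10000)
running_mean = np.cumsum(samples) / np.arange(1, 10001)
print(running_mean[[99, 999, 9999]])  # keeps fluctuating with sample size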
The above work all assumes we are working with a single real-valued random variable. But what if we are dealing with two or more potentially highly correlated random variables? This circumstance is the norm in machine learning: imagine random variables like $R_{i, j}$ which encode the red value of the pixel at the $(i, j)$ coordinate in an image, or $P_t$ which is a random variable given by a stock price at time $t$. Nearby pixels tend to have similar color, and nearby times tend to have similar prices. We cannot treat them as separate random variables and expect to create a successful model (we will see in :numref:sec_naive_bayes a model that under-performs due to such an assumption). We need to develop the mathematical language to handle these correlated continuous random variables.
Thankfully, with the multiple integrals in :numref:sec_integral_calculus we can develop such a language. Suppose that we have, for simplicity, two random variables $X, Y$ which can be correlated. Then, similar to the case of a single variable, we can ask: what is the probability that $X$ is in an $\epsilon$-sized interval around $x$ and $Y$ is in an $\epsilon$-sized interval around $y$? Similar reasoning to the single variable case shows that this should be approximately

$$P(X \;\text{is in an}\; \epsilon \text{-sized interval around}\; x \; \text{and} \;Y \;\text{is in an}\; \epsilon \text{-sized interval around}\; y ) \approx \epsilon^{2}p(x, y),$$

for some function $p(x, y)$. This is referred to as the *joint density* of $X$ and $Y$. Similar properties are true for this as we saw in the single variable case. Namely:

- $p(x, y) \ge 0$;
- $\int _ {\mathbb{R}^2} p(x, y) \;dx \;dy = 1$;
- $P((X, Y) \in \mathcal{D}) = \int _ {\mathcal{D}} p(x, y) \;dx \;dy$.
In this way, we can deal with multiple, potentially correlated random variables. If we wish to work with more than two random variables, we can extend the multivariate density to as many coordinates as desired by considering $p(\mathbf{x}) = p(x_1, \ldots, x_n)$. The same properties of being non-negative and having total integral of one still hold.
When dealing with multiple variables, we oftentimes want to be able to ignore the relationships and ask, "how is this one variable distributed?" Such a distribution is called a marginal distribution.
To be concrete, let us suppose that we have two random variables $X, Y$ with joint density $p _ {X, Y}(x, y)$. We will be using the subscript to indicate what random variable the density is for. The question of finding the marginal distribution is taking this function, and using it to find $p _ X(x)$.

As with most things, it is best to return to the intuitive picture to figure out what should be true. Recall that the density is the function $p _ X$ so that

$$P(X \in [x, x+\epsilon]) \approx \epsilon \cdot p _ X(x).$$

There is no mention of $Y$, but if all we are given is $p _{X, Y}$, we need to include $Y$ somehow. We can first observe that this is the same as

$$P(X \in [x, x+\epsilon] \text{, and } Y \in \mathbb{R}) \approx \epsilon \cdot p _ X(x).$$

Our density does not directly tell us about what happens in this case, we need to split into small intervals in $y$ as well, so we can write this as

$$\begin{aligned} \epsilon \cdot p _ X(x) & \approx \sum _ {i} P(X \in [x, x+\epsilon] \text{, and } Y \in [\epsilon \cdot i, \epsilon \cdot (i+1)]) \\ & \approx \sum _ {i} \epsilon^{2} p _ {X, Y}(x, \epsilon\cdot i). \end{aligned}$$

This tells us to add up the value of the density along a series of squares in a line as is shown in :numref:fig_marginal. Indeed, after canceling one factor of epsilon from both sides, and recognizing the sum on the right is the integral over $y$, we can conclude that

$$\begin{aligned} p _ X(x) & \approx \sum _ {i} \epsilon \cdot p _ {X, Y}(x, \epsilon\cdot i) \\ & \approx \int_{-\infty}^\infty p_{X, Y}(x, y) \; dy. \end{aligned}$$

Thus we see

$$p _ X(x) = \int_{-\infty}^\infty p_{X, Y}(x, y) \; dy.$$
This tells us that to get a marginal distribution, we integrate over the variables we do not care about. This process is often referred to as *integrating out* or *marginalizing out* the unneeded variables.
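As a small illustration (a plain-NumPy sketch, using the joint density of two independent standard Gaussians as an example), marginalization is just a sum over the discretized unwanted coordinate:

# Marginalize a discretized joint density by summing over y
import numpy as np

epsilon = 0.01
x = np.arange(-4, 4, epsilon)
y = np.arange(-4, 4, epsilon)
X, Y = np.meshgrid(x, y, indexing='ij')
p_xy = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)  # example joint density
p_x = np.sum(p_xy * epsilon, axis=1)  # integrate out y
# Compare with the known Gaussian marginal; prints a small discretization error
print(np.max(np.abs(p_x - np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))))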
When dealing with multiple random variables, there is one additional summary statistic which is helpful to know: the *covariance*. This measures the degree that two random variables fluctuate together.
Suppose that we have two random variables $X$ and $Y$; then the covariance is defined as

$$\sigma_{XY} = \mathrm{Cov}(X, Y) = E\left[(X-\mu_X)(Y-\mu_Y)\right] = E[XY] - \mu_X\mu_Y.$$
:eqlabel:eq_cov_def

To think about this intuitively, consider the following pair of random variables. Suppose that $X$ takes the values $1$ and $3$, and $Y$ takes the values $-1$ and $3$. Suppose that we have the following probabilities:

$$\begin{aligned}
P(X = 1 \; \text{and} \; Y = -1) & = \frac{p}{2}, \\
P(X = 1 \; \text{and} \; Y = 3) & = \frac{1-p}{2}, \\
P(X = 3 \; \text{and} \; Y = -1) & = \frac{1-p}{2}, \\
P(X = 3 \; \text{and} \; Y = 3) & = \frac{p}{2},
\end{aligned}$$

where $p$ is a parameter in $[0,1]$ we get to pick. Notice that if $p=1$ then both are always their minimum or maximum values simultaneously, and if $p=0$ they are guaranteed to take their flipped values simultaneously (one is large when the other is small and vice versa). If $p=1/2$, then the four possibilities are all equally likely, and neither should be related. Let us compute the covariance using :eqref:eq_cov_def:

$$\begin{aligned}
\mathrm{Cov}(X, Y) & = \sum_{i, j} (x_i - \mu_X)(y_j-\mu_Y)p_{ij} \\
& = (1-2)(-1-1)\frac{p}{2} + (1-2)(3-1)\frac{1-p}{2} + (3-2)(-1-1)\frac{1-p}{2} + (3-2)(3-1)\frac{p}{2} \\
& = 4p-2.
\end{aligned}$$

When $p=1$ (the case where the two are maximally positively related) the covariance is $2$. When $p=0$ (the case where they are flipped) the covariance is $-2$. Finally, when $p=1/2$ (the case where they are unrelated), the covariance is $0$. Thus we see that the covariance measures how these two random variables are related.
A quick note on the covariance is that it only measures these linear relationships. More complex relationships like $X = Y^2$ where $Y$ is randomly chosen from $\{-2, -1, 0, 1, 2\}$ with equal probability can be missed. Indeed a quick computation shows that these random variables have covariance zero, despite one being a deterministic function of the other.
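A direct computation confirms this (plain NumPy):

# Covariance misses the deterministic relationship X = Y^2
import numpy as np

y = np.array([-2, -1, 0, 1, 2])
probs = np.ones(5) / 5  # equally likely
x = y**2
mu_x, mu_y = np.sum(x * probs), np.sum(y * probs)
print(np.sum((x - mu_x) * (y - mu_y) * probs))  # prints 0.0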
For continuous random variables, much the same story holds. At this point, we are pretty comfortable with doing the transition between discrete and continuous, so we will provide the continuous analogue of :eqref:eq_cov_def without any derivation:

$$\sigma_{XY} = \int_{\mathbb{R}^2} (x-\mu_X)(y-\mu_Y)p(x, y) \;dx \;dy.$$
For visualization, let us take a look at a collection of random variables with tunable covariance.
# Plot a few random variables with adjustable covariance
covs = [-0.9, 0.0, 1.2]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = np.random.normal(0, 1, 500)
    Y = covs[i]*X + np.random.normal(0, 1, (500))

    d2l.plt.subplot(1, 4, i+1)
    d2l.plt.scatter(X.asnumpy(), Y.asnumpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cov = {covs[i]}')
d2l.plt.show()
#@tab pytorch
# Plot a few random variables with adjustable covariance
covs = [-0.9, 0.0, 1.2]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = torch.randn(500)
    Y = covs[i]*X + torch.randn(500)

    d2l.plt.subplot(1, 4, i+1)
    d2l.plt.scatter(X.numpy(), Y.numpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cov = {covs[i]}')
d2l.plt.show()
#@tab tensorflow
# Plot a few random variables with adjustable covariance
covs = [-0.9, 0.0, 1.2]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = tf.random.normal((500, ))
    Y = covs[i]*X + tf.random.normal((500, ))

    d2l.plt.subplot(1, 4, i+1)
    d2l.plt.scatter(X.numpy(), Y.numpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cov = {covs[i]}')
d2l.plt.show()
Let us see some properties of covariances:
- For any random variable $X$, $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
- For any random variables $X, Y$ and numbers $a$ and $b$, $\mathrm{Cov}(aX+b, Y) = \mathrm{Cov}(X, aY+b) = a\mathrm{Cov}(X, Y)$.
- If $X$ and $Y$ are independent then $\mathrm{Cov}(X, Y) = 0$.
In addition, we can use the covariance to expand a relationship we saw before. Recall that if $X$ and $Y$ are two independent random variables then

$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).$$

With knowledge of covariances, we can expand this relationship. Indeed, some algebra can show that in general,

$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y).$$

This allows us to generalize the variance summation rule for correlated random variables.
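Here is a quick empirical check of this identity (a plain-NumPy sketch; the coupling coefficient $0.5$ is an arbitrary choice):

# Empirically check Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, 1000000)
Y = 0.5 * X + rng.normal(0, 1, 1000000)  # correlated with X
lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * np.cov(X, Y)[0, 1]
print(lhs, rhs)  # agree up to sampling error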
As we did in the case of means and variances, let us now consider units. If $X$ is measured in one unit (say inches), and $Y$ is measured in another (say dollars), the covariance is measured in the product of these two units $\text{inches} \times \text{dollars}$. These units can be hard to interpret. What we will often want in this case is a unit-less measurement of relatedness.

To see what makes sense, let us perform a thought experiment. Suppose that we convert our random variables in inches and dollars to be in inches and cents. In this case the random variable $Y$ is multiplied by $100$. If we work through the definition, this means that $\mathrm{Cov}(X, Y)$ will be multiplied by $100$. Thus we see that in this case a change of units changes the covariance by a factor of $100$. Thus, to find our unit-invariant measure of relatedness, we will need to divide by something else that also gets scaled by $100$. Indeed we have a clear candidate, the standard deviation! If we define the *correlation coefficient* to be

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_{X}\sigma_{Y}},$$
:eqlabel:eq_cor_def

we see that this is a unit-less value. A little mathematics can show that this number is between $-1$ and $1$ with $1$ meaning maximally positively correlated, whereas $-1$ means maximally negatively correlated.
Returning to our explicit discrete example above, we can see that $\sigma_X = 1$ and $\sigma_Y = 2$, so we can compute the correlation between the two random variables using :eqref:eq_cor_def to see that

$$\rho(X, Y) = \frac{4p-2}{1\cdot 2} = 2p-1.$$

This now ranges between $-1$ and $1$ with the expected behavior of $1$ meaning most positively correlated, and $-1$ meaning most negatively correlated.
As another example, consider $X$ as any random variable, and $Y=aX+b$ as any linear deterministic function of $X$. Then, one can compute that

$$\sigma_{Y} = \sigma_{aX+b} = |a|\sigma_{X},$$

$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, aX+b) = a\mathrm{Cov}(X, X) = a\mathrm{Var}(X),$$

and thus by :eqref:eq_cor_def that

$$\rho(X, Y) = \frac{a\mathrm{Var}(X)}{|a|\sigma_{X}^2} = \frac{a}{|a|} = \mathrm{sign}(a).$$

Thus we see that the correlation is $+1$ for any $a > 0$, and $-1$ for any $a < 0$, illustrating that correlation measures the degree and direction with which the two random variables are related, not the scale of the variation.
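We can see this scale-invariance empirically (plain NumPy; the values of $a$ and the offset $b = 2$ are arbitrary):

# The correlation of Y = aX + b with X is sign(a)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, 10000)
for a in [-3.0, 0.5, 10.0]:
    Y = a * X + 2.0
    print(a, np.corrcoef(X, Y)[0, 1])  # -1 or +1 up to round-off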
Let us again plot a collection of random variables with tunable correlation.
# Plot a few random variables with adjustable correlations
cors = [-0.9, 0.0, 1.0]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = np.random.normal(0, 1, 500)
    Y = cors[i] * X + np.sqrt(1 - cors[i]**2) * np.random.normal(0, 1, 500)

    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.scatter(X.asnumpy(), Y.asnumpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cor = {cors[i]}')
d2l.plt.show()
#@tab pytorch
# Plot a few random variables with adjustable correlations
cors = [-0.9, 0.0, 1.0]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = torch.randn(500)
    Y = cors[i] * X + torch.sqrt(torch.tensor(1) -
                                 cors[i]**2) * torch.randn(500)

    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.scatter(X.numpy(), Y.numpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cor = {cors[i]}')
d2l.plt.show()
#@tab tensorflow
# Plot a few random variables with adjustable correlations
cors = [-0.9, 0.0, 1.0]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = tf.random.normal((500, ))
    Y = cors[i] * X + tf.sqrt(tf.constant(1.) -
                              cors[i]**2) * tf.random.normal((500, ))

    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.scatter(X.numpy(), Y.numpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cor = {cors[i]}')
d2l.plt.show()
Let us list a few properties of the correlation below.
- For any random variable $X$, $\rho(X, X) = 1$.
- For any random variables $X, Y$ and numbers $a > 0$ and $b$, $\rho(aX+b, Y) = \rho(X, aY+b) = \rho(X, Y)$.
- If $X$ and $Y$ are independent with non-zero variance then $\rho(X, Y) = 0$.
As a final note, you may feel like some of these formulae are familiar. Indeed, if we expand everything out assuming that $\mu_X = \mu_Y = 0$, we see that this is

$$\rho(X, Y) = \frac{\sum_{i, j} x_iy_jp_{ij}}{\sqrt{\sum_{i, j}x_i^2 p_{ij}}\sqrt{\sum_{i, j}y_j^2 p_{ij}}}.$$

This looks like a sum of a product of terms divided by the square root of sums of terms. This is exactly the formula for the cosine of the angle between two vectors $\mathbf{v}, \mathbf{w}$ with the different probabilities as the weights, namely

$$\cos(\theta) = \frac{\mathbf{v}\cdot \mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|} = \frac{\sum_{i} v_iw_i}{\sqrt{\sum_{i}v_i^2}\sqrt{\sum_{i}w_i^2}}.$$
Indeed if we think of norms as being related to standard deviations, and correlations as being cosines of angles, much of the intuition we have from geometry can be applied to thinking about random variables.
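In fact, for equally weighted samples, the sample correlation coefficient is *exactly* the cosine of the angle between the two centered data vectors, which we can verify directly (a plain-NumPy sketch with an arbitrarily chosen pair of related vectors):

# Correlation equals the cosine between centered data vectors
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(0, 1, 100)
w = 0.3 * v + rng.normal(0, 1, 100)
vc, wc = v - v.mean(), w - w.mean()  # center the vectors
cosine = np.dot(vc, wc) / (np.linalg.norm(vc) * np.linalg.norm(wc))
print(cosine, np.corrcoef(v, w)[0, 1])  # identical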
- Continuous random variables are random variables that can take on a continuum of values. They have some technical difficulties that make them more challenging to work with compared to discrete random variables.
- The probability density function allows us to work with continuous random variables by giving a function where the area under the curve on some interval gives the probability of finding a sample point in that interval.
- The cumulative distribution function is the probability of observing the random variable to be less than a given threshold. It can provide a useful alternate viewpoint which unifies discrete and continuous variables.
- The mean is the average value of a random variable.
- The variance is the expected square of the difference between the random variable and its mean.
- The standard deviation is the square root of the variance. It can be thought of as measuring the range of values the random variable may take.
- Chebyshev's inequality allows us to make this intuition rigorous by giving an explicit interval that contains the random variable most of the time.
- Joint densities allow us to work with correlated random variables. We may marginalize joint densities by integrating over unwanted random variables to get the distribution of the desired random variable.
- The covariance and correlation coefficient provide a way to measure any linear relationship between two correlated random variables.
- Suppose that we have the random variable with density given by $p(x) = \frac{1}{x^2}$ for $x \ge 1$ and $p(x) = 0$ otherwise. What is $P(X > 2)$?
- The Laplace distribution is a random variable whose density is given by $p(x) = \frac{1}{2}e^{-|x|}$. What are the mean and the standard deviation of this distribution? As a hint, $\int_0^\infty xe^{-x} \; dx = 1$ and $\int_0^\infty x^2e^{-x} \; dx = 2$.
- I walk up to you on the street and say "I have a random variable with mean $1$, standard deviation $2$, and I observed $25\%$ of my samples taking a value larger than $9$." Do you believe me? Why or why not?
- Suppose that you have two random variables $X, Y$, with joint density given by $p_{XY}(x, y) = 4xy$ for $x, y \in [0,1]$ and $p_{XY}(x, y) = 0$ otherwise. What is the covariance of $X$ and $Y$?
:begin_tab:mxnet
Discussions
:end_tab:
:begin_tab:pytorch
Discussions
:end_tab:
:begin_tab:tensorflow
Discussions
:end_tab: