diff --git a/docs/mathematical_notation.md b/docs/mathematical_notation.md index d464dec..a3a7baa 100644 --- a/docs/mathematical_notation.md +++ b/docs/mathematical_notation.md @@ -12,7 +12,7 @@ * - Symbol - Formula - - Article + - Explained * - $\mu$ - $\sum_{x} x P(X=x) = \int_{-\infty}^{\infty} x f(x) d x$ - [🔗](expected-value) diff --git a/docs/probability/continuous_distributions.md b/docs/probability/continuous_distributions.md index a2f02a2..d413180 100644 --- a/docs/probability/continuous_distributions.md +++ b/docs/probability/continuous_distributions.md @@ -54,8 +54,24 @@ $P(X=a)=\int_{a}^{a} f(x) d x=0 \text { for all real numbers } a$ Random variable $X \sim U[a,b]$ has the uniform distribution on the interval \[a, b\] if its density function is -```{image} https://cdn.mathpix.com/snip/images/C3YIEOiPSsTEyCokT28x7xwBtWiAMEuJgXY7ljXUKpM.original.fullsize.png -:width: 600 +```{code-cell} +import matplotlib.pyplot as plt +import seaborn as sns +from scipy.stats import uniform + +sns.set_theme(style="darkgrid") + +# random numbers from uniform distribution +n = 10000 +start = 10 +width = 20 +data_uniform = uniform.rvs(size=n, loc=start, scale=width) +ax = sns.displot(data_uniform, + bins=100, + kde=True) +ax.set(xlabel='Uniform Distribution', ylabel='Frequency') +plt.show() ``` $$ @@ -105,8 +121,13 @@ For random variable $X \sim U(0,23)$. Find P(2 \< X \< 18) $P(2 < X < 18) = (18-2)\cdot \frac 1 {23-0} = \frac {16}{23}$ -## Exponential rv +## Exponential Distribution +The exponential distribution is a continuous probability distribution that describes the time until some +specific event happens. +It models a process in which events occur continuously and independently at a constant average rate. The exponential +distribution has the key property of being memoryless.
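The memoryless property, $P(X > s + t \mid X > s) = P(X > t)$, can be checked numerically. A minimal sketch with `scipy` (the rate $\lambda = 0.5$ and the times $s$ and $t$ are arbitrary illustrative choices, not values from the text):

```{code-cell}
from scipy.stats import expon

lam = 0.5          # illustrative rate parameter (an assumption for this sketch)
s, t = 2.0, 3.0    # arbitrary elapsed time and additional waiting time

X = expon(scale=1 / lam)   # scipy parameterizes the exponential by scale = 1/lambda

# memorylessness: P(X > s + t | X > s) should equal P(X > t)
conditional = X.sf(s + t) / X.sf(s)
print(conditional, X.sf(t))
```

Because the survival function of the exponential is $e^{-\lambda x}$, the ratio cancels the elapsed time $s$ exactly.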
+### Applications The family of exponential distributions provides probability models that are widely used in engineering and science disciplines to describe **time-to-event** data. @@ -115,22 +136,65 @@ disciplines to describe **time-to-event** data. - Waiting time in a queue - Length of service time - Time between customer arrivals +- Amount of money spent by a customer +- Time until a radioactive particle decays ### PDF +A continuous random variable $X$ is said to have an exponential distribution if it has the following probability +density function: $$ -f(x;\lambda) = \begin{cases} \lambda e^{ - \lambda x} & x \ge 0, \\ 0 & x < 0. \end{cases} =\lambda e^{-\lambda x} I_{(0, \infty)}(x) +\large f(x;\lambda) = \begin{cases} \lambda e^{ - \lambda x} & x \ge 0, \\ 0 & x < 0. \end{cases} =\lambda e^{-\lambda x} I_{(0, \infty)}(x) $$ +$\lambda$ is called the rate parameter of the distribution. + +```{code-cell} +import matplotlib.pyplot as plt +import seaborn as sns +from scipy.stats import expon + +sns.set_theme(style="darkgrid") + +data_expon = expon.rvs(scale=1, loc=0, size=1000) +ax = sns.displot(data_expon, + kde=True, + bins=100) +ax.set(xlabel='Exponential Distribution', ylabel='Frequency') +plt.show() +``` + ### Expected Value +The mean of the exponential distribution is calculated using integration by parts.
+ +$$ +\begin{aligned} +&E[X]=\int_{0}^{\infty} x f(x) d x=\int_0^{\infty} x \lambda e^{-\lambda x} d x \\ +&=\lambda\left[\left|\frac{-x e^{-\lambda x}}{\lambda}\right|_0^{\infty}+\frac{1}{\lambda} \int_0^{\infty} e^{-\lambda x} d x\right] \\ +&=\lambda\left[0+\frac{1}{\lambda} \frac{-e^{-\lambda x}}{\lambda}\right]_0^{\infty} \\ +&=\lambda \frac{1}{\lambda^2} \\ +&=\frac{1}{\lambda} +\end{aligned} +$$ -$E(X) = \int_{0}^{\infty} x f(x) d x = \int_{0}^{\infty} x \lambda e^{ - \lambda x} d x = \frac{1}{\lambda}$ + +$$ +\begin{aligned} +E[X^2] &= \int_{0}^{\infty} x^2 f(x) d x \\ +&= \int_{0}^{\infty} x^2 \lambda e^{ - \lambda x} d x \\ +&= \frac{2}{\lambda^2} +\end{aligned} +$$ -$E(X^2) = \int_{0}^{\infty} x^2 f(x) d x = \int_{0}^{\infty} x^2 \lambda e^{ - \lambda x} d x = \frac{2}{\lambda^2}$ ### Variance +To find the variance of the exponential distribution, we need the second moment of the distribution. + +$$ +\begin{aligned} +V(X) &= E(X^2) - E(X)^2 \\ +&= \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 \\ +&= \frac{1}{\lambda^2} +\end{aligned} +$$ -$V(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - (\frac{1}{\lambda})^2 = \frac{1}{\lambda^2}$ +### Properties +The most important property of the exponential distribution is the memoryless property. This property is also +applicable to the geometric distribution. ## Normal (Gaussian) Distribution @@ -529,6 +593,38 @@ $$ R code: pnorm(1.2) +#### Find $P(X < 4.1)$ when $X \sim N(2, 3)$ + +Let $X \sim N(2,3)$.
Then + +$$ +\begin{aligned} +P ( X \leq 4.1) &= P \left(\frac{ X -\mu}{\sigma} \leq \frac{4.1-2}{\sqrt{3}}\right) \\ +&= P (Z \leq 1.21) \\ +& \approx 0.8868 +\end{aligned} +$$ + +R Code: pnorm(1.21) + +```R +z_score <- (4.1 - 2) / sqrt(3) +pnorm(z_score) +``` + +For the sample mean of $n = 10$ such observations: + +$$ +\begin{aligned} +& X _1, X _2, \ldots, X _{10} \stackrel{\text{iid}}{\sim} N (2,3) \\ +&\overline{ X } \sim N \left(\mu, \sigma^2 / n \right)= N (2,3 / 10) \\ +& P (\overline{ X } \leq 2.3)= P \left(\frac{\overline{ X }-\mu_{\overline{ X }}}{\sigma_{\overline{ X }}} \leq \frac{2.3-2}{\sqrt{3 / 10}}\right) \\ +&= P ( Z \leq 0.5477) \\ +& \approx 0.7081 +\end{aligned} +$$ + #### Interval between variables To find the probability of an interval between certain variables, you need to subtract cdf from another cdf. @@ -625,4 +721,28 @@ pro=norm(1, 2).cdf(3.5) - norm(1,2).cdf(0) ax.text(0.2,0.02,round(pro,2), fontsize=20) plt.show() -``` \ No newline at end of file +``` + +## Gamma Distribution +The gamma distribution is a two-parameter family of continuous probability distributions, defined by a shape parameter +and an inverse scale (rate) parameter. Its importance is largely due to its relation to the +exponential and normal distributions. + +Gamma distributions have two free parameters, called alpha (α) and beta (β), where: + +- α = Shape parameter +- β = Rate parameter (the reciprocal of the scale parameter) + +The parameter β only rescales the distribution: wherever the random variable x appears in the probability +density, it is multiplied by the rate β (equivalently, divided by the scale 1/β). Since the scale carries the +dimensional information, data are seldom modeled directly with the “standard” gamma distribution, i.e., with β = 1. + +### Gamma function + +The gamma function [10], denoted by $\Gamma( x )$, is an extension of the factorial function to real (and complex) +numbers.
Specifically, if $n \in\{1,2,3, \ldots\}$, then + +$$ +\Gamma( n )=( n -1) ! +$$ + diff --git a/docs/probability/hypothesis_testing.md b/docs/probability/hypothesis_testing.md index d1693c9..7c9816e 100644 --- a/docs/probability/hypothesis_testing.md +++ b/docs/probability/hypothesis_testing.md @@ -38,11 +38,10 @@ from scipy.stats import norm sns.set_theme(style="darkgrid") -sample = torch.normal(mean = 8, std = 16, size=(1,1000)) +sample = torch.normal(mean = 0, std = 1, size=(1,1000)) sns.displot(sample[0], kde=True, stat = 'density',) plt.axvline(torch.mean(sample[0]), color='red', label='mean') - plt.show() ``` Example of random sample after it is observed: @@ -54,14 +53,8 @@ $$ Based on what you are seeing, do you believe that the true population mean $\mu$ is $$ - -\begin{align} -\mu<=3 \\ -or \\ -\mu>3 \\ -\text { The sample is } \overline{\mathrm{x}}=2.799 -\end{align} - + \mu<=3 \text{ or } \mu>3 \\ +\text { The sample mean is } \overline{\mathrm{x}}=2.799 $$ This is below 3 , but can we say that $\mu<3$ ? @@ -92,16 +85,22 @@ $$ **How do we formalize this stuff, We use hypothesis testing** -Hypotheses: +### Notation $\mathrm{H}_0: \mu \leq 3$ <- Null hypothesis \ $\mathrm{H}_1: \mu>3 \quad$ Alternate hypothesis -### Null hypothesis -The null hypothesis is assumed to be true. +#### Null hypothesis +The null hypothesis is a hypothesis that is assumed to be true. We denote it with an $H_0$. -### Alternate hypothesis +#### Alternate hypothesis The alternate hypothesis is what we are out to show. +The alternative hypothesis is a hypothesis that we are looking for evidence for or **out to show**. +We denote it with an $H_1$. + +:::{note} +Some people use the notation $H_a$ here +::: **Conclusion is either**:\ Reject $\mathrm{H}_0 \quad$ OR $\quad$ Fail to Reject $\mathrm{H}_0$ @@ -115,6 +114,9 @@ You don't know the exact distribution.\ Means you know the distribution is normal but you don't know the mean and variance. 
#### Critical values +Critical values for distributions are numbers that cut off specified areas under pdfs. For the +N(0, 1) distribution, we will use the notation $z_\alpha$ to denote the value that cuts off area $\alpha$ to +the right as depicted here. ```{image} https://cdn.mathpix.com/snip/images/VhPT2BPUY6gNGGTSOLvZuK6iXJSLNFeOwMU3aI8Droc.original.fullsize.png :align: center @@ -183,6 +185,30 @@ $= P \left(\right.$ Reject $H _0$ when $\left.\mu=5\right)$ $\alpha$ is called the level of significance of the test. It is also sometimes referred to as the size of the test. +$$ +\begin{aligned} +\alpha &=\max P (\text { Type I Error }) \\ +&=\max _{\mu \in H _0} P \left(\text { Reject } H _0 ; \mu\right) \\ +\beta &=\max P (\text { Type II Error }) \\ +&=\max _{\mu \in H _1} P \left(\text { Fail to Reject } H _0 ; \mu\right) +\end{aligned} +$$ + +### Power of the test + +$1-\beta$ is known as the +power of the test +$$ +\begin{gathered} +1-\beta=1-\max _{\mu \in H _1} P \left(\text { Fail to Reject } H _0 ; \mu\right) \\ +=\min _{\mu \in H _1}\left(1- P \left(\text { Fail to Reject } H _0 ; \mu\right)\right) \\ +=\min _{\mu \in H _1} P \left(\text { Reject } H _0 ; \mu\right) \quad \begin{array}{c} +\text { High power } \\ +\text { is good! } +\end{array} +\end{gathered} +$$ + ### Step One Choose an estimator for μ. @@ -224,10 +250,10 @@ Give a conclusion! 
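Once the four steps are laid out, the cutoff $c$ can also be computed directly. A minimal sketch with `scipy` (it assumes the same setup as the derivation that follows: reject $H_0$ when $\overline{X} < c$, with $\mu_0 = 5$, $\sigma = 2$, $n = 10$, $\alpha = 0.05$):

```{code-cell}
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 5, 2, 10, 0.05

# c solves P(Xbar < c; mu = mu0) = alpha, where Xbar ~ N(mu0, sigma^2 / n)
c = mu0 + norm.ppf(alpha) * sigma / sqrt(n)
print(round(c, 4))
```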
$0.05= P ($ Type I Error) \ $= P \left(\right.$ Reject $H _0$ when true $)$ \ -$= P (\overline{ X }< c$ when $\mu=5)$ +$= P (\overline{ X } < c \text { when } \mu=5)$ -$ = P \left(\frac{\overline{ X }-\mu_0}{\sigma / \sqrt{ n }}<\frac{ c -5}{2 / \sqrt{10}}\right.$ when $\left.\mu=5\right) +$ = P \left(\frac{\overline{ X }-\mu_0}{\sigma / \sqrt{ n }}<\frac{ c -5}{2 / \sqrt{10}}\right.$ when $\left.\mu=5\right)$ ```{image} https://cdn.mathpix.com/snip/images/A2zQa5iD99VnS5sLbiZ947KpZWH7i7xSbnJ6IZ88j2w.original.fullsize.png @@ -248,3 +274,522 @@ $ = P \left(\frac{\overline{ X }-\mu_0}{\sigma / \sqrt{ n }}<\frac{ c -5}{2 / \s :alt: Errors in Hypothesis Testing :width: 80% ``` + +### Formula + +Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$. + +Consider testing the simple versus simple hypotheses + +$$ +H _0: \mu=\mu_0 \quad H _1: \mu=\mu_1 +$$ + +where $\mu_0$ and $\mu_1$ are fixed and known. + +If $\mu_0<\mu_1$, reject $H_0$, in favor of $H_1$, if + +$$ +\large \overline{ X }>\mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }} +$$ + +If $\mu_0>\mu_1$, reject $H_0$, in favor of $H_1$, if + +$$ +\large \overline{ X }<\mu_0+ z_{1-\alpha} \frac{\sigma}{\sqrt{ n }} +$$ + +### Type II Error + +$$ +H_0: \mu=\mu_0 \\ +H _1: \mu=\mu_1 \\ +\mu_0<\mu_1 +$$ + +$$ +\begin{aligned} +& \beta= P (\text { Type II Error }) \\ +=& P \left(\text { Fail to Reject } H _0 \text { when false }\right) \\ +=& P \left(\overline{ X } \leq \mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }} \text { when } \mu=\mu_1\right) \\ +=& P \left(\overline{ X } \leq \mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }} ; \mu_1\right) +\end{aligned} +$$ + +$$ +\begin{aligned} +\beta &= P \left(\left(\frac{\overline{X} -\mu_1}{\sigma / \sqrt{ n }}\right) \leq \frac{\mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }}-\mu_1}{\sigma / \sqrt{ n }} ; \mu_1\right) \\ +&= P \left( Z \leq 
\frac{\mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }}-\mu_1}{\sigma / \sqrt{ n }}\right) +\end{aligned} +$$ + +## Composite vs Composite Hypothesis + +$$ +\begin{aligned} +& X _1, X _2, \ldots, X _{ n } \sim N \left(\mu, \sigma^2\right), \sigma^2 \text { known } \\ +& H _0: \mu \leq \mu_0 \quad \text { vs } \quad H _1: \mu>\mu_0 +\end{aligned} +$$ + +- Step One Choose an estimator for μ. +- Step Two Choose a test statistic: Reject $H_0$, in favor of $H_1$, if $\bar{X} > c$, where $c$ is to be determined. +- Step Three Find c. + +## One-Tailed Tests + +Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$. +Consider testing the hypotheses + +$$ +H _0: \mu \geq \mu_0 \quad H _1: \mu<\mu_0 +$$ + +where $\mu_0$ is fixed and known. + + +### Step One +Choose an estimator for μ. + +$$ +\widehat{\mu}=\bar{X} +$$ + +### Step Two + +Choose a test statistic or Give the “form” of the test. + +Reject $H _0$, in favor of $H _1$, if $\overline{ X }< c$ for some c to be determined. + +### Step Three + +Find c.
+ +$$ +\begin{aligned} +\alpha &=\max _{\mu \geq \mu_0} P (\text { Type I Error }) \\ +&=\max _{\mu \geq \mu_0} P \left(\text { Reject } H _0 ; \mu\right) \\ +&=\max _{\mu \geq \mu_0} P (\overline{ X }< c ; \mu) +\end{aligned} +$$ + +$$ +\begin{aligned} +\alpha &=\max _{\mu \geq \mu_0} P (\overline{ X }< c ; \mu) \\ +&=\max _{\mu \geq \mu_0} P \left( Z <\frac{ c -\mu}{\sigma / \sqrt{ n }}\right) \\ +&=\max _{\mu \geq \mu_0} \Phi\left(\frac{ c -\mu}{\sigma / \sqrt{ n }}\right) +\end{aligned} +$$ + +Since $\Phi\left(\frac{ c -\mu}{\sigma / \sqrt{ n }}\right)$ is decreasing in $\mu$, the maximum over $\mu \geq \mu_0$ is attained at $\mu=\mu_0$, so $\alpha=\Phi\left(\frac{ c -\mu_0}{\sigma / \sqrt{ n }}\right)$. + +### Step Four + +Reject $H _0$, in favor of $H _1$, if +$$ +\overline{ X }<\mu_0+ z _{1-\alpha} \frac{\sigma}{\sqrt{ n }} +$$ + +### Example + +In 2019, the average health care annual premium for a family of 4 in the United States was reported to be $\$ 6,015$. + +In a more recent survey, 100 randomly sampled families of 4 reported an average annual health care premium of $\$ 6,537$. +Can we say that the true average is currently greater than $\$ 6,015$ for all families of 4? + +Assume that annual health care premiums are normally distributed with a standard deviation of $\$ 814$. +Let $\mu$ be the true average for all families of 4. + +#### Step Zero +Set up the hypotheses. + +$$ +H _0: \mu=6015 \quad H _1: \mu>6015 +$$ + +Decide on a level of significance. $\alpha=0.10$ + +#### Step One +Choose an estimator for $\mu$. + +$$ +\hat{\mu}=\bar{X} +$$ + +#### Step Two +Give the form of the test. +Reject $H _0$, in favor of $H _1$, if + +$$ +\bar{X}>c +$$ + +for some $c$ to be determined. + +#### Step Three +Find c.
+ +$$ +\begin{aligned} +\alpha &=\max _{\mu=\mu_0} P (\text { Type I Error; } \mu) \\ +&= P \left(\text { Type I Error; } \mu_0\right) +\end{aligned} +$$ + +$$ +\begin{aligned} +\alpha &= P \left(\text { Reject } H _0 ; \mu_0\right) \\ +&= P \left(\overline{ X }> c ; \mu_0\right) \\ +&= P \left(\frac{\overline{ X }-\mu_0}{\sigma / \sqrt{ n }}>\frac{ c -6015}{814 / \sqrt{100}} ; \mu_0\right)\\ +&=P\left(Z>\frac{c-6015}{814 / \sqrt{100}}\right) +\end{aligned} +$$ + +$$ +\frac{c-6015}{814 / \sqrt{100}}=z_{0.10}=1.28 +$$ + +#### Step Four +Conclusion. Reject $H _0$, in favor of $H _1$, if + +$$ +\bar{X}>6119.19 +$$ + +From the data, where $\bar{x}=6537$, we reject $H _0$ in favor of $H _1$.\ +The data suggests that the true mean annual health care premium is greater than $\$ 6015$. + + +## Power Tests +Let $X_1, X_2, \ldots, X_n$ be a random sample from any distribution with unknown parameter $\theta$, which takes values +in a parameter space $\Theta$. + +We ultimately want to test + +$$ +\begin{aligned} +& H _0: \theta \in \Theta_0 \\ +& H _1: \theta \in \Theta \backslash \Theta_0 +\end{aligned} +$$ + +where $\Theta_0$ is some subset of $\Theta$. + +In other words: if, for an exponential distribution, the null hypothesis is that $\lambda$ lies between 0 and 2, the +complement is not the rest of the real number line, because the parameter space contains only non-negative values. +The complement of the interval $[0, 2]$ in that space is $(2, \infty)$. + + +$\gamma(\theta)= P \left(\right.$ Reject $H _0$ when the parameter is $\left.\theta\right)$ $$ +\gamma(\theta)= P \left(\text { Reject } H _0 ; \theta\right) +$$ $\theta$ is an argument that can be anywhere in the parameter space $\Theta$.
it could be a $\theta$ from $H _0$ +it could be a $\theta$ from $H _1$ + + +$$ +\begin{aligned} +&\alpha=\max P \left(\text { Reject } H _0 \text { when true }\right) \\ +&=\max _{\theta \in \Theta_0} P \left(\text { Reject } H _0 ; \theta\right) \\ +&=\max _{\theta \in \Theta_0} \gamma(\theta) +\end{aligned} +$$ + ++(Other notation for $\max _{\theta \in \Theta_0}$ is $\max _{\theta \in H _0}$.) + + +## Hypothesis Testing with P-Values + +Recall that p-values are defined as the following: +A p-value is the probability that we observe a test statistic at least as extreme as the one we calculated, assuming the null hypothesis is true. +It isn't immediately obvious what that definition means, so let's look at some examples to really get an idea of what p-values are, and how they work. + +Let's start very simple and say we have 5 data points: x = <1, 2, 3, 4, 5>. Let's also assume the data were generated +from some normal distribution with a known variance $\sigma^2$ but an unknown mean $\mu_0$. What would be a good guess +for the true mean? +We know that this data could come from *any* normal distribution, so let's make two wild guesses: + +1. The true mean is 100. +2. The true mean is 3. + +Intuitively, we know that 3 is the better guess. But how do we actually determine which of these guesses is more likely? +By looking at the data and asking "how likely was the data to occur, assuming the guess is true?" + +1. What is the probability that we observed x=<1,2,3,4,5> assuming the mean is 100? Probably pretty low. And because the p-value is low, we "reject the null hypothesis" that $\mu_0 = 100$. +2. What is the probability that we observed x=<1,2,3,4,5> assuming the mean is 3? Seems reasonable. However, something to be careful of is that p-values do not **prove** anything. Just because it is probable for the true mean to be 3, does not mean we know the true mean is 3.
If we have a high p-value, we "fail to reject the null hypothesis" that $\mu_0 = 3$. + +What do "low" and "high" mean? That is where your significance level $\alpha$ comes back into play. We consider a p-value low if the p-value is less than $\alpha$, and high if it is greater than $\alpha$. + +## Two-Tailed Tests + +Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$. + +Derive a hypothesis test of size $\alpha$ for testing + +$$ +\begin{aligned} +& H _0: \mu=\mu_0 \\ +& H _1: \mu \neq \mu_0 +\end{aligned} +$$ + +We will look at the sample mean $\bar{X}$ and reject if it is either too high or too low. + +### Step One +Choose an estimator for μ. + +$$ +\widehat{\mu}=\bar{X} +$$ + +### Step Two +Choose a test statistic or Give the “form” of the test. + + +Reject $H _0$, in favor of $H _1$ if either $\overline{ X }< c$ or $\bar{X}>d$ for some $c$ and $d$ to be determined. + +Easier to make it symmetric! +Reject $H _0$, in favor of $H _1$ if either + +$$ +\begin{aligned} +&\overline{ X }>\mu_0+ c \\ +&\overline{ X }<\mu_0- c +\end{aligned} +$$ +for some $c$ to be determined. + +### Step Three +Find c.
+ +$$ +\begin{aligned} +\alpha &=\max _{\mu=\mu_0} P (\text { Type I Error }) \\ +&=\max _{\mu=\mu_0} P \left(\text { Reject } H _0 ; \mu\right) \\ +&= P \left(\text { Reject } H _0 ; \mu_0\right) +\end{aligned} +$$ + +$$ +\begin{aligned} +&\alpha= P \left(\overline{ X }<\mu_0- c \text { or } \overline{ X }>\mu_0+ c ; \mu_0\right) \\ +&=1- P \left(\mu_0- c \leq \overline{ X } \leq \mu_0+ c ; \mu_0\right) +\end{aligned} +$$ + +$$ +\begin{gathered} +\alpha=1- P \left(\frac{- c }{\sigma / \sqrt{ n }} \leq Z \leq \frac{ c }{\sigma / \sqrt{ n }}\right) \\ +1-\alpha= P \left(\frac{- c }{\sigma / \sqrt{ n }} \leq Z \leq \frac{ c }{\sigma / \sqrt{ n }}\right) +\end{gathered} +$$ + +$$ +\begin{gathered} +\frac{c}{\sigma / \sqrt{n}}=z_{\alpha / 2} \\ +c=z_{\alpha / 2} \frac{\sigma}{\sqrt{n}} +\end{gathered} +$$ + +### Step Four +Conclusion + +Reject $H _0$, in favor of $H _1$, if + +$$ +\begin{aligned} +&\overline{ X }>\mu_0+ z _{\alpha / 2} \frac{\sigma}{\sqrt{n}} \\ +&\overline{ X }<\mu_0- z _{\alpha / 2} \frac{\sigma}{\sqrt{ n }} +\end{aligned} +$$ + + +### Example +In 2019, the average health care annual premium for a family of 4 in the United States was reported to be $\$ 6,015$. + +In a more recent survey, 100 randomly sampled families of 4 reported an average annual health care premium of $\$ 6,177$. +Can we say that the true average, for all families of 4, is currently different from the 2019 average? +$$ +\sigma=814 \quad \text { Use } \alpha=0.05 +$$ + +Assume that annual health care premiums are normally distributed with a standard deviation of $\$ 814$. +Let $\mu$ be the true average for all families of 4.
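The arithmetic for this example can be sketched with `scipy` before working through the steps by hand (the values $\bar{x} = 6177$, $\sigma = 814$, $n = 100$, $\alpha = 0.05$ come from the problem statement above):

```{code-cell}
from math import sqrt
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 6015, 6177, 814, 100, 0.05

z = norm.ppf(1 - alpha / 2)          # two-sided critical value, about 1.96
half_width = z * sigma / sqrt(n)     # distance from mu0 to each rejection cutoff
lower, upper = mu0 - half_width, mu0 + half_width

print(round(lower, 1), round(upper, 1))
print("reject H0" if xbar > upper or xbar < lower else "fail to reject H0")
```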
+Hypotheses: + +$$ +\begin{aligned} +& H _0: \mu=6015 \\ +& H _1: \mu \neq 6015 +\end{aligned} +$$ + +$$ +\begin{aligned} +&\bar{x}=6177 \quad \sigma=814 \quad n=100 \\ +&z_{\alpha / 2}=z_{0.025}=1.96 \\ +&\text { In R: qnorm(0.975) } \\ +&6015+1.96 \frac{814}{\sqrt{100}}=6174.5 \\ +&6015-1.96 \frac{814}{\sqrt{100}}=5855.5 +\end{aligned} +$$ + +We reject $H _0$, in favor of $H _1$. The data suggests that the true current average, for all families of 4 , is different than it was in 2019. + +```{image} https://cdn.mathpix.com/snip/images/_oA87qNHdN5Ozd0kgQL7PxguB7Yc7zoi__lLKXJGuZU.original.fullsize.png +:align: center +:alt: Errors in Hypothesis Testing +:width: 80% +``` + +## Hypothesis Tests for Proportions + +A random sample of 500 people in a certain country which is about to have a national election were asked whether they preferred "Candidate A" or "Candidate B". +From this sample, 320 people responded that they preferred Candidate A. + +Let $p$ be the true proportion of the people in the country who prefer Candidate A. + +Test the hypotheses +$H _0: p \leq 0.65$ versus +$H _1: p>0.65$ +Use level of significance $0.10$. +We have an estimate + +$$ +\hat{p}=\frac{320}{500}=\frac{16}{25} +$$ + + +### The Model + +Take a random sample of size $n$. +Record $X_1, X_2, \ldots, X_n$ where +$X_i= \begin{cases}1 & \text { person i likes Candidate A } \\ 0 & \text { person i likes Candidate B }\end{cases}$ +Then $X_1, X_2, \ldots, X_n$ is a random sample from the Bernoulli distribution with parameter $p$. + +Note that, with these 1's and 0's, +$$ +\begin{aligned} +\hat{p} &=\frac{\# \text { in the sample who like A }}{\# \text { in the sample }} \\ +&=\frac{\sum_{ i =1}^{ n } X _{ i }}{ n }=\overline{ X } +\end{aligned} +$$ +By the Central Limit Theorem, $\hat{p}=\overline{ X }$ has, for large samples, an approximately normal distribution. 
+ +$$ +\begin{aligned} +E[\hat{p}] &=E\left[X_1\right]=p \\ +\operatorname{Var}[\hat{p}] &=\frac{\operatorname{Var}\left[X_1\right]}{n}=\frac{p(1-p)}{n} +\end{aligned} +$$ +So, $\quad \hat{p} \stackrel{\text { approx }}{\sim} N\left(p, \frac{p(1-p)}{n}\right)$ + +$$ +\hat{p} \stackrel{\text { approx }}{\sim} N\left(p, \frac{p(1-p)}{n}\right) +$$ +In particular, +$$ +\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} +$$ +behaves roughly like a $N(0,1)$ as $n$ gets large. + +$n >30$ is a rule of thumb to apply to all distributions, but we can (and should!) do better with specific +distributions. + +- $\hat{p}$ lives between 0 and 1. +- The normal distribution lives between $-\infty$ and $\infty$. +- However, $99.7 \%$ of the area under a $N(0,1)$ curve lies between $-3$ and 3 , + + +$$ +\begin{aligned} +&\hat{p} \stackrel{\text { approx }}{\sim} N\left(p, \frac{p(1-p)}{n}\right) \\ +&\Rightarrow \sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}} +\end{aligned} +$$ + +Go forward using normality if the interval +$$ +\left(\hat{p}-3 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+3 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) +$$ +is completely contained within $[0,1]$. + +### Step One + +Choose a statistic. +$\widehat{p}=$ sample proportion for Candidate $A$ + +### Step Two + +Form of the test. +Reject $H _0$, in favor of $H _1$, if $\hat{ p }> c$. + +### Step Three +Use $\alpha$ to find $c$ +Assume normality of $\hat{p}$ ? +It is a sample mean and $n>30$. 
+- The interval +$$ +\left(\hat{p}-3 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+3 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) +$$ +is $(0.5756,0.7044)$ + +$$ +\begin{aligned} +\alpha &=\max _{p \in H_0} P (\text { Type I Error }) \\ +&=\max _{p \leq 0.65} P \left(\text { Reject } H _0 ; p \right) \\ +&=\max _{ p \leq 0.65} P (\hat{ p }> c ; p ) +\end{aligned} +$$ + +$$ +\begin{aligned} +\alpha &=\max _{p \leq 0.65} P\left(\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}>\frac{c-p}{\sqrt{\frac{p(1-p)}{n}}} ; p\right) \\ +& \approx \max _{p \leq 0.65} P\left(Z>\frac{c-p}{\sqrt{\frac{p(1-p)}{n}}}\right) +\end{aligned} +$$ + +$$ +\begin{aligned} +0.10 &=\max _{p \leq 0.65} P \left(Z>\frac{c-p}{\sqrt{\frac{p(1-p)}{n}}}\right) \\ +&=P\left(Z>\frac{c-0.65}{\sqrt{\frac{0.65(1-0.65)}{n}}}\right) \\ +& \Rightarrow \frac{c-0.65}{\sqrt{\frac{0.65(1-0.65)}{n}}}=z_{0.10} +\end{aligned} +$$ + +Reject $H _0$ if + +$$ +\hat{p}>0.65+z_{0.10} \sqrt{\frac{0.65(1-0.65)}{n}} +$$ +Formula + +$$ +\hat{p}> p +z_{0.10} \sqrt{\frac{p(1-p)}{n}} +$$ + + + diff --git a/docs/probability/random_variable.md b/docs/probability/random_variable.md index 6abac7e..8c33771 100644 --- a/docs/probability/random_variable.md +++ b/docs/probability/random_variable.md @@ -335,7 +335,11 @@ $E[g(X)]=\int_{-\infty}^{\infty} g(x) f_{X}(x)) d x$ - Measures of **spread** of a distribution. - Variance is a measure of dispersion. -Defined as $\sigma^2$ or V(X). +### Denoted by + +$$ +\large \sigma^2 \text{ or } V(X). +$$ $$ V(X) = E[(X - E[X])^2] = E[(X - \mu)^2] = E[X^2] - E[X]^2 diff --git a/docs/probability/what_is_probability.md b/docs/probability/what_is_probability.md index 8a251c1..f43955c 100644 --- a/docs/probability/what_is_probability.md +++ b/docs/probability/what_is_probability.md @@ -66,7 +66,7 @@ Tossing a coin, Sample Space = {H,T}.

Image from byjus.com

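For a finite sample space with equally likely outcomes, a probability is just a ratio of counts. A small sketch (the two-dice experiment is an illustrative choice; the event "the faces sum to 7" is an assumption for this example):

```{code-cell}
from itertools import product

# sample space for throwing two dice
omega = list(product(range(1, 7), repeat=2))

# event: the faces sum to 7
event = [w for w in omega if sum(w) == 7]

p = len(event) / len(omega)   # 6 favourable outcomes out of 36
print(p)
```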
### Experiment or Trial -Experiment is any action or process that generates observations or outcomes. \ +Experiment is any action or process that generates observations or outcomes.\ E.g. The tossing of a coin, selecting a card from a deck of cards, throwing a dice etc. ### Outcome or Sample Point diff --git a/docs/requirements.txt b/docs/requirements.txt index 7cdb7b2..17fc53b 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -1,6 +1,6 @@ -torch~=1.12 -seaborn~=0.11 +torch~=1.13 +seaborn~=0.12 scipy~=1.9 -myst-nb~=0.16 -sphinx-design~=0.2 +myst-nb~=0.17 +sphinx-design~=0.3 sphinx-copybutton \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 7822bd8..5f88124 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,4 @@ -Sphinx~=5.1 +Sphinx~=5.3 -r docs/requirements.txt sphinx_rtd_theme sphinx-autobuild \ No newline at end of file