From eec78f0c221ce8523ae2499a9b07e7f7152349ec Mon Sep 17 00:00:00 2001
From: Ikram Ali <mrikram1989@gmail.com>
Date: Tue, 1 Nov 2022 22:12:50 +0500
Subject: [PATCH] C3 week1 (#35)

* new changes
---
 docs/mathematical_notation.md                |  32 +--
 docs/probability/continuous_distributions.md |  22 +-
 docs/probability/hypothesis_testing.md       | 224 ++++++++++++++++---
 docs/probability/random_variable.md          |  14 ++
 docs/probability/what_is_probability.md      |  38 ++--
 5 files changed, 257 insertions(+), 73 deletions(-)

diff --git a/docs/mathematical_notation.md b/docs/mathematical_notation.md
index 62d5598..d464dec 100644
--- a/docs/mathematical_notation.md
+++ b/docs/mathematical_notation.md
@@ -6,19 +6,19 @@
 ## Probability and Statistics
 
 ```{list-table}
-:widths: 20 20 60
+:widths: 20 70 10
 :header-rows: 1
 :align: "center"
 
 * - Symbol
   - Formula
-  - Meaning
+  - Article
 * - $\mu$
-  - $\sum_{x} k P(X=x)$
-  - Mean | Expected Value | Waited Average | First Moment Generating Function
+  - $\sum_{x} x P(X=x)$ (discrete) or $\int_{-\infty}^{\infty} x f(x) d x$ (continuous)
+  - [🔗](expected-value)
 * - $V(X)$ or $\sigma^2$ 
-  - $E[(X - \mu)^2]$
-  - Variance of X
+  - $E[(X - E[X])^2] = E[(X - \mu)^2] = E[X^2] - (E[X])^2$
+  - [🔗](variance-link)
 * - $\sigma$
   - $\sqrt{V(X)}$
   - Standard deviation
@@ -29,22 +29,4 @@
   - The sample 
   - The sample mean is an average value
 
-```
-
-
-## Linear Algebra
-
-| Symbol        | Meaning                            |
-|---------------|------------------------------------|
-| x             | A single number, lowercase, italic |
-| $x$           | A vector, bold, lowercase, italic  |
-| $X$           | A matrix, bold, uppercase, italic  |
-| $\textbf{X}$  | A tensor, bold, uppercase          |
-| $X^T$         | Transpose of matrix X              |
-| $X^{-1}$      | Inverse of X                       |
-| $I$           | Identity matrix                    |
-| $X*Y$         | Element-wise product of X and Y    |
-| $X \otimes Y$ | Kronecker product of X and Y       |
-| $x \cdot y$   | Dot product of x and y             |
-| $tr(X)$       | Trace of X                         |
-| $det(X)$      | Determinant of X                   |
+```
\ No newline at end of file
diff --git a/docs/probability/continuous_distributions.md b/docs/probability/continuous_distributions.md
index 03158ab..a2f02a2 100644
--- a/docs/probability/continuous_distributions.md
+++ b/docs/probability/continuous_distributions.md
@@ -412,7 +412,7 @@ The QQ Plot allows us to see deviation of a normal distribution much better than
 The normal distribution with parameter values $\mu$ = 0 and $\sigma^2$ = 1 is called the standard normal
 distribution.
 
-A rv with the standard normal distribution is customarily denoted by $Z \sim N(0, 1)$
+A rv with the standard normal distribution is denoted by $Z \sim N(0, 1)$
 
 If $X \sim N\left(\mu, \sigma^2\right)$ then
 
@@ -430,12 +430,17 @@ $$
 
 $f_{Z}(x)=\frac{1}{\sqrt{2 \pi}} e^{-x^{2} / 2} \text { for }-\infty<x<\infty$
 
-### CDF
-
+### Cumulative distribution function
 We use special notation to denote the cdf of the standard normal curve
 
 $F(z)=\Phi(z)=P(Z \leq z)=\int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} e^{-x^{2} / 2} d x$
 
+```{image} https://cdn.mathpix.com/snip/images/0pNOOMfnNhB8v3JJyGL6KB4SuVh3NhdSqz2oQJsiQTA.original.fullsize.png
+:align: center
+:alt: Cumulative distribution function for Normal distribution
+:width: 80%
+```
+
 ### Properties
 
 1. The standard normal density function is symmetric about the y axis.
@@ -451,6 +456,9 @@ $\text { If } X \sim N\left(\mu, \sigma^{2}\right), \text { then } \frac{X-\mu}{
 $\frac{X-\mu}{\sigma}$ Shifted by $\mu$ or (Centered at zero) and scaled by $\frac{1}{\sigma}$ that
 will give us variance of 1.
 
+
+$\mathrm{Z} \sim \mathrm{N}(0,1) \Rightarrow \sigma \mathrm{Z}+\mu \sim \mathrm{N}\left(\mu, \sigma^2\right)$
+
 ### Proving this proposition
 
 For any continuous random variable. Suppose we have Y rv, with Desnity function $f_{Y}(y)$
@@ -513,6 +521,14 @@ plt.show()
 
 ```
 
+$$
+\begin{aligned} \mathrm{P}(\mathrm{X} \leq 2) &=\mathrm{P}\left(\frac{\mathrm{X}-\mu}{\sigma} \leq \frac{2-3}{\sqrt{2}}\right) \\
+&\approx \mathrm{P}(\mathrm{Z} \leq -0.71) \\
+& \approx 0.24 \end{aligned}
+$$
+
+R code: `pnorm(-0.71)`
+
 #### Interval between variables
 To find the probability of an interval between certain variables, you need to subtract cdf from another cdf.
 
diff --git a/docs/probability/hypothesis_testing.md b/docs/probability/hypothesis_testing.md
index e2aaed6..d1693c9 100644
--- a/docs/probability/hypothesis_testing.md
+++ b/docs/probability/hypothesis_testing.md
@@ -8,39 +8,43 @@ kernelspec:
 ```
 
 # Hypothesis Testing
-
-## What is Hypothesis Testing?
+Statistical inference is the process of learning about characteristics of a population based
+on what is observed in a relatively small sample from that population. A sample will never give us the
+entire picture, though, and we are bound to make incorrect decisions from time to time. We
+will learn how to derive and interpret appropriate tests to manage this error and how to
+evaluate when one test is better than another. We
+will also learn how to construct and perform principled hypothesis tests for a wide range of
+problems and applications.
+
+## What is Hypothesis Testing?
 - Hypothesis testing is an act in statistics whereby an analyst tests an assumption regarding a population parameter.
 - Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most
   often used by scientists to test specific predictions, called hypotheses, that arise from theories.
 
+:::{note}
+Due to random samples and randomness in the problem, we can make different errors in our hypothesis testing. These errors are
+  called Type I and Type II errors.
+:::
 
-### Steps
-
-There are 5 main steps in hypothesis testing:
-
-1. State your research hypothesis as a null hypothesis and alternate hypothesis (Ho) and (Ha or H1).
-2. Collect data in a way designed to test the hypothesis.
-3. Perform an appropriate statistical test.
-4. Decide whether to reject or fail to reject your null hypothesis.
-5. Present the findings in your results and discussion section.
+## Type of hypothesis testing
+Let $X_1, X_2, \ldots, X_n$ be a [random sample](random-sample) from the normal distribution with mean $\mu$ and
+variance $\sigma^2$
 
-#### State your null and alternate hypothesis
-After developing your initial research hypothesis it is important to restate it as a null ($H_0$) and alternate ($H_1$)
-hypothesis so that you can test it mathematically.
+```{code-cell}
+import torch
+import matplotlib.pyplot as plt
+import seaborn as sns
+from scipy.stats import norm
 
-- The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables.
-- The null hypothesis is a prediction of no relationship between the variables you are interested in.
 
-You want to test whether there is a relationship between gender and height. Based on your knowledge of human physiology,
-you formulate a hypothesis that men are, on average, taller than women. To test this hypothesis, you restate it as:
+sns.set_theme(style="darkgrid")
+sample = torch.normal(mean=8.0, std=16.0, size=(1, 1000))
 
-Ho: Men are, on average, not taller than women.
-Ha: Men are, on average, taller than women.
-
-##### Example
-Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$
+sns.displot(sample[0].numpy(), kde=True, stat='density')
+plt.axvline(sample[0].mean().item(), color='red', label='mean')
 
+plt.show()
+```
 Example of random sample after it is observed:
 
 $$
@@ -60,6 +64,20 @@ or \\
 
 $$
 
+This is below 3, but can we say that $\mu<3$?
+
+This seems awfully dependent on the random sample we happened to get!
+Let's try to work with the most generic random sample of size 8:
+
+$$
+X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8
+$$
+
+Let $\mathrm{X}_1, \mathrm{X}_2, \ldots, \mathrm{X}_{\mathrm{n}}$ be a random sample of size $\mathrm{n}$ from the $\mathrm{N}\left(\mu, \sigma^2\right)$ distribution.
+
+$$
+\mathrm{X}_1, \mathrm{X}_2, \ldots, \mathrm{X}_{\mathrm{n}} \stackrel{\text { iid }}{\sim} \mathrm{N}\left(\mu, \sigma^2\right)
+$$
 
 The Sample mean is 
 
@@ -71,18 +89,162 @@ $$
 - We're going to tend to think that $\mu>3$ when $\bar{X}$ is "significantly" larger than 3.
 - We're never going to observe $\bar{X}=3$, but we may be able to be convinced that $\mu=3$ if $\bar{X}$ is not too far away.
 
+
+**How do we formalize this? We use hypothesis testing.**
+
 Hypotheses:
 
-$\mathrm{H}_0: \mu \leq 3$
-$\mathrm{H}_1: \mu>3 \quad$ alternate
+$\mathrm{H}_0: \mu \leq 3 \quad$ Null hypothesis \
+$\mathrm{H}_1: \mu>3 \quad$ Alternate hypothesis
+
+### Null hypothesis
+The null hypothesis is assumed to be true unless the data provide strong evidence against it.
 
-hypothesis
-- The null hypothesis is assumed to be true.
-- The alternate hypothesis is what we are out to show.
+### Alternate hypothesis
+The alternate hypothesis is what we are out to show.
 
-Conclusion is either:
+**Conclusion is either**:\
 Reject $\mathrm{H}_0 \quad$ OR $\quad$ Fail to Reject $\mathrm{H}_0$
 
-#### Errors in Hypothesis Testing
 
-##### Type I Error
\ No newline at end of file
+#### Simple hypothesis
+A simple hypothesis is one that completely specifies the distribution, so you know the exact distribution.
+
+#### Composite hypothesis
+A composite hypothesis does not completely specify the distribution.\
+For example, you know the distribution is normal, but you don't know the mean or the variance.
+
+#### Critical values
+
+```{image} https://cdn.mathpix.com/snip/images/VhPT2BPUY6gNGGTSOLvZuK6iXJSLNFeOwMU3aI8Droc.original.fullsize.png
+:align: center
+:alt: Critical values in Hypothesis Testing
+:width: 80%
+```
+
+```{image} https://cdn.mathpix.com/snip/images/M8w97dpXZ9nyvOgbPEuDObaVI9gS7Qmrt9gW7GHZeYs.original.fullsize.png
+:align: center
+:alt: Critical values example
+:width: 80%
+```
+
+## Errors in Hypothesis Testing
+
+Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2=2$
+
+$$
+H _0: \mu \leq 3 \quad H _1: \mu>3
+$$
+
+**Idea**: Look at $\bar{X}$ and reject $H_0$ in favor of $H _1$ if $\overline{ X }$ is "large".\
+i.e. Look at $\bar{X}$ and reject $H_0$ in favor of $H _1$ if $\overline{ X }> c$ for some value $c$.
+
+
+```{image} https://cdn.mathpix.com/snip/images/JeCsNYRlM6qG5RBLyuckje_opt6MoxGFvrmOe5QyfT0.original.fullsize.png
+:align: center
+:alt: Errors in Hypothesis Testing
+:width: 80%
+```
+
+```{image} https://cdn.mathpix.com/snip/images/CQje4JfzfdpSnlrWFvGHbbIsWFMq67TI7pIRUiyzTF4.original.fullsize.png
+:align: center
+:alt: Errors in Hypothesis Testing
+:width: 80%
+```
+
+You are a potato chip manufacturer and you want to ensure that the mean amount in 15 ounce bags is at least 15 ounces.
+$\mathrm{H}_0: \mu \leq 15 \quad \mathrm{H}_1: \mu>15$
+
+### Type I Error
+The true mean is $\leq 15$ but you concluded it was $>15$. You are going to save some money because you won't be adding
+chips, but you are risking a lawsuit!
+
+### Type II Error
+The true mean is $> 15$ but you concluded it was $\leq 15$ . You are going to be spending money increasing the amount
+of chips when you didn’t have to.
+
+## Developing a Test
+Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$.
+
+Consider testing the simple versus simple hypotheses
+
+$$
+\begin{aligned}
+& H _0: \mu=5 \\
+& H _1: \mu=3
+\end{aligned}
+$$
+
+### Level of significance
+
+Let $\alpha= P$ (Type I Error) \
+$= P \left(\right.$ Reject $H _0$ when it's true $)$ \
+$= P \left(\right.$ Reject $H _0$ when $\left.\mu=5\right)$
+
+$\alpha$ is called the level of significance of the test. It is also sometimes referred to as the size of the test.
+
+### Step One
+Choose an estimator for μ.
+
+$$ 
+\widehat{\mu}=\bar{X}
+$$
+
+### Step Two
+
+Choose a test statistic, or give the “form” of the test.
+
+- We are looking for evidence that $H _1$ is true.
+- The $N \left(5, \sigma^2\right)$ distribution takes on values from $-\infty$ to $\infty$.
+- $\overline{ X } \sim N \left(\mu, \sigma^2 / n \right) \Rightarrow \overline{ X }$ also takes on values from $-\infty$ to $\infty$.
+- It is entirely possible that $\bar{X}$ is very small even if the mean of its distribution is 5.
+- However, if $\bar{X}$ is very small, it will start to seem more likely that $\mu$ is smaller than 5.
+- Eventually, a population mean of 3 will seem more likely than a population mean of 5.
+
+Reject $H _0$, in favor of $H _1$, if $\overline{ X }< c$ for some c to be determined.
+
+### Step Three
+
+Find c.
+
+- If $c$ is too small, we are making it difficult to reject $H _0$. We are more likely to fail to reject when it should be rejected.
+- If $c$ is too large, we are making it too easy to reject $H _0$. We are more likely to reject when it should not be rejected.
+
+This is where $\alpha$ comes in.
+
+$$
+\begin{aligned} \alpha &= P(\text{Type I Error}) \\
+&=P( \text{Reject } H_0 \text{ when true}) \\
+&=P (\overline{ X }< c \text{ when } \mu=5) \end{aligned}
+$$
+
+### Step Four
+
+Give a conclusion!
+
+$0.05= P ($ Type I Error) \
+$= P \left(\right.$ Reject $H _0$ when true $)$ \
+$= P (\overline{ X }< c$ when $\mu=5)$
+
+
+$= P \left(\frac{\overline{ X }-\mu_0}{\sigma / \sqrt{ n }}<\frac{ c -5}{2 / \sqrt{10}}\right.$ when $\left.\mu=5\right)$
+
+
+```{image} https://cdn.mathpix.com/snip/images/A2zQa5iD99VnS5sLbiZ947KpZWH7i7xSbnJ6IZ88j2w.original.fullsize.png
+:align: center
+:alt: Errors in Hypothesis Testing
+:width: 80%
+```
+
+
+```{image} https://cdn.mathpix.com/snip/images/Q5ADdylsMg5__QGyDBeVgUtKCf5dpp5b24ur5L0phO4.original.fullsize.png
+:align: center
+:alt: Errors in Hypothesis Testing
+:width: 80%
+```
+
+```{image} https://cdn.mathpix.com/snip/images/T3f91rQbmLPwPT_cU3z8y51z-xQ8jdb9PtGskQ2pa3c.original.fullsize.png
+:align: center
+:alt: Errors in Hypothesis Testing
+:width: 80%
+```
diff --git a/docs/probability/random_variable.md b/docs/probability/random_variable.md
index 3d7bfd7..6abac7e 100644
--- a/docs/probability/random_variable.md
+++ b/docs/probability/random_variable.md
@@ -8,6 +8,18 @@ kernelspec:
 ```
 
 # Random Variables
+The first step to understanding random variables is to do a fun experiment. Go outside in front of your house with a pen
+and paper. Take note of every person who passes, along with their hair color and height in centimeters. Spend about 10 minutes doing
+this.
+
+Congratulations! You have conducted your first experiment! Now you will be able to answer some questions such as:
+
+- How many people walked past you?
+- Did many people who walked past you have blue hair?
+- How tall were the people who walked past you on average?
+
+Suppose you passed 10 people in this experiment, 3 of whom had blue hair, and their average height was 165.32 cm.
+Each of these answers is a number: a measurable quantity is attached to the outcome of the experiment.
 
 ## Definition
 
@@ -316,6 +328,7 @@ $\text { Let } x=g^{-1}(y) \text {. Then } d x=\frac{d}{d y} g^{-1}(y) d y$
 
 $E[g(X)]=\int_{-\infty}^{\infty} g(x) f_{X}(x)) d x$
 
+(variance-link)=
 ## Variance
 
 - Measures how far we expect our random variable to be from the mean.
@@ -476,6 +489,7 @@ $\text{Indicator function}_{A}(X) = \mathbf{1}_A(x) =\begin{cases} 1, & \text {
 
 Notation= $\mathbb{1} _{A}(x)$
 
+(random-sample)=
 ## Random Sample
 
 A collection of random variables is independent and identically distributed if each random variable has the same
diff --git a/docs/probability/what_is_probability.md b/docs/probability/what_is_probability.md
index 52ef27c..8a251c1 100644
--- a/docs/probability/what_is_probability.md
+++ b/docs/probability/what_is_probability.md
@@ -14,7 +14,7 @@ kernelspec:
 - Probability is the branch of mathematics that deals with the occurrence of a random event.
 - Probability is the measure of the likelihood of an event to happen.
 
-probability is the study of randomness and uncertainty. Probability theory is widely used in the area of studies such
+Probability is the study of randomness and uncertainty. Probability theory is widely used in areas of study such
 as statistics, finance, gambling, artificial intelligence, machine learning, computer science, game theory, and
 philosophy.
 
@@ -22,21 +22,30 @@ philosophy.
 
 Some of the applications of probability are predicting results of the following events:
 
-1. that a customer will buy milk if they are also buying bread.
-2. Of getting at least 2 heads in 5 coin flips.
-3. Getting 3 and 5 on throwing a die.
-4. Choosing a card from the deck.
-5. Pulling a green candy from a bag of red candies.
-6. Winning a lottery 1 in many millions.
-7. \# of vehicles crossing a bridge in one day
-8. \# of customers arriving at a bank in a week
-
-### Major Applications of Probability
-
+::::{grid}
+
+:::{grid-item-card}
+Minor
+^^^^^^
+- That a customer will buy milk if they are also buying bread.
+- Getting at least 2 heads in 5 coin flips.
+- Getting 3 and 5 on throwing a die.
+- Pulling a green candy from a bag of red candies.
+- Winning a lottery 1 in many millions.
+- \# of customers arriving at a bank in a week
+:::
+
+:::{grid-item-card}
+Major
+^^^^^^
 - It is used for risk assessment and modelling in various industries
 - Weather forecasting or prediction of weather changes
 - Probability of a team winning in a sport based on players and strength of team
 - In the share market, chances of getting the hike of share prices
+:::
+
+::::
+
 
 ## Probability Terms
 The first thing we do when we start thinking about the probability list a number of things that could possibly happen.\
@@ -49,11 +58,12 @@ Suppose that we toss a die. Six numbers, from 1 to 6, can appear face up, but we
 will appear. The sample space is S = {1,2,3,4,5,6}.\
 Tossing a coin, Sample Space = {H,T}.
 
-```{image} https://cdn.mathpix.com/snip/images/NIgpfDh_vIIrB4C6KWs6SlZbyH4xSHzIPYolo_FcY-U.original.fullsize.png
+```{image} https://cdn.mathpix.com/snip/images/0nY3sA4gdcKRJqiIpDf5RvXHDTessQ-cDgjFFMMhuuE.original.fullsize.png
 :align: center
 :alt: Sample space
-:width: 60%
+:width: 80%
 ```
+<p style="text-align: center;">Image from byjus.com</p>
 
 ### Experiment or Trial
 Experiment is any action or process that generates observations or outcomes. \