
Commit 84de9fe

Updated cat added 2 new notebooks
1 parent be357d5 commit 84de9fe

File tree

3 files changed: +96 -9 lines changed


central_limit_theorem.ipynb

Lines changed: 23 additions & 9 deletions
@@ -7,10 +7,16 @@
  "# Central Limit Theorem (CLT)\n",
  "\n",
  "## Definition:\n",
- "Let $X_{1}$, $X_{2}$, $X_{3}$,... be i.i.d with mean $\\mu$ and variance $\\sigma^{2}$. As $n \\rightarrow \\infty$, let $S=\\sum_{k=1}^n X_{i}$, we have $S \\rightarrow \\mathcal{N}(n\\mu, n\\sigma^{2})$ and $\\frac{S-n\\mu}{\\sqrt{n\\sigma^{2}}} \\rightarrow \\mathcal{N}(0,1)$\n",
+ "Let $X_{1}$, $X_{2}$, $X_{3}$,... be i.i.d. random variables from some distribution with finite mean $\\mu$ and finite variance $\\sigma^{2}$.\n",
+ "\n",
+ "Let $S=\\sum_{k=1}^n X_{k}$. As $n \\rightarrow \\infty$, we have $S \\rightarrow \\mathcal{N}(n\\mu, n\\sigma^{2})$ and $\\frac{S-n\\mu}{\\sqrt{n\\sigma^{2}}} \\rightarrow \\mathcal{N}(0,1)$\n",
  "\n",
  "Equivalently, let $M=\\frac{1}{n}\\sum_{k=1}^n X_{k}$, we have\n",
- "$M \\rightarrow \\mathcal{N}(\\mu,\\sqrt{\\frac{\\sigma^2}{n}})$ and $\\frac{M-\\mu}{\\sqrt{\\frac{\\sigma^2}{n}}} \\rightarrow \\mathcal{N}(0,1)$"
+ "$M \\rightarrow \\mathcal{N}(\\mu,\\frac{\\sigma^2}{n})$ and $\\frac{M-\\mu}{\\sqrt{\\frac{\\sigma^2}{n}}} \\rightarrow \\mathcal{N}(0,1)$\n",
+ "\n",
+ "\n",
+ "Notation:\n",
+ " - $\\mathcal{N}(\\mu,\\sigma^2)$ denotes the [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean $\\mu$ and variance $\\sigma^2$."
  ]
 },
 {
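
The convergence stated in the definition is easy to check numerically. Below is a minimal simulation sketch (not part of the commit's notebook cells), assuming NumPy is available: it draws many sample means from a skewed Exponential(1) distribution, standardizes them as $\frac{M-\mu}{\sqrt{\sigma^2/n}}$, and checks that the result behaves like $\mathcal{N}(0,1)$.

```python
# Simulation sketch (assumes numpy): sample means of a skewed distribution,
# standardized per the CLT, should look approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 10000                    # samples per mean, number of repetitions
mu, sigma2 = 1.0, 1.0                    # Exponential(1) has mean 1 and variance 1

means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)  # M for each repetition
z = (means - mu) / np.sqrt(sigma2 / n)   # standardize: (M - mu) / sqrt(sigma^2 / n)

print("mean of Z:", z.mean())            # should be close to 0
print("std  of Z:", z.std())             # should be close to 1
print("P(Z <= 1):", (z <= 1).mean())     # should be close to Phi(1), about 0.84
```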
@@ -19,7 +25,9 @@
  "source": [
  "## Discussions:\n",
  "\n",
- "Naturally CLT appears in questions that invovles sum or average of a large number of random variable and especially when the question only asks for an approximate answer. Here are a few quick examples."
+ "Naturally, the CLT appears in questions that involve the sum or average of a large number of random variables, especially when the question only asks for an approximate answer.\n",
+ "\n",
+ "Here are a few examples."
  ]
 },
 {
@@ -37,10 +45,16 @@
  "\n",
  "Let the outcome of each coin flip be a random variable $I_{i}$. Thus we are dealing with the random variable $S=\\sum_{i=1}^{400}I_{i}$. $S$ is the sum of a series of i.i.d. Bernoulli trials, so it follows a Binomial distribution. So the exact answer is: $P(S\\geq210)= \\sum_{k=210}^{400}C_{400}^{k}\\left(\\frac{1}{2}\\right)^{400}$, which requires a program to calculate (try implementing this, beware of roundoff errors, and compare it against the approximate answer below).\n",
  "\n",
+ "\n",
+ "Notation:\n",
+ " - $C_{n}^{k}$ is the notation for \"[n choose k](https://en.wikipedia.org/wiki/Binomial_coefficient)\", which denotes the number of ways to choose k items from n items where order doesn't matter.\n",
+ "\n",
  "<br>\n",
  "**Approximation**\n",
  "\n",
- "We use CLT to easily get an approxmate answer quickly. First recognize that for each $I_{i}$ we have $\\mu=0.5$ and $\\sigma^2=0.5\\times(1-0.5)=0.25$. Then, $Z=\\frac{S-400*0.5}{\\sqrt{400*0.25}}=\\frac{S-200}{10}$ is approximately $\\mathcal{N}(0,1)$. For $S \\geq 210$, we have $Z\\geq1$. The 68-95-99.7 rule tells us that for a standardized normal distribution, the probability of the random variable taking value more than 1 standard deviation away from the center is $1-0.68=0.32$ and thus the one sided probability for $P(Z\\geq1) = 0.16$."
+ "We can use the CLT to get an approximate answer quickly. First recognize that for each $I_{i}$ we have $\\mu=0.5$ and $\\sigma^2=0.5\\times(1-0.5)=0.25$. Then, $Z=\\frac{S-400*0.5}{\\sqrt{400*0.25}}=\\frac{S-200}{10}$ is approximately $\\mathcal{N}(0,1)$. For $S \\geq 210$, we have $Z\\geq1$.\n",
+ "\n",
+ "The 68-95-99.7 rule tells us that for a standard Normal distribution $\\mathcal{N}(0,1)$, the probability of the random variable taking a value more than 1 standard deviation away from the center is $1-0.68=0.32$, and thus the one-sided probability is $P(Z\\geq1) = 0.32/2 = 0.16$."
  ]
 },
 {
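
The comparison suggested in this example (exact Binomial tail vs. CLT approximation for $P(S\geq210)$) fits in a few lines. This is a sketch under the assumption that SciPy is installed and Python 3.8+ is used (for `math.comb`); exact rational arithmetic via `fractions.Fraction` sidesteps the roundoff concern mentioned above.

```python
# Sketch (assumes scipy): exact Binomial tail vs. CLT approximation for P(S >= 210).
from fractions import Fraction
from math import comb
from scipy.stats import norm

n, k = 400, 210

# Exact: sum_{j=210}^{400} C(400, j) / 2^400, computed with exact rationals
# so there is no roundoff in the summation.
exact = sum(Fraction(comb(n, j), 2 ** n) for j in range(k, n + 1))

# CLT approximation: Z = (S - 200) / 10, so P(S >= 210) ~ P(Z >= 1).
approx = norm.sf(1.0)

print("exact  P(S >= 210) =", float(exact))
print("approx P(S >= 210) =", approx)
```

A continuity correction (using 209.5 instead of 210 when standardizing) would bring the normal approximation closer to the exact value.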
@@ -50,17 +64,17 @@
  "<br>\n",
  "***Example2:***\n",
  "\n",
- "*Supposed you are going to use Monte Carlo simulation to estimate value of $\\pi$. How would you implement it? If we require an error of 0.001, how many trials/ data points do you need?*\n",
+ "*Suppose you use Monte Carlo simulation to estimate the numerical value of $\\pi$. How would you implement it? If we require an error of 0.001, how many trials do you need?*\n",
  "\n",
  "**Solution**\n",
  "\n",
- "One possible implementation is to have a rectangle, say $x \\in [-1,1], y\\in[-1,1]$. If we uniformly randomly draw a point from this rectangle, the probability of the point following into the circle region $x^2+y^2\\lt1$ is the ratio of the area between the circle and rectangle. \n",
+ "One possible implementation is to start with a rectangle, say $x \\in [-1,1], y\\in[-1,1]$. If we draw a point uniformly at random from this rectangle, the probability $p$ of the point falling into the circle region $x^2+y^2\\lt1$ is the ratio of the circle's area to the rectangle's area, i.e. $p=\\frac{\\pi}{4}$.\n",
  "\n",
- "Formally, let random indicator variable $I$ take value 1 if the point falls in the circle and 0 otherwise, then $p=P(I=1)=\\frac{\\pi}{4}$ and $E(I)=p$. If we do $n$ such trials, and define $M=\\frac{1}{n}\\sum_{k=1}^n I_{i}$, then $M$ follows approximately $\\mathcal{N}(\\mu_{I},\\frac{\\sigma_{I}^2}{n})$. In this setup, $\\mu_{I}=p=\\frac{\\pi}{4}$ and $\\sigma_{I}^2=p(1-p)$.\n",
+ "Formally, let the random indicator variable $I$ take value 1 if the point falls in the circle and 0 otherwise; then $P(I=1)=p$ and $E(I)=p$. If we do $n$ such trials and define $M=\\frac{1}{n}\\sum_{k=1}^n I_{k}$, then $M$ follows approximately $\\mathcal{N}(\\mu_{I},\\frac{\\sigma_{I}^2}{n})$. In this setup, $\\mu_{I}=E(I)=p$ and $\\sigma_{I}^2=p(1-p)$ (see the [Probability Distribution](prob-dist-discrete.ipynb) section for details on $\\sigma_{I}^2$).\n",
  "\n",
- "One thing we need to clarify with the interviewer is what error really means? She might tell you to consider it as the standard deviation of the estimated $\\pi$. Therefore the specified error translates into a required sigma of $\\sigma_{req}=\\frac{error}{4}$ for random variable $M$. Thus $n = \\frac{\\sigma_{I}^2}{\\sigma_{req}^2}$, it is about 2.7 million for our particular case.\n",
+ "One thing we need to clarify with the interviewer is what the error really means. She might tell you to treat it as the standard deviation of the estimated $\\pi$. The specified error then translates into a required standard deviation of $\\sigma_{req}=\\frac{error}{4}$ for the random variable $M$. Thus $n = \\frac{\\sigma_{I}^2}{\\sigma_{req}^2}=\\frac{p(1-p)}{(0.00025)^2}\\approx2.7\\times 10^6$.\n",
  "\n",
- "By the way, we can see that the number of trials $n$ scales with $\\frac{1}{error^2}$, which is caused by the $\\frac{1}{\\sqrt{n}}$ scaling of the $\\sigma_{M}$ in the CLT, and is generally the computationaly complexity entailed by Monte Carlo integration.\n"
+ "By the way, we can see that the number of trials $n$ scales with $\\frac{1}{error^2}$, which is caused by the $\\frac{1}{\\sqrt{n}}$ scaling of $\\sigma_{M}$ in the CLT, and is generally the computational complexity entailed by [Monte Carlo integration](https://en.wikipedia.org/wiki/Monte_Carlo_integration).\n"
  ]
 },
 {
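
A minimal sketch of the Monte Carlo estimate described in this example, assuming NumPy is available: it sizes $n$ from the error analysis above ($\sigma_{req}=error/4$, $n=\sigma_{I}^2/\sigma_{req}^2$) and reports the resulting estimate of $\pi$.

```python
# Sketch (assumes numpy): Monte Carlo estimate of pi, with n chosen from the
# error analysis above: sigma_req = error / 4 and n = p(1 - p) / sigma_req^2.
import numpy as np

error = 0.001
p = np.pi / 4                       # true hit probability (used only to size n)
sigma_req = error / 4               # required std of M, the mean of the indicators
n = int(np.ceil(p * (1 - p) / sigma_req ** 2))   # roughly 2.7 million trials

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.uniform(-1.0, 1.0, size=n)
hits = (x ** 2 + y ** 2 < 1.0)      # indicator I for each trial

pi_hat = 4.0 * hits.mean()          # estimate of pi = 4 * M
print("n =", n, " pi_hat =", pi_hat, " abs error =", abs(pi_hat - np.pi))
```

Halving the required error roughly quadruples $n$, which matches the $\frac{1}{error^2}$ scaling noted in the last paragraph.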

notations_prob.ipynb

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

prob-dist-discrete.ipynb

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Probability Distributions - Discrete"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
