This repository has been archived by the owner on May 11, 2021. It is now read-only.
<!DOCTYPE html>
<html data-require="math math-format word-problems stat">
<head>
<meta charset="UTF-8">
<title>Variance</title>
<script src="../khan-exercise.js"></script>
</head>
<body>
<div class="exercise">
<div class="vars">
<var id="DATA_POINTS">randRange( 4, 6 )</var>
<var id="POPULATION">randRange( 20, 50 )</var>
<var id="TGT_MEAN">animalAvgLifespan( 1 )</var>
<var id="TGT_STDDEV">animalStddevLifespan( 1 )</var>
<var id="DATA">$.map( randGaussian( TGT_MEAN, TGT_STDDEV, DATA_POINTS ), function( lifespan ) {
lifespan = lifespan < 1 ? 1 : round( lifespan );
return randRange( 1, lifespan );
} )</var>
<var id="MEAN">roundTo( 1, mean( DATA ) )</var>
<var id="SQR_DEV">$.map( DATA, function( x ) { return roundTo( 2, ( x - MEAN ) * ( x - MEAN ) ); })</var>
<var id="VARIANCE">roundTo( 2, sum( SQR_DEV ) / ( DATA_POINTS - 1 ) )</var>
<var id="VARIANCE_POP">roundTo( 2, sum( SQR_DEV ) / DATA_POINTS )</var>
<var id="YEAR">new Plural(function(num) {
return $.ngettext("year", "years", num);
})</var>
<var id="YEARS_OLD">$._("%(years)s old", {years: plural_form(YEAR, MEAN)})</var>
<var id="YEAR_TEXT">$._("year")</var>
<var id="YEARS_TEXT">$._("years")</var>
</div> <!-- vars -->
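<!--
The vars block above derives MEAN, SQR_DEV, VARIANCE and VARIANCE_POP from DATA.
The sketch below repeats that arithmetic in plain JavaScript so it can be checked
outside the exercise framework. The helpers sum, mean and roundTo are hypothetical
stand-ins written for this note, not the KhanUtil implementations; only the
argument order roundTo( precision, value ) is taken from the calls above. The
function name describe and the sample data are also assumptions.

```javascript
// Stand-ins for the KhanUtil helpers used in the vars block (assumed behaviour).
function sum(values) {
    return values.reduce(function(total, x) { return total + x; }, 0);
}

function mean(values) {
    return sum(values) / values.length;
}

// Same argument order as the calls in the vars block: roundTo( precision, value ).
function roundTo(precision, value) {
    var factor = Math.pow(10, precision);
    return Math.round(value * factor) / factor;
}

// Hypothetical helper: repeats the MEAN, SQR_DEV, VARIANCE_POP and VARIANCE steps.
function describe(data) {
    var m = roundTo(1, mean(data));
    var sqrDev = data.map(function(x) {
        return roundTo(2, (x - m) * (x - m));
    });
    return {
        mean: m,
        sqrDev: sqrDev,
        // Divide by N when the data covers the whole population.
        populationVariance: roundTo(2, sum(sqrDev) / data.length),
        // Divide by n - 1 when the data is a sample.
        sampleVariance: roundTo(2, sum(sqrDev) / (data.length - 1))
    };
}

// Made-up ages: mean 4, populationVariance 2, sampleVariance 2.67
var result = describe([2, 4, 4, 6]);
```

Because the squared deviations are computed from the rounded mean, results can
differ slightly from an unrounded calculation, which is presumably why the
answers allow a margin of 0.15.
-->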
<div class="problems">
<div id="population" data-calculator="">
<div class="problem" data-else="">
<p data-if="isSingular(DATA_POINTS)">You have found the following ages (in years) of all <var>DATA_POINTS</var> <var>animal( 1 )</var> at your local zoo:</p><p data-else="">You have found the following ages (in years) of all <var>DATA_POINTS</var> <var>plural_form(animal( 1 ), DATA_POINTS)</var> at your local zoo:</p>
<p><code>\qquad<var>DATA.join( ",\\enspace " )</var></code></p>
</div>
<p class="question">
What is the average age of the <var>plural_form(animal( 1 ))</var> at your zoo? What is the variance?
You may round your answers to the nearest tenth.
</p>
<div class="solution" data-type="multiple">
<p>
Average age:<br><code>\quad</code>
<span class="sol short40" data-inexact="" data-max-error="0.15" data-type="decimal"><var>mean( DATA )</var></span> years old
</p>
<p>
Variance:<br><code>\quad</code>
<span class="sol short40" data-inexact="" data-max-error="0.15" data-type="decimal"><var>sum( SQR_DEV ) / DATA_POINTS</var></span> years<code>^2</code>
</p>
<div class="example">decimals, like <code>7.5</code></div>
<div class="example">answers within <code>\pm 0.15</code> are accepted to allow for rounding part-way through</div>
</div> <!-- solution -->
<div class="hints">
<p data-if="isSingular(DATA_POINTS)">
Because we have data for all <var>DATA_POINTS</var> <var>animal( 1 )</var> at the zoo, we are able
to calculate the <span class="hint_blue">population mean</span>
<code>(\color{<var>BLUE</var>}{\mu})</code> and
<span class="hint_pink">population variance</span> <code>(\color{<var>PINK</var>}{\sigma^2})</code>.
</p><p data-else="">
Because we have data for all <var>DATA_POINTS</var> <var>plural_form(animal( 1 ), DATA_POINTS)</var> at the zoo, we are able
to calculate the <span class="hint_blue">population mean</span>
<code>(\color{<var>BLUE</var>}{\mu})</code> and
<span class="hint_pink">population variance</span> <code>(\color{<var>PINK</var>}{\sigma^2})</code>.
</p>
<div>
<p>
To find the <span class="hint_blue">population mean</span>, add up the values of all <code class="hint_green"><var>DATA_POINTS</var></code>
ages and divide by <code class="hint_green"><var>DATA_POINTS</var></code>.
</p>
<p>
<code>
\color{<var>BLUE</var>}{\mu} \quad = \quad
\dfrac{\sum\limits_{i=1}^{\color{<var>GREEN</var>}{N}} x_i}{\color{<var>GREEN</var>}{N}} \quad = \quad
\dfrac{\sum\limits_{i=1}^{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}} x_i}{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}}
</code>
</p>
</div>
<p>
<code>
\color{<var>BLUE</var>}{\mu} \quad = \quad
\dfrac{<var>plus.apply( KhanUtil, DATA )</var>}{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}} \quad = \quad
\color{<var>BLUE</var>}{<var>MEAN</var>\text{ <var>YEARS_OLD</var>}}
</code>
</p>
<div>
<p>
Find the <span class="hint_purple">squared deviations from the mean</span> for each <var>animal(1)</var>.
</p>
<div class="fake_header">
<span style="width: 100px;">
Age<br>
<code>x_i</code>
</span><span style="width: 150px;">
<span class="hint_gray">Distance from the mean</span>
<code>(x_i - \color{<var>BLUE</var>}{\mu})</code>
</span><span style="width: 150px;">
<code>(x_i - \color{<var>BLUE</var>}{\mu})^2</code>
</span>
</div>
<div class="fake_row" data-each="DATA as i, POINT">
<span style="width: 100px;">
<code><var>POINT</var></code> <var>plural_form(YEAR, POINT )</var>
</span><span class="hint_gray" style="width: 150px;">
<code><var>roundTo( 2, POINT - MEAN )</var></code> <var>plural_form(YEAR, roundTo( 2, POINT - MEAN ) )</var>
</span><span class="hint_purple" style="width: 150px;">
<code><var>SQR_DEV[ i ]</var></code> <var>plural_form(YEAR, SQR_DEV[ i ] )</var><code>^2</code>
</span>
</div>
</div>
<div>
<p>
Because we used the <span class="hint_blue">population mean</span> <code>(\color{<var>BLUE</var>}{\mu})</code> to compute the
<span class="hint_purple">squared deviations from the mean</span>, we can find the <span class="hint_red">variance</span>
<code>(\color{red}{\sigma^2})</code>, without introducing any bias, by simply averaging the
<span class="hint_purple">squared deviations from the mean</span>:
</p>
<p>
<code>
\color{red}{\sigma^2} \quad = \quad
\dfrac{\sum\limits_{i=1}^{\color{<var>GREEN</var>}{N}} (x_i - \color{<var>BLUE</var>}{\mu})^2}{\color{<var>GREEN</var>}{N}}
</code>
</p>
</div>
<p>
<code>
\color{red}{\sigma^2} \quad = \quad
\dfrac{<var>plus.apply( KhanUtil, $.map( SQR_DEV, function( x ) { return "\\color{purple}{" + x + "}"; }) )</var>}
{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}}
</code>
</p>
<p>
<code>
\color{red}{\sigma^2} \quad = \quad
\dfrac{\color{purple}{<var>roundTo( 2, sum( SQR_DEV ) )</var>}}{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}} \quad = \quad
\color{red}{<var>VARIANCE_POP</var>\text{ <var>plural_form(YEAR, VARIANCE_POP )</var>}^2}
</code>
</p>
<p><strong>
<span data-if="isSingular(MEAN)">The average <var>animal( 1 )</var> at the zoo is <var>MEAN</var> year old.</span><span data-else="">The average <var>animal( 1 )</var> at the zoo is <var>MEAN</var> years old.</span>
<span data-if="isSingular(VARIANCE_POP)">The population variance
is <var>VARIANCE_POP</var> year<code>^2</code>.</span><span data-else="">The population variance
is <var>VARIANCE_POP</var> years<code>^2</code>.</span>
</strong></p>
</div> <!-- hints -->
</div> <!-- population -->
<div id="sample" data-calculator="">
<div class="problem" data-else="">
<p>
<span data-if="isSingular(DATA_POINTS)">You have found the following ages (in years) of <var>DATA_POINTS</var> <var>animal( 1 )</var>.</span><span data-else="">You have found the following ages (in years) of <var>DATA_POINTS</var> <var>plural_form(animal( 1 ), DATA_POINTS)</var>.</span>
<span data-if="isSingular(POPULATION)">The <var>plural_form(animal( 1 ))</var> are randomly selected from the <var>POPULATION</var> <var>animal( 1 )</var> at your local zoo:</span><span data-else="">The <var>plural_form(animal( 1 ))</var> are randomly selected from the <var>POPULATION</var> <var>plural_form(animal( 1 ), POPULATION)</var> at your local zoo:</span>
</p>
<p><code>\qquad<var>DATA.join( ",\\enspace " )</var></code></p>
</div>
<p class="question">
Based on your sample, what is the average age of the <var>plural_form(animal( 1 ))</var>? What is the variance?
You may round your answers to the nearest tenth.
</p>
<div class="solution" data-type="multiple">
<p>
Average age:<br><code>\quad</code>
<span class="sol short40" data-inexact="" data-max-error="0.15" data-type="decimal"><var>mean( DATA )</var></span> years old
</p>
<p>
Variance:<br><code>\quad</code>
<span class="sol short40" data-inexact="" data-max-error="0.15" data-type="decimal"><var>sum( SQR_DEV ) / ( DATA_POINTS - 1 )</var></span> years<code>^2</code>
</p>
<div class="example">decimals, like <code>0.75</code></div>
<div class="example">answers within <code>\pm 0.15</code> are accepted to allow for rounding part-way through</div>
</div> <!-- solution -->
<div class="hints">
<p data-if="isSingular(POPULATION)">
Because we only have data for a small sample of the <var>POPULATION</var> <var>animal( 1 )</var>, we are only able
to estimate the population mean and variance by finding the <span class="hint_blue">sample mean</span>
<code>(\color{<var>BLUE</var>}{\overline{x}})</code> and
<span class="hint_pink">sample variance</span> <code>(\color{<var>PINK</var>}{s^2})</code>.
</p><p data-else="">
Because we only have data for a small sample of the <var>POPULATION</var> <var>plural_form(animal( 1 ), POPULATION)</var>, we are only able
to estimate the population mean and variance by finding the <span class="hint_blue">sample mean</span>
<code>(\color{<var>BLUE</var>}{\overline{x}})</code> and
<span class="hint_pink">sample variance</span> <code>(\color{<var>PINK</var>}{s^2})</code>.
</p>
<div>
<p>
To find the <span class="hint_blue">sample mean</span>, add up the values of all <code class="hint_green"><var>DATA_POINTS</var></code>
sampled ages and divide by <code class="hint_green"><var>DATA_POINTS</var></code>.
</p>
<p>
<code>
\color{<var>BLUE</var>}{\overline{x}} \quad = \quad
\dfrac{\sum\limits_{i=1}^{\color{<var>GREEN</var>}{n}} x_i}{\color{<var>GREEN</var>}{n}} \quad = \quad
\dfrac{\sum\limits_{i=1}^{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}} x_i}{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}}
</code>
</p>
</div>
<p>
<code>
\color{<var>BLUE</var>}{\overline{x}} \quad = \quad
\dfrac{<var>plus.apply( KhanUtil, DATA )</var>}{\color{<var>GREEN</var>}{<var>DATA_POINTS</var>}} \quad = \quad
\color{<var>BLUE</var>}{<var>MEAN</var>\text{ <var>YEARS_OLD</var>}}
</code>
</p>
<p data-if="isSingular(MEAN)">
Find the <span class="hint_purple">squared deviations from the mean</span> for each age in the sample. Since we don't know the
population mean, estimate it by using the <span class="hint_blue">sample mean</span> we just calculated
<code>(\color{<var>BLUE</var>}{\overline{x}} = \color{<var>BLUE</var>}{<var>MEAN</var>\text{ <var>YEAR_TEXT</var>}})</code>.
</p><p data-else="">
Find the <span class="hint_purple">squared deviations from the mean</span> for each age in the sample. Since we don't know the
population mean, estimate it by using the <span class="hint_blue">sample mean</span> we just calculated
<code>(\color{<var>BLUE</var>}{\overline{x}} = \color{<var>BLUE</var>}{<var>MEAN</var>\text{ <var>YEARS_TEXT</var>}})</code>.
</p>
<div>
<div class="fake_header">
<span style="width: 100px;">
Age<br>
<code>x_i</code>
</span><span style="width: 150px;">
<span class="hint_gray">Distance from the mean</span>
<code>(x_i - \color{<var>BLUE</var>}{\overline{x}})</code>
</span><span style="width: 150px;">
<code>(x_i - \color{<var>BLUE</var>}{\overline{x}})^2</code>
</span>
</div>
<div class="fake_row" data-each="DATA as i, POINT">
<span style="width: 100px;">
<code><var>POINT</var></code> <var>plural_form(YEAR, POINT )</var>
</span><span class="hint_gray" style="width: 150px;">
<code><var>roundTo( 2, POINT - MEAN )</var></code> <var>plural_form(YEAR, roundTo( 2, POINT - MEAN ) )</var>
</span><span class="hint_purple" style="width: 150px;">
<code><var>SQR_DEV[ i ]</var></code> <var>plural_form(YEAR, SQR_DEV[ i ] )</var><code>^2</code>
</span>
</div>
</div>
<div>
<p>
Normally we can find the variance <code>(\color{red}{s^2})</code> by averaging the
<span class="hint_purple">squared deviations from the mean</span>. But remember that we don't know the real
population mean; we had to estimate it by using the <span class="hint_blue">sample mean</span>.
</p>
<p>
<span data-if="isSingular(DATA_POINTS)">The age of any particular <var>animal( 1 )</var> in our sample is likely to be closer to the average age
of the <var>DATA_POINTS</var> <var>animal( 1 )</var> we sampled</span><span data-else="">The age of any particular <var>animal( 1 )</var> in our sample is likely to be closer to the average age
of the <var>DATA_POINTS</var> <var>plural_form(animal( 1 ), DATA_POINTS)</var> we sampled</span>
<span data-if="isSingular(POPULATION)">than to the average age
of all <var>POPULATION</var> <var>animal( 1 )</var> in the zoo.</span><span data-else="">than to the average age
of all <var>POPULATION</var> <var>plural_form(animal( 1 ), POPULATION)</var> in the zoo.</span>
<span>Because of that, the <span class="hint_purple">squared deviations from the mean</span> we calculated will
probably underestimate the actual squared deviations from the population mean.</span>
</p>
<p>
To compensate for this underestimation, rather than simply averaging the <span class="hint_purple">squared deviations from the mean</span>,
we total them and divide by <code class="hint_green">n - 1</code>.
</p>
<p>
<code>
\color{red}{s^2} \quad = \quad
\dfrac{\sum\limits_{i=1}^{\color{<var>GREEN</var>}{n}} (x_i - \color{<var>BLUE</var>}{\overline{x}})^2}{\color{<var>GREEN</var>}{n - 1}}
</code>
</p>
</div>
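<!--
The hint above argues that averaging squared deviations from the sample mean
tends to underestimate the population variance, and that dividing by n - 1
compensates. The sketch below checks this on a made-up population of three ages
by listing every ordered sample of size 2. It assumes sampling with replacement,
which is a simplification of the zoo scenario; all names and numbers are
illustrative and are not part of the exercise.

```javascript
function sum(values) {
    return values.reduce(function(total, x) { return total + x; }, 0);
}

function mean(values) {
    return sum(values) / values.length;
}

// Sum of squared deviations from the mean of the values passed in.
function sumSquaredDeviations(values) {
    var m = mean(values);
    return sum(values.map(function(x) { return (x - m) * (x - m); }));
}

// Hypothetical population of ages, treated as fully known.
var population = [2, 4, 6];
var populationVariance = sumSquaredDeviations(population) / population.length;

// Every ordered sample of size n = 2, drawn with replacement (assumption).
var dividedByN = [];
var dividedByNMinus1 = [];
population.forEach(function(a) {
    population.forEach(function(b) {
        var squares = sumSquaredDeviations([a, b]);
        dividedByN.push(squares / 2);       // plain average, n = 2
        dividedByNMinus1.push(squares / 1); // n - 1 = 1
    });
});

// Long-run averages of the two estimators over all nine samples.
var averageDividedByN = mean(dividedByN);
var averageDividedByNMinus1 = mean(dividedByNMinus1);
```

Under these assumptions the n - 1 version averages out to the population
variance, while the plain average comes out at half of it.
-->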
<p>
<code>
\color{red}{s^2} \quad = \quad
\dfrac{<var>plus.apply( KhanUtil, $.map( SQR_DEV, function( x ) { return "\\color{purple}{" + x + "}"; }) )</var>}
{\color{<var>GREEN</var>}{<var>DATA_POINTS</var> - 1}}
</code>
</p>
<p>
<code>
\color{red}{s^2} \quad = \quad
\dfrac{\color{purple}{<var>roundTo( 2, sum( SQR_DEV ) )</var>}}{\color{<var>GREEN</var>}{<var>DATA_POINTS - 1</var>}} \quad = \quad
\color{red}{<var>VARIANCE</var>\text{ <var>plural_form(YEAR, VARIANCE )</var>}^2}
</code>
</p>
<p><strong>
<span data-if="isSingular(MEAN)">We can estimate that the average <var>animal( 1 )</var> at the zoo is <var>MEAN</var> year old.</span><span data-else="">We can estimate that the average <var>animal( 1 )</var> at the zoo is <var>MEAN</var> years old.</span>
<span data-if="isSingular(VARIANCE)">The sample variance
is <var>VARIANCE</var> year<code>^2</code>.</span><span data-else="">The sample variance
is <var>VARIANCE</var> years<code>^2</code>.</span>
</strong></p>
</div> <!-- hints -->
</div> <!-- sample -->
</div> <!-- problems -->
</div>
</body>
</html>