/
Testing.html
456 lines (452 loc) · 33.8 KB
/
Testing.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
<div id="ipython-notebook">
<a class="interact-button" href="http://data8.berkeley.edu/hub/interact?repo=textbook&path=notebooks/Testing.ipynb">Interact</a>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [['$','$']],
processEscapes: true
}
});
</script>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Our investigation of jury panels is an example of <em>hypothesis testing</em>. We wished to know the answer to a yes-or-no question about the world, and we reasoned about random samples in order to answer the question. In this section, we formalize the process.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Sampling-from-a-Distribution">Sampling from a Distribution<a class="anchor-link" href="#Sampling-from-a-Distribution">¶</a></h2><p>The rows of a table describe individual elements of a collection of data. In the previous section, we worked with a table that described a distribution over categories. Tables that describe the proportions or counts of different categories in a sample or population arise often in data analysis as a summary of a data collection process. The <code>sample_from_distribution</code> method draws a sample of counts for the categories. Its implementation uses the same <code>np.random.multinomial</code> method from the previous section.</p>
<p>The table below describes the proportion of the eligible potential jurors in Alameda County (estimated from 2000 census data and other sources) for broad categories of race and ethnic background. This table was compiled by Professor Weeks, a demographer at San Diego State University, for the Alameda County trial <em>People v. Stuart Alexander</em> in 2002.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">population</span> <span class="o">=</span> <span class="n">Table</span><span class="p">([</span><span class="s2">"Race"</span><span class="p">,</span> <span class="s2">"Eligible"</span><span class="p">])</span><span class="o">.</span><span class="n">with_rows</span><span class="p">([</span>
<span class="p">[</span><span class="s2">"Asian"</span><span class="p">,</span> <span class="mf">0.15</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"Black"</span><span class="p">,</span> <span class="mf">0.18</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"Latino"</span><span class="p">,</span> <span class="mf">0.12</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"White"</span><span class="p">,</span> <span class="mf">0.54</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"Other"</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">],</span>
<span class="p">])</span>
<span class="n">population</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Race</th> <th>Eligible</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asian </td> <td>0.15 </td>
</tr>
</tbody>
<tbody><tr>
<td>Black </td> <td>0.18 </td>
</tr>
</tbody>
<tbody><tr>
<td>Latino</td> <td>0.12 </td>
</tr>
</tbody>
<tbody><tr>
<td>White </td> <td>0.54 </td>
</tr>
</tbody>
<tbody><tr>
<td>Other </td> <td>0.01 </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The <code>sample_from_distribution</code> method takes a number of samples and a column index or label. It returns a table in which the distribution column has been replaced by category counts of the sample.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample_size</span> <span class="o">=</span> <span class="mi">1453</span>
<span class="n">population</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s2">"Eligible"</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Race</th> <th>Eligible</th> <th>Eligible sample</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asian </td> <td>0.15 </td> <td>233 </td>
</tr>
</tbody>
<tbody><tr>
<td>Black </td> <td>0.18 </td> <td>245 </td>
</tr>
</tbody>
<tbody><tr>
<td>Latino</td> <td>0.12 </td> <td>177 </td>
</tr>
</tbody>
<tbody><tr>
<td>White </td> <td>0.54 </td> <td>787 </td>
</tr>
</tbody>
<tbody><tr>
<td>Other </td> <td>0.01 </td> <td>11 </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Each count is selected randomly in such a way that the chance of each category is the corresponding proportion in the original <code>"Eligible"</code> column. Sampling again will give different counts, but again the <code>"White"</code> category is much more common than <code>"Other"</code> because of its much higher proportion.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">population</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s2">"Eligible"</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Race</th> <th>Eligible</th> <th>Eligible sample</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asian </td> <td>0.15 </td> <td>227 </td>
</tr>
</tbody>
<tbody><tr>
<td>Black </td> <td>0.18 </td> <td>247 </td>
</tr>
</tbody>
<tbody><tr>
<td>Latino</td> <td>0.12 </td> <td>155 </td>
</tr>
</tbody>
<tbody><tr>
<td>White </td> <td>0.54 </td> <td>819 </td>
</tr>
</tbody>
<tbody><tr>
<td>Other </td> <td>0.01 </td> <td>5 </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The option <code>proportions=True</code> draws sample counts and then divides each count by the total number of samples. The result is another distribution; one based on a sample.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample</span> <span class="o">=</span> <span class="n">population</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s2">"Eligible"</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="n">proportions</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">sample</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Race</th> <th>Eligible</th> <th>Eligible sample</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asian </td> <td>0.15 </td> <td>0.140399 </td>
</tr>
</tbody>
<tbody><tr>
<td>Black </td> <td>0.18 </td> <td>0.189264 </td>
</tr>
</tbody>
<tbody><tr>
<td>Latino</td> <td>0.12 </td> <td>0.125258 </td>
</tr>
</tbody>
<tbody><tr>
<td>White </td> <td>0.54 </td> <td>0.532691 </td>
</tr>
</tbody>
<tbody><tr>
<td>Other </td> <td>0.01 </td> <td>0.0123882 </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This ut sample can also be used to generate a sample. The sample of a sample is called a <em>resampled sample</em> or a <em>bootstrap sample</em>. It is not a random sample of the original distribution, but shares many characteristics with such a sample.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s1">'Eligible sample'</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Race</th> <th>Eligible</th> <th>Eligible sample</th> <th>Eligible sample sample</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asian </td> <td>0.15 </td> <td>0.140399 </td> <td>204 </td>
</tr>
</tbody>
<tbody><tr>
<td>Black </td> <td>0.18 </td> <td>0.189264 </td> <td>258 </td>
</tr>
</tbody>
<tbody><tr>
<td>Latino</td> <td>0.12 </td> <td>0.125258 </td> <td>163 </td>
</tr>
</tbody>
<tbody><tr>
<td>White </td> <td>0.54 </td> <td>0.532691 </td> <td>810 </td>
</tr>
</tbody>
<tbody><tr>
<td>Other </td> <td>0.01 </td> <td>0.0123882 </td> <td>18 </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Empirical-Distributions">Empirical Distributions<a class="anchor-link" href="#Empirical-Distributions">¶</a></h2><p>An <em>empirical distribution</em> of a statistic is a table generated by computing the statistic for many samples. The previous section used empirical distributions to reason about whether or not a sample had a statistic that was typical of a random sample. The following function computes the empirical histogram of a statistic, which is expressed as a function that takes a sample.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">empirical_distribution</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">f</span><span class="p">):</span>
<span class="n">stats</span> <span class="o">=</span> <span class="n">Table</span><span class="p">([</span><span class="s1">'Sample #'</span><span class="p">,</span> <span class="s1">'Statistic'</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">k</span><span class="p">):</span>
<span class="n">sample</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
<span class="n">statistic</span> <span class="o">=</span> <span class="n">f</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
<span class="n">stats</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">i</span><span class="p">,</span> <span class="n">statistic</span><span class="p">])</span>
<span class="k">return</span> <span class="n">stats</span>
</pre></div></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This function can be used to compute an empirical distribution of the total variation distance from the population distribution.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">tvd_from_eligible</span><span class="p">(</span><span class="n">sample</span><span class="p">):</span>
<span class="n">counts</span> <span class="o">=</span> <span class="n">sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Eligible sample'</span><span class="p">)</span>
<span class="n">distribution</span> <span class="o">=</span> <span class="n">counts</span> <span class="o">/</span> <span class="nb">sum</span><span class="p">(</span><span class="n">counts</span><span class="p">)</span>
<span class="k">return</span> <span class="n">total_variation_distance</span><span class="p">(</span><span class="n">population</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Eligible'</span><span class="p">),</span> <span class="n">distribution</span><span class="p">)</span>
<span class="n">empirical_distribution</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">tvd_from_eligible</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Sample #</th> <th>Statistic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 </td> <td>0.0248796 </td>
</tr>
</tbody>
<tbody><tr>
<td>1 </td> <td>0.0195664 </td>
</tr>
</tbody>
<tbody><tr>
<td>2 </td> <td>0.00401927</td>
</tr>
</tbody>
<tbody><tr>
<td>3 </td> <td>0.0165864 </td>
</tr>
</tbody>
<tbody><tr>
<td>4 </td> <td>0.0207502 </td>
</tr>
</tbody>
<tbody><tr>
<td>5 </td> <td>0.013042 </td>
</tr>
</tbody>
<tbody><tr>
<td>6 </td> <td>0.0297041 </td>
</tr>
</tbody>
<tbody><tr>
<td>7 </td> <td>0.0153544 </td>
</tr>
</tbody>
<tbody><tr>
<td>8 </td> <td>0.00830695</td>
</tr>
</tbody>
<tbody><tr>
<td>9 </td> <td>0.014384 </td>
</tr>
</tbody>
</table>
<p>... (990 rows omitted)</p></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The empirical distribution can be visualized using a histogram.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">empirical_distribution</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">tvd_from_eligible</span><span class="p">)</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="../notebooks-images/Testing_18_0.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="U.S.-Supreme-Court,-1965:-Swain-vs.-Alabama">U.S. Supreme Court, 1965: Swain vs. Alabama<a class="anchor-link" href="#U.S.-Supreme-Court,-1965:-Swain-vs.-Alabama">¶</a></h2><p>In the early 1960's, in Talladega County in Alabama, a black man called Robert Swain was convicted of raping a white woman and was sentenced to death. He appealed his sentence, citing among other factors the all-white jury. At the time, only men aged 21 or older were allowed to serve on juries in Talladega County. In the county, 26% of the eligible jurors were black, but there were only 8 black men among the 100 selected for the jury panel in Swain's trial. No black man was selected for the trial jury.</p>
<p>In 1965, the Supreme Court of the United States denied Swain's appeal. In its ruling, the Court wrote "... the overall percentage disparity has been small and reflects no studied attempt to include or exclude a specified number of [black men]."</p>
<p>Let us use the methods we have developed to examine the disparity between 8 out of 100 and 26 out of 100 black men in a panel of 100 drawn at random from among the eligible jurors.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">alabama</span> <span class="o">=</span> <span class="n">Table</span><span class="p">([</span><span class="s1">'Race'</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">])</span><span class="o">.</span><span class="n">with_rows</span><span class="p">([</span>
<span class="p">[</span><span class="s2">"Black"</span><span class="p">,</span> <span class="mf">0.26</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"Other"</span><span class="p">,</span> <span class="mf">0.74</span><span class="p">]</span>
<span class="p">])</span>
<span class="n">alabama</span><span class="o">.</span><span class="n">set_format</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">PercentFormatter</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Race</th> <th>Eligible</th>
</tr>
</thead>
<tbody>
<tr>
<td>Black</td> <td>26% </td>
</tr>
</tbody>
<tbody><tr>
<td>Other</td> <td>74% </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As our test statistic, we will use the number of black men in a random sample of size 100. Here is an empirical distribution of this statistic.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">value_for</span><span class="p">(</span><span class="n">category</span><span class="p">,</span> <span class="n">category_label</span><span class="p">,</span> <span class="n">value_label</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">in_table</span><span class="p">(</span><span class="n">sample</span><span class="p">):</span>
<span class="k">return</span> <span class="n">sample</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">category_label</span><span class="p">,</span> <span class="n">category</span><span class="p">)</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="n">value_label</span><span class="p">)</span><span class="o">.</span><span class="n">item</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">in_table</span>
<span class="n">num_black</span> <span class="o">=</span> <span class="n">value_for</span><span class="p">(</span><span class="s1">'Black'</span><span class="p">,</span> <span class="s1">'Race'</span><span class="p">,</span> <span class="s1">'Eligible sample'</span><span class="p">)</span>
<span class="n">black_in_sample</span> <span class="o">=</span> <span class="n">empirical_distribution</span><span class="p">(</span><span class="n">alabama</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">10000</span><span class="p">,</span> <span class="n">num_black</span><span class="p">)</span>
<span class="n">black_in_sample</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Sample #</th> <th>Statistic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 </td> <td>27 </td>
</tr>
</tbody>
<tbody><tr>
<td>1 </td> <td>31 </td>
</tr>
</tbody>
<tbody><tr>
<td>2 </td> <td>28 </td>
</tr>
</tbody>
<tbody><tr>
<td>3 </td> <td>27 </td>
</tr>
</tbody>
<tbody><tr>
<td>4 </td> <td>24 </td>
</tr>
</tbody>
<tbody><tr>
<td>5 </td> <td>23 </td>
</tr>
</tbody>
<tbody><tr>
<td>6 </td> <td>29 </td>
</tr>
</tbody>
<tbody><tr>
<td>7 </td> <td>28 </td>
</tr>
</tbody>
<tbody><tr>
<td>8 </td> <td>16 </td>
</tr>
</tbody>
<tbody><tr>
<td>9 </td> <td>25 </td>
</tr>
</tbody>
</table>
<p>... (9990 rows omitted)</p></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The numbers of black men in the first 10 repetitions were quite a bit larger than 8. The empirical histogram below shows the distribution of the test statistic over all the repetitions.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">black_in_sample</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="../notebooks-images/Testing_24_0.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>If the 100 men in Swain's jury panel had been chosen at random, it would have been extremely unlikely for the number of black men on the panel to be as small as 8. We must conclude that the percentage disparity was larger than the disparity expected due to chance variation.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Method-and-Terminology-of-Statistical-Tests-of-Hypotheses">Method and Terminology of Statistical Tests of Hypotheses<a class="anchor-link" href="#Method-and-Terminology-of-Statistical-Tests-of-Hypotheses">¶</a></h2><p>We have developed some of the fundamental concepts of statistical tests of hypotheses, in the context of examples about jury selection. Using statistical tests as a way of making decisions is standard in many fields and has a standard terminology. Here is the sequence of the steps in most statistical tests, along with some terminology and examples.</p>
<p><strong>STEP 1: THE HYPOTHESES</strong></p>
<p>All statistical tests attempt to choose between two views of how the data were generated. These two views are called <em>hypotheses</em>.</p>
<p><strong>The null hypothesis.</strong> This says that the data were generated at random under clearly specified assumptions that make it possible to compute chances. The word "null" reinforces the idea that if the data look different from what the null hypothesis predicts, the difference is due to nothing but chance.</p>
<p>In both of our examples about jury selection, the null hypothesis is that the panels were selected at random from the population of eligible jurors. Though the racial composition of the panels was different from that of the populations of eligible jurors, there was no reason for the difference other than chance variation.</p>
<p><strong>The alternative hypothesis.</strong> This says that some reason other than chance made the data differ from what was predicted by the null hypothesis. Informally, the alternative hypothesis says that the observed difference is "real."</p>
<p>In both of our examples about jury selection, the alternative hypothesis is that the panels were not selected at random. Something other than chance led to the differences between the racial composition of the panels and the racial composition of the populations of eligible jurors.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>STEP 2: THE TEST STATISTIC</strong></p>
<p>In order to decide between the two hypothesis, we must choose a statistic upon which we will base our decision. This is called the <strong>test statistic</strong>.</p>
<p>In the example about jury panels in Alameda County, the test statistic we used was the total variation distance between the racial distributions in the panels and in the population of eligible jurors. In the example about Swain versus the State of Alabama, the test statistic was the number of black men on the jury panel.</p>
<p>Calculating the observed value of the test statistic is often the first computational step in a statistical test. In the case of jury panels in Alameda County, the observed value of the total variation distance between the distributions in the panels and the population was 0.14. In the example about Swain, the number of black men on his jury panel was 8.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>STEP 3: THE PROBABILITY DISTRIBUTION OF THE TEST STATISTIC, UNDER THE NULL HYPOTHESIS</strong></p>
<p>This step sets aside the observed value of the test statistic, and instead focuses on <em>what the value might be if the null hypothesis were true</em>. Under the null hypothesis, the sample could have come out differently due to chance. So the test statistic could have come out differently. This step consists of figuring out all possible values of the test statistic and all their probabilities, under the null hypothesis of randomness.</p>
<p>In other words, in this step we calculate the probability distribution of the test statistic pretending that the null hypothesis is true. For many test statistics, this can be a daunting task both mathematically and computationally. Therefore, we approximate the probability distribution of the test statistic by the empirical distribution of the statistic based on a large number of repetitions of the sampling procedure.</p>
<p>This was the empirical distribution of the test statistic – the number of black men on the jury panel – in the example about Swain and the Supreme Court:</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">black_in_sample</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="../notebooks-images/Testing_29_0.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>STEP 4. THE CONCLUSION OF THE TEST</strong></p>
<p>The choice between the null and alternative hypotheses depends on the comparison between the results of Steps 2 and 3: the observed test statistic and its distribution as predicted by the null hypothesis.</p>
<p>If the two are consistent with each other, then the observed test statistic is in line with what the null hypothesis predicts. In other words, the test does not point towards the alternative hypothesis; the null hypothesis is better supported by the data.</p>
<p>But if the two are not consistent with each other, as is the case in both of our examples about jury panels, then the data do not support the null hypothesis. In both our examples we concluded that the jury panels were not selected at random. Something other than chance affected their composition.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>How is "consistent" defined?</strong> Whether the observed test statistic is consistent with its predicted distribution under the null hypothesis is a matter of judgment. We recommend that you provide your judgment along with the value of the test statistic and a graph of its predicted distribution. That will allow your reader to make his or her own judgment about whether the two are consistent.</p>
<p>If you do not want to make your own judgment, there are conventions that you can follow. These conventions are based on what is called the <strong>observed significance level</strong> or <em>P-value</em> for short. The P-value is a chance computed using the probability distribution of the test statistic, and can be approximated by using the empirical distribution in Step 3.</p>
<p><strong>Practical note on P-values and conventions.</strong> Place the observed test statistic on the horizontal axis of the histogram, and find the proportion in the tail starting at that point. That's the P-value.</p>
<p>If the P-value is small, the data support the alternative hypothesis. The conventions for what is "small":</p>
<ul>
<li><p>If the P-value is less than 5%, the result is called "statistically significant."</p>
</li>
<li><p>If the P-value is even smaller – less than 1% – the result is called "highly statistically significant."</p>
</li>
</ul>
<p><strong>More formal definition of P-value.</strong> The P-value is the chance, under the null hypothesis, that the test statistic is equal to the value that was observed or is even further in the direction of the alternative.</p>
<p>The P-value is based on comparing the observed test statistic and what the null hypothesis predicts. A small P-value implies that under the null hypothesis, the value of the test statistic is unlikely to be near the one we observed. When a hypothesis and the data are not in accord, the hypothesis has to go. That is why we reject the null hypothesis if the P-value is small.</p>
<p>The P-value for the null hypothesis that Swain's jury panel was selected at random from the population of Talladega is approximated using the empirical distribution as follows:</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">count_nonzero</span><span class="p">(</span><span class="n">black_in_sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o"><=</span> <span class="mi">8</span><span class="p">)</span> <span class="o">/</span> <span class="n">black_in_sample</span><span class="o">.</span><span class="n">num_rows</span>
</pre></div></div></div>
<div class="output_text output_subarea output_execute_result">
<pre>0.0</pre></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>HISTORICAL NOTE ON THE CONVENTIONS</strong></p>
<p>The determination of statistical significance, as defined above, has become standard in statistical analyses in all fields of application. When a convention is so universally followed, it is interesting to examine how it arose.</p>
<p>The method of statistical testing – choosing between hypotheses based on data in random samples – was developed by Sir Ronald Fisher in the early 20th century. Sir Ronald might have set the convention for statistical significance somewhat unwittingly, in the following statement in his 1925 book <em>Statistical Methods for Research Workers</em>. About the 5% level, he wrote, "It is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not."</p>
<p>What was "convenient" for Sir Ronald became a cutoff that has acquired the status of a universal constant. No matter that Sir Ronald himself made the point that the value was his personal choice from among many: in an article in 1926, he wrote, "If one in twenty does not seem high enough odds, we may, if we prefer it draw the line at one in fifty (the 2 percent point), or one in a hundred (the 1 percent point). Personally, the author prefers to set a low standard of significance at the 5 percent point ..."</p>
<p>Fisher knew that "low" is a matter of judgment and has no unique definition. We suggest that you follow his excellent example. Provide your data, make your judgment, and explain why you made it.</p></div></div></div>