
<div id="ipython-notebook">
            
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [['$','$']],
      processEscapes: true
    }
  });
</script>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Our investigation of jury panels is an example of <em>hypothesis testing</em>. We wished to know the answer to a yes-or-no question about the world, and we reasoned about random samples in order to answer the question. In this section, we formalize the process.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Sampling-from-a-Distribution">Sampling from a Distribution<a class="anchor-link" href="#Sampling-from-a-Distribution">¶</a></h2><p>The rows of a table describe individual elements of a collection of data. In the previous section, we worked with a table that described a distribution over categories. Tables that describe the proportions or counts of different categories in a sample or population arise often in data analysis as a summary of a data collection process. The <code>sample_from_distribution</code> method draws a sample of counts for the categories. Its implementation uses the same <code>np.random.multinomial</code> method from the previous section.</p>
<p>The table below describes the proportion of the eligible potential jurors in Alameda County (estimated from 2000 census data and other sources) for broad categories of race and ethnic background. This table was compiled by Professor Weeks, a demographer at San Diego State University, for the Alameda County trial <em>People v. Stuart Alexander</em> in 2002.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">population</span> <span class="o">=</span> <span class="n">Table</span><span class="p">([</span><span class="s2">"Race"</span><span class="p">,</span> <span class="s2">"Eligible"</span><span class="p">])</span><span class="o">.</span><span class="n">with_rows</span><span class="p">([</span>
    <span class="p">[</span><span class="s2">"Asian"</span><span class="p">,</span>  <span class="mf">0.15</span><span class="p">],</span>
    <span class="p">[</span><span class="s2">"Black"</span><span class="p">,</span>  <span class="mf">0.18</span><span class="p">],</span>
    <span class="p">[</span><span class="s2">"Latino"</span><span class="p">,</span> <span class="mf">0.12</span><span class="p">],</span>
    <span class="p">[</span><span class="s2">"White"</span><span class="p">,</span>  <span class="mf">0.54</span><span class="p">],</span>
    <span class="p">[</span><span class="s2">"Other"</span><span class="p">,</span>  <span class="mf">0.01</span><span class="p">],</span>
    <span class="p">])</span>
<span class="n">population</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Race</th> <th>Eligible</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Asian </td> <td>0.15    </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Black </td> <td>0.18    </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Latino</td> <td>0.12    </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>White </td> <td>0.54    </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Other </td> <td>0.01    </td>
        </tr>
    </tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The <code>sample_from_distribution</code> method takes a column index or label and a sample size. It returns a table with an added column containing the category counts of the sample.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample_size</span> <span class="o">=</span> <span class="mi">1453</span>
<span class="n">population</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s2">"Eligible"</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Race</th> <th>Eligible</th> <th>Eligible sample</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Asian </td> <td>0.15    </td> <td>233            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Black </td> <td>0.18    </td> <td>245            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Latino</td> <td>0.12    </td> <td>177            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>White </td> <td>0.54    </td> <td>787            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Other </td> <td>0.01    </td> <td>11             </td>
        </tr>
    </tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Each count is selected randomly in such a way that the chance of each category is the corresponding proportion in the original <code>"Eligible"</code> column. Sampling again will give different counts, but again the <code>"White"</code> category is much more common than <code>"Other"</code> because of its much higher proportion.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">population</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s2">"Eligible"</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Race</th> <th>Eligible</th> <th>Eligible sample</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Asian </td> <td>0.15    </td> <td>227            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Black </td> <td>0.18    </td> <td>247            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Latino</td> <td>0.12    </td> <td>155            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>White </td> <td>0.54    </td> <td>819            </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Other </td> <td>0.01    </td> <td>5              </td>
        </tr>
    </tbody>
</table></div>
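<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As a rough illustration of what happens behind the method, counts like those above can be drawn directly with <code>np.random.multinomial</code>, which was named earlier as the basis of the implementation. This is a minimal sketch, not the library's actual code; the array name <code>eligible</code> is chosen here for illustration only.</p></div></div>

```python
import numpy as np

# Proportions from the "Eligible" column, in table order:
# Asian, Black, Latino, White, Other.
eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
sample_size = 1453

# One draw returns a count for each category.
# The counts vary from draw to draw but always add up to sample_size.
counts = np.random.multinomial(sample_size, eligible)
print(counts)
print(counts.sum())
```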
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The option <code>proportions=True</code> draws sample counts and then divides each count by the sample size. The result is another distribution, one based on a sample.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample</span> <span class="o">=</span> <span class="n">population</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s2">"Eligible"</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="n">proportions</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">sample</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Race</th> <th>Eligible</th> <th>Eligible sample</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Asian </td> <td>0.15    </td> <td>0.140399       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Black </td> <td>0.18    </td> <td>0.189264       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Latino</td> <td>0.12    </td> <td>0.125258       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>White </td> <td>0.54    </td> <td>0.532691       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Other </td> <td>0.01    </td> <td>0.0123882      </td>
        </tr>
    </tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This sample distribution can itself be used to generate a sample. The sample of a sample is called a <em>resampled sample</em> or a <em>bootstrap sample</em>. It is not a random sample of the original distribution, but it shares many characteristics with such a sample.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="s1">'Eligible sample'</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Race</th> <th>Eligible</th> <th>Eligible sample</th> <th>Eligible sample sample</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Asian </td> <td>0.15    </td> <td>0.140399       </td> <td>204                   </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Black </td> <td>0.18    </td> <td>0.189264       </td> <td>258                   </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Latino</td> <td>0.12    </td> <td>0.125258       </td> <td>163                   </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>White </td> <td>0.54    </td> <td>0.532691       </td> <td>810                   </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Other </td> <td>0.01    </td> <td>0.0123882      </td> <td>18                    </td>
        </tr>
    </tbody>
</table></div>
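<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The two-stage process of sampling and then resampling can be sketched in plain NumPy. This is an illustrative sketch under the assumption that both stages are multinomial draws; the variable names are not part of the <code>Table</code> API.</p></div></div>

```python
import numpy as np

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
sample_size = 1453

# Stage 1: draw a random sample from the population distribution
# and convert its counts to proportions.
sample_counts = np.random.multinomial(sample_size, eligible)
sample_proportions = sample_counts / sample_size

# Stage 2: draw a bootstrap sample, using the sample's proportions
# in place of the population's.
bootstrap_counts = np.random.multinomial(sample_size, sample_proportions)
print(sample_proportions)
print(bootstrap_counts)
```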
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Empirical-Distributions">Empirical Distributions<a class="anchor-link" href="#Empirical-Distributions">¶</a></h2><p>An <em>empirical distribution</em> of a statistic is a table generated by computing the statistic for many samples. The previous section used empirical distributions to reason about whether or not a sample had a statistic that was typical of a random sample. The following function computes the empirical distribution of a statistic, where the statistic is expressed as a function <code>f</code> that takes a sample table and returns a number. The argument <code>k</code> is the number of samples to draw.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">empirical_distribution</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">f</span><span class="p">):</span>
    <span class="n">stats</span> <span class="o">=</span> <span class="n">Table</span><span class="p">([</span><span class="s1">'Sample #'</span><span class="p">,</span> <span class="s1">'Statistic'</span><span class="p">])</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">k</span><span class="p">):</span>
        <span class="n">sample</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">sample_from_distribution</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
        <span class="n">statistic</span> <span class="o">=</span> <span class="n">f</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
        <span class="n">stats</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">i</span><span class="p">,</span> <span class="n">statistic</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">stats</span>
</pre></div></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This function can be used to compute an empirical distribution of the total variation distance from the population distribution.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">tvd_from_eligible</span><span class="p">(</span><span class="n">sample</span><span class="p">):</span>
    <span class="n">counts</span> <span class="o">=</span> <span class="n">sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Eligible sample'</span><span class="p">)</span>
    <span class="n">distribution</span> <span class="o">=</span> <span class="n">counts</span> <span class="o">/</span> <span class="nb">sum</span><span class="p">(</span><span class="n">counts</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">total_variation_distance</span><span class="p">(</span><span class="n">population</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Eligible'</span><span class="p">),</span> <span class="n">distribution</span><span class="p">)</span>

<span class="n">empirical_distribution</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">tvd_from_eligible</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Sample #</th> <th>Statistic</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>0       </td> <td>0.0248796 </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>1       </td> <td>0.0195664 </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>2       </td> <td>0.00401927</td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>3       </td> <td>0.0165864 </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>4       </td> <td>0.0207502 </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>5       </td> <td>0.013042  </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>6       </td> <td>0.0297041 </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>7       </td> <td>0.0153544 </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>8       </td> <td>0.00830695</td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>9       </td> <td>0.014384  </td>
        </tr>
    </tbody>
</table>
<p>... (990 rows omitted)</p></div>
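<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For readers who want to see the same computation without the <code>Table</code> machinery, here is a hedged NumPy sketch. It assumes that the total variation distance is half the sum of the absolute differences between two distributions; the function below is written out for illustration and may differ in detail from the one defined in the previous section.</p></div></div>

```python
import numpy as np

def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions (assumed definition)."""
    return np.abs(p - q).sum() / 2

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
sample_size = 1453
repetitions = 1000

# Each row of `counts` is one random sample of category counts.
counts = np.random.multinomial(sample_size, eligible, size=repetitions)
sample_distributions = counts / sample_size

# One statistic per sample: its distance from the population distribution.
tvds = np.array([total_variation_distance(eligible, d) for d in sample_distributions])
print(tvds[:5])
print(tvds.mean())
```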
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The empirical distribution can be visualized using a histogram.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">empirical_distribution</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">tvd_from_eligible</span><span class="p">)</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="../notebooks-images/Testing_18_0.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="U.S.-Supreme-Court,-1965:-Swain-vs.-Alabama">U.S. Supreme Court, 1965: Swain vs. Alabama<a class="anchor-link" href="#U.S.-Supreme-Court,-1965:-Swain-vs.-Alabama">¶</a></h2><p>In the early 1960s, in Talladega County in Alabama, a black man named Robert Swain was convicted of raping a white woman and was sentenced to death. He appealed his sentence, citing among other factors the all-white jury. At the time, only men aged 21 or older were allowed to serve on juries in Talladega County. In the county, 26% of the eligible jurors were black, but there were only 8 black men among the 100 selected for the jury panel in Swain's trial. No black man was selected for the trial jury.</p>
<p>In 1965, the Supreme Court of the United States denied Swain's appeal. In its ruling, the Court wrote "... the overall percentage disparity has been small and reflects no studied attempt to include or exclude a specified number of [black men]."</p>
<p>Let us use the methods we have developed to examine the disparity between the 8 black men on Swain's panel of 100 and the 26 that would be expected, on average, in a panel of 100 drawn at random from among the eligible jurors.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">alabama</span> <span class="o">=</span> <span class="n">Table</span><span class="p">([</span><span class="s1">'Race'</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">])</span><span class="o">.</span><span class="n">with_rows</span><span class="p">([</span>
    <span class="p">[</span><span class="s2">"Black"</span><span class="p">,</span> <span class="mf">0.26</span><span class="p">],</span>
    <span class="p">[</span><span class="s2">"Other"</span><span class="p">,</span> <span class="mf">0.74</span><span class="p">]</span>
<span class="p">])</span>
<span class="n">alabama</span><span class="o">.</span><span class="n">set_format</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">PercentFormatter</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Race</th> <th>Eligible</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Black</td> <td>26%     </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>Other</td> <td>74%     </td>
        </tr>
    </tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As our test statistic, we will use the number of black men in a random sample of size 100. Here is an empirical distribution of this statistic.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">value_for</span><span class="p">(</span><span class="n">category</span><span class="p">,</span> <span class="n">category_label</span><span class="p">,</span> <span class="n">value_label</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">in_table</span><span class="p">(</span><span class="n">sample</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">sample</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">category_label</span><span class="p">,</span> <span class="n">category</span><span class="p">)</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="n">value_label</span><span class="p">)</span><span class="o">.</span><span class="n">item</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">in_table</span>

<span class="n">num_black</span> <span class="o">=</span> <span class="n">value_for</span><span class="p">(</span><span class="s1">'Black'</span><span class="p">,</span> <span class="s1">'Race'</span><span class="p">,</span> <span class="s1">'Eligible sample'</span><span class="p">)</span>

<span class="n">black_in_sample</span> <span class="o">=</span> <span class="n">empirical_distribution</span><span class="p">(</span><span class="n">alabama</span><span class="p">,</span> <span class="s1">'Eligible'</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">10000</span><span class="p">,</span> <span class="n">num_black</span><span class="p">)</span>
<span class="n">black_in_sample</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
    <thead>
        <tr>
            <th>Sample #</th> <th>Statistic</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>0       </td> <td>27       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>1       </td> <td>31       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>2       </td> <td>28       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>3       </td> <td>27       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>4       </td> <td>24       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>5       </td> <td>23       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>6       </td> <td>29       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>7       </td> <td>28       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>8       </td> <td>16       </td>
        </tr>
    </tbody>
        <tbody><tr>
            <td>9       </td> <td>25       </td>
        </tr>
    </tbody>
</table>
<p>... (9990 rows omitted)</p></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The numbers of black men in the first 10 repetitions were quite a bit larger than 8. The empirical histogram below shows the distribution of the test statistic over all the repetitions.</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">black_in_sample</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="../notebooks-images/Testing_24_0.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>If the 100 men in Swain's jury panel had been chosen at random, it would have been extremely unlikely for the number of black men on the panel to be as small as 8. We must conclude that the percentage disparity was larger than the disparity expected due to chance variation.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Method-and-Terminology-of-Statistical-Tests-of-Hypotheses">Method and Terminology of Statistical Tests of Hypotheses<a class="anchor-link" href="#Method-and-Terminology-of-Statistical-Tests-of-Hypotheses">¶</a></h2><p>We have developed some of the fundamental concepts of statistical tests of hypotheses, in the context of examples about jury selection. Using statistical tests as a way of making decisions is standard in many fields and has a standard terminology. Here is the sequence of the steps in most statistical tests, along with some terminology and examples.</p>
<p><strong>STEP 1: THE HYPOTHESES</strong></p>
<p>All statistical tests attempt to choose between two views of how the data were generated. These two views are called <em>hypotheses</em>.</p>
<p><strong>The null hypothesis.</strong> This says that the data were generated at random under clearly specified assumptions that make it possible to compute chances. The word "null" reinforces the idea that if the data look different from what the null hypothesis predicts, the difference is due to nothing but chance.</p>
<p>In both of our examples about jury selection, the null hypothesis is that the panels were selected at random from the population of eligible jurors. Though the racial composition of the panels was different from that of the populations of eligible jurors, there was no reason for the difference other than chance variation.</p>
<p><strong>The alternative hypothesis.</strong> This says that some reason other than chance made the data differ from what was predicted by the null hypothesis. Informally, the alternative hypothesis says that the observed difference is "real."</p>
<p>In both of our examples about jury selection, the alternative hypothesis is that the panels were not selected at random. Something other than chance led to the differences between the racial composition of the panels and the racial composition of the populations of eligible jurors.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>STEP 2: THE TEST STATISTIC</strong></p>
<p>In order to decide between the two hypotheses, we must choose a statistic upon which we will base our decision. This is called the <strong>test statistic</strong>.</p>
<p>In the example about jury panels in Alameda County, the test statistic we used was the total variation distance between the racial distributions in the panels and in the population of eligible jurors. In the example about Swain versus the State of Alabama, the test statistic was the number of black men on the jury panel.</p>
<p>Calculating the observed value of the test statistic is often the first computational step in a statistical test. In the case of jury panels in Alameda County, the observed value of the total variation distance between the distributions in the panels and the population was 0.14. In the example about Swain, the number of black men on his jury panel was 8.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>STEP 3: THE PROBABILITY DISTRIBUTION OF THE TEST STATISTIC, UNDER THE NULL HYPOTHESIS</strong></p>
<p>This step sets aside the observed value of the test statistic, and instead focuses on <em>what the value might be if the null hypothesis were true</em>. Under the null hypothesis, the sample could have come out differently due to chance. So the test statistic could have come out differently. This step consists of figuring out all possible values of the test statistic and all their probabilities, under the null hypothesis of randomness.</p>
<p>In other words, in this step we calculate the probability distribution of the test statistic pretending that the null hypothesis is true. For many test statistics, this can be a daunting task both mathematically and computationally. Therefore, we approximate the probability distribution of the test statistic by the empirical distribution of the statistic based on a large number of repetitions of the sampling procedure.</p>
<p>This was the empirical distribution of the test statistic – the number of black men on the jury panel – in the example about Swain and the Supreme Court:</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">black_in_sample</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="../notebooks-images/Testing_29_0.png"/></div>
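<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In the Swain example the exact probability distribution happens to be tractable: with only two categories, the multinomial count of black men in a random panel of 100 follows a binomial distribution. The sketch below, which is an illustration and not part of the original analysis, compares that exact distribution with an empirical one built from 10,000 simulated panels. It requires Python 3.8 or later for <code>math.comb</code>.</p></div></div>

```python
import math
import numpy as np

n, p = 100, 0.26        # panel size and proportion of eligible jurors who were black
repetitions = 10000

# Empirical distribution: simulate the count of black men in many random panels.
simulated = np.random.binomial(n, p, size=repetitions)
empirical = np.bincount(simulated, minlength=n + 1) / repetitions

# Exact probability distribution under the null hypothesis (binomial formula).
exact = np.array([math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

# The largest gap between the two, over all possible counts 0 through 100.
print(np.abs(empirical - exact).max())
```

<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The gap shrinks as the number of repetitions grows, which is why a large number of repetitions is used to approximate the probability distribution.</p></div></div>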
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>STEP 4: THE CONCLUSION OF THE TEST</strong></p>
<p>The choice between the null and alternative hypotheses depends on the comparison between the results of Steps 2 and 3: the observed test statistic and its distribution as predicted by the null hypothesis.</p>
<p>If the two are consistent with each other, then the observed test statistic is in line with what the null hypothesis predicts. In other words, the test does not point towards the alternative hypothesis; the null hypothesis is better supported by the data.</p>
<p>But if the two are not consistent with each other, as is the case in both of our examples about jury panels, then the data do not support the null hypothesis. In both our examples we concluded that the jury panels were not selected at random. Something other than chance affected their composition.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>How is "consistent" defined?</strong> Whether the observed test statistic is consistent with its predicted distribution under the null hypothesis is a matter of judgment. We recommend that you provide your judgment along with the value of the test statistic and a graph of its predicted distribution. That will allow your reader to make his or her own judgment about whether the two are consistent.</p>
<p>If you do not want to make your own judgment, there are conventions that you can follow. These conventions are based on what is called the <strong>observed significance level</strong> or <em>P-value</em> for short. The P-value is a chance computed using the probability distribution of the test statistic, and can be approximated by using the empirical distribution in Step 3.</p>
<p><strong>Practical note on P-values and conventions.</strong> Place the observed test statistic on the horizontal axis of the histogram, and find the proportion in the tail starting at that point. That's the P-value.</p>
<p>If the P-value is small, the data support the alternative hypothesis. The conventions for what is "small":</p>
<ul>
<li><p>If the P-value is less than 5%, the result is called "statistically significant."</p>
</li>
<li><p>If the P-value is even smaller – less than 1% – the result is called "highly statistically significant."</p>
</li>
</ul>
<p><strong>More formal definition of P-value.</strong> The P-value is the chance, under the null hypothesis, that the test statistic is equal to the value that was observed or is even further in the direction of the alternative.</p>
<p>The P-value is based on comparing the observed test statistic and what the null hypothesis predicts. A small P-value implies that under the null hypothesis, the value of the test statistic is unlikely to be near the one we observed. When a hypothesis and the data are not in accord, the hypothesis has to go. That is why we reject the null hypothesis if the P-value is small.</p>
<p>The P-value for the null hypothesis that Swain's jury panel was selected at random from the population of Talladega is approximated using the empirical distribution as follows:</p></div></div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">count_nonzero</span><span class="p">(</span><span class="n">black_in_sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">8</span><span class="p">)</span> <span class="o">/</span> <span class="n">black_in_sample</span><span class="o">.</span><span class="n">num_rows</span>
</pre></div></div></div>
<div class="output_text output_subarea output_execute_result">
<pre>0.0</pre></div>
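<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>An empirical P-value of 0.0 means only that none of the 10,000 simulated panels contained 8 or fewer black men; the true chance is small but not zero. As a hedged cross-check, the sketch below computes the chance exactly, assuming the count of black men in a random panel is binomial with 100 draws and chance 0.26 on each draw. It then applies the 5% and 1% conventions described above. The helper <code>significance</code> and the label "not statistically significant" are introduced here for illustration.</p></div></div>

```python
import math

n, p, observed = 100, 0.26, 8

# Chance of 8 or fewer black men in a random panel of 100, under the null hypothesis.
p_value = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed + 1))

def significance(p_value):
    """Apply the conventional 5% and 1% cutoffs to a P-value."""
    if p_value < 0.01:
        return "highly statistically significant"
    if p_value < 0.05:
        return "statistically significant"
    return "not statistically significant"

print(p_value)
print(significance(p_value))
```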
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>HISTORICAL NOTE ON THE CONVENTIONS</strong></p>
<p>The determination of statistical significance, as defined above, has become standard in statistical analyses in all fields of application. When a convention is so universally followed, it is interesting to examine how it arose.</p>
<p>The method of statistical testing – choosing between hypotheses based on data in random samples – was developed by Sir Ronald Fisher in the early 20th century. Sir Ronald might have set the convention for statistical significance somewhat unwittingly, in the following statement in his 1925 book <em>Statistical Methods for Research Workers</em>. About the 5% level, he wrote, "It is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not."</p>
<p>What was "convenient" for Sir Ronald became a cutoff that has acquired the status of a universal constant. No matter that Sir Ronald himself made the point that the value was his personal choice from among many: in an article in 1926, he wrote, "If one in twenty does not seem high enough odds, we may, if we prefer it draw the line at one in fifty (the 2 percent point), or one in a hundred (the 1 percent point). Personally, the author prefers to set a low standard of significance at the 5 percent point ..."</p>
<p>Fisher knew that "low" is a matter of judgment and has no unique definition. We suggest that you follow his excellent example. Provide your data, make your judgment, and explain why you made it.</p></div></div></div>