<div id="ipython-notebook">
<a class="interact-button" href="http://datahub.berkeley.edu/user-redirect/interact?repo=textbook&path=notebooks/baby.csv&path=notebooks/AB.ipynb">Interact</a>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [['$','$']],
processEscapes: true
}
});
</script>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Using-the-Bootstrap-Method-to-Test-Hypotheses">Using the Bootstrap Method to Test Hypotheses<a class="anchor-link" href="#Using-the-Bootstrap-Method-to-Test-Hypotheses">¶</a></h2><p>We have used random permutations to see whether two samples are drawn from the same underlying distribution. The bootstrap method can also be used in such statistical tests of hypotheses.</p>
<p>The examples in this section are based on data on a random sample of 1,174 pairs of mothers and their newborn infants. The table <code>baby</code> contains the data. Each row represents a mother-baby pair. The variables are:</p>
<ul>
<li>the baby's birth weight in ounces</li>
<li>the number of gestational days</li>
<li>the mother's age in completed years</li>
<li>the mother's height in inches</li>
<li>the mother's pregnancy weight in pounds</li>
<li>whether or not the mother was a smoker</li>
</ul></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">baby</span> <span class="o">=</span> <span class="n">Table</span><span class="o">.</span><span class="n">read_table</span><span class="p">(</span><span class="s1">'baby.csv'</span><span class="p">)</span>
<span class="n">baby</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Birth Weight</th> <th>Gestational Days</th> <th>Maternal Age</th> <th>Maternal Height</th> <th>Maternal Pregnancy Weight</th> <th>Maternal Smoker</th>
</tr>
</thead>
<tbody>
<tr>
<td>120 </td> <td>284 </td> <td>27 </td> <td>62 </td> <td>100 </td> <td>False </td>
</tr>
</tbody>
<tbody><tr>
<td>113 </td> <td>282 </td> <td>33 </td> <td>64 </td> <td>135 </td> <td>False </td>
</tr>
</tbody>
<tbody><tr>
<td>128 </td> <td>279 </td> <td>28 </td> <td>64 </td> <td>115 </td> <td>True </td>
</tr>
</tbody>
<tbody><tr>
<td>108 </td> <td>282 </td> <td>23 </td> <td>67 </td> <td>125 </td> <td>True </td>
</tr>
</tbody>
<tbody><tr>
<td>136 </td> <td>286 </td> <td>25 </td> <td>62 </td> <td>93 </td> <td>False </td>
</tr>
</tbody>
<tbody><tr>
<td>138 </td> <td>244 </td> <td>33 </td> <td>62 </td> <td>178 </td> <td>False </td>
</tr>
</tbody>
<tbody><tr>
<td>132 </td> <td>245 </td> <td>23 </td> <td>65 </td> <td>140 </td> <td>False </td>
</tr>
</tbody>
<tbody><tr>
<td>120 </td> <td>289 </td> <td>25 </td> <td>62 </td> <td>125 </td> <td>False </td>
</tr>
</tbody>
<tbody><tr>
<td>143 </td> <td>299 </td> <td>30 </td> <td>66 </td> <td>136 </td> <td>True </td>
</tr>
</tbody>
<tbody><tr>
<td>140 </td> <td>351 </td> <td>27 </td> <td>68 </td> <td>120 </td> <td>False </td>
</tr>
</tbody>
</table>
<p>... (1164 rows omitted)</p></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Suppose we want to compare the mothers who smoke and the mothers who are non-smokers. Do they differ in any way other than smoking? A key variable of interest is the birth weight of their babies. To study this variable, we begin by noting that there are 715 non-smokers among the women in the sample, and 459 smokers.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">baby</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'Maternal Smoker'</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Maternal Smoker</th> <th>count</th>
</tr>
</thead>
<tbody>
<tr>
<td>False </td> <td>715 </td>
</tr>
</tbody>
<tbody><tr>
<td>True </td> <td>459 </td>
</tr>
</tbody>
</table></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The first histogram below displays the distribution of birth weights of the babies of the non-smokers in the sample. The second displays the birth weights of the babies of the smokers.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">baby</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">40</span><span class="p">,</span> <span class="mi">181</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="n">unit</span><span class="o">=</span><span class="s1">'ounce'</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_6_0.png"/></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">baby</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">40</span><span class="p">,</span> <span class="mi">181</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="n">unit</span><span class="o">=</span><span class="s1">'ounce'</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_7_0.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Both distributions are approximately bell shaped and centered near 120 ounces. The distributions are not identical, of course, which raises the question of whether the difference reflects just chance variation or a difference in the distributions in the population.</p>
<p>This question can be answered by a test of the null and alternative hypotheses below. Because we are testing for the equality of two distributions, one for Category A (mothers who don't smoke) and the other for Category B (mothers who smoke), the method is known rather unimaginatively as <em>A/B testing</em>.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>Null hypothesis.</strong> In the population, the distribution of birth weights of babies is the same for mothers who don't smoke as for mothers who do. The difference in the sample is due to chance.</p>
<p><strong>Alternative hypothesis.</strong> The two distributions are different in the population.</p>
<p><strong>Test statistic.</strong> Birth weight is a quantitative variable, so it is reasonable to use the difference between means as the test statistic. There are other reasonable test statistics as well; the difference between means is just one simple choice.</p>
<p>The observed difference between the means of the two groups in the sample is about 9.27 ounces.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">nonsmokers_mean</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">baby</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">))</span>
<span class="n">smokers_mean</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">baby</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">))</span>
<span class="n">nonsmokers_mean</span> <span class="o">-</span> <span class="n">smokers_mean</span>
</pre></div></div></div>
<div class="output_text output_subarea output_execute_result">
<pre>9.2661425720249184</pre></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Implementing-the-bootstrap-method">Implementing the bootstrap method<a class="anchor-link" href="#Implementing-the-bootstrap-method">¶</a></h3><p>To see whether such a difference could have arisen due to chance under the null hypothesis, we could use a permutation test just as we did in the previous section. An alternative is to use the bootstrap, which we will do here.</p>
<p>Under the null hypothesis, the distributions of birth weights are the same for babies of women who smoke and for babies of those who don't. To see how much the distributions could vary due to chance alone, we will use the bootstrap method and draw new samples at random <em>with</em> replacement from the entire set of birth weights. The only difference between this and the permutation test of the previous section is that random permutations are based on sampling <em>without</em> replacement.</p>
<p>We will perform 10,000 repetitions of the bootstrap process. There are 1,174 babies, so in each repetition of the bootstrap process we draw 1,174 times at random with replacement. Of these 1,174 draws, the first 715 are assigned to the non-smoking mothers and the remaining 459 to the mothers who smoke.</p>
<p>The code starts by sorting the rows so that the rows corresponding to the 715 non-smokers appear first. This eliminates the need to use conditional expressions to identify the rows corresponding to smokers and non-smokers in each replication of the sampling process.</p></div></div>
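<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The contrast between sampling without replacement (a permutation) and sampling with replacement (a bootstrap resample) can be sketched on a tiny example. This is only an illustration using Python's standard library, not the <code>datascience</code> table methods used in this section. The helper names <code>permute_once</code>, <code>bootstrap_once</code>, and <code>diff_of_means</code> are hypothetical, the six weights are the first six birth weights in the table above, and the split into groups of 4 and 2 is an arbitrary assumption.</p></div></div>

```python
import random

def permute_once(values, rng):
    """Sample WITHOUT replacement: every value appears exactly once, in a new order."""
    return rng.sample(values, k=len(values))

def bootstrap_once(values, rng):
    """Sample WITH replacement: some values may repeat and others may be left out."""
    return rng.choices(values, k=len(values))

def diff_of_means(draws, n_A):
    """Mean of the first n_A draws (Category A) minus the mean of the rest (Category B)."""
    a, b = draws[:n_A], draws[n_A:]
    return sum(a) / len(a) - sum(b) / len(b)

rng = random.Random(0)                       # seeded only so the sketch is repeatable
weights = [120, 113, 128, 108, 136, 138]     # first six birth weights in the table
n_A = 4                                      # hypothetical group size for Category A

shuffled = permute_once(weights, rng)
resampled = bootstrap_once(weights, rng)

print("permutation:", shuffled, diff_of_means(shuffled, n_A))
print("bootstrap:  ", resampled, diff_of_means(resampled, n_A))
```

<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>A permutation always contains exactly the original values, so only the assignment to groups changes. A bootstrap resample has the same size but can repeat some values and omit others.</p></div></div>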
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="sd">"""Bootstrap test for the difference in mean birthweight</span>
<span class="sd">Category A: non-smoker Category B: smoker"""</span>
<span class="n">sorted_by_smoking</span> <span class="o">=</span> <span class="n">baby</span><span class="o">.</span><span class="n">select</span><span class="p">([</span><span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="s1">'Birth Weight'</span><span class="p">])</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="s1">'Maternal Smoker'</span><span class="p">)</span>
<span class="c1"># calculate the observed difference in means</span>
<span class="n">meanA</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sorted_by_smoking</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">)[:</span><span class="mi">715</span><span class="p">])</span>
<span class="n">meanB</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sorted_by_smoking</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">)[</span><span class="mi">715</span><span class="p">:])</span>
<span class="n">observed_difference</span> <span class="o">=</span> <span class="n">meanA</span> <span class="o">-</span> <span class="n">meanB</span>
<span class="n">repetitions</span><span class="o">=</span><span class="mi">10000</span>
<span class="n">diffs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">repetitions</span><span class="p">):</span>
    <span class="c1"># sample WITH replacement, same number as original sample size</span>
    <span class="n">resample</span> <span class="o">=</span> <span class="n">sorted_by_smoking</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">with_replacement</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="c1"># Compute the difference of the means of the resampled values, between Categories A and B</span>
    <span class="n">diff_means</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">resample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">)[:</span><span class="mi">715</span><span class="p">])</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">resample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">'Birth Weight'</span><span class="p">)[</span><span class="mi">715</span><span class="p">:])</span>
    <span class="n">diffs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">diff_means</span><span class="p">)</span>
<span class="c1"># Display results </span>
<span class="n">differences</span> <span class="o">=</span> <span class="n">Table</span><span class="p">()</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span><span class="s1">'diff_in_means'</span><span class="p">,</span> <span class="n">diffs</span><span class="p">)</span>
<span class="n">differences</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">bins</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
<span class="n">plots</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s1">'Approx null distribution of difference in means'</span><span class="p">)</span>
<span class="n">plots</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s1">'Bootstrap A/B Test'</span><span class="p">)</span>
<span class="n">plots</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="n">plots</span><span class="o">.</span><span class="n">ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Observed difference in means: '</span><span class="p">,</span> <span class="n">observed_difference</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>Observed difference in means: 9.26614257202
</pre></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_12_1.png"/></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The figure shows that under the null hypothesis of equal distributions in the population, the bootstrap empirical distribution of the difference between the sample means of the two groups is roughly bell shaped, centered at 0, stretching from about $-4$ ounces to $4$ ounces. The observed difference in the original sample is about 9.27 ounces, which is inconsistent with this distribution. So the conclusion of the test is that in the population, the distributions of birth weights of the babies of non-smokers and smokers are different.</p></div></div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Bootstrap-A/B-testing">Bootstrap A/B testing<a class="anchor-link" href="#Bootstrap-A/B-testing">¶</a></h3><p>Let us define a function <code>bootstrap_AB_test</code> that performs an A/B test using the bootstrap method and the difference in sample means as the test statistic. The null hypothesis is that the two underlying distributions in the population are equal; the alternative is that they are not.</p>
<p>The arguments of the function are:</p>
<ul>
<li>the name of the table that contains the data in the original sample</li>
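<li>the label of the column containing the code 0 for Category A and 1 for Category B (the Boolean values <code>False</code> and <code>True</code>, as in <code>Maternal Smoker</code>, also work)</li>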
<li>the label of the column containing the values of the response variable (that is, the variable whose distribution is of interest)</li>
<li>the number of repetitions of the resampling procedure</li>
</ul>
<p>The function prints the observed difference in means and the bootstrap empirical P-value, and draws a histogram of the bootstrap empirical distribution of the difference in means. Because the alternative simply says that the two underlying distributions are different, the P-value is computed as the proportion of sampled differences that are at least as large in absolute value as the absolute value of the observed difference.</p></div></div>
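<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The two-sided P-value rule described above can be sketched in isolation. The function name <code>empirical_p_value</code> is hypothetical and the simulated differences are made-up numbers chosen only to make the count easy to check by hand; they are not results from the <code>baby</code> data.</p></div></div>

```python
def empirical_p_value(simulated_diffs, observed_diff):
    """Proportion of simulated differences whose absolute value is at least
    as large as the absolute value of the observed difference."""
    extreme = sum(1 for d in simulated_diffs if abs(d) >= abs(observed_diff))
    return extreme / len(simulated_diffs)

# Made-up simulated differences, for illustration only.
simulated = [-2.0, -0.5, 0.0, 0.3, 1.1, 2.4, -3.1, 0.8]

# Three of the eight values (-2.0, 2.4, -3.1) are at least 2.0 in absolute value.
print(empirical_p_value(simulated, 2.0))    # 0.375
# The sign of the observed difference does not matter for a two-sided test.
print(empirical_p_value(simulated, -2.0))   # 0.375
```

<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>When no simulated difference is as extreme as the observed one, this proportion is 0. That should be read as "smaller than 1 in the number of repetitions", not as an exact probability of zero.</p></div></div>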
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">bootstrap_AB_test</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">categories</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">repetitions</span><span class="p">):</span>
    <span class="sd">"""Bootstrap A/B test for the difference in the mean response.</span>
<span class="sd">    Assumes A=0, B=1."""</span>
    <span class="c1"># Sort the sample table according to the A/B column;</span>
    <span class="c1"># then select only the column of response values.</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">categories</span><span class="p">)</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">values</span><span class="p">)</span>
    <span class="c1"># Find the number of entries in Category A.</span>
    <span class="n">n_A</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">categories</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">num_rows</span>
    <span class="c1"># Calculate the observed value of the test statistic.</span>
    <span class="n">meanA</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="n">values</span><span class="p">)[:</span><span class="n">n_A</span><span class="p">])</span>
    <span class="n">meanB</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="n">values</span><span class="p">)[</span><span class="n">n_A</span><span class="p">:])</span>
    <span class="n">observed_difference</span> <span class="o">=</span> <span class="n">meanA</span> <span class="o">-</span> <span class="n">meanB</span>
    <span class="c1"># Run the bootstrap procedure and get a list of resampled differences in means</span>
    <span class="n">diffs</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">repetitions</span><span class="p">):</span>
        <span class="n">resample</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">with_replacement</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
        <span class="n">d</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">resample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="n">values</span><span class="p">)[:</span><span class="n">n_A</span><span class="p">])</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">resample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="n">values</span><span class="p">)[</span><span class="n">n_A</span><span class="p">:])</span>
        <span class="n">diffs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">d</span><span class="p">)</span>
    <span class="c1"># Compute the bootstrap empirical P-value (two-sided)</span>
    <span class="n">diff_array</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">diffs</span><span class="p">)</span>
    <span class="n">empirical_p</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">count_nonzero</span><span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">diff_array</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">observed_difference</span><span class="p">))</span><span class="o">/</span><span class="n">repetitions</span>
    <span class="c1"># Display results</span>
    <span class="n">differences</span> <span class="o">=</span> <span class="n">Table</span><span class="p">()</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span><span class="s1">'diff_in_means'</span><span class="p">,</span> <span class="n">diffs</span><span class="p">)</span>
    <span class="n">differences</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">bins</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
    <span class="n">plots</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s1">'Approx null distribution of difference in means'</span><span class="p">)</span>
    <span class="n">plots</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s1">'Bootstrap A/B Test'</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Observed difference in means: "</span><span class="p">,</span> <span class="n">observed_difference</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Bootstrap empirical P-value: "</span><span class="p">,</span> <span class="n">empirical_p</span><span class="p">)</span>
</pre></div></div></div>
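<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For readers who want the same procedure without the <code>datascience</code> package or the plotting step, here is a rough standard-library sketch that returns its results instead of printing them. It is not the function defined above: the name <code>bootstrap_ab_sketch</code>, its arguments, and the <code>seed</code> parameter are assumptions made for this illustration. The two small groups are the birth weights of the non-smokers and smokers among the ten rows of <code>baby</code> displayed earlier, so the output says nothing reliable about the full sample.</p></div></div>

```python
import random
from statistics import mean

def bootstrap_ab_sketch(values_a, values_b, repetitions, seed=None):
    """Return (observed difference in means, list of resampled differences,
    two-sided empirical P-value) for a bootstrap A/B test."""
    rng = random.Random(seed)
    pooled = list(values_a) + list(values_b)   # Category A first, then Category B
    n_a = len(values_a)
    observed = mean(values_a) - mean(values_b)

    diffs = []
    for _ in range(repetitions):
        # Draw len(pooled) values WITH replacement from the pooled sample;
        # the first n_a draws play the role of Category A, the rest Category B.
        draws = rng.choices(pooled, k=len(pooled))
        diffs.append(mean(draws[:n_a]) - mean(draws[n_a:]))

    p_value = sum(abs(d) >= abs(observed) for d in diffs) / repetitions
    return observed, diffs, p_value

# Birth weights (ounces) from the ten displayed rows only, not the full table.
nonsmokers = [120, 113, 136, 138, 132, 120, 140]
smokers = [128, 108, 143]

observed, diffs, p_value = bootstrap_ab_sketch(nonsmokers, smokers, 2000, seed=0)
print("Observed difference in means:", observed)
print("Bootstrap empirical P-value:", p_value)
```

<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Returning the three values makes it possible to reuse the resampled differences, for example to draw a histogram separately or to compare several response variables in a loop.</p></div></div>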
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>We can now use the function <code>bootstrap_AB_test</code> to compare the smokers and non-smokers with respect to several different response variables. The tests show a statistically significant difference between the two groups in birth weight (as shown earlier), gestational days, maternal age, and maternal pregnancy weight. It comes as no surprise that the two groups do not differ significantly in their mean heights.</p></div></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">bootstrap_AB_test</span><span class="p">(</span><span class="n">baby</span><span class="p">,</span> <span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="s1">'Birth Weight'</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>Observed difference in means: 9.26614257202
Bootstrap empirical P-value: 0.0
</pre></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_17_1.png"/></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">bootstrap_AB_test</span><span class="p">(</span><span class="n">baby</span><span class="p">,</span> <span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="s1">'Gestational Days'</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>Observed difference in means: 1.97652238829
Bootstrap empirical P-value: 0.0367
</pre></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_18_1.png"/></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">bootstrap_AB_test</span><span class="p">(</span><span class="n">baby</span><span class="p">,</span> <span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="s1">'Maternal Age'</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>Observed difference in means: 0.80767250179
Bootstrap empirical P-value: 0.0192
</pre></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_19_1.png"/></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">bootstrap_AB_test</span><span class="p">(</span><span class="n">baby</span><span class="p">,</span> <span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="s1">'Maternal Pregnancy Weight'</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>Observed difference in means: 2.56033030151
Bootstrap empirical P-value: 0.0374
</pre></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_20_1.png"/></div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">bootstrap_AB_test</span><span class="p">(</span><span class="n">baby</span><span class="p">,</span> <span class="s1">'Maternal Smoker'</span><span class="p">,</span> <span class="s1">'Maternal Height'</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
</pre></div></div></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>Observed difference in means: -0.0905891494127
Bootstrap empirical P-value: 0.5411
</pre></div>
<div class="output_png output_subarea ">
<img src="/notebooks-images/AB_21_1.png"/></div></div>