<table style="width: 100%;">
    <tr style="background-color: transparent;"><td>
        <img src="https://data-88e.github.io/assets/images/blue_text.png" width="250px" style="margin-left: 0;" />
    </td><td>
        <p style="text-align: right; font-size: 10pt;"><strong>Economic Models</strong>, Fall 2021<br>
            Dr. Eric Van Dusen</p></td></tr>
</table>

# Lab 8: Water Guard Randomized Controlled Trial

This lab is adapted from a set of notebooks developed for a full-semester Data Science Connector course taught in Fall 2017, entitled "Behind the Curtain in Economic Development". The dataset comes from a randomized controlled trial household survey carried out in Eastern Kenya in 2007-2008.

The purpose of the study was to understand how to promote the use of WaterGuard, a dilute sodium hypochlorite solution marketed for point-of-use household water disinfection. There were seven arms in the study, which are described more fully in the following chart:


<img src="Slide1.png"  />

Within this table you can see the seven treatment arms (control plus six treatments) in the bolded boxes in the middle, with the number of springs and households in each. The trial was carried out as part of a study of households who gather drinking water from springs in a rural area. The three boxes at the bottom describe the three rounds of data collection: a baseline before the treatment, then a short-term and a long-term follow-up.

<!-- **Notebook Outline**

1. [Mapping](#Mapping)
2. [Balance Check](#Balance)
3. [Baseline and a Randomly Selected Compound](#Baseline)
4. [Chlorine Usage outcome variables](#Chlorine)
5. [Graph of outcomes by Treatment Arm](#Graph)  -->

In [1]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from pandas import read_stata

In [56]:
!jupyter nbextension enable --py --sys-prefix widgetsnbextension
!pip install gmaps
!jupyter nbextension enable --py --sys-prefix gmaps

## Mapping

<div id="Mapping"></div>


This first section works with a package in Jupyter called `gmaps`;
the documentation is [here](http://jupyter-gmaps.readthedocs.io/en/latest/gmaps.html)
and it is worth a short read through if you are interested.

As a side note, a basic mapping program called Folium is included in the `datascience` package. 
It allows us to make open source maps from python data. The documentation is [here](http://folium.readthedocs.io/en/latest/index.html). 
However, in rural Kenya there are few roads, and the OpenStreetMap base layer that Folium uses has very limited coverage. Therefore we will use the satellite layer that is available from Google Maps.

In [3]:
# Using Google Maps
import gmaps
import gmaps.datasets
gmaps.configure(api_key="YOUR_API_KEY")  # Replace with your own Google Maps API key

We will start by reading in a dataset of the coordinates of the springs that are used in the WaterGuard Promotion (WGP) study.  These springs were randomized into seven different treatment arms.  The springs are identified by a unique numerical id tag, and the common name in the local language.  


In [4]:
springsGPS = Table.read_table('WGPgps_forData8.csv')
springsGPS

In [5]:
# make a table with just the North and East GPS columns
locations = springsGPS.select("gpsn1", "gpse1")
locations

In [6]:
# once the map is displayed, click the tab to display the satellite view
fig = gmaps.figure()
markers = gmaps.marker_layer(locations.to_df())
fig.add_layer(markers)
fig

In [7]:
# Let's change the color of the symbols 
fig = gmaps.figure()
symbols = gmaps.symbol_layer(locations.to_df(),fill_color="red")
fig.add_layer(symbols)
fig

Now the most interesting bit of data is still not being used, the Treatment Arm. Let's assign different colors to the different treatment arms so that when we map it we can see if the arms appear to be randomly distributed.

The following function assigns each of the 7 treatment arms a color. [Here](https://www.w3.org/TR/css3-color/#html4) is the color reference if you are interested!


In [8]:
def color(arm):
    if arm == 1:
        return 'fuchsia'
    elif arm == 2:
        return 'red'
    elif arm == 3:
        return 'purple'
    elif arm == 4:
        return 'green'
    elif arm == 5:
        return 'blue'
    elif arm == 6:
        return 'olive'
    elif arm == 7:
        return 'teal'

In [9]:
# Using the .apply method, you can apply a function to every value in a column of a Table
colors = springsGPS.apply(color, "treatment_arm")
springsGPS = springsGPS.with_column("color", colors)
springsGPS

In [10]:
fig = gmaps.figure(map_type='HYBRID')
symbols = gmaps.symbol_layer(locations.to_df(),
                             stroke_color=list(springsGPS.column("color")),
                             fill_color=list(springsGPS.column("color")))
fig.add_layer(symbols)
fig

Do the colors seem randomly distributed?

In fact, the randomization was performed on just a list of the springs using a random number generator. 
It did not take spatial distribution into account.
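
A minimal sketch of what list-based randomization can look like, assuming made-up spring ids and a shuffle-then-deal rule. The study's actual procedure is not documented in this lab, so `assign_arms` and `hypothetical_springs` are illustrative names only. Note that location never enters the assignment.

```python
import random

def assign_arms(spring_ids, n_arms=7, seed=0):
    """Shuffle the ids, then deal them out to arms 1..n_arms in turn."""
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    shuffled = list(spring_ids)
    rng.shuffle(shuffled)
    return {spring: (i % n_arms) + 1 for i, spring in enumerate(shuffled)}

# Hypothetical ids, not the real spring identifiers
hypothetical_springs = list(range(101, 171))
assignment = assign_arms(hypothetical_springs)

# Count how many springs landed in each arm
arm_sizes = {arm: list(assignment.values()).count(arm) for arm in range(1, 8)}
print(arm_sizes)
```

Dealing the shuffled list out in turn keeps the arm sizes equal, but nothing prevents neighboring springs from ending up in the same arm.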


<!-- BEGIN QUESTION -->

**Question 1:** A Thought Experiment on Spatial Randomization
- What could you do to test whether the treatment arms are randomly distributed across space?
- What could you do to randomize the treatment arms over space?

<!--
BEGIN QUESTION
name: q1
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



<div id="Balance"></div>

## Balance Check and Variable Names

### Baseline Survey 
This is our first look at the survey dataset. It contains a limited set of questions and answers from a short, simple baseline survey. However, it is a lot bigger and messier than the datasets we have seen so far in this course and in Data 8.

Variable names follow the survey linked below: each name gives the section letter (a, b, c, ...), the question number (1, 2, 3, ...), and a few words about the question.
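
As a rough illustration of this naming convention, the sketch below splits a column name into its parts. The regular expression is an assumption inferred from names that appear in this lab (such as `c2a_wg_used_ever` and `b3_birth_year`), and `parse_variable_name` is a hypothetical helper, not part of the dataset's documentation.

```python
import re

# Assumed pattern: section letter, question number, optional sub-letter, underscore, description
NAME_PATTERN = re.compile(r"^([a-z])(\d+)([a-z]?)_(.+)$")

def parse_variable_name(name):
    """Return (section, number, part, description), or None if the name does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    section, number, part, words = match.groups()
    return section, int(number), part, words

for column in ["a1_cmpd_id", "c2a_wg_used_ever", "d3_latrine", "treatment_arm"]:
    print(column, "->", parse_variable_name(column))
```

Columns that were added during data preparation, such as `treatment_arm`, do not follow the survey numbering and come back as `None`.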

The purpose of this section will be: 
* to get a familiarity with the dataset, 
* to look at some background descriptor variables of the households, 
* to start to think about missing values and coding of subsets of the data.  
* to check the randomization of households by seeing whether the different arms of the study are balanced across some of the key baseline variables.  

**The surveys that illustrate the raw data names are in a file linked [here](https://drive.google.com/open?id=1UVoiVn7LJ4rn7WEb-9BJ96jmdJ2FBk60). You have to go and look through this survey to understand the variables.**

**The code sheet that lists the codes for some of the possible answers is in a file linked [here](https://drive.google.com/file/d/1iinJXExeVKV4Dm7tRKOiotoYUDSXMyqc). You will need to look through this code sheet in a later section.**

In [11]:
WGP_baseline = Table.read_table("WGP_baseline_Data8.csv")
WGP_baseline

In [12]:
baseline = pd.read_csv("WGP_baseline_Data8.csv")
baseline

In [13]:
baseline.dropna(axis = 0)

### Missing values

If you look through the dataset above and scroll to the right to some of the last variables, you will notice that there are a lot of cells with NaN, which means a missing value. For these cells no data was entered at the time of data entry. In some cases it may be appropriate to enter a zero and carry on with the analysis.
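
Filling with zero is convenient, but it changes any summary that treats the zeros as real answers. A small sketch with made-up responses (1 = Yes, 2 = No, NaN = missing) shows the difference:

```python
import math

# Hypothetical answers to a yes/no question; NaN marks a missing value
answers = [1.0, 2.0, float("nan"), 1.0, float("nan")]

# Option 1: replace missing values with 0, as fillna(0) would
filled = [0.0 if math.isnan(a) else a for a in answers]

# Option 2: keep only the observed answers
observed = [a for a in answers if not math.isnan(a)]

mean_filled = sum(filled) / len(filled)
mean_observed = sum(observed) / len(observed)
print("Mean with zeros filled in:", mean_filled)
print("Mean of observed answers: ", mean_observed)
```

This is why later cells filter out the 0 values before summarizing a question.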



In [14]:
WGP_base_dfna = WGP_baseline.to_df().fillna(0)
WGP_table = Table.from_df(WGP_base_dfna)
WGP_table

Look at the variable names, and then look at the survey form to match each name to its question.

In [15]:
# Here is a list of all of the possible categories / columns
list(WGP_table)

### What are some Variables that we want to specifically look at? ###

There are a lot of variables here and it can be kind of overwhelming, but it is good to see how many columns there can be in a comprehensive survey dataset.  

#### Front Page information - A variables

- household id
- spring id
- interviewer id

#### Information about respondent - B variables 

- tribe
- education
- age
- gender 
- group membership

#### Water Guard Use - C variables

For Waterguard (WG) usage

- `c1a` - Whether the respondent has ever heard of WG
- `c2a` - Whether the respondent has ever used WG
- `c3a` - Whether the respondent's water is currently treated with WG
- `c4a` - Whether the respondent has used WG in the past month

#### Durable / Capital Goods - D variables

- Whether the respondent has electricity / latrine / iron roof
- Number of bicycles / radios / hoes / beds owned
- Number of animals owned

#### Child Health - E variables

- `e1_num_kids_under_5`: Number of kids under 5
- `e2_`:  This table becomes tricky because it has a different format. Each kid in the table is numbered 01, 02 and so on, and then the subsequent questions are keyed to that child number. e.g. `e2e_01_d_diarrhea`, `e2e_02_d_diarrhea` represent whether child 1 and 2 respectively have diarrhea. In total, four diseases are recorded:
    - Cough
    - Diarrhea
    - Malaria
    - Vomiting
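
To make the child-numbering scheme concrete, here is a small sketch that builds the diarrhea column names for the first few children. Only the names for children 01 and 02 are confirmed above; extending the same pattern to higher child numbers is an assumption, and `diarrhea_columns` is a hypothetical helper.

```python
def diarrhea_columns(n_children):
    """Column names for the diarrhea question, one per numbered child."""
    return [f"e2e_{child:02d}_d_diarrhea" for child in range(1, n_children + 1)]

print(diarrhea_columns(2))
```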



 

### The Treatment Arm 

In the study, arm 1 is control, while Arms 2-7 are different types of treatment interventions:
 
- Arm 1 - Control
- Arm 2 - Household Script
- Arm 3 - Community Script
- Arm 4 - HH + Community Script
- Arm 5 - Flat-Fee Promoter + Coupons
- Arm 6 - Incentivized Promoter + Coupons
- Arm 7 - Incentivized Promoter + Dispenser at Spring

*How many households are in each Treatment Arm?*


In [16]:
WGP_table.group("treatment_arm")

### Baseline Check - Exposure to Water Guard Use 

Let's see how many households have ever used Water Guard.

The data is currently coded as 1 = Yes and 2 = No, so the mean of the variable is not directly interpretable in its current form. Instead, we will make a new column/variable with the 1 or 2 answers translated into Yes or No.
Notably, we must first filter out respondents with missing values (recorded as 0 after the fill above) for this question.
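
The idea can be sketched in plain Python with a handful of made-up codes. With 1/2 coding, the raw mean works out to 1 plus the share of "No" answers, which is why translating the codes is easier to read.

```python
# Hypothetical coded answers: 1 = Yes, 2 = No, 0 = missing
codes = [1, 2, 0, 1, 2, 2]
labels = {1: "Yes", 2: "No"}

# Drop missing values first, then translate the remaining codes
answered = [c for c in codes if c in labels]
translated = [labels[c] for c in answered]

share_yes = translated.count("Yes") / len(translated)
raw_mean = sum(answered) / len(answered)
print("Share answering Yes:", share_yes)
print("Mean of the raw codes:", raw_mean)
```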



In [17]:
WGP_ever = WGP_table.where('c2a_wg_used_ever', are.above(0))
WGP_ever.group("c2a_wg_used_ever")

In [18]:
# This helper function goes through the chosen column and translates each value
# into 'Yes' or 'No'. It returns a list of these strings, one per non-missing row.
def translate_to_yesno(table, col):
    dummy=[]
    table=table.where(col, are.above(0))  # drop missing values (coded as 0)
    for i in np.arange(table.num_rows):
        if table.column(col).item(i) == 1:
            dummy.append('Yes')
        else: # if the value is not 1 then it is 2, and 2 means No
            dummy.append("No")
    return dummy

In [19]:
new = translate_to_yesno(WGP_ever, 'c2a_wg_used_ever')
WGP_ever = WGP_ever.with_column('c2a_wg_used_ever',new)
WGP_ever.group('c2a_wg_used_ever')

### Pivoting and Balance Checks

Now we will use a command called **Pivot** to create a new table that has the percent of households who have ever used Water Guard within each Treatment Arm. 

We can first use it to do a  **balance check** for Water Guard use across Arms.
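
Conceptually, a pivot of this kind counts households for every (arm, answer) pair, and a balance check compares the resulting shares across arms. A minimal sketch with made-up rows, using only the standard library rather than the `datascience` API:

```python
from collections import Counter

# Hypothetical (treatment_arm, answer) pairs, one per household
rows = [(1, "Yes"), (1, "No"), (1, "No"),
        (2, "Yes"), (2, "Yes"), (2, "No"), (2, "No")]

# Count households in each (arm, answer) cell, like the cells of a pivot table
counts = Counter(rows)
arms = sorted({arm for arm, _ in rows})

# Percent answering Yes within each arm
percent_yes = {
    arm: 100 * counts[(arm, "Yes")] / (counts[(arm, "Yes")] + counts[(arm, "No")])
    for arm in arms
}
print(percent_yes)
```

If randomization worked, these baseline percentages should be similar across arms, up to chance variation.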

In [20]:
ever_yesno = WGP_ever.pivot('c2a_wg_used_ever','treatment_arm')
ever_yesno

Converting to percentages...

In [21]:
total = ever_yesno.column(1) + ever_yesno.column(2)
ever_yesno = ever_yesno.with_columns('Percent No',ever_yesno.column(1) / total * 100, 
                                     'Percent Yes', ever_yesno.column(2) / total * 100)
ever_yesno

Let's also repeat the process for the variable of whether the households are currently using Water Guard, `c3a_wg_water_currently_treat`.

In [22]:
WGP_current = WGP_table.where('c3a_wg_water_currently_treat',are.not_equal_to(0))
new2 = translate_to_yesno(WGP_current,'c3a_wg_water_currently_treat')
WGP_current = WGP_current.with_column('c3a_wg_water_currently_treat',new2)
WGP_current.group("c3a_wg_water_currently_treat")

Do you notice a problem here? Look at the total numbers reported in the output above.

We can build the same percentage table for the balance check, but first compare the total number of households answering this question with the total number from the previous section.

In [23]:
current_yesno = WGP_current.pivot('c3a_wg_water_currently_treat','treatment_arm')
total = current_yesno.column(1) + current_yesno.column(2)
current_yesno = current_yesno.with_columns('Percent No',current_yesno.column(1)/total * 100, 
                                           'Percent Yes', current_yesno.column(2)/total * 100)
current_yesno

This seems like a really high usage rate, but **maybe this is due to missing values**. 

Let's now also include the 0 (missing) values in our analysis.


In [24]:
current_yesnomissing = WGP_table.pivot('c3a_wg_water_currently_treat','treatment_arm')
total = current_yesnomissing.column(1) + current_yesnomissing.column(2) + current_yesnomissing.column(3)
current_yesnomissing = current_yesnomissing.with_columns(
                                     'Percent Missing',current_yesnomissing.column("0.0") / total * 100, 
                                     'Percent No',current_yesnomissing.column("2.0") / total * 100, 
                                     'Percent Yes', current_yesnomissing.column("1.0") / total * 100)
current_yesnomissing

<!-- BEGIN QUESTION -->

**Question 2**

- Explain the previous table clearly and concisely, as if you were explaining it to someone who didn't know the back story.
- What does each row/column mean?
- Conduct a balance check: does the distribution of yes/no/missing look balanced across arms?

<!--
BEGIN QUESTION
name: q2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



<div id="Baseline"></div>

## Baseline and a Randomly Selected Compound




Let's describe a household selected at random.

First, we will extract the household/compound id into an array.

In [25]:
hhld_array = WGP_table.column('a1_cmpd_id')
hhld_array

Next, we will draw randomly from this array.

In [26]:
randomhh = np.random.choice(hhld_array)
print("My randomly selected household is household number", randomhh)

Then, let's look at the data for our randomly selected household:

In [27]:
myfamily = WGP_table.where("a1_cmpd_id", randomhh)  # reuse the household drawn above
myfamily

Some of the variables may need some manipulation. 
Let's start with the age of the respondent:

In [28]:
birthyear = myfamily.column("b3_birth_year").item(0)
surveyyear = myfamily.column("a5_date_interview_year").item(0)
agecalc = surveyyear - birthyear  # approximate age in years at the time of the interview
agecalc

And their tribe:

In [29]:
print("Survey respondent Tribe", myfamily.column("b5_tribe").item(0))
print("Respondent Spouse Tribe", myfamily.column("b7_tribe_spouse").item(0))

Lastly, whether they have a latrine:

In [30]:
print("Does the household have a latrine?", myfamily.column("d3_latrine").item(0))

Remember in the answer above it is coded so that 1=Yes and 2=No.

<!-- BEGIN QUESTION -->

**Question 3:** Describe your randomly selected household and the respondent who is answering the survey.

1. Age
2. Tribe
3. Education 
4. Member of any groups b11-b15?
5. Occupation
6. Religion 
7. A summary of D variables, iron roof, floor materials, latrine, cattle, and others
8. Have they ever used WG?
9. Their treatment arm assignment
10. How many children do they have  
11. Gender and Age of children
12. Have any of the children been sick?

<!--
BEGIN QUESTION
name: q3
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



<div id="Chlorine"></div>

## Water Guard Usage outcome variables




### WGP Followup - Variability
The purpose of this section is to continue with the follow-up rounds of the Water Guard Promotion study. In this section we have both the use reported by the household and the use validated by checking the chlorine content of the water with a test kit.

In [31]:
WGP3rds_table = Table.read_table('WGP_3waves_Data8.csv')
WGP3rds_table

This is a large dataset, basically three datasets merged together, one for baseline, one for short term follow up and one for long term followup. The column `round` describes these 3 time steps:

- Round = 1 : baseline
- Round = 2 : 3 week followup
- Round = 3 : 3 month followup

Notably, many of the variables are only asked in one of the three rounds. For example, the chlorine use variables are:

- The variable for self-reported chlorine use is `c6n` in Round 2 and `c5n` in Round 3.
- The variable for validated chlorine use is `c12n21pnk` in Round 2 and `c15npt2or1pnk` in Round 3.

Instead, the following variables have been combined across rounds for the ease of programming:

- `Selfrptpct` is self reported chlorine use in both round 2 and round 3
- `Vldclpct`  is validated chlorine use in both rounds

In [32]:
WGP3rds_table.group("treatment_arm")

In [33]:
WGP3rds_table.group('round')

### Grouping by round + treatment arm

We want to create a multi-level group: each group should be a unique combination of the survey round and the treatment arm.  
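
A multi-level group can be pictured as a dictionary keyed by (round, arm) pairs. The sketch below uses made-up rows and the standard library only; the values are not from the study.

```python
from collections import defaultdict

# Hypothetical rows: (round, treatment_arm, used_chlorine)
rows = [
    (2, 1, 0), (2, 1, 1), (2, 5, 1), (2, 5, 1),
    (3, 1, 0), (3, 1, 0), (3, 5, 1), (3, 5, 0),
]

# Collect the outcome values for each unique (round, arm) combination
groups = defaultdict(list)
for survey_round, arm, used in rows:
    groups[(survey_round, arm)].append(used)

# Average the outcome within each group
group_means = {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}
for key, value in group_means.items():
    print(key, value)
```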


In [34]:
WGP_3rds_outcomesonly= WGP3rds_table.select("round", "treatment_arm", "Selfrptpct", "Vldclpct")
WGP_3rds_outcomesonly.group(["round","treatment_arm"], np.mean).show(30)

### Making a smaller dataset

Let's break out a smaller dataset of the variables we want to focus on: just Round 2 and the outcome variables.

In [35]:
WGPRd2 = WGP3rds_table.where("round", 2).select("a1_cmpd_id","treatment_arm",
                                           "c6_current_water_treated_wg", 
                                           'c6_curr_water_treat_other_c',
                                           'c12_chlorine_meter_reading',
                                           'c11_chlorine_color','c12n21pnk', 'c6n'
                                          )
WGPRd2

A quick examination of the estimated Water Guard usage in Round 2 across all treatment arms:

In [36]:
np.mean(WGPRd2.column('c12n21pnk'))

### A/B Testing

To see whether the treatment effect is statistically significant, we can use A/B testing. Recall from Data 8 that an A/B test checks whether two numerical samples come from the same underlying distribution, simulating the null hypothesis by shuffling the treatment assignment labels.

Let's conduct an A/B test to compare chlorine use (using the measure validated by a chlorine measurement) between treatment and control. For this exercise, we will compare arm 1 (the control) with arm 5 (flat-fee promoter + coupons).

First, let's calculate the observed difference between the two groups.


In [37]:
relevant_households = WGPRd2.where("treatment_arm", are.contained_in(make_array(1, 5)))

grouped_tbl = relevant_households.group("treatment_arm", np.mean)
obs_diff = grouped_tbl.column("c12n21pnk mean").item(1) - grouped_tbl.column("c12n21pnk mean").item(0)
print("The observed difference is:", obs_diff )

Now we will simulate under the null, shuffling the treatment assignment labels.
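
The shuffling step, together with an empirical p-value, can be sketched in plain Python. The data below are made up (10% use in control, 60% in treatment), `permutation_test` is a hypothetical helper, and the test is one-sided: it counts shuffles whose difference is at least as large as the observed one.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(control, treated, reps=2000, seed=0):
    """One-sided permutation test for mean(treated) - mean(control)."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)
    at_least_as_large = 0
    for _ in range(reps):
        # Shuffling the pooled outcomes is equivalent to shuffling the labels
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps

# Hypothetical 0/1 chlorine-use outcomes, not the study data
hypothetical_control = [1] * 2 + [0] * 18
hypothetical_treated = [1] * 12 + [0] * 8
obs, p_value = permutation_test(hypothetical_control, hypothetical_treated)
print("Observed difference:", obs)
print("Empirical p-value:", p_value)
```

A small p-value means that a difference as large as the observed one rarely arises when the labels are assigned at random.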

In [38]:
mean_diffs = make_array()
for i in np.arange(1000):
    shuffled_labels = relevant_households.sample(with_replacement = False).column("treatment_arm")
    shuffled_tbl = relevant_households.with_column("shuffled treatment", shuffled_labels)
    grouped_tbl = shuffled_tbl.group("shuffled treatment", np.mean)
    mean_diff = grouped_tbl.column("c12n21pnk mean").item(1) - grouped_tbl.column("c12n21pnk mean").item(0)
    mean_diffs = np.append(mean_diffs, mean_diff)
Table().with_columns("Difference in means", mean_diffs).hist()
plt.scatter(obs_diff, 0, c = 'red')

### Computing Confidence Intervals

Another way to compare between two groups is to examine and compare confidence intervals of each group's mean.
Let's compute a confidence interval for the percent of households using Chlorine (the measures validated by a Chlorine measurement) via resampling.


In [39]:
chlorine_uses = make_array()
for i in np.arange(1000):
    bootstrapped_sample = WGPRd2.sample()
    sample_mean = np.mean(bootstrapped_sample.column('c12n21pnk'))
    chlorine_uses = np.append(chlorine_uses, sample_mean)
Table().with_columns("Chlorine use mean", chlorine_uses).hist()

In [40]:
lower = percentile(2.5, chlorine_uses)
upper = percentile(97.5, chlorine_uses)
print(f"The 95% confidence interval is [{lower}, {upper}]")

Using this technique we can look at the Confidence Intervals for each of the treatment arms.
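
Because the same loop is repeated for every arm, it can help to think of it as one reusable function. Below is a minimal standard-library sketch with made-up data; `bootstrap_mean_ci` is a hypothetical helper, and its index-based percentile rule is a simplification that may differ slightly from `percentile` in `datascience`.

```python
import random

def bootstrap_mean_ci(values, reps=1000, level=95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample n values with replacement, reps times, recording each mean
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(reps))
    tail = (100 - level) / 200          # 0.025 for a 95% interval
    lower = means[round(tail * (reps - 1))]
    upper = means[round((1 - tail) * (reps - 1))]
    return lower, upper

# Hypothetical arm in which 30 of 100 households test positive for chlorine
hypothetical_arm = [1] * 30 + [0] * 70
lower, upper = bootstrap_mean_ci(hypothetical_arm)
print(f"Approximate 95% interval: [{lower:.3f}, {upper:.3f}]")
```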

In [41]:
# Arm 1
arm_1 = WGPRd2.where("treatment_arm", 1)
chlorine_uses_arm1 = make_array()
for i in np.arange(1000):
    bootstrapped_sample = arm_1.sample()
    sample_mean = np.mean(bootstrapped_sample.column('c12n21pnk'))
    chlorine_uses_arm1 = np.append(chlorine_uses_arm1, sample_mean)
Table().with_columns("Chlorine use mean (arm 1)", chlorine_uses_arm1).hist()

In [42]:
lower = percentile(2.5, chlorine_uses_arm1)
upper = percentile(97.5, chlorine_uses_arm1)
print(f"The 95% confidence interval for arm 1 is [{lower}, {upper}]")

Repeating for Arm 2...


In [43]:
arm_2 = WGPRd2.where("treatment_arm", 2)
chlorine_uses_arm2 = make_array()
for i in np.arange(1000):
    bootstrapped_sample = arm_2.sample()
    sample_mean = np.mean(bootstrapped_sample.column('c12n21pnk'))
    chlorine_uses_arm2 = np.append(chlorine_uses_arm2, sample_mean)
Table().with_columns("Chlorine use mean (arm 2)", chlorine_uses_arm2).hist()

In [44]:
lower = percentile(2.5, chlorine_uses_arm2)
upper = percentile(97.5, chlorine_uses_arm2)
print(f"The 95% confidence interval for arm 2 is [{lower}, {upper}]")

<!-- BEGIN QUESTION -->

**Question 4.1:** What can we tell by comparing the confidence intervals for Arm 1 and Arm 2? Do they overlap?  What does that mean?

<!--
BEGIN QUESTION
name: q4_1
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

In [45]:
arm = ...
chlorine_uses_arm = ...
for i in np.arange(1000):
    bootstrapped_sample = ...
    sample_mean = ...
    chlorine_uses_arm = ...
Table().with_columns("Chlorine use mean", chlorine_uses_arm).hist()

lower = percentile(2.5, chlorine_uses_arm)
upper = percentile(97.5, chlorine_uses_arm)
print(f"The 95% confidence interval is [{lower}, {upper}]")

<!-- BEGIN QUESTION -->

**Question 4.2:** In the cell above, test the data against another treatment arm. Construct the confidence interval via resampling and see if it is different than the control arm. Discuss your results in the cell below.

<!--
BEGIN QUESTION
name: q4_2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



<div id="Graph"></div>

## Graph of outcomes by Treatment Arm




### WGP Followup Round 2, Round 3, and across rounds

Now we will work on making a summary graph for the WGP study. This graph should show the seven treatment arms with the levels of our outcomes, and include error bars (the standard deviations of our sample means). In addition, we can work on customizing and saving our graph.  
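
The error bars mentioned here are standard errors: the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up 0/1 outcomes for a single arm (`standard_error` is a hypothetical helper):

```python
import math
import statistics

def standard_error(values):
    """Standard deviation of the sample mean: sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical outcomes for one arm, not the study data
hypothetical_arm = [1, 1, 0, 0]
arm_mean = statistics.mean(hypothetical_arm)
arm_se = standard_error(hypothetical_arm)
print(f"mean = {arm_mean}, standard error = {arm_se:.4f}")
```

With `matplotlib`, one value like this per arm could be passed to the `yerr` argument of `ax.bar` to draw the error bars.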

In [46]:
WGPRd2 = WGP3rds_table.where("round", 2).select("a1_cmpd_id",'treatment_arm','Selfrptpct', 'Vldclpct')
WGPRd2

In [47]:
# Group by treatment_arm and take the means of each group
round2_means = WGPRd2.group('treatment_arm', np.mean)
round2_means

In [48]:
# Save the means into an array for later use
round2_means_self_array = round2_means.column('Selfrptpct mean')
round2_means_vld_array = round2_means.column('Vldclpct mean')

Let's start with a bar chart of the self reported Water Guard usage across treatment arms.

In [49]:
round2_means.bar('treatment_arm','Selfrptpct mean') 

Comparing the self reported values against the validated values:

In [50]:
round2_means.bar('treatment_arm',make_array(2, 3)) 

Next, let's redo the same procedure for round 3.

In [51]:
WGPRd3 = WGP3rds_table.where("round",3).select("a1_cmpd_id",'treatment_arm','Selfrptpct', 'Vldclpct')
round3_means = WGPRd3.group('treatment_arm', np.mean)
round3_means_array = round3_means.column('Selfrptpct mean')
round3_means.bar('treatment_arm' ,make_array(2, 3)) 

## Optional - Practice with `matplotlib` and `pyplot`
Now let's try to make a graph that compares round 2 and round 3. 
This is a more complicated procedure, and requires us to use the `matplotlib` plotting library. 
Our `datascience` package uses `matplotlib` under the hood, but using `matplotlib` directly is more challenging.

Here are some references that you may find useful:
- https://matplotlib.org/gallery/api/barchart.html
- https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html

In [52]:
plt.style.use('seaborn')  # You can try changing the style; on matplotlib 3.6+ this style is named 'seaborn-v0_8'

N = 7
ind = np.arange(N)  # the x locations for the groups
width = 0.3       # the width of the bars

fig, ax = plt.subplots()
rects1 = ax.bar(ind, round2_means_self_array, width, color='g')
rects2 = ax.bar(ind + width, round3_means_array, width, color='b')

# add some text for labels, title and axes ticks
ax.set_ylabel('Percent of households using Water Guard')
ax.set_title('Self reported Water Guard use')
ax.set_xlabel('Treatment Arm')
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(('1', '2', '3', '4', '5', '6','7'))
ax.legend((rects1[0], rects2[0])
          ,('3 Week Visit', '3 Month Visit')  # relabeling Round 2 and Round 3
          ,bbox_to_anchor=(0.5, 1.0))  # placing the legend in the graph 
plt.show()

# If you want to save the figure into an image file, use the figure object
# (calling plt.savefig after plt.show() can write a blank image)
#fig.savefig("test.png")

<!-- BEGIN QUESTION -->

**Question 5:** Make a version of this graph for Validated Presence of WG

<!--
BEGIN QUESTION
name: q5
manual: true
-->

In [53]:
round3_means_vld_array = round3_means.column('Vldclpct mean')

N = 7
ind = np.arange(N)  # the x locations for the groups
width = 0.3       # the width of the bars

fig, ax = plt.subplots()
rects1 = ...
rects2 = ...

# add some text for labels, title and axes ticks
ax.set_ylabel('Percent of households using Water Guard')
ax.set_title('Validated Water Guard use') 
ax.set_xlabel('Treatment Arm')
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(('1', '2', '3', '4', '5', '6','7'))
ax.legend((rects1[0], rects2[0])
          ,('3 Week Visit', '3 Month Visit')  # relabeling Round 2 and Round 3
          ,bbox_to_anchor=(0.5, 1.0))  # placing the legend in the graph 
plt.show()

# If you want to save the figure into an image file, use the figure object
# (calling plt.savefig after plt.show() can write a blank image)
#fig.savefig("test.png")

<!-- END QUESTION -->



Congrats! You've finished Lab 8!

---

## Submission

To submit this assignment, run the cell below and download the linked PDF by right-clicking on the link and selecting "Save Link As". Upload the downloaded PDF to Gradescope.

In [55]:
from otter.export import export_notebook
from IPython.display import display, HTML

export_notebook("lab08.ipynb", filtering=True, pagebreaks=True)
display(HTML("Download your PDF <a href='lab08.pdf' download>here</a>."))