<table style="width: 100%;">
    <tr style="background-color: transparent;"><td>
        <img src="https://data-88e.github.io/assets/images/blue_text.png" width="250px" style="margin-left: 0;" />
    </td><td>
        <p style="text-align: right; font-size: 10pt;"><strong>Economic Models</strong>, Fall 2024
        <br>
            Dr. Eric Van Dusen</p></td></tr>
</table>

# Lec9: Water Guard Randomized Controlled Trial

This notebook is adapted from a set of notebooks developed for a full-semester Data Science Connector Course taught in Fall 2017, entitled "Behind the Curtain in Economic Development". The dataset comes from a randomized controlled trial household survey carried out in Eastern Kenya in 2007-2008.

The purpose of the study was to understand how to promote the use of WaterGuard, a dilute sodium hypochlorite solution promoted for point-of-use household water disinfection. There were seven arms in the study, which are described more fully in the following chart:


<img src="Slide1.png"  />

In this chart you can see the seven treatment arms - control plus three treatments - in the bolded boxes in the middle, along with the number of springs and households. The study was carried out as part of a study of households who gather drinking water from springs in a rural area. The three boxes at the bottom describe the three rounds of data collection: a baseline before the treatment, a short-term follow-up, and a long-term follow-up.

<!-- **Notebook Outline**

1. [Mapping](#Mapping)
2. [Balance Check](#Balance)
3. [Baseline and a Randomly Selected Compound](#Baseline)
4. [Chlorine Usage outcome variables](#Chlorine)
5. [Graph of outcomes by Treatment Arm](#Graph)  -->

In [1]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from pandas import read_stata
from ipyleaflet import Map, basemaps, Marker, AwesomeIcon



## Mapping

<div id="Mapping"></div>


This first section works with a Jupyter mapping package called `ipyleaflet`. The documentation is [here](https://ipyleaflet.readthedocs.io/en/latest/), and it is worth a short read-through if you are interested.


We want to use two different base maps: one is a satellite layer and one is the OpenStreetMap layer.

We will start by reading in a dataset of the coordinates of the springs that are used in the WaterGuard Promotion (WGP) study.  These springs were randomized into seven different treatment arms.  The springs are identified by a unique numerical id tag, and the common name in the local language.  


In [2]:
springsGPS = Table.read_table('WGPgps_forData8.csv')
springsGPS

a2_spring_id,a3_spring_name,treatment_arm,gpsn1,gpse1,gps1all
1010,JIKAZE SOWETO,6,0.39985,34.49,".39985,34.49003"
1013,ROHO SAFI,4,0.398933,34.4898,".3989333,34.48985"
1014,OTWOMA,6,0.339083,34.4355,".3390833,34.4355"
1015,MUKOYA,5,0.358317,34.4412,".3583167,34.44122"
1021,OKELLO,6,0.355983,34.4314,".3559833,34.43142"
5001,NAKHALIRO A,6,0.439167,34.3992,".4391667,34.39919"
5002,NAKHALIRO 'B',4,0.439033,34.4019,".4390333,34.40192"
5004,OSIMBO,1,0.411583,34.3644,".4115833,34.36442"
5007,TANDE,7,0.433667,34.4325,".4336667,34.43255"
5008,MUKABANA,6,0.408483,34.4591,".4084833,34.45912"


In [3]:
# make a table with just the north and east GPS columns
locations = springsGPS.select("gpsn1", "gpse1")
locations

gpsn1,gpse1
0.39985,34.49
0.398933,34.4898
0.339083,34.4355
0.358317,34.4412
0.355983,34.4314
0.439167,34.3992
0.439033,34.4019
0.411583,34.3644
0.433667,34.4325
0.408483,34.4591


Where in the world are we?

First of all, let's look at the mean latitude and longitude so we can center our map there.


In [4]:

mean_longitude = springsGPS.column('gpse1').mean()
mean_latitude = springsGPS.column('gpsn1').mean()

print("Mean of 'gpse1':", mean_longitude)
print("Mean of 'gpsn1':", mean_latitude)


Mean of 'gpse1': 34.4179509091
Mean of 'gpsn1': 0.402962309091
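
As a quick check on where to center the map, the sketch below recomputes the mean latitude and longitude in plain Python and also reports how far the coordinates spread. It uses only the ten rows displayed above, so its means differ slightly from the full-table means printed by the cell. The names `lats`, `lons`, `mean`, and `spring_center` are illustrative and not part of the original notebook.

```python
# Coordinates of the ten springs shown in the table preview above
# (assumption: a small illustrative subset, not the full dataset).
lats = [0.39985, 0.398933, 0.339083, 0.358317, 0.355983,
        0.439167, 0.439033, 0.411583, 0.433667, 0.408483]
lons = [34.49, 34.4898, 34.4355, 34.4412, 34.4314,
        34.3992, 34.4019, 34.3644, 34.4325, 34.4591]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Map centers are given in [latitude, longitude] order;
# round to one decimal place to get a tidy center point.
spring_center = [round(mean(lats), 1), round(mean(lons), 1)]

# The spread of the points gives a feel for how far to zoom in.
lat_span = max(lats) - min(lats)
lon_span = max(lons) - min(lons)

print("center:", spring_center)
print("latitude span (degrees):", round(lat_span, 3))
print("longitude span (degrees):", round(lon_span, 3))
```

For this subset the rounded center is `[0.4, 34.4]`, and both spans are on the order of a tenth of a degree, so the springs sit in a fairly compact area.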


The code cell below should display a map. However, it may not render the first time you run it. If this happens, try running all the cells above this one and then refreshing your browser. After a few refreshes, the maps should load.




In [5]:

center = [0.4, 34.4]
zoom = 12
basemap=basemaps.Esri.WorldImagery
layout={'width': '800px', 'height': '600px'}

Map(basemap=basemap, center=center, zoom=zoom, layout=layout)

Map(center=[0.4, 34.4], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_…

Let's make a map of our sample sites (springs).

In [7]:
m = Map(basemap=basemap, center=center, zoom=zoom, layout=layout)

# Iterate through the rows in the dataset
for row in springsGPS.rows:
    latitude = row.item('gpsn1')
    longitude = row.item('gpse1')
    marker = Marker(location=(latitude, longitude))
    m.add_layer(marker)

m


Map(center=[0.4, 34.4], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_…

Now the most interesting bit of data is still not being used, the Treatment Arm. Let's assign different colors to the different treatment arms so that when we map it we can see if the arms appear to be randomly distributed.

The following function assigns each of the 7 treatment arms a color. [Here](https://www.w3.org/TR/css3-color/#html4) is the color reference if you are interested!


In [8]:
def color(arm):
    if arm == 1:
        return 'black'
    elif arm == 2:
        return 'red'
    elif arm == 3:
        return 'purple'
    elif arm == 4:
        return 'green'
    elif arm == 5:
        return 'blue'
    elif arm == 6:
        return 'pink'
    elif arm == 7:
        return 'orange'
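
The same mapping can also be written as a dictionary lookup, which keeps the arm-to-color pairs in one place and makes the fallback explicit. This is an alternative sketch, not the notebook's own function: the names `ARM_COLORS` and `arm_to_color`, and the `'gray'` default for unrecognized arms, are assumptions added for illustration (the `color` function above returns `None` in that case).

```python
# Arm-to-color pairs copied from the if/elif chain above.
ARM_COLORS = {
    1: 'black',
    2: 'red',
    3: 'purple',
    4: 'green',
    5: 'blue',
    6: 'pink',
    7: 'orange',
}


def arm_to_color(arm, default='gray'):
    """Return the color for a treatment arm.

    `default` is a hypothetical fallback for values outside 1-7.
    """
    return ARM_COLORS.get(arm, default)


print(arm_to_color(6))   # an arm that appears in the table above
print(arm_to_color(99))  # an unknown arm falls back to the default
```

A function like this can be passed to `.apply` in the same way as `color`.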

In [9]:
# Using the .apply method, you can apply a function to every value in a Table column
colors = springsGPS.apply(color, "treatment_arm")
springsGPS = springsGPS.with_column("color", colors)
springsGPS

a2_spring_id,a3_spring_name,treatment_arm,gpsn1,gpse1,gps1all,color
1010,JIKAZE SOWETO,6,0.39985,34.49,".39985,34.49003",pink
1013,ROHO SAFI,4,0.398933,34.4898,".3989333,34.48985",green
1014,OTWOMA,6,0.339083,34.4355,".3390833,34.4355",pink
1015,MUKOYA,5,0.358317,34.4412,".3583167,34.44122",blue
1021,OKELLO,6,0.355983,34.4314,".3559833,34.43142",pink
5001,NAKHALIRO A,6,0.439167,34.3992,".4391667,34.39919",pink
5002,NAKHALIRO 'B',4,0.439033,34.4019,".4390333,34.40192",green
5004,OSIMBO,1,0.411583,34.3644,".4115833,34.36442",black
5007,TANDE,7,0.433667,34.4325,".4336667,34.43255",orange
5008,MUKABANA,6,0.408483,34.4591,".4084833,34.45912",pink


In [10]:

# No basemap is passed here, so ipyleaflet uses its default OpenStreetMap layer
m = Map(center=center, zoom=zoom, layout=layout)

for row in springsGPS.rows:
    latitude = row.item('gpsn1')
    longitude = row.item('gpse1')
    arm_color = row.item('color')  # named arm_color so the color() function is not overwritten
    
    marker = Marker(
        location=(latitude, longitude),
        draggable=False,  # Set to True if you want to make the markers draggable
        title=arm_color,  # Set the marker title to the color for tooltip
        alt=arm_color     # Set the alt text to the color
    )
    
    # Apply the specified color to the marker
    marker.icon = AwesomeIcon(name='circle', marker_color=arm_color)
    
    m.add_layer(marker)

m


Map(center=[0.4, 34.4], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_…

In [11]:

# Same markers as above, this time on the satellite basemap
m = Map(basemap=basemap, center=center, zoom=zoom, layout=layout)

for row in springsGPS.rows:
    latitude = row.item('gpsn1')
    longitude = row.item('gpse1')
    arm_color = row.item('color')  # named arm_color so the color() function is not overwritten
    
    marker = Marker(
        location=(latitude, longitude),
        draggable=False,  # Set to True if you want to make the markers draggable
        title=arm_color,  # Set the marker title to the color for tooltip
        alt=arm_color     # Set the alt text to the color
    )
    
    marker.icon = AwesomeIcon(name='circle', marker_color=arm_color)
    
    m.add_layer(marker)

m

Map(center=[0.4, 34.4], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_…

Do the colors seem randomly distributed?

In fact, the randomization was performed on just a list of the springs using a random number generator.
It did not take spatial distribution into account.
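
One rough way to look at this numerically is to compare the number of springs and the average location in each arm. If assignment ignored geography, no arm should be systematically clustered in one corner of the study area. The sketch below does this in plain Python for the ten springs displayed earlier, which is far too few rows to conclude anything and is meant only to show the bookkeeping. The names `springs` and `summarize_by_arm` are hypothetical.

```python
from collections import defaultdict

# (treatment_arm, latitude, longitude) for the ten springs shown above
# (assumption: an illustrative subset, not the full dataset).
springs = [
    (6, 0.39985, 34.49), (4, 0.398933, 34.4898), (6, 0.339083, 34.4355),
    (5, 0.358317, 34.4412), (6, 0.355983, 34.4314), (6, 0.439167, 34.3992),
    (4, 0.439033, 34.4019), (1, 0.411583, 34.3644), (7, 0.433667, 34.4325),
    (6, 0.408483, 34.4591),
]


def summarize_by_arm(rows):
    """Return {arm: (count, mean latitude, mean longitude)}."""
    grouped = defaultdict(list)
    for arm, lat, lon in rows:
        grouped[arm].append((lat, lon))

    summary = {}
    for arm, points in grouped.items():
        n = len(points)
        mean_lat = sum(p[0] for p in points) / n
        mean_lon = sum(p[1] for p in points) / n
        summary[arm] = (n, mean_lat, mean_lon)
    return summary


summary = summarize_by_arm(springs)
for arm in sorted(summary):
    n, mean_lat, mean_lon = summary[arm]
    print(f"arm {arm}: n={n}, mean lat={mean_lat:.4f}, mean lon={mean_lon:.4f}")
```

On the full table, a similar per-arm average could probably be obtained with the `datascience` `group` method, for example by grouping the coordinate columns on `treatment_arm` with `np.mean`.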
