# Import the libraries and data

In [2]:
import pandas as pd
import SaniPath as sp

# Example data, one CSV per form type
hh = pd.read_csv('rsrc/HOUSEHOLD_EXAMPLE.csv')
cc = pd.read_csv('rsrc/COMMUNITY_EXAMPLE.csv')
sc = pd.read_csv('rsrc/SCHOOL_EXAMPLE.csv')
col = pd.read_csv('rsrc/SAMPLE_EXAMPLE.csv')
lab = pd.read_csv('rsrc/LAB_EXAMPLE.csv')

# Make sure all R libraries are installed

Conda should handle most of the dependencies, but this step confirms that all of the required R packages are installed.
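
`RSetup` performs this check internally, and its implementation isn't shown here. As a rough illustration only, the sketch below shows one way such a check could work: parse a requirements file and report which listed packages are not installed. It assumes a one-package-per-line format with `#` comments, which may not match the real `r-requirements.txt`. The package names are placeholders, and the helper functions are hypothetical. In R itself, the installed list could come from `rownames(installed.packages())`.

```python
# Hypothetical sketch: which packages listed in a requirements file are missing?
# Assumes one package name per line; blank lines and '#' comments are ignored.


def parse_requirements(text):
    """Return package names from requirements text, skipping blanks and comments."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if line:
            names.append(line)
    return names


def missing_packages(required, installed):
    """Return required packages that are not installed, in file order."""
    installed = set(installed)
    return [pkg for pkg in required if pkg not in installed]


requirements = """
# example contents (hypothetical)
jsonlite
purrr
ggplot2
"""
required = parse_requirements(requirements)
missing = missing_packages(required, installed=["jsonlite", "purrr"])
print(missing)  # ['ggplot2']
```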

In [3]:
sp.RSetup('./r-requirements.txt')  # path to the r-requirements.txt file

Everything is installed!

# Set up the analysis class

Set up the analysis class instance with the parameters it needs. Not every argument is needed for every task, but all of them are required to run the full analysis and generate a report. Any argument you omit is filled in with a default value.
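
To make the "defaults plus overrides" idea concrete, here is a minimal sketch in plain Python. The `DEFAULTS` values and the `build_settings()` helper are hypothetical, not SaniPath's actual defaults or API. The sketch also checks that `pathway_codes` and `pathway_labels` use the same keys, since the `sp.Analysis()` call below pairs each pathway code with a label.

```python
# Hypothetical sketch of "fill in defaults, then validate" for analysis settings.
# DEFAULTS holds placeholder values, not SaniPath's real defaults.
DEFAULTS = {
    "analysis_type": "combined",
    "plot_dir": "./plots/",
    "language": "English",
    "freq_thresh": 50,
}


def build_settings(**overrides):
    """Merge user-supplied settings over the defaults and sanity-check them."""
    settings = {**DEFAULTS, **overrides}  # later keys win, so overrides take priority
    codes = settings.get("pathway_codes", {})
    labels = settings.get("pathway_labels", {})
    # In this sketch, every pathway code must have a label and vice versa.
    if set(codes) != set(labels):
        raise ValueError(
            "pathway_codes and pathway_labels keys differ: "
            f"{sorted(set(codes) ^ set(labels))}"
        )
    return settings


settings = build_settings(
    analysis_type="school",
    pathway_codes={"p": 2, "dw": 3, "o": 4, "l": 7},
    pathway_labels={"p": "Produce", "dw": "Municipal and Drain Water",
                    "o": "Ocean Water", "l": "Public Latrine"},
)
print(settings["analysis_type"], settings["freq_thresh"])  # school 50
```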

In [99]:
# reload(sp)  # only needed after editing SaniPath; on Python 3 use importlib.reload(sp)
analysis = sp.Analysis(
    r_dir='./',
    plot_dir='./plots/',
    analysis_type='school',
    # used for analysis ----
    pathway_codes={'p': 2,
                   'dw': 3,
                   'o': 4,
                   'l': 7},
    pathway_labels={'p': 'Produce',
                    'dw': 'Municipal and Drain Water',
                    'o': 'Ocean Water',
                    'l': 'Public Latrine'},
    neighborhood_mapping={'Jamaica Plain': 1,
                          'Brighton': 2},
    # report-specific arguments ----
    city_name='Atlanta, GA',
    lab_name='Bill Nye, Inc',
    start_date='2017-01-01',
    lab_MF=False,
    language='English',
    freq_thresh=50
)



Add data to the analysis object using the `add_data()` method. The accepted data types match the forms we collect: `household`, `school`, `community`, `sample`, and `lab`.
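
A minimal sketch of the validation this implies, assuming `add_data()` stores one table per form type and rejects any other name. The `FormData` class is hypothetical and only illustrates the idea; in the notebook, the tables would be the pandas DataFrames loaded earlier (`hh`, `sc`, and so on).

```python
# Hypothetical sketch of form-type validation; not SaniPath's implementation.
FORM_TYPES = ("household", "school", "community", "sample", "lab")


class FormData:
    """Holds one table per accepted form type."""

    def __init__(self):
        self.tables = {}

    def add_data(self, form_type, table):
        if form_type not in FORM_TYPES:
            raise ValueError(
                f"unknown form type {form_type!r}; expected one of {', '.join(FORM_TYPES)}"
            )
        self.tables[form_type] = table  # a later call replaces the earlier table


forms = FormData()
forms.add_data("household", [{"neighborhood": 1}])  # stand-in for a DataFrame such as hh
print(sorted(forms.tables))  # ['household']
```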

In [100]:
analysis.add_data('household', hh)
analysis.add_data('school', sc)
analysis.add_data('community', cc)
analysis.add_data('sample', col)
analysis.add_data('lab', lab)



# Frequency calculations (Pie Charts)

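Use `compute_frequencies()` to calculate the frequency data behind the pie charts. The results depend on the value of `analysis_type` set in `__init__()`. To check it, inspect `analysis.analysis_type`; you can also reassign it before computing, as in the cell below.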

In [67]:
print(analysis.analysis_type)      # check the current analysis type
analysis.analysis_type = 'school'  # change it before computing, if needed

combined


In [68]:
pie_chart_data = analysis.compute_frequencies()

Results are passed between R and Python as JSON, using R's JSON reader; this is much easier than converting the R objects any other way. `get_frequencies()` returns the results as a JSON string, so they can be stored for caching.

In [69]:
print(analysis.get_frequencies())

[1] "[\n {\n \"sample\": 4,\n\"age\": \"Children\",\n\"neighborhood\": \"Neighborhood 1\",\n\"data\": [      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      1,      2,      2,      2,      2,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      3,      4,      4,      4,      4,      4,      4,      5,      5,      5,      5,      5,      5,      5,      5,      5,      5,      5,      5,      5,      5,      5 ],\n\"plot_name\": \"Neighborhood 1, 4\\nChildren (N= 75)\",\n\"s\": \"o\",\n\"neighb\": 1,\n\"pop\": \"c\",\n\"analysis_type\": \"school\",\n\"fn\": \"./plots/pie_1_o_c_school.png\" \n},\n{\n \"sample\": 4,\n\"age\": \"Adults\",\n\"neighborhood\": \"Neighborhood 1\",\n\"data\": [      1,      1,
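
Once the JSON text is available as a Python string, the standard library can parse it. The sketch below uses a shortened, made-up record with most of the fields shown in the output above, and assumes each entry in `data` is one respondent's answer code, so a pie chart's slices are simply relative counts. `pie_shares()` is a hypothetical helper, not part of SaniPath.

```python
import json
from collections import Counter

# Shortened, made-up record with most of the fields shown in the output above.
freq_json = """
[
  {"sample": 4, "age": "Children", "neighborhood": "Neighborhood 1",
   "data": [1, 1, 2, 3, 3, 3, 5],
   "s": "o", "neighb": 1, "pop": "c", "analysis_type": "school",
   "fn": "./plots/pie_1_o_c_school.png"}
]
"""


def pie_shares(codes):
    """Fraction of responses per answer code, assuming one code per respondent."""
    counts = Counter(codes)
    n = sum(counts.values())
    return {code: counts[code] / n for code in sorted(counts)}


records = json.loads(freq_json)
for rec in records:
    print(rec["neighborhood"], rec["age"], len(rec["data"]), pie_shares(rec["data"]))
```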

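Load the JSON back into the analysis object with `set_frequencies()`. Here it simply round-trips the current results; with caching, you would pass in previously stored JSON instead of recomputing.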

In [72]:
analysis.set_frequencies(analysis.get_frequencies())

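Each analysis type has the same set of methods attached to it: `compute_*()` to run the calculation, `get_*()` to export the results, and `set_*()` to load them back.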
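
Because the methods share a naming pattern, caching can be written once for all analysis types. This is a hypothetical sketch: it assumes `get_<kind>()` returns JSON text as a Python string and `set_<kind>()` accepts that text back. The `cached()` helper, the `cache` directory, and the `FakeAnalysis` stand-in are illustrative names, not SaniPath code.

```python
# Hypothetical caching helper built on the compute_/get_/set_ naming convention.
# Assumes get_<kind>() returns JSON text as a Python str and set_<kind>() accepts it.
import tempfile
from pathlib import Path


def cached(analysis, kind, cache_dir="cache"):
    """Load <kind> results from a cache file if present; otherwise compute and save them."""
    path = Path(cache_dir) / f"{kind}.json"
    if path.exists():
        # Cache hit: restore the stored results instead of recomputing.
        getattr(analysis, f"set_{kind}")(path.read_text())
    else:
        # Cache miss: run the calculation, then write the exported JSON to disk.
        getattr(analysis, f"compute_{kind}")()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(getattr(analysis, f"get_{kind}")())
    return path


class FakeAnalysis:
    """Stand-in object following the same method naming pattern (illustrative only)."""

    def __init__(self):
        self.frequencies = None
        self.times_computed = 0

    def compute_frequencies(self):
        self.times_computed += 1
        self.frequencies = '[{"sample": 4, "neighb": 1}]'

    def get_frequencies(self):
        return self.frequencies

    def set_frequencies(self, text):
        self.frequencies = text


with tempfile.TemporaryDirectory() as tmp:
    first = FakeAnalysis()
    cached(first, "frequencies", tmp)   # computes, then writes frequencies.json
    second = FakeAnalysis()
    cached(second, "frequencies", tmp)  # reads the file; nothing is recomputed
    print(first.times_computed, second.times_computed)  # 1 0
```

Under the same assumptions, the notebook call would be `cached(analysis, 'frequencies')`.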

# Concentrations

In [89]:
analysis.compute_concentrations()

In [84]:
print(analysis.concentrations)

[[1]]
[[1]]$s
[1] "p"

[[1]]$neighb
[1] 1

[[1]]$sample
[1] "Produce"

[[1]]$neighborhood
[1] "Neighborhood 1"

[[1]]$data
[1]    5 7894    5   NA  123

[[1]]$plot_name
[1] "Neighborhood 1, Produce\n(N=5)"

[[1]]$fn
[1] "./plots/hist_1_p.png"


[[2]]
[[2]]$s
[1] "l"

[[2]]$neighb
[1] 1

[[2]]$sample
[1] "Public Latrine"

[[2]]$neighborhood
[1] "Neighborhood 1"

[[2]]$data
[1] 5 5 5 5 5 5

[[2]]$plot_name
[1] "Neighborhood 1, Public Latrine\n(N=6)"

[[2]]$fn
[1] "./plots/hist_1_l.png"


[[3]]
[[3]]$s
[1] "dw"

[[3]]$neighb
[1] 1

[[3]]$sample
[1] "Municipal and Drain Water"

[[3]]$neighborhood
[1] "Neighborhood 1"

[[3]]$data
[1]  0.50 24.20 12.10  8.75  2.00  1.00  0.50

[[3]]$plot_name
[1] "Neighborhood 1, Municipal and Drain Water\n(N=7)"

[[3]]$fn
[1] "./plots/hist_1_dw.png"
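
The file names and plot titles in the printed outputs follow a visible pattern: `hist_<neighb>_<s>.png` for histograms and `pie_<neighb>_<s>_<pop>_<analysis_type>.png` for pie charts. The sketch below reproduces that pattern so you can locate the generated plots. It is inferred from the output only, and the helper names are hypothetical. Note that the produce entry reports N=5 even though one value is `NA`, so N appears to count every entry.

```python
# Naming pattern inferred from the printed output; helper names are hypothetical.


def hist_filename(plot_dir, neighb, s):
    """Histogram file, e.g. ./plots/hist_1_p.png for produce in neighborhood 1."""
    return f"{plot_dir}hist_{neighb}_{s}.png"


def pie_filename(plot_dir, neighb, s, pop, analysis_type):
    """Pie chart file, e.g. ./plots/pie_1_o_c_school.png."""
    return f"{plot_dir}pie_{neighb}_{s}_{pop}_{analysis_type}.png"


def hist_title(neighborhood, sample, data):
    """Histogram title; N counts every entry, including missing (None) values."""
    return f"{neighborhood}, {sample}\n(N={len(data)})"


print(hist_filename("./plots/", 1, "dw"))
print(pie_filename("./plots/", 1, "o", "c", "school"))
print(hist_title("Neighborhood 1", "Produce", [5, 7894, 5, None, 123]))
```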





In [85]:
x = analysis.get_concentrations()
analysis.set_concentrations(x)

# Exposure

Exposure calculations combine the frequency and concentration results and simulate contamination exposure based on the collected data. The simulation can take a while, especially on a single-core machine, which is why we cache results. By default, R uses as many cores as the machine has; if you don't want to run in parallel, set `parallel = False`.
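
To make the idea of simulating exposure concrete, here is a toy Monte Carlo sketch. It is not SaniPath's exposure model: it simply resamples an illustrative contact frequency and a measured concentration, multiplies them, and repeats. Each draw is independent, which is what makes this kind of simulation easy to split across cores. The frequencies and their units are made up, and `simulate_exposure()` is a hypothetical helper; the concentrations reuse the municipal and drain water values printed above.

```python
# Toy Monte Carlo exposure sketch; NOT SaniPath's model. Inputs are illustrative.
import random
import statistics


def simulate_exposure(frequencies, concentrations, n_sims=1000, seed=None):
    """Resample one frequency and one concentration per draw and multiply them."""
    rng = random.Random(seed)  # a seeded generator makes runs reproducible
    conc = [c for c in concentrations if c is not None]  # drop missing (NA) values
    return [rng.choice(frequencies) * rng.choice(conc) for _ in range(n_sims)]


freqs = [0, 1, 1, 3, 7]  # made-up contact counts, hypothetical units
concs = [0.50, 24.20, 12.10, 8.75, 2.00, 1.00, 0.50]  # municipal and drain water values above
doses = simulate_exposure(freqs, concs, n_sims=5000, seed=42)
print(f"mean simulated dose: {statistics.mean(doses):.2f}")
```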

In [None]:
analysis.compute_exposures()

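# Report

Finally, generate the report with `compute_report()`. This uses the report-specific arguments passed to `sp.Analysis()` (`city_name`, `lab_name`, `start_date`, `lab_MF`, `language`, and `freq_thresh`).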

In [95]:
analysis.compute_report()