### Active Data Resource Interface Objects (ADRIOs) Phase 1

The ADRIO system is being developed to simplify assembling GEOs as part of the simulation workflow. Each ADRIO derives from an abstract class and interfaces with one specific resource, extracting specific data from it and doing any necessary filtering and transformation along the way. Future implementations will use ADRIO templates for different data sources, but only a generalized outline exists for now. The ADRIOs created in this phase satisfy the following example use case:

I am a modeler working on a disease with human-to-human transmission, and I believe certain social determinants of health are a significant factor in disease outcomes. I want to use EpiMoRPH to model a metapopulation at the county level in the US Southwest region. My existing disease data, which I hope to use for parameter fitting, comes from 2015. I need the following data for each county, filtering to Arizona, New Mexico, Nevada, Utah, and Colorado:

- Each county’s name and state.
- Geographic centroid.
- Population in 2015, total as well as aggregated into age brackets: under 20, 20-to-64, and 65 and older.
- Population density.
- Median household annual income.
- The [Dissimilarity Index](https://www.phenxtoolkit.org/protocols/view/211403) of racial segregation for selected minority groups (currently African Americans) calculated from the census tracts comprising each county.
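
For reference, the Dissimilarity Index is commonly defined as D = 0.5 * Σ |a_i/A - b_i/B|, summed over the tracts i of a county, where a_i and b_i are the tract populations of the two groups being compared and A and B are their county-wide totals. The sketch below computes it from per-tract counts. The function name, the choice of reference group, and the counts are assumptions made for illustration; this is not necessarily how the DissimilarityIndex ADRIO is implemented.

```python
def dissimilarity_index(minority_by_tract, reference_by_tract):
    """Return D = 0.5 * sum(|a_i/A - b_i/B|) over the tracts of one county."""
    total_minority = sum(minority_by_tract)
    total_reference = sum(reference_by_tract)
    if total_minority == 0 or total_reference == 0:
        raise ValueError("both groups need a nonzero county-wide population")
    return 0.5 * sum(
        abs(a / total_minority - b / total_reference)
        for a, b in zip(minority_by_tract, reference_by_tract)
    )


# Made-up counts for a county with three tracts (illustration only).
minority = [100, 50, 50]
reference = [200, 300, 500]
print(round(dissimilarity_index(minority, reference), 3))
```

A value of 0 means both groups are distributed identically across tracts, and a value of 1 means they share no tracts at all.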


All data for this use case comes from Census tables and/or shapefiles. As such, a US Census API key is required to download much of it. These ADRIOs are configured to look for a key in an environment variable named "CENSUS_API_KEY". If you run the code yourself, make sure you have an [API key](https://api.census.gov/data/key_signup.html) and have assigned it to an environment variable with this name.
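
Before running any of the cells, it can help to confirm that the key is visible to Python. The sketch below does this with the standard library only. The helper name `get_census_api_key` is hypothetical and is not part of epymorph.

```python
import os


def get_census_api_key(env_var="CENSUS_API_KEY"):
    """Return the Census API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No Census API key found; set the {env_var} environment variable."
        )
    return key


try:
    get_census_api_key()
    print("Census API key found.")
except RuntimeError as err:
    print(err)
```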

### GEO Assembly

Fetching data from individual ADRIOs is useful, but we also need a streamlined way to create a GEO object from a whole set of ADRIOs. That's where GEOSpecs come in. A GEOSpec contains a list of geographic nodes and a list of ADRIOSpec objects, each of which specifies an ADRIO to call by its class name.

The code below creates a GEOSpec object through its initializer and serializes it to a `.geo` file using the predefined `serialize` function. With this file created, an entry can be added to the `geo_library` in `epymorph.data.__init__`, mapping the name of the Geo (in this case `us_sw_counties_2015`) to a string file path pointing to its `.geo` file.

In [2]:
from epymorph.adrio.adrio import GEOSpec, ADRIOSpec, serialize
from epymorph.adrio.adrio_census import Granularity

spec = GEOSpec(
    'us_sw_counties_2015',
    Granularity.COUNTY.value,
    # FIPS state codes: AZ (04), CO (08), UT (49), NM (35), NV (32)
    {'state': ['04', '08', '49', '35', '32'], 'county': ['*'], 'tract': ['*']},
    ADRIOSpec('NameAndState'),  # label ADRIO
    [
        ADRIOSpec('GEOID'),
        ADRIOSpec('Centroid'),
        ADRIOSpec('MedianIncome'),
        ADRIOSpec('Population'),
        ADRIOSpec('PopulationByAge'),
        ADRIOSpec('PopDensityKm2'),
        ADRIOSpec('DissimilarityIndex'),
    ],
    2015,
)

serialize(spec, f'epymorph/data/geo/{spec.id}.geo')

With that done, the new Geo can be passed to the run command of the main epymorph program the same way existing .py Geo files have been. The only difference is that, instead of being created through the load function of a .py file, the Geo is created by a GEOBuilder object, which uses a GEOSpec to build the Geo from ADRIOs.
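
As a rough mental model, building a Geo from a spec can be pictured as resolving each ADRIO class name to a class, calling `fetch()` on an instance, and collecting the results by attribute name. The sketch below illustrates that idea with stand-in classes. `ADRIO_REGISTRY`, `build_geo`, and the `Fake...` classes are all hypothetical, and the sketch does not use or reproduce epymorph's GEOBuilder.

```python
class FakePopulation:
    """Stand-in ADRIO returning hard-coded values (no network access)."""
    attribute = "population"

    def fetch(self):
        return [72124, 129647, 136701]


class FakeMedianIncome:
    """Stand-in ADRIO returning hard-coded values (no network access)."""
    attribute = "median_income"

    def fetch(self):
        return [31757, 45075, 50234]


# Hypothetical registry mapping ADRIO class names to classes.
ADRIO_REGISTRY = {
    "Population": FakePopulation,
    "MedianIncome": FakeMedianIncome,
}


def build_geo(adrio_names):
    """Resolve each name to a class, fetch its data, and key it by attribute."""
    geo = {}
    for name in adrio_names:
        try:
            adrio_cls = ADRIO_REGISTRY[name]
        except KeyError:
            raise ValueError(f"unknown ADRIO: {name}") from None
        adrio = adrio_cls()
        geo[adrio.attribute] = adrio.fetch()
    return geo


geo = build_geo(["Population", "MedianIncome"])
print(sorted(geo))  # → ['median_income', 'population']
```

Looking classes up by name keeps the spec itself plain data, which is what makes it straightforward to serialize to a file.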

The code below shows the run command being executed with an ADRIO-based Geo passed in. You can see the Geo's creation process execute after requirements are loaded.

In [2]:
from epymorph.run import run

"""
toml file contents:

ipm = 'no'
mm = 'no'
geo = 'us_sw_counties_2015'
start_date = '2010-01-01'
duration = '150d'

[params]
theta = 0.1
move_control = 0.9
infection_duration = 4.0
immunity_duration = 90.0
infection_seed_loc = 0
infection_seed_size = 10000
"""

exit_code = run(
    input_path="scratch/adrio_test.toml",
    out_path=None,
    chart=None,
    profiling=False,
    engine_id=None
)

Loading requirements:
[✓] IPM (no)
[✓] MM (no)
[✓] Geo (us_sw_counties_2015)

Running simulation (BasicEngine):
• 2010-01-01 to 2010-05-31 (150 days)
• 158 geo nodes
|####################| 100% 
Runtime: 1.377s
Done
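
The date range in the log above follows from the `start_date` and `duration` values in the TOML file. The sketch below reproduces that arithmetic with the standard library. `parse_duration_days` is a hypothetical helper that only handles the `Nd` form; epymorph's own duration parsing may differ.

```python
from datetime import date, timedelta


def parse_duration_days(text):
    """Parse a duration such as '150d' into a number of days."""
    if not text.endswith("d") or not text[:-1].isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(text[:-1])


start = date.fromisoformat("2010-01-01")
days = parse_duration_days("150d")
end = start + timedelta(days=days)
print(f"{start} to {end} ({days} days)")
```

This prints `2010-01-01 to 2010-05-31 (150 days)`, matching the log line, which suggests the end date shown there is exclusive: the 150 simulated days run from January 1 through May 30.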


### Individual ADRIOs

In [3]:
from epymorph.adrio.census.name_and_state import NameAndState

adrio = NameAndState(spec=spec) # County and state name for all counties

data = adrio.fetch()

print(data[:10])

['Apache County, Arizona' 'Cochise County, Arizona'
 'Coconino County, Arizona' 'Gila County, Arizona'
 'Graham County, Arizona' 'Greenlee County, Arizona'
 'La Paz County, Arizona' 'Maricopa County, Arizona'
 'Mohave County, Arizona' 'Navajo County, Arizona']


In [4]:
from epymorph.adrio.census.geoid import GEOID

adrio = GEOID(spec=spec) # GEOID for all counties

data = adrio.fetch()

print(data[:10])

['04001' '04003' '04005' '04007' '04009' '04011' '04012' '04013' '04015'
 '04017']


In [1]:
from epymorph.adrio.census.centroid import Centroid

adrio = Centroid(spec=spec) # Geographic centroid for all counties

data = adrio.fetch()

print(data[:10])

[(-109.488846  , 35.39561637) (-109.75114065, 31.87956519)
 (-111.77051224, 35.83872946) (-110.8117366 , 33.79975225)
 (-109.88743768, 32.93271631) (-109.24009746, 33.21535711)
 (-113.98133797, 33.72926405) (-112.49123307, 33.34883422)
 (-113.75790897, 35.70406048) (-110.32140293, 35.39953935)]


In [1]:
from epymorph.adrio.census.population import Population

adrio = Population(spec=spec)  # Total population for each county

data = adrio.fetch()

print(data[:10])

[  72124  129647  136701   53165   37407    9023   20335 4018143  203362
  107656]


In [2]:
from epymorph.adrio.census.population_by_age import PopulationByAge

adrio = PopulationByAge(spec=spec)  # Aggregated population for each county

data = adrio.fetch()

print(data[:10])

[[  22828   39037    9793]
 [  31434   71109   25723]
 [  34188   86420   15400]
 [  11621   26653   14018]
 [  11086   21271    4807]
 [   2779    5183    1125]
 [   3822    8872    7283]
 [1071519 2342120  562347]
 [  41094  105003   54455]
 [  32138   57385   17347]]


In [3]:
from epymorph.adrio.census.median_income import MedianIncome

adrio = MedianIncome(spec=spec)  # Median income for each county

data = adrio.fetch()

print(data[:10])

[31757 45075 50234 39751 45964 51628 34466 54229 38488 35921]


In [2]:
from epymorph.adrio.census.pop_density_km2 import PopDensityKm2

adrio = PopDensityKm2(spec=spec) # Population density for each county

data = adrio.fetch()

print(data[:10])

[  2.   8.   3.   4.   3.   2.   2. 169.   6.   4.]


In [2]:
from epymorph.adrio.census.dissimilarity_index import DissimilarityIndex

adrio = DissimilarityIndex(spec=spec)    # Dissimilarity index for African Americans for each county

data = adrio.fetch()

print(data[:10])

[0.68003176 0.44684799 0.40285892 0.69179817 0.52144034 0.24977493
 0.88432116 0.40776111 0.53031332 0.6586828 ]
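
Every cell in this section follows the same shape: construct an ADRIO, then call `fetch()`. That uniformity is what deriving from a common abstract class provides. The sketch below shows the pattern with Python's `abc` module. `BaseADRIO` and `ConstantADRIO` are hypothetical names for illustration, not epymorph's actual classes, and the constructor arguments here are assumptions.

```python
from abc import ABC, abstractmethod


class BaseADRIO(ABC):
    """Hypothetical base class: one subclass per resource/attribute pair."""

    @abstractmethod
    def fetch(self):
        """Return this ADRIO's data, one entry per geo node."""


class ConstantADRIO(BaseADRIO):
    """Toy subclass that 'fetches' a fixed value for every node."""

    def __init__(self, nodes, value):
        self.nodes = nodes
        self.value = value

    def fetch(self):
        # A real ADRIO would query its resource here, then filter and transform.
        return [self.value for _ in self.nodes]


toy = ConstantADRIO(nodes=["04001", "04003"], value=0)
print(toy.fetch())  # → [0, 0]
```

Because `fetch` is abstract, instantiating `BaseADRIO` directly raises a `TypeError`, so every concrete ADRIO is forced to say how its data is retrieved.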


### WIP ADRIOs

The next four cells demonstrate work-in-progress ADRIOs for the Maricopa County CBG (census block group) 2019 Geo. They fetch data at the block group or tract level, which is not yet easy to integrate into the current system. These ADRIOs do not cache their data; they fetch and output it directly.

In [5]:
from epymorph.adrio.adrio import GEOSpec, ADRIOSpec, serialize
from epymorph.adrio.adrio_census import Granularity

maricopa = GEOSpec(
    'maricopa_cbg_2019',
    Granularity.CBG.value,
    {'state': ['04'], 'county': ['013'], 'tract': ['*'], 'block group': ['*']},
    ADRIOSpec('NameAndState'),  # label ADRIO
    [
        ADRIOSpec('GEOID'),
        ADRIOSpec('Centroid'),
        ADRIOSpec('MedianIncome'),
        ADRIOSpec('Population'),
        ADRIOSpec('PopulationByAge'),
        ADRIOSpec('PopulationByAgex6'),
        ADRIOSpec('PopDensityKm2'),
        ADRIOSpec('MedianAge'),
        ADRIOSpec('AverageHouseholdSize'),
        ADRIOSpec('GiniIndex'),
    ],
    2015,
)

serialize(maricopa, f'epymorph/data/geo/{maricopa.id}.geo')
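
The keys of the filter dictionary in the cell above follow the Census geographic hierarchy, and a block group GEOID is the concatenation of those codes: 2 digits for the state, 3 for the county, 6 for the tract, and 1 for the block group. The sketch below splits a GEOID back into those parts. `split_cbg_geoid` is a hypothetical helper, and the example GEOID uses Maricopa County's real state and county codes with made-up tract and block group digits.

```python
def split_cbg_geoid(geoid):
    """Split a 12-digit block group GEOID into its component codes."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected a 12-digit GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block group": geoid[11:],
    }


# Illustrative GEOID: the tract and block group digits are made up.
parts = split_cbg_geoid("040130101011")
print(parts["state"], parts["county"], parts["tract"], parts["block group"])
# → 04 013 010101 1
```

Since each level is a prefix of the next, block group values can be grouped up to tracts or counties by truncating the GEOID.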

In [6]:
from epymorph.adrio.census.median_age import MedianAge

adrio = MedianAge(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[34.1 43.4 46.8 43.6 36.1 36.8 30.7 31.8 37.  39.3]


In [2]:
from epymorph.adrio.census.average_household_size import AverageHouseholdSize

adrio = AverageHouseholdSize(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[3.41 4.1  3.07 ... 1.7  1.67 2.23]


In [5]:
from epymorph.adrio.census.population_by_age_x6 import PopulationByAgex6

adrio = PopulationByAgex6(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[[764 481 614 328 137 170]
 [959 501 685 373  51  93]
 [668 471 643 326 112 179]
 [737 216 676 431 426 428]
 [327 496 329 388  80  28]
 [387 285 461 329 232 224]
 [816 249 705 201  89 109]
 [944 593 929 270 180 244]
 [246 154 384 235 102  34]
 [151 164 373 326 237 326]]


In [4]:
from epymorph.adrio.census.gini_index import GiniIndex

adrio = GiniIndex(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[0.4423 0.5439 0.5185 0.4983 0.5608 0.3515 0.4376 0.4137 0.3742 0.3793]


### Caching

To avoid long waits for data retrieval every time the simulation runs, data fetched by ADRIOs is cached in CSV files in a .cache folder in the adrio directory. Once data has been fetched and cached, later retrievals read from the local cache instead of fetching from the source again. A CLI command also exists to fetch and cache data from ADRIOs without running the simulation, so users can pre-load or refresh their cache at any time. This is demonstrated below.

In [2]:
from epymorph.cache import cache_geo

# "cache" subcommand demo
exit_code = cache_geo(
    'us_sw_counties_2015',  # name of Geo
    True                    # -f arg: determines whether to overwrite existing data
)

Fetching GEO data from ADRIOs...
Fetching name_and_state
Fetching geoid
Fetching centroid
Fetching median_income
Fetching population
Fetching population_by_age
Fetching pop_density_km2
Fetching dissimilarity_index
...done
Data successfully cached
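
The fetch-or-read-from-cache behavior described in this section can be sketched as follows. This is a minimal illustration using the standard library and a temporary directory, not epymorph's caching code. `fetch_with_cache`, its `force` flag (loosely mirroring the `-f` argument above), and `slow_fetch` are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def fetch_with_cache(name, fetch, cache_dir, force=False):
    """Return cached rows for `name`, calling `fetch` only when needed."""
    cache_file = Path(cache_dir) / f"{name}.csv"
    if cache_file.exists() and not force:
        with cache_file.open(newline="") as f:
            return [row for row in csv.reader(f)]
    rows = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows


calls = []


def slow_fetch():
    """Stand-in for a network request; records each call."""
    calls.append(1)
    return [["04001", "72124"], ["04003", "129647"]]


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_with_cache("population", slow_fetch, tmp)   # fetches and writes CSV
    second = fetch_with_cache("population", slow_fetch, tmp)  # reads the CSV
    third = fetch_with_cache("population", slow_fetch, tmp, force=True)  # refetches
    print(len(calls))  # → 2
```

Note that values read back from a CSV file are strings, so a real implementation would also need to restore each column's data type.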
