### Active Data Resource Interface Objects (ADRIOs)

The ADRIO system is being developed to simplify assembling Geos as part of the simulation workflow. ADRIOs derive from a common abstract class, and each one interfaces with a specific resource and extracts specific data from it, doing any necessary filtering and transformation along the way. ADRIOs use templates for different data sources; a template contains the core logic for retrieving data from its source. The ADRIOs created so far support the attributes used in two pre-existing Geos, and all of them use a US Census template. The first set was created with the following use case in mind:

I am a modeler working on a disease with human-to-human transmission, and I believe certain social determinants of health are a significant factor in disease outcomes. I want to use epymorph to model a metapopulation at the county level in the US Southwest region. My existing disease data, which I hope to use for parameter fitting, comes from 2015. I need the following data for each county, filtering to Arizona, New Mexico, Nevada, Utah, and Colorado:

- Each county’s name and state.
- Geographic centroid.
- Population in 2015, total as well as aggregated into age brackets: under 20, 20-to-64, and 65 and older.
- Population density.
- Median household annual income.
- The [Dissimilarity Index](https://www.phenxtoolkit.org/protocols/view/211403) of racial segregation for selected minority groups (currently African Americans) calculated from the census tracts comprising each county.


All data for this use case comes from Census tables and/or shapefiles, so a US Census API key is required to download much of it. These ADRIOs look for the key in an environment variable named "CENSUS_API_KEY". If you are running the code yourself, make sure you have an [API key](https://api.census.gov/data/key_signup.html) and have assigned it to an environment variable with this name.
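The design described above (an abstract ADRIO class, a per-source template holding the retrieval logic, and concrete ADRIOs that filter and transform one attribute) can be sketched in plain Python. This is not epymorph's code: `CensusTemplate`, `fetch_column`, the `total_pop` column, and the injected `rows` (standing in for a Census API response so the sketch runs offline) are all hypothetical. Only the `CENSUS_API_KEY` variable name comes from the text.

```python
import os
from abc import ABC, abstractmethod


class ADRIO(ABC):
    """Hypothetical base class: each ADRIO produces one Geo attribute."""

    attribute: str  # name of the attribute this ADRIO produces

    @abstractmethod
    def fetch(self) -> list:
        """Retrieve, filter, and transform the data for one attribute."""


class CensusTemplate(ADRIO):
    """Hypothetical template holding retrieval logic shared by Census ADRIOs."""

    def __init__(self, rows: list[dict[str, str]]):
        # A real template would query the Census API with this key; rows are
        # injected here instead so the sketch runs without network access.
        self.api_key = self.get_api_key()
        self._rows = rows

    @staticmethod
    def get_api_key(var_name: str = "CENSUS_API_KEY") -> str:
        """Read the API key from the environment, failing with a clear message."""
        key = os.environ.get(var_name)
        if not key:
            raise RuntimeError(f"Set the {var_name} environment variable to your Census API key.")
        return key

    def fetch_column(self, column: str) -> list[str]:
        """Return one raw (string) column from the retrieved rows."""
        return [row[column] for row in self._rows]


class Population(CensusTemplate):
    """Concrete ADRIO: selects one column and converts it to integers."""

    attribute = "population"

    def fetch(self) -> list[int]:
        return [int(value) for value in self.fetch_column("total_pop")]


os.environ.setdefault("CENSUS_API_KEY", "demo-key")  # placeholder key for this demo only
rows = [{"total_pop": "1512"}, {"total_pop": "4699"}]
print(Population(rows).fetch())  # [1512, 4699]
```

In a POSIX shell, the variable can be set with `export CENSUS_API_KEY=your-key` before starting Python or Jupyter.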

### GEO Assembly

Whole Geos are assembled through GEOSpecs. A GEOSpec is an object containing an ID, a granularity level, the geographic nodes to include, a label ADRIOSpec, a list of ADRIOSpec objects, and a year. ADRIOSpecs simply specify which ADRIOs to call, by class name.
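To make the shape of a GEOSpec concrete, here is a minimal stand-in written as dataclasses. It is not epymorph's GEOSpec: the field names and the `adrio_names` helper are hypothetical, the field order is inferred from the positional arguments used in the cell below, and the granularity value `1` is a placeholder for a real `Granularity` enum value.

```python
from dataclasses import dataclass


@dataclass
class ADRIOSpec:
    class_name: str  # name of the ADRIO class to call, e.g. "Population"


@dataclass
class GEOSpec:
    id: str                      # name of the Geo, e.g. "us_sw_counties_2015"
    granularity: int             # stands in for a Granularity enum value (type assumed)
    nodes: dict[str, list[str]]  # FIPS codes per level; "*" appears to select all
    label: ADRIOSpec             # ADRIO that supplies node labels
    adrios: list[ADRIOSpec]      # ADRIOs for the remaining attributes
    year: int                    # year of the data

    def adrio_names(self) -> list[str]:
        """Class names of every ADRIO to call, label first."""
        return [self.label.class_name] + [a.class_name for a in self.adrios]


spec = GEOSpec(
    "us_sw_counties_2015",
    1,  # placeholder granularity value
    {"state": ["04", "08", "49", "35", "32"], "county": ["*"]},
    ADRIOSpec("Name"),
    [ADRIOSpec("GEOID"), ADRIOSpec("Population")],
    2015,
)
print(spec.adrio_names())  # ['Name', 'GEOID', 'Population']
```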

The code below creates a GEOSpec object through its initializer and serializes it to a .geo file using the provided serialize function. Once this file exists, add an entry to the geo_library in epymorph.data.__init__ that maps the name of the Geo (in this case us_sw_counties_2015) to the path of its .geo file, given as a string.

In [4]:
from epymorph.adrio.adrio import GEOSpec, ADRIOSpec, serialize
from epymorph.adrio.census.adrio_census import Granularity

spec = GEOSpec('us_sw_counties_2015',
               Granularity.COUNTY.value,
               # state FIPS codes: AZ, CO, UT, NM, NV
               {'state': ['04', '08', '49', '35', '32'], 'county': ['*'], 'tract': ['*'], 'block group': ['*']},
               ADRIOSpec('Name'),                  # label ADRIO
               [ADRIOSpec('GEOID'),
                ADRIOSpec('Centroid'),
                ADRIOSpec('MedianIncome'),
                ADRIOSpec('Population'),
                ADRIOSpec('PopulationByAge'),
                ADRIOSpec('PopDensityKm2'),
                ADRIOSpec('DissimilarityIndex')],
               2015)                               # data year

serialize(spec, f'epymorph/data/geo/{spec.id}.geo')
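The serialize-then-register workflow can be sketched as a round trip: write the spec to a file, map the Geo's name to that file's path, and later resolve the name back to the spec. This sketch uses JSON and plain dicts as assumptions; the real .geo format and the exact structure of geo_library may differ, and `serialize_spec`, `load_spec`, and the local `geo_library` dict are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def serialize_spec(spec: dict, path: str) -> None:
    """Write a spec to disk as JSON (the real .geo format may differ)."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(spec, indent=2))


def load_spec(name: str, library: dict[str, str]) -> dict:
    """Look up a Geo name in a library mapping and read its spec file."""
    if name not in library:
        raise KeyError(f"Unknown Geo: {name!r}")
    return json.loads(Path(library[name]).read_text())


spec = {
    "id": "us_sw_counties_2015",
    "nodes": {"state": ["04", "08", "49", "35", "32"], "county": ["*"]},
    "adrios": ["Name", "GEOID", "Population"],
    "year": 2015,
}

with tempfile.TemporaryDirectory() as tmp:
    path = f"{tmp}/geo/{spec['id']}.geo"
    serialize_spec(spec, path)
    geo_library = {spec["id"]: path}  # hypothetical stand-in for a geo_library entry
    loaded = load_spec("us_sw_counties_2015", geo_library)

print(loaded == spec)  # True
```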

With that done, the new Geo can be passed to the run command of the main epymorph program in the same way as existing .py Geo files. The only difference is that, instead of being created through the load function of a .py file, the Geo is created by a GEOBuilder, which uses the GEOSpec to build the Geo through ADRIOs.
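Conceptually, building a Geo from a spec means looking up each ADRIO named in the spec, calling it, and collecting its output under the attribute it produces. The sketch below shows that loop under stated assumptions: `StubADRIO`, `ADRIO_REGISTRY`, and `build_geo` are hypothetical, the spec is a plain dict, and the stub ADRIOs return placeholder values instead of real Census data. It is not GEOBuilder's actual implementation.

```python
class StubADRIO:
    """Hypothetical stand-in for an ADRIO that returns canned values."""

    attribute = ""
    values: list = []

    def __init__(self, spec: dict):
        self.spec = spec

    def fetch(self) -> list:
        return list(self.values)


class Name(StubADRIO):
    attribute = "name"
    values = ["node_a", "node_b"]


class Population(StubADRIO):
    attribute = "population"
    values = [100, 200]


# Hypothetical registry mapping ADRIOSpec class names to ADRIO classes.
ADRIO_REGISTRY = {cls.__name__: cls for cls in (Name, Population)}


def build_geo(spec: dict) -> dict[str, list]:
    """Call every ADRIO named in the spec and collect results by attribute."""
    data: dict[str, list] = {}
    for class_name in [spec["label"], *spec["adrios"]]:
        adrio = ADRIO_REGISTRY[class_name](spec=spec)
        data[adrio.attribute] = adrio.fetch()
    # Every attribute must describe the same set of nodes.
    if len({len(values) for values in data.values()}) > 1:
        raise ValueError("ADRIOs returned attributes of different lengths")
    return data


geo = build_geo({"label": "Name", "adrios": ["Population"]})
print(geo)  # {'name': ['node_a', 'node_b'], 'population': [100, 200]}
```

Keying results by attribute name rather than by class name mirrors the log output below, where each ADRIO is reported by the attribute it fetches (name, geoid, centroid, and so on).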

The code below shows the run command being executed with an ADRIO-based Geo. You can see the Geo's creation process run after the requirements are loaded.

In [2]:
from epymorph.run import run

"""
toml file contents:

ipm = 'sirs'
mm = 'centroids'
geo = 'us_sw_counties_2015'
start_date = '2010-01-01'
duration = '150d'

[init]
initializer = "single_location"
location = 0
seed_size = 10000

[params]
beta = 0.4
gamma = 0.25 # 1/4
xi = 0.111 # 1/90
phi = 40.0
theta = 0.1
move_control = 0.9
infection_duration = 4.0
immunity_duration = 90.0
"""

exit_code = run(
    input_path="scratch/adrio_test.toml",
    out_path=None,
    chart=None,
    profiling=False,
    engine_id=None
)

Loading requirements:
[✓] IPM (sirs)
[✓] MM (centroids)
[✓] Geo (us_sw_counties_2015)
Fetching GEO data from ADRIOs...
Fetching name
Fetching geoid
Fetching centroid
Fetching median_income
Fetching population
Fetching population_by_age
Fetching pop_density_km2
Fetching dissimilarity_index
...done

Running simulation (BasicEngine):
• 2010-01-01 to 2010-05-31 (150 days)
• 158 geo nodes
|####################| 100% 
Runtime: 14.473s
Done


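### Individual ADRIOs

Each ADRIO can also be instantiated and called on its own. The cells below pass the spec defined above to each ADRIO and print the first ten values it returns.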

In [3]:
from epymorph.adrio.census.name import Name

adrio = Name(spec=spec) # Name of state, county, tract, or block group

data = adrio.fetch()

print(data[:10])

['Arizona' 'Colorado' 'Nevada' 'New Mexico' 'Utah']


In [4]:
from epymorph.adrio.census.geoid import GEOID

adrio = GEOID(spec=spec) # GEOID

data = adrio.fetch()

print(data[:10])

['04' '08' '32' '35' '49']


In [5]:
from epymorph.adrio.census.centroid import Centroid

adrio = Centroid(spec=spec) # Geographic centroid

data = adrio.fetch()

print(data[:10])

[( -99.81080001, 41.52714193) (-120.44686425, 47.38096155)
 (-106.10837877, 34.42136519) (-100.23049794, 44.4361533 )
 ( -85.2904514 , 37.52666029) ( -83.44633796, 32.64922284)
 ( -92.43926114, 34.899739  ) ( -77.7995569 , 40.8738297 )
 ( -89.66520805, 32.75086061) (-105.54781457, 38.99854554)]
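A geographic centroid of a single polygon can be computed with the standard shoelace-based formula. This is only an illustration of the technique: how the Centroid ADRIO actually computes its values (for example, through a shapefile library) is not shown here, the sketch treats longitude/latitude as planar coordinates (an approximation), and it handles one simple polygon rather than counties made of several parts. `polygon_centroid` is a hypothetical name.

```python
def polygon_centroid(vertices: list[tuple[float, float]]) -> tuple[float, float]:
    """Centroid of a simple polygon given as (x, y) vertices in order."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * area2), cy / (3 * area2)


print(polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```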


In [3]:
from epymorph.adrio.census.population import Population

adrio = Population(spec=spec)  # Total population

data = adrio.fetch()

print(data[:10])

[1512 4699 5999 5916 4382 3999 7294 3771 4938 5043]


In [4]:
from epymorph.adrio.census.population_by_age import PopulationByAge

adrio = PopulationByAge(spec=spec)  # Aggregated population

data = adrio.fetch()

print(data[:10])

[[4788  790  203]
 [5098 2499  743]
 [5640 3453  570]
 [5899 3214  462]
 [5863 2044  297]
 [5125 2273  418]
 [6329 3854  759]
 [5013 2105  485]
 [5321 2687  721]
 [5055 3092  738]]
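Each row above holds one node's population in three brackets (under 20, 20 to 64, 65 and older). A minimal sketch of that kind of aggregation, assuming the source provides counts in finer age bins that never straddle a cutoff, is shown below. The function name, bin lower bounds, and counts are hypothetical and are not taken from the Census tables the ADRIO actually queries.

```python
from bisect import bisect_right


def aggregate_by_age(counts: list[int], bin_lower_bounds: list[int],
                     cutoffs: tuple[int, ...] = (20, 65)) -> list[int]:
    """Sum fine-grained age-bin counts into brackets split at `cutoffs`.

    Each bin is assigned by its lower bound, so this assumes no bin straddles
    a cutoff (e.g. bins "18-19" and "20" rather than "18-24").
    """
    if len(counts) != len(bin_lower_bounds):
        raise ValueError("counts and bin_lower_bounds must have the same length")
    brackets = [0] * (len(cutoffs) + 1)
    for count, lower in zip(counts, bin_lower_bounds):
        # With cutoffs (20, 65): ages 0-19 -> 0, 20-64 -> 1, 65+ -> 2
        brackets[bisect_right(cutoffs, lower)] += count
    return brackets


# Hypothetical bins (lower bounds) and counts for a single node.
lower_bounds = [0, 5, 18, 20, 25, 62, 65, 85]
counts = [10, 20, 30, 5, 40, 15, 8, 2]
print(aggregate_by_age(counts, lower_bounds))  # [60, 60, 10]
```

Passing more cutoffs produces more brackets, which is the same idea a six-bracket variant would need.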


In [5]:
from epymorph.adrio.census.median_income import MedianIncome

adrio = MedianIncome(spec=spec)  # Median income

data = adrio.fetch()

print(data[:10])

[20972 26676 32250 24667 33438 25353 24985 22552 19271 41102]


In [3]:
from epymorph.adrio.census.pop_density_km2 import PopDensityKm2

adrio = PopDensityKm2(spec=spec) # Population density

data = adrio.fetch()

print(data[:10])

[  2.   8.   3.   4.   3.   2.   2. 169.   6.   4.]
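Population density is population divided by area. Census TIGER/Line shapefiles report land area (ALAND) in square metres, so a conversion to square kilometres is needed. Whether the PopDensityKm2 ADRIO uses land area only, or rounds its result, is not stated here; the sketch below is a hypothetical illustration with made-up inputs.

```python
def pop_density_km2(population: list[int], land_area_m2: list[float]) -> list[float]:
    """Persons per square kilometre from populations and areas in square metres."""
    densities = []
    for pop, area_m2 in zip(population, land_area_m2):
        if area_m2 <= 0:
            raise ValueError("land area must be positive")
        densities.append(pop / (area_m2 / 1_000_000))  # 1 km² = 1,000,000 m²
    return densities


print(pop_density_km2([1000, 4500], [500_000_000, 1_500_000_000]))  # [2.0, 3.0]
```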


In [7]:
from epymorph.adrio.census.dissimilarity_index import DissimilarityIndex

adrio = DissimilarityIndex(spec=spec)    # Dissimilarity index for African Americans

data = adrio.fetch()

print(data[:10])

[0.5        0.5        0.21551724 0.11406844 0.8778626  0.62809917
 0.37078652 0.27702703 0.5        0.88292683]
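The standard dissimilarity index for a county is D = ½ Σᵢ |aᵢ/A − bᵢ/B|, where aᵢ and bᵢ are the two groups' populations in tract i and A and B are their county totals. D ranges from 0 (identical distributions across tracts) to 1 (complete separation). The sketch below implements that formula; which comparison group the ADRIO uses for b, and how it treats counties with no minority population, are assumptions not covered here.

```python
def dissimilarity_index(minority: list[int], other: list[int]) -> float:
    """Dissimilarity index from per-tract counts of two groups in one county."""
    if len(minority) != len(other):
        raise ValueError("both groups need one count per tract")
    total_minority = sum(minority)
    total_other = sum(other)
    if total_minority == 0 or total_other == 0:
        raise ValueError("index is undefined when a group has zero population")
    return 0.5 * sum(
        abs(a / total_minority - b / total_other)
        for a, b in zip(minority, other)
    )


print(dissimilarity_index([10, 0], [0, 10]))    # 1.0 (complete separation)
print(dissimilarity_index([30, 10], [20, 20]))  # 0.25
```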


### Maricopa County ADRIOs

The remaining cells demonstrate ADRIOs for the additional attributes used in the Maricopa County census block group (CBG) 2019 Geo, followed by a state-level Geo named pei.

In [2]:
from epymorph.adrio.adrio import GEOSpec, ADRIOSpec, serialize
from epymorph.adrio.census.adrio_census import Granularity

maricopa = GEOSpec('maricopa_cbg_2019',
                   Granularity.CBG.value,
                   # state 04 (Arizona), county 013 (Maricopa)
                   {'state': ['04'], 'county': ['013'], 'tract': ['*'], 'block group': ['*']},
                   ADRIOSpec('Name'),              # label ADRIO
                   [ADRIOSpec('GEOID'),
                    ADRIOSpec('Centroid'),
                    ADRIOSpec('MedianIncome'),
                    ADRIOSpec('TractMedianIncome'),
                    ADRIOSpec('Population'),
                    ADRIOSpec('PopulationByAge'),
                    ADRIOSpec('PopulationByAgex6'),
                    ADRIOSpec('PopDensityKm2'),
                    ADRIOSpec('MedianAge'),
                    ADRIOSpec('AverageHouseholdSize'),
                    ADRIOSpec('GiniIndex')],
                   2015)                           # data year

serialize(maricopa, f'epymorph/data/geo/{maricopa.id}.geo')

In [3]:
from epymorph.run import run

"""
toml file contents:

ipm = 'sirs'
mm = 'centroids'
geo = 'maricopa_cbg_2019'
start_date = '2010-01-01'
duration = '150d'

[init]
initializer = "single_location"
location = 0
seed_size = 100

[params]
beta = 0.4
gamma = 0.25 # 1/4
xi = 0.111 # 1/90
phi = 40.0
theta = 0.1
move_control = 0.9
infection_duration = 4.0
immunity_duration = 90.0
"""

exit_code = run(
    input_path="scratch/maricopa_adrio_test.toml",
    out_path=None,
    chart=None,
    profiling=False,
    engine_id=None
)

Loading requirements:
[✓] IPM (sirs)
[✓] MM (centroids)
[✓] Geo (maricopa_cbg_2019)
Fetching GEO data from ADRIOs...
Fetching name
Fetching geoid
Fetching centroid
Fetching median_income
Fetching tract_median_income
Fetching population
Fetching population_by_age
Fetching population_by_age_x6
Fetching pop_density_km2
Fetching median_age
Fetching average_household_size
Fetching gini_index
Gini Index cannot be retrieved for block group level, fetching tract level data instead.
...done

Running simulation (BasicEngine):
• 2010-01-01 to 2010-05-31 (150 days)
• 2505 geo nodes
|                    | 0% 

KeyboardInterrupt: 

In [5]:
from epymorph.adrio.census.tract_median_income import TractMedianIncome

adrio = TractMedianIncome(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[106818 106818 106818 132822 132822 132822  97188  97188  97188  76069]
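The repeated values above are consistent with each tract's median income being copied to every block group in that tract. A block group GEOID has 12 digits (2 state + 3 county + 6 tract + 1 block group), so its first 11 digits form the tract GEOID. The sketch below assumes this mapping; the function name, GEOIDs, and income values are hypothetical.

```python
def tract_to_block_groups(cbg_geoids: list[str], tract_values: dict[str, int]) -> list[int]:
    """Give each block group the value of the tract that contains it."""
    result = []
    for geoid in cbg_geoids:
        if len(geoid) != 12:
            raise ValueError(f"expected a 12-digit block group GEOID, got {geoid!r}")
        result.append(tract_values[geoid[:11]])  # first 11 digits = tract GEOID
    return result


cbgs = ["040130101011", "040130101012", "040130101021"]
tracts = {"04013010101": 50000, "04013010102": 60000}
print(tract_to_block_groups(cbgs, tracts))  # [50000, 50000, 60000]
```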


In [6]:
from epymorph.adrio.census.median_age import MedianAge

adrio = MedianAge(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[44 58 71 58 63 66 65 62 69 66]


In [7]:
from epymorph.adrio.census.average_household_size import AverageHouseholdSize

adrio = AverageHouseholdSize(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[2 2 2 2 2 1 2 1 2 2]


In [8]:
from epymorph.adrio.census.population_by_age_x6 import PopulationByAgex6

adrio = PopulationByAgex6(spec=maricopa)

data = adrio.fetch()

print(data[:10])

[[ 354  265  520  333   31   30]
 [ 529   58  454  305  432  399]
 [  78  145   73  195  354  493]
 [1475   40  237  366  209    0]
 [2218   17  271  316  515  346]
 [1492   26   62  453  511  514]
 [ 592  120  280  170  441  253]
 [ 642  105  236  292  602  158]
 [1991   16  199  132  243  452]
 [1489   13   32  143  152   88]]


In [9]:
from epymorph.adrio.census.gini_index import GiniIndex

adrio = GiniIndex(spec=maricopa)

data = adrio.fetch()

print(data[:10])

Gini Index cannot be retrieved for block group level, fetching tract level data instead.
[0.4226 0.4226 0.4226 0.5013 0.5013 0.5013 0.5136 0.5136 0.5136 0.4608]
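The message printed above shows the GiniIndex ADRIO falling back to tract-level data when block group data is unavailable. One way to express that decision is sketched below. The level ordering, the `resolve_level` name, and the use of Python's `warnings` module (where the ADRIO prints a message) are assumptions; after falling back, the tract values would still need to be repeated for each block group, which is consistent with the repeated values in the output above.

```python
import warnings

# Census geography levels ordered from coarsest to finest.
LEVELS = ["state", "county", "tract", "block group"]


def resolve_level(requested: str, finest_available: str) -> str:
    """Return the level to query, falling back to the finest one available."""
    if LEVELS.index(requested) <= LEVELS.index(finest_available):
        return requested
    warnings.warn(
        f"Data cannot be retrieved for {requested} level, "
        f"fetching {finest_available} level data instead."
    )
    return finest_available


print(resolve_level("county", "tract"))       # county
print(resolve_level("block group", "tract"))  # tract (after a warning)
```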


In [1]:
from epymorph.adrio.adrio import GEOSpec, ADRIOSpec, serialize
from epymorph.adrio.census.adrio_census import Granularity

pei = GEOSpec('pei',
              Granularity.STATE.value,
              # state FIPS codes: FL, GA, SC, NC, VA, MD
              {'state': ['12', '13', '45', '37', '51', '24'], 'county': ['*'], 'tract': ['*'], 'block group': ['*']},
              ADRIOSpec('Name'),                   # label ADRIO
              [ADRIOSpec('GEOID'),
               ADRIOSpec('Population'),
               ADRIOSpec('Commuters')],
              2015)                                # data year

serialize(pei, f'epymorph/data/geo/{pei.id}.geo')

In [3]:
from epymorph.run import run

"""
toml file contents:

ipm = 'pei'
mm = 'pei'
geo = 'pei'
start_date = '2010-01-01'
duration = '150d'

[init]
initializer = "single_location"
location = 0
seed_size = 100

[params]
beta = 0.4
gamma = 0.25 # 1/4
xi = 0.111 # 1/90
phi = 40.0
theta = 0.1
move_control = 0.9
infection_duration = 4.0
immunity_duration = 90.0
"""

exit_code = run(
    input_path="scratch/pei_adrio_test.toml",
    out_path=None,
    chart=None,
    profiling=False,
    engine_id=None
)

Loading requirements:
[✓] IPM (pei)
[✓] MM (pei)
[✓] Geo (pei)
Fetching GEO data from ADRIOs...
Fetching name
Fetching geoid
Fetching population
Fetching commuters
Commuting data cannot be retrieved for 2016, fetching 2015 data instead.


KeyboardInterrupt: 

### Caching
To avoid long waits for data retrieval every time the simulation runs, data fetched by ADRIOs is cached in CSV files in a .cache folder inside the adrio directory. Once data has been fetched and cached, later requests for it are read from the cache instead of being downloaded again, so they return almost immediately. A CLI command (the "cache" subcommand) also exists to fetch and cache data from ADRIOs without running the simulation, so users can pre-load or refresh their cache at any time. This is demonstrated below.

In [2]:
from epymorph.cache import cache_geo

# "cache" subcommand demo
exit_code = cache_geo(
    'us_sw_counties_2015',  # name of Geo
    True                    # -f arg: determines whether to overwrite existing data
)

Fetching GEO data from ADRIOs...
Fetching name_and_state
Fetching geoid
Fetching centroid
Fetching median_income
Fetching population
Fetching population_by_age
Fetching pop_density_km2
Fetching dissimilarity_index
...done
Data successfully cached
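The caching behavior described in this section can be sketched as a read-through cache: return the cached CSV contents if the file exists, otherwise fetch the data and write it, with a `force` flag playing the role of the -f argument. The `cached_fetch` name, the one-value-per-row layout, and storing values as strings are assumptions for this sketch, not the format of epymorph's .cache files.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(cache_dir: str, key: str, fetch: Callable[[], list],
                 force: bool = False) -> list[str]:
    """Return cached values for `key`, fetching and caching them if needed."""
    path = Path(cache_dir) / f"{key}.csv"
    if path.exists() and not force:
        with path.open(newline="") as f:
            return [row[0] for row in csv.reader(f)]
    values = [str(v) for v in fetch()]  # store as strings so reads and writes match
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows([v] for v in values)
    return values


calls = []


def fake_fetch() -> list[int]:
    calls.append(1)  # record each real (non-cached) fetch
    return [1512, 4699]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch(tmp, "population", fake_fetch)                  # fetches and writes
    second = cached_fetch(tmp, "population", fake_fetch)                 # served from the CSV
    refreshed = cached_fetch(tmp, "population", fake_fetch, force=True)  # like -f

print(second, len(calls))  # ['1512', '4699'] 2
```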
