### Active Data Resource Interface Objects (ADRIOs) Phase 1

The ADRIO system is being developed to simplify assembling GEOs as part of the simulation workflow. Each ADRIO is derived from an abstract class and exists to interface with a specific resource and extract specific data from it, doing any necessary filtering and transformation along the way. Future implementations will use ADRIO templates for different data sources, but only a generalized outline exists for now. The ADRIOs created in this phase satisfy an example use case. That is:

I am a modeler working on a disease with human-to-human transmission, and I believe certain social determinants of health are a significant factor in disease outcomes. I want to use EpiMoRPH to model a metapopulation at the county level in the US Southwest region. My existing disease data, which I hope to use for parameter fitting, comes from 2015. I need the following data for each county, filtered to Arizona, New Mexico, Nevada, Utah, and Colorado:

- Each county’s name and state.
- Geographic centroid.
- Population in 2015, total as well as aggregated into age brackets: under 20, 20-to-64, and 65 and older.
- Population density.
- Median household annual income.
- The [Dissimilarity Index](https://www.phenxtoolkit.org/protocols/view/211403) of racial segregation for selected minority groups (currently African Americans) calculated from the census tracts comprising each county.
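
For readers unfamiliar with the last item, the sketch below shows the standard index of dissimilarity computed from per-tract counts for a single county. It is an illustration only, not the code inside the `DissimilarityIndex` ADRIO: the function name, the choice of reference group, and the handling of an empty group (returning NaN) are all assumptions.

```python
import numpy as np


def dissimilarity_index(minority_by_tract, reference_by_tract):
    """Index of dissimilarity: D = 0.5 * sum_i |m_i / M - r_i / R|.

    m_i and r_i are the minority and reference group counts in tract i;
    M and R are the county-wide totals for each group.
    """
    minority = np.asarray(minority_by_tract, dtype=float)
    reference = np.asarray(reference_by_tract, dtype=float)
    if minority.shape != reference.shape:
        raise ValueError("Both groups need one count per tract.")

    minority_total = minority.sum()
    reference_total = reference.sum()
    if minority_total == 0 or reference_total == 0:
        # Assumption: the index is treated as undefined when a group is absent.
        return float("nan")

    shares_gap = np.abs(minority / minority_total - reference / reference_total)
    return float(0.5 * shares_gap.sum())


# Two tracts: the minority group is concentrated in the first tract.
print(dissimilarity_index([30, 10], [10, 30]))
```

A value of 0 means both groups are spread across tracts in identical proportions, while 1 means the groups share no tracts at all.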


All data for this use case comes from Census tables and/or shapefiles. As such, a US Census API key is required to download much of it. These ADRIOs are configured to look for a key in an environment variable named "CENSUS_API_KEY". If you are running the code yourself, please make sure you have an [API key](https://api.census.gov/data/key_signup.html) and have assigned it to an environment variable with this name.
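
A quick way to confirm the key is visible to Python before running any ADRIO is sketched below. The helper `get_census_api_key` is hypothetical and not part of epymorph; only the variable name "CENSUS_API_KEY" comes from the text above.

```python
import os


def get_census_api_key(env_var="CENSUS_API_KEY"):
    """Return the Census API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No Census API key found. Set the {env_var} environment variable "
            "before fetching Census data."
        )
    return key
```

Calling `get_census_api_key()` in a fresh cell either returns the key or raises immediately, which is easier to diagnose than a failed download later on.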

The following six cells demonstrate the execution of each ADRIO. Each fetches one of the required data fields and returns it as a NumPy array.

In [1]:
from epymorph.adrio.uscounties.name_state import NameState

adrio = NameState() # County and state name for all counties

adrio.fetch(nodes=['04','08','49','35','32'])   # Parameters are the US state FIPS codes of the states relevant to the use case

array([('Apache County', 'Arizona'), ('Cochise County', 'Arizona'),
       ('Coconino County', 'Arizona'), ('Gila County', 'Arizona'),
       ('Graham County', 'Arizona'), ('Greenlee County', 'Arizona'),
       ('La Paz County', 'Arizona'), ('Maricopa County', 'Arizona'),
       ('Mohave County', 'Arizona'), ('Navajo County', 'Arizona'),
       ('Pima County', 'Arizona'), ('Pinal County', 'Arizona'),
       ('Santa Cruz County', 'Arizona'), ('Yavapai County', 'Arizona'),
       ('Yuma County', 'Arizona'), ('Adams County', 'Colorado'),
       ('Alamosa County', 'Colorado'), ('Arapahoe County', 'Colorado'),
       ('Archuleta County', 'Colorado'), ('Baca County', 'Colorado'),
       ('Bent County', 'Colorado'), ('Boulder County', 'Colorado'),
       ('Broomfield County', 'Colorado'), ('Chaffee County', 'Colorado'),
       ('Cheyenne County', 'Colorado'),
       ('Clear Creek County', 'Colorado'), ('Conejos County', 'Colorado'),
       ('Costilla County', 'Colorado'), ('Crowley County', '

In [2]:
from epymorph.adrio.uscounties.geographic_centroid import GeographicCentroid

adrio = GeographicCentroid() # Geographic centroid for all counties

adrio.fetch(nodes=['04','08','49','35','32'])

array([(-109.48884599933983, 35.39561636599861),
       (-109.75114065250403, 31.879565188531135),
       (-111.77051224094002, 35.83872946080435),
       (-110.81173660151235, 33.7997522507843),
       (-109.88743768095468, 32.93271630691242),
       (-109.24009746315183, 33.21535710506059),
       (-113.98133797066632, 33.729264046343694),
       (-112.49123306907562, 33.34883422171268),
       (-113.75790896737138, 35.70406048391267),
       (-110.32140292874823, 35.39953935366874),
       (-111.78994937703347, 32.097379013985574),
       (-111.34471387413656, 32.904309067925425),
       (-110.84657115896117, 31.526000975809886),
       (-112.55386387278789, 34.59988143368379),
       (-113.90556383569123, 32.76940433982402),
       (-104.33780076076727, 39.87364017150574),
       (-105.7882881969286, 37.572893140220934),
       (-104.33923329624285, 39.64977515970664),
       (-107.04832928059125, 37.19359631731574),
       (-102.56047061511788, 37.31918275625277),
       (-103.071

In [3]:
from epymorph.adrio.uscounties.pop_by_age import PopByAge

adrio = PopByAge()  # Total and aggregated population for each county

adrio.fetch(nodes=['04','08','49','35','32'])

array([[  72124,   94853,   39037,    9793],
       [ 129647,  160983,   71109,   25723],
       [ 136701,  170797,   86420,   15400],
       [  53165,   64698,   26653,   14018],
       [  37407,   48396,   21271,    4807],
       [   9023,   11708,    5183,    1125],
       [  20335,   24067,    8872,    7283],
       [4018143, 5089575, 2342120,  562347],
       [ 203362,  244365,  105003,   54455],
       [ 107656,  139693,   57385,   17347],
       [ 998537, 1235152,  571814,  176505],
       [ 389772,  491732,  213039,   71407],
       [  47073,   61376,   24843,    7493],
       [ 215996,  256932,  109803,   60913],
       [ 202987,  260257,  108575,   35287],
       [ 471206,  608531,  283642,   47230],
       [  16269,   20907,    9362,    2179],
       [ 608310,  766163,  370490,   74167],
       [  12174,   14507,    6792,    2936],
       [   3701,    4538,    1844,     883],
       [   5895,    6855,    3948,     907],
       [ 310032,  380479,  198693,   37825],
       [  

In [4]:
from epymorph.adrio.uscounties.median_income import MedianIncome

adrio = MedianIncome()  # Median income for each county

adrio.fetch(nodes=['04','08','49','35','32'])

array([ 31757,  45075,  50234,  39751,  45964,  51628,  34466,  54229,
        38488,  35921,  46162,  49477,  40140,  44748,  40743,  58946,
        32395,  63265,  46646,  38000,  36791,  70961,  81898,  51092,
        52554,  67710,  36652,  31321,  31151,  35000,  42452,  53637,
        31875, 102964,  72214,  84963,  58206,  40423,  56590,  65670,
        63628,  48071,  57083,  31715,  46014,  70164,  40304,  40603,
        45367,  60278,  59805,  45067,  39005,  42319,  49322,  48125,
        51387,  43553,  43999,  48450,  32311,  61624,  56969,  43639,
        71196,  40179,  41286,  61842,  39672,  64963,  33393,  36324,
        56047,  44191,  67983,  62372,  45541,  60572,  43105,  47415,
        51575,  58535,  71799,  39271,  60250,  65212,  78190,  44866,
        47255,  38923,  41712,  45230,  64832,  52870,  57122,  47668,
        47725,  42973,  40630,  34565,  32380,  41084,  32500,  38853,
        56618,  38311,  30772,  33393,  34444,  57533,  40708, 101934,
      

In [4]:
from epymorph.adrio.uscounties.population_density import PopulationDensity

adrio = PopulationDensity() # Population density for each county

adrio.fetch(nodes=['04','08','49','35','32'])

array([5.28692100e+04, 2.23520820e+05, 1.04520450e+05, 2.50154400e+04,
       2.41477960e+05, 7.40999200e+04, 6.19574500e+04, 6.78552980e+05,
       6.89592800e+04, 8.44467800e+04, 4.15889960e+05, 6.88951780e+05,
       2.83511400e+04, 8.96778900e+04, 6.24446730e+05, 5.40825500e+04,
       1.49218900e+05, 1.11238200e+04, 4.39653500e+04, 1.31700320e+05,
       1.04421790e+05, 3.22776600e+04, 1.87579530e+05, 3.93443200e+04,
       5.55578600e+04, 1.52356900e+04, 5.12514400e+04, 1.94867900e+04,
       7.90406000e+03, 2.20520800e+04, 1.78802700e+04, 1.49174310e+06,
       4.50612898e+06, 1.98392858e+07, 1.73629883e+06, 4.24518437e+06,
       7.46586360e+05, 1.95004130e+05, 1.35006270e+05, 1.68861500e+05,
       5.85669300e+04, 2.24926410e+05, 4.20602400e+04, 1.08895030e+05,
       7.79573900e+04, 3.82759500e+04, 4.39154070e+05, 1.83107673e+06,
       1.29004108e+06, 9.26154021e+06, 1.00641107e+06, 2.90530600e+05,
       3.81978160e+05, 7.09460150e+05, 4.45330320e+05, 1.14857162e+06,
      

In [1]:
from epymorph.adrio.uscounties.dissimilarity_index import DissimilarityIndex

adrio = DissimilarityIndex()    # Dissimilarity index for African Americans for each county

adrio.fetch(nodes=['04','08','49','35','32'])

array([0.68003176, 0.44684799, 0.40285892, 0.69179817, 0.52144034,
       0.24977493, 0.88432116, 0.40776111, 0.53031332, 0.6586828 ,
       0.37770735, 0.36856729, 0.55784034, 0.43801342, 0.4827098 ,
       0.49315278, 0.28441274, 0.44235339, 0.0719905 , 0.43002257,
       0.5       , 0.41866453, 0.32166128, 0.68511961, 0.5       ,
       0.26354931, 0.02506282, 0.5       , 0.5       , 0.5       ,
       0.64242635, 0.5015027 , 0.5       , 0.37743064, 0.62734804,
       0.20978738, 0.35290738, 0.75390631, 0.51001692, 0.5       ,
       0.48713279, 0.4387428 , 0.5       , 0.59486448, 0.5       ,
       0.44625259, 0.5       , 0.31192687, 0.30592824, 0.41344022,
       0.45581216, 0.3497871 , 0.38196315, 0.46731988, 0.4276926 ,
       0.5       , 0.45848437, 0.61679691, 0.38568168, 0.34913581,
       0.35886043, 0.5       , 0.5       , 0.19352399, 0.44769519,
       0.26031307, 0.45319754, 0.16109129, 0.2579738 , 0.57603175,
       0.43745428, 0.5       , 0.65620755, 0.5       , 0.48899
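
All six ADRIOs above follow the same calling pattern: construct the object, then call `fetch` with a list of state FIPS codes to get an array back. The sketch below illustrates that pattern with an abstract base class and a toy subclass. Both class names, the `attribute` field, the in-memory table, and its values are assumptions made for illustration; this is not epymorph's actual class hierarchy, and the real ADRIOs query the Census instead.

```python
from abc import ABC, abstractmethod
from typing import Sequence

import numpy as np


class ExampleADRIO(ABC):
    """Hypothetical stand-in for an ADRIO base class (not epymorph's own)."""

    attribute: str  # name under which the fetched data would be stored

    @abstractmethod
    def fetch(self, nodes: Sequence[str]) -> np.ndarray:
        """Return one value (or row) per county in the requested states."""


class ExampleStaticADRIO(ExampleADRIO):
    """Toy ADRIO backed by an in-memory table instead of the Census API."""

    attribute = "example_value"

    # (county FIPS, value) pairs; the values are made up for illustration.
    _ROWS = [("04001", 1.0), ("04003", 2.0), ("08001", 3.0), ("32001", 4.0)]

    def fetch(self, nodes: Sequence[str]) -> np.ndarray:
        wanted = set(nodes)
        # The first two digits of a county FIPS code identify its state.
        values = [value for fips, value in self._ROWS if fips[:2] in wanted]
        return np.array(values)


result = ExampleStaticADRIO().fetch(nodes=["04", "08"])
print(result)
```

Keeping the filtering inside `fetch` means every ADRIO can be driven with the same list of state codes, which is what makes assembling them into a single GEO straightforward.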

### GEO Assembly

Fetching data from individual ADRIOs is great, but we also need a more streamlined way to create a GEO object using data from a whole set of ADRIOs. That's where GEOSpecs come in. A GEOSpec is an object containing a list of geographic nodes and a list of ADRIOSpec objects, each of which simply specifies an ADRIO to call by its class name.

A simple function exists to create a GEOSpec by passing in an ID, a list of geographic nodes, and a list of ADRIO class names. Once created, the GEOSpec is serialized to a file named after the provided ID (i.e., {ID}.jsonpickle).

In [5]:
from epymorph.adrio.adrio_driver import create_geo_spec

create_geo_spec('UseCase', ['04','08','49','35','32'], 
                ['NameState', 'GeographicCentroid', 'MedianIncome', 'PopByAge', 'PopulationDensity', 'DissimilarityIndex'])
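
To make the idea of a serialized spec concrete, here is a minimal sketch of a spec object written to disk and read back. It is not the implementation behind `create_geo_spec`: the class names, the field names, and the helpers `save_spec` and `load_spec` are hypothetical, and the standard-library `json` module stands in for jsonpickle so the example has no extra dependencies.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class ExampleADRIOSpec:
    class_name: str  # which ADRIO class to instantiate later


@dataclass
class ExampleGEOSpec:
    id: str
    nodes: List[str]  # state FIPS codes
    adrios: List[ExampleADRIOSpec]


def save_spec(spec, directory):
    """Write the spec to <directory>/<id>.json and return the path."""
    path = Path(directory) / f"{spec.id}.json"
    path.write_text(json.dumps(asdict(spec), indent=2))
    return path


def load_spec(path):
    """Rebuild a spec object from a file written by save_spec."""
    raw = json.loads(Path(path).read_text())
    adrios = [ExampleADRIOSpec(**entry) for entry in raw["adrios"]]
    return ExampleGEOSpec(id=raw["id"], nodes=raw["nodes"], adrios=adrios)


spec = ExampleGEOSpec(
    id="UseCase",
    nodes=["04", "08", "49", "35", "32"],
    adrios=[ExampleADRIOSpec(name) for name in ["NameState", "MedianIncome"]],
)

with tempfile.TemporaryDirectory() as tmp:
    spec_path = save_spec(spec, tmp)
    restored = load_spec(spec_path)

print(restored == spec)
```

Storing only class names and node codes keeps the spec small and human-readable; the data itself is fetched when the GEO is built.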

Once a GEOSpec has been created, its serialized file can be passed to the following function to create a GEO. GEO objects contain three fields: nodes (the number of nodes), labels (one per node), and data (organized into a dictionary). For county-level data, the nodes field counts the total number of counties across all requested states, and each label takes the form "county name, state name". The way these fields are populated is not yet generalized and requires the "NameState" ADRIO to run. Aside from that, any combination of ADRIOs and state FIPS codes can be used. The data dictionary maps a key named after each ADRIO's attribute to that ADRIO's output array.
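
The assembly step described above can be sketched as follows. This is a simplified illustration, not the code inside `create_geo`: the `ExampleGeo` class, the `assemble_geo` helper, and the key name `median_income` are assumptions, and the inputs are just the first three entries of the `NameState` and `MedianIncome` outputs shown earlier.

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np


@dataclass
class ExampleGeo:
    nodes: int              # number of counties
    labels: List[str]       # "county name, state name" per county
    data: Dict[str, np.ndarray]


def assemble_geo(name_state, outputs):
    """Build labels from (county, state) pairs and attach each ADRIO's output."""
    labels = [f"{county}, {state}" for county, state in name_state]
    for key, values in outputs.items():
        # Every ADRIO output needs exactly one entry per county.
        if len(values) != len(labels):
            raise ValueError(f"'{key}' has {len(values)} rows for {len(labels)} nodes")
    return ExampleGeo(nodes=len(labels), labels=labels, data=dict(outputs))


name_state = [
    ("Apache County", "Arizona"),
    ("Cochise County", "Arizona"),
    ("Coconino County", "Arizona"),
]
geo = assemble_geo(name_state, {"median_income": np.array([31757, 45075, 50234])})
print(geo.nodes, geo.labels)
```

Checking row counts at assembly time catches the case where one ADRIO returns data for a different set of counties than the others.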

An example GEO for the use case (and its massive amount of data) can be seen below.

In [2]:
from epymorph.adrio.adrio_driver import create_geo

create_geo('UseCase.jsonpickle')

Geo(nodes=158, labels=['Apache County, Arizona', 'Cochise County, Arizona', 'Coconino County, Arizona', 'Gila County, Arizona', 'Graham County, Arizona', 'Greenlee County, Arizona', 'La Paz County, Arizona', 'Maricopa County, Arizona', 'Mohave County, Arizona', 'Navajo County, Arizona', 'Pima County, Arizona', 'Pinal County, Arizona', 'Santa Cruz County, Arizona', 'Yavapai County, Arizona', 'Yuma County, Arizona', 'Adams County, Colorado', 'Alamosa County, Colorado', 'Arapahoe County, Colorado', 'Archuleta County, Colorado', 'Baca County, Colorado', 'Bent County, Colorado', 'Boulder County, Colorado', 'Broomfield County, Colorado', 'Chaffee County, Colorado', 'Cheyenne County, Colorado', 'Clear Creek County, Colorado', 'Conejos County, Colorado', 'Costilla County, Colorado', 'Crowley County, Colorado', 'Custer County, Colorado', 'Delta County, Colorado', 'Denver County, Colorado', 'Dolores County, Colorado', 'Douglas County, Colorado', 'Eagle County, Colorado', 'Elbert County, Colorado