# Enrich Local

If you have ArcGIS Pro with the Business Analyst extension and demographic data for at least one country installed, you can use the `get_countries` function and the `Country.enrich_variables` property to introspectively retrieve variables to use in the [`arcpy.ba.EnrichLayer` function](https://pro.arcgis.com/en/pro-app/latest/tool-reference/business-analyst/enrich-layer-advanced.htm).

## Imports and Setup

In [1]:
import json
from pathlib import Path

from arcgis.features import GeoAccessor
from arcgis.geometry import Polygon
from arcgis.geoenrichment import get_countries, Country  # specific geoenrichment imports
from arcgis.gis import GIS
import arcpy
import pandas as pd

In [2]:
# paths to common data locations - NOTE: to convert any path to a plain string, use str(path_instance)
dir_prj = Path.cwd().parent
dir_data = dir_prj/'data'
dir_raw = dir_data/'raw'
dir_int = dir_data/'interim'
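
As a quick illustration of the note in the cell above, the following sketch shows how `Path` objects compose with the `/` operator and convert to a plain string with `str()`, which helps with tools that expect string paths. The `example_project` directory name is hypothetical and only used for illustration.

```python
from pathlib import Path

# hypothetical project root, used only for illustration
dir_prj = Path('example_project')

# the / operator joins path segments without manual separators
dir_raw = dir_prj / 'data' / 'raw'

# str() gives a plain string for tools that do not accept Path objects
raw_str = str(dir_raw)

print(type(raw_str).__name__, dir_raw.parts)
```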

## Discover Available Countries

The first step is discovering which countries are available, since most demographic data is organized by country. The Python API 1.9.1 release added the `as_df` parameter, which returns the available countries as a Pandas DataFrame for easier discovery and filtering.

In [3]:
# create a GIS object referencing the local ArcGIS Pro instance
gis = GIS('Pro')

# use this ArcGIS Pro GIS instance as input
cntry_df = get_countries(gis, as_df=True)

cntry_df

Unnamed: 0,iso2,iso3,country_name,vintage,country_id,data_source_id
0,CA,CAN,Canada,2020,CAN_ESRI_2019,LOCAL;;CAN_ESRI_2019
1,JP,JPN,Japan,2020,JAPAN2020,LOCAL;;JAPAN2020
2,US,USA,United States,2019,USA_ESRI_2019,LOCAL;;USA_ESRI_2019
3,US,USA,United States,2020,USA_ESRI_2020,LOCAL;;USA_ESRI_2020
4,US,USA,United States,2021,USA_ESRI_2021,LOCAL;;USA_ESRI_2021
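
Because the result is a regular DataFrame, standard Pandas filtering applies. The sketch below rebuilds a few columns of the table above by hand, so it runs without ArcGIS Pro, and selects the most recent United States dataset. It only illustrates the filtering and is not part of the geoenrichment API.

```python
import pandas as pd

# rows copied from the get_countries output above (subset of columns)
cntry_df = pd.DataFrame({
    'iso3': ['CAN', 'JPN', 'USA', 'USA', 'USA'],
    'country_name': ['Canada', 'Japan', 'United States', 'United States', 'United States'],
    'vintage': [2020, 2020, 2019, 2020, 2021],
    'country_id': ['CAN_ESRI_2019', 'JAPAN2020', 'USA_ESRI_2019', 'USA_ESRI_2020', 'USA_ESRI_2021'],
})

# keep only the United States rows
usa_df = cntry_df[cntry_df['iso3'] == 'USA']

# pick the row with the most recent vintage
latest = usa_df.sort_values('vintage').iloc[-1]

print(latest['country_id'])
```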


### Create a Country Object

Next, just as in the first notebook, we create a country object for the United States.

In [4]:
# create a USA country object, again using the ArcGIS Pro GIS instance
usa = Country('USA', gis=gis)

usa

<Country - United States 2021 ('local')>

### Current Year Key Variables

As in the previous notebook, we can narrow the available variables to a subset: the current year key variables.

In [5]:
# retrieve the locally available variables
ev = usa.enrich_variables

# filter to current year key variables
kv = ev[
    (ev.name.str.endswith('CY'))
    & (ev.data_collection.str.lower().str.contains('key'))
].reset_index(drop=True)

kv

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,TOTPOP_CY,2021 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY
1,GQPOP_CY,2021 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY
2,DIVINDX_CY,2021 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY
3,TOTHH_CY,2021 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY
4,AVGHHSZ_CY,2021 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY
5,MEDHINC_CY,2021 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY
6,AVGHINC_CY,2021 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY
7,PCI_CY,2021 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY
8,TOTHU_CY,2021 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY
9,OWNER_CY,2021 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY
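
The same filtering pattern can narrow the list further. As a minimal sketch, assuming only income-related variables are of interest, the example below rebuilds a few rows of the table above by hand and filters on the human-readable `alias` column. The choice of the keyword `income` is an assumption made for illustration.

```python
import pandas as pd

# a few rows copied from the key variables table above
kv = pd.DataFrame({
    'name': ['TOTPOP_CY', 'TOTHH_CY', 'MEDHINC_CY', 'AVGHINC_CY', 'PCI_CY'],
    'alias': [
        '2021 Total Population',
        '2021 Total Households',
        '2021 Median Household Income',
        '2021 Average Household Income',
        '2021 Per Capita Income',
    ],
    'enrich_name': [
        'KeyUSFacts.TOTPOP_CY',
        'KeyUSFacts.TOTHH_CY',
        'KeyUSFacts.MEDHINC_CY',
        'KeyUSFacts.AVGHINC_CY',
        'KeyUSFacts.PCI_CY',
    ],
})

# case-insensitive substring match on the alias
income_kv = kv[kv['alias'].str.contains('income', case=False)].reset_index(drop=True)

print(list(income_kv['enrich_name']))
```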


## Load Data to Enrich

Now, before enriching, we need something to enrich. We are going to use [H3](https://h3geo.org/) level nine hexagon polygons, with their associated identifiers, covering Olympia, WA. The data is stored in a flat CSV file with the geometry saved as Esri JSON. When read with `pd.read_csv`, the geometry column is recognized as a Series of strings. To work with it, we need to convert each string to a proper polygon Geometry object and tell the GeoAccessor (`spatial`) to recognize the column (`set_geometry`).

In [6]:
h3_csv = dir_raw/'h3_olympia.csv'

# read in the data from a csv file with geometries
h3_df = pd.read_csv(h3_csv, index_col=0)

# because the geometry is stored as a JSON string, we need to convert it to a proper polygon geometry object
h3_df.SHAPE = h3_df.SHAPE.apply(lambda geom: Polygon(json.loads(geom)))

# once the geometries are created, we need to tell the GeoAccessor (spatial) to recognize them
h3_df.spatial.set_geometry('SHAPE')

print(h3_df.info())
h3_df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 625 entries, 0 to 624
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   h3_09   625 non-null    object  
 1   SHAPE   625 non-null    geometry
dtypes: geometry(1), object(1)
memory usage: 14.6+ KB
None


Unnamed: 0,h3_09,SHAPE
0,8928d590297ffff,"{""rings"": [[[-122.94740574999997, 47.024184694..."
1,8928d5902b3ffff,"{""rings"": [[[-122.95158429099996, 47.028134405..."
2,8928d591533ffff,"{""rings"": [[[-122.92062630199996, 47.028123088..."
3,8928d5916afffff,"{""rings"": [[[-122.87055727599994, 47.031318454..."
4,8928d591433ffff,"{""rings"": [[[-122.89294779499994, 47.020566091..."
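
To make the conversion step more concrete, the sketch below parses a single Esri JSON polygon string with `json.loads`, which is the dictionary that `Polygon()` receives in the cell above. The coordinates and the `spatialReference` entry are hypothetical, since the real values are truncated in the output shown here.

```python
import json

# hypothetical Esri JSON polygon string, shaped like the values in the SHAPE column
geom_str = (
    '{"rings": [[[-122.90, 47.02], [-122.90, 47.03], [-122.89, 47.03], '
    '[-122.89, 47.02], [-122.90, 47.02]]], "spatialReference": {"wkid": 4326}}'
)

# json.loads turns the string into a dictionary
geom_dict = json.loads(geom_str)

# the first (and here only) ring of the polygon
ring = geom_dict['rings'][0]

# a polygon ring is closed when its first and last vertices match
is_closed = ring[0] == ring[-1]

print(len(ring), is_closed)
```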


### Convert to a Feature Class

Since ArcGIS Pro geoprocessing tools cannot accept a Pandas DataFrame as input, we convert the Spatially Enabled DataFrame to a feature class in the `memory` workspace.

In [7]:
# convert this to a feature class in RAM so it can be used with ArcGIS Pro GeoProcessing tools
h3_fc = h3_df.spatial.to_featureclass('memory/h3_tmp')

h3_fc

'memory\\h3_tmp'

## Enrich

Now we can enrich the data with the demographic variables we identified using introspection and filtering above. We use the temporary feature class as input and write the output to _another_ temporary feature class. This lets us load the results back into a DataFrame, where it is easy to clean up the schema a little before saving the final result.

In [8]:
enrich_fc = arcpy.ba.EnrichLayer(
    in_features=h3_fc,                     # use the temporary feature class as input
    out_feature_class='memory/h3_cy_kv',   # store in memory so it can be manipulated before saving
    variables=list(kv.enrich_name)         # create a list from the enrich variable series
)[0]                                       # specifying the first item from the result object, which is the output

enrich_fc

'memory\\h3_cy_kv'

## Convert to a DataFrame and Clean Up the Schema

After enriching, we can load the results from the temporary feature class into a Pandas DataFrame to clean up the schema.

In [9]:
enrich_df = GeoAccessor.from_featureclass(enrich_fc)

print(enrich_df.info())
enrich_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 625 entries, 0 to 624
Data columns (total 25 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   OBJECTID               625 non-null    int64   
 1   h3_09                  625 non-null    object  
 2   HasData                625 non-null    int64   
 3   aggregationMethod      625 non-null    object  
 4   KeyUSFacts_TOTPOP_CY   625 non-null    float64 
 5   KeyUSFacts_GQPOP_CY    625 non-null    float64 
 6   KeyUSFacts_DIVINDX_CY  625 non-null    float64 
 7   KeyUSFacts_TOTHH_CY    625 non-null    float64 
 8   KeyUSFacts_AVGHHSZ_CY  625 non-null    float64 
 9   KeyUSFacts_MEDHINC_CY  625 non-null    float64 
 10  KeyUSFacts_AVGHINC_CY  625 non-null    float64 
 11  KeyUSFacts_PCI_CY      625 non-null    float64 
 12  KeyUSFacts_TOTHU_CY    625 non-null    float64 
 13  KeyUSFacts_OWNER_CY    625 non-null    float64 
 14  KeyUSFacts_RENTER_CY   625 non-null    flo

Unnamed: 0,OBJECTID,h3_09,HasData,aggregationMethod,KeyUSFacts_TOTPOP_CY,KeyUSFacts_GQPOP_CY,KeyUSFacts_DIVINDX_CY,KeyUSFacts_TOTHH_CY,KeyUSFacts_AVGHHSZ_CY,KeyUSFacts_MEDHINC_CY,...,KeyUSFacts_VACANT_CY,KeyUSFacts_MEDVAL_CY,KeyUSFacts_AVGVAL_CY,KeyUSFacts_POPGRW10CY,KeyUSFacts_HHGRW10CY,KeyUSFacts_FAMGRW10CY,KeyUSFacts_DPOP_CY,KeyUSFacts_DPOPWRK_CY,KeyUSFacts_DPOPRES_CY,SHAPE
0,1,8928d590297ffff,0,BlockApportionment:US.BlockGroups,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{""rings"": [[[-122.94740574999997, 47.024184694..."
1,2,8928d5902b3ffff,1,BlockApportionment:US.BlockGroups,382.0,0.0,31.1,160.0,2.39,100000.0,...,5.0,398913.0,431803.0,0.76,0.7,0.56,286.0,96.0,190.0,"{""rings"": [[[-122.95158429099996, 47.028134405..."
2,3,8928d591533ffff,0,BlockApportionment:US.BlockGroups,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{""rings"": [[[-122.92062630199996, 47.028123088..."
3,4,8928d5916afffff,1,BlockApportionment:US.BlockGroups,59.0,1.0,43.6,23.0,2.52,85043.0,...,0.0,342857.0,369444.0,0.31,0.4,0.62,64.0,37.0,27.0,"{""rings"": [[[-122.87055727599994, 47.031318454..."
4,5,8928d591433ffff,1,BlockApportionment:US.BlockGroups,154.0,0.0,31.4,60.0,2.57,85889.0,...,0.0,370000.0,425962.0,0.41,0.46,0.47,129.0,48.0,81.0,"{""rings"": [[[-122.89294779499994, 47.020566091..."


### Keep Only Needed Columns

By inspecting the inputs, we can work out which columns we want in the output and drop everything else.

In [10]:
drop_cols = [c for c in enrich_df.columns if not
    (c in kv.enrich_field_name.values           # enrich variables' output field names
     or c in h3_df.columns                      # input data column names
     or c.lower() == 'hasdata')                 # useful column to know if was enriched based on apportionment
]
enrich_df.drop(columns=drop_cols, inplace=True)

enrich_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 625 entries, 0 to 624
Data columns (total 23 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   h3_09                  625 non-null    object  
 1   HasData                625 non-null    int64   
 2   KeyUSFacts_TOTPOP_CY   625 non-null    float64 
 3   KeyUSFacts_GQPOP_CY    625 non-null    float64 
 4   KeyUSFacts_DIVINDX_CY  625 non-null    float64 
 5   KeyUSFacts_TOTHH_CY    625 non-null    float64 
 6   KeyUSFacts_AVGHHSZ_CY  625 non-null    float64 
 7   KeyUSFacts_MEDHINC_CY  625 non-null    float64 
 8   KeyUSFacts_AVGHINC_CY  625 non-null    float64 
 9   KeyUSFacts_PCI_CY      625 non-null    float64 
 10  KeyUSFacts_TOTHU_CY    625 non-null    float64 
 11  KeyUSFacts_OWNER_CY    625 non-null    float64 
 12  KeyUSFacts_RENTER_CY   625 non-null    float64 
 13  KeyUSFacts_VACANT_CY   625 non-null    float64 
 14  KeyUSFacts_MEDVAL_CY   625 non-null    flo
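
The selection logic can be tried in isolation on plain lists of column names. This is a minimal sketch using a shortened set of column names copied from the output above; the `keep` helper is a hypothetical name introduced only for this illustration.

```python
# column names copied from the enriched output above (shortened for illustration)
enrich_cols = ['OBJECTID', 'h3_09', 'HasData', 'aggregationMethod',
               'KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_MEDHINC_CY', 'SHAPE']

# output field names of the requested variables, and the input data columns
variable_fields = {'KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_MEDHINC_CY'}
input_cols = {'h3_09', 'SHAPE'}


def keep(col):
    """Return True for columns that should remain in the output (hypothetical helper)."""
    return col in variable_fields or col in input_cols or col.lower() == 'hasdata'


# everything that is not explicitly kept gets dropped
drop_cols = [c for c in enrich_cols if not keep(c)]

print(drop_cols)
```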

## Clean Up Column Names

We can also use a short helper function with a list comprehension to replace each enrich field name with just the variable name.

In [11]:
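def lookup_column(col_nm):
    """Return the variable name for an enrich field name, or the input unchanged if there is no match."""
    # exact comparison avoids accidental substring or regular expression matches
    lookup_fltr = ev.enrich_field_name == col_nm
    if lookup_fltr.any():
        col_nm = ev[lookup_fltr]['name'].iloc[0]
    return col_nm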

enrich_df.columns = [lookup_column(c) for c in enrich_df.columns]

print(enrich_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 625 entries, 0 to 624
Data columns (total 23 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   h3_09       625 non-null    object  
 1   HasData     625 non-null    int64   
 2   TOTPOP_CY   625 non-null    float64 
 3   GQPOP_CY    625 non-null    float64 
 4   DIVINDX_CY  625 non-null    float64 
 5   TOTHH_CY    625 non-null    float64 
 6   AVGHHSZ_CY  625 non-null    float64 
 7   MEDHINC_CY  625 non-null    float64 
 8   AVGHINC_CY  625 non-null    float64 
 9   PCI_CY      625 non-null    float64 
 10  TOTHU_CY    625 non-null    float64 
 11  OWNER_CY    625 non-null    float64 
 12  RENTER_CY   625 non-null    float64 
 13  VACANT_CY   625 non-null    float64 
 14  MEDVAL_CY   625 non-null    float64 
 15  AVGVAL_CY   625 non-null    float64 
 16  POPGRW10CY  625 non-null    float64 
 17  HHGRW10CY   625 non-null    float64 
 18  FAMGRW10CY  625 non-null    float64 
 19  DPOP_CY 
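
An alternative way to express the same renaming is a plain dictionary lookup, sketched below with a few field names copied from the tables above. In the notebook, such a mapping could presumably be built with `dict(zip(ev.enrich_field_name, ev.name))`; here it is written out by hand so the example runs on its own.

```python
# enrich field name to variable name pairs, copied from the enrich variables table above
field_to_name = {
    'KeyUSFacts_TOTPOP_CY': 'TOTPOP_CY',
    'KeyUSFacts_TOTHH_CY': 'TOTHH_CY',
    'KeyUSFacts_MEDHINC_CY': 'MEDHINC_CY',
}

# a shortened set of output columns for illustration
columns = ['h3_09', 'HasData', 'KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_MEDHINC_CY', 'SHAPE']

# dict.get falls back to the original name when there is no exact match
renamed = [field_to_name.get(c, c) for c in columns]

print(renamed)
```

Because the lookup is an exact key match, columns such as `h3_09` and `SHAPE` pass through unchanged.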

## Save Results

Finally, the results can be saved using the `pd.DataFrame.spatial.to_featureclass` method or, if desired, written back to a CSV file. For CSV output, the geometry objects must first be serialized to strings, which is done using the `Geometry.JSON` property.

In [12]:
# convert geometry objects to strings
enrich_df[enrich_df.spatial.name] = enrich_df[enrich_df.spatial.name].apply(lambda geom: geom.JSON)

# save to interim data directory for further analysis
dir_int.mkdir(parents=True, exist_ok=True)  # parents=True also creates the data directory if it is missing
enrich_df.to_csv(dir_int/'enrich_olympia_local.csv')
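
To show why serializing the geometry to a JSON string is enough for CSV storage, the sketch below writes one record to an in-memory CSV with the standard library and reads it back. The geometry coordinates are made up for illustration, and the standard `csv` module stands in for `DataFrame.to_csv` so the example runs without Pandas or ArcGIS.

```python
import csv
import io
import json

# hypothetical record: an H3 identifier from the data above with a made-up geometry
geom = {'rings': [[[-122.90, 47.02], [-122.90, 47.03], [-122.89, 47.02], [-122.90, 47.02]]]}
row = {'h3_09': '8928d590297ffff', 'SHAPE': json.dumps(geom)}

# write to an in-memory CSV; the writer quotes the JSON string because it contains commas
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['h3_09', 'SHAPE'])
writer.writeheader()
writer.writerow(row)

# read the record back and parse the geometry string into a dictionary again
buffer.seek(0)
restored = next(csv.DictReader(buffer))
restored_geom = json.loads(restored['SHAPE'])

print(restored_geom == geom)
```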