# Enrich Local

If you have ArcGIS Pro with the Business Analyst extension and demographic data for at least one country installed, you can use the `get_countries` function and the `Country.enrich_variables` property to introspectively retrieve variables to use in the [`arcpy.ba.EnrichLayer` function](https://pro.arcgis.com/en/pro-app/latest/tool-reference/business-analyst/enrich-layer-advanced.htm).

## Imports and Setup

In [None]:
import json
from pathlib import Path

from arcgis.features import GeoAccessor
from arcgis.geometry import Polygon
from arcgis.geoenrichment import get_countries, Country  # specific geoenrichment imports
from arcgis.gis import GIS
import arcpy
import pandas as pd

In [None]:
# paths to common data locations - NOTE: to convert any path to a plain string, use str(path_instance)
dir_prj = Path.cwd().parent
dir_data = dir_prj/'data'
dir_raw = dir_data/'raw'
dir_int = dir_data/'interim'

## Discover Available Countries

The first step is discovering available countries, since most demographic data is organized by country. The Python API 1.9.1 release added the `as_df` parameter, which returns the available countries as a Pandas Dataframe for easy browsing and filtering.

In [None]:
# create a GIS object referencing the local ArcGIS Pro instance
gis = GIS('Pro')

# use this ArcGIS Pro GIS instance as input
cntry_df = get_countries(gis, as_df=True)

cntry_df

### Create a Country Object

Next, just as in the first notebook, we create a country object for the United States.

In [None]:
# create a USA country object, again using the ArcGIS Pro GIS instance
usa = Country('USA', gis=gis)

usa

### Current Year Key Variables

Again, we can identify a subset of variables to use: the current year key variables, whose names end in `CY` and whose data collection name contains `key`.

In [None]:
# retrieve the locally available variables
ev = usa.enrich_variables

# filter to current year key variables
kv = ev[
    (ev.name.str.endswith('CY'))
    & (ev.data_collection.str.lower().str.contains('key'))
].reset_index(drop=True)

kv
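
To see what the filter selects without a local Business Analyst install, the same two conditions can be applied to a few hand-written records in plain Python. This is only a sketch: the records below are hypothetical stand-ins for rows of `usa.enrich_variables`, and the helper `is_current_year_key` is not part of any library.

```python
# Hypothetical records standing in for rows of usa.enrich_variables;
# the names and collections are illustrative, not read from a real install.
variables = [
    {"name": "TOTPOP_CY", "data_collection": "KeyUSFacts"},
    {"name": "TOTPOP_FY", "data_collection": "KeyUSFacts"},
    {"name": "MEDAGE_CY", "data_collection": "5yearincrements"},
    {"name": "AVGHHSZ_CY", "data_collection": "KeyUSFacts"},
]


def is_current_year_key(var):
    """True when the name ends in 'CY' and the collection mentions 'key'."""
    return var["name"].endswith("CY") and "key" in var["data_collection"].lower()


# both conditions must hold, mirroring the `&` in the Dataframe filter
key_current = [v["name"] for v in variables if is_current_year_key(v)]
print(key_current)
```

A record is kept only when both conditions hold, so a name with a different suffix or a collection without `key` in its name is filtered out.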

## Load Data to Enrich

Now, before enriching, we need something to enrich. We are going to use [H3](https://h3geo.org/) level nine hexagon polygons, each with its associated identifier, covering Olympia, WA. The data is stored in a flat CSV file with the geometry saved as Esri JSON. When read with `pd.read_csv`, the geometry column comes back as a Series of strings. To work with it, we need to convert each string to a proper `Polygon` geometry object and tell the GeoAccessor (`spatial`) to recognize the column (`set_geometry`).

In [None]:
h3_csv = dir_raw/'h3_olympia.csv'

# read in the data from a csv file with geometries
h3_df = pd.read_csv(h3_csv, index_col=0)

# because the geometry is stored as a JSON string, need to convert to geometry
h3_df.SHAPE = h3_df.SHAPE.apply(lambda geom: Polygon(json.loads(geom)))

# once the geometries are created, we need to tell the GeoAccessor (spatial) to recognize them
h3_df.spatial.set_geometry('SHAPE')

print(h3_df.info())
h3_df.head()
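
The string-to-geometry step can be illustrated with the standard library alone. The sketch below shows what `json.loads` produces from one cell of the geometry column, which is the dictionary the notebook then passes to `Polygon`. The geometry string is hypothetical: the coordinates are made up and do not describe a real H3 hexagon.

```python
import json

# Hypothetical cell value: one polygon serialized as an Esri JSON string.
# The coordinates are made up and are not a real H3 hexagon.
geom_str = (
    '{"rings": [[[-122.90, 47.04], [-122.89, 47.04], '
    '[-122.89, 47.05], [-122.90, 47.04]]], '
    '"spatialReference": {"wkid": 4326}}'
)

# json.loads turns the string into a plain dictionary
geom = json.loads(geom_str)
ring = geom["rings"][0]

print(type(geom_str).__name__, "->", type(geom).__name__)
print("vertices:", len(ring), "closed:", ring[0] == ring[-1])
print("wkid:", geom["spatialReference"]["wkid"])
```

Until this conversion happens, the column holds only text, so no spatial operation can be applied to it.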

### Convert to a Feature Class

Since ArcGIS Pro geoprocessing tools cannot accept a Pandas Dataframe as input, we convert the Spatially Enabled Dataframe to a feature class in the `memory` workspace.

In [None]:
# convert this to a feature class in RAM so it can be used with ArcGIS Pro GeoProcessing tools
h3_fc = h3_df.spatial.to_featureclass('memory/h3_tmp')

h3_fc

## Enrich

Now we can enrich the data with the demographic variables we identified through introspection and filtering above. We use the temporary feature class as input and write the output to _another_ temporary feature class. This lets us load the results back into a Dataframe, where it is easy to clean up the schema before saving the final result.

In [None]:
enrich_fc = arcpy.ba.EnrichLayer(
    in_features=h3_fc,                     # use temporary feature class as input
    out_feature_class='memory/h3_cy_kv',   # store in memory so can manipulate
    variables=list(kv.enrich_name)         # create list from series
)[0]                                       # first item from result object

enrich_fc

## Convert to a Dataframe for Schema Cleanup

After enriching, we can load the results from the temporary feature class into a Pandas Dataframe to clean up the schema.

In [None]:
enrich_df = GeoAccessor.from_featureclass(enrich_fc)

print(enrich_df.info())
enrich_df.head()

### Keep Only Needed Columns

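By inspecting the inputs, we can build a list of the columns to drop, keeping only the enrich variables' output fields, the original input columns, and the `HasData` flag (matched case-insensitively).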

In [None]:
drop_cols = [c for c in enrich_df.columns if not
    (c in kv.enrich_field_name.values           # enrich variables' output field names
     or c in h3_df.columns                      # input data column names
     or c.lower() == 'hasdata')                 # if row received demographic factors
]
enrich_df.drop(columns=drop_cols, inplace=True)

enrich_df.info()
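
The keep-or-drop rule can be checked on plain lists before running it against real output. In this sketch every column name is hypothetical; the actual names depend on the input data and on the variables passed to the enrich tool.

```python
# Hypothetical column names; the real ones depend on the input data and
# on the variables passed to the enrich tool.
enriched_cols = ["OBJECTID", "h3_id", "HasData", "aggregationMethod",
                 "KeyUSFacts_TOTPOP_CY", "SHAPE"]
input_cols = ["h3_id", "SHAPE"]
enrich_field_names = ["KeyUSFacts_TOTPOP_CY"]


def keep(col):
    """Keep enrich output fields, original input columns, and the HasData flag."""
    return (
        col in enrich_field_names
        or col in input_cols
        or col.lower() == "hasdata"
    )


drop_cols = [c for c in enriched_cols if not keep(c)]
kept_cols = [c for c in enriched_cols if keep(c)]

print("drop:", drop_cols)
print("keep:", kept_cols)
```

Writing the rule as "keep" and negating it, as the notebook does, means any unexpected bookkeeping column added by the tool is dropped by default.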

## Clean up Column Names

We can also use a short helper function with a list comprehension to replace each enrich output field name with just the variable name.

In [None]:
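def lookup_column(col_nm):
    """Return the variable name for an enrich output field, or the input unchanged."""
    # flag enrich variables whose output field name contains this column name
    lookup_fltr = ev.enrich_field_name.str.contains(col_nm)
    if any(lookup_fltr):
        # use the name of the first matching variable
        col_nm = ev[lookup_fltr]['name'].iloc[0]
    return col_nm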

enrich_df.columns = [lookup_column(c) for c in enrich_df.columns]

print(enrich_df.info())
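
Because the helper matches on substrings, an input column whose name happens to appear inside an output field name would also be renamed. The sketch below contrasts substring matching with an exact-match dictionary lookup on plain Python data. The field names, the `POP` column, and both helper functions are hypothetical; whether exact matching suits real output depends on how the tool names its fields.

```python
# Hypothetical mapping of enrich output field names to variable names.
field_to_name = {
    "KeyUSFacts_TOTPOP_CY": "TOTPOP_CY",
    "KeyUSFacts_TOTHH_CY": "TOTHH_CY",
}


def lookup_exact(col_nm):
    """Return the variable name for an exact field-name match, else the input."""
    return field_to_name.get(col_nm, col_nm)


def lookup_substring(col_nm):
    """Mimic substring matching: the first field name containing col_nm wins."""
    for field, name in field_to_name.items():
        if col_nm in field:
            return name
    return col_nm


# "POP" is an imagined input column that is not an enrich output field
columns = ["h3_id", "POP", "KeyUSFacts_TOTHH_CY"]

exact = [lookup_exact(c) for c in columns]
substring = [lookup_substring(c) for c in columns]

print("exact:    ", exact)
print("substring:", substring)
```

With exact matching the `POP` column keeps its name, while substring matching renames it to the first variable whose field name contains `POP`.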

## Save Results

Finally, the results can be saved using the Dataframe's `spatial.to_featureclass` method or, if desired, written back to a CSV file. For CSV output, the geometry objects must first be serialized back to strings. This is accomplished using the `Geometry.JSON` property.

In [None]:
# convert geometry objects to strings
enrich_df[enrich_df.spatial.name] = enrich_df[enrich_df.spatial.name].apply(lambda geom: geom.JSON)

# save to interim data directory for further analysis
dir_int.mkdir(parents=True, exist_ok=True)
enrich_df.to_csv(dir_int/'enrich_olympia_local.csv')
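
The reason geometry can travel through a CSV file at all is that the CSV writer quotes the commas and double quotes inside the JSON string. The sketch below shows the round trip using only the standard library `csv`, `io`, and `json` modules as stand-ins for `to_csv`, `read_csv`, and `Geometry.JSON`; the geometry and the `hex_a` identifier are made up.

```python
import csv
import io
import json

# Hypothetical geometry dictionary in the Esri JSON layout.
geom = {
    "rings": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
    "spatialReference": {"wkid": 4326},
}

# write one row to an in-memory CSV, serializing the geometry to a string
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["h3_id", "SHAPE"])
writer.writerow(["hex_a", json.dumps(geom)])

# read it back and parse the geometry string into a dictionary again
buffer.seek(0)
rows = list(csv.DictReader(buffer))
restored = json.loads(rows[0]["SHAPE"])

print("round trip preserved geometry:", restored == geom)
```

The restored value is a dictionary again, which is the same starting point used when loading the input CSV earlier in the notebook.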