# Example - Enrich from Previously Enriched Data

Because the available enrichment variables can be retrieved as a Pandas DataFrame, we can look up the variables behind the columns of previously enriched data and reuse them. In this case, the previously enriched data is saved as a comma-separated values (CSV) file.

**NOTE:** This example uses a local instance of ArcGIS Pro with the Business Analyst extension and the USA dataset. To run this example, you will need both.

## Imports

Of particular note are the `arcgis.geoenrichment` imports.

In [1]:
import importlib
import json
import os
from pathlib import Path
import sys

from arcgis.features import GeoAccessor, FeatureLayer
from arcgis.geoenrichment import get_countries, Country  # geoenrichment imports
from arcgis.geometry import Geometry
from arcgis.geometry.filters import contains
from arcgis.gis import GIS
import arcpy
import pandas as pd

## Initial Setup

This creates a few `Path` objects pointing to where the data is stored, and ensures we can overwrite previous results.

In [2]:
# paths to common data locations - NOTE: to convert any path to a plain string, simply use str(path_instance)
dir_prj = Path.cwd().parent
dir_data = dir_prj/'data'
dir_raw = dir_data/'raw'
dir_int = dir_data/'interim'

# ensure multiple runs do not cause problems
arcpy.env.overwriteOutput = True
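
The same directory layout can be sketched without ArcGIS installed. The helper name `project_paths` below is hypothetical and not part of the original notebook. It only illustrates that `Path` objects compose with `/` and that `str()` gives the plain string needed by tools that do not accept `Path` objects.

```python
from pathlib import Path


def project_paths(project_dir: Path) -> dict:
    """Build the data directory layout used in this example (hypothetical helper)."""
    dir_data = project_dir / 'data'
    return {
        'data': dir_data,
        'raw': dir_data / 'raw',
        'interim': dir_data / 'interim',
    }


# same starting point as the cell above: the parent of the working directory
paths = project_paths(Path.cwd().parent)

# str() yields a plain string for tools that expect string paths
raw_csv = str(paths['raw'] / 'previously_enriched.csv')
print(raw_csv)
```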

## Load Previously Enriched Data

This data was created by saving the results of GeoEnrichment to a comma-separated values file. For this workflow, we do not really need the geometry (the `SHAPE` column). However, I included the steps required to correctly recognize this column as geometry, for reference, in case you want to reuse this pattern.

In [3]:
template_df = pd.read_csv(dir_raw/'previously_enriched.csv', index_col=0)

# optional for this workflow, but necessary if you want to do something else with the geometry
template_df['SHAPE'] = template_df['SHAPE'].apply(lambda geom: Geometry(json.loads(geom)))
template_df.spatial.set_geometry('SHAPE')
# alternative: load directly from a feature class (raw string keeps the backslashes literal)
# GeoAccessor.from_featureclass(r'C:\path\to\feature\class')

template_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 36 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   geoid                  5 non-null      int64   
 1   HasData                5 non-null      int64   
 2   aggregationMethod      5 non-null      object  
 3   KeyUSFacts_TOTPOP_CY   5 non-null      float64 
 4   KeyUSFacts_GQPOP_CY    5 non-null      float64 
 5   KeyUSFacts_DIVINDX_CY  5 non-null      float64 
 6   KeyUSFacts_TOTHH_CY    5 non-null      float64 
 7   KeyUSFacts_AVGHHSZ_CY  5 non-null      float64 
 8   KeyUSFacts_MEDHINC_CY  5 non-null      float64 
 9   KeyUSFacts_AVGHINC_CY  5 non-null      float64 
 10  KeyUSFacts_PCI_CY      5 non-null      float64 
 11  KeyUSFacts_TOTHU_CY    5 non-null      float64 
 12  KeyUSFacts_OWNER_CY    5 non-null      float64 
 13  KeyUSFacts_RENTER_CY   5 non-null      float64 
 14  KeyUSFacts_VACANT_CY   5 non-null      float64
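
To see why the `json.loads` step is needed, it may help to round-trip a geometry through CSV using only the standard library. This is a minimal sketch under stated assumptions: the polygon below is a made-up, simplified Esri JSON dictionary, not a record from the example data, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io
import json

# hypothetical, simplified polygon in Esri JSON form (not from the example data)
geom = {'rings': [[[0, 0], [0, 1], [1, 1], [0, 0]]], 'spatialReference': {'wkid': 4326}}

# write one row to an in-memory CSV, storing the geometry as a JSON string
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['geoid', 'SHAPE'])
writer.writerow(['410510064033', json.dumps(geom)])

# read it back: the SHAPE value arrives as text and must be parsed before use
buffer.seek(0)
row = next(csv.DictReader(buffer))
parsed = json.loads(row['SHAPE'])

print(type(row['SHAPE']).__name__, '->', type(parsed).__name__)  # str -> dict
```

In the notebook cell, a dictionary parsed this way is what gets passed to `Geometry`.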

## Create a Country Object Instance

Next, we create a `Country` object instance to work with. We instruct the `Country` to use the local source by providing a `GIS` instance created using the `pro` keyword. This requires ArcGIS Pro to have the Business Analyst extension licensed and the USA dataset installed.

In [4]:
gis = GIS('pro')
usa = Country('USA', gis=gis)

usa

<Country - United States 2021 ('local')>

## Retrieve Geoenrichment Variables

Retrieve the enrichment variables for ArcGIS Pro as a Pandas DataFrame. These columns support a large variety of data selection and matching options.

column | purpose
--- | ---
name | identifies each data variable; the same variable can appear more than once when it is included in multiple data collections; used for online enrichment
alias | brief, more readable variable description
data_collection | groupings of variables for a variety of purposes
enrich_name | string used for performing geoenrichment using the Enrich Layer geoprocessing tool
enrich_field_name | name of enrich output fields when using the Enrich Layer tool

In [5]:
ev = usa.enrich_variables

ev.head()

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,CHILD_CY,2021 Child Population,AgeDependency,AgeDependency.CHILD_CY,AgeDependency_CHILD_CY
1,WORKAGE_CY,2021 Working-Age Population,AgeDependency,AgeDependency.WORKAGE_CY,AgeDependency_WORKAGE_CY
2,SENIOR_CY,2021 Senior Population,AgeDependency,AgeDependency.SENIOR_CY,AgeDependency_SENIOR_CY
3,CHLDDEP_CY,2021 Child Dependency Ratio,AgeDependency,AgeDependency.CHLDDEP_CY,AgeDependency_CHLDDEP_CY
4,AGEDEP_CY,2021 Age Dependency Ratio,AgeDependency,AgeDependency.AGEDEP_CY,AgeDependency_AGEDEP_CY
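
In the rows shown above, `enrich_name` and `enrich_field_name` both appear to be built from `data_collection` and `name`, joined with a period and an underscore respectively. The sketch below encodes that observation. The helper `enrich_names` is hypothetical, and the pattern is inferred from this output rather than taken from a documented guarantee, so treat it as a reading aid and keep using the DataFrame columns in real workflows.

```python
def enrich_names(data_collection: str, name: str) -> tuple:
    """Return (enrich_name, enrich_field_name) following the pattern observed in the output.

    Hypothetical helper; the pattern is an observation, not a documented rule.
    """
    return f'{data_collection}.{name}', f'{data_collection}_{name}'


enrich_name, enrich_field_name = enrich_names('AgeDependency', 'CHILD_CY')
print(enrich_name)        # AgeDependency.CHILD_CY
print(enrich_field_name)  # AgeDependency_CHILD_CY
```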


## Filter Enrich Variables to Just Those From Previously Enriched Data

Using the column names from the previously enriched data, we can use Pandas to quickly prune the available geoenrichment variables to just those found in the previously enriched data. Note the structure of the filter: a pipe (`|`) combines the two conditions as an *or* selection. If the previously enriched data was created using the GeoEnrichment REST endpoint (ArcGIS Online or ArcGIS Enterprise), the column names will match values in the `name` column. If it was created using the Enrich Layer geoprocessing tool in ArcGIS Pro, the column names will match values in the `enrich_field_name` column. The filter below accounts for both cases.

In [6]:
sv = ev[
    (ev.enrich_field_name.isin(template_df.columns))  # if data was enriched using Enrich Layer in ArcGIS Pro
    | (ev.name.isin(template_df.columns))  # if data was enriched using GeoEnrichment REST endpoint (includes geoenrichment.enrich method)
].reset_index(drop=True)

sv

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,TOTPOP_CY,2021 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY
1,GQPOP_CY,2021 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY
2,DIVINDX_CY,2021 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY
3,TOTHH_CY,2021 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY
4,AVGHHSZ_CY,2021 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY
5,MEDHINC_CY,2021 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY
6,AVGHINC_CY,2021 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY
7,PCI_CY,2021 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY
8,TOTHU_CY,2021 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY
9,OWNER_CY,2021 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY
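
The *or* filter can be tried on a small stand-in table without ArcGIS. In this sketch, `ev_demo`, `pro_columns`, `rest_columns`, and `match_variables` are all hypothetical names. The variable rows are copied from outputs shown in this example, while the two column lists are invented to represent the two naming styles.

```python
import pandas as pd

# small stand-in for the enrichment variables table
ev_demo = pd.DataFrame({
    'name': ['CHILD_CY', 'TOTPOP_CY', 'TOTHH_CY'],
    'enrich_field_name': ['AgeDependency_CHILD_CY', 'KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_TOTHH_CY'],
})

# hypothetical column lists from two previously enriched tables
pro_columns = ['geoid', 'HasData', 'KeyUSFacts_TOTPOP_CY']  # Enrich Layer style
rest_columns = ['geoid', 'TOTHH_CY']                         # REST endpoint style


def match_variables(variables: pd.DataFrame, columns) -> pd.DataFrame:
    """Keep rows whose enrich_field_name OR name appears in the given columns."""
    mask = variables['enrich_field_name'].isin(columns) | variables['name'].isin(columns)
    return variables[mask].reset_index(drop=True)


print(match_variables(ev_demo, pro_columns)['name'].tolist())   # ['TOTPOP_CY']
print(match_variables(ev_demo, rest_columns)['name'].tolist())  # ['TOTHH_CY']
```

Either naming style selects the matching variable, which is why the filter does not need to know how the earlier data was produced.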


## New Study Area

From ArcGIS Online, we can retrieve a new study area to work with: Portland, OR.

In [7]:
# create a layer to access named places through an anonymous connection to ArcGIS Online
places_id = '13ea1fb24ca14842bb265e6ec6ac1d46'
places_lyr = GIS().content.get(places_id).layers[0]

# retrieve named place (city) - Portland, OR
sql_str = "NAME LIKE '%Portland%' AND State_Name = 'Oregon'"
pdx_df = places_lyr.query(sql_str, out_fields=['GEOID', 'NAME', 'State_Name'], out_sr=4326).sdf

pdx_df

Unnamed: 0,OBJECTID,GEOID,NAME,State_Name,SHAPE
0,3298,4159000,Portland city,Oregon,"{""rings"": [[[-122.564776452646, 45.46045947276..."
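
If you want to reuse the query for other places, the where clause can be assembled by a small function. This is only a sketch: `place_where_clause` is a hypothetical helper, and it assumes the `NAME` and `State_Name` fields used in the cell above. Doubling single quotes is the standard SQL way to keep a name containing an apostrophe from ending the string literal early.

```python
def place_where_clause(place: str, state: str) -> str:
    """Build the where clause for a named place (hypothetical helper)."""
    def esc(value: str) -> str:
        # double any single quote so it stays inside the SQL string literal
        return value.replace("'", "''")

    return f"NAME LIKE '%{esc(place)}%' AND State_Name = '{esc(state)}'"


print(place_where_clause('Portland', 'Oregon'))
# NAME LIKE '%Portland%' AND State_Name = 'Oregon'
```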


## Retrieve Block Groups in the Study Area

Using the geometry from the study area retrieved above, a `Polygon` geometry object, we can use it as a filter to retrieve all the block groups in the study area. These are what we are going to enrich.

In [8]:
# create a filter object to use with the query
pdx_fltr = contains(pdx_df.iloc[0][pdx_df.spatial.name], sr=4326)

# create a feature layer connecting to the census server block group layer
bg_id = 'd1105f1e65a743cc84fc12c034625fc7'
bg_lyr = gis.content.get(bg_id).layers[0]

# use the filter created above to retrieve the block groups contained in the named place
bg_df = bg_lyr.query(geometry_filter=pdx_fltr, out_fields=['GEOID'], out_sr=4326).sdf

print(bg_df.info())
bg_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 447 entries, 0 to 446
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   OBJECTID  447 non-null    int64   
 1   GEOID     447 non-null    object  
 2   SHAPE     447 non-null    geometry
dtypes: geometry(1), int64(1), object(1)
memory usage: 10.6+ KB
None


Unnamed: 0,OBJECTID,GEOID,SHAPE
0,72951,410510064033,"{""rings"": [[[-122.731105475247, 45.44711945483..."
1,72953,410510065011,"{""rings"": [[[-122.734135478327, 45.46041145564..."
2,73026,410510001023,"{""rings"": [[[-122.653010466564, 45.46504146431..."
3,73027,410510001024,"{""rings"": [[[-122.666498469333, 45.46493846278..."
4,73028,410510062002,"{""rings"": [[[-122.697364473799, 45.46453145975..."
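
The `GEOID` values above are 12-digit block group identifiers: a 2-digit state code, a 3-digit county code, a 6-digit tract code, and a 1-digit block group number. Note that `GEOID` is stored as text here, while the previously enriched CSV loaded `geoid` as `int64`, which would drop a leading zero for states with codes below 10. The sketch below, with the hypothetical helper `split_block_group_geoid`, normalizes either form before splitting.

```python
def split_block_group_geoid(geoid) -> dict:
    """Split a 12-digit block group GEOID into its parts (hypothetical helper)."""
    # restore a leading zero that is lost when the value was read as an integer
    text = str(geoid).zfill(12)
    if len(text) != 12 or not text.isdigit():
        raise ValueError(f'not a block group GEOID: {geoid!r}')
    return {
        'state': text[:2],
        'county': text[2:5],
        'tract': text[5:11],
        'block_group': text[11],
    }


# first GEOID from the output above
print(split_block_group_geoid('410510064033'))

# an integer GEOID that lost its leading zero still resolves to state '01'
print(split_block_group_geoid(10010201001)['state'])
```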


## Enrich

Now, using the values from the `enrich_name` column of the filtered enrich variables DataFrame (`sv`), we can enrich the block groups in the new area of interest using the Enrich Layer geoprocessing tool. Since it is frequently useful to have the results in a Pandas DataFrame for review and subsequent analysis, we save the output straight to RAM (`memory`) and convert it to a Spatially Enabled DataFrame using the `GeoAccessor`.

In [9]:
# convert this to a feature class in RAM so it can be used with ArcGIS Pro geoprocessing tools
bg_fc = bg_df.spatial.to_featureclass('memory/bg_tmp')

# run enrich layer geoprocessing tool
enrich_fc = arcpy.ba.EnrichLayer(
    in_features=bg_fc,                     # use temporary feature class as input
    out_feature_class='memory/enrch_tmp',  # store in memory so it can be manipulated
    variables=list(sv.enrich_name)         # create list from series
)[0]                                       # first item from result object

# convert results to a dataframe
enrich_df = GeoAccessor.from_featureclass(enrich_fc)

print(enrich_df.info())
enrich_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 447 entries, 0 to 446
Data columns (total 37 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   OBJECTID               447 non-null    int64   
 1   geoid                  447 non-null    object  
 2   HasData                447 non-null    int64   
 3   aggregationMethod      447 non-null    object  
 4   KeyUSFacts_TOTPOP_CY   447 non-null    float64 
 5   KeyUSFacts_GQPOP_CY    447 non-null    float64 
 6   KeyUSFacts_DIVINDX_CY  447 non-null    float64 
 7   KeyUSFacts_TOTHH_CY    447 non-null    float64 
 8   KeyUSFacts_AVGHHSZ_CY  447 non-null    float64 
 9   KeyUSFacts_MEDHINC_CY  447 non-null    float64 
 10  KeyUSFacts_AVGHINC_CY  447 non-null    float64 
 11  KeyUSFacts_PCI_CY      447 non-null    float64 
 12  KeyUSFacts_TOTHU_CY    447 non-null    float64 
 13  KeyUSFacts_OWNER_CY    447 non-null    float64 
 14  KeyUSFacts_RENTER_CY   447 non-null    flo

Unnamed: 0,OBJECTID,geoid,HasData,aggregationMethod,KeyUSFacts_TOTPOP_CY,KeyUSFacts_GQPOP_CY,KeyUSFacts_DIVINDX_CY,KeyUSFacts_TOTHH_CY,KeyUSFacts_AVGHHSZ_CY,KeyUSFacts_MEDHINC_CY,...,sports_MP33009a_B,sports_MP33015a_B,sports_MP33020a_B,sports_MP33023a_B,sports_MP33030a_B,sports_MP33033a_B,sports_MP33034a_B,sports_MP33035a_B,sports_MP33050a_B,SHAPE
0,1,410510064033,1,BlockApportionment:US.BlockGroups,1235.0,0.0,66.2,505.0,2.45,61338.0,...,66.0,119.0,142.0,19.0,213.0,76.0,23.0,44.0,15.0,"{""rings"": [[[-122.73110547499999, 45.447119455..."
1,2,410510065011,1,BlockApportionment:US.BlockGroups,1997.0,0.0,33.7,860.0,2.32,128390.0,...,146.0,352.0,252.0,87.0,490.0,111.0,90.0,101.0,95.0,"{""rings"": [[[-122.73413547799998, 45.460411456..."
2,3,410510001023,1,BlockApportionment:US.BlockGroups,806.0,0.0,35.5,338.0,2.38,106192.0,...,83.0,111.0,112.0,29.0,176.0,55.0,47.0,49.0,30.0,"{""rings"": [[[-122.65301046699994, 45.465041464..."
3,4,410510001024,1,BlockApportionment:US.BlockGroups,1086.0,0.0,29.0,473.0,2.3,94737.0,...,116.0,155.0,157.0,40.0,246.0,77.0,65.0,69.0,42.0,"{""rings"": [[[-122.66649846899998, 45.464938463..."
4,5,410510062002,1,BlockApportionment:US.BlockGroups,901.0,0.0,28.1,395.0,2.28,119302.0,...,66.0,100.0,94.0,27.0,231.0,40.0,54.0,61.0,33.0,"{""rings"": [[[-122.69736447399998, 45.464531460..."
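
After enriching, it can be worth confirming that every requested variable produced an output column. The following is a minimal sketch using plain lists so it runs without ArcGIS. `missing_fields` is a hypothetical helper, and `KeyUSFacts_EXAMPLE_CY` is a made-up field name included only to show a miss; the other names are copied from the outputs above.

```python
def missing_fields(requested_fields, output_columns) -> list:
    """Return requested enrich field names that are absent from the output columns."""
    present = set(output_columns)
    return [field for field in requested_fields if field not in present]


# 'KeyUSFacts_EXAMPLE_CY' is a made-up name used to demonstrate a missing field
requested = ['KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_GQPOP_CY', 'KeyUSFacts_EXAMPLE_CY']
columns = ['OBJECTID', 'geoid', 'HasData', 'KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_GQPOP_CY', 'SHAPE']

print(missing_fields(requested, columns))  # ['KeyUSFacts_EXAMPLE_CY']
```

In the notebook, the equivalent check would be `missing_fields(sv.enrich_field_name, enrich_df.columns)`, where an empty list means every selected variable is present.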
