# Geoenrichment Introspection Overview

Discovering which variables are available for geoenrichment is now possible using the `arcgis.geoenrichment` module. Previously, this was only possible when using a Web GIS as the source. Now you can use either a local source (ArcGIS Pro with the Business Analyst extension and a local country data pack) or a Web GIS (ArcGIS Enterprise with geoenrichment configured, or ArcGIS Online). This is the first step toward a single Python API for working with the functionality offered by Business Analyst, whether you are using ArcGIS Pro with Business Analyst and data for at least one country, or are connected to a Web GIS with geoenrichment configured and enabled.

## Imports and Setup

In [1]:
import importlib.util  # importlib.util must be imported explicitly for find_spec below
import os
from pathlib import Path
import sys

from arcgis.features import GeoAccessor
from arcgis.geoenrichment import get_countries, Country, enrich  # geoenrichment imports
from arcgis.gis import GIS
from dotenv import load_dotenv, find_dotenv
import pandas as pd

# import arcpy if available
if importlib.util.find_spec("arcpy") is not None:
    arcpy_avail = True
    import arcpy
else:
    arcpy_avail = False
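
The `importlib.util.find_spec` check in the cell above generalizes to any optional dependency. The following is a minimal sketch; the `module_available` helper is hypothetical and not part of `arcgis`. It only locates a module on the import path and does not import it.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it (hypothetical helper)."""
    # find_spec returns None when no top-level module with this name is found
    return importlib.util.find_spec(name) is not None


# Check several optional dependencies at once
optional = {name: module_available(name) for name in ("arcpy", "json", "dotenv")}
for name, found in optional.items():
    print(f"{name}: {'available' if found else 'not installed'}")
```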

In [2]:
# paths to common data locations - NOTE: to convert any path to a plain string, use str(path_instance)
dir_prj = Path.cwd().parent

dir_data = dir_prj/'data'

dir_raw = dir_data/'raw'
dir_int = dir_data/'interim'

gdb_int = dir_int/'interim.gdb'

# create the interim geodatabase if arcpy is available and it does not already exist
if arcpy_avail:
    if not arcpy.Exists(str(gdb_int)):
        arcpy.management.CreateFileGDB(str(gdb_int.parent), gdb_int.name)

# load environment variables from .env
load_dotenv(find_dotenv())

# create a GIS object instance; if the ESRI_GIS_* environment variables are not set, this defaults to anonymous access to ArcGIS Online
# `or None` treats both a missing variable and an empty string as None
gis = GIS(
    url=os.getenv('ESRI_GIS_URL') or None,
    username=os.getenv('ESRI_GIS_USERNAME') or None,
    password=os.getenv('ESRI_GIS_PASSWORD') or None
)

assert isinstance(gis, GIS)
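
The credential handling above can be isolated into a small, testable function. This is a sketch only: `gis_kwargs_from_env` is a hypothetical helper that builds the keyword arguments for `GIS()` from the same `ESRI_GIS_*` variables, mapping missing or empty values to `None` so that an unconfigured environment falls back to anonymous ArcGIS Online access. It accepts any mapping, so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def gis_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build GIS() keyword arguments from ESRI_GIS_* variables (hypothetical helper)."""
    keys = {
        "url": "ESRI_GIS_URL",
        "username": "ESRI_GIS_USERNAME",
        "password": "ESRI_GIS_PASSWORD",
    }
    # env.get returns None for missing keys; `or None` also turns "" into None
    result: dict = {}
    for arg, var in keys.items():
        value: Optional[str] = env.get(var) or None
        result[arg] = value
    return result


# Example with an explicit mapping instead of the real environment
print(gis_kwargs_from_env({"ESRI_GIS_URL": "", "ESRI_GIS_USERNAME": "analyst"}))
# {'url': None, 'username': 'analyst', 'password': None}
```

With this in place, the cell above could be written as `gis = GIS(**gis_kwargs_from_env())`.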

## Discover Available Countries

The first step is discovering which countries are available, since most demographic data is organized by country. The Python API 1.9.1 release adds the `as_df` parameter to `get_countries`, which returns the available countries as a pandas DataFrame for easy discovery and filtering.

In [3]:
cntry_df = get_countries(as_df=True)  # using as_df parameter enables retrieving a dataframe

cntry_df

|     | iso2 | iso3 | country_name | datasets | default_dataset | alt_name | continent |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | AL | ALB | Albania | [ALB_MBR_2019] | ALB_MBR_2019 | ALBANIA | Europe |
| 1 | DZ | DZA | Algeria | [DZA_MBR_2019] | DZA_MBR_2019 | ALGERIA | Africa |
| 2 | AD | AND | Andorra | [AND_MBR_2019] | AND_MBR_2019 | ANDORRA | Europe |
| 3 | AO | AGO | Angola | [AGO_MBR_2019] | AGO_MBR_2019 | ANGOLA | Africa |
| 4 | AI | AIA | Anguilla | [AIA_MBR_2020] | AIA_MBR_2020 | ANGUILLA | North America |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 139 | UZ | UZB | Uzbekistan | [UZB_MBR_2020] | UZB_MBR_2020 | UZBEKISTAN | Asia |
| 140 | VE | VEN | Venezuela | [VEN_MBR_2020] | VEN_MBR_2020 | VENEZUELA, BOLIVARIAN REPUBLIC OF | South America |
| 141 | VN | VNM | Vietnam | [VNM_MBR_2020] | VNM_MBR_2020 | VIET NAM | Asia |
| 142 | VI | VIR | Virgin Islands | [VIR_MBR_2020] | VIR_MBR_2020 | UNITED STATES VIRGIN ISLANDS | North America |
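
Because the result is an ordinary DataFrame, standard pandas boolean masks and aggregations apply directly. The sketch below uses a small stand-in DataFrame built from rows in the output above (not a live `get_countries` call) to filter countries by continent and count countries per continent; the column names match the output, but the stand-in holds only five rows.

```python
import pandas as pd

# Stand-in for get_countries(as_df=True): a few rows copied from the output above
cntry_df = pd.DataFrame(
    {
        "iso2": ["AL", "DZ", "AD", "AO", "AI"],
        "iso3": ["ALB", "DZA", "AND", "AGO", "AIA"],
        "country_name": ["Albania", "Algeria", "Andorra", "Angola", "Anguilla"],
        "default_dataset": ["ALB_MBR_2019", "DZA_MBR_2019", "AND_MBR_2019", "AGO_MBR_2019", "AIA_MBR_2020"],
        "continent": ["Europe", "Africa", "Europe", "Africa", "North America"],
    }
)

# Boolean mask: keep only African countries
africa = cntry_df[cntry_df["continent"] == "Africa"]
print(africa[["iso3", "country_name"]])

# Number of available countries per continent
print(cntry_df["continent"].value_counts())

# Countries whose default dataset has a 2020 vintage
print(cntry_df[cntry_df["default_dataset"].str.endswith("2020")]["country_name"].tolist())
```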


For example, here is how to check whether the United States (ISO3 code `USA`) is available.

In [4]:
cntry_df[cntry_df.iso3.str.contains('USA')]

|     | iso2 | iso3 | country_name | datasets | default_dataset | alt_name | continent |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 137 | US | USA | United States | [USA_ESRI_2021, USA_ACS_2021, USA_ASR_2021, US... | USA_ESRI_2021 | UNITED STATES | North America |
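
The same lookup can be wrapped so that it accepts an ISO2 code, an ISO3 code, the country name, or the alternate name, ignoring case. This is a sketch with a hypothetical `find_country` helper, run here against a two-row stand-in DataFrame copied from the outputs above rather than the full `get_countries` result.

```python
import pandas as pd

# Stand-in rows copied from the get_countries(as_df=True) output above
cntry_df = pd.DataFrame(
    {
        "iso2": ["US", "VN"],
        "iso3": ["USA", "VNM"],
        "country_name": ["United States", "Vietnam"],
        "default_dataset": ["USA_ESRI_2021", "VNM_MBR_2020"],
        "alt_name": ["UNITED STATES", "VIET NAM"],
    }
)


def find_country(df: pd.DataFrame, query: str) -> pd.DataFrame:
    """Match a country by ISO2, ISO3, name, or alternate name, ignoring case (hypothetical helper)."""
    q = query.strip().upper()
    # exact, case-insensitive comparison against each identifying column
    mask = (
        (df["iso2"].str.upper() == q)
        | (df["iso3"].str.upper() == q)
        | (df["country_name"].str.upper() == q)
        | (df["alt_name"].str.upper() == q)
    )
    return df[mask]


print(find_country(cntry_df, "usa")["default_dataset"].iloc[0])  # USA_ESRI_2021
print(find_country(cntry_df, "viet nam")["iso3"].iloc[0])  # VNM
print(find_country(cntry_df, "Canada").empty)  # True (not in this stand-in table)
```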


## Discover Variables in a Country

Available enrichment variables can be retrieved as a pandas DataFrame. Because the result is a DataFrame, it is straightforward to filter it down to a concise list of available variables.

In [5]:
usa = Country('USA')

usa

<Country - United States (GIS @ https://bateam.maps.arcgis.com version:9.2)>

In [6]:
ev = usa.enrich_variables

ev

|     | name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | AGE0_CY | 2021 Population Age <1 | 1yearincrements | 1yearincrements.AGE0_CY | F1yearincrements_AGE0_CY | 2021 Total Population Age <1 (Esri) | 2021 | count |
| 1 | AGE1_CY | 2021 Population Age 1 | 1yearincrements | 1yearincrements.AGE1_CY | F1yearincrements_AGE1_CY | 2021 Total Population Age 1 (Esri) | 2021 | count |
| 2 | AGE2_CY | 2021 Population Age 2 | 1yearincrements | 1yearincrements.AGE2_CY | F1yearincrements_AGE2_CY | 2021 Total Population Age 2 (Esri) | 2021 | count |
| 3 | AGE3_CY | 2021 Population Age 3 | 1yearincrements | 1yearincrements.AGE3_CY | F1yearincrements_AGE3_CY | 2021 Total Population Age 3 (Esri) | 2021 | count |
| 4 | AGE4_CY | 2021 Population Age 4 | 1yearincrements | 1yearincrements.AGE4_CY | F1yearincrements_AGE4_CY | 2021 Total Population Age 4 (Esri) | 2021 | count |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37 | MOEMEDYRMV | 2019 Median Year Householder Moved In MOE (ACS... | yearmovedin | yearmovedin.MOEMEDYRMV | yearmovedin_MOEMEDYRMV | 2019 Median Year Householder Moved into Unit M... | 2015-2019 | count |
| 38 | RELMEDYRMV | 2019 Median Year Householder Moved In REL (ACS... | yearmovedin | yearmovedin.RELMEDYRMV | yearmovedin_RELMEDYRMV | 2019 Median Year Householder Moved into Unit R... | 2015-2019 | count |
| 39 | ACSOWNER | 2019 Owner Households (ACS 5-Yr) | yearmovedin | yearmovedin.ACSOWNER | yearmovedin_ACSOWNER | 2019 Owner Households (ACS 5-Yr) | 2015-2019 | count |
| 40 | MOEOWNER | 2019 Owner Households MOE (ACS 5-Yr) | yearmovedin | yearmovedin.MOEOWNER | yearmovedin_MOEOWNER | 2019 Owner Households MOE (ACS 5-Yr) | 2015-2019 | count |
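
A keyword search over the `alias` column is often the quickest way to find a variable when its short `name` is unknown. The sketch below uses a hypothetical `search_variables` helper on a stand-in DataFrame whose rows are copied from the output above; `regex=False` makes characters such as `(` and `<` match literally. Margin-of-error variables share alias text with their estimates, so they are dropped afterward by their `MOE` prefix.

```python
import pandas as pd

# Stand-in for usa.enrich_variables: a few rows copied from the output above
ev = pd.DataFrame(
    {
        "name": ["AGE0_CY", "AGE1_CY", "ACSOWNER", "MOEOWNER"],
        "alias": [
            "2021 Population Age <1",
            "2021 Population Age 1",
            "2019 Owner Households (ACS 5-Yr)",
            "2019 Owner Households MOE (ACS 5-Yr)",
        ],
        "data_collection": ["1yearincrements", "1yearincrements", "yearmovedin", "yearmovedin"],
        "vintage": ["2021", "2021", "2015-2019", "2015-2019"],
    }
)


def search_variables(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return rows whose alias contains `keyword`, case-insensitive (hypothetical helper)."""
    # regex=False treats the keyword as plain text rather than a regular expression
    return df[df["alias"].str.contains(keyword, case=False, regex=False)]


owners = search_variables(ev, "owner households")
# drop margin-of-error (MOE) variables, which share the same alias text
owners = owners[~owners["name"].str.startswith("MOE")]
print(owners[["name", "alias", "vintage"]])
```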


### Filtering - Getting What You Want

Since the variables are returned as a pandas DataFrame, a couple of straightforward filters quickly retrieve a set of enrichment variables to work with; in this case, the Key US Facts variables for the current year.

In [7]:
kv = ev[
    (ev.name.str.endswith('CY'))  # current year variables
    & (ev.data_collection.str.lower().str.contains('key'))  # key variables
].reset_index(drop=True)

kv

|     | name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | TOTPOP_CY | 2021 Total Population | KeyUSFacts | KeyUSFacts.TOTPOP_CY | KeyUSFacts_TOTPOP_CY | 2021 Total Population (Esri) | 2021 | count |
| 1 | GQPOP_CY | 2021 Group Quarters Population | KeyUSFacts | KeyUSFacts.GQPOP_CY | KeyUSFacts_GQPOP_CY | 2021 Group Quarters Population (Esri) | 2021 | count |
| 2 | DIVINDX_CY | 2021 Diversity Index | KeyUSFacts | KeyUSFacts.DIVINDX_CY | KeyUSFacts_DIVINDX_CY | 2021 Diversity Index (Esri) | 2021 | count |
| 3 | TOTHH_CY | 2021 Total Households | KeyUSFacts | KeyUSFacts.TOTHH_CY | KeyUSFacts_TOTHH_CY | 2021 Total Households (Esri) | 2021 | count |
| 4 | AVGHHSZ_CY | 2021 Average Household Size | KeyUSFacts | KeyUSFacts.AVGHHSZ_CY | KeyUSFacts_AVGHHSZ_CY | 2021 Average Household Size (Esri) | 2021 | count |
| 5 | MEDHINC_CY | 2021 Median Household Income | KeyUSFacts | KeyUSFacts.MEDHINC_CY | KeyUSFacts_MEDHINC_CY | 2021 Median Household Income (Esri) | 2021 | currency |
| 6 | AVGHINC_CY | 2021 Average Household Income | KeyUSFacts | KeyUSFacts.AVGHINC_CY | KeyUSFacts_AVGHINC_CY | 2021 Average Household Income (Esri) | 2021 | currency |
| 7 | PCI_CY | 2021 Per Capita Income | KeyUSFacts | KeyUSFacts.PCI_CY | KeyUSFacts_PCI_CY | 2021 Per Capita Income (Esri) | 2021 | currency |
| 8 | TOTHU_CY | 2021 Total Housing Units | KeyUSFacts | KeyUSFacts.TOTHU_CY | KeyUSFacts_TOTHU_CY | 2021 Total Housing Units (Esri) | 2021 | count |
| 9 | OWNER_CY | 2021 Owner Occupied HUs | KeyUSFacts | KeyUSFacts.OWNER_CY | KeyUSFacts_OWNER_CY | 2021 Owner Occupied Housing Units (Esri) | 2021 | count |
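
The filtered table can be narrowed further and reduced to a plain list of identifiers. The sketch below runs on a stand-in for `kv` (a subset of the rows above), keeps only the `currency` variables, and extracts their `enrich_name` values, which in the output follow a `<data_collection>.<name>` pattern. Passing such a list on to an enrichment call is a plausible next step, but check the `arcgis.geoenrichment` documentation for your API version to confirm which parameter accepts it.

```python
import pandas as pd

# Stand-in for `kv`: a subset of the KeyUSFacts rows shown above
kv = pd.DataFrame(
    {
        "name": ["TOTPOP_CY", "MEDHINC_CY", "AVGHINC_CY", "PCI_CY"],
        "enrich_name": [
            "KeyUSFacts.TOTPOP_CY",
            "KeyUSFacts.MEDHINC_CY",
            "KeyUSFacts.AVGHINC_CY",
            "KeyUSFacts.PCI_CY",
        ],
        "units": ["count", "currency", "currency", "currency"],
    }
)

# Keep only the income-related (currency) variables and pull their identifiers as a list
income_vars = kv.loc[kv["units"] == "currency", "enrich_name"].tolist()
print(income_vars)
# ['KeyUSFacts.MEDHINC_CY', 'KeyUSFacts.AVGHINC_CY', 'KeyUSFacts.PCI_CY']

# Split an identifier back into its data collection and variable name
collection, var = income_vars[0].split(".", 1)
print(collection, var)  # KeyUSFacts MEDHINC_CY
```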
