# Enrich Web GIS

To perform enrichment against a Web GIS using Python, you need a Web GIS with geoenrichment configured and the Python API installed in your local environment. Either ArcGIS Online or ArcGIS Enterprise with the [geoenrichment utility service configured](https://enterprise.arcgis.com/en/portal/latest/administer/linux/configure-services.htm#ESRI_SECTION2_1E0134BF60A049FFB388265B5A6AAE7F) is a valid Web GIS. The local Python environment requires ArcGIS Python API 1.9.1, available from both [Conda](https://anaconda.org/esri/arcgis) and [pip](https://pypi.org/project/arcgis/).

With these requirements met, you can use the `get_countries` function and the `Country.enrich_variables` property to introspectively retrieve variables to use in the [`arcgis.geoenrichment.enrich` function](https://developers.arcgis.com/python/api-reference/arcgis.geoenrichment.html#enrich).

## Imports and Setup

In [1]:
import json
import os
from pathlib import Path

from arcgis.features import GeoAccessor
from arcgis.geometry import Polygon
from arcgis.geoenrichment import get_countries, Country, enrich  # specific geoenrichment imports
from arcgis.gis import GIS
from dotenv import find_dotenv, load_dotenv
import pandas as pd

In [2]:
# load environment variables from .env
load_dotenv(find_dotenv())

# paths to common data locations - NOTE: to convert any path to a string, simply use str(path_instance)
dir_prj = Path.cwd().parent
dir_data = dir_prj/'data'
dir_raw = dir_data/'raw'
dir_int = dir_data/'interim'
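
The GIS connection in the next section reads its URL and credentials from environment variables loaded from the `.env` file. A minimal sketch of a pre-flight check, assuming the three variable names used in this notebook; the `missing_env_vars` helper is hypothetical and not part of the ArcGIS Python API:

```python
import os

# variable names read by the GIS connection in this notebook
REQUIRED_VARS = ('ESRI_GIS_URL', 'ESRI_GIS_USERNAME', 'ESRI_GIS_PASSWORD')


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# example with an explicit mapping instead of the real environment
example_env = {'ESRI_GIS_URL': 'https://www.arcgis.com', 'ESRI_GIS_USERNAME': 'analyst'}
missing = missing_env_vars(env=example_env)
print(missing)  # ['ESRI_GIS_PASSWORD']
```

Calling `missing_env_vars()` with no arguments after `load_dotenv` checks the real environment, which makes a missing credential visible before the connection attempt fails.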

## Discover Available Countries

The first step is discovering which countries are available, since most demographic data is organized by country. The Python API 1.9.1 release added the `as_df` parameter, which returns the available countries as a pandas DataFrame for easier discovery and filtering.

In [3]:
# create a GIS object connecting to a Web GIS
gis = GIS(
    url=os.getenv('ESRI_GIS_URL'), 
    username=os.getenv('ESRI_GIS_USERNAME'),
    password=os.getenv('ESRI_GIS_PASSWORD')
)

# use this Web GIS instance as input
cntry_df = get_countries(gis, as_df=True)

cntry_df

Unnamed: 0,iso2,iso3,country_name,datasets,default_dataset,alt_name,continent
0,AL,ALB,Albania,[ALB_MBR_2019],ALB_MBR_2019,ALBANIA,Europe
1,DZ,DZA,Algeria,[DZA_MBR_2019],DZA_MBR_2019,ALGERIA,Africa
2,AD,AND,Andorra,[AND_MBR_2019],AND_MBR_2019,ANDORRA,Europe
3,AO,AGO,Angola,[AGO_MBR_2019],AGO_MBR_2019,ANGOLA,Africa
4,AI,AIA,Anguilla,[AIA_MBR_2020],AIA_MBR_2020,ANGUILLA,North America
...,...,...,...,...,...,...,...
139,UZ,UZB,Uzbekistan,[UZB_MBR_2020],UZB_MBR_2020,UZBEKISTAN,Asia
140,VE,VEN,Venezuela,[VEN_MBR_2020],VEN_MBR_2020,"VENEZUELA, BOLIVARIAN REPUBLIC OF",South America
141,VN,VNM,Vietnam,[VNM_MBR_2020],VNM_MBR_2020,VIET NAM,Asia
142,VI,VIR,Virgin Islands,[VIR_MBR_2020],VIR_MBR_2020,UNITED STATES VIRGIN ISLANDS,North America
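
Because the countries come back as a DataFrame, ordinary pandas filtering applies. A minimal sketch using a small stand-in frame built from a few of the rows shown above, so it runs without a GIS connection; `cntry_demo` is a hypothetical name:

```python
import pandas as pd

# a few rows copied from the output above
cntry_demo = pd.DataFrame({
    'iso3': ['ALB', 'DZA', 'AND', 'AGO', 'AIA'],
    'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla'],
    'default_dataset': ['ALB_MBR_2019', 'DZA_MBR_2019', 'AND_MBR_2019',
                        'AGO_MBR_2019', 'AIA_MBR_2020'],
    'continent': ['Europe', 'Africa', 'Europe', 'Africa', 'North America'],
})

# filter by continent, then look up a single country by ISO3 code
europe = cntry_demo[cntry_demo.continent == 'Europe']
angola_dataset = cntry_demo.set_index('iso3').loc['AGO', 'default_dataset']

print(list(europe.iso3))    # ['ALB', 'AND']
print(angola_dataset)       # AGO_MBR_2019
```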


### Create a Country Object

Next, just as in the first notebook, we create a country object for the United States.

In [4]:
# create a USA country object, this time using the Web GIS instance
usa = Country('USA', gis=gis)

usa

<Country - United States (GIS @ https://bateam.maps.arcgis.com version:9.2)>

### Current Year Key Variables

Again, we can identify a subset of variables to use: the current year key variables.

In [5]:
# retrieve the variables available through the Web GIS
ev = usa.enrich_variables

# filter to current year key variables
kv = ev[
    (ev.name.str.endswith('CY'))
    & (ev.data_collection.str.lower().str.contains('key'))
].reset_index(drop=True)

kv

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
0,TOTPOP_CY,2021 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY,2021 Total Population (Esri),2021,count
1,GQPOP_CY,2021 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY,2021 Group Quarters Population (Esri),2021,count
2,DIVINDX_CY,2021 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY,2021 Diversity Index (Esri),2021,count
3,TOTHH_CY,2021 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY,2021 Total Households (Esri),2021,count
4,AVGHHSZ_CY,2021 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY,2021 Average Household Size (Esri),2021,count
5,MEDHINC_CY,2021 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY,2021 Median Household Income (Esri),2021,currency
6,AVGHINC_CY,2021 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY,2021 Average Household Income (Esri),2021,currency
7,PCI_CY,2021 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY,2021 Per Capita Income (Esri),2021,currency
8,TOTHU_CY,2021 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY,2021 Total Housing Units (Esri),2021,count
9,OWNER_CY,2021 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY,2021 Owner Occupied Housing Units (Esri),2021,count
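
In the rows shown above, `enrich_name` is the data collection and the variable name joined by a period, and `enrich_field_name` is the same string with an underscore instead. A minimal sketch of that pattern, inferred only from these rows (other collections may not follow it), using hypothetical helper names:

```python
def to_enrich_name(data_collection, name):
    """Join collection and variable name as seen in the enrich_name column."""
    return f'{data_collection}.{name}'


def to_enrich_field_name(enrich_name):
    """Swap the period for an underscore, as seen in enrich_field_name."""
    return enrich_name.replace('.', '_')


enrich_name = to_enrich_name('KeyUSFacts', 'TOTPOP_CY')
field_name = to_enrich_field_name(enrich_name)
print(enrich_name, field_name)  # KeyUSFacts.TOTPOP_CY KeyUSFacts_TOTPOP_CY
```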


## Load Data to Enrich

Now, before enriching, we need something to enrich. We are going to use [H3](https://h3geo.org/) level nine hexagon polygons, each with its associated identifier, covering Olympia, WA. The data is stored in a flat CSV file with the geometry saved as Esri JSON. When read with `pd.read_csv`, the column with geometries is recognized as a string Series. To work with it, we need to convert all the strings to proper polygon Geometry objects and tell the GeoAccessor (`spatial`) to recognize the column (`set_geometry`).

In [6]:
h3_csv = dir_raw/'h3_olympia.csv'

# read in the data from a csv file with geometries
h3_df = pd.read_csv(h3_csv, index_col=0)

# because the geometry is stored as a JSON string, need to convert to geometry
h3_df.SHAPE = h3_df.SHAPE.apply(lambda geom: Polygon(json.loads(geom)))

# once the geometries are created, we need to tell the GeoAccessor (spatial) to recognize them
h3_df.spatial.set_geometry('SHAPE')

print(h3_df.info())
h3_df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 625 entries, 0 to 624
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   h3_09   625 non-null    object  
 1   SHAPE   625 non-null    geometry
dtypes: geometry(1), object(1)
memory usage: 14.6+ KB
None


Unnamed: 0,h3_09,SHAPE
0,8928d590297ffff,"{""rings"": [[[-122.94740574999997, 47.024184694..."
1,8928d5902b3ffff,"{""rings"": [[[-122.95158429099996, 47.028134405..."
2,8928d591533ffff,"{""rings"": [[[-122.92062630199996, 47.028123088..."
3,8928d5916afffff,"{""rings"": [[[-122.87055727599994, 47.031318454..."
4,8928d591433ffff,"{""rings"": [[[-122.89294779499994, 47.020566091..."
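
Each value in the `SHAPE` column of the CSV is an Esri JSON polygon stored as text. A minimal sketch of what `json.loads` produces before the result is handed to `Polygon`, using made-up coordinates and an assumed WGS84 spatial reference (the real file's values are not reproduced here):

```python
import json

# hypothetical Esri JSON polygon string; coordinates are illustrative only
geom_str = (
    '{"rings": [[[-122.95, 47.02], [-122.95, 47.03], [-122.94, 47.03], '
    '[-122.94, 47.02], [-122.95, 47.02]]], '
    '"spatialReference": {"wkid": 4326}}'
)

geom_dict = json.loads(geom_str)    # plain dict, ready to pass to Polygon(...)
ring = geom_dict['rings'][0]        # first ring as a list of [x, y] pairs

print(type(geom_dict).__name__)     # dict
print(len(ring))                    # 5
print(ring[0] == ring[-1])          # True, the ring is closed
```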


## Enrich

Now we can enrich the data with the demographic factors we identified using introspection and filtering above. The DataFrame is passed directly as the study areas, and the results come back as a DataFrame, where it is easy to clean up the schema a little before saving the final result.

In [7]:
enrich_df = enrich(
    study_areas=h3_df,                     # directly use dataframe
    analysis_variables=list(kv.name),      # create list from series 
    return_geometry=False,                 # already have geometry
    gis=gis                                # provide connection to configured GIS
)                                  

enrich_df

Unnamed: 0,ID,OBJECTID_0,sourceCountry,OBJECTID,h3_09,aggregationMethod,populationToPolygonSizeRating,apportionmentConfidence,HasData,TOTPOP_CY,...,RENTER_CY,VACANT_CY,MEDVAL_CY,AVGVAL_CY,POPGRW10CY,HHGRW10CY,FAMGRW10CY,DPOP_CY,DPOPWRK_CY,DPOPRES_CY
0,0,1,US,1,8928d590297ffff,BlockApportionment:US.BlockGroups,2.191,2.576,0,0,...,0,0,0,0,0.00,0.00,0.00,0,0,0
1,1,2,US,2,8928d5902b3ffff,BlockApportionment:US.BlockGroups,2.191,2.576,1,382,...,14,5,398913,431803,0.76,0.70,0.56,286,96,190
2,2,3,US,3,8928d591533ffff,BlockApportionment:US.BlockGroups,2.191,2.576,0,0,...,0,0,0,0,0.00,0.00,0.00,0,0,0
3,3,4,US,4,8928d5916afffff,BlockApportionment:US.BlockGroups,2.191,2.576,1,59,...,6,0,342857,369444,0.31,0.40,0.62,64,37,27
4,4,5,US,5,8928d591433ffff,BlockApportionment:US.BlockGroups,2.191,2.576,1,154,...,11,0,370000,425962,0.41,0.46,0.47,129,48,81
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
620,20,21,US,21,8928d591c6bffff,BlockApportionment:US.BlockGroups,2.191,2.576,0,0,...,0,0,0,0,0.00,0.00,0.00,0,0,0
621,21,22,US,22,8928d5913c3ffff,BlockApportionment:US.BlockGroups,2.191,2.576,1,107,...,18,0,328571,324167,0.60,0.56,0.36,80,31,49
622,22,23,US,23,8928d5918d7ffff,BlockApportionment:US.BlockGroups,2.191,2.576,0,0,...,0,0,0,0,0.00,0.00,0.00,0,0,0
623,23,24,US,24,8928d59165bffff,BlockApportionment:US.BlockGroups,2.191,2.576,0,0,...,0,0,0,0,0.00,0.00,0.00,0,0,0


### Keep Only Needed Columns

By introspectively looking at the inputs we can create a list of only what we want in the output.

In [8]:
drop_cols = [c for c in enrich_df.columns if not
    (c in kv.name.values                        # enrich variables' output field names
     or c in h3_df.columns                      # input data column names
     or c.lower() == 'hasdata')                 # if row received demographic factors
]
enrich_df.drop(columns=drop_cols, inplace=True)

enrich_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 625 entries, 0 to 624
Data columns (total 22 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   h3_09       625 non-null    object 
 1   HasData     625 non-null    int64  
 2   TOTPOP_CY   625 non-null    int64  
 3   GQPOP_CY    625 non-null    int64  
 4   DIVINDX_CY  625 non-null    float64
 5   TOTHH_CY    625 non-null    int64  
 6   AVGHHSZ_CY  625 non-null    float64
 7   MEDHINC_CY  625 non-null    int64  
 8   AVGHINC_CY  625 non-null    int64  
 9   PCI_CY      625 non-null    int64  
 10  TOTHU_CY    625 non-null    int64  
 11  OWNER_CY    625 non-null    int64  
 12  RENTER_CY   625 non-null    int64  
 13  VACANT_CY   625 non-null    int64  
 14  MEDVAL_CY   625 non-null    int64  
 15  AVGVAL_CY   625 non-null    int64  
 16  POPGRW10CY  625 non-null    float64
 17  HHGRW10CY   625 non-null    float64
 18  FAMGRW10CY  625 non-null    float64
 19  DPOP_CY     625 non-null    i
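
The same selection can be phrased as a keep-list rather than a drop-list. A minimal sketch on plain lists of column names, with shortened hypothetical inputs standing in for `enrich_df.columns`, `kv.name`, and `h3_df.columns`:

```python
# shortened stand-ins for the real column collections
enrich_cols = ['ID', 'OBJECTID_0', 'sourceCountry', 'h3_09',
               'aggregationMethod', 'HasData', 'TOTPOP_CY', 'MEDHINC_CY']
variable_names = ['TOTPOP_CY', 'MEDHINC_CY']   # stands in for kv.name
input_cols = ['h3_09', 'SHAPE']                # stands in for h3_df.columns

# keep a column if it is an enrich variable, an input column, or HasData
wanted = set(variable_names) | set(input_cols) | {'HasData'}
keep_cols = [c for c in enrich_cols if c in wanted]

print(keep_cols)  # ['h3_09', 'HasData', 'TOTPOP_CY', 'MEDHINC_CY']
```

With the real DataFrame, `enrich_df[keep_cols]` would return a new frame instead of modifying `enrich_df` in place. Note that this sketch matches `HasData` by exact case, whereas the cell above compares in lowercase.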

### Add Geometry Back On

We did not request the geometry to be returned as part of the enrichment call. This can dramatically reduce the size of the return package, but if we want to retain the geometry for subsequent analysis, we now need to add it back on.

In [9]:
enrich_df['SHAPE'] = h3_df['SHAPE']
enrich_df.spatial.set_geometry('SHAPE')
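
The column assignment above relies on pandas index alignment, so it is only correct when both DataFrames label the same rows with the same index values. If that is not guaranteed, joining on the `h3_09` identifier is a more defensive option. A minimal sketch with small stand-in frames, where plain strings take the place of geometry objects:

```python
import pandas as pd

# stand-in for the input data; strings take the place of Polygon objects
h3_demo = pd.DataFrame({
    'h3_09': ['8928d590297ffff', '8928d5902b3ffff', '8928d591533ffff'],
    'SHAPE': ['geom_a', 'geom_b', 'geom_c'],
})

# stand-in for enrichment results, deliberately in a different row order
enrich_demo = pd.DataFrame({
    'h3_09': ['8928d591533ffff', '8928d590297ffff', '8928d5902b3ffff'],
    'TOTPOP_CY': [0, 0, 382],
})

# a left join on the identifier keeps every enriched row and matches by key
joined = enrich_demo.merge(h3_demo, on='h3_09', how='left', validate='one_to_one')

print(joined)
```

With real data, `joined.spatial.set_geometry('SHAPE')` would still be needed afterward so the GeoAccessor recognizes the geometry column.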

## Save Results

Finally, the results can be saved using the `spatial.to_featureclass` method of the DataFrame's GeoAccessor or, if desired, written back to a CSV file. For CSV, the geometry objects must first be serialized back to strings, which is accomplished using the `Geometry.JSON` property.

In [10]:
# convert geometry objects to strings
enrich_df[enrich_df.spatial.name] = enrich_df[enrich_df.spatial.name].apply(lambda geom: geom.JSON)

# save to interim data directory for further analysis
dir_int.mkdir(parents=True, exist_ok=True)  # also create the parent data directory if missing
enrich_df.to_csv(dir_int/'enrich_olympia_gis.csv')
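
Because the geometry is written as a JSON string, reading the CSV back requires the same conversion used when loading the input data. A minimal round-trip sketch using only the standard library and an in-memory buffer, with a made-up geometry; with the real file, `pd.read_csv` followed by `Polygon(json.loads(...))` plays the same role:

```python
import csv
import io
import json

# hypothetical row: identifier, one enriched value, and geometry as a JSON string
geom = {'rings': [[[0, 0], [0, 1], [1, 1], [0, 0]]], 'spatialReference': {'wkid': 4326}}
row = {'h3_09': '8928d5902b3ffff', 'TOTPOP_CY': 382, 'SHAPE': json.dumps(geom)}

# write to an in-memory CSV; the writer quotes the JSON because it contains commas
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

# read it back and turn the string into a dict again
buffer.seek(0)
restored = next(csv.DictReader(buffer))
restored_geom = json.loads(restored['SHAPE'])

print(restored_geom == geom)  # True
```

Numeric columns come back from the `csv` module as strings, so `restored['TOTPOP_CY']` is `'382'` here; `pd.read_csv` infers numeric types on its own.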