<h1><b> DSC 170: Geoenrichment, and navigating different types of vector representation in ArcGIS API for Python </b></h1>

This lecture will cover:
* Geoenrichment  
  - a key approach to engineering features for analysis
  - understanding what information is available, and how to retrieve it
  - adding features to existing named areas, and to constructed areas
  - understanding data accuracy issues
* Navigating different types of feature data in ArcGIS  
  - different types of feature representations, with different conversion and analysis APIs
  - what you can do with feature data


**GeoEnrichment**  is a key capability you can use in your data science projects. It helps you to get facts about a specific area. The area can be defined by administrative boundaries, or be a result of geometric operations such as distance buffer or drive time from a location. Available facts are stored in multiple datasets and reflect population, income, housing, consumer behavior, and the natural environment. Much of the remainder of this notebook is based on https://developers.arcgis.com/python/guide/part1-introduction-to-geoenrichment/ and subsequent documentation sections.

The main method is **enrich()**: it retrieves info for the specified area.
The arcgis.geoenrichment module can help you create geometries to which enrich() can be later applied.

**Examples:** 
    - A wildfire analyst generates a map of the dynamics and extent of forest fires: you need to quickly determine who lives there and what their mobility characteristics are. 
    - A company is looking for a location of a new store: you need to determine who lives in the vicinity and what they typically buy.

In [1]:
import arcgis
from arcgis.gis import GIS
from arcgis import geometry
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import *
import pandas as pd

# login with UCSD Single-Sign-On. 
gis=GIS("https://ucsdonline.maps.arcgis.com/home", client_id="bZshlNXFuaR2KHff") 

arcgis.__version__





Please sign in to your GIS and paste the code that is obtained below.
If a web browser does not automatically open, please navigate to the URL below yourself instead.
Opening web browser to navigate to: https://ucsdonline.maps.arcgis.com/sharing/rest/oauth2/authorize?response_type=code&client_id=bZshlNXFuaR2KHff&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&state=mqpOgSBATFhYu5o26wfyfm8ZL6CP7o&allow_verification=false


'2.4.0'

## What information is available for your country of interest?

In [2]:
country_df = get_countries(gis, as_df=True)
print("Number of countries for which GeoEnrichment data is available: " + str(len(country_df)))

#print a few countries for a sample
country_df.head(10)

Number of countries for which GeoEnrichment data is available: 177


Unnamed: 0,iso2,iso3,name,alt_name,datasets,default_dataset,continent,hierarchy
0,AL,ALB,Albania,ALBANIA,[ALB_MBR_2023],ALB_MBR_2023,Europe,[census]
1,DZ,DZA,Algeria,ALGERIA,[DZA_MBR_2023],DZA_MBR_2023,Africa,[census]
2,AD,AND,Andorra,ANDORRA,[AND_MBR_2023],AND_MBR_2023,Europe,[census]
3,AO,AGO,Angola,ANGOLA,[AGO_MBR_2023],AGO_MBR_2023,Africa,[census]
4,AI,AIA,Anguilla,ANGUILLA,[AIA_MBR_2022],AIA_MBR_2022,North America,[census]
5,AR,ARG,Argentina,ARGENTINA,[ARG_MBR_2024],ARG_MBR_2024,South America,[census]
6,AM,ARM,Armenia,ARMENIA,[ARM_MBR_2022],ARM_MBR_2022,Europe,[census]
7,AW,ABW,Aruba,ARUBA,[ABW_MBR_2022],ABW_MBR_2022,North America,[census]
8,AU,AUS,Australia,AUSTRALIA,"[AUS_ABS_2021, AUS_MBR_2024]",AUS_ABS_2021,Oceania,"[AUS_ABS, census]"
9,AT,AUT,Austria,AUSTRIA,[AUT_MBR_2023],AUT_MBR_2023,Europe,[census]


In [3]:
# What is available for the US?

usa = Country("usa",gis=gis)
usa.properties.datasets

['USA_ESRI_2024',
 'USA_ACSTRACTS_2024',
 'USA_ACS_2024',
 'USA_ASR_2024',
 'USA_BSUM_2024',
 'USA_CRM_2024',
 'USA_DATAAXLE_2024',
 'USA_DHCTRACTS_2024',
 'USA_DHC_2024',
 'USA_RETAILDEMAND_2024',
 'USA_SAFEGRAPH_2024',
 'USA_TRFCNT_2024',
 'USA_URBANICITY_2024',
 'USA_ESRI_2023',
 'USA_ACSTRACTS_2023',
 'USA_ACS_2023',
 'USA_ASR_2023',
 'USA_BSUM_2023',
 'USA_CRM_2023',
 'USA_DHCTRACTS_2023',
 'USA_DHC_2023',
 'USA_RETAILDEMAND_2023',
 'Landscape']
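
Most of the dataset identifiers listed above appear to follow a `COUNTRY_SOURCE_YEAR` pattern, so a few lines of plain Python can group them by vintage. This is a minimal sketch that hard-codes a subset of the printed list; the helper name `split_dataset_id` is hypothetical, and the naming pattern is inferred from the output rather than taken from documented behavior.

```python
from collections import defaultdict

# A subset of the identifiers printed by usa.properties.datasets above
datasets = ["USA_ESRI_2024", "USA_ACS_2024", "USA_CRM_2024",
            "USA_ESRI_2023", "USA_ACS_2023", "Landscape"]

def split_dataset_id(dataset_id):
    """Split 'USA_ACS_2024' into ('USA', 'ACS', 2024); return None if the pattern does not match."""
    parts = dataset_id.split("_")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return parts[0], parts[1], int(parts[2])

by_year = defaultdict(list)
for ds in datasets:
    parsed = split_dataset_id(ds)
    if parsed is not None:  # 'Landscape' does not follow the pattern and is skipped
        _, source, year = parsed
        by_year[year].append(source)

latest_year = max(by_year)
print(latest_year, sorted(by_year[latest_year]))
```

In the notebook, the hard-coded list could be replaced with `usa.properties.datasets`.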

Typical content available for countries:

 - MBR == Michael Bauer Research (regional market data): https://www.esri.com/partners/michael-bauer-resear-a2T70000000TNZ3EAO
 - Key Global Facts: https://geoenrichdev.arcgis.com/arcgis/rest/services/World/GeoenrichmentServer/Geoenrichment/DataCollections

### Geoenrichment integrates data from many databases:

* ACS: American Community Survey (https://www.census.gov/programs-surveys/acs, https://doc.arcgis.com/en/esri-demographics/data/acs.htm, http://suave2.sdsc.edu/gallery/sdhhsa)
* ASR: Age, Sex, Race (https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html)
* CRM: Crimes (https://doc.arcgis.com/en/esri-demographics/data/crime-indexes.htm)
* RMP: Retail MarketPlace (https://downloads.esri.com/esri_content_doc/dbl/us/Var_List_Retail-MarketPlace_Summer2020.pdf, https://doc.arcgis.com/en/esri-demographics/data/market-potential.htm)
* Safegraph: 5 million point locations for any transactions (https://www.esri.com/arcgis-blog/products/bus-analyst/data-management/why-and-when-to-use-safegraph-data-in-your-analysis/, https://doc.arcgis.com/en/esri-demographics/data/business.htm)
* Traffic Counts (https://doc.arcgis.com/en/esri-demographics/data/traffic-counts.htm)
* Data Axle: business data from 13 million businesses (https://storymaps.arcgis.com/stories/d13b635ab9ac44759e99eb52646877f8)

Global coverage of geoenrichment: https://doc.arcgis.com/en/arcgis-online/reference/geoenrichment-coverage.htm

Standard geography levels: https://geoenrichdev.arcgis.com/arcgis/rest/services/World/GeoenrichmentServer/Geoenrichment/StandardGeographyLevels

You can also find variables for geoenrichment with the ESRI Demographics Data Browser at https://doc.arcgis.com/en/esri-demographics/data/data-browser.htm (may be easier!)

### Listing  available data collections

In [4]:
# A data collection is a preassembled list of attributes that will be used to enrich the input features. 
# Collection attributes can describe various types of information, 
# such as demographic characteristics and geographic context of the locations or areas submitted as input features.

df = usa.data_collections
print(df.shape) # total number of variables
print(df.index.unique().values) # names of the collections




(20932, 4)
['1yearincrements' '5yearincrements' 'Age' 'agebyracebysex'
 'agebyracebysex2010' 'agebyracebysex2020' 'AgeDependency' 'AtRisk'
 'AutomobilesAutomotiveProducts' 'BabyProductsToysGames'
 'basicFactsForMobileApps' 'businesses'
 'CivicActivitiesPoliticalAffiliation' 'classofworker' 'clothing'
 'ClothingShoesAccessories' 'commute' 'crime' 'DaytimePopulation'
 'disability' 'disposableincome' 'DniRates' 'education'
 'educationalattainment' 'ElectronicsInternet' 'employees'
 'EmploymentUnemployment' 'entertainment' 'financial' 'FinancialInsurance'
 'food' 'foodstampsSNAP' 'gender' 'Generations'
 'GroceryAlcoholicBeverages' 'groupquarters' 'Health'
 'healthinsurancecoverage' 'HealthPersonalCare' 'HealthPersonalCareCEX'
 'heatingfuel' 'hispanicorigin' 'HistoricalHouseholds' 'HistoricalHousing'
 'HistoricalPopulation' 'HomeImprovementGardenLawn' 'homevalue'
 'HouseholdGoodsFurnitureAppliances' 'householdincome' 'households'
 'householdsbyageofhouseholder' 'HouseholdsByIncome'
 'househ

In [5]:
df[7000:7050]     # returns a pandas DF with specific measured variables

Unnamed: 0_level_0,analysisVariable,alias,fieldCategory,vintage
dataCollectionID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
food,food.X1076_I,2024 Index: Frozen Fruit Juice,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1077_X,2024 Canned Fruit,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1077_A,2024 Avg: Canned Fruit,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1077_I,2024 Index: Canned Fruit,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1078_X,2024 Dried Fruit,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1078_A,2024 Avg: Dried Fruit,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1078_I,2024 Index: Dried Fruit,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1079_X,2024 Fresh Fruit Juice,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1079_A,2024 Avg: Fresh Fruit Juice,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024
food,food.X1079_I,2024 Index: Fresh Fruit Juice,2024 Food at Home - Dairy/Fruit/Vegs (Consumer...,2024


### How to find variables that you need:

In [6]:
df['indexes'] = df['alias'].str.find('Road')
res = df[df['indexes']>-1]
res

Unnamed: 0_level_0,analysisVariable,alias,fieldCategory,vintage,indexes
dataCollectionID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01058h_B,2024 HH Owns On/Off-Road Motorcycle,2024 Automobiles & Other Vehicles (Market Pote...,2024,20
AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01058h_I,2024 Index: HH Owns On/Off-Road Motorcycle,2024 Automobiles & Other Vehicles (Market Pote...,2024,27
CivicActivitiesPoliticalAffiliation,CivicActivitiesPoliticalAffiliation.MP06029a_B,2024 Middle of the Road Political Outlook,2024 Civic Activities & Political Affiliation ...,2024,19
CivicActivitiesPoliticalAffiliation,CivicActivitiesPoliticalAffiliation.MP06029a_I,2024 Index: Middle of the Road Political Outlook,2024 Civic Activities & Political Affiliation ...,2024,26
restaurants,restaurants.MP29090a_B,2024 Went to Logan`s Roadhouse/6 Mo,2024 Restaurants (Market Potential),2024,21
restaurants,restaurants.MP29090a_I,2024 Index: Went to Logan`s Roadhouse/6 Mo,2024 Restaurants (Market Potential),2024,28
restaurants,restaurants.MP29033a_B,2024 Went to Texas Roadhouse/6 Mo,2024 Restaurants (Market Potential),2024,19
restaurants,restaurants.MP29033a_I,2024 Index: Went to Texas Roadhouse/6 Mo,2024 Restaurants (Market Potential),2024,26
sports,sports.MP33006a_B,2024 Participated in Bicycling (Road)/12 Mo,2024 Sports (Market Potential),2024,32
sports,sports.MP33006a_I,2024 Index: Participated in Bicycling (Road)/1...,2024 Sports (Market Potential),2024,39


In [7]:
# Also, you can see variables available for any country (in version 2.4):

ev = usa.enrich_variables
ev_subset = ev[ev['description'].str.contains('Road', case=False, na=False)]
ev_subset

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
3510,MP01058h_B,2024 HH Owns On/Off-Road Motorcycle,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01058h_B,AutomobilesAutomotiveProducts_MP01058h_B,2024 HH Owns On/Off-Road Motorcycle,2024,count
3511,MP01058h_I,2024 Index: HH Owns On/Off-Road Motorcycle,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01058h_I,AutomobilesAutomotiveProducts_MP01058h_I,2024 HH Owns On/Off-Road Motorcycle: Index,2024,count
3526,MP01066a_B,2024 Member of Auto Club Program,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01066a_B,AutomobilesAutomotiveProducts_MP01066a_B,2024 Member of Auto Club/Road Assistance Program,2024,count
3527,MP01066a_I,2024 Index: Member of Auto Club Program,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01066a_I,AutomobilesAutomotiveProducts_MP01066a_I,2024 Member of Auto Club/Road Assistance Progr...,2024,count
3528,MP01067a_B,2024 Member of AAA Auto Club,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01067a_B,AutomobilesAutomotiveProducts_MP01067a_B,2024 Member of AAA Auto Club/Road Assistance P...,2024,count
3529,MP01067a_I,2024 Index: Member of AAA Auto Club,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01067a_I,AutomobilesAutomotiveProducts_MP01067a_I,2024 Member of AAA Auto Club/Road Assistance P...,2024,count
3530,MP01087a_B,2024 Member of AARP Auto Club,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01087a_B,AutomobilesAutomotiveProducts_MP01087a_B,2024 Member of AARP Auto Club/Road Assistance ...,2024,count
3531,MP01087a_I,2024 Index: Member of AARP Auto Club,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01087a_I,AutomobilesAutomotiveProducts_MP01087a_I,2024 Member of AARP Auto Club/Road Assistance ...,2024,count
3532,MP01088a_B,2024 Member of Dealer/Manuf/Warranty Auto Club,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01088a_B,AutomobilesAutomotiveProducts_MP01088a_B,2024 Member of Dealer/Manuf/Warranty Auto Club...,2024,count
3533,MP01088a_I,2024 Index: Member of Dealer/Manuf/Warranty Au...,AutomobilesAutomotiveProducts,AutomobilesAutomotiveProducts.MP01088a_I,AutomobilesAutomotiveProducts_MP01088a_I,2024 Member of Dealer/Manuf/Warranty Auto Club...,2024,count
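
The two searches above return different rows because one looks at `alias` and the other at `description`. A small helper can search several text columns at once. This is only a sketch: `find_variables` is a hypothetical name, and `ev_demo` is a three-row stand-in copied from the outputs above so the example runs without a GIS connection.

```python
import pandas as pd

# Stand-in for usa.enrich_variables: three rows copied from the outputs in this notebook
ev_demo = pd.DataFrame({
    "name": ["MP01058h_B", "MP01066a_B", "AGE0_CY"],
    "alias": ["2024 HH Owns On/Off-Road Motorcycle",
              "2024 Member of Auto Club Program",
              "2024 Population Age <1"],
    "description": ["2024 HH Owns On/Off-Road Motorcycle",
                    "2024 Member of Auto Club/Road Assistance Program",
                    "2024 Total Population Age <1 (Esri)"],
    "enrich_name": ["AutomobilesAutomotiveProducts.MP01058h_B",
                    "AutomobilesAutomotiveProducts.MP01066a_B",
                    "1yearincrements.AGE0_CY"],
})

def find_variables(variables, keyword, columns=("alias", "description")):
    """Return rows where any of the given text columns contains keyword (case-insensitive)."""
    mask = pd.Series(False, index=variables.index)
    for col in columns:
        mask |= variables[col].str.contains(keyword, case=False, na=False, regex=False)
    return variables[mask]

hits = find_variables(ev_demo, "road")
print(hits["enrich_name"].tolist())
```

With the real table, `find_variables(usa.enrich_variables, "road")` would be the equivalent call, assuming the column names shown in the output above.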


In [8]:
ev

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
0,AGE0_CY,2024 Population Age <1,1yearincrements,1yearincrements.AGE0_CY,F1yearincrements_AGE0_CY,2024 Total Population Age <1 (Esri),2024,count
1,AGE1_CY,2024 Population Age 1,1yearincrements,1yearincrements.AGE1_CY,F1yearincrements_AGE1_CY,2024 Total Population Age 1 (Esri),2024,count
2,AGE2_CY,2024 Population Age 2,1yearincrements,1yearincrements.AGE2_CY,F1yearincrements_AGE2_CY,2024 Total Population Age 2 (Esri),2024,count
3,AGE3_CY,2024 Population Age 3,1yearincrements,1yearincrements.AGE3_CY,F1yearincrements_AGE3_CY,2024 Total Population Age 3 (Esri),2024,count
4,AGE4_CY,2024 Population Age 4,1yearincrements,1yearincrements.AGE4_CY,F1yearincrements_AGE4_CY,2024 Total Population Age 4 (Esri),2024,count
...,...,...,...,...,...,...,...,...
20927,MOEMEDYRMV,2022 Median Year Householder Moved In MOE (ACS...,yearmovedin,yearmovedin.MOEMEDYRMV,yearmovedin_MOEMEDYRMV,2022 Median Year Householder Moved into Unit M...,2018-2022,count
20928,RELMEDYRMV,2022 Median Year Householder Moved In REL (ACS...,yearmovedin,yearmovedin.RELMEDYRMV,yearmovedin_RELMEDYRMV,2022 Median Year Householder Moved into Unit R...,2018-2022,count
20929,ACSOWNER,2022 Owner Households (ACS 5-Yr),yearmovedin,yearmovedin.ACSOWNER,yearmovedin_ACSOWNER,2022 Owner Households (ACS 5-Yr),2018-2022,count
20930,MOEOWNER,2022 Owner Households MOE (ACS 5-Yr),yearmovedin,yearmovedin.MOEOWNER,yearmovedin_MOEOWNER,2022 Owner Households MOE (ACS 5-Yr),2018-2022,count


### How to find collections and variables for any country without coding

ESRI's Data Browser (https://doc.arcgis.com/en/esri-demographics/latest/data-browser/data-browser.htm) can be used to examine the entire global listing of variables, and associated datasets for each country. You can also use it to select variables and export them for analysis as CSV or JSON.

## Creating area profiles (standard reports), for any area

In [9]:
# GeoEnrichment also enables you to create many types of high quality reports 
# for a variety of use cases describing the input area.

# print a sample of the reports available for USA
usa.reports.head(50)

Unnamed: 0,id,title,categories,formats
0,census2010_profile,2010 Census Profile,[Demographics],"[pdf, xlsx]"
1,census2020_profile,2020 Census Profile,[Demographics],"[pdf, xlsx]"
2,acs_housing,ACS Housing Summary,[Demographics],"[pdf, xlsx]"
3,acs_keyfacts,ACS Key Population & Household Facts,[Demographics],"[pdf, xlsx]"
4,acs_population,ACS Population Summary,[Demographics],"[pdf, xlsx]"
5,55plus,Age 50+ Profile,[Demographics],"[pdf, xlsx]"
6,agesexrace,Age by Sex by Race Profile,[Demographics],"[pdf, xlsx]"
7,agesex,Age by Sex Profile,[Demographics],"[pdf, xlsx]"
8,cex_auto,Automotive Aftermarket Expenditures,[Consumer Spending],"[pdf, xlsx]"
9,business_loc,Business Locator,[Business],[pdf]


### Creating Reports
The `create_report` method allows you to create many types of high quality reports for a variety of use cases describing the input area. If a point is used as a study area, the service will create a `1` mile ring buffer around the point to collect and append enrichment data. Optionally, you can create a buffer ring or drive time service area around points of interest to generate PDF or Excel reports containing relevant information for the area on demographics, consumer spending, tapestry market, etc.



In [10]:
# for zip code 92093
report = create_report(study_areas=["92093"],
                     report="tapestry_profileNEW",
                     export_format="PDF", 
                     out_folder=r"../../scratch", out_name="esri_tapestry_profile.pdf")
report

'../../scratch/esri_tapestry_profile.pdf'

In [11]:
# check if the item is already published
search_result = gis.content.search(query="esri_tapestry_profile.pdf", max_items=100)
search_result

[<Item title:"92093 Tapestry Profile PDF" type:PDF owner:dsc170wi24_7>,
 <Item title:"92093 Tapestry Profile PDF" type:PDF owner:ayeddana_UCSDOnline>]

In [12]:
# and delete any copies owned by the current user, if found.
for item in gis.content.search(query="esri_tapestry_profile.pdf", max_items=100):
    if item.owner == gis.users.me.username:
        item.delete()

In [13]:
# adding this report to AGOL:

folder_name = "Reports"
report_path = "../../scratch/esri_tapestry_profile.pdf"
report_properties = {'title': '92093 Tapestry Profile PDF', 'type': 'PDF', 'tags': '92093, tapestry'}

# Retrieve the folder or create it if it doesn't exist
existing_folders = gis.content.folders.list()
folder = next((folder for folder in existing_folders if folder.name == folder_name), None)

if not folder:
    folder = gis.content.folders.create(folder_name)

# Add the item to the folder
report_item = folder.add(item_properties=report_properties, file=report_path)

In [14]:
# We can also create a report as an Excel file. This is one way to construct features for your ML tasks

# for zip code 92037
report = create_report(study_areas=["92037"],
                     report="tapestry_profileNEW",
                     export_format="XLSX", 
                     out_folder=r"../../scratch", out_name="esri_tapestry_profile_92037.xlsx")
report

# publish it on AGOL yourself

'../../scratch/esri_tapestry_profile_92037.xlsx'

In [15]:
# note that format names may be inconsistent across operations. 
# See https://developers.arcgis.com/rest/users-groups-and-items/items-and-item-types.htm
# for types of items in AGOL

### Working with study areas

GeoEnrichment uses the concept of a study area to define the location of the point or area that you want to enrich with additional information, or create reports about.

#### Accepted forms of study areas

- **Street address locations** - Locations can be passed as strings of input street addresses, points of interest or place names.
    + **Example:** `"380 New York St, Redlands, CA"`

- **Multiple field input addresses** - Locations described as multiple field input addresses, using dictionaries.
    + **Example:** 
        {"Address" : "380 New York Street",
        "City" : "Redlands",
        "Region" : "CA",
        "Postal" : 92373}    
 
- **Point and line geometries** - Point and line locations, using `arcgis.geometry` instances.
    + **Example Point Location:** 
    
    `arcgis.geometry.Geometry({"x":-122.435,"y":37.785})`

- **Buffered study areas** - `BufferStudyArea` instances to change the ring buffer size or create drive-time service areas around points specified using one of the above methods. BufferStudyArea allows you to buffer point and street address study areas. They can be created using the following parameters:
        * area: the point geometry or street address (string) study area to be buffered
        * radii: list of distances by which to buffer the study area, e.g. [1, 2, 3]
        * units: distance unit, e.g. Miles, Kilometers, Minutes (when using drive times/travel_mode)
        * overlap: boolean, uses overlapping rings/network service areas when True, or non-overlapping disks when False
        * travel_mode: None or string, one of the supported travel modes when using network service areas
    + **Example Buffered Location:** 
    
    `pt = arcgis.geometry.Geometry({"x":-122.435,"y":37.785})`
    
    `buffered_area = BufferStudyArea(area=pt, radii=[1,2,3], units="Miles", overlap=False)` 

- **Network service areas** - `BufferStudyArea` also allows you to define drive time service areas around points as well as other advanced service areas such as walking and trucking.
    + **Example:**
    
    `pt = arcgis.geometry.Geometry({"x":-122.435,"y":37.785})`
    
    `buffered_area = BufferStudyArea(area=pt, radii=[1,2,3], units="Minutes", travel_mode="Driving")` 

- **Named statistical areas** - In all previous examples, study areas were defined as either points or polygons. Study area locations can also be passed as one or many named statistical areas. This form of study area lets you define an area as a standard geographic statistical feature, such as a census or postal area; for example, to obtain enrichment information for a U.S. state, county, or ZIP Code, or a Canadian province or postal code. To combine several NamedArea instances into a single study area (a union), pass them as a list that is itself one element of the list of requested study areas.
    + **Example:** 
    
    `usa.subgeographies.states['California'].zip5['92373']`
   
- **Polygon geometries** - Locations can be given as polygon geometries.
    + **Example Polygon geometry:** 
    
    `arcgis.geometry.Geometry({"rings":[[[-117.185412,34.063170],[-122.81,37.81],[-117.200570,34.057196],[-117.185412,34.063170]]],"spatialReference":{"wkid":4326}})`
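
The point and polygon examples in the list above are plain dictionaries in Esri JSON form. The sketch below builds such dictionaries and closes a polygon ring when the last vertex does not repeat the first. The helper names `point_study_area` and `polygon_study_area` and the sample coordinates are hypothetical; the resulting dictionaries could be wrapped with `arcgis.geometry.Geometry(...)` as in the examples above.

```python
def point_study_area(lon, lat, wkid=4326):
    """Build an Esri JSON point dictionary (longitude is x, latitude is y)."""
    return {"x": lon, "y": lat, "spatialReference": {"wkid": wkid}}

def polygon_study_area(ring, wkid=4326):
    """Build an Esri JSON polygon dictionary from one ring of (x, y) pairs."""
    ring = [list(pt) for pt in ring]
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least three vertices")
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # close the ring by repeating the first vertex
    return {"rings": [ring], "spatialReference": {"wkid": wkid}}

pt = point_study_area(-122.435, 37.785)
poly = polygon_study_area([(-117.19, 34.06), (-117.20, 34.05), (-117.18, 34.05)])
print(pt)
print(poly)
```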


## Standard Named Geographies and their levels

In [None]:
# Discover named geographies and level of detail

sandiego_in_usa = usa.search('San Diego')
print("number of San Diego's in the US: " + str(len(sandiego_in_usa)))

# list a few of them
sandiego_in_usa[:160]

In [None]:
# let's put them on a map
usamap = gis.map('United States')
usamap

In [None]:
for sd in sandiego_in_usa:
    usamap.content.draw(sd.geometry)


In [None]:
# here are the geographic levels:
usa.levels

## Geoenrichment Examples, for different study areas

### Example 1: area around a street address

In [None]:
# Enrich a 1-mile buffer around a street address:

enrich(study_areas=["9500 Gilman Drive La Jolla CA 92093"], data_collections=['Age'])


**If you are writing code for production, it is a good idea to wrap `enrich` in `try`/`except` and to parameterize the inputs.**

In [None]:
import numpy as np

# Sample Study Areas and Enrichment Task
study_areas = ["9500 Gilman Drive La Jolla CA 92093"]
data_collections = ["Age"]

# Enrich data
try:
    enriched_data = enrich(
        study_areas=study_areas,
        data_collections=data_collections
    )
except Exception as e:
    print("Error during enrichment:", e)
    enriched_data = None



In [None]:
# Handle resulting dataframe
if enriched_data is not None:
    # Ensure no chained assignment occurs
    enriched_df = enriched_data.copy()
    enriched_df["SHAPE"] = enriched_df["SHAPE"].replace({np.nan: None})  # Explicitly replace NaN values
    display(enriched_df.head())
else:
    print("No enriched data returned.")
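
One way to go a step further with the try/except pattern above is a small wrapper that retries a failed call. This is a sketch under stated assumptions: `safe_enrich` is a hypothetical helper, and `fake_enrich` is a stand-in for `arcgis.geoenrichment.enrich` so the example runs without a GIS connection.

```python
import time

def safe_enrich(enrich_fn, study_areas, retries=2, wait_seconds=0.0, **enrich_kwargs):
    """Call enrich_fn(study_areas=..., **enrich_kwargs); retry on failure, return None if all attempts fail."""
    for attempt in range(1, retries + 2):
        try:
            return enrich_fn(study_areas=study_areas, **enrich_kwargs)
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(wait_seconds)
    return None

# Stand-in for the real enrich function: fails once, then succeeds
calls = {"n": 0}

def fake_enrich(study_areas, data_collections):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated service error")
    return {"areas": study_areas, "collections": data_collections}

result = safe_enrich(fake_enrich, ["9500 Gilman Drive La Jolla CA 92093"],
                     data_collections=["Age"])
print(result)
```

In the notebook, passing `enrich` as `enrich_fn` would apply the same retry logic to the real service.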



### Example 2: area around a city (???)

In [None]:
# Let's try it for a study area named "San Diego, CA":
# Note that, in this case, too, it creates a 1-mile ring buffer around the center of the study area 


enrich(study_areas=["San Diego, CA"], data_collections=['transportation'])

In [None]:
# let's figure out what variables we obtained in this way:
df = usa.data_collections
df[df.index.values == "transportation"]

### Example 3: compare data by several proposed locations

In [None]:
itm_id = "379bdcc3f34b4407bef1135956edcf4b"
candidate_df = (
    gis.content.get(itm_id).layers[0].query(out_fields="loc_id", as_df=True)
)

candidate_df

In [None]:
analysis_variables = [
    "TOTPOP_CY",  # Population: Total Population (Esri)
    "DIVINDX_CY",  # Diversity Index (Esri)
    "AVGHHSZ_CY",  # Average Household Size (Esri)
    "MEDAGE_CY",  # Age: Median Age (Esri)
    "MEDHINC_CY",  # Income: Median Household Income (Esri)
    "BACHDEG_CY",  # Education: Bachelor's Degree (Esri)
]

analysis_variables

In [None]:
enrich_df = usa.enrich(candidate_df, enrich_variables=analysis_variables)
enrich_df
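
Once candidate locations are enriched, a simple way to compare them is to rescale each variable to the 0-1 range and average the results. The sketch below uses a hypothetical stand-in table with made-up numbers; the column names mirror two of the variables requested above, but the names in the real output may differ in case or prefix, and `rank_candidates` is not part of the ArcGIS API.

```python
import pandas as pd

# Hypothetical stand-in for the enriched candidate table (values are made up)
candidates_demo = pd.DataFrame({
    "loc_id": ["A", "B", "C"],
    "TOTPOP_CY": [12000, 30000, 21000],
    "MEDHINC_CY": [90000, 70000, 85000],
})

def rank_candidates(df, columns, id_col="loc_id"):
    """Min-max scale each column to [0, 1], average the scaled values, and sort by the score."""
    span = df[columns].max() - df[columns].min()
    scaled = (df[columns] - df[columns].min()) / span  # a constant column would give NaN here
    out = df[[id_col]].copy()
    out["score"] = scaled.mean(axis=1)
    return out.sort_values("score", ascending=False).reset_index(drop=True)

ranking = rank_candidates(candidates_demo, ["TOTPOP_CY", "MEDHINC_CY"])
print(ranking)
```

Equal weighting is an arbitrary choice here; in a real siting study the weights would reflect which factors matter most.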

### Example 4: explore demographics along a route

In [None]:
from arcgis.geometry import Polyline
line = Polyline(
    {
        "paths": [[[-13048580, 4036370], [-13046151, 4036366]]],
        "spatialReference": {"wkid": 102100},
    }
)
enriched_line_df = enrich(study_areas=[line], data_collections=["Age"])

In [None]:
# Plot on a map
line_map = gis.map("Redlands, CA")
line_map

In [None]:
# Draw line
line_map.content.draw(line)

# Plot enriched area around line
enriched_line_df.spatial.plot(line_map)
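
The line above is given in Web Mercator coordinates (WKID 102100), which are in meters and hard to read at a glance. As a sanity check, the standard spherical Web Mercator inverse formulas convert them to longitude and latitude in pure Python; the function name `web_mercator_to_lonlat` is hypothetical.

```python
import math

R = 6378137.0  # sphere radius in meters used by Web Mercator

def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator (x, y) in meters to (longitude, latitude) in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

# The two vertices of the polyline used above
for x, y in [[-13048580, 4036370], [-13046151, 4036366]]:
    lon, lat = web_mercator_to_lonlat(x, y)
    print(round(lon, 4), round(lat, 4))
```

Both vertices should fall near Redlands, CA, which is where the map above is centered.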

### Example 5: Constructing study areas (BufferStudyArea) and adding features to them

#### As non-overlapping disks of given radii

Suppose you are trying to explore how close to UCSD students tend to live. You can approach this by examining the proportion of the student-age population within different distance buffers from UCSD.

The example below creates non-overlapping disks of radii 1, 3, and 5 miles around a street address and enriches them using the 'Age' data collection.

In [None]:
buffered = BufferStudyArea(area='UCSD, La Jolla, CA 92093',
                           radii=[1,3,5], units='Miles', overlap=False)
drive_dist_df = enrich(study_areas=[buffered], data_collections=["Age"])
drive_dist_df

In [None]:
# Plot on a map
buffer_map1 = gis.map("La Jolla, CA 92037")
buffer_map1.basemap.basemap = "dark-gray-vector"
buffer_map1

In [None]:
drive_dist_df.spatial.plot(map_widget=buffer_map1)


In [None]:
buffer_map1.zoom_to_layer(drive_dist_df)
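
Because the rings do not overlap, counts from different rings are comparable only after accounting for ring area: the ring between radii r1 and r2 covers π(r2² − r1²). The sketch below computes planar ring areas and a student-age share per ring. The table is a hypothetical stand-in with made-up numbers, and the column names `student_age_pop` and `total_pop` are illustrative, not the names returned by the 'Age' data collection.

```python
import math
import pandas as pd

radii = [1, 3, 5]  # miles, as in the BufferStudyArea above

def ring_areas(radii):
    """Planar area of each non-overlapping ring: pi * (outer^2 - inner^2)."""
    areas, inner = [], 0.0
    for outer in radii:
        areas.append(math.pi * (outer**2 - inner**2))
        inner = outer
    return areas

# Hypothetical per-ring counts (made up for illustration)
rings = pd.DataFrame({
    "radius_miles": radii,
    "student_age_pop": [9000, 14000, 16000],
    "total_pop": [15000, 70000, 160000],
})
rings["area_sq_miles"] = ring_areas(radii)
rings["student_share"] = rings["student_age_pop"] / rings["total_pop"]
rings["students_per_sq_mile"] = rings["student_age_pop"] / rings["area_sq_miles"]
print(rings)
```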

#### As drive-time buffers for given drive times (or walk times, or trucking times)


In [None]:
# Create 5, 10, 15, 20 and 25 minute drive times from a location and enrich these using the 'Age' data collection.
buffered2 = BufferStudyArea(area='700 Prospect Street, La Jolla, CA', 
                           radii=[5, 10, 15, 20, 25], units='Minutes', 
                           travel_mode='Driving')
buffered2

In [None]:
# We can explore what other travel modes are supported:

usa.travel_modes

In [None]:
# let's enrich these buffers, and explore the content 
drive_time_df = enrich(study_areas=[buffered2], data_collections=['Age'])
drive_time_df

In [None]:
# now, we'll show it on a map and then create a feature layer in AGOL:

# Step 1: get the spatial accessor of the enriched Spatially-Enabled DataFrame (SEDF)
drive_time_sedf = drive_time_df.spatial


In [None]:
# spatial objects here are polygons, let's make sure:

drive_time_sedf.geometry_type

In [None]:
# Step 2: show these drive-time buffers on a map
map4 = gis.map('700 Prospect Street, La Jolla, CA')
map4.zoom = 11
map4

In [None]:
map4.basemap.basemap = "gray-vector"
drive_time_df.spatial.plot(map_widget=map4)

In [None]:
# Step 3: now we can save it as a feature layer, and publish on AGOL
drivetime_lj_fl = drive_time_sedf.to_featurelayer(title='drive_from_art_museum', gis=gis, tags='sample', sanitize_columns=True)

Think of other situations where you might use this approach:
 - when you are locating a retail store: what factors might you be interested in?
 - when you are deciding whether to add staff to a nursing home: what factors are you looking at?
 - when you are deciding which polling places to relocate or consolidate: what factors matter?

In [None]:
pd.set_option('display.max_columns', None)
drive_time_df
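
If the drive-time polygons are nested, so that each larger area contains the smaller ones, the enriched counts are cumulative. Differencing consecutive rows then gives the count within each band (0-5 minutes, 5-10 minutes, and so on). This is a sketch on a hypothetical stand-in table: the column names and numbers are made up, and whether the real polygons overlap depends on the `overlap` setting of `BufferStudyArea`.

```python
import pandas as pd

# Hypothetical cumulative totals per drive-time polygon (made-up numbers)
cumulative = pd.DataFrame({
    "drive_minutes": [5, 10, 15, 20, 25],
    "total_pop": [8000, 30000, 75000, 140000, 230000],
})

cumulative = cumulative.sort_values("drive_minutes").reset_index(drop=True)
# The first band keeps its own total; later bands subtract the previous polygon
cumulative["band_pop"] = cumulative["total_pop"].diff().fillna(cumulative["total_pop"])
print(cumulative)
```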

## Working with named statistical areas: counties, zip codes, etc ("standard geographies")

More info at https://developers.arcgis.com/python/guide/part3-where-to-enrich-named-stat-areas/

In [None]:
# Get the Geographic Level
usa.levels

### First, enrich a single zip5 area

In [None]:
# get a single zip code. 
zip92122 = usa.subgeographies.states['California'].zip5['92122']

In [None]:
zip92122_enriched = enrich(study_areas=[zip92122], data_collections=['Age'] )
zip92122_enriched

### Second, enrich all zip codes or tracts in San Diego county

In [None]:
# Define variables to use in enrichment

# from ESRI page example, https://developers.arcgis.com/python/guide/part7-discover-and-enrich-standard-geographies/

variables = usa.enrich_variables
collections = [
    "occupation",
    "Wealth",
    "financial",
    "educationalattainment",
    "language",
    "healthinsurancecoverage",
    "veterans",
    "yearmovedin",
    "yearbuilt",
    "population",
    "housingcosts",
]

enrich_vars = (
    variables[
        variables["name"].str.lower().str.contains("cy")
        & variables["data_collection"].isin(collections)
    ]
    .drop_duplicates("name")
    .reset_index(drop=True)
)

In [None]:
# # my example, using all variables from a data_collection

# enrich_vars = (
#     usa.enrich_variables[
#         (usa.enrich_variables.data_collection == "Age")
#     ]
#     .reset_index(drop=True)
# )

In [None]:
enrich_vars

In [None]:
# get all zip codes in San Diego county
sd_zips = usa.subgeographies.states['California'].counties['San_Diego_County'].zip5
sd_zips

In [None]:
# or, get all tracts:
sd_tracts = usa.subgeographies.states['California'].counties['San_Diego_County'].tracts
sd_tracts

In [None]:
# The best practice is to submit a list of study areas to the enrich operation, so we convert to a list
# Though the enrich operation can also take a dict (output of subgeographies)

sd_list=list(sd_tracts.values())


In [None]:
sd_list

In [None]:
# Let's use a subset of variables:

enrich_vars_subset = (
    usa.enrich_variables[
        (usa.enrich_variables.name.str.lower().str.contains("cy"))
        & (
            (usa.enrich_variables.data_collection == "homevalue")
        )
    ]
    .drop_duplicates("name")
    .reset_index(drop=True)
)

In [None]:
enrich_vars_subset

In [None]:

# Option 1: using subgeographies
# sd_tracts_enriched = enrich(study_areas=sd_tracts, data_collections=['Age'])
sd_tracts_enriched = enrich(study_areas=sd_tracts, analysis_variables=enrich_vars_subset)

# Option 2: using a list of named areas
# sd_tracts_enriched = enrich(study_areas=sd_list, data_collections=['Age'])

sd_tracts_enriched
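
A county can contain hundreds of tracts, and a single large request may be slow or fail. One hedge is to split the list of study areas into batches and enrich each batch separately. The sketch below shows only the batching step; `batched` is a hypothetical helper, the strings are placeholders for the named-area objects in `sd_list`, and the batch size is arbitrary rather than a documented service limit.

```python
def batched(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholders standing in for the named-area objects in sd_list
study_area_ids = [f"tract_{i}" for i in range(7)]
batches = list(batched(study_area_ids, 3))
print([len(b) for b in batches])
```

Each batch could then be passed to `enrich` and the resulting DataFrames combined with `pd.concat`.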

### Enriching a spatially-enabled dataframe 

We've done this earlier: examine how the SEDF named `drive_time_df` (based on driving time from the La Jolla Art Museum) was enriched with demographic data.

Many more examples of this can be constructed or found.

In [None]:
# Here is a failed example of geoenriching a subset of water bodies...

In [None]:
import os
data_location = os.environ["HOME"]+"/public/datasets/"  # in the shared datahub 

In [None]:
shpFileIn = data_location + 'california/water/california_water.shp'
water_sdf = GeoAccessor.from_featureclass(shpFileIn)
water_sdf

In [None]:
# Let's create a small subset with 10 named water bodies, for geoenrichment 

# Filter rows with non-empty NAME
filtered_sedf = water_sdf[water_sdf['NAME'].notnull() & (water_sdf['NAME'] != '')]

# Extract the first 10 rows
water_subset = filtered_sedf.head(10)

# Ensure the SHAPE column is set as the geometry column
water_subset.spatial.set_geometry("SHAPE")

# Display the resulting SEDF
print(water_subset)


In [None]:
# Check that these are polygons

water_subset.spatial.geometry_type

In [None]:
# Check geometry types
geometry_types = water_subset['SHAPE'].apply(lambda geom: type(geom) if geom else None)
print(geometry_types.value_counts())


In [None]:
# now, trying to enrich these polygons...
enriched_water = enrich(study_areas=water_subset.spatial, 
       analysis_variables=["Age.FEM45","Age.FEM55","Age.FEM65"])

In [None]:
enriched_water

### How accurate are the numbers returned for such constructed polygons?

**ApportionmentConfidence** depends on three factors: 

1. **Reliability of the original census data**, on a 1-5 scale, based on 
    1. Census type: from 1 (Census - de jure - Complete Tabulation) to 5 (Sample Survey - de facto)
    1. Census completeness: from 1 (final figure) to 4 (provisional figure with questionable reliability)
        1. consider 2021 ACS estimates as an example
    1. Age of census: from 1 (1-2 years) to 5 (9-10 years)
1. **Ratio of the population polygon to the number of people** (1.0 - 5.0 scale) (populationToPolygonSizeRating)
    1. The larger the area of a census tabulation area, the less likely the specific locations where people live can reliably be found. For large areas with relatively low populations, this means the likelihood of correctly locating where those people live is even lower. 
1. **Complexity of settlement footprint relative to NoData and zero population cells** (1.0 to 5.0)
    1. Based on Landsat 8 panchromatic imagery (15 m resolution). When levels of texture are sufficiently high, the likelihood that it represents human settlement is high. However, because this model is largely completed using raster data, underestimation of the footprint edges occurs due to resampling. The amount of area is proportional to the complexity of the (raster) human settlement footprint. Complexity is measured as the sum of distances from a given cell to all NoData cells within 8 kilometers (this figure is then scaled to 1.0 to 5.0).
    
A problem arises when a study area spans more than one country: in that case both populationToPolygonSizeRating and ApportionmentConfidence are NULL.

For more about data apportionment, see https://developers.arcgis.com/rest/geoenrichment/api-reference/data-apportionment.htm
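
One way to act on this in a notebook is to check the confidence fields before trusting an enriched total. The sketch below is only an illustration: it uses plain dictionaries with made-up values to stand in for enrichment result rows, and the key names follow the spelling used above, which may differ from the column names that enrich() actually returns.

```python
# Hypothetical result rows: the values are placeholders, not real enrichment output.
rows = [
    {"area": "A", "ApportionmentConfidence": 2.6, "populationToPolygonSizeRating": 2.2},
    {"area": "B", "ApportionmentConfidence": None, "populationToPolygonSizeRating": None},
]


def confidence_missing(row):
    """Return True when either confidence field is NULL (None) or absent."""
    return (
        row.get("ApportionmentConfidence") is None
        or row.get("populationToPolygonSizeRating") is None
    )


# Areas whose totals deserve a second look, e.g. because they may span a border
suspect = [row["area"] for row in rows if confidence_missing(row)]
print(suspect)
```

With a real result, the same check could be applied to the returned dataframe once its actual column names have been inspected.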


**When an area for geoenrichment is near an international border - be very careful!**

In [None]:
from arcgis.geometry import Point

# Point in Mexico: find how many people live within 9 miles of this point

pt = Point({"x" : -116.6269, "y" : 32.5766, "spatialReference" : {"wkid" : 4326}}) 
enrich(study_areas=[pt], data_collections=['KeyGlobalFacts'], proximity_value='9', return_geometry=False)

In [None]:
type(pt)

In [None]:
# Moving that point very slightly north (by 11 meters) results in a point in the US:

pt = Point({"x" : -116.6269, "y" : 32.5767, "spatialReference" : {"wkid" : 4326}}) 

enrich(study_areas=[pt], data_collections=['KeyGlobalFacts'], proximity_value='9', return_geometry=False)

Essentially identical areas produce very different geoenrichment results: a total population of about 83 thousand when the center is in Mexico, and about 1,500 when the center is in the US.

Most data collections won't work across borders anyway. The KeyGlobalFacts collection exists for all countries, but it gives incorrect results when the geoenrichment area crosses a border.
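
To see how small the shift between the two centers really is, the distance can be estimated with the haversine formula. This is a minimal sketch that assumes a spherical Earth with a mean radius of 6,371,008.8 m (an assumption made here, not something the enrich service uses); the function name `haversine_m` is hypothetical.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# The two study-area centers used above: same longitude, latitude differs by 0.0001 degrees
shift_m = haversine_m(-116.6269, 32.5766, -116.6269, 32.5767)
print(round(shift_m, 1))
```

The result is a shift on the order of ten meters, which is negligible compared with a 9-mile buffer, so the two study areas are practically the same polygon.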



# Organizing spatial data for analysis
1. Feature layers, feature sets, feature collections, and more...
2. What you can do with them


## Feature Layers, Feature Collections, Feature Sets, Feature Services... 

Terminology may be daunting...

The __feature layer__ is the primary concept for working with features in a GIS. Users create, import, export, analyze, edit, and visualize features, i.e. “entities in space” as feature layers.

Feature layers can be added to and visualized using maps. They act as inputs to and outputs from feature analysis tools.

Feature layers are created by publishing feature data to a GIS, and are exposed as a broader resource (Item) in the GIS. 
__Feature layer instances__ can be obtained through the __layers__ attribute on __feature layer collection__ items in the GIS. A __feature layer collection__ is a collection of feature layers and tables, with the associated relationships among the entities. A feature layer collection is backed by a [feature service](http://server.arcgis.com/en/server/latest/publish-services/windows/what-is-a-feature-service-.htm) in a web GIS.

You will work with several feature services in MP3 to retrieve current data from several sources.



### Find a feature layer collection online

In [None]:
# let's find a feature layer and explore it. 
# Note that "feature layer collection" can be "a group feature layer":
# these may include layers at different levels of resolutions, shown with different symbols, etc.

# Search for freeways:

search_results = gis.content.search('title: USA Freeway System',
                                    'Feature Layer', outside_org=True)

# Access the first item that's returned: this is a 'feature layer collection'
freeways = search_results[0]

freeways


### Find AGOL ID and URL for a feature layer collection

In [None]:
# when you open the feature layer collection in AGOL, notice the ID of that layer collection (in address bar)

# this unique ID can be also retrieved through Python:
print(freeways.id)

# also, feature layers can be accessed via service URLs

print(freeways.url)
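
If you only have the address-bar URL of an item page, the ID can be pulled out of its query string with the standard library. This is a small sketch: the helper name `item_id_from_url` is hypothetical, and the example URL assumes the usual `item.html?id=...` form of an AGOL item page with a placeholder host.

```python
from urllib.parse import urlparse, parse_qs


def item_id_from_url(url):
    """Return the value of the 'id' query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("id")
    return values[0] if values else None


# Assumed address-bar URL of an item page (host name is a placeholder)
url = "https://www.arcgis.com/home/item.html?id=91c6a5f6410b4991ab0db1d7c26daacb"
print(item_id_from_url(url))
```

The extracted string is the same kind of value that `freeways.id` prints.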


### Retrieve a layer collection from AGOL by ID

In [None]:
# here is how you can retrieve the layer collection by ID:

my_new_freeways = gis.content.get('91c6a5f6410b4991ab0db1d7c26daacb')
my_new_freeways

# When you submit MP3, MP4, and final projects, 
# referencing feature layer collections/services with gis.content.get will be the safest option
# They must be shared so that instructors can access them.

### Use the "layers" property to access individual layers in a collection

In [None]:
# this is a "feature layer collection" - so we can discover individual layers via the layers property:
freeways.layers 


In [None]:
# There are two layers here! Why?
#
#
#
#   YOUR THOUGHTS?
#

for lyr in freeways.layers:
    print(lyr.properties.name)

In [None]:
# each layer has properties, including name, e.g:

for lyr in freeways.layers:
    print(lyr.properties.name + " :::  " + lyr.properties.description)

In [None]:
# and here are the fields in the first layer:
for f in freeways.layers[0].properties.fields:
    print(f['name'])


In [None]:
# you can also see how the layer will be rendered (layer properties include rendering information):
print(freeways.layers[0].properties.drawingInfo.renderer) 

### Feature services

In [None]:
# Now, let's look at an example of a __feature service__

# Feature Service: serves a collection of feature layers and tables, 
# with the associated relationships among the entities. 
# following the example with freeways: 

from arcgis.features import FeatureLayerCollection

serviceURL = freeways.url
# or:
serviceURL = 'https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Freeway_System/FeatureServer'

flc_from_url= FeatureLayerCollection(serviceURL, gis=gis)
flc_from_url.layers
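
It can help to see how layer URLs relate to the service URL. In the ArcGIS REST layout, each layer lives at `<service URL>/<layer id>`, and a query against it is sent to `<layer URL>/query`. The sketch below only builds these URLs as strings (it makes no network request); the helper names are hypothetical, and layer id 0 is assumed to exist in this service.

```python
from urllib.parse import urlencode

service_url = (
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/"
    "USA_Freeway_System/FeatureServer"
)


def layer_url(service_url, layer_id):
    """URL of one layer in a feature service: <service URL>/<layer id>."""
    return f"{service_url.rstrip('/')}/{layer_id}"


def query_url(service_url, layer_id, where="1=1", out_fields="*"):
    """URL of a REST query against one layer, asking for a JSON response."""
    params = urlencode({"where": where, "outFields": out_fields, "f": "json"})
    return f"{layer_url(service_url, layer_id)}/query?{params}"


print(layer_url(service_url, 0))
print(query_url(service_url, 0))
```

Pasting the first URL into a browser is a quick way to read a layer's description and fields without writing any code.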


In [None]:
# Let's try another service:

from arcgis.features import FeatureLayerCollection
fs_url= 'https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/Brewery_Locations_in_San_Diego_WFL1/FeatureServer'
breweries = FeatureLayerCollection(fs_url)

In [None]:
breweries.layers # shows layers in the service

In [None]:
breweries.tables # shows tables in the service

In [None]:
# we can also look at properties of each layer or table
breweries.layers[0].properties

In [None]:
# it can also show what operations are possible over this layer:

print(breweries.layers[0].properties.capabilities) 

# or how it is to be rendered:

print(breweries.layers[0].properties.drawingInfo.renderer.type) 


In [None]:
# we can explore the same from AGOL UI:
comm_points = gis.content.get('e435c0dd31c3447db9503272edf7abf0')
comm_points

### Properties of layers. From layers to featuresets, to determine CRS

In [None]:
# How can we determine the CRS of a layer (called "spatial_reference")?

# By converting it to a featureset - using query() without parameters - and then retrieving its spatial reference
# note that "spatial_reference" is a property of a featureset, but not of a layer

query_result1 = breweries.layers[0].query()
type(query_result1)

In [None]:
query_result1.spatial_reference

# see more about spatial reference at https://developers.arcgis.com/web-map-specification/objects/spatialReference/
# latestWkid:: the current Well-Known ID
# wkid:: wkid originally assigned to geometry objects
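
When both identifiers are present, code that needs a single number usually prefers `latestWkid`. A minimal sketch, assuming the spatial reference behaves like a dictionary with these keys; the helper name `preferred_wkid` is hypothetical.

```python
def preferred_wkid(spatial_reference):
    """Prefer latestWkid when present; otherwise fall back to wkid."""
    return spatial_reference.get("latestWkid", spatial_reference.get("wkid"))


# Web Mercator is commonly reported with both identifiers
web_mercator = {"wkid": 102100, "latestWkid": 3857}
# WGS84 longitude/latitude typically carries a single identifier
wgs84 = {"wkid": 4326}

print(preferred_wkid(web_mercator))
print(preferred_wkid(wgs84))
```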


In [None]:
# change spatial reference:
breweries_fs_4326 = breweries.layers[1].query(out_sr=4326)
breweries_fs_4326.spatial_reference

In [None]:
# now you get breweries as SEDF in 4326:

breweries_sedf_4326 = breweries_fs_4326.sdf
breweries_sedf_4326

### Selecting records in a layer

In [None]:
# featurelayer.query() can be used to filter records, e.g.

sedf1 = breweries.layers[1].query(where="TYPE='Brewery'").sdf
sedf1

# query() accepts a SQL WHERE clause through the where parameter (not arbitrary SQL statements);
# spatial filters are passed separately through the geometry_filter parameter

# notice that featureset.sdf will generate a SEDF
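
When the value in a where clause comes from a variable, single quotes inside it have to be doubled, as in standard SQL. The sketch below shows one way to build such a clause; the helper name `where_equals` and the second example value are hypothetical.

```python
def where_equals(field, value):
    """Build "FIELD = 'value'", doubling any single quotes inside the value."""
    escaped = str(value).replace("'", "''")
    return f"{field} = '{escaped}'"


print(where_equals("TYPE", "Brewery"))
print(where_equals("NAME", "O'Brien's Pub"))
```

The resulting string can then be passed as the `where` argument of `query()`.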

### Another way to create a SEDF from a feature layer

In [None]:
# alternatively, it can be converted to a spatially-enabled dataframe directly from feature layer

sedf1 = pd.DataFrame.spatial.from_layer(breweries.layers[1])
sedf1


In [None]:
# Convert back from a SEDF into a feature layer and publish it on AGOL

my_new_featurelayer = sedf1.spatial.to_featurelayer(title="my sample fl", gis=gis, tags='sample')

## Mapping feature layers and SEDFs


In [None]:
# Mapping a feature layer
m = gis.map("USA")
m.content.add(freeways)
m



In [None]:
# mapping a SEDF
# fw_sdf = pd.DataFrame.spatial.from_layer(freeways.layers[0])
m1= gis.map("San Diego, CA")
m1


In [None]:
# here is the simplest way to add a SEDF to a map (we'll use the Breweries)
sedf1.spatial.plot(map_widget=m1)

# In earlier versions of the ArcGIS API for Python, only a limited number of vector features
# could be drawn in a map widget. This now appears to be fixed, but it is still better to use
# feature layers rather than SEDFs when you have many features.


In [None]:
# what would happen if you don't add .spatial to the dataframe?

%matplotlib inline
sedf1.plot()   # not too useful

In [None]:

# Figure out how to draw different symbols by reading https://developers.arcgis.com/python/latest/guide