# GeoEnrichment

GeoEnrichment lets you get facts about a location or area:
* Information about the people, places, and businesses
  * in a specific area, or
  * within a certain distance or drive time from a location.
* Access to a large collection of data sets covering population, income, housing, consumer behavior, and the natural environment.
* Site analysis is a popular application.

In [1]:
from arcgis.gis import GIS
from arcgis.geoenrichment import *

gis = GIS('https://www.arcgis.com', 'arcgis_python', 'P@ssword123')

## GeoEnrichment coverage

In [2]:
countries = get_countries()
print("Number of countries for which GeoEnrichment data is available: " + str(len(countries)))

# print a few countries as a sample
countries[0:10]

Number of countries for which GeoEnrichment data is available: 137


[<Country name:Albania>,
 <Country name:Algeria>,
 <Country name:Andorra>,
 <Country name:Angola>,
 <Country name:Argentina>,
 <Country name:Armenia>,
 <Country name:Aruba>,
 <Country name:Australia>,
 <Country name:Austria>,
 <Country name:Azerbaijan>]

### Filtering countries by properties

In [3]:
[c.properties.name for c in countries if c.properties.continent == 'Australia']

['Australia',
 'French Polynesia',
 'Indonesia',
 'Malaysia',
 'New Caledonia',
 'New Zealand',
 'Philippines']
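
The same filter generalizes to grouping every country by continent. The sketch below runs without a GIS connection: it uses `types.SimpleNamespace` stand-ins (hypothetical sample objects, not real `Country` instances) that expose only the `properties.name` and `properties.continent` attributes used above. The `'Europe'` value is an illustrative assumption, since only `'Australia'` appears in the output above.

```python
from collections import defaultdict
from types import SimpleNamespace


def make_country(name, continent):
    """Build a stand-in exposing .properties.name and .properties.continent."""
    return SimpleNamespace(
        properties=SimpleNamespace(name=name, continent=continent)
    )


def group_by_continent(countries):
    """Map each continent to the list of country names found on it."""
    grouped = defaultdict(list)
    for c in countries:
        grouped[c.properties.continent].append(c.properties.name)
    return dict(grouped)


# Hypothetical sample; with a live connection you would pass get_countries().
sample = [
    make_country("Australia", "Australia"),
    make_country("New Zealand", "Australia"),
    make_country("Albania", "Europe"),  # continent value assumed for illustration
]
by_continent = group_by_continent(sample)
print(by_continent["Australia"])
```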

## Discovering information for a country

For each country you can discover its:
* data collections,
* sub-geographies, and
* available reports.

In [5]:
usa = Country.get('US')

Commonly used properties for the country are accessible using `Country.properties`.

In [6]:
usa.properties.name

'United States'

### Data collections and analysis variables

In [7]:
df = usa.data_collections

# print a few rows of the DataFrame
df.head()

| dataCollectionID | analysisVariable | alias | fieldCategory | vintage |
|---|---|---|---|---|
| 1yearincrements | 1yearincrements.AGE0_CY | 2017 Population Age <1 | 2017 Age: 1 Year Increments (Esri) | 2017 |
| 1yearincrements | 1yearincrements.AGE1_CY | 2017 Population Age 1 | 2017 Age: 1 Year Increments (Esri) | 2017 |
| 1yearincrements | 1yearincrements.AGE2_CY | 2017 Population Age 2 | 2017 Age: 1 Year Increments (Esri) | 2017 |
| 1yearincrements | 1yearincrements.AGE3_CY | 2017 Population Age 3 | 2017 Age: 1 Year Increments (Esri) | 2017 |
| 1yearincrements | 1yearincrements.AGE4_CY | 2017 Population Age 4 | 2017 Age: 1 Year Increments (Esri) | 2017 |


In [8]:
# call the shape property to get the total number of rows and columns
df.shape

(14609, 4)

In [49]:
# get all the unique data collections available
len(df.index.unique())

141

Query the `Age` data collection and get all the unique `analysisVariable` values under that collection.

In [10]:
df.loc['Age']['analysisVariable'].unique()

array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',
       'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',
       'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',
       'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',
       'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',
       'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',
       'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',
       'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)

In [11]:
# view a sample of the `Age` data collection
df.loc['Age'].head()

| dataCollectionID | analysisVariable | alias | fieldCategory | vintage |
|---|---|---|---|---|
| Age | Age.MALE0 | 2017 Males Age 0-4 | 2017 Age: 5 Year Increments (Esri) | 2017 |
| Age | Age.MALE5 | 2017 Males Age 5-9 | 2017 Age: 5 Year Increments (Esri) | 2017 |
| Age | Age.MALE10 | 2017 Males Age 10-14 | 2017 Age: 5 Year Increments (Esri) | 2017 |
| Age | Age.MALE15 | 2017 Males Age 15-19 | 2017 Age: 5 Year Increments (Esri) | 2017 |
| Age | Age.MALE20 | 2017 Males Age 20-24 | 2017 Age: 5 Year Increments (Esri) | 2017 |


### Enriching an address

In [23]:
enrich(study_areas=["380 New York St Redlands CA 92373"],  
       data_collections=['Age'])

Unnamed: 0,FEM0,FEM10,FEM15,FEM20,FEM25,FEM30,FEM35,FEM40,FEM45,FEM5,...,OBJECTID,X,Y,aggregationMethod,areaType,bufferRadii,bufferUnits,bufferUnitsAlias,sourceCountry,SHAPE
0,435,377,417,602,615,577,508,403,386,391,...,1,-117.19567,34.056488,BlockApportionment:US.BlockGroups,RingBuffer,1,esriMiles,Miles,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
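
Note that the enriched columns come back in lexicographic order, so `FEM5` appears after `FEM45`. A minimal sketch of one way to reorder them by age bracket is shown below. The `age_columns` helper is hypothetical (not part of `arcgis`), and the `row` dictionary simply copies the female columns visible in the output above, so the code runs without calling `enrich()`.

```python
import re

# Matches columns such as FEM0 or MALE85: a sex prefix plus the bracket's lower age.
_AGE_COLUMN = re.compile(r"(FEM|MALE)(\d+)")


def age_columns(columns):
    """Return only the age columns, ordered by sex prefix and then numeric age."""
    matched = []
    for name in columns:
        m = _AGE_COLUMN.fullmatch(name)
        if m:
            matched.append((m.group(1), int(m.group(2)), name))
    return [name for _, _, name in sorted(matched)]


# Values copied from the visible part of the output row above.
row = {
    "FEM0": 435, "FEM10": 377, "FEM15": 417, "FEM20": 602, "FEM25": 615,
    "FEM30": 577, "FEM35": 508, "FEM40": 403, "FEM45": 386, "FEM5": 391,
    "OBJECTID": 1,
}
ordered = age_columns(row)
for name in ordered:
    print(name, row[name])
```

With a real result, the same list could be used to select columns, for example `df[age_columns(df.columns)]`.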


## Reports

In [12]:
# print a sample of the reports available for USA
usa.reports.head(10)

|   | id | title | categories | formats |
|---|---|---|---|---|
| 0 | census2010_profile | 2010 Census Profile | [Demographics] | [pdf, xlsx] |
| 1 | acs_housing | ACS Housing Summary | [Demographics] | [pdf, xlsx] |
| 2 | acs_population | ACS Population Summary | [Demographics] | [pdf, xlsx] |
| 3 | 55plus | Age 50+ Profile | [Demographics] | [pdf, xlsx] |
| 4 | agesexrace | Age by Sex by Race Profile | [Demographics] | [pdf, xlsx] |
| 5 | agesex | Age by Sex Profile | [Demographics] | [pdf, xlsx] |
| 6 | cex_auto | Automotive Aftermarket Expenditures | [Consumer Spending] | [pdf, xlsx] |
| 7 | business_loc | Business Locator | [Business] | [pdf, xlsx] |
| 8 | business_summary | Business Summary | [Business] | [pdf, xlsx] |
| 9 | community_profile | Community Profile | [Demographics] | [pdf, xlsx] |


In [13]:
# total number of reports available
usa.reports.shape

(49, 4)

### Creating Reports

In [14]:
report = create_report(study_areas=["380 New York Street, Redlands, CA"],
                     report="tapestry_profileNEW",
                     export_format="PDF", 
                     out_folder=r"c:\xc", out_name="esri_tapestry_profile.pdf")
report

'c:\\xc\\esri_tapestry_profile.pdf'
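
The reports table lists formats in lowercase (`pdf`, `xlsx`), while the call above passes `"PDF"`. Before requesting a report, it can be handy to confirm that the format you want is listed for it. The sketch below is a hypothetical helper, not part of `arcgis`; it works on a plain dictionary that mirrors one row of `usa.reports`. Whether the service itself treats the format string case-sensitively is not stated here, so the comparison is made case-insensitive as an assumption.

```python
def supports_format(report_row, export_format):
    """Return True if export_format is listed for the report (case-insensitive)."""
    wanted = export_format.lower()
    return any(fmt.lower() == wanted for fmt in report_row["formats"])


# One row copied from the reports table above.
acs_housing = {
    "id": "acs_housing",
    "title": "ACS Housing Summary",
    "formats": ["pdf", "xlsx"],
}

print(supports_format(acs_housing, "PDF"))
print(supports_format(acs_housing, "docx"))
```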

## Finding named statistical areas

Each country has several named statistical areas arranged in a hierarchy of geography levels (such as states, counties, and ZIP codes).

In [24]:
%config IPCompleter.greedy=True

In [50]:
usa.subgeographies.states['California'].counties['San_Bernardino_County']

<NamedArea name:"San Bernardino County" area_id="06071", level="US.Counties", country="United States">

In [27]:
usa.subgeographies.states['California'].counties['San_Bernardino_County'].tracts['060710001.03']

<NamedArea name:"060710001.03" area_id="06071000103", level="US.Tracts", country="United States">

In [28]:
usa.subgeographies.states['California'].zip5['92373']

<NamedArea name:"Redlands" area_id="92373", level="US.ZIP5", country="United States">

The named areas can also be drawn on a map, as they include a `geometry` property.

In [29]:
m = gis.map('Redlands, CA', zoomlevel=11)
m

In [30]:
m.draw(usa.subgeographies.states['California'].zip5['92373'].geometry)

### Different geography levels for different countries

In [31]:
india = Country.get('India')

In [32]:
india.subgeographies.states['Uttar_Pradesh'].districts['Baghpat'].subdistricts['Baraut']

<NamedArea name:"Baraut" area_id="09080001", level="IN.Subdistricts", country="India">

### Searching for named areas within a country

In [33]:
riversides_in_usa = usa.search('Riverside')
print("number of riversides in the US: " + str(len(riversides_in_usa)))

# list a few of them
riversides_in_usa[:10]

number of riversides in the US: 83


[<NamedArea name:"Riverside" area_id="147435", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147436", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147437", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147438", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147439", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147440", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147441", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147442", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147443", level="Cities", country="United States">,
 <NamedArea name:"Riverside" area_id="147444", level="Cities", country="United States">]

For instance, you can make a map of all the places named Riverside in the US.

In [34]:
usamap = gis.map('United States', zoomlevel=4)
usamap

In [35]:
for riverside in riversides_in_usa:
    usamap.draw(riverside.geometry)

#### Filtering named areas by geography level

In [30]:
[level['id'] for level in usa.levels]

['US.WholeUSA',
 'US.States',
 'US.DMA',
 'US.CD',
 'US.CBSA',
 'US.Counties',
 'US.CSD',
 'US.ZIP5',
 'US.Places',
 'US.Tracts',
 'US.BlockGroups']

In [29]:
usa.search(query='Riverside', layers=['US.Counties'])

[<NamedArea name:"Riverside County" area_id="06065", level="US.Counties", country="United States">]

## Finding businesses

In [36]:
businesses = find_businesses(search_string="Starbucks",
                          return_geometry=True,
                          spatial_filter={"Locations":["NY,TONAWANDA,14150",
                                                       "NJ, CAMDEN, 08102",
                                                       "KY,LOUISVILLE,40204",
                                                       "WA,SEATTLE,98108"]})
# print a sample
businesses

Unnamed: 0,ADDR,CITY,CONAME,EMPNUM,FRNCOD,HDBRCH,ISCODE,LOCNUM,LOC_NAME,NAICS,...,SIC,SOURCE,SQFTCODE,STATE,STATE_NAME,STATUS,STREET,ZIP,ZIP4,SHAPE
0,5963 CORSON AVE S # A-184,SEATTLE,STARBUCKS,20,4,2,,243104437,StreetAddress,72251505,...,581228,INFOGROUP,B,WA,Washington,M,CORSON AVE S,98108,2619,"{'y': 47.5483999998966, 'spatialReference': {'..."
1,406 PENN ST,CAMDEN,STARBUCKS,15,4,2,,243189941,StreetAddress,72251505,...,581228,INFOGROUP,A,NJ,New Jersey,M,PENN ST,8102,1400,"{'y': 39.9480999999659, 'spatialReference': {'..."
2,972 BAXTER AVE,LOUISVILLE,STARBUCKS,16,4,2,,210648473,PointAddress,72251505,...,581228,INFOGROUP,A,KY,Kentucky,M,BAXTER AVE,40204,2064,"{'y': 38.2401999999322, 'spatialReference': {'..."
3,326 PENN ST,CAMDEN,STARBUCKS,12,4,2,,683534911,StreetAddress,72251505,...,581228,INFOGROUP,A,NJ,New Jersey,M,PENN ST,8102,1410,"{'y': 39.9482999997207, 'spatialReference': {'..."
4,4502 12TH AVE S,SEATTLE,STARBUCKS,15,4,2,,714927777,PointAddress,72251505,...,581228,INFOGROUP,A,WA,Washington,M,12TH AVE S,98108,1805,"{'y': 47.5633000002997, 'spatialReference': {'..."


## Study Areas

### Accepted forms of study areas

- **Street address locations** - Locations can be passed as strings of input street addresses, points of interest or place names.
    + **Example:** `"380 New York St, Redlands, CA"`

- **Multiple field input addresses** - Locations described as multiple field input addresses, using dictionaries.
    + **Example:** 
        {"Address" : "380 New York Street",
        "City" : "Redlands",
        "Region" : "CA",
        "Postal" : 92373}    
 
- **Point and line geometries** - Point and line locations, using `arcgis.geometry` instances.
    + **Example point location:**
    
    `arcgis.geometry.Geometry({"x":-122.435,"y":37.785})`
    
    + **Example point location obtained using `find_businesses()` above:**
     
    `arcgis.geometry.Geometry(businesses.iloc[0]['SHAPE'])`

- **Buffered study areas** - `BufferStudyArea` instances change the ring buffer size or create drive-time service areas around points specified using one of the above methods. `BufferStudyArea` buffers point and street address study areas, and accepts the following parameters:
    * `area`: the point geometry or street address (string) study area to be buffered
    * `radii`: list of distances by which to buffer the study area, e.g. `[1, 2, 3]`
    * `units`: distance unit, e.g. Miles, Kilometers, or Minutes (when using drive times with `travel_mode`)
    * `overlap`: boolean; uses overlapping rings/network service areas when `True`, or non-overlapping disks when `False`
    * `travel_mode`: `None` or a string naming one of the supported travel modes, when using network service areas
    + **Example buffered location:**

          pt = arcgis.geometry.Geometry({"x":-122.435,"y":37.785})
          buffered_area = BufferStudyArea(area=pt, radii=[1,2,3], units="Miles", overlap=False)

- **Network service areas** - `BufferStudyArea` also allows you to define drive-time service areas around points, as well as other advanced service areas such as walking and trucking.
    + **Example:**

          pt = arcgis.geometry.Geometry({"x":-122.435,"y":37.785})
          buffered_area = BufferStudyArea(area=pt, radii=[1,2,3], units="Minutes", travel_mode="Driving")

- **Named statistical areas** - Named areas obtained from a country's `subgeographies`.
    + **Example:** 
    
    `usa.subgeographies.states['California'].zip5['92373']`
   
- **Polygon geometries** - Locations can be given as polygon geometries.
    + **Example polygon geometry:**
    
    `arcgis.geometry.Geometry({"rings":[[[-117.185412,34.063170],[-122.81,37.81],[-117.200570,34.057196],[-117.185412,34.063170]]],"spatialReference":{"wkid":4326}})`
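
When building polygon study areas by hand, a common mistake is leaving a ring open. In the Esri JSON geometry format each ring is a closed loop of four or more points whose first and last points coincide, as in the polygon example above. The sketch below is a small, hypothetical pre-flight check on the plain dictionary (it does not use `arcgis.geometry` and does not validate self-intersections or coordinate ranges).

```python
def ring_is_closed(ring):
    """A ring needs at least four points, with the first equal to the last."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


def polygon_is_closed(polygon):
    """Return True if the dict has at least one ring and every ring is closed."""
    rings = polygon.get("rings") or []
    return bool(rings) and all(ring_is_closed(r) for r in rings)


# The polygon dictionary from the example above.
polygon = {
    "rings": [[[-117.185412, 34.063170], [-122.81, 37.81],
               [-117.200570, 34.057196], [-117.185412, 34.063170]]],
    "spatialReference": {"wkid": 4326},
}

# Same ring with the closing point dropped, to show the failing case.
open_polygon = {"rings": [polygon["rings"][0][:-1]]}

print(polygon_is_closed(polygon))
print(polygon_is_closed(open_polygon))
```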


### Example: Enriching a named statistical area
Enriching zip code 92373 in California using the 'Age' data collection:

In [37]:
redlands = usa.subgeographies.states['California'].zip5['92373']

In [38]:
enrich(study_areas=[redlands], data_collections=['Age'] )

Unnamed: 0,FEM0,FEM10,FEM15,FEM20,FEM25,FEM30,FEM35,FEM40,FEM45,FEM5,...,MALE75,MALE80,MALE85,OBJECTID,StdGeographyID,StdGeographyLevel,StdGeographyName,aggregationMethod,sourceCountry,SHAPE
0,871,931,962,1123,1147,1206,1126,1041,1098,891,...,477,317,362,1,92373,US.ZIP5,Redlands,Query:US.ZIP5,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."


### Example: Enrich all counties in a state

In [39]:
ca_counties = usa.subgeographies.states['California'].counties

In [40]:
counties_df = enrich(study_areas=ca_counties, data_collections=['Age'])
counties_df.head(10)

Unnamed: 0,FEM0,FEM10,FEM15,FEM20,FEM25,FEM30,FEM35,FEM40,FEM45,FEM5,...,MALE75,MALE80,MALE85,OBJECTID,StdGeographyID,StdGeographyLevel,StdGeographyName,aggregationMethod,sourceCountry,SHAPE
0,1039,1027,944,917,965,898,840,795,835,1004,...,349,217,208,1,6021,US.Counties,Glenn County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
1,6438,7053,9401,12114,8568,7964,7310,6905,7450,6820,...,4059,2744,2883,2,6079,US.Counties,San Luis Obispo County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
2,252,264,242,215,215,230,217,241,285,279,...,175,104,103,3,6049,US.Counties,Modoc County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
3,2465,2515,2416,2401,2592,2666,2560,2409,2488,2465,...,1277,746,751,4,6045,US.Counties,Mendocino County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
4,20668,18942,17445,17438,18597,16683,14568,13010,12625,19489,...,3872,2410,2292,5,6107,US.Counties,Tulare County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
5,3458,3423,3102,3179,3518,3412,3086,2858,2881,3453,...,1142,775,638,6,6101,US.Counties,Sutter County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
6,2065,2079,1963,1911,2025,1851,1733,1691,1922,2065,...,1009,652,531,7,6103,US.Counties,Tehama County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
7,4733,5125,5247,5425,5590,5269,4941,4839,5355,4847,...,2882,1770,1710,8,6089,US.Counties,Shasta County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
8,371,458,486,426,438,447,390,427,531,440,...,429,244,183,9,6063,US.Counties,Plumas County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
9,21466,23726,20863,20830,23275,25123,26114,26147,27054,23197,...,9147,6126,6547,10,6081,US.Counties,San Mateo County,Query:US.Counties,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."


In [41]:
m = gis.map('California')
m

In [42]:
lyr = gis.content.import_data(df=counties_df, title="CA county population")

In [43]:
m.add_layer(lyr.layers[0], {'renderer': 'ClassedColorRenderer', 
                            'field_name':'FEM0'})

### Example: Using comparison levels

In [54]:
enrich(study_areas=[redlands], data_collections=['Age'], 
       comparison_levels=['US.Counties', 'US.States'])



Unnamed: 0,FEM0,FEM10,FEM15,FEM20,FEM25,FEM30,FEM35,FEM40,FEM45,FEM5,...,MALE75,MALE80,MALE85,OBJECTID,StdGeographyID,StdGeographyLevel,StdGeographyName,aggregationMethod,sourceCountry,SHAPE
0,871,931,962,1123,1147,1206,1126,1041,1098,891,...,477,317,362,1,92373,US.ZIP5,Redlands,Query:US.ZIP5,US,"{'rings': [[[-117.21461999975689, 34.065140000..."
1,84409,83941,82177,84307,91156,83313,76213,71910,71553,84041,...,26944,18151,16402,2,6065,US.Counties,Riverside County,Query:US.Counties,US,"{'spatialReference': {'wkid': 4326, 'latestWki..."
2,79477,76497,74857,82633,90066,79719,70374,65761,65744,77343,...,17677,10838,9276,3,6071,US.Counties,San Bernardino County,Query:US.Counties,US,"{'spatialReference': {'wkid': 4326, 'latestWki..."
3,1245148,1260883,1269825,1405639,1510193,1429495,1313060,1237282,1250328,1251583,...,408593,268305,266546,4,6,US.States,California,Query:US.States,US,"{'spatialReference': {'wkid': 4326, 'latestWki..."
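
Because the comparison rows share the same columns as the study area, simple ratios show how the study area relates to its enclosing geographies. The sketch below uses the `FEM0` values copied from the output above and a hypothetical `share_of` helper; it is plain arithmetic and does not call the GeoEnrichment service.

```python
def share_of(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# FEM0 values copied from the comparison table above.
fem0 = {
    "Redlands": 871,
    "San Bernardino County": 79477,
    "California": 1245148,
}

for level in ("San Bernardino County", "California"):
    pct = share_of(fem0["Redlands"], fem0[level])
    print(f"Redlands FEM0 as a share of {level}: {pct:.3f}%")
```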


### Example: Buffering locations using non-overlapping disks

The example below creates non-overlapping disks with radii of 1, 3, and 5 miles around a street address and enriches them using the 'Age' data collection.

In [44]:
buffered = BufferStudyArea(area='380 New York St Redlands CA 92373',
                           radii=[1,3,5], units='Miles', overlap=False)
enrich(study_areas=[buffered], data_collections=['Age'])

Unnamed: 0,FEM0,FEM10,FEM15,FEM20,FEM25,FEM30,FEM35,FEM40,FEM45,FEM5,...,OBJECTID,X,Y,aggregationMethod,areaType,bufferRadii,bufferUnits,bufferUnitsAlias,sourceCountry,SHAPE
0,435,377,417,602,615,577,508,403,386,391,...,1,-117.19567,34.056488,BlockApportionment:US.BlockGroups,RingBufferBands,1,Miles,Miles,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
1,1759,1809,2158,2280,2404,2369,2060,1908,1941,1786,...,2,-117.19567,34.056488,BlockApportionment:US.BlockGroups,RingBufferBands,3,Miles,Miles,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
2,2790,2776,2628,3257,3596,2975,2538,2364,2453,2807,...,3,-117.19567,34.056488,BlockApportionment:US.BlockGroups,RingBufferBands,5,Miles,Miles,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
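
The `areaType` of `RingBufferBands` suggests that each row describes a band between consecutive radii rather than a full disk. Assuming the bands are disjoint and together cover the whole disk (an assumption based on the "non-overlapping" description, not verified against the service), counts for the full 1, 3, and 5 mile disks can be recovered with a running sum. The sketch below does this for the `FEM0` values shown above, and also computes planar band areas to show why the outer bands hold more people.

```python
import math
from itertools import accumulate


def band_areas(radii):
    """Planar area of each band between consecutive radii (first band is a disk)."""
    areas = []
    inner = 0.0
    for outer in sorted(radii):
        areas.append(math.pi * (outer ** 2 - inner ** 2))
        inner = outer
    return areas


radii = [1, 3, 5]
fem0_per_band = [435, 1759, 2790]  # FEM0 column from the output above

# Running totals: the count inside the full disk of each radius,
# under the disjoint-bands assumption stated above.
fem0_cumulative = list(accumulate(fem0_per_band))

for r, band, total, area in zip(radii, fem0_per_band, fem0_cumulative, band_areas(radii)):
    print(f"{r} mi: band={band}, cumulative={total}, band area={area:.2f} sq mi")
```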


### Example: Using drive times as study areas
    
The example below creates 5- and 10-minute drive-time areas around a street address and enriches them using the 'Age' data collection.

In [45]:
buffered = BufferStudyArea(area='380 New York St Redlands CA 92373', 
                           radii=[5, 10], units='Minutes', 
                           travel_mode='Driving')
drive_time_df = enrich(study_areas=[buffered], data_collections=['Age'])

In [46]:
drive_time_df

Unnamed: 0,FEM0,FEM10,FEM15,FEM20,FEM25,FEM30,FEM35,FEM40,FEM45,FEM5,...,OBJECTID,X,Y,aggregationMethod,areaType,bufferRadii,bufferUnits,bufferUnitsAlias,sourceCountry,SHAPE
0,368,318,346,492,513,473,406,325,308,329,...,1,-117.19567,34.056488,BlockApportionment:US.BlockGroups,NetworkServiceArea,5,Minutes,Drive Time Minutes,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."
1,2326,2285,2679,3042,3135,2991,2607,2351,2363,2293,...,2,-117.19567,34.056488,BlockApportionment:US.BlockGroups,NetworkServiceArea,10,Minutes,Drive Time Minutes,US,"{'spatialReference': {'latestWkid': 4326, 'wki..."


### Visualize results on a map

The returned spatial dataframe can be visualized on a map as shown below:

In [47]:
redlands_map = gis.map('Redlands, CA')
redlands_map.basemap = 'dark-gray-vector'
redlands_map

In [48]:
redlands_map.draw(drive_time_df.to_featureset())

## Saving GeoEnrichment Results

In [48]:
gis.content.import_data(df=drive_time_df, title="Age statistics within 5,10 minutes of drive time from Esri")