# Advanced Topics

This section provides a brief introduction to advanced topics with the `Spatial DataFrame` structure.

Retrieving and processing information quickly is one of the most important tasks for any software. Enterprise systems, whether or not they store GIS information, rely on indexing to search large data stores quickly and to locate and select specific records.

This document outlines how to access data in a Spatial DataFrame using row and column indexing, and demonstrates building a spatial index on the geometries in a dataframe to help select features based on their location.

 * [Dataframe Index](#Dataframe-Index)
 * [Spatial Index](#Spatial-Index)
 * [Spatial Joins](#Spatial-Joins)
   * [Example: Merging State Statistics Information with Cities](#Example:-Merging-State-Statistics-Information-with-Cities)

## Dataframe Index



In [99]:
from arcgis.gis import GIS

In [178]:
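# Connect anonymously to ArcGIS Online and fetch the USA Major Cities item.
item = GIS().content.get("85d0ca4ea1ca4b9abf0c51b9bd34de2e")
flayer = item.layers[0]
# Query the first layer and convert the result to a Spatial DataFrame.
df = flayer.query(where="AGE_45_54 < 1500").df
df.head()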

| | AGE_10_14 | AGE_15_19 | AGE_20_24 | AGE_25_34 | AGE_35_44 | AGE_45_54 | AGE_55_64 | AGE_5_9 | AGE_65_74 | AGE_75_84 | ... | PLACEFIPS | POP2010 | POPULATION | POP_CLASS | RENTER_OCC | ST | STFIPS | VACANT | WHITE | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1413 | 1381 | 1106 | 2138 | 1815 | 1411 | 979 | 1557 | 525 | 307 | ... | 468080 | 14287 | 14980 | 6 | 1074 | AZ | 4 | 261 | 9196 | {'x': -12768343.256613126, 'y': 3842463.708135... |
| 1 | 727 | 738 | 677 | 1380 | 1185 | 1333 | 1087 | 740 | 661 | 444 | ... | 602042 | 9932 | 10239 | 6 | 2056 | CA | 6 | 267 | 8273 | {'x': -13613950.337588644, 'y': 4931686.754090... |
| 2 | 593 | 511 | 2323 | 2767 | 746 | 127 | 34 | 1229 | 4 | 2 | ... | 610561 | 10616 | 11869 | 6 | 2558 | CA | 6 | 296 | 7530 | {'x': -13066582.116550362, 'y': 3925650.676616... |
| 3 | 888 | 988 | 900 | 1729 | 1479 | 1443 | 959 | 766 | 514 | 280 | ... | 613560 | 10866 | 11195 | 6 | 761 | CA | 6 | 86 | 5898 | {'x': -13123874.446103057, 'y': 4044249.710416... |
| 4 | 1086 | 1228 | 1013 | 1822 | 1759 | 1478 | 1112 | 925 | 687 | 477 | ... | 614974 | 12823 | 13009 | 6 | 1763 | CA | 6 | 88 | 6930 | {'x': -13151212.145276317, 'y': 4027601.332347... |


In [188]:
df.sort_values(by='POP2010', ascending=False)[['OBJECTID', 'NAME', 'ST', 'POP2010', 'AVE_FAM_SZ']]

| | OBJECTID | NAME | ST | POP2010 | AVE_FAM_SZ |
|---|---|---|---|---|---|
| 65 | 1709 | The Villages | FL | 51442 | 2.05 |
| 292 | 3274 | State College | PA | 42034 | 2.71 |
| 110 | 792 | Sun City | AZ | 37499 | 2.11 |
| 29 | 1252 | Fort Hood | TX | 29589 | 3.75 |
| 303 | 3431 | Rexburg | ID | 25484 | 3.17 |
| 111 | 793 | Sun City West | AZ | 24535 | 2.05 |
| 220 | 2390 | Athens | OH | 23832 | 2.74 |
| 11 | 198 | Isla Vista | CA | 23096 | 2.94 |
| 129 | 930 | Eagle Mountain | UT | 21415 | 4.34 |
| 107 | 767 | Green Valley | AZ | 21391 | 2.08 |


In [121]:
df.columns

Index(['AGE_10_14', 'AGE_15_19', 'AGE_20_24', 'AGE_25_34', 'AGE_35_44',
       'AGE_45_54', 'AGE_55_64', 'AGE_5_9', 'AGE_65_74', 'AGE_75_84',
       'AGE_85_UP', 'AGE_UNDER5', 'AMERI_ES', 'ASIAN', 'AVE_FAM_SZ',
       'AVE_HH_SZ', 'BLACK', 'CAPITAL', 'CLASS', 'FAMILIES', 'FEMALES',
       'FHH_CHILD', 'FID', 'HAWN_PI', 'HISPANIC', 'HOUSEHOLDS', 'HSEHLD_1_F',
       'HSEHLD_1_M', 'HSE_UNITS', 'MALES', 'MARHH_CHD', 'MARHH_NO_C',
       'MED_AGE', 'MED_AGE_F', 'MED_AGE_M', 'MHH_CHILD', 'MULT_RACE', 'NAME',
       'OBJECTID', 'OTHER', 'OWNER_OCC', 'PLACEFIPS', 'POP2010', 'POPULATION',
       'POP_CLASS', 'RENTER_OCC', 'ST', 'STFIPS', 'VACANT', 'WHITE', 'SHAPE'],
      dtype='object')

In [42]:
df.loc[4, ['NAME', 'ST', 'POP2010', 'MED_AGE']]

NAME       Commerce
ST               CA
POP2010       12823
MED_AGE        31.1
Name: 4, dtype: object

In [152]:
df.loc[298, 'SHAPE']

{'x': -8538585.881145325,
 'y': 4476193.137276442,
 'spatialReference': {'wkid': 102100, 'latestWkid': 3857}}
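
The cells above select rows by index label with `.loc`. To make the difference between label-based and position-based access concrete, here is a small sketch in plain pandas that uses three rows copied from the sorted output above. It assumes only that pandas is installed; the names `sample` and `by_pop` are illustrative and not part of the ArcGIS API.

```python
import pandas as pd

# Three rows taken from the sorted output above; the index holds the
# original row labels (110, 65, 292) of the queried dataframe.
sample = pd.DataFrame(
    {
        "NAME": ["Sun City", "The Villages", "State College"],
        "ST": ["AZ", "FL", "PA"],
        "POP2010": [37499, 51442, 42034],
    },
    index=[110, 65, 292],
)

# Sorting reorders the rows but keeps each row's index label.
by_pop = sample.sort_values(by="POP2010", ascending=False)

# .loc selects by label: the row labelled 292, wherever it sits.
name_by_label = by_pop.loc[292, "NAME"]

# .iloc selects by position: the first row after sorting.
name_by_position = by_pop.iloc[0]["NAME"]

print(name_by_label)      # State College
print(name_by_position)   # The Villages
```

Because sorting leaves the labels untouched, `.loc[292]` returns the same feature before and after the sort, while `.iloc[0]` depends on the current row order.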

## Spatial Index
Spatial indexes are based on the concept of a minimum bounding rectangle: the smallest rectangle that contains an entire geometric shape. Comparing rectangles is inexpensive relative to comparing full geometries composed of numerous coordinate pairs, so a spatial index can quickly narrow down which complex lines and irregularly shaped polygons need to be examined when analyzing relationships between features.
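
As a rough illustration of why rectangles are cheap to compare, the following sketch computes minimum bounding rectangles for two made-up geometries and tests whether the rectangles overlap. The helper names `bounding_box` and `boxes_intersect` and the coordinates are hypothetical and are not part of the ArcGIS API.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_box(coords: List[Point]) -> BBox:
    """Return the minimum bounding rectangle of a list of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True when two rectangles overlap or touch (four comparisons)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical geometries: an L-shaped polygon and a short line.
polygon = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
line = [(2, 2), (3, 4)]

poly_box = bounding_box(polygon)  # (0, 0, 4, 3)
line_box = bounding_box(line)     # (2, 2, 3, 4)

print(boxes_intersect(poly_box, line_box))  # True
```

Here the rectangles overlap even though the line never touches the L-shaped polygon. A bounding-box test is therefore a fast first filter that yields candidates, and an exact geometric test is still needed to confirm them.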

The Spatial DataFrame uses an implementation of spatial indexing known as [QuadTree indexing](https://en.wikipedia.org/wiki/Quadtree), which breaks a dataset down into nodes that have either zero or four children and searches these nodes when determining the locations, relationships and attributes of specific features. In the [**Examining Feature Layer content**](#Example:-Examining-Feature-Layer-content) section of this notebook, the USA Major Cities feature layer was queried and the `df` property of the results was used to create a data frame. The [`sindex`](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.html?highlight=style#arcgis.features.SpatialDataFrame.sindex) property of that data frame creates a quad tree index.

The `intersect` method of the resulting index takes a bounding box as input (four coordinates giving the minimum x, minimum y, maximum x and maximum y) and returns a list of the row positions of the features that intersect that bounding box. Those positions can then be passed to `iloc` to retrieve the matching rows.

In [None]:
si = df.sindex
# Bounding box order: xmin, ymin, xmax, ymax
index = si.intersect([-13243219.102301877, 3911134.034258818,
                      -13043219.122301877, 4111134.0542588173])

In [None]:
df.iloc[index]
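
To show the idea behind a quad tree index, here is a minimal, self-contained point quadtree. It is a sketch for illustration only and is not the implementation used by the Spatial DataFrame. The class, its parameters (`capacity`, `max_depth`) and the sample points are all assumptions. Its `intersect` method mimics the behaviour described above by returning row positions for a bounding box.

```python
class QuadTree:
    """Minimal point quadtree: every node has zero or four children."""

    def __init__(self, bbox, capacity=4, depth=0, max_depth=8):
        self.bbox = bbox            # (xmin, ymin, xmax, ymax)
        self.capacity = capacity    # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth  # guards against endless splitting
        self.points = []            # (x, y, row_position) tuples
        self.children = []          # empty, or exactly four QuadTree nodes

    def _contains(self, x, y):
        xmin, ymin, xmax, ymax = self.bbox
        return xmin <= x <= xmax and ymin <= y <= ymax

    def _overlaps(self, other):
        xmin, ymin, xmax, ymax = self.bbox
        return (xmin <= other[2] and other[0] <= xmax
                and ymin <= other[3] and other[1] <= ymax)

    def _split(self):
        xmin, ymin, xmax, ymax = self.bbox
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        quadrants = [(xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
                     (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax)]
        self.children = [
            QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
            for q in quadrants
        ]
        # Push the points stored here down into the new children.
        points, self.points = self.points, []
        for point in points:
            self._insert_into_child(point)

    def _insert_into_child(self, point):
        # The first child whose rectangle contains the point takes it.
        return any(child.insert(*point) for child in self.children)

    def insert(self, x, y, row_position):
        if not self._contains(x, y):
            return False
        if self.children:
            return self._insert_into_child((x, y, row_position))
        self.points.append((x, y, row_position))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()
        return True

    def intersect(self, bbox):
        """Return the row positions of all points inside bbox."""
        if not self._overlaps(bbox):
            return []  # skip this whole branch without checking its points
        xmin, ymin, xmax, ymax = bbox
        hits = [pos for x, y, pos in self.points
                if xmin <= x <= xmax and ymin <= y <= ymax]
        for child in self.children:
            hits.extend(child.intersect(bbox))
        return hits


# Hypothetical point locations; the list position plays the role of the row.
points = [(1, 1), (2, 8), (3, 3), (6, 2), (7, 7), (9, 9), (4, 4)]
tree = QuadTree((0, 0, 10, 10), capacity=2)
for position, (x, y) in enumerate(points):
    tree.insert(x, y, position)

hits = sorted(tree.intersect((2, 2, 7, 7)))
print(hits)                       # [2, 3, 4, 6]
print([points[i] for i in hits])  # [(3, 3), (6, 2), (7, 7), (4, 4)]
```

The last line plays the same role as `df.iloc[index]`: the index returns positions, and the positions are used to look up the actual rows.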

## Spatial Joins

A spatial join is a GIS operation that attaches data from one feature layer’s attribute table to another based on a spatial relationship.

The spatial join matches rows from the join features (`df2` in the example below) to the target features (`df1`) based on their relative spatial locations.
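
The matching step can be sketched in plain Python. The example below joins hypothetical point features to hypothetical rectangular regions by containment and keeps only the matched rows. The function name `join_points_to_regions`, the data, and the inner-join behaviour are all assumptions for illustration; this does not reproduce how the library's `spatial_join` function works internally.

```python
# Target features: points with attributes (hypothetical values).
cities = [
    {"NAME": "A-town", "x": 1.0, "y": 1.0},
    {"NAME": "B-ville", "x": 6.0, "y": 2.0},
    {"NAME": "C-burg", "x": 20.0, "y": 20.0},  # outside every region
]

# Join features: rectangles standing in for polygons (hypothetical values).
regions = [
    {"REGION": "West", "bbox": (0.0, 0.0, 5.0, 5.0)},
    {"REGION": "East", "bbox": (5.0, 0.0, 10.0, 5.0)},
]


def contains(bbox, x, y):
    """True when (x, y) falls inside the half-open rectangle bbox."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x < xmax and ymin <= y < ymax


def join_points_to_regions(targets, joins):
    """Attach each join row's attributes to the target points it contains."""
    joined = []
    for target in targets:
        for join in joins:
            if contains(join["bbox"], target["x"], target["y"]):
                row = dict(target)
                # Copy every join attribute except the geometry itself.
                row.update({k: v for k, v in join.items() if k != "bbox"})
                joined.append(row)
    return joined


result = join_points_to_regions(cities, regions)
for row in result:
    print(row["NAME"], row["REGION"])
# A-town West
# B-ville East
```

Each output row carries the target's own attributes plus those of the join feature that contains it. The point that falls outside every region is dropped in this sketch.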

### Example: Merging State Statistics Information with Cities

The goal is to join Wyoming's city locations and census data with Wyoming's state census data.

> If you do not have access to the `ArcPy` site-package from the Python interpreter used to execute the following cells, you must authenticate to an ArcGIS Online organization or ArcGIS Enterprise portal, for example:
>
> `g3 = GIS("https://www.arcgis.com", "username", "password")`

In [None]:
g2 = GIS("https://pythonapi.playground.esri.com/portal", "arcgis_python", "amazing_arcgis_123")

In [None]:
from arcgis.features import SpatialDataFrame

In [None]:
import os
#data_pth = r'/path/to/your/data/census_2010/example'
data_pth = r"/Volumes/Data/My_Projects/Python_API/notebooks/data/census_2010/"
cities = r"cities.shp"
states = r"states.shp"

In [None]:
sdf_target = SpatialDataFrame.from_featureclass(os.path.join(data_pth, cities))
sdf_join = SpatialDataFrame.from_featureclass(os.path.join(data_pth, states))

We will use Python's list comprehensions to create lists of the attribute columns in each dataframe, then print the lists to see the names of all the attribute columns.

In [None]:
sdf_target_cols = [column for column in sdf_target.columns]
sdf_join_cols = [column for column in sdf_join.columns]

Print out a list of columns in the `sdf_target` dataframe created from the cities shapefile:

In [None]:
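from itertools import zip_longest
# zip_longest pads the last row, so no column is dropped when the count is not a multiple of 4.
for row in zip_longest(*(sdf_target_cols[i::4] for i in range(4)), fillvalue=""):
    print("{:<30}{:<30}{:<30}{:<}".format(*row))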

Print out a list of columns in the `sdf_join` dataframe created from the states shapefile:

In [None]:
from itertools import zip_longest
# zip_longest pads the last row, so no column is dropped when the count is not a multiple of 5.
for row in zip_longest(*(sdf_join_cols[i::5] for i in range(5)), fillvalue=""):
    print("{:<20}{:<20}{:<20}{:<20}{:<}".format(*row))

Create a dataframe for the cities in Wyoming:

In [None]:
q = sdf_target['ST'] == 'WY'
left = sdf_target[q].copy()
left.head()

Create a dataframe for the state of Wyoming:

In [None]:
q = sdf_join.STATE_ABBR == 'WY'
right = sdf_join[q].copy()
right.head()

Perform the spatial join:

In [None]:
from arcgis.features._data.geodataset.tools import spatial_join

In [None]:
sdf2 = spatial_join(df1=left, df2=right)
sdf2.head()