# Tutorial 06-02: Working with Spatially Enabled DataFrames
Now suppose our colleagues at GeoNinjas PythonAnalytics are happy with the methodology in our summary but have found that neighborhoods aren't granular enough for the analysis they want to do. They've asked us to repeat the analysis and summarize by census block group instead. This is interesting because census block group is not an attribute in our raw data. We'll have to perform a spatial join to assign each 311 case to the census block group it falls in.

## Repeat Your Previous Analysis

Being able to repeat an analysis is one of the most rewarding things about coding in general: you get to reuse your code instead of rewriting all of the summary logic. Copy the code from the previous exercise (minus some of the data exploration) to produce a clean dataset that's ready to summarize.

In [1]:
```python
import pandas
import arcgis

# read the 311 CSV
df_311 = pandas.read_csv("../Chapter 05 - Jupyter Notebooks/311_cases.csv")

# drop the DELETE columns
drop_cols = [c for c in df_311.columns if "DELETE" in c]
df_311 = df_311.drop(columns = drop_cols)

# exclude any records with invalid Latitude/Longitude
# (San Francisco latitudes are positive, so zero or missing values are dropped)
df_311 = df_311[df_311['Latitude'] > 0]

# convert Opened/Closed to datetime
df_311['Opened'] = pandas.to_datetime(df_311['Opened'])
df_311['Closed'] = pandas.to_datetime(df_311['Closed'])

# subtract the Opened time from the Closed time to get the OpenTime duration
df_311['OpenTime'] = df_311['Closed'] - df_311['Opened']
```


## Work with Spatially Enabled DataFrames

#### 1. Create a Spatially Enabled DataFrame

To perform any spatial analysis, you’ll have to create some geometry to work with.  In this case, you’ll create a Spatially Enabled DataFrame (discussed in the previous chapter).  You’ll need to import the **arcgis** package first to access the spatial namespace/accessor.  Then you can use the `from_xy()` method that allows you to create geometry based on x/y or Latitude/Longitude columns, which you already have in the DataFrame.

In [2]:
```python
import arcgis

df_311 = pandas.DataFrame.spatial.from_xy(
    df = df_311,
    x_column = 'Longitude', 
    y_column = 'Latitude',
    sr = 4326
)
```

In the code above, you're creating a new pandas DataFrame and overwriting the one previously stored in the `df_311` variable. The new DataFrame adds a spatial component built from the *Latitude* and *Longitude* columns. This functionality wouldn't be available if you hadn't imported the ArcGIS API for Python in the previous line.

Spatially Enabled DataFrames behave like all the other DataFrames you've worked with so far. When you look at the top five rows by calling the `head()` method, though, you'll see one major difference. There is a column called *SHAPE* whose values look like dictionaries containing coordinates. This is the Python representation of the geometries of the individual 311 cases.

In [3]:
```python
df_311[['Latitude','Longitude','SHAPE']].head()
```

| | Latitude | Longitude | SHAPE |
|---|---|---|---|
| 0 | 37.776009 | -122.4102 | {"spatialReference": {"wkid": 4326}, "x": -122... |
| 1 | 37.798813 | -122.424171 | {"spatialReference": {"wkid": 4326}, "x": -122... |
| 3 | 37.793764 | -122.407756 | {"spatialReference": {"wkid": 4326}, "x": -122... |
| 5 | 37.765979 | -122.455519 | {"spatialReference": {"wkid": 4326}, "x": -122... |
| 6 | 37.752404 | -122.415491 | {"spatialReference": {"wkid": 4326}, "x": -122... |
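To see what those *SHAPE* values contain, here's a minimal sketch that builds the same kind of Esri JSON point dictionaries by hand. This is not how `from_xy()` is implemented. `points_from_xy` is a hypothetical helper written only for illustration, and the sample coordinates are taken from the first two rows of the output above. The sketch does highlight one detail that's easy to get backwards: *x* is the longitude and *y* is the latitude.

```python
# Hypothetical helper (not part of the ArcGIS API for Python): builds Esri JSON
# point dictionaries like the ones shown in the SHAPE column.
def points_from_xy(rows, x_key="Longitude", y_key="Latitude", wkid=4326):
    """Return one Esri JSON point dict per input row."""
    shapes = []
    for row in rows:
        shapes.append({
            "spatialReference": {"wkid": wkid},
            "x": row[x_key],  # x is the longitude
            "y": row[y_key],  # y is the latitude
        })
    return shapes

rows = [
    {"Latitude": 37.776009, "Longitude": -122.4102},
    {"Latitude": 37.798813, "Longitude": -122.424171},
]
shapes = points_from_xy(rows)
print(shapes[0])  # → {'spatialReference': {'wkid': 4326}, 'x': -122.4102, 'y': 37.776009}
```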


#### 2.  Read the Census Block Group Data

Now you can start working with spatial data. You'll use the *spatial* namespace that the ArcGIS API for Python provides.

You can start by using the `.from_featureclass()` method included in the spatial namespace to read a feature class in the included tutorial data.

In [4]:
```python
df_cbg = pandas.DataFrame.spatial.from_featureclass(
    "./Tutorial_06_02.gdb/Census_Block_Groups"
)
```

Note that this data also has a *SHAPE* column, but its values look slightly different from those in the 311 dataset. That's because the 311 data is point data, while this data is polygon data. Each polygon geometry is a JSON object that starts with the key *rings*, which holds sets of vertex coordinates.

In [5]:
```python
df_cbg.head()
```

| | OBJECTID | geoid | SHAPE |
|---|---|---|---|
| 0 | 1 | 60750127001 | {"rings": [[[-122.44958399999996, 37.803153001... |
| 1 | 2 | 60750127003 | {"rings": [[[-122.44777499999998, 37.801007000... |
| 2 | 3 | 60750263021 | {"rings": [[[-122.43791799999997, 37.711770000... |
| 3 | 4 | 60750302021 | {"rings": [[[-122.47072999999995, 37.765779001... |
| 4 | 5 | 60750205003 | {"rings": [[[-122.43916699899995, 37.759028001... |
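A polygon geometry nests one level deeper than a point. *rings* is a list of rings, and each ring is a list of `[x, y]` vertices. Each ring is closed by repeating its first vertex at the end, and the vertices of an outer ring are listed clockwise, which is the Esri convention. The sketch below walks that structure to compute a bounding box. It uses a made-up square near the first block group and a hypothetical `extent()` helper.

```python
# A made-up Esri JSON polygon: one clockwise outer ring, closed by repeating
# the first vertex at the end.
polygon = {
    "rings": [[
        [-122.45, 37.80], [-122.45, 37.81],
        [-122.44, 37.81], [-122.44, 37.80],
        [-122.45, 37.80],
    ]],
    "spatialReference": {"wkid": 4326},
}

def extent(geom):
    """Hypothetical helper: return (xmin, ymin, xmax, ymax) over all ring vertices."""
    xs = [x for ring in geom["rings"] for x, _ in ring]
    ys = [y for ring in geom["rings"] for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

print(extent(polygon))  # → (-122.45, 37.8, -122.44, 37.81)
```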


#### 3. Spatially Join the two DataFrames

Now that you have two Spatially Enabled DataFrames, you can perform a spatial join. If you've used ArcGIS desktop software, the spatial join is probably a familiar concept. If not, it's similar to a table join, except that records from the two tables are matched by a spatial relationship (here, a point intersecting a polygon) instead of by a shared key. In this case, the 311 records are the left table and the census block group records are joined on the right. You'll use a left join so that every 311 record is kept, even if it doesn't fall inside any block group.

In [6]:
```python
df_join = df_311.spatial.join(
    right_df = df_cbg,
    how = 'left',
    op = 'intersects',
)
```
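Conceptually, a left spatial join with `op='intersects'` asks, for each point, which polygon contains it. When no polygon does, the point is kept with empty right-hand columns. The sketch below illustrates that idea in plain Python with a ray-casting point-in-polygon test. Ray casting is a swapped-in teaching technique, not what the ArcGIS API for Python does internally. The sketch makes several simplifying assumptions: polygons have a single ring and no holes, and points lying exactly on a boundary are not handled consistently (`intersects` would count them as matches). The block groups, geoids, and helper names are all hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: toggle 'inside' each time a ray cast to the right crosses an edge."""
    inside = False
    # The ring is closed, so pairing each vertex with the next one visits every edge.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def left_spatial_join(points, polygons):
    """Attach the geoid of the first polygon containing each point, or None."""
    joined = []
    for pt in points:
        match = None
        for poly in polygons:
            if any(point_in_ring(pt["x"], pt["y"], ring) for ring in poly["rings"]):
                match = poly["geoid"]
                break
        joined.append({**pt, "geoid": match})  # left join: every point is kept
    return joined

# Two hypothetical unit-square block groups side by side
block_groups = [
    {"geoid": "A", "rings": [[(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]]},
    {"geoid": "B", "rings": [[(1, 0), (1, 1), (2, 1), (2, 0), (1, 0)]]},
]
cases = [
    {"CaseID": 1, "x": 0.5, "y": 0.5},
    {"CaseID": 2, "x": 1.5, "y": 0.25},
    {"CaseID": 3, "x": 5.0, "y": 5.0},  # outside every polygon, like the unjoined 311 case
]

result = left_spatial_join(cases, block_groups)
for row in result:
    print(row["CaseID"], row["geoid"])  # → 1 A, then 2 B, then 3 None
```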

#### 4.  Analyze the results of the join

After performing your join, it's worth doing some QA/QC to make sure the records joined correctly. We'd expect each of the 311 records to have a corresponding census block group. The census block group attribute you'll use is *geoid*, so check whether any *geoid* values in the joined DataFrame are blank.

You can use the built-in `pandas.isnull()` function to identify gaps in your data. Here, blanks in the *geoid* column would indicate records that didn't join with (that is, didn't intersect) the census block group dataset.
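As a quick illustration, `pandas.isnull()` returns a Boolean mask that you can use to filter rows. The equivalent `.isna()` method combined with `.sum()` counts the unjoined records. The `toy` DataFrame and its values are made up for this example.

```python
import pandas

# Made-up stand-in for df_join: case 3 did not fall inside any block group.
toy = pandas.DataFrame({
    "CaseID": [1, 2, 3],
    "geoid": ["60750101011", "60750101012", None],
})

# Boolean mask -> only the rows with a missing geoid
unmatched = toy[pandas.isnull(toy.geoid)]
print(unmatched)

# The .isna() method gives the same mask; summing it counts the True values.
print(toy["geoid"].isna().sum())  # → 1
```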

In [7]:
```python
df_join[pandas.isnull(df_join.geoid)]
```

| | CaseID | Opened | Closed | Updated | Status | Status Notes | Responsible Agency | Category | Request Type | Request Details | ... | Central Market/Tenderloin Boundary Polygon - Updated | HSOC Zones as of 2018-06-05 | OWED Public Spaces | Parks Alliance CPSI (27+TL sites) | Neighborhoods | OpenTime | SHAPE | index_right | OBJECTID | geoid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9963 | 17580934 | 2023-11-22 08:26:00 | NaT | 11/22/2023 08:31:32 AM | Open | open | DPW BSSR Queue | Street Defects | Pavement_Defect | Pavement_Defect | ... | | | | | | NaT | {"spatialReference": {"wkid": 4326}, "x": -122... | | | |


If everything went well, this would return no records. In this case, though, it returned a single record. Looking at the attributes alone probably won't tell you why. Plotting the two source layers on a map might give you a better idea of why one record didn't join.

In [8]:
```python
# create a GIS object
gis = arcgis.GIS()

# create a map and set our Area of Interest
qc_map = gis.map("San Francisco, CA")

# plot the census block groups
df_cbg.spatial.plot(
    colors="#fafafa",
    map_widget = qc_map,
)

# narrow down the 311 records to the one that didn't join
null_record = df_join[pandas.isnull(df_join.geoid)][['SHAPE','geoid']]

# plot the null record
null_record.spatial.plot(
    map_widget=qc_map,
    renderer_type='s',
    symbol_type='simple',
    symbol_style='d', # d - for diamonds
    colors='Blues',
    marker_size=16
)

qc_map
```


In this case, you can see that the record that didn't join lies outside the city/county boundary, so it's probably safe to ignore it for the purposes of this analysis.

## Summarize the Data

#### 1. Use pandas methods to summarize by census block group

As in the first exercise, you can use built-in pandas methods to summarize by census block group. In the first tutorial, you summarized by neighborhood. Here, you'll use the same syntax and change only the column you group by.

In [9]:
```python
df_cbg_summary = df_join.groupby("geoid").agg(
    {
        "OpenTime": "mean",
        "CaseID": "count"
    }
)
```
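This aggregation has one subtlety. `mean` skips missing (`NaT`) durations, such as cases that are still open, while `count` counts every case that has a *CaseID*. The toy data below illustrates the difference. The geoids `A` and `B` and the durations are hypothetical.

```python
import pandas

# Made-up cases: case 103 is still open, so its OpenTime is NaT.
toy = pandas.DataFrame({
    "geoid": ["A", "A", "A", "B"],
    "CaseID": [101, 102, 103, 104],
    "OpenTime": pandas.to_timedelta(["1 days", "3 days", None, "2 days"]),
})

summary = toy.groupby("geoid").agg({"OpenTime": "mean", "CaseID": "count"})
# For A, the mean uses only the two closed cases (2 days), but the count is 3.
print(summary)
```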

#### 2. Convert OpenTime to a number of days

This summary works, but to map it you'll need to convert the *OpenTime* column into something numeric. It's currently a *timedelta64[ns]* column, as you can see by looking at the DataFrame's dtypes.

In [10]:
```python
df_cbg_summary.dtypes
```

```text
OpenTime    timedelta64[ns]
CaseID                int64
dtype: object
```

Next, you'll use the `days` property of the pandas `.dt` accessor to convert each timedelta to a number of days.

In [11]:
```python
df_cbg_summary['OpenTime'] = df_cbg_summary['OpenTime'].dt.days
```

If you look at the first few rows now, you can see that the *OpenTime* column holds whole numbers of days that you can use for mapping.

In [12]:
```python
df_cbg_summary.head()
```

| geoid | OpenTime | CaseID |
|---|---|---|
| 60750101011 | 0 | 227 |
| 60750101012 | 5 | 31 |
| 60750101021 | 1 | 65 |
| 60750102011 | 1 | 70 |
| 60750102012 | 1 | 15 |
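Keep in mind that `.dt.days` keeps only the whole-day component of each timedelta, so an average of 1 day 23 hours becomes 1. If you'd rather keep fractional days, one alternative is to divide `.dt.total_seconds()` by 86,400. Here is a minimal comparison on made-up durations.

```python
import pandas

# Made-up average open times: 23 hours and 1 day 23 hours
open_times = pandas.Series(pandas.to_timedelta(["0 days 23:00:00", "1 days 23:00:00"]))

whole_days = open_times.dt.days                           # whole-day component only
fractional_days = open_times.dt.total_seconds() / 86400   # seconds converted to days
print(whole_days.tolist(), fractional_days.round(2).tolist())  # → [0, 1] [0.96, 1.96]
```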


#### 3. Join the Summary with the Polygon DataFrame

At this point, you've created a summary DataFrame that doesn't contain any geometry. You can join it with the census block group DataFrame so that you can render it on a map and create a geospatial product.

You'll do that by using the DataFrame's `.merge()` method.

In [13]:
```python
df_cbg_summary = df_cbg.merge(
    df_cbg_summary, 
    how = 'left', 
    left_on = 'geoid', 
    right_on = 'geoid'
)
```

After this merge, the resulting DataFrame contains both the geometries and the summary of the *OpenTime* column. Now you can map the data and see where response times are slow.

In [14]:
```python
df_cbg_summary.head()
```

| | OBJECTID | geoid | SHAPE | OpenTime | CaseID |
|---|---|---|---|---|---|
| 0 | 1 | 60750127001 | {"rings": [[[-122.44958399999996, 37.803153001... | 3.0 | 30.0 |
| 1 | 2 | 60750127003 | {"rings": [[[-122.44777499999998, 37.801007000... | 1.0 | 18.0 |
| 2 | 3 | 60750263021 | {"rings": [[[-122.43791799999997, 37.711770000... | 3.0 | 12.0 |
| 3 | 4 | 60750302021 | {"rings": [[[-122.47072999999995, 37.765779001... | 1.0 | 86.0 |
| 4 | 5 | 60750205003 | {"rings": [[[-122.43916699899995, 37.759028001... | 1.0 | 10.0 |
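This merge has two details that are easy to miss. First, `right_on='geoid'` works even though *geoid* is the index of the grouped summary rather than a column, because pandas lets `left_on` and `right_on` name index levels. Second, block groups with no matching summary row get `NaN`, which is why *OpenTime* and *CaseID* appear as floats in the output above. The toy reproduction below uses hypothetical names and values.

```python
import pandas

# Made-up polygon table (geoid is a column) and summary table (geoid is the index)
polygons = pandas.DataFrame({"geoid": ["A", "B", "C"], "OBJECTID": [1, 2, 3]})
summary = pandas.DataFrame(
    {"OpenTime": [2, 5], "CaseID": [3, 1]},
    index=pandas.Index(["A", "B"], name="geoid"),
)

# right_on refers to the index level named "geoid"; C has no match, so it gets NaN.
merged = polygons.merge(summary, how="left", left_on="geoid", right_on="geoid")
print(merged)
print(merged.dtypes)  # OpenTime and CaseID are upcast from int64 to float64
```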


## Map the Data and Export

#### 1. Map the Data

With your analysis complete, you can now produce a feature class to hand off to your colleagues. First, though, it's worth putting the data on a map so you can have a look at it. You'll exclude any records with null *OpenTime* values. These are block groups with no closed 311 cases, and they appear to be polygons in the water.
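`pandas.notna()` is the mirror image of `pandas.isnull()`: it marks rows that *do* have a value. Alternatively, `DataFrame.dropna(subset=[...])` performs the same filtering in one call. Here is a quick check on toy data with made-up values.

```python
import pandas

toy = pandas.DataFrame({
    "geoid": ["A", "B", "C"],
    "OpenTime": [3.0, None, 1.0],  # B stands in for a block group with no closed cases
})

kept_notna = toy[pandas.notna(toy.OpenTime)]
kept_dropna = toy.dropna(subset=["OpenTime"])

print(kept_notna.geoid.tolist())       # → ['A', 'C']
print(kept_notna.equals(kept_dropna))  # → True
```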

In [15]:
```python
# create a GIS object and a new map widget
gis = arcgis.GIS()
cbg_map = gis.map("San Francisco, CA")

# exclude any null OpenTime values (in the middle of the bay)
df_to_map = df_cbg_summary[pandas.notna(df_cbg_summary.OpenTime)]

# plot the summary on the map using a class breaks renderer with five classes
df_to_map.spatial.plot(
    colors='coolwarm',
    class_count=5,
    map_widget=cbg_map,
    renderer_type='c',
    col='OpenTime',
    line_width=0.1,
)

cbg_map
```


In the resulting map, you can review the data and check its quality. Everything looks good, so let's export it to a feature class.

#### 2.  Export to a feature class

Using the ArcGIS API for Python to write a Spatially Enabled DataFrame to a feature class is pretty simple.  You can just use the spatial accessor's `to_featureclass()` method.

In [16]:
```python
df_cbg_summary.spatial.to_featureclass(
    "./Tutorial_06_02.gdb/OpenTime_311_Cases_by_CBG"
)
```

```text
'C:\\Users\\dav11274\\Desktop\\github\\Top-20-Python\\Exercises\\Chapter 06 - Data Manipulation\\Tutorial_06_02.gdb\\OpenTime_311_Cases_by_CBG'
```