# Putting It All Together: Solutions

For this last lesson, we'll practice going through a full workflow!

We'll answer the question:

## **What is the total grocery-store sales volume of each Census tract?**

In [None]:
import pandas as pd
import geopandas as gpd

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline  

## Importing Datasets

We first need to prepare our data by loading the joined tracts/ACS data and the grocery data, then go through our usual steps to make sure they have the same CRS.

- Read in the joined tracts/ACS data.
- Read the grocery data CSV into a pandas DataFrame (it lives at `'../data/other/ca_grocery_stores_2019_wgs84.csv'`).
- Convert it to a GeoDataFrame.
- Define its CRS (EPSG:4326).
- Transform it to match the CRS of `tracts_acs_gdf_ac`.
- Examine the transformed GeoDataFrame.

In [None]:
# Read in the joined tracts/ACS data
tracts_acs_gdf_ac = gpd.read_file('../data/outdata/tracts_acs_gdf_ac.json')

In [None]:
# Read the grocery data CSV into a pandas DataFrame
grocery_pts_df = pd.read_csv('../data/other/ca_grocery_stores_2019_wgs84.csv')

In [None]:
# Convert it to a GeoDataFrame
grocery_pts_gdf = gpd.GeoDataFrame(grocery_pts_df, 
                                   geometry=gpd.points_from_xy(grocery_pts_df.X, grocery_pts_df.Y))

In [None]:
# Define its CRS
grocery_pts_gdf.crs = "epsg:4326"

In [None]:
# Transform it to match the CRS of tracts_acs_gdf_ac
grocery_pts_gdf.to_crs(tracts_acs_gdf_ac.crs, inplace=True)

In [None]:
# Examine transformed GeoDataFrame
grocery_pts_gdf.head()
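
As a quick check on the steps above, here is a minimal sketch that runs the same CSV-to-GeoDataFrame-to-reprojected workflow on two made-up points and confirms the CRS afterwards. The coordinates and sales values are invented, and EPSG:3857 is only an assumed stand-in for whatever `tracts_acs_gdf_ac.crs` actually is.

```python
import pandas as pd
import geopandas as gpd

# Made-up store coordinates in longitude/latitude (WGS84); not from the real CSV
toy_df = pd.DataFrame({
    "X": [-122.27, -122.25],   # longitude
    "Y": [37.80, 37.87],       # latitude
    "SALESVOL": [100, 200],
})

# Build point geometries and declare the CRS the coordinates are already in
toy_gdf = gpd.GeoDataFrame(
    toy_df,
    geometry=gpd.points_from_xy(toy_df.X, toy_df.Y),
    crs="EPSG:4326",
)

# Assumption: EPSG:3857 stands in for tracts_acs_gdf_ac.crs
target_crs = "EPSG:3857"
toy_proj = toy_gdf.to_crs(target_crs)

# The CRS should now match the target, and the coordinates should be in meters
print(toy_proj.crs == target_crs)
print(toy_proj.geometry.x.round(0).tolist())
```

On the real data, the equivalent check would be comparing `grocery_pts_gdf.crs` with `tracts_acs_gdf_ac.crs` before moving on to the join.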

## Spatial Join and Dissolve

Now that we have our data and they're in the same CRS, we're going to conduct a *spatial join* to bring together the two datasets. From there we'll be able to *aggregate* our data to sum the total sales volume in each tract.

Complete the following steps:

- Join the two datasets in such a way that you can then...
- Group by tract and calculate the total grocery-store sales volume.
- Don't forget to check the dimensions, contents, and any other relevant aspects of your results.

In [None]:
# Join the two datasets
tracts_joingrocery = gpd.sjoin(tracts_acs_gdf_ac, grocery_pts_gdf, how='left')

In [None]:
# Group by tract and calculate the total grocery-store sales volume
tracts_totsalesvol = tracts_joingrocery[['GEOID','geometry','SALESVOL']].dissolve(by='GEOID',
                                                                                  aggfunc="sum",
                                                                                  as_index=False)
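
To see what the left join and dissolve do, here is a small sketch on invented data: two square "tracts" and three "stores", all hypothetical. It illustrates why `how='left'` matters: a tract with no stores still appears in the result, with a total of zero.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square "tracts"; tract "B" contains no stores
toy_tracts = gpd.GeoDataFrame(
    {"GEOID": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Three made-up stores, all strictly inside tract "A"
toy_stores = gpd.GeoDataFrame(
    {"SALESVOL": [100, 200, 50]},
    geometry=[Point(0.2, 0.2), Point(0.5, 0.5), Point(0.8, 0.3)],
)

# Left join: one row per tract-store match, plus one row for each unmatched tract
toy_join = gpd.sjoin(toy_tracts, toy_stores, how="left")

# Dissolve back to one row per tract, summing sales volume
toy_totals = toy_join[["GEOID", "geometry", "SALESVOL"]].dissolve(
    by="GEOID", aggfunc="sum", as_index=False
)
print(toy_totals[["GEOID", "SALESVOL"]])
```

With an inner join, tract "B" would drop out of the joined table entirely and would be missing from any map of the totals.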

In [None]:
# Don't forget to check the dimensions, contents, and any other relevant aspects of your results
print(f'Dimensions of result: {tracts_totsalesvol.shape}')
print(f'Dimensions of Census tracts: {tracts_acs_gdf_ac.shape}')

In [None]:
# Check the result
tracts_totsalesvol.head()
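
Beyond eyeballing the shape and the first rows, a few checks can be written down explicitly. The sketch below uses a made-up pandas table as a stand-in for the joined data (the values and the `toy_` names are hypothetical) and a plain `groupby` in place of `dissolve`, since only the attribute side of the aggregation is being checked.

```python
import pandas as pd

# Made-up stand-in for the joined table: one row per tract-store match,
# with NaN where a tract has no store (as a left join produces)
toy_joined = pd.DataFrame({
    "GEOID": ["A", "A", "B", "C"],
    "SALESVOL": [100.0, 200.0, float("nan"), 50.0],
})
toy_totals = toy_joined.groupby("GEOID", as_index=False)["SALESVOL"].sum()

# Check 1: one output row per tract
one_row_per_tract = len(toy_totals) == toy_joined["GEOID"].nunique()

# Check 2: the grand total is unchanged by the aggregation
totals_match = toy_totals["SALESVOL"].sum() == toy_joined["SALESVOL"].sum()

# Check 3: tracts without stores end up with 0, not NaN
no_missing = toy_totals["SALESVOL"].notna().all()

print(one_row_per_tract, totals_match, no_missing)
```

On the real data, the first check corresponds to comparing the row counts of `tracts_totsalesvol` and `tracts_acs_gdf_ac`.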

## Plot and Review

With any type of geospatial analysis you do, it's always nice to plot and visualize your results to check your work and start to understand the full story of your analysis.

Complete the following:

- Plot the tracts, coloring them by total grocery-store sales volume.
- Plot the grocery stores on top.
- Bonus points for devising a nice visualization scheme that helps you heuristically check your results!

We've broken these steps into individual cells:

In [None]:
# Subset the stores to only those within our tracts, to keep the map within our region of interest
grocery_pts_gdf_ac = grocery_pts_gdf[grocery_pts_gdf.within(tracts_acs_gdf_ac.unary_union)]

In [None]:
# Create the figure and axes
fig, ax = plt.subplots(figsize=(20, 20))
# Plot the tracts, coloring by total SALESVOL
tracts_totsalesvol.plot(ax=ax,
                        column='SALESVOL',
                        scheme="quantiles",  # classification schemes require the mapclassify package
                        cmap="viridis_r",
                        edgecolor="grey",
                        legend=True,
                        legend_kwds={'title': 'Total Grocery Store Sales Volume (Dollars)'})

# Add the grocery stores, coloring by SALESVOL, for a visual check
grocery_pts_gdf_ac.plot(ax=ax,
                        column='SALESVOL',
                        cmap='Greys_r',
                        linewidth=0.3,
                        markersize=25,
                        legend=True,
                        legend_kwds={'label': 'Sales Volume, Individual Stores (Dollars)',
                                     'orientation': "horizontal",
                                     'pad': 0.05})