## Exercise 4: Data Warehouse Querying and Basic Geospatial Operations

Skills: 
* Query data warehouse table
* Use dictionary to map values

References: 
* https://docs.calitp.org/data-infra/analytics_new_analysts/05-spatial-analysis-basics.html
* https://docs.calitp.org/data-infra/analytics_new_analysts/06-spatial-analysis-intro.html
* https://docs.calitp.org/data-infra/analytics_new_analysts/07-spatial-analysis-intermediate.html
* https://github.com/jorisvandenbossche/geopandas-tutorial

To use `shared_utils`: 

> In the terminal: navigate to the `_shared_utils` sub-folder (`cd ..` to move up a level first), run `make setup_env`, then `cd` back into your task sub-folder (e.g., `bus_service_increase` or `example_report`).

In [1]:
import geopandas as gpd
import pandas as pd
import os

#os.environ["CALITP_BQ_MAX_BYTES"] = str(100_000_000_000)
pd.set_option("display.max_rows", 20)

from calitp_data_analysis.tables import tbls
from calitp_data_analysis.sql import query_sql
from siuba import *



## Query a table, turn it into a gdf

You will query the warehouse table for 2 operators, Caltrain and Merced. A `feed_key` is a hash identifier. It has no inherent meaning, but it uniquely identifies a feed for a given day.

The `feed_key` values for those 2 operators for 6/1/2022 are provided. 

* Query `mart_gtfs.dim_stops`
* Filter to the feed keys of interest
* Select these columns: `feed_key`, `stop_id`, `stop_lat`, `stop_lon`, `stop_name`
* Return as a dataframe using `collect()`
* Turn the point data into geometry with `geopandas`: [docs](https://geopandas.org/en/stable/docs/reference/api/geopandas.points_from_xy.html)

In [2]:
FEEDS = [
    "25c6505166c01099b2f6f2de173e20b9", # Caltrain
    "52639f09eb535f75b33d2c6a654cb89e", # Merced
]

stops = (
    tbls.mart_gtfs.dim_stops()
    >> filter(_.feed_key.isin(FEEDS))
    >> select(_.feed_key, _.stop_id, 
             _.stop_lat, _.stop_lon, _.stop_name)
    >> arrange(_.feed_key, _.stop_id, 
               _.stop_lat, _.stop_lon)
    >> collect() 
)



In [3]:
stops

Unnamed: 0,feed_key,stop_id,stop_lat,stop_lon,stop_name
0,25c6505166c01099b2f6f2de173e20b9,22nd_street,37.756972,-122.392492,22nd Street
1,25c6505166c01099b2f6f2de173e20b9,2537740,37.438491,-122.156405,Stanford Caltrain Station
2,25c6505166c01099b2f6f2de173e20b9,2537744,37.438425,-122.156482,Stanford Caltrain Station
3,25c6505166c01099b2f6f2de173e20b9,70011,37.776390,-122.394992,San Francisco Caltrain Station
4,25c6505166c01099b2f6f2de173e20b9,70012,37.776348,-122.394935,San Francisco Caltrain Station
...,...,...,...,...,...
575,52639f09eb535f75b33d2c6a654cb89e,782489,36.992336,-120.626187,Obanion Park
576,52639f09eb535f75b33d2c6a654cb89e,835001,37.361993,-120.572221,Castle H.S.A.
577,52639f09eb535f75b33d2c6a654cb89e,835719,37.291789,-120.503653,T St. @ 3rd St.
578,52639f09eb535f75b33d2c6a654cb89e,844203,37.391296,-120.722486,Foster Farms (To Livingston)


## Use a dictionary to map values

* Create a new column called `operator` where `feed_key` is associated with its operator name.
* First, write a function to do it.
* Then, use a dictionary to do it (create new column called `agency`).
* Double check that `operator` and `agency` show the same values. Use `assert` to check.
    * `df.operator == df.agency` returns a series containing True/False for each row
    * `assert (df.operator == df.agency).all()` collapses that series into a single check: it raises an `AssertionError` if any row differs, and does nothing if every row matches.
* Hint: https://docs.calitp.org/data-infra/analytics_new_analysts/02-data-analysis-intermediate.html
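
A minimal sketch of how the `assert` check behaves, using a small made-up dataframe rather than the warehouse data (the values and the names `all_match` and `raised` are illustrative only):

```python
import pandas as pd

# Toy data standing in for the stops table (values are made up)
df = pd.DataFrame({
    "operator": ["Caltrain", "Merced", "Merced"],
    "agency": ["Caltrain", "Merced", "Merced"],
})

# Element-wise comparison: one True/False per row
matches = df.operator == df.agency

# .all() collapses the series into a single boolean
all_match = bool(matches.all())

# assert is silent when the condition is True
assert all_match

# Introduce a mismatch; assert now raises AssertionError
df.loc[2, "agency"] = "Other"
try:
    assert (df.operator == df.agency).all()
    raised = False
except AssertionError:
    raised = True
```

Because a passing `assert` produces no output, it works well as a quiet guard in the middle of a notebook: the cell only draws attention when something is wrong.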

In [4]:
def operator(row):
    if row.feed_key == '25c6505166c01099b2f6f2de173e20b9':
        return 'Caltrain'
    else:
        return 'Merced'
    
stops['operator'] = stops.apply(operator, axis = 1)
stops
    

Unnamed: 0,feed_key,stop_id,stop_lat,stop_lon,stop_name,operator
0,25c6505166c01099b2f6f2de173e20b9,22nd_street,37.756972,-122.392492,22nd Street,Caltrain
1,25c6505166c01099b2f6f2de173e20b9,2537740,37.438491,-122.156405,Stanford Caltrain Station,Caltrain
2,25c6505166c01099b2f6f2de173e20b9,2537744,37.438425,-122.156482,Stanford Caltrain Station,Caltrain
3,25c6505166c01099b2f6f2de173e20b9,70011,37.776390,-122.394992,San Francisco Caltrain Station,Caltrain
4,25c6505166c01099b2f6f2de173e20b9,70012,37.776348,-122.394935,San Francisco Caltrain Station,Caltrain
...,...,...,...,...,...,...
575,52639f09eb535f75b33d2c6a654cb89e,782489,36.992336,-120.626187,Obanion Park,Merced
576,52639f09eb535f75b33d2c6a654cb89e,835001,37.361993,-120.572221,Castle H.S.A.,Merced
577,52639f09eb535f75b33d2c6a654cb89e,835719,37.291789,-120.503653,T St. @ 3rd St.,Merced
578,52639f09eb535f75b33d2c6a654cb89e,844203,37.391296,-120.722486,Foster Farms (To Livingston),Merced


In [5]:
agency = {'25c6505166c01099b2f6f2de173e20b9':'Caltrain',
          '52639f09eb535f75b33d2c6a654cb89e':'Merced'}

stops['agency'] = stops.feed_key.map(agency)

stops

Unnamed: 0,feed_key,stop_id,stop_lat,stop_lon,stop_name,operator,agency
0,25c6505166c01099b2f6f2de173e20b9,22nd_street,37.756972,-122.392492,22nd Street,Caltrain,Caltrain
1,25c6505166c01099b2f6f2de173e20b9,2537740,37.438491,-122.156405,Stanford Caltrain Station,Caltrain,Caltrain
2,25c6505166c01099b2f6f2de173e20b9,2537744,37.438425,-122.156482,Stanford Caltrain Station,Caltrain,Caltrain
3,25c6505166c01099b2f6f2de173e20b9,70011,37.776390,-122.394992,San Francisco Caltrain Station,Caltrain,Caltrain
4,25c6505166c01099b2f6f2de173e20b9,70012,37.776348,-122.394935,San Francisco Caltrain Station,Caltrain,Caltrain
...,...,...,...,...,...,...,...
575,52639f09eb535f75b33d2c6a654cb89e,782489,36.992336,-120.626187,Obanion Park,Merced,Merced
576,52639f09eb535f75b33d2c6a654cb89e,835001,37.361993,-120.572221,Castle H.S.A.,Merced,Merced
577,52639f09eb535f75b33d2c6a654cb89e,835719,37.291789,-120.503653,T St. @ 3rd St.,Merced,Merced
578,52639f09eb535f75b33d2c6a654cb89e,844203,37.391296,-120.722486,Foster Farms (To Livingston),Merced,Merced


In [6]:
stops.operator == stops.agency

0      True
1      True
2      True
3      True
4      True
       ... 
575    True
576    True
577    True
578    True
579    True
Length: 580, dtype: bool

In [7]:
assert (stops.operator == stops.agency).all()
stops.dtypes

feed_key      object
stop_id       object
stop_lat     float64
stop_lon     float64
stop_name     object
operator      object
agency        object
dtype: object
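
The two approaches agree here, but they can diverge when an unexpected `feed_key` shows up. The sketch below uses hypothetical keys (`feed_a`, `feed_b`, `feed_c`) to illustrate: an `if`/`else` function sends every unknown key to the `else` branch, while `.map()` with a dictionary leaves unmapped keys as `NaN`.

```python
import pandas as pd

# Hypothetical feed keys, not real warehouse values
FEED_TO_AGENCY = {"feed_a": "Caltrain", "feed_b": "Merced"}

df = pd.DataFrame({"feed_key": ["feed_a", "feed_b", "feed_c"]})

def operator(row):
    # Anything that is not feed_a falls through to 'Merced'
    if row.feed_key == "feed_a":
        return "Caltrain"
    else:
        return "Merced"

df["operator"] = df.apply(operator, axis=1)

# Keys missing from the dictionary become NaN
df["agency"] = df.feed_key.map(FEED_TO_AGENCY)

# Rows whose key was not in the dictionary
unmapped = df[df.agency.isna()]
```

With data like this, the `assert` comparison would fail, which is arguably helpful: it flags that one of the two mappings handled an unexpected key silently.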

## Turn lat/lon into point geometry
* There is a [function in calitp_data_analysis](https://github.com/cal-itp/data-infra/blob/main/packages/calitp-data-analysis/calitp_data_analysis/geography_utils.py#L57-L84) that does it. Show the steps within the function (the long way), and also create the `geometry` column using `shared_utils`.
* Use `geography_utils.create_point_geometry??` to see what goes into that function, and what that function looks like under the hood.

In [8]:
stops = stops.assign(
    geometry = gpd.points_from_xy(stops['stop_lon'], stops['stop_lat'], crs="EPSG:4326")
)
    
gdf1 = gpd.GeoDataFrame(stops).to_crs("EPSG:4326")

gdf1   

Unnamed: 0,feed_key,stop_id,stop_lat,stop_lon,stop_name,operator,agency,geometry
0,25c6505166c01099b2f6f2de173e20b9,22nd_street,37.756972,-122.392492,22nd Street,Caltrain,Caltrain,POINT (-122.39249 37.75697)
1,25c6505166c01099b2f6f2de173e20b9,2537740,37.438491,-122.156405,Stanford Caltrain Station,Caltrain,Caltrain,POINT (-122.15641 37.43849)
2,25c6505166c01099b2f6f2de173e20b9,2537744,37.438425,-122.156482,Stanford Caltrain Station,Caltrain,Caltrain,POINT (-122.15648 37.43842)
3,25c6505166c01099b2f6f2de173e20b9,70011,37.776390,-122.394992,San Francisco Caltrain Station,Caltrain,Caltrain,POINT (-122.39499 37.77639)
4,25c6505166c01099b2f6f2de173e20b9,70012,37.776348,-122.394935,San Francisco Caltrain Station,Caltrain,Caltrain,POINT (-122.39494 37.77635)
...,...,...,...,...,...,...,...,...
575,52639f09eb535f75b33d2c6a654cb89e,782489,36.992336,-120.626187,Obanion Park,Merced,Merced,POINT (-120.62619 36.99234)
576,52639f09eb535f75b33d2c6a654cb89e,835001,37.361993,-120.572221,Castle H.S.A.,Merced,Merced,POINT (-120.57222 37.36199)
577,52639f09eb535f75b33d2c6a654cb89e,835719,37.291789,-120.503653,T St. @ 3rd St.,Merced,Merced,POINT (-120.50365 37.29179)
578,52639f09eb535f75b33d2c6a654cb89e,844203,37.391296,-120.722486,Foster Farms (To Livingston),Merced,Merced,POINT (-120.72249 37.39130)


Basics of a geodataframe.

A gdf has a coordinate reference system (CRS) that ties its points, lines, or polygons to locations on the Earth. The most common CRS is called `WGS 84`, and its code is `EPSG:4326`. This is what you'd see when you use Google Maps to find the lat/lon of a place.

[Read](https://desktop.arcgis.com/en/arcmap/latest/map/projections/about-geographic-coordinate-systems.htm) about the `WGS 84` geographic coordinate system.

[Read](https://desktop.arcgis.com/en/arcmap/latest/map/projections/about-projected-coordinate-systems.htm) about projected coordinate reference systems, which flatten the Earth's curved surface onto a 2D plane so we can measure distances and areas.

* Is it a pandas dataframe or a geopandas geodataframe?: `type(gdf)`
* Coordinate reference system: `gdf.crs`
* gdfs must have a geometry column. Find the name of the column that is geometry: `gdf.geometry.name`
* Reproject to a different coordinate reference system: `gdf = gdf.to_crs("EPSG:2229")`, then check `gdf.crs` again.

* This GitHub repo has several `geopandas` tutorials that cover basic spatial concepts: https://github.com/jorisvandenbossche/geopandas-tutorial.
* Skim through the notebooks to see the concepts demonstrated. To run them, click `launch binder` in the repo's README.
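
The checks listed above can be sketched on a tiny geodataframe. The two stops and their coordinates are made up for illustration, and the variable names are assumptions rather than part of the exercise:

```python
import geopandas as gpd
import pandas as pd

# Two made-up stops with lon/lat columns
df = pd.DataFrame({
    "stop_name": ["A", "B"],
    "stop_lon": [-122.39, -120.50],
    "stop_lat": [37.76, 37.29],
})

# Build point geometry and declare the CRS the coordinates are already in
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.stop_lon, df.stop_lat),
    crs="EPSG:4326",
)

is_gdf = isinstance(gdf, gpd.GeoDataFrame)  # geodataframe, not a plain dataframe
geom_col = gdf.geometry.name                # name of the active geometry column
epsg_before = gdf.crs.to_epsg()             # EPSG code of the current CRS

# Reproject to a projected CRS; coordinates change, rows do not
projected = gdf.to_crs("EPSG:2229")
epsg_after = projected.crs.to_epsg()
units = projected.crs.axis_info[0].unit_name  # linear unit of the projected CRS
```

Note the distinction between declaring a CRS (the `crs=` argument, which only labels existing coordinates) and reprojecting with `to_crs()`, which transforms the coordinates.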

In [40]:
type(gdf1)

geopandas.geodataframe.GeoDataFrame

In [41]:
gdf1.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [42]:
gdf1.geometry.name

'geometry'

## Spatial Join (which points fall into which polygon)

This URL gives you CA county boundaries: https://gis.data.ca.gov/datasets/CALFIRE-Forestry::california-county-boundaries/explore?location=37.246136%2C-119.002032%2C6.12

* Go to "I want to use this" > View API Resources > copy link for geojson
* Read in the geojson with `geopandas` and make it a geodataframe: `gpd.read_file(LONG_URL_PATH)`
* Double check that the coordinate reference system is the same for both gdfs using `gdf.crs`. If not, change it so they are the same.
* Spatial join stops to counties: [docs](https://geopandas.org/en/stable/docs/reference/api/geopandas.sjoin.html)
    * Play with inner join or left join, what's the difference? Which one do you want?
    * Play with switching around the left_df and right_df, what's the right order?
* By county: count number of stops and stops per sq_mi.
    * Hint 1: Start with a CRS with units in feet or meters, then do a conversion to sq mi. [CRS in geography_utils](https://github.com/cal-itp/data-infra/blob/main/packages/calitp-data-analysis/calitp_data_analysis/geography_utils.py)
    * Hint 2: to find area, you can create a new column and calculate `gdf.geometry.area`. [geometry manipulations docs](https://geopandas.org/en/stable/docs/user_guide/geometric_manipulations.html)
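
To build intuition for the join type and the left/right order before working with the real data, here is a sketch with two made-up square "counties" and three made-up stops, one of which falls outside both squares. All names and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square "counties"
counties = gpd.GeoDataFrame(
    {"county": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three made-up stops; the last one lies outside both squares
stops = gpd.GeoDataFrame(
    {"stop_id": ["s1", "s2", "s3"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Both gdfs must share a CRS before joining
assert stops.crs == counties.crs

# Inner join drops stops that match no county
inner = gpd.sjoin(stops, counties, how="inner", predicate="intersects")

# Left join keeps every stop; unmatched ones get NaN county attributes
left = gpd.sjoin(stops, counties, how="left", predicate="intersects")

# Swapping the order keeps the polygon geometry instead of the points
flipped = gpd.sjoin(counties, stops, how="inner", predicate="intersects")

# Count stops per county from the point-based join
stops_per_county = inner.groupby("county").stop_id.count()
```

The result always carries the geometry of the left gdf, so the order to choose depends on whether the next step needs points (e.g., mapping stops) or polygons (e.g., computing county area).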

In [9]:
LONG_URL_PATH = "https://services1.arcgis.com/jUJYIo9tSA7EHvfZ/arcgis/rest/services/California_County_Boundaries/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson"
CA_county = gpd.read_file(LONG_URL_PATH)

In [10]:
CA_county

Unnamed: 0,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID,geometry
0,1,Alameda,ALA,1,01,001,,3.402787e+09,308998.650766,e6f92268-d2dd-4cfb-8b79-5b4b2f07c559,"POLYGON ((-122.27125 37.90503, -122.27024 37.9..."
1,2,Alpine,ALP,2,02,003,,3.146939e+09,274888.492411,870479b2-480a-494b-8352-ad60578839c1,"POLYGON ((-119.58667 38.71420, -119.58653 38.7..."
2,3,Amador,AMA,3,03,005,,2.562635e+09,361708.438013,4f45b3a6-be10-461c-8945-6b2aaa7119f6,"POLYGON ((-120.07246 38.70276, -120.07249 38.6..."
3,4,Butte,BUT,4,04,007,,7.339348e+09,526547.115238,44fba680-aecc-4e04-a499-29d69affbd4a,"POLYGON ((-121.07661 39.59729, -121.07945 39.5..."
4,5,Calaveras,CAL,5,05,009,,4.351069e+09,370637.578323,d11ef739-4a1e-414e-bfd1-e7dcd56cd61e,"POLYGON ((-120.01792 38.43586, -120.01788 38.4..."
...,...,...,...,...,...,...,...,...,...,...,...
64,65,Ventura,VEN,56,56,111,Channel Islands,8.750094e+05,11880.900594,86c2171f-d249-45a0-ac0c-40a4e6cb82e2,"POLYGON ((-119.38135 34.01116, -119.38135 34.0..."
65,66,Ventura,VEN,56,56,111,Channel Islands,2.595855e+06,14258.527110,71e2d2ad-a83c-4f5d-bc3b-b7ad9b12f57b,"POLYGON ((-119.43281 34.01600, -119.43276 34.0..."
66,67,Ventura,VEN,56,56,111,Channel Islands,6.082082e+05,7967.029762,d90412b1-c6af-4437-94d9-48dc3a13a64d,"POLYGON ((-119.36427 34.01681, -119.36427 34.0..."
67,68,Los Angeles,LOS,19,19,037,Channel Islands,2.076580e+08,135274.201204,3cb33dc3-e564-4bbf-8528-866600a1f9e4,"POLYGON ((-118.53891 32.98008, -118.53884 32.9..."


In [39]:
CA_county.crs
gdf1.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [45]:
county_stops = gpd.sjoin(
    CA_county, 
    gdf1, 
    how = 'inner',
    predicate = 'intersects'
)

county_stops.geometry.unique()
county_stops

Unnamed: 0,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID,geometry,index_right,feed_key,stop_id,stop_lat,stop_lon,stop_name,operator,agency
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-120.37071 37.60543, -120.35414 37.5...",566,52639f09eb535f75b33d2c6a654cb89e,782479,36.964503,-120.653529,Kwik Serv (South Dos Palos),Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-120.37071 37.60543, -120.35414 37.5...",505,52639f09eb535f75b33d2c6a654cb89e,768637,36.964522,-120.653626,Dos Palos/Chics Market,Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-120.37071 37.60543, -120.35414 37.5...",252,52639f09eb535f75b33d2c6a654cb89e,768356,36.964552,-120.653486,Dos Palos/Chics Market (to Los Banos),Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-120.37071 37.60543, -120.35414 37.5...",567,52639f09eb535f75b33d2c6a654cb89e,782480,36.968476,-120.653701,Lexington Ave Stop,Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-120.37071 37.60543, -120.35414 37.5...",568,52639f09eb535f75b33d2c6a654cb89e,782481,36.968538,-120.644763,South Dos Palos County Park,Merced,Merced
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-120.85530 38.01420, -120.83950 37.9...",514,52639f09eb535f75b33d2c6a654cb89e,770324,37.309401,-121.020240,Nob Hill (Newman),Merced,Merced
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-120.85530 38.01420, -120.83950 37.9...",237,52639f09eb535f75b33d2c6a654cb89e,768335,37.507156,-120.858762,Roger K. Fall Transit Center,Merced,Merced
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-120.85530 38.01420, -120.83950 37.9...",259,52639f09eb535f75b33d2c6a654cb89e,768363,37.507400,-120.873000,Fulkerth Shopping Center,Merced,Merced
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-120.85530 38.01420, -120.83950 37.9...",185,52639f09eb535f75b33d2c6a654cb89e,768276,37.521689,-120.882525,Monte Vista Ave @ Country Side - Turlock (east...,Merced,Merced


In [25]:
county_stops1 = gpd.sjoin(
    gdf1, 
    CA_county, 
    how = 'left'
)

county_stops1.geometry.unique()

<GeometryArray>
[<POINT (-122.392 37.757)>, <POINT (-122.156 37.438)>,
 <POINT (-122.156 37.438)>, <POINT (-122.395 37.776)>,
 <POINT (-122.395 37.776)>, <POINT (-122.392 37.758)>,
 <POINT (-122.392 37.758)>,  <POINT (-122.402 37.71)>,
  <POINT (-122.402 37.71)>, <POINT (-122.405 37.656)>,
 ...
 <POINT (-120.645 36.976)>, <POINT (-120.637 36.986)>,
 <POINT (-120.628 36.982)>, <POINT (-120.625 36.983)>,
 <POINT (-120.629 36.987)>, <POINT (-120.626 36.992)>,
 <POINT (-120.572 37.362)>, <POINT (-120.504 37.292)>,
 <POINT (-120.722 37.391)>, <POINT (-120.823 37.058)>]
Length: 580, dtype: geometry

> Both inner and left joins retain the geometry column of the `left_df`. With `CA_county` on the left, the result keeps the county polygons, which is what we want for this analysis.

In [14]:
#By county: count number of stops and stops per sq_mi.
#Hint 1: Start with a CRS with units in feet or meters, then do a conversion to sq mi. CRS in geography_utils
#Hint 2: to find area, you can create a new column and calculate gdf.geometry.area. geometry manipulations docs

In [46]:
no_stops = (county_stops.groupby(['COUNTY_NAME'])
            .agg({'stop_id' : 'count'}
                ).reset_index()
           )

In [47]:
no_stops

Unnamed: 0,COUNTY_NAME,stop_id
0,Merced,480
1,San Francisco,8
2,San Mateo,37
3,Santa Clara,50
4,Stanislaus,5


In [48]:
county_stops = county_stops.to_crs("EPSG:3310") #to convert units to meters
county_stops

Unnamed: 0,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID,geometry,index_right,feed_key,stop_id,stop_lat,stop_lon,stop_name,operator,agency
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",566,52639f09eb535f75b33d2c6a654cb89e,782479,36.964503,-120.653529,Kwik Serv (South Dos Palos),Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",505,52639f09eb535f75b33d2c6a654cb89e,768637,36.964522,-120.653626,Dos Palos/Chics Market,Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",252,52639f09eb535f75b33d2c6a654cb89e,768356,36.964552,-120.653486,Dos Palos/Chics Market (to Los Banos),Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",567,52639f09eb535f75b33d2c6a654cb89e,782480,36.968476,-120.653701,Lexington Ave Stop,Merced,Merced
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",568,52639f09eb535f75b33d2c6a654cb89e,782481,36.968538,-120.644763,South Dos Palos County Park,Merced,Merced
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",514,52639f09eb535f75b33d2c6a654cb89e,770324,37.309401,-121.020240,Nob Hill (Newman),Merced,Merced
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",237,52639f09eb535f75b33d2c6a654cb89e,768335,37.507156,-120.858762,Roger K. Fall Transit Center,Merced,Merced
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",259,52639f09eb535f75b33d2c6a654cb89e,768363,37.507400,-120.873000,Fulkerth Shopping Center,Merced,Merced
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",185,52639f09eb535f75b33d2c6a654cb89e,768276,37.521689,-120.882525,Monte Vista Ave @ Country Side - Turlock (east...,Merced,Merced


In [49]:

county_stops = county_stops.assign(
    area_sq_mi = county_stops['geometry'].area / (3.86 * 10**-7)
)
county_stops


Unnamed: 0,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID,geometry,index_right,feed_key,stop_id,stop_lat,stop_lon,stop_name,operator,agency,area_sq_mi
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",566,52639f09eb535f75b33d2c6a654cb89e,782479,36.964503,-120.653529,Kwik Serv (South Dos Palos),Merced,Merced,1.326899e+16
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",505,52639f09eb535f75b33d2c6a654cb89e,768637,36.964522,-120.653626,Dos Palos/Chics Market,Merced,Merced,1.326899e+16
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",252,52639f09eb535f75b33d2c6a654cb89e,768356,36.964552,-120.653486,Dos Palos/Chics Market (to Los Banos),Merced,Merced,1.326899e+16
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",567,52639f09eb535f75b33d2c6a654cb89e,782480,36.968476,-120.653701,Lexington Ave Stop,Merced,Merced,1.326899e+16
23,24,Merced,MER,24,24,047,,8.085829e+09,432954.410428,f3d6231c-c7fa-4340-b03f-23c1d0572979,"POLYGON ((-32681.799 -45617.994, -31232.750 -4...",568,52639f09eb535f75b33d2c6a654cb89e,782481,36.968538,-120.644763,South Dos Palos County Park,Merced,Merced,1.326899e+16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",514,52639f09eb535f75b33d2c6a654cb89e,770324,37.309401,-121.020240,Nob Hill (Newman),Merced,Merced,1.017137e+16
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",237,52639f09eb535f75b33d2c6a654cb89e,768335,37.507156,-120.858762,Roger K. Fall Transit Center,Merced,Merced,1.017137e+16
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",259,52639f09eb535f75b33d2c6a654cb89e,768363,37.507400,-120.873000,Fulkerth Shopping Center,Merced,Merced,1.017137e+16
49,50,Stanislaus,STA,50,50,099,,6.258717e+09,474264.062970,be9f177f-a5ff-460c-9b79-3197a6e5e093,"POLYGON ((-74992.533 97.437, -73621.291 -1501....",185,52639f09eb535f75b33d2c6a654cb89e,768276,37.521689,-120.882525,Monte Vista Ave @ Country Side - Turlock (east...,Merced,Merced,1.017137e+16


In [55]:
#Grouping by county
county_area = (county_stops.groupby(['COUNTY_NAME'])
            .agg({'area_sq_mi' : 'mean',
                 'stop_id' : 'count'}
                ).reset_index()
           )

county_area

Unnamed: 0,COUNTY_NAME,area_sq_mi,stop_id
0,Merced,1.326899e+16,480
1,San Francisco,708635200000000.0,8
2,San Mateo,3702600000000000.0,37
3,Santa Clara,8756433000000000.0,50
4,Stanislaus,1.017137e+16,5


In [57]:

county_area = county_area.assign(
    stop_sqmi =  county_area.stop_id / county_area.area_sq_mi
)

county_area

Unnamed: 0,COUNTY_NAME,area_sq_mi,stop_id,stop_sqmi
0,Merced,1.326899e+16,480,3.617458e-14
1,San Francisco,708635200000000.0,8,1.128931e-14
2,San Mateo,3702600000000000.0,37,9.992977e-15
3,Santa Clara,8756433000000000.0,50,5.710088e-15
4,Stanislaus,1.017137e+16,5,4.915761e-16
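
The `area_sq_mi` values above are on the order of 1e16, far larger than any county, which suggests the conversion was applied in the wrong direction. In `EPSG:3310` the `.area` result is in square meters, and one square meter is roughly `3.861e-7` square miles, so the area should be multiplied by that factor (or divided by about 2,589,988), not divided by it. A minimal sketch of the conversion, with hypothetical helper names and a made-up example area:

```python
# 1 international mile = 1609.344 m exactly, so 1 sq mi = 1609.344**2 sq m
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2
SQ_MILES_PER_SQ_METER = 1 / SQ_METERS_PER_SQ_MILE  # roughly 3.861e-7

def sq_meters_to_sq_miles(area_sq_m):
    """Convert an area in square meters to square miles."""
    return area_sq_m * SQ_MILES_PER_SQ_METER

def stops_per_sq_mile(n_stops, area_sq_m):
    """Stop density, given a stop count and an area in square meters."""
    return n_stops / sq_meters_to_sq_miles(area_sq_m)

# Hypothetical inputs for illustration, not values from the warehouse
example_area_sq_m = 5.0e9
example_stops = 480

example_area_sq_mi = sq_meters_to_sq_miles(example_area_sq_m)
example_density = stops_per_sq_mile(example_stops, example_area_sq_m)
```

In the notebook, the equivalent change would be `county_stops['geometry'].area * (3.86 * 10**-7)`, after which `area_sq_mi` and `stop_sqmi` should land in a plausible range (thousands of square miles, and fractions of a stop per square mile).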
