## Exercise 5: Geospatial wrangling and making maps

Skills: 
* More geospatial practice building on earlier skills
* Make a map with `geopandas`

References: 
* https://docs.calitp.org/data-infra/analytics_new_analysts/02-data-analysis-intermediate.html
* https://docs.calitp.org/data-infra/analytics_tools/python_libraries.html

In [1]:
import geopandas as gpd
import intake
#import os
import pandas as pd
import shapely
import altair as alt

#os.environ["CALITP_BQ_MAX_BYTES"] = str(100_000_000_000)

#from calitp_data_analysis.tables import tbls
from siuba import *

# Hint: if this import fails, refer to the docs for how to import it correctly:
# cd into the _shared_utils folder and run the make setup_env command
#from shared_utils import geography_utils
FOLDER = "./data/"
FILE_NAME = "exercise_5_stops_sample.parquet"
stops=gpd.read_parquet(f"{FOLDER}{FILE_NAME}")




In [2]:
stops.head(2)

Unnamed: 0,feed_key,stop_id,stop_key,stop_name,route_type_0,route_type_1,route_type_2,route_type_3,route_type_4,route_type_5,route_type_6,route_type_7,route_type_11,route_type_12,missing_route_type,geometry
0,8a47f5aa51f481e9ddc7c497bd72d264,1117,996713f17805d89de17057bcd41a482d,ANTON-SAKIOKA,,,,55.0,,,,,,,,POINT (-117.87903 33.69030)
1,8a47f5aa51f481e9ddc7c497bd72d264,2820,b7cb986ea2688781ce06c0654a08075c,GOLDEN WEST-EDINGER,,,,21.0,,,,,,,,POINT (-118.00668 33.73067)


def create_point_geometry(
    df: pd.DataFrame,
    longitude_col: str = "stop_lon",
    latitude_col: str = "stop_lat",
    crs: str = "EPSG:4326",
) -> gpd.GeoDataFrame:
    """
    Parameters:
    df: pandas.DataFrame to turn into geopandas.GeoDataFrame,
        default dataframe in mind is gtfs_schedule.stops

    longitude_col: str, column name corresponding to longitude
                    in gtfs_schedule.stops, this column is "stop_lon"

    latitude_col: str, column name corresponding to latitude
                    in gtfs_schedule.stops, this column is "stop_lat"

    crs: str, coordinate reference system for point geometry
    """
    # Default CRS for stop_lon, stop_lat is WGS84 (EPSG:4326)
    df = df.assign(
        geometry=gpd.points_from_xy(df[longitude_col], df[latitude_col], crs="EPSG:4326")
    )

    # Allow projection to a different CRS
    gdf = gpd.GeoDataFrame(df).to_crs(crs)

    return gdf

In [None]:
#point_gdf = geography_utils.create_point_geometry()
#point_gdf = create_point_geometry(fill in here)
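
As a minimal sketch of what the function above does, the same steps can be run on a tiny table. The two rows reuse coordinates from the `stops.head(2)` output; the `stop_lon` and `stop_lat` column names follow the function's defaults, and EPSG:3310 (California Albers) is only an assumed example of a projected target CRS.

```python
import geopandas as gpd
import pandas as pd

# Coordinates copied from the two stops shown in the table above.
toy = pd.DataFrame({
    "stop_name": ["ANTON-SAKIOKA", "GOLDEN WEST-EDINGER"],
    "stop_lon": [-117.87903, -118.00668],
    "stop_lat": [33.69030, 33.73067],
})

# Longitude is x and latitude is y; raw lon/lat values are WGS84 (EPSG:4326).
toy_gdf = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy.stop_lon, toy.stop_lat),
    crs="EPSG:4326",
)

# Reproject to California Albers (EPSG:3310) so that coordinates are in meters.
toy_gdf_m = toy_gdf.to_crs("EPSG:3310")
```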

## Research Question

What's the average number of trips per stop for each operator in southern California? Show visualizations at the operator and county levels.
<br>**Geographic scope:** southern California counties
<br>**Deliverables:** chart(s) and map(s) showing metrics comparing across counties and also across operators. Make these visualizations using function(s).

### Prep data

* Use the same query, but grab a different set of operators. These are in southern California, so the map should zoom in on counties ranging from LA to SD.
* *Hint*: some counties have multiple operators. Make sure the average trips per stop by county is a weighted average.
* Use the same [shapefile for CA counties](https://gis.data.ca.gov/datasets/CALFIRE-Forestry::california-county-boundaries/explore?location=37.246136%2C-119.002032%2C6.12) as in Exercise 4.
* Join the data and only keep counties that have bus stops.
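
The weighted-average hint can be illustrated with a small sketch. The numbers and the column names `n_stops` and `n_trips` are made up for illustration; the point is only that summing first and dividing second weights each operator by its number of stops.

```python
import pandas as pd

# Hypothetical operator-level totals; the numbers are made up.
operator_totals = pd.DataFrame({
    "county": ["A", "A", "B"],
    "operator": ["op1", "op2", "op3"],
    "n_stops": [10, 90, 50],
    "n_trips": [1000, 1800, 2500],
})

# Per-operator average trips per stop.
operator_totals["trips_per_stop"] = operator_totals.n_trips / operator_totals.n_stops

# Unweighted: averages the operator averages, so a 10-stop operator
# counts as much as a 90-stop operator.
unweighted = operator_totals.groupby("county").trips_per_stop.mean()

# Weighted: sum trips and stops per county first, then divide.
county_totals = operator_totals.groupby("county")[["n_trips", "n_stops"]].sum()
weighted = county_totals.n_trips / county_totals.n_stops
```

For county A the unweighted figure is 60 trips per stop, while the weighted figure is 28, because most of the county's stops belong to the lower-frequency operator.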

feeds_to_names = shared_utils.gtfs_utils_v2.schedule_daily_feed_to_organization(
    selected_date = "2022-06-01",
    get_df = True
)[["feed_key", "name"]].drop_duplicates()

OPERATORS = [
    "Alhambra Schedule", 
    "San Diego Schedule",
    "Big Blue Bus Schedule",
    "Culver City Schedule",
    "OmniTrans Schedule",
]

SUBSET_FEEDS = feeds_to_names[
    feeds_to_names.name.isin(OPERATORS)
].feed_key.tolist()

# example from ex 3. did the same thing
MODE_NAMES = {
    'MB': 'Bus', 
    'LR': 'Light Rail',
    'CB': 'Commuter Bus',
}

# What happens to the ones that aren't specified in MODE_NAMES?
df = df.assign(
    mode_full_name = df.Mode.map(MODE_NAMES)
)

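keep_me = ['Alhambra Schedule', 'San Diego Schedule', 'Big Blue Bus Schedule', 'OmniTrans Schedule', 'Culver City Schedule']
# feed_key holds hashed keys, not operator names, so filter on SUBSET_FEEDS (the feed keys for these operators)
stops_keep = stops[stops.feed_key.isin(SUBSET_FEEDS)]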

In [3]:
# Does .map() also filter out rows that are not renamed? No: feed keys missing from the dict get NaN in `name`.
# Does the new column need to be added back to the data? No: .assign() returns the DataFrame with the column added.
feed_name = {
    "71d91d70ad6c07b1f9b0a618ffceef93": "Alhambra Schedule",
    "a7ba6f075198e9bf9152fab6c7faf0f6": "San Diego Schedule",
    "4f77ef02b983eccc0869c7540f98a7d0": "Big Blue Bus Schedule",
    "ae93a53469371fb3f9059d2097f66842": "OmniTrans Schedule",
    "180d48eb03829594478082dca5782ccd": "Culver City Schedule"
}
stops = stops.assign(
    name = stops.feed_key.map(feed_name)
)
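
A small sketch of how `.map()` treats keys that are missing from the dictionary. The keys and names here are made up; only the pandas behavior is the point.

```python
import pandas as pd

# Toy frame with made-up keys; "k3" is deliberately absent from the lookup.
toy = pd.DataFrame({"feed_key": ["k1", "k2", "k3"]})
lookup = {"k1": "Operator One", "k2": "Operator Two"}

# .map() keeps every row; keys missing from the dict become NaN.
toy = toy.assign(name=toy.feed_key.map(lookup))

# Filtering is a separate step: keep only the rows that received a name.
named_only = toy[toy.name.notna()]
```

This is why `stops.head(2)` still shows rows with an empty `name` after the mapping, and why a later `isin` filter is needed to drop them.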

In [4]:
stops.head(2)

Unnamed: 0,feed_key,stop_id,stop_key,stop_name,route_type_0,route_type_1,route_type_2,route_type_3,route_type_4,route_type_5,route_type_6,route_type_7,route_type_11,route_type_12,missing_route_type,geometry,name
0,8a47f5aa51f481e9ddc7c497bd72d264,1117,996713f17805d89de17057bcd41a482d,ANTON-SAKIOKA,,,,55.0,,,,,,,,POINT (-117.87903 33.69030),
1,8a47f5aa51f481e9ddc7c497bd72d264,2820,b7cb986ea2688781ce06c0654a08075c,GOLDEN WEST-EDINGER,,,,21.0,,,,,,,,POINT (-118.00668 33.73067),


In [5]:
stops.name.value_counts()

San Diego Schedule       1675
OmniTrans Schedule        908
Big Blue Bus Schedule     378
Culver City Schedule      177
Alhambra Schedule          39
Name: name, dtype: int64

feed_keys_to_names_dict = {
    "71d91d70ad6c07b1f9b0a618ffceef93": "Alhambra Schedule",
    "a7ba6f075198e9bf9152fab6c7faf0f6": "San Diego Schedule",
    "4f77ef02b983eccc0869c7540f98a7d0": "Big Blue Bus Schedule",
    "ae93a53469371fb3f9059d2097f66842": "OmniTrans Schedule",
    "180d48eb03829594478082dca5782ccd": "Culver City Schedule"
}

OPERATORS = [
    "Alhambra Schedule", 
    "San Diego Schedule",
    "Big Blue Bus Schedule",
    "Culver City Schedule",
    "OmniTrans Schedule",
    "OCTA Schedule"
]

SUBSET_FEEDS = feeds_to_names[
    feeds_to_names.name.isin(OPERATORS)
].feed_key.tolist()

In [6]:
keep_me = ['Alhambra Schedule', 'San Diego Schedule', 'Big Blue Bus Schedule', 'OmniTrans Schedule', 'Culver City Schedule']
stops2 = stops[stops.name.isin(keep_me)]

In [7]:
stops2.head()

Unnamed: 0,feed_key,stop_id,stop_key,stop_name,route_type_0,route_type_1,route_type_2,route_type_3,route_type_4,route_type_5,route_type_6,route_type_7,route_type_11,route_type_12,missing_route_type,geometry,name
2,180d48eb03829594478082dca5782ccd,616,e598476f943975d3657c2164506ce82c,WESTWOOD MEDICAL PLAZA,,,,85.0,,,,,,,,POINT (-118.44548 34.06552),Culver City Schedule
5,ae93a53469371fb3f9059d2097f66842,5407,78f87661965325e2bf5fdcf8b8ae923d,Northpark @ Serrano Wb Fs,,,,16.0,,,,,,,,POINT (-117.32369 34.17753),OmniTrans Schedule
7,a7ba6f075198e9bf9152fab6c7faf0f6,41235,3e37f2ff36e642f3c7593ed1a60b4637,Poway Rd & Pomerado Rd,,,,28.0,,,,,,,,POINT (-117.06371 32.95171),San Diego Schedule
9,4f77ef02b983eccc0869c7540f98a7d0,1557,4f03829cd225a77b989d56f1ecbd9a3a,26TH SB & WASHINGTON (SM) FS,,,,17.0,,,,,,,,POINT (-118.48037 34.03844),Big Blue Bus Schedule
10,a7ba6f075198e9bf9152fab6c7faf0f6,11367,63bc51ab5314fe5181dd7b823d750079,Imperial Av & Greenwood Cemetery,,,,34.0,,,,,,,,POINT (-117.10184 32.70429),San Diego Schedule


In [8]:
stops2.name.value_counts()

San Diego Schedule       1675
OmniTrans Schedule        908
Big Blue Bus Schedule     378
Culver City Schedule      177
Alhambra Schedule          39
Name: name, dtype: int64

In [9]:
stops.shape, stops2.shape

((6425, 17), (3177, 17))

In [10]:
# Select columns the siuba way; the keep_col cell below does the same with pandas
stops2 = (stops2
    #tbls.mart_gtfs.fct_daily_scheduled_stops()
    #>> filter(_.feed_key.isin(SUBSET_FEEDS))
    #>> filter(_.service_date == "2022-06-01")
    >> select(_.feed_key, _.stop_key, _.stop_id,
              _.stop_name, _.geometry, _.name, _.route_type_3)
    #>> collect()
)
# This is equivalent to the keep_col selection in the cell below.

In [None]:
# Same column selection as above, the pandas way
# (applied to `stops`, so the rows are not filtered to the five operators)
keep_col = ['feed_key', 'stop_key', 'stop_id', 'stop_name', 'geometry', 'name', 'route_type_3']
stops_clean = stops[keep_col]

Check the type of `stops`. Is it a pandas df or geopandas gdf?

In [11]:
type(stops2)

geopandas.geodataframe.GeoDataFrame

In [12]:
stops2.head()

Unnamed: 0,feed_key,stop_key,stop_id,stop_name,geometry,name,route_type_3
2,180d48eb03829594478082dca5782ccd,e598476f943975d3657c2164506ce82c,616,WESTWOOD MEDICAL PLAZA,POINT (-118.44548 34.06552),Culver City Schedule,85.0
5,ae93a53469371fb3f9059d2097f66842,78f87661965325e2bf5fdcf8b8ae923d,5407,Northpark @ Serrano Wb Fs,POINT (-117.32369 34.17753),OmniTrans Schedule,16.0
7,a7ba6f075198e9bf9152fab6c7faf0f6,3e37f2ff36e642f3c7593ed1a60b4637,41235,Poway Rd & Pomerado Rd,POINT (-117.06371 32.95171),San Diego Schedule,28.0
9,4f77ef02b983eccc0869c7540f98a7d0,4f03829cd225a77b989d56f1ecbd9a3a,1557,26TH SB & WASHINGTON (SM) FS,POINT (-118.48037 34.03844),Big Blue Bus Schedule,17.0
10,a7ba6f075198e9bf9152fab6c7faf0f6,63bc51ab5314fe5181dd7b823d750079,11367,Imperial Av & Greenwood Cemetery,POINT (-117.10184 32.70429),San Diego Schedule,34.0


In [None]:
#skip
# Turn stops into a gdf
geom = [shapely.wkt.loads(x) for x in stops.pt_geom]

stops = gpd.GeoDataFrame(
    stops, 
    geometry=geom, 
    crs="EPSG:4326"
).drop(columns="pt_geom")

Check the type of `stops`. Is it a pandas df or geopandas gdf?

What is the CRS and geometry column name?
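
One way to answer these questions in code, shown on a toy GeoDataFrame with made-up points. The variable names are illustrative only.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Two made-up points, for illustration only.
toy_gdf = gpd.GeoDataFrame(
    {"label": ["p1", "p2"]},
    geometry=[Point(-118.2, 34.0), Point(-117.2, 32.7)],
    crs="EPSG:4326",
)

# A GeoDataFrame is a subclass of pandas.DataFrame, so both checks are True.
is_gdf = isinstance(toy_gdf, gpd.GeoDataFrame)
is_df = isinstance(toy_gdf, pd.DataFrame)

geom_col = toy_gdf.geometry.name   # name of the active geometry column
epsg_code = toy_gdf.crs.to_epsg()  # CRS as an integer EPSG code

# Selecting only non-geometry columns gives back a plain pandas DataFrame.
plain = toy_gdf[["label"]]
```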

In [None]:
#type(stops)

In [13]:
stops2.geometry.name

'geometry'

In [14]:
stops.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [15]:
counties = gpd.read_file('https://services1.arcgis.com/jUJYIo9tSA7EHvfZ/arcgis/rest/services/California_County_Boundaries/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson')

In [16]:
counties.head(2)

Unnamed: 0,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID,geometry
0,1,Alameda,ALA,1,1,1,,3402787000.0,308998.650766,e6f92268-d2dd-4cfb-8b79-5b4b2f07c559,"POLYGON ((-122.27125 37.90503, -122.27024 37.9..."
1,2,Alpine,ALP,2,2,3,,3146939000.0,274888.492411,870479b2-480a-494b-8352-ad60578839c1,"POLYGON ((-119.58667 38.71420, -119.58653 38.7..."


In [37]:
type(counties)

geopandas.geodataframe.GeoDataFrame

In [17]:
counties = counties.to_crs('EPSG:4326')

In [18]:
stops2 = stops2.to_crs('EPSG:4326')

In [38]:
# Keep only counties that have bus stops: with an inner spatial join,
# stops that do not intersect a county are dropped, and counties with no stops never appear in the result.
join = gpd.sjoin(stops2, counties, how = 'inner', predicate = 'intersects')
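
A minimal sketch of how `how='inner'` behaves in a spatial join, using made-up square "counties" and three points, one of which falls outside every polygon.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square "counties" and three points; the last point is outside both.
toy_counties = gpd.GeoDataFrame(
    {"COUNTY_NAME": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)
toy_stops = gpd.GeoDataFrame(
    {"stop_id": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Inner join: only stops that intersect a county polygon are kept.
joined = gpd.sjoin(toy_stops, toy_counties, how="inner", predicate="intersects")

# Left join: stop "c" is kept, with NaN in the county columns.
left_joined = gpd.sjoin(toy_stops, toy_counties, how="left", predicate="intersects")
```

The result keeps the geometry of the left frame (the points), so the county polygons have to be merged back in later if a county map is needed.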

In [39]:
list(join.columns)

['feed_key',
 'stop_key',
 'stop_id',
 'stop_name',
 'geometry',
 'name',
 'route_type_3',
 'index_right',
 'OBJECTID',
 'COUNTY_NAME',
 'COUNTY_ABBREV',
 'COUNTY_NUM',
 'COUNTY_CODE',
 'COUNTY_FIPS',
 'ISLAND',
 'Shape__Area',
 'Shape__Length',
 'GlobalID']

In [40]:
join.head()

Unnamed: 0,feed_key,stop_key,stop_id,stop_name,geometry,name,route_type_3,index_right,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID
2,180d48eb03829594478082dca5782ccd,e598476f943975d3657c2164506ce82c,616,WESTWOOD MEDICAL PLAZA,POINT (-118.44548 34.06552),Culver City Schedule,85.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219
9,4f77ef02b983eccc0869c7540f98a7d0,4f03829cd225a77b989d56f1ecbd9a3a,1557,26TH SB & WASHINGTON (SM) FS,POINT (-118.48037 34.03844),Big Blue Bus Schedule,17.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219
11,180d48eb03829594478082dca5782ccd,c40af297c6bff3e22d8d54c7a4e7b589,658,ADMIRALTY WY/FIJI WY,POINT (-118.43873 33.97806),Culver City Schedule,16.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219
65,4f77ef02b983eccc0869c7540f98a7d0,133b8ff1003ad8f14675f9da8e8bfc8b,379,SANTA MONICA EB & WELLESLEY NS,POINT (-118.46554 34.03948),Big Blue Bus Schedule,90.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219
72,4f77ef02b983eccc0869c7540f98a7d0,2f1351b41c545ad21e28677f24189364,1393,BARRINGTON SB & OLYMPIC NS,POINT (-118.44800 34.03435),Big Blue Bus Schedule,13.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219


In [41]:
# County-level trips per stop should be a weighted average: total trips / total stops.
# Build the two totals as separate tables.
trip=join.groupby(['COUNTY_NAME']).agg({'route_type_3':'sum'}).reset_index()
stop=join.groupby(['COUNTY_NAME']).agg({'stop_id':'count'}).reset_index()

In [None]:
#join.geometry.name

In [42]:
join.COUNTY_NAME.value_counts()

San Diego         1675
San Bernardino     896
Los Angeles        605
Riverside            1
Name: COUNTY_NAME, dtype: int64

In [None]:
#trip

In [None]:
#stop

In [43]:
# Merge the two tables created above
merge1 = pd.merge(stop, trip, on = 'COUNTY_NAME',
    how = 'inner', validate = 'm:1')

In [44]:
merge1

Unnamed: 0,COUNTY_NAME,stop_id,route_type_3
0,Los Angeles,605,30183.0
1,Riverside,1,33.0
2,San Bernardino,896,24538.0
3,San Diego,1675,82674.0


In [26]:
#merge2 = pd.merge(merge1, stop, on = 'feed_key',
    #how = 'inner', validate = 'm:1')

In [45]:
# calculate and add a new column to the data, still the same data with one new column, called 'trip_per_stop'
merge1['trip_per_stop'] = merge1.route_type_3/merge1.stop_id

In [46]:
merge1

Unnamed: 0,COUNTY_NAME,stop_id,route_type_3,trip_per_stop
0,Los Angeles,605,30183.0,49.889256
1,Riverside,1,33.0,33.0
2,San Bernardino,896,24538.0,27.386161
3,San Diego,1675,82674.0,49.357612


### Bring in a new table from BigQuery

* In `mart_gtfs`, bring in the table called `fct_daily_scheduled_stops` for the subset of feeds defined above.
* Modify the snippet below to:
   * filter for the subset of operators
   * only keep columns: `feed_key`, `stop_id`, `stop_event_count`

In [None]:
stop_counts = (
    #tbls.mart_gtfs.fct_daily_scheduled_stops()
    >> filter(_.activity_date == "2022-06-01")
)

In [None]:
stops = pd.read_parquet('./data/exercise_5_stops_sample.parquet')
stops = (stops
    #tbls.mart_gtfs.fct_daily_scheduled_stops()
    >> filter(_.activity_date == "2022-06-01")
    >> select(_.feed_key, _.stop_id, 
             _.stop_event_count)
    >> arrange(_.feed_key, _.stop_id)
    #>> collect() 
)

### Aggregate
* Write a function to aggregate to the operator level or county level, add new columns for desired metrics.
* Merge in CA shapefile to get a gdf.
* Add another `geometry` column, called `centroid`, and grab the county's centroid.
* Refer to [docs](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.set_geometry.html) to see how to pick which column to use as the `geometry` for the gdf, since technically, a gdf can handle multiple geometry columns.
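
The steps above can be sketched as follows, on made-up data. The function name `aggregate_trips_per_stop`, the `n_stops` and `n_trips` columns, and the toy polygons are assumptions for illustration. The centroid is computed in a projected CRS (EPSG:3310, California Albers, chosen here as an example) and converted back, since centroids computed directly in EPSG:4326 trigger a geopandas warning.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def aggregate_trips_per_stop(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Count stops and sum trips per group, then compute trips per stop."""
    out = (
        df.groupby(group_col)
        .agg(n_stops=("stop_id", "count"), n_trips=("route_type_3", "sum"))
        .reset_index()
    )
    out["trip_per_stop"] = out.n_trips / out.n_stops
    return out


# Made-up stop rows and square "counties", for illustration only.
toy_stops = pd.DataFrame({
    "COUNTY_NAME": ["West", "West", "East"],
    "stop_id": ["a", "b", "c"],
    "route_type_3": [10.0, 30.0, 50.0],
})
toy_counties = gpd.GeoDataFrame(
    {"COUNTY_NAME": ["West", "East"]},
    geometry=[box(-118.5, 33.5, -118.0, 34.0), box(-117.5, 33.5, -117.0, 34.0)],
    crs="EPSG:4326",
)

by_county = aggregate_trips_per_stop(toy_stops, "COUNTY_NAME")
county_gdf = toy_counties.merge(by_county, on="COUNTY_NAME", how="inner")

# Compute centroids in a projected CRS, then convert back to match the polygons.
county_gdf["centroid"] = (
    county_gdf.geometry.to_crs("EPSG:3310").centroid.to_crs("EPSG:4326")
)

# set_geometry returns a new gdf whose active geometry is the centroid column.
centroid_gdf = county_gdf.set_geometry("centroid")
```

The same function works at the operator level by passing `"name"` as `group_col`, provided the input has that column.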

In [47]:
# Total trips (route_type_3) for each stop name
agg_sum1=stops2.groupby(['stop_name']).agg({'route_type_3':'sum'}).reset_index()

In [48]:
agg_sum1

Unnamed: 0,stop_name,route_type_3
0,10350 Science Center Dr,8.0
1,10th Av & C St,70.0
2,10th Av & Island Av,117.0
3,10th Av & Market St,117.0
4,10th Av & Park Bl (Petco Park),117.0
...,...,...
2897,Yucaipa Trans Ctr,103.0
2898,Zion Av & 51st St,13.0
2899,Zion Av & Cartwright St,13.0
2900,Zion Av & Crawford St (Kaiser),77.0


In [49]:
agg_sum2=stops2.groupby(['name']).agg({'route_type_3':'sum'}).reset_index()

In [50]:
agg_sum2

Unnamed: 0,name,route_type_3
0,Alhambra Schedule,1191.0
1,Big Blue Bus Schedule,20977.0
2,Culver City Schedule,7483.0
3,OmniTrans Schedule,25103.0
4,San Diego Schedule,82674.0


In [51]:
list(join.columns)

['feed_key',
 'stop_key',
 'stop_id',
 'stop_name',
 'geometry',
 'name',
 'route_type_3',
 'index_right',
 'OBJECTID',
 'COUNTY_NAME',
 'COUNTY_ABBREV',
 'COUNTY_NUM',
 'COUNTY_CODE',
 'COUNTY_FIPS',
 'ISLAND',
 'Shape__Area',
 'Shape__Length',
 'GlobalID']

In [52]:
join.head(2)

Unnamed: 0,feed_key,stop_key,stop_id,stop_name,geometry,name,route_type_3,index_right,OBJECTID,COUNTY_NAME,COUNTY_ABBREV,COUNTY_NUM,COUNTY_CODE,COUNTY_FIPS,ISLAND,Shape__Area,Shape__Length,GlobalID
2,180d48eb03829594478082dca5782ccd,e598476f943975d3657c2164506ce82c,616,WESTWOOD MEDICAL PLAZA,POINT (-118.44548 34.06552),Culver City Schedule,85.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219
9,4f77ef02b983eccc0869c7540f98a7d0,4f03829cd225a77b989d56f1ecbd9a3a,1557,26TH SB & WASHINGTON (SM) FS,POINT (-118.48037 34.03844),Big Blue Bus Schedule,17.0,18,19,Los Angeles,LOS,19,19,37,,15054690000.0,629726.475248,3b1e1d69-2b1a-464d-ba43-611c4201b219


In [56]:
# For each county, count the stop rows and sum the trips.
# Note: 'count' on feed_key counts rows (stops), not operators; 'nunique' would count operators.
agg_gct1=join.groupby(['COUNTY_NAME']).agg({'feed_key':'count'}).reset_index()
agg_gsum1=join.groupby(['COUNTY_NAME']).agg({'route_type_3':'sum'}).reset_index()

In [57]:
agg_gct1

Unnamed: 0,COUNTY_NAME,feed_key
0,Los Angeles,605
1,Riverside,1
2,San Bernardino,896
3,San Diego,1675


In [58]:
agg_gsum1

Unnamed: 0,COUNTY_NAME,route_type_3
0,Los Angeles,30183.0
1,Riverside,33.0
2,San Bernardino,24538.0
3,San Diego,82674.0


In [70]:
merge2 = pd.merge(counties, merge1, on = 'COUNTY_NAME',
    how = 'inner', validate = 'm:1')

#Add another geometry column, called centroid, and grab the county's centroid.
gdf = gpd.GeoDataFrame(
    stops, 
    geometry=gpd.points_from_xy(stops['stop_lon'], stops['stop_lat']),
    crs='EPSG:4326'
)

In [71]:
merge2["centr"] = merge2.geometry.centroid
#merge3_c = merge2.set_geometry("geometry")
#GeoDataFrame.set_geometry(col, crs='EPSG:4326')




In [72]:
merge2.geometry.name

'geometry'

In [73]:
merge2.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [74]:
merge2.geometry.area




0    1.002853
1    0.014070
2    0.018707
3    1.839965
4    5.132310
5    1.064552
dtype: float64
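
The areas printed above are in square degrees because the gdf is in EPSG:4326. A minimal sketch of getting a physical area instead, assuming EPSG:3310 (California Albers, units in meters) is an acceptable projection and using a made-up square:

```python
import geopandas as gpd
from shapely.geometry import box

# A made-up half-degree square near Los Angeles, for illustration only.
toy = gpd.GeoDataFrame(
    {"label": ["square"]},
    geometry=[box(-118.5, 33.5, -118.0, 34.0)],
    crs="EPSG:4326",
)

# In EPSG:4326 the area comes back in square degrees (geopandas warns about this).
area_sq_degrees = toy.geometry.area.iloc[0]

# Project to California Albers (meters) first, then convert m^2 to km^2.
area_sq_km = toy.to_crs("EPSG:3310").geometry.area.iloc[0] / 1e6
```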

### Visualizations
* Make one chart for comparing trips per stop by operators, and another chart for comparing it by counties. Use a function to do this.
* Make 1 map for comparing trips per stop by counties. Use `gdf.explore()` to do this.
* Visualizations should follow the Cal-ITP style guide: [styleguide example notebook](https://github.com/cal-itp/data-analyses/blob/main/example_report/style-guide-examples.ipynb)
* More on `folium` and `ipyleaflet`: https://github.com/jorisvandenbossche/geopandas-tutorial/blob/master/05-more-on-visualization.ipynb

In [None]:
# To add styleguide
from shared_utils import styleguide
from shared_utils import calitp_color_palette as cp

In [None]:
#merge2.drop_duplicates()

import matplotlib.pyplot as plt

def df_bar(df, xcol, ycol, xlabel=None, ylabel=None, title=None):
    # Default the axis labels to the column names
    if xlabel is None:
        xlabel = xcol
    if ylabel is None:
        ylabel = ycol
    ax = df.plot(x=xcol, y=ycol, kind='bar', title=title, width=0.5, legend=False)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    plt.show()

def make_bar_chart(df, x_col, y_col):
    x_title = f"{x_col.title()}"

    chart = (alt.Chart(df)
             .mark_bar()
             .encode(
                 x=alt.X(x_col, title=x_title),
                 y=alt.Y(y_col, title=""),
             )
            )
    return chart

In [75]:
def make_bar_chart(df, x_col, y_col):
    chart = (alt.Chart(df)
             .mark_bar()
             .encode(
                 x=alt.X(x_col),
                 y=alt.Y(y_col)
             )
            )
    return chart


In [76]:
make_bar_chart(merge1, 'COUNTY_NAME','trip_per_stop')

In [None]:
#merge1.plot(x='COUNTY_NAME', y='trip_per_stop', kind='bar')

In [77]:
def make_map(gdf, plot_col):
    m = gdf.explore(plot_col, legend=False)
    return m

In [78]:
make_map(merge2, 'trip_per_stop')