## Exercise 2.5 – Advanced Geospatial Plotting

In this exercise, I explored geospatial visualization using **Kepler.gl** with the New York CitiBike 2022 dataset.  
The goal was to map bike trips across the city, understand route density, and identify areas of high activity.  
I used the dataset created in the previous exercise, which combines CitiBike trip data with weather data from NOAA.


In [2]:
import pandas as pd
from keplergl import KeplerGl


In [3]:
# Load the merged CitiBike + weather dataset
df = pd.read_csv("../temp_storage/data_raw/citibike_weather_2022.csv")

# Shape of the raw file, before any columns are added
print(df.shape)

# Add a column representing each trip (every row is one trip)
df["trip_count"] = 1
df.head(3)


(895485, 17)


| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | Unnamed: 0 | date | avgTemp | _merge | trip_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 919C40A703A965D7 | electric_bike | 2022-04-15 15:02:20 | 2022-04-15 15:57:16 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742677 | -74.051789 | 40.742677 | -74.051789 | casual | | 2022-04-15 | 12.6 | both | 1 |
| 1 | 3B40831921DAD6C1 | classic_bike | 2022-04-08 10:20:55 | 2022-04-08 10:25:29 | JC Medical Center | JC011 | Grand St | JC102 | 40.71654 | -74.049638 | 40.715178 | -74.037683 | member | | 2022-04-08 | 11.4 | both | 1 |
| 2 | 69C5C0766309F73D | electric_bike | 2022-04-09 14:14:04 | 2022-04-09 14:18:51 | Mama Johnson Field - 4 St & Jackson St | HB404 | Southwest Park - Jackson St & Observer Hwy | HB401 | 40.74314 | -74.040041 | 40.737551 | -74.041664 | member | | 2022-04-09 | 10.5 | both | 1 |


In [4]:
df.columns.tolist()


['ride_id',
 'rideable_type',
 'started_at',
 'ended_at',
 'start_station_name',
 'start_station_id',
 'end_station_name',
 'end_station_id',
 'start_lat',
 'start_lng',
 'end_lat',
 'end_lng',
 'member_casual',
 'Unnamed: 0',
 'date',
 'avgTemp',
 '_merge',
 'trip_count']
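
The column list shows two leftovers from the previous exercise: `_merge` (the merge indicator) and `Unnamed: 0` (most likely an index written out by an earlier `to_csv` call). As a minimal sketch, assuming only the mapping-related columns are needed, `usecols` can skip them at load time. The inline CSV below is a two-row stand-in built from the rows shown above, not the real file.

```python
import io
import pandas as pd

# Two-row stand-in for citibike_weather_2022.csv (subset of columns)
csv_text = """ride_id,started_at,start_lat,start_lng,end_lat,end_lng,member_casual,Unnamed: 0,_merge
919C40A703A965D7,2022-04-15 15:02:20,40.742677,-74.051789,40.742677,-74.051789,casual,,both
3B40831921DAD6C1,2022-04-08 10:20:55,40.71654,-74.049638,40.715178,-74.037683,member,,both
"""

# Columns to keep, listed in file order (leftover columns are omitted)
keep = [
    "ride_id", "started_at",
    "start_lat", "start_lng", "end_lat", "end_lng",
    "member_casual",
]

# usecols drops everything else; parse_dates converts the timestamp column
trips = pd.read_csv(io.StringIO(csv_text), usecols=keep, parse_dates=["started_at"])
print(trips.dtypes)
```

With the real file, the same `usecols` list would be passed to the existing `pd.read_csv` call.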

In [5]:
# Each row represents one trip, so assign a count of 1
# (already set in the loading cell above; repeating it is harmless)
df["trip_count"] = 1

# Confirm the column is present
df.head(3)


| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | Unnamed: 0 | date | avgTemp | _merge | trip_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 919C40A703A965D7 | electric_bike | 2022-04-15 15:02:20 | 2022-04-15 15:57:16 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742677 | -74.051789 | 40.742677 | -74.051789 | casual | | 2022-04-15 | 12.6 | both | 1 |
| 1 | 3B40831921DAD6C1 | classic_bike | 2022-04-08 10:20:55 | 2022-04-08 10:25:29 | JC Medical Center | JC011 | Grand St | JC102 | 40.71654 | -74.049638 | 40.715178 | -74.037683 | member | | 2022-04-08 | 11.4 | both | 1 |
| 2 | 69C5C0766309F73D | electric_bike | 2022-04-09 14:14:04 | 2022-04-09 14:18:51 | Mama Johnson Field - 4 St & Jackson St | HB404 | Southwest Park - Jackson St & Observer Hwy | HB401 | 40.74314 | -74.040041 | 40.737551 | -74.041664 | member | | 2022-04-09 | 10.5 | both | 1 |


In [6]:
# Aggregate total trips between start and end stations
df_agg = (
    df.groupby(["start_station_name", "end_station_name"])
    .agg({"trip_count": "sum"})
    .reset_index()
    .sort_values("trip_count", ascending=False)
)

print(df_agg.shape)
df_agg.head(10)


(6953, 3)


| | start_station_name | end_station_name | trip_count |
|---|---|---|---|
| 3657 | Hoboken Terminal - Hudson St & Hudson Pl | Hoboken Ave at Monmouth St | 5565 |
| 6290 | South Waterfront Walkway - Sinatra Dr & 1 St | South Waterfront Walkway - Sinatra Dr & 1 St | 5439 |
| 4976 | Marin Light Rail | Grove St PATH | 4113 |
| 3574 | Hoboken Ave at Monmouth St | Hoboken Terminal - Hudson St & Hudson Pl | 4083 |
| 3135 | Grove St PATH | Marin Light Rail | 3973 |
| 181 | 12 St & Sinatra Dr N | South Waterfront Walkway - Sinatra Dr & 1 St | 3964 |
| 4503 | Liberty Light Rail | Liberty Light Rail | 3696 |
| 6211 | South Waterfront Walkway - Sinatra Dr & 1 St | 12 St & Sinatra Dr N | 3495 |
| 3201 | Hamilton Park | Grove St PATH | 3203 |
| 5722 | Newport Pkwy | Newport Pkwy | 3131 |
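
The aggregated table only holds station names, which Kepler.gl cannot place on a map by themselves. One possible approach, sketched here rather than taken from the notebook, is to carry the coordinates through the aggregation by averaging them per station pair. The column names match the dataset, but the three rows and their coordinates are made up for illustration.

```python
import pandas as pd

# Tiny illustrative frame; coordinates are placeholder values, not real stations
trips = pd.DataFrame({
    "start_station_name": ["Grove St PATH", "Grove St PATH", "Hamilton Park"],
    "end_station_name": ["Marin Light Rail", "Marin Light Rail", "Grove St PATH"],
    "start_lat": [40.7196, 40.7198, 40.7271],
    "start_lng": [-74.0431, -74.0433, -74.0443],
    "end_lat": [40.7146, 40.7146, 40.7196],
    "end_lng": [-74.0428, -74.0428, -74.0431],
})

# Count trips per station pair and keep one averaged coordinate per endpoint
routes = (
    trips.groupby(["start_station_name", "end_station_name"], as_index=False)
    .agg(
        trip_count=("start_lat", "size"),
        start_lat=("start_lat", "mean"),
        start_lng=("start_lng", "mean"),
        end_lat=("end_lat", "mean"),
        end_lng=("end_lng", "mean"),
    )
    .sort_values("trip_count", ascending=False)
)
print(routes)
```

A frame shaped like `routes` has one row per route with both endpoints, which is the layout an arc or line layer needs, and `trip_count` can then drive color or stroke width.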


In [7]:
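# Reload the raw file to start the mapping steps from a clean dataframe
# (this discards the trip_count column added earlier)
df = pd.read_csv("../temp_storage/data_raw/citibike_weather_2022.csv")
print(df.shape)
df.head(3)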


(895485, 17)


| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | Unnamed: 0 | date | avgTemp | _merge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 919C40A703A965D7 | electric_bike | 2022-04-15 15:02:20 | 2022-04-15 15:57:16 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742677 | -74.051789 | 40.742677 | -74.051789 | casual | | 2022-04-15 | 12.6 | both |
| 1 | 3B40831921DAD6C1 | classic_bike | 2022-04-08 10:20:55 | 2022-04-08 10:25:29 | JC Medical Center | JC011 | Grand St | JC102 | 40.71654 | -74.049638 | 40.715178 | -74.037683 | member | | 2022-04-08 | 11.4 | both |
| 2 | 69C5C0766309F73D | electric_bike | 2022-04-09 14:14:04 | 2022-04-09 14:18:51 | Mama Johnson Field - 4 St & Jackson St | HB404 | Southwest Park - Jackson St & Observer Hwy | HB401 | 40.74314 | -74.040041 | 40.737551 | -74.041664 | member | | 2022-04-09 | 10.5 | both |


In [8]:

# Initialize Kepler map (pandas and KeplerGl are already imported above)
map_1 = KeplerGl(height=600)

# Add the full CitiBike dataframe (about 895k rows, so rendering can be slow)
map_1.add_data(data=df, name="CitiBike 2022")

# Display it
map_1


User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


Out of range float values are not JSON compliant
Supporting this message is deprecated in jupyter-client 7, please make sure your message is JSON-compliant
  content = self.pack(content)


KeplerGl(data={'CitiBike 2022': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
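
The "Out of range float values" warning above most likely comes from `NaN` values in the dataframe (the `Unnamed: 0` column is empty in every row shown), because strict JSON has no `NaN` literal. The standard-library sketch below reproduces the problem on a single record and shows one way around it. `to_json_safe` is a hypothetical helper written for this illustration, not part of keplergl or pandas.

```python
import json
import math

def to_json_safe(record):
    """Return a copy of a flat dict with NaN/inf floats replaced by None."""
    return {
        key: None if isinstance(value, float) and not math.isfinite(value) else value
        for key, value in record.items()
    }

# One record shaped like a row of the dataframe, with a missing value
row = {"ride_id": "919C40A703A965D7", "avgTemp": 12.6, "Unnamed: 0": float("nan")}

# Strict JSON encoding rejects NaN with a ValueError
try:
    json.dumps(row, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# After cleaning, the same record serializes; NaN becomes null
safe_json = json.dumps(to_json_safe(row), allow_nan=False)
print(strict_ok, safe_json)
```

In the notebook itself, passing only complete columns to the map (as the later `df_map` cell does with `dropna()`) avoids the warning without any extra helper.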

In [12]:
import json

# Save Kepler.gl configuration inside the notebooks folder
with open("config.json", "w") as f:
    json.dump(map_1.config, f)

print("done")


done
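
A saved `config.json` is only useful if it can be read back later, for example from the dashboard script. The sketch below shows a possible loader using only the standard library. `load_kepler_config` is a hypothetical helper name, and the small `demo_config` dict is a stand-in, since a real Kepler.gl config is much larger.

```python
import json
import tempfile
from pathlib import Path

def load_kepler_config(path="config.json"):
    """Read a saved Kepler.gl config; return None if the file is missing."""
    config_path = Path(path)
    if not config_path.exists():
        return None
    with config_path.open() as f:
        return json.load(f)

# Round-trip demo with a stand-in config dict written to a temporary folder
demo_config = {"version": "v1", "config": {"mapState": {"zoom": 11}}}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_path.write_text(json.dumps(demo_config))
    restored = load_kepler_config(demo_path)
    missing = load_kepler_config(Path(tmp_dir) / "does_not_exist.json")

print(restored, missing)
```

The returned dict can then be passed to the map constructor, as in `KeplerGl(height=600, data={"CitiBike 2022": df}, config=config)`. Note that `map_1.config` reflects the state of the rendered widget, so it is worth checking that the saved file is not empty before relying on it.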


In [13]:
import os
os.listdir()


['st_dashboard.py',
 '2.4_fundamentals_of_visualization_libraries_part2.ipynb',
 'config.json',
 '2.6_creating_dashboards_with_python.ipynb',
 '2.2_citibike_weather_merge.ipynb',
 '2.3_fundamentals_of_visualization_libraries.ipynb',
 '2.5_advanced_geospatial_plotting.ipynb']

In [17]:
df.columns.tolist()


['ride_id',
 'rideable_type',
 'started_at',
 'ended_at',
 'start_station_name',
 'start_station_id',
 'end_station_name',
 'end_station_id',
 'start_lat',
 'start_lng',
 'end_lat',
 'end_lng',
 'member_casual',
 'Unnamed: 0',
 'date',
 'avgTemp',
 '_merge']

In [5]:
# Keep only relevant columns for mapping
df_map = df[["start_lat", "start_lng", "end_lat", "end_lng", "member_casual"]].dropna()

# Optional: sample smaller subset (for faster performance)
df_map = df_map.sample(n=5000, random_state=42)

df_map.head()


| | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|
| 603141 | 40.717752 | -74.043856 | 40.722104 | -74.071455 | member |
| 336703 | 40.746503 | -74.038114 | 40.750604 | -74.02402 | casual |
| 793991 | 40.749943 | -74.035865 | 40.75409 | -74.0316 | member |
| 767329 | 40.716843 | -74.032879 | 40.721124 | -74.038051 | member |
| 748057 | 40.718089 | -74.083355 | 40.718211 | -74.083639 | casual |


In [6]:
# Initialize Kepler map
map_2 = KeplerGl(height=600)

# Add reduced CitiBike dataset
map_2.add_data(data=df_map, name="CitiBike Routes 2022")

# Show map
map_2


User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(data={'CitiBike Routes 2022': {'index': [603141, 336703, 793991, 767329, 748057, 628046, 387568, 4781…

In [7]:
# Export the map to an interactive HTML file
map_2.save_to_html(file_name="../temp_storage/citibike_routes_map.html")


Map saved to ../temp_storage/citibike_routes_map.html!


In [8]:
# Keep only the sampled trips made by members
df_members = df_map[df_map["member_casual"] == "member"]

# Build a separate map for the members-only subset
map_3 = KeplerGl(height=600)
map_3.add_data(data=df_members, name="Members Only")
map_3


User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(data={'Members Only': {'index': [603141, 793991, 767329, 387568, 489555, 404754, 245952, 392009, 5699…
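
Instead of building one map per rider type, both groups could be added to a single map as separate datasets so they can be toggled and colored independently. This is a sketch of that idea, not something the notebook does. The three rows are the first rows of the `df_map` sample shown earlier, and the dataset names are my own choice.

```python
import pandas as pd

# First three rows of the df_map sample shown earlier in this notebook
sample = pd.DataFrame({
    "start_lat": [40.717752, 40.746503, 40.749943],
    "start_lng": [-74.043856, -74.038114, -74.035865],
    "end_lat": [40.722104, 40.750604, 40.75409],
    "end_lng": [-74.071455, -74.02402, -74.0316],
    "member_casual": ["member", "casual", "member"],
})

# One dataframe per rider type, keyed by a readable dataset name
datasets = {
    f"{rider_type.title()} riders": group.drop(columns="member_casual")
    for rider_type, group in sample.groupby("member_casual")
}

for name, frame in datasets.items():
    print(name, len(frame))

# With keplergl available, each frame would be added to the same map:
#   for name, frame in datasets.items():
#       kepler_map.add_data(data=frame, name=name)
```

Each dataset then appears as its own entry in the Kepler.gl layer panel, which makes a member versus casual comparison possible on one map.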

### Summary

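By customizing the map’s layers, colors, and filters, I was able to highlight the most frequent bike routes in the dataset.  
The station IDs (prefixed `JC` and `HB`) and the coordinates in the rows shown above place these trips in the **Jersey City and Hoboken** part of the CitiBike network, west of the Hudson, rather than in Manhattan. The busiest station pairs in the aggregation connect transit hubs such as Hoboken Terminal, Grove St PATH, and Marin Light Rail, which suggests that commuter traffic drives much of the activity.  
After reviewing the patterns, I exported the map as an interactive HTML file and pushed the notebook to the project’s GitHub repository.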

**Repository link:** [https://github.com/boukaskasbrahim/New-York-s-CitiBike-trips-in-2022](https://github.com/boukaskasbrahim/New-York-s-CitiBike-trips-in-2022)
