## Task 2.5 Advanced Geospatial Plotting
#### 1. Import libraries
#### 2. Read in data from previous task
#### 3. Data Wrangling
#### 4. Create new column with the value of 1. Then create a new aggregated dataframe that contains 3 columns: starting station, ending station, and the count of trips between those stations.
#### 5. Initialize an instance of a kepler.gl map.
#### 6. Customize the output of your map: adjust the color of the points and add an arc that connects them. Pick a suitable color palette for each. In a Markdown section, explain what settings you changed and why.
#### 7. Add a filter to your map and use it to see what the most common trips are in New York City. What else makes an impression? For example, are there any zones that seem particularly busy? Using some additional research, write a few sentences to make sense of that output.
#### 8. Create a config object and save your map with it.

### 1. Import Libraries

In [1]:
# Importing libraries
import pandas as pd
import os
from keplergl import KeplerGl
from pyproj import CRS 
import numpy as np
from matplotlib import pyplot as plt

### 2. Read in Data from previous task

In [2]:
# Read in data
df = pd.read_csv('LaGuardia_data.csv', index_col = 0)

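With a file of this size, pandas may emit a `DtypeWarning` when a column mixes numeric-looking and text values. The sketch below shows one common way to avoid that by declaring column types up front. It assumes the station ID columns are the mixed ones, which the notebook does not confirm, and it uses a small in-memory sample with made-up values rather than `LaGuardia_data.csv`.

```python
import io

import pandas as pd

# Toy stand-in for the real file; the IDs below are illustrative only.
sample_csv = io.StringIO(
    "ride_id,start_station_id,end_station_id\n"
    "A1,5905.12,6140.05\n"
    "B2,SYS016,5905.12\n"
)

# Declaring the ID columns as strings stops pandas from inferring
# a different type for different parts of the file.
df_small = pd.read_csv(
    sample_csv,
    dtype={"start_station_id": str, "end_station_id": str},
)

print(df_small["start_station_id"].tolist())
```

The same `dtype` argument could be passed to the `pd.read_csv` call in the cell above if the warning appears for those columns.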


### 3. Data Wrangling

In [3]:
df.dtypes

Unnamed: 0              int64
ride_id                object
rideable_type          object
started_at             object
ended_at               object
start_station_name     object
start_station_id       object
end_station_name       object
end_station_id         object
start_lat             float64
start_lng             float64
end_lat               float64
end_lng               float64
member_casual          object
avgTemp               float64
bike_rides_daily        int64
_merge                 object
dtype: object

In [4]:
print(df.columns)

Index(['Unnamed: 0', 'ride_id', 'rideable_type', 'started_at', 'ended_at',
       'start_station_name', 'start_station_id', 'end_station_name',
       'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng',
       'member_casual', 'avgTemp', 'bike_rides_daily', '_merge'],
      dtype='object')


In [5]:
df.shape

(29838166, 17)

In [6]:
# Group by start station and average the latitude and longitude recorded for each one
start_station_coordinates = df.groupby('start_station_name')[['start_lat', 'start_lng']].mean().reset_index()

In [7]:
start_station_coordinates

| | start_station_name | start_lat | start_lng |
|---|---|---|---|
| 0 | 1 Ave & E 110 St | 40.792333 | -73.938283 |
| 1 | 1 Ave & E 16 St | 40.732222 | -73.981654 |
| 2 | 1 Ave & E 18 St | 40.733838 | -73.980533 |
| 3 | 1 Ave & E 30 St | 40.741452 | -73.975356 |
| 4 | 1 Ave & E 39 St | 40.747159 | -73.971122 |
| ... | ... | ... | ... |
| 1756 | Wyckoff Ave & Gates Ave | 40.699863 | -73.911712 |
| 1757 | Wyckoff St & 3 Ave | 40.682775 | -73.982615 |
| 1758 | Wyckoff St & Nevins St | 40.683421 | -73.984265 |
| 1759 | Wythe Ave & Metropolitan Ave | 40.716898 | -73.963191 |
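Averaging is used here because the coordinates recorded for a station can differ slightly from ride to ride, so the mean gives one representative point per station. A minimal sketch of the same `groupby(...).mean()` pattern on toy data (the station names and coordinates are made up for illustration):

```python
import pandas as pd

# Toy rides: the same station appears with slightly different coordinates.
rides = pd.DataFrame({
    "start_station_name": ["A St", "A St", "B St"],
    "start_lat": [40.0, 41.0, 42.0],
    "start_lng": [-74.0, -73.0, -72.0],
})

# One row per station, holding the mean latitude and longitude.
station_coords = (
    rides.groupby("start_station_name")[["start_lat", "start_lng"]]
    .mean()
    .reset_index()
)

print(station_coords)
```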


In [8]:
# Group by end station and average the latitude and longitude recorded for each one
end_station_coordinates = df.groupby('end_station_name')[['end_lat','end_lng']].mean().reset_index()

In [9]:
end_station_coordinates

| | end_station_name | end_lat | end_lng |
|---|---|---|---|
| 0 | 1 Ave & E 110 St | 40.792327 | -73.938300 |
| 1 | 1 Ave & E 16 St | 40.732219 | -73.981656 |
| 2 | 1 Ave & E 18 St | 40.733812 | -73.980544 |
| 3 | 1 Ave & E 30 St | 40.741444 | -73.975361 |
| 4 | 1 Ave & E 39 St | 40.747140 | -73.971130 |
| ... | ... | ... | ... |
| 1836 | Wyckoff St & 3 Ave | 40.682755 | -73.982586 |
| 1837 | Wyckoff St & Nevins St | 40.683426 | -73.984275 |
| 1838 | Wythe Ave & Metropolitan Ave | 40.716887 | -73.963198 |
| 1839 | Yankee Ferry Terminal | 40.687066 | -74.016756 |


### 4. Create new column with the value of 1. Then create a new aggregated dataframe that contains 3 columns: starting station, ending station, and the count of trips between those stations.

In [10]:
# Add a column of ones, then sum it per (start station, end station) pair to count trips.
# The columns are renamed later, after the coordinates have been merged in.
df['trip_count'] = 1
df_group = df.groupby(['start_station_name', 'end_station_name'])['trip_count'].sum().reset_index()

In [11]:
df_group

| | start_station_name | end_station_name | trip_count |
|---|---|---|---|
| 0 | 1 Ave & E 110 St | 1 Ave & E 110 St | 791 |
| 1 | 1 Ave & E 110 St | 1 Ave & E 18 St | 2 |
| 2 | 1 Ave & E 110 St | 1 Ave & E 30 St | 4 |
| 3 | 1 Ave & E 110 St | 1 Ave & E 39 St | 1 |
| 4 | 1 Ave & E 110 St | 1 Ave & E 44 St | 12 |
| ... | ... | ... | ... |
| 1013392 | Yankee Ferry Terminal | Water St & Main St | 4 |
| 1013393 | Yankee Ferry Terminal | West St & Chambers St | 6 |
| 1013394 | Yankee Ferry Terminal | West St & Liberty St | 4 |
| 1013395 | Yankee Ferry Terminal | West Thames St | 1 |
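The task asks for a helper column of ones, which the notebook then sums per station pair. For comparison, here is a sketch on toy data showing that counting rows with `groupby(...).size()` gives the same totals without the helper column. This is an alternative, not the approach the notebook uses, and the station names are made up.

```python
import pandas as pd

# Toy trips: two A -> B rides, one A -> A round trip, one B -> A ride.
trips = pd.DataFrame({
    "start_station_name": ["A St", "A St", "A St", "B St"],
    "end_station_name": ["B St", "B St", "A St", "A St"],
})

# Approach used in the notebook: a column of ones, summed per pair.
trips["trip_count"] = 1
by_sum = (
    trips.groupby(["start_station_name", "end_station_name"])["trip_count"]
    .sum()
    .reset_index()
)

# Alternative: count the rows in each group directly.
by_size = (
    trips.groupby(["start_station_name", "end_station_name"])
    .size()
    .reset_index(name="trip_count")
)

print(by_sum)
print(by_size)
```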


In [12]:
# Merge the average starting station lat and lng with df_group -- merge on starting station name
df_group2 = df_group.merge(start_station_coordinates, on="start_station_name", how="inner")
df_group2

| | start_station_name | end_station_name | trip_count | start_lat | start_lng |
|---|---|---|---|---|---|
| 0 | 1 Ave & E 110 St | 1 Ave & E 110 St | 791 | 40.792333 | -73.938283 |
| 1 | 1 Ave & E 110 St | 1 Ave & E 18 St | 2 | 40.792333 | -73.938283 |
| 2 | 1 Ave & E 110 St | 1 Ave & E 30 St | 4 | 40.792333 | -73.938283 |
| 3 | 1 Ave & E 110 St | 1 Ave & E 39 St | 1 | 40.792333 | -73.938283 |
| 4 | 1 Ave & E 110 St | 1 Ave & E 44 St | 12 | 40.792333 | -73.938283 |
| ... | ... | ... | ... | ... | ... |
| 1013392 | Yankee Ferry Terminal | Water St & Main St | 4 | 40.687067 | -74.016754 |
| 1013393 | Yankee Ferry Terminal | West St & Chambers St | 6 | 40.687067 | -74.016754 |
| 1013394 | Yankee Ferry Terminal | West St & Liberty St | 4 | 40.687067 | -74.016754 |
| 1013395 | Yankee Ferry Terminal | West Thames St | 1 | 40.687067 | -74.016754 |


In [13]:
# Merge the average ending station lat and lng with df_group2 -- merge on ending station name
df_group3 = df_group2.merge(end_station_coordinates, on="end_station_name", how="inner")
df_group3

| | start_station_name | end_station_name | trip_count | start_lat | start_lng | end_lat | end_lng |
|---|---|---|---|---|---|---|---|
| 0 | 1 Ave & E 110 St | 1 Ave & E 110 St | 791 | 40.792333 | -73.938283 | 40.792327 | -73.938300 |
| 1 | 1 Ave & E 110 St | 1 Ave & E 18 St | 2 | 40.792333 | -73.938283 | 40.733812 | -73.980544 |
| 2 | 1 Ave & E 110 St | 1 Ave & E 30 St | 4 | 40.792333 | -73.938283 | 40.741444 | -73.975361 |
| 3 | 1 Ave & E 110 St | 1 Ave & E 39 St | 1 | 40.792333 | -73.938283 | 40.747140 | -73.971130 |
| 4 | 1 Ave & E 110 St | 1 Ave & E 44 St | 12 | 40.792333 | -73.938283 | 40.750020 | -73.969053 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1013392 | Yankee Ferry Terminal | Water St & Main St | 4 | 40.687067 | -74.016754 | 40.703212 | -73.990409 |
| 1013393 | Yankee Ferry Terminal | West St & Chambers St | 6 | 40.687067 | -74.016754 | 40.717548 | -74.013221 |
| 1013394 | Yankee Ferry Terminal | West St & Liberty St | 4 | 40.687067 | -74.016754 | 40.711444 | -74.014847 |
| 1013395 | Yankee Ferry Terminal | West Thames St | 1 | 40.687067 | -74.016754 | 40.708347 | -74.017134 |
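An inner merge drops any row whose key has no match, and it duplicates rows if the lookup table repeats a key. The row index above still ends at 1013395 after both merges, which suggests neither happened here. A minimal sketch of how that could be checked explicitly, using pandas' `validate` argument and a length comparison on toy data (names and values are made up):

```python
import pandas as pd

# Toy station pairs with trip counts.
pairs = pd.DataFrame({
    "start_station_name": ["A St", "A St", "B St"],
    "end_station_name": ["B St", "C St", "A St"],
    "trip_count": [5, 2, 7],
})

# Toy lookup table: one coordinate row per start station.
start_coords = pd.DataFrame({
    "start_station_name": ["A St", "B St"],
    "start_lat": [40.70, 40.80],
    "start_lng": [-74.00, -73.90],
})

# validate="many_to_one" raises a MergeError if a station name
# appears more than once in the lookup table.
merged = pairs.merge(
    start_coords,
    on="start_station_name",
    how="inner",
    validate="many_to_one",
)

# An inner merge drops unmatched rows without warning, so compare lengths.
rows_lost = len(pairs) - len(merged)
print(rows_lost)
```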


In [14]:
#Renaming columns
df_group3.rename(columns={'start_station_name':'start_station',
                          'end_station_name' : 'end_station',
                          'trip_count': 'count_of_trips',
                          'start_lat':'start_latitude',
                          'start_lng':'start_longitude',
                          'end_lat':'end_latitude',
                          'end_lng':'end_longitude'},
                 inplace=True)

In [15]:
df_group3

| | start_station | end_station | count_of_trips | start_latitude | start_longitude | end_latitude | end_longitude |
|---|---|---|---|---|---|---|---|
| 0 | 1 Ave & E 110 St | 1 Ave & E 110 St | 791 | 40.792333 | -73.938283 | 40.792327 | -73.938300 |
| 1 | 1 Ave & E 110 St | 1 Ave & E 18 St | 2 | 40.792333 | -73.938283 | 40.733812 | -73.980544 |
| 2 | 1 Ave & E 110 St | 1 Ave & E 30 St | 4 | 40.792333 | -73.938283 | 40.741444 | -73.975361 |
| 3 | 1 Ave & E 110 St | 1 Ave & E 39 St | 1 | 40.792333 | -73.938283 | 40.747140 | -73.971130 |
| 4 | 1 Ave & E 110 St | 1 Ave & E 44 St | 12 | 40.792333 | -73.938283 | 40.750020 | -73.969053 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1013392 | Yankee Ferry Terminal | Water St & Main St | 4 | 40.687067 | -74.016754 | 40.703212 | -73.990409 |
| 1013393 | Yankee Ferry Terminal | West St & Chambers St | 6 | 40.687067 | -74.016754 | 40.717548 | -74.013221 |
| 1013394 | Yankee Ferry Terminal | West St & Liberty St | 4 | 40.687067 | -74.016754 | 40.711444 | -74.014847 |
| 1013395 | Yankee Ferry Terminal | West Thames St | 1 | 40.687067 | -74.016754 | 40.708347 | -74.017134 |


### 5. Initialize an instance of a kepler.gl map.

In [16]:
df_group3.to_csv('df_group3_locations_for_map.csv')
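By default `to_csv` writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again (the same column name appears in the `df.dtypes` output earlier). A small sketch of the difference `index=False` makes, using an in-memory buffer and toy data instead of the real file:

```python
import io

import pandas as pd

# Toy routes table; names and counts are illustrative only.
routes = pd.DataFrame({
    "start_station": ["A St", "B St"],
    "end_station": ["B St", "A St"],
    "count_of_trips": [5, 7],
})

# Default behaviour: the index is written as an extra, unnamed column.
with_index = io.StringIO()
routes.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out of the file.
without_index = io.StringIO()
routes.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```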

In [17]:
# Create KeplerGl instance
m = KeplerGl(height = 700, data={"data_1": df_group3})
m

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(data={'data_1':                  start_station            end_station  count_of_trips  \
0           …

#### Q5 comment

I used blue for start stations and orange for end stations so the two are easy to tell apart. For the arcs, I chose a purple-to-light-yellow gradient, which complements the station colors without matching either one. Because the map has a dark background, I picked colors with strong contrast against it to keep the points and arcs readable.

#### Q6 comment

The route from North Moore St & Greenwich St (start station) to Vesey St & Church St (end station) recorded 4,523 trips. The start station is in the TriBeCa neighborhood of Manhattan, about a 4-minute walk from the subway. The end station is near One World Observatory in downtown Manhattan. Church Street, though relatively short, is a heavily traveled north-south corridor in Lower Manhattan. Given the area's dense population and its proximity to popular tourist destinations, the high trip volume on this route is not surprising.
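The busiest routes can also be cross-checked in pandas rather than only through the map filter. A minimal sketch on toy data, using the renamed columns from `df_group3`. Excluding round trips (same start and end station) is an assumption made here for illustration, not something the notebook states, and the station names and counts are made up.

```python
import pandas as pd

# Toy aggregated routes with the same column names as df_group3.
routes = pd.DataFrame({
    "start_station": ["A St", "A St", "B St", "C St"],
    "end_station": ["A St", "B St", "C St", "A St"],
    "count_of_trips": [900, 450, 300, 10],
})

# Keep only trips that end at a different station than they started.
one_way = routes[routes["start_station"] != routes["end_station"]]

# The two routes with the highest trip counts.
top_routes = one_way.nlargest(2, "count_of_trips")

print(top_routes)
```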

In [18]:
config = m.config

In [19]:
config

{}

In [20]:
import json
with open("config.json", "w") as outfile:
    json.dump(config, outfile)
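The `{}` output above means the saved `config.json` holds an empty configuration, so the color and filter settings described in the comments are not stored in it. For reference, here is a sketch of what a non-empty configuration can look like, following the top-level layout shown in the kepler.gl Jupyter user guide (`version`, then `config` with a `mapState` entry). The coordinates, zoom level, and file name are illustrative assumptions, not values from this notebook.

```python
import json
import os
import tempfile

# Hypothetical minimal config: only the map view is set.
example_config = {
    "version": "v1",
    "config": {
        "mapState": {
            "latitude": 40.73,    # illustrative: roughly Manhattan
            "longitude": -73.99,  # illustrative
            "zoom": 11,           # illustrative
        }
    },
}

# Write to a temporary location so the sketch does not overwrite config.json.
config_path = os.path.join(tempfile.gettempdir(), "example_kepler_config.json")
with open(config_path, "w") as outfile:
    json.dump(example_config, outfile, indent=2)

# Read it back to confirm the round trip preserves the structure.
with open(config_path) as infile:
    reloaded = json.load(infile)

print(reloaded == example_config)
```

A dictionary loaded this way could then be passed through the `config` argument, as the notebook does when calling `save_to_html`.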

In [21]:
m.save_to_html(file_name = 'Citi_bike Trips Aggregated.html', read_only = False, config = config)

Map saved to Citi_bike Trips Aggregated.html!
