## Importing the libraries used for this exercise

In [1]:
import pandas as pd
import os
from keplergl import KeplerGl
from pyproj import CRS 
import numpy as np
from matplotlib import pyplot as plt

## Read in your data from the previous task.

In [2]:
# Read the full merged CSV file from the previous task
df = pd.read_csv('Full grouped4.csv')

# Display the DataFrame
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,date,AVGTemperature (°C),_merge,bike_rides_daily,merge_2,value
0,115C78C3039FFA89,electric_bike,2022-01-01 09:21:14,2022-01-01 09:35:46,Essex Light Rail,JC038,Essex Light Rail,JC038,40.712774,-74.036486,40.712774,-74.036486,member,2022-01-01,2.0,both,592,both,1
1,7FFD810CAA7A919E,classic_bike,2022-01-01 02:43:56,2022-01-01 02:43:57,12 St & Sinatra Dr N,HB201,12 St & Sinatra Dr N,HB201,40.750604,-74.02402,40.750604,-74.02402,member,2022-01-01,2.0,both,592,both,1
2,E715E8432031B72C,classic_bike,2022-01-01 02:13:33,2022-01-01 02:18:42,Essex Light Rail,JC038,Washington St,JC098,40.712774,-74.036486,40.724294,-74.035483,member,2022-01-01,2.0,both,592,both,1
3,BF1B7B1E1961A87B,electric_bike,2022-01-01 17:18:46,2022-01-01 18:55:25,Grand St,JC102,W 27 St & 7 Ave,6247.06,40.715178,-74.037683,40.746647,-73.993915,casual,2022-01-01,2.0,both,592,both,1
4,4A01F0E53C6F4386,electric_bike,2022-01-01 11:23:32,2022-01-01 11:29:27,Christ Hospital,JC034,Hoboken Terminal - Hudson St & Hudson Pl,HB101,40.734786,-74.050444,40.735938,-74.030305,member,2022-01-01,2.0,both,592,both,1


## Create a new column with the value of 1.
## Then create a new aggregated dataframe that contains 3 columns: starting station, ending station, and the count of trips between those stations.

In [4]:
# Add a helper column of ones, then count trips per start/end station pair

df['value1'] = 1
df_group = df.groupby(['start_station_name', 'end_station_name'])['value1'].count().reset_index() 

df_group.head()

Unnamed: 0,start_station_name,end_station_name,value1
0,11 St & Washington St,11 St & Washington St,1132
1,11 St & Washington St,12 Ave & W 40 St,1
2,11 St & Washington St,12 St & Sinatra Dr N,253
3,11 St & Washington St,14 St Ferry - 14 St & Shipyard Ln,395
4,11 St & Washington St,4 St & Grand St,350
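
The helper-column approach above can also be written without the extra column. The following is a minimal sketch on a made-up toy table (the station names `A`, `B`, `C` are placeholders, not values from the dataset) showing that `groupby(...).size()` yields the same counts as counting a column of ones.

```python
import pandas as pd

# Toy trips table with made-up station names, for illustration only
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B"],
    "end_station_name":   ["B", "B", "C", "A"],
})
keys = ["start_station_name", "end_station_name"]

# Approach used in this notebook: helper column of ones, then count per pair
trips["value1"] = 1
by_count = trips.groupby(keys)["value1"].count().reset_index()

# Alternative: size() counts the rows in each group, no helper column needed
by_size = trips.groupby(keys).size().reset_index(name="value1")

print(by_count)
print(by_size)
```

Both frames have one row per station pair; which form to use is mostly a matter of taste.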


In [5]:
## Sanity check: compare the total of grouped trips with the number of rows in df

print(df_group['value1'].sum())
print(df.shape)

892281
(895485, 20)
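
The two numbers differ by 3,204 rows (895,485 - 892,281). A likely cause, assuming some rows have a missing start or end station name, is that `groupby` drops groups whose key is NaN by default. The sketch below reproduces that behaviour on a made-up toy table; it is not a diagnosis of the actual file.

```python
import numpy as np
import pandas as pd

# Toy table: one of four trips has no end station (made-up data)
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "B", "B"],
    "end_station_name":   ["B", np.nan, "A", "A"],
})
trips["value1"] = 1
keys = ["start_station_name", "end_station_name"]

# Default behaviour: rows with a NaN key are silently left out of every group
default_total = trips.groupby(keys)["value1"].count().sum()

# dropna=False (pandas >= 1.1) keeps NaN as its own group
full_total = trips.groupby(keys, dropna=False)["value1"].count().sum()

# Number of rows with at least one missing station name
missing_rows = trips[keys].isna().any(axis=1).sum()

print(default_total, full_total, missing_rows)
```

Running the equivalent `isna()` check on `df` would show whether missing station names account for the whole gap.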


In [6]:
## Renaming the value1 column

df_group.rename(columns = {'value1': 'Trips'}, inplace = True)

## Initialize an instance of a kepler.gl map.

In [8]:
# Create KeplerGl instance

m = KeplerGl(height = 700, data={"data_1": df})

# Display the map
m

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


Out of range float values are not JSON compliant
Supporting this message is deprecated in jupyter-client 7, please make sure your message is JSON-compliant
  content = self.pack(content)


KeplerGl(data={'data_1':                  ride_id  rideable_type           started_at  \
0       115C78C3039FF…
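
The "Out of range float values are not JSON compliant" warning above is typically triggered by NaN (or infinite) floats, which strict JSON cannot represent. The sketch below shows the underlying behaviour with the standard `json` module only. The record is made up, and whether the DataFrame passed to the map needs such cleaning is an assumption; the map still rendered here.

```python
import json
import math

# Made-up record with a missing coordinate stored as NaN
record = {"start_lat": 40.712774, "end_lat": float("nan")}

# Strict JSON encoding rejects NaN
try:
    json.dumps(record, allow_nan=False)
    strict_ok = True
    message = ""
except ValueError as err:
    strict_ok = False
    message = str(err)

# Replacing NaN with None makes the record JSON-compliant (None becomes null)
cleaned = {
    key: (None if isinstance(value, float) and math.isnan(value) else value)
    for key, value in record.items()
}
encoded = json.dumps(cleaned, allow_nan=False)

print(strict_ok, message)
print(encoded)
```

For a DataFrame, dropping or filling rows with missing values before building the map would be one way to achieve the same effect.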

## Settings I changed

Changing colors for start and end stations:
I set start stations to red and end stations to yellow. The two high-contrast colors keep start and end points easy to tell apart even where many points overlap, and both stand out against the map background and the other layers.

Editing the colors for arcs (source and target):
I chose lighter colors for the arcs so that they are easy to see and follow. Lighter colors are less visually aggressive and keep the map clean with dense data, so the arcs stay visible without overwhelming the viewer.

Reducing stroke range:
Before the change, the arcs could be thick enough to obscure parts of the map or other data.
After reducing the stroke range to 2, the arcs are thinner but still visible. This gives a more refined view that keeps the focus on the paths and connections, and the map looks less cluttered, which matters when working with a large set of points and arcs.

From my analysis, Jersey City stands out as a central hub with a high concentration of start and end stations, which makes it a key location in the region's transportation network. This concentration likely contributes to the large number of trips that originate in or pass through the city. One factor that may explain its prominence is its recreational space:

Jersey City has miles of recreational space, particularly along the waterfront. Liberty State Park and the Jersey City Waterfront Walkway offer unobstructed views of the Manhattan skyline, which attract both locals and tourists who want to get some fresh air, take a leisurely walk, or admire NYC landmarks such as the Statue of Liberty, One World Trade Center, and the Brooklyn Bridge.

## Create a config object and save your map with it.

In [9]:
config = m.config

In [11]:
## Saving as HTML

m.save_to_html(file_name='New York CitiBikes Bikes Trips Aggregated.html', read_only = False, config = config)

Map saved to New York CitiBikes Bikes Trips Aggregated.html!


In [12]:
import json

# Save the map config as JSON so the same settings can be reloaded later
with open("config.json", "w") as outfile:
    json.dump(config, outfile)
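
As a quick check that the saved file can be reused, the config can be read back with `json.load`. This is a minimal sketch: the `config` dict below is a small placeholder rather than the real map config, and the file name `config_example.json` is hypothetical so that the actual `config.json` is not overwritten.

```python
import json

# Placeholder standing in for m.config; the real config is a larger nested dict
config = {"version": "v1", "config": {"mapState": {"zoom": 11}}}

# Write the config to disk
with open("config_example.json", "w") as outfile:
    json.dump(config, outfile)

# Read it back and confirm nothing was lost in the round trip
with open("config_example.json") as infile:
    loaded = json.load(infile)

print(loaded == config)
```

In a later session, the loaded dict could presumably be passed back to the map, for example through the `config` argument of `KeplerGl`.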

In [13]:
## Exporting files

## Saving the df as a csv
df_group.to_csv('Full grouped5.csv', index=False)