# 6.3: Geographical Visualizations with Python

## Table of Contents

[01. Importing Libraries](#01.-Importing-Libraries)

[02. Importing Data](#02.-Importing-Data)

[03. Creating a choropleth map](#03.-Creating-a-choropleth-map)

[04. Key Questions & Answers](#04.-Key-Questions-&-Answers)

## 01. Importing Libraries

In [1]:
import pandas as pd
import folium
from folium.plugins import HeatMap

In [2]:
# Display matplotlib figures inline in the notebook
%matplotlib inline

## 02. Importing Data

In [None]:
# Load the cleaned Divvy trip dataset (local path on the author's machine)
df = pd.read_pickle(r'E:\Careerfoundry course\My Project\Generated Data\Divvy_cleaned.pkl')
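Later cells depend on coordinate and station-name columns such as `from_latitude` and `to_station_name`, several of which are hidden behind the `...` in the preview below. A quick check right after loading can catch a renamed or missing column early. This is a minimal sketch with a hypothetical helper, `missing_columns`, that is not part of the original notebook; the column list is taken from the cells that follow.

```python
import pandas as pd

# Columns used by the mapping and station-analysis cells in this notebook
REQUIRED_COLUMNS = (
    "from_latitude", "from_longitude", "from_station_name",
    "to_latitude", "to_longitude", "to_station_name",
)

def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Hypothetical helper: return the required columns absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Toy frame that only has the origin columns
toy = pd.DataFrame({
    "from_latitude": [41.94],
    "from_longitude": [-87.66],
    "from_station_name": ["Wilton Ave & Belmont Ave"],
})
print(missing_columns(toy))  # → ['to_latitude', 'to_longitude', 'to_station_name']
```

In the notebook, `assert not missing_columns(df), missing_columns(df)` placed after `read_pickle` would stop with a readable message if the pickle's schema changed.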

In [16]:
df.head()

| Unnamed: 0 | trip_id | start_time | end_time | bike_id | trip_duration | from_station_id | from_station_name | to_station_id | to_station_name | user_type | ... | to_latitude | to_longitude | to_location | year | month_start | month_end | day | hour_start | hour_end | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8546790 | 2015-12-31 17:35:00 | 2015-12-31 17:44:00 | 979 | 521 | 117 | Wilton Ave & Belmont Ave | 229 | Southport Ave & Roscoe St | Subscriber | ... | 41.943739 | -87.66402 | POINT (-87.66402 41.943739) | 2015 | 12 | 12 | Thursday | 17 | 17 | 24 |
| 1 | 8546793 | 2015-12-31 17:37:00 | 2015-12-31 17:41:00 | 1932 | 256 | 301 | Clark St & Schiller St | 138 | Clybourn Ave & Division St | Subscriber | ... | 41.904613 | -87.640552 | POINT (-87.640552 41.904613) | 2015 | 12 | 12 | Thursday | 17 | 17 | 23 |
| 2 | 8546795 | 2015-12-31 17:37:00 | 2015-12-31 17:40:00 | 1693 | 134 | 465 | Marine Dr & Ainslie St | 251 | Clarendon Ave & Leland Ave | Subscriber | ... | 41.967968 | -87.650001 | POINT (-87.650001 41.967968) | 2015 | 12 | 12 | Thursday | 17 | 17 | 28 |
| 3 | 8546797 | 2015-12-31 17:38:00 | 2015-12-31 17:55:00 | 3370 | 995 | 333 | Ashland Ave & Blackhawk St | 198 | Green St (Halsted St) & Madison St | Subscriber | ... | 41.881892 | -87.648789 | POINT (-87.648789 41.881892) | 2015 | 12 | 12 | Thursday | 17 | 17 | 40 |
| 4 | 8546798 | 2015-12-31 17:38:00 | 2015-12-31 17:41:00 | 2563 | 177 | 48 | Larrabee St & Kingsbury St | 111 | Sedgwick St & Huron St | Subscriber | ... | 41.894666 | -87.638437 | POINT (-87.638437 41.894666) | 2015 | 12 | 12 | Thursday | 17 | 17 | 25 |


## 03. Creating a choropleth map

In [45]:
# Group the data by starting station latitude and longitude and count the number of trips
trip_counts1 = df.groupby(['from_latitude', 'from_longitude']).size().reset_index(name='trip_count')

# Display the top 5 rows of the aggregated data
print(trip_counts1.head())

   from_latitude  from_longitude  trip_count
0      41.736646      -87.622634         269
1      41.743116      -87.614800         353
2      41.743316      -87.622849         182
3      41.743441      -87.604836          83
4      41.743921      -87.575225         124


In [47]:
# Create a base map centered on downtown Chicago, where the Divvy system operates
m1 = folium.Map(location=[41.8781, -87.6298], zoom_start=12)

# Add the trip counts as a HeatMap layer; each point is [latitude, longitude, weight]
heat_data1 = trip_counts1[['from_latitude', 'from_longitude', 'trip_count']].values.tolist()
HeatMap(heat_data1).add_to(m1)


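folium's `HeatMap` plugin takes points as `[lat, lon]` or `[lat, lon, weight]`. The cell above passes raw trip counts as weights, which reach the hundreds of thousands at the busiest stations. Leaflet.heat, the JavaScript plugin behind `HeatMap`, documents a `max` point intensity of 1.0 by default, so depending on the folium version, large raw weights may saturate the layer and make every station look equally hot. Scaling weights to the 0 to 1 range is a cautious option that keeps relative activity intact. This sketch uses a hypothetical `to_heat_data` helper and made-up counts; it only builds the point list and does not call folium.

```python
import pandas as pd

def to_heat_data(counts: pd.DataFrame, lat_col: str, lon_col: str,
                 weight_col: str = "trip_count") -> list:
    """Hypothetical helper: build [lat, lon, weight] points with weights scaled to 0..1."""
    scaled = counts[weight_col] / counts[weight_col].max()
    # Convert to plain Python floats so the list serializes cleanly to JavaScript
    return [
        [float(lat), float(lon), float(w)]
        for lat, lon, w in zip(counts[lat_col], counts[lon_col], scaled)
    ]

# Made-up aggregated counts in the same shape as trip_counts1
sample = pd.DataFrame({
    "from_latitude": [41.80, 41.85, 41.90],
    "from_longitude": [-87.60, -87.62, -87.64],
    "trip_count": [50, 100, 200],
})
points = to_heat_data(sample, "from_latitude", "from_longitude")
print(points)  # → [[41.8, -87.6, 0.25], [41.85, -87.62, 0.5], [41.9, -87.64, 1.0]]
```

With this helper, `HeatMap(to_heat_data(trip_counts1, 'from_latitude', 'from_longitude')).add_to(m1)` would replace the list construction in the cell above.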

In [49]:
# Save the map as an HTML file
m1.save("divvy_trip_heatmap(from_station).html")

In [50]:
m1

In [51]:
# Group the data by ending station latitude and longitude and count the number of trips
trip_counts2 = df.groupby(['to_latitude', 'to_longitude']).size().reset_index(name='trip_count')

# Display the top 5 rows of the aggregated data
print(trip_counts2.head())

   to_latitude  to_longitude  trip_count
0    41.736646    -87.622634         304
1    41.743116    -87.614800         414
2    41.743316    -87.622849         177
3    41.743441    -87.604836          76
4    41.743921    -87.575225         103
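The from-station and to-station cells differ only in their column prefix. A small parameterized function removes that duplication. This is a sketch with a hypothetical helper, `trip_counts_by_location`, that is not in the original notebook; `prefix` is assumed to be `'from'` or `'to'`, matching the `from_latitude`/`to_latitude` naming.

```python
import pandas as pd

def trip_counts_by_location(frame: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Hypothetical helper: count trips per (latitude, longitude) for 'from' or 'to' stations."""
    lat_col, lon_col = f"{prefix}_latitude", f"{prefix}_longitude"
    return (
        frame.groupby([lat_col, lon_col])
        .size()
        .reset_index(name="trip_count")
    )

# Toy trips: two start at one point, one starts at another; all three end at the same point
trips = pd.DataFrame({
    "from_latitude":  [41.90, 41.90, 41.80],
    "from_longitude": [-87.63, -87.63, -87.60],
    "to_latitude":    [41.80, 41.80, 41.80],
    "to_longitude":   [-87.60, -87.60, -87.60],
})
starts = trip_counts_by_location(trips, "from")
ends = trip_counts_by_location(trips, "to")
print(starts)
print(ends)
```

In the notebook, `trip_counts1` and `trip_counts2` would become `trip_counts_by_location(df, 'from')` and `trip_counts_by_location(df, 'to')`.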


In [52]:
# Create a base map centered on downtown Chicago, where the Divvy system operates
m2 = folium.Map(location=[41.8781, -87.6298], zoom_start=12)

# Add the trip counts as a HeatMap layer; each point is [latitude, longitude, weight]
heat_data2 = trip_counts2[['to_latitude', 'to_longitude', 'trip_count']].values.tolist()
HeatMap(heat_data2).add_to(m2)



In [53]:
# Save the map as an HTML file
m2.save("divvy_trip_heatmap (to_station).html")

In [54]:
m2

## 04. Key Questions & Answers

### 1. Which Stations Are the Most and Least Popular?

In [5]:
# Count trips starting from each station (from_station_name)
start_station_counts = df['from_station_name'].value_counts().reset_index()
start_station_counts.columns = ['station', 'trip_starts']

# Count trips ending at each station (to_station_name)
end_station_counts = df['to_station_name'].value_counts().reset_index()
end_station_counts.columns = ['station', 'trip_ends']

# Merge the start and end counts
station_popularity = pd.merge(start_station_counts, end_station_counts, on='station', how='outer').fillna(0)

# Calculate total trips per station
station_popularity['total_trips'] = station_popularity['trip_starts'] + station_popularity['trip_ends']

# Sort to find most and least popular stations
most_popular_station = station_popularity.sort_values(by='total_trips', ascending=False).head(5)
least_popular_station = station_popularity.sort_values(by='total_trips', ascending=True).head(5)

print("Most popular stations:\n", most_popular_station)
print("Least popular stations:\n", least_popular_station)

Most popular station:
                           station  trip_starts  trip_ends  total_trips
647       Streeter Dr & Grand Ave       322483     365427       687910
381     Lake Shore Dr & Monroe St       296184     273658       569842
173  Clinton St & Washington Blvd       285063     280581       565644
96            Canal St & Adams St       274589     264674       539263
650           Theater on the Lake       248878     267744       516622
Least popular station:
                             station  trip_starts  trip_ends  total_trips
206       Damen Ave & Garfield Blvd            7          6           13
519          Phillips Ave & 82nd St            5         10           15
246          Elizabeth St & 59th St            8         13           21
604  South Chicago Ave & Elliot Ave           12         10           22
461          Michigan Ave & 71st St           12         16           28
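An outer merge followed by `fillna(0)` turns the count columns into floats whenever a station appears only as a start or only as an end; the integer output above suggests every station in this dataset appears on both sides. As a sketch of an alternative that keeps integer counts even when one side is missing, the two `value_counts()` Series can be aligned with `pd.concat` and cast back with `astype(int)`. The toy station names are made up.

```python
import pandas as pd

# Toy trips: "A" only sends trips, "C" only receives them
trips = pd.DataFrame({
    "from_station_name": ["A", "A", "B"],
    "to_station_name":   ["B", "C", "C"],
})

starts = trips["from_station_name"].value_counts().rename("trip_starts")
ends = trips["to_station_name"].value_counts().rename("trip_ends")

# Align both Series on station name; a missing side becomes NaN, then 0, then int again
popularity = (
    pd.concat([starts, ends], axis=1)
    .fillna(0)
    .astype(int)
    .rename_axis("station")
    .reset_index()
)
popularity["total_trips"] = popularity["trip_starts"] + popularity["trip_ends"]
print(popularity.set_index("station"))
```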


### 2. Are There Station Imbalances (More Starts Than Ends)?

In [7]:
# Calculate imbalance between trip starts and ends
station_popularity['imbalance'] = station_popularity['trip_starts'] - station_popularity['trip_ends']

# Sort by imbalance: the top rows have more starts than ends, the bottom rows more ends than starts
imbalanced_stations = station_popularity[station_popularity['imbalance'] != 0].sort_values(by='imbalance', ascending=False)

print("Stations with more starts than ends:\n", imbalanced_stations.head(5))
print("Stations with more ends than starts:\n", imbalanced_stations.tail(5))

Stations with more starts than ends:
                           station  trip_starts  trip_ends  total_trips  \
175     Columbus Dr & Randolph St       198532     129028       327560   
381     Lake Shore Dr & Monroe St       296184     273658       569842   
638  Stetson Ave & South Water St        79810      63883       143693   
100          Canal St & Monroe St        44524      28753        73277   
281       Franklin St & Monroe St       143796     128843       272639   

     imbalance  
175      69504  
381      22526  
638      15927  
100      15771  
281      14953  
Stations with more ends than starts:
                         station  trip_starts  trip_ends  total_trips  \
619      St. Clair St & Erie St       101097     122308       223405   
475             Millennium Park       224894     247904       472798   
648   Streeter Dr & Illinois St       137666     164081       301747   
382  Lake Shore Dr & North Blvd       231508     263770       495278   
647     Streeter 
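Because `imbalance` is defined as starts minus ends, positive values mark net sources (stations that lose bikes over time) and negative values mark net sinks. `nlargest` and `nsmallest` express the two directions directly and return the most extreme station first in each list, whereas `.tail(5)` on a descending sort puts the most negative station last. The sketch uses made-up counts and placeholder station labels.

```python
import pandas as pd

# Made-up station totals in the same layout as station_popularity
stations = pd.DataFrame({
    "station":     ["S1", "S2", "S3", "S4"],
    "trip_starts": [120, 80, 50, 90],
    "trip_ends":   [70, 95, 50, 140],
})
stations["imbalance"] = stations["trip_starts"] - stations["trip_ends"]

# Net sources: more trips start here than end here (positive imbalance)
net_sources = stations[stations["imbalance"] > 0].nlargest(5, "imbalance")
# Net sinks: more trips end here than start here (negative imbalance)
net_sinks = stations[stations["imbalance"] < 0].nsmallest(5, "imbalance")

print(net_sources[["station", "imbalance"]])
print(net_sinks[["station", "imbalance"]])
```

Filtering on the sign before ranking also keeps a station from showing up in both lists when fewer than five stations fall on one side, which can happen with `head`/`tail` on a single sorted frame.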

### 3. Are There Underserved Areas with Fewer Stations?

In [8]:
# Create a map centered on Chicago
m = folium.Map(location=[41.8781, -87.6298], zoom_start=12)

# Add starting station locations to the map
for index, row in df[['from_latitude', 'from_longitude', 'from_station_name']].drop_duplicates().iterrows():
    folium.Marker([row['from_latitude'], row['from_longitude']], popup=row['from_station_name']).add_to(m)

# Save the map as an HTML file
m.save("station_map.html")

# Display the map (when running in a Jupyter notebook)
m
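The marker map shows where stations are, but "underserved" is easier to judge with a count per area. One rough approach, sketched here as an assumption rather than the notebook's method, snaps each unique station to a grid cell by flooring its coordinates to a hypothetical cell size of 0.01 degrees (roughly 1 km in Chicago) and counts stations per cell. `stations_per_cell` and `CELL_DEG` are illustrative names.

```python
import numpy as np
import pandas as pd

CELL_DEG = 0.01  # hypothetical grid size in degrees (roughly 1 km at Chicago's latitude)

def stations_per_cell(stations: pd.DataFrame, cell: float = CELL_DEG) -> pd.DataFrame:
    """Hypothetical helper: count unique starting stations per lat/lon grid cell."""
    unique = stations.drop_duplicates(subset=["from_station_name"])
    cells = pd.DataFrame({
        "lat_cell": np.floor(unique["from_latitude"] / cell).astype(int),
        "lon_cell": np.floor(unique["from_longitude"] / cell).astype(int),
    })
    # DataFrame.value_counts counts each (lat_cell, lon_cell) pair, most frequent first
    return cells.value_counts().rename("n_stations").reset_index()

# Toy rows: A appears twice (two trips), A and B share a cell, C sits alone further south
stations = pd.DataFrame({
    "from_station_name": ["A", "A", "B", "C"],
    "from_latitude":  [41.9012, 41.9012, 41.9034, 41.7501],
    "from_longitude": [-87.6312, -87.6312, -87.6345, -87.5503],
})
result = stations_per_cell(stations)
print(result)
```

Cells with no stations never appear in this table, so gaps show up as missing cells rather than small counts; multiplying `lat_cell`/`lon_cell` by `CELL_DEG` maps a cell back to its south-west corner for plotting.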

## Note

The station and heat maps show that most stations are concentrated near downtown Chicago and along the Lake Michigan shoreline.
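The claim that stations cluster near the city center can be checked numerically by measuring each station's great-circle distance from the map center used above (41.8781, -87.6298). This sketch implements the standard haversine formula with an assumed mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper, not part of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0       # assumed mean Earth radius
CENTER = (41.8781, -87.6298)   # map center used throughout this notebook

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# A point 0.1 degrees due north of the center
print(round(haversine_km(*CENTER, CENTER[0] + 0.1, CENTER[1]), 2))  # → 11.12
```

Applied to each unique station's `from_latitude`/`from_longitude` (for example, a list comprehension over `zip`), the distances could be binned to show what share of stations lies within a few kilometers of downtown.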