# Goal: I want to understand the movement of the fleet.
Basically, I want to find general tendencies in where the fleet moves over time within the City of Chicago. To do that, I define and calculate the Bikeflow-Index for every district of Chicago:

The Bikeflow-Index is a number that is positive if a district "gains" bikes over a specific time period and negative if it "loses" bikes. The Bikeflow-Index for a specific district is calculated by adding up all trips that end in this district and subtracting the number of trips that start in this district. That way, trips that start and end in the same district have no effect on the Bikeflow-Index.

A district with a strongly negative Bikeflow-Index needs to be resupplied with bikes from districts with a positive Bikeflow-Index.
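
To make the definition concrete, here is a minimal sketch on a handful of made-up trips. The trip list is purely illustrative (it is not taken from the Divvy data), and the names `trips`, `trips_starting`, `trips_ending` and `bikeflow_index` are assumptions introduced only for this example.

```python
from collections import Counter

# Hypothetical toy trips as (start_district, end_district) pairs
trips = [
    ("Loop", "Lake View"),
    ("Loop", "Lake View"),
    ("Lake View", "Loop"),
    ("Loop", "Loop"),        # starts and ends in the same district
    ("Uptown", "Lake View"),
]

# Count how many trips start and end in each district
trips_starting = Counter(start for start, _ in trips)
trips_ending = Counter(end for _, end in trips)

# Bikeflow-Index = trips ending in the district - trips starting in the district
districts = sorted(set(trips_starting) | set(trips_ending))
bikeflow_index = {d: trips_ending[d] - trips_starting[d] for d in districts}

for district, value in bikeflow_index.items():
    print(f"{district}: {value:+d}")
```

In this toy example "Lake View" gains two bikes, while "Loop" and "Uptown" each lose one. The trip from "Loop" to "Loop" counts once as a start and once as an end, so it cancels out, and the indexes of all districts sum to zero.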



1. Loading the community_areas table and the 2022 trips data into DataFrames
2. Creating a function that calculates the Bikeflow-Index (BiFi) for every district of Chicago
3. Creating a heatmap of the Bikeflow-Index
4. Identifying patterns of customer behavior (mornings vs. afternoons)

To see the interactive visualizations, use this link:
[Notebook Viewer](https://nbviewer.org/github/Brettmett/Divvy_Bikeshare_Chicago/tree/main/)

In [3]:
# pandas
import pandas as pd

import psycopg2

# additional import of the geopandas package
import geopandas as gpd

# numpy
import numpy as np

# matplotlib
import matplotlib.pyplot as plt

# shapely: useful for checking whether a point lies inside a polygon and for converting WKT strings into geometries
from shapely import wkt
from shapely.geometry import Polygon, LineString, Point, MultiLineString

# importing self made functions from sql_functions script
import sql_functions as sf

## 1. Loading community_areas table and 2022_trips data into DataFrames

In [4]:
# Schemaname
schema = "capstone_divvy_bikeshare"

In [5]:
# Loading table trips_2022_v4 from SQL into a DataFrame
df_2022 = sf.get_dataframe(f"select rideable_type, starttime, stoptime, member_casual, time_difference_seconds, airdist_meters, start_area_number, end_area_number from {schema}.trips_2022_v4")

In [7]:
# loading the community_areas from SQL database:
df_areas = sf.get_dataframe(f"select * from {schema}.community_areas")
df_areas.head(3)

Unnamed: 0,area_number,community_name,new_geometry,bikeroad_meters,area_km^2,meter_road_per_km^2,population,pop_density_per_km^2,all_bikeroads
0,1,Rogers Park,"POLYGON ((-87.654556 41.998166, -87.655737 41....",11801,4.762087,2478,54991,11547,"[<LINESTRING (-87.661 42.003, -87.661 42.003, ..."
1,2,West Ridge,"POLYGON ((-87.684653 42.019485, -87.684639 42....",6586,9.144226,720,71942,7867,"[<LINESTRING (-87.683 42.012, -87.683 42.012, ..."
2,3,Uptown,"POLYGON ((-87.641024 41.954803, -87.643996 41....",15682,6.047446,2593,56362,9319,"[<LINESTRING (-87.655 41.962, -87.655 41.962, ..."


Converting df_areas to a GeoDataFrame:

In [8]:
gdf_areas = sf.to_gdf_new(df_areas,geometry_column="new_geometry")

In [9]:
gdf_areas.explore()

## 2. Creating a function that calculates the Bikeflow-Index (BiFi) for every district of Chicago

In [16]:
def get_bifi(df_trips):
    '''
    Input:  trips DataFrame with the columns "start_area_number" and "end_area_number"
    Output: DataFrame with the columns "area_number" and "bifi_index",
            where bifi_index is the difference between the
            total number of trips ending in a district and the total number of trips
            starting in that district. A negative number indicates that
            more bikes/trips leave the area than come in.
    '''
    trips_ending = df_trips["end_area_number"].value_counts()
    trips_starting = df_trips["start_area_number"].value_counts()
    # A plain subtraction yields NaN for districts that appear only as a start
    # or only as an end; fill_value=0 treats the missing count as zero instead.
    bifi = trips_ending.sub(trips_starting, fill_value=0).astype(float)
    return bifi.rename_axis("area_number").reset_index(name="bifi_index")
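
The missing values handled in `get_bifi` come from how pandas aligns the two `value_counts()` results on the area number. The following sketch reproduces the effect on a hypothetical toy table (`toy_trips`, `naive` and `filled` are names made up for this illustration) and shows that treating a missing count as zero gives the intended index.

```python
import pandas as pd

# Hypothetical toy trips: area 3 has only starts, area 4 has only ends
toy_trips = pd.DataFrame({
    "start_area_number": [1, 1, 2, 3, 3, 3],
    "end_area_number":   [2, 1, 2, 1, 2, 4],
})

trips_ending = toy_trips["end_area_number"].value_counts()
trips_starting = toy_trips["start_area_number"].value_counts()

# Plain subtraction aligns on the area number and yields NaN for areas 3 and 4,
# because each of them is missing from one of the two counts
naive = trips_ending - trips_starting

# Treating a missing count as zero gives a value for every area
filled = trips_ending.sub(trips_starting, fill_value=0).sort_index()

print(naive.sort_index())
print(filled)
```

Here area 3 ends up at -3 (three starts, no ends) and area 4 at +1 (one end, no starts), which is the same result as counting the matching rows by hand.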

In [17]:
df_bifi_22 = get_bifi(df_2022)
df_bifi_22.head(40)

Unnamed: 0,area_number,bifi_index
0,1,-2.0
1,2,839.0
2,3,-726.0
3,4,1614.0
4,5,2674.0
5,6,1582.0
6,7,6058.0
7,8,-9981.0
8,9,28.0
9,10,74.0


As we can see, some districts have a strongly negative Bikeflow-Index. Over the period of one year, the Community Area Loop (32) "loses" close to 15,000 bikes.
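
One way to pick out the strongest "losers" and "gainers" without scanning the table by eye is `nsmallest` / `nlargest`. The sketch below uses only the ten rows of `df_bifi_22` printed above, so the Loop (32) does not appear in it; `df_bifi_sample` is a name introduced just for this example.

```python
import pandas as pd

# The ten rows of df_bifi_22 printed above (areas 1-10 only)
df_bifi_sample = pd.DataFrame({
    "area_number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "bifi_index": [-2.0, 839.0, -726.0, 1614.0, 2674.0,
                   1582.0, 6058.0, -9981.0, 28.0, 74.0],
})

# Districts that lose the most bikes (most negative index first)
strongest_losers = df_bifi_sample.nsmallest(3, "bifi_index")

# Districts that gain the most bikes (most positive index first)
strongest_gainers = df_bifi_sample.nlargest(3, "bifi_index")

print(strongest_losers)
print(strongest_gainers)
```

On the full `df_bifi_22` the same two calls would rank all districts.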

#### Merging "df_bifi_22" and "community areas" tables:

In [18]:
gdf_areas.head(2)

Unnamed: 0,area_number,community_name,new_geometry,bikeroad_meters,area_km^2,meter_road_per_km^2,population,pop_density_per_km^2,all_bikeroads
0,1,Rogers Park,"POLYGON ((-87.65456 41.99817, -87.65574 41.998...",11801,4.762087,2478,54991,11547,"[<LINESTRING (-87.661 42.003, -87.661 42.003, ..."
1,2,West Ridge,"POLYGON ((-87.68465 42.01949, -87.68464 42.019...",6586,9.144226,720,71942,7867,"[<LINESTRING (-87.683 42.012, -87.683 42.012, ..."


In [19]:
df_bifi_22.head(2)

Unnamed: 0,area_number,bifi_index
0,1,-2.0
1,2,839.0


In [21]:
# Merging "df_bifi_22" and "community areas" on area_number:

gdf_areas_bifi = gdf_areas.merge(df_bifi_22, how="left", on= "area_number")
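
A left merge keeps every community area, including areas that have no entry in the Bikeflow-Index table; those rows get `NaN` as their index. The sketch below shows one possible way to spot such areas with the `indicator` option of `merge`. The two small tables are hypothetical stand-ins (area 2 is deliberately left out of the toy index table), not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the community areas table
areas = pd.DataFrame({
    "area_number": [1, 2, 3],
    "community_name": ["Rogers Park", "West Ridge", "Uptown"],
})

# Hypothetical index table in which area 2 is deliberately missing
bifi = pd.DataFrame({
    "area_number": [1, 3],
    "bifi_index": [-2.0, -726.0],
})

# indicator=True adds a "_merge" column telling where each row came from
merged = areas.merge(bifi, how="left", on="area_number", indicator=True)

# Areas present in the areas table but absent from the index table
areas_without_index = merged.loc[merged["_merge"] == "left_only", "area_number"].tolist()

print(merged)
print(areas_without_index)
```

Whether such areas should stay `NaN` or be set to 0 depends on how they should appear on the map.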

In [22]:
gdf_areas_bifi

Unnamed: 0,area_number,community_name,new_geometry,bikeroad_meters,area_km^2,meter_road_per_km^2,population,pop_density_per_km^2,all_bikeroads,bifi_index
0,1,Rogers Park,"POLYGON ((-87.65456 41.99817, -87.65574 41.998...",11801,4.762087,2478,54991,11547,"[<LINESTRING (-87.661 42.003, -87.661 42.003, ...",-2.0
1,2,West Ridge,"POLYGON ((-87.68465 42.01949, -87.68464 42.019...",6586,9.144226,720,71942,7867,"[<LINESTRING (-87.683 42.012, -87.683 42.012, ...",839.0
2,3,Uptown,"POLYGON ((-87.64102 41.95480, -87.64400 41.954...",15682,6.047446,2593,56362,9319,"[<LINESTRING (-87.655 41.962, -87.655 41.962, ...",-726.0
3,4,Lincoln Square,"POLYGON ((-87.67441 41.97610, -87.67440 41.976...",12205,6.628769,1841,39493,5957,"[<LINESTRING (-87.684 41.961, -87.684 41.961)>...",1614.0
4,5,North Center,"POLYGON ((-87.67336 41.93234, -87.67342 41.932...",10533,5.300415,1987,31867,6012,"[<LINESTRING (-87.674 41.958, -87.674 41.958, ...",2674.0
...,...,...,...,...,...,...,...,...,...,...
73,74,Mount Greenwood,"POLYGON ((-87.69646 41.70714, -87.69644 41.706...",0,7.021987,0,19093,2719,[],-43.0
74,75,Morgan Park,"POLYGON ((-87.64215 41.68508, -87.64249 41.685...",3138,8.535506,367,22544,2641,"[<MULTILINESTRING ((-87.645 41.685, -87.645 41...",-43.0
75,76,Ohare,"POLYGON ((-87.83658 41.98640, -87.83658 41.986...",0,34.379604,0,12756,371,[],9.0
76,77,Edgewater,"POLYGON ((-87.65456 41.99817, -87.65456 41.998...",11652,4.501053,2588,56521,12557,"[<LINESTRING (-87.657 41.987, -87.657 41.987, ...",2805.0


## 3. Create Heatmap of Bikeflow-Index

In [25]:
import folium

m = gdf_areas_bifi.explore(
    column="bifi_index",  # make choropleth based on the "bifi_index" column
    scheme="naturalbreaks",  # use mapclassify's natural breaks scheme
    legend=True,  # show legend
    # vmin=-15000,
    # vmax=8000,
    cmap='RdYlGn',
    style_kwds= {"weight":0.8, "color":"gray"},
    k=20,  # use 20 bins
    tooltip=False,  # hide tooltip
    popup=["community_name", "area_number","pop_density_per_km^2","population", "bifi_index"],  # show popup (on-click)
    legend_kwds=dict(scale=False),  # give every bin the same length in the legend
    name="BiFi 2022",  # name of the layer in the map
)
# gdf_areas_bifi.explore(
#     m=m,
#     column="bifi_index_23",  # make choropleth based on "bifi-index" column
#     scheme="naturalbreaks",  # use mapclassify's natural breaks scheme
#     legend=True,  # show legend
#     #vmin=-50000,
#     #vmax=40000,
#     cmap='RdYlGn',
#     style_kwds= {"weight":0.8},
#     k=20,  # use 10 bins
#     tooltip=False,  # hide tooltip
#     popup=["community_name", "area_number","pop_density_per_km^2","population", "bifi_index_20"],  # show popup (on-click)
#     #legend_kwds=dict(colorbar=False),  # do not use colorbar
#     legend_kwds=dict(scale=False),
#     name="BiFi 2020",  # name of the layer in the map
# )
folium.TileLayer("CartoDB positron", show=False).add_to(m)  # use folium to add alternative tiles
folium.LayerControl().add_to(m)  # use folium to add layer control

m  # show map

As the heatmap shows, downtown areas tend to "lose" bikes over the period of one year, while areas north of downtown "gain" bikes.

## 4. Identifying patterns of customer behavior (different behavior during different times of the day)

#### 4.1 Mornings vs. Afternoons
In the following, I divide the day into two parts, based on the start time of each trip:
+ First half of the day (0-12)
+ Second half of the day (12-24)
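
The split can be tried out on a few made-up timestamps first. This is only a sketch: `toy_trips`, `toy_morning` and `toy_afternoon` are hypothetical names, and the timestamps are chosen to hit the boundaries of the two halves.

```python
import pandas as pd

# Hypothetical start times, including the boundary cases
toy_trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2022-06-01 00:00:00",
        "2022-06-01 08:15:00",
        "2022-06-01 11:59:59",
        "2022-06-01 12:00:00",
        "2022-06-01 17:30:00",
        "2022-06-01 23:59:59",
    ])
})

# .dt.hour extracts the hour (0-23) from each timestamp
toy_morning = toy_trips[toy_trips["starttime"].dt.hour < 12]
toy_afternoon = toy_trips[toy_trips["starttime"].dt.hour >= 12]

print(len(toy_morning), len(toy_afternoon))
```

A trip starting at exactly 12:00:00 falls into the second half of the day, and a trip that starts before noon but ends after noon is still counted as a morning trip, because only the start time is used.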

In [33]:
# 2022
df_2022_mor = df_2022[df_2022["starttime"].dt.hour < 12]
df_2022_aft = df_2022[df_2022["starttime"].dt.hour >= 12]

In [34]:
# Calculating the Bikeflow indexes by calling get_bifi() function:
df_bifi_mor = get_bifi(df_2022_mor)
df_bifi_aft = get_bifi(df_2022_aft)
df_bifi_mor.head(40)

Unnamed: 0,area_number,bifi_index
0,1,-1486.0
1,2,-1947.0
2,3,-6114.0
3,4,1747.0
4,5,-253.0
5,6,-20250.0
6,7,-18639.0
7,8,17077.0
8,9,21.0
9,10,33.0
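
Since every trip is assigned to exactly one half of the day by its start time, the morning index and the afternoon index of a district should add up to its whole-day index. The sketch below checks this on hypothetical toy trips; the helper `bikeflow` is a simplified stand-in written for this example, not the `get_bifi` function itself.

```python
import pandas as pd

def bikeflow(trips):
    """Trips ending minus trips starting per area (missing counts treated as 0)."""
    ends = trips["end_area_number"].value_counts()
    starts = trips["start_area_number"].value_counts()
    return ends.sub(starts, fill_value=0)

# Hypothetical toy trips: into area 8 in the morning, out of it in the afternoon
toy_trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2022-06-01 07:30", "2022-06-01 08:10", "2022-06-01 08:45",
        "2022-06-01 16:40", "2022-06-01 17:05", "2022-06-01 18:20",
    ]),
    "start_area_number": [6, 7, 6, 8, 8, 8],
    "end_area_number":   [8, 8, 8, 6, 7, 8],
})

morning = toy_trips[toy_trips["starttime"].dt.hour < 12]
afternoon = toy_trips[toy_trips["starttime"].dt.hour >= 12]

whole_day = bikeflow(toy_trips)
# Add the two half-day indexes, again treating a missing area as 0
recombined = bikeflow(morning).add(bikeflow(afternoon), fill_value=0)

print(whole_day.sort_index())
print(recombined.sort_index())
```

In the toy data, area 8 is at +3 in the morning and -2 in the afternoon, which nets out to +1 for the whole day. This illustrates why a moderate whole-day index can hide much larger flows within the day.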


In [48]:
# merging the Bikeflow-Index dataframe, with community areas dataframe:
gdf_2022_bifi_mor = gdf_areas.merge(df_bifi_mor, how="left", on="area_number")
gdf_2022_bifi_mor.rename(columns={"bifi_index":"bifi_index_mornings"}, inplace= True)
gdf_2022_bifi_aft = gdf_areas.merge(df_bifi_aft, how="left", on="area_number")
gdf_2022_bifi_aft.rename(columns={"bifi_index":"bifi_index_afternoons"}, inplace=True)

In [50]:
import folium

m = gdf_2022_bifi_mor.explore(
    column="bifi_index_mornings",  # make choropleth based on the "bifi_index_mornings" column
    scheme="naturalbreaks",  # use mapclassify's natural breaks scheme
    legend=True,  # show legend
    vmin=-35000,
    vmax=55000,
    cmap='RdYlGn',
    style_kwds= {"weight":0.8, "color":"gray"},
    k=20,  # use 20 bins
    tooltip=False,  # hide tooltip
    popup=["community_name", "area_number","pop_density_per_km^2","population", "bifi_index_mornings"],  # show popup (on-click)
    #legend_kwds=dict(colorbar=False),  # do not use colorbar
    legend_kwds=dict(scale=False),  # give every bin the same length in the legend
    name="2022 Bikeflow-Index Mornings 0-12",  # name of the layer in the map
)
gdf_2022_bifi_aft.explore(
    m=m,
    column="bifi_index_afternoons",  # make choropleth based on the "bifi_index_afternoons" column
    scheme="naturalbreaks",  # use mapclassify's natural breaks scheme
    legend=True,  # show legend
    vmin=-70000,
    vmax=40000,
    cmap='RdYlGn',
    style_kwds= {"weight":0.8,"color":"gray"},
    k=20,  # use 20 bins
    tooltip=False,  # hide tooltip
    popup=["community_name", "area_number","pop_density_per_km^2","population", "bifi_index_afternoons"],  # show popup (on-click)
    legend_kwds=dict(scale=False),  # give every bin the same length in the legend
    name="2022 Bikeflow-Index Afternoons 12-24",  # name of the layer in the map
)

folium.TileLayer("CartoDB positron", show=False).add_to(m)  # use folium to add alternative tiles
folium.LayerControl().add_to(m)  # use folium to add layer control

m  # show map