## Part 3 - Basic visualization example using Folium maps and data from Madrid's public bike system (Bicimad)

In this example we explore whether there is a net migration of bikes between different parts of the city, and whether that migration depends on the time of day. We also check whether weekends show the same behaviour as work days.

It is based on an article written by Vincent Lonij ([Link](https://blog.prototypr.io/interactive-maps-with-python-part-1-aa1563dbe5a9)).

In [3]:
import json
import pandas as pd
from PIL import Image, ImageDraw
import geopandas as gpd
from shapely.geometry import Point
import folium
import numpy as np


We are using 3 months of data for this example (03/2018 - 06/2018). Movements data is public and you can [download it here](https://opendata.emtmadrid.es/Datos-estaticos/Datos-generales-(1)).

In [4]:
#Getting data
data = pd.read_csv('movements_for_trip_counts.csv')

#Converting time strings into datetime objects and adding some columns I will use later in the analysis
data["unplug_hourtime"] = pd.to_datetime(data["unplug_hourtime"])
data["plug_hourtime"] = pd.to_datetime(data["plug_hourtime"])
data["hour_departure"] = data.unplug_hourtime.map(lambda x: x.hour)
data["hour_arrival"] = data.plug_hourtime.map(lambda x: x.hour)
data["day_week"] = data.plug_hourtime.apply(lambda x: x.weekday())

#Take a look at the generated dataframe
data.head()

| Unnamed: 0 | user_type_code | idunplug_station | idplug_station | unplug_hourtime | code_station_departure | latitude_departure | longitude_departure | code_station_arrival | latitude_arrival | longitude_arrival | plug_hourtime | hour_departure | hour_arrival | day_week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 158 | 154 | 2018-03-21 08:00:00 | 158 | 40.441597 | -3.692782 | 154 | 40.445741 | -3.691793 | 2018-03-21 08:01:59 | 8 | 8 | 2 |
| 1 | 1 | 49 | 26 | 2018-03-21 08:00:00 | 49 | 40.407036 | -3.711051 | 26 | 40.419743 | -3.708073 | 2018-03-21 08:06:24 | 8 | 8 | 2 |
| 2 | 1 | 10 | 46 | 2018-03-21 08:00:00 | 10 | 40.415606 | -3.709508 | 46 | 40.410709 | -3.698232 | 2018-03-21 08:04:52 | 8 | 8 | 2 |
| 3 | 1 | 119 | 34 | 2018-03-21 08:00:00 | 119 | 40.432599 | -3.724653 | 34 | 40.419209 | -3.711504 | 2018-03-21 08:06:29 | 8 | 8 | 2 |
| 4 | 1 | 27 | 69 | 2018-03-21 08:00:00 | 27 | 40.418215 | -3.710354 | 69 | 40.41656 | -3.690453 | 2018-03-21 08:08:09 | 8 | 8 | 2 |
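
The hour and weekday columns above are built with `map`/`apply` lambdas. As a minimal sketch, the same fields can also be read through pandas' vectorized `.dt` accessor. The two timestamps below are made up for illustration, and the `is_weekend` column is a hypothetical helper that the notebook itself does not create.

```python
import pandas as pd

# Toy data: two invented arrival timestamps (a Wednesday and a Saturday)
toy = pd.DataFrame({"plug_hourtime": ["2018-03-21 08:01:59", "2018-03-24 19:30:00"]})
toy["plug_hourtime"] = pd.to_datetime(toy["plug_hourtime"])

# Vectorized equivalents of the lambdas used in the cell above
toy["hour_arrival"] = toy["plug_hourtime"].dt.hour
toy["day_week"] = toy["plug_hourtime"].dt.weekday  # Monday=0 ... Sunday=6

# Hypothetical helper column: Saturday (5) and Sunday (6) count as weekend
toy["is_weekend"] = toy["day_week"] >= 5
```

With this convention, `day_week` values 0 to 4 are work days, which is the split used later in the analysis.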


For every bike movement in the analyzed period we have information about:
* Departure and arrival stations, including both exact locations
* Unplug & plug hour time for every movement
* Extra information about time and date.

As mentioned before, we want to explore whether there is a net migration of bikes between different parts of the city, and whether this migration depends on the time of day.

We need to build a function that does the following:
* Receives the parameters from_hour, to_hour and dataset
* Builds a dataframe containing the locations of all stations
* For the selected time period, generates a dataframe counting the number of movements starting at each station
* For the selected time period, generates a dataframe counting the number of movements arriving at each station
* Joins the three dataframes

In [10]:
def get_movements_counts_by_time_period(selected_hour_from, selected_hour_to, dataset):

    #Dataframe with locations of every station    
    station_locations = dataset.groupby("code_station_departure").first()
    # keeping only the 3 columns we are interested in
    station_locations = station_locations.loc[:, ["latitude_departure", "longitude_departure", "name_departure"]]

    #Selecting time period of original dataset
    subset = dataset[(dataset["hour_departure"] >= selected_hour_from) & (dataset["hour_departure"] <= selected_hour_to)]
    subset_arrival = dataset[(dataset["hour_arrival"] >= selected_hour_from) & (dataset["hour_arrival"] <= selected_hour_to)]
    
    #Counting the number of movements starting at each station
    departure_counts =  subset.groupby("code_station_departure").count().iloc[:,[0]]
    #Rename column
    departure_counts.columns= ["Departure Count"]

    #Counting the number of movements arriving at each station
    arrival_counts =  subset_arrival.groupby("code_station_arrival").count().iloc[:,[0]]
    #Rename column
    arrival_counts.columns= ["Arrival Count"]
    
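    #joining locations, departure counts and arrival counts of each station for the time period
    #(left join on the departure counts: a station with no arrivals in the period gets 0 instead of NaN)
    movements_counts = departure_counts.join(station_locations).join(arrival_counts)
    movements_counts = movements_counts.fillna({"Arrival Count": 0})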
    
    return movements_counts
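
To make the counting step concrete, here is a minimal sketch on a made-up four-trip table. The station codes are invented, and `value_counts` combined with `Series.sub(..., fill_value=0)` is an alternative way to get the balance, not the approach the function above takes.

```python
import pandas as pd

# Toy data: four invented trips between three stations
trips = pd.DataFrame({
    "code_station_departure": [1, 1, 2, 3],
    "code_station_arrival":   [2, 2, 1, 2],
})

# Number of trips starting and ending at each station
departures = trips["code_station_departure"].value_counts()
arrivals = trips["code_station_arrival"].value_counts()

# Net departures per station; fill_value=0 keeps stations that
# appear on only one side (station 3 has no arrivals here)
net = departures.sub(arrivals, fill_value=0).astype(int).sort_index()
```

A positive value means the station loses bikes over the period, and a negative value means it gains them.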



Working on the hypothesis that bikes are mostly ridden to go to work or school (and back home), we will check on a map whether there is a migration pattern in the mornings and in the evenings.

Now that the function is defined, we can create two dataframes for work days, one for the morning period (7AM-10AM) and one for the evening period (6PM-9PM), and add them to a map later. Both hour bounds are inclusive, so `(7, 10)` keeps every movement whose hour is 7, 8, 9 or 10.

In [11]:
data_week = data[(data.day_week != 5) & (data.day_week != 6)]

movements_morning = get_movements_counts_by_time_period(7, 10, data_week)
movements_evenings = get_movements_counts_by_time_period(18, 21, data_week)
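
Because the function filters with `>=` and `<=`, both bounds are inclusive. The sketch below uses a made-up series of hours to show that the two-sided comparison selects the same rows as `Series.between`, whose default also includes both ends.

```python
import pandas as pd

# Toy data: invented departure hours
hours = pd.Series([6, 7, 9, 10, 11], name="hour_departure")

# Same comparison as in get_movements_counts_by_time_period(7, 10, ...)
mask_explicit = (hours >= 7) & (hours <= 10)

# Equivalent shorthand; both ends are included by default
mask_between = hours.between(7, 10)

selected = hours[mask_between].tolist()
```

A trip leaving at 10:45 has `hour_departure == 10`, so it still falls inside the 7 to 10 window.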

## Visualizing each station's behaviour on the map

With our data ready, we can visualize it on the map. We are going to create a circle marker for each row of the dataframe, assigning a different color depending on the sign of the net departures. If there are more departures than arrivals at the station, we represent it with a yellow circle; otherwise we use blue.

I'm going to define a function that receives a dataframe, iterates over its rows, creates a circle for each one, colored by the sign of the net departures, and returns a folium map.

In [12]:
def plot_station_counts(trip_counts, radius_divisor):

    folium_map = folium.Map(width=1000,height=1000,location=[40.41, -3.7], zoom_start=14,
                        tiles="CartoDB dark_matter")
    
    for index, row in trip_counts.iterrows():
        
        net_departures = (row["Departure Count"]-row["Arrival Count"])
        popup_text = "{}<br> Total Departures: {}<br> Total Arrivals: {}<br> Balance: {}"
        popup_text = popup_text.format(row["name_departure"],
                              row["Departure Count"],
                              row["Arrival Count"],
                              net_departures)
        
        radius = abs(net_departures/radius_divisor)
        if radius == 0:
            radius = 1/radius_divisor

        if net_departures>0:
            color="#F8D210" # yellow
        else:
            color="#2E8BC0" # blue

        folium.CircleMarker(location=(row["latitude_departure"],
                                      row["longitude_departure"]),
                            radius=radius,
                            color=color,
                            popup=popup_text,
                            fill=True).add_to(folium_map)
        
    return folium_map
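
The radius and color rules inside the loop can be checked without rendering a map. The helper below, `marker_style`, is a hypothetical name introduced only for this sketch; it mirrors the logic of `plot_station_counts` for a single station and assumes a positive `radius_divisor`.

```python
def marker_style(departures, arrivals, radius_divisor):
    """Return (radius, color) for one station, mirroring plot_station_counts."""
    net_departures = departures - arrivals

    # Circle size grows with the absolute balance
    radius = abs(net_departures / radius_divisor)
    if radius == 0:
        # Balanced stations still get a small visible marker
        radius = 1 / radius_divisor

    # Yellow when the station loses bikes, blue otherwise (including a zero balance)
    color = "#F8D210" if net_departures > 0 else "#2E8BC0"
    return radius, color


morning_example = marker_style(120, 80, radius_divisor=20)
balanced_example = marker_style(50, 50, radius_divisor=20)
evening_example = marker_style(10, 40, radius_divisor=15)
```

Note that a perfectly balanced station is drawn in blue, because only a strictly positive balance is colored yellow.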

## Visualizing user movements on work days (7-10 AM)

In [13]:
#morning movements
plot_station_counts(movements_morning, radius_divisor = 20)

## Visualizing user movements on work days (6-9 PM)

In [14]:
#evening movements
plot_station_counts(movements_evenings, radius_divisor = 15)

## Migration

Comparing both maps, you can see that many of the stations with the most net departures in the morning are the ones with the most net arrivals in the evening, so the departure/arrival balance of some areas flips with the time of day. This points to a significant migration of bicycles from the southern neighborhoods to Salamanca, Chueca and their surroundings in the morning, and the opposite flow at the end of the working day. Central stations, which are the most used ones, keep a steady demand throughout the day, which is why they are considered mixed stations.

This is really useful information for Bicimad: its workers could know which stations need to be filled or emptied at each time of day in order to meet the demand. It could also be the starting point for further analysis, such as algorithms that predict arrivals to help the system operator, or that predict bike availability for users looking for a bike.

### During Weekends

Behaviour during weekends is totally different. There is no clear migration trend across the day, and the net departure/arrival balances are close to 0. You can check this in the following two maps.

In [15]:
data_weekend = data[(data.day_week == 5) | (data.day_week == 6)]

movements_morning_weekends = get_movements_counts_by_time_period(7, 10, data_weekend)
movements_evenings_weekends = get_movements_counts_by_time_period(18, 21, data_weekend)

In [18]:
#morning movements during weekends
plot_station_counts(movements_morning_weekends, radius_divisor = 15)

In [19]:
#evening movements during weekends
plot_station_counts(movements_evenings_weekends, radius_divisor = 15)
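
The maps give a visual impression of how balanced the stations are. As a sketch of one way to put a number on it, the hypothetical helper `imbalance_ratio` below divides the total absolute net balance by the total number of movements, so periods with different trip volumes can be compared. It assumes a dataframe with the `Departure Count` and `Arrival Count` columns produced earlier; the toy values are invented and are not results from the Bicimad data.

```python
import pandas as pd


def imbalance_ratio(counts):
    """Share of movements that are not offset by a movement in the other direction."""
    departures = counts["Departure Count"].fillna(0)
    arrivals = counts["Arrival Count"].fillna(0)

    net = departures - arrivals
    total = (departures + arrivals).sum()
    return net.abs().sum() / total


# Toy data: three invented stations, one of them without arrivals
toy_counts = pd.DataFrame(
    {"Departure Count": [30, 10, 20], "Arrival Count": [10, 30, None]},
    index=[1, 2, 3],
)
toy_ratio = imbalance_ratio(toy_counts)
```

A value near 0 means arrivals and departures cancel out at most stations, while a value near 1 means most stations only send or only receive bikes. Calling it on `movements_morning` and `movements_morning_weekends` would be one way to compare work days with weekends.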