## Part 1 - Basic visualization example using Folium maps and data from Madrid's public bike system (Bicimad)

In this part we are going to visualize the routes most frequently travelled by Bicimad users (considering a small subset), using:

* Folium maps
* Geodataframes
* Geospatial data (Points and LineStrings)
* Opacity as a trick for better visualization.


First we check that folium is installed correctly and see how it works.

In [1]:
import pandas as pd
import folium

To test it we can use a map of Madrid and add a marker for one particular bike station.

In [2]:
# Named m rather than map to avoid shadowing Python's built-in map()
m = folium.Map(location=[40.4, -3.7], zoom_start=13)
folium.Marker(location=[40.3919385, -3.6971829], tooltip="Entrada Matadero").add_to(m)
m

I found the information for all the system's stations through this API ([Link](https://opendata.aemet.es/centrodedescargas/inicio)) and saved it to a CSV file, which looks like this.

In [3]:
stations = pd.read_csv('stations.csv')
stations.head()

Unnamed: 0,id,code_station,name,num_bases,address,latitude,longitude
0,1,1,Puerta del Sol A,24,Puerta del Sol nº 1,40.416896,-3.702425
1,2,2,Puerta del Sol B,24,Puerta del Sol nº 1,40.417001,-3.702421
2,3,3,Miguel Moya,24,Calle Miguel Moya nº 1,40.420589,-3.705842
3,4,4,Plaza Conde Suchil,18,Plaza del Conde Suchil nº 2-4,40.430294,-3.706917
4,5,5,Malasaña,24,Calle Manuela Malasaña nº 5,40.428552,-3.702587


**GeoDataFrame**

I'm going to be using geospatial data throughout this document, so I'll convert the stations dataframe into a GeoDataFrame to show how it works. More information [here](http://geopandas.org/data_structures.html).

In [4]:
import geopandas as gpd
import shapely.wkt
from shapely.geometry import Point

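# Build one shapely Point per station; note the (longitude, latitude) = (x, y) order
stations['geometry'] = [Point(xy) for xy in zip(stations.longitude, stations.latitude)]
# EPSG:4326 (WGS84) is the coordinate system of plain GPS latitude/longitude;
# the older {'init': 'epsg:4326'} form is deprecated in recent pyproj versions
stations_gdf = gpd.GeoDataFrame(stations, crs='EPSG:4326', geometry='geometry')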

stations_gdf.head()

Unnamed: 0,id,code_station,name,num_bases,address,latitude,longitude,geometry
0,1,1,Puerta del Sol A,24,Puerta del Sol nº 1,40.416896,-3.702425,POINT (-3.7024255 40.4168961)
1,2,2,Puerta del Sol B,24,Puerta del Sol nº 1,40.417001,-3.702421,POINT (-3.7024207 40.4170009)
2,3,3,Miguel Moya,24,Calle Miguel Moya nº 1,40.420589,-3.705842,POINT (-3.7058415 40.4205886)
3,4,4,Plaza Conde Suchil,18,Plaza del Conde Suchil nº 2-4,40.430294,-3.706917,POINT (-3.7069171 40.4302937)
4,5,5,Malasaña,24,Calle Manuela Malasaña nº 5,40.428552,-3.702587,POINT (-3.7025875 40.4285524)


Now it's easier to show all the stations on a folium map using this new column.

In [5]:
m = folium.Map(location=[40.4, -3.7], zoom_start=12, tiles='cartodbpositron')
folium.GeoJson(stations_gdf, tooltip=folium.features.GeoJsonTooltip(fields=['id', 'name', 'num_bases', 'address'])).add_to(m)
m
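
To make it clearer why the `geometry` column helps: `folium.GeoJson` draws GeoJSON, and a GeoDataFrame can be serialized to that format. The sketch below builds by hand, using only the standard library, roughly the kind of Feature that corresponds to one station row. The helper name `station_to_feature` is made up for illustration, and the exact output geopandas produces may differ in detail. The points to notice are that GeoJSON coordinates go in `[longitude, latitude]` order (the opposite of folium's `location=[latitude, longitude]`) and that the remaining columns end up under `properties`, which is where the tooltip `fields` are looked up.

```python
import json


def station_to_feature(station):
    """Turn one station record (a dict) into a GeoJSON Feature with a Point geometry."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude], same as the POINT column
            "coordinates": [station["longitude"], station["latitude"]],
        },
        # Only the columns used in the tooltip are kept here (an arbitrary choice)
        "properties": {
            key: station[key] for key in ("id", "name", "num_bases", "address")
        },
    }


# First two rows of the stations table shown earlier
stations_sample = [
    {"id": 1, "name": "Puerta del Sol A", "num_bases": 24,
     "address": "Puerta del Sol nº 1", "latitude": 40.416896, "longitude": -3.702425},
    {"id": 2, "name": "Puerta del Sol B", "num_bases": 24,
     "address": "Puerta del Sol nº 1", "latitude": 40.417001, "longitude": -3.702421},
]

feature_collection = {
    "type": "FeatureCollection",
    "features": [station_to_feature(s) for s in stations_sample],
}

print(json.dumps(feature_collection["features"][0], ensure_ascii=False, indent=2))
```

A dictionary like `feature_collection` can also be passed to `folium.GeoJson` in place of the GeoDataFrame.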

## Visualizing movements on the map

One of this project's goals was to find out which routes are the most popular among users.
To accomplish this, I've used the information provided by the city government about the GPS coordinates (track) of every bike movement ([Link](https://opendata.emtmadrid.es/Datos-estaticos/Datos-generales-(1))).

What we have is a list of registered GPS coordinates associated with a bike movement. Generally the system receives a coordinate every 60 seconds.

The data was preprocessed and transformed, ending up with the following dataframe. (We are using only one day of data for the example: September 1st, 2018.)

**For each movement (represented by its id) we have a list of GPS coordinates representing the route of the bike.**

In [6]:
movements_coordinates = pd.read_csv('movements_coordinates.csv')
movements_coordinates.head()

Unnamed: 0,oid_bike_movement,latitude,longitude,speed,address,geometry
0,5b9058472f38434ab0d85d26,40.429798,-3.680217,3.66,PLAZA MARQUES DE SALAMANCA 2,POINT(-3.6802168 40.4297980997222)
1,5b9058472f38434ab0d85d26,40.43142,-3.679639,0.0,CALLE PADILLA,POINT(-3.67963929972222 40.43142)
2,5b9058472f38434ab0d85d26,40.434462,-3.679398,6.08,CALLE PRINCIPE DE VERGARA 71,POINT(-3.67939809972222 40.4344618)
3,5b9058472f38434ab0d85d26,40.436231,-3.679363,0.94,CALLE PRINCIPE DE VERGARA 87,POINT(-3.67936279972222 40.4362311)
4,5b9058472f38434ab0d85d26,40.437659,-3.683027,1.61,CALLE MARIA DE MOLINA 134,POINT(-3.6830273 40.4376592997222)


Representing routes with this data type (geospatial Points) does not make much sense. You can get a map with thousands of markers representing all movements, but you can't draw many conclusions about popular routes from it.

The following map shows an example with 5 movements and, as you can see, it is hard to make sense of.

In [7]:
movements_id = ['5b9058682f38434ab0d87847', '5b9058682f38434ab0d87846', '5b9058482f38434ab0d85d94', '5b9058482f38434ab0d85d95', '5b9058482f38434ab0d85da4']
movements_coordinates_example = movements_coordinates[movements_coordinates.oid_bike_movement.isin(movements_id)]
m = folium.Map(location=[40.4, -3.7], zoom_start=13, tiles='cartodbpositron')
# One marker per registered GPS point of the selected movements
for index, row in movements_coordinates_example.iterrows():
    folium.Marker([row['latitude'], row['longitude']]).add_to(m)
m

## LineString

To get a better visualization of the routes we can use the LINESTRING type. I've generated a LineString for each movement from its GPS coordinates, provided at least two points were registered for that movement.

I had previously saved that information in another CSV file, with a sample of 2000 movements from September 2018.
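
The preprocessing that produced this file is not shown here. As a rough idea of what it involves, the following sketch groups GPS points by movement id and writes each group as a WKT `LINESTRING`, skipping movements with fewer than two points. It is only an illustration under some assumptions: the function name `build_linestrings` and the sample rows are hypothetical, the rows are assumed to be already ordered in time within each movement, and it uses plain Python instead of shapely so that it runs without extra dependencies.

```python
from collections import defaultdict


def build_linestrings(rows):
    """Group (movement_id, longitude, latitude) rows into WKT LINESTRING strings.

    Rows are assumed to be sorted by time within each movement.
    Movements with fewer than two points are dropped.
    """
    points = defaultdict(list)
    for movement_id, lon, lat in rows:
        points[movement_id].append((lon, lat))

    lines = {}
    for movement_id, coords in points.items():
        if len(coords) < 2:
            continue  # a line needs at least two points
        # WKT uses "x y" pairs, i.e. "longitude latitude"
        body = ", ".join(f"{lon} {lat}" for lon, lat in coords)
        lines[movement_id] = f"LINESTRING ({body})"
    return lines


# Hypothetical sample rows, not taken from the real dataset
rows = [
    ("a", -3.6802, 40.4298),
    ("a", -3.6796, 40.4314),
    ("a", -3.6794, 40.4345),
    ("b", -3.7024, 40.4169),  # single point: no line can be built
]

lines = build_linestrings(rows)
print(lines)
```

Strings in this format are what `shapely.wkt.loads` turns back into geometries in the next cell.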

In [8]:
movements_lines = pd.read_csv('movements_lines.csv')
# Parse the WKT strings stored in the CSV back into shapely geometries
movements_lines['geometry'] = movements_lines['geometry'].apply(shapely.wkt.loads)
# Use only the first 1500 movements for the map
movements_lines_gdf = gpd.GeoDataFrame(movements_lines.head(1500), crs='EPSG:4326', geometry='geometry')
movements_lines_gdf.head()

Unnamed: 0,oid_bike_movement,geometry
0,5b944e082f3843443049e4c0,"LINESTRING (-3.696615 40.4220771, -3.6924166 4..."
1,5ba1d1532f38433fc8f484fe,LINESTRING (-3.67591109972222 40.4568297999999...
2,5bb693c82f384321806bbac5,"LINESTRING (-3.6962925 40.404153, -3.6962925 4..."
3,5b92fbe52f38435fe0fb5655,"LINESTRING (-3.67777559972222 40.4151966, -3.6..."
4,5bac09e22f384335f8d46328,"LINESTRING (-3.688021 40.4332155997222, -3.688..."


We can now visualize all the selected movements, each represented as a LINESTRING, on the folium map.

In [9]:
m = folium.Map(location=[40.4, -3.7], zoom_start=13, tiles='cartodbpositron')
folium.GeoJson(movements_lines_gdf).add_to(m)
m

If we just want an idea of which streets carry the most bike traffic, this representation is not much help: every line is drawn equally solid, so busy and quiet streets look the same.

A quick way to improve it is the **opacity attribute**. When every line is drawn almost transparent, routes with more traffic pile up into an opaque line, while less traveled routes stay almost transparent.

In [10]:
# Setting styles: a low opacity lets overlapping routes add up visually
style = {'fillColor': '#1a1aff', 'color': '#1a1aff', 'opacity': .1}

In [11]:
m = folium.Map(location=[40.4, -3.7], zoom_start=13, tiles='cartodbpositron')
folium.GeoJson(movements_lines_gdf, style_function=lambda x: style).add_to(m)
m
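
The opacity trick works because semi-transparent lines drawn on top of each other add up. Assuming the browser blends the lines with ordinary alpha compositing (which I believe is the case for overlapping strokes of the same color, but I have not verified it for every renderer), `n` overlapping lines of opacity `a` give a combined opacity of `1 - (1 - a)^n`. The small sketch below, where `combined_opacity` is just an illustrative helper, shows how quickly a street becomes solid with the `0.1` used here.

```python
def combined_opacity(alpha, n):
    """Opacity where n lines of opacity alpha overlap (standard 'over' blending)."""
    return 1 - (1 - alpha) ** n


# How visible a street becomes as more routes pass through it (alpha = 0.1)
for n in (1, 5, 10, 25, 50):
    print(f"{n:>2} overlapping routes -> opacity {combined_opacity(0.1, n):.2f}")
```

With this value a street used by a single route is barely visible, while one shared by a few dozen routes is close to fully opaque. A lower opacity would spread that range over more routes, which may suit larger samples better.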


This is a really easy way to get a first look at the routes bikers use most often. It could of course be improved by using a routing API (such as Google Maps) to determine the exact route, or at least an approximate one, instead of the straight lines between GPS points that we are using.

As the map shows, the most important streets appear to be:
* Paseo de la Castellana and Paseo del Prado
* Calle Alcalá
* Serrano
* Santa Engracia

The interesting thing here is that, contrary to what you might expect, most of these streets don't have dedicated bike lanes, as you can see [here](http://madrid.maps.arcgis.com/apps/webappviewer/index.html?id=304e79ab11cb403cbd4469a60a48cdeb), so they are also where most bike accidents occur.