## QUESTION 2

A heatmap painted over the map of the Gracia district was considered the most visual and human-readable way to make the quality of bike stations across the district explicit. A first approach would have been to produce histograms or density plots of the available bikes at each station. Heatmaps, however, besides giving a quick first glance of quality and depletion, also convey spatial information and the relationship between neighbouring stations. If I stick only to per-station depletion metrics, I get an isolated perspective, and my decision could be biased, because a good-quality station may be surrounded by bad ones. My plan B could then be catastrophic: if one morning I find my station depleted, the nearby stations will most probably be depleted too. A heatmap, on the other hand, exposes the quality of each station together with that of its surroundings. A good decision is therefore to choose a good station surrounded by other good or mid-quality stations, which serve as fallback options.



# Strategy

The script reads data sampled over 24 hours at 5-minute intervals (288 intervals in a full day). It does not compute the mean number of free bikes; instead it counts the time segments in which a station had zero bikes to offer. The database is a list of snapshots, each consisting of a timestamp, the list of Gracia district stations, and a second list with their neighbouring stations, whether or not these belong to Gracia. For every snapshot, a table adds one "zero segment" (no bikes during 5 minutes) to each depleted station. A station's total determines how many "hot spots" are drawn for it, and these contribute to its heat. The spots are scattered uniformly at random around the station, so the more zero segments a station has, the more heat accumulates around it.
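
The counting step described above can be sketched on synthetic data. The snapshot layout (`gracia`, `neighbours`, `id`, `free_bikes`) follows the notebook; the station ids and bike counts are invented, and counting a station only once per snapshot when it appears in both lists is an assumption of this sketch.

```python
from collections import Counter

def count_zero_segments(snapshots):
    """Count, per station id, the snapshots in which the station had no free bikes."""
    zero_segments = Counter()
    for snapshot in snapshots:
        seen = set()  # assumption: a station listed twice in one snapshot counts once
        for station in snapshot['gracia'] + snapshot['neighbours']:
            if station['id'] in seen:
                continue
            seen.add(station['id'])
            # Adding 0 still registers the station, even if it is never depleted.
            zero_segments[station['id']] += int(station['free_bikes'] == 0)
    return dict(zero_segments)

# Three invented 5-minute snapshots (ids and counts are made up).
snapshots = [
    {'gracia': [{'id': 'A', 'free_bikes': 0}], 'neighbours': [{'id': 'B', 'free_bikes': 4}]},
    {'gracia': [{'id': 'A', 'free_bikes': 0}], 'neighbours': [{'id': 'B', 'free_bikes': 0}]},
    {'gracia': [{'id': 'A', 'free_bikes': 3}], 'neighbours': [{'id': 'B', 'free_bikes': 2}]},
]

zero_segments = count_zero_segments(snapshots)
minutes_depleted = {sid: 5 * n for sid, n in zero_segments.items()}
print(zero_segments)     # {'A': 2, 'B': 1}
print(minutes_depleted)  # {'A': 10, 'B': 5}
```

Multiplying the count by the 5-minute sampling interval gives an estimate of the total time each station spent empty.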

# Heatmap Interpretation

Stations with a high number of zero segments accumulate a large amount of heat. The good choice is to settle near cold stations, shown in bluish or greenish tones. Very high-quality stations are extremely cold and leave almost no heat trace on the map. This makes two kinds of streets look alike: streets very close to top-quality stations, and streets far away from any station. The second kind is a poor choice, since the user would live far from any station and face a long commute. This ambiguity motivates the introduction of "inverse heat".
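
The ambiguity can be reproduced with a small sketch of the spot generation. The constants (100 spots at most, a jitter of ±`stdev/scale` degrees) mirror the notebook's `getHeatSpots`; the function name `heat_spots`, the coordinates and the counts are hypothetical.

```python
import random

def heat_spots(lat, lon, zero_segments, max_zero_segments, rng, stdev=1.0, scale=1000.0):
    """Return jittered [lat, lon] points; more zero segments give more points."""
    if max_zero_segments <= 0:
        return []
    ratio = zero_segments / max_zero_segments
    spots = []
    for _ in range(int(100.0 * ratio)):
        # Uniform jitter in [-stdev/scale, +stdev/scale] degrees on each axis.
        d_lat = rng.uniform(-stdev, stdev) / scale
        d_lon = rng.uniform(-stdev, stdev) / scale
        spots.append([lat + d_lat, lon + d_lon])
    return spots

rng = random.Random(0)  # seeded only to make the sketch reproducible
worst = heat_spots(41.4030, 2.1560, zero_segments=40, max_zero_segments=40, rng=rng)
half = heat_spots(41.4100, 2.1500, zero_segments=20, max_zero_segments=40, rng=rng)
best = heat_spots(41.4060, 2.1620, zero_segments=0, max_zero_segments=40, rng=rng)

print(len(worst), len(half), len(best))  # 100 50 0
```

A station that was never depleted contributes no points at all, which is exactly what a street with no station contributes.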

# Inverse Heat

Heatmaps highlight the entities of interest by giving them a positive, incremental measure. Highlighting bad-quality stations therefore gives an incomplete picture of the setting: as seen above, the map cannot tell streets near high-quality stations apart from isolated streets far from any station. The proposal is to give heat to the high-quality stations instead. The zero-segment count is replaced by its complement (the maximum count over all stations minus the station's own count). High-quality stations are then enclosed in a cloud of heat, which also highlights all the interesting streets around them.
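
As a worked example of the complement, the sketch below follows the arithmetic used in the notebook's `getHeatSpots` (complement, ratio, cubic jitter factor, spot count). The function name `inverse_heat` and the counts are hypothetical, and the fallback for a zero maximum is an assumption.

```python
def inverse_heat(zero_segments, max_zero_segments):
    """Map a zero-segment count to (complementary heat, ratio, jitter half-width, spot count)."""
    heat = max_zero_segments - zero_segments  # complement: the best station gets the most heat
    # Assumption: if no station was ever depleted, every station counts as fully "hot".
    ratio = heat / max_zero_segments if max_zero_segments > 0 else 1.0
    spread = 2.0 * ratio ** 3                 # cubic factor, tuned by hand in the notebook
    n_spots = int(100.0 * ratio)
    return heat, ratio, spread, n_spots

for zero_segments in (0, 20, 40):
    print(zero_segments, inverse_heat(zero_segments, max_zero_segments=40))
# 0 (40, 1.0, 2.0, 100)
# 20 (20, 0.5, 0.25, 50)
# 40 (0, 0.0, 0.0, 0)
```

The cubic factor makes the heat cloud shrink quickly as quality drops: a station depleted half as often as the worst one gets half the spots but only one eighth of the spread.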

# Inverse Heatmap Interpretation

With the new metric, streets near a bad station look as cold as streets far away from any station.
The user should base the decision on the hottest areas. It is advisable to settle somewhere midway between high-quality stations, but not too far from the preferred one.
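
One way to make this reading of the map concrete is to score a candidate address by the availability of the stations within walking distance. This is only an illustrative sketch, not part of the notebook: the function names, the 300 m radius, the station coordinates and the ratios are all assumptions.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def location_score(lat, lon, stations, radius_m=300.0):
    """Sum the availability ratios of the stations within radius_m of (lat, lon)."""
    return sum(s['ratio'] for s in stations
               if haversine_m(lat, lon, s['lat'], s['lon']) <= radius_m)

# Invented stations: ratio is the inverse-heat ratio (1.0 = never depleted).
stations = [
    {'lat': 41.4030, 'lon': 2.1560, 'ratio': 0.9},
    {'lat': 41.4045, 'lon': 2.1575, 'ratio': 0.8},
    {'lat': 41.4120, 'lon': 2.1480, 'ratio': 0.1},
]

between_good = location_score(41.40375, 2.15675, stations)  # midway between two good stations
near_bad = location_score(41.4120, 2.1480, stations)        # next to the poor station
far_away = location_score(41.4300, 2.1900, stations)        # no station in range
print(between_good, near_bad, far_away)
```

Under this score, an address next to a poor station and an address far from every station both rank low, while the midpoint between two good stations ranks highest, matching the visual reading of the inverse heatmap.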



In [126]:
import requests
import urllib, json

import datetime

import pymongo as mongo
from pymongo import MongoClient

import folium
from folium import plugins
from folium.plugins import HeatMap
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import numpy as np
import random as rnd

import bson

__author__ = "Alexis Torrano"
__email__ = "a.torrano.m@gmail.com"
__status__ = "Production"

%matplotlib inline

In [127]:
def countZeroBikeStations(snapshot,tablCounter):
    
    GraciaStations = snapshot['gracia']
    neighbourStations = snapshot['neighbours']    
    
    # Increase the counter of each station when its bikes are depleted.
    # If the station is still not registered in the counters table, its
    # counter is first set to 0.
    # The neighbours list may repeat Gracia stations, so each station id is
    # counted at most once per snapshot to avoid double counting.
    seen = set()
    for x in list(GraciaStations) + list(neighbourStations):
        if x['id'] in seen:
            continue
        seen.add(x['id'])
        
        if x['id'] not in tablCounter:
            tablCounter[x['id']] = 0
                
        if 0 == x['free_bikes']:            
            tablCounter[x['id']] += 1

In [128]:
def getStaticFeaturesOfStations(snapshot,volatileFeatures):
    # returns tables with the static part of the data model associated to stations    
    
    GraciaStations = snapshot['gracia']
    neighbourStations = snapshot['neighbours']       
    
    pandasGraciaStations = pd.DataFrame.from_dict(GraciaStations)
    pandasNeighbourStations = pd.DataFrame.from_dict(neighbourStations)
        
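    # drop() returns a new DataFrame, so the result must be assigned back
    pandasGraciaStations = pandasGraciaStations.drop(volatileFeatures,axis=1)
    pandasNeighbourStations = pandasNeighbourStations.drop(volatileFeatures,axis=1)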
    
    return pandasGraciaStations,pandasNeighbourStations

In [129]:
def getHeatSpots(pandasDF,counterDF,stats,m,inverseHeat,heatSpotsList):
    
    '''
    For each station <getHeatSpots> produces a list of spots for a heatmap. 
    Each spot originally represents a segment of time where the station had 0 free_bikes.
    The more time a station has been depleted the hotter it should appear in
    the heatmap.
    
    There is an option for the inverseHeat version of the heatmap. Then, <heat>
    measures availability of bikes in the station.
    
    The number of depleted time segments determines the color of the station marker
    on the city map. The color assignment follows the quantiles of the distribution
    of depleted time segments, and it stays the same whether or not <inverseHeat>
    is chosen.
    '''
    
    for index, row in pandasDF.iterrows():
        heat = counterDF[row.id]
        
        # Basic semantics of <heat> is applied in marker color assignation 
        if heat <= stats['min']:
            st_marker_color='blue'
        elif heat < stats['25%']:
            st_marker_color='green'            
        elif heat < stats['75%']:
            st_marker_color='orange'            
        elif heat < stats['max']:
            st_marker_color='red'
        else:
            # heat equals stats['max']; a plain else guarantees the color is always set
            st_marker_color='black'
                     
        # Once color marker is assigned, <inverseHeat> may be activated
        # Guard against a zero maximum (no station was ever depleted)
        maxHeat = float(stats['max'])
        if inverseHeat:
            # The parameters for <inverseHeat> were found by manual exploration,
            # keeping the values that gave the most distinguishable areas on the map.
            heat = stats['max'] - heat
            ratioHeat = float(heat) / maxHeat if maxHeat > 0 else 1.0
            stdev = 2.0 * ratioHeat * ratioHeat * ratioHeat
            scale = 1000.0
        else:
            ratioHeat = float(heat) / maxHeat if maxHeat > 0 else 0.0
            stdev = 1.0
            scale = 1000.0

        
        folium.CircleMarker([row['latitude'], row['longitude']],
                            radius=15,
                            popup=row['name'],
                            fill_color="#3db7e4", # divvy color
                           ).add_to(m)
        
        folium.Marker([row['latitude'], row['longitude']],                        
                            popup=str(row['name']+"::"+str(heat)),
                            icon=folium.Icon(color=st_marker_color)
                           ).add_to(m)
         
         
        # Produce a list of jittered coordinates to feed the heatmap. The number of
        # spots is proportional to ratioHeat (at most 100 per station), and each spot
        # is the station position plus a uniform random perturbation.
        for s in range(int(100.0*ratioHeat)):        
            disturbLat = ((-stdev+2.0*stdev*rnd.random())/scale)
            disturbLon = ((-stdev+2.0*stdev*rnd.random())/scale)
            heatSpotsList.append([row['latitude']+disturbLat, row['longitude']+disturbLon])

In [130]:
def heatmap(pandasGraciaStations,neighboursDF,counterDF,inverseHeat=False):
    ## Must paint in map all x in list <GraciaStations> and all y in y.extra.NearbyStationList
    
    '''
    I got Gracia district coordinates from openstreetmap: 
    https://nominatim.openstreetmap.org/details.php?place_id=198819829
    https://www.openstreetmap.org/relation/3773080#map=14/41.4102/2.1599
    
    map center: 41.41023,2.15087 view on osm.org
    map zoom: 14
    viewbox: 2.08989,41.42632,2.21177,41.39413
    '''    
    
    m = folium.Map([41.41023, 2.15087], zoom_start=14)
      
    # mark each station as a point: put a marker and a pop-up with station name
    # todo : add free_bikes at the pop-up
    zeroBikesList = []    
    #count,mean,std,mini,pct25,pct50,pct75,maxi=counterDF.describe()
    stats=counterDF.describe()
    
    # Get the coordinates of the occurrences to add to the heatmap, and add markers to the map for each Bicing station
    getHeatSpots(pandasGraciaStations,counterDF,stats,m,inverseHeat,zeroBikesList)    
    getHeatSpots(neighboursDF,counterDF,stats,m,inverseHeat,zeroBikesList)        
    
    ## HEATMAP CALL
    if inverseHeat:
        min_opacity=0.1
        max_val=0.8
    else:
        min_opacity=0.5
        max_val=1.0
    
    
    m.add_child(plugins.HeatMap(zeroBikesList, radius=50,blur=70,min_opacity=min_opacity,max_val=max_val))    
    
    return(m)
    
    #TODO Heatmap radius : add a slider to Jupyter

In [131]:
## Please remember the advice from question_0 about the leeching process.
## This Jupyter notebook generates its results not from a live mongoDB query but
## from a mongoDB database dump file.
## Every code line preceded by "### MONGO ###" was used for direct mongoDB interaction.

### MONGO ### mongoC = MongoClient('mongodb://localhost:27017/')
### MONGO ### dbHosco = mongoC['HOSCO']
### MONGO ### timeBikeAllocation = dbHosco['timeBikeAllocation']

bsonfilePath = 'data/HOSCO/'
bsonfileName = 'timeBikeAllocation.bson'

try:
    with open(bsonfilePath+bsonfileName,'rb') as f:
        timeBikeAllocation_list = bson.decode_all(f.read())
except Exception as e:
    # Report and re-raise: the rest of the cell cannot run without the data
    print(str(e))
    raise


# Get the first snapshot to build the dataframe with station's 
# static features.
### MONGO ### snapshot = timeBikeAllocation.find_one()
snapshot = timeBikeAllocation_list[0]

volatileFeatures = ['free_bikes', 'empty_slots', 'timestamp']
pandasGraciaStations,pandasNeighbourStations = getStaticFeaturesOfStations(snapshot,volatileFeatures)
# remove from pandasNeighbourStations any station already contained in pandasGraciaStations
pandasNeighbourStations = pandasNeighbourStations[~ pandasNeighbourStations.id.isin(pandasGraciaStations.id)]
    
tablCounter = {}
### MONGO ### cursor = timeBikeAllocation.find({})
timeSliceCount = 0
### MONGO ### for snapshot in cursor:
for snapshot in timeBikeAllocation_list:
    countZeroBikeStations(snapshot,tablCounter)
    timeSliceCount += 1

## Prepare data for graphical report;
## paint heatmap from tablCounter

pandasTablCounter = pd.Series(tablCounter, name='countTimeSliceZeroBikes')
pandasTablCounter.index.name = 'id'
# The Series stays indexed by station id, because getHeatSpots looks the counts up by id

m = heatmap(pandasGraciaStations,pandasNeighbourStations,pandasTablCounter)      



## First Heatmap.





Each station's location is tagged with a marker on the map. The quality (utility) of a station is derived from its probability of depletion, which is estimated from the number of 5-minute time segments in which the station had no bikes available. Each station receives a color according to where its count falls within the distribution of zero segments over all stations. Please see the adjoining figure for the correspondence between colors and bins. This color code marks the best stations near which to look for a house. A station with 0% zero segments is ideal.

<p> <img src="img/legend.png" alt="legend" width="70%"> </p>
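
The color binning can be sketched in isolation. The thresholds and colors mirror the chain used in `getHeatSpots`; the counts are invented, and `statistics.quantiles` with `method='inclusive'` is used here only as a standard-library stand-in for the `25%` and `75%` entries that the notebook takes from `describe()`.

```python
import statistics

def marker_color(count, stats):
    """Return the marker color for a zero-segment count, given the distribution stats."""
    if count <= stats['min']:
        return 'blue'
    if count < stats['25%']:
        return 'green'
    if count < stats['75%']:
        return 'orange'
    if count < stats['max']:
        return 'red'
    return 'black'

# Invented zero-segment counts for eight stations.
counts = [0, 2, 4, 6, 8, 10, 12, 14]
q1, _median, q3 = statistics.quantiles(counts, n=4, method='inclusive')
stats = {'min': min(counts), '25%': q1, '75%': q3, 'max': max(counts)}

colors = [marker_color(c, stats) for c in counts]
for c, color in zip(counts, colors):
    print(c, color)
```

With these thresholds only the station with the minimum count is blue and only the one with the maximum count is black; half of the stations fall in the orange band between the first and third quartiles.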

In [132]:
m

In [133]:
m = heatmap(pandasGraciaStations,pandasNeighbourStations,pandasTablCounter,inverseHeat=True)      

## Heatmap of inverse heat

In [134]:
m