# **DATA VISUALIZATION**
## *Marc Fuentes and Víctor Novelle*

*November 2020*

##**Environment preparation**
Before working with the dataset and creating the visualizations, we need to prepare the working environment.

In [None]:
# Necessary libraries for code execution.
import pandas as pd
import altair as alt

In [None]:
# Mount Google Drive so the project files can be read as local files.
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


Altair raises an error when a dataset exceeds its default row limit (5,000 rows), so we disable that limit.

In [None]:
# Disable Altair's maximum-rows check to avoid errors with large datasets.
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

##**Dataset loading**
We can now load the file produced by the *Data Cleaning* script.

In [None]:
# Load the clean data.
df = pd.read_csv('/content/gdrive/My Drive/GCED/Q5/VI/Projecte 1/CleanData.csv')
df.head()

Unnamed: 0,station,num_bikes_available,num_bikes_available_types.mechanical,num_bikes_available_types.ebike,num_docks_available,last_reported
0,Sants,0,0,0,21,2019-09-30 21:59:06
1,FIB,0,0,0,31,2019-09-30 21:58:24
2,Sarrià,0,0,0,24,2019-09-30 21:59:31
3,Sants,0,0,0,21,2019-09-30 22:03:28
4,FIB,0,0,0,31,2019-09-30 22:02:46


This dataset contains information about three different stations. Each row holds the metrics of a *station* at a specific moment: the total number of available bikes, split into mechanical and electrical, and the number of available docks.

###**1 - OVERVIEW**
We first want a global view of the whole dataset. To convey the essence of the data at a glance, we built an overview. We filtered the columns, discarding the *num_bikes_available* and *num_docks_available* attributes. Although leaving columns out of an overview may seem counterintuitive, we did so for two main reasons:

* The user is interested in acquiring general knowledge of how the bikes behave, and the number of available docks does not provide any information for this task. Moreover, adding a visual encoding for it would increase the number of variables in the representation and make it harder to understand, which is a problem we want to avoid, especially in the overview.

* In all three stations, the total number of bikes consists almost entirely of mechanical bikes (*Sants* and *Sarrià*) or electrical ones (*FIB*). For this reason, and as stated in the previous point, we decided not to include the total in order to keep the visualization easy to understand.



In [None]:
overview = df[['last_reported','num_bikes_available_types.mechanical','station','num_bikes_available_types.ebike']]
overview.head()

Unnamed: 0,last_reported,num_bikes_available_types.mechanical,station,num_bikes_available_types.ebike
0,2019-09-30 21:59:06,0,Sants,0
1,2019-09-30 21:58:24,0,FIB,0
2,2019-09-30 21:59:31,0,Sarrià,0
3,2019-09-30 22:03:28,0,Sants,0
4,2019-09-30 22:02:46,0,FIB,0


To distinguish mechanical from electrical bicycles, we transform the dataset into long form, with the time and station as identifier variables and the bike type and number of bikes as the variable and value columns.
The *type* column is derived from the names of the former electrical and mechanical availability columns. We also renamed the type values to make the dataset easier to understand.

In [None]:
overview_melt = pd.melt(overview, id_vars = ['last_reported','station'],var_name = "type",value_vars = ['num_bikes_available_types.mechanical','num_bikes_available_types.ebike'],value_name = "Available bikes")
# Assign the result back instead of relying on a chained in-place replace.
overview_melt['type'] = overview_melt['type'].replace({"num_bikes_available_types.mechanical": "Mechanical", "num_bikes_available_types.ebike": "Electrical"})
overview_melt.head()

Unnamed: 0,last_reported,station,type,Available bikes
0,2019-09-30 21:59:06,Sants,Mechanical,0
1,2019-09-30 21:58:24,FIB,Mechanical,0
2,2019-09-30 21:59:31,Sarrià,Mechanical,0
3,2019-09-30 22:03:28,Sants,Mechanical,0
4,2019-09-30 22:02:46,FIB,Mechanical,0
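
As a quick sanity check on the reshaping described above, the long form should have one row per reading and bike type, and it should keep the same total number of bikes. The following is a minimal sketch on invented rows, not on `CleanData.csv`; the names `wide`, `long` and `type_columns` are assumptions used only for illustration.

```python
import pandas as pd

# Toy wide-form rows (hypothetical values, not taken from CleanData.csv).
wide = pd.DataFrame({
    "last_reported": ["2019-09-30 21:59:06", "2019-09-30 21:58:24"],
    "station": ["Sants", "FIB"],
    "num_bikes_available_types.mechanical": [3, 0],
    "num_bikes_available_types.ebike": [1, 5],
})
type_columns = ["num_bikes_available_types.mechanical",
                "num_bikes_available_types.ebike"]

# Same kind of melt as in the notebook cell, applied to the toy table.
long = wide.melt(id_vars=["last_reported", "station"], value_vars=type_columns,
                 var_name="type", value_name="Available bikes")

# Long form has one row per (reading, type) and keeps the same bike total.
rows_ok = len(long) == len(wide) * len(type_columns)
total_ok = long["Available bikes"].sum() == wide[type_columns].sum().sum()
print(rows_ok, total_ok)
```

If either check fails on the real data, the most likely cause is a column missing from `value_vars`.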


After converting the data, the next step was to decide which visual representation and encoding would best convey the information. In the overview, we wanted to show the number of available bikes, differentiated by type, for each hour of the day and every day of the week.

As can be seen, we have 3 keys (2 categorical, 1 ordinal) to represent the values. We considered a line chart the best option to encode these attributes. On the one hand, it shows the trends between values through the connection of the marks and the ordering of the key axis. On the other hand, it lets us read a value precisely at one specific instant, unlike other time-oriented representations, and it is a well-known chart that any user can interpret easily, which is a key requirement for the overview.

After choosing the representation, we had to decide how to encode the *station* and *type* categorical attributes. Our initial idea was to generate a single plot, where the stations would be differentiated by colour and the bike type by the stroke type (both perform well in the ranking of visual variables).

However, we discarded this idea because plotting the electrical and mechanical bikes on the same plot makes the visualization unreadable due to the number of superimposed lines (this can be seen in the [Annex](#a1)). Therefore, we opted for a uniform-design juxtaposition of the electrical and mechanical plots.

Finally, because the data of every station is provided every 10 seconds, we obtained the number of bikes available per hour of each day by averaging these values over that period. The day-hour key is built with the *timeunit* and *calculate* Altair transformations, and the average is computed for each key.
<a name="1"></a>

In [None]:
A=alt.Chart(overview_melt).mark_line().transform_timeunit(
    datetime_day = "day(last_reported)"
).transform_calculate(
    day_number = "day(datum.last_reported)", 
    day_name = "split(datum.datetime_day,' ')[0]",
    day_hour = "(datum.day_name) + ' ' + hours(datum.last_reported) + ':00'",
).encode(
     alt.X('day_hour:O', sort=alt.SortField(field="day_number"), title = 'Hour-Day'),
     alt.Y('average(Available bikes):Q',title = 'Average available bikes'),
     order="day_number:Q",
     color=alt.Color('station:N', legend=alt.Legend(title="Station"),scale = alt.Scale(scheme= 'set2')),
).properties(width = 1000, title='Mechanical').transform_filter(alt.datum.type == "Mechanical")

B=alt.Chart(overview_melt).mark_line().transform_timeunit(
    datetime_day = "day(last_reported)"
).transform_calculate(
    day_number = "day(datum.last_reported)", 
    day_name = "split(datum.datetime_day,' ')[0]",
    day_hour = "(datum.day_name) + ' ' + hours(datum.last_reported) + ':00'",
).encode(
     alt.X('day_hour:O', sort=alt.SortField(field="day_number"),title = 'Hour-Day'),
     alt.Y('average(Available bikes):Q',title = 'Average available bikes'),
     order="day_number:Q",
     color=alt.Color('station:N', legend=alt.Legend(title="Station"),scale = alt.Scale(scheme= 'set2')),
).properties(width = 1000,title='Electrical').transform_filter(alt.datum.type == "Electrical")

OV = (A & B).properties(title ="Overview")
OV

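
For intuition, the per-hour averaging behind the overview can be sketched in pandas. This is only an approximate equivalent with invented readings: the notebook itself performs the aggregation inside Altair, and the `day` and `hour` column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical readings for a single station (values invented for illustration).
readings = pd.DataFrame({
    "last_reported": pd.to_datetime([
        "2019-09-30 21:58:24", "2019-09-30 21:59:06",
        "2019-09-30 22:02:46", "2019-09-30 22:03:28",
    ]),
    "station": ["FIB"] * 4,
    "Available bikes": [2, 4, 6, 8],
})

# Group by station, day of the week and hour of the day, then average.
hourly = (
    readings
    .assign(day=readings["last_reported"].dt.day_name(),
            hour=readings["last_reported"].dt.hour)
    .groupby(["station", "day", "hour"], as_index=False)["Available bikes"]
    .mean()
)
print(hourly)
```

Each resulting row corresponds to one point of a line in the overview: all readings that fall in the same station, weekday and hour collapse into a single average.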


###**2 - HEAT MAPS**
After generating the overview, we developed more specific charts that let the user solve different tasks more easily. Even though the previous chart allows comparing the behaviour of the different stations, we thought that other charts were needed to fulfil this objective properly.

We thought it would be interesting to analyze how the selected stations behave depending on the day and hour, regardless of the type of bike. Although this could be accomplished with an easy modification of the former line chart (replacing the two charts with one showing the total available bikes), that solution does not suit our needs. For example, to compare how a station behaves at 2 PM on Friday versus Monday, two points that are very far apart would have to be compared, which makes finding clusters difficult.

For this reason, we thought that a heat map for each station would be the best solution. We encoded the day on the y-axis and the hour on the x-axis. With this encoding, we can easily compare how the station behaves at a specific hour depending on the day (comparing a column) and how the behaviour varies within a given day (comparing a row). The colour encoding also eases the task of finding clusters, that is, seeing in which day-hours more bikes are available.

Also, the user can quickly see the general discrepancies between stations from the location of the clusters. A more in-depth comparison can be performed by comparing the respective column / row / cell of the desired stations.  
<a name="2"></a>

In [None]:
hm_Sarr = alt.Chart(df).mark_rect().encode(
    x = alt.X('hours(last_reported):O', title = "Hours"),
    y = alt.Y('day(last_reported):O', title = 'Day',sort=alt.Sort(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])),
    color = alt.Color('average(num_bikes_available):Q') 
).transform_filter(alt.datum.station == "Sarrià").properties(title ="Sarrià")

In [None]:
hm_Sants = alt.Chart(df).mark_rect().encode(
    x = alt.X('hours(last_reported):O', title = "Hours"),
    y = alt.Y('day(last_reported):O', title = 'Day',sort=alt.Sort(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])),
    color = alt.Color('average(num_bikes_available):Q', legend=alt.Legend(title = 'Available bikes'))   
).transform_filter(alt.datum.station == "Sants").properties(title ="Sants-Estació")

In [None]:
hm_FIB = alt.Chart(df).mark_rect().encode(
    x = alt.X('hours(last_reported):O', title ="Hours"),
    y = alt.Y('day(last_reported):O', title = "Day",sort=alt.Sort(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])),
    color = alt.Color('average(num_bikes_available):Q', scale = alt.Scale(scheme= 'lightmulti'))
).transform_filter(alt.datum.station == "FIB").properties(title ="FIB")

In [None]:
HM = ((hm_FIB | hm_Sarr) & (hm_Sants)).properties(title ="Heat maps")
HM

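
The grid behind each heat map can be approximated as a pandas pivot table, which makes the "compare a column" and "compare a row" readings concrete. This is a sketch under stated assumptions: the readings are invented, and the helper columns `day` and `hour` do not exist in the notebook.

```python
import pandas as pd

# Hypothetical readings for one station (invented values).
sample = pd.DataFrame({
    "last_reported": pd.to_datetime([
        "2019-09-30 07:10:00", "2019-09-30 07:40:00",  # Monday, 7 AM
        "2019-09-30 14:05:00",                          # Monday, 2 PM
        "2019-10-04 07:15:00", "2019-10-04 14:20:00",  # Friday
    ]),
    "num_bikes_available": [10, 12, 3, 8, 1],
})
sample["day"] = sample["last_reported"].dt.day_name()
sample["hour"] = sample["last_reported"].dt.hour

# Rows are days, columns are hours, cells hold the average number of bikes.
grid = sample.pivot_table(index="day", columns="hour",
                          values="num_bikes_available", aggfunc="mean")

# Reading one column compares the same hour (here 2 PM) across days.
print(grid[14])
```

Selecting `grid.loc["Monday"]` instead gives the row view, that is, how a single day evolves hour by hour.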

###**3 - WEEKDAY PER STATION**
We were also interested in comparing the available bikes per station depending on the day of the week. As said previously, this task could be carried out with the previous representations, but not easily. Because the temporal unit of the overview is the hour, finding which station has more bikes on average over a day would require comparing all 24 points for the 3 stations (72 points in total) and averaging them, which is highly inefficient. For the same reason, the heat maps are not recommended, as we would have to compare 3 rows, which can be difficult when the values of the cells are similar.

Taking all of this into account, we considered a grouped bar chart the best option for our needs. It lets us compare the daily average bike availability between stations without losing the weekly trend. It also gives the user an exact look-up value for each day/station, which could not be achieved with the previous plots.

We used the spatial region to encode the days of the week and colour to encode the stations. We opted for a Brewer palette for the colour encoding to give equal weight to each station, since we did not want to highlight any of them.
<a name="3"></a>

In [None]:
WPS = alt.Chart(df).mark_bar().encode(
    alt.X('station:N',axis = None),
    alt.Y('average(num_bikes_available):Q',title = 'Average available bikes'),
    color= alt.Color('station:N', legend=alt.Legend(title="Station"),scale = alt.Scale(scheme= 'set2')),
    column=alt.Column('day(last_reported):O',
                      sort=alt.Sort(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']),
                      header = alt.Header(labelOrient="bottom"),
                      title  = "Day"
                      )
)
WPS


###**4 - DAILY BIKES FOR NON-WEEKENDS**
To differentiate between weekend and non-weekend days, we created an auxiliary dataframe with a new column containing a numeric identifier for the day of the week. This lets us filter by specific days of the week.


In [None]:
dfaux = df.copy()  # copy so the original df is left untouched
dfaux["weekday"] = pd.to_datetime(dfaux["last_reported"]).dt.dayofweek  # Monday=0 ... Sunday=6
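
To make the numbering explicit, here is a small sketch of how `dt.dayofweek` labels one full week and how a 0 to 4 range keeps Monday to Friday, the same range that the chart in this section filters on. The `week` and `working_days` names are hypothetical and used only for this example.

```python
import pandas as pd

# 2019-09-30 was a Monday, so these seven dates cover one full week.
week = pd.DataFrame({"last_reported": pd.date_range("2019-09-30", periods=7, freq="D")})
week["weekday"] = week["last_reported"].dt.dayofweek  # Monday=0 ... Sunday=6

# Keep Monday to Friday; between() is inclusive on both ends.
working_days = week[week["weekday"].between(0, 4)]
print(working_days["last_reported"].dt.day_name().tolist())
```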

After building the previous charts, we noticed that comparing two different hours required looking them up for each day. We found this to be a very important task for regular Bicing users. These users follow the same daily schedule during the week (from Monday to Friday) and go to work or study at the same place, so they are interested in the hourly distribution to establish their routine rather than in day-hour combinations, as in the heat map.

Sporadic users would use the heat map to locate a specific moment in the weekly timeline (e.g. *going to the dentist on Wednesday*). Regular users, on the other hand, need to find an hour that suits their whole week. For that reason, we do not take weekends into account, as different activities are usually performed on these days and they would introduce a bias that we do not want. To do so, we used the *weekday* column explained before.

To accomplish this task, we decided to develop a bar chart instead of a line chart because we want to focus on the absolute values on each range instead of observing the connections between them.

We encoded each station as a bar chart plotting, for each hour, the average number of available bikes from Monday to Friday. We also coloured the bars, varying the saturation according to the value, to emphasize the peak values, which are the users' main concern. This encoding helps to fulfil the task preattentively.
<a name="4"></a>







In [None]:
DBNW = alt.Chart(dfaux).mark_bar().encode(
    alt.X('hours(last_reported):O', title = "Hours"),
    alt.Y('average(num_bikes_available):Q', title = 'Average available bikes'),
    column=alt.Column('station:N', title = "Availability of bikes per hour on working days"),
    color=alt.Color('average(num_bikes_available):Q',scale = alt.Scale(scheme= 'blues'),legend = None)
).transform_filter(
    alt.FieldRangePredicate(field='weekday', range=[0, 4])
).properties(width = 325)
DBNW


###**5 - WEEKEND VS NON-WEEKEND**
We will proceed like we did in the last visualization so that we can specify what day of the week we want.

In [None]:
# Splitting between workable and weekend days.
evmaux = overview_melt.copy()  # copy so overview_melt itself is not modified
evmaux["weekday"] = pd.to_datetime(evmaux["last_reported"]).dt.dayofweek
evmaux["is_weekend"] = evmaux["weekday"] > 4  # Saturday=5, Sunday=6
# Change of values and column names (is_weekend); assign back instead of replacing in place.
evmaux["is_weekend"] = evmaux["is_weekend"].map({True: "Weekend", False: "Workable"})
evmaux = evmaux.rename(columns={'is_weekend': 'Week moment'})


As we had seen during the creation of the former visualization, it is logical to think that the week could be split into two groups, workable days and weekends. We thought that applying a comparison between the two groups was interesting, as we may find some differences in their hourly behaviour.

With the previous representations, one can easily compare two different days, but it is hard to compare them as groups. As we want to analyze the hourly distribution for both groups, the third and fourth representations are discarded (the former for being too general and the latter for being specific to workable days). Finally, the overview does not allow this comparison either, as its layout makes it difficult to obtain a group view.

Having decided this classification of the data, we also decided to take into account the type of bicycle to get more insights about the data. For this reason, the use of the heatmap was also discarded, requiring a new visualization.

We opted to use a line chart because it allows us to see the trend evolution for each classification. We used colour to encode the bicycle type and a different stroke type to encode the workable/weekend division.

We split the charts by station for two reasons. On the one hand, it avoids clutter, making the charts easier to interpret. On the other hand, the main objective of this graphic is to compare intra-station behaviour, so this layout suits our needs well (comparison between stations is possible but not prioritized).
<a name="5"></a>

In [None]:
VS = alt.Chart(evmaux).mark_line().encode(
    x=alt.X('hours(last_reported):O', title = "Hours"),
    y=alt.Y('average(Available bikes):Q',title = 'Average available bikes'),
    column=alt.Column('station:N',title='Number of average bikes per type'),
    color=alt.Color('type:N',sort=alt.Sort(['Mechanical','Electrical']),legend=alt.Legend(title="Type")),
    strokeDash=alt.StrokeDash('Week moment',legend=alt.Legend(title='Moment of the week'),sort=alt.Sort(['Workable','Weekend']))
).properties(width = 325)
VS


##**Final visualization**

In [168]:
VSMOD = alt.Chart(evmaux).mark_line().encode(
    x=alt.X('hours(last_reported):O', title = "Hours"),
    y=alt.Y('average(Available bikes):Q',title = 'Average available bikes'),
    column=alt.Column('station:N',title='Number of average bikes per type'),
    color=alt.Color('type:N',sort=alt.Sort(['Mechanical','Electrical']),legend=alt.Legend(title="Type")),
    strokeDash=alt.StrokeDash('Week moment',legend=alt.Legend(title='Moment of the week',orient = "none", legendX= 1112, legendY = 2190),sort=alt.Sort(['Workable','Weekend']))
).properties(width = 350)

OV & HM & WPS & DBNW & VSMOD


To arrange the charts properly, we modified the legend of the last chart. We had to set the exact position of this legend so that it does not distort the general view.

##**Problem resolution**

####Q1: Does a docking station near a train station (or some other communication hub) behave like a station far from transportation sites?
We can answer this question by looking at the [heat maps](#2). *Sants* and *FIB* differ greatly in shape. In *FIB* the bikes arrive at around 7 AM, when students start class and people go to work. These bikes are not taken until 10-11 AM, when students start finishing their classes. In *Sants* the bikes arrive very early in the morning, at around 4-5 AM, probably because a worker refills the station, and they are quickly taken as people go to work. At around 2 PM there is a small peak due to people finishing work, and then people going back home take those bikes. The low number of bikes in *Sants* is explained by it being a crowded place: when a bike is available, there is also someone willing to take it.

&nbsp;
####Q2: Do electrical bicycles behave the same as non-electrical ones?
We can answer this question by looking at the [first](#1) and [last](#5) visualizations. In the first one, we see that in *Sants* and *Sarrià* people mostly use mechanical bikes, but at *FIB* people use the electrical ones. This can be explained by the fact that the *FIB* station is at a high altitude and students must pedal to the top, so they prefer electrical bikes. Although *Sarrià* is also at a high altitude, it is a residential zone, so people only take a bike when they leave home, which is downhill, and then come back by other means. People use mechanical bikes more than electrical ones when the route is not uphill; otherwise they use either electrical bikes or other means of transport.


&nbsp;
####Q3: Does the station suffer from lack of bicycles?
To answer this question, one can look at the [4th](#4) visualization, which shows how the number of bikes varies per hour. In *FIB* and *Sarrià* there are many bikes early in the day, but in the afternoon there are few or none. In *Sants* the station is at a minimum level for most of the day, with two peaks at the start of the morning (5 AM) and the start of the afternoon (2 PM).

For a global perspective, we can also look at the [overview](#1) or the [heat maps](#2). In both cases, the student and residential zones have no bikes during the weekend, while in *Sants* (public transport zone) the number of bikes stays quite low during the week (high demand) and rises again on Saturday and Sunday, when most people do not need to go to work and the bikes are therefore taken at a lower rate.

For a more general look, one could use the [3rd](#3) visualization, which shows that in *Sants* the daily average number of available bikes on working days is always 2 or 3, while in *FIB* and *Sarrià* it ranges from 5 to 7.

&nbsp;
####Q4: Does the behaviour change according to the day of the week?
To answer this question we can use the [3rd](#3) visualization, which has a bar chart for each day of the week. *Sarrià* and *FIB* have a relatively high number of bikes from Monday to Friday, with *FIB* reaching its maximum on Thursday and *Sarrià* on Tuesday and Wednesday. Over the weekend, when people do not go to class or work, they drop to around 1. *Sants* behaves differently: from Monday to Friday the number of bikes stays quite low, growing slowly throughout the week. On Saturday there is a large peak, and on Sunday it halves.

To go deeper, we can use the [heat map](#2) to compare whole days with one another, which shows the difference between Monday to Friday and Saturday and Sunday. We can also use the overview to look for differences between the two bike types: each type (especially the predominant one for each station) behaves as we have seen for the bikes in general.

&nbsp;
####Q5: What is the average number of available bicycles per day?
To answer this question we can use visualization [3](#3), which shows the average number of available bikes per day for each station. As we have seen before, for *FIB* and *Sarrià* the average stays around 6 on workable days and 1 during the weekend, while in *Sants* it is 2-3 during the week and 7 and 4 on Saturday and Sunday respectively.

* *See [Annex](#a2)*.

&nbsp;
####Q6: Does the behaviour change during the weekend?
As we have seen in questions Q4 and Q5, and using visualizations [2](#2) and [3](#3), we can see two different behaviours in our stations. In *FIB* and *Sarrià* the number of bikes drops drastically to almost none during the weekend. *Sants*, on the other hand, presents a different scenario, where the number of bikes is low during the week but increases at the weekend.

Going deeper into every station we can explain the behaviours:
* During the workable days, the *FIB* station is filled with the bikes of the students who go to class and emptied when they leave. During the weekend they do not go to class, so they do not use the station.

* In *Sarrià*, during the workable days the residents take the bikes to go to work and then get back home by a different means of transport, as this station is at a high altitude. The station is filled early in the morning by the Bicing workers and then emptied when residents leave for work. On weekends, as people will not use the bikes, the station is not filled by the company.

* In *Sants*, during the workable days there is a high demand for bicycles, so whenever one is available it is quickly taken. We can see two peaks: one in the early morning, when the workers fill the station, and another in the early afternoon, when people leave work. All this movement during the week reduces the number of bikes available, so on weekends we see a much higher number of bikes. During the weekend the workers likely take bikes from *Sants* to fill other stations, especially on Sunday.

We can make the same comparisons while looking at the types of bikes with visualization [5](#5). We can see how the bikes in *FIB* and *Sarrià* decrease on weekends and how they increase in *Sants*, with the peak displaced to the right (later in the day). The types are not affected differently by the weekend changes, and we can see the dominant type for each station: *FIB* - Electrical and *Sarrià* & *Sants* - Mechanical.

&nbsp;
####Q7: At what hours are we more likely to find bikes on working days?
We can answer this question using the [4th](#4) visualization. This chart shows the average number of bikes per hour on workable days for each station. The colour saturation highlights the larger values, and we can easily see that the peaks are at *FIB*-9AM, *Sants*-5AM, *Sarrià*-3AM. The intervals where we are more likely to find a bike are:
* *FIB*: 7 AM-11 AM
* *Sants*: 5 AM & 2 PM (still low values)
* *Sarrià*: 1 AM-6 AM
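
The peak hours above were read from the chart. As a hedged sketch of how the same look-up could be done numerically, the snippet below picks the hour with the highest average per station. The numbers are invented and the `hourly_avg` table is a hypothetical stand-in for the aggregated working-day data, not something the notebook builds.

```python
import pandas as pd

# Hypothetical hourly averages for working days (invented numbers).
hourly_avg = pd.DataFrame({
    "station": ["FIB", "FIB", "FIB", "Sants", "Sants", "Sants"],
    "hour": [7, 9, 16, 5, 14, 20],
    "avg_bikes": [6.0, 9.5, 1.0, 4.0, 3.5, 1.5],
})

# Row label of the maximum average within each station, then look those rows up.
peak_rows = hourly_avg.loc[hourly_avg.groupby("station")["avg_bikes"].idxmax()]
peaks = peak_rows.set_index("station")["hour"].to_dict()
print(peaks)
```

Note that `idxmax` returns a single hour per station, so it cannot express intervals such as the ones listed above; those still need a threshold or a visual reading.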

&nbsp;
####Q8: Does a docking station in a residential area behave like a station located in a non-residential one?
To answer this question we can look at the [heat map](#2). In *Sarrià* the bikes are taken early in the morning, when people leave home; we would then expect them to come back, but as this station is uphill, people prefer other means of transport. In the other two stations, the bikes follow the usage patterns of the facilities: in *FIB* there are bikes while there are students, and in *Sants* more dynamic traffic is seen. To see the distribution more clearly we can use the fourth visualization, where the peak in *Sarrià* is earlier than in *FIB*, as residents take the bikes to go to work or study (in places like *FIB*) and therefore move the bikes from their residential station to student/work areas.

&nbsp;
####Q9: What type of bike do the users prefer?
Using the [last](#5) visualization we can see how the different types of bikes are used. Each station behaves differently:
* In *FIB* both mechanical and electrical bikes fill up at the same time, but the mechanical ones are emptied first, although there are fewer of them.
* In *Sants* the number of bikes is quite low and users show no preference: as there are not many bikes, any option is good.
* In *Sarrià* the behaviour is similar to that in *FIB*. At the morning peak, the mechanical bikes are taken before the electrical ones.

This can be explained because:
1. Using an electrical bike is more expensive than a mechanical one (users have to pay an extra fee).
2. Bicing workers seem to refill more mechanical bikes than electrical bikes (we can see it at the peak on visualization [4](#4) for *Sarrià*).
3. The benefits of having an electrical bike vanish when a downhill route is made, like leaving *Sarrià* or *FIB*, so the extra fee isn't worth it.

Overall, users prefer mechanical bikes, but in some specific cases (like *FIB*: uphill, not many alternatives) they prefer electrical ones.

---
##**ANNEX**
<a name="a1"></a>
####**A.1**

In [None]:
alt.Chart(overview_melt).mark_line().transform_timeunit(
    datetime_day = "day(last_reported)"
).transform_calculate(
    day_number = "day(datum.last_reported)", 
    day_name = "split(datum.datetime_day,' ')[0]",
    day_hour = "(datum.day_name) + ' ' + hours(datum.last_reported) + ':00'",
).encode(
     alt.X('day_hour:O', sort=alt.SortField(field="day_number"),title = 'Hour-Day'),
     alt.Y('average(Available bikes):Q',title = 'Average available bikes'),
     order="day_number:Q",
     color=alt.Color('station:N', legend=alt.Legend(title="Station"),scale = alt.Scale(scheme= 'set2')),
     strokeDash='type:N'
).properties(width = 1000, title='Overview')


<a name="a2"></a>
####**A.2**
This question was reformulated.

The original was: *What is the average of bicycles taken per day?*

After talking to the coordinator of this project, we found it convenient to clarify the aim of the question. The original question asks for the number of bikes taken per day, but this metric is impossible to compute accurately due to a limitation of the dataset itself (e.g. *one bike could be taken in the same interval in which another one is docked, so the information is lost*).
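
The limitation can be illustrated with a tiny, entirely hypothetical example: the dataset only stores snapshots of the number of available bikes, so any estimate of bikes taken has to be derived from the drops between consecutive snapshots. The `events` list below is invented ground truth that the real dataset does not contain.

```python
# Snapshots of "bikes available" at consecutive reports (hypothetical values).
snapshots = [5, 5, 3, 4]

# Invented ground truth between reports: (bikes taken, bikes docked).
events = [(1, 1), (2, 0), (0, 1)]

# The only estimate available from snapshots: add up each drop between reports.
observed_taken = sum(max(prev - cur, 0) for prev, cur in zip(snapshots, snapshots[1:]))
true_taken = sum(taken for taken, _ in events)

print(observed_taken, true_taken)
```

In the first interval one bike is taken and another is docked, so the snapshot does not change and that withdrawal is invisible. The snapshot-based figure is therefore only a lower bound on the bikes actually taken.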