# **INTERACTION**
## *Marc Fuentes and Víctor Novelle*

*December 2020*

##**Environment preparation**
To work with the dataset, create the visualizations and implement the different interaction techniques, we first need to prepare the working environment.

In [112]:
# Necessary libraries for code execution.
import pandas as pd
import altair as alt
import datetime

In [113]:
# Mounting Google Drive as the workspace, so the files can be used as if they were local.
from google.colab import drive
drive.mount('/content/gdrive',force_remount= True)

Mounted at /content/gdrive


We disabled Altair's maximum number of rows to avoid possible errors, even though this dataframe is not particularly large.

In [114]:
# Disabling the maximum rows option to avoid problems when using Altair.
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

##**Dataset loading**
We can now load the file produced by the *Preprocessing* script.

In [115]:
# Load of the clean data.
df = pd.read_csv('/content/gdrive/My Drive/GCED/Q5/VI/Projecte 2/CleanData.csv')
df.head()

Unnamed: 0,station_id,day,weekday,day_month,available_bikes,mechanical,electrical,num_docks_available,name,lat,lon,altitude,address,post_code,capacity,Latlon,Barri,Districte
0,1,2019-10-01,2,1,12.533101,12.12892,0.404181,14.937282,"GRAN VIA CORTS CATALANES, 760",41.397978,2.180107,16.0,"GRAN VIA CORTS CATALANES, 760",8013.0,46.0,"41.3979779,2.1801068999999997",el Fort Pienc,Eixample
1,1,2019-10-02,3,2,13.615917,13.439446,0.176471,13.079585,"GRAN VIA CORTS CATALANES, 760",41.397978,2.180107,16.0,"GRAN VIA CORTS CATALANES, 760",8013.0,46.0,"41.3979779,2.1801068999999997",el Fort Pienc,Eixample
2,1,2019-10-03,4,3,14.159722,13.902778,0.256944,13.159722,"GRAN VIA CORTS CATALANES, 760",41.397978,2.180107,16.0,"GRAN VIA CORTS CATALANES, 760",8013.0,46.0,"41.3979779,2.1801068999999997",el Fort Pienc,Eixample
3,1,2019-10-04,5,4,12.836806,12.440972,0.395833,16.677083,"GRAN VIA CORTS CATALANES, 760",41.397978,2.180107,16.0,"GRAN VIA CORTS CATALANES, 760",8013.0,46.0,"41.3979779,2.1801068999999997",el Fort Pienc,Eixample
4,1,2019-10-05,6,5,18.298611,18.0625,0.236111,9.916667,"GRAN VIA CORTS CATALANES, 760",41.397978,2.180107,16.0,"GRAN VIA CORTS CATALANES, 760",8013.0,46.0,"41.3979779,2.1801068999999997",el Fort Pienc,Eixample


In this dataset we have information about all *Bicing®* stations. Each row holds the average metrics of one station for one day of October 2019.

Specifically, we have the average number of available bikes (in total and split into mechanical and electrical) and the number of available docks. Moreover, each row includes the geospatial information of the respective station.



###**1 - OVERVIEW**

In this first view, which combines a choropleth and a symbol map, we had two main purposes. On the one hand, we wanted to show the number of available bikes per district in a format that lets the user extract this information at a glance. On the other hand, we wanted to establish the map as an important element for interaction, allowing the user to select several districts and explore their attributes further in the following plots.


To explain this visualization in more detail, we split its analysis into two parts:

* **VISUAL REPRESENTATION**

As previously said, this representation consists of a choropleth map. In it, the average number of available bikes (of the selected type) is encoded using the colour channel, applied uniformly to the district it belongs to. We also decided to overlay a symbol map layer on the choropleth.

In this layer, the colour of the circles (the shape selected for the symbols) encodes the district name. Its purpose is to let the user identify the districts without needing to hover over them (as we will explain in *Interaction*). The identification is performed by matching the circle colour to the district name in the interactive legend provided in the visualization.

&nbsp;
* **INTERACTION**

In this plot, we can distinguish three interactive objects:

- The *Kind* selector: a simple legend that lets the user select the type of bicycle to retrieve information about.

- The *district legend*: a legend made of squares that allows selecting one or multiple districts.

- The *Map* itself: it also allows the selection of one or multiple districts.

\
The *Kind* selector behaves as a non-spatial filter, modifying the values taken into account by the choropleth. 

Selecting districts highlights them by increasing the thickness of their borders and making unselected regions more transparent.

Finally, when hovering over a district, its name and the exact value of the average shown are provided.


---
The first step to create this visualization consists of loading the map data.

In [116]:
# Loading the BCN districts map.
dataGeo = 'https://raw.githubusercontent.com/martgnz/bcn-geodata/master/districtes/districtes.geojson'
map_data = alt.Data(url=dataGeo, format=alt.DataFormat(property='features',type='json'))

To compute the average availability metrics per district and to allow the user to filter by the bike type, an auxiliary dataframe was created.

To do so, a groupby operation with the mean function was performed, as well as a long-form transformation of the data set. We also renamed the *Type* values to make them easier to understand.

For the symbol placement, another auxiliary dataframe was computed, containing the centroid of the stations per district.

\
*Note: Although these operations could have been done using Altair's aggregate and fold transforms, we decided to store the dataframes because they will be used again later, avoiding code repetition.*

In [117]:
# Auxiliary dataframes

# Computing the average metrics. The numeric columns are selected before
# averaging, so non-numeric columns (name, address...) never reach mean().
aux_avg_df = df.groupby("Districte")[["available_bikes","mechanical","electrical"]].mean()
aux_avg_df = aux_avg_df.reset_index()

# Long-form conversion
aux_avg_df = pd.melt(aux_avg_df, id_vars=['Districte'], var_name='Type', value_name='Average')
# Assigning the result back avoids relying on an in-place replace of a column.
aux_avg_df['Type'] = aux_avg_df['Type'].replace({"available_bikes":"Total","mechanical":"Mechanical","electrical":"Electrical"})

# Centroid auxiliary dataframe
centroids = df.groupby("Districte")[["lat","lon"]].mean()
centroids = centroids.reset_index()

In [118]:
aux_avg_df.head()

Unnamed: 0,Districte,Type,Average
0,Ciutat Vella,Total,14.450508
1,Eixample,Total,8.049021
2,Gràcia,Total,4.339049
3,Horta-Guinardó,Total,6.120688
4,Les Corts,Total,3.256274


In [119]:
centroids.head()

Unnamed: 0,Districte,lat,lon
0,Ciutat Vella,41.381461,2.180608
1,Eixample,41.390028,2.164306
2,Gràcia,41.404566,2.160486
3,Horta-Guinardó,41.416406,2.169837
4,Les Corts,41.385446,2.126571
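The reshaping used above can be seen more easily on a toy example. The following is only a sketch: the `toy` dataframe and its values are invented, and only the column names follow the cleaned data.

```python
import pandas as pd

# Toy stand-in for the cleaned data: two districts, two daily rows each.
# The numbers are invented for illustration only.
toy = pd.DataFrame({
    "Districte": ["A", "A", "B", "B"],
    "available_bikes": [10.0, 14.0, 4.0, 6.0],
    "mechanical": [9.0, 13.0, 3.0, 5.0],
    "electrical": [1.0, 1.0, 1.0, 1.0],
    "lat": [41.38, 41.40, 41.42, 41.44],
    "lon": [2.16, 2.18, 2.12, 2.14],
})

metrics = ["available_bikes", "mechanical", "electrical"]

# Wide form: one row per district, one column per metric.
wide = toy.groupby("Districte")[metrics].mean().reset_index()

# Long form: one row per (district, type) pair, which is the shape
# a filter on 'Type' needs.
long = wide.melt(id_vars="Districte", var_name="Type", value_name="Average")
long["Type"] = long["Type"].replace(
    {"available_bikes": "Total", "mechanical": "Mechanical", "electrical": "Electrical"}
)

# Centroid per district: mean latitude and longitude of its rows.
toy_centroids = toy.groupby("Districte")[["lat", "lon"]].mean().reset_index()

print(long)
print(toy_centroids)
```

Each district ends up with three rows in `long`, one per kind of bike, so filtering on a single `Type` value leaves exactly one average per district.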


The next step after the data conversion consisted of creating different interactive techniques. 

For the type of bikes, we used a single selection with the nearest option enabled, so that multiple or empty choices are not allowed, since conceptually they don't make sense.

Regarding district selection, it is multiple but does not include the nearest option. This is mainly to allow the user to click off the map and select all districts quickly.
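To make the effect of an empty selection concrete, the following sketch models in plain Python how a selection predicate could treat each row. It is a simplified illustration under our own assumptions, not Altair's actual implementation; `is_selected` is a hypothetical helper.

```python
def is_selected(row, selected, fields, empty="all"):
    """Simplified model of how a selection predicate treats one data row.

    `selected` is a list of dicts, one per clicked item, for example
    [{"Districte": "Les Corts"}]. Illustration only, not Altair's code.
    """
    if not selected:
        # Nothing clicked: 'all' treats every row as selected, 'none' treats none.
        return empty == "all"
    return any(all(row[f] == choice[f] for f in fields) for choice in selected)


rows = [{"Districte": "Les Corts"}, {"Districte": "Eixample"}]

# Initial state: only 'Les Corts' is highlighted.
initial = [{"Districte": "Les Corts"}]
print([is_selected(r, initial, ["Districte"]) for r in rows])

# After clicking off the map the selection is empty, so every district matches.
print([is_selected(r, [], ["Districte"]) for r in rows])
```

With `empty="none"` instead, an empty selection would match no row at all, which is why that setting is better suited to selectors that must always point at exactly one item.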

In [120]:
# District selection 
map_multi = alt.selection_multi(fields=['Districte'], init = [{'Districte':'Les Corts'}], empty = 'all')

# Kind of bike selection
type_select = alt.selection_single(fields=['Type'], init = {'Type':'Total'}, nearest = True)

# Auxiliary condition for district color encoding
district_color = alt.condition(map_multi,alt.Color('Districte:N',legend = None,scale = alt.Scale(scheme= 'category10')),alt.value('lightgray'))

Once all the necessary resources for the visualization were properly encoded, we proceeded to elaborate the plot.
<a name="1"></a>

In [121]:
# Choropleth
coro = alt.Chart(aux_avg_df).mark_geoshape(stroke = 'black').properties(
    title = "Average number of bikes per district"
).encode(
    color = alt.Color('Average:Q',scale = alt.Scale(scheme= 'yelloworangered')),
    strokeWidth = alt.condition(map_multi,alt.value(3),alt.value(1)),
    opacity = alt.condition(map_multi,alt.value(1.0),alt.value(0.3)),
    tooltip = ['Districte', 'Average']
).transform_lookup(
    lookup = 'Districte',
    from_ = alt.LookupData(map_data,'properties.NOM',fields=["type", "properties", "geometry"])
).transform_filter(
    type_select
).add_selection(
    map_multi
).properties(width = 750, height = 750)


# District symbols
districts = alt.Chart(centroids).mark_circle(size = 300, stroke = "Black").encode(
    opacity = alt.condition(map_multi,alt.value(1),alt.value(0.5)),
    strokeWidth = alt.condition(map_multi,alt.value(3),alt.value(1)),
    longitude="lon:Q",
    latitude="lat:Q",
    #size = 4,
    fill = alt.Color('Districte:N',scale = alt.Scale(scheme= 'category10'), legend = None),
)


# Buttons for type of bike selection
type_butt =  alt.Chart(aux_avg_df).mark_circle().encode(
    y = alt.Y('Type:N',axis = alt.Axis(orient = 'right', title = None),sort = alt.Sort(['Total','Mechanical','Electrical'])),
    color = alt.condition(type_select, alt.value('black'),alt.value('lightgray'))
).add_selection(
    type_select
).properties(title = 'Kind')


# Legend for district selection
legend = alt.Chart(df).mark_rect().encode(
    y = alt.Y('Districte',axis = alt.Axis(orient = 'right', title = None)),
    color = district_color
).add_selection(
    map_multi
).properties(title = 'District')


def_coro = coro + districts

(type_butt | def_coro |legend).resolve_scale(
    x = 'independent',
    y = 'independent'
)


###**2 - BUBBLE CHART**

In this view, the average number of mechanical/electrical bikes and the number of stations per district are encoded using a bubble chart.

* **VISUAL REPRESENTATION**

In this plot, the average number of mechanical bikes is encoded using the X-axis, and the average number of electrical bikes is encoded using the Y-axis.
Using the colour channel, the district name is encoded as in the previous representations. Finally, the number of stations per district is encoded using the size of the marks.

To prevent the scales from changing when the selected districts or the day of the month are modified through interaction, their domains have been fixed (after inspecting the respective maximum ranges).
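One way to inspect those ranges before fixing the domains is to reproduce the chart's aggregation in pandas and look at the maxima. This is a sketch on an invented `sample` dataframe; only the column names follow the cleaned data.

```python
import pandas as pd

# Invented sample in the shape of the cleaned data (one row per station and day).
sample = pd.DataFrame({
    "Districte": ["A", "A", "A", "B", "B", "B"],
    "station_id": [1, 2, 1, 3, 3, 4],
    "day_month": [1, 1, 2, 1, 2, 2],
    "mechanical": [10.0, 14.0, 8.0, 4.0, 6.0, 2.0],
    "electrical": [1.0, 2.0, 0.5, 0.2, 0.4, 0.6],
})

# Same aggregation as the chart: one bubble per district and day.
bubbles = (
    sample.groupby(["Districte", "day_month"])
    .agg(
        avg_mechanical=("mechanical", "mean"),
        avg_electrical=("electrical", "mean"),
        Dist_count=("station_id", "nunique"),
    )
    .reset_index()
)

# The largest aggregated values tell us how wide each fixed domain has to be.
print(bubbles[["avg_mechanical", "avg_electrical", "Dist_count"]].max())
```

Any fixed domain whose upper bound is at least these maxima keeps every bubble inside the plot, whatever the slider and district selection are.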

\
* **INTERACTION**

In this plot, there are also three interactive objects:

- The *Day of the Month* selector: a slider that lets the user select the day of the month to be shown on the display, together with the previous days.

- The *district legend*.

- The *Bubbles* of the chart: as in the choropleth and the legend, they allow multiple district selections by clicking on the desired bubbles.

\
Selecting districts applies a light gray colour to the non-selected districts while keeping the original colour of the selected ones.

The *Day of the Month* slider has two functions. On the one hand, it behaves as a non-spatial filter, only showing the bubbles for the days less than or equal to the chosen value. On the other hand, it keeps the bubbles of the selected day opaque while making those of earlier days translucent, leaving a trail.

Finally, an auxiliary tooltip layer was implemented. The main reason for this is to provide the details-on-demand only for the selected values, instead of showing them when hovering over any bubble.

This helps the user to visualize only the information of the district/day of interest, and it eases the task of hovering over a specific bubble, which becomes difficult in dense regions without this solution.
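The three rules described above (filter, trail and restricted tooltip) can be sketched with pandas on a few invented rows. The values of `selected_day` and `selected_districts` are assumptions standing in for the state of the slider and the legend.

```python
import pandas as pd

# Invented aggregated values for one district over four days.
trail = pd.DataFrame({
    "Districte": ["A"] * 4,
    "day_month": [1, 2, 3, 4],
    "avg_mechanical": [5.0, 6.0, 7.0, 8.0],
})

selected_day = 3            # value the slider would hold (assumed)
selected_districts = {"A"}  # districts picked in the legend (assumed)

# Filter: keep only the days up to and including the slider value.
visible = trail[trail["day_month"] <= selected_day].copy()

# Opacity: the selected day is fully opaque, earlier days form the trail.
visible["opacity"] = visible["day_month"].eq(selected_day).map({True: 1.0, False: 0.3})

# Tooltip layer: only rows matching both the day and the district selection.
hoverable = visible[
    visible["day_month"].eq(selected_day)
    & visible["Districte"].isin(selected_districts)
]

print(visible)
print(hoverable)
```

Only the rows in `hoverable` would respond to the mouse, which is what keeps the details-on-demand limited to the current selection.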
<a name="2"></a>

In [122]:
# Tooltip selector
tooltips = alt.selection_single(on = 'mouseover',empty = 'none') 

In [123]:
# Slider/selector for day of the month
day_month_slider = alt.binding_range(max=31, min = 1, step = 1, name = 'Day of the month')

day_month_selection = alt.selection_single(fields=['day_month'], bind = day_month_slider, init = {'day_month':1})

In [124]:
# Bubble chart (split into two charts to reuse the base for the tooltip)
base = alt.Chart(df).mark_circle().transform_aggregate(
    avg_mechanical="mean(mechanical)",
    avg_electrical="mean(electrical)",
    Dist_count="distinct(station_id)",
    groupby=['Districte','day_month']
).encode(
    x = alt.X("avg_mechanical:Q", title = "Average Mechanical Bikes",scale=alt.Scale(domain=[0,20])),
    y = alt.Y("avg_electrical:Q", title = "Average Electrical Bikes", scale=alt.Scale(domain=[0,2.5])),
    size = alt.Size("Dist_count:Q",scale=alt.Scale(domain=[0,100]), title = "Number of stations"),
)

bubble = base.encode(
    color = district_color,
    opacity = alt.condition(day_month_selection,alt.value(1.0),alt.value(0.3)),
).add_selection(
    day_month_selection,
    map_multi
).transform_filter(
   alt.datum.day_month <= day_month_selection.day_month)

# Auxiliary layer for the tooltip (hovering only works on the selection)
tooltip_aux = base.encode(
     opacity=alt.value(0),
    tooltip=[alt.Tooltip('Dist_count:Q', title ="Number of stations"),alt.Tooltip('avg_mechanical:Q', title = "Average mechanical"), alt.Tooltip('avg_electrical:Q',title = "Average electrical")]
  ).transform_filter(
  day_month_selection & map_multi
).add_selection(tooltips)

def_bubble = (bubble+tooltip_aux).properties(width = 600, height = 350, title = "Bubble chart")
legend | def_bubble 


###**3 - SCATTER PLOT**

After generating the previous visualizations, we proceeded to develop representations with a higher granularity, selecting the district as an atomic entity.

This allows the user to solve tasks involving intra-district and inter-district behaviour comparison more easily. Even though the previous graph lets us compare the number of stations per district, the user cannot, for example, see whether all the stations behave similarly or whether, on the contrary, several operating patterns are present.

To provide this knowledge, a scatter plot visualization was created.

* **VISUAL REPRESENTATION**

In this plot, the average number of mechanical/electrical bikes is encoded using the 2-D position of the mark. Note that each point of this visualization represents a station.

Using the colour channel, the district name is encoded as in the previous representations. 

To avoid the change of scales on the different working days, we also fixed the X and Y axis domains, allowing direct comparison between the juxtaposed plots.

\
* **INTERACTION**

In this plot, there are two interactive objects:

- The *district legend*.

- The *points* of the chart, which behave like the district legend.

\
Selecting districts applies a light gray colour to the non-selected districts while keeping the original colour of the selected ones.


In this plot, we decided not to include a tooltip. The reasoning behind this decision is that, since the stations of the same district generally show similar behaviour, selecting a specific point is difficult. Moreover, this visualization was designed to provide a general description of the internal district activity and not to evaluate each station specifically.
<a name="3"></a>

In [125]:
# Creating a scatter plot (SP) with all the values.
sp = alt.Chart(df).mark_circle().transform_aggregate(
    avg_mech = 'mean(mechanical)',
    avg_elec = 'mean(electrical)',
    groupby = ['station_id','weekday','Districte']
).encode(
    x= alt.X('avg_mech:Q', title = "Average Mechanical Bikes", scale=alt.Scale(domain=[0,35])),
    y= alt.Y('avg_elec:Q', title = "Average Electrical Bikes", scale = alt.Scale(domain = [0,6])),
    color=district_color
).add_selection(
    map_multi
).properties(width=300, height=300)

# Obtaining the SPs per day and linking them.
sp_1=sp.transform_filter(alt.datum.weekday == 1).properties(title='Monday')
sp_2=sp.transform_filter(alt.datum.weekday == 2).properties(title='Tuesday')
sp_3=sp.transform_filter(alt.datum.weekday == 3).properties(title='Wednesday')
sp_4=sp.transform_filter(alt.datum.weekday == 4).properties(title='Thursday')
sp_5=sp.transform_filter(alt.datum.weekday == 5).properties(title='Friday')

f_sp = sp_1 | sp_2 | sp_3| sp_4 | sp_5

legend | f_sp


###**4 - HISTOGRAM**

To continue this more in-depth analysis of the districts, we considered that a visualization showing the distribution of stations according to the number of available bikes would be an interesting addition, allowing the user to delve into the information provided by the [scatter plots](#3).

To fulfil this task, we considered that a histogram was the best option.

* **VISUAL REPRESENTATION**

In this plot, the X-axis encodes the average number of available bikes (of the selected type), turning this variable into an ordinal one by binning.

The length of each bar represents the number of stations that fall into that range of values.

Finally, the district name is encoded using the colour channel as in the previous cases.

\
* **INTERACTION**

In this plot, we can see two interactions:

- The *district legend*.

- The *Kind* selector.

\

The *Kind* selection filters the type of bikes taken into account to compute the bins.

The district selection shows the histogram for the chosen district. As it is a multiple selector, if several districts are picked, a stacked bar chart is shown.


In this plot, we wanted to apply two different fixed scales, one for the total/mechanical bikes and the other for electrical bikes. This way, we could establish the same bin partition for all the districts, making the task of comparing histograms much easier. However, applying a condition to the scale is not currently supported by *Altair*. As a consequence, the binning partition cannot be fixed, because a partition that suits the mechanical bikes would compress the electrical bikes into one or two bars.
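The compression problem can be illustrated outside Altair. The sketch below uses invented per-station averages and hypothetical bin edges (`edges`) to count stations per bin with `pd.cut`, once with a partition suited to electrical bikes and once with the mechanical one.

```python
import pandas as pd

# Invented per-station averages; mechanical values span a much wider range.
stations = pd.DataFrame({
    "Type": ["Mechanical"] * 4 + ["Electrical"] * 4,
    "Average": [2.0, 9.0, 17.0, 28.0, 0.2, 0.8, 1.4, 2.6],
})

# Hypothetical fixed bin edges, one partition per kind of bike.
edges = {
    "Mechanical": [0, 5, 10, 15, 20, 25, 30],
    "Electrical": [0, 0.5, 1, 1.5, 2, 2.5, 3],
}

def count_per_bin(values, bin_edges):
    """Count how many values fall into each fixed interval."""
    return pd.cut(values, bins=bin_edges).value_counts(sort=False)

electrical = stations.loc[stations["Type"] == "Electrical", "Average"]

# With its own partition the electrical data spreads over several bars...
own = count_per_bin(electrical, edges["Electrical"])
# ...but with the mechanical partition it collapses into a single bar.
shared = count_per_bin(electrical, edges["Mechanical"])

print(own)
print(shared)
```

Precomputing the bins in pandas with one partition per type, as in this sketch, would be one possible workaround, at the cost of moving the binning out of the chart specification.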


---
To encode this histogram, we created an auxiliary dataframe similar to the one used in the [overview](#1). Here, the groupby operation is performed per station instead of per district. A long-form conversion was also applied to this auxiliary data.
<a name="4"></a>


In [126]:
# Auxiliary dataframe that computes the average per station.
# The numeric columns are selected before averaging, so non-numeric columns never reach mean().
aux_avg_station_df = df.groupby(["Districte","station_id"])[["available_bikes","mechanical","electrical"]].mean()
aux_avg_station_df = aux_avg_station_df.reset_index()

# Long-form conversion
aux_avg_station_df = pd.melt(aux_avg_station_df, id_vars=['Districte','station_id'], var_name='Type', value_name='Average')
aux_avg_station_df['Type'] = aux_avg_station_df['Type'].replace({"available_bikes":"Total","mechanical":"Mechanical","electrical":"Electrical"})

In [127]:
# Histogram (stacked if  multiple selections)
hist = alt.Chart(aux_avg_station_df).mark_bar().encode(
    x = alt.X("Average:Q", bin = True),
    y = alt.Y("count():Q",title = "Number of stations"),
    color = district_color
).transform_filter(
    type_select 
).transform_filter(
    map_multi
).properties(width = 600)

type_butt | hist |legend


###**5 - FINAL VISUALIZATION**

To arrange the charts properly, the legend of the [bubble chart](#2) has been modified. We had to set the exact position of this legend so that it does not distort the general view.

In [128]:
base_mod = alt.Chart(df).mark_circle().transform_aggregate(
    avg_mechanical="mean(mechanical)",
    avg_electrical="mean(electrical)",
    Dist_count="distinct(station_id)",
    groupby=['Districte','day_month']
).encode(
    x = alt.X("avg_mechanical:Q", title = "Average Mechanical Bikes",scale=alt.Scale(domain=[0,20])),
    y = alt.Y("avg_electrical:Q", title = "Average Electrical Bikes", scale=alt.Scale(domain=[0,2.5])),
    size = alt.Size("Dist_count:Q",scale=alt.Scale(domain=[0,100]), title = "Number of stations", legend = alt.Legend(orient = "none",legendX= 2433, legendY= 0)),
)

bubble_mod = base_mod.encode(
    color = district_color,
    opacity = alt.condition(day_month_selection,alt.value(1.0),alt.value(0.3)),
).add_selection(
    day_month_selection,
    map_multi
).transform_filter(
   alt.datum.day_month <= day_month_selection.day_month)

# Auxiliary layer for the tooltip (hovering only works on the selection)
tooltip_aux_mod = base_mod.encode(
     opacity=alt.value(0),
    tooltip=[alt.Tooltip('Dist_count:Q', title ="Number of stations"),alt.Tooltip('avg_mechanical:Q', title = "Average mechanical"), alt.Tooltip('avg_electrical:Q',title = "Average electrical")]
  ).transform_filter(
  day_month_selection & map_multi
).add_selection(tooltips)

def_bubble_mod = (bubble_mod + tooltip_aux_mod).properties(width = 600)

In [129]:
coro_type = (type_butt | def_coro).resolve_scale(
    x = 'independent',
    y = 'independent'
)
coro_type | ((legend | hist | def_bubble_mod) & f_sp)


---
#**Extra visualizations**

In addition to encoding the visualizations requested by the project coordinator, we have also elaborated some extra representations to further analyze different aspects of our data, as well as to gain deeper knowledge of interaction techniques.

###**E1 - GRADUATED SYMBOL MAP**

This first extra view has the same purpose as the overview in the mandatory section. Here, instead of using a choropleth, we decided to encode our data using a graduated symbol map, providing the same information as before in a different format.

* **VISUAL REPRESENTATION**

This representation consists of a graduated symbol map. In it, the average number of available bikes (of the selected type) is encoded using the size of the points. Each district is filled with the same colour it has in the legend, to make it easier for the user to identify the districts.

* **INTERACTION**

In this plot, we can distinguish three interactive objects:

- The *Kind* selector.
- The *district legend*: This selector was modified to allow only a single selection. This was done because the following extra plots make no sense when performing a multi-district analysis.
- The *Map* itself: like the new legend, it only allows selecting one district.

\
 
The district selection implies highlighting the region, increasing the thickness of its border and making unselected regions more transparent. These effects are also applied to the graduated symbols.

Finally, hovering is now performed over the symbols, to be coherent with the chosen visualization.


---
In this part, we encoded the new single selectors.
<a name="E1"></a>

In [130]:
# District selection (single)
map_single = alt.selection_single(fields=['Districte'], init = {'Districte':'Les Corts'},empty='none')

# Auxiliary condition for single district color encoding
district_color_single = alt.condition(map_single,alt.Color('Districte:N',legend = None,scale = alt.Scale(scheme= 'category10')),alt.value('lightgray'))

In [131]:
# Background map (encoded with district colors)
map = alt.Chart(aux_avg_df).mark_geoshape(stroke = 'black').properties(
    title = "Average number of bikes per district"
).encode(
    color = alt.Color('Districte:N',scale = alt.Scale(scheme= 'category10'), legend = None),
    strokeWidth = alt.condition(map_single,alt.value(3),alt.value(1)),
    opacity = alt.condition(map_single,alt.value(1),alt.value(0.075)),
).transform_lookup(
    lookup = 'Districte',
    from_ = alt.LookupData(map_data,'properties.NOM',fields=["type", "properties", "geometry"])
).add_selection(
    map_single
).properties(width = 750, height = 750)


# Symbol encoding
points = alt.Chart(aux_avg_df).transform_lookup(
    lookup = 'Districte',
    from_ = alt.LookupData(centroids,'Districte',fields=["lat", "lon"])
).mark_circle(color ="black").encode(
    opacity = alt.condition(map_single,alt.value(1),alt.value(0.3)),
    longitude="lon:Q",
    latitude="lat:Q",
    size=alt.Size("Average:Q", legend = None),
    tooltip = ['Districte', 'Average']
).transform_filter(
     type_select 
).add_selection(tooltips)


# Legend for single district selection
single_legend = alt.Chart(df).mark_rect().encode(
    y = alt.Y('Districte',axis = alt.Axis(orient = 'right', title = None)),
    color = district_color_single
).add_selection(
    map_single
).properties(title = 'District')


gsm = (type_butt|map + points).resolve_scale(
    x = 'independent',
    y = 'independent'
)

gsm | single_legend


###**E2 - HEATMAP**
As we are showing data extracted from one month, we thought a calendar-shaped view would be more familiar to the user. With it, the user can compare the data for the whole month, for a week, or for a specific day of the week.
- **VISUAL REPRESENTATION**

We want to show the bike availability for each type and district during the month in a calendar shape. Therefore, we use a chart with the weekday and the week of the month as the X- and Y-axis respectively, which gives the chart the shape of a calendar.\
As we want to show the number of bikes, we encode it with colour to show its evolution better (rather than showing the numerical values). This leaves us with a heatmap.

\
- **INTERACTION**

In this plot there are three interactive elements:

- The *district* legend.
- The *type* selector.
- The day and value tooltip.

\
To show the values for each bike type and district, we decided to reuse the two selectors defined for the graduated symbol map: one to select a district and the other to select the bike type (or the total).\
We also added a tooltip showing the exact value and the day of the month that each box represents, to allow more specific queries.

---
Plotting the desired chart requires two previous steps.

First, with the help of an auxiliary dataframe, we add a column with the week of the month for each row, which is needed to lay out the calendar. We also map the weekday numbers to the corresponding day names to ease the reading of the data.

Then, in the auxiliary dataframe, a groupby operation with the mean function was performed, as well as a long-form transformation of the data set. We also renamed the *Type* values to make them easier to understand.
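As a side note on the week-of-the-month column, the following sketch shows a general rule for the calendar row of a date, using only the standard library. The helper `week_of_month` is hypothetical and not part of the notebook; it is compared against the integer-division shortcut `day // 7 + 1`.

```python
import datetime

def week_of_month(date):
    """Row of a Monday-first calendar (1-based) on which `date` is drawn."""
    first = date.replace(day=1)
    # first.weekday() is 0 for Monday, so it counts the empty cells
    # that precede day 1 on the first calendar row.
    return (date.day - 1 + first.weekday()) // 7 + 1

october = [datetime.date(2019, 10, d) for d in range(1, 32)]

# October 2019 starts on a Tuesday, so the general rule and the
# shortcut day // 7 + 1 give the same rows for every day of the month.
print(all(week_of_month(d) == d.day // 7 + 1 for d in october))

# November 2019 starts on a Friday: Monday the 4th already opens the
# second calendar row, which the shortcut would not detect.
print(week_of_month(datetime.date(2019, 11, 4)), 4 // 7 + 1)
```

The shortcut is therefore sufficient for this dataset, but a month starting on a different weekday would need the offset of its first day.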
<a name="E2"></a>

In [132]:
# Auxiliary dataframe: weekday / week of the month / day of the month averages.

# Work on a copy so the original df (still used by other charts) keeps its numeric weekday.
day_weekday = df.copy()
# Calendar row of each day. This shortcut matches the calendar because October 2019 starts on a Tuesday.
day_weekday['week'] = (day_weekday['day_month'] // 7) + 1

# Renaming the days of the week and building a "weekday, day" label as a new variable.
day_weekday['weekday'] = day_weekday['weekday'].replace({1:'Monday', 2:'Tuesday', 3:'Wednesday', 4:'Thursday',5:'Friday', 6:'Saturday', 7:'Sunday'})
day_weekday['date'] = day_weekday['weekday'] + ", " + day_weekday['day_month'].astype(str)

# Computing the average values.
day_weekday = day_weekday.groupby(["Districte","weekday","week","date"])[["available_bikes","mechanical","electrical"]].mean()
day_weekday = day_weekday.reset_index()

# Long-form conversion
day_weekday = pd.melt(day_weekday, id_vars=['Districte','weekday','week','date'], var_name='Type', value_name='Average')
day_weekday['Type'] = day_weekday['Type'].replace({"available_bikes":"Total","mechanical":"Mechanical","electrical":"Electrical"})

day_weekday.head()

Unnamed: 0,Districte,weekday,week,date,Type,Average
0,Ciutat Vella,Friday,1,"Friday, 4",Total,13.991604
1,Ciutat Vella,Friday,2,"Friday, 11",Total,15.849324
2,Ciutat Vella,Friday,3,"Friday, 18",Total,14.419221
3,Ciutat Vella,Friday,4,"Friday, 25",Total,14.352131
4,Ciutat Vella,Monday,2,"Monday, 7",Total,14.982624


In [133]:
# Heatmap
heatmap = alt.Chart(day_weekday).mark_rect().encode(
    x = alt.X('weekday:O',title = "Weekday",sort=alt.Sort(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])),
    y = alt.Y('week:O',title = "Week"),
    color = alt.Color('Average:Q',scale = alt.Scale(scheme= 'yelloworangered')),
    tooltip = [alt.Tooltip('date', title = "Day of the month"),'Average']
).transform_filter(
    map_single
).transform_filter(
    type_select
).add_selection(tooltips).properties(width = 500, height = 300)

type_butt|heatmap|single_legend


###**E3 - LINE CHART**

In this visualization, we wanted to show how the daily behaviour differs depending on the bike type in each district. As we wanted to search for trends and there is a connection between consecutive values, the line chart suited our needs well.

* **VISUAL REPRESENTATION**

In this plot, the day of the month is encoded using the X-axis, and the average number of bikes is encoded using the Y-axis.

The colour is used to show the user the district that is being analyzed and the stroke dash is used to differentiate mechanical bikes from electric ones.

When hovering over the plot, an auxiliary ruler is shown to make it easier to detect which day the cursor is on. Moreover, the exact values of the average metrics are shown for both bike types, and a circular mark highlights their location on the lines.

* **INTERACTION**

In this plot, we can distinguish two interactive objects:

- The *district legend*
- An *auxiliary hovering selector* that retrieves the X-axis position, used to display the details-on-demand.

\
 
The district selection filters the data, only showing the metrics for the chosen region.
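The idea behind the hovering selector can be sketched in plain Python: find the plotted day closest to the cursor and return the values of both bike types for that day. This is a simplified model with invented rows; `nearest_day` and `cursor_x` are hypothetical names, and the real selection works on screen positions rather than day units.

```python
def nearest_day(cursor_x, days):
    """Return the plotted day closest to the cursor position on the X-axis."""
    return min(days, key=lambda day: abs(day - cursor_x))

# Invented long-form rows: (day, type, average) for one district.
rows = [
    (1, "Mechanical", 6.0), (1, "Electrical", 0.5),
    (2, "Mechanical", 7.5), (2, "Electrical", 0.7),
    (3, "Mechanical", 5.0), (3, "Electrical", 0.4),
]

cursor_x = 2.3  # hypothetical cursor position expressed in day units
day = nearest_day(cursor_x, sorted({r[0] for r in rows}))

# Details-on-demand: the values of both bike types on that day.
details = {kind: avg for d, kind, avg in rows if d == day}
print(day, details)
```

Because the lookup depends only on the X position, the user does not need to hit a line exactly to read both values.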
<a name="E3"></a>

In [134]:
# Auxiliary dataframe that computes the average available bikes, separated
# by type, per day of the month and per district

aux_line = df.groupby(["Districte","day_month"])[["mechanical","electrical"]].mean()
aux_line = aux_line.reset_index()

# Long-form conversion
aux_line = pd.melt(aux_line, id_vars=['Districte','day_month'], var_name='Type', value_name='Average')
aux_line['Type'] = aux_line['Type'].replace({"mechanical":"Mechanical","electrical":"Electrical"})

In [141]:
# Create a selection that chooses the nearest point and selects based on X-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['day_month'], empty='none')

# The basic line
line = alt.Chart(aux_line).mark_line(interpolate = 'cardinal').encode(
    x=alt.X('day_month:N', title = "Day") ,
    y='Average:Q',
    strokeDash=alt.StrokeDash('Type:N',sort=alt.Sort(['Mechanical','Electrical'])),
    color = 'Districte:N'
)

# Invisible selectors that provide information about
# the x-value of the cursor
selectors = alt.Chart(aux_line).mark_point().encode(
     x=alt.X('day_month:N', title = "Day"),
    opacity=alt.value(0),
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Average:Q', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(aux_line).mark_rule(color='gray').encode(
      x=alt.X('day_month:N', title = "Day"), 
).transform_filter(
    nearest
)

# Combining the layers
lines =alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
).transform_filter(
    map_single
)

single_legend| lines


###**E4 - FINAL**

To arrange the charts properly, the legend of the [line chart](#E3) has been modified. We had to set the exact position of this legend so that it does not distort the general view.

In [136]:
line_mod =  alt.Chart(aux_line).mark_line(interpolate = 'cardinal').encode(
    x=alt.X('day_month:N', title = "Day") ,
    y='Average:Q',
    strokeDash=alt.StrokeDash('Type:N',sort=alt.Sort(['Mechanical','Electrical']),legend = alt.Legend(orient = "none",legendX= 1575, legendY= 400)),
    color = 'Districte:N'
)

lines_mod=alt.layer(
    line_mod, selectors, points, rules, text
).properties(
    width=650, height=300
).transform_filter(
    map_single
)


In [137]:
gsm | ((single_legend | heatmap) & lines_mod)


##**Problem resolution**
###Q1: Which district has the highest activity?
We can answer this question by looking at the [overview](#1), where the number of bikes is encoded with colour. The districts with the most bikes are *Sant Martí*, *Ciutat Vella* and *Sant Andreu*, in this order.
####Q1.1: Which district has the most mechanical bikes?
Still in the [overview](#1), we can restrict the view to mechanical bikes, electrical bikes or both types. With this interaction, we can answer the same question looking only at the mechanical bikes. The districts with the most mechanical bikes are *Ciutat Vella*, *Sant Martí* and *Sant Andreu*. The values for mechanical bikes are very similar to the ones for both types.
####Q1.2: Which district has the most electrical bikes?
As before, we can now restrict the view to electrical bikes. The districts with the most electrical bikes are *Horta-Guinardó*, *Nou Barris* and *Les Corts*. *Horta-Guinardó* has a value of 1.7, which is almost twice as high as the others. This is probably because most of this district lies uphill relative to the neighbouring districts, so people tend to use e-bikes more.
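
The ranking read off the overview can also be checked numerically. The following is a minimal sketch, assuming that the colour in the overview encodes a per-district average; the toy values and the helper `top_districts` are hypothetical, and only the column names come from `CleanData.csv`.

```python
import pandas as pd

# Hypothetical toy rows; only the column names follow CleanData.csv.
toy = pd.DataFrame({
    "Districte": ["Eixample", "Eixample", "Gràcia", "Gràcia", "Les Corts", "Les Corts"],
    "mechanical": [12.0, 14.0, 6.0, 8.0, 3.0, 5.0],
    "electrical": [0.4, 0.2, 0.9, 1.1, 1.5, 1.3],
})
# In the loaded data, available_bikes equals mechanical + electrical.
toy["available_bikes"] = toy["mechanical"] + toy["electrical"]

# Average number of bikes per district, for the total and for each type.
district_means = toy.groupby("Districte")[["available_bikes", "mechanical", "electrical"]].mean()

def top_districts(column, n=3):
    """Return the n districts with the highest mean of `column`, highest first."""
    return district_means[column].sort_values(ascending=False).head(n).index.tolist()

print(top_districts("mechanical"))
print(top_districts("electrical"))
```

Replacing `toy` with the loaded `df` should give the same kind of ranking as the three type options of the overview, provided the assumption about the averaging holds.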

&nbsp;
###Q2: Is the availability of electrical and mechanical bikes correlated?
To answer this question we can use the [scatter plots](#3). The numbers of electrical and mechanical bikes are encoded on the X and Y axes. If we fix a district through selection, we can see how bike availability behaves for each weekday and how both types of bikes correlate. There are two behaviours present in the data:
* One corresponds to districts with a narrow range of e-bike values and a wide distribution of mechanical bike values. This group includes *Sant Martí*, *Sant Andreu*, *Sants-Montjuïc*, *Eixample* and *Ciutat Vella*.
* The other corresponds to districts with less active stations, where both values form a cloud of points. This group includes *Gràcia*, *Horta-Guinardó*, *Les Corts*, *Nou Barris* and *Sarrià-Sant Gervasi*.

Another way to see it is via the [bubble chart](#2), which is the visualization to use if we want to focus on the evolution during the month rather than on the stations in the district.

If we look at this chart once the month is over (selecting day 31), the cloud of points of each district has a positive slope, very slight in some cases and more pronounced in others. The only district that behaves differently is *Sant Andreu*, where the cloud of points seems to have a small negative slope.
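
The slope read from the clouds of points can be summarised with a correlation coefficient per district. This is a minimal sketch, not the method used for the charts: the toy data and the helper `district_correlation` are hypothetical, and Pearson's r is our assumption for measuring the correlation.

```python
import pandas as pd

# Hypothetical toy data: district "A" rises together, district "B" moves in opposite directions.
toy = pd.DataFrame({
    "Districte": ["A"] * 4 + ["B"] * 4,
    "mechanical": [10.0, 12.0, 14.0, 16.0, 5.0, 6.0, 7.0, 8.0],
    "electrical": [0.2, 0.3, 0.4, 0.5, 1.2, 1.1, 1.0, 0.9],
})

def district_correlation(df):
    """Pearson correlation between mechanical and electrical bikes for each district."""
    return pd.Series(
        {name: group["mechanical"].corr(group["electrical"])
         for name, group in df.groupby("Districte")},
        name="pearson_r",
    )

corr = district_correlation(toy)
print(corr)
```

A positive value corresponds to a cloud with a positive slope and a negative value to a negative one.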

&nbsp;
###Q3: How does the activity evolve over the weekdays and weekends of a month in each district?
We can answer this question in two ways: 

&nbsp;  1 - We can use the [bubble chart](#2) to plot the evolution of each district. The bubble chart encodes both types of bikes and the total, and offers two interactions: one selects the districts we want to focus on; the other selects a day of the month and shows the evolution up to that day. This shows the variation during the month for each district.
If we want to see this evolution per weekday, we can look at the [scatter plots](#3), where the district selection can be used with the weekdays already split.

&nbsp; 2 - If we want to see the evolution in a more continuous way, we can use the [heatmap](#E2) and the [line chart](#E3). The line chart shows the evolution of both types of bikes for the selected district, with a tooltip that specifies both values. The heatmap shows the evolution both through the month (day1 -> day2 -> day3) and by weekday (monday1 -> monday2 -> monday3).

In general, the number of bikes stays similar during the month, with the minimum values appearing at the weekends. The trend is the same for electrical and mechanical bikes, but the weekend decrease is larger for the mechanical ones, as there are more of them during the week.
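
The weekend decrease can be quantified by comparing the averages of working days and weekends. This is a minimal sketch with hypothetical values; it assumes that `weekday` is coded as 1 = Monday to 7 = Sunday, which matches the first rows of the dataset (2019-10-01, a Tuesday, has `weekday` 2).

```python
import pandas as pd

# Hypothetical toy week for one district; column names follow CleanData.csv.
toy = pd.DataFrame({
    "Districte": ["Eixample"] * 7,
    "weekday": [1, 2, 3, 4, 5, 6, 7],
    "mechanical": [12.0, 13.0, 14.0, 13.0, 12.0, 8.0, 7.0],
    "electrical": [0.5, 0.5, 0.6, 0.5, 0.4, 0.3, 0.2],
})

# Assumption: weekday 6 and 7 are Saturday and Sunday.
toy["period"] = toy["weekday"].map(lambda d: "weekend" if d >= 6 else "weekday")

# Average per district and period, then the difference between both periods.
summary = toy.groupby(["Districte", "period"])[["mechanical", "electrical"]].mean()
drop = summary.xs("weekday", level="period") - summary.xs("weekend", level="period")
print(drop)
```

A positive entry in `drop` means fewer bikes at the weekend than during the week for that district and type.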

&nbsp;
###Q4: Do neighbouring districts behave similarly?
To answer this question we can look at two traits: the evolution of the districts during the month, and their average values for the month.

To compare them, we select the neighbouring districts in the [overview](#1) map and then use the [bubble chart](#2) to compare their evolution and values. Doing this shows that a district mostly correlates with one of its neighbours rather than with all of them. For example, *Nou Barris* is a neighbour of both *Horta-Guinardó* and *Sant Andreu*, but the first two behave similarly while the third one is very different. Other examples of correlated pairs are: *Les Corts* and *Sarrià-Sant Gervasi*, *Ciutat Vella* and *Sant Martí*, *Sants-Montjuïc* and *Eixample*.
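
The visual comparison of evolutions can be complemented with a correlation matrix between the daily series of each district. This is a minimal sketch under our own assumptions: the toy values are invented to illustrate the computation and do not reproduce the real data, and Pearson correlation is only one possible similarity measure.

```python
import pandas as pd

# Hypothetical toy series: four days for three districts.
toy = pd.DataFrame({
    "day_month": [1, 2, 3, 4] * 3,
    "Districte": ["Nou Barris"] * 4 + ["Horta-Guinardó"] * 4 + ["Sant Andreu"] * 4,
    "available_bikes": [4.0, 5.0, 6.0, 5.0, 3.0, 4.0, 5.0, 4.0, 9.0, 8.0, 7.0, 8.0],
})

# One column per district, one row per day of the month.
daily = toy.pivot_table(index="day_month", columns="Districte",
                        values="available_bikes", aggfunc="mean")

# Pairwise correlation between the daily evolutions of the districts.
similarity = daily.corr()
print(similarity.round(2))
```

Values close to 1 indicate districts that evolve alike during the month, and the column means of `daily` give the average values to compare.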

&nbsp;
###Q5: How are bikes distributed among districts? And in the whole city?
To answer this question we will use the [histogram](#4). With the selection provided by the map or the interactive legend, we can specify which districts we want to analyze and see the distribution for a single district, a group of districts or the whole city. Looking at these histograms, we can see that:
 * The distributions vary a lot depending on the district.
 * All districts have stations with very low usage.
 * The city has a large number of low-usage stations.
 * The average value is around 10.

If we now also use the bike type interaction we can see that:
 * The mechanical bikes behave just like the total (evidence that they are the majority of the bikes used).
 * The e-bike distribution is concentrated at low values.
 * The highest e-bike values lie between 3 and 5, while the highest mechanical values lie between 30 and 35.
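
The same distributions can be computed outside the chart. This is a minimal sketch, assuming that the histogram bins the average value of each station; the helper `station_histogram`, the bin width and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: two stations in each of two districts.
toy = pd.DataFrame({
    "station_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "Districte": ["Eixample"] * 4 + ["Gràcia"] * 4,
    "available_bikes": [12.0, 14.0, 1.0, 3.0, 2.0, 4.0, 22.0, 24.0],
})

def station_histogram(df, districts=None, column="available_bikes",
                      bin_width=5, max_value=40):
    """Count stations per bin of their mean `column`, optionally for some districts only."""
    if districts is not None:
        df = df[df["Districte"].isin(districts)]
    per_station = df.groupby("station_id")[column].mean()
    edges = np.arange(0, max_value + bin_width, bin_width)
    counts, _ = np.histogram(per_station, bins=edges)
    return counts, edges

city_counts, edges = station_histogram(toy)
gracia_counts, _ = station_histogram(toy, districts=["Gràcia"])
print(city_counts, gracia_counts)
```

Passing `column="electrical"` or `column="mechanical"` on the real data plays the role of the bike type interaction.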

&nbsp;
###Q6: Does the behaviour change according to the day of the week?
To answer this question we can use the [heatmap](#E2), a calendar-shaped visualization of the number of bikes for each district. With a district selected, we can see the number of bikes per day of the month. The evolution depends on the district:
* Some of them decrease: *Les Corts*, *Sarrià-Sant Gervasi*, *Gràcia*.
* Some others do not experience much change: *Eixample*, *Sant Martí*, *Horta-Guinardó*.
* And others increase: *Nou Barris*, *Sants-Montjuïc*, *Sant Andreu*, *Ciutat Vella*.

Another way to approach this question is to use the [scatter plots](#3), where we can compare the working-day values across districts and across stations of the same district. With the scatter plots we lose the ability to compare by day of the month, but we gain a more detailed depiction of the behaviour during the week.

We can see that the values are close to zero (especially for e-bikes); as the week goes on, some of the points tend to grow (Monday to Tuesday) and then decrease at the end of the week (Thursday to Friday).
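
The change from one weekday to the next can also be made explicit with a small table. This is a minimal sketch with hypothetical values for a single district; it assumes the same 1 = Monday coding of `weekday` that the first rows of the dataset suggest.

```python
import pandas as pd

# Hypothetical toy values for the working days (1 = Monday, ..., 5 = Friday).
toy = pd.DataFrame({
    "Districte": ["Gràcia"] * 5,
    "weekday": [1, 2, 3, 4, 5],
    "available_bikes": [4.0, 6.0, 6.5, 6.0, 4.5],
})

# Average per district (rows) and weekday (columns).
weekday_means = toy.pivot_table(index="Districte", columns="weekday",
                                values="available_bikes", aggfunc="mean")

# Difference with respect to the previous weekday; the first column is NaN.
changes = weekday_means.diff(axis=1)
print(changes)
```

Positive entries mark growth with respect to the previous day, and negative entries mark a decrease.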