Dataset taken from https://www.pippanorris.com/data  
Experiment proposed by Simon Rudkin, https://www.swansea.ac.uk/staff/som/academic-staff/s.t.rudkin/  
This notebook was prepared by Davide Gurnari. 

In [None]:
import numpy as np
import pandas as pd
import networkx as nx

from matplotlib import pyplot as plt

In [None]:
full_df = pd.read_csv('data/brexitdata', sep='\t', encoding='iso-8859-1')
print(full_df.shape)
full_df.head()

Throughout this example we use a simplified set of socio-economic variables describing housing tenure, household composition, number of cars, occupation, qualifications, self-assessed health, and the level of deprivation of the area in which the constituency is located.

In [None]:
subset = ['c11HouseOutright', 'c11HouseMortgage', 'c11HouseholdOnePerson', 
          'c11HouseholdMarried', 'c11CarsNone', 'c11CarsOne', 'c11CarsTwo',
          'c11NSSECLowerManager', 'c11QualNone', 'c11QualLevel4', 
          'c11HealthVeryGood', 'c11HealthGood', 'c11DeprivedNone', 'c11Deprived1']

db104 = full_df[subset].copy()

coloring_subset = ['leaveHanretty', 'lm17', 'cm17']
coloring_df = full_df[coloring_subset].copy()

## Create Ball Mapper

In [None]:
from pyballmapper import BallMapper

In [None]:
bm = BallMapper(points = db104.values, # the pointcloud, as a numpy array
                epsilon = 18) # the radius of the balls
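
To give an intuition for what the call above builds, here is a minimal sketch of a greedy ball cover on a toy point cloud. It is an illustration under stated assumptions, not pyballmapper's actual implementation: the function `greedy_ball_cover` and the array `toy_points` are hypothetical names introduced only for this example. Landmarks are picked so that every point lies within `epsilon` of some landmark, and two balls are joined by an edge when they share at least one point.

```python
import numpy as np

def greedy_ball_cover(points, epsilon):
    """Hypothetical helper: greedy epsilon-cover of a point cloud."""
    landmarks = []
    covered = np.zeros(len(points), dtype=bool)
    for i in range(len(points)):
        if not covered[i]:
            # the first uncovered point becomes a new landmark
            landmarks.append(i)
            covered |= np.linalg.norm(points - points[i], axis=1) <= epsilon
    # each ball collects every point within epsilon of its landmark
    balls = {
        node: np.flatnonzero(np.linalg.norm(points - points[lm], axis=1) <= epsilon)
        for node, lm in enumerate(landmarks)
    }
    # two balls are connected when they have a point in common
    edges = [
        (a, b)
        for a in balls
        for b in balls
        if a < b and np.intersect1d(balls[a], balls[b]).size > 0
    ]
    return landmarks, balls, edges

toy_points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
landmarks, balls, edges = greedy_ball_cover(toy_points, epsilon=1.0)
print(landmarks, edges)
```

A larger `epsilon` gives fewer, bigger balls and a coarser graph; a smaller one gives more nodes and more detail.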

## Coloring

In [None]:
from matplotlib.colors import ListedColormap
from matplotlib import cm

# cm.get_cmap was removed in Matplotlib 3.9; plt.get_cmap works across versions
my_rainbow_palette = plt.get_cmap('plasma')

In [None]:
plt.figure(figsize= (8,6))
bm.add_coloring(coloring_df)
#In this case we color the graph by support for Brexit in 2016 referendum.
bm.draw_networx(coloring_variable='leaveHanretty', color_palette=my_rainbow_palette, colorbar=True)
plt.title('BM graph colored by support for Brexit')
plt.show()
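
Conceptually, colouring a Ball Mapper graph means summarising a variable over the points covered by each node and mapping that summary to a colour. The sketch below assumes the summary is the mean, which may differ from what pyballmapper does internally. `toy_values` and `toy_cover` are hypothetical stand-ins for a colouring column and for the points covered by each node.

```python
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.colors import Normalize

# hypothetical colouring variable, one value per data point
toy_values = pd.Series([0.2, 0.4, 0.6, 0.8])
# hypothetical cover: node id -> indices of the points in that ball
toy_cover = {0: [0, 1], 1: [1, 2], 2: [3]}

# assumed summary: average the variable over the points in each ball
node_means = {node: toy_values.iloc[idx].mean() for node, idx in toy_cover.items()}

# rescale the node values to [0, 1] and look them up in the colormap
norm = Normalize(vmin=min(node_means.values()), vmax=max(node_means.values()))
cmap = plt.get_cmap('plasma')
node_colors = {node: cmap(norm(value)) for node, value in node_means.items()}
print(node_means)
```

Each entry of `node_colors` is an RGBA tuple that can be passed to a plotting routine.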

In [None]:
# In this case we color the graph by support for the Labour party in the 2017 election.
plt.figure(figsize= (8,6))
bm.add_coloring(coloring_df)
bm.draw_networx(coloring_variable='lm17', color_palette=my_rainbow_palette, colorbar=True)
plt.title('BM graph colored by support for Labour party')
plt.show()

In [None]:
# In this case we color the graph by support for the Conservative party in the 2017 election.
plt.figure(figsize= (8,6))
bm.add_coloring(coloring_df)
bm.draw_networx(coloring_variable='cm17', color_palette=my_rainbow_palette, colorbar=True)
plt.title('BM graph colored by support for Conservative party')
plt.show()

In [None]:
# We can plot the same graphs using Bokeh
from pyballmapper.plotting import graph_GUI
from bokeh.plotting import figure, show

bm.add_coloring(coloring_df)
my_fancy_gui = graph_GUI(bm.Graph, my_rainbow_palette)

In [None]:
#In this case we color the graph by support for Brexit in 2016 referendum.
my_fancy_gui.color_by_variable('leaveHanretty')
show(my_fancy_gui.plot)

In [None]:
# In this case we color the graph by support for Labour party in 2017 election.
my_fancy_gui.color_by_variable('lm17')
show(my_fancy_gui.plot)

In [None]:
# In this case we color the graph by support for Conservative party in 2017 election
my_fancy_gui.color_by_variable('cm17')
show(my_fancy_gui.plot)

In [None]:
# We may want to see why node 20 differs so much from the other nodes supporting the Tories (6, 7 and 10)
one = [20]
two = [6,7,10]

one_points = np.unique(np.concatenate([bm.points_covered_by_landmarks[node] 
                                            for node in one]))

two_points = np.unique(np.concatenate([bm.points_covered_by_landmarks[node] 
                                            for node in two]))
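
The pattern above merges the points of several nodes into one index array without duplicates. A small sketch with a hypothetical cover (`toy_cover` stands in for `bm.points_covered_by_landmarks`) shows why `np.unique` is needed: neighbouring balls overlap, so plain concatenation would count shared points twice.

```python
import numpy as np

# hypothetical cover: node id -> indices of the points in that ball
toy_cover = {0: [0, 1, 2], 1: [2, 3], 2: [4]}
nodes = [0, 1]

# point 2 belongs to both balls; np.unique keeps it only once
merged = np.unique(np.concatenate([toy_cover[node] for node in nodes]))
print(merged)
```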

In [None]:
# absolute difference of the averages, divided by the average in the whole dataset
(abs(db104.iloc[one_points].mean() - db104.iloc[two_points].mean()) / db104.mean()).sort_values(ascending=False)
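
To see how this score behaves, here is the same expression on a toy frame; the frame `toy` and the two groups are made up for illustration. A column whose group means differ by as much as its overall mean scores 1, and a column that is identical in both groups scores 0.

```python
import pandas as pd

# made-up data: column 'a' separates the groups, column 'b' does not
toy = pd.DataFrame({'a': [1.0, 1.0, 3.0, 3.0],
                    'b': [10.0, 10.0, 10.0, 10.0]})
group_one = [0, 1]
group_two = [2, 3]

# absolute difference of the group means, divided by the overall mean
rel_diff = (abs(toy.iloc[group_one].mean() - toy.iloc[group_two].mean())
            / toy.mean()).sort_values(ascending=False)
print(rel_diff)
```

Dividing by the overall mean puts variables measured on different scales on a comparable footing, at the cost of inflating the score for variables whose mean is close to zero.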

In [None]:
# We see that coordinates 9, 10 and 13 make the most difference. To find out what
# they are, we should look at the list of variables (in the order given in `subset`):
# 1 - c11HouseOutright,
# 2 - c11HouseMortgage,
# 3 - c11HouseholdOnePerson,
# 4 - c11HouseholdMarried,
# 5 - c11CarsNone,
# 6 - c11CarsOne,
# 7 - c11CarsTwo,
# 8 - c11NSSECLowerManager,
# 9 - c11QualNone,
# 10 - c11QualLevel4,
# 11 - c11HealthVeryGood,
# 12 - c11HealthGood,
# 13 - c11DeprivedNone,
# 14 - c11Deprived1
# Therefore the differences lie in having no qualifications (9), having Level 4
# (degree-level) qualifications (10), and the absence of deprivation in the area (13).
# One can take this analysis further and find the regions responsible
# for that, but we will not do it here.