<a href="https://colab.research.google.com/github/SDS-AAU/SDS-master/blob/master/notebooks/M2_power_elites_starter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
!pip install --upgrade scipy -q


In [6]:
# Basic packages for network exploration
import pandas as pd
import networkx as nx
from community import community_louvain

import altair as alt

# Exploring the graph of Danish Power Elites

![](https://source.unsplash.com/GWe0dlVD9e0)

> Many people dream of being one of them, but only few make it all the way to the top. According to two CBS researchers, it takes more than just hard work to get to the top of the Danish hierarchy of power. [read more](https://www.cbs.dk/en/alumni/news/a-look-the-danish-power-elite)

In this project we are going to construct and explore a network of Danish power elites derived from the boards of various organisations in the country.
We will construct an association network: who sits on the same board as whom? We first explore "basic" centrality indicators, then identify communities and the central persons within them. Finally, we look at some "fancier" interactive network visualisation.

In this tutorial we will be using some more advanced Pandas techniques that may be new to you. Use the documentation if in doubt.

You can read some related research [here](https://research-api.cbs.dk/ws/portalfiles/portal/57663543/anton_grau_larsen_and_christoph_houmann_ellersgaard_who_listens_to_the_top_acceptedversion.pdf).



# Obtaining and exploring data

## Loading the data

In [8]:
# Import data :-) and quick check
data = pd.read_csv('https://raw.githubusercontent.com/SDS-AAU/SDS-master/master/00_data/networks/elite_den17.csv')
data.head()

Unnamed: 0,NAME,AFFILIATION,ROLE,TAGS,POSITION_ID,ID,SECTOR,TYPE,DESCRIPTION,CREATED,ARCHIVED,LAST_CHECKED,CVR_PERSON,CVR_AFFILIATION,PERSON_ID,AFFILIATION_ID
0,Aage Almtoft,Middelfart Sparekasse,Member,"Corporation, FINA, Banks, Finance",1,95023,Corporations,,Automatisk CVR import at 2016-03-12 18:01:28: ...,2016-03-12T18:01:28Z,,2017-11-09T15:38:01Z,4003984000.0,24744817.0,1,3687
1,Aage B. Andersen,Foreningen Østifterne - Repræsentantskab (Medl...,Member,"Charity, Foundation, Insurance, Socialomraadet",4,67511,NGO,Organisation,Direktør,2016-02-05T14:45:10Z,,2016-02-12T14:41:09Z,,,3,2528
2,Aage Christensen,ÅRHUS SØMANDSHJEM,Chairman,"Foundation, Marine, Tourism",6,100903,Foundations,,Automatisk CVR import at 2016-03-12 18:08:31: ...,2016-03-12T18:08:31Z,,2017-11-09T15:50:09Z,4000054000.0,29094411.0,4,237
3,Aage Dam,"Brancheforeningen automatik, tryk & transmissi...",Chairman,"Business association, Interest group, Technology",8,69156,NGO,Organisation,"Formand, Adm. direktør, Bürkert Contromatic A/S",2016-02-10T15:18:47Z,,2016-02-10T14:19:20Z,,,5,469
4,Aage Dam,Dansk Erhverv (bestyrelse),Member,Employers association,9,72204,NGO,Stat,Adm. dir. Aage Dam- Bürkert-Contromatic A/S,2016-02-16T10:49:01Z,,2016-02-16T11:55:34Z,,43232010.0,5,1041


As we can see, each person has different attributes, among others person IDs and affiliation IDs. There is also data on sector and role that we could use for filtering or EDA.
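
As a minimal sketch of how `SECTOR` and `ROLE` could be used for filtering or EDA, the snippet below cross-tabulates the two columns and filters on one role. The rows of `toy` are invented for illustration; only the column names come from the data above.

```python
import pandas as pd

# Toy rows, invented for illustration; only the column names match the real data
toy = pd.DataFrame({
    'NAME': ['A', 'B', 'C', 'D', 'E'],
    'SECTOR': ['Corporations', 'Corporations', 'NGO', 'NGO', 'State'],
    'ROLE': ['Chairman', 'Member', 'Member', 'Member', 'Chairman'],
})

# Count how many positions of each role exist per sector
role_by_sector = pd.crosstab(toy['SECTOR'], toy['ROLE'])
print(role_by_sector)

# Filter to a single role, here only chairmen
chairmen = toy.query('ROLE == "Chairman"')
print(chairmen)
```

The same two calls should work on `data` directly, since it has both columns.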

In [None]:
data['AFFILIATION'].value_counts(ascending=False).nlargest(20)

H.M. Dronningens 75-års fødselsdag                                                           803
Axcelfuture - konferencedeltagere                                                            332
Gallatafler ved statsbesøg (2016-17) I                                                       250
Uddannelses- og Forskningsministeriet (Kvalifikationsnævnet - Medlemmer)                     232
Miljø- og Fødevareministeriet (Natur- og Miljøklagenævnet - Den sagkyndige sammensætning)    214
Folketingets Presseloge (Institutioner under Folketinget) (Medlemmer)                        199
Reception på Kongeskibet Dannebrog (2017)                                                    196
Gallatafler ved statsbesøg (2016-17) II                                                      195
Landvind i en ny virkelighed (Konference)                                                    148
Nytårskur og -taffel (2016 – 2018)                                                           146
Venstre (Hovedbestyrelse)     

In [9]:
data['SECTOR'].value_counts(ascending=False)

NGO             17720
State           13601
Corporations     7989
Foundations      6987
VL_networks      3803
Events           1948
Parliament       1087
Commissions       795
Municipal         320
Family            207
Politics           37
Organisation        6
Name: SECTOR, dtype: int64

In [10]:
# Subset to corporate affiliations and keep only those that have a CVR number
data = data.query('SECTOR == "Corporations"')
data = data.dropna(subset = ['CVR_AFFILIATION'])

In [11]:
data['AFFILIATION'].value_counts(ascending=False).nlargest(20)

Kromann Reumert                     55
Bech-Bruun                          54
Gorrissen Federspiel                40
Plesner                             40
EnergiMidt                          31
Lett Law Firm                       27
Syd Energi (SE)                     24
TDC (note)                          24
Bruun & Hjejle                      23
Dansk Retursystem                   22
Alm. Brand                          22
Danske Bank                         21
SEAS-NVE                            20
Rønne & Lundgren                    20
Nykredit Realkredit (Bestyrelse)    20
Carlsberg                           19
Naturgas Fyn                        19
PensionDanmark                      19
Vandcenter Syd                      19
Novo Nordisk                        18
Name: AFFILIATION, dtype: int64

In [12]:
data['NAME'].value_counts(ascending=False).nlargest(20)

Karen Frøsig                  7
Gert Rinaldo Jonassen         7
Jeppe Christiansen            6
Michael Christiansen 25501    6
Henning Kruse Petersen        6
Jørgen Huno Rasmussen         5
Anders Christen Obel          5
Niels Thomas Heering          5
Preben Sunke                  5
Jens Bjerg Sørensen           5
Lars Nørby Johansen           5
Jørn Ankær Thomsen            5
Niels Jørgen Kornerup         5
John Christiansen 16895       5
Kim Simonsen                  5
Lasse Nyby                    5
Niels Jacobsen 27459          5
David Hellemann               5
Hans Henrik Kjølby 10930      5
John Bull Fisker              5
Name: NAME, dtype: int64

## EDA

In [13]:
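# Name both columns explicitly: the default names from reset_index() differ across pandas versions
toplot = (data['AFFILIATION'].value_counts().nlargest(20)
          .rename_axis('affiliation').reset_index(name='n'))
alt.Chart(toplot).mark_bar().encode(
    x='affiliation:N',
    y='n:Q'
)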

## Edgelist construction

Given that each person and each affiliation has a unique ID, we have perfect input for network construction.




In [14]:
# select name and IDs
data_select = data[['NAME', 'PERSON_ID', 'AFFILIATION_ID']]

We can create an edge dataframe using a "trick": we merge the dataframe with itself using `AFFILIATION_ID` as the key. The only thing we then need to remove are self-links, since a person cannot really sit on a board with themselves.

After the filtering above, the dataframe has at most 7,989 rows (the `Corporations` count); the merged dataframe has tens of thousands. That looks promising.

In [15]:
# create edge DF by merge with itself.
edges = pd.merge(data_select, data_select, on='AFFILIATION_ID')
edges.head()

Unnamed: 0,NAME_x,PERSON_ID_x,AFFILIATION_ID,NAME_y,PERSON_ID_y
0,Aage Almtoft,1,3687,Aage Almtoft,1
1,Aage Almtoft,1,3687,Allan Buch,311
2,Aage Almtoft,1,3687,Bo Skovby Rosendahl,4491
3,Aage Almtoft,1,3687,Bo Smith 4493,4493
4,Aage Almtoft,1,3687,Martin Nørholm Baltser,24816


In [16]:
# Filter out self-edges
edges = edges[edges.PERSON_ID_x != edges.PERSON_ID_y]
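
To see why the self-merge grows the table so much, here is a small sketch on an invented membership table (`toy` and its IDs are made up for illustration). A board with n members produces n * n rows in the merge and n * (n - 1) rows once self-links are dropped.

```python
import pandas as pd

# Toy membership table, invented for illustration:
# persons 1-3 sit on board 10, persons 1-2 also sit on board 20
toy = pd.DataFrame({
    'PERSON_ID': [1, 2, 3, 1, 2],
    'AFFILIATION_ID': [10, 10, 10, 20, 20],
})

# Self-merge on the board ID: every row is paired with every row of the same board
pairs = pd.merge(toy, toy, on='AFFILIATION_ID')
n_all = len(pairs)          # 3*3 + 2*2 rows
print(n_all)

# Drop the pairs of a person with themselves
pairs = pairs[pairs.PERSON_ID_x != pairs.PERSON_ID_y]
n_no_self = len(pairs)      # 3*2 + 2*1 rows
print(n_no_self)
```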

We are now in a situation where people who sit on multiple boards together have one row per shared board. We can aggregate these rows by grouping.



In [17]:
# grouping to aggregate multiple co-occurrences and to generate a weight:
# how many times did PersonX and PersonY sit on boards together?
# reset_index turns the multi-index series into a dataframe
edges = edges.groupby(['PERSON_ID_x', 'PERSON_ID_y']).size().reset_index()

In [18]:
# column "0" is now our weight
edges.head()

Unnamed: 0,PERSON_ID_x,PERSON_ID_y,0
0,1,311,1
1,1,4491,1
2,1,4493,1
3,1,24816,1
4,1,31093,1


In [19]:
edges[0].value_counts()

1    58824
2     1560
4      222
3       34
5       12
Name: 0, dtype: int64

In [20]:
# finally we rename the "0" column to weight
edges.rename({0:'weight'}, axis = 1, inplace=True)

In [21]:
len(edges)

60652

Most pairs co-occur only once. Only 12 rows have the strongest weight of 5; since the self-merge lists every pair in both directions, that corresponds to 6 pairs of people who share 5 boards.
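
Because the edgelist comes from a self-merge, every pair appears twice, once as (x, y) and once as (y, x). An undirected `nx.Graph` collapses the two directions into one edge anyway, but for counting pairs it can help to keep one row per pair. A minimal sketch on an invented edgelist (`toy_edges` is made up for illustration):

```python
import pandas as pd

# Toy weighted edgelist, invented for illustration; each pair appears in both directions
toy_edges = pd.DataFrame({
    'PERSON_ID_x': [1, 2, 1, 3, 2, 3],
    'PERSON_ID_y': [2, 1, 3, 1, 3, 2],
    'weight':      [2, 2, 1, 1, 1, 1],
})

# Keep one row per undirected pair by requiring the smaller ID to come first
undirected = toy_edges[toy_edges.PERSON_ID_x < toy_edges.PERSON_ID_y]

print(len(toy_edges), len(undirected))
print(undirected['weight'].value_counts())
```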

## Creating the Graph object with NetworkX

Now we can create a network object from this edgelist. From here we will calculate various centrality measures and perform community detection. Think of the latter as unsupervised machine learning (UML), which it actually is.
This will allow us to investigate, for example:

- Are there power clusters within different domains (education, agriculture...)?
- Who are the top people in these communities?



In [None]:
# Create network object from pandas edgelist
G = nx.from_pandas_edgelist(edges, source='PERSON_ID_x', target='PERSON_ID_y', edge_attr='weight', create_using=nx.Graph())
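
As a rough sketch of the next steps, the snippet below builds a graph from an invented edgelist, computes degree centrality, and detects communities. Two things are assumptions for illustration: the toy data (`toy_edges`, two triangles joined by one edge), and the use of NetworkX's greedy modularity maximisation as a stand-in for the Louvain algorithm imported at the top of the notebook, so that the example depends only on pandas and NetworkX.

```python
import pandas as pd
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy weighted edgelist, invented for illustration: two triangles joined by the edge 3-4
toy_edges = pd.DataFrame({
    'PERSON_ID_x': [1, 1, 2, 4, 4, 5, 3],
    'PERSON_ID_y': [2, 3, 3, 5, 6, 6, 4],
    'weight':      [1, 1, 1, 1, 1, 1, 1],
})

G_toy = nx.from_pandas_edgelist(toy_edges, source='PERSON_ID_x', target='PERSON_ID_y',
                                edge_attr='weight', create_using=nx.Graph())

# Degree centrality: the share of other nodes each node is directly connected to
centrality = nx.degree_centrality(G_toy)
nx.set_node_attributes(G_toy, centrality, 'degree_centrality')

# Community detection by greedy modularity maximisation (stand-in for Louvain)
communities = greedy_modularity_communities(G_toy, weight='weight')

# Most central node within each detected community
for members in communities:
    top = max(members, key=centrality.get)
    print(sorted(members), 'most central:', top)
```

On the real graph `G`, the same pattern answers both questions above: the communities are candidate power clusters, and the most central node in each is a candidate top person.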