<a href="https://colab.research.google.com/github/aaubs/ds-master/blob/main/notebooks/M2_power_elites_starter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import networkx as nx

import numpy as np
import pandas as pd
import seaborn as sns

# Exploring the graph of Danish Power Elites

![](https://source.unsplash.com/GWe0dlVD9e0)

> Many people dream of being one of them, but only few make it all the way to the top. According to two CBS researchers, it takes more than just hard work to get to the top of the Danish hierarchy of power. [read more](https://www.cbs.dk/en/alumni/news/a-look-the-danish-power-elite)

In this project we construct and explore a network of Danish power elites derived from the boards of various organisations in the country.
We will build an association network (who sits on the same board as whom?) and first explore "basic" centrality indicators. We then identify communities and the central persons within them. Finally, we look at some "fancier" interactive network visualisation.

In this tutorial we will use some more advanced Pandas techniques that may be new to you. Consult the documentation if in doubt.

You can read some related research [here](https://research-api.cbs.dk/ws/portalfiles/portal/57663543/anton_grau_larsen_and_christoph_houmann_ellersgaard_who_listens_to_the_top_acceptedversion.pdf).



# Obtaining and exploring data

## Loading the data

In [None]:
# Import data :-) and quick check
data = pd.read_csv('https://github.com/SDS-AAU/SDS-master/raw/master/00_data/networks/elite_den17.csv')
data.head()

Unnamed: 0,NAME,AFFILIATION,ROLE,TAGS,POSITION_ID,ID,SECTOR,TYPE,DESCRIPTION,CREATED,ARCHIVED,LAST_CHECKED,CVR_PERSON,CVR_AFFILIATION,PERSON_ID,AFFILIATION_ID
0,Aage Almtoft,Middelfart Sparekasse,Member,"Corporation, FINA, Banks, Finance",1,95023,Corporations,,Automatisk CVR import at 2016-03-12 18:01:28: ...,2016-03-12T18:01:28Z,,2017-11-09T15:38:01Z,4003984000.0,24744817.0,1,3687
1,Aage B. Andersen,Foreningen Østifterne - Repræsentantskab (Medl...,Member,"Charity, Foundation, Insurance, Socialomraadet",4,67511,NGO,Organisation,Direktør,2016-02-05T14:45:10Z,,2016-02-12T14:41:09Z,,,3,2528
2,Aage Christensen,ÅRHUS SØMANDSHJEM,Chairman,"Foundation, Marine, Tourism",6,100903,Foundations,,Automatisk CVR import at 2016-03-12 18:08:31: ...,2016-03-12T18:08:31Z,,2017-11-09T15:50:09Z,4000054000.0,29094411.0,4,237
3,Aage Dam,"Brancheforeningen automatik, tryk & transmissi...",Chairman,"Business association, Interest group, Technology",8,69156,NGO,Organisation,"Formand, Adm. direktør, Bürkert Contromatic A/S",2016-02-10T15:18:47Z,,2016-02-10T14:19:20Z,,,5,469
4,Aage Dam,Dansk Erhverv (bestyrelse),Member,Employers association,9,72204,NGO,Stat,Adm. dir. Aage Dam- Bürkert-Contromatic A/S,2016-02-16T10:49:01Z,,2016-02-16T11:55:34Z,,43232010.0,5,1041


Given that each person and each affiliation has a unique ID, we have ideal input for network construction.
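
As a rough illustration of why ID pairs are enough to build a network, the sketch below constructs a small two-mode (person-to-board) graph and projects it onto persons. It is not the approach taken in the rest of this notebook, only an alternative view of the same idea. The toy values and the `p`/`a` prefixes are assumptions made for the example; the prefixes are there because person and affiliation IDs are numbered separately and may overlap.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# Toy stand-in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "PERSON_ID": [1, 2, 3, 3],
    "AFFILIATION_ID": [1, 1, 1, 2],
})

# Prefix the IDs so that person 1 and affiliation 1 become distinct nodes.
person_nodes = ["p" + str(i) for i in toy["PERSON_ID"]]
board_nodes = ["a" + str(i) for i in toy["AFFILIATION_ID"]]

B = nx.Graph()
B.add_nodes_from(set(person_nodes), bipartite=0)
B.add_nodes_from(set(board_nodes), bipartite=1)
B.add_edges_from(zip(person_nodes, board_nodes))

# Project onto persons: two persons are linked if they share a board,
# and the edge weight counts the number of shared boards.
G_toy = bipartite.weighted_projected_graph(B, set(person_nodes))
print(G_toy.number_of_nodes(), G_toy.number_of_edges())
```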




In [None]:
# select name and IDs
data_select = data[['NAME', 'PERSON_ID', 'AFFILIATION_ID']]

We can create an edge dataframe using a "trick": we merge the dataframe with itself, using `AFFILIATION_ID` as the key. Every pair of rows that share a board then becomes one row in the result. The only thing we then need to remove are self-links, since a person cannot really sit on a board with themselves.

The initial dataframe has ~60k rows; after the merge, the new one has ~160k. That looks promising.
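
To see where the growth in row count comes from, here is a minimal sketch on made-up data (the names and IDs in `toy` are hypothetical). A board with n members contributes n × n rows to the self-merge, including the n self-links.

```python
import pandas as pd

# Hypothetical toy data: board 10 has three members, board 20 has one.
toy = pd.DataFrame({
    "NAME": ["Ann", "Bo", "Cy", "Cy"],
    "PERSON_ID": [1, 2, 3, 3],
    "AFFILIATION_ID": [10, 10, 10, 20],
})

# Self-merge on the board ID: every member is paired with every member.
toy_edges = pd.merge(toy, toy, on="AFFILIATION_ID")

# Board 10 -> 3 * 3 = 9 rows, board 20 -> 1 * 1 = 1 row.
board_sizes = toy.groupby("AFFILIATION_ID").size()
print(len(toy_edges), (board_sizes ** 2).sum())  # 10 10
```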

In [None]:
# create edge DF by merge with itself.
edges = pd.merge(data_select, data_select, on='AFFILIATION_ID')
edges.head()

Unnamed: 0,NAME_x,PERSON_ID_x,AFFILIATION_ID,NAME_y,PERSON_ID_y
0,Aage Almtoft,1,3687,Aage Almtoft,1
1,Aage Almtoft,1,3687,Allan Buch,311
2,Aage Almtoft,1,3687,Bo Skovby Rosendahl,4491
3,Aage Almtoft,1,3687,Bo Smith 4493,4493
4,Aage Almtoft,1,3687,Martin Nørholm Baltser,24816
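
Row 0 above pairs Aage Almtoft with himself, which is exactly the kind of self-link mentioned earlier. A minimal sketch of how such rows could be filtered out, shown on hypothetical toy data rather than the real `edges` frame. The second filter, which keeps each pair in one direction only, is an optional extra and an assumption of this sketch; an undirected graph would treat both directions as the same edge anyway.

```python
import pandas as pd

# Hypothetical toy data: board 10 has three members, board 20 has one.
toy = pd.DataFrame({
    "NAME": ["Ann", "Bo", "Cy", "Cy"],
    "PERSON_ID": [1, 2, 3, 3],
    "AFFILIATION_ID": [10, 10, 10, 20],
})
toy_edges = pd.merge(toy, toy, on="AFFILIATION_ID")

# Drop self-links: rows where both sides are the same person.
no_self = toy_edges[toy_edges["PERSON_ID_x"] != toy_edges["PERSON_ID_y"]]

# Optional: each pair still appears twice (A-B and B-A); keep one direction.
one_way = no_self[no_self["PERSON_ID_x"] < no_self["PERSON_ID_y"]]

print(len(toy_edges), len(no_self), len(one_way))  # 10 6 3
```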
