## Data Summary for Student Survey

[->Link to Codebook<-](https://docs.google.com/spreadsheets/d/18IHM1UxofGLepFGYCxdTPjGb8OH7ijN7H58fSJNLitI/edit?pli=1&gid=0#gid=0)

### Notation for Network Analysis: 

### General Information for Data:
- Missing Values: `-999` 
- Primary key: `participant.label`, the unique three-character code for each participant, which was meant to stay the same across waves
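
As a minimal sketch of how the `-999` missing-value code might be handled before computing any statistics, the example below converts it to `NaN`. The toy table and all of its values are made up for illustration; only the `-999` code and the `participant.label` column name come from the notes above.

```python
import numpy as np
import pandas as pd

MISSING_CODE = -999  # missing-value code used in the data tables

# Hypothetical toy table (values invented for illustration)
toy = pd.DataFrame({
    "participant.label": ["aaa", "bbb", "ccc"],
    "age": [22.0, -999.0, 25.0],
    "grade": [-999.0, 1.7, 2.3],
})

# Replace the missing code with NaN so that means, counts, etc. skip it
toy_clean = toy.replace(MISSING_CODE, np.nan)

mean_age = toy_clean["age"].mean()        # mean over non-missing ages only
n_missing = toy_clean.isna().sum().sum()  # total number of missing cells
```

Without this step, `-999` would enter means and correlations as if it were a real value.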

In [1]:
## Packages for General Analysis 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Packages for Network Analysis

import networkx as nx  # the easiest-to-use network analysis package for Python; it holds essentially all the functions you need
#import igraph as ig  # the Python version of the R package igraph, so it may feel familiar to R users
#import graph_tool as gt  # an optional alternative with some additional functionality; it is also very fast

#### Data Tables 
- Note: within each wave, we keep all observations that provided a valid ID code
- W1 (n=180), W2 (n=180), W3 (n=170)

In [2]:
# Tables from processing 
df_w1 = pd.read_csv('../Cooked/df_w1_prepared.csv')
df_w2 = pd.read_csv('../Cooked/df_w2_prepared.csv') 
df_w3 = pd.read_csv('../Cooked/df_w3_prepared.csv')

#### Network Analysis 
- Note: the .gml files keep only the participants who were present in all waves (n=153)
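
A minimal sketch of how a "present in all waves" subset could be derived from the wave tables by intersecting the `participant.label` values. This is an assumption about the approach, not necessarily how the .gml files were actually built; the toy tables and labels are made up.

```python
import pandas as pd

# Hypothetical toy stand-ins for the three wave tables
w1 = pd.DataFrame({"participant.label": ["aaa", "bbb", "ccc", "ddd"]})
w2 = pd.DataFrame({"participant.label": ["aaa", "bbb", "ddd"]})
w3 = pd.DataFrame({"participant.label": ["bbb", "ddd", "eee"]})

# Labels that occur in every wave
in_all_waves = (
    set(w1["participant.label"])
    & set(w2["participant.label"])
    & set(w3["participant.label"])
)

# Restrict one wave to the participants present in all waves
w1_panel = w1[w1["participant.label"].isin(in_all_waves)]
```

The same `isin` filter can be applied to each wave table to obtain matching rows across waves.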

In [3]:
# Network data in GML format; each graph is a multiplex network:
# all relations are stored in one graph, with the relation type kept in the edge attribute 'type'
G_1 = nx.read_gml("../NA/multiplex_graph_w1.gml")
G_2 = nx.read_gml("../NA/multiplex_graph_w2.gml")
G_3 = nx.read_gml("../NA/multiplex_graph_w3.gml")

##### Examples of how to access data

In [48]:
# All variables in the data tables are also stored as node attributes
list(G_1.nodes(data=True))[:1]

[('kru',
  {'browser': 3.0,
   'device_type': 3.0,
   'lang': 0.0,
   'operating_system': 2.0,
   'use_of_device': 1.0,
   'age': 22.0,
   'edu_father': 1.0,
   'edu_mother': 1.0,
   'gender': 1.0,
   'grade': -999.0,
   'ocu_father': 1.0,
   'ocu_mother': 1.0,
   'postcode': 88.0,
   'tutorial': 6.0,
   'linksrechts_self': 0.0,
   'lr_AfD': 11.0,
   'lr_BSW': 8.0,
   'lr_CDU': 6.0,
   'lr_CSU': 6.0,
   'lr_FDP': 3.0,
   'lr_Gruene': 4.0,
   'lr_Linke': 3.0,
   'lr_SPD': 5.0,
   'noteligible_sunday_party_vote': 0.0,
   'politics_question_five': 2.0,
   'politics_question_four': 2.0,
   'politics_question_one': 4.0,
   'politics_question_seven': 4.0,
   'politics_question_six': 4.0,
   'politics_question_three': 4.0,
   'politics_question_two': 2.0,
   'scalo_afd': -5.0,
   'scalo_bsw': -3.0,
   'scalo_cdu': -1.0,
   'scalo_csu': -1.0,
   'scalo_fdp': 0.0,
   'scalo_gruene': 2.0,
   'scalo_linke': 1.0,
   'scalo_pep10': -5.0,
   'scalo_pep11': -5.0,
   'scalo_pep12': 1.0,
   'scalo_pep1

In [43]:
list(G_1.edges(data=True))[:5]

[('kru', 'bs3', {'type': 'aquaintance'}),
 ('kru', 'bs3', {'type': 'leftright', 'weight': 4}),
 ('kru', 'bs3', {'type': 'friend'}),
 ('kru', 'k4w', {'type': 'aquaintance'}),
 ('kru', 'k4w', {'type': 'leftright', 'weight': 4})]

In [51]:
# Only show edges where the 'type' attribute is 'friend'
filtered_edges = [(u, v, d) for u, v, d in G_1.edges(data=True) if d.get('type') == 'friend']
print(filtered_edges[0:5])

# Create a Friendship Graph
G_friends = nx.DiGraph()  # use nx.Graph() instead if the original graph is undirected
G_friends.add_nodes_from(G_1.nodes(data=True))  # keep node attributes
G_friends.add_edges_from(filtered_edges)

print(G_friends.number_of_edges())

[('kru', 'bs3', {'type': 'friend'}), ('bs3', 'kru', {'type': 'friend'}), ('kmw', 'cyw', {'type': 'friend'}), ('b3p', '9he', {'type': 'friend'}), ('b3p', 'kmw', {'type': 'friend'})]
336
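
The same filtering pattern can be wrapped in a small helper so that any layer, including the weighted `leftright` layer, can be pulled out of the multiplex graph. The sketch below runs on a made-up toy graph that mimics the edge structure shown above; `extract_layer` is a hypothetical helper, and it assumes the multiplex graphs are directed multigraphs with at most one edge per type between any ordered pair of nodes.

```python
import networkx as nx

# Toy multiplex graph mimicking the edge structure above (values invented)
G_toy = nx.MultiDiGraph()
G_toy.add_edge("kru", "bs3", type="aquaintance")
G_toy.add_edge("kru", "bs3", type="leftright", weight=4)
G_toy.add_edge("kru", "bs3", type="friend")
G_toy.add_edge("bs3", "kru", type="friend")
G_toy.add_edge("kru", "k4w", type="leftright", weight=7)


def extract_layer(G, edge_type):
    """Return a DiGraph with all nodes of G and only the edges of one type."""
    layer = nx.DiGraph()
    layer.add_nodes_from(G.nodes(data=True))  # keep node attributes
    layer.add_edges_from(
        (u, v, d) for u, v, d in G.edges(data=True) if d.get("type") == edge_type
    )
    return layer


G_lr = extract_layer(G_toy, "leftright")  # weighted left-right layer
G_fr = extract_layer(G_toy, "friend")     # friendship layer

weights = nx.get_edge_attributes(G_lr, "weight")  # {(u, v): weight}
friend_reciprocity = nx.reciprocity(G_fr)         # share of reciprocated ties
```

With the real data, `extract_layer(G_1, "friend")` should give the same graph as `G_friends` above, provided those assumptions hold.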
