# Comparing node characteristics in the network before and after link prediction

- Node characteristics in the network
    - centrality
    - weighted centrality
    - betweenness centrality

In [1]:
import networkx as nx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
before_link_prediction = pd.read_csv('network_data/contributor_coupling.csv', index_col=0)
constructed_edge = pd.read_csv('result/constructed_edge_2.csv')
meaning_edge = pd.read_csv('result/meaning_edge_2.csv')

In [3]:
# Stack the two predicted edge lists into one table with a fresh 0..n-1 row index
edge_list_after_link_prediction = pd.concat([constructed_edge, meaning_edge], ignore_index=True)

In [4]:
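def change_edge_list_to_adjacency(edge_list) :
    # Build an undirected graph from the 'node1'/'node2' columns
    # (nx.Graph is the default, so no extra conversion is needed)
    G = nx.from_pandas_edgelist(edge_list, source='node1', target='node2')
    # Node-by-node adjacency matrix as a DataFrame
    adjacency = nx.to_pandas_adjacency(G)
    return adjacency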

In [5]:
after_link_prediction = change_edge_list_to_adjacency(edge_list_after_link_prediction)

---

## Removing isolated nodes

Both networks contain isolated nodes, so these are removed beforehand.

In [6]:
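def remove_non_isolated_node(network) :
    # Despite its name, this keeps the non-isolated nodes and drops the isolated ones.
    # A node counts as isolated when its adjacency column sums to 0.
    network_sum = network.sum()
    non_isolated_node = list(network_sum[network_sum>0].index)
    
    return network.loc[non_isolated_node, non_isolated_node]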

In [7]:
before_link_prediction = remove_non_isolated_node(before_link_prediction)
after_link_prediction = remove_non_isolated_node(after_link_prediction)
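
The same cleanup can also be done on a graph object rather than on the adjacency table. The following is a minimal sketch on a made-up toy graph (the node names `a` to `d` are illustrative, not taken from the data), using `nx.isolates` to find nodes with no edges.

```python
import networkx as nx

# Toy graph (hypothetical nodes): "d" has no edges, so it is isolated.
toy_G = nx.Graph()
toy_G.add_edges_from([("a", "b"), ("b", "c")])
toy_G.add_node("d")

# nx.isolates yields the nodes whose degree is 0
isolated = list(nx.isolates(toy_G))
toy_G.remove_nodes_from(isolated)

print(isolated)
print(sorted(toy_G.nodes))
```

Materializing the isolates into a list before removing them avoids modifying the graph while it is being iterated over.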

In [8]:
print('Number of nodes before link prediction : {}'.format(len(before_link_prediction.columns)))
print('Number of nodes after link prediction : {}'.format(len(after_link_prediction.columns)))

Number of nodes before link prediction : 154
Number of nodes after link prediction : 154


In [9]:
before_G = nx.from_pandas_adjacency(before_link_prediction)
after_G = nx.from_pandas_adjacency(after_link_prediction)

---

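## Basic network information

- Before link prediction $\rightarrow$ nodes: 154, edges: 442, average degree: 5.74
- After link prediction $\rightarrow$ nodes: 154, edges: 654, average degree: 8.49

After link prediction the number of nodes is unchanged, while the number of edges increases from 442 to 654.  
As a result, the average degree rises sharply, from 5.74 to 8.49.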

In [10]:
print(nx.info(before_G))

Name: 
Type: Graph
Number of nodes: 154
Number of edges: 442
Average degree:   5.7403


In [11]:
print(nx.info(after_G))

Name: 
Type: Graph
Number of nodes: 154
Number of edges: 654
Average degree:   8.4935
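
For an undirected graph the average degree is twice the edge count divided by the node count, which is how the figures above relate to each other. As far as I know, `nx.info` is no longer available from NetworkX 3.0 onward, so the sketch below computes the same summary by hand. The helper name `summarize` is hypothetical, and the graph is a toy example.

```python
import networkx as nx

def summarize(G):
    """Return (number of nodes, number of edges, average degree) for an undirected graph."""
    n = G.number_of_nodes()
    m = G.number_of_edges()
    # Each edge contributes to the degree of two nodes
    avg_degree = 2 * m / n if n else 0.0
    return n, m, avg_degree

toy_G = nx.cycle_graph(4)  # 4 nodes, 4 edges
print(summarize(toy_G))

# Same formula applied to the node and edge counts reported above
print(round(2 * 442 / 154, 4), round(2 * 654 / 154, 4))
```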


---

## Degree per node

Compare each node's degree before and after link prediction.

Most of the nodes that had a high degree before link prediction still have a high degree afterwards.  
ApolloAuto/apollo, which had the highest degree, still has the largest degree even though it decreased by 2 (from 41 to 39).

In [12]:
before_degree_ranking = pd.DataFrame(before_G.degree, columns=['repo name', 'degree']).sort_values(by='degree', ascending=False)
after_degree_ranking = pd.DataFrame(after_G.degree, columns=['repo name', 'degree']).sort_values(by='degree', ascending=False)
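
To read off per-node changes directly instead of comparing two ranking tables by eye, the two tables can be joined on the repository name. This is a minimal sketch on made-up data: the frames `before` and `after` and the repo names `x`, `y`, `z` are hypothetical stand-ins shaped like the ranking tables.

```python
import pandas as pd

# Made-up degree tables with the same columns as the ranking frames
before = pd.DataFrame({"repo name": ["x", "y", "z"], "degree": [5, 3, 1]})
after = pd.DataFrame({"repo name": ["x", "y", "z"], "degree": [4, 6, 1]})

# Inner join keeps only the repos present in both tables
change = before.merge(after, on="repo name", suffixes=("_before", "_after"))
change["degree_change"] = change["degree_after"] - change["degree_before"]
change = change.sort_values("degree_change", ascending=False)

print(change)
```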

In [13]:
before_degree_ranking.head(10)

| | repo name | degree |
| --- | --- | --- |
| 15 | ApolloAuto/apollo | 41 |
| 7 | udacity/CarND-Extended-Kalman-Filter-Project | 20 |
| 29 | deepdrive/deepdrive | 19 |
| 91 | bark-simulator/bark | 19 |
| 55 | BeamNG/BeamNGpy | 19 |
| 41 | joytafty-work/Rimchala_CarND-Traffic-Sign-Clas... | 19 |
| 38 | amodeus-science/amod | 19 |
| 102 | tier4/AutomanTools | 19 |
| 63 | TeamPerceptronics/Capstone | 18 |
| 75 | TeamDoernbach/SDCNanodegreeCapstone | 18 |


In [14]:
after_degree_ranking.head(10)

| | repo name | degree |
| --- | --- | --- |
| 99 | ApolloAuto/apollo | 39 |
| 93 | amodeus-science/amod | 21 |
| 144 | joytafty-work/Rimchala_CarND-LaneLines-P1 | 21 |
| 125 | tier4/AutomanTools | 21 |
| 147 | joytafty-work/Rimchala_CarND-Traffic-Sign-Clas... | 21 |
| 114 | deepdrive/deepdrive | 21 |
| 149 | Self-Driving-Car-NDegree-M-Russo/CarND-P1-Lane... | 21 |
| 112 | joytafty-work/Rimchala_CarND-Advanced-Lane-Lines | 21 |
| 130 | TeamDoernbach/SDCNanodegreeCapstone | 20 |
| 120 | DolphinDSL/dolphin | 20 |


---

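## Degree centrality per node

Compare degree centrality before and after link prediction.

The ranking is essentially the same as the degree ranking, because degree centrality is the degree divided by $n - 1$ (here $n = 154$).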
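
As a quick check of why the two rankings agree: NetworkX normalizes degree centrality by dividing each degree by the maximum possible degree, $n - 1$. The sketch below verifies this on a toy star graph (not the project data) and then applies the same formula to the ApolloAuto/apollo degrees reported above.

```python
import networkx as nx

toy_G = nx.star_graph(4)  # hub 0 connected to leaves 1..4 (5 nodes in total)
n = toy_G.number_of_nodes()

# Degree centrality computed by hand: degree / (n - 1)
manual = {node: deg / (n - 1) for node, deg in toy_G.degree}
library = nx.degree_centrality(toy_G)

max_gap = max(abs(manual[node] - library[node]) for node in toy_G)
print(max_gap)

# ApolloAuto/apollo: degree 41 before and 39 after, with n = 154 nodes
print(round(41 / 153, 6), round(39 / 153, 6))
```

Because the divisor is the same for every node in a network, dividing by it cannot change the order of the nodes.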

In [15]:
def make_degree_centrality_df(graph) :
    # nx.degree_centrality returns a {node: centrality} dict; turn it into a two-column table
    centrality = nx.degree_centrality(graph)
    data = pd.DataFrame(list(centrality.items()), columns=['repo name', 'degree centrality'])
    # Highest centrality first
    data.sort_values(by='degree centrality', ascending=False, inplace=True)
    
    return data

In [16]:
before_degree_centrality_ranking = make_degree_centrality_df(before_G)
after_degree_centrality_ranking = make_degree_centrality_df(after_G)

In [17]:
before_degree_centrality_ranking.head(10)

| | repo name | degree centrality |
| --- | --- | --- |
| 15 | ApolloAuto/apollo | 0.267974 |
| 7 | udacity/CarND-Extended-Kalman-Filter-Project | 0.130719 |
| 29 | deepdrive/deepdrive | 0.124183 |
| 91 | bark-simulator/bark | 0.124183 |
| 55 | BeamNG/BeamNGpy | 0.124183 |
| 41 | joytafty-work/Rimchala_CarND-Traffic-Sign-Clas... | 0.124183 |
| 38 | amodeus-science/amod | 0.124183 |
| 102 | tier4/AutomanTools | 0.124183 |
| 63 | TeamPerceptronics/Capstone | 0.117647 |
| 75 | TeamDoernbach/SDCNanodegreeCapstone | 0.117647 |


In [18]:
after_degree_centrality_ranking.head(10)

| | repo name | degree centrality |
| --- | --- | --- |
| 99 | ApolloAuto/apollo | 0.254902 |
| 93 | amodeus-science/amod | 0.137255 |
| 144 | joytafty-work/Rimchala_CarND-LaneLines-P1 | 0.137255 |
| 125 | tier4/AutomanTools | 0.137255 |
| 147 | joytafty-work/Rimchala_CarND-Traffic-Sign-Clas... | 0.137255 |
| 114 | deepdrive/deepdrive | 0.137255 |
| 149 | Self-Driving-Car-NDegree-M-Russo/CarND-P1-Lane... | 0.137255 |
| 112 | joytafty-work/Rimchala_CarND-Advanced-Lane-Lines | 0.137255 |
| 130 | TeamDoernbach/SDCNanodegreeCapstone | 0.130719 |
| 120 | DolphinDSL/dolphin | 0.130719 |


---

## Betweenness centrality per node

Compare betweenness centrality before and after link prediction.
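
For intuition about what is being compared: betweenness centrality measures the fraction of shortest paths between other node pairs that pass through a node. The sketch below uses a toy path graph (not the project data), where the middle node lies on the most shortest paths and the end nodes lie on none.

```python
import networkx as nx

path = nx.path_graph(5)  # 0 - 1 - 2 - 3 - 4

# Normalized by default: divided by the number of node pairs excluding the node itself,
# (n - 1)(n - 2) / 2 = 6 for n = 5
bc = nx.betweenness_centrality(path)

for node, value in sorted(bc.items()):
    print(node, round(value, 4))
```

Node 2 sits on the shortest paths of 4 of the 6 possible pairs, node 1 on 3 of them, and the end nodes on none. A node can therefore gain betweenness without gaining many edges if the new edges route more shortest paths through it.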
   

In [19]:
def make_betweenness_centrality_df(graph) :
    # nx.betweenness_centrality returns a {node: centrality} dict; turn it into a two-column table
    centrality = nx.betweenness_centrality(graph)
    data = pd.DataFrame(list(centrality.items()), columns=['repo_name', 'betweenness_centrality'])
    # Highest centrality first
    data.sort_values(by='betweenness_centrality', ascending=False, inplace=True)
    
    return data

In [20]:
before_betweenness_centrality_ranking = make_betweenness_centrality_df(before_G)
after_betweenness_centrality_ranking = make_betweenness_centrality_df(after_G)

In [21]:
before_betweenness_centrality_ranking.head(10)

| | repo_name | betweenness_centrality |
| --- | --- | --- |
| 15 | ApolloAuto/apollo | 0.209532 |
| 23 | carla-simulator/carla | 0.048435 |
| 60 | fzi-forschungszentrum-informatik/Lanelet2 | 0.039065 |
| 9 | microsoft/AirSim | 0.029906 |
| 7 | udacity/CarND-Extended-Kalman-Filter-Project | 0.01959 |
| 126 | carla-simulator/leaderboard | 0.013588 |
| 2 | autorope/donkeycar | 0.013201 |
| 129 | MRWolves/donkeycar | 0.013201 |
| 63 | TeamPerceptronics/Capstone | 0.009431 |
| 109 | udacity-om/CarND-Capstone | 0.009431 |


In [22]:
after_betweenness_centrality_ranking.head(10)

| | repo_name | betweenness_centrality |
| --- | --- | --- |
| 99 | ApolloAuto/apollo | 0.419133 |
| 94 | carla-simulator/carla | 0.239035 |
| 59 | erdos-project/erdos | 0.138975 |
| 17 | PacktPublishing/Hands-On-Vision-and-Behavior-f... | 0.130289 |
| 106 | fzi-forschungszentrum-informatik/oadrive-bsd | 0.103113 |
| 20 | tier4/AutowareArchitectureProposal.proj | 0.091743 |
| 125 | tier4/AutomanTools | 0.087655 |
| 107 | fzi-forschungszentrum-informatik/aadc2016 | 0.084365 |
| 6 | CankayaUniversity/ceng-407-408-2020-2021-3D-Ob... | 0.074303 |
| 30 | bark-simulator/bark-ml | 0.065617 |


---

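## Change in betweenness centrality per node

Comparing before and after link prediction, extract the 5 nodes with the largest increase or the largest decrease in betweenness centrality, measured as the relative change (after - before) / before.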

In [23]:
# Keep only the repos that also appear in the network after link prediction
in_both = before_betweenness_centrality_ranking.repo_name.isin(after_betweenness_centrality_ranking.repo_name)
new_before_betweenness_centrality_ranking = before_betweenness_centrality_ranking[in_both]

# Restore the original node order, then index both tables by repo name
# so that later arithmetic aligns rows by repository
new_before_betweenness_centrality_ranking = new_before_betweenness_centrality_ranking.sort_index().set_index('repo_name')
after_betweenness_centrality_ranking = after_betweenness_centrality_ranking.sort_index().set_index('repo_name')

In [28]:
# Absolute change (after - before), aligned on repo_name
difference_betweenness = after_betweenness_centrality_ranking.subtract(new_before_betweenness_centrality_ranking, axis=1)
# Convert to relative change: (after - before) / before
difference_betweenness['betweenness_centrality'] = np.divide(difference_betweenness['betweenness_centrality'], new_before_betweenness_centrality_ranking['betweenness_centrality'])
# Nodes whose betweenness was 0 before give inf (or NaN for 0/0); drop them
difference_betweenness.replace([np.inf], np.nan, inplace=True)
difference_betweenness = difference_betweenness.dropna(axis=0).sort_values(by='betweenness_centrality', ascending=False)
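
The relative change behaves differently from a plain difference when the starting value is zero or the final value is zero. The following sketch illustrates this on made-up values (the labels `a` to `d` and the numbers are hypothetical, not taken from the data).

```python
import numpy as np
import pandas as pd

# Made-up betweenness values before and after
before = pd.Series({"a": 0.10, "b": 0.00, "c": 0.20, "d": 0.00})
after = pd.Series({"a": 0.30, "b": 0.05, "c": 0.00, "d": 0.00})

# Relative change: "b" becomes inf (division by zero), "d" becomes NaN (0 / 0)
relative = (after - before) / before

# Drop the entries that have no meaningful relative change
relative = relative.replace([np.inf, -np.inf], np.nan).dropna()
relative = relative.sort_values(ascending=False)

print(relative)
```

A value of -1.0 means the betweenness dropped all the way to zero, which is the floor of this measure. Nodes that started at zero are excluded entirely, so a node that became a bridge only after link prediction does not appear in the ranking.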

In [34]:
new_before_betweenness_centrality_ranking.loc['bark-simulator/bark-ml', :]

betweenness_centrality    0.000344
Name: bark-simulator/bark-ml, dtype: float64

In [33]:
after_betweenness_centrality_ranking.loc['bark-simulator/bark-ml', :]

betweenness_centrality    0.065617
Name: bark-simulator/bark-ml, dtype: float64

In [35]:
difference_betweenness.head(15)

| repo_name | betweenness_centrality |
| --- | --- |
| purdue-arc/autonomous_car_RRT | 545.333333 |
| bark-simulator/bark-ml | 189.75 |
| lgsvl/simulator | 163.69405 |
| tier4/AutowareArchitectureProposal.proj | 93.826049 |
| tier4/AutomanTools | 11.701016 |
| whats-in-a-name/CarND-Capstone | 7.638412 |
| ZhuiFengChaseWind/Self-Driving_Car_Capstone | 4.349703 |
| carla-simulator/carla | 3.935166 |
| udacity/CarND-Unscented-Kalman-Filter-Project | 2.559616 |
| bedlamite5/CarND-Capstone | 1.32494 |


In [30]:
difference_betweenness.tail(10)

| repo_name | betweenness_centrality |
| --- | --- |
| unstoppables/Real-Self-Driving-Car | -0.880789 |
| bark-simulator/bark | -0.889223 |
| BeamNG/BeamNGpy | -0.942123 |
| usdot-fhwa-stol/carma-platform | -0.974684 |
| KNR-Selfie/selfie_carolocup2020 | -1.0 |
| Roboy/autonomous_driving_src | -1.0 |
| teamsoulless/self-driving-car-sim | -1.0 |
| KNR-Selfie/selfie_carolocup2019 | -1.0 |
| tareeq-av/tareeqav | -1.0 |
| tokyo-drift/capstone-project | -1.0 |


In [27]:
difference_betweenness.tail(10)

| repo_name | betweenness_centrality |
| --- | --- |
| joytafty-work/Rimchala_CarND-Vehicle-Detection | -0.880335 |
| waymo-research/waymo-open-dataset | -0.565008 |
| bark-simulator/bark | -0.889223 |
| usdot-fhwa-stol/carma-platform | -0.974684 |
| bmj-autonomous/donkey_ai | -0.653125 |
| BeamNG/BeamNGpy | -0.942123 |
| TeamDoernbach/SDCNanodegreeCapstone | -0.700166 |
| tareeq-av/tareeqav | -1.0 |
| unstoppables/Real-Self-Driving-Car | -0.880789 |
| carla-simulator/leaderboard | -0.594937 |
