## Prepare data

Download the edge data produced in the previous task.

In [None]:
!gdown 1sd5preIDrXtTo6PXjheiD8978kXBDOM5

Downloading...
From: https://drive.google.com/uc?id=1sd5preIDrXtTo6PXjheiD8978kXBDOM5
To: /content/edge_df.csv
100% 22.9k/22.9k [00:00<00:00, 25.1MB/s]


In [None]:
import pandas as pd
import networkx as nx
from networkx.algorithms import community

Load the edges into a pandas DataFrame.

In [None]:
edge_df = pd.read_csv('edge_df.csv')
edge_df

Unnamed: 0,u1,u2
0,39FT2Ui8KUXwmUt6hnwy-g,79yaBDbLASfIdB-C2c8DzA
1,0FVcoJko1kfZCrJRfssfIA,BDjiEmXljD2ZHT61Iv9rrQ
2,LcCRMIDz1JgshpPGYfLDcA,_VTEyUzzH92X3w-IpGaXVA
3,MtdSCXtmrSxj_uZOJ5ZycQ,tekHDsd0fskYG3tqu4sHQw
4,_VTEyUzzH92X3w-IpGaXVA,bHufZ2OTlC-OUxBDRXxViw
...,...,...
493,DkLSyxogCcJXY5DbTZ-f2A,bSUS0YcvS7UelmHvCzNWBA
494,Nf_Jw_W_CwOz5WJ7ApSMxg,tcWnoX_IfuDmlDl6o6y3_g
495,Nf_Jw_W_CwOz5WJ7ApSMxg,pDNeS1nbkKS7mJmhRQJPig
496,h-ajC_UHD0QAyAzySN6g2A,tcWnoX_IfuDmlDl6o6y3_g


## Task 2

Create an undirected NetworkX graph and add the edges to it.

In [None]:
graph = nx.Graph()
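# Use only the u1/u2 columns as endpoints so a saved index column is not passed as edge data.
graph.add_edges_from(edge_df[['u1', 'u2']].values)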

In [None]:
graph.number_of_edges()

498
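
As an alternative sketch, `nx.from_pandas_edgelist` builds the same kind of graph while naming the endpoint columns explicitly, so an extra column such as `Unnamed: 0` cannot end up in the graph by accident. The toy IDs below are made up for illustration and are not taken from `edge_df.csv`.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy edge list; the extra column mimics a saved CSV index.
toy_df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "u1": ["a", "b", "b"],
    "u2": ["b", "c", "a"],
})

# Only u1 and u2 are used as endpoints; other columns are ignored.
toy_graph = nx.from_pandas_edgelist(toy_df, source="u1", target="u2")

# An undirected Graph stores a-b and b-a as one edge,
# so the edge count can be smaller than the number of rows.
print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
```

Because `nx.Graph` is undirected and keeps at most one edge per node pair, `number_of_edges()` does not have to equal the number of rows in the DataFrame.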

Run the Girvan-Newman algorithm.

In [None]:
group = community.girvan_newman(graph)
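
Note that `girvan_newman` returns a generator rather than a finished result: each item is one level of the hierarchy, expressed as a tuple of node sets. The sketch below shows this on a small made-up graph (two triangles joined by a bridge), which is an assumption for illustration and not part of the task data.

```python
from itertools import islice

import networkx as nx
from networkx.algorithms import community

# Toy graph (hypothetical): two triangles joined by one bridge edge.
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

# Take only the first three levels; each level has one more
# community than the previous one.
levels = list(islice(community.girvan_newman(toy), 3))

for partition in levels:
    print(len(partition), sorted(sorted(c) for c in partition))
```

Since the generator is consumed as it is iterated, it has to be recreated if the levels are needed a second time.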

Choose the partition with the highest modularity.

In [None]:
best_modularity = float('-inf')
best_communities = []

# Walk through every level of the hierarchy and keep the partition with the highest modularity.
for communities in group:
    modularity_score = community.modularity(graph, communities)
    if modularity_score > best_modularity:
        best_modularity = modularity_score
        best_communities = communities

best_modularity

0.6872550442734798

Write the result to a file: one community per line, followed by the modularity on the last line.

In [None]:
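# Sort the members of each community, then order communities by size and by first member.
sorted_communities = sorted((sorted(c) for c in best_communities), key=lambda c: (len(c), c[0]))
# One community per line, members quoted and separated by ", ".
result = '\n'.join(', '.join(repr(node) for node in c) for c in sorted_communities)
with open("task2_result.txt", "w") as file2:
    file2.write(result)
    file2.write('\n' + str(best_modularity))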

### Compare the results of task 1 and task 2

Download the result files from task 1.

In [None]:
!gdown 17R1S3dkSp3nv_9-2Hy4YaNx_8UtR0-Mn
!gdown 1gCS5-i1idojFNw1PE69rRQOH492DsRHp

Downloading...
From: https://drive.google.com/uc?id=17R1S3dkSp3nv_9-2Hy4YaNx_8UtR0-Mn
To: /content/task1_2_result.txt
100% 5.77k/5.77k [00:00<00:00, 5.05MB/s]
Downloading...
From: https://drive.google.com/uc?id=1gCS5-i1idojFNw1PE69rRQOH492DsRHp
To: /content/task1_2_result2.txt
100% 5.77k/5.77k [00:00<00:00, 6.71MB/s]


Compare the results of the two tasks line by line.

In [None]:
# Compare the files line by line; the last line of each file holds the modularity.
with open("task1_2_result.txt", "r") as file1, open('task2_result.txt', 'r') as file2:
    same_result = [line1 == line2 for line1, line2 in zip(file1, file2)]
    print(same_result)

with open("task1_2_result.txt", "r") as file1, open('task2_result.txt', 'r') as file2:
    print('Modularity in task 1: ', file1.readlines()[-1])
    print('Modularity in task 2: ', file2.readlines()[-1])

[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False]
Modularity in task 1:  0.6872550442734674
Modularity in task 2:  0.6872550442734798


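- The communities are identical in both tasks: every community line matches, and only the last line (the modularity value) differs.
- The modularities agree to 13 decimal places (0.6872550442734674 vs. 0.6872550442734798). The remaining gap of about 1e-14 comes from the different formula used in task 1.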
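
Because the last line is compared as text, even a difference of this size shows up as `False`. A minimal sketch of a more tolerant check is given below: community lines are compared exactly, while the modularity line is compared numerically with `math.isclose`. The helper name `compare_results`, the tolerance, and the community lines in the stand-in files are assumptions for illustration; only the two modularity values come from the output above.

```python
import io
import math

def compare_results(file_a, file_b, rel_tol=1e-9):
    """Compare community lines exactly and the final modularity line numerically."""
    lines_a = file_a.read().splitlines()
    lines_b = file_b.read().splitlines()
    same_communities = lines_a[:-1] == lines_b[:-1]
    close_modularity = math.isclose(float(lines_a[-1]), float(lines_b[-1]), rel_tol=rel_tol)
    return same_communities, close_modularity

# In-memory stand-ins for the two result files (community lines are made up).
task1 = io.StringIO("'a', 'b'\n'c', 'd', 'e'\n0.6872550442734674")
task2 = io.StringIO("'a', 'b'\n'c', 'd', 'e'\n0.6872550442734798")

print(compare_results(task1, task2))
```

With real files, the same function could be called on the two objects returned by `open(...)`.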

Compare the results of the two tasks again, this time with task 1 using a modularity function that replicates the NetworkX modularity.

In [None]:
# Same comparison, now against the task 1 result computed with the NetworkX-style modularity.
with open("task1_2_result2.txt", "r") as file1, open('task2_result.txt', 'r') as file2:
    same_result = [line1 == line2 for line1, line2 in zip(file1, file2)]
    print(same_result)

with open("task1_2_result2.txt", "r") as file1, open('task2_result.txt', 'r') as file2:
    print('Modularity in task 1: ', file1.readlines()[-1])
    print('Modularity in task 2: ', file2.readlines()[-1])

[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]
Modularity in task 1:  0.6872550442734798
Modularity in task 2:  0.6872550442734798


- Now the two result files are identical, including the modularity value.
- The modularity function written for task 1 therefore reproduces the modularity that NetworkX computes.
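
For reference, the modularity that NetworkX documents for `community.modularity` sums, over all communities, the fraction of edges inside the community minus the squared fraction of total degree it holds. The sketch below is a plain-Python version of that formula under stated assumptions (undirected, unweighted, no self-loops, resolution 1). It is not the task 1 code, which is not shown here, and the toy graph is made up.

```python
def modularity(edges, communities):
    """Q = sum over communities of L_c / m - (d_c / (2 * m)) ** 2."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    q = 0.0
    for nodes in communities:
        nodes = set(nodes)
        # L_c: edges with both endpoints inside the community.
        internal = sum(1 for u, v in edges if u in nodes and v in nodes)
        # d_c: total degree of the community's nodes.
        total_degree = sum(degree.get(n, 0) for n in nodes)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

# Toy graph (hypothetical): two triangles joined by one bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_q = modularity(toy_edges, [{0, 1, 2}, {3, 4, 5}])
print(toy_q)  # 5/14, about 0.357
```

Two implementations of this formula can differ in the last few digits if they sum the terms in a different order, which is one reason to compare modularity values with a tolerance rather than as text.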