# Week 15: Chinese Social Network Analysis

**Author** : 蔡俊宏

**Date created** : 2023/05/14

In [None]:
! pip install pandas
! pip install numpy
! pip install pyvis
! pip install networkx
! pip install IPython

In [2]:
import pandas as pd
import numpy as np
import pyvis
import networkx as nx
import IPython

# Mount the cloud drive folder

In [3]:
import os

from google.colab import drive
drive.mount('/content/drive')

os.chdir('/content/drive/MyDrive/31LAB/2024SMA/week15')  # switch to the working directory
os.listdir()  # confirm the directory contents

Mounted at /content/drive


['net_func.py',
 'en_person_net.html',
 '.DS_Store',
 '__pycache__',
 'data',
 'pers_netWork.html',
 'pers_eig_netWork.html',
 'pers_page_netWork.html',
 'pers_out_netWork.html',
 'pers_in_netWork.html',
 'pers_bet_netWork.html',
 'max_sub.html',
 'person_net.html',
 '.ipynb_checkpoints',
 'basic_netWork.html',
 'week15_en.ipynb',
 'week15.ipynb']

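# Load the data
Using the same PTT Food board articles about all-you-can-eat ("吃到飽") restaurants as in Week 3, we examine the relationships between posters and commenters.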

In [3]:
raw_data = pd.read_csv('./data/zh_buffet_20_22.csv')
raw_data = raw_data[raw_data.artComment != '[]']
raw_data = raw_data.sample(round(raw_data.shape[0]/4), random_state=2024)
print(raw_data.shape)
raw_data.head()

(123, 11)


Unnamed: 0,system_id,artUrl,artTitle,artDate,artPoster,artCatagory,artContent,artComment,e_ip,insertedDate,dataSource
969,970,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,[食記]台北新這一鍋台北ATT大直殿，畫畫很有趣,2021-06-10 20:21:33,minglong1985,Food,餐廳名稱：新這一鍋台北ATT大直殿\r\n 消費時間：2021年/1月\r\n 地址...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""oc4r"", ""cmtC...",203.75.119.252,2021-06-11 00:16:07,ptt
1395,1396,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,[食記]台中焼肉スマイル一人說走就走燒肉店,2022-05-30 14:03:07,OOOSAMU,Food,餐廳名稱：燒肉Smile 台中逢甲店\r\n消費時間：2022年5月15日\r\n地址：台中...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""coolliang"", ...",125.230.249.88,2022-05-31 01:12:19,ptt
4,5,https://www.ptt.cc/bbs/Food/M.1578125940.A.9AA...,[食記]長榮皇璽桂冠艙飛機餐SEA-TPE聖誕好運,2020-01-04 08:12:57,Sherlock56,Food,店名：長榮皇璽桂冠艙飛機餐\r\n 地址：無\r\n 電話：無\r\n 營業時間...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""swatseal"", ""...",121.109.166.209,2020-01-05 00:21:43,ptt
776,777,https://www.ptt.cc/bbs/Food/M.1612754932.A.B7D...,[請益]台北除夕還有得訂的餐廳（中式為佳）,2021-02-08 11:28:50,chichan,Food,"如題,因為計畫改變臨時要找除夕餐廳\r\n小家庭三個人，每人2000之內。\r\n吃到飽或中...","[{""cmtStatus"": ""推"", ""cmtPoster"": ""oc4r"", ""cmtC...",61.216.71.121,2021-02-09 00:19:55,ptt
983,984,https://www.ptt.cc/bbs/Food/M.1624103697.A.14B...,[食記]桃園市外帶歡樂時光的Mr.May義式百匯,2021-06-19 19:54:55,vicky11016,Food,餐廳名稱：Mr. May 義式百匯\r\n 消費時間：2021年/6月\r\n 地址...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""btcocomo"", ""...",219.71.163.8,2021-06-20 00:12:54,ptt


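# Poster and commenter relationships

Extract the commenter and the status of each comment (推 = push/upvote, → = neutral arrow, 噓 = boo/downvote)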

In [4]:
import ast

# Parse all comments of one article and return two parallel lists:
# the commenter IDs and the comment statuses.
def getComtInfo(com):
  cmters, cmt_statuss = [], []
  # literal_eval parses the list-of-dicts literal without executing arbitrary code (unlike eval)
  for i in ast.literal_eval(com):
    cmters.append(i['cmtPoster'])
    cmt_statuss.append(i['cmtStatus'])
  return pd.Series([cmters, cmt_statuss])

raw_data[['artComter','artStatus']] = raw_data['artComment'].apply(getComtInfo)
raw_data.head()

Unnamed: 0,system_id,artUrl,artTitle,artDate,artPoster,artCatagory,artContent,artComment,e_ip,insertedDate,dataSource,artComter,artStatus
969,970,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,[食記]台北新這一鍋台北ATT大直殿，畫畫很有趣,2021-06-10 20:21:33,minglong1985,Food,餐廳名稱：新這一鍋台北ATT大直殿\r\n 消費時間：2021年/1月\r\n 地址...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""oc4r"", ""cmtC...",203.75.119.252,2021-06-11 00:16:07,ptt,"[oc4r, yvonneeeee, yvonneeeee]","[推, 推, →]"
1395,1396,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,[食記]台中焼肉スマイル一人說走就走燒肉店,2022-05-30 14:03:07,OOOSAMU,Food,餐廳名稱：燒肉Smile 台中逢甲店\r\n消費時間：2022年5月15日\r\n地址：台中...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""coolliang"", ...",125.230.249.88,2022-05-31 01:12:19,ptt,"[coolliang, CAINPT, CAINPT, bibibobo5566, bibi...","[推, →, →, 推, →, →, 推, 推, →, →, →, 推, →, →, 推, ..."
4,5,https://www.ptt.cc/bbs/Food/M.1578125940.A.9AA...,[食記]長榮皇璽桂冠艙飛機餐SEA-TPE聖誕好運,2020-01-04 08:12:57,Sherlock56,Food,店名：長榮皇璽桂冠艙飛機餐\r\n 地址：無\r\n 電話：無\r\n 營業時間...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""swatseal"", ""...",121.109.166.209,2020-01-05 00:21:43,ptt,[swatseal],[推]
776,777,https://www.ptt.cc/bbs/Food/M.1612754932.A.B7D...,[請益]台北除夕還有得訂的餐廳（中式為佳）,2021-02-08 11:28:50,chichan,Food,"如題,因為計畫改變臨時要找除夕餐廳\r\n小家庭三個人，每人2000之內。\r\n吃到飽或中...","[{""cmtStatus"": ""推"", ""cmtPoster"": ""oc4r"", ""cmtC...",61.216.71.121,2021-02-09 00:19:55,ptt,"[oc4r, oc4r, save, au0303, winston06, dyc2008,...","[推, →, 推, →, →, →, 推, →]"
983,984,https://www.ptt.cc/bbs/Food/M.1624103697.A.14B...,[食記]桃園市外帶歡樂時光的Mr.May義式百匯,2021-06-19 19:54:55,vicky11016,Food,餐廳名稱：Mr. May 義式百匯\r\n 消費時間：2021年/6月\r\n 地址...,"[{""cmtStatus"": ""推"", ""cmtPoster"": ""btcocomo"", ""...",219.71.163.8,2021-06-20 00:12:54,ptt,[btcocomo],[推]
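As a minimal sketch of the parsing step above, the snippet below applies the same idea to a hand-made comment string and then expands it to one row per comment. The sample string and the names `sample`, `toy`, and `toy_long` are illustrative assumptions, not rows from the dataset.

```python
import ast
import pandas as pd

# Hypothetical comment field in the same list-of-dicts shape as artComment
sample = '[{"cmtStatus": "推", "cmtPoster": "userA"}, {"cmtStatus": "→", "cmtPoster": "userB"}]'

parsed = ast.literal_eval(sample)            # list of dicts
posters = [c["cmtPoster"] for c in parsed]   # commenter IDs
statuses = [c["cmtStatus"] for c in parsed]  # comment statuses

# One article row holding two parallel lists
toy = pd.DataFrame({"artPoster": ["author1"], "artComter": [posters], "artStatus": [statuses]})

# explode on both list columns yields one row per comment
toy_long = toy.explode(["artComter", "artStatus"])
print(toy_long)
```

Exploding several columns at once requires the lists in each row to have equal length, which holds here because both lists come from the same parsed comments.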


Expand the data to one row per comment

In [5]:
raw_data = raw_data.explode(['artComter','artStatus'])
socail_data = raw_data[['artUrl','artPoster','artComter','artStatus']]
socail_data.head(10)

Unnamed: 0,artUrl,artPoster,artComter,artStatus
969,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,minglong1985,oc4r,推
969,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,minglong1985,yvonneeeee,推
969,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,minglong1985,yvonneeeee,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,coolliang,推
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,CAINPT,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,CAINPT,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,bibibobo5566,推
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,bibibobo5566,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,CAINPT,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,kopuck,推


Inspect the comment statuses

In [6]:
socail_data.artStatus.unique()

array(['推', '→', '噓'], dtype=object)

Build the edge data

In [7]:
import random

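# Poster -> article edges
po_df = socail_data[['artPoster','artUrl']].drop_duplicates().rename(columns = {'artPoster':'src','artUrl':'dis'})

# Draw 300 article URLs with replacement (random.choices), so the number of
# distinct articles kept is at most the number of unique URLs
random.seed(2024)
sample_url = random.choices(po_df.dis.unique().tolist(),k=300)
po_df = po_df[po_df.dis.isin(sample_url)]

# Commenter -> article edges; the comment status becomes the edge weight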
re_df = socail_data[['artComter','artUrl','artStatus']].rename(columns = {'artComter':'src','artUrl':'dis','artStatus':'weight'})
re_df = re_df[re_df.dis.isin(sample_url)]
re_df = re_df[~re_df['src'].isna()]
re_df.head()

Unnamed: 0,src,dis,weight
969,oc4r,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,推
969,yvonneeeee,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,推
969,yvonneeeee,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,→
1395,coolliang,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,推
1395,CAINPT,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,→


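Compute each commenter's total comment score for each article
- Convert the edge status into a numeric weight (推 = 2, → = 1, anything else, i.e. 噓, = -1)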

In [8]:
def convertStatus(s):
  if s == '推':
    return 2
  elif s == '→':
    return 1
  else :
    return -1
re_df['weight'] = re_df['weight'].map(convertStatus)
# Sum the weights to get each commenter's total score for each article
re_df = re_df.groupby(['src','dis']).sum().reset_index()
re_df

Unnamed: 0,src,dis,weight
0,AlphaD,https://www.ptt.cc/bbs/Food/M.1579766074.A.BB5...,2
1,AlphaD,https://www.ptt.cc/bbs/Food/M.1584978030.A.0C4...,4
2,AlphaD,https://www.ptt.cc/bbs/Food/M.1597004755.A.080...,1
3,Andriy6016,https://www.ptt.cc/bbs/Food/M.1639747922.A.680...,2
4,Arutha,https://www.ptt.cc/bbs/Food/M.1587898980.A.623...,7
...,...,...,...
385,yvonneeeee,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,3
386,zeroyaking,https://www.ptt.cc/bbs/Food/M.1618113730.A.DDB...,2
387,zhmmg25,https://www.ptt.cc/bbs/Food/M.1593124291.A.00E...,2
388,zhubaby,https://www.ptt.cc/bbs/Food/M.1660050803.A.92E...,1
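The scoring step can be sketched on a toy edge list. The data and the names `STATUS_WEIGHT`, `toy_edges`, and `totals` are assumptions for illustration; the weights mirror `convertStatus`, except that a dictionary lookup would give `NaN` for an unknown status, whereas `convertStatus` maps everything else to -1.

```python
import pandas as pd

# Same weights as convertStatus: push = 2, arrow = 1, boo = -1
STATUS_WEIGHT = {"推": 2, "→": 1, "噓": -1}

# Hypothetical comments: u1 pushes then adds an arrow, u2 boos then adds an arrow
toy_edges = pd.DataFrame({
    "src": ["u1", "u1", "u2", "u2"],
    "dis": ["post1", "post1", "post1", "post1"],
    "weight": ["推", "→", "噓", "→"],
})
toy_edges["weight"] = toy_edges["weight"].map(STATUS_WEIGHT)

# Total score per (commenter, article) pair
totals = toy_edges.groupby(["src", "dis"], as_index=False)["weight"].sum()
print(totals)
```

Here `u1` totals 3 and `u2` totals 0, so mixed reactions can cancel out within a single commenter-article pair.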


In [9]:
# Color edges red when the total score is non-positive, green otherwise
def getColor(w):
  if w>0:
    return 'green'
  else:
    return 'red'
re_df['color'] = re_df.weight.map(getColor)
re_df

Unnamed: 0,src,dis,weight,color
0,AlphaD,https://www.ptt.cc/bbs/Food/M.1579766074.A.BB5...,2,green
1,AlphaD,https://www.ptt.cc/bbs/Food/M.1584978030.A.0C4...,4,green
2,AlphaD,https://www.ptt.cc/bbs/Food/M.1597004755.A.080...,1,green
3,Andriy6016,https://www.ptt.cc/bbs/Food/M.1639747922.A.680...,2,green
4,Arutha,https://www.ptt.cc/bbs/Food/M.1587898980.A.623...,7,green
...,...,...,...,...
385,yvonneeeee,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,3,green
386,zeroyaking,https://www.ptt.cc/bbs/Food/M.1618113730.A.DDB...,2,green
387,zhmmg25,https://www.ptt.cc/bbs/Food/M.1593124291.A.00E...,2,green
388,zhubaby,https://www.ptt.cc/bbs/Food/M.1660050803.A.92E...,1,green


Build the network graph

In [10]:
# People are green nodes
# Posts are orange nodes

# Create a network graph
netWork = pyvis.network.Network(notebook=True, cdn_resources='in_line',directed=True)
# All posters and commenters
person = list(set(po_df.src.unique().tolist()+re_df.src.unique().tolist()))
url = po_df.dis.unique().tolist()

# Add nodes (people)
netWork.add_nodes(
    nodes = person,
    value = [1 for i in range(len(person))],
    color = ['#66CDAA' for i in range(len(person))],
    title = person
)
# Add nodes (articles)
netWork.add_nodes(
    nodes = url,
    value = [2 for i in range(len(url))],
    color = ['#FFB366' for i in range(len(url))],
    title = url
)

# Add edges (poster -> article)
for i in po_df.to_numpy():
  netWork.add_edge(i[0],i[1],width = 2,color='grey')
# Add edges (commenter -> article); the color reflects the commenter's total score for that article (>0: green; <=0: red)
for i in re_df.to_numpy():
  netWork.add_edge(i[0],i[1],width = 2,color=i[3])

# Set the layout: repulsion between nodes
netWork.repulsion()

# netWork.show("./basic_netWork.html")
netWork.save_graph("./basic_netWork.html")
IPython.display.HTML('basic_netWork.html')

# Relationships among users

In [11]:
pos_cmt = socail_data.copy()
pos_cmt = pos_cmt[~pos_cmt.artComter.isna()]
pos_cmt.head(10)

Unnamed: 0,artUrl,artPoster,artComter,artStatus
969,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,minglong1985,oc4r,推
969,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,minglong1985,yvonneeeee,推
969,https://www.ptt.cc/bbs/Food/M.1623327695.A.AAB...,minglong1985,yvonneeeee,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,coolliang,推
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,CAINPT,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,CAINPT,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,bibibobo5566,推
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,bibibobo5566,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,CAINPT,→
1395,https://www.ptt.cc/bbs/Food/M.1653890591.A.D75...,OOOSAMU,kopuck,推


In [12]:
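pos_cmt['score'] = pos_cmt['artStatus'].map(convertStatus)

# Total score each commenter gives to each poster
# (select the score column explicitly instead of passing 'score' to sum())
pos_cmt = pos_cmt.groupby(['artComter','artPoster'])['score'].sum().reset_index()
# Keep non-negative totals and drop self-comments
pos_cmt = pos_cmt[pos_cmt.score>=0]
pos_cmt = pos_cmt[pos_cmt.artComter != pos_cmt.artPoster]
# pos_cmt.score = 1
pos_cmt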

Unnamed: 0,artComter,artPoster,score
1,AlphaD,NouTsan,4
2,AlphaD,gillianshine,2
3,Andriy6016,michaelsc,2
4,Arutha,NouTsan,7
5,Augustus5,Sherlock56,2
...,...,...,...
393,yvonneeeee,minglong1985,3
394,zeroyaking,reesion,2
395,zhmmg25,roger31311,2
396,zhubaby,HIKARU5,1


In [13]:
mat = pd.pivot_table(pos_cmt,index = 'artComter', columns = 'artPoster' ,values='score').fillna(0)
mat

artComter \ artPoster,AlphaD,Ashaku,EachGone,FinalFayjais,Guyinkt,HIKARU5,ILB1800600,IkarusWillie,JeremyKSKGA,JimXpp,...,traveler0,treebeard,vhygdih,vicky11016,windsora,wingwn,yajuyeh,z24518261,zine1215,zizou
AlphaD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Andriy6016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Arutha,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Augustus5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B10057090,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
yvonneeeee,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
zeroyaking,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
zhmmg25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
zhubaby,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
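The pivot table above is rectangular: rows are commenters and columns are posters. A minimal sketch of turning such a table into a square person-by-person matrix, using hypothetical toy data and the assumed names `toy_scores`, `people`, and `wide`:

```python
import pandas as pd

# Hypothetical commenter -> poster totals
toy_scores = pd.DataFrame({
    "artComter": ["a", "b", "c"],
    "artPoster": ["b", "a", "a"],
    "score": [3, 1, 2],
})

# Everyone who appears as commenter or poster, in a fixed order
people = sorted(set(toy_scores["artComter"]) | set(toy_scores["artPoster"]))

# Pivot, then reindex rows and columns to the same list so the matrix is square
wide = (
    pd.pivot_table(toy_scores, index="artComter", columns="artPoster", values="score")
    .reindex(index=people, columns=people)
    .fillna(0)
)
print(wide)
```

Person `c` never posts, so the pivot alone has no column for `c`; reindexing adds it as a column of zeros, which keeps row `i` and column `i` referring to the same person.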


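Compute the score matrices between users
- mat: directed; the A -> B score and the B -> A score are not necessarily equal.
- mat_s: undirected (total interaction score); the A -> B score plus the B -> A score.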

In [14]:
# All commenters and posters
pers = np.unique(pos_cmt[['artComter', 'artPoster']])

# Build the directed score matrix (commenter -> poster)
# mat[i][j] is the score that user pers[i] (as commenter) gives to user pers[j] (as poster);
# the relation is directed, so mat[i][j] need not equal mat[j][i]
mat = pd.pivot_table(pos_cmt,index = 'artComter', columns = 'artPoster' ,values='score' ).fillna(0)\
  .reindex(columns=pers, index=pers, fill_value=0).to_numpy()
print(mat.shape)

# Sum both directions to get the undirected interaction totals
# mat_s is symmetric: mat_s[i][j] == mat_s[j][i] is the total interaction score between
# pers[i] and pers[j], i.e. pers[i]'s score for pers[j] plus pers[j]'s score for pers[i]
# np.tril(mat, -1): lower triangle (strictly below the main diagonal)
# np.triu(mat, 1): upper triangle (strictly above the main diagonal)
tri = (np.tril(mat,-1).T + np.triu(mat,1))
mat_s = tri+tri.T # both triangles are filled; the diagonal is 0
mat_s


(377, 377)


array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
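To make the triangle arithmetic concrete, here is a small worked example on a hypothetical 3 x 3 directed matrix (the names `directed` and `undirected` are assumptions). It also checks that, when the diagonal is zero, the result equals the matrix plus its transpose.

```python
import numpy as np

# Hypothetical directed scores: directed[i][j] is i's score for j
directed = np.array([
    [0., 2., 0.],
    [1., 0., 0.],
    [4., 0., 0.],
])

# Fold the lower triangle onto the upper triangle, then mirror it back
tri = np.tril(directed, -1).T + np.triu(directed, 1)
undirected = tri + tri.T
print(undirected)

# With a zero diagonal this is the same as adding the transpose
print(np.array_equal(undirected, directed + directed.T))
```

In this toy case users 0 and 1 exchange scores 2 and 1, so their undirected total is 3; users 0 and 2 interact in one direction only, so their total stays 4.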

matPresentGraph(): converts a matrix into a visual network graph
- Uses pyvis to draw the network and sets the node and edge colors.

In [15]:
# Generate a random hex color for plotting
def random_color():
  r = lambda: random.randint(0,255)
  return '#%02X%02X%02X' % (r(),r(),r())

def matPresentGraph(mat:np.ndarray, node_id:list, node_type:list=None, node_value:list=None, directed=True, edge_color=None) -> pyvis.network.Network:
  # For an undirected graph, report whether the matrix is symmetric
  if not directed:
    if (mat == mat.transpose()).all():
      print('matrix is Symmetric')
    else:
      print('matrix is not Symmetric')

  # Assign node categories and colors
  # No categories given: every node shares one random color
  if node_type is None:
    c = random_color()
    node_colors = [c for i in range(len(node_id))]
    node_type = [" " for i in range(len(node_id))]
  # Categories given: one distinct random color per category
  else:
    node_color_map = {}
    for i in set(node_type):
      # Redraw until the color is not already used by another category
      while True:
        c = random_color()
        if c not in node_color_map.values():
          break
      node_color_map[i] = c
    node_colors = [node_color_map[i] for i in node_type]
  # print(node_colors)

  # If no edge color is given, pick a random one
  if edge_color is None:
    edge_color = random_color()

  # If no node values are given, assign 1 to every node
  if node_value is None:
    node_value = [1 for i in range(len(node_id))]

  # Create the graph
  net = pyvis.network.Network(notebook=True, directed = directed, cdn_resources='in_line')

  # Hover title for each node: "<id>:<type>"
  titles_list = []
  for i ,j in zip(node_id,node_type):
    titles_list.append(str(i)+":"+str(j))

  net.add_nodes(
      nodes = node_id,
      value = node_value,
      label = node_id,
      title = titles_list,
      color = node_colors
  )

  # Add one edge per positive matrix entry; the entry is used as edge width and hover title
  for row in range(len(node_id)):
    for col in range(len(node_id)):
      if mat[row][col]>0.:
        net.add_edge(
            node_id[row],node_id[col],width = mat[row][col],color = edge_color,title = mat[row][col]
        )
  net.repulsion()
  return net


Set each user's node type (both poster and commenter / poster only / commenter only)

In [16]:
# Label each person as poster (po), commenter (cmt), or both (both)
node_type = []

cmt_list = pos_cmt['artComter'].unique().tolist()
po_list = pos_cmt['artPoster'].unique().tolist()
both_list = list(set(cmt_list) & set(po_list))
for p in pers:
  if p in both_list:
    node_type.append('both')

  elif p in cmt_list:
    node_type.append('cmt')

  elif p in po_list:
    node_type.append('po')


In [17]:
net = matPresentGraph(mat=mat, node_id=pers, node_type=node_type)
net.save_graph("./pers_netWork.html")
IPython.display.HTML('pers_netWork.html')

# Measures on node
- Goal: find the influential nodes (people)
- Methods:
    1. Eigenvector centrality
    2. PageRank
    3. HITS score
    4. Betweenness centrality

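## Computing eigenvector centrality (undirected graph)
- Idea: the more important the people you are connected to, the more important (influential) you are
- A neighbor with higher importance contributes more importance in turn

Eigenvector centrality measures the importance of a node in a network: each node's score is computed from the importance of the nodes it is connected to. After normalization, the scores are easier to compare and visualize.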
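The idea that a node inherits importance from its neighbors can be sketched as a plain power iteration on the adjacency matrix. This is an illustrative sketch, not necessarily the exact procedure NetworkX uses; the function name `eigenvector_centrality_power`, its parameters, and the star-graph example are assumptions. The iteration multiplies by `A + I` rather than `A`, which keeps the same eigenvectors but avoids oscillation on bipartite graphs such as a star.

```python
import numpy as np

def eigenvector_centrality_power(adj, max_iter=1000, tol=1e-10):
    """Power iteration for the dominant eigenvector of a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    x = np.ones(n) / np.sqrt(n)           # start from a uniform vector
    for _ in range(max_iter):
        x_new = adj @ x + x               # each node sums its neighbors' scores (plus its own)
        x_new /= np.linalg.norm(x_new)    # rescale to unit Euclidean length
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    raise RuntimeError("power iteration did not converge")

# Hypothetical star graph: node 0 is connected to nodes 1, 2, 3
star = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
centrality = eigenvector_centrality_power(star)
print(centrality)
```

The center of the star ends up with the largest score because every other node feeds into it, while the three leaves receive equal scores.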

In [18]:
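# nx.Graph(mat_s): convert the symmetric matrix mat_s into a NetworkX undirected graph object
# nx.eigenvector_centrality: compute the eigenvector centrality of every node in the graph
# max_iter=10000: maximum number of iterations. The centrality is computed iteratively and may need
# many iterations to stabilize, so a high limit makes it less likely to stop before converging.
eigenvec = np.array(list(nx.eigenvector_centrality(nx.Graph(mat_s),max_iter = 10000).values()))

# Min-max normalize the eigenvector centrality values to the range [0, 1]
eigenvec = (eigenvec-np.min(eigenvec))/(np.max(eigenvec)-np.min(eigenvec))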

In [19]:
q = np.quantile(eigenvec,[.2,.4,.6,.8])
q

array([4.85293489e-05, 3.33435065e-04, 2.18341652e-03, 1.58488730e-02])

In [20]:
node_value = []


for i in eigenvec:
  # More important nodes get a larger value (node size)
  if i > q[3]:
    node_value.append(25)
  elif i >q[2]:
    node_value.append(20)
  elif i>q[1]:
    node_value.append(15)
  elif i>q[0]:
    node_value.append(10)
  else:
    node_value.append(5)
net = matPresentGraph(mat = mat_s,node_id = pers,node_type = node_type,node_value = node_value,directed=False)
net.save_graph("./pers_eig_netWork.html")
IPython.display.HTML('pers_eig_netWork.html')


matrix is Symmetric


We can see that users whose articles draw comments from many people at once have higher importance (larger nodes)
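The quantile-to-size mapping used for the node sizes recurs for every measure in this notebook. One possible way to factor it out is sketched below with `np.digitize`; the helper name `quantile_sizes` and its parameters are hypothetical and not part of the original notebook.

```python
import numpy as np

def quantile_sizes(scores, sizes=(5, 10, 15, 20, 25), probs=(0.2, 0.4, 0.6, 0.8)):
    """Map each score to a node size according to the quantile band it falls in."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.quantile(scores, probs)
    # right=True places a score equal to a cut point in the lower band,
    # matching the strict "score > cut" comparisons of the if/elif chain
    bands = np.digitize(scores, cuts, right=True)
    return [sizes[b] for b in bands]

# Hypothetical scores 0..10
example_sizes = quantile_sizes(range(11))
print(example_sizes)
```

Each call replaces one `if`/`elif` block, for example `node_value = quantile_sizes(eigenvec)`, under the assumption that the same five sizes and four cut points are wanted.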

## PageRank (directed graph)
- In a directed graph, nodes with a high PageRank are typically those that many other nodes link to, or that a few important nodes link to

In [21]:
# DiGraph builds a directed graph
# nx.DiGraph(mat): convert the matrix mat into a NetworkX directed graph object
# nx.pagerank(): compute the PageRank of every node in the graph
pagerank = np.array(list(nx.pagerank(nx.DiGraph(mat)).values()))
# pagerank = (pagerank-np.min(pagerank))/(np.max(pagerank)-np.min(pagerank))
pagerank

array([0.0018235 , 0.00155862, 0.00155862, 0.00685613, 0.00155862,
       0.00155862, 0.00155862, 0.00155862, 0.00155862, 0.00155862,
       0.00155862, 0.00155862, 0.00354519, 0.00155862, 0.00155862,
       0.00222081, 0.00155862, 0.00155862, 0.00155862, 0.00486956,
       0.00155862, 0.00990377, 0.00155862, 0.00597321, 0.002883  ,
       0.00155862, 0.00155862, 0.01822429, 0.002883  , 0.00155862,
       0.00155862, 0.00155862, 0.00155862, 0.00155862, 0.00155862,
       0.00155862, 0.002883  , 0.00155862, 0.00155862, 0.00155862,
       0.00155862, 0.00155862, 0.00155862, 0.00934604, 0.00155862,
       0.01375393, 0.00155862, 0.00155862, 0.00155862, 0.002883  ,
       0.00155862, 0.02160086, 0.01122657, 0.00155862, 0.00155862,
       0.00155862, 0.00155862, 0.00307219, 0.00155862, 0.00155862,
       0.00155862, 0.00155862, 0.00619394, 0.00155862, 0.00155862,
       0.00155862, 0.00155862, 0.00155862, 0.01756735, 0.00155862,
       0.00155862, 0.00672197, 0.00155862, 0.00155862, 0.00155

In [22]:
net = matPresentGraph(mat = mat,node_id = pers,node_type = node_type,node_value=(pagerank*1000).tolist())
net.save_graph("./pers_page_netWork.html")
IPython.display.HTML('pers_page_netWork.html')

Nodes with a higher in-degree tend to be ranked as more important
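A tiny directed graph illustrates both effects of PageRank: many incoming links, and an incoming link from an important node. The graph and the names `toy`, `ranks`, and `top` are hypothetical.

```python
import networkx as nx

# Hypothetical graph: a, b and c all point to "hub"; "hub" points back to a only
toy = nx.DiGraph()
toy.add_edges_from([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])

ranks = nx.pagerank(toy)          # default damping factor alpha=0.85
top = max(ranks, key=ranks.get)   # node with the highest PageRank
print(top, ranks)
```

Here `hub` ranks first because three nodes link to it, and `a` outranks `b` and `c` even though it has a single incoming link, because that link comes from the highly ranked `hub`.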

## HITS score (directed graph)
- Web page importance measures
    - index page (links out to other pages): a higher hub score means the content pages it links to have high authority scores
    - content page: a higher authority score means that more index pages with high hub scores point to it
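The mutual definition of hub and authority scores can be sketched as a simple alternating update. This is an illustrative sketch under stated assumptions, not the routine `nx.hits` runs internally: the function name `hits_power`, the fixed iteration count, and the toy graph are hypothetical, and the graph is assumed to contain at least one edge.

```python
import numpy as np

def hits_power(adj, n_iter=50):
    """Alternate authority and hub updates on adjacency matrix adj (adj[i][j] = 1 if i -> j)."""
    adj = np.asarray(adj, dtype=float)
    hubs = np.ones(adj.shape[0])
    for _ in range(n_iter):
        auth = adj.T @ hubs    # authority: sum of hub scores of the nodes pointing in
        hubs = adj @ auth      # hub: sum of authority scores of the nodes pointed to
        auth = auth / auth.sum()   # normalize each score vector to sum to 1
        hubs = hubs / hubs.sum()
    return hubs, auth

# Hypothetical graph: node 0 -> 2, node 0 -> 3, node 1 -> 3
toy_adj = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
hub_scores, auth_scores = hits_power(toy_adj)
print(hub_scores, auth_scores)
```

Node 3 gets the highest authority because both linking nodes point to it, and node 0 gets the highest hub score because it points to every authority.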

In [23]:
# nx.hits computes the hub and authority values of every node. HITS gives each node two roles:
    # Hub: a good hub links to many authority nodes.
    # Authority: a good authority is linked to by many hub nodes.

# out_: the hub value of each node.
# in_: the authority value of each node.

out_,in_ = nx.hits(nx.DiGraph(mat),max_iter=100)

In [24]:
out_ = np.array(list(out_.values()))
out_q = np.quantile(out_,[.2,.4,.6,.8])
out_q

array([0.00000000e+00, 1.43109016e-17, 4.17194045e-05, 9.14717040e-04])

In [25]:
in_ = np.array(list(in_.values()))
in_q = np.quantile(in_,[.2,.4,.6,.8])
in_q

array([-2.15383704e-18,  6.36626143e-20,  2.06174091e-18,  5.53944209e-18])

In [26]:
node_value = []

# Set the node size according to the hub value
for i in out_:
  if i > out_q[3]:
    node_value.append(25)
  elif i >out_q[2]:
    node_value.append(20)
  elif i>out_q[1]:
    node_value.append(15)
  elif i>out_q[0]:
    node_value.append(10)
  else:
    node_value.append(5)

net = matPresentGraph(mat = mat,node_id = pers,node_type = node_type,node_value=node_value)
net.save_graph("./pers_out_netWork.html")
IPython.display.HTML('pers_out_netWork.html')

Commenters who link to more important posters (those that many others also comment on) are themselves more important

In [27]:
node_value = []

# Set the node size according to the authority value
for i in in_:
  if i > in_q[3]:
    node_value.append(25)
  elif i >in_q[2]:
    node_value.append(20)
  elif i>in_q[1]:
    node_value.append(15)
  elif i>in_q[0]:
    node_value.append(10)
  else:
    node_value.append(5)

net = matPresentGraph(mat = mat,node_id = pers,node_type = node_type,node_value=node_value)
net.save_graph("./pers_in_netWork.html")
IPython.display.HTML('pers_in_netWork.html')

We can see that a node's in-degree largely determines its importance here

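## Betweenness centrality
- Betweenness centrality: how often a node lies on the shortest path between two other nodes
- Idea: a node that connects more communities is more important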

<img src="https://dist.neo4j.com/wp-content/uploads/20190201101243/betweenness-centrality-visualization-7.jpg" width="35%">
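A small example makes the bridging role visible: two triangles joined through a single middle node. The graph and the names `bridge_graph`, `bc`, and `top_node` are hypothetical.

```python
import networkx as nx

# Hypothetical graph: triangle {0, 1, 2} and triangle {4, 5, 6}, connected via node 3
bridge_graph = nx.Graph()
bridge_graph.add_edges_from([
    (0, 1), (1, 2), (0, 2),   # first community
    (4, 5), (5, 6), (4, 6),   # second community
    (2, 3), (3, 4),           # node 3 bridges the two communities
])

bc = nx.betweenness_centrality(bridge_graph)   # normalized by default
top_node = max(bc, key=bc.get)
print(top_node, bc)
```

Every shortest path between the two triangles passes through node 3, which is 9 of the 15 possible node pairs among the other six nodes, so its normalized betweenness is 0.6, the highest in the graph.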

In [28]:
bet = np.array(list(nx.betweenness_centrality(nx.Graph(mat_s)).values()))
bet = (bet-np.min(bet))/(np.max(bet)-np.min(bet))
bet_q = np.quantile(bet,[.2,.4,.6,.8])

In [29]:
node_value = []
for i in bet:
  if i > bet_q[3]:
    node_value.append(25)
  elif i > bet_q[2]:
    node_value.append(20)
  elif i>bet_q[1]:
    node_value.append(15)
  elif i>bet_q[0]:
    node_value.append(10)
  else:
    node_value.append(5)

net = matPresentGraph(mat = mat_s,node_id = pers,node_type = node_type,node_value=node_value ,directed=False)
net.save_graph("./pers_bet_netWork.html")
IPython.display.HTML('pers_bet_netWork.html')

matrix is Symmetric


People who are connected to several communities come out as more important

# Measures on graph

Transitivity/Density/Distance/Diameter/Clustering

Compute the measures on the largest connected subgraph

In [30]:
# Find the largest connected subgraph
G = nx.Graph(mat_s)
G_sub = sorted(nx.connected_components(G), key=len, reverse=True)
G_max_sub = G.subgraph(G_sub[0])
# Rebuild the adjacency matrix and the node names for that subgraph
sub_mat = nx.adjacency_matrix(G_max_sub).todense()
node_idx = list(G_max_sub.nodes)
sub_pers = pers[node_idx]


Draw the largest subgraph

In [31]:
net = matPresentGraph(mat=sub_mat,node_id=sub_pers,directed=False)
net.save_graph("./max_sub.html")
IPython.display.HTML('max_sub.html')

matrix is Symmetric


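Compute transitivity
- Transitivity measures the proportion of closed triangles in a graph, that is, how many connected triples of nodes close into a triangle. In a social network or other link graph, it reflects how strongly nodes form tightly knit communities.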


In [33]:
nx.transitivity(nx.Graph(sub_mat))

0.0018714909544603868

Compute density (actual number of edges / maximum possible number of edges)

In [32]:
nx.density(nx.Graph(sub_mat))

0.007647459680472477

Compute distance (average shortest-path length over all pairs of nodes)

In [35]:
nx.average_shortest_path_length(nx.Graph(sub_mat))

6.345473360086823

Compute diameter (length of the longest shortest path)

In [36]:
nx.diameter(nx.Graph(sub_mat))

15

Compute the average clustering coefficient

In [37]:
nx.average_clustering(nx.Graph(sub_mat))

0.003629378895336342
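To see what these graph-level measures mean, the sketch below evaluates them on a hand-checkable toy graph: a triangle with one extra node hanging off it. The graph and the names `small` and `measures` are hypothetical.

```python
import networkx as nx

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2
small = nx.Graph()
small.add_edges_from([(0, 1), (1, 2), (0, 2), (2, 3)])

measures = {
    "transitivity": nx.transitivity(small),                 # 3 * triangles / connected triples = 3/5
    "density": nx.density(small),                           # 4 edges out of 6 possible
    "distance": nx.average_shortest_path_length(small),     # total distance 8 over 6 pairs
    "diameter": nx.diameter(small),                         # longest shortest path: 3 to 0 or 1
    "clustering": nx.average_clustering(small),             # mean of 1, 1, 1/3, 0
}
print(measures)
```

Compared with this toy graph, the largest subgraph in the PTT data has very low transitivity, density, and clustering, which is consistent with a sparse network in which commenters attach to posters but rarely form closed triangles.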