# IV Comment Network Modelling




## Table of Contents

1. [Loading the Data and Necessary Libraries](#loading-dependencies)
2. [Link and Weight Calculation](#link-and-weight)
3. [Comment Network Building and Opinion Rank Score Calculation](#network-bulding)


## Loading Data and Libraries 
<a class="anchor" id="loading-dependencies"></a>

In [1]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
from tqdm import tqdm

pd.options.mode.chained_assignment = None 

df_c = pd.read_parquet('Comments.parquet')                          
Sentiments = pd.read_parquet('sentiments.parquet')    
Explicit_links = pd.read_parquet('explicit_links.parquet')     
Implicit_links = pd.read_parquet('implicit_links.parquet')  

## Link and Weight Calculation
<a class="anchor" id="link-and-weight"></a>

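The link type and the link weight between two comments are computed by merging the previously calculated sentiment (polarity) scores with the similarity scores.

The link type is assigned according to Definition 2, for both implicit and explicit connections:

- **Negative links:** the two linked comments have different sentiment orientations.
- **Positive links:** the two linked comments have the same sentiment orientation.

A comment counts as positive when its polarity score is 0.05 or higher, negative when it is -0.05 or lower, and neutral in between. The link weight is the similarity score multiplied by the link type (+1 or -1).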
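
The rule above can be illustrated on a handful of made-up comment pairs. This is a minimal sketch, not the notebook's implementation: the helper `orientation`, its `threshold` parameter, and the `toy` table with its values are hypothetical and exist only for this example. Scores of exactly ±0.05 are treated as non-neutral here, which can differ from the inclusive ranges used in the notebook cell for pairs that sit exactly on a boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical helper (not part of the notebook): map a polarity score
# to a sentiment orientation.
def orientation(score, threshold=0.05):
    """Return 1 (positive), -1 (negative) or 0 (neutral)."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0

# Invented example pairs; the column names follow the notebook.
toy = pd.DataFrame({
    "polarity_score_a": [-0.71, -0.71, 0.09, 0.01],
    "polarity_score_b": [0.55, -0.29, -0.30, 0.02],
    "similarities": [0.43, 0.44, 0.37, 0.20],
})

# Same orientation -> positive link (+1), otherwise negative link (-1).
same = toy["polarity_score_a"].map(orientation) == toy["polarity_score_b"].map(orientation)
toy["link_type"] = np.where(same, 1, -1)

# Signed weight: similarity with the sign of the link type.
toy["weight"] = toy["link_type"] * toy["similarities"]
print(toy)
```

In this toy table the second pair (both negative) and the fourth pair (both neutral) receive positive links, while the other two pairs receive negative links.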


In [2]:
# Combine explicit and implicit links into a single edge table.
all_links = pd.concat([Explicit_links, Implicit_links])

polarity = Sentiments[['commentID', 'polarity_scores']]

# Attach the polarity score of comment a to every link.
merged_a = all_links.merge(polarity, left_on='commentID_a', right_on='commentID', how='inner')
merged_a = merged_a[['articleID', 'commentID_a', 'commentID_b', 'similarities', 'polarity_scores']]
merged_a = merged_a.rename(columns={'polarity_scores': 'polarity_score_a'})

# Attach the polarity score of comment b and keep the timestamps used later for time decay.
merged_b = all_links.merge(polarity, left_on='commentID_b', right_on='commentID', how='inner')
merged_b = merged_b[['commentID_a', 'commentID_b', 'similarities', 'polarity_scores', 'approveDate_a', 'createDate_b']]
merged_b = merged_b.rename(columns={'polarity_scores': 'polarity_score_b'})

# Join both sides so that each link carries the scores of both comments.
merged_all = pd.merge(merged_a, merged_b, on=['commentID_a', 'commentID_b', 'similarities'])

# A link is positive (+1) when both comments share a sentiment orientation
# (both positive, both negative, or both neutral); otherwise it is negative (-1).
conditions = [
    (merged_all['polarity_score_a'] >= 0.05) & (merged_all['polarity_score_b'] >= 0.05),    # both positive
    (merged_all['polarity_score_a'] <= -0.05) & (merged_all['polarity_score_b'] <= -0.05),  # both negative
    merged_all['polarity_score_a'].between(-0.05, 0.05) &
    merged_all['polarity_score_b'].between(-0.05, 0.05),                                    # both neutral
]
values = [1, 1, 1]

merged_all['link_type'] = np.select(conditions, values, default=-1)
# Signed weight: similarity with the sign of the link type.
merged_all['weight'] = merged_all['link_type'] * merged_all['similarities']

merged_all.to_parquet('all_links.parquet')
merged_all.head()

Unnamed: 0,articleID,commentID_a,commentID_b,similarities,polarity_score_a,polarity_score_b,approveDate_a,createDate_b,link_type,weight
0,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,104387472,104387873,0.429301,-0.705669,0.552231,2020-01-01 01:05:47,2020-01-01 01:52:25,-1,-0.429301
1,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,104387472,104387976,0.440271,-0.705669,-0.286687,2020-01-01 01:05:47,2020-01-01 02:06:05,1,0.440271
2,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,104387873,104390628,0.188239,0.552231,-0.592152,2020-01-01 01:52:26,2020-01-01 14:38:50,-1,-0.188239
3,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,104390628,104391463,0.403543,-0.592152,0.09277,2020-01-01 14:38:52,2020-01-01 16:23:14,-1,-0.403543
4,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,104391463,104392390,0.365926,0.09277,-0.295654,2020-01-01 16:23:15,2020-01-01 18:22:09,-1,-0.365926


## Comment Network Building and Opinion Rank Score Calculation
<a class="anchor" id="network-bulding"></a>

For each article, a directed comment graph is built from the links.
While the graph is in memory, the Opinion Rank Score of every comment is computed with PageRank on time-decayed edge weights.
The comment scores are then summed per user, resulting in a new table with one Opinion Rank Score sum for each user present in each article.
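
The per-article procedure can be sketched on a three-comment toy graph. This is an illustration under stated assumptions, not the notebook's code: the edge list, the `authors` mapping and all values are invented, time decay is left out for brevity, and `commentID_a` is assumed to be the earlier comment of each pair, as the timestamps in the sample output suggest.

```python
import networkx as nx
import pandas as pd

# Invented edge list: one row per link from comment a to comment b.
edges = pd.DataFrame({
    "commentID_a": [1, 1, 2],
    "commentID_b": [2, 3, 3],
    "weight": [0.4, 0.5, 0.2],
})
# Hypothetical mapping from comments to their authors.
authors = pd.DataFrame({"commentID": [1, 2, 3], "userID": [10, 20, 10]})

G = nx.from_pandas_edgelist(edges, "commentID_a", "commentID_b",
                            edge_attr="weight", create_using=nx.DiGraph())
# Reverse the edges so they point from comment b back to comment a;
# PageRank then passes score towards the comments that others link to.
G = G.reverse(copy=True)

scores = nx.pagerank(G, alpha=0.85, weight="weight")

# Comments without a score (isolated comments) count as 0, then sum per user.
authors["score"] = authors["commentID"].map(scores).fillna(0)
per_user = authors.groupby("userID", as_index=False)["score"].sum()
print(per_user)
```

Comment 1 is linked to by both other comments and therefore ends up with the highest score, and user 10 collects the scores of comments 1 and 3.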

In [3]:
def damping_function(approve_date, create_date, D=0.85, K=2):
    """Return the time-decay factor D ** (K * elapsed days) between two timestamps."""
    delta_t = abs(create_date - approve_date).total_seconds()
    return np.power(D, (delta_t * K) / (3600 * 24))

Opinion_rank_score_user_sum = pd.DataFrame({'articleID': [], 'userID': [], 'Opinion_rank_score_sum': []})

for article in tqdm(df_c.articleID.unique(), desc="Opinon Rank Score Calulation"):
    # Links of the current article; rows with missing values are dropped.
    df = merged_all[merged_all.articleID == article].dropna()
    G = nx.from_pandas_edgelist(df, 'commentID_a', 'commentID_b',
                                edge_attr=['weight', 'approveDate_a', 'createDate_b'],
                                create_using=nx.DiGraph())
    # Important: do not remove. Without reversing the edges,
    # PageRank would measure the inverse of influence.
    G = G.reverse(copy=True)

    for u, v, d in G.edges(data=True):
        # Parse the edge timestamps.
        approve_date = pd.to_datetime(d['approveDate_a'])
        create_date = pd.to_datetime(d['createDate_b'])
        # PageRank expects non-negative edge weights, so the absolute weight is used.
        d['time_decay_weight'] = abs(d['weight']) * damping_function(approve_date, create_date)

    Opinion_rank_score = nx.pagerank(G, alpha=0.85, weight='time_decay_weight', max_iter=2000, tol=1e-6)

    # Map the scores onto the article's comments; comments without links get 0.
    df_article = df_c[df_c.articleID == article].copy()
    df_article['Opinion_rank_score'] = df_article['commentID'].map(Opinion_rank_score).fillna(0)

    # Sum the comment scores per user within the article.
    df_article = df_article.groupby(['articleID', 'userID'], as_index=False).agg({'Opinion_rank_score': 'sum'})
    df_article = df_article.rename(columns={'Opinion_rank_score': 'Opinion_rank_score_sum'})
    Opinion_rank_score_user_sum = pd.concat([Opinion_rank_score_user_sum, df_article], ignore_index=True)

Opinion_rank_score_user_sum.to_parquet('Opinion_rank_scores.parquet')
Opinion_rank_score_user_sum.head()

Opinon Rank Score Calulation: 100%|████████████████████████████████████████████| 16787/16787 [3:09:56<00:00,  1.47it/s]


Unnamed: 0,articleID,userID,Opinion_rank_score_sum
0,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,288194.0,0.003875
1,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,788867.0,0.007168
2,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,1454164.0,0.01521
3,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,1948827.0,0.005436
4,nyt://article/69a7090b-9f36-569e-b5ab-b0ba5bb3...,3017703.0,0.0
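
The time-decay factor applied to each edge weight in the cell above can be checked by hand on a few time gaps. The sketch below restates `damping_function` with the notebook's defaults (`D=0.85`, `K=2`); the reference timestamp `t0` and the chosen gaps are made up for illustration.

```python
import numpy as np
import pandas as pd

def damping_function(approve_date, create_date, D=0.85, K=2):
    """Time-decay factor D ** (K * elapsed days), as in the notebook cell."""
    delta_t = abs(create_date - approve_date).total_seconds()
    return np.power(D, (delta_t * K) / (3600 * 24))

# Invented reference time; only the gap between the two timestamps matters.
t0 = pd.Timestamp("2020-01-01 00:00:00")
for hours in (0, 12, 24):
    factor = damping_function(t0, t0 + pd.Timedelta(hours=hours))
    print(f"{hours:>2} h -> {factor:.4f}")
```

With these defaults, a link between comments written at the same moment keeps its full weight, a gap of half a day scales it by 0.85, and a gap of one day scales it by 0.85² = 0.7225.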
