## The purpose of this file
Compute summary statistics of the flattened temporally aggregated network and save them.<br>The aggregated temporal network is built by projecting the bipartite graph of the whole time span onto the hashtag graph.<br>**Note** that the very last time interval, in which the last interaction occurred, is excluded.<br>The resulting network should be identical to the temporally aggregated network (**see section 2**) with all edge weights set to one.<br>The nodes to be removed should be determined in **section 3**.
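
The projection described above can be illustrated on toy data. This is only a sketch: it assumes the bipartite graph links posts to the hashtags they carry and that the weighted aggregate counts co-occurrences, neither of which is confirmed by this notebook. The `posts` dictionary is hypothetical, and `toolbox` may build the network differently.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy data: each post carries a list of hashtags.
posts = {
    "p1": ["coffee", "latte"],
    "p2": ["coffee", "latte", "tokyo"],
    "p3": ["tokyo", "cafe"],
}

# Bipartite graph: posts on one side, hashtags on the other.
B = nx.Graph()
B.add_nodes_from(posts, bipartite=0)
for post, tags in posts.items():
    B.add_nodes_from(tags, bipartite=1)
    B.add_edges_from((post, tag) for tag in tags)

hashtags = {n for n, d in B.nodes(data=True) if d["bipartite"] == 1}

# Flattened projection: two hashtags are linked if they share at least one post.
flat = bipartite.projected_graph(B, hashtags)

# Weighted projection: the edge weight counts the shared posts.
weighted = bipartite.weighted_projected_graph(B, hashtags)

# Setting every weight to one gives back the flattened network,
# so both graphs must have the same edge set.
same_edges = set(map(frozenset, flat.edges())) == set(map(frozenset, weighted.edges()))
print(flat.number_of_edges(), weighted["coffee"]["latte"]["weight"], same_edges)
```

In this toy case `coffee` and `latte` share two posts, so the weighted edge carries weight 2 while the flattened graph keeps a single unweighted edge.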

In [1]:
import sys
sys.path.append('../')
import networkx as nx
import toolbox as tb
import pandas as pd

In [2]:
tag = "starbucks"
hashtag = "スターバックス"
timespan = "21-29"
file = f"../data/datasets/{tag}/{tag}_{timespan}.pkl"
df = tb.get_dataframe(hashtag, file)

In [3]:
start = "2022-11-22T00:00+09:00"
end = "2022-11-25T23:59+09:00"
timespan = "23-25"
start = pd.to_datetime(start)
end = pd.to_datetime(end)
DF = df[(start <= df.index) & (df.index <= end)]
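
The window filter above keeps rows whose timestamp lies in the closed interval from `start` to `end`. A minimal sketch on a hypothetical four-row frame (the `post_id` column is an assumption, not part of the real dataset) shows which boundary rows survive:

```python
import pandas as pd

# Hypothetical timezone-aware timestamps around the window boundaries.
idx = pd.to_datetime([
    "2022-11-21T23:59+09:00",  # just before the window
    "2022-11-22T00:00+09:00",  # exactly at the start
    "2022-11-25T23:59+09:00",  # exactly at the end
    "2022-11-26T00:00+09:00",  # just after the window
])
toy = pd.DataFrame({"post_id": [1, 2, 3, 4]}, index=idx)

start = pd.to_datetime("2022-11-22T00:00+09:00")
end = pd.to_datetime("2022-11-25T23:59+09:00")

# Both comparisons are inclusive, so the two boundary rows are kept.
window = toy[(start <= toy.index) & (toy.index <= end)]
print(window["post_id"].tolist())
```

Because `end` is `23:59:00`, any row stamped between `23:59:01` and midnight on the last day would fall outside the window.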

In [4]:
#DF = df.copy()

In [5]:
tau, snapshots = tb.get_snapshots_closed_intervals(DF, delta='minutes=30')
#tau, snapshots = tb.get_snapshots_closed_intervals(DF, 'hours=1')

In [6]:
G = tb.get_flattened_temporally_aggregated_network(DF, snapshots)

In [7]:
# Remove the search hashtag itself, then drop every node left without neighbours.
G.remove_node(hashtag)
isolates = list(nx.isolates(G))  # materialise the generator before mutating G
print(f"Isolates after removal of the searchtag: \n{isolates}")
print("These isolate nodes are to be removed.")
G.remove_nodes_from(isolates)
assert len(list(nx.isolates(G))) == 0, "There is at least one isolate node left."
print("============================================")
print("The isolate nodes were successfully deleted.")

Isolates after removal of the searchtag: 
['Áî∞Áî∫ÈßÖ', '„Å¶„ÅÉ„ÅÇ„Çâ„Åï„Çì„Å®Áπã„Åå„Çä„Åü„ÅÑ', '„ÇØ„É™„Çπ„Éî„Éº„ÇØ„É™„Éº„É†„Éâ„Éº„Éä„ÉÑ', '„ÅÇ„ÇÜ„ÇÜ„ÅÆ„Ç§„É©„Çπ„Éà', 'Â¶äÊ¥ª‰∏≠', 'Â≤êÈòú„Çø„É≥„É°„É≥Âçä„ÉÅ„É£„É≥„Çª„ÉÉ„Éà', '„ÅÜ„Åï„Åé„Å®ÊöÆ„Çâ„Åô', '‰ΩèÂèãÊûóÊ•≠„ÅÆÂπ≥Â±ã', 'ÁæΩÁî∞Á©∫Ê∏ØÁ¨¨‰∏Ä„Çø„Éº„Éü„Éä„É´', '„Ç´„Ç¶„É≥„Çø„Éº', 'ÊÑõÂÆï„Ç∞„É™„Éº„É≥„Éí„É´„Ç∫', '„Ç≥„Éº„Éí„ÉºÁâõ‰π≥Êàª„Å£„Å¶„Åì„Å™„ÅÑ„Åã„Å™', '„Ç∑„Çß„Ç±„É©„Éº„Éà„Éï„É¨„ÉÉ„Éâ', 'Êú®„ÅÆ„Åã„Åü„Åæ„Çä', 'OL„Åî„ÅØ„Çì', 'rkt_boom', '„ÉÄ„Éº„ÇØ„É¢„Ç´„ÉÅ„ÉÉ„Éó„Éï„É©„Éö„ÉÅ„Éº„Ééüç´\u2061', '„Éê„Éã„É©„Éì„Éº„É≥„É©„ÉÜ', 'ÂêçÂè§Â±ãÂâáÊ≠¶Êñ∞Áî∫2ÈöéÂ∫ó', 'ÂêåÁ™ì‰ºö', 'spiralgirl', 'ÈÅ†„Åè„Å®„ÇÇ‰∏ÄÂ∫¶„ÅØË©£„ÇåÂñÑÂÖâÂØ∫', 'Á•ûÂØæÂøú', '„Çπ„Éà„É≠„Éô„É™„Éº‰Ωï„Å®„Åã', '„Çπ„Çø„Éº„Éê„ÉÉ„ÇØ„Çπ„Éâ„É©„Ç§„Éñ„Çπ„É´„Éº', 'misakiicafe', 'flairespresso']
These isolate nodes are to be removed.
The isolate nodes were successfully deleted.
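
Why removing the search hashtag creates isolates can be seen on a small graph. The sketch below uses a hypothetical hub node standing in for the search hashtag; the node names are illustrative only.

```python
import networkx as nx

# Hypothetical toy graph: "hub" plays the role of the search hashtag.
G_toy = nx.Graph()
G_toy.add_edges_from([
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
    ("a", "b"),  # the only edge that does not involve the hub
])

G_toy.remove_node("hub")

# nx.isolates returns a generator, so collect it before changing the graph.
isolates = sorted(nx.isolates(G_toy))
G_toy.remove_nodes_from(isolates)

print(isolates, sorted(G_toy.nodes()))
```

Nodes `c` and `d` were connected only through the hub, so they become isolates and are dropped, while `a` and `b` remain linked to each other.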


In [8]:
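N = G.number_of_nodes()  # number of nodes
L = G.number_of_edges()  # number of edges
braket_k = 2 * L / N  # mean degree <k> = 2L / N
braket_C = nx.average_clustering(G)  # mean local clustering coefficient <C>
density = nx.density(G)  # 2L / (N(N - 1)) for an undirected graph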
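
For an undirected simple graph the mean degree is 2L / N and the density is 2L / (N(N - 1)). The values reported in this notebook are consistent with these identities: 2 × 303589 / 17549 ≈ 34.599, and 34.599 / 17548 ≈ 0.001972. The sketch below checks the same identities on a hypothetical four-node graph, small enough to verify by hand.

```python
import networkx as nx

# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d attached to c.
G_toy = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

N = G_toy.number_of_nodes()
L = G_toy.number_of_edges()

mean_degree = 2 * L / N                # each edge contributes to two degrees
density_by_hand = 2 * L / (N * (N - 1))  # fraction of possible edges present

# Local clustering: a and b have 1, c has 1/3, d has 0 (degree 1).
mean_clustering = nx.average_clustering(G_toy)

print(mean_degree, density_by_hand, nx.density(G_toy), mean_clustering)
```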

In [9]:
fname = f'../data/temporal_network_summary_statistics/{tag}/ssftn_{timespan}_{tau}.pkl'
statistics = pd.DataFrame({r"$N$": N, r"$L$": L, r"<k>":braket_k, r"<C>":braket_C, r"$Density$": density}, index=[hashtag])
statistics.to_pickle(fname)
print(fname)
statistics

../data/temporal_network_summary_statistics/starbucks/ssftn_23-25_189.pkl


,$N$,$L$,<k>,<C>,$Density$
スターバックス,17549,303589,34.599008,0.879787,0.001972
