## The purpose of this file
Check how well the maximum likelihood estimate fits the data by comparing the actual (observed) total number of interactions with the theoretical (expected) total implied by the estimate.

The matrix $LP$, whose $(i, j)$ entry $p_{ij}$ is the probability that nodes $i$ and $j$ are linked in a single snapshot (with $p_{ii} = 0$), is defined as follows:
$$
LP=
\begin{pmatrix}
0&p_{12}&p_{13}&\dots&p_{1N} \\
p_{21}&0&p_{23}&\dots&p_{2N} \\
p_{31}&p_{32}&0&\dots&p_{3N} \\
\vdots&\vdots&\vdots&\ddots&\vdots \\
p_{N1}&p_{N2}&p_{N3}&\dots&0 \\
\end{pmatrix}
$$
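Before comparing totals, it can help to sanity-check an estimated $LP$ against this definition. The sketch below uses a hypothetical helper, `check_linking_probability` (not part of `toolbox`), that checks for a zero diagonal and entries in $[0, 1]$. It also checks symmetry, which assumes interactions are undirected so that $p_{ij} = p_{ji}$.

```python
import numpy as np

def check_linking_probability(LP, atol=1e-12):
    """Hypothetical sanity check for a linking-probability matrix.

    Assumes interactions are undirected, so LP should be symmetric.
    """
    LP = np.asarray(LP, dtype=float)
    if LP.ndim != 2 or LP.shape[0] != LP.shape[1]:
        return False  # LP must be an N x N matrix
    zero_diagonal = np.allclose(np.diag(LP), 0.0, atol=atol)  # p_ii = 0
    in_unit_interval = bool(np.all((LP >= 0.0) & (LP <= 1.0)))  # valid probabilities
    symmetric = np.allclose(LP, LP.T, atol=atol)  # p_ij = p_ji (assumption)
    return zero_diagonal and in_unit_interval and symmetric

# Toy 3-node example (made-up values, not estimated from data)
LP_toy = np.array([[0.0, 0.2, 0.5],
                   [0.2, 0.0, 0.1],
                   [0.5, 0.1, 0.0]])
print(check_linking_probability(LP_toy))  # True
```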
The expected total number of interactions $m_{ij}$ between nodes $i$ and $j$ during the whole period $T$ (consisting of $\tau$ snapshots) is $\tau p_{ij}$, so the matrix of expected counts is
$$
\tau LP=
\tau
\begin{pmatrix}
0&p_{12}&p_{13}&\dots&p_{1N} \\
p_{21}&0&p_{23}&\dots&p_{2N} \\
p_{31}&p_{32}&0&\dots&p_{3N} \\
\vdots&\vdots&\vdots&\ddots&\vdots \\
p_{N1}&p_{N2}&p_{N3}&\dots&0 \\
\end{pmatrix}
$$
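Summing over all pairs gives the theoretical total that is compared with the data below. Each unordered pair $\{i, j\}$ appears twice in the matrix, as $p_{ij}$ and $p_{ji}$. Assuming $LP$ is symmetric, the expected total is therefore $\frac{1}{2}\sum_{i \neq j} \tau p_{ij} = \tau \sum_{i<j} p_{ij}$. The sketch below checks this identity numerically on a made-up $3 \times 3$ matrix; the values are illustrative, not estimates.

```python
import numpy as np

tau = 66  # number of snapshots, as in the analysis below
LP = np.array([[0.0, 0.2, 0.5],
               [0.2, 0.0, 0.1],
               [0.5, 0.1, 0.0]])  # toy symmetric linking-probability matrix

expected_counts = tau * LP  # entry (i, j) is the expected count tau * p_ij

# Half the sum over the full matrix counts each unordered pair once
total_half_sum = np.sum(expected_counts) / 2

# Equivalent: sum only the strict upper triangle (i < j)
i_upper, j_upper = np.triu_indices_from(LP, k=1)
total_upper = tau * np.sum(LP[i_upper, j_upper])

print(total_half_sum, total_upper)
```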

In [None]:
import sys
sys.path.append('../')  # make the local module `toolbox` in the parent directory importable
import numpy as np
import pandas as pd
import networkx as nx
import toolbox as tb

In [None]:
tag = "universitylife"
hashtag = "universitylife" # This variable is necessary for data processing in the module 'toolbox'.
timespan = "21-23"
tau = 66
Gpath = f'../data/graph_data/{tag}/modified/{hashtag}_{timespan}_{tau}_mdaam.graphml'
Epath = f'../data/ML_estimate/{tag}/{timespan}_{tau}_krylov.npy'
print(Gpath + '\n' + Epath)

In [None]:
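G = nx.read_graphml(Gpath)
aam = nx.to_numpy_array(G)  # adjacency matrix; uses the "weight" edge attribute if present
emptau = np.sum(aam) / 2    # actual total number of interactions; / 2 because the symmetric matrix counts each pair twice
N = G.number_of_nodes()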
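The division by 2 in `emptau` relies on the graph being undirected. Its adjacency matrix is then symmetric, so every interaction between $i$ and $j$ is counted once at $(i, j)$ and once at $(j, i)$. The minimal NumPy sketch below illustrates this with a hypothetical weighted edge list, where weights stand in for interaction counts. It builds the symmetric matrix by hand rather than via `networkx`.

```python
import numpy as np

# Hypothetical undirected edges: (i, j, number of interactions)
edges = [(0, 1, 3), (0, 2, 1), (1, 2, 4)]
n_nodes = 3

adj = np.zeros((n_nodes, n_nodes))
for i, j, w in edges:
    # an undirected edge fills both symmetric entries
    adj[i, j] = w
    adj[j, i] = w

total_interactions = np.sum(adj) / 2  # each pair was counted twice
print(total_interactions)  # 8.0
```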

In [None]:
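LP = tb.connection_probability(np.load(Epath))  # linking-probability matrix LP computed from the saved ML estimate
tauLP = np.sum(tau*LP) / 2  # theoretical (expected) total number of interactions over tau snapshots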

In [None]:
emptau, tauLP
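To summarize the comparison in one number, one option is the relative deviation of the theoretical total from the actual one, $(\text{theoretical} - \text{actual}) / \text{actual}$. Both this metric and the helper name `relative_deviation` are an illustrative choice, not something the original analysis defines.

```python
def relative_deviation(actual, theoretical):
    """Relative deviation of the theoretical (expected) total from the actual total.

    Hypothetical helper: positive values mean the model over-predicts
    the number of interactions, negative values mean it under-predicts.
    """
    if actual == 0:
        raise ValueError("actual total must be non-zero")
    return (theoretical - actual) / actual

# Example with made-up totals
print(relative_deviation(200.0, 210.0))  # 0.05
```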

In [None]:
fname = f'../data/goodness_of_fit/{tag}_{timespan}_{tau}.pkl'
fit = pd.DataFrame({"hashtag":hashtag, "N":N, "tau":tau, "actual":emptau, "theoretical":tauLP}, index=[0])
fit.to_pickle(fname)
print(fname)
fit
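Each run saves one single-row DataFrame, so results for several hashtags, timespans, or values of $\tau$ can later be stacked into one table. The sketch below assumes the pickles sit in one directory with the `.pkl` extension used above. The helper name `collect_fits` is hypothetical. To run on its own, the sketch writes two made-up rows to a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

def collect_fits(directory):
    """Hypothetical helper: concatenate every saved goodness-of-fit pickle in a directory."""
    paths = sorted(Path(directory).glob("*.pkl"))
    frames = [pd.read_pickle(p) for p in paths]
    return pd.concat(frames, ignore_index=True)

# Self-contained demo with made-up numbers (not real results)
with tempfile.TemporaryDirectory() as tmp:
    for tau, actual, theoretical in [(33, 120.0, 118.5), (66, 240.0, 236.0)]:
        row = pd.DataFrame({"hashtag": "example", "N": 10, "tau": tau,
                            "actual": actual, "theoretical": theoretical}, index=[0])
        row.to_pickle(Path(tmp) / f"example_{tau}.pkl")
    combined = collect_fits(tmp)

print(combined)
```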