## Calibrating an activity-driven transaction model

Proposed project for an MS student

### Demonstration

In [2]:
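from methods.model import *
import numpy as np  # np is used below; imported explicitly in case the star import does not expose it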

In [4]:
# Short run with a few nodes
N = 100  # number of nodes
T = 10   # number of transactions to simulate
# initialize the model
nodes = create_nodes(N, scaling="const", coupling=False)
acts = initialize_activations(nodes, distribution=np.random.exponential)
atts = initialize_attractivities(nodes, monies=100, distribution=np.ones)
bals = initialize_balances(nodes)
# print the output header
header = ["timestamp", "source", "target", "amount", "source_bal", "target_bal"]
print(",".join(header))
# run the model, printing one CSV row per simulated transaction
for _ in range(T):
    transaction = transact(nodes, acts, atts, bals)
    print(",".join(str(transaction[term]) for term in header))

timestamp,source,target,amount,source_bal,target_bal
0.0003165501596207758,10,96,14.808070452663427,85.19192954733657,114.80807045266343
0.00472731729293401,80,12,18.297368549697175,81.70263145030282,118.29736854969718
0.01815445788554316,99,47,94.68570733447105,5.314292665528953,194.68570733447103
0.019336101176223048,4,90,71.22395292530558,28.776047074694418,171.22395292530558
0.022946027161092522,8,59,53.777851913699614,46.222148086300386,153.7778519136996
0.06408218262230221,14,43,58.44256710290904,41.55743289709096,158.44256710290904
0.10137364812508638,78,80,83.300760105828,16.699239894171996,165.00339155613082
0.1057897608246732,92,17,51.140075022025535,48.859924977974465,151.14007502202554
0.11111289543492064,77,83,77.31520394871517,22.684796051284835,177.31520394871518
0.11693273841326858,7,42,25.05512555568919,74.94487444431081,125.05512555568919


The `transact` function holds the core of the model and has three components:

* `activate`: simulates node activation, given the present time and a node's activity.
* `select`: simulates the selection of a target for a transaction, given node attractivities.
* `pay`: simulates a payment, given two nodes and the sender's account balance.

Transactions are simulated in time order by storing the next node activation for each node in a heap data structure.
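
The heap-based scheduling described above can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions, not the code in `methods.model`: the function `simulate`, its parameters, the exponential waiting times, the uniform attractivities, and the beta-distributed share of the balance are all hypothetical stand-ins chosen for illustration.

```python
import heapq
import random


def simulate(n_nodes=5, n_transactions=10, seed=0):
    """Toy event loop: pop the earliest activation, transact, reschedule."""
    rng = random.Random(seed)
    # Hypothetical per-node parameters (not the repository's initializers).
    activity = [0.1 + rng.expovariate(1.0) for _ in range(n_nodes)]  # activation rates
    attractivity = [1.0] * n_nodes                                   # uniform target weights
    balance = [100.0] * n_nodes

    # Heap of (next activation time, node); the smallest time is always on top.
    heap = [(rng.expovariate(activity[i]), i) for i in range(n_nodes)]
    heapq.heapify(heap)

    log = []
    for _ in range(n_transactions):
        # The earliest pending activation determines the next transaction.
        now, source = heapq.heappop(heap)
        # select: pick a target in proportion to attractivity, excluding the source.
        others = [j for j in range(n_nodes) if j != source]
        weights = [attractivity[j] for j in others]
        target = rng.choices(others, weights=weights)[0]
        # pay: send a beta-distributed share of the source balance (assumed form).
        amount = balance[source] * rng.betavariate(1.0, 1.0)
        balance[source] -= amount
        balance[target] += amount
        log.append((now, source, target, amount))
        # activate: schedule the source's next activation after an exponential wait.
        heapq.heappush(heap, (now + rng.expovariate(activity[source]), source))
    return log, balance
```

Because every rescheduled activation is pushed with a time no earlier than the one just popped, the log comes out in time order without any explicit sorting, which is the main reason to prefer a heap over rescanning all nodes at each step.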

### Project outline

* Calibrate the `activate` and `select` elements of the model to real data:
    * Read up on established methods within the literature on "Activity-driven models"
    * e.g., activity and attractivity map onto the empirical out- and in-degree distributions, respectively
    * Apply appropriate statistical methods for fitting the relevant parameters
    * Update the code to enable parameter fitting given an empirical dataset
    * If needed, update the model to allow properly fitted initializations

* Develop a method for calibrating the novel `pay` element of the model:
    * The model samples from a "beta distribution" in simulating the weight of a temporal link. This is new, and so the main portion of the project is to develop a suitable method for calibration to real data:
        * Read up on "beta distributions" in the statistics and data science literatures
        * Figure out how to extract the relevant empirical distribution(s)
        * Explore empirical distribution(s) for, e.g., key correlations 
        * Test + compare methods for fitting the parameter(s)
    * Then, update the code and (potentially) the model

* Evaluate the calibrated model
    * How well does the calibrated model match real data?
    * Where does the calibrated model fall short?

* (optional) Pick an improvement + document its impact
    * For use with ISP data, 
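
As a starting point for the calibration steps in the outline, the sketch below extracts simple empirical quantities from transaction records shaped like the demonstration output. It rests on assumptions that the project would need to check: that the beta-distributed quantity is the share of the sender's pre-payment balance, that `source_bal` is recorded after the payment (as the demonstration rows suggest, since `amount + source_bal` equals the initial 100), and that raw out- and in-shares are acceptable first proxies for activity and attractivity. The helper names `payment_fractions`, `fit_beta_moments`, and `degree_shares` are hypothetical, and the method of moments is just one candidate fitting method among those to be compared.

```python
from collections import Counter
from statistics import mean, variance


def payment_fractions(rows):
    """Share of the pre-payment balance sent in each transaction.

    Assumes source_bal is the balance after the payment, so the balance
    before it is amount + source_bal.
    """
    return [
        r["amount"] / (r["amount"] + r["source_bal"])
        for r in rows
        if r["amount"] + r["source_bal"] > 0
    ]


def fit_beta_moments(xs):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1)."""
    m, v = mean(xs), variance(xs)
    if not 0 < v < m * (1 - m):
        raise ValueError("moments are incompatible with a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def degree_shares(rows):
    """Per-node shares of outgoing and incoming transactions.

    These are crude proxies for activity and attractivity, respectively.
    """
    out_counts = Counter(r["source"] for r in rows)
    in_counts = Counter(r["target"] for r in rows)
    n = len(rows)
    out_shares = {node: c / n for node, c in out_counts.items()}
    in_shares = {node: c / n for node, c in in_counts.items()}
    return out_shares, in_shares
```

A typical use would be to parse a dataset into a list of dictionaries keyed by the header fields, call `payment_fractions` on it, and pass the result to `fit_beta_moments`. A maximum-likelihood fit would be a natural alternative to compare against.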

### Relevant datasets

* Sarafu CIC 2020-2021         
    `Mattsson, C. E. S., Criscione, T., & Ruddick, W. O. (2022). Sarafu Community Inclusion Currency 2020-2021. Scientific Data, 9(426). https://doi.org/10.1038/s41597-022-01539-4`

* ISP Transactions (may require a specific improvement)             
    `Starnini, M., Tsourakakis, C. E., Zamanipour, M., Panisson, A., Allasia, W., Fornasiero, M., Puma, L. L., Ricci, V., Ronchiadin, S., Ugrinoska, A., Varetto, M., & Moncalvo, D. (2021). Smurf-Based Anti-money Laundering in Time-Evolving Transaction Networks. In Y. Dong, N. Kourtellis, B. Hammer, & J. A. Lozano (Eds.), Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track (pp. 171–186). Springer International Publishing. https://doi.org/10.1007/978-3-030-86514-6_11`