# Overview:

## Random network
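    import networkx as nx

    # Degree sequence of a 100-node tree with a power-law degree distribution
    sequence = nx.random_powerlaw_tree_sequence(100, tries=5000)
    # Random multigraph with that degree sequence (may contain self-loops and parallel edges)
    G = nx.configuration_model(sequence)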

## Assignment objective: 
- Show empirically how closely the real network built from the White Helmets dataset matches what a baseline model would predict.
- Compare the characteristics of the network extracted from the White Helmets dataset (described in the first lecture) with the expected characteristics of a “regular” network generated by classical network generation models.
- Choose at least five network metrics and justify each choice.
- Choose at least two generative network models to compare against, and justify the choice.
- Through empirical evaluation, argue whether the network structures in the White Helmets dataset are expected or unexpected, and attempt to distinguish organic from coordinated behaviour.

## Dataset
- The dataset has the following format: `<videoID>, <userID_1>, <userID_2>, <timestamp_1>, <timestamp_2>`
- The posts by userID_1 and userID_2 are within 52 seconds of each other, and both contain a link to the same videoID.
- timestamp_1 is the time when userID_1 posted the URL of videoID; timestamp_2 is the time when userID_2 posted the same URL.
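
To make the format concrete, here is a minimal sketch of turning such rows into an undirected user-to-user network. The helper name `build_network`, the edge attribute `weight`, and the sample rows are illustrative assumptions, not part of the dataset or the assignment; the real file may need different parsing (for example a header row or another delimiter).

```python
import csv
import io

import networkx as nx


def build_network(rows):
    """Build an undirected co-sharing graph from parsed dataset rows.

    Each row is (videoID, userID_1, userID_2, timestamp_1, timestamp_2).
    The edge weight counts how many rows link the same pair of users.
    """
    G = nx.Graph()
    for _video_id, user_1, user_2, _ts_1, _ts_2 in rows:
        if user_1 == user_2:
            continue  # assumption: self-pairs carry no network information
        if G.has_edge(user_1, user_2):
            G[user_1][user_2]["weight"] += 1
        else:
            G.add_edge(user_1, user_2, weight=1)
    return G


# Tiny made-up sample in the documented format (not real data).
sample = io.StringIO(
    "v1, u1, u2, 100, 130\n"
    "v1, u2, u3, 130, 150\n"
    "v2, u1, u2, 500, 510\n"
)
rows = [[field.strip() for field in line] for line in csv.reader(sample)]
G_sample = build_network(rows)
print(G_sample.number_of_nodes(), G_sample.number_of_edges())
```

Collapsing repeated pairs into a weight keeps the graph simple (no parallel edges), which matters because several networkx metrics, such as clustering, are not defined for multigraphs.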

## Networks:

### Reason for comparing real networks with random networks:
Comparing the real network with two completely different types of random/baseline networks shows which structures and properties of the real network are not random:
- Which properties of the real network are statistically significant, and which also occur in random networks.
- If, for example, a real network has much higher clustering than a random network of the same density, this indicates community structure or local organization.
- What advantages the real network has over a random network.
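
One way to make "statistically significant" concrete is to compare an observed value with its distribution over many random graphs of the same size and density. The sketch below does this for average clustering using a z-score. The function name `clustering_vs_random`, the choice of `nx.gnm_random_graph` as the same-density reference, the sample count, and the caveman stand-in graph are all assumptions made for illustration; they are not prescribed by the assignment.

```python
import statistics

import networkx as nx


def clustering_vs_random(G, n_samples=100, seed=0):
    """Compare G's average clustering with random graphs of equal n and m.

    Returns (observed, mean of baseline, z-score).
    """
    n, m = G.number_of_nodes(), G.number_of_edges()
    observed = nx.average_clustering(G)
    # Each baseline graph has the same node and edge count, hence the same density.
    baseline = [
        nx.average_clustering(nx.gnm_random_graph(n, m, seed=seed + i))
        for i in range(n_samples)
    ]
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    z = (observed - mean) / std if std > 0 else float("nan")
    return observed, mean, z


# Stand-in for the real network: a deliberately clustered toy graph.
stand_in = nx.connected_caveman_graph(10, 5)
observed, expected, z = clustering_vs_random(stand_in)
print(f"observed={observed:.3f} expected={expected:.3f} z={z:.1f}")
```

A large positive z-score means the observed clustering is far above what the same-density baseline produces. The same pattern can be reused for any scalar metric by swapping out `nx.average_clustering`.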

### Chosen random networks:
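- Watts-Strogatz graph (small-world model): combines a low average path length with a high clustering coefficient; built by randomly rewiring the edges of a regular ring lattice.
- Barabási-Albert model (scale-free network): preferential attachment creates hubs, i.e. a few nodes that accumulate far more edges than the rest.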
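
A minimal sketch of generating both baselines so that they roughly match the real network in node count and mean degree. The helper name `matched_baselines`, the rewiring probability of 0.1, the seed, and the karate club graph used as a stand-in for the real network are assumptions for illustration only.

```python
import networkx as nx


def matched_baselines(G, rewire_p=0.1, seed=42):
    """Return (Watts-Strogatz, Barabasi-Albert) graphs sized like G."""
    n = G.number_of_nodes()
    mean_degree = 2 * G.number_of_edges() / n

    # Watts-Strogatz: every node starts with k ring neighbours.
    # Use an even k close to the mean degree (at least 2).
    k = max(2, 2 * round(mean_degree / 2))
    ws = nx.watts_strogatz_graph(n, k, rewire_p, seed=seed)

    # Barabasi-Albert: each new node attaches to m_attach existing nodes,
    # which gives a mean degree of roughly 2 * m_attach.
    m_attach = max(1, round(mean_degree / 2))
    ba = nx.barabasi_albert_graph(n, m_attach, seed=seed)
    return ws, ba


# Stand-in for the real network (34 nodes, 78 edges).
real = nx.karate_club_graph()
ws, ba = matched_baselines(real)
print(ws.number_of_nodes(), ws.number_of_edges())
print(ba.number_of_nodes(), ba.number_of_edges())
```

The match is only approximate: both generators take integer parameters, so the baselines' edge counts can differ somewhat from the real network's. For a fair comparison it is worth reporting the baseline sizes alongside the metrics.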

## Metrics:

### Chosen metrics and reasons for choosing these metrics:
- Clustering Coefficient
    → Measures how likely it is that two neighbors of a node are also connected.
    Rationale: Coordinated campaigns often form tightly connected groups that amplify similar narratives. Random networks usually have low clustering, so markedly higher values in the real network point to non-random structure and possibly coordination.

- Modularity
    → Quantifies the presence of distinct communities within the network.
    Rationale: Influence operations typically involve groups of accounts promoting similar messages. High modularity compared to a random baseline suggests organized, topic-based coordination.

- Degree Assortativity
    → Captures whether nodes of similar degree tend to connect with each other.
    Rationale: Coordinated networks often display unusual degree correlations (e.g., central “hub” accounts linked to many smaller ones). Deviations from randomness reveal hierarchical or controlled connectivity.

- Betweenness Centrality
    → Measures how frequently a node appears on the shortest paths between others.
    Rationale: Cross-platform actors who bridge YouTube, Twitter, and Facebook tend to have high betweenness. This highlights key agents coordinating dissemination across platforms.

- Degree Distribution
    → Describes how many links each node has.
    Rationale: Purely random networks (without preferential attachment) tend to have homogeneous degree patterns. Real disinformation networks often exhibit heavy-tailed distributions with dominant hubs, which may be evidence of intentional amplification.
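
The five metrics above can be computed with networkx roughly as follows. This is a sketch under stated assumptions: the function name `summarize` and the dictionary keys are hypothetical, modularity needs a partition and greedy modularity maximization is just one possible choice of community detection, and betweenness is reduced here to its maximum for easy comparison. `nx.degree_assortativity_coefficient` is expected to require NumPy to be installed.

```python
import networkx as nx
from networkx.algorithms import community


def summarize(G):
    """Compute the five chosen metrics for a simple undirected graph."""
    # Assumption: communities are detected by greedy modularity maximization.
    communities = community.greedy_modularity_communities(G)
    betweenness = nx.betweenness_centrality(G)
    return {
        "avg_clustering": nx.average_clustering(G),
        "modularity": community.modularity(G, communities),
        "degree_assortativity": nx.degree_assortativity_coefficient(G),
        "max_betweenness": max(betweenness.values()),
        # degree_histogram[d] is the number of nodes with degree d.
        "degree_histogram": nx.degree_histogram(G),
    }


# Stand-in for the real network; replace with the White Helmets graph.
stand_in = nx.karate_club_graph()
metrics = summarize(stand_in)
for name, value in metrics.items():
    print(name, value)
```

Running `summarize` on the real network and on each baseline graph gives directly comparable rows, one per network.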


