White Helmets
<h2>Prepare data</h2>

To get a better understanding of our data, you asked us to create three new datasets from the original ones by randomizing certain features. This technique is called null models; we discussed it for P3, and it is used in the paper "The structure and function of antagonistic ties in village social networks" by Amir Ghasemian and Nicholas A. Christakis.
Null models are used to identify which properties of the real network are significant and not due to chance. Since each null model is generated by randomizing one feature while keeping the rest intact, some properties of the original dataset may disappear. Those are the significant properties, and they help us understand the rules governing our real network (that is, not an organic network but one constructed by the propaganda machine of the WH).
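
As an illustration of the idea, here is a minimal sketch of how such a null model could be built by permuting a single column. The helper name `make_null_model` and the choice of a plain permutation are assumptions made for this example (the randomized CSV files used later may have been produced differently); only the column names `nodeUserID`, `videoID` and `nodeTime` come from the notebook.

```python
import numpy as np
import pandas as pd

def make_null_model(df, column, seed=0):
    """Return a copy of df in which the values of `column` are randomly permuted.

    Hypothetical helper: all other columns stay attached to their rows, so only
    the association between `column` and the rest of the data is destroyed.
    """
    rng = np.random.default_rng(seed)
    shuffled = df.copy()
    shuffled[column] = rng.permutation(df[column].to_numpy())
    return shuffled

# Toy data using the column names from the notebook.
toy = pd.DataFrame({
    "nodeUserID": ["a", "b", "c", "d"],
    "videoID": ["v1", "v1", "v2", "v2"],
    "nodeTime": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 00:00:30",
        "2020-01-01 01:00:00", "2020-01-01 02:00:00",
    ]),
})

random_time = make_null_model(toy, "nodeTime", seed=1)
random_video = make_null_model(toy, "videoID", seed=1)
# Randomizing both features: shuffle the video column of the time-shuffled copy.
random_time_video = make_null_model(random_time, "videoID", seed=2)
```

A permutation keeps the overall distribution of timestamps and videos unchanged, so any difference between the original and the shuffled network comes from who did what and when, not from the totals.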

<h2>Extract network </h2>
  
So for both the Twitter and Facebook datasets, we now have four datasets. From these we extract the same type of graph as in the first assignment: nodes are users, and there is an edge between two users if they interacted on the same video within a time interval of less than 52 seconds. The time delta is a tuning parameter of the network. The 52-second rule assumes that this is less than the time a human needs to notice a new post, watch the video, and then also interact with it. So 52 s is a good threshold for detecting bot activity, such as accounts that automatically repost the video.
But if we are facing coordinated human activity, this threshold may be too short to detect all of it. If we increase it, we risk including genuine human interaction. So there is a trade-off here that can be used in the analysis.
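
To make the trade-off concrete, the sketch below counts how many pairs of interactions on a single video would be linked under a 52-second and a 180-second threshold. The timestamps are invented for illustration and `count_linked_pairs` is a hypothetical helper; it uses the same `<=` comparison and early `break` as the extraction code in this notebook.

```python
from datetime import datetime, timedelta

def count_linked_pairs(times, threshold_seconds):
    """Count pairs of interactions on one video that fall within the threshold."""
    times = sorted(times)
    pairs = 0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            if (times[j] - times[i]).total_seconds() <= threshold_seconds:
                pairs += 1
            else:
                # Times are sorted, so every later interaction is even further away.
                break
    return pairs

# Invented interaction times on one video, in seconds after the first one.
t0 = datetime(2020, 1, 1, 12, 0, 0)
offsets = [0, 20, 45, 150, 400]
times = [t0 + timedelta(seconds=s) for s in offsets]

pairs_52 = count_linked_pairs(times, 52)
pairs_180 = count_linked_pairs(times, 180)
print(pairs_52, pairs_180)
```

With these toy timestamps, the wider window links 6 pairs instead of 3: it captures slower coordination, but it would just as readily link two unrelated users who happened to react a couple of minutes apart.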

Twitter network density with the 52 s rule: 0.000021
Facebook network density with the 52 s rule: 0.000651

Both seem really low to me.
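
These values can be checked by hand. For an undirected graph with N nodes and E edges, the density is 2E / (N(N - 1)), which is also what `nx.density` computes. The short sketch below applies this formula to the node and edge counts reported for the 52-second networks in the Results section; the function name `undirected_density` is only illustrative.

```python
def undirected_density(num_nodes, num_edges):
    """Fraction of all possible undirected node pairs that are actually linked."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Node and edge counts of the original networks at the 52-second threshold.
twitter_density = undirected_density(4947, 253)
facebook_density = undirected_density(684, 152)

print(f"Twitter:  {twitter_density:.6f}")
print(f"Facebook: {facebook_density:.6f}")
```

The low values are largely a consequence of scale: with 4947 users there are more than 12 million possible pairs, so a few hundred edges can only yield a tiny density. This is why the comparison with the null models matters more than the absolute value.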

<h2>Compare networks</h2>

To compare these networks I will use different metrics:
- number of edges and density: indicate how synchronised the actions are, since an edge only exists when two users interact on the same video within the time threshold.
- clustering coefficient: measures how likely it is that two neighbors of a node are also connected. Coordinated campaigns often form tightly connected groups that amplify similar narratives. Random networks usually have low clustering, so higher values indicate non-random coordination.
- modularity: quantifies the presence of distinct communities within the network. Influence operations typically involve groups of accounts promoting similar messages. High modularity compared to a random baseline suggests organized, topic-based coordination.
- betweenness centrality: measures how frequently a node appears on the shortest paths between others. This highlights key agents coordinating dissemination across platforms.
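
To give a feel for these metrics, here is a small sketch on toy graphs. It is only an illustration: the graphs are invented, and modularity is computed with NetworkX's own `modularity` function on a hand-written partition, rather than with the Louvain package used in the analysis code.

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Clustering: a closed triangle versus an open path of three nodes.
triangle = nx.complete_graph(3)
path = nx.path_graph(3)  # edges 0-1 and 1-2
clustering_triangle = nx.average_clustering(triangle)
clustering_path = nx.average_clustering(path)

# Betweenness: the middle node of the path lies on the only shortest path
# between the two end nodes.
betweenness = nx.betweenness_centrality(path, normalized=True)

# Modularity: two disconnected triangles, each treated as one community.
two_cells = nx.disjoint_union(nx.complete_graph(3), nx.complete_graph(3))
q = modularity(two_cells, [{0, 1, 2}, {3, 4, 5}])

print(clustering_triangle, clustering_path, betweenness[1], q)
```

The triangle has clustering 1.0 and the path 0.0, the middle node of the path has normalized betweenness 1.0, and the two separate triangles give a modularity of 0.5. In a network made of many small disconnected groups, modularity is therefore high almost by construction, which is worth keeping in mind when reading the values close to 1 reported below.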



In [None]:
import pandas as pd
import networkx as nx
from datetime import timedelta
import matplotlib.pyplot as plt
import community as community_louvain

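def extract_network(df, time_threshold=180):
    """Build the co-interaction graph: users are nodes, and two users are linked
    when they interact on the same video within `time_threshold` seconds."""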
    G = nx.Graph()
    all_users = df['nodeUserID'].unique()
    G.add_nodes_from(all_users)
    df['nodeTime'] = pd.to_datetime(df['nodeTime'])
    grouped = df.groupby('videoID')
    edge_list = []
    
    for video_id, group in grouped:
        if len(group) < 2:
            continue
            
        group = group.sort_values('nodeTime')
        
        users = group['nodeUserID'].tolist()
        times = group['nodeTime'].tolist()
        n = len(users)

        for i in range(n):
            for j in range(i + 1, n):
                delta = (times[j] - times[i]).total_seconds()
                if delta <= time_threshold:
                    if users[i] != users[j]:
                        edge_list.append((users[i], users[j]))
                else:
                    break
    G.add_edges_from(edge_list)
    return G

def plot_network_overview(G, title):
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    pos_spring = nx.spring_layout(G, seed=42)
    nx.draw(G, pos_spring, ax=axes[0], node_size=20, node_color='blue', 
            edge_color='gray', alpha=0.7)
    axes[0].set_title(f"{title}\n(Spring Layout)")
    axes[0].axis('off')
    
    pos_circular = nx.circular_layout(G)
    nx.draw(G, pos_circular, ax=axes[1], node_size=20, node_color='blue', 
            edge_color='gray', alpha=0.7)
    axes[1].set_title(f"{title}\n(Circular Layout)")
    axes[1].axis('off')
    
    plt.tight_layout()
    plt.show()

import numpy as np

def analyze_network_metrics(G, name="Network"):
    
    num_nodes = G.number_of_nodes()
    num_edges = G.number_of_edges()

    # Density
    density = nx.density(G)

    # Clustering Coeff
    avg_clustering = nx.average_clustering(G)
    
    # Modularity (Louvain)
    try:
        partition = community_louvain.best_partition(G)
        modularity = community_louvain.modularity(partition, G)
        num_communities = len(set(partition.values()))
    except ValueError:
        modularity = 0.0
        num_communities = 0

    # Betweenness Centrality
    bet_centrality = nx.betweenness_centrality(G, weight=None, normalized=True)
    avg_betweenness = np.mean(list(bet_centrality.values()))
    max_betweenness = max(bet_centrality.values())

    return {
        "Network": name,
        "Nodes": num_nodes,
        "Edges": num_edges,
        "Density": round(density, 6),
        "Clustering Coeff": round(avg_clustering, 4),
        "Modularity": round(modularity, 4),
        "Communities": num_communities,
        "Avg Betweenness": round(avg_betweenness, 5),
        "Max Betweenness": round(max_betweenness, 5)
    }

if __name__ == "__main__":
   
    print("Loading data...")
    
    df_twitter = pd.read_csv('original_data/twitter_cross_platform.csv', sep=',') 
    df_twitter_time = pd.read_csv('data/twitter/random_time.csv', sep=',')
    df_twitter_video = pd.read_csv('data/twitter/random_video.csv', sep=',')
    df_twitter_time_video = pd.read_csv('data/twitter/random_video&time.csv', sep=',')
    
    G_twitter = extract_network(df_twitter)
    G_twitter_time = extract_network(df_twitter_time)
    G_twitter_video = extract_network(df_twitter_video)
    G_twitter_time_video = extract_network(df_twitter_time_video)

    twitter_networks = {
        "Original": G_twitter,
        "Random Time": G_twitter_time,
        "Random Video": G_twitter_video,
        "Random Time & Video": G_twitter_time_video
    }

    df_facebook = pd.read_csv('original_data/facebook_cross_platform.csv', sep=',')
    df_facebook_time = pd.read_csv('data/facebook/random_time.csv', sep=',')
    df_facebook_video = pd.read_csv('data/facebook/random_video.csv', sep=',')
    df_facebook_time_video = pd.read_csv('data/facebook/random_video&time.csv', sep=',')

    G_facebook = extract_network(df_facebook)
    G_facebook_time = extract_network(df_facebook_time)
    G_facebook_video = extract_network(df_facebook_video)
    G_facebook_time_video = extract_network(df_facebook_time_video)

    facebook_networks = {
        "Original": G_facebook,
        "Random Time": G_facebook_time,
        "Random Video": G_facebook_video,
        "Random Time & Video": G_facebook_time_video
    }
    
    allmetrics = []
    
    for name, G in twitter_networks.items():
        #print(f"Twitter network {name}: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
        # plot_network_overview(G, f"Twitter Network - {name}")
        allmetrics.append(analyze_network_metrics(G, name=f"Twitter - {name}"))

    for name, G in facebook_networks.items():
        #print(f"Facebook network {name}: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
        # plot_network_overview(G, f"Facebook Network - {name}")
        allmetrics.append(analyze_network_metrics(G, name=f"Facebook - {name}"))

    df_metrics = pd.DataFrame(allmetrics)
    columns_order = ["Network", "Nodes", "Edges", "Modularity", "Clustering Coeff", "Communities", "Density", "Avg Betweenness", "Max Betweenness"]
    df_metrics = df_metrics[columns_order]
    
    print(df_metrics.to_string(index=False))



Loading data...
                       Network  Nodes  Edges  Modularity  Clustering Coeff  Communities  Density  Avg Betweenness  Max Betweenness
            Twitter - Original   4947    805      0.9839            0.0359         4259 0.000066          0.00000          0.00010
         Twitter - Random Time   4947    235      0.9287            0.0072         4747 0.000019          0.00000          0.00015
        Twitter - Random Video   4947    219      0.9799            0.0026         4734 0.000018          0.00000          0.00002
 Twitter - Random Time & Video   4947    738      0.7772            0.0307         4418 0.000060          0.00000          0.00586
           Facebook - Original    684    294      0.9117            0.1376          540 0.001259          0.00001          0.00089
        Facebook - Random Time    684     11      0.8595            0.0044          674 0.000047          0.00000          0.00000
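       Facebook - Random Video    684     11      0.8595            0.0044          674 0.000047          0.00000          0.00000
Facebook - Random Time & Video    684      9      0.8642            0.0000          675 0.000039          0.00000          0.00000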

<h2>Results</h2>

With a 52-second threshold (bot activity):

| Network                        | Nodes | Edges | Modularity | Clustering Coeff | Communities | Density  | Avg Betweenness | Max Betweenness |
|--------------------------------|-------|-------|------------|------------------|-------------|----------|-----------------|-----------------|
| Twitter - Original             | 4947  | 253   | 0.9895     | 0.0042           | 4702        | 0.000021 | 0.0             | 0.00000         |
| Twitter - Random Time          | 4947  | 93    | 0.9529     | 0.0021           | 4858        | 0.000008 | 0.0             | 0.00001         |
| Twitter - Random Video         | 4947  | 72    | 0.9695     | 0.0000           | 4875        | 0.000006 | 0.0             | 0.00000         |
| Twitter - Random Time & Video  | 4947  | 249   | 0.8402     | 0.0056           | 4722        | 0.000020 | 0.0             | 0.00055         |
| Facebook - Original            | 684   | 152   | 0.9024     | 0.0740           | 593         | 0.000651 | 0.0             | 0.00046         |
| Facebook - Random Time         | 684   | 10    | 0.8400     | 0.0044           | 675         | 0.000043 | 0.0             | 0.00000         |
| Facebook - Random Video        | 684   | 3     | 0.6667     | 0.0000           | 681         | 0.000013 | 0.0             | 0.00000         |
| Facebook - Random Time & Video | 684   | 4     | 0.6250     | 0.0000           | 680         | 0.000017 | 0.0             | 0.00000         |


With a 3-minute threshold (organized humans):

| Network                        | Nodes | Edges | Modularity | Clustering Coeff | Communities | Density  | Avg Betweenness | Max Betweenness |
|--------------------------------|-------|-------|------------|------------------|-------------|----------|-----------------|-----------------|
| Twitter - Original             | 4947  | 805   | 0.9839     | 0.0359           | 4259        | 0.000066 | 0.00000         | 0.00010         |
| Twitter - Random Time          | 4947  | 235   | 0.9287     | 0.0072           | 4747        | 0.000019 | 0.00000         | 0.00015         |
| Twitter - Random Video         | 4947  | 219   | 0.9799     | 0.0026           | 4734        | 0.000018 | 0.00000         | 0.00002         |
| Twitter - Random Time & Video  | 4947  | 738   | 0.7772     | 0.0307           | 4418        | 0.000060 | 0.00000         | 0.00586         |
| Facebook - Original            | 684   | 294   | 0.9117     | 0.1376           | 540         | 0.001259 | 0.00001         | 0.00089         |
| Facebook - Random Time         | 684   | 11    | 0.8595     | 0.0044           | 674         | 0.000047 | 0.00000         | 0.00000         |
| Facebook - Random Video        | 684   | 11    | 0.8595     | 0.0044           | 674         | 0.000047 | 0.00000         | 0.00000         |
| Facebook - Random Time & Video | 684   | 9     | 0.8642     | 0.0000           | 675         | 0.000039 | 0.00000         | 0.00000         |


<h2>Analysis</h2>

Analysis of the Twitter network reveals coordination hidden behind significant background noise. At the microscopic scale of bots (52-second threshold), although the number of edges in the original network (253) is close to that of the fully randomized ‘Random Time & Video’ model (249), the qualitative structure differs: the clustering coefficient of the original network (0.0042) is twice that of the random time model (0.0021), indicating a tendency to form closed triangles. This trend becomes much stronger when the window is widened to 3 minutes to include organised human coordination: the original network grows to 805 edges compared to only 235 for random time, and clustering climbs to 0.0359, five times the random time value (0.0072). This persistence of extreme modularity (>0.98), combined with structural densification when the time constraint is relaxed, points to distinct militant cells operating in a synchronised manner, a signature that simple organic virality would be unlikely to produce.

The Facebook case provides the clearest evidence of manipulation based on ‘Time Locality’. Unlike Twitter, the signal here is clean and shows little statistical noise. At 52 seconds, the network collapses as soon as the time variable is randomized: we go from 152 edges and a strong clustering of 0.0740 in the original to only 10 edges and a negligible clustering of 0.0044 in the random time model. This 93% drop in edges indicates that almost all activity depends on synchronisation within the 52-second window. The finding is the same at the 3-minute scale, where the original network has 294 edges compared to 11 for the random time model. The near-total absence of structure in the ‘Random Video’ model (3 edges at 52 seconds) also suggests that the content itself is not viral; it is the coordinated action of posting it simultaneously that creates the illusion of popularity.
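
The size of this collapse can be recomputed directly from the edge counts in the two result tables. The sketch below is a simple arithmetic check; `relative_drop` is an illustrative helper, not part of the analysis code.

```python
def relative_drop(original, randomized):
    """Share of the original edges that disappear in the null model."""
    return (original - randomized) / original

# Facebook edge counts: original network versus the random time model.
drop_52s = relative_drop(152, 10)    # 52-second threshold
drop_180s = relative_drop(294, 11)   # 3-minute threshold

print(f"52 s:  {drop_52s:.1%}")
print(f"180 s: {drop_180s:.1%}")
```

This gives a drop of about 93% at 52 seconds and about 96% at 3 minutes. Each figure is based on a single randomized dataset, so repeating the randomization with several seeds would show how much these percentages vary from one draw to the next.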

In conclusion, comparing the original network with its three null models leads us to reject the hypothesis of organic behaviour. The metrics reveal two distinct modes of coordinated inauthentic behaviour (CIB). On Facebook, manipulation is crude and dependent on strict temporal synchronisation (without which the network disappears), typical of automation or a precise publication order. On Twitter, manipulation is more structural, relying on dense, segmented communities that are massively active over wider time windows (3 minutes), suggesting a hybrid operation combining bots and human activists. In both cases, the differences in clustering and density compared to the null models indicate that these networks are not the result of natural social interactions, but of an orchestrated influence campaign.

