In [6]:
import networkx as nx
import pandas as pd
import random

In [7]:
def estimate_influence_spread(G, seed_set, num_simulations=10, activation_prob=0.1):
    """Estimate the average number of nodes activated from seed_set."""

    def simulate_diffusion(G, seed_set):
        """Run one cascade and return the final number of active nodes."""
        active_nodes = set(seed_set)
        newly_active_nodes = set(seed_set)
        while newly_active_nodes:
            next_newly_active_nodes = set()
            for node in newly_active_nodes:
                # Each newly active node gets one attempt per inactive neighbor.
                neighbors = set(G.neighbors(node))
                for neighbor in neighbors - active_nodes:
                    if random.random() < activation_prob:
                        next_newly_active_nodes.add(neighbor)
            active_nodes.update(next_newly_active_nodes)
            newly_active_nodes = next_newly_active_nodes
        return len(active_nodes)

    # Average the final cascade size over independent runs.
    total_spread = 0
    for _ in range(num_simulations):
        total_spread += simulate_diffusion(G, seed_set)
    return total_spread / num_simulations

In [8]:
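def influence_maximization(G, k):
    """Greedily pick k seed nodes, one per round, by estimated spread."""
    seed_set = set()
    for _ in range(k):
        best_node = None
        best_spread = 0
        # Try every remaining node and keep the one whose addition
        # gives the largest estimated spread for this round.
        for node in G.nodes:
            if node not in seed_set:
                temp_set = seed_set | {node}
                spread = estimate_influence_spread(G, temp_set)
                if spread > best_spread:
                    best_spread = spread
                    best_node = node
        if best_node is not None:
            seed_set.add(best_node)
    return seed_set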

In [9]:
# Load the edge list (columns node_1 and node_2)
df = pd.read_csv('../../facebook_clean_data/tvshow_edges.csv')
# Build an undirected graph from the edge list
G = nx.from_pandas_edgelist(df, 'node_1', 'node_2')

# Run greedy influence maximization
k = 3  # Number of nodes to select
selected_nodes = influence_maximization(G, k)

print(f"Selected nodes for influence maximization: {selected_nodes}")

Selected nodes for influence maximization: {672, 2434, 111}


Functionality: 

simulate_diffusion: Simulates one cascade starting from a given seed set. In each round, every node activated in the previous round gets a single chance to activate each of its still-inactive neighbors, succeeding independently with probability 0.1 (the behavior usually called the independent cascade model). The process stops when a round activates no new nodes, and the function returns the total number of active nodes, seeds included.

estimate_influence_spread: Runs simulate_diffusion num_simulations times and returns the mean cascade size as the estimated spread of the seed set. Averaging reduces the run-to-run randomness of a single cascade, but with the default of 10 runs the estimate is still noisy, so small differences between candidates may not be meaningful.
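
One way to see how much a Monte Carlo average of this kind can move is to report its standard error next to the mean. The sketch below is not part of the notebook: simulate_cascade, estimate_with_error, the rng_seed parameter, and the toy star graph are illustrative assumptions. It takes a plain adjacency dict, although a NetworkX graph supports the same G[node] lookup.

```python
import math
import random
import statistics


def simulate_cascade(adj, seed_set, p, rng):
    """Run one cascade and return the final number of active nodes."""
    active = set(seed_set)
    frontier = set(seed_set)
    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbor in adj[node]:
                if neighbor not in active and rng.random() < p:
                    next_frontier.add(neighbor)
        active |= next_frontier
        frontier = next_frontier
    return len(active)


def estimate_with_error(adj, seed_set, p=0.1, num_simulations=10, rng_seed=0):
    """Return (mean spread, standard error of the mean)."""
    rng = random.Random(rng_seed)  # private generator, so runs are repeatable
    runs = [simulate_cascade(adj, seed_set, p, rng) for _ in range(num_simulations)]
    mean = statistics.fmean(runs)
    sem = statistics.stdev(runs) / math.sqrt(len(runs)) if len(runs) > 1 else 0.0
    return mean, sem


# Toy graph (assumption): a hub, node 0, connected to 20 leaves.
star = {0: list(range(1, 21)), **{i: [0] for i in range(1, 21)}}
for n in (10, 100, 1000):
    mean, sem = estimate_with_error(star, {0}, p=0.1, num_simulations=n)
    print(f"num_simulations={n:5d}  mean={mean:.2f}  sem={sem:.2f}")
```

The standard error shrinks roughly with the square root of num_simulations, so halving the noise costs about four times as many cascades.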

influence_maximization: This greedy algorithm builds the seed set one node per iteration until it has k nodes. In each iteration it evaluates every node not yet in the seed set and adds the one whose inclusion gives the highest estimated spread. Because all candidates are compared against the same current seed set, this is equivalent to choosing the largest estimated marginal gain.

Performance:

Efficiency: Each iteration calls estimate_influence_spread once for every node not yet selected, and each call runs num_simulations cascades. For a graph with n nodes the total is num_simulations × (n + (n - 1) + ... + (n - k + 1)) cascades, roughly k × n × num_simulations, which becomes expensive on large graphs.

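Scalability: The approach is practical for small and moderately sized networks. On very large networks the run time grows with the number of nodes, k, and num_simulations together, and each individual cascade can also touch more of the graph, so the repeated simulation for every candidate node becomes the bottleneck.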
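
To make the cost concrete, the number of cascades the greedy loop runs can be counted directly from its structure. This is a small sketch, not code from the notebook; greedy_simulation_count is a hypothetical helper and the graph sizes are made up for illustration.

```python
def greedy_simulation_count(num_nodes, k, num_simulations=10):
    """Count the cascade simulations run by the plain greedy loop."""
    # Round i (0-based) evaluates every node not yet in the seed set.
    candidates_evaluated = sum(num_nodes - i for i in range(k))
    return candidates_evaluated * num_simulations


# Hypothetical graph sizes, chosen only for illustration.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} nodes -> {greedy_simulation_count(n, k=3):>9} simulations")
```

With k = 3 and 10 simulations per candidate, a 1,000-node graph already needs about 30,000 cascades, and the count grows linearly with the number of nodes.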

Strengths:

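Effectiveness: The greedy approach is well-suited to influence maximization. For cascade models of this kind, expected spread has diminishing returns (it is submodular), which is what gives greedy selection its standard (1 - 1/e) approximation guarantee when spread is evaluated exactly. Here spread is estimated from only 10 runs per candidate, so each node added maximizes the estimated marginal gain rather than the true one.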

Applicability: The method is general. It only needs a NetworkX graph, so it can be applied to any network where influence maximization is a concern, for example social networks used for viral marketing or for information dissemination.

Limitations:

Computational Cost: The primary limitation is the computational expense of running multiple simulations for each node in every iteration, especially for large graphs or high values of k and num_simulations.
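
A common way to cut this cost is lazy evaluation, usually called lazy greedy or CELF. It is a different technique from the loop in the notebook: instead of re-estimating every node in every round, it keeps the gains from earlier rounds in a priority queue and only refreshes the entry at the top. The sketch below is written under assumptions: lazy_greedy, estimate_spread, rng_seed, and the toy graph are illustrative names, and the graph is a plain adjacency dict (a NetworkX graph supports the same iteration and G[node] lookup).

```python
import heapq
import random


def simulate_cascade(adj, seed_set, p, rng):
    """Run one cascade and return the final number of active nodes."""
    active = set(seed_set)
    frontier = set(seed_set)
    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbor in adj[node]:
                if neighbor not in active and rng.random() < p:
                    next_frontier.add(neighbor)
        active |= next_frontier
        frontier = next_frontier
    return len(active)


def estimate_spread(adj, seed_set, p, num_simulations, rng):
    """Mean cascade size over num_simulations runs."""
    total = sum(simulate_cascade(adj, seed_set, p, rng) for _ in range(num_simulations))
    return total / num_simulations


def lazy_greedy(adj, k, p=0.1, num_simulations=10, rng_seed=0):
    """Select up to k seeds, refreshing only the most promising stale gain."""
    rng = random.Random(rng_seed)
    seeds, current_spread = set(), 0.0
    # Heap entries: (-gain, tie_breaker, node, seed-set size when gain was computed).
    heap = []
    for order, node in enumerate(adj):
        gain = estimate_spread(adj, {node}, p, num_simulations, rng)
        heap.append((-gain, order, node, 0))
    heapq.heapify(heap)

    while heap and len(seeds) < k:
        neg_gain, order, node, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # The gain was computed against the current seed set: accept the node.
            seeds.add(node)
            current_spread += -neg_gain
        else:
            # Stale gain: re-estimate against the current seed set and requeue.
            spread = estimate_spread(adj, seeds | {node}, p, num_simulations, rng)
            heapq.heappush(heap, (-(spread - current_spread), order, node, len(seeds)))
    return seeds


# Toy graph (assumption): a 5-node star and a separate 3-node path.
two_components = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0],
                  10: [11], 11: [10, 12], 12: [11]}
# With p=1.0 every cascade fills its whole component, so the run is deterministic.
print(sorted(lazy_greedy(two_components, k=2, p=1.0, num_simulations=1)))  # [0, 10]
```

The shortcut relies on diminishing returns: an old gain is treated as an upper bound on the current one. With noisy estimates that bound can occasionally fail, so the result may differ from what the plain greedy loop would select.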

Simplistic Diffusion Model: The diffusion model assumes a uniform activation probability of 0.1 for all edges, which might not reflect the real-world dynamics where certain connections could be stronger or weaker.
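
One way to relax the uniform probability is to pass the simulator a function that returns a probability per edge. The sketch below is an assumption-laden illustration, not the notebook's code: the edge_prob callable and the helper names are hypothetical, and the inverse-degree rule (often called the weighted cascade model) is only one possible choice. The graph is a plain adjacency dict.

```python
import random


def simulate_cascade(adj, seed_set, edge_prob, rng):
    """Run one cascade where edge (u, v) fires with probability edge_prob(u, v)."""
    active = set(seed_set)
    frontier = set(seed_set)
    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbor in adj[node]:
                if neighbor not in active and rng.random() < edge_prob(node, neighbor):
                    next_frontier.add(neighbor)
        active |= next_frontier
        frontier = next_frontier
    return len(active)


# Toy path graph (assumption): 0 - 1 - 2
path = {0: [1], 1: [0, 2], 2: [1]}


def uniform(u, v):
    """The notebook's behavior: every edge fires with probability 0.1."""
    return 0.1


def inverse_degree(u, v):
    """Assumed rule: a node with many neighbors is harder to sway through any one of them."""
    return 1.0 / len(path[v])


rng = random.Random(0)
# Nodes 0 and 2 each have degree 1, so seeding node 1 always activates both.
print(simulate_cascade(path, {1}, inverse_degree, rng))  # 3
print(simulate_cascade(path, {1}, uniform, rng))
```

If real interaction data is available (for example message counts between pages), edge_prob could look up a learned or measured weight instead.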