# Influence Maximisation Problem with time and random delays

This is the issue of maximising value in a netowork by selecting nodes that are most likely to yield the highest value. In this notebook we will implement some well studied models and extend them to account for time delays. We will design heuristic and greedy strategies to tackle the influence maximisation problem and analyse performance of the algorithms.

## Jargon: understanding vocabulary

Here we put some vocabulary we might need in order to understand and tackle the problem.

* **States** is how we represent the node, 0 is inactive, -1 latent active, and 1 is active.
* **Propogation value** PV(S) is the expected value of activated nodes in a seed set, S.
* **Seed set** is the set of nodes that contain the solution.
* **k seeds** is the number of initial set of nodes before starting influence maximisation.
* **Freashness function** is a function to determine the value of a node with respect to the time till activation.
* **Freashness value** is the actual value of the node.


In [1]:
# Import packages
import matplotlib.pyplot as plt
from random import uniform, seed
import numpy as np
import time
import snap
import networkx as nx
import random
import itertools
import urllib
import csv
import functools
import heapq as h
import re
import operator
from math import floor
from statistics import mean, stdev

## Generate a graph

gen_ts_graph uses seeds that allows us to generate random values consistently. If we want to reference a graph generated in the past we just input the seed that was used. Any size graph can be created simply by giving the number of nodes.

In [2]:
def gen_ts_graph(size, seed=1000, pr=1):
    """creates a network with time delay on edges, and the capability to add three states."""
    random.seed(seed*size)
    print(random.randrange(size*3, round(size*(size - 1)/150, 0)))
    G = snap.GenRndGnm(snap.PNEANet, size, random.randrange(size*3, round(size*(size - 1)/150, 0)))
            
    if pr:
        print("Graph successfully generated! \n")
            
    return G

In [18]:
graph1 = gen_ts_graph(2500, 5)

21322
Graph successfully generated! 



The freshness functions can be any positive decreasing function for example, exponential, polynomial or piecewise linear function. fval(v)=ff(v.acttime) the value of a node is calculated by input of the activation time in the freshness function.

In [4]:
def exponential(x):
    if x >= 0:
        return (np.exp(1)**(x*-0.2))
    else:
        return print("Enter a value greater than or equal to 0")

def polynomialFreeDecay(x):
    if x >= 0:
        return (-0.001*x) + 100
    else:
        return print("Enter a value greater than or equal to 0")
    
def piecewiselinear(x):
    if x >= 0 and x <= 112:
        return (-x + 200)
    elif x > 112 and x <= 1000:
        return ((-0.1*x) + 100)
    else:
        return 0

## Time Sensitive Independent Cascade

Each node has one chance to activate its neighbor, if its probability is greater than the neighbor' threshold then the neighbor becomes activated. This function returns a propogation value for a seed set that is greater than 0. And in this model it takes a probability p, such that each node has the activation chance p. 

In [5]:
def ic(g, s, p=0.5, mc=100):
    """ 
    input: g - graph, s - seed set, mc - number of montecarlo simulations, p - the activaiton probability
    output: propogation value
    """

    #result of mc simulations
    pvS = []
    
    for i in range(mc):
        #propogate under ic style
        aCurr, allActive, latentA, latentDict, pv = s.copy(), s.copy(), [], {}, 0
        count = 0
        
        while aCurr or latentA:
            
            #neighbors of the nodes that activate
            candidates = []
            for n in aCurr:
                
                if n == None:
                    continue
                    
                #thresholds = list()
                neighbors = list()
                node = g.GetNI(int(n))
                
                for neighbor in node.GetOutEdges():
                    #thresholds.append(g.GetFltAttrDatN(neighbor, "threshold"))
                    neighbors.append(neighbor)
                
                np.random.seed(i)
                activation = np.random.uniform(0,1, node.GetOutDeg()) < p
                candidates += list(np.extract(activation, neighbors))
            
            temp = list(set(candidates) - set(allActive) - set(latentA))

            #store activation times of nodes in latentDict and nodes in latentA
            for node in temp:
                #latentNode = g.GetNI(node)
                latentDict[node] = random.randrange(0, 10) + count 
                latentA += [int(node)]
            
            # resets list of current active nodes
            # checks if item in latentA is ready to become active 
            # adds to aCurr 
            aCurr = []
            remove = []
                
            for nodeL in latentDict:
                actTime = int(latentDict[nodeL])
                if actTime == count:
                    aCurr.append(nodeL)
                    pv += exponential(actTime)
                    remove.append(nodeL)
                    
            if not len(remove) == 0:
                for node in remove:
                    del latentDict[node]
                    latentA.remove(node)
            
            if not len(aCurr) == 0:
                allActive += aCurr

            if pv == 0:
                break

            count += 1

        pvS.append(pv)
    return np.mean(pvS)                

In [21]:
print("IC Propogation value of: ", ic(graph1, [100, 200, 300, 400, 500]))

IC Propogation value of:  297.7820806347474


## Greedy algorithm - Independent Casecade

In the greedy algorithm, we find a seed set of size k. The greedy algorithm works by computing the marginal gain for each possible candidate and selecting he node that gives use he best propogation value(bestPv). The marginal gain is simply the propogation value return by starting propogation on the model with [seedset + node_n].

In [7]:
def greedy(g, k, p=0.5, mc=100):
    """
    input: g - graph, k - set set size
    output: optimal seed set
    """
    seedSet, totalPv = [], 0
    
    #largest marginal gain
    for i in range(k):
        bestPv, bestNode = 0, 0
        #candidates are nodes not in the seed set, k
        for node in g.Nodes():
            
            # do not consider nodes in the seedSet
            if node.GetId() in seedSet:
                continue
            
            # get propogation value
            pv = ic(g, seedSet + [node.GetId()], p, mc)
            
            if pv > bestPv:
                bestPv = pv
                bestNode = node.GetId()
        
        seedSet.append(bestNode)
        print(seedSet)
        totalPv += bestPv
    
    return (seedSet, totalPv)

In [23]:
timer0start = time.time()
result0 = greedy(graph1, 20, 0.025, 10)

print(result0)
print(ic(graph1, result0[0], 0.025, 10))

timer0end = time.time() 
print("Time taken to execute greedyic: ", timer0end - timer0start)

[748]
[748, 192]
[748, 192, 642]
[748, 192, 642, 577]
[748, 192, 642, 577, 1965]
[748, 192, 642, 577, 1965, 325]
[748, 192, 642, 577, 1965, 325, 2142]
[748, 192, 642, 577, 1965, 325, 2142, 2420]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880, 835]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880, 835, 1768]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880, 835, 1768, 1744]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880, 835, 1768, 1744, 808]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880, 835, 1768, 1744, 808, 1093]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 1807, 477, 1880, 835, 1768, 1744, 808, 1093, 1199]
[748, 192, 642, 577, 1965, 325, 2142, 2420, 2455, 18

## Celf - Independent Cascade

The celf aglorithm is an optimisation of the greedy algorithm. The idea is because of submodularity if we compute the marginal gain and of the [seedSet + node_n], and the result is still the best result then node_n must therefore be a better choice than the other candidates.

In [9]:
def celf(g, k, p=0.5, mc=100):
    """
    input - g - graph, k - seet set
    output - a seed set
    """
    
    #first iteration list and sort using a heap
    candidates = [(-ic(g, [node.GetId()], p, mc), node.GetId()) for node in g.Nodes()]
    h.heapify(candidates)
    
    # get the starting node and remove 
    seedSet, propVal = [candidates[0][1]], candidates[0][0]*-1
    h.heappop(candidates)
    
    for i in range(k - 1):
        
        lock = False
        while not lock:
            
            # recalculate the PV, and see if the node gives the largest marginal gain
            current = h.heappop(candidates)
            h.heappush(candidates, ((ic(g, seedSet + [current[1]], p, mc)*-1), current[1]))
            
            if candidates[0][1] == current[1]:
                lock = True
        
        print(seedSet)
        temp = h.heappop(candidates)
        seedSet.append(temp[1])
        propVal += (temp[0]*-1)
    
    return (seedSet, propVal)

In [27]:
timer1start = time.time()
result1 = celf(graph1, 20, 0.025, 10)

print(result1)
print(ic(graph1, result1[0], 0.025, 10))

timer1end = time.time() 
print("Time taken to execute celfic: ", timer1end - timer1start)

[1260]
[1260, 435]
[1260, 435, 2369]
[1260, 435, 2369, 1479]
[1260, 435, 2369, 1479, 106]
[1260, 435, 2369, 1479, 106, 2255]
[1260, 435, 2369, 1479, 106, 2255, 2434]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653, 1404]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653, 1404, 1013]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653, 1404, 1013, 1342]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653, 1404, 1013, 1342, 2211]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653, 1404, 1013, 1342, 2211, 1648]
[1260, 435, 2369, 1479, 106, 2255, 2434, 1332, 163, 633, 1954, 1653, 1404, 1013, 1342, 2211, 1648, 1039]
[1260,

## Celf++ - Independent Casecade

The celf++ algorithm is a further optimisation on the celf algorithm. This algorithm says that we are able to compute the marginal gain of another node within the current iteration, therefore in a further step in computation if the other node were to be chosen we do not need to recompute the marginal gain thereby saving computational time.

In [10]:
def celfPlus(g, k, p=0.5, mc=100):
    """
    input - g - graph, k - seet set
    output - a seed set
    """
    
    #first iteration list and sort using a heap
    seedSet, candidates, lastSeed, curBest = [], [], None, None
    mg1Max, propVal = 0, 0
    
    for node in g.Nodes():
        mg1 = -ic(g, [node.GetId()], p, mc)
        prevBest = curBest
        mg2 = -ic(g, [curBest] + [node.GetId()], p, mc)
        flag = 0 
        candidates.append([mg1, node.GetId(), prevBest, mg2, flag])
        curBest = candidates[0][1]
    h.heapify(candidates)
    
    while len(seedSet) < k:
        u = candidates[0]

        if u[1] in seedSet:
            candidates.remove(u)
            continue
        
        if u[4] == len(seedSet):
            print(seedSet)
            seedSet += [u[1]]
            propVal += (u[0]*-1)
            candidates.remove(u)
            lastSeed = u[1]
            continue
        elif u[2] == lastSeed:
            u[0] = u[3]
        else:
            u[0] = -ic(g, [u[1]], p, mc)
            u[2] = curBest
            u[3] = -ic(g, [curBest] + [u[1]], p, mc)

            stat = {candidates[0][1]:candidates[0][0], u[1]:u[0]}
            curBest = max(stat.items(), key=operator.itemgetter(1))[0]

        u[4] = len(seedSet)
        h.heappush(candidates, u)
        
    return (seedSet, propVal)

In [28]:
timer2start = time.time()
result2 = celfPlus(graph1, 20, 0.025, 10)

print(result2)
print(ic(graph1, result2[0], 0.025, 10))

timer2end = time.time() 
print("Time taken to execute celfplusic: ", timer2end - timer2start)

[]
[1322]
[1322, 2429]
[1322, 2429, 110]
[1322, 2429, 110, 519]
[1322, 2429, 110, 519, 733]
[1322, 2429, 110, 519, 733, 422]
[1322, 2429, 110, 519, 733, 422, 954]
[1322, 2429, 110, 519, 733, 422, 954, 128]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548, 1773]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548, 1773, 1875]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548, 1773, 1875, 1934]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548, 1773, 1875, 1934, 1069]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548, 1773, 1875, 1934, 1069, 277]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1159, 1315, 379, 1548, 1773, 1875, 1934, 1069, 277, 2367]
[1322, 2429, 110, 519, 733, 422, 954, 128, 1

## Highest Degree priority

In [31]:
def degree(g, k, p=0.5, mc=100):
    seedSet, i = [], 0
    bestNode, bestPv = 0, 0

    #gets starter node
    for node in g.Nodes():
        
        # do not consider nodes in the seedSet
        if node.GetId() in seedSet:
            continue
        
        # get propogation value
        pv = ic(g, seedSet + [node.GetId()], p, mc)
        
        if pv > bestPv:
            bestPv = pv
            bestNode = node.GetId()

    seedSet.append(bestNode)

    #gets highest degree
    while i < k:
        candidates = []
        print(seedSet)
        for node in seedSet:
            nodeIT = g.GetNI(int(node))
            for neighbor in nodeIT.GetOutEdges():
                neighborIT = g.GetNI(int(neighbor))
                candidates.append([neighborIT.GetInDeg(), neighbor])
        h.heapify(candidates)
        seedSet.append(candidates.pop()[1])
        i+=1
    return seedSet

In [32]:
timer3start = time.time()
result3 = degree(graph1, 20, 0.025, 10)

print(result3)
print(ic(graph1, result3, 0.025, 10))

timer3end = time.time() 
print("Time taken to execute high degree ic: ", timer3end - timer3start)

[131]
[131, 627]
[131, 627, 799]
[131, 627, 799, 1555]
[131, 627, 799, 1555, 135]
[131, 627, 799, 1555, 135, 799]
[131, 627, 799, 1555, 135, 799, 979]
[131, 627, 799, 1555, 135, 799, 979, 2082]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551, 1704]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551, 1704, 2031]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551, 1704, 2031, 1359]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551, 1704, 2031, 1359, 2299]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551, 1704, 2031, 1359, 2299, 821]
[131, 627, 799, 1555, 135, 799, 979, 2082, 1329, 70, 2264, 605, 551, 1704, 203

## Genetic algorithm - tournament selection

In [33]:
def tournamentSelection(g, k, p=0.5, mc=100):
    seedSet, bestNode, bestPv = [], 0, 0

    Rnd = snap.TRnd(42)
    Rnd.Randomize()
    while len(seedSet) < k:
        #choose k individuals from the population at random
        randNode = g.GetRndNId(Rnd)

        pv = ic(g, seedSet + [randNode], p, mc)
        
        if pv > bestPv:
            print(seedSet)
            bestPv = pv
            bestNode = randNode
            seedSet.append(randNode)
    return seedSet

In [34]:
timer4start = time.time()
result4 = tournamentSelection(graph1, 20, 0.025, 10)

print(result4)
print(ic(graph1, result4, 0.025, 10))

timer4end = time.time() 
print("Time taken to execute tournament selection ic: ", timer4end - timer4start)

[]
[595]
[595, 2254]
[595, 2254, 945]
[595, 2254, 945, 1708]
[595, 2254, 945, 1708, 2416]
[595, 2254, 945, 1708, 2416, 1364]
[595, 2254, 945, 1708, 2416, 1364, 168]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406, 593]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406, 593, 467]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406, 593, 467, 1380]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406, 593, 467, 1380, 724]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406, 593, 467, 1380, 724, 648]
[595, 2254, 945, 1708, 2416, 1364, 168, 1357, 2358, 87, 886, 1406, 593, 467, 1380, 724, 648, 152]
[595, 2254, 945, 1708, 2416, 1364, 168, 13

## Genetic algorithm - Stochastic Universal Sampling

In [35]:
def sus(population, n, g, seedSet, p=0.5, mc=100):
    f = 0

    for item in population:
        f += ic(g, seedSet + [item[1]], p, mc)

    p = floor(f/n)
    start = random.randrange(0, p)
    pointers = [start + i*p for i in range(n-1)]
    return rws(population, pointers, f)

#roulette wheel selection
def rws(population, points, f):
    keep = []
    for p in points:
        i = 0
        while f < p:
            i+=1
        keep.append(population[i])
    return keep

def stochasticSampling(g, k, p=0.5, mc=100):
    seedSet = []

    #first iteration list and sort using a heap
    candidates = [(-ic(g, [node.GetId()], p, mc), node.GetId()) for node in g.Nodes()]
    h.heapify(candidates)

    # get the starting node and remove 
    seedSet, propVal = [candidates[0][1]], candidates[0][0]*-1
    h.heappop(candidates)
    
    #largest marginal gain
    for i in range(k - 1):
        member = sus(candidates, 3, g, seedSet, 0.025, mc)[random.randrange(0,2)][1]
        temp = []

        while member != candidates[0][1]:
            temp.append(h.heappop(candidates))

        h.heappop(candidates)
        print(seedSet)
        for item in temp:
            h.heappush(candidates, item)
        seedSet.append(member)

    return seedSet

In [None]:
timer5start = time.time()
result5 = stochasticSampling(graph1, 20, 0.025, 10)

print(result5)
print(ic(graph1, result5, 0.025, 10))

timer5end = time.time() 
print("Time taken to execute stochastic Sampling ic: ", timer5end - timer5start)

## Time sensitive Linear Threshold

In the linear threshold model a node gets activated if the sum of the influence exceeds the threshold for that node. The function returns a propogation value based on the seed set, s, and the graph, g, that is provided. Propogation value can be calculated with 1 or more nodes in the seed set. 

In [36]:
def lt(g, s, mc=100):
    """ 
    input: g - graph, s - seed set, mc - number of montecarlo simulations
    output: propogation value
    """
    #result of mc simulations
    pvS = []
    
    for i in range(mc):
        #propogate under lt style
        aCurr, allActive, latentA, latentDict, pv = s.copy(), s.copy(), [], {}, 0
        count = 0
        
        while aCurr or latentA:
            candidates = []
            for n in aCurr:
                
                if n == None:
                    continue
                
                neighbors = list()
                node = g.GetNI(int(n))
                influences = list()
                
                for neighbor in node.GetOutEdges():

                    if neighbor in allActive:
                        continue

                    #thresholds.append(g.GetFltAttrDatN(neighbor, "threshold"))
                    neighbors.append(neighbor)
                    neighborNode = g.GetNI(int(neighbor))
                    
                    #give normalised influence values
                    np.random.seed(i)
                    weights = [random.random() for j in range(neighborNode.GetInDeg())]
                    weights = [k / sum(weights) for k in weights]
                    
                    #calculate total influence from nodes
                    influenceValue = 0
                    counting = 0
                    for neighborNeighbor in neighborNode.GetInEdges():
                        if neighborNeighbor in allActive:
                            influenceValue += weights[counting]
                        counting +=1
                    
                    influences.append(influenceValue)
                
                #loop through graph and assign weights to inedges
                thresholds = np.random.uniform(low=0, high=1, size=(len(influences),))
                activation = np.asarray(influences) >= (np.array(thresholds)/1.1)
                candidates += list(np.extract(activation, neighbors))

            temp = list(set(candidates) - set(allActive) - set(latentA))

            #store activation times of nodes in latentDict and nodes in latentA
            for node in temp:
                latentDict[node] = random.randrange(0, 10) + count 
                latentA += [int(node)]

            # resets list of current active nodes
            # checks if item in latentA is ready to become active 
            # adds to aCurr 
            aCurr = []
            remove = []

            for nodeL in latentDict:
                actTime = int(latentDict[nodeL])
                if actTime == count:
                    aCurr.append(nodeL)
                    pv += exponential(actTime)
                    remove.append(nodeL)

            if not len(remove) == 0:
                for node in remove:
                    del latentDict[node]
                    latentA.remove(node)

            if not len(aCurr) == 0:
                allActive += aCurr

            if pv == 0:
                break

            count += 1
                
        pvS.append(pv)
    return np.mean(pvS)

## Greedy - Linear Threshold

In [51]:
def greedylt(g, k, mc=100):
    """
    input: g - graph, k - set set size
    output: optimal seed set
    """
    seedSet, totalPv = [], 0
    
    #largest marginal gain
    for i in range(k):
        bestPv, bestNode = 0, 0
        #candidates are nodes not in the seed set, k
        for node in g.Nodes():
            
            # do not consider nodes in the seedSet
            if node.GetId() in seedSet:
                continue
            
            # get propogation value
            pv = lt(g, seedSet + [node.GetId()], mc)
            
            if pv > bestPv:
                bestPv = pv
                bestNode = node.GetId()
        
        seedSet.append(bestNode)
        print(seedSet)
        totalPv += bestPv
    
    return (seedSet, totalPv)

In [53]:
timer6start = time.time()
result6 = greedylt(graph1, 20, 1)

print(result6)
print(lt(graph1, result6[0], 10))

timer6end = time.time() 
print("Time taken to execute stochastic greedylt: ", timer6end - timer6start)

[640]
[640, 332]
[640, 332, 1868]
[640, 332, 1868, 2150]
[640, 332, 1868, 2150, 1215]
[640, 332, 1868, 2150, 1215, 1504]
[640, 332, 1868, 2150, 1215, 1504, 2132]
[640, 332, 1868, 2150, 1215, 1504, 2132, 425]


KeyboardInterrupt: 

## Celf - Linear Threshold

In [49]:
def celflt(g, k, mc=100):
    """
    input - g - graph, k - seet set
    output - a seed set
    """
    
    #first iteration list and sort using a heap
    candidates = [(-lt(g, [node.GetId()], mc), node.GetId()) for node in g.Nodes()]
    h.heapify(candidates)
    
    # get the starting node and remove 
    seedSet, propVal = [candidates[0][1]], candidates[0][0]*-1
    h.heappop(candidates)
    
    for i in range(k - 1):
        
        lock = False
        while not lock:
            
            # recalculate the PV, and see if the node gives the largest marginal gain
            current = h.heappop(candidates)
            h.heappush(candidates, ((lt(g, seedSet + [current[1]], mc)*-1), current[1]))
            
            if candidates[0][1] == current[1]:
                lock = True
        
        print(seedSet)
        temp = h.heappop(candidates)
        seedSet.append(temp[1])
        propVal += (temp[0]*-1)
    
    return (seedSet, propVal)

In [50]:
timer7start = time.time()
result7 = celflt(graph1, 20, 1)

print(result7)
print(lt(graph1, result7[0], 10))

timer7end = time.time() 
print("Time taken to execute stochastic celflt: ", timer7end - timer7start)

[440]
[440, 737]
[440, 737, 498]
[440, 737, 498, 2014]
[440, 737, 498, 2014, 596]
[440, 737, 498, 2014, 596, 1718]
[440, 737, 498, 2014, 596, 1718, 386]
[440, 737, 498, 2014, 596, 1718, 386, 1549]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 141]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 141, 1277]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 141, 1277, 2179]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 141, 1277, 2179, 2138]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 141, 1277, 2179, 2138, 940]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 141, 1277, 2179, 2138, 940, 1954]
[440, 737, 498, 2014, 596, 1718, 386, 1549, 191, 1326, 640, 198, 1

## Celf++ - Linear Threshold

In [47]:
def celfPluslt(g, k, mc=100):
    """
    input - g - graph, k - seet set
    output - a seed set
    """
    
    #first iteration list and sort using a heap
    seedSet, candidates, lastSeed, curBest = [], [], None, None
    mg1Max, propVal = 0, 0
    
    for node in g.Nodes():
        mg1 = -lt(g, [node.GetId()], mc)
        prevBest = curBest
        mg2 = -lt(g, [curBest] + [node.GetId()], mc)
        flag = 0 
        candidates.append([mg1, node.GetId(), prevBest, mg2, flag])
        curBest = candidates[0][1]
    h.heapify(candidates)
    
    while len(seedSet) < k:
        u = candidates[0]
        
        if u[1] in seedSet:
            candidates.remove(u)
            continue

        if u[4] == len(seedSet):
            print(seedSet)
            seedSet += [u[1]]
            propVal += (u[0]*-1)
            candidates.remove(u)
            lastSeed = u[1]
            continue
        elif u[2] == lastSeed:
            # optimisation: recompute the marginal gain without using lt method
            u[0] = u[3]
        else:
            u[0] = -lt(g, [u[1]], mc)
            u[2] = curBest
            u[3] = -lt(g, [curBest] +[u[1]], mc)

            stat = {candidates[0][1]:candidates[0][0], u[1]:u[0]}
            curBest = max(stat.items(), key=operator.itemgetter(1))[0]

        u[4] = len(seedSet)
        h.heappush(candidates, u)
        
    return (seedSet, propVal)

In [48]:
timer8start = time.time()
result8 = celfPluslt(graph1, 20, 1)

print(result8)
print(lt(graph1, result8[0], 10))

timer8end = time.time() 
print("Time taken to execute stochastic celfPluslt: ", timer8end - timer8start)

[]
[264]
[264, 303]
[264, 303, 1599]
[264, 303, 1599, 2422]
[264, 303, 1599, 2422, 326]
[264, 303, 1599, 2422, 326, 1757]
[264, 303, 1599, 2422, 326, 1757, 1018]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772, 216]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772, 216, 1873]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772, 216, 1873, 1947]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772, 216, 1873, 1947, 1049]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772, 216, 1873, 1947, 1049, 2221]
[264, 303, 1599, 2422, 326, 1757, 1018, 2071, 2449, 1332, 725, 772, 216, 1873, 1947, 1049, 2221, 2412]
[264, 303, 1599, 2422, 

## High Degree - Linear Threshold

In [75]:
def degreelt(g, k, mc=100):
    seedSet, i = [], 0
    bestNode, bestPv = 0, 0

    #gets starter node
    for node in g.Nodes():
        
        # do not consider nodes in the seedSet
        if node.GetId() in seedSet:
            continue
        
        # get propogation value
        pv = lt(g, seedSet + [node.GetId()], mc)
        
        if pv > bestPv:
            bestPv = pv
            bestNode = node.GetId()

    seedSet.append(bestNode)

    #gets highest degree
    while i < k:
        candidates = []
        print(seedSet)
        for node in seedSet:
            nodeIT = g.GetNI(int(node))
            for neighbor in nodeIT.GetOutEdges():
                neighborIT = g.GetNI(int(neighbor))
                candidates.append([neighborIT.GetInDeg(), neighbor])
        h.heapify(candidates)
        seedSet.append(candidates.pop()[1])
        i+=1
    return seedSet

In [None]:
timer9start = time.time()
result9 = degreelt(graph1, 20, 10)

print(result9)
print(lt(graph1, result9, 10))

timer9end = time.time() 
print("Time taken to execute stochastic degreelt: ", timer9end - timer9start)

## Genetic Algorithm Tournament Selection - Linear Threshold

In [77]:
def tournamentSelectionlt(g, k, p=0.5, mc=100):
    seedSet, bestNode, bestPv = [], 0, 0

    Rnd = snap.TRnd(42)
    Rnd.Randomize()
    while len(seedSet) < k:
        #choose k individuals from the population at random
        randNode = g.GetRndNId(Rnd)

        pv = lt(g, seedSet + [randNode], p, mc)
        
        if pv > bestPv:
            print(seedSet)
            bestPv = pv
            bestNode = randNode
            seedSet.append(randNode)
    return seedSet

In [None]:
timer10start = time.time()
result10 = tournamentSelectionlt(graph1, 20, 10)

print(result10)
print(lt(graph1, result10, 10))

timer10end = time.time() 
print("Time taken to execute stochastic tournamentSelectionlt: ", timer10end - timer10start)

## Genetic Algorithm Stochastic Sampling - Linear Threshold

In [76]:
def suslt(population, n, g, seedSet, p=0.5, mc=100):
    f = 0

    for item in population:
        f += lt(g, seedSet + [item[1]], p, mc)

    p = floor(f/n)
    start = random.randrange(0, p)
    pointers = [start + i*p for i in range(n-1)]
    return rwslt(population, pointers, f)

#roulette wheel selection
def rwslt(population, points, f):
    keep = []
    for p in points:
        i = 0
        while f < p:
            i+=1
        keep.append(population[i])
    return keep

def stochasticSamplinglt(g, k, p=0.5, mc=100):
    seedSet = []

    #first iteration list and sort using a heap
    candidates = [(-lt(g, [node.GetId()], p, mc), node.GetId()) for node in g.Nodes()]
    h.heapify(candidates)

    # get the starting node and remove 
    seedSet, propVal = [candidates[0][1]], candidates[0][0]*-1
    h.heappop(candidates)
    
    #largest marginal gain
    for i in range(k - 1):
        member = suslt(candidates, 3, g, seedSet, 0.025, mc)[random.randrange(0,2)][1]
        temp = []

        while member != candidates[0][1]:
            temp.append(h.heappop(candidates))

        h.heappop(candidates)
        print(seedSet)
        for item in temp:
            h.heappush(candidates, item)
        seedSet.append(member)

    return seedSet

In [None]:
timer11start = time.time()
result11 = stochasticSamplinglt(graph1, 20, 10)

print(result11)
print(lt(graph1, result11, 10))

timer11end = time.time() 
print("Time taken to execute stochastic stochasticSamplinglt: ", timer11end - timer11start)

# Test and Benchmarking

Here we load data sets from Twitter, WikiVote, HEP-PH, and Epinions to benchmark the algorithms and compare them with the results produced in the paper by Mohammedi et al. We expect that there will be a curvature to the shape of the graphs which suggests submodularity. 

In [66]:
import re
def initialiseGraph(numNodes, path):
    f = open("Datasets/wiki-Vote.txt", "r")
    g = snap.TUNGraph.New()
    
    #adding nodes
    for i in range(numNodes):
        g.AddNode(i+1)
    
    #adding edges
    i = 0 
    for row in f:
        if i>4:
            x = re.search("(^\d+)\s+(\d+)$", row)
            g.AddEdge(int(x.group(1)), int(x.group(2)))
        i+=1
    return g

In [67]:
epinions = initialiseGraph(75879, "Datasets/soc-Epinions1.txt")
wikivote = initialiseGraph(8297, "Datasets/wiki-Vote.txt")
hepph = initialiseGraph(34546, "Datasets/cit-HepPh.txt")

In [78]:
arrayic = [greedy, celf, celfPlus, degree, tournamentSelection, stochasticSampling]
arraylt = [greedylt, celflt, celfPluslt, degreelt, tournamentSelectionlt, stochasticSamplinglt]

datasets = ["wikivote", "hepph", "epinions"]

### Testing IC

In [81]:
def pvseedSetic(method, graph, p=0.5):
    runningTime = []
    pV = []
    i = 10
    
    while i < 20:
        start = time.time()
        result = method(graph, i, p, 1)
        end = time.time()
        pVSim = lt(graph, result[0], p, 10)
        pV.append((i, pVSim))
        runningTime.append((i, end-start))
        i+=10
    
    return (runningTime, pV)

In [82]:
resultsic = []
for method in arrayic:
    resultsic.append(pvseedSetic(method, wikivote, 0.025), "wikivote")

[2775]


KeyboardInterrupt: 

In [None]:
resultsic = []
for method in arrayic:
    for dataset in datasets
        resultsic.append(pvseedSetic(method, dataset, 0.025), dataset)

# test this before running the cell
# with open("outputIC.txt", "w") as txt_file:
#     for line in resultsic:
#         txt_file.write(" ".join(line) + "\n")

### Testing lt

In [None]:
def pvseedSetlt(method, graph):
    runningTime = []
    pV = []
    i = 10
    
    while i < 50:
        start = time.time()
        result = method(graph, i, 1)
        end = time.time()
        pVSim = lt(graph, result[0], 1000)
        pV.append((i, pVSim))
        runningTime.append((i, end-start))
        i+=10
    
    return (runningTime, pV)

In [None]:
resultslt = []
for method in arraylt:
    for dataset in datasets:
        resultslt.append(pvseedSetlt(greedy, dataset), dataset)