# Influence Maximisation Problem with time and random delays

The influence maximisation problem asks which nodes of a network to select as seeds so that the influence spreading from them yields the highest expected value. In this notebook we implement some well-studied models and extend them to account for time delays. We then design heuristic and greedy strategies to tackle the problem and analyse the performance of the algorithms.

## Jargon: understanding vocabulary

Here we put some vocabulary we might need in order to understand and tackle the problem.

* **State** is how we represent the status of a node. In the basic models 0 is inactive and 1 is active; in the time-sensitive models 0 is inactive, 1 is latent (triggered but still waiting out its delay) and 2 is active.
* **Propagation value** PV(S) is the expected total value of the nodes activated by a seed set S.
* **Seed set** is the set of nodes selected as the solution.
* **k seeds** is the number of nodes in the initial set, chosen before influence propagation starts.
* **Freshness function** is a function that determines the value of a node from the time it takes to activate.
* **Freshness value** is the value the freshness function assigns to a particular node.


## Generate a graph

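We start by generating a network for the linear threshold model, in which the summed influence of a node's active neighbours determines whether the node becomes active: it activates once that sum reaches or exceeds its threshold. Each edge (influence weight) and each node (threshold) is assigned a random value between 0 and 1 when a new instance is generated.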
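The activation rule can be sketched without SNAP. The block below is a minimal illustration in plain Python, assuming a dictionary of edge weights and a dictionary of thresholds; the names `lt_spread`, `weights` and `thresholds` and the toy values are hypothetical and are not part of the notebook's code.

```python
def lt_spread(weights, thresholds, seeds):
    """Run linear-threshold propagation until a full pass activates nothing new.

    weights[(u, v)] is the influence u exerts on v; thresholds[v] is v's threshold.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, theta in thresholds.items():
            if v in active:
                continue
            # total influence arriving at v from currently active nodes
            influence = sum(w for (u, target), w in weights.items()
                            if target == v and u in active)
            if influence >= theta:
                active.add(v)
                changed = True
    return active

# toy network with made-up weights and thresholds
weights = {("a", "b"): 0.6, ("a", "c"): 0.2, ("b", "c"): 0.4, ("c", "d"): 0.1}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.9}
print(sorted(lt_spread(weights, thresholds, {"a"})))  # ['a', 'b', 'c']
```

Node `c` only activates after `b` has, because neither `a` nor `b` alone supplies enough influence; `d` never reaches its threshold.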

In [70]:
import snap
import networkx as nx
%matplotlib inline
import matplotlib.pyplot as plt
import random
import time
import itertools
import urllib
import csv
import functools
import numpy as np
from statistics import mean, stdev

`gen_graph` takes a seed. The seed initialises Python's pseudo-random number generator, so the random values assigned to nodes and edges are reproducible: to recreate a graph generated earlier, pass the same size and seed again. Note that the generator is actually seeded with `seed * size`.
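A short illustration of that reproducibility, using only the standard `random` module. The helper name `draw_weights` is hypothetical; the seed value `14 * 10` mirrors the `seed * size` product that `gen_graph(10, 14)` would use.

```python
import random

def draw_weights(n, seed):
    """Return n pseudo-random floats in [0, 1) drawn right after seeding the generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw_weights(3, seed=14 * 10)
second = draw_weights(3, seed=14 * 10)
other = draw_weights(3, seed=10 * 10)

print(first == second)  # True: same seed, same sequence
print(first == other)   # False: a different seed gives a different sequence
```

Because the product is what reaches the generator, two different `(size, seed)` pairs with the same product start from the same random sequence.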

In [71]:
def gen_graph(size, seed=42):
    """generates a network, each value assigned to the node and edge is random float between 0 and 1"""
    nodes = size
    G = snap.GenFull(snap.PNEANet,nodes)
    random.seed(seed*size)
    
    # define float and str attributes on nodes
    G.AddFltAttrN("NValFlt", 0.0)
    G.AddStrAttrN("NValStr", "0")

    # define a float attribute on edges
    G.AddFltAttrE("EValFlt", 0)

    # add attribute values: random floats for NValFlt and EValFlt, the node ID as a string for NValStr

    for NI in G.Nodes():
        nid = NI.GetId()
        val = nid
        G.AddFltAttrDatN(nid, random.random(), "NValFlt")
        G.AddStrAttrDatN(nid, str(val), "NValStr")
        G.AddIntAttrDatN(nid, 0, "State")

        for nid1 in NI.GetOutEdges():
            eid = G.GetEId(nid,nid1)
            val = eid
            G.AddFltAttrDatE(eid, random.random(), "EValFlt")

    # print out attribute values
    for n in G.Nodes():
        nid = n.GetId()
        fval = G.GetFltAttrDatN(nid, "NValFlt")
        sval = G.GetStrAttrDatN(nid, "NValStr")
        
        print("node %d, NValFlt %.2f, NValStr %s" % (nid, fval, sval))

        for nid2 in n.GetOutEdges():
            eid = G.GetEId(nid, nid2)
            val1 = G.GetFltAttrDatE(eid, "EValFlt")
            print("edge %d (%d,%d), EValFlt %f" % (eid, nid, nid2, val1))
            
    return G

In [72]:
def node_states(g, attr, size):
    """A method iterating through the network using an iterator to check the states of each node."""
    iterator = g.BegNI()
    print("NId %d has the following state %d" % (iterator.GetId() ,g.GetIntAttrDatN(iterator.GetId(), attr)))
    for i in range(0, size):
        temp = iterator.Next()
        if temp != g.EndNI():
            NId = temp.GetId()
            print("NId %d has the following state %d" % (NId ,g.GetIntAttrDatN(NId, attr)))
        else:
            break
        

In [73]:
graph1 = gen_graph(10, 14)
node_states(graph1, "State", graph1.GetNodes())

node 0, NValFlt 0.77, NValStr 0
edge 0 (0,1), EValFlt 0.095012
edge 1 (0,2), EValFlt 0.266874
edge 2 (0,3), EValFlt 0.531715
edge 3 (0,4), EValFlt 0.053134
edge 4 (0,5), EValFlt 0.722369
edge 5 (0,6), EValFlt 0.015054
edge 6 (0,7), EValFlt 0.189503
edge 7 (0,8), EValFlt 0.706732
edge 8 (0,9), EValFlt 0.838524
node 1, NValFlt 0.39, NValStr 1
edge 9 (1,0), EValFlt 0.983654
edge 10 (1,2), EValFlt 0.893276
edge 11 (1,3), EValFlt 0.230050
edge 12 (1,4), EValFlt 0.189567
edge 13 (1,5), EValFlt 0.839557
edge 14 (1,6), EValFlt 0.694611
edge 15 (1,7), EValFlt 0.921233
edge 16 (1,8), EValFlt 0.056072
edge 17 (1,9), EValFlt 0.553310
node 2, NValFlt 0.40, NValStr 2
edge 18 (2,0), EValFlt 0.491260
edge 19 (2,1), EValFlt 0.708380
edge 20 (2,3), EValFlt 0.656524
edge 21 (2,4), EValFlt 0.144910
edge 22 (2,5), EValFlt 0.867704
edge 23 (2,6), EValFlt 0.659677
edge 24 (2,7), EValFlt 0.748998
edge 25 (2,8), EValFlt 0.592459
edge 26 (2,9), EValFlt 0.662082
node 3, NValFlt 0.96, NValStr 3
edge 27 (3,0), EVa

## Linear Threshold

This is the case where ff(t) = 1, so every activated node has the same value regardless of when it activates. In general the freshness function can be any positive decreasing function, for example an exponential, polynomial or piecewise linear function. The value of a node is obtained by passing its activation time to the freshness function: fval(v) = ff(v.acttime).
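In symbols, one way to write these definitions is given below. The notation $A(S)$ for the set of nodes eventually activated from the seed set $S$ and $t_v$ for the activation time of node $v$ is an assumption introduced here for readability.

```latex
\mathrm{fval}(v) = ff(t_v), \qquad
PV(S) = \mathbb{E}\Big[\sum_{v \in A(S)} \mathrm{fval}(v)\Big], \qquad
ff(t) = 1 \;\Rightarrow\; PV(S) = \mathbb{E}\big[\,|A(S)|\,\big].
```

With a constant freshness function the propagation value reduces to the expected number of activated nodes, which is what the proportion printed by the propagation functions measures for a single run.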

In [74]:
def edit_nodestate(graph, NId, attr, val):
    """changes the state of a specific node in the network"""
    graph.DelAttrDatN(NId, attr)
    graph.AddIntAttrDatN(NId, val, attr)
    
    return graph

In [75]:
def lt_prop(g, k):
    """simple propagation in a linear threshold way from k random seed nodes, with ff(t) = 1"""
    s = list()
    
    #check to see if propagation already carried out
    for n in g.Nodes():
        if g.GetIntAttrDatN(n.GetId(), "State") == 1:
            return print("Influence propagation has been carried out, please re-generate graph. \n")
    
    # select k random nodes
    for i in range(0,k):
        initial = g.GetRndNId()
        while initial in s:
            initial = g.GetRndNId()
        g = edit_nodestate(g, initial, "State", 1)
        s.append(initial)
    
    # adds nodes to s that exceed thresholds in lt
    sizeS = len(s)
    while True:
        for n in g.Nodes():
            nid = n.GetId()
            if g.GetIntAttrDatN(nid, "State") == 0:
                theta = g.GetFltAttrDatN(nid, "NValFlt")
                influence = 0
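                # sums the weights on this node's own out-edges to active neighbours;
                # GenFull creates edges in both directions, so every neighbour is visited
                for nid2 in n.GetOutEdges():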
                    if g.GetIntAttrDatN(nid2, "State") != 0:
                        eid = g.GetEId(nid, nid2)
                        influence += g.GetFltAttrDatE(eid, "EValFlt")

                if influence >= theta:
                    edit_nodestate(g, nid, "State", 1)
                    s.append(nid)

        # stop once a full pass over the network activates no new nodes
        if sizeS != len(s) and len(s) <= g.GetNodes():
            sizeS = len(s)
        else:
            break
    
    # check the proportion of nodes that are activated
    print("Proportion of nodes influenced %0.0f%% \n" % ((sizeS/g.GetNodes())*100))

In [366]:
graph2 = gen_graph(10, 10)

node 0, NValFlt 0.15, NValStr 0
edge 0 (0,1), EValFlt 0.454927
edge 1 (0,2), EValFlt 0.770784
edge 2 (0,3), EValFlt 0.705513
edge 3 (0,4), EValFlt 0.731959
edge 4 (0,5), EValFlt 0.433514
edge 5 (0,6), EValFlt 0.800020
edge 6 (0,7), EValFlt 0.532901
edge 7 (0,8), EValFlt 0.080154
edge 8 (0,9), EValFlt 0.455946
node 1, NValFlt 0.05, NValStr 1
edge 9 (1,0), EValFlt 0.932962
edge 10 (1,2), EValFlt 0.947078
edge 11 (1,3), EValFlt 0.335351
edge 12 (1,4), EValFlt 0.309406
edge 13 (1,5), EValFlt 0.768018
edge 14 (1,6), EValFlt 0.203870
edge 15 (1,7), EValFlt 0.178461
edge 16 (1,8), EValFlt 0.188595
edge 17 (1,9), EValFlt 0.347004
node 2, NValFlt 0.63, NValStr 2
edge 18 (2,0), EValFlt 0.963316
edge 19 (2,1), EValFlt 0.210834
edge 20 (2,3), EValFlt 0.956101
edge 21 (2,4), EValFlt 0.555400
edge 22 (2,5), EValFlt 0.901152
edge 23 (2,6), EValFlt 0.818018
edge 24 (2,7), EValFlt 0.160422
edge 25 (2,8), EValFlt 0.648543
edge 26 (2,9), EValFlt 0.124093
node 3, NValFlt 0.01, NValStr 3
edge 27 (3,0), EVa

In [367]:
lt_prop(graph2, 1)
node_states(graph2, "State", graph2.GetNodes())

Proportion of nodes influenced 100% 

NId 0 has the following state 1
NId 1 has the following state 1
NId 2 has the following state 1
NId 3 has the following state 1
NId 4 has the following state 1
NId 5 has the following state 1
NId 6 has the following state 1
NId 7 has the following state 1
NId 8 has the following state 1
NId 9 has the following state 1


## Independent Cascade
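For comparison, the independent cascade model is usually described probabilistically: each newly activated node gets a single chance to activate each inactive neighbour and succeeds with the probability attached to that edge. The sketch below is that textbook variant in plain Python, not the `ic_prop` function used in this notebook, which instead compares each edge weight with the target node's threshold. The names `ic_spread`, `prob` and `rng` are illustrative assumptions.

```python
import random

def ic_spread(prob, seeds, rng):
    """One random run of independent cascade.

    prob[(u, v)] is the probability that u activates v when u first becomes active.
    """
    active = set(seeds)
    frontier = list(seeds)          # nodes activated in the previous round
    while frontier:
        next_frontier = []
        for u in frontier:
            for (src, v), p in prob.items():
                if src != u or v in active:
                    continue
                # u gets exactly one attempt on v
                if rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# made-up probabilities: the first two edges always fire, the last never does
prob = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 0.0}
print(sorted(ic_spread(prob, {"a"}, random.Random(12))))  # ['a', 'b', 'c']
```

Because a run is random in general, the expected spread is normally estimated by averaging many runs with different generators.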

In [6]:
def ic_prop(g, k):
    """simple propagation in an independent cascade way from k random seed nodes, with ff(t) = 1"""
    s = list()
    
    #check to see if propagation already carried out
    for n in g.Nodes():
        if g.GetIntAttrDatN(n.GetId(), "State") == 1:
            return print("Influence propagation has been carried out, please re-generate graph. \n")
        
    # select k random nodes
    for i in range(0,k):
        initial = g.GetRndNId()
        while initial in s:
            initial = g.GetRndNId()
        g = edit_nodestate(g, initial, "State", 1)
        s.append(initial)
    
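    # start propagation: an active node activates neighbour nid2 when the weight of edge (nid, nid2) reaches nid2's threshold
    # (attempts are not tracked per node; the test is deterministic, so repeating it changes nothing)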
    sizeS = len(s)
    while True:
        for n in g.Nodes():
            nid = n.GetId()
            if g.GetIntAttrDatN(nid, "State") == 1:
                for nid2 in n.GetOutEdges():
                    eid = g.GetEId(nid, nid2)
                    theta = g.GetFltAttrDatN(nid2, "NValFlt")
                    if g.GetFltAttrDatE(eid, "EValFlt") >= theta and nid2 not in s:
                        edit_nodestate(g, nid2, "State", 1)
                        s.append(nid2)
        
        if sizeS != len(s) and len(s) <= g.GetNodes():
            sizeS = len(s)
        else:
            break
    
    # check the proportion of nodes that are activated
    print("Proportion of nodes influenced %0.0f%% \n" % ((sizeS/g.GetNodes())*100))                 

In [19]:
graph3 = gen_graph(200, 12)

node 0, NValFlt 0.06, NValStr 0
edge 0 (0,1), EValFlt 0.495854
edge 1 (0,2), EValFlt 0.045132
edge 2 (0,3), EValFlt 0.839571
edge 3 (0,4), EValFlt 0.533781
edge 4 (0,5), EValFlt 0.748611
edge 5 (0,6), EValFlt 0.721103
edge 6 (0,7), EValFlt 0.113018
edge 7 (0,8), EValFlt 0.632362
edge 8 (0,9), EValFlt 0.960201
edge 9 (0,10), EValFlt 0.082323
edge 10 (0,11), EValFlt 0.787005
edge 11 (0,12), EValFlt 0.745409
edge 12 (0,13), EValFlt 0.063924
edge 13 (0,14), EValFlt 0.330950
edge 14 (0,15), EValFlt 0.230552
edge 15 (0,16), EValFlt 0.539032
edge 16 (0,17), EValFlt 0.075694
edge 17 (0,18), EValFlt 0.911056
edge 18 (0,19), EValFlt 0.636479
edge 19 (0,20), EValFlt 0.580074
edge 20 (0,21), EValFlt 0.079965
edge 21 (0,22), EValFlt 0.234661
edge 22 (0,23), EValFlt 0.215476
edge 23 (0,24), EValFlt 0.927800
edge 24 (0,25), EValFlt 0.779949
edge 25 (0,26), EValFlt 0.358544
edge 26 (0,27), EValFlt 0.757640
edge 27 (0,28), EValFlt 0.950712
edge 28 (0,29), EValFlt 0.891229
edge 29 (0,30), EValFlt 0.4437

In [20]:
ic_prop(graph3, 1)
node_states(graph3, "State", graph3.GetNodes())

Proportion of nodes influenced 98% 

NId 0 has the following state 1
NId 1 has the following state 1
NId 2 has the following state 1
NId 3 has the following state 1
NId 4 has the following state 1
NId 5 has the following state 1
NId 6 has the following state 1
NId 7 has the following state 1
NId 8 has the following state 1
NId 9 has the following state 1
NId 10 has the following state 1
NId 11 has the following state 1
NId 12 has the following state 1
NId 13 has the following state 1
NId 14 has the following state 1
NId 15 has the following state 1
NId 16 has the following state 1
NId 17 has the following state 1
NId 18 has the following state 1
NId 19 has the following state 1
NId 20 has the following state 1
NId 21 has the following state 1
NId 22 has the following state 1
NId 23 has the following state 1
NId 24 has the following state 1
NId 25 has the following state 1
NId 26 has the following state 1
NId 27 has the following state 1
NId 28 has the following state 1
NId 29 has the f

## Time sensitive influence maximisation

We need to incorporate more information into the network in order to account for time. Firstly, each node now has three states: 0 for inactive, 1 for latent and 2 for active. Once a node has been triggered at time t it enters the latent state, and it becomes active at time t + d, where d is the node's delay.
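The three-state life cycle can be illustrated in isolation. This is a minimal sketch in plain Python under the assumptions stated in the text (a node triggered at time t with delay d becomes active at t + d); the names `step`, `triggered_at` and `history` are hypothetical.

```python
INACTIVE, LATENT, ACTIVE = 0, 1, 2

def step(states, triggered_at, delay, now):
    """Promote latent nodes whose delay has elapsed by time `now`."""
    for v, state in states.items():
        if state == LATENT and triggered_at[v] + delay[v] <= now:
            states[v] = ACTIVE
    return states

states = {"u": ACTIVE, "v": INACTIVE}
delay = {"u": 0, "v": 5}
triggered_at = {"u": 0}

# v is triggered at t = 3, so it stays latent until t + d = 8
states["v"] = LATENT
triggered_at["v"] = 3

history = {}
for now in range(3, 10):
    step(states, triggered_at, delay, now)
    history[now] = states["v"]
print(history)  # latent (1) for t = 3..7, active (2) from t = 8
```

A latent node does not yet influence its neighbours, which is why the delay affects how quickly the rest of the network can be reached.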

In [59]:
def gen_ts_graph(size, seed=42):
    """creates a network with a random integer delay and an activation time on each node, and the capability to hold three states."""
    nodes = size
    G = snap.GenFull(snap.PNEANet,nodes)
    random.seed(seed*size)
    
    # define float and str attributes on nodes
    G.AddFltAttrN("NValFlt", 0.0)
    G.AddStrAttrN("NValStr", "0")
    
    # define a float attribute on edges
    G.AddFltAttrE("EValFlt", 0)

    # add attribute values: random floats for NValFlt and EValFlt, node ID for NValStr, random integer delay in [5, 30)

    for NI in G.Nodes():
        nid = NI.GetId()
        val = nid
        G.AddFltAttrDatN(nid, random.random(), "NValFlt")
        G.AddStrAttrDatN(nid, str(val), "NValStr")
        G.AddIntAttrDatN(nid, 0, "State")
        G.AddIntAttrDatN(nid, 0, "activationTime")
        G.AddIntAttrDatN(nid, random.randrange(5, 30), "delay")

        for nid1 in NI.GetOutEdges():
            eid = G.GetEId(nid,nid1)
            val = eid
            G.AddFltAttrDatE(eid, random.random(), "EValFlt")

    # print out attribute values
    for n in G.Nodes():
        nid = n.GetId()
        fval = G.GetFltAttrDatN(nid, "NValFlt")
        sval = G.GetStrAttrDatN(nid, "NValStr")
        val2 = G.GetIntAttrDatN(nid, "delay")
        
        print("node %d, NValFlt %.2f, NValStr %s, delay %d" % (nid, fval, sval, val2))

        for nid2 in n.GetOutEdges():
            eid = G.GetEId(nid, nid2)
            val1 = G.GetFltAttrDatE(eid, "EValFlt")
            print("edge %d (%d,%d), EValFlt %f" % (eid, nid, nid2, val1))
            
    return G

In [60]:
graph4 = gen_ts_graph(20, 20)

node 0, NValFlt 0.31, NValStr 0, delay 13
edge 0 (0,1), EValFlt 0.787116
edge 1 (0,2), EValFlt 0.092797
edge 2 (0,3), EValFlt 0.489070
edge 3 (0,4), EValFlt 0.777760
edge 4 (0,5), EValFlt 0.411293
edge 5 (0,6), EValFlt 0.776073
edge 6 (0,7), EValFlt 0.422835
edge 7 (0,8), EValFlt 0.256768
edge 8 (0,9), EValFlt 0.843924
edge 9 (0,10), EValFlt 0.447665
edge 10 (0,11), EValFlt 0.661621
edge 11 (0,12), EValFlt 0.430563
edge 12 (0,13), EValFlt 0.141945
edge 13 (0,14), EValFlt 0.805065
edge 14 (0,15), EValFlt 0.983723
edge 15 (0,16), EValFlt 0.864699
edge 16 (0,17), EValFlt 0.619392
edge 17 (0,18), EValFlt 0.816469
edge 18 (0,19), EValFlt 0.385591
node 1, NValFlt 0.62, NValStr 1, delay 18
edge 19 (1,0), EValFlt 0.933458
edge 20 (1,2), EValFlt 0.218874
edge 21 (1,3), EValFlt 0.098633
edge 22 (1,4), EValFlt 0.325949
edge 23 (1,5), EValFlt 0.923088
edge 24 (1,6), EValFlt 0.596450
edge 25 (1,7), EValFlt 0.868980
edge 26 (1,8), EValFlt 0.773424
edge 27 (1,9), EValFlt 0.866367
edge 28 (1,10), EVal

## Delayed Linear Threshold

For a fixed set of thresholds and edge weights the linear threshold model is deterministic: starting from the same initial nodes gives the same solution every time. We can solve the problem for linear threshold with either a greedy or an approximate (heuristic) approach. In the greedy approach we evaluate the value of a node when we add it to the solution, and that value is determined by the freshness function.

In the greedy algorithm, k is the number of nodes we want in the solution. If there are 20 nodes and k is equal to 10, we select the 10 best nodes, where the value of a node depends on the freshness function. Suppose that we have 5 activated nodes: out of these we select the best one to add to the solution. Once we have reached k nodes we can calculate the final value of the solution.
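The general greedy pattern, adding one node at a time by largest marginal gain, can be sketched as follows. This is a generic illustration rather than the `im_g_lt` function defined in this notebook, which selects among newly activated nodes by freshness value. The names `greedy_seeds`, `value`, `reach` and `coverage` are hypothetical, and the toy value function simply counts the nodes reached; in this notebook's setting it would be an estimate of PV(S).

```python
def greedy_seeds(nodes, value, k):
    """Pick up to k nodes one at a time, each time adding the node with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = value(chosen)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = value(chosen + [v]) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:        # fewer than k nodes available
            break
        chosen.append(best)
    return chosen

# toy value function: number of distinct nodes reached by the chosen set
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c", "d"}, "d": {"d"}}

def coverage(selected):
    covered = set()
    for v in selected:
        covered |= reach[v]
    return len(covered)

print(greedy_seeds(list(reach), coverage, 2))  # ['a', 'c']
```

The second pick is `c` rather than `b`: `b` reaches two nodes on its own, but both are already covered by `a`, so its marginal gain is zero.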

In [8]:
def exponential(x):
    """freshness e^(-x), defined for x >= 0"""
    if x >= 0:
        return np.exp(-x)
    else:
        raise ValueError("Enter a value greater than or equal to 0")

def polynomial(x):
    """freshness 1/x, defined for x >= 1"""
    if x >= 1:
        return 1/x
    else:
        raise ValueError("Enter a value greater than or equal to 1")
    
def piecewiselinear(x):
    """piecewise linear freshness on [5, 40]; note the two pieces do not meet at x = 30"""
    if x >= 5 and x <= 30:
        return -x + 30
    elif x > 30 and x <= 40:
        return -0.5 * x + 20
    else:
        raise ValueError("Enter a value between 5 and 40")

In [49]:
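def im_g_lt(g, x, k):
    """greedy influence maximisation for the delayed linear threshold model: start from x random
    active nodes, then add k nodes to the solution, scoring candidates with the exponential freshness function"""
    s = {}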
    
    #check to see if k is larger than g
    if k >= g.GetNodes():
        return print("Pick a k value less than %d" % (g.GetNodes()))
    
    #check to see if propagation already carried out
    for n in g.Nodes():
        if g.GetIntAttrDatN(n.GetId(), "State") == 1:
            return print("Influence propagation has been carried out, please re-generate graph. \n")
    
    # select x initial nodes
    for i in range(0,x):
        initial = g.GetRndNId()
        while initial in s:
            initial = g.GetRndNId()
        g = edit_nodestate(g, initial, "State", 2)
        # seed entries are stored as (node id, time step, activation time, delay)
        s[initial] = (initial, 0, g.GetIntAttrDatN(initial, "activationTime"), g.GetIntAttrDatN(initial, "delay"))
    
    # propagation and selection of nodes
    timeStep = 0
    for i in range(0, k):
        sizeS = len(s)
        temp = {}
        
        while True:
            timeStep += 1
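            # pass over the network: inactive nodes whose influence reaches their threshold become latent,
            # and latent nodes whose delay has elapsed become active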
            for n in g.Nodes():
                nid = n.GetId()
                if g.GetIntAttrDatN(nid, "State") == 0:
                    theta = g.GetFltAttrDatN(nid, "NValFlt")
                    influence = 0
                    for nid2 in n.GetOutEdges():
                        if g.GetIntAttrDatN(nid2, "State") == 2:
                            eid = g.GetEId(nid, nid2)
                            influence += g.GetFltAttrDatE(eid, "EValFlt")  
                        if influence >= theta:
                            edit_nodestate(g, nid, "State", 1)
                            edit_nodestate(g, nid, "activationTime", timeStep)
                elif g.GetIntAttrDatN(nid, "State") == 1:
                    activationTime = g.GetIntAttrDatN(nid, "activationTime")
                    delay = g.GetIntAttrDatN(nid, "delay")
                    if activationTime + delay  <= timeStep:
                        edit_nodestate(g, nid, "State", 2)
            
            # find the nodes that are activated and store into a dictionary
            for n in g.Nodes():
                NId = n.GetId()
                if NId not in s and g.GetIntAttrDatN(NId, "State") == 2:
                    delay = g.GetIntAttrDatN(NId, "delay")
                    activationTime = g.GetIntAttrDatN(NId, "activationTime")
                    if activationTime + delay  == timeStep:
                        activationTime = g.GetIntAttrDatN(NId, "activationTime")
                        timeDiff = abs(activationTime - delay)
                        freshVal = exponential(timeDiff)*100000000
                        # selected entries are stored as (node id, time step, delay, activation time)
                        temp[freshVal] = (NId, timeStep, delay, activationTime)
            
            if bool(temp):
                largK = 0
                for key, value in temp.items():
                    if largK < key:
                        largK = key
                kVal = temp[largK]
                s[kVal[0]] = kVal
                print(kVal)
                del temp[largK]
                break
    
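    pv = 0.0
    # calculate the total value of solution
    # note: the outer loop repeats the sum k times, so pv is k times the summed freshness over s;
    # s[key][3] is the delay for seed entries and the activation time for selected entries
    for i in range(0, k):
        for key in s:
            pv += exponential(abs(timeStep - s[key][3]))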
        
    
    if pv <= 0:
        return print("Calculation error \n")
    else:
        return print("Solution of size k = %d, has a spreadvalue of %f \n" % (k, pv))
    

In [68]:
graph5 = gen_ts_graph(10, 20)

node 0, NValFlt 0.05, NValStr 0, delay 11
edge 0 (0,1), EValFlt 0.734504
edge 1 (0,2), EValFlt 0.030363
edge 2 (0,3), EValFlt 0.617711
edge 3 (0,4), EValFlt 0.798708
edge 4 (0,5), EValFlt 0.916363
edge 5 (0,6), EValFlt 0.691844
edge 6 (0,7), EValFlt 0.709865
edge 7 (0,8), EValFlt 0.718631
edge 8 (0,9), EValFlt 0.438606
node 1, NValFlt 0.44, NValStr 1, delay 12
edge 9 (1,0), EValFlt 0.192856
edge 10 (1,2), EValFlt 0.852824
edge 11 (1,3), EValFlt 0.300025
edge 12 (1,4), EValFlt 0.700827
edge 13 (1,5), EValFlt 0.350950
edge 14 (1,6), EValFlt 0.100942
edge 15 (1,7), EValFlt 0.434587
edge 16 (1,8), EValFlt 0.685655
edge 17 (1,9), EValFlt 0.487008
node 2, NValFlt 0.64, NValStr 2, delay 8
edge 18 (2,0), EValFlt 0.735683
edge 19 (2,1), EValFlt 0.862964
edge 20 (2,3), EValFlt 0.482055
edge 21 (2,4), EValFlt 0.160549
edge 22 (2,5), EValFlt 0.189721
edge 23 (2,6), EValFlt 0.487560
edge 24 (2,7), EValFlt 0.949743
edge 25 (2,8), EValFlt 0.567033
edge 26 (2,9), EValFlt 0.547149
node 3, NValFlt 0.65,

In [69]:
im_g_lt(graph5, 1, 5)
node_states(graph5, "State", graph5.GetNodes())

(6, 6, 5, 1)
(2, 9, 8, 1)
(0, 12, 11, 1)
(3, 16, 15, 1)
(5, 17, 16, 1)
Solution of size k = 5, has a spreadvalue of 0.033693 

NId 0 has the following state 2
NId 1 has the following state 1
NId 2 has the following state 2
NId 3 has the following state 2
NId 4 has the following state 1
NId 5 has the following state 2
NId 6 has the following state 2
NId 7 has the following state 2
NId 8 has the following state 1
NId 9 has the following state 1


Freshness functions: exponential, polynomial or piecewise linear.
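Written out, the three freshness functions implemented earlier in this section (`exponential`, `polynomial` and `piecewiselinear`) are as follows; the subscript names are chosen here for readability and do not appear in the code.

```latex
\begin{aligned}
ff_{\exp}(t) &= e^{-t}, && t \ge 0,\\
ff_{\mathrm{poly}}(t) &= \frac{1}{t}, && t \ge 1,\\
ff_{\mathrm{pw}}(t) &=
  \begin{cases}
    30 - t, & 5 \le t \le 30,\\
    20 - \tfrac{t}{2}, & 30 < t \le 40.
  \end{cases}
\end{aligned}
```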

In the approximate algorithms we use a heuristic to decide which nodes to pick. For example, if we select nodes by highest degree, we always add the remaining node with the highest degree. The greedy (hill-climbing) strategy instead evaluates each candidate before adding it; for the standard linear threshold and independent cascade models it is known to achieve at least (1 - 1/e), roughly 63%, of the optimal spread.
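The highest-degree heuristic mentioned above can be sketched in a few lines. This is an illustration on a plain adjacency dictionary, not the notebook's unfinished `im_a_lt`; the name `top_degree_seeds` and the toy graph are hypothetical.

```python
def top_degree_seeds(adjacency, k):
    """Return the k nodes with the most out-neighbours (ties broken by node id)."""
    ranked = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    return ranked[:k]

# toy directed graph: node -> list of out-neighbours
adjacency = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: []}
print(top_degree_seeds(adjacency, 2))  # [0, 2]
```

On the complete graphs produced by `snap.GenFull` every node has the same degree, so this heuristic only becomes informative on sparser networks.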

In [90]:
def im_a_lt(g, k):
    raise NotImplementedError("approximate strategy not written yet")

## Delayed Independent Cascade