You are running the marketing campaign for a brand new pocket device. Initially you can sign contracts with a few people to advertize your gadget among their neigbours. The more "famous" person you are picking the greater price appears in the contract. Contract cost can be calculated as 300usd * NN(i), where NN(i) is the number of neigbours of the person i. If at least 18% a person's neighbors have already been affected, then he/she will also be affected tomorrow. You earn 50usd per each affected person. Every day you have to choose whether to sign new contracts or wait. You need time to discuss terms of a contract, so you cannot sign more than 10 contracts on the same day. Your task is to maximize profit of your campaign with an initial budget of 10,000usd. Your campaign is considered completed after 60 days.

Again, all parameters of the task:
Budget: 10,000usd
Contract cost: 300usd * NN(i)
Income per person: 50usd
Exposure threshold: 18%
Contracts limit: 10 per day
Time limit: 60 days
A model of society is based on undirected SNAP Facebook network edge_list.txt

In [1]:
import networkx as nx
import numpy as np
from tqdm.notebook import tqdm
import pandas as pd

In [2]:
with open('edge_list.txt', 'r') as f:
    edge_list = [list(map(int, line.split())) for line in f]

In [3]:
G = nx.from_edgelist(edge_list)
F = nx.convert_node_labels_to_integers(G)

In [6]:
thresholds = 0.18 * np.ones(len(G.nodes))

In [67]:
def linear_threshold(G, active_nodes, thresholds, n_steps):
    simulation = [active_nodes]
    steps = 0
    while steps < n_steps:
        new_active = active_nodes.copy()
        for node in np.argwhere(active_nodes == 0):
            node = node[0]
            predecessors = np.array(G[node])
            if not predecessors.shape[0]:
                continue
            if active_nodes[predecessors].sum() / predecessors.shape[0] >= thresholds[node]:
                new_active[node] = 1
        active_nodes = new_active.copy()
        simulation.append(active_nodes)
        steps += 1
        if np.all(simulation[-1] == simulation[-2]):
            steps = n_steps
    return np.array(simulation)

In [79]:
# baseline

def greedy_influence_max(G, thresholds, active_nodes, top_nodes, degrees, our_money, current_profit, 
                         old_active, n_steps):
    old_profit = current_profit
    best = []
    idx = 0
    current_active = np.sum(linear_threshold(G, active_nodes, thresholds, 10)[-1])
    current_profit += 50 * (current_active - old_active)
    current_active = np.sum(linear_threshold(G, active_nodes, thresholds, n_steps)[-1])
    while (idx < 10) and (our_money >= 0):
        best_profit = 0
        best_node = None
        for node in tqdm(top_nodes):
            if active_nodes[node] == 0:
                price = 300 * degrees.loc[node]
                if (price < our_money):
                    active_nodes[node] = 1
                    active_size = np.sum(linear_threshold(G, active_nodes, thresholds, n_steps)[-1])
                    new_active = active_size - current_active
                    profit = 50 * (new_active - 1) - price
                    active_nodes[node] = 0
                    if profit > best_profit:
                        best_profit = profit
                        best_node = node
        if best_node is None:
            break
        active_nodes[best_node] = 1
        current_active = np.sum(linear_threshold(G, active_nodes, thresholds, n_steps)[-1])
        ### next 10
        active_size = np.sum(linear_threshold(G, active_nodes, thresholds, 10)[-1])
        new_active = active_size - current_active
        profit = 50 * (new_active - 1) - price
        current_profit += profit
        print(best_node, current_profit, current_active)
        our_money -= 300 * degrees.loc[best_node]
        best.append(best_node)
        idx += 1
    our_money += current_profit - old_profit
    active_nodes = linear_threshold(G, active_nodes, thresholds, 10)[-1]
    current_active = np.sum(active_nodes)
    return best, active_nodes, current_active, current_profit, our_money

In [69]:
# best_start_nodes:

def greedy_influence_score(G, thresholds, active, n_steps, our_money):
    top_nodes = pd.Series(dict(G.degree())).sort_values(ascending=True)
    active_nodes = active.copy()
    current_active = np.sum(linear_threshold(G, active_nodes, thresholds, n_steps)[-1])
    scores = np.zeros_like(active_nodes)
    for node in tqdm(np.argwhere(active_nodes == 0)):
        node = node[0]
        price = 300 * top_nodes.loc[node]
        if (price < our_money):
            active_nodes[node] = 1
            active_size = np.sum(linear_threshold(G, active_nodes, thresholds, n_steps)[-1])
            new_active = active_size - current_active
            profit = 50 * (new_active + 1) - price
            active_nodes[node] = 0
            scores[node] = profit
    return scores

In [70]:
def pipeline(F, thresholds, n_steps):
    degrees = pd.Series(dict(F.degree())).sort_values(ascending=True)
    active_nodes = np.zeros(len(F.nodes), dtype=int)
    current_profit = 0
    current_active = 0
    our_money = 10000
    all_best = []
    for step in range(n_steps):
        scores = greedy_influence_score(F, thresholds, active_nodes, 60 / n_steps, our_money)
        top_nodes = pd.Series(scores)[pd.Series(scores) > 0].sort_values(ascending=False)[:100].index.tolist()
        print(top_nodes)
        best, active_nodes, current_active, current_profit, our_money = greedy_influence_max(F, thresholds, 
            active_nodes, top_nodes, degrees, our_money, current_profit, current_active, 60 / n_steps)
        print(best, current_active, current_profit, our_money)
        all_best.append(best)
    return all_best

In [78]:
all_best = pipeline(F, thresholds, 3)

  0%|          | 0/3953 [00:00<?, ?it/s]

[1782, 3380, 3377, 3378, 3375, 3376, 3373, 3384, 3371, 387, 3934, 1988, 388, 390, 1989, 3911, 3914, 3938, 3867, 1918, 1920, 3935, 2882, 1937, 1666, 3706, 3881, 1942, 1941, 400, 401, 402, 158, 434, 460, 3412, 3483, 3866, 423, 3880, 424, 3882, 3894, 1836, 386, 383]


  0%|          | 0/46 [00:00<?, ?it/s]

1782 10750 312


  0%|          | 0/46 [00:00<?, ?it/s]

3380 19950 533


  0%|          | 0/46 [00:00<?, ?it/s]

387 21550 602


  0%|          | 0/46 [00:00<?, ?it/s]

3934 23150 647


  0%|          | 0/46 [00:00<?, ?it/s]

2882 24100 673


  0%|          | 0/46 [00:00<?, ?it/s]

1937 24700 698


  0%|          | 0/46 [00:00<?, ?it/s]

[1782, 3380, 387, 3934, 2882, 1937] 698 24700 24800


  0%|          | 0/3255 [00:00<?, ?it/s]

[3323, 3706, 3214, 3202, 3204, 3708, 3195, 3193, 3196, 3707, 3190, 1067, 3203, 3206, 3179, 3191, 3170, 3148, 3198, 1759, 3183, 1821, 1809, 3185, 3182, 3187, 3168, 1532, 3181, 3184, 3186, 2509, 3188, 1394, 1808, 3160, 3189, 3867, 1920, 1918, 3881, 1941, 1942, 192, 460, 3866, 3483, 3880, 3882, 3894]


  0%|          | 0/50 [00:00<?, ?it/s]

3323 36700 1382


  0%|          | 0/50 [00:00<?, ?it/s]

3706 47900 1655


  0%|          | 0/50 [00:00<?, ?it/s]

1067 57000 2060


  0%|          | 0/50 [00:00<?, ?it/s]

3867 57500 2089


  0%|          | 0/50 [00:00<?, ?it/s]

1920 58000 2106


  0%|          | 0/50 [00:00<?, ?it/s]

3894 58200 2135


  0%|          | 0/50 [00:00<?, ?it/s]

192 58350 2151


  0%|          | 0/50 [00:00<?, ?it/s]

460 58400 2159


  0%|          | 0/50 [00:00<?, ?it/s]

[3323, 3706, 1067, 3867, 1920, 3894, 192, 460] 2159 58400 38700


  0%|          | 0/1794 [00:00<?, ?it/s]

[806, 157, 87, 182, 154, 84, 235, 30, 120, 11, 78, 170, 202, 34, 731, 143, 129, 62, 782, 585, 36, 131, 9, 138, 91, 168, 54, 148, 2509, 98, 16, 119, 74]


  0%|          | 0/33 [00:00<?, ?it/s]

806 71850 2979


  0%|          | 0/33 [00:00<?, ?it/s]

157 80650 3180


  0%|          | 0/33 [00:00<?, ?it/s]

2509 81850 3397


  0%|          | 0/33 [00:00<?, ?it/s]

[806, 157, 2509] 3397 81850 47150


In [73]:
all_best

[[1782, 3706, 3380, 3934, 1918], [1640, 3323, 2509, 387, 460]]

In [24]:
top_nodes = pd.Series(scores)[pd.Series(scores) > 0].index.tolist()
best = greedy_influence_max(F, thresholds, top_nodes)

  0%|          | 0/46 [00:00<?, ?it/s]

1782 30200


  0%|          | 0/46 [00:00<?, ?it/s]

3706 52400


  0%|          | 0/46 [00:00<?, ?it/s]

3377 61900


  0%|          | 0/46 [00:00<?, ?it/s]

3934 63500


  0%|          | 0/46 [00:00<?, ?it/s]

1918 64000


  0%|          | 0/46 [00:00<?, ?it/s]

None 64000


Бейзлайн - берём вершины, у которых высокая степень - не будем перебирать

Идея 1:
Разбиваем граф на кластера. а уже потом применяем алгоритмы.

Идея 2:
Скорим людей раз в 10 дней, отбираем какой-то разумный топ

In [77]:
mapping = {key: val for key, val in enumerate(G.nodes)}
sample_submission = ''.join(["{\n", *[f"{int(idx * 60 / len(all_best))}: {[mapping[x] for x in val]},\n" for idx, val in enumerate(all_best)],"}"])
with open("sample.txt", 'w') as f:
    f.write(sample_submission)

In [34]:
scores

array([    0,     0, -3500, ...,  -650,  -150,  -150])

In [29]:
mapping[3952]

3987