# Introduction
This is a study of how multiple entities can successfully gather and manage "information". Here, information is just a set of random integers (I'll call these **facts**) drawn from a distribution that makes low numbers more likely (common knowledge) and high numbers rarer (little-known pieces of information).

We have a pool of entities, who are "working". To do their work they need to find particular facts. In the first instance there is only one way for the entities to find facts: mining for them, the equivalent of desk research.

First let's pick an appropriate distribution for **facts** so some numbers are more common than others. I'm a big fan of the Beta distribution, and picked some values using [this app](https://homepage.divms.uiowa.edu/~mbognar/applets/beta.html).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
a = 0.8
b = 2
n_facts = 100
facts = (n_facts * np.random.beta(a,b, (1000,))).astype(int)
plt.hist(facts, bins=n_facts, range=(0, n_facts))  # one bin per fact id
ax = plt.gca()
ax.set_title(f"Frequency of facts generated using a Beta distribution with a={a} and b={b}");
ax.set_ylabel("count")
ax.set_xlabel("fact id")
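
As a quick sanity check on the shape of that histogram, here is a minimal sketch using only the standard library. It assumes the same `a`, `b` and `n_facts` as above; the seed, the sample size and the helper name `sample_fact_counts` are illustrative choices, not part of the original notebook. It counts how often facts in the lowest and highest tenth of the range are drawn.

```python
import random
from collections import Counter


def sample_fact_counts(n_samples, n_facts=100, a=0.8, b=2, seed=0):
    """Draw fact ids the same way as the histogram above and count them."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    draws = (int(n_facts * rng.betavariate(a, b)) for _ in range(n_samples))
    return Counter(draws)


counts = sample_fact_counts(n_samples=10_000)
common = sum(counts[k] for k in range(0, 10))    # fact ids 0-9
rare = sum(counts[k] for k in range(90, 100))    # fact ids 90-99
print(f"draws in 0-9: {common}, draws in 90-99: {rare}")
```

The low ids should dominate by a wide margin, which is the "common knowledge versus little-known facts" split the simulation relies on.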


Now we define the entities. Let's make them simple classes so they can have a state.

In [None]:
import random


In [None]:
class Worker:
    """Abstract superclass for workers"""
    known_facts: set[int]
    target_fact: int|None
    n_facts: int


    def __init__(self, n_facts, known_facts) -> None:
        self.n_facts = n_facts
        self.known_facts = known_facts
        self.target_fact = None

    def act(self) -> None:
        pass

    def update(self) -> bool:
        """Returns true if work done (a needed fact is found), else false"""
        if self.target_fact in self.known_facts:
            self.generate_new_target_fact()
            return True
        return False

    def generate_new_target_fact(self) -> None:
        self.target_fact = random.randint(0, self.n_facts-1)
    
    def __repr__(self) -> str:
        return f"Known facts: {self.known_facts}, target: {self.target_fact}"
    
class Researcher(Worker):
    """Worker who only researches facts"""
    def __init__(self, n_facts, known_facts) -> None:
        super().__init__(n_facts, known_facts)
        self.generate_new_target_fact()

    def do_research(self):
        new_fact = int(self.n_facts * random.betavariate(a, b))
        self.known_facts.add(new_fact)

    def act(self) -> None:
        self.do_research()

In [None]:
N_WORKERS = 100
N_STEPS = 50000
workers = [Researcher(n_facts=n_facts, known_facts=set()) for _ in range(N_WORKERS)]

n_successes = []

for _ in range(N_STEPS):
    for worker in workers:
        worker.act()

    n_successes_this_step = 0
    for worker in workers:
        n_successes_this_step += worker.update()
    n_successes.append(n_successes_this_step/N_WORKERS)

In [None]:
plt.plot(n_successes)
ax = plt.gca()
ax.set_title("Rate of work per worker over time")
ax.set_ylabel("Average units of work per worker")
ax.set_xlabel("Ticks")

This plot shows what we expect: work is initially slow as people start off not knowing anything, but as they accumulate knowledge the work rate increases up to the theoretical maximum of 1 unit per worker per tick. Let's assume that 50,000 ticks would be a very long time, and you would be very lucky to have such experienced staff on your books.
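
To get a feel for why the ramp-up takes so many ticks, here is a rough worked sketch. It relies on the fact that, for `b = 2` specifically, the Beta CDF has the closed form $F(x) = (a+1)x^a - a x^{a+1}$; the helper names are hypothetical and the closed form does not hold for other values of `b`. Since a fact id is `int(n_facts * x)`, the chance of drawing fact `k` on one tick is $F((k+1)/n) - F(k/n)$, and the expected number of ticks to find it is the reciprocal of that.

```python
def beta_b2_cdf(x, a):
    """CDF of Beta(a, 2) on [0, 1]; only valid when b == 2."""
    return (a + 1) * x**a - a * x ** (a + 1)


def fact_probability(fact_id, n_facts=100, a=0.8):
    """Chance that a single research tick produces this fact id."""
    lower = fact_id / n_facts
    upper = (fact_id + 1) / n_facts
    return beta_b2_cdf(upper, a) - beta_b2_cdf(lower, a)


probabilities = [fact_probability(k) for k in range(100)]
expected_wait_rarest = 1 / probabilities[-1]  # mean of a geometric distribution
print(f"P(fact 0) = {probabilities[0]:.4f}")
print(f"P(fact 99) = {probabilities[-1]:.6f}")
print(f"expected ticks to find fact 99: {expected_wait_rarest:.0f}")
```

Because target facts are drawn uniformly, a worker is just as likely to need the rarest fact as the most common one, and the rarest fact takes on the order of ten thousand ticks to turn up by research alone.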

This also makes a second important assumption: that workers have perfect memories! What does this look like if we make our academics forgetful? Let's start with a forget rate of, on average, 1 fact per 100 ticks, implemented by checking whether a random number between 0 and 1 is below 0.01.

In [None]:
class ForgetfulResearcher(Researcher):
    """Researcher who occasionally forgets a known fact"""

    forget_rate: float

    def __init__(self, n_facts, known_facts, forget_rate) -> None:
        super().__init__(n_facts, known_facts)  # also picks the first target fact
        self.forget_rate = forget_rate

    def forget(self):
        if len(self.known_facts) > 0:
            if random.random() < self.forget_rate:
                # set.pop() removes an arbitrary element, not a uniformly random one
                self.known_facts.pop()


    def act(self) -> None:
        self.forget()
        self.do_research()

In [None]:
def run_experiment(workers, n_steps):
    n_workers = len(workers)

    n_successes = []

    for _ in range(n_steps):
        for worker in workers:
            worker.act()

        n_successes_this_step = 0
        for worker in workers:
            n_successes_this_step += worker.update()
        n_successes.append(n_successes_this_step/n_workers)
    
    return n_successes

def plot_experiment(n_successes):
    plt.plot(n_successes)
    ax = plt.gca()
    ax.set_title("Rate of work per worker over time")
    ax.set_ylabel("Average units of work per worker")
    ax.set_xlabel("Ticks")


FORGET_RATE = 0.01
N_WORKERS = 100
N_STEPS = 50000
workers = [ForgetfulResearcher(n_facts=n_facts, known_facts=set(), forget_rate=FORGET_RATE) for _ in range(N_WORKERS)]
n_successes_forgetful = run_experiment(workers, N_STEPS)
plot_experiment(n_successes_forgetful)

Well, now that is way more interesting than I thought! I'd simply assumed that we'd see a shape similar to the perfect researcher, but stabilising at a lower value as people forget facts as quickly as they can research them. In fact we see people working away until, slowly, more and more of them get stuck searching for a particularly hard-to-find fact, and while doing that they forget everything else they know! We see a couple of cycles of everyone experiencing this effect roughly together before things spread out and the average work rate stabilises.

**I am pretty suspicious of this result but can't see any obvious bugs**
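
One candidate worth ruling out (a guess, not a confirmed cause of the pattern): `forget` uses `set.pop()`, which Python documents as removing an arbitrary element rather than a uniformly random one. A minimal sketch of a uniform alternative follows; `forget_uniformly` is a hypothetical helper and not part of the classes above.

```python
import random


def forget_uniformly(known_facts, rng=random):
    """Remove one fact chosen uniformly at random; return it (or None if empty)."""
    if not known_facts:
        return None
    # sorted() gives a sequence to choose from; it costs O(n log n) per call.
    forgotten = rng.choice(sorted(known_facts))
    known_facts.remove(forgotten)
    return forgotten


facts_in_memory = {3, 14, 15, 92}
forgotten = forget_uniformly(facts_in_memory, random.Random(1))
print(forgotten, facts_in_memory)
```

Swapping this in for `self.known_facts.pop()` and re-running the experiment would show whether the cycles depend on which facts get forgotten.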

In [None]:
FORGET_RATE = 0.005
N_WORKERS = 100
N_STEPS = 50000
workers = [ForgetfulResearcher(n_facts=n_facts, known_facts=set(), forget_rate=FORGET_RATE) for _ in range(N_WORKERS)]
n_successes_forgetful = run_experiment(workers, N_STEPS)
plot_experiment(n_successes_forgetful)

Interestingly, if I lower `FORGET_RATE` to 0.005, we see the same pattern but smoother. This is very odd and quite interesting, but not my goal for now, which is to start exploring the benefits of sharing knowledge and centralised knowledge bases.

I also want to see if the pattern does indeed stabilise, so let's run for many steps.

In [None]:
FORGET_RATE = 0.01
N_WORKERS = 100
N_STEPS = 500000
workers = [ForgetfulResearcher(n_facts=n_facts, known_facts=set(), forget_rate=FORGET_RATE) for _ in range(N_WORKERS)]
n_successes_forgetful = run_experiment(workers, N_STEPS)
plot_experiment(n_successes_forgetful)

OK, it's very difficult to see what is going on there, so let's smooth things to look at how the average changes.

In [None]:
WINDOW_SIZE = 3001
smooth_n_successes_forgetful = np.convolve(n_successes_forgetful, np.ones(WINDOW_SIZE), 'valid') / WINDOW_SIZE

ax = plt.subplot()
ax.plot(smooth_n_successes_forgetful)
ax.set_title("Rate of work per worker over time")
ax.set_ylabel("Average units of work per worker")
ax.set_xlabel("Ticks")
ax.axvline(75000, linestyle=":", color="k")
ax.text(x=77000, y=0.08, s="Settles into constant rate", rotation=90)
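
For reference, convolving with a box of ones in `'valid'` mode and dividing by the window size is a moving average over full windows only, so the smoothed series is `WINDOW_SIZE - 1` points shorter than the input and point `i` summarises ticks `i` to `i + WINDOW_SIZE - 1`. A pure-Python sketch of the same idea (the function name `moving_average` is illustrative):

```python
def moving_average(values, window_size):
    """Mean of every full window; output has len(values) - window_size + 1 points."""
    if window_size < 1 or window_size > len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    window_sum = sum(values[:window_size])
    averages = [window_sum / window_size]
    for i in range(window_size, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        window_sum += values[i] - values[i - window_size]
        averages.append(window_sum / window_size)
    return averages


smoothed = moving_average([0.0, 1.0, 0.0, 1.0, 1.0], window_size=3)
print(smoothed)
```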

Interestingly, we see a few cycles of work and learn, get stuck and forget, work, get stuck, where everyone is roughly aligned, before a more chaotic pattern emerges after the first 75K steps. I believe workers are still getting stuck looking for facts there, but they are no longer synchronised.

Let's now look into how effective it can be to ask colleagues for help. We'll let workers ask other workers, and if their target fact is in a colleague's known facts, work is done. This requires a new worker class.

In [None]:
class SociableResearcer(ForgetfulResearcher):
    """Worker who asks colleagues for help occasionally"""

    socialise_rate: float

    def __init__(self, n_facts, known_facts, forget_rate, socialise_rate) -> None:
        super().__init__(n_facts, known_facts, forget_rate)
        self.socialise_rate = socialise_rate

    def set_colleagues(self, colleagues):
        """Has to be done outside of the constructor, as for convenience this list usually includes this researcher,
        and it can be tricky to create a set of researchers who are all aware of each other."""
        self.colleageus = colleagues
        self.n_colleagues = len(colleagues)

    def ask_colleague(self):
        random_idx = random.randint(0,self.n_colleagues-1)
        colleagues_facts = self.colleageus[random_idx].known_facts

        if self.target_fact in colleagues_facts:
            self.known_facts.add(self.target_fact)

    def act(self) -> None:
        self.forget()
        if random.random() < self.socialise_rate:
            self.ask_colleague()
        else:
            self.do_research()

In [None]:
N_STEPS = 50000
workers = [SociableResearcer(n_facts=n_facts, 
                             known_facts=set(), 
                             forget_rate=FORGET_RATE,
                             socialise_rate=0.1) for _ in range(N_WORKERS)]
for worker in workers:
    worker.set_colleagues(workers)

n_successes_sociable = run_experiment(workers, N_STEPS)
plot_experiment(n_successes_sociable)

The power of asking colleagues! The work rate ramps up very quickly, and hovers around the maximum. This is a bit generous however, as currently asking a colleague doesn't impact the person being asked. To make this more realistic, let's assume that being asked takes that colleague out of the pool of people who can potentially do work.

This means we need a potentially complex system: go through the workers in order, see who each would like to ask, and then take that person out of the pool. That requires a small rewrite of the experiment code, in a way that breaks the existing `act` function. I'm not sure of the best way to handle such a breaking change, other than to write a new class.
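
The pool idea can be sketched in isolation before wiring it into the classes. This is a simplified stand-in, not the notebook's implementation: `pair_up_for_one_tick` is a hypothetical helper that only records who asks whom in a single tick, and it assumes a worker who has been asked loses their own turn.

```python
import random


def pair_up_for_one_tick(n_workers, socialise_rate, rng):
    """Return (asker, asked) pairs for one tick; nobody appears in two pairs."""
    acting_order = list(range(n_workers))
    rng.shuffle(acting_order)           # random order so no one is always first
    free = set(range(n_workers))        # workers not yet busy this tick
    pairs = []
    for asker in acting_order:
        if asker not in free:
            continue                    # already used up by a colleague's question
        free.discard(asker)             # acting uses up this worker's tick
        if free and rng.random() < socialise_rate:
            asked = rng.choice(sorted(free))
            free.discard(asked)         # being asked uses up the colleague's tick
            pairs.append((asker, asked))
    return pairs


pairs = pair_up_for_one_tick(n_workers=10, socialise_rate=0.5, rng=random.Random(0))
print(pairs)
```

With a socialise rate of 1 every acting worker ties up one colleague, so at most half the group can ask a question in any tick.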

In [None]:
class SociableResearcer_v2(ForgetfulResearcher):
    """Worker who asks colleagues for help occasionally"""

    socialise_rate: float

    def __init__(self, n_facts, known_facts, forget_rate, socialise_rate) -> None:
        super().__init__(n_facts, known_facts, forget_rate)
        self.socialise_rate = socialise_rate

    def set_colleagues(self, colleagues):
        """Has to be done outside of the constructor, as for convenience this list usually includes this researcher,
        and it can be tricky to create a set of researchers who are all aware of each other."""
        self.colleageus = colleagues
        self.n_colleagues = len(colleagues)

    def ask_colleague(self, free_resource_ids):
        chosen_idx = random.choice(free_resource_ids)
        free_resource_ids.remove(chosen_idx)
        colleagues_facts = self.colleageus[chosen_idx].known_facts

        if self.target_fact in colleagues_facts:
            self.known_facts.add(self.target_fact)

    def act(self, free_resource_ids) -> None:
        self.forget()
        at_least_one_free_colleague = len(free_resource_ids) > 0
        feeling_sociable = random.random() < self.socialise_rate
        if at_least_one_free_colleague and feeling_sociable:
            self.ask_colleague(free_resource_ids)
        else:
            self.do_research()

In [None]:
def run_experiment_v2(workers, n_steps):
    n_workers = len(workers)

    n_successes = []

    for _ in range(n_steps):
        free_resources = list(range(n_workers))
        acting_order = list(range(n_workers))
        random.shuffle(acting_order)  # Act in a random order to give everyone a chance

        for idx in acting_order:
            # Workers already asked by a colleague this tick are busy and skip their turn
            if idx in free_resources:
                free_resources.remove(idx)
                workers[idx].act(free_resources)

        n_successes_this_step = 0

        for worker in workers:
            n_successes_this_step += worker.update()
        n_successes.append(n_successes_this_step/n_workers)
    
    return n_successes

In [None]:
N_STEPS = 10000
SOCIALISE_RATE = 0.1
workers = [SociableResearcer_v2(n_facts=n_facts, 
                             known_facts=set(), 
                             forget_rate=FORGET_RATE,
                             socialise_rate=SOCIALISE_RATE) for _ in range(N_WORKERS)]
for worker in workers:
    worker.set_colleagues(workers)

n_successes_sociable_v2 = run_experiment_v2(workers, N_STEPS)
plot_experiment(n_successes_sociable_v2)

In [None]:
ax = plt.subplot()   
ax.plot(n_successes_sociable[:10000])
ax.plot(n_successes_sociable_v2, "g")
ax.plot(n_successes_forgetful[:10000], "r")
ax.set_title("Rate of work per worker over time")
ax.set_ylabel("Average units of work per worker")
ax.set_xlabel("Ticks")
ax.legend(['Sociable-V1', 'Sociable-V2', "Forgetful"])

We can see now the impact of being sociable, even when it means you take someone else out of the equation temporarily! You can imagine a situation where if one spends too much time asking around then nothing actually gets done. What happens when the socialise rate is turned up to abominable levels?

In [None]:
N_STEPS = 10000
SOCIALISE_RATE = 0.9
workers = [SociableResearcer_v2(n_facts=n_facts, 
                             known_facts=set(), 
                             forget_rate=FORGET_RATE,
                             socialise_rate=SOCIALISE_RATE) for _ in range(N_WORKERS)]
for worker in workers:
    worker.set_colleagues(workers)

n_successes_sociable_v3 = run_experiment_v2(workers, N_STEPS)
plot_experiment(n_successes_sociable_v3)

Again, this was not the graph I was expecting - this is turning out to be a very interesting system with some hard-to-predict behaviours. If researchers spend 90% of their time asking each other rather than looking for facts, they spend a very long time getting nothing done; however, once they discover a few facts the speed of progress is very rapid. It seems that in cases like this the whole group is getting stuck looking for a single fact, creating bottlenecks. This isn't really realistic, as it isn't likely that a whole organisation will all be working on the same thing at the same time (and spending most of their time asking each other about it).

There is one final mechanism I'd like to introduce into this system - the centralised knowledge repository. Workers can spend some time contributing to this repository, giving them a mechanism to push knowledge to a central database. To represent the fact that it takes much more time to contribute something to a repository than it does to read from it, I'll assign a success probability to each contribution attempt (so I don't have to modify the tick system I currently use, which doesn't allow for actions that take more than one tick!). Workers can then also query this database, in addition to being able to ask random colleagues.
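
Treating a slow action as a one-tick action that only succeeds with probability p means the number of attempts until success is geometrically distributed with mean 1/p. The small sketch below checks this by simulation for p = 0.1, chosen purely as an example; the helper name `attempts_until_success` and the seed are illustrative.

```python
import random


def attempts_until_success(success_rate, rng):
    """Count one-tick attempts up to and including the first success."""
    attempts = 1
    while rng.random() >= success_rate:
        attempts += 1
    return attempts


rng = random.Random(42)
success_rate = 0.1
trials = [attempts_until_success(success_rate, rng) for _ in range(20_000)]
mean_attempts = sum(trials) / len(trials)
print(f"simulated mean: {mean_attempts:.2f}, theory: {1 / success_rate:.2f}")
```

So a success probability of 0.1 behaves, on average, like a write that takes ten attempts, without needing multi-tick actions.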

Since we've got quite a few settings now I'll group them together into a dataclass to simplify things.

In [None]:
from dataclasses import dataclass

@dataclass
class DatabaseResearcherSettings:
    forget_rate: float
    socialise_rate: float
    contribution_success_rate: float
    database_query_rate: float
    database_write_rate: float    

class DatabaseResearcher(SociableResearcer_v2):
    """Worker who has access to, and contributes to, a central database, in addition to asking colleagues."""

    contribution_success_rate: float
    socialise_threshold: float
    database_query_threshold: float
    database_write_threshold: float

    def __init__(self, n_facts, known_facts, settings: DatabaseResearcherSettings, database: set) -> None:
        super().__init__(n_facts, known_facts, settings.forget_rate, settings.socialise_rate)
        self.validate_settings(settings)
        self.init_thresholds(settings)

        self.contribution_success_rate = settings.contribution_success_rate
        self.database = database

    def validate_settings(self, settings: DatabaseResearcherSettings):
        """Make sure the event probabilities add to less than 1, or research will never occur."""
        assert (settings.socialise_rate + settings.database_query_rate + settings.database_write_rate) < 1., "Event rates (socialise_rate, query_rate, write_rate) must total less than 1."

    def init_thresholds(self, settings: DatabaseResearcherSettings):
        """Initialise thresholds for actions based on the probability of each action occurring."""
        self.socialise_threshold = settings.socialise_rate
        self.database_query_threshold = self.socialise_threshold + settings.database_query_rate
        self.database_write_threshold = self.database_query_threshold + settings.database_write_rate

    def query_database(self):
        """If the thing we're looking for is in the database, add it to our known facts"""
        if self.target_fact in self.database:
            self.known_facts.add(self.target_fact)
            
    def write_to_database(self):
        chosen_fact = random.choice(list(self.known_facts))  # There must be a better way but I have no internet!
        succeeds_at_writing = random.random() < self.contribution_success_rate
        if succeeds_at_writing:
            self.database.add(chosen_fact)

    def act(self, free_resource_ids) -> None:
        self.forget()
        at_least_one_free_colleague = len(free_resource_ids) > 0
        random_value = random.random()
        feeling_sociable = random_value < self.socialise_threshold
        knows_something = len(self.known_facts) > 0

        if at_least_one_free_colleague and feeling_sociable:
            self.ask_colleague(free_resource_ids)
        elif random_value < self.database_query_threshold:
            self.query_database()
        elif knows_something and (random_value < self.database_write_threshold):
            self.write_to_database()
        else:
            self.do_research()
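
The threshold trick used in `act` above picks one of several actions from a single random draw by comparing it against cumulative probabilities. A standalone sketch is below; `choose_action` and the action names are hypothetical, and the rates are example values.

```python
import random
from collections import Counter


def choose_action(random_value, rates, default="research"):
    """Pick the first action whose cumulative threshold exceeds random_value."""
    threshold = 0.0
    for action, rate in rates.items():   # dicts keep insertion order
        threshold += rate
        if random_value < threshold:
            return action
    return default                       # leftover probability mass


rates = {"socialise": 0.1, "query": 0.1, "write": 0.05}
rng = random.Random(7)
tally = Counter(choose_action(rng.random(), rates) for _ in range(100_000))
print({action: count / 100_000 for action, count in tally.items()})
```

Note that in the class itself a draw falls through to a later branch whenever its precondition fails (no free colleague, or nothing known yet), so the realised frequencies can drift from the nominal rates.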

In [None]:
N_STEPS = 10000
settings = DatabaseResearcherSettings(forget_rate=FORGET_RATE, 
                                      socialise_rate=0.1,
                                      contribution_success_rate=0.1,
                                      database_query_rate=0.1,
                                      database_write_rate=0.05)
shared_database=set()
workers = [DatabaseResearcher(n_facts=n_facts, 
                             known_facts=set(), 
                             settings=settings,
                             database=shared_database) for _ in range(N_WORKERS)]
for worker in workers:
    worker.set_colleagues(workers)

n_successes_database = run_experiment_v2(workers, N_STEPS)
plot_experiment(n_successes_database)

Nice! So now we can run a few experiments to look at how each of these scenarios plays out as we modify the number of facts and the number of workers. To simplify this I'll turn things into a number of workers and a ratio of facts per worker.
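
As a small illustration of that parameterisation, the sketch below builds a grid of scenarios from a worker count and a facts-per-worker ratio. The specific values and the name `scenarios` are made up for the example and are not the settings used in the experiments.

```python
from itertools import product

worker_counts = [10, 100, 1000]          # illustrative values
facts_per_worker_options = [1, 10, 100]  # illustrative values

# Each scenario derives the total number of facts from the two knobs.
scenarios = [
    {"n_workers": n, "facts_per_worker": ratio, "n_facts": n * ratio}
    for n, ratio in product(worker_counts, facts_per_worker_options)
]
for scenario in scenarios:
    print(scenario)
```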

In [None]:
from multiprocessing.pool import Pool
from functools import partial

In [None]:
class ForgetfulResearcher_v2(ForgetfulResearcher):
    """An ugly creation to make the researcher handle the change to the act method in subclasses..."""

    def __init__(self, n_facts, known_facts, forget_rate) -> None:
        super().__init__(n_facts, known_facts, forget_rate)

    def act(self, _) -> None:
        self.forget()
        self.do_research()

def run_grand_experiment(settings: DatabaseResearcherSettings, n_workers: int, facts_per_worker: int, steps: int):
    
    n_facts = n_workers * facts_per_worker

    # This section is nasty and I'm sorry. Fix when I can be bothered to make a proper factory.
    worker_dict = dict()
    worker_dict["Forgetful"] = [ForgetfulResearcher_v2(n_facts=n_facts, 
                                                    known_facts=set(), 
                                                    forget_rate=settings.forget_rate) 
                                for _ in range(n_workers)]
    
    worker_dict["Sociable"] = [SociableResearcer_v2(n_facts=n_facts, 
                                                    known_facts=set(), 
                                                    forget_rate=settings.forget_rate,
                                                    socialise_rate=settings.socialise_rate) 
                                for _ in range(n_workers)]
    for worker in worker_dict["Sociable"]:
        worker.set_colleagues(worker_dict["Sociable"])

    shared_database=set()
    worker_dict["Databaser"] = [DatabaseResearcher(n_facts=n_facts, 
                                                   known_facts=set(), 
                                                   settings=settings,
                                                   database=shared_database) 
                                for _ in range(n_workers)]
    for worker in worker_dict["Databaser"]:
        worker.set_colleagues(worker_dict["Databaser"])

    results_dict = dict()
    for researcher_type, researchers in worker_dict.items():
        results_dict[researcher_type] = run_experiment_v2(researchers, steps)
    return results_dict

def plot_grand_experiment(results_dict, settings: DatabaseResearcherSettings):
    ax = plt.subplot() 
    legend_labels = []

    for researcher_type, results in results_dict.items():
        ax.plot(results)
        legend_labels.append(researcher_type)

    ax.set_title("Rate of work per worker over time")
    ax.set_ylabel("Average units of work per worker")
    ax.set_xlabel("Ticks")
    ax.legend(legend_labels)

In [None]:
settings = DatabaseResearcherSettings(forget_rate=FORGET_RATE, 
                                      socialise_rate=0.1,
                                      contribution_success_rate=0.1,
                                      database_query_rate=0.1,
                                      database_write_rate=0.05)
results = run_grand_experiment(settings, n_workers=10, facts_per_worker=100, steps=10_000)
plot_grand_experiment(results, settings)

In [None]:
settings = DatabaseResearcherSettings(forget_rate=FORGET_RATE, 
                                      socialise_rate=0.1,
                                      contribution_success_rate=0.05,
                                      database_query_rate=0.1,
                                      database_write_rate=0.1)
results = run_grand_experiment(settings, n_workers=1000, facts_per_worker=1000, steps=10_000)
plot_grand_experiment(results, settings)

In [None]:
settings = DatabaseResearcherSettings(forget_rate=FORGET_RATE, 
                                      socialise_rate=0.1,
                                      contribution_success_rate=0.05,
                                      database_query_rate=0.1,
                                      database_write_rate=0.3)
results = run_grand_experiment(settings, n_workers=1000, facts_per_worker=1000, steps=10_000)
plot_grand_experiment(results, settings)