The run initialization should be protected with a Lock in order to safely start experiments in a multithreaded environment #369

galatolofederico · 2018-10-18T15:16:17Z

When trying to run this code

from sacred import Experiment
from multiprocessing.dummy import Pool as ThreadPool
import itertools


ex = Experiment('toy_experiment')

from sacred.observers import MongoObserver
ex.observers.append(MongoObserver.create())

@ex.config
def config():
    param_a = 0.001
    param_b = 20
    param_c = 100


@ex.automain
def main(_run, param_a, param_b, param_c):
    import random, time
    print("I am id %s with param_a:%s param_b:%s param_c:%s " % (_run._id, param_a, param_b, param_c))
    time.sleep(random.randint(0,10))
    return param_a*param_b*param_c + random.randint(0,10)


def getExperiments(params):
    # Expand a dict of parameter lists into one config dict per combination.
    return [dict(zip(params.keys(), values)) for values in itertools.product(*params.values())]

params = {
    "param_a": [0.001, 0.01],
    "param_b": [20, 50]
}


pool = ThreadPool(2)
results = pool.map(
    lambda e : ex.run(config_updates=e),
    getExperiments(params)
)
pool.close()
pool.join()

I ran into non-deterministic behavior from sacred: sometimes it throws seemingly random errors, and sometimes it mixes up the configuration fields of the runs.
Looking into the framework, I found that Experiment.run(...) calls _create_run(...), which is not thread-safe. Since that call is not protected by a mutex, running multiple experiments in parallel can lead to all kinds of race conditions.

In this PR I've simply added a Lock instance to the Experiment class and used it to protect the call to _create_run(...). As far as I understand the framework, this is a feasible solution: _create_run(...) is fast, so serializing it costs little in terms of performance.
A better approach would be to protect only the blocks that are actually thread-unsafe.
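
The locking pattern described above can be illustrated in isolation. This is a minimal sketch, not the actual patch: `ToyExperiment`, its `_next_id` counter, and the bodies of `_create_run` and `run` are hypothetical stand-ins rather than sacred code. Only the shape is taken from the description: a `threading.Lock` held around the initialization step, with the rest of the run executing outside the lock.

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool


class ToyExperiment:
    """Hypothetical stand-in for an experiment class; not sacred's API."""

    def __init__(self):
        self._init_lock = threading.Lock()  # guards run initialization only
        self._next_id = 0                   # shared state touched during initialization

    def _create_run(self, config_updates):
        # Not thread-safe on its own: read-modify-write on shared state.
        run_id = self._next_id
        self._next_id = run_id + 1
        return {"id": run_id, "config": dict(config_updates)}

    def run(self, config_updates):
        # Serialize only the (fast) initialization step.
        with self._init_lock:
            run = self._create_run(config_updates)
        # The (slow) body of the run executes outside the lock, in parallel.
        run["result"] = run["config"]["param_a"] * run["config"]["param_b"]
        return run


def run_all(configs, n_threads=2):
    ex = ToyExperiment()
    with ThreadPool(n_threads) as pool:
        return pool.map(ex.run, configs)
```

Holding the lock only around initialization keeps the runs themselves concurrent. It protects nothing that happens after the lock is released.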

coveralls · 2018-10-18T15:19:17Z

Coverage increased (+0.02%) to 84.158% when pulling fa81aaa on galatolofederico:master into 6faaa8b on IDSIA:master.

Qwlouse · 2018-10-22T07:58:13Z

Hi @galatolofederico. Thanks for engaging! Unfortunately sacred in its current form is fundamentally non-thread-safe and I don't think locking _create_run will solve that. The main problem is that the active configuration is tracked globally for each captured function. That means calling a captured function from different threads will lead to inconsistent configurations.
Fixing this would be substantially more work, and probably include turning all global variables into thread-local ones.
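
To make the thread-local idea concrete, here is a minimal sketch under stated assumptions. `CapturedFunction`, `set_config`, `worker`, and `run_threads` are hypothetical names invented for illustration and do not reflect how sacred stores its configuration. The sketch only shows how `threading.local` gives each thread its own copy of an "active configuration".

```python
import threading


class CapturedFunction:
    """Hypothetical stand-in for a captured function; not sacred's implementation."""

    def __init__(self, func):
        self.func = func
        self._local = threading.local()  # one `config` slot per thread

    def set_config(self, config):
        self._local.config = dict(config)

    def __call__(self):
        # Each thread reads the config it set itself, never another thread's.
        return self.func(**self._local.config)


@CapturedFunction
def main(param_a, param_b):
    return (param_a, param_b)


def worker(config, barrier, results, index):
    main.set_config(config)
    barrier.wait()  # every thread has set its config before any thread calls main
    results[index] = main()


def run_threads(configs):
    barrier = threading.Barrier(len(configs))
    results = [None] * len(configs)
    threads = [
        threading.Thread(target=worker, args=(cfg, barrier, results, i))
        for i, cfg in enumerate(configs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If `_local` were a plain attribute shared by all threads, the barrier would make every call see whichever configuration was written last, which is the kind of inconsistency described above.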

For now I would suggest not using sacred in a multi-threaded environment, and using multiprocessing instead.
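
A process-based version of the parameter sweep from the PR description might look roughly like the sketch below. The names `expand_grid` and `run_one` are hypothetical, and the body of `run_one` is a placeholder so that the example runs without sacred installed. In the original script that is where `ex.run(config_updates=...)` would go. The sketch returns a plain number from each worker because this thread does not establish whether the object returned by `ex.run` can be sent back to the parent process.

```python
import itertools
from multiprocessing import Pool


def expand_grid(params):
    """Return one config dict per combination of the listed parameter values."""
    keys = list(params)
    return [dict(zip(keys, values)) for values in itertools.product(*params.values())]


def run_one(config_updates):
    # Placeholder body. In the script from the PR description this is where
    # ex.run(config_updates=config_updates) would be called; each worker
    # process has its own copy of the module-level state.
    return config_updates["param_a"] * config_updates["param_b"]


if __name__ == "__main__":
    params = {"param_a": [0.001, 0.01], "param_b": [20, 50]}
    with Pool(processes=2) as pool:
        results = pool.map(run_one, expand_grid(params))
    print(results)
```

The `if __name__ == "__main__":` guard matters here: with the spawn start method, worker processes re-import the main module, and unguarded pool creation would run again in each worker.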

Qwlouse · 2019-02-21T20:08:07Z

Closing this, since it is an incomplete solution and would require substantially more work.

Protected the run initialization with a Lock

fa81aaa

Qwlouse closed this Feb 21, 2019
