Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix module loading in multi-threading environments #13823

Closed
wants to merge 1 commit into from

Conversation

jeblair
Copy link

@jeblair jeblair commented Jan 12, 2016

When ansible loads a module from disk for a plugin, it first checks
to see if the module has already been loaded (i.e., exists in
sys.modules), and if it does, it returns that reference rather than
loading the module again (which, as described in issue #13110, must
not happen because repeating the module load may re-initialize
module global variables).

When a module is loaded, the reference is immediately added to
sys.modules, even before the module initialization is complete.
Therefore, if two threads race the module loading code, the first
will begin loading the module from disk, and the second will see
the module in sys.modules. It might begin using that module before
it is even fully loaded, resulting in AttributeErrors on objects
which should be defined in the module.

This change wraps the method which checks for existence in
sys.modules and loads modules from disk in a global mutex so that
more than one module is never loaded from disk at a time.

Fixes #13822

When ansible loads a module from disk for a plugin, it first checks
to see if the module has already been loaded (i.e., exists in
sys.modules), and if it does, it returns that reference rather than
loading the module again (which, as described in issue ansible#13110, must
not happen because repeating the module load may re-initialize
module global variables).

When a module is loaded, the reference is immediately added to
sys.modules, even before the module initialization is complete.
Therefore, if two threads race the module loading code, the first
will begin loading the module from disk, and the second will see
the module in sys.modules.  It might begin using that module before
it is even fully loaded, resulting in AttributeErrors on objects
which should be defined in the module.

This change wraps the method which checks for existence in
sys.modules and loads modules from disk in a global mutex so that
more than one module is never loaded from disk at a time.

Fixes ansible#13822
@jimi-c
Copy link
Member

jimi-c commented Jan 12, 2016

Ansible's API has never been advertised as being thread-safe, as we use the multiprocessing library (which does everything in forks). This patch may fix the module loading issue, however I'm sure there are many other places in the code which may harbor similar race conditions. As such, I'm not particularly opposed to this patch, however I am against trying to make the entire API thread safe.

@bcoca
Copy link
Member

bcoca commented Jan 12, 2016

-1, adding threading and locking code we do not use adds complexity, performance degradation and possible pitfalls for no gain

@emonty
Copy link
Contributor

emonty commented Jan 12, 2016

@jimi-c @bcoca would you accept a patch to the documentation then that says "the python API is known to not be thread safe?" I think that might have prevented confusion in this case.

@jimi-c
Copy link
Member

jimi-c commented Jan 12, 2016

@emonty absolutely.

@bcoca
Copy link
Member

bcoca commented Jan 12, 2016

done 2374abb

@bcoca bcoca closed this Jan 12, 2016
@ansible ansible locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document that Ansible API is not thread-safe
4 participants