Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of global variables #107

Closed
BrenBarn opened this issue Feb 25, 2017 · 7 comments
Closed

Use of global variables #107

BrenBarn opened this issue Feb 25, 2017 · 7 comments

Comments

@BrenBarn
Copy link

There seems to be a lot of code in this library that uses global variables that are shared across multiple processes (for instance, the various parameters for tags to ignore, etc.). This is not a good practice and especially causes problems on Windows. It would be better to reorganize the code so that the global state is stored in an object that is passed to each process.

I encountered this in the form of repeated NameError: global name 'discardElements' is not defined errors. This appears to be due to the recent PR #102 that moved the definition of this element inside a function, so it's no longer defined globally.

@attardi
Copy link
Owner

attardi commented Feb 25, 2017

Moved definition of discardElements at the beginning.

@attardi attardi closed this as completed Feb 25, 2017
@BrenBarn
Copy link
Author

Thanks for your quick response, but I don't think that will fix it. That will stop the error message, but it won't fix the underlying problem: on Windows, the global namespace of the parent process is not shared with the subprocesses. Instead, the module is re-executed, getting a new global namespace for each subprocess. So all the modifications to discardElements, etc., that happen within functions called from main() do not affect the subprocesses.

I'm looking at doing a real fix. This would mean that the various options are no longer stored in separate global variables (or as properties of the Extractor class, as some options appear to be). Instead, they would be stored in an "options" object (e.g., a dict), and that object would passed as an argument to the subprocesses. I think this would be cleaner anyway, since it would centralize the various options.

@attardi
Copy link
Owner

attardi commented Feb 25, 2017

These are really just global constants. They are not modified in the subprocesses.
As you noticed, it was PR #102 that moved their definition into main().
I moved them back, including ignoredTags, to the top.

@BrenBarn
Copy link
Author

It's not a matter of whether they're modified in the subprocess. It's a matter of whether they're modified in code that does not run in the subproces. The point is that the subprocess does not see the parent module's global state. Any modifications made from within main() (whether to global variables or to the Extractor class) are not seen by the subprocess, because the subprocess doesn't run main().

Here's a simple example to illustrate what I mean:

from __future__ import print_function
import multiprocessing

x = "Original"

def worker():
	print("The worker sees", x)

def main():
	global x
	x = "Modified"
	print("The main sees", x)
	p = multiprocessing.Process(target=worker)
	p.start()
	print("The main still sees", x)

if __name__ == "__main__":
	main()	

On Windows, the output is:

The main sees Modified
The main still sees Modified
The worker sees Original

It doesn't matter that the modification to x takes place before the subprocess is created. main() is never run in the subprocess, so the modification to x never takes place inside the subprocess.

@attardi
Copy link
Owner

attardi commented Feb 26, 2017

You are right.
Then not only global variables, but also class variables for class Extractor, should be dealt like that.
Class Extractor was in fact meant to group all parameters.

@attardi attardi reopened this Feb 26, 2017
@BrenBarn
Copy link
Author

I made PR #108 to address this. I'm using the new version and it works, but the change touches a lot of code, so it would be good to look it over.

@BrenBarn
Copy link
Author

BrenBarn commented Mar 8, 2017

Closed by PR #108

@BrenBarn BrenBarn closed this as completed Mar 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants