Add `__all__` to Python files outside of webapp
#1585
Merged
This pull request is a stripped-down version of #1438. Let me first give some background and then explain why I thought a separate pull request was useful.

Use of `__all__`

This is a nice explanation of what the `__all__` variable does in Python:

- The `__all__` variable defines what will be imported from a module when using `from module import *`.
- In `__init__.py`, the `__all__` variable determines the same for the package as a whole.

This is what `__all__` does purely technically. However, there are also some other implications:

- Some tools use `__all__` to determine what is available.
- `__all__` signals to users what parts of the code are public and what is private. (Note that it does not enforce anything: users can still import anything using `from module import foo`, they just can't access `foo` by doing `from module import *`.) (You could also do this by prefixing all private names with `_`, however then you also need to use aliases any time you import something from another package, e.g. `from pathlib import Path as _Path`.)

For us, there would be some advantages to declaring `__all__`.
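
As a minimal sketch of the star-import behaviour described above, the snippet below builds two throwaway in-memory modules with identical code, one declaring `__all__` and one not. The module names `with_all` and `without_all` and the functions `public_func` and `helper` are made up for illustration and are not part of our code base.

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module from source and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name):
    """Return the non-dunder names bound by `from <name> import *`."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(n for n in namespace if not n.startswith("__"))


body = '''
from pathlib import Path

def public_func():
    return "public"

def helper():
    return "helper"
'''

# Hypothetical modules: identical code, with and without __all__.
make_module("with_all", '__all__ = ["public_func"]\n' + body)
make_module("without_all", body)

print(star_import("with_all"))     # only the names listed in __all__
print(star_import("without_all"))  # every name without a leading underscore

# __all__ does not enforce anything: explicit imports still work.
from with_all import helper
print(helper())
```

Without `__all__`, the star-import also picks up `Path`, which is why the underscore-only approach would need aliases such as `_Path` for every imported name.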

Comparison with #1438

In #1438, besides adding the `__all__` variables, there are also changes to what is imported, especially in the `__init__.py` files. Moreover, there are tests with automatically generating documentation. I think it will be easier to merge if we split these things up into separate pull requests. Also, I tried to be a bit more critical about what we actually put in `__all__`. This hopefully brings it a bit closer to reflecting what is public and private in our Python API.

I do think it will be useful to have a critical look at what we import in `__init__.py`. Also, while going through the code I saw a number of functions that were not really used, or only used by one class. It would make sense to clean up the Python code: remove some of these functions, turn them into class methods, or prefix their names with an underscore to clearly mark them as private. It might at some point also be an idea to formally state what parts we consider private and what parts public.

Notes

These are some things I noticed while going through the code and deciding whether objects should end up in `__all__`:

- `data/statistics.py`: All these functions should actually be methods of the `ASReviewData` class. Therefore I made them private.
- `io`: Should the reader/writer classes have a base class? That way it's easier to see what you need to implement if you want to add a new one.
- `io/paper_record.py`: Only the `PaperRecord` class is used; the two other functions are not. I made them private.
- `io/utils.py`: There are a bunch of unused functions here that could be removed. There is also `_standardize_dataframe`, which is used by all our reader classes. Maybe this one should be public, in case someone else wants to make a reader class? I left it private for now.
- `io/utils.py`: There are the `get_reader_class` and `get_writer_class` functions. They are not imported in `io/__init__.py`, but they seem to be useful, public functions, so maybe they should be imported in the init as well. This would make it consistent with the `__init__` files in models, for example, where the `get_model_class` functions are also imported. The other option would be to declare all these functions as private. Whatever decision we make, it should be consistent over all `utils.py` files throughout the package.
- `models/balance/triple.py`: I left this class private, as the docstring states it's broken and for internal use only.
- `models/classifiers/base.py`: The base class is called `BaseTrainClassifier` and not `BaseClassifier` for some reason.
- `models/classifiers/lstm_base.py`: Do we want to make this class private / deprecate it?
- `models/classifiers/lstm_pool.py`: Do we want to make this class private / deprecate it?
- `models/classifiers/utils.py`: We should rename `get_classifier` to `get_classifier_model` to be more consistent with the other models utils functions.
- `models/__init__.py`: Only the classifiers are imported here. We should either not import them, or also import the query/feature/balance classes.
- `models/__init__.py`: `list_classifiers` is imported as `_list_classifiers`. Any reason? I left it out of `__all__` because of this.
- `state/legacy`: I ignored everything here.
- `state/sql_converter.py`: I made the converter private.
- `state/utils.py`: Everything in here appears to be unused and is functionality of the project class anyway. I made them private. We could remove these.
- `utils.py`: The `__all__` variable should move to the top of the file to conform with PEP 8: https://peps.python.org/pep-0008/#module-level-dunder-names. We can't move `__version__` as well because it needs an import. So I left `__all__` and `__version__` together where they were.
- `compat.py`: This doesn't seem to belong here. Made it private.
- `config.py`: Not sure if these are private implementation details or not. Made them public.
- `project.py`: There are a bunch of functions here that might be considered private.
- `settings.py`: `SETTINGS_TYPE_DICT` seems to be an implementation detail of `ASReviewSettings`.
- `types.py`: This function seems like an implementation detail. I kept it private.
- `utils.py`: Left out the general Python utility functions.
- `__init__.py`: `load_data` is imported but is not in `__all__`. We should probably remove the import.