Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __all__ to Python files outside of webapp #1585

Merged
merged 4 commits into from Nov 16, 2023

Conversation

PeterLombaers
Copy link
Member

This pull request is a stripped down version of #1438. Let me first give some background and then explain why I thought a separate pull request was useful.

Use of __all__

This is a nice explanation of what the __all__ variable does in Python:

  • The __all__ variable defined what will be imported from a module when using from module import *
  • If it is in a packages __init__.py, the __all__ variable determines the same for the package as a whole

This is what __all__ does purely technically. However there are also some other implications:

  • Other tools scanning the code (e.g. autocomplete software) may use __all__ to determine what is available
  • __all__ signals to users what parts of the code are public and what is private. (Note that it does not enforce anything, users can still import anything using from module import foo, they just can't access foo by doing from module import *.) (You could also do this by prefixing all private names with _, however then you also need to use aliases anytime you import something from another package, e.g. from pathlib import Path as _Path.)

For us, there would be some advantages to declaring __all__:

  • Give us more control over what parts of the Python API are public and what are private.
  • Allow for automatically generating documentation.
  • Better work with code editors, autocompletion, etc.

Comparison with #1438

In #1438, besides adding the __all__ variables, there are also changes to what is imported, especially in the __init__.py files. Moreover there are tests with generating documentation automatically. I think it will be easier to merge if we split these things up into separate pull requests. Also, I tried to be a bit more critical about what we actually put in __all__. This hopefully brings it a bit closer to reflecting what is public and private in our Python API.

I do think it will be useful to have a critical look at what we import in __init__.py. Also while going through the code I saw a number of functions that were not really used, or only used by one class. It would make sense to clean up the Python code, remove some of these functions, turn them into class methods, or prefix their name with an underscore to clearly indicate them as private. It might at some point also be an idea to formally state what parts we consider private and what parts public.

Notes

These are some things I noticed while going through the code and deciding whether objects should end up in __all__:

  • data/statistics.py: All these functions should actually be methods of the ASReviewData class. Therefor I made them private.
  • io: Should the reader/writer classes have a baseclass? That way it's easier to see what you need to implement if you want to add a new one.
  • io/paper_record.py: Only the PaperRecord class is used, the two other functions not. I made them private.
  • io/utils.py: There a bunch of unused functions here, they could be removed. There is also _standardize_dataframe which is used by all our reader classes. Maybe this one should be public, in case someone else wants to make a reader class? I left it private for now.
  • io/utils.py: There are the get_reader_class and get_writer_class functions. They are not imported in the io/__init__.py, but they seem useful and public functions, maybe they should be imported in the init as well. This would make it consistent with the __init__ files in models for example, where the get_model_class functions are also imported. The other option would be to declare all these functions as private. Whatever decision we make, this should be consistent over all utils.py files throughout the package.
  • models/balance/triple.py: I left this class private as the docstring states it's borken and for internal use only.
  • models/classifiers/base.py: The base class is call BaseTrainClassifier and not BaseClassifier for some reason.
  • models/classifiers/lstm_base.py: Do we want to make this class private / deprecate it?
  • models/classifiers/lstm_pool.py: Do we want to make this class private / deprecate it?
  • models/classifiers/utils.py: We should rename get_classifier to get_classifier_model to be more consistent with the other models utils functions.
  • models/__init__.py: Only the classifiers are imported here. Either not import them, or also import query/feature/balance classes.
  • models/__init__.py: list_classifiers is imported as _list_classifiers. Any reason? I left it out of the all because of this.
  • state/legacy: I ignored everything here.
  • state/sql_converter.py: I made the converter private
  • state/utils.py: Everything in here appears to be unused and is functionality of the project class anyway. I made them private. We could remove these.
  • utils.py: The __all__ variable should move to the top of the file to conform with PEP8: https://peps.python.org/pep-0008/#module-level-dunder-names. We can't move __version__ as well because it needs an import. So I left __all__ and __version__ together where they were.
  • compat.py: This doesn't seem to belong here. Made it private.
  • config.py: Not sure if these are private implementation details or not. Made them public.
  • project.py: There is a bunch of functions here that might be considered private.
  • settings.py: SETTINGS_TYPE_DICT seems an implementation detail of ASReviewSettings.
  • types.py: This function seems like an implementation detail. I kept it private.
  • utils.py: Left out the general python utility functions.
  • __init__.py: load_data is imported but is not in __all__. We should probably remove the import.

@J535D165 J535D165 merged commit 9bd1d09 into asreview:master Nov 16, 2023
10 checks passed
@J535D165
Copy link
Member

👍

cskaandorp pushed a commit to cskaandorp/asreview that referenced this pull request Apr 22, 2024
This pull request is a stripped down version of asreview#1438. Closes asreview#1438
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants