Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New settings object #1424

Closed
wants to merge 6 commits into from

Conversation

PeterLombaers
Copy link
Member

This is a work in progress of creating a new settings object for asreview. It started out as a small project, but is getting a little bigger, so I thought I would make a draft pull request to get some feedback. Here are the notes I've been taking:

New ASReviewSettings object

Purpose of the current object

  • CLI info is stored in the settings objects
  • This info is then used to initialize models/reviews
  • This info is stored in the state object
  • webapp gets review info from it

Furthermore, the settings object does some type casting. There is a hard coded list of names for which it typecasts the value to a certain type.

Finally it can read settings from a config file, and read model default parameters/type check/type cast whilst doing it.

Related to the settings objects, BaseModel class has an attribute called param. This is an attempt to automatically generate the model parameters. It is only used in review/base.py and webapp/api/projects.py to initialize an ASReviewSettings object. It also does some type checking/casting and complains if the type in the signature of __init__ is different from the type of a corresponding stored attribute.

Occurences

  • entrypoints/simulate.py:
    • for webapp, load settings from state and initialize model
    • for simulate, load settings from cli/config and initialize model/reviewer -> could skip settings object.
  • review/base.py:
    Review.settings:
    Store the info on the settings of the review (model/n_instances/stop_if/n_prior_included/n_prior_excluded/
    model_parameters)
  • state/legacy/dict-hdf5-json.py:
    Initialize from unknown dictionary.
  • state/sqlstate.py:
    Store the settings of the state.
  • webapp/api/projects.py:
    Set the settings in the state.

Contents

  • Model info:
    Classifier/Query/Balance/Feature
  • seperately, model_parameters.
  • Review settings:
    • n_instances
    • stop_if
    • n_prior_included
    • n_prior_excluded
  • Data object, but it is unused.

Ideas for improvement

  • Remove reading config file from the settings object. It should be dealt with at the entrypoints.
    (Maybe have a Settings.from_json() method.)
  • Let model components declare settings themselves, using the principle that Model.__init__(**model.settings) = model
  • Let review components declare settings using the principle
    Review.__init__(review.model, **review.settings) = review
  • Remove the param attribute from BaseModel.
  • Let the settings object have a to_json method so that the state can store the settings.
    (And make sure Settings.from_json(settings.to_json())) == settings.)
  • Make sure that random state satisfies RandomState(random_state.random_seed) = random_state and
    store it in the settings as it's seed. If the seed has been set for a random state,
    you can retrieve it by random_state.get_state()[1][0]. We can write a small wrapper around
    the numpy random state, so that you can only initialize it using a seed, and the seed is
    an attribute of the state.
  • Make tests that check random state is always seeded.
  • Make tests that check Model.__init__(**model.settings) = model
  • Make BaseModel._settings an abstract property so that you are forced to implement it
    when creating a new model component.

Progress

  • Random state is now always seeded, and it is easy to obtain the seed from the object.
  • All model components declare which settings they use.
  • Made tests for random_state
  • Made tests checking Model.__init__(**settings).settings == settings.

This commit contains two changes:
- The get_random_state function is changed to the class SeededRandomState. This class can be initialized from an integer or another SeededRandomState instance. It is just a numpy RandomState object, but using a seed is forced.
- All models have a settings attribute, such that the model can be initialized from the settings. When defining a new model class, one should define the _settings attribute. Returning the settings attribute is handled by BaseModel.
Add tests for the seed random state and change the attribtue 'seed' to 'random_seed' since it clashes with a method.
Test if the init arguments can be obtained from the model settings. Also turned the _settings propert into an abstractproperty so that it is forced to use it.
@PeterLombaers
Copy link
Member Author

I will extract the useful parts from this and make separate pull requests for those.

@J535D165
Copy link
Member

Thanks, this is still very promising!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants