Scenario Definitions Cannot Over-write Parameters Defined as a pd.Series or a pd.DataFrame #988

Open
tbhallett opened this issue May 31, 2023 · 3 comments
Labels: enhancement, framework

Comments

@tbhallett
Collaborator

The example scenario definition below, in which we attempt to overwrite in draw_parameters a parameter that is defined as either a pd.Series or a pd.DataFrame, fails with this error:

TypeError: Object of type DataFrame is not JSON serializable

But many of the parameters that need to be overwritten are in one of these formats!

from pathlib import Path
import pandas as pd
from tlo import Date, logging
from tlo.methods.fullmodel import fullmodel
from tlo.scenario import BaseScenario


class TestScenario(BaseScenario):
    def __init__(self):
        super().__init__()
        self.seed = 0
        self.start_date = Date(2010, 1, 1)
        self.end_date = Date(2020, 1, 1)
        self.pop_size = 100_000
        self.number_of_draws = 1
        self.runs_per_draw = 5

    def log_configuration(self):
        return {
            'filename': 'effect_of_each_treatment',
            'directory': Path('./outputs'),
            'custom_levels': {
                '*': logging.WARNING,
                'tlo.methods.demography': logging.INFO,
                'tlo.methods.demography.detail': logging.WARNING,
                'tlo.methods.healthburden': logging.INFO,
                'tlo.methods.healthsystem.summary': logging.INFO,
            }
        }

    def modules(self):
        return fullmodel(resourcefilepath=self.resources)

    def draw_parameters(self, draw_number, rng):
        return {
            'Module': {
                'param_defined_as_series': pd.Series(['a', 'b', 'c']),
                'param_defined_as_dataframe': pd.DataFrame([[0, 1, 2], [3, 4, 5]]),
            },
        }


if __name__ == '__main__':
    from tlo.cli import scenario_run

    scenario_run([__file__])
@tamuri
Collaborator

tamuri commented Jun 6, 2023

Sadly, it's not a simple fix. We write out the full specification for the scenario draws to a JSON file so that they can run in parallel and independently. To facilitate that, the decision was made to support overriding parameters only with simple scalars. This was extended to other native Python types (e.g. list).

Scalars, lists and dicts can be serialised to and from JSON natively, without any post-processing. However, pandas Series and DataFrames would require storing extra metadata so that they are restored, via the pandas to_json/read_json functions, with the right types etc. We can do that, but it'll need to be done carefully.
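
To make the "extra metadata" point concrete, here is a minimal sketch of one way a tagged round trip could work. It is not how TLO handles scenario files: the `__pandas__` tag and the `encode_pandas`/`decode_pandas` helpers are hypothetical names made up for illustration. Only `json.dumps(default=...)`, `json.loads(object_hook=...)`, `to_json` and `pd.read_json` are real library calls.

```python
import json
from io import StringIO

import pandas as pd

PANDAS_TAG = "__pandas__"  # hypothetical marker key, not part of TLO


def encode_pandas(obj):
    """Fallback for json.dumps: wrap pandas objects together with a type tag."""
    if isinstance(obj, pd.DataFrame):
        return {PANDAS_TAG: "DataFrame", "json": obj.to_json(orient="split")}
    if isinstance(obj, pd.Series):
        return {PANDAS_TAG: "Series", "json": obj.to_json(orient="split")}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_pandas(dct):
    """object_hook for json.loads: rebuild tagged pandas objects, pass other dicts through."""
    kind = dct.get(PANDAS_TAG)
    if kind == "DataFrame":
        return pd.read_json(StringIO(dct["json"]), orient="split")
    if kind == "Series":
        return pd.read_json(StringIO(dct["json"]), orient="split", typ="series")
    return dct


draw = {
    "Module": {
        "param_defined_as_series": pd.Series(["a", "b", "c"]),
        "param_defined_as_dataframe": pd.DataFrame([[0, 1, 2], [3, 4, 5]]),
    }
}

# Plain json.dumps fails on pandas objects, as in the error reported above.
try:
    json.dumps(draw)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# With the tagged encoder/decoder the values survive the round trip.
text = json.dumps(draw, default=encode_pandas)
restored = json.loads(text, object_hook=decode_pandas)
```

Even this sketch shows why care is needed: `orient="split"` keeps the index and column labels but leaves dtypes to be re-inferred on reading, so categorical or datetime parameters would likely need a richer format or an explicit dtype record.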

For now, I recommend that modules accept overridden parameters as scalars, lists or dicts and then, inside the module, handle any conversion to a Series or DataFrame.
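
A rough sketch of that workaround, assuming nothing about TLO internals: `ExampleModule` and its `apply_overrides` method are hypothetical stand-ins for wherever a real module reads its parameters, and the parameter names are the ones from the example in the issue. The draw returns only JSON-native values, and the module converts them after the specification has been through JSON.

```python
import json

import pandas as pd


def draw_parameters(draw_number, rng=None):
    """Return only JSON-native types (lists here) instead of pandas objects."""
    return {
        "Module": {
            "param_defined_as_series": ["a", "b", "c"],
            "param_defined_as_dataframe": [[0, 1, 2], [3, 4, 5]],
        }
    }


class ExampleModule:
    """Hypothetical stand-in for a module; not a TLO class."""

    def __init__(self):
        self.parameters = {}

    def apply_overrides(self, overrides):
        # Convert the plain lists into the pandas types the module works with.
        self.parameters["param_defined_as_series"] = pd.Series(
            overrides["param_defined_as_series"]
        )
        self.parameters["param_defined_as_dataframe"] = pd.DataFrame(
            overrides["param_defined_as_dataframe"]
        )


# The draw now survives a JSON round trip without any custom handling.
spec = json.loads(json.dumps(draw_parameters(0)))

module = ExampleModule()
module.apply_overrides(spec["Module"])
```

The cost is that each module has to know which of its parameters need converting, but the scenario file stays plain JSON.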

@tbhallett
Collaborator Author

tbhallett commented Jun 6, 2023 via email

@tamuri tamuri moved this from To do to Issues in Issue management Jul 26, 2023
@tamuri tamuri moved this from Issues to To do in Issue management Jul 26, 2023
@tamuri tamuri added enhancement New feature or request framework labels Jul 26, 2023
@tamuri
Collaborator

tamuri commented Oct 12, 2023

See #392

@matt-graham matt-graham mentioned this issue May 13, 2024
20 tasks
2 participants