Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure individual properties updated across modules in generic first appointments #1387

Merged
merged 27 commits into from
Jun 17, 2024

Conversation

matt-graham
Copy link
Collaborator

@matt-graham matt-graham commented May 30, 2024

Part of addressing #1348 and also fixes #1350

Specifically aims to fix issue identified by @marghe-molaro in #1348 (comment) and discussed further in #1348 (comment) by updating data structure used to provide cached (memoized) view of a population data frame row to allow both reading and writing.

Also moves the code related to generic first appointments out of tlo.core into the tlo.method.hsi_generic_first_appts module, creating a new subclass GenericFirstApptsModule that modules which wish to subscribe to this behaviour need to subclass from.


After the changes in this PR, the Population object has a new method individual_properties which returns an instance of IndividualProperties (previously PatientDetails) for a particular person_id row index and with option to choose whether the view on to a population dataframe row provided by the IndividualProperties instance is read-only or not.

The IndividualProperties instance allows reading the properties of the individual with square-bracket indexing notation for example

individual_properties = sim.population.individual_properties(person_id=person_id)
if individual_properties["is_alive"] and individual_properties["age_years"] > 5:
    # do something

Unlike the previous PatientDetails type, I've removed the ability to access properties using dot-based attribute access in favour of standardizing on indexing notation implemented using a __getitem__ method as (i) I think its better to have one standard way of doing things for consistency, (ii) overriding the __getattr__ special method can sometimes lead to infinite recursion bugs if not careful and (iii) using indexing to access properties allows adding further functionality in future to the IndividualProperties class exposed as attributes without worrying about these clashing with property names (for example we could add a symptoms property to allow easier checking / updating of symptoms as a set of strings).

As for the previous PatientDetails type, IndividualProperties uses a lazy memoized approach to reading properties from the dataframe. A reference to the person_id row index and population dataframe are stored internally when an instance is initialised along with a dictionary to as a cache. On the first attempt to access a property, the value will be read from the dataframe and stored in the cache dictionary, with subsequent accesses of the same property using the cache rather than reading again, with the assumption being that the population dataframe is now being changed between reads.

To allow writing the IndividualProperties class now also defines a __setitem__ method. If an IndividualProperties instances has had the read_only argument to the initialiser set to True, any attempts to write properties using the __setitem__ method will raise an exception. Otherwise the key of the property being updated is added to a internal set (empty on initialisation) recording any properties updated, and the new value written to the property cache dictionary, meaning any subsequent attempts to access this property will return the updated value rather than the original value in the dataframe.

By default any updated property values will not be reflected back in the original population dataframe. To allow the updates to be synchronized back to the dataframe a new synchronize_updates_to_dataframe method has been added to IndividualProperties which writes all properties from the updated set with their new values.

To guard against this manual finalisation / synchronization step being missed, I've made IndividualProperties a context manager by implementing __enter__ and __exit__ method, with synchronize_updates_to_dataframe automatically called on exiting in this case. So for example the recommended way of creating and using writeable instance of IndividualProperties is something like

with sim.population.individual_properties(person_id=person_id, read_only=False) as individual_properties:
    individual_properties["is_alive"] = False
    individual_properties["age_years"] = 100
    ...

Optionally, we could also use the context manager to try to explicitly prevent the user being able to read / write to the population dataframe within the with block context by having IndividualProperties store a reference to the Population object rather than just its props dataframe attribute, and doing something like temporarily assigning a dummy value like None to the props attribute before restoring the reference to the dataframe when exiting. This would allow us to guard against incorrect usages where the population dataframe is being changed while an IndividualProperties instance is in use. I haven't currently implemented this, as we know there are currently a few cases where the initialisers for HSI events that scheduled in the do_at_generic_first_appt methods are reading properties of the target directly from the population dataframe. Ideally we want to refactor these cases to pass in the relevant target attributes directly and read from the IndividualProperties instance.

While at the moment the IndividualProperties class is only being used in the generic first appointments logic, it could potentially be used in a lot of other places in the model to mediate accesses to the dataframe and having some logic to guard against incorrect use in this case as suggested above would be more important then.

In addition to the changes to IndividualProperties this PR also further slightly refactors the Population class by removing the stored reference to the top-level Simulation object and changing the initializer to instead take as an argument a dictionary of the properties to use to create the population dataframe columns (keyed by the property names). Removing this coupling to the Simulation object made it much easier to write unit tests for the Population object (added in a new test_population.py module in this PR) as it can now be initialised without creating a simulation just by creating a dummy dictionary of Property instances. There were a few cases of population level events which were using the stored reference to the simulation object in the population to acccess the simulation date but these have been refactored to instead use the simulation reference stored in the module for the event.

@matt-graham matt-graham marked this pull request as ready for review May 31, 2024 16:02
@matt-graham matt-graham requested a review from tamuri May 31, 2024 16:02
src/tlo/methods/alri.py Outdated Show resolved Hide resolved

def __init__(
self,
population_dataframe: pd.DataFrame,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting question as to whether we should hold reference to population instance or props here. Insisting on using this class with a context manager should discourage holding the object for long periods of time. That'll be a problem because we switch out the population's props dataframe when we add rows for growing population.

Copy link
Collaborator Author

@matt-graham matt-graham Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now refactored the code a bit to simplify allowing 'locking' the population dataframe when within the context manager in future without having to store a reference to population in individual properties. The Population.individual_properties method itself now acts as a context manager (rather than the returned object) and so can easily in future do something like set a self._locked attribute which if true prevents direct access to the dataframe via the props attribute (making it a property) and also disallows do_birth being called. The object returned by the Population.individual_properties context manager will now also be 'finalized' on exiting the context, so that any subsequent attempts to read or update properties raise an error. The object also no longer stores even a 'private' reference to the population dataframe (with access instead mediated via closures defined in initialiser) which should further guard against direct accesses to population dataframe.

@matt-graham matt-graham requested a review from tamuri June 14, 2024 16:49
Copy link
Collaborator

@tamuri tamuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Had a thought whether to wrap the yield properties; properties.finalize() in a try: ... finally: ... but I think it's better it raises exception if it happens to fail for some reason.

@matt-graham
Copy link
Collaborator Author

@tamuri Do you we want further review here or is this good to merge?

@tamuri
Copy link
Collaborator

tamuri commented Jun 17, 2024

Good to merge 👍.

@matt-graham matt-graham merged commit 4e983a5 into master Jun 17, 2024
58 checks passed
@matt-graham matt-graham deleted the mmg/hsi-generic-individual-updates-fix branch June 17, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

do_at_generic_first_appt and do_at_generic_first_appt_emergency should not be in core.Module (probably)
2 participants