Ensure individual properties updated across modules in generic first appointments #1387

matt-graham · 2024-05-30T19:35:58Z

Part of addressing #1348 and also fixes #1350

Specifically aims to fix the issue identified by @marghe-molaro in #1348 (comment) and discussed further in #1348 (comment) by updating the data structure used to provide a cached (memoized) view of a population dataframe row so that it allows both reading and writing.

Also moves the code related to generic first appointments out of tlo.core into the tlo.methods.hsi_generic_first_appts module, creating a new subclass GenericFirstApptsModule that modules which wish to subscribe to this behaviour need to subclass from.

After the changes in this PR, the Population object has a new method individual_properties which returns an instance of IndividualProperties (previously PatientDetails) for a particular person_id row index, with the option to choose whether the view onto a population dataframe row provided by the IndividualProperties instance is read-only or not.

The IndividualProperties instance allows reading the properties of the individual with square-bracket indexing notation, for example:

individual_properties = sim.population.individual_properties(person_id=person_id)
if individual_properties["is_alive"] and individual_properties["age_years"] > 5:
    pass  # do something

Unlike the previous PatientDetails type, I've removed the ability to access properties using dot-based attribute access in favour of standardizing on indexing notation implemented using a __getitem__ method, as (i) I think it's better to have one standard way of doing things for consistency, (ii) overriding the __getattr__ special method can lead to infinite recursion bugs if not done carefully, and (iii) using indexing to access properties allows further functionality, exposed as attributes, to be added to the IndividualProperties class in future without worrying about clashes with property names (for example, we could add a symptoms property to allow easier checking / updating of symptoms as a set of strings).
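To illustrate point (ii), here is a minimal sketch of how a `__getattr__` override can recurse. Both classes are hypothetical and are not the removed PatientDetails implementation; the failure is triggered by creating an object without running `__init__`, which is what `copy` and `pickle` do when reconstructing instances.

```python
class AttributeView:
    """Hypothetical dot-access view (not the actual PatientDetails class)."""

    def __init__(self, values):
        self._values = values

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails. If _values
        # was never set, looking up self._values re-enters __getattr__, which
        # looks up self._values again, and so on.
        return self._values[name]


class IndexView:
    """Hypothetical index-access view; attribute lookup is left untouched."""

    def __init__(self, values):
        self._values = values

    def __getitem__(self, key):
        return self._values[key]


# Create an instance without calling __init__, so _values is missing.
broken = AttributeView.__new__(AttributeView)
try:
    broken.age_years
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

view = IndexView({"age_years": 7})
age = view["age_years"]
```

With indexing, a missing internal attribute would surface as an ordinary AttributeError rather than a recursion failure.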

As for the previous PatientDetails type, IndividualProperties uses a lazy memoized approach to reading properties from the dataframe. A reference to the person_id row index and the population dataframe are stored internally when an instance is initialised, along with a dictionary to act as a cache. On the first attempt to access a property, the value is read from the dataframe and stored in the cache dictionary; subsequent accesses of the same property use the cache rather than reading again, on the assumption that the population dataframe is not being changed between reads.
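A minimal sketch of the lazy memoized read path, under stated assumptions: `CountingTable` is a hypothetical stand-in for the pandas population dataframe that counts cell reads, and `LazyRowView` is an illustrative name rather than the class in this PR.

```python
class CountingTable:
    """Stand-in for the population dataframe that counts cell reads."""

    def __init__(self, columns):
        self._columns = columns  # {property name: {row index: value}}
        self.reads = 0

    def at(self, row, column):
        self.reads += 1
        return self._columns[column][row]


class LazyRowView:
    """Hypothetical read-only sketch of a memoized view of one row."""

    def __init__(self, table, person_id):
        self._table = table
        self._person_id = person_id
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            # First access: read from the table and remember the value.
            self._cache[key] = self._table.at(self._person_id, key)
        return self._cache[key]


table = CountingTable({"is_alive": {0: True}, "age_years": {0: 7}})
row = LazyRowView(table, person_id=0)
values = [row["age_years"], row["age_years"], row["is_alive"]]
```

Three accesses here result in only two reads of the underlying table, since the second `age_years` access is served from the cache.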

To allow writing, the IndividualProperties class now also defines a __setitem__ method. If an IndividualProperties instance has had the read_only argument to the initialiser set to True, any attempt to write properties using the __setitem__ method will raise an exception. Otherwise the key of the property being updated is added to an internal set (empty on initialisation) recording any properties updated, and the new value is written to the property cache dictionary, meaning any subsequent attempts to access this property will return the updated value rather than the original value in the dataframe.

By default any updated property values will not be reflected back in the original population dataframe. To allow the updates to be synchronized back to the dataframe a new synchronize_updates_to_dataframe method has been added to IndividualProperties which writes all properties from the updated set with their new values.
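The write path and the explicit synchronization step could be sketched roughly as follows. This is an illustration only: a nested dictionary stands in for the dataframe, `WritableRowView` is a hypothetical name, and the choice of `ValueError` for read-only writes is an assumption (the description only says an exception is raised).

```python
class WritableRowView:
    """Hypothetical sketch of a memoized row view with deferred writes."""

    def __init__(self, columns, person_id, read_only=True):
        self._columns = columns  # stand-in: {property name: {row index: value}}
        self._person_id = person_id
        self._read_only = read_only
        self._cache = {}
        self._updated = set()  # names of properties written via __setitem__

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._columns[key][self._person_id]
        return self._cache[key]

    def __setitem__(self, key, value):
        if self._read_only:
            # Assumed exception type; the PR text does not name one.
            raise ValueError("Cannot update properties of a read-only view")
        self._updated.add(key)
        self._cache[key] = value

    def synchronize_updates_to_dataframe(self):
        # Write back only the properties that were actually updated.
        for key in self._updated:
            self._columns[key][self._person_id] = self._cache[key]


columns = {"is_alive": {0: True}, "age_years": {0: 7}}
view = WritableRowView(columns, person_id=0, read_only=False)
view["age_years"] = 8
cached_value = view["age_years"]
before_sync = columns["age_years"][0]
view.synchronize_updates_to_dataframe()
after_sync = columns["age_years"][0]

try:
    WritableRowView(columns, person_id=0)["age_years"] = 1
    read_only_rejected = False
except ValueError:
    read_only_rejected = True
```

Until `synchronize_updates_to_dataframe` is called, the new value is visible through the view but not in the underlying table.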

To guard against this manual finalisation / synchronization step being missed, I've made IndividualProperties a context manager by implementing __enter__ and __exit__ methods, with synchronize_updates_to_dataframe automatically called on exiting in this case. So, for example, the recommended way of creating and using a writeable instance of IndividualProperties is something like

with sim.population.individual_properties(person_id=person_id, read_only=False) as individual_properties:
    individual_properties["is_alive"] = False
    individual_properties["age_years"] = 100
    ...
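For readers less familiar with the context manager protocol, a stripped-down sketch of the mechanism follows. `SyncOnExitView` is a hypothetical name, a nested dictionary stands in for the dataframe, and synchronizing on every exit (including after an exception) is an assumption of this sketch rather than something stated above.

```python
class SyncOnExitView:
    """Hypothetical sketch: pending writes are flushed when the block exits."""

    def __init__(self, columns, person_id):
        self._columns = columns  # stand-in for the population dataframe
        self._person_id = person_id
        self._pending = {}

    def __setitem__(self, key, value):
        self._pending[key] = value

    def synchronize_updates_to_dataframe(self):
        for key, value in self._pending.items():
            self._columns[key][self._person_id] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: synchronize on every exit. Returning None lets any
        # exception raised inside the block propagate to the caller.
        self.synchronize_updates_to_dataframe()


columns = {"is_alive": {0: True}, "age_years": {0: 7}}
with SyncOnExitView(columns, person_id=0) as individual_properties:
    individual_properties["is_alive"] = False
    individual_properties["age_years"] = 100
    unchanged_inside_block = columns["age_years"][0] == 7
```

The table is only modified once the `with` block ends, so no explicit call to the synchronization method is needed.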

Optionally, we could also use the context manager to try to explicitly prevent the user being able to read / write to the population dataframe within the with block context by having IndividualProperties store a reference to the Population object rather than just its props dataframe attribute, and doing something like temporarily assigning a dummy value like None to the props attribute before restoring the reference to the dataframe when exiting. This would allow us to guard against incorrect usages where the population dataframe is being changed while an IndividualProperties instance is in use. I haven't currently implemented this, as we know there are currently a few cases where the initialisers for HSI events that are scheduled in the do_at_generic_first_appt methods are reading properties of the target directly from the population dataframe. Ideally we want to refactor these cases to pass in the relevant target attributes directly and read from the IndividualProperties instance.
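A rough sketch of that unimplemented guard, to make the idea concrete. Everything here is hypothetical: `SketchPopulation` and `GuardedRowView` are illustrative names, and a nested dictionary stands in for the props dataframe.

```python
class SketchPopulation:
    """Hypothetical stand-in for Population with a props attribute."""

    def __init__(self, props):
        self.props = props  # stand-in: {property name: {row index: value}}


class GuardedRowView:
    """Hypothetical view that hides population.props while in use."""

    def __init__(self, population, person_id):
        self._population = population
        self._person_id = person_id
        self._props = None

    def __enter__(self):
        # Take the table and leave a dummy value behind, so direct use of
        # population.props inside the block fails quickly.
        self._props = self._population.props
        self._population.props = None
        return self

    def __getitem__(self, key):
        return self._props[key][self._person_id]

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the reference to the table on exit.
        self._population.props = self._props
        self._props = None


population = SketchPopulation({"is_alive": {0: True}})
with GuardedRowView(population, person_id=0) as individual_properties:
    alive = individual_properties["is_alive"]
    props_inside_block = population.props
props_after_block = population.props
```

Any code that indexes `population.props` directly inside the block would fail with a TypeError on `None`, which is what would currently break the HSI event initialisers mentioned above.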

While at the moment the IndividualProperties class is only being used in the generic first appointments logic, it could potentially be used in a lot of other places in the model to mediate accesses to the dataframe and having some logic to guard against incorrect use in this case as suggested above would be more important then.

In addition to the changes to IndividualProperties, this PR also slightly refactors the Population class by removing the stored reference to the top-level Simulation object and changing the initializer to instead take as an argument a dictionary of the properties to use to create the population dataframe columns (keyed by the property names). Removing this coupling to the Simulation object made it much easier to write unit tests for the Population object (added in a new test_population.py module in this PR) as it can now be initialised without creating a simulation, just by creating a dummy dictionary of Property instances. There were a few cases of population-level events which were using the stored reference to the simulation object in the population to access the simulation date, but these have been refactored to instead use the simulation reference stored in the module for the event.
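The testing benefit can be sketched as below. This is not the contents of test_population.py: `SketchProperty`, `SketchPopulation` and the `initial_size` argument are hypothetical stand-ins, used only to show that a population built from a dictionary of properties can be tested without constructing a simulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchProperty:
    """Hypothetical stand-in for a Property instance: just a default value."""

    default: object


class SketchPopulation:
    """Hypothetical population built only from a dictionary of properties."""

    def __init__(self, properties, initial_size):
        # One column per property, keyed by property name; no simulation needed.
        self.props = {
            name: {row: prop.default for row in range(initial_size)}
            for name, prop in properties.items()
        }


def test_population_has_one_column_per_property():
    properties = {
        "is_alive": SketchProperty(default=True),
        "age_years": SketchProperty(default=0),
    }
    population = SketchPopulation(properties, initial_size=3)
    assert set(population.props) == {"is_alive", "age_years"}
    assert len(population.props["age_years"]) == 3


test_population_has_one_column_per_property()
```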

src/tlo/methods/alri.py

tamuri · 2024-06-03T23:52:09Z

src/tlo/population.py

+
+    def __init__(
+        self,
+        population_dataframe: pd.DataFrame,


Interesting question as to whether we should hold reference to population instance or props here. Insisting on using this class with a context manager should discourage holding the object for long periods of time. That'll be a problem because we switch out the population's props dataframe when we add rows for growing population.

matt-graham · 2024-06-14

I've now refactored the code a bit to make it simpler to allow 'locking' the population dataframe within the context manager in future, without having to store a reference to the population in individual properties. The Population.individual_properties method itself now acts as a context manager (rather than the returned object) and so can easily in future do something like set a self._locked attribute which, if true, prevents direct access to the dataframe via the props attribute (making it a property) and also disallows do_birth being called. The object returned by the Population.individual_properties context manager will now also be 'finalized' on exiting the context, so that any subsequent attempts to read or update properties raise an error. The object also no longer stores even a 'private' reference to the population dataframe (with access instead mediated via closures defined in the initialiser), which should further guard against direct accesses to the population dataframe.
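A minimal sketch of the refactored shape described in this comment, under stated assumptions: the class names are hypothetical, a nested dictionary stands in for the dataframe, and `ValueError` is an assumed exception type. It shows a generator-based context manager method, closure-mediated access to the table, and finalization on exit.

```python
from contextlib import contextmanager


class FinalizableRowView:
    """Hypothetical view that reaches the table only through closures."""

    def __init__(self, columns, person_id, read_only=True):
        cache, updated = {}, set()

        def read(key):
            if key not in cache:
                cache[key] = columns[key][person_id]
            return cache[key]

        def write(key, value):
            updated.add(key)
            cache[key] = value

        def sync():
            for key in updated:
                columns[key][person_id] = cache[key]

        # Only the closures are stored; there is no attribute holding columns.
        self._read, self._write, self._sync = read, write, sync
        self._read_only = read_only
        self._finalized = False

    def __getitem__(self, key):
        if self._finalized:
            raise ValueError("Cannot read from a finalized view")
        return self._read(key)

    def __setitem__(self, key, value):
        if self._finalized or self._read_only:
            raise ValueError("Cannot write to a finalized or read-only view")
        self._write(key, value)

    def finalize(self):
        self._sync()
        self._finalized = True


class SketchPopulation:
    def __init__(self, props):
        self.props = props  # stand-in: {property name: {row index: value}}

    @contextmanager
    def individual_properties(self, person_id, read_only=True):
        properties = FinalizableRowView(self.props, person_id, read_only)
        yield properties
        properties.finalize()


population = SketchPopulation({"age_years": {0: 7}})
with population.individual_properties(person_id=0, read_only=False) as properties:
    properties["age_years"] = 8
synced_value = population.props["age_years"][0]
try:
    properties["age_years"]
    late_access = "allowed"
except ValueError:
    late_access = "rejected"
```

Because the method owns the `with` block, a future `_locked` flag could be set before the `yield` and cleared after it without the view needing any reference to the population.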

src/tlo/analysis/hsi_events.py

tamuri

Looks good to me. Had a thought whether to wrap the yield properties; properties.finalize() in a try: ... finally: ... but I think it's better it raises exception if it happens to fail for some reason.
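For context on the trade-off being discussed, this sketch shows generic `contextlib.contextmanager` behaviour rather than the project's code: when the `with` body raises, the exception is re-raised at the `yield`, so code after the `yield` only runs if it sits in a `finally` clause. The names and the log list are illustrative.

```python
from contextlib import contextmanager


def make_manager(use_finally, log):
    @contextmanager
    def manager():
        if use_finally:
            try:
                yield "properties"
            finally:
                log.append("finalized")
        else:
            yield "properties"
            # Skipped if the with block raised, because the exception is
            # re-raised inside the generator at the yield above.
            log.append("finalized")

    return manager


def run(use_finally):
    log = []
    try:
        with make_manager(use_finally, log)():
            raise RuntimeError("failure inside the with block")
    except RuntimeError:
        log.append("error propagated")
    return log


plain = run(use_finally=False)
wrapped = run(use_finally=True)
```

In both forms the error reaches the caller; the difference is that without `try: ... finally: ...` the finalization step, and therefore any write-back of partially applied updates, is skipped after a failure.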

matt-graham · 2024-06-17T12:41:25Z

@tamuri Do we want further review here or is this good to merge?

tamuri · 2024-06-17T12:48:57Z

Good to merge 👍.

matt-graham added 13 commits May 30, 2024 15:48

Allow updating memoized population dataframe row view

a21e348

Renaming patient_id and patient_details

4ac4d5c

Updates to individual properties via IndividualProperties object

3c77fd3

Remove dot based attributed access from IndividualProperties

a8c6bd5

Remove unused random_state argument

0632660

Refactor and rename HSI_BaseGenericFirstAppt

2244249

Remove unused target_is_alive property

8786e86

Move do_at_generic_first_appt outside of core

67b1c72

Pass in schedule_hsi_event to generic first appointment actions

913612c

Minor pylint fixes

78341fd

Exclude GenericFirstApptsModule from enumerations

038f945

Subclass Hiv and Epilepsy from GenericFirstApptsModule

9bd3e77

Decouple Population class from Simulation to allow easier testing

2636683

matt-graham mentioned this pull request May 31, 2024

Fixes for diarrhoea generic first appointment logic and updating of individual properties across modules #1389

Merged

matt-graham added 3 commits May 31, 2024 11:49

Fix import order to satisfy isor

bcedc31

Remove unused sim slot from Population

17af1ea

Add unit tests for population and individual properties

4ff8029

matt-graham marked this pull request as ready for review May 31, 2024 16:02

matt-graham requested a review from tamuri May 31, 2024 16:02

matt-graham added 6 commits June 3, 2024 08:45

Merge branch 'master' into mmg/hsi-generic-individual-updates-fix

f7e9583

Fix errors from bad manual merge

fd1b874

Remove unused schedule_hsi_event property

09dfdb4

Make do_at_generic_first_appt methods keyword argument only

e495a12

Tidy up docstrings

36d3fba

Avoid accidental leaking of methods details into core

baabac7

tamuri reviewed Jun 3, 2024

src/tlo/methods/alri.py (outdated, resolved)

tamuri reviewed Jun 3, 2024

matt-graham added 3 commits June 14, 2024 11:18

Merge branch 'master' into mmg/hsi-generic-individual-updates-fix

ad81fee

Changing from subclass to mix-in

ef2ef56

Refactor individual properties context manager

4545a59

Fix use of context manager in tests

e798d18

matt-graham requested a review from tamuri June 14, 2024 16:49

matt-graham commented Jun 17, 2024

src/tlo/analysis/hsi_events.py (resolved)

Correct symptoms type hint

b3ad04f

tamuri approved these changes Jun 17, 2024

matt-graham merged commit 4e983a5 into master Jun 17, 2024
58 checks passed

matt-graham deleted the mmg/hsi-generic-individual-updates-fix branch June 17, 2024 12:50
