-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure individual properties updated across modules in generic first appointments #1387
Conversation
src/tlo/population.py
Outdated
|
||
def __init__( | ||
self, | ||
population_dataframe: pd.DataFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting question as to whether we should hold reference to population instance or props here. Insisting on using this class with a context manager should discourage holding the object for long periods of time. That'll be a problem because we switch out the population's props dataframe when we add rows for growing population.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now refactored the code a bit to simplify allowing 'locking' the population dataframe when within the context manager in future without having to store a reference to population in individual properties. The Population.individual_properties
method itself now acts as a context manager (rather than the returned object) and so can easily in future do something like set a self._locked
attribute which if true prevents direct access to the dataframe via the props
attribute (making it a property) and also disallows do_birth
being called. The object returned by the Population.individual_properties
context manager will now also be 'finalized' on exiting the context, so that any subsequent attempts to read or update properties raise an error. The object also no longer stores even a 'private' reference to the population dataframe (with access instead mediated via closures defined in initialiser) which should further guard against direct accesses to population dataframe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Had a thought whether to wrap the yield properties; properties.finalize()
in a try: ... finally: ...
but I think it's better it raises exception if it happens to fail for some reason.
@tamuri Do you we want further review here or is this good to merge? |
Good to merge 👍. |
Part of addressing #1348 and also fixes #1350
Specifically aims to fix issue identified by @marghe-molaro in #1348 (comment) and discussed further in #1348 (comment) by updating data structure used to provide cached (memoized) view of a population data frame row to allow both reading and writing.
Also moves the code related to generic first appointments out of
tlo.core
into thetlo.method.hsi_generic_first_appts
module, creating a new subclassGenericFirstApptsModule
that modules which wish to subscribe to this behaviour need to subclass from.After the changes in this PR, the
Population
object has a new methodindividual_properties
which returns an instance ofIndividualProperties
(previouslyPatientDetails
) for a particularperson_id
row index and with option to choose whether the view on to a population dataframe row provided by theIndividualProperties
instance is read-only or not.The
IndividualProperties
instance allows reading the properties of the individual with square-bracket indexing notation for exampleUnlike the previous
PatientDetails
type, I've removed the ability to access properties using dot-based attribute access in favour of standardizing on indexing notation implemented using a__getitem__
method as (i) I think its better to have one standard way of doing things for consistency, (ii) overriding the__getattr__
special method can sometimes lead to infinite recursion bugs if not careful and (iii) using indexing to access properties allows adding further functionality in future to theIndividualProperties
class exposed as attributes without worrying about these clashing with property names (for example we could add asymptoms
property to allow easier checking / updating of symptoms as a set of strings).As for the previous
PatientDetails
type,IndividualProperties
uses a lazy memoized approach to reading properties from the dataframe. A reference to theperson_id
row index and population dataframe are stored internally when an instance is initialised along with a dictionary to as a cache. On the first attempt to access a property, the value will be read from the dataframe and stored in the cache dictionary, with subsequent accesses of the same property using the cache rather than reading again, with the assumption being that the population dataframe is now being changed between reads.To allow writing the
IndividualProperties
class now also defines a__setitem__
method. If anIndividualProperties
instances has had theread_only
argument to the initialiser set toTrue
, any attempts to write properties using the__setitem__
method will raise an exception. Otherwise the key of the property being updated is added to a internal set (empty on initialisation) recording any properties updated, and the new value written to the property cache dictionary, meaning any subsequent attempts to access this property will return the updated value rather than the original value in the dataframe.By default any updated property values will not be reflected back in the original population dataframe. To allow the updates to be synchronized back to the dataframe a new
synchronize_updates_to_dataframe
method has been added toIndividualProperties
which writes all properties from the updated set with their new values.To guard against this manual finalisation / synchronization step being missed, I've made
IndividualProperties
a context manager by implementing__enter__
and__exit__
method, withsynchronize_updates_to_dataframe
automatically called on exiting in this case. So for example the recommended way of creating and using writeable instance ofIndividualProperties
is something likeOptionally, we could also use the context manager to try to explicitly prevent the user being able to read / write to the population dataframe within the with block context by having
IndividualProperties
store a reference to thePopulation
object rather than just itsprops
dataframe attribute, and doing something like temporarily assigning a dummy value likeNone
to theprops
attribute before restoring the reference to the dataframe when exiting. This would allow us to guard against incorrect usages where the population dataframe is being changed while anIndividualProperties
instance is in use. I haven't currently implemented this, as we know there are currently a few cases where the initialisers for HSI events that scheduled in thedo_at_generic_first_appt
methods are reading properties of the target directly from the population dataframe. Ideally we want to refactor these cases to pass in the relevant target attributes directly and read from theIndividualProperties
instance.While at the moment the
IndividualProperties
class is only being used in the generic first appointments logic, it could potentially be used in a lot of other places in the model to mediate accesses to the dataframe and having some logic to guard against incorrect use in this case as suggested above would be more important then.In addition to the changes to
IndividualProperties
this PR also further slightly refactors thePopulation
class by removing the stored reference to the top-levelSimulation
object and changing the initializer to instead take as an argument a dictionary of the properties to use to create the population dataframe columns (keyed by the property names). Removing this coupling to theSimulation
object made it much easier to write unit tests for thePopulation
object (added in a newtest_population.py
module in this PR) as it can now be initialised without creating a simulation just by creating a dummy dictionary ofProperty
instances. There were a few cases of population level events which were using the stored reference to the simulation object in the population to acccess the simulation date but these have been refactored to instead use the simulation reference stored in the module for the event.