
Cookbook


Add requests for new entries in this document


Date arithmetic

Pandas timeseries documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html

Dates should be 'Timestamp' objects and intervals should be 'Timedelta' objects.
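
For example, in plain pandas:

import pandas as pd

date = pd.Timestamp(2010, 1, 1)    # a date
interval = pd.Timedelta(days=30)   # an interval
print(date + interval)             # 2010-01-31 00:00:00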

Handling partial months and years

Note: Pandas does not know how to handle partial years and months; as the examples below show, fractional months and years are silently truncated. Convert the interval to days instead, e.g.

# pandas can handle partial days
>>> pd.to_timedelta([0.25, 0.5, 1, 1.5, 2], unit='d')
TimedeltaIndex(['0 days 06:00:00', '0 days 12:00:00', '1 days 00:00:00',
                '1 days 12:00:00', '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq=None)

# pandas cannot handle partial months
>>> pd.to_timedelta([0.25, 0.5, 1, 1.5, 2], unit='M')
TimedeltaIndex([ '0 days 00:00:00',  '0 days 00:00:00', '30 days 10:29:06',
                '30 days 10:29:06', '60 days 20:58:12'],
               dtype='timedelta64[ns]', freq=None)

# pandas cannot handle partial years
>>> pd.to_timedelta([0.25, 0.5, 1, 1.5, 2], unit='Y')
TimedeltaIndex([  '0 days 00:00:00',   '0 days 00:00:00', '365 days 05:49:12',
                '365 days 05:49:12', '730 days 11:38:24'],
               dtype='timedelta64[ns]', freq=None)

The way to handle this is to multiply by the average number of days in a month or year. For example:

partial_interval = pd.Series([0.25, 0.5, 1, 1.5, 2])

# we want timedelta for 0.25, 0.5, 1, 1.5 etc months, we need to convert to days
interval = pd.to_timedelta(partial_interval * 30.44, unit='d')
print(interval)

TimedeltaIndex([ '7 days 14:38:24', '15 days 05:16:48', '30 days 10:33:36',
                '45 days 15:50:24', '60 days 21:07:12'],
               dtype='timedelta64[ns]', freq=None)

# we want timedelta for 0.25, 0.5, 1, 1.5 etc years, we need to convert to days
interval = pd.to_timedelta(partial_interval * 365.25, unit='d')
print(interval)

TimedeltaIndex([ '91 days 07:30:00', '182 days 15:00:00', '365 days 06:00:00',
                '547 days 21:00:00', '730 days 12:00:00'],
               dtype='timedelta64[ns]', freq=None)

Adding/subtracting time intervals from a date

current_date = self.sim.date

# sample a list of numbers from an exponential distribution
# (remember to use self.rng in TLO code)
random_draw = np.random.exponential(scale=5, size=10)

# convert these numbers into time intervals
# valid units are: [h]ours; [d]ays; [M]onths; [y]ears
# REMEMBER: Pandas cannot handle fractions of months or years, and these
# exponential draws are fractional, so convert to days first
random_years = pd.to_timedelta(random_draw * 365.25, unit='d')

# add to current date
future_dates = current_date + random_years

Events

A regular event for an individual in the population

An event scheduled to run every day on a given person. Note the order of the mixin & superclass:

class MyRegularEventOnIndividual(IndividualScopeEventMixin, RegularEvent):
    def __init__(self, module, person):
        super().__init__(module=module, person=person, frequency=DateOffset(days=1))

    def apply(self, person):
        print('do something on person', person.index, 'on', self.sim.date)

Add to simulation e.g. in initialise_simulation():

sim.schedule_event(MyRegularEventOnIndividual(module=self, person=an_individual),
                   sim.date + DateOffset(days=1))

Ending a regular event

class ExampleEvent(RegularEvent, PopulationScopeEventMixin):
    def __init__(self, module):
        super().__init__(module, frequency=DateOffset(days=1))

    def apply(self, population):
        # this event doesn't run after 2030
        if self.sim.date.year == 2030:
            # the end date is today's date
            self.end_date = self.sim.date
            # exit the procedure
            return

        # code that does something for this event
        print('do some work for this event')

Understanding assignment by index or row offset

When you assign a series/column of values from one dataframe/series to another dataframe/series, Pandas will by default honour the index on the collection. However, you can ignore the index by accessing the values directly. If you notice odd assignments in your properties, check whether you're assigning using index or values. Example (run in a Python console):

import pandas as pd

# create a dataframe with one column
df1 = pd.DataFrame({'column_1': range(0, 5)})
df1.index.name = 'df1_index'
print(df1)

# df1:
#            column_1
# df1_index
# 0                 0
# 1                 1
# 2                 2
# 3                 3
# 4                 4

df2 = pd.DataFrame({'column_2': range(10, 15)})
df2.index.name = 'df2_index'
df2 = df2.sort_values(by='column_2', ascending=False) # reverse the order of rows in df2
print(df2)

# notice the df2_index:
#
#            column_2
# df2_index
# 4                14
# 3                13
# 2                12
# 1                11
# 0                10

# if we assign one column to another, Pandas will use the index to merge the columns
df1['df2_col2_use_index'] = df2['column_2']

# if we assign the column's values to another, Pandas will ignore the index
df1['df2_col2_use_row_offset'] = df2['column_2'].values

# note difference when assigning using index vs '.values'
print(df1)

#            column_1  df2_col2_use_index  df2_col2_use_row_offset
# df1_index
# 0                 0                  10                       14
# 1                 1                  11                       13
# 2                 2                  12                       12
# 3                 3                  13                       11
# 4                 4                  14                       10

Assign values to population with specified probability

Assigning True or False at a given probability

Assign True to all individuals at probability p_true (otherwise False)

df = population.props
random_draw = self.rng.random_sample(size=len(df))  # random sample for each person between 0 and 1
df['my_property'] = (random_draw < p_true)

or randomly sample a set of rows at the given probability:

df = population.props
df['my_property'] = False
# sample without replacement so the rows are distinct (and use self.rng in TLO code)
sampled_indices = self.rng.choice(df.index.values, size=int(len(df) * p_true), replace=False)
df.loc[sampled_indices, 'my_property'] = True

You can sample a proportion of the index and set those:

df = population.props
df['my_property'] = False
df.loc[df.index.to_series().sample(frac=p_true, random_state=self.rng).index, 'my_property'] = True

Assigning True or False at different rates based on criteria

Imagine my_property is True at a different rate depending on sex.

df = population.props

# create a dataframe to hold the probabilities (or read from an Excel workbook)
prob_by_sex = pd.DataFrame(data=[('M', 0.46), ('F', 0.62)], columns=['sex', 'p_true'])

# merge with the population dataframe
df_with_prob = df[['sex']].merge(prob_by_sex, left_on=['sex'], right_on=['sex'], how='left')

# randomly sample numbers between 0 and 1
random_draw = self.rng.random_sample(size=len(df))

# assign True or False based on the draw and the individual's p_true
df['my_property'] = (random_draw < df_with_prob.p_true.values)

Assigning value from a set with given probability

df = population.props

# get the categories and probabilities (read from Excel file/in the code etc)
categories = [1, 2, 3, 4]  # or categories = ['A', 'B', 'C', 'D']
probabilities = [0.1, 0.2, 0.3, 0.4]  # NB These must add up to 1.

random_choice = self.rng.choice(categories, size=len(df), p=probabilities)

# if 'categories' should be treated as a plain old number or string
df['my_category'] = random_choice

# else if 'categories' should be treated as a real Pandas Categorical
# i.e. property was set up using Types.CATEGORICAL
df['my_category'].values[:] = random_choice

Transitioning between multiple states based on probability for each transition

A utility function (transition_states) can carry out all transitions, given a matrix of the probability of each transition from one state to another.

# import the util module to be able to use the transition_states function
from tlo import util

# create a probability matrix, each original state's probabilities should sum to 1
disease_states = ['a', 'b', 'c', 'd'] # or disease_states = [1, 2, 3, 4]
prob_matrix = pd.DataFrame(columns=disease_states, index=disease_states)
# each column of prob_matrix is an original state; the values in the list are the
# probabilities of moving to each new state, in the order of disease_states
#                   a    b    c    d
prob_matrix['a'] = [0.9, 0.1, 0.0, 0.0]
prob_matrix['b'] = [0.2, 0.2, 0.6, 0.0]
prob_matrix['c'] = [0.0, 0.2, 0.6, 0.2]
prob_matrix['d'] = [0.0, 0.0, 0.3, 0.7]

# when viewed, each column is an original state and each row/index is a new state
prob_matrix
     a    b    c    d
a  0.9  0.2  0.0  0.0
b  0.1  0.2  0.2  0.0
c  0.0  0.6  0.6  0.3
d  0.0  0.0  0.2  0.7

df = population.props

# States can only change if the individual is alive and is over 16
changeable_states = df.loc[df.is_alive & (df.age_years > 16), 'disease_state']
# transition the changeable states based on the probability matrix, passing in the rng
new_states = util.transition_states(changeable_states, prob_matrix, self.rng)
# update the DataFrame with the new states
df.disease_state.update(new_states)

Creating and extending dataframes

In most cases, dataframes are created from the resource files. This section is for times when you need to build dataframes from scratch, add on several columns or merge dataframes together.

Building dataframes iteratively

The most efficient way to build a pandas dataframe is to pass the entire dataset in one step. If you have to calculate one column at a time (in a for loop, for example), avoid repeatedly calling pd.concat([original_df, new_column], axis=1).

Instead, use the approach below: calculate each of the columns sequentially and transform them into a dataframe after the entire dataset is built.

# create empty dictionary of {'column_name': column_data}, then fill it with all data
df_data = {}
for col in ['A', 'B', 'C']:
    column_name = f'column_{col}'
    column_data = function_that_returns_list_of_data(col)
    df_data[column_name] = column_data

# convert dictionary into pandas dataframe
pd.DataFrame(data=df_data, index=index)  # the index here can be left out if not relevant

Merging two dataframes together

If you want to join two dataframes together and you know that there are no repeated values in the join-column then it is more efficient to join on indexes rather than named columns.

# original_df already has the index that we want, so we don't need to reindex it

# we want to join on 'joining_column' so we make this the index of merging_df
merging_df.set_index('joining_column', inplace=True)

# now we can join the indexes of both dataframes
altered_df = original_df.merge(merging_df, left_index=True, right_index=True, how='left')
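
A self-contained toy version of the same pattern (the dataframe names and values are made up for illustration):

import pandas as pd

original_df = pd.DataFrame({'x': [1, 2, 3]},
                           index=pd.Index(['a', 'b', 'c'], name='joining_column'))
merging_df = pd.DataFrame({'joining_column': ['a', 'c'], 'y': [10, 30]})

merging_df.set_index('joining_column', inplace=True)
altered_df = original_df.merge(merging_df, left_index=True, right_index=True, how='left')
print(altered_df)

#                 x     y
# joining_column
# a               1  10.0
# b               2   NaN
# c               3  30.0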

Plotting maps

An example analysis script which plots maps is in src/scripts/consumable_resource_analyses/map_plotting_example.py

Reading shape files

Maps are plotted using shape files, files that represent geographic areas with a .shp file extension. There are three different shape files that correspond to the boundaries of different administrative levels (these files come from the Humanitarian Data Exchange project https://data.humdata.org/dataset/malawi-administrative-level-0-3-boundaries).

  • mwi_admbnda_adm0_nso_20181016.shp -> country boundary
  • mwi_admbnda_adm1_nso_20181016.shp -> region boundaries
  • mwi_admbnda_adm2_nso_20181016.shp -> district boundaries

To read in the shape files you'll need the pyshp package (https://pypi.org/project/pyshp/). To install it, click on the Terminal button in PyCharm and enter the following command:

pip install pyshp

Import the package for reading shape files in your code:

import shapefile as shp

Read in the shape file corresponding to the administrative boundaries you want to plot:

sf = shp.Reader('resources/ResourceFile_mwi_admbnda_adm0_nso_20181016.shp')

Plotting shape files

You'll need to have the matplotlib package installed to plot the shape file. To install matplotlib click on the Terminal button in PyCharm and enter the following command:

pip install matplotlib

Import matplotlib; by convention matplotlib.pyplot is given the alias plt:

import matplotlib.pyplot as plt

Open a new figure and plot the shape file with the following code:

plt.figure()  # open a new figure 

# loops through each part of the file (regions, districts etc.) and plots the boundary
for shape in sf.shapeRecords():
    for i in range(len(shape.shape.parts)):
        i_start = shape.shape.parts[i]
        if i == len(shape.shape.parts) - 1:
            i_end = len(shape.shape.points)
        else:
            i_end = shape.shape.parts[i + 1]
        x = [point[0] for point in shape.shape.points[i_start:i_end]]
        y = [point[1] for point in shape.shape.points[i_start:i_end]]
        plt.plot(x, y, color='k', linewidth=0.1)  # color='k' sets the boundary line colour to black

plt.axis('off')  # remove the axes
plt.gca().set_aspect('equal', adjustable='box')  # set the aspect ratio to even to avoid stretching the map

Plotting point data

To plot points on the map you will need data with grid coordinates. For example, if the data are stored in a data frame named example_df with columns named Eastings and Northings, corresponding to the x and y coordinates, then we would plot the points using the following code:

eastings = example_df['Eastings']  # x grid coordinate
northings = example_df['Northings']  # y grid coordinate

plt.scatter(eastings, northings, c='b', s=4)  # c sets the marker colour and s sets the marker size

For more options on marker colour, marker size and marker style see the following matplotlib documentation: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.scatter.html

Plotting point data with values

If your data has both grid coordinates and a numerical value, the numerical value can be represented on the plot using a colour map. For example, if the data is a data frame of paracetamol stock-out days named paracetamol_df, with columns named Eastings, Northings and Stock Out Days, we would plot the stock-out days with a colour map using the following code:

stock_out_days = paracetamol_df['Stock Out Days']
eastings = paracetamol_df['Eastings']
northings = paracetamol_df['Northings']

cm = plt.cm.get_cmap('Purples')  # 'Purples' is one of matplotlib's sequential colour maps
sc = plt.scatter(eastings, northings, c=stock_out_days, cmap=cm, s=4)  # cmap sets the colour map
# fraction and pad set the size and position of the colour bar
plt.colorbar(sc, fraction=0.01, pad=0.01, label="Stock Out Days")  

For more colour map options see the following matplotlib documentation: https://matplotlib.org/examples/color/colormaps_reference.html

Outputting a map

You can save your figure and output it to PyCharm's Plots window in the SciView tab using the following code:

# bbox_inches="tight" removes white space and dpi sets the resolution of the figure
plt.savefig('my_map_plot.png', bbox_inches="tight", dpi=600)

plt.show()  # this outputs the figure to PyCharm's Plots window

Full example of map plotting code

import pandas as pd
import matplotlib.pyplot as plt
import shapefile as shp

# read in the data frame and shape file
paracetamol_df = pd.read_csv('paracetamol_df.csv')
sf = shp.Reader("mwi_admbnda_adm2_nso_20181016.shp")

# select the data and grid coordinates from the data frame
stock_out_days = paracetamol_df['Stock Out Days']
eastings = paracetamol_df['Eastings']
northings = paracetamol_df['Northings']

# create a figure
plt.figure()

# loop through the parts in the shape file
for shape in sf.shapeRecords():
    for i in range(len(shape.shape.parts)):
        i_start = shape.shape.parts[i]
        if i == len(shape.shape.parts) - 1:
            i_end = len(shape.shape.points)
        else:
            i_end = shape.shape.parts[i + 1]
        x = [point[0] for point in shape.shape.points[i_start:i_end]]
        y = [point[1] for point in shape.shape.points[i_start:i_end]]
        plt.plot(x, y, color='k', linewidth=0.1)

# remove figure axes and set the aspect ratio to equal so that the map isn't stretched
plt.axis('off')
plt.gca().set_aspect('equal')

# plot the data using a purple colour map
cm = plt.cm.get_cmap('Purples')
sc = plt.scatter(eastings, northings, c=stock_out_days, cmap=cm, s=4)
plt.colorbar(sc, fraction=0.01, pad=0.01, label="Stock Out Days")

# give the figure a title
plt.title("Paracetamol 500mg, tablets")

# save the figure
plt.savefig('plot_map_paracetamol_stock_out_days.png', bbox_inches="tight", dpi=600)

# display the figure in PyCharm's Plots window
plt.show()

Example Plots

Banded Plot

Example of a banded plot written using matplotlib, useful for plotting confidence intervals:

import matplotlib.pyplot as plt
import numpy as np

# generating dummy data
x = np.linspace(0, 10, 20)
y = x**2 + 15
ymin = (y - 10) - 5*np.random.rand(20)
ymax = (y + 10) + 5*np.random.rand(20)

# plot the model output in the foreground (zorder=2)
plt.plot(x, y, label='model output', zorder=2)

# plot the confidence interval to be semi-transparent (alpha=0.5) and behind the model output (zorder=1)
plt.fill_between(x, ymin, ymax, alpha=0.5, label='confidence interval', zorder=1)

# label the plot with titles and a grid
plt.legend()
plt.title('Banded Plot Title')
plt.xlabel('X axis title')
plt.ylabel('Y axis title')
plt.grid(linestyle='--', zorder=0)

# save the plot
plt.savefig('banded_plot.png', dpi=600)
plt.show()
plt.close()

Banded Plot Example

Stacked Bar Chart

Example of a stacked bar chart written using matplotlib. Note that in pandas a stacked bar chart can also be generated directly from a data frame using df.plot.bar(stacked=True); see the short sketch at the end of this example.

import matplotlib.pyplot as plt
import numpy as np

# generate dummy data
groups = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
x = 6 + 2*np.random.rand(7)
y = 6 - 3*np.random.rand(7)
z = 6 - 5*np.random.rand(7)

# use the bottom= argument to offset the y and z data by x and x+y respectively, in order to "stack" the data
plt.bar(groups, x, label='x variable')
plt.bar(groups, y, label='y variable', bottom=x)
plt.bar(groups, z, label='z variable', bottom=x+y)

# label the plot with titles
plt.legend()
plt.ylim(0, 20)
plt.title('Stacked Bar Chart Title')
plt.ylabel('Y axis label')
plt.xlabel('X axis label')

# save the plot
plt.savefig('stacked_bar_chart.png', dpi=600)
plt.show()
plt.close()

Stacked Bar Chart Example
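
For comparison, a minimal sketch of the pandas route mentioned above, using freshly generated dummy data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# the same kind of dummy data, held in a dataframe indexed by group
groups = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
df = pd.DataFrame({'x variable': 6 + 2*np.random.rand(7),
                   'y variable': 6 - 3*np.random.rand(7),
                   'z variable': 6 - 5*np.random.rand(7)},
                  index=groups)

# pandas computes the bottom offsets for us and returns the matplotlib Axes
ax = df.plot.bar(stacked=True)
ax.set_title('Stacked Bar Chart Title')
plt.show()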

Population Pyramid

Example of a population pyramid written using matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# generate dummy data
groups = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70-79', '80+']
men = np.linspace(100, 20, 9) + 20*np.random.rand(9)
women = np.linspace(100, 25, 9) + 20*np.random.rand(9)
model_men =  np.linspace(100, 20, 9) + 20*np.random.rand(9)
model_women = np.linspace(100, 25, 9) + 20*np.random.rand(9)

# use horizontal bar chart functions (barh) to plot the pyramid and the plot function to plot model output
plt.barh(groups, men, alpha=0.5, label='men')
plt.barh(groups, -women, alpha=0.5, label='women')
plt.plot(model_men, groups)
plt.plot(-model_women, groups)

# label the plot with titles and correct the x axis tick labels (replace negative values with positive)
plt.legend()
plt.title('Population Pyramid')
plt.ylabel('Age Groups')
plt.xlabel('Population Size')
locs, labels = plt.xticks()
plt.xticks(locs, np.abs(locs))

# save the plot
plt.savefig('population_pyramid.png', dpi=600)
plt.show()
plt.close()

Population Pyramid Example

Further Examples

Slides from the matplotlib plotting tutorial and a jupyter notebook containing more example matplotlib plots are available on the TLO Dropbox: https://www.dropbox.com/sh/6nv3918vy88dlu6/AAANykRCmX-tgFpaEor-kNSua/06%20-%20Presentations/plotting_tutorial_2020_04_08?dl=0&subfolder_nav_tracking=1

Sankey diagrams

Example of a Sankey diagram

from matplotlib.sankey import Sankey
from matplotlib import pyplot as plt

# The first value in a flow is the total quantity entering the diagram; each
# subsequent (negative) value removes a quantity from that flow.

# In this example we plot the percentage of the health care budget consumed by
# various conditions: 30% is spent on HIV, 50% on road traffic injuries and 20% on
# epilepsy. The data go in flow1 and the label for each percentage in labels1.

# orientations1 sets the direction of each arrow: 0 keeps the default left-to-right
# direction, 1 turns the arrow up and -1 turns it down. Here the total budget runs
# straight, the HIV budget goes up, the road traffic budget carries straight on and
# the epilepsy budget goes down.
flow1 = [100, -30, -50, -20]
labels1 = ['Total expenditure on health', '% spent on HIV',
           '% spent on road\ntraffic injuries', '% spent on epilepsy']
orientations1 = [0, 1, 0, -1]
# The second flow breaks down what the road traffic injuries budget was spent on:
# 10% of the total budget on bandages, 15% on plaster of paris and 25% on surgery.
# The data go in flow2 and the labels in labels2; the first label is blank because
# that is where the '% spent on road traffic injuries' flow connects to this one.
# orientations2 works as before: 0 carries straight on, 1 goes up and -1 goes down.
flow2 = [50, -10, -15, -25]
labels2 = ['', 'bandages', 'plaster of paris', 'surgery']
orientations2 = [0, 1, 0, -1]

# Now that we have created the flows and set the labels and arrow directions, we
# can create the sankey diagram.

# The sankey object needs to be scaled (this controls how much space the diagram
# takes up); if it's not scaled properly it can look pretty terrible. A reasonable
# scale is 1/a, where a is the first entry of the first flow (here, 100).
# The offset sets how far each label sits from its arrow.

fig = plt.figure(figsize=(20, 10))  # create figure
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[])
ax.axis('off')  # Turn off the box for the plot
plt.title('Budget example')  # create title
sankey = Sankey(ax=ax,
                scale=1 / flow1[0],
                offset=0.2,
                unit='')  # create sankey object
sankey.add(flows=flow1,  # add the first flow
           labels=labels1,
           orientations=orientations1,
           pathlengths=[0.1, 0.1, 0.1, 0.1],
           trunklength=0.605,  # how long this flow is
           edgecolor='#027368',  # choose colour
           facecolor='#027368')
sankey.add(flows=flow2,
           labels=labels2,
           trunklength=0.5,
           pathlengths=[0.25, 0.25, 0.25, 0.25],
           orientations=[0, 1, 0, -1],
           prior=0,  # which sankey are you connecting to (0-indexed)
           connect=(2, 0),  # flow number to connect: this is the index of road traffic injury portion of the budget in
           # the first flow (third entry, python index 2) which connects to the first entry in the second flow (python
           # index 0).
           edgecolor='#58A4B0',  # choose colour
           facecolor='#58A4B0')
diagrams = sankey.finish()
plt.savefig('sankey_example.png', dpi=600)
plt.show()
plt.close()

Sankey Diagram Example

Logging

Default logging setup

Analysis or test script logging setup

This structure should be used in an analysis or test script. The sim.configure_logging method does all configuration of the logger and takes a logfile prefix via its filename argument. By default this outputs a logfile in TLOmodel/outputs named "<filename>__<YYYY-MM-DD_HH:MM:SS>.log".

This logfile path is returned for reading in the logfile later on.

# Establish the simulation object
start_date = Date(year=2010, month=1, day=1)
end_date = Date(year=2010, month=12, day=31)
popsize = 2000
sim = Simulation(start_date=start_date)

# Register the appropriate modules
sim.register(demography.Demography(resourcefilepath=resourcefilepath))
sim.register(contraception.Contraception(resourcefilepath=resourcefilepath))

# configure logging after registering modules
logfile = sim.configure_logging(filename="LogFile")

# Run the simulation
sim.seed_rngs(0)
sim.make_initial_population(n=popsize)
sim.simulate(end_date=end_date)
# No filehandler flushing is required
# ...

Disease module logging setup

When using logging within a disease module, you just need to make sure you're importing logging from tlo:

from tlo import logging
# usage in the rest of the script is the same
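
A minimal sketch of typical usage follows; the log message here is purely illustrative:

from tlo import logging

logger = logging.getLogger(__name__)  # same pattern as the standard library
logger.setLevel(logging.INFO)

# illustrative message only; real modules log their model output
logger.info('simulation date is %s', '2010-01-01')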

Customising logging setup

Custom output directory for logfile

The default logging directory is ./outputs/; this can be changed using the directory argument:

custom_dir = "./path/to/custom/dir/"

# configure logging after registering modules
logfile = sim.configure_logging(filename="LogFile", directory=custom_dir)

In a test you would do this using the tmpdir fixture like so:

def test_example(tmpdir):
    # Establish the simulation object
    sim = Simulation(start_date=start_date)
    # ...   
    # Register modules
    # ...
    logfile = sim.configure_logging(filename="log", directory=tmpdir)
    # Run the simulation
    # ...
    # 
    # read the results
    results = parse_log_file(logfile)

Only show logging for specific modules

You can pass a dictionary of logging levels to the configure_logging() method, with '*' acting as a wildcard for all disease modules. Because the dictionary is evaluated in order, you can disable all disease modules and then enable just the ones you are interested in.

from tlo import logging
# ...

# Establish the simulation object
# ...   
# Register modules
sim.register(demography.Demography(resourcefilepath=resourcefilepath))
sim.register(contraception.Contraception(resourcefilepath=resourcefilepath))
sim.register(enhanced_lifestyle.Lifestyle(resourcefilepath=resourcefilepath))
sim.register(healthsystem.HealthSystem(resourcefilepath=resourcefilepath,
                                       service_availability=service_availability,
                                       capabilities_coefficient=1.0,
                                       mode_appt_constraints=2))
sim.register(symptommanager.SymptomManager(resourcefilepath=resourcefilepath))
sim.register(healthseekingbehaviour.HealthSeekingBehaviour())
sim.register(dx_algorithm_child.DxAlgorithmChild())
sim.register(mockitis.Mockitis())
sim.register(chronicsyndrome.ChronicSyndrome())

# Here, I am only interested in logging the output from mockitis and the symptom manager:
custom_levels = {
    '*': logging.CRITICAL,  # disable logging for all modules
    'tlo.methods.mockitis': logging.INFO,  # enable logging at INFO level
    'tlo.methods.symptommanager': logging.INFO,  # enable logging at INFO level
}

# use the custom levels in the configuration of the logging
logfile = sim.configure_logging(filename="LogFile", custom_levels=custom_levels)


# Run the simulation
# ...
# 
# read the results
results = parse_log_file(logfile)

Compare outputs of model with data

When a simulation is run, it is often useful to compare the number of deaths from a particular cause with the GBD calibration data. Do this using the compare_number_of_deaths utility function:

# Get comparison (the helper lives in tlo.analysis.utils)
from tlo.analysis.utils import compare_number_of_deaths
comparison = compare_number_of_deaths(logfile=sim.log_filepath, resourcefilepath=resourcefilepath)

# Make a simple bar chart
comparison.loc[('2015-2019', slice(None), '0-4', 'Childhood Diarrhoea')].sum().plot.bar()
plt.title('Deaths per year due to Childhood Diarrhoea, 2015-2019')
plt.tight_layout()
plt.show()

Logging based on cumulative (TODO)

TODO: Insert code to show how logging can report the cumulative total since the last logging event
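
Until this recipe is written up, one possible pattern (a sketch only, not an established TLO approach) is to keep a running counter that other events increment, and have the logging event report and reset it:

# sketch: self.module.cases_since_last_log is a counter (assumed to exist on the
# module) that other events increment whenever a case occurs
class MyCumulativeLoggingEvent(RegularEvent, PopulationScopeEventMixin):
    def __init__(self, module):
        super().__init__(module, frequency=DateOffset(months=1))

    def apply(self, population):
        # log the total accumulated since the last time this event ran, then reset
        logger.info('cases since last log: %d', self.module.cases_since_last_log)
        self.module.cases_since_last_log = 0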

Set logging level to be different for the screen and the file

TODO: e.g. debug to screen, info to file; nothing to screen, info to file
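
Until TLO-specific support is documented, the standard library achieves this with two handlers at different levels (a generic Python sketch, not TLO's API):

import logging

logger = logging.getLogger('example')
logger.setLevel(logging.DEBUG)

# DEBUG and above go to the screen
screen_handler = logging.StreamHandler()
screen_handler.setLevel(logging.DEBUG)
logger.addHandler(screen_handler)

# only INFO and above go to the file
file_handler = logging.FileHandler('simulation.log')
file_handler.setLevel(logging.INFO)
logger.addHandler(file_handler)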

common_usage

These are some common usage patterns in disease modules:

  • Requesting consumables and interpreting the response
  • Rigorous testing of basic logic of the module
  • Building and referring to LinearModels
  • Using 'tmp_' properties whilst you wait for something to be done by someone else
  • Checking that conditions are satisfied at the beginning of an event

analysis_scripts

  • General (current) approach to demonstrating that an intervention has an effect, and exploratory analysis
  • Using outputs from demography (e.g. deaths, population sizes) to compute statistics.
  • Calculating incidence

common_do_and_dont

  • Do: Make things safe for small populations
    • Check for zero before dividing
    • Check whether the indices of 'people' to act on are empty
  • Do put 'analysis' code in analysis scripts and helper functions
  • Do put file formatting in a formatting file, not in the module itself
  • Do not: hard-code any parameters
  • Do not: have too many properties that overlap