
Schisto SchistoInfectionWormBurdenEvent pandas style

Asif Tamuri edited this page Mar 10, 2020 · 7 revisions

Here I try to rewrite the code for SchistoInfectionWormBurdenEvent using more conventional pandas operations that work on the whole dataset (or large parts of it) at once, rather than looping over the data and processing small subsets one at a time.
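As a rough illustration of the difference between the two styles, here is a minimal sketch on made-up data. The column names mirror those used further down this page; the values and the per-district mean are only an example, not the module's actual calculation.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'district_of_residence': ['Balaka', 'Balaka', 'Balaka', 'Dedza', 'Dedza'],
    'sh_aggregate_worm_burden': [10, 20, 30, 0, 50],
})

# Looping style: filter one district at a time and summarise each subset.
looped = {}
for district in df['district_of_residence'].unique():
    subset = df[df['district_of_residence'] == district]
    looped[district] = subset['sh_aggregate_worm_burden'].mean()
looped = pd.Series(looped)

# Whole-frame style: a single groupby produces the same numbers in one pass.
grouped = df.groupby('district_of_residence')['sh_aggregate_worm_burden'].mean()
```

Both give one mean per district; the groupby version avoids re-filtering the frame for every district.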

I'm assuming the following parameters are set up somewhere:

import numpy as np
import pandas as pd

beta_by_age_group = pd.Series([0.3, 1.0, 0.05], index=['PSAC', 'SAC', 'Adults'])
beta_by_age_group.index.name = 'age_group'

R0 = pd.Series({'Balaka': 1.124254835,
 'Blantyre': 1.129557175,
 'Blantyre City': 1.125299162,
 'Chikwawa': 1.124802165,
 'Chiradzulu': 1.141412855,
 'Chitipa': 1.125768303,
 'Dedza': 1.124234464,
 'Dowa': 0.0,
 'Karonga': 1.124209139,
 'Kasungu': 1.12470922,
 'Likoma': 1.131439355,
 'Lilongwe': 1.127042051,
 'Lilongwe City': 1.1243717,
 'Machinga': 1.124287956,
 'Mangochi': 1.12420876,
 'Mchinji': 1.124478478,
 'Mulanje': 1.144959393,
 'Mwanza': 1.125061605,
 'Mzimba': 1.12535666,
 'Mzuzu City': 1.130178098,
 'Neno': 1.126314283,
 'Nkhata Bay': 1.13232802,
 'Nkhotakota': 1.133205764,
 'Nsanje': 1.13558784,
 'Ntcheu': 1.124211331,
 'Ntchisi': 1.124331529,
 'Phalombe': 1.184100577,
 'Rumphi': 0.0,
 'Salima': 1.12872694,
 'Thyolo': 1.124214465,
 'Zomba': 1.126197481,
 'Zomba City': 1.126197481})
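The index name given to beta_by_age_group matters later on, when it is multiplied with a Series indexed by both district and age group: the two are matched on the shared 'age_group' level. A minimal sketch of that matching, using made-up mean burdens and the explicit `level=` argument of `Series.mul` (the code on this page uses the plain `*` operator instead):

```python
import pandas as pd

beta_by_age_group = pd.Series(
    [0.3, 1.0, 0.05],
    index=pd.Index(['PSAC', 'SAC', 'Adults'], name='age_group'),
)

# Hypothetical per-district, per-age-group mean burdens (made-up numbers).
index = pd.MultiIndex.from_product(
    [['Balaka', 'Dedza'], ['PSAC', 'SAC', 'Adults']],
    names=['district_of_residence', 'age_group'],
)
mean_burden = pd.Series([10.0, 20.0, 30.0, 0.0, 5.0, 40.0], index=index)

# Broadcast beta across districts by matching on the 'age_group' level.
weighted = mean_burden.mul(beta_by_age_group, level='age_group')

# Collapse back to one value per district.
per_district = weighted.groupby(level='district_of_residence').sum()
```

Naming the level explicitly makes the intended matching visible to the reader, at the cost of a slightly longer expression.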

The following then does the work of the event's apply method:

df = population.props

where = df.is_alive

age_group = pd.cut(df.loc[where, 'age_years'], [0, 4, 14, 120], labels=['PSAC', 'SAC', 'Adults'], include_lowest=True)
age_group.name = 'age_group'

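# String aggregator names give the same 'mean' and 'size' columns and avoid passing NumPy functions to agg
mean_count_burden_district_age_group = df.loc[where].groupby(['district_of_residence', age_group])['sh_aggregate_worm_burden'].agg(['mean', 'size'])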

district_count = df.loc[where].groupby('district_of_residence').size()

beta_contribution_to_reservoir = mean_count_burden_district_age_group['mean'] * beta_by_age_group

to_get_weighted_mean = mean_count_burden_district_age_group['size'] / district_count

age_worm_burden = beta_contribution_to_reservoir * to_get_weighted_mean

reservoir = age_worm_burden.groupby(['district_of_residence']).sum()

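# map() on a categorical Series can return a categorical; cast so the product below is numeric
contact_rates = age_group.map(beta_by_age_group).astype(float)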

harbouring_rates = df.loc[where, 'sh_harbouring_rate']

rates = harbouring_rates * contact_rates

worms_total = reservoir * R0

draw_worms = pd.Series(np.random.poisson(df.loc[where, 'district_of_residence'].map(worms_total) * rates), index=df.index[where])

param_worm_fecundity = 0.005  # params['worms_fecundity']

established = np.random.random(size=where.sum()) < np.exp(-param_worm_fecundity * df.loc[where, 'sh_aggregate_worm_burden'])

to_establish = pd.DataFrame({'new_worms': draw_worms[(draw_worms > 0) & established]})

sim_date = pd.Timestamp.now()   # <-- a dummy bit of code for testing

to_establish['date_maturation'] = sim_date + pd.to_timedelta(np.random.randint(30, 55, size=len(to_establish)), unit='D')

for row in to_establish.itertuples():
    self.sim.schedule(SchistoMatureWorms(self.module, person_id=row.Index, new_worms=row.new_worms), row.date_maturation)
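To try the draw-and-establish steps outside the simulation, something like the sketch below may help. It is an illustration under assumptions: the people, their 'exposure' values (standing in for worms_total * rates), the seed and the fixed sim_date are all made up, and a seeded NumPy Generator replaces the global np.random state so that runs are repeatable. The scheduling call is left out because it needs the simulation object; the rows that would be scheduled are collected into a list instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # hypothetical seed, for repeatable runs

# Made-up per-person inputs; 'exposure' stands in for worms_total * rates.
people = pd.DataFrame(
    {
        'exposure': [0.0, 2.5, 10.0, 4.0],
        'sh_aggregate_worm_burden': [0, 50, 200, 1000],
    },
    index=pd.Index([11, 12, 13, 14], name='person_id'),
)
worm_fecundity = 0.005  # same value as param_worm_fecundity above

# One Poisson draw per person.
draw_worms = pd.Series(rng.poisson(people['exposure']), index=people.index)

# Chance of establishing falls off exponentially with the current burden.
p_establish = np.exp(-worm_fecundity * people['sh_aggregate_worm_burden'])
established = rng.random(len(people)) < p_establish

to_establish = pd.DataFrame({'new_worms': draw_worms[(draw_worms > 0) & established]})

sim_date = pd.Timestamp('2020-03-10')  # fixed dummy date
# integers(30, 55) draws from 30 to 54 inclusive, like np.random.randint(30, 55).
delays = rng.integers(30, 55, size=len(to_establish))
to_establish['date_maturation'] = sim_date + pd.to_timedelta(delays, unit='D')

# Stand-in for the scheduling loop: collect what would be scheduled.
scheduled = [
    (row.Index, row.new_worms, row.date_maturation)
    for row in to_establish.itertuples()
]
```

A person with zero exposure can never draw new worms, so person 11 should never appear in `to_establish`, whatever the seed.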