Vectorize population calibration to improve calculation speeds #11

Closed
AndreasSahlberg opened this issue Nov 24, 2019 · 1 comment

Comments

@AndreasSahlberg
Contributor

AndreasSahlberg commented Nov 24, 2019

onsset/onsset/onsset.py

Lines 1060 to 1153 in 5aab584

def calibrate_pop_and_urban(self, pop_actual, pop_future_high, pop_future_low, urban_current, urban_future,
                            start_year, end_year, intermediate_year):
    """
    Calibrate the actual current population, the urban split and forecast the future population
    """
    logging.info('Calibrate current population')
    project_life = end_year - start_year
    # Calculate the ratio between the actual population and the total population from the GIS layer
    pop_ratio = pop_actual / self.df[SET_POP].sum()
    # And use this ratio to calibrate the population in a new column
    self.df[SET_POP_CALIB] = self.df.apply(lambda row: row[SET_POP] * pop_ratio, axis=1)
    self.df[SET_ELEC_POP_CALIB] = self.df[SET_ELEC_POP] * pop_ratio
    if max(self.df[SET_URBAN]) == 3:  # THIS OPTION IS CURRENTLY DISABLED
        calibrate = True if 'n' in input(
            'Use urban definition from GIS layer <y/n> (n=model calibration):') else False
    else:
        calibrate = True
    # RUN_PARAM: This is where manual calibration of urban/rural population takes place.
    # The model uses 0, 1, 2 as GHS population layer does.
    # As of this version, urban are only rows with value equal to 2
    if calibrate:
        urban_modelled = 2
        factor = 1
        while abs(urban_modelled - urban_current) > 0.01:
            self.df[SET_URBAN] = 0
            self.df.loc[(self.df[SET_POP_CALIB] > 5000 * factor) & (
                self.df[SET_POP_CALIB] / self.df[SET_GRID_CELL_AREA] > 350 * factor), SET_URBAN] = 1
            self.df.loc[(self.df[SET_POP_CALIB] > 50000 * factor) & (
                self.df[SET_POP_CALIB] / self.df[SET_GRID_CELL_AREA] > 1500 * factor), SET_URBAN] = 2
            pop_urb = self.df.loc[self.df[SET_URBAN] > 1, SET_POP_CALIB].sum()
            urban_modelled = pop_urb / pop_actual
            if urban_modelled > urban_current:
                factor *= 1.1
            else:
                factor *= 0.9
    # Get the calculated urban ratio, and limit it to within reasonable boundaries
    pop_urb = self.df.loc[self.df[SET_URBAN] > 1, SET_POP_CALIB].sum()
    urban_modelled = pop_urb / pop_actual
    if abs(urban_modelled - urban_current) > 0.01:
        print('The modelled urban ratio is {:.2f}. '
              'In case this is not acceptable please revise this part of the code'.format(urban_modelled))
    # Project future population, with separate growth rates for urban and rural
    logging.info('Project future population')
    if calibrate:
        urban_growth_high = (urban_future * pop_future_high) / (urban_modelled * pop_actual)
        rural_growth_high = ((1 - urban_future) * pop_future_high) / ((1 - urban_modelled) * pop_actual)
        yearly_urban_growth_rate_high = urban_growth_high ** (1 / project_life)
        yearly_rural_growth_rate_high = rural_growth_high ** (1 / project_life)
        urban_growth_low = (urban_future * pop_future_low) / (urban_modelled * pop_actual)
        rural_growth_low = ((1 - urban_future) * pop_future_low) / ((1 - urban_modelled) * pop_actual)
        yearly_urban_growth_rate_low = urban_growth_low ** (1 / project_life)
        yearly_rural_growth_rate_low = rural_growth_low ** (1 / project_life)
    else:
        urban_growth_high = pop_future_high / pop_actual
        rural_growth_high = pop_future_high / pop_actual
        yearly_urban_growth_rate_high = urban_growth_high ** (1 / project_life)
        yearly_rural_growth_rate_high = rural_growth_high ** (1 / project_life)
        urban_growth_low = pop_future_low / pop_actual
        rural_growth_low = pop_future_low / pop_actual
        yearly_urban_growth_rate_low = urban_growth_low ** (1 / project_life)
        yearly_rural_growth_rate_low = rural_growth_low ** (1 / project_life)
    # RUN_PARAM: Define here the years for which results should be provided in the output file.
    yearsofanalysis = [intermediate_year, end_year]
    for year in yearsofanalysis:
        self.df[SET_POP + "{}".format(year) + 'High'] = self.df.apply(lambda row: row[SET_POP_CALIB] *
                                                                      (yearly_urban_growth_rate_high **
                                                                       (year - start_year))
                                                                      if row[SET_URBAN] > 1
                                                                      else row[SET_POP_CALIB] *
                                                                      (yearly_rural_growth_rate_high ** (year - start_year)), axis=1)
        self.df[SET_POP + "{}".format(year) + 'Low'] = self.df.apply(lambda row: row[SET_POP_CALIB] *
                                                                     (yearly_urban_growth_rate_low **
                                                                      (year - start_year))
                                                                     if row[SET_URBAN] > 1
                                                                     else row[SET_POP_CALIB] *
                                                                     (yearly_rural_growth_rate_low ** (year - start_year)), axis=1)
    self.df[SET_POP + "{}".format(start_year)] = self.df.apply(lambda row: row[SET_POP_CALIB], axis=1)
    return urban_modelled

This issue is also related to issue 10

@AndreasSahlberg changed the title from "Vectorize to improve calculation speeds" to "Vectorize population calibration to improve calculation speeds" on Nov 26, 2019
@AndreasSahlberg
Contributor Author

There are several cases where the "apply" method is used row by row; these can be improved by working directly with the entire Series/DataFrame.
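As a rough illustration of what that could look like, here is a minimal sketch of the population projection written with whole-column operations and `numpy.where` instead of `apply(..., axis=1)`. The helper name `project_population`, its parameters, and the column-name strings are assumptions made for this example only; the real constants live in `onsset.py`, and this is not the change that was actually merged.

```python
import numpy as np
import pandas as pd

# Placeholder column names for this sketch; the real constants
# (SET_POP, SET_POP_CALIB, SET_URBAN) are defined in onsset.py.
SET_POP = 'Pop'
SET_POP_CALIB = 'PopStartYear'
SET_URBAN = 'IsUrban'


def project_population(df, pop_ratio, urban_rate, rural_rate, start_year, years, suffix):
    """Hypothetical helper: vectorized version of the row-wise apply calls."""
    out = df.copy()
    # Whole-column multiplication replaces
    # apply(lambda row: row[SET_POP] * pop_ratio, axis=1)
    out[SET_POP_CALIB] = out[SET_POP] * pop_ratio
    # Boolean mask: only cells classified as 2 count as urban
    is_urban = out[SET_URBAN] > 1
    for year in years:
        n = year - start_year
        # np.where selects the urban or rural growth factor for every row at once
        growth = np.where(is_urban, urban_rate ** n, rural_rate ** n)
        out[SET_POP + "{}".format(year) + suffix] = out[SET_POP_CALIB] * growth
    # The start-year column is a plain copy, so no apply is needed
    out[SET_POP + "{}".format(start_year)] = out[SET_POP_CALIB]
    return out


df = pd.DataFrame({SET_POP: [100.0, 200.0, 300.0], SET_URBAN: [0, 1, 2]})
result = project_population(df, pop_ratio=2.0, urban_rate=1.1, rural_rate=1.0,
                            start_year=2018, years=[2020], suffix='High')
print(result)
```

The same pattern would apply to the 'Low' scenario by calling the helper with the low growth rates and `suffix='Low'`. The urban calibration `while` loop already uses boolean indexing with `.loc`, so it probably needs no change.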

@oluchee oluchee added this to To do in OnSSET Hackaton 2019 Nov 26, 2019
@oluchee oluchee assigned oluchee and unassigned oluchee Jan 7, 2020
OnSSET Hackaton 2019 automation moved this from To do to Done Jan 5, 2022