Vectorize population calibration to improve calculation speeds #11

AndreasSahlberg · 2019-11-24T19:51:17Z

Lines 1060 to 1153 in 5aab584

    
    def calibrate_pop_and_urban(self, pop_actual, pop_future_high, pop_future_low, urban_current, urban_future,
                                start_year, end_year, intermediate_year):
        """
        Calibrate the actual current population, the urban split and forecast the future population
        """

        logging.info('Calibrate current population')
        project_life = end_year - start_year

        # Calculate the ratio between the actual population and the total population from the GIS layer
        pop_ratio = pop_actual / self.df[SET_POP].sum()

        # And use this ratio to calibrate the population in a new column
        self.df[SET_POP_CALIB] = self.df.apply(lambda row: row[SET_POP] * pop_ratio, axis=1)
        self.df[SET_ELEC_POP_CALIB] = self.df[SET_ELEC_POP] * pop_ratio

        if max(self.df[SET_URBAN]) == 3:  # THIS OPTION IS CURRENTLY DISABLED
            calibrate = True if 'n' in input(
                'Use urban definition from GIS layer <y/n> (n=model calibration):') else False
        else:
            calibrate = True

        # RUN_PARAM: This is where manual calibration of urban/rural population takes place.
        # The model uses 0, 1, 2 as GHS population layer does.
        # As of this version, urban are only rows with value equal to 2
        if calibrate:
            urban_modelled = 2
            factor = 1
            while abs(urban_modelled - urban_current) > 0.01:
                self.df[SET_URBAN] = 0
                self.df.loc[(self.df[SET_POP_CALIB] > 5000 * factor) & (
                        self.df[SET_POP_CALIB] / self.df[SET_GRID_CELL_AREA] > 350 * factor), SET_URBAN] = 1
                self.df.loc[(self.df[SET_POP_CALIB] > 50000 * factor) & (
                        self.df[SET_POP_CALIB] / self.df[SET_GRID_CELL_AREA] > 1500 * factor), SET_URBAN] = 2
                pop_urb = self.df.loc[self.df[SET_URBAN] > 1, SET_POP_CALIB].sum()
                urban_modelled = pop_urb / pop_actual
                if urban_modelled > urban_current:
                    factor *= 1.1
                else:
                    factor *= 0.9

        # Get the calculated urban ratio, and limit it to within reasonable boundaries
        pop_urb = self.df.loc[self.df[SET_URBAN] > 1, SET_POP_CALIB].sum()
        urban_modelled = pop_urb / pop_actual

        if abs(urban_modelled - urban_current) > 0.01:
            print('The modelled urban ratio is {:.2f}. '
                  'In case this is not acceptable please revise this part of the code'.format(urban_modelled))

        # Project future population, with separate growth rates for urban and rural
        logging.info('Project future population')

        if calibrate:
            urban_growth_high = (urban_future * pop_future_high) / (urban_modelled * pop_actual)
            rural_growth_high = ((1 - urban_future) * pop_future_high) / ((1 - urban_modelled) * pop_actual)
            yearly_urban_growth_rate_high = urban_growth_high ** (1 / project_life)
            yearly_rural_growth_rate_high = rural_growth_high ** (1 / project_life)
            urban_growth_low = (urban_future * pop_future_low) / (urban_modelled * pop_actual)
            rural_growth_low = ((1 - urban_future) * pop_future_low) / ((1 - urban_modelled) * pop_actual)
            yearly_urban_growth_rate_low = urban_growth_low ** (1 / project_life)
            yearly_rural_growth_rate_low = rural_growth_low ** (1 / project_life)
        else:
            urban_growth_high = pop_future_high / pop_actual
            rural_growth_high = pop_future_high / pop_actual
            yearly_urban_growth_rate_high = urban_growth_high ** (1 / project_life)
            yearly_rural_growth_rate_high = rural_growth_high ** (1 / project_life)
            urban_growth_low = pop_future_low / pop_actual
            rural_growth_low = pop_future_low / pop_actual
            yearly_urban_growth_rate_low = urban_growth_low ** (1 / project_life)
            yearly_rural_growth_rate_low = rural_growth_low ** (1 / project_life)

        # RUN_PARAM: Define here the years for which results should be provided in the output file.
        yearsofanalysis = [intermediate_year, end_year]

        for year in yearsofanalysis:
            self.df[SET_POP + "{}".format(year) + 'High'] = self.df.apply(
                lambda row: row[SET_POP_CALIB] * (yearly_urban_growth_rate_high ** (year - start_year))
                if row[SET_URBAN] > 1
                else row[SET_POP_CALIB] * (yearly_rural_growth_rate_high ** (year - start_year)), axis=1)
            self.df[SET_POP + "{}".format(year) + 'Low'] = self.df.apply(
                lambda row: row[SET_POP_CALIB] * (yearly_urban_growth_rate_low ** (year - start_year))
                if row[SET_URBAN] > 1
                else row[SET_POP_CALIB] * (yearly_rural_growth_rate_low ** (year - start_year)), axis=1)

        self.df[SET_POP + "{}".format(start_year)] = self.df.apply(lambda row: row[SET_POP_CALIB], axis=1)

        return urban_modelled

This issue is also related to issue #10.

AndreasSahlberg · 2019-11-26T09:56:51Z

There are several cases where the "apply" method is used row by row; these can be improved by working directly with the entire Series/DataFrame.
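As a rough illustration of what this could look like, here is a minimal sketch of the calibration and projection steps without row-wise "apply". It is not the change that was eventually merged. The column names, the `project_population` helper and the example numbers are all hypothetical stand-ins for the `SET_*` constants and growth rates in the method quoted above, and only the "High" scenario is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical column names standing in for the SET_* constants in the quoted method.
SET_POP = "Pop"
SET_POP_CALIB = "PopStartYear"
SET_URBAN = "IsUrban"


def project_population(df, pop_actual, urban_rate, rural_rate, start_year, years):
    """Vectorized sketch of the calibration and projection steps (no row-wise apply)."""
    out = df.copy()

    # Scaling every cell by the same ratio is a plain Series-times-scalar operation.
    pop_ratio = pop_actual / out[SET_POP].sum()
    out[SET_POP_CALIB] = out[SET_POP] * pop_ratio

    # Same urban test as in the quoted method: only rows with value 2 count as urban.
    is_urban = out[SET_URBAN] > 1

    for year in years:
        n = year - start_year
        # Pick the yearly growth rate per row, then compound it over n years.
        rate = np.where(is_urban, urban_rate, rural_rate)
        out[SET_POP + "{}".format(year) + "High"] = out[SET_POP_CALIB] * rate ** n

    # Copying a column needs no apply at all.
    out[SET_POP + "{}".format(start_year)] = out[SET_POP_CALIB]
    return out


# Made-up example data: three cells, the last one urban.
df = pd.DataFrame({SET_POP: [100.0, 300.0, 600.0], SET_URBAN: [0, 1, 2]})
result = project_population(df, pop_actual=2000.0, urban_rate=1.03, rural_rate=1.01,
                            start_year=2018, years=[2025, 2030])
```

The `np.where` call replaces the `if ... else` inside the lambda: it evaluates the condition for the whole column at once, so the loop over rows happens inside NumPy rather than in Python.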

AndreasSahlberg added the performance label Nov 26, 2019

AndreasSahlberg changed the title ~~Vectorize to improve calculation speeds~~ Vectorize population calibration to improve calculation speeds Nov 26, 2019

oluchee added this to To do in OnSSET Hackaton 2019 Nov 26, 2019

oluchee assigned oluchee and unassigned oluchee Jan 7, 2020

AndreasSahlberg mentioned this issue Jan 4, 2022

Pop calibration #141

Merged

AndreasSahlberg closed this as completed Jan 5, 2022

OnSSET Hackaton 2019 automation moved this from To do to Done Jan 5, 2022
