Add zone data aggregation functionality #460

zptro · 2022-09-05T11:21:44Z

This PR enables the following:

- Input files can contain data for zones that are not in the network.
- Zone IDs can be different in input files and network.
- Zone data can be aggregated (summed) to match network zones. For shares (of population and workplaces), the aggregation is a weighted average with total population or workplaces as weight.

All of this is enabled by adding a zone_mapping.txt file to the scenario-input folder, which contains the mapping between input file zones and network zones.
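To make the aggregation rule concrete, here is a minimal pandas sketch of summing counts and weight-averaging shares per network zone. It is not the code in this PR: the `aggregate` function, the column names `total` and `share_young`, and the zone numbers are all made up for illustration, and the mapping is given inline instead of being read from the mapping file.

```python
import pandas as pd

# Hypothetical mapping from input-file zone ID to network zone ID.
# Zone 999 has no entry, i.e. it is not part of the network.
zone_mapping = pd.Series({101: 1, 102: 1, 103: 2}, name="network_zone")

# Hypothetical population data indexed by input-file zone ID.
pop = pd.DataFrame(
    {
        "total": [100.0, 300.0, 50.0, 80.0],        # counts: summed
        "share_young": [0.10, 0.20, 0.40, 0.30],    # shares: weighted average
    },
    index=[101, 102, 103, 999],
)


def aggregate(data, mapping, weight_col, share_cols):
    """Sum the weight column and weight-average the share columns."""
    # Drop input zones that the mapping does not know about.
    data = data.loc[data.index.isin(mapping.index)]
    groups = mapping.loc[data.index]
    weights = data[weight_col]
    totals = weights.groupby(groups).sum()
    # Weighted average = sum(share * weight) / sum(weight) per network zone.
    weighted = data[share_cols].mul(weights, axis=0).groupby(groups).sum()
    shares = weighted.div(totals, axis=0)
    return pd.concat([totals, shares], axis=1)


network_pop = aggregate(pop, zone_mapping, "total", ["share_young"])
print(network_pop)
```

Note that a network zone whose total weight is zero would get NaN shares in this sketch, so a real implementation needs to decide how to handle that case.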

Caveat: The share of detached houses (as a share of total building area) is aggregated in an else clause that calculates a simple average, which is not a very nice solution. A nicer solution would be to move detach to the .pop file and calculate it as a weighted average with population as the weight, but this would break backwards compatibility.
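A small worked example of why the simple average is a compromise, using made-up numbers for two input zones that are merged into one network zone. The function names are hypothetical and only illustrate the two alternatives discussed in the caveat.

```python
# Two hypothetical input zones merged into one network zone.
detach_shares = [0.8, 0.2]      # share of detached houses per input zone
populations = [100.0, 300.0]    # population per input zone


def simple_mean(shares):
    """Unweighted mean: every input zone counts equally."""
    return sum(shares) / len(shares)


def weighted_mean(shares, weights):
    """Weighted mean: larger zones influence the result more."""
    return sum(s * w for s, w in zip(shares, weights)) / sum(weights)


simple = simple_mean(detach_shares)                   # about 0.50
weighted = weighted_mean(detach_shares, populations)  # about 0.35
print(simple, weighted)
```

With these numbers the small zone pulls the simple average up to about 0.50, while the population-weighted average is about 0.35.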

johpiip · 2022-09-13T14:19:24Z

Hi, thank you for your PR! What is the use case for this? Why would we benefit from having helmet-model-system do the aggregation versus doing it before the model run?

Does helmet_validate_inputfiles.py need to be updated to something like "if aggregation.txt does not exist then x" or "if aggregation.txt exists then y" or is this handled already?

zptro · 2022-09-19T12:59:31Z

> Hi, thank you for your PR! What is the use case for this? Why would we benefit from having helmet-model-system do the aggregation versus doing it before the model run?

Here are two use cases:

1. You have land-use data from the national model and would like to use it in a transport forecast for the Helsinki region (Helmet). This means you have to cut out the Helsinki region data from the national land-use data and transform its zone numbers. The models will have different zone numbers (unless Helmet adopts the zone numbers of the national model, that is), because the Helmet zone numbering is too sparse and clashes with other regional models.
2. You have land-use data from the Helsinki MAL process and would like to use it in a national transport forecast. This means you have to aggregate (sum) the data to match the larger zones in the national model.
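The first use case (cut out a region and renumber its zones) is a one-to-one mapping instead of an aggregation. A rough pandas sketch, with invented zone numbers and an invented `total` column, could look like this:

```python
import pandas as pd

# Hypothetical national land-use data indexed by national zone ID.
national = pd.DataFrame(
    {"total": [1200.0, 800.0, 500.0]},
    index=[5001, 5002, 7001],
)

# Hypothetical one-to-one renumbering: national zone ID -> regional zone ID.
# Zone 7001 lies outside the region, so it has no entry.
renumbering = {5001: 110, 5002: 120}

# Keep only the zones inside the region, then relabel them.
in_region = national.index.isin(list(renumbering))
regional = national.loc[in_region].rename(index=renumbering)
print(regional)
```

The second use case is the many-to-one variant of the same idea, where several input zones share one target zone and their values are summed.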

Both of these use cases could of course be done with a separate script that produces transformed land-use files. The downside of a separate script is that you would have these transformed land-use files to keep track of. What if the source files were updated, but you forgot to run the transformation script? Would you notice that the transformed land-use files are now outdated? With the built-in transformation, you point directly to the non-transformed files in the model run, so no need to remember which files are transformed versions of which files.

Besides, data aggregation is convenient in pandas, so it makes sense to do it while the data is in a clearly defined pandas format, that is, once it has been read into the model system.

zptro · 2022-09-19T13:12:04Z

> Does helmet_validate_inputfiles.py need to be updated to something like "if aggregation.txt does not exist then x" or "if aggregation.txt exists then y" or is this handled already?

helmet_validate_inputfiles.py utilizes read_csv_file. The transformation is run only if the file exists:

helmet-model-system/Scripts/utils/read_csv_file.py

Line 65 in 05f6b42

if os.path.exists(aggr_path):

If the wrong aggregation.txt file is used, the resulting tables will not match the network and validation will fail. And if there are errors in the non-transformed data, these errors will be passed on to the aggregated data and cause validation to fail.
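As a rough illustration of this behaviour (not the actual code in read_csv_file.py), the sketch below loads a mapping only if the file exists and then checks that the resulting zones match the network. The function names, the default file name argument and the two-column whitespace-separated file format are assumptions made for the example.

```python
import os


def load_zone_mapping(data_dir, file_name="aggregation.txt"):
    """Return {input_zone: network_zone}, or None if no mapping file exists."""
    aggr_path = os.path.join(data_dir, file_name)
    if not os.path.exists(aggr_path):
        return None  # no file: data is used as-is, no transformation
    mapping = {}
    with open(aggr_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Assumed format: "<input zone> <network zone>" per line.
            input_zone, network_zone = line.split()
            mapping[int(input_zone)] = int(network_zone)
    return mapping


def check_zones(data_zones, network_zones):
    """Raise ValueError if the data zones do not match the network zones."""
    missing = sorted(set(network_zones) - set(data_zones))
    extra = sorted(set(data_zones) - set(network_zones))
    if missing or extra:
        raise ValueError(
            "Zone mismatch: missing {}, unexpected {}".format(missing, extra))
```

With a wrong mapping file, the transformed zones would not match the network zones, so a check like `check_zones` would fail in the same way as it would for untransformed data that does not match the network.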

johpiip · 2022-09-19T14:53:05Z

Thank you for the argumentation! :) Let me know when you need help with the PR!

johpiip · 2023-01-03T13:34:29Z

My apologies for returning to this so late. I have the model run running, and I will approve this by the end of the week if it shows no glitches.

johpiip · 2023-01-05T10:40:30Z

Hmm, the results of my basic 2018 model run differ from the results of this PR and of the other one, #477. However, the results from both PRs are the same. I suspect some weird error in my basic 2018 run, so I will make new runs and return to this once they are finished. Sorry for the delay.

johpiip

Results look good, nice work!

Add zone data aggregation functionality

bdb055f

zptro requested a review from johpiip September 5, 2022 11:21

Move if clause and suppress warning

05f6b42

zptro added 3 commits September 20, 2022 22:42

Make code more easy to read

90e0fbb

Change file name

5e70b3b

Handle detached house share as mean

22c90b7

zptro marked this pull request as ready for review September 23, 2022 06:34

zptro requested review from johpiip and removed request for johpiip September 23, 2022 06:35

Merge branch 'olusanya' into feat/zonedata_aggregation

e296ffc

johpiip approved these changes Jan 13, 2023


johpiip merged commit a829d56 into olusanya Jan 13, 2023

johpiip deleted the feat/zonedata_aggregation branch January 13, 2023 13:33

johpiip added this to the v4.1.2 milestone Jan 23, 2023
