Add zone data aggregation functionality #460

Merged: 6 commits, Jan 13, 2023

zptro
Collaborator

@zptro zptro commented Sep 5, 2022

This PR enables the following:

  • Input files can contain data for zones that are not in the network.
  • Zone IDs can be different in input files and network.
  • Zone data can be aggregated (summed) to match network zones. For shares (of population and workplaces), the aggregation is a weighted average with total population or workplaces as weight.

All of this is enabled by adding a zone_mapping.txt file to the scenario-input folder. The file contains the mapping between input file zones and network zones.
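As a rough illustration of the aggregation rules listed above, here is a minimal pandas sketch. It is not the code in this PR: the `aggregate` helper, the column names, the zone IDs and the in-memory mapping are all assumptions made for the example, and the real layout of zone_mapping.txt may differ.

```python
import pandas as pd

# Hypothetical population data indexed by input-file zone ID.
# Column names are assumptions for illustration only.
pop = pd.DataFrame(
    {"total": [100.0, 300.0, 50.0], "sh_7-17": [0.10, 0.20, 0.40]},
    index=[101, 102, 103],
)

# Hypothetical mapping: input-file zone -> network zone.
# Zones 101 and 102 are merged into network zone 1, zone 103 becomes zone 2.
zone_mapping = pd.Series({101: 1, 102: 1, 103: 2})


def aggregate(data, mapping, weight_col, share_cols):
    """Sum weight_col per network zone and take weighted averages of shares."""
    # Keep only zones that appear in the mapping; the rest are not in the network.
    data = data.loc[data.index.intersection(mapping.index)]
    groups = mapping.loc[data.index]
    total = data[weight_col].groupby(groups).sum()
    result = pd.DataFrame(index=sorted(groups.unique()))
    result[weight_col] = total
    for col in share_cols:
        # Weighted average: sum(share * weight) / sum(weight) per network zone.
        weighted = (data[col] * data[weight_col]).groupby(groups).sum()
        result[col] = weighted / total
    return result


aggregated = aggregate(pop, zone_mapping, "total", ["sh_7-17"])
```

In this sketch a network zone whose total weight is zero would get NaN shares, so a real implementation would need to decide how to handle that case.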

Caveat: The aggregation of the share of detached houses (as a share of total building area) is coded as an else clause that calculates a simple average, which is not a very nice solution. A nicer solution would be to move detach to the .pop file and calculate it as a weighted average with population as weight, but that would break backwards compatibility.
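To show why the simple average is a concern, here is a small sketch with made-up numbers for two input zones that are merged into one network zone. The values and variable names are hypothetical, and population is used as the weight only because that is the alternative suggested above.

```python
import pandas as pd

# Hypothetical values for two input zones merged into one network zone.
detach = pd.Series([0.9, 0.1], index=[101, 102])  # share of detached houses
population = pd.Series([100.0, 900.0], index=[101, 102])

# Simple average: both zones count equally, regardless of size.
simple_avg = detach.mean()

# Population-weighted average: the larger zone dominates the result.
weighted_avg = (detach * population).sum() / population.sum()
```

With these numbers the simple average is about 0.5 while the weighted average is about 0.18, so the two approaches can diverge a lot when the merged zones differ in size.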

@zptro zptro requested a review from johpiip September 5, 2022 11:21
@johpiip
Contributor

johpiip commented Sep 13, 2022

Hi, thank you for your PR! What is the use case for this? Why would we benefit from having helmet-model-system do the aggregation versus doing it before the model run?

Does helmet_validate_inputfiles.py need to be updated to something like "if aggregation.txt does not exist then x" or "if aggregation.txt exists then y" or is this handled already?

@zptro
Collaborator Author

zptro commented Sep 19, 2022

> Hi, thank you for your PR! What is the use case for this? Why would we benefit from having helmet-model-system do the aggregation versus doing it before the model run?

Here are two use cases:

  • You have land-use data from the national model and would like to use it in a transport forecast for the Helsinki region (Helmet). This means you will have to cut out the Helsinki region data from the national land-use data and do a zone number transformation on it. The models will have different zone numbers (unless Helmet adopts the zone numbers of the national model, that is), because the Helmet zone numbering is too sparse and clashes with other regional models.
  • You have land-use data from the Helsinki MAL process and would like to use it in a national transport forecast. This means you will have to aggregate (sum) the data to match the larger zones in the national model.
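The first use case (cut out a region and renumber its zones, with no summing) could look roughly like the sketch below. The zone numbers, the column name and the mapping are invented for illustration and do not come from either model.

```python
import pandas as pd

# Hypothetical national land-use data indexed by national zone ID.
national = pd.DataFrame({"total": [10.0, 20.0, 30.0]}, index=[5001, 5002, 7001])

# Hypothetical one-to-one mapping: national zone -> regional zone.
# Zone 7001 is outside the region, so it has no entry.
mapping = pd.Series({5001: 101, 5002: 102})

# Cut out the zones that belong to the region, then renumber them.
regional = national.loc[national.index.intersection(mapping.index)]
regional = regional.rename(index=mapping.to_dict())
```

The second use case is the same operation with a many-to-one mapping followed by a sum per target zone.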

Both of these use cases could of course be done with a separate script that produces transformed land-use files. The downside of a separate script is that you would have these transformed land-use files to keep track of. What if the source files were updated, but you forgot to run the transformation script? Would you notice that the transformed land-use files are now outdated? With the built-in transformation, you point directly to the non-transformed files in the model run, so no need to remember which files are transformed versions of which files.

Besides, data aggregation is convenient in pandas, so it makes sense to do it while the data is in a clearly defined pandas format, that is, once it has already been read into the model system.

@zptro
Collaborator Author

zptro commented Sep 19, 2022

> Does helmet_validate_inputfiles.py need to be updated to something like "if aggregation.txt does not exist then x" or "if aggregation.txt exists then y" or is this handled already?

helmet_validate_inputfiles.py utilizes read_csv_file. The transformation is run only if the file exists:

if os.path.exists(aggr_path):

If the wrong aggregation.txt file is used, the resulting tables will not match the network and validation will fail. And if there are errors in the non-transformed data, these errors will be passed on to the aggregated data and cause validation to fail.
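The "only if the file exists" behaviour can be sketched as follows. This is a simplified stand-in, not the actual read_csv_file: the `read_zone_data` function, the tab-separated layout, the column names and the example file names are assumptions for the example.

```python
import os
import tempfile

import pandas as pd


def read_zone_data(data_path, mapping_path):
    """Read a zone data file and sum it to network zones if a mapping file exists."""
    data = pd.read_csv(data_path, sep="\t", index_col=0)
    if os.path.exists(mapping_path):
        # First data column of the mapping file holds the network zone ID (assumed).
        mapping = pd.read_csv(mapping_path, sep="\t", index_col=0).iloc[:, 0]
        # Drop zones without a mapping entry, then sum per network zone.
        data = data.loc[data.index.intersection(mapping.index)]
        data = data.groupby(mapping.loc[data.index]).sum()
    return data


with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "example.pop")
    mapping_path = os.path.join(folder, "zone_mapping.txt")
    with open(data_path, "w") as f:
        f.write("zone\ttotal\n101\t100\n102\t300\n")
    # No mapping file yet: the data is returned unchanged.
    untouched = read_zone_data(data_path, mapping_path)
    with open(mapping_path, "w") as f:
        f.write("zone\tnetwork_zone\n101\t1\n102\t1\n")
    # Mapping file present: both input zones are summed into network zone 1.
    summed = read_zone_data(data_path, mapping_path)
```

Because validation goes through the same reading step, it sees the transformed tables whenever the mapping file is present and the original tables otherwise.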

@johpiip
Contributor

johpiip commented Sep 19, 2022

Thank you for the argumentation! :) Let me know when you need help with the PR!

@zptro zptro marked this pull request as ready for review September 23, 2022 06:34
@zptro zptro requested review from johpiip and removed request for johpiip September 23, 2022 06:35
@johpiip
Contributor

johpiip commented Jan 3, 2023

My apologies for returning to this so late. I have the model run running, and I will approve this by the end of the week if it shows no glitches.

@johpiip
Contributor

johpiip commented Jan 5, 2023

Hmm, the results for the basic 2018 model run differ from those of this PR and the other one, #477. However, the results from both PRs are the same. I suspect some weird error in my basic 2018 run, so I will make new runs and return to this once they are finished. Sorry for the delay.

Contributor

@johpiip johpiip left a comment

Results look good, nice work!

@johpiip johpiip merged commit a829d56 into olusanya Jan 13, 2023
@johpiip johpiip deleted the feat/zonedata_aggregation branch January 13, 2023 13:33
@johpiip johpiip added this to the v4.1.2 milestone Jan 23, 2023

2 participants