
Check for allowed regions #26

Closed
phackstock opened this issue Oct 14, 2021 · 2 comments · Fixed by #35
Assignees
Labels
enhancement New feature or request

Comments

@phackstock
Contributor

Building on #22, the next step should be to integrate RegionAggregationMapping with DataStructureDefinition in order to verify that all specified native regions as well as common regions are allowed.

@phackstock
Contributor Author

Some thoughts and/or open questions on the implementation:

  • In principle we don't need to implement anything new, as we already have DataStructureDefinition.validate(df), which we could simply run after all renaming and/or region aggregation to check whether all the native and common regions are allowed. However, that would mean running a potentially invalid region-aggregation mapping and only finding out after the fact. Moreover, we don't need to run any renaming or aggregation at all to detect a problem; it can be determined solely from the mapping for the current model and the DataStructureDefinition. That leads to my first question:
    • At what point and how do we want to check the validity of the mappings? In my view there are several options:
      • Package it in a GitHub action that runs upon any changes happening in the mapping directory or to any of the region definitions
      • Run a check of all mapping files at the beginning of each workflow. Advantage: easy to implement; disadvantage: a faulty mapping for one model could block the upload of an entirely different model
      • Run through the models in the given data frame one by one and check if the mapping for each model is correct.
    • My preference would probably be the third option
  • If we go with the third option, what should the interface for this validation function look like? We could follow the pattern of DataStructureDefinition.validate and create DataStructureDefinition.validate_mapping(df, mapping), where mapping might simply be the directory where all mappings live, a dictionary with all the mappings loaded as RegionAggregationMapping objects, or some custom holding object as mentioned in Check for model mapping collisions #27. From an interface point of view I would prefer the first option: we would only provide the mapping directory, and everything else is handled inside DataStructureDefinition.validate_mapping.

Would love to hear your thoughts on all that @danielhuppmann.
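For concreteness, here is a minimal sketch of the mapping-level check described above. All class and function names are illustrative stand-ins, not the actual nomenclature API; the point is only that validity can be decided from the mapping and the allowed-region list alone, without running any renaming or aggregation:

```python
from dataclasses import dataclass, field


@dataclass
class RegionAggregationMapping:
    """Stand-in for the mapping class discussed above (illustrative only)."""
    model: str
    native_regions: list  # region names after any renaming
    common_regions: list = field(default_factory=list)


def invalid_regions(mapping, allowed_regions):
    """Return every region in the mapping that the definitions do not allow.

    An empty list means the mapping is valid. No renaming or aggregation
    is run; the check uses only the mapping and the allowed-region list.
    """
    allowed = set(allowed_regions)
    return [
        region
        for region in (*mapping.native_regions, *mapping.common_regions)
        if region not in allowed
    ]
```

A directory-level `validate_mapping` would then just loop this check over all mapping files it finds.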

@danielhuppmann
Member

Happy to share my thoughts...

  1. There is a misunderstanding about the aim of the DataStructureDefinition.validate method - it validates an IamDataFrame instance (e.g., an upload by a modelling team) against an instance of the DataStructureDefinition initialized from a project-specific directory (i.e., correct variables and regions). It is not a validation of the nomenclature-yaml files. This (implicitly) already occurs when initializing a DataStructureDefinition instance from a directory.

  2. The guiding question should be: how to execute the region-aggregation? One option that I see is the following:

    dsd = nomenclature.DataStructureDefinition("<path/to/definitions>")
    reg = nomenclature.RegionProcessor("<path/to/definitions>", dsd)
    
    df = pyam.IamDataFrame(df)
    dsd.validate(df)
    reg.apply(df)

    The validation that all native[-renamed] and common regions are defined in the DataStructureDefinition instance should happen as part of the initialization of the RegionProcessor instance.

    Alternatively, the collection of region-aggregation-mappings could be implemented as a module of the DataStructureDefinition.
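    A minimal sketch of that construction-time check, with an assumed constructor signature (not the actual nomenclature API — here the mappings are passed in as plain dictionaries rather than parsed from files):

    ```python
    class RegionProcessor:
        """Illustrative sketch: fail fast at construction time if any
        native[-renamed] or common region in the mappings is undefined."""

        def __init__(self, mappings, defined_regions):
            defined = set(defined_regions)
            undefined = sorted(
                region
                for mapping in mappings
                for region in mapping["regions"]
                if region not in defined
            )
            if undefined:
                raise ValueError(
                    f"Regions not defined in the DataStructureDefinition: {undefined}"
                )
            self.mappings = mappings

        def apply(self, df):
            # renaming and region aggregation would happen here
            return df
    ```

    With this design, an invalid mapping can never reach `apply`, because the instance cannot be created in the first place.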

  3. When do we check the validity of configuration files and mappings? This should happen in every (project-specific) repository that has such files - executed via GitHub Actions upon a PR or push, so that invalid or inconsistent files are unlikely to end up on the main branch of any repo.

    The nomenclature package already has a testing module (see here), which is currently used as part of the automated testing of the irp-internal-workflow repository (see here). This should be further developed, I'll start an issue.

    Furthermore, the way the code is structured now, all region-mapping files have to be parsed anyway to find the "correct" mapping file when importing the package.

    In short, we want to do 1 and we have to do 2 (unless we restructure the code and have a central "directory" with the mapping of model-to-mapping-file).
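    The per-repository check could be as simple as the following sketch, which a GitHub Actions workflow would run on every push/PR. The helper name is assumed, and json stands in for the yaml mapping files purely to keep the example dependency-free:

    ```python
    import json
    from pathlib import Path


    def assert_valid_mappings(mapping_dir, allowed_regions):
        """Raise AssertionError on the first mapping file that references a
        region not in `allowed_regions`, so CI fails before an invalid or
        inconsistent file can reach the main branch."""
        allowed = set(allowed_regions)
        for file in sorted(Path(mapping_dir).glob("*.json")):
            mapping = json.loads(file.read_text())
            undefined = [r for r in mapping.get("regions", []) if r not in allowed]
            assert not undefined, f"{file.name}: undefined regions {undefined}"
    ```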
