Drop non-requested variables from datasets #448

Open
aulemahal opened this issue Feb 21, 2022 · 1 comment

@aulemahal
Contributor

Is your feature request related to a problem? Please describe.
When using a catalog with a derived variable registry and calling to_dataset_dict, every variable needed to compute a derived variable ends up in the final datasets, even if it was not requested directly.

Describe the solution you'd like
In the following imaginary example, say we have datasets for tasmin, tasmax and pr, and the derived variable registry (DVR) defines tas = func(tasmin, tasmax).

subcat = cat.search(variable=['tas', 'pr']) 
subcat.to_dataset_dict()

I would expect the output datasets to include only tas and pr, but with the current intake-esm, tasmin and tasmax are also returned.

Describe alternatives you've considered

  1. Wrapping it myself: remembering what I wanted in a list and dropping the rest.
  2. Instead of passing _requested_variables to the ESMDataSource objects, perform that subset in to_dataset_dict directly, after the derived variables have been created.
  3. Add an additional _needed_variables attribute that would store the list of variables including dependencies, while _requested_variables would not list dependencies. to_dataset_dict would then perform the subset after the derived variables have been created.
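
For reference, alternative 1 could look roughly like the sketch below. It assumes the values returned by to_dataset_dict are xarray.Dataset objects (it only relies on their `data_vars` attribute and `drop_vars` method); `keep_requested` is a hypothetical helper name, not something intake-esm provides.

```python
def keep_requested(dsets, requested):
    """Return a new dict whose datasets hold only the requested data variables.

    Assumption: `dsets` maps keys to xarray.Dataset-like objects exposing
    `data_vars` and `drop_vars`. `keep_requested` is a hypothetical helper,
    not part of intake-esm.
    """
    wanted = set(requested)
    trimmed = {}
    for key, ds in dsets.items():
        # Anything that was loaded only as a dependency is dropped here.
        extra = [name for name in ds.data_vars if name not in wanted]
        trimmed[key] = ds.drop_vars(extra)
    return trimmed
```

With the example above, this would be called as `keep_requested(subcat.to_dataset_dict(), ['tas', 'pr'])`. It works, but every user has to carry the list of requested variables around, which is why handling it inside to_dataset_dict seems preferable.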
@RondeauG
Contributor

RondeauG commented Feb 22, 2022

There is already a dependents variable created in search(), but it gets merged into _requested_variables, so the information is lost afterwards.

Would it be a realistic solution to decouple _dependent_variables and _requested_variables, using both when necessary to open the datasets and extract data, and then only keep the _requested_variables for the final output?
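
A minimal sketch of that decoupling, in plain Python and independent of the actual intake-esm internals: the `registry` dict below is only a stand-in for the derived variable registry (derived name mapped to the variables it depends on), and `split_variables` is a hypothetical name. Nested dependencies are not handled.

```python
def split_variables(requested, registry):
    """Separate what the user asked for from what is only needed as input.

    Assumptions: `registry` maps a derived variable name to the list of
    variables it is computed from (a stand-in for the real registry).
    Only one level of dependencies is resolved.
    """
    requested = list(requested)
    dependent = []
    for name in requested:
        for dep in registry.get(name, []):
            # Record a dependency only if the user did not ask for it directly.
            if dep not in requested and dep not in dependent:
                dependent.append(dep)
    return requested, dependent


requested, dependent = split_variables(
    ["tas", "pr"], {"tas": ["tasmin", "tasmax"]}
)
# Both lists are needed to open the files and compute derived variables;
# only `requested` would be kept in the final output.
print(requested, dependent)
```

Keeping the two lists separate until the end is what makes the final subset possible; once they are merged, there is no way to tell a dependency from a direct request.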
