Clean way to store multiple objects in display object #11
Comments
xr.align returns a tuple of objects, each aligned along the time dimension. Since xarray already has this idea of a tuple of xarray objects, I suggest looking into it as the way to containerize multiple xarray objects. I have written some code that we will use to extract data from the object (and do some other QC when requested). It auto-detects whether the container is a single object or a tuple. It uses a second keyword parameter, "datastream", to search the global attributes for the object that holds the requested variable (since the same variable name can appear in multiple objects). Because this has to work on either a tuple or a single object, I don't think we can use object modifiers; it will need to be a function. If we use the same concept as xr.align, each datastream object we read could be put into a tuple, so that we can pass that single container into a plotting routine. I would like to find a way that does not require merging or aligning the datasets before making a comparison plot, since aligning and adding time steps can be a lot of extra work that we don't really need for just a plot. |
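A minimal sketch of the extraction helper described above. The name `get_variable` and its error handling are hypothetical, not the code referred to in the comment. It is written against the small part of the xarray.Dataset interface it needs (`attrs`, `in`, and indexing) and assumes each object stores its name in a `datastream` global attribute.

```python
def get_variable(container, var_name, datastream=None):
    """Return ``var_name`` from a single dataset or a tuple of datasets.

    Hypothetical helper: ``container`` is either one dataset-like object or a
    tuple/list of them (for example the result of ``xr.align``).
    """
    # Treat a lone dataset as a one-element tuple so both cases share one path.
    if isinstance(container, (tuple, list)):
        datasets = list(container)
    else:
        datasets = [container]

    # Narrow the search to the requested datastream, read from the
    # global attributes (assumed to be stored under the key "datastream").
    if datastream is not None:
        datasets = [ds for ds in datasets
                    if ds.attrs.get("datastream") == datastream]

    matches = [ds for ds in datasets if var_name in ds]
    if not matches:
        raise KeyError(f"{var_name!r} not found (datastream={datastream!r})")
    if len(matches) > 1:
        # The same variable name exists in several objects.
        raise ValueError(
            f"{var_name!r} is in {len(matches)} objects; pass datastream=..."
        )
    return matches[0][var_name]
```

With two objects that both contain `temp_mean`, a call such as `get_variable((ds1, ds2), 'temp_mean', datastream=...)` picks the right one, while omitting `datastream` raises an error instead of guessing.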
The way I'm thinking of doing this is using a dictionary with string keys that map to each datastream. For example, if we have two xarray objects, ds1 and ds2, the input can be: input_dict = {'ds1_name': ds1, 'ds2_name': ds2} In the case of one dataset, I can make the class constructor automatically generate the dictionary from the datastream name if the user does not provide a dictionary. The user could then specify the dataset name and variable in the plot routine. Since we are just plotting data, no merging or aligning should be needed, since matplotlib should automatically account for the different time steps. |
I think storing keys in the dictionary is nicer for finding the correct object, but it differs from the method xarray already implements. Do we want to deviate from the base xarray functionality? As long as it's documented well enough, we can transform a tuple of objects into a dictionary of objects quite simply. I think your plan is worth trying. |
The only issue I see with using xr.align is when we display two datasets that have the same variable name but come from different instruments at the same site. In this case, I don't know how xr.align would handle the names.
In my draft version of the new display object (which I'll open a PR for today and request a review from you), it accepts either a tuple or a dict when given multiple datasets. A tuple is still represented inside the object as a dict whose keys are the datastream property. If you specify a dict, you don't need a datastream property, so non-ARM netCDF files can be displayed.
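The tuple-or-dict handling described above could be sketched as follows. This is an illustration under stated assumptions, not the code from the draft PR: the function name `to_dataset_dict` is hypothetical, and the key for tuple members is assumed to live in `attrs['datastream']`.

```python
def to_dataset_dict(datasets):
    """Normalize a dict, a tuple/list, or a single dataset into a dict.

    Hypothetical helper. An xarray.Dataset is a Mapping but not a ``dict``,
    so a single Dataset falls through to the single-object branch.
    """
    if isinstance(datasets, dict):
        # User-supplied keys: no datastream attribute is required, so
        # files without that attribute can still be displayed.
        return dict(datasets)

    if not isinstance(datasets, (tuple, list)):
        datasets = (datasets,)

    result = {}
    for ds in datasets:
        key = ds.attrs.get("datastream")
        if key is None:
            raise ValueError(
                "dataset has no 'datastream' attribute; pass a dict instead"
            )
        if key in result:
            raise ValueError(f"duplicate datastream name: {key!r}")
        result[key] = ds
    return result
```

Rejecting duplicate keys is a design choice of this sketch: silently overwriting an entry would hide one of the datasets from the plot.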
(Sent by email in reply to Ken Kehoe's comment of Thursday, March 28, 2019.)
|
xarray raises an error if two variables of the same name occur in separate datasets that are merged. From the xarray documentation:
If you merge another dataset (or a dictionary including data array objects), by default the resulting dataset will be aligned on the union of all index coordinates:
In [12]: other = xr.Dataset({'bar': ('x', [1, 2, 3, 4]), 'x': list('abcd')})
In [13]: xr.merge([ds, other])
This ensures that merge is non-destructive. xarray.MergeError is raised if you attempt to merge two variables with the same name but different values:
In [14]: xr.merge([ds, ds + 1])
The same non-destructive merging between DataArray index coordinates is used.
I know that when looking at aircraft data, a common plot is LWC from different sensors on the same time series. While we would hope that LWC would have different names in different datasets, I can see an edge case where it wouldn't. Therefore, I think using the dictionary would help avoid this, so that the user doesn't have to worry about changing variable names. |
I think plotting two or more datasets that use the same variable name will actually be common; for example, plotting all the SGP MET temp_mean values on the same plot. That is where xr.align would work but xr.merge would not: xr.merge puts all data in the same object, while xr.align keeps the objects separate. There will also be many cases where the data to be plotted have different variable names, but some other variable name happens to be common between the datasets. So I think multiple objects sharing a variable name will be a common issue. We could set the default to show the datastream name with the variable only when there is more than one object. I find it helpful to show the datastream name even when the variable names differ, if there are multiple instruments. I think we should make that a plotting keyword option. |
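The difference between xr.align and xr.merge for a shared variable name can be checked with a small example. The datastream names and temperature values below are made up for illustration, and `compat` and `join` are passed explicitly so the behavior does not depend on version-specific defaults.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_met(datastream, start, temps):
    """Build a tiny dataset with one temp_mean variable (illustrative data)."""
    times = pd.date_range(start, periods=len(temps), freq="1min")
    return xr.Dataset(
        {"temp_mean": ("time", np.asarray(temps, dtype=float))},
        coords={"time": times},
        attrs={"datastream": datastream},
    )


# Two instruments, same variable name, overlapping but different time steps.
ds_a = make_met("sgpmetE13.b1", "2019-03-28 00:00", [10.0, 10.5, 11.0])
ds_b = make_met("sgpmetE15.b1", "2019-03-28 00:01", [12.0, 12.5, 13.0])

# xr.align keeps the objects separate: the result is a tuple of two datasets,
# each reindexed onto the union of the time steps and each with its own
# temp_mean.
aligned = xr.align(ds_a, ds_b, join="outer")

# xr.merge tries to put both temp_mean variables into one object and fails
# because their values conflict where the times overlap.
try:
    xr.merge([ds_a, ds_b], compat="no_conflicts", join="outer")
    merge_failed = False
except xr.MergeError:
    merge_failed = True

# A dict keyed by name needs neither step and keeps the original time steps.
datasets = {"E13": ds_a, "E15": ds_b}
```

Note that the aligned copies are padded with NaN at the added time steps, which is extra work a plain plot does not need.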
We now use a dict to store the datasets in the display object...closing. |
Another issue that I need to work on is a cleaner way to plot data from multiple objects at a time in one display. Right now you have to merge objects, but it would be nicer for the display object to natively support displaying data from more than one object at a time, so that the user does not have to create a new object and use extra memory and resources.