
Clean way to store multiple objects in display object #11

Closed
rcjackson opened this issue Mar 24, 2019 · 7 comments

Comments

@rcjackson
Collaborator

Another issue that I need to get working on is a cleaner way to plot data from multiple objects at once in a single display. Right now you have to merge the objects first. It would be nicer if the display object natively supported data from more than one object, so that the user does not have to create a new merged object that takes up extra memory and resources.

@kenkehoe
Contributor

When you use xr.align, it returns a tuple of objects, each aligned on their shared coordinates (here, the time dimension). Since xarray already has this idea of a tuple of xarray objects, I suggest looking into this method of containerizing multiple xarray objects.

I have written some code that we will use to extract data from the object (and do some other QC stuff when requested). It can auto-detect whether the container is an object or a tuple. It uses a second keyword parameter, "datastream", to search the global attributes for the correct requested variable (since the same variable name can be in multiple objects). Since this would work on either a tuple or an object, I don't think we can use object modifiers; it will need to be a function.

If we use the same concept as xr.align, I think we could put each datastream object we read into a tuple, so we can pass that single container into a plotting routine. I would like to find a way that doesn't require merging or aligning the datasets before making a comparison plot, since aligning and adding time steps can be a lot of extra work when we don't really need it for just a plot.
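A minimal sketch of the tuple-container idea follows. It is not the extraction code mentioned above; the helper name `get_variable` is hypothetical, and it assumes each dataset carries a `datastream` global attribute. Note that `xr.align` defaults to `join='inner'`, so the aligned tuple keeps only the time steps the datasets share.

```python
import pandas as pd
import xarray as xr


def get_variable(container, variable, datastream=None):
    """Hypothetical helper: fetch `variable` from a Dataset or a tuple of Datasets.

    For a tuple, the (assumed) 'datastream' global attribute picks the right
    Dataset, since the same variable name can appear in more than one of them.
    """
    if isinstance(container, xr.Dataset):
        return container[variable]
    if isinstance(container, tuple):
        for ds in container:
            if ds.attrs.get('datastream') == datastream and variable in ds:
                return ds[variable]
        raise KeyError(f'{variable!r} not found for datastream {datastream!r}')
    raise TypeError('expected an xarray Dataset or a tuple of Datasets')


# Two toy datasets that share a variable name but have different time steps.
t1 = pd.date_range('2019-03-24', periods=4, freq='1min')
t2 = pd.date_range('2019-03-24', periods=3, freq='2min')
ds1 = xr.Dataset({'temp_mean': ('time', [10.0, 11.0, 12.0, 13.0])},
                 coords={'time': t1}, attrs={'datastream': 'sgpmetE13.b1'})
ds2 = xr.Dataset({'temp_mean': ('time', [20.0, 21.0, 22.0])},
                 coords={'time': t2}, attrs={'datastream': 'sgpmetE15.b1'})

# xr.align returns a tuple; with the default join='inner' only shared times remain.
aligned = xr.align(ds1, ds2)
temp = get_variable(aligned, 'temp_mean', datastream='sgpmetE15.b1')
print(temp.values)  # [20. 21.]
```

The extra time steps are dropped by the alignment itself, which is the "extra work" the comment would like to avoid for a plot.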

@rcjackson
Collaborator Author

The way I'm thinking of doing this is using a dictionary with string keys that map to each datastream. For example, if we have 2 xarray objects ds1 and ds2, the input can be:

input_dict = {'ds1_name': ds1, 'ds2_name': ds2}

For a single dataset, the class constructor can automatically generate the dictionary from the datastream name if the user does not provide one. The user can then specify the dataset name and variable in the plot routine. Since we are only plotting data, no merging or aligning should be needed: matplotlib should handle the different time steps automatically, since each series is plotted against its own times.
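The constructor behaviour described above could look roughly like this. The class name `Display`, the attribute `_obj`, the method `get`, and the fallback key `'ds'` are all hypothetical; this is a sketch assuming a `datastream` global attribute, not the project's actual class.

```python
import xarray as xr


class Display:
    """Hypothetical sketch of a display object that stores datasets in a dict."""

    def __init__(self, obj):
        if isinstance(obj, dict):
            self._obj = dict(obj)
        elif isinstance(obj, xr.Dataset):
            # Single dataset: key it on its (assumed) 'datastream' global
            # attribute, falling back to a placeholder name if it is missing.
            self._obj = {obj.attrs.get('datastream', 'ds'): obj}
        else:
            raise TypeError('expected an xarray Dataset or a dict of Datasets')

    def get(self, dsname, variable):
        """Return one variable from one named dataset; no merge or align needed."""
        return self._obj[dsname][variable]


ds1 = xr.Dataset({'lwc': ('time', [0.1, 0.2])}, attrs={'datastream': 'probe_a'})
ds2 = xr.Dataset({'lwc': ('time', [0.3, 0.4, 0.5])}, attrs={'datastream': 'probe_b'})

multi = Display({'ds1_name': ds1, 'ds2_name': ds2})
single = Display(ds1)
print(list(single._obj))                    # ['probe_a']
print(multi.get('ds2_name', 'lwc').values)  # [0.3 0.4 0.5]
```

The two datasets have different lengths along `time`, and nothing forces them onto a common index.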

@kenkehoe
Contributor

I think storing keys in the dictionary is nicer for finding the correct object, but it's different from the method xarray already implements. Do we want to deviate from the base xarray functionality?

As long as it's documented well enough, we can transform a tuple of objects into a dictionary of objects quite simply. I think your plan is worth trying.
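Converting between the two containers is a one-liner each way. This sketch assumes every dataset has a unique `datastream` global attribute; if two datasets shared a name, the later one would silently overwrite the earlier one in the dict.

```python
import pandas as pd
import xarray as xr

# Toy datasets; the 'datastream' global attribute is assumed present and unique.
time = pd.date_range('2019-03-29', periods=3, freq='1min')
ds_a = xr.Dataset({'temp_mean': ('time', [1.0, 2.0, 3.0])},
                  coords={'time': time}, attrs={'datastream': 'sgpmetE13.b1'})
ds_b = xr.Dataset({'temp_mean': ('time', [4.0, 5.0, 6.0])},
                  coords={'time': time}, attrs={'datastream': 'sgpmetE15.b1'})

objects = xr.align(ds_a, ds_b)                            # tuple, as xarray returns it
by_name = {ds.attrs['datastream']: ds for ds in objects}  # tuple -> dict
back = tuple(by_name.values())                            # dict -> tuple (order kept)
```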

@rcjackson
Collaborator Author

rcjackson commented Mar 29, 2019 via email

@rcjackson
Collaborator Author

If you merge another dataset (or a dictionary including data array objects), by default the resulting dataset will be aligned on the union of all index coordinates:

In [12]: other = xr.Dataset({'bar': ('x', [1, 2, 3, 4]), 'x': list('abcd')})

In [13]: xr.merge([ds, other])
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) object 'a' 'b' 'c' 'd'
  * y        (y) int64 10 20 30
Data variables:
    foo      (x, y) float64 0.4691 -0.2829 -1.509 -1.136 ... nan nan nan nan
    bar      (x) int64 1 2 3 4

This ensures that merge is non-destructive. xarray.MergeError is raised if you attempt to merge two variables with the same name but different values:

In [14]: xr.merge([ds, ds + 1])
MergeError: conflicting values for variable 'foo' on objects to be combined:
first value: <xarray.Variable (x: 2, y: 3)>
array([[ 0.4691123 , -0.28286334, -1.5090585 ],
       [-1.13563237,  1.21211203, -0.17321465]])
second value: <xarray.Variable (x: 2, y: 3)>
array([[ 1.4691123 ,  0.71713666, -0.5090585 ],
       [-0.13563237,  2.21211203,  0.82678535]])

The same non-destructive merging between DataArray index coordinates is used.
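The quoted behaviour can be reproduced with a small script. The values of `foo` here come from a seeded random generator, so they differ from the numbers in the quoted output. `compat='no_conflicts'` and `join='outer'` are passed explicitly (an assumption that they match the defaults the quoted docs describe) so the result does not depend on the installed xarray version's defaults.

```python
import numpy as np
import xarray as xr

# Recreate the two objects from the quoted docs (random values for 'foo').
ds = xr.Dataset(
    {'foo': (('x', 'y'), np.random.default_rng(0).standard_normal((2, 3)))},
    coords={'x': ['a', 'b'], 'y': [10, 20, 30]},
)
other = xr.Dataset({'bar': ('x', [1, 2, 3, 4])}, coords={'x': list('abcd')})

# Outer join: the result spans the union of the 'x' labels; 'foo' is NaN at 'c', 'd'.
merged = xr.merge([ds, other], compat='no_conflicts', join='outer')
print(dict(merged.sizes))  # {'x': 4, 'y': 3}

# Same variable name, different values at the same coordinates -> MergeError.
try:
    xr.merge([ds, ds + 1], compat='no_conflicts', join='outer')
    raised = False
except xr.MergeError:
    raised = True
print(raised)  # True
```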

I know that when looking at aircraft data, a common plot is LWC from different sensors on the same time series. While we would hope that LWC has different names in different datasets, I can see an edge case where it wouldn't. Using the dictionary would avoid this problem, so the user doesn't have to worry about renaming variables.
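A small sketch of the LWC edge case, using two toy probes that both call their variable `lwc` (the probe names and values are made up for illustration). Merging them fails, renaming is the workaround if everything must live in one Dataset, and a dict keeps both objects untouched.

```python
import pandas as pd
import xarray as xr

# Two toy aircraft probes that both name their liquid water content 'lwc'.
time = pd.date_range('2019-03-29 12:00', periods=3, freq='1min')
probe_a = xr.Dataset({'lwc': ('time', [0.10, 0.12, 0.15])}, coords={'time': time})
probe_b = xr.Dataset({'lwc': ('time', [0.11, 0.14, 0.13])}, coords={'time': time})

# Merging fails: the same name holds different values at the same times.
try:
    xr.merge([probe_a, probe_b], compat='no_conflicts', join='outer')
    merge_failed = False
except xr.MergeError:
    merge_failed = True

# Renaming is the workaround when everything must live in one Dataset...
renamed = xr.merge(
    [probe_a.rename({'lwc': 'lwc_a'}), probe_b.rename({'lwc': 'lwc_b'})],
    compat='no_conflicts', join='outer',
)

# ...whereas a dict keeps both objects untouched and addressable by name.
datasets = {'probe_a': probe_a, 'probe_b': probe_b}
```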

@kenkehoe
Contributor

I think plotting two or more datasets that use the same variable name will actually be common, for example plotting all the SGP MET temp_mean values on the same plot. That is where xr.align would work but xr.merge would not: xr.merge puts all the data in the same object, while xr.align keeps the objects separate. There will also be many cases where the variables to be plotted have different names, but some other variable name happens to be common between the datasets. So I think multiple objects sharing a variable name will be a common issue.

By default, we could show the datastream name alongside the variable only when there is more than one object. I also find it helpful to show the datastream name when there are multiple instruments, even when the variable names differ. I think we should make that a plotting keyword option.
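One way the suggested keyword could behave is sketched below. The function `plot_timeseries` and its `show_datastream` parameter are hypothetical names: `None` labels lines with the datastream name only when more than one dataset is plotted, while `True`/`False` force it on or off. Each series is drawn against its own time coordinate, so the datasets are neither merged nor aligned.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr


def plot_timeseries(datasets, variable, ax, show_datastream=None):
    """Hypothetical plotting routine for a dict of datasets keyed by datastream."""
    if show_datastream is None:
        # Default: include the datastream name only if several datasets are shown.
        show_datastream = len(datasets) > 1
    for name, ds in datasets.items():
        da = ds[variable]
        label = f'{name}: {variable}' if show_datastream else variable
        # Each series uses its own time coordinate; no merge or align is needed.
        ax.plot(da['time'].values, da.values, label=label)
    ax.legend()
    return ax


t1 = pd.date_range('2019-03-29', periods=4, freq='1min')
t2 = pd.date_range('2019-03-29', periods=3, freq='2min')
datasets = {
    'sgpmetE13.b1': xr.Dataset({'temp_mean': ('time', [15.0, 15.2, 15.4, 15.5])},
                               coords={'time': t1}),
    'sgpmetE15.b1': xr.Dataset({'temp_mean': ('time', [16.0, 16.1, 16.3])},
                               coords={'time': t2}),
}

fig, ax = plt.subplots()
plot_timeseries(datasets, 'temp_mean', ax)
labels = [line.get_label() for line in ax.get_lines()]
print(labels)  # ['sgpmetE13.b1: temp_mean', 'sgpmetE15.b1: temp_mean']
```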

@rcjackson
Collaborator Author

We now use a dict to store the datasets in the display object...closing.


2 participants