Clean way to store multiple objects in display object #11

rcjackson · 2019-03-24T10:47:57Z

Another issue that I need to get working on is to have a cleaner way to plot data from multiple objects at a time in one display. Right now you have to merge objects, but it would be nicer to have the display object natively support the display of data from more than one object at a time so that the user does not have to make a new object and hog up memory and resources.

kenkehoe · 2019-03-25T16:43:44Z

When you use xr.align, it returns a tuple of objects, each aligned along their shared index coordinates (for our data, the time dimension). Since xarray already has this idea of a tuple of xarray objects, I suggest looking into this method of containerizing multiple xarray objects.

I have written some code that we will use to extract data from the object (and do some other QC stuff when requested). It can auto-detect whether the container is an object or a tuple. It uses a second keyword parameter, "datastream", to search the global attributes for the correct requested variable (since the same variable name can be in multiple objects). Since this would work on a tuple or an object, I don't think we can use object modifiers, and we will need to use it as a function.

If we use the same concept as xr.align, then I think we could put each datastream object that is read into a tuple to containerize it, so we can pass that single container into a plotting routine. I would like to find a way where we don't require merging or aligning the datasets before making a comparison plot, as that can be a lot of extra work to align and add time steps when we don't really need to for just a plot.
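A minimal sketch of the extraction function described above, assuming the datastream name is stored in each Dataset's global attribute `attrs['datastream']` (an assumption; the real attribute name may differ). The function name `get_data` and the toy `sgpmet...` Datasets are hypothetical, for illustration only; the QC handling is not shown.

```python
import numpy as np
import pandas as pd
import xarray as xr


def get_data(container, variable, datastream=None):
    """Hypothetical helper: return `variable` from a Dataset or a tuple of Datasets.

    When `container` is a tuple (e.g. the output of xr.align), `datastream`
    picks the Dataset whose global attribute 'datastream' (assumed name)
    matches, because the same variable name can exist in several Datasets.
    """
    if isinstance(container, xr.Dataset):
        return container[variable]
    if isinstance(container, tuple):
        for ds in container:
            # Skip Datasets from other datastreams when one was requested.
            if datastream is not None and ds.attrs.get("datastream") != datastream:
                continue
            if variable in ds:
                return ds[variable]
        raise KeyError(f"{variable!r} not found for datastream {datastream!r}")
    raise TypeError("container must be an xarray Dataset or a tuple of Datasets")


# Two toy MET-like Datasets that share the variable name 'temp_mean'.
time = pd.date_range("2019-03-25", periods=3, freq="1min")
ds_e13 = xr.Dataset({"temp_mean": ("time", np.array([10.0, 10.5, 11.0]))},
                    coords={"time": time}, attrs={"datastream": "sgpmetE13.b1"})
ds_e15 = xr.Dataset({"temp_mean": ("time", np.array([12.0, 12.5, 13.0]))},
                    coords={"time": time}, attrs={"datastream": "sgpmetE15.b1"})

container = (ds_e13, ds_e15)  # could equally come from xr.align(ds_e13, ds_e15)
temp = get_data(container, "temp_mean", datastream="sgpmetE15.b1")
print(temp.values)
```

Because the function branches on the container type, it works the same whether it is handed a single Dataset or a tuple, which is why it is written as a plain function rather than a Dataset method.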

rcjackson · 2019-03-27T12:37:45Z

The way I'm thinking of doing this is using a dictionary with string keys that map to each datastream. For example, if we have 2 xarray objects ds1 and ds2, the input can be:

input_dict = {'ds1_name': ds1, 'ds2_name': ds2}

In the case of one dataset, I can make the class constructor automatically generate the dictionary, keyed by the datastream name, if the user does not provide one. The user can then specify the dataset name and variable in the plot routine. Since we are just plotting data, no merging or aligning should be needed; matplotlib should automatically account for the different time steps.
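A minimal sketch of that constructor and plot routine, assuming the single-Dataset key comes from `attrs['datastream']` (assumed attribute name). The class `DisplaySketch`, its `_arm` dict, and the `plot(field, dsname=...)` signature are hypothetical, not the actual display object. The two toy Datasets have different time steps and are plotted without merging or aligning.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no GUI needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr


class DisplaySketch:
    """Hypothetical display object that stores Datasets in a dict."""

    def __init__(self, obj):
        if isinstance(obj, dict):
            self._arm = dict(obj)
        elif isinstance(obj, xr.Dataset):
            # Single Dataset: build the key from its datastream attribute
            # (assumed to live in attrs['datastream']; falls back to "ds").
            self._arm = {obj.attrs.get("datastream", "ds"): obj}
        else:
            raise TypeError("expected an xarray Dataset or a dict of Datasets")
        self.fig, self.ax = plt.subplots()

    def plot(self, field, dsname=None):
        if dsname is None:
            if len(self._arm) != 1:
                raise ValueError("dsname is required when several Datasets are stored")
            dsname = next(iter(self._arm))
        da = self._arm[dsname][field]
        # Each Dataset keeps its own time axis; matplotlib draws every line
        # on the shared date axis, so no merge/align step is needed.
        self.ax.plot(da["time"].values, da.values, label=f"{dsname}: {field}")
        self.ax.legend()
        return self.ax


# Toy data: 1-minute and 5-minute time steps.
t1 = pd.date_range("2019-03-27", periods=60, freq="1min")
t2 = pd.date_range("2019-03-27", periods=12, freq="5min")
ds1 = xr.Dataset({"temp_mean": ("time", np.linspace(10, 12, 60))}, coords={"time": t1})
ds2 = xr.Dataset({"temp_mean": ("time", np.linspace(11, 13, 12))}, coords={"time": t2})

disp = DisplaySketch({"ds1_name": ds1, "ds2_name": ds2})
disp.plot("temp_mean", dsname="ds1_name")
disp.plot("temp_mean", dsname="ds2_name")
```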

kenkehoe · 2019-03-29T02:03:30Z

I think storing the objects in a dictionary with keys is nicer for finding the correct object, but it's different from the method xarray already implements. Do we want to deviate from the base xarray functionality?

As long as it's documented well enough, we can transform a tuple of objects into a dictionary of objects quite simply. I think your plan is worth trying.

rcjackson · 2019-03-29T14:26:16Z

The only issue I see with using xr.align is if we display two datasets that have the same variable name but come from different instruments at the same site. In this case, I don't know how align would handle the names. In my draft version of the new display object (which I'll open a PR for today and request a review from you), it accepts either a tuple or a dict when taking in multiple datasets. With a tuple, the datasets are still stored as a dict in the object, keyed by each dataset's datastream property. If you pass a dict, you don't need a datastream property, so non-ARM netCDF files can be displayed.
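The tuple-or-dict handling described here (and the tuple-to-dict transformation suggested earlier) could look roughly like the following sketch. It assumes the datastream property is read from `attrs['datastream']` (assumed name); `normalize_datasets` is a hypothetical helper, not the draft PR's code.

```python
import xarray as xr


def normalize_datasets(obj):
    """Hypothetical helper: return a dict of Datasets keyed by name.

    - dict: used as-is, so keys can be any user-chosen names (no datastream
      attribute required, e.g. non-ARM netCDF files).
    - tuple: keys come from each Dataset's datastream attribute
      (assumed to be attrs['datastream']).
    - single Dataset: treated as a one-element tuple.
    """
    if isinstance(obj, dict):
        return dict(obj)
    if isinstance(obj, xr.Dataset):
        obj = (obj,)
    if isinstance(obj, tuple):
        out = {}
        for ds in obj:
            name = ds.attrs.get("datastream")
            if name is None:
                raise ValueError("tuple entries need a datastream attribute; pass a dict instead")
            if name in out:
                raise ValueError(f"duplicate datastream name {name!r}")
            out[name] = ds
        return out
    raise TypeError("expected a Dataset, a tuple of Datasets, or a dict")


a = xr.Dataset(attrs={"datastream": "sgpmetE13.b1"})
b = xr.Dataset(attrs={"datastream": "sgpmetE15.b1"})
c = xr.Dataset()  # e.g. a non-ARM file with no datastream attribute

print(sorted(normalize_datasets((a, b))))
print(list(normalize_datasets({"my_file": c})))
```

Raising on a missing or duplicated datastream name is a design choice in this sketch: it steers the user toward the dict form instead of silently overwriting an entry.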


rcjackson · 2019-03-29T14:34:28Z

If you merge another dataset (or a dictionary including data array objects), by default the resulting dataset will be aligned on the union of all index coordinates:

In [12]: other = xr.Dataset({'bar': ('x', [1, 2, 3, 4]), 'x': list('abcd')})

In [13]: xr.merge([ds, other])
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) object 'a' 'b' 'c' 'd'
  * y        (y) int64 10 20 30
Data variables:
    foo      (x, y) float64 0.4691 -0.2829 -1.509 -1.136 ... nan nan nan nan
    bar      (x) int64 1 2 3 4

This ensures that merge is non-destructive. xarray.MergeError is raised if you attempt to merge two variables with the same name but different values:

In [14]: xr.merge([ds, ds + 1])
MergeError: conflicting values for variable 'foo' on objects to be combined:
first value: <xarray.Variable (x: 2, y: 3)>
array([[ 0.4691123 , -0.28286334, -1.5090585 ],
       [-1.13563237,  1.21211203, -0.17321465]])
second value: <xarray.Variable (x: 2, y: 3)>
array([[ 1.4691123 ,  0.71713666, -0.5090585 ],
       [-0.13563237,  2.21211203,  0.82678535]])

The same non-destructive merging between DataArray index coordinates is used.
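The session above relies on a `ds` defined elsewhere. Here is a self-contained version of both behaviors, using a toy `ds` with deterministic values in place of the random data. `join="outer"` and `compat="no_conflicts"` are passed explicitly so the result does not depend on xarray's default values for those arguments.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the `ds` used in the session above.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": ["a", "b"], "y": [10, 20, 30]},
)
other = xr.Dataset({"bar": ("x", [1, 2, 3, 4])}, coords={"x": list("abcd")})

# Outer join: the result is aligned on the union of the x labels, so 'foo'
# gets NaN for the new labels 'c' and 'd'.
merged = xr.merge([ds, other], join="outer", compat="no_conflicts")
print(dict(merged.sizes))

# Same variable name with different values: merge refuses to combine them.
try:
    xr.merge([ds, ds + 1], join="outer", compat="no_conflicts")
except xr.MergeError as err:
    print("MergeError:", err)
```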

I know that when looking at aircraft data, a common plot is LWC from different sensors on the same time series. While we would hope that LWC would have different names in different datasets, I can see an edge case where it wouldn't. Therefore, I think using the dictionary would help avoid this problem, so that the user doesn't have to worry about renaming variables.

kenkehoe · 2019-03-29T17:05:00Z

I think plotting two or more datasets that use the same variable name will actually be common, for example plotting all the SGP MET temp_mean values on the same plot. That is where xr.align would work but xr.merge would not: xr.merge puts all the data in the same object, while xr.align keeps the objects separate. There will also be many cases where the variables to be plotted have different names, but some other variable name happens to be shared between the datasets. So I think multiple objects sharing a variable name will come up often.
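To illustrate the difference with two toy MET-like Datasets that both carry `temp_mean` (hypothetical data with partially overlapping times): xr.align returns a tuple in which each object keeps its own `temp_mean`, while xr.merge tries to build one object and fails where the overlapping values differ.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same variable name, partially overlapping 1-minute time axes.
t1 = pd.date_range("2019-03-29 00:00", periods=4, freq="1min")
t2 = pd.date_range("2019-03-29 00:02", periods=4, freq="1min")
met_e13 = xr.Dataset({"temp_mean": ("time", np.array([10.0, 10.1, 10.2, 10.3]))},
                     coords={"time": t1})
met_e15 = xr.Dataset({"temp_mean": ("time", np.array([12.0, 12.1, 12.2, 12.3]))},
                     coords={"time": t2})

# xr.align returns a tuple; each object keeps its own 'temp_mean',
# reindexed onto the union of the two time axes (NaN where missing).
a13, a15 = xr.align(met_e13, met_e15, join="outer")
print(a13.sizes["time"], a15.sizes["time"])

# xr.merge builds a single object, so the two 'temp_mean' variables clash
# at the overlapping times 00:02 and 00:03.
try:
    xr.merge([met_e13, met_e15], join="outer", compat="no_conflicts")
except xr.MergeError as err:
    print("MergeError:", err)
```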

We could set the default to show the datastream name alongside the variable only when there is more than one object. I find it helpful to show the datastream name whenever there are multiple instruments, even when the variable names differ. I think we should make that a plotting keyword option.
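One way the proposed keyword could behave, as a sketch: a tri-state option where `None` applies the default (prefix the datastream only when more than one object is displayed) and `True`/`False` force it. The function `make_label` and the keyword name `show_datastream` are hypothetical.

```python
def make_label(variable, datastream, n_objects, show_datastream=None):
    """Hypothetical legend-label helper for a multi-object display.

    show_datastream=None -> default: include the datastream name only when
    more than one object is displayed. True/False force it on or off.
    """
    if show_datastream is None:
        show_datastream = n_objects > 1
    return f"{datastream}: {variable}" if show_datastream else variable


print(make_label("temp_mean", "sgpmetE13.b1", n_objects=1))
print(make_label("temp_mean", "sgpmetE13.b1", n_objects=2))
print(make_label("temp_mean", "sgpmetE13.b1", n_objects=1, show_datastream=True))
```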

rcjackson · 2019-04-11T20:04:56Z

We now use a dict to store the datasets in the display object...closing.

rcjackson mentioned this issue Mar 29, 2019

Display object reorganization. #16

Merged

rcjackson closed this as completed Apr 11, 2019
