
Addressed feedback. #1

Merged
merged 3 commits into from
Apr 23, 2018

Conversation

mmccarty (Member)

No description provided.


     def read_chunked(self):
-        raise Exception('read_chunked not supported for xarray containers.')
+        self._load_metadata()
+        return self._ds

     def read_partition(self, i):
         raise Exception('read_partition not supported for xarray containers.')
mmccarty (Member Author):

@martindurant read_partition is not implemented yet. I'm wondering if we need to support it.

martindurant (Member):

Right. To my mind, it should be implemented at some point, but the input is a tuple. I would keep an exception for now (but should be NotImplemented, because we may do it later).
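As a sketch of the reviewer's suggestion - using a hypothetical stand-in class, not the PR's actual code - the method would raise `NotImplementedError` rather than a bare `Exception`:

```python
class XarraySourceSketch:
    """Hypothetical stand-in for the PR's source class (illustrative only)."""

    def read_partition(self, i):
        # NotImplementedError signals that partitioned reads are a planned
        # feature, not a permanent limitation.
        raise NotImplementedError(
            'read_partition is not yet supported for xarray containers.')
```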


-    def __init__(self, urlpath, xarray_kwargs=None, metadata=None):
+    def __init__(self, urlpath, chunks, xarray_kwargs=None, metadata=None):
mmccarty (Member Author):


Added chunks to the signature and docstring.
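A minimal sketch of how the new `chunks` argument might flow through to xarray - only the signature comes from the diff above; the body and class name are assumptions:

```python
class NetCDFSourceSketch:
    """Illustrative constructor; names mirror the diff, body is assumed."""

    def __init__(self, urlpath, chunks, xarray_kwargs=None, metadata=None):
        self.urlpath = urlpath
        self.chunks = chunks                  # e.g. {'time': 100} for dask-backed arrays
        self._kwargs = xarray_kwargs or {}
        self.metadata = metadata or {}
        self._ds = None

    def _open_dataset(self):
        # Deferred import so the sketch loads without xarray installed.
        import xarray as xr
        # Passing chunks= makes xarray back the variables with dask arrays.
        self._ds = xr.open_dataset(self.urlpath, chunks=self.chunks,
                                   **self._kwargs)
```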

@@ -70,17 +77,17 @@ def _get_schema(self):

     def read(self):
         self._load_metadata()
-        return self._ds
+        return self._ds.load()
mmccarty (Member Author):


Loads all the data.


     def read_partition(self, i):
         raise Exception('read_partition not supported for xarray containers.')

     def to_dask(self):
         self._load_metadata()
-        return self._ds.to_dask_dataframe()
+        return self.read_chunked()
mmccarty (Member Author):


read_chunked will return the dataset as a lazy container, which I believe is also what we want for the to_dask case.

martindurant (Member):

Agree, this should just return the xarray directly (which points to dask arrays); this would be self._ds if you agree that read_chunked should be an exception.
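Putting the thread's conclusions together - a hedged sketch of the agreed semantics, not the merged code - the three methods would behave as follows (`ds` stands in for a dask-backed `xarray.Dataset`):

```python
class XarrayContainerSketch:
    """Summarises the semantics discussed above (illustrative only)."""

    def __init__(self, ds):
        self._ds = ds  # a dask-backed xarray.Dataset in the real plugin

    def read(self):
        # Materialise all data into memory.
        return self._ds.load()

    def to_dask(self):
        # The dataset is already lazily chunked, so hand it back as-is.
        return self._ds

    def read_chunked(self):
        # No iterable chunk form is defined yet, per the review discussion.
        raise NotImplementedError('read_chunked not yet supported')
```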

mmccarty self-assigned this Mar 27, 2018
martindurant (Member) left a comment:

I like what I see here.

I feel some other members of the team should think about the issues around xarray in general.

You should enable Travis on this repo.

         self._ds = None
         super(NetCDFSource, self).__init__(
-            container=self.container,
+            container=None,
martindurant (Member):

Not xarray?

mmccarty (Member Author):


The container is not used since we are overloading the read* methods (Intake plugins the "hard" way). I'd rather give no information than false information.


     def read_chunked(self):
-        raise Exception('read_chunked not supported for xarray containers.')
+        self._load_metadata()
+        return self._ds
martindurant (Member):

Yeah, so this is approximately correct, but it doesn't give something you can iterate over. I'm not sure whether this should be an exception just like get_partition.
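For "something you can iterate over", here is a library-free sketch of chunked iteration along the first dimension - `iter_chunks` is a hypothetical helper, not part of the PR:

```python
def iter_chunks(seq, chunk_size):
    """Yield successive blocks of `seq` along its first dimension."""
    for start in range(0, len(seq), chunk_size):
        yield seq[start:start + chunk_size]

# Ten elements in blocks of four -> three blocks, the last one short.
blocks = list(iter_chunks(list(range(10)), 4))
# blocks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```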



martindurant (Member):

The Travis script seems to be missing install dependencies.
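A minimal sketch of what the missing install section might look like - the package names and Python version are assumptions, not taken from the repo:

```yaml
language: python
python:
  - "3.6"
install:
  - pip install xarray dask netcdf4 pytest
  - pip install -e .
script:
  - pytest
```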

mmccarty (Member Author) commented Mar 31, 2018 via email

martindurant (Member):

That's fine, enjoy your relaxation

martindurant (Member):

Notes on our conversation about the future of this PR/repo (cc @jbcrail , but note that @mmccarty will presumably return to work on this when he has time). I believe the following is a reasonable course of action.

  • A new container type, "xarray", should be accepted by Intake. The built-in functionality in the source class will be overridden.
  • We must consider carefully what this means for an xarray opened on an Intake server - presumably communication will be the same as for an ndarray, which doesn't yet exist (right?). Note that being able to load netCDF/HDF from a remote location would be a huge boon - there are servers whose only job is exactly that, because it is so useful - so can we make it happen? We would need to create each variable as a dask array, where any chunk calls the server with its multi-dimensional index, and create a local xarray that stores these dask arrays in the same arrangement and with the same metadata as the remote one.
  • The natural representation of an open xarray is the open xarray object itself, and that is what discover() should return. Also, the arrays should be chunked from the start, so to_dask() is a no-op, and read() should call whatever xarray function materialises the data into memory.
  • This repo should be renamed intake-xarray and include three separate plugins: netCDF, which opens one or more files (these are separate functions in xarray); rasterio; and zarr. The last is the only one that can directly open files remotely, and we should take care to parse s3:, hdfs:, and gcs: URLs and create the mappers that zarr needs (I'll help with that). It would be nice if an unstructured zarr array returned an xarray DataArray rather than a Dataset, although maybe that should be a separate plugin. Note again that we have no array readers at all, not even numpy (never mind scientific formats).
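The per-chunk server-call idea above can be sketched without any libraries: enumerate the multi-dimensional block indices, and have each fetch request exactly one block. Both `chunk_indices` and `fetch_chunk` are illustrative names, not part of Intake:

```python
import itertools

def chunk_indices(shape, chunk_shape):
    """All multi-dimensional chunk indices for an array of `shape`
    split into blocks of `chunk_shape` (hypothetical helper)."""
    counts = [-(-s // c) for s, c in zip(shape, chunk_shape)]  # ceil division
    return list(itertools.product(*(range(n) for n in counts)))

def fetch_chunk(varname, index):
    """Stand-in for a server call: in the real design, each dask chunk
    would request exactly one block by its multi-dimensional index."""
    return {'var': varname, 'index': index}

# A 100x60 variable in 50x30 blocks -> 2x2 = 4 server calls.
calls = [fetch_chunk('temperature', ix)
         for ix in chunk_indices((100, 60), (50, 30))]
```

In the real plugin, each `fetch_chunk` result would become one block of a dask array, and the local xarray would assemble those dask arrays with the remote metadata.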

mmccarty (Member Author):

Thanks @martindurant, I'll try to find spare cycles to make progress. I went ahead and renamed this GH repo.

We should probably go ahead and merge this PR and create issues to follow up. For tracking, I moved this bullet list to an issue and created an issue for fixing CI.

martindurant (Member):

Merging to allow further progress in subsequent PRs.

martindurant merged commit d43c042 into master Apr 23, 2018
martindurant deleted the initial-feedback branch April 23, 2018 14:47