Global slice and sort (#397) #406

Merged
merged 15 commits into from Oct 6, 2017

2 participants
@nickhand
Member

nickhand commented Sep 22, 2017

Adds global sort and global slice functions to CatalogSource (and CatalogMesh) objects

@nickhand nickhand requested a review from rainwoodman Sep 22, 2017

@nickhand

Member

nickhand commented Sep 22, 2017

Is this what you had in mind @rainwoodman ?

Execute a global slice of a CatalogSource.
.. note::
After the global slice is performed, the data is scattered

@rainwoodman

rainwoodman Sep 22, 2017

Member

I wonder if it is useful to avoid the scattering with an option?

@nickhand

nickhand Sep 24, 2017

Member

I can't think of a use case right now, but I don't think it hurts to include the option.

@rainwoodman

rainwoodman Sep 25, 2017

Member

Avoiding the scattering probably means the compute can be deferred further?

@nickhand

nickhand Sep 25, 2017

Member

Yes, I think so. I wonder if it is possible to implement this with no compute() calls? We can't use Scatter/GatherArray for dask arrays, but could we bcast the appropriate slices of them?
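A minimal sketch of the bookkeeping this idea would seem to need: given how many rows each rank holds, work out which local range of each rank falls inside a global `[start, stop)` slice. The function name `local_slices` and the `sizes` input are hypothetical (in an MPI setting the sizes would presumably come from an allgather of the local lengths); this is not nbodykit code. Because slicing a dask array is lazy, applying these local ranges would not by itself force a compute.

```python
def local_slices(sizes, start, stop):
    """Map a global [start, stop) range onto per-rank local ranges.

    sizes : number of rows held by each rank, in rank order (hypothetical input)
    Returns one (lo, hi) pair per rank; lo == hi means the rank contributes nothing.
    """
    total = sum(sizes)
    # clip the requested range to the valid global range
    start = max(0, min(start, total))
    stop = max(start, min(stop, total))

    out = []
    offset = 0  # global index of the current rank's first row
    for n in sizes:
        lo = min(max(start - offset, 0), n)
        hi = min(max(stop - offset, 0), n)
        out.append((lo, hi))
        offset += n
    return out


# three ranks holding 4, 3 and 5 rows; global slice 2:9
print(local_slices([4, 3, 5], 2, 9))  # [(2, 4), (0, 3), (0, 2)]
```

Only unit-step slices are handled here; a strided global slice would need an extra per-rank starting offset.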

@rainwoodman

Member

rainwoodman commented Sep 23, 2017

I have to find a larger chunk of time to read through this. But from a brief read, it seems this does not support slicing a ColumnAccessor. Could it be useful to globally slice a ColumnAccessor?

@nickhand

Member

nickhand commented Sep 24, 2017

No problem, let's push this to v0.2.8 anyways.

I think I agree. It would be nice to be able to call both sort() and gslice() on a ColumnAccessor.

@rainwoodman

Member

rainwoodman commented Sep 25, 2017

The intended main use case of gslice is for abundance matching. Gslicing on a column accessor does not fit into this context.

Currently a gslice will trigger a full scan of the catalog, won't it?
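For context, a rough sketch of why sort plus global slice suits this use case, under one common reading of abundance matching: rank-order objects by a mass proxy and keep the top N needed to reach a target number density. The function `select_by_abundance` and its parameters are hypothetical, written with plain Python lists rather than a CatalogSource.

```python
def select_by_abundance(masses, nbar, volume):
    """Return indices of the N = round(nbar * volume) most massive objects.

    masses : per-object mass proxy (hypothetical column)
    nbar   : target number density
    volume : volume of the box, in units consistent with nbar
    """
    n = round(nbar * volume)
    # "global sort": order object indices by descending mass
    order = sorted(range(len(masses)), key=lambda i: masses[i], reverse=True)
    # "global slice": keep the first n entries of the sorted catalog
    return order[:n]


masses = [3.0, 9.5, 1.2, 7.1, 5.4]
print(select_by_abundance(masses, nbar=0.003, volume=1000.0))  # [1, 3, 4]
```

The slice is taken on the sorted catalog as a whole, which is why a per-column slice is of little use in this context.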

@nickhand

Member

nickhand commented Sep 25, 2017

As it's written now, it computes the whole catalog and re-scatters it evenly after the slice. In theory, we could pass around the dask arrays instead?
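The behaviour described here (compute everything, slice, re-scatter evenly) can be mimicked without MPI as a rough illustration. In this sketch each rank's data is a plain list, and `gather_slice_scatter` is a hypothetical name, not the function added in this PR; the even-split rule (earlier ranks get any leftover rows) is also an assumption.

```python
def gather_slice_scatter(chunks, index):
    """Simulate a global slice over per-rank chunks (plain lists)."""
    nranks = len(chunks)
    # "gather": materialize the full catalog in rank order
    full = [row for chunk in chunks for row in chunk]
    # apply the global slice to the full catalog
    selected = full[index]
    # "scatter": split evenly; the first (len % nranks) ranks get one extra row
    base, extra = divmod(len(selected), nranks)
    out, pos = [], 0
    for rank in range(nranks):
        n = base + (1 if rank < extra else 0)
        out.append(selected[pos:pos + n])
        pos += n
    return out


chunks = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
print(gather_slice_scatter(chunks, slice(2, 9)))  # [[2, 3, 4], [5, 6], [7, 8]]
```

The gather step is where the full scan of the catalog happens, so the cost is independent of how small the requested slice is.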

@rainwoodman

Member

rainwoodman commented Sep 25, 2017

Perhaps leave a few comments in the code to describe how this 'could be done' without actually doing it. In reality, if we are abundance matching, we can probably read in the entire data set easily in most cases (especially since we have already done a sort). The IO is cut off from the point where we have the abundance-matched catalog, and that's not a bad thing.

The situation may change if we are aiming to make catalogs for LSST or BOA.

@nickhand

Member

nickhand commented Oct 6, 2017

This looks ready too so I'll merge this in for 0.2.8

@nickhand nickhand merged commit d554caf into master Oct 6, 2017

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed
coverage/coveralls Coverage increased (+0.09%) to 95.265%

@nickhand nickhand deleted the global-slice-sort branch Oct 6, 2017
