Skip to content

Separate array and dataframe mindeps builds#8079

Merged
jrbourbeau merged 7 commits intodask:mainfrom
jrbourbeau:separate-array-dataframe-mindeps
Aug 25, 2021
Merged

Separate array and dataframe mindeps builds#8079
jrbourbeau merged 7 commits intodask:mainfrom
jrbourbeau:separate-array-dataframe-mindeps

Conversation

@jrbourbeau
Copy link
Copy Markdown
Member

This PR separates our existing array + dataframe mindeps CI build into separate builds for array and dataframe. This is to ensure that dask.array and dask.dataframe only depend on their specified minimum dependencies (i.e. numpy and numpy + pandas, respectively).

xref #8078

@jrbourbeau
Copy link
Copy Markdown
Member Author

There's the ModuleNotFoundError: No module named 'pandas' from #8078

@github-actions github-actions bot added array dataframe dispatch Related to `Dispatch` extension objects labels Aug 23, 2021
@github-actions github-actions bot removed dataframe dispatch Related to `Dispatch` extension objects labels Aug 23, 2021
@jrbourbeau
Copy link
Copy Markdown
Member Author

cc @galipremsagar @quasiben

--------
numpy.percentile : Numpy's equivalent Percentile function
"""
from dask.dataframe.dispatch import _percentile
Copy link
Copy Markdown
Contributor

@galipremsagar galipremsagar Aug 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be not going through the dispatch _percentile and instead going through local _percentile method. Will it not result in dispatch of the registered backends never being used ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I was initially concerned with making whatever were needed to remove the pandas dependency in dask.array. I agree we may need further updates to get back the functionality added in #8029.

Since dask.dataframe depends on NumPy and pandas, we can import dask.array code inside dask.dataframe, but not the other way around. Is there a way we could move that dispatching logic from dask.dataframe into dask.array instead?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we could move that dispatching logic from dask.dataframe into dask.array instead?

I think we can move these to dask.array dispatch and backend: https://github.com/dask/dask/blob/main/dask/dataframe/dispatch.py#L25-L29
&
https://github.com/dask/dask/blob/main/dask/dataframe/backends.py#L538-L540

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind pushing up a PR which handles the dask.dataframe -> dask.array migration?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened a PR: #8083, could you review it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @quasiben

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @galipremsagar! Just left comments over in #8083

Copy link
Copy Markdown
Contributor

@galipremsagar galipremsagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jrbourbeau !

@jrbourbeau
Copy link
Copy Markdown
Member Author

Thanks for reviewing @galipremsagar!

Just a note to others that gpuCI will start passing once rapidsai/cudf#9118 is in

@jrbourbeau jrbourbeau merged commit 85f0b14 into dask:main Aug 25, 2021
@jrbourbeau jrbourbeau deleted the separate-array-dataframe-mindeps branch August 25, 2021 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants