Skip to content

Add Series.dot method to dataframe module#7236

Merged
jsignell merged 6 commits intodask:mainfrom
Madhu94:add-series-dot
Mar 19, 2021
Merged

Add Series.dot method to dataframe module#7236
jsignell merged 6 commits intodask:mainfrom
Madhu94:add-series-dot

Conversation

@Madhu94
Copy link
Copy Markdown
Contributor

@Madhu94 Madhu94 commented Feb 16, 2021

Ref: #1259

I had a few questions about how to attempt this so I am marking this WIP.

  1. Should I handle the case for dot product of a dask series and a dask array? I didn't do so because a lot of the underlying infrastructure (like the _extract_meta utility) assumed the operands were all either dask series or dask dataframes.
  2. Should I implement this for dask dataframe as part of this PR too or can that be done separately?
  3. Should the method accept a meta kwarg too, for the case when the other operand could be a dataframe?

Any other feedback on the code would really help, thanks in advance!

@Madhu94 Madhu94 marked this pull request as draft February 16, 2021 21:41
@jsignell
Copy link
Copy Markdown
Member

Thanks for opening this! I'll give you my opinionated answers :)

  1. Should I handle the case for dot product of a dask series and a dask array? I didn't do so because a lot of the underlying infrastructure (like the _extract_meta utility) assumed the operands were all either dask series or dask dataframes.

I think for this first PR it is ok to not accepst dask.arrays. It is a good idea though, to try to alert the user in a friendly error message if they do pass an array.

  1. Should I implement this for dask dataframe as part of this PR too or can that be done separately?

It can be done separately. It'll make this easier to review if it just does series. And then the dataframe PR will be straightforward.

  1. Should the method accept a meta kwarg too, for the case when the other operand could be a dataframe?

I think all methods should accept a meta kwarg.

@Madhu94
Copy link
Copy Markdown
Contributor Author

Madhu94 commented Feb 25, 2021

I think I've addressed the above issues, thank you!

I had a query about the enforce_metadata kwarg accepted by map_partitions - it doesn't seem to enforce dtype, only the structure (i.e. name attribute in case of series or columns in case of dataframes). Is there a reason dtypes aren't enforced?

@Madhu94 Madhu94 marked this pull request as ready for review February 25, 2021 13:14
@Madhu94 Madhu94 changed the title [WIP] Add Series.dot method to dataframe module Add Series.dot method to dataframe module Feb 25, 2021
@jsignell
Copy link
Copy Markdown
Member

I had a query about the enforce_metadata kwarg accepted by map_partitions - it doesn't seem to enforce dtype

It seems like enforce_metadata is intended to enforce dtype. Can you provide an example of that not working as expected?

@Madhu94
Copy link
Copy Markdown
Contributor Author

Madhu94 commented Feb 27, 2021

I had a query about the enforce_metadata kwarg accepted by map_partitions - it doesn't seem to enforce dtype

It seems like enforce_metadata is intended to enforce dtype. Can you provide an example of that not working as expected?

import dask.dataframe as dd
import pandas as pd

s = pd.Series([1, 2, 3])
ds = dd.from_pandas(s, npartitions=1)
ds.map_partitions(lambda x: x.sum(), meta=float, enforce_metadata=True).compute()

I would expect this to coerce the result to float or raise an error. The reverse doesn't seem to work either -

s = pd.Series([1.1, 2.1, 3.1])
ds = dd.from_pandas(s, npartitions=1)
ds.map_partitions(lambda x: x.sum(), meta=int, enforce_metadata=True).compute()

Nothing in the code (_apply_and_enforce utility) seems to be checking dtype - let me know if I'm missing something?

@Madhu94
Copy link
Copy Markdown
Contributor Author

Madhu94 commented Mar 2, 2021

The comment inside _dot_series is misleading actually, fixing that.

Base automatically changed from master to main March 8, 2021 20:19
Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I let this sit for a bit. I think it looks really nice now. I just have a few suggestions of how to improve the tests.

@Madhu94 Madhu94 force-pushed the add-series-dot branch 2 times, most recently from 165e55b to 893f952 Compare March 19, 2021 19:10
@Madhu94
Copy link
Copy Markdown
Contributor Author

Madhu94 commented Mar 19, 2021

Checking CI

@jsignell jsignell merged commit e00f2fa into dask:main Mar 19, 2021
@jsignell
Copy link
Copy Markdown
Member

Thanks for sticking with this @Madhu94!

@Madhu94
Copy link
Copy Markdown
Contributor Author

Madhu94 commented Mar 19, 2021

Thanks for your help @jsignell 🎉 🎉

@Madhu94
Copy link
Copy Markdown
Contributor Author

Madhu94 commented Mar 21, 2021

I believe #1259 should not have been closed as a result of this since there were other methods to be implemented there. I'm sorry my first update wasn't clear. I'll be more careful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add missing methods to Series

2 participants