
item.append doesn't pass npartitions to item.write #36

Open
JugglingNumbers opened this issue Feb 20, 2020 · 3 comments

Comments

@JugglingNumbers

When using `item.append(item, new_data, npartitions=35)`, the `write` function is passed `npartitions=None`. It should be `npartitions=npartitions`:

```python
write(item, combined, npartitions=None, chunksize=None,
      metadata=current.metadata, overwrite=True,
      epochdate=epochdate, reload_items=reload_items, **kwargs)
```

@ancher1912
Contributor

Hmm... changing this just results in a similar exception at line 182. I've changed it to:

```python
new_npartitions = npartitions
if new_npartitions is None:
    memusage = data.memory_usage(deep=True).sum()
    new_npartitions = int(1 + memusage // DEFAULT_PARTITION_SIZE)

# combine old dataframe with new
current = self.item(item)
new = dd.from_pandas(data, npartitions=new_npartitions)
```

That seems to work at a first glance.
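The fallback above can be sketched as a standalone helper. The helper name and the 100 MB default are assumptions for illustration only; pystore defines its own `DEFAULT_PARTITION_SIZE` constant:

```python
# Hypothetical helper mirroring the fallback above: if the caller
# didn't pass npartitions, derive one from the frame's memory footprint.
DEFAULT_PARTITION_SIZE = 100 * 1024 * 1024  # assumed ~100 MB per partition

def resolve_npartitions(npartitions, memusage_bytes):
    """Return the caller's value, or a size-based count when it is None."""
    if npartitions is not None:
        return npartitions
    # at least one partition, plus one per DEFAULT_PARTITION_SIZE of data
    return int(1 + memusage_bytes // DEFAULT_PARTITION_SIZE)
```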

@JugglingNumbers
Author

@ancher1912 yup, you've encountered the other append error: #31

The other option is just to change the last line of your snippet to

```python
new = dd.from_pandas(data, npartitions=1)
```

Since the combined dask dataframe is repartitioned using the `npartitions` variable, it doesn't matter if we use only one partition when converting the new dataframe to dask.

@ancher1912
Contributor

Yeah, you're right. I've sent a pull request to @ranaroussi with your proposed change. At least I can continue doing what I was doing before I updated Dask and FastParquet.
