Modification of tutorial: append section: duplicates are dropped #34

Open
yohplala opened this issue Jan 6, 2020 · 0 comments

yohplala commented Jan 6, 2020

Hello,

I haven't tested append() yet, and I was wondering whether duplicates are removed when an append is performed.
I had a look at the collection.py script, and the following functions are used:
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")

After looking at the pandas documentation, I understand that duplicate rows are removed and only the last occurrence is kept.
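
A minimal sketch of that behaviour, using plain pandas rather than pystore itself (so pd.concat stands in for dd.concat, and the column name, values, and dates are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the data already stored in an item.
current = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

# Hypothetical data to append: one row identical to a stored row, one new row.
new = pd.DataFrame(
    {"close": [12.0, 13.0]},
    index=pd.to_datetime(["2020-01-03", "2020-01-06"]),
)

# Same pattern as the line quoted from collection.py, but with pandas only:
# concatenate, then keep only the last occurrence of each duplicated row.
combined = pd.concat([current, new]).drop_duplicates(keep="last")

print(combined)
```

Here the row for 2020-01-03 appears in both frames, and only one copy survives in the result. One thing to keep in mind is that pandas' drop_duplicates() compares column values only and ignores the index, so rows with identical values on different dates would also count as duplicates.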

I think it would be worth simply saying so in the tutorial.
You write:
Let's append the last day (row) to our item:

Wouldn't it be worth adding:
Let's append the last day (row) to our item. With the current data, there are obviously no duplicate rows. If you append a dataframe containing rows that duplicate rows of the existing item, these duplicates will be removed by the 'drop_duplicates()' method of the pandas dataframe.

Thanks again for pystore!
Best,
Pierrot
