Modification of tutorial: append section: duplicates are dropped #34

yohplala · 2020-01-06T11:57:44Z

Hello,

I haven't tested append() yet, and I was wondering whether duplicates are removed when an append is performed.
I had a look at the collection.py script, and the following pandas functions are used:
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")

After a look at the pandas documentation, I understand that duplicate rows are removed and only the last occurrence is kept.
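
To make the behavior concrete, here is a minimal sketch. It uses plain pandas rather than the `dd` module from collection.py, on the assumption that both behave the same way here, and the two small dataframes are made up for illustration:

```python
import pandas as pd

# Hypothetical existing item data: two daily rows.
existing = pd.DataFrame(
    {"close": [10.0, 11.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)

# Hypothetical new data: one row repeats the last existing row, one is new.
new = pd.DataFrame(
    {"close": [11.0, 12.0]},
    index=pd.to_datetime(["2020-01-03", "2020-01-06"]),
)

# Same pattern as the line quoted from collection.py, with pd instead of dd.
combined = pd.concat([existing, new]).drop_duplicates(keep="last")

print(combined)
```

Of the two identical rows, only the later one (the one coming from `new`) survives. Note that pandas' `drop_duplicates()` compares column values only and ignores the index, so two rows with identical values on different dates would also be treated as duplicates.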

I think it would be worth stating this explicitly in the tutorial.
You write:
Let's append the last day (row) to our item:

Wouldn't it be worth adding:
Let's append the last day (row) to our item. With the current data, there are obviously no duplicate rows. If you append a dataframe that contains rows duplicating those of the existing item, these duplicates will be removed by the 'drop_duplicates()' method of the pandas DataFrame.

Thanks again for bringing us pystore!
Best,
Pierrot
