Mission350: ios data contains duplicate #77

Open
Futabay opened this issue May 27, 2019 · 1 comment

Futabay commented May 27, 2019

Hi,
Based on the Kaggle discussion, it seems that two apps ('Mannequin Challenge', 'VR Roller Coaster') appear more than once in the iOS data set.
Here is what I did to remove the duplicates:

Screen Shot 2019-05-27 at 8 33 06 AM

Screen Shot 2019-05-27 at 8 33 25 AM

Screen Shot 2019-05-27 at 8 33 42 AM
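
As a rough sketch of one way such a cleanup could look (not necessarily what the screenshots show), the snippet below keeps a single row per app name, choosing the row with the highest rating count. The function name `remove_duplicates`, the column positions, and the sample rows with their rating counts are all made up for illustration; the real data set has more columns and different values.

```python
def remove_duplicates(rows, name_idx, count_idx):
    """Keep one row per app name: the one with the highest rating count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        count = float(row[count_idx])
        # Store the row if the name is new or this row has more ratings.
        if name not in best or count > float(best[name][count_idx]):
            best[name] = row
    return list(best.values())


# Made-up sample rows: [id, track_name, rating_count_tot]
sample = [
    ['1', 'Mannequin Challenge', '668'],
    ['2', 'Mannequin Challenge', '105'],
    ['3', 'VR Roller Coaster', '107'],
    ['4', 'VR Roller Coaster', '67'],
    ['5', 'Some Other App', '10'],
]

cleaned = remove_duplicates(sample, name_idx=1, count_idx=2)
for row in cleaned:
    print(row)
```

Keeping the most-rated entry is only one possible tie-breaking rule; keeping the first or the most recent entry would work the same way with a different comparison.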

Futabay changed the title from "ios contains duplicate data" to "ios data contains duplicate data" on May 27, 2019
Futabay changed the title from "ios data contains duplicate data" to "Mission350: ios data contains duplicate" on May 27, 2019

mgaimann commented Dec 14, 2021

Can confirm that the issue still exists in this project.

In the guided project instructions on page 6 it says:

In the previous step, we managed to remove the duplicate app entries in the Google Play dataset. We don't need to do the same for the App Store data because there are no duplicates — you can check that for yourself using the id column (not the track_name column)

This is not true: the id column holds an id number, while the track_name column holds the actual app name, which is what should be used to detect duplicates.

The author probably checked the id column, found no duplicates there (which is correct as far as ids go), and moved on. The check should have used the track_name column instead.
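
To illustrate the difference, here is a minimal sketch that runs the same duplicate check on both columns. The helper `find_duplicates`, the column positions, and the sample rows (including the id values) are assumptions made for the example, not taken from the actual data set.

```python
from collections import Counter


def find_duplicates(rows, col_idx):
    """Return the sorted values that occur more than once in one column."""
    counts = Counter(row[col_idx] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample rows: [id, track_name]
rows = [
    ['101', 'Mannequin Challenge'],
    ['102', 'Mannequin Challenge'],
    ['103', 'VR Roller Coaster'],
    ['104', 'VR Roller Coaster'],
    ['105', 'Some Other App'],
]

dupes_by_id = find_duplicates(rows, 0)    # every id is distinct
dupes_by_name = find_duplicates(rows, 1)  # names repeat

print('Duplicates by id:', dupes_by_id)
print('Duplicates by track_name:', dupes_by_name)
```

With rows like these, the id check reports nothing while the track_name check reports both repeated apps, which matches the situation described above.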
