Url-Categorization #5

Closed
rohit-raje-786 opened this issue Jan 24, 2021 · 4 comments
Comments

@rohit-raje-786

After running 01_construct_features.py, I get the error "['main_category_confidence'] not in index".

[Attached screenshot: WhatsApp Image 2021-01-24 at 10 04 24 PM]

@domantasm96
Owner

Hey, thank you for your message.

I think the problem is that the URL categorization dataset file is corrupted due to GitHub LFS restrictions. I would suggest downloading the original dataset from https://data.world/crowdflower/url-categorization.
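
One possible way to check whether the local file is affected, as a minimal sketch: when a Git LFS object is not fetched, the checkout usually contains a small text pointer that starts with `version https://git-lfs.github.com/spec/v1` instead of the real data. That this is how the corruption shows up here is an assumption, and the helper name `is_lfs_pointer` is hypothetical, not part of this repository.

```python
# Prefix that opens every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer.

    Hypothetical helper for illustration; it only inspects the first bytes
    of the file and does not validate the rest of the pointer.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If `is_lfs_pointer("data/url_categorization_dfe.csv")` returns True (the path is taken from this thread and may differ in your checkout), the file is a placeholder rather than the dataset, and downloading the CSV from data.world should help.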

Let me know if the issue still remains.

@rohit-raje-786
Author

rohit-raje-786 commented Jan 26, 2021 via email

@domantasm96
Owner

I have updated the code with a fix. The problem was that data.world provides two copies of the dataset with the same data but different column names:

  1. original/URL-categorization-DFE.csv (column: 'main_category:confidence')
  2. data/url_categorization_dfe.csv (column: 'main_category_confidence')

My previous code assumed the second file, so if you used the first one, the column lookup failed because the column name is different.
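
For readers who hit the same mismatch, here is a minimal sketch of one way to make a script accept either file: normalize the header so that `:` becomes `_`. This is an illustration and not necessarily the fix applied in the repository. The function names and the `url` and `main_category` columns in the sample headers are hypothetical; only the two `main_category…confidence` spellings come from this thread.

```python
import csv
import io


def normalize_column_name(name):
    """Map 'main_category:confidence' to 'main_category_confidence'."""
    return name.strip().replace(":", "_")


def read_normalized_header(csv_text):
    """Return the header row of `csv_text` with normalized column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return [normalize_column_name(col) for col in next(reader)]


# Illustrative headers for the two variants; the extra columns are assumed.
original_header = "url,main_category,main_category:confidence\n"
data_header = "url,main_category,main_category_confidence\n"

# After normalization both variants expose the same column names.
assert read_normalized_header(original_header) == read_normalized_header(data_header)
```

With pandas, the same mapping can be applied to a loaded DataFrame via `df.rename(columns=normalize_column_name)`, after which selecting 'main_category_confidence' should work for either file.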

Thank you for reporting the problem. Let me know if the issue remains unsolved or if you run into any other problems executing this code!

Cheers!

@rohit-raje-786
Author

rohit-raje-786 commented Jan 28, 2021 via email
