Write a script that identifies bad pages in strobe uwaterloo #124

Open
BruceJohnJennerLawso opened this issue Mar 18, 2017 · 0 comments


@BruceJohnJennerLawso
Owner

A lot of the pages in strobe uwaterloo are in really poor shape. Case in point:

https://strobe.uwaterloo.ca/athletics/intramurals/teams.php?team=1745

That being said, the large majority of the database is in wonderful shape (I would think at least 90-95% of the pages are exactly as expected), and the total number of "problem pages" is probably somewhere in the range of 50-200. Given that, it would be nice to have a script that rolls through the listed pages and flags any that have something unusual about them (fewer than 7 total games, missing data cells (a bad row flagged in the getTableInRows() function), etc.).

This can probably just be a modification of the generalized scraper once it's done.
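
For what it's worth, here is a rough sketch of what the flagging pass could look like. It assumes each page has already been scraped into a list of rows (each row a list of cell strings), which is roughly what getTableInRows() seems to produce, but that shape is a guess. The names `find_problems`, `flag_bad_pages` and the `expected_cells` parameter are made up for illustration and are not part of the existing scraper.

```python
MIN_GAMES = 7  # threshold mentioned above; adjust if a season is shorter


def find_problems(rows, expected_cells, min_games=MIN_GAMES):
    """Return a list of reasons why one page's table looks suspicious.

    rows is assumed to be a list of rows, each row a list of cell strings.
    An empty list means nothing unusual was found.
    """
    problems = []
    if len(rows) < min_games:
        problems.append(
            "only %d games (expected at least %d)" % (len(rows), min_games)
        )
    for index, row in enumerate(rows):
        if len(row) != expected_cells:
            # Wrong number of cells, e.g. a column is missing entirely
            problems.append(
                "row %d has %d cells (expected %d)"
                % (index, len(row), expected_cells)
            )
        elif any(cell is None or not str(cell).strip() for cell in row):
            # Right number of cells, but at least one of them is blank
            problems.append("row %d has an empty cell" % index)
    return problems


def flag_bad_pages(pages, expected_cells):
    """pages maps a team id to its rows; return only the flagged team ids."""
    flagged = {}
    for team_id, rows in pages.items():
        problems = find_problems(rows, expected_cells)
        if problems:
            flagged[team_id] = problems
    return flagged
```

Calling `flag_bad_pages(pages, expected_cells)` on whatever the scraper collects would give a dict of team ids to reasons, which could then be dumped to a file and checked by hand. Keeping the checks separate from the fetching means they can be tried out on a handful of saved pages before running against the whole site.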
