Compile statistics about data field completeness #4

Open
JonathanReeve opened this issue Feb 7, 2018 · 2 comments
Labels
1.0 (current version), 2.0 (complete rewrite), haskell

Comments

@JonathanReeve
Owner

Wikipedia data exists for only about 1-2K of the ~45K books in PG, if I recall correctly. To figure out where there is room for improvement, it would help to first measure the completeness of every data field. Then we can look for patterns among the books that have very little metadata.

@JonathanReeve
Owner Author

It'd be nice to have this on the website, too, so it's best to do this in Haskell.

@JonathanReeve added the 1.0 (current version) and 2.0 (complete rewrite) labels on Mar 15, 2018
@JonathanReeve
Owner Author

Put this in a new page on the website, called "stats." Show:

  • How many books are there in the database?
  • Where do they all come from? (PG only, for now)
  • How much Wikipedia data is available for the books? (wp_info has raw dumps)
  • How many books have original publication dates?

More later.
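
A rough sketch of how the numbers listed above might be computed in Haskell. Everything here is an assumption for illustration: the `Book` record, its field names (`wpInfo`, `origPubYear`, `source`), and the sample rows are hypothetical and do not reflect the project's actual schema or data. Optional fields are modelled as `Maybe`, so completeness is just a count of the `Just` values.

```haskell
module Main where

import Data.List (group, sort)
import Data.Maybe (isJust)
import Text.Printf (printf)

-- Hypothetical record; the field names are assumptions, not the real schema.
data Book = Book
  { title       :: String
  , source      :: String        -- where the book comes from, e.g. "PG"
  , wpInfo      :: Maybe String  -- raw Wikipedia dump, if one exists
  , origPubYear :: Maybe Int     -- original publication year, if known
  }

-- | Count the items that satisfy a predicate.
countWhere :: (a -> Bool) -> [a] -> Int
countWhere p = length . filter p

-- | Share of `part` in `total` as a percentage; 0 when `total` is 0.
percent :: Int -> Int -> Double
percent _ 0 = 0
percent part total = 100 * fromIntegral part / fromIntegral total

-- | Number of books per source, sorted by source name.
sourceCounts :: [Book] -> [(String, Int)]
sourceCounts books =
  [ (s, length g) | g@(s : _) <- group (sort (map source books)) ]

-- | For each optional field, how many books have a value for it.
fieldCounts :: [Book] -> [(String, Int)]
fieldCounts books =
  [ ("wp_info", countWhere (isJust . wpInfo) books)
  , ("original publication date", countWhere (isJust . origPubYear) books)
  ]

-- Made-up sample rows, only to make the sketch runnable.
sampleBooks :: [Book]
sampleBooks =
  [ Book "Book A" "PG" (Just "{...}") (Just 1851)
  , Book "Book B" "PG" Nothing        (Just 1813)
  , Book "Book C" "PG" Nothing        Nothing
  , Book "Book D" "PG" Nothing        Nothing
  ]

main :: IO ()
main = do
  let total = length sampleBooks
  printf "Books in database: %d\n" total
  mapM_ (\(s, n) -> printf "From %s: %d\n" s n) (sourceCounts sampleBooks)
  mapM_ (\(label, n) ->
           printf "%s: %d of %d (%.1f%%)\n" label n total (percent n total))
        (fieldCounts sampleBooks)
```

In a real version the list of books would come from the database rather than `sampleBooks`, and adding a field to the report would mean adding one more entry to `fieldCounts`.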
