Figure out what data dump from 2015-16 Nutrimatic is currently using #14

Closed
TheOriginalSoni opened this issue Feb 17, 2024 · 5 comments · Fixed by #16
Comments

@TheOriginalSoni

https://nutrimatic.org/ currently hosts an index from sometime in 2015-16. https://nutrimatic.org/staging currently is built off an index from Dec 2023.

Trying to figure out the exact month of the former.

@TheOriginalSoni

Using a quick Quarry query against the WMF database to find articles created in any given month whose titles contain no spaces, then checking each title against both Nutrimatic and the staging version to see which index contains it.

Every article title in a given database dump is guaranteed to be in the Nutrimatic index built from that dump. We may get false positives, but false negatives should be unlikely, so we should be able to narrow down the exact month quickly.
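A rough sketch of the candidate-selection step, assuming the Quarry result has been exported as rows of (title, creation timestamp). The row shape, the `YYYYMMDDHHMMSS` timestamp format of the export, and the helper name `pick_probes` are assumptions for illustration, not something taken from the actual query. The one MediaWiki detail relied on is that `page_title` stores spaces as underscores.

```python
from collections import defaultdict


def pick_probes(rows, per_month=1):
    """Group single-word article titles by creation month ("YYYY-MM").

    rows: iterable of (page_title, creation_timestamp), where the timestamp
    is assumed to be a MediaWiki-style "YYYYMMDDHHMMSS" string.
    """
    by_month = defaultdict(list)
    for title, ts in rows:
        # MediaWiki stores spaces in page_title as underscores, so either
        # character means the title is more than one word.
        if "_" in title or " " in title:
            continue
        month = f"{ts[:4]}-{ts[4:6]}"
        if len(by_month[month]) < per_month:
            by_month[month].append(title)
    return dict(by_month)


# Illustrative rows: only the titles and months come from this thread;
# the day and time parts are placeholders, and "Some_Two_Words" is made up.
rows = [
    ("Kalyppo", "20161001000000"),
    ("Some_Two_Words", "20161001000000"),
    ("OneSpan", "20161101000000"),
]
probes = pick_probes(rows)
```

Each selected title can then be looked up by hand in both the old and the staging index.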

@TheOriginalSoni

TheOriginalSoni commented Feb 17, 2024

Umthugulu, article created in Dec 2021 -

Old version - N
Staging - Y

So old index is before Dec 2021 (obviously)


Tirumurugatruppadai, article created in Dec 2018 -

Old version - N
Staging - Y

So old index is before Dec 2018


Unfortunately, the WMF logging table does not go back further than 2017. Need to see what happened there before resuming.

@TheOriginalSoni

TheOriginalSoni commented Feb 17, 2024

Trying a new query on Quarry based on pageid instead of the logging table.


Pageid : 38719376 - Abatsky - Created in April 2013

Old version - Y
Staging - Y

So old index was likely created on or after April 2013


Pageid : 49297846 - AISINDO - Created in Feb 2016

Old version - Y
Staging - Y

So old index was most likely created on or after Feb 2016


Pageid : 52315025 - OneSpan - Created in Nov 2016

Old version - N
Staging - Y

So old index was created before Nov 2016


Pageid : 50315032 - Pehenuikai - Created in Apr 2016

Old version - Y
Staging - Y

So old index was likely created on or after Apr 2016


Pageid : 51320175 - Rudapithecus - Created in Aug 2016

Old version - Y
Staging - Y

So old index was likely created on or after Aug 2016


Pageid : 51952743 - Kalyppo - Created in Oct 2016

Old version - Y
Staging - Y

So old index was likely created on or after Oct 2016


Based on this, the old index was definitely created before Nov 2016, and probably created on or after Oct 2016. So my best guess is that we're working with the Oct 2016 data dump.

The check for the "starting date" also produces false positives (some article may have contained the word "Kalyppo" even if no page with that title existed yet), so we will never be 100% sure of the lower bound with this method.
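The bracketing logic above can be sketched as follows. The function name `bracket_index_date` and the tuple layout are hypothetical; the probe data is copied from the checks in this thread. A hit in the old index only gives a soft lower bound (because of the false positives mentioned above), while a miss is treated as a firm upper bound, on the thread's assumption that every article title in the dump ends up in the index.

```python
def bracket_index_date(probes):
    """Return (soft_lower, firm_upper) creation months for the old index.

    probes: list of (title, created "YYYY-MM", found_in_old_index).
    "YYYY-MM" strings compare correctly as plain strings.
    """
    present = [month for _, month, hit in probes if hit]
    absent = [month for _, month, hit in probes if not hit]
    # Latest article that is in the old index: index is probably no older.
    soft_lower = max(present) if present else None
    # Earliest article that is missing: index predates that article.
    firm_upper = min(absent) if absent else None
    return soft_lower, firm_upper


# Results recorded in this thread (old index only).
PROBES = [
    ("Umthugulu", "2021-12", False),
    ("Tirumurugatruppadai", "2018-12", False),
    ("Abatsky", "2013-04", True),
    ("AISINDO", "2016-02", True),
    ("OneSpan", "2016-11", False),
    ("Pehenuikai", "2016-04", True),
    ("Rudapithecus", "2016-08", True),
    ("Kalyppo", "2016-10", True),
]

lower, upper = bracket_index_date(PROBES)
print(f"old index: probably on or after {lower}, before {upper}")
```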

@TheOriginalSoni

Specifically, since the old index contains articles from October 2016, it's probably built from a data dump generated on 1 Nov 2016 or similar.

@egnor egnor closed this as completed in #16 Feb 24, 2024
egnor added a commit that referenced this issue Feb 24, 2024
Add instructions on historical Nutrimatic (Closes #14 )
@TheOriginalSoni

For future reference - Link to all historical data dumps from Wikipedias
