When a page is moved, the pageview tool doesn't report views of the page at the old location #26

Closed
mpeel opened this Issue Feb 19, 2016 · 6 comments

mpeel commented Feb 19, 2016

Take as an example the English Wikipedia article on "First observation of gravitational waves". At present, the page view statistics link on the 'view history' page points to:
https://tools.wmflabs.org/pageviews/#start=2016-01-30&end=2016-02-18&project=en.wikipedia.org&platform=all-access&agent=user&pages=First_observation_of_gravitational_waves
The first views reported for this page are on 14 February. However, the page has existed since 11 February under different titles, which now redirect to it; they are listed at:
http://dispenser.homenet.org/~dispenser/cgi-bin/rdcheck.py?page=First_observation_of_gravitational_waves
For example, the original title of the page was "Gravitational wave detection, February 2016", and pageviews for that title can still be found at:
https://tools.wmflabs.org/pageviews/#start=2016-01-30&end=2016-02-18&project=en.wikipedia.org&platform=all-access&agent=user&pages=Gravitational_wave_detection,_February_2016

Ideally, the tool would report the page view statistics for the page at both its current location and any previous locations. Would this be possible, please?

Owner

MusikAnimal commented Feb 19, 2016

Yes, this particular scenario can be resolved with #9, which I hope to implement soon. In most cases it won't be a big deal, but there are performance issues I'll need to sort out for articles with lots of redirects, like Google. For this reason, including redirects as a single statistic will be optional and off by default.
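A minimal sketch of what "including redirects as a single statistic" could mean: sum the daily counts of the target page and of each redirect to it. It assumes the per-title daily counts have already been fetched (for example, one Pageviews API request per title, which is where the cost for pages with many redirects comes from). `combine_daily_views` is a hypothetical helper, not part of the tool, and the counts below are made up for illustration.

```python
from collections import Counter
from datetime import date


def combine_daily_views(per_title: dict[str, dict[date, int]]) -> dict[date, int]:
    """Sum daily view counts across a page and its redirects.

    per_title maps each title (the target page plus its redirects) to a
    {day: views} dict. Fetching these counts is assumed to happen elsewhere.
    """
    total: Counter = Counter()
    for daily in per_title.values():
        total.update(daily)  # Counter.update adds the counts for matching days
    return dict(sorted(total.items()))  # return days in chronological order


# Hypothetical counts, for illustration only.
views = {
    "Target page": {date(2016, 2, 14): 120, date(2016, 2, 15): 90},
    "Old redirect": {date(2016, 2, 14): 30},
}
print(combine_daily_views(views))
```

Note that this counts every view of a redirect, even from a period when that title was a different article, which is the weakness of the redirect approach for page moves.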

This still isn't foolproof for page moves, as someone could move the page and then a new article could be created at the old location. There is #16, where we'd allow entering a page ID, which stays intact across page moves; however, the Pageviews API goes by page title, so we end up with the same issue.

The only real solution is to recursively check a given page's move log and collect pageview data for each of its former titles, so that we end up with accurate data for the full date range. This is not easy to do, and could be very expensive in terms of performance.
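A rough sketch of the move-log approach, assuming the page's moves have already been retrieved from its move log as `(date, old_title, new_title)` tuples in chronological order (fetching them is not shown). The hypothetical helper `title_segments` follows the chain of titles back from the current one and splits the requested date range into one sub-range per title the page held; each sub-range would then need its own pageview query. As a simplifying assumption, views on the day of a move are attributed entirely to the new title. The titles and dates in the usage lines are invented.

```python
from datetime import date, timedelta


def title_segments(current_title, moves, start, end):
    """Split [start, end] into (title, first_day, last_day) segments.

    moves: (move_date, old_title, new_title) tuples for this page, oldest
    first, assumed to come from its move log. Views on a move date are
    attributed to the new title (a simplifying assumption).
    """
    segments = []
    title, seg_end = current_title, end
    for move_date, old, new in reversed(moves):  # newest move first
        if new != title:
            continue  # not part of this page's chain of titles
        if move_date > seg_end:
            # Moved after the part of the range still to cover, so the page
            # had the old title throughout that part.
            title = old
            continue
        if move_date <= start:
            break  # the page already had `title` on the first day of the range
        segments.append((title, move_date, seg_end))
        title, seg_end = old, move_date - timedelta(days=1)
    segments.append((title, start, seg_end))
    return segments[::-1]  # chronological order


# Hypothetical move history, for illustration only.
moves = [
    (date(2016, 2, 11), "Title A", "Title B"),
    (date(2016, 2, 14), "Title B", "Title C"),
]
for title, first, last in title_segments("Title C", moves, date(2016, 2, 1), date(2016, 2, 18)):
    print(title, first, last)
```

Here the range 1 to 18 February is split into "Title A" (1 to 10 February), "Title B" (11 to 13 February) and "Title C" (14 to 18 February). Unlike summing redirects, views of "Title A" after the move are not counted, so the result stays correct if that title is later reused for a different article.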

Collaborator

kaldari commented Apr 6, 2016

I think #9 is the only realistic solution to this issue. I would suggest marking this as a duplicate.

Owner

MusikAnimal commented Apr 7, 2016

@kaldari You don't think checking the move log would be worthwhile? That's one API call, followed by one more for each of the N entries in the move log that fall within our date range. It shouldn't be too expensive, and I think it will offer a more accurate representation of what the user probably wants than including data on the redirects. For instance, an old location may have become a different article entirely and never become a redirect to the new location.

Of course, it is still a lot of work! I say let's wait for T121912 to be resolved and go from there.

Owner

MusikAnimal commented May 9, 2016

Upon further consideration, I'm not sure it's worth the effort to recursively query the move log of each page, given that the Pageviews API will hopefully resolve redirects on its own; see T121912. Closing as wontfix.

MusikAnimal closed this May 9, 2016

Owner

MusikAnimal commented Jul 15, 2016

Reopening. We've had some more complaints of this issue, and I think with some effort we can come up with a solid solution.

Owner

MusikAnimal commented Aug 8, 2016

Issue migrated to Phabricator; please follow it for updates: https://phabricator.wikimedia.org/T141332

MusikAnimal closed this Aug 8, 2016