Detect and surface large pageview spikes #5034

Open
ragesoss opened this issue Jul 7, 2022 · 18 comments

Comments

@ragesoss
Member

ragesoss commented Jul 7, 2022

Dashboard users (instructors, editors, Wiki Education staff) like to know when content they worked on suddenly becomes very relevant to the public and gets a large spike in pageviews. These spikes typically represent success stories, where users created or improved content that then became highly relevant (e.g., because the topic became tied to an important current event).

The Dashboard should be able to detect and log this kind of pageview spike.

The place to do this would probably be during the import of pageviews to update an Article's average pageviews, since daily pageviews over a moderately long period of time are fetched (and then averaged) during this process. It might make sense to do this after addressing #4370 since a good solution to that will probably involve extending the period over which pageviews are fetched.

@vaidehi44
Contributor

@ragesoss, can I take this up?

@ragesoss
Member Author

@vaidehi44 yes, go for it.

@vaidehi44
Contributor

@ragesoss, since you suggested leaving the issue referenced here (#4370), should I also drop this one?

@ragesoss
Member Author

ragesoss commented Mar 1, 2023

Hmm... I'm not sure. If you can think of an efficient way to do this one without relying on #4370, I'm open to it.

@vaidehi44
Contributor

Okay... I guess here too the best way would be to introduce earliest_edit and average_page_views in ArticlesCourses, because we'll have to keep track of the average views to compare against when the count rises suddenly. What do you think?

@vaidehi44
Contributor

Also, what should the threshold or criteria be for deciding whether there has been a significant rise in views? For example, if the views increase 5-fold or 10-fold, or anything like that?

@ragesoss
Member Author

ragesoss commented Mar 2, 2023

Yes, I think that approach would work well, putting both of those values into ArticlesCourses. That will make it easy to check whether any single day is many times above the average, during the process for updating that average.

I think 5-fold is a good starting point... maybe 5-fold increase and minimum spike of 100 views in a day. Those can be tweaked, depending on how often we find it happening.
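A minimal sketch of that check in plain Ruby, assuming the fetched daily views arrive as a hash of date to count and the stored average is passed in. The method name `pageview_spikes` and both constants are hypothetical, not existing Dashboard code, and "minimum spike of 100 views" is read here as an absolute floor on the day's views, which is one possible interpretation.

```ruby
require 'date'

# Hypothetical starting thresholds from the discussion; expected to be tuned.
SPIKE_MULTIPLIER = 5
SPIKE_MINIMUM_VIEWS = 100

# Returns the days whose views are at least SPIKE_MULTIPLIER times the
# average and at least SPIKE_MINIMUM_VIEWS in absolute terms.
# daily_views: Hash of Date => Integer; average_views: Numeric
def pageview_spikes(daily_views, average_views)
  daily_views.select do |_date, views|
    views >= SPIKE_MINIMUM_VIEWS && views >= SPIKE_MULTIPLIER * average_views
  end
end

views = {
  Date.new(2023, 3, 1) => 12,
  Date.new(2023, 3, 2) => 90,  # 9x the average, but under the 100-view floor
  Date.new(2023, 3, 3) => 640  # above both thresholds
}
spikes = pageview_spikes(views, 10.0)
puts spikes.keys.map(&:to_s).inspect
```

The absolute floor keeps rarely viewed articles from triggering on noise, since going from 2 to 10 views a day is a 5-fold increase but not a meaningful spike.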

@vaidehi44
Contributor

@ragesoss, I am a little confused about where the code for updating the earliest_edit and average_page_views fields should go. I think for earliest_edit there should be an instance method that is called from the update_cache method of ArticlesCourses.

And, what I get is that the cache of ArticlesCourses is being updated from UpdateCourseStats which falls under the constant_update category through ScheduleCourseUpdatesWorker. So, I guess it is being updated every 5 minutes.

So how should average_page_views be updated? I think it belongs in the daily_update schedule. Would it require a new worker that gets called in data_cycle/daily_update.rb?

@ragesoss
Member Author

ragesoss commented Mar 6, 2023

The approach I was thinking of would be to go through UpdateCourseStats, and just skip updating the views for any record that had its views updated within the last ~week. I guess that would require also keeping track of views_updated_at.
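Roughly, the skip logic could look like the sketch below. It is plain Ruby with a `Struct` standing in for an ArticlesCourses record; `views_update_due?`, `VIEWS_REFRESH_INTERVAL`, and the `ArticleCourseRecord` struct are hypothetical names for illustration, and `views_updated_at` is the proposed new field rather than an existing column.

```ruby
# Hypothetical refresh interval, matching the "~week" suggested above.
VIEWS_REFRESH_INTERVAL = 7 * 24 * 60 * 60 # seconds

# Stand-in for an ArticlesCourses record; views_updated_at is the proposed
# new timestamp and is nil until views have been fetched once.
ArticleCourseRecord = Struct.new(:article_title, :views_updated_at)

# True when views were never fetched or the last fetch is older than the interval.
def views_update_due?(record, now = Time.now)
  record.views_updated_at.nil? ||
    (now - record.views_updated_at) >= VIEWS_REFRESH_INTERVAL
end

now = Time.utc(2023, 3, 6, 12, 0, 0)
records = [
  ArticleCourseRecord.new('Fresh article', Time.utc(2023, 3, 4)),
  ArticleCourseRecord.new('Stale article', Time.utc(2023, 2, 20)),
  ArticleCourseRecord.new('Never updated', nil)
]
due = records.select { |record| views_update_due?(record, now) }
puts due.map(&:article_title).inspect
```

With a check like this, the stats update can run as often as it does now while only fetching pageviews for records that are due.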

@vaidehi44
Contributor

Okay... Don't you think 1 week would be too long a time frame?

@ragesoss
Member Author

ragesoss commented Mar 6, 2023

I don't think it's too long.

@vaidehi44
Contributor

> Yes, I think that approach would work well, putting both of those values into ArticlesCourses. That will make it easy to check whether any single day is many times above the average, during the process for updating that average.
>
> I think 5-fold is a good starting point... maybe 5-fold increase and minimum spike of 100 views in a day. Those can be tweaked, depending on how often we find it happening.

So, here you meant that while updating the average, we'll also compare the views of each of the last 7 days individually, to see if there was a spike of at least 100 views?

@ragesoss
Member Author

ragesoss commented Mar 6, 2023

yes

@vaidehi44
Contributor

And what about the status of the spike? Where should we store whether an article course currently has a spike in pageviews? Would that add one more field to articles_courses, or is there some other way to do it?

@ragesoss
Member Author

ragesoss commented Mar 6, 2023

I think when a spike is detected, it should create a (new type of) Alert record and send an email to a Wiki Education staff member.
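As a rough illustration of that flow, here is a plain-Ruby sketch that records an alert and passes a message to a notifier. Everything named here (`PageviewSpikeAlert`, `report_spike`, the `store` and `notifier` arguments) is hypothetical and stands in for the Dashboard's real Alert model and mailer, which this sketch does not use.

```ruby
require 'date'

# Hypothetical stand-in for a new Alert type; not the Dashboard's Alert model.
PageviewSpikeAlert = Struct.new(:article_title, :spike_date, :views, :average_views) do
  def message
    "#{article_title} had #{views} views on #{spike_date} " \
      "(average: #{average_views.round(1)} per day)"
  end
end

# Records the alert and hands its message to a notifier, which stands in for
# the email to a Wiki Education staff member. Both collaborators are injected
# so the sketch runs without Rails or a mail server.
def report_spike(alert, store:, notifier:)
  store << alert
  notifier.call(alert.message)
  alert
end

stored_alerts = []
sent_messages = []
alert = PageviewSpikeAlert.new('Example article', Date.new(2023, 3, 3), 640, 10.0)
report_spike(alert, store: stored_alerts, notifier: ->(text) { sent_messages << text })
puts sent_messages.first
```

Keeping the spike details on the alert record itself means the thresholds can be reviewed later against what was actually flagged.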

@vaidehi44
Contributor

vaidehi44 commented Mar 6, 2023

Oh...so shouldn't the info about the spike be communicated to the end users, i.e. on the frontend? Not even to instructors/editors?

@ragesoss
Member Author

ragesoss commented Mar 6, 2023

At this point, no. I think we want to start by having an Alert that sends email to Wiki Education staff, just to understand how often these kinds of spikes happen and make tweaks to the threshold. At that stage, we'd probably personally contact the people who were working on the articles. Later on, we may want to design a nice-looking email that automatically goes out to the relevant instructors and students, via the Alert.

@vaidehi44
Contributor

Ohkk .. got your point.
