record project view events to redshift #17704
Conversation
@@ -215,6 +216,14 @@ def show
    if params[:key] == 'artist'
      @project_image = CDO.studio_url "/v3/files/#{@view_options['channel']}/_share_image.png", 'https:'
    end
    FirehoseClient.instance.put_record(
      'projects-events',
@joannepolsky can you comment on whether this new event type will successfully result in the record appearing in redshift? Or is our firehose + redshift integration currently only configured for events of type 'analysis-events', as we are already recording at
code-dot-org/dashboard/app/models/studio_person.rb
Lines 43 to 56 in de8188c
FirehoseClient.instance.put_record(
  'analysis-events',
  {
    study: 'studio_person_audit',
    event: 'studio_person_merge',
    data_json: {
      studio_person_a_id: studio_person_a.id,
      studio_person_b_id: studio_person_b.id,
      studio_person_merged_id: studio_person_a.id,
      user_a_id: users_a,
      user_b_id: users_b
    }.to_json
  }
)
This actually was a problem, we got a spew of Honeybadger alerts coming in at https://app.honeybadger.io/projects/3240/faults/34958824#notice-summary
As a workaround for now, I created a projects-events Firehose stream (https://console.aws.amazon.com/firehose/home?region=us-east-1#/details/projects-events?edit=false) and pointed it at an S3 bucket, which is a bit simpler to do than getting the data all the way into Redshift. We'll either want to change that over at some point so that the data gets to the database (Ben might need to create another table for us), or change the code to refer to the existing table.
@joannepolsky and @wjordan , Asher and I definitely intended to do another pass over this implementation to tighten things up and make it a bit easier to test. It probably makes sense to either hard-code the specific stream we're targeting into the FirehoseClient or make sure we have pretty clear documentation on the existing streams, since it's a bit of work to set up a new one.
CC @asherkach.
I'm not a fan of the firehose implementation now, because of this very issue (all data goes to prod only). However, since this is the existing solution, it looks fine to me. You'll have to verify prod after deploy.
@ashercodeorg, since I hear you may still be reading code reviews -- this would save ~15M rows to redshift via firehose per year. Please chime in if this sounds like too much to you.
Sorry about this - please let me know if there are any additional steps I need to take.
I think the best solution for now is to change this to point at 'analysis-events', as that's the stream that we have pointing into Redshift. I think we probably also eventually want to clean up the FirehoseClient interface so that's not even a parameter that we pass in, since it's not really a meaningful option to change it without doing a bunch of AWS and Redshift configuration at the same time.
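To illustrate the cleanup being suggested, here is a minimal sketch of a wrapper that hard-codes the stream name so callers can't target a stream that doesn't exist. This is not the actual FirehoseClient interface; the `AnalyticsEvents` class, its `log` method, and the `FakeFirehose` stub are all hypothetical names invented for this example.

```ruby
require 'json'

# Hypothetical wrapper: the stream name is a constant, not a caller-supplied
# parameter, since changing it requires matching AWS/Redshift configuration.
class AnalyticsEvents
  # The only stream currently wired through Firehose into Redshift.
  STREAM_NAME = 'analysis-events'.freeze

  def initialize(firehose_client)
    @client = firehose_client
  end

  # Callers supply study/event/data; the stream is no longer an option.
  def log(study:, event:, data: {})
    @client.put_record(
      STREAM_NAME,
      {
        study: study,
        event: event,
        data_json: data.to_json
      }
    )
  end
end

# Stub client for local demonstration; the real client talks to AWS Firehose.
class FakeFirehose
  attr_reader :records

  def initialize
    @records = []
  end

  def put_record(stream, payload)
    @records << [stream, payload]
  end
end

fake = FakeFirehose.new
AnalyticsEvents.new(fake).log(
  study: 'project_views',
  event: 'project_viewed',
  data: {project_id: 1}
)
stream, payload = fake.records.first
puts stream            # always 'analysis-events'
puts payload[:data_json]
```

With this shape, the project-view call site would only choose study/event names, and mis-targeting a nonexistent stream like 'projects-events' becomes impossible by construction.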
The feature request is to add exact per-project page view counts for project "view" and "share" pages. Examples:
view and share page paths can also contain other app types besides applab. According to Google Analytics there have been 300K views in the last week combined for paths starting with /projects/applab, /projects/gamelab, /projects/artist, and /projects/playlab. Therefore 300K / week (or 0.5 / sec, or 15M / year) represents a reasonable estimate for the rate at which these page view events will be generated. These page view events will be used for analytics purposes only and will not be used directly in the product.