Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

record project view events to redshift #17704

Merged
merged 2 commits into from Oct 6, 2017
Merged

Conversation

davidsbailey
Copy link
Member

@davidsbailey davidsbailey commented Sep 13, 2017

The feature request is to add exact per-project page view counts for project "view" and "share" pages. Examples:

view and share page paths can also contain other app types besides applab. According to Google Analytics there have been 300K views in the last week combined for paths starting with /projects/applab, /projects/gamelab, /projects/artist, and /projects/playlab. Therefore 300K / week (or 0.5 / sec, or 15M / year) represents a reasonable estimate for the rate at which these page view events will be generated.

These page view events will be used for analytics purposes only and will not be used directly in the product.

@@ -215,6 +216,14 @@ def show
if params[:key] == 'artist'
@project_image = CDO.studio_url "/v3/files/#{@view_options['channel']}/_share_image.png", 'https:'
end
FirehoseClient.instance.put_record(
'projects-events',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joannepolsky can you comment on whether this new event type will successfully result in the record appearing in redshift? or is our firehose + redshift integration currently only configured for events of type 'analysis-events' as we are already recording at

FirehoseClient.instance.put_record(
'analysis-events',
{
study: 'studio_person_audit',
event: 'studio_person_merge',
data_json: {
studio_person_a_id: studio_person_a.id,
studio_person_b_id: studio_person_b.id,
studio_person_merged_id: studio_person_a.id,
user_a_id: users_a,
user_b_id: users_b
}.to_json
}
)
?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually was a problem, we got a spew of Honeybadger alerts coming in at https://app.honeybadger.io/projects/3240/faults/34958824#notice-summary

As a workaround for now, I created a projects-events Firehose stream (https://console.aws.amazon.com/firehose/home?region=us-east-1#/details/projects-events?edit=false), and pointed it at an S3 bucket, which is a bit simpler to do than get the data all the way into Redshift. We'll either want to change that over at some point so that the data gets to the database (Ben might need to create another table for us), or change the code to refer to the existing table.

@joannepolsky and @wjordan , Asher and I definitely intended to do another pass over this implementation to tighten things up and make it a bit easier to test. It probably makes sense to either hard-code the specific stream we're targeting into the FirehoseClient or make sure we have pretty clear documentation on the existing streams, since it's a bit of work to set up a new one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jopolsky
Copy link

I not a fan of the firehose implementation now, because of this very issue (all data goes to prod only). However, since this is the existing solution, it looks fine to me. You'll have to verify prod after deploy.

@davidsbailey
Copy link
Member Author

@ashercodeorg , since I hear you may still reading code reviews -- this would save ~15M rows to redshift via firehose per year. Please chime in if this sounds like too much to you.

@davidsbailey davidsbailey merged commit 36810cf into staging Oct 6, 2017
@davidsbailey davidsbailey deleted the data-project-view branch October 6, 2017 19:20
@davidsbailey
Copy link
Member Author

Sorry about this - please let me know if there are any additional steps I need to take.

@ewjordan
Copy link
Contributor

I think the best solution for now is to change this to point at 'analysis-events', as that's the stream that we have pointing into Redshift. I think we probably also eventually want to clean up the FirehoseClient interface so that's not even a parameter that we pass in, since it's not really a meaningful option to change it without doing a bunch of AWS and Redshift configuration at the same time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants