Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PL: Write 2020 workshops to a gsheet #33089

Merged
merged 2 commits into from Feb 12, 2020

Conversation

breville
Copy link
Member

Write serialized information about workshops to a gsheet. The workshops are CSD/CSP 5-day summer, CSF intro, and CSF deepdive. The dates are May 15th 2020 through August 31st 2020. The write is done 30 minutes after every second hour. The destination tab is summer_workshops (auto), and we also write some information to a tab called summer_workshops_meta (auto).

This writes to the same gsheet as #32609 and works very similarly.

Write serialized information about workshops to a gsheet.  The workshops are CSD/CSP 5-day summer, CSF intro, and CSF deepdive.  The dates are May 15th 2020 through August 31st 2020.  The write is done 30 minutes after every second hour.
@breville breville requested review from islemaster and a team February 11, 2020 22:42
# Subsequent rows are for each workshop.
Pd::Workshop.where(subject: subjects).find_each.each do |w|
if w.workshop_starting_date && w.workshop_starting_date >= Date.new(2020, 5, 15) && w.workshop_starting_date <= Date.new(2020, 8, 31)
workshops << Api::V1::Pd::WorkshopDownloadSerializer.new(w).attributes.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick check on the cost of doing this filtering in the script instead of as part of our query:

  1. We're loading all workshops ever for these three subjects, rather than only those in a very specific range.
  2. Making the range part of the query would require a join against the first session and might be more likely to introduce a bug.

Given the number of rows we're working with, I suspect this is fine through August of this year. Instincts on when we'd want to rethink this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to avoid the join and also adding actual SQL to do the range query, and did have a similar concern about complexity. It currently takes about 30 seconds to do the query, which returns 4341 workshops before we filter by date. Those workshops go back to 2016. I think that we could cope with this query even taking several minutes, which I think will give us a few more years of creating workshops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - thanks for walking me through the napkin-math on this!

Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bencodeorg
Copy link
Contributor

I don't know the whole context here, but this looks very similar to something I did last year here -- if this is in place of that, might be worth tearing that code out (and remove cron schedule entry) while you're at it.

@breville
Copy link
Member Author

@bencodeorg Good catch. I've prepped a removal at #33094.

workshops << Api::V1::Pd::WorkshopDownloadSerializer.new(Pd::Workshop.first).attributes.keys.map(&:to_s)

# Subsequent rows are for each workshop.
Pd::Workshop.where(subject: subjects).find_each.each do |w|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the example here https://api.rubyonrails.org/classes/ActiveRecord/Batches.html, seems like it is not necessary to do both find_each and each?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good catch. That was left over from temporary testing.

@breville breville merged commit b94bead into staging Feb 12, 2020
@breville breville deleted the pl-write-2020-workshops-to-gsheet branch February 12, 2020 04:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants