PL: Write 2020 workshops to a gsheet #33089
Conversation
Write serialized information about workshops to a gsheet. The workshops are CSD/CSP 5-day summer, CSF intro, and CSF deepdive. The dates are May 15th 2020 through August 31st 2020. The write is done 30 minutes after every second hour.
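The schedule described above ("30 minutes after every second hour") maps to a standard cron expression; how the entry is actually registered in this repo may differ, so treat this as an illustrative sketch:

```
# Run at minute 30 past every 2nd hour; the script itself handles
# the May 15 - Aug 31 date window.
30 */2 * * * bin/cron/summer_workshops_to_gdrive
```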
# Subsequent rows are for each workshop.
Pd::Workshop.where(subject: subjects).find_each.each do |w|
  if w.workshop_starting_date && w.workshop_starting_date >= Date.new(2020, 5, 15) && w.workshop_starting_date <= Date.new(2020, 8, 31)
    workshops << Api::V1::Pd::WorkshopDownloadSerializer.new(w).attributes.values
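The script-side filter above boils down to a date-window predicate. A minimal standalone sketch in plain Ruby (method and constant names are illustrative, not from the PR):

```ruby
require 'date'

# Date window from the PR description: May 15th 2020 through August 31st 2020.
SUMMER_WINDOW = Date.new(2020, 5, 15)..Date.new(2020, 8, 31)

# True when a workshop has a starting date that falls inside the window,
# mirroring the nil check and range comparison in the script.
def in_summer_window?(starting_date)
  !starting_date.nil? && SUMMER_WINDOW.cover?(starting_date)
end

puts in_summer_window?(Date.new(2020, 6, 1))  # inside the window
puts in_summer_window?(Date.new(2020, 9, 1))  # after the window
puts in_summer_window?(nil)                   # workshop with no start date
```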
Quick check on the cost of doing this filtering in the script instead of as part of our query:
- We're loading all workshops ever for these three subjects, rather than only those in a very specific range.
- Making the range part of the query would require a join against the first session and might be more likely to introduce a bug.
Given the number of rows we're working with, I suspect this is fine through August of this year. Instincts on when we'd want to rethink this?
Yes, I wanted to avoid the join and also adding actual SQL to do the range query, and did have a similar concern about complexity. It currently takes about 30 seconds to do the query, which returns 4341 workshops before we filter by date. Those workshops go back to 2016. I think that we could cope with this query even taking several minutes, which I think will give us a few more years of creating workshops.
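For reference, the join-based alternative being weighed might look roughly like the sketch below. This is NOT the PR's approach; the association and column names (a sessions association backed by a pd_sessions table with a start column) are assumptions, and as discussed it could introduce subtle bugs, e.g. around multi-session workshops:

```ruby
# Hypothetical query-side filter -- names are illustrative assumptions.
Pd::Workshop.
  where(subject: subjects).
  joins(:sessions).
  where(pd_sessions: {start: Date.new(2020, 5, 15)..Date.new(2020, 8, 31)}).
  distinct
```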
Sounds good - thanks for walking me through the napkin-math on this!
👍
I don't know the whole context here, but this looks very similar to something I did last year here -- if this is in place of that, it might be worth tearing that code out (and removing its cron schedule entry) while you're at it.
@bencodeorg Good catch. I've prepped a removal at #33094.
bin/cron/summer_workshops_to_gdrive
Outdated

workshops << Api::V1::Pd::WorkshopDownloadSerializer.new(Pd::Workshop.first).attributes.keys.map(&:to_s)

# Subsequent rows are for each workshop.
Pd::Workshop.where(subject: subjects).find_each.each do |w|
From the example at https://api.rubyonrails.org/classes/ActiveRecord/Batches.html, it seems it's not necessary to do both find_each and each?
Oops, good catch. That was left over from temporary testing.
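The redundancy works because find_each called without a block returns an Enumerator, so chaining .each still iterates the records once; the extra .each just adds noise. A plain-Ruby analogue (since find_each itself needs ActiveRecord):

```ruby
# Array#each without a block returns an Enumerator, just as find_each does,
# so .each.each iterates exactly the same elements as a single .each.
records = %w[a b c]

direct = []
records.each { |r| direct << r }

chained = []
records.each.each { |r| chained << r }  # .each with no block -> Enumerator

puts direct == chained  # both collect the same elements in the same order
```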
The destination tab is summer_workshops (auto), and we also write some information to a tab called summer_workshops_meta (auto). This writes to the same gsheet as #32609 and works very similarly.