New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New census summarization logic #20746
Conversation
@@ -146,13 +147,138 @@ def self.decide_teaches(posterior) | |||
end | |||
|
|||
def self.summarize_school_data(school, school_years, years_with_ap_data, years_with_ib_data, state_years_with_data) | |||
summarize_school_data_simple(school, school_years, years_with_ap_data, years_with_ib_data, state_years_with_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth passing a hash with this many parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
|
||
school_years.each do |school_year| | ||
audit = { | ||
version: 0.3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 0.3 a meaningful constant that should be defined somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added constants to define the different versions.
count_no = 0 | ||
|
||
# If the school doesn't have stats then treat it as not high school. | ||
# The lack of stats will show up in the audit data as a null value for high_school. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for K8 school, given the lines below that do the same thing as high school?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment to mention k8_schools too.
@@ -93,4 +93,8 @@ def self.merge_from_csv(filename, options = {col_sep: "\t", headers: true, quote | |||
def has_high_school_grades? | |||
grade_09_offered || grade_10_offered || grade_11_offered || grade_12_offered || grade_13_offered | |||
end | |||
|
|||
def has_k8_grades? | |||
grade_kg_offered || grade_01_offered || grade_02_offered || grade_03_offered || grade_04_offered || grade_05_offered || grade_06_offered || grade_07_offered || grade_08_offered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like Ruby must have some magic to generate this list programatically :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do something like this:
irb(main):014:0> (['kg'] + (9..13).to_a).map{|n| "grade_#{n.to_s.rjust(2,'0')}_offered"}
=> ["grade_kg_offered", "grade_09_offered", "grade_10_offered", "grade_11_offered", "grade_12_offered", "grade_13_offered"]
Then you'll need to run those strings through read_attribute
and reduce
or something similar to ||
them together. I'm not sure that will be more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I could determine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a fairly superficial read.. e.g. I didn't correlate things back to the spec.
Seems like good testing will be crucial to making sure this works properly. Do we have a measure of test coverage from the tests in place?
end | ||
|
||
test "Non-high school with 20 hours ALL does teach" do | ||
submission = build :census_submission, how_many_10_hours: "NONE", how_many_20_hours: "ALL" | ||
assert Census::CensusSummary.submission_teaches_cs?(submission, is_high_school: false) | ||
assert Census::CensusSummary.submission_teaches_cs?(submission, is_high_school: false, is_k8_school: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test for is_k8_school: nil
? According to the comments above that is a valid scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll add this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
This changes the census summarization logic to the simple logic in the current spec. I've kept around the Bayesian summarization logic since we may still want to use it or a variation.
There are a few test cases that are ambiguous depending on which logic we use. For now I've changed them to expect
MAYBE
or the previously expected value (YES
orNO
) so that they don't need to be tweaked every time we update the logic. Either result is appropriate in these case, but not the opposite value or nil so the tests are still useful.bin/oneoff/census/summarize.rb
has ben updated to output the summaries rather than save to the DB. It also now takes an optional parameter to control which algorithm to use. This will allow us to try different logic more easily against prod data.I also added summarization to the cron job for updating the summaries fusion table so that summaries will be updated daily.