Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New census summarization logic #20746

Merged
merged 9 commits into from Feb 20, 2018
Merged

Conversation

drewsamnick
Copy link
Contributor

This changes the census summarization logic to the simple logic in the current spec. I've kept around the Bayesian summarization logic since we may still want to use it or a variation.

There are a few test cases that are ambiguous depending on which logic we use. For now I've changed them to expect MAYBE or the previously expected value (YES or NO) so that they don't need to be tweaked every time we update the logic. Either result is appropriate in these case, but not the opposite value or nil so the tests are still useful.

bin/oneoff/census/summarize.rb has ben updated to output the summaries rather than save to the DB. It also now takes an optional parameter to control which algorithm to use. This will allow us to try different logic more easily against prod data.

I also added summarization to the cron job for updating the summaries fusion table so that summaries will be updated daily.

@@ -146,13 +147,138 @@ def self.decide_teaches(posterior)
end

def self.summarize_school_data(school, school_years, years_with_ap_data, years_with_ib_data, state_years_with_data)
summarize_school_data_simple(school, school_years, years_with_ap_data, years_with_ib_data, state_years_with_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth passing a hash with this many parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


school_years.each do |school_year|
audit = {
version: 0.3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 0.3 a meaningful constant that should be defined somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added constants to define the different versions.

count_no = 0

# If the school doesn't have stats then treat it as not high school.
# The lack of stats will show up in the audit data as a null value for high_school.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for K8 school, given the lines below that do the same thing as high school?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to mention k8_schools too.

@@ -93,4 +93,8 @@ def self.merge_from_csv(filename, options = {col_sep: "\t", headers: true, quote
def has_high_school_grades?
grade_09_offered || grade_10_offered || grade_11_offered || grade_12_offered || grade_13_offered
end

def has_k8_grades?
grade_kg_offered || grade_01_offered || grade_02_offered || grade_03_offered || grade_04_offered || grade_05_offered || grade_06_offered || grade_07_offered || grade_08_offered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like Ruby must have some magic to generate this list programatically :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do something like this:

irb(main):014:0> (['kg'] + (9..13).to_a).map{|n| "grade_#{n.to_s.rjust(2,'0')}_offered"}
=> ["grade_kg_offered", "grade_09_offered", "grade_10_offered", "grade_11_offered", "grade_12_offered", "grade_13_offered"]

Then you'll need to run those strings through read_attribute and reduce or something similar to || them together. I'm not sure that will be more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I could determine.

Copy link
Member

@breville breville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from a fairly superficial read.. e.g. I didn't correlate things back to the spec.

Seems like good testing will be crucial to making sure this works properly. Do we have a measure of test coverage from the tests in place?

end

test "Non-high school with 20 hours ALL does teach" do
submission = build :census_submission, how_many_10_hours: "NONE", how_many_20_hours: "ALL"
assert Census::CensusSummary.submission_teaches_cs?(submission, is_high_school: false)
assert Census::CensusSummary.submission_teaches_cs?(submission, is_high_school: false, is_k8_school: true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for is_k8_school: nil? According to the comments above that is a valid scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll add this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@drewsamnick drewsamnick merged commit d2b2cec into staging Feb 20, 2018
@drewsamnick drewsamnick deleted the census-new-summarization-logic branch February 20, 2018 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants