New census summarization logic #20746

drewsamnick · 2018-02-20T00:29:38Z

This changes the census summarization logic to the simple logic in the current spec. I've kept around the Bayesian summarization logic since we may still want to use it or a variation.

There are a few test cases whose result is ambiguous depending on which logic we use. For now I've changed them to expect either MAYBE or the previously expected value (YES or NO) so that they don't need to be tweaked every time we update the logic. Either result is appropriate in these cases, but the opposite value or nil is not, so the tests are still useful.
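As a rough illustration of that "either value" check (not the actual test code), the sketch below assumes the summary is represented as one of the strings 'YES', 'NO', 'MAYBE', or nil. The helper name acceptable_summary? is hypothetical and does not appear in the PR.

```ruby
# Hypothetical helper, not part of the PR: accepts the previously expected
# value or MAYBE, and rejects the opposite value and nil.
# Assumes summaries are the plain strings 'YES', 'NO', and 'MAYBE'.
def acceptable_summary?(actual, previously_expected)
  [previously_expected, 'MAYBE'].include?(actual)
end

acceptable_summary?('YES', 'YES')   # previously expected value is accepted
acceptable_summary?('MAYBE', 'YES') # MAYBE is accepted
acceptable_summary?('NO', 'YES')    # opposite value is rejected
acceptable_summary?(nil, 'YES')     # nil is rejected
```

A test written this way keeps failing on a flipped or missing result while tolerating a switch between the two summarization algorithms.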

bin/oneoff/census/summarize.rb has been updated to output the summaries rather than save them to the DB. It also now takes an optional parameter to control which algorithm to use. This will allow us to try different logic more easily against prod data.

I also added summarization to the cron job for updating the summaries fusion table so that summaries will be updated daily.

breville · 2018-02-20T12:30:04Z

dashboard/app/models/census/census_summary.rb

@@ -146,13 +147,138 @@ def self.decide_teaches(posterior)
  end

  def self.summarize_school_data(school, school_years, years_with_ap_data, years_with_ib_data, state_years_with_data)
+    summarize_school_data_simple(school, school_years, years_with_ap_data, years_with_ib_data, state_years_with_data)


Maybe worth passing a hash with this many parameters?
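A minimal sketch of what the hash-based signature could look like. The key names mirror the existing parameter names, but the method body and the returned structure are placeholders for illustration, not the real summarization logic.

```ruby
# Hypothetical stand-in for the real method: all inputs arrive in one hash
# instead of five positional parameters.
def summarize_school_data(summary_data)
  school = summary_data.fetch(:school)
  school_years = summary_data.fetch(:school_years)
  years_with_ap_data = summary_data.fetch(:years_with_ap_data)
  # :years_with_ib_data and :state_years_with_data would be read the same way.

  # Placeholder result: one entry per school year.
  school_years.map do |year|
    {school: school, school_year: year, has_ap_data: years_with_ap_data.include?(year)}
  end
end

result = summarize_school_data(
  school: 'Example School',
  school_years: [2016, 2017],
  years_with_ap_data: [2017],
  years_with_ib_data: [],
  state_years_with_data: {}
)
```

Using fetch makes a missing key fail loudly with a KeyError, which recovers some of the safety that positional parameters give for free.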

breville · 2018-02-20T12:31:01Z

dashboard/app/models/census/census_summary.rb

+
+    school_years.each do |school_year|
+      audit = {
+        version: 0.3,


Is 0.3 a meaningful constant that should be defined somewhere?

Added constants to define the different versions.

breville · 2018-02-20T12:32:36Z

dashboard/app/models/census/census_summary.rb

+      count_no = 0
+
+      # If the school doesn't have stats then treat it as not high school.
+      # The lack of stats will show up in the audit data as a null value for high_school.


Does the same apply to K8 schools, given that the lines below do the same thing as for high schools?

Updated the comment to mention k8_schools too.

breville · 2018-02-20T12:35:15Z

dashboard/app/models/school_stats_by_year.rb

@@ -93,4 +93,8 @@ def self.merge_from_csv(filename, options = {col_sep: "\t", headers: true, quote
  def has_high_school_grades?
    grade_09_offered || grade_10_offered || grade_11_offered || grade_12_offered || grade_13_offered
  end
+
+  def has_k8_grades?
+    grade_kg_offered || grade_01_offered || grade_02_offered || grade_03_offered || grade_04_offered || grade_05_offered || grade_06_offered || grade_07_offered || grade_08_offered


I feel like Ruby must have some magic to generate this list programmatically :)

You could do something like this:

irb(main):014:0> (['kg'] + (9..13).to_a).map{|n| "grade_#{n.to_s.rjust(2,'0')}_offered"}
=> ["grade_kg_offered", "grade_09_offered", "grade_10_offered", "grade_11_offered", "grade_12_offered", "grade_13_offered"]

Then you'll need to run those strings through read_attribute and reduce or something similar to || them together. I'm not sure that will be more readable.
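For reference, a sketch of that idea applied to the K-8 grades (kg plus 01 through 08). A Struct stands in for the ActiveRecord model here, and the constant names K8_GRADES and K8_COLUMNS are hypothetical. In the real model the columns would be read with read_attribute or public_send.

```ruby
# Grade suffixes for K-8: "kg", "01", ..., "08".
K8_GRADES = (['kg'] + (1..8).map { |n| format('%02d', n) }).freeze
# Matching column names, e.g. :grade_kg_offered, :grade_01_offered.
K8_COLUMNS = K8_GRADES.map { |grade| :"grade_#{grade}_offered" }.freeze

# Stand-in for the ActiveRecord model; only the K-8 columns are modeled.
SchoolStats = Struct.new(*K8_COLUMNS, keyword_init: true) do
  def has_k8_grades?
    # any? replaces the chain of ||; it always returns true or false.
    K8_COLUMNS.any? { |column| public_send(column) }
  end
end

SchoolStats.new(grade_03_offered: true).has_k8_grades?
SchoolStats.new.has_k8_grades?
```

One behavioral difference: the explicit || chain returns the first truthy value or the last falsy one (possibly nil), while any? always returns a boolean.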

Not that I could determine.

breville

Looks good from a fairly superficial read; for example, I didn't correlate things back to the spec.

Seems like good testing will be crucial to making sure this works properly. Do we have a measure of test coverage from the tests in place?

aoby · 2018-02-20T20:01:11Z

dashboard/test/models/census/census_summary_test.rb

  end

  test "Non-high school with 20 hours ALL does teach" do
    submission = build :census_submission, how_many_10_hours: "NONE", how_many_20_hours: "ALL"
-    assert Census::CensusSummary.submission_teaches_cs?(submission, is_high_school: false)
+    assert Census::CensusSummary.submission_teaches_cs?(submission, is_high_school: false, is_k8_school: true)


Is there a test for is_k8_school: nil? According to the comments above that is a valid scenario.

Good point. I'll add this.

Drew Samnick added 5 commits February 19, 2018 12:32

Don't look at checkboxes for mixed k8/high schools

16b92f4

Use simple summarization logic

2366738

Add k8_school to audit_data

e79ffac

Allow summarization without saving to DB

83997cc

Summarize census data before updating fusion table

354672e

drewsamnick requested review from breville, aoby and bencodeorg February 20, 2018 00:29

breville reviewed Feb 20, 2018

breville approved these changes Feb 20, 2018

Drew Samnick added 3 commits February 20, 2018 11:58

Use named constants for audit_data versions

a58d9a8

Use a hash instead of long list of parameters

0ce6b78

Add comment about k8_schools

e0074bc

aoby reviewed Feb 20, 2018

aoby approved these changes Feb 20, 2018

Add tests for schools without stats

04dc250

drewsamnick merged commit d2b2cec into staging Feb 20, 2018

drewsamnick deleted the census-new-summarization-logic branch February 20, 2018 21:29
