Census: support CSV V2 format #29567

breville · 2019-07-09T21:02:58Z

Going forward, all CSVs should use this one "V2" format, with no more variants per state (which we support in the existing "V1" format). For the 2018-19 school year, we already have some CSVs using the V1 format, and so we will only attempt to use the V2 format for some states. The assumption is that all states will use the V2 format from 2019-20 onwards.

The V2 format has two columns of note for identifying a school: state_school_id and nces_id. If nces_id is provided, then we use it to look up a state school ID. If nces_id is not provided or is "unspecified", then we fall back to the state_school_id column.
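
As a rough illustration of that lookup order, here is a minimal standalone sketch. It is not the code in this PR: the hash `SCHOOLS_BY_NCES_ID` is a hypothetical stand-in for the `School` model lookup, the IDs are made up, the method name is invented for the example, and the value of `UNSPECIFIED_VALUE` is assumed to be the string "unspecified".

```ruby
# Assumed value; the real constant lives in the model.
UNSPECIFIED_VALUE = 'unspecified'.freeze

# Hypothetical stand-in for the School table: NCES ID => state school ID.
# The IDs below are illustrative only.
SCHOOLS_BY_NCES_ID = {
  '160000100001' => 'ID-001-0001'
}.freeze

# Returns the state school ID for one V2 CSV row (a Hash keyed by column name).
def state_school_id_for(row_hash)
  nces_id = row_hash['nces_id']

  # No usable nces_id: fall back to the state_school_id column.
  if nces_id.nil? || nces_id.empty? || nces_id == UNSPECIFIED_VALUE
    return row_hash['state_school_id']
  end

  # nces_id given: look it up. Returns nil when no school matches.
  SCHOOLS_BY_NCES_ID[nces_id]
end
```

Note that in this sketch a provided but unknown nces_id yields nil rather than falling back to the state_school_id column.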

The course is specified by course_id and must be a valid course; we will no longer compare against a list of valid course codes for a state.

Once this code is live, any subsequent seed step will be able to pick up a newly-added V2 CSV file on S3, and should successfully ingest it. We will start this process by just doing one state - ID - and making sure that ingests successfully.

agealy

One addition to the description: we may, after all, be uploading v2 CSVs for states with existing v1 2018-2019 forms. According to discussion so far, our expectation is that this will work so long as the v2 forms are supersets of their respective v1 forms.
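
One way to sanity-check that expectation before uploading could look like the sketch below. It assumes "superset" means every offering in the v1 data also appears in the v2 data, and that both have already been reduced to `[state_school_id, course]` pairs; the method names and the sample pairs are hypothetical, not part of this PR.

```ruby
require 'set'

# Returns the v1 offerings that are absent from the v2 offerings.
# Each offering is a [state_school_id, course] pair (assumed shape).
def missing_from_v2(v1_offerings, v2_offerings)
  (v1_offerings.to_set - v2_offerings.to_set).to_a
end

# True when v2 contains every offering that v1 has.
def v2_superset_of_v1?(v1_offerings, v2_offerings)
  v2_offerings.to_set.superset?(v1_offerings.to_set)
end

# Illustrative data only.
v1 = [['ID-001-0001', 'course_a']]
v2 = [['ID-001-0001', 'course_a'], ['ID-001-0002', 'course_b']]
puts v2_superset_of_v1?(v1, v2)
```

If the check fails, `missing_from_v2` lists the pairs to investigate.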

sureshc · 2019-07-12T00:12:27Z

dashboard/app/models/census/state_cs_offering.rb

@@ -85,7 +99,23 @@ def self.infer_no(state_code)
    INFERRED_NO_EXCLUSION_LIST.exclude? state_code.upcase
  end

-  def self.construct_state_school_id(state_code, row_hash)
+  def self.construct_state_school_id(state_code, row_hash, school_year)


Would it be worth renaming this method to get_state_school_id since that’s what we are typically doing here? In some rare/legacy cases, we are constructing the state_school_id from district_id and school_id.

I like this also because it avoids duplicating the function name in School that we call a lot in here.

sureshc · 2019-07-12T00:22:24Z

dashboard/app/models/census/state_cs_offering.rb

+
+      # The V2 format requires either nces_id or state_school_id.
+      if nces_id != UNSPECIFIED_VALUE
+        return School.find_by(id: nces_id)&.state_school_id


If find_by on the NCES ID does not return a record, this will return nil. Could we try to find the record and then fall back to returning the state_school_id provided in the CSV?
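
A minimal sketch of the fallback being suggested, written standalone so it can run outside Rails. The `schools_by_nces_id` hash is a hypothetical stand-in for `School.find_by(id: nces_id)`, the method name is invented for the example, and "unspecified" is assumed to be the value of `UNSPECIFIED_VALUE`.

```ruby
# row_hash: one CSV row keyed by column name.
# schools_by_nces_id: hypothetical lookup table, NCES ID => state school ID.
# unspecified: assumed sentinel for "no NCES ID given".
def state_school_id_with_fallback(row_hash, schools_by_nces_id, unspecified = 'unspecified')
  nces_id = row_hash['nces_id']

  if nces_id && nces_id != unspecified
    found = schools_by_nces_id[nces_id]
    # Use the looked-up ID only when a matching school exists.
    return found if found
  end

  # No nces_id, or no matching school: use the ID from the CSV.
  row_hash['state_school_id']
end
```

With this shape, a row whose nces_id matches no school still resolves to the state_school_id in the CSV instead of nil.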

bencodeorg

Much simpler, looks great!

islemaster

Awesome!

Now that we have support for the new V2 CSV format in #29567, we have more data to ingest. We will place these CSV files on S3 for actual ingestion, but since I wanted to test ingestion locally first as we iteratively cleaned up the data, I've just included them in this change. They can be ingested locally so long as `CDO.stub_school_data` is set to `true`.

sureshc reviewed Jul 9, 2019

breville added 4 commits July 9, 2019 14:42

Census: small fixes

3b7ced3

Census: update CSV columns that are used

662e892

Census: update sample CSV

6d5ce36

Census: small tweaks

00b07d6

breville requested review from agealy and bencodeorg July 11, 2019 22:26

agealy approved these changes Jul 11, 2019

sureshc reviewed Jul 12, 2019

Census: code review feedback

d5f14f5

bencodeorg approved these changes Jul 12, 2019

sureshc approved these changes Jul 12, 2019

breville requested a review from islemaster July 12, 2019 16:58

islemaster approved these changes Jul 12, 2019

breville merged commit 15cd21a into staging Jul 12, 2019

breville deleted the census-support-csv-v2-format branch July 12, 2019 18:54

breville mentioned this pull request Jul 15, 2019

Census: add more 2018-2019 state CS offerings #29710

Merged
