2018-2019 school import #38432

bencodeorg · 2021-01-06T22:35:11Z

Updates to import 2018-2019 schools. This PR includes:

Reference to new file (in S3) to import, and a block to process that file.
New dry run option, to try seeding a file before actually conducting the import.
More logging to describe the types of changes that will be made (in a dry run) or were made (in a non-dry run).
Handle updates to state_school_id, which (without these changes) is blocked by the foreign key relationship to state_cs_offerings. The approach here is to destroy (delete) the existing state CS offerings temporarily, update the school's state_school_id, then recreate the state_cs_offerings.
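The delete, update, recreate sequence for state_school_id can be sketched in plain Ruby. This is only an illustration using in-memory hashes instead of ActiveRecord models; `update_state_school_id`, the `offerings` array, the `course` and `school_year` fields, and the sample IDs are all hypothetical and not taken from this PR's code.

```ruby
# Hypothetical in-memory stand-ins for the schools and state_cs_offerings tables.
# Each offering refers to a school by state_school_id, mimicking the foreign key.
def update_state_school_id(school, offerings, new_id)
  old_id = school[:state_school_id]
  return {deleted: 0, reloaded: 0} if old_id == new_id

  # 1. Temporarily remove the offerings that reference the old ID.
  removed, kept = offerings.partition { |o| o[:state_school_id] == old_id }
  offerings.replace(kept)

  # 2. Update the school now that nothing references the old ID.
  school[:state_school_id] = new_id

  # 3. Recreate the removed offerings under the new ID.
  recreated = removed.map { |o| o.merge(state_school_id: new_id) }
  offerings.concat(recreated)

  {deleted: removed.size, reloaded: recreated.size}
end

# Made-up sample data, for illustration only.
school = {id: '1', state_school_id: 'XX-001-0010'}
offerings = [
  {state_school_id: 'XX-001-0010', course: 'course-a', school_year: 2018},
  {state_school_id: 'XX-999-0001', course: 'course-b', school_year: 2018}
]

result = update_state_school_id(school, offerings, 'XX-001-0020')
puts "State CS offerings deleted: #{result[:deleted]}, state CS offerings reloaded: #{result[:reloaded]}"
```

Against a real database, the three steps would presumably need to run inside a single transaction so that a failure partway through does not leave offerings deleted.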

To do (specific to this import):

Run School.seed_from_s3 manually in production. That lets me a) run it manually in test first and b) monitor its progress in production. It's also pretty slow, so running it manually keeps it from slowing down a production release.
Uncomment the commented out code that prevents a full seed from happening as part of our normal seeding process.

Other to-dos in the NCES import space:

Import latitude and longitude updates to the schools table.
Add demographic data to school_stats_by_year.

Testing story

I've included tests to cover the dry run import of a CSV, and the new reconstitution of deleted state CS offerings.

I also tested the import manually by seeding my schools table from scratch (without the new block for 2018-2019 data), then, after adding the block, running `School.seed_from_s3(stub_school_data: false)`.

Result:

School seeding: done processing /var/folders/xp/9jb41z0n0y1fc7c_g45tgsqc0000gn/T/ccd_sch_029_1819_w_1a_091019.csv.20210105-49225-hv4ges.
2659 new schools added.
24871 schools updated.
74553 schools unchanged (school considered changed if only update was adding new columns included in this import).
93 duplicate schools skipped.
Among updated schools, these attributes were updated:
school_category: 99423
last_known_school_year_open: 97453
address_line1: 12356
name: 7896
state_school_id: 3744
address_line2: 3560
zip: 2190
city: 1215
school_type: 224
address_line3: 16
state: 11
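A summary like the one above could be produced by a merge routine that tallies changes and only writes when it is not a dry run. The following is a minimal sketch under that assumption; `merge_rows`, the hash-based `existing` store, and the sample rows are hypothetical and are not the PR's actual `merge_from_csv` implementation.

```ruby
# Hypothetical sketch: compare incoming rows to existing records, tally which
# attributes differ, and write changes only when dry_run is false.
def merge_rows(existing, rows, dry_run: true)
  summary = {added: 0, updated: 0, unchanged: 0, attributes: Hash.new(0)}

  rows.each do |row|
    current = existing[row[:id]]

    if current.nil?
      summary[:added] += 1
      existing[row[:id]] = row.dup unless dry_run
      next
    end

    # Attributes whose incoming value differs from the stored value.
    changed = row.keys.reject { |key| current[key] == row[key] }

    if changed.empty?
      summary[:unchanged] += 1
    else
      summary[:updated] += 1
      changed.each { |key| summary[:attributes][key] += 1 }
      current.merge!(row) unless dry_run
    end
  end

  summary
end

# Made-up sample data, for illustration only.
existing = {
  '1' => {id: '1', name: 'ALPHA SCHOOL', city: 'SPRINGFIELD'},
  '2' => {id: '2', name: 'BETA SCHOOL', city: 'RIVERTON'}
}
rows = [
  {id: '1', name: 'ALPHA ACADEMY', city: 'SPRINGFIELD'},
  {id: '2', name: 'BETA SCHOOL', city: 'RIVERTON'},
  {id: '3', name: 'GAMMA SCHOOL', city: 'LAKESIDE'}
]

summary = merge_rows(existing, rows, dry_run: true)
puts "#{summary[:added]} new schools to be added."
puts "#{summary[:updated]} schools to be updated."
puts "#{summary[:unchanged]} schools unchanged."
summary[:attributes].each { |name, count| puts "#{name}: #{count}" }
```

The sketch does not model this import's special case, in which a school whose only difference is a newly introduced column is still counted as unchanged.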

Reviewer Checklist:

Tests provide adequate coverage
Privacy and Security impacts have been assessed
Code is well-commented
New features are translatable or updates will not break translations
Relevant documentation has been added or updated
User impact is well-understood and desirable
Pull Request is labeled appropriately
Follow-up work items (including potential tech debt) are tracked and linked


clareconstantine

Nice work! The comments were very helpful. This looks good to me overall!

clareconstantine · 2021-01-11T17:51:32Z

dashboard/app/models/school.rb

-    else
-      School.seed_from_s3
+      #else
+      #School.seed_from_s3


just confirming - this is the change to prevent the seeding from happening with the new file before we do a dry run?

Yeah, pretty much. With school districts I actually did a non-dry run on staging (which appeared to have the entire school district dataset; surprisingly, test didn't), then a non-dry run on production. Basically, I just wanted to run it manually so I could keep an eye on it in case anything went wrong.

clareconstantine · 2021-01-11T18:01:43Z

dashboard/app/models/school.rb

+            id:                           row['NCESSCH'].to_i.to_s,
+            name:                         row['SCH_NAME'].upcase,
+            # Four schools with addresses longer than 50 characters (DB column limit)
+            # Also four schools with second address line longer than 30 characters.


What happens with those schools?

I'm truncating them (was getting DB errors on import without the truncation, figured cutting these four schools down in length wasn't a huge deal and a lot easier than expanding the allowed character limit for the columns via migration).
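For illustration, truncating to the column limits mentioned in the code comment (50 characters for the first address line, 30 for the second) might look like the sketch below. The helper name `truncate_to_limit` and the constant names are hypothetical; the PR's actual truncation code is not shown in this thread.

```ruby
# Limits taken from the code comment under review; names are hypothetical.
ADDRESS_LINE1_LIMIT = 50
ADDRESS_LINE2_LIMIT = 30

# Returns at most `limit` characters of the value. nil becomes an empty string.
def truncate_to_limit(value, limit)
  value.to_s[0, limit]
end

long_line1 = 'A' * 60
long_line2 = 'B' * 35

puts truncate_to_limit(long_line1, ADDRESS_LINE1_LIMIT).length  # 50
puts truncate_to_limit(long_line2, ADDRESS_LINE2_LIMIT).length  # 30
puts truncate_to_limit('123 MAIN ST', ADDRESS_LINE1_LIMIT)      # 123 MAIN ST
```

Note that this sketch turns a missing value into an empty string rather than preserving nil, which may or may not match what the import should store.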

clareconstantine · 2021-01-11T18:43:40Z

dashboard/app/models/school.rb

+        new_attributes: new_attributes,
+        &parse_row
+      )
+    ensure


bencodeorg · 2021-01-16T00:32:25Z

Production import:

Record with NCES ID 20018010080 and state school ID AK-05-050400 not unique, not added
Record with NCES ID 20018010162 and state school ID AK-05-051050 not unique, not added
Record with NCES ID 110003000511 and state school ID DC-001-420 not unique, not added
Record with NCES ID 120039010618 and state school ID FL-13-8131 not unique, not added
Record with NCES ID 120048008536 and state school ID FL-16-0581 not unique, not added
Record with NCES ID 120144008605 and state school ID FL-48-1852 not unique, not added
Record with NCES ID 120144008698 and state school ID FL-48-1851 not unique, not added
Record with NCES ID 120150008724 and state school ID FL-50-4030 not unique, not added
Record with NCES ID 120150008747 and state school ID FL-50-4031 not unique, not added
Record with NCES ID 170140306527 and state school ID IL-40-056-0340-26-4005603401001 not unique, not added
Record with NCES ID 190002710279 and state school ID IA-785510 000-785510 418 not unique, not added
Record with NCES ID 220033010320 and state school ID LA-010-010053 not unique, not added
Record with NCES ID 220102010339 and state school ID LA-032-032038 not unique, not added
Record with NCES ID 220105010412 and state school ID LA-033-033010 not unique, not added
Record with NCES ID 220129010374 and state school ID LA-040-040061 not unique, not added
Record with NCES ID 220165010400 and state school ID LA-052-052050 not unique, not added
Record with NCES ID 220165010403 and state school ID LA-052-052051 not unique, not added
Record with NCES ID 220168010414 and state school ID LA-053-053045 not unique, not added
Record with NCES ID 240012010500 and state school ID MD-03-031559 not unique, not added
Record with NCES ID 240027010501 and state school ID MD-08-080617 not unique, not added
Record with NCES ID 240048010502 and state school ID MD-15-150110 not unique, not added
Record with NCES ID 240048010503 and state school ID MD-15-150360 not unique, not added
Record with NCES ID 270032704983 and state school ID MN-526083-526083010 not unique, not added
Record with NCES ID 271833012683 and state school ID MN-012184-012184006 not unique, not added
Record with NCES ID 317802002371 and state school ID NE-160006000-160006005 not unique, not added
Record with NCES ID 330327610027 and state school ID NH-345-58934534523040 not unique, not added
Record with NCES ID 340076903036 and state school ID NJ-191376-050 not unique, not added
Record with NCES ID 340076903038 and state school ID NJ-191376-010 not unique, not added
Record with NCES ID 340076903048 and state school ID NJ-191376-030 not unique, not added
Record with NCES ID 341293003417 and state school ID NJ-234090-300 not unique, not added
Record with NCES ID 341629003449 and state school ID NJ-215210-302 not unique, not added
Record with NCES ID 350030001148 and state school ID NM-020-058 not unique, not added
Record with NCES ID 390441110729 and state school ID OH-044115-121954 not unique, not added
Record with NCES ID 390452010822 and state school ID OH-045203-064600 not unique, not added
Record with NCES ID 390503010724 and state school ID OH-050302-122887 not unique, not added
Record with NCES ID 400807010335 and state school ID OK-20-I099-20-I099-705 not unique, not added
Record with NCES ID 401059010484 and state school ID OK-55-I012-55-I012-155 not unique, not added
Record with NCES ID 401059010485 and state school ID OK-55-I012-55-I012-160 not unique, not added
Record with NCES ID 410192011348 and state school ID OR-00000000002243-00000000000000001270 not unique, not added
Record with NCES ID 462142010071 and state school ID SD-30003-02 not unique, not added
Record with NCES ID 468043010799 and state school ID SD-63003-02 not unique, not added
Record with NCES ID 480771017014 and state school ID TX-101902-101902042 not unique, not added
Record with NCES ID 480783015742 and state school ID TX-101903-101903006 not unique, not added
Record with NCES ID 480831016825 and state school ID TX-020902-020902003 not unique, not added
Record with NCES ID 480894022775 and state school ID TX-227901-227901179 not unique, not added
Record with NCES ID 480967017011 and state school ID TX-123910-123910048 not unique, not added
Record with NCES ID 481119016928 and state school ID TX-020905-020905115 not unique, not added
Record with NCES ID 481416016942 and state school ID TX-065901-065901041 not unique, not added
Record with NCES ID 481428017006 and state school ID TX-084910-084910121 not unique, not added
Record with NCES ID 481443016990 and state school ID TX-071901-071901104 not unique, not added
Record with NCES ID 481500017045 and state school ID TX-170902-170902010 not unique, not added
Record with NCES ID 481527017052 and state school ID TX-178904-178904056 not unique, not added
Record with NCES ID 481611016970 and state school ID TX-101907-101907133 not unique, not added
Record with NCES ID 481653016972 and state school ID TX-101908-101908044 not unique, not added
Record with NCES ID 481739017037 and state school ID TX-108902-108902045 not unique, not added
Record with NCES ID 481761017047 and state school ID TX-171901-171901002 not unique, not added
Record with NCES ID 481800016986 and state school ID TX-068901-068901005 not unique, not added
Record with NCES ID 481965016954 and state school ID TX-079907-079907131 not unique, not added
Record with NCES ID 481965016956 and state school ID TX-079907-079907133 not unique, not added
Record with NCES ID 481983017067 and state school ID TX-152907-152907002 not unique, not added
Record with NCES ID 482034016969 and state school ID TX-057909-057909147 not unique, not added
Record with NCES ID 482364016978 and state school ID TX-101912-101912080 not unique, not added
Record with NCES ID 482364016982 and state school ID TX-101912-101912097 not unique, not added
Record with NCES ID 482364017019 and state school ID TX-101912-101912071 not unique, not added
Record with NCES ID 482391017028 and state school ID TX-101913-101913005 not unique, not added
Record with NCES ID 482391017030 and state school ID TX-101913-101913118 not unique, not added
Record with NCES ID 482484012776 and state school ID TX-016901-016901101 not unique, not added
Record with NCES ID 482559017057 and state school ID TX-133903-133903106 not unique, not added
Record with NCES ID 482566015929 and state school ID TX-014906-014906126 not unique, not added
Record with NCES ID 482613017001 and state school ID TX-108912-108912044 not unique, not added
Record with NCES ID 482619017032 and state school ID TX-101916-101916009 not unique, not added
Record with NCES ID 482730016940 and state school ID TX-061902-061902047 not unique, not added
Record with NCES ID 482730016941 and state school ID TX-061902-061902123 not unique, not added
Record with NCES ID 482778017056 and state school ID TX-187907-187907101 not unique, not added
Record with NCES ID 482922017091 and state school ID TX-234905-234905101 not unique, not added
Record with NCES ID 482985016952 and state school ID TX-043907-043907043 not unique, not added
Record with NCES ID 483021017043 and state school ID TX-164901-164901041 not unique, not added
Record with NCES ID 483039016973 and state school ID TX-057914-057914656 not unique, not added
Record with NCES ID 483075017097 and state school ID TX-200902-200902002 not unique, not added
Record with NCES ID 483176017078 and state school ID TX-225902-225902105 not unique, not added
Record with NCES ID 483294015799 and state school ID TX-015910-015910054 not unique, not added
Record with NCES ID 483432017033 and state school ID TX-101917-101917007 not unique, not added
Record with NCES ID 483510016955 and state school ID TX-043910-043910130 not unique, not added
Record with NCES ID 483738016035 and state school ID TX-041902-041902002 not unique, not added
Record with NCES ID 483765017096 and state school ID TX-199901-199901106 not unique, not added
Record with NCES ID 483897017035 and state school ID TX-105902-105902043 not unique, not added
Record with NCES ID 483945016951 and state school ID TX-074911-074911101 not unique, not added
Record with NCES ID 483948016965 and state school ID TX-094902-094902002 not unique, not added
Record with NCES ID 484071016996 and state school ID TX-071909-071909003 not unique, not added
Record with NCES ID 484071016997 and state school ID TX-071909-071909114 not unique, not added
Record with NCES ID 484401017015 and state school ID TX-126908-126908041 not unique, not added
Record with NCES ID 484441017083 and state school ID TX-226906-226906002 not unique, not added
Record with NCES ID 540051001594 and state school ID WV-3300000-33235 not unique, not added

School seeding: done processing /tmp/ccd_sch_029_1819_w_1a_091019.csv.20210114-19911-1i1f6ib.
2659 new schools added.
24871 schools updated.
74553 schools unchanged (school considered unchanged if only update was adding new columns included in this import).
93 duplicate schools skipped.
State CS offerings deleted: 1588, state CS offerings reloaded: 1588
Among updated schools, these attributes were updated:
school_category: 99424
last_known_school_year_open: 97454
address_line1: 12356
name: 7897
state_school_id: 3744
address_line2: 3560
zip: 2190
city: 1215
school_type: 225
address_line3: 16
state: 11
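The "not unique, not added" lines above suggest that incoming records are skipped when they collide with one already present. A minimal sketch of that kind of check follows, assuming uniqueness is judged on the state school ID; that assumption, the `add_unique` helper, and the sample records are hypothetical and may not match how the import actually detects duplicates.

```ruby
require 'set'

# Hypothetical sketch: keep the first record seen for each state school ID and
# skip later ones, logging a message shaped like the import's output.
def add_unique(records, seen_state_ids = Set.new)
  added = []
  skipped = []

  records.each do |record|
    # Set#add? returns nil when the element is already in the set.
    if seen_state_ids.add?(record[:state_school_id])
      added << record
    else
      skipped << record
      puts "Record with NCES ID #{record[:id]} and state school ID " \
           "#{record[:state_school_id]} not unique, not added"
    end
  end

  [added, skipped]
end

# Made-up sample data, for illustration only.
records = [
  {id: '1', state_school_id: 'XX-01-0001'},
  {id: '2', state_school_id: 'XX-01-0002'},
  {id: '3', state_school_id: 'XX-01-0001'}
]

added, skipped = add_unique(records)
puts "#{added.size} new schools added."
puts "#{skipped.size} duplicate schools skipped."
```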

bencodeorg added 12 commits December 16, 2020 17:37

Make full school seed possible when seeding all manually

1ff745e

Add dry run option and handle state CS offerings on ID update

f51f489

Import 2018-2019 schools data

494050a

Add comment describing to do in deleting state CS offerings

6e2e7fa

Add tests for dry run merge_from_csv

bd6ffee

Add tests for deleting and reloading state CS offerings [WIP]

bc32c4c

Delete and recreate state CS offerings for schools with new state sch…

02321a2

…ool IDs

Handle blank school table in school factory

089d69c

Add test for reloading state CS offerings

5e0351f

Remove latitude/longitude seeding for another PR

35f20be

Temporarily stop full seeding of schools table

eabb468

Remove extraneous comment

1494c05

bencodeorg requested review from a team January 6, 2021 22:43

bencodeorg added 2 commits January 6, 2021 14:52

Add future tense to dry run logging

3623dad

Typo in logging

bb382c3

bencodeorg mentioned this pull request Jan 8, 2021

Import latitude and longitude for 2018-2019 schools #38459

Merged

8 tasks

bencodeorg added 2 commits January 7, 2021 17:35

Merge branch 'staging' into 2018-school-import

7651c83

Merge staging to get fix for failing UI test

6ccc90a

bencodeorg requested a review from clareconstantine January 9, 2021 00:58

bencodeorg changed the title ~~2018 school import~~ 2018-2019 school import Jan 9, 2021

clareconstantine approved these changes Jan 11, 2021


bencodeorg merged commit 48fd996 into staging Jan 13, 2021

bencodeorg deleted the 2018-school-import branch January 13, 2021 00:09

bencodeorg mentioned this pull request Jan 16, 2021

Turn back on full school seeding #38612

Merged

tim-dot-org mentioned this pull request Apr 27, 2021

2019-2020 NCES import #40287

Merged

8 tasks
