2018-2019 school import #38432

Merged
bencodeorg merged 16 commits into staging from 2018-school-import
Jan 13, 2021

@bencodeorg bencodeorg commented Jan 6, 2021

Updates to import 2018-2019 schools. This PR includes:

  1. Reference to new file (in S3) to import, and a block to process that file.
  2. New dry run option, to try seeding a file before actually conducting the import.
  3. More logging to report the types of changes that would be made (in a dry run) or were made (in a non-dry run).
  4. Handle updates to state_school_id, which (without these changes) is blocked by the foreign key relationship to state_cs_offerings. The approach here is to destroy (delete) the existing state CS offerings temporarily, update the school's state_school_id, then recreate the state_cs_offerings.
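The delete, update, recreate sequence in item 4 can be sketched in plain Ruby. This is only an illustration under stated assumptions: `SchoolRow`, `OfferingRow`, `change_state_school_id`, the `course` and `school_year` fields, and the `XX-...` IDs are all made up for the example, and in-memory arrays stand in for the real tables. Against a real database, the three steps would presumably run inside one transaction so a failure cannot leave offerings deleted.

```ruby
# Hypothetical in-memory stand-ins for the schools and state_cs_offerings tables.
SchoolRow = Struct.new(:id, :state_school_id)
OfferingRow = Struct.new(:state_school_id, :course, :school_year)

# Changes a school's state_school_id without ever leaving an offering that
# points at an ID no school has. Returns the number of offerings recreated.
def change_state_school_id(school, new_id, offerings)
  old_id = school.state_school_id

  # 1. Set aside the offerings that reference the old ID, then delete them.
  saved = offerings.select { |o| o.state_school_id == old_id }
  offerings.reject! { |o| o.state_school_id == old_id }

  # 2. With no dependent rows left, the key itself can change.
  school.state_school_id = new_id

  # 3. Recreate the saved offerings against the new key.
  saved.each do |o|
    offerings << OfferingRow.new(new_id, o.course, o.school_year)
  end

  saved.size
end

school = SchoolRow.new('10000100001', 'XX-01-0001')
offerings = [
  OfferingRow.new('XX-01-0001', 'course_a', 2018),
  OfferingRow.new('XX-99-9999', 'course_b', 2018)
]
recreated = change_state_school_id(school, 'XX-01-0002', offerings)
puts "offerings recreated: #{recreated}"
```

Comparing the number deleted with the number recreated is a cheap sanity check that nothing was lost along the way.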

To do (specific to this import):

  1. Run School.seed_from_s3 manually in production, so that I can a) run it manually in test first, then b) monitor its progress in production. It's also pretty slow, so running it manually keeps it from slowing down a production release.
  2. Uncomment the code that I commented out to prevent a full seed from happening as part of our normal seeding process.

Other to-dos in the NCES import space:

  1. Import latitude and longitude updates to the schools table.
  2. Add demographic data to school_stats_by_year.

Testing story

I've included tests to cover the dry run import of a CSV, and the new reconstitution of deleted state CS offerings.
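For readers unfamiliar with the dry run idea, here is a minimal sketch of the pattern, not the code in this PR. `import_rows`, its `dry_run:` keyword, and the row shape are hypothetical; an array of hashes stands in for parsed CSV rows and a Hash stands in for the schools table. The point is that both modes walk the same classification logic, and only the write is skipped in a dry run.

```ruby
# Hypothetical helper. `rows` is an array of hashes standing in for parsed
# CSV rows; `existing` maps id => attributes and stands in for the table.
def import_rows(rows, existing, dry_run: false)
  summary = { added: 0, updated: 0, unchanged: 0 }
  rows.each do |row|
    attrs = { name: row['name'].upcase }
    current = existing[row['id']]
    if current.nil?
      summary[:added] += 1
    elsif current == attrs
      summary[:unchanged] += 1
      next
    else
      summary[:updated] += 1
    end
    # The only write in the method; skipped entirely in a dry run.
    existing[row['id']] = attrs unless dry_run
  end
  summary
end

existing = { '1' => { name: 'ALPHA SCHOOL' }, '2' => { name: 'BETA SCHOOL' } }
rows = [
  { 'id' => '1', 'name' => 'Alpha School' }, # same after upcasing
  { 'id' => '2', 'name' => 'Beta Academy' }, # name differs
  { 'id' => '3', 'name' => 'Gamma School' }  # not present yet
]

preview = import_rows(rows, existing, dry_run: true) # existing is untouched
actual = import_rows(rows, existing)                 # existing is modified
puts preview == actual
```

Because the dry run and the real run share one code path, the dry run summary should match what the real import reports, provided the data does not change in between.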

I also tested the import manually, by seeding my schools table from scratch (without the new block for 2018-2019 data), then, after adding the block, running `School.seed_from_s3(stub_school_data: false)`.

Result:

School seeding: done processing /var/folders/xp/9jb41z0n0y1fc7c_g45tgsqc0000gn/T/ccd_sch_029_1819_w_1a_091019.csv.20210105-49225-hv4ges.
2659 new schools added.
24871 schools updated.
74553 schools unchanged (school considered changed if only update was adding new columns included in this import).
93 duplicate schools skipped.
Among updated schools, these attributes were updated:
school_category: 99423
last_known_school_year_open: 97453
address_line1: 12356
name: 7896
state_school_id: 3744
address_line2: 3560
zip: 2190
city: 1215
school_type: 224
address_line3: 16
state: 11
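The per-attribute counts for school_category and last_known_school_year_open are far larger than the 24871 updated schools. That is consistent with the rule in the log message that a change touching only newly added columns does not count as an update. A rough sketch of that bookkeeping follows. It assumes those two attributes are the new columns (an inference from the counts, not something stated in the PR), and `classify` and `NEW_ATTRIBUTES` are hypothetical names.

```ruby
# Assumption: these are the columns introduced by this import.
NEW_ATTRIBUTES = %i[school_category last_known_school_year_open].freeze

# Tallies every attribute whose value differs, then reports the school as
# :updated only if at least one pre-existing attribute changed.
def classify(old_attrs, new_attrs, tally)
  changed = new_attrs.keys.reject { |key| old_attrs[key] == new_attrs[key] }
  changed.each { |key| tally[key] += 1 }
  (changed - NEW_ATTRIBUTES).empty? ? :unchanged : :updated
end

tally = Hash.new(0)
old = { name: 'A', city: 'X', school_category: nil }

# Only the new column is filled in: tallied, but the school stays :unchanged.
first = classify(old, { name: 'A', city: 'X', school_category: 'Regular' }, tally)

# The name changes too: the school is :updated.
second = classify(old, { name: 'B', city: 'X', school_category: 'Regular' }, tally)

puts first, second, tally.inspect
```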

Reviewer Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

@bencodeorg bencodeorg requested review from a team January 6, 2021 22:43
@bencodeorg bencodeorg changed the title from "2018 school import" to "2018-2019 school import" Jan 9, 2021
@clareconstantine clareconstantine left a comment

Nice work! The comments were very helpful. This looks good to me overall!

- else
- School.seed_from_s3
+ #else
+ #School.seed_from_s3

just confirming - this is the change to prevent the seeding from happening with the new file before we do a dry run?

Contributor Author

Yeah, pretty much -- with school districts I actually did a non-dry run on staging (which appeared to have the entire school district dataset, which test surprisingly didn't), then a non-dry run on production. Basically I just wanted to be able to run it manually to keep an eye on it in case anything went wrong.

id: row['NCESSCH'].to_i.to_s,
name: row['SCH_NAME'].upcase,
# Four schools with addresses longer than 50 characters (DB column limit)
# Also four schools with second address line longer than 30 characters.

What happens with those schools?

Contributor Author

I'm truncating them (was getting DB errors on import without the truncation, figured cutting these four schools down in length wasn't a huge deal and a lot easier than expanding the allowed character limit for the columns via migration).
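A minimal sketch of that kind of truncation, using the 50 and 30 character limits from the code comment above. `clip` and the two constants are hypothetical names, and the exact call used in the PR is not shown in this excerpt. Plain `String#[]` with a start and a length cuts silently, which is the behavior sketched here.

```ruby
# Column limits taken from the code comment under review.
ADDRESS_LINE1_LIMIT = 50
ADDRESS_LINE2_LIMIT = 30

# Returns at most `limit` characters; nil passes through for blank CSV cells.
def clip(value, limit)
  value.nil? ? nil : value[0, limit]
end

long_address = 'A' * 60
puts clip(long_address, ADDRESS_LINE1_LIMIT).length
puts clip('SUITE 100', ADDRESS_LINE2_LIMIT)
```

Values already within the limit come back unchanged, so the helper is safe to apply to every row rather than only the handful that overflow.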

new_attributes: new_attributes,
&parse_row
)
ensure

TIL! :)
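For anyone else new to `ensure`: the clause runs whether the body finishes normally or raises, and it can be attached directly to a method body without a `begin`. The diff context above does not show what the `ensure` in this PR actually cleans up, so the sketch below uses a hypothetical `with_cleanup` method purely to show the ordering.

```ruby
# Hypothetical method: records when each step runs.
def with_cleanup(log)
  log << :start
  yield
ensure
  # Runs on success and on failure, before any exception reaches the caller.
  log << :cleanup
end

log = []
begin
  with_cleanup(log) { raise 'import failed' }
rescue RuntimeError => e
  log << e.message
end

p log # [:start, :cleanup, "import failed"]
```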

@bencodeorg bencodeorg merged commit 48fd996 into staging Jan 13, 2021
@bencodeorg bencodeorg deleted the 2018-school-import branch January 13, 2021 00:09
@bencodeorg
Contributor Author

Production import:

Record with NCES ID 20018010080 and state school ID AK-05-050400 not unique, not added
Record with NCES ID 20018010162 and state school ID AK-05-051050 not unique, not added
Record with NCES ID 110003000511 and state school ID DC-001-420 not unique, not added
Record with NCES ID 120039010618 and state school ID FL-13-8131 not unique, not added
Record with NCES ID 120048008536 and state school ID FL-16-0581 not unique, not added
Record with NCES ID 120144008605 and state school ID FL-48-1852 not unique, not added
Record with NCES ID 120144008698 and state school ID FL-48-1851 not unique, not added
Record with NCES ID 120150008724 and state school ID FL-50-4030 not unique, not added
Record with NCES ID 120150008747 and state school ID FL-50-4031 not unique, not added
Record with NCES ID 170140306527 and state school ID IL-40-056-0340-26-4005603401001 not unique, not added
Record with NCES ID 190002710279 and state school ID IA-785510 000-785510 418 not unique, not added
Record with NCES ID 220033010320 and state school ID LA-010-010053 not unique, not added
Record with NCES ID 220102010339 and state school ID LA-032-032038 not unique, not added
Record with NCES ID 220105010412 and state school ID LA-033-033010 not unique, not added
Record with NCES ID 220129010374 and state school ID LA-040-040061 not unique, not added
Record with NCES ID 220165010400 and state school ID LA-052-052050 not unique, not added
Record with NCES ID 220165010403 and state school ID LA-052-052051 not unique, not added
Record with NCES ID 220168010414 and state school ID LA-053-053045 not unique, not added
Record with NCES ID 240012010500 and state school ID MD-03-031559 not unique, not added
Record with NCES ID 240027010501 and state school ID MD-08-080617 not unique, not added
Record with NCES ID 240048010502 and state school ID MD-15-150110 not unique, not added
Record with NCES ID 240048010503 and state school ID MD-15-150360 not unique, not added
Record with NCES ID 270032704983 and state school ID MN-526083-526083010 not unique, not added
Record with NCES ID 271833012683 and state school ID MN-012184-012184006 not unique, not added
Record with NCES ID 317802002371 and state school ID NE-160006000-160006005 not unique, not added
Record with NCES ID 330327610027 and state school ID NH-345-58934534523040 not unique, not added
Record with NCES ID 340076903036 and state school ID NJ-191376-050 not unique, not added
Record with NCES ID 340076903038 and state school ID NJ-191376-010 not unique, not added
Record with NCES ID 340076903048 and state school ID NJ-191376-030 not unique, not added
Record with NCES ID 341293003417 and state school ID NJ-234090-300 not unique, not added
Record with NCES ID 341629003449 and state school ID NJ-215210-302 not unique, not added
Record with NCES ID 350030001148 and state school ID NM-020-058 not unique, not added
Record with NCES ID 390441110729 and state school ID OH-044115-121954 not unique, not added
Record with NCES ID 390452010822 and state school ID OH-045203-064600 not unique, not added
Record with NCES ID 390503010724 and state school ID OH-050302-122887 not unique, not added
Record with NCES ID 400807010335 and state school ID OK-20-I099-20-I099-705 not unique, not added
Record with NCES ID 401059010484 and state school ID OK-55-I012-55-I012-155 not unique, not added
Record with NCES ID 401059010485 and state school ID OK-55-I012-55-I012-160 not unique, not added
Record with NCES ID 410192011348 and state school ID OR-00000000002243-00000000000000001270 not unique, not added
Record with NCES ID 462142010071 and state school ID SD-30003-02 not unique, not added
Record with NCES ID 468043010799 and state school ID SD-63003-02 not unique, not added
Record with NCES ID 480771017014 and state school ID TX-101902-101902042 not unique, not added
Record with NCES ID 480783015742 and state school ID TX-101903-101903006 not unique, not added
Record with NCES ID 480831016825 and state school ID TX-020902-020902003 not unique, not added
Record with NCES ID 480894022775 and state school ID TX-227901-227901179 not unique, not added
Record with NCES ID 480967017011 and state school ID TX-123910-123910048 not unique, not added
Record with NCES ID 481119016928 and state school ID TX-020905-020905115 not unique, not added
Record with NCES ID 481416016942 and state school ID TX-065901-065901041 not unique, not added
Record with NCES ID 481428017006 and state school ID TX-084910-084910121 not unique, not added
Record with NCES ID 481443016990 and state school ID TX-071901-071901104 not unique, not added
Record with NCES ID 481500017045 and state school ID TX-170902-170902010 not unique, not added
Record with NCES ID 481527017052 and state school ID TX-178904-178904056 not unique, not added
Record with NCES ID 481611016970 and state school ID TX-101907-101907133 not unique, not added
Record with NCES ID 481653016972 and state school ID TX-101908-101908044 not unique, not added
Record with NCES ID 481739017037 and state school ID TX-108902-108902045 not unique, not added
Record with NCES ID 481761017047 and state school ID TX-171901-171901002 not unique, not added
Record with NCES ID 481800016986 and state school ID TX-068901-068901005 not unique, not added
Record with NCES ID 481965016954 and state school ID TX-079907-079907131 not unique, not added
Record with NCES ID 481965016956 and state school ID TX-079907-079907133 not unique, not added
Record with NCES ID 481983017067 and state school ID TX-152907-152907002 not unique, not added
Record with NCES ID 482034016969 and state school ID TX-057909-057909147 not unique, not added
Record with NCES ID 482364016978 and state school ID TX-101912-101912080 not unique, not added
Record with NCES ID 482364016982 and state school ID TX-101912-101912097 not unique, not added
Record with NCES ID 482364017019 and state school ID TX-101912-101912071 not unique, not added
Record with NCES ID 482391017028 and state school ID TX-101913-101913005 not unique, not added
Record with NCES ID 482391017030 and state school ID TX-101913-101913118 not unique, not added
Record with NCES ID 482484012776 and state school ID TX-016901-016901101 not unique, not added
Record with NCES ID 482559017057 and state school ID TX-133903-133903106 not unique, not added
Record with NCES ID 482566015929 and state school ID TX-014906-014906126 not unique, not added
Record with NCES ID 482613017001 and state school ID TX-108912-108912044 not unique, not added
Record with NCES ID 482619017032 and state school ID TX-101916-101916009 not unique, not added
Record with NCES ID 482730016940 and state school ID TX-061902-061902047 not unique, not added
Record with NCES ID 482730016941 and state school ID TX-061902-061902123 not unique, not added
Record with NCES ID 482778017056 and state school ID TX-187907-187907101 not unique, not added
Record with NCES ID 482922017091 and state school ID TX-234905-234905101 not unique, not added
Record with NCES ID 482985016952 and state school ID TX-043907-043907043 not unique, not added
Record with NCES ID 483021017043 and state school ID TX-164901-164901041 not unique, not added
Record with NCES ID 483039016973 and state school ID TX-057914-057914656 not unique, not added
Record with NCES ID 483075017097 and state school ID TX-200902-200902002 not unique, not added
Record with NCES ID 483176017078 and state school ID TX-225902-225902105 not unique, not added
Record with NCES ID 483294015799 and state school ID TX-015910-015910054 not unique, not added
Record with NCES ID 483432017033 and state school ID TX-101917-101917007 not unique, not added
Record with NCES ID 483510016955 and state school ID TX-043910-043910130 not unique, not added
Record with NCES ID 483738016035 and state school ID TX-041902-041902002 not unique, not added
Record with NCES ID 483765017096 and state school ID TX-199901-199901106 not unique, not added
Record with NCES ID 483897017035 and state school ID TX-105902-105902043 not unique, not added
Record with NCES ID 483945016951 and state school ID TX-074911-074911101 not unique, not added
Record with NCES ID 483948016965 and state school ID TX-094902-094902002 not unique, not added
Record with NCES ID 484071016996 and state school ID TX-071909-071909003 not unique, not added
Record with NCES ID 484071016997 and state school ID TX-071909-071909114 not unique, not added
Record with NCES ID 484401017015 and state school ID TX-126908-126908041 not unique, not added
Record with NCES ID 484441017083 and state school ID TX-226906-226906002 not unique, not added
Record with NCES ID 540051001594 and state school ID WV-3300000-33235 not unique, not added
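The 93 lines above match the "93 duplicate schools skipped" count in the summary. A rough sketch of how such a skip-and-log step could work is below. It assumes the state school ID is the value that collides, which the log wording suggests but this excerpt does not confirm. `add_unique` and the `XX-...` records are made up for the example; only the message format is copied from the log.

```ruby
require 'set'

# Hypothetical helper: adds rows whose state_school_id is not already taken
# and collects a log line for each row it skips.
def add_unique(rows, taken_state_ids)
  added = []
  skipped = []
  rows.each do |row|
    if taken_state_ids.include?(row[:state_school_id])
      skipped << "Record with NCES ID #{row[:id]} and state school ID " \
                 "#{row[:state_school_id]} not unique, not added"
    else
      taken_state_ids << row[:state_school_id]
      added << row
    end
  end
  [added, skipped]
end

taken = Set.new(['XX-01-0001'])
rows = [
  { id: '990000000001', state_school_id: 'XX-01-0001' }, # already taken
  { id: '990000000002', state_school_id: 'XX-01-0002' }  # free
]
added, skipped = add_unique(rows, taken)
puts skipped
puts "#{added.size} new schools added."
```

Skipping and logging instead of raising lets the rest of the import finish, at the cost of needing someone to review the skipped records afterwards.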

School seeding: done processing /tmp/ccd_sch_029_1819_w_1a_091019.csv.20210114-19911-1i1f6ib.
2659 new schools added.
24871 schools updated.
74553 schools unchanged (school considered unchanged if only update was adding new columns included in this import).
93 duplicate schools skipped.
State CS offerings deleted: 1588, state CS offerings reloaded: 1588
Among updated schools, these attributes were updated:
school_category: 99424
last_known_school_year_open: 97454
address_line1: 12356
name: 7897
state_school_id: 3744
address_line2: 3560
zip: 2190
city: 1215
school_type: 225
address_line3: 16
state: 11

@tim-dot-org tim-dot-org mentioned this pull request Apr 27, 2021