Create a FERC Form 1 Super DB Schema #288

Closed
zaneselvans opened this issue May 28, 2019 · 1 comment

@zaneselvans
Member

Currently the multi-year FERC Form 1 DB that we build is based on a schema that's an awkward hybrid of two things: the mappings between unreadable DBF filenames and actual table names that we found online, which pertain to the 2015 data, and our parsing of the strings in the DBC file for whatever reference year we're trying to build a database for (typically the most recent data release year).

This means that there are some kinds of data that we could be missing -- if tables and/or columns existed in the past but don't exist now, we'll probably miss them when reading in earlier years of data. This doesn't appear to be common, but it could trip someone up.

The "right way" to fix this seems like it's probably to create a super-schema that contains all of the tables that have ever existed and all of the columns that have ever existed, by parsing the DBC files for all of the data years... and crossing our fingers that the data types didn't change over the years, or there's one data type that can be imposed on each column which all the data can be forced into. We would also need to figure out the historical mappings between database table names and F1_whatevs.DBF filenames.
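To make the super-schema idea concrete, here is a minimal sketch of the merge step. It assumes the per-year schemas have already been extracted from the DBC files into plain dicts of the form `{year: {table: {column: dtype}}}`; that structure, the function name `build_super_schema`, and the dtype strings are all hypothetical and not part of the existing codebase.

```python
from collections import defaultdict


def build_super_schema(schemas_by_year):
    """Union per-year schemas into one schema and report dtype conflicts.

    schemas_by_year is a hypothetical structure:
        {year: {table_name: {column_name: dtype_string}}}
    """
    super_schema = defaultdict(dict)
    conflicts = defaultdict(set)
    for year in sorted(schemas_by_year):
        for table, columns in schemas_by_year[year].items():
            for column, dtype in columns.items():
                # Keep the first dtype seen for each column.
                seen = super_schema[table].setdefault(column, dtype)
                if seen != dtype:
                    # Record every dtype observed for a conflicting column.
                    conflicts[(table, column)].update({seen, dtype})
    return dict(super_schema), dict(conflicts)
```

Any entry in `conflicts` is a column whose data type changed between years, which is exactly the case where a single common type would have to be chosen by hand.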

@zaneselvans zaneselvans added the ferc1 Anything having to do with FERC Form 1 label May 28, 2019
@zaneselvans zaneselvans added this to the future release milestone May 28, 2019
@zaneselvans zaneselvans added this to To do in FERC_Form_1 via automation May 28, 2019
@zaneselvans zaneselvans self-assigned this May 29, 2019
@zaneselvans
Member Author

I have verified that the 2015 & 2017 FERC Form 1 database schemas are identical, and that they include all the tables and fields which exist in all previous years of data going back as far as 1994, so the super-schema would be superfluous. An ETL test has been integrated (test/etl_test.py::test_ferc1_lost_data()) which verifies that this situation continues to be true. It also checks whether any new tables have been added in future years, relative to our 2015 mapping of DBF files to database table names, so we can update that if need be.
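For anyone wanting to reason about this kind of check, the following is a rough sketch of the idea and not the actual body of `test_ferc1_lost_data()`. It assumes schemas are available as plain dicts mapping table names to lists of column names; the function name `find_lost_data` and the input structures are hypothetical.

```python
def find_lost_data(ref_schema, schemas_by_year):
    """Find tables or columns present in some year but absent from the reference.

    ref_schema: {table_name: [column_name, ...]} for the reference year.
    schemas_by_year: {year: {table_name: [column_name, ...]}}.
    Returns {(year, table_name): set_of_missing_columns}.
    """
    lost = {}
    for year, schema in schemas_by_year.items():
        for table, columns in schema.items():
            # A table missing from the reference loses all of its columns.
            missing = set(columns) - set(ref_schema.get(table, ()))
            if missing or table not in ref_schema:
                lost[(year, table)] = missing
    return lost
```

An empty result means the reference schema covers every table and column seen in the other years, which is the condition described above.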

FERC_Form_1 automation moved this from To do to Done May 29, 2019
@zaneselvans zaneselvans modified the milestones: future release, 0.1.0 Jun 28, 2019