Create a FERC Form 1 Super DB Schema #288

zaneselvans · 2019-05-28T19:42:02Z

Currently the multi-annual FERC Form 1 DB that we build is based on a schema that's an awkward hybrid of the mappings between unreadable DBF filenames and actual table names that we found online, which pertains to the 2015 data, and a our parsing of the strings in the DBC file for whatever reference year we're trying to build a database for -- typically the most recent data release year.

This means that there are some kinds of data that we could be missing -- if tables and/or columns existed in the past but don't exist now, we'll probably miss them when reading in earlier years of data. This doesn't appear to be common, but it could trip someone up.

The "right way" to fix this seems like it's probably to create a super-schema that contains all of the tables that have ever existed and all of the columns that have ever existed, by parsing the DBC files for all of the data years... and crossing our fingers that the data types didn't change over the years, or there's one data type that can be imposed on each column which all the data can be forced into. We would also need to figure out the historical mappings between database table names and F1_whatevs.DBF filenames.

zaneselvans · 2019-05-29T21:01:15Z

I have verified that the 2015 & 2017 FERC Form 1 database schemas are identical, and that they include all the tables and fields which exist in all previous years of data going back as far as 1994, so the super-schema would super-fluous. An ETL test has been integrated (test/etl_test.py::test_ferc1_lost_data()) which verifies that this situation contiues to be true (and checks whether any new tables have been added in future years, relative to our 2015 mapping of DBF files to database table names, so we can update that if need be.

zaneselvans added the ferc1 Anything having to do with FERC Form 1 label May 28, 2019

zaneselvans added this to the future release milestone May 28, 2019

zaneselvans added this to To do in FERC_Form_1 via automation May 28, 2019

zaneselvans self-assigned this May 29, 2019

zaneselvans closed this as completed May 29, 2019

FERC_Form_1 automation moved this from To do to Done May 29, 2019

zaneselvans modified the milestones: future release, 0.1.0 Jun 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a FERC Form 1 Super DB Schema #288

Create a FERC Form 1 Super DB Schema #288

zaneselvans commented May 28, 2019

zaneselvans commented May 29, 2019

Create a FERC Form 1 Super DB Schema #288

Create a FERC Form 1 Super DB Schema #288

Comments

zaneselvans commented May 28, 2019

zaneselvans commented May 29, 2019