Address issue where 861 ETL fails w/o all years of data #1671
update to `pudl.transform.eia861.balancing_authority` to address catalyst-cooperative#828 and allow the 861 ETL to run w/o all years of data
It looks like the same test is failing across the three ci-tests. I looked a little deeper and noticed that the beginning of 'Run PyTest with Tox' looks different on this PR than on #1656. Here it starts:
Whereas on #1656 it looks like this:
The missing API_KEY_EIA would explain the failure, though I'm not sure how to fix this issue or if I even can. On the ci-notify action, this PR is missing the
Hey @arengel, yes, the tests are failing here because, as an outside contributor, you don't have access to the repository's GitHub secrets. The existing tests only check whether all years can be processed together, so while this change seems fine and doesn't break the tests, we don't really know how the system will behave when only partial sets of years are processed. In general within PUDL we've been ensuring both that all the data can be processed and that just the most recent year of data can be processed (since we typically just process the most recent year in the integration tests). In the case of the EIA-861, though, we've only tested that all of the data can be processed. I pulled down this branch and experimented with a few different combinations of years, and unsurprisingly most combinations failed. Just doing 2020, or 2019-2020, or 2005-2020 all failed for reasons other than the problem fixed by this PR, sometimes in other tables. One could also restrict the list of tables that are being processed, but the BA table in particular draws BA IDs and state associations from across a large number of tables to construct the final table, so you won't necessarily get the same outputs if you don't process all of the tables. Is there a reason why you're trying to avoid processing all of the years of data? On my machine, doing all the EIA-861 tables for all the years takes ~3 minutes.
Thanks for taking a look at this @zaneselvans! And also for taking the time to test and explain the implications of what I was proposing; I definitely have a better understanding of how PUDL works now. To answer your question, I was looking for a way to centrally control the years that we get when pulling 861 data in a way similar to how setting Given that the goal here is convenience and not performance, we don’t want to affect the processing at all. We certainly don’t want the output to change based on the set of years we pull out or the set of tables processed. It’s now clear that even if using `Eia861Settings` to set the years worked (as in didn’t raise errors) it’s not what we’re looking for. Instead we’ll select the years we want from the data we get from I’ll leave this PR open in case this change adds useful flexibility for PUDL but we definitely don’t need it for the Hub.
Yeah I think the right thing to do is just select the years you want from the table after the whole thing is returned. However, this PR still seems like an improvement -- there's no reason that the BA function should totally crash and burn just because some of the fixes are outside the range of years being processed, and I've tested that it produces exactly the same output as the current code when all years are processed, so I'm inclined to go ahead and merge it, even if it doesn't satisfy your original desire.
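For readers following along, selecting years from the fully processed table, as suggested above, might look roughly like the sketch below. It is only an illustration on toy data: the `select_years` helper, the `ba_id` column, and the assumption that the table carries a datetime `report_date` column are all hypothetical here, not taken from this PR.

```python
import pandas as pd


def select_years(df, years, date_col="report_date"):
    """Keep only rows whose date column falls in one of the given years.

    Hypothetical helper: assumes `date_col` holds datetime values.
    """
    mask = df[date_col].dt.year.isin(years)
    return df[mask].reset_index(drop=True)


# Toy stand-in for a table produced by processing all years of data.
ba = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"]),
        "ba_id": [1, 2, 3],
    }
)

# Filter after the fact, so the processing itself is never affected.
recent = select_years(ba, [2019, 2020])
```

Because the filter runs on the finished table, the values in the surviving rows are the same as they would be with no filtering at all.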
Proposing a minor change to how balancing authority fixes are applied so that not all years of data are required to make the fixes work. This should address #828.
In the existing process, when I try to do something like this:
I get an error in these lines of code because there are values in `BA_ID_NAME_FIXES.index` that are not in `df.index`.
pudl/src/pudl/transform/eia861.py
Lines 859 to 861 in 6e71bfa
I played around a little bit and I think replacing that with the merge I suggest here addresses the problem and all the tests appear to pass when I run them locally.
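To illustrate the general idea (this is not the actual diff), here is a minimal sketch on toy data of applying fixes through a left merge, so that fixes whose keys are absent from the data are skipped instead of raising an error. The column names `report_year`, `ba_id`, `ba_name`, and `ba_name_fix` are made up for the example and are not the real PUDL columns.

```python
import pandas as pd

# Hypothetical fixes table: one row per (year, BA ID) whose name needs patching.
fixes = pd.DataFrame(
    {
        "report_year": [2005, 2019, 2020],
        "ba_id": [101, 102, 103],
        "ba_name_fix": ["Fixed A", "Fixed B", "Fixed C"],
    }
)

# Toy data covering only 2019-2020, so the 2005 fix has no matching row.
df = pd.DataFrame(
    {
        "report_year": [2019, 2020, 2020],
        "ba_id": [102, 103, 104],
        "ba_name": ["Old B", "Old C", "Keep D"],
    }
)

keys = ["report_year", "ba_id"]

# A left merge keeps every row of df and attaches a fix only where the keys
# match. Fixes for years that are not present are simply not used.
merged = df.merge(fixes, on=keys, how="left")

# Prefer the fixed name where one exists, otherwise keep the original name.
merged["ba_name"] = merged["ba_name_fix"].fillna(merged["ba_name"])
merged = merged.drop(columns="ba_name_fix")
```

With this pattern the number of rows in the data is unchanged, and rows without a matching fix keep their original values.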
Given the small size of the change, I didn't think any additional documentation beyond the brief comment was needed but happy to add if that would be helpful.