Demand mapping #717

Merged: zaneselvans merged 25 commits into sprint22 on Aug 27, 2020

Conversation

yashkumar1803
Contributor

Added new visualization functions, which can be analyzed in the electricity-demand-mapping repo

@codecov

codecov bot commented Aug 7, 2020

Codecov Report

Merging #717 into sprint22 will decrease coverage by 3.17%.
The diff coverage is 6.73%.

@@             Coverage Diff              @@
##           sprint22     #717      +/-   ##
============================================
- Coverage     74.39%   71.22%   -3.17%     
============================================
  Files            39       39              
  Lines          4639     4819     +180     
============================================
- Hits           3451     3432      -19     
- Misses         1188     1387     +199     
| Impacted Files | Coverage Δ |
|---|---|
| src/pudl/analysis/service_territory.py | 21.60% <0.00%> (-0.35%) ⬇️ |
| src/pudl/helpers.py | 87.02% <ø> (ø) |
| src/pudl/output/ferc714.py | 17.74% <ø> (ø) |
| src/pudl/transform/eia861.py | 96.57% <ø> (ø) |
| src/pudl/analysis/demand_mapping.py | 9.52% <6.80%> (-2.21%) ⬇️ |
| src/pudl/workspace/datastore.py | 43.02% <0.00%> (-17.88%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 200dab7...e5bab5f.

@zaneselvans
Member

Hey @yashkumar1803 it's not technically part of this PR, but I do want to check in about a few things in the most recent version of your notebook, over in the demand mapping repo.

  • Is there a reason that you're pulling the tract level layer of the census geometries, and then dissolving them to the county level, rather than directly using the provided county layer of the census data?
  • There's a function integrated into PUDL for obtaining the Census DP1 and storing it locally:
    • pudl.analysis.service_territory.get_census2010_gdf() -- and you can set the desired layer to state, county, or tract.
  • The FERC 714 data doesn't need to be obtained independently. It is downloaded automatically when the FERC 714 ETL runs, which happens inside the PUDL output object (including inside the FERC 714 Respondents class). All access to the FERC 714 data should go through the PUDL output object; the interface ought to be pretty stable at this point. Because it runs the ETL, you also don't need to do the extract / transform steps anywhere in the notebook. So for example...
import pudl
import sqlalchemy as sa

# Set up the PUDL output object, which runs the FERC 714 ETL as needed:
pudl_settings = pudl.workspace.setup.get_defaults()
pudl_engine = sa.create_engine(pudl_settings['pudl_db'])
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine)

# Get the county level Census geometries / data
# (layer is one of "state", "county", or "tract", as noted above):
county_gdf = pudl.analysis.service_territory.get_census2010_gdf(pudl_settings, layer="county")

# Map FERC 714 respondents to the counties in their service territories:
ferc714_out = pudl.output.ferc714.Respondents(pudl_out)
ba_county_map = ferc714_out.georef_counties()

# FERC 714 hourly demand data:
dhpa_ferc714 = pudl_out.demand_hourly_pa_ferc714()
  • pd.read_csv() is happy to take a pathlib.Path object as an argument -- you don't need to str() the path first.
  • To select records from a given year when you have a date field, you can use the .dt accessor on the datetime column, like this: ba_county_map_2010 = ba_county_map[ba_county_map.report_date.dt.year == 2010]
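
The last two tips can be tried out without a PUDL installation. The following is a minimal sketch on a made-up table: the file name, the column names other than report_date, and the values are all assumptions for illustration, not the real output of georef_counties().

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the respondent/county mapping table.
# Column names other than report_date are assumptions.
csv_text = (
    "report_date,respondent_id_ferc714,county_id_fips\n"
    "2009-01-01,101,08013\n"
    "2010-01-01,101,08013\n"
    "2010-01-01,102,08031\n"
    "2011-01-01,102,08031\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "ba_county_map.csv"
    csv_path.write_text(csv_text)
    # pd.read_csv() accepts the Path directly; no str() conversion needed.
    ba_county_map = pd.read_csv(
        csv_path,
        parse_dates=["report_date"],
        dtype={"county_id_fips": str},  # keep leading zeros in FIPS codes
    )

# A boolean mask built with the .dt accessor selects a single report year.
ba_county_map_2010 = ba_county_map[ba_county_map.report_date.dt.year == 2010]
print(ba_county_map_2010)
```

Reading the FIPS codes as strings is a design choice in this sketch, so that codes such as 08013 keep their leading zero.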

zaneselvans changed the base branch from sprint20 to sprint21 on August 10, 2020 23:29
Member

zaneselvans left a comment

I didn't get to everything, but there's a lot here already. We should probably merge it, but keep working on the module as a whole in smaller chunks, one function at a time.

src/pudl/analysis/demand_mapping.py (outdated)
['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])


def uncovered_area_mismatch(disagg_geom, total_geom, title="Area Coverage (By Planning Area)"):
Member

In the notebook output for this map, the PJM territory seems vastly overrepresented, since there are several FERC 714 respondent IDs that have been assigned to the PJM EIA ID. However, only one of them has any demand associated with it. Would it make sense to exclude any area that doesn't have any reported demand at all in the year in question from consideration? Probably sometime early on -- the annualized() version of the FERC 714 respondents output includes the annual sum of each respondent's reported demand.
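
One way the suggested filter could look, as a rough sketch only: the table below is a made-up stand-in for the annualized respondents output, and the column names (respondent_id_ferc714, eia_code, demand_annual_mwh) and the helper name are assumptions rather than the confirmed PUDL schema.

```python
import pandas as pd

# Hypothetical annual summary: three respondent IDs share one EIA code,
# but only one of them reports any demand.
annual = pd.DataFrame({
    "respondent_id_ferc714": [1, 2, 3, 4],
    "eia_code": [14725, 14725, 14725, 5416],
    "report_date": pd.to_datetime(["2010-01-01"] * 4),
    "demand_annual_mwh": [0.0, 7.0e8, float("nan"), 2.5e7],
})


def respondents_with_demand(annual_df, year, demand_col="demand_annual_mwh"):
    """Return IDs of respondents reporting nonzero demand in the given year."""
    in_year = annual_df.report_date.dt.year == year
    # Treat missing demand the same as zero demand.
    has_demand = annual_df[demand_col].fillna(0) > 0
    return annual_df.loc[in_year & has_demand, "respondent_id_ferc714"]


keep_ids = respondents_with_demand(annual, 2010)
print(list(keep_ids))
```

The resulting IDs could then be used to subset the respondent geometries before any area coverage is computed.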

src/pudl/analysis/demand_mapping.py (outdated)

def error_heatmap(alloc_df, actual_df, demand_columns, region_col="pca", error_metric="r2", leap_exception=False):
"""
Create heatmap of 365X24 dimension to visualize the annual hourly error.
Member

This function does a very particular thing, and I think it might be better separated into a few smaller functions that are more reusable -- like one which, given a year of hourly data (a DatetimeIndex + a data column), makes a heatmap, and another separate one which takes 2 hourly demand allocations and calculates the delta between them, the output of which can be fed into the plotting function. That way we can make other 365 x 24 heatmaps to show other kinds of variables that make sense that way too.

Visually, can we scale it down so it fits on one screen? I think the Y-axis can also be simplified to have day-of-year (integer) as the labels. Could label every 7 days to highlight the weekly pattern? Could just have the first letters of the months JFMAMJJASOND. Allowing the UTC time to be localized for display so that the zero hour is local zero would also be good, since it would make the familiar diurnal pattern clearer, and allow uniform visual comparison between different plots of this type.
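
A minimal sketch of the split described above, using pandas only. The function names hourly_delta() and to_day_hour_matrix() are hypothetical, and the input is synthetic; this is one possible decomposition, not the module's actual implementation.

```python
import numpy as np
import pandas as pd


def hourly_delta(alloc, actual):
    """Difference between two hourly series, aligned on their shared timestamps."""
    alloc, actual = alloc.align(actual, join="inner")
    return alloc - actual


def to_day_hour_matrix(hourly, tz=None):
    """Reshape hourly data into a day-of-year x hour-of-day table.

    If tz is given, the (timezone-aware) index is converted first so that
    hour zero is local midnight.
    """
    if tz is not None:
        hourly = hourly.tz_convert(tz)
    frame = pd.DataFrame({
        "day": np.asarray(hourly.index.dayofyear),
        "hour": np.asarray(hourly.index.hour),
        "value": hourly.to_numpy(),
    })
    # Averaging handles the repeated local hour at the end of daylight saving time.
    return frame.pivot_table(index="day", columns="hour", values="value", aggfunc="mean")


# Synthetic year of hourly data in UTC (2010 is not a leap year).
idx = pd.date_range("2010-01-01", periods=8760, freq=pd.Timedelta(hours=1), tz="UTC")
alloc = pd.Series(np.arange(8760, dtype=float), index=idx)
actual = alloc - 1.0

matrix = to_day_hour_matrix(hourly_delta(alloc, actual))
print(matrix.shape)
```

The resulting table can be handed to any plotting routine that draws a 2D array, and the same reshaping works for variables other than error. Passing a tz name to to_day_hour_matrix() is how the local-midnight alignment mentioned above could be handled.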

Added `pudl.analysis.demand_mapping.sales_ratio_by_class_fips()` function
which uses the EIA 861 Sales and Service Territory tables to estimate
the breakdown of electricity sales to residential, commercial, industrial,
and transportation customers in each year and county.

Closes #720
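
To illustrate the kind of output such a function might produce, here is a rough sketch on made-up data. It assumes, as a simplification, that each utility's sales are attributed in full to every county it serves; the actual allocation logic in sales_ratio_by_class_fips() may differ, and all table and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the EIA 861 sales table.
sales = pd.DataFrame({
    "report_year": [2018] * 5,
    "utility_id_eia": [1, 1, 1, 2, 2],
    "customer_class": ["residential", "commercial", "industrial",
                       "residential", "commercial"],
    "sales_mwh": [60.0, 30.0, 10.0, 20.0, 80.0],
})

# Hypothetical stand-in for the EIA 861 service territory table.
territory = pd.DataFrame({
    "report_year": [2018, 2018, 2018],
    "utility_id_eia": [1, 2, 2],
    "county_id_fips": ["08013", "08013", "08031"],
})


def sales_ratio_sketch(sales_df, territory_df):
    """Share of sales by customer class for each year and county."""
    merged = territory_df.merge(sales_df, on=["report_year", "utility_id_eia"])
    by_class = merged.pivot_table(
        index=["report_year", "county_id_fips"],
        columns="customer_class",
        values="sales_mwh",
        aggfunc="sum",
        fill_value=0.0,
    )
    # Normalize each row so the class shares sum to 1.
    return by_class.div(by_class.sum(axis=1), axis=0)


ratios = sales_ratio_sketch(sales, territory)
print(ratios)
```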
zaneselvans linked an issue on Aug 20, 2020 that may be closed by this pull request
yashkumar1803 and others added 4 commits August 21, 2020 18:22
* Along with the county geometries, bring in county area and population in the
  `pudl.analysis.service_territory.add_geometry()` function. This means
  calculating the true areas of the counties in a projected (equal area)
  coordinate system.
* When service territory geometries are being dissolved, sum the areas and
  populations similarly to keep them self-consistent.
* In the FERC 714 territory demand summary method, also make sure that the
  population and area are available, and calculate some informative ratios
  (population density, demand per unit area, and demand per capita) for use in
  identifying bad service territory geometries.
* Add the mccabe Flake8 plugin to the pudl-dev environment to calculate function complexity.

Progress toward #716
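
The ratios mentioned in this commit message can be sketched with plain pandas. The tables, column names, and values below are assumptions for illustration; in the real code the areas would come from geometries reprojected to an equal-area coordinate system (for example with geopandas `to_crs()` followed by `.area`), which is not shown here.

```python
import pandas as pd

# Hypothetical county records, already assigned to respondents.
counties = pd.DataFrame({
    "respondent_id_ferc714": [1, 1, 2],
    "county_id_fips": ["08013", "08031", "08059"],
    "area_km2": [1000.0, 3000.0, 500.0],
    "population": [50000, 150000, 1000],
})

# Hypothetical annual demand per respondent.
demand = pd.DataFrame({
    "respondent_id_ferc714": [1, 2],
    "demand_annual_mwh": [2.0e6, 5.0e6],
})

# When counties are dissolved into one territory, area and population are summed
# so that they stay consistent with the merged geometry.
territory = counties.groupby("respondent_id_ferc714", as_index=False)[
    ["area_km2", "population"]
].sum()

summary = territory.merge(demand, on="respondent_id_ferc714")
summary["population_density_km2"] = summary.population / summary.area_km2
summary["demand_density_mwh_km2"] = summary.demand_annual_mwh / summary.area_km2
summary["demand_annual_per_capita_mwh"] = summary.demand_annual_mwh / summary.population
print(summary)
```

In this made-up example the second respondent shows 5000 MWh per capita, the sort of implausible value that would point to a bad service territory geometry.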
zaneselvans changed the base branch from sprint21 to sprint22 on August 27, 2020 14:42
zaneselvans merged commit bbf3907 into sprint22 on Aug 27, 2020
Successfully merging this pull request may close these issues.

Calculate metric of customer class electricity sales splits
2 participants