Notebook land: intro notebooks for CEMS and output tables #823

Merged: 35 commits into sprint27 on Dec 11, 2020

Conversation

aesharpe (Member):

This branch contains the work-in-progress notebooks related to RMI work and dataset tours.

CEMS_by_utility.ipynb is what I used to aggregate CEMS data by utility for RMI. For now, you can ignore this (it was just a tool for me).

explore-CEMS.ipynb is the CEMS intro notebook, woo! This one is close to done (with a few commented-out map portions re: Hawaii and Alaska that I wasn't sure how to adapt to the new map).

explore-output-tables.ipynb is the beginnings of an intro-to-output-tables notebook. This one still needs lots of work / direction / correction. Should I give examples of all the tables, or should it just serve as a hub of information about the tables (either permanently, or until we get metadata for the output tables)?

aesharpe changed the title from "Notebook land" to "Notebook land: intro notebooks for CEMS and output tables" on Nov 12, 2020
codecov bot commented on Nov 12, 2020:

Codecov Report

Merging #823 (ed5266e) into sprint27 (ce1ea8f) will increase coverage by 0.34%.
The diff coverage is 20.00%.


@@             Coverage Diff              @@
##           sprint27     #823      +/-   ##
============================================
+ Coverage     70.31%   70.66%   +0.34%     
============================================
  Files            39       40       +1     
  Lines          4871     4887      +16     
============================================
+ Hits           3425     3453      +28     
+ Misses         1446     1434      -12     
Impacted Files                            Coverage Δ
src/pudl/analysis/service_territory.py   21.77% <0.00%> (ø)
src/pudl/output/eia860.py                100.00% <ø> (ø)
src/pudl/output/pudltabl.py              58.25% <0.00%> (ø)
src/pudl/output/epacems.py               25.00% <25.00%> (ø)
src/pudl/workspace/datastore.py          61.05% <0.00%> (+12.63%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Review comment on the notebook (Member):

Let's see if we can use the automatically generated Jupyter table of contents instead of trying to maintain one by hand, which may get frustrating with edits to the notebook over time. It's currently a plugin, but it will be integrated into the JupyterLab core in 3.0 (which is coming out momentarily).



Reply (Member Author):

oooh yes! that would make a big difference

Reply (Member):

If you don't have it installed already, the plugin is called jupyterlab-toc (you can install it directly from the plugin manager on the left-hand side of JupyterLab -- it looks like a puzzle piece).

Review comment on the notebook (Member):

geoplot isn't part of the pudl-dev environment, and doesn't seem to be getting used below. Do we need it? It caused this cell to fail for me.



Reply (Member Author):

Oh yeah, that was left over from some other maps that didn't make the cut.

Review comment on the notebook (Member):

I think we will always want users to access only the root of the parquet dataset -- Dask and other tools that know how to work with Parquet can handle the hierarchy / partitioning efficiently. See my edits to the notebook.


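As a minimal sketch, assuming a dataset root at parquet/epacems and a few illustrative EPA CEMS column names (not the notebook's actual code), reading from the root looks like:

    import dask.dataframe as dd

    # Point Dask at the dataset root rather than individual files; it
    # discovers the partition hierarchy itself and reads lazily.
    epacems = dd.read_parquet(
        "parquet/epacems",  # hypothetical dataset root
        columns=["plant_id_eia", "operating_datetime_utc", "gross_load_mw"],
    )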

Review comment on the notebook (Member):

This is a little bit dangerous -- it'll trigger the full computation of any tasks that exist in the dataframe. In this case it works okay because you've only just read it in, and the number of records is stored in the parquet file metadata, but in general this could be very computationally intensive.


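To illustrate the difference (variable names carried over from the sketch above):

    # len() on a Dask dataframe executes the entire task graph:
    n_rows = len(epacems)

    # A common alternative is to count rows per partition and sum the
    # counts -- still a computation, but it never materializes the
    # whole dataframe in memory.
    n_rows = epacems.map_partitions(len).compute().sum()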

Reply (Member Author):

hmmm good to know.

Review comment on the notebook (Member):

You can also do dd.dtypes to get a list of both the column names and their data types, which is a bit more informative.


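For example, with the lazy dataframe from the earlier sketch (.dtypes only reads metadata, so it triggers no computation):

    epacems.columns  # column names only
    epacems.dtypes   # column names plus their data types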

Reply (Member Author):

better, agreed.

Review comment on the notebook (Member):

This kind of iteration and aggregation is what Dask is designed to do automatically. If you point it at the top level of the dataset, it will do all this work for you seamlessly under the hood.


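A sketch of that pattern, with illustrative column names (the state and CO2 mass columns are assumptions about the CEMS schema):

    import dask.dataframe as dd

    epacems = dd.read_parquet("parquet/epacems")  # hypothetical root
    # One lazy groupby over the whole dataset replaces the manual
    # per-file loop; .compute() is the only eager step.
    co2_by_state = epacems.groupby("state")["co2_mass_tons"].sum().compute()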

Review comment on the notebook (Member):

Man, is Texas really that bad? This seems like a crazy outlier. Might there be some bad data somewhere? It might be more illustrative to plot the GHG emissions per capita on a state-by-state basis, since you've got the census data handy already.


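A hypothetical sketch of that per-capita view, reusing co2_by_state from the sketch above (the census dataframe and its column names are assumptions):

    # Divide each state's CO2 total by its census population.
    population = census_df.set_index("state")["population"]
    co2_per_capita = (co2_by_state / population).sort_values(ascending=False)
    co2_per_capita.head(10)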

Review comment on the notebook (Member):

Hmm yeah, these shapefiles are a little janky, aren't they, since they include coastal waters, not just the land boundaries. Let's use contextily to get a nice basemap to display this info.


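Roughly along these lines, assuming a GeoDataFrame of state geometries called states_gdf (a sketch, not the notebook's code):

    import contextily as ctx

    # Reproject to Web Mercator so the polygons line up with the map
    # tiles, then draw the basemap underneath them.
    ax = states_gdf.to_crs(epsg=3857).plot(alpha=0.5, edgecolor="k", figsize=(12, 8))
    ctx.add_basemap(ax)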

Review comment on the notebook (Member):

For time-series based aggregation, take a look at the pandas.Grouper documentation -- there's a bunch of time-specific functionality that knows how to work with datetime columns directly at whatever frequency you're interested in.


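For instance, a monthly aggregation might look like this (the dataframe and column names are assumptions):

    import pandas as pd

    # Group on the datetime column at monthly frequency, and by state.
    monthly_load = (
        cems_df.groupby([pd.Grouper(key="operating_datetime_utc", freq="MS"), "state"])
        ["gross_load_mw"]
        .sum()
    )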

Review comment on the notebook (Member):

I think we could make a really cool comparative seasonal load chart here with seaborn -- with a grid of different states and a statistical envelope showing the monthly load for each state across all of the years of data.


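A rough sketch of that figure (the tidy dataframe and its columns are hypothetical; seaborn computes the envelope from the repeated yearly observations):

    import seaborn as sns

    # One panel per state; ci="sd" draws a standard-deviation band
    # around the mean monthly load across years.
    g = sns.relplot(
        data=monthly_load_df,  # columns: state, month, year, gross_load_mw
        x="month", y="gross_load_mw",
        col="state", col_wrap=4,
        kind="line", ci="sd",
    )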

Reply (Member Author):

ooo that would be cool!

zaneselvans merged commit 3c72fc2 into sprint27 on Dec 11, 2020