Notebook land: intro notebooks for CEMS and output tables #823
…o generators where applicable.
Codecov Report
@@            Coverage Diff             @@
##           sprint27     #823    +/-   ##
============================================
+ Coverage     70.31%   70.66%   +0.34%
============================================
  Files            39       40       +1
  Lines          4871     4887      +16
============================================
+ Hits           3425     3453      +28
+ Misses         1446     1434      -12
notebooks/examples/CEMS.ipynb
@@ -0,0 +1,1130 @@
{
Let's see if we can use the automatically generated Jupyter table of contents instead of trying to maintain one by hand, which may get frustrating with edits to the notebook over time. It's currently a plugin, but will be integrated into the JupyterLab core in 3.0 (which is coming out momentarily).
oooh yes! that would make a big difference
If you don't have it installed already, the plugin is called jupyterlab-toc (you can install it directly from the plugin manager on the left-hand side of JupyterLab -- it looks like a puzzle piece).
notebooks/examples/CEMS.ipynb
geoplot isn't part of the pudl-dev environment, and doesn't seem to be getting used below. Do we need it? It caused this cell to fail for me.
Oh yeah, that was left over from some other maps that didn't make the cut.
notebooks/examples/CEMS.ipynb
I think we will always want users to access only the root of the parquet dataset -- Dask and other tools that know how to work with Parquet understand how to work with the hierarchy / partitioning efficiently. See my edits to the notebook.
notebooks/examples/CEMS.ipynb
This is a little bit dangerous -- it'll trigger the full computation of any tasks that exist in the dataframe. In this case it works okay because you've only just read it in, and the number of records is stored in the parquet file metadata, but in general this could be very computationally intensive.
hmmm good to know.
notebooks/examples/CEMS.ipynb
You can also do dd.dtypes to get a list of both the column names and their data types, which is a bit more informative.
better, agreed.
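For comparison, a quick sketch of the two attributes on a toy pandas frame with hypothetical column names. A Dask dataframe exposes the same .columns and .dtypes attributes, which it reads from metadata.

```python
import pandas as pd

# Hypothetical columns, loosely modeled on hourly emissions records.
df = pd.DataFrame({
    "operating_datetime_utc": pd.to_datetime(["2018-01-01 00:00", "2018-01-01 01:00"]),
    "plant_id": [3, 3],
    "gross_load_mw": [410.5, 398.0],
})

# .columns gives only the names.
names_only = list(df.columns)

# .dtypes gives a Series mapping each column name to its data type.
dtypes = df.dtypes
print(dtypes)
```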
notebooks/examples/CEMS.ipynb
This kind of iteration and aggregation is what Dask is designed to do automatically. If you point it at the top level of the dataset, it will do all this work for you seamlessly under the hood.
notebooks/examples/CEMS.ipynb
Man, is Texas really that bad? This seems like a crazy outlier. Might there be some bad data somewhere? It might be more illustrative to plot the GHG emissions per capita on a state by state basis, since you've got the census data handy already.
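A per-capita version could look roughly like the sketch below. Every number here is invented and the state codes are deliberately fake, so nothing in it says anything about real emissions or populations; the column names are also assumptions.

```python
import pandas as pd

# Invented totals for three fake states.
emissions = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "co2_tons": [200.0, 40.0, 50.0],
})
population = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "population": [25_000_000, 500_000, 40_000_000],
})

# validate= raises if either table has duplicate states, which would
# silently inflate the totals after the merge.
per_capita = emissions.merge(population, on="state", validate="one_to_one")
per_capita["co2_tons_per_capita"] = per_capita["co2_tons"] / per_capita["population"]

ranked = per_capita.sort_values("co2_tons_per_capita", ascending=False)
top_state = ranked.iloc[0]["state"]
```

In this toy case the state with the largest total ("AA") is not the one with the largest per-capita value ("BB"), which is the kind of shift normalizing by population can reveal.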
notebooks/examples/CEMS.ipynb
Hmm yeah, these shapefiles are a little janky aren't they, since they include coastal waters not just the land boundaries. Let's use contextily to get a nice basemap to display this info.
notebooks/examples/CEMS.ipynb
For time-series based aggregation, take a look at the pandas.Grouper documentation -- there's a bunch of time-specific functionality that knows how to work with datetime columns directly at whatever frequency you're interested in.
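A minimal sketch of what that can look like, using a toy frame with hypothetical column names. pd.Grouper with key= and freq= bins a datetime column directly, here by calendar month ("MS" is month start), and it can be combined with ordinary grouping columns.

```python
import pandas as pd

# Four hourly timestamps straddling a month boundary, repeated for two states.
timestamps = pd.to_datetime([
    "2018-01-31 22:00", "2018-01-31 23:00",
    "2018-02-01 00:00", "2018-02-01 01:00",
])
hourly = pd.DataFrame({
    "operating_datetime_utc": list(timestamps) * 2,
    "state": ["CO"] * 4 + ["TX"] * 4,
    "gross_load_mw": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Group by state and by calendar month of the datetime column.
monthly = hourly.groupby(
    ["state", pd.Grouper(key="operating_datetime_utc", freq="MS")]
)["gross_load_mw"].sum()
```

Changing freq to another offset alias (for example "D" for daily) changes the bin size without touching the rest of the expression.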
notebooks/examples/CEMS.ipynb
I think we could make a really cool comparative seasonal load chart here with seaborn -- with a grid of different states and a statistical envelope showing the monthly load for each state across all of the years of data.
ooo that would be cool!
…into notebook-land
This branch contains the work-in-progress notebooks related to RMI work and dataset tours.
CEMS_by_utility.ipynb is what I used to aggregate CEMS data by utility for RMI. For now, you can ignore this (it was just a tool for me).
explore-CEMS.ipynb is the CEMS intro notebook, woo! This one is close to done (with a few commented-out map portions re: Hawaii and Alaska that I wasn't sure how to adapt to the new map).
explore-output-tables.ipynb is the beginnings of an intro to output tables notebook. This one still needs lots of work / direction / correction. Should I give examples of all the tables, or should it just serve as a hub of information about the tables (either permanently or until we get metadata for the output tables)?