catalyst-cooperative/pudl-examples

PUDL Example Notebooks

This repository contains a collection of Jupyter notebooks with examples of how to use the data and software distributed by Catalyst Cooperative's Public Utility Data Liberation (PUDL) project.

Run PUDL Notebooks on Kaggle

The easiest way to get up and running with these examples and a fresh copy of all the PUDL data is on Kaggle.

Kaggle offers substantial free computing resources and convenient data storage, so you can start playing with the PUDL data without needing to set up any software or download any data.

You'll find the PUDL data dictionary helpful for interpreting the data.

Running Jupyter locally

If you're already familiar with git, Python environments, filesystem paths, and running Jupyter notebooks, you can also work with these notebooks and the PUDL data on your own machine:

  • Create a Python environment that includes common data science packages. We like to use the mamba package manager and the conda-forge channel.
  • Clone this repository.
  • Download the PUDL dataset from Kaggle (it's ~8GB!) and unzip it somewhere conveniently accessible from the notebooks in the cloned repo.
  • Start your JupyterLab or Jupyter Notebook server and navigate to the notebooks in the cloned repo.
  • You'll need to adjust the file paths in the notebooks to point at the directory where you put the PUDL data, and might need to adjust the packages installed in your Python environment to work with the notebooks.
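
One way to keep the path adjustments in a single place is to resolve the data directory once at the top of a notebook. The following is a minimal sketch, not something these notebooks ship with: the `PUDL_DATA_DIR` environment variable, the `~/pudl-data` fallback, and both helper names are hypothetical, and the file suffixes are an assumption about what the unzipped download contains.

```python
import os
from pathlib import Path


def find_pudl_data_dir(default="~/pudl-data"):
    """Return the directory where the PUDL data was unzipped.

    Reads the (hypothetical) PUDL_DATA_DIR environment variable and
    falls back to `default` when the variable is not set.
    """
    raw = os.environ.get("PUDL_DATA_DIR", default)
    return Path(raw).expanduser().resolve()


def list_data_files(data_dir, suffixes=(".parquet", ".sqlite")):
    """List data files under `data_dir`, searching subdirectories too.

    The default suffixes are an assumption; change them to match the
    files you actually find in your download.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"PUDL data directory not found: {data_dir}")
    return sorted(
        p for p in data_dir.rglob("*") if p.is_file() and p.suffix in suffixes
    )
```

In a notebook you could then call `list_data_files(find_pudl_data_dir())` and pass the resulting paths to whatever reader the notebook uses, instead of editing each hard-coded path by hand.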

Other Data Access Methods

See the PUDL documentation for other data access methods.

If you're familiar with cloud services, you can check out:

  • The AWS Open Data Registry: s3://pudl.catalyst.coop (free access)
  • Google Cloud Storage: gs://pudl.catalyst.coop (requester pays)
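
As a rough illustration of how the two bucket addresses above might be used from Python, the sketch below builds object URIs and reads one Parquet file from the free S3 bucket. Only the bucket names come from this README: the helper names are hypothetical, the object key is a placeholder you need to replace with a real path from the PUDL documentation, and reading `s3://` URLs with pandas assumes the optional `s3fs` package is installed.

```python
# Bucket addresses as listed in this README.
BUCKETS = {
    "aws": "s3://pudl.catalyst.coop",  # free access
    "gcs": "gs://pudl.catalyst.coop",  # requester pays
}


def pudl_uri(provider, key):
    """Join a bucket address and an object key into a full URI.

    `key` is the path of an object inside the bucket. This README does
    not describe the bucket layout, so look up real keys in the PUDL
    documentation.
    """
    if provider not in BUCKETS:
        raise ValueError(f"unknown provider {provider!r}; use one of {sorted(BUCKETS)}")
    return f"{BUCKETS[provider]}/{key.lstrip('/')}"


def read_parquet_from_s3(key):
    """Read one Parquet object from the S3 bucket without credentials.

    Assumes pandas and s3fs are installed. The import is done here so
    that building URIs works even without pandas.
    """
    import pandas as pd

    return pd.read_parquet(pudl_uri("aws", key), storage_options={"anon": True})
```

For example, `pudl_uri("aws", "some/table.parquet")` returns `s3://pudl.catalyst.coop/some/table.parquet`, where `some/table.parquet` is a made-up key. The Google Cloud Storage bucket is requester pays, so reading from it needs a billing project configured in your own Google Cloud credentials.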

Stalk us on the Internet