
Catalyst Cooperative

Catalyst is a small data analysis cooperative working on electricity regulation and climate change.

Catalyst Cooperative is a data engineering and analysis consultancy, specializing in energy system and utility financial data. Our current focus is on the US electricity and natural gas sectors. We primarily serve non-profit organizations, academic researchers, journalists, climate policy advocates, public policymakers, and occasionally smaller business users.

We believe public data should be freely available and easy to use for those working in the public interest. Whenever possible, we release our software under the MIT License and our data products under the Creative Commons Attribution 4.0 License.

If you're interested in hiring us, email hello@catalyst.coop. Our current rate is $150/hr. We can often make accommodations for smaller or grassroots organizations, and we frequently collaborate with open source contributors.

Contact Us 💌

Services We Provide

  • Programmatic acquisition, cleaning, and integration of public data sources.
  • Data-oriented software development.
  • Compilation of new machine-readable data sources from regulatory filings, legislation, and other public information.
  • Data warehousing and dashboard development.
  • Both ad-hoc and replicable production data analysis.
  • Translation of existing ad-hoc data wrangling workflows into replicable data pipelines written in Python.

Tools We Use 🔨 🔧

  • Python is our primary language for everything.
  • Pandas, the Swiss Army knife of tabular data manipulation in Python.
  • Dask to scale up data wrangling tasks we do with Pandas beyond what can be done in memory.
  • Dagster for orchestrating and parallelizing our data pipelines.
  • SQLite for local storage and distribution of tabular, relational data.
  • Apache Parquet to persist larger data tables to disk.
  • JupyterLab for interactive data wrangling, exploration, and visualizations.
  • Pydantic for managing and validating settings and our collection of metadata.
  • Scikit Learn for entity matching between datasets and imputation of missing data.
  • Intake Data Catalogs to distribute versioned data, like software, via conda-forge.
  • Google BigQuery to warehouse finished data products for live access.
  • Zenodo provides long-term, programmatically accessible, versioned archives of all our raw inputs.
  • Sphinx for building our documentation, incorporating much of our structured metadata directly using Jinja templates.
  • Tox and pytest manage our testing environment and define our unit, integration, and data validation tests.
  • The Frictionless Framework as a standard interchange model for tabular data.
  • Tableau for producing dashboards and interactive data visualizations for client projects.
  • VS Code is our primary code editor, ever more deeply integrated with GitHub.
  • pre-commit to enforce code formatting and style standards.
  • GitHub Actions to run our continuous integration and coordinate our nightly builds and data scraping jobs.
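
Several of these tools fit together in a common pattern: wrangle a table with Pandas, then persist it to SQLite for local, relational access. A minimal sketch of that pattern, using a hypothetical utilities table (the column names and values are invented for illustration, not actual PUDL data):

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned table, standing in for a processed utility dataset.
utilities = pd.DataFrame({
    "utility_id": [1, 2, 3],
    "state": ["CO", "AK", "TX"],
    "capacity_mw": [120.5, 45.0, 300.2],
})

# Persist the table to a local SQLite database for relational access.
with sqlite3.connect("utilities.sqlite") as conn:
    utilities.to_sql("utilities", conn, if_exists="replace", index=False)

# Read it back with an SQL query to confirm the round trip.
with sqlite3.connect("utilities.sqlite") as conn:
    roundtrip = pd.read_sql("SELECT * FROM utilities ORDER BY utility_id", conn)
```

For tables too large to fit comfortably in SQLite or in memory, the same dataframes can instead be written out columnar (e.g. to Parquet) and processed in chunks with Dask.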

Tools We're Studying 🚧

  • DuckDB as a performant, columnar, analysis-oriented alternative to SQLite.
  • dbt to manage pure-SQL data transformations where appropriate within our larger Python-based workflows.
  • Pandera to specify dataframe schemas and data validations in conjunction with Dagster.
  • Hypothesis for more robust data-oriented unit testing.
  • Apache Superset as an open source alternative to Tableau.
  • SQLModel to more easily unify our metadata and database schema definitions with SQLAlchemy.

Research Collaborators 🧠

Organizational Friends & Allies 💞

Past Funders & Clients 💰 💵

Business & Employment 🌲 🌲

Catalyst is a democratic workplace and a member of the US Federation of Worker Cooperatives. We exist to help our members earn a decent living while working for a more just, livable, and sustainable world. Our income comes from a mix of grant funding and client work. We only work with mission-aligned clients.

We are an entirely remote organization, and have been since well before the coronavirus pandemic. Our members are scattered all across North America from Alaska to Mexico. We enjoy a great deal of autonomy and flexibility in determining our own work-life balance and schedules. Membership entails working a minimum of 1000 hours each year for the co-op.

As a small 100% employee-owned cooperative, we are able to compensate members through an unusual mix of wages and profit sharing, including:

  • An hourly wage (currently $36.75/hr)
  • Tax-deferred employer retirement plan contributions (proportional to wages, up to 25% of wages)
  • Tax-advantaged patronage dividends (proportional to hours worked, unlimited but subject to profitability)
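
As a rough illustration of how the wage and retirement pieces combine (the 1,000 hours figure is the membership minimum mentioned below, used here only for illustration; actual hours, dividends, and profitability vary):

```python
hourly_wage = 36.75  # current hourly wage, in $/hr
hours = 1000         # annual hours worked (membership minimum, for illustration)

wages = hourly_wage * hours
# Employer retirement contributions are capped at 25% of wages.
max_retirement = 0.25 * wages

print(f"Wages: ${wages:,.2f}")                    # $36,750.00
print(f"Max retirement: ${max_retirement:,.2f}")  # $9,187.50
```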

We also reimburse ourselves for expenses related to maintaining a home office, and provide a monthly health insurance stipend.

Candidates must complete at least 500 hours of contract work for the cooperative over six months, at which point they will be considered for membership.

Check our website to see if we're recruiting new members.


Repositories

  • pudl (Public) — The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists. Python, MIT. Updated Nov 30, 2022.
  • ferc-xbrl-extractor (Public) — A tool for converting FERC filings published in XBRL into SQLite databases. Python, MIT. Updated Nov 29, 2022.
  • pudl-zenodo-storage (Public archive) — Tools for creating versioned archives of raw data on Zenodo using Frictionless data packages. Python, MIT. Updated Nov 29, 2022.
  • pudl-scrapers (Public archive) — Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis. HTML, MIT. Updated Nov 29, 2022.
  • staged-recipes (Public) — A place to submit conda recipes before they become fully fledged conda-forge feedstocks. Python, BSD-3-Clause. Updated Nov 29, 2022.
  • pudl-catalog (Public) — An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative. Python, MIT. Updated Nov 29, 2022.
  • rmi-ferc1-eia (Public) — A collaboration with RMI to integrate FERC Form 1 and EIA CapEx and OpEx reporting. Python, MIT. Updated Nov 29, 2022.
  • rmi-energy-communities (Public) — A partnership between Catalyst and RMI to identify energy communities as defined by the Inflation Reduction Act. Python, MIT. Updated Nov 29, 2022.
  • historical_weather (Public) — Analysis of historical weather data from NOAA's Global Summary of the Day (GSOD). Jupyter Notebook, MIT. Updated Nov 29, 2022.
  • pudl-usage-metrics (Public) — A Dagster ETL for collecting and cleaning PUDL usage metrics. Python, MIT. Updated Nov 29, 2022.