In the summer of 2019, I was a Data Science Intern with the Climate Program at the World Resources Institute (WRI), supporting Power Explorer, a project that seeks to provide open access to global data on power production and its impacts. This repository contains sample code from that work. The full codebase will be released as open source in the near future.
Python scripts that use web scraping techniques, with packages such as `bs4` and `re`, to automate data extraction, transformation, and integration into WRI's Global Power Plant Database. Scraped over 40,000 data points on CO2 emissions at the power plant level from multiple countries.
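As a rough illustration of the extract-and-transform step described above, the sketch below pulls rows out of an HTML table and uses `re` to turn scraped strings into numbers. It is not the code from the notebooks: the HTML fragment, the field names, and the helpers `TableTextParser`, `parse_emissions`, and `extract_records` are all hypothetical, and the standard library's `html.parser` stands in for `bs4` so the example runs without third-party packages.

```python
import re
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


# Matches numbers such as "1,234,567" or "89,012.5".
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_emissions(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def extract_records(html):
    """Turn two-column table rows into dictionaries (hypothetical schema)."""
    parser = TableTextParser()
    parser.feed(html)
    return [
        {"plant": name.strip(), "co2_tonnes": parse_emissions(value)}
        for name, value in parser.rows
    ]


# Made-up page fragment, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Plant</th><th>CO2 emissions</th></tr>
  <tr><td>Plant A</td><td>1,234,567 t CO2-e</td></tr>
  <tr><td>Plant B</td><td>89,012.5 t CO2-e</td></tr>
  <tr><td>Plant C</td><td>n/a</td></tr>
</table>
"""

records = extract_records(SAMPLE_HTML)
for record in records:
    print(record)
```

Missing values are kept as `None` rather than dropped, so a later integration step can decide how to handle plants without reported emissions.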
Sample Jupyter Notebooks included:
- Australia (`australia_dataset_parsing.ipynb`)
- European Union (`JRC_Power_Plants.ipynb`)
Countries scraped but not included in sample:
- United States
- Canada
- India
`co2_prediction_models.ipynb`: Machine learning algorithms to predict CO2 emissions for all thermal power plants worldwide using the data previously extracted. Using a min-max scaled dataset, achieved a coefficient of determination of 0.975 and mean squared error of 0.000171 on the test set.
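
To make the reported metrics easier to interpret, here is a minimal sketch of min-max scaling, mean squared error, and the coefficient of determination in plain Python. The numbers are made up and the helper names are hypothetical; this is not the modelling code from the notebook. It illustrates one point: when targets and predictions are rescaled with the same min-max transform, the mean squared error shrinks with the square of the range, while the coefficient of determination is unchanged.

```python
def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0.0 and hi -> 1.0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
y_true_raw = [100.0, 200.0, 300.0, 400.0, 500.0]
y_pred_raw = [110.0, 190.0, 310.0, 390.0, 510.0]

# Scale both series with the range of the true values.
lo, hi = min(y_true_raw), max(y_true_raw)
y_true_scaled = min_max_scale(y_true_raw, lo, hi)
y_pred_scaled = min_max_scale(y_pred_raw, lo, hi)

mse_raw = mean_squared_error(y_true_raw, y_pred_raw)
mse_scaled = mean_squared_error(y_true_scaled, y_pred_scaled)
r2_raw = r_squared(y_true_raw, y_pred_raw)
r2_scaled = r_squared(y_true_scaled, y_pred_scaled)

print(f"MSE raw: {mse_raw:.6f}, MSE scaled: {mse_scaled:.6f}")
print(f"R^2 raw: {r2_raw:.6f}, R^2 scaled: {r2_scaled:.6f}")
```

Because of this, a mean squared error computed on min-max scaled values is best read alongside the coefficient of determination rather than compared with errors expressed in the original units.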