# Data matching
---

Experimenting with matching coal power plant data across the following sources:
- Global Energy Monitor (GEM)'s [Global Coal Plant Tracker](https://www.globalenergymonitor.org/coal.html)
- USA's [CAMPD emissions data](https://campd.epa.gov/data)
- OSM's [cooling_tower](https://wiki.openstreetmap.org/wiki/Tag:man_made%3Dcooling_tower) tag

## Setup

### Imports

In [None]:
import overpy
import pandas as pd
import geopandas as gpd

In [None]:
from coal_emissions_monitoring.data_cleaning import load_clean_gcpt_gdf, load_clean_campd_facilities_gdf

### APIs

In [None]:
osm_api = overpy.Overpass()

### Parameters

In [None]:
# show all columns in pandas
pd.set_option("display.max_columns", None)

## Load data

### GEM Global Coal Plant Tracker

In [None]:
gcpt_df = load_clean_gcpt_gdf("/Users/adminuser/Downloads/Global-Coal-Plant-Tracker-January-2023.xlsx")
gcpt_df

### CAMPD facilities metadata

In [None]:
campd_facilities_df = load_clean_campd_facilities_gdf("/Users/adminuser/GitHub/ccai-ss23-ai-monitoring-tutorial/data/facility-attributes-2d71649a-2e7f-4fdf-abaa-e0529ce2fc62.csv")
campd_facilities_df

In [None]:
campd_facilities_df.capacity_mw.describe()

In [None]:
campd_facilities_df[campd_facilities_df["year"] == 2023].explore()

### CAMPD emissions data

In [None]:
# TODO
# campd_emissions_df = 

### OSM cooling_tower tag

In [None]:
osm_results = osm_api.query(
    query="""
area[name="United States"]->.searchArea;
(
  node["man_made"="cooling_tower"](area.searchArea);
  way["man_made"="cooling_tower"](area.searchArea);
  relation["man_made"="cooling_tower"](area.searchArea);
);
out body;
>;
out skel qt;
"""
)
osm_results
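The query above ends with `out body; >; out skel qt;`. That returns each tagged element and, through the `>` recursion, the bare nodes of every way, so ways have to be reassembled client-side. A possible alternative is Overpass's `out center;` output mode, which attaches a single centre coordinate to each way and relation. The sketch below is an assumption, not what this notebook currently runs, and the helper name `build_overpass_query` is hypothetical:

```python
def build_overpass_query(area_name: str, key: str, value: str) -> str:
    """Build an Overpass QL query for every node, way and relation tagged
    key=value inside the named area. `out center;` asks Overpass to add one
    centre point to each way/relation instead of returning their vertices."""
    # Tag filter restricted to the named search area
    selector = f'["{key}"="{value}"](area.searchArea)'
    return "\n".join(
        [
            f'area[name="{area_name}"]->.searchArea;',
            "(",
            f"  node{selector};",
            f"  way{selector};",
            f"  relation{selector};",
            ");",
            "out center;",
        ]
    )


print(build_overpass_query("United States", "man_made", "cooling_tower"))
```

Passing the result to `osm_api.query(...)` should then give nodes with `lat`/`lon` and ways/relations whose centre overpy exposes as `center_lat`/`center_lon`. It is worth confirming those attribute names against the installed overpy version.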

In [None]:
len(osm_results.nodes), len(osm_results.ways), len(osm_results.relations)

In [None]:
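# `osm_results.nodes` also contains the untagged vertices of every way (pulled in
# by the `>;` recursion), so keep only nodes that are themselves tagged as cooling
# towers, and represent each tagged way by the mean of its vertices' coordinates.
# Relations are skipped here.
records = [
    {"osm_id": node.id, "osm_type": "node", "latitude": float(node.lat), "longitude": float(node.lon)}
    for node in osm_results.nodes
    if node.tags.get("man_made") == "cooling_tower"
]
for way in osm_results.ways:
    if way.tags.get("man_made") != "cooling_tower":
        continue
    way_nodes = way.get_nodes(resolve_missing=False)
    records.append(
        {
            "osm_id": way.id,
            "osm_type": "way",
            "latitude": sum(float(n.lat) for n in way_nodes) / len(way_nodes),
            "longitude": sum(float(n.lon) for n in way_nodes) / len(way_nodes),
        }
    )
osm_results_df = pd.DataFrame(records)
# convert OSM results to geodataframe
osm_df = gpd.GeoDataFrame(
    osm_results_df,
    geometry=gpd.points_from_xy(osm_results_df.longitude, osm_results_df.latitude),
    crs="EPSG:4326",
)
osm_df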

## Match data
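One simple starting point is nearest-neighbour matching on coordinates. For each plant, take the closest candidate (an OSM cooling tower or a CAMPD facility) within a distance threshold, and otherwise leave the plant unmatched. The sketch below does this in plain Python with the haversine great-circle distance. The function names, the toy coordinates, and the 5 km threshold are all hypothetical, chosen only for illustration:

```python
import math
from typing import Dict, Hashable, Optional, Tuple

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two WGS84 points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def match_nearest(
    left: Dict[Hashable, LatLon],
    right: Dict[Hashable, LatLon],
    max_km: float = 5.0,
) -> Dict[Hashable, Optional[Hashable]]:
    """Map each left id to the id of its nearest right point within max_km,
    or to None if no right point is that close (brute force, O(n * m))."""
    matches: Dict[Hashable, Optional[Hashable]] = {}
    for left_id, (lat, lon) in left.items():
        best_id, best_km = None, max_km
        for right_id, (rlat, rlon) in right.items():
            distance = haversine_km(lat, lon, rlat, rlon)
            if distance <= best_km:
                best_id, best_km = right_id, distance
        matches[left_id] = best_id
    return matches


# Made-up coordinates, not real plants or towers
plants = {"plant_a": (40.0, -80.0), "plant_b": (35.0, -100.0)}
towers = {101: (40.01, -80.01), 102: (45.0, -90.0)}
matches = match_nearest(plants, towers, max_km=5.0)
print(matches)  # {'plant_a': 101, 'plant_b': None}
```

With the GeoDataFrames loaded above, geopandas offers the same idea through `gpd.sjoin_nearest(left, right, max_distance=..., distance_col=...)`. Because `max_distance` is in CRS units, both frames would first need `to_crs` into a projected, metre-based CRS. The brute-force loop is fine for a few thousand points, but a spatial index scales better for larger sets.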