# Greenhouse Gas Concentration Analysis  
**Data Source**:  
NOAA Global Monitoring Laboratory ([gml.noaa.gov](https://gml.noaa.gov/data/data.php?site=MLO&category=Greenhouse%2BGases))  

## Datasets  
| Gas | Measurement Type | URL | Units |  
|------|------------------|-----|-------|  
| CO2 (Carbon Dioxide) | Surface Flask | [Link](https://gml.noaa.gov/aftp/data/trace_gases/co2/flask/surface/txt/co2_mlo_surface-flask_1_ccgg_event.txt) | ppm (μmol/mol) | 
| CH4 (Methane) | Surface Flask | [Link](https://gml.noaa.gov/aftp/data/trace_gases/ch4/flask/surface/txt/ch4_mlo_surface-flask_1_ccgg_event.txt) | ppb (nmol/mol) |  
| N2O (Nitrous Oxide) | Surface Flask | [Link](https://gml.noaa.gov/aftp/data/trace_gases/n2o/flask/surface/txt/n2o_mlo_surface-flask_1_ccgg_event.txt) | ppb (nmol/mol) |  
| CO (Carbon Monoxide) | Surface Flask | [Link](https://gml.noaa.gov/aftp/data/trace_gases/co/flask/surface/txt/co_mlo_surface-flask_1_ccgg_event.txt) | ppb (nmol/mol) |  
| H2 (Hydrogen) | Surface Flask | [Link](https://gml.noaa.gov/aftp/data/trace_gases/h2/flask/surface/txt/h2_mlo_surface-flask_1_ccgg_event.txt) | ppb (nmol/mol) | 
| SF6 (Sulfur Hexafluoride) | Surface Flask | [Link](https://gml.noaa.gov/aftp/data/trace_gases/sf6/flask/surface/txt/sf6_mlo_surface-flask_1_ccgg_event.txt) | ppt (pmol/mol) |
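
The six URLs in the table share one pattern and differ only in the gas name. As a minimal sketch, they could be generated rather than typed out by hand. The helper name `flask_event_url` and the constant `BASE` are hypothetical, introduced here for illustration only.

```python
# Hypothetical helper: rebuilds the Mauna Loa (MLO) surface-flask event URL
# for a gas, following the pattern visible in the table above.
BASE = "https://gml.noaa.gov/aftp/data/trace_gases"

def flask_event_url(gas: str) -> str:
    g = gas.lower()  # the paths use lowercase gas names
    return f"{BASE}/{g}/flask/surface/txt/{g}_mlo_surface-flask_1_ccgg_event.txt"

gases = ["CO2", "CH4", "N2O", "CO", "H2", "SF6"]
urls = {gas: flask_event_url(gas) for gas in gases}
print(urls["CH4"])
```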

## Key Notes  
- **`date` column**: Sampling date stored as `datetime64[ns]`, with timezone information dropped and the time of day normalized to midnight.
- **`value` column**: Gas concentration in dry air:  
  - CO2: ppm (μmol/mol)  
  - CH4/N2O/CO/H2: ppb (nmol/mol)  
  - SF6: ppt (pmol/mol)  
- **Missing data**: Coded as `-999.999`.  
- **Focus gas**: The combined dataset contains columns for six gases, but the full analysis focuses on **CH4 (methane)**.
- **Non-greenhouse gases**: CO (carbon monoxide) and H2 (hydrogen) are not direct greenhouse gases in the usual sense, since their direct radiative effect is negligible. They are included here because of their indirect influence on greenhouse gas dynamics (e.g., CO and H2 affect atmospheric chemistry and methane lifetime), but they are not modeled or forecast in this analysis.
- **Dataset**: The combined dataset, with a `date` column and one concentration column per gas (six gases in total), is saved as `data/processed/all_ghg_aligned.csv`.
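
Because the gas columns use different mole-fraction units (ppm, ppb, ppt), comparing them directly needs a common scale. The following is a minimal sketch of such a conversion, not part of the original pipeline. The helper `to_ppb` and both lookup tables are hypothetical names, and the numbers are toy values rather than real measurements.

```python
import pandas as pd

# Powers of ten for each mole-fraction unit: ppm = 1e-6, ppb = 1e-9, ppt = 1e-12
UNIT_EXP = {"ppm": -6, "ppb": -9, "ppt": -12}

# Units per gas, as listed in the Key Notes
GAS_UNIT = {"CO2": "ppm", "CH4": "ppb", "N2O": "ppb",
            "CO": "ppb", "H2": "ppb", "SF6": "ppt"}

def to_ppb(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every known gas column expressed in ppb (hypothetical helper)."""
    out = df.copy()
    for gas, unit in GAS_UNIT.items():
        if gas in out.columns:
            factor = 10.0 ** (UNIT_EXP[unit] - UNIT_EXP["ppb"])
            out[gas] = out[gas] * factor
    return out

# Toy values for illustration only
toy = pd.DataFrame({"CO2": [400.0], "CH4": [1900.0], "SF6": [10.0]})
converted = to_ppb(toy)
print(converted)
```

Working with integer exponents rather than dividing the scale factors keeps the ppm-to-ppb factor exactly 1000.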

# Libraries

In [1]:
import pandas as pd
from pathlib import Path

# Make Directories

In [2]:
output_dir = Path('data/processed')
output_dir.mkdir(parents=True, exist_ok=True)

# Load Datasets and Save CSV

In [3]:
# define dataset URLs
datasets = {
    'CH4': 'https://gml.noaa.gov/aftp/data/trace_gases/ch4/flask/surface/txt/ch4_mlo_surface-flask_1_ccgg_event.txt',
    'N2O': 'https://gml.noaa.gov/aftp/data/trace_gases/n2o/flask/surface/txt/n2o_mlo_surface-flask_1_ccgg_event.txt',
    'SF6': 'https://gml.noaa.gov/aftp/data/trace_gases/sf6/flask/surface/txt/sf6_mlo_surface-flask_1_ccgg_event.txt',
    'CO2': 'https://gml.noaa.gov/aftp/data/trace_gases/co2/flask/surface/txt/co2_mlo_surface-flask_1_ccgg_event.txt',
    'CO': 'https://gml.noaa.gov/aftp/data/trace_gases/co/flask/surface/txt/co_mlo_surface-flask_1_ccgg_event.txt',
    'H2': 'https://gml.noaa.gov/aftp/data/trace_gases/h2/flask/surface/txt/h2_mlo_surface-flask_1_ccgg_event.txt'
}
# Note: all datasets have the same features, except that CO lacks the "value_unc" feature. This difference won't affect the
# analysis since the only features that will be used are datetime and value (gas concentration).

# function to load the raw datasets, keeping only datetime and value columns  
def load_gas_data(url, gas_type):
    df = pd.read_csv(url, sep=r'\s+', comment='#', header=0)

    # select features/columns (copy so later assignments do not trigger SettingWithCopyWarning)
    cols_to_keep = ['datetime', 'value']
    df = df[cols_to_keep].copy()

    # convert datetime column to a timezone-naive date, with the time of day normalized to midnight
    df['date'] = pd.to_datetime(df['datetime']).dt.tz_localize(None).dt.normalize()
    df = df.drop(columns=['datetime'])
    
    # add a feature/column for gas type
    df['gas'] = gas_type

    return df

# load all datasets
df_list = [load_gas_data(url, gas) for gas, url in datasets.items()]

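# concatenate all sets into a single long-format dataframe
df_all = pd.concat(df_list, ignore_index=True)

# align data based on date (from datetime); pivot_table averages multiple
# same-day samples of a gas (default aggfunc='mean'). Missing-value codes are
# not filtered before this step, so any that are present enter the average.
df_combined = df_all.pivot_table(index='date', columns='gas', values='value')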

# reset the index
df_combined.reset_index(inplace=True)

# save a copy to csv
df_combined.to_csv(output_dir / 'all_ghg_aligned.csv', index=False)

df_combined.head()

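| | date | CH4 | CO | CO2 | H2 | N2O | SF6 |
|---|------|-----|----|-----|----|-----|-----|
| 0 | 1969-08-20 | NaN | NaN | -5.27 | NaN | NaN | NaN |
| 1 | 1969-08-27 | NaN | NaN | -2.1625 | NaN | NaN | NaN |
| 2 | 1969-09-02 | NaN | NaN | -9.115 | NaN | NaN | NaN |
| 3 | 1969-09-12 | NaN | NaN | 320.945 | NaN | NaN | NaN |
| 4 | 1969-09-24 | NaN | NaN | 320.89 | NaN | NaN | NaN |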
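
The negative CO2 values in the first three rows are not physically possible for a concentration. One plausible cause is that the `-999.999` missing-data code from the Key Notes was averaged with valid same-day samples, since `pivot_table` takes the mean by default. The sketch below reproduces that effect on toy numbers and shows one way to mask the code before pivoting. It assumes the sentinel is exactly as documented, which should be confirmed against the raw files. The values are illustrative, not real measurements.

```python
import numpy as np
import pandas as pd

MISSING_CODE = -999.999  # sentinel from the Key Notes; an assumption to verify

# Toy long-format data: two same-day CO2 samples, one of them coded as missing
toy = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-02"]),
    "gas": ["CO2", "CO2", "CO2"],
    "value": [370.0, MISSING_CODE, 371.0],
})

# Without masking, the sentinel is averaged with the valid sample
naive = toy.pivot_table(index="date", columns="gas", values="value")

# Mask the sentinel first; np.isclose avoids an exact float comparison
cleaned = toy.copy()
cleaned["value"] = cleaned["value"].mask(np.isclose(cleaned["value"], MISSING_CODE))
fixed = cleaned.pivot_table(index="date", columns="gas", values="value")

print(naive)
print(fixed)
```

In the masked version the mean skips the `NaN`, so the first date keeps only its valid sample. The same masking step could be applied to the `value` column inside `load_gas_data` before the datasets are concatenated.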
