<img width="50" src="https://carbonplan-assets.s3.amazonaws.com/monogram/dark-small.png" style="margin-left:0px;margin-top:20px"/>

# IIASA to Parquet

_by Joe Hamman (CarbonPlan), July 1, 2020_

This notebook converts IIASA CSV files to Parquet format and stages them in a
Google Cloud Storage bucket. The RCP DAT files are not converted yet (see the
TODO at the end of the notebook).

**Inputs:**

- Various data files downloaded manually from the IIASA website.

**Outputs:**

- One Parquet dataset per local data file:
  `gs://carbonplan-data-restricted/raw/iiasa/<name>.parquet`

**Notes:**

- No reprojection or processing of the data is done in this notebook.
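
The output naming convention above can be sketched as a small helper. This is an illustration only: `parquet_blob` is a hypothetical name that does not appear in this notebook, and `Example_File.csv` is a made-up file name. The sketch mirrors the per-directory loops below, which lowercase the file stem before appending `.parquet`.

```python
import pathlib


def parquet_blob(local_file, blob_prefix):
    """Map a local data file to its destination blob path (hypothetical helper)."""
    # Drop the directory and extension, then lowercase the remaining stem.
    stem = pathlib.PurePosixPath(local_file).stem.lower()
    # Join with the prefix, tolerating a trailing slash on the prefix.
    return f"{blob_prefix.rstrip('/')}/{stem}.parquet"


# Example_File.csv is an assumed file name used only for illustration.
print(
    parquet_blob(
        "SSP_CMIP6_201811.csv/Example_File.csv",
        "carbonplan-data-restricted/raw/iiasa/SSP_CMIP6_201811",
    )
)
# carbonplan-data-restricted/raw/iiasa/SSP_CMIP6_201811/example_file.parquet
```

The single-file cells in this notebook keep the original capitalization of the file name, so the lowercasing step applies only to the directory loops.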


In [None]:
import pathlib

import gcsfs
import pandas as pd

# Authenticate by running `gcloud auth login` on the command line first,
# or try setting token="browser" for an interactive login.
fs = gcsfs.GCSFileSystem(
    project="carbonplan",
    token="/Users/jhamman/.config/gcloud/legacy_credentials/joe@carbonplan.org/adc.json",
)

In [None]:
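# Convert every CSV file in the SSP_CMIP6_201811 directory.
source_dir = pathlib.Path("../../carbonplan_data/iiasa/SSP_CMIP6_201811.csv/")
blob_prefix = "carbonplan-data-restricted/raw/iiasa/SSP_CMIP6_201811"
csvs = source_dir.glob("*.csv")  # match only files with a .csv extension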

for csv in csvs:
    blob = f"{blob_prefix}/{csv.stem.lower()}.parquet"
    print(blob)

    df = pd.read_csv(csv)
    df.to_parquet(
        blob, compression="gzip", open_with=fs.open, engine="fastparquet"
    )

In [None]:
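# Convert every CSV file in the SSP_IAM_V2_201811 directory.
source_dir = pathlib.Path("../../carbonplan_data/iiasa/SSP_IAM_V2_201811.csv/")
blob_prefix = "carbonplan-data-restricted/raw/iiasa/SSP_IAM_V2_201811"
csvs = source_dir.glob("*.csv")  # match only files with a .csv extension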

for csv in csvs:
    blob = f"{blob_prefix}/{csv.stem.lower()}.parquet"
    print(blob)

    df = pd.read_csv(csv)
    df.to_parquet(
        blob, compression="gzip", open_with=fs.open, engine="fastparquet"
    )

In [None]:
source = "../../carbonplan_data/iiasa/SspDb_compare_regions_2013-06-12.csv"
blob = "carbonplan-data-restricted/raw/iiasa/SspDb_compare_regions_2013-06-12.parquet"
df = pd.read_csv(source)
df.to_parquet(blob, compression="gzip", open_with=fs.open, engine="fastparquet")

In [None]:
source = "../../carbonplan_data/iiasa/SspDb_country_data_2013-06-12.csv"
blob = (
    "carbonplan-data-restricted/raw/iiasa/SspDb_country_data_2013-06-12.parquet"
)
df = pd.read_csv(source)
df.to_parquet(blob, compression="gzip", open_with=fs.open, engine="fastparquet")

In [None]:
# TODO: write a parser for the RCP DAT files.