# IREM Data Conversion with the `radem` library


## 1. Download IREM data (optional)

This section downloads the IREM data. Skip it if you have already downloaded the data.

- The cell below supports Linux-based operating systems. On other operating systems, you can download the data manually.
- It uses the `wget` command to fetch the data from the official [IREM data repository](http://srem.psi.ch/datarepo/V0/irem/).
- The data is organized into two directories: one for the original raw data and one for the extracted CDF data. Change the `DATA_DIR` variable to store the data elsewhere.
- Files that already exist locally are not downloaded again, which saves time and bandwidth.
- The first run can take a while, so grab a coffee.

In [None]:
%%sh
# Data directory to store the data
DATA_DIR="../data/irem"

# Create the data directories
mkdir -p "${DATA_DIR}"
mkdir -p "${DATA_DIR}/extracted"

# Create a "raw" symlink that points at the directory tree wget creates below
DATA_RAW_DIR="${DATA_DIR}/raw"
if [ ! -L "$DATA_RAW_DIR" ]; then
    ABS_DATA_RAW_DIR=$(readlink -f "${DATA_RAW_DIR}")
    ABS_DATA_DIR=$(readlink -f "${DATA_DIR}")
    ln -s "${ABS_DATA_DIR}/srem.psi.ch/datarepo/V0/irem" "${ABS_DATA_RAW_DIR}"
fi

# Get data recursively, don't download existing files
wget \
    --recursive \
    --no-parent \
    --continue \
    --no-clobber \
    --no-verbose \
    -A gz \
    http://srem.psi.ch/datarepo/V0/irem/ \
    -P ${DATA_DIR} \
    2> ${DATA_DIR}/wget.log # Redirect wget output to a log file to avoid cluttering the notebook

# Remove the summary plots directory, which is not needed
rm -rf "${DATA_RAW_DIR}/summaryplots"
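
The `raw` symlink created in the cell above relies on how `wget --recursive -P <prefix>` lays out files: by default, each URL is saved under `<prefix>/<host>/<path>`. The Python sketch below mirrors that mapping to make the resulting directory structure explicit. `wget_local_path` is a hypothetical helper written for illustration; it is not part of `wget` or `radem`, and it assumes the default layout (no `--cut-dirs`, no `--no-host-directories`).

```python
from pathlib import Path
from urllib.parse import urlparse


def wget_local_path(url: str, prefix: Path) -> Path:
    """Return where `wget --recursive -P prefix` would store `url`.

    Hypothetical helper for illustration: it assumes wget's default
    layout (host directory enabled, no --cut-dirs, no query string).
    """
    parsed = urlparse(url)
    return prefix / parsed.netloc / parsed.path.lstrip("/")


data_dir = Path("../data/irem")
repo_url = "http://srem.psi.ch/datarepo/V0/irem/"

# The directory that the `raw` symlink points to
print(wget_local_path(repo_url, data_dir))
```

If you change `DATA_DIR` or the repository URL, the symlink target changes accordingly.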

## 2. Notebook setup

In [32]:
import os
from pathlib import Path
from typing import List
import gzip

DATA_DIR = Path("../data/irem")
DATA_RAW_DIR = DATA_DIR / "raw"
DATA_EXTRACTED_DIR = DATA_DIR / "extracted"
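
Because the download step is optional, it can help to confirm that the data is where the notebook expects it before going further. The following is a minimal sketch of such a check. `count_data_files` is a hypothetical helper, and it assumes the layout used in this notebook: compressed files in `raw/<subdirectory>/*.cdf.gz` and extracted files in `extracted/*.cdf`.

```python
from pathlib import Path
from typing import Dict


def count_data_files(data_dir: Path) -> Dict[str, int]:
    """Count compressed raw files and extracted CDF files under data_dir.

    Hypothetical helper: it assumes the layout used in this notebook,
    i.e. raw/<subdirectory>/*.cdf.gz and extracted/*.cdf.
    """
    raw_dir = data_dir / "raw"
    extracted_dir = data_dir / "extracted"
    return {
        "raw": len(list(raw_dir.glob("*/*.cdf.gz"))),
        "extracted": len(list(extracted_dir.glob("*.cdf"))),
    }


print(count_data_files(Path("../data/irem")))
```

A `raw` count of zero suggests that the download did not run or that `DATA_DIR` points at the wrong location.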

## 3. Extract CDF data

In [33]:
import shutil


def get_data_raw_filenames(data_raw_dir: Path) -> List[Path]:
    """Return the sorted *.cdf.gz files found one level below data_raw_dir."""
    return sorted(data_raw_dir.glob("*/*.cdf.gz"))


def extract_data_raw_file(input_filename: Path, output_filename: Path) -> None:
    """Decompress a gzip file, streaming its content to output_filename."""
    with gzip.open(input_filename, 'rb') as f_in, open(output_filename, 'wb') as f_out:
        # Copy in chunks instead of reading the whole file into memory
        shutil.copyfileobj(f_in, f_out)


def extract_data_raw_files(data_raw_filenames: List[Path], data_extracted_dir: Path) -> None:
    """Extract every raw file that has no extracted counterpart yet."""
    # The directory may be missing if the download cell was skipped
    data_extracted_dir.mkdir(parents=True, exist_ok=True)
    for filename in data_raw_filenames:
        # Path.stem drops only the last suffix: "X.cdf.gz" -> "X.cdf"
        output_filename = data_extracted_dir / filename.stem
        if not output_filename.exists():
            print(f"Extracting {filename} to {output_filename}")
            extract_data_raw_file(filename, output_filename)
        else:
            print(
                f"Skipping extracting {output_filename} - already exists.")


extract_data_raw_files(
    get_data_raw_filenames(DATA_RAW_DIR),
    DATA_EXTRACTED_DIR)

Skipping extracting ../data/irem/extracted/IREM_PACC_20021017.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021018.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021019.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021020.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021021.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021022.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021023.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021024.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021025.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021026.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021027.cdf - already exists.
Skipping extracting ../data/irem/extracted/IREM_PACC_20021028.cdf - already 
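
The extraction cell skips every file whose output already exists, so an extraction that is interrupted halfway could leave a truncated `.cdf` file that later runs never repair. One possible safeguard, sketched below, is to decompress into a temporary file and rename it only once it is complete. This is an illustration rather than the notebook's method: `extract_gzip_atomically` and the `.part` suffix are names chosen here.

```python
import gzip
import os
import shutil
from pathlib import Path


def extract_gzip_atomically(input_filename: Path, output_filename: Path) -> None:
    """Decompress a .gz file so the output only appears once complete.

    A sketch, not the notebook's method: data is streamed to a temporary
    ".part" file (a naming choice made here) and renamed at the end.
    """
    tmp_filename = output_filename.with_name(output_filename.name + ".part")
    try:
        with gzip.open(input_filename, "rb") as f_in, open(tmp_filename, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)  # stream in chunks, low memory use
        os.replace(tmp_filename, output_filename)  # atomic on the same filesystem
    finally:
        # Remove the partial file if extraction failed before the rename
        if tmp_filename.exists():
            tmp_filename.unlink()
```

It takes the same two arguments as `extract_data_raw_file`, so it could be called from `extract_data_raw_files` in its place.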

In [None]:
## 3. 