# RADEM data transformations

This notebook demonstrates how to transform RADEM data into formats suitable for further analysis (DataFrame, HDF5, CSV).


## Prerequisites

- RADEM raw data

### Initializing the notebook

Set the variables below to the paths of your data.

In [1]:
import radem
from datetime import date
from pathlib import Path

DATA_DIR = Path("../data/radem")

DATA_RAW_DIR = DATA_DIR / "raw"
DATA_EXTRACTED_DIR = DATA_DIR / "extracted"
DATA_HDF5_DIR = DATA_DIR / "hdf5"
DATA_CSV_DIR = DATA_DIR / "csv"
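
Whether the write functions used later create missing output directories is not stated here, so it may be safer to create them up front. The following is a minimal sketch using only the standard library; `ensure_dirs` is a hypothetical helper, not part of `radem`.

```python
import tempfile
from pathlib import Path


def ensure_dirs(*dirs):
    """Create each directory (and any missing parents); existing ones are left alone."""
    for directory in dirs:
        Path(directory).mkdir(parents=True, exist_ok=True)


# Demonstration in a temporary location so nothing is written to the real data tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "radem"
    ensure_dirs(base / "hdf5", base / "csv")
    print(sorted(p.name for p in base.iterdir()))
```

In this notebook the call would be `ensure_dirs(DATA_HDF5_DIR, DATA_CSV_DIR)`.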


## Data transformations

### Reading CDFs (option 1: all files in a directory)

In [2]:
sc_cdfs = radem.handlers.read_radem_science_cdfs(DATA_EXTRACTED_DIR)
hk_cdfs = radem.handlers.read_radem_housekeeping_cdfs(DATA_EXTRACTED_DIR)

print(len(sc_cdfs))
print(len(hk_cdfs))

604
580


### Reading CDFs (option 2: collect paths for a date range, then read)


In [3]:
sc_paths = radem.handlers.get_radem_science_cdf_paths(
    DATA_EXTRACTED_DIR,
    from_date=date(2023, 12, 1),
    to_date=date(2024, 1, 31))
hk_paths = radem.handlers.get_radem_housekeeping_cdf_paths(
    DATA_EXTRACTED_DIR,
    from_date=date(2023, 12, 1),
    to_date=date(2024, 1, 31))

sc_cdfs = radem.handlers.read_radem_cdfs(sc_paths)
hk_cdfs = radem.handlers.read_radem_cdfs(hk_paths)

print(len(sc_cdfs))
print(len(hk_cdfs))


78
78
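
To illustrate what selecting file paths by date range can look like, here is a minimal standard-library sketch. It is not how `radem.handlers` is implemented. Both the file naming scheme (an eight-digit `YYYYMMDD` date in the file name) and the inclusive treatment of both bounds are assumptions made for this example, and `filter_paths_by_date` is a hypothetical helper.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Assumed naming scheme: an 8-digit YYYYMMDD date somewhere in the file name.
DATE_IN_NAME = re.compile(r"(\d{8})")


def filter_paths_by_date(paths, from_date, to_date):
    """Keep paths whose file name contains a date within [from_date, to_date]."""
    selected = []
    for path in paths:
        match = DATE_IN_NAME.search(Path(path).name)
        if match is None:
            continue  # no recognizable date in the name
        try:
            file_date = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if from_date <= file_date <= to_date:
            selected.append(Path(path))
    return sorted(selected)


# Made-up file names, used only to exercise the helper.
example_paths = [
    Path("example_sc_20231130.cdf"),
    Path("example_sc_20231201.cdf"),
    Path("example_sc_20240131.cdf"),
    Path("example_sc_20240201.cdf"),
]
selected_paths = filter_paths_by_date(example_paths, date(2023, 12, 1), date(2024, 1, 31))
print([p.name for p in selected_paths])
```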


### Reading CDFs (option 3: read a date range directly)

In [4]:
sc_cdfs = radem.handlers.read_radem_science_cdfs(
    DATA_EXTRACTED_DIR,
    from_date=date(2023, 12, 1),
    to_date=date(2024, 1, 31))
hk_cdfs = radem.handlers.read_radem_housekeeping_cdfs(
    DATA_EXTRACTED_DIR,
    from_date=date(2023, 12, 1),
    to_date=date(2024, 1, 31))

print(len(sc_cdfs))
print(len(hk_cdfs))


78
78


### Fix and convert for further analysis

> 💡 This step merges the CDFs, removes duplicates, sorts the records, and converts the data to a pandas DataFrame, which simplifies further analysis and avoids low-level issues with CDF files.
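
The merge, de-duplicate, and sort steps can be pictured with plain pandas on toy data. This is only a sketch of the idea, not the library's implementation; `merge_frames` is a hypothetical helper, and treating rows with the same timestamp as duplicates is an assumption.

```python
import pandas as pd


def merge_frames(frames):
    """Concatenate frames, drop repeated timestamps (keeping the first), and sort by time."""
    merged = pd.concat(frames)
    merged = merged[~merged.index.duplicated(keep="first")]
    return merged.sort_index()


# Toy data standing in for two overlapping files.
first = pd.DataFrame(
    {"protons_bin_1": [14, 12]},
    index=pd.to_datetime(["2023-12-01 00:00:29", "2023-12-01 00:01:29"]),
)
second = pd.DataFrame(
    {"protons_bin_1": [12, 15]},
    index=pd.to_datetime(["2023-12-01 00:01:29", "2023-12-01 00:02:29"]),
)

# The frames are passed out of order on purpose; the result is sorted by time.
merged = merge_frames([second, first])
print(merged)
```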


In [5]:
sc_df = radem.handlers.convert_radem_science_cdfs_to_df(sc_cdfs)

print(sc_df)

                     protons_bin_1  protons_bin_2  protons_bin_3  \
time                                                               
2023-12-01 00:00:29             14            190             19   
2023-12-01 00:01:29             12            214             23   
2023-12-01 00:02:29             15            198             18   
2023-12-01 00:03:29             14            205             22   
2023-12-01 00:04:29             12            159             18   
...                            ...            ...            ...   
2024-01-30 23:55:32             58            252             25   
2024-01-30 23:56:32             65            232             20   
2024-01-30 23:57:32             54            206             26   
2024-01-30 23:58:32             49            237             27   
2024-01-30 23:59:32             48            241             32   

[output truncated: remaining columns not shown]

### Writing to HDF5

In [6]:
radem.handlers.write_hdf(sc_df, DATA_HDF5_DIR / "example.h5")

### Reading from HDF5

In [7]:
sc_df_hdf = radem.handlers.read_hdf(DATA_HDF5_DIR / "example.h5")

print(sc_df_hdf.equals(sc_df))

True
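
When checking a round trip, keep in mind that iterating a DataFrame yields its column labels, so Python's built-in `all` applied to an element-wise comparison does not inspect the values. `DataFrame.equals` is a stricter check. The sketch below uses toy frames, not RADEM data, to show the difference.

```python
import pandas as pd

left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
right = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 99.0]})  # differs in one cell

# Iterating a DataFrame yields its column labels, so the built-in all()
# only tests whether the labels "a" and "b" are truthy.
labels_check = all(left == right)

# DataFrame.equals compares shape, dtypes, and every element
# (NaNs in the same position count as equal).
strict_check = left.equals(right)

print(labels_check, strict_check)
```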


### Writing to CSV (not recommended)

> ⚠️ CSV files are inefficient for storing large datasets; use the HDF5 format instead.

> ⚠️ Tiny floating-point differences may occur when writing to and reading from CSV, e.g. `1.4143094841930115` vs `1.4143094841930117`. If that matters to you, use the HDF5 format.

In [8]:
radem.handlers.write_csv(sc_df, DATA_CSV_DIR / "example.csv")

### Reading from CSV

In [9]:
sc_df_csv = radem.handlers.read_csv(DATA_CSV_DIR / "example.csv")
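
Because a CSV round trip may change the last digit of a float, an exact equality check can fail even though the data are effectively the same. A tolerance-based comparison is one way to verify the result. The sketch below uses plain pandas and NumPy on toy data with an in-memory buffer; `frames_close` is a hypothetical helper, and the tolerance of `1e-12` is an arbitrary choice for illustration.

```python
import io

import numpy as np
import pandas as pd


def frames_close(left, right, rtol=1e-12):
    """Return True if both frames share shape and column labels and hold numerically close values."""
    if left.shape != right.shape or list(left.columns) != list(right.columns):
        return False
    return bool(
        np.allclose(
            left.to_numpy(dtype=float),
            right.to_numpy(dtype=float),
            rtol=rtol,
            atol=0.0,
            equal_nan=True,
        )
    )


# Toy data; the column names are illustrative only.
original = pd.DataFrame({"flux": [1.4143094841930115, 0.1, 2.5], "counts": [14, 12, 15]})

# Write to and read back from an in-memory CSV.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(frames_close(original, restored))
```

When reading CSV with pandas directly, `pd.read_csv(..., float_precision="round_trip")` is another option to consider; whether `radem.handlers.read_csv` exposes such a setting is not covered here.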