# ACE Exploration

ACE (Advanced Composition Explorer) is equipped with nine scientific instruments to make comprehensive and coordinated in situ measurements. These instruments are categorized into two groups: High Resolution Spectrometers and Monitoring Instruments.

## High Resolution Spectrometers
- **CRIS** - Cosmic Ray Isotope Spectrometer
- **SIS** - Solar Isotope Spectrometer
- **ULEIS** - Ultra Low Energy Isotope Spectrometer
- **SEPICA** - Solar Energetic Particle Ionic Charge Analyzer
- **SWICS** - Solar Wind Ion Composition Spectrometer
- **SWIMS** - Solar Wind Ion Mass Spectrometer

## Monitoring Instruments
- **EPAM** - Electron, Proton and Alpha Monitor
- **SWEPAM** - Solar Wind Electron, Proton and Alpha Monitor
- **MAG** - Magnetic Field Monitor

All publicly available ACE data are stored in Hierarchical Data Format (HDF). The data are organized by instrument and by time-averaging period: each instrument's data are stored in separate HDF files, and each averaging period also has its own file. For most instruments, the data are averaged over one hour, one day, and 27 days (one Bartels rotation).

## About Hierarchical Data Formats
Hierarchical Data Formats (HDF) are open-source file formats that support large, complex, heterogeneous data. HDF files use a directory-like structure that lets you organize data within a file in many different ways, much as you would organize files on your computer. HDF files can also embed metadata, which makes them self-describing.

---

## Analytical Questions
How can we apply novel dimension reduction methods, such as PCA and t-SNE, to obtain an informative representation of in-situ solar wind data in a low-dimensional space? How can this low-dimensional representation provide better 2D/3D visualization support than traditional dimension reduction techniques?

## Libraries and global variables

In [2]:
import sys

sys.path.append("../src/scripts")
from utilities import (
    parse_hdf_data,
    flag_occurrences,
    visualize_flag,
    add_datetime_column,
)

ModuleNotFoundError: No module named 'matplotlib'

In [None]:
%pip install matplotlib

In [None]:
# global variables
MISSING_FLAG = -999.900

## Data Import

In [None]:
# read data
data_dir = "../data/ace/raw"
mag_df = parse_hdf_data(f"{data_dir}/MAG_data_1hr.txt")
swepam_df = parse_hdf_data(f"{data_dir}/SWEPAM_data_1hr.txt")
swics_df = parse_hdf_data(f"{data_dir}/SWICS_data_1day.txt")

In [None]:
# dtype conversion
for df in [mag_df, swepam_df, swics_df]:
    df[["year", "day", "hr", "min", "sec"]] = df[
        ["year", "day", "hr", "min", "sec"]
    ].astype(int)

In [None]:
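# datetime conversion and drop redundant features
# NOTE: relies on add_datetime_column modifying df in place and returning it;
# otherwise the in-place drop below would act on a discarded copy
for df in [mag_df, swepam_df, swics_df]:
    add_datetime_column(df).drop(
        columns=["year", "day", "hr", "min", "sec", "fp_year", "fp_doy"],
        inplace=True,
    )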
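
The internals of `add_datetime_column` are not shown in this notebook. As a rough sketch of what such a helper might do, the function below builds a timestamp from the year, day-of-year, and time-of-day columns. The function name, the `datetime` column name, and the demo values are assumptions made for illustration only.

```python
import pandas as pd


def add_datetime_column_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for add_datetime_column; mutates df and returns it."""
    # Year plus zero-padded day-of-year gives the calendar date (day 1 = 1 January).
    date = pd.to_datetime(
        df["year"].astype(str) + df["day"].astype(str).str.zfill(3),
        format="%Y%j",
    )
    # Add the time of day as offsets from midnight.
    df["datetime"] = (
        date
        + pd.to_timedelta(df["hr"], unit="h")
        + pd.to_timedelta(df["min"], unit="min")
        + pd.to_timedelta(df["sec"], unit="s")
    )
    return df


# Made-up rows for illustration; not real ACE records.
demo = pd.DataFrame(
    {
        "year": [1998, 2000],
        "day": [35, 366],
        "hr": [0, 23],
        "min": [0, 30],
        "sec": [0, 15],
    }
)
add_datetime_column_sketch(demo)
print(demo["datetime"].tolist())
```

Returning the same object that was passed in is what allows a chained in-place `drop` to affect the original frame.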

## Data Cleaning

### Handling Missing Values

In [None]:
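# rows containing the missing-value flag in any column
# (assumes the parsed columns are numeric, so the flag is compared as a float)
missing_rows = mag_df[mag_df.eq(MISSING_FLAG).any(axis=1)]
# per-column flag counts, most affected columns first
flag_occurrences(mag_df, MISSING_FLAG).sort_values(
    by="Flag_Count", ascending=False
)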
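
Because `flag_occurrences` lives in the project's `utilities` module and is not shown here, the following is only a minimal sketch of the same idea in plain pandas: count the sentinel per column, then replace it with `NaN` so that summary statistics ignore it. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

MISSING_FLAG = -999.9  # same sentinel value as the notebook's global

# Made-up values for illustration only; column names are hypothetical.
toy = pd.DataFrame(
    {
        "Br": [1.2, MISSING_FLAG, 0.7, MISSING_FLAG],
        "Bt": [0.3, 0.4, MISSING_FLAG, 0.1],
        "Bn": [0.0, 0.2, 0.5, 0.9],
    }
)

# Count sentinel occurrences per column, most affected columns first.
flag_counts = (
    toy.eq(MISSING_FLAG)
    .sum()
    .rename("Flag_Count")
    .sort_values(ascending=False)
    .to_frame()
)
print(flag_counts)

# Replace the sentinel with NaN so pandas statistics skip it by default.
cleaned = toy.replace(MISSING_FLAG, np.nan)
print(cleaned.mean())
```

Leaving the sentinel in place would drag means and standard deviations toward -999.9, so converting it to `NaN` before any descriptive statistics is usually the safer order of operations.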


## Exploratory Data Analysis

### Descriptives

In [None]:
# MAG data info
mag_df.info()  # prints its summary directly and returns None
display(mag_df.describe())

In [None]:
# SWICS data info
swics_df.info()  # prints its summary directly and returns None
display(swics_df.describe())

In [None]:
# SWEPAM data info
swepam_df.info()  # prints its summary directly and returns None
display(swepam_df.describe())

### Univariate Analysis

### Multivariate Analysis

## Data Transformation

### Normalization and Standardization
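
As a starting point for this step, the sketch below shows the two most common rescalings on a toy frame: z-score standardization and min-max normalization. The column names and values are made up, and which scaling suits the ACE features is left open.

```python
import pandas as pd

# Made-up values for illustration; column names are hypothetical.
toy = pd.DataFrame(
    {"speed": [300.0, 400.0, 500.0], "density": [2.0, 4.0, 6.0]}
)

# Standardization (z-score): each column gets zero mean and unit variance.
standardized = (toy - toy.mean()) / toy.std(ddof=0)

# Min-max normalization: each column is rescaled to the range [0, 1].
normalized = (toy - toy.min()) / (toy.max() - toy.min())

print(standardized)
print(normalized)
```

Standardization is the usual choice before PCA, since PCA is sensitive to the relative scale of the input features.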

### Handling Outliers

## Dimensionality Reduction
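
To make the PCA part of the analytical question concrete, here is a minimal sketch that computes PCA through a singular value decomposition in NumPy. It runs on synthetic data generated from two latent factors, not on ACE measurements, and every variable name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 samples of 5 correlated features driven by 2 latent factors.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# Standardize each feature so that no single scale dominates the components.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal axes, ordered by explained variance.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Project onto the first two principal axes for a 2D view.
embedding = Xs @ Vt[:2].T

print(explained.round(3))
print(embedding.shape)
```

The `embedding` array can be passed straight to a scatter plot. A nonlinear method such as t-SNE (for example `sklearn.manifold.TSNE`, if scikit-learn is available) could be applied to the same standardized matrix for comparison.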

## Joins
- Is there any way to join these features informatively?
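
One possible answer, offered as a sketch rather than a recommendation: the MAG and SWEPAM files loaded above are hourly while the SWICS file is daily, so the hourly data could be joined on their shared timestamps, averaged to daily values, and then joined with the daily data. The frames, column names, and values below are synthetic stand-ins.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2000-01-01")

# Synthetic hourly stand-ins for MAG and SWEPAM (48 hours); names are hypothetical.
hours = start + pd.to_timedelta(np.arange(48), unit="h")
mag_toy = pd.DataFrame({"datetime": hours, "Bmag": np.arange(48, dtype=float)})
swepam_toy = pd.DataFrame({"datetime": hours, "speed": 400.0 + np.arange(48)})

# Synthetic daily stand-in for SWICS (2 days).
days = start + pd.to_timedelta([0, 1], unit="D")
swics_toy = pd.DataFrame({"datetime": days, "ratio": [0.1, 0.2]})

# Step 1: frames with the same cadence join directly on the timestamp.
hourly = mag_toy.merge(swepam_toy, on="datetime", how="inner")

# Step 2: downsample the hourly data to daily means.
daily = hourly.set_index("datetime").resample("D").mean()

# Step 3: join with the daily frame on the shared daily index.
joined = daily.join(swics_toy.set_index("datetime"), how="inner")
print(joined)
```

Averaging discards sub-daily variability. The alternative of repeating each daily SWICS value across its 24 hourly rows keeps the hourly resolution but duplicates information, so the right choice depends on the analysis.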

## Data Quality Checks

### Addressing missing values

In [None]:
missing_rows = mag_df[mag_df.eq(MISSING_FLAG).any(axis=1)]
# use the imported helper name and keep mag_df intact instead of overwriting it
flag_counts = flag_occurrences(mag_df, MISSING_FLAG).sort_values(
    ascending=False, by="Flag_Count"
)
flag_counts


## Self-Organizing Maps