# Filename parsing

## Introduction

This notebook presents a piece of code for space weather monitoring using data from the China Seismo-Electromagnetic Satellite (CSES).

## Project Setup

The project directory, named `CSES_files`, holds the data files. These files, stored in `HDF5` format, contain measurements from various instruments aboard the CSES satellite. The first step is to import the necessary libraries:

In [None]:
import os
import geopandas as gpd
import pandas as pd
import numpy as np
from datetime import datetime, timezone
import h5py
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import xarray as xr
import xarray
from shapely import geometry
from glob import glob

These libraries cover data manipulation, visualization, and geographic data handling. For instance, `xarray` handles labeled **multi-dimensional arrays**, while `geopandas` provides tools for **geographic data manipulation**.

## File Paths and Dataset Handling

### Define the project directory and the dataset file names

The code begins by defining the project directory and listing the `HDF5` files to be processed:

In [None]:
project_dir = "./CSES_files"

print(f"Project directory path: {project_dir}")

EFD1 = 'CSES_01_EFD_1_L02_A1_213330_20211206_164953_20211206_172707_000.h5'
HEP1 = 'CSES_01_HEP_1_L02_A4_176401_20210407_182209_20210407_190029_000.h5'
HEP4 = 'CSES_01_HEP_4_L02_A4_202091_20210923_184621_20210923_192441_000.h5'
LAP1 = 'CSES_01_LAP_1_L02_A3_174201_20210324_070216_20210324_073942_000.h5'
SCM1 = 'CSES_01_SCM_1_L02_A2_183380_20210523_154551_20210523_162126_000.h5'
HEPD = 'CSES_HEP_DDD_0219741_20220117_214156_20220117_230638_L3_0000267631.h5'


The `dataset` function opens an `xarray` dataset from a given file path, and the `variables` function lists the variable names it contains:

In [None]:
# Function to open an xarray dataset from a given path
def dataset(path):
    return xarray.open_dataset(path, engine='h5netcdf', phony_dims='sort')

# Function to list all variable names in a dataset
def variables(data):
    return list(data.keys())

### List of file paths to be processed

In [None]:
file_list = [
    os.path.join(project_dir, EFD1),
    os.path.join(project_dir, HEP1),
    os.path.join(project_dir, HEP4),
    os.path.join(project_dir, LAP1),
    os.path.join(project_dir, SCM1),
    os.path.join(project_dir, HEPD)
]

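
Before opening the files, it can help to confirm that every path in the list actually exists. The following is a minimal sketch using only the standard library; the helper name `missing_files` and the example file names are hypothetical and not part of the original code.

```python
import os


def missing_files(paths):
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]


# Hypothetical usage: the file names below are placeholders, not real CSES files.
example_paths = [os.path.join("./CSES_files", name) for name in ("a.h5", "b.h5")]
for path in missing_files(example_paths):
    print(f"Missing file: {path}")
```

Calling `missing_files(file_list)` on the list defined above would report any dataset that has not been downloaded into the project directory.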


## Extracting Metadata

To understand the data better, the code extracts metadata such as the start and end dates and the orbit number from the file names. This is done with the `extract_dates` and `extract_orbit` functions:
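
As an independent illustration of the idea, the sketch below pulls the start and end timestamps out of a file name with a regular expression. It is not the notebook's `extract_dates`: the helper name `parse_time_range` is hypothetical, and it assumes the file name contains two consecutive `YYYYMMDD_HHMMSS` stamps (as in the names listed earlier) and that those times are in UTC.

```python
import re
from datetime import datetime, timezone

# Two consecutive YYYYMMDD_HHMMSS stamps, not embedded in longer digit runs.
_TIME_RANGE = re.compile(r"(?<!\d)(\d{8})_(\d{6})_(\d{8})_(\d{6})(?!\d)")


def parse_time_range(filename):
    """Return (start, end) datetimes parsed from a CSES-style file name.

    Hypothetical helper; treating the timestamps as UTC is an assumption.
    """
    match = _TIME_RANGE.search(filename)
    if match is None:
        raise ValueError(f"No start/end timestamps found in {filename!r}")
    d1, t1, d2, t2 = match.groups()
    fmt = "%Y%m%d%H%M%S"
    start = datetime.strptime(d1 + t1, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(d2 + t2, fmt).replace(tzinfo=timezone.utc)
    return start, end


start, end = parse_time_range(
    "CSES_01_EFD_1_L02_A1_213330_20211206_164953_20211206_172707_000.h5"
)
print(start.isoformat(), end.isoformat())
```

Searching for the timestamp pattern, rather than splitting on underscores and indexing fixed positions, lets the same helper handle both file name layouts listed earlier, where the timestamps sit at different positions.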

### Extract start and end dates from a file name