# Notebook: SECS Data Extraction and Preprocessing

## Description

This notebook extracts SECS (Spherical Elementary Current Systems) data from CSV files stored in the `Be`, `Bn`, and `Bu` subfolders of a specified data folder. For each timestamp it extracts a window centred on the midpoint (`x_mid`, `y_mid`) that extends `W` and `H` grid cells on either side, giving (2`W`+1) × (2`H`+1) values per component. The extracted `Be`, `Bn`, and `Bu` values are flattened into one row per timestamp and saved to a single CSV file (`target_2D.csv`) suitable for further analysis or machine learning tasks.

### Table of Contents

1. [Setup and Path Definitions](#setup-and-path-definitions)
2. [Dimension and Region Specification](#dimension-and-region-specification)
3. [Utility Functions](#utility-functions)
4. [Data Loading and Extraction](#data-loading-and-extraction)
5. [Saving Extracted Data](#saving-extracted-data)

In [1]:
## 1. Setup and Path Definitions

import numpy as np
import pandas as pd
from pathlib import Path
from datetime import datetime

# Define working and output directories
data_path = Path('/Users/akv020/Tensorflow/fennomag-net/data/secs/test_mode_2024')
output_path = Path('/Users/akv020/Tensorflow/fennomag-net/source/preprocess')

In [2]:
## 2. Dimension and Region Specification

# Midpoint coordinates for extraction
x_mid, y_mid = 10, 10

# Half-widths of the region: it spans 2*W + 1 rows and 2*H + 1 columns
W, H = 3, 3

# Slices are zero-based and end-exclusive, hence the + 1 on the upper bound
dx = slice(x_mid - W, x_mid + W + 1)
dy = slice(y_mid - H, y_mid + H + 1)
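
As a quick check of the slice arithmetic, the sketch below applies the same `dx`/`dy` construction to a synthetic grid. The 21×21 grid size is taken from the conceptual structure noted at the end of this notebook. The helper name `make_window` and its bounds check are illustrative assumptions, not part of the original pipeline. The check matters because a negative slice start counts from the end of the array and would silently return the wrong region.

```python
import numpy as np

def make_window(mid, half_width, size):
    """Return a slice of 2*half_width + 1 indices centred on mid.

    Hypothetical helper: raises instead of silently wrapping when the
    window would leave the grid.
    """
    start, stop = mid - half_width, mid + half_width + 1
    if start < 0 or stop > size:
        raise ValueError(f"window [{start}, {stop}) exceeds grid size {size}")
    return slice(start, stop)

# Synthetic 21x21 grid whose values encode their own flat index
grid = np.arange(21 * 21).reshape(21, 21)

dx = make_window(10, 3, 21)   # equivalent to slice(7, 14)
dy = make_window(10, 3, 21)

region = grid[dx, dy]
print(region.shape)                  # (7, 7)
print(region[3, 3] == grid[10, 10])  # True: the midpoint sits at the window centre
```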

In [3]:
## 3. Utility Functions

# Utility function to read CSV and extract a region
def read_and_extract(file_path, dx, dy):
    """Read a headerless CSV grid and return the sub-array [dx, dy] (rows, columns)."""
    df = pd.read_csv(file_path, header=None)
    return df.to_numpy()[dx, dy]

# Function to extract datetime from filename
def extract_datetime(filename):
    """Parse the timestamp from the second and third '_'-separated fields of the file stem."""
    parts = filename.stem.split('_')
    return datetime.strptime(parts[1] + parts[2], '%Y%m%d%H%M%S')
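
To illustrate what `extract_datetime` expects, the sketch below runs it on a made-up file name. The pattern `<component>_<YYYYMMDD>_<HHMMSS>.csv` is an assumption inferred from the format string `'%Y%m%d%H%M%S'`; the actual file names in the data folder are not shown in this notebook.

```python
from datetime import datetime
from pathlib import Path

def extract_datetime(filename):
    # The stem is split on '_'; fields 1 and 2 are taken as date and time
    parts = filename.stem.split('_')
    return datetime.strptime(parts[1] + parts[2], '%Y%m%d%H%M%S')

# Hypothetical file name, used only for illustration
example = Path('Be_20240101_000500.csv')
ts = extract_datetime(example)
print(ts)  # 2024-01-01 00:05:00
```

A stem with fewer than three fields raises `IndexError`, and a malformed date or time field raises `ValueError`.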

In [4]:
## 4. Data Loading and Extraction

# Get sorted list of files
Be_files = sorted((data_path / 'Be').glob('*.csv'))
Bn_files = sorted((data_path / 'Bn').glob('*.csv'))
Bu_files = sorted((data_path / 'Bu').glob('*.csv'))

# Initialize a list to store rows of data
data_rows = []
total_files = len(Be_files)
progress_step = max(total_files // 100, 1)

# Process files in lockstep; zip() silently stops at the shortest list,
# so this assumes Be, Bn, Bu hold the same timestamps in the same order
for idx, (be_file, bn_file, bu_file) in enumerate(zip(Be_files, Bn_files, Bu_files), 1):
    if idx % progress_step == 0:
        print(f"Processing file {idx}/{total_files} ({(idx/total_files)*100:.1f}%)")

    timestamp = extract_datetime(be_file)

    be_region = read_and_extract(be_file, dx, dy).flatten()
    bn_region = read_and_extract(bn_file, dx, dy).flatten()
    bu_region = read_and_extract(bu_file, dx, dy).flatten()

    row = [timestamp] + be_region.tolist() + bn_region.tolist() + bu_region.tolist()
    data_rows.append(row)

Processing file 5270/527040 (1.0%)
Processing file 10540/527040 (2.0%)
Processing file 15810/527040 (3.0%)
Processing file 21080/527040 (4.0%)
Processing file 26350/527040 (5.0%)
Processing file 31620/527040 (6.0%)
Processing file 36890/527040 (7.0%)
Processing file 42160/527040 (8.0%)
Processing file 47430/527040 (9.0%)
Processing file 52700/527040 (10.0%)
Processing file 57970/527040 (11.0%)
Processing file 63240/527040 (12.0%)
Processing file 68510/527040 (13.0%)
Processing file 73780/527040 (14.0%)
Processing file 79050/527040 (15.0%)
Processing file 84320/527040 (16.0%)
Processing file 89590/527040 (17.0%)
Processing file 94860/527040 (18.0%)
Processing file 100130/527040 (19.0%)
Processing file 105400/527040 (20.0%)
Processing file 110670/527040 (21.0%)
Processing file 115940/527040 (22.0%)
Processing file 121210/527040 (23.0%)
Processing file 126480/527040 (24.0%)
Processing file 131750/527040 (25.0%)
Processing file 137020/527040 (26.0%)
Processing file 142290/527040 (27.0%)
...
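
The loop takes the timestamp from the `Be` file only and relies on the three sorted lists lining up. A minimal sketch of an explicit check is given below; `stem_timestamp` and `check_alignment` are hypothetical helpers, and the file names are invented examples that follow the assumed `<component>_<YYYYMMDD>_<HHMMSS>.csv` pattern. No files are read.

```python
from datetime import datetime
from pathlib import Path

def stem_timestamp(path):
    """Parse the timestamp from a stem such as 'Be_20240101_000000' (assumed pattern)."""
    parts = path.stem.split('_')
    return datetime.strptime(parts[1] + parts[2], '%Y%m%d%H%M%S')

def check_alignment(be_files, bn_files, bu_files):
    """Raise ValueError if the three sorted file lists do not line up one to one."""
    if not (len(be_files) == len(bn_files) == len(bu_files)):
        raise ValueError("Be, Bn and Bu contain different numbers of files")
    for be, bn, bu in zip(be_files, bn_files, bu_files):
        if not (stem_timestamp(be) == stem_timestamp(bn) == stem_timestamp(bu)):
            raise ValueError(f"Timestamp mismatch: {be.name}, {bn.name}, {bu.name}")

# Invented file names; the last Bu entry is deliberately one minute off
be = [Path('Be_20240101_000000.csv'), Path('Be_20240101_000100.csv')]
bn = [Path('Bn_20240101_000000.csv'), Path('Bn_20240101_000100.csv')]
bu = [Path('Bu_20240101_000000.csv'), Path('Bu_20240101_000200.csv')]

try:
    check_alignment(be, bn, bu)
    aligned = True
except ValueError as err:
    aligned = False
    print(err)
```

Calling such a check once before the loop costs little compared with reading the files.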

In [5]:
## 5. Saving Extracted Data

# Define column names based on region dimensions
region_indices = [(i, j) for i in range(-W, W+1) for j in range(-H, H+1)]
columns = ['DateTime']

# Add component names with offsets to columns
for comp in ['Be', 'Bn', 'Bu']:
    for idx in region_indices:
        columns.append(f"{comp}_{idx[0]}_{idx[1]}")

# Create DataFrame and save as CSV with 6 significant digits (slightly below float32 precision)
result_df = pd.DataFrame(data_rows, columns=columns)
result_df.to_csv(data_path / 'target_2D.csv', index=False, float_format='%.6g')

print("Data extraction and saving to 'target_2D.csv' completed.")

Data extraction and saving to 'target_2D.csv' completed.
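
The `float_format='%.6g'` setting keeps six significant digits, which is slightly fewer than a 32-bit float can hold. The sketch below shows the effect on one arbitrary test value (not taken from the dataset). Nine significant digits is the known round-trip requirement for float32.

```python
import numpy as np

x = np.float32(123.456789)          # arbitrary test value

text6 = '%.6g' % float(x)           # what the CSV would contain
text9 = '%.9g' % float(x)           # enough digits to round-trip float32

six = np.float32(float(text6))
nine = np.float32(float(text9))

print(text6)        # 123.457
print(six == x)     # False: the last digits were rounded away
print(nine == x)    # True
```

Whether the lost digits matter depends on the precision of the SECS values themselves; `'%.9g'` would be a lossless alternative at the cost of a larger file.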


In [6]:
dx

slice(7, 14, None)

In [7]:
dy

slice(7, 14, None)

In [8]:
pwd

'/Users/akv020/Tensorflow/fennomag-net/source/preprocess'

In [None]:
# Conceptual structure (n_timestamps must be defined first, e.g. len(Be_files))
secs_data = np.empty((n_timestamps, 21, 21, 3), dtype=np.float32)
# Where:
# - n_timestamps: number of 1-minute intervals in the dataset
# - 21x21: spatial grid
# - 3: magnetic field components (Be, Bn, Bu)
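
One way to get from the flat table written in cell 5 to an array with this axis order is sketched below, using a small synthetic DataFrame in place of the real file. The result covers only the extracted 7×7 window, not the full 21×21 grid, and the variable names `nx`, `ny`, `flat`, and `secs` are illustrative.

```python
import numpy as np
import pandas as pd

W, H = 3, 3
nx, ny = 2 * W + 1, 2 * H + 1

# Rebuild the column names in the same order as the saving cell
columns = ['DateTime'] + [
    f"{comp}_{i}_{j}"
    for comp in ['Be', 'Bn', 'Bu']
    for i in range(-W, W + 1)
    for j in range(-H, H + 1)
]

# Synthetic stand-in for the saved table: two timestamps,
# each value equal to the position of its column
n = 2
values = np.tile(np.arange(3 * nx * ny, dtype=np.float32), (n, 1))
df = pd.DataFrame(values, columns=columns[1:])
df.insert(0, 'DateTime', pd.date_range('2024-01-01', periods=n, freq='min'))

# Columns are ordered component, row offset, column offset,
# so reshape to that order first and then move the component axis last
flat = df[columns[1:]].to_numpy(dtype=np.float32)
secs = flat.reshape(n, 3, nx, ny).transpose(0, 2, 3, 1)

print(secs.shape)  # (2, 7, 7, 3)
```

Here `secs[t, W + i, H + j, c]` corresponds to the column `{comp}_{i}_{j}`, with `c` indexing `Be`, `Bn`, `Bu` in that order.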