# Merge_all — Build a Unified Dataset for **Maneuvers** (single-boat)

This notebook aggregates **validated maneuver intervals** (from `summary.json`) across all runs into a single, analysis-ready CSV.  
It **does not compare two boats**: each maneuver slice comes from one boat’s log and is annotated with run/rider metadata.

---

## Inputs

- **`summary.json`** — produced by `mainCOG`; for each run it lists validated maneuvers with:
  - `maneuver_index`, `maneuver_type`, `start_time`, `end_time`, `duration`, …
- **Data root** (e.g., `../Data_Sailnjord/Maneuvers`) with folders laid out as `<date>/<person>/<run>/<single_file>.csv`.
  Each run folder must contain **exactly one CSV**; a run with zero or several CSV files is skipped with a warning.

- **Helper**: `report_fct.filter_interval(df, start, end)` to clip time windows.
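
The real `filter_interval` lives in `report_fct` and is not shown in this notebook. As a rough stand-in for readers without that module, the sketch below assumes the helper keeps rows whose `SecondsSince1970` falls inside the window, with both bounds inclusive, and that `start`/`end` use the same units as that column. The name `filter_interval_sketch` and the `time_col` parameter are hypothetical.

```python
import pandas as pd

def filter_interval_sketch(df, start, end, time_col="SecondsSince1970"):
    """Hypothetical stand-in for report_fct.filter_interval (assumed behaviour)."""
    # Series.between is inclusive on both ends by default
    mask = df[time_col].between(start, end)
    # Return a copy so later column assignments do not touch the source frame
    return df.loc[mask].copy()

# Tiny synthetic log (values are made up for illustration)
log = pd.DataFrame({
    "SecondsSince1970": [100.0, 101.0, 102.0, 103.0],
    "SOG": [18.2, 17.9, 16.5, 18.0],
})
clip = filter_interval_sketch(log, 101.0, 102.0)
```

Returning a copy keeps the later metadata assignments from modifying, or warning about, a view of the full run log.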

---

## What the code does

1. **Load `summary.json`** and iterate all runs & their maneuver intervals.  
2. **Open the run’s CSV** and **slice** rows to `[start_time, end_time]`.  
3. **Annotate** each sliced row with:
   - `run`, `rider_name`, `boat_name` (the CSV file name without its extension), `maneuver_index`, `maneuver_type`,
     `interval_duration`, `start_time`, `end_time`.
4. **Recompute rigging-line features** (when `Line_L/Line_R/Line_C` exist):
   - Sort the three signed line values within each row in ascending order → `Line_R2` (min), `Line_L2` (mid), `Line_C2` (max). The new labels follow rank, not the physical line.
   - `side_line2 = Line_L2 + Line_R2`
   - `total_line2 = side_line2 + Line_C2`
5. **Concatenate all slices**, sort by `SecondsSince1970`, and **export**.
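
Step 4 can be illustrated on a toy frame. This is a minimal sketch with made-up values; the function name `add_sorted_line_features` is hypothetical, and the column guard mirrors the "when the columns exist" condition stated in the list.

```python
import numpy as np
import pandas as pd

LINE_COLS = ["Line_C", "Line_L", "Line_R"]

def add_sorted_line_features(df):
    """Return a copy of df with rank-based line columns (hypothetical helper)."""
    out = df.copy()
    if not set(LINE_COLS).issubset(out.columns):
        return out  # no line data: leave the frame untouched
    # Sort the three values inside each row, ascending
    ranked = np.sort(out[LINE_COLS].to_numpy(), axis=1)
    out["Line_R2"] = ranked[:, 0]  # smallest
    out["Line_L2"] = ranked[:, 1]  # middle
    out["Line_C2"] = ranked[:, 2]  # largest
    out["side_line2"] = out["Line_L2"] + out["Line_R2"]
    out["total_line2"] = out["side_line2"] + out["Line_C2"]
    return out

# Made-up values, two rows
toy = pd.DataFrame({"Line_C": [5.0, 1.0], "Line_L": [2.0, 4.0], "Line_R": [3.0, 0.5]})
features = add_sorted_line_features(toy)
```

Because sorting only reorders the three values, `total_line2` equals the plain row sum of `Line_C`, `Line_L`, and `Line_R`; only `side_line2` depends on the ranking.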

---

## Output

- **`all_data.csv`** — long, tidy table of maneuver rows across all runs containing telemetry
(`SecondsSince1970`, `Lat`, `Lon`, `SOG`, `VMG`, `COG`, `TWA`, `TWD`, `TWS`, `Heel_Lwd`, …),
maneuver metadata, and optional derived line metrics (`Line_*2`, `side_line2`, `total_line2`).
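
Because every row carries its maneuver metadata, the long table can be collapsed to one row per maneuver with a `groupby`. The sketch below uses a small synthetic frame in place of `all_data.csv`; the numeric values and the `maneuver_type` labels (`"tack"`, `"gybe"`) are made up for illustration, and the summary columns are hypothetical. `rider_name` is part of the key on the assumption that a run name alone may not identify a log uniquely.

```python
import pandas as pd

# Synthetic stand-in for a few rows of all_data.csv (values are made up)
rows = pd.DataFrame({
    "SecondsSince1970": [10.0, 11.0, 12.0, 50.0, 51.0],
    "SOG": [18.0, 14.0, 17.0, 19.0, 15.0],
    "run": ["08_06_Run1"] * 5,
    "rider_name": ["Gian"] * 5,
    "maneuver_index": [1, 1, 1, 2, 2],
    "maneuver_type": ["tack", "tack", "tack", "gybe", "gybe"],
})

keys = ["run", "rider_name", "maneuver_index", "maneuver_type"]
per_maneuver = rows.groupby(keys, as_index=False).agg(
    n_samples=("SOG", "count"),  # non-null SOG samples in the slice
    min_sog=("SOG", "min"),      # lowest speed during the maneuver
    mean_sog=("SOG", "mean"),
)
```

With the real file, `rows` would come from `pd.read_csv("all_data.csv")` instead of the inline frame.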


In [1]:
import os
import json
import pandas as pd
import numpy as np
from report_fct import filter_interval

def build_csv_from_summary(summary_path, data_root, output_csv="all_data.csv"):
    # Load the summary JSON
    with open(summary_path, "r") as f:
        summary = json.load(f)

    all_rows = []

    # Iterate over each run entry in the summary
    for run_entry in summary:
        run_name = run_entry["run"]
        date = run_entry["date"]  # e.g., "08_06"
        person = run_entry["person"]  # e.g., "Gian"
        intervals = run_entry["intervals"]

        # Ensure intervals is a list
        if not isinstance(intervals, list):
            print(f"⚠️ Invalid interval data for {run_name}: expected a list, found {type(intervals)}")
            continue

        # Build the run folder path using date, person, and run name
        run_path = os.path.join(data_root, date, person, run_name)
        # Check if the run folder exists
        if not os.path.isdir(run_path):
            print(f"⚠️ Run folder not found for {run_name} at {run_path}")
            continue

        # Search for the CSV file in the run folder
        csv_files = [f for f in os.listdir(run_path) if f.endswith(".csv")]
        if len(csv_files) != 1:
            print(f"⚠️ Skipping {run_name}: expected 1 CSV, found {len(csv_files)}")
            continue

        # Build the full path to the CSV file
        csv_path = os.path.join(run_path, csv_files[0])

        # Read the CSV file into a DataFrame
        try:
            df = pd.read_csv(csv_path)
        except Exception as e:
            print(f"⚠️ Error reading CSV file {csv_path}: {e}")
            continue

        # Process each interval in the run
        try:
            print(f"✔ Processing run: {run_name}, total intervals: {len(intervals)}")

            for i, interval in enumerate(intervals):
                start, end = interval["start_time"], interval["end_time"]
                # Pass the DataFrame, not the path; copy so the column assignments below act on an independent frame
                df_filtered = filter_interval(df, start, end).copy()
                # Add necessary metadata to each row
                df_filtered["run"] = run_name
                df_filtered["rider_name"] = person
                df_filtered["boat_name"] = csv_files[0].replace(".csv", "")
                df_filtered["maneuver_index"] = interval["maneuver_index"]
                df_filtered["maneuver_type"] = interval["maneuver_type"]
                df_filtered["interval_duration"] = interval["duration"]
                df_filtered["start_time"] = interval["start_time"]
                df_filtered["end_time"] = interval["end_time"]
                
                # Arbitrary relabelling of the lines by sorting their values row-wise;
                # skipped when the run log has no line columns
                line_cols = ["Line_C", "Line_L", "Line_R"]
                if all(col in df_filtered.columns for col in line_cols):
                    lines = df_filtered[line_cols].values
                    sorted_lines = np.sort(lines, axis=1)  # Ascending within each row

                    # Creating new columns with sorted line values
                    df_filtered["Line_R2"] = sorted_lines[:, 0]  # Smallest
                    df_filtered["Line_L2"] = sorted_lines[:, 1]  # Middle
                    df_filtered["Line_C2"] = sorted_lines[:, 2]  # Largest
                    df_filtered["side_line2"] = df_filtered["Line_L2"] + df_filtered["Line_R2"]
                    df_filtered["total_line2"] = df_filtered["side_line2"] + df_filtered["Line_C2"]
                
                all_rows.append(df_filtered)

        except Exception as e:
            print(f"❌ Error processing run {run_name}, interval {i + 1}: {type(e).__name__} - {e}")
            continue

    # Final save
    if not all_rows:
        print("❌ No valid data found.")
        return

    # Combine all rows into a single DataFrame and save to CSV
    df_global = pd.concat(all_rows, ignore_index=True)
    df_global = df_global.sort_values(by='SecondsSince1970', ascending=True)
    df_global.to_csv(output_csv, index=False)
    print(f"✅ Global CSV saved to: {output_csv}")
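
`build_csv_from_summary` reads `run_entry["run"]`, `["date"]`, `["person"]`, and `["intervals"]` outside any `try` block, so an entry missing one of them stops the whole build with a `KeyError`. A pre-check along the following lines could flag such entries first. This is only a sketch: `missing_keys` is a hypothetical helper, and the sample entries use made-up values.

```python
REQUIRED_RUN_KEYS = ("run", "date", "person", "intervals")
REQUIRED_INTERVAL_KEYS = (
    "maneuver_index", "maneuver_type", "start_time", "end_time", "duration",
)

def missing_keys(run_entry):
    """List the keys read by build_csv_from_summary that are absent (hypothetical helper)."""
    problems = [key for key in REQUIRED_RUN_KEYS if key not in run_entry]
    intervals = run_entry.get("intervals")
    if isinstance(intervals, list):
        for pos, interval in enumerate(intervals):
            problems += [
                f"intervals[{pos}].{key}"
                for key in REQUIRED_INTERVAL_KEYS
                if key not in interval
            ]
    return problems

# Made-up entries: one complete, one incomplete
good = {
    "run": "08_06_Run1", "date": "08_06", "person": "Gian",
    "intervals": [{"maneuver_index": 1, "maneuver_type": "tack",
                   "start_time": 100.0, "end_time": 112.0, "duration": 12.0}],
}
bad = {"run": "08_06_Run1", "intervals": [{"maneuver_index": 1}]}

good_report = missing_keys(good)
bad_report = missing_keys(bad)
```

Running the check over every entry of the loaded summary before calling the builder would turn a mid-run crash into a readable list of missing fields.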

In [2]:
build_csv_from_summary(
    summary_path="summary.json",
    data_root="../Data_Sailnjord/Maneuvers",
    output_csv="all_data.csv"
)

✔ Processing run: 08_06_Run1, total intervals: 12
✔ Processing run: 08_06_Run2, total intervals: 11
✔ Processing run: 08_06_Run3, total intervals: 11
✔ Processing run: 08_06_Run4, total intervals: 10
✔ Processing run: 08_06_Run5, total intervals: 11


✔ Processing run: 08_06_Run1, total intervals: 13
✔ Processing run: 08_06_Run2, total intervals: 11
✔ Processing run: 08_06_Run3, total intervals: 12
✔ Processing run: 08_06_Run4, total intervals: 11
✔ Processing run: 08_06_Run5, total intervals: 11


✔ Processing run: 08_06_Run6, total intervals: 11
✔ Processing run: 11_06_Run1, total intervals: 12
✔ Processing run: 11_06_Run2, total intervals: 11
✔ Processing run: 11_06_Run3, total intervals: 12
✔ Processing run: 11_06_Run4, total intervals: 11


✔ Processing run: 11_06_Run5, total intervals: 11
✔ Processing run: 11_06_Run1, total intervals: 11
✔ Processing run: 11_06_Run2, total intervals: 12
✔ Processing run: 11_06_Run3, total intervals: 12
✔ Processing run: 11_06_Run4, total intervals: 10


✔ Processing run: 11_06_Run5, total intervals: 12
✔ Processing run: 11_06_Run6, total intervals: 12


✅ Global CSV saved to: all_data.csv
