# AddInfoToSummary – Enrichment of Straight Line Summary with Interview Data

This notebook takes the initial straight line summary (`summary.json`, produced by `MainCOG.ipynb`) and **enriches it with rider and equipment information** extracted from interview files.

### Workflow
1. **Load interview data**  
   - Scans each session’s `Interview and equipment` folder.  
   - Reads `.xlsx` files containing rider and equipment information.  
   - Standardizes rider names (e.g., `Gian` → `Gian Stragiotti`, `Karl` → `Karl Maeder`).  

2. **Match interview data with runs**  
   - For each run and each leg (upwind or downwind) listed in `summary.json`, the notebook matches corresponding interview entries based on:  
     - Rider name  
     - Run number  
     - Leg index  
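     - Optionally, the closest timestamp to the interval start time. This is applied only when a start time is passed to `get_boat_info`; the main loop below does not pass one, so the first matching row is used.  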

3. **Extract and map equipment information**  
   - **Total weight** of the rider and gear.  
   - **Master leeward** configuration.  
   - **Mast brand** (mapped from codes: `0 → Levi`, `1 → Chub`).  

4. **Update summary**  
   - Each interval in the summary is enriched with additional fields for both boats (`X` is `1` or `2`). All three fields are `null` when no interview entry matches:  
     - `boatX_total_weight`  
     - `boatX_master_leeward`  
     - `boatX_mast_brand`  

5. **Output**  
   - Produces a new file: `summary_enriched.json`.  
   - This enriched summary is used as input for subsequent data merging (`merge_all.ipynb`) and analyses.

This notebook is the **second step of the pipeline**, connecting raw interval detection with contextual rider/equipment data from interviews.
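
The matching rule from steps 2 and 3 can be sketched without pandas. The example below is a minimal illustration, not the notebook's implementation: rows are plain dictionaries with hypothetical keys (`name`, `run`, `leg`, `timestamp`, `mast_code`), the helper `pick_entry` is invented for this sketch, and all sample values are made up.

```python
from datetime import datetime

BRAND_MAP = {0: "Levi", 1: "Chub"}  # code-to-brand mapping from step 3

# Made-up interview rows; the real data comes from the .xlsx files.
rows = [
    {"name": "Gian Stragiotti", "run": 1, "leg": 1,
     "timestamp": datetime(2025, 6, 12, 14, 0), "mast_code": 0},
    {"name": "Gian Stragiotti", "run": 1, "leg": 1,
     "timestamp": datetime(2025, 6, 12, 14, 30), "mast_code": 1},
    {"name": "Karl Maeder", "run": 1, "leg": 2,
     "timestamp": datetime(2025, 6, 12, 14, 40), "mast_code": 1},
]

def pick_entry(rows, name, run, leg, start_time=None):
    """Return the row matching rider, run and leg, or None if there is none.

    When several rows match and start_time is given, the row whose
    timestamp is closest to start_time wins; otherwise the first match does.
    """
    matches = [r for r in rows
               if name.lower() in r["name"].lower()
               and r["run"] == run and r["leg"] == leg]
    if not matches:
        return None
    if start_time is not None:
        return min(matches, key=lambda r: abs(r["timestamp"] - start_time))
    return matches[0]

entry = pick_entry(rows, "Gian", run=1, leg=1,
                   start_time=datetime(2025, 6, 12, 14, 25))
brand = BRAND_MAP[entry["mast_code"]]
print(brand)  # Chub (the 14:30 row is closest to 14:25)
```

Without a `start_time`, the first matching row is returned, which mirrors what happens when no interval start time is supplied.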


In [1]:
import os
import json
import pandas as pd

In [2]:
# Path configuration
base_dir = "../Data_Sailnjord/Straight_lines"
summary_file = "summary.json"
output_file = "summary_enriched.json"


In [3]:
def load_interview_data(interview_dir):
    name_map = {
        "Gian": "Gian Stragiotti",
        "Karl": "Karl Maeder",
        "SenseBoard": "SenseBoard"
    }

    dfs = []
    for file in os.listdir(interview_dir):
        if file.endswith(".xlsx"):
            key = file.replace("Interview ", "").replace(".xlsx", "").split()[0]
            name = name_map.get(key, key)
            df = pd.read_excel(os.path.join(interview_dir, file))
            df["Name"] = name
            dfs.append(df)
    return pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()

def get_boat_info(df, boat_name, run_idx, leg_idx, interval_start_time=None):
    candidates = df[
        (df["Name"].str.contains(boat_name, case=False, regex=False, na=False)) &
        (df["Run"] == run_idx + 1) &
        (df["Leg U=1, D=2"] == leg_idx + 1)
    ].copy()

    if candidates.empty:
        return {
            "total_weight": None,
            "master_leeward": None,
            "mast_brand": None
        }

    if interval_start_time is not None and "Timestamp" in candidates.columns:
        candidates["abs_diff"] = (candidates["Timestamp"] - interval_start_time).abs()
        candidates = candidates.sort_values("abs_diff")

    row = candidates.iloc[0]

    # Map the 0/1 code to "Levi"/"Chub"
    brand_map = {0: "Levi", 1: "Chub"}
    raw_brand = row.get("Mast brand (0=Levi,1=Chub)", None)
    mast_brand = brand_map.get(int(raw_brand)) if pd.notnull(raw_brand) else None

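    # A missing cell is read as NaN and bool(NaN) is True, so check for it explicitly.
    raw_leeward = row.get("Master leeward (1)", None)

    return {
        "total_weight": row.get("Total weight", None),
        "master_leeward": bool(raw_leeward) if pd.notnull(raw_leeward) else False,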
        "mast_brand": mast_brand
    }
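
One pitfall when converting spreadsheet cells to flags: pandas reads an empty cell as NaN, and `bool(float("nan"))` evaluates to `True` in Python. Below is a minimal sketch of a NaN-safe conversion using only the standard library. The helper name `to_flag` is hypothetical and not part of this notebook.

```python
import math

def to_flag(value):
    """Convert a spreadsheet cell to a boolean, treating None and NaN as False."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return bool(value)

print(bool(float("nan")))     # True: NaN is truthy
print(to_flag(float("nan")))  # False
print(to_flag(1))             # True
```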


In [4]:
with open(summary_file, "r") as f:
    summary_data = json.load(f)

for date_folder in sorted(os.listdir(base_dir)):
    date_path = os.path.join(base_dir, date_folder)
    if not os.path.isdir(date_path) or "Interview" in date_folder:
        continue

    interview_dir = os.path.join(date_path, "Interview and equipment")
    if not os.path.exists(interview_dir):
        print(f"📂 Missing interview folder for {date_folder}")
        continue

    interview_df = load_interview_data(interview_dir)

    for run in summary_data:
        if run["run"].startswith(date_folder):
            try:
                run_number = int(run["run"].split("_Run")[1]) - 1
            except (IndexError, ValueError):
                # Skip run IDs that do not end in "_Run<number>"
                continue
            for leg_idx, interval in enumerate(run["intervals"]):
                for b in [1, 2]:
                    boat = interval.get(f"boat{b}_name", "")
                    info = get_boat_info(interview_df, boat, run_number, leg_idx)
                    interval[f"boat{b}_total_weight"] = info["total_weight"]
                    interval[f"boat{b}_master_leeward"] = info["master_leeward"]
                    interval[f"boat{b}_mast_brand"] = info["mast_brand"]


with open(output_file, "w") as f:
    json.dump(summary_data, f, indent=2)

print(f"✅ Enriched summary saved to {output_file}")

✅ Enriched summary saved to summary_enriched.json
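
A quick way to check how many intervals were actually matched is to count the enriched fields that are still `None`. The sketch below runs on a small in-memory list that mimics the structure used above (a list of runs, each with an `intervals` list). The sample values and the helper name `count_missing` are assumptions for illustration; in practice the list would be loaded from `summary_enriched.json` with `json.load`.

```python
def count_missing(summary, field="total_weight"):
    """Return (missing, total) boat entries for one enriched field."""
    missing = total = 0
    for run in summary:
        for interval in run["intervals"]:
            for b in (1, 2):
                total += 1
                if interval.get(f"boat{b}_{field}") is None:
                    missing += 1
    return missing, total

# Made-up sample in the same shape as the enriched summary.
sample = [
    {"run": "example_Run1", "intervals": [
        {"boat1_total_weight": 95.0, "boat2_total_weight": None},
        {"boat1_total_weight": 95.0, "boat2_total_weight": 102.5},
    ]},
]

missing, total = count_missing(sample)
print(f"{missing} of {total} boat entries have no total weight")
```

A high share of missing entries would suggest that rider names, run numbers, or leg indices in the interview files do not line up with the summary.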
