# Score Calculation
## Body Angle Score
### Questions to Answer
a. Between-subject differences: whether participants differ in their average body angle score.

b. Within-subject fatigue effects: whether the average body angle score declines in longer sessions.

c. Speed effects: whether a higher stroke rate (strokes per minute, SPM) lowers the average body angle score.

### Mathematical Definition
#### Body angle

Let:

- $b_c$: Body angle at the **catch**, i.e. the frame within a stroke where the knee angle is minimal.  
  - **Gold standard:** $45^\circ \leq b_c \leq 65^\circ$

- $b_f$: Body angle at the **finish**, i.e. the frame within a stroke where the elbow angle is minimal.  
  - **Gold standard:** $100^\circ \leq b_f \leq 120^\circ$
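A minimal sketch of how $b_c$ and $b_f$ can be read off for each stroke, assuming a frame-level table with the column names used in the demo code further down (`stroke_id`, `angle_right_knee`, `angle_right_elbow`, `angle_right_body`). The helper name `extract_phase_angles` and the toy values are hypothetical, for illustration only.

```python
import pandas as pd

def extract_phase_angles(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: body angle at the catch (min knee angle) and finish (min elbow angle) per stroke."""
    rows = []
    for sid, seg in df.groupby("stroke_id"):
        i_catch = seg["angle_right_knee"].idxmin()    # index label of the catch frame
        i_finish = seg["angle_right_elbow"].idxmin()  # index label of the finish frame
        rows.append({
            "stroke_id": sid,
            "b_c": seg.at[i_catch, "angle_right_body"],
            "b_f": seg.at[i_finish, "angle_right_body"],
        })
    return pd.DataFrame(rows)

# Made-up frames for two strokes (three frames each)
toy = pd.DataFrame({
    "stroke_id":         [1, 1, 1, 2, 2, 2],
    "angle_right_knee":  [60, 45, 150, 58, 47, 160],
    "angle_right_elbow": [170, 160, 80, 165, 150, 85],
    "angle_right_body":  [50, 58, 108, 52, 60, 112],
})
result = extract_phase_angles(toy)
print(result)
```

Using index labels (rather than positions) keeps the catch and finish frames traceable back to the original frame table, which is also how the demo code reports `catch_frame` and `finish_frame`.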

---

#### Score definitions

##### **Catch position score $s_c$**
Score of body angle in catch position, ranging from 0 to 10.

\begin{equation}
s_c =
\begin{cases}
10, & \text{if } |b_c - 55| \le 10 \\
\max\left(0,\; 14 - 0.4\,|b_c - 55|\right), & \text{if } |b_c - 55| > 10
\end{cases}
\end{equation}

- Every 5° of deviation beyond the gold-standard range → minus 2 points (0.4 points per degree), down to a minimum of 0.
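A minimal sketch of the catch score as a reusable function. `band_score`, `half_width` and `slope` are hypothetical names introduced here; the defaults are chosen to reproduce the formula above (10 points inside 55° ± 10°, then 0.4 points per degree, clamped to [0, 10] as the demo code does). For stroke 1 of the demo output ($b_c = 68.2^\circ$) it gives $14 - 0.4 \times 13.2 = 8.72$.

```python
import numpy as np

def band_score(angle, center, half_width=10.0, slope=0.4):
    """Hypothetical helper: 10 inside center ± half_width, then minus `slope` per degree, clamped to [0, 10]."""
    dev = np.abs(np.asarray(angle, dtype=float) - center)
    # 10 - slope * (dev - half_width) equals 14 - 0.4 * dev for the default parameters
    raw = np.where(dev <= half_width, 10.0, 10.0 - slope * (dev - half_width))
    return np.clip(raw, 0.0, 10.0)

print(float(band_score(68.2, center=55)))        # stroke 1 catch angle from the demo output
print(band_score([50.0, 70.0, 95.0], center=55))  # inside range, 5° outside, far outside
```

The same function covers the finish score with `center=110`.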

---

##### **Finish position score $s_f$**
Score of body angle in finish position, ranging from 0 to 10.

\begin{equation}
s_f =
\begin{cases}
10, & \text{if } |b_f - 110| \le 10 \\
\max\left(0,\; 14 - 0.4\,|b_f - 110|\right), & \text{if } |b_f - 110| > 10
\end{cases}
\end{equation}

- Every 5° of deviation beyond the gold-standard range → minus 2 points (0.4 points per degree), down to a minimum of 0.
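Both scores decline linearly at 0.4 points per degree once the deviation $d$ from the range center exceeds 10°, so the floor at 0 is reached at a fixed deviation:

$$
14 - 0.4\,d = 0 \;\Longrightarrow\; d = 35^\circ
$$

Hence $s_c = 0$ whenever $b_c \le 20^\circ$ or $b_c \ge 90^\circ$, and $s_f = 0$ whenever $b_f \le 75^\circ$ or $b_f \ge 145^\circ$. Beyond these angles the unclamped linear branch would turn negative, which is why the scoring functions in the code clamp the result to [0, 10].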

---

#### Total body angle score $s_B$

Total score of body angle for each stroke, ranging from 0 to 10.

\begin{equation}
s_B = 0.5 s_c + 0.5 s_f
\end{equation}
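Because $14 - 0.4\,d \ge 10$ whenever $d \le 10$, each piecewise score is equivalent to clipping $14 - 0.4\,d$ to [0, 10]. A vectorized pandas sketch of $s_c$, $s_f$ and $s_B$ under that observation (the name `angle_score` is hypothetical). The first two rows reuse angles from the demo output; the third row is made up to show that a missing angle yields `NaN` for $s_B$, matching the explicit NaN check in the demo loop.

```python
import numpy as np
import pandas as pd

def angle_score(angle: pd.Series, center: float) -> pd.Series:
    """Hypothetical helper: clip(14 - 0.4 * |angle - center|, 0, 10), equal to the piecewise score."""
    dev = (angle - center).abs()
    return (14.0 - 0.4 * dev).clip(lower=0.0, upper=10.0)  # NaN angles stay NaN

strokes = pd.DataFrame({
    "b_catch_deg":  [68.2, 71.31, 60.0],
    "b_finish_deg": [103.01, 103.19, np.nan],
})
strokes["s_c"] = angle_score(strokes["b_catch_deg"], 55.0)
strokes["s_f"] = angle_score(strokes["b_finish_deg"], 110.0)
strokes["s_B"] = 0.5 * strokes["s_c"] + 0.5 * strokes["s_f"]  # NaN if either part is NaN
print(strokes.round(2))
```

Working on whole columns avoids the per-stroke Python loop once $b_c$ and $b_f$ have been extracted.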

### Demo (Body Score)

In [113]:
import pandas as pd
import numpy as np
from pathlib import Path

# ========= USER CONFIG =========
CSV_PATH = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/angle_relx_cleaned_id/1_low/01_1_low_pose2d_angles_relx_cleaned.csv")
OUT_CSV  = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/body_angle_scores_per_stroke.csv")

# ========= 1) Load and sort data =========
df = pd.read_csv(CSV_PATH)
sort_cols = [c for c in ["video", "frame", "time_ms"] if c in df.columns]
df = df.sort_values(sort_cols if sort_cols else df.columns.tolist()).reset_index(drop=True)

# ========= 2) Check required columns =========
required = ["stroke_id", "angle_right_knee", "angle_right_elbow", "angle_right_body"]
missing = [c for c in required if c not in df.columns]
if missing:
    raise ValueError(f"❌ Missing required columns: {missing}")

# ========= 3) Scoring functions =========
def score_catch(b_c):
    """
    Catch phase score:
    Ideal body angle = 55° ± 10° → 10 points.
    Outside this range: 14 - 0.4 * |b_c - 55|, clamped to [0, 10].
    """
    if pd.isna(b_c):
        return np.nan
    return 10.0 if abs(b_c - 55) <= 10 else max(0.0, min(10.0, 14.0 - 0.4 * abs(b_c - 55)))

def score_finish(b_f):
    """
    Finish phase score:
    Ideal body angle = 110° ± 10° → 10 points.
    Outside this range: 14 - 0.4 * |b_f - 110|, clamped to [0, 10].
    """
    if pd.isna(b_f):
        return np.nan
    return 10.0 if abs(b_f - 110) <= 10 else max(0.0, min(10.0, 14.0 - 0.4 * abs(b_f - 110)))

# ========= 4) Compute scores per stroke =========
rows = []
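for sid, seg in df.groupby("stroke_id"):
    # Skip strokes where the catch/finish frame cannot be located (all-NaN angles);
    # idxmin() on an all-NaN column would otherwise fail or return NaN
    if seg.empty or seg[["angle_right_knee", "angle_right_elbow"]].isna().all().any():
        continue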

    # Catch = frame where knee angle is minimal within the stroke
    i_catch = seg["angle_right_knee"].idxmin()
    b_c = df.loc[i_catch, "angle_right_body"]
    t_c = df.loc[i_catch, "time_ms"] if "time_ms" in df.columns else np.nan

    # Finish = frame where elbow angle is minimal within the stroke
    i_finish = seg["angle_right_elbow"].idxmin()
    b_f = df.loc[i_finish, "angle_right_body"]
    t_f = df.loc[i_finish, "time_ms"] if "time_ms" in df.columns else np.nan

    # Compute scores
    s_c = score_catch(b_c)
    s_f = score_finish(b_f)
    s_B = np.nan if (pd.isna(s_c) or pd.isna(s_f)) else 0.5 * (s_c + s_f)

    rows.append({
        "stroke_id": sid,
        "start_frame": int(seg.index.min()),
        "end_frame": int(seg.index.max()),
        "catch_frame": int(i_catch),
        "finish_frame": int(i_finish),
        "catch_time_ms": t_c,
        "finish_time_ms": t_f,
        "b_catch_deg": float(b_c),
        "b_finish_deg": float(b_f),
        "score_catch": float(s_c),
        "score_finish": float(s_f),
        "score_body_total": float(s_B),
    })

scores_df = pd.DataFrame(rows)

# ========= 5) Export results =========
scores_df.to_csv(OUT_CSV, index=False)
print(f"✅ Strokes detected: {len(scores_df)}")
print(f"💾 Saved: {OUT_CSV.resolve()}")
display(scores_df)

✅ Strokes detected: 22
💾 Saved: /Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/body_angle_scores_per_stroke.csv


| | stroke_id | start_frame | end_frame | catch_frame | finish_frame | catch_time_ms | finish_time_ms | b_catch_deg | b_finish_deg | score_catch | score_finish | score_body_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 319 | 173 | 307 | 2827 | 3945 | 68.2 | 103.01 | 8.72 | 10.0 | 9.36 |
| 1 | 2 | 320 | 637 | 513 | 637 | 5663 | 6698 | 71.31 | 103.19 | 7.48 | 10.0 | 8.74 |
| 2 | 3 | 638 | 989 | 842 | 638 | 8408 | 6706 | 69.23 | 103.14 | 8.31 | 10.0 | 9.15 |
| 3 | 4 | 990 | 1336 | 1188 | 1328 | 11294 | 12462 | 70.36 | 103.61 | 7.86 | 10.0 | 8.93 |
| 4 | 5 | 1337 | 1688 | 1543 | 1672 | 14255 | 15331 | 70.53 | 102.91 | 7.79 | 10.0 | 8.89 |
| 5 | 6 | 1689 | 2029 | 1874 | 2011 | 17017 | 18159 | 71.04 | 101.9 | 7.58 | 10.0 | 8.79 |
| 6 | 7 | 2030 | 2365 | 2214 | 2353 | 19853 | 21012 | 70.56 | 101.76 | 7.78 | 10.0 | 8.89 |
| 7 | 8 | 2366 | 2696 | 2555 | 2691 | 22697 | 23832 | 69.84 | 102.05 | 8.06 | 10.0 | 9.03 |
| 8 | 9 | 2697 | 3036 | 2902 | 3030 | 25592 | 26659 | 70.45 | 102.14 | 7.82 | 10.0 | 8.91 |
| 9 | 10 | 3037 | 3374 | 3241 | 3368 | 28420 | 29479 | 70.18 | 101.71 | 7.93 | 10.0 | 8.96 |


### Batch Run: Body Angle Score

In [114]:
# ==============================================================
# Batch processing for per-stroke body-angle scoring
# Computes mean, SD, min, max statistics per participant
# ==============================================================

import pandas as pd
import numpy as np
from pathlib import Path

# ---------- USER CONFIG ----------
INPUT_DIR = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/angle_relx_cleaned_id/1_fast/")  # folder containing *_cleaned.csv
OUTPUT_DIR = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/body_scores/1_fast/")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

SUMMARY_FILE = OUTPUT_DIR / "batch_body_score_summary.csv"

# ---------- Scoring functions ----------
def score_catch(b_c):
    """Catch phase: ideal body angle = 55° ± 10°."""
    if pd.isna(b_c):
        return np.nan
    return 10.0 if abs(b_c - 55) <= 10 else max(0.0, min(10.0, 14.0 - 0.4 * abs(b_c - 55)))

def score_finish(b_f):
    """Finish phase: ideal body angle = 110° ± 10°."""
    if pd.isna(b_f):
        return np.nan
    return 10.0 if abs(b_f - 110) <= 10 else max(0.0, min(10.0, 14.0 - 0.4 * abs(b_f - 110)))

# ---------- Function for one file ----------
def process_body_scores(file_path: Path):
    print(f"\nüìÇ Processing {file_path.name}")
    df = pd.read_csv(file_path)

    # Basic check
    required = ["stroke_id", "angle_right_knee", "angle_right_elbow", "angle_right_body"]
    missing = [c for c in required if c not in df.columns]
    if missing:
        print(f"⚠️ Skipped - missing columns: {missing}")
        return None

    # Sort by time/frame
    sort_cols = [c for c in ["video", "frame", "time_ms"] if c in df.columns]
    df = df.sort_values(sort_cols if sort_cols else df.columns.tolist()).reset_index(drop=True)

    rows = []
    stroke_ids = sorted(df["stroke_id"].unique())

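    # Exclude the last (incomplete) stroke
    if len(stroke_ids) > 1:
        print(f"⚠️ Excluding last stroke (ID {stroke_ids[-1]}) - assumed incomplete.")
        stroke_ids = stroke_ids[:-1]

    for sid in stroke_ids:
        seg = df[df["stroke_id"] == sid]
        # Skip strokes where the catch/finish frame cannot be located (all-NaN angles)
        if seg.empty or seg[["angle_right_knee", "angle_right_elbow"]].isna().all().any():
            continue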

        # Catch = min knee angle
        i_catch = seg["angle_right_knee"].idxmin()
        b_c = df.loc[i_catch, "angle_right_body"]

        # Finish = min elbow angle
        i_finish = seg["angle_right_elbow"].idxmin()
        b_f = df.loc[i_finish, "angle_right_body"]

        s_c = score_catch(b_c)
        s_f = score_finish(b_f)
        s_B = np.nan if (pd.isna(s_c) or pd.isna(s_f)) else 0.5 * (s_c + s_f)

        rows.append({
            "stroke_id": sid,
            "b_catch_deg": float(b_c),
            "b_finish_deg": float(b_f),
            "score_catch": float(s_c),
            "score_finish": float(s_f),
            "score_body_total": float(s_B),
        })

    scores_df = pd.DataFrame(rows)
    if scores_df.empty:
        print("⚠️ No valid strokes found.")
        return None

    # ---------- Save per-file results ----------
    # Create cleaner output name: e.g. "01_1_low_body_score.csv"
    base_name = file_path.stem
    base_name = base_name.replace("_pose2d_angles_relx_cleaned", "").replace("_cleaned", "")
    out_file = OUTPUT_DIR / f"{base_name}_body_score.csv"

    scores_df.to_csv(out_file, index=False)
    print(f"✅ Saved body-angle scores → {out_file.name} ({len(scores_df)} strokes)")

    # ---------- Compute summary statistics ----------
    stats = {
        "file": file_path.name,
        "n_strokes": len(scores_df),
        "catch_mean": round(scores_df["score_catch"].mean(), 2),
        "catch_sd": round(scores_df["score_catch"].std(ddof=1), 2),
        "finish_mean": round(scores_df["score_finish"].mean(), 2),
        "finish_sd": round(scores_df["score_finish"].std(ddof=1), 2),
        "body_total_mean": round(scores_df["score_body_total"].mean(), 2),
        "body_total_sd": round(scores_df["score_body_total"].std(ddof=1), 2),
        "body_total_min": round(scores_df["score_body_total"].min(), 2),
        "body_total_max": round(scores_df["score_body_total"].max(), 2),
    }

    return stats

# ---------- Batch run ----------
summary = []
for f in sorted(INPUT_DIR.glob("*_cleaned.csv")):
    try:
        info = process_body_scores(f)
        if info:
            summary.append(info)
    except Exception as e:
        print(f"❌ Error processing {f.name}: {e}")

# ---------- Save summary ----------
if summary:
    summary_df = pd.DataFrame(summary)
    summary_df.to_csv(SUMMARY_FILE, index=False)
    print(f"\n✅ Batch finished. Summary saved to {SUMMARY_FILE}")
    display(summary_df)
else:
    print("⚠️ No valid files processed.")


📂 Processing 01_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 31) - assumed incomplete.
✅ Saved body-angle scores → 01_1_fast_body_score.csv (30 strokes)

📂 Processing 02_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 23) - assumed incomplete.
✅ Saved body-angle scores → 02_1_fast_body_score.csv (22 strokes)

📂 Processing 03_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 35) - assumed incomplete.
✅ Saved body-angle scores → 03_1_fast_body_score.csv (34 strokes)

📂 Processing 04_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 32) - assumed incomplete.
✅ Saved body-angle scores → 04_1_fast_body_score.csv (31 strokes)

📂 Processing 05_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 31) - assumed incomplete.
✅ Saved body-angle scores → 05_1_fast_body_score.csv (30 strokes)

📂 Processing 06_1_fast_pose2d_angles_relx_cleaned.c

| | file | n_strokes | catch_mean | catch_sd | finish_mean | finish_sd | body_total_mean | body_total_sd | body_total_min | body_total_max |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01_1_fast_pose2d_angles_relx_cleaned.csv | 30 | 7.84 | 0.43 | 10.0 | 0.0 | 8.92 | 0.21 | 8.49 | 9.45 |
| 1 | 02_1_fast_pose2d_angles_relx_cleaned.csv | 22 | 4.76 | 0.49 | 9.04 | 0.53 | 6.9 | 0.34 | 6.28 | 7.46 |
| 2 | 03_1_fast_pose2d_angles_relx_cleaned.csv | 34 | 4.35 | 0.59 | 9.89 | 0.27 | 7.12 | 0.29 | 6.59 | 7.69 |
| 3 | 04_1_fast_pose2d_angles_relx_cleaned.csv | 31 | 5.77 | 0.55 | 9.91 | 0.31 | 7.84 | 0.29 | 7.31 | 8.32 |
| 4 | 05_1_fast_pose2d_angles_relx_cleaned.csv | 30 | 7.53 | 0.53 | 10.0 | 0.0 | 8.77 | 0.26 | 8.26 | 9.17 |
| 5 | 06_1_fast_pose2d_angles_relx_cleaned.csv | 30 | 2.33 | 1.22 | 10.0 | 0.0 | 6.17 | 0.61 | 5.0 | 7.42 |


## Sequence Score
**Sequence compliance (binary metric)**: For each stroke, we compared the time series of the **relative x-position** between the hands and the knees with the knee angle time series. A stroke was labeled **“correct”** if the **relative position became positive (handle clearly past the knees) before the knee angle began to decrease (knee flexion onset)**; otherwise it was labeled “incorrect.” The percentage of correct strokes per session served as the handle–knee compliance rate.
### Questions to Answer
a. Between-subject variation: whether some rowers consistently maintain correct sequencing.

b. Within-subject fatigue effects: whether sequencing correctness declines in longer sessions.

c. Speed effects: whether a higher SPM increases the rate of early knee-flexion errors.

All three questions are assessed with the average sequence score (the per-session compliance rate).
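The per-session compliance rate is the mean of the per-stroke 0/1 sequence score. A minimal pandas sketch, assuming a per-stroke table with the `participant_id`, `session_time`, `speed_type` and `sequence_score` columns that the technique-quality step later in this notebook produces; the values below are made up for illustration.

```python
import pandas as pd

# Made-up per-stroke results for two participants in one session
per_stroke = pd.DataFrame({
    "participant_id": ["a01"] * 4 + ["a02"] * 4,
    "session_time":   ["1"] * 8,
    "speed_type":     ["fast"] * 8,
    "sequence_score": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Mean of a 0/1 column = share of correct strokes
compliance = (
    per_stroke.groupby(["participant_id", "session_time", "speed_type"])["sequence_score"]
    .agg(n_strokes="size", compliance_rate="mean")
    .reset_index()
)
compliance["compliance_pct"] = 100 * compliance["compliance_rate"]
print(compliance)
```

Grouping by participant answers question (a); grouping by session or speed type gives the inputs for (b) and (c).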

### Mathematical Definition
#### Sequence

Let  

- $t = 1, 2, \dots, T$: frame index within a stroke  
- $r(t)$: relative x-position of the hand with respect to the knee ($r(t) > 0$ once the hand is in front of the knee)  
- $\theta(t)$: knee angle at frame $t$  

Define  

- **Hands-passing event** as the first time (frame) when the hands move in front of the knees:  
  $$
  t_h = \min \{ t \mid r(t) > 0 \}
  $$

- **Knee-flexion onset** as the first time (frame) when the knee angle starts to decrease:  
  $$
  t_k = \min \{ t \mid \Delta \theta(t) = \theta(t) - \theta(t-1) < 0 \}
  $$

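> In practice, frame-to-frame noise in the raw pose estimates makes this strict definition too sensitive (a tiny jitter already counts as flexion onset). The implementation below therefore adds tolerances: a velocity threshold, a minimum drop below the preceding knee-angle peak, a minimum number of consecutive decreasing frames, and a relaxed hand-passing threshold.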

Then the **sequence score** for each stroke is defined as:  
$$
s_{\text{seq}} =
\begin{cases}
1, & \text{if } t_h < t_k \\
0, & \text{otherwise}
\end{cases}
$$
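A literal, tolerance-free sketch of the definitions of $t_h$, $t_k$ and $s_{\text{seq}}$, with frames indexed from 0 (the helper name `strict_sequence_score` and the sample arrays are hypothetical). The second call shows why tolerance is needed: a 0.2° jitter at the start of the stroke is taken as knee-flexion onset and flips the score to 0.

```python
import numpy as np

def strict_sequence_score(r, theta):
    """Hypothetical helper: literal t_h, t_k and s_seq (frames indexed from 0, no tolerances)."""
    r = np.asarray(r, dtype=float)
    theta = np.asarray(theta, dtype=float)
    hand_idx = np.flatnonzero(r > 0)                   # frames where r(t) > 0
    t_h = hand_idx[0] if hand_idx.size else np.inf
    flex_idx = np.flatnonzero(np.diff(theta) < 0) + 1  # frames t with theta(t) - theta(t-1) < 0
    t_k = flex_idx[0] if flex_idx.size else np.inf
    return int(t_h < t_k), t_h, t_k

r = [-0.30, -0.10, 0.05, 0.20, 0.30]          # hand passes the knee at frame 2
clean = [170.0, 172.0, 173.0, 171.0, 160.0]   # knee starts bending at frame 3
noisy = [170.0, 169.8, 173.0, 171.0, 160.0]   # same, plus a 0.2° jitter at frame 1

print(strict_sequence_score(r, clean))
print(strict_sequence_score(r, noisy))
```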


### Demo Code (Sequence Score)

In [111]:
import pandas as pd
import numpy as np
from pathlib import Path

# ========= USER CONFIG =========
CSV_PATH = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/angle_relx_cleaned_id/1_low/01_1_low_pose2d_angles_relx_cleaned.csv")

# -------- Output settings --------
# Save all outputs to a clean "outputs" folder
OUTPUT_DIR = CSV_PATH.parent.parent / "outputs"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Create a short output file name, e.g. "02_1_fast_seq_score.csv"
file_prefix = CSV_PATH.stem.replace("_pose2d_angles_relx_cleaned", "")
OUT_CSV = OUTPUT_DIR / f"{file_prefix}_seq_score.csv"

# ---- Slow Parameters (Strict, raw signal, 120 fps) ----
VEL_THRESH = -1.2          # deg/frame, knee angle decreasing speed threshold
AMP_DROP_THRESH = 5.0      # deg below local peak to define flexion onset
WINDOW_SIZE = 6            # must keep decreasing for >=N frames
RELX_THRESH = -0.06        # "hand passes knee" once relx_hand_knee > RELX_THRESH (set slightly below 0)

# # ---- Fast Parameters (Strict, raw signal, 120 fps) ----
# VEL_THRESH = -1.5          # deg/frame, knee angle decreasing speed threshold
# AMP_DROP_THRESH = 3.0      # deg below local peak to define flexion onset
# WINDOW_SIZE = 4            # must keep decreasing for >= N frames
# RELX_THRESH = -0.09        # "hand passes knee" once relx_hand_knee > RELX_THRESH (set slightly below 0)

# ========= 1) Load and sort =========
df = pd.read_csv(CSV_PATH)
sort_cols = [c for c in ["video", "frame", "time_ms"] if c in df.columns]
df = df.sort_values(sort_cols if sort_cols else df.columns.tolist()).reset_index(drop=True)

# ========= 2) Check required columns =========
required = ["stroke_id", "relx_hand_knee", "angle_right_knee"]
missing = [c for c in required if c not in df.columns]
if missing:
    raise ValueError(f"❌ Missing required columns: {missing}")

# ========= Helper: find local maxima =========
def find_local_maxima(y, min_separation=3):
    """Find indices of local maxima with basic filtering."""
    dy = np.diff(y)
    sign = np.sign(dy)
    sign[sign == 0] = 1
    maxima = np.where((sign[:-1] > 0) & (sign[1:] < 0))[0] + 1
    if len(maxima) == 0:
        return np.array([np.argmax(y)])  # fallback
    filtered = []
    for i in maxima:
        if not filtered or i - filtered[-1] >= min_separation:
            filtered.append(i)
        else:
            if y[i] > y[filtered[-1]]:
                filtered[-1] = i
    return np.array(filtered)

# ========= 3) Compute sequence score per stroke =========
rows = []
unique_strokes = sorted(df["stroke_id"].unique())

# ---- Exclude the last stroke (usually incomplete) ----
if len(unique_strokes) > 1:
    print(f"⚠️ Excluding last stroke (ID {unique_strokes[-1]}) - assumed incomplete.")
    unique_strokes = unique_strokes[:-1]

for sid in unique_strokes:
    seg = df[df["stroke_id"] == sid]
    if seg.empty or seg["angle_right_knee"].isna().all():
        continue

    r = seg["relx_hand_knee"].to_numpy()
    theta = seg["angle_right_knee"].to_numpy()
    frames = seg.index.to_numpy()

    # Compute knee angle velocity (deg/frame)
    v = np.gradient(theta)

    # --- 1) hand passes knee ---
    idx_hand_pass = np.where(r > RELX_THRESH)[0]
    t_h = frames[idx_hand_pass[0]] if len(idx_hand_pass) > 0 else np.inf

    # --- 2) knee flexion onset detection (raw signal, no smoothing) ---
    local_maxima = find_local_maxima(theta)
    t_k = np.inf
    found = False

    for peak in local_maxima:
        peak_angle = theta[peak]
        # search after the peak for a continuous descent
        for i in range(peak + 1, len(theta) - WINDOW_SIZE):
            window_v = v[i : i + WINDOW_SIZE]
            sustained_down = np.all(window_v < 0)  # velocity consistently negative
            enough_slope = np.mean(window_v) < VEL_THRESH
            enough_drop = theta[i] < peak_angle - AMP_DROP_THRESH
            if sustained_down and enough_slope and enough_drop:
                t_k = frames[i]
                found = True
                break
        if found:
            break

    # ---- Fallback: use first frame where velocity turns negative ----
    if not np.isfinite(t_k):
        neg_idx = np.where(v < 0)[0]
        if len(neg_idx) > 0:
            t_k = frames[neg_idx[0]]

    # --- 3) Sequence score (strict, no smoothing) ---
    if not np.isfinite(t_k):
        s_seq = 1  # if still undefined, assume correct (very smooth motion)
    else:
        s_seq = 1 if t_h < t_k else 0

    rows.append({
        "stroke_id": sid,
        "hand_pass_frame": int(t_h) if np.isfinite(t_h) else None,
        "knee_flex_frame": int(t_k) if np.isfinite(t_k) else None,
        "sequence_score": int(s_seq)
    })

# ========= 4) Save results =========
scores_df = pd.DataFrame(rows)
scores_df.to_csv(OUT_CSV, index=False)

print(f"✅ Sequence scores computed for {len(scores_df)} strokes (Method A, raw signal, no smoothing, last stroke excluded)")
print(f"💾 Saved to: {OUT_CSV.resolve()}")
display(scores_df)

⚠️ Excluding last stroke (ID 22) - assumed incomplete.
✅ Sequence scores computed for 21 strokes (Method A, raw signal, no smoothing, last stroke excluded)
💾 Saved to: /Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/angle_relx_cleaned_id/outputs/01_1_low_seq_score.csv


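| | stroke_id | hand_pass_frame | knee_flex_frame | sequence_score |
|---|---|---|---|---|
| 0 | 1 | 24 | 24 | 0 |
| 1 | 2 | 337 | 349 | 1 |
| 2 | 3 | 672 | 668 | 0 |
| 3 | 4 | 1017 | 1024 | 1 |
| 4 | 5 | 1363 | 1372 | 1 |
| 5 | 6 | 1707 | 1714 | 1 |
| 6 | 7 | 2049 | 2141 | 1 |
| 7 | 8 | 2393 | 2383 | 0 |
| 8 | 9 | 2726 | 2732 | 1 |
| 9 | 10 | 3067 | 3084 | 1 |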


### Batch Run: Sequence Score

In [112]:
import pandas as pd
import numpy as np
from pathlib import Path

# ========= USER CONFIG =========
INPUT_DIR = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/angle_relx_cleaned_id/1_fast/")
OUTPUT_DIR = Path(r"/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/seq_scores/1_fast")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# ========= Parameter sets =========
PARAMS_FAST = {
    "VEL_THRESH": -1.5,     # deg/frame
    "AMP_DROP_THRESH": 3.0, # deg
    "WINDOW_SIZE": 4,
    "RELX_THRESH": -0.09
}

PARAMS_SLOW = {
    "VEL_THRESH": -1.2,
    "AMP_DROP_THRESH": 5.0,
    "WINDOW_SIZE": 6,
    "RELX_THRESH": -0.06
}

# ========= Helper: find local maxima =========
def find_local_maxima(y, min_separation=3):
    """Find indices of local maxima with basic filtering."""
    dy = np.diff(y)
    sign = np.sign(dy)
    sign[sign == 0] = 1
    maxima = np.where((sign[:-1] > 0) & (sign[1:] < 0))[0] + 1
    if len(maxima) == 0:
        return np.array([np.argmax(y)])  # fallback
    filtered = []
    for i in maxima:
        if not filtered or i - filtered[-1] >= min_separation:
            filtered.append(i)
        else:
            if y[i] > y[filtered[-1]]:
                filtered[-1] = i
    return np.array(filtered)

# ========= Function to process one file =========
def process_file(file_path: Path):
    print(f"\nüìÇ Processing {file_path.name}")

    # ---- Select parameter set based on file name ----
    if "fast" in file_path.name.lower():
        params = PARAMS_FAST
        speed_type = "fast"
    elif "low" in file_path.name.lower() or "slow" in file_path.name.lower():
        params = PARAMS_SLOW
        speed_type = "slow"
    else:
        print("⚠️ Could not determine speed type from file name. Skipping.")
        return None

    VEL_THRESH = params["VEL_THRESH"]
    AMP_DROP_THRESH = params["AMP_DROP_THRESH"]
    WINDOW_SIZE = params["WINDOW_SIZE"]
    RELX_THRESH = params["RELX_THRESH"]

    # ---- Load data ----
    df = pd.read_csv(file_path)
    sort_cols = [c for c in ["video", "frame", "time_ms"] if c in df.columns]
    df = df.sort_values(sort_cols if sort_cols else df.columns.tolist()).reset_index(drop=True)

    required = ["stroke_id", "relx_hand_knee", "angle_right_knee"]
    missing = [c for c in required if c not in df.columns]
    if missing:
        print(f"⚠️ Missing columns in {file_path.name}: {missing}. Skipping.")
        return None

    rows = []
    unique_strokes = sorted(df["stroke_id"].unique())

    # Exclude the last stroke
    if len(unique_strokes) > 1:
        print(f"⚠️ Excluding last stroke (ID {unique_strokes[-1]}) - assumed incomplete.")
        unique_strokes = unique_strokes[:-1]

    # ---- Compute per-stroke sequence score ----
    for sid in unique_strokes:
        seg = df[df["stroke_id"] == sid]
        if seg.empty or seg["angle_right_knee"].isna().all():
            continue

        r = seg["relx_hand_knee"].to_numpy()
        theta = seg["angle_right_knee"].to_numpy()
        frames = seg.index.to_numpy()

        # Compute knee angle velocity (deg/frame)
        v = np.gradient(theta)

        # --- 1) hand passes knee ---
        idx_hand_pass = np.where(r > RELX_THRESH)[0]
        t_h = frames[idx_hand_pass[0]] if len(idx_hand_pass) > 0 else np.inf

        # --- 2) knee flexion onset detection (raw signal, no smoothing) ---
        local_maxima = find_local_maxima(theta)
        t_k = np.inf
        found = False

        for peak in local_maxima:
            peak_angle = theta[peak]
            for i in range(peak + 1, len(theta) - WINDOW_SIZE):
                window_v = v[i : i + WINDOW_SIZE]
                sustained_down = np.all(window_v < 0)
                enough_slope = np.mean(window_v) < VEL_THRESH
                enough_drop = theta[i] < peak_angle - AMP_DROP_THRESH
                if sustained_down and enough_slope and enough_drop:
                    t_k = frames[i]
                    found = True
                    break
            if found:
                break

        # ---- Fallback: use first frame where velocity turns negative ----
        if not np.isfinite(t_k):
            neg_idx = np.where(v < 0)[0]
            if len(neg_idx) > 0:
                t_k = frames[neg_idx[0]]

        # --- 3) Sequence score ---
        if not np.isfinite(t_k):
            s_seq = 1
        else:
            s_seq = 1 if t_h < t_k else 0

        rows.append({
            "stroke_id": sid,
            "hand_pass_frame": int(t_h) if np.isfinite(t_h) else None,
            "knee_flex_frame": int(t_k) if np.isfinite(t_k) else None,
            "sequence_score": int(s_seq)
        })

    if not rows:
        print(f"⚠️ No valid strokes detected in {file_path.name}")
        return None

    scores_df = pd.DataFrame(rows)

    # ---- Save result ----
    file_prefix = file_path.stem.replace("_pose2d_angles_relx_cleaned", "")
    out_path = OUTPUT_DIR / f"{file_prefix}_seq_score.csv"
    scores_df.to_csv(out_path, index=False)
    print(f"✅ Saved: {out_path.name} ({len(scores_df)} strokes)")

    return {
        "file": file_path.name,
        "speed_type": speed_type,
        "n_strokes": len(scores_df),
        "avg_seq_score": round(scores_df["sequence_score"].mean(), 3)
    }

# ========= 4) Batch run =========
summary = []
for csv_file in sorted(INPUT_DIR.rglob("*_pose2d_angles_relx_cleaned.csv")):
    try:
        info = process_file(csv_file)
        if info:
            summary.append(info)
    except Exception as e:
        print(f"❌ Error processing {csv_file.name}: {e}")

# ========= 5) Save summary =========
if summary:
    summary_df = pd.DataFrame(summary)
    summary_path = OUTPUT_DIR / "batch_sequence_score_summary.csv"
    summary_df.to_csv(summary_path, index=False)
    print(f"\n✅ Batch finished. Summary saved to {summary_path}")
    display(summary_df)
else:
    print("⚠️ No valid files processed.")


📂 Processing 01_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 31) - assumed incomplete.
✅ Saved: 01_1_fast_seq_score.csv (30 strokes)

📂 Processing 02_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 23) - assumed incomplete.
✅ Saved: 02_1_fast_seq_score.csv (22 strokes)

📂 Processing 03_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 35) - assumed incomplete.
✅ Saved: 03_1_fast_seq_score.csv (34 strokes)

📂 Processing 04_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 32) - assumed incomplete.
✅ Saved: 04_1_fast_seq_score.csv (31 strokes)

📂 Processing 05_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 31) - assumed incomplete.
✅ Saved: 05_1_fast_seq_score.csv (30 strokes)

📂 Processing 06_1_fast_pose2d_angles_relx_cleaned.csv
⚠️ Excluding last stroke (ID 31) - assumed incomplete.
✅ Saved: 06_1_fast_seq_score.csv (30 strokes

| | file | speed_type | n_strokes | avg_seq_score |
|---|---|---|---|---|
| 0 | 01_1_fast_pose2d_angles_relx_cleaned.csv | fast | 30 | 0.37 |
| 1 | 02_1_fast_pose2d_angles_relx_cleaned.csv | fast | 22 | 0.0 |
| 2 | 03_1_fast_pose2d_angles_relx_cleaned.csv | fast | 34 | 0.0 |
| 3 | 04_1_fast_pose2d_angles_relx_cleaned.csv | fast | 31 | 0.87 |
| 4 | 05_1_fast_pose2d_angles_relx_cleaned.csv | fast | 30 | 0.97 |
| 5 | 06_1_fast_pose2d_angles_relx_cleaned.csv | fast | 30 | 0.0 |


## Technique Quality Score
### Mathematical Definition
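We define the technique quality score for each stroke as

\begin{equation}
s_Q = 100 \times \left(0.08\, s_B + 0.2\, s_{\text{seq}}\right)
\end{equation}

Since $s_B \in [0, 10]$ and $s_{\text{seq}} \in \{0, 1\}$, $s_Q$ ranges from 0 to 100: the body angle score contributes up to 80 points and the sequence score up to 20.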
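A minimal sketch of the formula with an explicit range check; `tech_quality` is a hypothetical helper name, and the validation is an addition for illustration (the notebook computes the same expression directly on DataFrame columns).

```python
def tech_quality(s_B: float, s_seq: int) -> float:
    """Hypothetical helper: s_Q = 100 * (0.08 * s_B + 0.2 * s_seq), s_B in [0, 10], s_seq in {0, 1}."""
    if not 0.0 <= s_B <= 10.0 or s_seq not in (0, 1):
        raise ValueError("expected s_B in [0, 10] and s_seq in {0, 1}")
    return 100.0 * (0.08 * s_B + 0.2 * s_seq)

print(tech_quality(10.0, 1))  # perfect body angles and correct sequence
print(tech_quality(9.0, 0))   # good body angles, sequence error: 72 points
```

A single sequencing error therefore costs as much as a 2.5-point drop in the body angle score.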

#### Single File (Technique Quality Score)

In [104]:
import pandas as pd
from pathlib import Path

# ========= USER CONFIG =========
# Fill in your file paths here 👇
body_score_path = "/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/body_scores/1_fast/01_1_fast_body_score.csv"   # e.g. "/path/to/01_1_low_body_score.csv"
seq_score_path  = "/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/seq_scores/1_fast/01_1_fast_seq_score.csv"   # e.g. "/path/to/01_1_low_seq_score.csv"
summary_path    = "/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/angle_relx_cleaned_id/1_fast/batch_summary.csv"   # e.g. "/path/to/batch_summary.csv"
output_path     = "/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/tech_qual_score.csv"   # e.g. "/path/to/01_1_low_tech_quality.csv"

# ========= 1) Load data =========
df_body = pd.read_csv(body_score_path)
df_seq = pd.read_csv(seq_score_path)
df_summary = pd.read_csv(summary_path)

# ========= 2) Parse file name for participant_id, session_time, and speed_type =========
file_stem = Path(body_score_path).stem  # e.g., "01_1_low_body_score"
parts = file_stem.split("_")

participant_id = parts[0] if len(parts) > 0 else None
session_time   = parts[1] if len(parts) > 1 else None
speed_type     = parts[2] if len(parts) > 2 else None

# ========= 3) Look up stroke_rate from batch summary =========
# Try to find a row in the summary file that contains this file name pattern
match = df_summary[df_summary["file"].str.contains(f"{participant_id}_{session_time}_{speed_type}", case=False, na=False)]
stroke_rate = match.iloc[0]["SPM_est"] if not match.empty and "SPM_est" in match.columns else None

# ========= 4) Merge and compute scores =========
merged = (
    df_body[["stroke_id", "score_body_total"]]
    .rename(columns={"score_body_total": "body_angle_score"})
    .merge(
        df_seq[["stroke_id", "sequence_score"]],
        on="stroke_id",
        how="inner"
    )
)

merged["tech_qual_score"] = 100 * (0.08 * merged["body_angle_score"] + 0.2 * merged["sequence_score"])

# ========= 5) Add metadata columns (same for all rows) =========
merged.insert(0, "participant_id", participant_id)
merged.insert(1, "session_time", session_time)
merged.insert(2, "speed_type", speed_type)
merged.insert(3, "stroke_rate", stroke_rate)

# ========= 6) Display and save =========
pd.set_option("display.float_format", "{:.2f}".format)
print(f"✅ participant_id: {participant_id}, session_time: {session_time}, speed_type: {speed_type}, stroke_rate: {stroke_rate}\n")
display(merged)

merged.to_csv(output_path, index=False)
print(f"\n✅ Merged file saved to:\n{output_path}")

✅ participant_id: 01, session_time: 1, speed_type: fast, stroke_rate: 30.09



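| | participant_id | session_time | speed_type | stroke_rate | stroke_id | body_angle_score | sequence_score | tech_qual_score |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | fast | 30.09 | 1 | 8.97 | 1 | 91.8 |
| 1 | 1 | 1 | fast | 30.09 | 2 | 8.81 | 1 | 90.52 |
| 2 | 1 | 1 | fast | 30.09 | 3 | 8.65 | 1 | 89.19 |
| 3 | 1 | 1 | fast | 30.09 | 4 | 8.67 | 1 | 89.33 |
| 4 | 1 | 1 | fast | 30.09 | 5 | 9.45 | 1 | 95.57 |
| 5 | 1 | 1 | fast | 30.09 | 6 | 9.04 | 0 | 72.35 |
| 6 | 1 | 1 | fast | 30.09 | 7 | 9.11 | 0 | 72.87 |
| 7 | 1 | 1 | fast | 30.09 | 8 | 9.18 | 1 | 93.47 |
| 8 | 1 | 1 | fast | 30.09 | 9 | 9.13 | 0 | 73.07 |
| 9 | 1 | 1 | fast | 30.09 | 10 | 9.08 | 0 | 72.62 |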



✅ Merged file saved to:
/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs/tech_qual_score.csv


#### Batch Run: Technique Quality Score

In [1]:
import pandas as pd
from pathlib import Path

# ========= USER CONFIG =========
BASE_DIR = Path("/Users/ameliaxu/Documents/ls100_project/MediaPipeEnv/outputs")
BODY_DIR = BASE_DIR / "body_scores"
SEQ_DIR  = BASE_DIR / "seq_scores"
SUMMARY_DIR = BASE_DIR / "angle_relx_cleaned_id"
OUTPUT_DIR   = BASE_DIR / "tech_quality_scores"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# All session folders to process
SESSION_FOLDERS = ["1_fast", "1_low", "2_low", "4_low"]

# ========= Define expertise mapping =========
EXPERTISE_MAP = {
    "a01": "intermediate",
    "a03": "intermediate",
    "a02": "novice",
    "a05": "novice",
    "a04": "beginner",
    "a06": "beginner"
}

# ========= Define files to skip =========
SKIP_FILES = {"03_2_low_final_body_score.csv"}  # can add more later if needed

def process_file(body_file: Path, summary_path: Path):
    """Merge one body score file with its sequence file, compute tech_qual_score, and save results."""
    try:
        # ---- Skip unwanted files ----
        if body_file.name in SKIP_FILES:
            print(f"⏭️ Skipping file: {body_file.name}")
            return None

        # ---- Parse file name (e.g., 03_2_low_final_body_score.csv)
        stem = body_file.stem
        parts = stem.split("_")
        participant_id = parts[0] if len(parts) > 0 else None
        session_time   = parts[1] if len(parts) > 1 else None
        speed_type     = parts[2] if len(parts) > 2 else None

        # Add 'a' prefix to participant_id
        participant_id_prefixed = f"a{participant_id}"

        # ---- Add expertise level
        expertise_level = EXPERTISE_MAP.get(participant_id_prefixed, "unknown")

        # ---- Load summary (for stroke rate)
        if not summary_path.exists():
            print(f"⚠️ Summary not found: {summary_path}")
            stroke_rate = None
        else:
            df_summary = pd.read_csv(summary_path)
            pat = f"{participant_id}_{session_time}_{speed_type}"
            match = df_summary[df_summary["file"].str.contains(pat, case=False, na=False)]
            stroke_rate = match.iloc[0]["SPM_est"] if not match.empty and "SPM_est" in match.columns else None

        # ---- Locate matching sequence score file
        session_folder = f"{session_time}_{speed_type}"
        seq_file = SEQ_DIR / session_folder / f"{participant_id}_{session_time}_{speed_type}_seq_score.csv"
        if not seq_file.exists():
            print(f"⚠️ Sequence score not found for {stem} → {seq_file}")
            return None

        # ---- Read body and sequence CSVs
        df_body = pd.read_csv(body_file)
        df_seq  = pd.read_csv(seq_file)

        # ---- Merge on stroke_id
        merged = (
            df_body[["stroke_id", "score_body_total"]]
            .rename(columns={"score_body_total": "torso_angle_score"})
            .merge(df_seq[["stroke_id", "sequence_score"]], on="stroke_id", how="inner")
        )

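        # ---- Compute technique quality on a 0-100 scale:
        # torso score (0-10) x 0.08 contributes up to 0.8; sequence score (assumed 0-1) x 0.2 contributes up to 0.2
        merged["tech_qual_score"] = 100 * (0.08 * merged["torso_angle_score"] + 0.2 * merged["sequence_score"])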

        # ---- Add metadata
        merged.insert(0, "participant_id", participant_id_prefixed)
        merged.insert(1, "expertise_level", expertise_level)
        merged.insert(2, "session_time", session_time)
        merged.insert(3, "speed_type", speed_type)
        merged.insert(4, "stroke_rate", stroke_rate)

        # ---- Save output file
        out_name = body_file.name.replace("_body_score", "_tech_quality_score")
        out_name = "a" + out_name
        out_path = OUTPUT_DIR / out_name
        merged.to_csv(out_path, index=False)
        print(f"✅ Saved: {out_name} ({len(merged)} strokes)")

        # ---- Return summary row
        return {
            "file": body_file.name,
            "participant_id": participant_id_prefixed,
            "expertise_level": expertise_level,
            "session_time": session_time,
            "speed_type": speed_type,
            "stroke_rate": stroke_rate,
            "n_strokes": len(merged),
            "avg_torso_score": merged["torso_angle_score"].mean(),
            "avg_seq_score": merged["sequence_score"].mean(),
            "avg_tech_qual": merged["tech_qual_score"].mean()
        }

    except Exception as e:
        print(f"❌ Error processing {body_file.name}: {e}")
        return None

# ========= Batch run =========
summary_rows = []

for session_folder in SESSION_FOLDERS:
    body_folder = BODY_DIR / session_folder
    summary_path = SUMMARY_DIR / session_folder / "batch_summary.csv"

    if not body_folder.exists():
        print(f"⚠️ Body folder not found: {body_folder}")
        continue

    for body_file in sorted(body_folder.glob("*_body_score.csv")):
        info = process_file(body_file, summary_path)
        if info:
            summary_rows.append(info)

# ========= Save overall summary =========
if summary_rows:
    summary_df = pd.DataFrame(summary_rows)
    summary_file = OUTPUT_DIR / "batch_tech_quality_summary.csv"
    summary_df.to_csv(summary_file, index=False)
    print(f"\n✅ Batch completed. Summary saved to:\n{summary_file}")
    display(summary_df)
else:
    print("⚠️ No valid files processed.")

✅ Saved: a01_1_fast_tech_quality_score.csv (30 strokes)
✅ Saved: a02_1_fast_tech_quality_score.csv (22 strokes)
✅ Saved: a03_1_fast_tech_quality_score.csv (34 strokes)
✅ Saved: a04_1_fast_tech_quality_score.csv (31 strokes)
✅ Saved: a05_1_fast_tech_quality_score.csv (30 strokes)
✅ Saved: a06_1_fast_tech_quality_score.csv (30 strokes)
✅ Saved: a01_1_low_tech_quality_score.csv (21 strokes)
✅ Saved: a02_1_low_tech_quality_score.csv (20 strokes)
✅ Saved: a03_1_low_tech_quality_score.csv (20 strokes)
✅ Saved: a04_1_low_tech_quality_score.csv (20 strokes)
✅ Saved: a05_1_low_tech_quality_score.csv (19 strokes)
✅ Saved: a06_1_low_tech_quality_score.csv (20 strokes)
✅ Saved: a01_2_low_tech_quality_score.csv (40 strokes)
✅ Saved: a02_2_low_tech_quality_score.csv (38 strokes)
✅ Saved: a03_2_low_tech_quality_score.csv (37 strokes)
⏭️ Skipping file: 03_2_low_final_body_score.csv
✅ Saved: a04_2_low_tech_quality_score.csv (38 strokes)
✅ Saved: a05_2_low_tech_qual

| | file | participant_id | expertise_level | session_time | speed_type | stroke_rate | n_strokes | avg_torso_score | avg_seq_score | avg_tech_qual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01_1_fast_body_score.csv | a01 | intermediate | 1 | fast | 30.09 | 30 | 8.918319 | 0.366667 | 78.679887 |
| 1 | 02_1_fast_body_score.csv | a02 | novice | 1 | fast | 29.82 | 22 | 6.898722 | 0.0 | 55.189774 |
| 2 | 03_1_fast_body_score.csv | a03 | intermediate | 1 | fast | 34.32 | 34 | 7.11748 | 0.0 | 56.939843 |
| 3 | 04_1_fast_body_score.csv | a04 | beginner | 1 | fast | 31.64 | 31 | 7.843939 | 0.870968 | 80.170866 |
| 4 | 05_1_fast_body_score.csv | a05 | novice | 1 | fast | 30.05 | 30 | 8.766343 | 0.966667 | 89.464078 |
| 5 | 06_1_fast_body_score.csv | a06 | beginner | 1 | fast | 30.1 | 30 | 6.165344 | 0.0 | 49.322749 |
| 6 | 01_1_low_body_score.csv | a01 | intermediate | 1 | low | 20.34 | 21 | 9.00483 | 0.857143 | 89.181497 |
| 7 | 02_1_low_body_score.csv | a02 | novice | 1 | low | 20.18 | 20 | 7.358615 | 0.4 | 66.868919 |
| 8 | 03_1_low_body_score.csv | a03 | intermediate | 1 | low | 20.15 | 20 | 6.923074 | 0.25 | 60.384596 |
| 9 | 04_1_low_body_score.csv | a04 | beginner | 1 | low | 20.47 | 20 | 8.242619 | 0.55 | 76.940953 |
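
The scoring step can be checked outside the notebook with a minimal, pandas-free sketch of the same per-stroke computation. The function names `tech_qual_score` and `merge_and_score` are hypothetical and not part of the original code. A dict-based inner join stands in for `DataFrame.merge(..., how="inner")`, and the toy stroke values are made up for illustration. Only the a01 / `1_fast` averages come from the summary table above.

```python
from statistics import mean

# Hypothetical helpers mirroring the batch cell; not part of the original pipeline.
# Assumes torso_angle_score is on 0-10 (s_B) and sequence_score on 0-1.

def tech_qual_score(torso_angle_score: float, sequence_score: float) -> float:
    """Same weighting as the batch cell: 100 * (0.08 * torso + 0.2 * sequence)."""
    return 100 * (0.08 * torso_angle_score + 0.2 * sequence_score)


def merge_and_score(body: dict, seq: dict) -> dict:
    """Inner-join two {stroke_id: score} dicts, like merge(on="stroke_id", how="inner")."""
    shared_ids = sorted(body.keys() & seq.keys())  # keep only strokes present in both files
    return {sid: tech_qual_score(body[sid], seq[sid]) for sid in shared_ids}


# Toy strokes (made-up values): stroke 3 has no sequence score, stroke 4 has no body score.
body_scores = {1: 9.5, 2: 7.0, 3: 8.0}
seq_scores = {1: 1, 2: 0, 4: 1}
per_stroke = merge_and_score(body_scores, seq_scores)
print(per_stroke)

# Linearity: the mean per-stroke score equals the score of the mean inputs.
shared = sorted(per_stroke)
mean_of_scores = mean(per_stroke.values())
score_of_means = tech_qual_score(mean(body_scores[s] for s in shared),
                                 mean(seq_scores[s] for s in shared))
print(round(mean_of_scores, 6), round(score_of_means, 6))

# Summary row for a01 / 1_fast reports avg_torso 8.918319, avg_seq 0.366667, avg_tech_qual 78.679887.
print(round(tech_qual_score(8.918319, 0.366667), 2))
```

The inner join drops strokes that appear in only one file, so `n_strokes` in the summary counts matched strokes only.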

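`process_file` relies on the file-name convention `<participant>_<session_time>_<speed_type>_..._body_score.csv`. The sketch below pulls that parsing into a standalone helper that raises an error for names with fewer than three underscore-separated fields. The helper name `parse_body_score_name`, its `ValueError` behaviour, and the `session_folder` key are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path

# Hypothetical helper mirroring the parsing in process_file, assuming names like
# "01_1_fast_body_score.csv" -> participant "01", session "1", speed "fast".

def parse_body_score_name(path: Path) -> dict:
    parts = path.stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"Unexpected file name: {path.name}")
    participant_id, session_time, speed_type = parts[:3]
    return {
        "participant_id": f"a{participant_id}",  # "01" -> "a01", matching EXPERTISE_MAP keys
        "session_time": session_time,
        "speed_type": speed_type,
        "session_folder": f"{session_time}_{speed_type}",  # e.g. "1_fast"
    }


print(parse_body_score_name(Path("01_1_fast_body_score.csv")))
print(parse_body_score_name(Path("03_2_low_final_body_score.csv")))
```

The second call shows that `03_2_low_final_body_score.csv` maps to the same participant and session as `03_2_low_body_score.csv`. Without the `SKIP_FILES` entry, it would therefore add a second a03 / `2_low` row to the batch summary.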