# mast_ttest – Summary

This notebook applies **independent-sample t-tests** to evaluate whether **mast brand (Levitaz vs Chubanga)** influences boat performance, specifically **SOG (Speed Over Ground)**, under different run conditions.  
The analysis uses telemetry from `all_data.csv` filtered for **June 10, 2025 runs**.

---

## Inputs
- **Data**: `all_data.csv` containing time-series telemetry across runs.   
- **Helper functions**:  
  - `t_test(df1, df2, target="SOG")`: performs two-sample t-test, prints t-statistic, p-value, and interprets significance (`p < 0.05`).  
  - `print_run_stats(label, group1, group2, target)`: prints mean values of `target`, plus average and std of `SOG` for each group.

---

## Workflow

### Step 1: Load & filter data
- Restrict dataset to rows with timestamp starting `"2025-06-10"`.
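
As a minimal sketch of this filtering step, the snippet below compares the string-prefix filter with an equivalent filter on parsed timestamps. The small DataFrame is synthetic and only stands in for `all_data.csv`; the column name `ISODateTimeUTC` is taken from the notebook, everything else is an assumption for illustration.

```python
import pandas as pd

# Synthetic stand-in for all_data.csv (values are made up for illustration).
df = pd.DataFrame({
    "ISODateTimeUTC": [
        "2025-06-09T23:59:59.900Z",
        "2025-06-10T12:39:32.938Z",
        "2025-06-10T13:50:10.056Z",
        "2025-06-11T00:00:00.100Z",
    ],
    "SOG": [21.0, 23.8, 24.4, 22.5],
})

# String-prefix filter, as described in Step 1.
by_prefix = df[df["ISODateTimeUTC"].str.startswith("2025-06-10")]

# Alternative: parse to UTC timestamps and compare the calendar day.
ts = pd.to_datetime(df["ISODateTimeUTC"], utc=True)
target_day = pd.Timestamp("2025-06-10", tz="UTC")
by_date = df[ts.dt.normalize() == target_day]

print(len(by_prefix), len(by_date))
```

The prefix filter relies on every timestamp being an ISO string in UTC; the parsed variant would keep working if the string format changed slightly, at the cost of a parsing pass.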

### Step 2: Define groups of runs
- **Runs 1–5**: Karl on **Levitaz**, Gian on **Chubanga**.  
- **Runs 6–10**: Karl on **Chubanga**, Gian on **Levitaz**.  
- Subsets created: `data_10juin_first_runs`, `data_10juin_last_runs`.
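
The grouping above can be sketched as follows. The run label pattern `10_06_RunN` and the `run` column come from the notebook; the sample rows, including the extra label `10_06_Run11`, are hypothetical.

```python
import pandas as pd

# Run labels for the two blocks, following the "10_06_RunN" pattern.
first_runs = [f"10_06_Run{i}" for i in range(1, 6)]   # runs 1 to 5
last_runs = [f"10_06_Run{i}" for i in range(6, 11)]   # runs 6 to 10

# Synthetic telemetry rows (made up); "10_06_Run11" is hypothetical.
data = pd.DataFrame({
    "run": ["10_06_Run1", "10_06_Run5", "10_06_Run6", "10_06_Run10", "10_06_Run11"],
    "SOG": [23.9, 24.1, 24.5, 24.6, 24.0],
})

# isin() matches whole labels, so "10_06_Run1" never picks up "10_06_Run10".
data_first_runs = data[data["run"].isin(first_runs)]
data_last_runs = data[data["run"].isin(last_runs)]

print(data_first_runs["run"].tolist())
print(data_last_runs["run"].tolist())
```

Exact matching with `isin` is the safer choice here because a prefix match on `10_06_Run1` would also select run 10.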

### Step 3: Initial t-test (conditions check)
- Compare **TWS** between runs 1–5 and runs 6–10.  
- Report descriptive stats for both groups.

### Step 4: Karl Levitaz vs Karl Chubanga
- Subset Karl’s boat (or SenseBoard paired against Gian).  
- Run t-tests on **SOG**:
  - General (all legs).  
  - Upwind only (`TWA > 0`).  
  - Downwind only (`TWA ≤ 0`).  
- Use `print_run_stats` to show mast brand, mean SOG, std SOG per group.

### Step 5: Gian Chubanga vs Gian Levitaz
- Subset Gian’s boat (or SenseBoard paired against Karl).  
- Run t-tests on **SOG**:
  - General.  
  - Upwind only.  
  - Downwind only.  
- Use `print_run_stats` to show mast brand, mean SOG, std SOG per group.
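
Steps 4 and 5 repeat the same three comparisons (general, upwind, downwind) for each sailor. One way to express that pattern is the sketch below. The helper `compare_by_leg` and the synthetic generator `fake_group` are hypothetical names introduced for illustration; only the `TWA` sign convention and the `SOG` target come from the notebook.

```python
import numpy as np
import pandas as pd
from scipy import stats

def compare_by_leg(group_a, group_b, target="SOG"):
    """Return one row per leg type with sample sizes, means, t-statistic and p-value."""
    legs = {
        "general": lambda d: d,
        "upwind": lambda d: d[d["TWA"] > 0],     # same convention as the notebook
        "downwind": lambda d: d[d["TWA"] <= 0],
    }
    rows = []
    for name, select in legs.items():
        a = select(group_a)[target].dropna()
        b = select(group_b)[target].dropna()
        t_stat, p_value = stats.ttest_ind(a, b)
        rows.append({"leg": name, "n_a": len(a), "n_b": len(b),
                     "mean_a": a.mean(), "mean_b": b.mean(),
                     "t": t_stat, "p": p_value})
    return pd.DataFrame(rows)

# Synthetic groups (made-up numbers): alternating upwind and downwind samples.
rng = np.random.default_rng(0)

def fake_group(mean_up, mean_down, n=200):
    twa = np.where(np.arange(n) % 2 == 0, 45.0, -140.0)
    sog = np.where(twa > 0, mean_up, mean_down) + rng.normal(0, 0.5, n)
    return pd.DataFrame({"TWA": twa, "SOG": sog})

levi = fake_group(22.1, 25.9)
chub = fake_group(22.8, 27.2)
result = compare_by_leg(levi, chub)
print(result)
```

Returning a DataFrame instead of printing keeps the three comparisons side by side. Note that rows with a missing `TWA` would count toward the general test but toward neither leg, which is also how the notebook's filters behave.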

---

## Output
- Printed results for each t-test: **t-statistic**, **p-value**, interpretation of significance.  
- Group summaries: mast brand used, mean SOG, std SOG.  
- Structured comparisons:  
  - **Karl**: Levitaz (runs 1–5) vs Chubanga (runs 6–10).  
  - **Gian**: Chubanga (runs 1–5) vs Levitaz (runs 6–10).  
  - Breakdown by **upwind** and **downwind**.

---

## Notes
- Significance threshold: `p < 0.05`.  
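- Non-significant results suggest the **data can be combined**; significant results show that the groups differ, which is consistent with a **mast effect** but can also reflect a change in conditions between the two run blocks (hence the TWS check in Step 3).  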
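
With large telemetry samples, p-values can fall below printing precision, so the threshold alone says little about how big a difference is. The sketch below is an optional complement, not part of the notebook: it pairs Welch's t-test (`equal_var=False` in `scipy.stats.ttest_ind`) with Cohen's d as an effect size. The helper name `welch_and_effect_size` is hypothetical, and the data are synthetic draws whose means and standard deviations are borrowed from the Karl upwind output; the sample size is an assumption.

```python
import numpy as np
from scipy import stats

def welch_and_effect_size(a, b):
    """Welch's t-test plus Cohen's d (pooled standard deviation)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # equal_var=False selects Welch's test, which does not assume equal variances.
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
    pooled_sd = np.sqrt(
        ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
        / (len(a) + len(b) - 2)
    )
    cohens_d = (a.mean() - b.mean()) / pooled_sd
    return t_stat, p_value, cohens_d

# Synthetic samples (assumed size of 5000 per group).
rng = np.random.default_rng(42)
group_a = rng.normal(22.12, 0.73, 5000)
group_b = rng.normal(22.84, 0.54, 5000)

t_stat, p_value, d = welch_and_effect_size(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3e}, Cohen's d = {d:.2f}")
```

Printing the p-value in scientific notation avoids a row of zeros, and the effect size gives a scale-free measure of the gap in SOG.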

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import scipy.stats as stats

def t_test(df1, df2, target="SOG"):
    """Run an independent two-sample t-test on `target` and print the result."""
    # Missing values are dropped from each group before testing
    t_stat, p_value = stats.ttest_ind(df1[target].dropna(), df2[target].dropna())
    print(f"T-statistic: {t_stat:.3f}, p-value: {p_value:.15f}")
    
    # If p-value is less than 0.05, the difference is statistically significant
    if p_value < 0.05:
        print("The difference is statistically significant, keeping data split.")
    else:
        print("The difference is not statistically significant, keeping data combined.")

def print_run_stats(first_sentence, first_runs_df, last_runs_df, target):
    """Print `target` per group (unique labels or mean), plus mean and std of SOG."""
    print("\n", first_sentence)

    if first_runs_df[target].dtype == "O":
        # Text column (e.g. mast_brand): list the distinct values in each group
        first_target = ", ".join(first_runs_df[target].dropna().unique())
        last_target = ", ".join(last_runs_df[target].dropna().unique())
    else:
        # Numeric column (e.g. TWS): report the mean of each group
        first_target = f"{first_runs_df[target].mean():.2f}"
        last_target = f"{last_runs_df[target].mean():.2f}"

    print(f"Mean {target} on the first group : {first_target}, "
          f"average SOG: {first_runs_df['SOG'].mean():.2f}, std SOG: {first_runs_df['SOG'].std():.2f}")
    
    print(f"Mean  {target} on the second group : {last_target}, "
          f"average SOG: {last_runs_df['SOG'].mean():.2f}, std SOG: {last_runs_df['SOG'].std():.2f}")



In [2]:
df = pd.read_csv("all_data.csv")

In [3]:
data_10juin = df[df["ISODateTimeUTC"].str.startswith("2025-06-10")]

## T-test on TWS: runs 1 to 5 (Karl on Levitaz, Gian on Chubanga) vs runs 6 to 10 (masts swapped)

In [4]:
first_runs = ["10_06_Run1","10_06_Run2","10_06_Run3","10_06_Run4","10_06_Run5"]
data_10juin_first_runs = data_10juin[data_10juin["run"].isin(first_runs) ]

In [5]:
last_runs = ["10_06_Run6","10_06_Run7","10_06_Run8","10_06_Run9","10_06_Run10"]
data_10juin_last_runs = data_10juin[data_10juin["run"].isin(last_runs) ]

In [6]:
t_test(data_10juin_first_runs,data_10juin_last_runs, target="TWS")
print_run_stats("Runs 1 to 5 VS Runs 6 to 10:", data_10juin_first_runs, data_10juin_last_runs, target="TWS")

T-statistic: -208.083, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

 Runs 1 to 5 VS Runs 6 to 10:
Mean TWS on the first group : 6.19, average SOG: 24.06, std SOG: 2.19
Mean  TWS on the second group : 8.06, average SOG: 24.57, std SOG: 2.16


## T-test: Karl on Levitaz vs Karl on Chubanga

In [7]:
only_karl_first_runs_levi = data_10juin_first_runs[
    (data_10juin_first_runs["boat_name"] == "Karl Maeder") |
    ((data_10juin_first_runs["boat_name"] == "SenseBoard") & 
     (data_10juin_first_runs["opponent_name"] == "Gian Stragiotti"))
]
only_karl_first_runs_levi.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
62735,2025-06-10T12:39:32.938Z,1749559000.0,45.6,45.6,43.529888,43.52989,43.529884,43.529878,,83.3,...,54.391,Levi,1.921796,1.118719,-0.240491,4.2,4.7,83.3,8.9,92.2
67413,2025-06-10T12:56:20.553Z,1749560000.0,46.6,46.6,43.532329,43.532331,43.532325,43.532318,,85.9,...,53.319,Levi,18.109941,-2.603025,-14.41707,4.5,5.7,85.9,10.2,96.1
59591,2025-06-10T12:29:21.461Z,1749559000.0,48.9,48.9,43.531236,43.531234,43.53124,43.531246,,89.4,...,73.306,Levi,-4.81349,-7.419819,-8.825811,3.9,5.0,89.4,8.9,98.3
66603,2025-06-10T12:53:38.153Z,1749560000.0,55.0,55.0,43.53132,43.531319,43.531324,43.53133,,120.459,...,51.991,Levi,17.418261,-10.113511,1.450612,3.6,6.7,120.459,10.3,130.759
68558,2025-06-10T13:02:41.857Z,1749561000.0,56.7,56.7,43.533708,43.533706,43.533712,43.533718,,121.997,...,61.199,Levi,0.315863,1.971176,1.723912,5.1,8.7,121.997,13.8,135.797


In [8]:
only_karl_last_runs_chub = data_10juin_last_runs[
    (data_10juin_last_runs["boat_name"] == "Karl Maeder") |
    ((data_10juin_last_runs["boat_name"] == "SenseBoard") & 
     (data_10juin_last_runs["opponent_name"] == "Gian Stragiotti"))
]
only_karl_last_runs_chub.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
79442,2025-06-10T13:50:10.056Z,1749563000.0,57.6,57.6,43.529238,43.529236,43.529242,43.529248,,92.4,...,67.905,Chub,2.286412,6.301976,6.337387,4.9,7.1,92.4,12.0,104.4
70437,2025-06-10T13:14:03.544Z,1749561000.0,62.4,62.4,43.5305,43.530498,43.530504,43.53051,,107.4,...,59.901,Chub,12.760132,-19.958349,-7.732247,5.8,6.5,107.4,12.3,119.7
79089,2025-06-10T13:49:34.744Z,1749563000.0,57.0,57.0,43.532813,43.532811,43.532817,43.532823,,113.8,...,67.905,Chub,3.192359,-4.19092,-1.259104,6.101,6.4,113.8,12.501,126.301
79224,2025-06-10T13:49:48.263Z,1749563000.0,58.6,58.6,43.531438,43.531436,43.531442,43.531448,,103.2,...,67.905,Chub,1.300818,-0.458163,0.466044,3.339,7.1,103.2,10.439,113.639
70121,2025-06-10T13:13:31.943Z,1749561000.0,64.7,64.7,43.533687,43.533685,43.533691,43.533697,,131.8,...,59.901,Chub,4.796879,-9.105626,-3.91708,9.4,9.8,131.8,19.2,151.0


In [9]:
t_test(only_karl_first_runs_levi,only_karl_last_runs_chub) #general
print("\nUpwind and downwind for Karl:")
print_run_stats("Karl on Levi VS Karl on Chub:", only_karl_first_runs_levi, only_karl_last_runs_chub, target="mast_brand")

T-statistic: -15.956, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

Upwind and downwind for Karl:

 Karl on Levi VS Karl on Chub:
Mean mast_brand on the first group : Levi, average SOG: 23.80, std SOG: 2.03
Mean  mast_brand on the second group : Chub, average SOG: 24.44, std SOG: 2.17


In [10]:
only_karl_first_runs_levi_upwind = only_karl_first_runs_levi[only_karl_first_runs_levi["TWA"]>0]
only_karl_last_runs_chub_upwind = only_karl_last_runs_chub[only_karl_last_runs_chub["TWA"]>0]
# upwind
print("\nUpwind for Karl:")
t_test(only_karl_first_runs_levi_upwind,only_karl_last_runs_chub_upwind)
print_run_stats("Karl on Levi upwind VS Karl on Chub upwind:", only_karl_first_runs_levi_upwind, only_karl_last_runs_chub_upwind, target="mast_brand")


Upwind for Karl:
T-statistic: -44.815, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

 Karl on Levi upwind VS Karl on Chub upwind:
Mean mast_brand on the first group : Levi, average SOG: 22.12, std SOG: 0.73
Mean  mast_brand on the second group : Chub, average SOG: 22.84, std SOG: 0.54


In [11]:
only_karl_first_runs_levi_downwind = only_karl_first_runs_levi[only_karl_first_runs_levi["TWA"] <= 0]
only_karl_last_runs_chub_downwind = only_karl_last_runs_chub[only_karl_last_runs_chub["TWA"] <= 0]
#downwind
print("\nDownwind for Karl:")
t_test(only_karl_first_runs_levi_downwind,only_karl_last_runs_chub_downwind)
print_run_stats("Karl on Levi downwind VS Karl on Chub downwind:", only_karl_first_runs_levi_downwind, only_karl_last_runs_chub_downwind, target="mast_brand")


Downwind for Karl:
T-statistic: -56.835, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

 Karl on Levi downwind VS Karl on Chub downwind:
Mean mast_brand on the first group : Levi, average SOG: 25.86, std SOG: 0.91
Mean  mast_brand on the second group : Chub, average SOG: 27.20, std SOG: 0.56


## T-test: Gian on Chubanga vs Gian on Levitaz

In [12]:
only_gian_first_runs_chub = data_10juin_first_runs[
    (data_10juin_first_runs["boat_name"] == "Gian Stragiotti") |
    ((data_10juin_first_runs["boat_name"] == "SenseBoard") & 
     (data_10juin_first_runs["opponent_name"] == "Karl Maeder"))
]
only_gian_first_runs_chub.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
66871,2025-06-10T12:56:19.654Z,1749560000.0,60.2,60.2,43.532396,43.532398,43.532392,43.532386,,122.4,...,53.319,Chub,20.196352,-2.822637,-16.003636,8.3,11.4,122.4,19.7,142.1
64819,2025-06-10T12:47:28.651Z,1749560000.0,54.3,54.3,43.531968,43.53197,43.531964,43.531957,,108.233,...,48.401,Chub,-10.307319,4.463205,10.166821,6.4,10.9,108.233,17.3,125.533
59933,2025-06-10T12:31:31.545Z,1749559000.0,55.1,55.1,43.530947,43.530949,43.530943,43.530937,1.0,114.3,...,54.787,Chub,-9.088905,-6.269953,1.004706,9.1,10.0,114.3,19.1,133.4
59864,2025-06-10T12:31:24.651Z,1749559000.0,53.8,53.8,43.530129,43.530131,43.530125,43.530118,1.0,119.8,...,54.787,Chub,-5.588834,-5.28433,-0.244727,8.854,9.5,119.8,18.354,138.154
58951,2025-06-10T12:29:30.755Z,1749559000.0,61.3,61.3,43.530313,43.530311,43.530317,43.530323,1.0,114.8,...,73.306,Chub,-8.690814,-8.11726,-11.712729,6.3,9.0,114.8,15.3,130.1


In [13]:
only_gian_last_runs_levi = data_10juin_last_runs[
    (data_10juin_last_runs["boat_name"] == "Gian Stragiotti") |
    ((data_10juin_last_runs["boat_name"] == "SenseBoard") & 
     (data_10juin_last_runs["opponent_name"] == "Karl Maeder"))
]
only_gian_last_runs_levi.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
72631,2025-06-10T13:22:28.356Z,1749562000.0,55.7,55.7,43.535907,43.535905,43.535911,43.535917,,130.4,...,67.8,Levi,-0.342633,1.303837,0.660641,7.6,7.6,130.4,15.2,145.6
74949,2025-06-10T13:32:47.858Z,1749562000.0,59.3,59.3,43.535864,43.535862,43.535868,43.535874,,123.4,...,62.598,Levi,2.440765,-0.192227,1.559037,5.5,7.0,123.4,12.5,135.9
75353,2025-06-10T13:33:28.255Z,1749562000.0,60.9,60.9,43.531778,43.531776,43.531782,43.531788,,134.1,...,62.598,Levi,9.634555,10.747389,14.422916,4.737,6.1,134.1,10.837,144.937
73172,2025-06-10T13:23:22.453Z,1749562000.0,48.6,48.6,43.530659,43.530657,43.530663,43.530669,,126.551,...,67.8,Levi,-7.676137,-10.176634,-12.746664,7.7,8.821,126.551,16.521,143.072
75256,2025-06-10T13:33:18.556Z,1749562000.0,60.5,60.5,43.532756,43.532755,43.532761,43.532766,,132.1,...,62.598,Levi,4.947007,8.134794,9.280865,11.6,12.5,132.1,24.1,156.2


In [14]:
t_test(only_gian_first_runs_chub,only_gian_last_runs_levi) #GENERAL
print("\nUpwind and downwind for Gian:")
print_run_stats("Gian on chub VS Gian on levi:", only_gian_first_runs_chub, only_gian_last_runs_levi, target="mast_brand")

T-statistic: -8.918, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

Upwind and downwind for Gian:

 Gian on chub VS Gian on levi:
Mean mast_brand on the first group : Chub, average SOG: 24.32, std SOG: 2.30
Mean  mast_brand on the second group : Levi, average SOG: 24.70, std SOG: 2.14


In [15]:
only_gian_first_runs_chub_upwind = only_gian_first_runs_chub[only_gian_first_runs_chub["TWA"]>0]
only_gian_last_runs_levi_upwind = only_gian_last_runs_levi[only_gian_last_runs_levi["TWA"]>0]
print("\nUpwind for Gian:")
t_test(only_gian_first_runs_chub_upwind,only_gian_last_runs_levi_upwind) #upwind
print_run_stats("Gian on chub upwind VS Gian on levi upwind:", only_gian_first_runs_chub_upwind, only_gian_last_runs_levi_upwind, target="mast_brand")


Upwind for Gian:
T-statistic: -35.891, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

 Gian on chub upwind VS Gian on levi upwind:
Mean mast_brand on the first group : Chub, average SOG: 22.42, std SOG: 0.98
Mean  mast_brand on the second group : Levi, average SOG: 23.13, std SOG: 0.55


In [16]:
only_gian_first_runs_chub_downwind = only_gian_first_runs_chub[only_gian_first_runs_chub["TWA"] <= 0]
only_gian_last_runs_levi_downwind = only_gian_last_runs_levi[only_gian_last_runs_levi["TWA"] <= 0]
print("\nDownwind for Gian:")
t_test(only_gian_first_runs_chub_downwind,only_gian_last_runs_levi_downwind) #downwind
print_run_stats("Gian on chub downwind VS Gian on levi downwind:", only_gian_first_runs_chub_downwind, only_gian_last_runs_levi_downwind, target="mast_brand")


Downwind for Gian:
T-statistic: -32.900, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

 Gian on chub downwind VS Gian on levi downwind:
Mean mast_brand on the first group : Chub, average SOG: 26.65, std SOG: 0.86
Mean  mast_brand on the second group : Levi, average SOG: 27.41, std SOG: 0.58
