# weight_ttest – Summary

This notebook applies **independent-sample t-tests** to assess whether boat weight differences between runs have a statistically significant impact on performance metrics, primarily **SOG** (Speed Over Ground).  
The analysis uses telemetry from `all_data.csv` filtered for **June 9, 2025 runs**.

---

## Inputs
- **Data**: `all_data.csv` containing time-series telemetry across runs.  
- **Helper function**:  
  - `t_test(df1, df2, target="SOG")`: runs a two-sample t-test on the specified column, prints the t-statistic and p-value, and states whether the difference is significant (`p < 0.05`).
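
As an illustration of what such a helper does, here is a minimal sketch of a variant that returns its results instead of printing them. It is not the notebook's function: the name `welch_t_test`, the `alpha` parameter, and the sample values are assumptions made for this example. It also passes `equal_var=False` to `scipy.stats.ttest_ind`, which selects Welch's t-test (no equal-variance assumption), whereas the notebook's helper uses the default pooled-variance test.

```python
import pandas as pd
from scipy import stats


def welch_t_test(df1, df2, target="SOG", alpha=0.05):
    """Hypothetical variant of t_test: Welch's test, results returned as a dict."""
    a = df1[target].dropna()
    b = df2[target].dropna()
    # equal_var=False switches ttest_ind from the pooled-variance test to Welch's test
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
    return {
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Made-up SOG values, only to exercise the function
heavy = pd.DataFrame({"SOG": [22.1, 22.4, 22.3, 22.6, None]})
light = pd.DataFrame({"SOG": [23.0, 23.2, 22.9, 23.4, 23.1]})

result = welch_t_test(heavy, light)
print(result)
```

Returning a dictionary makes it easier to collect several comparisons into one table than parsing printed text.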

---

## Workflow

### Step 1: Load & filter data
- Restrict dataset to rows with timestamp starting `"2025-06-09"`.  
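
A minimal sketch of this filter on a made-up frame (the rows below are illustrative and do not come from `all_data.csv`). The `na=False` argument is an addition for this example: it treats rows with a missing timestamp as non-matching, so the boolean mask contains no missing values.

```python
import pandas as pd

# Illustrative rows only; the column name matches the notebook
df = pd.DataFrame({
    "ISODateTimeUTC": [
        "2025-06-08T12:00:00.000Z",
        "2025-06-09T12:46:17.757Z",
        None,
        "2025-06-09T13:25:23.960Z",
    ],
    "SOG": [21.0, 23.9, 22.0, 24.3],
})

# Keep rows whose ISO timestamp string starts with the target date
mask = df["ISODateTimeUTC"].str.startswith("2025-06-09", na=False)
data_9juin = df[mask]
print(data_9juin)
```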

### Step 2: Define groups of runs
- **First runs (5–7)**: Karl carries the extra weight.  
- **Last runs (8–11)**: Gian carries the extra 6 kg.  
- Subsets created: `data_9juin_first_runs`, `data_9juin_last_runs`.
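
The grouping can be sketched as follows. The run labels follow the `09_06_RunN` pattern used in the notebook; the small frame and its `TWS` values are made up for the example.

```python
import pandas as pd

# Run labels in the notebook's naming pattern
first_runs = [f"09_06_Run{n}" for n in (5, 6, 7)]
last_runs = [f"09_06_Run{n}" for n in range(8, 12)]

# Illustrative data; Run4 belongs to neither group
data_9juin = pd.DataFrame({
    "run": ["09_06_Run5", "09_06_Run8", "09_06_Run11", "09_06_Run4"],
    "TWS": [7.2, 7.5, 7.6, 7.0],
})

# isin() keeps only the rows whose run label is in the given list
data_9juin_first_runs = data_9juin[data_9juin["run"].isin(first_runs)]
data_9juin_last_runs = data_9juin[data_9juin["run"].isin(last_runs)]

print(len(data_9juin_first_runs), len(data_9juin_last_runs))
```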

### Step 3: Initial t-test (wind conditions)
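- Compare **TWS** (True Wind Speed) between early (5–7) and later (8–11) runs to check whether wind conditions are comparable. In the recorded output the difference is significant (mean TWS 7.26 vs 7.54), so the two groups did not sail in identical wind.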

### Step 4: Karl heavy vs Karl light
- Subset only Karl’s boat (or SenseBoard when opponent is Gian).  
- Run t-tests on **SOG**:
  - General (all legs).  
  - Upwind only (`TWA > 0`).  
  - Downwind only (`TWA ≤ 0`).  
- For each case: report boat weight, mean SOG, std SOG.
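
The per-sailor subset combines two conditions: rows logged under the sailor's own name, and `SenseBoard` rows whose opponent is the other sailor. A minimal sketch, assuming a hypothetical helper `select_sailor` (the notebook writes the mask inline) and made-up rows:

```python
import pandas as pd


def select_sailor(df, sailor, opponent):
    """Hypothetical helper: rows for `sailor`, including SenseBoard rows sailed against `opponent`."""
    own_boat = df["boat_name"] == sailor
    senseboard_vs_opponent = (df["boat_name"] == "SenseBoard") & (
        df["opponent_name"] == opponent
    )
    return df[own_boat | senseboard_vs_opponent]


# Illustrative rows; the opponent_name values are assumptions for this example
runs = pd.DataFrame({
    "boat_name": ["Karl Maeder", "SenseBoard", "SenseBoard", "Gian Stragiotti"],
    "opponent_name": ["Gian Stragiotti", "Gian Stragiotti", "Karl Maeder", "Karl Maeder"],
    "SOG": [23.9, 24.1, 24.3, 24.8],
})

karl = select_sailor(runs, "Karl Maeder", "Gian Stragiotti")
gian = select_sailor(runs, "Gian Stragiotti", "Karl Maeder")
print(karl.index.tolist(), gian.index.tolist())
```

The parentheses around each comparison matter, because `&` and `|` bind more tightly than `==` in Python.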

### Step 5: Gian light vs Gian heavy
- Subset only Gian’s boat (or SenseBoard when opponent is Karl).  
- Run t-tests on **SOG**:
  - General.  
  - Upwind only.  
  - Downwind only.  
- For each case: report boat weight, mean SOG, std SOG.
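
The upwind/downwind breakdown used in Steps 4 and 5 can be sketched as below. The sign convention (`TWA > 0` is upwind) is taken from the notebook as is; the helper names `split_by_twa` and `summarize` and the sample values are assumptions for this example. Rows with a missing `TWA` satisfy neither condition, so they drop out of both subsets.

```python
import pandas as pd


def split_by_twa(df):
    """Split rows by the notebook's convention: TWA > 0 is upwind, TWA <= 0 is downwind."""
    upwind = df[df["TWA"] > 0]
    downwind = df[df["TWA"] <= 0]
    return upwind, downwind


def summarize(df):
    """Descriptive statistics reported for each group."""
    return {
        "boat_weight": df["boat_weight"].mean(),
        "mean_sog": df["SOG"].mean(),
        "std_sog": df["SOG"].std(),
    }


# Made-up rows; the last one has no TWA and lands in neither subset
group = pd.DataFrame({
    "TWA": [45.0, 50.0, -140.0, -150.0, None],
    "SOG": [22.5, 22.9, 27.0, 26.8, 24.0],
    "boat_weight": [106.975] * 5,
})

upwind, downwind = split_by_twa(group)
print("upwind:", summarize(upwind))
print("downwind:", summarize(downwind))
```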

---

## Output
- Printed results for each t-test: **t-statistic**, **p-value**, interpretation of significance.  
- Descriptive stats: mean and std of `SOG`, average `boat_weight` for each group.  
- Group comparisons structured as:  
  - **Karl**: heavy (runs 5–7) vs light (runs 8–11).  
  - **Gian**: light (runs 5–7) vs heavy (runs 8–11).  
  - Separate breakdowns for **upwind** and **downwind**.
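
When reading the printed results, note that the sign of the t-statistic from `scipy.stats.ttest_ind(a, b)` follows the sign of `mean(a) - mean(b)`, so a negative value means the first-runs group was slower on average. The sketch below checks this against the means and t-statistics printed by the notebook, rounded to three decimals; the dictionary layout is just a convenience for this example.

```python
# (mean SOG first runs, mean SOG last runs, t-statistic), rounded from the notebook output
reported = {
    "Karl general": (23.947, 24.320, -7.879),
    "Karl upwind": (22.722, 22.744, -1.138),
    "Karl downwind": (27.032, 26.325, 18.894),
    "Gian general": (24.236, 24.826, -11.650),
    "Gian upwind": (22.981, 23.180, -8.572),
    "Gian downwind": (27.406, 26.919, 10.288),
}

differences = {}
for label, (mean_first, mean_last, t_stat) in reported.items():
    # Difference in mean SOG, first-runs group minus last-runs group
    differences[label] = mean_first - mean_last
    print(f"{label}: first - last = {differences[label]:+.3f}, t = {t_stat:+.3f}")
```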

---

## Notes
- Significance threshold: `p < 0.05`.  
- If a difference is not significant, the data from the two weight conditions are **combined**; if it is significant, the data stay split by weight condition.  
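
The decision rule can be written as a small function. This is a sketch: the name `weight_decision` and its return labels are assumptions for the example. The two p-values are the ones the notebook prints for Karl (general and upwind).

```python
def weight_decision(p_value, alpha=0.05):
    """Return 'split' when the difference is significant at level alpha, else 'combined'."""
    return "split" if p_value < alpha else "combined"


# p-values as printed in the notebook output for Karl
reported_p = {
    "Karl general": 0.000000000000004,
    "Karl upwind": 0.255372507601836,
}

decisions = {label: weight_decision(p) for label, p in reported_p.items()}
print(decisions)
```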


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import scipy.stats as stats

def t_test(df1, df2, target="SOG"):
    """Run an independent two-sample t-test on `target` and print the result.

    Missing values are dropped from each group before testing.
    """
    t_stat, p_value = stats.ttest_ind(df1[target].dropna(), df2[target].dropna())
    print(f"T-statistic: {t_stat:.3f}, p-value: {p_value:.15f}")

    # If p-value is less than 0.05, the difference is statistically significant
    if p_value < 0.05:
        print("The difference is statistically significant, keeping data split.")
    else:
        print("The difference is not statistically significant, keeping data combined.")


In [2]:
df = pd.read_csv("all_data.csv")

In [3]:
data_9juin = df[df["ISODateTimeUTC"].str.startswith("2025-06-09")]

## T-test on TWS: runs 5–7 (Karl holds the weights) vs runs 8–11 (Gian holds 6 kg)

In [4]:
first_runs = ["09_06_Run5","09_06_Run6","09_06_Run7"]
data_9juin_first_runs = data_9juin[data_9juin["run"].isin(first_runs) ]

In [5]:
last_runs = ["09_06_Run8","09_06_Run9","09_06_Run10","09_06_Run11"]
data_9juin_last_runs = data_9juin[data_9juin["run"].isin(last_runs) ]

In [6]:
t_test(data_9juin_first_runs,data_9juin_last_runs, target="TWS")
print(data_9juin_first_runs["TWS"].mean(),data_9juin_last_runs["TWS"].mean())
print(f"Average TWS in Group 1: {data_9juin_first_runs['TWS'].mean()},STD TWS in Group 1: {data_9juin_first_runs['TWS'].std()}")
print(f"Average TWS in Group 2: {data_9juin_last_runs['TWS'].mean()},STD TWS in Group 2: {data_9juin_last_runs['TWS'].std()}")

T-statistic: -20.004, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.
7.2590247128437175 7.541569498486814
Average TWS in Group 1: 7.2590247128437175,STD TWS in Group 1: 0.909090314209303
Average TWS in Group 2: 7.541569498486814,STD TWS in Group 2: 0.7956380127932735


## T-test: Karl heavy vs Karl light

In [7]:
only_karl_first_runs_heavy = data_9juin_first_runs[
    (data_9juin_first_runs["boat_name"] == "Karl Maeder") |
    ((data_9juin_first_runs["boat_name"] == "SenseBoard") & 
     (data_9juin_first_runs["opponent_name"] == "Gian Stragiotti"))
]
only_karl_first_runs_heavy.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
44961,2025-06-09T12:46:17.757Z,1749473000.0,63.9,63.9,43.505965,43.505963,43.505969,43.505975,1.0,118.0,...,68.0,Levi,2.583673,2.980761,3.938635,5.8,7.4,118.0,13.2,131.2
47166,2025-06-09T12:54:19.652Z,1749474000.0,58.0,58.0,43.507066,43.507065,43.507071,43.507077,,130.3,...,68.88,Levi,1.177229,-1.056133,-0.029322,6.8,7.1,130.3,13.9,144.2
47290,2025-06-09T12:54:32.064Z,1749474000.0,52.4,52.4,43.505802,43.5058,43.505806,43.505813,,115.8,...,68.88,Levi,1.259265,0.300012,1.043038,6.6,6.813,115.8,13.413,129.213
45270,2025-06-09T12:46:48.652Z,1749473000.0,56.0,56.0,43.502763,43.502761,43.502767,43.502773,1.0,125.5,...,68.0,Levi,13.133963,3.722988,11.680665,5.2,6.0,125.5,11.2,136.7
46193,2025-06-09T12:48:47.360Z,1749473000.0,48.8,48.8,43.503882,43.503884,43.503878,43.503872,1.0,93.6,...,47.292,Levi,1.153059,0.004111,-0.849948,5.5,8.5,93.6,14.0,107.6


In [8]:
only_karl_last_runs_light = data_9juin_last_runs[
    (data_9juin_last_runs["boat_name"] == "Karl Maeder") |
    ((data_9juin_last_runs["boat_name"] == "SenseBoard") & 
     (data_9juin_last_runs["opponent_name"] == "Gian Stragiotti"))
]
only_karl_last_runs_light.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
54174,2025-06-09T13:25:23.960Z,1749476000.0,52.3,52.3,43.503637,43.503635,43.503641,43.503648,1.0,105.7,...,61.205,Levi,9.78447,3.809893,8.89984,6.491,6.7,105.7,13.191,118.891
57486,2025-06-09T13:37:37.052Z,1749476000.0,46.0,46.0,43.504658,43.50466,43.504654,43.504648,,81.3,...,52.596,Levi,-1.716315,-1.310598,0.565554,5.1,7.3,81.3,12.4,93.7
49332,2025-06-09T13:05:52.164Z,1749474000.0,67.3,67.3,43.505793,43.505791,43.505797,43.505803,,122.3,...,61.789,Levi,9.801804,0.950535,6.459807,10.0,10.379,122.3,20.379,142.679
55419,2025-06-09T13:28:22.561Z,1749476000.0,48.6,48.6,43.508165,43.508167,43.508161,43.508155,,116.8,...,49.802,Levi,6.98148,3.033934,-3.551149,6.9,8.7,116.8,15.6,132.4
49309,2025-06-09T13:05:49.852Z,1749474000.0,57.1,57.1,43.506045,43.506043,43.506049,43.506055,,108.4,...,61.789,Levi,7.994504,1.229709,5.616045,9.8,9.8,108.4,19.6,128.0


In [9]:
t_test(only_karl_first_runs_heavy,only_karl_last_runs_light) #general

print("\nUpwind and downwind for Karl:")
print(f"\nWeight of Karl on the first runs: {only_karl_first_runs_heavy['boat_weight'].mean()}, average SOG: {only_karl_first_runs_heavy['SOG'].mean()}, std SOG: {only_karl_first_runs_heavy['SOG'].std()}")
print(f"Weight of Karl on the last runs: {only_karl_last_runs_light['boat_weight'].mean()}, average SOG: {only_karl_last_runs_light['SOG'].mean()}, std SOG: {only_karl_last_runs_light['SOG'].std()}")

T-statistic: -7.879, p-value: 0.000000000000004
The difference is statistically significant, keeping data split.

Upwind and downwind for Karl:

Weight of Karl on the first runs: 106.97500000000001, average SOG: 23.94727146332986, std SOG: 2.042511601997199
Weight of Karl on the last runs: 100.975, average SOG: 24.320151187904965, std SOG: 1.962379480136006


In [10]:
only_karl_first_runs_heavy_upwind = only_karl_first_runs_heavy[only_karl_first_runs_heavy["TWA"]>0]
only_karl_last_runs_light_upwind = only_karl_last_runs_light[only_karl_last_runs_light["TWA"]>0]

t_test(only_karl_first_runs_heavy_upwind,only_karl_last_runs_light_upwind) #upwind

print("\nUpwind for Karl:")
print(f"Weight of Karl on the first runs: {only_karl_first_runs_heavy_upwind['boat_weight'].mean()}, average SOG: {only_karl_first_runs_heavy_upwind['SOG'].mean()}, std SOG: {only_karl_first_runs_heavy_upwind['SOG'].std()}")
print(f"Weight of Karl on the last runs: {only_karl_last_runs_light_upwind['boat_weight'].mean()}, average SOG: {only_karl_last_runs_light_upwind['SOG'].mean()}, std SOG: {only_karl_last_runs_light_upwind['SOG'].std()}")

T-statistic: -1.138, p-value: 0.255372507601836
The difference is not statistically significant, keeping data combined.

Upwind for Karl:
Weight of Karl on the first runs: 106.97499999999997, average SOG: 22.72190383681399, std SOG: 0.5513928546466119
Weight of Karl on the last runs: 100.97500000000001, average SOG: 22.74386574074074, std SOG: 0.725225703905136


In [11]:
only_karl_first_runs_heavy_downwind = only_karl_first_runs_heavy[only_karl_first_runs_heavy["TWA"] <= 0]
only_karl_last_runs_light_downwind = only_karl_last_runs_light[only_karl_last_runs_light["TWA"] <= 0]
t_test(only_karl_first_runs_heavy_downwind, only_karl_last_runs_light_downwind)  # downwind

print("\nDownwind for Karl:")
print(f"Weight of Karl on the first runs: {only_karl_first_runs_heavy_downwind['boat_weight'].mean()}, average SOG: {only_karl_first_runs_heavy_downwind['SOG'].mean()}, std SOG: {only_karl_first_runs_heavy_downwind['SOG'].std()}")
print(f"Weight of Karl on the last runs: {only_karl_last_runs_light_downwind['boat_weight'].mean()}, average SOG: {only_karl_last_runs_light_downwind['SOG'].mean()}, std SOG: {only_karl_last_runs_light_downwind['SOG'].std()}")


T-statistic: 18.894, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

Downwind for Karl:
Weight of Karl on the first runs: 106.97499999999998, average SOG: 27.031662591687045, std SOG: 0.7814322747955662
Weight of Karl on the last runs: 100.97499999999998, average SOG: 26.32492639842983, std SOG: 0.9483400838098708


## T-test: Gian light vs Gian heavy

In [12]:
only_gian_first_runs_light = data_9juin_first_runs[
    (data_9juin_first_runs["boat_name"] == "Gian Stragiotti") |
    ((data_9juin_first_runs["boat_name"] == "SenseBoard") & 
     (data_9juin_first_runs["opponent_name"] == "Karl Maeder"))
]
only_gian_first_runs_light.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
45738,2025-06-09T12:46:27.360Z,1749473000.0,51.3,51.3,43.504828,43.504826,43.504832,43.504838,1.0,3.65,...,68.0,Levi,6.200108,3.654097,6.943548,3.65,4.1,111.3,7.75,119.05
44572,2025-06-09T12:38:39.559Z,1749473000.0,55.5,55.5,43.502969,43.502968,43.502974,43.50298,1.0,7.2,...,68.713,Levi,-2.362671,-0.114868,-1.656096,7.2,8.4,111.571,15.6,127.171
47885,2025-06-09T12:54:22.656Z,1749474000.0,57.0,57.0,43.506842,43.506841,43.506847,43.506853,1.0,4.6,...,68.88,Levi,0.662209,-0.837792,-0.202208,4.6,8.9,131.043,13.5,144.543
48090,2025-06-09T12:54:43.160Z,1749474000.0,58.1,58.1,43.504779,43.504777,43.504783,43.50479,1.0,3.4,...,68.88,Levi,3.19183,-5.574463,-2.198438,3.4,5.921,112.3,9.321,121.621
45987,2025-06-09T12:46:52.257Z,1749473000.0,47.3,47.3,43.502131,43.502129,43.502136,43.502142,1.0,2.6,...,68.0,Levi,17.078728,3.11613,13.857196,2.6,6.106,121.8,8.706,130.506


In [13]:
only_gian_last_runs_heavy = data_9juin_last_runs[
    (data_9juin_last_runs["boat_name"] == "Gian Stragiotti") |
    ((data_9juin_last_runs["boat_name"] == "SenseBoard") & 
     (data_9juin_last_runs["opponent_name"] == "Karl Maeder"))
]
only_gian_last_runs_heavy.sample(5)

Unnamed: 0,ISODateTimeUTC,SecondsSince1970,Heel_Abs,Heel_Lwd,Lat,LatBow,LatCenter,LatStern,Leg,Line_C,...,interval_duration,mast_brand,gain_forward,gain_lateral,gain_vmg,Line_R2,Line_L2,Line_C2,side_line2,total_line2
54796,2025-06-09T13:25:24.854Z,1749476000.0,49.1,49.1,43.503351,43.503349,43.503355,43.503361,,7.7,...,61.205,Levi,11.165386,3.622768,9.58315,7.7,9.1,118.3,16.8,135.1
52225,2025-06-09T13:16:06.160Z,1749475000.0,47.4,47.4,43.505938,43.505936,43.505942,43.505948,1.0,5.9,...,68.692,Levi,-3.464844,2.040691,-1.346776,5.9,7.4,112.9,13.3,126.2
56820,2025-06-09T13:34:33.956Z,1749476000.0,43.6,43.6,43.506543,43.506541,43.506547,43.506554,1.0,6.36,...,67.09,Levi,2.386351,-4.985006,-2.326895,6.36,8.2,120.501,14.56,135.061
49856,2025-06-09T13:05:42.654Z,1749474000.0,60.4,60.4,43.506638,43.506636,43.506642,43.506649,,10.4,...,61.789,Levi,7.868382,0.504488,4.904599,8.2,10.4,134.1,18.6,152.7
53431,2025-06-09T13:19:06.160Z,1749475000.0,47.1,47.1,43.505457,43.505459,43.505453,43.505447,,7.3,...,53.803,Levi,-10.944547,-1.880004,7.766433,7.3,8.9,98.3,16.2,114.5


In [14]:
t_test(only_gian_first_runs_light,only_gian_last_runs_heavy) #general

print("\nUpwind and downwind for Gian:")
print(f"\nWeight of Gian on the first runs: {only_gian_first_runs_light['boat_weight'].mean()}, average SOG: {only_gian_first_runs_light['SOG'].mean()}, std SOG: {only_gian_first_runs_light['SOG'].std()}")
print(f"Weight of Gian on the last runs: {only_gian_last_runs_heavy['boat_weight'].mean()}, average SOG: {only_gian_last_runs_heavy['SOG'].mean()}, std SOG: {only_gian_last_runs_heavy['SOG'].std()}")

T-statistic: -11.650, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

Upwind and downwind for Gian:

Weight of Gian on the first runs: 109.08999999999999, average SOG: 24.236319275008714, std SOG: 2.135963842904484
Weight of Gian on the last runs: 115.08999999999999, average SOG: 24.825573344872346, std SOG: 2.1231026500669974


In [15]:
only_gian_first_runs_light_upwind = only_gian_first_runs_light[only_gian_first_runs_light["TWA"]>0]
only_gian_last_runs_heavy_upwind = only_gian_last_runs_heavy[only_gian_last_runs_heavy["TWA"]>0]

t_test(only_gian_first_runs_light_upwind,only_gian_last_runs_heavy_upwind) #upwind

print("\nUpwind for Gian:")
print(f"Weight of Gian on the first runs: {only_gian_first_runs_light_upwind['boat_weight'].mean()}, average SOG: {only_gian_first_runs_light_upwind['SOG'].mean()}, std SOG: {only_gian_first_runs_light_upwind['SOG'].std()}")
print(f"Weight of Gian on the last runs: {only_gian_last_runs_heavy_upwind['boat_weight'].mean()}, average SOG: {only_gian_last_runs_heavy_upwind['SOG'].mean()}, std SOG: {only_gian_last_runs_heavy_upwind['SOG'].std()}")

T-statistic: -8.572, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

Upwind for Gian:
Weight of Gian on the first runs: 109.08999999999997, average SOG: 22.98097323600973, std SOG: 0.7538625153686097
Weight of Gian on the last runs: 115.09000000000002, average SOG: 23.180486862442038, std SOG: 0.8136933474979365


In [16]:
only_gian_first_runs_light_downwind = only_gian_first_runs_light[only_gian_first_runs_light["TWA"] <= 0]
only_gian_last_runs_heavy_downwind = only_gian_last_runs_heavy[only_gian_last_runs_heavy["TWA"] <= 0]
t_test(only_gian_first_runs_light_downwind, only_gian_last_runs_heavy_downwind)  # downwind

print("\nDownwind for Gian:")
print(f"Weight of Gian on the first runs: {only_gian_first_runs_light_downwind['boat_weight'].mean()}, average SOG: {only_gian_first_runs_light_downwind['SOG'].mean()}, std SOG: {only_gian_first_runs_light_downwind['SOG'].std()}")
print(f"Weight of Gian on the last runs: {only_gian_last_runs_heavy_downwind['boat_weight'].mean()}, average SOG: {only_gian_last_runs_heavy_downwind['SOG'].mean()}, std SOG: {only_gian_last_runs_heavy_downwind['SOG'].std()}")


T-statistic: 10.288, p-value: 0.000000000000000
The difference is statistically significant, keeping data split.

Downwind for Gian:
Weight of Gian on the first runs: 109.08999999999999, average SOG: 27.405528255528253, std SOG: 0.7868109632056288
Weight of Gian on the last runs: 115.08999999999997, average SOG: 26.918731563421826, std SOG: 1.2548178334314914
