# Associativity variability analysis — conclusion

Goal:
- measure how each variability factor (`dtype`, `dist`, `seed`) affects the associativity failure rate
- identify which factor has the largest impact
- recommend settings that minimize variability


In [5]:
```python
!pip install pandas
```

In [6]:
```python
import pandas as pd

df = pd.read_csv("results_associativity.csv")
df.head()
```


| Unnamed: 0 | repetitions | op1 | op2 | dtype | dist | seed | result |
|---|---|---|---|---|---|---|---|
| 0 | 1000 | (x + y) + z | x + (y + z) | float64 | uniform01 | 0 | 0.157 |
| 1 | 1000 | (x + y) + z | x + (y + z) | float64 | uniform01 | 1 | 0.171 |
| 2 | 1000 | (x + y) + z | x + (y + z) | float32 | wide | 0 | 0.208 |
| 3 | 1000 | (x + y) + z | x + (y + z) | float32 | wide | 1 | 0.218 |
| 4 | 1000 | (x + y) + z | x + (y + z) | decimal50 | uniform01 | 0 | 0.0 |
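The `Unnamed: 0` column is the row index that was written into the CSV along with the data. A minimal illustration, using a small in-memory CSV built from the first two rows above as a stand-in for `results_associativity.csv`, shows that `pd.read_csv(..., index_col=0)` restores it as the index instead of loading it as a data column:

```python
import io

import pandas as pd

# Stand-in for results_associativity.csv: first two rows of df.head() above.
# The empty first header field is what pandas turns into "Unnamed: 0".
csv_text = """,repetitions,op1,op2,dtype,dist,seed,result
0,1000,(x + y) + z,x + (y + z),float64,uniform01,0,0.157
1,1000,(x + y) + z,x + (y + z),float64,uniform01,1,0.171
"""

raw = pd.read_csv(io.StringIO(csv_text))                 # index loaded as a column
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)  # index restored

print(list(raw.columns))
print(list(clean.columns))
```

The extra column is harmless for the `groupby` below, since only `dtype`, `dist`, `seed`, and `result` are used.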


In [7]:
```python
# Mean failure rate ("result") for each level of each factor, sorted ascending
summary = {}
for col in ["dtype", "dist", "seed"]:
    summary[col] = df.groupby(col)["result"].mean().sort_values()

summary
```


```
{'dtype': dtype
 decimal50    0.000056
 float32      0.103911
 float64      0.133100
 Name: result, dtype: float64,
 'dist': dist
 uniform_signed    0.038378
 uniform01         0.056667
 wide              0.142022
 Name: result, dtype: float64,
 'seed': seed
 0    0.078289
 1    0.079756
 Name: result, dtype: float64}
```
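The `summary` dict lists group means but does not rank the factors. One simple criterion (a choice made here, not one the notebook states) is the spread of group means within each factor, i.e. max minus min. The sketch rebuilds the printed means by hand so it runs on its own; in the notebook the same comprehension can be applied directly to `summary`. Marginal means can be confounded if the factor levels are not crossed evenly in the CSV, so this is a first-pass ranking only.

```python
import pandas as pd

# Group means copied from the summary output above
summary = {
    "dtype": pd.Series({"decimal50": 0.000056, "float32": 0.103911, "float64": 0.133100}),
    "dist": pd.Series({"uniform_signed": 0.038378, "uniform01": 0.056667, "wide": 0.142022}),
    "seed": pd.Series({0: 0.078289, 1: 0.079756}),
}

# Spread of group means per factor: a larger spread suggests a larger effect
spread = pd.Series(
    {factor: means.max() - means.min() for factor, means in summary.items()}
).sort_values(ascending=False)

print(spread.round(4))
```

With these numbers `dtype` (about 0.133) ranks ahead of `dist` (about 0.104), and `seed` (about 0.0015) is negligible.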

## Conclusion

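From the per-factor mean failure rates above:

**Most important factors:**
1) **dtype**  
→ `decimal50` nearly eliminates failures (0.000056, vs. 0.104 for `float32` and 0.133 for `float64`)  
→ largest spread between group means: about 0.133

2) **distribution (dist)**  
→ `wide` produces the most failures (0.142, vs. 0.038 for `uniform_signed`)  
→ spread between group means: about 0.104

`seed` has almost no influence (0.0783 vs. 0.0798, a spread of about 0.0015).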
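Since these conclusions rest on marginal means, a cross-tabulation of the mean failure rate by `dist` and `dtype` would show whether the `dist` effect holds within each dtype. A minimal sketch with `pandas.DataFrame.pivot_table`; to keep it self-contained it is fed only the five rows visible in `df.head()`, whereas in the notebook it would be called on the full `df` (combinations missing from those five rows show up as NaN):

```python
import pandas as pd

# Only the five rows shown by df.head(); use the full df in the notebook
df = pd.DataFrame({
    "dtype": ["float64", "float64", "float32", "float32", "decimal50"],
    "dist": ["uniform01", "uniform01", "wide", "wide", "uniform01"],
    "seed": [0, 1, 0, 1, 0],
    "result": [0.157, 0.171, 0.208, 0.218, 0.0],
})

# Rows: input distribution, columns: number type, cells: mean failure rate over seeds
table = df.pivot_table(index="dist", columns="dtype", values="result", aggfunc="mean")
print(table)
```

On the full data, a roughly constant gap between `wide` and the other distributions in every dtype column would support treating the two factors separately.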

### Recommended settings for stable results

| Factor | Recommended setting |
|---|---|
| dtype | `decimal50` |
| dist  | avoid `wide`; use `uniform_signed` |
| seed  | fix to a constant (0 or 1) for reproducibility |
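Why `decimal50` helps can be seen on a classic example: in binary floating point the two groupings of 0.1 + 0.2 + 0.3 differ, while Python's `decimal` module at 50 significant digits (assumed here to be what `decimal50` denotes) keeps them equal because every intermediate sum is exact. Decimal arithmetic still rounds once a result needs more than 50 digits, which fits the small but non-zero mean reported for `decimal50`.

```python
from decimal import Decimal, localcontext

x, y, z = 0.1, 0.2, 0.3

# Binary floating point: each addition rounds to the nearest double
float_left = (x + y) + z    # 0.6000000000000001
float_right = x + (y + z)   # 0.6
print("float  :", float_left, float_right, float_left == float_right)

# Decimal with 50 significant digits. Built from strings, so 0.1 is exactly 0.1
# (Decimal(0.1) would carry over the binary representation error).
with localcontext() as ctx:
    ctx.prec = 50
    a, b, c = Decimal("0.1"), Decimal("0.2"), Decimal("0.3")
    dec_left = (a + b) + c
    dec_right = a + (b + c)
print("decimal:", dec_left, dec_right, dec_left == dec_right)
```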

Therefore:

> Disagreement between groups is mainly due to differences in `dtype` and `dist`; the `seed` plays almost no role.
