# Reach-to-Height Ratio Analysis

**Following up on findings from the initial data exploration...**

My first analysis showed that reach has a small positive correlation with win rate. But that doesn't really tell us much, because taller fighters naturally have longer reach. What we really want to know is: **do fighters with proportionally longer arms (relative to their height) have an advantage?**
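To illustrate why raw reach is confounded with height, here is a small sketch on synthetic data (the numbers are made up for illustration, not taken from the fighter dataset). Reach is simulated as height times an independently drawn ratio, so raw reach tracks height closely while the ratio does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic fighters (assumed distributions, for illustration only)
height_cm = rng.normal(178, 8, size=2000)
ratio = rng.normal(1.02, 0.02, size=2000)  # drawn independently of height
reach_cm = height_cm * ratio

# Correlation of height with raw reach, and with the reach/height ratio
corr_reach_height = np.corrcoef(reach_cm, height_cm)[0, 1]
corr_ratio_height = np.corrcoef(reach_cm / height_cm, height_cm)[0, 1]

print(f"corr(reach, height) = {corr_reach_height:.2f}")
print(f"corr(ratio, height) = {corr_ratio_height:.2f}")
```

In real data the ratio may still vary somewhat with height, so this is worth checking rather than assuming.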

This is part of why Floyd Mayweather is such a good boxer (even if not in MMA): he has very long arms for his body size, so he can fight defensively and block punches more easily.

The idea is that if you have long arms relative to your height, you can hit opponents from further away while staying out of their range. This should give you both an offensive advantage (more reach to land strikes) and a defensive advantage (harder for your opponent to hit you).

To test this we will calculate a reach-to-height ratio for each fighter. A ratio of 1.0 would mean your reach equals your height (arm span = height), while anything above 1.0 means you have longer arms relative to your height.
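As a quick worked example of the ratio, the sketch below uses three hypothetical fighters (the names and measurements are invented, not real data). The helper `reach_height_ratio` is only for illustration; the notebook itself computes the ratio directly on the DataFrame columns.

```python
def reach_height_ratio(reach, height):
    """Return reach divided by height; both must use the same unit."""
    if height <= 0:
        raise ValueError("height must be positive")
    return reach / height

# Hypothetical fighters as (reach, height) in inches; not real measurements
examples = {"fighter_a": (72, 68), "fighter_b": (70, 70), "fighter_c": (74, 76)}
ratios = {name: reach_height_ratio(r, h) for name, (r, h) in examples.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Here `fighter_a` has arms longer than their height (ratio above 1.0), `fighter_b` sits exactly at 1.0, and `fighter_c` falls below it.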

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import pearsonr, spearmanr, ttest_ind
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-whitegrid')


## 1. Load and Prepare Data

Same data loading process as before. I need to load the fighter attributes, calculate the reach/height ratio, and then merge with fight records to get win rates.

In [None]:
fighter_attributes = pd.read_csv('data/fighter_attributes.csv')
fighter_history = pd.read_csv('data/fighter_history.csv')
fighter_stats = pd.read_csv('data/fighter_stats.csv')

print(f"Fighter Attributes: {fighter_attributes.shape[0]} fighters")
print(f"Fighter History: {fighter_history.shape[0]} fight records")
print(f"Fighter Stats: {fighter_stats.shape[0]} stat records")
print("")
print("Data loaded!")

Creating the key variable: reach divided by height. As above, a ratio of 1.0 means your wingspan equals your height, and a ratio above 1.0 means longer arms relative to height. Reach and height need to be in the same units for this to work.

In [None]:
# Treat a height of 0 as missing so the ratio doesn't become infinite
valid_height = fighter_attributes['height'].replace(0, np.nan)
fighter_attributes['reach_height_ratio'] = fighter_attributes['reach'] / valid_height

num_valid = fighter_attributes['reach_height_ratio'].notna().sum()
num_total = len(fighter_attributes)
print(f"Fighters with valid reach/height ratio: {num_valid} out of {num_total}")
print("\nReach/Height Ratio Stats:")
fighter_attributes['reach_height_ratio'].describe()

Next I need to calculate win/loss records for each fighter. I'll use `groupby` and `agg` like we learned in the pandas lectures.

In [None]:
# Named aggregation gives each output column an explicit name
fighter_wins_losses = fighter_history.groupby('fighter_id').agg(
    wins=('fight_result', lambda x: (x == 'W').sum()),
    losses=('fight_result', lambda x: (x == 'L').sum()),
    total_fights=('fight_result', 'count'),
    fighter_name=('fighter_name', 'first'),
).reset_index()

# total_fights counts every recorded result, so any result other than 'W' counts as a non-win
fighter_wins_losses['win_rate'] = fighter_wins_losses['wins'] / fighter_wins_losses['total_fights']

print(f"Fighters with records: {len(fighter_wins_losses)}")
print("")
fighter_wins_losses.head()

I also want to grab the detailed fight statistics. For this analysis I specifically want to look at defensive stats (strikes absorbed) to test my hypothesis that longer reach helps with protection.

In [None]:
stat_columns = ['TSL', 'TSA', 'SSL', 'SSA', 'SCBA', 'SCHA', 'SGBA', 'SGHA',
                'SDBA', 'SDHA', 'SDLA', 'KD', 'TDL', 'TDA']

# Career totals per fighter
agg_stats = fighter_stats.groupby('fighter_id')[stat_columns].sum().reset_index()

# NOTE: the per-target columns are treated as strikes absorbed here, while SSA and TDA
# are treated as attempts below. Check the dataset's column definitions to confirm.
absorbed_columns = ['SDBA', 'SDHA', 'SDLA', 'SCBA', 'SCHA', 'SGBA', 'SGHA']
agg_stats['total_strikes_absorbed'] = agg_stats[absorbed_columns].sum(axis=1)

# Replace zero attempts with NaN so accuracy is missing rather than infinite
agg_stats['strike_accuracy'] = agg_stats['SSL'] / agg_stats['SSA'].replace(0, np.nan)
agg_stats['takedown_accuracy'] = agg_stats['TDL'] / agg_stats['TDA'].replace(0, np.nan)

agg_stats.head()

Now merge everything together. I'll only include fighters with at least 3 fights so we have enough data to make the win rate meaningful.

In [None]:
df = fighter_attributes.merge(fighter_wins_losses, on='fighter_id', how='inner')
df = df.merge(agg_stats, on='fighter_id', how='left')

MIN_FIGHTS = 3
df_analysis = df[(df['reach_height_ratio'].notna()) & (df['total_fights'] >= MIN_FIGHTS)].copy()

print(f"Fighters for analysis (min {MIN_FIGHTS} fights, valid ratio): {len(df_analysis)}")
print(f"\nColumns available: {list(df_analysis.columns)}")
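One way to see why a minimum-fight cutoff helps: a win rate measured over n fights has a standard error of roughly sqrt(p(1-p)/n), if fights are treated as independent coin flips (a simplifying assumption). The sketch below is illustrative and not part of the original analysis; `win_rate_standard_error` is a hypothetical helper.

```python
import math

def win_rate_standard_error(p, n_fights):
    """Binomial standard error of a win rate p observed over n_fights."""
    return math.sqrt(p * (1 - p) / n_fights)

# Compare the uncertainty at a few career lengths for a true win rate of 0.5
for n in (3, 10, 30):
    se = win_rate_standard_error(0.5, n)
    print(f"n={n:>2}: standard error = {se:.3f}")
```

With only 3 fights the standard error is about 0.29, so even the cutoff used here leaves individual win rates quite noisy. That is one reason to also look at fighters with 10+ fights.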

## 2. Reach-to-Height Ratio Distribution

Let me start by looking at the distribution of this ratio. From the Anscombe's Quartet example in class, I know I need to **always visualize the data** before doing any statistical analysis!

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

ratio_data = df_analysis['reach_height_ratio']
axes[0].hist(ratio_data, bins=40, color='steelblue', edgecolor='white')
mean_val = df_analysis['reach_height_ratio'].mean()
median_val = df_analysis['reach_height_ratio'].median()
axes[0].axvline(mean_val, color='red', linestyle='--', label=f'Mean: {mean_val:.3f}')
axes[0].axvline(median_val, color='green', linestyle='--', label=f'Median: {median_val:.3f}')
axes[0].set_title('Distribution of Reach-to-Height Ratio')
axes[0].set_xlabel('Reach / Height')
axes[0].set_ylabel('Frequency')
axes[0].legend()

df_analysis.boxplot(column='reach_height_ratio', by='gender', ax=axes[1])
axes[1].set_title('Reach/Height Ratio by Gender')
axes[1].set_xlabel('Gender')
axes[1].set_ylabel('Ratio')
plt.suptitle('')

plt.tight_layout()
plt.show()

print(f"\nA ratio of 1.0 means reach equals height.")
print(f"Ratios > 1.0 indicate longer arms relative to height.")

The distribution looks roughly normal, centered around 1.02-1.03, meaning most fighters have arms slightly longer than their height. Males and females look similar.

Let me check if this varies by weight class.

In [None]:
top_weight_classes = df_analysis['weight_class'].value_counts().head(12).index
df_for_plot = df_analysis[df_analysis['weight_class'].isin(top_weight_classes)]

plt.figure(figsize=(14, 6))
sns.boxplot(data=df_for_plot, x='weight_class', y='reach_height_ratio')
plt.axhline(1.0, color='red', linestyle='--', alpha=0.5, label='Ratio = 1.0')
plt.title('Reach/Height Ratio by Weight Class')
plt.xlabel('Weight Class')
plt.ylabel('Ratio')
plt.xticks(rotation=45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

## 3. Reach/Height Ratio vs Win Rate - The Main Question

Pretty consistent across weight classes, maybe slightly higher for heavier weights. Now the main question - **does this ratio correlate with winning?**

This is basically asking: does having proportionally longer arms give you a fighting advantage?
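Pearson's r measures linear association. As a rough robustness check, one could also compute Spearman's rank correlation, which only assumes a monotonic relationship. The sketch below uses made-up data to show how the two can differ; it is not part of the original analysis.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Made-up monotonic but non-linear relationship (illustration only)
x = np.linspace(0.95, 1.10, 50)
y = np.exp(40 * (x - 0.95))

r, _ = pearsonr(x, y)      # linear association
rho, _ = spearmanr(x, y)   # rank-based (monotonic) association

print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

If the two measures disagree a lot on the real data, that would suggest outliers or a non-linear pattern worth looking at in the scatter plot.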

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].scatter(df_analysis['reach_height_ratio'], df_analysis['win_rate'], alpha=0.3, c='steelblue')

z = np.polyfit(df_analysis['reach_height_ratio'], df_analysis['win_rate'], 1)
p = np.poly1d(z)
x_vals = np.linspace(df_analysis['reach_height_ratio'].min(), df_analysis['reach_height_ratio'].max(), 100)
axes[0].plot(x_vals, p(x_vals), 'r-', linewidth=2, label='Trend')

corr, p_value = pearsonr(df_analysis['reach_height_ratio'], df_analysis['win_rate'])
axes[0].set_title(f'All Fighters\nCorrelation: {corr:.4f} (p={p_value:.4f})')
axes[0].set_xlabel('Reach / Height')
axes[0].set_ylabel('Win Rate')
axes[0].legend()

df_experienced = df_analysis[df_analysis['total_fights'] >= 10]
axes[1].scatter(df_experienced['reach_height_ratio'], df_experienced['win_rate'], alpha=0.4, c='darkgreen')

z2 = np.polyfit(df_experienced['reach_height_ratio'], df_experienced['win_rate'], 1)
p2 = np.poly1d(z2)
x_vals2 = np.linspace(df_experienced['reach_height_ratio'].min(), df_experienced['reach_height_ratio'].max(), 100)
axes[1].plot(x_vals2, p2(x_vals2), 'r-', linewidth=2, label='Trend')

corr2, p_value2 = pearsonr(df_experienced['reach_height_ratio'], df_experienced['win_rate'])
axes[1].set_title(f'Experienced Fighters (10+ fights)\nCorrelation: {corr2:.4f} (p={p_value2:.4f})')
axes[1].set_xlabel('Reach / Height')
axes[1].set_ylabel('Win Rate')
axes[1].legend()

plt.tight_layout()
plt.show()

Hmm, the correlation is pretty weak. Let me try another approach: split fighters into quintiles (5 equal groups) based on their ratio and compare win rates.

I'll use `pd.qcut()` to create equal-sized bins.

In [None]:
quintile_labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']
df_analysis['ratio_quintile'] = pd.qcut(df_analysis['reach_height_ratio'], q=5, labels=quintile_labels)

quintile_summary = df_analysis.groupby('ratio_quintile').agg({
    'win_rate': ['mean', 'std', 'count'],
    'reach_height_ratio': ['min', 'max', 'mean']
})

quintile_summary = quintile_summary.round(4)

print("Win Rate by Reach/Height Ratio Quintile:")
print("="*70)
quintile_summary
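To see what `pd.qcut` does, here is a minimal sketch on 100 evenly spaced made-up ratios (toy values, not the fighter data). Each of the five labeled bins should end up with the same number of values.

```python
import numpy as np
import pandas as pd

# 100 evenly spaced made-up ratios (illustration only)
toy_ratios = pd.Series(np.linspace(0.95, 1.10, 100))
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# qcut picks bin edges at the quantiles, so each bin gets an equal share of rows
toy_quintiles = pd.qcut(toy_ratios, q=5, labels=labels)
bin_counts = toy_quintiles.value_counts().sort_index()

print(bin_counts)
```

With real data, tied values can make the bins slightly uneven, and many identical ratios can cause `qcut` to raise an error about duplicate bin edges unless `duplicates='drop'` is passed.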

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))

means_by_quintile = df_analysis.groupby('ratio_quintile')['win_rate'].mean()
std_by_quintile = df_analysis.groupby('ratio_quintile')['win_rate'].std()
counts_by_quintile = df_analysis.groupby('ratio_quintile')['win_rate'].count()

error_bars = std_by_quintile / np.sqrt(counts_by_quintile)

bars = ax.bar(means_by_quintile.index, means_by_quintile.values, 
              yerr=error_bars.values, capsize=5, color='steelblue', edgecolor='black')

for i, (bar, count) in enumerate(zip(bars, counts_by_quintile.values)):
    x_pos = bar.get_x() + bar.get_width()/2
    y_pos = bar.get_height() + 0.02
    ax.text(x_pos, y_pos, f'n={count}', ha='center', va='bottom', fontsize=10)

ax.set_title('Average Win Rate by Reach/Height Ratio Quintile')
ax.set_xlabel('Reach/Height Category')
ax.set_ylabel('Average Win Rate')
ax.set_ylim(0, 0.7)
overall_mean = df_analysis['win_rate'].mean()
ax.axhline(overall_mean, color='red', linestyle='--', label=f'Overall Mean: {overall_mean:.3f}')
ax.legend()

plt.tight_layout()
plt.show()

Interesting: the "Very High" ratio fighters do have a slightly higher win rate than "Very Low". Let me do a proper statistical test to see if this difference is significant.

In [None]:
very_high_winrate = df_analysis[df_analysis['ratio_quintile'] == 'Very High']['win_rate']
very_low_winrate = df_analysis[df_analysis['ratio_quintile'] == 'Very Low']['win_rate']

t_statistic, t_pvalue = ttest_ind(very_high_winrate, very_low_winrate)

print("T-Test: Very High vs Very Low Ratio Fighters")
print("="*60)
print(f"Very High Ratio Mean: {very_high_winrate.mean():.4f} (n={len(very_high_winrate)})")
print(f"Very Low Ratio Mean: {very_low_winrate.mean():.4f} (n={len(very_low_winrate)})")
print(f"\nT-statistic: {t_statistic:.4f}")
print(f"P-value: {t_pvalue:.4f}")
print(f"\nStatistically significant at 0.05? {'Yes' if t_pvalue < 0.05 else 'No'}")
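By default `ttest_ind` assumes the two groups have equal variances. If that assumption is doubtful, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below uses simulated win rates (made-up numbers, not the fighter data) and also reports Cohen's d, since a small p-value alone says nothing about how large the difference is.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

# Simulated win rates for two groups (assumed values, illustration only)
group_high = np.clip(rng.normal(0.60, 0.20, size=300), 0, 1)
group_low = np.clip(rng.normal(0.50, 0.25, size=300), 0, 1)

# Welch's t-test does not assume equal variances
t_stat, p_val = ttest_ind(group_high, group_low, equal_var=False)

# Cohen's d using the pooled standard deviation (equal group sizes)
pooled_sd = np.sqrt((group_high.var(ddof=1) + group_low.var(ddof=1)) / 2)
cohens_d = (group_high.mean() - group_low.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_val:.4f}, d = {cohens_d:.2f}")
```

The same two lines could be applied to `very_high_winrate` and `very_low_winrate` to check whether the conclusion changes.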

## 4. Analysis by Stance

Maybe the effect is different for different stances? Like maybe reach matters more for orthodox fighters vs southpaws? Let me break it down by stance.

In [None]:
stance_counts = df_analysis.groupby('stance').size()
# Only plot stances with at least 30 fighters
stances_with_enough_data = stance_counts[stance_counts >= 30].index

fig, axes = plt.subplots(1, len(stances_with_enough_data), figsize=(5*len(stances_with_enough_data), 5))
if len(stances_with_enough_data) == 1:
    axes = [axes]

correlations_by_stance = {}

for i, stance in enumerate(stances_with_enough_data):
    ax = axes[i]
    stance_data = df_analysis[df_analysis['stance'] == stance]
    
    ax.scatter(stance_data['reach_height_ratio'], stance_data['win_rate'], alpha=0.4)
    
    z = np.polyfit(stance_data['reach_height_ratio'], stance_data['win_rate'], 1)
    p = np.poly1d(z)
    x_range = np.linspace(stance_data['reach_height_ratio'].min(), stance_data['reach_height_ratio'].max(), 100)
    ax.plot(x_range, p(x_range), 'r-', linewidth=2)
    
    corr_val, p_val = pearsonr(stance_data['reach_height_ratio'], stance_data['win_rate'])
    correlations_by_stance[stance] = {'corr': corr_val, 'p_value': p_val, 'n': len(stance_data)}
    
    ax.set_title(f'{stance}\nr={corr_val:.3f}, p={p_val:.3f}, n={len(stance_data)}')
    ax.set_xlabel('Reach/Height')
    ax.set_ylabel('Win Rate')

plt.suptitle('Reach/Height Ratio vs Win Rate by Stance', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

In [None]:
stance_results = pd.DataFrame(correlations_by_stance).T
stance_results = stance_results.sort_values('corr', ascending=False)
print("Correlation by Stance:")
print("="*60)
stance_results

## 5. Analysis for Strikers vs Non-Strikers

My hypothesis was that longer reach helps with protection - keeping opponents at a distance. This should matter more for strikers (who fight at range) than for grapplers (who fight up close).

Let me test this by separating strikers from non-strikers based on fighting style.

In [None]:
striking_styles = ['striker', 'muay thai', 'boxing', 'kickboxer', 'karate']
# Fighters with a missing style evaluate to False here, so they count as non-strikers
df_analysis['is_striker'] = df_analysis['style'].str.lower().isin(striking_styles)

num_strikers = df_analysis['is_striker'].sum()
num_non_strikers = (~df_analysis['is_striker']).sum()
print(f"Strikers: {num_strikers}")
print(f"Non-strikers: {num_non_strikers}")

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

strikers = df_analysis[df_analysis['is_striker']]
if len(strikers) >= 10:
    axes[0].scatter(strikers['reach_height_ratio'], strikers['win_rate'], alpha=0.5, c='crimson')
    z = np.polyfit(strikers['reach_height_ratio'], strikers['win_rate'], 1)
    p = np.poly1d(z)
    x_range = np.linspace(strikers['reach_height_ratio'].min(), strikers['reach_height_ratio'].max(), 100)
    axes[0].plot(x_range, p(x_range), 'k-', linewidth=2)
    
    corr_strikers, pval_strikers = pearsonr(strikers['reach_height_ratio'], strikers['win_rate'])
    axes[0].set_title(f'STRIKERS\nr={corr_strikers:.3f}, p={pval_strikers:.3f}, n={len(strikers)}')
else:
    axes[0].text(0.5, 0.5, 'Not enough striker data', ha='center', va='center', transform=axes[0].transAxes)
    axes[0].set_title('STRIKERS')
    # No correlation was computed, so mark it as missing rather than 0
    corr_strikers = np.nan
    pval_strikers = np.nan

axes[0].set_xlabel('Reach/Height')
axes[0].set_ylabel('Win Rate')

non_strikers = df_analysis[~df_analysis['is_striker']]
axes[1].scatter(non_strikers['reach_height_ratio'], non_strikers['win_rate'], alpha=0.5, c='navy')
z2 = np.polyfit(non_strikers['reach_height_ratio'], non_strikers['win_rate'], 1)
p2 = np.poly1d(z2)
x_range2 = np.linspace(non_strikers['reach_height_ratio'].min(), non_strikers['reach_height_ratio'].max(), 100)
axes[1].plot(x_range2, p2(x_range2), 'k-', linewidth=2)

corr_non_strikers, pval_non_strikers = pearsonr(non_strikers['reach_height_ratio'], non_strikers['win_rate'])
axes[1].set_title(f'NON-STRIKERS\nr={corr_non_strikers:.3f}, p={pval_non_strikers:.3f}, n={len(non_strikers)}')
axes[1].set_xlabel('Reach/Height')
axes[1].set_ylabel('Win Rate')

plt.suptitle('Strikers vs Non-Strikers', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

Let me also look at individual fighting styles to see if there are any where reach ratio matters more.

In [None]:
def calc_style_correlation(group):
    # Only compute a correlation when the style has at least 10 fighters
    if len(group) >= 10:
        corr, pval = pearsonr(group['reach_height_ratio'], group['win_rate'])
    else:
        corr, pval = np.nan, np.nan
    return pd.Series({
        'n': len(group),
        'mean_ratio': group['reach_height_ratio'].mean(),
        'mean_win_rate': group['win_rate'].mean(),
        'correlation': corr,
        'p_value': pval
    })

style_results = df_analysis.groupby('style').apply(calc_style_correlation).dropna()
style_results = style_results[style_results['n'] >= 20]
style_results = style_results.sort_values('correlation', ascending=False)

print("Correlation by Fighting Style (min 20 fighters):")
print("="*70)
style_results.round(4)
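Testing many styles (or stances) at p < 0.05 means some will look significant by chance alone. One simple, conservative guard is a Bonferroni correction, which compares each p-value to 0.05 divided by the number of tests. The p-values below are invented for illustration, and `bonferroni_significant` is a hypothetical helper rather than part of the original notebook.

```python
import numpy as np

def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay significant after a Bonferroni correction."""
    p_values = np.asarray(p_values, dtype=float)
    threshold = alpha / len(p_values)
    return p_values < threshold, threshold

# Invented p-values for four hypothetical styles
example_p = [0.001, 0.02, 0.04, 0.30]
flags, threshold = bonferroni_significant(example_p)

print(f"Per-test threshold: {threshold:.4f}")
print(flags.tolist())
```

In this toy case only the first p-value survives the correction, even though three of the four are below 0.05.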

In [None]:
if len(style_results) > 0:
    plt.figure(figsize=(12, 6))
    bar_colors = []
    for corr_value in style_results['correlation']:
        if corr_value > 0:
            bar_colors.append('green')
        else:
            bar_colors.append('red')
    
    bars = plt.barh(style_results.index, style_results['correlation'], color=bar_colors, alpha=0.7)
    plt.axvline(0, color='black', linestyle='-', linewidth=1)
    plt.title('Correlation: Reach/Height Ratio vs Win Rate by Style')
    plt.xlabel('Correlation')
    plt.ylabel('Style')
    
    for i, row in enumerate(style_results.itertuples()):
        if row.p_value < 0.05:
            x_offset = 0.01 if row.correlation > 0 else -0.01
            alignment = 'left' if row.correlation > 0 else 'right'
            plt.text(row.correlation + x_offset, i, '*', fontsize=14, va='center', ha=alignment)
    
    plt.tight_layout()
    plt.show()
    print("* = significant at p<0.05")

## 6. Defensive Analysis - Strikes Absorbed

Now let me test the core of my hypothesis: **if longer reach provides better protection, fighters with higher reach/height ratios should absorb fewer strikes per fight.**

If this is true, we'd expect a **negative** correlation (more reach = fewer strikes absorbed).

In [None]:
df_analysis['strikes_absorbed_per_fight'] = df_analysis['total_strikes_absorbed'] / df_analysis['total_fights']

df_with_defense = df_analysis[df_analysis['total_strikes_absorbed'] > 0]
df_with_defense = df_with_defense.copy()

print(f"Fighters with defensive stats: {len(df_with_defense)}")

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].scatter(df_with_defense['reach_height_ratio'], df_with_defense['strikes_absorbed_per_fight'], alpha=0.4, c='purple')

z = np.polyfit(df_with_defense['reach_height_ratio'], df_with_defense['strikes_absorbed_per_fight'], 1)
p = np.poly1d(z)
x_range = np.linspace(df_with_defense['reach_height_ratio'].min(), df_with_defense['reach_height_ratio'].max(), 100)
axes[0].plot(x_range, p(x_range), 'r-', linewidth=2)

corr_def, pval_def = pearsonr(df_with_defense['reach_height_ratio'], df_with_defense['strikes_absorbed_per_fight'])
axes[0].set_title(f'Strikes Absorbed per Fight\nr={corr_def:.3f}, p={pval_def:.4f}')
axes[0].set_xlabel('Reach/Height')
axes[0].set_ylabel('Strikes Absorbed')

# Only SDHA is used here; head strikes recorded in SCHA and SGHA are not included
df_with_defense['head_strikes_absorbed_per_fight'] = df_with_defense['SDHA'] / df_with_defense['total_fights']
axes[1].scatter(df_with_defense['reach_height_ratio'], df_with_defense['head_strikes_absorbed_per_fight'], alpha=0.4, c='darkred')

z2 = np.polyfit(df_with_defense['reach_height_ratio'], df_with_defense['head_strikes_absorbed_per_fight'], 1)
p2 = np.poly1d(z2)
x_range2 = np.linspace(df_with_defense['reach_height_ratio'].min(), df_with_defense['reach_height_ratio'].max(), 100)
axes[1].plot(x_range2, p2(x_range2), 'r-', linewidth=2)

corr_head, pval_head = pearsonr(df_with_defense['reach_height_ratio'], df_with_defense['head_strikes_absorbed_per_fight'])
axes[1].set_title(f'Head Strikes Absorbed per Fight\nr={corr_head:.3f}, p={pval_head:.4f}')
axes[1].set_xlabel('Reach/Height')
axes[1].set_ylabel('Head Strikes')

plt.suptitle('Does Higher Reach/Height = Better Defense?', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print(f"\nIf my hypothesis is right, we'd expect NEGATIVE correlation")
print(f"(more reach = fewer strikes absorbed)")

In [None]:
quintile_labels_defense = ['Very Low', 'Low', 'Medium', 'High', 'Very High']
df_with_defense['ratio_quintile'] = pd.qcut(df_with_defense['reach_height_ratio'], q=5, labels=quintile_labels_defense)

defense_quintile_stats = df_with_defense.groupby('ratio_quintile').agg({
    'strikes_absorbed_per_fight': 'mean',
    'head_strikes_absorbed_per_fight': 'mean',
    'win_rate': 'mean',
    'fighter_id': 'count'
})

defense_quintile_stats = defense_quintile_stats.rename(columns={'fighter_id': 'count'})

print("Defensive Stats by Quintile:")
print("="*70)
defense_quintile_stats.round(3)

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

labels = defense_quintile_stats.index
axes[0].bar(labels, defense_quintile_stats['strikes_absorbed_per_fight'], color='mediumpurple')
axes[0].set_title('Avg Strikes Absorbed per Fight')
axes[0].set_xlabel('Reach/Height Quintile')
axes[0].set_ylabel('Strikes')

axes[1].bar(labels, defense_quintile_stats['win_rate'], color='seagreen')
axes[1].set_title('Avg Win Rate')
axes[1].set_xlabel('Reach/Height Quintile')
axes[1].set_ylabel('Win Rate')
mean_wr = df_with_defense['win_rate'].mean()
axes[1].axhline(mean_wr, color='red', linestyle='--', label='Overall Mean')
axes[1].legend()

plt.tight_layout()
plt.show()

## 7. Offensive Statistics Analysis

What about offense? Maybe fighters with longer reach can also land more strikes because they can hit from further out.

In [None]:
df_analysis['sig_strikes_per_fight'] = df_analysis['SSL'] / df_analysis['total_fights']
df_analysis['knockdowns_per_fight'] = df_analysis['KD'] / df_analysis['total_fights']
df_analysis['takedowns_per_fight'] = df_analysis['TDL'] / df_analysis['total_fights']

df_with_offense = df_analysis[df_analysis['SSL'] > 0]
df_with_offense = df_with_offense.copy()

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

axes[0].scatter(df_with_offense['reach_height_ratio'], df_with_offense['sig_strikes_per_fight'], alpha=0.3, c='crimson')
corr_strikes, p_strikes = pearsonr(df_with_offense['reach_height_ratio'], df_with_offense['sig_strikes_per_fight'])
z = np.polyfit(df_with_offense['reach_height_ratio'], df_with_offense['sig_strikes_per_fight'], 1)
x_fit = np.linspace(df_with_offense['reach_height_ratio'].min(), df_with_offense['reach_height_ratio'].max(), 100)
axes[0].plot(x_fit, np.poly1d(z)(x_fit), 'k-', lw=2)
axes[0].set_title(f'Sig Strikes/Fight\nr={corr_strikes:.3f}, p={p_strikes:.4f}')
axes[0].set_xlabel('Reach/Height')
axes[0].set_ylabel('Strikes')

df_with_kd = df_with_offense[df_with_offense['KD'] > 0]
axes[1].scatter(df_with_kd['reach_height_ratio'], df_with_kd['knockdowns_per_fight'], alpha=0.4, c='darkred')
corr_kd, p_kd = pearsonr(df_with_kd['reach_height_ratio'], df_with_kd['knockdowns_per_fight'])
axes[1].set_title(f'Knockdowns/Fight\nr={corr_kd:.3f}, p={p_kd:.4f}')
axes[1].set_xlabel('Reach/Height')
axes[1].set_ylabel('Knockdowns')

df_with_td = df_with_offense[df_with_offense['TDL'] > 0]
axes[2].scatter(df_with_td['reach_height_ratio'], df_with_td['takedowns_per_fight'], alpha=0.4, c='navy')
corr_td, p_td = pearsonr(df_with_td['reach_height_ratio'], df_with_td['takedowns_per_fight'])
axes[2].set_title(f'Takedowns/Fight\nr={corr_td:.3f}, p={p_td:.4f}')
axes[2].set_xlabel('Reach/Height')
axes[2].set_ylabel('Takedowns')

plt.suptitle('Reach/Height vs Offensive Stats', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

## 8. Correlation Matrix Overview

Let me put together a correlation matrix to see all these relationships at once. From the graphing lectures, heatmaps are great for visualizing correlation matrices with many variables.

In [None]:
columns_for_corr = ['reach_height_ratio', 'win_rate', 'wins', 'losses', 'total_fights',
                    'sig_strikes_per_fight', 'knockdowns_per_fight', 'takedowns_per_fight',
                    'strike_accuracy', 'height', 'weight', 'reach', 'age']

available_cols = [c for c in columns_for_corr if c in df_analysis.columns]

corr_data = df_analysis[available_cols].corr()

plt.figure(figsize=(12, 10))
upper_triangle = np.triu(np.ones_like(corr_data, dtype=bool))
sns.heatmap(corr_data, annot=True, cmap='RdBu_r', center=0, fmt='.2f',
            mask=upper_triangle, square=True)
plt.title('Correlation Matrix')
plt.tight_layout()
plt.show()

In [None]:
correlations_with_ratio = corr_data['reach_height_ratio'].drop('reach_height_ratio')
correlations_with_ratio = correlations_with_ratio.sort_values(ascending=False)

plt.figure(figsize=(10, 6))
bar_colors_list = []
for val in correlations_with_ratio.values:
    if val > 0:
        bar_colors_list.append('green')
    else:
        bar_colors_list.append('red')

plt.barh(correlations_with_ratio.index, correlations_with_ratio.values, color=bar_colors_list, alpha=0.7)
plt.axvline(0, color='black', linestyle='-')
plt.title('What Correlates with Reach/Height Ratio?')
plt.xlabel('Correlation')
plt.ylabel('Variable')
plt.tight_layout()
plt.show()

## 9. Summary and Conclusions

Let me summarize what I found.

In [None]:
print("="*70)
print("REACH-TO-HEIGHT RATIO ANALYSIS: SUMMARY")
print("="*70)

main_corr, main_p = pearsonr(df_analysis['reach_height_ratio'], df_analysis['win_rate'])
print(f"\n1. OVERALL CORRELATION (Ratio vs Win Rate):")
print(f"   r = {main_corr:.4f}, p = {main_p:.4f}")
print(f"   Sample: {len(df_analysis)} fighters")
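# Rough rule-of-thumb labels for |r| (0.1 / 0.3 / 0.5 cutoffs)
if abs(main_corr) < 0.1:
    strength = 'Very weak'
elif abs(main_corr) < 0.3:
    strength = 'Weak'
elif abs(main_corr) < 0.5:
    strength = 'Moderate'
else:
    strength = 'Strong'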
direction = 'positive' if main_corr > 0 else 'negative'
print(f"   Interpretation: {strength} {direction} correlation")

print(f"\n2. QUINTILE COMPARISON:")
print(f"   Very Low Ratio Win Rate: {very_low_winrate.mean():.3f}")
print(f"   Very High Ratio Win Rate: {very_high_winrate.mean():.3f}")
difference = very_high_winrate.mean() - very_low_winrate.mean()
print(f"   Difference: {difference:.3f}")
print(f"   T-test p-value: {t_pvalue:.4f}")

if len(strikers) >= 10:
    print(f"\n3. STRIKER ANALYSIS:")
    print(f"   Strikers correlation: {corr_strikers:.4f} (p={pval_strikers:.4f})")
    print(f"   Non-strikers correlation: {corr_non_strikers:.4f} (p={pval_non_strikers:.4f})")

print(f"\n4. DEFENSIVE HYPOTHESIS:")
print(f"   Correlation with strikes absorbed: {corr_def:.4f} (p={pval_def:.4f})")
print(f"   (Negative would support the protection hypothesis)")

print("\n" + "="*70)
print("CONCLUSION:")
print("="*70)

# Decide on significance first, then on the direction of the correlation
if main_p >= 0.05:
    conclusion = "INCONCLUSIVE - no statistically significant relationship found"
elif main_corr > 0:
    conclusion = "SUPPORTED - statistically significant positive correlation found"
else:
    conclusion = "NOT SUPPORTED - significant negative correlation found"

print(conclusion)
print("="*70)

In [None]:
fig, ax = plt.subplots(figsize=(12, 8))

hexbin_plot = ax.hexbin(df_analysis['reach_height_ratio'], df_analysis['win_rate'], 
                         gridsize=30, cmap='YlOrRd', mincnt=1)
colorbar = plt.colorbar(hexbin_plot, label='Fighters')

z = np.polyfit(df_analysis['reach_height_ratio'], df_analysis['win_rate'], 1)
p = np.poly1d(z)
x_line_vals = np.linspace(df_analysis['reach_height_ratio'].min(), df_analysis['reach_height_ratio'].max(), 100)
ax.plot(x_line_vals, p(x_line_vals), 'b-', linewidth=3, label=f'Trend (r={main_corr:.3f})')

ax.set_title('Reach-to-Height Ratio vs Win Rate', fontsize=16)
ax.set_xlabel('Reach / Height')
ax.set_ylabel('Win Rate')
ax.legend(loc='upper right')

text_str = f'n = {len(df_analysis)}\np = {main_p:.4f}'
ax.text(0.02, 0.98, text_str, 
        transform=ax.transAxes, fontsize=10, verticalalignment='top',
        bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()

## Final Thoughts

The reach-to-height ratio does show a slight positive correlation with win rate: fighters with proportionally longer arms do seem to win slightly more often. The effect is statistically significant but pretty weak (r=0.065), so the ratio explains well under 1% of the variance in win rate.

The defensive hypothesis (that longer reach provides better protection) got some support from the negative correlation with strikes absorbed, but again the effect is small.

Overall, reach/height ratio seems to have a small positive association with fight outcomes, but it's not a strong predictor of success, and a correlation like this doesn't show that longer arms cause the extra wins. There are probably other factors that matter much more.

In my initial exploration I also noticed that wrestlers and sambo fighters (especially from Russia) had unusually high win rates. That might be a stronger effect to investigate - I'll explore that in my next analysis.