# Russian Grappler Dominance Analysis

**Building on findings from the initial exploration and reach analysis...**

In my first notebook, I noticed that Russia was one of the top countries for UFC fighters and that wrestling/sambo fighters seemed to have higher win rates. In the reach analysis (notebook 2), I didn't find super strong effects for physical attributes like reach/height ratio. This made me think maybe fighting style and background matter more than just physical measurements.

This notebook investigates a specific hypothesis: **Are Russian grapplers (wrestlers, sambo fighters, BJJ practitioners, and similar styles) more dominant than other fighters?**

The UFC has had a lot of success from Russian fighters like Khabib Nurmagomedov (retired undefeated), Islam Makhachev (a champion at the time of writing), and others who came through the Russian combat sambo and wrestling system. I want to see if the data backs up the idea that this training background gives them an edge.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, ttest_ind, chi2_contingency
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-whitegrid')

print("Libraries loaded successfully!")

## 1. Load and Prepare Data

Same data loading process as the previous notebooks. This time I need to specifically identify Russian fighters and grappling-style fighters.

In [None]:
fighter_attributes = pd.read_csv('data/fighter_attributes.csv')
fighter_history = pd.read_csv('data/fighter_history.csv')
fighter_stats = pd.read_csv('data/fighter_stats.csv')

print(f"Fighter Attributes: {fighter_attributes.shape[0]} fighters")
print(f"Fighter History: {fighter_history.shape[0]} fight records")
print(f"Fighter Stats: {fighter_stats.shape[0]} stat records")
print("")
print("Data loaded!")

Let me check what countries and fighting styles we have in the data, using `value_counts()` to see the distribution.

In [None]:
print("Countries in data:")
country_counts = fighter_attributes['country'].value_counts()
print(country_counts.head(20))
print("")
print("Fighting styles:")
style_counts = fighter_attributes['style'].value_counts()
print(style_counts)

Now I need to categorize fighters. I'll create boolean flags for whether someone is a grappler and whether they're Russian. This way I can create 4 categories: Russian Grappler, Russian Non-Grappler, Non-Russian Grappler, Non-Russian Non-Grappler.

In [None]:
grappling_styles_list = ['wrestling', 'brazilian jiu-jitsu', 'grappling', 'sambo', 'judo', 'catch wrestling']

fighter_attributes['is_grappler'] = fighter_attributes['style'].str.lower().isin(grappling_styles_list)

fighter_attributes['is_russian'] = fighter_attributes['country'].str.lower() == 'russia'

def categorize_fighter(row):
    if row['is_russian'] and row['is_grappler']:
        return 'Russian Grappler'
    elif row['is_russian'] and not row['is_grappler']:
        return 'Russian Non-Grappler'
    elif not row['is_russian'] and row['is_grappler']:
        return 'Non-Russian Grappler'
    else:
        return 'Non-Russian Non-Grappler'

fighter_attributes['fighter_category'] = fighter_attributes.apply(categorize_fighter, axis=1)

print("Fighter Categories:")
category_counts = fighter_attributes['fighter_category'].value_counts()
print(category_counts)
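
As a side note, the same four-way split can be written without a row-wise `apply`. The sketch below uses `np.select` on a made-up four-row table; the column names mirror the ones above, but the data is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: one row per combination of the two flags (values are made up).
toy = pd.DataFrame({
    'is_russian':  [True, True, False, False],
    'is_grappler': [True, False, True, False],
})

# Conditions are checked in order; the first one that is True picks the label.
conditions = [
    toy['is_russian'] & toy['is_grappler'],
    toy['is_russian'] & ~toy['is_grappler'],
    ~toy['is_russian'] & toy['is_grappler'],
]
labels = ['Russian Grappler', 'Russian Non-Grappler', 'Non-Russian Grappler']

# Rows matching none of the conditions fall through to the default label.
toy['fighter_category'] = np.select(conditions, labels, default='Non-Russian Non-Grappler')
print(toy)
```

On a table with a few thousand fighters either approach is fast enough, so this is mostly a matter of taste.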

Next I need win/loss records for each fighter. This is where `groupby` with `agg` really shines - I can use multiple aggregation functions at once. The lambda functions let me count wins and losses by checking for 'W' and 'L' in the fight_result column.

In [None]:
records = fighter_history.groupby('fighter_id').agg({
    'fight_result': [
        lambda x: (x == 'W').sum(),
        lambda x: (x == 'L').sum(),
        'count'
    ],
    'fighter_name': 'first'
})
records.columns = ['wins', 'losses', 'total_fights', 'fighter_name']
records['win_rate'] = records['wins'] / records['total_fights']
records = records.reset_index()

win_methods = fighter_history[fighter_history['fight_result'] == 'W'].groupby('fighter_id')['fight_result_type'].value_counts().unstack(fill_value=0)
win_methods = win_methods.reset_index()

print("Fighters with records:", len(records))
print("")
records.head()
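
For comparison, here is a sketch of the same aggregation using pandas named aggregation, which sets the output column names directly instead of renaming them afterwards. The toy fight history is made up; only the column names come from the data above.

```python
import pandas as pd

# Made-up fight history with the same column names as fighter_history.
toy_history = pd.DataFrame({
    'fighter_id':   [1, 1, 1, 2, 2],
    'fighter_name': ['A', 'A', 'A', 'B', 'B'],
    'fight_result': ['W', 'L', 'W', 'W', 'D'],
})

# Each keyword is an output column: (input column, aggregation function).
toy_records = toy_history.groupby('fighter_id').agg(
    wins=('fight_result', lambda s: (s == 'W').sum()),
    losses=('fight_result', lambda s: (s == 'L').sum()),
    total_fights=('fight_result', 'count'),
    fighter_name=('fighter_name', 'first'),
).reset_index()

# Draws and any other non-'W' results stay in the denominator,
# so they lower the win rate without counting as losses.
toy_records['win_rate'] = toy_records['wins'] / toy_records['total_fights']
print(toy_records)
```

Fighter 2 in the toy data has one win and one draw, which shows how `wins / total_fights` treats a draw as a non-win.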

Now aggregate the fighter stats to get grappling-specific metrics - things like takedowns, submissions, ground strikes. Another `groupby` with sum aggregation.

In [None]:
stats_agg = fighter_stats.groupby('fighter_id').agg({
    'TDL': 'sum',
    'TDA': 'sum',
    'TSL': 'sum',
    'TSA': 'sum',
    'SSL': 'sum',
    'SSA': 'sum',
    'KD': 'sum',
    'SGBL': 'sum',
    'SGHL': 'sum',
    'SM': 'sum',
    'AD': 'sum',
    'RV': 'sum',
})

stats_agg = stats_agg.reset_index()

stats_agg['takedown_accuracy'] = stats_agg['TDL'] / stats_agg['TDA']
stats_agg['ground_strikes'] = stats_agg['SGBL'] + stats_agg['SGHL']

stats_agg.head()

Time to merge everything together. This is like doing JOINs in SQL - we learned about merge in the pandas lectures. I'm using `how='inner'` for the first merge so I only keep fighters that appear in both datasets.
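
To make the difference between `how='inner'` and `how='left'` concrete, here is a small sketch on two made-up tables. The `validate` and `indicator` arguments are optional extras in `DataFrame.merge` that I am not using in the real cell below; they are shown only as a way to check a join.

```python
import pandas as pd

# Made-up tables: fighter 3 has no record, fighter 4 has no attributes.
attrs = pd.DataFrame({'fighter_id': [1, 2, 3], 'country': ['Russia', 'Brazil', 'USA']})
recs = pd.DataFrame({'fighter_id': [1, 2, 4], 'wins': [10, 7, 3]})

# Inner join keeps only ids present in both tables.
# validate raises an error if either side has duplicate keys.
inner = attrs.merge(recs, on='fighter_id', how='inner', validate='one_to_one')

# Left join keeps every row of attrs; indicator adds a _merge column
# saying whether each row found a match on the right.
left = attrs.merge(recs, on='fighter_id', how='left', indicator=True)

print(inner)
print(left)
```

In the left join, the unmatched fighter keeps its row but gets `NaN` for `wins`, which is why the stat columns need a `fillna` after a left merge.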

In [None]:
df = fighter_attributes.merge(records, on='fighter_id', how='inner')
df = df.merge(stats_agg, on='fighter_id', how='left')
df = df.merge(win_methods, on='fighter_id', how='left')

win_type_columns = [col for col in df.columns if col in ['DEC-UNA', 'DEC-SPL', 'DEC-MAJ', 'KO-TKO', 'SUBMISSION', 'DQ']]
for col in win_type_columns:
    df[col] = df[col].fillna(0)

MIN_FIGHTS = 3
df_analysis = df[df['total_fights'] >= MIN_FIGHTS]
df_analysis = df_analysis.copy()

df_analysis['takedowns_per_fight'] = df_analysis['TDL'] / df_analysis['total_fights']
df_analysis['ground_strikes_per_fight'] = df_analysis['ground_strikes'] / df_analysis['total_fights']
df_analysis['submissions_per_fight'] = df_analysis['SM'] / df_analysis['total_fights']

print("Fighters for analysis (min", MIN_FIGHTS, "fights):", len(df_analysis))
print("")
print("Fighter Categories in analysis:")
print(df_analysis['fighter_category'].value_counts())

## 2. Russian Fighters Overview

Before comparing groups, let me look at just the Russian fighters to understand the sample better. How many are grapplers vs strikers?

In [None]:
russians = df_analysis[df_analysis['is_russian'] == True]

print("Total Russian fighters (min", MIN_FIGHTS, "fights):", len(russians))
print("")
print("Russian fighters by style:")
russian_styles = russians['style'].value_counts()
print(russian_styles)
print("")
num_grappler = russians['is_grappler'].sum()
num_non_grappler = (~russians['is_grappler']).sum()
print("Russian grapplers:", num_grappler)
print("Russian non-grapplers:", num_non_grappler)

Let me see who the top Russian fighters are:

In [None]:
top_russians_by_wins = russians.nlargest(15, 'wins')
cols_to_show = ['name', 'style', 'wins', 'losses', 'win_rate', 'total_fights']
print("Top 15 Russian Fighters by Wins:")
print("=" * 60)
top_russians_by_wins[cols_to_show]

Pie charts to show the style breakdown. I know some people say bar charts are better for comparison, but pie charts are good for showing parts of a whole.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

styles = russians['style'].value_counts()
num_styles = len(styles)
color_palette = plt.cm.Set3(np.linspace(0, 1, num_styles))
axes[0].pie(styles.values, labels=styles.index, autopct='%1.1f%%', colors=color_palette)
axes[0].set_title('Russian Fighters by Style')

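# value_counts() sorts by frequency, so reindex to a fixed [False, True] order to match the labels
grappler_or_not = russians['is_grappler'].value_counts().reindex([False, True], fill_value=0)
labels_list = ['Non-Grappler', 'Grappler']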
axes[1].pie(grappler_or_not.values, labels=labels_list, autopct='%1.1f%%', 
            colors=['#e74c3c', '#27ae60'], explode=[0, 0.05])
axes[1].set_title('Russian Fighters: Grapplers vs Non-Grapplers')

plt.tight_layout()
plt.show()

## 3. Win Rate Comparison: Russian Grapplers vs Everyone Else

This is the main question - do Russian grapplers actually have higher win rates? I'll compare all four categories using `groupby` with multiple aggregations.

In [None]:
stats_by_category = df_analysis.groupby('fighter_category').agg({
    'win_rate': ['mean', 'std', 'count'],
    'wins': 'sum',
    'losses': 'sum',
    'total_fights': 'sum'
})

stats_by_category = stats_by_category.round(4)
stats_by_category.columns = ['avg_win_rate', 'std_win_rate', 'n_fighters', 'total_wins', 'total_losses', 'total_fights']
stats_by_category['overall_win_rate'] = stats_by_category['total_wins'] / stats_by_category['total_fights']

print("Win Rate Statistics by Fighter Category:")
print("=" * 60)
stats_by_category

Bar chart with error bars to visualize this. The error bars show the standard error of the mean (standard deviation divided by the square root of the group size), which helps us see the uncertainty. I'll add a horizontal line for the overall mean to make it easy to see who's above/below average.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

cat_names = stats_by_category.index.tolist()
avg_wr = stats_by_category['avg_win_rate'].values
std_wr = stats_by_category['std_win_rate'].values
n_fighters = stats_by_category['n_fighters'].values

se = std_wr / np.sqrt(n_fighters)

# Compare exact names: 'Russian Grappler' is also a substring of 'Non-Russian Grappler'
bar_colors = []
for cat in cat_names:
    if cat == 'Russian Grappler':
        bar_colors.append('#c0392b')
    elif cat.startswith('Russian'):
        bar_colors.append('#3498db')
    else:
        bar_colors.append('#95a5a6')

bars = ax.bar(cat_names, avg_wr, yerr=se, capsize=5, color=bar_colors, edgecolor='black', alpha=0.8)

for bar, n, rate in zip(bars, n_fighters, avg_wr):
    x = bar.get_x() + bar.get_width()/2
    y = bar.get_height() + 0.02
    ax.text(x, y, f'n={n}\n{rate:.1%}', ha='center', va='bottom', fontsize=10)

ax.set_title('Average Win Rate by Fighter Category')
ax.set_xlabel('Fighter Category')
ax.set_ylabel('Average Win Rate')
ax.set_ylim(0, 0.75)
mean_all = df_analysis['win_rate'].mean()
ax.axhline(mean_all, color='red', linestyle='--', 
           label='Overall Mean: ' + str(round(mean_all*100, 1)) + '%')
ax.legend()
plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.show()

Now for statistical testing. I'll use t-tests from scipy to check if the differences are statistically significant. Just like we did in the reach analysis.

In [None]:
russian_grapplers_wr = df_analysis[df_analysis['fighter_category'] == 'Russian Grappler']['win_rate']
non_russian_grapplers_wr = df_analysis[df_analysis['fighter_category'] == 'Non-Russian Grappler']['win_rate']
all_others_wr = df_analysis[df_analysis['fighter_category'] != 'Russian Grappler']['win_rate']

t_stat_1, p_val_1 = ttest_ind(russian_grapplers_wr, all_others_wr)

t_stat_2, p_val_2 = ttest_ind(russian_grapplers_wr, non_russian_grapplers_wr)

print("STATISTICAL SIGNIFICANCE TESTS")
print("=" * 60)
print()
print("1. Russian Grapplers vs ALL Others:")
print("   Russian Grapplers Mean:", round(russian_grapplers_wr.mean(), 4), "(n=" + str(len(russian_grapplers_wr)) + ")")
print("   All Others Mean:", round(all_others_wr.mean(), 4), "(n=" + str(len(all_others_wr)) + ")")
diff1 = russian_grapplers_wr.mean() - all_others_wr.mean()
print("   Difference:", round(diff1, 4))
print("   T-statistic:", round(t_stat_1, 4))
print("   P-value:", round(p_val_1, 4))
if p_val_1 < 0.05:
    print("   Significant at alpha=0.05? YES")
else:
    print("   Significant at alpha=0.05? NO")

print()
print("2. Russian Grapplers vs Non-Russian Grapplers:")
print("   Russian Grapplers Mean:", round(russian_grapplers_wr.mean(), 4), "(n=" + str(len(russian_grapplers_wr)) + ")")
print("   Non-Russian Grapplers Mean:", round(non_russian_grapplers_wr.mean(), 4), "(n=" + str(len(non_russian_grapplers_wr)) + ")")
diff2 = russian_grapplers_wr.mean() - non_russian_grapplers_wr.mean()
print("   Difference:", round(diff2, 4))
print("   T-statistic:", round(t_stat_2, 4))
print("   P-value:", round(p_val_2, 4))
if p_val_2 < 0.05:
    print("   Significant at alpha=0.05? YES")
else:
    print("   Significant at alpha=0.05? NO")
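
One thing to keep in mind: `ttest_ind` assumes equal variances by default, and here one group is far smaller than the other. As a hedged sketch (not a replacement for the cell above), the code below compares the default test with Welch's version (`equal_var=False`) and adds Cohen's d as an effect size. The two samples are synthetic numbers drawn for illustration, not the UFC data, and `cohens_d` is a helper I wrote for this example.

```python
import numpy as np
from scipy.stats import ttest_ind

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Synthetic win rates (made up): a small group and a much larger one.
rng = np.random.default_rng(0)
small_group = np.clip(rng.normal(0.65, 0.20, size=18), 0, 1)
large_group = np.clip(rng.normal(0.50, 0.15, size=500), 0, 1)

student = ttest_ind(small_group, large_group)                  # assumes equal variances
welch = ttest_ind(small_group, large_group, equal_var=False)   # does not

print("Student p-value:", round(student.pvalue, 4))
print("Welch p-value:  ", round(welch.pvalue, 4))
print("Cohen's d:      ", round(cohens_d(small_group, large_group), 3))
```

If the two p-values disagree noticeably on the real data, the Welch result is usually the safer one to report when group sizes and spreads differ this much.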

## 4. How Do Russian Grapplers Win?

Beyond just win rate, I want to know HOW they win. Are they submitting people? Going to decision? Knockouts? This could tell us something about their fighting style and approach.

In [None]:
win_type_cols_list = ['DEC-UNA', 'DEC-SPL', 'DEC-MAJ', 'KO-TKO', 'SUBMISSION']
cols_available = [col for col in win_type_cols_list if col in df_analysis.columns]

win_types_by_cat = df_analysis.groupby('fighter_category')[cols_available].sum()
win_types_by_cat['Total_Wins'] = win_types_by_cat.sum(axis=1)

win_type_percentages = win_types_by_cat[cols_available].div(win_types_by_cat['Total_Wins'], axis=0) * 100

print("Win Type Distribution by Category (%)")
print("=" * 60)
win_type_percentages.round(1)
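
The percentages describe the win-type mix, but they don't say whether two categories differ by more than chance. One possible check is a chi-square test of independence with `chi2_contingency` (already imported at the top). The sketch below runs it on a made-up table of win counts; the numbers are invented, and the test needs raw counts, not percentages.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up win counts by method (rows: category, columns: win type).
counts = pd.DataFrame(
    {'DEC': [40, 300], 'KO-TKO': [15, 250], 'SUBMISSION': [25, 150]},
    index=['Russian Grappler', 'Non-Russian Grappler'],
)

# Returns the test statistic, p-value, degrees of freedom,
# and the counts expected if win type were independent of category.
chi2, p_value, dof, expected = chi2_contingency(counts)

print("Chi-square:", round(chi2, 3))
print("P-value:", round(p_value, 4))
print("Degrees of freedom:", dof)
```

With the real data, `win_types_by_cat[cols_available]` would play the role of `counts`, assuming each cell has a reasonable number of wins.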

Stacked bar chart to visualize win types. Makes it easy to compare the composition across categories.

In [None]:
fig, ax = plt.subplots(figsize=(12, 7))

cat_list = win_type_percentages.index.tolist()
x_pos = np.arange(len(cat_list))
bar_width = 0.6

color_map = {'SUBMISSION': '#27ae60', 'KO-TKO': '#e74c3c', 'DEC-UNA': '#3498db', 
             'DEC-SPL': '#9b59b6', 'DEC-MAJ': '#f39c12'}

bottom_vals = np.zeros(len(cat_list))
for col in cols_available:
    vals = win_type_percentages[col].values
    color = color_map.get(col, 'gray')
    ax.bar(x_pos, vals, bar_width, label=col, bottom=bottom_vals, color=color)
    bottom_vals += vals

ax.set_xlabel('Fighter Category')
ax.set_ylabel('Percentage of Wins')
ax.set_title('How Different Fighter Categories Win Their Fights')
ax.set_xticks(x_pos)
ax.set_xticklabels(cat_list, rotation=15, ha='right')
ax.legend(loc='upper right')
ax.set_ylim(0, 105)

plt.tight_layout()
plt.show()

Submission rate specifically - what percent of wins are by submission. This should be higher for grapplers since that's their specialty.

In [None]:
if 'SUBMISSION' in df_analysis.columns:
    df_analysis['submission_rate'] = df_analysis['SUBMISSION'] / df_analysis['wins']
    df_analysis['submission_rate'] = df_analysis['submission_rate'].fillna(0)
    
    sub_rates = df_analysis.groupby('fighter_category')['submission_rate'].agg(['mean', 'std', 'count'])
    print("Submission Rate (% of wins by submission):")
    print("=" * 50)
    print(sub_rates.round(4))

## 5. Grappling Statistics Comparison

Now let's look at actual grappling metrics - takedowns, ground control, submission attempts. If Russian grapplers are really better, they should show it in these stats.

In [None]:
grappling_metrics = df_analysis.groupby('fighter_category').agg({
    'takedowns_per_fight': 'mean',
    'takedown_accuracy': 'mean',
    'ground_strikes_per_fight': 'mean',
    'submissions_per_fight': 'mean',
    'TDL': 'sum',
    'TDA': 'sum'
})

grappling_metrics = grappling_metrics.round(3)
grappling_metrics['overall_td_accuracy'] = grappling_metrics['TDL'] / grappling_metrics['TDA']

print("Grappling Statistics by Fighter Category:")
print("=" * 60)
grappling_metrics
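
The table has two accuracy numbers that can disagree: the mean of each fighter's accuracy, and the pooled accuracy from summed takedowns. Here is a small sketch on made-up numbers showing why. It also shows that a fighter with zero attempts gets `NaN` accuracy and is skipped by `mean()`.

```python
import numpy as np
import pandas as pd

# Made-up takedown totals for three fighters in one category.
toy = pd.DataFrame({
    'fighter_category': ['A', 'A', 'A'],
    'TDL': [1, 30, 0],   # takedowns landed
    'TDA': [1, 60, 0],   # takedowns attempted
})

# Per-fighter accuracy; 0 / 0 becomes NaN instead of raising an error.
toy['takedown_accuracy'] = toy['TDL'] / toy['TDA']

# Mean of ratios: every fighter with at least one attempt counts equally.
mean_of_ratios = toy.groupby('fighter_category')['takedown_accuracy'].mean()

# Ratio of sums: fighters with more attempts carry more weight.
sums = toy.groupby('fighter_category')[['TDL', 'TDA']].sum()
ratio_of_sums = sums['TDL'] / sums['TDA']

print("Mean of ratios:", round(mean_of_ratios['A'], 3))
print("Ratio of sums: ", round(ratio_of_sums['A'], 3))
```

The same distinction applies to `avg_win_rate` versus `overall_win_rate` in section 3.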

Four subplots to show different grappling metrics. Subplots are useful when you want to compare multiple related things at once.

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

all_categories = df_analysis['fighter_category'].unique()
category_order = ['Russian Grappler', 'Non-Russian Grappler', 'Russian Non-Grappler', 'Non-Russian Non-Grappler']
category_order = [c for c in category_order if c in all_categories]

# Compare exact names: 'Russian Grappler' is also a substring of 'Non-Russian Grappler'
bar_colors_list = []
for cat in category_order:
    if cat == 'Russian Grappler':
        bar_colors_list.append('#c0392b')
    elif cat.startswith('Russian'):
        bar_colors_list.append('#3498db')
    else:
        bar_colors_list.append('#95a5a6')

td_pf = df_analysis.groupby('fighter_category')['takedowns_per_fight'].mean().reindex(category_order)
axes[0, 0].bar(range(len(category_order)), td_pf.values, color=bar_colors_list)
axes[0, 0].set_xticks(range(len(category_order)))
axes[0, 0].set_xticklabels(category_order, rotation=20, ha='right', fontsize=9)
axes[0, 0].set_title('Takedowns per Fight')
axes[0, 0].set_ylabel('Takedowns')

td_acc = df_analysis.groupby('fighter_category')['takedown_accuracy'].mean().reindex(category_order)
axes[0, 1].bar(range(len(category_order)), td_acc.values, color=bar_colors_list)
axes[0, 1].set_xticks(range(len(category_order)))
axes[0, 1].set_xticklabels(category_order, rotation=20, ha='right', fontsize=9)
axes[0, 1].set_title('Takedown Accuracy')
axes[0, 1].set_ylabel('Accuracy')

gs_pf = df_analysis.groupby('fighter_category')['ground_strikes_per_fight'].mean().reindex(category_order)
axes[1, 0].bar(range(len(category_order)), gs_pf.values, color=bar_colors_list)
axes[1, 0].set_xticks(range(len(category_order)))
axes[1, 0].set_xticklabels(category_order, rotation=20, ha='right', fontsize=9)
axes[1, 0].set_title('Ground Strikes per Fight')
axes[1, 0].set_ylabel('Ground Strikes')

subs_pf = df_analysis.groupby('fighter_category')['submissions_per_fight'].mean().reindex(category_order)
axes[1, 1].bar(range(len(category_order)), subs_pf.values, color=bar_colors_list)
axes[1, 1].set_xticks(range(len(category_order)))
axes[1, 1].set_xticklabels(category_order, rotation=20, ha='right', fontsize=9)
axes[1, 1].set_title('Submission Attempts per Fight')
axes[1, 1].set_ylabel('Submission Attempts')

plt.suptitle('Grappling Statistics by Fighter Category', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

## 6. Country Comparison: Grapplers by Nation

Let me zoom out and compare grapplers from different countries, not just Russia. Maybe there are other countries with dominant grapplers too like Brazil or USA.

In [None]:
only_grapplers = df_analysis[df_analysis['is_grappler'] == True]
only_grapplers = only_grapplers.copy()

country_fighter_counts = only_grapplers['country'].value_counts()
countries_with_enough = country_fighter_counts[country_fighter_counts >= 5].index.tolist()

grapplers_filtered = only_grapplers[only_grapplers['country'].isin(countries_with_enough)]
grapplers_by_nation = grapplers_filtered.groupby('country').agg({
    'win_rate': 'mean',
    'fighter_id': 'count',
    'wins': 'sum',
    'losses': 'sum',
    'takedowns_per_fight': 'mean'
})

grapplers_by_nation = grapplers_by_nation.rename(columns={'fighter_id': 'n_fighters'})
grapplers_by_nation['overall_record'] = grapplers_by_nation.apply(
    lambda x: str(int(x['wins'])) + "-" + str(int(x['losses'])), axis=1)
grapplers_by_nation = grapplers_by_nation.sort_values('win_rate', ascending=False)

print("Grapplers by Country (min 5 fighters):")
print("=" * 60)
grapplers_by_nation.round(3)

Horizontal bar chart works well when you have many categories. I'll highlight Russia in red.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

nation_names = grapplers_by_nation.index.tolist()
nation_win_rates = grapplers_by_nation['win_rate'].values
nation_counts = grapplers_by_nation['n_fighters'].values

bar_colors_nations = []
for nation in nation_names:
    if nation.lower() == 'russia':  # country values are not lowercased in this table
        bar_colors_nations.append('#c0392b')
    else:
        bar_colors_nations.append('#3498db')

bars = ax.barh(nation_names, nation_win_rates, color=bar_colors_nations, alpha=0.8)

for bar, count in zip(bars, nation_counts):
    width = bar.get_width()
    y_pos = bar.get_y() + bar.get_height()/2
    ax.text(width + 0.01, y_pos, 'n=' + str(count), ha='left', va='center', fontsize=9)

ax.set_xlabel('Average Win Rate')
ax.set_ylabel('Country')
ax.set_title('Average Win Rate of Grapplers by Country')
grappler_mean = only_grapplers['win_rate'].mean()
ax.axvline(grappler_mean, color='red', linestyle='--', 
           label='Overall Grappler Mean: ' + str(round(grappler_mean*100, 1)) + '%')
ax.legend()
ax.set_xlim(0, 0.85)

plt.tight_layout()
plt.show()

## 7. Win Rate Distribution Comparison

Looking at just the mean can be misleading - I need to see the full distribution. Histograms and boxplots from the graphing lectures will help here.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

n_rg = len(russian_grapplers_wr)
n_nrg = len(non_russian_grapplers_wr)
axes[0].hist(russian_grapplers_wr, bins=15, alpha=0.7, label='Russian Grapplers (n=' + str(n_rg) + ')', color='#c0392b')
axes[0].hist(non_russian_grapplers_wr, bins=15, alpha=0.5, label='Non-Russian Grapplers (n=' + str(n_nrg) + ')', color='#3498db')
axes[0].axvline(russian_grapplers_wr.mean(), color='#c0392b', linestyle='--', linewidth=2)
axes[0].axvline(non_russian_grapplers_wr.mean(), color='#3498db', linestyle='--', linewidth=2)
axes[0].set_xlabel('Win Rate')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Win Rate Distribution: Russian vs Non-Russian Grapplers')
axes[0].legend()

grapplers_only = df_analysis[df_analysis['is_grappler'] == True]
grapplers_only = grapplers_only.copy()
grapplers_only['nation_type'] = grapplers_only['is_russian'].map({True: 'Russian', False: 'Non-Russian'})
grapplers_only.boxplot(column='win_rate', by='nation_type', ax=axes[1])
axes[1].set_xlabel('Nation')
axes[1].set_ylabel('Win Rate')
axes[1].set_title('Win Rate Distribution (Grapplers Only)')
plt.suptitle('')

plt.tight_layout()
plt.show()

## 8. Elite Fighter Analysis (High Win Rate)

Who are the "elite" fighters? Let me define elite as 70%+ win rate with at least 5 fights. Then see which category produces the most elite fighters.

In [None]:
MIN_FIGHTS_ELITE = 5
ELITE_WIN_RATE = 0.7

elite_fighters = df_analysis[(df_analysis['total_fights'] >= MIN_FIGHTS_ELITE) & 
                              (df_analysis['win_rate'] >= ELITE_WIN_RATE)]
elite_fighters = elite_fighters.copy()

print("Elite Fighters (>= 70% win rate, >= 5 fights):", len(elite_fighters))
print("")
print("Elite fighters by category:")
elite_cat_counts = elite_fighters['fighter_category'].value_counts()
print(elite_cat_counts)

total_cat_counts = df_analysis[df_analysis['total_fights'] >= MIN_FIGHTS_ELITE]['fighter_category'].value_counts()
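# Categories with no elite fighters would otherwise divide to NaN, so count them as 0
elite_percentages = (elite_cat_counts.reindex(total_cat_counts.index, fill_value=0) / total_cat_counts * 100)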
elite_percentages = elite_percentages.round(1)
print("")
print("Percentage of category that is elite:")
print(elite_percentages)
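
A percentage from a small group can swing a lot with one or two fighters. As a rough sketch of how much, here is a Wilson score interval for a proportion. The `wilson_interval` helper is my own illustration, and the 9-out-of-18 input is a hypothetical count, not a result from the data.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical example: 9 elite fighters out of a group of 18.
low, high = wilson_interval(9, 18)
print("Observed: 50.0%")
print("Interval:", str(round(low * 100, 1)) + "% to " + str(round(high * 100, 1)) + "%")
```

The interval narrows as the group grows, so the larger categories get much tighter ranges than the Russian ones.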

Bar chart to visualize elite percentages:

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))

cat_names_elite = elite_percentages.index.tolist()
pct_values = elite_percentages.values

# Compare exact names: 'Russian Grappler' is also a substring of 'Non-Russian Grappler'
elite_bar_colors = []
for cat in cat_names_elite:
    if cat == 'Russian Grappler':
        elite_bar_colors.append('#c0392b')
    elif cat.startswith('Russian'):
        elite_bar_colors.append('#3498db')
    else:
        elite_bar_colors.append('#95a5a6')

bars = ax.bar(cat_names_elite, pct_values, color=elite_bar_colors, edgecolor='black', alpha=0.8)

for bar, pct in zip(bars, pct_values):
    x_pos = bar.get_x() + bar.get_width()/2
    y_pos = bar.get_height() + 0.5
    ax.text(x_pos, y_pos, str(pct) + '%', ha='center', va='bottom', fontsize=11)

ax.set_xlabel('Fighter Category')
ax.set_ylabel('% Elite (>= 70% win rate)')
ax.set_title('Percentage of Elite Fighters by Category\n(>= 5 fights, >= 70% win rate)')
plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.show()

Who are these elite Russian grapplers?

In [None]:
elite_rg = elite_fighters[elite_fighters['fighter_category'] == 'Russian Grappler']
elite_rg_sorted = elite_rg.sort_values('wins', ascending=False)

if len(elite_rg_sorted) > 0:
    print("Elite Russian Grapplers:")
    print("=" * 60)
    cols_display = ['name', 'style', 'wins', 'losses', 'win_rate', 'total_fights']
    print(elite_rg_sorted[cols_display].head(15).to_string())

## 9. Time Trend Analysis: Russian Dominance Over Time

Is Russian grappler dominance a recent thing or has it always been there? I'll use the `.dt` accessor to extract year from dates like we learned in the pandas lectures.

In [None]:
fighter_history['event_date'] = pd.to_datetime(fighter_history['event_date'])
fighter_history['year'] = fighter_history['event_date'].dt.year

fights_with_categories = fighter_history.merge(
    fighter_attributes[['fighter_id', 'fighter_category', 'is_russian', 'is_grappler']], 
    on='fighter_id', how='left'
)

yearly_data = fights_with_categories.groupby(['year', 'fighter_category']).agg({
    'fight_result': [lambda x: (x == 'W').sum(), 'count']
})
yearly_data.columns = ['wins', 'total_fights']
yearly_data['win_rate'] = yearly_data['wins'] / yearly_data['total_fights']
yearly_data = yearly_data.reset_index()

yearly_data = yearly_data[yearly_data['total_fights'] >= 10]

yearly_data.head(10)
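
If any `event_date` values can't be parsed, `pd.to_datetime` raises an error by default. Here is a small sketch of one way to handle that, on made-up rows: `errors='coerce'` turns bad values into `NaT`, which can then be dropped before extracting the year. I don't know whether the real file has bad dates, so this is a precaution, not a required fix.

```python
import pandas as pd

# Made-up rows; the second date is deliberately invalid.
toy = pd.DataFrame({
    'event_date': ['2019-03-02', 'not a date', '2021-10-30'],
    'fight_result': ['W', 'L', 'W'],
})

# Unparseable strings become NaT instead of raising an error.
toy['event_date'] = pd.to_datetime(toy['event_date'], errors='coerce')
print("Unparseable dates:", toy['event_date'].isna().sum())

# Drop the bad rows, then pull out the year as an integer.
clean = toy.dropna(subset=['event_date']).copy()
clean['year'] = clean['event_date'].dt.year.astype(int)
print(clean)
```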

Line chart to show win rates over time. Lines make it easy to see trends and patterns over years.

In [None]:
fig, ax = plt.subplots(figsize=(14, 6))

categories_to_plot = ['Russian Grappler', 'Non-Russian Grappler']
for cat in categories_to_plot:
    cat_yearly = yearly_data[yearly_data['fighter_category'] == cat]
    if len(cat_yearly) > 0:
        if cat == 'Russian Grappler':
            color = '#c0392b'
        else:
            color = '#3498db'
        ax.plot(cat_yearly['year'], cat_yearly['win_rate'], 'o-', label=cat, 
                color=color, linewidth=2, markersize=6)

ax.set_xlabel('Year')
ax.set_ylabel('Win Rate')
ax.set_title('Win Rate Over Time: Russian vs Non-Russian Grapplers')
ax.legend()
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 10. Summary and Conclusions

Let me put together all the findings from this analysis into one summary.

In [None]:
print("=" * 60)
print("RUSSIAN GRAPPLER DOMINANCE ANALYSIS: SUMMARY")
print("=" * 60)

print("")
print("SAMPLE SIZE:")
n_rg = len(df_analysis[df_analysis['fighter_category'] == 'Russian Grappler'])
n_nrg = len(df_analysis[df_analysis['fighter_category'] == 'Non-Russian Grappler'])
print("   Total fighters analyzed:", len(df_analysis))
print("   Russian Grapplers:", n_rg)
print("   Non-Russian Grapplers:", n_nrg)

print("")
print("WIN RATE COMPARISON:")
rg_mean = russian_grapplers_wr.mean()
nrg_mean = non_russian_grapplers_wr.mean()
ao_mean = all_others_wr.mean()
print("   Russian Grapplers:", str(round(rg_mean*100, 1)) + "%")
print("   Non-Russian Grapplers:", str(round(nrg_mean*100, 1)) + "%")
print("   All Others:", str(round(ao_mean*100, 1)) + "%")
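advantage = (rg_mean - nrg_mean)*100
# Signed format handles a negative difference; the gap between two rates is in percentage points
print("   Difference vs non-Russian grapplers: " + format(advantage, "+.1f") + " percentage points")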

print("")
print("STATISTICAL SIGNIFICANCE:")
print("   Russian Grapplers vs All Others:")
if p_val_1 < 0.05:
    print("      P-value:", round(p_val_1, 4), "- SIGNIFICANT")
else:
    print("      P-value:", round(p_val_1, 4), "- Not significant")
print("   Russian Grapplers vs Non-Russian Grapplers:")
if p_val_2 < 0.05:
    print("      P-value:", round(p_val_2, 4), "- SIGNIFICANT")
else:
    print("      P-value:", round(p_val_2, 4), "- Not significant")

print("")
print("GRAPPLING STATS:")
rg_data = df_analysis[df_analysis['fighter_category'] == 'Russian Grappler']
nrg_data = df_analysis[df_analysis['fighter_category'] == 'Non-Russian Grappler']
print("   Russian Grapplers - Takedowns/fight:", round(rg_data['takedowns_per_fight'].mean(), 2))
print("   Non-Russian Grapplers - Takedowns/fight:", round(nrg_data['takedowns_per_fight'].mean(), 2))

print("")
print("ELITE FIGHTERS (>= 70% win rate, >= 5 fights):")
if 'Russian Grappler' in elite_percentages.index:
    rg_elite_pct = elite_percentages.get('Russian Grappler', 0)
    print("   % of Russian Grapplers who are elite:", str(rg_elite_pct) + "%")
if 'Non-Russian Grappler' in elite_percentages.index:
    nrg_elite_pct = elite_percentages.get('Non-Russian Grappler', 0)
    print("   % of Non-Russian Grapplers who are elite:", str(nrg_elite_pct) + "%")

print("")
print("=" * 60)
print("CONCLUSION:")
print("=" * 60)
if p_val_2 < 0.05 and rg_mean > nrg_mean:
    print("The data SUPPORTS the hypothesis that Russian grapplers are")
    print("significantly more dominant than grapplers from other nations.")
elif rg_mean > nrg_mean:
    print("Russian grapplers show higher win rates, but the difference")
    print("is not statistically significant with current sample size.")
else:
    print("The data does not support Russian grappler dominance.")
print("=" * 60)

One final scatter plot to show all the data together - takedowns vs win rate, colored by category.

In [None]:
fig, ax = plt.subplots(figsize=(12, 8))

categories_and_styles = [
    ('Russian Grappler', '#c0392b', 'o'), 
    ('Non-Russian Grappler', '#3498db', 's'),
    ('Russian Non-Grappler', '#f39c12', '^'),
    ('Non-Russian Non-Grappler', '#95a5a6', 'x')
]

for cat, color, marker in categories_and_styles:
    cat_subset = df_analysis[df_analysis['fighter_category'] == cat]
    ax.scatter(cat_subset['takedowns_per_fight'], cat_subset['win_rate'], 
               c=color, marker=marker, label=cat + ' (n=' + str(len(cat_subset)) + ')', 
               alpha=0.6, s=50)

ax.set_xlabel('Takedowns per Fight')
ax.set_ylabel('Win Rate')
ax.set_title('Fighter Performance: Takedowns vs Win Rate by Category')
ax.legend(loc='upper right', fontsize=9)
upper_limit = df_analysis['takedowns_per_fight'].quantile(0.98)
ax.set_xlim(-0.5, upper_limit)
ax.set_ylim(0, 1.05)

plt.tight_layout()
plt.show()

---

## Final Thoughts

This analysis came from the initial data exploration in notebook 1, where I noticed Russian fighters and grapplers seemed to have good records. After the reach analysis in notebook 2 didn't show strong results for physical attributes, I wanted to dig into whether fighting style and nationality matter more.

The analysis used a lot of techniques from class:
- **groupby** with multiple aggregations (mean, std, count, sum) - this is really powerful for comparing groups
- **merge** to combine dataframes on common keys (like JOIN in SQL)
- Statistical testing with t-tests to check if differences are significant
- Various visualizations: bar charts, pie charts, histograms, boxplots, line charts, scatter plots, stacked bar charts
- Boolean indexing to filter data
- The `.dt` accessor for working with datetime data
- Creating derived columns and categorical variables

The results show Russian grapplers do have a statistically significantly higher mean win rate than other fighters at alpha = 0.05. A significant t-test doesn't say anything about the cause, so whether this is due to the sambo/wrestling training system or other factors would need more investigation. The sample size for Russian grapplers is small (n=18), so the size of the effect is uncertain and we should be careful about over-interpreting it.

This was good practice working with real data that's messy and requires thoughtful categorization and analysis.