Earlier we found that good wrestlers tend to win slightly more fights. Now let's see whether something similar holds for jiu jitsu.

In [6]:
import pandas as pd

In [7]:
data = pd.read_csv('../../data/ufc-master.csv')

As with wrestling, no feature in the data explicitly says that a fighter is good at jiu jitsu, but we can use the number of submission victories and submission attempts to estimate a fighter's jiu jitsu ability.
First, let's use the number of submission victories. Because fighters have had different numbers of fights, we cannot simply compare raw submission-win counts; instead we look at submission wins as a proportion of each fighter's total wins.

In [8]:
data = data[['B_fighter', 'R_fighter', 'B_win_by_Submission', 'R_win_by_Submission', 'B_wins', 'R_wins', 'Winner']]

In [9]:
# The submission win percentage is only defined for fighters with at least one win,
# so keep only fights where both fighters already have a win.
no_debut = (data['B_wins'] > 0) & (data['R_wins'] > 0)
data = data[no_debut].copy()  # copy so new columns are not assigned to a slice
data['B_subs'] = data['B_win_by_Submission'] / data['B_wins']
data['R_subs'] = data['R_win_by_Submission'] / data['R_wins']
# Drop fights where both fighters have equal submission percentages
noeq = data['B_subs'] != data['R_subs']
data = data[noeq]

In [10]:
wins = 0
for i in range(len(data)):
    if (data['B_subs'].iloc[i] > data['R_subs'].iloc[i]) & (data['Winner'].iloc[i] == 'Blue'):
        wins += 1
    elif (data['R_subs'].iloc[i] > data['B_subs'].iloc[i]) & (data['Winner'].iloc[i] == 'Red'):
        wins += 1
p = wins / len(data)
print("Fighter who averages more submission victories than the opponent wins " + str(round(p * 100, 2)) + "% of the time.")
print("Sample size: " + str(len(data)))

Fighter who averages more submission victories than the opponent wins 52.06% of the time.
Sample size: 1965


This is not a huge difference, but it is still slightly better than a random guess (50%).
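To get a rough sense of whether 52.06% over 1965 fights differs from a coin flip, one option is a normal-approximation test against 50%. This is only a sketch: the win count is reconstructed from the rounded percentage printed above, and the test treats fights as independent, which is an assumption (the same fighters appear in many rows).

```python
import math

n = 1965                   # sample size printed above
wins = round(0.5206 * n)   # reconstructed from the rounded 52.06%, so approximate

# Under the null hypothesis the favored fighter wins with probability 0.5.
expected = n * 0.5
std = math.sqrt(n * 0.25)  # standard deviation of a Binomial(n, 0.5) count

z = (wins - expected) / std
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value (normal approximation)

print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

If the p-value comes out above 0.05, the edge would not be distinguishable from chance at that conventional threshold, so the result is best read as suggestive rather than conclusive.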

Now let's focus on average submission attempts instead of submission victories. A fighter who averages many submission attempts is usually good at jiu jitsu, even if they cannot always finish the fight before the clock runs out.

In [24]:
data2 = pd.read_csv('../../data/ufc-master.csv')
data2 = data2[['B_fighter', 'R_fighter', 'B_avg_SUB_ATT', 'R_avg_SUB_ATT', 'Winner']]
hasvalue = (~data2['B_avg_SUB_ATT'].isnull()) & (~data2['R_avg_SUB_ATT'].isnull())
data2 = data2[hasvalue]
noeq2 = data2['B_avg_SUB_ATT'] != data2['R_avg_SUB_ATT']
data2 = data2[noeq2]

In [26]:
wins2 = 0
for i in range(len(data2)):
    if (data2['B_avg_SUB_ATT'].iloc[i] > data2['R_avg_SUB_ATT'].iloc[i]) & (data2['Winner'].iloc[i] == 'Blue'):
        wins2 += 1
    elif (data2['R_avg_SUB_ATT'].iloc[i] > data2['B_avg_SUB_ATT'].iloc[i]) & (data2['Winner'].iloc[i] == 'Red'):
        wins2 += 1
p2 = wins2 / len(data2)
print("Fighter who averages more submission attempts than the opponent wins " + str(round(p2 * 100, 2)) + "% of the time.")

Fighter who averages more submission attempts than the opponent wins 52.71% of the time.
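The same counting loop is repeated for each metric, so it could be factored into a small vectorized helper. The sketch below is one way to do that; `favored_win_rate` is a hypothetical name, not part of the notebook, and the toy DataFrame is made up for illustration rather than taken from `ufc-master.csv`.

```python
import pandas as pd

def favored_win_rate(df, blue_col, red_col, winner_col="Winner"):
    """Share of fights won by the fighter with the higher value of a metric.

    Assumes fights with equal values have already been filtered out,
    as the cells above do.
    """
    blue_favored = df[blue_col] > df[red_col]
    red_favored = df[red_col] > df[blue_col]
    favored_won = (blue_favored & (df[winner_col] == "Blue")) | (
        red_favored & (df[winner_col] == "Red")
    )
    # Mean of a boolean Series is the fraction of True values,
    # i.e. wins divided by the number of fights.
    return favored_won.mean()

# Toy data for illustration only.
toy = pd.DataFrame({
    "B_avg_SUB_ATT": [1.0, 0.2, 0.5, 2.0],
    "R_avg_SUB_ATT": [0.5, 0.8, 0.1, 1.0],
    "Winner": ["Blue", "Blue", "Red", "Blue"],
})
toy_rate = favored_win_rate(toy, "B_avg_SUB_ATT", "R_avg_SUB_ATT")
print(f"{toy_rate:.2%}")
```

On the real data this would be called as `favored_win_rate(data2, 'B_avg_SUB_ATT', 'R_avg_SUB_ATT')`, and it should give the same fraction as the loop because both divide the number of favored wins by the number of rows.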


This is a slightly better indicator than the proportion of submission victories.
Next, we check the win rate in fights where at least one of the fighters is in the top 10% for submissions attempted per fight.

In [39]:
# 90th percentile of average submission attempts for each corner
print(data2['B_avg_SUB_ATT'].describe(percentiles=[.9])['90%'])
print(data2['R_avg_SUB_ATT'].describe(percentiles=[.9])['90%'])
# Note: the Blue-corner cutoff is applied to both corners
t = data2['B_avg_SUB_ATT'].describe(percentiles=[.9])['90%']
threshold = (data2['B_avg_SUB_ATT'] >= t) | (data2['R_avg_SUB_ATT'] >= t)
data3 = data2[threshold]

1.4
1.333333333
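The cell above uses the Blue-corner 90th percentile (`t`) as the cutoff for both corners, even though the two corners have slightly different percentiles. One alternative, shown here only as a sketch and not as the notebook's method, is to pool both columns and compute a single cutoff. The function name `top_decile_fights` and the toy data are hypothetical.

```python
import pandas as pd

def top_decile_fights(df, blue_col="B_avg_SUB_ATT", red_col="R_avg_SUB_ATT", q=0.9):
    """Keep fights where at least one fighter is at or above the pooled q-quantile."""
    # Stack both corners into one Series so the cutoff reflects all fighters.
    pooled = pd.concat([df[blue_col], df[red_col]], ignore_index=True)
    cutoff = pooled.quantile(q)
    mask = (df[blue_col] >= cutoff) | (df[red_col] >= cutoff)
    return df[mask], cutoff

# Toy data for illustration only.
toy = pd.DataFrame({
    "B_avg_SUB_ATT": [0.0, 0.5, 1.0, 3.0, 0.2],
    "R_avg_SUB_ATT": [0.1, 0.4, 0.3, 0.6, 2.0],
})
subset, cutoff = top_decile_fights(toy)
print(cutoff, len(subset))
```

Whether a pooled cutoff changes the result on the real data is not something this sketch establishes; it only keeps the definition of "top 10%" symmetric between the two corners.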


In [40]:
wins3 = 0
for i in range(len(data3)):
    if (data3['B_avg_SUB_ATT'].iloc[i] > data3['R_avg_SUB_ATT'].iloc[i]) & (data3['Winner'].iloc[i] == 'Blue'):
        wins3 += 1
    elif (data3['R_avg_SUB_ATT'].iloc[i] > data3['B_avg_SUB_ATT'].iloc[i]) & (data3['Winner'].iloc[i] == 'Red'):
        wins3 += 1
p3 = wins3 / len(data3)
print("Fighter who averages more submission attempts than the opponent wins " + str(round(p3 * 100, 2)) + "% of the time.")

Fighter who averages more submission attempts than the opponent wins 50.57% of the time.


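From this we can conclude that the number of submission attempts is not a strong indicator of who is going to win: when we restrict to fights involving a top-10% submission attempter, the win rate of the fighter with more attempts drops to 50.57% instead of rising.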