## Is MMA Math real?

In MMA circles, it is often said that "MMA math isn't real". This is shorthand for saying that the transitive property doesn't hold in the sport, i.e. if Fighter A beats Fighter B, and Fighter B beats Fighter C, it does not follow that Fighter A will always beat Fighter C.

Fans often use this statement to convey that it's difficult or impossible to predict the outcome of a match from the outcomes of previous matchups: styles make fights, fighters are at different stages of their careers, and there are weight cuts, unlucky moments and so on. So, just because Holly beat Ronda, and Ronda beat Miesha, it doesn't necessarily mean Holly was going to beat Miesha, and we saw that discrepancy play out in spectacular fashion.

However, my suspicion is that when people say this they are concentrating on the top 1% of fighters, where everyone is highly skilled and this type of logic breaks down. Many professional fighters never make it to the UFC, Bellator, ONE, PFL or Rizin. They fight on local scenes and often lose to more skilled opponents who are on their way up. So there are probably countless examples of two top-calibre fighters both beating the same local-tier opponent.

In this notebook we use data scraped from Sherdog to create a directed graph of professional MMA wins and losses. Our intention is to analyse every triple of fighters A, B and C where A > B and B > C, and determine whether or not A > C. In this way we can estimate the chance of a cycle, where A > B and B > C but C > A, and determine how rare that is across the whole of professional mixed martial arts.
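
As a rough sketch of the triple analysis described above, the snippet below walks every path A > B > C in a win graph and classifies it by the A vs C result. The `beats` dictionary, the function name `count_triples` and the toy fighters A to D are illustrative assumptions, not scraped data, and the sketch assumes each pair of fighters has already been reduced to at most one winner.

```python
def count_triples(beats):
    """Classify every path A > B > C by the result between A and C.

    `beats` maps each fighter to the set of fighters they have beaten.
    Returns (transitive, cyclic, unresolved) counts.
    """
    transitive = cyclic = unresolved = 0
    for a, beaten_by_a in beats.items():
        for b in beaten_by_a:
            for c in beats.get(b, ()):
                if c == a:
                    continue  # A and B beat each other; not a triple
                if c in beaten_by_a:
                    transitive += 1   # A > C as well
                elif a in beats.get(c, ()):
                    cyclic += 1       # C > A closes a cycle
                else:
                    unresolved += 1   # A and C never produced a winner
    return transitive, cyclic, unresolved


# Toy example with placeholder fighters (not real results)
toy_beats = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"D"},
    "D": {"B"},
}
toy_counts = count_triples(toy_beats)
print(toy_counts)
```

Note that a cycle is found once per rotation (A > B > C, B > C > A and C > A > B), so each distinct cycle contributes three to the cyclic count, whereas a transitive triple is found only once.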

## Hypothesis
Our hypothesis is that if we measure the probability of A > C versus A < C across the entire network of professional fights, A > C will be significantly more likely. That is, we will conduct a one-sided hypothesis test with

H0: The chance of A > C is equal to the chance of A < C for all triples (A, B, C) where A > B and B > C.

H1: The chance of A > C is **greater** than the chance of A < C for all such triples.
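
One way to carry out this test, sketched under assumptions, is an exact one-sided binomial test: under H0 each resolved triple is treated as a fair coin flip between A > C and A < C. The function name and the counts in the example are made up for illustration. The test also assumes the triples are independent, which is not strictly true here because many triples share the same fights, so the p-value should be read as indicative only.

```python
from math import comb


def one_sided_p_value(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5).

    Integer arithmetic is used for the tail sum so that large
    counts do not overflow before the final division.
    """
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# Made-up counts: 8 of 10 resolved triples went the transitive way
example_p = one_sided_p_value(8, 10)
print(example_p)
```

With real data, `successes` would be the number of triples where A > C and `trials` the number of triples where A and C produced a winner either way. If SciPy is available, `scipy.stats.binomtest(successes, trials, p=0.5, alternative='greater')` should give the same tail probability.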

## Assumptions
Weight classes, different organisations, differences between MMA rulesets etc. will be ignored. Amateur fights and fights in other combat sports are also ignored; we are only looking at professional MMA.

If two fighters have fought more than once, the fighter with the higher net "score" (+1 for a win, -1 for a loss, 0 for a draw) is treated as the superior fighter. Beating a fighter twice won't change anything in the analysis at this stage, and the method of victory (decision/finish) is also irrelevant at this stage. For example:

Nate and Conor have each beaten the other once, so for this first analysis it will be as if they hadn't fought. It doesn't matter that Nate got a finish and Conor won by split decision; a win is a win.

GSP and Hughes had a trilogy that GSP won 2-1, so GSP > Hughes in this analysis.

Edgar and Maynard's trilogy, however, was a draw overall, so it is as if they didn't fight.
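
The scoring rule above can be sketched as follows. The function name `resolve_pairs` and the `(fighter, opponent, outcome)` tuple layout are assumptions made for illustration, with each fight listed once from one fighter's point of view; the notebook's own data structure is the `fighter_dict` built later.

```python
from collections import defaultdict

OUTCOME_SCORE = {"win": 1, "loss": -1, "draw": 0}


def resolve_pairs(results):
    """Reduce repeated fights to at most one (winner, loser) edge per pair.

    `results` is an iterable of (fighter, opponent, outcome) tuples,
    where outcome is 'win', 'loss' or 'draw' from `fighter`'s side.
    """
    net_score = defaultdict(int)
    for fighter, opponent, outcome in results:
        first, second = sorted((fighter, opponent))
        delta = OUTCOME_SCORE[outcome]
        if fighter != first:
            delta = -delta  # express the score from `first`'s side
        net_score[(first, second)] += delta

    edges = set()
    for (first, second), score in net_score.items():
        if score > 0:
            edges.add((first, second))
        elif score < 0:
            edges.add((second, first))
        # score == 0: treated as if the pair never fought
    return edges


# The three examples from the text
example_edges = resolve_pairs([
    ("Nate", "Conor", "win"), ("Conor", "Nate", "win"),
    ("Hughes", "GSP", "win"), ("GSP", "Hughes", "win"), ("GSP", "Hughes", "win"),
    ("Maynard", "Edgar", "win"), ("Edgar", "Maynard", "draw"), ("Edgar", "Maynard", "win"),
])
print(example_edges)
```

Only the GSP > Hughes edge survives; the other two pairs net out to zero and are dropped.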

## The tech stack
*Requests*, *Beautiful Soup* and *regular expressions* will be used to fetch and scrape the data from Sherdog.com or perhaps Tapology.com.

*Sqlite3* will be used to store fighter and fight details locally in an SQL database.

*Networkx* is the leading candidate for building the directed graph. We can use *matplotlib* to visualise the graph.

*Pickle* is used to store the dictionary of fighters.
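
As a minimal sketch of the Sqlite3 storage step, the snippet below writes a fighter dictionary into two tables. The table and column names, the helper names and the placeholder fighter IDs are all assumptions for illustration. It stores each fight from the winner's `'beat'` list only, so a fight is missed if the winner's page has not been scraped yet.

```python
import sqlite3


def create_schema(conn):
    """Create the (hypothetical) fighters and fights tables."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS fighters (
            fighter_id TEXT PRIMARY KEY
        );
        CREATE TABLE IF NOT EXISTS fights (
            winner_id TEXT NOT NULL REFERENCES fighters(fighter_id),
            loser_id  TEXT NOT NULL REFERENCES fighters(fighter_id)
        );
    """)


def save_fighter_dict(conn, fighter_dict):
    """Store one row per fighter and one row per recorded win."""
    for fighter, record in fighter_dict.items():
        conn.execute("INSERT OR IGNORE INTO fighters VALUES (?)", (fighter,))
        for opponent in record["beat"]:
            conn.execute("INSERT OR IGNORE INTO fighters VALUES (?)", (opponent,))
            conn.execute("INSERT INTO fights VALUES (?, ?)", (fighter, opponent))
    conn.commit()


# Placeholder data in the same shape as the notebook's fighter_dict
sample_dict = {
    "fighter-a-1": {"beat": ["fighter-b-2", "fighter-c-3"], "lost to": []},
    "fighter-b-2": {"beat": ["fighter-c-3"], "lost to": ["fighter-a-1"]},
}
conn = sqlite3.connect(":memory:")  # use a file path to persist to disk
create_schema(conn)
save_fighter_dict(conn, sample_dict)
fight_count = conn.execute("SELECT COUNT(*) FROM fights").fetchone()[0]
fighter_count = conn.execute("SELECT COUNT(*) FROM fighters").fetchone()[0]
print(fighter_count, fight_count)
```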

## Method
Let's start with a fighter (Ronda Rousey) and go to their fighter profile page, which contains all their opponents and whether they won or lost. We can then go to all those fighters' pages and add their opponents etc. We will do this in a 

In [1]:
pip install selenium

You are using pip version 9.0.1, however version 20.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [5]:
from bs4 import BeautifulSoup
from selenium import webdriver
import requests
import re
import networkx as nx
import graphviz
from graphviz import Digraph
import sqlite3
# small world of words
import matplotlib.pyplot as plt
import logging as logger
import pickle 
logger.getLogger().setLevel(logger.INFO)
plt.rcParams['figure.figsize'] = [40, 30]

In [6]:
# base_url = 'https://www.sherdog.com'
base_url = 'https://www.tapology.com'
username = 'Qwopling'
password = 'Password1'

In [7]:
fighter = '14607-conor-mcgregor'

In [8]:
payload = {
    'Username': username,
    'Password': password
}

with requests.Session() as s:
    p = s.post(base_url + '/sign_in', data=payload)

    # An authorised request
    r = s.get(base_url + '/fightcenter/fighters/' + fighter)
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(r.text, 'html.parser')
    
# driver = webdriver.Chrome()
# driver.get(base_url + '/fightcenter/fighters/' + fighter)
# soup = BeautifulSoup(driver.page_source, 'html.parser')
# driver.quit()
# print(soup.prettify())

In [9]:
soup


In [None]:
def get_fighter_records_tapology(fighter, fighter_dict):

#   logger.info('Adding fighter to dictionary')
    fighter_dict[fighter] = {'beat': [], 'lost to': []}

#   logger.info('Retrieving fighter page')
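    # Tapology fighter pages live under /fightcenter/fighters/ (see the session request above)
    r = requests.get(base_url + '/fightcenter/fighters/' + fighter)
    soup = BeautifulSoup(r.text, 'html.parser')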

#   logger.info('Locating fighter details')
    detail_table = soup.find(class_ = 'details details_two_columns').find(class_ = 'clearfix')

    details = [str(el).
               replace('<li>', '').
               replace('</li>', '').
               strip('\n').
               replace('<strong>', '').replace('</strong>', '').
               replace('<span>', '').replace('</span>', '').split(':\n')
               for el in detail_table.find_all('li')]

    detail_dict = {}
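    for el in details:
        if el[0] == 'Height':
            # Keep the split pieces; the cm value is parsed out further down
            detail_dict['Height'] = el[2].split('(')
        else:
            detail_dict[el[0]] = el[1]
            if el[0] == 'Age':
                # The third field of the 'Age' item is stored as the date of birth
                detail_dict['Date of Birth'] = el[2]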

    detail_dict['Record'] = detail_dict.pop('Pro MMA Record')
    detail_dict['Record'] = detail_dict['Record'].split(' ')[0]
    detail_dict['Height'] = float(detail_dict['Height'][1].split('cm')[0])
    detail_dict['Career Disclosed Earnings'] = float(detail_dict['Career Disclosed Earnings'].strip(' USD$').replace(',', ''))
    detail_dict['Born'] = detail_dict['Born'].split(',')[-1]
    detail_dict['Current Streak'] = int(detail_dict['Current Streak'].split(' Wins')[0])
    # Drop fields that are not needed for the analysis
    for key in ('Fighter Links', 'Personal Links', 'Age',
                'Last Fight', 'Weight Class', 'Affiliation'):
        detail_dict.pop(key)

    # Hand the cleaned details back to the caller instead of discarding them
    return detail_dict

In [None]:
# Load in the current fighter dictionary
with open("fighter_dict_3_May.p", "rb") as f:
    fighter_dict = pickle.load(f)

In [None]:
def get_fighter_records_sherdog(fighter, fighter_dict):

#   logger.info('Adding fighter to dictionary')
    fighter_dict[fighter] = {'beat': [], 'lost to': []}

#   logger.info('Retrieving fighter page')
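    # base_url must be the Sherdog root (https://www.sherdog.com) for this function;
    # fighter IDs such as 'Kazushi-Sakuraba-84' sit under /fighter/
    r = requests.get(base_url + '/fighter/' + fighter)
    soup = BeautifulSoup(r.text, 'html.parser')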

#   logger.info('Locating pro record')
    fight_history_table = soup.find(string = 'Fight History - Pro').find_parent(class_ = 'module fight_history').find('table')
    rows = fight_history_table.find_all(class_ = {'odd','even'})

#   logger.info('Adding opponents to dictionary')
    for num_row, row in enumerate(rows):
        cells      = row.find_all("td")
        result     = cells[0].get_text()
        opponent   = cells[1].find('a').get('href').split('/fighter/')[-1]

        if result == 'win':
            fighter_dict[fighter]['beat'].append(opponent)
        elif result == 'loss':
            fighter_dict[fighter]['lost to'].append(opponent)

    if fighter_dict[fighter]['lost to'] == []:
        print('the above fighter IS UNDEFEATED')
        return 
    
    opponent_list = fighter_dict[fighter]['lost to'][::-1] + fighter_dict[fighter]['beat']

    for opponent in opponent_list:
        # Skip fighters that have already been scraped and carry on with the rest
        if opponent in fighter_dict:
            continue
        print(opponent)
        get_fighter_records_sherdog(opponent, fighter_dict)

In [None]:
fighter = 'Kazushi-Sakuraba-84'

In [None]:
get_fighter_records_sherdog(fighter, fighter_dict)

In [None]:
len(fighter_dict)

In [None]:
fighter_dict.get(fighter)
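
The recursive crawl above can hit Python's recursion limit on a large network. A possible alternative, sketched here under assumptions, is an iterative breadth-first crawl with an explicit queue. `crawl`, `fetch_record` and `max_fighters` are hypothetical names: `fetch_record` stands in for whatever function scrapes one fighter page and returns a `{'beat': [...], 'lost to': [...]}` record, and the example uses an in-memory lookup instead of real requests.

```python
from collections import deque


def crawl(start, fetch_record, fighter_dict=None, max_fighters=None):
    """Breadth-first crawl of the opponent network starting from `start`.

    `fetch_record(fighter)` must return {'beat': [...], 'lost to': [...]}.
    Fighters already present in `fighter_dict` are not fetched again.
    """
    fighter_dict = {} if fighter_dict is None else fighter_dict
    queue = deque([start])
    while queue:
        fighter = queue.popleft()
        if fighter in fighter_dict:
            continue  # already scraped
        if max_fighters is not None and len(fighter_dict) >= max_fighters:
            break     # optional cap to keep a test run small
        record = fetch_record(fighter)
        fighter_dict[fighter] = record
        for opponent in record["lost to"] + record["beat"]:
            if opponent not in fighter_dict:
                queue.append(opponent)
    return fighter_dict


# Stand-in for the scraper: placeholder fighters, no network access
fake_pages = {
    "a": {"beat": ["b"], "lost to": ["c"]},
    "b": {"beat": [], "lost to": ["a"]},
    "c": {"beat": ["a", "d"], "lost to": []},
    "d": {"beat": [], "lost to": ["c"]},
}
crawled = crawl("a", fake_pages.__getitem__)
print(sorted(crawled))
```

Because the queue lives on the heap rather than the call stack, the crawl depth is limited only by memory, and an interrupted run can be resumed by passing the saved `fighter_dict` back in.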

## Building the graph using networkx 

In [None]:
H = nx.DiGraph()

In [None]:
for fighter in fighter_dict.keys():
    print(fighter)
    # Use the same part of the ID (e.g. 'Sakuraba' from 'Kazushi-Sakuraba-84') as the
    # label at both ends of an edge. Short labels can collide between fighters, so the
    # full ID is the safer node key for the actual analysis.
    fighter_name = fighter.split('-')[1]
    H.add_node(fighter_name)
#     if len(fighter_dict[fighter]['beat']) > 0:
#         beaten_opponent = fighter_dict[fighter]['beat'][0]
#         H.add_node(beaten_opponent)
#         H.add_edge(fighter, beaten_opponent)
    for beaten_by_opponent in fighter_dict[fighter]['lost to']:
        opponent_name = beaten_by_opponent.split('-')[1]
        # Edges point from the winner to the loser; add_edge creates missing nodes
        H.add_edge(opponent_name, fighter_name)

In [None]:
nx.drawing.nx_pylab.draw_circular(H, with_labels=True, 
                                    node_color='white',
                                    font_color='blue', 
                                    font_size=35,
                                    edge_color='gray',
                                    arrowstyle="->",
                                    arrowsize=40,
                                    node_size=28000)

In [None]:
nx.nx_agraph.to_agraph(H)

In [None]:
with open("fighter_dict_3_May.p", "wb") as f:
    pickle.dump(fighter_dict, f)

In [None]:
get_fighter_records_sherdog('Keizo-Sakuragi-41116', fighter_dict)

In [None]:
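# nx.write_gpickle was removed in NetworkX 3.0; pickling the graph directly works across versions
with open("fighter_graph_3_May.gpickle", "wb") as f:
    pickle.dump(H, f)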

In [None]:
len(fighter_dict)

In [None]:
pip install pygraphviz

In [None]:
from networkx.algorithms import approximation as aprx

In [None]:
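# The clique approximation is defined for undirected graphs, so drop edge directions first
max_clique = H.subgraph(aprx.max_clique(H.to_undirected()))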