# Data acquisition, cleaning, and handling

#### Important: If you do not want to replicate the data acquisition step via web scraping, follow the notes given at the start of each section of the notebook

# Import libraries

In [1]:
from bs4 import BeautifulSoup as BS
import requests
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
import re
from datetime import datetime

import joblib 


pd.options.display.max_rows = None
pd.options.display.max_columns = None

#### Web scraping summary
* The fighters' intrinsic statistics and the list of UFC fights and events are scraped separately
* The two are later joined so that the fighters' statistics can be used to predict fight outcomes
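
The join between the two scraped sources can be sketched in plain Python. This is a minimal illustration rather than the notebook's actual merge: the fighter names and values in `fighter_stats` are placeholders, `join_stats` is a hypothetical helper, and the `red_`/`blue_` prefixes are an assumed naming convention.

```python
# Hypothetical per-fighter stats keyed by name (placeholder values, not scraped data)
fighter_stats = {
    "Fighter A": {"SLpM": 4.0, "TD Avg.": 1.5},
    "Fighter B": {"SLpM": 3.0, "TD Avg.": 0.5},
}

# Hypothetical fight list: one (red corner, blue corner) pair per bout
fight_pairs = [("Fighter A", "Fighter B"), ("Fighter A", "Fighter C")]


def join_stats(fight_pairs, fighter_stats):
    """Attach each corner's stats to its bout; skip bouts with an unknown fighter."""
    rows = []
    for red, blue in fight_pairs:
        if red not in fighter_stats or blue not in fighter_stats:
            continue  # no stats available for one of the fighters
        row = {"Red": red, "Blue": blue}
        for stat, value in fighter_stats[red].items():
            row[f"red_{stat}"] = value
        for stat, value in fighter_stats[blue].items():
            row[f"blue_{stat}"] = value
        rows.append(row)
    return rows


joined = join_stats(fight_pairs, fighter_stats)
print(joined)
```

Bouts involving a fighter with no scraped stats are dropped here, which is one possible policy; keeping them with empty values would be equally reasonable.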

#### Description of the variables obtained from web scraping
SLpM - Significant Strikes Landed per Minute 

Str. Acc. - Significant Striking Accuracy 

SApM - Significant Strikes Absorbed per Minute

Str. Def. - Significant Strike Defense (the % of opponents' strikes that did not land)

TD Avg. - Average Takedowns Landed per 15 minutes

TD Acc. - Takedown Accuracy

TD Def. - Takedown Defense (the % of opponents' TD attempts that did not land)

Sub. Avg. - Average Submissions Attempted per 15 minutes 
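
To make the definitions above concrete, the sketch below derives all eight values from raw career totals. It is one reading of the definitions, not the site's exact method (rounding and which fights are counted are not specified here); `career_stats` and every input number are hypothetical.

```python
def career_stats(minutes, landed, attempted, opp_landed, opp_attempted,
                 td_landed, td_attempted, opp_td_landed, opp_td_attempted,
                 sub_attempts):
    """Compute the eight scraped statistics from raw totals (assumed formulas)."""
    return {
        "SLpM": landed / minutes,                          # strikes landed per minute
        "Str. Acc.": landed / attempted,                   # share of own strikes that landed
        "SApM": opp_landed / minutes,                      # strikes absorbed per minute
        "Str. Def.": 1 - opp_landed / opp_attempted,       # share of opponent strikes that missed
        "TD Avg.": td_landed / minutes * 15,               # takedowns landed per 15 minutes
        "TD Acc.": td_landed / td_attempted,               # share of own takedowns that landed
        "TD Def.": 1 - opp_td_landed / opp_td_attempted,   # share of opponent takedowns stopped
        "Sub. Avg.": sub_attempts / minutes * 15,          # submission attempts per 15 minutes
    }


# Placeholder totals for a fighter with 30 minutes of cage time
stats = career_stats(
    minutes=30, landed=120, attempted=240, opp_landed=90, opp_attempted=225,
    td_landed=4, td_attempted=10, opp_td_landed=1, opp_td_attempted=5,
    sub_attempts=2,
)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```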

# 1. Scraping Next Event

* Scraping the UFC events page: http://ufcstats.com/statistics/events/completed

In [2]:
# Fetch all events
bouts_link = 'http://ufcstats.com/statistics/events/completed?page=all'
events_v2 = pd.read_html(bouts_link)[0]

In [3]:
response = requests.get(bouts_link)
soup = BS(response.text, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
scrap_table = soup.find('tbody')
next_event = scrap_table.find_all('a')[0]['href']
next_event

'http://ufcstats.com/event-details/a780d16cf7eed44d'
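
The cell above takes the first link inside the table body and treats it as the next event. The dependency-free sketch below performs the same selection on a small inline HTML snippet, to show exactly which link is picked. The snippet, its URLs, and the `FirstTbodyLink` class are hypothetical stand-ins; the real page is considerably more complex.

```python
from html.parser import HTMLParser


class FirstTbodyLink(HTMLParser):
    """Record the href of the first <a> that appears inside a <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "a" and self.in_tbody and self.href is None:
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


# Hypothetical, heavily simplified stand-in for the events page
sample_html = """
<table>
  <thead><tr><th><a href="/ignored">Name/date</a></th></tr></thead>
  <tbody>
    <tr><td><a href="http://example.com/event-details/upcoming">Upcoming event</a></td></tr>
    <tr><td><a href="http://example.com/event-details/previous">Previous event</a></td></tr>
  </tbody>
</table>
"""

parser = FirstTbodyLink()
parser.feed(sample_html)
print(parser.href)
```

Links in the table header are ignored because they sit outside `<tbody>`. The approach depends on the upcoming event being listed first, so it is worth checking the returned URL by eye before using it.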

In [4]:
response = requests.get(next_event)
soup = BS(response.text, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
fights = pd.read_html(next_event)[0]
fights = fights[['Fighter', 'Weight class']].copy()
fights.head()

| | Fighter | Weight class |
|---|---|---|
| 0 | Josh Emmett Ilia Topuria | Featherweight |
| 1 | Amanda Ribas Maycee Barber | Women's Flyweight |
| 2 | Austen Lane Justin Tafa | Heavyweight |
| 3 | David Onama Gabriel Santos | Featherweight |
| 4 | Brendan Allen Bruno Silva | Middleweight |


In [5]:
fights.columns

Index(['Fighter', 'Weight class'], dtype='object')

In [6]:
peleadores = fights['Fighter'].str.split('  ', expand = True)
fights['Red'] = peleadores[0].copy()
fights['Blue'] = peleadores[1].copy()
fights.drop('Fighter', axis = 1, inplace = True)
fights.head()

| | Weight class | Red | Blue |
|---|---|---|---|
| 0 | Featherweight | Josh Emmett | Ilia Topuria |
| 1 | Women's Flyweight | Amanda Ribas | Maycee Barber |
| 2 | Heavyweight | Austen Lane | Justin Tafa |
| 3 | Featherweight | David Onama | Gabriel Santos |
| 4 | Middleweight | Brendan Allen | Bruno Silva |
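
The split above relies on the two names in each cell being separated by exactly two spaces. A slightly more tolerant variant, sketched below with a hypothetical `split_corners` helper, splits on any run of two or more whitespace characters and returns empty corners when the cell does not contain exactly two names.

```python
import re


def split_corners(cell):
    """Split a 'Red  Blue' cell on runs of two or more whitespace characters."""
    parts = re.split(r"\s{2,}", cell.strip())
    if len(parts) != 2:
        return None, None  # unexpected layout: leave both corners empty
    return parts[0], parts[1]


print(split_corners("Josh Emmett  Ilia Topuria"))   # two spaces between names
print(split_corners("Josh Emmett   Ilia Topuria"))  # three spaces between names
print(split_corners("Josh Emmett Ilia Topuria"))    # single space: cannot be split safely
```

A single space cannot be used as the separator because it also appears inside each fighter's name, which is why the last call returns no corners instead of guessing.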


In [7]:
stats_prediction = pd.read_csv('./df_prediccion_230207.csv')
scaler = joblib.load('scaler.joblib')
model = joblib.load('modelo_reg_pred.joblib')
cols = stats_prediction.columns[1:]

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [8]:
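def predict(red, blue):
    # Load the per-fighter stats: the first column is 'Name', the rest are features
    stats_prediction = pd.read_csv('./df_prediccion_230207.csv')
    cols = stats_prediction.columns[1:]
    pels_names = stats_prediction['Name'].unique()

    # Only predict when both fighters have stats available
    if (red in pels_names) and (blue in pels_names):

        stats1 = np.array(stats_prediction[stats_prediction['Name'] == red].iloc[:,1:])
        stats2 = np.array(stats_prediction[stats_prediction['Name'] == blue].iloc[:,1:])

        # The model input is the red-minus-blue difference of every feature
        pred_stats = stats1 - stats2
        df = pd.DataFrame(pred_stats, columns=cols)

        # Scale every column except the last five, which are passed through unscaled
        df.iloc[:, :-5] = scaler.transform(df.iloc[:, :-5])

        # Column 0 of the probabilities is treated as red, column 1 as blue
        result = model.predict_proba(df)

        if result[0][0]>result[0][1]:
            return red
        elif result[0][1]>result[0][0]:
            return blue
        else:
            return 'No pediction'

    else:
        return 'No pediction'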
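
The decision rule in `predict` can be isolated from the CSV, scaler, and saved model. The sketch below keeps the same three steps (look up both fighters, take the red-minus-blue feature difference, compare the two probabilities) but takes the stats as an argument, so nothing is re-read on each call. Everything in it is hypothetical: the names and feature values are placeholders, and `toy_predict_proba` is a stand-in for `model.predict_proba`, not the fitted model.

```python
import math

# Hypothetical per-fighter feature vectors (placeholder values)
stats = {
    "Fighter A": [4.0, 0.50, 1.5],
    "Fighter B": [3.0, 0.45, 0.5],
}


def toy_predict_proba(diff):
    """Stand-in for model.predict_proba: logistic of the summed differences.

    Returns [P(red wins), P(blue wins)].
    """
    p_red = 1.0 / (1.0 + math.exp(-sum(diff)))
    return [p_red, 1.0 - p_red]


def pick_winner(red, blue, stats, proba=toy_predict_proba):
    """Return the predicted winner, or None if no prediction is possible."""
    if red not in stats or blue not in stats:
        return None  # no stats for one of the fighters
    # Red-minus-blue difference of each feature, as in predict()
    diff = [r - b for r, b in zip(stats[red], stats[blue])]
    p_red, p_blue = proba(diff)
    if p_red > p_blue:
        return red
    if p_blue > p_red:
        return blue
    return None  # exact tie


print(pick_winner("Fighter A", "Fighter B", stats))
print(pick_winner("Fighter B", "Fighter A", stats))
print(pick_winner("Fighter A", "Fighter C", stats))
```

With this toy scoring function, swapping the corners flips the sign of every difference and therefore picks the same fighter. Whether the real model behaves this way depends on how it was fitted, for example on whether it has an intercept.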

In [9]:
predict(fights['Red'][0], fights['Blue'][0])

'Ilia Topuria'

In [10]:
fights['Predicted winner'] = fights.apply(lambda x: predict(x['Red'], x['Blue']), axis = 1)

In [11]:
fights

| | Weight class | Red | Blue | Predicted winner |
|---|---|---|---|---|
| 0 | Featherweight | Josh Emmett | Ilia Topuria | Ilia Topuria |
| 1 | Women's Flyweight | Amanda Ribas | Maycee Barber | Amanda Ribas |
| 2 | Heavyweight | Austen Lane | Justin Tafa | No pediction |
| 3 | Featherweight | David Onama | Gabriel Santos | Gabriel Santos |
| 4 | Middleweight | Brendan Allen | Bruno Silva | Brendan Allen |
| 5 | Welterweight | Neil Magny | Phil Rowe | Neil Magny |
| 6 | Welterweight | Randy Brown | Wellington Turman | Randy Brown |
| 7 | Lightweight | Mateusz Rebecki | Loik Radzhabov | Mateusz Rebecki |
| 8 | Women's Strawweight | Tabatha Ricci | Gillian Robertson | Tabatha Ricci |
| 9 | Flyweight | Zhalgas Zhumagulov | Joshua Van | No pediction |


In [12]:
fights.to_csv('./next_event.csv', index = False)
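
A quick sanity check on the saved predictions is to count how many bouts received no prediction and how the picks split between corners. The sketch below does this with the standard `csv` module on the rows shown in the output above, embedded as a string so that it runs without `next_event.csv` on disk. The sentinel string is spelled as the notebook writes it.

```python
import csv
import io

# Rows copied from the notebook output above (same columns as next_event.csv)
next_event_csv = """Weight class,Red,Blue,Predicted winner
Featherweight,Josh Emmett,Ilia Topuria,Ilia Topuria
Women's Flyweight,Amanda Ribas,Maycee Barber,Amanda Ribas
Heavyweight,Austen Lane,Justin Tafa,No pediction
Featherweight,David Onama,Gabriel Santos,Gabriel Santos
Middleweight,Brendan Allen,Bruno Silva,Brendan Allen
Welterweight,Neil Magny,Phil Rowe,Neil Magny
Welterweight,Randy Brown,Wellington Turman,Randy Brown
Lightweight,Mateusz Rebecki,Loik Radzhabov,Mateusz Rebecki
Women's Strawweight,Tabatha Ricci,Gillian Robertson,Tabatha Ricci
Flyweight,Zhalgas Zhumagulov,Joshua Van,No pediction
"""

NO_PREDICTION = "No pediction"  # sentinel exactly as returned by predict()

rows = list(csv.DictReader(io.StringIO(next_event_csv)))
missing = [r for r in rows if r["Predicted winner"] == NO_PREDICTION]
red_picks = sum(r["Predicted winner"] == r["Red"] for r in rows)
blue_picks = sum(r["Predicted winner"] == r["Blue"] for r in rows)

print(f"bouts: {len(rows)}, red picks: {red_picks}, "
      f"blue picks: {blue_picks}, unpredicted: {len(missing)}")
for r in missing:
    print("no stats for at least one of:", r["Red"], "/", r["Blue"])
```

The unpredicted bouts are the ones where at least one fighter is absent from the stats file, so listing them shows which fighters would need to be added.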