En este archivo vamos a entrenar una red neuronal para predecir el resultado de los partidos de la Champions bsándonos en datos de los último 7 años.

Para empezar primero cargamos las librerias necesarias.

In [78]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import to_categorical
from keras.layers import Dropout

Ahora vamos a empezar el entrenamiento para predecir resultados, pero antes de ello vamos a transformar las columnas string de equipos en numéricas para poder evaluarlas y analizarlas durante el procesamiento de la red neuronal.

In [79]:
# Cargar los datos del ranking UEFA
uefa_ranking = pd.read_csv("data/UEFA_Ranking.csv")

# Crear un diccionario para mapear nombres de equipos a números
team_mapping = {team: rank for rank, team in zip(uefa_ranking['Position'], uefa_ranking['Club'])}

# Cargar los datos de los partidos de la Champions League 2017
champions_data = pd.read_csv("data/champions-league-2017.csv")

# Reemplazar nombres de equipos por números en las columnas correspondientes
champions_data['Home Team'] = champions_data['Home Team'].map(team_mapping)
champions_data['Away Team'] = champions_data['Away Team'].map(team_mapping)
champions_data['Winner'] = champions_data['Winner'].map(team_mapping)

Vamos a comprobar que todos los datos están bien.

In [80]:
# Ver las primeras filas de los datos
print(champions_data.head())

   Match Number Round Number  Home Team  Away Team Result  Winner  \
0             1            1         20         61  1 - 2      61   
1             2            1         11         26  3 - 0      11   
2            13            1          2         52  3 - 0       2   
3            14            1         50          8  0 - 5       8   
4            25            1          9         66  6 - 0       9   

   Home Team Goals  Away Team Goals  Match Goals  
0                1                2            3  
1                3                0            3  
2                3                0            3  
3                0                5            5  
4                6                0            6  


In [81]:
# analizar los datos
print(champions_data.describe())

       Match Number   Home Team   Away Team     Winner  Home Team Goals  \
count    125.000000  125.000000  125.000000  125.00000       125.000000   
mean      63.000000   25.288000   25.320000   17.24800         1.784000   
std       36.228442   28.268247   28.242784   22.06722         1.532361   
min        1.000000    1.000000    1.000000    1.00000         0.000000   
25%       32.000000    6.000000    6.000000    4.00000         1.000000   
50%       63.000000   14.000000   14.000000   10.00000         1.000000   
75%       94.000000   36.000000   36.000000   23.00000         3.000000   
max      125.000000  119.000000  119.000000  119.00000         7.000000   

       Away Team Goals  Match Goals  
count       125.000000   125.000000  
mean          1.424000     3.208000  
std           1.398525     1.622901  
min           0.000000     0.000000  
25%           0.000000     2.000000  
50%           1.000000     3.000000  
75%           2.000000     4.000000  
max           7.0000

In [82]:
# ver los datos nulos
print(champions_data.isnull().sum())

Match Number       0
Round Number       0
Home Team          0
Away Team          0
Result             0
Winner             0
Home Team Goals    0
Away Team Goals    0
Match Goals        0
dtype: int64


Como podemos ver está todo bien. Para más información con respecto al CSV que estamos tratando dirigirse a graficas.ipynb

Ahora vamos a comenzar el proceso de entrenamiento de la red neuronal

In [83]:
'''# Preprocesamiento de datos
X = champions_data[['Home Team', 'Away Team', 'Home Team Goals', 'Away Team Goals']].values
y = champions_data['Winner'].values'''

# Preprocesamiento de datos
X = champions_data[['Home Team', 'Away Team', 'Home Team Goals', 'Away Team Goals']].values
y = champions_data['Winner'].values

# Separar características categóricas y numéricas
categorical_features = ['Home Team', 'Away Team']
numeric_features = ['Home Team Goals', 'Away Team Goals']

X_categorical = champions_data[categorical_features].values
X_numeric = champions_data[numeric_features].values

# Codificar características categóricas
X_categorical_encoded = np.zeros_like(X_categorical)
for i in range(X_categorical.shape[1]):
    label_encoder = LabelEncoder()
    X_categorical_encoded[:, i] = label_encoder.fit_transform(X_categorical[:, i])

# Normalizar características numéricas
scaler = StandardScaler()
X_numeric_scaled = scaler.fit_transform(X_numeric)

# Combinar características categóricas y numéricas
X_combined = np.concatenate((X_categorical_encoded, X_numeric_scaled), axis=1)

# Codificar etiquetas
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(champions_data['Winner'])
num_classes = len(label_encoder.classes_)

# Dividir datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X_combined, y, test_size=0.2, random_state=42)

# Normalizar los datos
X_train = X_train / np.max(X_train, axis=0)
X_test = X_test / np.max(X_test, axis=0)

# Convertir etiquetas a one-hot encoding
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)

# Definir modelo de red neuronal con más complejidad
model = Sequential()
model.add(Dense(128, input_shape=(X_train.shape[1],), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(num_classes, activation='sigmoid'))

# Compilar modelo con hiperparámetros ajustados
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Entrenar modelo
model.fit(X_train, y_train, epochs=100, batch_size=40, validation_split=0.2)

# Evaluar modelo
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Epoch 1/100


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 188ms/step - accuracy: 0.0250 - loss: 3.3971 - val_accuracy: 0.2000 - val_loss: 3.3540
Epoch 2/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 46ms/step - accuracy: 0.0250 - loss: 3.3749 - val_accuracy: 0.1500 - val_loss: 3.3536
Epoch 3/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step - accuracy: 0.0667 - loss: 3.3958 - val_accuracy: 0.1500 - val_loss: 3.3529
Epoch 4/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - accuracy: 0.0500 - loss: 3.3832 - val_accuracy: 0.1500 - val_loss: 3.3518
Epoch 5/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step - accuracy: 0.0833 - loss: 3.3782 - val_accuracy: 0.1000 - val_loss: 3.3509
Epoch 6/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 31ms/step - accuracy: 0.0250 - loss: 3.3655 - val_accuracy: 0.1500 - val_loss: 3.3505
Epoch 7/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[

Vemos que el entrenamiento inicial ha ido sin problemas. Vamos a probar a ver que predicciones realizaría nuestra red neuronal en base a los partidos de la Champions utilizada para obtener una medida de precisión real. Para ello primero vamos a medir la precisión de una predicción en general.

In [84]:
# Hacer predicciones en los datos de prueba
predictions = model.predict(X_test)

# Convertir las predicciones de vuelta a etiquetas
predicted_labels = np.argmax(predictions, axis=1)
true_labels = np.argmax(y_test, axis=1)

# Calcular la precisión
accuracy = np.mean(predicted_labels == true_labels)
print("Accuracy on test data:", accuracy)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms/step
Accuracy on test data: 0.04


Como puede verse con este resultado y la medición general anterior, la precisión del modelo es casi inexistente, por lo que vamos a seguir entrenándolo. Primero vamos a aplicar ingeniería de características para darle al modelo más variables sobre las que extraer información:

In [85]:
# Calcular la diferencia de goles entre los equipos
champions_data['Goal Difference'] = champions_data['Home Team Goals'] - champions_data['Away Team Goals']

# Calcular la proporción de goles marcados por cada equipo en relación con el total de goles en el partido
champions_data['Home Goals Ratio'] = champions_data['Home Team Goals'] / champions_data['Match Goals']
champions_data['Away Goals Ratio'] = champions_data['Away Team Goals'] / champions_data['Match Goals']

# Preprocesamiento de datos con nuevas características
X = champions_data[['Home Team', 'Away Team', 'Home Team Goals', 'Away Team Goals', 'Goal Difference', 'Home Goals Ratio', 'Away Goals Ratio']].values
y = champions_data['Winner'].values

# Separar características categóricas y numéricas
categorical_features = ['Home Team', 'Away Team']
numeric_features = ['Home Team Goals', 'Away Team Goals', 'Goal Difference', 'Home Goals Ratio', 'Away Goals Ratio']

X_categorical = champions_data[categorical_features].values
X_numeric = champions_data[numeric_features].values

# Codificar características categóricas
X_categorical_encoded = np.zeros_like(X_categorical)
for i in range(X_categorical.shape[1]):
    label_encoder = LabelEncoder()
    X_categorical_encoded[:, i] = label_encoder.fit_transform(X_categorical[:, i])

# Normalizar características numéricas
scaler = StandardScaler()
X_numeric_scaled = scaler.fit_transform(X_numeric)

# Combinar características categóricas y numéricas
X_combined = np.concatenate((X_categorical_encoded, X_numeric_scaled), axis=1)

# Codificar etiquetas
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(champions_data['Winner'])
num_classes = len(label_encoder.classes_)
y = to_categorical(y, num_classes=num_classes)

# Dividir datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X_combined, y, test_size=0.2, random_state=42)

# Normalizar los datos
X_train = X_train / np.max(X_train, axis=0)
X_test = X_test / np.max(X_test, axis=0)


Ahora vamos a ampliar el alcance de nuestra red neuronal:

In [86]:
# Definir modelo de red neuronal con arquitectura modificada
model = Sequential()
model.add(Dense(128, input_shape=(X_train.shape[1],), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))  # Cambiar a 'softmax' y ajustar neuronas

# Compilar y entrenar el modelo...
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=40, validation_split=0.3)

# Evaluar modelo
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Epoch 1/100


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 188ms/step - accuracy: 0.0631 - loss: 3.3672 - val_accuracy: 0.1333 - val_loss: 3.3669
Epoch 2/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step - accuracy: 0.0452 - loss: 3.3663 - val_accuracy: 0.1333 - val_loss: 3.3664
Epoch 3/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - accuracy: 0.0726 - loss: 3.3654 - val_accuracy: 0.1333 - val_loss: 3.3659
Epoch 4/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 42ms/step - accuracy: 0.0810 - loss: 3.3645 - val_accuracy: 0.1333 - val_loss: 3.3655
Epoch 5/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step - accuracy: 0.0476 - loss: 3.3638 - val_accuracy: 0.1333 - val_loss: 3.3651
Epoch 6/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step - accuracy: 0.0464 - loss: 3.3627 - val_accuracy: 0.0000e+00 - val_loss: 3.3647
Epoch 7/100
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━

Vamos a continuar añadiéndo datos de entrenamiento:

In [87]:
champions_data2 = pd.read_csv("data/champions-league-2018.csv")
champions_data3 = pd.read_csv("data/champions-league-2019.csv")
champions_data4 = pd.read_csv("data/champions-league-2020.csv")
champions_data5 = pd.read_csv("data/champions-league-2021.csv")
champions_data6 = pd.read_csv("data/champions-league-2022.csv")

# Reemplazar nombres de equipos por números en las columnas correspondientes
champions = [champions_data2, champions_data3, champions_data4, champions_data5, champions_data6]
for data in champions:
    data['Home Team'] = data['Home Team'].map(team_mapping)
    data['Away Team'] = data['Away Team'].map(team_mapping)
    data['Winner'] = data['Winner'].map(team_mapping)
    # Calcular la diferencia de goles entre los equipos
    data['Goal Difference'] = data['Home Team Goals'] - data['Away Team Goals']

    # Calcular la proporción de goles marcados por cada equipo en relación con el total de goles en el partido
    data['Home Goals Ratio'] = data['Home Team Goals'] / data['Match Goals']
    data['Away Goals Ratio'] = data['Away Team Goals'] / data['Match Goals']

# Concatenar todos los datos verticalmente
combined_data = pd.concat([champions_data2, champions_data3, champions_data4, champions_data5, champions_data6], axis=0, ignore_index=True)

# Preprocesamiento de datos
X = combined_data[['Home Team', 'Away Team', 'Home Team Goals', 'Away Team Goals', 'Goal Difference', 'Home Goals Ratio', 'Away Goals Ratio']].values
y = combined_data['Winner'].values

# Separar características categóricas y numéricas
categorical_features = ['Home Team', 'Away Team']
numeric_features = ['Home Team Goals', 'Away Team Goals', 'Goal Difference', 'Home Goals Ratio', 'Away Goals Ratio']

X_categorical = combined_data[categorical_features].values
X_numeric = combined_data[numeric_features].values

# Codificar características categóricas
X_categorical_encoded = np.zeros_like(X_categorical)
for i in range(X_categorical.shape[1]):
    label_encoder = LabelEncoder()
    X_categorical_encoded[:, i] = label_encoder.fit_transform(X_categorical[:, i])

# Normalizar características numéricas
scaler = StandardScaler()
X_numeric_scaled = scaler.fit_transform(X_numeric)

# Combinar características categóricas y numéricas
X_combined = np.concatenate((X_categorical_encoded, X_numeric_scaled), axis=1)

# Codificar etiquetas
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(combined_data['Winner'])
num_classes = len(label_encoder.classes_)

# Convertir etiquetas a one-hot encoding
y = to_categorical(y, num_classes=num_classes)

# Dividir datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X_combined, y, test_size=0.2, random_state=42)

# Normalizar los datos
X_train = X_train / np.max(X_train, axis=0)

# Definir modelo de red neuronal con más complejidad
model.fit(X_train, y_train, epochs=100, batch_size=40, validation_split=0.2)

# Evaluar modelo
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Epoch 1/100


InvalidArgumentError: Graph execution error:

Detected at node compile_loss/categorical_crossentropy/softmax_cross_entropy_with_logits defined at (most recent call last):
  File "C:\Users\javie\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main

  File "C:\Users\javie\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel_launcher.py", line 18, in <module>

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\traitlets\config\application.py", line 1075, in launch_instance

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\kernelapp.py", line 739, in start

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\tornado\platform\asyncio.py", line 205, in start

  File "C:\Users\javie\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 595, in run_forever

  File "C:\Users\javie\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1881, in _run_once

  File "C:\Users\javie\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\kernelbase.py", line 545, in dispatch_queue

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\kernelbase.py", line 534, in process_one

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\kernelbase.py", line 437, in dispatch_shell

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\ipkernel.py", line 359, in execute_request

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\kernelbase.py", line 778, in execute_request

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\ipkernel.py", line 446, in do_execute

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\ipykernel\zmqshell.py", line 549, in run_cell

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3075, in run_cell

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3130, in _run_cell

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3334, in run_cell_async

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3517, in run_ast_nodes

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code

  File "C:\Users\javie\AppData\Local\Temp\ipykernel_18784\1725023444.py", line 63, in <module>

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\utils\traceback_utils.py", line 118, in error_handler

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 323, in fit

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 117, in one_step_on_iterator

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 105, in one_step_on_data

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 59, in train_step

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\trainers\trainer.py", line 321, in compute_loss

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\trainers\compile_utils.py", line 606, in __call__

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\trainers\compile_utils.py", line 642, in call

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\losses\loss.py", line 43, in __call__

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\losses\losses.py", line 22, in call

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\losses\losses.py", line 1568, in categorical_crossentropy

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\ops\nn.py", line 1456, in categorical_crossentropy

  File "c:\Users\javie\OneDrive\Documents\GitHub\Champions\venv\lib\site-packages\keras\src\backend\tensorflow\nn.py", line 563, in categorical_crossentropy

logits and labels must be broadcastable: logits_size=[40,29] labels_size=[40,54]
	 [[{{node compile_loss/categorical_crossentropy/softmax_cross_entropy_with_logits}}]] [Op:__inference_one_step_on_iterator_150358]