In [175]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Input, Lambda
from tensorflow.keras.models import Model
from sklearn.preprocessing import StandardScaler

# Load and inspect the dataset
data = pd.read_csv('Match_Result_Data.csv')
data

Unnamed: 0,date,time,round,day,venue,result,gf,ga,opponent,xg,xga,poss,sh,sot,fk,pk,pkatt,season,team,day_code
0,2020-09-21,20.0,Matchweek 2,Mon,Away,W,3.0,1.0,Wolves,1.9,1.0,65.0,13.0,8.0,2.0,1.0,1.0,2021,Manchester City,0
1,2020-09-27,16.0,Matchweek 3,Sun,Home,L,2.0,5.0,Leicester City,0.9,2.6,72.0,16.0,5.0,1.0,0.0,0.0,2021,Manchester City,6
2,2020-10-03,17.0,Matchweek 4,Sat,Away,D,1.0,1.0,Leeds United,1.5,1.7,48.0,23.0,1.0,1.0,0.0,0.0,2021,Manchester City,5
3,2020-10-17,17.0,Matchweek 5,Sat,Home,W,1.0,0.0,Arsenal,1.5,0.9,59.0,13.0,5.0,0.0,0.0,0.0,2021,Manchester City,5
4,2020-10-24,12.0,Matchweek 6,Sat,Away,D,1.0,1.0,West Ham,1.1,0.5,70.0,14.0,7.0,1.0,0.0,0.0,2021,Manchester City,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1612,2024-05-02,19.0,Matchweek 26,Thu,Home,W,2.0,0.0,Tottenham,2.2,1.6,37.0,16.0,4.0,2.0,0.0,0.0,2023,Chelsea,3
1613,2024-05-05,14.0,Matchweek 36,Sun,Home,W,5.0,0.0,West Ham,4.1,0.9,69.0,25.0,14.0,1.0,0.0,0.0,2023,Chelsea,6
1614,2024-05-11,17.0,Matchweek 37,Sat,Away,W,3.0,2.0,Nott'ham Forest,1.6,1.5,67.0,12.0,5.0,1.0,0.0,0.0,2023,Chelsea,5
1615,2024-05-15,19.0,Matchweek 34,Wed,Away,W,2.0,1.0,Brighton,1.5,1.3,45.0,14.0,6.0,2.0,0.0,0.0,2023,Chelsea,2


In [176]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1617 entries, 0 to 1616
Data columns (total 20 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   date      1617 non-null   object 
 1   time      1617 non-null   float64
 2   round     1617 non-null   object 
 3   day       1617 non-null   object 
 4   venue     1617 non-null   object 
 5   result    1617 non-null   object 
 6   gf        1617 non-null   float64
 7   ga        1617 non-null   float64
 8   opponent  1617 non-null   object 
 9   xg        1617 non-null   float64
 10  xga       1617 non-null   float64
 11  poss      1617 non-null   float64
 12  sh        1617 non-null   float64
 13  sot       1617 non-null   float64
 14  fk        1617 non-null   float64
 15  pk        1617 non-null   float64
 16  pkatt     1617 non-null   float64
 17  season    1617 non-null   int64  
 18  team      1617 non-null   object 
 19  day_code  1617 non-null   int64  
dtypes: float64(11), int64(2), obje

In [177]:
### Step 1: Data Preprocessing ###

# Encode categorical columns to numeric values
label_encoders = {}
for column in ['venue', 'result', 'opponent', 'day']:
    label_encoders[column] = LabelEncoder()
    data[column] = label_encoders[column].fit_transform(data[column])

Explanation:
We are using LabelEncoder to convert categorical columns ('venue', 'result', 'opponent', 'day') into numeric values. This is necessary because most machine learning models require numeric inputs.
A dictionary (label_encoders) stores each encoder to access the mappings later if needed (like for inverse transformations).

In [178]:
# Map 'result' column to a 3-class target (e.g., Win=1, Draw=0, Loss=2)
data['result'] = data['result'].map({
    1: 1,  # Win
    0: 0,  # Draw
    2: 2 # Loss
})

Explanation:
The result column is remapped to define the target variable:
1: Win
0: Draw
2: Loss
This step ensures that the results are converted to a three-class target for classification.

In [179]:
# Drop unnecessary columns
data.drop(columns=['date', 'round', 'season', 'team'], inplace=True)

# Fill missing values with 0
data.fillna(0, inplace=True)

Explanation:
Irrelevant columns ('date', 'round', 'season', 'team') are dropped since they do not provide predictive value in the model or can cause data leakage (e.g., specific match dates influencing predictions).

Any missing values in the dataset are filled with 0. This is a simple approach to handle missing data, though you might want to consider more sophisticated imputation methods if missing values are common.

In [180]:
### Step 2: Feature Engineering ###

# Calculate rolling averages for critical performance metrics
def calculate_rolling_avg(df, column, window=3):
    return df[column].rolling(window=window, min_periods=1).mean()

Explanation:
This defines a helper function calculate_rolling_avg to compute rolling averages. A rolling average smooths data over a specific window (in this case, a default of 3 games). This helps capture recent trends in a team's performance.

Cause of using Rolling Average?

Capture Recent Trends:
In sports, a team's recent performance often has a greater impact on future outcomes than older performances. By using a rolling average, you can capture how a team's form is evolving over time.
For example, if you calculate the rolling average of goals_for (gf) over the last 3 matches, it will give more emphasis to how the team has been scoring recently, rather than taking all past games equally.

Smooth Out Fluctuations:
Performance metrics like goals, shots, or possession may fluctuate from game to game. A rolling average helps to smooth out these short-term variations, making it easier for your model to detect long-term trends.
For instance, if a team scores unusually high or low in one match, the rolling average tempers the effect by considering a few more matches.

Reduce Noise:
Rolling averages help reduce noise (random high or low spikes in data), which can be beneficial for the model by reducing sensitivity to outliers or anomalies in the data.
In the context of football, one-off performances (like scoring a lot of goals in a single match) might not be reflective of a team's overall trend. The rolling average mitigates the influence of such outliers.

=========================================================================================================================================================

Key Points:

Window Size: The size of the window (default = 3) determines how many recent matches are considered for the rolling average. In your case, you're considering the last 3 games, which is a reasonable window to assess recent form without overfitting to short-term noise.

Min Periods: min_periods=1 ensures that even if there aren't enough past matches to fill the window (e.g., fewer than 3 games), the rolling average will still be calculated using whatever data is available.

Temporal Feature: Rolling averages introduce a temporal aspect into your model by providing features that account for how teams have performed over time, which can be crucial for predicting future outcomes.

In [181]:
for col in ['gf', 'ga', 'poss', 'xg', 'xga', 'sh', 'sot']:
    data[f'{col}_roll_avg'] = calculate_rolling_avg(data, col)

This loop applies the calculate_rolling_avg function to important performance metrics ('gf', 'ga', 'poss', 'xg', 'xga', 'sh', 'sot') and creates new columns ('{col}_roll_avg') for the rolling averages.
These new features represent the average of the team's performance metrics over the last 3 matches.

In [182]:
# Ensure 'day_code' is integer
data['day_code'] = data['day_code'].astype(int)

# Define features (X) and target (y)
X = data.drop(columns=['result'])
y = data['result']
#Here, the dataset is split into features (X) and target (y).

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

#standerization
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

#The data is split into training and test sets using train_test_split.
#80% of the data is used for training, and 20% is reserved for testing the model's performance. A fixed random_state ensures reproducibility.

In [183]:
### Step 3: Building the Adaptive Match Prediction Network (AMPN) ###

# Define a custom Influence Layer
class InfluenceLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        super(InfluenceLayer, self).__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        self.influence_weights = self.add_weight(
            shape=(input_shape[-1], self.output_dim),
            initializer='random_normal',
            trainable=True
        )

    def call(self, inputs):
        influence_score = tf.matmul(inputs, self.influence_weights)
        return tf.nn.relu(influence_score)



Purpose of the InfluenceLayer

The key motivation behind using the InfluenceLayer is to capture the relative importance or influence of different feature groups in predicting the match outcome. In your football match prediction model, you have different types of features, such as offensive features (e.g., goals scored, shots on target), defensive features (e.g., goals conceded, possession), and match-specific features (e.g., venue, day). Each of these feature sets may have different levels of importance in determining whether a team wins, loses, or draws a match. The InfluenceLayer is designed to quantify and capture this importance dynamically during training.

=========================================================================================================================================================

Layer Initialization: The constructor (__init__) defines the basic configuration of the layer. Here, the output_dim argument specifies how many output dimensions the layer will have.This could represent the influence scores or weights that will be applied to the feature group during training.

The super() call ensures that the parent class (tf.keras.layers.Layer) is properly initialized.

Output Dimensionality (output_dim): This parameter represents the number of neurons or outputs produced by the layer. In your case, this is typically set to 1, meaning the layer will produce a single influence score (or value) for each feature group. This score will represent how important or influential the feature group is for predicting match outcomes.

=========================================================================================================================================================

Why Use an InfluenceLayer?

Capturing Feature Importance: In traditional neural networks, each feature is treated equally, and the network learns which features are important based on the weights. However, in your case, you know that some features (such as goals scored or shots on target) have a different level of importance than others. This layer is designed to learn how much influence each feature group should have in making predictions.

Separation of Feature Groups: By using separate InfluenceLayers for different groups (offensive, defensive, match-related features), the network can learn independent importance scores for each group, making the model more interpretable and tailored to specific football-related metrics.

Building the Layer (build method):

The build() method is called when the layer is first created, and this is where you define the trainable weights of the layer.

add_weight(): This method creates a trainable weight matrix (called influence_weights) that will be used to calculate influence scores for the features.

shape=(input_shape[-1], self.output_dim): This shape specifies how the weights will map the input features to the output. Here, input_shape[-1] refers to the number of input features (i.e., the number of features in a feature group). The output_dim is typically 1, so the layer will produce one influence score for the group of input features.

initializer='random_normal': This specifies how the weights will be initialized at the start of training. random_normal means that the weights will be initialized with small random values drawn from a normal distribution.

trainable=True: This indicates that the weights will be adjusted during the training process via backpropagation. As the model trains, these weights will be optimized to reflect the true influence of each feature group on the prediction task.

=========================================================================================================================================================

Why Define Weights for Feature Groups?

Learning Influence Dynamically: Instead of manually assigning importance to each feature group, the network will learn these values during training. This allows the model to adapt to the data and capture which features (or feature groups) are truly predictive of match outcomes.

Interpretability: After training, these weights can be interpreted to understand which feature groups are more important. For example, if the weights for offensive features are higher, it means that offensive metrics have a stronger influence on match outcomes compared to other metrics like defensive ones.

=========================================================================================================================================================

In TensorFlow/Keras, the input_shape argument represents the shape of the input data that is passed into a neural network layer. It typically comes in the form of a tuple. Here's what the components mean:

input_shape: This is the shape of the input data, typically given as (batch_size, feature_dim).

batch_size: The number of samples in a batch of input data. It can vary from batch to batch and is usually not specified in input_shape (hence it is sometimes omitted or set to None).

feature_dim: The number of features for each sample. This is the important part when determining how many features are being passed into the layer.

What Does input_shape[-1] Mean?
When you see input_shape[-1], it means the last dimension of the input shape. In the context of neural networks, this refers to the number of features for each sample.

For example:

If input_shape is (32, 10), where: 32 is the batch size (number of samples in the batch),10 is the number of features per sample.
In this case, input_shape[-1] would be 10, which represents the number of input features.

Why -1 in input_shape[-1]?
The use of -1 here refers to the last element of the shape tuple. In Python (and many programming languages), you can index from the end of a list or tuple using negative indexing:

input_shape[-1] gives you the last dimension (the number of features).If input_shape = (32, 10), then input_shape[-1] = 10 (the number of features).
This is convenient because it lets you easily access the feature dimension without having to explicitly write out the shape (especially useful when you don't need to know the batch size).

=========================================================================================================================================================


Forward Pass (call method):

The call() method is where the actual forward pass of the layer happens. This is where the layer takes the input data and produces the output.

tf.matmul(): This function performs matrix multiplication between the input features (inputs) and the influence weights (self.influence_weights). The result is an "influence score" for each input feature group.

In that case, the input to this layer would be a subset of features (e.g., offensive, defensive metrics), and the output would be a single influence score that represents the collective influence of those features.
tf.nn.relu(): This applies a non-linear activation function called ReLU (Rectified Linear Unit). ReLU is commonly used in neural networks because it introduces non-linearity, which helps the model capture complex relationships in the data.

        Why ReLU: ReLU ensures that negative influence scores are set to zero, which makes sense in this context. You likely want to discard negative influence scores since negative contributions from a feature group (such as poor offensive performance) should not contribute to the final influence in the same way that positive scores do.

Workflow of the InfluenceLayer in Your Model:

Input Layer: The raw input features are passed into the model (e.g., offensive metrics like goals scored, shots on target, etc.).
Feature Grouping: You group features into logical sets (e.g., offensive, defensive, match-related).
Influence Layers: Each feature group is passed through a separate InfluenceLayer, which calculates an influence score based on a learned weight matrix.
Combination: The influence scores from all feature groups are combined in the meta-combination layer to generate a final prediction of the match outcome.

Summary of the Key Benefits:

Feature-specific weight learning: The InfluenceLayer learns a specific weight for each feature group, dynamically adjusting how much influence each group should have on the match outcome.
Smooth integration: It fits seamlessly into the larger neural network, contributing to the overall model's ability to make predictions based on the learned influences.
Greater control and flexibility: You can use this layer to create a more interpretable and customizable model that reflects the importance of different aspects of football matches.

In [184]:
# Define model input layer
input_layer = Input(shape=(X_train.shape[1],))

Explanation:
The input layer for the Keras model is defined, where X_train.shape[1] specifies the number of input features.

In [185]:
# Define feature groups and apply Influence Layers
#This helper function is designed to:
#Select specific groups of features from the input data.
#Apply the custom InfluenceLayer to these selected features to calculate an "influence score" based on those features.

def get_influence_layer(inputs, features):
    indices = [X.columns.get_loc(f) for f in features]
    feature_input = Lambda(lambda x: tf.gather(x, indices, axis=1))(inputs)
    return InfluenceLayer(1)(feature_input)

Function Definition:

inputs: This is the full input layer (or data) that contains all the features for the model.
features: A list of feature names that belong to a specific group. For example, this could be a group of offensive features like ['goals', 'shots', 'assists'].
The purpose of this function is to extract the specific feature group from the input and then apply the InfluenceLayer on this feature group.

=========================================================================================================================================================

Finding Feature Indices:


indices = [X.columns.get_loc(f) for f in features]

This line uses list comprehension to get the indices (column positions) of the selected feature names from the DataFrame X (the full feature matrix).

X.columns.get_loc(f) finds the index (position) of feature f in the list of columns of X. for f in features: It loops through all the feature names in the features list.
Example: If features = ['goals', 'shots', 'assists'] and X has columns in this order ['venue', 'day_code', 'goals', 'shots', 'assists', 'xG'], then:

X.columns.get_loc('goals') would return 2 (because 'goals' is the third column in the DataFrame),
X.columns.get_loc('shots') would return 3,
X.columns.get_loc('assists') would return 4.

After this line runs, the variable indices will store the list [2, 3, 4], which corresponds to the positions of the goals, shots, and assists columns in the dataset.

=========================================================================================================================================================

Selecting Relevant Features Using tf.gather:

feature_input = Lambda(lambda x: tf.gather(x, indices, axis=1))(inputs)

This line is used to extract the specific columns (features) from the input layer (inputs) using the indices identified earlier.

Here's what's happening:

Lambda(lambda x: ...): The Lambda layer is a simple way to wrap any custom TensorFlow function into a Keras model. Here, it's wrapping the tf.gather operation.
tf.gather(x, indices, axis=1): This is a TensorFlow operation that selects specific columns from the input x based on the indices.
x represents the input data (i.e., the input layer).
indices is the list of feature positions that we want to select (e.g., [2, 3, 4]).
axis=1 means we are selecting columns (features) from the input. axis=0 refers to rows, and axis=1 refers to columns.

=========================================================================================================================================================

Applying the InfluenceLayer:

return InfluenceLayer(1)(feature_input)

This line applies the custom InfluenceLayer to the selected features (feature_input), and it will return the output of this layer, which is the influence score for that group of features.

InfluenceLayer(1): This initializes the custom InfluenceLayer with an output dimension of 1. This means the layer will output a single score representing the "influence" of the feature group on the model.
(feature_input): This passes the extracted feature data (e.g., goals, shots, assists) into the InfluenceLayer.
The InfluenceLayer will calculate a weighted influence score based on the input features by multiplying the features by the learned influence_weights, as defined in the custom layer. This output will then be used by the model as part of the decision-making process.

=========================================================================================================================================================

Summary of the Workflow:

inputs: This is the full input data, containing all features.
features: A list of specific features (e.g., offensive or defensive features) you want to isolate.
indices: The function first finds the positions of these features in the full dataset.
tf.gather: It then extracts only the columns (features) corresponding to those positions.
InfluenceLayer: Finally, it applies the custom InfluenceLayer to compute the influence score for the selected feature group. This score reflects how much influence the feature group (e.g., offensive features) has on the overall prediction (e.g., predicting win/loss).

Why Use This?

Feature Grouping: This allows you to isolate different types of features (e.g., offensive vs. defensive) and calculate their individual influence on the overall prediction.
Modular Influence Calculation: By using the InfluenceLayer, you're able to compute influence scores separately for each feature group and combine them later. This helps in better interpreting which features or groups of features are most important in determining the match outcome.

In [186]:
# Define influence layers for different feature groups
offensive_influence = get_influence_layer(input_layer, ['gf', 'sh', 'sot', 'xg', 'gf_roll_avg'])
defensive_influence = get_influence_layer(input_layer, ['ga', 'xga', 'poss', 'ga_roll_avg'])
match_influence = get_influence_layer(input_layer, ['venue', 'day_code', 'time', 'poss_roll_avg'])

Explanation:
Three influence layers are created for different feature groups:
Offensive Influence: Metrics like goals, shots, and xG.
Defensive Influence: Metrics like goals conceded, xGA, and possession.
Match Influence: Contextual factors like venue, day of the match, and time.

In [187]:
# Meta-Combination Layer (combine influences)
combined_influence = tf.keras.layers.Concatenate()([offensive_influence, defensive_influence, match_influence])
meta_combination = Dense(16, activation='relu')(combined_influence)

Explanation:
The outputs from the three influence layers are concatenated into a single tensor using Concatenate. This combines the influences from offensive, defensive, and match-specific features.

A dense (fully connected) layer with 16 neurons is applied to the combined influences to further process the aggregated features. ReLU activation is used to introduce non-linearity.

In [188]:
# Dropout layer for regularization
dropout = Dropout(0.3)(meta_combination)

A Dropout layer is added to prevent overfitting by randomly setting 30% of the neurons to zero during training. This helps the model generalize better.

In [189]:
# Prediction layer (output with 3 classes for Win, Draw, Loss)
output = Dense(3, activation='softmax')(dropout)


Explanation:
The output layer has 3 neurons (for the three possible outcomes: Win, Draw, Loss).
A softmax activation is applied to output a probability distribution over the three classes.

In [190]:
# Define and compile the model
model = Model(inputs=input_layer, outputs=output)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])



Explanation:
The Keras model is created, specifying the input and output layers.
The Adam optimizer is used for efficient gradient-based optimization.
The loss function is sparse_categorical_crossentropy, suitable for multi-class classification with integer labels.
accuracy is included as a metric to evaluate model performance.

=========================================================================================================================================================

Efficient gradient-based optimization refers to the use of techniques that improve the process of finding the optimal parameters (weights) for a machine learning model by minimizing a specific loss function. Gradient-based methods rely on calculating the gradients (or slopes) of the loss function with respect to the model's parameters and using these gradients to adjust the parameters in a direction that minimizes the loss.

In this context, efficiency means using methods that allow the optimization process to converge more quickly, use less computational resources, or both, while ensuring good performance and stability.

Key Concepts:
Gradient Descent: Gradient descent is a common optimization technique where the model's parameters are adjusted iteratively to minimize the loss function. It works by computing the gradient of the loss function with respect to the model parameters and updating the parameters in the direction that decreases the loss.

The gradient is the vector of partial derivatives of the loss function with respect to each parameter.
By following the negative of this gradient, the model moves toward a local (or global) minimum of the loss function.

In [191]:
### Step 4: Training the Model ###

# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

Epoch 1/50
[1m33/33[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 8ms/step - accuracy: 0.3163 - loss: 1.1024 - val_accuracy: 0.4131 - val_loss: 1.0873
Epoch 2/50
[1m33/33[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - accuracy: 0.4432 - loss: 1.0813 - val_accuracy: 0.4208 - val_loss: 1.0726
Epoch 3/50
[1m33/33[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4994 - loss: 1.0606 - val_accuracy: 0.5212 - val_loss: 1.0494
Epoch 4/50
[1m33/33[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - accuracy: 0.5205 - loss: 1.0293 - val_accuracy: 0.5907 - val_loss: 1.0096
Epoch 5/50
[1m33/33[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - accuracy: 0.5605 - loss: 0.9818 - val_accuracy: 0.6216 - val_loss: 0.9605
Epoch 6/50
[1m33/33[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - accuracy: 0.6261 - loss: 0.9284 - val_accuracy: 0.6525 - val_loss: 0.9112
Epoch 7/50
[1m33/33[0m [32m━━━━━━━━━━

Explanation:
The model is trained for 50 epochs with a batch size of 32.
20% of the training data is used for validation to monitor the model's performance during training.

In [192]:
### Step 5: Evaluating the Model ###

# Predict and evaluate performance
y_pred = np.argmax(model.predict(X_test), axis=1)
accuracy = accuracy_score

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step


Explanation:
After training, the model is evaluated on the test set to assess its performance.
The test accuracy is printed to show how well the model generalizes to unseen data.

In [193]:
from sklearn.metrics import classification_report

# Predict and evaluate performance
y_pred = np.argmax(model.predict(X_test), axis=1)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print model performance metrics
print("Model Performance Metrics:")
print("Accuracy:", accuracy)

# Check unique classes in y_test and y_pred
unique_classes_test = np.unique(y_test)
unique_classes_pred = np.unique(y_pred)

print("\nUnique classes in y_test:", unique_classes_test)
print("Unique classes in y_pred:", unique_classes_pred)

# Print classification report with appropriate labels
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Loss', 'Draw', 'Win'], labels=[2, 0, 1]))

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
Model Performance Metrics:
Accuracy: 0.7962962962962963

Unique classes in y_test: [0 1 2]
Unique classes in y_pred: [0 1 2]

Classification Report:
              precision    recall  f1-score   support

        Loss       0.98      0.82      0.89       125
        Draw       0.53      0.90      0.67        73
         Win       0.95      0.71      0.81       126

    accuracy                           0.80       324
   macro avg       0.82      0.81      0.79       324
weighted avg       0.87      0.80      0.81       324



In [194]:
print(y_train.value_counts())
print(y_test.value_counts())


result
2    539
1    467
0    287
Name: count, dtype: int64
result
1    126
2    125
0     73
Name: count, dtype: int64


To tailor the Adaptive Match Prediction Network (AMPN) to your dataset, I’ll first examine its columns to understand the features available. Once that’s clarified, I’ll integrate the specific data attributes into each step of the algorithm and describe how the model processes each column to produce predictions.

Loading and Exploring the Dataset
I'll load the dataset to understand the specific columns and values.

The dataset includes the following columns:

Match Information: date, time, round, day, venue (Home/Away), opponent, season, team, day_code.
Performance Metrics: gf (goals for), ga (goals against), xg (expected goals), xga (expected goals against), poss (possession), sh (shots), sot (shots on target), fk (free kicks), pk (penalty kicks scored), pkatt (penalty kick attempts).
Target Variable: result (win: "W", loss: "L", draw: "D").
The objective is to predict result based on these features. I’ll now define the Adaptive Match Prediction Network (AMPN) algorithm in detail, integrating these dataset columns.

Algorithm: Adaptive Match Prediction Network (AMPN)
Step-by-Step Breakdown
1. Data Preprocessing
Handling Categorical Data:

Convert venue and result (target variable) into numerical labels for model processing. Assign venue values (e.g., Home = 1, Away = 0) and convert result labels to numerical values (Win = 1, Draw = 0, Loss = 2).
Feature Engineering:

Rolling Averages: Calculate rolling averages of gf, ga, poss, xg, xga, and sot over the past 3, 5, and 10 matches for each team.
Recent Form Indicator: Create a recent form metric as the sum of results over the past few games (e.g., last 3 or 5 matches), giving more weight to recent games.
Opponent Strength Indicator: Calculate average performance metrics of the opponent team across the season (or most recent games) to gauge difficulty level.
Day and Time Encoding: Encode day and time features to represent temporal patterns.
2. Algorithm Design
The AMPN architecture will include three main layers: Segmentation Layer (SL), Influence Layers (IL), and Meta-Combination Layer (MCL), followed by a Prediction Layer (PL).

A. Segmentation Layer (SL)
Purpose: Group data into segments based on the similarity of match conditions.
Implementation:
Segment matches by conditions such as venue (Home, Away), and recent form (e.g., last 5 games’ outcomes).
Each segment is treated as a mini-batch during training, helping the model learn different patterns based on contextual match conditions.
B. Influence Layers (IL)
Purpose: Determine the influence strength of each feature group on match outcomes.
Implementation:
Divide features into groups such as offensive performance (gf, sh, sot, xg), defensive performance (ga, xga, poss), and match-specific indicators (venue, recent form, and opponent strength).
Calculate Influence Scores: Each IL produces an influence score representing how strongly its feature group correlates with outcomes.
Dynamic Influence Updating:
The model refines influence scores for each feature through iterative adjustments based on a custom loss function, emphasizing feature influence consistency within similar matches.
The influence scores are stored for each feature in each segment.
C. Meta-Combination Layer (MCL)
Purpose: Combine influence scores from ILs across all segments.
Implementation:
Use a weighted averaging of influence scores, where each weight is dynamically assigned based on recent match performance.
For each match, MCL calculates a feature-weighted score by combining influence scores, considering segment relevance.
Include a regularization penalty in MCL to prevent over-reliance on specific feature groups, encouraging balanced learning across all features.
D. Prediction Layer (PL)
Purpose: Generate the final prediction.
Implementation:
PL aggregates the weighted scores from MCL for each segment, producing a final score for the prediction.
To avoid overfitting, PL uses a "dropout" strategy on minor features, randomly ignoring them during training.
The final output will be a probability for each class (win, loss, draw), which will be converted into the predicted match result.
3. Training and Validation
Loss Function: Define a custom loss function that minimizes prediction error while also penalizing inconsistencies in influence scores across segments.
Training Strategy: Use mini-batch gradient descent with segment-based batches. The ILs adjust influence scores iteratively, while the MCL aggregates scores to enhance robustness across different match contexts.
Validation: Use cross-segment validation to ensure that the model performs well across diverse match conditions, such as different venues and against varied opponents.
4. Evaluation Metrics
Accuracy: Calculate the percentage of correct predictions.
Precision, Recall, and F1 Score: Especially useful if the dataset has class imbalances (e.g., more wins than losses).
Contextual Validation: Test model performance with simulated conditions, like using only data from the first half of a season to predict the second half.
Challenges and Improvements
Challenge: Ensuring the model generalizes well across varied match conditions.
Improvement: Implement a context-awareness mechanism where influence scores adjust dynamically based on recent trends (e.g., changes in team form).