# Prediction Of Final Result Of Football Matches Based on Half Time Statistics

### Dataset Structure
The dataset contains the following columns:

- Div: League identifier (F1=French Ligue 1, I1=Italian Serie A, SP1=Spanish La Liga)

- HomeTeam: Home team name

- AwayTeam: Away team name

- FTR: Full-time result (H=Home win, D=Draw, A=Away win)

- HTHG: Home team half-time goals

- HTAG: Away team half-time goals

- HTR: Half-time result (H=Home lead, D=Draw, A=Away lead)

- BWD: Betting odds for draw

- BWA: Betting odds for away win

- BWH: Betting odds for home win

- Year: Match year

- Month: Match month

- Day: Match day

#### What is in this dataset?
- Coverage: The dataset includes matches from:
  - French Ligue 1 (2019-2022)
  - Italian Serie A (2021-2022 seasons)
  - Spanish La Liga (2021-2022 seasons)

- Betting Data: The dataset contains betting odds (BWH, BWD, BWA) which could be valuable for predictive modeling.

- Temporal Data: Each match has a complete date (year, month, day) allowing for temporal analysis.

- Match Phases: Contains both half-time and full-time results, enabling analysis of how matches develop.


### Import necessary libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Importing Data

In [3]:
#start by importing merged dataset
#from google.colab import drive
#drive.mount('/content/drive')
df = pd.read_csv('merged_df_File.csv')
print(df.head(5))

   FTAG  WHD    HS   AY   WHH    AwayTeam   HR        Date HTR  HTAG  ...  \
0     3  3.5   7.0  2.0  2.80        Lyon  1.0  09/08/2019   A   2.0  ...   
1     2  3.6  10.0  0.0  1.63       Reims  0.0  10/08/2019   D   0.0  ...   
2     1  3.1  14.0  1.0  2.35    Bordeaux  0.0  10/08/2019   H   1.0  ...   
3     1  3.3  16.0  0.0  2.30    Toulouse  0.0  10/08/2019   H   0.0  ...   
4     2  3.3  15.0  2.0  3.60  St Etienne  0.0  10/08/2019   A   2.0  ...   

    AR   HomeTeam  WHA   HY   BWD  FTHG  FTR    AS   BWA  HTHG  
0  0.0     Monaco  2.4  2.0  3.30     0    A  13.0  2.40   0.0  
1  0.0  Marseille  5.8  1.0  3.60     0    A   8.0  5.75   0.0  
2  0.0     Angers  3.2  2.0  3.10     3    H   8.0  3.10   3.0  
3  0.0      Brest  3.1  0.0  3.10     1    D  13.0  3.00   1.0  
4  0.0      Dijon  2.1  0.0  3.25     1    A  12.0  2.05   1.0  

[5 rows x 22 columns]


In [4]:
#  Define the columns to keep (excluding duplicates like HG, AG, Res)
columns_to_keep = [
    'Div', 'Date', 'HomeTeam', 'AwayTeam','FTR',
    'HTHG', 'HTAG', 'HTR', 'BWD', 'BWA', 'BWH'
]

# Check which columns are actually present in the dataset
available_columns = [col for col in columns_to_keep if col in df.columns]
df = df[available_columns]


# Preprocessing

## Change Date Structure

The Date variable is split into smaller parts: Year, Month, and Day. The complication is that this column mixes three different formats: dd/mm/yyyy, d/m/yyyy, and dd/mm/yy.

In [5]:
# Step 1: Preprocess the Date column to handle mixed formats and two-digit years
def preprocess_date(date_str):
    # Split the date string into day, month, year
    parts = date_str.split('/')
    day, month, year = parts[0], parts[1], parts[2]

    # Pad day and month with leading zeros if needed (e.g., '5' -> '05')
    day = day.zfill(2)
    month = month.zfill(2)

    # Handle two-digit years by prepending '20'
    if len(year) == 2:
        year = '20' + year

    # Reconstruct the date string in dd/mm/yyyy format
    return f"{day}/{month}/{year}"

# Apply the preprocessing to the Date column
df['Date'] = df['Date'].apply(preprocess_date)

# Step 2: Convert the normalized Date column to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')

# Step 3: Extract Year, Month, and Day into new columns
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

# Step 4: Drop the original Date column (optional, comment out if you want to keep it)
df = df.drop('Date', axis=1)

# Step 5: Verify the new columns
print("First 5 rows with new Year, Month, Day columns:")
print(df[['Year', 'Month', 'Day']].head())

# Check for nulls in the new columns (should be none since Date has no nulls)
print("\nNull values in new date columns:")
print(df[['Year', 'Month', 'Day']].isnull().sum())

# Step 6: Save the updated dataset
#df.to_csv('football_data_with_date_split_corrected.csv', index=False)
#print("Dataset with correctly split date columns saved to 'football_data_with_date_split_corrected.csv'")

# Display the first few rows of the updated dataset
print("\nFirst 5 rows of the updated dataset:")
print(df.head())

First 5 rows with new Year, Month, Day columns:
   Year  Month  Day
0  2019      8    9
1  2019      8   10
2  2019      8   10
3  2019      8   10
4  2019      8   10

Null values in new date columns:
Year     0
Month    0
Day      0
dtype: int64
Dataset with correctly split date columns saved to 'football_data_with_date_split_corrected.csv'

First 5 rows of the updated dataset:
  Div   HomeTeam    AwayTeam FTR  HTHG  HTAG HTR   BWD   BWA   BWH  Year  \
0  F1     Monaco        Lyon   A   0.0   2.0   A  3.30  2.40  2.85  2019   
1  F1  Marseille       Reims   A   0.0   0.0   D  3.60  5.75  1.62  2019   
2  F1     Angers    Bordeaux   H   3.0   1.0   H  3.10  3.10  2.35  2019   
3  F1      Brest    Toulouse   D   1.0   0.0   H  3.10  3.00  2.40  2019   
4  F1      Dijon  St Etienne   A   1.0   2.0   A  3.25  2.05  3.60  2019   

   Month  Day  
0      8    9  
1      8   10  
2      8   10  
3      8   10  
4      8   10  


## Handling null values

In [6]:
# Count the missing values per column before handling them
print("Missing Values After Preprocessing:")
print(df.isnull().sum())

# Saving the preprocessed dataset to a new CSV file
#df.to_csv('preprocessed_football_data.csv', index=False)
#print("Preprocessed data saved to 'preprocessed_football_data.csv'")

# Displaying the first few rows of the preprocessed dataset
print("\nFirst 5 rows of preprocessed dataset:")
print(df.head())

Missing Values After Preprocessing:
Div           0
HomeTeam      0
AwayTeam      0
FTR           0
HTHG          4
HTAG          4
HTR           4
BWD         645
BWA         645
BWH         645
Year          0
Month         0
Day           0
dtype: int64

First 5 rows of preprocessed dataset:
  Div   HomeTeam    AwayTeam FTR  HTHG  HTAG HTR   BWD   BWA   BWH  Year  \
0  F1     Monaco        Lyon   A   0.0   2.0   A  3.30  2.40  2.85  2019   
1  F1  Marseille       Reims   A   0.0   0.0   D  3.60  5.75  1.62  2019   
2  F1     Angers    Bordeaux   H   3.0   1.0   H  3.10  3.10  2.35  2019   
3  F1      Brest    Toulouse   D   1.0   0.0   H  3.10  3.00  2.40  2019   
4  F1      Dijon  St Etienne   A   1.0   2.0   A  3.25  2.05  3.60  2019   

   Month  Day  
0      8    9  
1      8   10  
2      8   10  
3      8   10  
4      8   10  


Most of the null values are in the BWD, BWA and BWH columns, which hold the betting odds for a draw, an away win and a home win (HTHG, HTAG and HTR are also missing in 4 rows). To handle the missing odds, I fill them with a value that is neutral and does not bias later analysis, the equivalent of assuming a 50% probability when there are only two outcomes. Because a football match has three outcomes, the neutral value has to cover three possibilities.
In decimal odds:

- Probability = 1 / Decimal Odds

- So fair odds are the reciprocal of the probability.

If each outcome (home win, draw, away win) is assumed equally likely, each has a probability of 1/3 (33.33%). This is the most neutral assumption when there are 3 possible outcomes and no other information.
The corresponding fair decimal odds are 1 / (1/3) = 3, so the missing odds are filled with 3.
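
The relation Probability = 1 / Decimal Odds can be checked with a short sketch. The helper name `implied_probabilities` and the normalisation by the sum of the raw probabilities are assumptions made for illustration; they are not part of the notebook's pipeline. The second call uses the odds from the first row of the dataset (BWH=2.85, BWD=3.30, BWA=2.40).

```python
def implied_probabilities(odds_home, odds_draw, odds_away):
    """Convert decimal odds to implied probabilities that sum to 1.

    Returns the normalised probabilities and the raw sum (the overround),
    which exceeds 1 when the bookmaker's margin is built into the odds.
    """
    raw = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    overround = sum(raw)
    return [p / overround for p in raw], overround

# Neutral placeholder: odds of 3.0 for every outcome give 1/3 each
neutral_probs, neutral_overround = implied_probabilities(3.0, 3.0, 3.0)
print(neutral_probs, neutral_overround)

# Odds from the first row of the dataset (Monaco vs Lyon)
real_probs, real_overround = implied_probabilities(2.85, 3.30, 2.40)
print(real_probs, real_overround)
```

With odds of 3 on all three outcomes, the raw probabilities already sum to 1, which is why 3 acts as a margin-free neutral value. Real bookmaker odds usually sum to slightly more than 1.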


In [7]:
# Handling null values as specified
# Step 1: Replace null values in betting odds columns with 3 (fair odds for a 1/3 probability)
betting_columns = ['BWD', 'BWA', 'BWH']
for col in betting_columns:
    df[col] = df[col].fillna(3)

# Step 2: Drop rows where any half-time column (HTR, HTHG, HTAG) is null
df = df.dropna(subset=['HTR', 'HTHG', 'HTAG'])


# Step 3: Drop rows with null values in the remaining team, result, league and date columns
categorical_columns = ['HomeTeam', 'AwayTeam', 'FTR', 'Div', 'Year', 'Month', 'Day']
df = df.dropna(subset=categorical_columns)

# Verifying no null values remain
print("Missing Values After Preprocessing:")
print(df.isnull().sum())

# Saving the preprocessed dataset to a new CSV file
df.to_csv('preprocessed_football_data.csv', index=False)
print("Preprocessed data saved to 'preprocessed_football_data.csv'")

# Displaying the first few rows of the preprocessed dataset
print("\nFirst 5 rows of preprocessed dataset:")
print(df.head())

Missing Values After Preprocessing:
Div         0
HomeTeam    0
AwayTeam    0
FTR         0
HTHG        0
HTAG        0
HTR         0
BWD         0
BWA         0
BWH         0
Year        0
Month       0
Day         0
dtype: int64
Preprocessed data saved to 'preprocessed_football_data.csv'

First 5 rows of preprocessed dataset:
  Div   HomeTeam    AwayTeam FTR  HTHG  HTAG HTR   BWD   BWA   BWH  Year  \
0  F1     Monaco        Lyon   A   0.0   2.0   A  3.30  2.40  2.85  2019   
1  F1  Marseille       Reims   A   0.0   0.0   D  3.60  5.75  1.62  2019   
2  F1     Angers    Bordeaux   H   3.0   1.0   H  3.10  3.10  2.35  2019   
3  F1      Brest    Toulouse   D   1.0   0.0   H  3.10  3.00  2.40  2019   
4  F1      Dijon  St Etienne   A   1.0   2.0   A  3.25  2.05  3.60  2019   

   Month  Day  
0      8    9  
1      8   10  
2      8   10  
3      8   10  
4      8   10  


In [10]:
# Save the updated DataFrame to a new CSV
#df.to_csv('updated_football_date_seperated.csv', index=False)

# Feature Engineering

## Last game result (win/draw/loss) for both home and away teams



This part adds feature engineering columns that track each team's most recent result (win/draw/loss), for both the home and the away team.

In [8]:
# Load the dataset
#df = pd.read_csv('preprocessed_football_data.csv')

# Sort by date to ensure chronological order
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])
df = df.sort_values(['Div', 'Date']).reset_index(drop=True)

# Initialize new columns
for prefix in ['Home', 'Away']:
    df[f'{prefix}_PrevWin'] = 0
    df[f'{prefix}_PrevDraw'] = 0
    df[f'{prefix}_PrevLoss'] = 0

# Create a dictionary to track each team's last result
team_last_result = {}

In [9]:
# To experiment on a small subset first, uncomment the two lines below
#sample_size = 50  # Look at first 50 matches
#sample_df = df.head(sample_size).copy()

# Here the form columns are computed on the full dataset
sample_df = df

# Walk through the matches in order, tracking each team's most recent result
team_last_result = {}
for idx, row in sample_df.iterrows():
    home_team = row['HomeTeam']
    away_team = row['AwayTeam']
    
    # Get home team's last result
    if home_team in team_last_result:
        last_result = team_last_result[home_team]
        sample_df.at[idx, 'Home_PrevWin'] = 1 if last_result == 'W' else 0
        sample_df.at[idx, 'Home_PrevDraw'] = 1 if last_result == 'D' else 0
        sample_df.at[idx, 'Home_PrevLoss'] = 1 if last_result == 'L' else 0
    
    # Get away team's last result
    if away_team in team_last_result:
        last_result = team_last_result[away_team]
        sample_df.at[idx, 'Away_PrevWin'] = 1 if last_result == 'W' else 0
        sample_df.at[idx, 'Away_PrevDraw'] = 1 if last_result == 'D' else 0
        sample_df.at[idx, 'Away_PrevLoss'] = 1 if last_result == 'L' else 0
    
    # Update team results with current match
    if row['FTR'] == 'H':
        team_last_result[home_team] = 'W'
        team_last_result[away_team] = 'L'
    elif row['FTR'] == 'A':
        team_last_result[home_team] = 'L'
        team_last_result[away_team] = 'W'
    else:  # Draw
        team_last_result[home_team] = 'D'
        team_last_result[away_team] = 'D'

# Keep only matches where at least one form indicator is 1,
# i.e. at least one of the two teams has a recorded previous result
has_history = sample_df[(sample_df['Home_PrevWin'] == 1) | 
                        (sample_df['Home_PrevDraw'] == 1) | 
                        (sample_df['Home_PrevLoss'] == 1) |
                        (sample_df['Away_PrevWin'] == 1) | 
                        (sample_df['Away_PrevDraw'] == 1) | 
                        (sample_df['Away_PrevLoss'] == 1)]

print(has_history[['Date', 'HomeTeam', 'AwayTeam', 'FTR', 
                   'Home_PrevWin', 'Home_PrevDraw', 'Home_PrevLoss',
                   'Away_PrevWin', 'Away_PrevDraw', 'Away_PrevLoss']])
# Copy the filtered rows so later column assignments do not act on a slice
df = has_history.copy()

            Date        HomeTeam       AwayTeam FTR  Home_PrevWin  \
9     2005-08-13       Bielefeld        Hamburg   A             0   
10    2005-08-13        Dortmund     Schalke 04   A             0   
11    2005-08-13          Hertha  Ein Frankfurt   H             0   
12    2005-08-13  Kaiserslautern       Duisburg   H             0   
13    2005-08-13      Leverkusen  Bayern Munich   A             1   
...          ...             ...            ...  ..           ...   
70029 2025-03-16         Leganes          Betis   A             0   
70030 2025-03-16         Sevilla     Ath Bilbao   A             0   
70031 2025-03-16         Osasuna         Getafe   A             0   
70032 2025-03-16       Vallecano       Sociedad   D             0   
70033 2025-03-16      Ath Madrid      Barcelona   A             0   

       Home_PrevDraw  Home_PrevLoss  Away_PrevWin  Away_PrevDraw  \
9                  0              1             1              0   
10                 1              0


Key approaches and techniques used in this part:
- Instead of complex merge operations, a dictionary tracks each team's last result.
- The form columns are set directly during iteration, which avoids merge issues.
- Teams that have not played before keep the initial value 0 in all three columns.
- Matches are processed in chronological order.


The new columns show, for each team in each match:

**Home_PrevWin:** 1 if home team won their last match, 0 otherwise

**Home_PrevDraw:** 1 if home team drew their last match, 0 otherwise

**Home_PrevLoss:** 1 if home team lost their last match, 0 otherwise

(Same for Away team columns)
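
The dictionary-based tracking can be illustrated on a toy fixture list. This is a minimal sketch: the team names `A`, `B`, `C` and the helper `previous_results` are hypothetical, and the matches are assumed to be sorted chronologically already.

```python
# Toy data, not taken from the dataset: (home, away, FTR) in chronological order
matches = [
    ("A", "B", "H"),
    ("B", "C", "D"),
    ("C", "A", "A"),
]

def previous_results(matches):
    """Return (home, away, home_prev, away_prev) for each match."""
    last = {}   # team -> 'W' / 'D' / 'L' from its most recent match
    rows = []
    for home, away, ftr in matches:
        # Read the stored form BEFORE updating it, so the result of the
        # current match never leaks into its own features
        rows.append((home, away, last.get(home), last.get(away)))
        if ftr == "H":
            last[home], last[away] = "W", "L"
        elif ftr == "A":
            last[home], last[away] = "L", "W"
        else:
            last[home], last[away] = "D", "D"
    return rows

rows = previous_results(matches)
for row in rows:
    print(row)
```

A team with no earlier match gets `None` here, which corresponds to all three indicator columns staying at 0 in the notebook.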


In [24]:
# Saving the preprocessed dataset to a new CSV file
#df.to_csv('Step5_football_data_with_PrevMatch_result.csv', index=False)
#print("feature eng data saved to 'Step5_football_data_with_PrevMatch_result.csv'")


feature eng data saved to 'Step5_football_data_with_PrevMatch_result.csv'


### Extend to More Matches (N-Game Form)

The goal of this part is to track form over the last 3 matches.

In [25]:
# Example: Track last 3 matches' form
def get_last_n_results(team, date, n=3):
    team_matches = df[((df['HomeTeam'] == team) | (df['AwayTeam'] == team)) & (df['Date'] < date)]
    last_n = team_matches.sort_values('Date').tail(n)
    
    wins = 0
    draws = 0
    losses = 0
    
    for _, row in last_n.iterrows():
        if row['HomeTeam'] == team:
            if row['FTR'] == 'H': wins += 1
            elif row['FTR'] == 'D': draws += 1
            else: losses += 1
        else:
            if row['FTR'] == 'A': wins += 1
            elif row['FTR'] == 'D': draws += 1
            else: losses += 1
    
    return wins, draws, losses

# Apply to each match
for idx, row in df.iterrows():
    home_team = row['HomeTeam']
    away_team = row['AwayTeam']
    date = row['Date']
    
    # Home team's last 3 matches
    h_wins, h_draws, h_losses = get_last_n_results(home_team, date, 3)
    df.at[idx, 'Home_Last3Wins'] = h_wins
    df.at[idx, 'Home_Last3Draws'] = h_draws
    df.at[idx, 'Home_Last3Losses'] = h_losses
    
    # Away team's last 3 matches
    a_wins, a_draws, a_losses = get_last_n_results(away_team, date, 3)
    df.at[idx, 'Away_Last3Wins'] = a_wins
    df.at[idx, 'Away_Last3Draws'] = a_draws
    df.at[idx, 'Away_Last3Losses'] = a_losses

In [22]:
df.head(5)

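| | Div | HomeTeam | AwayTeam | FTR | HTHG | HTAG | HTR | BWD | BWA | BWH | ... | Away_PrevLoss | Home_Last3Wins | Home_Last3Draws | Home_Last3Losses | Away_Last3Wins | Away_Last3Draws | Away_Last3Losses | H2H_HomeWins | H2H_Draws | H2H_AwayWins |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | D1 | Bielefeld | Hamburg | A | 0.0 | 0.0 | D | 3.4 | 2.05 | 3.15 | ... | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
| 10 | D1 | Dortmund | Schalke 04 | A | 1.0 | 1.0 | D | 3.25 | 2.65 | 2.4 | ... | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
| 11 | D1 | Hertha | Ein Frankfurt | H | 0.0 | 0.0 | D | 4.5 | 7.2 | 1.35 | ... | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
| 12 | D1 | Kaiserslautern | Duisburg | H | 2.0 | 1.0 | H | 3.5 | 4.4 | 1.7 | ... | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
| 13 | D1 | Leverkusen | Bayern Munich | A | 1.0 | 3.0 | A | 3.4 | 2.1 | 3.0 | ... | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |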
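
The last-3 form above is computed by filtering the whole DataFrame once per match. A single-pass alternative can be sketched with one bounded queue per team. This is an illustrative variant, not the notebook's method: the helper `last_n_form` and the toy fixtures are hypothetical, and it assumes the matches are already in chronological order and that a team plays at most once per date.

```python
from collections import defaultdict, deque

def last_n_form(matches, n=3):
    """matches: chronologically sorted (home, away, FTR) tuples.

    Returns, per match, ((home W, D, L), (away W, D, L)) counted over
    each team's previous n matches.
    """
    # team -> its most recent results; deque(maxlen=n) drops the oldest entry
    history = defaultdict(lambda: deque(maxlen=n))
    out = []
    for home, away, ftr in matches:
        h, a = history[home], history[away]
        # Count before appending, so the current match is excluded
        out.append(((h.count("W"), h.count("D"), h.count("L")),
                    (a.count("W"), a.count("D"), a.count("L"))))
        home_res, away_res = {"H": ("W", "L"), "A": ("L", "W")}.get(ftr, ("D", "D"))
        h.append(home_res)
        a.append(away_res)
    return out

# Toy fixtures, not taken from the dataset
toy_matches = [
    ("A", "B", "H"),
    ("A", "C", "D"),
    ("B", "A", "A"),
    ("A", "C", "H"),
    ("A", "B", "A"),
]
form = last_n_form(toy_matches, n=3)
for match, counts in zip(toy_matches, form):
    print(match, counts)
```

Each match is visited once, so the cost grows linearly with the number of matches instead of rescanning the full table for every row.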


### Head-to-Head History
The goal of this part is to check how the two teams performed against each other in the past (H2H_HomeWins, H2H_Draws, H2H_AwayWins).

In [21]:
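def get_h2h_form(home_team, away_team, date, n=3):
    # Earlier meetings between the two teams, in either home/away arrangement
    h2h_matches = df[((df['HomeTeam'] == home_team) & (df['AwayTeam'] == away_team)) | 
                     ((df['HomeTeam'] == away_team) & (df['AwayTeam'] == home_team))]
    h2h_matches = h2h_matches[h2h_matches['Date'] < date].sort_values('Date').tail(n)
    
    home_wins = 0
    draws = 0
    away_wins = 0
    
    # Count results from the perspective of the current fixture's home team
    for _, row in h2h_matches.iterrows():
        if row['FTR'] == 'H':
            if row['HomeTeam'] == home_team: home_wins += 1
            else: away_wins += 1
        elif row['FTR'] == 'A':
            # The visiting side won that earlier meeting
            if row['AwayTeam'] == home_team: home_wins += 1
            else: away_wins += 1
        elif row['FTR'] == 'D':
            draws += 1
    
    return home_wins, draws, away_wins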

# Apply to each match
df[['H2H_HomeWins', 'H2H_Draws', 'H2H_AwayWins']] = df.apply(
    lambda x: pd.Series(get_h2h_form(x['HomeTeam'], x['AwayTeam'], x['Date'], 3)),
    axis=1
)

# Saving the preprocessed dataset to a new CSV file
df.to_csv('Step7_football_data_with_HeadToHead_Histort.csv', index=False)
print("feature eng data saved to 'Step7_football_data_with_HeadToHead_Histort.csv'")

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[['H2H_HomeWins', 'H2H_Draws', 'H2H_AwayWins']] = df.apply(


feature eng data saved to 'Step7_football_data_with_HeadToHead_Histort.csv'
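
Head-to-head counts are taken from the perspective of the current fixture's home team, so each earlier meeting has to be attributed according to who hosted it and who won. A minimal sketch of that attribution, assuming toy data and a hypothetical helper `h2h_counts` that works on plain tuples instead of the DataFrame:

```python
def h2h_counts(past_meetings, home_team):
    """past_meetings: (home, away, FTR) tuples for earlier games between
    the same two clubs. Returns (home_wins, draws, away_wins) as seen by
    home_team, the host of the current fixture."""
    home_wins = draws = away_wins = 0
    for past_home, _past_away, ftr in past_meetings:
        if ftr == "D":
            draws += 1
        # home_team won if it hosted and FTR is 'H', or visited and FTR is 'A'
        elif (ftr == "H") == (past_home == home_team):
            home_wins += 1
        else:
            away_wins += 1
    return home_wins, draws, away_wins

# Toy history for the fixture X (home) vs Y (away); not from the dataset
past = [
    ("X", "Y", "H"),  # X won at home
    ("Y", "X", "H"),  # Y won at home
    ("Y", "X", "A"),  # X won away
    ("X", "Y", "D"),  # draw
    ("X", "Y", "A"),  # Y won away
]
result = h2h_counts(past, home_team="X")
print(result)
```

All four win cases (either side winning at home or away) land in one of the two win counters, so the three counts always add up to the number of earlier meetings considered.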


In [10]:
# Saving the preprocessed dataset to a new CSV file
#df.to_csv('sample ver.csv', index=False)


# Models

### Data Preparation

In [13]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Load the data
#df = pd.read_csv('sample ver.csv')

# Convert 'Date' to datetime
df['Date'] = pd.to_datetime(df['Date'])

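# Encode categorical features
# Fit one encoder on the team names from both columns, so a team that
# appears only as an away side is still known to the encoder
le_team = LabelEncoder()
le_team.fit(pd.concat([df['HomeTeam'], df['AwayTeam']]))
df['HomeTeam'] = le_team.transform(df['HomeTeam'])
df['AwayTeam'] = le_team.transform(df['AwayTeam'])  # Use same encoder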

le_ftr = LabelEncoder()
df['FTR_encoded'] = le_ftr.fit_transform(df['FTR'])

# Features and target
features = [
    'HomeTeam', 'AwayTeam', 'HTHG', 'HTAG', 'BWD', 'BWA', 'BWH',
    'Year', 'Month', 'Day', 
    'Home_PrevWin', 'Home_PrevDraw', 'Home_PrevLoss', 
    'Away_PrevWin', 'Away_PrevDraw', 'Away_PrevLoss'
]
target = 'FTR_encoded'

X = df[features]
y = df[target]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (important for LSTM and regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
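
Using the same encoder for HomeTeam and AwayTeam means a club gets one integer id regardless of which column it appears in. The idea can be sketched without scikit-learn: the helper `build_team_index` is hypothetical, and ids are assigned in alphabetical order, which is also how `LabelEncoder` orders its classes.

```python
def build_team_index(home_teams, away_teams):
    """Map every team name seen in either column to one integer id."""
    teams = sorted(set(home_teams) | set(away_teams))
    return {team: i for i, team in enumerate(teams)}

# A few team names as they appear in the dataset preview
home = ["Monaco", "Marseille", "Angers"]
away = ["Lyon", "Reims", "Monaco"]

team_index = build_team_index(home, away)
home_ids = [team_index[t] for t in home]
away_ids = [team_index[t] for t in away]
print(team_index)
print(home_ids, away_ids)
```

Building the mapping from the union of both columns avoids a lookup failure for a team that has only ever appeared as the away side.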


### XGBoost Classifier

In [14]:
import xgboost as xgb
from sklearn.metrics import accuracy_score

# Model
xgb_model = xgb.XGBClassifier(objective='multi:softmax', num_class=3, eval_metric='mlogloss')
xgb_model.fit(X_train, y_train)

# Predict
y_pred_xgb = xgb_model.predict(X_test)

# Accuracy
print('XGBoost Accuracy:', accuracy_score(y_test, y_pred_xgb))


XGBoost Accuracy: 0.6117739515610487


### Logistic Regression

In [15]:
from sklearn.linear_model import LogisticRegression

# Model
lr_model = LogisticRegression(max_iter=1000, multi_class='multinomial')
lr_model.fit(X_train_scaled, y_train)

# Predict
y_pred_lr = lr_model.predict(X_test_scaled)

# Accuracy
print('Logistic Regression Accuracy:', accuracy_score(y_test, y_pred_lr))




Logistic Regression Accuracy: 0.6127027220118597


### LSTM (using TensorFlow/Keras)

In [16]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Reshape input for LSTM: (samples, time_steps, features)
# Here time_steps=1 because we don't have sequential data
X_train_lstm = X_train_scaled.reshape((X_train_scaled.shape[0], 1, X_train_scaled.shape[1]))
X_test_lstm = X_test_scaled.reshape((X_test_scaled.shape[0], 1, X_test_scaled.shape[1]))

# Model
lstm_model = Sequential()
lstm_model.add(LSTM(64, input_shape=(X_train_lstm.shape[1], X_train_lstm.shape[2])))
lstm_model.add(Dense(32, activation='relu'))
lstm_model.add(Dense(3, activation='softmax'))  # 3 classes (H, D, A)

lstm_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train
lstm_model.fit(X_train_lstm, y_train, epochs=20, batch_size=32, validation_split=0.1)

# Evaluate
loss, accuracy = lstm_model.evaluate(X_test_lstm, y_test)
print('LSTM Accuracy:', accuracy)




LSTM Accuracy: 0.6140601634979248


## Comparison Table

In [17]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Collect predictions
y_pred_xgb = xgb_model.predict(X_test)
y_pred_lr = lr_model.predict(X_test_scaled)
y_pred_lstm = np.argmax(lstm_model.predict(X_test_lstm), axis=1)  # for LSTM (softmax output)

# Function to calculate metrics
def get_metrics(y_true, y_pred, model_name):
    return {
        'Model': model_name,
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision': precision_score(y_true, y_pred, average='weighted'),
        'Recall': recall_score(y_true, y_pred, average='weighted'),
        'F1 Score': f1_score(y_true, y_pred, average='weighted')
    }

# Create metrics for each model
metrics = []

metrics.append(get_metrics(y_test, y_pred_xgb, 'XGBoost'))
metrics.append(get_metrics(y_test, y_pred_lr, 'Logistic Regression'))
metrics.append(get_metrics(y_test, y_pred_lstm, 'LSTM'))

# Create DataFrame
metrics_df = pd.DataFrame(metrics)

# Show table
print(metrics_df)


                 Model  Accuracy  Precision    Recall  F1 Score
0              XGBoost  0.611774   0.595643  0.611774  0.600874
1  Logistic Regression  0.612703   0.585205  0.612703  0.588191
2                 LSTM  0.614060   0.593419  0.614060  0.597910
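
In the table, Recall equals Accuracy for every model. That is expected with `average='weighted'`: each class's recall is weighted by its share of the true labels, and the weighted sum reduces to the overall fraction of correct predictions. A small sketch in plain Python, with hypothetical toy labels and helper functions written for illustration (they are not the scikit-learn implementations):

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        total += (n / len(y_true)) * (hits / n)
    return total

# Toy labels, not model output
y_true = ["H", "H", "H", "D", "D", "A", "A", "A"]
y_pred = ["H", "H", "A", "H", "D", "A", "A", "H"]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(acc, rec)
```

Because the two numbers coincide by construction, the Precision and F1 Score columns are the ones that add information beyond Accuracy when comparing the three models.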
