# Match Outcome Prediction: Morocco vs Nigeria

This notebook builds a machine learning pipeline to predict the outcome of football matches between Morocco and Nigeria.

**Workflow:**
1. **Data Loading**: Load historical match data.
2. **Preprocessing**: Keep only fixtures with Morocco as the home team and Nigeria as the away team.
3. **Feature Engineering**: Create the target variable (`result`) and prepare features.
4. **Modeling**: Use a scikit-learn `Pipeline` with `OneHotEncoder` and `RandomForestClassifier`.
5. **Evaluation**: Assess model performance.
6. **Inference**: Predict on new/hypothetical match scenarios.

In [30]:
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Set random seed for reproducibility
RANDOM_SEED = 42

In [31]:
# Configuration
DATA_PATH = Path("results.csv")
if not DATA_PATH.exists():
    # Fallback to absolute path if running from a different context
    DATA_PATH = Path("/media/rachid/d70e3dc6-74e7-4c87-96bc-e4c3689c979a/lmobrmij/Projects/Predict_winner_MAR_NEJ/results.csv")

print(f"Data path set to: {DATA_PATH}")

Data path set to: results.csv


## 1. Data Loading & Preprocessing

In [37]:
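def load_and_preprocess_data(path):
    """Load the results CSV and keep fixtures with Morocco at home against Nigeria."""
    df = pd.read_csv(path)

    # Only fixtures listing Morocco as the home team are kept;
    # matches with Nigeria listed as the home team are excluded by this filter.
    df = df[(df['home_team'] == 'Morocco') & (df['away_team'] == 'Nigeria')]

    return df.copy()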

def create_target(df):
    conditions = [
        (df['home_score'] > df['away_score']),
        (df['home_score'] < df['away_score'])
    ]
    choices = ['Home Win', 'Away Win']
    df['result'] = np.select(conditions, choices, default='Draw')
    return df

# Execute
df_raw = load_and_preprocess_data(DATA_PATH)
df_processed = create_target(df_raw)

# Display sample
print(f"Dataset shape: {df_processed.shape}")
df_processed.head(100)

Dataset shape: (5, 10)


|  | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral | result |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7801 | 1969-09-21 | Morocco | Nigeria | 2 | 1 | FIFA World Cup qualification | Casablanca | Morocco | False | Home Win |
| 10449 | 1976-03-06 | Morocco | Nigeria | 3 | 1 | African Cup of Nations | Dire Dawa | Ethiopia | True | Home Win |
| 10458 | 1976-03-11 | Morocco | Nigeria | 2 | 1 | African Cup of Nations | Addis Ababa | Ethiopia | True | Home Win |
| 13819 | 1983-08-28 | Morocco | Nigeria | 0 | 0 | African Cup of Nations qualification | Rabat | Morocco | False | Draw |
| 21571 | 1996-12-12 | Morocco | Nigeria | 2 | 0 | King Hassan II Tournament | Casablanca | Morocco | False | Home Win |
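
The filter in `load_and_preprocess_data` keeps only fixtures that list Morocco as the home team. If the goal is every meeting between the two sides, one possible variant matches both orientations. The sketch below is only an illustration: `filter_head_to_head` is a hypothetical helper, and the toy rows are invented rather than taken from `results.csv`.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (not from results.csv)
toy = pd.DataFrame({
    "home_team": ["Morocco", "Nigeria", "Morocco", "Egypt"],
    "away_team": ["Nigeria", "Morocco", "Ghana", "Nigeria"],
    "home_score": [2, 1, 0, 3],
    "away_score": [1, 1, 2, 0],
})

def filter_head_to_head(df, team_a, team_b):
    """Keep fixtures between team_a and team_b, whichever side is listed as home."""
    a_at_home = (df["home_team"] == team_a) & (df["away_team"] == team_b)
    b_at_home = (df["home_team"] == team_b) & (df["away_team"] == team_a)
    return df[a_at_home | b_at_home].copy()

h2h = filter_head_to_head(toy, "Morocco", "Nigeria")
print(h2h)
```

With both orientations included, a `Home Win` label no longer always means a Morocco win, so the target would need to be read together with `home_team`.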


## 2. Modeling Pipeline

We use `sklearn.pipeline.Pipeline` to encapsulate preprocessing and modeling. This ensures:
- No data leakage (preprocessing parameters are learned only from the training set).
- Simpler inference on new data.
- Cleaner code.

In [33]:
# Feature columns, grouped by how they are preprocessed
CATEGORICAL_FEATURES = ['home_team', 'away_team', 'tournament', 'city', 'country']
NUMERIC_FEATURES = ['neutral']  # boolean flag, passed through unchanged

# Preprocessing for categorical data
categorical_transformer = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, CATEGORICAL_FEATURES),
        ('num', 'passthrough', NUMERIC_FEATURES)
    ])

# Define the complete pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=100, random_state=RANDOM_SEED))
])

In [34]:
# Prepare features (X) and target (y)
X = df_processed[CATEGORICAL_FEATURES + NUMERIC_FEATURES]
y = df_processed['result']

# Encode target labels (string -> int)
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Split data (with only 5 rows, test_size=0.2 leaves a single test sample)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, random_state=RANDOM_SEED
)

# Train the model
pipeline.fit(X_train, y_train)

print("Model trained successfully.")

Model trained successfully.


## 3. Evaluation

In [35]:
y_pred = pipeline.predict(X_test)
# zero_division=0 reports 0.00 for classes absent from the test set without emitting warnings
print(classification_report(
    y_test,
    y_pred,
    labels=range(len(label_encoder.classes_)),
    target_names=label_encoder.classes_,
    zero_division=0,
))

              precision    recall  f1-score   support

        Draw       0.00      0.00      0.00         0
    Home Win       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       0.50      0.50      0.50         1
weighted avg       1.00      1.00      1.00         1


## 4. Inference on New Data
Predict the outcome of a hypothetical match.

In [36]:
def predict_match(home_team, away_team, tournament, city, country, neutral=False):
    # Create a DataFrame for the single row input
    input_data = pd.DataFrame({
        'home_team': [home_team],
        'away_team': [away_team],
        'tournament': [tournament],
        'city': [city],
        'country': [country],
        'neutral': [neutral]
    })
    
    # Make prediction
    prediction_idx = pipeline.predict(input_data)[0]
    prediction_label = label_encoder.inverse_transform([prediction_idx])[0]
    
    return prediction_label

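# Example usage
# The tournament label must match the dataset's spelling ('African Cup of Nations');
# an unseen label is silently encoded as all zeros by handle_unknown='ignore'.
prediction = predict_match(
    home_team='Morocco',
    away_team='Nigeria',
    tournament='African Cup of Nations',
    city='Rabat',
    country='Morocco'
)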

print(f"Predicted result for Morocco vs Nigeria (in Rabat): {prediction}")

Predicted result for Morocco vs Nigeria (in Rabat): Home Win
