This project uses machine learning to predict the outcomes of football matches. Using Python, pandas, and a scikit-learn Random Forest classifier, it processes historical match data and predicts whether a team will win a given match (a win versus a draw or loss). The dataset covers English Premier League matches from multiple seasons.
- Python: Programming language used for implementing the project.
- Pandas: Data manipulation and analysis library.
- Scikit-learn: Machine learning library for Python.
- .gitignore: Specifies files to be ignored by Git.
- LICENSE: License for the project.
- Prediction.ipynb: Jupyter notebook containing the prediction model and analysis.
- README.md: Project documentation.
- code.py: Python script for data preprocessing and model training.
- logo.jpeg: Project logo.
- matches.csv: Dataset containing match details.
- Clone the repository:
  git clone https://github.com/your-username/your-repo-name.git
  cd your-repo-name
- Create a virtual environment and activate it:
  python3 -m venv env
  source env/bin/activate  # On Windows, use `env\Scripts\activate`
- Install the dependencies:
  pip install -r requirements.txt
- Ensure you are in the project directory.
- Run the Jupyter notebook:
  jupyter notebook Prediction.ipynb
- Alternatively, run the Python script:
  python code.py
- Data Preprocessing: The code.py script reads the matches.csv file, preprocesses the data, and prepares it for model training.
- Model Training: A Random Forest classifier is trained on historical match data to predict future match outcomes.
- Prediction: The trained model predicts match outcomes for the test dataset.
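To make the preprocessing step concrete, here is a minimal sketch that applies the same feature engineering as the example script in this README to three made-up rows. The rows are hypothetical; only the column names (`date`, `time`, `venue`, `opponent`, `result`) are taken from the script. It shows that categorical columns become integer codes assigned in sorted order of their labels, and that the target is binary (win versus not win).

```python
import pandas as pd

# Hypothetical toy rows shaped like matches.csv (values are made up;
# only the columns used by the preprocessing step are included)
matches = pd.DataFrame({
    "date": ["2021-08-14", "2021-08-21", "2022-01-02"],
    "time": ["15:00", "17:30", "20:00"],
    "venue": ["Home", "Away", "Home"],
    "opponent": ["Arsenal", "Chelsea", "Arsenal"],
    "result": ["W", "D", "L"],
})

matches["date"] = pd.to_datetime(matches["date"])
# Binary target: 1 for a win, 0 for a draw or a loss
matches["target"] = (matches["result"] == "W").astype("int")
# Category codes follow the sorted unique labels: Away=0, Home=1
matches["venue_code"] = matches["venue"].astype("category").cat.codes
# Arsenal=0, Chelsea=1
matches["opp_code"] = matches["opponent"].astype("category").cat.codes
# Keep only the hour part of "HH:MM"
matches["hour"] = matches["time"].str.replace(":.+", "", regex=True).astype("int")
# Day of week: Monday=0, ..., Sunday=6
matches["day_code"] = matches["date"].dt.dayofweek

print(matches[["target", "venue_code", "opp_code", "hour", "day_code"]])
```

Because the codes depend on which labels appear in the loaded data, the same opponent only maps to the same code within a single run over the same file.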
The output of the model includes:
- Predicted match outcomes for each match in the test dataset (1 for a win, 0 for a draw or loss).
- Accuracy and precision scores to evaluate the model's performance.
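Accuracy and precision answer different questions, which matters when wins are the minority class. The sketch below uses six hypothetical labels (not project results): accuracy counts every correct prediction, while precision only looks at matches the model called a win and asks how many of those really were wins.

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical labels for six matches: 1 = win, 0 = draw or loss
y_true = [1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]

# Accuracy: share of all matches predicted correctly (4 of 6)
acc = accuracy_score(y_true, y_pred)
# Precision: of the 2 matches predicted as wins, only 1 was a win
prec = precision_score(y_true, y_pred)
print(f"Accuracy: {acc:.2f}, Precision: {prec:.2f}")  # Accuracy: 0.67, Precision: 0.50
```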
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
# Load data (the first CSV column is used as the index)
matches = pd.read_csv('matches.csv', index_col=0)
# Preprocess data
matches['date'] = pd.to_datetime(matches['date'])
# Binary target: 1 if the team won, 0 for a draw or loss
matches['target'] = (matches['result'] == 'W').astype('int')
# Encode categorical columns as integer codes
matches['venue_code'] = matches['venue'].astype('category').cat.codes
matches['opp_code'] = matches['opponent'].astype('category').cat.codes
# Keep only the hour from kick-off times such as "20:00"
matches['hour'] = matches['time'].str.replace(":.+", "", regex=True).astype('int')
# Day of week: Monday=0, ..., Sunday=6
matches['day_code'] = matches['date'].dt.dayofweek
# Time-based train-test split; '>=' keeps matches played on 2022-01-01 in the test set
train = matches[matches['date'] < '2022-01-01']
test = matches[matches['date'] >= '2022-01-01']
# Model training
predictors = ['venue_code', 'opp_code', 'hour', 'day_code']
rf = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
rf.fit(train[predictors], train['target'])
# Prediction: hard 0/1 labels for each test match
preds = rf.predict(test[predictors])
# Evaluation
accuracy = accuracy_score(test['target'], preds)
precision = precision_score(test['target'], preds)
print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
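The script above outputs hard 0/1 labels through `rf.predict`. If a per-match win probability is more useful, scikit-learn classifiers also expose `predict_proba`. The following is a self-contained sketch under stated assumptions: it replaces `matches.csv` with randomly generated rows (not real Premier League data) that already contain the four predictor columns, then repeats the same date-based split and model settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed matches table (random values)
rng = np.random.default_rng(0)
n = 200
matches = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=n, freq="3D"),
    "venue_code": rng.integers(0, 2, n),
    "opp_code": rng.integers(0, 19, n),
    "hour": rng.integers(12, 21, n),
})
matches["day_code"] = matches["date"].dt.dayofweek
matches["target"] = rng.integers(0, 2, n)

predictors = ["venue_code", "opp_code", "hour", "day_code"]
cutoff = pd.Timestamp("2022-01-01")
# Train on earlier matches, test on later ones, so no future data leaks into training
train = matches[matches["date"] < cutoff]
test = matches[matches["date"] >= cutoff]

rf = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
rf.fit(train[predictors], train["target"])

# rf.classes_ is sorted as [0, 1], so column 1 is the estimated win probability
win_prob = rf.predict_proba(test[predictors])[:, 1]
print(pd.DataFrame({"date": test["date"], "win_prob": win_prob.round(2)}).head())
```

With the random features used here the probabilities carry no real signal; the point is only the shape of the output, one probability per test match, which can then be thresholded or ranked.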
This project is licensed under the MIT License - see the LICENSE file for details.