## Introduction

### This notebook explains how a basic machine learning model works. Made by [Pol Jaimejuan Caubet](https://github.com/PolJaimejuanCaubet)

#### We demonstrate the process of building and evaluating a basic machine learning model for predicting the results of football matches. We use a decision tree regressor on historical La Liga data: we transform categorical data into numerical features, train the model, and evaluate its performance using Mean Absolute Error (MAE). The notebook covers the data preprocessing, model training, and evaluation steps, giving a clear example of how machine learning techniques can be applied to sports analytics. The datasets in this project do not contain many statistical features, so they do not allow us to analyze the matches in depth or make accurate predictions. Instead, they serve as a basic example for learning how to apply predictive models. The main point is to practice and understand the process of building and evaluating machine learning models, rather than to make highly accurate predictions :)

**First of all, we import the libraries needed for our project**
- `os`: to work with file paths.
- `pandas`: for data manipulation (columns, rows, etc.).
- `train_test_split` from `sklearn.model_selection`: to split our data into training and validation sets.
- `DecisionTreeRegressor` from `sklearn.tree`: to create and train our model.
- `mean_absolute_error` from `sklearn.metrics`: to measure the model's prediction error (MAE).
- `LabelEncoder` from `sklearn.preprocessing`: to convert categorical variables to numerical values.

In [2]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import LabelEncoder
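
To give a rough idea of how these imports fit together, here is a minimal sketch on a tiny made-up table. The column names (`HomeTeam`, `AwayTeam`, `HomeGoals`) and the team labels are assumptions for illustration only and are not taken from `LaLiga.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: these columns and values are invented for the example.
toy = pd.DataFrame({
    "HomeTeam": ["A", "B", "C", "A", "B", "C", "A", "B"],
    "AwayTeam": ["B", "C", "A", "C", "A", "B", "B", "C"],
    "HomeGoals": [2, 1, 0, 3, 1, 1, 2, 0],
})

# Fit one encoder on all team names so both columns share the same codes.
encoder = LabelEncoder()
encoder.fit(pd.concat([toy["HomeTeam"], toy["AwayTeam"]]))
toy["HomeTeam"] = encoder.transform(toy["HomeTeam"])
toy["AwayTeam"] = encoder.transform(toy["AwayTeam"])

# Features (X) and target (y), then a 75/25 train/validation split.
X = toy[["HomeTeam", "AwayTeam"]]
y = toy["HomeGoals"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the tree and measure the average absolute error on the validation rows.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)
toy_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"MAE on the toy validation set: {toy_mae:.2f}")
```

With so few rows the MAE value itself means nothing. The point is only the order of the steps: encode, split, fit, predict, score.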

**Loading dataset path**

We load the La Liga match dataset with `pandas`. We use `os.path` to build the path to the .csv file, which is located in the `LeaguesDataset` folder, one level above the current working directory.

In [7]:
current_dir = os.getcwd()
league_dataset_path = os.path.join(current_dir, '..', 'LeaguesDataset', 'LaLiga.csv')
league_dataset = pd.read_csv(league_dataset_path)
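
Because `os.getcwd()` depends on where the notebook is launched, it can help to see what the joined path resolves to. The sketch below uses a hypothetical helper, `build_dataset_path`, and a made-up `project/notebooks` layout. Neither is part of the original project.

```python
import os


def build_dataset_path(base_dir, *parts):
    """Join path parts onto base_dir and collapse any '..' segments."""
    return os.path.normpath(os.path.join(base_dir, *parts))


# Hypothetical layout: the notebook lives in project/notebooks.
example_base = os.path.join("project", "notebooks")
example_path = build_dataset_path(
    example_base, "..", "LeaguesDataset", "LaLiga.csv"
)
print(example_path)

# os.path.exists() can confirm that a file is really there before
# pd.read_csv() is called on it.
print(os.path.exists(example_path))
```

Printing the resolved path is a quick way to diagnose a `FileNotFoundError` when the notebook is started from a different directory.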


**Remove null values**

We delete all dataset rows that contain null values, using `dropna()`, to ensure that our model is not affected by incomplete data.

In [8]:
league_dataset.dropna(inplace=True)
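
As a small illustration of what `dropna()` does, the sketch below applies it to a made-up table. The columns are hypothetical and only serve to show which rows are removed.

```python
import pandas as pd

# Hypothetical toy data with missing values in two different rows.
toy = pd.DataFrame({
    "HomeTeam": ["A", "B", None],
    "HomeGoals": [2, None, 1],
})

# Count the missing values per column before dropping anything.
missing_per_column = toy.isna().sum()
print(missing_per_column)

rows_before = len(toy)
toy.dropna(inplace=True)  # removes every row with at least one null value
rows_after = len(toy)

print(f"Rows before: {rows_before}, rows after: {rows_after}")
```

Checking `isna().sum()` first shows how much data is about to be lost, which matters when the dataset is small.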

**Convert the 'Date' column to numerical features**

The 'Date' column contains the date of each match, so we convert it into numerical features (day, month, and year). This is necessary because the model expects numeric inputs and cannot work directly with date values.

First, we convert the 'Date' column to a date format. Values that do not match the `%d/%m/%Y` format become `NaT` because of `errors='coerce'`, and we drop those rows. Then we extract the day, month, and year of each match and save them as new columns. Finally, we remove the original 'Date' column with `drop()`.


In [None]:
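# Parse day/month/year strings; values that do not match become NaT
league_dataset['Date'] = pd.to_datetime(league_dataset['Date'], format='%d/%m/%Y', errors='coerce')
# Drop the rows whose date could not be parsed
league_dataset.dropna(subset=['Date'], inplace=True)
# Split the date into three numerical columns
league_dataset['Day'] = league_dataset['Date'].dt.day
league_dataset['Month'] = league_dataset['Date'].dt.month
league_dataset['Year'] = league_dataset['Date'].dt.year
# Remove the original column now that it is no longer needed
league_dataset.drop('Date', axis=1, inplace=True)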
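
To see the effect of these steps, the sketch below runs the same conversion on three made-up values, one of which is not a valid date. The sample values are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical toy data: the second value is deliberately invalid.
toy = pd.DataFrame({"Date": ["21/08/2016", "not a date", "05/01/2017"]})

# Invalid entries are turned into NaT instead of raising an error.
toy["Date"] = pd.to_datetime(toy["Date"], format="%d/%m/%Y", errors="coerce")
toy.dropna(subset=["Date"], inplace=True)

# Extract the numerical parts and remove the original column.
toy["Day"] = toy["Date"].dt.day
toy["Month"] = toy["Date"].dt.month
toy["Year"] = toy["Date"].dt.year
toy.drop("Date", axis=1, inplace=True)

print(toy)
```

The invalid row disappears, and the two remaining rows keep only the `Day`, `Month`, and `Year` columns.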