# Chess game classification

Classification models will be built to predict the winner of chess games, based on data preprocessed in the previous project.

MLflow will be used to track experiments and compare models.

# MLflow setup

In [1]:
%pip install mlflow

Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, cross_val_score

import warnings
import mlflow
import mlflow.sklearn
import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

warnings.filterwarnings("ignore")

mlflow.set_experiment(experiment_name='Chess games classification')

<Experiment: artifact_location='file:///d:/Arquivos/Documentos/Faculdade/7p/Intro%20dados/intro-dados/notebooks/mlruns/307563494190722733', creation_time=1720337033259, experiment_id='307563494190722733', last_update_time=1720337033259, lifecycle_stage='active', name='Chess games classification', tags={}>

Definition of the MLflow tracking function:

In [3]:
def mlflow_track(model, model_name: str, params: dict, metrics: dict):
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
    mlflow.sklearn.log_model(sk_model=model, artifact_path="sklearn-model", registered_model_name=model_name)

Definition of the function that trains a model and tracks it with MLflow:

In [4]:
def mlflow_train_and_track(model_class, model_name, train_data: pd.DataFrame, target: str, **params):
    # Split the data into features and labels
    train_x = train_data.drop([target], axis=1)
    # Select the target as a 1-D Series, the shape scikit-learn expects for y
    train_y = train_data[target]

    # Train the model and track
    with mlflow.start_run(run_name=model_name):
        # Create the model and fit it on the full training set
        model = model_class(**params)
        model.fit(train_x, train_y)

        # Evaluate with cross-validation (5 folds by default); cross_val_score
        # clones the model, so the fitted instance above is left untouched
        scores = cross_val_score(model, train_x, train_y)

        print("Trained %s(%s)." %(model_name, str(params).strip("{}")))
        print("Results: %0.4f mean accuracy with a std of %0.4f" % (scores.mean(), scores.std()))

        # Log parameters, metrics, and model to MLflow
        mlflow_track(model, model_name, params, {"mean_accuracy": scores.mean(), "std_accuracy": scores.std()})
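
The evaluation step above relies on the defaults of `cross_val_score`. As a minimal sketch of what those defaults do, the snippet below runs it on synthetic data from `make_classification`. The data is a hypothetical stand-in, not the chess dataset. For a classifier with no `cv` argument, scikit-learn uses stratified 5-fold cross-validation and scores each fold with accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: not the real chess features)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=20, random_state=0)

# No cv argument: 5 folds, one accuracy score per fold
scores = cross_val_score(model, X, y)

print(len(scores))
print("%0.4f mean accuracy with a std of %0.4f" % (scores.mean(), scores.std()))
```

The mean and standard deviation printed here are the same two quantities the function logs as `mean_accuracy` and `std_accuracy`.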

        

# Training the models

## Importing the dataset

In [5]:
csv_url = "https://raw.githubusercontent.com/Vinicius-resende-cin/intro-dados/master/data/chess_games_cleaned.csv"
try:
    data = pd.read_csv(csv_url, encoding="ISO-8859-1")
except Exception:
    # logger.exception already records the traceback; re-raise so that
    # later cells do not run with `data` undefined
    logger.exception("Unable to download the dataset CSV; check your internet connection.")
    raise

Converting column types so the models can run

In [6]:
# Encode each text column as integer category codes
categorical_columns = [
    'victory_status', 'winner', 'increment_code', 'white_id',
    'black_id', 'moves', 'opening_eco', 'opening_name',
]
for column in categorical_columns:
    data[column] = data[column].astype('category').cat.codes
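
To make the conversion above concrete, here is a small sketch of what `astype('category').cat.codes` produces. The toy `winner` values are hypothetical and chosen only for illustration. pandas sorts the distinct string values and assigns each one its position in that sorted order.

```python
import pandas as pd

# Hypothetical toy column, for illustration only
toy = pd.DataFrame({"winner": ["white", "black", "draw", "white"]})

winner_cat = toy["winner"].astype("category")
categories = list(winner_cat.cat.categories)
codes = winner_cat.cat.codes.tolist()

print(categories)  # ['black', 'draw', 'white']
print(codes)       # [2, 0, 1, 2]
```

The codes carry an order that comes only from alphabetical sorting, so for columns with many distinct values, such as `moves` or the player IDs, the numbers are arbitrary labels rather than meaningful magnitudes.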

## Splitting training and test data

In [7]:
# Split the data into training and test sets. (0.75, 0.25) split.
train_data, test_data = train_test_split(data)
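
As a quick check of the default proportions, the sketch below splits a hypothetical 100-row frame. It also shows that passing `random_state`, which the cell above does not do, makes the split repeatable between runs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row frame, for illustration only
toy = pd.DataFrame({"feature": range(100), "winner": [i % 2 for i in range(100)]})

# Defaults: 75% of rows for training, 25% for testing, shuffled
train_part, test_part = train_test_split(toy)
print(len(train_part), len(test_part))  # 75 25

# A fixed random_state selects the same rows every time
a_train, a_test = train_test_split(toy, random_state=0)
b_train, b_test = train_test_split(toy, random_state=0)
print(a_test.index.equals(b_test.index))  # True
```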

## Running the training

To view the tracked runs in a web interface, run the command below in this notebook's directory (`/notebooks`):

```bash
mlflow ui --port 5000
```

The interface will be available at `http://localhost:5000` in a browser.

### Random Forest

In [8]:
from sklearn.ensemble import RandomForestClassifier
mlflow_train_and_track(RandomForestClassifier, 'RandomForest', train_data, 'winner', n_estimators=100, max_depth=10, random_state=0)

Trained RandomForest('n_estimators': 100, 'max_depth': 10, 'random_state': 0).
Results: 0.6707 mean accuracy with a std of 0.0051


Registered model 'RandomForest' already exists. Creating a new version of this model...
Created version '2' of model 'RandomForest'.


### KNN

In [9]:
from sklearn.neighbors import KNeighborsClassifier
mlflow_train_and_track(KNeighborsClassifier, 'KNN', train_data, 'winner', n_neighbors=3)

Trained KNN('n_neighbors': 3).
Results: 0.4898 mean accuracy with a std of 0.0103


Registered model 'KNN' already exists. Creating a new version of this model...
Created version '2' of model 'KNN'.


### SVC

In [10]:
from sklearn.svm import LinearSVC
mlflow_train_and_track(LinearSVC, 'SVC', train_data, 'winner', C=1.0, random_state=0)

Trained SVC('C': 1.0, 'random_state': 0).
Results: 0.6699 mean accuracy with a std of 0.0062


Registered model 'SVC' already exists. Creating a new version of this model...
Created version '2' of model 'SVC'.


### MLP

In [11]:
from sklearn.neural_network import MLPClassifier
mlflow_train_and_track(MLPClassifier, 'MLP', train_data, 'winner', alpha=1, max_iter=200, random_state=0)

Trained MLP('alpha': 1, 'max_iter': 200, 'random_state': 0).
Results: 0.5352 mean accuracy with a std of 0.0154


Registered model 'MLP' already exists. Creating a new version of this model...
Created version '2' of model 'MLP'.
