<center>
    <h1><b>UM - Game-Playing Strength of MCTS Variants</b></h1>
</center>

Predict which variants of Monte-Carlo Tree Search will perform well or poorly against each other in hundreds of board games.

## Overview 
In this competition, you’ll create a model to predict how well one Monte-Carlo tree search (MCTS) variant will do against another in a given game, based on a list of features describing the game. This challenge aims to help us figure out which MCTS variants work best in different types of games, so we can make more informed choices when applying these algorithms to new problems.

## Description 
MCTS is a widely used search algorithm for developing agents that can play board games intelligently. Over the past two decades, researchers have proposed dozens, if not hundreds, of MCTS variants. Despite this, it's been challenging to determine which variants are best suited for specific types of games.

In most studies, researchers demonstrate that a new MCTS variant outperforms one or a few other variants in a limited set of games. However, it’s uncommon for a new variant to consistently outperform others across a broad range of games, making it unclear which types of games certain MCTS variants are best at. Answering this question would greatly improve our understanding of MCTS algorithms, and help us make better decisions about which variants to apply to new games or other decision-making problems.

This competition challenges you to develop a model that can predict the performance of one MCTS variant against another in a given game, based on the features of the game.

Your work could help pave the way for identifying the strengths and weaknesses of different MCTS variants, advancing our understanding of where they work best in various scenarios.

## Evaluation 
Submissions are evaluated based on the root-mean-square error (RMSE) between the predicted and ground-truth performance levels of the first agent against the second agent.

## Submitting
You must submit to this competition using the provided Python evaluation API, which serves test set instances in random order in batches of 100. To use the API, follow the template in this notebook.
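
The batch-serving pattern described above can be sketched in plain Python. This is only an illustration of the calling contract, not the Kaggle evaluation API: `serve_in_batches` and the constant-output `predict` are hypothetical stand-ins, and the only detail taken from the text is the batch size of 100. The point is that the prediction function is called once per batch and must return exactly one value per served row.

```python
# Hypothetical stand-in for the evaluation gateway: it is NOT the Kaggle API,
# only a sketch of the calling pattern described in the text.
BATCH_SIZE = 100  # batch size stated in the competition description


def serve_in_batches(rows, predict_fn, batch_size=BATCH_SIZE):
    """Call predict_fn once per batch and collect the predictions in order."""
    predictions = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        batch_preds = predict_fn(batch)
        # Each served row needs exactly one prediction.
        if len(batch_preds) != len(batch):
            raise ValueError("prediction count must match batch size")
        predictions.extend(batch_preds)
    return predictions


calls = 0  # counts how often the gateway invoked predict


def predict(batch):
    """Placeholder model: returns a constant prediction for every row."""
    global calls
    calls += 1
    return [0.0 for _ in batch]


rows = [{"Id": i} for i in range(250)]  # made-up test instances
preds = serve_in_batches(rows, predict)
print(len(preds), calls)
```

A prediction function that re-reads a full test file instead of using the served batch would fail the length check in this sketch as soon as a batch is smaller than the file.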

# Version used for testing

In [None]:
import os
import polars as pl
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import mean_squared_error
import lightgbm as lgb
import xgboost as xgb
from lightgbm import LGBMRegressor

import joblib
from pathlib import Path

# Path to competition files
comp_path = Path('/kaggle/input/um-game-playing-strength-of-mcts-variants')
target = 'utility_agent1'

# Function to drop columns with a single unique value
def drop_single_value_cols(df):
    single_value_cols = [col for col in df.columns if df[col].n_unique() == 1]
    return df.drop(single_value_cols)

# Load training data
train = pl.read_csv(comp_path / 'train.csv')
y_train = train[target]

# Drop columns with single values
train = drop_single_value_cols(train)

# Drop unnecessary columns
cols_to_drop = ['num_draws_agent1', 'num_losses_agent1', 'num_wins_agent1', target]
train = train.drop(cols_to_drop)

# Encode categorical columns
obj_cols = train.select(pl.col(pl.String)).columns
enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-999, encoded_missing_value=-9999)
enc.fit(train[obj_cols])

# Transform the categorical columns in the training data
train_transformed = enc.transform(train[obj_cols])
for e, c in enumerate(obj_cols):
    train = train.with_columns(pl.Series(c, train_transformed[:, e]))

# Define models to experiment with (LightGBM and XGBoost)
models = {
    'lgb': lgb.LGBMRegressor(),
    'xgb': xgb.XGBRegressor()
}

# Parameter grids for LightGBM (including num_leaves) and XGBoost
param_grid = {
    'lgb': {
        'num_leaves': [31, 50, 100],
        'learning_rate': [0.01, 0.05, 0.1],
        'n_estimators': [50, 100, 200],
        'max_depth': [5, 10, 15]
    },
    'xgb': {
        'n_estimators': [100, 200],
        'learning_rate': [0.01, 0.1],
        'max_depth': [6, 10],
        'min_child_weight': [1, 5]
    },
}

# Perform GridSearchCV to find the best model and parameters
best_models = {}
for model_name, model in models.items():
    grid = GridSearchCV(model, param_grid[model_name], cv=3, scoring='neg_mean_squared_error')
    grid.fit(train, y_train)
    best_models[model_name] = grid.best_estimator_
    print(f"Best parameters for {model_name}: {grid.best_params_}")

    # Evaluate using cross-validation
    cv_scores = cross_val_score(best_models[model_name], train, y_train, cv=3, scoring='neg_mean_squared_error')
    mean_cv_score = -cv_scores.mean()  # Convert negative MSE to positive
    print(f"Mean Cross-Validation MSE for {model_name}: {mean_cv_score}")

# Save the best models for later use
joblib.dump(best_models['lgb'], 'best_lgb_model.joblib')
joblib.dump(best_models['xgb'], 'best_xgb_model.joblib')
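
The grid search in this cell is scored with `neg_mean_squared_error`, while the competition metric is RMSE. A minimal sketch of the conversion follows; the helper name `rmse_from_neg_mse` and the fold scores are made up for illustration. Note that the square root of the mean MSE is not the same quantity as the mean of per-fold RMSE values, so the two should not be compared directly.

```python
import math


def rmse_from_neg_mse(neg_mse_scores):
    """Convert scikit-learn style negative-MSE fold scores to a single RMSE."""
    mse = -sum(neg_mse_scores) / len(neg_mse_scores)
    return math.sqrt(mse)


# Made-up fold scores for illustration only.
example_scores = [-0.25, -0.16, -0.09]
rmse = rmse_from_neg_mse(example_scores)
print(round(rmse, 4))
```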


# Versions Submitted

In [None]:
import os
import gc
from pathlib import Path
import lightgbm as lgb
from lightgbm import early_stopping
import polars as pl
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

import kaggle_evaluation.mcts_inference_server as mcts_inference_server

#from sklearn.decomposition import PCA

# System Configuration
class Config:
    # Paths to data
    train_data_path = Path('/kaggle/input/um-game-playing-strength-of-mcts-variants/train.csv')
    test_data_path = Path('/kaggle/input/um-game-playing-strength-of-mcts-variants/test.csv')
    submission_path = Path('/kaggle/input/um-game-playing-strength-of-mcts-variants/sample_submission.csv')

    # Feature engineering configuration
    batch_size = 16384
    low_memory = True
    
    # Model parameters
    lgb_params = {
        'n_estimators': 200,
        'learning_rate': 0.1,
        'max_depth': 20,
        'num_leaves': 70,
        'n_jobs': -1,
        'max_bin': 1024,
        'force_row_wise': True
    }

# Feature Engineering class
class FeatureEngineering:
    def __init__(self, batch_size, low_memory):
        self.batch_size = batch_size
        self.low_memory = low_memory
        
    def drop_unwanted_columns(self, df, columns):
        return df.drop(columns, axis=1) if set(columns).issubset(df.columns) else df
    
    def optimize_dtypes(self, df):
        # Setting categorical and numeric data types for optimization
        categorical_cols = df.select_dtypes(include='object').columns.tolist()
        numeric_cols = df.select_dtypes(include='number').columns.tolist()

        df[categorical_cols] = df[categorical_cols].astype('category')
        df[numeric_cols] = df[numeric_cols].astype('float32')
        
        return df
    
    def load_and_process(self, file_path, drop_cols=None):
        # Loading data with polars for efficiency
        df = pl.read_csv(file_path, batch_size=self.batch_size, low_memory=self.low_memory).to_pandas()
        
        # This is to drop unnecessary columns
        if drop_cols:
            df = self.drop_unwanted_columns(df, drop_cols)
        
        # Optimizing data types
        df = self.optimize_dtypes(df)
        
        return df

# Initializing feature engineering
fe = FeatureEngineering(Config.batch_size, Config.low_memory)

# Processing training data and unneeded columns will be dropped
train_df = fe.load_and_process(Config.train_data_path, drop_cols=['num_wins_agent1', 'num_draws_agent1', 'num_losses_agent1'])


# Model Development class
class ModelDevelopment:
    def __init__(self, model_params):
        self.model_params = model_params
        self.models = []
    
    def kfold_lightgbm(self, X_train, y_train, n_splits=5):
        # KFold cross-validation
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
        oof_preds = pd.Series(0.0, index=X_train.index)
        
        for fold, (train_idx, val_idx) in enumerate(kf.split(X_train, y_train)):
            print(f'Fold {fold + 1}')
            
            X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[val_idx]
            y_tr, y_val = y_train.iloc[train_idx], y_train.iloc[val_idx]
            
            # Initializing LightGBM model
            model = lgb.LGBMRegressor(**self.model_params)
            
            # Model fitting with early stopping callback
            model.fit(
                X_tr, y_tr,
                eval_set=[(X_val, y_val)],
                eval_metric='rmse',
                callbacks=[early_stopping(stopping_rounds=50, verbose=True)]
            )
            
            # Storing the model for future use
            self.models.append(model)
            
            # Making out-of-fold predictions and cast to float64
            oof_preds.iloc[val_idx] = model.predict(X_val).astype(float)
        
        # Calculating the out-of-fold RMSE score (square root of the MSE;
        # the `squared=False` argument was removed in recent scikit-learn releases)
        oof_rmse = mean_squared_error(y_train, oof_preds) ** 0.5
        print(f'OOF RMSE: {oof_rmse:.4f}')
    
    def predict(self, X_test):
        if self.models:
            preds = sum([model.predict(X_test) for model in self.models]) / len(self.models)
            return preds
        else:
            raise ValueError("Models are not trained yet")

# Initialize model development
md = ModelDevelopment(Config.lgb_params)

# Training function
def train_model():
    # Load training data and drop columns
    train_df = fe.load_and_process(Config.train_data_path, drop_cols=['num_wins_agent1', 'num_draws_agent1', 'num_losses_agent1'])
    
    # Defining features and target data
    X_train = train_df.drop('utility_agent1', axis=1)
    y_train = train_df['utility_agent1']
    
    # Training the model with KFold
    md.kfold_lightgbm(X_train, y_train)
    
    # Clean up
    del train_df
    gc.collect()


# Initializing a counter for prediction calls
counter = 0

# Defining the predict function for the API
def predict(test, submission):
    global counter
    
    # Counter for prediction call    
    if counter == 0:
        train_model()
    
    # Incrementing the counter for each prediction call
    counter += 1
    
    # Prepare the test batch served by the API (not the full test file) and drop columns
    test_df = fe.drop_unwanted_columns(test.to_pandas(), ['num_wins_agent1', 'num_draws_agent1', 'num_losses_agent1'])
    test_df = fe.optimize_dtypes(test_df)
    
    # Make predictions
    predictions = md.predict(test_df)
    
    # Assign predictions to the submission DataFrame
    return submission.with_columns(pl.Series('utility_agent1', predictions))


inference_server = mcts_inference_server.MCTSInferenceServer(predict)

if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(
        (Config.test_data_path, Config.submission_path)
    )
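
The out-of-fold (OOF) scheme used by `kfold_lightgbm` and the fold averaging in `ModelDevelopment.predict` can be illustrated without LightGBM. The sketch below swaps in a toy model that predicts the mean of its training targets; `kfold_indices`, `fit_mean_model`, and the data are hypothetical and exist only to show the bookkeeping.

```python
import random


def kfold_indices(n_rows, n_splits=5, seed=42):
    """Shuffle row indices and split them into n_splits validation folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]


def fit_mean_model(y):
    """Toy 'model': always predicts the mean of its training targets."""
    mean = sum(y) / len(y)
    return lambda rows: [mean] * len(rows)


y = [float(i % 4) for i in range(20)]  # made-up targets
folds = kfold_indices(len(y))
oof = [None] * len(y)
models = []
for val_idx in folds:
    val_set = set(val_idx)
    train_y = [y[i] for i in range(len(y)) if i not in val_set]
    model = fit_mean_model(train_y)
    models.append(model)
    for i, p in zip(val_idx, model(val_idx)):
        oof[i] = p  # each row is predicted by the model that never saw it

# OOF RMSE over all rows, analogous to the score printed after training.
oof_rmse = (sum((a - b) ** 2 for a, b in zip(y, oof)) / len(y)) ** 0.5

# Test-time prediction: average the fold models' outputs.
test_rows = [0, 1, 2]
test_pred = [
    sum(m(test_rows)[j] for m in models) / len(models)
    for j in range(len(test_rows))
]
print(round(oof_rmse, 4), test_pred)
```

Because every row receives its prediction from a model trained without it, the OOF RMSE is an estimate of performance on unseen data rather than a training-set fit.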

In [1]:
import os
import polars as pl
from sklearn.preprocessing import OrdinalEncoder
import lightgbm as lgb
import kaggle_evaluation.mcts_inference_server
import joblib
from pathlib import Path

# Function to drop columns with a single unique value
def drop_single_value_cols(df):
    single_value_cols = [col for col in df.columns if df[col].n_unique() == 1]
    return df.drop(single_value_cols)

# Model training function
def train_model():
    global obj_cols, enc, lgb_model, train_cols

    # Load training data
    train = pl.read_csv('/kaggle/input/um-game-playing-strength-of-mcts-variants/train.csv')
    
    # Target variable
    y_train = train['utility_agent1']
    
    # Drop columns with single values in the train dataset
    train = drop_single_value_cols(train)

    # Drop unnecessary columns
    cols_to_drop = ['num_draws_agent1', 'num_losses_agent1', 'num_wins_agent1', 'utility_agent1']
    train = train.drop(cols_to_drop)

    # Encode categorical columns
    obj_cols = train.select(pl.col(pl.String)).columns
    enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-999, encoded_missing_value=-9999)
    enc.fit(train[obj_cols])

    # This will transform the categorical columns in the training data
    train_transformed = enc.transform(train[obj_cols])
    for e, c in enumerate(obj_cols):
        train = train.with_columns(pl.Series(c, train_transformed[:, e]))

    # Storing the train columns for comparison with the test dataset
    train_cols = set(train.columns) 

    # Prepare training data
    X_train = train.to_pandas()
    y_train = y_train.to_pandas()

    # Define LightGBM dataset
    lgb_train = lgb.Dataset(X_train, label=y_train)

    # LightGBM model parameters
    lgb_model = lgb.LGBMRegressor(
        n_estimators=200, 
        learning_rate=0.1, 
        max_depth=20, 
        num_leaves=70, 
        n_jobs=-1
    )

    # Model training
    lgb_model.fit(
        X_train, 
        y_train, 
        eval_metric='mse'  # Set the metric to MSE
    )


counter = 0  # Initializing counter

def preprocess_and_predict(df: pl.DataFrame, is_training: bool):
    global obj_cols, enc, train_cols

    # Drop columns with single values
    #df = drop_single_value_cols(df)

    if is_training:
        # Encoding categorical columns during training
        enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-999, encoded_missing_value=-9999)
        enc.fit(df[obj_cols])
        
        # This will transform the categorical columns
        df_transformed = enc.transform(df[obj_cols])
        for e, c in enumerate(obj_cols):
            df = df.with_columns(pl.Series(c, df_transformed[:, e]))
        
        # Storing the columns for future predictions
        train_cols = set(df.columns)
    else:
        # Applying the same encoding to the test data
        df_transformed = enc.transform(df[obj_cols])
        for e, c in enumerate(obj_cols):
            df = df.with_columns(pl.Series(c, df_transformed[:, e]))
    
    return df


def predict(test, submission):
    global counter, obj_cols, train_cols, enc, lgb_model

    try:
        if counter == 0:
            train_model()
        counter += 1

        # Use the test batch passed in by the API instead of re-reading test.csv
        #test = drop_single_value_cols(test)

        # Drop columns that are in the test dataset but not in the training data
        test_cols = set(test.columns)
        extra_cols = test_cols - train_cols
        test = test.drop(list(extra_cols))

        # This will apply the same encoding to the test data
        test_transformed = enc.transform(test[obj_cols])
        for e, c in enumerate(obj_cols):
            test = test.with_columns(pl.Series(c, test_transformed[:, e]))

        # Prepare the test data for prediction
        X_test = test.to_pandas()

        # Make predictions
        predictions = lgb_model.predict(X_test)

        # Update the submission DataFrame with predictions
        submission = submission.with_columns(pl.Series('utility_agent1', predictions))

        # Save the predictions as 'submission.parquet'
        submission.write_parquet('submission.parquet')

    except Exception as e:
        print(f"An error occurred: {e}")
        return None

    return submission
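
The encoder settings used in these cells (`unknown_value=-999`, `encoded_missing_value=-9999`) map categories that were not seen during training, and missing values, to sentinel codes. The pure-Python sketch below imitates that behaviour for a single column; it is not scikit-learn's implementation (which, among other differences, returns floats), and `fit_ordinal`, `transform_ordinal`, and the category names are made up.

```python
UNKNOWN = -999    # mirrors unknown_value in the cells above
MISSING = -9999   # mirrors encoded_missing_value in the cells above


def fit_ordinal(values):
    """Map each distinct non-missing category to an integer, in sorted order."""
    cats = sorted({v for v in values if v is not None})
    return {cat: i for i, cat in enumerate(cats)}


def transform_ordinal(mapping, values):
    """Encode values; unseen categories and missing values get sentinels."""
    out = []
    for v in values:
        if v is None:
            out.append(MISSING)
        else:
            out.append(mapping.get(v, UNKNOWN))
    return out


train_values = ["alpha", "beta", "alpha", None]  # made-up training column
mapping = fit_ordinal(train_values)
encoded = transform_ordinal(mapping, ["beta", "gamma", None])
print(encoded)  # → [1, -999, -9999]
```

Fitting once on the training data and reusing the same mapping at prediction time is what keeps the integer codes consistent between training and test batches.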






In [None]:
import os

import joblib
import polars as pl

from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OrdinalEncoder

import kaggle_evaluation.mcts_inference_server
from pathlib import Path

# Path to competition files
comp_path = Path('/kaggle/input/um-game-playing-strength-of-mcts-variants')

target = 'utility_agent1'
counter = 0

# Load training data
train = pl.read_csv(comp_path / 'train.csv')
y_train = train[target]

cols_to_drop = ['num_draws_agent1', 'num_losses_agent1', 'num_wins_agent1', target]
train = train.drop(cols_to_drop)

# Encode categorical columns
obj_cols = train.select(pl.col(pl.String)).columns
enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-999, encoded_missing_value=-9999)
enc.fit(train[obj_cols])

train_transformed = enc.transform(train[obj_cols])
for e, c in enumerate(obj_cols):
    train = train.with_columns(pl.Series(c, train_transformed[:, e]))

# Define models to test with Bagging
models = {
    'rf': RandomForestRegressor(),
    'gbr': GradientBoostingRegressor(),
}

# Define parameter grids for grid search; the `estimator__` prefix routes each
# parameter to the model wrapped by BaggingRegressor
param_grid = {
    'rf': {'estimator__n_estimators': [100, 200], 'estimator__max_depth': [5, 10]},
    'gbr': {'estimator__n_estimators': [100, 200], 'estimator__learning_rate': [0.01, 0.1]},
}

# Perform GridSearchCV for each model
best_models = {}
best_scores = {}
for model_name in models:
    # `estimator` replaced `base_estimator` in scikit-learn 1.2
    grid = GridSearchCV(BaggingRegressor(estimator=models[model_name], n_jobs=-1),
                        param_grid[model_name], cv=3, scoring='neg_mean_squared_error')
    grid.fit(train, y_train)
    best_models[model_name] = grid.best_estimator_
    best_scores[model_name] = grid.best_score_

    # Output best parameters
    print(f"Best parameters for {model_name}: {grid.best_params_}")

# Select the best model based on cross-validation score (negative MSE, so higher is better)
best_model_name = max(best_scores, key=best_scores.get)
best_model = best_models[best_model_name]
print(f"Best model: {best_model_name}")

# Save the best model
joblib.dump(best_model, 'best_model.joblib')
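
When several grid searches are scored with `neg_mean_squared_error`, the comparison needs one score per model, and the best model is the one whose score is highest (closest to zero). A minimal sketch with made-up scores, where `best_scores` is a hypothetical dictionary holding each search's `best_score_`:

```python
# Made-up best_score_ values (negative MSE) for illustration only.
best_scores = {"rf": -0.31, "gbr": -0.27}

# Higher negative MSE means lower error, so max() selects the better model.
best_model_name = max(best_scores, key=best_scores.get)
print(best_model_name)  # → gbr
```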


In [None]:
import joblib

counter = 0
best_model = None

# Prediction function
def predict(test: pl.DataFrame, submission: pl.DataFrame):
    global counter, best_model
    if counter == 0:
        # Load the pre-trained model during the first call
        best_model = joblib.load('best_model.joblib')

    counter += 1

    # Transform test data
    test_transformed = enc.transform(test[obj_cols])
    for e, c in enumerate(obj_cols):
        test = test.with_columns(pl.Series(c, test_transformed[:, e]))

    # Make predictions
    predictions = best_model.predict(test)

    # Update the submission DataFrame with predictions
    submission = submission.with_columns(pl.Series(target, predictions))

    # Save the predictions as 'submission.parquet'
    submission.write_parquet('submission.parquet')

    # Return the submission DataFrame
    return submission

# Initialize the inference server
inference_server = kaggle_evaluation.mcts_inference_server.MCTSInferenceServer(predict)

if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(
        (
            '/kaggle/input/um-game-playing-strength-of-mcts-variants/test.csv',
            '/kaggle/input/um-game-playing-strength-of-mcts-variants/sample_submission.csv'
        )
    )

In [None]:
import os
import polars as pl
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestRegressor

import kaggle_evaluation.mcts_inference_server
from pathlib import Path

# Path to competition files
comp_path = Path('/kaggle/input/um-game-playing-strength-of-mcts-variants')

target = 'utility_agent1'
counter = 0

# Model training function
def train_model():
    global obj_cols, enc, rf

    # Load training data
    train = pl.read_csv(comp_path / 'train.csv')
    y_train = train[target]

    # Drop unnecessary columns
    cols_to_drop = ['num_draws_agent1', 'num_losses_agent1', 'num_wins_agent1', target]
    train = train.drop(cols_to_drop)
    
    # Encode categorical columns
    obj_cols = train.select(pl.col(pl.String)).columns
    enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-999, encoded_missing_value=-9999)
    enc.fit(train[obj_cols])
    
    # Transform the categorical data
    train_transformed = enc.transform(train[obj_cols])
    for e, c in enumerate(obj_cols):
        train = train.with_columns(pl.Series(c, train_transformed[:, e]))
    
    # Train Random Forest Regressor
    rf = RandomForestRegressor(n_estimators=100, max_depth=5, n_jobs=-1)
    rf.fit(train, y_train)

# Prediction function
def predict(test: pl.DataFrame, submission: pl.DataFrame):
    global counter
    if counter == 0:
        # Perform model training during the first call
        train_model()
    counter += 1

    # Transform test data
    test_transformed = enc.transform(test[obj_cols])
    for e, c in enumerate(obj_cols):
        test = test.with_columns(pl.Series(c, test_transformed[:, e]))

    # Make predictions
    predictions = rf.predict(test)

    # Update the submission DataFrame with predictions
    submission = submission.with_columns(pl.Series(target, predictions))

    # Save the predictions as 'submission.parquet'
    submission.write_parquet('submission.parquet')

    # Return the submission DataFrame
    return submission

# Initialize the inference server
inference_server = kaggle_evaluation.mcts_inference_server.MCTSInferenceServer(predict)

if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(
        (
            '/kaggle/input/um-game-playing-strength-of-mcts-variants/test.csv',
            '/kaggle/input/um-game-playing-strength-of-mcts-variants/sample_submission.csv'
        )
    )


In [None]:
# Show only the requested summary statistics (mean, minimum and maximum), then transpose the table.

df.describe().loc[['mean','min','max']].T