# Feature engineering. Define the train and test sets. Build two dummy models

In [1]:
# increase the width of the notebook
from IPython.display import display, HTML, Markdown

display(HTML("<style>.container { width:90% !important; }</style>"))

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [3]:
df = pd.read_csv("data_ML.csv", sep=",")

In [4]:
df.head(5)

Unnamed: 0,WhiteElo,BlackElo,EloDif,Opening_name,Time_format,Increment_binary,Score
0,1851,1901,-50,Alekhine's defense,classical,Yes,1.0
1,2060,2111,-51,French Defense,blitz,Yes,0.0
2,2307,2290,17,Philidor Defense,blitz,No,0.5
3,2380,2419,-39,Sicilian defense,rapid,No,0.0
4,2686,2848,-162,Ruy Lopez,rapid,No,0.0
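A column named `Unnamed: 0` usually appears when a CSV was written together with its row index and then read back without `index_col`. Whether that is what happened with `data_ML.csv` is an assumption. The sketch below uses a small in-memory CSV (made up for illustration) to show the effect and one way to avoid it.

```python
import io

import pandas as pd

# Toy CSV text (hypothetical). The first header cell is empty because it
# holds the row index, which is how DataFrame.to_csv() writes by default.
csv_text = ",WhiteElo,BlackElo\n0,1851,1901\n1,2060,2111\n"

# Read without index_col: the index comes back as a regular column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())    # ['Unnamed: 0', 'WhiteElo', 'BlackElo']

# Read with index_col=0: the first column is used as the index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())  # ['WhiteElo', 'BlackElo']
```

If the column really is a stored index, it says nothing about the game itself, yet it would otherwise stay in the frame and end up among the features.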


### Our initial model-building attempts ran into several problems, and PCA did not improve the results, so a dedicated feature engineering phase is needed to try to extract more predictive information from the data.

In [5]:
df = df.drop("BlackElo", axis=1)  # redundant: EloDif = WhiteElo - BlackElo
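The redundancy can be checked before dropping the column. This is a minimal sketch on the five rows displayed by `df.head(5)`; on the real data the same comparison would run on `df` itself, and that it holds for every row is an assumption.

```python
import pandas as pd

# The five rows shown by df.head(5), before BlackElo is dropped.
sample = pd.DataFrame({
    "WhiteElo": [1851, 2060, 2307, 2380, 2686],
    "BlackElo": [1901, 2111, 2290, 2419, 2848],
    "EloDif": [-50, -51, 17, -39, -162],
})

# BlackElo carries no extra information if EloDif is exactly WhiteElo - BlackElo.
difference = sample["WhiteElo"] - sample["BlackElo"]
is_redundant = bool((difference == sample["EloDif"]).all())
print(is_redundant)  # True
```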

## Make Score Categorical

In [6]:
df['Score'] = df['Score'].map({
    1.0: 'White Win',
    0.5: 'Draw',
    0.0: 'Black Win'
})
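`Series.map` with a dictionary turns any value that is not a key into `NaN` without raising an error, so counting missing values after the mapping is a cheap sanity check. The sketch below uses toy scores, with one deliberately invalid value added for illustration.

```python
import pandas as pd

score_labels = {1.0: "White Win", 0.5: "Draw", 0.0: "Black Win"}

# Toy scores (hypothetical); the last value is not a valid game result.
scores = pd.Series([1.0, 0.0, 0.5, 2.0])
mapped = scores.map(score_labels)

# Values missing from the dictionary silently become NaN.
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)  # 1
```

On the real column, a count of zero confirms that every score was one of the three expected values.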

In [7]:
df.head()

Unnamed: 0,WhiteElo,EloDif,Opening_name,Time_format,Increment_binary,Score
0,1851,-50,Alekhine's defense,classical,Yes,White Win
1,2060,-51,French Defense,blitz,Yes,Black Win
2,2307,17,Philidor Defense,blitz,No,Draw
3,2380,-39,Sicilian defense,rapid,No,Black Win
4,2686,-162,Ruy Lopez,rapid,No,Black Win


### Keep the top 10 openings.

In [8]:
top10_openings = df['Opening_name'].value_counts().nlargest(10).index

In [9]:
# Merge all other openings into "Other"
df['Opening_name'] = df['Opening_name'].where(
    df['Opening_name'].isin(top10_openings), 
    'Other'
)

In [10]:
df['Opening_name'].value_counts()

Other                                            27530
Sicilian defense                                 14435
Queen's Pawn Game                                 8721
French Defense                                    5398
English Opening                                   5181
Caro-Kann defense                                 3747
Irregular Openings                                3565
Queen's Gambit                                    3413
Scandinavian Defense (Center-Counter Defense)     3046
Closed Game, Irregular Responses                  2584
Zukertort Opening                                 2380
Name: Opening_name, dtype: int64
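The two cells above can be wrapped in a small helper so that the same rule applies to any categorical column. This is only a sketch: the function name `keep_top_categories` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd


def keep_top_categories(values: pd.Series, n: int, other_label: str = "Other") -> pd.Series:
    """Keep the n most frequent categories and replace all others with other_label."""
    top = values.value_counts().nlargest(n).index
    return values.where(values.isin(top), other_label)


# Toy data (hypothetical): two frequent openings and two rare ones.
openings = pd.Series(
    ["Sicilian", "Sicilian", "Sicilian", "French", "French", "Ruy Lopez", "Bird"]
)
collapsed = keep_top_categories(openings, n=2)
print(collapsed.value_counts().to_dict())
```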

## I want to make sure that the same train and test sets will be used in all Jupyter notebooks.

In [12]:
train, test = train_test_split(df, test_size=10000, random_state=42,  stratify=df['Score'])

In [13]:
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
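A later notebook can then read the saved files instead of splitting again. The sketch below runs the same round trip on toy data in a temporary directory (the data and paths are made up for illustration) and checks that the stratified split keeps the class ratios.

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data standing in for df (hypothetical values): 50% / 40% / 10% classes.
toy = pd.DataFrame({
    "EloDif": list(range(100)),
    "Score": ["White Win"] * 50 + ["Black Win"] * 40 + ["Draw"] * 10,
})
train, test = train_test_split(toy, test_size=20, random_state=42, stratify=toy["Score"])

# Save the training set and read it back, as another notebook would.
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.csv")
    train.to_csv(train_path, index=False)
    train_reloaded = pd.read_csv(train_path)

# The reloaded rows match the saved ones, in the same order.
same_rows = (
    train_reloaded["EloDif"].tolist() == train["EloDif"].tolist()
    and train_reloaded["Score"].tolist() == train["Score"].tolist()
)

# Stratification keeps the 50/40/10 class ratio in the 20-row test set.
test_counts = test["Score"].value_counts().to_dict()
print(same_rows, test_counts)
```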

## Create two dummy classification models  

In [14]:
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report

In [15]:
X_train = train.drop('Score', axis=1)
X_test  = test.drop('Score', axis=1)

In [16]:
y_train = train['Score']
y_test  = test['Score']

In [17]:
# 1) Most Frequent Dummy Classifier
clf_most_frequent = DummyClassifier(strategy='most_frequent', random_state=42)
clf_most_frequent.fit(X_train, y_train)
y_pred_mf = clf_most_frequent.predict(X_test)

In [18]:
# 2) Stratified Random Guessing Dummy Classifier
clf_stratified = DummyClassifier(strategy='stratified', random_state=42)
clf_stratified.fit(X_train, y_train)
y_pred_strat = clf_stratified.predict(X_test)
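Both dummy strategies learn only from the training labels and ignore the feature values entirely. A minimal sketch on toy labels (hypothetical data) makes this visible for the `most_frequent` strategy:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy training labels (hypothetical): White Win is the majority class.
y_toy_train = np.array(["White Win"] * 5 + ["Black Win"] * 4 + ["Draw"] * 1)
X_toy_train = np.zeros((len(y_toy_train), 1))  # feature values are irrelevant

clf = DummyClassifier(strategy="most_frequent").fit(X_toy_train, y_toy_train)

# Predictions depend only on how many rows X has, not on what they contain.
X_new = np.random.default_rng(0).normal(size=(3, 1))
preds = clf.predict(X_new)
print(preds.tolist())  # ['White Win', 'White Win', 'White Win']
```

This is why the categorical columns in `X_train` can be passed to `DummyClassifier` without any encoding.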

In [19]:
# Evaluate both classifiers
print("=== Most Frequent Dummy Classifier ===")
print("Accuracy:", accuracy_score(y_test, y_pred_mf))
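# zero_division=0 reports 0.00 for the classes that are never predicted, without a warning
print("Classification Report:\n", classification_report(y_test, y_pred_mf, zero_division=0))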

print("=== Stratified Dummy Classifier ===")
print("Accuracy:", accuracy_score(y_test, y_pred_strat))
print("Classification Report:\n", classification_report(y_test, y_pred_strat))

=== Most Frequent Dummy Classifier ===
Accuracy: 0.491
Classification Report:
               precision    recall  f1-score   support

   Black Win       0.00      0.00      0.00      4524
        Draw       0.00      0.00      0.00       566
   White Win       0.49      1.00      0.66      4910

    accuracy                           0.49     10000
   macro avg       0.16      0.33      0.22     10000
weighted avg       0.24      0.49      0.32     10000

=== Stratified Dummy Classifier ===
Accuracy: 0.4549


Classification Report:
               precision    recall  f1-score   support

   Black Win       0.46      0.45      0.46      4524
        Draw       0.06      0.06      0.06       566
   White Win       0.50      0.50      0.50      4910

    accuracy                           0.45     10000
   macro avg       0.34      0.34      0.34     10000
weighted avg       0.45      0.45      0.45     10000




#### Most Frequent Dummy (Accuracy ≈ 0.491):
#### This classifier always predicts the most common class in the training set (White Win). Because the split is stratified, White Win is also the majority class in the test set, so the accuracy equals its share of the test set: 4910 / 10000 = 0.491.

#### Stratified Dummy (Accuracy ≈ 0.4549):
#### This classifier samples each prediction at random according to the class frequencies in the training set, and it does slightly worse. A random draw is correct only when it happens to match the true label, so even draws that respect the label ratios cannot be expected to beat always guessing the most common class.
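Both accuracies can be derived from the class proportions alone. If class k has proportion p_k, the most-frequent strategy is expected to score max(p_k), while stratified guessing picks class k with probability p_k and is then right with probability p_k, for an expected accuracy of the sum of p_k². The sketch below uses the test-set supports from the reports above and assumes the training set has the same proportions, which the stratified split is meant to ensure.

```python
# Test-set class counts, taken from the "support" column of the reports above.
support = {"Black Win": 4524, "Draw": 566, "White Win": 4910}
total = sum(support.values())

# Assumption: train and test share the same class proportions (stratified split).
proportions = {label: count / total for label, count in support.items()}

# Stratified guessing: class k is drawn with probability p_k and is correct with probability p_k.
expected_stratified = sum(p ** 2 for p in proportions.values())

# Most-frequent guessing: always correct on the majority class, never on the others.
expected_most_frequent = max(proportions.values())

print(round(expected_stratified, 3))     # 0.449
print(round(expected_most_frequent, 3))  # 0.491
```

The observed 0.4549 sits close to the expected 0.449; the small gap is sampling noise from one random draw of 10,000 predictions.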

## Our task is to build models that exceed these baselines, ideally pushing well above 0.49 accuracy, and that also perform well across all three outcomes (White Win, Draw, Black Win).
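Accuracy alone does not show whether all three outcomes are handled well, so a class-balanced metric is useful next to it. As a sketch (the choice of balanced accuracy and macro F1 is an assumption, not something fixed by this notebook), here is the most-frequent baseline rebuilt from the test-set supports reported above:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Labels rebuilt from the test-set supports in the classification report.
y_true = ["Black Win"] * 4524 + ["Draw"] * 566 + ["White Win"] * 4910
# The most-frequent baseline predicts White Win for every game.
y_pred = ["White Win"] * len(y_true)

acc = accuracy_score(y_true, y_pred)
# Balanced accuracy is the mean of the per-class recalls: (0 + 0 + 1) / 3.
bal_acc = balanced_accuracy_score(y_true, y_pred)
# Macro F1 averages the per-class F1 scores with equal weight per class.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"{acc:.3f} {bal_acc:.3f} {macro_f1:.3f}")  # 0.491 0.333 0.220
```

A model that only learns to predict White Win would match the 0.491 accuracy but stay at 0.333 balanced accuracy, which makes the second number the harder baseline to move.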