# Challenge 1 - Tic Tac Toe

In this lab you will apply deep learning to a dataset of [Tic Tac Toe](https://en.wikipedia.org/wiki/Tic-tac-toe) board configurations.

A Tic Tac Toe board has 9 cells, which are coded as the following picture shows:

![Tic Tac Toe Grids](tttboard.jpg)

The first 9 columns of the dataset record which mark (`x` or `o`) occupies each cell. An empty cell is labeled `b`. The last column, `class`, tells you whether Player X (who always moves first in Tic Tac Toe) wins in this configuration. Note that a `class` value of `False` means either that Player O wins the game or that it ends in a draw.

Follow the steps suggested below to conduct a neural network analysis using TensorFlow and Keras. You will build a deep learning model to predict whether Player X wins the game or not.

## Step 1: Data Engineering

This dataset is almost ready to use, so you do not need to worry about missing values and similar problems. Still, some simple data engineering is needed.

1. Read `tic-tac-toe.csv` into a dataframe.
1. Inspect the dataset. Determine if the dataset is reliable by eyeballing the data.
1. Convert the categorical values to numeric in all columns.
1. Separate the inputs and output.
1. Normalize the input data.

In [1]:
# your code here

# First, we need to import the pandas library. We give it the nickname 'pd' to type less.
import pandas as pd

# Now, we read the file. The result is stored in a variable called 'df' (short for DataFrame).
df = pd.read_csv('tic-tac-toe.csv')

# Let's check if it worked by looking at the first 5 rows of our data.
print(df.head())
print("Shape:", df.shape)

  TL TM TR ML MM MR BL BM BR  class
0  x  x  x  x  o  o  x  o  o   True
1  x  x  x  x  o  o  o  x  o   True
2  x  x  x  x  o  o  o  o  x   True
3  x  x  x  x  o  o  o  b  b   True
4  x  x  x  x  o  o  b  o  b   True
Shape: (958, 10)
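
Eyeballing can be complemented with a couple of mechanical checks. The sketch below works on the five rows printed above rather than on the full file, and tests two rules a legal board should satisfy: X moves first, so X has either as many marks as O or exactly one more, and a row labeled `True` should contain three `x` in a line. The helper names `x_wins` and `counts_are_legal` are illustrative and not part of the lab.

```python
# Column order follows the dataset header: TL TM TR ML MM MR BL BM BR.
# These five boards are copied from the df.head() output above.
sample_rows = [
    ("x x x x o o x o o".split(), True),
    ("x x x x o o o x o".split(), True),
    ("x x x x o o o o x".split(), True),
    ("x x x x o o o b b".split(), True),
    ("x x x x o o b o b".split(), True),
]

# The 8 winning lines, as index triples into the 9-cell board.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def x_wins(board):
    """Return True if 'x' fills any complete line."""
    return any(all(board[i] == "x" for i in line) for line in WIN_LINES)

def counts_are_legal(board):
    """X moves first, so X has as many marks as O, or exactly one more."""
    diff = board.count("x") - board.count("o")
    return diff in (0, 1)

for board, label in sample_rows:
    print(board,
          "legal counts:", counts_are_legal(board),
          "label matches:", x_wins(board) == label)
```

On the full dataframe, the same functions could be applied row by row, for example with `df.iloc[:, :9].apply(lambda r: x_wins(list(r)), axis=1)`, and the result compared with `df['class']`.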


In [2]:
# We will use a 'dictionary' to define the mapping from letters to numbers.
# It's like a real dictionary that translates words.
mapping = {'b': 0, 'o': 1, 'x': 2}

# Now, we apply this mapping to the first 9 columns.
# The .iloc[:, :9] part means "all rows, first 9 columns".
# Series.map is applied column by column (DataFrame.applymap is deprecated in recent pandas versions).
df_numeric = df.iloc[:, :9].apply(lambda col: col.map(mapping))



***

### 3. Why This Encoding? (0, 1, 2)

We used a simple numeric encoding (`b`=0, `o`=1, `x`=2) instead of one-hot encoding. The reasoning was:

*   **Intended order:** For our specific goal, the value `2` (X) marks the player whose win we are predicting, the value `1` (O) marks the opponent, and the value `0` (blank) marks the absence of a player.
*   **What the model sees:** This numeric scale (0, 1, 2) tells the neural network that an `x` is "greater than" an `o`, which is "greater than" a `b`. Whether that ordering actually helps predict X's victory is an assumption that this notebook does not verify. The three marks are categories rather than quantities, and one-hot encoding, which treats all three options as equally separate, is the more common choice for such data and is worth trying if the model struggles to learn.
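
To make the difference between the two encodings concrete, here is a small sketch that encodes the first board of the dataset both ways with NumPy. The ordinal version matches the `mapping` used in this notebook. The one-hot version is shown only for comparison and is not used in the rest of the lab.

```python
import numpy as np

mapping = {'b': 0, 'o': 1, 'x': 2}        # same mapping as in the notebook
board = "x x x x o o x o o".split()       # first row of the dataset

# Ordinal encoding: one number per cell, 9 features in total.
ordinal = np.array([mapping[mark] for mark in board])

# One-hot encoding: three 0/1 indicators per cell (b, o, x), 27 features in total.
one_hot = np.eye(3, dtype=int)[ordinal].reshape(-1)

print("ordinal:", ordinal)
print("one-hot shape:", one_hot.shape)
print("one-hot for the first cell (b, o, x):", one_hot[:3])
```

With one-hot inputs, the first `Dense` layer would need to accept 27 features instead of 9.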

In [3]:
# Let's check our work! Print the first 5 rows of the new numeric dataframe.
print("Numeric Data:")
print(df_numeric.head())

# Now, let's also check the 'class' column to see what it looks like.
print("\nOriginal Class Column:")
print(df['class'].head())

Numeric Data:
   TL  TM  TR  ML  MM  MR  BL  BM  BR
0   2   2   2   2   1   1   2   1   1
1   2   2   2   2   1   1   1   2   1
2   2   2   2   2   1   1   1   1   2
3   2   2   2   2   1   1   1   0   0
4   2   2   2   2   1   1   0   1   0

Original Class Column:
0    True
1    True
2    True
3    True
4    True
Name: class, dtype: bool


### 4. Separate the inputs and output.


In [4]:
# Separate the inputs and the output.
# X holds the 9 numeric board columns from df_numeric.
X = df_numeric.values  # .values converts the DataFrame to a plain NumPy array, which Keras accepts directly.

# y is the 'class' column of the original dataframe.
# .astype(int) converts True/False into 1/0, the integer labels that sparse_categorical_crossentropy expects.
y = df['class'].astype(int).values

# Let's check the shapes to make sure it worked!
print("Shape of X (Input features):", X.shape)
print("Shape of y (Target variable):", y.shape)
print("\nFirst 5 rows of y:")
print(y[:5])

Shape of X (Input features): (958, 9)
Shape of y (Target variable): (958,)

First 5 rows of y:
[1 1 1 1 1]


### 5. Normalize the input data.

In [5]:
# Normalize the input data (X) by dividing by the maximum value (2)
X_normalized = X / 2.0

# Let's check the first row to see the result
print("First row of original X:", X[0])
print("First row of normalized X:", X_normalized[0])

First row of original X: [2 2 2 2 1 1 2 1 1]
First row of normalized X: [1.  1.  1.  1.  0.5 0.5 1.  0.5 0.5]


We normalized the input features (`X`) by dividing all values by the maximum value (2). This maps the values `0, 1, 2` to `0.0, 0.5, 1.0`.

**Why?**
Neural networks generally train best when all input features are on a **common, small scale**. Here all nine cells already share the same scale, so no single feature (grid cell) could dominate the learning process just because it has larger numbers. Dividing by 2 simply keeps every input inside the range `[0, 1]`, which tends to make training more stable.
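
Dividing by 2 is a special case of min-max scaling, `(x - min) / (max - min)`, because the smallest code here is 0 and the largest is 2. A minimal NumPy check of that equivalence, using the first row of `X` shown above:

```python
import numpy as np

row = np.array([2, 2, 2, 2, 1, 1, 2, 1, 1])   # first row of X, copied from the output above

x_min, x_max = 0, 2                            # smallest and largest codes in the mapping
min_max_scaled = (row - x_min) / (x_max - x_min)
divided_by_two = row / 2.0

print(min_max_scaled)
print(np.array_equal(min_max_scaled, divided_by_two))
```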

## Step 2: Build Neural Network

To build the neural network, you can refer to the code you wrote while following the [Deep Learning with Python, TensorFlow, and Keras tutorial](https://www.youtube.com/watch?v=wQ8BIBpya2k) in the lesson. It is quite similar to what you will be doing in this lab.

1. Split the training and test data.
1. Create a `Sequential` model.
1. Add several layers to your model. Make sure you use ReLU as the activation function for the middle layers. Use Softmax for the output layer because each sample has a single label and all the label probabilities add up to 1.
1. Compile the model using `adam` as the optimizer and `sparse_categorical_crossentropy` as the loss function. For metrics, use `accuracy` for now.
1. Fit the training data.
1. Evaluate your neural network model with the test data.
1. Save your model as `tic-tac-toe.model`.

### Step 2.1: Split the data into Training and Test sets

In [6]:
# Import the function we need to split the data
from sklearn.model_selection import train_test_split

# Split the data (X_normalized is our input, y is our output)
# test_size=0.2 means 20% of data is for testing
# random_state=42 fixes the seed of the random number generator,
# so we get the same split every time we run the code, which is good for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)

# Let's check the shapes of our new sets
print("Training data shape:", X_train.shape)
print("Testing data shape:", X_test.shape)
print("Training labels shape:", y_train.shape)
print("Testing labels shape:", y_test.shape)

Training data shape: (766, 9)
Testing data shape: (192, 9)
Training labels shape: (766,)
Testing labels shape: (192,)
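
The shapes above follow from simple arithmetic on 958 rows with `test_size=0.2`. The sketch below reproduces the sizes with a hand-rolled shuffle-and-slice split in NumPy. It is a simplified stand-in to show the idea, not scikit-learn's actual implementation, and it will not select the same rows as `train_test_split`.

```python
import math
import numpy as np

n_samples = 958
test_size = 0.2

# Rounding the test size up reproduces the 192/766 split shown above.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

# Shuffle the row indices with a fixed seed, then slice them into two groups.
rng = np.random.default_rng(42)
indices = rng.permutation(n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]

print("train rows:", len(train_idx), "test rows:", len(test_idx))
```

Because the indices are shuffled once and then sliced, no row can end up in both sets.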


### Step 2.2: Create a `Sequential` Model

In Keras (the library we use with TensorFlow), a Sequential model is like building a layer cake. We add one layer on top of the other, in sequence.

In [7]:
# your code here

# First, we need to import the necessary parts from TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create a Sequential model
model = Sequential()
print("Empty model created. Now we will add layers to it.")


Empty model created. Now we will add layers to it.


### Step 2.3: Add Layers to the Model



In [8]:
# Add the first hidden layer (128 neurons)

# 'Dense' means every neuron in this layer is connected to every output of the previous layer.
# input_dim=9 tells Keras that each sample has 9 features (one per board cell).
# units=128 sets how many neurons this layer has. It is a reasonable starting point.
# activation='relu' is the function that lets the model learn non-linear patterns.
model.add(Dense(units=128, activation='relu', input_dim=9))

# Add a second hidden layer (64 neurons)
model.add(Dense(units=64, activation='relu'))

# Output layer for 'sparse_categorical_crossentropy':
# we have 2 categories, "X does not win" (class 0) and "X wins" (class 1),
# so we need 2 neurons and softmax activation.
model.add(Dense(units=2, activation='softmax'))


# Let's see a summary of our model's structure!
model.summary()
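
To see what these three layers compute, the sketch below runs the same architecture (9 → 128 → 64 → 2) as a plain NumPy forward pass and counts its trainable parameters. The weights are random placeholders, not the trained Keras weights, so the output probabilities are meaningless. The point is only the shapes and the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [9, 128, 64, 2]   # input features, two hidden layers, output classes

# One (weights, biases) pair per Dense layer, filled with small random values.
params = [(rng.normal(0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    """Dense + ReLU for the hidden layers, Dense + softmax for the output layer."""
    for w, b in params[:-1]:
        x = relu(x @ w + b)
    w, b = params[-1]
    return softmax(x @ w + b)

batch = np.array([[1, 1, 1, 1, 0.5, 0.5, 1, 0.5, 0.5]])   # first normalized board
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in params)

print("output shape:", probs.shape)        # one row per board, one column per class
print("row sums:", probs.sum(axis=1))      # softmax rows add up to 1
print("trainable parameters:", n_params)   # (9*128+128) + (128*64+64) + (64*2+2)
```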



### Step 2.4: Compile the Model

In [14]:
# Compile with the loss function the instructions ask for
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
print("Model compiled successfully! Ready to train.")


Model compiled successfully! Ready to train.



The `compile` step sets the rules for *how* the neural network will learn.

*   **Optimizer (`optimizer='adam'`):** This is the algorithm that adjusts the model's internal settings to reduce errors. Think of it as the **engine** that drives the learning process. **Adam** is a very popular and efficient optimizer that works well for most problems.

*   **Loss Function (`loss='sparse_categorical_crossentropy'`):** This is how the model measures its **mistake**: the gap between the predicted class probabilities and the correct answer. This specific function fits classification problems where each sample belongs to exactly one category (like "X wins" or "X does not win") and the labels are stored as integers (0 or 1) rather than as one-hot vectors.

*   **Metrics (`metrics=['accuracy']`):** This is how we, the humans, evaluate the model's performance. **Accuracy** is the simple percentage of correct predictions, which is easy for us to understand.

In short: **The compile step sets up the "engine" (Adam), defines how to measure "mistakes" (Loss), and how to report "progress" (Accuracy).**
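
As a worked example of the loss and the metric, the sketch below computes sparse categorical cross-entropy and accuracy by hand in NumPy. For each sample, the loss is the negative log of the probability the model assigned to the true class, averaged over the batch. The probabilities and labels are made-up values for illustration, and the function is our own helper, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class)."""
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

# Hypothetical softmax outputs for 3 boards: [P(class 0), P(class 1)]
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
y_true = np.array([0, 1, 1])     # integer labels, as in y_train

loss = sparse_categorical_crossentropy(y_true, probs)
accuracy = float(np.mean(np.argmax(probs, axis=1) == y_true))

print(f"loss: {loss:.4f}")
print(f"accuracy: {accuracy:.4f}")
```

For reference, a model that always outputs 0.5 for both classes has a loss of ln 2 ≈ 0.693, so a loss only slightly below that value means the model has learned little beyond guessing.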

### Step 2.5: Fit the Training Data

In [15]:
# Train the model
history = model.fit(X_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_data=(X_test, y_test)) # This lets us see test accuracy after each epoch

print("Model training complete!")


Epoch 1/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 3s 30ms/step - accuracy: 0.6332 - loss: 0.6595 - val_accuracy: 0.6510 - val_loss: 0.6435
Epoch 2/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6658 - loss: 0.6247 - val_accuracy: 0.6667 - val_loss: 0.6296
Epoch 3/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6567 - loss: 0.6261 - val_accuracy: 0.6406 - val_loss: 0.6409
Epoch 4/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6580 - loss: 0.6281 - val_accuracy: 0.6615 - val_loss: 0.6275
Epoch 5/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6736 - loss: 0.6207 - val_accuracy: 0.6667 - val_loss: 0.6256
Epoch 6/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6802 - loss: 0.6183 - val_accuracy: 0.6719 - val_loss: 0.6251
Epoch 7/10
24/24 ━━━━━━━━━━━━━━━━━━━━

### Step 2.6: Evaluate the Model with the Test Data



In [16]:
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test)

# Print the results in a nice format
print("\n--- Final Evaluation on Test Set ---")
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6667 - loss: 0.6263

--- Final Evaluation on Test Set ---
Test Loss: 0.6263
Test Accuracy: 0.6667
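
The `24/24` and `6/6` counters in the training and evaluation logs are batch counts, not sample counts. A quick check of the arithmetic, assuming the batch size of 32 passed to `model.fit` (32 is also the Keras default for `model.evaluate`):

```python
import math

batch_size = 32
n_train, n_test = 766, 192      # sizes from the train/test split above

steps_per_epoch = math.ceil(n_train / batch_size)   # the last batch holds the leftover samples
eval_steps = math.ceil(n_test / batch_size)

print("training batches per epoch:", steps_per_epoch)
print("evaluation batches:", eval_steps)
```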


### Step 2.7: Save your model as `tic-tac-toe.model`.

In [18]:
# Save the model in the native .keras format (recommended).
# Recent Keras versions expect a '.keras' or '.h5' extension, so we append '.keras' to the requested name.
model.save('tic-tac-toe.model.keras')
print("Model saved as 'tic-tac-toe.model.keras'")

Model saved as 'tic-tac-toe.model.keras'


## Step 3: Make Predictions

Now load your saved model and use it to make predictions on a few random rows in the test dataset. Check if the predictions are correct.

In [19]:
# your code here

# Step 3: Make Predictions

# 1. Load the saved model
# (Use the exact same filename you used to save it, including the .keras extension)
loaded_model = tf.keras.models.load_model('tic-tac-toe.model.keras')
print("Model loaded successfully!")

# 2. Let's select 5 random rows from the test data (X_test)
import numpy as np

# Get 5 random indices from the test set
random_indices = np.random.choice(len(X_test), size=5, replace=False)

# Use those indices to get the actual data and labels
random_test_samples = X_test[random_indices]
random_true_labels = y_test[random_indices]

# 3. Use the loaded model to make predictions on these samples
# model.predict() returns probabilities for each class [prob_class_0, prob_class_1]
predictions = loaded_model.predict(random_test_samples)
print("\nRaw predictions (probabilities for class 0 and class 1):")
print(predictions)

# 4. Interpret the predictions:
# The model outputs probabilities for each class. We take the class with the highest probability.
# np.argmax finds the index of the highest value in each prediction.
predicted_classes = np.argmax(predictions, axis=1)
print("\nPredicted class (0 = X does NOT win, 1 = X wins):", predicted_classes)
print("    True class (0 = X does NOT win, 1 = X wins):", random_true_labels)

# 5. Check if the predictions are correct
print("\nChecking predictions:")
for i in range(5):
    is_correct = (predicted_classes[i] == random_true_labels[i])
    result = "✓ CORRECT" if is_correct else "✗ WRONG"
    print(f"Sample {i+1}: Predicted {predicted_classes[i]}, Actual {random_true_labels[i]} - {result}")

Model loaded successfully!
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 250ms/step

Raw predictions (probabilities for class 0 and class 1):
[[0.15790036 0.8420996 ]
 [0.37385976 0.6261403 ]
 [0.27907187 0.72092813]
 [0.5589671  0.4410329 ]
 [0.310625   0.68937504]]

Predicted class (0 = X does NOT win, 1 = X wins): [1 1 1 0 1]
    True class (0 = X does NOT win, 1 = X wins): [0 1 1 1 1]

Checking predictions:
Sample 1: Predicted 1, Actual 0 - ✗ WRONG
Sample 2: Predicted 1, Actual 1 - ✓ CORRECT
Sample 3: Predicted 1, Actual 1 - ✓ CORRECT
Sample 4: Predicted 0, Actual 1 - ✗ WRONG
Sample 5: Predicted 1, Actual 1 - ✓ CORRECT
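
With only five samples, it helps to separate the two kinds of mistakes. The sketch below recomputes the predicted classes from the raw probabilities printed above and tallies them against the true labels. The variable names are illustrative.

```python
import numpy as np

# Probabilities and true labels copied from the output above.
predictions = np.array([[0.15790036, 0.8420996],
                        [0.37385976, 0.6261403],
                        [0.27907187, 0.72092813],
                        [0.5589671,  0.4410329],
                        [0.310625,   0.68937504]])
true_labels = np.array([0, 1, 1, 1, 1])

predicted = np.argmax(predictions, axis=1)   # index of the larger probability in each row

true_pos  = int(np.sum((predicted == 1) & (true_labels == 1)))   # predicted win, X won
false_pos = int(np.sum((predicted == 1) & (true_labels == 0)))   # predicted win, X did not win
false_neg = int(np.sum((predicted == 0) & (true_labels == 1)))   # missed win
true_neg  = int(np.sum((predicted == 0) & (true_labels == 0)))   # correctly predicted no win

print("predicted:", predicted)
print("TP, FP, FN, TN:", true_pos, false_pos, false_neg, true_neg)
print("accuracy:", (true_pos + true_neg) / len(true_labels))
```

Five samples are far too few to judge the model. The same tally over the whole of `X_test` would show whether it mostly predicts "X wins" regardless of the board.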


## Step 4: Improve Your Model

Did your model achieve low loss (<0.1) and high accuracy (>0.95)? If not, try to improve your model.

But how? There are many things you can adjust in TensorFlow, and you will learn about them in the next challenge. In this challenge, let's just try a few things to see if they help.

* Add more layers to your model. If the data are complex you need more layers. But don't use more layers than you need. If adding more layers does not improve the model performance you don't need additional layers.
* Adjust the learning rate when you compile the model. This means you will create a custom `tf.keras.optimizers.Adam` instance where you specify the learning rate you want. Then pass the instance to `model.compile` as the optimizer.
    * `tf.keras.optimizers.Adam` [reference](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam).
    * Don't worry if you don't understand what the learning rate does. You'll learn about it in the next challenge.
* Adjust the number of epochs when you fit the training data to the model. Your model performance continues to improve as you train more epochs. But eventually it will reach the ceiling and the performance will stay the same.
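
For some intuition about the learning rate before the next challenge covers it properly, here is a toy sketch. It uses plain gradient descent on the one-variable function `f(w) = (w - 3)**2`, which is a deliberate simplification: Adam adapts its step sizes and behaves differently, and nothing here touches the Tic Tac Toe model. The function `gradient_descent` is made up for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(w) = (w - 3)**2 with plain gradient descent; the minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)          # derivative of (w - 3)**2
        w = w - learning_rate * gradient
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"learning_rate={lr}: w after 50 steps = {gradient_descent(lr):.4f}")
```

A very small rate barely moves `w`, a moderate rate lands close to 3, and a rate that is too large overshoots further on every step and diverges.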

### 4.1 Improvement 1: Add More Layers


In [21]:
# Improvement 1: Bigger Model (More layers/neurons)
print("--- Improvement 1: Training a Bigger Model ---")

improved_model = Sequential()
improved_model.add(Dense(units=256, activation='relu', input_dim=9))
improved_model.add(Dense(units=128, activation='relu'))
improved_model.add(Dense(units=64, activation='relu'))
improved_model.add(Dense(units=2, activation='softmax'))

improved_model.compile(optimizer='adam',
                       loss='sparse_categorical_crossentropy',
                       metrics=['accuracy'])

# Train for 10 epochs (same as before)
history_1 = improved_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Evaluate
test_loss_1, test_accuracy_1 = improved_model.evaluate(X_test, y_test, verbose=0)
print(f"Bigger Model Test Accuracy: {test_accuracy_1:.4f}\n")

--- Improvement 1: Training a Bigger Model ---
Epoch 1/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.6266 - loss: 0.6393 - val_accuracy: 0.6771 - val_loss: 0.6233
Epoch 2/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6606 - loss: 0.6254 - val_accuracy: 0.6771 - val_loss: 0.6221
Epoch 3/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6554 - loss: 0.6241 - val_accuracy: 0.6615 - val_loss: 0.6340
Epoch 4/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6580 - loss: 0.6353 - val_accuracy: 0.6302 - val_loss: 0.6289
Epoch 5/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6345 - loss: 0.6341 - val_accuracy: 0.6146 - val_loss: 0.6317
Epoch 6/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6305 - loss: 0.6507 - val_accuracy: 0.6198 - val_loss

### 4.2 Improvement 2: Adjust the Learning Rate


In [22]:
# Improvement 2: Adjust Learning Rate
print("--- Improvement 2: Adjusting Learning Rate ---")

from tensorflow.keras.optimizers import Adam

# Create a new model (same architecture as the original)
lr_model = Sequential()
lr_model.add(Dense(units=128, activation='relu', input_dim=9))
lr_model.add(Dense(units=64, activation='relu'))
lr_model.add(Dense(units=2, activation='softmax'))

# Use a custom Adam optimizer with a smaller learning rate
custom_adam = Adam(learning_rate=0.0005) # Default is 0.001
lr_model.compile(optimizer=custom_adam,
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

# Train for 10 epochs
history_2 = lr_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Evaluate
test_loss_2, test_accuracy_2 = lr_model.evaluate(X_test, y_test, verbose=0)
print(f"Adjusted LR Model Test Accuracy: {test_accuracy_2:.4f}\n")

--- Improvement 2: Adjusting Learning Rate ---
Epoch 1/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.6410 - loss: 0.6491 - val_accuracy: 0.6510 - val_loss: 0.6395
Epoch 2/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6593 - loss: 0.6241 - val_accuracy: 0.6562 - val_loss: 0.6297
Epoch 3/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6736 - loss: 0.6208 - val_accuracy: 0.6927 - val_loss: 0.6262
Epoch 4/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6958 - loss: 0.6177 - val_accuracy: 0.6823 - val_loss: 0.6252
Epoch 5/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6867 - loss: 0.6148 - val_accuracy: 0.6875 - val_loss: 0.6252
Epoch 6/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6945 - loss: 0.6152 - val_accuracy: 0.6875 - val_loss:

### 4.3 Improvement 3: Train for More Epochs


In [23]:
# Improvement 3: Train for More Epochs
print("--- Improvement 3: Training for More Epochs ---")

# Let's take the model from Improvement 1 (bigger model) and train it longer.
# fit() continues from the current weights, so epochs=20 adds 20 more epochs (30 in total).
history_3 = improved_model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

# Evaluate
test_loss_3, test_accuracy_3 = improved_model.evaluate(X_test, y_test, verbose=0)
print(f"More Epochs Model Test Accuracy: {test_accuracy_3:.4f}\n")

--- Improvement 3: Training for More Epochs ---
Epoch 1/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6057 - loss: 0.7113 - val_accuracy: 0.6458 - val_loss: 0.6694
Epoch 2/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6240 - loss: 0.6762 - val_accuracy: 0.5156 - val_loss: 0.7128
Epoch 3/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.5849 - loss: 0.7532 - val_accuracy: 0.5833 - val_loss: 0.7905
Epoch 4/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.5666 - loss: 0.8505 - val_accuracy: 0.6250 - val_loss: 0.9155
Epoch 5/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.5705 - loss: 0.8226 - val_accuracy: 0.6094 - val_loss: 0.8096
Epoch 6/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.5966 - loss: 0.8028 - val_accuracy: 0.6146 - val_los

**Which approach(es) did you find helpful to improve your model performance?**

In [None]:
# your answer here

# **Answer:**
# Based on my experiments, none of the suggested approaches
# (adding more layers, adjusting the learning rate, training for more epochs)
# significantly improved the model's performance on the test set beyond the original ~67% accuracy.

# The best result was achieved by **adjusting the learning rate** (0.0005),
# which yielded a test accuracy of **67.71%**, a very minor improvement.

# This suggests that a simple neural network might be struggling to learn
# the underlying patterns of Tic Tac Toe wins from this dataset with these features.
# The model's performance seems to have a ceiling with this architecture.
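
One way to check for such a ceiling is to look at the validation loss epoch by epoch and find where it was lowest. The sketch below does this in plain Python for the five complete `val_loss` values visible in the Improvement 3 log. In the notebook, the same list would come from `history_3.history['val_loss']`. The helper name `best_epoch` is illustrative.

```python
# val_loss for epochs 1-5 of the Improvement 3 run, copied from the log above.
val_loss = [0.6694, 0.7128, 0.7905, 0.9155, 0.8096]

def best_epoch(losses):
    """Return (1-based epoch number, loss) of the lowest validation loss."""
    index = min(range(len(losses)), key=losses.__getitem__)
    return index + 1, losses[index]

epoch, loss = best_epoch(val_loss)
print(f"lowest val_loss {loss:.4f} at epoch {epoch} of the continued training")
```

Here the lowest value is in the first of these epochs and the loss rises afterwards, which suggests that the extra epochs made this model worse on the held-out data rather than better.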