
In [None]:
# Function to initialize the board
def initialize_board():
    return [[" " for _ in range(3)] for _ in range(3)]

# Function to display the current state of the board as a simple text grid
def display_board(board):
    print("\n")
    for row in board:
        print(" | ".join(row))
        print("-" * 5)
    print("\n")

# Function to check if a move is valid
def is_valid_move(board, row, col):
    return 0 <= row < 3 and 0 <= col < 3 and board[row][col] == " "

# Function to check for a win
def check_winner(board, player):
    # Check rows, columns, and diagonals
    for i in range(3):
        if all(board[i][j] == player for j in range(3)) or \
           all(board[j][i] == player for j in range(3)):
            return True
    if all(board[i][i] == player for i in range(3)) or \
       all(board[i][2 - i] == player for i in range(3)):
        return True
    return False

# Function to check for a draw
def is_draw(board):
    return all(board[row][col] != " " for row in range(3) for col in range(3))

# Main function to play the game
def play_game():
    print("Welcome to Tic-Tac-Toe!")
    board = initialize_board()
    current_player = "X"

    while True:
        display_board(board)
        print(f"Player {current_player}'s turn")

        # Get user input
        try:
            row, col = map(int, input("Enter row and column (0, 1, or 2) separated by space: ").split())
            if not is_valid_move(board, row, col):
                print("Invalid move! Try again.")
                continue
        except ValueError:
            print("Invalid input! Please enter two integers separated by space.")
            continue

        # Make the move
        board[row][col] = current_player

        # Check for a win or draw
        if check_winner(board, current_player):
            display_board(board)
            print(f"Player {current_player} wins!")
            break
        if is_draw(board):
            display_board(board)
            print("It's a draw!")
            break

        # Switch players
        current_player = "O" if current_player == "X" else "X"

# Run the game
if __name__ == "__main__":
    play_game()


Welcome to Tic-Tac-Toe!


  |   |  
-----
  |   |  
-----
  |   |  
-----


Player X's turn
Enter row and column (0, 1, or 2) separated by space: 1 1


  |   |  
-----
  | X |  
-----
  |   |  
-----


Player O's turn
Enter row and column (0, 1, or 2) separated by space: 1 0


  |   |  
-----
O | X |  
-----
  |   |  
-----


Player X's turn
Enter row and column (0, 1, or 2) separated by space: 2 1


  |   |  
-----
O | X |  
-----
  | X |  
-----


Player O's turn
Enter row and column (0, 1, or 2) separated by space: 0 1


  | O |  
-----
O | X |  
-----
  | X |  
-----


Player X's turn
Enter row and column (0, 1, or 2) separated by space: 0 0


X | O |  
-----
O | X |  
-----
  | X |  
-----


Player O's turn
Enter row and column (0, 1, or 2) separated by space: 2 2


X | O |  
-----
O | X |  
-----
  | X | O
-----


Player X's turn
Enter row and column (0, 1, or 2) separated by space: 2 0


X | O |  
-----
O | X |  
-----
X | X | O
-----


Player O's turn
Enter row and column (0, 1
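
The transcript above can also be checked without typing moves by hand. The sketch below replays the seven moves shown in the transcript and confirms that the game is still undecided when it is Player O's turn. `replay` is a hypothetical helper introduced only for this illustration, and `check_winner` here is a compact restatement of the same row, column, and diagonal test, not the cell's original code.

```python
def check_winner(board, player):
    """Return True if `player` occupies a full row, column, or diagonal."""
    lines = [[(r, c) for c in range(3)] for r in range(3)]   # rows
    lines += [[(r, c) for r in range(3)] for c in range(3)]  # columns
    lines += [[(i, i) for i in range(3)],                    # main diagonal
              [(i, 2 - i) for i in range(3)]]                # anti-diagonal
    return any(all(board[r][c] == player for r, c in line) for line in lines)


def replay(moves):
    """Hypothetical helper: apply alternating X/O moves to an empty board."""
    board = [[" "] * 3 for _ in range(3)]
    player = "X"
    for row, col in moves:
        assert board[row][col] == " ", "square already taken"
        board[row][col] = player
        player = "O" if player == "X" else "X"
    return board


# The seven (row, column) pairs entered in the transcript, in order.
moves = [(1, 1), (1, 0), (2, 1), (0, 1), (0, 0), (2, 2), (2, 0)]
board = replay(moves)

x_wins = check_winner(board, "X")
o_wins = check_winner(board, "O")
board_full = all(cell != " " for row in board for cell in row)
print(x_wins, o_wins, board_full)  # False False False
```

Scripting the moves this way makes the win and draw checks testable without `input()`, which is convenient when changing the board logic.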

In [None]:
import numpy as np
import random

class TicTacToeRL:
    def __init__(self):
        self.board = np.zeros((3, 3), dtype=int)  # 0 for empty, 1 for X, -1 for O
        self.state_values = {}  # State-value function
        self.alpha = 0.2  # Learning rate
        self.epsilon = 0.2  # Exploration probability

    def reset(self):
        self.board = np.zeros((3, 3), dtype=int)

    def get_state(self):
        """Returns a tuple representation of the current board."""
        return tuple(self.board.flatten())

    def is_winner(self, player):
        """Checks if a player has won."""
        for i in range(3):
            if np.all(self.board[i, :] == player) or np.all(self.board[:, i] == player):
                return True
        if np.all(np.diag(self.board) == player) or np.all(np.diag(np.fliplr(self.board)) == player):
            return True
        return False

    def is_draw(self):
        """Checks if the game is a draw."""
        return not np.any(self.board == 0) and not self.is_winner(1) and not self.is_winner(-1)

    def available_moves(self):
        """Returns a list of available moves."""
        return [(i, j) for i in range(3) for j in range(3) if self.board[i, j] == 0]

    def make_move(self, move, player):
        """Makes a move on the board."""
        self.board[move] = player

    def get_value(self, state):
        """Returns the value of a state, initializing it if necessary."""
        if state not in self.state_values:
            self.state_values[state] = 0.5  # Initialize to neutral value
        return self.state_values[state]

    def choose_action(self, player):
        """Chooses an action using an epsilon-greedy policy."""
        if random.random() < self.epsilon:
            # Explore: Choose a random move
            return random.choice(self.available_moves())
        else:
            # Exploit: Choose the best move based on the value function
            best_value = -float('inf') if player == 1 else float('inf')
            best_move = None
            for move in self.available_moves():
                self.make_move(move, player)
                state = self.get_state()
                value = self.get_value(state)
                self.make_move(move, 0)  # Undo the move
                if (player == 1 and value > best_value) or (player == -1 and value < best_value):
                    best_value = value
                    best_move = move
            return best_move

    def update_value_function(self, prev_state, next_state, reward):
        """Updates the value function using TD(0)."""
        prev_value = self.get_value(prev_state)
        next_value = self.get_value(next_state)
        self.state_values[prev_state] += self.alpha * (reward + next_value - prev_value)

    def train(self, episodes=5000):
        """Trains the agent by self-play."""
        for _ in range(episodes):
            self.reset()
            states = [self.get_state()]  # Every position visited this episode, in order
            current_player = 1  # Player 1 starts
            while True:
                move = self.choose_action(current_player)
                self.make_move(move, current_player)
                states.append(self.get_state())

                if self.is_winner(current_player):
                    outcome = 1.0 if current_player == 1 else 0.0
                elif self.is_draw():
                    outcome = 0.5
                else:
                    current_player *= -1  # Switch players
                    continue

                # Pin the terminal position to the outcome on the same 0-to-1 scale as
                # the 0.5 default (1 = X wins, 0 = O wins, 0.5 = draw), then back the
                # values up one step at a time from the end of the game to the start.
                self.state_values[states[-1]] = outcome
                for prev_state, next_state in zip(reversed(states[:-1]), reversed(states[1:])):
                    self.update_value_function(prev_state, next_state, 0)
                break

    def play(self):
        """Plays a game against the trained agent."""
        self.reset()
        print("You are X (Player 1). Agent is O (Player 2).")
        current_player = 1  # Player 1 starts
        while True:
            print("\nCurrent Board:")
            self.print_board()
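            if current_player == 1:
                try:
                    move = tuple(map(int, input("Enter your move (row and column): ").split()))
                except ValueError:
                    # Non-numeric input would otherwise crash the game loop
                    print("Invalid input! Please enter two integers separated by space.")
                    continue
                if move not in self.available_moves():
                    print("Invalid move! Try again.")
                    continue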
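            else:
                print("Agent's turn...")
                # Play greedily: switch exploration off for this move, then restore it
                saved_epsilon, self.epsilon = self.epsilon, 0.0
                move = self.choose_action(current_player)
                self.epsilon = saved_epsilon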
            self.make_move(move, current_player)
            if self.is_winner(current_player):
                print("\nCurrent Board:")
                self.print_board()
                if current_player == 1:
                    print("Congratulations! You win!")
                else:
                    print("Agent wins! Better luck next time.")
                break
            elif self.is_draw():
                print("\nCurrent Board:")
                self.print_board()
                print("It's a draw!")
                break
            current_player *= -1  # Switch players

    def print_board(self):
        """Prints the current board."""
        symbols = {1: "X", -1: "O", 0: " "}
        for row in self.board:
            print(" | ".join(symbols[cell] for cell in row))
            print("-" * 5)

# Initialize and train the agent
agent = TicTacToeRL()
agent.train(episodes=10000)

# Play a game
agent.play()


You are X (Player 1). Agent is O (Player 2).

Current Board:
  |   |  
-----
  |   |  
-----
  |   |  
-----
Enter your move (row and column): 0 0

Current Board:
X |   |  
-----
  |   |  
-----
  |   |  
-----
Agent's turn...

Current Board:
X | O |  
-----
  |   |  
-----
  |   |  
-----
Enter your move (row and column): 2 2

Current Board:
X | O |  
-----
  |   |  
-----
  |   | X
-----
Agent's turn...

Current Board:
X | O |  
-----
  | O |  
-----
  |   | X
-----
Enter your move (row and column): 2 1

Current Board:
X | O |  
-----
  | O |  
-----
  | X | X
-----
Agent's turn...

Current Board:
X | O |  
-----
  | O |  
-----
O | X | X
-----
Enter your move (row and column): 0 2

Current Board:
X | O | X
-----
  | O |  
-----
O | X | X
-----
Agent's turn...

Current Board:
X | O | X
-----
  | O | O
-----
O | X | X
-----
Enter your move (row and column): 1 0

Current Board:
X | O | X
-----
X | O | O
-----
O | X | X
-----
It's a draw!
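
To make the arithmetic in `update_value_function` concrete, the sketch below applies the same rule, V(s) += alpha * (reward + V(s') - V(s)), to a plain dictionary. `td0_update` and the state labels `"s0"` and `"s1"` are hypothetical names used only for this illustration; the learning rate 0.2 and the default value 0.5 are taken from the class above.

```python
def td0_update(values, prev_state, next_state, reward, alpha=0.2, default=0.5):
    """Apply V(s) += alpha * (reward + V(s') - V(s)) in place; return the new V(s)."""
    prev_value = values.setdefault(prev_state, default)  # unseen states start at 0.5
    next_value = values.setdefault(next_state, default)
    values[prev_state] = prev_value + alpha * (reward + next_value - prev_value)
    return values[prev_state]


# Hypothetical table: assume s1 is already valued at 1.0; s0 is unseen.
values = {"s1": 1.0}

first = td0_update(values, "s0", "s1", reward=0)   # 0.5 + 0.2 * (1.0 - 0.5) = 0.6
second = td0_update(values, "s0", "s1", reward=0)  # 0.6 + 0.2 * (1.0 - 0.6) = 0.68
print(round(first, 3), round(second, 3))  # 0.6 0.68
```

Each call moves the earlier state's value a fraction `alpha` of the way toward the later state's value, so repeated visits pull V(s0) toward V(s1) without ever overshooting it.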
