<a href="https://colab.research.google.com/github/daaniyahjkhan/buildweekAIML/blob/main/Copy_of_NIM_RL_Teaching_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 Reinforcement Learning with the NIM Game
Let's teach our AI how to win a simple game using Q-learning.

## 🎮 The NIM Game Rules
- Start with 21 sticks.
- Each player takes 1, 2, or 3 sticks on their turn.
- The player who takes the **last stick loses**.

We'll train an AI to get smarter over time!

In [2]:
MAX_STICKS = 21
ACTIONS = [1, 2, 3, 4]

## 🧠 Step 1: Create a Q-table
We’ll use a dictionary to store the AI’s knowledge — the expected value (Q) of taking each action in every possible state.

In [3]:
Q = {}

## 🎲 Step 2: Action Choice
Let’s write a function that chooses an action. We’ll use **epsilon-greedy** — random at first, smarter later.

In [5]:

import random

def choose_action(state, epsilon):
    if state not in Q:
        Q[state] = {a: 0 for a in ACTIONS}
    if random.random() < epsilon:
        return random.choice([a for a in ACTIONS if a <= state])
    return min(Q[state], key=Q[state].get)#here



## 💡 Step 3: Q-Value Update Rule
We’ll update the Q-values using this formula:
```
Q(s,a) = Q(s,a) + alpha * (reward + gamma * max(Q(s') - Q(s,a))
```

In [6]:

def update_q(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    if state not in Q:
        Q[state] = {a: 0 for a in ACTIONS}
    if next_state not in Q:
        Q[next_state] = {a: 0 for a in ACTIONS}
    max_q_next = min(Q[next_state].values())#here
    Q[state][action] += alpha * (reward + gamma * max_q_next - Q[state][action])


## 🔁 Step 4: Training Loop
Now we’ll play lots of games where the AI learns from experience.

In [8]:
def train(episodes=10000, epsilon=0.3, alpha=0.1, gamma=0.9):
    for _ in range(episodes):
        state = MAX_STICKS
        last_state, last_action = None, None

        while state > 0:
            action = choose_action(state, epsilon)
            next_state = state - action

            if last_state is not None:
                update_q(last_state, last_action, 0, state, alpha, gamma)

            last_state = state
            last_action = action

            if next_state == 0:
                update_q(state, action, 1, next_state, alpha, gamma)#here
                break

            valid_opponent_actions = [a for a in ACTIONS if a <= next_state]
            if not valid_opponent_actions:
                update_q(last_state, last_action, 0, next_state, alpha, gamma)
                break

            opponent_action = random.choice(valid_opponent_actions)
            state = next_state - opponent_action

            if state <= 0:
                update_q(last_state, last_action, -1, next_state, alpha, gamma)#here
                break


## 🚀 Train the AI!

In [9]:
train()

In [10]:
print(Q)

{21: {1: -0.7003215249421911, 2: -0.7215801515924087, 3: -0.7465234484773017, 4: -0.770468113762695}, 15: {1: -0.7889124333997404, 2: -0.7978252512354524, 3: -0.8150790001284299, 4: -0.8698373772752374}, 11: {1: -0.802385965003625, 2: -0.8294576993863876, 3: -0.853818318124913, 4: -0.8949199229285516}, 9: {1: -0.8639762306323282, 2: -0.8966120399327397, 3: -0.9031224956349677, 4: -0.7659557935077566}, 5: {1: -0.7915911731114064, 2: -0.8436305547009575, 3: -0.9512926195644567, 4: -0.9999999999999996}, 3: {1: -1.015739343434459, 2: -0.9999999999999996, 3: 0.9999999999544168, 4: 0}, 19: {1: -0.5276586892922396, 2: -0.5350186832937612, 3: -0.7617135507251342, 4: -0.545246554413719}, 16: {1: -0.7783645333007905, 2: -0.7871736867380739, 3: -0.8002517885955431, 4: -0.8340459549310305}, 7: {1: -0.9072058210580198, 2: -0.7914194532920034, 3: -0.977699912814908, 4: -0.6714039516171103}, 4: {1: -0.9856036932600014, 2: -0.6503662041146229, 3: -0.9999999999999996, 4: 0.9999998309981739}, 2: {1: -0.

## 🧪 Let’s play against the AI!

In [12]:

def play():
    state = MAX_STICKS
    while state > 0:
        print(f"Sticks left: {state}")
        move = int(input("Your move (1–3): "))
        state -= move
        if state <= 0:
            print("You took the last stick. You lose!")
            return
        if state in Q:
            ai_move = min(Q[state], key=Q[state].get)#here
        else:
            ai_move = random.choice([a for a in ACTIONS if a <= state])
        print(f"AI takes {ai_move} stick(s).")
        state -= ai_move
        if state <= 0:
            print("AI took the last stick. You win!")
            return


In [15]:
play()


Sticks left: 21
Your move (1–3): 1
AI takes 3 stick(s).
Sticks left: 17
Your move (1–3): 2
AI takes 4 stick(s).
Sticks left: 11
Your move (1–3): 3
AI takes 2 stick(s).
Sticks left: 6
Your move (1–3): 3
AI takes 1 stick(s).
Sticks left: 2
Your move (1–3): 1
AI takes 2 stick(s).
AI took the last stick. You win!


## 🎉 Summary
You just trained an agent to play a game using trial-and-error. That’s the magic of Reinforcement Learning!