
# 🧠 Reinforcement Learning with the NIM Game
Let's teach our AI how to win a simple game using Q-learning.

## 🎮 The NIM Game Rules
- Start with 21 sticks.
- Each player takes 1, 2, or 3 sticks on their turn.
- The player who takes the **last stick loses**.

We'll train an AI to get smarter over time!
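
To see what the AI is up against, the rules above can be solved by brute force. The sketch below is a minimal illustration, not part of the training code; `MOVES` and `can_win` are names made up for this example. It lists the stick counts from which the player to move cannot avoid losing against perfect play.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed to match the rules above


@lru_cache(maxsize=None)
def can_win(sticks):
    """Return True if the player to move can force a win with `sticks` left."""
    for take in MOVES:
        if take > sticks:
            continue
        remaining = sticks - take
        if remaining == 0:
            continue  # taking the last stick loses, so this move never helps
        if not can_win(remaining):
            return True  # this move leaves the opponent in a losing position
    return False


losing_positions = [n for n in range(1, 22) if not can_win(n)]
print(losing_positions)  # prints [1, 5, 9, 13, 17, 21]
```

Since 21 is in that list, whoever moves first with 21 sticks cannot force a win against a perfect opponent. A good strategy is to leave the other player on one of these counts whenever possible.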

In [None]:
MAX_STICKS = 21
ACTIONS = [1, 2, 3]  # a player may take 1, 2, or 3 sticks per turn

## 🧠 Step 1: Create a Q-table
We’ll use a dictionary to store the AI’s knowledge: for each state (the number of sticks left), the expected value (Q) of taking each action.

In [None]:
Q = {}
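
As a rough picture of what the table will hold, here is a small sketch that uses a separate, hypothetical dictionary (`example_q`) so it does not touch `Q`. The value `-0.5` is invented purely for illustration.

```python
EXAMPLE_ACTIONS = [1, 2, 3]  # assumed to match the rules above
example_q = {}               # hypothetical table, separate from Q

state = 5  # five sticks left
# Create the row for this state the first time it is seen.
example_q.setdefault(state, {a: 0.0 for a in EXAMPLE_ACTIONS})
# Pretend the AI has learned that taking 1 stick here tends to go badly.
example_q[state][1] = -0.5

best_action = max(example_q[state], key=example_q[state].get)
print(example_q)    # prints {5: {1: -0.5, 2: 0.0, 3: 0.0}}
print(best_action)  # prints 2
```

Each outer key is a state, and each inner dictionary maps an action to its current value estimate.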

## 🎲 Step 2: Action Choice
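Let’s write a function that chooses an action. We’ll use **epsilon-greedy**: with probability epsilon the AI picks a random legal move (exploration); otherwise it picks the move with the highest Q-value learned so far (exploitation).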

In [None]:

import random

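def choose_action(state, epsilon):
    # Create a Q-table row the first time a state is seen.
    if state not in Q:
        Q[state] = {a: 0 for a in ACTIONS}
    # A move is legal only if it takes no more sticks than remain.
    valid_actions = [a for a in ACTIONS if a <= state]
    # Explore: with probability epsilon, pick a random legal move.
    if random.random() < epsilon:
        return random.choice(valid_actions)
    # Exploit: otherwise pick the legal move with the highest Q-value.
    return max(valid_actions, key=Q[state].get)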
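
To get a feel for how often epsilon-greedy sticks with the best-known move, here is a standalone sketch. The function `epsilon_greedy`, the Q-values, and the seed are all made up for this illustration. With `epsilon = 0.3` and three actions, the best action should be chosen roughly 80% of the time: 70% from greedy picks plus a third of the 30% random picks.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


rng = random.Random(0)                # seeded so repeated runs match
q_values = {1: 0.2, 2: 0.9, 3: -0.4}  # hypothetical values; action 2 is best

picks = [epsilon_greedy(q_values, 0.3, rng) for _ in range(10_000)]
share_best = picks.count(2) / len(picks)
print(f"best action chosen {share_best:.0%} of the time")
```

Setting `epsilon` to 0 makes the choice fully greedy, which is handy once training is over.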


## 💡 Step 3: Q-Value Update Rule
We’ll update the Q-values using this formula:
```
Q(s,a) = Q(s,a) + alpha * (reward + gamma * max(Q(s')) - Q(s,a))
```
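
Here is the formula worked through with numbers. This is only an arithmetic sketch; `q_update` is a hypothetical helper, and the input values are invented.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.1, gamma=0.9):
    """Return the new Q(s,a) after one application of the update rule."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)


# Winning move into a terminal state: reward 1, no future value.
# 0.5 + 0.1 * (1 + 0.9 * 0 - 0.5) = 0.55
after_win = q_update(q_sa=0.5, reward=1.0, max_q_next=0.0)

# Ordinary move: no reward yet, but the next state looks promising.
# 0.0 + 0.1 * (0 + 0.9 * 0.5 - 0.0) = 0.045
after_step = q_update(q_sa=0.0, reward=0.0, max_q_next=0.5)

print(round(after_win, 3), round(after_step, 3))  # prints 0.55 0.045
```

`alpha` controls how far each update moves the old estimate toward the new target, and `gamma` controls how much future value counts.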

In [None]:

def update_q(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Create Q-table rows the first time each state is seen.
    if state not in Q:
        Q[state] = {a: 0 for a in ACTIONS}
    if next_state not in Q:
        Q[next_state] = {a: 0 for a in ACTIONS}
    # Best value reachable from next_state, counting legal moves only;
    # a terminal state (no sticks left) has no moves and is worth 0.
    max_q_next = max((Q[next_state][a] for a in ACTIONS if a <= next_state), default=0)
    Q[state][action] += alpha * (reward + gamma * max_q_next - Q[state][action])
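
A single update only nudges the estimate, so it takes many repetitions to settle. The sketch below applies the same rule over and over to one move that always loses (reward -1, terminal next state). The variable names are made up for this example.

```python
alpha = 0.1
q = 0.0
history = []

for step in range(1, 201):
    # Reward -1, and the terminal next state contributes a value of 0.
    q += alpha * (-1 + 0.0 - q)
    if step in (1, 10, 50, 200):
        history.append((step, round(q, 4)))

for step, value in history:
    print(step, value)
```

The estimate starts at -0.1 after the first update and creeps toward -1, which is why entries for moves that always lose end up very close to -1 in a trained table.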


## 🔁 Step 4: Training Loop
Now we’ll play lots of games against an opponent that moves randomly, and the AI learns from how each game ends.

In [None]:
def train(episodes=10000, epsilon=0.3, alpha=0.1, gamma=0.9):
    for _ in range(episodes):
        state = MAX_STICKS
        last_state, last_action = None, None

        while state > 0:
            action = choose_action(state, epsilon)
            next_state = state - action

            # The AI's previous move led to this state with no reward yet.
            if last_state is not None:
                update_q(last_state, last_action, 0, state, alpha, gamma)

            last_state = state
            last_action = action

            # The AI took the last stick: it loses (reward -1).
            if next_state <= 0:
                update_q(state, action, -1, 0, alpha, gamma)
                break

            # The opponent plays a random legal move.
            opponent_action = random.choice([a for a in ACTIONS if a <= next_state])
            state = next_state - opponent_action

            # The opponent took the last stick: the AI wins (reward +1).
            if state <= 0:
                update_q(last_state, last_action, 1, 0, alpha, gamma)
                break


## 🚀 Train the AI!

In [None]:
train()
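
One way to check whether training helped is to measure how often a policy beats the same kind of random opponent. The sketch below is self-contained: `win_rate` and `optimal_move` are hypothetical helpers, and `optimal_move` is a hand-written reference policy rather than anything learned.

```python
import random


def optimal_move(sticks):
    """Reference policy: try to leave the opponent on 1, 5, 9, ... sticks."""
    take = (sticks - 1) % 4
    return take if take in (1, 2, 3) else 1


def win_rate(policy, start_sticks=21, games=1000, seed=0):
    """Fraction of games the policy wins when it moves first against random play."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        sticks = start_sticks
        while True:
            sticks -= policy(sticks)
            if sticks <= 0:  # the policy took the last stick and loses
                break
            sticks -= rng.choice([a for a in (1, 2, 3) if a <= sticks])
            if sticks <= 0:  # the random opponent took the last stick
                wins += 1
                break
    return wins / games


print(win_rate(optimal_move, start_sticks=20))  # prints 1.0
print(win_rate(optimal_move, start_sticks=21))
```

To score the trained agent instead, pass a function that picks greedily from the table, for example `win_rate(lambda s: choose_action(s, 0))`.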

In [None]:
print(Q)

{21: {1: -0.6484963319299526, 2: -0.6806572761885872, 3: -0.7014796122393314, 4: -0.7186231357178836}, 16: {1: -0.7296225880954113, 2: -0.7274366867986205, 3: -0.7465046107458302, 4: -0.7791810006771072}, 13: {1: -0.7741088552414703, 2: -0.7900704382310804, 3: -0.8099999999999987, 4: -0.7421325821550027}, 11: {1: -0.8099999999999987, 2: -0.7530846821698083, 3: -0.8076790085376747, 4: -0.7432007053621844}, 6: {1: -0.899999999999999, 2: -0.6036594837484002, 3: -0.4726819050197155, 4: -0.5511353005746819}, 3: {1: -0.4762723908616747, 2: 0.10000000000760247, 3: -0.9999999999999996, 4: 0}, 1: {1: -0.9999999999999996, 2: 0.0, 3: 0, 4: 0}, 0: {1: 0, 2: 0, 3: 0, 4: 0}, 17: {1: -0.7221108762958416, 2: -0.7230888565878019, 3: -0.717732230210284, 4: -0.7523526735942319}, 15: {1: -0.7289803337764345, 2: -0.7458856760830072, 3: -0.7733789542262808, 4: -0.783453471251202}, 12: {1: -0.7788362831454898, 2: -0.8099999999999987, 3: -0.7532826834953736, 4: -0.7979088700733771}, 8: {1: -0.8015297447029616
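
A raw dictionary dump is hard to read. The sketch below shows one way to boil a Q-table down to the best-known move per state. `greedy_policy` is a hypothetical helper, and `demo_q` holds invented values so the example runs on its own.

```python
def greedy_policy(q_table, actions=(1, 2, 3)):
    """Map each non-terminal state to its highest-valued legal action."""
    policy = {}
    for state in sorted(q_table):
        legal = [a for a in actions if a <= state]
        if legal:  # skip terminal states, which have no legal moves
            policy[state] = max(legal, key=q_table[state].get)
    return policy


demo_q = {
    0: {1: 0, 2: 0, 3: 0},
    2: {1: 0.8, 2: -1.0, 3: 0.0},
    5: {1: -0.3, 2: -0.5, 3: -0.4},
}
print(greedy_policy(demo_q))  # prints {2: 1, 5: 1}
```

Calling `greedy_policy(Q)` after training gives a compact view of what the agent would play from each stick count.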

## 🧪 Let’s play against the AI!

In [None]:

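def play():
    state = MAX_STICKS
    while state > 0:
        print(f"Sticks left: {state}")
        move = int(input("Your move (1–3): "))
        # Reject moves that break the rules and ask again.
        if move not in ACTIONS or move > state:
            print("Invalid move. Try again.")
            continue
        state -= move
        if state <= 0:
            print("You took the last stick. You lose!")
            return
        # The AI plays its best-known legal move, or a random one in an unseen state.
        valid_actions = [a for a in ACTIONS if a <= state]
        if state in Q:
            ai_move = max(valid_actions, key=Q[state].get)
        else:
            ai_move = random.choice(valid_actions)
        print(f"AI takes {ai_move} stick(s).")
        state -= ai_move
        if state <= 0:
            print("AI took the last stick. You win!")
            return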
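
Typing something that is not a number would make `int(...)` raise a `ValueError` and end the game. A small helper along these lines could make the input step more forgiving. `parse_move` is a hypothetical name, and this is just one possible approach.

```python
def parse_move(text, sticks, actions=(1, 2, 3)):
    """Return the move as an int, or None if the input is not a legal move."""
    try:
        move = int(text.strip())
    except ValueError:
        return None  # not a whole number
    if move in actions and move <= sticks:
        return move
    return None      # outside the allowed range or more than what is left


print(parse_move("2", sticks=5))    # prints 2
print(parse_move("4", sticks=5))    # prints None
print(parse_move("abc", sticks=5))  # prints None
print(parse_move("3", sticks=2))    # prints None
```

Inside the game loop, the caller would simply ask again whenever the helper returns `None`.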


In [None]:
play()

Sticks left: 21
Your move (1–3): 4
AI takes 4 stick(s).
Sticks left: 13
Your move (1–3): 4
AI takes 4 stick(s).
Sticks left: 5
Your move (1–3): 1
AI takes 4 stick(s).
AI took the last stick. You win!


## 🎉 Summary
You just trained an agent to play a game using trial-and-error. That’s the magic of Reinforcement Learning!