# ♟️ Chess AI: M1 Project
### Artificial Intelligence for Chess: Openings + Minimax + Q-Learning

---

**Authors:** [Your names]  
**Supervisor:** [Supervisor's name]  
**University:** [University]  
**Year:** 2024-2025

---

## Notebook Outline

1. [Installation & Imports](#1)
2. [Chess Rules Handling (python-chess)](#2)
3. [Opening Book](#3)
4. [Position Evaluator](#4)
5. [Minimax + Alpha-Beta](#5)
6. [Q-Learning & Self-Play](#6)
7. [Hybrid Agent](#7)
8. [Evaluation & Benchmarks](#8)

<a id='1'></a>
## 1. Installation & Imports

In [None]:
# Install dependencies (%pip installs into the running kernel's environment)
%pip install python-chess matplotlib numpy ipython

In [None]:
import sys
sys.path.insert(0, '..')

import chess
import chess.svg
import chess.pgn
import matplotlib.pyplot as plt
import numpy as np
import time
import json
from IPython.display import SVG, display, HTML

# Project modules
from src.engine.board import ChessBoard
from src.engine.evaluator import Evaluator
from src.engine.minimax import MinimaxAgent
from src.opening.opening_book import OpeningBook
from src.rl.q_learning import QLearningAgent
from src.agent import ChessAI
from src.utils.visualization import (
    render_board, plot_eval_curve,
    plot_training_progress, plot_piece_heatmap
)

print('✅ All modules loaded successfully!')
print(f'python-chess version : {chess.__version__}')

<a id='2'></a>
## 2. Chess Rules Handling with `python-chess`

The `python-chess` library fully handles:
- ✅ Legal moves (all pieces)
- ✅ Check, checkmate, stalemate, castling, en passant
- ✅ FEN and PGN notation
- ✅ Zobrist hashing for the transposition table


In [None]:
# Create a board in the initial position
cb = ChessBoard()
print('Initial position (FEN):')
print(cb.to_fen())
print()
print(f'Number of legal moves: {len(cb.get_legal_moves())}')
print(f'Legal moves (UCI): {cb.get_legal_moves_uci()[:5]}...')

In [None]:
# Display the board as SVG
display(render_board(cb.board, size=350))

In [None]:
# Test rule handling
# Ruy Lopez example (first 3 moves)
cb = ChessBoard()
moves = ['e2e4', 'e7e5', 'g1f3', 'b8c6', 'f1b5']
for uci in moves:
    success = cb.push_uci(uci)
    print(f'Move {uci}: {"✅" if success else "❌"}')

print('\nPosition after 1.e4 e5 2.Nf3 Nc6 3.Bb5 (Ruy Lopez):')
display(render_board(cb.board, size=350, last_move=chess.Move.from_uci('f1b5')))

In [None]:
# Test: position that is already checkmate (Scholar's Mate, after 4.Qxf7#)
scholars_mate_fen = 'r1bqkb1r/pppp1Qpp/2n2n2/4p3/2B1P3/8/PPPP1PPP/RNB1K1NR b KQkq - 0 4'
cb_mate = ChessBoard(fen=scholars_mate_fen)
print(f'Checkmate: {cb_mate.is_checkmate()}')
print(f'Result: {cb_mate.get_result()}')
display(render_board(cb_mate.board, size=350))

<a id='3'></a>
## 3. Opening Book

The agent first consults a database of classical openings before running any search.

**Openings included:** Ruy Lopez · Sicilian · French · Queen's Gambit · Italian
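
A book lookup of this kind can be sketched as a dictionary keyed by the moves played so far. This is only an illustration: `TINY_BOOK`, `book_move` and the weights are hypothetical and do not reflect the project's `OpeningBook` implementation.

```python
import random

# Hypothetical miniature book: move history (UCI) -> candidate replies with weights.
TINY_BOOK = {
    (): [("e2e4", 3), ("d2d4", 2)],
    ("e2e4",): [("e7e5", 2), ("c7c5", 2), ("e7e6", 1)],
    ("e2e4", "e7e5"): [("g1f3", 1)],
    ("e2e4", "e7e5", "g1f3"): [("b8c6", 1)],
    ("e2e4", "e7e5", "g1f3", "b8c6"): [("f1b5", 1), ("f1c4", 1)],
}

def book_move(history, rng=random):
    """Return a weighted-random book reply, or None when out of book."""
    candidates = TINY_BOOK.get(tuple(history))
    if not candidates:
        return None
    moves, weights = zip(*candidates)
    return rng.choices(moves, weights=weights, k=1)[0]

# Follow the book from the starting position until it runs out.
history = []
while (move := book_move(history)) is not None:
    history.append(move)
print(history)
```

Returning `None` when the line is unknown lets the caller fall back to search, which is the behaviour the agent relies on once the game leaves known theory.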

In [None]:
# Initialize the opening book
book = OpeningBook(random_weight=True, max_opening_plies=20)

# Test on the initial position
board = chess.Board()
print('=== Opening book test ===')
print(f'Position: {board.fen()[:30]}...')

# Play 5 moves from the book
last_move = None  # last move actually played, kept for highlighting
for i in range(5):
    move = book.get_move(board)
    if move:
        opening_name = book.get_opening_name(board)
        print(f'  Move {i+1}: {move.uci()} - {opening_name}')
        board.push(move)
        last_move = move
    else:
        print(f'  Move {i+1}: out of book')
        break

print('\nFinal position:')
display(render_board(board, size=350, last_move=last_move))

In [None]:
# Identify the opening after a few moves
test_positions = [
    (['e2e4', 'c7c5'], 'Sicilian?'),
    (['e2e4', 'e7e6'], 'French?'),
    (['d2d4', 'd7d5', 'c2c4'], "Queen's Gambit?"),
    (['e2e4', 'e7e5', 'g1f3', 'b8c6', 'f1b5'], 'Ruy Lopez?'),
]

for moves, expected in test_positions:
    b = chess.Board()
    for m in moves:
        b.push(chess.Move.from_uci(m))
    name = book.get_opening_name(b)
    print(f'  {expected:<25} → {name}')

<a id='4'></a>
## 4. Position Evaluator

The heuristic evaluation function combines:
- **Material**: piece values (Pawn=100, Knight=320, ...)
- **Piece-square tables** (PST): bonuses/penalties depending on the square
- **Mobility**: bonus for available moves
- **Center control**: bonus for central squares
- **King safety**: penalty when the king is exposed
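
The material term, the first item above, can be sketched by counting pieces directly in the FEN string. This is a minimal sketch, not the project's `Evaluator`: only the pawn and knight values come from this notebook, the bishop, rook and queen values are assumed (commonly used centipawn values), and `material_balance` is a hypothetical helper.

```python
# Pawn and knight values come from the notebook; the other values are
# assumptions (commonly used centipawn values), not read from Evaluator.
PIECE_VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900, "k": 0}

def material_balance(fen: str) -> int:
    """Material score in centipawns from White's point of view."""
    placement = fen.split()[0]  # first FEN field: piece placement
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' rank separators
        # Uppercase letters are White pieces, lowercase are Black pieces.
        score += value if ch.isupper() else -value
    return score

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
no_g1_knight = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKB1R w KQkq - 0 1"
print(material_balance(start))         # 0
print(material_balance(no_g1_knight))  # -320
```

A full evaluator would add the positional terms on top of this balance, so its output for the same positions will generally differ from the pure material score.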

In [None]:
evaluator = Evaluator()

# Evaluate the initial position
board = chess.Board()
score = evaluator.evaluate(board)
print(f'Initial position: {score:.1f} centipawns')

# After 1.e4
board.push(chess.Move.from_uci('e2e4'))
score = evaluator.evaluate(board)
print(f'After 1.e4: {score:.1f} centipawns')

# Material deficit: White is missing the g1 knight
board_advantage = chess.Board('rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKB1R w KQkq - 0 1')
score = evaluator.evaluate(board_advantage)
print(f'White down a knight: {score:.1f} centipawns')

In [None]:
# Evaluation curve over a sample game
board = chess.Board()
sample_moves = [
    'e2e4','e7e5','g1f3','b8c6','f1b5','a7a6','b5a4','g8f6',
    'e1g1','f8e7','f1e1','b7b5','a4b3','d7d6','c2c3','e8g8',
]

eval_scores = []
move_labels = []

for uci in sample_moves:
    move = chess.Move.from_uci(uci)
    board.push(move)
    eval_scores.append(evaluator.evaluate(board))
    move_labels.append(uci)

fig = plot_eval_curve(eval_scores, move_labels,
                      title="Evaluation curve: Ruy Lopez (Morphy Defence)")
plt.show()

<a id='5'></a>
## 5. Minimax + Alpha-Beta Pruning

### How the algorithm works

```
                    Root node (White, MAX)
                   /           |          \
              e2e4           d2d4         c2c4
             (MIN)          (MIN)        (MIN)
            /    \         /    \
          e7e5  c7c5    d7d5   g8f6
          ...   ...    ...    ...
```

Alpha-beta pruning skips branches that cannot change the value chosen at the root, so the search returns the same result as plain minimax while visiting fewer nodes.

In [None]:
# Test Minimax
minimax = MinimaxAgent(depth=3, time_limit=10.0)
board = chess.Board()

# Position after 1.e4 e5
board.push(chess.Move.from_uci('e2e4'))
board.push(chess.Move.from_uci('e7e5'))

print('Analyzed position:')
display(render_board(board, size=300))

print('\n[Minimax depth 3, White to move]')
start = time.perf_counter()  # monotonic clock, better suited to timing
best_move = minimax.choose_move(board)
elapsed = time.perf_counter() - start

print(f'Best move: {best_move.uci()}')
print(f'Nodes visited: {minimax.nodes_visited:,}')
print(f'Time: {elapsed:.3f}s')

display(render_board(board, size=300, 
                     last_move=best_move,
                     arrows=[(best_move.from_square, best_move.to_square, '#00ff00')]))

In [None]:
# Depth comparison (speed vs. quality)
depths = [1, 2, 3, 4]
results = []

board = chess.Board()
for _ in range(4):  # Position after 4 plies, always playing the first legal move
    board.push(next(iter(board.legal_moves)))

print('⏱️ Minimax benchmark by depth:')
print(f'{"Depth":>12} | {"Nodes":>10} | {"Time":>8} | {"Move":>6}')
print('-' * 50)

for d in depths:
    agent = MinimaxAgent(depth=d)
    start = time.perf_counter()
    move = agent.choose_move(board)
    elapsed = time.perf_counter() - start
    results.append({'depth': d, 'nodes': agent.nodes_visited, 'time': elapsed})
    print(f'{d:>12} | {agent.nodes_visited:>10,} | {elapsed:>7.3f}s | {move.uci()}')

In [None]:
# Visualize search cost (with alpha-beta) as a function of depth
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

depths_list = [r['depth'] for r in results]
nodes_list = [r['nodes'] for r in results]
times_list = [r['time'] for r in results]

ax1.bar(depths_list, nodes_list, color='#2196F3', alpha=0.8)
ax1.set_xlabel('Depth')
ax1.set_ylabel('Nodes visited')
ax1.set_title('Nodes explored by depth')
ax1.set_xticks(depths_list)

ax2.bar(depths_list, times_list, color='#FF5722', alpha=0.8)
ax2.set_xlabel('Depth')
ax2.set_ylabel('Time (seconds)')
ax2.set_title('Computation time by depth')
ax2.set_xticks(depths_list)

plt.tight_layout()
plt.show()

<a id='6'></a>
## 6. Q-Learning & Self-Play

### Formalization

| Component | Definition |
|-----------|------------|
| **State s** | Hash of the position's FEN |
| **Action a** | UCI move (e.g. `e2e4`) |
| **Reward r** | +1 win · -1 loss · 0 draw · ±0.1 capture |
| **Policy π** | ε-greedy (exploration vs. exploitation) |

**Update rule:**
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
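
As a worked instance of this update, the sketch below applies it once to a dictionary-backed table. The function `q_update` and the state keys `"s0"` and `"s1"` are placeholders for illustration, not the `QLearningAgent` API; the values of α and γ are the ones used in this notebook.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.3, gamma=0.95):
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    # max over a' of Q(s', a'); 0 when s' is terminal (no legal actions)
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)       # unseen (state, action) pairs start at 0
q[("s1", "e7e5")] = 0.5      # pretend this value was learned earlier
new_value = q_update(q, "s0", "e2e4", reward=0.1,
                     next_state="s1", next_actions=["e7e5", "c7c5"])
print(round(new_value, 4))   # 0.1725
```

Here the target is 0.1 + 0.95 × 0.5 = 0.575, and moving 30% of the way from 0 towards it gives 0.1725.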

In [None]:
# Initialize the Q-Learning agent
rl_agent = QLearningAgent(
    alpha=0.3,            # Learning rate
    gamma=0.95,           # Discount factor
    epsilon=1.0,          # Initial exploration: 100%
    epsilon_decay=0.995,  # Decay per episode
    epsilon_min=0.05,     # Minimum exploration: 5%
)

print('Q-Learning agent created:')
print(f'  α (learning rate)  : {rl_agent.alpha}')
print(f'  γ (discount)       : {rl_agent.gamma}')
print(f'  ε initial          : {rl_agent.epsilon}')
print(f'  ε min              : {rl_agent.epsilon_min}')
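
To get a feel for these exploration settings, the sketch below computes ε after a given number of episodes. It assumes the decay is multiplicative per episode with a floor, i.e. ε_k = max(ε_min, ε_0 · decay^k); that schedule is an assumption about `QLearningAgent`, and `epsilon_after` is a hypothetical helper.

```python
import math

epsilon0, decay, eps_min = 1.0, 0.995, 0.05

def epsilon_after(episodes: int) -> float:
    """Exploration rate after `episodes` episodes (assumed schedule)."""
    return max(eps_min, epsilon0 * decay ** episodes)

# Smallest episode count at which the floor eps_min is reached
episodes_to_floor = math.ceil(math.log(eps_min / epsilon0) / math.log(decay))

print(round(epsilon_after(500), 3))  # 0.082
print(episodes_to_floor)             # 598
```

Under this assumption, a 500-episode run ends with about 8% random moves and never quite reaches the 5% floor, which is worth keeping in mind when reading the training curves.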

In [None]:
# Training by self-play
N_EPISODES = 500  # Increase to 2000-5000 for better results

# Track metrics for visualization
stats_history = []

print(f'🎯 Starting self-play: {N_EPISODES} episodes')
print('─' * 60)

VERBOSE_EVERY = 50
for ep in range(N_EPISODES):
    result, _ = rl_agent.self_play_episode(max_moves=150)

    stats = rl_agent.export_stats()
    stats_history.append(stats.copy())

    if (ep + 1) % VERBOSE_EVERY == 0:
        total = stats['wins'] + stats['draws'] + stats['losses']
        wr = stats['wins'] / total * 100 if total else 0
        print(f'Ep {ep+1:5d} | ε={stats["epsilon"]:.3f} | '
              f'W:{stats["wins"]} D:{stats["draws"]} L:{stats["losses"]} | '
              f'WR={wr:.1f}% | Q-states: {stats["q_table_size"]:,}')

print('─' * 60)
print(f'✅ Training finished. {len(rl_agent.q_table):,} Q-states learned.')

In [None]:
# Visualize training progress
fig = plot_training_progress(stats_history, window=30)
plt.show()

In [None]:
# Save the model
rl_agent.save('../data/q_table.pkl')
print('Q-table saved!')

<a id='7'></a>
## 7. Hybrid Agent (Openings + RL + Minimax)

The hybrid agent combines the three components:

```
Position → [Openings] → known move?
                ↓ no
         [Q-Learning] → significant Q-value?
                ↓ no
           [Minimax]  → best computed move
```
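
The cascade in the diagram can be sketched as a function that tries each source in turn. This is not the `ChessAI` implementation: the callables, the `q_threshold` value and the reading of "significant" as "at least the threshold" are assumptions, while the source labels match the ones used in the decision log analysed later in this notebook.

```python
from typing import Callable, Optional, Tuple

def choose_move(position: str,
                book: Callable[[str], Optional[str]],
                q_policy: Callable[[str], Tuple[Optional[str], float]],
                search: Callable[[str], str],
                q_threshold: float = 0.2) -> Tuple[str, str]:
    """Return (move, source), trying book, then Q-table, then search."""
    move = book(position)
    if move is not None:
        return move, "opening_book"
    move, q_value = q_policy(position)
    # Assumed meaning of "significant": the best Q-value clears a threshold.
    if move is not None and q_value >= q_threshold:
        return move, "q_learning"
    return search(position), "minimax"

# Stub components standing in for the real book, Q-table and search.
book = {"start": "e2e4"}.get
q_policy = lambda pos: ("g1f3", 0.6) if pos == "known" else (None, 0.0)
search = lambda pos: "d2d4"

for pos in ("start", "known", "unknown"):
    print(pos, choose_move(pos, book, q_policy, search))
```

Returning the source alongside the move is what makes it possible to count, after a game, how often each component actually decided.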

In [None]:
# Create the hybrid agent
ai_white = ChessAI(
    mode='hybrid',
    minimax_depth=3,
    minimax_time_limit=5.0,
    q_table_path='../data/q_table.pkl',
    color=chess.WHITE,
)

ai_black = ChessAI(
    mode='minimax',  # Pure Minimax agent as the opponent
    minimax_depth=2,
    color=chess.BLACK,
)

print('Agents created:')
print(f'  {ai_white}')
print(f'  {ai_black}')

In [None]:
# Play a full game
print('♟️ Game: Hybrid Agent (White) vs Minimax (Black)')
print('=' * 55)

game_result = ai_white.play_game(opponent=ai_black, max_moves=80, verbose=True)

In [None]:
# Show the PGN of the game
print('PGN of the game:')
print(game_result['pgn'])

In [None]:
# Analyze where each decision came from
log = game_result['log']
sources = {}
for entry in log:
    src = entry['source']
    sources[src] = sources.get(src, 0) + 1

print('Decision sources:')
for src, count in sorted(sources.items()):
    pct = count / len(log) * 100
    bar = '█' * int(pct / 2)
    print(f'  {src:<15} : {count:3d} moves ({pct:5.1f}%) {bar}')

# Pie chart
fig, ax = plt.subplots(figsize=(6, 4))
colors = {'opening_book': '#4CAF50', 'q_learning': '#2196F3', 'minimax': '#FF5722'}
ax.pie(
    list(sources.values()),
    labels=list(sources.keys()),
    colors=[colors.get(k, '#9E9E9E') for k in sources.keys()],
    autopct='%1.1f%%',
    startangle=90
)
ax.set_title('Distribution of decision sources')
plt.show()

<a id='8'></a>
## 8. Evaluation & Benchmarks

### Evaluation metrics
- **Win rate** over N games against a fixed opponent
- **Search speed** (nodes explored per second)
- **Q-table size** (states covered)

In [None]:
# Benchmark: N Minimax vs Minimax games (different depths)
N_GAMES = 10  # Increase for a full benchmark

print(f'🏆 Internal tournament ({N_GAMES} games per configuration)')
print('=' * 60)

configs = [
    ('Minimax-d2 vs Minimax-d3', 2, 3),
    ('Minimax-d3 vs Minimax-d3', 3, 3),
]

tournament_results = []

for name, depth_w, depth_b in configs:
    wins, draws, losses = 0, 0, 0  # counted from White's point of view
    for g in range(N_GAMES):
        white = ChessAI(mode='minimax', minimax_depth=depth_w,
                        minimax_time_limit=3.0, color=chess.WHITE)
        black = ChessAI(mode='minimax', minimax_depth=depth_b,
                        minimax_time_limit=3.0, color=chess.BLACK)
        res = white.play_game(opponent=black, max_moves=60, verbose=False)
        if res['result'] == '1-0':
            wins += 1
        elif res['result'] == '0-1':
            losses += 1
        else:
            draws += 1

    wr = wins / N_GAMES * 100
    print(f'{name}: W={wins} D={draws} L={losses} | White WR={wr:.0f}%')
    tournament_results.append({'name': name, 'wins': wins, 'draws': draws, 'losses': losses})

In [None]:
# Summary of RL performance after training
stats = rl_agent.export_stats()
print('📊 Q-Learning statistics:')
for k, v in stats.items():
    print(f'  {k:<20} : {v}')

In [None]:
# Piece heatmap for the final position
final_board = chess.Board()
for uci in game_result['moves']:
    try:
        final_board.push(chess.Move.from_uci(uci))
    except ValueError:
        # Stop replaying at the first malformed UCI string
        break

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Show two heatmaps, one per color
plt.sca(ax1)
plot_piece_heatmap(final_board, chess.WHITE)
plt.sca(ax2)
plot_piece_heatmap(final_board, chess.BLACK)

plt.tight_layout()
plt.show()

## Conclusion

| Component | Technology | Role |
|-----------|-------------|------|
| Rules | `python-chess` | Legal moves, FEN/PGN |
| Openings | Built-in book | Opening phase |
| Middlegame | Minimax + α-β | Main decision-making |
| Learning | Q-Learning + Self-play | Continuous improvement |

**Possible improvements:**
- Integrate a real Polyglot book (e.g. `baron30.bin`)
- Replace Q-Learning with MCTS (Monte Carlo Tree Search)
- Train a neural network (DQN or a simplified AlphaZero)
- Add endgame handling (endgame tablebases)