# Chess Data Analysis - Raw Data Inspection

Let's examine the lichess_db_eval.jsonl.zst file to see what data we actually have.

In [1]:
import json
import zstandard as zstd
from pathlib import Path

# File path
eval_file = Path("lichess_db_eval.jsonl.zst")
print(f"File exists: {eval_file.exists()}")
print(f"File size: {eval_file.stat().st_size / 1024 / 1024:.2f} MB")

File exists: True
File size: 17214.86 MB


## Step 1: Read First 10 Raw Lines

Let's read the first 10 lines exactly as they are in the file (no processing yet).

In [2]:
import io

# Decompress and read the first 10 non-empty lines (streaming, so the
# compressed file is never loaded into memory all at once)
dctx = zstd.ZstdDecompressor()

raw_lines = []
with open(eval_file, "rb") as compressed:
    with dctx.stream_reader(compressed) as reader:
        # TextIOWrapper decodes UTF-8 incrementally, so a multi-byte character
        # split across decompressed chunks cannot raise UnicodeDecodeError
        # (decoding each raw chunk separately could)
        text_stream = io.TextIOWrapper(reader, encoding="utf-8")
        for line in text_stream:
            line = line.strip()
            if line:
                raw_lines.append(line)
                if len(raw_lines) >= 10:
                    break

# Show raw lines
for i, line in enumerate(raw_lines, 1):
    print(f"Line {i} (first 200 chars):")
    print(line[:200])
    print()

Line 1 (first 200 chars):
{"fen":"7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -","evals":[{"pvs":[{"cp":69,"line":"f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3"},{"cp":163,"line":"h8d8 d1e1 a6a5 a2a3 c6d7 e6e7 f7f6 e1f2 b7

Line 2 (first 200 chars):
{"fen":"8/4r3/2R2pk1/6pp/3P4/6P1/5K1P/8 b - -","evals":[{"pvs":[{"cp":0,"line":"e7a7 f2e3 a7a3 e3e4 a3a2 h2h4 g5h4 g3h4 a2h2 c6c1"},{"cp":0,"line":"e7b7 f2e3 b7b3 e3e4 b3b2 h2h4 g5h4 g3h4 b2h2 c6c1"}]

Line 3 (first 200 chars):
{"fen":"6k1/6p1/8/4K3/4NN2/8/8/8 w - -","evals":[{"pvs":[{"mate":15,"line":"e5e6 g8f8 e4d6 g7g5 f4h5 g5g4 h5g3 f8g8 e6e7 g8g7"},{"mate":18,"line":"e4d6 g8h7 e5f5 g7g5 f4h5 h7h6 h5g3 h6g7 f5e6 g7f8"},{

Line 4 (first 200 chars):
{"fen":"r1b2rk1/1p2bppp/p1nppn2/q7/2P1P3/N1N5/PP2BPPP/R1BQ1RK1 w - -","evals":[{"pvs":[{"cp":24,"line":"c1e3 f8d8 d1c1 h7h6 h2h3 f6d7 f1d1 d7c5 a3c2 e7f6"}],"knodes":261194,"depth":36},{"pvs":[{"cp":2

Line 5 (first 200 chars):
{"fen":"6k1/4Rppp/8/8/8/8/5PPP/6K1 w - -","evals":[{"pvs":[{"m
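The reading loop above can be packaged as a reusable generator. This is a minimal sketch that assumes the decompressed data has already been wrapped in a text stream. `iter_records` and its `limit` parameter are hypothetical names, not part of any library. The sketch uses only the standard library, so it can be tried on an in-memory stream.

```python
import io
import json
from itertools import islice
from typing import Iterator, Optional, TextIO


def iter_records(text_stream: TextIO, limit: Optional[int] = None) -> Iterator[dict]:
    """Yield parsed JSON objects from a JSON Lines text stream.

    Blank lines are skipped. `limit` caps how many records are yielded;
    None reads until the stream is exhausted.
    """
    non_blank = (line for line in text_stream if line.strip())
    for line in islice(non_blank, limit):
        yield json.loads(line)


# Usage on a small in-memory stream shaped like the eval records
sample_stream = io.StringIO(
    '{"fen": "8/8/8/8/8/8/8/K6k w - -", "evals": []}\n'
    "\n"
    '{"fen": "8/8/8/8/8/8/8/k6K b - -", "evals": []}\n'
)
parsed = list(iter_records(sample_stream, limit=10))
print(len(parsed))        # 2
print(parsed[1]["fen"])   # 8/8/8/8/8/8/8/k6K b - -
```

To use it on the real file, call the generator inside a `with open(eval_file, "rb") as fh:` block and pass it `io.TextIOWrapper(zstd.ZstdDecompressor().stream_reader(fh), encoding="utf-8")`. Because the generator is lazy, stopping after `limit` records means only a small amount of data beyond those records gets decompressed.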

## Step 2: Parse First Line as JSON

Let's see what structure the data actually has.

In [3]:
# Parse first line
first_record = json.loads(raw_lines[0])

# Show all keys
print("Keys in record:")
print(list(first_record.keys()))
print()

# Show full first record (pretty printed)
print("Full first record:")
print(json.dumps(first_record, indent=2))

Keys in record:
['fen', 'evals']

Full first record:
{
  "fen": "7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -",
  "evals": [
    {
      "pvs": [
        {
          "cp": 69,
          "line": "f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3"
        },
        {
          "cp": 163,
          "line": "h8d8 d1e1 a6a5 a2a3 c6d7 e6e7 f7f6 e1f2 b7b5 c4b3"
        },
        {
          "cp": 229,
          "line": "h8a8 d1e1 a6a5 e6h6 f7g7 h6h4 a8d8 c4d3 c6g2 d3f5"
        },
        {
          "cp": 231,
          "line": "h8f8 d1e1 b7b5 c4b3 a6a5 e6h6 f7g7 h6h4 f8e8 e1f2"
        },
        {
          "cp": 237,
          "line": "h8b8 d1e1 a6a5 e6h6 f7g7 h6h4 b8d8 c4d3 c6g2 d3f5"
        }
      ],
      "knodes": 4189972,
      "depth": 46
    }
  ]
}
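For tabular work, such as loading into pandas, it can help to flatten each record into one row per principal variation. This is a minimal sketch: `flatten_record` and the column names are illustrative choices, not part of the Lichess format. In the records shown here each PV carries either `cp` or `mate`, so the missing one is stored as `None`.

```python
from typing import Any, Dict, List


def flatten_record(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one row per PV: position, eval metadata, score, and first move."""
    rows = []
    for eval_idx, evaluation in enumerate(record["evals"]):
        for pv_idx, pv in enumerate(evaluation["pvs"]):
            moves = pv["line"].split()
            rows.append({
                "fen": record["fen"],
                "eval_idx": eval_idx,
                "depth": evaluation["depth"],
                "knodes": evaluation["knodes"],
                "pv_idx": pv_idx,              # 0 = engine's top choice
                "cp": pv.get("cp"),            # None when the PV has a mate score
                "mate": pv.get("mate"),        # None when the PV has a cp score
                "first_move": moves[0] if moves else None,
                "line_length": len(moves),
            })
    return rows


# The first record shown above, trimmed to its first two PVs
sample_record = {
    "fen": "7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -",
    "evals": [{
        "pvs": [
            {"cp": 69, "line": "f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3"},
            {"cp": 163, "line": "h8d8 d1e1 a6a5 a2a3 c6d7 e6e7 f7f6 e1f2 b7b5 c4b3"},
        ],
        "knodes": 4189972,
        "depth": 46,
    }],
}
for row in flatten_record(sample_record):
    print(row["pv_idx"], row["first_move"], row["cp"], row["mate"])
# 0 f7g7 69 None
# 1 h8d8 163 None
```

A list of such rows can be passed directly to `pandas.DataFrame(rows)`.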


## Step 3: Look at 5 More Examples

Let's see if all records have the same structure.

In [4]:
# Parse and show next 5 records
for i in range(1, 6):
    record = json.loads(raw_lines[i])
    print(f"Record {i+1}:")
    print(f"  Keys: {list(record.keys())}")
    
    # Show a sample of each key's value
    for key in record.keys():
        value = record[key]
        if isinstance(value, str):
            display = value[:50] + "..." if len(value) > 50 else value
        elif isinstance(value, (list, dict)):
            display = f"{type(value).__name__} with {len(value)} items"
        else:
            display = value
        print(f"  {key}: {display}")
    print()

Record 2:
  Keys: ['fen', 'evals']
  fen: 8/4r3/2R2pk1/6pp/3P4/6P1/5K1P/8 b - -
  evals: list with 2 items

Record 3:
  Keys: ['fen', 'evals']
  fen: 6k1/6p1/8/4K3/4NN2/8/8/8 w - -
  evals: list with 1 items

Record 4:
  Keys: ['fen', 'evals']
  fen: r1b2rk1/1p2bppp/p1nppn2/q7/2P1P3/N1N5/PP2BPPP/R1BQ...
  evals: list with 2 items

Record 5:
  Keys: ['fen', 'evals']
  fen: 6k1/4Rppp/8/8/8/8/5PPP/6K1 w - -
  evals: list with 4 items

Record 6:
  Keys: ['fen', 'evals']
  fen: 6k1/6p1/6N1/4K3/4N3/8/8/8 b - -
  evals: list with 2 items
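Five records are only a spot check. To test the "same structure" question on a larger sample, one option is to tally the sorted key set of every record and every PV. This is a minimal sketch: `key_signatures` is a hypothetical helper, and the sample lines are made-up miniature records.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def key_signatures(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count distinct key sets at the record level and at the PV level."""
    record_keys: Counter = Counter()
    pv_keys: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        record_keys[tuple(sorted(record))] += 1
        for evaluation in record.get("evals", []):
            for pv in evaluation.get("pvs", []):
                pv_keys[tuple(sorted(pv))] += 1
    return record_keys, pv_keys


sample_lines = [
    '{"fen": "a", "evals": [{"pvs": [{"cp": 10, "line": "e2e4"}], "knodes": 1, "depth": 5}]}',
    '{"fen": "b", "evals": [{"pvs": [{"mate": 2, "line": "d1h5"}], "knodes": 1, "depth": 9}]}',
]
record_keys, pv_keys = key_signatures(sample_lines)
print(record_keys)  # Counter({('evals', 'fen'): 2})
print(pv_keys)
```

Passing `raw_lines`, or a longer slice of the decompressed stream, instead of `sample_lines` would show whether any record deviates from `('evals', 'fen')`. It would also show how often PVs carry `mate` instead of `cp`.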



## Data Structure Summary

The `lichess_db_eval.jsonl.zst` file contains:

**Structure:**
- Each line is a JSON object with 2 keys: `fen` and `evals`
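- `fen`: Chess position in FEN notation. Only the first four FEN fields are stored (piece placement, side to move, castling rights, en passant square); the halfmove clock and fullmove number are omitted
- `evals`: List of one or more Stockfish evaluations of the position (record 1 has one, record 5 has four), which can differ in depth and number of principal variations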

**Each eval object contains:**
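- `pvs`: List of principal variations (the engine's top candidate lines, each with a different first move), ordered from best to worst for the side to move
  - `cp`: Centipawn score from White's point of view (positive = better for White; 100 cp ≈ one pawn)
  - `mate`: Mate in N moves, present instead of `cp` when a forced mate is found (positive = White mates, negative = Black mates)
  - `line`: UCI move sequence (space-separated), starting with the side to move
- `knodes`: Nodes searched, in thousands
- `depth`: Search depth in plies (half-moves)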
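The field list above can be written down as type hints plus a light consistency check. This is a sketch only. The `TypedDict` names and `check_record` are hypothetical. The two rules it enforces (a four-field FEN, and exactly one of `cp`/`mate` per PV) are inferred from the records inspected in this notebook, not taken from a published schema.

```python
from typing import List, TypedDict


class PV(TypedDict, total=False):
    cp: int      # centipawns, White's point of view; absent when a mate is reported
    mate: int    # moves to mate; positive = White mates, negative = Black mates
    line: str    # space-separated UCI moves


class Evaluation(TypedDict):
    pvs: List[PV]
    knodes: int  # thousands of nodes searched
    depth: int   # search depth in plies


class EvalRecord(TypedDict):
    fen: str
    evals: List[Evaluation]


def check_record(record: EvalRecord) -> List[str]:
    """Return a list of problems found in one record (empty list = looks valid)."""
    problems = []
    if len(record["fen"].split()) != 4:
        problems.append("fen does not have exactly 4 fields")
    for i, evaluation in enumerate(record["evals"]):
        for j, pv in enumerate(evaluation["pvs"]):
            if ("cp" in pv) == ("mate" in pv):
                problems.append(f"evals[{i}].pvs[{j}] should have exactly one of cp/mate")
            if not pv.get("line", "").split():
                problems.append(f"evals[{i}].pvs[{j}] has an empty line")
    return problems


# Record 5 trimmed to one eval, plus a deliberately broken copy
good = {"fen": "6k1/4Rppp/8/8/8/8/5PPP/6K1 w - -",
        "evals": [{"pvs": [{"mate": 1, "line": "e7e8"}], "knodes": 152, "depth": 245}]}
bad = {"fen": "6k1/4Rppp/8/8/8/8/5PPP/6K1 w - - 0 1",
       "evals": [{"pvs": [{"line": "e7e8"}], "knodes": 152, "depth": 245}]}
print(check_record(good))  # []
print(check_record(bad))
```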

**Key differences from puzzle data:**
- This is **position evaluations**, not puzzles
- No themes, ratings, or puzzle metadata
- Contains raw Stockfish analysis
- One or more evaluation runs per position
- Much larger than the puzzle file (about 17 GB compressed)

## Deep Analysis: Parse Position and Evaluate Moves

Let's actually create a board and analyze what these evaluations mean.

In [5]:
import chess

# Take the first record and analyze it deeply
record = first_record
print("=" * 80)
print("POSITION ANALYSIS")
print("=" * 80)

# Create board from FEN
fen = record["fen"]
board = chess.Board(fen)

print(f"\nFEN: {fen}")
print(f"\nBoard visualization:")
print(board)
print()

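# Extract info from FEN
print(f"Turn: {'White' if board.turn == chess.WHITE else 'Black'} to move")
# castling_rights is a bitmask; castling_xfen() gives the readable "KQkq"-style text
print(f"Castling rights: {board.castling_xfen() if board.castling_rights else 'None'}")
print(f"En passant: {chess.square_name(board.ep_square) if board.ep_square is not None else 'None'}")
# The Lichess FEN has only four fields, so these two counters are
# python-chess defaults (0 and 1), not values stored in the file
print(f"Halfmove clock: {board.halfmove_clock}")
print(f"Fullmove number: {board.fullmove_number}")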

# Get ALL legal moves
legal_moves = list(board.legal_moves)
print(f"\nTotal legal moves: {len(legal_moves)}")
print(f"Legal moves (UCI): {', '.join([move.uci() for move in legal_moves[:20]])}")
if len(legal_moves) > 20:
    print(f"  ... and {len(legal_moves) - 20} more")

POSITION ANALYSIS

FEN: 7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -

Board visualization:
. . . . . . . r
. p . . . k . .
p . b P R . . .
. . . . . p . .
. . B . . P . p
. . . . . . . .
P P . . . . P .
. . . K . . . .

Turn: Black to move
Castling rights: None
En passant: None
Halfmove clock: 0
Fullmove number: 1

Total legal moves: 25
Legal moves (UCI): h8g8, h8f8, h8e8, h8d8, h8c8, h8b8, h8a8, h8h7, h8h6, h8h5, f7g8, f7f8, f7g7, c6e8, c6d7, c6d5, c6b5, c6e4, c6a4, c6f3
  ... and 5 more
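The FEN strings in this file have only four space-separated fields. That is why the halfmove clock and fullmove number printed above come out as 0 and 1: they are defaults, not data. The stdlib-only sketch below splits the four fields. `parse_lichess_fen` and `to_full_fen` are hypothetical helpers. Padding with `0 1` to get a standard six-field FEN is an assumption about what downstream tools expect.

```python
from typing import NamedTuple


class LichessFen(NamedTuple):
    placement: str
    side_to_move: str   # "w" or "b"
    castling: str       # e.g. "KQkq" or "-"
    en_passant: str     # target square such as "e3", or "-"


def parse_lichess_fen(fen: str) -> LichessFen:
    """Split a four-field FEN as stored in the eval database."""
    fields = fen.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 FEN fields, got {len(fields)}: {fen!r}")
    return LichessFen(*fields)


def to_full_fen(fen: str, halfmove: int = 0, fullmove: int = 1) -> str:
    """Append (assumed) move counters to get a standard six-field FEN."""
    parse_lichess_fen(fen)  # validate before extending
    return f"{fen} {halfmove} {fullmove}"


parts = parse_lichess_fen("7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -")
print(parts.side_to_move, parts.castling, parts.en_passant)  # b - -
print(to_full_fen("7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -"))
```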


In [6]:
# Analyze the evaluations
print("\n" + "=" * 80)
print("STOCKFISH EVALUATIONS")
print("=" * 80)

for eval_idx, evaluation in enumerate(record["evals"], 1):
    print(f"\n--- Evaluation {eval_idx} ---")
    print(f"Depth: {evaluation['depth']}")
    print(f"Nodes searched: {evaluation['knodes']:,} K nodes ({evaluation['knodes'] * 1000:,} nodes)")
    print(f"Number of principal variations: {len(evaluation['pvs'])}")
    
    print(f"\nTop moves analyzed:")
    for pv_idx, pv in enumerate(evaluation['pvs'], 1):
        # Parse the line (UCI moves)
        moves = pv['line'].split()
        first_move = moves[0] if moves else "N/A"
        
        # Check if first move is legal
        is_legal = first_move in [m.uci() for m in legal_moves]
        
        # Get evaluation
        if 'cp' in pv:
            eval_str = f"cp={pv['cp']:+4d} ({pv['cp']/100:+.2f} pawns)"
        elif 'mate' in pv:
            eval_str = f"mate in {pv['mate']}"
        else:
            eval_str = "Unknown"
        
        # Show move in SAN notation if legal
        if is_legal:
            san_move = board.san(chess.Move.from_uci(first_move))
        else:
            san_move = "ILLEGAL"
        
        print(f"  {pv_idx}. {first_move} ({san_move}) - {eval_str} {'✓' if is_legal else '✗'}")
        print(f"     Line: {' '.join(moves[:10])}")
        if len(moves) > 10:
            print(f"           ... +{len(moves)-10} more moves")


STOCKFISH EVALUATIONS

--- Evaluation 1 ---
Depth: 46
Nodes searched: 4,189,972 K nodes (4,189,972,000 nodes)
Number of principal variations: 5

Top moves analyzed:
  1. f7g7 (Kg7) - cp= +69 (+0.69 pawns) ✓
     Line: f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3
  2. h8d8 (Rd8) - cp=+163 (+1.63 pawns) ✓
     Line: h8d8 d1e1 a6a5 a2a3 c6d7 e6e7 f7f6 e1f2 b7b5 c4b3
  3. h8a8 (Ra8) - cp=+229 (+2.29 pawns) ✓
     Line: h8a8 d1e1 a6a5 e6h6 f7g7 h6h4 a8d8 c4d3 c6g2 d3f5
  4. h8f8 (Rf8) - cp=+231 (+2.31 pawns) ✓
     Line: h8f8 d1e1 b7b5 c4b3 a6a5 e6h6 f7g7 h6h4 f8e8 e1f2
  5. h8b8 (Rb8) - cp=+237 (+2.37 pawns) ✓
     Line: h8b8 d1e1 a6a5 e6h6 f7g7 h6h4 b8d8 c4d3 c6g2 d3f5
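Scores are reported from White's point of view, so ranking moves for Black means flipping the sign. The sketch below converts each PV to the side to move's perspective and computes how much each alternative gives up relative to the top PV. `pov_score` and `cp_losses` are hypothetical names. Mapping a mate to ±(100000 − N), so that mates outrank any centipawn score and faster mates rank higher, is an assumption made here for sorting, not a Lichess convention.

```python
from typing import Dict, List

# Assumed sentinel (not a Lichess convention): any mate outranks any cp score,
# and a shorter mate ranks above a longer one.
MATE_BASE = 100_000


def pov_score(pv: Dict, white_to_move: bool) -> int:
    """Score of one PV from the side to move's point of view (higher = better)."""
    if "mate" in pv:
        n = pv["mate"]  # > 0: White mates in n; < 0: Black mates in -n
        white_score = MATE_BASE - n if n > 0 else -MATE_BASE - n
    else:
        white_score = pv["cp"]  # centipawns, White's point of view
    return white_score if white_to_move else -white_score


def cp_losses(pvs: List[Dict], white_to_move: bool) -> List[int]:
    """How much each PV gives up relative to the best one (0 for the best)."""
    scores = [pov_score(pv, white_to_move) for pv in pvs]
    best = max(scores)
    return [best - s for s in scores]


# cp values of the five PVs in the first record (Black to move)
first_record_pvs = [{"cp": 69}, {"cp": 163}, {"cp": 229}, {"cp": 231}, {"cp": 237}]
print(cp_losses(first_record_pvs, white_to_move=False))  # [0, 94, 160, 162, 168]
```

From Black's point of view, Rd8 therefore gives up 94 centipawns relative to Kg7. When a mate score is involved, the differences are in the sentinel's arbitrary units rather than centipawns.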


In [8]:
# Let's actually play out the top line and see the resulting positions
print("\n" + "=" * 80)
print("PLAYING OUT BEST LINE")
print("=" * 80)

best_pv = record["evals"][0]["pvs"][0]
best_line = best_pv['line'].split()

if 'cp' in best_pv:
    eval_display = f"{best_pv['cp']} cp ({best_pv['cp']/100:+.2f} pawns)"
else:
    eval_display = f"mate in {best_pv['mate']}"

print(f"Best evaluation: {eval_display}")
print(f"Moves to play: {' '.join(best_line[:10])}\n")

# Create a copy of the board to play moves
temp_board = board.copy()

for move_idx, move_uci in enumerate(best_line[:10], 1):
    try:
        move = chess.Move.from_uci(move_uci)
        san = temp_board.san(move)
        temp_board.push(move)
        
        print(f"Move {move_idx}: {move_uci} ({san})")
        print(f"  Turn: {'White' if temp_board.turn == chess.WHITE else 'Black'}")
        print(f"  FEN: {temp_board.fen()}")
        print()
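    except (ValueError, AssertionError) as exc:
        # Malformed UCI string, or a move that is illegal in this position
        print(f"Move {move_idx}: {move_uci} - ERROR: Invalid move ({exc})")
        break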


PLAYING OUT BEST LINE
Best evaluation: 69 cp (+0.69 pawns)
Moves to play: f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3

Move 1: f7g7 (Kg7)
  Turn: White
  FEN: 7r/1p4k1/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 w - - 1 2

Move 2: e6e2 (Re2)
  Turn: Black
  FEN: 7r/1p4k1/p1bP4/5p2/2B2P1p/8/PP2R1P1/3K4 b - - 2 2

Move 3: h8d8 (Rd8)
  Turn: White
  FEN: 3r4/1p4k1/p1bP4/5p2/2B2P1p/8/PP2R1P1/3K4 w - - 3 3

Move 4: e2d2 (Rd2)
  Turn: Black
  FEN: 3r4/1p4k1/p1bP4/5p2/2B2P1p/8/PP1R2P1/3K4 b - - 4 3

Move 5: b7b5 (b5)
  Turn: White
  FEN: 3r4/6k1/p1bP4/1p3p2/2B2P1p/8/PP1R2P1/3K4 w - - 0 4

Move 6: c4b3 (Bb3)
  Turn: Black
  FEN: 3r4/6k1/p1bP4/1p3p2/5P1p/1B6/PP1R2P1/3K4 b - - 1 4

Move 7: g7f6 (Kf6)
  Turn: White
  FEN: 3r4/8/p1bP1k2/1p3p2/5P1p/1B6/PP1R2P1/3K4 w - - 2 5

Move 8: d1e1 (Ke1)
  Turn: Black
  FEN: 3r4/8/p1bP1k2/1p3p2/5P1p/1B6/PP1R2P1/4K3 b - - 3 5

Move 9: a6a5 (a5)
  Turn: White
  FEN: 3r4/8/2bP1k2/pp3p2/5P1p/1B6/PP1R2P1/4K3 w - - 0 6

Move 10: a2a3 (a3)
  Turn: Black
  FEN: 3r4/8/2bP1k2

## Analyze Multiple Records

Now let's check if evaluations vary across different positions.

In [9]:
# Analyze first 5 records to see patterns
for record_idx in range(5):
    record = json.loads(raw_lines[record_idx])
    board = chess.Board(record["fen"])
    
    print(f"\n{'='*80}")
    print(f"RECORD {record_idx + 1}")
    print(f"{'='*80}")
    print(f"Turn: {'White' if board.turn == chess.WHITE else 'Black'}")
    print(f"Legal moves: {len(list(board.legal_moves))}")
    print(f"Number of evaluations: {len(record['evals'])}")
    
    # Look at first evaluation
    eval1 = record["evals"][0]
    print(f"\nFirst evaluation:")
    print(f"  Depth: {eval1['depth']}, Nodes: {eval1['knodes']:,}K")
    print(f"  PVs analyzed: {len(eval1['pvs'])}")
    
    # Check best move
    best_pv = eval1['pvs'][0]
    best_move_uci = best_pv['line'].split()[0]
    
    if 'mate' in best_pv:
        eval_str = f"Mate in {best_pv['mate']}"
    else:
        eval_str = f"{best_pv['cp']/100:+.2f} pawns"
    
    # Verify it's legal
    legal_moves_uci = [m.uci() for m in board.legal_moves]
    is_legal = best_move_uci in legal_moves_uci
    
    if is_legal:
        san = board.san(chess.Move.from_uci(best_move_uci))
        print(f"  Best move: {best_move_uci} ({san}) - {eval_str} ✓")
    else:
        print(f"  Best move: {best_move_uci} - {eval_str} ✗ ILLEGAL!")
    
    # Check if multiple evals differ
    if len(record['evals']) > 1:
        depths = [e['depth'] for e in record['evals']]
        pv_counts = [len(e['pvs']) for e in record['evals']]
        print(f"  Multiple evals: depths={depths}, pv_counts={pv_counts}")


RECORD 1
Turn: Black
Legal moves: 25
Number of evaluations: 1

First evaluation:
  Depth: 46, Nodes: 4,189,972K
  PVs analyzed: 5
  Best move: f7g7 (Kg7) - +0.69 pawns ✓

RECORD 2
Turn: Black
Legal moves: 21
Number of evaluations: 2

First evaluation:
  Depth: 58, Nodes: 491,568K
  PVs analyzed: 2
  Best move: e7a7 (Ra7) - +0.00 pawns ✓
  Multiple evals: depths=[58, 57], pv_counts=[2, 5]

RECORD 3
Turn: White
Legal moves: 21
Number of evaluations: 1

First evaluation:
  Depth: 95, Nodes: 589,893K
  PVs analyzed: 5
  Best move: e5e6 (Ke6) - Mate in 15 ✓

RECORD 4
Turn: White
Legal moves: 38
Number of evaluations: 2

First evaluation:
  Depth: 36, Nodes: 261,194K
  PVs analyzed: 1
  Best move: c1e3 (Be3) - +0.24 pawns ✓
  Multiple evals: depths=[36, 28], pv_counts=[1, 5]

RECORD 5
Turn: White
Legal moves: 20
Number of evaluations: 4

First evaluation:
  Depth: 245, Nodes: 152K
  PVs analyzed: 1
  Best move: e7e8 (Re8#) - Mate in 1 ✓
  Multiple evals: depths=[245, 85, 46, 29], 
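When a position has several evaluations, an analysis has to decide which one to use. The trade-off is visible in record 4 above: the depth-36 evaluation has one PV, while the depth-28 one has five. One simple policy, assumed here rather than prescribed by the data, is to take the deepest evaluation that has at least a required number of PVs. `pick_eval` is a hypothetical helper, and the example values are illustrative, not taken from the file.

```python
from typing import Dict, List, Optional


def pick_eval(evals: List[Dict], min_pvs: int = 1) -> Optional[Dict]:
    """Return the deepest evaluation with at least `min_pvs` PVs, or None.

    Ties on depth are broken by preferring more PVs, then more knodes.
    """
    candidates = [e for e in evals if len(e["pvs"]) >= min_pvs]
    if not candidates:
        return None
    return max(candidates, key=lambda e: (e["depth"], len(e["pvs"]), e["knodes"]))


example_evals = [  # illustrative values, not from the file
    {"depth": 36, "knodes": 200, "pvs": [{"cp": 24, "line": "c1e3"}]},
    {"depth": 28, "knodes": 100, "pvs": [{"cp": 20, "line": "c1e3"}] * 5},
]
print(pick_eval(example_evals)["depth"])             # 36
print(pick_eval(example_evals, min_pvs=3)["depth"])  # 28
print(pick_eval(example_evals, min_pvs=10))          # None
```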

## SIMPLE EXPLANATION: What is this data?

Let me break down the JSON structure with a concrete example.

In [10]:
# Let's take ONE record and explain every single field
example = json.loads(raw_lines[0])

print("=" * 80)
print("UNDERSTANDING THE JSON STRUCTURE")
print("=" * 80)

print("\n1️⃣ THE POSITION (fen)")
print("-" * 80)
print(f"Value: {example['fen']}")
print("\nWhat does this mean?")
print("  - This is a chess position in FEN (Forsyth-Edwards Notation)")
print("  - It describes: where pieces are, whose turn, castling rights, en passant")
print("  - Lichess stores only these four fields (no move counters)")

# Show the board
b = chess.Board(example['fen'])
print("\nThe actual board:")
print(b)
print(f"\n  Turn: {'White' if b.turn else 'Black'} to move")
print(f"  Legal moves available: {len(list(b.legal_moves))}")

print("\n\n2️⃣ THE EVALUATIONS (evals)")
print("-" * 80)
print(f"Number of evaluations: {len(example['evals'])}")
print("\nWhat does this mean?")
print("  - Stockfish analyzed this position one or more times")
print("  - Each evaluation can differ in depth and number of lines (PVs)")

print("\n\n3️⃣ FIRST EVALUATION DETAILS")
print("-" * 80)
eval1 = example['evals'][0]

print(f"\n📊 depth: {eval1['depth']}")
print(f"   → Stockfish searched to a depth of {eval1['depth']} plies (half-moves)")

print(f"\n📊 knodes: {eval1['knodes']:,}")
print(f"   → Stockfish searched {eval1['knodes']:,} thousand nodes (positions in its search tree)")
print(f"   → That's {eval1['knodes'] * 1000:,} nodes in total!")

print(f"\n📊 pvs: {len(eval1['pvs'])} variations")
print(f"   → Stockfish reported {len(eval1['pvs'])} principal variations, each with a different first move")
print("   → They're ordered from best to worst for the side to move")

UNDERSTANDING THE JSON STRUCTURE

1️⃣ THE POSITION (fen)
--------------------------------------------------------------------------------
Value: 7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -

What does this mean?
  - This is a chess position in FEN (Forsyth-Edwards Notation)
  - It describes: where pieces are, whose turn, castling rights, en passant
  - Lichess stores only these four fields (no move counters)

The actual board:
. . . . . . . r
. p . . . k . .
p . b P R . . .
. . . . . p . .
. . B . . P . p
. . . . . . . .
P P . . . . P .
. . . K . . . .

  Turn: Black to move
  Legal moves available: 25


2️⃣ THE EVALUATIONS (evals)
--------------------------------------------------------------------------------
Number of evaluations: 1

What does this mean?
  - Stockfish analyzed this position one or more times
  - Each evaluation can differ in depth and number of lines (PVs)


3️⃣ FIRST EVALUATION DETAILS
--------------------------------------------------------------------------------

📊 depth: 46
   → Stockfish searched to a depth of 46 

In [11]:
# Now let's look at each PV (principal variation)
print("\n\n4️⃣ PRINCIPAL VARIATIONS (pvs)")
print("=" * 80)

for i, pv in enumerate(eval1['pvs'], 1):
    print(f"\n--- Variation #{i} ---")
    
    # Field 1: cp (centipawns) or mate; both are from White's point of view
    if 'cp' in pv:
        print(f"📊 cp: {pv['cp']}")
        print(f"   → Centipawn score = {pv['cp']}")
        print(f"   → In pawn units: {pv['cp']/100:+.2f}")
        if pv['cp'] > 0:
            print(f"   → White is better by {pv['cp']/100:.2f} pawns")
        elif pv['cp'] < 0:
            print(f"   → Black is better by {abs(pv['cp'])/100:.2f} pawns")
        else:
            print("   → Position is equal")
    
    if 'mate' in pv:
        print(f"📊 mate: {pv['mate']}")
        if pv['mate'] > 0:
            print(f"   → White can checkmate in {pv['mate']} moves")
        else:
            print(f"   → Black can checkmate in {abs(pv['mate'])} moves")
    
    # Field 2: line (the actual moves); add "..." only if it was cut off
    print(f"\n📊 line: {pv['line'][:80]}{'...' if len(pv['line']) > 80 else ''}")
    moves = pv['line'].split()
    print("   → This is the sequence of moves Stockfish recommends")
    print(f"   → Total moves in sequence: {len(moves)}")
    print(f"   → First 5 moves: {' '.join(moves[:5])}")
    
    # Show the first move in human-readable form
    first_move_uci = moves[0]
    first_move = chess.Move.from_uci(first_move_uci)
    first_move_san = b.san(first_move)
    print(f"   → First move in UCI: {first_move_uci}")
    print(f"   → First move in chess notation: {first_move_san}")
    
    if i >= 2:  # Only show first 2 variations
        remaining = len(eval1['pvs']) - 2
        if remaining:
            print(f"\n... and {remaining} more variations")
        break



4️⃣ PRINCIPAL VARIATIONS (pvs)

--- Variation #1 ---
📊 cp: 69
   → Centipawn score = 69
   → In pawn units: +0.69
   → White is better by 0.69 pawns

📊 line: f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3
   → This is the sequence of moves Stockfish recommends
   → Total moves in sequence: 10
   → First 5 moves: f7g7 e6e2 h8d8 e2d2 b7b5
   → First move in UCI: f7g7
   → First move in chess notation: Kg7

--- Variation #2 ---
📊 cp: 163
   → Centipawn score = 163
   → In pawn units: +1.63
   → White is better by 1.63 pawns

📊 line: h8d8 d1e1 a6a5 a2a3 c6d7 e6e7 f7f6 e1f2 b7b5 c4b3
   → This is the sequence of moves Stockfish recommends
   → Total moves in sequence: 10
   → First 5 moves: h8d8 d1e1 a6a5 a2a3 c6d7
   → First move in UCI: h8d8
   → First move in chess notation: Rd8

... and 3 more variations


In [12]:
# SUMMARY: Put it all together
print("\n\n" + "=" * 80)
print("5️⃣ PUTTING IT ALL TOGETHER")
print("=" * 80)

print("""
This file contains:
  📦 Millions of chess positions
  🤖 Each analyzed by the Stockfish engine
  📊 With evaluations showing:
     - How good the position is (cp or mate score)
     - Best move sequences (lines)
     - How deep it searched (depth)
     
EXAMPLE from the first record:
  Position: Black to move (25 legal moves)
  Best move: f7g7 (King to g7)
  Evaluation: White is slightly better (+0.69 pawns)
  If both sides play best moves: f7g7 e6e2 h8d8 e2d2 b7b5...
  
This is RAW Stockfish analysis data - not puzzles!
""")

print("\n📋 KEY DIFFERENCES FROM PUZZLES:")
print("  ✗ No puzzle themes (fork, pin, etc.)")
print("  ✗ No difficulty rating")
print("  ✗ No 'correct answer' - just best moves")
print("  ✓ Has multiple alternative lines")
print("  ✓ Has engine evaluations (cp or mate scores)")
print("  ✓ Has search depth & nodes explored")



5️⃣ PUTTING IT ALL TOGETHER

This file contains:
  📦 Millions of chess positions
  🤖 Each analyzed by the Stockfish engine
  📊 With evaluations showing:
     - How good the position is (cp or mate score)
     - Best move sequences (lines)
     - How deep it searched (depth)

EXAMPLE from the first record:
  Position: Black to move (25 legal moves)
  Best move: f7g7 (King to g7)
  Evaluation: White is slightly better (+0.69 pawns)
  If both sides play best moves: f7g7 e6e2 h8d8 e2d2 b7b5...

This is RAW Stockfish analysis data - not puzzles!


📋 KEY DIFFERENCES FROM PUZZLES:
  ✗ No puzzle themes (fork, pin, etc.)
  ✗ No difficulty rating
  ✗ No 'correct answer' - just best moves
  ✓ Has multiple alternative lines
  ✓ Has engine evaluations (cp or mate scores)
  ✓ Has search depth & nodes explored


## 6️⃣ CRITICAL CONCEPT: What is a "LINE"?

This is the most important concept to understand!

In [13]:
# A "line" is NOT just your moves - it's the ENTIRE game continuation
# It includes BOTH your moves AND your opponent's expected best responses

print("=" * 80)
print("UNDERSTANDING LINES: Your Moves + Opponent's Best Responses")
print("=" * 80)

# Get the best line from first record
line = example['evals'][0]['pvs'][0]['line']
moves = line.split()

print(f"\nLine: {line}")
print(f"Total moves: {len(moves)}")

print("\n" + "=" * 80)
print("Let's play this out move by move:")
print("=" * 80)

# Start from the position
game_board = chess.Board(example['fen'])
print(f"\nStarting position: {example['fen']}")
print(f"Turn: {'Black' if game_board.turn == chess.BLACK else 'White'} to move\n")

# Play each move and show who plays it
for idx, move_uci in enumerate(moves[:8], 1):  # Show first 8 moves
    move = chess.Move.from_uci(move_uci)
    san = game_board.san(move)
    
    # Determine who's moving
    player = "Black" if game_board.turn == chess.BLACK else "White"
    
    # Make the move
    game_board.push(move)
    
    print(f"Move {idx}: {move_uci} ({san:6s}) - {player:5s} plays")
    
print("\n" + "=" * 80)
print("KEY INSIGHT:")
print("=" * 80)
print("""
The line alternates between:
  🔵 YOUR move (Black in this example)
  ⚪ OPPONENT's best response (White)
  🔵 YOUR next move (Black)
  ⚪ OPPONENT's best response (White)
  ... and so on

Stockfish assumes BOTH sides play the moves it rates best!
This is called the "Principal Variation" (PV)
""")

UNDERSTANDING LINES: Your Moves + Opponent's Best Responses

Line: f7g7 e6e2 h8d8 e2d2 b7b5 c4b3 g7f6 d1e1 a6a5 a2a3
Total moves: 10

Let's play this out move by move:

Starting position: 7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -
Turn: Black to move

Move 1: f7g7 (Kg7   ) - Black plays
Move 2: e6e2 (Re2   ) - White plays
Move 3: h8d8 (Rd8   ) - Black plays
Move 4: e2d2 (Rd2   ) - White plays
Move 5: b7b5 (b5    ) - Black plays
Move 6: c4b3 (Bb3   ) - White plays
Move 7: g7f6 (Kf6   ) - Black plays
Move 8: d1e1 (Ke1   ) - White plays

KEY INSIGHT:

The line alternates between:
  🔵 YOUR move (Black in this example)
  ⚪ OPPONENT's best response (White)
  🔵 YOUR next move (Black)
  ⚪ OPPONENT's best response (White)
  ... and so on

Stockfish assumes BOTH sides play the moves it rates best!
This is called the "Principal Variation" (PV)
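Because a line simply alternates sides, starting with the side to move, its plies can be labeled without replaying them on a board. This is a stdlib sketch. `number_line` is a hypothetical helper, and it assumes the line starts at move 1 because the Lichess FEN omits the fullmove number. Unlike replaying with python-chess, it does not check legality.

```python
from typing import List


def number_line(line: str, white_to_move: bool, start_move: int = 1) -> List[str]:
    """Label each UCI ply with a move number: '1.' for White, '1...' for Black."""
    labeled = []
    move_no = start_move
    white = white_to_move
    for uci in line.split():
        labeled.append(f"{move_no}. {uci}" if white else f"{move_no}... {uci}")
        if not white:
            move_no += 1  # a new full move starts after Black's ply
        white = not white
    return labeled


pv_line = "f7g7 e6e2 h8d8 e2d2 b7b5 c4b3"
print(" ".join(number_line(pv_line, white_to_move=False)))
# 1... f7g7 2. e6e2 2... h8d8 3. e2d2 3... b7b5 4. c4b3

# The side to move's own plies are every other entry, starting with the first
pv_moves = pv_line.split()
print(pv_moves[0::2], pv_moves[1::2])
# ['f7g7', 'h8d8', 'b7b5'] ['e6e2', 'e2d2', 'c4b3']
```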



In [14]:
# Let's visualize this more clearly with colors
print("\n" + "=" * 80)
print("DETAILED BREAKDOWN: Who Plays What")
print("=" * 80)

# Reset board
game_board = chess.Board(example['fen'])
starting_player = "Black" if game_board.turn == chess.BLACK else "White"

print(f"\nStarting position: {starting_player} to move")
print(f"FEN: {example['fen']}\n")

# Create two lists: your moves vs opponent moves
your_moves = []
opponent_moves = []

for idx, move_uci in enumerate(moves[:10]):
    move = chess.Move.from_uci(move_uci)
    san = game_board.san(move)
    player = "Black" if game_board.turn == chess.BLACK else "White"
    
    if player == starting_player:
        your_moves.append(f"{san}")
    else:
        opponent_moves.append(f"{san}")
    
    game_board.push(move)

print("🔵 YOUR moves (if you follow Stockfish's recommendation):")
print(f"   {' → '.join(your_moves)}")

print("\n⚪ OPPONENT's expected best responses:")
print(f"   {' → '.join(opponent_moves)}")

print("\n" + "=" * 80)
print("WHAT THIS MEANS FOR YOU:")
print("=" * 80)
print(f"""
1. Stockfish says YOUR best move is: {your_moves[0]}
2. It predicts opponent will respond with: {opponent_moves[0]}
3. Then YOUR best move is: {your_moves[1]}
4. Opponent responds with: {opponent_moves[1]}
5. And so on...

The evaluation score (+0.69 pawns, from White's point of view) is
Stockfish's assessment if BOTH players follow this exact line.

In reality:
  - You might not play the best move
  - Your opponent might not play the best response
  - The position will diverge from this line
  - But it shows the engine's best estimate of optimal play
""")


DETAILED BREAKDOWN: Who Plays What

Starting position: Black to move
FEN: 7r/1p3k2/p1bPR3/5p2/2B2P1p/8/PP4P1/3K4 b - -

🔵 YOUR moves (if you follow Stockfish's recommendation):
   Kg7 → Rd8 → b5 → Kf6 → a5

⚪ OPPONENT's expected best responses:
   Re2 → Rd2 → Bb3 → Ke1 → a3

WHAT THIS MEANS FOR YOU:

1. Stockfish says YOUR best move is: Kg7
2. It predicts opponent will respond with: Re2
3. Then YOUR best move is: Rd8
4. Opponent responds with: Rd2
5. And so on...

The evaluation score (+0.69 pawns, from White's point of view) is
Stockfish's assessment if BOTH players follow this exact line.

In reality:
  - You might not play the best move
  - Your opponent might not play the best response
  - The position will diverge from this line
  - But it shows the engine's best estimate of optimal play
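How far real play diverges from the PV can be measured directly: count how many leading plies of a played sequence match the line. This is a minimal sketch. `plies_followed` is a hypothetical helper, and the played moves in the example are invented for illustration.

```python
from typing import Sequence


def plies_followed(pv_moves: Sequence[str], played_moves: Sequence[str]) -> int:
    """Number of leading plies where the played moves match the PV."""
    count = 0
    for expected, actual in zip(pv_moves, played_moves):
        if expected != actual:
            break
        count += 1
    return count


best_line_moves = "f7g7 e6e2 h8d8 e2d2 b7b5".split()
played = "f7g7 e6e2 h8e8 e2e8".split()  # hypothetical game that deviates on ply 3
print(plies_followed(best_line_moves, played))  # 2
```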



In [15]:
# COMPARISON: Compare different lines (different strategies)
print("\n" + "=" * 80)
print("COMPARING MULTIPLE LINES: Different Strategies")
print("=" * 80)

pvs = example['evals'][0]['pvs']

for pv_idx, pv in enumerate(pvs[:3], 1):  # Compare first 3 lines
    line_moves = pv['line'].split()
    board_copy = chess.Board(example['fen'])
    
    # Get first move
    first_move = chess.Move.from_uci(line_moves[0])
    first_san = board_copy.san(first_move)
    
    # Get score
    if 'cp' in pv:
        score = f"{pv['cp']/100:+.2f} pawns"
    else:
        score = f"Mate in {pv['mate']}"
    
    print(f"\n--- Line {pv_idx}: {first_san} ---")
    print(f"Evaluation: {score}")
    print("Full sequence:")
    
    # Show the moves with turn indicators
    for i, m in enumerate(line_moves[:8]):
        move_obj = chess.Move.from_uci(m)
        san = board_copy.san(move_obj)
        player = "Black" if board_copy.turn == chess.BLACK else "White"
        emoji = "🔵" if player == starting_player else "⚪"
        board_copy.push(move_obj)
        print(f"  {i+1}. {emoji} {san:8s} ({player})", end="")
        if (i+1) % 2 == 0:
            print()  # New line every 2 moves

print("\n\n" + "=" * 80)
print("SUMMARY:")
print("=" * 80)
print("""
All 3 lines start with different first moves:
  - Line 1 (Kg7): Best move, leads to +0.69
  - Line 2 (Rd8): Worse move, leads to +1.63
  - Line 3 (Ra8): Even worse, leads to +2.29

Each line shows the full game continuation assuming best play
from both sides after that initial choice.

This helps you understand the CONSEQUENCES of each move!
""")


COMPARING MULTIPLE LINES: Different Strategies

--- Line 1: Kg7 ---
Evaluation: +0.69 pawns
Full sequence:
  1. 🔵 Kg7      (Black)  2. ⚪ Re2      (White)
  3. 🔵 Rd8      (Black)  4. ⚪ Rd2      (White)
  5. 🔵 b5       (Black)  6. ⚪ Bb3      (White)
  7. 🔵 Kf6      (Black)  8. ⚪ Ke1      (White)

--- Line 2: Rd8 ---
Evaluation: +1.63 pawns
Full sequence:
  1. 🔵 Rd8      (Black)  2. ⚪ Ke1      (White)
  3. 🔵 a5       (Black)  4. ⚪ a3       (White)
  5. 🔵 Bd7      (Black)  6. ⚪ Re7+     (White)
  7. 🔵 Kf6      (Black)  8. ⚪ Kf2      (White)

--- Line 3: Ra8 ---
Evaluation: +2.29 pawns
Full sequence:
  1. 🔵 Ra8      (Black)  2. ⚪ Ke1      (White)
  3. 🔵 a5       (Black)  4. ⚪ Rh6+     (White)
  5. 🔵 Kg7      (Black)  6. ⚪ Rxh4     (White)
  7. 🔵 Rd8      (Black)  8. ⚪ Bd3      (White)


SUMMARY:

All 3 lines start with different first moves:
  - Line 1 (Kg7): Best move, leads to +0.69
  - Line 2 (Rd8): Worse move, leads to +1.63
 