# ANALYSIS 
- Using OpenAI GPT-4 and Gemini, the correct answer was generated only when I mentioned that a capture move contains an "x". Even that failed a number of times.
- Explicitly telling the model to check each move that contains an "x" has worked so far.
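
In Standard Algebraic Notation (SAN), a lowercase "x" appears only in capture moves. That means the capture count the prompts ask for can be computed without a model and used as ground truth. Below is a minimal sketch. It assumes the `moves` column of `lichess_main.csv` holds space-separated SAN half-moves starting with White. `count_captures` is a hypothetical helper, not part of the notebook.

```python
def count_captures(moves: str) -> dict[str, list[str]]:
    """Group the capture moves in a SAN move string by the side that played them.

    Assumes `moves` is space-separated SAN half-moves starting with White.
    In SAN a lowercase 'x' appears only in captures (including en passant and
    capturing promotions such as exd8=Q), and every capture removes exactly
    one piece.
    """
    captures: dict[str, list[str]] = {"White": [], "Black": []}
    for ply, san in enumerate(moves.split()):
        side = "White" if ply % 2 == 0 else "Black"  # White moves on even plies
        if "x" in san:
            captures[side].append(san)
    return captures


caps = count_captures("e4 d5 exd5 Qxd5 Nc3 Qa5")
print(caps)  # {'White': ['exd5'], 'Black': ['Qxd5']}
# pieces lost by one side = captures made by the other side
print("White lost", len(caps["Black"]), "| Black lost", len(caps["White"]))
```

This only counts captures. Naming *which* piece was taken requires replaying the game on a board.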

In [99]:
import pandas as pd
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
openai_api_key = os.getenv("openai_token")

In [100]:
openai = OpenAI(api_key=openai_api_key)

## data

In [101]:
df = pd.read_csv("../../data/lichess_main.csv", index_col=False)
df = df.sample(50)
df = df[["moves"]].reset_index(drop=True)
df.head(), df.shape

(                                               moves
 0  e3 c5 d3 d6 c3 g6 Nf3 Bg7 Bd2 Nc6 Na3 Qa5 Qb3 ...
 1  e4 c5 Nf3 Nf6 Nc3 Nc6 d4 Nxd4 Nxd4 cxd4 Qxd4 e...
 2  e4 d6 Bc4 f5 Qf3 g6 exf5 Bxf5 Qxb7 Nd7 d3 Ngf6...
 3  d4 Nf6 c4 g6 Nc3 d5 cxd5 Nxd5 e4 Nxc3 bxc3 Bg7...
 4  Nc3 d5 d4 e6 e4 c6 Nf3 dxe4 Nxe4 Be7 Bd3 Nf6 N...,
 (50, 1))

## format list of moves to get pairs

In [102]:
def gen_cot_prompt(prev_moves: str) -> tuple[str, int]:
    # returns ("White: .., Black: .." lines, number of move pairs)
    prev_moves = prev_moves.split()
    cot_moves = [f"White: {prev_moves[i]}, Black: {prev_moves[i + 1]}" if i + 1 < len(prev_moves)
                 else f"White: {prev_moves[i]}"  # game ended on White's move
                 for i in range(0, len(prev_moves), 2)]
    return "\n".join(cot_moves), len(cot_moves)

In [103]:
print(gen_cot_prompt(df.iloc[0]["moves"])[0])

White: e3, Black: c5
White: d3, Black: d6
White: c3, Black: g6
White: Nf3, Black: Bg7
White: Bd2, Black: Nc6
White: Na3, Black: Qa5
White: Qb3, Black: Be6
White: Qb5, Black: Qxb5
White: Nxb5, Black: Kd7
White: Ng5, Black: a6
White: Na3, Black: Bd5
White: e4, Black: Be6
White: Nxe6, Black: Kxe6
White: Nc4, Black: Nf6
White: Be2, Black: Kd7
White: Nb6+, Black: Kc7
White: Nxa8+, Black: Rxa8
White: Bg5, Black: e6
White: Bxf6, Black: Bxf6
White: O-O, Black: d5
White: exd5, Black: exd5
White: Bf3, Black: Kd6
White: Rae1, Black: c4
White: d4, Black: Ne7
White: Bg4, Black: h5
White: Bh3, Black: a5
White: b3, Black: cxb3
White: axb3, Black: a4
White: bxa4, Black: Rxa4
White: Rb1, Black: Rc4
White: Rb6+, Black: Rc6
White: Rxb7, Black: Rxc3
White: Rb6+, Black: Rc6
White: Rxc6+, Black: Nxc6
White: Rd1, Black: Nxd4
White: Kf1, Black: Nb5
White: Bc8, Black: d4
White: Ba6, Black: Nc3
White: Rc1, Black: Ne4
White: Bd3, Black: Nd2+
White: Ke2, Black: Nb3
White: Rb1, Black: Nc5
White: Rb6+, Black: Ke5
W

In [104]:
print(gen_cot_prompt(df.iloc[0]["moves"])[1])

69
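
The 69 printed above is the number of move pairs returned by `gen_cot_prompt`, not the number of half-moves. A game with 69 pairs has 137 or 138 half-moves. The sketch below is a hypothetical variant, `pair_moves`, that also prefixes each pair with its move number, as PGN does. It is an untested assumption that explicit numbering helps the model keep its place in long games.

```python
from itertools import zip_longest


def pair_moves(moves: str) -> tuple[str, int]:
    """Format a SAN move string as numbered 'White/Black' pairs.

    Hypothetical variant of gen_cot_prompt; returns the formatted text and
    the number of pairs (full moves), not the number of half-moves.
    """
    plies = moves.split()
    white, black = plies[0::2], plies[1::2]  # even plies are White's, odd are Black's
    lines = []
    for number, (w, b) in enumerate(zip_longest(white, black), start=1):
        line = f"{number}. White: {w}"
        if b is not None:  # an odd ply count means White made the last move
            line += f", Black: {b}"
        lines.append(line)
    return "\n".join(lines), len(lines)


text, n_pairs = pair_moves("e4 e5 Nf3 Nc6 Bb5")
print(text)
# 1. White: e4, Black: e5
# 2. White: Nf3, Black: Nc6
# 3. White: Bb5
print(n_pairs)  # 3
```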


## system prompt

In [105]:
system_prompt = '''
    Act as a chess master.
    You will be provided with a list of chess move pairs in Algebraic Notation where 1st move is by White and 2nd by Black.
    Analyse each move of each pair and check which chess piece has been captured. 
    Remember if a chess move has 'x' in the chess move, it means a chess piece has been captured.
    Provide a list of all chess pieces that have been captured and how many pieces have been captured for each player.
    '''
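
The notes at the top report that explicitly asking the model to check each move containing an "x" worked best. One way to push that further is to extract the capture moves in code and list them in the user message. The model then only has to name the captured piece instead of also finding the captures. This is sketched here as an untested assumption. `capture_checklist` is a hypothetical helper.

```python
def capture_checklist(moves: str) -> str:
    """List every capture move in a SAN move string with its move number and side.

    Hypothetical helper: assumes space-separated SAN half-moves starting with
    White, and treats any token containing 'x' as a capture.
    """
    lines = []
    for ply, san in enumerate(moves.split()):
        if "x" in san:
            number = ply // 2 + 1  # plies 0 and 1 belong to move 1, and so on
            side = "White" if ply % 2 == 0 else "Black"
            lines.append(f"- move {number}, {side}: {san}")
    return f"Capture moves to check ({len(lines)} in total):\n" + "\n".join(lines)


print(capture_checklist("e4 d5 exd5 Qxd5 Nc3 Qa5"))
# Capture moves to check (2 in total):
# - move 2, White: exd5
# - move 2, Black: Qxd5
```

The checklist could be appended to the user message built in `generate_explanation`, for example `f"Move pairs are - \n{user_prompt_move}\n\n{capture_checklist(moves)}"`.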

## data generation

In [106]:
# gpt3 = "gpt-3.5-turbo-0125"
# # oai_model = "gpt-4-turbo"
gpt4 = "gpt-4o"

In [107]:
def generate_explanation(server: OpenAI,
                         model_name: str,
                         df: pd.DataFrame,
                         system_prompt: str,
                         save_path: str,
                         save_interval: int = 10) -> pd.DataFrame:
    return_df = df.copy()
    # make sure the checkpoint directory exists before the first save
    os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)

    for i in range(df.shape[0]):
        print(f"[INFO] Generating explanation for row {i}")
        moves = df.iloc[i]["moves"]

        user_prompt_move, context_length = gen_cot_prompt(moves) # using gpt 4 for all
        user_content_cot = f"Move pairs are - \n{user_prompt_move}"

        stream = server.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content_cot}
            ],
            stream=True,
        )

        # concatenate the streamed text, skipping chunks without choices or content
        explanation = ""
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content is not None:
                explanation += chunk.choices[0].delta.content
        # write by row label so a non-default index still lines up
        return_df.loc[return_df.index[i], "explanation"] = explanation.strip()

        # checkpoint: overwrite the file with all rows so far, header included
        if (i + 1) % save_interval == 0:
            return_df.to_csv(save_path, index=False)
            print(f"[INFO] Saved progress as csv in {save_path} at iteration {i + 1}")

    # save the final DataFrame after the loop
    return_df.to_csv(save_path, index=False)
    print("[INFO] Final data saved")
    return return_df
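
The streaming loop inside `generate_explanation` concatenates `delta.content` from each chunk. The sketch below isolates that step so it can be checked without calling the API. `collect_stream` is a hypothetical helper. The chunks are `SimpleNamespace` stand-ins for the SDK objects, under the assumption that only `choices[0].delta.content` is read. Chunks with an empty `choices` list are skipped. The OpenAI API sends one such chunk, carrying token usage, when `stream_options={"include_usage": True}` is set.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def collect_stream(stream: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. the trailing usage-only chunk
            continue
        content = chunk.choices[0].delta.content
        if content is not None:  # role-only or final chunks carry no text
            parts.append(content)
    return "".join(parts).strip()


def fake_chunk(text):
    """Offline stand-in for a streamed chunk (assumption, not the SDK type)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("Qxb5 "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("captures the queen")]
print(collect_stream(fake_stream))  # Qxb5 captures the queen
```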


In [108]:
# for testing
temp = df.iloc[:1]

In [109]:
temp

                                               moves
0  e3 c5 d3 d6 c3 g6 Nf3 Bg7 Bd2 Nc6 Na3 Qa5 Qb3 ...


In [110]:
test = generate_explanation(openai, gpt4, temp, system_prompt, "./interval_data/capture_interval.csv", 5)

[INFO] Generating explanation for row 0
[INFO] Final data saved


In [111]:
print(test.iloc[0]["moves"])

e3 c5 d3 d6 c3 g6 Nf3 Bg7 Bd2 Nc6 Na3 Qa5 Qb3 Be6 Qb5 Qxb5 Nxb5 Kd7 Ng5 a6 Na3 Bd5 e4 Be6 Nxe6 Kxe6 Nc4 Nf6 Be2 Kd7 Nb6+ Kc7 Nxa8+ Rxa8 Bg5 e6 Bxf6 Bxf6 O-O d5 exd5 exd5 Bf3 Kd6 Rae1 c4 d4 Ne7 Bg4 h5 Bh3 a5 b3 cxb3 axb3 a4 bxa4 Rxa4 Rb1 Rc4 Rb6+ Rc6 Rxb7 Rxc3 Rb6+ Rc6 Rxc6+ Nxc6 Rd1 Nxd4 Kf1 Nb5 Bc8 d4 Ba6 Nc3 Rc1 Ne4 Bd3 Nd2+ Ke2 Nb3 Rb1 Nc5 Rb6+ Ke5 Rb5 Be7 g3 Kd6 Rb6+ Kc7 Rb5 Kc6 Kf3 Bf6 h4 Kd6 Kf4 Be5+ Kg5 Bg7 Rb6+ Kc7 Rb5 Nxd3 f3 Ne5 Rc5+ Kd6 Rc2 Nxf3+ Kf4 Ne1 Rd2 Be5+ Ke4 Kc5 Kxe5 Nf3+ Kf6 Nxd2 Kxf7 Nf1 Kxg6 Nxg3 Kg5 d3 Kf4 d2 Kxg3 d1=Q Kf4 Qg1 Kf5 Kd4 Kf4 Qg4#


## this is wrong again

In [112]:
print(test.iloc[0]["explanation"])

Let's analyze the move pairs and identify the captures.

1. White: Qb5, Black: Qxb5
    - Black captures White's Queen on b5 with their Queen.
2. White: Nxb5, Black: Kd7
    - White captures Black's Queen on b5 with their Knight.
3. White: Nxe6, Black: Kxe6
    - White captures Black's Bishop on e6 with their Knight, and Black captures White's Knight on e6 with their King.
4. White: Nxa8+, Black: Rxa8
    - White captures Black's Rook on a8 with their Knight, and Black captures White's Knight on a8 with their Rook.
5. White: Bxf6, Black: Bxf6
    - White captures Black's Knight on f6 with their Bishop, and Black captures White's Bishop on f6 with their Bishop.
6. White: exd5, Black: exd5
    - White captures Black's pawn on d5 with their pawn, and Black captures White's pawn on d5 with their pawn.
7. White: bxa4, Black: Rxa4
    - White captures Black's pawn on a4 with their pawn, and Black captures White's pawn on a4 with their Rook.
8. White: Rxb7, Black: Rxc3
    - White captures Bl
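
Because the capture moves are known from the notation, a simple check can flag captures that an explanation never mentions. In the output above, the list jumps from the `exd5` pair to the `bxa4` pair, so `cxb3` and `axb3` are never discussed. `missing_captures` is a rough, hypothetical heuristic. It only tests whether each capture token appears in the text, not whether the captured piece is named correctly.

```python
def missing_captures(moves: str, explanation: str) -> list[str]:
    """Return capture moves (SAN tokens containing 'x') not mentioned in explanation.

    Check/mate markers are stripped before matching, and a token played more
    than once (e.g. exd5 by both sides) is reported at most once.
    """
    missing: list[str] = []
    for san in moves.split():
        if "x" not in san:
            continue
        token = san.rstrip("+#")
        if token not in explanation and token not in missing:
            missing.append(token)
    return missing


# excerpt: pairs 27-29 of the game analysed above
moves = "b3 cxb3 axb3 a4 bxa4 Rxa4"
explanation = "White: bxa4 captures a pawn, Black: Rxa4 captures a pawn."
print(missing_captures(moves, explanation))  # ['cxb3', 'axb3']
```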

# trying to pass list instead of pairs

In [127]:
system_prompt = '''
    Act as a chess master.
    You will be provided with a list of chess move pairs in Algebraic Notation.
    Analyse each move of each pair and check which chess piece has been captured. 
    Remember if a chess move has 'x' in the chess move, it means a chess piece has been captured.
    Count how many moves has 'x' in it. That will be the total number of chess pieces captured.
    Provide a list of all chess pieces that have been captured and how many pieces have been captured for each player.
    '''
def generate_explanation(server: OpenAI,
                         model_name: str,
                         df: pd.DataFrame,
                         system_prompt: str,
                         save_path: str,
                         save_interval: int = 10) -> pd.DataFrame:
    return_df = df.copy()
    # make sure the checkpoint directory exists before the first save
    os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)

    for i in range(df.shape[0]):
        print(f"[INFO] Generating explanation for row {i}")
        moves = df.iloc[i]["moves"]

        user_prompt_move, context_length = gen_cot_prompt(moves) # using gpt 4 for all
        user_content_cot = f"Move pairs are - \n{user_prompt_move}"
        print(user_content_cot)

        stream = server.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content_cot}
            ],
            stream=True,
        )

        # concatenate the streamed text, skipping chunks without choices or content
        explanation = ""
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content is not None:
                explanation += chunk.choices[0].delta.content
        # write by row label so a non-default index still lines up
        return_df.loc[return_df.index[i], "explanation"] = explanation.strip()

        # checkpoint: overwrite the file with all rows so far, header included
        if (i + 1) % save_interval == 0:
            return_df.to_csv(save_path, index=False)
            print(f"[INFO] Saved progress as csv in {save_path} at iteration {i + 1}")

    # save the final DataFrame after the loop
    return_df.to_csv(save_path, index=False)
    print("[INFO] Final data saved")
    return return_df


In [128]:
test = generate_explanation(openai, gpt4, temp, system_prompt, "./interval_data/capture_interval.csv", 5)

[INFO] Generating explanation for row 0
Move pairs are - 
White: e3, Black: c5
White: d3, Black: d6
White: c3, Black: g6
White: Nf3, Black: Bg7
White: Bd2, Black: Nc6
White: Na3, Black: Qa5
White: Qb3, Black: Be6
White: Qb5, Black: Qxb5
White: Nxb5, Black: Kd7
White: Ng5, Black: a6
White: Na3, Black: Bd5
White: e4, Black: Be6
White: Nxe6, Black: Kxe6
White: Nc4, Black: Nf6
White: Be2, Black: Kd7
White: Nb6+, Black: Kc7
White: Nxa8+, Black: Rxa8
White: Bg5, Black: e6
White: Bxf6, Black: Bxf6
White: O-O, Black: d5
White: exd5, Black: exd5
White: Bf3, Black: Kd6
White: Rae1, Black: c4
White: d4, Black: Ne7
White: Bg4, Black: h5
White: Bh3, Black: a5
White: b3, Black: cxb3
White: axb3, Black: a4
White: bxa4, Black: Rxa4
White: Rb1, Black: Rc4
White: Rb6+, Black: Rc6
White: Rxb7, Black: Rxc3
White: Rb6+, Black: Rc6
White: Rxc6+, Black: Nxc6
White: Rd1, Black: Nxd4
White: Kf1, Black: Nb5
White: Bc8, Black: d4
White: Ba6, Black: Nc3
White: Rc1, Black: Ne4
White: Bd3, Black: Nd2+
White: Ke2, B

In [129]:
print(test.iloc[0]["moves"])

e3 c5 d3 d6 c3 g6 Nf3 Bg7 Bd2 Nc6 Na3 Qa5 Qb3 Be6 Qb5 Qxb5 Nxb5 Kd7 Ng5 a6 Na3 Bd5 e4 Be6 Nxe6 Kxe6 Nc4 Nf6 Be2 Kd7 Nb6+ Kc7 Nxa8+ Rxa8 Bg5 e6 Bxf6 Bxf6 O-O d5 exd5 exd5 Bf3 Kd6 Rae1 c4 d4 Ne7 Bg4 h5 Bh3 a5 b3 cxb3 axb3 a4 bxa4 Rxa4 Rb1 Rc4 Rb6+ Rc6 Rxb7 Rxc3 Rb6+ Rc6 Rxc6+ Nxc6 Rd1 Nxd4 Kf1 Nb5 Bc8 d4 Ba6 Nc3 Rc1 Ne4 Bd3 Nd2+ Ke2 Nb3 Rb1 Nc5 Rb6+ Ke5 Rb5 Be7 g3 Kd6 Rb6+ Kc7 Rb5 Kc6 Kf3 Bf6 h4 Kd6 Kf4 Be5+ Kg5 Bg7 Rb6+ Kc7 Rb5 Nxd3 f3 Ne5 Rc5+ Kd6 Rc2 Nxf3+ Kf4 Ne1 Rd2 Be5+ Ke4 Kc5 Kxe5 Nf3+ Kf6 Nxd2 Kxf7 Nf1 Kxg6 Nxg3 Kg5 d3 Kf4 d2 Kxg3 d1=Q Kf4 Qg1 Kf5 Kd4 Kf4 Qg4#


In [130]:
print(test.iloc[0]["explanation"])

Let's analyze each move pair and note where a piece is captured ('x' in the move notation) and identify which piece was captured. We'll also keep a count of how many captures have occurred and which pieces were captured.

1. White: e3, Black: c5
   - No capture

2. White: d3, Black: d6
   - No capture

3. White: c3, Black: g6
   - No capture

4. White: Nf3, Black: Bg7
   - No capture

5. White: Bd2, Black: Nc6
   - No capture

6. White: Na3, Black: Qa5
   - No capture

7. White: Qb3, Black: Be6
   - No capture

8. White: Qb5, Black: Qxb5 (captured White Queen)
   - Pieces captured: White Queen 
   - Count: 1

9. White: Nxb5, Black: Kd7
   - No capture (the capture was on a previous turn)

10. White: Ng5, Black: a6
    - No capture

11. White: Na3, Black: Bd5
    - No capture

12. White: e4, Black: Be6
    - No capture

13. White: Nxe6, Black: Kxe6 (captured Black Bishop, then captured White Knight)
    - Pieces captured: Black Bishop, White Knight
    - Count: 3

14. White: Nc4, Black:
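
The running "Count:" lines above can be checked against the notation. The explanation says "No capture" for pair 9 even though `Nxb5` is a capture, so it reports 3 captures after move 13 where there are 4: `Qxb5`, `Nxb5`, `Nxe6`, `Kxe6`. `running_capture_count` is a hypothetical helper that reproduces the true running total for the first 13 moves of this game.

```python
def running_capture_count(moves: str) -> list[tuple[int, int]]:
    """Return (move_number, captures_so_far) after each full move.

    Assumes space-separated SAN half-moves starting with White; a token
    containing 'x' counts as one capture.
    """
    plies = moves.split()
    totals: list[tuple[int, int]] = []
    count = 0
    for ply, san in enumerate(plies):
        if "x" in san:
            count += 1
        # record after Black's reply, or after White's move if the game ends there
        if ply % 2 == 1 or ply == len(plies) - 1:
            totals.append((ply // 2 + 1, count))
    return totals


moves = ("e3 c5 d3 d6 c3 g6 Nf3 Bg7 Bd2 Nc6 Na3 Qa5 Qb3 Be6 "
         "Qb5 Qxb5 Nxb5 Kd7 Ng5 a6 Na3 Bd5 e4 Be6 Nxe6 Kxe6")
for number, total in running_capture_count(moves):
    if number in (8, 9, 13):
        print(number, total)
# 8 1
# 9 2
# 13 4
```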