# Assignment 5: Automated Agent for Solving Data Science Tasks


## Meta Instructions
1. Environment:
    - Install the required package using: `pip install scikit-learn openai`
    - Additional packages may be required depending on your implementation.

2. Complete the coding tasks according to the instructions in this file, changing the code only in the required areas, which are marked by "Your code starts here" and "Your code ends here".

3. Please summarize all your implementation process, experiment details, results, analysis, discussion and thoughts into a 2-page report. It is encouraged that you include a short proposal for future improvement in the conclusion part. Please submit the report in PDF format.

4. Submission: submit a zip file named "StuID_Name.zip" (e.g., "A0123456J_Wang-Wenjie.zip") to Canvas **Assignments -> Assignment5**. Note that StuID is **NOT** your NUSNET ID. The zip file should **only** include "StuID_Assignment_5.ipynb" and "StuID_Assignment_5.pdf" with your implemented code. The submission deadline is **23:59 on Apr. 11**.

5. Please strictly follow the above instructions; otherwise, marks will be deducted.

### Grading Rules:  
1. Successfully completing the missing parts in the notebook, running the full workflow, and submitting a well-structured report that demonstrates your reasoning will earn you 8/10 of the total score.
2. **(Optional Puzzle)**:  If you aim for a higher score and want to earn 1-2 bonus points (without exceeding the maximum score of 60), you need to improve the existing pipeline and include an ablation study. One potential improvement is to predefine a set of data science-related APIs and tools (refer to Assignment 5 tutorial slides). Additional supplementary materials:
    - [Data Interpreter](https://arxiv.org/pdf/2402.18679)
    - [AutoCodeRover](https://arxiv.org/pdf/2404.05427)

### For any questions, please take the following steps in order of priority:
1. Search for similar questions on Slack (https://app.slack.com/client/T088V95D8LC/C088L557RK8).
2. Post a new question on Slack if it has not already been answered.
3. For non-public questions, e-mail Xiangyan Liu (e0950125@u.nus.edu) and Pengfei Zhou (e1374451@u.nus.edu) with a subject line starting with "CS5260 2025 Spring".

In [1]:
!pip install scikit-learn openai

Defaulting to user installation because normal site-packages is not writeable



In [2]:

from openai import OpenAI

In [80]:
parent_dir = "/Users/evansun/Documents/Claudia/CS5446_AI_planning_decision_making/Sokoban-AI-agent"
data_file = parent_dir + "/human_demos/3_9.txt"

In [81]:

UP = 'up'
DOWN = 'down'
LEFT = 'left'
RIGHT = 'right'
    
class SokobanGame:
    """A class that contains the AI for the Sokoban game."""

    def __init__(self, data_file=None):
        self.DATA_FILE = data_file
        self.ACTION_SEQUENCE = None
        self.gameStateObj = None
        self.mapObj = None
        self.levelObj = None


    def get_actions(self, action_sequence):
        actions = []
        for char in action_sequence: 
            if char == "L":
                actions.append(LEFT)
            elif char == "R":
                actions.append(RIGHT)
            elif char == "U":
                actions.append(UP)
            elif char == "D":
                actions.append(DOWN)
        return actions


    def read_map_from_file(self, file_path):
        current_map = []
        with open(file_path, 'r') as sf:
            for line in sf.readlines():
                # Map rows start with a wall ('#'); stop at the first line that does not.
                if line.startswith('#'):
                    current_map.append(line.strip())
                else:
                    break  
        return current_map



    def isWall(self, mapObj, x, y):
        """Returns True if the (x, y) position on
        the map is a wall, otherwise return False."""
        if x < 0 or x >= len(self.mapObj) or y < 0 or y >= len(self.mapObj[x]):
            return False # x and y aren't actually on the map.
        elif self.mapObj[x][y] in ('#', 'x'):
            return True # wall is blocking
        return False



    def isBlocked(self, x, y):
        """Returns True if the (x, y) position on the map is blocked by a
        wall or box, or lies off the map, otherwise return False."""

        if self.isWall(self.mapObj, x, y):
            return True

        elif x < 0 or x >= len(self.mapObj) or y < 0 or y >= len(self.mapObj[x]):
            return True # x and y aren't actually on the map.

        elif (x, y) in self.gameStateObj['boxes']:
            return True # a box is blocking

        return False


    def makeMove(self, playerMoveTo):
        """Given the current map and game state, see if it is possible for the
        player to make the given move. If it is, then change the player's
        position (and the position of any pushed box). If not, do nothing.

        Returns True if the player moved, otherwise False."""

        # Make sure the player can move in the direction they want.
        playerx, playery = self.gameStateObj['player']

        # This variable is "syntactic sugar". Typing "boxes" is more
        # readable than typing "self.gameStateObj['boxes']" in our code.
        boxes = self.gameStateObj['boxes']

        # The code for handling each of the directions is so similar aside
        # from adding or subtracting 1 to the x/y coordinates. We can
        # simplify it by using the xOffset and yOffset variables.
        # Note that x is the row index and y is the column index.
        if playerMoveTo == UP:
            xOffset = -1
            yOffset = 0
        elif playerMoveTo == RIGHT:
            xOffset = 0
            yOffset = 1
        elif playerMoveTo == DOWN:
            xOffset = 1
            yOffset = 0
        elif playerMoveTo == LEFT:
            xOffset = 0
            yOffset = -1
        else:
            return False # Unknown direction; treat it as an illegal move.

        # See if the player can move in that direction.
        if self.isWall(self.mapObj, playerx + xOffset, playery + yOffset):
            return False
        else:
            if (playerx + xOffset, playery + yOffset) in boxes:
                # There is a box in the way, see if the player can push it.
                if not self.isBlocked(playerx + (xOffset*2), playery + (yOffset*2)):
                    # Move the box.
                    ind = boxes.index((playerx + xOffset, playery + yOffset))
                    boxes[ind] = (boxes[ind][0] + xOffset, boxes[ind][1] + yOffset)
                else:
                    return False
            # Move the player in the chosen direction.
            self.gameStateObj['player'] = (playerx + xOffset, playery + yOffset)
            return True


    # mapTextLines is a list of strings; each item is one row of the level layout.
    def read_map(self, mapTextLines):
        self.mapObj = [list(mapline) for mapline in mapTextLines]

        # Loop through the cells in the map and find the @, +, ., $ and *
        # characters for the starting game state.
        startx = None # The x (row) and y (column) of the player's starting position
        starty = None
        goals = [] # list of (x, y) tuples for each goal.
        boxes = [] # list of (x, y) tuples for each box's starting position.

        for x in range(len(self.mapObj)):
            # Use this row's own length so that ragged maps do not raise IndexError.
            for y in range(len(self.mapObj[x])):

                if self.mapObj[x][y] in ('@', '+'):
                    # '+' is the usual notation for the player standing on a goal.
                    startx = x
                    starty = y

                if self.mapObj[x][y] in ('.', '+'):
                    goals.append((x, y))
                if self.mapObj[x][y] == '$':
                    boxes.append((x, y))
                if self.mapObj[x][y] == "*":
                    goals.append((x, y))
                    boxes.append((x, y))

        # Basic level design sanity checks:
        assert startx is not None and starty is not None, 'Level missing a "@" or "+" to mark the start point.'
        assert len(goals) > 0, 'Level must have at least one goal.'
        assert len(boxes) >= len(goals), 'Level is impossible to solve. It has %s goals but only %s boxes.' % (len(goals), len(boxes))

        # Create level object and starting game state object.
        self.gameStateObj = {'player': (startx, starty),
                        'stepCounter': 0,
                        'boxes': boxes}
        self.levelObj = {'width': len(self.mapObj[0]),
                    'height': len(self.mapObj),
                    'mapObj': self.mapObj,
                    'goals': goals,
                    'startState': self.gameStateObj}

        # return self.levelObj

    
    def convert_current_state_to_map(self):
        """Render the current game state as a nested list of map characters."""
        starting_map = self.levelObj['mapObj']
        goal_pos = self.levelObj['goals']

        # Build a fresh grid so that the stored level map is not overwritten.
        ai_map = [list(row) for row in starting_map]

        for i, row in enumerate(starting_map):
            for j, cell in enumerate(row):
                if cell == "#":
                    ai_map[i][j] = "#"
                # Updated player and box positions.
                elif (i, j) == self.gameStateObj['player']:
                    ai_map[i][j] = "@"
                elif (i, j) in self.gameStateObj['boxes']:
                    ai_map[i][j] = "*" if (i, j) in goal_pos else "$"
                # Check the recorded goal list: the starting map may show a
                # box or the player on top of a goal instead of ".".
                elif (i, j) in goal_pos:
                    ai_map[i][j] = "."
                else:
                    # Every other cell is plain floor.
                    ai_map[i][j] = " "

        return ai_map

    def isLevelFinished(self):
        """Returns True if all the goals have boxes on them."""
        for goal in self.levelObj['goals']:
            if goal not in self.gameStateObj['boxes']:
                # Found a goal with no box on it.
                return False
        return True
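
The move and push rules implemented by `makeMove` can be checked in isolation on a tiny level. The following is a minimal standalone sketch, not part of the `SokobanGame` class: the names `OFFSETS`, `GRID` and `try_move` are hypothetical, and the sketch assumes the level is fully enclosed by walls so that no index leaves the grid. Coordinates are (row, column), as in the class above.

```python
# Hypothetical helper, for illustration only (not part of SokobanGame).
# Offsets are (row delta, column delta): moving up decreases the row index.
OFFSETS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

# A one-corridor level: player at (1, 1), box at (1, 2), goal at (1, 3).
GRID = [
    "#####",
    "#@$.#",
    "#####",
]


def try_move(grid, player, boxes, move):
    """Return (new_player, new_boxes) if the move is legal, otherwise None.

    Assumes the level is enclosed by walls, so every index stays in range.
    """
    dr, dc = OFFSETS[move]
    target = (player[0] + dr, player[1] + dc)
    if grid[target[0]][target[1]] == "#":
        return None  # the player would walk into a wall

    new_boxes = set(boxes)
    if target in new_boxes:
        # Pushing: the cell behind the box must be neither a wall nor a box.
        beyond = (target[0] + dr, target[1] + dc)
        if grid[beyond[0]][beyond[1]] == "#" or beyond in new_boxes:
            return None
        new_boxes.remove(target)
        new_boxes.add(beyond)
    return target, new_boxes


print(try_move(GRID, (1, 1), {(1, 2)}, "R"))  # ((1, 2), {(1, 3)})
print(try_move(GRID, (1, 1), {(1, 2)}, "L"))  # None: wall to the left
print(try_move(GRID, (1, 2), {(1, 3)}, "R"))  # None: box is against the wall
```

Returning a new state instead of mutating the old one makes it easy to compare the state before and after a move when debugging a generated plan.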


In [None]:

import re

class SokobanAgent:
    def __init__(self):
        self.llm = OpenAI(
            api_key="", # input your api key here
            base_url="https://api.deepinfra.com/v1/openai",
        )
        self.temperature = 0.5
        self.max_iterations = 10
        

        self.SYSTEM_PROMPT_STRATEGIES = f'''
        You are a strategy and planning expert for the Sokoban game. Your task is to analyze the provided Sokoban map and build a high-level strategy to solve the level. Do not give detailed steps. Sokoban is a puzzle game in which the player must push boxes onto target locations in a grid-like environment. The player can only move in four directions (up, down, left, right) and cannot move through walls or other boxes. The goal is to push every box onto a target location.

        **LEGAL MOVE REMINDERS:**
        - Player can only move in U, D, L, R
        - Player cannot move into a wall (#)
        - The first row, first column has index of (0,0). When moving up, row index decreases by 1; when moving down, row index increases by 1; when moving left, column index decreases by 1; when moving right, column index increases by 1.
        - To push a box, player must move into the box, and the box must be able to move one step forward in the same direction into an empty space. If the box cannot move to an empty space, the player's move is invalid. Try to avoid the invalid move.
        - No pulling or swapping
        - No two or more boxes can be pushed into the same target location
        - Player can only push one box at a time
        - Tips: In order to move a box, the player might need to walk around to stay in the right position to push the box.

        ** sokoban map format: **
        The map is stored in a nested list. Each item represents the element at that position. "#" represents the wall, "@" represents player, "$" represents box, and "." represents target. " " represents the empty free space which player or box can move into. 
        Positions are given as (row, column) pairs, indexed from 0, in the same form as the example below.
        
        **Example:**
        - MAP (indexing from 0):
        [['#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['#', '#', ' ', ' ', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', '$', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', ' ', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', ' ', ' ', '$', ' ', ' ', '#', '#'], ['#', '#', ' ', ' ', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', '.', ' ', ' ', '.', '@', '#', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#']]
        Player Position: (7,7)
        Box Positions: (3,3), (5,5)
        Target Positions: (7,3), (7,6)

        - High Level Strategy:
        1. The player pushes the right-side box (5,5) to the target (7,6)
        2. The player pushes the box at (3,3) to the target (7,3).
        
        You only need to provide the high level strategy in the format of a numbered list. Do not provide any code or detailed steps.
        If you cannot solve the sokoban, please say "I cannot solve this sokoban" and do not provide any code.
        '''.strip()
        
        
        self.SYSTEM_PROMPT_PLAN = F''' 
        You are a skilled player of the Sokoban game. Your task is to provide a detailed step-by-step plan to solve the Sokoban level. Sokoban is a puzzle game in which the player must push boxes onto target locations in a grid-like environment. The player can only move in four directions (up, down, left, right) and cannot move through walls or other boxes. The goal is to push every box onto a target location.
        The steps should be in the format of a numbered list, where each step describes the player's movement and the box movement if applicable.
        
        **LEGAL MOVE REMINDERS:**
        - Player can only move in U, D, L, R
        - Player cannot move into a wall (#)
        - To push a box, player must move into the box, and the box must be able to move one step forward in the same direction into an empty space. If the box cannot move to an empty space, the player's move is invalid. Try to avoid the invalid move.
        - No pulling or swapping
        - The box movement direction must be the same as the player movement direction
        - No two or more boxes can be pushed into the same target location
        - Player can only push one box at a time

        ** sokoban map format: **
        The map is stored in a nested list. Each item represents the element at that position. "#" represents the wall, "@" represents player, "$" represents box, and "." represents target. " " represents the empty free space which player or box can move into. 
        
        **Example:**
        - MAP (indexing from 0):
        [['#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['#', '#', ' ', ' ', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', '$', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', ' ', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', ' ', ' ', '$', ' ', ' ', '#', '#'], ['#', '#', ' ', ' ', ' ', ' ', ' ', ' ', '#', '#'], ['#', '#', ' ', '.', ' ', ' ', '.', '@', '#', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#']]
        Player position: (7,7) 
        box positions: (3,3), (5,5). 
        target positions: (7,3), (7,6).
        
        - Your Step-by-Step Plan:
        1. Move the player from (7,7) to (6,7), box positions are kept as (3,3), (5,5) (UP)
        2. Move the player from (6,7) to (5,7), box positions are kept as (3,3), (5,5) (UP)
        3. Move the player from (5,7) to (4,7), box positions are kept as (3,3), (5,5) (UP)
        4. Move the player from (4,7) to (4,6), box positions are kept as (3,3), (5,5) (LEFT)
        5. Move the player from (4,6) to (4,5), box positions are kept as (3,3), (5,5) (LEFT)
        6. Move the player from (4,5) to (5,5), box positions are updated to (3,3), (6,5) (DOWN)
        7. Move the player from (5,5) to (6,5), box positions are updated to (3,3), (7,5) (DOWN)
        8. Move the player from (6,5) to (6,4), box positions are kept as (3,3), (7,5) (LEFT)
        9. Move the player from (6,4) to (7,4), box positions are kept as (3,3), (7,5) (DOWN)
        10. Move the player from (7,4) to (7,5), box positions are updated to (3,3), (7,6) (RIGHT)
        11. Move the player from (7,5) to (7,4), box positions are kept as (3,3), (7,6) (LEFT)
        12. Move the player from (7,4) to (7,3), box positions are kept as (3,3), (7,6) (LEFT)
        13. Move the player from (7,3) to (7,2), box positions are kept as (3,3), (7,6) (LEFT)
        14. Move the player from (7,2) to (6,2), box positions are kept as (3,3), (7,6) (UP)
        15. Move the player from (6,2) to (5,2), box positions are kept as (3,3), (7,6) (UP)
        16. Move the player from (5,2) to (4,2), box positions are kept as (3,3), (7,6) (UP)
        17. Move the player from (4,2) to (3,2), box positions are kept as (3,3), (7,6) (UP)
        18. Move the player from (3,2) to (2,2), box positions are kept as (3,3), (7,6) (UP)
        19. Move the player from (2,2) to (2,3), box positions are kept as (3,3), (7,6) (RIGHT)
        20. Move the player from (2,3) to (3,3), box positions are updated to (4,3), (7,6) (DOWN)
        21. Move the player from (3,3) to (4,3), box positions are updated to (5,3), (7,6) (DOWN)
        22. Move the player from (4,3) to (5,3), box positions are updated to (6,3), (7,6) (DOWN)
        23. Move the player from (5,3) to (6,3), box positions are updated to (7,3), (7,6) (DOWN)
        
        You only need to provide the step-by-step plan in the format of a numbered list. Do not provide any other text or code.
        '''.strip()

        self.strategy_messages = [{"role": "system", "content": self.SYSTEM_PROMPT_STRATEGIES}]
        self.plan_messages = [{"role": "system", "content": self.SYSTEM_PROMPT_PLAN}]



    def query_llm_for_strategies(self, user_prompt=None): 
        total_messages = self.strategy_messages.copy()
        if user_prompt:
            total_messages.append({"role": "user", "content": user_prompt})
            
        response = self.llm.chat.completions.create(
            model="Qwen/Qwen3-32B",
            messages=total_messages,
            temperature=self.temperature,
            max_tokens=256
            
        )
        
        return response.choices[0].message.content


    def query_llm_for_plan(self, user_prompt=None):
        total_messages = self.plan_messages.copy()
        if user_prompt:
            total_messages.append({"role": "user", "content": user_prompt})
            
        response = self.llm.chat.completions.create(
            model="Qwen/Qwen3-32B",
            messages=total_messages,
            temperature=self.temperature,
            
        )
        
        return response.choices[0].message.content
    

    
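    def post_processing_strategy(self, response):
        """Strip the leading list numbers ("1.", "12.") from each strategy line."""
        strategies = response.strip().split('\n')
        for i, strategy in enumerate(strategies):
            # \d+ also handles lists with ten or more items, and \s* tolerates indentation.
            match = re.search(r'^\s*\d+\.\s*(.*)', strategy)

            if match:
                strategies[i] = match.group(1)

        return strategies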
    
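    def post_processing_plan(self, response):
        """Convert the numbered plan into a move string such as "UULLD"."""
        steps = ""
        for step in response.strip().split('\n'):
            # Compare in upper case so that tags such as "(RiGHT)" are still recognised.
            step = step.upper()
            if "MOVE THE PLAYER " in step:
                # Require the parentheses for every direction so that a stray
                # word elsewhere in the step cannot be read as a move.
                if "(UP)" in step:
                    steps += "U"
                elif "(DOWN)" in step:
                    steps += "D"
                elif "(LEFT)" in step:
                    steps += "L"
                elif "(RIGHT)" in step:
                    steps += "R"

        return steps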
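
As an alternative to substring checks, the direction tag can be extracted with a regular expression anchored at the end of each line. The sketch below is a standalone illustration under the assumption that the model follows the prompt's format and ends every step with a tag such as `(UP)`; the names `DIRECTION_TAG`, `extract_moves` and `sample` are hypothetical and do not appear in the agent class.

```python
import re

# Matches a direction tag at the end of a line, e.g. "(UP)" or "(right)".
DIRECTION_TAG = re.compile(r"\((UP|DOWN|LEFT|RIGHT)\)\s*$", re.IGNORECASE)


def extract_moves(response: str) -> str:
    """Return one letter (U, D, L or R) per line that ends with a direction tag."""
    moves = []
    for line in response.splitlines():
        match = DIRECTION_TAG.search(line.strip())
        if match:
            # The first letter of the direction name is the move code.
            moves.append(match.group(1)[0].upper())
    return "".join(moves)


sample = """1. Move the player from (7,7) to (6,7), box positions are kept as (3,3), (5,5) (UP)
2. Move the player from (6,7) to (6,6), box positions are kept as (3,3), (5,5) (left)
A closing remark that mentions LEFT but carries no tag."""

print(extract_moves(sample))  # UL
```

Anchoring the pattern at the end of the line means that coordinates in parentheses earlier in the step, or direction words in free text, are never mistaken for a move.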


In [None]:
agent = SokobanAgent()
gamestate = SokobanGame(data_file)

error_info = ""
success = False
for attempt in range(6):
    # Re-read the data file on every attempt so that each plan is replayed from the initial state.
    map_lines = gamestate.read_map_from_file(data_file)
    gamestate.read_map(map_lines)
    current_mapObj = gamestate.mapObj
    

    plan_user_prompt = f'''
    MAP: {current_mapObj}
    Player Position: {gamestate.gameStateObj['player']}
    Box Positions: {gamestate.gameStateObj['boxes']}
    Target Positions: {gamestate.levelObj['goals']}

    {error_info}
    Now please provide your Step-by-Step plan to solve the sokoban:
    '''


    print("plan_user_prompt:", plan_user_prompt)
    plan_result = agent.query_llm_for_plan(plan_user_prompt)
    print(plan_result)

    steps = agent.post_processing_plan(plan_result)
    print("step_plan:", steps)

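    # Verify whether the generated steps solve the level.
    if steps:
        actions = gamestate.get_actions(steps)
        print("actions:", actions)

        for step_idx, action in enumerate(actions, start=1):
            is_valid = gamestate.makeMove(action)
            if not is_valid:
                error_info = f"The generated actions {actions} become invalid at action number {step_idx} ({action}). Please generate a new plan."
                # Stop replaying: the remaining moves assumed that this one succeeded.
                break
        else:
            # Every move was legal, so any failure is due to boxes left off their targets.
            error_info = f"The generated actions {actions} are all legal but leave some boxes off the targets. Please generate a new plan."

        if gamestate.isLevelFinished():
            print("Congratulations! The Sokoban level was solved successfully.")
            success = True
            break
        else:
            print("The Sokoban level is not solved yet; retrying with a new plan.")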


if not success:
    print("Reached the maximum number of iterations without finding a valid solution. Please try again.")
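
The loop above follows a propose, validate, retry pattern: each failed plan produces an error message that is added to the next prompt. A minimal sketch of that pattern, separated from the LLM and the simulator, is shown below. The function `solve_with_retries` and the stand-ins `fake_propose` and `fake_validate` are hypothetical names used only for illustration; in the notebook their roles are played by `query_llm_for_plan` and the replay through `makeMove`.

```python
from typing import Callable, Optional, Tuple


def solve_with_retries(
    propose: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 6,
) -> Tuple[Optional[str], int]:
    """Request plans until one validates or the attempt budget runs out.

    propose(feedback) returns a move string; validate(moves) returns None
    on success or an error message that is passed to the next attempt.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        moves = propose(feedback)
        error = validate(moves)
        if error is None:
            return moves, attempt
        feedback = error
    return None, max_attempts


# Stand-ins for the LLM call and the simulator, for illustration only.
def fake_propose(feedback: str) -> str:
    # Pretend the model corrects itself once it has seen an error message.
    return "RR" if "invalid" in feedback else "RL"


def fake_validate(moves: str) -> Optional[str]:
    return None if moves == "RR" else f"Plan {moves} is invalid at move 2."


print(solve_with_retries(fake_propose, fake_validate))  # ('RR', 2)
```

Keeping the proposer and validator as plain callables makes it possible to test the retry logic without spending API calls.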