# 1. What is Grid-World?

Grid-World is the most commonly used example of a discrete Markov Decision Process (MDP) environment in reinforcement learning (RL). An agent is located on a two-dimensional grid, and each grid cell represents a state. The agent moves by taking actions, receives rewards, and enters a new state.
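
One step of this interaction can be written compactly. The time index $t$ and the symbols $s_t$, $a_t$, $r_{t+1}$ are notation assumed here for illustration; $\mathbb{S}$ and $\mathbb{A}$ are the state and action sets defined in Section 2.

$$
s_t \xrightarrow{\;a_t\;} (r_{t+1},\, s_{t+1}), \qquad s_t, s_{t+1} \in \mathbb{S},\; a_t \in \mathbb{A}
$$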

# 2. Establish the key components of Grid-World (the five elements of an MDP)

## 2.1 State set $\mathbb{S}$

Index every grid cell by its row $i$ and column $j$.

For an $m \times n$ grid:

$$
\mathbb{S} = \{(i,j) \mid i = 1, \dots, m,\; j = 1, \dots, n\}
$$
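
As a minimal sketch, the state set can be enumerated directly from this definition. The helper name `build_state_set` is hypothetical and is not part of the environment class in this document.

```python
from itertools import product


def build_state_set(m, n):
    """Return every cell (i, j) of an m x n grid, using 1-based indices."""
    return [(i, j) for i, j in product(range(1, m + 1), range(1, n + 1))]


states = build_state_set(3, 3)
print(len(states))            # 9
print(states[0], states[-1])  # (1, 1) (3, 3)
```

The cells come out in row-major order, so the first state is the top-left corner and the last is the bottom-right corner.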

## 2.2 Action set $\mathbb{A}$

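Usually, five actions are used: up, right, down, left, and stay. A move that would leave the grid is clamped, so the agent remains in the boundary cell.

The code is: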
```python
# Action names; 'a1'..'a5' are accepted as aliases, in the same order.
self.actions = ['up', 'right', 'down', 'left', 'stay']

# -------- Apply the move --------
# Excerpt from the step method: (i, j) is the current state, and
# max/min clamp the move so the agent stays inside the grid.
if action == 'a1' or action == 'up':
    next_state = (max(i - 1, 1), j)
elif action == 'a2' or action == 'right':
    next_state = (i, min(j + 1, self.cols))
elif action == 'a3' or action == 'down':
    next_state = (min(i + 1, self.rows), j)
elif action == 'a4' or action == 'left':
    next_state = (i, max(j - 1, 1))
elif action == 'a5' or action == 'stay':
    next_state = self.state
else:
    raise ValueError("Invalid action")
```
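
The same clamped-move rule can also be sketched as a lookup table of row and column offsets. This is an alternative formulation for illustration only; `ACTION_DELTAS` and `move` are hypothetical names, not part of the class in this document.

```python
# Hypothetical lookup table: action name -> (row offset, column offset).
ACTION_DELTAS = {
    'up': (-1, 0),
    'right': (0, 1),
    'down': (1, 0),
    'left': (0, -1),
    'stay': (0, 0),
}


def move(state, action, rows, cols):
    """Apply an action to a 1-based (i, j) state and clamp it to the grid."""
    if action not in ACTION_DELTAS:
        raise ValueError(f"Invalid action: {action!r}")
    i, j = state
    di, dj = ACTION_DELTAS[action]
    # Keep the result inside rows 1..rows and columns 1..cols.
    return (min(max(i + di, 1), rows), min(max(j + dj, 1), cols))


print(move((1, 1), 'up', 3, 3))     # (1, 1): blocked by the top edge
print(move((1, 1), 'right', 3, 3))  # (1, 2)
print(move((3, 3), 'down', 3, 3))   # (3, 3): blocked by the bottom edge
```

A table like this keeps the boundary handling in one place, at the cost of being slightly less explicit than the if/elif chain.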

```python
import numpy as np

class GridWorld:
    """Grid-world environment.

    Coordinate layout (1-based; row i grows downward, column j grows rightward):
    (1,1)----->(1,n)
    |
    |
    v
    (m,1)----->(m,n)

    Example:
    env = GridWorld(
        rows=3,
        cols=3,
        start_state=(1, 1),
        forbidden_states={(2, 2)},
        target_state=(3, 3))
    """

    def __init__(
        self,
        rows,
        cols,
        start_state,
        forbidden_states,
        target_state
    ):
        """
        rows, cols: grid size
        start_state: initial state (i, j)
        forbidden_states: set of forbidden states {(i, j), ...}
        target_state: target state (i, j)
        """
        self.rows = rows
        self.cols = cols
        self.actions = ['up', 'right', 'down', 'left', 'stay']
        self.start_state = start_state
        self.forbidden_states = set(forbidden_states)
        self.target_state = target_state
        # Assumption: only the target is terminal; forbidden cells are
        # penalized but can still be entered.
        self.terminal_states = {target_state}
        # Assumed reward values: -5 for a forbidden cell, +10 for the target.
        # Any other cell falls back to the default step reward of -1.
        self.rewards = {s: -5 for s in self.forbidden_states}
        self.rewards[target_state] = 10
        self.state = start_state

    def reset(self):
        """
        Reset the environment to the initial state.
        """
        self.state = self.start_state
        return self.state

    def take_action_to_next_state_and_reward(self, action):
        """
        action ∈ {a1:up, a2:right, a3:down, a4:left, a5:stay}
        Returns: next_state, reward, done
        """
        # Once a terminal state is reached, the agent stays there with zero reward.
        if self.state in self.terminal_states:
            return self.state, 0, True

        i, j = self.state

        # -------- Apply the move (clamped to the grid boundary) --------
        if action == 'a1' or action == 'up':
            next_state = (max(i - 1, 1), j)
        elif action == 'a2' or action == 'right':
            next_state = (i, min(j + 1, self.cols))
        elif action == 'a3' or action == 'down':
            next_state = (min(i + 1, self.rows), j)
        elif action == 'a4' or action == 'left':
            next_state = (i, max(j - 1, 1))
        elif action == 'a5' or action == 'stay':
            next_state = self.state
        else:
            raise ValueError("Invalid action")

        # -------- Reward --------
        reward = self.rewards.get(next_state, -1)

        # -------- Termination check --------
        done = next_state in self.terminal_states

        self.state = next_state
        return next_state, reward, done

    def render(self):
        """
        Print the current grid: A marks the agent, T a terminal state,
        and X a forbidden state.
        """
        grid = [['.' for _ in range(self.cols)] for _ in range(self.rows)]
        for (i, j) in self.forbidden_states:
            grid[i-1][j-1] = 'X'
        for (i, j) in self.terminal_states:
            grid[i-1][j-1] = 'T'
        i, j = self.state
        grid[i-1][j-1] = 'A'

        for row in grid:
            print(' '.join(row))
        print()
```


```python
env = GridWorld(
    rows=3,
    cols=3,
    start_state=(1, 1),
    forbidden_states={(2, 2)},
    target_state=(3, 3)
)
```
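
To see how an agent interacts with such an environment, the following is a minimal sketch of a rollout under a uniformly random policy. The function `run_random_episode` and its parameters `max_steps` and `seed` are hypothetical. It only assumes that the environment exposes `reset()`, an `actions` list, and `take_action_to_next_state_and_reward(action)` returning `(next_state, reward, done)`.

```python
import random


def run_random_episode(env, max_steps=50, seed=None):
    """Roll out a uniformly random policy; return (trajectory, total_reward).

    Each trajectory entry is (state, action, reward, next_state).
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    state = env.reset()
    trajectory = []
    total_reward = 0
    for _ in range(max_steps):
        action = rng.choice(env.actions)
        next_state, reward, done = env.take_action_to_next_state_and_reward(action)
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:  # stop as soon as a terminal state is reached
            break
    return trajectory, total_reward
```

With an environment object such as `env`, a call like `run_random_episode(env, max_steps=20, seed=0)` returns the visited transitions and the sum of rewards. The `max_steps` cap is needed because a random policy may never reach the target.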