## Introduction

In this tutorial, we'll explore decision-making under uncertainty using reinforcement learning (RL). We'll introduce a simple 1D "Decision Learning Problem" in which an agent must choose actions (move left or right) that maximize its rewards while accounting for uncertainty. The same RL concepts apply in a wide range of fields, such as robotics, autonomous systems, medicine, finance, language, and vision.

## Decision Learning Problem
We'll start with a simple environment:

- The environment consists of 6 states (labeled 0 to 5), where 0 and 5 are terminal states.
- The agent starts in a non-terminal state (1 to 4).
- The goal is to find a policy, a mapping from states to actions (left or right), that maximizes rewards.

In [1]:
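class DecisionLearningProblem:
    """A 1D chain of six states (0 to 5) with terminal states at both ends."""

    def __init__(self):
        self.states = [0, 1, 2, 3, 4, 5]
        self.actions = ['left', 'right']
        self.terminal_states = [0, 5]
        # Reward for arriving in a terminal state; every other state yields 0.
        self.rewards = {0: 10, 5: -10}

    def is_terminal(self, state):
        return state in self.terminal_states

    def step(self, state, action):
        """Apply a deterministic move and return (next_state, reward)."""
        if action == 'left':
            next_state = max(0, state - 1)
        elif action == 'right':
            next_state = min(5, state + 1)
        else:
            # Without this branch, an unknown action would leave next_state unbound.
            raise ValueError(f"Unknown action: {action!r}")
        reward = self.rewards.get(next_state, 0)
        return next_state, reward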

class StaticPolicy:
    """A fixed, hand-written policy: each state moves toward its nearest terminal state."""

    def __init__(self, env):
        self.env = env
        self.policy = {1: 'left', 2: 'left', 3: 'right', 4: 'right'}

    def show_policy(self):
        print("Static Policy:", self.policy)

env = DecisionLearningProblem()
policy = StaticPolicy(env)
policy.show_policy()

# Take a single step from each non-terminal state under the static policy.
for state in range(1, 5):
    action = policy.policy[state]
    next_state, reward = env.step(state, action)
    print(f"In state {state}, taking action '{action}' leads to state {next_state} with reward {reward}.")


Static Policy: {1: 'left', 2: 'left', 3: 'right', 4: 'right'}
In state 1, taking action 'left' leads to state 0 with reward 10.
In state 2, taking action 'left' leads to state 1 with reward 0.
In state 3, taking action 'right' leads to state 4 with reward 0.
In state 4, taking action 'right' leads to state 5 with reward -10.
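
The output above shows only one step per state. To see what the static policy earns over a whole episode, and how it compares with other policies, the following is a minimal sketch under a few assumptions: the helper names `rollout` and `score` are hypothetical and not part of the original notebook, returns are undiscounted, and an episode is cut off after `max_steps` so that policies which bounce back and forth still terminate. The environment is re-declared in compact form so the cell runs on its own.

```python
from itertools import product

# Compact re-declaration of the chain environment used above.
REWARDS = {0: 10, 5: -10}          # terminal states and their rewards
ACTIONS = ('left', 'right')
NON_TERMINAL = (1, 2, 3, 4)


def step(state, action):
    """Deterministic transition, mirroring DecisionLearningProblem.step."""
    if action == 'left':
        next_state = max(0, state - 1)
    elif action == 'right':
        next_state = min(5, state + 1)
    else:
        raise ValueError(f"Unknown action: {action!r}")
    return next_state, REWARDS.get(next_state, 0)


def rollout(policy, start_state, max_steps=10):
    """Hypothetical helper: follow `policy` until a terminal state or max_steps.

    Returns (list of visited states, total undiscounted reward).
    """
    state, total, path = start_state, 0, [start_state]
    for _ in range(max_steps):     # cap guards against policies that loop forever
        if state in REWARDS:
            break
        state, reward = step(state, policy[state])
        total += reward
        path.append(state)
    return path, total


def score(policy):
    """Hypothetical helper: sum of episode returns over all start states."""
    return sum(rollout(policy, s)[1] for s in NON_TERMINAL)


static_policy = {1: 'left', 2: 'left', 3: 'right', 4: 'right'}
static_returns = {s: rollout(static_policy, s)[1] for s in NON_TERMINAL}
print("Static policy returns:", static_returns)

# Brute force: enumerate all 2**4 = 16 deterministic policies and keep the best.
candidates = [dict(zip(NON_TERMINAL, choice))
              for choice in product(ACTIONS, repeat=len(NON_TERMINAL))]
best_policy = max(candidates, key=score)
print("Best policy:", best_policy, "with score", score(best_policy))
```

Under these assumptions, the static policy earns a return of 10 from states 1 and 2 and -10 from states 3 and 4, while the exhaustive search selects the policy that always moves left (score 40). Enumeration is only feasible because this problem has 16 candidate policies; it is meant as a baseline for comparison, not as a learning method.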
