This project is an interactive web dashboard built with Flask + Vanilla JavaScript to visualize how classical Reinforcement Learning (RL) algorithms behave in different environments.
It focuses on making value functions, policies, and agent behavior visible and intuitive for students and practitioners.
- GridWorldEnvironment
  - Deterministic grid
  - Start, Goal, wall penalties
- FrozenLakeEnvironment
  - Stochastic FrozenLake-style grid
  - Holes + slip probability
- Policy Evaluation
- Policy Improvement
- Policy Iteration
- Value Iteration
- Monte Carlo (Every-Visit & First-Visit)
- Temporal Difference TD(0)
- Color-coded grid:
  - S – Start
  - G – Goal
  - H – Hole
  - F – Frozen / Free
- Policy arrows (↑ ↓ ← →)
- Animated agent movement
- Episode return plots (Chart.js)
- Tables for state values & rewards
- Grid size (rows, cols)
- Slip probability (FrozenLake)
- Discount factor γ
- Tolerance θ
- Episodes, learning rate α, exploration ε
- Stochastic simulation toggle
- Episode trace history
All algorithms share a dictionary-based implementation style for clarity:
```python
v[state]
Q[state][action]
policy[state]
```

For example, iterative policy evaluation:

```python
def policy_evaluation(givenPolicy, enviro, theta=0.0001, MAX=1000, gamma=0.9):
    # Start every state's value at 0.0
    v = {s: 0.0 for s in enviro.states}
    for _ in range(MAX):
        delta = 0.0
        for s in enviro.states:
            old_value = v[s]
            new_value = 0.0
            action = givenPolicy[s]
            # Expected one-step return of the policy's action
            for prob, next_state, reward in enviro.Prob[s][action]:
                new_value += prob * (reward + gamma * v[next_state])
            v[s] = new_value
            delta = max(delta, abs(new_value - old_value))
        # Converged once the largest value change drops below the tolerance
        if delta < theta:
            break
    return v
```

- Policy Improvement (see the sketch after this list)
- Policy Iteration (with history)
- Value Iteration (with optional trace)
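The greedy improvement step can be written in the same dictionary style. The sketch below is a minimal illustration assuming the environment exposes `states`, `actions`, and `Prob` as in the evaluation code above; it is not the project's exact implementation:

```python
def policy_improvement(v, enviro, gamma=0.9):
    # Make the policy greedy with respect to v via a one-step lookahead.
    # `enviro.actions` is an assumed attribute listing the available actions.
    policy = {}
    for s in enviro.states:
        best_action, best_value = None, float("-inf")
        for a in enviro.actions:
            q = sum(prob * (reward + gamma * v[next_state])
                    for prob, next_state, reward in enviro.Prob[s][a])
            if q > best_value:
                best_action, best_value = a, q
        policy[s] = best_action
    return policy
```

Policy iteration then alternates this step with policy evaluation until the policy stops changing.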
Located in Algorithms/Monte_CarloTypes.py
- `monte_carlo_every_visit(...)`
- `monte_carlo_first_visit(...)`
Features:
- Episode generation
- Optional stochastic transitions
- Return tracking for learning curves
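As a rough illustration of the every-visit variant, the sketch below averages discounted returns for every occurrence of a state. The signature is illustrative, and `generate_episode` is an assumed helper that rolls out the policy and returns a list of (state, reward) pairs:

```python
def monte_carlo_every_visit(enviro, policy, episodes=500, gamma=0.9):
    # Illustrative sketch, not the project's exact code.
    v = {s: 0.0 for s in enviro.states}
    counts = {s: 0 for s in enviro.states}
    episode_returns = []
    for _ in range(episodes):
        # `generate_episode` is a hypothetical helper: roll out `policy`
        # in `enviro` and return a list of (state, reward) pairs.
        episode = generate_episode(enviro, policy)
        G = 0.0
        per_state_returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            per_state_returns.append((state, G))
        episode_returns.append(G)  # total discounted return of the episode
        for state, g in per_state_returns:
            counts[state] += 1
            v[state] += (g - v[state]) / counts[state]  # incremental mean
    return v, episode_returns
```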
Located in Algorithms/Temporal_Differance.py
Update rule:
$$
Q(s,a) \leftarrow Q(s,a) + \alpha \big(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big)
$$
Supports:
- ε-greedy exploration
- Episode return history
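A compact sketch of this update and of ε-greedy action selection in the same dictionary style; the function names and signatures are illustrative assumptions, not the project's API:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # Explore with probability epsilon, otherwise pick the best-known action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[state][a])

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Tabular update matching the rule above; Q is Q[state][action] -> float.
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```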
- States: (i, j)
- Actions: up, down, left, right
- Deterministic transitions
- Rewards:
  - Wall → -1
  - Normal → 0
  - Goal → +10
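To connect this reward scheme to the `(prob, next_state, reward)` tuples consumed by the algorithms, here is a hedged sketch of how a deterministic transition table could be built. The function name, the `walls` argument, and the exact reward placement are assumptions for illustration:

```python
def build_transitions(rows, cols, goal, walls):
    # Hypothetical sketch of Prob[state][action] = [(prob, next_state, reward)].
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    Prob = {}
    for i in range(rows):
        for j in range(cols):
            Prob[(i, j)] = {}
            for action, (di, dj) in moves.items():
                ni, nj = i + di, j + dj
                if not (0 <= ni < rows and 0 <= nj < cols) or (ni, nj) in walls:
                    # Hitting a wall or the boundary: stay in place, reward -1
                    Prob[(i, j)][action] = [(1.0, (i, j), -1)]
                elif (ni, nj) == goal:
                    Prob[(i, j)][action] = [(1.0, (ni, nj), 10)]
                else:
                    Prob[(i, j)][action] = [(1.0, (ni, nj), 0)]
    return Prob
```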
Extends GridWorld with:
- Random holes
- Slip probability
- Stochastic transitions
- Rewards:
  - Goal → +10
  - Hole / invalid → -1
  - Otherwise → 0
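One common way to model a slip probability (an assumption about this project's exact scheme) is to replace the intended move with a perpendicular one some fraction of the time:

```python
import random

# Hypothetical sketch of a slippery step: with probability `slip` the agent
# moves in a random perpendicular direction instead of the intended one.
PERPENDICULAR = {
    "up": ["left", "right"], "down": ["left", "right"],
    "left": ["up", "down"], "right": ["up", "down"],
}

def slippery_action(intended, slip=0.2):
    if random.random() < slip:
        return random.choice(PERPENDICULAR[intended])
    return intended
```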
- `GET /`
- `GET /env/<env_name>`
- `GET /api/run_algorithm`
- `POST /api/simulate_policy`
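For quick testing outside the browser, the run endpoint can be called directly. In this sketch the query parameter names are assumptions for illustration, not documented API parameters:

```python
import requests

resp = requests.get(
    "http://127.0.0.1:5000/api/run_algorithm",
    # Hypothetical parameter names; adjust to the actual API.
    params={"env": "frozenlake", "algorithm": "value_iteration", "gamma": 0.9},
)
data = resp.json()
print(data["policy"], data["values"])
```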
Example response:
```json
{
  "policy": { "(i,j)": "action" },
  "values": { "(i,j)": 1.23 },
  "episode_returns": [],
  "trajectory": [],
  "env": "frozenlake",
  "rows": 4,
  "cols": 4
}
```

- HTML + CSS
- Vanilla JavaScript
- Chart.js
- Environment selection cards
- Algorithm control panel
- Grid visualization
- Agent animation
- Value & reward tables
- Learning curves
```bash
git clone https://github.com/Mariam-1611/Reinforcement_Learning_Algorithms_in_Different_Environment.git
cd Reinforcement_Learning_Algorithms_in_Different_Environment
```

```bash
python -m venv .venv

# Windows
.venv\Scripts\activate

# Linux / Mac
source .venv/bin/activate
```

```bash
pip install -r requirements.txt
```

```bash
python app.py
```

Open: http://127.0.0.1:5000
```
Reinforcement_Learning_Algorithms_in_Different_Environment/
│
├─ Algorithms/
│  ├─ Policy_Iteration.py
│  ├─ Value_Iteration.py
│  ├─ Monte_CarloTypes.py
│  └─ Temporal_Differance.py
│
├─ GridWorld_Enviroment.py
├─ FrozenLake_Enviroment.py
├─ app.py
├─ static/
│  ├─ style.css
│  └─ script.js
├─ templates/
│  ├─ index.html
│  └─ env_detail.html
├─ requirements.txt
└─ README.md
```
- Name: Mariam
- GitHub: @Mariam-1611