This repository implements a custom Gymnasium-compatible environment simulating browser-based workflows such as login, navigation, and form submission on a sample website. Agents are trained using PPO and A2C to autonomously perform multi-step web interactions.
| Name | Role / Contribution |
|---|---|
| Hailey D'souza | Lead Developer |
| Sammak Ahmed | Developer |
```
project_root/
│
├─ envs/                 # Custom environment wrappers
│   └─ web_flow_env.py
│
├─ src/                  # Training / evaluation / gameplay scripts
│   ├─ eval_web.py
│   ├─ graphs.py
│   └─ training_web.py
│
├─ web_app/              # Local test web app templates
│   ├─ templates/
│   │   ├─ index.html
│   │   ├─ login.html
│   │   ├─ navbar.html
│   │   ├─ contact.html
│   │   └─ dashboard.html
│   └─ server.py
│
├─ models/webflow/       # Saved PPO agent, created automatically during testing; demonstrates reproducibility and file structure
├─ logs/                 # Training and evaluation logs
├─ configs/              # YAML configs for algorithms, rewards, seeds, personas
├─ requirements.txt
└─ README.md
```
- Ensure Python 3.9+ is installed.
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```

  or create a virtual environment:

  ```bash
  python3 -m venv .venv && source .venv/bin/activate
  pip install --upgrade pip
  pip install -r requirements.txt
  ```

This setup requires two active terminals running concurrently:
Run the Flask web app (this must stay active while training or evaluation runs):

```bash
python server.py
```

This starts a local web server at http://127.0.0.1:5000, which the DRL agent interacts with through Selenium.
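For orientation, here is a minimal sketch of what `web_app/server.py` might look like if it simply serves the templates listed in the project structure. The routes, methods, and login behaviour are assumptions based on this README, not the actual implementation:

```python
# Minimal sketch (assumed, not the verbatim web_app/server.py):
# serves the template pages the agent navigates through.
from flask import Flask, render_template, request

app = Flask(__name__, template_folder="templates")

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/login", methods=["GET", "POST"])
def login():
    # Assumed behaviour: a submitted login form leads to the dashboard.
    if request.method == "POST":
        return render_template("dashboard.html")
    return render_template("login.html")

@app.route("/contact", methods=["GET", "POST"])
def contact():
    return render_template("contact.html")

if __name__ == "__main__":
    # Port 5000 matches the URL the agent targets (http://127.0.0.1:5000).
    app.run(host="127.0.0.1", port=5000)
```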
Use src/training_web.py to train a PPO or A2C model on the WebFlowEnv.
| Argument | Description | Default |
|---|---|---|
| `--algo` | RL algorithm: `ppo` or `a2c` | `ppo` |
| `--timesteps` | Total training timesteps | 8000 |
| `--seed` | Random seed for reproducibility | 42 |
| `--persona` | Agent persona: `form_filler` or `explorer` | `form_filler` |
Train a PPO model for form-filling tasks:

```bash
python -m src.training_web --algo ppo --persona form_filler --timesteps 8000
```

Train an A2C agent for exploration tasks:

```bash
python -m src.training_web --algo a2c --persona explorer --timesteps 5000
```
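As a hedged sketch, the core of `src/training_web.py` would be expected to look roughly like the following when built on Stable-Baselines3. The environment import path, the `WebFlowEnv` constructor signature, and the save-path naming are assumptions inferred from the structure above:

```python
# Hedged sketch of the expected training flow (not the verbatim src/training_web.py).
import argparse

from stable_baselines3 import A2C, PPO

from envs.web_flow_env import WebFlowEnv  # assumed import path and class name

ALGOS = {"ppo": PPO, "a2c": A2C}

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--algo", choices=["ppo", "a2c"], default="ppo")
    parser.add_argument("--timesteps", type=int, default=8000)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--persona", choices=["form_filler", "explorer"], default="form_filler")
    args = parser.parse_args()

    env = WebFlowEnv(persona=args.persona)  # assumed constructor signature
    model = ALGOS[args.algo]("MlpPolicy", env, seed=args.seed, verbose=1)
    model.learn(total_timesteps=args.timesteps)

    # Assumed naming convention, e.g. models/webflow/ppo_formfiller_final.zip
    model.save(f"models/webflow/{args.algo}_{args.persona.replace('_', '')}_final")
    env.close()

if __name__ == "__main__":
    main()
```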
Evaluate a trained model and log metrics using `src/eval_web.py`.

| Argument | Description | Default |
|---|---|---|
| `--algo` | Algorithm type: `ppo` or `a2c` | `ppo` |
| `--model_path` | Path to the saved model | required |
| `--episodes` | Number of evaluation episodes | 5 |
| `--persona` | Reward persona used for evaluation | `form_filler` |
Evaluate a PPO model visually:

```bash
python -m src.eval_web --algo ppo --model_path models/webflow/ppo_formfiller_final.zip --episodes 5
```

If the trained model file is missing, the script automatically generates dummy evaluation data in:

`logs/ppo_formfiller_eval.csv`
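A hedged sketch of the evaluation loop follows. The `WebFlowEnv` constructor and the `info` keys used for the success flags and timing are assumptions; the CSV columns match the preview table later in this README:

```python
# Hedged sketch of the evaluation loop (not the verbatim src/eval_web.py).
import csv

from stable_baselines3 import PPO

from envs.web_flow_env import WebFlowEnv  # assumed import path and class name

def evaluate(model_path: str, episodes: int = 5, persona: str = "form_filler") -> None:
    env = WebFlowEnv(persona=persona)  # assumed constructor
    model = PPO.load(model_path)

    with open("logs/ppo_formfiller_eval.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Episode", "Total Reward", "Login Success",
                         "Contact Visit", "Steps", "Time Elapsed"])
        for episode in range(1, episodes + 1):
            obs, info = env.reset()
            total_reward, steps, done = 0.0, 0, False
            while not done:
                action, _ = model.predict(obs, deterministic=True)
                obs, reward, terminated, truncated, info = env.step(action)
                total_reward += reward
                steps += 1
                done = terminated or truncated
            # Success flags and timing are assumed to be reported through `info`.
            writer.writerow([episode, total_reward, info.get("login_success", 0),
                             info.get("contact_visit", 0), steps,
                             info.get("time_elapsed", 0.0)])
    env.close()
```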
Use src/graph_results.py to visualize training results and evaluation metrics.
```bash
python -m src.graph_results
```

| File | Description |
|---|---|
| `logs/ppo_formfiller_eval.csv` | Episode-level metrics |
| `logs/reward_vs_episode.png` | Reward progression plot |
| Action | Description |
|---|---|
| 0 | Click next / proceed |
| 1 | Fill form input |
| 2 | Submit form |
| 3 | Navigate to next page |
Each observation represents a simplified web state (page progress, form completion, and navigation success indicators).
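Expressed as a Gymnasium interface, the discrete action set and simplified state above would look roughly like the sketch below. The observation layout, class internals, and constructor are assumptions, not the actual `envs/web_flow_env.py`; the reward shaping terms are listed in the table that follows:

```python
# Hedged sketch of the WebFlowEnv interface (assumed, not the actual envs/web_flow_env.py).
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class WebFlowEnv(gym.Env):
    """Simplified web-workflow environment: 4 discrete actions, compact state vector."""

    def __init__(self, persona: str = "form_filler"):
        super().__init__()
        self.persona = persona
        # 0: click next, 1: fill form input, 2: submit form, 3: navigate to next page
        self.action_space = spaces.Discrete(4)
        # Assumed layout: [page progress, form completion, navigation success]
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._state = np.zeros(3, dtype=np.float32)
        return self._state, {}

    def step(self, action):
        # The real env drives a Selenium-controlled browser here;
        # rewards follow the table below.
        reward, terminated, truncated = 0.0, False, False
        return self._state, reward, terminated, truncated, {}
```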

| Event | Reward |
|---|---|
| Successful login | +10 |
| Correct form input | +5 |
| Navigation to contact page | +8 |
| Repeated failure | -2 |
| Timeout | -5 |
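One straightforward way to encode the table above inside the environment's step logic, shown as an illustrative sketch (the event names are assumptions):

```python
# Illustrative reward mapping matching the table above (event keys are assumed).
REWARDS = {
    "login_success": +10.0,
    "correct_form_input": +5.0,
    "contact_page_visited": +8.0,
    "repeated_failure": -2.0,
    "timeout": -5.0,
}

def compute_reward(events: set[str]) -> float:
    """Sum the reward contributions of all events observed during one step."""
    return sum(REWARDS[name] for name in events if name in REWARDS)
```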
The evaluation script logs:
- Total reward per episode
- Login success rate
- Contact page visit success
- Steps taken
- Time elapsed
Generated figures:
- Reward curve
- Score distributions
- Success rate analysis
Evaluation CSV Preview:
| Episode | Total Reward | Login Success | Contact Visit | Steps | Time Elapsed |
|---|---|---|---|---|---|
| 1 | 12.0 | 1 | 1 | 15 | 18.2 |
| 2 | 9.5 | 1 | 0 | 12 | 14.6 |
After setup, run:
```bash
python -m src.training_web
python -m src.eval_web
python -m src.graph_results
```

Expected outputs:
- ✅ Automated Chrome browser interaction
- ✅ Evaluation CSV with metrics
- ✅ Reward vs Episode graph in `logs/`
- ✅ Compatible with headless browser automation (Selenium + Gymnasium)
- ✅ Uses PPO and A2C algorithms from Stable-Baselines3
- ✅ CSV + PNG outputs for report submission
- ✅ Dummy model/eval fallback for reproducibility
- ❌ Long training times if timesteps > 5000
- ❌ Browser window may freeze if not headless
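If the browser window freezes during long runs, running Chrome headlessly usually helps. A generic Selenium configuration snippet (not tied to this repo's exact environment code):

```python
# Generic headless Chrome setup for Selenium (illustrative, not the repo's exact code).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")        # run Chrome without a visible window
options.add_argument("--window-size=1280,900")
driver = webdriver.Chrome(options=options)
driver.get("http://127.0.0.1:5000")
```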