# Reinforcement Learning with Knowledge Shaping (RL-KS)
## Comprehensive Analysis and Results

**Project Team:**
- Charith Kapuluru
- Bhargav Reddy Alimili
- Karthik Saraf
- Sreekanth Taduru
- Himesh Chander Addiga

This notebook presents the complete analysis of our experiments comparing baseline Reinforcement Learning (RL) with Reinforcement Learning with Knowledge Shaping (RL-KS).

In [None]:
import sys
import os
sys.path.append('src')

import numpy as np
import matplotlib.pyplot as plt
import pickle
from IPython.display import Image, display

# Set plot style
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

## 1. Introduction

### Problem Statement
Reinforcement learning is powerful, but an agent that learns a new task from scratch typically needs many environment interactions. Transfer learning through knowledge shaping can accelerate learning by incorporating prior knowledge from a related source task.

### Approach
We implemented:
1. **Baseline Q-Learning**: Standard tabular Q-learning agent
2. **RL-KS**: Q-learning enhanced with knowledge shaping from a source task
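
The difference between the two agents can be sketched as a single update rule. The snippet below is a minimal illustration, not the project's code in `src`. It assumes that knowledge shaping takes the form of potential-based reward shaping (Ng et al., 1999), with a potential `phi` derived from source-task knowledge and scaled by the shaping weight λ. The names `q_update` and `phi` are hypothetical.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, done, n_actions,
             alpha=0.1, gamma=0.99, lam=0.0, phi=None):
    """One tabular Q-learning step; lam > 0 adds a shaping bonus.

    ASSUMPTION: shaping is potential-based, F = gamma * phi(s') - phi(s),
    where phi is a hypothetical potential built from source-task knowledge.
    With lam = 0 (or phi = None) this reduces to plain Q-learning.
    """
    if phi is not None and lam != 0.0:
        # The potential of a terminal state is taken to be 0.
        phi_next = 0.0 if done else phi(s_next)
        r = r + lam * (gamma * phi_next - phi(s))
    # Bootstrapped target: no future value after a terminal transition.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# Baseline agent: the same transition without shaping.
Q = defaultdict(float)
baseline = q_update(Q, s=0, a=1, r=-1.0, s_next=1, done=False, n_actions=4)

# RL-KS agent: toy potential in which a larger state index is closer to the goal.
Q_ks = defaultdict(float)
phi = lambda s: float(s)
shaped = q_update(Q_ks, s=0, a=1, r=-1.0, s_next=1, done=False,
                  n_actions=4, lam=0.5, phi=phi)
print(baseline, shaped)
```

In this toy transition the shaped agent receives a less negative update than the baseline because the move increases the potential, which is how a useful prior would steer early exploration.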

### Hypothesis
We expect RL-KS to converge faster than baseline RL, or to need fewer training episodes to reach the same performance, because it starts from prior knowledge.

## 2. Experimental Setup

### Environments
1. **GridWorld**: Discrete navigation task (5×5 grid)
   - Source task: Specific obstacle configuration
   - Target task: Different obstacle configuration (similar structure)

2. **CartPole**: Classic control task with discretized state space
   - Source task: Coarse-grained discretization (4×4×4×4)
   - Target task: Fine-grained discretization (6×6×6×6)
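
To make the two CartPole discretizations concrete, here is a small sketch of how a continuous observation could be binned at both resolutions. The clipping bounds in `BOUNDS` and the equal-width binning are assumptions chosen for illustration; the project's actual bin edges may differ.

```python
from bisect import bisect_right

# ASSUMED clipping bounds per dimension: cart position, cart velocity,
# pole angle (rad), pole angular velocity. Not taken from the project code.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]

def make_edges(low, high, n_bins):
    """Interior bin edges that split [low, high] into n_bins equal-width bins."""
    width = (high - low) / n_bins
    return [low + width * i for i in range(1, n_bins)]

def discretize(obs, n_bins):
    """Map a continuous 4-D observation to a tuple of bin indices."""
    state = []
    for x, (low, high) in zip(obs, BOUNDS):
        edges = make_edges(low, high, n_bins)
        # bisect_right counts the edges <= x, giving an index in 0..n_bins-1;
        # values outside the bounds fall into the first or last bin.
        state.append(bisect_right(edges, x))
    return tuple(state)

obs = (0.5, -1.2, 0.05, 2.0)
coarse = discretize(obs, n_bins=4)  # source task: 4**4 = 256 states
fine = discretize(obs, n_bins=6)    # target task: 6**4 = 1296 states
print(coarse, fine)
```

The same observation lands on different index tuples in the two tasks, so a Q-table learned on the source task cannot be indexed directly with target-task states.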

### Hyperparameters
- Learning rate (α): 0.1
- Discount factor (γ): 0.99
- Initial epsilon (ε): 1.0
- Epsilon decay: 0.995
- Minimum epsilon: 0.01
- Knowledge shaping weight (λ): 0.5 (GridWorld), 0.3 (CartPole)
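
The exploration settings above imply a schedule that can be checked with a few lines. This sketch assumes ε is multiplied by the decay factor once per episode and then clipped at the minimum. The notebook does not state when the decay is applied, so treat that as an assumption.

```python
import math

EPS_START, EPS_DECAY, EPS_MIN = 1.0, 0.995, 0.01

def epsilon_at(episode):
    """Exploration rate after `episode` decay steps (assumed one per episode)."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** episode)

# Smallest number of decay steps after which epsilon sits at its floor.
episodes_to_floor = math.ceil(math.log(EPS_MIN / EPS_START) / math.log(EPS_DECAY))
print(episodes_to_floor, round(epsilon_at(100), 3))
```

Under this assumption ε only reaches its floor after roughly 920 episodes, so a shorter training run would still be exploring noticeably at the end.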

## 3. GridWorld Experiment Results

In [None]:
# Load GridWorld results
with open('results/gridworld_results.pkl', 'rb') as f:
    gridworld_results = pickle.load(f)

gw_training = gridworld_results['training_results']
gw_eval = gridworld_results['eval_results']

print("GridWorld Results Summary")
print("="*60)
for name, results in gw_eval.items():
    print(f"\n{name}:")
    print(f"  Average Reward: {results['avg_reward']:.2f} ± {results['std_reward']:.2f}")
    print(f"  Success Rate: {results['success_rate']:.2%}")
    print(f"  Average Episode Length: {results['avg_length']:.2f}")

In [None]:
# Display GridWorld comparison plot
display(Image('results/gridworld_comparison.png'))

### GridWorld Analysis

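Both agents reached similar final performance, each with a 100% success rate. For GridWorld this indicates that:
- Knowledge transfer does not hurt final performance when the source and target tasks are structurally similar
- Both agents converge to policies that reliably reach the goal
- The task is simple enough for both approaches to succeed, so any benefit from transfer would show up in learning speed rather than in final performance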
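
Since the final metrics are the same for both agents, one way to look for a transfer benefit is to measure how many episodes each agent needs to reach a target performance. The helper below is a hypothetical sketch of such a measure. The threshold, the window size, and the two synthetic curves are placeholders, not the project's results.

```python
def episodes_to_threshold(rewards, threshold, window=20):
    """Return the first episode index at which the trailing moving average
    over `window` episodes reaches `threshold`, or None if it never does."""
    running = 0.0
    for i, r in enumerate(rewards):
        running += r
        if i >= window:
            running -= rewards[i - window]  # drop the value leaving the window
        if i >= window - 1 and running / window >= threshold:
            return i
    return None

# Synthetic curves for illustration only (NOT the project's results).
slow = [min(1.0, 0.01 * t) for t in range(200)]
fast = [min(1.0, 0.02 * t) for t in range(200)]
print(episodes_to_threshold(slow, 0.9), episodes_to_threshold(fast, 0.9))
```

Applied to `gw_training[name]['episode_rewards']` for each agent, the same function would give a per-agent sample-efficiency number to compare.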

## 4. CartPole Experiment Results

In [None]:
# Load CartPole results
with open('results/cartpole_results.pkl', 'rb') as f:
    cartpole_results = pickle.load(f)

cp_training = cartpole_results['training_results']
cp_eval = cartpole_results['eval_results']

print("CartPole Results Summary")
print("="*60)
for name, results in cp_eval.items():
    print(f"\n{name}:")
    print(f"  Average Reward: {results['avg_reward']:.2f} ± {results['std_reward']:.2f}")
    print(f"  Success Rate: {results['success_rate']:.2%}")
    print(f"  Average Episode Length: {results['avg_length']:.2f}")

In [None]:
# Display CartPole comparison plot
display(Image('results/cartpole_comparison.png'))

### CartPole Analysis

In CartPole, baseline RL outperformed RL-KS. Three related factors likely explain this:
1. **Domain Mismatch**: The source and target tasks used different discretizations (4 vs. 6 bins per dimension)
2. **Negative Transfer**: Prior knowledge from the coarser discretization may have interfered with learning on the finer one
3. **State Space Difference**: Q-values learned over 4⁴ = 256 coarse states do not line up with the 6⁴ = 1296 fine states

This suggests that knowledge transfer helps only when the source and target representations are sufficiently compatible.
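
The mismatch between the two discretizations can be made concrete with a per-dimension index mapping. The sketch below assigns each fine bin to the coarse bin that contains its centre. It assumes equal-width bins over the same range in both tasks, and it is a hypothetical bridge between the two state spaces, not something the experiments used.

```python
def fine_to_coarse(fine_idx, n_fine=6, n_coarse=4):
    """Coarse bin containing the centre of fine bin `fine_idx`.

    ASSUMPTION: both discretizations use equal-width bins over the same range.
    The centre of fine bin i lies at (2i + 1) / (2 * n_fine) of the range;
    integer arithmetic avoids floating-point ties at bin edges.
    """
    return ((2 * fine_idx + 1) * n_coarse) // (2 * n_fine)

def map_state(fine_state, n_fine=6, n_coarse=4):
    """Apply the per-dimension mapping to a full state tuple."""
    return tuple(fine_to_coarse(i, n_fine, n_coarse) for i in fine_state)

mapping = [fine_to_coarse(i) for i in range(6)]
print(mapping)
print(map_state((3, 1, 3, 4)))
```

Because 6 bins do not nest inside 4, fine bins 1 and 4 straddle a coarse bin edge, so any such mapping hands some target states a value learned for a partly different region of the state space.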

## 5. Comparative Analysis

In [None]:
# Create detailed comparison plots
def plot_smoothed(ax, training, key, window, title, ylabel):
    """Plot a moving average of results[key] for every agent in `training`."""
    kernel = np.ones(window) / window
    for name, results in training.items():
        # mode='valid' drops the first window-1 points instead of zero-padding
        smoothed = np.convolve(results[key], kernel, mode='valid')
        ax.plot(smoothed, label=name, linewidth=2)
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.set_xlabel('Episode', fontsize=12)
    ax.set_ylabel(ylabel, fontsize=12)
    ax.legend(fontsize=11)
    ax.grid(True, alpha=0.3)

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Top row: GridWorld (window of 20 episodes)
plot_smoothed(axes[0, 0], gw_training, 'episode_rewards', 20,
              'GridWorld: Training Rewards', 'Reward (smoothed)')
plot_smoothed(axes[0, 1], gw_training, 'episode_lengths', 20,
              'GridWorld: Episode Lengths', 'Steps (smoothed)')

# Bottom row: CartPole (window of 50 episodes)
plot_smoothed(axes[1, 0], cp_training, 'episode_rewards', 50,
              'CartPole: Training Rewards', 'Reward (smoothed)')
plot_smoothed(axes[1, 1], cp_training, 'episode_lengths', 50,
              'CartPole: Episode Lengths', 'Steps (smoothed)')

plt.tight_layout()
plt.savefig('results/comprehensive_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("Comprehensive comparison plot saved!")

## 6. Key Findings

### Successes
1. **Implementation**: Successfully implemented both baseline Q-learning and RL-KS
2. **GridWorld Performance**: Both approaches reached a 100% success rate on GridWorld
3. **Framework**: Created reusable framework for knowledge transfer experiments

### Challenges
1. **CartPole Transfer**: Knowledge transfer was less effective due to:
   - Different state space discretizations
   - Potential negative transfer from mismatched representations
2. **Convergence Detection**: Convergence was judged with simple threshold-based metrics
3. **Computational Constraints**: The experiments were limited to tabular Q-learning

### Insights
1. **Task Similarity Matters**: Successful knowledge transfer requires similar state/action representations
2. **Discretization Impact**: State space discretization significantly affects transfer effectiveness
3. **Shaping Weight Tuning**: The knowledge shaping weight λ requires careful tuning

## 7. Future Improvements

1. **Better Transfer Methods**:
   - State mapping functions between different discretizations
   - Progressive neural networks for deep RL
   - Meta-learning approaches

2. **Advanced Algorithms**:
   - Deep Q-Networks (DQN) for continuous state spaces
   - Actor-Critic methods (A3C, PPO)
   - Successor representations

3. **Evaluation Metrics**:
   - Sample efficiency curves
   - Transfer learning metrics (jumpstart, asymptotic performance)
   - Statistical significance testing

4. **Additional Experiments**:
   - More diverse source-target task pairs
   - Multiple source tasks
   - Ablation studies on shaping weight
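
Two of the transfer metrics listed above, jumpstart and asymptotic performance, are simple to compute from a pair of learning curves. The following is a minimal sketch. The function name, the averaging window `k`, and the synthetic curves are assumptions for illustration and do not reflect the project's results.

```python
from statistics import mean

def transfer_metrics(baseline, transfer, k=10):
    """Compare two reward curves (one value per episode).

    jumpstart:  difference in mean reward over the first k episodes
    asymptotic: difference in mean reward over the last k episodes
    Positive values favour the transfer agent. k is an arbitrary choice.
    """
    return {
        "jumpstart": mean(transfer[:k]) - mean(baseline[:k]),
        "asymptotic": mean(transfer[-k:]) - mean(baseline[-k:]),
    }

# Synthetic curves for illustration only (NOT the project's results).
baseline_curve = [0.0] * 10 + [1.0] * 40
transfer_curve = [0.5] * 10 + [1.0] * 40
metrics = transfer_metrics(baseline_curve, transfer_curve)
print(metrics)
```

For a GridWorld-style outcome, where both agents end at the same performance, the asymptotic term would be near zero and any benefit would appear in the jumpstart term.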

## 8. Conclusion

This project implemented and evaluated Reinforcement Learning with Knowledge Shaping. RL-KS showed promise in GridWorld, where the source and target tasks are structurally similar, but it underperformed the baseline in CartPole because of representation mismatches.

**Key Takeaways**:
- Knowledge transfer can accelerate learning when source and target tasks are sufficiently similar
- State representation compatibility is crucial for successful transfer
- Careful hyperparameter tuning (especially shaping weight) is essential
- Baseline RL remains robust when transfer assumptions don't hold

This work provides a foundation for understanding knowledge transfer in RL and highlights important considerations for practical applications.

## 9. References

1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
2. Taylor, M. E., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. JMLR.
3. Ng, A. Y., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. ICML.
4. Brys, T., et al. (2015). Reinforcement learning from demonstration through shaping. IJCAI.
5. OpenAI Gymnasium Documentation: https://gymnasium.farama.org/