# Reinforcement Learning

Select one of the following research papers, read it, then write a critical summary of it in about **600 words**.

- [A scalable approach to optimize traffic signal control with federated reinforcement learning](https://www.nature.com/articles/s41598-023-46074-3)
- [Faster sorting algorithms discovered using deep reinforcement learning](https://www.nature.com/articles/s41586-023-06004-9)
- [Discovering faster matrix multiplication algorithms with reinforcement learning](https://www.nature.com/articles/s41586-022-05172-4)
- [Educational Timetabling: Problems, Benchmarks, and State-of-the-Art Results](https://arxiv.org/abs/2201.07525)
- [Deep Reinforcement Learning in Surgical Robotics: Enhancing the Automation Level](https://arxiv.org/abs/2309.00773)
- [Reinforcement Learning for Battery Management in Dairy Farming](https://arxiv.org/abs/2308.09023)
- [Integrating Renewable Energy in Agriculture: A Deep Reinforcement Learning-based Approach](https://arxiv.org/abs/2308.08611)

Your summary must capture the key ingredients of Reinforcement Learning mentioned in the paper, e.g. specification of the environment, agent, reward, etc.
Do not cover the background material already explained in the lectures.

**N.B.** If you use any images, put them in the `img` folder and include them using `![](img/image_filename)`.

## Paper summary

<h2>Title: A scalable approach to optimize traffic signal control with federated reinforcement learning</h2>

<h3>1. Reinforcement Learning Framework</h3>
<p>The paper formulates Traffic Signal Control (TSC) as an RL problem with the following components:</p>

<ul>
  <li><strong>Environment:</strong> Urban traffic network with multiple heterogeneous intersections (varying lane configurations and phase settings)</li>
  <li><strong>Agent:</strong> Traffic signal controller at each intersection</li>
  <li><strong>State Space:</strong> Multidimensional matrix representing queue length, vehicle count, position/speed, delay time, and current phase</li>
  <li><strong>Action Space:</strong> Phase selection from intersection-specific valid phase sets</li>
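  <li><strong>Reward Function:</strong> Negative weighted sum of the average queue length (H) and the waiting time (T), R = -(H + αT), where α sets the relative weight of waiting time; shorter queues and waits therefore yield a higher reward</li>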
  <li><strong>Algorithm:</strong> Deep Q-Network (DQN) with experience replay and target network</li>
</ul>
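
<p>The reward and the DQN update target can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the values of <code>ALPHA</code> and <code>GAMMA</code>, the function names, and the choice to restrict the max to an intersection's valid phase indices are assumptions made for the example.</p>

```python
from typing import Sequence

# Assumed hyperparameters for illustration; the paper's actual values may differ.
GAMMA = 0.95  # discount factor
ALPHA = 0.5   # weight of waiting time T relative to queue length H


def reward(avg_queue_length: float, avg_waiting_time: float, alpha: float = ALPHA) -> float:
    """R = -(H + alpha * T): shorter queues and waits give a less negative reward."""
    return -(avg_queue_length + alpha * avg_waiting_time)


def td_target(r: float, next_q_target: Sequence[float], valid_phases: Sequence[int],
              done: bool, gamma: float = GAMMA) -> float:
    """DQN target y = r + gamma * max over a' in valid_phases of Q_target(s', a').

    next_q_target holds the target network's Q-value for every phase index;
    only the phases listed in valid_phases (a hypothetical per-intersection set)
    are considered when taking the max.
    """
    if done:
        return r  # no bootstrapping from a terminal state
    best_next = max(next_q_target[a] for a in valid_phases)
    return r + gamma * best_next


def td_loss(q_taken: float, target: float) -> float:
    """Squared TD error minimized by the online network for the action taken."""
    return (q_taken - target) ** 2


if __name__ == "__main__":
    r = reward(avg_queue_length=4.0, avg_waiting_time=10.0)  # -(4 + 0.5 * 10) = -9.0
    y = td_target(r, next_q_target=[-5.0, -2.0, -8.0, 1.0], valid_phases=[0, 1, 2], done=False)
    print(r, y, td_loss(q_taken=-12.0, target=y))
```

<p>In the usage example, phase 3 has the largest target Q-value but is ignored because it is not in this intersection's valid set; masking the max in this way is one simple option for handling intersection-specific phase sets.</p>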

<h3>2. Federated Learning Integration</h3>
<p>The key innovation is combining RL with Federated Learning (FL):</p>

<ul>
  <li><strong>Network Architecture:</strong> The Q-network is split into global feature extraction layers (federated) and local feature extraction layers</li>
  <li><strong>FL Process:</strong>
    <ul>
      <li>Local agents train DQN models independently</li>
      <li>Global layers are aggregated with the Federated Averaging (FedAvg) algorithm every 20 episodes</li>
      <li>Local layers remain specific to each intersection</li>
    </ul>
  </li>
  <li><strong>Fine-tuning:</strong> New intersections adapt the global model by training only the local layers</li>
</ul>
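
<p>The partial aggregation scheme can be sketched with plain Python dictionaries standing in for network weights. This is a hedged illustration rather than the authors' implementation: the <code>global.</code>/<code>local.</code> naming convention, the toy flattened-weight representation, and the <code>local_train</code> callback are hypothetical, and the aggregation shown is a uniform element-wise mean. Global layers are assumed to have the same shape at every intersection, while local layers may differ.</p>

```python
from typing import Callable, Dict, List

# Toy representation: layer name -> flattened weights.
Params = Dict[str, List[float]]

FL_INTERVAL = 20  # aggregation period in episodes


def is_global(name: str) -> bool:
    # Hypothetical naming convention: shared layers carry a "global." prefix.
    return name.startswith("global.")


def partial_fedavg(agents: List[Params]) -> Params:
    """Element-wise (uniform) average of the global layers across all agents."""
    global_names = [name for name in agents[0] if is_global(name)]
    averaged: Params = {}
    for name in global_names:
        layers = [agent[name] for agent in agents]  # same shape at every agent
        averaged[name] = [sum(vals) / len(agents) for vals in zip(*layers)]
    return averaged


def broadcast(agents: List[Params], global_params: Params) -> None:
    """Overwrite each agent's global layers; local layers are left untouched."""
    for agent in agents:
        for name, weights in global_params.items():
            agent[name] = list(weights)


def federated_training(agents: List[Params], episodes: int,
                       local_train: Callable[[Params], None],
                       interval: int = FL_INTERVAL) -> None:
    """Train every agent locally each episode; aggregate global layers every `interval` episodes."""
    for episode in range(1, episodes + 1):
        for agent in agents:
            local_train(agent)  # one episode of local DQN training (supplied by the caller)
        if episode % interval == 0:
            broadcast(agents, partial_fedavg(agents))


def init_new_intersection(global_params: Params, local_init: Params) -> Params:
    """Start a new intersection from the shared global layers plus fresh local layers.

    During fine-tuning, only the "local." entries would then be updated.
    """
    model = {name: list(w) for name, w in global_params.items()}
    model.update({name: list(w) for name, w in local_init.items()})
    return model


if __name__ == "__main__":
    # Local heads may differ in size, e.g. because intersections have different phase counts.
    a = {"global.features": [1.0, 2.0], "local.head": [10.0]}
    b = {"global.features": [3.0, 4.0], "local.head": [-10.0, 5.0]}
    broadcast([a, b], partial_fedavg([a, b]))
    print(a, b)
```

<p>After aggregation both agents share the averaged global layers, while each keeps its own local head; this is the sense in which the model is only partially aggregated.</p>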

<h3>3. Experimental Results</h3>
<p>Experiments on the real-world road networks of Cologne and Monaco show:</p>

<ul>
  <li>2.29% better convergence than independent DQN</li>
  <li>39.95% reduction in halting vehicles</li>
  <li>55.65% reduction in waiting time for first vehicle</li>
  <li>64.48% reduction in average cumulative waiting time</li>
  <li>Successful transfer learning to new intersections with different configurations</li>
</ul>

<h3>4. Critical Analysis</h3>
<p><strong>Strengths:</strong></p>
<ul>
  <li>Novel combination of FL and RL addresses scalability challenges</li>
  <li>Unified state representation handles heterogeneous intersections</li>
  <li>Practical solution preserving data privacy (no raw data sharing)</li>
  <li>Comprehensive evaluation on real-world networks</li>
</ul>

<p><strong>Limitations:</strong></p>
<ul>
  <li>Assumes uniform distribution of observation data for aggregation</li>
  <li>Fixed FL interval (20 episodes) may not be optimal for all scenarios</li>
  <li>Limited exploration of different neural network architectures</li>
  <li>Testing limited to two specific urban networks</li>
</ul>
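
<p>The uniform-data assumption matters because the original FedAvg algorithm weights each client's parameters by its share of the training samples, n<sub>k</sub> / n; a uniform mean is the special case in which every agent contributes equally. A minimal sketch of the weighted variant, assuming per-agent sample counts are available (the function name and the counts are illustrative):</p>

```python
from typing import List, Sequence


def weighted_layer_average(layers: Sequence[Sequence[float]],
                           sample_counts: Sequence[int]) -> List[float]:
    """Average one layer across agents, weighting agent k by n_k / sum(n)."""
    total = sum(sample_counts)
    return [
        sum(n * layer[i] for n, layer in zip(sample_counts, layers)) / total
        for i in range(len(layers[0]))
    ]


if __name__ == "__main__":
    layers = [[0.0, 2.0], [1.0, 4.0]]  # the same layer held by two agents
    print(weighted_layer_average(layers, [1, 1]))      # -> [0.5, 3.0] (uniform mean)
    print(weighted_layer_average(layers, [900, 100]))  # -> [0.1, 2.2] (busy agent dominates)
```

<p>With equal counts the weighted rule reduces to the uniform mean; with skewed counts, agents that observed more traffic pull the shared layers toward their own weights.</p>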

<h3>5. Summary</h3>
<p>The paper combines FL and RL for TSC. Its partial model aggregation balances global knowledge sharing with local specialization and outperforms independent DQN on the Cologne and Monaco networks. The approach is particularly relevant to real-world deployment, where intersections have varying configurations and data privacy matters. Future work could explore dynamic FL aggregation strategies and alternative neural architectures to improve performance further.</p>