# Reinforcement Learning

Select one of the following research papers, read it, then write a critical summary of it in about **600 words**.

- [A scalable approach to optimize traffic signal control with federated reinforcement learning](https://www.nature.com/articles/s41598-023-46074-3)
- [Faster sorting algorithms discovered using deep reinforcement learning](https://www.nature.com/articles/s41586-023-06004-9)
- [Discovering faster matrix multiplication algorithms with reinforcement learning](https://www.nature.com/articles/s41586-022-05172-4)
- [Educational Timetabling: Problems, Benchmarks, and State-of-the-Art Results](https://arxiv.org/abs/2201.07525)
- [Deep Reinforcement Learning in Surgical Robotics: Enhancing the Automation Level](https://arxiv.org/abs/2309.00773)
- [Reinforcement Learning for Battery Management in Dairy Farming](https://arxiv.org/abs/2308.09023)
- [Integrating Renewable Energy in Agriculture: A Deep Reinforcement Learning-based Approach](https://arxiv.org/abs/2308.08611)

Your summary must capture the key ingredients of Reinforcement Learning mentioned in the paper, e.g. specification of the environment, agent, reward, etc.
Do not cover the background material already explained in the lectures.

**N.B.** If you use any images then put them in the `img` folder, then include using `![](img/image_filename)`.

## Paper summary

Title: Reinforcement Learning for Battery Management in Dairy Farming

Authors: Nawazish Ali, Abdul Wahid, Rachael Shaw, and Karl Mason

Ali et al. apply Reinforcement Learning (RL) to battery energy management on dairy farms, an application that has received little attention despite the energy-intensive nature of modern dairy farming. The objective is to minimize the electricity imported from the grid, and its cost, by deciding when to charge, discharge, or idle a battery that stores energy generated by a solar photovoltaic (PV) system.

The main contribution is the use of Q-learning, a model-free RL algorithm, to learn a policy for battery usage. The authors build a simulation of a typical grid-connected dairy farm equipped with a Tesla Powerwall 2.0 and PV panels. The environment is driven by actual consumption data from a dairy farm in Finland and by PV generation data produced with the System Advisor Model (SAM). Electricity prices follow a Time-of-Use (TOU) tariff that approximates real electricity market conditions.

The agent in this RL setting is the battery management system. The action space (A) consists of three discrete actions: charging (0), discharging (1), and idling (2). The state space (S) is described by factors such as the battery's state of charge and the time of day. The reward function (Rp) is designed to minimize electricity purchased from the grid and to maximize self-consumption of PV-generated energy. In particular, the reward is based on the grid electricity price minus a constant baseline, which penalizes the use of grid energy when prices are high.

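To make these ingredients concrete, here is a minimal sketch of one hourly environment step. It is not the authors' simulator: the names `State` and `step`, the step size, the baseline price, and in particular the exact reward formula are assumptions chosen to match the verbal description above.

```python
from dataclasses import dataclass

# Action encoding as listed in the summary.
CHARGE, DISCHARGE, IDLE = 0, 1, 2

# Illustrative constants. These are assumptions for this sketch,
# not parameters reported in the paper.
CAPACITY_KWH = 13.5    # nominal usable capacity of a Powerwall 2
STEP_KWH = 1.0         # assumed energy moved per hourly step
BASELINE_PRICE = 0.20  # assumed constant price baseline per kWh


@dataclass(frozen=True)
class State:
    soc_kwh: float  # battery state of charge
    hour: int       # hour of day, 0-23


def step(state, action, load_kwh, pv_kwh, price):
    """Advance one hour and return (next_state, reward, grid_import_kwh)."""
    soc = state.soc_kwh
    net_load = load_kwh - pv_kwh  # positive when demand exceeds PV output
    if action == CHARGE:
        delta = min(STEP_KWH, CAPACITY_KWH - soc)  # cannot overfill
        soc += delta
        net_load += delta
    elif action == DISCHARGE:
        delta = min(STEP_KWH, soc)  # cannot go below empty
        soc -= delta
        net_load -= delta
    # IDLE leaves the battery untouched; any surplus is ignored here.
    grid_import = max(net_load, 0.0)
    # Assumed reward form: importing above the baseline price is penalized,
    # importing below it is rewarded.
    reward = -(price - BASELINE_PRICE) * grid_import
    next_state = State(soc, (state.hour + 1) % 24)
    return next_state, reward, grid_import
```

Under this assumed reward, discharging during an expensive hour reduces the penalty, while charging during a cheap hour earns a small positive reward, which is the behavior the TOU tariff is meant to encourage.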
The RL-based approach is compared with two conventional rule-based baselines, Maximizing Self-Consumption (MSC) and Time-of-Use (TOU). The Q-learning agent outperforms them, reducing grid electricity imports by 9.45-10.42 percent and electricity costs by 11.93-12.39 percent over one year. The paper presents these findings in figures that compare electricity imports and costs across months. The battery charging profile further supports the conclusion that the Q-learning agent uses off-peak hours more efficiently.

Ali et al. argue that this application of RL has several strengths. It enables online, dynamic decision-making in changing energy environments, which conventional rule-based algorithms lack. It also copes with long time horizons despite volatile solar generation and electricity demand. The study is nevertheless narrow in scope: it is restricted to Q-learning and does not investigate deep RL methods or multi-agent settings. Scaling to larger farms and the effect of battery degradation over time are also not addressed.

In conclusion, the paper shows convincingly that reinforcement learning can improve battery scheduling in dairy farming. The work is novel and practically relevant, as it aligns with Ireland's sustainability ambitions. Ali et al. propose future work on more advanced RL models, adaptation to weather conditions, and further benchmarking against other learning algorithms to improve performance in agricultural energy systems.

On the technical side, the agent learns by repeatedly updating a Q-table, which maps each state-action pair to the future reward it expects. The update is governed by a few key hyperparameters: the learning rate (α), which controls how strongly new information overrides old estimates; the discount factor (γ), which balances immediate against future rewards; and an exploration strategy such as epsilon-greedy, which keeps the agent trying alternatives so that it can discover better policies.
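The Q-table update and epsilon-greedy exploration can be sketched as follows. This is the standard tabular Q-learning rule rather than code from the paper; the state encoding `(soc_bin, hour)`, the function names, and the hyperparameter values are assumptions for illustration only.

```python
import random
from collections import defaultdict

N_ACTIONS = 3  # charge (0), discharge (1), idle (2)


def epsilon_greedy(q_table, state, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: row[a])


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Tabular Q-learning update, applied in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


# Unseen states start with all-zero action values.
q = defaultdict(lambda: [0.0] * N_ACTIONS)

# Hypothetical transition: discretized state (soc_bin, hour), action 1 = discharge.
s, s_next = (5, 23), (4, 0)
q_update(q, s, action=1, reward=-0.15, next_state=s_next, alpha=0.1, gamma=0.95)
next_action = epsilon_greedy(q, s_next, epsilon=0.1)
```

A larger `alpha` makes each new reward shift the estimate more, while a `gamma` close to 1 makes the agent value savings later in the day almost as much as immediate ones.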

Moreover, the model's omissions, such as battery degradation, deserve closer attention. Because the agent ignores the wear that each charge-discharge cycle inflicts on the battery, it may learn a policy that saves money in the short term but makes the overall system unprofitable by wearing the battery out prematurely. This exposes a gap between simulated results and real-world economic viability.
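One simple way to illustrate this gap is to charge each unit of battery throughput a share of the replacement cost. The sketch below is not part of the paper: the linear wear model, the function names, and all numbers are hypothetical placeholders.

```python
def wear_cost_per_kwh(replacement_cost, lifetime_throughput_kwh):
    """Spread the replacement cost evenly over the battery's lifetime throughput."""
    return replacement_cost / lifetime_throughput_kwh


def wear_adjusted_reward(energy_reward, throughput_kwh, cost_per_kwh):
    """Subtract an assumed linear degradation cost from the energy-only reward."""
    return energy_reward - cost_per_kwh * abs(throughput_kwh)


# Placeholder values chosen only to show the effect, not real battery data.
cost = wear_cost_per_kwh(replacement_cost=6000.0, lifetime_throughput_kwh=30000.0)
adjusted = wear_adjusted_reward(energy_reward=0.15, throughput_kwh=1.0, cost_per_kwh=cost)
```

With these placeholder numbers, a step that looks profitable on energy prices alone becomes a net loss once wear is counted, which is the kind of effect an energy-only reward cannot capture.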



## Reference


Ali, N., Wahid, A., Shaw, R., & Mason, K. (2023). Reinforcement Learning for Battery Management in Dairy Farming. arXiv preprint arXiv:2308.09023. Retrieved from https://arxiv.org/abs/2308.09023