# Conclusions 🎉

Congratulations on making it to the end of **Deep Reinforcement Learning: Zero to Hero!** We've
journeyed from the fundamentals of Markov Decision Processes to building cutting-edge algorithms
like PPO. You've trained agents to play games, land on the moon, and much more. I hope this course
has given you a solid foundation and the confidence to continue exploring the fascinating world of
Deep RL.

The field of RL is constantly evolving, with new ideas and architectures emerging all the time. In
this final notebook, we'll take a quick look at some of the exciting, advanced topics that are
pushing the boundaries of what's possible. This is not an exhaustive list, but rather a starting
point for your continued learning.


## More Advanced Topics 🚀

Here are some of the advanced topics and recent breakthroughs in the field that we didn't cover in
this course. They represent the current frontiers of RL research and are worth exploring further.


### Offline Reinforcement Learning

Offline RL (or batch RL) deals with learning from a fixed dataset of interactions, without the
ability to explore the environment further. This is a very practical setting, as it allows us to
leverage large, pre-existing datasets (e.g., from human demonstrations, or logs from a previous
policy). For a good overview, you can check out this
[Reddit discussion on Offline RL](https://www.reddit.com/r/reinforcementlearning/comments/utnhia/what_is_offline_reinforcement_learning/).
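The main challenge is distributional shift: the learned policy may prefer actions that rarely or
never appear in the dataset, and without new interaction the agent cannot check whether its value
estimates for those actions are wrong.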
Here are a few key algorithms in this area:

- **Conservative Q-Learning (CQL)**: A popular offline RL algorithm that learns a lower bound of the
  true Q-function to avoid overestimating the value of out-of-distribution actions. You can find the
  original implementation in this [CQL GitHub Repository](https://github.com/young-geng/CQL).
- **Implicit Q-Learning (IQL)**: An offline RL method that never queries the value of actions
  outside the dataset. It uses expectile regression to estimate the value of the best actions the
  dataset supports, then extracts a policy from those estimates.
- **TD3+BC**: A simpler approach to offline RL that extends the TD3 algorithm with a behavioral
  cloning term to keep the policy close to the data distribution.
- **Flow Q-Learning (FQL)**: A more recent method that pairs this kind of behavior regularization
  with an expressive flow-based policy. See the
  [Flow Q-Learning (FQL)](https://seohong.me/projects/fql/) project page.
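
To make the TD3+BC idea from the list above concrete, here is a minimal sketch of its actor
objective in plain Python. The function name and the list-based inputs are my own choices for
illustration. A real implementation would use tensors and autograd, and would treat the scaling
factor as a constant when backpropagating. The default `alpha=2.5` follows the value suggested in
the TD3+BC paper.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """TD3+BC-style actor loss for one batch (illustrative sketch).

    q_values:        Q(s, pi(s)) for each state in the batch
    policy_actions:  pi(s), one action vector (list of floats) per state
    dataset_actions: the action that was actually logged for each state
    """
    n = len(q_values)

    # Scale the Q term by the average |Q| so that alpha balances
    # "maximize value" against "imitate the data" on a consistent scale.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / (mean_abs_q + 1e-8)

    # RL term: average value of the actions the policy proposes.
    q_term = sum(q_values) / n

    # Behavioral cloning term: mean squared error to the logged actions.
    sq_errors = [
        (p - d) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, d in zip(pa, da)
    ]
    bc_term = sum(sq_errors) / len(sq_errors)

    # We minimize this loss: high Q is rewarded, straying from the data is penalized.
    return -lam * q_term + bc_term
```

The further the policy's actions drift from the logged ones, the larger the second term becomes,
which is what keeps the policy close to the data distribution.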


### Dreamer

Dreamer is a model-based reinforcement learning agent that learns a world model of the environment
and uses it to "dream" or imagine future trajectories to learn a policy. This approach is very
sample-efficient, as the agent can learn from imagined experience, which is much cheaper than
real-world interaction. You can read more about it in Google's blog post,
[Introducing Dreamer](https://research.google/blog/introducing-dreamer-scalable-reinforcement-learning-using-world-models/).
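
The core loop of "learning in imagination" can be sketched in a few lines. This is a toy
illustration, not Dreamer itself: Dreamer learns a neural world model in a latent space, whereas
here `toy_world_model` and `toy_policy` are hypothetical hand-written stand-ins so the example runs
on its own.

```python
def imagine_rollout(world_model, policy, start_state, horizon, gamma=0.99):
    """Roll a policy forward inside a model; the real environment is never called."""
    state, discounted_return, discount = start_state, 0.0, 1.0
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        # The model predicts both the next state and the reward.
        next_state, reward = world_model(state, action)
        trajectory.append((state, action, reward))
        discounted_return += discount * reward
        discount *= gamma
        state = next_state
    return trajectory, discounted_return


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned model: 1-D state, reward for being near 0."""
    next_state = state + action
    return next_state, -abs(next_state)


def toy_policy(state):
    """Hypothetical policy: step one unit toward the origin."""
    if state > 0:
        return -1.0
    if state < 0:
        return 1.0
    return 0.0


trajectory, imagined_return = imagine_rollout(
    toy_world_model, toy_policy, start_state=3.0, horizon=4, gamma=1.0
)
print(len(trajectory), imagined_return)
```

In a full agent, imagined trajectories like this one would be used to update the policy and value
function, while real interaction is used mainly to improve the world model.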


### Decision Transformers

Decision Transformers reframe reinforcement learning as a sequence modeling problem. Instead of
learning a policy or a value function, a
[Decision Transformer](https://sites.google.com/berkeley.edu/decision-transformer) uses a GPT-like
architecture to predict the next action from a desired return, past states, and past actions.
Because it trains on logged trajectories, it is a natural fit for offline RL.
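
The "desired return" fed to the model is usually the return-to-go: the sum of rewards from the
current step to the end of the episode. Here is a small sketch of how a logged trajectory could be
turned into the interleaved sequence the transformer reads. The function names and the
`(kind, value)` token format are assumptions for illustration; a real implementation would embed
each element as a vector.

```python
def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) for every timestep."""
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens += [("return_to_go", g), ("state", s), ("action", a)]
    return tokens


print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```

At test time, you would set the first return-to-go to the return you want the agent to achieve and
subtract each observed reward from it as the episode unfolds.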


### Flow-based RL

This is a newer area of research that uses flow-based generative models (such as normalizing flows
or flow matching) to represent the policy. Flow-based models can represent complex, multi-modal
action distributions, which can be beneficial in many real-world tasks. A great example of this is
the recent [Flow Q-Learning (FQL)](https://seohong.me/projects/fql/) work, which trains its policy
with flow matching.
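
One common way to sample an action from a flow-based policy is to start from random noise and
follow a velocity field from time 0 to time 1 in small Euler steps. The sketch below shows only
that sampling loop, not FQL itself. `toy_velocity_field` is a hypothetical hand-written field that
pushes every noise sample straight toward a state-dependent target; a trained policy would use a
neural network here, and could map different noise samples to different modes.

```python
import random


def sample_action(velocity_field, state, action_dim, num_steps=10, rng=random):
    """Draw noise, then integrate the velocity field from t=0 to t=1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_field(state, x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity_field(state, x, t):
    """Hypothetical field: velocity of the straight line from x to a fixed target.

    Since t is always strictly below 1 inside the sampler, the division is safe.
    """
    target = [0.5 * state, -0.5 * state]
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


action = sample_action(toy_velocity_field, state=2.0, action_dim=2)
print(action)  # approximately [1.0, -1.0] for this toy field
```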


### Meta-Learning

Meta-learning, or "learning to learn," aims to create agents that can quickly adapt to new tasks. In
the context of RL, a meta-learning agent is trained on a distribution of tasks, and the goal is for
it to generalize and learn a new, unseen task with very few samples. For a deep dive, check out this
comprehensive [Survey of Meta-Reinforcement Learning](https://arxiv.org/abs/2301.08028).
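
To get a feel for "training on a distribution of tasks," here is a tiny sketch using a
Reptile-style update, one of the simpler first-order meta-learning methods. It is not an RL
algorithm and is not taken from the survey: each "task" is just a hypothetical 1-D problem of
minimizing `(theta - target) ** 2`, with targets drawn uniformly from `[2, 4]`. The point is only
to show the inner loop (adapt to one task) and the outer loop (nudge the shared initialization
toward the adapted parameters).

```python
import random


def inner_adapt(theta, target, steps=5, lr=0.1):
    """Inner loop: a few gradient steps on one task's loss (theta - target)^2."""
    for _ in range(steps):
        grad = 2.0 * (theta - target)
        theta -= lr * grad
    return theta


def reptile(num_iterations=500, meta_lr=0.1, seed=0):
    """Outer loop: move the shared initialization toward each task's adapted solution."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(num_iterations):
        target = rng.uniform(2.0, 4.0)       # sample a task
        adapted = inner_adapt(theta, target)  # adapt to it
        theta += meta_lr * (adapted - theta)  # Reptile-style meta-update
    return theta


meta_theta = reptile()
print(meta_theta)
```

After meta-training, `meta_theta` sits near the middle of the task distribution, so a handful of
gradient steps on a new target gets much closer than the same steps from the original
initialization of `0.0`.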


### Robotics and Vision-Language-Action Models (VLAs)

A very exciting application of RL is in robotics. Vision-Language-Action models (VLAs) are a recent
development where large, pre-trained models are used to control robots. These models can understand
natural language instructions, see the world through a camera, and take actions to complete complex
tasks.

- **Gemini Robotics**: Google DeepMind's VLA model, which builds on the Gemini family of models and
  shows impressive capabilities in robotics. Read more about it in their post,
  [Gemini Robotics: Bringing AI into the Physical World](https://deepmind.google/discover/blog/gemini-robotics-bringing-ai-into-the-physical-world/).
- **$\pi_0$**: A Vision-Language-Action flow model for general robot control, from the
  [Open-Source Physical Intelligence Project](https://www.pi.website/blog/openpi). You can also read
  about their follow-up [$\pi_{0.5}$ model](https://www.pi.website/blog/pi05) and a new fine-tuning
  technique called [Knowledge Insulation](https://www.pi.website/research/knowledge_insulation).


## Final Words

The journey of learning is a continuous one. The topics above are just a glimpse of the exciting
research happening in Deep Reinforcement Learning. I encourage you to pick one that fascinates you,
read the papers, and maybe even try to implement it yourself! The skills you've developed in this
course have prepared you for this next step.

Thank you for taking this course. Keep learning, keep experimenting, and I can't wait to see what
you'll build. Good luck! 🤖
