# Deep Reinforcement Learning: 0 to 100

Using RL to teach robots to fly a drone

https://towardsdatascience.com/deep-reinforcement-learning-for-dummies/

<img src="Image/00/StartOfCourse00.jpg">

Ever wondered how you’d teach a robot to land a drone without programming every single move? That’s exactly what I set out to explore. I spent weeks building a game where a virtual drone has to figure out how to land on a platform—not by following pre-programmed instructions, but by learning from trial and error, just like how you learned to ride a bike.

This is Reinforcement Learning (RL), and it’s fundamentally different from other machine learning approaches. Instead of showing the AI thousands of examples of “correct” landings, you give it feedback: “Hey, that was pretty good, but maybe try being more gentle next time?” or “Yikes, you crashed—probably don’t do that again.” Through countless attempts, the AI figures out what works and what doesn’t.

In this post, I’m documenting my journey from RL basics to building a working system that (mostly!) teaches a drone to land. You’ll see the successes, the failures, and all the weird behaviors I had to debug along the way.

## 1. Reinforcement learning: Overview

Much of the idea traces back to behavioral psychology, in particular Skinner’s rat experiments (and, more loosely, Pavlov’s dog). You give the subject a ‘reward’ when it does something you want it to do (positive reinforcement) and a ‘penalty’ when it does something you don’t want (punishment; strictly speaking, ‘negative reinforcement’ means taking away something unpleasant to encourage a behavior). Over many repeated attempts, the subject learns from this feedback and gradually discovers which actions lead to success, similar to how Skinner’s rat learned which lever presses produced food rewards.

<img src="Image/00/Pavlov.jpg">

In the same fashion, we want a system that learns to perform tasks in a way that maximizes reward and minimizes penalty. Keep this idea of maximizing reward in mind; it will come up again later.

### 1.1 Core Concepts

When building systems that run on computers, it is good practice to write clear definitions for the ideas you want to abstract. In AI (and more specifically, reinforcement learning), the core ideas boil down to the following:

1. **Agent** (or Actor): This is our subject from the previous section. This can be the dog, a robot trying to navigate a huge factory, a video game NPC, etc.
2. **Environment** (or the world): This can be a place, a simulation with restrictions, a video game’s virtual world, etc. I think of it like this: “A box, real or virtual, to which the agent’s entire life is confined; it only knows what happens inside the box. We, as the overlords, can alter this box, and to the agent it will look like a god exacting its will on the world.”
3. **Policy**: Just like in governments, companies, and similar entities, a ‘policy’ dictates “what action should be taken in a given situation”. In RL terms, it is the agent’s rule for choosing an action based on what it currently knows about its situation.
4. **State**: This is what the agent “sees” or “knows” about its current situation. Think of it as the agent’s snapshot of reality at any given moment—like how you see the traffic light color, your speed, and the distance to the intersection when driving.
5. **Action**: Now that our agent can ‘see’ things in its environment, it may want to do something about its state. Maybe it just woke up from a long night’s slumber, and now it wants to get a cup of coffee. In this case, the first thing it will do is get out of bed. This is an action that the agent will take to achieve its goal, i.e., GET SOME COFFEE!
6. **Reward**: Every time the actor executes an action (of its own volition), something may change in the world. For example, our agent got out of bed and started walking towards the kitchen, but then, because it is so bad at walking, it tripped and fell. In this situation, the god (us) rewards it with a punishment for being bad at walking (negative reward). But then the agent makes it to the kitchen and gets the coffee, so the god (us) rewards it with a cookie (positive reward).

<img src="Image/00/RL_illustration.jpg">

As you can imagine, most of these key components need to be tailored for the specific task/problem that we want the agent to solve.
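To make the six pieces concrete, here is a minimal sketch of how they fit together in code. It is a toy, not the drone project: the `LineWorld` environment, the two policies, and `run_episode` are all hypothetical names I made up for illustration, and the reward values (+10 at the goal, -1 per step) are arbitrary assumptions. No learning happens here; the point is only the loop of state, action, and reward.

```python
import random
from dataclasses import dataclass


@dataclass
class LineWorld:
    """Environment (hypothetical toy): a 1-D track with the goal at `goal`."""
    goal: int = 4
    position: int = 0

    def reset(self) -> int:
        """Put the agent back at the start and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action: int):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position = max(0, self.position + action)  # cannot walk past the left wall
        done = self.position == self.goal
        reward = 10 if done else -1  # assumed values: bonus at the goal, small penalty otherwise
        return self.position, reward, done


def always_right(state: int) -> int:
    """Policy: ignore the state and always move right."""
    return +1


def random_policy(state: int) -> int:
    """Policy: pick left or right at random."""
    return random.choice([-1, +1])


def run_episode(env, policy, max_steps: int = 100) -> int:
    """Agent loop: observe the state, pick an action, collect the reward."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                  # policy maps state -> action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print("always_right: ", run_episode(LineWorld(), always_right))
    print("random_policy:", run_episode(LineWorld(), random_policy))
```

With `always_right`, the agent pays the -1 penalty on each of its first three steps and collects +10 on the fourth, for a total of 7. The random policy wanders, so it can never do better than that and usually does worse. Swapping in a different environment, state, or reward scheme changes the task, while the loop in `run_episode` stays the same.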