Iterate on text delivery and fix typos
SwamyDev committed Feb 24, 2020
1 parent 82dc11a commit 33b286e
Showing 3 changed files with 26 additions and 26 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -1,13 +1,13 @@
[![Build Status](https://travis-ci.com/SwamyDev/udacity-deep-rl-navigation.svg?branch=master)](https://travis-ci.com/SwamyDev/udacity-deep-rl-navigation) [![Coverage Status](https://coveralls.io/repos/github/SwamyDev/udacity-deep-rl-navigation/badge.svg?branch=master)](https://coveralls.io/github/SwamyDev/udacity-deep-rl-navigation?branch=master)
# Udacity Projects

-This repository is part of the Udacity Reinforcement Learning Nanodegree. It contains solutions to the courses class projects `navigation` and `continuous control`. You can find more detailed explanations for each project and their environments in their dedicated README or Report.md files:
+This repository is part of the Udacity Reinforcement Learning Nanodegree. It contains solutions to the course's class projects `navigation` and `continuous control`. You can find more detailed explanations of each project and its environment in the dedicated README and Report files:

- [Project Navigation](doc/README_p1_navigation.md)
- [Project Continuous Control](doc/README_p2_continuous.md)

## Installation
-To run the code of the projects you need to install the repositories virtual environment. To make this as easy as possible it uses `GNU Make` to set up virtual environments and download dependencies. It requires a Linux environment. Under Ubuntu make is part of the `build-essential` package (`apt install build-essential`). Other dependencies are python3 virutalenv (`apt install python3-venv`) and pip (`apt install python3-pip`).
+To run the code of the projects, you need to install the repository's virtual environment. To make this as easy as possible, I provide a `Makefile` using `GNU Make` to set up virtual environments and download dependencies. It requires a Linux environment. Under Ubuntu, `make` is part of the `build-essential` package (`apt install build-essential`). Other dependencies are python3 virtualenv (`apt install python3-venv`) and pip (`apt install python3-pip`).
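For convenience, the prerequisites listed above can be installed in one step on Ubuntu (a minimal sketch; whether you need `sudo` depends on your setup, but the package names are the ones mentioned in this section):

```bash
# Install the build and Python tooling prerequisites named above (Ubuntu/Debian)
sudo apt install build-essential python3-venv python3-pip
```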

### Setup & Test
To create Python virtual environments and install dependencies run:
@@ -21,10 +21,10 @@
```bash
make test
```

## Quick Start
-This section show exemplary usage of the `udacity-rl` command line interface by training and running agent for the `navigation` project and exploring its environment.
+This section shows example usage of the `udacity-rl` command-line interface by training and running the agent for the `navigation` project and exploring its environment.

### The Command Line Interface
-When the environment is set up you can activate the environment (i.e. `source venv/bin/activate`) and you have access to the udacity-rl command-line interface. With this interface you can run the code for each project. The following section describes how to run a project and how to get help.
+When the virtual environment is set up, you can activate it (i.e. `source venv/bin/activate`) to get access to the `udacity-rl` command-line interface. With this interface, you can run the code for each project. The following section describes how to run a project and how to get help.

Showing help messages:
```bash
udacity-rl --help
```

@@ -35,22 +35,22 @@
Example: Training the `navigation` agent on the Banana environment with the standard config:
```bash
udacity-rl -e resources/environments/Banana_Linux/Banana.x86_64 train DQN 3000 -c configs/standard.json
```
-The `-e` flag specifies the environment (here the Unity-environment executable). The arguments after `train` specify the algorithm to be used and the number of episodes the agent should be trained. The -c flag sets the config file to be used for the agent.
+The `-e` flag specifies the environment (here the Unity-environment executable). The arguments after `train` specify the algorithm to be used and the number of episodes the agent should be trained for. The `-c` flag sets the config file to be used for the agent.

### Agent Configuration
The JSON files found in `configs` contain the description of the agent's model(s), learning parameters and epsilon behaviour. The various configs were tried while searching for good solutions to the environment and serve as living documentation of that process.
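As a purely illustrative sketch of how such a config could be supplied (the key names and values below are hypothetical; the authoritative structure is whatever `configs/standard.json` and its siblings define):

```bash
# Hypothetical config sketch; the real keys are defined by the files in configs/
cat > /tmp/my_config.json <<'EOF'
{
  "model": { "hidden_layers": [64, 64] },
  "learning": { "learning_rate": 0.0005, "gamma": 0.99 },
  "epsilon": { "start": 1.0, "end": 0.01, "decay": 0.995 }
}
EOF

# Pass it to training via -c, just like configs/standard.json in the example above
udacity-rl -e resources/environments/Banana_Linux/Banana.x86_64 train DQN 3000 -c /tmp/my_config.json
```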

### Running a Saved Agent
-The agent is serialized, after a successful training run (by default under `/tmp/agent_ckpt`). This agent can be loaded and run on an environment. By default the run is rendered and the user can observe the agent interacting with the environment:
+The agent is serialized after a successful training run (by default under `/tmp/agent_ckpt`). This agent can be loaded and run on an environment. By default, the run is rendered and the user can observe the agent interacting with the environment:

```bash
udacity-rl -e resources/environments/Banana_Linux/Banana.x86_64 run /tmp/agent_ckpt 1
```
The `-e` flag specifies the environment (here the Unity-environment executable). The path after `run` specifies the agent to be loaded. The number after that is the number of episodes the agent should be run for.
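For example, to watch the same saved agent for a few more episodes, only the final argument changes:

```bash
# Run the checkpointed agent for 5 rendered episodes instead of 1
udacity-rl -e resources/environments/Banana_Linux/Banana.x86_64 run /tmp/agent_ckpt 5
```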

### Exploring an Environment
-The CLI also allows you to initially explore an environment with an random agent. The environment is rendered by default so you can get a better feel for what is required.
+The CLI also allows you to initially explore an environment with a random agent. The environment is rendered by default so you can get a better feel for what is required.

```bash
udacity-rl -e resources/environments/Banana_Linux/Banana.x86_64 explore
```
6 changes: 3 additions & 3 deletions doc/README_p2_continuous.md
@@ -1,16 +1,16 @@
# Project: Continuous Control

-This project is part of the Udacity Reinforcement Learning Nanodegree. In this project a `DDPG` agent is trained to solve a continuous control task. Specifically the agent needs to control a robot arm to reach a target rotating around the arm. The agent receives a reward for each time step the arm is within the target location. The environment is considered solved when the agents scores an average of >30 points over the course of 100 episodes.
+This project is part of the Udacity Reinforcement Learning Nanodegree. In this project, a `DDPG` agent is trained to solve a continuous control task. Specifically, the agent needs to control a robot arm to reach a target area rotating around the arm. The agent receives a reward for each time step the arm is within the target area. The environment is considered solved when the agent scores an average of more than 30 points over 100 episodes.

## Environment Setup
### Reward Signal
-The agent receives a reward of `0.1` each time step the robot arm is within the target area. The goal for the agent is therefore to stay as long within the target area as possible.
+The agent receives a reward of `0.1` for each time step the robot arm is within the target area. The goal for the agent is therefore to stay within the target area for as long as possible.

### Observation
An observation state consists of the current position, rotation, velocity, and angular velocities of the arm. This state is encoded in a 1x33 tensor.

### Actions
-The action the agent can take consists of a 1x4 tensor corresponding to the torque applicable to the two joints of the arm. The torque value of the action tensor is normalized to a range between `-1` and `1`.
+The action the agent can take consists of a 1x4 tensor corresponding to the torques applied to the two joints of the arm. The torque values of the action tensor are normalized to a range between `-1` and `1`.

## Exploring
To explore the `Reacher_Linux` environment, run the following command from the root of the repository:
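By analogy with the `explore` example for the Banana environment above, the command presumably looks like the sketch below (the `Reacher_Linux` executable path is an assumption and may differ in your checkout):

```bash
# Assumed executable path, by analogy with Banana_Linux/Banana.x86_64; adjust as needed
udacity-rl -e resources/environments/Reacher_Linux/Reacher.x86_64 explore
```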
