Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.


Localizing Radio Frequency Targets Using Reinforcement Learning

The BirdsEye project demonstrates the simulated tracking of radio frequency (RF) signals via reinforcement learning (RL) techniques implemented on low-fidelity sensors. This permits training and inference without the need for significant compute hardware such as graphical processing units (GPU). Instead, these methods can be run on low-cost, commercial-off-the-shelf technology, providing capabilities to applications in which low-cost sensors are paramount in deployment, or where more sensitive sensors do not function or cannot be installed due to the nature of the environment.

This repository is the official implementation of the work published here:

L. Tindall, Z. Hampel-Arias, J. Goldfrank, E. Mair and T. Q. Nguven, "Localizing Radio Frequency Targets Using Reinforcement Learning," 2021 IEEE International Symposium on Robotic and Sensors Environments (ROSE), 2021, pp. 1-7, doi: 10.1109/ROSE52750.2021.9611756.


BirdsEye has implemented two statistical methods which drive how the sensor adaptively tracks an observed target signal: Monte Carlo Tree Search (MCTS) and Deep Q-Learning (DQN). While each method has advantages over the other, neither requires heavy compute resources such as a GPU. The MCTS method performs a stochastic search and selection of the actions available to the sensor, identifying the decision which maximizes the return on localization rewards. The DQN method is a reinforcement learning algorithm which can adapt to large decision spaces using neural networks, with major public successes such as DeepMind’s AlphaGo.

Fig. 1: State estimation and motion planning

Fig. 2: Deep Q-Network architecture for motion planning


  1. Usage
  2. Configurations
  3. Examples
  4. Results
  5. Descriptions

1. Usage


pip install -r requirements.txt

To run on command line

$ python -h
usage: [-h] -c FILE [-b]

optional arguments:
  -h, --help            show this help message and exit
  -c FILE, --config FILE
                        Specify a configuration file
  -b, --batch           Perform batch run

To run using a Docker container

First install Docker with GPU support. Instructions here.

A Docker file has also been provided for ease of use. To run with Docker, execute the following commands:

> docker build -t birds_eye .
> docker run -it --gpus all birds_eye -c {config.yaml}

In order to streamline this process a Makefile has been provided as a shorthand.

> make run_mcts
> make run_dqn
> make run_batch

Accepted make values are: run_mcts, run_dqn, run_batch, build

If you don't have or want to use GPUs you can preface the make command with GPUS= like so:

> GPUS= make run_mcts

2. Configurations

Running experiments requires a set of configurations variables which specify settings for the envrionment and motion planning method. See Configurations Documentation for more information.

3. Examples

Run with Monte Carlo Tree Search policy

$ python -c configs/mcts.yaml

Run with Deep Q-Network policy

$ python -c configs/dqn.yaml

4. Results

The results of the experiments will be stored in the ./runs directory. Log files are created which store the configurations for each experiment along with metrics and state information per time step. For a detailed guide and visualizations please see results user guide notebook: results_guide.ipynb.

Fig. 3: Localization results (distance between mean belief and actual target location)

Fig. 4: Inference times and speed comparison

5. Descriptions

All code for training and evaluation in simulation is contained in the birdseye directory. The birdseye directory contains some important base classes which can be extended to offer customizability to a specific use case. We provide a few subclasses to get started.


The Sensor class defines the observation model for the simulated sensor. Users must define methods for sampling an observation given a state and for determining the likelihood of an observation given a state. We have provided example subclasses for an omni-directional signal strength setup and a bearing based directional setup.


The Actions class defines the action space for the sensor. For computational simplicity, actions are discretized.


The State class includes methods for updating the state variables of the environment. This includes states for both the sensor and target. Motion dynamics and reward functions are defined within this class. We have included example reward functions based on an entropy/collision tradeoff as well as a range based reward.


The RFEnv class is a Gym-like class for controlling the entire pipeline of the simulation.