OTTO (short for Odor-based Target Tracking Optimization) is a Python package to learn, evaluate and visualize strategies for odor-based searches.
OTTO is part of the C0PEP0D project. It has been used in a publication and for a friendly competition amongst PhD students. It has also been benchmarked against traditional solvers.
- Background
- What does OTTO do?
- Installation
- How to use OTTO?
- Documentation
- Known issues
- Community guidelines
- Authors
- How to cite OTTO?
- License
- Acknowledgements
Imagine a treasure hunt where the player needs to find a hidden treasure using odor cues. Because the wind constantly changes direction, the player smells nothing most of the time, but occasionally catches a puff. How should the player move to find the treasure as fast as possible? This game mirrors a common task: for example, mosquitoes detecting carbon dioxide to find prey to bite, or sniffer robots trying to locate explosives in an airport. Because of turbulence, there is no odor trail to follow in this problem, which makes it particularly challenging.
The source-tracking problem is a POMDP (partially observable Markov decision process) designed to mimic the task faced by animals or robots searching for a source of odor in a turbulent flow.
The agent (the searcher) must find a stationary target (a source of odor) hidden in a grid world. At each step, the agent moves to a neighbor cell and receives an observation (odor detection), which provides some partial information on how far the source is. The agent has a perfect memory and a perfect knowledge of the process that generates observations.
How should the agent behave in order to reach the source in the smallest possible number of steps?
Infotaxis is a popular strategy proposed by Vergassola et al. (Nature, 2007). It states that the agent should choose the action from which it expects the greatest information gain about the source location.
Infotaxis is far superior to all naive strategies, such as going to the most likely source location. But infotaxis is suboptimal, so better strategies are possible.
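The infotaxis rule described above can be illustrated on a toy 1D problem. This is a minimal sketch and not OTTO's implementation: the detection model `hit_probability` and all function names are hypothetical, chosen only to show how "greatest expected information gain" translates into "lowest expected entropy of the belief after the move".

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


def hit_probability(agent, source, scale=2.0):
    """Hypothetical detection model: odor hits get rarer with distance."""
    return 0.9 * math.exp(-abs(agent - source) / scale)


def expected_entropy_after_move(belief, new_pos):
    """Expected entropy of the belief after moving to new_pos and observing."""
    p_found = belief[new_pos]
    if p_found >= 1.0:
        return 0.0
    # Source not found at new_pos: rule that cell out and renormalize.
    prior = [0.0 if s == new_pos else p / (1.0 - p_found)
             for s, p in enumerate(belief)]
    expected = 0.0
    for hit in (True, False):
        # Likelihood of this observation for every candidate source cell.
        like = [hit_probability(new_pos, s) if hit
                else 1.0 - hit_probability(new_pos, s)
                for s in range(len(belief))]
        joint = [p * l for p, l in zip(prior, like)]
        p_obs = sum(joint)
        if p_obs > 0.0:
            posterior = [j / p_obs for j in joint]
            expected += p_obs * entropy(posterior)
    # If the source is found (probability p_found), the entropy drops to zero.
    return (1.0 - p_found) * expected


def infotaxis_action(belief, agent):
    """Pick the move (-1 or +1) with the lowest expected posterior entropy."""
    moves = [m for m in (-1, 1) if 0 <= agent + m < len(belief)]
    return min(moves, key=lambda m: expected_entropy_after_move(belief, agent + m))


belief = [0.0, 0.0, 0.0, 0.1, 0.2, 0.7]
best_move = infotaxis_action(belief, agent=2)  # +1: step toward the likely cells
```

Minimizing the expected posterior entropy is equivalent to maximizing the expected information gain, since the current entropy is the same for all candidate moves.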
OTTO provides:
- a simulator of the source-tracking POMDP for any number of space dimensions,
- various heuristic policies including infotaxis,
- a custom deep reinforcement learning algorithm able to yield near-optimal policies,
- a gym wrapper allowing the use of general-purpose reinforcement learning libraries,
- an efficient algorithm to rigorously evaluate policies (including custom policies defined by the user),
- a rendering of searches (only up to 3D!).
OTTO requires Python 3.8 or greater. Dependencies are listed in requirements.txt; missing dependencies will be installed automatically.
Optional: OTTO requires FFmpeg to make videos. If FFmpeg is not installed, OTTO will save video frames as images instead.
If you use conda to manage your Python environments, you can install OTTO in a dedicated environment ottoenv:
conda create --name ottoenv python=3.8 setuptools=58.0
conda activate ottoenv
First go to the directory where you wish to install otto.
Then clone the git repository with
git clone https://github.com/C0PEP0D/otto.git
If git is not installed on your system, you can alternatively download the package and unzip it.
Finally go to the otto directory and install OTTO using
python3 setup.py install
You can test your installation with the following command:
python3 -m pytest tests
This will execute the test functions located in the folder tests.
Go to the otto subdirectory.
You will see that it is organized in three main directories corresponding to the three main uses of OTTO:
- evaluate: for evaluating the performance of a policy
- learn: for learning a neural network policy that solves the task
- visualize: for visualizing a search episode
The other directory, classes, contains all the class definitions used by the main scripts.
The three main directories share the same structure. They contain:
- *.py: the main script
- parameters: the directory to store input parameters
- outputs: the directory to store files generated by the script (it will be created on first use)
To use OTTO, go to the relevant main directory and run the corresponding script.
For example, to visualize an episode, go to the visualize directory and run the visualize.py script with
python3 visualize.py
You should now see the rendering of a 1D search in a new window (it may be very short!). You can visualize another episode by running the same command again.
Some logging information is displayed in the terminal as the script runs. In the rendering window, the first panel is a map of odor detections, and the second panel is the agent's belief (probability distribution over source locations).
The video has been saved as visualize/outputs/YYmmdd-HHMMSS_video.mp4, where 'YYmmdd-HHMMSS' is a timestamp (the time you started the script).
If you do not have FFmpeg or if you are using Windows, you will instead find the frames saved in visualize/outputs/YYmmdd-HHMMSS_frames.
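The timestamp naming can be reproduced in Python, for instance to locate the most recent output. The sketch below assumes that 'YYmmdd-HHMMSS' corresponds to the strftime pattern `%y%m%d-%H%M%S`; this mapping is an assumption, not taken from the OTTO source, and `make_timestamp` and `latest_run` are hypothetical helper names.

```python
from datetime import datetime

# Assumption: 'YYmmdd-HHMMSS' maps to this strftime pattern.
TIMESTAMP_FORMAT = "%y%m%d-%H%M%S"


def make_timestamp(moment):
    """Format a datetime as a 'YYmmdd-HHMMSS' string."""
    return moment.strftime(TIMESTAMP_FORMAT)


def latest_run(names):
    """Return the most recent name; zero-padded timestamps sort chronologically."""
    return max(names)


stamp = make_timestamp(datetime(2022, 6, 1, 14, 30, 5))
video_name = stamp + "_video.mp4"  # "220601-143005_video.mp4"
newest = latest_run(["220601-143005_video.mp4", "220531-235959_video.mp4"])
```

Because every field is zero-padded and ordered from year to second, a plain string comparison is enough to find the latest run.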
Many parameters (space dimension, domain size, source intensity, policy, ...) can be changed from the defaults.
To run a script with different parameters, create a Python script in the parameters directory that sets your parameters.
A file myparam.py is already present in visualize/parameters/ for this example.
It contains a single line
N_DIMS = 2
which sets the dimensionality of the search to 2D (1D is the default).
User-defined parameters are loaded by using the --input option followed by the name of the parameter file.
For example, you can now visualize a search in 2D with
python3 visualize.py --input myparam.py
The --input option can be shortened to -i, and the file name can be given with or without the .py extension. So the command
python3 visualize.py -i myparam
will have the same effect.
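The behavior of the option can be mimicked with `argparse`. This is only a sketch of how such an option could be handled, not OTTO's actual parsing code; the function name `parse_input_name` is hypothetical.

```python
import argparse


def parse_input_name(argv):
    """Return the parameter module name given on the command line, or None."""
    parser = argparse.ArgumentParser()
    # Both the long form (--input) and the short form (-i) are accepted.
    parser.add_argument("-i", "--input", default=None)
    args = parser.parse_args(argv)
    if args.input is None:
        return None  # no parameter file: defaults would be used
    name = args.input
    # Accept the file name with or without its .py extension.
    if name.endswith(".py"):
        name = name[:-3]
    return name


long_form = parse_input_name(["--input", "myparam.py"])  # "myparam"
short_form = parse_input_name(["-i", "myparam"])         # "myparam"
```

Both calls resolve to the same name, which is why the two commands above have the same effect.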
Each parameters directory contains sample parameter files called example*.py.
They show essential parameters you can play with, for example:
- N_DIMS sets the dimensionality of the search (1D, 2D, 3D), default is N_DIMS = 1
- LAMBDA_OVER_DX controls the size of the domain, default is LAMBDA_OVER_DX = 2.0
- R_DT controls the source intensity, default is R_DT = 2.0
- POLICY defines the policy to use, default is POLICY = 0 (infotaxis)
Note: the actual size of the computational domain, called N_GRID, is determined internally based on N_DIMS, LAMBDA_OVER_DX and R_DT to make the domain "large enough" for the boundaries to have (almost) no effect on the search. As a rule of thumb, N_GRID ≈ 15 LAMBDA_OVER_DX.
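To get a feel for the rule of thumb, the sketch below estimates the grid size and the resulting number of cells. It is a rough estimate only: the actual N_GRID is computed internally and also depends on N_DIMS and R_DT. The function names are hypothetical, and the cell count assumes N_GRID cells along each dimension.

```python
def approximate_n_grid(lambda_over_dx):
    """Rule-of-thumb estimate only: N_GRID is roughly 15 * LAMBDA_OVER_DX."""
    return round(15 * lambda_over_dx)


def approximate_n_cells(n_dims, lambda_over_dx):
    """Estimated number of grid cells, assuming N_GRID cells along each dimension."""
    return approximate_n_grid(lambda_over_dx) ** n_dims


n_grid_default = approximate_n_grid(2.0)  # 30 with the default LAMBDA_OVER_DX
n_cells_2d = approximate_n_cells(2, 2.0)  # 900
n_cells_3d = approximate_n_cells(3, 2.0)  # 27000
```

The number of cells grows quickly with N_DIMS, which gives an idea of why higher-dimensional searches are more expensive to compute.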
The definition of all parameters is provided in the documentation, and you can find their default values by examining the contents of __defaults.py.
The evaluate.py script (in the evaluate directory) computes many statistics that characterize the performance of a policy, such as
- probability of never finding the source,
- average time to find the source,
- probability distribution of arrival times,
- and much more.
It does so essentially by running thousands of episodes in parallel and averaging over those.
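The averaging step can be sketched as follows. This is a naive illustration of how such statistics could be computed from a list of episode outcomes, not the algorithm used by evaluate.py; the function name and the convention of using `None` for "source never found" are assumptions.

```python
from collections import Counter


def summarize_episodes(arrival_times):
    """Summarize episodes given as step counts, or None if the source was never found."""
    n_episodes = len(arrival_times)
    found = [t for t in arrival_times if t is not None]
    # Fraction of episodes in which the source was never found.
    p_not_found = 1.0 - len(found) / n_episodes
    # Mean arrival time, conditioned on the source being found.
    mean = sum(found) / len(found) if found else float("nan")
    # Empirical probability distribution of arrival times.
    counts = Counter(found)
    pdf = {t: c / n_episodes for t, c in sorted(counts.items())}
    return {"p_not_found": p_not_found, "mean": mean, "pdf": pdf}


stats = summarize_episodes([3, 5, 5, None])
```

Note that the mean is computed only over successful episodes, so it is meaningful only when p_not_found is small.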
You can try with
python3 evaluate.py
This will take some time (order of magnitude is 2 minutes on 8 cores). Logging information is displayed in the terminal while the episodes are running.
Windows users: if a NameError is raised, see known issues.
Once the script has completed, you can look at the results in the directory evaluate/outputs/YYmmdd-HHMMSS, where 'YYmmdd-HHMMSS' is the time you started the script.
YYmmdd-HHMMSS_figure_distributions.pdf is a figure summarizing the results.
All output files are described in the documentation.
These results are for the "infotaxis" policy, which is the default policy. You can now try to compute the statistics of another policy on the same problem. For example, evaluate the "space-aware infotaxis" policy by running
python3 evaluate.py --input myparam.py
where myparam.py is a file containing the line
POLICY = 1
This file is already present in evaluate/parameters/ for this example.
The main policies are
- POLICY = 0 for infotaxis (default)
- POLICY = 1 for space-aware infotaxis, a recently proposed heuristic that beats infotaxis in most cases
- POLICY = -1 for a reinforcement learning policy: for that we need to learn first!
All policies are described in the documentation.
The learn.py script learns a policy using deep reinforcement learning.
It actually trains a neural network model of the optimal value function.
The (approximately) optimal policy is then derived from this function.
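One common way to derive a policy from a value function is a greedy one-step lookahead. The sketch below assumes that the value function estimates the expected number of remaining steps and that each action costs one step; these assumptions, the function names, and the toy labels are illustrative and do not describe OTTO's internals.

```python
def greedy_action(belief, actions, transitions, value):
    """Pick the action with the lowest expected cost-to-go.

    transitions(belief, action) returns a list of (probability, next_belief).
    value(next_belief) returns the estimated number of remaining steps.
    """
    def expected_cost(action):
        # One step is spent now, plus the expected remaining steps afterwards.
        return 1.0 + sum(p * value(b) for p, b in transitions(belief, action))

    return min(actions, key=expected_cost)


# Toy example with hand-made values (hypothetical labels, not real beliefs).
toy_value = {"found": 0.0, "near": 1.0, "far": 5.0}.get


def toy_transitions(belief, action):
    if action == "toward":
        return [(0.5, "found"), (0.5, "near")]
    return [(1.0, "far")]


best = greedy_action("near", ["toward", "away"], toy_transitions, toy_value)
# "toward": expected cost 1.5, versus 6.0 for "away"
```

In this picture the neural network plays the role of `value`, and the quality of the resulting policy depends on how well it approximates the optimal value function.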
To train a model, go to the learn directory and use
python3 learn.py
Now is the perfect time for a coffee since it will take quite a while. Logging information is displayed in the terminal while the script runs (if the script seems to have frozen, see known issues).
When you come back, you can look at the contents of the learn/outputs/YYmmdd-HHMMSS directory.
There should be a figure called YYmmdd-HHMMSS_figure_learning_progress.png (if not you need a larger coffee).
This figure shows the progress of the learning agent and is periodically updated as the training progresses. In particular, it shows the evolution of 'p_not_found', the probability that the source is never found, and of 'mean', the mean time to find the source provided it is ever found (if p_not_found is large, the mean is meaningless).
Other outputs are described in the documentation.
Completing the training may take up to roughly 5000-10000 iterations (several hours on an average laptop), but progress should be clearly visible from 500-1000 iterations. For reference, the optimal policy yields p_not_found < 1e-6 and mean ~ 7.15.
Training will continue until 10000 iterations, but can be stopped at any time.
Models are saved in the learn/models/YYmmdd-HHMMSS directory:
- YYmmdd-HHMMSS_model is the most recent model,
- YYmmdd-HHMMSS_model_bkp_i, where i is an integer, are the models saved at evaluation points (the models whose performance is shown in YYmmdd-HHMMSS_figure_learning_progress.png).
Note: training can restart from a previously saved model.
Once a neural network model is trained, the corresponding policy can be evaluated or visualized by running the main scripts with a parameter file (using --input) containing
POLICY = -1
MODEL_PATH = "../learn/models/YYmmdd-HHMMSS/YYmmdd-HHMMSS_model_bkp_i"
where MODEL_PATH is the path to the neural network model.
Important: parameters should be consistent. For example, if you set N_DIMS = 2 for learning then you must also set N_DIMS = 2 for evaluation and visualization.
A collection of trained neural networks is provided in the zoo directory accessible from the root of the package.
They are saved in the models directory and the corresponding parameter files are in the parameters directory.
They are named zoo_model_i_j_k where i, j, k are integers associated with N_DIMS, LAMBDA_OVER_DX, R_DT.
The list of all trained neural networks is available in the documentation.
To visualize the policy associated with the neural network model zoo_model_1_2_2, use
python3 visualize.py --input zoo_model_1_2_2
Similarly you can evaluate this neural network policy with
python3 evaluate.py --input zoo_model_1_2_2
You want to try your own policy?
Policies are implemented in classes/heuristicpolicies.
You can define your own in the function _custom_policy.
To use it in the main scripts, set POLICY = 2 in your parameter file.
To facilitate the evaluation of new policies compared to existing baselines, the performances of several policies (infotaxis, space-aware infotaxis and near-optimal) are reported in a dataset.
The directories can be restored to their original state by running the cleanall.sh bash script located at the root of the package.
Warning: all user-generated outputs and models will be deleted!
OTTO uses Sphinx for its documentation, which is available online.
To build the html version of the documentation locally, go to the docs directory and use:
make html
The generated html can be viewed by opening docs/_build/html/index.html.
When using large neural networks in parallel, the code may hang. This is a known incompatibility between keras and multiprocessing.
The workaround is to set N_PARALLEL = 1 in the parameter file, which enforces sequential computations.
While OTTO can run on all platforms, it has been developed for Unix-based systems and there are minor issues with Windows.
- Videos are not recorded by visualize.py. Frames are saved as images instead.
- Parallelization for learn.py and evaluate.py does not currently work, and the error NameError: name '*' is not defined is raised when running these scripts. This is because child processes instantiated with multiprocessing do not see global variables defined only during execution (after if __name__ == "__main__"). The workaround is to set N_PARALLEL = 1 in the parameter file, which enforces sequential computations.
If you discover a bug in OTTO which is not a known issue, please create a new issue.
Have you designed a new policy? Would you like to add a new feature? Can you fix a known issue? We welcome contributions to OTTO. To contribute, please fork the repository and submit a pull request.
Are you having trouble with OTTO? Please first consult the instructions for installing and using OTTO, check the known issues, and explore the documentation.
Can you still not find an answer? Would you like more information? Please create an issue or send an email with the subject "OTTO: your request" to the authors.
OTTO is developed by Aurore Loisy and Christophe Eloy (Aix Marseille Univ, CNRS, Centrale Marseille, IRPHE, Marseille, France).
If you use OTTO in your publications, you can cite the package as follows:
Loisy, A. and Eloy, C. (2022). OTTO: A Python package to simulate, solve and visualize the source-tracking POMDP. Journal of Open Source Software, 7(74), 4266, https://doi.org/10.21105/joss.04266
or if you use LaTeX:
@article{otto,
doi = {10.21105/joss.04266},
year = {2022},
volume = {7},
number = {74},
pages = {4266},
author = {Loisy, A. and Eloy, C.},
title = {OTTO: A Python package to simulate, solve and visualize the source-tracking POMDP},
journal = {Journal of Open Source Software}
}
See the LICENSE file for license rights and limitations.
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 834238).
The name OTTO was inspired by Otto the copepod, an interactive story by Jan Heuschele.