This repository includes a research platform for studying audio-visual search behavior in both humans and reinforcement learning agents. The project combines Python implementation of PPO-based RL agents that learn to locate sound-emitting targets using binaural audio and visual cues along with a Unity-based simulator.
The platform consists of three main components:
- RL Training & Testing: Train and test PPO agents to locate a target audiovisual source (e.g., vehicle) in a grid environment (e.g., parking garage) using audio (ITD) and visual observations
- Unity Simulation: A 3D environment for realistic audio-visual rendering, supporting both agent control and human participant studies
- Analysis Tools: Comprehensive tools for comparing human and model behavior, including trajectory analysis, belief visualization, and statistical metrics
Sensonaut/
├── python/ # Python codebase (RL, analysis, utilities)
│ ├── agent.py # Main entry point for training/testing
│ ├── agents/ # RL agent implementations
│ │ ├── ppo_agent.py # PPO training and testing
│ │ └── wandb_callback.py # Weights & Biases logging
│ ├── envs/ # Gym environments
│ │ └── unityproxy_env.py # Main environment (grid world + Unity)
│ ├── analysis/ # Human vs. model comparison tools
│ │ ├── compare_traj.py # Trajectory comparison
│ │ ├── belief_action_comparison.py # Belief and action analysis
│ │ ├── human_actions_and_beliefs.py # Human behavior inference
│ │ └── plot_beliefs.py # Belief visualization
│ ├── utils/ # Shared utilities
│ │ ├── constants.py # Centralized constants
│ │ ├── coordinates.py # Coordinate transformations
│ │ ├── unity_client.py # Unity communication
│ │ ├── audio_features.py # ITD computation
│ │ └── metrics.py # Evaluation metrics
│ ├── maps/ # Map generation and visualization
│ ├── scripts/ # Helper scripts
│ └── configs/ # Configuration files
│ ├── config_train.yaml # Config file for training a new model
│ └── config_test.yaml # Config file for testing a trained model
├── unity/ # Unity project
│ ├── Assets/ # Unity assets and scripts
│ └── README.md # Unity-specific documentation
├── requirements.txt # Python dependencies
└── README.md # This file
- Python 3.8+
- Unity 2022.3 LTS or later
- CUDA-capable GPU (recommended for training)
# Clone the repository
git clone https://github.com/choch-o/sensonaut.git
cd sensonaut
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt- Open the Unity project in
unity/folder - Install required Unity packages (Steam Audio, Meta XR Audio SDK)
- Configure the scene with appropriate prefabs
- See
unity/README.mdfor detailed Unity setup instructions
cd python
# Train with default configuration
# Edit config to set pretrained_model path for curriculum learning
python agent.py --config configs/config_train.yaml# Edit config to set pretrained_model path
python agent.py --config configs/config_test.yamlThe agent starts in a parking garage environment with multiple vehicles. One vehicle (the target) emits a sound. The agent must:
- Listen: Use binaural audio cues (ITD) to estimate target direction
- Look: Use visual observations to identify and localize vehicles
- Move: Navigate the environment (turn left/right, move forward)
- Commit: When confident, commit to a target location estimate
est_theta: Estimated target angle (θ azimuth angle relative to current position)est_r: Estimated target distance (r radius relative to current position)theta_uncertainty: Current theta uncertaintyr_uncertainty: Current r uncertaintylast_actions: Recent action historyposterior: Full belief distribution over (r, θ)
0: Turn left1: Turn right2: Move forward3: Commit (end episode)4: Stay (no movement, collect more evidence)
Refer to run_analysis.sh for how to run scripts for analysis.
If you use this code in your research, please cite:
@inproceedings{cho2026simulating,
title={Simulating Human Audiovisual Search Behavior},
author={Cho, Hyunsung and Luo, Xuejing and Lee, Byungjoo and Lindlbauer, David and Oulasvirta, Antti},
booktitle={Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems},
pages={1--17},
year={2026}
}This project is licensed under the MIT License - see the LICENSE file for details.
