Sensonaut: Simulating Human Audiovisual Search Behavior

This repository contains a research platform for studying audiovisual search behavior in both humans and reinforcement learning (RL) agents. It combines a Unity-based simulator with a Python implementation of PPO-based RL agents that learn to locate sound-emitting targets from binaural audio and visual cues.

Overview

The platform consists of three main components:

RL Training & Testing: Train and test PPO agents to locate a target audiovisual source (e.g., a vehicle) in a grid environment (e.g., a parking garage) using audio (interaural time difference, ITD) and visual observations
Unity Simulation: A 3D environment for realistic audiovisual rendering, supporting both agent control and human participant studies
Analysis Tools: Tools for comparing human and model behavior, including trajectory analysis, belief visualization, and statistical metrics

Project Structure

Sensonaut/
├── python/                               # Python codebase (RL, analysis, utilities)
│   ├── agent.py                          # Main entry point for training/testing
│   ├── agents/                           # RL agent implementations
│   │   ├── ppo_agent.py                  # PPO training and testing
│   │   └── wandb_callback.py             # Weights & Biases logging
│   ├── envs/                             # Gym environments
│   │   └── unityproxy_env.py             # Main environment (grid world + Unity)
│   ├── analysis/                         # Human vs. model comparison tools
│   │   ├── compare_traj.py               # Trajectory comparison
│   │   ├── belief_action_comparison.py   # Belief and action analysis
│   │   ├── human_actions_and_beliefs.py  # Human behavior inference
│   │   └── plot_beliefs.py               # Belief visualization
│   ├── utils/                            # Shared utilities
│   │   ├── constants.py                  # Centralized constants
│   │   ├── coordinates.py                # Coordinate transformations
│   │   ├── unity_client.py               # Unity communication
│   │   ├── audio_features.py             # ITD computation
│   │   └── metrics.py                    # Evaluation metrics
│   ├── maps/                             # Map generation and visualization
│   ├── scripts/                          # Helper scripts
│   └── configs/                          # Configuration files
│       ├── config_train.yaml             # Config file for training a new model
│       └── config_test.yaml              # Config file for testing a trained model
├── unity/                                # Unity project
│   ├── Assets/                           # Unity assets and scripts
│   └── README.md                         # Unity-specific documentation
├── requirements.txt                      # Python dependencies
└── README.md                             # This file

Installation

Prerequisites

Python 3.8+
Unity 2022.3 LTS or later
CUDA-capable GPU (recommended for training)

Python Setup

# Clone the repository
git clone https://github.com/choch-o/sensonaut.git
cd sensonaut

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Unity Setup

Open the Unity project in the unity/ folder
Install required Unity packages (Steam Audio, Meta XR Audio SDK)
Configure the scene with appropriate prefabs
See unity/README.md for detailed Unity setup instructions

Quick Start

Training an Agent

cd python

# Train with the default configuration
# For curriculum learning, set pretrained_model in the config to the path of a previously trained model
python agent.py --config configs/config_train.yaml

Testing a Trained Model

# Set pretrained_model in configs/config_test.yaml to the path of the trained model
python agent.py --config configs/config_test.yaml
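Training and testing use the same entry point and differ only in the config file passed. The following is a minimal sketch of that pattern. It assumes that agent.py reads a single --config flag and that pretrained_model is a top-level key in the YAML file. The function names are hypothetical, and the real agent.py may do more than this.

```python
import argparse


def parse_args(argv=None):
    """Parse the single --config flag used by both training and testing."""
    parser = argparse.ArgumentParser(description="Train or test a PPO agent (sketch).")
    parser.add_argument("--config", required=True, help="Path to a YAML config file")
    return parser.parse_args(argv)


def load_config(path):
    """Load a YAML config into a dict (requires PyYAML)."""
    import yaml  # PyYAML; imported here so the other helpers work without it

    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}


def resolve_start_point(config):
    """Return the checkpoint to start from, or None to train from scratch.

    Assumes pretrained_model is a top-level key (hypothetical layout);
    an empty value is treated as "no pretrained model".
    """
    return config.get("pretrained_model") or None
```

For curriculum learning, a later stage's config would point pretrained_model at the checkpoint produced by the previous stage.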

The Task

The agent starts in a parking garage environment with multiple vehicles. One vehicle (the target) emits a sound. The agent must:

Listen: Use binaural audio cues (ITD) to estimate target direction
Look: Use visual observations to identify and localize vehicles
Move: Navigate the environment (turn left/right, move forward)
Commit: When confident, commit to a target location estimate
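The Listen step relies on ITD, the difference in a sound's arrival time at the two ears. One common way to estimate it is to find the lag that maximizes the cross-correlation between the left and right channels. That lag can then be mapped to azimuth with a simple far-field model, ITD = d·sin(θ)/c. This sketch is not necessarily what utils/audio_features.py does. The ear spacing `EAR_DISTANCE`, the sign convention (positive azimuth means toward the left ear), and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C
EAR_DISTANCE = 0.18     # m, hypothetical inter-ear spacing


def estimate_itd(left, right, sample_rate, max_lag):
    """Return the ITD in seconds as the lag maximizing cross-correlation.

    A positive result means the right channel lags the left,
    i.e. the sound reached the left ear first.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def itd_to_azimuth(itd, ear_distance=EAR_DISTANCE, c=SPEED_OF_SOUND):
    """Map an ITD to azimuth in radians using itd = d * sin(theta) / c."""
    # Clamp to [-1, 1] so noisy ITDs beyond the physical maximum stay valid for asin
    s = max(-1.0, min(1.0, c * itd / ear_distance))
    return math.asin(s)
```

ITD alone cannot distinguish a source in front from its mirror image behind, because both produce the same time difference. Turning the head changes the ITD and can resolve this ambiguity.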

Observation Space

est_theta: Estimated target azimuth θ, relative to the agent's current position
est_r: Estimated target distance r from the agent's current position
theta_uncertainty: Current uncertainty of the θ estimate
r_uncertainty: Current uncertainty of the r estimate
last_actions: History of recent actions
posterior: Full belief distribution over (r, θ)
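The posterior is a belief over (r, θ), and the other observation entries summarize it. The following is a minimal sketch that assumes the belief is stored as a discretized polar grid, with rows indexed by r and columns by θ. The grid is updated by Bayes' rule with a Gaussian likelihood on a measured azimuth. The grid layout, `sigma`, and all function names are hypothetical, and the repository's actual belief model may differ. For θ, the point estimate is the circular mean and the uncertainty is the circular standard deviation sqrt(-2 ln R). For r, the sketch uses the ordinary mean and standard deviation.

```python
import math


def make_grid(r_values, theta_values):
    """Uniform prior over a discretized (r, theta) grid: rows = r, columns = theta."""
    p = 1.0 / (len(r_values) * len(theta_values))
    return [[p for _ in theta_values] for _ in r_values]


def bayes_update_theta(posterior, theta_values, measured_theta, sigma):
    """Multiply each cell by a Gaussian likelihood on theta, then renormalize."""
    updated = []
    for row in posterior:
        new_row = []
        for p, th in zip(row, theta_values):
            # Wrap the angular difference into [-pi, pi]
            d = math.atan2(math.sin(th - measured_theta), math.cos(th - measured_theta))
            new_row.append(p * math.exp(-0.5 * (d / sigma) ** 2))
        updated.append(new_row)
    total = sum(sum(row) for row in updated)
    return [[p / total for p in row] for row in updated]


def summarize(posterior, r_values, theta_values):
    """Return (est_theta, est_r, theta_uncertainty, r_uncertainty)."""
    # Marginal distributions over r and theta
    p_r = [sum(row) for row in posterior]
    p_theta = [sum(row[j] for row in posterior) for j in range(len(theta_values))]

    # Mean and standard deviation for the (linear) distance
    est_r = sum(p * r for p, r in zip(p_r, r_values))
    r_unc = math.sqrt(sum(p * (r - est_r) ** 2 for p, r in zip(p_r, r_values)))

    # Circular mean and circular standard deviation for the angle
    s = sum(p * math.sin(t) for p, t in zip(p_theta, theta_values))
    c = sum(p * math.cos(t) for p, t in zip(p_theta, theta_values))
    est_theta = math.atan2(s, c)
    resultant = min(math.hypot(s, c), 1.0)  # clamp float overshoot above 1
    theta_unc = math.sqrt(-2.0 * math.log(resultant)) if resultant > 0 else math.inf
    return est_theta, est_r, theta_unc, r_unc
```

The circular statistics avoid the wrap-around error a plain mean would make for beliefs concentrated near ±π.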

Action Space

0: Turn left
1: Turn right
2: Move forward
3: Commit (end episode)
4: Stay (no movement, collect more evidence)
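The following is a minimal sketch of how the discrete actions above could be encoded and applied. It assumes a grid with four headings and one-cell forward moves. The turn granularity, `HEADINGS`, and `step` are hypothetical. The real environment in envs/unityproxy_env.py may use different turn angles and handle collisions and rewards, which this sketch omits.

```python
from enum import IntEnum


class Action(IntEnum):
    TURN_LEFT = 0
    TURN_RIGHT = 1
    MOVE_FORWARD = 2
    COMMIT = 3
    STAY = 4


# Hypothetical: four headings, ordered counterclockwise starting from +x
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def step(x, y, heading, action):
    """Apply one action and return (x, y, heading, done)."""
    action = Action(action)
    if action == Action.TURN_LEFT:
        heading = (heading + 1) % 4  # counterclockwise
    elif action == Action.TURN_RIGHT:
        heading = (heading - 1) % 4  # clockwise
    elif action == Action.MOVE_FORWARD:
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy
    # STAY leaves the state unchanged; COMMIT ends the episode
    done = action == Action.COMMIT
    return x, y, heading, done
```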

Analysis

See run_analysis.sh for examples of how to run the analysis scripts.
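The analysis tools compare human and model trajectories. As an illustration only, dynamic time warping (DTW) is one standard way to compare two paths recorded at different speeds or lengths; compare_traj.py may use a different metric. DTW aligns the points of the two paths monotonically and sums the Euclidean distances along the cheapest alignment. The function name and the list-of-(x, y) trajectory format are assumptions.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Dynamic time warping distance between two 2-D trajectories of (x, y) points."""
    n, m = len(traj_a), len(traj_b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])  # Python 3.8+
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because repeated points can be matched to a single point, a participant who pauses produces the same DTW distance as one who does not, unlike a pointwise comparison.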

Citation

If you use this code in your research, please cite:

@inproceedings{cho2026simulating,
  title={Simulating Human Audiovisual Search Behavior},
  author={Cho, Hyunsung and Luo, Xuejing and Lee, Byungjoo and Lindlbauer, David and Oulasvirta, Antti},
  booktitle={Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems},
  pages={1--17},
  year={2026}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
