# Quickstart

## Welcome to `AudibleLight`!

This tutorial walks through the data generation and synthesis process end-to-end.

We'll do the following:
1) Create a basic `Scene`
2) Add a tetrahedral microphone to the `Scene`
3) Add some basic static sound `Event`s
4) Add some background `Ambience`
5) Add some more advanced `Event`s, including moving events and events with augmentations
6) Render the whole scene to a first-order ambisonics audio file and metadata JSON file

For more information on any of these steps, you can check out the API documentation or the other tutorial files.

## Import dependencies

We need a few basic Python dependencies for this notebook. Note that `audiblelight.utils` contains basic utility functions that will come in handy when working with this package.

In [1]:
import os
from pathlib import Path

from scipy import stats

from audiblelight import utils

## Import `Scene` from `audiblelight.core`

In this notebook, we'll mostly be working with the `Scene` object. We should import it now.

The `Scene` is the highest level object within the `AudibleLight` API. It manages the `WorldState` and any listeners or events added to it, and is used to synthesise the entire audio file and metadata.

In [2]:
from audiblelight.core import Scene

## Set default values

All of these values can (and should!) be changed in order to experiment with the functionality of `AudibleLight`.

In [3]:
# OUTPUT DIRECTORY
OUTFOLDER = utils.get_project_root() / 'spatial_scenes'
if not os.path.isdir(OUTFOLDER):
    os.makedirs(OUTFOLDER)

In [4]:
# PATHS
FG_FOLDER = utils.get_project_root() / "tests/test_resources/soundevents"
MESH_PATH = utils.get_project_root() / "tests/test_resources/meshes/Oyens.glb"
NOISE_TYPE = "white"

In [5]:
# SCENE SETTINGS
DURATION = 30.0  # seconds
MIC_ARRAY_NAME = 'ambeovr'    # could also be "eigenmike32"...
MAX_OVERLAP = 3   # maximum number of temporally overlapping sound-events

MICROPHONE_POSITION = [2.5, -1.0, 1.0]  # inside the living room
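`MAX_OVERLAP` caps how many events may sound at the same moment. To illustrate what that quantity measures (not how `AudibleLight` enforces it internally), the sketch below counts the peak number of simultaneously active intervals by sweeping over start and end times. The function name `max_concurrent` and the example timings are hypothetical.

```python
def max_concurrent(intervals):
    """Return the largest number of intervals active at the same instant.

    `intervals` is an iterable of (start, end) pairs in seconds. An event that
    ends exactly when another starts is not counted as overlapping it.
    """
    # +1 at each start, -1 at each end; on ties, sorting puts ends before starts
    boundaries = []
    for start, end in intervals:
        boundaries.append((start, 1))
        boundaries.append((end, -1))
    boundaries.sort()

    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical event timings (start, end) in seconds
events = [(0.0, 4.0), (1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
peak_overlap = max_concurrent(events)
print(peak_overlap)
```

A set of timings like this one would sit exactly at the limit of `MAX_OVERLAP = 3`, because three events are active between 2 and 3 seconds.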

In [6]:
# SCENE-WIDE DISTRIBUTIONS
MIN_VELOCITY, MAX_VELOCITY = 0.5, 1.5    # meters per second
MIN_SNR, MAX_SNR = 2, 8
MIN_RESOLUTION, MAX_RESOLUTION = 0.25, 2.0    # Hz/IRs per second
REF_DB = -50    # noise floor

In [7]:
# These can be changed at will
N_STATIC_EVENTS = 4
N_MOVING_EVENTS = 1

## Make a `Scene`!

Now we're ready to create a `Scene` object from the values set above.

With those values, our `Scene` has the following properties:

- A duration of 30 seconds
- No more than 3 overlapping sound events at any one time
- A noise floor level of -50 dB
- Moving events that travel at between 0.5 and 1.5 meters per second
- Moving events rendered with between 0.25 and 2.0 IRs per second
- Events whose peak level sits between 2 and 8 dB above the noise floor
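To make the last property concrete, here is a rough sketch of how a peak level relative to the noise floor translates into a linear gain. It assumes that levels are expressed in dB relative to full scale and that the SNR is measured against the event's peak amplitude. `db_to_amplitude` and `gain_for_snr` are hypothetical helpers, and the scaling that `AudibleLight` applies internally may differ.

```python
import math


def db_to_amplitude(level_db):
    """Convert a level in dB (relative to full scale) to a linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def gain_for_snr(samples, ref_db, snr_db):
    """Return the gain that places the peak of `samples` `snr_db` dB above `ref_db`."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("cannot scale a silent signal")
    target_peak = db_to_amplitude(ref_db + snr_db)
    return target_peak / peak


# Toy signal with a peak of 0.5, a noise floor of -50 dB and an SNR of 6 dB
signal = [0.0, 0.25, -0.5, 0.1]
gain = gain_for_snr(signal, ref_db=-50, snr_db=6)
scaled = [gain * s for s in signal]

# The scaled peak should now sit at -50 + 6 = -44 dB
peak_db = 20.0 * math.log10(max(abs(s) for s in scaled))
print(round(peak_db, 2))
```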

In [8]:
# This function simply returns a fresh `Scene` object with the parameters set in the cells above
def create_scene() -> Scene:
    return Scene(
        duration=DURATION,
        sample_rate=44100,
        backend="rlr",
        backend_kwargs=dict(
            mesh=MESH_PATH
        ),
        scene_start_dist=stats.uniform(0.0, DURATION - 1),
        event_start_dist=None,
        event_duration_dist=None,
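        # scipy's uniform(loc, scale) spans [loc, loc + scale], so pass the width as `scale`
        event_velocity_dist=stats.uniform(MIN_VELOCITY, MAX_VELOCITY - MIN_VELOCITY),
        event_resolution_dist=stats.uniform(MIN_RESOLUTION, MAX_RESOLUTION - MIN_RESOLUTION),
        snr_dist=stats.uniform(MIN_SNR, MAX_SNR - MIN_SNR),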
        fg_path=Path(FG_FOLDER),
        max_overlap=MAX_OVERLAP,
        ref_db=REF_DB
    )
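One detail worth keeping in mind when building these distributions: `scipy.stats.uniform(loc, scale)` is supported on `[loc, loc + scale]`, so its second argument is a width rather than an upper bound. The small helper below converts a pair of bounds into those two arguments. `bounds_to_loc_scale` is a hypothetical name and not part of `AudibleLight` or SciPy.

```python
def bounds_to_loc_scale(low, high):
    """Convert (low, high) bounds into the (loc, scale) pair used by scipy.stats.uniform."""
    if high < low:
        raise ValueError(f"upper bound {high} is below lower bound {low}")
    return low, high - low


# Velocity range from the settings above: 0.5 to 1.5 meters per second
loc, scale = bounds_to_loc_scale(0.5, 1.5)
print(loc, scale)

# With SciPy available, stats.uniform(loc, scale) would then cover [0.5, 1.5]
```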

In [9]:
# Create a fresh scene object
scene = create_scene()

CreateContext: Context created


Now, we can visualise the `Scene`. The resulting object is interactive: try giving it a spin!

In [10]:
out = scene.state.create_scene()
out.show()

## Add a listener

Now, we'll add a microphone to our mesh.

In `AudibleLight`, microphones are represented as subclasses of the `audiblelight.micarrays.MicArray` dataclass. A variety of standard microphone array geometries are included by default, or you can subclass this dataclass and create your own.

For now, we can use `scene.add_microphone` to create a tetrahedral microphone inside the living room of our mesh.

The output of this microphone will be in Ambisonics A-Format (sometimes referred to as "MIC"). To work with B-Format directly, we can use the `FOAListener` object, which will output first-order Ambisonics audio.

In [11]:
# Add the microphone type we want, at the desired position
scene.add_microphone(microphone_type=MIC_ARRAY_NAME, alias=MIC_ARRAY_NAME, position=MICROPHONE_POSITION)

CreateContext: Context created




In [12]:
# Print some information about the microphone
scene.get_microphone(alias=MIC_ARRAY_NAME)

{
    "name": "ambeovr",
    "micarray_type": "AmbeoVR",
    "is_spherical": true,
    "channel_layout_type": "mic",
    "n_capsules": 4,
    "capsule_names": [
        "FLU",
        "FRD",
        "BLD",
        "BRU"
    ],
    "coordinates_absolute": [
        [
            2.5057922796533956,
            -0.9942077203466043,
            1.0057357643635105
        ],
        [
            2.5057922796533956,
            -1.0057922796533958,
            0.9942642356364896
        ],
        [
            2.4942077203466044,
            -0.9942077203466043,
            0.9942642356364896
        ],
        [
            2.4942077203466044,
            -1.0057922796533958,
            1.0057357643635105
        ]
    ],
    "coordinates_center": [
        2.5,
        -1.0,
        1.0
    ]
}
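As a quick sanity check on the coordinates printed above, the sketch below measures how far each capsule sits from the array centre. The values are copied from the output; nothing here calls `AudibleLight` itself.

```python
import math

# Values copied from the microphone description printed above
center = [2.5, -1.0, 1.0]
capsules = {
    "FLU": [2.5057922796533956, -0.9942077203466043, 1.0057357643635105],
    "FRD": [2.5057922796533956, -1.0057922796533958, 0.9942642356364896],
    "BLD": [2.4942077203466044, -0.9942077203466043, 0.9942642356364896],
    "BRU": [2.4942077203466044, -1.0057922796533958, 1.0057357643635105],
}

# Euclidean distance of every capsule from the centre, in meters
radii = {name: math.dist(position, center) for name, position in capsules.items()}

for name, radius in radii.items():
    print(f"{name}: {radius * 100:.2f} cm")
```

All four capsules come out at roughly 1 cm from the centre, which is consistent with the `is_spherical` flag in the output.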

## Add some sound sources

Now, we're ready to add some sound sources.

In `AudibleLight`, sound sources are represented by `audiblelight.event.Event` objects. Each `Event` is associated with one or more `audiblelight.worldstate.Emitter` objects, which dictate the position of the `Event` inside the mesh at a single point in time.

For a static sound source, an `Event` has one `Emitter`. For a moving sound source, an `Event` has multiple `Emitters`, depending on its velocity and resolution.

Note that `Emitter` objects should **never** be created directly. Instead, when we create an `Event`, we'll automatically create the `Emitter` objects that it needs.

For now, we'll just add in a small number of static `Event` objects with random positions and audio files.

In [13]:
# Add the correct number of static sources
scene.clear_events()
for _ in range(N_STATIC_EVENTS):
    scene.add_event(event_type="static")



CreateContext: Context created

CreateContext: Context created

2025-10-07 15:01:55.494 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event000', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/doorCupboard/35632.wav' (unloaded, 0 augmentations), 1 emitter(s).

CreateContext: Context created

2025-10-07 15:01:55.970 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event001', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/waterTap/205695.wav' (unloaded, 0 augmentations), 1 emitter(s).

CreateContext: Context created

2025-10-07 15:01:56.436 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event002', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/maleSpeech/93899.wav' (unloaded, 0 augmentations), 1 emitter(s).

CreateContext: Context created

2025-10-07 15:01:56.944 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event003', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/laughter/9547.wav' (unloaded, 0 augmentations), 1 emitter(s).


## Add background noise

In `AudibleLight`, `Ambience` objects capture non-moving, non-spatialised sound that is not associated with a particular spatial position. Adding this type of noise can be useful when training robust acoustic imaging systems.

To create `Ambience`, we have two choices:
1) Pass in an audio filepath, which will be tiled to match the duration and channel count of the `Scene`
2) Pass in the name of a particular noise type (e.g., `white`, `pink`)

For now, we'll just add in white noise.

In [14]:
scene.add_ambience(noise=NOISE_TYPE)
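The first option above relies on tiling: a short recording is repeated until it covers the scene and is then trimmed to the exact length. The sketch below illustrates the idea with plain Python lists. `tile_to_length` is a hypothetical helper and says nothing about how `AudibleLight` implements this internally.

```python
def tile_to_length(samples, n_target):
    """Repeat `samples` end-to-end and trim the result to exactly `n_target` items."""
    if not samples:
        raise ValueError("cannot tile an empty signal")
    repeats = -(-n_target // len(samples))  # ceiling division
    return (list(samples) * repeats)[:n_target]


# A three-sample "recording" stretched to cover eight samples
clip = [0.1, -0.2, 0.3]
mono = tile_to_length(clip, 8)

# Copy the mono signal to every channel (four for a tetrahedral array)
multichannel = [list(mono) for _ in range(4)]
print(mono)
```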

## Add more advanced `Event` types

`AudibleLight` has support for many different types of sound events, including sound events that move across a variety of trajectories, sound events placed in particular positions, and sound events with data augmentations (time-frequency domain masking, etc.). For more information, see the tutorial on adding `Event` objects to a `Scene`.

For now, we'll just show how to create a sound event that moves along a linear trajectory, starting from a position given in polar coordinates with respect to our microphone, with distortion applied to the audio file.
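To build some intuition for a polar position, the sketch below converts `[azimuth, elevation, radius]` into a Cartesian offset and adds it to the microphone centre. The convention used here (degrees, azimuth measured counter-clockwise from the x-axis, elevation measured upwards from the horizontal plane) is an assumption for illustration only. Check the API documentation for the convention `AudibleLight` actually uses. `polar_to_cartesian` is a hypothetical helper.

```python
import math


def polar_to_cartesian(azimuth_deg, elevation_deg, radius):
    """Convert polar coordinates to an (x, y, z) offset under the assumed convention."""
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    x = radius * math.cos(elevation) * math.cos(azimuth)
    y = radius * math.cos(elevation) * math.sin(azimuth)
    z = radius * math.sin(elevation)
    return x, y, z


# Microphone centre used in this tutorial
mic = (2.5, -1.0, 1.0)

# Hypothetical source: 90 degrees azimuth, level with the microphone, 1 meter away
offset = polar_to_cartesian(90.0, 0.0, 1.0)
source = tuple(m + o for m, o in zip(mic, offset))
print([round(value, 3) for value in source])
```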

In [15]:
from audiblelight.augmentation import Distortion

moving_event = scene.add_event(
    event_type="moving",
    alias="telephone",
    filepath=FG_FOLDER / "telephone/30085.wav",
    polar=True,
    position=[0.0, 90.0, 1.0],
    shape="linear",
    scene_start=5.0,    # start five seconds in to the scene
    spatial_resolution=1.5,
    spatial_velocity=1.0,
    duration=2,
    augmentations=Distortion
)

2025-10-07 15:02:03.334 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Moving 'Event' with alias 'telephone', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/telephone/30085.wav' (unloaded, 1 augmentations), 4 emitter(s).

CreateContext: Context created


We can also listen to our audio file (**note that this has not been spatialised yet, so only the distortion will be audible**).

In [16]:
from IPython.display import Audio

Audio(moving_event.load_audio(), rate=scene.sample_rate)

## Synthesise the audio and metadata

As a recap, we have done the following:

1) Created a `Scene` object from a mesh
2) Added a tetrahedral microphone inside the living room
3) Added multiple static `Event` objects at random positions
4) Added background white noise `Ambience`
5) Added a single moving `Event` with distortion applied, which follows a linear trajectory from a given position

We can now generate the spatial audio and metadata by calling `Scene.generate` and providing output paths to save the `wav` and `json` files.

In [18]:
# Do the generation!
scene.generate(
    audio_fname=str(OUTFOLDER / "audio_out_random.wav"),
    metadata_fname=str(OUTFOLDER / "metadata_out_random.json"),
)

2025-10-07 15:02:20.693 | INFO     | audiblelight.worldstate:simulate:1685 - Starting simulation with 8 emitters, 1 microphones
2025-10-07 15:02:49.395 | INFO     | audiblelight.worldstate:simulate:1693 - Finished simulation! Overall indirect ray efficiency: 0.997

CreateContext: Context created

2025-10-07 15:02:54.774 | INFO     | audiblelight.synthesize:render_audio_for_all_scene_events:571 - Rendered scene audio in 4.70 seconds!


The audio file and metadata should now be accessible inside our output folder.
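Because the metadata is ordinary JSON, it can be inspected with the standard library alone. The sketch below writes a tiny stand-in file and summarises it. The top-level keys `duration`, `ambience` and `events` mirror fields of the scene metadata, but the contents are made up, and `summarise_metadata` is a hypothetical helper rather than part of `AudibleLight`.

```python
import json
import tempfile
from pathlib import Path


def summarise_metadata(path):
    """Load a metadata JSON file and return a few headline numbers."""
    with open(path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    return {
        "duration": meta["duration"],
        "n_events": len(meta["events"]),
        "n_ambience": len(meta["ambience"]),
        "moving": sorted(
            alias for alias, event in meta["events"].items() if event.get("is_moving")
        ),
    }


# Made-up stand-in for a real metadata file
stand_in = {
    "duration": 30.0,
    "ambience": {"ambience000": {"alias": "ambience000"}},
    "events": {
        "event000": {"alias": "event000", "is_moving": False},
        "telephone": {"alias": "telephone", "is_moving": True},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata_stand_in.json"
    path.write_text(json.dumps(stand_in), encoding="utf-8")
    summary = summarise_metadata(path)

print(summary)
```

To inspect a real scene, the same helper could be pointed at the `metadata_out_random.json` file written to the output folder.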

In [19]:
# Pretty print the metadata JSON
print(repr(scene))

{
    "audiblelight_version": "0.1.0",
    "rlr_audio_propagation_version": "0.0.1",
    "creation_time": "2025-08-20_12:07:50",
    "duration": 30.0,
    "ref_db": -50,
    "max_overlap": 3,
    "fg_path": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents",
    "ambience": {
        "ambience000": {
            "alias": "ambience000",
            "beta": 0,
            "filepath": null,
            "channels": 4,
            "sample_rate": 44100,
            "duration": 30.0,
            "ref_db": -50,
            "noise_kwargs": {}
        }
    },
    "events": {
        "event000": {
            "alias": "event000",
            "filename": "236657.wav",
            "filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/236657.wav",
            "class_id": null,
            "class_label": null,
            "is_moving": false,
            "scene_start": 24.235181686117272,
          

## Create `DCASE`-style metadata

The `DCASE` challenges use a special metadata format, more details about which can be [found on the website](https://dcase.community/challenge2024/task-audio-and-audiovisual-sound-event-localization-and-detection-with-source-distance-estimation).

`AudibleLight` can generate this metadata from a `Scene`. Combined with the spatial audio we just generated, that is enough to train a model like [`SELDNet`](https://github.com/sharathadavanne/seld-dcase2023).

In [20]:
from audiblelight.synthesize import generate_dcase2024_metadata

dcase_out = generate_dcase2024_metadata(scene)

{'ambeovr':               active_class_index  source_number_index  azimuth  elevation  \
frame_number                                                                
28                             7                    0     -157          3   
29                             7                    0     -157          3   
30                             7                    0     -157          3   
31                             7                    0     -157          3   
32                             7                    0     -157          3   
...                          ...                  ...      ...        ...   
129                           10                    0     -100        -24   
130                           10                    0     -100        -24   
131                           10                    0     -100        -24   
132                           10                    0     -100        -24   
133                           10                    0     -100  

By default, this function creates a dictionary of `pandas.DataFrame` objects, one for every microphone added to our scene. We can easily print just the first few frames for our `AmbeoVR` microphone:

In [21]:
dcase_out["ambeovr"].head()

| frame_number | active_class_index | source_number_index | azimuth | elevation | distance |
|---|---|---|---|---|---|
| 28 | 7 | 0 | -157 | 3 | 337 |
| 29 | 7 | 0 | -157 | 3 | 337 |
| 30 | 7 | 0 | -157 | 3 | 337 |
| 31 | 7 | 0 | -157 | 3 | 337 |
| 32 | 7 | 0 | -157 | 3 | 337 |


For more information on what any of these columns mean, refer to the [DCASE community website](https://dcase.community/challenge2024/task-audio-and-audiovisual-sound-event-localization-and-detection-with-source-distance-estimation).
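As a small illustration, the sketch below converts one row of this table into physical units. It assumes the DCASE 2024 conventions of 100 ms label frames, angles in degrees, and distances in centimetres. Treat these as assumptions to verify against the challenge description. The helper name `describe_row` is hypothetical.

```python
FRAME_SECONDS = 0.1  # assumed label resolution: one frame per 100 ms
CM_PER_METER = 100.0  # assumed distance unit: centimetres


def describe_row(frame_number, class_index, source_index, azimuth, elevation, distance):
    """Return a DCASE-style row with time in seconds and distance in meters."""
    return {
        "onset_seconds": round(frame_number * FRAME_SECONDS, 3),
        "class_index": class_index,
        "source_index": source_index,
        "azimuth_deg": azimuth,
        "elevation_deg": elevation,
        "distance_m": distance / CM_PER_METER,
    }


# First row of the table above
row = describe_row(28, 7, 0, -157, 3, 337)
print(row)
```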

## Recreating a `Scene` from metadata

Finally, note that we can also re-create a `Scene` object from scratch, just by reloading our JSON:

In [20]:
reloaded_scene = Scene.from_json(str(OUTFOLDER / "metadata_out_random.json"))
assert reloaded_scene == scene



CreateContext: Context created


Material for category 'default' was not found. Using default material instead.
Material for category 'default' was not found. Using default material instead.


CreateContext: Context created


That's the end of the quickstart guide for `AudibleLight`! For more information, check out the rest of the tutorials or take a look at the API documentation.