# `Soundscapy` - Quick Start Guide

By Andrew Mitchell, Research Fellow, University College London

## Background

`Soundscapy` is a Python toolbox for analysing quantitative soundscape data. Urban soundscapes are typically assessed through surveys which ask respondents how they perceive the given soundscape. Particularly when collected following the technical specification ISO 12913, these surveys can constitute quantitative data about soundscape perception. As proposed in *How to analyse and represent quantitative soundscape data* [(Mitchell, Aletta, & Kang, 2022)](https://asa.scitation.org/doi/full/10.1121/10.0009794), in order to describe the soundscape perception of a group or of a location, we should consider the distribution of responses. `Soundscapy` follows this approach and makes it simple to process soundscape data and visualise the distribution of responses.

For more information on the theory underlying the assessments and forms of data collection, please see ISO 12913-Part 2, *The SSID Protocol* [(Mitchell, *et al.*, 2020)](https://www.mdpi.com/2076-3417/10/7/2397), and *How to analyse and represent quantitative soundscape data*.

## This Notebook

The purpose of this notebook is to give a brief overview of how `Soundscapy` works and how to quickly get started using it to analyse your own soundscape data. The example dataset used is *The International Soundscape Database (ISD)* (Mitchell, *et al.*, 2021), which is publicly available at [Zenodo](https://zenodo.org/record/6331810) and is free to use. `Soundscapy` expects data to follow the format used in the ISD, but can be adapted for similar datasets.

----------

## Installation

`Soundscapy` is currently under active development. To install with `pip`:

```
pip install soundscapy
```

----

## Working with data

### Loading data

Data can be loaded as normal using the `pandas` `read_csv()` or `read_excel()` functions. However, `Soundscapy` also includes a built-in function that accesses the ISD directly from its Zenodo URL and defaults to the latest version of the ISD.

In [21]:
%load_ext autoreload
%autoreload 2

# Add soundscapy to the Python path
import sys
sys.path.append('../..')
# Import Soundscapy
import soundscapy as sspy
from soundscapy.databases import isd

df = isd.load()
df
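
For data held in your own files, the `pandas` route mentioned above might look like the sketch below. The rows, values, and the file name in the comment are made up for illustration; the column names follow those used elsewhere in this guide.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the rows and values are made up.
csv_text = """RecordID,LocationID,SessionID,pleasant,vibrant,eventful,chaotic,annoying,monotonous,uneventful,calm
R001,ExamplePark,ExamplePark1,4,3,2,1,1,2,4,5
R002,ExamplePark,ExamplePark1,3,4,4,3,2,1,2,2
"""

# read_csv accepts a path, URL, or file-like object. With a real file this
# would be something like pd.read_csv("my_survey_data.csv") (hypothetical name).
own_df = pd.read_csv(io.StringIO(csv_text))
print(own_df.shape)
```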

### Data included

The ISD contains two primary types of data - surveys and acoustic metrics. The surveys include several blocks of questions, the most important of which are the Perceptual Attribute Questions (PAQs). These form the 8 descriptors of the soundscape circumplex - pleasant, vibrant, eventful, chaotic, annoying, monotonous, uneventful, and calm. In addition, each survey includes other information about the soundscape and demographic characteristics (age, gender, etc.). Finally, the survey section includes identifiers of when and where the survey was conducted - the `LocationID`, `SessionID`, `latitude`, `longitude`, `start_time`, etc.

The final bit of information for the survey is the `GroupID`. Respondents in the survey space were often stopped as a group; for instance, a couple walking through the space would be approached together and given the same `GroupID`. While each group completes the survey, a binaural audio recording is taken, typically lasting about 30 seconds. It is from these recordings that the acoustic data are calculated. Therefore, each `GroupID` can be connected to roughly 1 to 10 surveys, and to one recording, or one set of acoustic features.
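
The many-to-one relationship between surveys and recordings can be pictured with two toy tables. This is only an illustration of the structure: the ISD itself is loaded as a single table, and the values and the `LAeq` column name here are assumptions made for the example.

```python
import pandas as pd

# Made-up surveys: two respondents approached together share a GroupID.
surveys = pd.DataFrame({
    "RecordID": ["R1", "R2", "R3"],
    "GroupID": ["G01", "G01", "G02"],
})

# Made-up acoustic features: one row per recording, i.e. one per GroupID.
acoustics = pd.DataFrame({
    "GroupID": ["G01", "G02"],
    "LAeq": [62.5, 58.1],  # hypothetical column name and values
})

# Number of surveys attached to each recording.
surveys_per_group = surveys.groupby("GroupID").size()

# Many-to-one join: every survey in a group receives the same acoustic values.
merged = surveys.merge(acoustics, on="GroupID", how="left", validate="many_to_one")
print(merged)
```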

Within the acoustic data is a set of psychoacoustic analyses calculated for each recording. For each metric, one value is originally calculated for each channel (right and left ear), and the maximum of the two channels is what is shown here.
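
As a small illustration of that reduction from two channels to one value, assuming hypothetical per-channel column names and made-up numbers:

```python
import pandas as pd

# Made-up per-channel values; the column names are hypothetical.
per_channel = pd.DataFrame({
    "GroupID": ["G01", "G02"],
    "loudness_left": [18.2, 25.4],
    "loudness_right": [19.6, 24.9],
})

# Keep the larger of the two ear signals as the single reported value.
per_channel["loudness"] = per_channel[["loudness_left", "loudness_right"]].max(axis=1)
print(per_channel)
```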


In [2]:
df.shape

### Validating the dataset
 
To validate that the dataset includes the data we expect, and to check for missing or incorrect PAQ data, we use the `isd.validate()` function. This function can also rename the PAQ columns if necessary.

In [3]:
df, excl = isd.validate(df)
df

When samples are found which need to be excluded based on the PAQ quality checks, a dataframe with these samples will be returned. Then we can take a look at which ones were excluded and why.

In [4]:
sspy.surveys.return_paqs(excl)
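
To give a feel for what a PAQ quality check can involve, here is a minimal sketch that splits a table into kept and excluded rows. The two rules used (any missing PAQ, or any value outside the expected range) and the helper name `split_on_paq_quality` are assumptions made for illustration; they are not necessarily the rules `isd.validate()` applies.

```python
import pandas as pd

PAQ_COLS = ["pleasant", "vibrant", "eventful", "chaotic",
            "annoying", "monotonous", "uneventful", "calm"]


def split_on_paq_quality(data, val_range=(1, 5)):
    """Return (kept, excluded) rows using two simple, illustrative rules."""
    paqs = data[PAQ_COLS]
    lo, hi = val_range
    has_missing = paqs.isna().any(axis=1)
    # NaN compares as False here, so missing values are caught by has_missing.
    out_of_range = ((paqs < lo) | (paqs > hi)).any(axis=1)
    bad = has_missing | out_of_range
    return data[~bad], data[bad]


# Made-up responses: A is complete, B has a missing value, C has a 7 on a 1-5 scale.
toy = pd.DataFrame(
    [["A", 4, 3, 2, 1, 1, 2, 4, 5],
     ["B", 4, float("nan"), 2, 1, 1, 2, 4, 5],
     ["C", 4, 3, 2, 7, 1, 2, 4, 5]],
    columns=["RecordID"] + PAQ_COLS,
)
kept, excluded = split_on_paq_quality(toy)
print(excluded["RecordID"].tolist())
```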

### Calculating the ISOPleasant and ISOEventful coordinate values

The analysis methods used by `Soundscapy` are based firstly on converting the 8 PAQs into their projected pleasantness and eventfulness coordinate values (called ISOPleasant and ISOEventful).

Now, calculate the ISOCoordinate values.

In [17]:
df = sspy.surveys.add_iso_coords(df)
df
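
For reference, the projection defined in ISO 12913-3 for 1 to 5 Likert responses can be written out by hand. The sketch below is a plain-Python illustration of that formula, not `Soundscapy`'s own implementation; the function name `iso_coords` is made up for the example.

```python
import math

COS45 = math.cos(math.radians(45))
SCALE = 4 + math.sqrt(32)  # largest attainable sum for 1-5 Likert responses


def iso_coords(pleasant, vibrant, eventful, chaotic,
               annoying, monotonous, uneventful, calm):
    """Project the 8 PAQ responses onto the pleasant and eventful axes."""
    iso_pleasant = ((pleasant - annoying)
                    + COS45 * (calm - chaotic)
                    + COS45 * (vibrant - monotonous)) / SCALE
    iso_eventful = ((eventful - uneventful)
                    + COS45 * (chaotic - calm)
                    + COS45 * (vibrant - monotonous)) / SCALE
    return iso_pleasant, iso_eventful


# A response that is maximally pleasant and neutral on eventfulness
# sits at the right-hand edge of the circumplex.
p, e = iso_coords(pleasant=5, vibrant=5, eventful=3, chaotic=1,
                  annoying=1, monotonous=1, uneventful=3, calm=5)
print(round(p, 6), round(e, 6))
```

Dividing by `4 + sqrt(32)` is what keeps both coordinates between -1 and +1 for 1 to 5 responses.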

`Soundscapy` expects the PAQ values to be Likert scale values ranging from 1 to 5 by default, as specified in ISO 12913 and the SSID Protocol. However, it is possible to use data which, although structured the same way, has a different range of values. For instance, this could be a 7-point Likert scale, or a 0 to 100 scale. By passing the minimum and maximum of the scale as `val_range` to both `isd.validate()` and `add_iso_coords()`, `Soundscapy` will check that the data conforms to what is expected and will automatically scale the ISOCoordinates to the -1 to +1 range based on the original value range.

For example:

In [18]:
import pandas as pd
val_range = (0, 100)
sample_transform = {
    "RecordID": ["EX1", "EX2"],
    "pleasant": [40, 25],
    "vibrant": [45, 31],
    "eventful": [41, 54],
    "chaotic": [24, 56],
    "annoying": [8, 52],
    "monotonous": [31, 55],
    "uneventful": [37, 31],
    "calm": [40, 10],
}
sample_transform = pd.DataFrame.from_dict(sample_transform)
sample_transform, excl = isd.validate(sample_transform, val_range=val_range)

In [19]:
sample_transform = sspy.surveys.add_iso_coords(sample_transform, val_range=val_range)
sample_transform
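
One way to see why a different value range only changes the divisor: the projection is built from differences between PAQs, so stretching the scale stretches every difference by the same factor. The sketch below assumes the divisor is `(max - min) * (1 + sqrt(2))`, which reduces to the ISO value of `4 + sqrt(32)` for a 1 to 5 scale. It is an illustration under that assumption, not `Soundscapy`'s code, and `iso_pleasant` is a made-up helper name.

```python
import math

COS45 = math.cos(math.radians(45))


def iso_pleasant(paq, val_range):
    """ISOPleasant with the divisor scaled to the response range."""
    lo, hi = val_range
    total = ((paq["pleasant"] - paq["annoying"])
             + COS45 * (paq["calm"] - paq["chaotic"])
             + COS45 * (paq["vibrant"] - paq["monotonous"]))
    return total / ((hi - lo) * (1 + math.sqrt(2)))


# Record EX1 from the example above, on a 0-100 scale.
ex1 = {"pleasant": 40, "vibrant": 45, "eventful": 41, "chaotic": 24,
       "annoying": 8, "monotonous": 31, "uneventful": 37, "calm": 40}

# Route 1: use the 0-100 responses directly with a range-scaled divisor.
direct = iso_pleasant(ex1, val_range=(0, 100))

# Route 2: map each response onto 1-5 first, then use the 1-5 divisor.
rescaled = {name: 1 + 4 * value / 100 for name, value in ex1.items()}
via_likert = iso_pleasant(rescaled, val_range=(1, 5))

print(math.isclose(direct, via_likert))
```

Both routes give the same coordinate, which is why the range can be handled by a single scaling step.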

### More filtering

`Soundscapy` includes methods for several filters that are normally needed, such as filtering by `LocationID` or `SessionID`.

In [20]:
isd.select_location_ids(df, ['CamdenTown', 'PancrasLock'])

In [21]:
isd.select_session_ids(df, ['RegentsParkJapan1']).head()

However, if more complex filters or some other custom filter is needed, `pandas` provides a very nice approach with its `query()` method. For instance, if we wanted to filter by gender:

In [22]:
df.query("gen00 == 'Female'")

Or a more complex filter like women over 50:

In [23]:
df.query("gen00 == 'Female' and age00 > 50")

All of these filters can also be chained together. So, for instance, to return surveys from women over 50 taken in Camden Town, we would do:

In [24]:
isd.select_location_ids(df, 'CamdenTown').query("gen00 == 'Female' and age00 > 50")
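
The same chaining pattern can be tried on a toy table without loading the ISD. The rows below are made up; the column names are the ones used in the filters above, and the `isin` step is only a rough stand-in for what a location filter does.

```python
import pandas as pd

# Made-up respondents for illustration.
toy = pd.DataFrame({
    "LocationID": ["CamdenTown", "CamdenTown", "RussellSq", "CamdenTown"],
    "gen00": ["Female", "Male", "Female", "Female"],
    "age00": [62, 55, 71, 34],
})

# Roughly what a location filter does: keep rows whose LocationID is listed.
camden = toy[toy["LocationID"].isin(["CamdenTown"])]

# Chain a pandas query on the result; `and` combines the two conditions.
result = camden.query("gen00 == 'Female' and age00 > 50")
print(result)
```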

## Plotting

Probably the most important part of the `Soundscapy` package is its methods for plotting soundscape circumplex data. Making use of `seaborn`'s `kdeplot()`, we can visualise the distribution of responses within the soundscape circumplex.

### Scatter plots

The most basic plot is the scatter plot, created with `sspy.plotting.scatter()`.

First, we filter down to one location that we want to look at. Then, using `sspy.plotting.scatter()`, we can create a default formatted plot:

In [25]:
sspy.plotting.scatter(isd.select_location_ids(df, ['RussellSq']))

Each point in this scatter plot represents the ISOCoordinate values of one survey taken in Russell Square during all of the sessions. 

We can see that this plotting function has added some customisations on top of the underlying `seaborn` plots. The first is to automatically scale the plot area to the -1 to +1 bounds of the circumplex. The second is the inclusion of a grid highlighting the typical quadrants of the circumplex. Finally, customised labels make the relationship between the ISOPleasant and ISOEventful values clearer.
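
To make the customisations described above concrete, here is a rough equivalent built directly with `matplotlib`. It is a sketch of the idea with made-up coordinates, not the code `Soundscapy` uses, and the styling choices are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up ISOPleasant / ISOEventful coordinates, for illustration only.
iso_pleasant = [0.45, 0.10, -0.30, 0.62, 0.25]
iso_eventful = [-0.20, 0.35, 0.15, -0.05, 0.50]

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(iso_pleasant, iso_eventful, s=20)

# 1. Fix the plot area to the -1 to +1 bounds of the circumplex.
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)

# 2. Draw the two axes through the origin to mark the four quadrants.
ax.axhline(0, color="grey", linestyle="--", linewidth=1)
ax.axvline(0, color="grey", linestyle="--", linewidth=1)

# 3. Label the axes with the coordinate names.
ax.set_xlabel("ISOPleasant")
ax.set_ylabel("ISOEventful")
```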

This plot can be further customised, though. For instance, if we don't like or need those custom primary labels, we can remove them by setting `prim_labels=False`. We could also add labels for the diagonal circumplex axes with `diagonal_lines=True`.

In [26]:
sspy.plotting.scatter(
    isd.select_location_ids(df, ['RussellSq']),
    diagonal_lines=True,
    )

It's also often very useful to plot the different sessions taken in the same location with different colours. This is done with the `hue` parameter. At the same time, we'll also add a legend and make the scatter points larger.

In [27]:
sspy.plotting.scatter(
    isd.select_location_ids(df, ['RussellSq']),
    hue='SessionID',
    legend=True,
    s=20,
    title="RussellSq Sessions"
    )

### Distribution Plots

The real power of `Soundscapy` is in creating plots of the distribution of soundscape assessments. The interface for doing this is the same as the scatter plots above.

In [28]:
sspy.plotting.density(isd.select_location_ids(df, ['RussellSq']))

This can be customised in the same ways as the scatter plots. To see how the scatter points and the density heatmap are related, we can add scatter points to the density plot.

In [29]:
sspy.plotting.density(
    isd.select_location_ids(df, ['RussellSq']),
    incl_scatter=True,
    alpha=0.75,
    hue="LocationID"
    )

*How to analyse* proposes a method for simplifying this plot, allowing easy comparison between multiple soundscapes. In the simplified version, rather than showing the full distribution heatmap, we plot only the 50th percentile density curve, showing the general shape of the soundscape.

This is done by digging into `seaborn`'s `kdeplot()` and using its parameters `thresh` and `levels`. We'll also go ahead and customise some other aspects, such as the colour palette.

In [30]:
sspy.plotting.density(
    isd.select_location_ids(df, ['RegentsParkJapan']),
    title="Median perception contour and scatter plot of individual assessments\n\n",
    density_type="simple",
    hue="LocationID",
    legend=True,
    palette="dark:gray",
)
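
The idea of a single 50th percentile contour can be worked through numerically. The sketch below uses made-up points, a hand-rolled Gaussian kernel density estimate, and a hand-picked bandwidth, then finds the density value whose contour encloses the densest 50 % of the estimated mass. It illustrates the concept only and is not the computation `Soundscapy` or `seaborn` performs internally.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up ISOPleasant / ISOEventful responses clustered in the calm quadrant.
pts = rng.normal(loc=[0.4, -0.2], scale=0.2, size=(200, 2))

# Evaluate a simple Gaussian kernel density estimate on a grid over the circumplex.
grid = np.linspace(-1, 1, 101)
gx, gy = np.meshgrid(grid, grid)
bw = 0.1  # fixed bandwidth, chosen by hand for this sketch
d2 = (gx[..., None] - pts[:, 0]) ** 2 + (gy[..., None] - pts[:, 1]) ** 2
density = np.exp(-d2 / (2 * bw**2)).sum(axis=-1)

# Sort grid cells from densest to sparsest and accumulate their share of the mass.
sorted_density = np.sort(density.ravel())[::-1]
cumulative = np.cumsum(sorted_density) / sorted_density.sum()

# The density value at which the accumulated share first reaches 50 %.
level = sorted_density[np.searchsorted(cumulative, 0.5)]

# Check: the region at or above that level holds about half of the mass.
inside = density >= level
mass_inside = density[inside].sum() / density.sum()
print(round(float(mass_inside), 2))
```

Drawing only the contour at `level` gives one closed curve per soundscape, which is what makes several locations easy to overlay.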

As we said, this is particularly useful for comparing different soundscapes. So let's see how we can plot three different soundscapes at once.

In [31]:
sspy.plotting.density(
    isd.select_location_ids(df, ["CamdenTown", "RussellSq", "PancrasLock"]),
    title="Comparison of the soundscapes of three urban spaces\n\n",
    hue="LocationID",
    density_type="simple",
    incl_scatter=False,
    palette="husl",
)

### Jointplot

In [32]:
sspy.plotting.jointplot(
        isd.select_location_ids(df, ["CamdenTown", "RussellSq"]),
        hue="LocationID", marginal_kind="kde", density_type="full")

### Using Adjusted Angles

In Aletta et al. (2024), we propose a method for adjusting the angles of the circumplex to better represent the perceptual space. These adjusted angles are derived for each language separately, meaning that, once projected, the circumplex coordinates will be comparable across all languages. This ability and the derived angles have been incorporated into `Soundscapy`.

In [33]:
from soundscapy.utils.parameters import LANGUAGE_ANGLES
df = sspy.surveys.add_iso_coords(df, angles=LANGUAGE_ANGLES['eng'], names=("AdjustedPleasant", "AdjustedEventful"))

sspy.plotting.jointplot(
        isd.select_location_ids(df, ["CamdenTown", "RussellSq"]),
        x="AdjustedPleasant", y="AdjustedEventful",
        hue="LocationID", marginal_kind="kde", density_type="full")
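
To show how a set of angles enters the projection, the sketch below generalises the equal-angle formula: each PAQ is weighted by the cosine (for pleasantness) or sine (for eventfulness) of its angle, and the sum is divided by half the response range times the sum of the absolute weights. This form is an assumption made for illustration, chosen because it reduces to the ISO formula for equally spaced angles. The `EXAMPLE_ADJUSTED` angles are made up and are not the published values in `LANGUAGE_ANGLES`.

```python
import math

PAQ_ORDER = ["pleasant", "vibrant", "eventful", "chaotic",
             "annoying", "monotonous", "uneventful", "calm"]
EQUAL_ANGLES = (0, 45, 90, 135, 180, 225, 270, 315)
EXAMPLE_ADJUSTED = (0, 50, 95, 140, 180, 230, 270, 320)  # hypothetical angles


def project(paq, angles=EQUAL_ANGLES, val_range=(1, 5)):
    """Project PAQ responses using one angle (in degrees) per attribute."""
    lo, hi = val_range
    rad = [math.radians(a) for a in angles]
    values = [paq[name] for name in PAQ_ORDER]
    half_range = (hi - lo) / 2
    x = sum(math.cos(t) * v for t, v in zip(rad, values))
    y = sum(math.sin(t) * v for t, v in zip(rad, values))
    x /= half_range * sum(abs(math.cos(t)) for t in rad)
    y /= half_range * sum(abs(math.sin(t)) for t in rad)
    return x, y


response = {"pleasant": 5, "vibrant": 5, "eventful": 3, "chaotic": 1,
            "annoying": 1, "monotonous": 1, "uneventful": 3, "calm": 5}

equal_xy = project(response)                         # equally spaced angles
adjusted_xy = project(response, EXAMPLE_ADJUSTED)    # made-up adjusted angles
print(equal_xy, adjusted_xy)
```

With equally spaced angles the response lands at the right-hand edge of the circumplex, matching the standard projection; with other angles the same response moves slightly.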
