# Installing the Clarity tools and Interacting with Metadata

This first tutorial walks through the process of installing the Clarity package and then using it to download and interact with some sample data.

### Download and install the Clarity package

The clarity enhancement challenge tools can be found at the Clarity <a href="https://github.com/claritychallenge/clarity">GitHub</a> site.

They can be downloaded into the notebook environment using `git clone`.

In [None]:
print("Cloning git repo...")

!git clone --quiet https://github.com/claritychallenge/clarity.git

This will have made a directory called <code>clarity</code> storing the package code.

The repository can now be installed as a Python package using pip.

In [None]:
print("Changing directory...")
%cd clarity
print("Installing Clarity tools")
%pip install -e .
import os
import sys

sys.path.append(os.getcwd())
print("Moving back to project root directory")
%cd ..

If you run `%pip list`, then `clarity` should now appear in the alphabetical list of packages.

In [None]:
%pip list

### Obtaining the sample data

In order to demonstrate basic functionality, a smaller demo dataset is available through the <code>clarity.data.demo_data</code> module. Running the following functions downloads different components of the datasets:

  - <code>get_metadata_demo()</code>
  - <code>get_targets_demo()</code>
  - <code>get_interferers_demo()</code>
  - <code>get_rooms_demo()</code>
  - <code>get_scenes_demo()</code>

For this demonstration we will download just the `metadata` dataset.


In [None]:
from clarity.data import demo_data

demo_data.get_metadata_demo()

This will have created a directory called `clarity_data` containing the metadata files that have been downloaded.
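
To check what was actually downloaded, the directory can be listed from Python. The following is a minimal sketch: the helper name `list_json_files` is introduced here for illustration and is not part of the Clarity package, and it assumes only that the metadata is stored as `.json` files somewhere under `clarity_data`.

```python
from pathlib import Path


def list_json_files(root="clarity_data"):
    """Return the sorted relative paths of all .json files found under `root`.

    Hypothetical helper for illustration; not part of the Clarity package.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing downloaded yet, or the notebook is in a different directory
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))


for name in list_json_files():
    print(name)
```

If nothing is printed, the download has probably not run or the notebook's working directory is not the one that contains `clarity_data`.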

---
### The structure of the metadata files 

There are four metadata files:

- `rooms` - geometry of the rooms used for the simulations
- `scenes` - information about the sound scene that is playing in the room
- `listeners` - audiometric data for the hearing-impaired listeners who will listen to the scenes
- `scenes_listeners` - a mapping assigning specific listeners to specific scenes (in the evaluation, each scene will be listened to by three separate listeners)

Information about *individual* rooms, scenes, listeners, etc. is stored as a dictionary. The complete collections are then stored as either a list or a dict, depending on how the collection is most conveniently indexed. The data structure of each of the four collections is summarized below.


| Dataset | Structure | Index |
| --- | --- | --- |
| `rooms` | list of dicts | int |
| `scenes` | list of dicts | int |
| `listeners` | dict of dicts | LISTENER_ID |
| `scenes_listeners` | dict of lists | SCENE_ID |
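
The difference between the two container shapes can be illustrated with a small sketch. The helper `describe_collection` and the toy values below (including the IDs) are made up for illustration and are not taken from the Clarity data; only the list-of-dicts and dict-of-lists shapes follow the description above.

```python
def describe_collection(collection):
    """Return (container type, item type, index type) for a metadata collection.

    Hypothetical helper for illustration; not part of the Clarity package.
    """
    if isinstance(collection, dict):
        first_key = next(iter(collection), None)
        first_item = collection[first_key] if first_key is not None else None
        return ("dict", type(first_item).__name__, type(first_key).__name__)
    if isinstance(collection, list):
        first_item = collection[0] if collection else None
        return ("list", type(first_item).__name__, "int")
    raise TypeError("expected a list or a dict")


# Toy data with invented IDs and values, for illustration only
toy_scenes = [{"scene": "S0001", "SNR": 0.0}]
toy_scenes_listeners = {"S0001": ["L0001", "L0002", "L0003"]}

print(describe_collection(toy_scenes))            # ('list', 'dict', 'int')
print(describe_collection(toy_scenes_listeners))  # ('dict', 'list', 'str')
```

A list is indexed by integer position, whereas a dict is indexed by an ID string, so the lookup syntax is the same but the index differs.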



### Reading the metadata files

The Clarity metadata is stored in JSON format. Python's built-in `json` module reads JSON files and parses them into Python objects.

This is demonstrated in the cell below.

In [None]:
import json

with open("clarity_data/demo/metadata/scenes.demo.json") as f:
    scenes = json.load(f)

with open("clarity_data/demo/metadata/rooms.demo.json") as f:
    rooms = json.load(f)

with open("clarity_data/demo/metadata/listeners.json") as f:
    listeners = json.load(f)

with open("clarity_data/demo/metadata/scenes_listeners.dev.json") as f:
    scenes_listeners = json.load(f)
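
If the same loading pattern is needed in several places, it could be wrapped in a small function. This is only a sketch: `load_metadata` is a hypothetical helper, not something provided by the Clarity package, and it assumes that every file sits in a single directory.

```python
import json
from pathlib import Path


def load_metadata(metadata_dir, filenames):
    """Load several JSON files into a dict keyed by a short name.

    `filenames` maps a short name (e.g. "scenes") to a file name inside
    `metadata_dir`. Hypothetical helper; not part of the Clarity package.
    """
    metadata_dir = Path(metadata_dir)
    loaded = {}
    for name, filename in filenames.items():
        with open(metadata_dir / filename, encoding="utf-8") as f:
            loaded[name] = json.load(f)
    return loaded
```

For example, `load_metadata("clarity_data/demo/metadata", {"scenes": "scenes.demo.json", "rooms": "rooms.demo.json"})` would return a dict with the keys `scenes` and `rooms`.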

---
### Working with the metadata

Once the data is loaded, we can access the dictionary for an individual item using the item's index, and then we can access the parameters of that item using the dictionary's keys.

For example, we will retrieve the information about the first scene and then use the `keys` method to see what scene parameters are available.

In [None]:
scene_0 = scenes[0]

print(scene_0.keys())

We can see that the scene has the following keys:

- dataset
- room
- scene
- target
- duration
- interferers
- SNR
- listener

To find out the SNR of this scene we can simply use the `SNR` key, i.e., `scene_0['SNR']`

In [None]:
print(scene_0["SNR"])

### Processing a collection of scenes

We can then run processes over the complete list of scenes using standard Python list processing idioms.

So for example, in the code below we extract the SNR from each scene and plot a histogram of this set of SNRs. We can then use the `interferers` field to separate scenes according to whether they have either two or three interferers and compare the range of SNRs for each type.

In [None]:
import numpy as np
from matplotlib import pyplot as plt

fig, ax = plt.subplots(1, 2)

# Get list of SNRs of scenes
snr_values = np.array([s["SNR"] for s in scenes], dtype="float32")

# Plot histogram
ax[0].hist(snr_values)
ax[0].set_title("Histogram of SNR values")
ax[0].set_xlabel("SNR (dB)")

# Get list of number of interferers in scenes
n_interferers = np.array([len(s["interferers"]) for s in scenes], dtype="int32")

# Prepare data for boxplot
snr_comparison_data = [
    [s for s, n in zip(snr_values, n_interferers) if n == 2],
    [s for s, n in zip(snr_values, n_interferers) if n == 3],
]

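# Plot boxplot (pass the list of groups directly; the groups may differ in length)
ax[1].boxplot(snr_comparison_data)
ax[1].set_xticklabels(["2", "3"])  # label each box with its interferer count
ax[1].set_xlabel("Number of interferers")
ax[1].set_ylabel("SNR (dB)")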

plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.4)

fig.show()
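
The same grouping can also be summarised numerically rather than graphically. The sketch below uses only the standard library; the function name `snr_summary_by_interferers` and the toy scenes are invented for illustration, and it assumes, as in the cell above, that each scene has an `SNR` value and an `interferers` list.

```python
from collections import defaultdict
from statistics import mean


def snr_summary_by_interferers(scenes):
    """Group scene SNRs by number of interferers and summarise each group.

    Hypothetical helper for illustration; not part of the Clarity package.
    """
    groups = defaultdict(list)
    for scene in scenes:
        groups[len(scene["interferers"])].append(float(scene["SNR"]))
    return {
        n: {"count": len(v), "min": min(v), "mean": mean(v), "max": max(v)}
        for n, v in sorted(groups.items())
    }


# Made-up scenes for illustration only; real scenes carry more fields
toy_scenes = [
    {"SNR": -4.0, "interferers": [{}, {}]},
    {"SNR": 2.0, "interferers": [{}, {}]},
    {"SNR": 5.0, "interferers": [{}, {}, {}]},
]

for n, stats in snr_summary_by_interferers(toy_scenes).items():
    print(n, stats)
```

Passing the real `scenes` list instead of `toy_scenes` should give one summary row per interferer count found in the data.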

---
## Associations between metadata types

There are various associations between the metadata types, which sometimes require cross-referencing from one collection to another.

For example, room dimensions are stored in the room dict rather than directly in the scene dict. So to get the room dimensions for a given scene, you need to first look at the room ID field in the scene to find the correct room.

One approach to doing this is shown below.


In [None]:
room_id = scene_0["room"]

# Iterate through rooms to find the one named `room_id`

room = next((item for item in rooms if item["name"] == room_id), None)

print(room["dimensions"])

This approach uses a linear search and is therefore not very efficient. If you are going to be doing this often you might want to convert the list of rooms into a dictionary indexed by room ID, e.g.

In [None]:
room_dict = {room["name"]: room for room in rooms}

You can now look up the dimensions of a scene's room more efficiently:

In [None]:
room_id = scene_0["room"]
room = room_dict[room_id]  # constant-time lookup by room ID
print(room["dimensions"])

### Example: Locating information about the scene's listener

We will now use these ideas to plot the audiograms of the listeners associated with a specific scene. The code also prints out some information about the room dimensions and the listener location, both of which are stored in the scene's associated room dict.

In [None]:
scene_no = 32  # this is just an arbitrary index. try any from 0 - 49

scene = scenes[scene_no]

room = room_dict[scene["room"]]
current_listeners = scenes_listeners[scene["scene"]]


print(
    f'\nScene number {scene_no}  (ID {scene["scene"]}) has room dimensions of {room["dimensions"]}'
)

print(
    f'\nSimulated listeners for scene {scene_no} have spatial attributes: \n{room["listener"]}'
)

print(f'\nAudiograms for listeners in Scene ID {scene["scene"]}')


fig, ax = plt.subplots(1, len(current_listeners))

ax[0].set_ylabel("Hearing level (dB)")
for i, l in enumerate(current_listeners):
    listener_data = listeners[l]
    (left_ag,) = ax[i].plot(
        listener_data["audiogram_cfs"],
        -np.array(listener_data["audiogram_levels_l"]),
        label="left audiogram",
    )
    (right_ag,) = ax[i].plot(
        listener_data["audiogram_cfs"],
        -np.array(listener_data["audiogram_levels_r"]),
        label="right audiogram",
    )
    ax[i].set_title(f"Listener {l}")
    ax[i].set_xlabel("Hz")
    ax[i].set_ylim([-100, 10])

plt.legend(handles=[left_ag, right_ag])
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.4)
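
The scene-to-listener mapping can also be inverted when the question is which scenes a given listener has been assigned to. The following is a sketch under the assumption, consistent with the lookup above, that `scenes_listeners` maps each scene ID to a list of listener IDs; the function name `scenes_by_listener` and the toy IDs are invented for illustration.

```python
from collections import defaultdict


def scenes_by_listener(scenes_listeners):
    """Invert {scene_id: [listener_id, ...]} into {listener_id: [scene_id, ...]}.

    Hypothetical helper for illustration; not part of the Clarity package.
    """
    index = defaultdict(list)
    for scene_id, listener_ids in scenes_listeners.items():
        for listener_id in listener_ids:
            index[listener_id].append(scene_id)
    return dict(index)


# Toy mapping with invented IDs, for illustration only
toy_scenes_listeners = {"S1": ["LA", "LB"], "S2": ["LB"]}

print(scenes_by_listener(toy_scenes_listeners))
# {'LA': ['S1'], 'LB': ['S1', 'S2']}
```

Building this index once avoids scanning every scene each time a listener is looked up.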

We hope this tutorial has been useful. We will be releasing future tutorials demonstrating how to build other useful visualisations of the metadata. If you have any feedback or questions please feel free to contact us. Contact details are available on the Clarity project websites.

