# Loudness Level Labeling with the Dictionary Approach

Given an automatically annotated corpus of sound events, the annotations are first cleaned of frequent false positives by string matching.
The remaining annotations are then extracted and written to a CSV file containing the sound event spans, their sound classification, and the file name.
The CSV is opened as a pandas dataframe, in which a copy of each sound event span is lemmatized and lowercased in preparation for the matching algorithm. This algorithm detects sound words that have a loudness level assigned to them in the loudness level dictionary.
In a final step, the loudness values of the matched sound words are averaged per sound event, and the average loudness level is written back into the revised TEI XML files that were cleaned at the beginning of the script.
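
The matching and averaging step described above can be sketched as follows. This is only a minimal illustration: the dictionary entries, the loudness scale, and the function name `average_loudness` are assumptions made for the example and are not taken from the actual loudness level dictionary.

```python
from statistics import mean

# Hypothetical loudness dictionary (lemma -> loudness level); values are illustrative only
loudness_dict = {"whisper": 1, "say": 3, "shout": 5, "thunder": 5}

def average_loudness(lemmatized_span, dictionary):
    """Return the mean loudness of all dictionary words in the span, or None if nothing matches."""
    values = [dictionary[token] for token in lemmatized_span.split() if token in dictionary]
    return mean(values) if values else None

# Two sound words match (5 and 1), so their average is returned
print(average_loudness("he shout and then whisper", loudness_dict))
# No sound word matches, so the span stays unlabeled
print(average_loudness("the door open", loudness_dict))
```

Returning `None` for spans without a match keeps unlabeled sound events distinguishable from events with a genuinely low loudness level.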

In [189]:
import os
import csv
import pandas as pd
import regex as re
from pathlib import Path
from collections import Counter
import xml.etree.ElementTree as ET
import spacy
# Load the small English model (en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

In [190]:
#Define the path to the input folder and the name and path of the output file

folder_path = '/Users/sguhr/Desktop/Arbeitslaptop/TU_Darmstadt/2024:25/Vortragstournée/Erlangen/for_loudness_dic_anno'
output_file = '/Users/sguhr/Desktop/Arbeitslaptop/TU_Darmstadt/2024:25/Vortragstournée/Erlangen/for_loudness_dic_anno/20250520_Dickens_corpus_predicted.csv'

## Automated Revision of frequent false positive Annotations

The following cell is very long because it collects the false positives noticed while revising the sound events annotated by the round 2 model. The list was extended with a new string every time another false positive case turned up, so it kept growing.

In [191]:
# A very long list of sample false positives from the revisions of the round 2 model annotation output. The more false positives it matches, the better.

strings = ['', 'this not said', 'father', 'mother', 'the beautiful lips', 'her hand', 'her', 'the official was', 'gone', "the Gensd'armen Schmidt", 'old dung cart', 'him standing still', 'she waited for no answer', 'his room', 'a right', 'only a little too small human room', 'one above the same', 'like', 'the complicated lock open', 'that he cause with it', 'but', 'like a big', 'from all side', 'at the disc', 'mother-', 'in the but', 'like from below', 'one not to suppress', 'but limit themselves to this circumstance', 'but with the fist', 'endeavour', 'caught up in some nonsensical hope', 'they', 'and', 'and have them', 'and have nothing', 'again and again', 'if now the mother', "pointing to gregor's room", 'and', "despite the mother's imploringly raised hand", 'the sister', 'with', 'and', 'the father seems to think it more necessary --', 'instead of expelling Gregor', "meanwhile the sister's lostness", 'overcome', "which the master of the room quickly approached under the father's urging", 'who nods his head strongly as if unsteadily', 'and one', 'give himself', 'to the mother', 'and in a fright that is completely incomprehensible to gregor, leave the sister even to the mother', 'and one', 'grete', 'mr samsa merely nods to him briefly several times with his big eye', 'pull your sticks out of the cane container', 'bowed', 'while writing the waitress comes in --', 'fend him off with outstretched hand', 'remember the big one', 'you have', 'lean back comfortably on your seat', 'of the prospect for the future', 'you', 'steaming like a little pot', 'and', 'and have you', 'without knocking', 'and have nothing', 'again and again', 'without knocking', 'and he also heard', 'as first,', 'and', 'but', 'from', 'but the', 'but he', 'but she', 'all', 'all', 'as', 'as', 'and', 'but', 'am', 'am', 'an', 'an', 'old', 'Adelens', 'Adele', 'Agnes,', 'as Alice', 'as this', 'as he', 'as she', 'as these', 'as me', 'as man', 'as one', 'as she', 'as he', 'Ameile', ',', '.', ';', ':', '"', 
'"', '-', '-', 'Dagobert shrugged his shoulders in a very perfidious manner', 'and then raised me', 'and now he favoured', "and later returned the host's questioning look with a shrug", 'The priest made a convulsive movement', 'my heartbeat faltered', 'He dismissed her with a slight gesture', 'from whose countenance the joy laughed', 'To telegraph a brief word to your wife of your detention', 'and greeted me with both arms', 'But when the two spouses now exchanged a welcome kiss', 'All at once the words passed through my mind in the same tone as I had heard them', 'He nodded', 'shrugged his shoulders', 'Every ten paces someone greeted him', 'He', 'but I nodded to him', 'about the clergy of the pious city', 'With a jerk I pushed the bourgeois away', 'had just', 'Now she let me go with a curtsey', 'Only on the way did she become a little calmer', 'Du bedarfst der Ruhe', 'jähem Ruck', 'ich greette emporte', 'Ein leichtes resedenes Kleid umschmiete sie wie zartes Gewölk', 'Ich nodt schämig lächeln', 'und lächeln höhnisch', 'und zuckncken mit den Aucktern', 'einem Ruck', 'Ich holen die Laute', 'Sie nodte', 'sie nahm die Laute', 'Lips of the beautiful woman', "but he'll tell her tomorrow", 'Since then Doctor Matthias has not teased his sister in a hurtful way', 'A part of him meant', 'spoke English and French', 'like German', "Of course she didn't talk about it", "so she didn't ask about it either", 'The fit of weakness had just been overcome', 'He stood still', 'He stood still', 'He nodded absentmindedly', 'The Doctor suddenly stumbled at the sight', 'He raised her very calmly', 'The Doctor nodded his head affirmatively', 'The Doctor smiled at the childish certainty', 'Then came the reproaches of his conscience with regard to Mrs von Dahlhorst', 'You falter', 'The young man shrugged his shoulders regretfully', 'She faltered', 'But the question weighed heavily on his heart', 'But then she thought she noticed', 'At last he stood still', 'which was clearly expressed in her 
whole demeanour and in the large and tightly open eye', 'Controlled by her deep, strictly concealed tenderness, she stood still at the threshold', 'He stood still again', 'and hesitated', 'to speak any bad or good word', 'Then Elisabeth stumbled', 'He closed her mouth with a kiss', 'A silent musician', 'He nodded eagerly', 'and he nodded kindly', 'Man', 'He nodded', 'even', '"', 'Thoughts her voice with', 'I threatened her with my finger', 'and', 'as if the great master nodded to me', 'They had', 'But I calmed myself', 'He nodded', '"', 'But', 'but friend Valentin had allowed himself a small change this time', 'for old Asmus spoke in that poem only of his own recovery', 'Eyes', 'He nodded eagerly', 'Once again through a mutual acquaintance I received a greeting from Valentin', "behind which the silent musician's had gradually disappeared completely", 'the old gentleman', 'on my side nodded his head more and more emphatically', 'Pretty voice', 'And with a polite movement turned to me', 'Though she was talking about herself then', 'She nodded smilingly', 'and', 'Say', 'and crawled sleepily', 'But', 'breathing', 'wasting away', 'The blacksmith nodded', 'He stood quite still', 'Schibes waved approvingly and with regret', 'The blacksmith stood still', 'She nodded', 'He marvelled', 'She nodded', "He didn't say it", 'With an abrupt jerk', 'When he greeted', 'He shivered covered in sweat', "She didn't ask", "She didn't cry", 'wagged like mad', 'away -', 'happiness', 'and agonised', 'Jan greeted from afar with his hand, so naturally', 'prancing mare', 'ridiculed', 'please', 'and bite', 'and', 'he wagged', 'rushing abruptly through the hut', 'The old man calmly pushed his fur cap', 'he did', 'he nodded', 'she shrugged her shoulders', 'now stretched', 'lady', 'mole', 'in passing only the wireless operator had saluted the deck officer', 'the young lady nodded', 'and', 'still hesitating', 'he approached his brother', 'if', 'and lies there quite still', 'but also lies purely 
still', 'Uncle Pökel nodded', 'Mrs Sellentin heard that', 'Petiskussen agreed', 'and', 'took a small sip', 'Helene listened intently', 'names', 'musical Helene quivered pleasantly', 'if I', 'for she began to suspect the meaning of his speech', 'she had to answer', 'and', 'the Professor bowed gracefully', 'Mrs', 'Helene breathed a sigh of relief', 'this last question could at least be answered in general', 'fatherly house', 'and frowning', 'I telegraphed', 'So I simply telegraphed back to this professor', 'I telegraphed to Leipzig to her husband', 'And an inner voice answered', 'Bowing frostily', 'Without answering the professor', 'and', 'to reassure her', 'Without responding to this remark', 'that he smirked at her', 'Emma smilingly threatened with a raised finger', 'she then began with two dainty bows', 'quietly', 'you', 'on', 'to Emma', 'And at this the poor singer', 'and listened', 'The made-up and powdered Excellency', 'never a loud word, a joke, a laugh escaped her', 'especially', 'and', 'and then', 'the', 'inexplicable', 'so', 'but nobody heard me', 'the orchestra tired', 'and continued on their way', 'too moved', 'to speak a loud word', 'the', 'swayed', 'I wrestled', 'The lamentable husband and father was close to despair', 'child on a still', 'one', 'The dying heard her', 'no sound was heard', 'A battle seemed to rage within her', 'The children had little bells, small animals, houses, parlours', 'Later we heard nothing more of her', 'since I heard you so', 'they were whistling a lot', 'that', 'it scooped', 'even a maid', 'when', 'it burned', 'still', 'was called', 'the tumult', 'the second time', 'morning after their departure', 'he', 'the uncanny one looked at him calmly', 'one involuntarily', 'a decision ripened in him, calm and cold', 'Baron Benzing sat quietly again', 'as at', 'she was again', 'that', 'it', 'I paused in my walk', 'Therese shares my apprehension', 'and one feels the roaring and groaning in', 'The cunning-looking boy', 'Athem', 'The', 
'but was one without any objection', 'and sat down without thanking me', 'And iron', 'thundering rebuke', 'words', 'called', 'curse,', 'help to', 'between', 'there', 'house', 'she', 'and', 'the', 'the', 'that', 'one', 'one', 'these', 'this', 'today he', 'the two old men', 'but so often', 'what the guest', 'of death in the sounds of her country, her parents, her childhood, her happiness', 'him', 'her', 'in', 'there', 'the girls greeted bashfully again', 'and swayed her head to and fro', 'impatiently she shook off his hand', 'then she opened her lips with a contemptuous twitch', 'and the flowered, blond and brown heads bowed in shy greeting', 'a serious handshake', 'who greeted him', 'they shrugged their shoulders', 'they greeted each other discreetly', 'to shake hands with her', 'the cute little girl beckoned to her', 'with an unbearable tremor in her knees the girl went back to her seat', 'her fingers trembled', 'she waved', 'how the girl trembled all over', 'Eugenie kissed Agathe passionately', 'was greeted by him', 'Mum suddenly beckoned to her', 'But the little maid shook her head', 'while his scrawny long hand greeted her warmly', 'The movement with which he greeted', 'A violent, prolonged tremor ran through her body', 'and looked up with open, trembling lips', 'to greet her', 'by kissing her mother', 'and did not greet her', 'his broad shoulders shrugged', 'and he wanted to pass her with a hurried greeting', 'gave him his hand', 'he shook her hand very warmly', 'with trembling knees she went to the door', 'her hand was shaken', 'and nodded her head', 'He shook his head', 'and kissed him on the forehead', 'Kissed him on the forehead', 'She also shook hands with Daniel - quite mechanically', 'Her mouth began to tremble', 'Agathe bowed her head', 'He kissed her on the forehead', 'and shook hands with his colleague', 'Agathe rose trembling', 'She trembled more', 'The old medical officer received a nod of the head', 'Raikendorf gave her his hand with a tenderly 
hesitant squeeze', 'He squeezed her hand', 'Like a sensitive she trembled under his sharp eyes', 'And triumphantly she had greeted and waved all around', 'Flush elegance greeted', 'Flametti winked', 'swallowed the leftover food', 'and Mary had yawned', 'and very amusedly returned the signs of the snake-man indicating with his head', 'and shook her raised hand dismissively', 'and shook her blouse tremblingly', 'both nodded, Mrs Häsli so hastily', 'waved Mrs Häsli away', 'and shook hands with Flametti', 'Lena shook her head', 'He gestured with his head towards the two gently walking officials', 'nodded his head in a suave and witty manner', 'and swayed his head', 'he', 'shook his head', 'But then he shook his head in disapproval', 'Mechmed shook his head absorbedly', 'waved his head', 'and stretched his hand across the table to Mr Schnabel', 'invited Flametti with a quick, deft gesture', 'and Flametti shook hands with Mr Rotter in Indian', 'Raffaëla shook her head at such incredulity', 'she smiled, shaking her head', 'greeted Mr Farolyi from the Donna Maria Josefa circus expertly with an outstretched hand', 'and introduced the pianist with a sideways hand gesture', 'Jenny waved to Mother Dudlinger', 'She waved to us', 'but shook her head', 'who now benevolently offered me her hand in farewell', 'before he withdrew with a friendly, equanimous nod of his head', 'for he only shook my hand quickly', 'we waved to each other', 'while Fritz trembled like an aspen leaf in his thin, smoke-blackened clothes', 'that the poor boy, shivering with cold and the fear of death he had overcome, belonged here in the warm nest', 'and she shook hands with Irene', 'Irene shook her head', 'Wilhelmine shook hands with her guests again', 'and shook hands with the strange gentleman', 'she trembled', 'and shook hands with him', 'he kissed her forehead with a smile', 'all the time he was trembling under the canapé', 'for a quarter of an hour he shook his head slowly', 'Elvira waved to the 
waitress', 'Hildegard shook her head', 'from which a lady nodded graciously to Elvira', 'The teacher bowed low', 'Elvira shrugged her shoulders', 'Hildegard shook her head', 'Then she held out her hand to Elviren', 'Elvira cradled her head', 'Elvira rubbed her hands with pleasure', 'Fräulein Schulze made a gesture with her hand', 'and beckoned a hansom cab', 'took her hand', 'He swayed his head thoughtfully', 'trembled', 'and shook my hand', 'Cheerfully he shook my hand', 'and still waved to me', 'But my visitor shook his head', 'and shook my hand', 'he shook his head a little', 'Professor Müller shook his head', 'and Roland shook his head', 'I raised my shoulders', 'Miss Mason shook her head at the activity in the clinic', 'she turned to me, shaking her head', 'and Walter waved to Fred', 'and lowered his head to one side', 'He also made a defensive gesture with his hand', 'Then he grabbed my hand', 'kissed it', 'Fabianen nodded her head vigorously', 'And she held out her hand to Fabianen', 'and shook her shoulder', 'The forester swayed his head apprehensively', 'the more he shook his head', 'he trembled to his innermost pores', 'shy and trembling, she stood a full step away from the table', 'shook his little head and tails', 'waved and nodded to the chiffonier', 'the fresh morning wind blew the dew from the trembling stalks', 'Then she shook her head vigorously', 'But one afternoon she had returned my greeting smile so cheerfully', 'and she smiled confidently, proudly', 'Again I had to smile', 'but she still shook her head', 'so we greeted each other with the recognising smile', 'She shrugged her shoulders with her usual amiable smile', 'again she squeezed my hand', 'and smiled earnestly', 'and we squeezed hands once more', 'shook my hand and was gone', 'the clerk shook his head', 'And heartily he returned the pressure', 'I took my vow with enthusiasm', 'and my hand trembled', 'Then he shook his head', 'she shook her head', 'my heart trembled', 'Kohn listened 
shaking his head', 'With these words he put his hat on his head', 'to kiss her hand', 'Mum, who waved her handkerchief from the stone staircase', 'and gave him a kiss on the left cheek', 'yes, she kissed her favourite on the forehead to seal the agreement', 'Effi waved her handkerchief', 'and kissed Effi', 'and kissed both her hands', 'greeted Effi from the coupe', 'and shook hands with Innstetten', 'Effi shook hands with the embarrassed entrant', 'which he kissed with a certain impetuosity', 'Effi shook his hand', 'after kissing his hand again', 'Innstetten took her hand', 'shook his head', 'Kruse swayed his head to and fro', 'a tremor came over her', 'and covered her with hot kisses', 'and trembled', 'and she reached out her hand to him', 'and her whole delicate body trembled', 'Another handshake', 'She trembled with excitement', 'Effi shook her head', 'Effi shook her head', 'At the same time Crampas indicated with a movement of his hand', 'and kissed his hand', 'and gave her hand', 'Roswitha gave the child a kiss', 'and gave him a kiss on the forehead', 'and gave her a kiss on the forehead', 'Wiesike swayed his head slowly back and forth', 'shook his head slowly back and forth', 'Bernhard and Reinhold shook hands with delight and understanding', 'Deeply moved, Reinhold shook hands with his friend', 'The grey horse shook his head', 'shook his head', 'After staring for a long time, he nodded his head slowly', 'The old man shook his head', 'Elke shook her head', 'But she shook her head', 'but he turned his head away', 'But she only shook her head', 'She shook her head', 'But the old man shook his head', 'But the head dyke count held out his hand to the girl', 'She shook his hand', 'Hauke shook his head', 'after the boy had insistently offered him his hand', 'The boy shook his head', 'but she shook her head', 'Hauke shook his head', 'and went out of the room nodding his head', 'and as the child lowered her head as if nodding', 'she began to shake her head', 'but 
Hauke shook his head', 'he shook his head', 'who waved his hand', "That's", 'I', 'when', 'had', 'again', 'before the', 'hotel door', 'Crisparkle, with', 'shaking his head again', 'page', 'at length', 'Jasper', 'place', 'Crisparkle bowed again', 'to the', 'himself', 'She', 'Miss', 'by', 'air', 'Good night', 'at the door', 'Neville', 'face of', 'Landless', 'Thank', 'the double example', 'Mr.', 'Verger', 'daily', 'service', 'until', 'then', 'Thus', 'looking on', 'time', 'off at', 'very moment', 'is', 'Then', 'hideous small boy,', 'story-', 'deferential', 'leads', 'of', 'after some', 'his', "'s", 'young', 'I hear', 'too', 'This', 'broadside', 'gap', 'a', 'by the', 'Canon', 'while']





In [192]:
# Remove the sound tags around known false positives, e.g. <character_sound></character_sound>.
# Note the attribute loudness="\d": it is not part of the automatically generated annotations from NEISS NTEE,
# so tags that carry it are not matched by the pattern below.

import os
import re

def process_xml_files(folder_path, strings):
    # Iterate over each XML file in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith('.xml'):
            xml_file_path = os.path.join(folder_path, filename)
            # Read the XML file as plain text
            with open(xml_file_path, 'r', encoding='utf-8') as file:
                text = file.read()

            # Iterate over the false positive strings (duplicates removed, order kept)
            for string in dict.fromkeys(strings):
                # Match the string wrapped in a sound tag; the closing tag must match the opening tag
                regex_pattern = fr'<(?P<tag>ambient|character)_sound>\s*{re.escape(string)}\s*</(?P=tag)_sound>'

                # Remove the tags but keep the text; the lambda inserts the string literally
                text = re.sub(regex_pattern, lambda match: string, text)

            # Write the modified text back to the file
            with open(xml_file_path, 'w', encoding='utf-8') as file:
                file.write(text)

# Execute the function
process_xml_files(folder_path, strings)

In [193]:
print("Frequent false positive cleaning finished.")

Frequent false positive cleaning finished.
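
To check the cleaning step without touching any files, the same idea can be tried on an in-memory string. The sketch below is an illustration under assumptions: the helper name `strip_false_positive_tags` and the sample sentence are invented for the example.

```python
import re

def strip_false_positive_tags(text, false_positives):
    """Remove the sound tags around known false positives but keep the text itself."""
    for phrase in false_positives:
        # The backreference \1 makes sure the closing tag matches the opening tag
        pattern = r'<(ambient|character)_sound>\s*' + re.escape(phrase) + r'\s*</\1_sound>'
        # The lambda inserts the phrase literally, without interpreting backslashes
        text = re.sub(pattern, lambda match, p=phrase: p, text)
    return text

# Invented sample with one false positive and one genuine sound event
sample = ('<p><character_sound>He nodded</character_sound> and '
          '<character_sound>said Scrooge</character_sound>.</p>')
cleaned = strip_false_positive_tags(sample, ['He nodded'])
print(cleaned)
```

Only the tags around the listed false positive are removed; the genuine annotation stays in place.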


## Sound Event Span Extraction

In [194]:
# This code extracts the sound event spans and stores them in two separate lists according to their ambient or character sound classification.

#import os
#import xml.etree.ElementTree as ET

# The following functions create empty lists, iterate over the XML elements, and collect the text content of the
# ambient_sound and character_sound elements per XML file using list.append and list.extend.

def extract_sound_spans(xml_content):
    ambient_sound_spans = []
    character_sound_spans = []
    root = ET.fromstring(xml_content)

    for elem in root.iter():
        # itertext() also collects text inside nested elements and, unlike elem.text,
        # never yields None for empty elements
        if elem.tag.endswith('ambient_sound'):
            ambient_sound_spans.append(''.join(elem.itertext()).strip())
        elif elem.tag.endswith('character_sound'):
            character_sound_spans.append(''.join(elem.itertext()).strip())
    return ambient_sound_spans, character_sound_spans

def process_xml_file(filepath):
    ambient_sound_spans_list = []
    character_sound_spans_list = []
    with open(filepath, 'r', encoding='utf-8') as file:
        xml_content = file.read()
        ambient_sound_spans, character_sound_spans = extract_sound_spans(xml_content)
        ambient_sound_spans_list.extend(ambient_sound_spans)
        character_sound_spans_list.extend(character_sound_spans)
    return ambient_sound_spans_list, character_sound_spans_list

def process_folder(folder_path):
    sound_spans_per_file = {}
    for filename in os.listdir(folder_path):
        if filename.endswith('.xml'):
            filepath = os.path.join(folder_path, filename)
            ambient_sound_spans_list, character_sound_spans_list = process_xml_file(filepath)
            sound_spans_per_file[filename] = {'ambient_sound_spans': ambient_sound_spans_list, 
                                              'character_sound_spans': character_sound_spans_list}
    return sound_spans_per_file


sound_spans_per_file = process_folder(folder_path)

for filename, sound_spans in sound_spans_per_file.items():
    print("File:", filename)
    print("Ambient Sound Spans:", sound_spans['ambient_sound_spans'])
    print("Character Sound Spans:", sound_spans['character_sound_spans'])
    print()


File: CC_anno_man.xml
Ambient Sound Spans: ['he could hear the people in the court outside go wheezing up and down', 'beating their hands upon their breasts', 'stamping their feet upon the pavement stones to warm them', 'The City clocks had only just gone three', 'struck the hours and quarters in the clouds with tremulous vibrations afterwards', 'berries crackled in the lamp heat of the windows', 'closed it with a bang', 'The sound resounded through the house like thunder', "Every room above and every cask in the wine merchant's cellars below appeared to have a separate peal of echoes of its own", 'it scarcely made a sound', 'soon it rang out loudly', 'so did every bell in the house', 'The bells ceased', 'They were succeeded by a clanking noise deep down below', 'The cellar door flew open with a booming sound', 'then he heard the noise much louder on the floors below', 'shook its chain with such a dismal and appalling noise', 'clanked its chain so hideously in the dead silence of the n
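
The extraction logic can be tried on a small in-memory document. The following sketch makes several assumptions for illustration: the TEI namespace declaration, the nested `<hi>` element, and the sample sentences are invented and may not reflect the structure of the actual corpus files.

```python
import xml.etree.ElementTree as ET

# Invented miniature TEI document; namespace and nesting are assumptions
sample_xml = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <ambient_sound>The bells ceased</ambient_sound>
    <character_sound>said <hi>Scrooge</hi></character_sound>
  </p></body></text>
</TEI>"""

root = ET.fromstring(sample_xml)
spans = {"ambient_sound": [], "character_sound": []}
for elem in root.iter():
    # Strip a possible "{namespace}" prefix to get the local tag name
    local_name = elem.tag.rsplit("}", 1)[-1]
    if local_name in spans:
        # itertext() joins the element's own text with the text of nested elements
        spans[local_name].append("".join(elem.itertext()).strip())

print(spans)
```

Comparing the local tag name exactly is slightly stricter than `endswith`, which would also accept any tag whose name merely ends in `ambient_sound`.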

In [195]:
# Counter of the sound events separately for each class and summed up.
def count_sound_events(sound_spans_per_file):
    ambient_sound_count = 0
    character_sound_count = 0
    total_sound_count = 0
    
    for sound_spans in sound_spans_per_file.values():
        ambient_sound_count += len(sound_spans['ambient_sound_spans'])
        character_sound_count += len(sound_spans['character_sound_spans'])
    
    total_sound_count = ambient_sound_count + character_sound_count
    
    return ambient_sound_count, character_sound_count, total_sound_count

# Call the function to count sound events
ambient_sound_count, character_sound_count, total_sound_count = count_sound_events(sound_spans_per_file)

# Print the results
print("Number of Ambient Sound Events:", ambient_sound_count)
print("Number of Character Sound Events:", character_sound_count)
print("Total Number of Sound Events:", total_sound_count)


Number of Ambient Sound Events: 366
Number of Character Sound Events: 2288
Total Number of Sound Events: 2654


In [196]:
# Write the output to a CSV file
#import csv
#output_file = '/Users/sguhr/Desktop/Diss_notebooks/ner_prediction_sicherheitskopie_20240505_15h/20240501_Subcorpus_1848-55_predicted_for_loudness/20240509_sound_spans_output_Subcorpus_1848-55_predicted_test.csv'

# output_file was defined together with folder_path in the path cell at the top of the notebook

with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['File', 'Ambient Sound Spans', 'Character Sound Spans']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for filename, sound_spans in sound_spans_per_file.items():
        writer.writerow({'File': filename,
                         'Ambient Sound Spans': sound_spans['ambient_sound_spans'],
                         'Character Sound Spans': sound_spans['character_sound_spans']})

In [197]:
print("Sound event extraction finished and saved to output csv file.")

Sound event extraction finished and saved to output csv file.


## Open the saved table as a pandas dataframe

The next step is to open the csv file as a pandas data frame.

In [198]:
# Import 
#import os
#import csv
#import pandas as pd
#import regex as re
#from pathlib import Path
#from collections import Counter

#csv_file_path = '/Users/sguhr/Desktop/Diss_notebooks/test_folder_ll/20240510_test_corpus_predicted.csv'

#Indicate the path to the source csv file
csv_file_path = output_file

# Read the CSV file into a Pandas DataFrame
diss_corpus_annotations = pd.read_csv(csv_file_path)

# Display the DataFrame
print(diss_corpus_annotations.head())

              File                                Ambient Sound Spans  \
0  CC_anno_man.xml  ['he could hear the people in the court outsid...   
1  ED_enriched.xml  ['cymbals clash', 'A', 'to the incoherent jarg...   

                               Character Sound Spans  
0  ['cried a cheerful voice', 'said Scrooge', "sa...  
1  ['says this woman, in a querulous, rattling wh...  


The following function cleans the string representation of the lists that was written to the CSV file.
The code reads the CSV file, cleans the string representations of the lists in each row, and extracts the file name, ambient sound spans, and character sound spans. Finally, it prints the extracted data for verification.

In [199]:
#import csv


# Define a function to clean the string representation of lists
def clean_list_string(list_string):
    # Remove leading and trailing whitespace
    cleaned = list_string.strip()
    # Remove leading and trailing square brackets
    cleaned = cleaned.strip("[]")
    # Split the string into a list using ", " as separator (spans that themselves contain ", " are split as well)
    cleaned_list = cleaned.split(", ")
    # Remove leading and trailing quotes from each element in the list
    cleaned_list = [element.strip("'\"") for element in cleaned_list]
    return cleaned_list

# Define a function to process each row of the CSV
def process_csv_row(row):
    file_name = row['File']
    ambient_sound_spans = clean_list_string(row['Ambient Sound Spans'])
    character_sound_spans = clean_list_string(row['Character Sound Spans'])
    return file_name, ambient_sound_spans, character_sound_spans

# Read the CSV file and process each row
#csv_file_path = 'your_csv_file.csv'  # Replace 'your_csv_file.csv' with the path to your CSV file
sound_data = []
with open(csv_file_path, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        sound_data.append(process_csv_row(row))

# Print the extracted data for verification
for file_name, ambient_sound_spans, character_sound_spans in sound_data:
    print("File:", file_name)
    print("Ambient Sound Spans:", ambient_sound_spans)
    print("Character Sound Spans:", character_sound_spans)
    print()


File: CC_anno_man.xml
Ambient Sound Spans: ['he could hear the people in the court outside go wheezing up and down', 'beating their hands upon their breasts', 'stamping their feet upon the pavement stones to warm them', 'The City clocks had only just gone three', 'struck the hours and quarters in the clouds with tremulous vibrations afterwards', 'berries crackled in the lamp heat of the windows', 'closed it with a bang', 'The sound resounded through the house like thunder', "Every room above and every cask in the wine merchant's cellars below appeared to have a separate peal of echoes of its own", 'it scarcely made a sound', 'soon it rang out loudly', 'so did every bell in the house', 'The bells ceased', 'They were succeeded by a clanking noise deep down below', 'The cellar door flew open with a booming sound', 'then he heard the noise much louder on the floors below', 'shook its chain with such a dismal and appalling noise', 'clanked its chain so hideously in the dead silence of the n
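
One caveat of splitting on `", "` is that a span containing a comma followed by a space is cut into several pieces. This may be one reason why the dataframe printed further below has 2822 rows although 2654 sound events were counted above. A possible alternative, sketched here under the assumption that each CSV cell holds the `repr` of a Python list of strings, is to parse the cell with `ast.literal_eval`. The helper name `parse_list_string` and the sample cell are invented for the example.

```python
import ast

def parse_list_string(list_string):
    """Parse the repr of a Python list of strings back into a real list."""
    parsed = ast.literal_eval(list_string.strip())
    return [str(item).strip() for item in parsed]

# Invented sample cell: the second span contains a comma, the third an apostrophe
cell = "['said Scrooge', 'says this woman, in a whisper', \"it's a humbug\"]"

spans = parse_list_string(cell)
print(spans)
print("literal_eval:", len(spans), "spans")
print("split(', '):", len(cell.strip("[]").split(", ")), "pieces")
```

`ast.literal_eval` only evaluates Python literals, so it does not execute arbitrary code from the CSV file, and an empty list `[]` is parsed as an empty list rather than as one empty span.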

Distribute the data from the table to dataframe columns.

In [200]:
# prepare the data frame by distributing the data over named columns, one with the sound event span, one with the assigned sound class, one with the file name the sound event spans belong to
# import pandas as pd
# import csv

# Define a function to convert the list of spans to a DataFrame
def spans_to_dataframe(spans, annotation_class, filename):
    df = pd.DataFrame({'annotation_span': spans, 'annotation_class': annotation_class, 'filename': filename})
    return df

# Define a function to process each row of the CSV
def process_csv_row(row):
    file_name = row['File']
    ambient_sound_spans = clean_list_string(row['Ambient Sound Spans'])
    character_sound_spans = clean_list_string(row['Character Sound Spans'])
    
    # Convert spans to DataFrame
    ambient_df = spans_to_dataframe(ambient_sound_spans, 'ambient_sound', file_name)
    character_df = spans_to_dataframe(character_sound_spans, 'character_sound', file_name)
    
    return ambient_df, character_df

# Read the CSV file and process each row
#csv_file_path = 'your_csv_file.csv'  # Replace 'your_csv_file.csv' with the path to your CSV file
sound_data = []
with open(csv_file_path, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        ambient_df, character_df = process_csv_row(row)
        sound_data.append(ambient_df)
        sound_data.append(character_df)

# Concatenate the DataFrame for each row into a single DataFrame
final_df = pd.concat(sound_data, ignore_index=True)

# Print the resulting DataFrame
print(final_df)


                                        annotation_span annotation_class  \
0     he could hear the people in the court outside ...    ambient_sound   
1                beating their hands upon their breasts    ambient_sound   
2     stamping their feet upon the pavement stones t...    ambient_sound   
3              The City clocks had only just gone three    ambient_sound   
4     struck the hours and quarters in the clouds wi...    ambient_sound   
...                                                 ...              ...   
2817                                  says Mr. Datchery  character_sound   
2818     He sighs over the contemplation of its poverty  character_sound   
2819                                       he concludes  character_sound   
2820                                          he chants  character_sound   
2821                                              sings  character_sound   

             filename  
0     CC_anno_man.xml  
1     CC_anno_man.xml  
2     CC_anno_m
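
The same long-format table could also be built with pandas alone. The sketch below is an alternative under assumptions, not the method used in this notebook: it presumes that the list cells are valid Python list literals, and the sample CSV content is invented for the example.

```python
import ast
import io
import pandas as pd

# Invented sample in the same layout as the CSV written above
csv_text = '''File,Ambient Sound Spans,Character Sound Spans
a.xml,"['The bells ceased', 'closed it with a bang']","['said Scrooge']"
b.xml,"['cymbals clash']","[]"
'''

# Parse the list cells while reading the file
wide = pd.read_csv(
    io.StringIO(csv_text),
    converters={"Ambient Sound Spans": ast.literal_eval,
                "Character Sound Spans": ast.literal_eval},
)

long_df = (
    wide.rename(columns={"File": "filename",
                         "Ambient Sound Spans": "ambient_sound",
                         "Character Sound Spans": "character_sound"})
        # One row per file and class, the class name taken from the column name
        .melt(id_vars="filename", var_name="annotation_class", value_name="annotation_span")
        # One row per span; empty lists become NaN and are dropped afterwards
        .explode("annotation_span")
        .dropna(subset=["annotation_span"])
        .reset_index(drop=True)
)
print(long_df)
```

With this approach the rows are grouped by class first and by file second, so a sort on `filename` may be needed if the file-by-file order matters.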

Lemmatize and lowercase a copy of the sound event spans and save it to the column "lemmatized_sound_span".
The following code reads the CSV file again and processes each row into separate DataFrames for ambient and character sounds, which are then concatenated into a single DataFrame. Each row of the resulting DataFrame represents a single span with the columns "filename", "sound_span", "annotation_class", and "lemmatized_sound_span", so the file each sound span was extracted from stays attached to it.

In [201]:
# In the following the sound event spans get prepared for the loudness level labeling.
#import pandas as pd
#import csv
#import spacy

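# The spaCy pipeline `nlp` was already loaded in the first cell (English model 'en_core_web_sm').
# For German texts, load the German medium model instead:
#nlp = spacy.load('de_core_news_md')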

# Define a function to lemmatize and lowercase the spans
def lemmatize_spans(spans):
    lemmatized_spans = []
    for span in spans:
        doc = nlp(span)
        lemmatized_span = ' '.join([token.lemma_ for token in doc])
        lemmatized_spans.append(lemmatized_span.lower())  # Convert to lowercase
    return lemmatized_spans

# Define a function to convert the list of spans to a DataFrame
def spans_to_dataframe(original_spans, lemmatized_spans, annotation_class, filename):
    df = pd.DataFrame({'filename': filename, 'sound_span': original_spans, 'annotation_class': annotation_class, 'lemmatized_sound_span': lemmatized_spans})
    return df

# Define a function to process each row of the CSV
def process_csv_row(row):
    file_name = row['File']
    ambient_sound_spans = clean_list_string(row['Ambient Sound Spans'])
    character_sound_spans = clean_list_string(row['Character Sound Spans'])
    
    # Lemmatize the spans
    lemmatized_ambient_spans = lemmatize_spans(ambient_sound_spans)
    lemmatized_character_spans = lemmatize_spans(character_sound_spans)
    
    # Convert spans to DataFrame
    ambient_df = spans_to_dataframe(ambient_sound_spans, lemmatized_ambient_spans, 'ambient_sound', file_name)
    character_df = spans_to_dataframe(character_sound_spans, lemmatized_character_spans, 'character_sound', file_name)
    
    return ambient_df, character_df

# Read the CSV file and process each row
#csv_file_path = 'your_csv_file.csv'  # Replace 'your_csv_file.csv' with the path to your CSV file
sound_data = []
with open(csv_file_path, newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        ambient_df, character_df = process_csv_row(row)
        sound_data.append(ambient_df)
        sound_data.append(character_df)

# Concatenate the DataFrame for each row into a single DataFrame
final_df = pd.concat(sound_data, ignore_index=True)

# Print the resulting DataFrame
print(final_df)


             filename                                         sound_span  \
0     CC_anno_man.xml  he could hear the people in the court outside ...   
1     CC_anno_man.xml             beating their hands upon their breasts   
2     CC_anno_man.xml  stamping their feet upon the pavement stones t...   
3     CC_anno_man.xml           The City clocks had only just gone three   
4     CC_anno_man.xml  struck the hours and quarters in the clouds wi...   
...               ...                                                ...   
2817  ED_enriched.xml                                  says Mr. Datchery   
2818  ED_enriched.xml     He sighs over the contemplation of its poverty   
2819  ED_enriched.xml                                       he concludes   
2820  ED_enriched.xml                                          he chants   
2821  ED_enriched.xml                                              sings   

     annotation_class                              lemmatized_sound_span  
0       ambi

Optionally, add part-of-speech tagging.

In [202]:
#show the data frame
final_df

Unnamed: 0,filename,sound_span,annotation_class,lemmatized_sound_span
0,CC_anno_man.xml,he could hear the people in the court outside ...,ambient_sound,he could hear the people in the court outside ...
1,CC_anno_man.xml,beating their hands upon their breasts,ambient_sound,beat their hand upon their breast
2,CC_anno_man.xml,stamping their feet upon the pavement stones t...,ambient_sound,stamp their foot upon the pavement stone to wa...
3,CC_anno_man.xml,The City clocks had only just gone three,ambient_sound,the city clock have only just go three
4,CC_anno_man.xml,struck the hours and quarters in the clouds wi...,ambient_sound,strike the hour and quarter in the cloud with ...
...,...,...,...,...
2817,ED_enriched.xml,says Mr. Datchery,character_sound,say mr. datchery
2818,ED_enriched.xml,He sighs over the contemplation of its poverty,character_sound,he sigh over the contemplation of its poverty
2819,ED_enriched.xml,he concludes,character_sound,he conclude
2820,ED_enriched.xml,he chants,character_sound,he chant


In [167]:
print("The dataframe is prepared for automatized loudness level annotation.")

The dataframe is prepared for automatized loudness level annotation.


Given a corpus annotated for sound events and prepared as a CSV file with the extracted sound annotations, the following script lemmatizes and lowercases the sound event spans, detects the sound words in the spans that match an entry in the loudness level dictionary, and writes the loudness values back to the XML files as attributes.
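The write-back step can be illustrated with a minimal sketch using `xml.etree.ElementTree`. The tag name `sound`, the attribute name `loudness`, the sample XML, and the mapping `average_loudness` are all hypothetical; the real TEI files use their own element names and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an annotated file
xml_text = (
    "<text><p>Then <sound type='character_sound'>he sighs</sound> and "
    "<sound type='ambient_sound'>the bells ring</sound>.</p></text>"
)
root = ET.fromstring(xml_text)

# Hypothetical mapping from annotated span text to its average loudness level
average_loudness = {"he sighs": 3.0, "the bells ring": 4.0}

# Attach the loudness value as an attribute to every matching annotation element
for element in root.iter("sound"):
    span_text = "".join(element.itertext()).strip()
    if span_text in average_loudness:
        element.set("loudness", str(average_loudness[span_text]))

print(ET.tostring(root, encoding="unicode"))
```

Matching on the span text alone is a simplification: if the same span occurs several times in a file, a position-based or ID-based lookup would be needed to tell the occurrences apart.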

## Loudness Level Labeling

The following code defines a function find_sound_words that takes a string of text as input, splits it into words, and checks whether each word is a key in the loudness level dictionary. Every word that is found in the dictionary is added to a list. Finally, a new column called 'found_sound_words' is added to the dataframe; it contains the list of found sound words for each lemmatized sound span.
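As a minimal illustration of this lookup, separate from the notebook's own cells, the sketch below runs a word-by-word match against a toy dictionary. `toy_loudness_dict` and the sample spans are invented for the example, and the function body is only an assumption about how such a lookup can be written.

```python
# Hypothetical miniature dictionary; the full loudness_dict is defined in the notebook
toy_loudness_dict = {"whisper": 2, "sigh": 3, "shout": 4, "thunder": 5, "silence": 0}

def find_sound_words(text, sound_dict):
    """Return the whitespace-separated words of `text` that are keys in `sound_dict`."""
    return [word for word in text.split() if word in sound_dict]

# Already lemmatized and lowercased spans, as in the 'lemmatized_sound_span' column
lemmatized_spans = [
    "he sigh over the contemplation of its poverty",
    "he chant",
    "she whisper and then shout",
]

for span in lemmatized_spans:
    found = find_sound_words(span, toy_loudness_dict)
    levels = [toy_loudness_dict[word] for word in found]
    print(span, "->", found, levels)
```

On the dataframe, the same function could be applied with `final_df['lemmatized_sound_span'].apply(lambda s: find_sound_words(s, loudness_dict))`. Because this sketch checks single whitespace-separated words, multi-word keys such as 'turn off' or 'ding dong' would need a phrase-level match instead.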

In [204]:
# The loudness level dictionary maps sound words (keys) to loudness levels (values).
# Note: where a key occurs more than once in the literal, Python keeps only the last value.

loudness_dict = {'distract': 3, 'seclusion': 1, 'reject': 3, 'turn off': 3, 'warning': 3, 'disagree': 3, 'judge': 3, 'accorde': 4, 'groan': 4, 'groan': 4, 'groaning': 4, 'groan': 4, 'offer': 3, 'roar': 4, 'reverently': 0, 'hint': 3, 'suggest': 3, 'recommend': 3, 'recommendation': 3, 'contest': 4, 'indicate': 3, 'listen': 0, 'accuse': 3, 'smile at': 0, 'mob': 4, 'denounce': 4, 'praise': 3, 'advise': 3, 'address': 3, 'call': 4, 'shout at': 4, 'denigrate': 3, 'address': 3, 'answer': 3, 'answer': 3, 'order': 3, 'applaud': 4, 'applause': 4, 'breath': 2, 'breath': 1, 'take a breath': 1, 'breath': 1, 'breaths': 1, 'breaths': 1, 'breath': 1, 'breath catch': 1, 'breath': 1, 'breathing': 1, 'breathing': 1, 'breath': 1, 'breathe': 1, 'breathing': 1, 'breathing': 1, 'breathing': 1, 'atmeter': 1, 'breathing': 1, 'breathing': 1, 'breathing in': 1, 'breathing': 1, 'prompting': 3, 'listening': 1, 'rioting': 4, 'sobbing': 2, 'crying out': 4, 'sighing': 3, 'sighing': 3, 'playing up': 3, 'moaning': 4, 'groaning': 3, 'order': 3, 'appearing': 3, 'waiting': 3, 'breathing in': 1, 'confrontation': 4, 'laugh at': 4, 'unpacking': 3, 'whistle': 4, 'shouting': 4, 'shouting': 4, 'utter': 3, 'utter': 3, 'eject': 4, 'expelling': 3, 'to expel': 4, 'axe blow': 5, 'railway station noise': 4, 'shooting': 4, 'perch': 4, 'bass trumpet': 4, 'construction noise': 5, 'pay attention': 3, 'complain': 3, 'answer': 3, 'regret': 3, 'question': 3, 'satisfy': 3, 'congratulate': 3, 'welcome': 3, 'greeting formality': 3, 'assert': 3, 'dominate': 3, 'confess': 3, 'applause': 4, 'applause': 4, 'applause': 4, 'approve': 3, 'affirm': 3, 'affirming': 3, 'cheer': 4, 'confess': 3, 'lament': 3, 'applaud': 4, 'lament': 0, 'affirm': 4, 'smile': 1, 'overhear': 1, 'instruct': 3, 'barking': 4, 'barking': 4, 'criticise': 3, 'remark': 3, 'advise': 3, 'discuss': 3, 'report': 3, 'correct': 3, 'burst': 4, 'reassure': 3, 'ashamed': 0, 'humble': 3, 'insult': 4, 'insulted': 4, 'accuse': 4, 'bombardment': 5, 'wheedling': 3, 'appease': 
3, 'conjure': 3, 'sing': 4, 'discuss': 3, 'confirm': 3, 'determine': 3, 'deny': 4, 'dismayed': 3, 'affirm': 3, 'emphasise': 4, 'authorise': 3, 'accuse': 4, 'bitter': 4, 'blowing': 4, 'blowing': 4, 'brass band': 4, 'bleating': 4, 'bleating': 4, 'exposing': 3, 'bomb': 5, 'vicious': 4, 'surf': 4, 'roar': 4, 'roar': 4, 'breeze': 2, 'roar': 4, 'roaring': 4, 'roaring': 4, 'roaring': 4, 'roaring': 4, 'humming': 2, 'snub': 4, 'spelt': 3, 'boom': 5, 'chorus': 4, 'steam': 2, 'steam whistle': 5, 'presented': 3, 'declaiming': 4, 'denounce': 3, 'detonation': 5, 'defame': 4, 'discreet': 2, 'discuss': 4, 'thunder': 5, 'sound of thunder': 5, 'thundering': 5, 'thunderclap': 5, 'thunderclap': 5, 'jostling': 4, 'turning': 3, 'roar': 5, 'roar': 5, 'roaring': 5, 'pressure wave': 5, 'dull': 2, 'dull': 1, 'break through': 3, 'echo': 3, 'echo': 4, 'echoing': 4, 'inhale': 1, 'turn in': 3, 'whispering in': 2, 'falling asleep': 1, 'catch up': 3, 'inviting': 3, 'lulling': 2, 'clearing': 3, 'lonely': 0, 'looping in': 1, 'slurping in': 2, 'monosyllabic': 3, 'objection': 4, 'in tune': 3, 'collapsing': 5, 'monotonous': 3, 'objection': 4, 'objection': 4, 'consent': 3, 'consent': 3, 'objection': 4, 'recommend': 3, 'escaped': 3, 'dreaming away': 0, 'reply': 3, 'apologising': 3, 'roaring': 5, 'experiencing': 3, 'complement': 3, 'raised': 4, 'explain': 3, 'resound': 3, 'enact': 3, 'allow': 3, 'admonition': 4, 'humble': 4, 'serious': 3, 'blush': 1, 'blush': 1, 'resound': 4, 'resounded': 4, 'choke': 2, 'choked': 2, 'request': 3, 'resound': 3, 'reciprocate': 3, 'reciprocate': 3, 'reciprocate': 3, 'reciprocate': 3, 'tell': 3, 'tell': 3, 'tell': 3, 'excommunicate': 3, 'explode': 5, 'explosion': 5, 'explosion': 5, 'fanfare': 4, 'seize': 3, 'snarl': 3, 'celebrate': 4, 'fire': 5, 'fiddle': 4, 'fluttering': 3, 'pleading': 3, 'blubbering': 4, 'flowing': 3, 'flowed': 3, 'flute': 4, 'curse': 4, 'cursing': 4, 'aircraft noise': 5, 'whispering': 2, 'whisper': 2, 'whispering': 2, 'whispering': 2, 'whispering': 2, 
'whispered': 2, 'whispering': 3, 'forte': 4, 'proceed': 3, 'continued': 3, 'fortissimo': 5, 'asks': 3, 'asks': 3, 'friendlier': 3, 'peace': 0, 'cemetery peace': 0, 'rejoice': 3, 'frug': 3, 'join': 3, 'kick': 3, 'cackle': 4, 'yawn': 1, 'gallop': 4, 'curtain sermon': 4, 'uttered': 3, 'gesture': 1, 'signing': 1, 'sign language': 1, 'barking': 4, 'prayer': 0, 'prayer bell': 4, 'roar': 4, 'roar': 4, 'muffled': 2, 'roar': 5, 'roar': 4, 'whispered': 2, 'counter-speech': 4, 'dissenting voice': 4, 'secret': 2, 'howling': 4, 'howling': 4, 'hooting': 5, 'howling': 4, 'howling': 4, 'clattering': 4, 'clattering': 4, 'rattling': 4, 'crashing': 5, 'laughter': 4, 'ringing': 4, 'ringing': 4, 'ringing': 4, 'ringing': 4, 'vowing': 3, 'murmuring': 2, 'murmur': 2, 'approve': 3, 'babble': 4, 'babble': 4, 'banging': 4, 'banging': 5, 'pattering': 4, 'rattling': 4, 'rattle': 4, 'noise': 3, 'noise': 3, 'noises': 3, 'noiseless': 1, 'noisy': 4, 'song': 4, 'chattering': 4, 'screaming': 4, 'screaming': 4, 'shouted': 4, 'shouted': 4, 'talking': 3, 'talkative': 3, 'sung': 4, 'blamed': 4, 'bluster': 4, 'bluster': 4, 'thunderstorm': 4, 'clamour': 4, 'ringing of bells': 5, 'chime': 4, 'gurgle': 2, 'grave silence': 0, 'grave silence': 0, 'grenade': 5, 'congratulate': 3, 'roar': 4, 'grumble': 4, 'brooding': 0, 'grunt': 4, 'greet': 3, 'greet': 3, 'gurgle': 3, 'gurgling': 3, 'gurgling': 2, 'cooing': 3, 'half-loud': 3, 'semi-quiet': 2, 'reverb': 4, 'reverb': 4, 'echo': 4, 'hammers': 5, 'hammering': 5, 'hammer blow': 5, 'hand clap': 4, 'handshake': 3, 'breathe': 1, 'hit': 4, 'hehehehehehemeh': 4, 'heiben': 4, 'hoarse': 2, 'hoarse': 2,  'laugh out': 4, 'emphasise': 4, 'evoke': 4, 'heartbeat': 1, 'heartbeat': 1, 'rush': 4, 'howling': 4, 'blow': 4, 'cry for help': 4, 'add': 3, 'sit down': 3, 'ho-ho': 4, 'jeer': 4, 'mocking': 4, 'mocking': 4, 'hellish noise': 5, 'audible': 2, 'listen': 1, 'hearing': 1, 'hear': 1, 'heard': 1, 'hoof': 4, 'hoofbeat': 4, 'pay homage': 3, 'help': 3, 'honk': 5, 'hooray': 4, 
'scurry': 3, 'cough': 2, 'cough': 3, 'hymn': 4, 'iah': 3, 'ignore': 1, 'intoned': 4, 'isarrausch': 4, 'moan': 3, 'moaning': 3, 'moaning': 3, 'whooping': 4, 'whoop': 4, 'yowling': 4, 'yodelling': 4, 'hooting': 4, 'hoot': 4, 'cheer': 4, 'cold': 3, 'caricature': 3, 'wheeze': 4, 'wheezing': 3, 'wheezing': 3, 'giggling': 3, 'giggling': 3, 'giggling': 3, 'cock-a-doodle-doo': 4, "children's 'noise": 4, 'yapping': 4, 'complaining': 3, 'wailing': 3, 'wailing': 4, 'lament': 3, 'sound': 3, 'sounds': 3, 'sonorous': 3, 'clatter': 4, 'rattle': 4, 'clapping': 4, 'clapping': 4, 'meekly': 2, 'clicking': 2, 'ringing': 4, 'ringing': 3, 'ringing': 4, 'ringing': 3, 'ringing': 3, 'clang': 4, 'tinkling': 4, 'clanging': 4, 'knocking': 4, 'knocking': 4, 'crackling': 3, 'crack': 2, 'bang': 5, 'bang': 5, 'bang': 5, 'creaking': 4, 'creaking': 4, 'rattling': 4, 'rattling': 4, 'crunching': 2, 'crunching': 2, 'crackling': 2, 'growl': 4, 'growl': 4, 'rumble': 4, 'commanding': 4, 'command calls': 4, 'command words': 4, 'state': 3, 'head nod': 1, 'shake head': 1, 'noise': 4, 'noise': 4, 'crash': 5, 'crashing': 5, 'crashing': 5, 'crackling': 4, 'croaking': 4, 'crowing': 4, 'scratching': 3, 'screeching': 4, 'screeching': 4, 'screeched': 4, 'criticise': 3, 'cuckoo call': 3, 'kissing': 2, 'smiling': 1, 'smiles': 1, 'laugh': 4, 'laughing': 4, 'laughing': 4, 'laughing': 4, 'laughing': 4, 'loading': 3, 'babbling': 4, 'slower': 3, 'noise': 4, 'noise': 4, 'noise pollution': 4, 'noisy': 4, 'noisy': 4, 'blaspheming': 3, 'eavesdropping': 1, 'loud': 4, 'ringing': 4, 'silent': 0, 'silent': 0, 'silent': 0, 'silent': 0, 'quiet': 2, 'quieter': 2, 'reading': 3, 'reading sample': 3, 'more affable': 3, 'little song': 4, 'lispelen': 2, 'praise': 3, 'praise': 3, 'air': 1, 'admonish': 3, 'grumble': 3, 'manly steps': 3, 'bloodcurdling': 5, 'marching music': 4, 'quiet as a mouse': 0, 'grumble': 4, 'mine': 3, 'meow': 3, 'meow': 3, 'discord': 3, 'discordant tone': 3, 'discordant tone': 4, 'discordant': 3, 'discordant': 3, 
'mistrust': 3, 'listening': 1, 'compassionate': 3, 'noon silence': 0, 'sharing': 3, 'monotonous': 3, 'grumble': 3, 'engine noise': 5, 'muck': 2, 'quiet as a mouse': 0, 'mumble': 2, 'murmur': 2, 'mumbling': 2, 'mumbling': 2, 'murmuring': 2, 'music': 4, 'thinking': 0, 'thoughtfully': 0, 'emphatic': 3, 'reverberating': 3, 'reverberating': 3, 'reverberation': 3, 'reverberating': 3, 'reverberation': 3, 'musing': 0, 'after-speech': 3, "night's 'rest": 1, 'night silence': 0, 'foghorn': 5, 'background noise': 3, 'background noise': 3, 'call': 3, 'sneeze': 4, 'nagging': 3, 'cry of distress': 4, 'bleak': 0, 'oede': 0, 'offer': 3, 'deafening': 5, 'slap in the face': 4, 'organ': 4, 'paddling': 3, 'cramming': 4, 'pause': 0, 'paused': 0, 'whip-like': 4, 'whiplash': 4, 'whistle': 4, 'whistling': 4, 'whistling': 4, 'whistling': 4, 'whistling': 4, 'pianissimo': 2, 'piano': 2, 'beep': 2, 'beep': 3, 'beep': 2, 'chatter': 3, 'whinge': 4, 'splashing': 3, 'splashing': 3, 'bursting': 4, 'chatting': 3, 'chatting': 3, 'plop': 4, 'throb': 4, 'thump': 1, 'rumble': 4, 'rumble': 4, 'rumbling': 4, 'present': 3, 'pattering': 3, 'pattering': 3, 'pattering': 3, 'preach': 3, 'praise': 3, 'promptly': 3, 'shush': 3, 'puff': 3, 'quack': 4, 'whine': 4, 'squeak': 4, 'squeak': 4, 'noise': 4, 'run': 4, 'rapport': 3, 'rustling': 2, 'rustling': 2, 'rattling': 3, 'rattle': 4, 'rattling': 4, 'rattling': 4, 'rattling': 4, 'restless': 3, 'rattling': 4, 'rattling': 3, 'rattling': 4, 'rough': 3, 'murmur': 3, 'murmur': 3, 'rustling': 3, 'rustling': 3, 'clearing throat': 3, 'justify': 4, 'talk': 3, 'talk': 3, 'talking': 3, 'talking': 3, 'talking': 3, 'resigning': 3, 'trickle': 2, 'rieth': 3, 'ringing noise': 4, 'rattle': 2, 'roaring': 4, 'rolling': 4, 'jerk': 4, 'call': 4, 'call': 4, 'call': 4, 'call word': 4, 'rebuke': 4, 'rest': 1, 'quiet': 1, 'disturbance': 4, 'quiet': 1, 'rumble': 4, 'shake': 3, 'say': 3, 'said': 3, 'salute': 3, 'salute': 3, 'gently': 2, 'sang': 4, 'sann': 0, 'sentence fragment': 3, 'purr': 3, 
'whisper': 4, 'whizzing': 4, 'scrape': 4, 'peel': 4, 'sound': 3, 'sound': 4, 'soundproof': 1, 'resounding': 4, 'resounding': 4, 'sound wave': 4, 'sound': 4, 'foaming': 3, 'foaming': 3, 'separating': 3, 'shouting': 4, 'scolding': 4, 'scolding': 4, 'rattling': 4, 'joke': 3, 'to reproach': 3, 'shoot': 5, 'scold': 4, 'scold': 4, 'swear word': 4, 'sleep': 1, 'brawl': 4, 'slogan': 3, 'sneak': 2, 'dragging': 4, 'slept': 1, 'close': 3, 'shivering': 2, 'sob': 3, 'sobbing': 3, 'sobbed': 3, 'swallow': 1, 'shuffle': 1, 'slumber': 1, 'slumber': 1, 'shuffle': 3, 'shuffle': 3, 'smack': 3, 'smacking': 5, 'smacking': 5, 'snoring': 3, 'snoring': 3, 'snoring': 3, 'chattering': 3, 'snort': 3, 'snort': 2, 'snort': 4, 'sniff': 2, 'sniff': 2, 'purr': 2, 'shout': 4, 'scooped': 3, 'scream': 4, 'scream': 4, 'screaming': 4, 'screaming': 4, 'screaming': 4, 'screaming': 4, 'screaming': 4, 'screaming': 4, 'screaming': 4, 'shrill': 5, 'step': 2, 'steps': 2, 'shot': 5, 'pouring': 3, 'shaking': 5, 'weak': 2, 'gush': 4, 'chatter': 3, 'silence': 0, 'silent': 0, 'silent': 0, 'silent': 0, 'swelling': 3, 'silent': 0, 'buzzing': 3, 'swearing': 3, 'bless': 3, 'sigh': 3, 'sigh': 3, 'sighed': 3, 'signal': 5, 'singing': 4, 'singing': 4, 'siren': 5, 'siren songs': 4, 'buzzing': 4, 'sunday silence': 0, 'sonorous': 3, 'worry heavy': 3, 'late afternoon silence': 0, 'spectacle': 4, 'play': 4, 'mockery': 4, 'mocking': 4, 'mocking': 4, 'mocking': 4, 'language': 3, 'speak': 3, 'speech sound': 3, 'speechless': 0, 'speechlessness': 0, 'speak': 3, 'blast': 5, 'blast': 5, 'speak': 3, 'stammering': 3, 'stammering': 3, 'stammering': 4, 'dying': 2, 'stereo sound': 3, 'bumping': 4, 'silent': 0, 'silence': 0, 'silence': 0, 'silence': 0, 'silence': 0, 'voice': 3, 'voices': 3, 'babble of voices': 4, 'voiceless': 2, 'voice': 3, 'faltering': 0, 'faltering': 0, 'moaning': 4, 'moaning': 4, 'moaning': 4, 'bumping': 4, 'stuttering': 3, 'sermon': 4, 'punitive speech': 4, 'street noise': 4, 'stroking': 2, 'arguing': 4, 'mute': 0, 
'muteness': 1, 'storm': 4, 'strike': 4, 'stormy': 4, 'torrent': 4, 'tumble': 4, 'humming': 2, 'hum': 2, 'buzzing': 2, 'sympathising': 3, 'rebuke': 4, 'rebuke': 3, 'rebuke': 4, 'tactless': 3, 'tactful': 3, 'dance': 4, 'dance-like': 4, 'more tactful': 3, 'patting': 2, 'swap': 3, 'divide': 3, 'ticking': 2, 'animal sound': 3, 'animal sounds': 3, 'animal voice': 3, 'toast': 4, 'romp': 4, 'raging': 4, 'deathly quiet': 0, 'deathly quiet': 0, 'deadly silence': 0, 'tone': 3, 'tones': 4, 'tones': 4, 'tone': 4, 'roar': 4, 'roar': 4, 'dead silence': 0, 'dead silence': 0, 'dead silence': 0, 'trample': 4, 'tremble': 4, 'trilling': 4, 'drumming': 4, 'drumbeat': 4, 'drum roll': 4, 'drum roll': 5, 'trumpet': 4, 'trumpet': 4, 'trumpeting': 4, 'trumpet blast': 4, 'trickle': 2, 'trubel': 4, 'trutzliedl': 4, 'tumult': 4, 'tumult': 4, 'tumultuous': 4, 'whispering': 2, 'whisper': 2, 'persuade': 3, 'drown out': 5, 'persuade': 3, 'to persuade': 2, 'persuade': 3, 'woo': 3, 'inarticulate': 2, 'unheard': 4, 'interrupt': 3, 'entertain': 3, 'light music': 4, 'teach': 3, 'judge': 3, 'say goodbye': 3, 'curse': 4, 'fade away': 2, 'behave': 0, 'negotiate': 3, 'persist': 0, 'to hush': 1, 'mock': 4, 'traffic': 3, 'traffic noise': 4, 'proclaim': 4, 'proclaim': 4, 'ridicule': 4, 'hear': 0, 'deny': 3, 'conceal': 0, 'to transfer': 3, 'assure': 3, 'mock': 4, 'promise': 3, 'communicate': 3, 'fall silent': 0, 'immersed': 0, 'defend': 4, 'denigrate': 4, 'condemn': 3, 'astonished': 3, 'imprecation': 4, 'forgiveness': 3, 'distorted': 3, 'polyphonic': 4, 'prophesying': 3, 'reproach': 4, 'read aloud': 3, 'suggest': 3, 'audition': 3, 'presentation': 3, 'lecture': 4, 'recite': 4, 'reproach': 4, 'reproach': 4, 'reproach': 4, 'reproachful': 4, 'forest peace': 0, 'woof': 4, 'woof': 4, 'crying': 3, 'crying': 3, 'weeping': 3, 'wise': 3, 'crashing waves': 4, 'raging': 4, 'rage': 4, 'echo': 3, 'echo': 3, 'refutation': 3, 'rebuttal': 4, 'contradict': 3, 'contradiction': 4, 'reluctantly': 3, 'repeat': 3, 'repeated': 3, 
'repeated': 3, 'neighing': 4, 'neighing': 4, 'welcome': 3, 'whinnying': 2, 'whimpering': 2, 'whimpers': 2, 'wind noise': 3, 'gust of wind': 4, 'whining': 3, 'whirl': 3, 'whispering': 2, 'whisper': 2, 'euphony': 3, 'want': 3, 'whisper': 4, 'word': 3, 'words': 3, 'wordless': 0, 'wished': 3, 'appreciate': 3, 'angry': 4, 'quarrel': 4, 'procrastinate': 0, 'sign language': 1, 'burst': 4, 'break': 4, 'crumple': 3, 'burst': 4, 'burst': 4, 'scatter': 3, 'clamour': 4, 'clamour': 4, 'chirp': 3, 'hiss': 2, 'hiss': 2, 'hiss': 2, 'hiss': 2, 'tremble': 3, 'tremble': 1, 'hesitate': 0, 'hesitate': 0, 'hesitating': 3, 'concede': 3, 'whisper': 2, 'whisper': 2, 'concede': 3, 'concession': 3, 'concede': 3, 'listen': 1, 'to cheer': 4, 'to cheer': 4, 'to talk': 3, 'to shout': 4, 'promise': 3, 'promise': 3, 'drum up': 4, 'drum up': 4, 'assure': 3, 'agreeing': 3, 'agreeing': 3, 'to shout': 4, 'heckle': 4, 'chirp': 3, 'strike': 4, 'strike': 4, 'stamp': 4, 'began': 3, 'crackle': 2, 'beat': 4, 'cease': 1, 'ring': 3, 'strike': 4, 'stamp': 4, 'began': 3, 'crackle': 2, 'beat': 4, 'cease': 1, 'ring': 3, 'ding': 4, 'dong': 4, 'ding dong': 4, 'announcement': 3, 'knock': 3, 'breathless': 1, 'dash': 4, 'chave':4, 'chirp': 3, 'express': 3, 'tune': 3, 'bade':3, 'bubble': 3, 'cry': 4}
    
#loudness_dict_de = {'abbringen': 3, 'abgeschiedenheit': 1, 'ablehnen': 3, 'abmachen': 3, 'abmahnung': 3, 'absprechen': 3, 'aburteilen': 3, 'accorde': 4, 'ächz': 4, 'ächzen': 4, 'ächzend': 4, 'aechzen': 4, 'anbieten': 3, 'anbrüllen': 4, 'andacht': 0, 'andeuten': 3, 'andichten': 3, 'anempfehlen': 3, 'anempfehlung': 3, 'anfechten': 4, 'angeben': 3, 'anhören': 0, 'anklagen': 3, 'anlächeln': 0, 'anpöbeln': 4, 'anprangern': 4, 'anpreisen': 3, 'anraten': 3, 'anreden': 3, 'anrufen': 4, 'anschreien': 4, 'anschwärzen': 3, 'ansprechen': 3, 'antworteen': 3, 'antworten': 3, 'anweisung': 3, 'applaudieren': 4, 'applaus': 4, 'atem': 2, 'atemholen': 1, 'atemzug': 1, 'atemzüge': 1, 'atemzügen': 1, 'athem': 1, 'athemholen': 1, 'athemzug': 1, 'athmen': 1, 'athmend': 1, 'athmet': 1, 'atme': 1, 'atmeen': 1, 'atmen': 1, 'atmend': 1, 'atmeter': 1, 'aufathmen': 1, 'aufathmend': 1, 'aufatmen': 1, 'aufatmend': 1, 'auffordern': 3, 'aufhorchen': 1, 'aufruhr': 4, 'aufschluchzen': 2, 'aufschreien': 4, 'aufseufzen': 3, 'aufseufzend': 3, 'aufspiele': 3, 'aufstassen': 4, 'aufstöhnend': 3, 'auftrag': 3, 'auftreten': 3, 'aufwartung': 3, 'aufzuatmen': 1, 'auseinandersetzung': 4, 'auslachen': 4, 'auspacken': 3, 'auspfeifen': 4, 'ausrufen': 4, 'ausschreit': 4, 'äußern': 3, 'aussprechen': 3, 'ausstoßen': 4, 'ausstoßend': 3, 'auszustoßen': 4, 'axthieb': 5, 'bahnhofslärm': 4, 'ballern': 4, 'barsch': 4, 'baßtrompete': 4, 'baulärm': 5, 'beachten': 3, 'beanstanden': 3, 'beantworten': 3, 'bedauern': 3, 'befrage': 3, 'befriedigen': 3, 'beglückwünschen': 3, 'begrüßen': 3, 'begrüßungsformalität': 3, 'behaupten': 3, 'beherrschen': 3, 'beichten': 3, 'beifall': 4, 'beifallsäußerung': 4, 'beipflichten': 3, 'bejahen': 3, 'bejahend': 3, 'bejubeln': 4, 'bekenntniß': 3, 'beklagen': 3, 'beklatschen': 4, 'beklommen': 0, 'bekräftigen': 4, 'belächeln': 1, 'belauschen': 1, 'belehren': 3, 'bellen': 4, 'bellend': 4, 'bemäkeln': 3, 'bemerken': 3, 'beraten': 3, 'bereden': 3, 'berichten': 3, 'berichtigen': 3, 'bersten': 4, 
'beruhigen': 3, 'beschämt': 0, 'bescheidener': 3, 'beschimpfen': 4, 'beschimpft': 4, 'beschuldigen': 4, 'beschuss': 5, 'beschwatzen': 3, 'beschwichtigen': 3, 'beschwören': 3, 'besingen': 4, 'besprechen': 3, 'bestätigen': 3, 'bestimmen': 3, 'bestreiten': 4, 'bestürzt': 3, 'beteuern': 3, 'betonen': 4, 'bewilligen': 3, 'bezichtigen': 4, 'bitter': 4, 'bitten': 3, 'blasen': 4, 'blasend': 4, 'blasmusik': 4, 'blöken': 4, 'blökend': 4, 'bloßstellen': 3, 'bombe': 5, 'bösartigen': 4, 'brandung': 4, 'brausen': 4, 'brauste': 4, 'brise': 2, 'brüllen': 4, 'brüllend': 4, 'brülln': 4, 'brüllte': 4, 'brummen': 2, 'brüskieren': 4, 'buchstabierte': 3, 'bumm': 5, 'chor': 4, 'dämpfen': 2, 'dampfpfeife': 5, 'danken': 3, 'darbracht': 3, 'declamirende': 4, 'denunzieren': 3, 'detonation': 5, 'diffamieren': 4, 'diskret': 2, 'diskutieren': 4, 'donner': 5, 'donnerklang': 5, 'donnern': 5, 'donnernd': 5, 'donnerschlag': 5, 'drängelen': 4, 'drehen': 3, 'dröhnen': 5, 'dröhnend': 5, 'druckwelle': 5, 'dumpf': 2, 'dumpfen': 1, 'durchbrechen': 3, 'echo': 3, 'echot': 4, 'einatmend': 1, 'eindreangen': 3, 'einflüstern': 2, 'eingeschlafen': 1, 'einholen': 3, 'einladen': 3, 'einlullende': 2, 'einräumen': 3, 'einsam': 0, 'einschlaufen': 1, 'einschlürfend': 2, 'einsilbig': 3, 'einspruch': 4, 'einstimmen': 3, 'einstürzend': 5, 'eintönig': 3, 'einwand': 4, 'einwendung': 4, 'einwilligen': 3, 'einwilligung': 3, 'einwurf': 4, 'empfehlen': 3, 'entfuhr': 3, 'entgegenträumend': 0, 'entgegnen': 3, 'entschuldigend': 3, 'erdröhnen': 5, 'erfahren': 3, 'ergänzen': 3, 'erhobener': 4, 'erklären': 3, 'erklingen': 3, 'erlassen': 3, 'erlauben': 3, 'ermahnung': 4, 'erniedrigen': 4, 'ernst': 3, 'erröten': 1, 'erröthen': 1, 'erschallen': 4, 'erscholl': 4, 'ersticken': 2, 'erstickt': 2, 'ersuchen': 3, 'ertönen': 3, 'erwideren': 3, 'erwidern': 3, 'erwiederen': 3, 'erwiedern': 3, 'erzählen': 3, 'erzählstn': 3, 'erzählung': 3, 'exkommunizieren': 3, 'explodieren': 5, 'explosion': 5, 'fanfare': 4, 'fassen': 3, 'fauchen': 4, 'feiern': 
4, 'feuern': 5, 'fiepen': 4, 'flatternd': 3, 'flehen': 3, 'flennen': 4, 'fließen': 3, 'floss': 3, 'flöten': 4, 'fluchen': 4, 'fluchn': 4, 'fluglärm': 5, 'flüsteren': 2, 'flüstern': 2, 'flüsternd': 2, 'flüstert': 2, 'flüsterte': 2, 'föppelt': 3, 'forte': 4, 'fortfahren': 3, 'fortfuhr': 3, 'fortissimo': 5, 'fragen': 3, 'fragt': 3, 'freundlicher': 3, 'friede': 0, 'friedhofsruhe': 0, 'frohlocken': 3, 'frug': 3, 'fügen': 3, 'fußtritt': 3, 'gackern': 4, 'gähnen': 1, 'galoppiern': 4, 'gardinenpredigt': 4, 'geäußert': 3, 'gebärde': 1, 'gebärden': 1, 'gebärdensprache': 1, 'gebell': 4, 'gebet': 0, 'gebetsglocke': 4, 'gebrüll': 4, 'gedämpft': 2, 'gedröhne': 5, 'gedudel': 4, 'geflüstert': 2, 'gegenrede': 4, 'gegenstimme': 4, 'geheim': 2, 'geheul': 4, 'gehupe': 5, 'gejammer': 4, 'gejohle': 4, 'geklapper': 4, 'geklirr': 4, 'geknatter': 4, 'gekrache': 5, 'gelächter': 4, 'geläute': 4, 'gellen': 4, 'gellend': 4, 'gellt': 4, 'geloben': 3, 'gemurmel': 2, 'genehmigen': 3, 'geplapper': 4, 'geplätscher': 4, 'gepolter': 5, 'geprassel': 4, 'gerassel': 4, 'geräusch': 3, 'geräusche': 3, 'geräuschlos': 1, 'geräuschvoll': 4, 'gesang': 4, 'geschnatter': 4, 'geschrei': 4, 'geschrieen': 4, 'geschrien': 4, 'gespräch': 3, 'gesprächig': 3, 'gesungene': 4, 'getadelt': 4, 'getöse': 4, 'gewitter': 4, 'gezeter': 4, 'glockenläuten': 5, 'glockenton': 4, 'glucksen': 2, 'grabesstille': 0, 'granate': 5, 'gratulieren': 3, 'gröhlen': 4, 'grollen': 4, 'grübelen': 0, 'grunzen': 4, 'gruß': 3, 'grüßen': 3, 'gurgeln': 3, 'gurgelnd': 3, 'gurgelton': 2, 'gurren': 3, 'halblaut': 3, 'halbleisen': 2, 'halbleise': 2, 'hall': 4, 'hallen': 4, 'hämmer': 5, 'hämmern': 5, 'hammerschlag': 5, 'händeklatschen': 4, 'handschlag': 3, 'hauchen': 1, 'hauen': 4, 'hehehehehemeh': 4, 'heiben': 4, 'heiser': 2, 'heiseren': 2, 'hellhörig': 1, 'herauslach': 4, 'hervorheben': 4, 'hervorrufen': 4, 'herzschlag': 1, 'hetzen': 4, 'heulen': 4, 'heulend': 4, 'heuln': 4, 'heulte': 4, 'hieb': 4, 'hilferuf': 4, 'hinzufügen': 3, 'hinzusetzen': 3, 
'ho-ho': 4, 'höhnen': 4, 'höhnisch': 4, 'höhnisches': 4, 'höllenlärm': 5, 'hörbar': 2, 'horchen': 1, 'hören': 1, 'hörte': 1, 'hufe': 4, 'hufschlag': 4, 'huldigen': 3, 'hülfe': 3, 'hupen': 5, 'hurra': 4, 'huschen': 3, 'hüstelen': 2, 'husten': 3, 'hymne': 4, 'iah': 3, 'ignorieren': 1, 'inbrünstig': 4, 'intoniert': 4, 'isarrausch': 4, 'jammeren': 3, 'jammern': 3, 'jammernd': 3, 'jauchzen': 4, 'jauchzer': 4, 'jaulen': 4, 'jodeln': 4, 'johlen': 4, 'johlten': 4, 'jubeln': 4, 'kalt': 3, 'karikieren': 3, 'keifen': 4, 'keuchen': 3, 'keuchend': 3, 'kicheren': 3, 'kichern': 3, 'kichernd': 3, 'kikeriki': 4, 'kinderlärm': 4, 'kläffen': 4, 'klage': 3, 'klageenden': 3, 'klagelaut': 4, 'klagen': 3, 'klang': 3, 'klänge': 3, 'klangvoll': 3, 'klappern': 4, 'klatschen': 4, 'klatschend': 4, 'kleinlaut': 2, 'klicken': 2, 'klingeln': 4, 'klingelzeichen': 3, 'klingen': 4, 'klingend': 3, 'klingender': 3, 'klirre': 4, 'klirren': 4, 'klirrend': 4, 'klopfen': 4, 'klopfte': 4, 'knabenstimm': 3, 'knacken': 2, 'knall': 5, 'knallen': 5, 'knarren': 4, 'knarrend': 4, 'knatteren': 4, 'knattern': 4, 'knirschen': 2, 'knirschend': 2, 'knistern': 2, 'knurren': 4, 'knurrn': 4, 'kollern': 4, 'kommandieren': 4, 'kommandorufe': 4, 'kommandoworte': 4, 'konstatieren': 3, 'kopfnicken': 1, 'kopfschütteln': 1, 'krach': 4, 'krachen': 5, 'krachend': 5, 'krächzen': 4, 'krächzend': 4, 'krähen': 4, 'kratzen': 3, 'kreischen': 4, 'kreischend': 4, 'kreischte': 4, 'kritisieren': 3, 'kuckucksruf': 3, 'küßen': 2, 'lächelen': 1, 'lächelt': 1, 'lachen': 4, 'lachend': 4, 'lacht': 4, 'lachten': 4, 'laden': 3, 'lallen': 4, 'langsamer': 3, 'lärm': 4, 'lärmbelästigung': 4, 'lärmen': 4, 'lärmend': 4, 'lästern': 3, 'lauschen': 1, 'laut': 4, 'läuten': 4, 'lautlos': 0, 'lautlose': 0, 'lautlosigkeit': 0, 'leise': 2, 'leiser': 2, 'lesen': 3, 'leseprobe': 3, 'leutseliger': 3, 'liedchen': 4, 'lispelen': 2, 'loben': 3, 'lobpreisen': 3, 'luftschöpfen': 1, 'mahnen': 3, 'mäkeln': 3, 'männerschritte': 3, 'markerschütternd': 5, 'marschmusik': 
4, 'mäuschenstill': 0, 'meckern': 4, 'meinen': 3, 'miau': 3, 'miauen': 3, 'missklang': 3, 'misston': 3, 'mißton': 4, 'misstönen': 3, 'misstönend': 3, 'mißtrauen': 3, 'mithören': 1, 'mitleidigen': 3, 'mittagsstille': 0, 'mittheilen': 3, 'monoton': 3, 'mosern': 3, 'motorenlärm': 5, 'mucks': 2, 'mucksmäuschenstill': 0, 'murmelen': 2, 'murmeln': 2, 'murmelnd': 2, 'murren': 2, 'musik': 4, 'nachdenken': 0, 'nachdenklich': 0, 'nachdrücklich': 3, 'nachhall': 3, 'nachhallend': 3, 'nachklang': 3, 'nachklingen': 3, 'nachrede': 3, 'nachsinnen': 0, 'nachsprach': 3, 'nachtruhe': 1, 'nachtstille': 0, 'nebelhorn': 5, 'nebengeräusch': 3, 'nebengeräusche': 3, 'nennen': 3, 'niesen': 4, 'nörgeln': 3, 'notschrei': 4, 'öd': 0, 'oede': 0, 'offerieren': 3, 'ohrenbetäubend': 5, 'ohrfeige': 4, 'orgeln': 4, 'paddelnd': 3, 'pauken': 4, 'pause': 0, 'pausierte': 0, 'peitschenartig': 4, 'peitschenhieb': 4, 'pfeifen': 4, 'pfeifend': 4, 'pfeifkonzert': 4, 'pfiffen': 4, 'pianissimo': 2, 'piano': 2, 'piep': 2, 'piepen': 3, 'piepsen': 2, 'plappern': 3, 'plärren': 4, 'plätschern': 3, 'plätschernd': 3, 'platzen': 4, 'plauderen': 3, 'plaudern': 3, 'plumps': 4, 'pochen': 2, 'polteren': 4, 'poltern': 4, 'polternd': 4, 'präsentieren': 3, 'prasselen': 3, 'prasseln': 3, 'prasselnd': 3, 'predigen': 3, 'preisen': 3, 'prompt': 3, 'pst!': 3, 'puff': 3, 'quaken': 4, 'quengeln': 4, 'quieken': 4, 'quietschen': 4, 'radau': 4, 'rannen': 4, 'rapportieren': 3, 'rascheln': 2, 'raschelnd': 2, 'räsonnieren': 3, 'rassel': 4, 'rasseln': 4, 'rasselnd': 4, 'rasselten': 4, 'rastlos': 3, 'ratschen': 4, 'ratschlag': 3, 'rattern': 4, 'rauh': 3, 'raunen': 3, 'rauschen': 3, 'rauschend': 3, 'räusperen': 3, 'rechtfertigen': 4, 'rede': 3, 'reden': 3, 'redend': 3, 'redestrom': 3, 'resignieren': 3, 'rieseln': 2, 'rieth': 3, 'ringgeräusch': 4, 'röcheln': 2, 'röhren': 4, 'rolln': 4, 'ruck': 4, 'ruf': 4, 'rufen': 4, 'rufenwort': 4, 'rügen': 4, 'ruhe': 1, 'ruhestörung': 4, 'ruhig': 1, 'rumpeln': 4, 'rütteln': 3, 'sagen': 3, 'sagte': 3, 
'sagts': 3, 'salutieren': 3, 'salve': 3, 'sanft': 2, 'sang': 4, 'sann': 0, 'satzfragment': 3, 'säuseln': 3, 'sausen': 4, 'sausend': 4, 'schaben': 4, 'schalen': 4, 'schall': 4, 'schalldicht': 1, 'schallen': 4, 'schallend': 4, 'schallwelle': 4, 'schalt': 4, 'schäumen': 3, 'schäumend': 3, 'scheiden': 3, 'schellen': 4, 'schelte': 4, 'schelten': 4, 'schepperen': 4, 'scherzen': 3, 'schied': 3, 'schießen': 5, 'schimpfen': 4, 'schimpfirt': 4, 'schimpfwort': 4, 'schlafe': 1, 'schlägerei': 4, 'schlagwort': 3, 'schleichen': 2, 'schleifen': 4, 'schlief': 1, 'schließen': 3, 'schlotternde': 2, 'schluchze': 3, 'schluchzen': 3, 'schluchzte': 3, 'schlucken': 1, 'schlufen': 1, 'schlummer': 1, 'schlummern': 1, 'schlurren': 3, 'schlurrte': 3, 'schmatzen': 3, 'schmettern': 5, 'schmetternd': 5, 'schnalzen': 3, 'schnarchen': 3, 'schnarren': 3, 'schnattern': 3, 'schnauben': 3, 'schnaufen': 2, 'schnauzen': 4, 'schnüffeln': 2, 'schnuppern': 2, 'schnurren': 2, 'scholl': 4, 'schöpfte': 3, 'schrei': 4, 'schreien': 4, 'schreiend': 4, 'schreit': 4, 'schri': 4, 'schrie': 4, 'schrieen': 4, 'schrien': 4, 'schrill': 5, 'schritt': 2, 'schritte': 2, 'schuss': 5, 'schütten': 3, 'schütternd': 5, 'schwach': 2, 'schwall': 4, 'schwatzen': 3, 'schweigen': 0, 'schweigend': 0, 'schweigsam': 0, 'schwellend': 3, 'schwiegen': 0, 'schwirren': 3, 'schwören': 3, 'segnen': 3, 'seufzen': 3, 'seufzer': 3, 'seufzte': 3, 'signal': 5, 'singen': 4, 'singend': 4, 'sirene': 5, 'sirenengesänge': 4, 'sirren': 4, 'sonntagsstille': 0, 'sonor': 3, 'sorgenschwer': 3, 'spätnachmittagsstille': 0, 'spektakel': 4, 'spiel': 4, 'spott': 4, 'spotten': 4, 'spöttisch': 4, 'spöttischer': 4, 'sprache': 3, 'sprächen': 3, 'sprachklang': 3, 'sprachlos': 0, 'sprachlosigkeit': 0, 'sprechen': 3, 'sprengen': 5, 'sprengung': 5, 'sprichen': 3, 'stammeln': 3, 'stammelnd': 3, 'stampfen': 4, 'sterbend': 2, 'stereoton': 3, 'stieß': 4, 'still': 0, 'stille': 0, 'stillschweigen': 0, 'stimme': 3, 'stimmen': 3, 'stimmengewirr': 4, 'stimmlos': 2, 'stimmung': 
3, 'stocken': 0, 'stockend': 0, 'stöhnen': 4, 'stöhnend': 4, 'stöhnte': 4, 'stoßen': 4, 'stotteren': 3, 'strafpredigt': 4, 'strafrede': 4, 'straßenlärm': 4, 'streichelen': 2, 'streiten': 4, 'stumm': 0, 'stummheit': 1, 'sturm': 4, 'sturmgeläute': 4, 'stürmisch': 4, 'sturzbach': 4, 'stürzen': 4, 'summen': 2, 'summn': 2, 'surren': 2, 'sympathisierend': 3, 'tadel': 4, 'tadelen': 3, 'tadeln': 4, 'taktlos': 3, 'taktmäßig': 3, 'tamtam': 4, 'tanzweisen': 4, 'täppischer': 3, 'tätschelen': 2, 'tauschen': 3, 'theilen': 3, 'ticken': 2, 'tierlaut': 3, 'tierlaute': 3, 'tierstimme': 3, 'toast': 4, 'toben': 4, 'tobend': 4, 'todesruhe': 0, 'todtenstill': 0, 'todtenstille': 0, 'tone': 3, 'töne': 4, 'tönen': 4, 'tosen': 4, 'tost': 4, 'totenstill': 0, 'totenstille': 0, 'trampeln': 4, 'tremolieren': 4, 'trillern': 4, 'trommeln': 4, 'trommelschlag': 4, 'trommelwirbel': 4, 'trompet': 4, 'trompeten': 4, 'trompetend': 4, 'trompetenstoß': 4, 'tröpfeln': 2, 'trubel': 4, 'trutzliedl': 4, 'tumult': 4, 'tumultuös': 4, 'tuschelen': 2, 'tuscheln': 2, 'überreden': 3, 'übertönen': 5, 'überzeugen': 3, 'umschlich': 2, 'umstimmen': 3, 'umwerben': 3, 'unartikuliert': 2, 'unerhört': 4, 'unterbrechen': 3, 'unterhalten': 3, 'unterhaltungsmusik': 4, 'unterrichten': 3, 'urteilen': 3, 'verabschieden': 3, 'verfluchen': 4, 'verhallen': 2, 'verhalten': 0, 'verhandeln': 3, 'verharren': 0, 'verhauchen': 1, 'verhöhnen': 4, 'verkehren': 3, 'verkehrslärm': 4, 'verkündet': 4, 'verkündigen': 4, 'verlachen': 4, 'vernehmen': 0, 'verneinen': 3, 'verschweigen': 0, 'versetzen': 3, 'versichern': 3, 'verspotten': 4, 'versprechen': 3, 'verständigen': 3, 'verstummen': 0, 'versunken': 0, 'verteidigen': 4, 'verunglimpfen': 4, 'verurteilen': 3, 'verwundertem': 3, 'verwünschung': 4, 'verzeihung': 3, 'verzerrt': 3, 'vielstimmig': 4, 'vorgesagen': 3, 'vorhalten': 4, 'vorlesen': 3, 'vorschlagen': 3, 'vorsprechen': 3, 'vorstellung': 3, 'vortrag': 4, 'vortragen': 4, 'vorwerfen': 4, 'vorwurf': 4, 'vorwürfe': 4, 'vorwurfsvoll': 4, 
'waldesfrieden': 0, 'wau': 4, 'wauwau': 4, 'weinen': 3, 'weinend': 3, 'weinte': 3, 'weisen': 3, 'wellenschlag': 4, 'wettern': 4, 'wetzen': 4, 'widerhall': 3, 'widerhallen': 3, 'widerlegung': 3, 'widerrede': 4, 'widersprechen': 3, 'widerspruch': 4, 'widerwillig': 3, 'wiederholen': 3, 'wiederholt': 3, 'wiederholte': 3, 'wiehern': 4, 'wiehernd': 4, 'willkommen': 3, 'wimmeren': 2, 'wimmern': 2, 'wimmert': 2, 'windgeräusche': 3, 'windstoß': 4, 'winseln': 3, 'wirbeln': 3, 'wisperen': 2, 'wispern': 2, 'wohlklang': 3, 'wollen': 3, 'worgeln': 4, 'wort': 3, 'worte': 3, 'wortlos': 0, 'wünschten': 3, 'würdigen': 3, 'wütend': 4, 'zanken': 4, 'zauderen': 0, 'zeichensprache': 1, 'zerbersten': 4, 'zerbrechen': 4, 'zerknäulten': 3, 'zerplatzen': 4, 'zerspringen': 4, 'zerstreuen': 3, 'zetern': 4, 'zeterte': 4, 'zirpen': 3, 'zischelen': 2, 'zischeln': 2, 'zischen': 2, 'zitteren': 3, 'zittern': 1, 'zögeren': 0, 'zögern': 0, 'zögernd': 3, 'zubilligen': 3, 'zuflüsteren': 2, 'zuflüstern': 2, 'zugeben': 3, 'zugeständnis': 3, 'zugestehen': 3, 'zuhören': 1, 'zujubeln': 4, 'zuprosten': 4, 'zureden': 3, 'zurufen': 4, 'zusage': 3, 'zusagen': 3, 'zusammenstauchen': 4, 'zusammentrommeln': 4, 'zusichern': 3, 'zustimmen': 3, 'zustimmend': 3, 'zuzurufen': 4, 'zwischenruf': 4, 'zwitschern': 3,
#}



The following function iterates over the data frame column with the lemmatized sound event spans and collects every whitespace-separated token that is a key in the loudness dictionary. 
The matched sound words are saved in a new column called "found_sound_words".

In [205]:
# Function to find words in a text that are keys in the sound dictionary
def find_sound_words(text):
    sound_words = []
    for word in text.split():
        if word in loudness_dict:
            sound_words.append(word)
    return sound_words

# Add a column with the list of found sound words for each lemmatized sound_span
final_df['found_sound_words'] = final_df['lemmatized_sound_span'].apply(find_sound_words)
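
To illustrate what this matching step does, the following minimal sketch runs the same whitespace-token lookup on a toy dictionary. The name `find_sound_words_toy` and the entries and values in `toy_loudness_dict` are invented for illustration and are not taken from the real loudness level dictionary.

```python
# Toy stand-in for the loudness dictionary; keys and values are hypothetical.
toy_loudness_dict = {"close": 3, "bang": 5, "whisper": 2}

def find_sound_words_toy(text, dictionary):
    """Return every whitespace-separated token that is a key in `dictionary`."""
    return [word for word in text.split() if word in dictionary]

print(find_sound_words_toy("close it with a bang", toy_loudness_dict))
# ['close', 'bang']

# Tokens with attached punctuation do not match, because the lookup is exact.
print(find_sound_words_toy("she whisper, then bang!", toy_loudness_dict))
# []
```

Whether punctuation is still attached to tokens at this point depends on the lemmatization step, so the second call only shows how an exact lookup would behave in that case.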

In [207]:
final_df[:50]

Unnamed: 0,filename,sound_span,annotation_class,lemmatized_sound_span,found_sound_words
0,CC_anno_man.xml,he could hear the people in the court outside ...,ambient_sound,he could hear the people in the court outside ...,"[hear, wheeze]"
1,CC_anno_man.xml,beating their hands upon their breasts,ambient_sound,beat their hand upon their breast,[beat]
2,CC_anno_man.xml,stamping their feet upon the pavement stones t...,ambient_sound,stamp their foot upon the pavement stone to wa...,[stamp]
3,CC_anno_man.xml,The City clocks had only just gone three,ambient_sound,the city clock have only just go three,[]
4,CC_anno_man.xml,struck the hours and quarters in the clouds wi...,ambient_sound,strike the hour and quarter in the cloud with ...,[strike]
5,CC_anno_man.xml,berries crackled in the lamp heat of the windows,ambient_sound,berry crackle in the lamp heat of the window,[crackle]
6,CC_anno_man.xml,closed it with a bang,ambient_sound,close it with a bang,"[close, bang]"
7,CC_anno_man.xml,The sound resounded through the house like thu...,ambient_sound,the sound resound through the house like thunder,"[sound, resound, thunder]"
8,CC_anno_man.xml,Every room above and every cask in the wine me...,ambient_sound,every room above and every cask in the wine me...,[echo]
9,CC_anno_man.xml,it scarcely made a sound,ambient_sound,it scarcely make a sound,[sound]


The following code counts the empty cells in the column "found_sound_words" to verify how many sound events will not receive a loudness level label because they do not provide a sound word that is part of the loudness level dictionary.

In [208]:
# Count empty and non-empty lists in the 'found_sound_words' column
empty_list_count = final_df['found_sound_words'].apply(lambda x: len(x) == 0).sum()
non_empty_list_count = final_df['found_sound_words'].apply(lambda x: len(x) > 0).sum()

print("Number of empty lists in 'found_sound_words' column:", empty_list_count)
print("Number of non-empty lists in 'found_sound_words' column:", non_empty_list_count)


Number of empty lists in 'found_sound_words' column: 1149
Number of non-empty lists in 'found_sound_words' column: 1673


With the counter, one can verify how many cells remain empty, meaning that the sound event span did not match any loudness level dictionary key. 
On the one hand, an automatically annotated sound event span may not actually be a sound event. 
On the other hand, the sound word of the sound event may be missing from the loudness level dictionary, because it is an unusual sound word or because the lemmatization did not work on an uncommon spelling.
A last reason could be that the sound event does not contain a sound word because it is a sound metaphor or only an indicated perception of an indirectly indicated sound event.
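
To judge which of these causes applies, it can help to look at the most frequent tokens in the unmatched spans. The following is a minimal sketch of that idea on invented data. The spans, the dictionary entries, and the variable names are assumptions for illustration, and the notebook itself does not contain this step.

```python
from collections import Counter

# Hypothetical lemmatized spans and toy dictionary, for illustration only.
toy_loudness_dict = {"sing": 4, "sigh": 3}
lemmatized_spans = [
    "he sing a song",
    "the kettle hum softly",
    "the kettle hum again",
    "she sigh",
]

# Keep only spans in which no token is a dictionary key.
unmatched = [
    span for span in lemmatized_spans
    if not any(word in toy_loudness_dict for word in span.split())
]

# Count tokens across the unmatched spans to surface candidate sound words.
token_counts = Counter(word for span in unmatched for word in span.split())
print(token_counts.most_common(3))
```

In this toy case, "hum" appears among the most frequent tokens of the unmatched spans, which would suggest it as a candidate for a new dictionary entry.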

The following code matches the found sound words with their loudness levels and stores the resulting lists of values in a new column called "listed_loudness_values". In this way, the loudness level of every matched sound word is listed so that it can be taken into account in the average calculation.

In [209]:
# Define a function to map sound words to their loudness levels
def map_to_loudness(sound_words):
    return [loudness_dict[word] for word in sound_words if word in loudness_dict]

# Apply the function to the 'found_sound_words' column and create the new column 'listed_loudness_values'
final_df['listed_loudness_values'] = final_df['found_sound_words'].apply(map_to_loudness)

# Print the DataFrame with the new column
print(final_df)

             filename                                         sound_span  \
0     CC_anno_man.xml  he could hear the people in the court outside ...   
1     CC_anno_man.xml             beating their hands upon their breasts   
2     CC_anno_man.xml  stamping their feet upon the pavement stones t...   
3     CC_anno_man.xml           The City clocks had only just gone three   
4     CC_anno_man.xml  struck the hours and quarters in the clouds wi...   
...               ...                                                ...   
2817  ED_enriched.xml                                  says Mr. Datchery   
2818  ED_enriched.xml     He sighs over the contemplation of its poverty   
2819  ED_enriched.xml                                       he concludes   
2820  ED_enriched.xml                                          he chants   
2821  ED_enriched.xml                                              sings   

     annotation_class                              lemmatized_sound_span  \
0       amb

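The following code adds a new column called "average_loudness_value" to the data frame. The function calculate_average_loudness calculates the average of the listed loudness values for each row, rounded to one decimal place. Values of 0 (silence) are excluded before averaging, and rows whose list is empty after this filtering receive no value. 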


In [210]:
# Define a function to calculate the average loudness value excluding NAN and silence 0
def calculate_average_loudness(listed_loudness_values):
    # Exclude 0 values from the list
    listed_loudness_values = [val for val in listed_loudness_values if val != 0]
    
    # Calculate the average of the filtered values
    if len(listed_loudness_values) > 0:
        return round(sum(listed_loudness_values) / len(listed_loudness_values), 1)
    else:
        return None

# Apply the function to the 'listed_loudness_values' column and create the new column 'average_loudness_value'
final_df['average_loudness_value'] = final_df['listed_loudness_values'].apply(calculate_average_loudness)

# Print the DataFrame with the new column
print(final_df)


             filename                                         sound_span  \
0     CC_anno_man.xml  he could hear the people in the court outside ...   
1     CC_anno_man.xml             beating their hands upon their breasts   
2     CC_anno_man.xml  stamping their feet upon the pavement stones t...   
3     CC_anno_man.xml           The City clocks had only just gone three   
4     CC_anno_man.xml  struck the hours and quarters in the clouds wi...   
...               ...                                                ...   
2817  ED_enriched.xml                                  says Mr. Datchery   
2818  ED_enriched.xml     He sighs over the contemplation of its poverty   
2819  ED_enriched.xml                                       he concludes   
2820  ED_enriched.xml                                          he chants   
2821  ED_enriched.xml                                              sings   

     annotation_class                              lemmatized_sound_span  \
0       amb
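
The averaging rule can be checked on a few hand-made lists. The following minimal sketch restates the rule under the hypothetical name `average_without_silence`. The input lists are invented and do not come from the corpus.

```python
def average_without_silence(values):
    """Mean of the non-zero values, rounded to one decimal; None if nothing is left."""
    audible = [v for v in values if v != 0]
    if not audible:
        return None
    return round(sum(audible) / len(audible), 1)

print(average_without_silence([3, 5]))  # 4.0
print(average_without_silence([0, 4]))  # 4.0, the silence value 0 is dropped
print(average_without_silence([0]))     # None
print(average_without_silence([]))      # None
```

Under this rule, a span that matches only silence words receives no average, and a span that mixes a silence word with a louder word receives the level of the louder word.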

## Preparation of the matched sound events with their average loudness levels 

The following code prepares the matched sound events with their average loudness levels as a dictionary, which is then used to enrich the XML elements of the revised files from the source folder with loudness level attributes.

In [211]:
# Filter out rows where "average_loudness_value" is NaN
filtered_df = final_df.dropna(subset=['average_loudness_value'])

# Extracting "sound_span" and "average_loudness_value" columns as a dictionary
sound_loudness_dict = filtered_df.set_index('sound_span')['average_loudness_value'].to_dict()

# Print the dictionary
print(sound_loudness_dict)



{'he could hear the people in the court outside go wheezing up and down': 4.0, 'beating their hands upon their breasts': 4.0, 'stamping their feet upon the pavement stones to warm them': 4.0, 'struck the hours and quarters in the clouds with tremulous vibrations afterwards': 4.0, 'berries crackled in the lamp heat of the windows': 2.0, 'closed it with a bang': 4.0, 'The sound resounded through the house like thunder': 4.0, "Every room above and every cask in the wine merchant's cellars below appeared to have a separate peal of echoes of its own": 3.0, 'it scarcely made a sound': 4.0, 'soon it rang out loudly': 3.0, 'The bells ceased': 1.0, 'They were succeeded by a clanking noise deep down below': 4.0, 'The cellar door flew open with a booming sound': 4.0, 'then he heard the noise much louder on the floors below': 4.0, 'shook its chain with such a dismal and appalling noise': 3.5, 'he became sensible of confused noises in the air': 2.5, 'incoherent sounds of lamentation and regret': 3.

To add the average loudness value as a loudness attribute to the XML element surrounding the sound event span, the code reads each XML file as text, locates the relevant element with a regular expression, and adds the attribute with the calculated average_loudness_value as its value.
The function below iterates over every XML file in the given folder and overwrites each file in place, so it only needs to be called once for the whole corpus folder.
In the following code:
The regex pattern has four capturing groups: group 1, (<(ambient|character)_sound), captures the start of the opening tag; group 2, nested inside it, captures the tag type (ambient or character); group 3, (\s*[^>]*), captures any existing attributes; and group 4, (\s*{re.escape(sound_span)}\s*<), captures the content from the sound span up to the < that begins the closing tag.
The replacement string uses the first group (\1) to preserve the start of the opening tag and the third group (\3) to preserve existing attributes, appends the loudness attribute, and uses the fourth group (\4) to preserve the content of the XML element up to the < of the closing tag.

In [212]:
import os
import re

def process_xml_files(xml_folder, sound_loudness_dict):
    for filename in os.listdir(xml_folder):
        if filename.endswith('.xml'):
            xml_file_path = os.path.join(xml_folder, filename)

            with open(xml_file_path, 'r', encoding='utf-8') as file:
                xml_content = file.read()

            for sound_span, loudness_value in sound_loudness_dict.items():
                # Updated regex: Match opening tag with optional whitespace and allow attributes
                pattern = fr'(<(ambient|character)_sound)(\s*[^>]*)>(\s*{re.escape(sound_span)}\s*<)'
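                # Group breakdown:
                # \1 = start of the opening tag, e.g. <ambient_sound
                # \2 = tag type (ambient or character), nested inside \1 and not used below
                # \3 = any existing attributes
                # \4 = content starting with sound_span and ending at the < of the next tag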
                replacement = fr'\1\3 loudness="{loudness_value}">\4'

                xml_content = re.sub(pattern, replacement, xml_content)

            with open(xml_file_path, 'w', encoding='utf-8') as file:
                file.write(xml_content)

# Call the function to process XML files in the folder with the sound_loudness_dict
process_xml_files(folder_path, sound_loudness_dict)
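
The effect of the substitution can be tried on a short string before running it on the corpus files. The following minimal sketch applies the same pattern and replacement to an invented XML snippet. The snippet, the `type` attribute, and the values in `toy_sound_loudness` are assumptions for illustration only.

```python
import re

# Hypothetical XML snippet and span-to-loudness mapping, for illustration only.
xml_content = (
    '<ambient_sound>closed it with a bang</ambient_sound> '
    '<character_sound type="speech">he chants</character_sound>'
)
toy_sound_loudness = {"closed it with a bang": 4.0, "he chants": 3.0}

for sound_span, loudness_value in toy_sound_loudness.items():
    # Same pattern as above: tag start, optional attributes, then the span text.
    pattern = fr'(<(ambient|character)_sound)(\s*[^>]*)>(\s*{re.escape(sound_span)}\s*<)'
    # Keep tag start and existing attributes, then append the loudness attribute.
    replacement = fr'\1\3 loudness="{loudness_value}">\4'
    xml_content = re.sub(pattern, replacement, xml_content)

print(xml_content)
```

In this sketch the first element gains `loudness="4.0"` and the second keeps its existing attribute and gains `loudness="3.0"`. Applying the same substitution a second time would append a second loudness attribute to an already labeled element, so the sketch assumes each file is processed only once.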


In [213]:
print("The automated loudness level labeling is finished.")

The automated loudness level labeling is finished.
