<a href="https://colab.research.google.com/github/HazelvdW/context-framed-listening/blob/main/NLP_framed_listening.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NLP analysis for Framed Listening study.
> Authored by **Hazel A. van der Walle** (PhD student, Music, Durham University), September 2025.

All datasets generated and used for this study are openly available on GitHub https://github.com/HazelvdW/context-framed-listening.

This notebook uses the cleaned raw data (processed in R), so let's clone the repository to get the necessary files and directories:

In [None]:
!git clone https://github.com/HazelvdW/context-framed-listening.git

Cloning into 'context-framed-listening'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 14 (delta 4), reused 10 (delta 3), pack-reused 0 (from 0)
Receiving objects: 100% (14/14), 214.15 KiB | 1.52 MiB/s, done.
Resolving deltas: 100% (4/4), done.


You should now have a folder called "context-framed-listening" in this notebook's file system. Check this by clicking the folder icon in the left-hand side panel of this webpage.

For this NLP analysis, we are only working from the file **"data_study1_MAIN.csv"**, which contains participants' qualitative thought descriptions.
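If you prefer to confirm the clone in code rather than through the file browser, a small check for the expected file is enough. This is a minimal sketch: `find_missing_files` is a hypothetical helper, and the path assumes Colab's default `/content` working directory, as in the loading cell further on.

```python
import os

def find_missing_files(repo_dir, required_files):
    """Return the names in required_files that do not exist inside repo_dir (hypothetical helper)."""
    return [name for name in required_files
            if not os.path.isfile(os.path.join(repo_dir, name))]

# Assumed location: the clone target inside Colab's default /content directory
REPO_DIR = "/content/context-framed-listening"
missing = find_missing_files(REPO_DIR, ["data_study1_MAIN.csv"])
if missing:
    print("Missing files:", missing)
else:
    print("All required files found.")
```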


---

Start by importing the necessary packages in the cell below:

In [None]:
# standard-library modules
import os
import csv

# data-handling libraries
import pandas as pd
import numpy as np

Load in the data .csv file:

In [None]:
data = pd.read_csv("/content/context-framed-listening/data_study1_MAIN.csv")

This dataset contains every participant's responses to all 16 clip-context stimulus pairings.
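Because each participant should contribute exactly 16 rows, a quick sanity check is to count rows per `PROLIFIC_PID` and flag anyone who differs. The sketch below runs on a small made-up DataFrame so that it is self-contained; on the real data you would pass `data` instead. `participants_with_wrong_trial_count` is a hypothetical helper name.

```python
import pandas as pd

def participants_with_wrong_trial_count(df, expected=16, id_col="PROLIFIC_PID"):
    """Return row counts for participants whose number of trials differs from `expected`."""
    counts = df.groupby(id_col).size()
    return counts[counts != expected]

# Toy data (not from the study): participant "A" has 16 trials, "B" only 15
toy = pd.DataFrame({"PROLIFIC_PID": ["A"] * 16 + ["B"] * 15})
print(participants_with_wrong_trial_count(toy))

# On the real data: participants_with_wrong_trial_count(data)
```

An empty result means every participant has the expected number of trials.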

_For the purposes of this analysis_, we are only interested in trials where music-evoked thoughts (METs) were experienced and described, i.e., all rows where "descr_THOUGHT.text" is _not_ NA.

Let's create a new dataset that only contains trials with METs:

In [None]:
dataMET = data[data['descr_THOUGHT.text'].notna()].copy()
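This filter keys on whether a description was entered. As a cross-check, it can be compared with the yes/no key press in `response_thought_or_not.keys`, assuming that `'y'` marks trials where a thought was reported (as in the displayed rows of the dataset). Any mismatch flags, for example, a participant who pressed 'y' but left the text box empty. `met_filter_mismatches` is a hypothetical helper, and the toy DataFrame stands in for `data`.

```python
import numpy as np
import pandas as pd

def met_filter_mismatches(df, key_col="response_thought_or_not.keys",
                          text_col="descr_THOUGHT.text"):
    """Return rows where the key press and the presence of a description disagree."""
    said_yes = df[key_col].eq("y")      # assumption: 'y' = thought reported
    has_text = df[text_col].notna()     # the same rule used to build dataMET
    return df[said_yes != has_text]

# Toy data (not from the study)
toy = pd.DataFrame({
    "response_thought_or_not.keys": ["y", "n", "y", "n"],
    "descr_THOUGHT.text": ["a crowded bar", np.nan, np.nan, np.nan],
})
print(met_filter_mismatches(toy))  # row 2: 'y' pressed but no description given
```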

Familiarise yourself with the data structure by taking a quick look through before we dig into any analyses.

In [None]:
display(dataMET)

# print out all column headers so we have a quick copy for later reference
print(dataMET.columns)

Unnamed: 0,clip_name,context_word,expName,PROLIFIC_PID,File_ID,date,response_thought_or_not.keys,descr_THOUGHT.text,rating_music_prompted.response,rating_spontaneity.response,...,demographics.livingCountry,demographics.birthCountry,demographics.nativeLanguage,demographics.otherLanguage,demographics.otherLanguageText,demographics.hearingImpariments,demographics.hearingImpairmentsText,demographics.education,demographics.musicianIdentification,demographics.feedback
0,80s_LOW_02_Breaking_Away.mp3,bar,clip_context_g1,5eff5f05b92981000a2aed73,clip_context_g1_5eff5f05b92981000a2aed73_02059...,2025-07-01_10h45.13.126,y,"kind of sad, melancholy. not happy or upbeat. ...",5.0,4.0,...,United Kingdom,United Kingdom,English,False,,False,,5,2,
1,Jazz_MED_07_Turiya_and_Ramakrishna.mp3,video game,clip_context_g1,5eff5f05b92981000a2aed73,clip_context_g1_5eff5f05b92981000a2aed73_02059...,2025-07-01_10h45.13.126,y,it did not sound like a video game. if anythin...,5.0,5.0,...,United Kingdom,United Kingdom,English,False,,False,,5,2,
2,80s_MED_08_After_Tonight.mp3,video game,clip_context_g1,5eff5f05b92981000a2aed73,clip_context_g1_5eff5f05b92981000a2aed73_02059...,2025-07-01_10h45.13.126,y,"overly upbeat. no real emotions, peppy. too mu...",5.0,4.0,...,United Kingdom,United Kingdom,English,False,,False,,5,2,
3,Metal_LOW_09_Darkside.mp3,concert,clip_context_g1,5eff5f05b92981000a2aed73,clip_context_g1_5eff5f05b92981000a2aed73_02059...,2025-07-01_10h45.13.126,y,"very heavy rock, not for me. somewhere that i ...",5.0,5.0,...,United Kingdom,United Kingdom,English,False,,False,,5,2,
4,Metal_MED_20_Welcome_to_the_Family.mp3,video game,clip_context_g1,5eff5f05b92981000a2aed73,clip_context_g1_5eff5f05b92981000a2aed73_02059...,2025-07-01_10h45.13.126,y,"very charged, maybe you've won something or wo...",5.0,4.0,...,United Kingdom,United Kingdom,English,False,,False,,5,2,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2554,Metal_MED_20_Welcome_to_the_Family.mp3,concert,clip_context_g4,6824fc226d9b4777f8695cf0,clip_context_g4_PROLIFIC_PID_992291.csv,2025-07-01_06h12.29.151,y,A rock band made up of teenage white kids play...,5.0,5.0,...,United States,United States,English,False,,False,,3,2,none
2555,80s_LOW_02_Breaking_Away.mp3,concert,clip_context_g4,6824fc226d9b4777f8695cf0,clip_context_g4_PROLIFIC_PID_992291.csv,2025-07-01_06h12.29.151,y,People in a ballroom in elegant dresses slow d...,5.0,5.0,...,United States,United States,English,False,,False,,3,2,none
2557,Jazz_MED_02_I_Guess_Ill_Hang_My_Tears_Out_To_D...,video game,clip_context_g4,6824fc226d9b4777f8695cf0,clip_context_g4_PROLIFIC_PID_992291.csv,2025-07-01_06h12.29.151,y,I imagined a jazz festival and old men on stag...,5.0,5.0,...,United States,United States,English,False,,False,,3,2,none
2558,Electronic_MED_20_The_Distance.mp3,movie,clip_context_g4,6824fc226d9b4777f8695cf0,clip_context_g4_PROLIFIC_PID_992291.csv,2025-07-01_06h12.29.151,y,I imagined a documentary mostly about fun fact...,5.0,5.0,...,United States,United States,English,False,,False,,3,2,none


Index(['clip_name', 'context_word', 'expName', 'PROLIFIC_PID', 'File_ID',
       'date', 'response_thought_or_not.keys', 'descr_THOUGHT.text',
       'rating_music_prompted.response', 'rating_spontaneity.response',
       'rating_novelty.response', 'input_NOT.text',
       'rating_familiarity.response', 'rating_enjoyment.response',
       'demographics.headphones', 'demographics.age', 'demographics.gender',
       'demographics.livingCountry', 'demographics.birthCountry',
       'demographics.nativeLanguage', 'demographics.otherLanguage',
       'demographics.otherLanguageText', 'demographics.hearingImpariments',
       'demographics.hearingImpairmentsText', 'demographics.education',
       'demographics.musicianIdentification', 'demographics.feedback'],
      dtype='object')


Below, we compute some basic descriptive statistics for the dataset.
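One possible starting point for these descriptives is to count METs and summarise description length (in words) by context word and by music genre. This is a sketch under stated assumptions: the genre is taken as the first underscore-separated token of `clip_name` (e.g. `80s`, `Metal`, `Jazz`), which matches the file names shown above but is not documented in the source; `describe_mets` is a hypothetical helper; and the toy rows are made up. On the real data you would pass `dataMET`.

```python
import pandas as pd

def describe_mets(df, text_col="descr_THOUGHT.text"):
    """Count METs and summarise description word counts by context word and by genre."""
    out = df.copy()
    # number of whitespace-separated tokens in each description
    out["n_words"] = out[text_col].str.split().str.len()
    # assumed naming convention: <genre>_<LEVEL>_<number>_<title>.mp3
    out["genre"] = out["clip_name"].str.split("_").str[0]
    stats = ["count", "mean", "median"]
    by_context = out.groupby("context_word")["n_words"].agg(stats)
    by_genre = out.groupby("genre")["n_words"].agg(stats)
    return by_context, by_genre

# Toy data (made up, not study responses)
toy = pd.DataFrame({
    "clip_name": ["80s_LOW_02_Breaking_Away.mp3", "Metal_LOW_09_Darkside.mp3",
                  "80s_MED_08_After_Tonight.mp3"],
    "context_word": ["bar", "concert", "video game"],
    "descr_THOUGHT.text": ["a busy bar", "a loud crowd jumping at a festival", "retro arcade"],
})
by_context, by_genre = describe_mets(toy)
print(by_context)
print(by_genre)
```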





---



In [None]:
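import re

# The MET descriptions are stored in 'descr_THOUGHT.text' (see the column list above);
# copy them into a shorter 'MET_descr' column used in the rest of this cell.
dataMET['MET_descr'] = dataMET['descr_THOUGHT.text']

def clean_text(text):
    """Remove punctuation/symbols and lowercase; non-strings (e.g. NaN) pass through."""
    if isinstance(text, str):
        # keep only ASCII letters, digits and whitespace
        text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
        return text.lower()
    return text

dataMET['cleaned_MET_descr'] = dataMET['MET_descr'].apply(clean_text)
display(dataMET[['MET_descr', 'cleaned_MET_descr']].head())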



| | MET_descr | cleaned_MET_descr |
|---|---|---|
| 0 | kind of sad, melancholy. not happy or upbeat. ... | kind of sad melancholy not happy or upbeat emo... |
| 1 | it did not sound like a video game. if anythin... | it did not sound like a video game if anything... |
| 2 | overly upbeat. no real emotions, peppy. too mu... | overly upbeat no real emotions peppy too much |
| 3 | very heavy rock, not for me. somewhere that i ... | very heavy rock not for me somewhere that i do... |
| 4 | very charged, maybe you've won something or wo... | very charged maybe youve won something or won ... |
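Two side effects of `clean_text` are visible here: apostrophes disappear ("you've" becomes "youve"), and because the character class only admits ASCII letters, accented letters are dropped as well. Stripping a free-standing hyphen also leaves a double space. If that matters for later tokenisation, one option (a hypothetical variant, not the study's pipeline) is to collapse whitespace after stripping:

```python
import re

def clean_text_normalised(text):
    """Hypothetical variant of clean_text that also collapses runs of whitespace."""
    if not isinstance(text, str):
        return text  # e.g. NaN passes through unchanged, as in clean_text
    text = re.sub(r"[^a-zA-Z0-9\s]", "", text).lower()
    return re.sub(r"\s+", " ", text).strip()

print(clean_text_normalised("Very charged - maybe you've won something!"))
# very charged maybe youve won something
print(clean_text_normalised("Café  music"))
# caf music  (the accented letter is removed, not transliterated)
```

It can be swapped in via `dataMET['MET_descr'].apply(clean_text_normalised)`, followed by `.str.split()` if a list of tokens is needed.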
