### Active Engagement
To explore the impact of BTS music on mental health, I began by identifying key areas of focus:
  
- Defined purpose: Assess BTS music’s impact on fans' mental health.  
- Key questions: How does BTS music affect mood? Are there correlations with mental health ratings?  
- Planned two datasets: survey data and Spotify/lyrics data.  

I then designed a Google Form to collect detailed and diverse responses from BTS fans:
- Designed the form to collect data on demographics, listening habits, mental health metrics, and qualitative feedback.
- Tested the form with peers and refined the questions for clarity.

With the form ready, I collected both datasets:
- [Google Form Link](https://docs.google.com/forms/d/e/1FAIpQLSdJJ_qkvIGL6EMnlaQrQvmnOLnZB0qCprr8yWjSUuybtRgK7w/viewform?usp=sf_link)
- Distributed the form via BTS fan forums and social media.
- Gathered Spotify metadata and lyrics for analysis.

Throughout the project, I actively sought feedback:
- **Peers**: Reviewed survey design and analysis ideas.  
- **Mentors**: Advised on the KDD process and mental health frameworks.

This feedback improved the survey design, prompted the addition of fandom-related questions, and guided the integration of the Spotify data, which enriched the analysis.

## **Libraries**

In [57]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

print("Libraries are installed and working!")

Libraries are installed and working!


## **Display Survey and Spotify datasets** 

In [133]:
survey_data = pd.read_csv("original/bts_survey_data.csv", encoding='utf-8')

survey_data.head()


Unnamed: 0,Отметка времени,What is your age?,What is your gender?,What country are you from?,How often do you listen to BTS music?,How many hours per day do you listen to music?,What is your primary music streaming service?,What is your favorite BTS song?,"On a scale of 0-10, how would you rate your current level of anxiety?","On a scale of 0-10, how would you rate your current level of depression?",...,How often do you feel sad?,How does listening to BTS music affect your mood?,Do you use BTS music as a coping mechanism for stress or anxiety?,Do you believe that BTS music has improved your mental health?,"If yes, please explain how (optional)",What themes in BTS's music resonate with you the most?,Are you a part of BTS fandom (ARMY)?,How has being part of the BTS fandom (ARMY) impacted your mental health?,Do you discuss BTS music and its impact on mental health with others?,Is there anything else you would like to share about your experience with BTS music and its impact on your mental health?
0,11.11.2024 16:30:01,19,Female,Ukraine,4.0,5,Spotify,Not Today,6,2,...,2,5,Yes,Yes,It gives me more confidence and I do not feel ...,Cool,Yes,ARMY is very strong fandom. They will anything...,Yes,BTS forever💜
1,11.11.2024 17:56:08,16,Female,India,5.0,3,"Spotify, YouTube Music",Boy with luv,8,8,...,4,5,Yes,I don't know,,Lively themes,Yes,It helped me gain confidence and stay happy,Yes,No
2,12.11.2024 3:57:24,19,Female,India,5.0,3,"Spotify, YouTube Music",Zero o clock,3,0,...,1,5,Yes,Yes,"When COVID hit, I was already in deep depressi...","Love yourself for truly who u are, face yourse...",Yes,Very much in a positive way.like whenever I fe...,Yes,I have already shared everything..I think youn...
3,12.11.2024 16:48:51,15,Female,"Maryland United state of America, and fuzhou c...",5.0,5,Spotify,"No more dreams, Spring days, and permission to...",1,2,...,1,5,I don't know,I don't know,I said idk because I listened to bts since 4th...,Idk,Yes,Idk,No,I love bts and they make me happy
4,13.11.2024 0:39:52,17,Male,North America,5.0,5,Spotify,I Need U,4,2,...,1,5,No,Yes,Well.... a lot of their music ive been able to...,"everything, from their music about love to the...",Yes,"honestly, a good amount",No,"no, thats all"


In [132]:
spotify_data = pd.read_csv("original/bts_spotify_data.csv", encoding='utf-16le')
spotify_data.head()

Unnamed: 0,id,album_title,eng_album_title,album_rd,album_seq,track_title,raw_track_title,eng_track_title,lyrics,hidden_track,...,spotify_track_mode,spotify_track_speechiness,spotify_track_acousticness,spotify_track_instrumentalness,spotify_track_liveness,spotify_track_valence,spotify_track_tempo,spotify_track_time_signature,eng_lyrics_source_url,eng_lyrics_credits
0,BTS-1,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,1,Intro: 2 Cool 4 Skool (ft. DJ Friz),,Intro: 2 Cool 4 Skool (ft. DJ Friz),we're now going to progress to some steps\nwhi...,False,...,1.0,0.245,0.179,0.266,0.179,0.532,94.871,4.0,,
1,BTS-2,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,2,We Are Bulletproof Pt.2,,We Are Bulletproof Pt.2,(what) give it to me\n (what) be nervous\n (wh...,False,...,0.0,0.16,0.0104,6e-06,0.134,0.868,144.02,4.0,,
2,BTS-3,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,3,Skit: Circle Room Talk,,Skit: Circle Room Talk,rap monster: it was a big hit\nv: year 2006!\n...,False,...,1.0,0.802,0.912,0.0,0.913,0.817,121.045,3.0,,
3,BTS-4,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,4,No More Dream,,No More Dream,"hey, what's your dream?\n hey, what's your dre...",False,...,1.0,0.47,0.0118,2e-06,0.431,0.594,167.898,4.0,,
4,BTS-5,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,5,Interlude,,Interlude,,False,...,0.0,0.319,0.494,0.762,0.392,0.854,125.897,4.0,,


## **Data Cleaning**

- Standardize formats (e.g., age as numeric, text fields as lowercase).
    - Format the CSV files (remove and rename columns).
    - Convert all text to lowercase.
    - Strip leading/trailing whitespace from text.
    - Create a new CSV file with the updated data.
- Handle missing values.
    - Count the missing values in each column.
    - Replace them with "no data".
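
The text-cleaning steps listed above can be sketched as one small helper. This is only an illustration on made-up rows, not the code used in the cells that follow: the function name `clean_text_columns`, its parameters, and the toy values are assumptions.

```python
import pandas as pd

def clean_text_columns(df, text_cols, fill_value="no data"):
    """Lowercase, strip, and fill missing values in the given text columns."""
    cleaned = df.copy()  # leave the caller's frame untouched
    for col in text_cols:
        # .str methods pass missing values through, so fillna runs last
        cleaned[col] = cleaned[col].str.lower().str.strip().fillna(fill_value)
    return cleaned

# Toy frame standing in for the survey data (values are invented)
toy = pd.DataFrame({"Gender": [" Female", "MALE ", None], "Age": [19, 17, 22]})
cleaned_toy = clean_text_columns(toy, text_cols=["Gender"])
print(cleaned_toy)
```

Passing the text columns explicitly keeps numeric columns such as `Age` out of the string operations.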

**Formatting the <u>bts_survey_data.csv</u> file**
1. Remove the first column, which holds the time the survey was taken (`Отметка времени`, Russian for "timestamp").
2. Rename the columns with appropriate names.

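*Remove the first column*

In [138]:
# .copy() gives an independent frame and avoids SettingWithCopyWarning on later assignments
survey_data_updated = survey_data.iloc[:, 1:].copy()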

*Rename columns*

In [139]:
new_column_names = {
    "What is your age?": "Age",
    "What is your gender?": "Gender",
    "What country are you from?": "Country",
    "How often do you listen to BTS music?": "BTS_Listening_Freq",
    "How many hours per day do you listen to music?": "Daily_Music_Hours",
    "What is your primary music streaming service?":"Streaming_Service",
    "What is your favorite BTS song?":"Favorite_Song",
    "On a scale of 0-10, how would you rate your current level of anxiety?":"Anxiety_Level",
    "On a scale of 0-10, how would you rate your current level of depression? ": "Depression_Level",
    "On a scale of 0-10, how would you rate your current level of stress?":"Stress_Level",
    "How often do you feel lonely?":"Loneliness_Rating",
    "How often do you feel sad?":"Sadness_Rating",
    "How does listening to BTS music affect your mood?":"BTS_Mood_Impact",
    "Do you use BTS music as a coping mechanism for stress or anxiety?":"BTS_Stress_Management",
    "Do you believe that BTS music has improved your mental health?":"BTS_Mental_Health_Impact",
    "If yes, please explain how (optional)":"BTS_Impact_Explanation",
    "What themes in BTS's music resonate with you the most?":"Themes",
    "Are you a part of BTS fandom (ARMY)?":"ARMY_Membership",
    "How has being part of the BTS fandom (ARMY) impacted your mental health?":"ARMY_Fandom_MH_Impact",
    "Do you discuss BTS music and its impact on mental health with others?":"Health_Talk_with_Friends",
    "Is there anything else you would like to share about your experience with BTS music and its impact on your mental health?":"Additional_BTS_Impact"
}
survey_data_updated.rename(columns=new_column_names, inplace=True)

In [140]:
print(survey_data_updated.columns)

Index(['Age', 'Gender', 'Country', 'BTS_Listening_Freq', 'Daily_Music_Hours',
       'Streaming_Service', 'Favorite_Song', 'Anxiety_Level',
       'Depression_Level', 'Stress_Level', 'Loneliness_Rating',
       'Sadness_Rating', 'BTS_Mood_Impact', 'BTS_Stress_Management',
       'BTS_Mental_Health_Impact', 'BTS_Impact_Explanation', 'Themes',
       'ARMY_Membership', 'ARMY_Fandom_MH_Impact', 'Health_Talk_with_Friends',
       'Additional_BTS_Impact'],
      dtype='object')


*Convert all text to lowercase*

In [141]:
survey_data_updated['Gender'] = survey_data_updated['Gender'].str.lower()
survey_data_updated['Country'] = survey_data_updated['Country'].str.lower()
survey_data_updated['Streaming_Service'] = survey_data_updated['Streaming_Service'].str.lower()
survey_data_updated['Favorite_Song'] = survey_data_updated['Favorite_Song'].str.lower()
survey_data_updated['BTS_Impact_Explanation'] = survey_data_updated['BTS_Impact_Explanation'].str.lower()
survey_data_updated['Themes'] = survey_data_updated['Themes'].str.lower()
survey_data_updated['ARMY_Fandom_MH_Impact'] = survey_data_updated['ARMY_Fandom_MH_Impact'].str.lower()
survey_data_updated['Additional_BTS_Impact'] = survey_data_updated['Additional_BTS_Impact'].str.lower()
survey_data_updated['BTS_Stress_Management'] = survey_data_updated['BTS_Stress_Management'].str.lower()
survey_data_updated['BTS_Mental_Health_Impact'] = survey_data_updated['BTS_Mental_Health_Impact'].str.lower()
survey_data_updated['ARMY_Membership'] = survey_data_updated['ARMY_Membership'].str.lower()
survey_data_updated['Health_Talk_with_Friends'] = survey_data_updated['Health_Talk_with_Friends'].str.lower()

In [142]:
print(survey_data_updated.head())

  Age  Gender                                            Country  \
0  19  female                                            ukraine   
1  16  female                                              india   
2  19  female                                              india   
3  15  female  maryland united state of america, and fuzhou c...   
4  17    male                                      north america   

   BTS_Listening_Freq  Daily_Music_Hours       Streaming_Service  \
0                 4.0                  5                 spotify   
1                 5.0                  3  spotify, youtube music   
2                 5.0                  3  spotify, youtube music   
3                 5.0                  5                 spotify   
4                 5.0                  5                 spotify   

                                       Favorite_Song  Anxiety_Level  \
0                                          not today              6   
1                                       

*Remove extra spaces at the beginning and end*

In [143]:
# DataFrame.map replaces the deprecated DataFrame.applymap (pandas >= 2.1)
survey_data_updated = survey_data_updated.map(lambda x: x.strip() if isinstance(x, str) else x)


*Count missing values for each column*

In [144]:
print(survey_data_updated.isna().sum())

Age                          1
Gender                       0
Country                      3
BTS_Listening_Freq           1
Daily_Music_Hours            0
Streaming_Service            0
Favorite_Song                1
Anxiety_Level                0
Depression_Level             0
Stress_Level                 0
Loneliness_Rating            0
Sadness_Rating               0
BTS_Mood_Impact              0
BTS_Stress_Management        2
BTS_Mental_Health_Impact     0
BTS_Impact_Explanation      26
Themes                       7
ARMY_Membership              0
ARMY_Fandom_MH_Impact       14
Health_Talk_with_Friends     0
Additional_BTS_Impact       32
dtype: int64
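
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch on a made-up frame; the column names `missing_count` and `missing_pct` and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy frame standing in for the survey data (values are invented)
toy = pd.DataFrame({
    "Country": ["ukraine", None, "india", None],
    "Age": [19, 16, None, 15],
    "Gender": ["female", "female", "female", "male"],
})

missing = pd.DataFrame({
    "missing_count": toy.isna().sum(),
    "missing_pct": (toy.isna().mean() * 100).round(1),
})
# Keep only the columns that actually have gaps, largest first
missing = missing[missing["missing_count"] > 0].sort_values("missing_count", ascending=False)
print(missing)
```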


*Replace missing values with "no data"*

In [145]:
survey_data_updated = survey_data_updated.astype(str)

survey_data_updated.replace(["nan"], "no data", inplace=True)
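
Casting the whole frame to `str` also turns numeric columns such as `Age` into text, which makes later numeric work (means, correlations) harder. One possible alternative, sketched here on made-up rows, fills only the text columns and leaves numeric gaps as `NaN`. The `text_cols` list and the toy values are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (values are invented)
toy = pd.DataFrame({
    "Age": [19.0, np.nan, 22.0],
    "Country": ["ukraine", None, "india"],
})

text_cols = ["Country"]  # hypothetical list of free-text columns
filled = toy.copy()
filled[text_cols] = filled[text_cols].fillna("no data")

print(filled.dtypes)  # Age keeps its numeric dtype
print(filled)
```

Numeric gaps can then be handled separately, for example by dropping or imputing them at analysis time.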


In [146]:
print(survey_data_updated.isna().sum())

Age                         0
Gender                      0
Country                     0
BTS_Listening_Freq          0
Daily_Music_Hours           0
Streaming_Service           0
Favorite_Song               0
Anxiety_Level               0
Depression_Level            0
Stress_Level                0
Loneliness_Rating           0
Sadness_Rating              0
BTS_Mood_Impact             0
BTS_Stress_Management       0
BTS_Mental_Health_Impact    0
BTS_Impact_Explanation      0
Themes                      0
ARMY_Membership             0
ARMY_Fandom_MH_Impact       0
Health_Talk_with_Friends    0
Additional_BTS_Impact       0
dtype: int64


*Check the result of replacing missing values*

In [147]:
print(survey_data_updated.iloc[17])

Age                                             22
Gender                                      female
Country                                    no data
BTS_Listening_Freq                             5.0
Daily_Music_Hours                                4
Streaming_Service           spotify, youtube music
Favorite_Song                         crystal snow
Anxiety_Level                                    6
Depression_Level                                 2
Stress_Level                                     3
Loneliness_Rating                                3
Sadness_Rating                                   4
BTS_Mood_Impact                                  5
BTS_Stress_Management                          yes
BTS_Mental_Health_Impact                       yes
BTS_Impact_Explanation                     no data
Themes                                     no data
ARMY_Membership                                yes
ARMY_Fandom_MH_Impact                      no data
Health_Talk_with_Friends       

*Create a new filtered CSV file*

In [148]:
# index=False keeps the row index out of the file, matching the Spotify export below
survey_data_updated.to_csv("survey_data.csv", index=False)

**Formatting the <u>bts_spotify_data.csv</u> file**
1. Remove unnecessary columns.
2. Handle missing values.

*Print and remove the columns that do not carry important information for the project.*

In [None]:
columns_to_remove = [6, 9, 10, 11, 13, 15, 16, 32, 33]

columns_to_remove_names = spotify_data.columns[columns_to_remove]
print("Columns to be removed:")
print(columns_to_remove_names)

Columns to be removed:
Index(['raw_track_title', 'hidden_track', 'remix', 'featured', 'repackaged',
       'has_full_ver', 'is_alt_lang_ver', 'eng_lyrics_source_url',
       'eng_lyrics_credits'],
      dtype='object')


In [None]:
spotify_data_updated = spotify_data.drop(columns_to_remove_names, axis=1)
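
Positional indices silently point at different columns if the file layout ever changes. As a hedged alternative, the same drop can be written with column names. The sketch below uses a made-up frame with only two of the names printed above.

```python
import pandas as pd

# Toy frame standing in for the Spotify data (values are invented)
toy = pd.DataFrame({
    "id": ["BTS-1", "BTS-2"],
    "track_title": ["Intro", "No More Dream"],
    "raw_track_title": [None, None],
    "eng_lyrics_credits": [None, None],
})

cols_to_drop = ["raw_track_title", "eng_lyrics_credits"]
trimmed = toy.drop(columns=cols_to_drop)  # raises KeyError if a name is missing
print(trimmed.columns.tolist())
```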

*Format all text to lowercase*

In [129]:
spotify_data_updated['album_title'] = spotify_data_updated['album_title'].str.lower()
spotify_data_updated['eng_album_title'] = spotify_data_updated['eng_album_title'].str.lower()
spotify_data_updated['eng_track_title'] = spotify_data_updated['eng_track_title'].str.lower()
spotify_data_updated['lyrics'] = spotify_data_updated['lyrics'].str.lower()

*Remove spaces at the beginning and end*

In [130]:
# DataFrame.map replaces the deprecated DataFrame.applymap (pandas >= 2.1)
spotify_data_updated = spotify_data_updated.map(lambda x: x.strip() if isinstance(x, str) else x)


*Count missing/empty values in the dataset*

In [119]:
print(spotify_data_updated.isna().sum())

id                                 0
album_title                        0
eng_album_title                    0
album_rd                           0
album_seq                          0
track_title                        0
eng_track_title                    0
lyrics                            31
performed_by                      23
lang                              22
spotify_album_id                  23
spotify_track_duration_ms         23
spotify_track_id                  23
spotify_track_danceability        23
spotify_track_energy              23
spotify_track_key                 23
spotify_track_loudness            23
spotify_track_mode                23
spotify_track_speechiness         23
spotify_track_acousticness        23
spotify_track_instrumentalness    23
spotify_track_liveness            23
spotify_track_valence             23
spotify_track_tempo               23
spotify_track_time_signature      23
dtype: int64


*Replace missing values with "no data"*

In [120]:
spotify_data_updated = spotify_data_updated.astype(str)

spotify_data_updated.replace(["nan"], "no data", inplace=True)

In [121]:
print(spotify_data_updated.isna().sum())

id                                0
album_title                       0
eng_album_title                   0
album_rd                          0
album_seq                         0
track_title                       0
eng_track_title                   0
lyrics                            0
performed_by                      0
lang                              0
spotify_album_id                  0
spotify_track_duration_ms         0
spotify_track_id                  0
spotify_track_danceability        0
spotify_track_energy              0
spotify_track_key                 0
spotify_track_loudness            0
spotify_track_mode                0
spotify_track_speechiness         0
spotify_track_acousticness        0
spotify_track_instrumentalness    0
spotify_track_liveness            0
spotify_track_valence             0
spotify_track_tempo               0
spotify_track_time_signature      0
dtype: int64


*Check the result*

In [123]:
print(spotify_data_updated.iloc[8])

id                                                                            BTS-9
album_title                                                          2 Cool 4 Skool
eng_album_title                                                      2 Cool 4 Skool
album_rd                                                                 2013-06-12
album_seq                                                                         9
track_title                                                           길 (Road/Path)
eng_track_title                                                           Road/Path
lyrics                            yeah, wassup\nyou know, time flows like stars\...
performed_by                                                                    BTS
lang                                                                            KOR
spotify_album_id                                                            no data
spotify_track_duration_ms                                                   

*Create a new dataset with updated data*

In [131]:
spotify_data_updated.to_csv('spotify_data.csv', index=False)

## **Data Integration**

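Merge the survey and Spotify datasets with an inner join that matches each respondent's `Favorite_Song` to the Spotify `track_title`. The join is an exact string comparison, so both columns must be normalized the same way (case and whitespace) for a title to match.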
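
A normalized join key is one way to make title matching less brittle. The following is a minimal sketch on made-up rows, not the merge used in this notebook: the `song_key` column, the `normalize_title` helper, and the toy values are assumptions. It uses a left join with `indicator=True` instead of an inner join so that favorites without a match stay visible.

```python
import pandas as pd

def normalize_title(titles: pd.Series) -> pd.Series:
    """Lowercase and strip titles so both sides compare on the same form."""
    return titles.str.lower().str.strip()

# Toy frames standing in for the two datasets (values are invented)
survey = pd.DataFrame({
    "Favorite_Song": ["not today", " Black Swan", "unknown song"],
    "Anxiety_Level": [6, 8, 3],
})
spotify = pd.DataFrame({
    "track_title": ["Not Today", "Black Swan", "Dionysus"],
    "spotify_track_valence": [0.1, 0.2, 0.3],
})

survey["song_key"] = normalize_title(survey["Favorite_Song"])
spotify["song_key"] = normalize_title(spotify["track_title"])

# indicator=True adds a _merge column: "both" or "left_only" for a left join
merged = survey.merge(spotify, on="song_key", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Favorite_Song"].tolist()
print(merged[["Favorite_Song", "track_title", "_merge"]])
print("Unmatched favorites:", unmatched)
```

Reviewing the unmatched list shows which free-text answers would need manual mapping before they can be linked to a track.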

In [137]:
merged_data = pd.merge(survey_data_updated, spotify_data_updated, left_on='Favorite_Song', right_on='track_title', how='inner')

print(merged_data.head())

  Age  Gender  Country BTS_Listening_Freq Daily_Music_Hours  \
0  19  Female  Ukraine                4.0                 5   
1  20  Female      USA                5.0                 4   
2  20  Female      USA                5.0                 4   
3  19  Female      USA                5.0                 5   
4  32    Male  belgium                2.0                 5   

        Streaming_Service Favorite_Song Anxiety_Level Depression_Level  \
0                 Spotify     Not Today             6                2   
1                 Spotify       Ma City             8                2   
2                 Spotify       Ma City             8                2   
3  Spotify, YouTube Music    Black Swan             8                8   
4           YouTube Music      Dionysus             1                4   

  Stress_Level  ... spotify_track_key spotify_track_loudness  \
0            7  ...               8.0                 -3.408   
1            9  ...               6.0           