# Understanding the Raw Data

# 1. Schema




### Block 0 - Background
Respondents answer general questions about their musical background and listening habits.

| column | description |
|:------:|:------------|
| Timestamp | Date and time when form was submitted |
| Age | Respondent's age |
| Primary streaming service | Respondent's primary streaming service |
| Hours per day | Number of hours the respondent listens to music per day |
| While working | Does the respondent listen to music while studying/working? |
| Instrumentalist | Does the respondent play an instrument regularly? |
| Composer | Does the respondent compose music? |
| Fav genre | Respondent's favorite or top genre |
| Exploratory | Does the respondent actively explore new artists/genres? |
| Foreign languages | Does the respondent regularly listen to music with lyrics in a language they are not fluent in? |
| BPM | Beats per minute of favorite genre |

### Block 1 - Music Genres
Respondents rate how often they listen to each of 16 music genres, choosing one of: Never, Rarely, Sometimes, or Very frequently.

| column | description |
|:------:|:------------|
| Frequency [Classical] | How frequently the respondent listens to classical music |
| Frequency [Country] | How frequently the respondent listens to country music |
| Frequency [EDM] | How frequently the respondent listens to EDM (electronic dance music) |
| Frequency [Folk] | How frequently the respondent listens to folk music |
| Frequency [Gospel] | How frequently the respondent listens to gospel music |
| Frequency [Hip hop] | How frequently the respondent listens to hip hop music |
| Frequency [Jazz] | How frequently the respondent listens to jazz music |
| Frequency [K pop] | How frequently the respondent listens to K-pop music |
| Frequency [Latin] | How frequently the respondent listens to Latin music |
| Frequency [Lofi] | How frequently the respondent listens to lofi music |
| Frequency [Metal] | How frequently the respondent listens to metal music |
| Frequency [Pop] | How frequently the respondent listens to pop music |
| Frequency [R&B] | How frequently the respondent listens to R&B music |
| Frequency [Rap] | How frequently the respondent listens to rap music |
| Frequency [Rock] | How frequently the respondent listens to rock music |
| Frequency [Video game music] | How frequently the respondent listens to video game music |
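
Because the four response options form a natural order, it can help to store them as an ordered categorical rather than as plain strings. The following is a minimal sketch under that assumption; the `sample` frame and the names `FREQUENCY_LEVELS` and `freq_dtype` are hypothetical and stand in for the real survey columns.

```python
import pandas as pd

# The four response options, from least to most frequent
FREQUENCY_LEVELS = ["Never", "Rarely", "Sometimes", "Very frequently"]

# Hypothetical sample standing in for two of the Frequency [...] columns
sample = pd.DataFrame({
    "Frequency [Rock]": ["Very frequently", "Sometimes", "Never"],
    "Frequency [Gospel]": ["Never", "Never", "Rarely"],
})

# Convert every column to an ordered categorical with the same levels
freq_dtype = pd.CategoricalDtype(categories=FREQUENCY_LEVELS, ordered=True)
ordered = sample.astype(freq_dtype)

# Integer codes follow the level order: 0 = Never ... 3 = Very frequently
codes = ordered.apply(lambda col: col.cat.codes)
print(codes)
```

With an ordered dtype, comparisons such as `ordered["Frequency [Rock]"] >= "Sometimes"` and sorting respect the survey scale instead of alphabetical order.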

### Block 2 - Mental Health
Respondents rate their Anxiety, Depression, Insomnia, and OCD on a scale of 0 to 10, where:
* 0 - I do not experience this.
* 10 - I experience this regularly, constantly, or to an extreme.

| column | description |
|:------:|:------------|
| Anxiety | Self-reported anxiety, on a scale of 0-10 |
| Depression | Self-reported depression, on a scale of 0-10 |
| Insomnia | Self-reported insomnia, on a scale of 0-10 |
| OCD | Self-reported OCD, on a scale of 0-10 |
| Music effects | Does music improve or worsen the respondent's mental health conditions? |
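
Since the four scores are defined on a 0 to 10 scale, a quick range check can flag entries that fall outside it. This is only a sketch on a hypothetical `sample` frame (the name `SCORE_COLUMNS` and the deliberately invalid value are assumptions for illustration), not a check the original analysis performs.

```python
import pandas as pd

# The four self-reported score columns from Block 2
SCORE_COLUMNS = ["Anxiety", "Depression", "Insomnia", "OCD"]

# Hypothetical sample; the 11 is deliberately out of range
sample = pd.DataFrame({
    "Anxiety": [3, 7, 10],
    "Depression": [0, 2, 11],
    "Insomnia": [1, 4, 5],
    "OCD": [0, 0, 2],
})

# True where a score lies within the documented 0-10 scale (inclusive).
# Missing values compare as False, so they are reported as out of range too.
in_range = sample[SCORE_COLUMNS].apply(lambda col: col.between(0, 10))

# Number of out-of-range (or missing) entries per column
out_of_range_counts = (~in_range).sum()
print(out_of_range_counts)
```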

Additional data that does not fall in these blocks may provide useful background information. See column descriptors.

# 2. Exploring the Data

## General Characteristics (Block 0)

| question | analysis |
|----------|----------|
| How many entries are in this survey result? | There are 736 entries |
| What is the age range in this survey result? | Respondents are 10 to 89 years old |
| How many instrumentalists? | 235 respondents are instrumentalists |
| How many composers? | 126 respondents are composers |
| How many active genre explorers? | 525 respondents actively explore new artists/genres |
| How many foreign-language listeners? | 404 respondents listen to music in a language they are not fluent in |
| How many listen to music while working? | 579 respondents listen to music while working |

## General Characteristics (Block 1)

| question | analysis |
|----------|:--------:|
| How many genres are explored in this dataset? | There are 16 genres |
| Which genre has the most 'Very frequently' responses? | Rock |
| Which genre has the most 'Sometimes' responses? | Pop |
| Which genre has the most 'Rarely' responses? | Classical |
| Which genre has the most 'Never' responses? | Gospel |

#### Load the data and necessary libraries

```python
# Load the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress unnecessary warnings
import warnings
warnings.filterwarnings('ignore')

# Configure the seaborn color codes and default palette
sns.set(color_codes=True)
sns.set_palette(palette='magma', n_colors=8)

# Load the dataset and work on a copy so the raw data stays untouched
data = pd.read_csv('../../music_mental_health/data/mxmh_survey_results.csv')
df = data.copy()
```

#### Supporting Python Code (Block 0):

```python
# Find the number of survey entries
total_entries = len(df)

# Find the age range of this dataset
age_range = (df['Age'].min(), df['Age'].max())

# Display the results
print("Total number of entries:", total_entries)
print("Age Range:", age_range)

# Define a function to count how many entries say "Yes" in a single column
def column_count(df, column):
    return df[column].str.count('Yes').sum()

# Specify the columns in which to count "Yes" responses
column_1 = 'Instrumentalist'
column_2 = 'Composer'
column_3 = 'Exploratory'
column_4 = 'Foreign languages'
column_5 = 'While working'

# Show results of column counts
num_inst = column_count(df, column_1)
print(f"The number of {column_1}s:", num_inst)

num_comp = column_count(df, column_2)
print(f"The number of {column_2}s:", num_comp)

num_expl = column_count(df, column_3)
print(f"The number of {column_3} music listeners:", num_expl)

num_fore = column_count(df, column_4)
print(f"The number of {column_4} music listeners:", num_fore)

num_work = column_count(df, column_5)
print(f"The number of {column_5} music listeners:", num_work)
```

```
Total number of entries: 736
Age Range: (10.0, 89.0)
The number of Instrumentalists: 235.0
The number of Composers: 126.0
The number of Exploratory music listeners: 525
The number of Foreign languages music listeners: 404.0
The number of While working music listeners: 579.0
```
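
Most of the counts above print as floats (for example `235.0`) while the Exploratory count prints as an integer. A likely explanation is that `str.count` yields `NaN` for missing answers, which turns the column into floats before summing. The sketch below illustrates this on a hypothetical `sample` frame and shows an alternative, comparing with `eq("Yes")`, that returns integer counts and matches the whole answer rather than a substring. It is an illustration, not the method used to produce the results above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one column with a missing answer, one without
sample = pd.DataFrame({
    "Instrumentalist": ["Yes", "No", np.nan, "Yes"],
    "Exploratory": ["Yes", "Yes", "No", "Yes"],
})

# str.count gives NaN for the missing answer, so the sum comes back as a float
float_count = sample["Instrumentalist"].str.count("Yes").sum()

# With no missing answers, the per-row counts stay integers
int_count = sample["Exploratory"].str.count("Yes").sum()

# Exact comparison treats a missing answer as "not Yes" and sums booleans
exact_count = sample["Instrumentalist"].eq("Yes").sum()

print(float_count, int_count, exact_count)
```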


#### Supporting Python Code (Block 1):

```python
# List of columns to explore (all 16 genres in Block 1)
genre_columns = [
    "Frequency [Classical]", "Frequency [Country]", "Frequency [EDM]",
    "Frequency [Folk]", "Frequency [Gospel]", "Frequency [Hip hop]",
    "Frequency [Jazz]", "Frequency [K pop]", "Frequency [Latin]",
    "Frequency [Lofi]", "Frequency [Metal]", "Frequency [Pop]",
    "Frequency [R&B]", "Frequency [Rap]", "Frequency [Rock]",
    "Frequency [Video game music]"
]

# Values to count
values_to_count = ["Never", "Rarely", "Sometimes", "Very frequently"]

# Summarize the counts for each value across all columns
value_counts = {
    value: {col: df[col].str.count(value).sum() for col in genre_columns}
    for value in values_to_count
}

# Convert to a DataFrame (rows: genre columns, columns: response values)
summary_df = pd.DataFrame(value_counts)

# Find the genre column with the highest count for each value
max_counts = summary_df.idxmax()

print("Columns with the highest count for each value:")
print(max_counts)
```


```
Columns with the highest count for each value:
Never                 Frequency [Gospel]
Rarely             Frequency [Classical]
Sometimes                Frequency [Pop]
Very frequently         Frequency [Rock]
dtype: object
```
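
As an alternative to listing the genre columns by hand, they can be selected by their shared `Frequency [` prefix and tabulated with `value_counts`. The sketch below runs on a small hypothetical `sample` frame (its values are made up for illustration), so the winners it prints are not the survey results.

```python
import pandas as pd

# Hypothetical sample with one non-genre column and three genre columns
sample = pd.DataFrame({
    "Age": [18, 25, 31, 40, 52],
    "Frequency [Rock]": ["Very frequently", "Very frequently", "Sometimes",
                         "Rarely", "Very frequently"],
    "Frequency [Gospel]": ["Never", "Never", "Never", "Rarely", "Rarely"],
    "Frequency [Pop]": ["Sometimes", "Sometimes", "Sometimes",
                        "Very frequently", "Never"],
})

# Select the genre columns by prefix instead of typing each name
genre_columns = [col for col in sample.columns if col.startswith("Frequency [")]

# Count each response per genre; rows become genres, columns become responses
levels = ["Never", "Rarely", "Sometimes", "Very frequently"]
summary = (
    sample[genre_columns]
    .apply(lambda col: col.value_counts())
    .reindex(levels)   # fix the row order and add any response never chosen
    .fillna(0)         # a response nobody chose for a genre counts as 0
    .astype(int)
    .T
)

# Genre column with the highest count for each response
top_genres = summary.idxmax()
print(summary)
print(top_genres)
```

Selecting by prefix keeps the column list in sync with the data, which avoids accidentally leaving a genre out.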
