# Intermediate Level: Statistical Analysis of DEI in the Music Industry

Welcome to the intermediate workshop on DEI (diversity, equity, and inclusion) in the music industry! This notebook builds on the beginner level with more advanced analysis and visualization techniques.

## Learning Objectives:
- Understand the importance of DEI and how it affects the music industry
- Create advanced visualizations with multiple variables
- Analyze correlation patterns in the data
- Apply grouping and aggregation techniques


## Prerequisites:
You should have completed the beginner level or be familiar with basic pandas and matplotlib operations.

## 🚀 Getting Started
Let's start by importing the libraries we'll need and loading our data.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set up plotting style
plt.style.use('default')
sns.set_palette("Set2")
plt.rcParams['figure.figsize'] = (12, 8)

print("Libraries imported successfully!")

Libraries imported successfully!


## 📂 Load the Dataset

We have four datasets, each representing a different (anonymized) genre. We also have a dataset of creator and listener comments and a legend describing the columns.

In [93]:
# Explicitly list the 4 dataset files
# 💡 Store the file paths in a list called data_files.
# 📖 Docs: Working with file paths in Python: https://docs.python.org/3/library/os.path.html
data_files = [
    "../../data/1 Creators and their Listeners with gender and 6 locations.csv",
    "../../data/2 Creators and their Listeners with gender and 6 locations.csv",
    "../../data/3 Creators and their Listeners with gender and 6 locations.csv",
    "../../data/5 Creators and their Listeners with gender and 6 locations.csv"
]

# Load all datasets into a dictionary
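# 💡 Read the CSV file for each dataset above
# 📖 Docs: Reading CSV files with pandas: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
datasets = {}
for i, file in enumerate(data_files):
    # Labels follow list order, so the file whose name starts with "5" becomes "Genre 4";
    # keep this in mind when matching genres against the comments dataset
    name = f"Genre {i+1}"
    datasets[name] = pd.read_csv(file)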

# Load comments dataset
comments_data = pd.read_csv("../../data/creator and listener comments by gender and anonymised genre.csv")

# Load legend
legend = pd.read_csv("../../data/legend.csv")

## 🔍 Explore the Data
Get familiar with the genre and comments datasets before moving on to the next step!
- What does the data look like?
- What are the columns? What do they represent?
- What are the data types?
- Are there missing values?
- Are there any duplicates?

Is there anything you would like to transform in the data? (e.g. change data types, remove certain columns or rows, replace values, deal with null values)

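A small profiling helper can answer these questions for every table at once. The sketch below is illustrative and not part of the workshop materials. The column names in the toy table (`creator_gender`, `listener_gender`, `plays`) are hypothetical stand-ins, so check the legend for the real ones.

```python
import pandas as pd


def profile(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Print the table shape and duplicate count; return a per-column summary."""
    print(f"{name}: {df.shape[0]} rows x {df.shape[1]} columns, "
          f"{df.duplicated().sum()} duplicate rows")
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })


# Hypothetical toy table standing in for one genre file (column names are made up)
toy = pd.DataFrame({
    "creator_gender": ["male", "female", None, "male"],
    "listener_gender": ["female", "female", "male", "female"],
    "plays": [120, 45, 30, 120],  # last row duplicates the first
})
summary = profile(toy, "Toy genre")
print(summary)
```

To apply it to the loaded data, loop over the dictionary with `for name, df in datasets.items(): print(profile(df, name))`. Then call `profile(comments_data, "Comments")` for the comments table.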
In [34]:
# Explore what the genre datasets look like
## 🧐 Understand what the data represents
## 🔍 Look up column descriptions in the legend





In [35]:
# Explore what the comments dataset looks like
## 🧐 Understand what the data represents
## 🔍 Look up column descriptions in the legend






In [36]:
# 📊 Do any data transformations you consider necessary before moving on to the next step


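A common transformation for this kind of data is standardizing gender labels. That way, variants such as `"Male"` and `"male "` are counted together, and missing values stay visible instead of being dropped. The sketch below assumes a hypothetical `creator_gender` column. It also assumes the four categories this notebook mentions later (male, female, custom, and null, with null relabeled here as `unknown`). Adjust both to what the legend actually describes.

```python
import pandas as pd

GENDER_LEVELS = ["female", "male", "custom", "unknown"]  # assumed category set


def clean_gender(series: pd.Series) -> pd.Series:
    """Trim whitespace, lowercase, and turn missing values into an explicit 'unknown'."""
    cleaned = series.str.strip().str.lower().fillna("unknown")
    # Labels outside GENDER_LEVELS would become NaN here, so check value_counts() first
    return cleaned.astype(pd.CategoricalDtype(GENDER_LEVELS))


# Hypothetical messy labels (the real columns and values may differ)
raw = pd.DataFrame({"creator_gender": ["Male", " female", None, "CUSTOM", "male "]})
raw["creator_gender"] = clean_gender(raw["creator_gender"])
print(raw["creator_gender"].value_counts())
```

Keeping `unknown` as its own category, rather than dropping those rows, lets you report how much of the data lacks gender information. This is useful for the data limitations insight at the end of the notebook.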
Before moving on to the next step, we’d like to emphasize that this workshop is all about fostering creativity and having fun! Feel free to create additional visualizations whenever you feel they enhance your storytelling. Some of our favorite visualization types include:

- Bar plots
- Pie charts
- Heatmaps
- Scatter plots
- Box plots
- Sankey diagrams

That said, you’re not limited to these options. Experiment and explore other visualization methods that align with your data and the story you want to convey!

## 🎧 Step 1: Creator-Listener Gender Dynamics
*Let's investigate whether creator gender identity influences listener demographics and consumption patterns, establishing a baseline understanding of potential gender-based preferences, biases, or barriers in music discovery and consumption across genres and global markets.*

### a) 📊 Overall Creator-Listener Gender Influence
Let's examine whether creator gender creates systematic patterns in listener demographics, identifying potential gender-based consumption biases that could impact creator visibility and success.

- Do listeners show preferential consumption patterns based on creator gender?
- Are there gender groups facing systemic disadvantages in reaching diverse audiences?
- How strong is the overall creator-listener gender correlation and what does this mean for equity?

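Creator gender and listener gender are both categorical, so a Pearson correlation coefficient does not really apply. The sketch below assumes a hypothetical long-format table with `creator_gender`, `listener_gender`, and `plays` columns. It row-normalizes a creator-by-listener contingency table and then summarizes the strength of association with Cramér's V. Cramér's V is a standard association measure for categorical variables, used here in place of "correlation" and computed from `scipy.stats.chi2_contingency`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical long-format data: plays per (creator gender, listener gender) pair
toy_plays = pd.DataFrame({
    "creator_gender": ["female", "female", "male", "male"],
    "listener_gender": ["female", "male", "female", "male"],
    "plays": [600, 400, 300, 700],
})

# Rows: creator gender; columns: listener gender
contingency = toy_plays.pivot_table(index="creator_gender", columns="listener_gender",
                                    values="plays", aggfunc="sum", fill_value=0)
# Divide each row by its total so every row sums to 1
proportions = contingency.div(contingency.sum(axis=1), axis=0)
print(proportions.round(2))


def cramers_v(table: pd.DataFrame) -> float:
    """Cramér's V: strength of association between two categorical variables (0 to 1)."""
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


print(f"Cramér's V: {cramers_v(contingency):.3f}")
```

Each row of `proportions` answers the question "for creators of this gender, what share of plays comes from each listener gender?" V ranges from 0 (no association) to 1 (listener gender fully determined by creator gender). The toy data gives roughly 0.30.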
In [43]:
# Calculate total listener gender proportions by creator gender


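# Calculate the overall association between creator gender and listener gender
# (both are categorical, so consider a measure such as Cramér's V rather than Pearson correlation)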



In [None]:
# Optional: Analyze the statistical significance with a t-test




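A t-test compares the means of a numeric variable between two groups. It therefore needs one observation per unit (for example, per creator) rather than a single aggregated table. The sketch below assumes hypothetical per-creator values of `female_listener_share`, the fraction of a creator's plays that come from female listeners. It uses Welch's variant (`equal_var=False`), which does not assume the two groups have equal variances.

```python
import pandas as pd
from scipy import stats

# Hypothetical per-creator data: share of each creator's plays from female listeners
creators = pd.DataFrame({
    "creator_gender": ["female"] * 5 + ["male"] * 5,
    "female_listener_share": [0.62, 0.55, 0.70, 0.58, 0.66,
                              0.35, 0.41, 0.30, 0.44, 0.38],
})

female = creators.loc[creators["creator_gender"] == "female", "female_listener_share"]
male = creators.loc[creators["creator_gender"] == "male", "female_listener_share"]

# Welch's t-test: does not assume equal variances in the two groups
result = stats.ttest_ind(female, male, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

If your data only contains aggregated counts, a chi-squared test on the contingency table is the more natural choice.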
In [None]:
# Visualize the results (use the plots you find most appropriate)



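Each creator gender's listener shares sum to 1, so a 100% stacked bar chart makes differences in audience composition easy to compare. This sketch plots a hypothetical `proportions` table shaped like a row-normalized contingency table, with creator gender as rows and listener gender as columns. The values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical proportions: rows = creator gender, columns = listener gender (rows sum to 1)
proportions = pd.DataFrame(
    {"female": [0.60, 0.30], "male": [0.40, 0.70]},
    index=pd.Index(["female", "male"], name="creator_gender"),
)

ax = proportions.plot(kind="barh", stacked=True, figsize=(8, 3))
ax.set_xlabel("Share of plays")
ax.set_ylabel("Creator gender")
ax.set_xlim(0, 1)
# Place the legend outside the bars so it does not cover them
ax.legend(title="Listener gender", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()
```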
### b) 📚 Creator-Listener Gender Influence by Genre
We'll analyze how musical genres may amplify or mitigate gender-based listening patterns, identifying genres that promote cross-gender consumption versus those that may reinforce gender disparities.

- Which genres demonstrate the most inclusive cross-gender listening patterns?
- Are there genres where creator gender significantly limits audience diversity?
- Are there genres where certain creator genders struggle to gain audience attention?

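The genres are stored as separate DataFrames, so it can help to stack them into a single table with a `genre` column first. The sketch below does this with `pd.concat` on a dictionary, where the keys become an index level. It then counts distinct creators per genre and gender and computes listener-gender shares. The `toy_datasets` dictionary and its columns (`creator_id`, `creator_gender`, `listener_gender`, `plays`) are hypothetical. Substitute `datasets` and the real column names.

```python
import pandas as pd

# Hypothetical stand-ins for the per-genre DataFrames
toy_datasets = {
    "Genre 1": pd.DataFrame({"creator_id": [1, 1, 2, 2],
                             "creator_gender": ["female", "female", "male", "male"],
                             "listener_gender": ["female", "male", "female", "male"],
                             "plays": [80, 20, 50, 50]}),
    "Genre 2": pd.DataFrame({"creator_id": [3, 3, 4, 4],
                             "creator_gender": ["male", "male", "male", "male"],
                             "listener_gender": ["female", "male", "female", "male"],
                             "plays": [10, 90, 30, 70]}),
}

# Stack all genres into one table; the dict keys become a "genre" column
combined = (pd.concat(toy_datasets, names=["genre", "row"])
              .reset_index(level="genre")
              .reset_index(drop=True))

# Number of distinct creators of each gender per genre (0 where a gender is absent)
creator_counts = (combined.groupby(["genre", "creator_gender"])["creator_id"]
                  .nunique().unstack(fill_value=0))

# Share of plays from each listener gender, per genre and creator gender
listener_share = combined.pivot_table(index=["genre", "creator_gender"],
                                      columns="listener_gender", values="plays",
                                      aggfunc="sum", fill_value=0)
listener_share = listener_share.div(listener_share.sum(axis=1), axis=0)
print(creator_counts)
print(listener_share.round(2))
```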

In [44]:
# Calculate Creator gender distribution (how many male/female/custom/null creators per genre)


# Calculate Listener gender % by creator gender per genre



In [45]:
# Optional: Statistical analysis: does creator gender affect listener demographics within each genre?
# e.g. chi-squared test



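The sketch below runs one chi-squared test of independence per genre. It assumes all genres are stacked into one long-format table with hypothetical columns `genre`, `creator_gender`, `listener_gender`, and `plays`. Two caveats apply. With play counts in the thousands, almost any difference comes out "significant". Plays from the same listener are also not independent observations. Treat the p-values with caution and compare genres mainly on Cramér's V.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical long-format table covering two genres
combined = pd.DataFrame({
    "genre": ["Genre 1"] * 4 + ["Genre 2"] * 4,
    "creator_gender": ["female", "female", "male", "male"] * 2,
    "listener_gender": ["female", "male"] * 4,
    "plays": [600, 400, 300, 700, 500, 500, 500, 500],
})

rows = []
for genre, group in combined.groupby("genre"):
    table = group.pivot_table(index="creator_gender", columns="listener_gender",
                              values="plays", aggfunc="sum", fill_value=0)
    if min(table.shape) < 2:
        continue  # the test needs at least two categories on each axis
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    # Cramér's V as an effect size that does not grow with sample size
    v = np.sqrt(chi2 / (table.to_numpy().sum() * (min(table.shape) - 1)))
    rows.append({"genre": genre, "chi2": chi2, "p_value": p, "dof": dof, "cramers_v": v})

results = pd.DataFrame(rows).set_index("genre")
print(results.round(4))
```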
In [None]:
# Visualize the results (use the plots you find most appropriate)




### c) 🌍 Creator-Listener Gender Influence by Country
This analysis will reveal how cultural contexts and regional attitudes influence creator-listener gender dynamics, identifying markets that foster inclusive consumption versus those with potential cultural barriers.

- Do certain countries show more equitable cross-gender listening patterns?
- Are there cultural markets where creator gender creates significant audience limitations?
- How do regional cultural norms impact creator opportunities for diverse audience reach?
- Which global markets demonstrate the most inclusive creator-listener gender dynamics?

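The share calculation extends to countries by adding the country to the grouping. The sketch below assumes hypothetical `listener_country`, `creator_gender`, `listener_gender`, and `plays` columns. It also computes one possible summary gap per country: the female-listener share for female creators minus the same share for male creators. Swap `listener_country` for a creator-country column to answer the third prompt. Add `genre` to the index for the optional breakdown.

```python
import pandas as pd

# Hypothetical plays by listener country, creator gender, and listener gender
toy_plays = pd.DataFrame({
    "listener_country": ["A"] * 4 + ["B"] * 4,
    "creator_gender": ["female", "female", "male", "male"] * 2,
    "listener_gender": ["female", "male"] * 4,
    "plays": [70, 30, 20, 80, 50, 50, 45, 55],
})

# Share of each (country, creator gender) group's plays from each listener gender
share = toy_plays.pivot_table(index=["listener_country", "creator_gender"],
                              columns="listener_gender", values="plays",
                              aggfunc="sum", fill_value=0)
share = share.div(share.sum(axis=1), axis=0)

# Female-listener share laid out as countries x creator genders
female_share = share["female"].unstack("creator_gender")
# One simple gap measure: female creators minus male creators, per country
gap = female_share["female"] - female_share["male"]
print(share.round(2))
print(gap.round(2))
```

A gap near 0 means the listener gender mix barely depends on creator gender in that country. This is only one of many possible equity measures.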
In [None]:
# Calculate creator gender distribution (how many male/female/custom/null creators per country)
# Optional: you can include the genre too


# Calculate Listener gender % by creator gender per country of the listener (optional: and genre)


# Calculate Listener gender % by creator gender per country of the creator (optional: and genre)




In [None]:
# Visualization time!





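A heatmap is a compact way to show a country-by-genre grid of a single metric, for example the difference in female-listener share between female and male creators. The sketch below uses plain matplotlib (`imshow`) with a diverging colormap centered on zero. The `gap` values, country codes, and genre labels are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical gap in female-listener share (female minus male creators) by country and genre
gap = pd.DataFrame(
    [[0.50, 0.10], [0.05, -0.02], [0.20, 0.15]],
    index=pd.Index(["A", "B", "C"], name="listener_country"),
    columns=pd.Index(["Genre 1", "Genre 2"], name="genre"),
)

fig, ax = plt.subplots(figsize=(5, 4))
limit = np.abs(gap.to_numpy()).max()
# Symmetric color limits keep 0 at the center of the diverging colormap
im = ax.imshow(gap.to_numpy(), cmap="RdBu_r", vmin=-limit, vmax=limit)
ax.set_xticks(range(gap.shape[1]), labels=gap.columns)
ax.set_yticks(range(gap.shape[0]), labels=gap.index)
# Write each value inside its cell
for (i, j), value in np.ndenumerate(gap.to_numpy()):
    ax.text(j, i, f"{value:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax, label="Gap in female-listener share")
ax.set_xlabel(gap.columns.name)
ax.set_ylabel(gap.index.name)
plt.tight_layout()
plt.show()
```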
## 💬 Step 2 (Optional): Creator Engagement Equity Analysis
*Let's analyze gender representation and inclusive participation patterns in creator-driven community engagement, identifying barriers to equitable voice and participation opportunities across gender identities and musical genres.*



### a) 👥 Gender Equity in Creators' Engagement
Let's analyze whether there are equitable participation opportunities across creator genders and identify any barriers to engagement that may disproportionately affect certain gender groups.
- Do all creator genders have equal representation in platform engagement?
- Are there systemic barriers preventing equitable participation by gender?
- How do you think gender identity impacts creator engagement opportunities?

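Raw comment totals largely reflect how many creators of each gender exist, so it is worth normalizing by group size. The sketch below assumes a hypothetical aggregated comments table with `genre`, `creator_gender`, `n_creators`, `new_comments`, and `replies` columns. The real comments dataset may be structured differently.

```python
import pandas as pd

# Hypothetical aggregated comments: one row per genre and creator gender
creator_comments = pd.DataFrame({
    "genre": ["Genre 1", "Genre 1", "Genre 2", "Genre 2"],
    "creator_gender": ["female", "male", "female", "male"],
    "n_creators": [10, 40, 5, 20],
    "new_comments": [50, 120, 30, 60],
    "replies": [80, 100, 20, 40],
})

# Sum across genres for each creator gender
totals = (creator_comments
          .groupby("creator_gender")[["n_creators", "new_comments", "replies"]]
          .sum())
# Per-creator rates remove the effect of group size
totals["new_comments_per_creator"] = totals["new_comments"] / totals["n_creators"]
totals["replies_per_creator"] = totals["replies"] / totals["n_creators"]
print(totals.round(2))
```

In the toy data, male creators post more new comments in total, but female creators post more per creator.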
In [None]:
# Find out which creator genders are the most active commenters (posting new comments) across genres



# Find out which creator genders are the most active responders (replying to comments) across genres



# Optional: do a statistical significance test




### b) 🎧 Creator's vs Listener's Engagement Analysis
Let's build on 2a and contrast its findings with listener engagement. We'll examine whether engagement dynamics create inclusive environments for all genders and identify potential biases in how different creator genders receive listener engagement.

- Are listeners engaging equitably with creators across all gender identities?
- Do certain creator genders receive disproportionate engagement responses?
- How does genre influence inclusive engagement patterns between creators and listeners?

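One way to judge whether listener comments are spread equitably is to compare each creator gender's share of listener comments with its share of creators. The difference can then be tested with a chi-squared goodness-of-fit test (`scipy.stats.chisquare`). The table and column names below are hypothetical. A `representation_ratio` above 1 means the group receives more than its proportional share.

```python
import pandas as pd
from scipy.stats import chisquare

# Hypothetical totals per creator gender
by_gender = pd.DataFrame(
    {"n_creators": [15, 60], "listener_comments": [300, 700]},
    index=pd.Index(["female", "male"], name="creator_gender"),
)

by_gender["creator_share"] = by_gender["n_creators"] / by_gender["n_creators"].sum()
by_gender["comment_share"] = by_gender["listener_comments"] / by_gender["listener_comments"].sum()
# Ratio > 1: the group receives more than its proportional share of listener comments
by_gender["representation_ratio"] = by_gender["comment_share"] / by_gender["creator_share"]

# Goodness of fit: are comments distributed in proportion to creator counts?
expected = by_gender["creator_share"] * by_gender["listener_comments"].sum()
stat, p = chisquare(by_gender["listener_comments"], f_exp=expected)
print(by_gender.round(3))
print(f"chi2 = {stat:.1f}, p = {p:.2g}")
```

Using creator counts as the baseline is an assumption. Plays are another reasonable baseline, which Step 3 explores.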
In [46]:
# Which creator genders receive the most listener comments?
# Does this finding differ from the findings in 2a?





### c) 🎵 Diversity & Inclusion Across Musical Genres
Let's add another layer to the analysis in 2b by breaking engagement down by genre. This analysis will identify genres that foster inclusive creator participation and reveal where gender-based engagement disparities may limit diverse representation in music communities.

- Which genres demonstrate the most inclusive engagement environments for all creator genders?
- Are there genres where certain gender groups face engagement disadvantages?
- How can we identify and address genre-specific barriers to equitable creator participation?

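One hypothetical way to score genre inclusivity has two steps. First, compute the representation ratio (comment share divided by creator share) for each creator gender within each genre. Then take the minimum per genre. A score of 1.0 means no creator gender receives less than its proportional share of listener comments. This metric is an illustration rather than a standard measure, and the table and columns are made up.

```python
import pandas as pd

# Hypothetical per-genre totals by creator gender
toy = pd.DataFrame({
    "genre": ["Genre 1", "Genre 1", "Genre 2", "Genre 2"],
    "creator_gender": ["female", "male", "female", "male"],
    "n_creators": [10, 40, 5, 20],
    "listener_comments": [200, 800, 40, 460],
})

# Shares are computed within each genre, so each genre's shares sum to 1
toy["creator_share"] = toy["n_creators"] / toy.groupby("genre")["n_creators"].transform("sum")
toy["comment_share"] = (toy["listener_comments"]
                        / toy.groupby("genre")["listener_comments"].transform("sum"))
toy["representation_ratio"] = toy["comment_share"] / toy["creator_share"]

# Hypothetical inclusivity score: the lowest ratio in each genre (1.0 = no group under-served)
inclusivity = toy.groupby("genre")["representation_ratio"].min().sort_values(ascending=False)
print(toy.round(3))
print(inclusivity.round(3))
```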
In [None]:
# Which genres are the most inclusive for all creator genders?
# For example, measure it by the number of listener comments



# Which genres are less inclusive?




In [None]:
# Visualize findings from Step 2





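For Step 2, a grouped bar chart with a reference line at 1.0 shows which creator genders receive more or less than their proportional share of comments in each genre. The ratios below are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical representation ratios (comment share / creator share) by genre and creator gender
ratios = pd.DataFrame(
    {"female": [1.0, 0.4], "male": [1.0, 1.15]},
    index=pd.Index(["Genre 1", "Genre 2"], name="genre"),
)

ax = ratios.plot(kind="bar", rot=0, figsize=(7, 4))
# Reference line: 1.0 means a group receives exactly its proportional share of comments
ax.axhline(1.0, color="grey", linestyle="--", linewidth=1)
ax.set_ylabel("Comment share / creator share")
ax.legend(title="Creator gender")
plt.tight_layout()
plt.show()
```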
## Step 3 (Optional): 🌐 Integrated Creator Equity Analysis

*Let's examine how consumption patterns (plays) translate into community participation (comments) and identify systemic barriers that may prevent equal opportunities for creators across genres, countries, and gender identities.*

### a) 🎧 Consumption vs Participation Equity Gap Analysis
Let's examine whether high consumption (plays) translates to equal participation opportunities (comments) across creator genders, identifying potential "listen but don't engage" disparities.

- Are there gender groups who consume content but face barriers to active participation?
- How does the consumption-to-engagement conversion rate vary by creator gender?
- Which creator genders have the highest "engagement efficiency" (comments per play)?

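Comments per play and plays per comment are reciprocals, so both are easy to compute once plays and comments are aligned by creator gender. The sketch below assumes hypothetical per-gender totals. `pd.concat(..., axis=1)` aligns the two Series on their index, so a gender missing from one table shows up as `NaN` rather than disappearing.

```python
import pandas as pd

# Hypothetical totals per creator gender from the plays and comments tables
plays = pd.Series({"female": 40_000, "male": 150_000, "custom": 2_000}, name="plays")
comments = pd.Series({"female": 400, "male": 1_000, "custom": 10}, name="comments")

# Align both Series on creator gender (index); the Series names become column names
efficiency = pd.concat([plays, comments], axis=1)
efficiency["comments_per_1k_plays"] = efficiency["comments"] / efficiency["plays"] * 1_000
efficiency["plays_per_comment"] = efficiency["plays"] / efficiency["comments"]
efficiency = efficiency.sort_values("comments_per_1k_plays", ascending=False)
print(efficiency.round(2))
```

Expressing the rate per 1,000 plays keeps the numbers readable. In the toy data, female creators need 100 plays per comment and male creators need 150.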
In [None]:
# Calculate comments per play for each creator gender


# How many plays does a creator need, on average, to receive one comment? (per gender)




### b) 🎵 Genre Ecosystem Equity: From Streams to Voice
This comprehensive analysis will identify music genres that provide both audience reach AND authentic community engagement opportunities for underrepresented creator genders.

- Which genres offer both high listening AND inclusive engagement for all creator genders?
- Are there genres where certain creator genders get plays but lack community voice/engagement?
- How do genre popularity patterns (plays) compare to genre participation patterns (comments)?

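Splitting the rate by genre mainly means merging on two keys. The sketch below assumes hypothetical per-genre totals and uses an outer merge. It fills missing comment counts with 0, on the assumption that an absent row means no comments, so verify this against the real data. The resulting zero rates are the "plays but no voice" pattern the questions above ask about.

```python
import pandas as pd

# Hypothetical totals by genre and creator gender
toy_plays = pd.DataFrame({
    "genre": ["Genre 1", "Genre 1", "Genre 2", "Genre 2"],
    "creator_gender": ["female", "male", "female", "male"],
    "plays": [10_000, 30_000, 5_000, 50_000],
})
toy_comments = pd.DataFrame({
    "genre": ["Genre 1", "Genre 1", "Genre 2"],
    "creator_gender": ["female", "male", "male"],
    "comments": [100, 150, 400],
})

# Outer merge keeps groups that have plays but no comment rows
merged = toy_plays.merge(toy_comments, on=["genre", "creator_gender"], how="outer")
merged["comments"] = merged["comments"].fillna(0)  # assumption: no row means zero comments
merged["comments_per_1k_plays"] = merged["comments"] / merged["plays"] * 1_000

# Genres as rows, creator genders as columns
rates = merged.pivot(index="genre", columns="creator_gender", values="comments_per_1k_plays")
print(rates.round(2))
```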
In [None]:
# Split 3a results into the different genres



### c) 🌍 Global Inclusion: Cross-Cultural Engagement Patterns
We'll analyze how cultural contexts (countries) influence both listening preferences AND engagement behaviors to identify regions fostering inclusive creator-listener relationships.

- Do countries with diverse listening patterns also show inclusive engagement behaviors?
- How do listener gender demographics by country correlate with creator engagement patterns?
- Are there cultural barriers that affect certain creator genders' engagement opportunities globally?

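Geographical diversity can be summarized per genre as the normalized Shannon entropy of plays across listener countries. The score is 1.0 when plays are spread evenly and 0.0 when they all come from one country. It can then be compared with listener comments using a Spearman rank correlation. The country columns and counts below are hypothetical. With only four genres, treat the result as descriptive, since a coefficient computed from four points says little about significance.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy, spearmanr

# Hypothetical plays per listener country (columns) for each genre (rows)
country_plays = pd.DataFrame(
    {"A": [100, 400, 250, 900], "B": [100, 300, 250, 50],
     "C": [100, 200, 250, 30], "D": [100, 100, 250, 20]},
    index=["Genre 1", "Genre 2", "Genre 3", "Genre 4"],
)
listener_comments = pd.Series([90, 60, 80, 10], index=country_plays.index)

# scipy's entropy() normalizes counts to probabilities; dividing by log(n) scales to [0, 1]
n_countries = country_plays.shape[1]
diversity = country_plays.apply(lambda row: entropy(row) / np.log(n_countries), axis=1)

rho, p = spearmanr(diversity, listener_comments)
print(diversity.round(3))
print(f"Spearman rho = {rho:.2f} (n = {len(diversity)} genres; p = {p:.2f})")
```

To include gender, compute the entropy over country × listener-gender cells instead of countries alone, and normalize by the log of the number of cells.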
In [None]:
# Compare each genre's geographical diversity among listeners with the number of listener comments
# How does geographical diversity among the listeners correlate with the number of comments by listeners?


# Include gender in that contrast
# How do geographical and gender diversity within a genre correlate with the number of listener comments?




## 💡 Key Insights

Based on your analysis above, write down 3-5 key insights you've discovered about DEI in the music industry:

### Your Insights:

1. **Gender Representation**: [Write your observation about gender distribution among creators and listeners]

2. **Listener Diversity by Country**: [Write your observation about how listener demographics differ across countries]

3. **Genre-Specific Trends**: [Write your observation about differences across genres]

4. **Data Limitations & Biases**: [Write your observation about missing or anonymized data and its implications]

5. **Additional Insight**: [Any other pattern you noticed]

## Next Steps

Excellent work! You've completed intermediate-level statistical analysis including:

- Extracting insights from data related to creators and listeners in the music industry.
- Visualizing relationships within the data to uncover underlying patterns.
- Performing statistical analyses to evaluate the significance of relationships in the data.

### Ready for the advanced level?
Move on to the **Advanced Level** notebook for more advanced statistical analysis and machine learning techniques.