# Beginner Level: Exploring DEI in the Music Industry

Welcome to the beginner workshop on Diversity, Equity, and Inclusion (DEI) in the music industry! This notebook will guide you through basic data exploration and visualization techniques.

## Learning Objectives:
- Load and explore a dataset about music artists
- Understand basic demographics in the music industry
- Create simple visualizations to identify patterns
- Calculate basic statistics about representation

## Getting Started
Let's start by importing the libraries we'll need and loading our data.

In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully!")

## Load the Dataset

Our dataset contains information about various music artists including their demographics, streaming numbers, and career information.

In [None]:
# Load the dataset
df = pd.read_csv('../data/music_industry_dei_data.csv')

# Display basic information about the dataset
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\nFirst few rows:")
df.head()
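
Before going further, it can help to confirm that the columns used later in this notebook are actually present. The sketch below is one possible check, not part of the original workshop code: `missing_columns` is a hypothetical helper, and the toy frame stands in for `df` with made-up values.

```python
import pandas as pd

# Columns the later cells rely on (taken from the code in this notebook).
REQUIRED_COLUMNS = ["gender", "ethnicity", "monthly_listeners", "label_type"]


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]


# Hypothetical toy frame standing in for df; the values are made up.
toy = pd.DataFrame({
    "gender": ["Female", "Male"],
    "ethnicity": ["A", "B"],
    "monthly_listeners": [1000, 2000],
})

# The toy frame has no label_type column, so that name is reported.
print(missing_columns(toy))
```

With the real data you could call `missing_columns(df)`; an empty list means every later cell should find the column it needs.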

## Basic Data Exploration

Let's explore the basic structure of our data and understand what information we have.

In [None]:
# Get basic information about the dataset
print("Dataset Info:")
df.info()  # info() prints its report directly and returns None, so no print() is needed
print("\nBasic Statistics:")
print(df.describe())
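
Exploring structure usually also means checking for missing values, since they affect every count and average computed later. Here is a minimal sketch on made-up data; the column values are assumptions for illustration, not taken from the workshop dataset.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing entry.
toy = pd.DataFrame({
    "gender": ["Female", "Male", None, "Non-binary"],
    "monthly_listeners": [1200.0, None, 800.0, 500.0],
})

# Count missing values per column, then express them as a share of all rows.
missing_count = toy.isna().sum()
missing_pct = toy.isna().mean() * 100

print(pd.DataFrame({"missing": missing_count, "percent": missing_pct}))
```

Replacing `toy` with `df` gives the same report for the real dataset.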

## Gender Representation

Let's start by looking at gender representation in our dataset.

In [None]:
# Count artists by gender
gender_counts = df['gender'].value_counts()
print("Gender Distribution:")
print(gender_counts)
print("\nPercentages:")
# value_counts() skips missing values, so these may sum to less than 100
print(gender_counts / len(df) * 100)
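
One caveat with the percentages above: `value_counts()` drops missing values by default while `len(df)` counts every row, so the shares can sum to less than 100 if some artists have no recorded gender. A minimal sketch of two alternatives, using a made-up Series (the category labels are assumptions, not values confirmed from the dataset):

```python
import pandas as pd

# Hypothetical toy column with one missing entry.
toy_gender = pd.Series(["Female", "Male", "Male", None], name="gender")

# normalize=True returns proportions; dropna=False keeps missing values as their own row.
share_with_missing = toy_gender.value_counts(normalize=True, dropna=False) * 100

# Default behaviour: missing values are excluded, so shares are among known values only.
share_known_only = toy_gender.value_counts(normalize=True) * 100

print(share_with_missing.round(1))
print(share_known_only.round(1))
```

Which version to report depends on the question: the first describes the whole dataset, the second describes only artists whose gender is recorded.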

In [None]:
# Create a pie chart for gender distribution
plt.figure(figsize=(8, 6))
plt.pie(gender_counts.values, labels=gender_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Gender Distribution in Music Industry Dataset')
plt.axis('equal')
plt.show()

## Ethnic Diversity

Now let's examine ethnic diversity in our dataset.

In [None]:
# Count artists by ethnicity
ethnicity_counts = df['ethnicity'].value_counts()
print("Ethnicity Distribution:")
print(ethnicity_counts)
print("\nPercentages:")
# value_counts() skips missing values, so these may sum to less than 100
print(ethnicity_counts / len(df) * 100)

In [None]:
# Create a bar chart for ethnicity distribution
plt.figure(figsize=(10, 6))
ethnicity_counts.plot(kind='bar')
plt.title('Ethnic Diversity in Music Industry Dataset')
plt.xlabel('Ethnicity')
plt.ylabel('Number of Artists')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## Success Metrics by Demographics

Let's look at how one success metric, average monthly listeners, varies across demographic groups.

In [None]:
# Average monthly listeners by gender
avg_listeners_by_gender = df.groupby('gender')['monthly_listeners'].mean()
print("Average Monthly Listeners by Gender:")
print(avg_listeners_by_gender)

# Create a bar chart
plt.figure(figsize=(8, 6))
avg_listeners_by_gender.plot(kind='bar')
plt.title('Average Monthly Listeners by Gender')
plt.xlabel('Gender')
plt.ylabel('Average Monthly Listeners')
plt.xticks(rotation=0)
plt.ticklabel_format(style='scientific', axis='y', scilimits=(0,0))
plt.tight_layout()
plt.show()

In [None]:
# Average monthly listeners by ethnicity
avg_listeners_by_ethnicity = df.groupby('ethnicity')['monthly_listeners'].mean().sort_values(ascending=False)
print("Average Monthly Listeners by Ethnicity:")
print(avg_listeners_by_ethnicity)

# Create a bar chart
plt.figure(figsize=(12, 6))
avg_listeners_by_ethnicity.plot(kind='bar')
plt.title('Average Monthly Listeners by Ethnicity')
plt.xlabel('Ethnicity')
plt.ylabel('Average Monthly Listeners')
plt.xticks(rotation=45)
plt.ticklabel_format(style='scientific', axis='y', scilimits=(0,0))
plt.tight_layout()
plt.show()
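
Averages can be pulled upward by a handful of very popular artists, so it is worth reading each mean next to the median and the group size. The sketch below uses made-up numbers to show the idea; the group labels and listener counts are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: one Female artist has far more listeners than the rest.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "monthly_listeners": [100, 200, 9000, 300, 400, 500],
})

# Compute group size, mean, and median in one pass.
summary = (
    toy.groupby("gender")["monthly_listeners"]
    .agg(["count", "mean", "median"])
)

print(summary)
```

In this toy example the Female mean is far above the Female median because of a single outlier, while the Male mean and median agree. The same `agg` call works with `'ethnicity'` as the grouping column.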

## Label Type Analysis

Let's examine how artists are distributed between major and independent labels, and how that distribution relates to gender.

In [None]:
# Distribution of label types
label_counts = df['label_type'].value_counts()
print("Label Type Distribution:")
print(label_counts)
print("\nPercentages:")
# value_counts() skips missing values, so these may sum to less than 100
print(label_counts / len(df) * 100)

In [None]:
# Cross-tabulation: Gender vs Label Type
gender_label_crosstab = pd.crosstab(df['gender'], df['label_type'], margins=True)
print("Gender vs Label Type Cross-tabulation:")
print(gender_label_crosstab)

# Percentage breakdown
print("\nPercentages (by row):")
print(pd.crosstab(df['gender'], df['label_type'], normalize='index') * 100)
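
The row percentages above answer "of the artists of each gender, what share is on each label type?". The reverse question, "of the artists on each label type, what share is each gender?", uses `normalize='columns'`. A minimal sketch on made-up data; the label values `"Major"` and `"Independent"` are assumptions about how the dataset encodes `label_type`.

```python
import pandas as pd

# Hypothetical toy data for illustration.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male", "Male", "Male"],
    "label_type": ["Independent", "Major", "Major", "Major", "Major", "Independent"],
})

# normalize='columns' makes each label_type column sum to 100.
by_label = pd.crosstab(toy["gender"], toy["label_type"], normalize="columns") * 100

print(by_label.round(1))
```

The two views can tell different stories, so it helps to state which one a percentage refers to when writing up an insight.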

## Key Insights

Based on your analysis above, write down 3 to 5 key insights you've discovered about representation in this music industry dataset:

### Your Insights:

1. **Gender Representation**: [Write your observation about gender distribution]

2. **Ethnic Diversity**: [Write your observation about ethnic representation]

3. **Success Patterns**: [Write your observation about success metrics across demographics]

4. **Label Distribution**: [Write your observation about major vs independent label representation]

5. **Additional Insight**: [Any other pattern you noticed]

## Next Steps

Congratulations! You've completed the beginner-level analysis. You've learned how to:

- Load and explore a dataset
- Calculate basic statistics
- Create visualizations to understand data patterns
- Analyze representation across different demographics

### Ready for more?
Move on to the **Intermediate Level** notebook to dive deeper into statistical analysis and more advanced visualizations!