
![Spotify Image](https://www.dropbox.com/scl/fi/at45glkz3f6wp4ez6cbm4/Spotify.png?rlkey=2jjig1dv331tsoex8h0wa3jnc&dl=1)



# 🎵 **Assignment: Spotify Data Analysis and Visualization**

This assignment aims to help students enhance their skills in using Python for **data visualization** and **storytelling**. By analyzing Spotify data, students will gain hands-on experience in creating meaningful insights using visual tools.

---
## **Steps to Complete the Assignment**
1️⃣ **Install required libraries**: Ensure all necessary tools like Plotly and Seaborn are set up.  
2️⃣ **Explore the Spotify dataset**: Load, inspect, and clean the data.  
3️⃣ **Create visualizations**: Use different libraries to analyze and visualize key metrics (e.g., song popularity, genres).  
4️⃣ **Present findings**: Create a narrative around the visualizations to explain insights effectively.

---

Are you ready to get started? 🎉 Let’s dive into the first step!


## Setting Up Visualization Libraries

In [None]:
# Install missing libraries (if not already installed)
!pip install plotly seaborn --quiet

# Importing required libraries
import matplotlib.pyplot as plt  # Core plotting library
import seaborn as sns  # Statistical visualizations
import plotly.express as px  # Interactive plots

# Ensure plots are displayed properly in Colab
%matplotlib inline

print("✅ Libraries successfully installed and imported!")

✅ Libraries successfully installed and imported!


# Exploring Global Music Trends with Spotify Data
## About the Dataset
This dataset provides daily updates on the top 50 songs across 73 countries, offering insights into global music preferences. It includes details such as song title, artist, position, streams, date, and region, enabling analysis of trends and patterns in music consumption worldwide.

📂 **Dataset Source:** [Kaggle - Top Spotify Songs in 73 Countries](https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated)

## 🚀 Using Kaggle for Data Analysis
Kaggle provides a collaborative environment for working with datasets, running notebooks, and building machine learning models. You can explore the dataset, perform analyses, and share your findings with the community.

Now, let's import the dataset and start exploring!

Complete all the tasks described in each cell.


In [None]:
# Import necessary libraries
import pandas as pd

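# Dataset page on Kaggle. This is a web page, not a direct CSV link,
# so download the CSV from it before loading the file in Task 1.
dataset_url = "https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated"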


# 🗂️ **Task 1️⃣: Loading the Dataset**

### **What You’ll Do**
1. **Read the CSV file**: Load the Spotify dataset into a Pandas DataFrame for analysis.  
2. **Inspect the data**: Use basic functions to understand the structure and key attributes of the dataset.  
3. **Preview the data**: Display the first few rows to ensure it’s loaded correctly.

---

### **Steps to Complete**
a. Load the CSV file into a Pandas DataFrame using the `read_csv()` method.  
b. Display basic information about the dataset (e.g., column names, data types, non-null counts).  
c. Preview the first few rows of the dataset using `.head()`.
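
As a warm-up, here is a minimal sketch of the three steps above on a tiny in-memory CSV. The column names (`name`, `artists`, `daily_rank`, `country`, `snapshot_date`, `popularity`) and the rows are assumptions made up for illustration; the real file may differ, so check its header after downloading.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded Kaggle CSV (made-up rows and columns).
sample_csv = io.StringIO(
    "name,artists,daily_rank,country,snapshot_date,popularity\n"
    "Song A,Artist X,1,US,2024-01-01,95\n"
    "Song B,Artist Y,2,US,2024-01-01,90\n"
    "Song C,Artist Z,1,DE,2024-01-01,88\n"
)

# a. Load the CSV into a DataFrame.
#    For the real data, pass the path of the downloaded file instead of the buffer.
df = pd.read_csv(sample_csv)

# b. Column names, data types, and non-null counts.
df.info()

# c. First rows as a quick sanity check.
print(df.head())
```

On the real dataset, only the argument to `pd.read_csv()` should need to change.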


In [None]:
# ==== Task 1: Loading the Dataset ====
# 🚀 Goal: Load and inspect the Spotify dataset for analysis.

# 1️⃣ Load the CSV File
# Use pd.read_csv() to load the dataset into a Pandas DataFrame.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 2️⃣ Display Basic Information
# Use .info() to check column names, data types, and non-null counts.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 3️⃣ Preview the First Few Rows
# Use .head() to display a sample of the dataset and verify it loaded correctly.
# -------- Your code starts here --------


# -------- Your code ends here --------


# 🧹 **Task 2️⃣: Handle Missing Values in the Spotify Dataset**

### **Objective**
Calculate the percentage of missing values for each column in the Spotify dataset to identify areas that need data cleaning.

---

### **What You’ll Do**
1. **Calculate Missing Values**:
   - Use the `isnull()` method combined with `sum()` to count missing values in each column.
   - Calculate the percentage of missing values by dividing the count of missing values by the total number of rows and multiplying by 100.

2. **Display the Results**:
   - Print out the percentage of missing data for each column in a clear and concise format.
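
The calculation described above can be sketched on a small made-up DataFrame. The frame `df_demo` and its columns are hypothetical; the same three lines should apply to the DataFrame you loaded in Task 1.

```python
import pandas as pd

# Hypothetical demo data with a few missing entries.
df_demo = pd.DataFrame({
    "name": ["Song A", "Song B", None, "Song D"],
    "country": ["US", None, None, "DE"],
    "popularity": [95, 90, 88, 70],
})

# Count missing values per column.
missing_count = df_demo.isnull().sum()

# Convert counts to percentages of the total number of rows.
missing_pct = missing_count / len(df_demo) * 100

# Show the columns with the most missing data first.
print(missing_pct.round(2).sort_values(ascending=False))
```

Sorting the result makes it easier to spot which columns need cleaning first.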
   

In [None]:
# ==== Task 2: Handling Missing Values ====
# 🚀 Goal: Identify and analyze missing values in the dataset.

# 1️⃣ Count Missing Values
# Use isnull().sum() to count missing values in each column.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 2️⃣ Calculate Percentage of Missing Values
# Compute the percentage of missing values for each column.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 3️⃣ Display Missing Data Summary
# Print the percentage of missing values in a clear format for analysis.
# -------- Your code starts here --------


# -------- Your code ends here --------


# 📊 **Task 3️⃣: Correlation Analysis & Heatmap**

### **What You’ll Do**
1. **Compute the correlation matrix**: Analyze relationships between numerical variables in the dataset.  
2. **Visualize correlations with a heatmap**: Use Seaborn to create an intuitive heatmap representation.  
3. **Identify key correlations**: Find the strongest relationships overall, then the most positive and most negative pairs.  
4. **Infer insights**: Interpret the results and how they impact music trends.

---

### **Steps to Complete**
a. **Select numerical columns** from the dataset for correlation analysis.  
b. **Compute the correlation matrix** using `.corr()` method in Pandas.  
c. **Generate a heatmap** using `seaborn.heatmap()` to visualize the relationships.  
d. **Identify the strongest correlations** and analyze positive & negative correlations.  
e. **Interpret key findings** and document observations based on trends.
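
One possible way to carry out steps a to d is sketched below on synthetic data. The columns `energy`, `loudness`, and `acousticness` and their values are invented for illustration, and masking the upper triangle is just one of several ways to list each pair once.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic demo data (values are made up for illustration).
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "energy": [0.2, 0.4, 0.6, 0.8, 0.9],
    "loudness": [-12.0, -9.0, -7.0, -5.0, -4.0],
    "acousticness": [0.9, 0.7, 0.4, 0.2, 0.1],
})

# a. Keep only numerical columns (the text column "name" is dropped).
numeric = demo.select_dtypes(include="number")

# b. Pearson correlation matrix.
corr = numeric.corr()

# c. Heatmap with the color scale fixed to the full [-1, 1] range.
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation matrix (synthetic data)")

# d. List each pair once by keeping the triangle above the diagonal.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values()
print("Most negative pair:", pairs.index[0], round(pairs.iloc[0], 2))
print("Most positive pair:", pairs.index[-1], round(pairs.iloc[-1], 2))
```

In a notebook with `%matplotlib inline`, the heatmap appears under the cell; in a plain script you would call `plt.show()` from `matplotlib.pyplot`.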

---


In [None]:
# ==== Task 3: Correlation Analysis & Heatmap ====
# 🚀 Goal: Analyze relationships between numerical variables using a heatmap.

# 1️⃣ Select Numerical Columns
# Extract only the numerical columns from the dataset for correlation analysis.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 2️⃣ Compute the Correlation Matrix
# Use .corr() to calculate correlations between numerical features.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 3️⃣ Generate Heatmap
# Use seaborn.heatmap() to visualize correlations between variables.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 4️⃣ Identify Key Correlations
# Analyze the strongest, most positive, and most negative correlations.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 5️⃣ Interpret Insights
# Document key findings based on trends observed in the heatmap.
# -------- Your code starts here --------


# -------- Your code ends here --------


# 🌍 **Task 4️⃣: Geospatial Visualization - Choropleth Map of Spotify Streams**

### **Objective**  
Create a **choropleth map** to visualize the distribution of Spotify streams across different countries, helping identify regional streaming trends.

---

### **What You’ll Do**  
1. **Understand Geospatial Data & Visualization**:  
   - Geospatial data represents information tied to locations on Earth.  
   - Choropleth maps use color shading to represent numerical values across geographic regions.  
   - 📺 Watch this [video](https://www.youtube.com/watch?v=ZS7v_8HhmG8) to learn more.

2. **Prepare the Data**:  
   - Extract **country-wise** streaming data from the dataset.  
   - Ensure country identifiers are in a format Plotly can map, such as ISO 3166-1 alpha-3 codes or full country names.

3. **Generate the Choropleth Map**:  
   - Use `plotly.express.choropleth()` to create an interactive visualization.  
   - Customize the map with labels, colors, and tooltips for clarity.

4. **Analyze Insights**:  
   - Identify countries with **highest and lowest** Spotify streams.  
   - Observe **regional streaming trends** and discuss possible reasons behind them.


In [None]:
# ==== Task 4: Geospatial Visualization - Choropleth Map ====
# 🚀 Goal: Create a choropleth map to visualize the distribution of Spotify streams across countries.

# 1️⃣ Prepare the Data
# Extract country-wise streaming data and ensure country names are standardized.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 2️⃣ Create the Choropleth Map
# Use plotly.express.choropleth() to visualize the data on a world map.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 3️⃣ Customize and Display the Map
# Add labels, colors, and tooltips for better readability.
# -------- Your code starts here --------


# -------- Your code ends here --------

# 4️⃣ Analyze Insights
# Identify top and bottom streaming countries and document key observations.
# -------- Your code starts here --------


# -------- Your code ends here --------
