This repository contains an Exploratory Data Analysis (EDA) project on Spotify data, performed using Python (Jupyter Notebook).
- Project Description
- Installation
- Usage
- Data Source
- Technologies Used
- Exploratory Questions
- Visualization
- Insights
- License
Spotify is a proprietary Swedish audio streaming and media services provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 515 million monthly active users, including 210 million paying subscribers, as of March 2023.
This project performs Exploratory Data Analysis (EDA) on Spotify data using Python (Jupyter Notebook). The analysis includes data exploration, data cleaning, data visualization, and deriving insights from the data. The scripts and notebooks in this repository cover the different techniques and visualizations used in the EDA.
The purpose of this EDA is to understand the data, discover patterns, identify anomalies, and derive meaningful insights that can support informed decisions or guide further analysis.
To run the code in this repository, you need to have Python and the required libraries installed.
The following libraries are used in this project:
- pandas: https://pandas.pydata.org/
- NumPy: https://numpy.org/install/
- Matplotlib: https://matplotlib.org/
- Seaborn: https://seaborn.pydata.org/
If you don't have Python installed, you can download it from the official Python website (https://www.python.org/downloads/), or you can install Anaconda, which includes Jupyter Notebook (https://www.anaconda.com/).
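As a quick check before running the notebooks, the sketch below reports which of the libraries listed above are missing from the current environment. The helper name `missing_libraries` is made up for this illustration and is not part of the repository.

```python
from importlib.util import find_spec

# Import names of the libraries listed above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def missing_libraries(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any library reported as missing can usually be installed with pip or conda, depending on how Python was set up.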
- Clone the repository to your local machine: `git clone https://github.com/Rupanavale/EDA-spotify-using-Python.git`
- Navigate to the specific dataset or analysis of interest.
- Open the Python script or Jupyter notebook in your preferred environment (e.g., Jupyter Notebook, JupyterLab, or any Python IDE).
- Execute the code cells or run the script to perform the EDA.
- Explore the visualizations, summary statistics, and insights obtained from the analysis.
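The first cells of an EDA notebook typically load the data and take a first look at it. The following is a minimal sketch of that step, not the notebook's actual code: the helper `first_look`, the sample rows, and the placeholder file path are all assumptions made for illustration.

```python
import pandas as pd


def first_look(df: pd.DataFrame) -> dict:
    """Collect basic facts about a DataFrame as a first EDA step."""
    return {
        "shape": df.shape,                               # (rows, columns)
        "missing_per_column": df.isna().sum().to_dict(),  # missing values per column
        "duplicate_rows": int(df.duplicated().sum()),     # fully duplicated rows
        "numeric_summary": df.describe(),                 # count, mean, std, quartiles
    }


# With real data, the frame would come from a downloaded CSV, for example:
# df = pd.read_csv("path/to/file.csv")  # placeholder path

# Tiny made-up sample so the sketch runs without the real data.
sample = pd.DataFrame({
    "name": ["a", "b", "b", None],
    "popularity": [10, 95, 95, 40],
})

report = first_look(sample)
print(report["shape"])
print(report["missing_per_column"])
print(report["duplicate_rows"])
```

Missing values and duplicate rows found at this stage are what the data cleaning step would then address.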
The data used in this project comes from two Kaggle datasets:
- https://www.kaggle.com/datasets/arnabchaki/indian-restaurants-2023
- https://www.kaggle.com/datasets/lehaknarnauli/spotify-datasets?select=artists.csv
The following tools are used in this project:
- Microsoft Excel
- Python (Jupyter Notebook)
The analysis explores the following questions:
- Which are the 10 least popular songs?
- What are the top 10 songs with popularity greater than 90?
- What is the correlation between all variables?
- Are loudness and energy correlated?
- Are popularity and acousticness correlated?
- How is the total number of songs distributed by year since 1922?
- How has the duration of songs changed over the years?
- What is the average duration of songs over the years?
- What is the duration of songs for different genres?
- What are the top 5 genres by popularity?
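The popularity and correlation questions can be approached with a few pandas calls. The sketch below uses made-up rows, and the column names (`name`, `popularity`, `loudness`, `energy`, `acousticness`) are assumptions about the dataset's layout rather than something confirmed by this README.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed.
tracks = pd.DataFrame({
    "name": ["s1", "s2", "s3", "s4", "s5"],
    "popularity": [5, 92, 40, 97, 0],
    "loudness": [-20.0, -5.0, -12.0, -4.0, -25.0],
    "energy": [0.2, 0.8, 0.5, 0.9, 0.1],
    "acousticness": [0.9, 0.1, 0.5, 0.05, 0.95],
})

# The 10 least popular songs (fewer if the frame has fewer rows).
least_popular = tracks.nsmallest(10, "popularity")

# Up to 10 songs with popularity above 90, most popular first.
above_90 = tracks[tracks["popularity"] > 90].nlargest(10, "popularity")

# Pairwise Pearson correlation between all numeric columns.
corr = tracks.select_dtypes("number").corr()

print(least_popular[["name", "popularity"]])
print(above_90[["name", "popularity"]])
print(corr.loc["loudness", "energy"])
print(corr.loc["popularity", "acousticness"])
```

A correlation matrix like `corr` is commonly visualized with Seaborn, for example `sns.heatmap(corr, annot=True)`.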
The analysis led to the following insights:
- There is a high positive correlation between loudness and energy.
- There is a negative correlation between popularity and acousticness.
- The number of songs released each year has increased in recent years, as technological advances made music more accessible to people globally.
- Songs in the 1920s were shorter. Duration increased in the late 1930s and stayed consistently high until 2010, after which it began to decrease again.
- Songs in the 'World' genre have the longest duration, and songs in 'Children's Music' the shortest.
- The top 5 genres by popularity are 'Dance', 'Pop', 'Rap', 'Hip-Hop', and 'Reggaeton'.
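Yearly trends such as the song count and average duration can be derived with a group-by. This is a sketch under stated assumptions, not the notebook's code: the rows are made up, and the column names `release_date` and `duration_ms` are assumed.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed.
tracks = pd.DataFrame({
    "release_date": ["1922-02-22", "1922", "1985-06-01", "2015-03-10", "2015-11-20"],
    "duration_ms": [120000, 180000, 240000, 210000, 190000],
})

# Release dates may be full dates or just a year, so keep the first four characters.
tracks["year"] = tracks["release_date"].str[:4].astype(int)
tracks["duration_s"] = tracks["duration_ms"] / 1000

# Number of songs and mean duration (seconds) per release year.
per_year = tracks.groupby("year").agg(
    songs=("duration_s", "size"),
    mean_duration_s=("duration_s", "mean"),
)
print(per_year)
```

The same pattern, grouping by a genre column instead of `year`, would give per-genre durations. Calling `per_year["mean_duration_s"].plot()` draws the trend as a line chart with Matplotlib.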
This project is licensed under the MIT License.