# README: Analytical Practice and Query Optimization with the Spotify Dataset

## Overview

This project uses SQL to analyze a [Spotify dataset](https://www.kaggle.com/datasets/sanjanchaudhari/spotify-dataset) containing attributes of tracks, albums, and artists. It covers an end-to-end process: normalizing a denormalized dataset, writing SQL queries of increasing complexity (easy, medium, and advanced), and optimizing query performance. The primary goals are to practice advanced SQL skills and generate useful insights.

### Tools I Used:

- **pgAdmin 4**: Used for creating the database and managing data storage.
- **PostgreSQL**: Used for writing queries to explore and manipulate the Spotify dataset.

### Questions to Practice:

#### Easy Level:


1. Retrieve the names of all tracks that have more than 1 billion streams.

2. List all albums along with their respective artists.

3. Get the total number of comments for tracks where licensed = TRUE.

4. Find all tracks that belong to the album type single.

5. Count the total number of tracks by each artist.

#### Medium Level:

6. Calculate the average danceability of tracks in each album.

7. Find the top 5 tracks with the highest energy values.

8. List all tracks along with their views and likes where official_video = TRUE.

9. For each album, calculate the total views of all associated tracks.

10. Retrieve the track names that have been streamed on Spotify more than YouTube.

#### Advanced Level:

11. Find the top 3 most-viewed tracks for each artist using window functions.

12. Find tracks where the liveness score is above the average.

13. Use a WITH clause to calculate the difference between the highest and lowest energy values for tracks in each album.

14. Find tracks where the energy-to-liveness ratio is greater than 1.2.

15. Calculate the cumulative sum of likes for tracks ordered by the number of views, using window functions.

### Create Table for Data:

![image-2.png](attachment:image-2.png)
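
As a text sketch of what such a table definition might look like: the table name `spotify` appears later in this README, but every column name and type below is an assumption based on the Kaggle CSV header, not a copy of the screenshot above.

```sql
-- Hypothetical schema; column names and types are assumptions
-- derived from the Kaggle CSV header, adjust them to your import.
DROP TABLE IF EXISTS spotify;

CREATE TABLE spotify (
    artist           VARCHAR(255),
    track            VARCHAR(255),
    album            VARCHAR(255),
    album_type       VARCHAR(50),
    danceability     FLOAT,
    energy           FLOAT,
    loudness         FLOAT,
    speechiness      FLOAT,
    acousticness     FLOAT,
    instrumentalness FLOAT,
    liveness         FLOAT,
    valence          FLOAT,
    tempo            FLOAT,
    duration_min     FLOAT,
    title            VARCHAR(255),
    channel          VARCHAR(255),
    views            FLOAT,      -- YouTube views
    likes            BIGINT,
    comments         BIGINT,
    licensed         BOOLEAN,
    official_video   BOOLEAN,
    stream           BIGINT,     -- Spotify streams
    energy_liveness  FLOAT,
    most_played_on   VARCHAR(50)
);
```

The data can then be loaded through pgAdmin's import dialog or with `COPY`.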

### Exploratory Data Analysis:

Before diving into SQL, it’s important to understand the dataset thoroughly. The dataset contains attributes such as:

- Artist: The performer of the track.
- Track: The name of the song.
- Album: The album to which the track belongs.
- Album_type: The type of album (e.g., single or album).
- Various metrics such as danceability, energy, loudness, tempo, and more.

![image-2.png](attachment:image-2.png)

## Querying the Data

After the data is inserted, various SQL queries can be written to explore and analyze the data. Queries are categorized into easy, medium, and advanced levels to help progressively develop SQL proficiency.

### Easy Queries

- Simple data retrieval, filtering and basic aggregations.
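
A few of the easy questions could be answered roughly as follows. This is a minimal sketch, assuming a single `spotify` table with columns named `track`, `stream`, `comments`, `licensed`, `album_type`, and `artist`; the column names are assumptions and may differ from the actual schema.

```sql
-- Q1: tracks with more than 1 billion streams (filtering)
SELECT track
FROM spotify
WHERE stream > 1000000000;

-- Q3: total number of comments for licensed tracks (basic aggregation)
SELECT SUM(comments) AS total_comments
FROM spotify
WHERE licensed = TRUE;

-- Q4: tracks that belong to the album type "single"
SELECT track
FROM spotify
WHERE album_type = 'single';

-- Q5: number of tracks per artist (GROUP BY)
SELECT artist, COUNT(*) AS total_tracks
FROM spotify
GROUP BY artist
ORDER BY total_tracks DESC;
```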

### Medium Queries

- More complex queries involving grouping, aggregation functions and joins.
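
The medium questions mostly combine grouping with aggregate functions and ordering. The sketch below assumes columns named `album`, `danceability`, `energy`, `views`, `likes`, `official_video`, and `track` on the `spotify` table; these names are assumptions.

```sql
-- Q6: average danceability of tracks in each album
SELECT album, AVG(danceability) AS avg_danceability
FROM spotify
GROUP BY album
ORDER BY avg_danceability DESC;

-- Q7: top 5 tracks with the highest energy values
SELECT track, energy
FROM spotify
ORDER BY energy DESC
LIMIT 5;

-- Q8: views and likes for tracks with an official video
SELECT track, views, likes
FROM spotify
WHERE official_video = TRUE;

-- Q9: total views of all tracks in each album
SELECT album, SUM(views) AS total_views
FROM spotify
GROUP BY album
ORDER BY total_views DESC;
```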

### Advanced Queries

- Nested subqueries, window functions, CTEs and performance optimization.
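
The advanced questions rely on window functions, subqueries, and CTEs. The following sketch assumes the same hypothetical column names as before (`artist`, `track`, `album`, `views`, `likes`, `energy`, `liveness`); it is one possible way to write these queries, not necessarily the one shown in the screenshots.

```sql
-- Q11: top 3 most-viewed tracks per artist (window function)
WITH ranked AS (
    SELECT artist, track, views,
           DENSE_RANK() OVER (PARTITION BY artist ORDER BY views DESC) AS view_rank
    FROM spotify
)
SELECT artist, track, views, view_rank
FROM ranked
WHERE view_rank <= 3
ORDER BY artist, view_rank;

-- Q12: tracks whose liveness is above the overall average (subquery)
SELECT track, liveness
FROM spotify
WHERE liveness > (SELECT AVG(liveness) FROM spotify);

-- Q13: difference between highest and lowest energy per album (CTE)
WITH energy_range AS (
    SELECT album,
           MAX(energy) AS max_energy,
           MIN(energy) AS min_energy
    FROM spotify
    GROUP BY album
)
SELECT album, max_energy - min_energy AS energy_diff
FROM energy_range
ORDER BY energy_diff DESC;

-- Q14: energy-to-liveness ratio above 1.2 (NULLIF avoids division by zero)
SELECT track, energy / NULLIF(liveness, 0) AS energy_liveness_ratio
FROM spotify
WHERE energy / NULLIF(liveness, 0) > 1.2;

-- Q15: cumulative sum of likes, ordered by views (window function)
SELECT track, views, likes,
       SUM(likes) OVER (
           ORDER BY views DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_likes
FROM spotify;
```

`DENSE_RANK()` keeps tied tracks at the same rank, so an artist can return more than three rows; `ROW_NUMBER()` would cap the result at exactly three.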

### Answers to the Practice Questions with Queries:

#### Easy Level:

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

![image-3.png](attachment:image-3.png)

#### Result:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

#### Medium Level:

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

#### Advanced Level:

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### Result:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Result:

![image.png](attachment:image.png)

## Query Optimization:

In advanced stages, the focus shifts to improving query performance. Some optimization strategies include:
- Indexing: Adding indexes on frequently queried columns.

- Query Execution Plan: Using EXPLAIN ANALYZE to review and refine query performance.

![image.png](attachment:image.png)

Graphical query plan:

![image.png](attachment:image.png)

- This query execution plan shows how the database processes the query step by step:

- Input Table ("spotify"): Source data is loaded.
- Sorting: Data is ordered based on a specified criterion.
- Limit: Only a specific number of rows is selected (e.g., top results).
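
A plan with these three steps (table scan, sort, limit) would come from a query of roughly the following shape. This is a hypothetical example: the filter value, the selected columns, and the `LIMIT` are assumptions, not the exact query from the screenshot.

```sql
-- Hypothetical query; the artist value and column names are placeholders.
EXPLAIN ANALYZE
SELECT artist, track, views
FROM spotify
WHERE artist = 'Example Artist'
ORDER BY stream DESC
LIMIT 25;
```

`EXPLAIN ANALYZE` actually executes the query and reports the planning time and execution time alongside each plan node, which is what makes before-and-after comparisons possible.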

Performance analysis:

![image.png](attachment:image.png)

- Without an index: Planning Time: 0.07 ms, Execution Time: 6.822 ms.

- Next, we check how performance changes after creating an index.

#### Create Index:

![image.png](attachment:image.png)

Graphical query plan:

![image-2.png](attachment:image-2.png)

- artist_index: The database uses the artist_index to quickly find all rows for a specific artist.
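
An index like this could be created as sketched below. The index name `artist_index` and the table name `spotify` come from this README; the indexed column `artist` is an assumption inferred from the index name.

```sql
-- Assumes the filter column is named "artist".
CREATE INDEX artist_index ON spotify (artist);

-- Re-run the same query under EXPLAIN ANALYZE to compare timings;
-- the artist value below is a placeholder.
EXPLAIN ANALYZE
SELECT artist, track, views
FROM spotify
WHERE artist = 'Example Artist'
ORDER BY stream DESC
LIMIT 25;
```

Whether the planner chooses the index depends on table size and how selective the filter is; on very small tables a sequential scan may still be cheaper.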

#### After creating the index, we can check the planning and execution time of our query again.

![image-3.png](attachment:image-3.png)

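- Planning Time: 0.152 ms, Execution Time: 0.091 ms.

- Execution time dropped from 6.822 ms to 0.091 ms, which is roughly 75 times faster than the same query without the index. Planning time rose slightly (from 0.07 ms to 0.152 ms), but the overall difference is still large.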

##### Conclusion:
- This optimization shows how indexing can drastically reduce query time, improving the overall performance of our database operations in the Spotify project.

## Optional Next Steps:

- Visualize the Data: Use a data visualization tool like Tableau or Power BI to create dashboards based on the query results.

- Expand Dataset: Add more rows to the dataset for broader analysis and scalability testing.

- Advanced Querying: Dive deeper into query optimization and explore the performance of SQL queries on larger datasets.