# SQL PROJECT - MUSIC STORE DATA ANALYSIS

![](/Workspace/GDS/SQL-Daily-Practice/SQL_Project/data/schema_diagram.png)

In [0]:
select a.artist_id as Artist_ID, a.name as Artist_Name 
from artist a
join album al on a.artist_id = al.artist_id
join track t on al.album_id = t.album_id
join genre g on t.genre_id = g.genre_id
where g.name = 'Rock'
-- where g.genre_id = 1
group by a.name, a.artist_id


In [0]:
select * 
from genre

In [0]:
use catalog workspace;

use schema gds_sql_project;

1. Who is the senior-most employee based on job title?

In [0]:
-- SQL solution: rank employees within each title by hire date, then birthdate

with ranked_emp as (
        select employee_id, first_name, last_name, birthdate, hire_date, title,
               row_number() over(partition by title order by hire_date, birthdate) as rn
        from employee
)

select employee_id, first_name, last_name, birthdate, hire_date, title
from ranked_emp
where rn = 1;

In [0]:
%python
# PySpark Solution
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.table("employee")

# Earliest hire date first within each title; birthdate breaks ties
window_spec = Window.partitionBy("title").orderBy("hire_date", "birthdate")

# Add row_number column
ranked_emp = df.withColumn("rn", F.row_number().over(window_spec))

# Filter senior-most employees
result = ranked_emp.filter(F.col("rn") == 1) \
                   .select("employee_id", "first_name", "last_name", "birthdate", "hire_date", "title")

display(result)

2. Which countries have the most invoices?


In [0]:
select billing_country as Country , count(invoice_id) as Invoice_Count
from invoice
group by billing_country
order by count(invoice_id) desc
limit 10;

In [0]:
%python
from pyspark.sql import functions as F
df = spark.table("invoice")
df_final = df.groupBy("billing_country").count().orderBy(F.desc("count")).limit(10)
display(df_final)


3. What are the top 3 invoice totals?



In [0]:
select invoice_id , round(total,2) as Total 
from invoice
order by total desc
limit 3;

In [0]:
%python
from pyspark.sql import functions as F
df_1 = spark.table("invoice")
df_final = (
    df_1
    .orderBy(F.desc("total"))
    .limit(3)
    # F.col, because only the functions module (as F) is imported
    .withColumn("total_rounded", F.round(F.col("total"), 2))
)

display(df_final.select("invoice_id", "total_rounded"))

4. Which city has the best customers? 
We would like to throw a promotional Music Festival in the city where we made the most money. Write a query that returns the one city with the highest sum of invoice totals. Return both the city name and the sum of all invoice totals.


In [0]:
select billing_city as City, round(sum(total), 2) as Total
from invoice
group by billing_city
order by Total desc
limit 1;

In [0]:
%python
from pyspark.sql import functions as F
df = spark.table("invoice")
df_final = (
    df
    .groupBy("billing_city")
    .agg(F.sum("total").alias("Invoice_Total"))
    .orderBy(F.desc("Invoice_Total"))
    .limit(1)
    # F.col, because only the functions module (as F) is imported
    .withColumn("Invoice_Total", F.round(F.col("Invoice_Total"), 2))
)

display(df_final.select("billing_city", "Invoice_Total"))


5. Who is the best customer? 
The customer who has spent the most money will be declared the best customer. Write a query that returns the person who has spent the most money.

In [0]:
select customer_id as Customer_ID, round(sum(total), 2) as Invoice_Total
from invoice
group by customer_id
order by Invoice_Total desc
limit 1;

In [0]:
%python
from pyspark.sql import functions as F
df = spark.table("invoice")
df_final = (
    df
    .groupBy("customer_id")
    .agg(F.sum("total").alias("Invoice_Total"))
    .orderBy(F.desc("Invoice_Total"))
    .limit(1)
    # F.col, because only the functions module (as F) is imported
    .withColumn("Invoice_Total", F.round(F.col("Invoice_Total"), 2))
)

display(df_final.select("customer_id", "Invoice_Total"))

1. Write a query that returns the email, first name, last name, and genre of all Rock music listeners. Return the list ordered alphabetically by email, starting with A.



In [0]:
-- Returns customers who are Rock music listeners
select 'Rock Music Listeners' AS Title, c.first_name as First_Name, c.last_name as Last_Name, c.email as Email
from customer c
join invoice i on c.customer_id = i.customer_id
join invoice_line il on i.invoice_id = il.invoice_id
join track t on il.track_id = t.track_id
join genre g on t.genre_id = g.genre_id
where g.name = 'Rock'
group by c.first_name, c.last_name, c.email
order by c.email;

2. Let's invite the artists who have written the most rock music in our dataset. Write a query that returns the artist name and total track count of the top 10 rock bands.
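
One possible approach, sketched below, reuses the artist → album → track → genre join from the first cell of this notebook and counts Rock tracks per artist. To make it runnable without the workspace, the query is wrapped in Python's built-in `sqlite3` with a few made-up rows; only the table and column names come from the notebook, and the sample data and the `top_rock_artists` variable are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample rows; only table/column names come from the notebook.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table artist (artist_id integer, name text);
    create table album  (album_id integer, artist_id integer);
    create table genre  (genre_id integer, name text);
    create table track  (track_id integer, album_id integer, genre_id integer);
    insert into artist values (1, 'Artist A'), (2, 'Artist B');
    insert into album  values (10, 1), (20, 2);
    insert into genre  values (1, 'Rock'), (2, 'Jazz');
    insert into track  values (100, 10, 1), (101, 10, 1), (102, 20, 1), (103, 20, 2);
""")

# Count Rock tracks per artist and keep the ten largest counts.
top_rock_artists_sql = """
    select a.name as Artist_Name, count(t.track_id) as Track_Count
    from artist a
    join album al on a.artist_id = al.artist_id
    join track t on al.album_id = t.album_id
    join genre g on t.genre_id = g.genre_id
    where g.name = 'Rock'
    group by a.artist_id, a.name
    order by Track_Count desc, a.name
    limit 10
"""
top_rock_artists = conn.execute(top_rock_artists_sql).fetchall()
print(top_rock_artists)
```

The SQL string should also be usable as a notebook SQL cell; the secondary sort on `a.name` is only there to make the order deterministic when counts tie.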



3. Return all the track names that have a song length longer than the average song length. 
Return the Name and Milliseconds for each track. Order by song length, with the longest songs listed first.
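
A minimal sketch of one way to answer this: compare each track's `milliseconds` against the overall average computed in a scalar subquery. The query is wrapped in Python's built-in `sqlite3` so it can be run outside the workspace; the four sample tracks and the `long_tracks` variable are made up for illustration, and the `name` and `milliseconds` columns are assumed from the question text.

```python
import sqlite3

# Hypothetical sample rows; average length here is 300000 ms.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table track (track_id integer, name text, milliseconds integer);
    insert into track values
        (1, 'Short',   100000),
        (2, 'Medium',  200000),
        (3, 'Long',    400000),
        (4, 'Longest', 500000);
""")

# Keep only tracks strictly longer than the average, longest first.
long_tracks_sql = """
    select name as Name, milliseconds as Milliseconds
    from track
    where milliseconds > (select avg(milliseconds) from track)
    order by milliseconds desc
"""
long_tracks = conn.execute(long_tracks_sql).fetchall()
print(long_tracks)
```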

1. How much has each customer spent on each artist? Write a query that returns the customer name, artist name, and total spent.
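
A hedged sketch of one approach: walk from customer through invoice and invoice_line to track, album, and artist, then sum the line amounts per customer and artist. It assumes `invoice_line` carries `unit_price` and `quantity` columns, which this notebook does not show; check them against the schema diagram. The query is wrapped in Python's built-in `sqlite3` with made-up rows so it runs outside the workspace.

```python
import sqlite3

# Hypothetical sample rows. unit_price and quantity are ASSUMED column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_id integer, first_name text, last_name text);
    create table invoice (invoice_id integer, customer_id integer);
    create table invoice_line (invoice_line_id integer, invoice_id integer,
                               track_id integer, unit_price real, quantity integer);
    create table track (track_id integer, album_id integer);
    create table album (album_id integer, artist_id integer);
    create table artist (artist_id integer, name text);
    insert into customer values (1, 'Ana', 'Lee'), (2, 'Ben', 'Kim');
    insert into invoice values (1, 1), (2, 2);
    insert into artist values (1, 'Artist A'), (2, 'Artist B');
    insert into album values (10, 1), (20, 2);
    insert into track values (100, 10), (200, 20);
    insert into invoice_line values
        (1, 1, 100, 1.0, 2), (2, 1, 200, 0.5, 1), (3, 2, 100, 1.0, 1);
""")

# Sum line amounts per (customer, artist) pair.
spend_sql = """
    select c.first_name || ' ' || c.last_name as Customer_Name,
           ar.name as Artist_Name,
           round(sum(il.unit_price * il.quantity), 2) as Total_Spent
    from customer c
    join invoice i on c.customer_id = i.customer_id
    join invoice_line il on i.invoice_id = il.invoice_id
    join track t on il.track_id = t.track_id
    join album al on t.album_id = al.album_id
    join artist ar on al.artist_id = ar.artist_id
    group by c.customer_id, c.first_name, c.last_name, ar.artist_id, ar.name
    order by Total_Spent desc, Customer_Name
"""
spend_per_artist = conn.execute(spend_sql).fetchall()
print(spend_per_artist)
```

Grouping on the ID columns as well as the names keeps two customers (or two artists) with identical names from being merged into one row.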



2. We want to find out the most popular music genre for each country. We define the most popular genre as the genre with the highest number of purchases. Write a query that returns each country along with its top genre. For countries where the maximum number of purchases is shared, return all genres.
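
Because ties must be kept, `rank()` is a better fit here than the `row_number()` used for the employee question: tied genres all receive rank 1. The sketch below counts invoice lines per country and genre as the measure of "purchases", which is one reading of the question. It is wrapped in Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer) with made-up rows; only the table and column names come from the notebook.

```python
import sqlite3

# Hypothetical sample rows: USA prefers Rock, India has a Rock/Jazz tie.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table invoice (invoice_id integer, billing_country text);
    create table invoice_line (invoice_line_id integer, invoice_id integer, track_id integer);
    create table track (track_id integer, genre_id integer);
    create table genre (genre_id integer, name text);
    insert into invoice values (1, 'USA'), (2, 'USA'), (3, 'India');
    insert into genre values (1, 'Rock'), (2, 'Jazz');
    insert into track values (100, 1), (200, 2);
    insert into invoice_line values
        (1, 1, 100), (2, 1, 100), (3, 2, 200), (4, 3, 100), (5, 3, 200);
""")

# Count purchases per (country, genre), then keep every genre ranked first.
top_genre_sql = """
    with genre_sales as (
        select i.billing_country as Country, g.name as Genre, count(*) as Purchases
        from invoice i
        join invoice_line il on i.invoice_id = il.invoice_id
        join track t on il.track_id = t.track_id
        join genre g on t.genre_id = g.genre_id
        group by i.billing_country, g.name
    ),
    ranked as (
        select Country, Genre, Purchases,
               rank() over (partition by Country order by Purchases desc) as rnk
        from genre_sales
    )
    select Country, Genre, Purchases
    from ranked
    where rnk = 1
    order by Country, Genre
"""
top_genres = conn.execute(top_genre_sql).fetchall()
print(top_genres)
```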



3. Write a query that determines the customer who has spent the most on music in each country. The query should return the country along with the top customer and how much they spent. For countries where the top amount spent is shared, return all customers who spent this amount.
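
One way to sketch this is to total `invoice.total` per customer and billing country, then keep every customer whose total ranks first within the country, so shared top amounts are all returned. Ranking on the rounded total is a deliberate choice to keep floating-point noise from splitting a genuine tie. The query is wrapped in Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer); the sample rows are made up, and using `billing_country` as the customer's country is an assumption carried over from the earlier invoice queries.

```python
import sqlite3

# Hypothetical sample rows: Ana and Ben tie in the USA, Cara leads India.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_id integer, first_name text, last_name text);
    create table invoice (invoice_id integer, customer_id integer,
                          billing_country text, total real);
    insert into customer values (1, 'Ana', 'Lee'), (2, 'Ben', 'Kim'), (3, 'Cara', 'Diaz');
    insert into invoice values
        (1, 1, 'USA', 10.0), (2, 1, 'USA', 5.0),
        (3, 2, 'USA', 15.0), (4, 3, 'India', 7.5);
""")

# Total spend per (country, customer), then keep rank 1 within each country.
top_customer_sql = """
    with customer_spend as (
        select i.billing_country as Country, c.customer_id as Customer_ID,
               c.first_name as First_Name, c.last_name as Last_Name,
               round(sum(i.total), 2) as Total_Spent
        from customer c
        join invoice i on c.customer_id = i.customer_id
        group by i.billing_country, c.customer_id, c.first_name, c.last_name
    ),
    ranked as (
        select Country, Customer_ID, First_Name, Last_Name, Total_Spent,
               rank() over (partition by Country order by Total_Spent desc) as rnk
        from customer_spend
    )
    select Country, Customer_ID, First_Name, Last_Name, Total_Spent
    from ranked
    where rnk = 1
    order by Country, Customer_ID
"""
top_customers = conn.execute(top_customer_sql).fetchall()
print(top_customers)
```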