# SQL Practice 12/2/2022

## Q1

Given the table of tweets below, write a query to calculate the 3-day rolling average of tweets published by each user for each date that a tweet was posted. Output the user id, the tweet date, and rolling averages rounded to 2 decimal places.

Data Information

`tweets` **Table**

Column Name | Type
------------|-----
tweet_id | integer
user_id | integer
tweet_date | timestamp

You can assume rows in this table are consecutive and ordered by date. Each row represents a different day. A day that does not correspond to a row in this table is not counted.

My solution is below

```SQL
-- Step 1: number of tweets per user per day
WITH counted AS (
  SELECT
    user_id,
    tweet_date,
    COUNT(*) AS count
  FROM tweets
  GROUP BY user_id, tweet_date
  ORDER BY user_id, tweet_date
)
-- Step 2: average each day's count with up to two preceding rows for the same user
SELECT
  user_id,
  tweet_date,
  ROUND(AVG(count) OVER (PARTITION BY user_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS rolling_avg_3days
FROM counted
;
```

First we count each user's tweets per day by grouping on both user_id and tweet_date, and we order by the same two columns. Then, to compute the rolling average, we apply a window function to the counts, partitioning only by user_id and limiting the frame to the two preceding rows plus the current row. I did not know you could do this with a window function!

The syntax of 
```SQL
(PARTITION BY user_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
```
was new to me.
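
To convince myself what the frame `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` covers, here is a small plain-Python sketch of the same arithmetic for a single user. The function name `rolling_average` and the sample counts are made up for illustration, and the list is assumed to already be in date order.

```python
def rolling_average(counts, preceding=2):
    """Average each value with up to `preceding` values before it."""
    averages = []
    for i in range(len(counts)):
        # The frame is shorter for the first rows of a partition:
        # there are fewer than `preceding` rows before them.
        start = max(0, i - preceding)
        frame = counts[start:i + 1]
        averages.append(round(sum(frame) / len(frame), 2))
    return averages


# Hypothetical tweets per day for one user, already sorted by date.
daily_counts = [1, 2, 1, 3]
print(rolling_average(daily_counts))  # [1.0, 1.5, 1.33, 2.0]
```

The first two rows average over one and two values respectively, which is also how the window function behaves at the start of each partition.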

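Also, instead of ordering by tweet_date in the CTE, we could order by it inside the window function. This is the safer form: an `ORDER BY` inside a CTE is not guaranteed to carry over to the outer query, whereas an `ORDER BY` in the window definition fixes which rows count as preceding.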
```SQL
(PARTITION BY user_id ORDER BY tweet_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
```
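
A minimal sketch for trying this version on made-up data. It assumes SQLite (3.25 or newer, for window functions) driven through Python's `sqlite3` module, which is not the database the exercise runs on. The sample rows and the `tweet_count` alias are hypothetical, and dates are stored as text because SQLite has no timestamp type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, tweet_date TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        # Hypothetical sample rows: user 111 tweets 1, 2, 1, 3 times on four days.
        (1, 111, "2022-06-01"),
        (2, 111, "2022-06-02"),
        (3, 111, "2022-06-02"),
        (4, 111, "2022-06-03"),
        (5, 111, "2022-06-04"),
        (6, 111, "2022-06-04"),
        (7, 111, "2022-06-04"),
        # User 254 tweets once on each of two days.
        (8, 254, "2022-06-01"),
        (9, 254, "2022-06-03"),
    ],
)

query = """
WITH counted AS (
  SELECT user_id, tweet_date, COUNT(*) AS tweet_count
  FROM tweets
  GROUP BY user_id, tweet_date
)
SELECT
  user_id,
  tweet_date,
  ROUND(AVG(tweet_count) OVER (
    PARTITION BY user_id
    ORDER BY tweet_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ), 2) AS rolling_avg_3days
FROM counted
ORDER BY user_id, tweet_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, user 111 should come out as 1.0, 1.5, 1.33, 2.0 and user 254 as 1.0, 1.0.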

Lastly, I originally had it written as 
```SQL
ROUND(AVG(count), 2) OVER (PARTITION BY user_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg_3days
```
But `ROUND` is not an aggregate or window function, so it cannot take an `OVER` clause. The `OVER` clause has to attach to `AVG`, and `ROUND` must wrap the entire window expression.
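
The difference between the two placements can be checked with a quick sketch. As an assumption, this uses SQLite via Python's `sqlite3` rather than the exercise's database, so the exact error message will differ between engines. The `counted` table here is a hypothetical stand-in for the CTE, with `tweet_count` in place of `count`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counted (user_id INTEGER, tweet_date TEXT, tweet_count INTEGER)")
conn.executemany(
    "INSERT INTO counted VALUES (?, ?, ?)",
    [(111, "2022-06-01", 1), (111, "2022-06-02", 2), (111, "2022-06-03", 1)],
)

window = "(PARTITION BY user_id ORDER BY tweet_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)"

# ROUND outside the window: OVER is attached to ROUND, which is a scalar function.
bad_query = f"SELECT ROUND(AVG(tweet_count), 2) OVER {window} FROM counted"
# ROUND wrapping the whole window expression: OVER is attached to AVG.
good_query = f"SELECT ROUND(AVG(tweet_count) OVER {window}, 2) FROM counted"

try:
    conn.execute(bad_query).fetchall()
    bad_query_failed = False
except sqlite3.Error as exc:
    bad_query_failed = True
    print("rejected:", exc)

good_rows = [row[0] for row in conn.execute(good_query)]
print(good_rows)
```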

## Q2

Given the table of customers below, write a query to identify the top two highest-grossing products within each category in 2022. Output the category, product, and total spend.

Data Information

`product_spend` **Table**

Column Name | Type
------------|-----
category | string
product | string
user_id | integer
spend | decimal
transaction_date | timestamp

My solution is below

```SQL
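WITH numbered AS (
  SELECT
    category,
    product,
    SUM(spend) AS total_spend,
    -- rank products within each category by their total 2022 spend
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS ranking
  FROM product_spend
  -- EXTRACT returns a number, so compare against a numeric literal
  WHERE EXTRACT(YEAR FROM transaction_date) = 2022
  GROUP BY category, product
)
SELECT category, product, total_spend
FROM numbered
WHERE ranking <= 2
-- sort in the outer query; an ORDER BY inside the CTE is not guaranteed to survive
ORDER BY category, ranking
;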
```
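
A rough way to sanity-check the ranking logic on made-up data, again assuming SQLite through Python's `sqlite3` instead of the exercise's database. SQLite has no `EXTRACT`, so this sketch swaps in `strftime('%Y', ...)` for the year filter. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_spend "
    "(category TEXT, product TEXT, user_id INTEGER, spend REAL, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO product_spend VALUES (?, ?, ?, ?, ?)",
    [
        ("appliance", "fridge", 1, 250.0, "2022-03-02 11:00:00"),
        ("appliance", "fridge", 2, 300.0, "2022-05-10 09:30:00"),
        ("appliance", "washer", 3, 220.0, "2022-01-18 14:00:00"),
        ("appliance", "kettle", 1, 40.0, "2022-07-01 08:15:00"),
        # Dated 2021, so the year filter should exclude it.
        ("appliance", "kettle", 4, 900.0, "2021-12-30 10:00:00"),
        ("electronics", "phone", 2, 500.0, "2022-02-14 16:45:00"),
        ("electronics", "laptop", 3, 1200.0, "2022-08-21 12:00:00"),
        ("electronics", "cable", 4, 15.0, "2022-09-05 19:20:00"),
    ],
)

query = """
WITH numbered AS (
  SELECT
    category,
    product,
    SUM(spend) AS total_spend,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS ranking
  FROM product_spend
  WHERE strftime('%Y', transaction_date) = '2022'  -- SQLite stand-in for EXTRACT(YEAR ...)
  GROUP BY category, product
)
SELECT category, product, total_spend
FROM numbered
WHERE ranking <= 2
ORDER BY category, ranking
"""

top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

The 2021 kettle purchase is the largest single transaction in the sample, so if kettle shows up in the output the year filter is not working.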