# [SQL Interview Questions on Data Lemur - Hard](https://datalemur.com/questions?difficulty=Hard&category=SQL)

##### Solved by: Dorothy Kunth

### 1. [Active User Retention - Facebook](https://datalemur.com/questions/user-retention)

Assume you're given a table containing information on Facebook user actions. Write a query to obtain the number of monthly active users (MAUs) in July 2022, including the month in numerical format "1, 2, 3".

Hint:

An active user is defined as a user who has performed actions such as 'sign-in', 'like', or 'comment' in both the current month and the previous month.

``user_actions`` table

| user_id | event_id | event_type | event_date          |
|---------|----------|------------|---------------------|
| 445     | 7765     | sign-in    | 05/31/2022 12:00:00 |
| 445     | 3634     | like       | 06/05/2022 12:00:00 |
| 648     | 3124     | like       | 06/18/2022 12:00:00 |
| 648     | 2725     | sign-in    | 06/22/2022 12:00:00 |
| 648     | 8568     | comment    | 07/03/2022 12:00:00 |
| 445     | 4363     | sign-in    | 07/05/2022 12:00:00 |
| 445     | 2425     | like       | 07/06/2022 12:00:00 |
| 445     | 2484     | like       | 07/22/2022 12:00:00 |
| 648     | 1423     | sign-in    | 07/26/2022 12:00:00 |
| 445     | 5235     | comment    | 07/29/2022 12:00:00 |
| 742     | 6458     | sign-in    | 07/03/2022 12:00:00 |
| 742     | 1374     | comment    | 07/19/2022 12:00:00 |


### Solution
1. Use the window function LAG(), partitioned by user_id and ordered by event_date, to fetch each user's previous event date. Then extract the month from both dates and take their difference.

```sql
SELECT
    user_id,
    EXTRACT(MONTH FROM event_date) AS event_month,
    EXTRACT(MONTH FROM LAG(event_date) OVER (PARTITION BY user_id ORDER BY event_date)) AS previous_month,
    EXTRACT(MONTH FROM event_date) - 
        EXTRACT(MONTH FROM LAG(event_date) OVER (PARTITION BY user_id ORDER BY event_date)) AS diff
FROM user_actions
```
| user_id | event_month | previous_month | diff |
|---------|-------------|----------------|------|
| 445     | 5           | NULL           | NULL |
| 445     | 6           | 5              | 1    |
| 445     | 7           | 6              | 1    |
| 445     | 7           | 7              | 0    |
| 445     | 7           | 7              | 0    |
| 445     | 7           | 7              | 0    |
| 648     | 6           | NULL           | NULL |
| 648     | 6           | 6              | 0    |
| 648     | 7           | 6              | 1    |
| 648     | 7           | 7              | 0    |
| 742     | 7           | NULL           | NULL |
| 742     | 7           | 7              | 0    |
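Note that diff compares month numbers only, so it is reliable only when both events fall in the same calendar year. For a December 2021 event followed by a January 2022 event, diff is 1 - 12 = -11. One way to make the comparison year-aware (a suggested refinement, not part of the original solution) is to number months continuously across years:

$$
m = 12 \cdot \text{year} + \text{month}
$$

$$
(12 \cdot 2022 + 1) - (12 \cdot 2021 + 12) = 24265 - 24264 = 1
$$

In SQL this index would be EXTRACT(YEAR FROM event_date) * 12 + EXTRACT(MONTH FROM event_date).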


<br><br>
2. Use the query above as a subquery in the FROM clause, keep only July 2022 rows where diff is 1 (the user's previous action was in June), and count the distinct users.
```sql
SELECT
  event_month AS month,
  COUNT(DISTINCT user_id) AS monthly_active_users
FROM (
  SELECT
    user_id,
    EXTRACT(YEAR FROM event_date) AS event_year,
    EXTRACT(MONTH FROM event_date) AS event_month,
    EXTRACT(MONTH FROM LAG(event_date) OVER (PARTITION BY user_id ORDER BY event_date)) AS previous_month,
    EXTRACT(MONTH FROM event_date) - 
        EXTRACT(MONTH FROM LAG(event_date) OVER (PARTITION BY user_id ORDER BY event_date)) AS diff
  FROM user_actions) user_activity
WHERE event_year = 2022   -- the question asks for July 2022 specifically
  AND event_month = 7 
  AND diff = 1            -- previous action was in the preceding month
GROUP BY 1
```
| month | monthly_active_users |
|-------|----------------------|
| 7     | 2                    |
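For comparison, here is a sketch that encodes the hint's definition directly: count users with an action in July 2022 for whom an action in June 2022 also EXISTS. It uses explicit date ranges instead of month-number arithmetic, so events from other years cannot be miscounted. It assumes a PostgreSQL-style dialect and that event_date is a timestamp; the aliases curr and prev are illustrative.

```sql
-- Users active in July 2022 who also had at least one action in June 2022.
SELECT
  7 AS month,
  COUNT(DISTINCT curr.user_id) AS monthly_active_users
FROM user_actions AS curr
WHERE curr.event_date >= '2022-07-01'
  AND curr.event_date <  '2022-08-01'
  AND EXISTS (
    SELECT 1
    FROM user_actions AS prev
    WHERE prev.user_id = curr.user_id
      AND prev.event_date >= '2022-06-01'
      AND prev.event_date <  '2022-07-01'
  );
```

On the sample data, users 445 and 648 qualify, while 742 has no June action, so this also returns 2 for month 7.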


### 2. [Year-on-Year Growth Rate - Wayfair](https://datalemur.com/questions/yoy-growth-rate)
Assume you're given a table containing information about Wayfair user transactions for different products. Write a query to calculate the year-on-year growth rate for the total spend of each product, grouping the results by product ID.

The output should include the year in ascending order, product ID, current year's spend, previous year's spend and year-on-year growth percentage, rounded to 2 decimal places.
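The growth rate is the percentage change from the previous year's spend, where $S_t$ is a product's spend in year $t$:

$$
\text{growth}_t = \frac{S_t - S_{t-1}}{S_{t-1}} \times 100
$$

For example, product 123424 in the table below goes from 1500.60 (2019) to 1000.20 (2020):

$$
\frac{1000.20 - 1500.60}{1500.60} \times 100 = \frac{-500.40}{1500.60} \times 100 \approx -33.35
$$

A product's first year has no $S_{t-1}$, so its growth is undefined (NULL).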

``user_transactions`` table

| transaction_id | product_id | spend   | transaction_date    |
|----------------|------------|---------|---------------------|
| 1341           | 123424     | 1500.60 | 12/31/2019 12:00:00 |
| 1423           | 123424     | 1000.20 | 12/31/2020 12:00:00 |
| 1623           | 123424     | 1246.44 | 12/31/2021 12:00:00 |
| 1322           | 123424     | 2145.32 | 12/31/2022 12:00:00 |
| 1344           | 234412     | 1800.00 | 12/31/2019 12:00:00 |
| 1435           | 234412     | 1234.00 | 12/31/2020 12:00:00 |
| 4325           | 234412     | 889.50  | 12/31/2021 12:00:00 |
| 5233           | 234412     | 2900.00 | 12/31/2022 12:00:00 |
| 2134           | 543623     | 6450.00 | 12/31/2019 12:00:00 |
| 1234           | 543623     | 5348.12 | 12/31/2020 12:00:00 |
| 2423           | 543623     | 2345.00 | 12/31/2021 12:00:00 |
| 1245           | 543623     | 5680.00 | 12/31/2022 12:00:00 |

### Solution

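Use LAG(), partitioned by product_id and ordered by transaction_date, to bring each product's previous-year spend onto the current row, then compute the percentage change. This works directly on the raw rows because the sample has exactly one transaction per product per year.

```sql
SELECT
  EXTRACT(YEAR FROM transaction_date) AS year,
  product_id,
  spend AS curr_year_spend,
  LAG(spend) OVER(PARTITION BY product_id ORDER BY transaction_date) AS prev_year_spend,
  ROUND(
    (spend - LAG(spend) OVER(PARTITION BY product_id ORDER BY transaction_date)) / 
    LAG(spend) OVER(PARTITION BY product_id ORDER BY transaction_date) * 100, 2) AS yoy_growth
FROM user_transactions
ORDER BY product_id, year  -- without ORDER BY, row order is not guaranteed
```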

| year | product_id | curr_year_spend | prev_year_spend | yoy_growth |
|------|------------|-----------------|-----------------|------------|
| 2019 | 123424     | 1500.60         | NULL            | NULL       |
| 2020 | 123424     | 1000.20         | 1500.60         | -33.35     |
| 2021 | 123424     | 1246.44         | 1000.20         | 24.62      |
| 2022 | 123424     | 2145.32         | 1246.44         | 72.12      |
| 2019 | 234412     | 1800.00         | NULL            | NULL       |
| 2020 | 234412     | 1234.00         | 1800.00         | -31.44     |
| 2021 | 234412     | 889.50          | 1234.00         | -27.92     |
| 2022 | 234412     | 2900.00         | 889.50          | 226.03     |
| 2019 | 543623     | 6450.00         | NULL            | NULL       |
| 2020 | 543623     | 5348.12         | 6450.00         | -17.08     |
| 2021 | 543623     | 2345.00         | 5348.12         | -56.15     |
| 2022 | 543623     | 5680.00         | 2345.00         | 142.22     |
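The question asks for the total spend per product per year, while the query above relies on there being a single transaction per product per year. A sketch that sums spend by year first and then applies LAG() is shown below. It assumes PostgreSQL, which supports the named WINDOW clause; the CTE name yearly_spend and window name w are illustrative.

```sql
-- Step 1: collapse transactions into one row per product per year.
WITH yearly_spend AS (
  SELECT
    EXTRACT(YEAR FROM transaction_date) AS year,
    product_id,
    SUM(spend) AS curr_year_spend
  FROM user_transactions
  GROUP BY EXTRACT(YEAR FROM transaction_date), product_id
)
-- Step 2: compare each year's total with the previous year's total.
SELECT
  year,
  product_id,
  curr_year_spend,
  LAG(curr_year_spend) OVER w AS prev_year_spend,
  ROUND(
    100 * (curr_year_spend - LAG(curr_year_spend) OVER w)
        / LAG(curr_year_spend) OVER w, 2) AS yoy_growth
FROM yearly_spend
WINDOW w AS (PARTITION BY product_id ORDER BY year)
ORDER BY product_id, year;
```

Because each product-year in the sample has a single transaction, this returns the same rows as the table above; the two queries only diverge when a product has several transactions in one year.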


### 3. [Maximize Prime Item Inventory - Amazon](https://datalemur.com/questions/prime-warehouse-storage)
Amazon wants to maximize the number of items it can stock in a 500,000-square-foot warehouse. It wants to stock as many prime items as possible, and afterwards use the remaining square footage to stock the greatest number of non-prime items.

Write a query to find the number of prime and non-prime items that can be stored in the 500,000-square-foot warehouse. Output the item type, with prime_eligible listed before not_prime, and the maximum number of items of that type that can be stocked.

Effective April 3rd, 2023, we added some new assumptions to the question to provide additional clarity.

Assumptions:

- Prime and non-prime items have to be stored in equal amounts, regardless of their size or square footage. This implies that prime items will be stored separately from non-prime items in their respective containers, but within each container, all items must be in the same amount.
- Non-prime items must always be available in stock to meet customer demand, so the non-prime item count should never be zero.
- Item counts should be whole numbers (integers).
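One reading of the first assumption (an interpretation, not spelled out above) is that items are stocked in batches containing one unit of every item of a given type. Let $S_p$ and $n_p$ be the total square footage and the number of prime items, and $S_n$ and $n_n$ the same for non-prime items. The number of whole batches of each type is then

$$
b_p = \left\lfloor \frac{500000}{S_p} \right\rfloor, \qquad
b_n = \left\lfloor \frac{500000 - b_p S_p}{S_n} \right\rfloor
$$

giving $b_p n_p$ prime items and $b_n n_n$ non-prime items. With the inventory table below, $S_p = 555.20$ and $n_p = 6$, so $b_p = 900$ (900 × 555.20 = 499,680 sq ft) and 5,400 prime items fit. The remaining 320 sq ft holds $b_n = \lfloor 320 / 128.50 \rfloor = 2$ non-prime batches ($S_n = 128.50$, $n_n = 4$), i.e. 8 items, which also satisfies the non-zero requirement.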

``inventory`` table

| item_id | item_type      | item_category     | square_footage |
|---------|----------------|-------------------|----------------|
| 1374    | prime_eligible | mini refrigerator | 68.00          |
| 4245    | not_prime      | standing lamp     | 26.40          |
| 5743    | prime_eligible | washing machine   | 325.00         |
| 8543    | not_prime      | dining chair      | 64.50          |
| 2556    | not_prime      | vase              | 15.00          |
| 2452    | prime_eligible | television        | 85.00          |
| 3255    | not_prime      | side table        | 22.60          |
| 1672    | prime_eligible | laptop            | 8.50           |
| 4256    | prime_eligible | wall rack         | 55.50          |
| 6325    | prime_eligible | desktop computer  | 13.20          |
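A possible solution sketch follows, assuming the batch interpretation above (one batch holds one unit of every item of a type) and a PostgreSQL-style dialect. The CTE names summary and prime and the column names batch_sqft and items_per_batch are illustrative. The sketch does not reserve space for non-prime items before filling the warehouse with prime batches, so it relies on the leftover space being large enough for at least one non-prime batch, which holds for this sample.

```sql
-- Total footprint and item count of one batch per item type.
WITH summary AS (
  SELECT
    item_type,
    SUM(square_footage) AS batch_sqft,
    COUNT(*)            AS items_per_batch
  FROM inventory
  GROUP BY item_type
),
-- Fill the warehouse with as many whole prime batches as fit.
prime AS (
  SELECT
    FLOOR(500000 / batch_sqft) * batch_sqft      AS used_sqft,
    FLOOR(500000 / batch_sqft) * items_per_batch AS item_count
  FROM summary
  WHERE item_type = 'prime_eligible'
)
SELECT 'prime_eligible' AS item_type, item_count
FROM prime
UNION ALL
-- Fill the leftover space with whole non-prime batches.
SELECT s.item_type,
       FLOOR((500000 - p.used_sqft) / s.batch_sqft) * s.items_per_batch
FROM summary AS s
CROSS JOIN prime AS p
WHERE s.item_type = 'not_prime'
ORDER BY item_type DESC;  -- 'prime_eligible' sorts after 'not_prime', so DESC lists it first
```

On the sample table this yields 5,400 prime_eligible items and 8 not_prime items.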



##### in progress