# Amazon Interview Question
Write a query that returns the date of the nth future occurrence of Sunday from a given date.

```sql
WITH params AS (
  SELECT DATE('2022-01-01') AS start_date, 3 AS n
),
calc AS (
  SELECT 
    start_date,
    n,
    DAYOFWEEK(start_date) AS weekday_num
  FROM params
),
next_sunday AS (
  SELECT
    start_date,
    n,
    CASE 
      WHEN DAYOFWEEK(start_date) = 1 THEN 0  -- Already Sunday
      ELSE 8 - DAYOFWEEK(start_date)
    END AS days_to_next_sunday
  FROM calc
)
SELECT 
  start_date,
  n,
  DATE_ADD(start_date, days_to_next_sunday + ((n - 1) * 7)) AS nth_sunday_date
FROM next_sunday;
```

```python
%python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create Spark session (already available in Databricks)
spark = SparkSession.builder.getOrCreate()

# Input parameters
start_date = '2022-01-01'
n = 3

# Create initial DataFrame with params
df = spark.createDataFrame([(start_date, n)], ["start_date", "n"]) \
           .withColumn("start_date", F.to_date("start_date"))

# Step 1: Calculate weekday number (Sunday = 1, Saturday = 7)
df = df.withColumn("weekday_num", F.dayofweek("start_date"))

# Step 2: Calculate days to next Sunday
df = df.withColumn(
    "days_to_next_sunday",
    F.when(F.col("weekday_num") == 1, F.lit(0))  # Already Sunday
     .otherwise(8 - F.col("weekday_num"))         # Days until next Sunday
)

# Step 3: Add (n-1)*7 + days_to_next_sunday
df = df.withColumn(
    "nth_sunday_date",
    F.date_add(
        "start_date",
        F.col("days_to_next_sunday") + ((F.col("n") - 1) * 7).cast("int")
    )
)

# Step 4: Display final result
display(df.select("start_date", "n", "nth_sunday_date"))
```


# ✅ **1) Deep analysis of the SQL code**

The SQL does the following:

* Takes a starting date (`2022-01-01`).
* Takes a number `n` (3).
* Determines what day of the week the start date falls on.
* Calculates how many days until the *next* Sunday (0 if the start date is already a Sunday).
* Adds `(n-1) * 7` days to that Sunday to get the **nth Sunday on or after the start date** (a start date that is itself a Sunday counts as the first one).
* Returns the computed date.
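As a cross-check, here is a minimal plain-Python sketch of the same steps, assuming only the standard `datetime` module. `dayofweek` and `nth_sunday_on_or_after` are hypothetical helper names; `dayofweek` reproduces the SQL convention (1 = Sunday … 7 = Saturday) from Python's `isoweekday()` (1 = Monday … 7 = Sunday).

```python
from datetime import date, timedelta


def dayofweek(d: date) -> int:
    """Mimic SQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() is 1 = Monday ... 7 = Sunday; shift so Sunday becomes 1
    return d.isoweekday() % 7 + 1


def nth_sunday_on_or_after(start_date: date, n: int) -> date:
    # CTE calc: weekday number of the start date
    weekday_num = dayofweek(start_date)
    # CTE next_sunday: CASE WHEN weekday_num = 1 THEN 0 ELSE 8 - weekday_num END
    days_to_next_sunday = 0 if weekday_num == 1 else 8 - weekday_num
    # Final SELECT: DATE_ADD(start_date, days_to_next_sunday + (n - 1) * 7)
    return start_date + timedelta(days=days_to_next_sunday + (n - 1) * 7)


print(nth_sunday_on_or_after(date(2022, 1, 1), 3))  # 2022-01-16
```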

In other words, the query finds **the nth Sunday on or after a given date**.

---

# ✅ **2) Reconstructed original problem (from SQL logic)**

**Likely original question:**

> “Given a starting date and a number *n*, find the date of the *nth* Sunday that occurs on or after the starting date.”
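The prompt's wording "in future" could also be read as *strictly after* the given date. Under that assumption, a start date that is itself a Sunday should not count, and the CASE becomes unnecessary, because `8 - DAYOFWEEK(start_date)` already evaluates to 7 for a Sunday. A minimal plain-Python sketch of that variant (`nth_sunday_strictly_after` is a hypothetical name):

```python
from datetime import date, timedelta


def nth_sunday_strictly_after(start_date: date, n: int) -> date:
    # SQL-style weekday number: 1 = Sunday ... 7 = Saturday
    weekday_num = start_date.isoweekday() % 7 + 1
    # No CASE needed: a Sunday start gives 8 - 1 = 7, i.e. the following Sunday
    days_to_next_sunday = 8 - weekday_num
    return start_date + timedelta(days=days_to_next_sunday + (n - 1) * 7)


print(nth_sunday_strictly_after(date(2022, 1, 2), 1))  # 2022-01-09
print(nth_sunday_strictly_after(date(2022, 1, 1), 3))  # 2022-01-16
```

For non-Sunday start dates both readings give the same answer. In SQL the strictly-after variant would be `DATE_ADD(start_date, 8 - DAYOFWEEK(start_date) + (n - 1) * 7)`, assuming Spark SQL's `DATE_ADD(date, days)` form.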

---

# ✅ **3) Plain English explanation of the problem**

You choose a date, for example 1st January 2022.
You choose a number `n`, like 3.

Now you want to know:
👉 *What is the date of the 3rd Sunday on or after Jan 1, 2022?*

You first find the next Sunday.
Then you keep adding 7 days until the nth Sunday arrives.
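Jan 1, 2022 is a Saturday, so the next Sunday is Jan 2 and the 3rd Sunday is Jan 2 + 14 days = Jan 16, 2022. A minimal plain-Python sketch that follows the two sentences above literally (the helper name `nth_sunday_by_stepping` is hypothetical; Python's `date.weekday()` numbers Monday = 0 … Sunday = 6):

```python
from datetime import date, timedelta


def nth_sunday_by_stepping(start_date: date, n: int) -> date:
    """Literal version of the plain-English steps (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    d = start_date
    # First find the next Sunday (or stay put if start_date is a Sunday)
    while d.weekday() != 6:
        d += timedelta(days=1)
    # Then keep adding 7 days until the nth Sunday arrives
    for _ in range(n - 1):
        d += timedelta(days=7)
    return d


print(nth_sunday_by_stepping(date(2022, 1, 1), 3))  # 2022-01-16
```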

---

# ✅ **4) Why this problem matters & what concepts it tests**

This is a very realistic date-manipulation SQL problem.

**It tests your ability to:**

* Work with *date arithmetic* (`DATE_ADD(date, days)` in Spark SQL / Databricks; MySQL requires `DATE_ADD(date, INTERVAL days DAY)`).
* Understand *DAYOFWEEK()* (1 = Sunday, 2 = Monday, etc.).
* Use *CASE statements* to handle special conditions.
* Use *week calculations* with multiples of 7.
* Build logic in modular steps using *CTEs*.

This is common in scheduling, subscription billing, forecasting, and calendar-related analytics.

---

# ✅ **5) How to logically think about solving this (general methodology)**

When solving nth-weekday date questions, think like this:

### **Step 1: Understand the weekday of the starting date**

Use a weekday function such as `DAYOFWEEK(date)` or `WEEKDAY(date)`, and check its numbering convention first: the two functions number the days differently.
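`DAYOFWEEK()` returns 1 = Sunday … 7 = Saturday, while `WEEKDAY()` in Spark SQL and MySQL returns 0 = Monday … 6 = Sunday. A small plain-Python sketch that prints both conventions side by side for one week, assuming Python's `isoweekday() % 7 + 1` reproduces `DAYOFWEEK()` and `date.weekday()` reproduces `WEEKDAY()`:

```python
from datetime import date, timedelta

# One full week, starting on Sunday 2022-01-02
week = [date(2022, 1, 2) + timedelta(days=i) for i in range(7)]

rows = []
for d in week:
    sql_dayofweek = d.isoweekday() % 7 + 1  # DAYOFWEEK(): 1 = Sunday ... 7 = Saturday
    sql_weekday = d.weekday()               # WEEKDAY():   0 = Monday ... 6 = Sunday
    rows.append((d, sql_dayofweek, sql_weekday))
    print(d, d.strftime("%A"), sql_dayofweek, sql_weekday)
```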

### **Step 2: Determine how many days to reach the next target weekday**

If the start date is already the target weekday, handle it specially: the offset is 0 when the start date itself counts as the first occurrence.
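For Sunday specifically, the special case can also be folded into arithmetic: `(8 - weekday_num) % 7` returns 0 for Sunday and matches `8 - weekday_num` for every other day. A plain-Python sketch comparing both forms over all seven `DAYOFWEEK()` values (function names are hypothetical):

```python
def days_to_next_sunday_case(weekday_num: int) -> int:
    # Mirrors: CASE WHEN weekday_num = 1 THEN 0 ELSE 8 - weekday_num END
    return 0 if weekday_num == 1 else 8 - weekday_num


def days_to_next_sunday_mod(weekday_num: int) -> int:
    # Branch-free form: Sunday gives (8 - 1) % 7 = 0, other days are unchanged
    return (8 - weekday_num) % 7


# DAYOFWEEK() values: 1 = Sunday ... 7 = Saturday
for weekday_num in range(1, 8):
    print(weekday_num, days_to_next_sunday_case(weekday_num), days_to_next_sunday_mod(weekday_num))
```

Because `8 - weekday_num` is always positive here, `%` behaves the same in SQL engines as in Python, so `(8 - DAYOFWEEK(start_date)) % 7` could replace the CASE; the CASE version is arguably easier to read.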

### **Step 3: Add the offset to get the first occurrence**

For a Sunday, this may be between 0 and 6 days away.

### **Step 4: Add (n-1) * 7 days**

Because each additional Sunday is exactly 7 days apart.

### **Step 5: Return the computed date**

That is your nth Sunday.
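Steps 3 and 4 can also produce every Sunday up to the nth one, which is handy for checking the result by eye. A minimal plain-Python sketch under the same on-or-after assumption (`first_n_sundays` is a hypothetical name):

```python
from datetime import date, timedelta
from typing import List


def first_n_sundays(start_date: date, n: int) -> List[date]:
    """Hypothetical helper: the 1st through nth Sundays on or after start_date."""
    weekday_num = start_date.isoweekday() % 7 + 1        # step 1: DAYOFWEEK-style, 1 = Sunday
    offset = 0 if weekday_num == 1 else 8 - weekday_num  # step 2: days to the first Sunday
    first = start_date + timedelta(days=offset)          # step 3: first occurrence
    # step 4: each later Sunday is exactly 7 days after the previous one
    return [first + timedelta(weeks=k) for k in range(n)]


sundays = first_n_sundays(date(2022, 1, 1), 3)
print(sundays[-1])  # step 5: the nth Sunday, 2022-01-16
```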

---

# ✅ **6) Line-by-line breakdown of the SQL**

Given SQL:

```sql
WITH params AS (
  SELECT DATE('2022-01-01') AS start_date, 3 AS n
),
calc AS (
  SELECT 
    start_date,
    n,
    DAYOFWEEK(start_date) AS weekday_num
  FROM params
),
next_sunday AS (
  SELECT
    start_date,
    n,
    CASE 
      WHEN DAYOFWEEK(start_date) = 1 THEN 0  -- Already Sunday
      ELSE 8 - DAYOFWEEK(start_date)
    END AS days_to_next_sunday
  FROM calc
)
SELECT 
  start_date,
  n,
  DATE_ADD(start_date, days_to_next_sunday + ((n - 1) * 7)) AS nth_sunday_date
FROM next_sunday;
```

### **CTE 1: `params`**

Defines inputs:

* `start_date` = `'2022-01-01'`
* `n` = 3

### **CTE 2: `calc`**

Computes:

* The weekday number of the start date.
  `DAYOFWEEK()` returns 1 = Sunday through 7 = Saturday.

Example:
1 Jan 2022 → Saturday → 7

### **CTE 3: `next_sunday`**

Computes `days_to_next_sunday`.

Formula:

* If the date is already Sunday → 0 days.
* Otherwise → `8 - weekday_num`.

Example for Saturday:
`8 - 7 = 1` (next Sunday is 1 day away)

### **Final SELECT**

Computes the nth Sunday:

```
DATE_ADD(start_date, days_to_next_sunday + ((n - 1) * 7))
```

Meaning:

* Add `days_to_next_sunday` to land on the first Sunday.
* Then add 7 days for each additional Sunday (n-1 times).
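Plugging the sample inputs into that expression (start_date = 2022-01-01, a Saturday, so `DAYOFWEEK` returns 7; n = 3):

$$
\begin{aligned}
\text{days\_to\_next\_sunday} &= 8 - 7 = 1 \\
\text{offset} &= 1 + (3 - 1)\cdot 7 = 15 \\
\text{nth\_sunday\_date} &= \text{2022-01-01} + 15\ \text{days} = \text{2022-01-16}
\end{aligned}
$$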

---

# ✅ **7) Rewritten SQL (clean, readable, fully commented)**

```sql
-- Find the nth Sunday on or after a given start date
WITH params AS (
  -- Inputs
  SELECT 
    DATE('2022-01-01') AS start_date,
    3 AS n
),

weekday_calc AS (
  -- Determine day of week (1 = Sunday, 2 = Monday, …)
  SELECT 
    start_date,
    n,
    DAYOFWEEK(start_date) AS weekday_num
  FROM params
),

next_sunday_calc AS (
  -- Compute how many days until the next Sunday
  SELECT 
    start_date,
    n,
    CASE 
      WHEN weekday_num = 1 THEN 0       -- already Sunday
      ELSE 8 - weekday_num              -- days until next Sunday
    END AS days_to_next_sunday
  FROM weekday_calc
)

-- Compute the nth Sunday on or after the start date
SELECT 
  start_date,
  n,
  DATE_ADD(
    start_date, 
    days_to_next_sunday + ((n - 1) * 7)
  ) AS nth_sunday
FROM next_sunday_calc;
```

---

# ✅ **8) Equivalent PySpark DataFrame code**

```python
from pyspark.sql import functions as F

# 1) Inputs
params_df = spark.createDataFrame(
    [("2022-01-01", 3)],
    ["start_date", "n"]
).withColumn("start_date", F.to_date("start_date"))

# 2) Add weekday number
calc_df = params_df.withColumn(
    "weekday_num", 
    F.dayofweek("start_date")
)

# 3) Days to next Sunday
next_sunday_df = calc_df.withColumn(
    "days_to_next_sunday",
    F.when(F.col("weekday_num") == 1, 0)
     .otherwise(8 - F.col("weekday_num"))
)

# 4) Compute nth Sunday
result_df = next_sunday_df.withColumn(
    "nth_sunday_date",
    F.date_add(
        "start_date",
        # Cast to INT: date_add expects an integer day count, but n is inferred as BIGINT
        (F.col("days_to_next_sunday") + (F.col("n") - 1) * 7).cast("int")
    )
)

result_df.show(truncate=False)
```

---

# ✅ **9) Line-by-line explanation of the PySpark code**

### **params_df**

Creates the input values.

### **weekday_num**

`dayofweek()` returns 1–7, with 1 = Sunday and 7 = Saturday.

### **days_to_next_sunday**

Same logic as the SQL CASE:

* If already Sunday → 0 days
* Else → `8 - weekday_num`

### **nth_sunday_date**

Computes:

```
start_date + days_to_next_sunday + (n-1)*7
```

Which matches the SQL exactly.
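To check that claim empirically without a Spark cluster, a plain-Python sketch can compare the closed-form formula (the same arithmetic the SQL and PySpark code use) against a brute-force count of Sundays for every start date in 2022 and n = 1 to 5. Both helper names are hypothetical, and this tests the formula, not Spark itself:

```python
from datetime import date, timedelta


def nth_sunday_formula(start_date: date, n: int) -> date:
    # Same arithmetic as the SQL/PySpark code: DAYOFWEEK-style number, CASE, DATE_ADD
    weekday_num = start_date.isoweekday() % 7 + 1
    days_to_next_sunday = 0 if weekday_num == 1 else 8 - weekday_num
    return start_date + timedelta(days=days_to_next_sunday + (n - 1) * 7)


def nth_sunday_brute_force(start_date: date, n: int) -> date:
    # Reference answer: walk forward one day at a time and count Sundays
    d, seen = start_date, 0
    while True:
        if d.weekday() == 6:  # Python's weekday(): Sunday = 6
            seen += 1
            if seen == n:
                return d
        d += timedelta(days=1)


starts = [date(2022, 1, 1) + timedelta(days=i) for i in range(365)]
mismatches = [
    (start, n)
    for start in starts
    for n in range(1, 6)
    if nth_sunday_formula(start, n) != nth_sunday_brute_force(start, n)
]
print(f"{len(mismatches)} mismatches")  # 0 mismatches
```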

---

# ✅ **10) How thinking differs in SQL vs PySpark**

| SQL Thinking                                      | PySpark Thinking                                                      |
| ------------------------------------------------- | --------------------------------------------------------------------- |
| Write each step declaratively as a CTE            | Build a transformation pipeline step-by-step                          |
| Built-in date functions (`DAYOFWEEK`, `DATE_ADD`) | The same functions via `pyspark.sql.functions` (`F.dayofweek`, `F.date_add`) |
| `CASE WHEN ... THEN ... ELSE ... END`             | `F.when(...).otherwise(...)`                                          |
| Arithmetic written inline in SQL expressions      | Arithmetic on `Column` objects (for example `F.col("n") - 1`)         |
| CTEs represent logical steps                      | Intermediate DataFrames represent steps                               |

But the **logic stays identical**.

---

# ✅ **11) Hints for solving (3 basic + 3 advanced)**

### **Basic Hints**

1. Compute the weekday of the start date first.
2. Determine how many days until the next Sunday.
3. Add 7 days repeatedly to reach the nth Sunday.

### **Advanced Hints**

1. Use a CASE (or `when`) expression to handle the “already Sunday” case.
2. Use `(n-1) * 7` because you only add full weeks *after* the first Sunday.
3. `DAYOFWEEK()` gives 1 = Sunday; use that to compute offset easily.

---

# ✅ **12) Final SQL & PySpark solutions (clean)**

### **FINAL SQL**

```sql
WITH params AS (
  SELECT DATE('2022-01-01') AS start_date, 3 AS n
),

weekday_calc AS (
  SELECT 
    start_date,
    n,
    DAYOFWEEK(start_date) AS weekday_num
  FROM params
),

next_sunday_calc AS (
  SELECT 
    start_date,
    n,
    CASE 
      WHEN weekday_num = 1 THEN 0
      ELSE 8 - weekday_num
    END AS days_to_next_sunday
  FROM weekday_calc
)

SELECT 
  start_date,
  n,
  DATE_ADD(start_date, days_to_next_sunday + ((n - 1) * 7)) AS nth_sunday
FROM next_sunday_calc;
```

---

### **FINAL PYSPARK**

```python
from pyspark.sql import functions as F

params_df = spark.createDataFrame(
    [("2022-01-01", 3)],
    ["start_date", "n"]
).withColumn("start_date", F.to_date("start_date"))

calc_df = params_df.withColumn(
    "weekday_num",
    F.dayofweek("start_date")
)

next_sunday_df = calc_df.withColumn(
    "days_to_next_sunday",
    F.when(F.col("weekday_num") == 1, 0)
     .otherwise(8 - F.col("weekday_num"))
)

result_df = next_sunday_df.withColumn(
    "nth_sunday_date",
    F.date_add(
        "start_date",
        (F.col("days_to_next_sunday") + (F.col("n") - 1) * 7).cast("int")
    )
)

result_df.show(truncate=False)
```

---

# ✅ **13) Final learning takeaway**

To solve any “nth weekday on or after a given date” problem:

1. Convert the date into a weekday number
2. Calculate how far away the next target weekday is
3. Add `(n-1) * 7` to jump to the nth occurrence
4. Use date arithmetic functions to compute final date

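You can repeat this pattern for any single target weekday:

* nth Monday
* nth Friday
* nth occurrence of ANY weekday

An “nth weekend day” or “nth business day” needs extra logic, because those days are not spaced exactly 7 days apart (and business days usually also depend on a holiday calendar).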
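For a single target weekday, steps 1-4 above collapse into one offset formula, `(target - start) mod 7`. A minimal plain-Python sketch, assuming the SQL `DAYOFWEEK()` numbering (1 = Sunday … 7 = Saturday) for the hypothetical `target_dow` parameter:

```python
from datetime import date, timedelta


def nth_weekday_on_or_after(start_date: date, target_dow: int, n: int) -> date:
    """Hypothetical helper; target_dow uses DAYOFWEEK numbering (1 = Sunday ... 7 = Saturday)."""
    if not 1 <= target_dow <= 7:
        raise ValueError("target_dow must be in 1..7")
    if n < 1:
        raise ValueError("n must be >= 1")
    start_dow = start_date.isoweekday() % 7 + 1               # step 1: weekday number
    offset = (target_dow - start_dow) % 7                     # step 2: 0..6 days to the target
    return start_date + timedelta(days=offset + (n - 1) * 7)  # steps 3-4


print(nth_weekday_on_or_after(date(2022, 1, 1), 1, 3))  # 3rd Sunday -> 2022-01-16
print(nth_weekday_on_or_after(date(2022, 1, 1), 2, 1))  # 1st Monday -> 2022-01-03
print(nth_weekday_on_or_after(date(2022, 1, 1), 6, 2))  # 2nd Friday -> 2022-01-14
```

Python's `%` returns a non-negative result here. If you port the formula to SQL, note that `%` keeps the sign of the left operand in Spark SQL and MySQL, so write `(target_dow - DAYOFWEEK(start_date) + 7) % 7`, or use `pmod(target_dow - DAYOFWEEK(start_date), 7)` in Spark SQL.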

---