
---

# **Problem: Find COVID Recovery Patients**

You have two tables:

### Table: `covid_tests`

| Column      | Type    | Description                                             |
| ----------- | ------- | ------------------------------------------------------- |
| test\_id    | int     | Unique ID of the test (chronological order)             |
| patient\_id | int     | Patient's unique ID                                     |
| test\_date  | date    | Date when test was taken                                |
| result      | varchar | Result of test: 'Positive', 'Negative', or 'Inconclusive' |

* We **ignore** tests with `'Inconclusive'` results in this problem.

---

### Table: `patients`

| Column        | Type    | Description       |
| ------------- | ------- | ----------------- |
| patient\_id   | int     | Unique patient ID |
| patient\_name | varchar | Patient’s name    |
| age           | int     | Patient’s age     |

---

# **Goal**

For each patient who tested **Positive** at least once and then later tested **Negative**, find:

* The **first date** they tested positive.
* The **first date after that** they tested negative.
* The number of days between these two tests (their **recovery time**).

Return the following columns:

* `patient_id`
* `patient_name`
* `age`
* `recovery_time` (days between the first positive test and the first negative test after it)

Order the results by recovery time (smallest first), then by patient name.

---

# **Example Data**

### covid\_tests

| test\_id | patient\_id | test\_date | result       |
| -------- | ----------- | ---------- | ------------ |
| 1        | 101         | 2020-05-01 | Positive     |
| 2        | 101         | 2020-05-10 | Negative     |
| 3        | 102         | 2020-05-03 | Positive     |
| 4        | 102         | 2020-05-05 | Positive     |
| 5        | 102         | 2020-05-15 | Negative     |
| 6        | 103         | 2020-05-02 | Negative     |
| 7        | 103         | 2020-05-08 | Positive     |
| 8        | 103         | 2020-05-18 | Negative     |
| 9        | 104         | 2020-05-05 | Positive     |
| 10       | 104         | 2020-05-06 | Inconclusive |

---

### patients

| patient\_id | patient\_name | age |
| ----------- | ------------- | --- |
| 101         | Alice         | 28  |
| 102         | Bob           | 35  |
| 103         | Charlie       | 40  |
| 104         | Diana         | 30  |

---

# **Expected Output**

| patient\_id | patient\_name | age | recovery\_time |
| ----------- | ------------- | --- | -------------- |
| 101         | Alice         | 28  | 9              |
| 103         | Charlie       | 40  | 10             |
| 102         | Bob           | 35  | 12             |

(Note: Diana is excluded because she has no negative test after her positive one; her only other test is neither positive nor negative, so it is ignored.)
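Each `recovery_time` in the expected output is simply the day difference between a patient's first positive test and the first negative test after it. Bob's second positive (2020-05-05) does not move Bob's start date, and Charlie's negative on 2020-05-02 does not count because it comes before Charlie's first positive:

$$
\begin{aligned}
\text{Alice (101):}\quad & \text{2020-05-10} - \text{2020-05-01} = 9 \text{ days} \\
\text{Charlie (103):}\quad & \text{2020-05-18} - \text{2020-05-08} = 10 \text{ days} \\
\text{Bob (102):}\quad & \text{2020-05-15} - \text{2020-05-03} = 12 \text{ days}
\end{aligned}
$$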

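---

### Solution using CTEs

The queries in this write-up use SQL Server's `DATEDIFF(day, start_date, end_date)`. In MySQL the equivalent is `DATEDIFF(end_date, start_date)`, and in PostgreSQL subtracting one `date` from another returns the number of days directly.
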
```sql
WITH first_positive AS
(
    -- Earliest positive test per patient
    SELECT patient_id, MIN(test_date) AS first_positive_date
    FROM covid_tests
    WHERE result = 'Positive'
    GROUP BY patient_id
),
first_negative AS
(
    -- Earliest negative test strictly after that patient's first positive
    SELECT c.patient_id, MIN(c.test_date) AS first_negative_date
    FROM covid_tests c
    INNER JOIN first_positive fp ON c.patient_id = fp.patient_id
    WHERE c.test_date > fp.first_positive_date
      AND c.result = 'Negative'
    GROUP BY c.patient_id
)
SELECT
  p.patient_id,
  p.patient_name,
  p.age,
  -- SQL Server syntax: whole days from first positive to first later negative
  DATEDIFF(day, fp.first_positive_date, fn.first_negative_date) AS recovery_time
FROM first_positive fp
JOIN first_negative fn ON fp.patient_id = fn.patient_id  -- drops patients with no later negative
JOIN patients p ON p.patient_id = fp.patient_id
ORDER BY recovery_time, p.patient_name;
```

### Possible optimization points

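1. **Joining `covid_tests` to itself?**
   Not directly: `first_negative` joins `covid_tests` to the `first_positive` CTE, which holds at most one row per patient, rather than to a second full copy of the table. Row-by-row self-joins on large tables often hurt performance, and this design avoids one, although `covid_tests` is still read at least twice (once in each CTE).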

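2. **Indexes:**
   A composite index on `covid_tests` that starts with `patient_id` keeps each patient's tests together. `(patient_id, result, test_date)` lets the engine seek directly to one patient's positive or negative tests in date order, which suits both `MIN(test_date)` lookups; `(patient_id, test_date, result)` also covers the query, but `result` is then checked row by row instead of being used to seek.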

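3. **Avoid repeated scans:**
   On a large table, splitting the logic into several CTEs can mean scanning `covid_tests` more than once. Engines treat CTEs differently: SQL Server always inlines them into the main query, while PostgreSQL 12 and later inlines a CTE referenced only once and, by default, materializes one referenced more than once (such as `first_positive`). If the CTEs end up rescanning the base table, rewriting the logic as a single query with window functions might be faster.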

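4. **Window functions alternative:**
   Window functions such as `ROW_NUMBER()` or `MIN() OVER (PARTITION BY ...)` can replace some of the grouping and joining, which is sometimes faster and often clearer. (PostgreSQL's aggregate `FILTER (WHERE ...)` clause is a related tool for conditional aggregation, but it is not a window function and SQL Server does not support it.)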

Example:

```sql
WITH positives AS (
  -- Number each patient's positive tests by date; pos_rank = 1 is the first positive
  SELECT
    patient_id, test_date,
    ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY test_date) AS pos_rank
  FROM covid_tests
  WHERE result = 'Positive'
),
negatives AS (
  -- Number only the negatives that come after the first positive;
  -- neg_rank = 1 is the first such negative
  SELECT
    n.patient_id,
    p.test_date AS positive_date,
    n.test_date AS negative_date,
    ROW_NUMBER() OVER (PARTITION BY n.patient_id ORDER BY n.test_date) AS neg_rank
  FROM covid_tests n
  JOIN positives p
    ON p.patient_id = n.patient_id
   AND p.pos_rank = 1
   AND n.test_date > p.test_date
  WHERE n.result = 'Negative'
)
SELECT
  n.patient_id,
  pat.patient_name,
  pat.age,
  DATEDIFF(day, n.positive_date, n.negative_date) AS recovery_time
FROM negatives n
JOIN patients pat ON n.patient_id = pat.patient_id
WHERE n.neg_rank = 1
ORDER BY recovery_time, pat.patient_name;
```

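*Note:* This version picks each patient's first positive with `ROW_NUMBER()` instead of `GROUP BY`, but it still reads `covid_tests` more than once, so it is not automatically faster than the CTE version. Compare the execution plans on realistic data.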
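A variant that references `covid_tests` only once is sketched below. It is a swapped-in technique rather than part of the original solution, written in SQL Server syntax like the other queries: a windowed `MIN(CASE ...) OVER (PARTITION BY patient_id)` copies each patient's first positive date onto every one of that patient's rows, so a single `GROUP BY` can then pick the earliest later negative. The CTE name `tests` is arbitrary. Traced by hand on the example data, it returns the same rows as the expected output (Alice 9, Charlie 10, Bob 12).

```sql
-- Single reference to covid_tests (SQL Server syntax); CTE name "tests" is arbitrary
WITH tests AS (
  SELECT
    patient_id,
    test_date,
    result,
    -- First positive date for the patient, repeated on each of their rows.
    -- CASE yields NULL for non-positive rows, and MIN ignores NULLs.
    MIN(CASE WHEN result = 'Positive' THEN test_date END)
      OVER (PARTITION BY patient_id) AS first_positive_date
  FROM covid_tests
  WHERE result IN ('Positive', 'Negative')   -- skip tests that are neither
)
SELECT
  t.patient_id,
  p.patient_name,
  p.age,
  -- earliest negative after the first positive, minus the first positive
  DATEDIFF(day, t.first_positive_date, MIN(t.test_date)) AS recovery_time
FROM tests t
JOIN patients p ON p.patient_id = t.patient_id
WHERE t.result = 'Negative'
  AND t.test_date > t.first_positive_date   -- never-positive patients (NULL) drop out
GROUP BY t.patient_id, p.patient_name, p.age, t.first_positive_date
ORDER BY recovery_time, p.patient_name;
```

Reading the table once does not guarantee a faster plan: a partitioned window aggregate often adds a sort and a spool, so it is worth checking the execution plan against the CTE version.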

---



If you want to squeeze out more performance or your dataset is huge, explore:

* Window functions
* Proper indexing
* Execution plans to spot slow parts
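For the indexing point, a composite index on `covid_tests` that leads with `patient_id` might look like the sketch below. The index name is hypothetical, and the column order `(patient_id, result, test_date)` is an assumption: it lets the engine seek to one patient's positive or negative tests already sorted by date. Plain `CREATE INDEX` syntax like this is accepted by SQL Server, PostgreSQL, MySQL, and SQLite.

```sql
-- Hypothetical index name; column order (patient_id, result, test_date) is an assumption
CREATE INDEX ix_covid_tests_patient_result_date
    ON covid_tests (patient_id, result, test_date);
```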

