---

## ✅ Problem Statement

You are given two tables:

* `Trip(driver_id, trip_date, distance, fuel_consumption)`
* `Driver(driver_id, driver_name)`

Each row in the `Trip` table records a trip taken by a driver, with the distance traveled and fuel consumed.

Your task is to:

1. Calculate the **fuel efficiency** of each trip as:
   `fuel_efficiency = distance / fuel_consumption`
2. For each driver:

   * Calculate their **average fuel efficiency** for:

     * First half of the year (Jan–Jun)
     * Second half of the year (Jul–Dec)
3. Find and display the drivers whose **second half average fuel efficiency is higher** than the first half.
4. Show the amount of **efficiency improvement** and sort:

   * First by the **most improved** (descending difference)
   * Then by **driver name** (ascending)

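The four steps above can be sketched in plain Python as a reference for cross-checking any SQL solution. This is a minimal sketch under stated assumptions, not the intended answer: the function name `improved_drivers`, the tuple layout of `trips`, and the `drivers` dictionary are all hypothetical, trip dates are assumed to be `datetime.date` values, and trips with zero fuel consumption are skipped by choice.

```python
from collections import defaultdict
from datetime import date


def improved_drivers(trips, drivers):
    """Return (driver_id, name, first_avg, second_avg, improvement) rows.

    trips   -- iterable of (driver_id, trip_date, distance, fuel_consumption)
    drivers -- dict mapping driver_id -> driver_name
    Both input shapes are assumptions made for this sketch.
    """
    # Step 1: per-trip efficiency, bucketed by driver and half of the year.
    buckets = defaultdict(lambda: {1: [], 2: []})
    for driver_id, trip_date, distance, fuel in trips:
        if fuel == 0:
            continue  # assumption: ignore trips with no recorded fuel use
        half = 1 if trip_date.month <= 6 else 2
        buckets[driver_id][half].append(distance / fuel)

    rows = []
    for driver_id, halves in buckets.items():
        # Step 2: a driver needs at least one trip in each half.
        if not halves[1] or not halves[2]:
            continue
        first_avg = sum(halves[1]) / len(halves[1])
        second_avg = sum(halves[2]) / len(halves[2])
        # Step 3: keep only drivers whose second half is strictly better.
        if second_avg > first_avg:
            rows.append((
                driver_id,
                drivers.get(driver_id, ""),  # empty name if driver is unknown
                round(first_avg, 2),
                round(second_avg, 2),
                round(second_avg - first_avg, 2),
            ))

    # Step 4: largest improvement first, then driver name ascending.
    rows.sort(key=lambda r: (-r[4], r[1]))
    return rows


if __name__ == "__main__":
    sample = [
        (1, date(2025, 1, 10), 100, 10),
        (1, date(2025, 8, 10), 100, 5),
        (2, date(2025, 2, 10), 200, 20),  # no second-half trip, so excluded
    ]
    print(improved_drivers(sample, {1: "Alice", 2: "Bob"}))
```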
---

## 📥 Example Input

### Table: `Trip`

| driver\_id | trip\_date | distance | fuel\_consumption |
| ---------- | ---------- | -------- | ----------------- |
| 1          | 2025-01-10 | 100      | 10                |
| 1          | 2025-03-20 | 200      | 20                |
| 1          | 2025-07-10 | 150      | 10                |
| 1          | 2025-08-15 | 100      | 5                 |
| 2          | 2025-04-15 | 180      | 20                |
| 2          | 2025-11-01 | 100      | 10                |
| 3          | 2025-09-01 | 200      | 25                |

### Table: `Driver`

| driver\_id | driver\_name |
| ---------- | ------------ |
| 1          | Alice        |
| 2          | Bob          |
| 3          | Charlie      |

---

## 📤 Expected Output

| driver\_id | driver\_name | first\_half\_avg | second\_half\_avg | efficiency\_improvement |
| ---------- | ------------ | ---------------- | ----------------- | ----------------------- |
| 1          | Alice        | 10.00            | 17.50             | 7.50                    |
| 2          | Bob          | 9.00             | 10.00             | 1.00                    |

> ✅ `Charlie` is excluded — no first-half trip, so no comparison
> ✅ Fuel efficiency is better when **higher**
> ✅ `efficiency_improvement = second_half_avg - first_half_avg`

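To check the expected output above end to end, the example tables can be loaded into an in-memory SQLite database from Python. This is a sketch under assumptions rather than a reference setup: the column types are guesses (the source does not give a schema), `REAL` is chosen so that division is not truncated, and because SQLite has no `EXTRACT(MONTH FROM ...)`, the month is taken with `strftime('%m', ...)` instead.

```python
import sqlite3

# In-memory database. Column types are assumptions; REAL avoids integer division.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Driver (driver_id INTEGER PRIMARY KEY, driver_name TEXT);
    CREATE TABLE Trip (
        driver_id INTEGER,
        trip_date TEXT,          -- ISO 8601 dates such as '2025-01-10'
        distance REAL,
        fuel_consumption REAL
    );
    INSERT INTO Driver VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO Trip VALUES
        (1, '2025-01-10', 100, 10),
        (1, '2025-03-20', 200, 20),
        (1, '2025-07-10', 150, 10),
        (1, '2025-08-15', 100, 5),
        (2, '2025-04-15', 180, 20),
        (2, '2025-11-01', 100, 10),
        (3, '2025-09-01', 200, 25);
""")

# SQLite translation of the task: conditional averages per half,
# then keep only drivers whose second half is higher.
QUERY = """
    SELECT driver_id, driver_name, first_half_avg, second_half_avg,
           ROUND(second_half_avg - first_half_avg, 2) AS efficiency_improvement
    FROM (
        SELECT t.driver_id,
               d.driver_name,
               ROUND(AVG(CASE WHEN CAST(strftime('%m', t.trip_date) AS INTEGER) <= 6
                              THEN t.distance / t.fuel_consumption END), 2) AS first_half_avg,
               ROUND(AVG(CASE WHEN CAST(strftime('%m', t.trip_date) AS INTEGER) >= 7
                              THEN t.distance / t.fuel_consumption END), 2) AS second_half_avg
        FROM Trip t
        LEFT JOIN Driver d ON d.driver_id = t.driver_id
        GROUP BY t.driver_id, d.driver_name
    )
    WHERE second_half_avg > first_half_avg
    ORDER BY efficiency_improvement DESC, driver_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# (1, 'Alice', 10.0, 17.5, 7.5)
# (2, 'Bob', 9.0, 10.0, 1.0)
```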
---

## 🧪 Sample Test Cases

---

### 🔹 **Test Case 1: Both halves present**

```sql
Trip:
| driver_id | trip_date  | distance | fuel_consumption |
|-----------|------------|----------|------------------|
| 1         | 2025-01-10 | 100      | 10               |
| 1         | 2025-08-10 | 100      | 5                |

Driver:
| driver_id | driver_name |
|-----------|-------------|
| 1         | Alice       |
```

**Output:**

| driver\_id | driver\_name | first\_half\_avg | second\_half\_avg | efficiency\_improvement |
| ---------- | ------------ | ---------------- | ----------------- | ----------------------- |
| 1          | Alice        | 10.00            | 20.00             | 10.00                   |

---

### 🔹 **Test Case 2: No second half trip**

```sql
Trip:
| driver_id | trip_date  | distance | fuel_consumption |
|-----------|------------|----------|------------------|
| 2         | 2025-02-10 | 200      | 20               |

Driver:
| driver_id | driver_name |
|-----------|-------------|
| 2         | Bob         |
```

**Output:** *(Empty)* – The driver has no second-half trip, so there is nothing to compare.

---

### 🔹 **Test Case 3: Efficiency reduced in second half**

```sql
Trip:
| driver_id | trip_date  | distance | fuel_consumption |
|-----------|------------|----------|------------------|
| 3         | 2025-01-10 | 100      | 5                |
| 3         | 2025-10-10 | 100      | 10               |

Driver:
| driver_id | driver_name |
|-----------|-------------|
| 3         | Charlie     |
```

**Output:** *(Empty)* – Efficiency decreased, not selected.

---

## ✅ Notes:

* `fuel_efficiency = distance / fuel_consumption` — higher is better.
* Drivers **must** have data in both halves to qualify.
* Only drivers whose average fuel efficiency **increased** from the first half to the second are returned (an increase may reflect better driving habits or vehicle condition).

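Two practical pitfalls sit behind the `distance / fuel_consumption` formula: integer division and division by zero. The sketch below uses SQLite from Python only because it is easy to run; it is an illustration, not a statement about every engine. PostgreSQL, for example, also truncates integer division and raises an error on division by zero, while other engines may behave differently. The helper name `scalar` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]


# Integer operands: the fractional part is discarded.
print(scalar("SELECT 150 / 20"))            # 7

# Forcing one operand to a real number keeps the fraction.
print(scalar("SELECT 150 * 1.0 / 20"))      # 7.5

# NULLIF(x, 0) turns a zero divisor into NULL, so the quotient is NULL
# (None in Python) instead of an error or a misleading value.
print(scalar("SELECT 100 / NULLIF(0, 0)"))  # None

# AVG skips NULLs, so such a trip simply drops out of the average.
print(scalar("SELECT AVG(x) FROM (SELECT 10.0 AS x UNION ALL SELECT NULL)"))  # 10.0
```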


In [None]:
WITH CTE1 AS (
    -- Per-trip fuel efficiency
    SELECT 
        *,
        distance / fuel_consumption AS fuel_efficiency
    FROM 
        Trip
),
CTE2 AS (
    -- Per-driver averages for each half of the year. The CASE sits inside AVG
    -- so each average only sees trips from its own half; a half with no trips
    -- yields NULL.
    SELECT 
        c.driver_id, 
        d.driver_name,
        ROUND(AVG(CASE WHEN EXTRACT(MONTH FROM c.trip_date) BETWEEN 1 AND 6 
                       THEN c.fuel_efficiency END), 2) AS first_half_avg,
        ROUND(AVG(CASE WHEN EXTRACT(MONTH FROM c.trip_date) BETWEEN 7 AND 12 
                       THEN c.fuel_efficiency END), 2) AS second_half_avg
    FROM CTE1 c
    LEFT JOIN Driver d
        ON c.driver_id = d.driver_id
    GROUP BY c.driver_id, d.driver_name
)
-- NULL > x is never true, so drivers missing either half are filtered out here.
SELECT *, second_half_avg - first_half_avg AS efficiency_improvement
FROM CTE2 
WHERE second_half_avg > first_half_avg
ORDER BY
    efficiency_improvement DESC, driver_name;

### With a single CTE

In [None]:
WITH FuelEfficiencyByDriver AS (
    SELECT 
        t.driver_id,
        d.driver_name,
        EXTRACT(MONTH FROM t.trip_date) AS trip_month,
        (t.distance / NULLIF(t.fuel_consumption, 0)) AS fuel_efficiency
    FROM 
        Trip t
    LEFT JOIN 
        Driver d ON t.driver_id = d.driver_id
)
SELECT 
    driver_id,
    driver_name,
    ROUND(AVG(CASE WHEN trip_month BETWEEN 1 AND 6 THEN fuel_efficiency END), 2) AS first_half_avg,
    ROUND(AVG(CASE WHEN trip_month BETWEEN 7 AND 12 THEN fuel_efficiency END), 2) AS second_half_avg,
    ROUND(
        AVG(CASE WHEN trip_month BETWEEN 7 AND 12 THEN fuel_efficiency END) -
        AVG(CASE WHEN trip_month BETWEEN 1 AND 6 THEN fuel_efficiency END), 2
    ) AS efficiency_improvement
FROM 
    FuelEfficiencyByDriver
GROUP BY 
    driver_id, driver_name
HAVING 
    AVG(CASE WHEN trip_month BETWEEN 1 AND 6 THEN fuel_efficiency END) IS NOT NULL AND
    AVG(CASE WHEN trip_month BETWEEN 7 AND 12 THEN fuel_efficiency END) IS NOT NULL AND
    AVG(CASE WHEN trip_month BETWEEN 7 AND 12 THEN fuel_efficiency END) > 
    AVG(CASE WHEN trip_month BETWEEN 1 AND 6 THEN fuel_efficiency END)
ORDER BY 
    efficiency_improvement DESC, 
    driver_name;
