## Optimized Query

In [None]:
-- Optimized query
SELECT
    c.Name,
    f.Role,
    SUM(c.hours) AS Total_Tracked_Hours,
    SUM(f.Estimed_Hours) AS Total_Allocated_Hours,
    MAX(c.Date) AS Latest_Date  
FROM
    ClickUp c
JOIN
    Float f
    ON c.Name = f.Name
GROUP BY
    c.Name,
    f.Role
HAVING
    SUM(c.hours) > 100
ORDER BY
    Total_Allocated_Hours DESC;
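
One thing worth checking before relying on these totals: if `ClickUp` has several rows per `Name` (the `Date` column suggests it does), joining on `Name` alone repeats each `Float` row once per matching `ClickUp` row, which inflates `SUM(f.Estimed_Hours)`. The same applies to `SUM(c.hours)` if `Float` has several rows per `Name` and `Role`. The following is a minimal sketch of one way around that, assuming the tables are shaped this way: aggregate each table first, then join the results. The CTE names `tracked` and `allocated` are hypothetical, column names are copied from the query above, and `Float` may need quoting on engines that reserve the word (MySQL, for example).

```sql
-- Sketch only: pre-aggregate each table so the join is one row per Name
-- on the ClickUp side and one row per (Name, Role) on the Float side.
WITH tracked AS (
    SELECT
        Name,
        SUM(hours) AS Total_Tracked_Hours,
        MAX(Date)  AS Latest_Date
    FROM ClickUp
    GROUP BY Name
),
allocated AS (
    SELECT
        Name,
        Role,
        SUM(Estimed_Hours) AS Total_Allocated_Hours
    FROM Float
    GROUP BY Name, Role
)
SELECT
    a.Name,
    a.Role,
    t.Total_Tracked_Hours,
    a.Total_Allocated_Hours,
    t.Latest_Date
FROM tracked t
JOIN allocated a
    ON a.Name = t.Name
-- Same threshold as the HAVING clause, applied to the de-duplicated total
WHERE t.Total_Tracked_Hours > 100
ORDER BY a.Total_Allocated_Hours DESC;
```

If both tables already hold exactly one row per `Name`, the two forms return the same result and the simpler query is fine.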


### Optimizations

In [None]:
MAX(c.Date) AS Latest_Date  

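- Because the `Date` column isn't in the `GROUP BY` clause, selecting it directly is either rejected or returns an arbitrary value, depending on the engine. I replaced it with `MAX(c.Date)`, assuming we want the latest date for each `Name` and `Role`. This keeps the query logically correct without adding another grouping column.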

In [None]:
-- Index on `Name` in ClickUp and Float tables to optimize the JOIN operation
CREATE INDEX idx_clickup_name ON ClickUp(Name);
CREATE INDEX idx_float_name ON Float(Name);

-- Index on `hours` in ClickUp table, intended to support the SUM(c.hours) aggregation used in the HAVING clause
CREATE INDEX idx_clickup_hours ON ClickUp(hours);

- I prefer indexing the `Name` column in both the `ClickUp` and `Float` tables, and the `hours` column in the `ClickUp` table, to make the `JOIN` and the aggregation cheaper, especially with large datasets.

    The `Name` indexes let the database match rows between the `ClickUp` and `Float` tables without scanning either table in full. The index on `hours` is meant to support the `SUM(c.hours)` aggregation. Because the `HAVING` clause filters on the aggregated total rather than on individual rows, how much a single-column `hours` index helps depends on the engine and is worth confirming against the query plan.
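
As a possible alternative to separate `Name` and `hours` indexes on `ClickUp`, a single composite index can cover both the join key and the summed column, so the engine may be able to answer the aggregation from the index alone. This is a sketch under that assumption. The index name `idx_clickup_name_hours` is hypothetical, and whether the planner uses it depends on the engine and the data.

```sql
-- Hypothetical composite index: Name first (join and grouping key),
-- hours second (the column being summed).
CREATE INDEX idx_clickup_name_hours ON ClickUp(Name, hours);

-- Inspect the plan to see which index the engine actually chooses.
EXPLAIN
SELECT
    c.Name,
    SUM(c.hours) AS Total_Tracked_Hours
FROM ClickUp c
GROUP BY c.Name
HAVING SUM(c.hours) > 100;
```

Comparing the `EXPLAIN` output before and after creating the index is the most reliable way to tell whether it is worth the extra write cost.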

In [None]:
-- MySQL-style syntax: the PARTITION BY clause follows the column definitions
CREATE TABLE ClickUp (
    ...
)
PARTITION BY RANGE (YEAR(Date)) (
    PARTITION p_2023 VALUES LESS THAN (2024),
    PARTITION p_2024 VALUES LESS THAN (2025),
    ...
);

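- Partitioning the data by date is especially useful when dealing with large datasets.

    When a query filters on `Date`, the database engine can read only the partitions that match (partition pruning), which makes operations like `SUM` or `JOIN` faster by reducing the amount of data it needs to scan. A query with no `Date` filter, such as the optimized query above, still reads every partition.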
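
To illustrate, here is a minimal sketch of a query shaped so that pruning can apply, assuming MySQL and the yearly partitions defined above. The date range is an arbitrary example, not a requirement from the original query.

```sql
-- The filter is written against the bare Date column, so the engine can
-- map the range onto the YEAR(Date) partitions and skip those outside it.
EXPLAIN
SELECT
    c.Name,
    SUM(c.hours) AS Total_Tracked_Hours
FROM ClickUp c
WHERE c.Date >= '2024-01-01'
  AND c.Date <  '2025-01-01'
GROUP BY c.Name;
```

In MySQL, the `partitions` column of the `EXPLAIN` output lists the partitions that will be read, which makes it easy to check whether pruning took effect.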