🔥 BlazeDB Query Processing

This project implements a basic query processor focused on efficient join evaluation. It extracts selection and join conditions from the WHERE clause and applies each one as early as possible, so that intermediate results stay small.

🔗 Join Condition Extraction

In this project, I extract join conditions from the WHERE clause and evaluate them inside the join operators instead of on the fully joined output. This is the main technique the processor uses to keep intermediate results small and join evaluation fast.

⚙️ How It Works:

🧩 Separation of Conditions: The WHERE clause often contains a mix of predicates. Some of these conditions are used to filter rows based on values from a single table (selection conditions), while others compare columns between two tables (join conditions). Our system analyzes the WHERE clause to differentiate between these two types.
📤 Extraction and Application of Join Conditions: Conditions that involve columns from two different tables are extracted as join conditions. Instead of being evaluated after all rows have been combined, they are attached directly to the join operators. During the join (typically a tuple-nested-loop join in our implementation), each condition is evaluated on the fly as tuples are paired, and only matching pairs are passed up the tree. This avoids materializing a full Cartesian product that would later have to be filtered. A tuple-nested-loop join still compares every pair of input tuples, so the saving is in the size of the intermediate results rather than in the number of comparisons.
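🌳 Left-Deep Join Tree Structure: The join operators are organized as a left-deep tree: the right input of every join is a single base table (its scan, with any selection on that table applied on top), and the left input is the result of the previous join. Joins are performed in the order in which the tables appear in the FROM clause, and each join condition is attached to the first join at which both of its tables are available, so each join operator emits only the tuple combinations that satisfy its associated condition.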
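The three steps above can be sketched in Python. This is a minimal, illustrative model under stated assumptions, not BlazeDB's actual code: the `Condition` class, the operator functions (`scan`, `select`, `nested_loop_join`, `build_plan`) and the sample tables are all hypothetical, tuples are plain dicts keyed by `Table.column`, and constants are limited to integers. Single-table conditions are applied directly above their table's scan, and each two-table condition is attached to the first join in the left-deep tree at which both of its tables are available.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Union
import operator

# Hypothetical predicate model: "Table.col <op> Table.col" or "Table.col <op> int".
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}
Operand = Union[str, int]  # a str is a column reference, an int is a constant
Row = Dict[str, int]


@dataclass(frozen=True)
class Condition:
    left: Operand
    op: str
    right: Operand

    def tables(self) -> set:
        # The table of a column reference is the part before the dot.
        return {o.split(".")[0] for o in (self.left, self.right) if isinstance(o, str)}

    def holds(self, row: Row) -> bool:
        def value(o: Operand) -> int:
            return row[o] if isinstance(o, str) else o
        return OPS[self.op](value(self.left), value(self.right))


def scan(table: str, rows: Iterable[dict]) -> Iterator[Row]:
    # Qualify column names with the table name, e.g. {"A": 1} -> {"Student.A": 1}.
    for r in rows:
        yield {f"{table}.{col}": v for col, v in r.items()}


def select(child: Iterator[Row], conds: List[Condition]) -> Iterator[Row]:
    return (row for row in child if all(c.holds(row) for c in conds))


def nested_loop_join(left: Iterator[Row], right: Iterator[Row],
                     conds: List[Condition]) -> Iterator[Row]:
    inner = list(right)  # buffer the inner input so it can be rescanned per outer tuple
    for l in left:
        for r in inner:
            combined = {**l, **r}
            # Join conditions are checked as each pair is formed; non-matching pairs are dropped.
            if all(c.holds(combined) for c in conds):
                yield combined


def build_plan(from_tables: List[str], where: List[Condition],
               data: Dict[str, List[dict]]) -> Iterator[Row]:
    first = from_tables[0]

    def selections_for(t: str) -> List[Condition]:
        # Single-table conditions sit above that table's scan; constant-only ones above the first scan.
        return [c for c in where if c.tables() == {t} or (t == first and not c.tables())]

    join_conds = [c for c in where if len(c.tables()) == 2]
    plan = select(scan(first, data[first]), selections_for(first))
    available = {first}
    for t in from_tables[1:]:  # left-deep: FROM-clause order, base table on the right
        available.add(t)
        # Attach each join condition to the first join at which both of its tables are available.
        here = [c for c in join_conds if t in c.tables() and c.tables() <= available]
        plan = nested_loop_join(plan, select(scan(t, data[t]), selections_for(t)), here)
    return plan


# Usage on made-up data.
data = {
    "Student": [{"A": 1, "B": 200}, {"A": 2, "B": 200}, {"A": 3, "B": 100}],
    "Enrolled": [{"A": 1, "C": 101}, {"A": 1, "C": 102}, {"A": 3, "C": 101}],
}
where = [Condition("Student.A", "=", "Enrolled.A"), Condition("Student.B", ">", 150)]
result = list(build_plan(["Student", "Enrolled"], where, data))
print([r["Enrolled.C"] for r in result])  # [101, 102]
```

Student 3 is removed by the selection before the join, and Student 2 passes the selection but matches no enrollment, so neither contributes any row to the join output.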

🌟 Benefits:

💨 Reduced Overhead: Because join conditions are checked as tuples are combined, only pairs that satisfy them are emitted, so the full Cartesian product is never materialized or passed to the operators above.
⚡ Optimized Query Execution: Separating single-table predicates from multi-table predicates lets each selection be applied directly above its table's scan and each join condition be evaluated by the join that first brings its tables together, so every operator in the tree processes as few tuples as possible.

🛠️ Query Optimization Rules

To improve query performance and resource efficiency, I employ several optimization strategies. The main techniques implemented are Predicate Pushdown, Projection Pruning, and Selection (Predicate) Combination. Each of these optimizations contributes to reducing the volume and complexity of data processed during query evaluation.

1️⃣ Predicate Pushdown 🔽

Description: Predicate pushdown moves selection conditions (typically derived from the WHERE clause) as close as possible to the data source. This optimization ensures that filtering happens at the earliest stage (e.g., during or immediately after the scan operation), thereby reducing the volume of data that flows through the operator tree.
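Pushdown can also be expressed as a rewrite over a logical plan. The following Python sketch is illustrative only and assumes a hypothetical plan representation (`Scan`, `Select` and `Join` dataclasses, plus a `Cond` that records which tables it references) rather than BlazeDB's real operator classes. A selection sitting above a join moves into whichever input supplies all of its columns, and a condition that needs both inputs becomes a join condition, matching the extraction strategy described earlier.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Cond:
    text: str                # human-readable form, used only for display
    tables: FrozenSet[str]   # tables whose columns the condition references


@dataclass
class Scan:
    table: str


@dataclass
class Select:
    cond: Cond
    child: "Node"


@dataclass
class Join:
    left: "Node"
    right: "Node"
    conds: List[Cond] = field(default_factory=list)


Node = Union[Scan, Select, Join]


def tables_of(node: Node) -> set:
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Select):
        return tables_of(node.child)
    return tables_of(node.left) | tables_of(node.right)


def place(cond: Cond, node: Node) -> Node:
    """Put `cond` as low in `node` as the tables it references allow."""
    if isinstance(node, Join):
        if cond.tables <= tables_of(node.left):
            return Join(place(cond, node.left), node.right, node.conds)
        if cond.tables <= tables_of(node.right):
            return Join(node.left, place(cond, node.right), node.conds)
        # Needs columns from both inputs: evaluate it inside this join instead.
        return Join(node.left, node.right, node.conds + [cond])
    # A Scan, or a Select already placed above one: the condition stops here.
    return Select(cond, node)


def push_down(node: Node) -> Node:
    if isinstance(node, Scan):
        return node
    if isinstance(node, Join):
        return Join(push_down(node.left), push_down(node.right), list(node.conds))
    # A Select: optimize the subtree below it first, then sink this condition into it.
    return place(node.cond, push_down(node.child))


def show(node: Node) -> str:
    if isinstance(node, Scan):
        return node.table
    if isinstance(node, Select):
        return f"σ[{node.cond.text}]({show(node.child)})"
    return f"({show(node.left)} ⋈[{', '.join(c.text for c in node.conds)}] {show(node.right)})"


# Both filters start above the unconditioned join of S and E (a Cartesian product).
plan = Select(Cond("S.B > 150", frozenset({"S"})),
              Select(Cond("S.A = E.A", frozenset({"S", "E"})),
                     Join(Scan("S"), Scan("E"))))
print(show(push_down(plan)))  # (σ[S.B > 150](S) ⋈[S.A = E.A] E)
```

After the rewrite, the single-table filter runs directly on `S` and the two-table filter is evaluated by the join itself, so no unfiltered tuples reach the join.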

Why It Is Correct:

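Selection commutes with join when the selection condition refers only to columns of one join input: filtering that input before the join removes exactly the tuples that would have been removed after it, so applying the filter sooner does not change the query result. Moving a filter below a projection is likewise safe, because a projection only drops columns and a condition placed above it can only reference columns the projection keeps.
The transformation is safe as long as the pushed condition references only columns available at the point where it is placed. A condition on a single table can go directly above that table's scan, while a condition comparing two tables can be pushed no lower than the join that combines them.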
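In relational-algebra terms, the argument rests on the standard equivalences below, where attr(p) (notation introduced here for illustration) is the set of attributes referenced by predicate p, and θ is the join's own condition:

```latex
\begin{aligned}
\sigma_{p}(R \bowtie_{\theta} S) &\equiv \sigma_{p}(R) \bowtie_{\theta} S && \text{if } \mathrm{attr}(p) \subseteq \mathrm{attr}(R) \\
\sigma_{p}(R \bowtie_{\theta} S) &\equiv R \bowtie_{\theta} \sigma_{p}(S) && \text{if } \mathrm{attr}(p) \subseteq \mathrm{attr}(S) \\
\sigma_{p}(R \times S) &\equiv R \bowtie_{p} S
\end{aligned}
```

The side conditions on the first two lines are the safety requirement stated above. The last line is what justifies turning a two-table WHERE condition into a join condition instead of filtering a Cartesian product.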