Skip to content

bobby060/pg_hint_engine

Repository files navigation

Generic SQL Hint Engine

A plan-agnostic library for generating PostgreSQL pg_hint_plan hints from any query optimizer's execution plans.

Overview

This project provides a generic, reusable hint generation engine that can convert query plans from any optimizer into PostgreSQL hints. The core hint-engine library is completely independent of any specific query engine—it works with logical plans, physical plans, or any other plan representation through a simple visitor interface.

The DataFusion implementations are reference examples, demonstrating how to integrate the hint engine with a real query optimizer. The same pattern can be applied to other systems like Apache Calcite, Spark, commercial databases, or custom query planners.

Project Goal

Build a universal hint generation library that:

  • Works with any query plan format (logical, physical, custom)
  • Generates PostgreSQL pg_hint_plan hints for join order, algorithms, and scan methods
  • Requires only a simple visitor implementation to support new query engines
  • Provides a mostly complete reference implementation with DataFusion

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Your Query Engine                      │
│              (DataFusion, Calcite, Spark, etc.)             │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           │ Implement PlanVisitor trait
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    hint-engine (Core)                       │
│         Generic, plan-agnostic hint generation              │
│    Input: Vec<PlanNodeMetadata> (post-order traversal)      │
│    Output: PostgreSQL pg_hint_plan hints                    │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
                /*+ Leading(t1 t2 t3)
                    HashJoin(t1 t2) */

Project Structure

generic-hint-engine/
├── hint-engine/                     # ⭐ Core library (plan-agnostic)
│   ├── src/
│   │   └── datafusion_visitor/     # Optional DataFusion visitors (feature: datafusion-visitor)
│   └── README.md
├── auto_explain_rs/                 # PostgreSQL auto-explain log processor
│   └── src/plan_visitor.rs         # PostgreSQL plan visitor implementation
└── df-autohint-runner/              # Example: CLI tool for DataFusion benchmarks
    └── datafusion-logical-runner/

Core Component

hint-engine ⭐ - The main library. Completely generic and plan-agnostic. Converts standardized plan metadata into PostgreSQL hints. This is the reusable core that works with any query engine.

  • Optional datafusion-visitor feature provides reference DataFusion implementations
  • Optional hint-table feature enables PostgreSQL hint table integration

Reference Implementations

auto_explain_rs - PostgreSQL auto-explain log processor with a PostgresPlanVisitor that converts PostgreSQL execution plans to hints.

datafusion-logical-runner - Example CLI tool that integrates the hint engine with DataFusion for TPC-H/JOB benchmarking. Uses the datafusion-visitor feature.

Using the Hint Engine with Your Optimizer

Key Architectural Principle

Hints are derived from plan node metadata, not computed by the engine. As the visitor implementor, you are responsible for:

  • Setting appropriate ScanMethod on LeafNode (e.g., SeqScan, Index("idx_name"))
  • Setting appropriate JoinAlgorithm on join metadata (e.g., HashJoin, NestedLoopJoin)
  • Providing cardinality estimates, parallel hints, and memoization settings

The engine's role is to:

  • Build the join tree structure from your metadata
  • Traverse the tree and generate hint text based on the metadata you provided
  • Apply configuration filters (which hint types to include)

Step 1: Add Dependency

[dependencies]
hint-engine = { path = "path/to/hint-engine" }

Step 2: Implement PlanVisitor

Implement the PlanVisitor trait and populate metadata with hint information from your plan:

  • Set scan_method on LeafNode to control scan hints (SeqScan, Index, etc.)
  • Set join_algorithm on join metadata to control join method hints (HashJoin, NestLoop, etc.)
  • Provide optional fields for cardinality, parallel hints, and memoization

See hint-engine/src/datafusion_visitor/logical.rs or hint-engine/src/datafusion_visitor/execution.rs for DataFusion implementations (enabled with datafusion-visitor feature).

See auto_explain_rs/src/plan_visitor.rs for a PostgreSQL plan visitor implementation.

Important: The visitor determines what hints are generated by setting metadata fields. The engine simply reads this metadata and formats it as PostgreSQL hints.

Step 3: Generate Hints

Configure the engine, generate hints, and combine with SQL. See the hint-engine crate documentation for detailed examples and API reference.

Hint Engine Features

The core hint-engine library provides support for most postgres hints. See the hints module documentation for full details.

DataFusion Benchmark Workflows

See df-autohint-runner/ for instructions on running benchmarks

Documentation

Core Library

DataFusion Examples

PostgreSQL Integration

Development

Testing

# Test core library
cargo test -p hint-engine

# Test DataFusion visitor implementations (requires datafusion-visitor feature)
cargo test -p hint-engine --features datafusion-visitor

# Test PostgreSQL integration
cargo test -p auto_explain_rs

# Test all
cargo test

Requirements

For Core Library Only

  • Rust 1.70+

For DataFusion Examples

  • Rust 1.70+
  • PostgreSQL 17 with pg_hint_plan (for running generated hints)
  • ~10GB disk for TPC-H scale factor 1
  • ~50GB disk for JOB dataset

Use Cases

  1. Cross-optimizer comparison - Compare join orders chosen by different optimizers
  2. Hint injection - Transfer optimization decisions between systems
  3. Query optimization research - Study the impact of different join orders and algorithms
  4. Performance tuning - Generate hints from a research optimizer for production databases
  5. Benchmark analysis - Understand how different optimizers handle standard benchmarks

Why PostgreSQL Hints?

PostgreSQL's pg_hint_plan extension is widely used and well-documented, making it an ideal target format. However, the core hint generation logic could be adapted to other hint formats (Oracle, SQL Server, etc.) by implementing different hint formatters.

References

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •