Skip to content

Gaurav31U/SQLite-using-Cpp

Repository files navigation

SQLite Using C++

This project is a custom, from-scratch implementation of a read-only SQLite database file parser and query engine, written in modern C++. It is designed to demonstrate the core principles of database storage and retrieval by directly reading and interpreting the official SQLite 3 file format, without linking against the actual SQLite library.

This tool can parse table schemas, list tables, and execute simple SELECT queries, including optimized searches using B-Tree indexes. It's an excellent resource for anyone interested in learning how relational databases work under the hood.

Features

  • Direct File Parsing: Reads and interprets .sqlite files byte-by-byte according to the official file format specification.
  • Database Info: Implements the .dbinfo command to display essential metadata like page size and the number of tables.
  • Schema Discovery: Implements the .tables command by parsing the sqlite_schema master table on Page 1.
  • SQL Query Engine: Executes simple SELECT statements with support for:
    • Selecting specific columns.
    • COUNT(*) for efficient row counting.
    • WHERE clauses for filtering results.
  • Query Optimization: Features a rudimentary but effective query planner that can:
    • Perform a Full Table Scan for queries without a usable index.
    • Perform a highly efficient Index Seek when a WHERE clause matches the first column of an existing index.

How It Works

This project is split into two main components: a low-level utility library for parsing the SQLite file structure, and a high-level query execution engine.

1. Low-Level Parsing (util.hpp)

This "header-only" library provides the fundamental tools to interact with the database file.

  • Page Management: A file is treated as a sequence of fixed-size pages. An in-memory cache (g_pageCache) is used to minimize disk I/O by storing frequently accessed pages.
  • Data Primitives: Functions like readBE16, readBE32, and readVarint correctly decode Big-Endian integers and SQLite's custom variable-length integer format.
  • Record Decoding: The code carefully decodes the SQLite Record Format. It reads the record header to determine the data type and size of each column, then extracts the values from the record body.

2. Query Execution (main.cpp)

This is where user commands are parsed and executed.

  1. Command Parsing: The SELECT statement is broken down into its components (columns, table, and WHERE clause) using basic string manipulation.
  2. Schema Lookup: The engine queries the sqlite_schema table on Page 1 to find the target table's metadata, including its schema (CREATE TABLE statement) and the root page number of its B-Tree.
  3. Query Planning: The engine makes a critical decision:
    • No Index: If the WHERE clause column is not indexed, it defaults to a full table scan, traversing the entire table's B-Tree and checking every single row.
    • Index Available: If an index exists on the WHERE clause column, it performs an index seek. It first traverses the much smaller and more targeted index B-Tree to quickly find the rowid(s) of matching rows. It then fetches only those specific rows from the main table B-Tree, which is significantly faster.
  4. B-Tree Traversal: The engine navigates the database's B-Tree structure page by page.
    • For Interior Pages, it reads keys to decide which child page to descend into.
    • For Leaf Pages, it reads the cells containing the actual row data (or index entries).

Project Structure

  • main.cpp: The main application entry point. Handles command-line arguments, parses SQL commands, and orchestrates the query execution plan.
  • util.hpp: A header file containing all the low-level utility functions for file reading, page caching, B-Tree navigation, and decoding of the SQLite record format.

How to Build and Run

Prerequisites

You need a modern C++ compiler (like g++ or clang++) that supports C++17 or newer.

Compilation

Compile the project using the following command:

g++ -std=c++17 main.cpp -o my_sqlite

Usage

You can run the executable with a database file and a command.

./my_sqlite <database_file.db> "<command>"

Examples:

  1. Get database info:

    ./my_sqlite sample.db ".dbinfo"
    # Output:
    # database page size: 4096
    # number of tables: 4
  2. List all tables:

    ./my_sqlite sample.db ".tables"
    # Output:
    # apples oranges sqlite_sequence superheroes

How listing of table works at Byte Level

byte_parser

  1. Count rows in a table:

    ./my_sqlite sample.db "SELECT count(*) FROM superheroes"
    # Output:
    # 4
  2. Select all columns (full table scan):

    ./my_sqlite sample.db "SELECT id, name, alias FROM superheroes"
    # Output:
    # 1|Bruce Wayne|Batman
    # 2|Clark Kent|Superman
    # 3|Peter Parker|Spider-Man
    # 4|Diana Prince|Wonder Woman
  3. Select specific rows using an index: (Assuming an index exists on the name column)

    ./my_sqlite sample.db "SELECT id, alias FROM superheroes WHERE name = 'Peter Parker'"
    # Output:
    # 3|Spider-Man

How The Select and Where clause works

query

Limitations and Future Work

This project is an educational tool and does not implement the full SQLite feature set.

  • Read-Only: No support for INSERT, UPDATE, DELETE, or CREATE.
  • Simple Parser: The SQL parser is basic and not robust against complex syntax.
  • Limited WHERE clauses: Only supports column = 'value' on a single column.
  • No JOINs: Does not support joining tables.
  • No Transaction Control: Does not implement ACID properties like locking or a journal file.

Future improvements could include:

  • Adding support for more complex WHERE clauses (>, <, LIKE).
  • Implementing write operations (INSERT, UPDATE).
  • Building a more robust, tokenizer-based SQL parser.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published