This project is a custom, from-scratch implementation of a read-only SQLite database file parser and query engine, written in modern C++. It is designed to demonstrate the core principles of database storage and retrieval by directly reading and interpreting the official SQLite 3 file format, without linking against the actual SQLite library.
This tool can parse table schemas, list tables, and execute simple SELECT
queries, including optimized searches using B-Tree indexes. It's an excellent resource for anyone interested in learning how relational databases work under the hood.
- Direct File Parsing: Reads and interprets
.sqlite
files byte-by-byte according to the official file format specification. - Database Info: Implements the
.dbinfo
command to display essential metadata like page size and the number of tables. - Schema Discovery: Implements the
.tables
command by parsing thesqlite_schema
master table on Page 1. - SQL Query Engine: Executes simple
SELECT
statements with support for:- Selecting specific columns.
COUNT(*)
for efficient row counting.WHERE
clauses for filtering results.
- Query Optimization: Features a rudimentary but effective query planner that can:
- Perform a Full Table Scan for queries without a usable index.
- Perform a highly efficient Index Seek when a
WHERE
clause matches the first column of an existing index.
This project is split into two main components: a low-level utility library for parsing the SQLite file structure, and a high-level query execution engine.
This "header-only" library provides the fundamental tools to interact with the database file.
- Page Management: A file is treated as a sequence of fixed-size pages. An in-memory cache (
g_pageCache
) is used to minimize disk I/O by storing frequently accessed pages. - Data Primitives: Functions like
readBE16
,readBE32
, andreadVarint
correctly decode Big-Endian integers and SQLite's custom variable-length integer format. - Record Decoding: The code carefully decodes the SQLite Record Format. It reads the record header to determine the data type and size of each column, then extracts the values from the record body.
This is where user commands are parsed and executed.
- Command Parsing: The
SELECT
statement is broken down into its components (columns, table, andWHERE
clause) using basic string manipulation. - Schema Lookup: The engine queries the
sqlite_schema
table on Page 1 to find the target table's metadata, including its schema (CREATE TABLE
statement) and the root page number of its B-Tree. - Query Planning: The engine makes a critical decision:
- No Index: If the
WHERE
clause column is not indexed, it defaults to a full table scan, traversing the entire table's B-Tree and checking every single row. - Index Available: If an index exists on the
WHERE
clause column, it performs an index seek. It first traverses the much smaller and more targeted index B-Tree to quickly find therowid
(s) of matching rows. It then fetches only those specific rows from the main table B-Tree, which is significantly faster.
- No Index: If the
- B-Tree Traversal: The engine navigates the database's B-Tree structure page by page.
- For Interior Pages, it reads keys to decide which child page to descend into.
- For Leaf Pages, it reads the cells containing the actual row data (or index entries).
main.cpp
: The main application entry point. Handles command-line arguments, parses SQL commands, and orchestrates the query execution plan.util.hpp
: A header file containing all the low-level utility functions for file reading, page caching, B-Tree navigation, and decoding of the SQLite record format.
You need a modern C++ compiler (like g++
or clang++
) that supports C++17 or newer.
Compile the project using the following command:
g++ -std=c++17 main.cpp -o my_sqlite
You can run the executable with a database file and a command.
./my_sqlite <database_file.db> "<command>"
Examples:
-
Get database info:
./my_sqlite sample.db ".dbinfo" # Output: # database page size: 4096 # number of tables: 4
-
List all tables:
./my_sqlite sample.db ".tables" # Output: # apples oranges sqlite_sequence superheroes
How listing of table works at Byte Level
-
Count rows in a table:
./my_sqlite sample.db "SELECT count(*) FROM superheroes" # Output: # 4
-
Select all columns (full table scan):
./my_sqlite sample.db "SELECT id, name, alias FROM superheroes" # Output: # 1|Bruce Wayne|Batman # 2|Clark Kent|Superman # 3|Peter Parker|Spider-Man # 4|Diana Prince|Wonder Woman
-
Select specific rows using an index: (Assuming an index exists on the
name
column)./my_sqlite sample.db "SELECT id, alias FROM superheroes WHERE name = 'Peter Parker'" # Output: # 3|Spider-Man
How The Select and Where clause works
This project is an educational tool and does not implement the full SQLite feature set.
- Read-Only: No support for
INSERT
,UPDATE
,DELETE
, orCREATE
. - Simple Parser: The SQL parser is basic and not robust against complex syntax.
- Limited
WHERE
clauses: Only supportscolumn = 'value'
on a single column. - No
JOIN
s: Does not support joining tables. - No Transaction Control: Does not implement ACID properties like locking or a journal file.
Future improvements could include:
- Adding support for more complex
WHERE
clauses (>
,<
,LIKE
). - Implementing write operations (
INSERT
,UPDATE
). - Building a more robust, tokenizer-based SQL parser.