CDB is a SQL database built for learning about the inner workings of databases. CDB was heavily inspired by SQLite, CockroachDB, and the CMU Database Group. CDB implements a subset of SQL described below.
The cdb_schema table holds the database schema.
graph LR
begin(( ))
explain([EXPLAIN])
queryPlan([QUERY PLAN])
select([SELECT])
all[*]
from([FROM])
begin --> explain
explain --> queryPlan
queryPlan --> select
begin --> select
explain --> select
select --> all
all --> from
from --> table
Create supports the PRIMARY KEY column constraint for a single integer column.
graph LR
begin(( ))
explain([EXPLAIN])
queryPlan([QUERY PLAN])
create([CREATE])
table([TABLE])
colTypeInt([INTEGER])
colTypeText([TEXT])
lparen["("]
colSep[","]
rparen[")"]
tableIdent["Table Identifier"]
colIdent["Column Identifier"]
pkConstraint["PRIMARY KEY"]
begin --> explain
explain --> queryPlan
queryPlan --> create
begin --> create
explain --> create
create --> table
table --> tableIdent
tableIdent --> lparen
lparen --> colIdent
colIdent --> colTypeInt
colIdent --> colTypeText
colTypeInt --> pkConstraint
colTypeInt --> colSep
colTypeInt --> rparen
colTypeText --> colSep
colTypeText --> rparen
pkConstraint --> colSep
pkConstraint --> rparen
colSep --> rparen
colSep --> colIdent
graph LR
begin(( ))
explain([EXPLAIN])
queryPlan([QUERY PLAN])
insert([INSERT])
into([INTO])
tableIdent["Table Identifier"]
lparen["("]
rparen[")"]
colSep[","]
values([VALUES])
lparen2["("]
rparen2[")"]
colSep2[","]
literal["literal"]
valSep[","]
begin --> explain
explain --> queryPlan
queryPlan --> insert
begin --> insert
explain --> insert
insert --> into
into --> tableIdent
tableIdent --> lparen
lparen --> colIdent
colIdent --> colSep
colSep --> colIdent
colSep --> rparen
rparen --> values
values --> lparen2
lparen2 --> literal
literal --> colSep2
colSep2 --> literal
literal --> rparen2
colSep2 --> rparen2
rparen2 --> valSep
valSep --> lparen2
Run cdb -h for command line flags.
---
title: Packages
---
graph LR
subgraph Adapters
Driver
REPL
end
Driver --> DB
REPL --> DB
DB --> Compiler
subgraph Compiler
Lexer --> Parser --> AST
end
AST --> Planner
Planner --> VM
Planner --> Catalog
VM --> KV
subgraph KV
Cursor
Catalog
Encoder
end
KV --> Pager
subgraph Pager
Storage
end
The REPL works with the DB (Database) layer and is responsible for two things: passing the SQL strings it reads down to the DB, and printing the execution results the DB layer returns. The REPL can be thought of as an adapter layer.
The Driver plays the same role as the REPL in that it adapts the DB to be used in a Go program. This is done by implementing the Go standard library database/sql/driver.Driver interface.
The DB (Database) layer is an interface that is called by adapters like the REPL. In theory, the DB layer could be called directly by a consumer of the package or by something like a TCP connection adapter.
The Compiler is responsible for converting a raw SQL string into an AST (abstract syntax tree). In doing this, the compiler performs two major steps: lexing and parsing.
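The two steps can be illustrated with a small sketch that handles only the SELECT form from the grammar above. Every type and function name here (token, lex, parseSelect, selectStmt) is an assumption made for illustration and may not match CDB's compiler package; statement terminators are not handled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type tokenKind int

const (
	tkKeyword tokenKind = iota
	tkIdent
	tkPunct
)

type token struct {
	kind  tokenKind
	value string
}

var keywords = map[string]bool{
	"EXPLAIN": true, "QUERY": true, "PLAN": true, "SELECT": true, "FROM": true,
}

// lex splits the source into keyword, identifier, and punctuation tokens.
func lex(src string) []token {
	var tokens []token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			word := string(runes[start:i])
			if upper := strings.ToUpper(word); keywords[upper] {
				tokens = append(tokens, token{tkKeyword, upper})
			} else {
				tokens = append(tokens, token{tkIdent, word})
			}
		default:
			tokens = append(tokens, token{tkPunct, string(r)})
			i++
		}
	}
	return tokens
}

// selectStmt is the AST node produced for a SELECT statement.
type selectStmt struct {
	explain   bool // statement began with EXPLAIN
	queryPlan bool // EXPLAIN was followed by QUERY PLAN
	table     string
}

// parseSelect accepts [EXPLAIN [QUERY PLAN]] SELECT * FROM <table>.
func parseSelect(tokens []token) (*selectStmt, error) {
	pos := 0
	// accept consumes the next token only if it matches kind and value.
	accept := func(kind tokenKind, value string) bool {
		if pos < len(tokens) && tokens[pos].kind == kind && tokens[pos].value == value {
			pos++
			return true
		}
		return false
	}
	stmt := &selectStmt{}
	if accept(tkKeyword, "EXPLAIN") {
		stmt.explain = true
		if accept(tkKeyword, "QUERY") {
			if !accept(tkKeyword, "PLAN") {
				return nil, fmt.Errorf("expected PLAN after QUERY")
			}
			stmt.queryPlan = true
		}
	}
	for _, want := range []token{{tkKeyword, "SELECT"}, {tkPunct, "*"}, {tkKeyword, "FROM"}} {
		if !accept(want.kind, want.value) {
			return nil, fmt.Errorf("expected %s at token %d", want.value, pos)
		}
	}
	if pos >= len(tokens) || tokens[pos].kind != tkIdent {
		return nil, fmt.Errorf("expected table identifier at token %d", pos)
	}
	stmt.table = tokens[pos].value
	if pos+1 != len(tokens) {
		return nil, fmt.Errorf("unexpected input after table identifier")
	}
	return stmt, nil
}

func main() {
	stmt, err := parseSelect(lex("explain select * from foo"))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", *stmt)
}
```

Keeping the lexer and parser separate means the parser only ever reasons about token kinds, which mirrors how the diagrams above describe each statement as a path through keyword and identifier nodes.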
The Planner is a query planner. It takes the AST generated by the compiler and performs steps to generate an optimal "byte code" routine consisting of commands defined in the VM. This routine can be examined by prefixing any SQL statement with the EXPLAIN keyword.
The VM defines a set of commands that can be executed or explained. Each command performs basic calls into the KV layer that make up a query execution. This mechanism makes queries predictable and consistent.
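The sketch below shows one way a routine of commands that can be either executed or explained might look. The command names (rewindCmd, resultRowCmd, nextCmd, haltCmd) are hypothetical and loosely modelled on SQLite-style opcodes; they are not taken from CDB's VM, and a plain slice of strings stands in for the KV layer.

```go
package main

import "fmt"

// machine holds the state a routine runs against. rows stands in for
// a table in the KV layer.
type machine struct {
	rows   []string
	cursor int
	pc     int // index of the next command; -1 means halted
	out    []string
}

// command is a single step of a routine.
type command interface {
	execute(m *machine) // performs the step and updates m.pc
	explain() string    // describes the step without running it
}

// rewindCmd moves the cursor to the first row, or jumps when there are none.
type rewindCmd struct{ ifEmpty int }

func (c rewindCmd) execute(m *machine) {
	m.cursor = 0
	if len(m.rows) == 0 {
		m.pc = c.ifEmpty
		return
	}
	m.pc++
}
func (c rewindCmd) explain() string {
	return fmt.Sprintf("Rewind     go to %d if the table is empty", c.ifEmpty)
}

// resultRowCmd emits the row under the cursor.
type resultRowCmd struct{}

func (resultRowCmd) execute(m *machine) {
	m.out = append(m.out, m.rows[m.cursor])
	m.pc++
}
func (resultRowCmd) explain() string { return "ResultRow  output the current row" }

// nextCmd advances the cursor and loops back while rows remain.
type nextCmd struct{ loop int }

func (c nextCmd) execute(m *machine) {
	m.cursor++
	if m.cursor < len(m.rows) {
		m.pc = c.loop
		return
	}
	m.pc++
}
func (c nextCmd) explain() string {
	return fmt.Sprintf("Next       go to %d if another row exists", c.loop)
}

// haltCmd stops the routine.
type haltCmd struct{}

func (haltCmd) execute(m *machine) { m.pc = -1 }
func (haltCmd) explain() string    { return "Halt       end of routine" }

// planSelect plays the planner's role for a full table scan.
func planSelect() []command {
	return []command{rewindCmd{ifEmpty: 3}, resultRowCmd{}, nextCmd{loop: 1}, haltCmd{}}
}

// run executes commands until the routine halts.
func run(m *machine, routine []command) {
	for m.pc >= 0 && m.pc < len(routine) {
		routine[m.pc].execute(m)
	}
}

// explainRoutine renders the routine the way an EXPLAIN prefix might.
func explainRoutine(routine []command) []string {
	lines := make([]string, len(routine))
	for addr, c := range routine {
		lines[addr] = fmt.Sprintf("%d  %s", addr, c.explain())
	}
	return lines
}

func main() {
	routine := planSelect()
	for _, line := range explainRoutine(routine) {
		fmt.Println(line)
	}
	m := &machine{rows: []string{"alice", "bob"}}
	run(m, routine)
	fmt.Println(m.out)
}
```

Because every query reduces to the same small command set, explaining a statement is just a matter of printing the routine instead of running it.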
The KV layer implements a data structure known as a B+ tree, which enables the database to perform fast lookups. At this layer, data is encoded into byte slices by the Encoder. The KV layer implements a cursor abstraction, which enables queries to scan and seek the B+ trees associated with a table or index. Additionally, this layer maintains the Catalog, an in-memory representation of the database schema.
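The following sketch illustrates the encode, scan, and seek ideas from this layer. It swaps the B+ tree for a sorted slice to stay short, and the names (encodeKey, store, cursor) along with the big-endian key encoding are assumptions for illustration rather than CDB's actual encoder or cursor API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// encodeKey writes an integer key as 8 big-endian bytes so that
// byte-wise comparison matches numeric order for unsigned keys.
func encodeKey(k uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, k)
	return b
}

func decodeKey(b []byte) uint64 { return binary.BigEndian.Uint64(b) }

type entry struct{ key, value []byte }

// store keeps entries sorted by key. It is a stand-in for the leaf
// level of a B+ tree, not a tree implementation.
type store struct{ entries []entry }

// search returns the index of the first entry whose key is >= key.
func (s *store) search(key []byte) int {
	return sort.Search(len(s.entries), func(i int) bool {
		return bytes.Compare(s.entries[i].key, key) >= 0
	})
}

// put inserts or replaces the value stored under key.
func (s *store) put(key, value []byte) {
	i := s.search(key)
	if i < len(s.entries) && bytes.Equal(s.entries[i].key, key) {
		s.entries[i].value = value
		return
	}
	s.entries = append(s.entries, entry{})
	copy(s.entries[i+1:], s.entries[i:])
	s.entries[i] = entry{key: key, value: value}
}

// cursor walks the store in key order.
type cursor struct {
	s   *store
	pos int
}

func (c *cursor) valid() bool { return c.pos < len(c.s.entries) }

// first moves to the smallest key and reports whether one exists.
func (c *cursor) first() bool { c.pos = 0; return c.valid() }

// next moves to the following key and reports whether one exists.
func (c *cursor) next() bool { c.pos++; return c.valid() }

// seek moves to key and reports whether it was found exactly.
func (c *cursor) seek(key []byte) bool {
	c.pos = c.s.search(key)
	return c.valid() && bytes.Equal(c.s.entries[c.pos].key, key)
}

func (c *cursor) key() []byte   { return c.s.entries[c.pos].key }
func (c *cursor) value() []byte { return c.s.entries[c.pos].value }

func main() {
	s := &store{}
	s.put(encodeKey(3), []byte("carol"))
	s.put(encodeKey(1), []byte("alice"))
	s.put(encodeKey(2), []byte("bob"))

	c := &cursor{s: s}
	for ok := c.first(); ok; ok = c.next() {
		fmt.Println(decodeKey(c.key()), string(c.value()))
	}
	if c.seek(encodeKey(2)) {
		fmt.Println("found:", string(c.value()))
	}
}
```

An order-preserving key encoding is what allows a scan to return rows sorted by primary key without decoding each key during comparison.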
The Pager sits on top of a contiguous block of bytes defined in the Storage interface. This block is typically a single file, enabling the database to persist data, but it can also be an in-memory representation. The pager abstracts this block into pages, which represent nodes in the KV layer's B+ tree. The pager is capable of caching pages, implements a read-write mutex for concurrency control, and makes writes to its storage atomic through what is known as the journal file.
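A rough sketch of the paging idea follows, assuming a 4096-byte page size and a hypothetical storage interface shaped like io.ReaderAt and io.WriterAt; CDB's real Storage interface and page size may differ. The journal is left out, so this sketch does not provide atomic writes.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

const pageSize = 4096 // assumed page size for this sketch

// storage is a hypothetical view of a contiguous block of bytes.
type storage interface {
	ReadAt(p []byte, off int64) (int, error)
	WriteAt(p []byte, off int64) (int, error)
}

// memStorage is an in-memory storage that grows on write.
type memStorage struct{ buf []byte }

func (m *memStorage) ReadAt(p []byte, off int64) (int, error) {
	if off >= int64(len(m.buf)) {
		return 0, io.EOF
	}
	n := copy(p, m.buf[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

func (m *memStorage) WriteAt(p []byte, off int64) (int, error) {
	end := int(off) + len(p)
	if end > len(m.buf) {
		m.buf = append(m.buf, make([]byte, end-len(m.buf))...)
	}
	return copy(m.buf[off:], p), nil
}

// pager slices storage into fixed-size pages and caches them.
type pager struct {
	mu    sync.RWMutex
	store storage
	cache map[int][]byte
}

func newPager(s storage) *pager {
	return &pager{store: s, cache: make(map[int][]byte)}
}

// readPage returns a copy of page n; pages past the end read as zeros.
func (p *pager) readPage(n int) ([]byte, error) {
	// Fast path: many readers may check the cache concurrently.
	p.mu.RLock()
	page, ok := p.cache[n]
	if ok {
		out := append([]byte(nil), page...)
		p.mu.RUnlock()
		return out, nil
	}
	p.mu.RUnlock()

	// Slow path: take the write lock to load and cache the page.
	p.mu.Lock()
	defer p.mu.Unlock()
	if page, ok := p.cache[n]; ok {
		return append([]byte(nil), page...), nil
	}
	page = make([]byte, pageSize)
	if _, err := p.store.ReadAt(page, int64(n)*pageSize); err != nil && err != io.EOF {
		return nil, err
	}
	p.cache[n] = page
	return append([]byte(nil), page...), nil
}

// writePage writes through to storage and refreshes the cache.
func (p *pager) writePage(n int, data []byte) error {
	if len(data) != pageSize {
		return fmt.Errorf("page must be %d bytes, got %d", pageSize, len(data))
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, err := p.store.WriteAt(data, int64(n)*pageSize); err != nil {
		return err
	}
	p.cache[n] = append([]byte(nil), data...)
	return nil
}

func main() {
	p := newPager(&memStorage{})
	page := make([]byte, pageSize)
	copy(page, "hello")
	if err := p.writePage(2, page); err != nil {
		fmt.Println("write:", err)
		return
	}
	got, err := p.readPage(2)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Println(string(got[:5]))
}
```

Hiding the byte block behind a small interface is what lets the same pager run against a file for persistence or against memory for tests.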