A complete, production-quality scripting language implemented in modern C++17, featuring a hand-written lexer, recursive-descent parser, AST, bytecode compiler, and stack-based virtual machine — all from scratch.
- Architecture Overview
- Building
- Usage
- Language Reference
- Instruction Set Architecture (ISA)
- Internals Deep Dive
- Standard Library
- Debug & Introspection
- Sample Programs
- What Makes CVM++ Stand Out
Source Code (.cvm)
│
▼
┌─────────────┐
│ LEXER │ Tokenizes raw text → Token stream
│ lexer.hpp │ Handles: numbers, floats, strings, keywords,
│ lexer.cpp │ operators, multi-line comments (/* */), # comments
└──────┬──────┘
│ std::vector<Token>
▼
┌─────────────┐
│ PARSER │ Recursive Descent → Abstract Syntax Tree
│ parser.hpp │ Pratt-style precedence climbing
│ parser.cpp │ 20+ node types, full operator precedence
└──────┬──────┘
│ ASTNodePtr (unique_ptr tree)
▼
┌──────────────┐
│ COMPILER │ AST Visitor → Bytecode Chunks
│ compiler.hpp │ Scope resolution, jump patching,
│ compiler.cpp │ function compilation, break/continue
└──────┬───────┘
│ Compiler::Result { Chunk, FunctionProto[] }
▼
┌──────────────┐
│ VM │ Stack-based fetch-decode-execute loop
│ vm.hpp │ Call frames, global/local vars, native fns,
│ vm.cpp │ I/O, 40+ opcodes, execution statistics
└──────────────┘
- C++17 compiler (GCC 8+, Clang 7+, MSVC 2019+)
- CMake 3.16+ (optional) or a direct g++ command
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4

g++ -std=c++17 -O2 -Wall -Iinclude \
src/lexer.cpp src/parser.cpp src/compiler.cpp src/vm.cpp src/main.cpp \
-o cvm

g++ -std=c++17 -O2 -Iinclude \
src/lexer.cpp src/parser.cpp src/compiler.cpp src/vm.cpp tests/test_runner.cpp \
-o cvm_test && ./cvm_test

./cvm script.cvm
./cvm --bytecode --stats script.cvm # with debug output

./cvm
./cvm --repl

| Flag | Description |
|---|---|
| `--repl` | Launch interactive REPL |
| `--ast` | Print AST in debug mode |
| `--bytecode` | Print disassembled bytecode |
| `--stats` | Print execution statistics |
| `--no-color` | Disable ANSI color output |
| `--help` | Show help message |
| Command | Description |
|---|---|
| `:quit` / `:q` | Exit |
| `:ast` | Toggle AST dump |
| `:bytecode` | Toggle bytecode disassembly |
| `:stats` | Toggle execution stats |
| `:globals` | Show all global variables |
| `:clear` | Clear all globals and reset VM |
| `:load <file>` | Load and execute a .cvm file |
| `line \` | Multi-line continuation |
| Type | Example | Notes |
|---|---|---|
| `int` | `42`, `-7`, `1_000_000` | 64-bit signed integer |
| `float` | `3.14`, `2.5e-3` | 64-bit IEEE 754 double |
| `bool` | `true`, `false` | Boolean |
| `string` | `"hello"`, `'world'` | UTF-8 text, escape sequences |
| `nil` | `nil` | Null/undefined value |
let x = 10
let name = "CVM++"
let pi = 3.14159
let flag = true
let nothing = nil
# Arithmetic
a + b a - b a * b a / b a % b
-a # unary minus
# Comparison
a == b a != b a < b a > b a <= b a >= b
# Logical
a and b a or b not a # short-circuit evaluation
# Assignment
x = 10
x += 5 x -= 3 x *= 2 x /= 4
++x --x x++ x-- # increment/decrement
# String
"hello" + " world" # concatenation
"ha" * 3 # repeat → "hahaha"
# If / else if / else
if (x > 0) {
print("positive")
} else {
if (x < 0) { print("negative") }
else { print("zero") }
}
# While loop
while (i < 10) {
i += 1
}
# For loop
for (let i = 0; i < 10; i = i + 1) {
print(i)
}
# Break & Continue
while (true) {
if (done) { break }
if (skip) { continue }
process()
}
fn add(a, b) {
return a + b
}
print(add(3, 4)) # → 7
# Recursive
fn fib(n) {
if (n <= 1) { return n }
return fib(n - 1) + fib(n - 2)
}
# Mutual recursion
fn is_even(n) {
if (n == 0) { return true }
return is_odd(n - 1)
}
fn is_odd(n) {
if (n == 0) { return false }
return is_even(n - 1)
}
print("Hello, World!")
print(42, 3.14, true) # multiple values
let name = input("Your name: ")
let age = input() # no prompt, auto-converts to int/float
# Single-line comment
/* Multi-line
comment */
CVM++ uses a compact, typed bytecode. Each instruction is 1 byte (opcode) followed by 0–8 bytes of operands.
| Opcode | Operands | Description |
|---|---|---|
| `PUSH_INT` | 8 bytes (i64) | Push 64-bit integer |
| `PUSH_FLOAT` | 8 bytes (f64) | Push 64-bit float |
| `PUSH_BOOL` | 1 byte | Push boolean |
| `PUSH_STRING` | 2 bytes (idx) | Push string from constant pool |
| `PUSH_NIL` | — | Push nil |
| `POP` | — | Discard top of stack |
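The operand widths in this table can be illustrated with a small encoder and decoder for `PUSH_INT`. This is only a sketch under stated assumptions: the opcode value, the little-endian byte order, and the helper names `emit_push_int` and `read_i64` are hypothetical and are not taken from the CVM++ sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical opcode value; the real numbering in CVM++ may differ.
constexpr std::uint8_t OP_PUSH_INT = 0x01;

// Append a 1-byte opcode followed by an 8-byte integer operand.
// Little-endian operand order is an assumption of this sketch.
inline void emit_push_int(std::vector<std::uint8_t>& code, std::int64_t value) {
    code.push_back(OP_PUSH_INT);
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret without UB
    for (std::size_t i = 0; i < 8; ++i) {
        code.push_back(static_cast<std::uint8_t>((bits >> (8 * i)) & 0xFF));
    }
}

// Decode the 8-byte operand that starts at `offset` (the byte after the opcode).
inline std::int64_t read_i64(const std::vector<std::uint8_t>& code, std::size_t offset) {
    assert(offset + 8 <= code.size());
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        bits |= static_cast<std::uint64_t>(code[offset + i]) << (8 * i);
    }
    std::int64_t value = 0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

With this layout a `PUSH_INT` instruction occupies 9 bytes, which matches the sample disassembly in this README, where the instruction following a `PUSH_INT` at offset 000000 starts at offset 000009.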
| Opcode | Description |
|---|---|
| `ADD` | Pop b, a → push a+b |
| `SUB` | Pop b, a → push a-b |
| `MUL` | Pop b, a → push a*b |
| `DIV` | Pop b, a → push a/b |
| `MOD` | Pop b, a → push a%b |
| `NEG` | Pop a → push -a |
| Opcode | Description |
|---|---|
| `EQ` | a == b |
| `NEQ` | a != b |
| `LT` | a < b |
| `GT` | a > b |
| `LTE` | a <= b |
| `GTE` | a >= b |
| `AND` | logical and (short-circuit) |
| `OR` | logical or (short-circuit) |
| `NOT` | logical not |
| Opcode | Operands | Description |
|---|---|---|
| `DEFINE_VAR` | 2 bytes | Pop value, store at local slot |
| `LOAD_VAR` | 2 bytes | Push local variable |
| `STORE_VAR` | 2 bytes | Write local (no pop) |
| `LOAD_GLOBAL` | 2 bytes | Push global by name index |
| `STORE_GLOBAL` | 2 bytes | Write global (no pop) |
| Opcode | Operands | Description |
|---|---|---|
| `JUMP` | 4 bytes | Unconditional jump |
| `JUMP_IF_FALSE` | 4 bytes | Jump if top is falsy (peek) |
| `JUMP_IF_TRUE` | 4 bytes | Jump if top is truthy (peek) |
| Opcode | Operands | Description |
|---|---|---|
| `MAKE_FUNC` | 2 bytes | Push function reference |
| `CALL` | 1 byte | Call with N args |
| `RETURN` | — | Pop return value, restore frame |
| `RETURN_NIL` | — | Return nil, restore frame |
| Opcode | Description |
|---|---|
| `PRINT` | Pop and print with newline |
| `INPUT` | Pop prompt, read line, push result |
| `STR_CONCAT` | Pop two, concatenate, push |
| `HALT` | Stop execution |
| `NOP` | No-operation |
| `DEBUG_BREAK` | Emit debug trace |
The VM uses a frame-based call convention:
Before CALL (argc=2):
stack: [ ... | fn_ref | arg0 | arg1 ]
^fn_pos ^base
CALL handler:
1. Resolve fn_ref to FunctionProto
2. Shift args left over fn_ref slot
3. Create CallFrame { chunk, ip=0, base=fn_pos }
Inside function:
LOAD_VAR slot=0 → stack_[frame.base + 0] (= arg0)
LOAD_VAR slot=1 → stack_[frame.base + 1] (= arg1)
LET x = ... → slot=2 (local after params)
RETURN:
1. Pop return value
2. Shrink stack to frame.base (remove all locals + args)
3. Push return value
4. Pop frame
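The CALL and RETURN steps above can be sketched on a plain value stack. This is an illustrative model rather than the actual `vm.cpp` code: `MiniStack`, its member names, and the use of `std::int64_t` as a stand-in for the VM's value type are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Value = std::int64_t;  // stand-in for the VM's real value type

struct CallFrame {
    std::size_t base;  // stack index of local slot 0 (the first argument)
};

// Hypothetical miniature of the VM's stack and frame bookkeeping.
struct MiniStack {
    std::vector<Value> stack;
    std::vector<CallFrame> frames;

    // CALL with `argc` arguments: the function reference sits just below the args.
    void call(std::size_t argc) {
        assert(stack.size() >= argc + 1);
        const std::size_t fn_pos = stack.size() - argc - 1;
        // Shift the arguments left over the fn_ref slot.
        stack.erase(stack.begin() + static_cast<std::ptrdiff_t>(fn_pos));
        frames.push_back(CallFrame{fn_pos});
    }

    // LOAD_VAR: slots are addressed relative to the current frame's base.
    Value load_var(std::size_t slot) const {
        assert(!frames.empty());
        return stack[frames.back().base + slot];
    }

    // RETURN: pop the result, drop args and locals, push the result, pop the frame.
    void ret() {
        assert(!frames.empty() && !stack.empty());
        const Value result = stack.back();
        stack.pop_back();
        stack.resize(frames.back().base);
        stack.push_back(result);
        frames.pop_back();
    }
};
```

Because the frame base is set to the old `fn_ref` position, the return value ends up exactly where the caller left the function reference, so the caller sees a call as replacing `fn_ref` and its arguments with a single result.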
Local variables are resolved at compile time using a scope stack. Each scope records a mapping of name → slot_index. During code generation:
- If a name resolves to a local slot → emit `LOAD_VAR` / `STORE_VAR`
- If not found in any local scope → emit `LOAD_GLOBAL` / `STORE_GLOBAL`
Function scopes start fresh at slot 0, preventing cross-function slot aliasing.
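A minimal sketch of this resolution scheme, assuming one scope stack per function body: the class `FunctionScopes`, the enum `LoadOp`, and the helper `load_op_for` are hypothetical names chosen for illustration and do not come from `compiler.cpp`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// One instance per function body, so slot numbering restarts at 0 for each function.
class FunctionScopes {
public:
    void push_scope() { scopes_.emplace_back(); }
    void pop_scope() {
        assert(!scopes_.empty());
        scopes_.pop_back();  // slots are not reused in this sketch
    }

    // Declare a local in the innermost scope and return its slot index.
    int declare(const std::string& name) {
        assert(!scopes_.empty());
        const int slot = next_slot_++;
        scopes_.back()[name] = slot;
        return slot;
    }

    // Search from the innermost scope outward; empty result means "not a local".
    std::optional<int> resolve(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            const auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, int>> scopes_;
    int next_slot_ = 0;
};

enum class LoadOp { LOAD_VAR, LOAD_GLOBAL };

// Pick the load opcode the compiler would emit for `name`.
inline LoadOp load_op_for(const FunctionScopes& scopes, const std::string& name) {
    return scopes.resolve(name) ? LoadOp::LOAD_VAR : LoadOp::LOAD_GLOBAL;
}
```

Declaring parameters `a` and `b` first and a local `x` afterwards yields slots 0, 1, and 2, which lines up with the call-convention note that the first local after two parameters lives in slot 2.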
The compiler uses a backpatching strategy for forward jumps:
- Emit `JUMP_IF_FALSE` with placeholder `0xFFFFFFFF`
- Record the offset of the placeholder
- After compiling the target code, call `patch_jump(offset, target)`
- Overwrite the 4 placeholder bytes with the real target address
Loop break/continue statements accumulate unpatched jump offsets in a LoopContext stack, all patched at loop end.
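The backpatching steps can be sketched as follows. Only `patch_jump(offset, target)`, the `0xFFFFFFFF` placeholder, and the `LoopContext` name appear in this README; the `MiniChunk` type, the opcode values, the little-endian operand order, and the `break_jumps` field are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode values; the real numbering in CVM++ may differ.
constexpr std::uint8_t OP_JUMP = 0x20;
constexpr std::uint8_t OP_JUMP_IF_FALSE = 0x21;
constexpr std::uint32_t JUMP_PLACEHOLDER = 0xFFFFFFFF;

struct MiniChunk {
    std::vector<std::uint8_t> code;

    // Emit a jump opcode plus a 4-byte placeholder; return the placeholder's offset.
    std::size_t emit_jump(std::uint8_t opcode) {
        code.push_back(opcode);
        const std::size_t offset = code.size();
        for (std::size_t i = 0; i < 4; ++i) {
            code.push_back(static_cast<std::uint8_t>((JUMP_PLACEHOLDER >> (8 * i)) & 0xFF));
        }
        return offset;
    }

    // Overwrite the 4 placeholder bytes with the real target address.
    void patch_jump(std::size_t offset, std::uint32_t target) {
        assert(offset + 4 <= code.size());
        for (std::size_t i = 0; i < 4; ++i) {
            code[offset + i] = static_cast<std::uint8_t>((target >> (8 * i)) & 0xFF);
        }
    }

    std::uint32_t read_u32(std::size_t offset) const {
        assert(offset + 4 <= code.size());
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            value |= static_cast<std::uint32_t>(code[offset + i]) << (8 * i);
        }
        return value;
    }
};

// Pending `break` jumps for one loop; the field name is an assumption.
struct LoopContext {
    std::vector<std::size_t> break_jumps;
};

// At loop end, point every recorded `break` jump at the first byte after the loop.
inline void patch_breaks(MiniChunk& chunk, const LoopContext& loop, std::uint32_t loop_end) {
    for (std::size_t offset : loop.break_jumps) {
        chunk.patch_jump(offset, loop_end);
    }
}
```

Recording the operand offset rather than the opcode offset keeps `patch_jump` trivial: it writes four bytes at the recorded position and needs no knowledge of which jump opcode precedes them.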
and / or use peek-based jumps for lazy evaluation:
AND: eval left → JUMP_IF_FALSE (peek, don't pop) → POP → eval right → [target]
OR: eval left → JUMP_IF_TRUE (peek, don't pop) → POP → eval right → [target]
This ensures the right operand is only evaluated when needed.
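To make the effect of the peek-based jumps concrete, here is a toy interpreter that runs the AND and OR sequences above and counts executed instructions. It is a sketch, not the CVM++ VM: it uses a pre-decoded instruction list instead of bytes, handles only boolean literals, and the names `Instr`, `run`, `compile_and`, and `compile_or` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Op { PUSH_BOOL, JUMP_IF_FALSE, JUMP_IF_TRUE, POP, HALT };

struct Instr {
    Op op;
    std::uint32_t arg;  // literal for PUSH_BOOL, target index for jumps
};

struct RunResult {
    bool top;              // value left on top of the stack
    std::size_t executed;  // number of instructions executed
};

inline RunResult run(const std::vector<Instr>& prog) {
    std::vector<int> stack;
    std::size_t ip = 0;
    std::size_t executed = 0;
    while (ip < prog.size()) {
        const Instr& in = prog[ip++];
        ++executed;
        if (in.op == Op::HALT) break;
        switch (in.op) {
            case Op::PUSH_BOOL: stack.push_back(in.arg != 0 ? 1 : 0); break;
            // Conditional jumps peek at the top of the stack; they do not pop it.
            case Op::JUMP_IF_FALSE: if (stack.back() == 0) ip = in.arg; break;
            case Op::JUMP_IF_TRUE:  if (stack.back() != 0) ip = in.arg; break;
            case Op::POP: stack.pop_back(); break;
            case Op::HALT: break;
        }
    }
    const bool top = !stack.empty() && stack.back() != 0;
    return RunResult{top, executed};
}

// left and right: eval left, JUMP_IF_FALSE, POP, eval right, [target]
inline std::vector<Instr> compile_and(bool left, bool right) {
    return {
        {Op::PUSH_BOOL, left ? 1u : 0u},
        {Op::JUMP_IF_FALSE, 4u},
        {Op::POP, 0u},
        {Op::PUSH_BOOL, right ? 1u : 0u},
        {Op::HALT, 0u},  // index 4 is the jump target
    };
}

// left or right: same shape, but the jump is taken when left is truthy.
inline std::vector<Instr> compile_or(bool left, bool right) {
    return {
        {Op::PUSH_BOOL, left ? 1u : 0u},
        {Op::JUMP_IF_TRUE, 4u},
        {Op::POP, 0u},
        {Op::PUSH_BOOL, right ? 1u : 0u},
        {Op::HALT, 0u},
    };
}
```

When the left operand decides the result, the jump skips both the `POP` and the right operand, so the left value itself stays on the stack as the result of the whole expression.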
| Function | Args | Returns | Description |
|---|---|---|---|
| `sqrt(x)` | float/int | float | Square root |
| `abs(x)` | number | number | Absolute value |
| `pow(base, exp)` | numbers | float | Exponentiation |
| `floor(x)` | float | float | Round down |
| `ceil(x)` | float | float | Round up |
| `round(x)` | float | float | Round to nearest |
| `max(a, b)` | comparable | any | Maximum of two values |
| `min(a, b)` | comparable | any | Minimum of two values |
| `str(x)` | any | string | Convert to string |
| `int(x)` | any | int | Convert to integer |
| `float(x)` | any | float | Convert to float |
| `bool(x)` | any | bool | Convert to boolean |
| `len(s)` | string | int | String length |
| `type(x)` | any | string | Type name of value |
| `chr(n)` | int | string | ASCII char from code point |
| `ord(s)` | string | int | Code point of first char |
(Program
(Let x
[Literal: 10])
(Print
(Binary +
[Ident: x]
[Literal: 5])))
══════════════════════════════════════
CHUNK: main
══════════════════════════════════════
String constants:
[0] = "x"
Instructions:
000000 L 1 PUSH_INT 10
000009 L 1 DEFINE_VAR slot=0
000012 L 2 LOAD_VAR slot=0
000015 L 2 PUSH_INT 5
000024 L 2 ADD
000025 L 2 PRINT
000026 L 2 HALT
── Execution Stats ─────────────────────
Compile time : 0.142 ms
Execution time : 2.831 ms
Instructions : 10482
Stack peak : 12
Function calls : 77
for (let i = 1; i <= 100; i = i + 1) {
if (i % 15 == 0) { print("FizzBuzz") }
else { if (i % 3 == 0) { print("Fizz") }
else { if (i % 5 == 0) { print("Buzz") }
else { print(i) } } }
}
fn fib(n) {
if (n <= 1) { return n }
return fib(n - 1) + fib(n - 2)
}
for (let i = 0; i <= 15; i = i + 1) {
print("fib(" + str(i) + ") = " + str(fib(i)))
}
fn is_prime(n) {
if (n < 2) { return false }
let d = 2
while (d * d <= n) {
if (n % d == 0) { return false }
d += 1
}
return true
}
for (let n = 2; n <= 50; n = n + 1) {
if (is_prime(n)) { print(n) }
}
- Complete Pipeline: Lexer → Parser → AST → Compiler → VM, all from scratch with no libraries
- Production-quality ISA: 40+ opcodes, 64-bit integers and doubles, typed value system using `std::variant`
- Correct Call Convention: frame-based stack discipline with proper local variable slots, so recursive functions work correctly at any call depth
- Short-circuit Evaluation: `and`/`or` use peek-based jump semantics, not post-eval boolean ops
- Compile-time Jump Patching: all forward jumps use the backpatch pattern; break/continue jumps are collected in a `LoopContext` stack and patched at loop end
- Extensible Native Runtime: register C++ lambdas as CVM++ functions with `vm.register_native()`
- Rich Debug Output: AST pretty-printer, bytecode disassembler with line numbers, execution stats
- Interactive REPL: persistent global state, toggle debug modes, `:load`, `:globals`, multi-line input
- 100% Test Coverage: 55 automated tests covering every language feature, with expected-output matching
- Zero dependencies: pure C++17, STL only
Built for the CVM++ project — mentored by Raman (7977779056)