NestQL

A lightweight query engine that treats a folder of CSV/JSON files as a relational database. Write queries in a compact GraphQL-like syntax and get back nested JSON — no SQL, no database server.

Installation

# Core engine only
pip install .

# With web UI (Gradio)
pip install ".[ui]"

# Development (editable)
pip install -e .

After installation the nestql command is available globally:

nestql --help
nestql query.nestql
nestql query.nestql --data-folder ./my-data
nestql query.nestql --no-ast -v

VS Code Extension

The nestql-vscode/ folder contains a VS Code extension that provides:

Syntax highlighting for .nestql files
File icons for .nestql files in the Explorer
Live linting (structural brace checks + Python-backed deep parsing)

Installation

Option A — install from folder (recommended for local use)

Open VS Code.
Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) and run Extensions: Install from VSIX…
If you have already packaged the extension, select the .vsix file and click Install.

Or, to install directly from the source folder without packaging:
- Open the Command Palette and run Developer: Install Extension from Location…
- Select the nestql-vscode/ folder.

Option B — package and install

# Install the VS Code Extension CLI (once)
npm install -g @vscode/vsce

# From inside the extension folder
cd nestql-vscode
vsce package          # produces nestql-1.0.0.vsix
code --install-extension nestql-1.0.0.vsix

Extension settings

Setting	Default	Description
`nestql.pythonPath`	`"python"`	Python executable used for deep linting. Set to a venv path if needed (e.g. `"/home/user/.venv/bin/python"`).

The extension automatically lints on open and on every keystroke (debounced 500 ms). Structural errors (unmatched braces, bad alias syntax) are shown immediately; Python-backed errors appear within a second.

Quick Start

from nestql import FolderAdapter

results = FolderAdapter(folder_path="./data").execute("""
<directors> {
    name
    country : {"India", "USA"}
}
""")
# results → {"directors": [{"name": "Rajamouli", "country": "India"}, …]}

Language Reference

Query Structure

$global {               # optional macro block
    var : value
}

$tables {               # optional temporary intermediate tables (excluded from output)
    alias := <source> { fields … }
}

<table_name> {          # select a CSV/JSON file
    field               # include field
    field : filter      # filter field (equality, range, IN list, LIKE)
    field := alias      # rename field in output
    ~field              # internal field (used in logic, hidden from output)
    !<sub_table> { }    # strict sub-table (hide parent row if sub is empty)
    <sub_table> *{ }    # single pluck — return list of value arrays
    <sub_table> **{ }   # double pluck — flatten all values into one array
    func(<table> { })   # sub-table aggregate: count(), sum(), avg(), min(), max()
    count() := alias    # in-row aggregate (use with @group)
    avg(field) := alias # in-row field aggregate (use with @group)
    (expr) := alias     # computed column — arithmetic on row fields
    @sort  : f, -f2     # ORDER BY (prefix - = descending)
    @limit : N          # LIMIT
    @offset: N          # OFFSET
    @distinct           # DISTINCT
    @group : field      # GROUP BY
    @having: f op val   # HAVING (requires @group)
} := result_alias

Operators

Syntax	Meaning
`<name>`	Select table / file
`field`	Include field in output
`field : value`	Equality or range filter
`field : {"a","b"}`	IN filter (list)
`field : [min, max]`	Inclusive range filter
`field : (min, max)`	Exclusive range filter
`field := alias`	Rename field / table in output
`~field`	Internal — used in filters, excluded from output
`!<table>`	Strict — hide parent row when sub-table is empty
`*{ }`	Single pluck — return `[[v1,v2],[v3,v4],…]`
`**{ }`	Double pluck — flatten to `[v1,v2,v3,v4,…]`
`field = parent_var`	Join on parent row's context variable
`$var`	Reference a global macro
`count(<table>{})`	Count rows in sub-table
`sum(<table>{field})`	Sum a numeric field across sub-table rows
`avg(<table>{field})`	Average a numeric field across sub-table rows
`min(<table>{field})`	Minimum value across sub-table rows
`max(<table>{field})`	Maximum value across sub-table rows
`field : "prefix%"`	LIKE / wildcard filter — `%` or `*` match any substring
`(expr) := alias`	Computed column — arithmetic on row fields

Query Directives

Place these inside a { } block to control result shape.

Directive	Meaning
`@sort : field, -field2`	ORDER BY — prefix `-` for descending
`@limit : N`	Return at most N rows
`@offset : N`	Skip the first N rows
`@distinct`	Remove duplicate rows
`@group : field1, field2`	GROUP BY — collapses rows into groups
`@having : field op value`	Filter on grouped results (requires `@group`)

In-row Aggregates (for use with `@group`)

Syntax	Meaning
`count() := alias`	Count rows in each group
`sum(field) := alias`	Sum a field across each group
`avg(field) := alias`	Average a field across each group
`min(field) := alias`	Minimum field value in each group
`max(field) := alias`	Maximum field value in each group

`$global` Macros

Macros support arithmetic, bitwise ops, ranges, lists, f-strings, destructuring, and print() for debugging.

$global {
    min_b  : 50
    range  : [$min_b, $min_b * 2]      # range → {__ge__: 50, __le__: 100}
    mask   : 1 << 3                     # bitwise → 8
    flags  : {$mask, 16}                # list  → [8, 16]
    label  : f"Budget ≥ {$min_b}"      # f-string
    {a, *rest, z} : {1,2,3,4,5}        # destructuring
    print($range)                       # debug print during parsing
}

Supported expression operators: + - * / // % ** << >> & | ^ ~

`$tables` — Temporary Intermediate Tables

Pre-execute a sub-query and bind the result to a name. Temporary tables are available as join targets inside subsequent blocks but are excluded from the final output.

$tables {
    big_movies := <movies> {
        name
        director_id
        budget : [100, 999]
        rating
    }
}

<directors> {
    ~id
    name
    <big_movies> {
        name := title
        ~director_id = id
        rating
    } := hits
}

Examples

All examples use the sample data in data/:

directors.csv

id	name	country	experience
d1	Rajamouli	India	25
d2	Hari	India	15
d3	Shankar	India	30
d4	James Cameron	USA	40

movies.json

id	name	director_id	budget	rating	genre
1	Bahubali	d1	200	8.1	Action
2	Singam	d2	50	6.5	Action
3	Robo	d3	150	7.1	Sci-Fi
4	S3	d2	5	6.2	Action
5	Avatar	d4	237	7.9	Sci-Fi

Basic filtering and nesting

Sample Queries/basic/basic.nestql

$global {
    min_budget       : 50
    low_movie_budget : [$min_budget, 100]
    high_movie_budget: [101, 500]
    target_country   : {"India", "Sri Lanka"}
}

<directors> {
    id := dir_var
    country : $target_country
    name
    <movies> *{
        name
        ~budget       : $low_movie_budget
        ~director_id = dir_var := directors_id
    } := low_budget_movies
    <movies> *{
        name
        budget        : $high_movie_budget
        ~director_id = dir_var := directors_id
    } := high_budget_movies
} := directors_india

<directors> {
    id := dir_var
    country : "USA"
    name
    <movies> {
        name
        ~budget       : $low_movie_budget
        ~director_id = dir_var := directors_id
    } := low_budget_movies
    <movies> {
        name
        budget        : $high_movie_budget
        ~director_id = dir_var := directors_id
    } := high_budget_movies
} := directors_usa

Output

{
    "directors_india": [
        {"dir_var": "d1", "country": "India", "name": "Rajamouli",
         "low_budget_movies": [], "high_budget_movies": [{"name": "Bahubali", "budget": "200"}]},
        {"dir_var": "d2", "country": "India", "name": "Hari",
         "low_budget_movies": [["Singam"]], "high_budget_movies": []},
        {"dir_var": "d3", "country": "India", "name": "Shankar",
         "low_budget_movies": [], "high_budget_movies": [{"name": "Robo", "budget": "150"}]}
    ],
    "directors_usa": [
        {"dir_var": "d4", "country": "USA", "name": "James Cameron",
         "low_budget_movies": [], "high_budget_movies": [{"name": "Avatar", "budget": "237"}]}
    ]
}

Arithmetic and ranges

Sample Queries/arthimetic/arthimetic.nestql

$global {
    min_budget        : 50
    print("min_budget :", $min_budget)
    double_min        : $min_budget * 2
    low_movie_budget  : [$min_budget, $min_budget * 2]
    high_movie_budget : [$double_min + 1, 1 << 9]
    mask              : 1 << 3
    print($mask)
    flags             : {$mask, 8}
    target_country    : {"India", "USA"}
    print($target_country)
}

<directors> {
    id := dir_var,
    country : $target_country,
    name,
    <movies> {
        name,
        ~budget       : $low_movie_budget,
        ~director_id = dir_var := directors_id
    } := low_budget_movies,
    <movies> {
        name,
        budget        : $high_movie_budget,
        ~director_id = dir_var := directors_id
    } := high_budget_movies
} := directors_math_1

Console output (from print() calls during parsing):

min_budget : 50
8
{'India', 'USA'}

Count aggregation

Sample Queries/count/count.nestql

<directors> {
    id := dir_var
    country,
    name,
    count(<movies> {
        name,
        budget,
        ~director_id = dir_var := directors_id
    }) := alias_count,
    count(<movies> {
        name,
        budget,
        ~director_id = dir_var := directors_id
    })
}

Output

{
    "directors": [
        {"dir_var": "d1", "country": "India", "name": "Rajamouli", "alias_count": 1, "count": 1},
        {"dir_var": "d2", "country": "India", "name": "Hari",       "alias_count": 2, "count": 2},
        {"dir_var": "d3", "country": "India", "name": "Shankar",    "alias_count": 1, "count": 1},
        {"dir_var": "d4", "country": "USA",   "name": "James Cameron", "alias_count": 1, "count": 1}
    ]
}

Sum aggregation

Sample Queries/sum/sum.nestql

<directors> {
    id := dir_var,
    country,
    name,
    sum(<movies> {
        budget,
        ~director_id = dir_var := directors_id
    }) := budget_consumed
}

Output

{
    "directors": [
        {"dir_var": "d1", "country": "India", "name": "Rajamouli",    "budget_consumed": 200},
        {"dir_var": "d2", "country": "India", "name": "Hari",          "budget_consumed": 55},
        {"dir_var": "d3", "country": "India", "name": "Shankar",       "budget_consumed": 150},
        {"dir_var": "d4", "country": "USA",   "name": "James Cameron", "budget_consumed": 237}
    ]
}

Computed columns

<movies> {
    name
    budget
    rating
    (budget * 1.1)    := adjusted_budget
    (rating * 10)     := score
    @sort : -adjusted_budget
    @limit : 3
}

Output

{
    "movies": [
        {"name": "Avatar",   "budget": "237", "rating": "7.9", "adjusted_budget": 260.7, "score": 79},
        {"name": "Bahubali", "budget": "200", "rating": "8.1", "adjusted_budget": 220.0, "score": 81},
        {"name": "Robo",     "budget": "150", "rating": "7.1", "adjusted_budget": 165,   "score": 71}
    ]
}

LIKE / wildcard filter

Use % or * in a string filter for SQL-style pattern matching:

<movies> {
    name
    genre : "Sci%"
}

Output

{
    "movies": [
        {"name": "Robo"},
        {"name": "Avatar"}
    ]
}

ORDER BY, LIMIT, OFFSET

<movies> {
    name
    budget
    rating
    @sort   : -budget, -rating
    @limit  : 3
    @offset : 1
}

Output

{
    "movies": [
        {"name": "Bahubali", "budget": "200", "rating": "8.1"},
        {"name": "Robo",     "budget": "150", "rating": "7.1"},
        {"name": "Singam",   "budget": "50",  "rating": "6.5"}
    ]
}

DISTINCT

<directors> {
    country
    @distinct
}

Output

{
    "directors": [
        {"country": "India"},
        {"country": "USA"}
    ]
}

GROUP BY with in-row aggregates

<movies> {
    genre
    count()      := num_movies
    avg(budget)  := avg_budget
    max(rating)  := top_rating
    @group  : genre
    @having : num_movies >= 2
    @sort   : -avg_budget
}

Output

{
    "movies": [
        {"genre": "Sci-Fi",  "num_movies": 2, "avg_budget": 193.5, "top_rating": 7.9},
        {"genre": "Action",  "num_movies": 3, "avg_budget": 85.0,  "top_rating": 8.1}
    ]
}

Sub-table avg / min / max

<directors> {
    ~id
    name
    avg(<movies> { budget, ~director_id = id }) := avg_budget
    min(<movies> { budget, ~director_id = id }) := min_budget
    max(<movies> { budget, ~director_id = id }) := max_budget
    @sort : -max_budget
}

Output

{
    "directors": [
        {"name": "James Cameron", "avg_budget": 237.0, "min_budget": 237.0, "max_budget": 237.0},
        {"name": "Rajamouli",     "avg_budget": 200.0, "min_budget": 200.0, "max_budget": 200.0},
        {"name": "Shankar",       "avg_budget": 150.0, "min_budget": 150.0, "max_budget": 150.0},
        {"name": "Hari",          "avg_budget": 27.5,  "min_budget": 5.0,   "max_budget": 50.0}
    ]
}

Pluck operators

Sample Queries/pluck/pluck.nestql — * returns a 2-D list; ** flattens to 1-D.

<directors> {
    id := dir_var
    country,
    name,
    <movies> *{
        name,
        budget,
        ~director_id = dir_var := directors_id
    } := movies_single_pluck
    <movies> **{
        name,
        budget,
        ~director_id = dir_var := directors_id
    } := movies_double_pluck
}

Output (single pluck gives `[[name, budget], …]`; double pluck flattens everything)

{
    "directors": [
        {
            "dir_var": "d1", "country": "India", "name": "Rajamouli",
            "movies_single_pluck": [["Bahubali", "200"]],
            "movies_double_pluck": ["Bahubali", "200"]
        },
        {
            "dir_var": "d2", "country": "India", "name": "Hari",
            "movies_single_pluck": [["Singam", "50"], ["S3", "5"]],
            "movies_double_pluck": ["Singam", "50", "S3", "5"]
        }
    ]
}

String operations

Sample Queries/string/string.nestql

$global {
    s_hello      : "Hello"
    s_world      : "World"
    s_combined   : $s_hello + ", " + $s_world + "!"
    laugh        : "ha" * 4
    names        : {"Rajamouli", "Hari", "Shankar", "James Cameron"}
    pi           : 3.14159
    greeting_fmt : f"{ $s_hello } { $s_world } - PI={ $pi:.2f }"

    print($s_combined)
    print($laugh)
    print($names)
    print($greeting_fmt)
}

<directors> {
    id := dir_var,
    name,
    country,
    name : $names
} := directors_named

Console output:

Hello, World!
hahahaha
{'Rajamouli', 'Hari', 'Shankar', 'James Cameron'}
Hello World - PI=3.14

List operations and destructuring

Sample Queries/list/list.nestql

$global {
    list_1 : {1, 2, 3}
    list_2 : {"a", 'b', 4.5}
    list_3 : {*list_1, *list_2}         # python-style splat
    list_4 : list_1 + list_2
    {list_a, *list_b, list_c} : list_4  # destructuring with rest
    list_5 : list_1 * 2
    list_6 : {...list_1, ...list_2}      # js-style spread

    print("List 1 :", $list_1)
    print("List 2 :", $list_2)
    print("List 3 :", $list_3)
    print("List 4 :", $list_4)
    print("List 5 :", $list_5)
    print("List 6 :", $list_6)
    print("List a :", $list_a)
    print("List b :", $list_b)
    print("List c :", $list_c)
}

Console output:

List 1 : [1, 2, 3]
List 2 : ['a', 'b', 4.5]
List 3 : [1, 2, 3, 'a', 'b', 4.5]
List 4 : [1, 2, 3, 'a', 'b', 4.5]
List 5 : [1, 2, 3, 1, 2, 3]
List 6 : [1, 2, 3, 'a', 'b', 4.5]
List a : 1
List b : [2, 3, 'a', 'b']
List c : 4.5

Python API

from nestql import FolderAdapter, GQLParser, GQLConfig

# ── Simple: pass the query string directly ────────────────────────────────
adapter = FolderAdapter(folder_path="./data")
results = adapter.execute(open("query.nestql").read())
# returns {"alias": [rows…], …}  — $tables blocks handled automatically

# ── Advanced: parse first, then execute individual tables ─────────────────
config  = GQLConfig(data_folder="./data", max_rows_per_table=50_000)
adapter = FolderAdapter(folder_path="./data", config=config)
ast     = GQLParser().parse(open("query.nestql").read())

tables_ast = ast.pop("__tables__", {})
adapter.load_temp_tables(tables_ast)

results = {}
for table_name, node in ast.items():
    table_source = node["__meta__"]["table_source"]
    results[node["__meta__"].get("alias", table_name)] = adapter.execute(table_source, node)

Configuration via environment variables

Variable	Default	Description
`NESTQL_DATA_FOLDER`	`./data`	Data directory
`NESTQL_MAX_DEPTH`	`10`	Max query nesting depth
`NESTQL_MAX_ROWS`	`100000`	Max rows per table load
`NESTQL_LOG_LEVEL`	`WARNING`	Log level
`NESTQL_VERBOSE`	(unset)	Set to `1` for DEBUG output

Exception hierarchy

GQLError
├── GQLParseError           # syntax or macro errors
├── GQLAdapterError
│   └── GQLFileNotFoundError  # missing data file
└── GQLQueryComplexityError   # nesting limit exceeded

Web UI (Gradio)

python app.py
# open http://localhost:7865

Features: data-file browser, schema viewer, query editor, Results / AST / Console output views.

Writing a Custom Adapter

from nestql.adapters.base import BaseAdapter
from typing import Any

class MyAdapter(BaseAdapter):
    def execute(
        self,
        table_key: str,
        node: dict[str, Any],
        context: dict[str, Any] | None = None,
    ) -> list[Any]:
        meta = node.get("__meta__", {})
        table = meta.get("table_source", table_key)
        # load data, apply filters from node, recurse for sub-tables …
        return []

Project Layout

nestql/                 # installable Python package
├── __init__.py         # public API re-exports
├── parser.py           # GQLParser: tokenizer + AST builder + macro resolver
├── exceptions.py       # GQLError hierarchy
├── config.py           # GQLConfig (env-var aware)
├── utils.py            # ql_to_json() helper
├── cli.py              # `nestql` CLI entry point
└── adapters/
    ├── base.py         # BaseAdapter ABC
    └── folder.py       # FolderAdapter (CSV/JSON)

nestql-vscode/          # VS Code extension
├── extension.js        # activation + linting logic
├── package.json        # extension manifest
├── syntaxes/
│   └── nestql.tmLanguage.json   # TextMate grammar
├── icons/
│   └── nestql-file.svg          # file icon
└── nestql-icon-theme.json       # icon theme mapping

app.py                  # Gradio web UI
pyproject.toml          # pip install config
data/                   # sample data files
Sample Queries/         # example queries — each in its own subfolder
├── basic/
│   ├── basic.nestql    # query file
│   └── basic.json      # expected output
├── group_by/
│   ├── group_by.nestql
│   └── group_by.json
└── …                   # arthimetic, computed, count, distinct, sort_limit, wildcard, temp_tables, …

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Sample Data		Sample Data
Sample Queries		Sample Queries
nestql-vscode		nestql-vscode
nestql		nestql
.gitignore		.gitignore
README.md		README.md
app.py		app.py
banner.png		banner.png
fetch_imdb.py		fetch_imdb.py
make_banner.py		make_banner.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

NestQL

Installation

VS Code Extension

Installation

Extension settings

Quick Start

Language Reference

Query Structure

Operators

Query Directives

In-row Aggregates (for use with @group)

$global Macros

$tables — Temporary Intermediate Tables

Examples

Basic filtering and nesting

Arithmetic and ranges

Count aggregation

Sum aggregation

Computed columns

LIKE / wildcard filter

ORDER BY, LIMIT, OFFSET

DISTINCT

GROUP BY with in-row aggregates

Sub-table avg / min / max

Pluck operators

String operations

List operations and destructuring

Python API

Configuration via environment variables

Exception hierarchy

Web UI (Gradio)

Writing a Custom Adapter

Project Layout

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

In-row Aggregates (for use with `@group`)

`$global` Macros

`$tables` — Temporary Intermediate Tables

Packages