Skip to content

AkashSCIENTIST/nestql

Repository files navigation

NestQL

NestQL banner

A lightweight query engine that treats a folder of CSV/JSON files as a relational database. Write queries in a compact GraphQL-like syntax and get back nested JSON — no SQL, no database server.

Installation

# Core engine only
pip install .

# With web UI (Gradio)
pip install ".[ui]"

# Development (editable)
pip install -e .

After installation the nestql command is available globally:

nestql --help
nestql query.nestql
nestql query.nestql --data-folder ./my-data
nestql query.nestql --no-ast -v

VS Code Extension

The nestql-vscode/ folder contains a VS Code extension that provides:

  • Syntax highlighting for .nestql files
  • File icons for .nestql files in the Explorer
  • Live linting (structural brace checks + Python-backed deep parsing)

Installation

Option A — install from folder (recommended for local use)

  1. Open VS Code.

  2. Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) and run Extensions: Install from VSIX…

  3. If you have already packaged the extension, select the .vsix file and click Install.

    Or, to install directly from the source folder without packaging:

    • Open the Command Palette and run Developer: Install Extension from Location…
    • Select the nestql-vscode/ folder.

Option B — package and install

# Install the VS Code Extension CLI (once)
npm install -g @vscode/vsce

# From inside the extension folder
cd nestql-vscode
vsce package          # produces nestql-1.0.0.vsix
code --install-extension nestql-1.0.0.vsix

Extension settings

Setting Default Description
nestql.pythonPath "python" Python executable used for deep linting. Set to a venv path if needed (e.g. "/home/user/.venv/bin/python").

The extension automatically lints on open and on every keystroke (debounced 500 ms). Structural errors (unmatched braces, bad alias syntax) are shown immediately; Python-backed errors appear within a second.


Quick Start

from nestql import FolderAdapter

results = FolderAdapter(folder_path="./data").execute("""
<directors> {
    name
    country : {"India", "USA"}
}
""")
# results → {"directors": [{"name": "Rajamouli", "country": "India"}, …]}

Language Reference

Query Structure

$global {               # optional macro block
    var : value
}

$tables {               # optional temporary intermediate tables (excluded from output)
    alias := <source> { fields … }
}

<table_name> {          # select a CSV/JSON file
    field               # include field
    field : filter      # filter field (equality, range, IN list, LIKE)
    field := alias      # rename field in output
    ~field              # internal field (used in logic, hidden from output)
    !<sub_table> { }    # strict sub-table (hide parent row if sub is empty)
    <sub_table> *{ }    # single pluck — return list of value arrays
    <sub_table> **{ }   # double pluck — flatten all values into one array
    func(<table> { })   # sub-table aggregate: count(), sum(), avg(), min(), max()
    count() := alias    # in-row aggregate (use with @group)
    avg(field) := alias # in-row field aggregate (use with @group)
    (expr) := alias     # computed column — arithmetic on row fields
    @sort  : f, -f2     # ORDER BY (prefix - = descending)
    @limit : N          # LIMIT
    @offset: N          # OFFSET
    @distinct           # DISTINCT
    @group : field      # GROUP BY
    @having: f op val   # HAVING (requires @group)
} := result_alias

Operators

Syntax Meaning
<name> Select table / file
field Include field in output
field : value Equality or range filter
field : {"a","b"} IN filter (list)
field : [min, max] Inclusive range filter
field : (min, max) Exclusive range filter
field := alias Rename field / table in output
~field Internal — used in filters, excluded from output
!<table> Strict — hide parent row when sub-table is empty
*{ } Single pluck — return [[v1,v2],[v3,v4],…]
**{ } Double pluck — flatten to [v1,v2,v3,v4,…]
field = parent_var Join on parent row's context variable
$var Reference a global macro
count(<table>{}) Count rows in sub-table
sum(<table>{field}) Sum a numeric field across sub-table rows
avg(<table>{field}) Average a numeric field across sub-table rows
min(<table>{field}) Minimum value across sub-table rows
max(<table>{field}) Maximum value across sub-table rows
field : "prefix%" LIKE / wildcard filter — % or * match any substring
(expr) := alias Computed column — arithmetic on row fields

Query Directives

Place these inside a { } block to control result shape.

Directive Meaning
@sort : field, -field2 ORDER BY — prefix - for descending
@limit : N Return at most N rows
@offset : N Skip the first N rows
@distinct Remove duplicate rows
@group : field1, field2 GROUP BY — collapses rows into groups
@having : field op value Filter on grouped results (requires @group)

In-row Aggregates (for use with @group)

Syntax Meaning
count() := alias Count rows in each group
sum(field) := alias Sum a field across each group
avg(field) := alias Average a field across each group
min(field) := alias Minimum field value in each group
max(field) := alias Maximum field value in each group

$global Macros

Macros support arithmetic, bitwise ops, ranges, lists, f-strings, destructuring, and print() for debugging.

$global {
    min_b  : 50
    range  : [$min_b, $min_b * 2]      # range → {__ge__: 50, __le__: 100}
    mask   : 1 << 3                     # bitwise → 8
    flags  : {$mask, 16}                # list  → [8, 16]
    label  : f"Budget ≥ {$min_b}"      # f-string
    {a, *rest, z} : {1,2,3,4,5}        # destructuring
    print($range)                       # debug print during parsing
}

Supported expression operators: + - * / // % ** << >> & | ^ ~

$tables — Temporary Intermediate Tables

Pre-execute a sub-query and bind the result to a name. Temporary tables are available as join targets inside subsequent blocks but are excluded from the final output.

$tables {
    big_movies := <movies> {
        name
        director_id
        budget : [100, 999]
        rating
    }
}

<directors> {
    ~id
    name
    <big_movies> {
        name := title
        ~director_id = id
        rating
    } := hits
}

Examples

All examples use the sample data in data/:

directors.csv

id name country experience
d1 Rajamouli India 25
d2 Hari India 15
d3 Shankar India 30
d4 James Cameron USA 40

movies.json

id name director_id budget rating genre
1 Bahubali d1 200 8.1 Action
2 Singam d2 50 6.5 Action
3 Robo d3 150 7.1 Sci-Fi
4 S3 d2 5 6.2 Action
5 Avatar d4 237 7.9 Sci-Fi

Basic filtering and nesting

Sample Queries/basic/basic.nestql

$global {
    min_budget       : 50
    low_movie_budget : [$min_budget, 100]
    high_movie_budget: [101, 500]
    target_country   : {"India", "Sri Lanka"}
}

<directors> {
    id := dir_var
    country : $target_country
    name
    <movies> *{
        name
        ~budget       : $low_movie_budget
        ~director_id = dir_var := directors_id
    } := low_budget_movies
    <movies> *{
        name
        budget        : $high_movie_budget
        ~director_id = dir_var := directors_id
    } := high_budget_movies
} := directors_india

<directors> {
    id := dir_var
    country : "USA"
    name
    <movies> {
        name
        ~budget       : $low_movie_budget
        ~director_id = dir_var := directors_id
    } := low_budget_movies
    <movies> {
        name
        budget        : $high_movie_budget
        ~director_id = dir_var := directors_id
    } := high_budget_movies
} := directors_usa
Output
{
    "directors_india": [
        {"dir_var": "d1", "country": "India", "name": "Rajamouli",
         "low_budget_movies": [], "high_budget_movies": [{"name": "Bahubali", "budget": "200"}]},
        {"dir_var": "d2", "country": "India", "name": "Hari",
         "low_budget_movies": [["Singam"]], "high_budget_movies": []},
        {"dir_var": "d3", "country": "India", "name": "Shankar",
         "low_budget_movies": [], "high_budget_movies": [{"name": "Robo", "budget": "150"}]}
    ],
    "directors_usa": [
        {"dir_var": "d4", "country": "USA", "name": "James Cameron",
         "low_budget_movies": [], "high_budget_movies": [{"name": "Avatar", "budget": "237"}]}
    ]
}

Arithmetic and ranges

Sample Queries/arthimetic/arthimetic.nestql

$global {
    min_budget        : 50
    print("min_budget :", $min_budget)
    double_min        : $min_budget * 2
    low_movie_budget  : [$min_budget, $min_budget * 2]
    high_movie_budget : [$double_min + 1, 1 << 9]
    mask              : 1 << 3
    print($mask)
    flags             : {$mask, 8}
    target_country    : {"India", "USA"}
    print($target_country)
}

<directors> {
    id := dir_var,
    country : $target_country,
    name,
    <movies> {
        name,
        ~budget       : $low_movie_budget,
        ~director_id = dir_var := directors_id
    } := low_budget_movies,
    <movies> {
        name,
        budget        : $high_movie_budget,
        ~director_id = dir_var := directors_id
    } := high_budget_movies
} := directors_math_1

Console output (from print() calls during parsing):

min_budget : 50
8
{'India', 'USA'}

Count aggregation

Sample Queries/count/count.nestql

<directors> {
    id := dir_var
    country,
    name,
    count(<movies> {
        name,
        budget,
        ~director_id = dir_var := directors_id
    }) := alias_count,
    count(<movies> {
        name,
        budget,
        ~director_id = dir_var := directors_id
    })
}
Output
{
    "directors": [
        {"dir_var": "d1", "country": "India", "name": "Rajamouli", "alias_count": 1, "count": 1},
        {"dir_var": "d2", "country": "India", "name": "Hari",       "alias_count": 2, "count": 2},
        {"dir_var": "d3", "country": "India", "name": "Shankar",    "alias_count": 1, "count": 1},
        {"dir_var": "d4", "country": "USA",   "name": "James Cameron", "alias_count": 1, "count": 1}
    ]
}

Sum aggregation

Sample Queries/sum/sum.nestql

<directors> {
    id := dir_var,
    country,
    name,
    sum(<movies> {
        budget,
        ~director_id = dir_var := directors_id
    }) := budget_consumed
}
Output
{
    "directors": [
        {"dir_var": "d1", "country": "India", "name": "Rajamouli",    "budget_consumed": 200},
        {"dir_var": "d2", "country": "India", "name": "Hari",          "budget_consumed": 55},
        {"dir_var": "d3", "country": "India", "name": "Shankar",       "budget_consumed": 150},
        {"dir_var": "d4", "country": "USA",   "name": "James Cameron", "budget_consumed": 237}
    ]
}

Computed columns

<movies> {
    name
    budget
    rating
    (budget * 1.1)    := adjusted_budget
    (rating * 10)     := score
    @sort : -adjusted_budget
    @limit : 3
}
Output
{
    "movies": [
        {"name": "Avatar",   "budget": "237", "rating": "7.9", "adjusted_budget": 260.7, "score": 79},
        {"name": "Bahubali", "budget": "200", "rating": "8.1", "adjusted_budget": 220.0, "score": 81},
        {"name": "Robo",     "budget": "150", "rating": "7.1", "adjusted_budget": 165,   "score": 71}
    ]
}

LIKE / wildcard filter

Use % or * in a string filter for SQL-style pattern matching:

<movies> {
    name
    genre : "Sci%"
}
Output
{
    "movies": [
        {"name": "Robo"},
        {"name": "Avatar"}
    ]
}

ORDER BY, LIMIT, OFFSET

<movies> {
    name
    budget
    rating
    @sort   : -budget, -rating
    @limit  : 3
    @offset : 1
}
Output
{
    "movies": [
        {"name": "Bahubali", "budget": "200", "rating": "8.1"},
        {"name": "Robo",     "budget": "150", "rating": "7.1"},
        {"name": "Singam",   "budget": "50",  "rating": "6.5"}
    ]
}

DISTINCT

<directors> {
    country
    @distinct
}
Output
{
    "directors": [
        {"country": "India"},
        {"country": "USA"}
    ]
}

GROUP BY with in-row aggregates

<movies> {
    genre
    count()      := num_movies
    avg(budget)  := avg_budget
    max(rating)  := top_rating
    @group  : genre
    @having : num_movies >= 2
    @sort   : -avg_budget
}
Output
{
    "movies": [
        {"genre": "Sci-Fi",  "num_movies": 2, "avg_budget": 193.5, "top_rating": 7.9},
        {"genre": "Action",  "num_movies": 3, "avg_budget": 85.0,  "top_rating": 8.1}
    ]
}

Sub-table avg / min / max

<directors> {
    ~id
    name
    avg(<movies> { budget, ~director_id = id }) := avg_budget
    min(<movies> { budget, ~director_id = id }) := min_budget
    max(<movies> { budget, ~director_id = id }) := max_budget
    @sort : -max_budget
}
Output
{
    "directors": [
        {"name": "James Cameron", "avg_budget": 237.0, "min_budget": 237.0, "max_budget": 237.0},
        {"name": "Rajamouli",     "avg_budget": 200.0, "min_budget": 200.0, "max_budget": 200.0},
        {"name": "Shankar",       "avg_budget": 150.0, "min_budget": 150.0, "max_budget": 150.0},
        {"name": "Hari",          "avg_budget": 27.5,  "min_budget": 5.0,   "max_budget": 50.0}
    ]
}

Pluck operators

Sample Queries/pluck/pluck.nestql* returns a 2-D list; ** flattens to 1-D.

<directors> {
    id := dir_var
    country,
    name,
    <movies> *{
        name,
        budget,
        ~director_id = dir_var := directors_id
    } := movies_single_pluck
    <movies> **{
        name,
        budget,
        ~director_id = dir_var := directors_id
    } := movies_double_pluck
}
Output (single pluck gives `[[name, budget], …]`; double pluck flattens everything)
{
    "directors": [
        {
            "dir_var": "d1", "country": "India", "name": "Rajamouli",
            "movies_single_pluck": [["Bahubali", "200"]],
            "movies_double_pluck": ["Bahubali", "200"]
        },
        {
            "dir_var": "d2", "country": "India", "name": "Hari",
            "movies_single_pluck": [["Singam", "50"], ["S3", "5"]],
            "movies_double_pluck": ["Singam", "50", "S3", "5"]
        }
    ]
}

String operations

Sample Queries/string/string.nestql

$global {
    s_hello      : "Hello"
    s_world      : "World"
    s_combined   : $s_hello + ", " + $s_world + "!"
    laugh        : "ha" * 4
    names        : {"Rajamouli", "Hari", "Shankar", "James Cameron"}
    pi           : 3.14159
    greeting_fmt : f"{ $s_hello } { $s_world } - PI={ $pi:.2f }"

    print($s_combined)
    print($laugh)
    print($names)
    print($greeting_fmt)
}

<directors> {
    id := dir_var,
    name,
    country,
    name : $names
} := directors_named

Console output:

Hello, World!
hahahaha
{'Rajamouli', 'Hari', 'Shankar', 'James Cameron'}
Hello World - PI=3.14

List operations and destructuring

Sample Queries/list/list.nestql

$global {
    list_1 : {1, 2, 3}
    list_2 : {"a", 'b', 4.5}
    list_3 : {*list_1, *list_2}         # python-style splat
    list_4 : list_1 + list_2
    {list_a, *list_b, list_c} : list_4  # destructuring with rest
    list_5 : list_1 * 2
    list_6 : {...list_1, ...list_2}      # js-style spread

    print("List 1 :", $list_1)
    print("List 2 :", $list_2)
    print("List 3 :", $list_3)
    print("List 4 :", $list_4)
    print("List 5 :", $list_5)
    print("List 6 :", $list_6)
    print("List a :", $list_a)
    print("List b :", $list_b)
    print("List c :", $list_c)
}

Console output:

List 1 : [1, 2, 3]
List 2 : ['a', 'b', 4.5]
List 3 : [1, 2, 3, 'a', 'b', 4.5]
List 4 : [1, 2, 3, 'a', 'b', 4.5]
List 5 : [1, 2, 3, 1, 2, 3]
List 6 : [1, 2, 3, 'a', 'b', 4.5]
List a : 1
List b : [2, 3, 'a', 'b']
List c : 4.5

Python API

from nestql import FolderAdapter, GQLParser, GQLConfig

# ── Simple: pass the query string directly ────────────────────────────────
adapter = FolderAdapter(folder_path="./data")
results = adapter.execute(open("query.nestql").read())
# returns {"alias": [rows…], …}  — $tables blocks handled automatically

# ── Advanced: parse first, then execute individual tables ─────────────────
config  = GQLConfig(data_folder="./data", max_rows_per_table=50_000)
adapter = FolderAdapter(folder_path="./data", config=config)
ast     = GQLParser().parse(open("query.nestql").read())

tables_ast = ast.pop("__tables__", {})
adapter.load_temp_tables(tables_ast)

results = {}
for table_name, node in ast.items():
    table_source = node["__meta__"]["table_source"]
    results[node["__meta__"].get("alias", table_name)] = adapter.execute(table_source, node)

Configuration via environment variables

Variable Default Description
NESTQL_DATA_FOLDER ./data Data directory
NESTQL_MAX_DEPTH 10 Max query nesting depth
NESTQL_MAX_ROWS 100000 Max rows per table load
NESTQL_LOG_LEVEL WARNING Log level
NESTQL_VERBOSE (unset) Set to 1 for DEBUG output

Exception hierarchy

GQLError
├── GQLParseError           # syntax or macro errors
├── GQLAdapterError
│   └── GQLFileNotFoundError  # missing data file
└── GQLQueryComplexityError   # nesting limit exceeded

Web UI (Gradio)

python app.py
# open http://localhost:7865

Features: data-file browser, schema viewer, query editor, Results / AST / Console output views.


Writing a Custom Adapter

from nestql.adapters.base import BaseAdapter
from typing import Any

class MyAdapter(BaseAdapter):
    def execute(
        self,
        table_key: str,
        node: dict[str, Any],
        context: dict[str, Any] | None = None,
    ) -> list[Any]:
        meta = node.get("__meta__", {})
        table = meta.get("table_source", table_key)
        # load data, apply filters from node, recurse for sub-tables …
        return []

Project Layout

nestql/                 # installable Python package
├── __init__.py         # public API re-exports
├── parser.py           # GQLParser: tokenizer + AST builder + macro resolver
├── exceptions.py       # GQLError hierarchy
├── config.py           # GQLConfig (env-var aware)
├── utils.py            # ql_to_json() helper
├── cli.py              # `nestql` CLI entry point
└── adapters/
    ├── base.py         # BaseAdapter ABC
    └── folder.py       # FolderAdapter (CSV/JSON)

nestql-vscode/          # VS Code extension
├── extension.js        # activation + linting logic
├── package.json        # extension manifest
├── syntaxes/
│   └── nestql.tmLanguage.json   # TextMate grammar
├── icons/
│   └── nestql-file.svg          # file icon
└── nestql-icon-theme.json       # icon theme mapping

app.py                  # Gradio web UI
pyproject.toml          # pip install config
data/                   # sample data files
Sample Queries/         # example queries — each in its own subfolder
├── basic/
│   ├── basic.nestql    # query file
│   └── basic.json      # expected output
├── group_by/
│   ├── group_by.nestql
│   └── group_by.json
└── …                   # arthimetic, computed, count, distinct, sort_limit, wildcard, temp_tables, …

License

MIT

About

A lightweight query engine that treats a folder of CSV/JSON files as a relational database. Write queries in a compact GraphQL-like syntax and get back nested JSON — no SQL, no database server.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors