
[ISSUE-4390]CarbonData as new data format for supporting Agents #4391

Merged: chenliang613 merged 2 commits into apache:master from chenliang613:master on May 6, 2026

Conversation

@chenliang613
Contributor

Why is this PR needed?

This PR adds CarbonData as a new data format for supporting agents.

What changes were proposed in this PR?

New modules

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No

@Bayarea0608
Contributor

Bayarea0608 commented May 4, 2026

Great PR. Some comments:

  1. Please add a detailed introduction that explains how to understand this module.
  2. Please add a demo example.

@chenliang613
Contributor Author

Thanks for your comments, @Bayarea0608.
I added an introduction to the agent module in README.md, and also added an example for a quick demo.

@Bayarea0608
Contributor

I just ran the demo via: .venv/bin/python examples/carbondata_quickstart.py

I got the results below; they look good.

============================================================
1. ingest_text — three short documents

entities 4
chunks 11
embeddings 11

============================================================
2. semantic search — query: 'python web framework'

doc:django (score=0.866) Django is a high-level Python web framework that encourages ...
doc:flask (score=0.866) Flask is a lightweight Python web framework with a minimal c...
doc:django (score=0.000) It ships with an ORM, an admin panel, and a templating syste...

============================================================
3. keyword search — query: 'transformer'

doc:transformer (BM25=1.344) A transformer is a neural network architecture built around ...
doc:transformer (BM25=1.171) Modern embedding models and large language models use the tr...

============================================================
4. hybrid search — query: 'neural training'

doc:transformer (RRF=0.0164) It largely replaced earlier recurrent neural network designs...
doc:transformer (RRF=0.0161) A transformer is a neural network architecture built around ...
doc:transformer (RRF=0.0159) Modern embedding models and large language models use the tr...
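
For readers unfamiliar with the RRF label in step 4, here is a minimal sketch of reciprocal rank fusion, the standard technique for merging a semantic ranking and a keyword ranking. It is not taken from this PR's code: the function name, the chunk ids, and the constant k=60 are assumptions for illustration. The printed scores above are consistent with k=60 (1/61, 1/62, and 1/63 round to 0.0164, 0.0161, and 0.0159), but the module's actual parameters are not shown in this thread.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list of (id, score) pairs.

    Each list contributes 1 / (k + rank) for every id it contains,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical chunk ids, not the ids used by the demo.
semantic_ranking = ["chunk_a", "chunk_b", "chunk_c"]
keyword_ranking = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
fused_order = [item for item, _ in fused]
```

An id that appears in both lists accumulates two contributions, so it outranks ids that only one retriever found.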

============================================================
5. ingest_table — users

team=ml rows: 2
u3 {'id': 'u3', 'lang': 'Python', 'name': 'Carol', 'team': 'ml'}
u2 {'id': 'u2', 'lang': 'Python', 'name': 'Bob', 'team': 'ml'}

============================================================
6. memory — remember + recall

score=0.500 sal=0.9 user prefers django over flask
score=0.354 sal=0.7 user works mainly with python web framework code
score=0.000 sal=0.4 user once asked about a neural transformer model
with min_salience=0.6 2 memories
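
The min_salience line in step 6 can be read as a threshold filter over stored memories. The sketch below illustrates that reading only; the Memory class and recall function are hypothetical names, not the module's API, and the query-relevance score shown in the output is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    salience: float


def recall(memories, min_salience=0.0):
    """Keep memories whose salience is at least min_salience, in stored order."""
    return [m for m in memories if m.salience >= min_salience]


# Texts and salience values copied from the demo output above.
memories = [
    Memory("user prefers django over flask", 0.9),
    Memory("user works mainly with python web framework code", 0.7),
    Memory("user once asked about a neural transformer model", 0.4),
]

filtered = recall(memories, min_salience=0.6)
```

With the three salience values from the output (0.9, 0.7, 0.4), a threshold of 0.6 keeps two memories, which matches the count the demo reports.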

============================================================
7. graph — relations + traversal

neighbors of doc:django (out):
-> doc:flask compares_with (w=0.8)
-> doc:transformer tangentially_about (w=0.2)
traverse from doc:django, max_hops=2:
hop=1 doc:flask
hop=1 doc:transformer
hop=2 doc:recipe
subgraph from {doc:django, doc:flask}, max_hops=1:
entities ['doc:django', 'doc:flask', 'doc:transformer']
relations [('doc:django', 'doc:flask', 'compares_with'), ('doc:django', 'doc:transformer', 'tangentially_about'), ('doc:flask', 'doc:transformer', 'tangentially_about')]
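
The traversal in step 7 behaves like a breadth-first search bounded by max_hops. The following is a sketch under that assumption, not the module's implementation: the traverse function is hypothetical, and the edge into doc:recipe is assumed, since the output does not show which node links to it.

```python
from collections import deque


def traverse(edges, start, max_hops):
    """Breadth-first traversal over outgoing edges.

    Returns (hop, node) pairs for every node reached within max_hops,
    excluding the start node itself.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, hop = queue.popleft()
        if hop == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append((hop + 1, neighbor))
                queue.append((neighbor, hop + 1))
    return reached


edges = [
    ("doc:django", "doc:flask"),
    ("doc:django", "doc:transformer"),
    ("doc:flask", "doc:transformer"),
    ("doc:transformer", "doc:recipe"),  # assumed edge, not shown in the output
]

result = traverse(edges, "doc:django", max_hops=2)
```

Tracking visited nodes in a set means doc:transformer is reported once at hop 1, even though it is also reachable through doc:flask at hop 2.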

============================================================
8. filter pushdown — kind=document only

doc:django (document) Django is a high-level Python web framework that encourages ...
doc:flask (document) Flask is a lightweight Python web framework with a minimal c...
doc:django (document) It ships with an ORM, an admin panel, and a templating syste...

============================================================
9. admin

validate.ok (before compact) False
! vector_index[vocab-demo] stale: meta.count=11 != live=18
compact size_before 4096
compact size_after 4096
validate.ok (after compact) True
export bytes 12848
export entities 11
export embeddings 18

============================================================
10. persistence — reopen the file

post-reopen top-1 doc:flask
post-reopen recall user prefers django over flask

Done. Demo data lives at: /var/folders/d3/x_28r1q932g6bq6pxcf8c6rh0000gp/T/carbondata_demo_9rx4uhc0/kb.carbondata

@Bayarea0608
Contributor

Bayarea0608 commented May 5, 2026

Looks good to me, +1.

@chenliang613 chenliang613 merged commit dba44c5 into apache:master May 6, 2026
3 checks passed