Integration: Improve Chunk splitter + Relationships between chunks for python files / repositories #13446

Open
wants to merge 6 commits into
base: main

Conversation


@jimysancho jimysancho commented May 12, 2024

Description

This PR adds a new feature built on rag-pychunk, a Python library that chunks Python files by leveraging the language's syntax to improve two things:

  • Chunk size: make the chunk size dynamic, keeping a whole function, a whole class method, a whole class, or a standalone block of code in the same chunk.
  • Chunk relationships: create relationships between chunks beyond Parent-Child and Prev-Next. NodeRelationship.REFERENCE has been created to hold these relationships. This type of relationship can also be used elsewhere, for example in PDFs when one chunk references another (example: "In section 3.7 there is an explanation ..." -> the section 3.7 chunk becomes a NodeRelationship.REFERENCE of the "self" node).
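
To make the two points above concrete, here is a minimal sketch of syntax-aware chunking with reference links. It uses only the standard-library `ast` module and is not rag-pychunk's actual implementation or API; the `Chunk` dataclass, the `chunk_source` function, and the `SAMPLE` module are hypothetical names introduced for illustration. In the PR itself such links would presumably be stored under NodeRelationship.REFERENCE, whereas the sketch keeps them in a plain list.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk record: one top-level definition per chunk."""
    name: str                                             # function or class name
    text: str                                             # exact source of the definition
    references: list[str] = field(default_factory=list)   # other chunks this one uses


def chunk_source(source: str) -> list[Chunk]:
    """Split a module into one chunk per top-level function or class,
    then record which other chunks each chunk refers to by name."""
    tree = ast.parse(source)
    definitions = [
        node for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]

    # Pass 1: dynamic chunk size, each chunk spans a whole definition.
    chunks = [
        Chunk(name=node.name, text=ast.get_source_segment(source, node))
        for node in definitions
    ]

    # Pass 2: reference links, found by looking for names of other chunks
    # among the identifiers used inside each definition.
    known = {chunk.name for chunk in chunks}
    for chunk, node in zip(chunks, definitions):
        used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
        chunk.references = sorted((used & known) - {chunk.name})
    return chunks


SAMPLE = '''
def load(path):
    return open(path).read()

class Parser:
    def parse(self, path):
        return load(path).split()

def main():
    return Parser().parse("x.txt")
'''

for chunk in chunk_source(SAMPLE):
    print(chunk.name, "->", chunk.references)
# load -> []
# Parser -> ['load']
# main -> ['Parser']
```

The sketch only handles top-level definitions and matches references by bare name, so it ignores imports, attribute access, and name shadowing; a real splitter would need to resolve those cases.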

Motivation: leverage Python's syntax to improve the chunking and relationship steps of the RAG pipeline. The same logic can be applied to other programming languages, since each of them has a well-defined syntax.

New dependency: rag-pychunk library.

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Added new notebook (that tests end-to-end)

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 12, 2024

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

