chunkr

A fast, lightweight chunking library for Rust 🦀


chunkr helps Rust developers build text- and language-based applications that work with documents. It splits large documents into smaller chunks without heavy resource usage.

Use chunkr to split large PDF documents into smaller chunks for LLM training and RAG (Retrieval-Augmented Generation) application development.

🚀 Getting Started

To add chunkr to your project and start chunking, use the Cargo CLI:

cargo add chunkr

The examples directory contains several examples. Check them out to get started.

To check out the code and build it yourself:

Clone the repository and run one of the examples from the examples directory.

git clone https://github.com/d1pankarmedhi/chunkr.git
cd chunkr

๐Ÿ—๏ธ Examples

Check out these examples to quickly get started:

Chunking

These are some chunking strategy examples:

Run an example with cargo, passing the chunk size, overlap, and file path as arguments:

# cargo run --example <example-name> <chunk-size> <overlap> <file-path>
cargo run --example chunk_document 1000 20 /home/home/Downloads/clean_code.pdf
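The chunk-size and overlap arguments describe a sliding window over the text. The sketch below shows the general idea in plain Rust. It is not chunkr's API: the function name `chunk_with_overlap` is hypothetical, and measuring sizes in characters is an assumption, since this README does not say which unit chunkr uses.

```rust
/// Hypothetical helper (not part of chunkr): split `text` into chunks of at
/// most `chunk_size` characters, where each chunk starts with the last
/// `overlap` characters of the previous chunk.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    // Work on characters rather than bytes so multi-byte UTF-8 text is never
    // cut in the middle of a code point.
    let chars: Vec<char> = text.chars().collect();

    // Each new window starts `step` characters after the previous one.
    let step = chunk_size - overlap;

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = usize::min(start + chunk_size, chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            // The window reached the end of the text, so stop here.
            break;
        }
        start += step;
    }
    chunks
}
```

Under these assumptions, calling `chunk_with_overlap(text, 1000, 20)` would mirror the arguments in the command above: windows of up to 1000 characters, each sharing 20 characters with its predecessor. The overlap keeps some context from being lost at chunk boundaries, which is often useful for retrieval.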

💡 Contributing

As an open-source project, we welcome all kinds of contributions: code, documentation, bug reports, and feature suggestions.

Feel free to check out the Contribution guide for more details.

๐Ÿ“ License

This project is licensed under the MIT License. See the LICENSE.md file for details.