Add split_length by token in preprocessor #4983
Closed
Labels
Contributions wanted! (Looking for external contributions) · P3 (Low priority, leave it in the backlog) · topic:preprocessing
Description
Is your feature request related to a problem? Please describe.
With LLMs like ChatGPT, input size is measured in tokens, not words. To make the best use of a model's context window for embedding and similar tasks, the preprocessor should support splitting by token.
Describe the solution you'd like
Add token as another split choice, so that chunk size is measured in tokens.
Other packages like LangChain and LlamaIndex already have this feature.
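A minimal sketch of what such a splitter could look like. This is illustrative, not Haystack's actual `PreProcessor` API: the `split_by_token` function, its `tokenize` parameter, and the whitespace-split default are assumptions. In practice you would plug in a real model tokenizer (e.g. `tiktoken`) and detokenize properly instead of joining with spaces.

```python
from typing import Callable, List


def split_by_token(
    text: str,
    split_length: int,
    split_overlap: int = 0,
    # Hypothetical hook: swap in a real tokenizer such as tiktoken's
    # encode/decode. str.split is a crude word-level stand-in.
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most split_length tokens,
    with split_overlap tokens shared between consecutive chunks."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    tokens = tokenize(text)
    step = split_length - split_overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + split_length]
        if not window:
            break
        # Rejoining with spaces only works for word-like tokens; a real
        # tokenizer would use its own decode() here.
        chunks.append(" ".join(window))
        if start + split_length >= len(tokens):
            break
    return chunks


print(split_by_token("a b c d e", split_length=2))
# → ['a b', 'c d', 'e']
print(split_by_token("a b c d e", split_length=2, split_overlap=1))
# → ['a b', 'b c', 'c d', 'd e']
```

The overlap behavior mirrors the existing word-based `split_overlap` option, so a token mode would slot into the same parameter surface.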
Metadata
Status: Done