Commit
Merge pull request #295 from larinam/min_tokens_logic_fix
Fix min_tokens logic for grouping documents
dartpain committed Aug 5, 2023
2 parents 1687e66 + bed25b3 commit eac7b1e
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion application/parser/token_func.py
@@ -25,7 +25,7 @@ def group_documents(documents: List[Document], min_tokens: int, max_tokens: int)
             current_group = Document(text=doc.text, doc_id=doc.doc_id, embedding=doc.embedding,
                                      extra_info=doc.extra_info)
         elif len(tiktoken.get_encoding("cl100k_base").encode(
-                current_group.text)) + doc_len < max_tokens and doc_len >= min_tokens:
+                current_group.text)) + doc_len < max_tokens and doc_len < min_tokens:
             current_group.text += " " + doc.text
         else:
             docs.append(current_group)
2 changes: 1 addition & 1 deletion scripts/parser/token_func.py
@@ -24,7 +24,7 @@ def group_documents(documents: List[Document], min_tokens: int, max_tokens: int)
             current_group = Document(text=doc.text, doc_id=doc.doc_id, embedding=doc.embedding,
                                      extra_info=doc.extra_info)
         elif len(tiktoken.get_encoding("cl100k_base").encode(
-                current_group.text)) + doc_len < max_tokens and doc_len >= min_tokens:
+                current_group.text)) + doc_len < max_tokens and doc_len < min_tokens:
             current_group.text += " " + doc.text
         else:
             docs.append(current_group)
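The fix flips the second half of the merge condition from doc_len >= min_tokens to doc_len < min_tokens: a document is now folded into the current group only when it is itself shorter than min_tokens (and the merged group still fits under max_tokens), while a document that already meets the minimum closes the open group and starts a new one. The sketch below shows the grouping loop as it reads after this change; it is a reconstruction rather than the project's exact file: everything outside the visible hunk (the loop setup, the doc_len computation, the trailing append) is assumed, and the Document dataclass is a stand-in for the project's own Document type.

```python
from dataclasses import dataclass
from typing import List, Optional

import tiktoken


@dataclass
class Document:
    # Hypothetical stand-in for the project's Document type.
    text: str
    doc_id: Optional[str] = None
    embedding: Optional[list] = None
    extra_info: Optional[dict] = None


def group_documents(documents: List[Document], min_tokens: int, max_tokens: int) -> List[Document]:
    """Merge short documents into larger groups (reconstructed sketch)."""
    encoder = tiktoken.get_encoding("cl100k_base")
    docs: List[Document] = []
    current_group: Optional[Document] = None

    for doc in documents:
        doc_len = len(encoder.encode(doc.text))

        if current_group is None:
            # The first document starts the first group.
            current_group = Document(text=doc.text, doc_id=doc.doc_id,
                                     embedding=doc.embedding, extra_info=doc.extra_info)
        elif (len(encoder.encode(current_group.text)) + doc_len < max_tokens
              and doc_len < min_tokens):
            # Post-fix condition: only absorb documents that are themselves
            # below min_tokens, and only while the group stays under max_tokens.
            current_group.text += " " + doc.text
        else:
            # The document is large enough on its own (or the group is full),
            # so close the current group and start a new one.
            docs.append(current_group)
            current_group = Document(text=doc.text, doc_id=doc.doc_id,
                                     embedding=doc.embedding, extra_info=doc.extra_info)

    if current_group is not None:
        docs.append(current_group)

    return docs
```

Both changed files receive the same one-line edit, so the copies of token_func.py under application/parser and scripts/parser keep identical grouping behaviour.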

1 comment on commit eac7b1e

@vercel vercel bot commented on eac7b1e Aug 5, 2023

Successfully deployed to the following URLs:

docs-gpt – ./

docs-gpt-git-main-arc53.vercel.app
docs-gpt-brown.vercel.app
docs-gpt-arc53.vercel.app
