
Conversation History Trimming with max_tokens and max_history_length #8

Merged

merged 4 commits into NirantK:main on Aug 2, 2023

Conversation

@adivik2000 (Contributor)

This PR has the PoC for conversation trimming based on the tokens given and default lengths.

Added a dummy conversation via ChatGPT, since I wanted content of differing lengths.

I took a bit of code from the gist you had shared, @NirantK. Hope this is what you were looking for in #7.

It would be helpful if you could guide me on the further steps.

Thanks,
Aditya Thoomati

@NirantK (Owner) commented Aug 1, 2023

Split this into two:

  1. Move the usage example to docs
  2. Add the trimming logic to the Conversation class in conversation.py itself

The usage example should show that the trimming works!
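A minimal sketch of what such a usage example might look like, assuming the API in this PR (Conversation(history, id, max_history_tokens) and get_history(max_tokens) from the diff); the Message fields role/content and the importability of the local agentai package are assumptions:

```python
import tiktoken

# Assumes the local agentai package is importable (e.g. via an editable install);
# Message's role/content fields are an assumption based on typical chat models.
from agentai.conversation import Conversation, Message

enc = tiktoken.get_encoding("gpt2")  # the encoder this PR counts tokens with

conversation = Conversation(
    history=[
        Message(role="user", content="short message"),
        Message(role="assistant", content="a much longer reply " * 20),
        Message(role="user", content="most recent question"),
    ],
    max_history_tokens=50,
)

# Trimming should keep the most recent messages within the 50-token budget.
trimmed = conversation.get_history(max_tokens=50)
assert sum(len(enc.encode(m.content)) for m in trimmed) <= 50
print(f"kept {len(trimmed)} of 3 messages")
```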


@adivik2000 (Contributor, Author)

> Split this into two:
>
>   1. Move the usage example to docs
>   2. Add the trimming logic to the Conversation class in conversation.py itself
>
> The usage example should show that the trimming works!

Done. Made the changes requested.

@NirantK (Owner) left a review


Primarily API design changes; the core connection/integration logic looks good.

agentai/conversation.py (outdated review thread, resolved)
Comment on lines 73 to 76
def get_history(self, max_tokens=100) -> List[Message]:
"""Function to get the conversation history based on the number of tokens"""
self.trim_history_by_tokens(max_tokens=max_tokens)
return self.history
@NirantK (Owner):

This feels like a utility wrapper. I would prefer to remove it and rename trim_history_by_tokens to get_history instead?
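For illustration, a sketch of what the merged get_history might look like, assuming a module-level gpt2 encoder as in this PR and a trimming policy that keeps the newest messages first (the policy and the minimal Message/Conversation stubs are assumptions, not code from the diff):

```python
from typing import List, Optional

import tiktoken
from pydantic import BaseModel

enc = tiktoken.get_encoding("gpt2")  # the encoder this PR counts tokens with


class Message(BaseModel):  # minimal stand-in for the PR's Message model
    role: str
    content: str


class Conversation:
    def __init__(self, history: Optional[List[Message]] = None):
        self.history: List[Message] = list(history) if history else []

    def get_history(self, max_tokens: Optional[int] = None) -> List[Message]:
        """Return the history, trimmed so its total token count fits max_tokens."""
        if max_tokens is None:
            return self.history
        kept: List[Message] = []
        total = 0
        # Walk backwards from the newest message; stop once the budget is spent.
        for message in reversed(self.history):
            cost = len(enc.encode(message.content))
            if total + cost > max_tokens:
                break
            kept.append(message)
            total += cost
        return list(reversed(kept))
```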

@adivik2000 (Contributor, Author):

This is done.

@@ -0,0 +1,75 @@
import os
@NirantK (Owner):

Looks identical to the notebook? Let's keep only one copy of the logic?

@adivik2000 (Contributor, Author):

The notebook didn't work: it was picking up the installed agentai library version instead of the local folder. So I had to use the Python file to test things out.

@NirantK (Owner):

pip install -e . (an editable install) might be worth looking into for local development; it makes Python import the local folder instead of the installed copy.

agentai/conversation.py (outdated review thread, resolved)
Comment on lines +45 to +50
local = threading.local()
try:
enc = local.gpt2enc
except AttributeError:
enc = tiktoken.get_encoding("gpt2")
local.gpt2enc = enc
@NirantK (Owner):

Check and confirm whether the GPT-4 tokeniser is the same as gpt2? From what I recall, this is wrong: the tokeniser depends on the LLM.
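For reference, GPT-4 uses the cl100k_base encoding, not gpt2, and tiktoken.encoding_for_model resolves the right tokeniser from a model name. A sketch of a model-aware, per-thread encoder cache (get_encoder is a hypothetical helper name, not part of this PR):

```python
import threading

import tiktoken

_local = threading.local()


def get_encoder(model: str = "gpt-4") -> tiktoken.Encoding:
    """Return a per-thread tiktoken encoder matched to the given model."""
    try:
        return _local.enc
    except AttributeError:
        # encoding_for_model maps model names to tokenisers: "gpt-4" and
        # "gpt-3.5-turbo" resolve to cl100k_base, not the older gpt2 encoding.
        # Note: this caches whichever model is requested first on each thread.
        _local.enc = tiktoken.encoding_for_model(model)
        return _local.enc
```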

@NirantK merged commit 7ed24a4 into NirantK:main on Aug 2, 2023
@NirantK (Owner) commented Aug 2, 2023

This closes #7

@NirantK mentioned this pull request on Aug 2, 2023
@@ -11,7 +13,7 @@ class Message(BaseModel):


 class Conversation:
-    def __init__(self, history: List[Message] = [], id: Optional[str] = None):
+    def __init__(self, history: List[Message] = [], id: Optional[str] = None, max_history_tokens: int = 200):
@NirantK (Owner):

We should default to infinite max_history_tokens to be backward compatible.
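A minimal sketch of a backward-compatible signature, assuming None is used to mean "no limit" (the Optional default and the handling of the mutable list default are illustrative choices, not code from this PR):

```python
from typing import List, Optional

from pydantic import BaseModel


class Message(BaseModel):  # minimal stand-in for the PR's Message model
    role: str
    content: str


class Conversation:
    def __init__(
        self,
        history: Optional[List[Message]] = None,
        id: Optional[str] = None,
        max_history_tokens: Optional[int] = None,  # None = unlimited, the old behaviour
    ):
        # Copy into a fresh list so a caller's list is never shared or mutated.
        self.history: List[Message] = list(history) if history else []
        self.id = id
        self.max_history_tokens = max_history_tokens
```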
