
feat: Add Anthropic invocation layer #4818

Merged 26 commits from AnthropicClaude into main on May 11, 2023
Conversation

silvanocerza
Contributor

Proposed Changes:

Add new invocation layer to support Anthropic Claude.

How did you test it?

I wrote tests and ran them locally.

Notes for the reviewer

Supersedes #4512.

@silvanocerza silvanocerza self-assigned this May 5, 2023
@silvanocerza silvanocerza requested a review from a team as a code owner May 5, 2023 12:37
@silvanocerza silvanocerza requested review from julian-risch and removed request for a team May 5, 2023 12:37
@github-actions github-actions bot added the topic:tests and type:documentation (Improvements on the docs) labels May 5, 2023
@silvanocerza silvanocerza requested a review from vblagoje May 5, 2023 12:37
@coveralls
Collaborator

coveralls commented May 5, 2023

Coverage Status

Coverage: 37.524% (+1.0%) from 36.543% when pulling fd67dce on AnthropicClaude into 705a2c0 on main.

@julian-risch julian-risch removed their request for review May 8, 2023 07:36
# if a List[str] is used we must also set is_pretokenized to True.
# We split at spaces because if we pass the string directly the encoded prompts
# contain strange characters in place of spaces.
encoded_prompt: Encoding = self.tokenizer.encode(prompt.split(" "), is_pretokenized=True)
Member

The code comment and the code seem to be out of sync. Do we actually deal with a List[str] here? Most likely not, right?

Contributor Author
@silvanocerza silvanocerza May 10, 2023

No, the comment is correct.

At line 146 we fail when prompt is a List, so here it can only be a str.

The first two lines of the comment clarify why we're setting is_pretokenized, and the last clarifies why we split prompt.
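The pattern being discussed can be sketched with the Hugging Face tokenizers library. The toy WordLevel vocabulary below is a stand-in for illustration only, not the actual Claude tokenizer:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel

# Tiny illustrative vocabulary standing in for the real tokenizer file.
vocab = {"[UNK]": 0, "Hello": 1, "Claude": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))

prompt = "Hello Claude"
# Mirroring the reviewed code: split the prompt on spaces and pass the
# word list with is_pretokenized=True instead of encoding the raw string.
encoded = tokenizer.encode(prompt.split(" "), is_pretokenized=True)
print(encoded.ids)  # -> [1, 2]
```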


kwargs_with_defaults = self.model_input_kwargs

if "stop_sequence" in kwargs:
Member

We have stop_words as the parameter at the PromptNode init level, which we then pass to the invocation layer, and the invocation layer translates stop_words to its provider-specific parameter name. As it stands, though I'm not 100% sure, it looks like stop_words from the PromptNode level will not get translated to the stop_sequence parameter. Please double-check.

Member

Update: it does get translated, but please double-check why text generation fails in the default case, when no stop_words are passed from the PromptNode level.

Contributor Author

The test_invoke_non_streamed integration test passes with no issues.
That one invokes the layer with just the prompt; that should be enough, shouldn't it?

Member

It should, but in this case it's a problem because stop_words from PromptNode gets passed as None, since we allow None. And Anthropic, unlike all other providers, doesn't accept None and raises an exception.

Contributor Author

Ah, makes sense. I'll handle that case.
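A minimal sketch of the fix under discussion, using hypothetical helper and constant names (this is not the actual Haystack code): when PromptNode passes stop_words=None explicitly, normalize it to a default instead of forwarding None to the Anthropic API.

```python
from typing import List, Optional

# Illustrative default stop sequence; Anthropic's human turn marker.
ANTHROPIC_HUMAN_PROMPT = "\n\nHuman:"

def resolve_stop_sequences(stop_words: Optional[List[str]]) -> List[str]:
    # PromptNode allows stop_words=None, but the Anthropic API rejects
    # None, so fall back to the default before building the payload.
    if stop_words is None:
        return [ANTHROPIC_HUMAN_PROMPT]
    return list(stop_words)

print(resolve_stop_sequences(None))     # -> ['\n\nHuman:']
print(resolve_stop_sequences(["END"]))  # -> ['END']
```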

stream = (
kwargs_with_defaults.get("stream", False) or kwargs_with_defaults.get("stream_handler", None) is not None
)
stop_words = kwargs_with_defaults.get("stop_words", [human_prompt])
Member

I believe we should ensure that user-specified stop_words are added to the default Anthropic uses, i.e. we should have a list that contains both the user-specified stop_words and the default ["\n\nHuman:"].

Contributor Author

I think Anthropic already adds it server side, but I handled it anyway.
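One way to sketch the merge the reviewer asks for (names are illustrative, not the merged implementation): keep whatever the user supplies and make sure the default "\n\nHuman:" marker is present exactly once.

```python
# Anthropic's default stop sequence (the human turn marker).
HUMAN_PROMPT = "\n\nHuman:"

def merge_stop_words(user_stop_words):
    # Preserve the user's stop words and append the default marker
    # only if it isn't already in the list.
    merged = list(user_stop_words or [])
    if HUMAN_PROMPT not in merged:
        merged.append(HUMAN_PROMPT)
    return merged

print(merge_stop_words(["END"]))  # -> ['END', '\n\nHuman:']
print(merge_stop_words(None))     # -> ['\n\nHuman:']
```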

Member
@vblagoje vblagoje left a comment

Left some minor comments regarding stop_words

@vblagoje vblagoje self-requested a review May 10, 2023 13:44
Member
@vblagoje vblagoje left a comment

LGTM

@silvanocerza silvanocerza merged commit 98947e4 into main May 11, 2023
@silvanocerza silvanocerza deleted the AnthropicClaude branch May 11, 2023 08:14