
Support for Google's Vertex AI #265

Merged
merged 5 commits into from
Apr 10, 2024

Conversation

flexchar
Contributor

@flexchar flexchar commented Mar 20, 2024

Title:
Support for Vertex AI.

Vertex AI doesn't impose geolocation (IP-based) restrictions like Google AI Studio does, which allows it to be used from within Europe.

Example config to be used with this provider:

{
    "provider": "vertex-ai",
    "vertex_region": "europe-west3",
    "vertex_project_id": "your-project-id",
    "api_key": "ya29...."
}

Note

An API key for Google Cloud is typically short-lived (60 minutes) and can be retrieved in a variety of ways, including the client SDKs or the CLI: gcloud auth application-default print-access-token.
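To illustrate how such a config could be used, here is a hypothetical client-side sketch that passes it via the x-portkey-config header. The local URL, route, and model name are assumptions (based on the default wrangler dev port seen later in this thread), not confirmed specifics from this PR:

```typescript
// Hypothetical sketch of calling the gateway with a Vertex AI config through
// the x-portkey-config header. URL, route, and model name are assumptions.
const vertexConfig = {
  provider: 'vertex-ai',
  vertex_region: 'europe-west3',
  vertex_project_id: 'your-project-id',
  api_key: 'ya29....', // short-lived token, e.g. from gcloud
};

const buildGatewayRequest = (prompt: string) => ({
  url: 'http://localhost:8787/v1/chat/completions',
  method: 'POST' as const,
  headers: {
    'Content-Type': 'application/json',
    'x-portkey-config': JSON.stringify(vertexConfig),
  },
  body: JSON.stringify({
    model: 'gemini-pro',
    messages: [{ role: 'user', content: prompt }],
  }),
});

// const res = await fetch(req.url, req); // actual network call omitted here
```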

Motivation: (optional)

  • Overcome geo-restrictions.

Related Issues: (optional)

@flexchar flexchar changed the title #10 support vertex ai Support for Google's Vertex AI Mar 20, 2024
@VisargD
Collaborator

VisargD commented Mar 29, 2024

Hey @flexchar - Thanks for this PR! Vertex was a frequently requested integration in the community.

We released some changes today related to code formatting, as well as changes to the provider API config structure. Can you please update your branch with the latest changes? Apart from the formatting changes, you will also have to update google-vertex/api.ts. All the provider configs now follow a standardized structure with 3 main functions: getBaseURL, headers and getEndpoint. When these functions are called, they are passed a fixed set of arguments which can be used inside them. You can check some of the other providers in the main branch to get an idea. Please let me know if you need any help.
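A sketch of that standardized shape could look like the following. The interface and argument names here are illustrative assumptions drawn from this description, not the repo's exact types:

```typescript
// Illustrative sketch of a standardized provider API config with the three
// functions described above. Argument shapes are assumptions, not the
// repo's actual signatures.
interface ProviderAPIConfig {
  getBaseURL: (args: { vertexRegion: string; vertexProjectId: string }) => string;
  headers: (args: { apiKey: string }) => Record<string, string>;
  getEndpoint: (args: { fn: string; model: string }) => string;
}

const VertexApiConfig: ProviderAPIConfig = {
  // Vertex endpoints are regional and project-scoped
  getBaseURL: ({ vertexRegion, vertexProjectId }) =>
    `https://${vertexRegion}-aiplatform.googleapis.com/v1/projects/${vertexProjectId}/locations/${vertexRegion}/publishers/google`,
  // Vertex expects a Bearer token rather than a query-string API key
  headers: ({ apiKey }) => ({
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  }),
  getEndpoint: ({ fn, model }) => {
    switch (fn) {
      case 'chatComplete':
        return `/models/${model}:generateContent`;
      case 'stream-chatComplete':
        return `/models/${model}:streamGenerateContent`;
      default:
        return '';
    }
  },
};
```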

@flexchar
Contributor Author

flexchar commented Mar 29, 2024

It was so messy that I started fresh and adapted the implementation. I've also introduced support for safety_settings, which can be passed like so:

"safety_settings": [
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_ONLY_HIGH"
        }
    ]

The Embedding API is not set up since I don't have a payload/example to test with on the spot. It can totally be added down the road.

Collaborator


You can replace this with the new functions which were released yesterday:

return generateInvalidProviderResponseError(response, VERTEX);

Please check some other provider integrations to get an idea about this.

return `/models/${model}:generateContent`;
}
case 'stream-chatComplete': {
return `/models/${model}:streamGenerateContent`;
Collaborator


Vertex allows sending ?alt=sse in the URL to get SSE instead of a JSON stream. I think that will make things easier for us. Should we make that change?
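A minimal sketch of that suggestion, with an illustrative helper name (not the repo's actual function):

```typescript
// Illustrative sketch: appending ?alt=sse to the streaming endpoint asks
// Vertex for text/event-stream instead of a raw JSON array stream.
// The helper name is an assumption for this example.
const getStreamEndpoint = (model: string): string =>
  `/models/${model}:streamGenerateContent?alt=sse`;
```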

@flexchar
Contributor Author

flexchar commented Apr 1, 2024

Hey V, thanks for the review. Great catches.

I'm having trouble getting streaming to work at all.

This is the output I get:


> @portkey-ai/gateway@1.1.0 dev
> wrangler dev src/index.ts

 ⛅️ wrangler 3.1.0 (update available 3.41.0)
------------------------------------------------------
wrangler dev now uses local mode by default, powered by 🔥 Miniflare and 👷 workerd.
To run an edge preview session for your Worker, use wrangler dev --remote
▲ [WARNING] Enabling Node.js compatibility mode for built-ins and globals. This is experimental and has serious tradeoffs. Please see https://github.com/ionic-team/rollup-plugin-node-polyfills/ for more details.


Your worker has access to the following bindings:
- Vars:
  - ENVIRONMENT: "dev"
  - CUSTOM_HEADERS_TO_IGNORE: []
⎔ Starting local server...
[mf:inf] Ready on http://127.0.0.1:8787/
Returned in Retry Attempt 1. Status: true 200
╭─────────────────────────────────────────────────────────────────────────────────────────╮
│ [b] open a browser, [d] open Devtools, [l] turn off local mode, [c] clear console, [x] to exit │
│                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
/Users/luke/dev/portkey-gateway/node_modules/wrangler/wrangler-dist/cli.js:30632
            throw a;
            ^

TypeError [ERR_INVALID_ARG_TYPE]: The "strategy" argument must be of type object. Received type number (0)
    at new ReadableStream (node:internal/webstreams/readablestream:254:5)
    at safeReadableStreamFrom (/Users/luke/dev/portkey-gateway/node_modules/miniflare/dist/src/index.js:8759:10)
    at #handleLoopback (/Users/luke/dev/portkey-gateway/node_modules/miniflare/dist/src/index.js:8959:36)
    at Server.emit (node:events:531:35)
    at parserOnIncoming (node:_http_server:1137:12)
    at HTTPParser.parserOnHeadersComplete (node:_http_common:119:17) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Node.js v21.7.1

It's extremely uninformative. Have you seen this before?

I'm calling with the x-portkey-config header only. I never set any strategy before.

@VisargD
Collaborator

VisargD commented Apr 1, 2024

This might be happening because, by default, Vertex sends a JSON stream. So you will have to add the provider here to convert the final response header to event-stream:

// Convert GEMINI/COHERE json stream to text/event-stream for non-proxy calls
if (
  [GOOGLE, COHERE, BEDROCK].includes(proxyProvider) &&
  responseTransformer
) {
  return new Response(readable, {
    ...response,
    headers: new Headers({
      ...Object.fromEntries(response.headers),
      'content-type': 'text/event-stream',
    }),
  });
}

@flexchar
Contributor Author

flexchar commented Apr 1, 2024

It doesn't seem to change. I tried with the SSE endpoint too. I recall it was working before the great refactor, but either Google changed something, or I messed something up, or who knows.

I'm at a roadblock. I will try again later. If you have time to try it yourself, I'd appreciate it. I pushed the other updates.


EDIT: It gets a bit more hopeful if I use dev:node instead of Wrangler (it tends to mask some errors in my experience), but it's still mysterious.

I cannot explain where this undefined comes from.

npm run dev:node

> @portkey-ai/gateway@1.1.0 dev:node
> tsx src/start-server.ts

Your AI Gateway is now running on http://localhost:8787 🚀
Returned in Retry Attempt 1. Status: true 200
undefined:3
data: {"candidates": [{"content": {"role": "model","parts": [{"text": "."}]},"finishReason": "STOP","safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.08787644,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.124425635},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.07821887,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.050988145},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.17036992,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.08787644},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.034358688,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.06681233}]}],"usageMetadata": {"promptTokenCount": 5,"candidatesTokenCount": 5,"totalTokenCount": 10}}
^

SyntaxError: Unexpected non-whitespace character after JSON at position 90 (line 3 column 1)
    at JSON.parse (<anonymous>)
    at GoogleChatCompleteStreamChunkTransform (file:///Users/luke/dev/portkey-gateway/src/providers/google-vertex-ai/chatComplete.ts:3:24)
    at readStream (file:///Users/luke/dev/portkey-gateway/src/handlers/streamHandler.ts:1:2649)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///Users/luke/dev/portkey-gateway/src/handlers/streamHandler.ts:1:5328

Node.js v21.7.1

@VisargD
Collaborator

VisargD commented Apr 1, 2024

If you are using SSE then you will have to handle the data: prefix for each chunk, like we do for other providers such as mistral-ai, perplexity-ai, etc.
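A minimal sketch of that prefix handling, assuming the chunk-transform pattern described in this thread (the function name is illustrative, not the repo's own):

```typescript
// Illustrative sketch of stripping the SSE "data: " prefix before JSON.parse,
// as done for other providers. Name and signature are assumptions.
const parseSSEChunk = (chunk: string): unknown | null => {
  chunk = chunk.trim();
  // Skip empty keep-alives and the terminal [DONE] sentinel
  if (chunk === '' || chunk === 'data: [DONE]' || chunk === '[DONE]') return null;
  if (chunk.startsWith('data: ')) chunk = chunk.slice('data: '.length);
  return JSON.parse(chunk);
};
```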

@VisargD
Collaborator

VisargD commented Apr 3, 2024

Hey @flexchar - Just checking up on this. Are there any blockers that you are facing for this PR?

@flexchar
Contributor Author

flexchar commented Apr 3, 2024

Hey @flexchar - Just checking up on this. Are there any blockers that you are facing for this PR?

Unfortunately I haven't had time to look since our last messages. It's the streaming that I need to figure out.

I plan to take a stab at it again this weekend.

@flexchar
Contributor Author

flexchar commented Apr 7, 2024

I'm back at it. I updated the packages and began using Bun, which has superior error logging. It turns out that the chunk from Vertex AI is actually a string of multiple chunks...

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "This is a test"
          }
        ]
      }
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.08787644,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.124425635
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.07821887,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.050988145
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.17036992,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08787644
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.034358688,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06681233
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 5,
    "totalTokenCount": 10
  }
}

This is why I would get a JSON.parse error, which was previously hidden by the way Wrangler/Node handles rejected promises (worth looking into one day), in:

Expected a Response object
330 |   chunk = chunk.trim();
331 |   if (chunk === '[DONE]') {
332 |     return `data: ${chunk}\n\n`;
333 |   }
334 |
335 |   let parsedChunk: GoogleGenerateContentResponse = JSON.parse(chunk);
                                                         ^
SyntaxError: JSON Parse error: Unable to parse JSON string
      at GoogleChatCompleteStreamChunkTransform (src/providers/google-vertex-ai/chatComplete.ts:335:52)

So now I will have to find out who is responsible for passing the responseChunk and why it is not properly chunked.

@flexchar
Contributor Author

flexchar commented Apr 7, 2024

Got it! I tested the stream support using the official OpenAI library in TypeScript & Python.

It's wild to think that I wouldn't have solved it without Bun's help; I have been a great fan of it since last autumn. It helped me see the exact reply I got from GCP and how streamHandler.ts was working, thanks to hot reloading and native TypeScript support with bun run --watch src/start-server.ts.

That being said, Bun closes the request too early and never returned the response, leading me to debug an issue that was never there in the first place. Switching back to node/wrangler got me to the working stage. So Bun isn't ready to replace node just yet. I would like to propose that in the future, because we could add tests using Bun as the test runner.

There was one more catch. I updated all packages to the latest versions to be sure I wasn't dealing with a stale issue, and that turned out to be wise: wrangler, which supports hot reloading too, was having its own issue with the TypeError [ERR_INVALID_ARG_TYPE]: The "strategy" argument must be of type object. Received type number (0) seen before. I don't know why, and I won't debug it, since updating to the latest version fixes the issue.

Visarg, you were right regarding ?alt=sse. It was helpful; however, the pattern between the chunks was different from the Gemini API and standard OpenAI. It matches Anthropic's style, and I added a line to the getStreamModeSplitPattern method.

I also updated the fallbackChunkId to include the name of the provider in streamHandler.ts. Let me know if you'd like to remove that.

TL;DR:

  • ?alt=sse was a good idea;
  • Vertex AI uses a \r\n\r\n split between SSE chunks;
  • The current Wrangler version caused an error that is fixed in the latest version;
  • Wrangler/Node was hiding info; Bun has superior logging for debugging;
  • Bun seems to drop the connection mid-stream, so it's not ready to be used for the project;
  • Updated fallbackChunkId to include the provider name.
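The \r\n\r\n split mentioned above can be sketched as a small helper (the name is illustrative; the repo's actual logic lives in getStreamModeSplitPattern):

```typescript
// Illustrative sketch: Vertex AI delimits SSE events with \r\n\r\n rather
// than the \n\n most providers use, so buffered stream text is split on
// that pattern. Helper name is an assumption for this example.
const splitVertexSSE = (buffer: string): string[] =>
  buffer.split('\r\n\r\n').filter((chunk) => chunk.trim().length > 0);
```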

@VisargD
Collaborator

VisargD commented Apr 10, 2024

Hey @flexchar - Awesome! I have reviewed the PR and it LGTM. I have added one minor comment. Once you address it, I will merge the PR. And please also fetch the latest changes from the main branch and resolve the merge conflicts. Thanks!

Comment: #265 (comment)

@flexchar
Contributor Author

Done 👍

@VisargD VisargD merged commit a50f600 into Portkey-AI:main Apr 10, 2024
1 check passed
@VisargD VisargD linked an issue Apr 10, 2024 that may be closed by this pull request
@flexchar flexchar deleted the #10-support-vertex-ai branch April 10, 2024 13:09
Successfully merging this pull request may close these issues.

[Provider] Add support for Google Vertex AI
3 participants