
Support for Google's Vertex AI #265

Merged
merged 5 commits into from
Apr 10, 2024

Conversation

flexchar
Contributor

@flexchar flexchar commented Mar 20, 2024

Title:
Support for Vertex AI.

Vertex AI doesn't impose geolocation (IP-based) restrictions like Google AI Studio does, which allows it to be used from within Europe.

Example config to be used with this provider:

{
    "provider": "vertex-ai",
    "vertex_region": "europe-west3",
    "vertex_project_id": "your-project-id",
    "api_key": "ya29...."
}

Note

An API key for Google Cloud is typically short-lived (60 minutes) and can be retrieved in a variety of ways, including the client SDKs or the CLI: gcloud auth application-default print-access-token.
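To illustrate how such a config could be used, here is a hypothetical client-side sketch that passes it via the x-portkey-config header. The local URL, route, and model name are assumptions (based on the default wrangler dev port seen later in this thread), not confirmed specifics from this PR:

```typescript
// Hypothetical sketch of calling the gateway with a Vertex AI config through
// the x-portkey-config header. URL, route, and model name are assumptions.
const vertexConfig = {
  provider: 'vertex-ai',
  vertex_region: 'europe-west3',
  vertex_project_id: 'your-project-id',
  api_key: 'ya29....', // short-lived token, e.g. from gcloud
};

const buildGatewayRequest = (prompt: string) => ({
  url: 'http://localhost:8787/v1/chat/completions',
  method: 'POST' as const,
  headers: {
    'Content-Type': 'application/json',
    'x-portkey-config': JSON.stringify(vertexConfig),
  },
  body: JSON.stringify({
    model: 'gemini-pro',
    messages: [{ role: 'user', content: prompt }],
  }),
});

// const res = await fetch(req.url, req); // actual network call omitted here
```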

Motivation: (optional)

  • Overcome geo-restrictions.

Related Issues: (optional)

@flexchar flexchar changed the title #10 support vertex ai Support for Google's Vertex AI Mar 20, 2024
@VisargD
Collaborator

VisargD commented Mar 29, 2024

Hey @flexchar - Thanks for this PR! Vertex was a frequently requested integration in the community.

We released some changes today related to code formatting, as well as changes to the provider API config structure. Can you please update your branch with the latest changes? Apart from the formatting changes, you will also have to update google-vertex/api.ts. All the provider configs now follow a standardized structure with 3 main functions: getBaseURL, headers and getEndpoint. When these functions are called, they are passed a fixed set of arguments which can be used inside them. You can check some of the other providers in the main branch to get an idea. Please let me know if you need any help.
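A sketch of that standardized shape could look like the following. The interface and argument names here are illustrative assumptions drawn from this description, not the repo's exact types:

```typescript
// Illustrative sketch of a standardized provider API config with the three
// functions described above. Argument shapes are assumptions, not the
// repo's actual signatures.
interface ProviderAPIConfig {
  getBaseURL: (args: { vertexRegion: string; vertexProjectId: string }) => string;
  headers: (args: { apiKey: string }) => Record<string, string>;
  getEndpoint: (args: { fn: string; model: string }) => string;
}

const VertexApiConfig: ProviderAPIConfig = {
  // Vertex endpoints are regional and project-scoped
  getBaseURL: ({ vertexRegion, vertexProjectId }) =>
    `https://${vertexRegion}-aiplatform.googleapis.com/v1/projects/${vertexProjectId}/locations/${vertexRegion}/publishers/google`,
  // Vertex expects a Bearer token rather than a query-string API key
  headers: ({ apiKey }) => ({
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  }),
  getEndpoint: ({ fn, model }) => {
    switch (fn) {
      case 'chatComplete':
        return `/models/${model}:generateContent`;
      case 'stream-chatComplete':
        return `/models/${model}:streamGenerateContent`;
      default:
        return '';
    }
  },
};
```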

@flexchar
Contributor Author

flexchar commented Mar 29, 2024

It was so messy that I started fresh and adapted the implementation. I've also introduced support for safety_settings, which can be passed like so:

"safety_settings": [
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_ONLY_HIGH"
        }
    ]

The Embedding API is not set up since I don't have a payload/example to test with on the spot. It can totally be added down the road.

Collaborator


You can replace this with the new functions which were released yesterday:

return generateInvalidProviderResponseError(response, VERTEX);

Please check some other provider integrations to get an idea about this.

return `/models/${model}:generateContent`;
}
case 'stream-chatComplete': {
return `/models/${model}:streamGenerateContent`;
Collaborator


Vertex allows sending ?alt=sse in the URL to get SSE instead of a JSON stream. I think that will make things easier for us. Should we make that change?
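A minimal sketch of that suggestion, with an illustrative helper name (not the repo's actual function):

```typescript
// Illustrative sketch: appending ?alt=sse to the streaming endpoint asks
// Vertex for text/event-stream instead of a raw JSON array stream.
// The helper name is an assumption for this example.
const getStreamEndpoint = (model: string): string =>
  `/models/${model}:streamGenerateContent?alt=sse`;
```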

@flexchar
Contributor Author

flexchar commented Apr 1, 2024

Hey V, thanks for the review. Great catches.

I'm having trouble getting streaming to work at all.

This is the output I get:


> @portkey-ai/gateway@1.1.0 dev
> wrangler dev src/index.ts

 ⛅️ wrangler 3.1.0 (update available 3.41.0)
------------------------------------------------------
wrangler dev now uses local mode by default, powered by 🔥 Miniflare and 👷 workerd.
To run an edge preview session for your Worker, use wrangler dev --remote
▲ [WARNING] Enabling Node.js compatibility mode for built-ins and globals. This is experimental and has serious tradeoffs. Please see https://github.com/ionic-team/rollup-plugin-node-polyfills/ for more details.


Your worker has access to the following bindings:
- Vars:
  - ENVIRONMENT: "dev"
  - CUSTOM_HEADERS_TO_IGNORE: []
⎔ Starting local server...
[mf:inf] Ready on http://127.0.0.1:8787/
Returned in Retry Attempt 1. Status: true 200
╭─────────────────────────────────────────────────────────────────────────────────────────╮
│ [b] open a browser, [d] open Devtools, [l] turn off local mode, [c] clear console, [x] to exit │
│                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
/Users/luke/dev/portkey-gateway/node_modules/wrangler/wrangler-dist/cli.js:30632
            throw a;
            ^

TypeError [ERR_INVALID_ARG_TYPE]: The "strategy" argument must be of type object. Received type number (0)
    at new ReadableStream (node:internal/webstreams/readablestream:254:5)
    at safeReadableStreamFrom (/Users/luke/dev/portkey-gateway/node_modules/miniflare/dist/src/index.js:8759:10)
    at #handleLoopback (/Users/luke/dev/portkey-gateway/node_modules/miniflare/dist/src/index.js:8959:36)
    at Server.emit (node:events:531:35)
    at parserOnIncoming (node:_http_server:1137:12)
    at HTTPParser.parserOnHeadersComplete (node:_http_common:119:17) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Node.js v21.7.1

It's extremely uninformative. Have you seen this before?

I'm calling with the x-portkey-config header only. I never set any strategy before.

@VisargD
Collaborator

VisargD commented Apr 1, 2024

This might be happening because, by default, Vertex sends a JSON stream. So you will have to add the provider here to convert the final response header to event-stream:

// Convert GEMINI/COHERE json stream to text/event-stream for non-proxy calls
if (
  [GOOGLE, COHERE, BEDROCK].includes(proxyProvider) &&
  responseTransformer
) {
  return new Response(readable, {
    ...response,
    headers: new Headers({
      ...Object.fromEntries(response.headers),
      'content-type': 'text/event-stream',
    }),
  });
}

@flexchar
Contributor Author

flexchar commented Apr 1, 2024

It doesn't seem to change. I tried with the SSE endpoint too. I recall it was working before the great refactor, but either Google changed something, or I messed something up, or who knows.

I'm at a roadblock. I will try again later. If you have time to try it yourself, I'd appreciate it. I pushed the other updates.


EDIT: It gets a bit more hopeful if I use dev:node instead of Wrangler (it tends to mask some errors in my experience), but it's still mysterious.

I cannot explain where this undefined comes from.

npm run dev:node

> @portkey-ai/gateway@1.1.0 dev:node
> tsx src/start-server.ts

Your AI Gateway is now running on http://localhost:8787 🚀
Returned in Retry Attempt 1. Status: true 200
undefined:3
data: {"candidates": [{"content": {"role": "model","parts": [{"text": "."}]},"finishReason": "STOP","safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.08787644,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.124425635},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.07821887,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.050988145},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.17036992,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.08787644},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.034358688,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.06681233}]}],"usageMetadata": {"promptTokenCount": 5,"candidatesTokenCount": 5,"totalTokenCount": 10}}
^

SyntaxError: Unexpected non-whitespace character after JSON at position 90 (line 3 column 1)
    at JSON.parse (<anonymous>)
    at GoogleChatCompleteStreamChunkTransform (file:///Users/luke/dev/portkey-gateway/src/providers/google-vertex-ai/chatComplete.ts:3:24)
    at readStream (file:///Users/luke/dev/portkey-gateway/src/handlers/streamHandler.ts:1:2649)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///Users/luke/dev/portkey-gateway/src/handlers/streamHandler.ts:1:5328

Node.js v21.7.1

@VisargD
Collaborator

VisargD commented Apr 1, 2024

If you are using SSE then you will have to handle the data: prefix for each chunk, like we do for other providers such as mistral-ai, perplexity-ai, etc.
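A minimal sketch of that prefix handling, assuming the chunk-transform pattern described in this thread (the function name is illustrative, not the repo's own):

```typescript
// Illustrative sketch of stripping the SSE "data: " prefix before JSON.parse,
// as done for other providers. Name and signature are assumptions.
const parseSSEChunk = (chunk: string): unknown | null => {
  chunk = chunk.trim();
  // Skip empty keep-alives and the terminal [DONE] sentinel
  if (chunk === '' || chunk === 'data: [DONE]' || chunk === '[DONE]') return null;
  if (chunk.startsWith('data: ')) chunk = chunk.slice('data: '.length);
  return JSON.parse(chunk);
};
```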

@VisargD
Collaborator

VisargD commented Apr 3, 2024

Hey @flexchar - Just checking up on this. Are there any blockers that you are facing for this PR?

@flexchar
Contributor Author

flexchar commented Apr 3, 2024

Hey @flexchar - Just checking up on this. Are there any blockers that you are facing for this PR?

Unfortunately I haven't had time to look since our last messages. It's the streaming that I need to figure out.

I plan to take a stab at it again this weekend.

@flexchar
Contributor Author

flexchar commented Apr 7, 2024

I'm back at it. I updated the packages and began using Bun, which has superior error logging. It turns out that the chunk from Vertex AI is actually a string of multiple chunks...

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "This is a test"
          }
        ]
      }
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.08787644,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.124425635
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.07821887,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.050988145
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.17036992,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08787644
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.034358688,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06681233
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 5,
    "totalTokenCount": 10
  }
}

This is why I would get a JSON.parse error, which was previously hidden by the way Wrangler/Node handles rejected promises (worth looking into one day), in:

Expected a Response object
330 |   chunk = chunk.trim();
331 |   if (chunk === '[DONE]') {
332 |     return `data: ${chunk}\n\n`;
333 |   }
334 |
335 |   let parsedChunk: GoogleGenerateContentResponse = JSON.parse(chunk);
                                                         ^
SyntaxError: JSON Parse error: Unable to parse JSON string
      at GoogleChatCompleteStreamChunkTransform (src/providers/google-vertex-ai/chatComplete.ts:335:52)

So now I will have to find out who is responsible for passing the responseChunk and why it is not properly chunked.

@flexchar
Contributor Author

flexchar commented Apr 7, 2024

Got it! I tested the stream support using the official OpenAI library in TypeScript & Python.

It's wild to think that I wouldn't have solved it without Bun's help; I have been a great fan of it since last autumn. It helped me see the exact reply I got from GCP and how streamHandler.ts was working, thanks to hot reloading and native TypeScript support with bun run --watch src/start-server.ts.

That being said, Bun closes the request too early and never returned the response, leading me to debug an issue that was never there in the first place. Switching back to node/wrangler got me to the working stage. So Bun isn't ready to replace node just yet. I would like to propose that in the future, because we could add tests using Bun as the test runner.

There was one more catch. I updated all packages to the latest versions to be sure I wasn't dealing with a stale issue, and that turned out to be wise: wrangler, which supports hot reloading too, was having its own issue with the TypeError [ERR_INVALID_ARG_TYPE]: The "strategy" argument must be of type object. Received type number (0) seen before. I don't know why, and I won't debug it, since updating to the latest version fixes the issue.

Visarg, you were right regarding ?alt=sse. It was helpful; however, the pattern between the chunks was different from the Gemini API and standard OpenAI. It matches Anthropic's style, and I added a line to the getStreamModeSplitPattern method.

I also updated the fallbackChunkId to include the name of the provider in streamHandler.ts. Let me know if you'd like to remove that.

TL;DR:

  • ?alt=sse was a good idea;
  • Vertex AI uses a \r\n\r\n split between SSE chunks;
  • The current Wrangler version caused an error that is fixed in the latest version;
  • Wrangler/Node was hiding info; Bun has superior logging for debugging;
  • Bun seems to drop the connection mid-stream, so it's not ready to be used for the project;
  • Updated fallbackChunkId to include the provider name.
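The \r\n\r\n split mentioned above can be sketched as a small helper (the name is illustrative; the repo's actual logic lives in getStreamModeSplitPattern):

```typescript
// Illustrative sketch: Vertex AI delimits SSE events with \r\n\r\n rather
// than the \n\n most providers use, so buffered stream text is split on
// that pattern. Helper name is an assumption for this example.
const splitVertexSSE = (buffer: string): string[] =>
  buffer.split('\r\n\r\n').filter((chunk) => chunk.trim().length > 0);
```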

@VisargD
Collaborator

VisargD commented Apr 10, 2024

Hey @flexchar - Awesome! I have reviewed the PR and it LGTM. I have added one minor comment. Once you address it, I will merge the PR. And please also fetch the latest changes from the main branch and resolve the merge conflicts. Thanks!

Comment: #265 (comment)

@flexchar
Contributor Author

Done 👍

@VisargD VisargD merged commit a50f600 into Portkey-AI:main Apr 10, 2024
1 check passed
@VisargD VisargD linked an issue Apr 10, 2024 that may be closed by this pull request
@flexchar flexchar deleted the #10-support-vertex-ai branch April 10, 2024 13:09
Successfully merging this pull request may close these issues.

[Provider] Add support for Google Vertex AI
3 participants