Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create separate proxy-based wrapper for openai. #65

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

manugoyal
Copy link
Contributor

@manugoyal manugoyal commented Dec 17, 2023

This is the new default for when people call wrap_openai, though they can opt-into using the non-proxy wrappers by passing proxy_base_url=None (or equivalent in javascript).

Other changes to the SDK to facilitate this:

  • Add logic for serializing spans, so we can send the root span over to the proxy.

  • Add logic for serializing login info, so we can send that over to the proxy and instantiate a BackgroundLogger for the span using the serialized state. This requires parameterizing the BackgroundLogger on a given logConn, rather than using the global state connection.

  • Change the get_caller_location logic to ignore files in the stacktrace that in a subdirectory of the first stacktrace entry, not just the same directory. This seems to slightly change a couple of the internal expect test stack traces, but the changes are not really important (getting rid of meaningless, internal nodeJS callsites).

This is the new default for when people call `wrap_openai`, though they
can opt-into using the non-proxy wrappers by passing
`proxy_base_url=None` (or equivalent in javascript).

Other changes to the SDK to facilitate this:

- Add logic for serializing spans, so we can send the root span over to
  the proxy.

- Add logic for serializing login info, so we can send that over to the
  proxy and instantiate a BackgroundLogger for the span using the
  serialized state. This requires parameterizing the BackgroundLogger on
  a given `logConn`, rather than using the global state connection.

- Change the `get_caller_location` logic to ignore files in the
  stacktrace that in a subdirectory of the first stacktrace entry, not
  just the same directory. This seems to slightly change a couple of the
  expect test stack traces, but the changes are not really important
  (getting rid of meaningless, internal nodeJS callsites).

- Remove the UnterminatedObjectsHandler. I wanted to "stop using" the
  root span in the proxy after spawning a subspan, without actually
  ending the root span, and I didn't want these warning messages. I
  think the warnings are no longer useful, because we really only care
  about warning people when they forget to unmark their spans as
  current. But we've already changed the behavior so that spans can only
  be marked current under a context manager / within an
  `AsyncLocalStorage` callback, so there's really very little risk of
  them making this mistake.
manugoyal added a commit to braintrustdata/braintrust-proxy that referenced this pull request Dec 17, 2023
@@ -0,0 +1,139 @@
import { Span, startSpan } from "../logger";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is purely a cut-and-paste from oai.ts, except for changing the wrapper function name.

@@ -0,0 +1,180 @@
"""Wrappers for the openAI client which do not go through the Braintrust
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is purely a cut-and-paste from oai.py, except for changing the names of the wrapper classes.

Copy link
Contributor

@ankrgyl ankrgyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline: can we simplify the state we're sending to the proxy?

js/src/logger.ts Outdated
*
* @param args.setCurrent If true (the default), the span will be marked as the currently-active span for the duration of the callback. Equivalent to calling `braintrust.withCurrent(span, callback)`.
*/
export function withResumedSpan<R>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this will only be used internally. If so, could we avoid adding yet another one of these functions? Or if not, is there a more elegant way that allows you to resume a span, and then use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will just remove this for now.

js/src/logger.ts Outdated
}

let unterminatedObjects = new UnterminatedObjectsHandler();

class FailedHTTPResponse extends Error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this carried over from the other change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was done independently, because I didn't want a warning from resuming the serialized root span and not ending it in the proxy. But now that we've decided to only create subspans out of a serialized span, I can leave the removal to the other PR.

js/src/logger.ts Outdated
Comment on lines 223 to 231
public serializeLoginInfo(): string {
return JSON.stringify({
api_url: this.apiUrl,
login_token: this.loginToken,
org_id: this.orgId,
org_name: this.orgName,
log_url: this.logUrl,
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is redundant, and if you use your BRAINTRUST_API_KEY we already re-derive this info in the logger. Perhaps we should just send x-bt-org-name, and let the logic in the proxy handle the rest?

js/src/logger.ts Outdated

type SerializedSpanInfo = SpanObjectIds & SpanParentSpanIds;

function SerializedSpanInfoToString(info: SerializedSpanInfo): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this capitalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

un-capitalized it

js/src/logger.ts Outdated
const objectKindIds = (() => {
if (info.object_kind === "e") {
return [info.project_id, info.experiment_id];
} else if (info.object_kind === "pl") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like good logic to write some simple unit tests for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

js/src/oai.ts Outdated
*
* Currently, this only supports the `v4` API.
*
* @param openai
* @param openai The `OpenAI` object.
* @param proxyBaseUrl Use the Braintrust proxy (https://github.com/braintrustdata/braintrust-proxy) as the base URL. When left `undefined`, the URL is obtained from the environment variable `BRAINTRUST_PROXY_URL`, defaulting to `https://braintrustproxy.com/v[api_version]`. Pass `null` to leave the original base URL intact and use the non-proxy wrapper, or pass a string to use a custom URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a fairly subtle distinction (especially between undefined and null). I think it'd be more intuitive to:

  • rename args to options
  • make the first argument useProxy and make it a string | bool. If true, it uses the default proxy url. If a string, it uses that URL. If we want to keep the proxy off by default, then maybe we create a flag like disableProxy.
  • make the second argument apiKey.

js/src/oai.ts Outdated
*
* Currently, this only supports the `v4` API.
*
* @param openai
* @param openai The `OpenAI` object.
* @param proxyBaseUrl Use the Braintrust proxy (https://github.com/braintrustdata/braintrust-proxy) as the base URL. When left `undefined`, the URL is obtained from the environment variable `BRAINTRUST_PROXY_URL`, defaulting to `https://braintrustproxy.com/v[api_version]`. Pass `null` to leave the original base URL intact and use the non-proxy wrapper, or pass a string to use a custom URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to just mention v1 instead of the api version

Comment on lines 10 to 12
function getProxyApiKeyDefault(): string | undefined {
return process.env["BRAINTRUST_API_KEY"];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to check whether process is defined or not. In browser/edge environments, it's not

* @param proxyBaseUrl Use the Braintrust proxy (https://github.com/braintrustdata/braintrust-proxy) as the base URL. When left `undefined`, the URL is obtained from the environment variable `BRAINTRUST_PROXY_URL`, defaulting to `https://braintrustproxy.com/v[api_version]`. Pass `null` to leave the original base URL intact and use the non-proxy wrapper, or pass a string to use a custom URL.
* @param proxyApiKey For clients using the proxy wrapper (`proxyBaseUrl` is not `null`). When left `undefined`, the API key is set from `BRAINTRUST_API_KEY` if available. Pass `null` to leave the original API key intact, or pass a string to use a custom API key.
* @param options.useProxy By default (or if `false`), the wrapper does not trace through the proxy. Pass `true` to use the Braintrust proxy as the base URL. The URL is obtained from the environment variable `BRAINTRUST_PROXY_URL`, defaulting to `https://braintrustproxy.com/v1. Pass a string to use a custom proxy URL.
* @param options.apiKey Only used when `useProxy` is set. By default, the API key is set from `BRAINTRUST_API_KEY` if available. Pass a string to use a custom API key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow the apiKey setting. Why would someone need to set this, instead of setting the api key in the OpenAI client? I'm imagining the code like

wrapOpenAI(new OpenAI({ apiKey: process.env.BRAINTRUST_API_KEY }), { useProxy: true })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there are two ways someone can initialize the OpenAI client

// Uses OPENAI_API_KEY
new OpenAI();

// Uses custom api key
new OpenAI({apiKey: [custom_api_key]});

I was thinking that in the first case (which I imagine is the most common), they don't have to change anything about their code (just add the wrapper) and it'll change the API key to BRAINTRUST_API_KEY. In the second case, they can move the apiKey argument to the wrapper to tell it to set a custom API key instead of BRAINTRUST_API_KEY. But that wouldn't work if they hadn't defined OPENAI_API_KEY at all.

We could change the API key functionality so that if openai.apiKey === process.env.OPENAI_API_KEY, only then do we change it to BRAINTRUST_API_KEY, otherwise we leave it intact? And we could add a flag keepApiKey?: boolean, which if true, will not alter the API key at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. I think this is fine as is then, because by default it'll just use their braintrust key.

Maybe instead of using the env variable though, we should use whatever key they used to login? Eg if someone does

logger = braintrust.initLogger({apiKey: myEnv.BRAINTRUST_KEY})

then the wrap call should use that, instead of process.env.BRAINTRUST_API_KEY by default?



def wrap_openai(openai, proxy_base_url=PROXY_SENTINEL, proxy_api_key=PROXY_SENTINEL):
def wrap_openai(openai, use_proxy: Union[None, str, bool] = None, api_key: Optional[str] = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: slightly easier to read as Union[None, bool, str]

@ankrgyl
Copy link
Contributor

ankrgyl commented Dec 22, 2023

Oops I didn't mean quite to "accept" -- it would be good to flesh out that remaining api key argument first. But looking really good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants