@ncybul ncybul commented May 21, 2025

Currently, all LLM interactions are sent to LLM Obs as LLM spans; however, this does not gracefully handle the case where an LLM request is directed to a proxy server which internally makes the actual LLM call. In these cases, a customer may end up with nested LLM spans (one sent from the client and one sent from the server). This PR updates all LLM Obs integrations to send client-side requests to a proxy as workflow spans to LLM Obs.

Originally, we assumed that a non-default base URL was a good heuristic for identifying requests directed to a proxy; however, this assumption does not hold, because customers can specify alternative model provider endpoints (among other use cases) via the base URL. In order to more accurately detect when a request is being directed to a proxy server, we are putting the onus on users to configure which URLs should be considered proxies. Users can configure this either by setting the DD_LLMOBS_INSTRUMENTED_PROXY_URLS environment variable (defined in ddtrace/settings/_config.py) or by passing the instrumented_proxy_urls field when enabling LLM Obs. We then check whether an LLM interaction is being sent to one of these proxy URLs. If so, we create a workflow span, since the underlying LLM span is expected to be captured in the proxy itself. Otherwise, we create an LLM span, which is the current default behavior.
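
For reference, here is a minimal sketch of the two configuration options described above. The parameter name follows this PR description; whether the programmatic field takes a list or a comma-separated string is an assumption here, so check the released documentation for the exact format.

import os

# Option 1: environment variable (read via ddtrace/settings/_config.py); assumed
# here to be a comma-separated list of proxy URLs. Set it before ddtrace is
# imported / LLM Obs is enabled.
os.environ["DD_LLMOBS_INSTRUMENTED_PROXY_URLS"] = "http://localhost:4000"

# Option 2: pass the field when enabling LLM Obs programmatically.
from ddtrace.llmobs import LLMObs

LLMObs.enable(
    ml_app="my-app",  # illustrative app name
    instrumented_proxy_urls=["http://localhost:4000"],  # URLs treated as instrumented proxies
)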

Existing integrations were modified as follows:

Anthropic: An LLM Obs workflow span will be sent for proxy requests.
Bedrock: An LLM Obs workflow span will be sent for proxy requests.
CrewAI: Crew AI uses LiteLLM under the hood to make LLM calls; therefore, these cases should already be handled by the LiteLLM integration.
Gemini: No changes, since this library does not allow users to specify a custom base URL.
Langchain: An LLM Obs workflow span will be sent for proxy requests.
Langgraph: Langgraph is model agnostic, so there is nothing to change within this integration itself.
LiteLLM: LLM Obs spans will be sent from the LiteLLM integration as long as no downstream Open AI span is detected. The span kind will be workflow if the span is a LiteLLM router operation or a proxy request; otherwise, the span kind is LLM (see the sketch after this list).
Open AI: An LLM Obs workflow span will be sent for proxy requests.
Open AI Agents: The Open AI Agents SDK also uses LiteLLM to allow users to call non-Open AI models; therefore, these cases should already be handled by the LiteLLM integration.
Vertex AI: No changes, since this library does not allow users to specify a custom base URL.
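
As a rough illustration of the LiteLLM decision above (a hypothetical sketch; the function and argument names are illustrative, not the integration's actual internals):

from typing import Optional

def litellm_llmobs_span_kind(
    is_router_operation: bool,
    is_proxy_request: bool,
    has_downstream_openai_span: bool,
) -> Optional[str]:
    # If a downstream Open AI span is detected, the LiteLLM integration does not
    # submit an LLM Obs span at all.
    if has_downstream_openai_span:
        return None
    # Router operations and requests to an instrumented proxy become workflow spans.
    if is_router_operation or is_proxy_request:
        return "workflow"
    # Everything else remains an LLM span.
    return "llm"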

Every time a span is created by one of the LLM Obs integrations, the self._get_base_url method is called to retrieve the base URL for that interaction if it exists. Then, self._is_proxy_url(base_url) is called to determine whether to set an item in the context that indicates that the current span represents a proxy request. This will later be used in the integration code to determine the appropriate span kind. With this design, any new integrations simply need to implement the _get_base_url method and then use the PROXY_REQUEST context item to tag their LLM Obs spans accordingly.
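
To make the pattern concrete, here is a hedged sketch of what a new integration might look like. The base-class plumbing and the context API are simplified; only _get_base_url, _is_proxy_url, and the PROXY_REQUEST item come from the description above, and every other name is illustrative.

from typing import Any, Dict, Optional

PROXY_REQUEST = "llmobs.proxy_request"  # illustrative constant value


class MyProviderIntegration:
    def __init__(self, instrumented_proxy_urls=None):
        # Normalize configured proxy URLs for comparison.
        self._instrumented_proxy_urls = {u.rstrip("/") for u in (instrumented_proxy_urls or [])}

    def _get_base_url(self, **kwargs) -> Optional[str]:
        # Each integration implements this to pull the base URL off the client
        # object or request kwargs, if one exists.
        instance = kwargs.get("instance")
        return getattr(instance, "base_url", None)

    def _is_proxy_url(self, base_url: Optional[str]) -> bool:
        return base_url is not None and base_url.rstrip("/") in self._instrumented_proxy_urls

    def llmobs_span_kind(self, ctx: Dict[str, Any]) -> str:
        # Integration code later reads the PROXY_REQUEST context item to pick the span kind.
        return "workflow" if ctx.get(PROXY_REQUEST) else "llm"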

Manual Testing

For each integration, I tested three cases:

  1. No base URL is set (this should result in an LLM span)
  2. The base URL is set to a proxy URL configured with DD_LLMOBS_INSTRUMENTED_PROXY_URLS (this should result in a top-level workflow span and possibly other child spans, which may include an LLM span depending on how the proxy server is instrumented)
  3. The base URL is set but not to a proxy URL (this should result in an LLM span)

Anthropic

Request with default base URL (trace).

from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3-5-sonnet-20240620",
)

Request with base URL specified (when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set and when it is not).

from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:4000",
)

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3.5",
)

Bedrock

I chose not to instrument the server in the case where the base URL is specified but is not configured as a proxy URL, to avoid sending spans from the server.

Request with default base URL (trace).

import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime', 
)
modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'
input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)
response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())

Request with base URL specified (when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set and when it is not).

Server code (I created a proxy server of my own to test this out!):

from fastapi import FastAPI, Request
import uvicorn
import boto3
import json

app = FastAPI()

@app.post("/model/{model_id}/invoke")
async def invoke_model(model_id: str, request: Request):
    body = await request.json()

    session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
    brt = session.client(
        service_name='bedrock-runtime', 
    )
    
    body = json.dumps(body)
    response = brt.invoke_model(body=body, modelId=request.path_params.get("model_id"), accept=request.headers.get("accept"), contentType=request.headers.get("content-type"))
    response_body = json.loads(response.get('body').read())
    
    return response_body

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=4000) 

Client code

import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime', 
    endpoint_url="http://0.0.0.0:4000",
)
modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'
input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)
response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())

Crew AI

To test out these changes with Crew AI, I used the following simple Crew AI flow:

from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="gpt-3.5-turbo",
    base_url="http://0.0.0.0:4000", # optionally set for testing
)

calculator = Agent(
    role='Mathematical Calculator',
    goal='Perform accurate mathematical calculations',
    backstory='You are an expert mathematician who can solve complex calculations with precision.',
    llm=llm,
    verbose=True
)

calculation_task = Task(
    description='Calculate the sum of all numbers from 1 to 100',
    agent=calculator,
    expected_output='The sum of all numbers from 1 to 100'
)

crew = Crew(
    agents=[calculator],
    tasks=[calculation_task]
)

result = crew.kickoff()

When the base URL is not set, I get this trace with LLM spans.

When the base URL is set to the same URL as in DD_LLMOBS_INSTRUMENTED_PROXY_URLS, I get this trace with workflow spans from the client and underlying LLM spans nested within. And when the base URL is set but not to a proxy URL, I get this trace, again with just the LLM span as expected.

Langchain

For the request using a proxy URL, I instrumented both the client and the server, except for Open AI. This kept things simpler, as the only integrations emitting spans were Langchain and LiteLLM (since I am using a LiteLLM proxy server). I also chose not to instrument the server in the case where the base URL is specified but is not configured as a proxy URL, to avoid sending spans from the server.

Request with default base URL (trace).

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    model = "gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)

Request with base URL specified (when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set and when it is not)

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    base_url="http://0.0.0.0:4000",
    model = "gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)

Langgraph

For these tests, I used the following application code:

from langgraph.graph import StateGraph, START, END
from typing import TypedDict
from langchain_openai import ChatOpenAI

class GraphState(TypedDict):
    question: str
    conclusion: str

class Mathematician():
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-3.5-turbo")

    def __call__(self, state: GraphState):
        prompt = f"You are a mathematician that should only answer questions with a number. You are given a question: {state['question']}. Please answer the question."
        return {"conclusion": self.llm.invoke(prompt)}

graph_builder = StateGraph(GraphState)
graph_builder.add_node("mathematician", Mathematician())
graph_builder.add_edge(START, "mathematician")
graph_builder.add_edge("mathematician", END)
graph = graph_builder.compile()

conclusion = graph.invoke({
        "question": "sum the numbers 1 to 100",
})['conclusion']
print(conclusion)

I then varied the LLM model configuration to showcase the traces that result in the following cases:

  1. Request with default base URL (trace)
  2. Request with base URL specified (trace)
  3. Request with base URL specified and DD_LLMOBS_INSTRUMENTED_PROXY_URLS set (trace)

LiteLLM

For these tests, I started a LiteLLM server and sent requests to it by specifying the base URL as "http://localhost:4000". To make the examples more relevant, I disabled the Open AI integration so that all spans came from the LiteLLM integration (this should not change the number of spans or the span kinds present in each trace). I also chose not to instrument the server in the case where the base URL is specified but is not configured as a proxy URL, to avoid sending spans from the server.
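
For anyone reproducing this, one way to disable the Open AI integration is its per-integration toggle (a sketch; the exact setting name is an assumption here, so double-check the ddtrace documentation for your version):

import os

# Assumed per-integration toggle; must be set before ddtrace patches openai
# (e.g. exported in the shell before launching ddtrace-run, or set before
# calling ddtrace patching code in-process).
os.environ["DD_TRACE_OPENAI_ENABLED"] = "false"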

Request with default base URL (trace).

import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]

messages = [{ "content": "What color is the sky?","role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)

Request with base URL specified (when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set and when it is not).

import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]

messages = [{ "content": "What color is the sky?","role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages, api_base="http://localhost:4000")
print(response)

Open AI

I chose not to instrument the server in the case where the base URL is specified but is not configured as a proxy URL, to avoid sending spans from the server.

Request with default base URL (trace).

import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)
completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)

Request with base URL specified (when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set and when it is not).

import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000",
)
completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)

Open AI Agents

Request with default base URL (trace).

from agents import Agent, Runner
import asyncio

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model="gpt-3.5-turbo",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[math_tutor_agent],
    model="gpt-3.5-turbo",
)

async def main():
    result = await Runner.run(triage_agent, "what is the sum of the numbers between 1 and 100?", max_turns=3)
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

Request with base URL specified (when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set and when it is not).

# only change was updating the model used in each agent
from agents.extensions.models.litellm_model import LitellmModel
import os

model = LitellmModel(
    model="gpt-3.5-turbo",
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="http://localhost:4000",
)

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@ncybul ncybul changed the title from "send client side workflow spans" to "feat(litellm): [MLOB-2787] send client side workflow spans" on May 21, 2025
github-actions bot commented May 21, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/llmobs-configure-proxy-urls-1edb993ac7ccb895.yaml    @DataDog/apm-python
ddtrace/_trace/trace_handlers.py                                        @DataDog/apm-sdk-api-python
ddtrace/contrib/internal/anthropic/patch.py                             @DataDog/ml-observability
ddtrace/contrib/internal/botocore/services/bedrock.py                   @DataDog/ml-observability
ddtrace/contrib/internal/langchain/patch.py                             @DataDog/ml-observability
ddtrace/contrib/internal/langchain/utils.py                             @DataDog/ml-observability
ddtrace/contrib/internal/litellm/patch.py                               @DataDog/ml-observability
ddtrace/contrib/internal/openai/patch.py                                @DataDog/ml-observability
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_integrations/anthropic.py                               @DataDog/ml-observability
ddtrace/llmobs/_integrations/base.py                                    @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/langchain.py                               @DataDog/ml-observability
ddtrace/llmobs/_integrations/litellm.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/openai.py                                  @DataDog/ml-observability
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_telemetry.py                                            @DataDog/ml-observability
ddtrace/settings/_config.py                                             @DataDog/apm-core-python
tests/contrib/anthropic/test_anthropic_llmobs.py                        @DataDog/ml-observability
tests/contrib/anthropic/utils.py                                        @DataDog/ml-observability
tests/contrib/botocore/bedrock_utils.py                                 @DataDog/ml-observability
tests/contrib/botocore/conftest.py                                      @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability
tests/contrib/langchain/conftest.py                                     @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py                        @DataDog/ml-observability
tests/contrib/langchain/utils.py                                        @DataDog/ml-observability
tests/contrib/litellm/test_litellm_llmobs.py                            @DataDog/ml-observability
tests/contrib/openai/test_openai_llmobs.py                              @DataDog/ml-observability
tests/contrib/openai/utils.py                                           @DataDog/ml-observability
tests/telemetry/test_writer.py                                          @DataDog/apm-python
tests/utils.py                                                          @DataDog/python-guild

github-actions bot commented May 21, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 274 ± 3 ms.

The average import time from base is: 276 ± 3 ms.

The import time difference between this PR and base is: -2.1 ± 0.1 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.999 ms (0.73%)
ddtrace.bootstrap.sitecustomize 1.323 ms (0.48%)
ddtrace.bootstrap.preload 1.323 ms (0.48%)
ddtrace.internal.remoteconfig.client 0.648 ms (0.24%)
ddtrace 0.675 ms (0.25%)
ddtrace.internal._unpatched 0.030 ms (0.01%)
json 0.030 ms (0.01%)
json.decoder 0.030 ms (0.01%)
re 0.030 ms (0.01%)
enum 0.030 ms (0.01%)
types 0.030 ms (0.01%)

pr-commenter bot commented May 21, 2025

Benchmarks

Benchmark execution time: 2025-06-18 17:59:51

Comparing candidate commit 80c0760 in PR branch nicole-cybul/send-client-side-workflow-spans with baseline commit 5592908 in branch main.

Found 1 performance improvement and 2 performance regressions! Performance is the same for 564 metrics; 5 metrics are unstable.

scenario:iastaspects-replace_aspect

  • 🟥 execution_time [+544.065ns; +620.462ns] or [+11.587%; +13.214%]

scenario:iastaspectssplit-splitlines_aspect

  • 🟥 execution_time [+127.022ns; +154.226ns] or [+8.683%; +10.542%]

scenario:iastdjangostartup-appsec

  • 🟩 execution_time [-1.485s; -1.313s] or [-66.522%; -58.827%]

@brettlangdon brettlangdon left a comment

We should update CODEOWNERS for ddtrace/contrib/internal/litellm/ and tests/contrib/litellm/ (doesn't have to be in this PR)

Approval is for the files owned by the apm-python/core/guild. I did not review the integration/LLMObs specific changes

ncybul commented Jun 12, 2025

> We should update CODEOWNERS for ddtrace/contrib/internal/litellm/ and tests/contrib/litellm/ (doesn't have to be in this PR)
>
> Approval is for the files owned by the apm-python/core/guild. I did not review the integration/LLMObs specific changes

Ah gotcha, I made a separate PR for this.

@Kyle-Verhoog Kyle-Verhoog left a comment

Excellent PR description and test coverage (both manual and automated) @ncybul 👏 👏 👏

While reviewing I thought about a user having a proxy where server-side spans may only be generated for certain endpoints (see https://github.com/DataDog/llm-obs/pull/76 for example). To handle this case I was thinking we could add an argument to annotation_context in order to do this proxy logic on a span-by-span basis. Then we're covered for both generic auto-instrumentation as well as specific manual instrumentation cases. WDYT @ncybul?

Also for the setting naming I am thinking we should refer to it as "INSTRUMENTED_PROXY_URLS" since the use-case is only for proxies that will generate an llm span downstream. Leaving it as just proxy urls is ambiguous IMO.

@ncybul ncybul merged commit 027f277 into main Jun 20, 2025
756 checks passed
@ncybul ncybul deleted the nicole-cybul/send-client-side-workflow-spans branch June 20, 2025 17:39
sydney-tung pushed a commit that referenced this pull request Jun 24, 2025
happynancee pushed a commit that referenced this pull request Jul 7, 2025
Currently, all LLM interactions are sent to LLM Obs as LLM spans;
however this does not gracefully handle the case where an LLM request is
directed to a proxy server which internally makes the actual LLM call.
Currently, for these cases, a customer may end up with nested LLM spans
(one span sent from the client and one span sent from the server). This
PR updates all LLM Obs integrations to conform to sending client-side
requests to a proxy as workflow spans to LLM Obs.

Originally, we assumed that a non-default base URL was a good heuristic
for identifying requests that were directed to a proxy; however, this
assumption does not hold as customers can specify alternative model
provider endpoints using the base URL (among potentially other use
cases) which does not work with our previous assumption. In order to
more accurately detect when a request is being directed to a proxy
server, we are putting the onus on users to configure what URLs should
be considered proxies. Users can configure this either by setting the
**DD_LLMOBS_INSTRUMENTED_PROXY_URLS** environment variable (defined in
`ddtrace/settings/_config.py`) or by enabling LLM Obs with the
`instrumented_proxy_urls` field defined. We then check whether an LLM
interaction is being sent to one of these proxy URLs. If so, we create a
workflow span as it is expected that the underlying LLM span is captured
in the proxy itself. Otherwise, we create an LLM span which is the
current and default behavior.

Existing integrations were modified as follows:

**Anthropic**: An LLM Obs workflow span will be sent for proxy requests.
**Bedrock**: An LLM Obs workflow span will be sent for proxy requests.
**CrewAI**: Crew AI [uses LiteLLM under the
hood](https://github.com/crewAIInc/crewAI/blob/main/src/crewai/llm.py#L768)
to make LLM calls; therefore, these cases should already be handled by
the LiteLLM integration.
**Gemini**: no changes since this library does not allow users to
specify a custom base URL
**Langchain**: An LLM Obs workflow span will be sent for proxy requests.
**Langgraph**: Langgraph is model agnostic, so there is nothing to
change within this integration itself.
**LiteLLM**: LLM Obs spans will be sent from the LiteLLM integration as
long as there is no downstream Open AI span detected. The span kind will
be a workflow if the span is a LiteLLM router operation or proxy
request. Otherwise, the span kind is an LLM.
**Open AI**: An LLM Obs workflow span will be sent for proxy requests.
**Open AI Agents**: The Open AI agents SDK also [uses LiteLLM to allow
users to call non-Open AI
models](https://openai.github.io/openai-agents-python/models/litellm/);
therefore, these cases should already be handled by the LiteLLM
integration.
**Vertex AI**: no changes since this library does not allow users to
specify a custom base URL

Every time a span is created by one of the LLM Obs integrations, the
`self._get_base_url` method is called to retrieve the base URL for that
interaction if it exists. Then, `self._is_proxy_url(base_url)` is called
to determine whether to set an item in the context that indicates that
the current span represents a proxy request. This will later be used in
the integration code to determine the appropriate span kind. With this
design, any new integrations simply need to implement the
`_get_base_url` method and then use the `PROXY_REQUEST` context item to
tag their LLM Obs spans accordingly.

# Manual Testing
For each integration, I tested three cases:
1. No base URL is set (this should result in an LLM span)
2. The base URL is set to a proxy URL configured with
`DD_LLMOBS_INSTRUMENTED_PROXY_URLS` (this should result in a top-level
workflow span and perhaps other child spans which may include an LLM
span depending on how the proxy server is instrumented)
3. The base URL Is set but not to a proxy URL (this should result in an
LLM span)

## Anthropic
Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWENSzdO5DDAAAABhBWmRXRU5TekFBRDBJZ1hMelVFc0FBQUEAAAAkZjE5NzU2MTAtZjU5Ny00NWU1LWI1M2UtMmE3OWQ3OWVmNjNlAAAADQ%22%7D%5D&spanId=4488892102153659416&start=1749494753131&end=1749495653131&paused=false)).
```
from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3-5-sonnet-20240620",
)
```
Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS
is
set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEi0I5prmNQAAABhBWmRXRWkwSUFBQldmbnNkU0FOZ0FBQUEAAAAkZjE5NzU2MTItNTM0OS00NjVkLThlM2QtZDEwYTcxZmMzNzBmAAAABg%22%7D%5D&spanId=15578566980693053138&start=1749494846544&end=1749495746544&paused=false)
and [when it is
not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEaqpbXi0tAAAABhBWmRXRWFxcEFBQnJBcVlPR0JoN0FBQUEAAAAkZjE5NzU2MTEtYWNlZi00MDgyLWJjN2QtZjVkM2MzMmIxNTQ4AAAABA%22%7D%5D&spanId=11578799453780556987&start=1749494826126&end=1749495726126&paused=false)).
```
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:4000",
)

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3.5",
)
```

## Bedrock
I chose to not instrument the server in the case where the base URL is
specified but is not set as the proxy URL to avoid sending spans from
the server.

Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGvk-sXTBZQAAABhBWmRXR3ZrLUFBQ25ncEVYTmlhQ0FBQUEAAAAkZjE5NzU2MWEtZjkzZS00OTlhLTg0NTktNzdmN2EyZWM2MzhjAAAAAA%22%7D%5D&spanId=15802925627459513713&start=1749495407424&end=1749496307424&paused=false)).
```
import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime', 
)
modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'
input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)
response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS
is
set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGhtvbh20tAAAABhBWmRXR2h0dkFBQmFrTVVqb09Vc0FBQUEAAAAkZjE5NzU2MWEtMjRkMS00ZDY3LTk2ODctMjdjYzk0ZGQ4ZTUwAAAABg%22%7D%5D&spanId=939675140636485153&start=1749495362196&end=1749496262196&paused=false)
and [when it is
not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGBvtsT3BZQAAABhBWmRXR0J2dEFBQURzcm1jV05ZYkFBQUEAAAAkZjE5NzU2MTgtMjdiZS00MzMzLWIwZWEtNmQ3YmIzN2Y2M2JmAAAAAQ%22%7D%5D&spanId=2663441806507313625&start=1749495240767&end=1749496140767&paused=false))
Server Code (I created a proxy server of my own to test this out!)
```
from fastapi import FastAPI, Request
import uvicorn
import boto3
import json

app = FastAPI()

@app.post("/model/{model_id}/invoke")
async def invoke_model(model_id: str, request: Request):
    body = await request.json()

    session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
    brt = session.client(
        service_name='bedrock-runtime', 
    )
    
    body = json.dumps(body)
    response = brt.invoke_model(body=body, modelId=request.path_params.get("model_id"), accept=request.headers.get("accept"), contentType=request.headers.get("content-type"))
    response_body = json.loads(response.get('body').read())
    
    return response_body

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=4000) 
```
Client code
```
import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime', 
    endpoint_url="http://0.0.0.0:4000",
)
modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'
input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)
response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
```

## Crew AI
To test out these changes with Crew AI, I used the following simple Crew
AI flow:
```
from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="gpt-3.5-turbo",
    base_url="http://0.0.0.0:4000", # optionally set for testing
)

calculator = Agent(
    role='Mathematical Calculator',
    goal='Perform accurate mathematical calculations',
    backstory='You are an expert mathematician who can solve complex calculations with precision.',
    llm=llm,
    verbose=True
)

calculation_task = Task(
    description='Calculate the sum of all numbers from 1 to 100',
    agent=calculator,
    expected_output='The sum of all numbers from 1 to 100'
)

crew = Crew(
    agents=[calculator],
    tasks=[calculation_task]
)

result = crew.kickoff()
```

When the base URL is not set, I get this
[trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWW0GEz1QXyQAAABhBWmRXVzBHRUFBQ0EzWnhKTzZVaUFBQUEAAAAkZjE5NzU2NWItNDNkOS00MWRhLWI4NDEtNzRkZWRhNDY2YjcxAAAABw%22%7D%5D&spanId=491126038724058424&start=1749496923362&end=1749500523362&paused=false)
with LLM spans.

When the base URL is set to the same URL as in
`DD_LLMOBS_INSTRUMENTED_PROXY_URLS`, I get this
[trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXx01ozqQPwAAABhBWmRXWHgwMUFBQVBFcU8yell3eEFBQUEAAAAkZjE5NzU2NWYtMzhhYS00NzlmLWFkMzItNThjOTBmYTllMGZiAAADjw%22%7D%5D&spanId=8869414125491016509&start=1749499885536&end=1749500785536&paused=false)
with workflow spans from the client and underlying LLM spans nested
within. And when the base URL is set but not to a proxy URL, I get this
[trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXRmv8fYWxgAAABhBWmRXWFJtdkFBRDM5RTNkYXk4WEFBQUEAAAAkZjE5NzU2NWQtMzliNC00NjVhLWIzYjItMWRiNWU2MWQ0MjQ5AAAEug%22%7D%5D&spanId=7586973026220693213&start=1749499746513&end=1749500646513&paused=false),
again with just the LLM span as expected.


## Langchain
For the request using a proxy URL, I instrumented both the client and
the server, but disabled the Open AI integration. This kept things
simpler, since the only integrations emitting spans would be Langchain
and LiteLLM (I am using a LiteLLM proxy server). I also chose not to
instrument the server in the case where the base URL is specified but is
not registered as a proxy URL, to avoid sending spans from the server.

Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWLxYtSWdrcwAAABhBWmRXTHhZdEFBRHFfMnhRT1lVU0FBQUEAAAAkZjE5NzU2MmYtMWM4Mi00ZTJiLWE1OWMtYTk1MjcwMDcwZDVjAAAAAg%22%7D%5D&spanId=9094709675449713583&start=1749496813623&end=1749497713623&paused=false)).
```
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS
is
set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWM0jM9LLMLQAAABhBWmRXTTBqTUFBQnRLbVpjdEJBcEFBQUEAAAAkZjE5NzU2MzMtNGE4NC00ODNkLWIwZjktMDBlN2MwY2E5Nzg5AAAABQ%22%7D%5D&spanId=16659272787581089973&start=1749497004926&end=1749497904926&paused=false)
and [when it is
not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWNE1cSZNrcwAAABhBWmRXTkUxY0FBQkhWU3VFdmgxYUFBQUEAAAAkZjE5NzU2MzQtNGQ1ZC00NzQ5LTkwYWItMWE4MThmZmJkN2VjAAAAAg%22%7D%5D&spanId=4643233463687100041&start=1749497076231&end=1749497976231&paused=false))
```
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    base_url="http://0.0.0.0:4000",
    model="gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)
```

## Langgraph
For these tests, I used the following application code:
```
from langgraph.graph import StateGraph, START, END
from typing import TypedDict
from langchain_openai import ChatOpenAI

class GraphState(TypedDict):
    question: str
    conclusion: str

class Mathematician:
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-3.5-turbo")

    def __call__(self, state: GraphState):
        prompt = f"You are a mathematician that should only answer questions with a number. You are given a question: {state['question']}. Please answer the question."
        return {"conclusion": self.llm.invoke(prompt)}

graph_builder = StateGraph(GraphState)
graph_builder.add_node("mathematician", Mathematician())
graph_builder.add_edge(START, "mathematician")
graph_builder.add_edge("mathematician", END)
graph = graph_builder.compile()

conclusion = graph.invoke({
    "question": "sum the numbers 1 to 100",
})['conclusion']
print(conclusion)
```
I then varied the model configuration to produce traces for the
following cases (a sketch of the modified configuration follows the list):
1. Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZda-xlM9SEw6wAAABhBWmRhLXhsTUFBQmdnOEVKYVplc0FBQUEAAAAkZjE5NzVhZmItMTk0ZC00YjYwLTgyYTQtNTY2YTNhOGMwZjllAAAABg%22%7D%5D&spanId=8325589879530822573&start=1749577652721&end=1749578552721&paused=false))
2. Request with base URL specified
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBQy75H9rcwAAABhBWmRiQlF5N0FBQlBKcmNfNExUUUFBQUEAAAAkZjE5NzViMDUtMjY2OC00Nzg1LTgzMjctMmIyZjQxZGJjMjVhAAAABg%22%7D%5D&spanId=732132008673596287&start=1749577883240&end=1749578783240&paused=false))
3. Request with base URL specified and
`DD_LLMOBS_INSTRUMENTED_PROXY_URLS` set
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBI5bzhpfFAAAABhBWmRiQkk1YkFBRGVkaUVHdWZJY0FBQUEAAAAkZjE5NzViMDQtOGU4Yy00ZTE1LThiMjYtYTg4NDhhYjkyZjBiAAAAAg%22%7D%5D&spanId=2296149025644956944&start=1749577854444&end=1749578754444&paused=false))
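
For reference, here is a sketch of the kind of change I made for cases 2 and 3, assuming the same Mathematician node as above; the only difference between those two cases is whether the base URL appears in `DD_LLMOBS_INSTRUMENTED_PROXY_URLS`:
```
# Hypothetical variant of the Mathematician node used for cases 2 and 3: the
# only change is pointing ChatOpenAI at the proxy's base URL.
from langchain_openai import ChatOpenAI

class Mathematician:
    def __init__(self):
        self.llm = ChatOpenAI(
            model="gpt-3.5-turbo",
            base_url="http://localhost:4000",  # omit this line for case 1
        )
```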


## LiteLLM
For these tests, I started a LiteLLM server and sent requests to it by
specifying the base URL as `"http://localhost:4000"`. To make the
examples more relevant, I disabled the Open AI integration, which means
all spans were coming from the LiteLLM integration (this should not
change the number of spans or the span kinds present in each trace). I
also chose not to instrument the server in the case where the base URL
is specified but is not registered as a proxy URL, to avoid sending
spans from the server.

Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV3XswcyZDDAAAABhBWmRWM1hzd0FBQ2NzOWtqSlpaTkFBQUEAAAAkZjE5NzU1ZGQtN2IzMC00NjBiLWE1NGUtYTA2NDM3Y2ZjMDNjAAAAAA%22%7D%5D&spanId=1159980771162140816&start=1749491382194&end=1749492282194&paused=false)).
```
import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]

messages = [{ "content": "What color is the sky?","role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)
```
Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS
is
set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6IDmv3BfGAAAABhBWmRWNklEbUFBQTh3N2ZhTzhjaUFBQUEAAAAkZjE5NzU1ZTgtODEzNy00MGQ4LWJkOGYtYzFiZWIxNDI1ZTcxAAAABA%22%7D%5D&spanId=832522128674090551&start=1749492100252&end=1749493000252&paused=false)
and [when it is
not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6ljxyUkI0QAAABhBWmRWNmxqeEFBRDdkY1dNX2MwVkFBQUEAAAAkZjE5NzU1ZWEtN2JjNy00MGYwLThjY2ItZjhlODAxMWM4YmYyAAAAAw%22%7D%5D&spanId=3962954564061385568&start=1749492226691&end=1749493126691&paused=false)).
```
import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]

messages = [{ "content": "What color is the sky?","role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages, api_base="http://localhost:4000")
print(response)
```


## Open AI
I chose not to instrument the server in the case where the base URL is
specified but is not registered as a proxy URL, to avoid sending spans
from the server.

Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWIxjd6BrmNQAAABhBWmRXSXhqZEFBQkFhSlRiV05DeEFBQUEAAAAkZjE5NzU2MjMtNDU2Mi00MGJhLTlmYjQtNDZjMzczNjUxYmU5AAAABw%22%7D%5D&spanId=5010571712001212560&start=1749495949169&end=1749496849169&paused=false)).
```
import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)
completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)
```
Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS
is
set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJGwwdmBDDAAAABhBWmRXSkd3d0FBQ01GV0VRc29QUUFBQUEAAAAkZjE5NzU2MjQtN2NiNS00OTBmLWI3NmEtMTZlZmQ3NDYxNzE5AAAABw%22%7D%5D&spanId=9189504309494995843&start=1749496040239&end=1749496940239&paused=false)
and [when it is
not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJbwtwkVfGAAAABhBWmRXSmJ3dEFBQXdsTHhtNU1iMEFBQUEAAAAkZjE5NzU2MjUtZDEzZC00ZDVlLTgwMDItMzk0ODA0NjNlZGY0AAAABA%22%7D%5D&spanId=16803264818418539708&start=1749496121106&end=1749497021106&paused=false)).
```
import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000",
)
completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)
```

## Open AI Agents
Request with default base URL
([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWfOtO9p0WxgAAABhBWmRXZk90T0FBQ2RUMHJnWk9MY0FBQUEAAAAkZjE5NzU2N2QtMWZlYi00MWMwLThkYmItZjc0ZGViOWQ2MTU5AAAAFw%22%7D%5D&spanId=12382352369827199770&start=1749501871180&end=1749502771180&paused=false)).
```
from agents import Agent, Runner
import asyncio

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model="gpt-3.5-turbo",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[math_tutor_agent],
    model="gpt-3.5-turbo",
)

async def main():
    result = await Runner.run(triage_agent, "what is the sum of the numbers between 1 and 100?", max_turns=3)
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS
is
set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWguw4pn2-8QAAABhBWmRXZ3V3NEFBQXhMYVlDQ0czVUFBQUEAAAAkZjE5NzU2ODItZWYxZC00NzExLWFkYmUtNGE4NmNkZDA3NGM3AAAAEQ%22%7D%5D&spanId=15430591163304707830&start=1749502304313&end=1749503204313&paused=false)
and [when it is
not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWhbr7gAG0tAAAABhBWmRXaGJyN0FBQ283U0ZGN3B3ckFBQUEAAAAkZjE5NzU2ODYtMDNhNi00ZjA4LTkwODgtOWY3ODcxMGNiODI4AAAAEQ%22%7D%5D&spanId=5364068118937893712&start=1749502418618&end=1749503318618&paused=false)).
```
# the only change was swapping the model string in each agent for this
# LitellmModel instance (see the sketch below)
from agents.extensions.models.litellm_model import LitellmModel
import os

model = LitellmModel(
    model="gpt-3.5-turbo",
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="http://localhost:4000",
)
```
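
To make the snippet above self-contained, this is roughly how that model object replaces the model string in each agent (a sketch, reusing the same agents as in the default-base-URL example):
```
# Sketch: the same agents as above, now constructed with the LitellmModel
# instance so that their requests go through the LiteLLM proxy.
math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model=model,
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[math_tutor_agent],
    model=model,
)
```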

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: kyle <kyle@verhoog.ca>
alyshawang pushed a commit that referenced this pull request Jul 25, 2025