feat(litellm): [MLOB-2787] send client side workflow spans #13477
Conversation
Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 274 ± 3 ms.
The average import time from base is: 276 ± 3 ms.
The import time difference between this PR and base is: -2.1 ± 0.1 ms.

Import time breakdown

The following import paths have shrunk:
Benchmarks

Benchmark execution time: 2025-06-18 17:59:51
Comparing candidate commit 80c0760 in PR branch.
Found 1 performance improvement and 2 performance regressions! Performance is the same for 564 metrics, 5 unstable metrics.

scenario:iastaspects-replace_aspect
scenario:iastaspectssplit-splitlines_aspect
scenario:iastdjangostartup-appsec
We should update CODEOWNERS for ddtrace/contrib/internal/litellm/ and tests/contrib/litellm/ (doesn't have to be in this PR)
Approval is for the files owned by the apm-python/core/guild. I did not review the integration/LLMObs specific changes
Ah gotcha, I made a separate PR for this.
Kyle-Verhoog left a comment
Excellent PR description and test coverage (both manual and automated) @ncybul 👏 👏 👏
While reviewing I thought about a user having a proxy where server-side spans may only be generated for certain endpoints (see https://github.com/DataDog/llm-obs/pull/76 for example). To handle this case I was thinking we could add an argument to annotation_context in order to do this proxy logic on a span-by-span basis. Then we're covered for both generic auto-instrumentation as well as specific manual instrumentation cases. WDYT @ncybul?
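For illustration, something along these lines could work from the application side. This is a rough sketch only: `LLMObs.annotation_context` exists today, but the proxy-related argument is hypothetical and the kwarg name is made up here.

```
from ddtrace.llmobs import LLMObs
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000")

# Hypothetical per-span override: everything created inside this block would be
# treated as a request to an instrumented proxy, so the integration would emit
# a workflow span instead of an LLM span. The extra kwarg does not exist yet.
with LLMObs.annotation_context(name="proxied-call"):  # e.g. instrumented_proxy=True
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What color is the sky?"}],
    )
```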
Also for the setting naming I am thinking we should refer to it as "INSTRUMENTED_PROXY_URLS" since the use-case is only for proxies that will generate an llm span downstream. Leaving it as just proxy urls is ambiguous IMO.
Co-authored-by: kyle <kyle@verhoog.ca>
Currently, all LLM interactions are sent to LLM Obs as LLM spans; however, this does not gracefully handle the case where an LLM request is directed to a proxy server that internally makes the actual LLM call. In these cases, a customer may end up with nested LLM spans (one span sent from the client and one span sent from the server). This PR updates all LLM Obs integrations to send client-side requests to a proxy as workflow spans to LLM Obs.

Originally, we assumed that a non-default base URL was a good heuristic for identifying requests directed to a proxy; however, this assumption does not hold, since customers can also use the base URL to point at alternative model provider endpoints (among other use cases). To more accurately detect when a request is directed to a proxy server, we put the onus on users to configure which URLs should be considered proxies. Users can configure this either by setting the **DD_LLMOBS_INSTRUMENTED_PROXY_URLS** environment variable (defined in `ddtrace/settings/_config.py`) or by enabling LLM Obs with the `instrumented_proxy_urls` field defined. We then check whether an LLM interaction is being sent to one of these proxy URLs. If so, we create a workflow span, since the underlying LLM span is expected to be captured in the proxy itself. Otherwise, we create an LLM span, which is the current and default behavior.
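For illustration, a minimal sketch of the two configuration paths described above (the `ml_app` name, the URL value, and the exact accepted format for the URLs are placeholders/assumptions, not taken from this PR):

```
import os

# Option 1: environment variable, set before ddtrace / LLM Obs is initialized.
# Format assumed here to be a comma-separated list of proxy base URLs.
os.environ["DD_LLMOBS_INSTRUMENTED_PROXY_URLS"] = "http://localhost:4000"

from ddtrace.llmobs import LLMObs

# Option 2: pass the new field directly when enabling LLM Obs.
# Whether this accepts a string or a list of strings is an assumption.
LLMObs.enable(
    ml_app="my-app",  # placeholder app name
    instrumented_proxy_urls="http://localhost:4000",
)
```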
Existing integrations were modified as follows:

- **Anthropic**: An LLM Obs workflow span will be sent for proxy requests.
- **Bedrock**: An LLM Obs workflow span will be sent for proxy requests.
- **CrewAI**: Crew AI [uses LiteLLM under the hood](https://github.com/crewAIInc/crewAI/blob/main/src/crewai/llm.py#L768) to make LLM calls; therefore, these cases should already be handled by the LiteLLM integration.
- **Gemini**: No changes, since this library does not allow users to specify a custom base URL.
- **Langchain**: An LLM Obs workflow span will be sent for proxy requests.
- **Langgraph**: Langgraph is model-agnostic, so there is nothing to change within this integration itself.
- **LiteLLM**: LLM Obs spans will be sent from the LiteLLM integration as long as there is no downstream Open AI span detected. The span kind will be a workflow if the span is a LiteLLM router operation or proxy request. Otherwise, the span kind is an LLM.
- **Open AI**: An LLM Obs workflow span will be sent for proxy requests.
- **Open AI Agents**: The Open AI agents SDK also [uses LiteLLM to allow users to call non-Open AI models](https://openai.github.io/openai-agents-python/models/litellm/); therefore, these cases should already be handled by the LiteLLM integration.
- **Vertex AI**: No changes, since this library does not allow users to specify a custom base URL.

Every time a span is created by one of the LLM Obs integrations, the `self._get_base_url` method is called to retrieve the base URL for that interaction if it exists. Then, `self._is_proxy_url(base_url)` is called to determine whether to set an item in the context indicating that the current span represents a proxy request. This is later used in the integration code to determine the appropriate span kind. With this design, any new integration simply needs to implement the `_get_base_url` method and then use the `PROXY_REQUEST` context item to tag its LLM Obs spans accordingly.

# Manual Testing

For each integration, I tested three cases:

1. No base URL is set (this should result in an LLM span)
2. The base URL is set to a proxy URL configured with `DD_LLMOBS_INSTRUMENTED_PROXY_URLS` (this should result in a top-level workflow span and perhaps other child spans, which may include an LLM span depending on how the proxy server is instrumented)
3. The base URL is set but not to a proxy URL (this should result in an LLM span)

## Anthropic

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWENSzdO5DDAAAABhBWmRXRU5TekFBRDBJZ1hMelVFc0FBQUEAAAAkZjE5NzU2MTAtZjU5Ny00NWU1LWI1M2UtMmE3OWQ3OWVmNjNlAAAADQ%22%7D%5D&spanId=4488892102153659416&start=1749494753131&end=1749495653131&paused=false)).

```
from anthropic import Anthropic

client = Anthropic()
message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3-5-sonnet-20240620",
)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEi0I5prmNQAAABhBWmRXRWkwSUFBQldmbnNkU0FOZ0FBQUEAAAAkZjE5NzU2MTItNTM0OS00NjVkLThlM2QtZDEwYTcxZmMzNzBmAAAABg%22%7D%5D&spanId=15578566980693053138&start=1749494846544&end=1749495746544&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEaqpbXi0tAAAABhBWmRXRWFxcEFBQnJBcVlPR0JoN0FBQUEAAAAkZjE5NzU2MTEtYWNlZi00MDgyLWJjN2QtZjVkM2MzMmIxNTQ4AAAABA%22%7D%5D&spanId=11578799453780556987&start=1749494826126&end=1749495726126&paused=false)).

```
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:4000",
)
message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3.5",
)
```

## Bedrock

I chose not to instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGvk-sXTBZQAAABhBWmRXR3ZrLUFBQ25ncEVYTmlhQ0FBQUEAAAAkZjE5NzU2MWEtZjkzZS00OTlhLTg0NTktNzdmN2EyZWM2MzhjAAAAAA%22%7D%5D&spanId=15802925627459513713&start=1749495407424&end=1749496307424&paused=false)).

```
import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime',
)

modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'
input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGhtvbh20tAAAABhBWmRXR2h0dkFBQmFrTVVqb09Vc0FBQUEAAAAkZjE5NzU2MWEtMjRkMS00ZDY3LTk2ODctMjdjYzk0ZGQ4ZTUwAAAABg%22%7D%5D&spanId=939675140636485153&start=1749495362196&end=1749496262196&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGBvtsT3BZQAAABhBWmRXR0J2dEFBQURzcm1jV05ZYkFBQUEAAAAkZjE5NzU2MTgtMjdiZS00MzMzLWIwZWEtNmQ3YmIzN2Y2M2JmAAAAAQ%22%7D%5D&spanId=2663441806507313625&start=1749495240767&end=1749496140767&paused=false))

Server Code (I created a proxy server of my own to test this out!)

```
from fastapi import FastAPI, Request
import uvicorn
import boto3
import json

app = FastAPI()

@app.post("/model/{model_id}/invoke")
async def invoke_model(model_id: str, request: Request):
    body = await request.json()
    session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
    brt = session.client(
        service_name='bedrock-runtime',
    )
    body = json.dumps(body)
    response = brt.invoke_model(body=body, modelId=request.path_params.get("model_id"), accept=request.headers.get("accept"), contentType=request.headers.get("content-type"))
    response_body = json.loads(response.get('body').read())
    return response_body

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=4000)
```

Client code

```
import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime',
    endpoint_url="http://0.0.0.0:4000",
)

modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'
input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
```

## Crew AI

To test out these changes with Crew AI, I used the following simple Crew AI flow:

```
from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="gpt-3.5-turbo",
    base_url="http://0.0.0.0:4000",  # optionally set for testing
)

calculator = Agent(
    role='Mathematical Calculator',
    goal='Perform accurate mathematical calculations',
    backstory='You are an expert mathematician who can solve complex calculations with precision.',
    llm=llm,
    verbose=True
)

calculation_task = Task(
    description='Calculate the sum of all numbers from 1 to 100',
    agent=calculator,
    expected_output='The sum of all numbers from 1 to 100'
)

crew = Crew(
    agents=[calculator],
    tasks=[calculation_task]
)

result = crew.kickoff()
```

When the base URL is not set, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWW0GEz1QXyQAAABhBWmRXVzBHRUFBQ0EzWnhKTzZVaUFBQUEAAAAkZjE5NzU2NWItNDNkOS00MWRhLWI4NDEtNzRkZWRhNDY2YjcxAAAABw%22%7D%5D&spanId=491126038724058424&start=1749496923362&end=1749500523362&paused=false) with LLM spans. When the base URL is set to the same URL as in `DD_LLMOBS_INSTRUMENTED_PROXY_URLS`, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXx01ozqQPwAAABhBWmRXWHgwMUFBQVBFcU8yell3eEFBQUEAAAAkZjE5NzU2NWYtMzhhYS00NzlmLWFkMzItNThjOTBmYTllMGZiAAADjw%22%7D%5D&spanId=8869414125491016509&start=1749499885536&end=1749500785536&paused=false) with workflow spans from the client and underlying LLM spans nested within. And when the base URL is set but not to a proxy URL, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXRmv8fYWxgAAABhBWmRXWFJtdkFBRDM5RTNkYXk4WEFBQUEAAAAkZjE5NzU2NWQtMzliNC00NjVhLWIzYjItMWRiNWU2MWQ0MjQ5AAAEug%22%7D%5D&spanId=7586973026220693213&start=1749499746513&end=1749500646513&paused=false), again with just the LLM span as expected.

## Langchain

For the request using a proxy URL, I instrumented both the client and the server, except for Open AI. This was to make things simpler, as the only integrations emitting spans would be Langchain and LiteLLM (since I am using a LiteLLM proxy server). I also chose not to instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWLxYtSWdrcwAAABhBWmRXTHhZdEFBRHFfMnhRT1lVU0FBQUEAAAAkZjE5NzU2MmYtMWM4Mi00ZTJiLWE1OWMtYTk1MjcwMDcwZDVjAAAAAg%22%7D%5D&spanId=9094709675449713583&start=1749496813623&end=1749497713623&paused=false)).

```
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWM0jM9LLMLQAAABhBWmRXTTBqTUFBQnRLbVpjdEJBcEFBQUEAAAAkZjE5NzU2MzMtNGE4NC00ODNkLWIwZjktMDBlN2MwY2E5Nzg5AAAABQ%22%7D%5D&spanId=16659272787581089973&start=1749497004926&end=1749497904926&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWNE1cSZNrcwAAABhBWmRXTkUxY0FBQkhWU3VFdmgxYUFBQUEAAAAkZjE5NzU2MzQtNGQ1ZC00NzQ5LTkwYWItMWE4MThmZmJkN2VjAAAAAg%22%7D%5D&spanId=4643233463687100041&start=1749497076231&end=1749497976231&paused=false))

```
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    base_url="http://0.0.0.0:4000",
    model="gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)
```

## Langgraph

For these tests, I used the following application code:

```
from langgraph.graph import StateGraph, START, END
from typing import TypedDict
from langchain_openai import ChatOpenAI

class GraphState(TypedDict):
    question: str
    conclusion: str

class Mathematician():
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-3.5-turbo")

    def __call__(self, state: GraphState):
        prompt = f"You are a mathematician that should only answer questions with a number. You are given a question: {state['question']}. Please answer the question."
        return {"conclusion": self.llm.invoke(prompt)}

graph_builder = StateGraph(GraphState)
graph_builder.add_node("mathematician", Mathematician())
graph_builder.add_edge(START, "mathematician")
graph_builder.add_edge("mathematician", END)
graph = graph_builder.compile()

conclusion = graph.invoke({
    "question": "sum the numbers 1 to 100",
})['conclusion']
print(conclusion)
```

I then made changes to the LLM model used to showcase the traces that result in the following cases (the change is sketched after the list):

1. Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZda-xlM9SEw6wAAABhBWmRhLXhsTUFBQmdnOEVKYVplc0FBQUEAAAAkZjE5NzVhZmItMTk0ZC00YjYwLTgyYTQtNTY2YTNhOGMwZjllAAAABg%22%7D%5D&spanId=8325589879530822573&start=1749577652721&end=1749578552721&paused=false))
2. Request with base URL specified ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBQy75H9rcwAAABhBWmRiQlF5N0FBQlBKcmNfNExUUUFBQUEAAAAkZjE5NzViMDUtMjY2OC00Nzg1LTgzMjctMmIyZjQxZGJjMjVhAAAABg%22%7D%5D&spanId=732132008673596287&start=1749577883240&end=1749578783240&paused=false))
3. Request with base URL specified and `DD_LLMOBS_INSTRUMENTED_PROXY_URLS` set ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBI5bzhpfFAAAABhBWmRiQkk1YkFBRGVkaUVHdWZJY0FBQUEAAAAkZjE5NzViMDQtOGU4Yy00ZTE1LThiMjYtYTg4NDhhYjkyZjBiAAAAAg%22%7D%5D&spanId=2296149025644956944&start=1749577854444&end=1749578754444&paused=false))
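For cases 2 and 3, the assumed change is simply pointing the model at the proxy, mirroring the Langchain example above (the URL value is reused from the other examples in this description):

```
from langchain_openai import ChatOpenAI

# Assumed change for cases 2 and 3: in Mathematician.__init__ above, construct
# the model with an explicit base URL so requests go through the proxy. Case 3
# additionally has this URL listed in DD_LLMOBS_INSTRUMENTED_PROXY_URLS, which
# is what turns the client-side span into a workflow span.
llm = ChatOpenAI(model="gpt-3.5-turbo", base_url="http://0.0.0.0:4000")
```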
## LiteLLM

For these tests, I started a LiteLLM server and sent requests to it by specifying the base URL as `"http://localhost:4000"`. To make the examples more relevant, I disabled the Open AI integration, which means all spans were coming from the LiteLLM integration (this should not change the number of spans or the span kinds present in each trace). I also chose not to instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV3XswcyZDDAAAABhBWmRWM1hzd0FBQ2NzOWtqSlpaTkFBQUEAAAAkZjE5NzU1ZGQtN2IzMC00NjBiLWE1NGUtYTA2NDM3Y2ZjMDNjAAAAAA%22%7D%5D&spanId=1159980771162140816&start=1749491382194&end=1749492282194&paused=false)).

```
import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]
messages = [{"content": "What color is the sky?", "role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6IDmv3BfGAAAABhBWmRWNklEbUFBQTh3N2ZhTzhjaUFBQUEAAAAkZjE5NzU1ZTgtODEzNy00MGQ4LWJkOGYtYzFiZWIxNDI1ZTcxAAAABA%22%7D%5D&spanId=832522128674090551&start=1749492100252&end=1749493000252&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6ljxyUkI0QAAABhBWmRWNmxqeEFBRDdkY1dNX2MwVkFBQUEAAAAkZjE5NzU1ZWEtN2JjNy00MGYwLThjY2ItZjhlODAxMWM4YmYyAAAAAw%22%7D%5D&spanId=3962954564061385568&start=1749492226691&end=1749493126691&paused=false)).

```
import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]
messages = [{"content": "What color is the sky?", "role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages, api_base="http://localhost:4000")
print(response)
```

## Open AI

I chose not to instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWIxjd6BrmNQAAABhBWmRXSXhqZEFBQkFhSlRiV05DeEFBQUEAAAAkZjE5NzU2MjMtNDU2Mi00MGJhLTlmYjQtNDZjMzczNjUxYmU5AAAABw%22%7D%5D&spanId=5010571712001212560&start=1749495949169&end=1749496849169&paused=false)).

```
import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)
completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJGwwdmBDDAAAABhBWmRXSkd3d0FBQ01GV0VRc29QUUFBQUEAAAAkZjE5NzU2MjQtN2NiNS00OTBmLWI3NmEtMTZlZmQ3NDYxNzE5AAAABw%22%7D%5D&spanId=9189504309494995843&start=1749496040239&end=1749496940239&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJbwtwkVfGAAAABhBWmRXSmJ3dEFBQXdsTHhtNU1iMEFBQUEAAAAkZjE5NzU2MjUtZDEzZC00ZDVlLTgwMDItMzk0ODA0NjNlZGY0AAAABA%22%7D%5D&spanId=16803264818418539708&start=1749496121106&end=1749497021106&paused=false)).

```
import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000",
)
completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)
```

## Open AI Agents

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWfOtO9p0WxgAAABhBWmRXZk90T0FBQ2RUMHJnWk9MY0FBQUEAAAAkZjE5NzU2N2QtMWZlYi00MWMwLThkYmItZjc0ZGViOWQ2MTU5AAAAFw%22%7D%5D&spanId=12382352369827199770&start=1749501871180&end=1749502771180&paused=false)).

```
from agents import Agent, Runner
import asyncio

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model="gpt-3.5-turbo",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[math_tutor_agent],
    model="gpt-3.5-turbo",
)

async def main():
    result = await Runner.run(triage_agent, "what is the sum of the numbers between 1 and 100?", max_turns=3)
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWguw4pn2-8QAAABhBWmRXZ3V3NEFBQXhMYVlDQ0czVUFBQUEAAAAkZjE5NzU2ODItZWYxZC00NzExLWFkYmUtNGE4NmNkZDA3NGM3AAAAEQ%22%7D%5D&spanId=15430591163304707830&start=1749502304313&end=1749503204313&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWhbr7gAG0tAAAABhBWmRXaGJyN0FBQ283U0ZGN3B3ckFBQUEAAAAkZjE5NzU2ODYtMDNhNi00ZjA4LTkwODgtOWY3ODcxMGNiODI4AAAAEQ%22%7D%5D&spanId=5364068118937893712&start=1749502418618&end=1749503318618&paused=false)).

```
# only change was updating the model used in each agent
from agents.extensions.models.litellm_model import LitellmModel
import os

model = LitellmModel(
    model="gpt-3.5-turbo",
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="http://localhost:4000",
)
```

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: kyle <kyle@verhoog.ca>
Currently, all LLM interactions are sent to LLM Obs as LLM spans; however this does not gracefully handle the case where an LLM request is directed to a proxy server which internally makes the actual LLM call. Currently, for these cases, a customer may end up with nested LLM spans (one span sent from the client and one span sent from the server). This PR updates all LLM Obs integrations to conform to sending client-side requests to a proxy as workflow spans to LLM Obs. Originally, we assumed that a non-default base URL was a good heuristic for identifying requests that were directed to a proxy; however, this assumption does not hold as customers can specify alternative model provider endpoints using the base URL (among potentially other use cases) which does not work with our previous assumption. In order to more accurately detect when a request is being directed to a proxy server, we are putting the onus on users to configure what URLs should be considered proxies. Users can configure this either by setting the **DD_LLMOBS_INSTRUMENTED_PROXY_URLS** environment variable (defined in `ddtrace/settings/_config.py`) or by enabling LLM Obs with the `instrumented_proxy_urls` field defined. We then check whether an LLM interaction is being sent to one of these proxy URLs. If so, we create a workflow span as it is expected that the underlying LLM span is captured in the proxy itself. Otherwise, we create an LLM span which is the current and default behavior. Existing integrations were modified as follows: **Anthropic**: An LLM Obs workflow span will be sent for proxy requests. **Bedrock**: An LLM Obs workflow span will be sent for proxy requests. **CrewAI**: Crew AI [uses LiteLLM under the hood](https://github.com/crewAIInc/crewAI/blob/main/src/crewai/llm.py#L768) to make LLM calls; therefore, these cases should already be handled by the LiteLLM integration. **Gemini**: no changes since this library does not allow users to specify a custom base URL **Langchain**: An LLM Obs workflow span will be sent for proxy requests. **Langgraph**: Langgraph is model agnostic, so there is nothing to change within this integration itself. **LiteLLM**: LLM Obs spans will be sent from the LiteLLM integration as long as there is no downstream Open AI span detected. The span kind will be a workflow if the span is a LiteLLM router operation or proxy request. Otherwise, the span kind is an LLM. **Open AI**: An LLM Obs workflow span will be sent for proxy requests. **Open AI Agents**: The Open AI agents SDK also [uses LiteLLM to allow users to call non-Open AI models](https://openai.github.io/openai-agents-python/models/litellm/); therefore, these cases should already be handled by the LiteLLM integration. **Vertex AI**: no changes since this library does not allow users to specify a custom base URL Every time a span is created by one of the LLM Obs integrations, the `self._get_base_url` method is called to retrieve the base URL for that interaction if it exists. Then, `self._is_proxy_url(base_url)` is called to determine whether to set an item in the context that indicates that the current span represents a proxy request. This will later be used in the integration code to determine the appropriate span kind. With this design, any new integrations simply need to implement the `_get_base_url` method and then use the `PROXY_REQUEST` context item to tag their LLM Obs spans accordingly. # Manual Testing For each integration, I tested three cases: 1. No base URL is set (this should result in an LLM span) 2. 
The base URL is set to a proxy URL configured with `DD_LLMOBS_INSTRUMENTED_PROXY_URLS` (this should result in a top-level workflow span and perhaps other child spans which may include an LLM span depending on how the proxy server is instrumented) 3. The base URL Is set but not to a proxy URL (this should result in an LLM span) ## Anthropic Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWENSzdO5DDAAAABhBWmRXRU5TekFBRDBJZ1hMelVFc0FBQUEAAAAkZjE5NzU2MTAtZjU5Ny00NWU1LWI1M2UtMmE3OWQ3OWVmNjNlAAAADQ%22%7D%5D&spanId=4488892102153659416&start=1749494753131&end=1749495653131&paused=false)). ``` from anthropic import Anthropic client = Anthropic() message = client.messages.create( max_tokens=1024, messages=[ { "role": "user", "content": "What color is the sky?", } ], model="claude-3-5-sonnet-20240620", ) ``` Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEi0I5prmNQAAABhBWmRXRWkwSUFBQldmbnNkU0FOZ0FBQUEAAAAkZjE5NzU2MTItNTM0OS00NjVkLThlM2QtZDEwYTcxZmMzNzBmAAAABg%22%7D%5D&spanId=15578566980693053138&start=1749494846544&end=1749495746544&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEaqpbXi0tAAAABhBWmRXRWFxcEFBQnJBcVlPR0JoN0FBQUEAAAAkZjE5NzU2MTEtYWNlZi00MDgyLWJjN2QtZjVkM2MzMmIxNTQ4AAAABA%22%7D%5D&spanId=11578799453780556987&start=1749494826126&end=1749495726126&paused=false)). ``` from anthropic import Anthropic client = Anthropic( base_url="http://localhost:4000", ) message = client.messages.create( max_tokens=1024, messages=[ { "role": "user", "content": "What color is the sky?", } ], model="claude-3.5", ) ``` ## Bedrock I chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL to avoid sending spans from the server. Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGvk-sXTBZQAAABhBWmRXR3ZrLUFBQ25ncEVYTmlhQ0FBQUEAAAAkZjE5NzU2MWEtZjkzZS00OTlhLTg0NTktNzdmN2EyZWM2MzhjAAAAAA%22%7D%5D&spanId=15802925627459513713&start=1749495407424&end=1749496307424&paused=false)). ``` import boto3 import json session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1") brt = session.client( service_name='bedrock-runtime', ) modelId = 'amazon.titan-text-lite-v1' accept = 'application/json' contentType = 'application/json' input_text = "Explain black holes to 8th graders." 
body = { "inputText": input_text, } body = json.dumps(body) response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType) response_body = json.loads(response.get('body').read()) ``` Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGhtvbh20tAAAABhBWmRXR2h0dkFBQmFrTVVqb09Vc0FBQUEAAAAkZjE5NzU2MWEtMjRkMS00ZDY3LTk2ODctMjdjYzk0ZGQ4ZTUwAAAABg%22%7D%5D&spanId=939675140636485153&start=1749495362196&end=1749496262196&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGBvtsT3BZQAAABhBWmRXR0J2dEFBQURzcm1jV05ZYkFBQUEAAAAkZjE5NzU2MTgtMjdiZS00MzMzLWIwZWEtNmQ3YmIzN2Y2M2JmAAAAAQ%22%7D%5D&spanId=2663441806507313625&start=1749495240767&end=1749496140767&paused=false)) Server Code (I created a proxy server of my own to test this out!) ``` from fastapi import FastAPI, Request import uvicorn import boto3 import json app = FastAPI() @app.post("/model/{model_id}/invoke") async def invoke_model(model_id: str, request: Request): body = await request.json() session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1") brt = session.client( service_name='bedrock-runtime', ) body = json.dumps(body) response = brt.invoke_model(body=body, modelId=request.path_params.get("model_id"), accept=request.headers.get("accept"), contentType=request.headers.get("content-type")) response_body = json.loads(response.get('body').read()) return response_body if __name__ == "__main__": uvicorn.run(app, host="0.0.0.0", port=4000) ``` Client code ``` import boto3 import json session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1") brt = session.client( service_name='bedrock-runtime', endpoint_url="http://0.0.0.0:4000", ) modelId = 'amazon.titan-text-lite-v1' accept = 'application/json' contentType = 'application/json' input_text = "Explain black holes to 8th graders." 
body = { "inputText": input_text, } body = json.dumps(body) response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType) response_body = json.loads(response.get('body').read()) ``` ## Crew AI To test out these changes with Crew AI, I used the following simple Crew AI flow: ``` from crewai import Agent, Task, Crew, LLM llm = LLM( model="gpt-3.5-turbo", base_url="http://0.0.0.0:4000", # optionally set for testing ) calculator = Agent( role='Mathematical Calculator', goal='Perform accurate mathematical calculations', backstory='You are an expert mathematician who can solve complex calculations with precision.', llm=llm, verbose=True ) calculation_task = Task( description='Calculate the sum of all numbers from 1 to 100', agent=calculator, expected_output='The sum of all numbers from 1 to 100' ) crew = Crew( agents=[calculator], tasks=[calculation_task] ) result = crew.kickoff() ``` When the base URL is not set, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWW0GEz1QXyQAAABhBWmRXVzBHRUFBQ0EzWnhKTzZVaUFBQUEAAAAkZjE5NzU2NWItNDNkOS00MWRhLWI4NDEtNzRkZWRhNDY2YjcxAAAABw%22%7D%5D&spanId=491126038724058424&start=1749496923362&end=1749500523362&paused=false) with LLM spans. When the base URL is set to the same URL as in `DD_LLMOBS_INSTRUMENTED_PROXY_URLS`, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXx01ozqQPwAAABhBWmRXWHgwMUFBQVBFcU8yell3eEFBQUEAAAAkZjE5NzU2NWYtMzhhYS00NzlmLWFkMzItNThjOTBmYTllMGZiAAADjw%22%7D%5D&spanId=8869414125491016509&start=1749499885536&end=1749500785536&paused=false) with workflow spans from the client and underlying LLM spans nested within. And when the base URL is set but not to a proxy URL, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXRmv8fYWxgAAABhBWmRXWFJtdkFBRDM5RTNkYXk4WEFBQUEAAAAkZjE5NzU2NWQtMzliNC00NjVhLWIzYjItMWRiNWU2MWQ0MjQ5AAAEug%22%7D%5D&spanId=7586973026220693213&start=1749499746513&end=1749500646513&paused=false), again with just the LLM span as expected. ## Langchain For the request using a proxy URL, I instrumented both the client and the server, except for Open AI. This was to make things simpler as the only integrations emitting spans would be Langchain and LiteLLM (since I am using a LiteLLM proxy server). I also chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL to avoid sending spans from the server. Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWLxYtSWdrcwAAABhBWmRXTHhZdEFBRHFfMnhRT1lVU0FBQUEAAAAkZjE5NzU2MmYtMWM4Mi00ZTJiLWE1OWMtYTk1MjcwMDcwZDVjAAAAAg%22%7D%5D&spanId=9094709675449713583&start=1749496813623&end=1749497713623&paused=false)). 
``` from langchain.chat_models import ChatOpenAI from langchain.schema import HumanMessage chat = ChatOpenAI( model = "gpt-3.5-turbo", temperature=0.1, ) messages = [HumanMessage(content="how are you?")] response = chat(messages) print(response) ``` Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWM0jM9LLMLQAAABhBWmRXTTBqTUFBQnRLbVpjdEJBcEFBQUEAAAAkZjE5NzU2MzMtNGE4NC00ODNkLWIwZjktMDBlN2MwY2E5Nzg5AAAABQ%22%7D%5D&spanId=16659272787581089973&start=1749497004926&end=1749497904926&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWNE1cSZNrcwAAABhBWmRXTkUxY0FBQkhWU3VFdmgxYUFBQUEAAAAkZjE5NzU2MzQtNGQ1ZC00NzQ5LTkwYWItMWE4MThmZmJkN2VjAAAAAg%22%7D%5D&spanId=4643233463687100041&start=1749497076231&end=1749497976231&paused=false)) ``` from langchain.chat_models import ChatOpenAI from langchain.schema import HumanMessage chat = ChatOpenAI( base_url="http://0.0.0.0:4000", model = "gpt-3.5-turbo", temperature=0.1, ) messages = [HumanMessage(content="how are you?")] response = chat(messages) print(response) ``` ## Langgraph For these tests, I used the following application code: ``` from langgraph.graph import StateGraph, START, END from typing import TypedDict from langchain_openai import ChatOpenAI class GraphState(TypedDict): question: str conclusion: str class Mathematician(): def __init__(self): self.llm = ChatOpenAI(model="gpt-3.5-turbo") def __call__(self, state: GraphState): prompt = f"You are a mathematician that should only answer questions with a number. You are given a question: {state['question']}. Please answer the question." return {"conclusion": self.llm.invoke(prompt)} graph_builder = StateGraph(GraphState) graph_builder.add_node("mathematician", Mathematician()) graph_builder.add_edge(START, "mathematician") graph_builder.add_edge("mathematician", END) graph = graph_builder.compile() conclusion = graph.invoke({ "question": "sum the numbers 1 to 100", })['conclusion'] print(conclusion) ``` I then made changes to the LLM model used to showcase the traces that result in the following cases: 1. Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZda-xlM9SEw6wAAABhBWmRhLXhsTUFBQmdnOEVKYVplc0FBQUEAAAAkZjE5NzVhZmItMTk0ZC00YjYwLTgyYTQtNTY2YTNhOGMwZjllAAAABg%22%7D%5D&spanId=8325589879530822573&start=1749577652721&end=1749578552721&paused=false)) 2. Request with base URL specified ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBQy75H9rcwAAABhBWmRiQlF5N0FBQlBKcmNfNExUUUFBQUEAAAAkZjE5NzViMDUtMjY2OC00Nzg1LTgzMjctMmIyZjQxZGJjMjVhAAAABg%22%7D%5D&spanId=732132008673596287&start=1749577883240&end=1749578783240&paused=false)) 3. 
Request with base URL specified and `DD_LLMOBS_INSTRUMENTED_PROXY_URLS` set ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBI5bzhpfFAAAABhBWmRiQkk1YkFBRGVkaUVHdWZJY0FBQUEAAAAkZjE5NzViMDQtOGU4Yy00ZTE1LThiMjYtYTg4NDhhYjkyZjBiAAAAAg%22%7D%5D&spanId=2296149025644956944&start=1749577854444&end=1749578754444&paused=false)) ## LiteLLM For these tests, I started a LiteLLM server and sent requests to it by specifying the base URL as `"http://localhost:4000"`. To make the examples more relevant, I disabled the Open AI integration which means all spans were coming from the LiteLLM integration (this should not change the number of spans or the span kinds present in each trace). I also chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL to avoid sending spans from the server. Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV3XswcyZDDAAAABhBWmRWM1hzd0FBQ2NzOWtqSlpaTkFBQUEAAAAkZjE5NzU1ZGQtN2IzMC00NjBiLWE1NGUtYTA2NDM3Y2ZjMDNjAAAAAA%22%7D%5D&spanId=1159980771162140816&start=1749491382194&end=1749492282194&paused=false)). ``` import os import litellm from litellm import completion litellm.api_key = os.environ["OPENAI_API_KEY"] messages = [{ "content": "What color is the sky?","role": "user"}] response = completion(model="gpt-3.5-turbo", messages=messages) print(response) ``` Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6IDmv3BfGAAAABhBWmRWNklEbUFBQTh3N2ZhTzhjaUFBQUEAAAAkZjE5NzU1ZTgtODEzNy00MGQ4LWJkOGYtYzFiZWIxNDI1ZTcxAAAABA%22%7D%5D&spanId=832522128674090551&start=1749492100252&end=1749493000252&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6ljxyUkI0QAAABhBWmRWNmxqeEFBRDdkY1dNX2MwVkFBQUEAAAAkZjE5NzU1ZWEtN2JjNy00MGYwLThjY2ItZjhlODAxMWM4YmYyAAAAAw%22%7D%5D&spanId=3962954564061385568&start=1749492226691&end=1749493126691&paused=false)). ``` import os import litellm from litellm import completion litellm.api_key = os.environ["OPENAI_API_KEY"] messages = [{ "content": "What color is the sky?","role": "user"}] response = completion(model="gpt-3.5-turbo", messages=messages, api_base="http://localhost:4000") print(response) ``` ## Open AI I chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL to avoid sending spans from the server. 
Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWIxjd6BrmNQAAABhBWmRXSXhqZEFBQkFhSlRiV05DeEFBQUEAAAAkZjE5NzU2MjMtNDU2Mi00MGJhLTlmYjQtNDZjMzczNjUxYmU5AAAABw%22%7D%5D&spanId=5010571712001212560&start=1749495949169&end=1749496849169&paused=false)). ``` import os from openai import OpenAI oai_client = OpenAI( api_key=os.environ.get("OPENAI_API_KEY"), ) completion = oai_client.chat.completions.create( model="gpt-3.5-turbo", messages=[ {"role": "user", "content": "testing openai"}, ], ) ``` Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJGwwdmBDDAAAABhBWmRXSkd3d0FBQ01GV0VRc29QUUFBQUEAAAAkZjE5NzU2MjQtN2NiNS00OTBmLWI3NmEtMTZlZmQ3NDYxNzE5AAAABw%22%7D%5D&spanId=9189504309494995843&start=1749496040239&end=1749496940239&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJbwtwkVfGAAAABhBWmRXSmJ3dEFBQXdsTHhtNU1iMEFBQUEAAAAkZjE5NzU2MjUtZDEzZC00ZDVlLTgwMDItMzk0ODA0NjNlZGY0AAAABA%22%7D%5D&spanId=16803264818418539708&start=1749496121106&end=1749497021106&paused=false)). ``` import os from openai import OpenAI oai_client = OpenAI( api_key=os.environ.get("OPENAI_API_KEY"), base_url="http://0.0.0.0:4000", ) completion = oai_client.chat.completions.create( model="gpt-3.5-turbo", messages=[ {"role": "user", "content": "testing openai"}, ], ) ``` ## Open AI Agents Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWfOtO9p0WxgAAABhBWmRXZk90T0FBQ2RUMHJnWk9MY0FBQUEAAAAkZjE5NzU2N2QtMWZlYi00MWMwLThkYmItZjc0ZGViOWQ2MTU5AAAAFw%22%7D%5D&spanId=12382352369827199770&start=1749501871180&end=1749502771180&paused=false)). ``` from agents import Agent, Runner import asyncio math_tutor_agent = Agent( name="Math Tutor", handoff_description="Specialist agent for math questions", instructions="You provide help with math problems. 
Explain your reasoning at each step and include examples", model="gpt-3.5-turbo", ) triage_agent = Agent( name="Triage Agent", instructions="You determine which agent to use based on the user's homework question", handoffs=[math_tutor_agent], model="gpt-3.5-turbo", ) async def main(): result = await Runner.run(triage_agent, "what is the sum of the numbers between 1 and 100?", max_turns=3) print(result.final_output) if __name__ == "__main__": asyncio.run(main()) ``` Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWguw4pn2-8QAAABhBWmRXZ3V3NEFBQXhMYVlDQ0czVUFBQUEAAAAkZjE5NzU2ODItZWYxZC00NzExLWFkYmUtNGE4NmNkZDA3NGM3AAAAEQ%22%7D%5D&spanId=15430591163304707830&start=1749502304313&end=1749503204313&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWhbr7gAG0tAAAABhBWmRXaGJyN0FBQ283U0ZGN3B3ckFBQUEAAAAkZjE5NzU2ODYtMDNhNi00ZjA4LTkwODgtOWY3ODcxMGNiODI4AAAAEQ%22%7D%5D&spanId=5364068118937893712&start=1749502418618&end=1749503318618&paused=false)). ``` # only change was updating the model used in each agent from agents.extensions.models.litellm_model import LitellmModel import os model = LitellmModel( model="gpt-3.5-turbo", api_key=os.getenv("OPENAI_API_KEY"), base_url="http://localhost:4000", ) ``` ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) --------- Co-authored-by: kyle <kyle@verhoog.ca>
Currently, all LLM interactions are sent to LLM Obs as LLM spans; however this does not gracefully handle the case where an LLM request is directed to a proxy server which internally makes the actual LLM call. Currently, for these cases, a customer may end up with nested LLM spans (one span sent from the client and one span sent from the server). This PR updates all LLM Obs integrations to conform to sending client-side requests to a proxy as workflow spans to LLM Obs. Originally, we assumed that a non-default base URL was a good heuristic for identifying requests that were directed to a proxy; however, this assumption does not hold as customers can specify alternative model provider endpoints using the base URL (among potentially other use cases) which does not work with our previous assumption. In order to more accurately detect when a request is being directed to a proxy server, we are putting the onus on users to configure what URLs should be considered proxies. Users can configure this either by setting the **DD_LLMOBS_INSTRUMENTED_PROXY_URLS** environment variable (defined in `ddtrace/settings/_config.py`) or by enabling LLM Obs with the `instrumented_proxy_urls` field defined. We then check whether an LLM interaction is being sent to one of these proxy URLs. If so, we create a workflow span as it is expected that the underlying LLM span is captured in the proxy itself. Otherwise, we create an LLM span which is the current and default behavior. Existing integrations were modified as follows: **Anthropic**: An LLM Obs workflow span will be sent for proxy requests. **Bedrock**: An LLM Obs workflow span will be sent for proxy requests. **CrewAI**: Crew AI [uses LiteLLM under the hood](https://github.com/crewAIInc/crewAI/blob/main/src/crewai/llm.py#L768) to make LLM calls; therefore, these cases should already be handled by the LiteLLM integration. **Gemini**: no changes since this library does not allow users to specify a custom base URL **Langchain**: An LLM Obs workflow span will be sent for proxy requests. **Langgraph**: Langgraph is model agnostic, so there is nothing to change within this integration itself. **LiteLLM**: LLM Obs spans will be sent from the LiteLLM integration as long as there is no downstream Open AI span detected. The span kind will be a workflow if the span is a LiteLLM router operation or proxy request. Otherwise, the span kind is an LLM. **Open AI**: An LLM Obs workflow span will be sent for proxy requests. **Open AI Agents**: The Open AI agents SDK also [uses LiteLLM to allow users to call non-Open AI models](https://openai.github.io/openai-agents-python/models/litellm/); therefore, these cases should already be handled by the LiteLLM integration. **Vertex AI**: no changes since this library does not allow users to specify a custom base URL Every time a span is created by one of the LLM Obs integrations, the `self._get_base_url` method is called to retrieve the base URL for that interaction if it exists. Then, `self._is_proxy_url(base_url)` is called to determine whether to set an item in the context that indicates that the current span represents a proxy request. This will later be used in the integration code to determine the appropriate span kind. With this design, any new integrations simply need to implement the `_get_base_url` method and then use the `PROXY_REQUEST` context item to tag their LLM Obs spans accordingly. # Manual Testing For each integration, I tested three cases: 1. No base URL is set (this should result in an LLM span) 2. 
# Manual Testing

For each integration, I tested three cases:

1. No base URL is set (this should result in an LLM span)
2. The base URL is set to a proxy URL configured with `DD_LLMOBS_INSTRUMENTED_PROXY_URLS` (this should result in a top-level workflow span and possibly other child spans, which may include an LLM span depending on how the proxy server is instrumented)
3. The base URL is set but not to a proxy URL (this should result in an LLM span)

## Anthropic

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWENSzdO5DDAAAABhBWmRXRU5TekFBRDBJZ1hMelVFc0FBQUEAAAAkZjE5NzU2MTAtZjU5Ny00NWU1LWI1M2UtMmE3OWQ3OWVmNjNlAAAADQ%22%7D%5D&spanId=4488892102153659416&start=1749494753131&end=1749495653131&paused=false)).

```
from anthropic import Anthropic

client = Anthropic()
message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3-5-sonnet-20240620",
)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEi0I5prmNQAAABhBWmRXRWkwSUFBQldmbnNkU0FOZ0FBQUEAAAAkZjE5NzU2MTItNTM0OS00NjVkLThlM2QtZDEwYTcxZmMzNzBmAAAABg%22%7D%5D&spanId=15578566980693053138&start=1749494846544&end=1749495746544&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWEaqpbXi0tAAAABhBWmRXRWFxcEFBQnJBcVlPR0JoN0FBQUEAAAAkZjE5NzU2MTEtYWNlZi00MDgyLWJjN2QtZjVkM2MzMmIxNTQ4AAAABA%22%7D%5D&spanId=11578799453780556987&start=1749494826126&end=1749495726126&paused=false)).

```
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:4000",
)
message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "What color is the sky?",
        }
    ],
    model="claude-3.5",
)
```

## Bedrock

I chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGvk-sXTBZQAAABhBWmRXR3ZrLUFBQ25ncEVYTmlhQ0FBQUEAAAAkZjE5NzU2MWEtZjkzZS00OTlhLTg0NTktNzdmN2EyZWM2MzhjAAAAAA%22%7D%5D&spanId=15802925627459513713&start=1749495407424&end=1749496307424&paused=false)).

```
import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime',
)

modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'

input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGhtvbh20tAAAABhBWmRXR2h0dkFBQmFrTVVqb09Vc0FBQUEAAAAkZjE5NzU2MWEtMjRkMS00ZDY3LTk2ODctMjdjYzk0ZGQ4ZTUwAAAABg%22%7D%5D&spanId=939675140636485153&start=1749495362196&end=1749496262196&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWGBvtsT3BZQAAABhBWmRXR0J2dEFBQURzcm1jV05ZYkFBQUEAAAAkZjE5NzU2MTgtMjdiZS00MzMzLWIwZWEtNmQ3YmIzN2Y2M2JmAAAAAQ%22%7D%5D&spanId=2663441806507313625&start=1749495240767&end=1749496140767&paused=false))

Server Code (I created a proxy server of my own to test this out!)

```
from fastapi import FastAPI, Request
import uvicorn
import boto3
import json

app = FastAPI()


@app.post("/model/{model_id}/invoke")
async def invoke_model(model_id: str, request: Request):
    body = await request.json()
    session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
    brt = session.client(
        service_name='bedrock-runtime',
    )
    body = json.dumps(body)
    response = brt.invoke_model(
        body=body,
        modelId=request.path_params.get("model_id"),
        accept=request.headers.get("accept"),
        contentType=request.headers.get("content-type"),
    )
    response_body = json.loads(response.get('body').read())
    return response_body


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=4000)
```

Client code

```
import boto3
import json

session = boto3.Session(profile_name='601427279990_account-admin', region_name="us-east-1")
brt = session.client(
    service_name='bedrock-runtime',
    endpoint_url="http://0.0.0.0:4000",
)

modelId = 'amazon.titan-text-lite-v1'
accept = 'application/json'
contentType = 'application/json'

input_text = "Explain black holes to 8th graders."
body = {
    "inputText": input_text,
}
body = json.dumps(body)

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
```

## Crew AI

To test out these changes with Crew AI, I used the following simple Crew AI flow:

```
from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="gpt-3.5-turbo",
    base_url="http://0.0.0.0:4000",  # optionally set for testing
)

calculator = Agent(
    role='Mathematical Calculator',
    goal='Perform accurate mathematical calculations',
    backstory='You are an expert mathematician who can solve complex calculations with precision.',
    llm=llm,
    verbose=True
)

calculation_task = Task(
    description='Calculate the sum of all numbers from 1 to 100',
    agent=calculator,
    expected_output='The sum of all numbers from 1 to 100'
)

crew = Crew(
    agents=[calculator],
    tasks=[calculation_task]
)

result = crew.kickoff()
```

When the base URL is not set, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWW0GEz1QXyQAAABhBWmRXVzBHRUFBQ0EzWnhKTzZVaUFBQUEAAAAkZjE5NzU2NWItNDNkOS00MWRhLWI4NDEtNzRkZWRhNDY2YjcxAAAABw%22%7D%5D&spanId=491126038724058424&start=1749496923362&end=1749500523362&paused=false) with LLM spans. When the base URL is set to the same URL as in `DD_LLMOBS_INSTRUMENTED_PROXY_URLS`, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXx01ozqQPwAAABhBWmRXWHgwMUFBQVBFcU8yell3eEFBQUEAAAAkZjE5NzU2NWYtMzhhYS00NzlmLWFkMzItNThjOTBmYTllMGZiAAADjw%22%7D%5D&spanId=8869414125491016509&start=1749499885536&end=1749500785536&paused=false) with workflow spans from the client and underlying LLM spans nested within. And when the base URL is set but not to a proxy URL, I get this [trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWXRmv8fYWxgAAABhBWmRXWFJtdkFBRDM5RTNkYXk4WEFBQUEAAAAkZjE5NzU2NWQtMzliNC00NjVhLWIzYjItMWRiNWU2MWQ0MjQ5AAAEug%22%7D%5D&spanId=7586973026220693213&start=1749499746513&end=1749500646513&paused=false), again with just the LLM span as expected.

## Langchain

For the request using a proxy URL, I instrumented both the client and the server, except for Open AI. This was to make things simpler, as the only integrations emitting spans would be Langchain and LiteLLM (since I am using a LiteLLM proxy server). I also chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWLxYtSWdrcwAAABhBWmRXTHhZdEFBRHFfMnhRT1lVU0FBQUEAAAAkZjE5NzU2MmYtMWM4Mi00ZTJiLWE1OWMtYTk1MjcwMDcwZDVjAAAAAg%22%7D%5D&spanId=9094709675449713583&start=1749496813623&end=1749497713623&paused=false)).
```
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    model = "gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWM0jM9LLMLQAAABhBWmRXTTBqTUFBQnRLbVpjdEJBcEFBQUEAAAAkZjE5NzU2MzMtNGE4NC00ODNkLWIwZjktMDBlN2MwY2E5Nzg5AAAABQ%22%7D%5D&spanId=16659272787581089973&start=1749497004926&end=1749497904926&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWNE1cSZNrcwAAABhBWmRXTkUxY0FBQkhWU3VFdmgxYUFBQUEAAAAkZjE5NzU2MzQtNGQ1ZC00NzQ5LTkwYWItMWE4MThmZmJkN2VjAAAAAg%22%7D%5D&spanId=4643233463687100041&start=1749497076231&end=1749497976231&paused=false))

```
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    base_url="http://0.0.0.0:4000",
    model = "gpt-3.5-turbo",
    temperature=0.1,
)
messages = [HumanMessage(content="how are you?")]
response = chat(messages)
print(response)
```

## Langgraph

For these tests, I used the following application code:

```
from langgraph.graph import StateGraph, START, END
from typing import TypedDict
from langchain_openai import ChatOpenAI


class GraphState(TypedDict):
    question: str
    conclusion: str


class Mathematician():
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-3.5-turbo")

    def __call__(self, state: GraphState):
        prompt = f"You are a mathematician that should only answer questions with a number. You are given a question: {state['question']}. Please answer the question."
        return {"conclusion": self.llm.invoke(prompt)}


graph_builder = StateGraph(GraphState)
graph_builder.add_node("mathematician", Mathematician())
graph_builder.add_edge(START, "mathematician")
graph_builder.add_edge("mathematician", END)
graph = graph_builder.compile()

conclusion = graph.invoke({
    "question": "sum the numbers 1 to 100",
})['conclusion']
print(conclusion)
```

I then made changes to the LLM model used to showcase the traces that result in the following cases:

1. Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZda-xlM9SEw6wAAABhBWmRhLXhsTUFBQmdnOEVKYVplc0FBQUEAAAAkZjE5NzVhZmItMTk0ZC00YjYwLTgyYTQtNTY2YTNhOGMwZjllAAAABg%22%7D%5D&spanId=8325589879530822573&start=1749577652721&end=1749578552721&paused=false))
2. Request with base URL specified ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBQy75H9rcwAAABhBWmRiQlF5N0FBQlBKcmNfNExUUUFBQUEAAAAkZjE5NzViMDUtMjY2OC00Nzg1LTgzMjctMmIyZjQxZGJjMjVhAAAABg%22%7D%5D&spanId=732132008673596287&start=1749577883240&end=1749578783240&paused=false))
3. Request with base URL specified and `DD_LLMOBS_INSTRUMENTED_PROXY_URLS` set ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdbBI5bzhpfFAAAABhBWmRiQkk1YkFBRGVkaUVHdWZJY0FBQUEAAAAkZjE5NzViMDQtOGU4Yy00ZTE1LThiMjYtYTg4NDhhYjkyZjBiAAAAAg%22%7D%5D&spanId=2296149025644956944&start=1749577854444&end=1749578754444&paused=false))

## LiteLLM

For these tests, I started a LiteLLM server and sent requests to it by specifying the base URL as `"http://localhost:4000"`. To make the examples more relevant, I disabled the Open AI integration, which means all spans were coming from the LiteLLM integration (this should not change the number of spans or the span kinds present in each trace). I also chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV3XswcyZDDAAAABhBWmRWM1hzd0FBQ2NzOWtqSlpaTkFBQUEAAAAkZjE5NzU1ZGQtN2IzMC00NjBiLWE1NGUtYTA2NDM3Y2ZjMDNjAAAAAA%22%7D%5D&spanId=1159980771162140816&start=1749491382194&end=1749492282194&paused=false)).

```
import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]
messages = [{"content": "What color is the sky?", "role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6IDmv3BfGAAAABhBWmRWNklEbUFBQTh3N2ZhTzhjaUFBQUEAAAAkZjE5NzU1ZTgtODEzNy00MGQ4LWJkOGYtYzFiZWIxNDI1ZTcxAAAABA%22%7D%5D&spanId=832522128674090551&start=1749492100252&end=1749493000252&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdV6ljxyUkI0QAAABhBWmRWNmxqeEFBRDdkY1dNX2MwVkFBQUEAAAAkZjE5NzU1ZWEtN2JjNy00MGYwLThjY2ItZjhlODAxMWM4YmYyAAAAAw%22%7D%5D&spanId=3962954564061385568&start=1749492226691&end=1749493126691&paused=false)).

```
import os
import litellm
from litellm import completion

litellm.api_key = os.environ["OPENAI_API_KEY"]
messages = [{"content": "What color is the sky?", "role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages, api_base="http://localhost:4000")
print(response)
```
## Open AI

I chose to not instrument the server in the case where the base URL is specified but is not set as the proxy URL, to avoid sending spans from the server.

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWIxjd6BrmNQAAABhBWmRXSXhqZEFBQkFhSlRiV05DeEFBQUEAAAAkZjE5NzU2MjMtNDU2Mi00MGJhLTlmYjQtNDZjMzczNjUxYmU5AAAABw%22%7D%5D&spanId=5010571712001212560&start=1749495949169&end=1749496849169&paused=false)).

```
import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJGwwdmBDDAAAABhBWmRXSkd3d0FBQ01GV0VRc29QUUFBQUEAAAAkZjE5NzU2MjQtN2NiNS00OTBmLWI3NmEtMTZlZmQ3NDYxNzE5AAAABw%22%7D%5D&spanId=9189504309494995843&start=1749496040239&end=1749496940239&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWJbwtwkVfGAAAABhBWmRXSmJ3dEFBQXdsTHhtNU1iMEFBQUEAAAAkZjE5NzU2MjUtZDEzZC00ZDVlLTgwMDItMzk0ODA0NjNlZGY0AAAABA%22%7D%5D&spanId=16803264818418539708&start=1749496121106&end=1749497021106&paused=false)).

```
import os
from openai import OpenAI

oai_client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000",
)

completion = oai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "testing openai"},
    ],
)
```

## Open AI Agents

Request with default base URL ([trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWfOtO9p0WxgAAABhBWmRXZk90T0FBQ2RUMHJnWk9MY0FBQUEAAAAkZjE5NzU2N2QtMWZlYi00MWMwLThkYmItZjc0ZGViOWQ2MTU5AAAAFw%22%7D%5D&spanId=12382352369827199770&start=1749501871180&end=1749502771180&paused=false)).

```
from agents import Agent, Runner
import asyncio

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model="gpt-3.5-turbo",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[math_tutor_agent],
    model="gpt-3.5-turbo",
)


async def main():
    result = await Runner.run(triage_agent, "what is the sum of the numbers between 1 and 100?", max_turns=3)
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```

Request with base URL specified ([when DD_LLMOBS_INSTRUMENTED_PROXY_URLS is set](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWguw4pn2-8QAAABhBWmRXZ3V3NEFBQXhMYVlDQ0czVUFBQUEAAAAkZjE5NzU2ODItZWYxZC00NzExLWFkYmUtNGE4NmNkZDA3NGM3AAAAEQ%22%7D%5D&spanId=15430591163304707830&start=1749502304313&end=1749503204313&paused=false) and [when it is not](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZdWhbr7gAG0tAAAABhBWmRXaGJyN0FBQ283U0ZGN3B3ckFBQUEAAAAkZjE5NzU2ODYtMDNhNi00ZjA4LTkwODgtOWY3ODcxMGNiODI4AAAAEQ%22%7D%5D&spanId=5364068118937893712&start=1749502418618&end=1749503318618&paused=false)).

```
# only change was updating the model used in each agent
from agents.extensions.models.litellm_model import LitellmModel
import os

model = LitellmModel(
    model="gpt-3.5-turbo",
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="http://localhost:4000",
)
```

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: kyle <kyle@verhoog.ca>