
[Enhancement]: observability integration without using agenta hosted apps #1567

Conversation

@aybruhm aybruhm commented Apr 26, 2024

Description

This PR proposes integrating observability into LLM applications with Agenta, even if the applications are not hosted on Agenta.

Related Issue

Closes cloud_#338
Related: commons_#43

Additional Information

A pre-alpha version of the SDK has been published to PyPI for testing. You can check the obs-app on cloud.beta and review the observability traces created with the example in this PR.


@aybruhm aybruhm marked this pull request as ready for review April 26, 2024 16:50
@aybruhm aybruhm requested a review from mmabrouk April 26, 2024 16:51
@mmabrouk (Member) left a comment

Thanks for the PR, @aybruhm. I did not finish the review; I'm a bit too tired for that :) But I thought I'd leave some of the comments I made.

Inline review comments (resolved):
  • examples/app_with_observability/app_async.py
  • agenta-cli/agenta/sdk/agenta_decorator.py
@mmabrouk (Member) left a comment

@aybruhm I thought a bit about the problem. Unfortunately I have to leave early today (in a bit), so I am just going to write down my main thoughts. I hope these make sense; I'd love to get your feedback on this:

There are three workflows for instrumentation in agenta:

  1. Using an application hosted in agenta, where we take care of adding the code snippets
  2. Using an application in the shell
  3. Using an application hosted somewhere else

We designed the SDK's instrumentation for the first case. To instrument things in that case, we use the following:

  • ag.init → The only reason for this is to set the api_key and app_id variables and to initialize the config.
  • llm_tracing() → Initializes the tracing object
  • @ag.entrypoint → Calls llm_tracing() to initialize the tracing singleton, then starts the parent_span. In addition, it does a million other things for running the app: adding the arguments to FastAPI, the bash entrypoints…
  • @ag.span → Saves the span in the current tracing singleton (see the sketch below)
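To make the mechanics concrete, here is a minimal, purely illustrative sketch of how a span decorator along these lines could record spans in a lazily created singleton (Tracing, start_span, and end_span are toy stand-ins here, not the SDK's actual API):

import functools

class Tracing:
    _instance = None

    @classmethod
    def get(cls):
        if cls._instance is None:  # lazily create the singleton
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self.spans = []

    def start_span(self, name, kind, inputs):
        span = {"name": name, "kind": kind, "inputs": inputs}
        self.spans.append(span)
        return span

    def end_span(self, span, outputs):
        span["outputs"] = outputs

def span(type="default"):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            tracing = Tracing.get()
            s = tracing.start_span(func.__name__, kind=type, inputs=kwargs)
            result = await func(*args, **kwargs)
            tracing.end_span(s, outputs=result)
            return result
        return wrapper
    return decorator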

The questions I am asking are:

  • How can we simplify instrumentation to the max and remove anything that is not required?
    • ag.init()
      • Not really required for the instrumentation per se. We can instead use the environment variables (or the input variables) to initialize the tracing object in case the agenta singleton does not exist
    • llm_tracing() → We can keep this as a way to use tracing with ag.init(), but for the pure observability use case we can simply use the Tracing constructor and make sure that we use the environment variables if nothing is provided
    • @ag.entrypoint → I think we should have a different decorator that only initializes the tracing object and calls start_parent_span. Basically it has only one responsibility. (ag.trace)
    • @ag.span → Nothing changes there

So the final implementation would look like this:

import os

import agenta as ag
from openai import AsyncOpenAI

client = AsyncOpenAI()

# set the env vars
os.environ["AGENTA_API_KEY"] = "xxx"
os.environ["AGENTA_APP_ID"] = "xxx"

@ag.span(type="LLM")
async def llm_call(prompt):
    chat_completion = await client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    tracing.set_span_attribute(  # `tracing` is the tracing singleton
        "model_config", {"model": "gpt-3.5-turbo", "temperature": ag.config.temperature}
    )  # translates to {"model_config": {"model": "gpt-3.5-turbo", "temperature": 0.2}}
    tokens_usage = chat_completion.usage.dict()
    return {
        "cost": ag.calculate_token_usage("gpt-3.5-turbo", tokens_usage),
        "message": chat_completion.choices[0].message.content,
        "usage": tokens_usage,
    }

@ag.span
async def generate(country: str, gender: str):
    """
    Generate a baby name based on the given country and gender.

    Args:
        country (str): The country to generate the name from.
        gender (str): The gender of the baby.
    """

    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

In case someone has two spans in the same code using two different apps

# set the env vars
os.environ["AGENTA_API_KEY"] = "xxx"

@ag.span(type="LLM")
async def llm_call(prompt):
    chat_completion = await client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    tracing.set_span_attribute(
        "model_config", {"model": "gpt-3.5-turbo", "temperature": ag.config.temperature}
    )  # translate to {"model_config": {"model": "gpt-3.5-turbo", "temperature": 0.2}}
    tokens_usage = chat_completion.usage.dict()
    return {
        "cost": ag.calculate_token_usage("gpt-3.5-turbo", tokens_usage),
        "message": chat_completion.choices[0].message.content,
        "usage": tokens_usage,
    }

@ag.trace(app_id="")
async def generate(country: str, gender: str):
    """
    Generate a baby name based on the given country and gender.

    Args:
        country (str): The country to generate the name from.
        gender (str): The gender of the baby.
    """

    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

@ag.trace(app_id="")
async def somepotherlogic(country: str, gender: str):
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

The issue in the second example is that we currently have a single singleton that we are using. So maybe this use case is not feasible right now?

I'd love to hear your thoughts.

@aybruhm (Member, Author) commented May 1, 2024

The questions I am asking are:

  • How can we simplify instrumentation to the max and remove anything that is not required?
    • ag.init()
      • Not really required for the instrumentation per se. We can instead use the environment variables (or the input variables) to initialize the tracing object in case the agenta singleton does not exist
    • llm_tracing() → We can keep this as a way to use tracing with ag.init(), but for the pure observability use case we can simply use the Tracing constructor and make sure that we use the environment variables if nothing is provided
    • @ag.entrypoint → I think we should have a different decorator that only initializes the tracing object and calls start_parent_span. Basically it has only one responsibility. (ag.trace)
    • @ag.span → Nothing changes there

Thanks for the review, @mmabrouk. I thought about this last night and created a POC to test how the implementation would work, and it did.

Observability would still work if we:

  • remove ag.init and use env vars instead
  • construct the Tracing class directly and make use of the env vars (a rough sketch follows)
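A minimal sketch of what that env-var fallback could look like (the parameter names and the error message are illustrative, not the SDK's actual signature):

import os

class Tracing:
    def __init__(self, api_key=None, app_id=None):
        # explicit arguments win; otherwise fall back to the environment
        self.api_key = api_key or os.environ.get("AGENTA_API_KEY")
        self.app_id = app_id or os.environ.get("AGENTA_APP_ID")
        if not self.api_key:
            raise ValueError("No api_key provided and AGENTA_API_KEY is not set")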

Regarding having ag.trace, here's how it would look:

@ag.trace # ag.entrypoint will be added after ag.trace in the case where the user wants to deploy their LLM app to agenta
async def generate(country: str, gender: str):
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }
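Internally, ag.trace would then be a thin wrapper with a single responsibility: initialize the tracing object and start the parent span. A rough sketch (start_parent_span follows the naming in this thread; the toy Tracing stand-in, end_parent_span, and the error handling are illustrative):

import functools

class Tracing:  # toy stand-in; see the constructor sketch above
    def start_parent_span(self, name, inputs):
        return {"name": name, "inputs": inputs}

    def end_parent_span(self, span, outputs=None, error=None):
        span.update({"outputs": outputs, "error": error})

def trace(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        tracing = Tracing()  # would fall back to env vars, as proposed
        parent_span = tracing.start_parent_span(func.__name__, inputs=kwargs)
        try:
            result = await func(*args, **kwargs)
            tracing.end_parent_span(parent_span, outputs=result)
            return result
        except Exception as exc:
            tracing.end_parent_span(parent_span, error=str(exc))
            raise
    return wrapper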

@aybruhm (Member, Author) commented May 1, 2024

In case someone has two spans in the same code using two different apps

# set the env vars
os.environ["AGENTA_API_KEY"] = "xxx"

@ag.span(type="LLM")
async def llm_call(prompt):
    chat_completion = await client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    tracing.set_span_attribute(
        "model_config", {"model": "gpt-3.5-turbo", "temperature": ag.config.temperature}
    )  # translate to {"model_config": {"model": "gpt-3.5-turbo", "temperature": 0.2}}
    tokens_usage = chat_completion.usage.dict()
    return {
        "cost": ag.calculate_token_usage("gpt-3.5-turbo", tokens_usage),
        "message": chat_completion.choices[0].message.content,
        "usage": tokens_usage,
    }

@ag.trace(app_id="")
async def generate(country: str, gender: str):
    """
    Generate a baby name based on the given country and gender.

    Args:
        country (str): The country to generate the name from.
        gender (str): The gender of the baby.
    """

    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

@ag.trace(app_id="")
async def somepotherlogic(country: str, gender: str):
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

The issue in the second example is that we currently have a single singleton that we are using. So maybe this use case is not feasible right now?

I'd love to hear your thoughts.

The use case is feasible. We just need to revise our implementation of Tracing to support multiple singletons if we want to have different spans for different applications.

The behaviour of Tracing would not change. The only thing we would need to do is modify the Tracing constructor to support multiple singletons, and introduce a mechanism that switches between singletons based on the provided app_id.
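A minimal, purely illustrative sketch of such a per-app_id registry (not the actual SDK code):

import os

class Tracing:
    _instances = {}  # app_id -> Tracing instance

    def __new__(cls, app_id=None):
        key = app_id or os.environ.get("AGENTA_APP_ID", "default")
        if key not in cls._instances:
            instance = super().__new__(cls)
            instance.app_id = key
            instance.spans = []
            cls._instances[key] = instance
        return cls._instances[key]

Tracing(app_id="app-a") and Tracing(app_id="app-b") would then return two independent instances, so a decorator like @ag.trace(app_id=...) could route spans to the right app.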

@mmabrouk mmabrouk merged commit b200bb7 into main May 30, 2024
10 checks passed
@mmabrouk mmabrouk deleted the 1560-age-118-observability-integration-without-using-agenta-hosted-apps branch May 30, 2024 20:05