Fix the intermediate steps span logic to work better with nested coroutines and tasks #285

Merged
rapids-bot[bot] merged 10 commits into NVIDIA:develop from mdemoret-nv:mdd_fix-intermediate-steps2
May 16, 2025

@mdemoret-nv
Collaborator

Description

This PR updates the logic to handle the case where the END event is emitted from a Task. That case broke the previous logic because a task runs in a copy of the context, so calling contextvar.set() inside the task has no effect on the context that emitted the START event.
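The context-copy behavior described above can be reproduced with the standard library alone. The following is a minimal sketch, not code from this PR; the variable name `current_span` and the helper `set_span` are hypothetical and exist only for illustration.

```python
import asyncio
import contextvars

# Hypothetical context variable standing in for "the currently open span".
current_span = contextvars.ContextVar("current_span", default="root")


async def set_span(name):
    # Sets the variable in whatever context this coroutine runs in.
    current_span.set(name)
    return current_span.get()


async def main():
    # Run inside a Task: the task gets a *copy* of the current context,
    # so the set() call is invisible to the caller afterwards.
    seen_in_task = await asyncio.create_task(set_span("from-task"))
    after_task = current_span.get()

    # Awaited directly: the coroutine shares the caller's context,
    # so the set() call is visible to the caller afterwards.
    seen_in_coro = await set_span("from-coroutine")
    after_coro = current_span.get()

    return seen_in_task, after_task, seen_in_coro, after_coro


result = asyncio.run(main())
print(result)
```

The second element of the tuple stays at the default value, which is the failure mode the description refers to: an END handler that only calls `set()` from inside a task cannot restore the caller's state.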

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: Michael Demoret <mdemoret@nvidia.com>
…iate-steps

Signed-off-by: Michael Demoret <mdemoret@nvidia.com>
… improve active span ID handling

- Removed the use of context variables for tracking the current open step ID.
- Updated logic to directly manage active span IDs within the context state.
- Enhanced test cases to reflect changes in context management and ensure proper functionality.

Signed-off-by: Michael Demoret <mdemoret@nvidia.com>
@mdemoret-nv added the bug (Something isn't working) and non-breaking (Non-breaking change) labels on May 15, 2025
Signed-off-by: Michael Demoret <mdemoret@nvidia.com>
@mdemoret-nv requested a review from Copilot on May 16, 2025 01:14
Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the asynchronous handling of alert triage workflows by fixing issues related to context propagation with nested coroutines and tasks. It converts the assistant function to an async method with proper await of LLM calls, adds type annotations for clearer typing, and integrates profiling by tracking execution of the alert process function.

Comments suppressed due to low confidence (2)

examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py:92

  • Converting 'ata_assistant' to an async function means that all its call sites must now await its result; please verify that its consumers have been updated accordingly.
async def ata_assistant(state: MessagesState):

examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py:119

  • The addition of the '@track_function()' decorator aids profiling, but please ensure that any performance overhead it introduces has been evaluated for production usage.
@track_function()
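The first suppressed comment above concerns call sites of a function that has become async. As a minimal illustration (the `assistant` helper below is a hypothetical stand-in, not the `ata_assistant` from this PR), calling an `async def` function without `await` yields a coroutine object rather than its result:

```python
import asyncio
import inspect


async def assistant(state):
    # Hypothetical stand-in for an async helper that awaits an LLM call.
    await asyncio.sleep(0)
    return {"messages": state["messages"] + ["reply"]}


async def caller():
    # Without await, the call only creates a coroutine object.
    pending = assistant({"messages": ["hi"]})
    is_coroutine = inspect.iscoroutine(pending)

    # Awaiting it runs the body and produces the actual result.
    result = await pending
    return is_coroutine, result


is_coroutine, result = asyncio.run(caller())
print(is_coroutine, result)
```

Frameworks that accept both sync and async callables may handle this automatically; whether that applies to the workflow in this PR is not stated in the review.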

…iate-steps2

Signed-off-by: Michael Demoret <mdemoret@nvidia.com>
@mdemoret-nv force-pushed the mdd_fix-intermediate-steps2 branch from df92c22 to bfa217f on May 16, 2025 01:16
Signed-off-by: Michael Demoret <mdemoret@nvidia.com>
@AnuradhaKaruppiah requested a review from Copilot on May 16, 2025 01:57
@AnuradhaKaruppiah
Contributor

/merge

rapids-bot[bot] merged commit 2109620 into NVIDIA:develop on May 16, 2025
12 checks passed
Contributor

Copilot AI left a comment

Pull Request Overview

Fixes the intermediate span-tracking logic to correctly restore context when END events come from nested coroutines or separate tasks, and updates tests to validate via a subscribed output list using the new IntermediateStepType enum.

  • Manager now records both previous and active stacks in OpenStep and restores or mutates them depending on coroutine vs task context.
  • Tests subscribe to manager outputs in output_steps and use IntermediateStepType instead of the old event_state.
  • Example workflow updated to type-annotate the LLM, switch assistant helper to async, use ainvoke, and add a track_function decorator.
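The first bullet above (record both stacks at START, then restore or mutate at END) can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code in intermediate_step_manager.py: the `SpanTracker` class, the `_active_stack` variable, and the `parent_id` field are hypothetical, and only the `OpenStep`, `prev_stack`, and `active_stack` names come from the review text.

```python
import asyncio
import contextvars
from dataclasses import dataclass

# Hypothetical stand-in for the active span ID stack held in the context state.
_active_stack = contextvars.ContextVar("active_stack")


@dataclass
class OpenStep:
    step_id: str
    parent_id: str
    prev_stack: list    # list object that was active before START
    active_stack: list  # list object made active by START


class SpanTracker:
    def __init__(self):
        self._open = {}

    def start(self, step_id):
        prev = _active_stack.get()
        active = prev + [step_id]  # build a new list; never mutate prev here
        _active_stack.set(active)
        self._open[step_id] = OpenStep(step_id, prev[-1], prev, active)

    def end(self, step_id):
        step = self._open.pop(step_id)
        # Coroutine case: END runs in the same context as START,
        # so set() is visible to the caller.
        _active_stack.set(step.prev_stack)
        # Task case: set() only changed the task's copied context, but the
        # list object from START is shared, so shrink it in place.
        while step.active_stack and step.active_stack[-1] != step.parent_id:
            step.active_stack.pop()

    def current(self):
        return _active_stack.get()[-1]


async def end_step(tracker, step_id):
    tracker.end(step_id)


async def demo():
    _active_stack.set(["root"])
    tracker = SpanTracker()

    tracker.start("a")
    during_a = tracker.current()
    await asyncio.create_task(end_step(tracker, "a"))  # END from a Task
    after_a = tracker.current()

    tracker.start("b")
    during_b = tracker.current()
    await end_step(tracker, "b")  # END from a nested coroutine
    after_b = tracker.current()
    return during_a, after_a, during_b, after_b


result = asyncio.run(demo())
print(result)
```

The design choice worth noting is that both mechanisms run unconditionally: the `set()` handles the same-context case, and the in-place mutation handles the copied-context case, so the END handler does not need to detect which situation it is in.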

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File: Description
  • tests/aiq/builder/test_intermediate_step_manager.py: Replaced context var stream with output_steps fixture and updated payload to use IntermediateStepType.
  • src/aiq/builder/intermediate_step_manager.py: Added prev_stack/active_stack to OpenStep, modified START/END handling to fully restore span stack across tasks.
  • examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py: Annotated llm as BaseChatModel, made assistant function async, switched to ainvoke, and added @track_function.
Comments suppressed due to low confidence (1)

src/aiq/builder/intermediate_step_manager.py:65

  • The manager still branches on payload.event_state, but tests now only set payload.event_type. Update these checks to use payload.event_type == IntermediateStepType.LLM_START (and similar) or ensure event_state is populated from event_type.
if (payload.event_state == IntermediateStepState.START):
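One way to satisfy the second option in the comment above (populating event_state from event_type) is a derived property. The sketch below is an assumption-laden illustration: the `Payload` class is hypothetical, the enum members other than `LLM_START` are guesses, and the suffix-based mapping is not confirmed to be how the library does it.

```python
from dataclasses import dataclass
from enum import Enum


class IntermediateStepState(Enum):
    START = "START"
    END = "END"


class IntermediateStepType(Enum):
    # Only LLM_START is named in the review; the other member is assumed.
    LLM_START = "LLM_START"
    LLM_END = "LLM_END"


@dataclass
class Payload:
    # Hypothetical, simplified stand-in for the real payload class.
    event_type: IntermediateStepType

    @property
    def event_state(self) -> IntermediateStepState:
        # Derive the state from the type name so callers set only event_type.
        name = self.event_type.name
        if name.endswith("_START"):
            return IntermediateStepState.START
        if name.endswith("_END"):
            return IntermediateStepState.END
        raise ValueError(f"No state mapping for event type {name}")


start_state = Payload(IntermediateStepType.LLM_START).event_state
end_state = Payload(IntermediateStepType.LLM_END).event_state
print(start_state, end_state)
```

With a derived property like this, a manager that branches on `payload.event_state` keeps working when tests set only `payload.event_type`.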

Comment on lines +120 to +124
    while (curr_stack[-1] != parent_step_id):
        curr_stack.pop()
        pop_count += 1

    if (pop_count != 1):

Copilot AI May 16, 2025

This loop can raise an IndexError if parent_step_id is not found. Add a guard like while curr_stack and curr_stack[-1] != parent_step_id: to avoid popping past an empty list.

Suggested change
-    while (curr_stack[-1] != parent_step_id):
-        curr_stack.pop()
-        pop_count += 1
-    if (pop_count != 1):
+    while curr_stack and curr_stack[-1] != parent_step_id:
+        curr_stack.pop()
+        pop_count += 1
+    if not curr_stack:
+        logger.error(
+            "Parent step id %s not found in the current stack. "
+            "This indicates a potential inconsistency.",
+            parent_step_id)
+    elif pop_count != 1:
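
The difference between the unguarded and guarded loops can be checked in isolation. The helpers below are hypothetical and written only to illustrate the review comment; they are not part of the manager.

```python
def pop_to_parent_unguarded(stack, parent_id):
    # Mirrors the original loop: indexes stack[-1] even when the stack is empty.
    pops = 0
    while stack[-1] != parent_id:
        stack.pop()
        pops += 1
    return pops


def pop_to_parent_guarded(stack, parent_id):
    # Mirrors the suggested loop: stops once the stack is empty.
    pops = 0
    while stack and stack[-1] != parent_id:
        stack.pop()
        pops += 1
    found = bool(stack)  # False means the parent was never on the stack
    return pops, found


# Parent present: both variants pop exactly one entry.
normal = pop_to_parent_guarded(["root", "a"], "root")

# Parent missing: the guarded variant empties the stack and reports it.
missing = pop_to_parent_guarded(["x", "y"], "root")

# Parent missing: the unguarded variant raises IndexError on the empty list.
try:
    pop_to_parent_unguarded(["x", "y"], "root")
    raised = False
except IndexError:
    raised = True

print(normal, missing, raised)
```

Note that the guarded variant still destroys the stack contents when the parent is missing; it only replaces the exception with a condition the caller can log.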


 @pytest.fixture(name="mgr")
-def mgr_fixture(ctx_state: AIQContextState):
+def mgr_fixture(ctx_state: AIQContextState, output_steps: list[IntermediateStepPayload]):

Copilot AI May 16, 2025

[nitpick] The output_steps fixture actually collects IntermediateStep objects (not IntermediateStepPayload), so the annotation can be misleading. Consider changing it to list[IntermediateStep] for clarity.

Suggested change
-def mgr_fixture(ctx_state: AIQContextState, output_steps: list[IntermediateStepPayload]):
+def mgr_fixture(ctx_state: AIQContextState, output_steps: list[IntermediateStep]):

ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
…utines and tasks (NVIDIA#285)

This PR updates the logic to handle the case where the END event is emitted from a Task. That case broke the previous logic because a task runs in a copy of the context, so calling `contextvar.set()` inside the task has no effect on the context that emitted the START event.

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#285
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025