fix(max): Handle async query and recursion limit errors by Twixes · Pull Request #30233 · PostHog/posthog

Twixes · 2025-03-20T16:03:07Z

Problem

Async query errors were handled correctly in tests, but not in prod. I've unified the logic so that it now applies the same way regardless of whether the query executes sync or async (prod is async, tests are not).
We also didn't have a human-friendly error on the recursion limit being hit.
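To illustrate the second problem, here is a minimal sketch of turning a recursion limit crash into a readable message. `RecursionLimitError`, `stream_with_friendly_errors`, and the message wording are hypothetical stand-ins for illustration, not the code from this PR; the real assistant presumably catches its graph framework's own exception.

```python
from collections.abc import Callable, Iterator


class RecursionLimitError(Exception):
    """Hypothetical stand-in for the graph framework's recursion limit exception."""


# Assumed wording; the actual message shipped in the PR may differ.
FRIENDLY_MESSAGE = "I've reached the maximum number of steps. Could you narrow down your request?"


def stream_with_friendly_errors(run_graph: Callable[[], Iterator[str]]) -> Iterator[str]:
    """Yield messages from the graph, replacing a recursion limit crash with a readable message."""
    try:
        yield from run_graph()
    except RecursionLimitError:
        yield FRIENDLY_MESSAGE


def looping_graph() -> Iterator[str]:
    # Simulates a graph that emits one message and then exceeds its step budget.
    yield "Generating insight..."
    raise RecursionLimitError("step limit reached")


messages = list(stream_with_friendly_errors(looping_graph))
```

Messages produced before the limit is hit still reach the user; only the crash itself is replaced.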

Changes

Both classes of errors are now fixed.

How did you test this code?

Recursion limit got a test. The query execution errors were actually tested already, but the difference is that tests can't use async query execution – so I've ensured those tests continue to pass, and the fix is actually focused on unifying the logic of async vs sync running.

greptile-apps

PR Summary

This PR improves error handling for async queries and recursion limits in the PostHog AI assistant, focusing on user experience and robustness.

- Added user-friendly error message when recursion limit (48 steps) is reached in assistant.py, preventing crashes
- Implemented unified error handling between sync/async query execution with exponential backoff polling in query_executor/nodes.py
- Added timeout handling for long-running queries with a 726s maximum wait time
- Added comprehensive test coverage for both sync and async error scenarios in test_assistant.py
- Improved conversation state management with proper locking and state saving during cancellation
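The polling behaviour described in this summary could look roughly like the sketch below. It is an assumption-laden illustration, not the contents of query_executor/nodes.py: `wait_for_results`, `QueryFailed`, the `complete` and `error` keys, and all delay values are hypothetical, while `error_message` and `results` are the keys that appear in the diff under review.

```python
import time
from collections.abc import Callable
from typing import Any


class QueryFailed(Exception):
    """Hypothetical error type; the diff in this PR raises APIException or a bare Exception."""


def wait_for_results(
    get_status: Callable[[], dict[str, Any]],
    *,
    initial_delay: float = 0.5,  # assumed value
    max_delay: float = 30.0,  # assumed value
    max_wait: float = 600.0,  # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a query status until it completes, doubling the delay after each attempt."""
    delay = initial_delay
    waited = 0.0  # Sum of requested sleep durations, not wall-clock time.
    while True:
        status = get_status()
        if status.get("error"):
            # Single failure path, whether the status came from a sync or an async run.
            raise QueryFailed(status.get("error_message") or "Query failed")
        if status.get("complete"):
            return status["results"]
        if waited >= max_wait:
            raise TimeoutError(f"Query still running after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


# Usage with a fake status source and a recording sleep function, so nothing actually waits.
statuses = iter([{"complete": False}, {"complete": False}, {"complete": True, "results": [[1]]}])
delays: list[float] = []
results = wait_for_results(lambda: next(statuses), sleep=delays.append)
```

Injecting `sleep` keeps the backoff schedule testable without a task queue, which matters here because the tests cannot exercise the async path directly.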

5 file(s) reviewed, 2 comment(s)

ee/hogai/query_executor/nodes.py

ee/hogai/test/test_assistant.py

skoob13 · 2025-03-20T16:46:44Z

ee/hogai/query_executor/nodes.py

+                    if error_message := query_status.get("error_message"):
+                        raise APIException(error_message)
+                    raise Exception("Query failed")
+                results_response = query_status["results"]


Why doesn't the node return the exception to the tool call? I think it makes more sense than appending a failure message to the conversation. Failure messages are not associated with a specific tool call, but tool calls are important for orchestration.

In the SQL case, we want to continue the loop to correct runtime errors, but the current approach stops the generation.

What do you mean by returning the exception to the tool call?

Instead of raising the exception, why don't we return the exception to the tool that initiated the insight generation?

Coming back to this post-offsite:
Hmm, it seems much more versatile to let root handle these errors, as they can be arbitrary – while some are going to be basic and fixable type mismatches, especially in SQL, other HogQL or ClickHouse errors are not that solvable and would still have to bubble up to root after a couple retries. Hence going for a general approach here.

I think it's still worth providing HogQL/ClickHouse errors. HogQL errors are usually about syntax (not very descriptive, though) but have information about unsupported functions (since HogQL is only a subset of ClickHouse SQL). ClickHouse errors are trickier as they can be generic (memory or timeout), but they're sometimes helpful (type mismatch). Let's ship this PR as is, and we can revisit better exception handling later.
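The two positions in this thread can be combined, at least in principle. The sketch below returns the error to the originating tool call while retries remain and lets it bubble up afterwards. Every name here (`ToolResult`, `run_query_for_tool`, `attempt`, `max_attempts`) and the error text are hypothetical; none of this is part of the PR, which ships the raise-and-let-root-handle approach.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Hypothetical message tied to the tool call that requested the insight."""

    tool_call_id: str
    content: str
    is_error: bool = False


def run_query_for_tool(
    tool_call_id: str,
    execute: Callable[[], str],
    attempt: int,
    max_attempts: int = 3,  # assumed retry budget
) -> ToolResult:
    """Return query errors to the calling tool while retries remain, then let them bubble up."""
    try:
        return ToolResult(tool_call_id, execute())
    except Exception as err:
        if attempt >= max_attempts:
            # Out of retries: re-raise so a general handler (root) deals with it.
            raise
        return ToolResult(tool_call_id, f"Query failed: {err}", is_error=True)


def failing_query() -> str:
    # Illustrative error text, not a real HogQL or ClickHouse message.
    raise ValueError("type mismatch in column 'price'")


first = run_query_for_tool("call_1", failing_query, attempt=1)
```

With this shape, a fixable error such as a type mismatch stays attached to its tool call so the generator can correct the query, while generic failures like memory or timeout errors still surface after the retry budget is spent.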

Twixes requested a review from skoob13 March 20, 2025 16:03

greptile-apps bot reviewed Mar 20, 2025

ee/hogai/query_executor/nodes.py Outdated

ee/hogai/test/test_assistant.py Outdated

Twixes mentioned this pull request Mar 20, 2025

feat(max): Add SQL generation skill #29544

Merged

skoob13 reviewed Mar 20, 2025

Twixes requested a review from skoob13 March 31, 2025 21:48

Twixes added 3 commits April 1, 2025 00:03

Friendly handling of recursion error

17e8893

Fix handling of errors in async query responses

9e9feff

This is a PAIN to test because Celery doesn't work in our tests, hence tests behave differently from prod.

Address Greptile comments

d1ab285

Twixes force-pushed the max-improve-err-handling branch from 2f1f9ef to d1ab285 Compare March 31, 2025 22:04

skoob13 approved these changes Apr 1, 2025

Format root/nodes.py

6d6ffc8

Twixes enabled auto-merge (squash) April 1, 2025 17:45

Twixes merged commit 1de2c52 into master Apr 1, 2025
94 of 95 checks passed

Twixes deleted the max-improve-err-handling branch April 1, 2025 18:05
