Add exception to broker response when not all segments are available (partial response) #7264
Comments
There is an indication that the response is partial. Why add an exception?
We don't really have that for cases 1 and 2. For case 3, we should not expect normal users to check
If the partial response flag is not set in all cases, then that is a bug. Let us definitely fix it. I think applications check for exceptions and assume that there is no result. So, adding exceptions when there are partial results may not be the right answer.
We don't have that flag in OSS. IIRC the logic for that flag is check if We don't model exceptions this way. Having exceptions does not indicate no result. It actually indicates that the results are not complete (e.g. one segment throws an exception while the others work fine). When the broker fails to route the query to a server, it also adds an exception.
+1. I also suggest we disallow partial responses by default and provide a flag to turn them on.
I think the fundamental concern here is that partial results might be returned without the users taking cognisance of them. While having a flag like So this entails three changes:
@mcvsubbu @Jackie-Jiang do we have an agreement on this requirement? If yes, I can take it up.
Partial responses and approximate results would be problematic in OLTP databases, but they make full sense in an analytical database such as Pinot and should be fully supported as first-class citizens without raising errors or exceptions. As an analytical database, there is value in providing partial responses or approximate results when full responses or accurate results are either missing or would take too long to generate, provided that we can quantify how partial or approximate the results are. We already do a lot of this with approximation functions such as DISTINCTCOUNTHLL and high-cardinality group-by estimates, so there shouldn't be anything wrong with providing partial responses as long as we can quantify how partial they are. These should be treated differently from exceptions, errors, and warnings.
@amrishlal Partial and approximate results are different: approximation usually gives a bounded error rate. The cases where we want to notify the user here are when some segments are not available (e.g. all replicas are down) or when some servers fail to respond (e.g. timeout). We will still return the response, but put a message in the query response so that the user knows the result is not complete.
I can easily see a case where a user specifies a timeout for a long-running query as a way to get partial results early, without waiting for the one or two servers that may take extra long to process the results. In cases like these, one could set a warning in the query results and try to quantify how complete the results are (e.g. the result is based on responses from 9 out of 11 segments).
I think it's worth distinguishing between
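The idea of quantifying how partial a result is (for example, "9 out of 11 segments") could be expressed as a small helper. This is only a sketch: `ResponseCompleteness` and `describe` are hypothetical names invented for illustration and are not part of Pinot.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not a Pinot class.
final class ResponseCompleteness {
  private ResponseCompleteness() {}

  /** Builds a warning message stating how many of the expected segments contributed to the result. */
  static String describe(int processed, int expected) {
    if (expected <= 0 || processed < 0 || processed > expected) {
      throw new IllegalArgumentException("requires expected > 0 and 0 <= processed <= expected");
    }
    double percent = 100.0 * processed / expected;
    // Locale.ROOT keeps the decimal separator stable across environments.
    return String.format(Locale.ROOT, "result is based on %d of %d segments (%.1f%%)",
        processed, expected, percent);
  }
}
```

A message like this could accompany a partial result as a warning, leaving the choice of whether to trust the result to the caller.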
Purely from the user's perspective, this certainly looks like a
I think the key here is to inform users and give them the option to choose the desired behavior. In your example, users can choose DISTINCTCOUNT or DISTINCTCOUNTHLL for the exact result or the approximation, respectively. Similarly, users can also choose not to accept partial results. Even for analytical databases, there are use cases that demand exact results and cannot accept partial ones. To give one example, there are use cases at Uber where Pinot results are used in billing processing and accounting.
@Jackie-Jiang shall we start with adding a partial response flag here?
I'd suggest adding different errors for each scenario described above to the
Not all partial responses are useless or harmful. There is another scenario that falls under partial response, which is that
@Jackie-Jiang Yes, in addition to the specific error codes that you're suggesting to add, do you think an explicit
We need to think about how to set this
Can we add it as a query option (allowPartial)?
@yupeng9 Pinot currently always returns the available results and sets exceptions for unexpected scenarios, such as being unable to reach a server. For now I would suggest keeping that behavior; the client can decide whether to retry based on the exceptions received.
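A client-side policy along these lines might look like the sketch below. It assumes the client has already extracted the exception messages from the response into a list; `PartialResultPolicy`, `Action`, and the `allowPartial` and `retriesLeft` parameters are hypothetical names for illustration, not an existing Pinot client API or query option.

```java
import java.util.List;

// Hypothetical client-side policy for illustration only; not a Pinot class.
final class PartialResultPolicy {
  enum Action { ACCEPT, ACCEPT_PARTIAL, RETRY, FAIL }

  private PartialResultPolicy() {}

  /**
   * Decides what to do with a response, given the exceptions it carried.
   * An empty list is treated as a complete result.
   */
  static Action decide(List<String> exceptions, boolean allowPartial, int retriesLeft) {
    if (exceptions.isEmpty()) {
      return Action.ACCEPT;
    }
    if (allowPartial) {
      // The caller opted in to incomplete results (e.g. a dashboard).
      return Action.ACCEPT_PARTIAL;
    }
    // The caller needs exact results (e.g. billing): retry, then give up.
    return retriesLeft > 0 ? Action.RETRY : Action.FAIL;
  }
}
```

Under this sketch, a billing job would call `decide(exceptions, false, n)` and never consume a partial result, while an exploratory dashboard could pass `true` and surface the exceptions as warnings.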
I have taken up the exceptions part @Jackie-Jiang. Let us think about the
I have been a bit busy over the last week; I will raise a PR for this soon.
There are 3 scenarios when the broker response might be partial:
1. numUnavailableSegments in BaseBrokerRequestHandler, line 718
2. numSegmentsQueried > numSegmentsAcquired in ServerQueryExecutorV1Impl, line 170
3. numServersQueried > numServersResponded in SingleConnectionBrokerRequestHandler, line 128
Currently these partial responses are only tracked by metrics/query stats, but not modeled as an exception. We should add an exception to the broker response to inform the users that the response might be partial.
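As a rough illustration of the proposal, the three conditions could be turned into messages that are then attached to the broker response as exceptions. This is a standalone sketch under stated assumptions: `PartialResponseCheck`, its method, and the message wording are hypothetical and do not reflect Pinot's actual classes or error codes. Only the counter names come from the scenarios listed above, and the first scenario is assumed to mean `numUnavailableSegments > 0`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration only; not a Pinot class.
final class PartialResponseCheck {
  private PartialResponseCheck() {}

  /** Returns one message per scenario in which the response might be partial. */
  static List<String> findPartialResponseReasons(int numUnavailableSegments,
      long numSegmentsQueried, long numSegmentsAcquired,
      int numServersQueried, int numServersResponded) {
    List<String> reasons = new ArrayList<>();
    // Scenario 1 (assumed condition): the broker found no available server for some segments.
    if (numUnavailableSegments > 0) {
      reasons.add(numUnavailableSegments + " segment(s) unavailable");
    }
    // Scenario 2: a server could not acquire every segment it was asked to query.
    if (numSegmentsQueried > numSegmentsAcquired) {
      reasons.add((numSegmentsQueried - numSegmentsAcquired) + " segment(s) not acquired");
    }
    // Scenario 3: some queried servers did not respond (e.g. timeout).
    if (numServersQueried > numServersResponded) {
      reasons.add((numServersQueried - numServersResponded) + " server(s) did not respond");
    }
    return reasons;
  }
}
```

An empty list means none of the three conditions fired; each non-empty entry would correspond to one exception added to the response, so that clients relying only on the exception list can still detect a partial result.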