Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround fix to handle 416 #1071

Open
shanujshekhar opened this issue Feb 6, 2024 · 8 comments
Open

Workaround fix to handle 416 #1071

shanujshekhar opened this issue Feb 6, 2024 · 8 comments
Assignees
Labels

Comments

@shanujshekhar
Copy link

shanujshekhar commented Feb 6, 2024

Environment Used:

  • Media3: 1.1.1 / ExoPlayer: 2.19.1

Questions:

  • The HttpDataSources handle 416 response code in this manner.
    • Does ExoPlayer expect clients to request for byte ranges out of bound implicitly?
    • If so, why does ExoPlayer expect that in first place?
      • It is reasonable to depend on header fields as based on HTTP semantics for 416, it SHOULD not MUST return ContentRange in header, however some endpoints do not respect the semantics.
    • It feels like depending on header fields for 416 is not an ideal way to handle and we are not fixing the root cause as to why even we encounter this state in the first place. Any insights on this would be greatly appreciated.

Background:

  • Our app is getting 416 from CloudFront and CloudFront is not returning ContentRange in header.
    • Due to this, ExoPlayer code to handle 416 is not working and the end user is stuck in 416 state
    • Note: We are working with CloudFront to respect HTTP 416 semantics but would like to understand why ExoPlayer handles 416 the way it is currently
  • We had created an issue (ExoPlayer is throwing 416 error #1032) earlier:
    • Details about our original bug: When the 416 issue happens, we are seeing Request/Headers/Range=bytes=108421975-
      which doesn't contain length, and the position here 108421975 is one byte extra than it's actual total length. We had multiple records with 416 and they all have this same pattern.
    • TL;DR: Based on this comment, it was decided that it was a caching problem and server is misbehaving.
    • We agree with server misbehaving but don't understand what could lead to caching problem since we use CacheWriter and DefaultHttpDataSource provided by ExoPlayer
    • It could be our internal caching wrapper over SimpleCache problem but I see other clients also face a similar caching issue

Important Points to note:

  • Based on multiple issues above, we believe there could (/might) be some caching problem in ExoPlayer and the workaround to handle 416 on ExoPlayer is hiding the actual root cause

Any help on this would be greatly appreciated since our bug is the top download error and users have a really bad experience of being stuck in 416 state

@icbaker
Copy link
Collaborator

icbaker commented Feb 8, 2024

  • Does ExoPlayer expect clients to request for byte ranges out of bound implicitly?

I think this question is not framed the same way I think about it. My mental model:

  • The media3 DataSource.open method is documented as follows:

    The following edge case behaviors apply:

    • If the requested position is within the resource, but the requested length extends beyond the end of the resource, then open will succeed and data from the requested position to the end of the resource will be made available through read.
    • If the requested position is equal to the length of the resource, then open will succeed, and read will immediately return RESULT_END_OF_INPUT.
    • If the requested position is greater than the length of the resource, then open will throw an IOException for which isCausedByPositionOutOfRange will be true.
  • Therefore all implementations of DataSource need to follow these requirements.

In #1032 (comment) you said:

Request/Headers/Range=bytes=108421975-
which doesn't contain length, and the position here 108421975 is one byte extra than it's actual total length. We had multiple records with 416 and they all have this same pattern.

I'm not 100% sure what you mean the 'actual total length' of your content is in this example, whether it's 108421975 or 108421974. If the content length is 108421975 then the second bullet applies (i.e. we're requesting data that starts just after the end of the content). If it's 108421974 then the third bullet applies.


  • It is reasonable to depend on header fields as based on HTTP semantics for 416, it SHOULD not MUST return ContentRange in header, however some endpoints do not respect the semantics.

Looking at the code it's not obvious to me that we do assume the ContentRange header is included. We have cases for both where it's empty/absent and where it's incomplete (second matcher group missing) - in both cases we return C.LENGTH_UNSET from this method:

public static long getDocumentSize(@Nullable String contentRangeHeader) {
if (TextUtils.isEmpty(contentRangeHeader)) {
return C.LENGTH_UNSET;
}
Matcher matcher = CONTENT_RANGE_WITH_SIZE.matcher(contentRangeHeader);
return matcher.matches() ? Long.parseLong(checkNotNull(matcher.group(1))) : C.LENGTH_UNSET;
}


The conclusion in #1032 seems to be that the manifest of this media is malformed, which is what led to these out-of-bounds requests? If that's the case, there's little ExoPlayer can do - the media has resulted in us requesting data that the server can't satisfy.


I'm not really sure what different behaviour you want ExoPlayer to do here - if the ContentRange header on the 416 response is missing the length we can't check if the requested position is equal to the length (case 2 in the docs above), and so we have to throw. Note that you call this code a '416 workaround' but it's specifically for the case when the requested position is equal to the content length. If your content length above is 108421974 then even if you change the server behaviour to include the length in the ContentRange header, we will still throw (because the third bullet above applies).

@shanujshekhar
Copy link
Author

Hey @icbaker, thank you so much for looking into it and really appreciate the detailed response.

To answer your questions:

I'm not 100% sure what you mean the 'actual total length' of your content is in this example, whether it's 108421975 or 108421974. If the content length is 108421975 then the second bullet applies (i.e. we're requesting data that starts just after the end of the content). If it's 108421974 then the third bullet applies.

  • 108421975 is the content length in which case the second bullet will apply as per you mentioned which will return RESULT_END_OF_INPUT
  • We know from the endpoints that consumes ExoPlayer requests that it receives such requests Request/Headers/Range=bytes=108421975-, what we don't know and would really need help is to get your insights on when this could happen on ExoPlayer front?
    • If I can rephrase my question in a different way - How did ExoPlayer thought of handling 416 by specifically adding a check when the requested position is equal to the content length?
    • We would like to backtrack from the symptom (checking the requested position) to get to the root cause of unbounded (out-of-bound) requests and it would really help debug our issue.
      • We think that maybe going backwards from the solution to handle 416, we can find the problem that causes this 416

@shanujshekhar
Copy link
Author

The conclusion in #1032 seems to be that the manifest of this media is malformed, which is what led to these out-of-bounds requests? If that's the case, there's little ExoPlayer can do - the media has resulted in us requesting data that the server can't satisfy.

  • The interesting observation from this bug is that it doesn't happen for all the users, it only happens for few of them that alone impacts the download errors and makes us believe that the user experience is bad
  • If there was a manifest problem, then it would have been a much larger issue (but it only happens for few users) and easily reproducible (but we weren't able to reproduce with the same manifest an end-user that sees 416 gets in any of the titles - not even a single one of them)
  • However, we haven't ruled this one out and we are still looking into manifest generation code but given the evidence, we think it is less likely the manifest problem

@rnapier
Copy link

rnapier commented Feb 12, 2024

A major question is what kind of situation cause exoplayer to make unbounded requests (i.e.bytes xxx-)?

It's clear that it's "when exoplayer does not know the length," but what causes that to happen? Does exoplayer rely on the mpd to determine the length, the sidx atom, or the Content-Range header? If the Content-Range header, is this simply the first one to be returned, or does it continue to update this with each 206 response? (If most were correct, but one were missing or broken, would this confuse it?)

Would an MPD corruption do this? What kind of corruption would cause this but not break playback?

I am certain that in some cases, cloudfront is "hanging-up" on exoplayer without sending any response, leading eventually to a socket timeout. This is a situation we're working on, but would it cause this kind of problem? (My attempts to do this intentionally have not reproduced the problem.)

We believe this occurs most with users who have a bad network connection. It is possible that segment downloads are interrupted. Would this cause exoplayer to lose track of the length? (How does it keep track of this between launches?)

I can reproduce our 416 errors by intentionally replacing the Content-Range length with *, but I haven't found any "normal" way to get into this situation.

@icbaker
Copy link
Collaborator

icbaker commented Feb 12, 2024

How did ExoPlayer thought of handling 416 by specifically adding a check when the requested position is equal to the content length?

This is basically what I answered above: Because the DataSource interface requires that behaviour. This is due to adapting between different interpretations of an 'out of bound' request. The DataSource interface states that it's OK to request content that starts just after the last byte of the actual content, and the result is no bytes are read. The HTTP protocol thinks this is an out of bounds read (same as reading bytes that start e.g. 10 bytes after the end of the actual content) - i..e an error. Therefore a DataSource implementation that maps between the DataSource interface and the HTTP protocol has to distinguish these cases in order to map some HTTP 'out of bounds' errors back to 'read no bytes' - hence the code you've highlighted.


A major question is what kind of situation cause exoplayer to make unbounded requests (i.e. bytes xxx-)?

This can also be phrased as "when does ExoPlayer pass a DataSpec with length == LENGTH_UNSET to DataSource.open?". And this pushes the discussion out of DataSource implementations, up to the layer(s) of the player above.

Does exoplayer rely on the mpd to determine the length, the sidx atom, or the Content-Range header?

I'm afraid I'm not familiar enough with the DASH or caching parts of ExoPlayer to answer this question confidently - possibly @marcbaechinger can help more here.

@icbaker icbaker assigned marcbaechinger and unassigned icbaker Feb 12, 2024
@rnapier
Copy link

rnapier commented Feb 12, 2024

A major question is what kind of situation cause exoplayer to make unbounded requests (i.e. bytes xxx-)?

This can also be phrased as "when does ExoPlayer pass a DataSpec with length == LENGTH_UNSET to DataSource.open?". And this pushes the discussion out of DataSource implementations, up to the layer(s) of the player above.

I agree. This is an equivalent question. My expectation is that this happens somewhere between exoplayer itself and its cache, not inside of DataSource.

@shanujshekhar
Copy link
Author

A major question is what kind of situation cause exoplayer to make unbounded requests (i.e. bytes xxx-)?

This can also be phrased as "when does ExoPlayer pass a DataSpec with length == LENGTH_UNSET to DataSource.open?". And this pushes the discussion out of DataSource implementations, up to the layer(s) of the player above.

@icbaker thanks for looking into it. When you say layers up to the player above, do you mean CacheWriter? Do you think length == LENGTH_UNSET would be possible for CacheWriter?

@shanujshekhar
Copy link
Author

A major question is what kind of situation cause exoplayer to make unbounded requests (i.e. bytes xxx-)?

This can also be phrased as "when does ExoPlayer pass a DataSpec with length == LENGTH_UNSET to DataSource.open?". And this pushes the discussion out of DataSource implementations, up to the layer(s) of the player above.

Does exoplayer rely on the mpd to determine the length, the sidx atom, or the Content-Range header?

I'm afraid I'm not familiar enough with the DASH or caching parts of ExoPlayer to answer this question confidently - possibly @marcbaechinger can help more here.

Pushing up for awareness, @marcbaechinger would you be able to help us here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants