[BUG] Potential Chrome CORS problem with local output #9773

Closed
bryevdv opened this issue Mar 12, 2020 · 29 comments · Fixed by #9777

Comments

@bryevdv
Member

bryevdv commented Mar 12, 2020

cc @jakirkham @canavandl @martey

Under certain circumstances, loading local HTML Bokeh output can result in the error:

Access to script at ‘https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js’ from origin ‘null’ has been blocked by CORS policy: No ‘Access-Control-Allow-Origin’ header is present on the requested resource.

The circumstance is this: BokehJS was first loaded from CDN without SRI/crossorigin attributes on the script tag, and was later loaded with them present. Evidently Chrome reuses the cached response headers and decides there is a mismatch. Steps to repro:

  • clear cached images and files in chrome
  • open a private window
  • navigate to iris example in docs gallery
  • run iris.py and open local iris.html file in a browser
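A toy model of the suspected mechanism, as a rough sketch only. It assumes a cache keyed by URL alone and a CDN that adds Access-Control-Allow-Origin only when an Origin header is sent. `toy_cdn_fetch` and `ToyBrowserCache` are hypothetical names, and nothing here is Chrome's actual implementation.

```python
from typing import Dict, Optional

ACAO = "Access-Control-Allow-Origin"


def toy_cdn_fetch(origin: Optional[str]) -> Dict[str, str]:
    """Hypothetical CDN: adds the CORS header only when an Origin is sent."""
    headers = {"Content-Type": "application/javascript"}
    if origin is not None:
        headers[ACAO] = "*"
    return headers


class ToyBrowserCache:
    """Hypothetical cache keyed by URL alone (an assumption, not Chrome's code)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, str]] = {}

    def load_script(self, url: str, crossorigin: bool) -> bool:
        """Return True if the script load would succeed in this toy model."""
        if url not in self.entries:
            # Bare tags send no Origin; crossorigin tags do ("null" for file://).
            self.entries[url] = toy_cdn_fetch("null" if crossorigin else None)
        cached = self.entries[url]
        # A CORS-mode load needs the header, even when served from cache.
        return (ACAO in cached) if crossorigin else True


URL = "https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js"

# Sequence from the repro steps: bare tag first, SRI/crossorigin tag second.
stale = ToyBrowserCache()
bare_first = stale.load_script(URL, crossorigin=False)  # e.g. docs gallery page
sri_second = stale.load_script(URL, crossorigin=True)   # e.g. local iris.html

# Same URL, but every load uses a crossorigin tag.
fresh = ToyBrowserCache()
cors_first = fresh.load_script(URL, crossorigin=True)
cors_second = fresh.load_script(URL, crossorigin=True)

print(bare_first, sri_second, cors_first, cors_second)
```

In this model only the second load of the first sequence fails, which matches the observation that a force-reload (discarding the cached entry) makes the page work again.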

When can this happen? For one, CDN loads in autoload_static were not fixed up in the last release, and still use "bare" script tags. So I think anyone who views our gallery first, then output_file output, will run into this.

  • A force-reload will fix and allow things to work (but a bad UX)
  • Have not yet tested to see if other browsers are affected (but I don't think so)

Certainly, we should fix up autoload_static ASAP so that gallery viewing does not instigate this for users. Should be easy and can go in a 2.0.1 next week.

But user generated output could also instigate this for viewers if the users use "bare" script tags in their Flask app or whatever. We don't really have control over that.

Is there another fix or mitigation we can apply to make sure it always works, regardless? Would adding a crossorigin="anonymous" attribute even to script tags without SRI hashes fix this? (I will try to test this out later.)

Thoughts welcome.

Some references:

@ghost

ghost commented Mar 12, 2020

Results of testing across browsers:

  • Error prevents loading in Chrome (used 80.0.3987.132) and Brave (used 1.4.96)
  • No error present in Firefox (73.0.1) and Safari (used 13.0.5)

@bryevdv
Member Author

bryevdv commented Mar 12, 2020

@hyles-lineata also reported on zulip that in the steps above, viewing with crossorigin="anonymous" (but no integrity hash) then viewing again with the hash does work. So I think my proposal is this:

  • update autoload_static to apply crossorigin/SRI (should be done anyway)
  • update all docs that demo manual script tags (e.g. to put in a Flask template) to include a crossorigin attr, as well as a prominent note about it
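As an illustration of the second bullet, here is a minimal sketch of a helper that a Flask app (or similar) could use to emit CDN script tags that always carry crossorigin="anonymous", with the integrity attribute optional. `cdn_script_tag` is a hypothetical name and is not part of Bokeh's API.

```python
from html import escape
from typing import Optional


def cdn_script_tag(url: str, integrity: Optional[str] = None) -> str:
    """Build a <script> tag that always carries crossorigin="anonymous".

    `integrity` is an optional SRI value such as "sha384-..."; when it is
    omitted the tag still requests the script in CORS mode.
    """
    attrs = [
        'type="text/javascript"',
        f'src="{escape(url, quote=True)}"',
        'crossorigin="anonymous"',
    ]
    if integrity is not None:
        attrs.append(f'integrity="{escape(integrity, quote=True)}"')
    return f"<script {' '.join(attrs)}></script>"


tag = cdn_script_tag("https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js")
print(tag)
```

The idea is that every load of a given CDN URL then uses the same request mode, whether or not a hash is attached.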

Thoughts?

@bryevdv
Member Author

bryevdv commented Mar 17, 2020

FYI have re-published the 2.0.0 docs manually using the current master in order to mitigate this issue until a new release can be made (hopefully next week at latest)

gpenzias pushed a commit to gpenzias/flask-framework that referenced this issue Mar 30, 2020
… generated by matplotlib, since there seems to be a bug with bokeh: bokeh/bokeh#9773.
@hmanuel1

hmanuel1 commented Apr 10, 2020

Hi,
After upgrading to Bokeh 2.0.1, I'm having a similar issue with Chrome:

Access to script at 'https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.1.min.js' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

This error started showing up as soon as I ran the slider example from the Bokeh guide:

import numpy as np

from bokeh.layouts import column, row
from bokeh.models import CustomJS, Slider
from bokeh.plotting import ColumnDataSource, figure, output_file, show

x = np.linspace(0, 10, 500)
y = np.sin(x)

source = ColumnDataSource(data=dict(x=x, y=y))

plot = figure(y_range=(-10, 10), plot_width=400, plot_height=400)

plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

amp_slider = Slider(start=0.1, end=10, value=1, step=.1, title="Amplitude")
freq_slider = Slider(start=0.1, end=10, value=1, step=.1, title="Frequency")
phase_slider = Slider(start=0, end=6.4, value=0, step=.1, title="Phase")
offset_slider = Slider(start=-5, end=5, value=0, step=.1, title="Offset")

callback = CustomJS(args=dict(source=source, amp=amp_slider, freq=freq_slider, phase=phase_slider, offset=offset_slider),
                    code="""
    const data = source.data;
    const A = amp.value;
    const k = freq.value;
    const phi = phase.value;
    const B = offset.value;
    const x = data['x']
    const y = data['y']
    for (var i = 0; i < x.length; i++) {
        y[i] = B + A*Math.sin(k*x[i]+phi);
    }
    source.change.emit();
""")

amp_slider.js_on_change('value', callback)
freq_slider.js_on_change('value', callback)
phase_slider.js_on_change('value', callback)
offset_slider.js_on_change('value', callback)

layout = row(
    plot,
    column(amp_slider, freq_slider, phase_slider, offset_slider),
)

output_file("slider.html", title="slider.py example")

show(layout)

If I remove the sliders, I don't see this issue in Chrome:

import numpy as np

from bokeh.layouts import column, row
from bokeh.models import CustomJS, Slider
from bokeh.plotting import ColumnDataSource, figure, output_file, show

x = np.linspace(0, 10, 500)
y = np.sin(x)

source = ColumnDataSource(data=dict(x=x, y=y))

plot = figure(y_range=(-10, 10), plot_width=400, plot_height=400)

plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

layout = row(plot)

output_file("no_sliders.html", title="no_slider.py example")

show(layout)

I've been using Bokeh for a couple of weeks now. Thank you for the great work!

@bryevdv
Member Author

bryevdv commented Apr 11, 2020

@hmanuel1 This means at some point your browser has first loaded BokehJS resources from CDN without the SRI hashes specified, then later tried to load them with the hashes specified. I don't know where or how this happened for you. AFAIK all the docs have now been updated to always specify the hashes, but it's possible there might be something in the docs (or an example) that was missed and that generated output without the hashes that you then viewed. There's nothing for anyone to do except for you to force-reload the page (once). [1] Otherwise, if you ever happen to notice a page that loads BokehJS >= 2.0.1 without specifying the integrity attribute on the script tag, please point it out to us so we can change it.

[1] Many people think Chrome's SRI policy wrt cached scripts in this instance is bad/incorrect. Perhaps Chrome will change some day.
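For anyone hunting for such a page, here is a rough sketch that scans saved HTML for script tags pointing at cdn.bokeh.org that lack an integrity attribute. `BareCdnScriptFinder` and `find_bare_cdn_scripts` are hypothetical names. It only sees static tags in the markup, so scripts injected at runtime by JavaScript would be missed.

```python
from html.parser import HTMLParser
from typing import List


class BareCdnScriptFinder(HTMLParser):
    """Collect src URLs of CDN <script> tags that have no integrity attribute."""

    def __init__(self, cdn_host: str = "cdn.bokeh.org") -> None:
        super().__init__()
        self.cdn_host = cdn_host
        self.bare_sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        src = attr_map.get("src") or ""
        if self.cdn_host in src and "integrity" not in attr_map:
            self.bare_sources.append(src)


def find_bare_cdn_scripts(html_text: str) -> List[str]:
    """Return the CDN script URLs in html_text that are loaded without SRI."""
    finder = BareCdnScriptFinder()
    finder.feed(html_text)
    finder.close()
    return finder.bare_sources


sample = (
    '<script src="https://cdn.bokeh.org/bokeh/release/bokeh-2.0.1.min.js"></script>'
    '<script src="https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.1.min.js"'
    ' integrity="sha384-placeholder" crossorigin="anonymous"></script>'
)
bare = find_bare_cdn_scripts(sample)
print(bare)
```

The `integrity` value in `sample` is a placeholder, not a real hash.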

@hmanuel1

hmanuel1 commented Apr 11, 2020

That was it. I cleared Chrome cached browsing data and it's working now. Thank you!

@dhgoratela

dhgoratela commented Apr 30, 2020

That was it. I cleared Chrome cached browsing data and it's working now. Thank you!

Same for me..
Phew...!

@bryevdv
Member Author

bryevdv commented Apr 30, 2020

AFAIK there should not be anything on the docs site that is rendering without SRI hashes any longer, so it should not be possible to trigger this circumstance by first perusing the docs, then using Bokeh locally (which was an issue for 2.0). But if this does crop up and you know how/where you got "exposed" to non-SRI Bokeh script loads, please let us know here so we can see about updating things.

@J-Kolhs

J-Kolhs commented May 15, 2020

Thanks so much! I had the issue after deleting the integrity & crossorigin attributes and running a script using bokeh and cdn. Clearing cache fixes it for me. I can reproduce the bug by running the script. As OP explained, the bug will occur when you try to plot a normal bokeh graph without using cdn.

Best fix for me has been to add crossorigin attribute:
crossorigin="anonymous"

@Jonamaita

Jonamaita commented Jun 1, 2020

That was it. I cleared Chrome cached browsing data and it's working now. Thank you!

Yes, that worked for me.
Perfect!!

@bryevdv
Member Author

bryevdv commented Jun 16, 2020

This still seems to be happening sometimes with 2.1 even though AFAIK there is nothing in the docs published w/o hashes. I'm not sure if there is some other easy route for people to see URLs w/o the hashes (that I don't know about) that triggers this, or if there is some entirely different mechanism that triggers it (that I don't know about). I am also not really sure what to do about it, but the current situation of relying on random users to know to reset their cache is not tenable. Some possible options off the top of my head:

  • remove SRI hash support altogether
  • use different URLs when retrieving scripts with the hashes, e.g. cdn.bokeh.org/release/sri/bokeh.min.js, so that any access without the hashes does not cause this problem
  • set CDN cache headers to no-store (not sure it would work, and there may also be financial implications)

Really need some help with ideas, suggestions here cc @jakirkham @martey @p-himik

@martey

martey commented Jun 17, 2020

I noticed that if a Bokeh JS file is requested with a non-HTTPS Origin header, no Access-Control-Allow-Origin header is returned on the response.

Doesn't include access control headers:

curl -IL https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js -H "origin: http://docs.bokeh.org"

Does include access control headers:

curl -IL https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js -H "origin: https://docs.bokeh.org"

Is it possible that the S3 bucket's CORS rules are configured to only return access control headers for HTTPS origins? Could you post the current CORS configuration?
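The two curl checks above could be scripted roughly as follows, using only the standard library. `build_probe`, `access_control_headers`, and `probe` are hypothetical helper names. `probe` needs network access and is only defined here, not called.

```python
import urllib.request
from typing import Dict, Mapping


def build_probe(url: str, origin: str) -> urllib.request.Request:
    """HEAD request carrying an explicit Origin header."""
    return urllib.request.Request(url, headers={"Origin": origin}, method="HEAD")


def access_control_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the Access-Control-* response headers, with lowercased names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("access-control-")
    }


def probe(url: str, origin: str, timeout: float = 10.0) -> Dict[str, str]:
    """Send the request (needs network access) and return the CORS headers."""
    with urllib.request.urlopen(build_probe(url, origin), timeout=timeout) as resp:
        return access_control_headers(dict(resp.headers.items()))


request = build_probe(
    "https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js",
    "http://docs.bokeh.org",
)
print(request.get_method(), request.get_header("Origin"))
```

Calling `probe(...)` once with an http:// origin and once with an https:// origin, then comparing the two dictionaries, would mirror the comparison above. An empty dictionary means no access control headers came back.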

@bryevdv
Member Author

bryevdv commented Jun 17, 2020

@martey that's a surprising and interesting observation that I was not aware of. The CORS config on the S3 bucket is here:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
</CORSRule>
</CORSConfiguration>

But the bucket is behind a CloudFront distribution which has a Redirect HTTP to HTTPS Behavior configured. Also, the following headers are whitelisted in the CDN config:

Access-Control-Request-Headers
Access-Control-Request-Method
Origin

@bryevdv
Member Author

bryevdv commented Sep 28, 2020

@martey do you have any input or suggestions based on the configuration above?

@bryevdv
Member Author

bryevdv commented Sep 28, 2020

Otherwise, I think I am tempted to look at

  • use different URLs when retrieving scripts with the hashes, e.g. cdn.bokeh.org/release/sri/bokeh.min.js, so that any access without the hashes does not cause this problem

as the best solution soon. I don't think it is great but I am just not sure what else to try at this point.

@martey

martey commented Oct 1, 2020

I just checked again and unlike in June, both HTTP and HTTPS origin headers seem to be returning Access-Control-Allow-Origin headers. Yay!

I'm no CORS guru, but it looks like the value of the access-control-allow-methods header is GET, HEAD. Some documentation I have found (including https://aws.amazon.com/premiumsupport/knowledge-center/no-access-control-allow-origin-error/) suggests that OPTIONS should be included as well so that preflight requests succeed. Maybe this will fix the remaining issues?

@philippjfr
Contributor

philippjfr commented Oct 23, 2020

Just to say that a few users have reported this issue to me with recent releases (all 2.2.x). Still trying to track down why this occurred.

@bryevdv
Member Author

bryevdv commented Oct 23, 2020

@martey I have set OPTIONS as an allowed method, and whitelisted the typical headers for S3-backed CloudFront:

Screen Shot 2020-10-23 at 11 58 02 AM

Screen Shot 2020-10-23 at 11 58 16 AM

but still not seeing it in curl results:

Access-Control-Allow-Methods: GET, HEAD

@philippjfr (and @martey) any input or ideas are welcome. I have exhausted everything I know to try on AWS config. At present the only two things I can think of are:

  • remove SRI hash support altogether
  • have separate URLs for SRI vs non-SRI access

Note that the last one is not foolproof: if people are manually using CDN URLs, nothing would prevent them from using the "wrong" one.

I suppose we could also try something like Cache-Control: no-store to prevent all local caching, but that seems highly undesirable both from a UX perspective and a costs perspective.

I think Chrome is behaving stupidly and obtusely here (see linked issue in OP) but I would not hold out any hope for a resolution from that side of things.

@bryevdv
Member Author

bryevdv commented Oct 23, 2020

re: dropping SRI support, I'd note that projects as large as ReactJS (i.e. much larger than Bokeh) have still not changed to adding SRI hashes by default: reactjs/reactjs.org#1862

I know @jakirkham was very gung-ho about adding this support, but perhaps we have jumped the gun and just need to walk this back. The current situation where pages just randomly, silently fail frequently for users, offering no recourse, is not tenable. We could continue to compute and publish SRI hash tables in the release notes for users that want to use them (and perhaps offer a non-default option to include them in script tags).
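For users who would still want to opt in, the value of an integrity attribute can be computed from the file bytes. A minimal sketch, assuming the usual SRI format of an algorithm name, a dash, and the base64-encoded digest. `sri_hash` is a hypothetical helper and not Bokeh's release tooling.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI string such as "sha384-<base64 digest>" for `content`."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Stand-in bytes; in practice this would be the contents of a release file
# such as bokeh-2.0.0.min.js read in binary mode.
example = sri_hash(b"console.log('hello');")
print(example)
```

The resulting string is what goes into integrity="..." on the script tag, alongside crossorigin="anonymous".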

Ideas welcome. cc @bokeh/dev

@p-himik
Contributor

p-himik commented Oct 23, 2020

@bryevdv May be related to your OPTIONS config screenshots above:

$ curl https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js -X OPTIONS -I
HTTP/2 400 
content-type: application/xml
date: Fri, 23 Oct 2020 21:36:53 GMT
server: AmazonS3
x-cache: Error from cloudfront
via: 1.1 32f35b6a71829a460d6fdae31f270164.cloudfront.net (CloudFront)
x-amz-cf-pop: PRG50-C1
x-amz-cf-id: e5rgcsHOHDx9d35WXVMKXd6NhQ0mHF7hpg1DQwM1zRaIu_aXhqfhjA==

Some tentative searching for x-cache: Error from cloudfront hints that there's something that's not configured correctly ("duh", I know) but I can't quickly find what exactly could be wrong since I've never dealt with CloudFront.

@bryevdv
Member Author

bryevdv commented Oct 23, 2020

@p-himik I had previously thought that perhaps OPTIONS needed to be added as an allowed method in the S3 Origin CORS config, but AWS will not let me do that:

Screen Shot 2020-10-23 at 3 11 44 PM

@p-himik
Contributor

p-himik commented Oct 23, 2020

But my request is not a CORS one.
It's just a regular OPTIONS request, so the CORS config should not affect it. At least, it seems logical.

@bryevdv
Member Author

bryevdv commented Oct 23, 2020

@p-himik I think it only supports CORS OPTIONS requests, perhaps. AWS explicitly mentions a configuration for AllowedHeader on the origin, which says which headers CORS preflight checks can ask for. I've opened it up completely (as the AWS examples do), with:

<AllowedHeader>*</AllowedHeader>

Now an actual CORS pre-flight check returns:

(dev) ❯ curl -H "Origin: http://example.com" \
  -H "Access-Control-Request-Method: GET" \
  -X OPTIONS  "https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js" -I
HTTP/1.1 200 OK
Content-Length: 0
Connection: keep-alive
Date: Fri, 23 Oct 2020 22:22:17 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, HEAD
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Server: AmazonS3
X-Cache: Miss from cloudfront
Via: 1.1 1002c05e647d0804e83147cdd205d14a.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SEA19-C1
X-Amz-Cf-Id: v0e_7LDJCn7rCpOuMpNoIRnzafl1ltf7MIfa6-iCzuGkYfqOBkPIlw==

which I think is correct? This is a bit out of my wheelhouse though.
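One way to sanity-check a preflight response like the one above is to evaluate its headers against the origin and method that were requested. This is a simplified sketch, not the full CORS algorithm (it ignores credentials, request headers, and safelisted methods). `preflight_allows` is a hypothetical helper name.

```python
from typing import Mapping


def preflight_allows(headers: Mapping[str, str], origin: str, method: str) -> bool:
    """Simplified check: does this preflight response permit origin + method?"""
    lowered = {name.lower(): value for name, value in headers.items()}
    allowed_origin = lowered.get("access-control-allow-origin")
    if allowed_origin not in ("*", origin):
        return False
    allowed_methods = {
        item.strip().upper()
        for item in lowered.get("access-control-allow-methods", "").split(",")
    }
    return method.upper() in allowed_methods


# Header values copied from the curl output above.
response_headers = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, HEAD",
}
get_ok = preflight_allows(response_headers, "http://example.com", "GET")
put_ok = preflight_allows(response_headers, "http://example.com", "PUT")
print(get_ok, put_ok)
```

Under this simplified reading, the response permits the GET that a script tag needs, and would not permit a PUT.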

@bryevdv
Member Author

bryevdv commented Oct 24, 2020

FWIW I am no longer able to repro using the steps I provided in the OP (using Bokeh 2.0 env and docs since the 2.0 docs gallery examples do not have hashes). Maybe the <AllowedHeader>*</AllowedHeader> change is significant 🤷 @philippjfr if you have any ability or line on how to try reproducing that would be helpful.

@philippjfr
Contributor

philippjfr commented Oct 24, 2020

My previous approach for reproducing this seems to work now, but that doesn't say all that much, since it seemed to be quite arbitrary before, presumably due to caching, which is difficult to reason about.

@philippjfr
Contributor

philippjfr commented Jan 29, 2021

Given that this issue is still occurring in the wild, and we have explored all avenues of addressing this without disabling the integrity check, I've opened #10877 to supersede this issue and vote for removing the checks in Bokeh 2.3. If Chrome ever fixes the bug, we can re-add the checks.

@philippjfr philippjfr removed this from the 2.0.1 milestone Jan 29, 2021
@UlyssesInvictus

UlyssesInvictus commented Mar 12, 2021

I found this thread while researching this issue, and I just wanted to chime in with what seems to be the likely issue: this insane eight-year old bug that was marked "Won't Fix" in Chromium:

https://bugs.chromium.org/p/chromium/issues/detail?id=158131

In my case, simply appending ?= to all my image requests from S3 instantly solved the problem, without any further CORS changes.

Hopefully this was helpful. After a full evening trying to debug why Chrome in particular is failing my code, I'm now on a personal crusade to help anyone else running into the problem.

@martey

martey commented Mar 12, 2021

That Chromium bug was marked "wontfix" because the root cause of the problem was Amazon Cloudfront not supporting the right headers. That bug only concerns images, is eight years old, and Cloudfront now supports Vary: Origin, so I am pretty sure that problem is not the issue here.

That said, your comment did make me take another crack at finding potential solutions. It looks like manually adding an Origin header to requests to Cloudfront will force CORS processing, which should prevent any cached responses without the Access-Control-Allow-Origin header from being saved. CloudFront does mention this in their documentation, although the language suggests that you would almost never need to use it:

If some of your viewers don’t support cross-origin resource sharing (CORS), you can configure CloudFront to always add the Origin header to requests that it sends to your origin. Then you can configure your origin to return the Access-Control-Allow-Origin header for every request.

I am not sure what the support/troubleshooting burden with people having issues with this was before, so I will leave it to others to determine whether SRI hashes should be added back.

@bryevdv
Member Author

bryevdv commented Mar 15, 2021

@UlyssesInvictus @martey Thank you for your comments. The only reliable process that was ever discovered to repro this involved cached files and exactly matched the details of the other Chrome issue I linked much earlier. Accordingly, I don't actually think any AWS-side changes can resolve this, except perhaps not allowing caching at all, and that is too costly for us. In retrospect, this conclusion probably should have occurred to me earlier.

Now, it seems there may be other ways to trigger a CORS problem but there has never been any reliable way to repro them. Which means the cycle time to just "try something and see if there are still user reports" can stretch into months or years (we are already at that point!) out to "forever".

Given the especially egregious "blank page" failure mode, that is just not acceptable, unfortunately. I think the decision to remove the SRI hashes in output the library generates should stand.

9 participants