Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--num-procs X and curdoc().session_context.request.arguments don't go well together #5582

Closed
bkueng opened this issue Dec 19, 2016 · 17 comments · Fixed by #5901
Closed

--num-procs X and curdoc().session_context.request.arguments don't go well together #5582

bkueng opened this issue Dec 19, 2016 · 17 comments · Fixed by #5901

Comments

@bkueng
Copy link
Contributor

bkueng commented Dec 19, 2016

If I add these two lines:

args = curdoc().session_context.request.arguments
print(args)

to examples/app/sliders.py and run bokeh with:

bokeh serve sliders.py --num-procs 2

and then reload the page it crashes in roughly 50% of the cases with:

Error running application handler <bokeh.application.handlers.script.ScriptHandler object at 0x7f468eafe978>: 'NoneType' object has no attribute 'arguments'

I guess one of the processes has the request not set properly. It works if I don't add --num-procs.

This happens with bokeh 0.12.3 and latest master (unrelated: master does not show the example at all, is it currently broken?)

For reference, this is the full example to reproduce:

''' Present an interactive function explorer with slider widgets.

Scrub the sliders to change the properties of the ``sin`` curve, or
type into the title text box to update the title of the plot.

Use the ``bokeh serve`` command to run the example by executing:

    bokeh serve sliders.py

at your command prompt. Then navigate to the URL

    http://localhost:5006/sliders

in your browser.

'''
import numpy as np

from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput
from bokeh.plotting import figure

args = curdoc().session_context.request.arguments
print(args)

# Set up data
N = 200
x = np.linspace(0, 4*np.pi, N)
y = np.sin(x)
source = ColumnDataSource(data=dict(x=x, y=y))


# Set up plot
plot = figure(plot_height=400, plot_width=400, title="my sine wave",
              tools="crosshair,pan,reset,save,wheel_zoom",
              x_range=[0, 4*np.pi], y_range=[-2.5, 2.5])

plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)


# Set up widgets
text = TextInput(title="title", value='my sine wave')
offset = Slider(title="offset", value=0.0, start=-5.0, end=5.0, step=0.1)
amplitude = Slider(title="amplitude", value=1.0, start=-5.0, end=5.0)
phase = Slider(title="phase", value=0.0, start=0.0, end=2*np.pi)
freq = Slider(title="frequency", value=1.0, start=0.1, end=5.1)


# Set up callbacks
def update_title(attrname, old, new):
    plot.title.text = text.value

text.on_change('value', update_title)

def update_data(attrname, old, new):

    # Get the current slider values
    a = amplitude.value
    b = offset.value
    w = phase.value
    k = freq.value

    # Generate the new curve
    x = np.linspace(0, 4*np.pi, N)
    y = a*np.sin(k*x + w) + b

    source.data = dict(x=x, y=y)

for w in [offset, amplitude, phase, freq]:
    w.on_change('value', update_data)


# Set up layouts and add to document
inputs = widgetbox(text, offset, amplitude, phase, freq)

curdoc().add_root(row(inputs, plot, width=800))
curdoc().title = "Sliders"
@bryevdv
Copy link
Member

bryevdv commented Dec 19, 2016

This example is working for me on current master, with the issue you mention: some percentage of session loads fail due to the AttributeError. Guess there is some previously unknown interaction between accessing the HTTP request, and num-procs. An an immediate workaround, I can only suggest to run multiple Bokeh servers behind a load balancer. There are docs for how to do that with nginx in the user's guide as well as a sample deployment that does that in the demo site repo: https://github.com/bokeh/demo.bokehplots.com

@bkueng
Copy link
Contributor Author

bkueng commented Dec 20, 2016

Thanks I'll look into that. Adding num-procs would be much simpler though :)

@bwkeller
Copy link

bwkeller commented Jan 9, 2017

I have been encountering this same bug without num-procs as well. I am using python threading however for some IO (fetching data to feed to bokeh).

@bryevdv
Copy link
Member

bryevdv commented Jan 9, 2017

@bwkeller can you provide a minimal example to duplicate what you are seeing? I am not convinced it would be the same problem, but it is impossible to investigate thoroughly without code to see and run.

@bwkeller
Copy link

bwkeller commented Jan 9, 2017

I'll see what I can do. It is some proprietary client code I'm working on, so I may need to just write something from scratch to try and reproduce. The bug is extremely intermittent for me (~less than 1% of page requests I would estimate). It's definitely the same proximate cause, curdoc().session_context.request == None, for what that's worth.

@bryevdv
Copy link
Member

bryevdv commented Jan 9, 2017

That's definitely useful to know, thanks

@azjps
Copy link
Contributor

azjps commented Jan 18, 2017

Just to add another data point, this happens to me too arbitrarily, even on apps doing nothing fancy at all; no --num-procs or threading etc (I've always guarded against if curdoc().session_context.request is not None: therefore). Not sure under what conditions too.

@bryevdv bryevdv modified the milestones: 0.12.6, 0.12.5 Feb 1, 2017
@leopd
Copy link

leopd commented Feb 14, 2017

I've seen this without any threading or numprocs too, and not infrequently, on a fairly simple app.

If feels like there's something which will put the process into a state whereby every subsequent request fails with the error object has no attribute 'arguments'. I've suspected that errors on previous requests could cause this.

@leopd
Copy link

leopd commented Feb 16, 2017

I think I know what's going on. First the HTML delivers the javascript to open the websocket, and includes the sessionId so that the websocket request can get access to the Session object that was created with the initial HTML request. But the the websocket request is likely to land on a different server process where that sessionId is meaningless.

I think the demo apps have only been working because they don't need anything from the original HTTP request, so I'm guessing whatever process the websocket request happens to land on is just creating a brand new session object when it can't find the one specified by the id.

I think this can't be solved easily. The server architecture assumes that any incoming websocket connection will be able to find the ServerSession object by its id, which only works if there's a single shared memory space for all the sessions. I don't know exactly how tornado does its forking, but I'd be really surprised if the session dictionary is somehow in a shared memory space between all the server processes. The same problem gets even worse if you're actually scaled out to multiple machines -- then the only solution is to have a shared session store like redis/memcache or something, and then the locking gets more complicated. But I think this is a symptom of a basic design flaw. People figured all this stuff out in the 1990's with web applications, so the correct design patterns should be well known. But they get harder when you're trying to support real-time updates.

@bryevdv
Copy link
Member

bryevdv commented Feb 16, 2017

@leopd so looking at bokeh/server/views/ws.py shows:

yield self.application_context.create_session_if_needed(session_id)

So offhand I'm inclined to think you analysis is correct. Ping @havocp your thoughts welcome here.

We have discussed the possible need for session affinity in some circumstances. It seems as though it is possible to configure some reverse proxies (e.g. nginx) for session affinity. So it is possible an answer to this could be documentation about correctly configuring all these additional tools correctly for using Bokeh in "scale out" situations.

Alternatively, (or perhaps additionally) another idea would be to have create_session_if_needed defer to some kind of delegate for looking up session ID's. By default, its the policy is just "look in the local memory space" (i.e. what it's doing now), but other delegates could implement policies that refer to some shared store.

FWIW the HTML connection is immediately upgraded to a websocket and all further comms happen over that. I don't think the original Bokeh session is needed, if the WS upgrade happens to hit a different server or process. Or in other words, I think creating new Bokeh sessions on demand is OK and the the specific problem to solve is restricted to communicating just the original HTML request to the the WS session.

Thoughts here appreciated. Additional help from people with more expertise (more than I have, specifically) would be extremely valuable and welcome here.

@bryevdv
Copy link
Member

bryevdv commented Feb 16, 2017

but I'd be really surprised if the session dictionary is somehow in a shared memory space between all the server processes.

Commenting specifically in context of this issue (single-machine with fork) I think implementing a shared mapping of session IDs to HTML requests is do-able. Not entirely trivial, but certainly similar to things I have had to do in the past.

@havocp
Copy link
Contributor

havocp commented Feb 16, 2017

Yeah, something here isn't quite right. Sorry about that...

Session affinity / sticky sessions are required for this to work, probably. This is a consequence of keeping server-side state so apps can be written in Python (if we kept getting a new server-side context, then we'd make the app development model more complex). The original plan we discussed, IIRC, was always for scaled-out production deployments of Bokeh to have sticky sessions.

There's kind of an inherent session stickiness due to the websocket (once open, it always goes to the same server process). In simple cases without sticky sessions in the reverse proxy / load balancer, the original http request creates the session state, and then that session state is pretty much discarded in favor of the state the websocket request creates. But then the websocket stays connected to the same app server node and we don't need to create session state after that second time. So as long as the app doesn't care about the state created in the initial http request, things behave as if sessions are sticky without special behavior from your reverse proxy or load balancer. This has let us kick the sticky sessions can down the road.

request.arguments breaks this and now it's necessary to deal with state created by the initial http request.

We actually noticed this when adding request.arguments it looks like in #4858 , I said

A potential point of confusion here may be that the args are always for the session-creating request, not any subsequent request using the same session. But most of the time that shouldn't matter (?).

However obviously that wasn't thought through fully; it isn't just confusing and most of the time it does matter, if using more than one process.

What I forgot when saying "that shouldn't matter" is probably that there are two requests, the http one and the websocket one, in typical usage. Maybe I was wrongly remembering that in typical usage each session is only created for the websocket.

As you say, this has been figured out for web applications. However, Bokeh can't easily use the same answer because it isn't a regular web framework; in general, it hides http entirely! Bokeh gives a Python-data-science type of programming model that doesn't require people to be web devs (mess with http, JavaScript, and all that). This is done by having a big blob of Python state on the server (the Document) and syncing it to the client automatically, which of course is not how most web apps are written (they would keep state in a database, instead).

request.arguments was bolted on post-initial-Bokeh design, as a little escape hatch to get a little info from http. The problem now is that this sort of cascades; once we introduce the notion that web apps have requests, then we've also introduced the issue that each request should be stateless, and now you need stuff like cookies or a database or Redis to store your state across requests... Bokeh of course doesn't support setting cookies because it doesn't use the stateless http request/response web app model in the first place.

Some possible solutions:

  • document that request.arguments won't always be present
  • document possible solutions: sticky sessions; stuff your cross-app-process data in Redis; ?
  • when the initial page load has query params, embed them in the html so the same params are passed to the websocket request ?
  • make Bokeh have some sort of built-in simplified API cross-node shared data (has to be cross-machine, not just cross-process - Redis is a typical solution for stuff like this) - the idea would be to give a facility similar in spirit to Redis, maybe even configurable to use Redis, but much simpler API-wise

I'm a little skeptical of automatically forwarding request.arguments around; after all, if dropping to the http layer, maybe you actually care about this request, and might even want to know that the websocket request did not have arguments. But supporting some way to copy request.arguments into a shared location could be handy.

The danger is that if we go too far down the road of trying to allow writing a full-blown web app with full-blown http access in Bokeh, it will lose track of the actual original point which was to enable writing apps without learning http/javascript/etc.

My instinct is probably to focus on making session affinity work well; the --num-procs stuff should do affinity out of the box, and we should document how to do it with nginx or whatever.
I feel like there's a slippery slope trying to be a general web framework and it'd be better to ensure the assumption is accurate that we have server-side state.

But I don't know. Hope the above gives someone else some ideas.

Note that you certainly can today use Redis or a database with Bokeh to store stuff keyed by session ID, and that's no worse than where you'd be with Django or Flask or something, perhaps.

@leopd
Copy link

leopd commented Feb 16, 2017

Yeah, it's tricky. And I really appreciate the balance you're striking of making this stuff easy to do for python programmers who don't need to dive into these kinds of distributed systems issues.

Session affinity on the load-balancer is a reasonably good band-aid. (e.g. https://www.nginx.com/products/session-persistence/ ). But it doesn't handle the num_procs=2 case, which is otherwise a nice simple perf improvement. Session affinity has a scaling problem too because it means each user (browser) is stuck to a single server process -- so this won't handle well the situation where you have a single browser that wants to render lots of charts simultaneously say in a complex dashboard for example.

I actually like

when the initial page load has query params, embed them in the html so the same params are passed to the websocket request

(It's a special case of the trick early web app frameworks like ASP used to serialize session down to the client.) It doesn't fully work if you need to run custom logic to render the HTML based on the request. But in the cases I've worked with so far, the HTML is just a dumb shell, and thus could be rendered without invoking any app code -- save that until the WS is opened, and as you say it's got intrinsic stickiness. It wouldn't cover page reloads with num_procs=2 as written. (It would kinda work -- the chart would be reset to its original state if the second WS request landed on a new proc.) But couple this with a sticky-session load balancer, and num_procs=1 and you've got a reasonably scalable solution.

Something like a shared session store with pluggable backends -- redis/memcache/single-machine-shared-memory is the right way to really scale this out, but it gets messy when you think about supporting multiple writers which requires distributed locking or pub/sub or something. A routing layer that effectively implements stickiness is almost easier. If you can discount the multiple-writer use-case it gets a lot easier.

@leopd
Copy link

leopd commented Feb 17, 2017

One more point which my earlier comment was confusing about...

If the WS request includes a copy of the query params, then a lot of common cases work and scale well, with or without sticky load balancers, and with or without num_procs. One particularly difficult case is the "complex dashboard" case where one browser wants lots of documents -- it would scale out easily with this strategy, as long as you don't mind the dashboard getting reset when you reload the page. (Which is fine I think for a lot of cases I can imagine.) If you demand that user-invoked changes to the documents survive a reload, then you'll have to take the perf hit of session stickiness.

@bryevdv bryevdv modified the milestones: 0.12.5, 0.12.6 Feb 17, 2017
@bryevdv
Copy link
Member

bryevdv commented Feb 17, 2017

OK something like that had occurred to me but it seemed possibly to simplistic. But if both @havocp and @leopd think it's worth pursuing it seems possible to do for 0.12.5.

Questions:

  • should just the arguments be included in the Bokeh embed template? Or the entire HTML request?
  • Would it work to set the request information in a cookie instead? Would that be acceptable or better?

Regarding:

as long as you don't mind the dashboard getting reset when you reload the page.

The app code is always re-executed to generate a new Bokeh Document for a new Bokeh Session on a (re)load, so this is always going to be the case in a technical, literal sense. To have any kind of persistence, some external information (request arg, username, date, data read from an external database or file) would need to be used to influence the construction of that document.

@leopd
Copy link

leopd commented Feb 20, 2017

Cookies would need to be scoped to the sessionId. If there was just a single cookie, then the request args would leak across different documents in the same browser. But if the cookie name includes the session Id, that would work.

I'm not sure what's in the full HTML request that's not the arguments. POST body maybe? That could be relevant. Exotic things like custom request headers IMHO aren't worth worrying about.

All this will work assuming that the app is effectively "read-only" on the underlying data. If creating the document has any side-effects, none of this works. But I think that's the most common scenario.

@bryevdv
Copy link
Member

bryevdv commented Feb 20, 2017

Cookies would need to be scoped to the sessionId. If there was just a single cookie, then the request args would leak across different documents in the same browser. But if the cookie name includes the session Id, that would work.

I'm not sure what's in the full HTML request that's not the arguments. POST body maybe? That could be relevant. Exotic things like custom request headers IMHO aren't worth worrying about.

OK, thanks for the comments. I will concentrate on just the request then as that's the scope of the original feature anyway. I will also for now just encode the request args in the template that embeds the document. There is a separate issue to explore using cookies for session id's and I can look at everything together at one time when I get to that issue.

All this will work assuming that the app is effectively "read-only" on the underlying data. If creating the document has any side-effects, none of this works. But I think that's the most common scenario.

Can you elaborate on this? Although it needs to be updated, the old "Happiness" demo showed a way to send user-specific information based on an authenticated user. If a user id of some sort is available to an app and the app makes reads, writes or updates to some external persistent store based on that user id, I don't see what the issue would be. I don't think this is the case you are referring to, but I want to make sure I understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants