Add DataModel base class #3674

Closed
guillochon opened this issue Jan 17, 2016 · 42 comments · Fixed by #10564

Comments

@guillochon
Contributor

guillochon commented Jan 17, 2016

See #3674 (comment) for a new, better idea for this feature.


Hi all, I'm having an issue where line plots will sometimes be invisible when rendered. This error seems to be non-deterministic; when I run the same code repeatedly, the plots will sometimes render as normal, and sometimes render invisibly.

An example page that is currently showing these "invisible" plots is available here (2nd plot from the top): https://sne.space/sne/SN1993J-broken.html. One thing you can notice is that hovering over the plot still produces tooltips, and an examination of the data within the page source reveals that the line_color attributes are non-white, and that the data is actually available. But for some reason, the lines are not visible!

And here's the exact same page when the render does work: https://sne.space/sne/SN1993J.html. The file sizes are identical but the file contents are not identical. It seems like it might be a different command order in the two files?

Anyone know what's going on?

@guillochon guillochon changed the title Line plots intermittently invisible Line plots intermittently invisible (with example!) Jan 18, 2016
@birdsarah
Member

hi @guillochon, thanks for the report. I agree, I can see the tooltips but not the lines. Without the code that was used to produce the "broken" and working examples, it's impossible to know what's going on. I've certainly not seen this before, and the console's not telling me anything useful, if that helps.

@guillochon
Contributor Author

The code is what I posted here: #3671. The code used to produce both examples is identical, I literally just ran the code and ran it again. Sometimes the issue occurs, sometimes not. It's not deterministic!

@birdsarah
Member

> I literally just ran the code and ran it again. Sometimes the issue occurs, sometimes not. It's not deterministic!

I had misunderstood - I thought it was at runtime - when the html loaded, not running your bokeh script.

I am not sure what's going on. My only guess is that HBox(vplot(hplot(p1,p2,vform(binslider,spacingslider)))) adds things in a non-deterministic order. I seem to remember there being something about order not being guaranteed for children of a document. ping @havocp

@guillochon
Contributor Author

So I got rid of vplot, hplot, and vform, and the problem still persists. I generate thousands of these pages in a loop, and it seems like when it's not working, all the plots are invisible, on every single page. I'll restart the loop several times, and randomly everything will render nicely.

Is it possible for someone to look at the broken link above to see the ordering of the output? It is different than the "good" plot, suggesting that the output order is affecting the display of the data. I have no idea how to fix this and it makes automation of the script impossible (I have to babysit it and make sure it properly generates the plots).

@birdsarah
Member

ping @bryevdv @mattpap

@havocp
Contributor

havocp commented Jan 25, 2016

Possible clue: the working HTML page is 10x larger than the broken one.
What I would do is extract the JSON from each page into its own file, then pretty-format it with a JSON formatter, then do a diff. If the diff is obfuscated by different UUIDs, use a regex to strip all the UUIDs out perhaps.

@guillochon
Contributor Author

@havocp Since I first posted this issue more data was added to the plot. The file sizes were identical previously. I'll try to grab a broken example again the next time the script executes.

@guillochon
Contributor Author

This is kind of a deal-breaker for me for Bokeh, I can't have something that doesn't plot the data half the time. Does anyone have any idea of things I can try to fix this? What is Bokeh even doing here that would result in the non-deterministic output I'm seeing? Can I force Bokeh to output in a particular order, if that's what's causing the issue?

@havocp
Contributor

havocp commented Jan 30, 2016

Do you have time to try the cleaned json diff I suggested? ideally two pages with the exact same data but one doesn't work ...

I have no idea what would cause this so step 1 for me would be to see what's different about the working and broken pages, by getting things human readable then diffing.

also maybe given a script to make the pretty diff you could automate generating a failure because you could regenerate, remove noise such as changed uuids, and then if the two pages aren't identical you know one of them is broken.

There's a "bokeh json" command too so if the problem is the json varies we could get the html out of the picture.

@havocp
Contributor

havocp commented Jan 30, 2016

do you have a way to freeze the data so you can take variation in data out of the picture for debugging ?

@havocp
Contributor

havocp commented Jan 30, 2016

Also useful would be to try to minimize a test case that's easy for anyone to run (includes all needed data files, etc).

@guillochon
Contributor Author

OK, I have two files that should not change now, one broken, one working. A diff is just a mess, the JSON is in a completely different order:

https://sne.space/sne/SN1990aa-broken.html
https://sne.space/sne/SN1990aa-working.html

@havocp
Contributor

havocp commented Jan 30, 2016

I can't work on this tonight but I'd probably write a little json cleaner script to sort the keys and change all the uuids to just the string "uuid" or something and then pretty print.

@havocp
Contributor

havocp commented Jan 30, 2016

if json.dumps doesn't have a sort keys option it might keep the order if you hand it ordereddict instances I don't know...
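
A minimal sketch of the cleaner script described above, assuming the document JSON has already been extracted from each HTML page into a string. The function name `normalize_bokeh_json` and the `"uuid"` placeholder are illustrative choices, not part of Bokeh. For what it's worth, `json.dumps` does accept `sort_keys=True`, so no OrderedDict is needed.

```python
import json
import re

# Matches the 8-4-4-4-12 hex UUIDs that appear as model ids in the pages.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_bokeh_json(text):
    """Return a pretty-printed, key-sorted copy of `text` with UUIDs masked."""
    masked = UUID_RE.sub("uuid", text)
    return json.dumps(json.loads(masked), sort_keys=True, indent=2)


# Two toy documents that differ only in key order and in the id value.
good = '{"type": "Line", "id": "835699f7-33dd-413a-9c00-23cc3164c2d5"}'
bad = '{"id": "0a1b2c3d-0000-4111-8222-333344445555", "type": "Line"}'

same = normalize_bokeh_json(good) == normalize_bokeh_json(bad)
```

Sorting only affects object keys; the order of items inside JSON lists is kept, so a reordered list of models would still show up in a diff of the two normalized files.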

@havocp
Contributor

havocp commented Jan 30, 2016

another possibility is the different order is precisely the problem I suppose! maybe the wrong order of building the JavaScript models breaks. but that's pure speculation so I'd probably rule out other differences first if it were me

@bryevdv
Member

bryevdv commented Jan 31, 2016

Possibly the problem (in a column data source in the bad file):

data: {"yoff": [93.09490073825461], "spacing": [1.0], "src": ["5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5

There is an assumption that is not well enforced (I am working on this right now as a matter of fact) that all columns in a column data source be the same length. The best mental model for a column data source is a "cheap dataframe" (i.e., a collection of series all of the same length). I would expect that a glyph that refers to any of the "short" columns like this would only plot one point, which for a line means plotting nothing at all.
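
The equal-length assumption can be checked before the data ever reaches a column data source. This is a plain-Python sketch, not Bokeh's own validation; the helper names `column_lengths` and `check_equal_lengths` are hypothetical, and the sample values are made up apart from the single `yoff` entry quoted above.

```python
def column_lengths(data):
    """Map each column name to the length of its values."""
    return {name: len(values) for name, values in data.items()}


def check_equal_lengths(data):
    """Raise ValueError if the columns do not all share one length."""
    lengths = column_lengths(data)
    if len(set(lengths.values())) > 1:
        raise ValueError(
            "columns have unequal lengths: %r" % (sorted(lengths.items()),)
        )


# A one-element column next to three-element columns, as in the bad file.
data = {"x": [1, 2, 3], "y": [4, 5, 6], "yoff": [93.09490073825461]}

try:
    check_equal_lengths(data)
    ok = True
except ValueError:
    ok = False
```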

@bryevdv
Member

bryevdv commented Jan 31, 2016

If you want to have fixed values for a certain field you can set that field as a value explicitly:

p.line(y=dict(value=0.10), ...)

# or

p.line(y=value(0.10), ...)

# or even usually just

p.line(y=0.10, ...)

The main thing is not to have arrays of different lengths in a column data source.

I'm working on a PR in conjunction with the streaming interface that enforces and regulates the column-length assumption much more rigorously and loudly.

@bryevdv
Member

bryevdv commented Jan 31, 2016

Another possibility, in the bad file:

"type": "MultiLine", "id": "835699f7-33dd-413a-9c00-23cc3164c2d5"}, 
{"attributes": {"y": {"field": "y"}, "line_alpha": {"value": 0.1}, "line_color": 
{"value": "#1f77b4"}, "x": {"field": "x"}, "line_width": {"value": 2}}, "type": 
"Line", "id":

In the "good" file, the MultiLine seems to refer to fields xs and ys. Are you sure that the columns for MultiLine coords are always "lists of lists"?

@bryevdv
Member

bryevdv commented Jan 31, 2016

More ideas: there are "xs" and "ys" columns in the bad file that some glyph refers to, but they are empty:

"data": {"xs": [], "ys": []}}, "type

Edit: seems to happen in the good file too, though

@bryevdv
Member

bryevdv commented Jan 31, 2016

As an aside it suddenly seems like a --filter option to bokeh json would be useful, and also a way to "ingest" from html files.

@rungatgenapsys

Thank you very much, @guillochon, for asking this question! I spent almost 2 days wondering, researching, and trying different things, thinking I probably didn't code the datasource updater function or the session setup correctly (and pulling out my hair when nothing seemed to work). Although the manifestation of the bug is slightly different, the root cause is the same: the x-axis and y-axis arrays cannot have different numbers of elements. In my case, my real-time line plots took a few minutes to show up and reflect the real-time changes in the y-axis data, and they did this consistently. The same code was working in Bokeh v0.10.0 but started behaving this way once I migrated it to v0.11.0; v0.10.0 was throwing an error about non-equal column data lengths, but the latest version does not.
Finally, thank you, @bryevdv, for looking into this issue. My simple fix was to create the same length array filled with 0's for the y-axis since my x-axis is fixed.
My two cents is to make it work the same as back in v0.10.0 or throw a runtime exception and stop the execution.

@bryevdv
Member

bryevdv commented Jan 31, 2016

I don't recall an explicit decision to turn off the "unequal length" warning (which doesn't mean there wasn't one, but I don't happen to recall any), so maybe that is a regression. It's also possible this is not the same problem as @guillochon but hopefully it might be.

I'm working on some cleanup and refactoring that will definitely make things more chatty on the JS side when this situation occurs, and we will revisit the python side warning as well.

@guillochon
Contributor Author

Just to clarify again, the bad and the good files are produced by identical code, using identical inputs. I re-ran the code ten times in a row, sometimes the output is "good," sometimes it is "bad." The code runs in a loop and produces a few thousand html files each time, and it appears that if one of the files is bad, they are all bad, and if one of the files is good, they're all good. So what I'm doing now is running the code, looking at the first html file, and if it's "bad" I kill the program and restart it until the first output is "good," then all my outputs are good. This is extremely weird behavior.

The only way I think this can occur is if Bokeh's output ordering is not fixed. I think that's the real underlying issue here; I do not understand how the output ordering is not guaranteed. If it's the unequal column data source thing, the way it's manifesting "good" and "bad" outcomes is non-deterministic: I cannot predict whether the output will be "good" or "bad" until I've run the code and looked at the result. I will give that a try, but I am skeptical that this is the real issue.

@guillochon
Contributor Author

I've forced all of the data fields to be of equal length and am re-running multiple times in a row now to see if I can trigger the "bad" behavior. So far it's producing only "good" output, but in my experience it may take a few tries to trigger.

So a thought on perhaps why the column length mismatch causes this: if one is defining multiple fields in a column data source, the order in which the elements appear in that column data source is unpredictable because internally it's a "dict" object (which doesn't guarantee key order). So, if one of my one-element arrays happens to be ordered first, then it reads the data as having a length of "1", and doesn't plot it.

I don't understand your solution though @bryevdv, I cannot define "binsize" for instance in the way that you did in your example (only "x" and "y" and other variables that line already knows about). My solution now is just to make "binsize" (and similar) as long as the other arrays in the ColumnDataSource, this unnecessarily duplicates variables and makes the filesize a lot larger. The reason for these additional variables in the first place is that I need some internal variables for the slider callbacks that I'm using, I don't see any other place to store these variables other than a ColumnDataSource.
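
The workaround described above (repeating a one-element column until it matches the others) could look roughly like this. It is a plain-Python sketch that runs before the dict is handed to a ColumnDataSource; `broadcast_scalars` is a hypothetical helper name and the sample values are made up.

```python
def broadcast_scalars(data):
    """Repeat length-1 columns so that every column has the same length."""
    target = max(len(values) for values in data.values())
    padded = {}
    for name, values in data.items():
        if len(values) == target:
            padded[name] = list(values)
        elif len(values) == 1:
            # A single stored value such as "binsize": repeat it.
            padded[name] = list(values) * target
        else:
            raise ValueError(
                "column %r has length %d, expected 1 or %d"
                % (name, len(values), target)
            )
    return padded


columns = {"x": [0, 1, 2], "y": [5.0, 6.0, 7.0], "binsize": [10]}
padded = broadcast_scalars(columns)
```

As noted, this trades file size for consistency, since every repeated value ends up in the serialized page.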

@havocp
Contributor

havocp commented Jan 31, 2016

If we can capture bad and good and make a human readable diff I hope things will become clear... were the column lengths only unequal in "bad"? was anything else different from bad to good? I have no guess why column lengths would vary from run to run in a way that would be caused by bokeh.

If the column lengths fix doesn't work I recommend a systematic approach (find the diff and root-cause it) rather than trying things haphazardly. The time to solution will be much more deterministic that way.

@guillochon
Contributor Author

@havocp They were unequal in both cases, because the code and inputs are completely identical, no changes at all in the code. I think I see why the unequal column lengths might be causing this, and it's because of dict not guaranteeing key order (the order of the keys can be different from run to run with no changes in code). If this is the case then @bryevdv is right and an exception needs to be thrown when using unequal column lengths.

@havocp
Contributor

havocp commented Jan 31, 2016

I understand there are no code changes - I'm talking about understanding the differences in output to be clear.

is dict order the only difference in good vs bad output ? or are there others?

I think Python may randomize hash keys and thus dict order as a security measure. I have some fuzzy memory of that.
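
That memory matches Python's hash randomization: since Python 3.3 string hashes are salted per process by default, and the `PYTHONHASHSEED` environment variable pins the salt. The sketch below only demonstrates that pinning the seed makes a string hash repeatable across interpreter runs; whether this explains the "all good or all bad per run" pattern in this issue is an assumption. The helper name is hypothetical. Note that dicts preserve insertion order from Python 3.7 on (CPython 3.6 in practice), so hash-dependent dict order applies only to older interpreters.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(seed):
    """Return hash('binsize') as computed by a new interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('binsize'))"], env=env
    )
    return int(out.decode().strip())


# Two separate processes with the same seed agree on the hash value.
first = hash_in_fresh_interpreter(1)
second = hash_in_fresh_interpreter(1)
```

Running a generation script with a fixed `PYTHONHASHSEED` is one way to make hash-dependent ordering reproducible while debugging.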

@bryevdv
Member

bryevdv commented Jan 31, 2016

> The reason for these additional variables in the first place is that I need some internal variables for the slider callbacks that I'm using, I don't see any other place to store these variables other than a ColumnDataSource.

We're working on a "namespace" model specifically to store bits of state like this. In the meantime, since you seem to have a CustomJS callback, you can execute arbitrary JS code, so you can attach these bits of data anywhere really: cb_obj.binsize or even window.binsize = 10 if you wanted.

Given what is known about dict key ordering, I am more confident this is the issue. For reference, here is the code in the source that reports the "length":

https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/sources/column_data_source.coffee#L18-L31

If the key order is not predictable, that unpredictability will clearly be propagated there. This length value is used by both glyph renderers and properties to condition the render loops. As I said, expect the data source assumptions to be enforced much more rigidly in the future.
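
To make the propagation concrete, here is a toy model in Python. It assumes, purely for illustration, that the reported length is taken from whichever column is iterated first; it does not reproduce the linked CoffeeScript, and `reported_length` is a hypothetical name.

```python
def reported_length(data):
    """Length of whichever column comes first in iteration order (toy model)."""
    first_column = next(iter(data.values()))
    return len(first_column)


# Same columns, two different key orders.
long_first = {"x": [0, 1, 2], "y": [5, 6, 7], "binsize": [10]}
short_first = {"binsize": [10], "x": [0, 1, 2], "y": [5, 6, 7]}

n_good = reported_length(long_first)   # full-length column seen first
n_bad = reported_length(short_first)   # one-element column seen first
```

Under that assumption, a render loop bounded by `n_bad` would visit a single point, which for a line glyph draws nothing.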

> If this is the case then @bryevdv is right and an exception needs to be thrown when using unequal column lengths.

Well as I also mentioned, there used to be a very loud warning, still need to look into what happened to it.

@bryevdv bryevdv modified the milestones: 0.12.6, 0.12.7 Jun 5, 2017
@bryevdv bryevdv modified the milestones: 0.12.8, 0.12.7 Aug 21, 2017
@bryevdv bryevdv modified the milestones: 0.13.x, short-term Sep 11, 2018
@bryevdv
Member

bryevdv commented Sep 5, 2019

OK, there was a recent discussion on the Discourse that surfaced a better approach to this. The full discussion is there but I wanted to record the basic idea here, which is simply that we provide a mechanism to turn new Python Bokeh models that only require properties and no other JS implementation into custom extensions automatically, with no JS required from the users.

This would enable users to create nicely typed custom scratch spaces that automatically sync and trigger events in exactly the same way as existing models. This is a much better approach, both for users and us to implement.

Some questions:

1. Does this happen truly automagically for any Model subclass that is not a built-in? I.e., does just defining

class Foo(Model):
    bar = String(...)

suffice, with nothing else needed?

2. Or there could be a special parent class whose subclasses automatically get synthesized:

class Foo(DataModel):
    bar = String(...)

3. Or, we could require an explicit registration step for "data models" that are defined as regular Model subclasses.

I am inclined to prefer 2 or 3 I think. For example in the docs we define Model subclasses and there is no need to generate data models for those, as would happen with the first case.

@mattpap
Contributor

mattpap commented Sep 5, 2019

I think the DataModel approach would be the best, though I haven't had time yet to fully evaluate this.

@bryevdv bryevdv modified the milestones: short-term, 2.0 Nov 10, 2019
@bryevdv bryevdv changed the title Add Namespace models Add DataModel base class Nov 10, 2019
@bryevdv
Member

bryevdv commented Nov 10, 2019

Noting that as part of this work, since it will be necessary to allow DataModels to be added as roots:

curdoc().add_root(some_datamodel)

we should ensure that any model (e.g. sources) can also be added (maybe that's already possible)

@bryevdv bryevdv modified the milestones: 2.0, next Jan 30, 2020
@MarcSkovMadsen
Contributor

MarcSkovMadsen commented Oct 8, 2020

I very much need the Dynamic DataModel.

I need it to add bidirectional communication to html elements included via the Jinja Template or via https://github.com/paulopes/panel-components, which is a wrapper of the Jinja Template into something much more in the style of Dash layouts or R Shiny.

I have also requested it for Panel here holoviz/panel#1612.

For now I don't know how to create a dynamic DataModel from a param.Parameterized class or a Bokeh Python Model. But I can see that it is easy to create separate Event, StringProperty, StringAttribute, BooleanProperty, BooleanAttribute, IntegerProperty, FloatProperty, ListProperty, DictionaryProperty, DataFrameModel, ... data models and use them to pair one python parameter to one html element property, attribute or event at a time. So that is what I will start learning from.

But any kind of guidance in the right direction, some POC code or an actual Bokeh Data Model would be very much appreciated.

@philippjfr said that @mattpap had already experimented/worked a bit with this? If some code or learning could be shared that would be great.


As inspiration of why this is valuable take a look at this example where we just need to be able to add bidirectional communication to a fast-button html element.

https://github.com/paulopes/panel-components/blob/master/examples/fast_hello_world.py

@MarcSkovMadsen
Contributor

MarcSkovMadsen commented Oct 12, 2020

@mattpap, I can see you have done a pull request to solve this.

FYI @philippjfr

I have also created something. I guess it's different from what you have done. But you can see the details of what I need and what I have created here holoviz/panel#1612 (comment)

The implementation is in the data_model branch. The code can be found here https://github.com/MarcSkovMadsen/awesome-panel-extensions/blob/data_models/tests/data_models/test_manual.py, https://github.com/MarcSkovMadsen/awesome-panel-extensions/tree/data_models/awesome_panel_extensions/data_models and https://github.com/MarcSkovMadsen/awesome-panel-extensions/tree/data_models/awesome_panel_extensions/bokeh_extensions/data_models.

@mattpap mattpap modified the milestones: next, 2.3 Dec 5, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024