-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Table lock: tests run with tables that are read-only by default #5381
Conversation
d912603
to
28e51ef
Compare
@markotoplak, food for thought: some projection widgets (FreeViz, LinearProjection) output the projection matrix, that is, positions of "anchors" (projected base vectors, variables) in the projection as The user can move these "anchors", which changes the corresponding To avoid this problem, a widget must never send a table if it later changes it (place). In practice this means that all output tables must be explicitly constructed for output, without storing it. |
@janezd, yes, exactly, as soon as something goes through the signal it should not be changed anymore, or we get potential subtle bugs like this one: a receiving widget's stored table is changed before a signal is emitted. |
aca50e3
to
8e50762
Compare
Say that some function calls In practice, this should be important only if some code stores csr's internal arrays and expects they won't change, or if some code treats zeros differently than missing data. BTW, unless I've broken something with the last changes, we're down to only 5 failing tests. Yay. |
03ce796
to
2223f50
Compare
The two remaining issues are due to CatGB: its Cython code expects a writeable buffer, though it probably does not actually change it. In our code, we solve this by adding a So: does any of these methods (to my knowledge we support several implementations) modify the array? |
ea1d720
to
2508651
Compare
Codecov Report
@@ Coverage Diff @@
## master #5381 +/- ##
==========================================
+ Coverage 85.96% 85.97% +0.01%
==========================================
Files 313 313
Lines 65471 65641 +170
==========================================
+ Hits 56280 56435 +155
- Misses 9191 9206 +15 |
This now passes the tests. What still needs to be done:
@markotoplak, @lanzagar, could you inspect this before I continue? |
This idea's really creative, and the execution is very thorough. Good job! 👏 However, I believe #5303 could be solved simpler and easier within orange-canvas-core – after all, the problem you are solving stems from Table as a format of serialization between widgets. If widgets are linked in a single chain, it isn't wrong to alter data directly. Throwing an exception on data modification will break a LOOOOT of code, not just within Orange, but also in scripts that use Orange as a data mining library, and other projects which use Orange Table as their backbone. And their use cases for treating Table as mutable are completely valid. I'd rather not force people to learn to write |
I'm OK with issuing a warning, and letting it forever be a warning, not an exception. Though there's this problem that Furthermore, I'm OK with adding a global flag whether these warnings are enabled or not. Orange canvas would set it and thus show warning (for add-ons; the core code should be clean!). Any other projects can leave the flag unset. I believe that the problem is not related only to workflows. With any parallelism, changing the table may be problematic. I realize that this steepens the learning curve. The current exception says that the table is read-only unless unlocked; this should make the user look into documentation. I don't know of any uses of |
That'd be ideal.
This is a great idea :D
Don't rely on users looking into documentation, it'll make most people give up instead. If this is something that almost anyone who tries using Python Script will encounter, it'd be nice if we could guide them well with an error message. Could we offer an alternative suggestion in the sense of 'you should copy the array or unlock it if you're not using parallelism'? On that note, I'm also interested in what messages a DataFrame created from a locked numpy array will show ( After a little bit, there's probably going to be a StackOverflow question on the top of Google search results for 'table is read-only unless unlocked', but let's try not to rely on that. |
@irgolic, no, it is still wrong. Imagine that a chain is modifying data, and then a user changes the parameters of a widget in the chain. |
@irgolic, I do understand your points and agree that as long as you keep your table local, it is fine to change it. But if the table is passed through a signal, that changes. Even without any parallelism we will get problems (because a widget in a chain can be reapplied with new settings). I am for irreversibly locking tables that passed through a signal. First as a deprecation warning, of course. That would make it easier for users - no unlocking statements necessary. :) |
You're right.
I'm seriously against erroring on an unlocked table in Python Script. Throw a warning and briefly explain what they did wrong. Think 'best effort'. This got me thinking about how links could visibly stale if the table they sent through is ever unlocked, and would require recomputation. |
408fa64
to
490a9b7
Compare
@markotoplak proposed a very clever solution, which I happily implemented: tables are locked within tests, but not when running Orange (or individual widgets). Barring any bugs in implementation, locking should have no effect on runtime, but it should show any violations through failed tests. This PR takes care of core Orange, while tests for add-ons will most certainly fail after this is merged. Implementation details: locking is regulated by |
69dda8a
to
7b736f6
Compare
Notes to self, TODO for merge:
|
83a10da
to
dd213c1
Compare
Hm, at first I wanted to use your
I got Let's analyze what happened: Here we saw a problem in a single thread, which we can "fix" by not doing the optimization and we get a correct behavior of not allowing to unlock And this reminds me of race conditions: having multiple threads unlocking the same table or an array shared between tables could be an issue. Fortunately, we avoid editing tables anyway so perhaps this is not such an issue, just perhaps some badly written code could start randomly crashing if table locking was enabled. But that code was likely wrong already. So I wouldn't worry about it, or can you, @janezd, think of a problematic case? I'd avoid the optimization and would go for un-unlockable views, so original semantics in these cases. And I would not worry about multithreading, because if someone has multiple threads editing the same table, they deserve crashes. Finally, should we also search for all cases when tables could share arrays and try to code around them? I wouldn't worry too much about it, I'd fix it when something starts failing even though it should not. |
If computation of some table('s column) is separated into two threads, which simultaneously unlock it and set the data, this would fail although it is kind of legit. But the workaround is to unlock the table before starting the threads, so I don't see it as a big problem. Your example with metas is more disturbing because one table can share an array with another without realizing it, and this can be different to track. Maybe we should just avoid optimizations and always copy? |
f134c10
to
d45703c
Compare
@markotoplak, I'm OK with all your changes. You can tidy this up by squashing them into the original two commits, if you wish. |
45d3fc5
to
55fc086
Compare
Issue
Ref #5303.
Description of changes
Lock
Table
and provide a context manager that opens it.This is a draft; it will be interesting to see situations in which it fails. Here's one fascinating find: 17d37fe.
Includes