-
-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Status of diff & patch #58
Comments
Hi!
Awesome that you are looking into using Immer for your project!
The diff/patch I haven't started working on, and I doubt I will have
time to work on that until next year, unless I get some generous sponsor
:)
On the other hand, I wonder whether diff/patch is really needed for your
application. You can do ad-hoc diff/patch by comparing the nodes in
your data-model tree (Immer already supports efficient == and !=) but of
course it depends on the shape of your data and the size of typical
documents. If you give me some more details I can give you better hints
on what the best approach might be to integrate Immer.
Cheers!
JP
bobkocisko <notifications@github.com> writes:
… Juanpe,
I'm currently considering options to simplify the application state
management in a large app. So far the whole app has been coded
imperatively and it gets somewhat tricky because one requirement we
have is to track and send updates externally of only the state that
has changed. It's becoming somewhat unmanageable as more new global
features are being added...I guess we're just reaching the point of
diminishing returns with the current approach. So I started looking
into functional programming techniques which led me to immutable data
structures and I came across your library. This is exciting stuff.
One thing that is holding me back though from testing immer is the
lack of diff / patch support. Since high performance is also a
requirement but since we need to be very aware of updating only what
has changed, support for efficient diffing would be essential. What
is the status of this? And what is the level of difficulty to
implement it?
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#58
--
∿∿∿ https://sinusoid.al
∿∿∿ https://sinusoid.es
|
JP, Thanks! The app is basically MVC but the model also represents external device state which must be kept in sync. We need to support up to 250 external devices, each of which has about 10-300 settings. When a command is fired from the UI it needs to map to about 1-50 individual settings changes as well as some app state changes and many of these are asynchronous. Each setting has to be successfully sent and acknowledged by the external device(s). We must accurately track each device's current settings in memory because if we try to set something but it was already at that value, we get no acknowledgment from the device and have to assume something is wrong. And eventually we would like to have this all work transactionally so that if any of the settings requests fail that the others are rolled back to return to consistency. Additionally, each command needs to be able to be run in parallel with others; we cannot force them to run sequentially since they can be delayed with the network round-trips and multiple external APIs that we must update at the same time. So far everything is working pretty well with some annoying caveats (annoying to a software architect, not so much the users). But new requirements are coming all the time and making the software harder and harder to reason with. I think that functional programming techniques like map/reduce/filter and immutable data structures would dramatically simplify the code and lessen brain fatigue :). But I don't want to lose performance in the meantime. The app is extremely performant right now and we want to keep it that way. Besides using immer for storing the state of the app, are there any optimized libraries for implementing map/reduce/filter in a performant way in C++, similar to what GHC does for Haskell (https://stackoverflow.com/a/35038374/4307047)? One reason why I'm asking about performant diffing / patching is because I could see that as a way to answer some of these problems in a more encapsulated way and to help with merging the state of asynchronous commands into the rest of the application state. Another reason is that we have hundreds of controls in our UI and we have specific algorithms and signal/slot mechanisms to make sure that only the minimum number of OpenGL draw calls are run to update only the control(s) that have changed rather than re-drawing the whole screen. Our app runs on limited hardware so the user experience would be pretty bad and the processor would be running much hotter if we had to redraw everything all the time. I think that perhaps transitioning into a map/reduce architecture and hooking the UI updating signal/slot into the end of that pipeline might be an answer here but I'm only speculating. Also, do you have any recommendations for maintaining pointers between nodes in an immutable document, which stay up-to-date across changes? The current design of our app does this everywhere. But every time I try to think about applying that to an immutable structure, my brain goes into an infinite loop. :) I suspect the only answer would be to store the identity or path from the document root and then re-traverse that path after every update to find the latest changes. Am I missing an easier/faster way? Bob |
Hi!
Greetings from my layover flight on my way from CppCon...
Sorry for the delayed reply, I needed to give it some thinking and I was
busy travelling and coferencing :-)
Lots of interesting points there. Let me try answer one by one:
Thanks! The app is basically MVC but the model also represents
external device state which must be kept in sync. We need to support
up to 250 external devices, each of which has about 10-300 settings.
Are the settings dynamic (stored in a hash-map or alike), or static
(just structs)?
Either way, that should be easily manageable with an immutable approach.
When a command is fired from the UI it needs to map to about 1-50
individual settings changes as well as some app state changes and many
of these are asynchronous. Each setting has to be successfully sent
and acknowledged by the external device(s). We must accurately track
each device's current settings in memory because if we try to set
something but it was already at that value, we get no acknowledgment
from the device and have to assume something is wrong. And eventually
we would like to have this all work transactionally so that if any of
the settings requests fail that the others are rolled back to return
to consistency. Additionally, each command needs to be able to be run
in parallel with others; we cannot force them to run sequentially
since they can be delayed with the network round-trips and multiple
external APIs that we must update at the same time.
As you guessed, this sounds like a very good use-case for immutable
data-structures, since launching asynchronous operations or keeping old
values around is trivial and extremelly performant (no locking or
copying data needed for either case). This means that concurrency and
transactions come basically "for free". There is a performance penalty
when updating the data-structure, but it is still reasonable, and
probably compensated by the improvements in the other parts.
So far everything is working pretty well with some annoying caveats
(annoying to a software architect, not so much the users). But new
requirements are coming all the time and making the software harder
and harder to reason with. I think that functional programming
techniques like map/reduce/filter and immutable data structures would
dramatically simplify the code and lessen brain fatigue :). But I
don't want to lose performance in the meantime. The app is extremely
performant right now and we want to keep it that way.
I agree!
Besides using immer for storing the state of the app, are there any
optimized libraries for implementing map/reduce/filter in a performant
way in C++, similar to what GHC does for Haskell
(https://stackoverflow.com/a/35038374/4307047)?
It depends on what you want to achieve with map/filter etc. Ranges-v3
would be something recommendable in this scenario. You may as well take
a look at the work I did on "transducers" a few years ago:
https://www.youtube.com/watch?v=vohGJjGxtJQ. In either case, it feels to
me that finding the right data-model (based around immutable data) is a
more important challenge for you right now, since it is the most
architecturally-relevant change. Having good map/filter like API's is
more of a nice to have thing. Also note that a lot of things are just
different in Haskell due to lazy evaluation (and not necessarily
faster!) which fundamentally changes the way to think about recursion.
One reason why I'm asking about performant diffing / patching is
because I could see that as a way to answer some of these problems in
a more encapsulated way and to help with merging the state of
asynchronous commands into the rest of the application state.
The merging you can already do by carefully using the API provided by
Immer to make sure updates are done in the most efficient way
(e.g. avoid unnecessary allocations etc.) If you are going to need
immer::map or immer::set, note that "transients" are not yet supported
(but they are in the road-map). This is an optimization that would
allow you to gain some extra cycles, depending on how your data is
structured. If this eventually becomes critical for you, please
consider sponsoring that work!
Another reason is that we have hundreds of controls in our UI and we
have specific algorithms and signal/slot mechanisms to make sure that
only the minimum number of OpenGL draw calls are run to update only
the control(s) that have changed rather than re-drawing the whole
screen.
This is the part where diffing could be useful, to reconstruct the set
of changes from an mixed set of updates. There are some ways you can do
this without native immer support, and depending on your data, it might
be enough. Another technique you can used to is to provide "hints"
through your event paths, that preserve information about the nature or
scope of changes that occurred, such that UI updates can be done more
efficiently.
Our app runs on limited hardware so the user experience would
be pretty bad and the processor would be running much hotter if we had
to redraw everything all the time. I think that perhaps transitioning
into a map/reduce architecture and hooking the UI updating signal/slot
into the end of that pipeline might be an answer here but I'm only
speculating.
It is hard for me to provide you very specific performance guesses
without knowing the details of the application or the hardware. With
simple immutable data the data is often flatter and more compact than
with observable object trees (QObject and equivalent) allowing much
faster processing. Plus you can remove a lot of signal/slot firing, and
almost all mutex locking, highly improving latency and time wasted in
coordination. In the end, the only solution is really to prototype and
meassure.
Also, do you have any recommendations for maintaining pointers between
nodes in an immutable document, which stay up-to-date across changes?
The current design of our app does this everywhere. But every time I
try to think about applying that to an immutable structure, my brain
goes into an infinite loop. :) I suspect the only answer would be to
store the identity or path from the document root and then re-traverse
that path after every update to find the latest changes. Am I missing
an easier/faster way?
As you guessed, you can't do that. That is the whole point of immutable
data! If you reference something else, you need to either:
1. Store that something else, as a value, fully in the thing that
references it. Note that if you do this right the data would be
stored only once, just referenced from multiple places. But since
the data is still logically just a tree, when you update a property
of the referenced thing, you would need to update it everywhere.
2. Give the thing you want an identity, just as you sketched.
I suspect you need the second. I would argue that depending on how you
propagate the changes, this shall not be a problem at all. One tool you
can use to easily observe the changes from the outside (from mutable
interfaces) is using "lenses". I did an implementation some time ago in
a library called Atria (http://github.com/ableton/atria) where lenses
helped bind an underlying value-based data-model to a QObject based
model API that was used to build an UI using QML.
Thinking about performance in this new world is hard, because the
trade-offs are reversed from that of a mutable world. Specially if your
system is highly concurrent, I would suspect you would actually see
performance gains, even though some operations (certain classes of
updates) become slower in a single-threaded scenario. You also have to
get used to modelling things and programming in a different way, and
there are certain pitfalls that you have to avoid (like in every
programming model!). Also, the ability to safely reason locally and
separation between logic and data that this promotes is a great tool
when optimicing system, for one can be more playful with representation
and implement "global optimizations".
If you can afford it, try to make a prototype to prove yourself that
this is the right direction. I work as consultant and contractor so I
may be able to help with that.
Cheers!
JP
… Bob
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#58 (comment)
--
∿∿∿ https://sinusoid.al
∿∿∿ https://sinusoid.es
|
JP,
Wow! Thanks for such a complete reply. Your suggestions were very helpful
and I've had a chance now to dive into ranges-v3, atria/transducers and
immer, with some forays back to react/redux, over to elm, and then learning
a bit of haskell and clojure too. My brain is tired but very happy. :)
Are the settings dynamic (stored in a hash-map or alike), or static
(just structs)?
Either way, that should be easily manageable with an immutable approach.
I designed a state tree for the app with immutability in mind and it
immediately became clear that the app has evolved since its inception to a
point where its state is really quite static, so structs and immer::arrays
seem to be the answer rather than immer::maps, at least for a first pass.
Also although we have to support thousands of external device settings, the
UI-related settings are really the only ones that need to be stored in
immutable persistent state. Those revelations have opened up some really
neat ideas of performance improvements especially related to serialization
and reasoning about differences (thanks to boost.hana). Also, we now
finally have a use case for C++'s pointer-to-member operator and variadic
templates to make accessing and updating specific paths of the (immutable)
state tree re-usable and fast. Very cool!
> Another reason is that we have hundreds of controls in our UI and we
> have specific algorithms and signal/slot mechanisms to make sure that
> only the minimum number of OpenGL draw calls are run to update only
> the control(s) that have changed rather than re-drawing the whole
> screen.
This is the part where diffing could be useful, to reconstruct the set
of changes from an mixed set of updates. There are some ways you can do
this without native immer support, and depending on your data, it might
be enough. Another technique you can used to is to provide "hints"
through your event paths, that preserve information about the nature or
scope of changes that occurred, such that UI updates can be done more
efficiently.
I realized that with boost.hana and given the static nature of the state
tree, all state tree paths can be represented by a vector<int>. If the
type is a struct, then the int is the boost.hana tuple offset index of the
member to access. If the type is immer::array, then the int is an offset
into the array. So for serialization, we can iterate both trees and for
each difference generate this vector<int>, send it across the wire and
apply the same update on the other side. The largest individual array in
the tree is 27 elements, so iterating over that shouldn't really be a
performance issue. And because immer::array efficiently overlaods == we
can avoid going into any array if it hasn't changed. Simple and beautiful.
One tool you
can use to easily observe the changes from the outside (from mutable
interfaces) is using "lenses". I did an implementation some time ago in
a library called Atria (http://github.com/ableton/atria) where lenses
helped bind an underlying value-based data-model to a QObject based
model API that was used to build an UI using QML.
I searched for "lens" in atria but didn't find anything. Has the name
changed? How is this different from view_adaptors in ranges-v3?
Thinking about performance in this new world is hard, because the
trade-offs are reversed from that of a mutable world. Specially if your
system is highly concurrent, I would suspect you would actually see
performance gains, even though some operations (certain classes of
updates) become slower in a single-threaded scenario. You also have to
get used to modelling things and programming in a different way, and
there are certain pitfalls that you have to avoid (like in every
programming model!). Also, the ability to safely reason locally and
separation between logic and data that this promotes is a great tool
when optimicing system, for one can be more playful with representation
and implement "global optimizations".
I agree. A very welcome paradigm shift. Thinking in FP after thinking for
years in OOP feels like finding an island after trying to tread water in a
monsoon. :) For instance, one of the recent challenges in our codebase
spanned three classes, each containing about two dozen lines of code, and
was too complex to bother fully implementing. I just mocked up an FP
solution for the same problem using ranges-v3 in about a dozen dead-simple
lines of code in one function. And it is a full implementation which
covers every case. Amazing.
Cheers!
Thanks again!
Bob
|
Hi Bob!
Wow! Thanks for such a complete reply. Your suggestions were very helpful
and I've had a chance now to dive into ranges-v3, atria/transducers and
immer, with some forays back to react/redux, over to elm, and then learning
a bit of haskell and clojure too. My brain is tired but very happy. :)
You're welcome. And wow, you are learning very fast! It is a fun
journey though :)
I designed a state tree for the app with immutability in mind and it
immediately became clear that the app has evolved since its inception to a
point where its state is really quite static, so structs and immer::arrays
seem to be the answer rather than immer::maps, at least for a first pass.
Also although we have to support thousands of external device settings, the
UI-related settings are really the only ones that need to be stored in
immutable persistent state. Those revelations have opened up some really
neat ideas of performance improvements especially related to serialization
and reasoning about differences (thanks to boost.hana). Also, we now
finally have a use case for C++'s pointer-to-member operator and variadic
templates to make accessing and updating specific paths of the (immutable)
state tree re-usable and fast. Very cool!
Yeah! That sounds to me like the right direction. The flatter are more
static the data is, the better. So structs and arrays sound great if
you can get away with it (and more often than one would think, you
can!).
One note: basically immer::array and immer::vector are very similar, and
which one to use depends on the ammount of data and the pattern of
updates. Since the API is the same, it makes sense to start with arrays
and later move to vectors or flex vectors depending on the use-case.
I realized that with boost.hana and given the static nature of the state
tree, all state tree paths can be represented by a vector<int>. If the
type is a struct, then the int is the boost.hana tuple offset index of the
member to access. If the type is immer::array, then the int is an offset
into the array.
Exactly! And since everything is very flat and static, it sounds like
descending such tree would be _very_ fast. Also one note: I think boost
hana also supports addressing struct members via constexpr strings.
Depending on your use-case, it might make versioning the data easier.
So for serialization, we can iterate both trees and for
each difference generate this vector<int>, send it across the wire and
apply the same update on the other side. The largest individual array in
the tree is 27 elements, so iterating over that shouldn't really be a
performance issue. And because immer::array efficiently overlaods == we
can avoid going into any array if it hasn't changed. Simple and beautiful.
Completely! Sounds like you are really grasping how this paradigm works,
and how the duality between state snapshots and diffing works.
I searched for "lens" in atria but didn't find anything. Has the name
changed? How is this different from view_adaptors in ranges-v3?
It is a module called `funken`. The examples (unit tests) are kind of
minimal, so it might be a bit hard to grasp how to use it in practice.
Anyways, maybe you won't really need it.
> Thinking about performance in this new world is hard, because the
> trade-offs are reversed from that of a mutable world. Specially if your
> system is highly concurrent, I would suspect you would actually see
> performance gains, even though some operations (certain classes of
> updates) become slower in a single-threaded scenario. You also have to
> get used to modelling things and programming in a different way, and
> there are certain pitfalls that you have to avoid (like in every
> programming model!). Also, the ability to safely reason locally and
> separation between logic and data that this promotes is a great tool
> when optimicing system, for one can be more playful with representation
> and implement "global optimizations".
>
>
I agree. A very welcome paradigm shift. Thinking in FP after thinking for
years in OOP feels like finding an island after trying to tread water in a
monsoon. :) For instance, one of the recent challenges in our codebase
spanned three classes, each containing about two dozen lines of code, and
was too complex to bother fully implementing. I just mocked up an FP
solution for the same problem using ranges-v3 in about a dozen dead-simple
lines of code in one function. And it is a full implementation which
covers every case. Amazing.
Awesome :)
I agree it is a very important tool. In the end, C++ is a multi-paradigm
language. I believe that FP comes really handy when dealing with
architecture and application logic stuff, but one can still use the most
imperative style in low-level code in self-contained capsules for
performance. It takes time to get a good intution to when to do that.
In the end, we are all learning example by example, including myself.
I am indeed curious about how your project goes. Let me know eventually
how it goes, whether it is a success, but also if you find challenges or
flaws in the approach.
Cheers!
JP
…--
∿∿∿ https://sinusoid.al
∿∿∿ https://sinusoid.es
|
Thanks JP. Yeah it's a wild new world, and fun to be on the cutting edge doing it in C++. :) I appreciate your labors to explore this frontier!
Could you elaborate how this would help with versioning? And is there a way to use those strings at runtime? I'm still green with understanding TMP-to-runtime communication and I haven't actually started using boost.hana yet. |
It helps with versioning if you are persisting these data-structures and cursors into the data-structure, such that the cursors become more resilient if you reorder the elements of the structs. And yes, I think you can use those strings at runtime too, just iterate over the members of the struct (at compile time or run-time), which might come super-handy to do the diffing in a generic way: https://www.boost.org/doc/libs/1_61_0/libs/hana/doc/html/group__group-Struct.html |
Ok that makes sense. Very cool.
…On Mon, Oct 8, 2018 at 3:11 PM arximboldi ***@***.***> wrote:
Could you elaborate how this would help with versioning? And is there a
way to use those strings at runtime? I'm still green with understanding
TMP-to-runtime communication and I haven't actually started using
boost.hana yet.
It helps with versioning if you are persisting these data-structures and
cursors into the data-structure, such that the cursors become more
resilient if you reorder the elements of the structs. And yes, I think you
can use those strings at runtime too, just iterate over the members of the
struct (at compile time or run-time), which might come super-handy to do
the diffing in a generic way:
https://www.boost.org/doc/libs/1_61_0/libs/hana/doc/html/group__group-Struct.html
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#58 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACoMHj--_4brzoj-AlZIhDo34ROgqiuHks5ui6M8gaJpZM4Wy8lk>
.
|
Juanpe,
I'm currently considering options to simplify the application state management in a large app. So far the whole app has been coded imperatively and it gets somewhat tricky because one requirement we have is to track and send updates externally of only the state that has changed. It's becoming somewhat unmanageable as more new global features are being added...I guess we're just reaching the point of diminishing returns with the current approach. So I started looking into functional programming techniques which led me to immutable data structures and I came across your library. This is exciting stuff. One thing that is holding me back though from testing immer is the lack of diff / patch support. Since high performance is also a requirement but since we need to be very aware of updating only what has changed, support for efficient diffing would be essential. What is the status of this? And what is the level of difficulty to implement it?
The text was updated successfully, but these errors were encountered: