Add support for mutagen as a sync strategy #603
Comments
That's interesting, especially the maintained part. The real question is performance: do you have any data on this? Unless it is faster, we don't really have a use for it. |
It would be worth benchmarking this in detail, but it seems faster to me. A common use case is when running a Also, file notifications are natively supported: when a file is changed in the editor (Windows side), it's synced within a second on the remote side. |
Hi @EugenMayer, I'm the developer of Mutagen. I feel like I should chime in, because I'm actually still having a discussion with people about Mutagen's integration with docker in mutagen-io/mutagen#34. I'm not really sure that it makes sense to add Mutagen as a strategy for docker-sync. I think that Mutagen is actually capable of synchronizing files directly into the container. The short summary of Mutagen is that it carries around a collection of executables that it can copy to a remote target and then use to synchronize with over a standard input/output stream. Imagine if rsync had a copy of rsync for every platform, and instead of requiring there to be an rsync executable on the remote, it simply copies an rsync executable for the remote platform (e.g. using At the moment, Mutagen supports doing this via Regarding performance, Mutagen generally takes about 1.4x as much time as rsync to synchronize a file tree over SSH, though Mutagen synchronizes changes bidirectionally like Unison. Changes also generally propagate in less than 200 ms, but obviously this depends on the size of the project and your network latency. For the Linux source tree, it's closer to around 400-500 ms. My only concern at the moment is how efficient |
@havoc-io thanks for sharing your opinion! If Mutagen is basically a drop-in, multi-platform scp/ssh, that is in itself slower than rsync, as you mentioned, usually by even more than 1.4x, since rsync is faster on resync because it can compare. Still, rsync has no catalog and suffers from the same issues scp has, while Unison does better there. scp also has no syntax for excluding folders, while rsync does. This is essential for use with docker-sync, since one wants to sync Actually, Unison under Linux is not too bad for a heavily loaded, fast-changing filesystem, especially considering that it uses inode events. The problems are huge or fast changes across the whole tree that trigger a cascade of events, e.g. branch changes. Every child file triggers an event, but so does the parent folder; then the queue fills up, the order gets mixed up, and you get conflicts. After all, it's all about FS events. If Mutagen is designed to run on the host, it will waste a lot of CPU on the horrible FS-event "watchers" available for HFS/APFS/OSX. Using OSXFS to move that to Linux (native_osx) helps a lot here, but OSXFS sometimes just stalls, and this stalling eats inode events, which would end up being the same issue whether with scp/rsync/unison or Mutagen. In the end, we have 2 aspects: b) is horrible when trying to get it done on OSX, so we stick to OSXFS to move that layer to Linux, since Docker has done a low-level implementation of FS event propagation which is far better than unox/fswatch/whatever under OSX. As you might see, the whole thing is a very complex topic and not just "some sync tool". Since every sync tool needs to watch for changes, every OSX-based implementation will stumble over b) and become unusable from about 10k watched files (80% CPU usage, depending a little on the number of folders). |
@EugenMayer I think I may have miscommunicated Mutagen's design. Like Unison, Mutagen retains a filesystem cache, performs a three-way content merge, and uses the rsync algorithm to transfer files when propagating changes. It also uses rsync to transfer the serialized snapshot of the filesystem, something Unison doesn't do and which can start to slow down performance when you start talking about 100,000+ files. Mutagen also has support for ignoring paths using a gitignore-like syntax, portably propagating symlinks (including to Windows), and natively watching filesystem locations (using native recursive watching on Windows and macOS, and a hybrid inotify/polling mechanism on Linux that is fast while avoiding exhausting watch descriptors). The design goal for Mutagen is to be able to synchronize 100,000 - 1,000,000 files in effectively real-time.
Absolutely. In any case, I still think Mutagen in the context of docker-sync doesn't really make sense. I think it makes more sense for Mutagen to support synchronization directly into Docker containers. |
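For readers following along, the three-way merge mentioned above can be sketched roughly as below. This is an illustration only and not Mutagen's actual code: the function name `reconcile`, the snapshot representation (a dict mapping path to a content digest), and the choice to keep the ancestor entry on conflict are all assumptions made for the example.

```python
def reconcile(ancestor, alpha, beta):
    """Three-way merge of {path: digest} snapshots (hypothetical sketch).

    ancestor is the last agreed state; alpha and beta are the two endpoints.
    Returns (merged, conflicts).
    """
    merged = {}
    conflicts = []
    for path in sorted(set(ancestor) | set(alpha) | set(beta)):
        base = ancestor.get(path)  # None means "does not exist"
        a = alpha.get(path)
        b = beta.get(path)
        if a == b:
            result = a        # both sides agree (including both deleted)
        elif a == base:
            result = b        # only beta changed, so take beta
        elif b == base:
            result = a        # only alpha changed, so take alpha
        else:
            conflicts.append(path)
            result = base     # assumption: leave the ancestor entry untouched
        if result is not None:
            merged[path] = result
    return merged, conflicts


ancestor = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
alpha = {"a.txt": "2", "b.txt": "1", "c.txt": "3"}
beta = {"a.txt": "1", "c.txt": "4", "d.txt": "9"}
merged, conflicts = reconcile(ancestor, alpha, beta)
print(merged, conflicts)
```

Here `a.txt` was edited only on alpha, `b.txt` was deleted only on beta, `d.txt` was created only on beta, and `c.txt` was edited on both sides, so only `c.txt` is reported as a conflict. The ancestor snapshot plays the role of the cache that lets the tool tell "changed here" from "changed there".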
That's not really an argument for or against using it with docker-sync. docker-sync tries to offer all types of strategies. E.g. the unison strategy does host-to-container sync, as does the rsync strategy. The native_osx strategy does OSXFS to container (no sync in that regard) and then Unison inside the Linux-based container (better FS events); maybe have a look at https://github.com/EugenMayer/docker-sync/wiki/8.-Strategies In my head, I would try to implement Mutagen for both: the native_osx strategy and also as a direct host-to-container sync. The latter is actually only problematic due to the usual CPU usage on OSX for folder watching. That is my current concern, and I see you have dug far deeper into that than I expected. @havoc-io can you sum up why one could expect better file-watching performance than with unox/fswatch? I think that is the main bottleneck for performance right now. Everything else seems very well thought out already; I just misunderstood you in the first place, sorry! |
I plan to put out a prototype build of Mutagen supporting Docker natively next week to let people play with it. I think it would be useful to experiment with that first before trying to build a docker-sync strategy, but I'm not against the idea of a docker-sync strategy if someone wants to write it.
FSEvents itself is a very lightweight API, so I'm not sure why you're seeing high CPU usage. The Mutagen's use of filesystem watching is separate from its synchronization algorithm. No filesystem watching API (ReadDirectoryChangesW, FSEvents, inotify, kqueue, FEN, etc.) provides enough information to fully track the state of a directory on disk. Either the notifications are unreliable or simply don't contain enough information to reconstruct the state of the directory. As a result, Mutagen only uses filesystem change events to trigger a synchronization cycle (which uses a Unison-like algorithm that does a full rescan with a filesystem cache for performance). Mutagen also has some other clever optimizations. For example, it has a coalescing window on filesystem notifications of 10 milliseconds, so if a user does something like create or remove a directory (where a large number of notifications will be generated within microseconds or milliseconds of each other), Mutagen will wait to see all notifications associated with the operation before it starts a synchronization cycle. If you're only using unox to trigger Unison, then my guess is that you could write a much more lightweight filesystem watcher than the watchdog package. |
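To make the coalescing idea above concrete, here is a small sketch. It is not taken from Mutagen: the function name `coalesce` is made up, and it assumes one possible reading of the window, namely that a burst ends once no new notification arrives within the window (Mutagen may instead measure from the first notification). Each resulting burst would trigger a single synchronization cycle.

```python
def coalesce(event_times_ms, window_ms=10):
    """Group ascending event timestamps (milliseconds) into bursts.

    A new burst starts whenever the gap since the previous event is
    larger than window_ms. Hypothetical sketch, not Mutagen's code.
    """
    bursts = []
    for t in event_times_ms:
        if bursts and t - bursts[-1][-1] <= window_ms:
            bursts[-1].append(t)   # still inside the current burst
        else:
            bursts.append([t])     # quiet period elapsed, start a new burst
    return bursts


# Three notifications from one directory removal, two from a later save,
# and one isolated change.
bursts = coalesce([0, 1, 4, 100, 105, 500])
print(len(bursts), bursts)
```

Six notifications collapse into three bursts, so the comparatively expensive rescan runs three times instead of six.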
A quick update on this issue: I've just put out Mutagen General information about using Mutagen can be found on the official site and Docker-specific documentation is available here. This support is intentionally minimal and low-level. The idea is for this support to be integrated into higher-level orchestration tools and workflows, so a docker-sync strategy might make sense. Because this support is experimental, I have opened mutagen-io/mutagen#41 for feedback. There might be additional features, commands, flags, environment variables, etc. that would be useful for integration. Please feel free to play around with the Docker support that exists and provide any feedback that you might have. I'm happy to help however I can to make integration work smoothly. |
I did have a play with Mutagen natively, but had issues with permissions not being transferred so I gave up. I'd be keen to test this if it gets integrated with docker sync though! |
@matthew-gill If you have a few minutes in the near future, can you open an issue (for Mutagen) with details of the permissions issues you experienced? Mutagen's permission propagation is intentionally somewhat limited. It only sets user permission bits (not group/other) on new files for security reasons (since it might not make sense to have something that's other-readable on your laptop be other-readable in a remote environment). For existing files, it will preserve permissions when updating them, so you can perform a Specifically, are you looking to have group/other permissions transferred by default? Are you looking to have user/group ids transferred? setuid? In the future I'm looking to add "raw POSIX" and "raw Windows" permission propagation modes, so understanding the use case will help me work out the best options. |
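The permission rule described above can be illustrated with a few lines of Python. This is a hypothetical sketch of the stated behaviour, not Mutagen's implementation: the function name `propagated_mode` is invented, and it assumes that all user bits (read, write, execute) are carried over for new files, which the comment does not spell out.

```python
import stat


def propagated_mode(source_mode, existing_mode=None):
    """Return the mode a synced file would get under the rule above.

    New files: keep only the user permission bits of the source.
    Existing files: keep whatever permissions the target already has.
    """
    if existing_mode is not None:
        return stat.S_IMODE(existing_mode)            # preserve target permissions
    return stat.S_IMODE(source_mode) & stat.S_IRWXU   # drop group/other bits


print(oct(propagated_mode(0o755)))
print(oct(propagated_mode(0o644)))
print(oct(propagated_mode(0o755, existing_mode=0o640)))
```

Under these assumptions a `0o755` script arrives as `0o700` and a `0o644` file as `0o600`, which matches the experience of group/other permissions "not being transferred", while a file that already exists on the target keeps its own mode.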
@havoc-io yes I'll open the issue :) thanks For reference, I'm developing a symfony app using docker, docker runs as root, web server runs as www-data.... need those permissions xferred without having to worry about chmodding if possible!! |
@havoc-io thank you for the help here and the follow-up. Currently, due to time constraints and more important issues to work on with docker-sync, I have to park this issue without any schedule for picking it up. If anybody would like to volunteer here, I would be glad to see this happen. Implementing new strategies is rather simple; docker-sync was made for that. See https://github.com/EugenMayer/docker-sync/wiki/6.-Development Thanks |
@EugenMayer No problem on parking the issue. I may have some time in the not-too-distant future to look into putting together a docker-sync strategy if nobody else volunteers in the mean time. At the moment there are a few users playing with Mutagen's native Docker support, so once we've smoothed out the rough edges there, it might fit in even more easily as a docker-sync strategy. I'll keep you updated. Thanks! |
Sounds great! Looking forward to any progress, and happy to answer questions or do some sparring. |
@havoc-io did you have any proper / satisfying results with Mutagen? |
@EugenMayer To my understanding, there are a number of people now using Mutagen to synchronize into Docker containers, though I haven't had a chance to work on a docker-sync strategy. This release cycle has been primarily focused on additional synchronization modes and other features. I'm hoping to work on improving scriptability during the next release cycle in a few weeks, which will probably make a docker-sync strategy a bit easier to implement. However, to be quite honest, I don't see myself having time to work on a docker-sync strategy for at least 2 months, and I think it would make more sense to let someone else take up the mantle if they'd like. If you'd like to close this issue for now, I can add a message once the scripting features I've mentioned are finished, and then perhaps someone else can re-open if they want to continue the discussion. |
@havoc-io fair enough, thank you for being transparent! I'll keep the issue open in case somebody wants to pick this up. I think we have time to wait until it becomes mission-critical, if it ever does. Thank you for the update! |
I started using mutagen, it's pretty straight-forward. I guess it can be used directly but having it as a strategy might make sense for some people too. If it turns out to be stable personally I probably would use it instead to keep it simple. |
Hello @havoc-io, I have just a few questions.

services:
  php-fpm:
    build:
      context: .
      dockerfile: docker/php-fpm/Dockerfile
      target: development
      args:
        www_data_uid: 1000
        www_data_gid: 1000
        project_root: .

The development stage does not copy the files under $PWD into the image, so it is clean inside. The second thing is that even if I manually wait until synchronization is done, with VCS skipped, thus in this case
I tried it with Linux and also Windows. Another notice: I found out that even when there are no changes on the watched path the |
@boris-brtan I would move questions like this to the Mutagen issue tracker so that we don't bug the docker-sync developers too much. But just to answer your questions real quick:
You can ensure that a synchronization cycle has completed by using the
Can you give me some more details on the statistics of the files being synchronized? E.g. directory count, file count, total size, etc.? 3x seems exceptionally high. As I said, perhaps move that discussion to the Mutagen issue tracker.
This is due to Mutagen's Linux file watching strategy which opts for correctness at the cost of CPU usage. You should be able to reduce this by setting a higher polling interval on that endpoint (e.g. |
OK, i will gather the data and create the issue at [havoc-io/mutagen] |
Given that a docker-sync strategy doesn't currently exist, could someone please be so kind as to explain the relative strengths of mutagen vs docker-sync, to assist in choosing between the two tools? To frame the question differently: if (hypothetically speaking) a docker-sync strategy did currently exist, why would one want to use mutagen as a docker-sync strategy, rather than just using mutagen directly? Also, should mutagen be added to the docker-sync alternatives page ( https://docker-sync.readthedocs.io/en/latest/miscellaneous/alternatives.html )? |
Just saw that https://blog.rocketinsights.com/speeding-up-docker-development-on-the-mac/ speaks directly to my first framing of the question. |
I'm guessing that part of the answer to my re-framed question is that using mutagen as a docker-sync strategy would eliminate some of the manual setup involved in using mutagen directly. |
I think you pretty much nailed it. docker-sync never set out to compete with rsync, unison, osxfs or, nowadays, Mutagen. Rather, the purpose was, compared to most alternatives, to provide a convenient wrapper for those strategies, with the ability to swap the strategy by changing one configuration option. The reason for that was mainly to acknowledge that there are very different projects out there: projects with a lot of files (Drupal), projects with a lot of files pulled in during the build (node/npm), projects with heavy cache usage inside the project folder during the build (Java with Gradle and others), and of course different code sizes in general. Those different project characteristics, together with the variation in the workflow needed (one-way sync, two-way sync), make it key to be able to switch sync strategies. They all have different strengths that fit some scenarios better than others. So the individual choice per project and need, given that no single strategy is yet generically better than the others in all fields, is the reason behind docker-sync's idea. Not to forget that bringing the different sync strategies together helps build a more sustainable and wider community that shares a lot while just picking different strategies at runtime :) Hope this helps |
A small ping to let everyone know that those |
Just an update, docker-sync does also support |
stale, closing |
Feature Request
Mutagen is a two-way synchronization tool written in Go. Compared to Unison, it's maintained, much faster, and more stable. It can also be deployed automatically to the remote side through SSH. Have a look at the README and docs for more information.
It would be interesting to get Mutagen implemented as a new strategy for docker-sync.