
Bidirectional sync robustness #8

Closed
cjnaz opened this issue Aug 6, 2018 · 22 comments
Labels: enhancement (New feature or request)

Comments

cjnaz (Owner) commented Aug 6, 2018

Up to version 2.1, rclonesync has used a sync process that first makes Path1 the authoritative copy and then uses rclone sync to replicate Path1 to Path2. Changes on Path2 are identified and individually copied to Path1 (or deleted on Path1 if the file was deleted on Path2) before that final sync. This process flow works well when changes are infrequent. The issue arises when there are changes on the filesystems during the rclonesync run: in certain circumstances the new changes will be lost - see the orange labels in the pictures and the tables below.
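
For readers following along, here is a rough, self-contained sketch of that V2.1-style flow (my illustration only, not the actual rclonesync code; the paths and the prior-listing handling are simplified assumptions):

```python
import subprocess

def rclone_lsl(path):
    """Parse `rclone lsl` output into {relative_path: (size, timestamp)}."""
    out = subprocess.run(["rclone", "lsl", path],
                         capture_output=True, text=True, check=True).stdout
    listing = {}
    for line in out.splitlines():
        size, date, time_, name = line.split(None, 3)
        listing[name] = (size, f"{date} {time_}")
    return listing

def sync_v21_style(path1, path2, prior_path2_lsl):
    """Apply Path2 changes to Path1, then mirror Path1 back onto Path2."""
    current_path2 = rclone_lsl(path2)
    # Files new or changed on Path2 since the prior run: copy them to Path1.
    for name, meta in current_path2.items():
        if prior_path2_lsl.get(name) != meta:
            subprocess.run(["rclone", "copyto", f"{path2}/{name}", f"{path1}/{name}"], check=True)
    # Files deleted on Path2 since the prior run: delete them on Path1 too.
    for name in set(prior_path2_lsl) - set(current_path2):
        subprocess.run(["rclone", "deletefile", f"{path1}/{name}"], check=True)
    # Finally, make Path2 an exact mirror of the now up-to-date Path1.
    subprocess.run(["rclone", "sync", path1, path2], check=True)
    # Fresh listings are captured for the next run's delta detection.
    return rclone_lsl(path1), rclone_lsl(path2)
```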

Changes are proposed for rclonesync's process flow to lessen the chance of data loss. Please provide your thoughts and feedback on this proposal.

V2.1 and prior sync process flow

[diagram: V2.1 and prior sync process flow]

Sync robustness and risk of data loss in the V2.1 process flow

| Changed at | Handling | Status |
|---|---|---|
| A only (normal case) | File copied from Path2 to Path1. | Good |
| A and B (changed again after initially identified as changed) | B version copied to Path1 and captured correctly in the Final LSLs. | Good |
| B only or F only (not identified at A) | Change will be lost in the final rclone sync Path1 to Path2. Path1 version survives. | Data loss |
| C only | Change will be copied from Path1 to Path2 by the final rclone sync. | Good |
| A and (D or E) (changed during file copy or sync operations) | File access conflict. rclone will retry. Change at destination will be lost. Source version survives. | Possible Critical Error Abort |
| A and F (changed after copy Path2 to Path1) | F change will be lost in the final rclone sync Path1 to Path2. A version survives. | Data loss |
| G only | Change will be copied to Path2 in the next rclonesync run. | Good. Out of sync until next run. |
| A and H | H version captured in Path2 Final LSL and not identified as a delta on the next run. The next run will push the Path1 version to Path2, losing the H version. | Out of sync until next run. Data loss on next run. |

V2.2 proposed new sync process flow

The picture shows handling of changes identified at time A on Path2. With the V2.2 process flow the Path1 handling would be the same as the Path2 handling, so just switch Path1 and Path2 in the drawing.
This sync process might be faster as it eliminates the rclone sync operation, which I suspect contains its own rclone lsl operations.
[diagram: V2.2 proposed sync process flow]

Sync robustness and risk of data loss in the V2.2 process flow

| Changed at | Handling | Status |
|---|---|---|
| A only (normal case) | File copied from Path2 to Path1. | Good |
| B only or C only (changed after initial LSLs) | Change is missed. File not copied. | Good. Out of sync until next run. |
| A and C (changed on Path1 after identified as changed on Path2) | Path2 version survives. Path1 version is lost. | Data loss |
| A and (D or E) (changed or deleted during actual copy) | File access conflict. rclone will retry. Likely E version is lost and D version survives. | Possible Critical Error Abort |
| A and (F or G) (changed after copy) | Identified as different in the two Final LSLs. Earlier version saved in both Tracking LSLs. | Will be caught in next sync run |
| A and B | B version copied, but rclonesync is unaware that it copied the B version. B version identified in Final LSL and updated in Path1 and Path2 Tracking LSLs. | Good |

cjnaz (Owner) commented Oct 2, 2018

NOTE that the above "V2.1 sync process flow" is still how it works as of V2.3. The above "V2.2 proposed new sync process flow" should be called "Future proposed sync process flow".

Jwink3101 commented Apr 24, 2019

I faced this issue before when I wrote my own sync script (which does not yet support rclone, but I am strongly considering adding it).

There are many issues with using an old file list ("lsl") to generate deltas and then propagating deltas:

  • if the old list is corrupted or lost, then you are largely SOL
  • Usually this is accompanied by a destructive sync. This means that you:
    • can lose data (as you noted)
    • must rely on understanding, with 100% certainty, what is going to happen
  • Supporting multiple remotes, if I choose to, becomes slightly harder

I decided on an alternative paradigm. I used old lists to determine:

  • new files
  • deleted files
  • moved files (if you want to track that)

Then, to decide which files to move which way, I would just use mod time. Finally, I wouldn't rely on the tool of choice to be anything but a dumb transfer agent. So with my tool as it stands now, I use rsync just for efficient transfers.
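
As a rough illustration of that idea (my own sketch, not Jwink3101's actual code, and it omits move tracking): the old and new listings classify additions and deletions, and mod time breaks the tie for files present on both sides.

```python
def plan_transfers(old1, new1, old2, new2):
    """Each argument maps relative path -> mtime (float seconds).
    Returns what to copy to each side and what to delete."""
    to_side1, to_side2, deletes = [], [], []
    for path in set(new1) | set(new2):
        on1, on2 = path in new1, path in new2
        if on1 and not on2:
            # Only on side 1: either newly created there, or deleted on side 2.
            (deletes if path in old2 else to_side2).append(path)
        elif on2 and not on1:
            # Only on side 2: either newly created there, or deleted on side 1.
            (deletes if path in old1 else to_side1).append(path)
        elif new1[path] != new2[path]:
            # Present on both sides but different: newest mod time wins.
            (to_side2 if new1[path] > new2[path] else to_side1).append(path)
    return to_side1, to_side2, deletes
```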

Now, to speed it up (and this applies to both rsync and rclone), I did use the "sync" mode, but I specified files and told it to ignore its own mod-time checks. This reduces overhead (the SSH handshake for rsync, the file listing for rclone) and, in the case of rclone, lets me specify threads.
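
A sketch of what that might look like with rclone (the flags are real rclone options; the paths, list file, and transfer count are placeholders, and copy is shown here rather than sync for safety):

```python
import subprocess

# Hand rclone an explicit list of files to transfer, skip its own mod-time/size
# comparison, and run several transfers in parallel.
subprocess.run(
    ["rclone", "copy",
     "--files-from", "to_side2.txt",   # only the files our own delta logic selected
     "--ignore-times",                 # transfer them unconditionally
     "--transfers", "8",               # parallel "threads"
     "/local/data", "remote:data"],
    check=True,
)
```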

Addendum: this method can still be affected by files being added during a sync, but the implications are pretty minor. I cover it in my FAQs, but the basic idea is that you may fail to sync new data, yet you are unlikely to lose it (possible, but a very remote chance).

Jwink3101 commented:

Just a follow-up to my post: I added rclone to my tool, if you want to compare methodologies. I am not saying mine is better; it is just an alternative approach that may be of interest to you. It was also not originally written with rclone in mind, so that brings its own limitations.

Best of luck on yours!

cjnaz (Owner) commented May 9, 2019

Good to have a link here to your solution. Regards.

Jayashree-nach commented Aug 17, 2020

What are the merits of bidirectional syncing? If I run an rclone copy from local to remote and from remote to local, the same thing happens.

For both the bidirectional sync and a one-way sync I have to run a cron job, right?

Could you kindly jot down the differences?

Thanks in advance.

Jwink3101 commented:

> What are the merits of bidirectional syncing? If I run an rclone copy from local to remote and from remote to local, the same thing happens.
>
> For both the bidirectional sync and a one-way sync I have to run a cron job, right?
>
> Could you kindly jot down the differences?
>
> Thanks in advance.

What you are proposing, assuming you use the flag to always keep the newest, is what I call a union-sync.

Files on side A and side B will be propagated both ways keeping the newest.
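
For concreteness, the naive version being discussed looks roughly like this (paths are placeholders, and --update is my assumption for "the flag to always keep the newest"; it tells rclone to skip files that are newer on the destination):

```python
import subprocess

# Copy in both directions, letting the newer copy of each file win.
# Nothing is ever deleted or moved -- which leads directly to the flaws listed below.
for src, dst in [("/local/data", "remote:data"), ("remote:data", "/local/data")]:
    subprocess.run(["rclone", "copy", "--update", src, dst], check=True)
```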

But, that has some fundamental flaws. Roughly in order:

  • deletes will never be propagated unless you delete on both sides. Otherwise the file will just be recreated on the other side each time! (This is the biggest one to me!)
    • edge cases also include a delete on one side and a modification on the other (no idea how this code handles it, but mine checks for that)
  • Moves are not propagated either. It will copy the old file back as well as the new one, so now you have two copies. Even if the tool doesn't track moves, a proper sync tool will see a new file and a deleted file.
  • Conflict resolution is both buried in the logs and hard to decipher. By using the keep-newest flags, you always keep the latest file. By not using them and changing the sync order, you can always keep one side or the other. But in doing so, you still need to carefully read the logs to figure out what happened.

There's probably more I am missing but I think this covers enough for me to want a proper sync tool.

If none of that bothers you, what you are proposing works well enough. It's a common rsync question too. But for me, since I do delete and move files, it's a no-go. I tend not to have conflicts often, but when I do, I also like to know about it very clearly.

I hope this helps.

cjnaz (Owner) commented Aug 17, 2020

rclone sync (and rclone copy) path1 path2 blindly makes path2 look like path1. If you had modified or deleted files on path2 then those changes are lost/overwritten by the path1 content. A bi-directional sync needs to comprehend changes on each side and make the matching changes on the other side.
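
To make that concrete (placeholder paths; a sketch, not a recommendation):

```python
import subprocess

# One-way mirror: after this runs, path2 matches path1 exactly.  A file added or
# edited only on path2 since the last run is simply deleted or overwritten.
subprocess.run(["rclone", "sync", "/local/path1", "remote:path2"], check=True)
# A bidirectional tool must therefore detect path2-side changes first and apply
# them to path1 (or flag a conflict) before any mirroring step runs.
```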

Jayashree-nach commented Aug 20, 2020 via email

cjnaz (Owner) commented Aug 20, 2020

You are running rclonesyncs in a loop? The lock file enforces that only one job runs at a time. Try running rclonesync with --verbose repeatedly and manually in a CMD window. If the lock error comes back, look carefully at the error log.

Jayashree-nach commented:

Thanks. Let me check and get back to you with any queries...

Jayashree-nach commented Aug 24, 2020 via email

Jayashree-nach commented Aug 24, 2020 via email

cjnaz (Owner) commented Aug 25, 2020

The just-posted V3.0 implements some aspects of this thread. Closing this issue as obsolete.

@Jayashree-nach,
I'd suggest running V3.0 on your whole 400GB tree. If only a few files are changing, they are sync'd very quickly. It will depend on how quickly rclone lsl runs on your whole tree.
Alternatively, you can be selective about which subdirectories you want to include/exclude from the sync. See FILTERING.md.
I've added some basic intro info at the top of the README.md which may help explain what's going on.

regards,
cjn

cjnaz closed this as completed Aug 25, 2020

Jayashree-nach commented Aug 26, 2020 via email

Jayashree-nach commented Aug 28, 2020 via email

cjnaz (Owner) commented Aug 28, 2020

This is something new. Please open a new issue.

cjnaz (Owner) commented Aug 29, 2020

I'll copy this into the new issue that you open. To get started...

The log isn't providing enough info for debugging. The exception was tripped in load_list, but there is no exception string, so I don't have any clues as to what the problem is. Let's try a few things…

  • Try an rclonesync run (without --first-sync) immediately after your --first-sync run, and in the same CMD window.
  • Please look at the file C:\Users\Administrator/.rclonesyncwd/LSL_D__NewPImage_remote_eMR-Testing__Path2 for anything odd. (On Windows, use all backslashes \.) Is it empty? Do you see any odd lines?
  • Are you logged in as the user Administrator for both the --first-sync run and the non --first-sync run? Perhaps there is a permission issue with the prior Path2 LSL file.

Possibly, the issue is related to character encoding:

  • Is your locale set to something other than UTF8?
  • If your locale is set to UTF8, are there characters in some of your filenames that are not UTF8, and do you see them in the LSL file? (A quick check sketch follows this list.)
  • If you think it is a locale issue, please post your failing LSL file to https://drive.google.com/drive/folders/1FuHvtoezlesiK4btn0Jr8yhi4VQQ1xOr?usp=sharing. I'll delete it after I've grabbed it. Please try to make your test case reasonably small and avoid personal information.
  • In your batch file, are you setting chcp 65001 and set PYTHONIOENCODING=UTF-8, or running your batch file from a CMD shell where you have already set these?
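
If you want to check the non-UTF8 bullet above quickly, a throwaway sketch like this (using the LSL file path mentioned earlier, written with backslashes) will flag any line that doesn't decode cleanly:

```python
# Flag lines in the Path2 LSL file that are not valid UTF-8.
lsl = r"C:\Users\Administrator\.rclonesyncwd\LSL_D__NewPImage_remote_eMR-Testing__Path2"
with open(lsl, "rb") as f:
    for lineno, raw in enumerate(f, 1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            print(f"line {lineno}: {err}")
```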

I have a 3.1 rev with some changes to the error logging in load_list that I'll post if needed.

If it looks like something related to your Path2 LSL file, would you please upload it to
https://drive.google.com/drive/folders/1FuHvtoezlesiK4btn0Jr8yhi4VQQ1xOr?usp=sharing. Please also upload logs of the --first-sync and non --first-sync runs with --verbose --verbose and --rc-verbose --rc-verbose.

cjnaz (Owner) commented Aug 29, 2020

Try eliminating the loop in your batch file.

Also try running V2.11. It is available at https://github.com/cjnaz/rclonesync-V2/tree/v2.11. If that still doesn't work, please try V2.10. You only need to download (copy, paste into a local file) the rclonesync.py file. You don't need to download everything. You can rename the files to different versions so that they are all available for testing.


Jayashree-nach commented Aug 31, 2020 via email

cjnaz (Owner) commented Sep 1, 2020

If you have some useful info to help isolate this bug, please open a new issue.

  1. Is there any time limit or max_execution_limit / max_file_size?
     No.

  2. Does this rclone sync work fine on huge volumes of data?
     Yes.

  3. Can I know the max_upload_size? (It takes nearly 4 to 5 hours for a non-first-sync run to traverse my 300GB tree.)
     The rclone lsl execution time will give you a good indication of the minimum run time, plus the time to transfer changed/new files. How long does it take to do an rclone lsl on your remote? rclonesync run time will be dominated by your internet bandwidth. How many GB do you expect to change from one rclonesync run to the next? 10GB? 50MB?

  4. Are there any limitations on data?
     Same as Q1?

I suggest that you look into rclone mount. If your remote is Dropbox or Google Drive, I suggest that you use their agents on Windows.

Jayashree-nach commented Sep 1, 2020 via email

cjnaz (Owner) commented Sep 2, 2020

Please continue the issue/discussion in #59
