
Sync fails when files being naively written to #24

Open
mjg0 opened this issue May 3, 2019 · 2 comments
Labels
limitation Known problem with no planned path for solution

Comments

mjg0 commented May 3, 2019

Background: I'm trying to set up rclonesync.py for an HPC environment, meaning that good I/O practices can't be guaranteed. Users who don't know how to avoid hammering files, or who out of necessity use programs that aren't designed for HPC or don't handle I/O properly, could be writing to files many times per second at sync time.

When a file is being constantly written to, though, running rclonesync.py fails hard enough that it must be rerun with --first-sync before it can be run normally again. I can understand why it would fail--a file that's getting hammered isn't a good candidate for syncing--but is there anything preventing the failure from being graceful, i.e. allowing the next try to succeed without requiring --first-sync? I know that bidirectional syncing is a hard problem, so I may be missing something obvious that means one simply has to be careful not to engage in foolishness like for i in {1..99999}; do echo $i >> myfile; done while syncing. However, if there is a way to more gracefully handle such cases, it would mean a more robust rclonesync.py.
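One ad hoc mitigation I could imagine (my own sketch, not anything rclonesync.py provides) is to test whether a file is still growing before letting it into a sync, e.g. by sampling its size twice. The `is_changing` helper and the one-second sampling window below are assumptions:

```shell
# is_changing FILE: succeed (exit 0) if FILE's size changes over a short
# interval, i.e. the file appears to be actively written to. The 1-second
# window is an arbitrary choice; a file hammered in bursts may still slip
# through undetected.
is_changing() {
    size1=$(wc -c < "$1")
    sleep 1
    size2=$(wc -c < "$1")
    [ "$size1" -ne "$size2" ]
}

# Hypothetical usage: list unstable files in the current directory,
# e.g. to feed into a filters file before running rclonesync.py.
# for f in *; do [ -f "$f" ] && is_changing "$f" && echo "$f"; done
```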

The following case illustrates my point. In essence, I run rclonesync.py while a local file is being written to 100 times per second, and cause a critical error abort:

$ rclone mkdir dropbox:rclonesync_testdir
$ mkdir rclonesync_testdir
$ cd rclonesync_testdir/
$ echo "file 1" > file1.txt
$ rclone copy file1.txt dropbox:rclonesync_testdir
$ rclonesync.py . dropbox:rclonesync_testdir --first-sync # works
2019-05-03 16:22:24,816:  ***** BiDirectional Sync for Cloud Services using rclone *****
2019-05-03 16:22:24,845:  Lock file created: </tmp/rclonesync_LOCK_._dropbox__rclonesync_testdir_>
2019-05-03 16:22:24,846:  Synching Path1  <./>  with Path2  <dropbox:/rclonesync_testdir/>
2019-05-03 16:22:24,846:  Command line:  <Namespace(Path1='.', Path2='dropbox:rclonesync_testdir', check_access=False, check_filename='RCLONE_TEST', config=None, dry_run=False, filters_file=None, first_sync=True, force=False, max_deletes=50, no_datetime_log=False, rc_verbose=None, rclone='rclone', rclone_args=None, remove_empty_directories=False, verbose=False, workdir='/fslhome/micgre93/.rclonesyncwd')>
2019-05-03 16:22:24,846:  >>>>> --first-sync copying any unique Path2 files to Path1
2019-05-03 16:22:26,280:  >>>>> Path1 Checking for Diffs
2019-05-03 16:22:26,280:  >>>>> Path2 Checking for Diffs
2019-05-03 16:22:26,280:  >>>>> No changes on Path2 - Skipping ahead
2019-05-03 16:22:26,280:  >>>>> Synching Path1 to Path2
2019-05-03 16:22:26,876:  >>>>> Refreshing Path1 and Path2 lsl files
2019-05-03 16:22:27,494:  Lock file removed: </tmp/rclonesync_LOCK_._dropbox__rclonesync_testdir_>
2019-05-03 16:22:27,494:  >>>>> Successful run.  All done.
$
$
$ for i in {1..1000}; do sleep 0.01; echo $i >> file2.txt; done & rclonesync.py . dropbox:rclonesync_testdir
[1] 84345
2019-05-03 16:23:31,004:  ***** BiDirectional Sync for Cloud Services using rclone *****
2019-05-03 16:23:31,037:  Synching Path1  <./>  with Path2  <dropbox:/rclonesync_testdir/>
2019-05-03 16:23:32,648:       1 file change(s) on Path1:    1 new,    0 newer,    0 older,    0 deleted
2019/05/03 16:23:33 ERROR : file2.txt: Failed to copy: upload failed: Post https://content.dropboxapi.com/2/files/upload: can't copy - source file is being updated (size changed from 288 to 380)
2019/05/03 16:23:33 ERROR : Dropbox root 'rclonesync_testdir': not deleting files as there were IO errors
2019/05/03 16:23:33 ERROR : Dropbox root 'rclonesync_testdir': not deleting directories as there were IO errors
2019/05/03 16:23:33 ERROR : Attempt 1/3 failed with 2 errors and: upload failed: Post https://content.dropboxapi.com/2/files/upload: can't copy - source file is being updated (size changed from 288 to 380)
...
[truncated for brevity]
...
2019/05/03 16:23:36 Failed to sync: upload failed: Post https://content.dropboxapi.com/2/files/upload: can't copy - source file is being updated (size changed from 1288 to 1364)
2019-05-03 16:23:36,513:    ERROR    rclone sync failed.  (Line 473)     - ./
2019-05-03 16:23:36,513:  ***** Critical Error Abort - Must run --first-sync to recover.  See README.md *****

Is this fixable? If not, do you have any suggestions on how to handle cases like this? Since adding --first-sync does make the issue at least superficially go away (at least after the file hammering is finished), what would be the implications of using --first-sync every time rclonesync.py is run?

cjnaz (Owner) commented May 4, 2019

Interesting problem. I'll think about whether it's safe to simply skip a file that fails the rclone transfer. For the individual file copy from Path2 to Path1 I know which file failed, but for the final rclone sync from Path1 to Path2 it seems unsafe. Hmm... Also see issue #8.

Note that there is a risk of data loss with --first-sync. Newer versions of files on Path2 will be lost. Be careful.

I'm out of town for a bit. Perhaps next week I'll get back to this.
Regards

cjnaz (Owner) commented May 4, 2019

Perhaps you could add such files to the filters file and ignore them entirely?
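For reference, a filters file for this scenario might look like the following, using rclone's filter syntax (a leading `-` excludes matching paths); file2.txt here is just the hot file from the example above, and *.log stands in for whatever patterns get hammered in practice:

```
# filters.txt - rclone filter rules; '-' excludes matching paths
- file2.txt
- *.log
```

Presumably it would be passed via the --filters-file option (filters_file appears in the Namespace dump in the log above), e.g. rclonesync.py . dropbox:rclonesync_testdir --filters-file filters.txt.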

@cjnaz cjnaz added the enhancement New feature or request label Aug 2, 2020
@cjnaz cjnaz added limitation Known problem with no planned path for solution and removed enhancement New feature or request labels Oct 2, 2020