Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new 'sync' action for easy local <-> remote copying #20

Merged
merged 6 commits into from
Jan 4, 2016
Merged

Added new 'sync' action for easy local <-> remote copying #20

merged 6 commits into from
Jan 4, 2016

Conversation

cederberg
Copy link
Contributor

Not the very cleanest of implementations. A few rough corners:

  • Compares on file name and size only.
  • Doesn't preserve last modified timestamps. FIXED
  • Re-uses other actions as helpers, which is non-optimal... FIXED
  • Output is not understandable (each action outputs differently) FIXED
  • The --delete option doesn't work if multiple B2 file versions exists.
  • Symlinks and other special files are handled as ordinary files.

The comparison should reuse the SHA-1 calculated during the upload, but it doesn't seem to be available via the API.

Most other issues above seem easy enough to address with a few refactorings here and there. But I won't embark on that unless this PR seems interesting to you.

@bwbeach
Copy link
Contributor

bwbeach commented Dec 31, 2015

Sorry for the delay getting back to you. I've been enjoying the holidays and haven't been on the computer much.

I think this would be a useful addition to the program. Using last-modified timestamps would be good. The recommended way of doing that is documented in the b2_upload_file API.

@ppolewicz
Copy link
Collaborator

FWIW, I am wondering if this should be part of the, or rather a program which uses the b2 python library. This functionality can almost be used as a backup application, which was not the original intent of this repository (if I understood the intent correctly).

Don't get me wrong - I think the idea of creating something like this is marvelous, but new features for the backup application part can easily overtake the rest of the b2 command line tool. To name a few features:

  • file hashes can be cached
  • only files with certain extensions / file attributes can be uploaded (Windows "Archive" bit)
  • many files should be uploaded in parallel (using threads, processes or greenlets)
  • inotify could be used for triggering the upload
  • to cut down sync time with many files, metadata would be cached in a remote and local journal
  • support for seemless synchronization from/to many computers at the same time would be nice

Also, as progress bars are supported now, progress bar of the sync operation should also be supported.

Therefore I gently suggest to consider putting this functionality in a separate tool (b2sync?), which would use the current b2 tool or the library interface from #11.

@cederberg
Copy link
Contributor Author

@ppolewicz: A sync command or similar is baked into the command-line tools for every other cloud file storage. For a good reason, since it is the most practical way to upload a larger set of files. See:

Also, don't mixup sync with backup. Being used by a backup implementation isn't the same as being one. This is similar to b2 ls in that is attempts to provide a usable command-line abstraction for the raw API. Which, after all, is the purpose of this tool.

@ppolewicz
Copy link
Collaborator

Ok, that's a good point.

Are you willing to do some more work on the implementation of the sync command?

@cederberg
Copy link
Contributor Author

I've improved upon a few things (see updated PR comment at the top). Tried to avoid modifying existing functions as much as possible, in order to keep the patch reasonable in size.

In the future, the command-line option parsing and the API calls shoud probably share more code with other functions. In particular, the iter_file_list function was written to be easily used by ls.

@ppolewicz ppolewicz mentioned this pull request Jan 3, 2016
23 tasks
@ppolewicz
Copy link
Collaborator

This will conflict with #11. I will merge your changes with mine and submit it inside #11, so that you don't have to deal with conflicts yourself. Please do not modify the code for a couple of days.

@cederberg
Copy link
Contributor Author

Long-running branches (with massive refactorings) always cause conflicts. That's exactly why they should be avoided. In particular for collaborative projects.

If you don't intend to block all other activity on this project, #11 must be split into multiple small patches. Ones that can be inspected, understood and merged separately (in order of course).

@ppolewicz
Copy link
Collaborator

At the time it seemed like a good idea: when #11 was started, there was almost no other activity in this project.
I do small commits which can be reviewed separately. I also test my changes manually, so it could be merged at any time with keeping backward compatibility. Now we also have a test script done by @bwbeach
Ah, also, #11 will be finished today. After this option parsing, tests and CI should help us a lot.

@bwbeach
Copy link
Contributor

bwbeach commented Jan 4, 2016

I'm going to try resolving the merge conflicts, now that #11 is merged in.

In the future, smaller, more incremental changes would be better.

@ppolewicz
Copy link
Collaborator

@cederberg now that #11 is merged, I tried merging your changes. They apply with just minor problems.

The sync command doesn't work due to changed interface (for example now to get a bucket id, one gets a Bucket object from Api and checks the id_ field on it).

When trying to fix it, I noticed something and I am wondering if you'd agree with me, that sync should be an operation which is performed on a Bucket object rather than on the API itself. This way someone can use the sync functionality both from the console tool and from a python program which will decide how and when to sync which bucket with which directory. Putting it in the Bucket class would also make some things easier: Bucket has access to API and indirect access to account_info.

I also have another question - do you want my help in resolving the interface changes or do you prefer to do it yourself?

@bwbeach
Copy link
Contributor

bwbeach commented Jan 4, 2016

I have it just about working. There will be a few TODO items in comments, but it's working.

@bwbeach bwbeach merged commit b571cd8 into Backblaze:master Jan 4, 2016
@bwbeach
Copy link
Contributor

bwbeach commented Jan 4, 2016

Merge is done. Let me know if you see any problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants