Svnsync syncer #224

Closed
wants to merge 6 commits into
from

3 participants

fjuan commented Nov 23, 2011

Hi there,

I'm attaching a syncer implementation based on svnsync, with accompanying passing tests.

The implementation has also been tested in a production environment.

Best regards!

Owner

mrrooijen commented Nov 30, 2011

hi @fjuan

I'm not sure what SVNSync is/does or why you would want to use it over RSync or anything else Backup currently provides. I tried looking up some info briefly but it still isn't really clear to me. Could you provide some use cases/scenarios where SVNSync would be better than what backup currently provides/supports?

Thanks!

fjuan commented Dec 1, 2011

Hello @meskyanichi

You should not back up an SVN repository by copying its files straight from the filesystem, because you may end up with a corrupt copy and can even damage the actual (master) repository.

The correct approach to backing up an SVN repository is to create a slave repository and configure it as "linked with the master repo". Then you can run 'svnsync sync' against the slave and it will fetch all commits, including the author, properties and a lot of other metadata from SVN.
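A minimal sketch of that workflow, for illustration only: the helper name `svnsync_mirror_plan`, the example URL and the backup path are assumptions and are not taken from this pull request. It only builds the `svnadmin` and `svnsync` command lines and does not run them. The mirror needs a `pre-revprop-change` hook that exits successfully, because svnsync copies revision properties such as author and log message.

```ruby
require "shellwords"

# Hypothetical helper (not part of the pull request): describes the steps
# for a pull-style svnsync mirror. It returns data only; nothing is executed.
def svnsync_mirror_plan(source_url, mirror_path)
  mirror_url = "file://#{mirror_path}" # assumes mirror_path is an absolute local path
  {
    # svnsync writes revision properties on the mirror, so the mirror
    # repository needs a pre-revprop-change hook that exits with status 0.
    hook_path:   File.join(mirror_path, "hooks", "pre-revprop-change"),
    hook_script: "#!/bin/sh\nexit 0\n",
    commands: [
      # 1. Create an empty local repository that will act as the mirror.
      ["svnadmin", "create", mirror_path].shelljoin,
      # 2. Link the mirror with the source (master) repository.
      ["svnsync", "init", mirror_url, source_url].shelljoin,
      # 3. Fetch all revisions that the mirror does not have yet.
      ["svnsync", "sync", mirror_url].shelljoin
    ]
  }
end

plan = svnsync_mirror_plan("http://svn.example.com/repo", "/var/backups/svn/repo")
puts plan[:commands]
```

Steps 1 and 2 only need to happen once; later backup runs would repeat step 3 alone.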

This functionality will be helpful if you want to back up a project management app like ChiliProject or Trac.

Thanks for this awesome gem.
Best regards,
Francisco

Owner

mrrooijen commented Dec 1, 2011

Interesting. I have only used SVN for a short while so I'm not too familiar with how it works. Any idea if this is also the case with Git itself? Or is it safe to just RSync a whole Git repository?

Also, looking at the code it seems that this is also intended to "pull" data from a production box, rather than push it to the Backup server. Am I right? And it would allow any protocol/host/port so you could sync the SVN repository from a public/protected http server, but are there any other protocols that don't go through an HTTP server that can be used?

I'm thinking of how we should separate "Pushing" and "Pulling" data with for example the RSync Syncer. What should such a thing be called, and would we really need two separate classes to handle it instead of a flag within the same class, etc. (I haven't thought about backing up this way so I'll have to see what the possibilities are, and otherwise we'll have to do two separate classes like you're already doing, if the code gets too messy in a single class).

I definitely like the idea of being able to pull stuff from multiple production boxes to a single backup server with a single backup installation so I'll be looking in to it now and see what can be done.

Owner

mrrooijen commented Dec 1, 2011

Oh, so what I also wanted to know is whether SVNSyncer can both PULL and PUSH data? Or is this only pulling it? I prefer to keep some consistency with the gem. Either everything can pull, everything can push, or everything can do both. Otherwise people might expect certain behavior to be there and then it turns out there isn't.

Cheers!

Contributor

kikito commented Dec 2, 2011

Hi there!

I also worked on this pull request.

"Any idea if this is also the case with Git itself? Or is it safe to just RSync a whole Git repository?"

I believe that git, being distributed, is better prepared for backup copies than subversion. I would not use rsync to duplicate a git repo, though. I'd use git itself: a git pull or fetch, with an initial git clone if needed. It is actually our plan to create another branch for git repo syncing.
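As a rough sketch of the "clone once, then fetch" idea: the helper name `git_sync_command` and the example URL and path are hypothetical, and the choice of `git clone --mirror` for the initial copy is an assumption rather than something decided in this thread. The helper only returns the command string; it does not run git.

```ruby
require "shellwords"

# Hypothetical helper (not part of the pull request): picks the git command
# that keeps a local backup copy of a repository up to date.
def git_sync_command(remote_url, local_path)
  if File.directory?(local_path)
    # A copy already exists: only fetch what changed since the last run.
    ["git", "-C", local_path, "fetch", "--all"].shelljoin
  else
    # First run: create the local copy.
    ["git", "clone", "--mirror", remote_url, local_path].shelljoin
  end
end

puts git_sync_command("git://example.com/project.git", "/nonexistent/backups/project.git")
```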

Regarding Pushing & Pulling:

This idea of "pushing and pulling" is something that you plan to do, right? Not implemented yet.

It seems to me that at least the current RSync syncer is unidirectional: push only. That's why we created InverseRsync, to be able to pull. (By the way, "push" and "pull" look much more descriptive than "normal" and "inverse".)

I confess I haven't tried to use svnsync to "push" changes to a server - only to "pull" from them. By looking at the documentation, it seems possible, although it doesn't seem like the most usual case. There doesn't seem to be an obvious difference between the "source" repo and the "mirror". The only requirement is that the user has write access to the mirror.

However, our implementation is pull only: the first step is creating a local subversion repository with svnadmin, and that command is restricted to local folders. If there is a way to create remote subversion repositories, we don't know about it, or could not think of it.

I hope this clarifies your questions. Regards!

Owner

mrrooijen commented Dec 2, 2011

Thanks for clarifying.

So generally speaking people tend to "pull" in the case of SVNSync. If barely anyone actually "pushes" to keep mirrors (the same as with database replication to a mirror, where slaves only pull), then I don't mind implementing a "pull-only" variant.

As for the InverseRSync, I'm thinking that it might be better to rename it to RSyncServer because it resides on the Backup server and not on the client (production box). What do you think? I'm willing to pull in both pull requests, but I'm always a bit cautious when it comes to "what most people do" and "what to name things to be as descriptive, but good-looking and consistent as possible".

So you would have:

  • RSync (Client)
  • RSyncServer (Server)
  • SVNSync (Server)

But maybe it might be better to name them like so:

  • RSyncClient
  • RSyncServer
  • SVNSyncServer

So that means we'd have to change the naming on RSync -> RSyncClient and S3 -> S3Client, which seems fine to me as long as things are consistent.

The suffix "Server" would indicate whether this should be run from the Server (Backup server), rather than the Client (production server).

If you have any suggestions/feedback/comments on such naming, please let me know.

Cheers!

Note: If you get around to implementing a Git-equivalent (GitSync?) that would be great.

Owner

mrrooijen commented Dec 2, 2011

Or maybe it should be split in to two different types of Syncers:

sync_with Client::RSync do |rsync|
end

sync_with Server::RSync do |rsync|
end

Something like that perhaps, and let its namespace determine whether it's push or pull. But this might be inconsistent with the rest of the Backup configuration file. So maybe:

sync_with RSyncClient do |rsync|
end

sync_with RSyncServer do |rsync|
end

fjuan commented Dec 2, 2011

I really like this idea!

I'll add the Git support and contact you again.

Do you want me to split the server/client syncer?

Best regards,
Francisco de Juan


Owner

mrrooijen commented Dec 2, 2011

Was thinking that maybe namespacing with modules isn't a bad idea ( Server::RSync ) because the whole sync_with section has no relation to the rest of the backup procedure. It's mainly just a wrapper around unix utilities and doesn't follow any procedure. (But it does of course support email notifications on fail and such.)

What do you think of doing this?

module Backup
  module Syncer
    module Client
      class RSync
        ...
      end
    end
  end
end
module Backup
  module Syncer
    module Server
      class RSync
        ...
      end
    end
  end
end
module Backup
  module Syncer
    module Server
      class SVNSync
        ...
      end
    end
  end
end

No Client for SVNSync because people are likely to use it as a Server, rather than Client in real world use cases?

module Backup
  module Syncer
    module Server
      class GitSync
        ...
      end
    end
  end
end

No Client for GitSync because people are likely to use it as a Server, rather than Client in real world use cases?

module Backup
  module Syncer
    module S3
      class Client
        ...
      end
    end
  end
end

No "Server" for S3 because you can't install Backup on S3.

The folder structure would look like this:

./lib/backup/syncers/client/rsync.rb
./lib/backup/syncers/client/s3.rb
./lib/backup/syncers/server/gitsync.rb
./lib/backup/syncers/server/rsync.rb
./lib/backup/syncers/server/svnsync.rb

And the DSL in the backup config file would look like this:

sync_with Client::RSync {}
sync_with Client::S3 {}
sync_with Server::GitSync {}
sync_with Server::RSync {}
sync_with Server::SVNSync {}

What do you think?

fjuan commented Dec 3, 2011

I like the modules approach, but there is no such dual behavior for S3, SVN or Git. How about this?

sync_with RSync::Client {}
sync_with RSync::Server {}
sync_with S3 {}
sync_with Git {}
sync_with SVNSync {}

Regarding the names, there are always a lot of servers involved, so the client/server wording may be confusing.

What do you think about the following?

  • RSync::Push => For syncing some directories from the server that executes the backup to a remote server.
  • RSync::Pull => If you want to back up a folder from a remote server to the one that executes the backup.
  • RSync::Local => If the two folders that you want to keep synced are on the same server (instead of using a flag on RSync::Push).

I would also change the name from GitSync to Git, because there is no gitsync utility like svnsync.

Francisco de Juan

Owner

mrrooijen commented Dec 3, 2011

Thanks Francisco,

You bring up great points and I think that all that you said is the way to go. If you have time feel free to do it this way:

sync_with RSync::Push {}
sync_with RSync::Pull {}
sync_with RSync::Local {}
sync_with S3 {}
sync_with Git {}
sync_with SVNSync {}

So the folder structure would be like this:

./lib/backup/syncers/rsync/local.rb
./lib/backup/syncers/rsync/push.rb
./lib/backup/syncers/rsync/pull.rb
./lib/backup/syncers/s3.rb
./lib/backup/syncers/git.rb
./lib/backup/syncers/svnsync.rb

I believe?
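To make the agreed layout concrete, here is a small sketch. The class bodies are empty placeholders, and the `resolve` and `file_for` helpers are hypothetical: they only illustrate how a name used in the config file could map to a class and to one of the files listed above, and they are not how Backup actually loads syncers.

```ruby
module Backup
  module Syncer
    # Placeholder classes mirroring the names agreed on in this thread.
    module RSync
      class Push;  end
      class Pull;  end
      class Local; end
    end
    class S3;      end
    class Git;     end
    class SVNSync; end

    # Hypothetical: turn "RSync::Pull" into the class Backup::Syncer::RSync::Pull.
    def self.resolve(name)
      name.split("::").inject(self) { |scope, part| scope.const_get(part) }
    end

    # Hypothetical: turn "RSync::Pull" into "./lib/backup/syncers/rsync/pull.rb".
    def self.file_for(name)
      "./lib/backup/syncers/" + name.downcase.gsub("::", "/") + ".rb"
    end
  end
end

%w[RSync::Push RSync::Pull RSync::Local S3 Git SVNSync].each do |name|
  puts "#{Backup::Syncer.resolve(name)} -> #{Backup::Syncer.file_for(name)}"
end
```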

Owner

mrrooijen commented Dec 3, 2011

If you could implement that in your two pull requests (and rebase with current develop so you're up to date and are able to merge it right in with develop) that would be great. Meanwhile I can go finish other tickets right now. Then hopefully in the next few days we can go ahead and release version 3.0.20 which includes your pull requests, and also the Git syncer if you're able to incorporate it in time.

fjuan commented Dec 3, 2011

Perfect!

Next Monday I'll include this development on my company's planning and send you an estimated date of release.

There are a lot of people using this gem and this change may break their configuration. Do we need backward compatibility? IMHO we do not, but maybe you think differently.

Owner

mrrooijen commented Dec 3, 2011

Nah doesn't need to be backwards compatible if it's for the better, and it is. If people want to upgrade and benefit from the great features provided by 3.0.20 they will simply have to update the configuration which takes a few seconds. Maybe more if they spread it across 10+ servers. But if they are logging in to 10 servers to update all 10 gems, then updating the config real quick can't hurt.

I try to avoid backwards incompatibility but sometimes things really need to be updated for better consistency and what not. Plus, 3.0.20 will also break cycling as it currently stands since we're introducing "chunking" backups in to multiple archives which caused there to be an overhaul in the whole cycling process. Also, I want to get most of the DSL changes/backwards incompatibility going now before releasing Backup 3.1 at some point so by that time people are already used to the changes and we can call it a solid 3.1 release. But, before that, I would of course like to run 3.0.20 for a while (and maybe release a few gems quickly after it to patch any potential issues that might pop up) and then see how that goes for a while.

So I'd say just implement it as discussed, making it backwards incompatible. People will either have to stay on 3.0.19 or upgrade to 3.0.20 and update their config. :)

Owner

mrrooijen commented Dec 3, 2011

Also: Thanks for your help!

Contributor

kikito commented Jan 4, 2012

We took a while to do this, but I've finally been able to do something in this regard. I'll do the svnsync part next (but probably next week).

Contributor

kikito commented Feb 10, 2012

Hi there!

I have been working on this branch again - a bit later than I anticipated ;)

I updated our SVNSync branch with the latest changes from your "develop" branch. I had to make some changes in order to do this: replaced "Exception::xxx" with "Error::xxx" here and there, mkdir is now only available via FileUtils.

With these changes, it should be very easy to merge it into "develop".

I also realized that we forgot to include a template for this pull request. It is included now.

It is also worth noting that this is a PULL-only backup. You can't use svnsync to sync a repo on a remote server. You have to be on the server where the backup is being stored in order to use it. We did not call it "SVNSyncPull" or something similar, because you can't use SVNSync in any other way. We have also included a note about this limitation in the template.

Let me know if you think anything else is required. Regards!

Contributor

kikito commented Feb 16, 2012

I don't recommend using this request any more. Instead, use this one:

meskyanichi#276

That one backs up git and SVN, and it's better designed and tested.

Please close this pull request.

fjuan commented May 4, 2012

As kikito said, this commit has been improved at meskyanichi#276

@fjuan fjuan closed this May 4, 2012
