Script to migrate data between engine types. #2

Closed
alexkiro opened this Issue Jun 6, 2014 · 5 comments

@alexkiro
Contributor

alexkiro commented Jun 6, 2014

It would be useful to have a script that allows migrating data between the various engine types (e.g. migrating MySQL data to Redis).

This doesn't necessarily have to be deployed.

alexkiro added this to the 0.8 milestone Jun 6, 2014

alexkiro self-assigned this Jun 6, 2014

@gryphius

Contributor

gryphius commented Jun 7, 2014

I'm currently toying with the different backends so I've tried to implement this at https://github.com/gryphius/pyzor/blob/feature_migrationscript/extra/pyzor-migrate.py

The script simply iterates over the source records and writes them to the destination backend, which may not be the fastest possible way but gets the job done in a few lines of code.
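The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code of pyzor-migrate.py: `migrate` is a hypothetical helper, and both engines are assumed to behave like plain dicts (iterating yields keys, indexing reads or writes a record).

```python
def migrate(source, destination, report_every=100000):
    """Copy every record from source to destination, counting failures.

    Both arguments are assumed to be dict-like stand-ins for engine objects.
    """
    ok = failed = 0
    for key in source:
        try:
            # Reading and writing happen per key, so one broken record
            # does not stop the rest of the migration.
            destination[key] = source[key]
            ok += 1
            if ok % report_every == 0:
                print("%d records transferred..." % ok)
        except Exception as exc:
            failed += 1
            print("Record %s failed: %s" % (key, exc))
    print("Migration complete, %d records transferred successfully, "
          "%d records failed" % (ok, failed))
    return ok, failed
```

Because each record is fetched inside its own try block, a single unreadable record is counted as failed and skipped instead of aborting the run.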

./extra/pyzor-migrate.py --help
Usage: pyzor-migrate.py [options]

Options:
  -h, --help            show this help message and exit
  --se=SOURCE_ENGINE, --source-engine=SOURCE_ENGINE
                        select source database backend
  --sd=SOURCE_DSN, --source-dsn=SOURCE_DSN
                        data source DSN - see pyzor documentation for format
  --de=DESTINATION_ENGINE, --destination-engine=DESTINATION_ENGINE
                        select destination database backend
  --dd=DESTINATION_DSN, --destination-dsn=DESTINATION_DSN
                        destination DSN - see pyzor documentation for format


# copy from an old pyzor version gdbm db to a new one, skipping apparently broken records
./extra/pyzor-migrate.py --se gdbm --sd testdata/pyzord.db --de gdbm --dd testdata/backup.db
Record 822e75200ef9f20e286b21db0b2f5e72972bd34a failed: time data '2014-06-04 11:32:44' does not match format '%Y-%m-%d %H:%M:%S.%f'
100000 records transferred...
Record 0f939289def56f04527bcfd9f75bc6eeeaa8424a failed: time data '2014-05-24 11:59:15' does not match format '%Y-%m-%d %H:%M:%S.%f'
200000 records transferred...
Record 9dfbce490ed985ef715abd14fe4b84969d555b78 failed: time data '2014-06-02 19:36:27' does not match format '%Y-%m-%d %H:%M:%S.%f'
Record 0b21711fbf0e39f43dd6d67086b0edd73a6d7b94 failed: time data '2014-05-12 15:36:42' does not match format '%Y-%m-%d %H:%M:%S.%f'
Migration complete, 237768 records transferred successfully, 4 records failed

# gdbm to redis
./extra/pyzor-migrate.py --se gdbm --sd testdata/backup.db --de redis --dd localhost,6379,,0
100000 records transferred...
200000 records transferred...
Migration complete, 237768 records transferred successfully, 0 records failed

# redis to mysql
./extra/pyzor-migrate.py --se redis --sd localhost,6379,,0 --de mysql --dd localhost,root,,pyzor,public
100000 records transferred...
200000 records transferred...
Migration complete, 237768 records transferred successfully, 0 records failed
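The four failed records in the first run all carry timestamps without fractional seconds, while the expected format ends in `.%f`. One plausible cause (an assumption, not confirmed in this thread) is that Python's `str()` of a `datetime` omits the microsecond part when it is zero. A tolerant parser could be sketched like this; `parse_timestamp` and the fallback format are hypothetical and not part of pyzor:

```python
from datetime import datetime

# Formats tried in order; the second (no fractional seconds) is an
# assumption made for this sketch.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_timestamp(text):
    """Parse a timestamp with or without fractional seconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % text)
```

With this fallback, a value such as '2014-06-04 11:32:44' parses instead of raising, while anything matching neither format still fails loudly.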

is this what the OP had in mind?

@alexkiro

Contributor

alexkiro commented Jun 7, 2014

> is this what the OP had in mind?

Haha, yes it's exactly what I had in mind. And this was the reason why I added the ability to iterate over engines in 71e13f2.

> the script simply iterates over the source records and writes them to the destination backend which may not be the fastest possible way but gets the job done in a few lines of code

We don't really care about speed here, as it is only a simple utility script. But let's switch to iteritems, which is better optimized for the MySQL engine. You can do that with something like this:

it = db.iteritems()
while True:
    try:
        key, record = next(it)
        # ....
    except StopIteration:
        break
    except Exception:
        # error handling here
        pass

Then submit it as a pull request. Thanks!

@gryphius

Contributor

gryphius commented Jun 7, 2014

Thanks for your feedback. I tried to change to iteritems() according to your example, but this still causes the iterator to abort after the first exception. I guess if we want to do it this way, we'd have to do some exception handling in the iterators directly?

@alexkiro

Contributor

alexkiro commented Jun 7, 2014

> I guess if we want to do it this way, we'd have to do some exception handling in the iterators directly?

Good point: e12a2ba
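For context on why the caller cannot recover here: once a Python generator raises an exception, it is finished, and the next call to `next()` raises `StopIteration`. Skipping bad records therefore has to happen inside the generator. The following is a minimal sketch of that idea and not necessarily what the referenced commit does; `iteritems_skipping_errors`, `keys`, `load` and `on_error` are hypothetical names.

```python
def iteritems_skipping_errors(keys, load, on_error=None):
    """Yield (key, record) pairs, skipping keys whose record fails to load.

    `keys` is any iterable of keys and `load` a callable that returns the
    record for a key or raises; both are hypothetical stand-ins.
    """
    for key in keys:
        try:
            record = load(key)
        except Exception as exc:
            # Report the failure and move on; the generator stays alive.
            if on_error is not None:
                on_error(key, exc)
            continue
        # The yield sits outside the try block so that exceptions raised
        # by the consumer are not swallowed here.
        yield key, record
```

A caller can then use a plain `for key, record in ...` loop and collect failures through the `on_error` callback.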

alexkiro added a commit that referenced this issue Jun 7, 2014

@alexkiro

Contributor

alexkiro commented Jun 7, 2014

Looks good, thank you @gryphius! I won't add any tests for this at the moment, since it is only a utility script.

alexkiro closed this Jun 7, 2014