
Bulk loading #311

Closed
coleifer opened this issue Feb 17, 2014 · 8 comments

Comments

@coleifer (Owner)

Bulk loading performance improvements; see Maxime's post on the mailing list.

$ python AAA.py
How many lines to insert ?50000
One Big Query
('Done in ', 0.3364, 'seconds')
Many queries, no transaction
('Done in ', 453.3142, 'seconds')
Many queries, using transaction
('Done in ', 9.6940, 'seconds')
#!/usr/bin/python
# -*- coding: utf-8 -*-

from peewee import *
from peewee import RawQuery
import time

db = MySQLDatabase('*****',user='root',passwd='*****')

class Message(Model):
    messId = IntegerField(primary_key=True)
    messState = CharField()

    class Meta:
        database = db

    @staticmethod
    def massAdditionTest(idList, state):

        print("One Big Query")
        Message.drop_table()
        Message.create_table()

        start = time.time()
        if len(idList) > 0:
            sql = list()
            args = list()

            for id in idList:
                sql.append('(%s,%s)')
                args += [id, state]

            sql = 'INSERT INTO `message` (`messId`,`messState`) VALUES ' + ','.join(sql)

            q = RawQuery(Message,sql,*args)
            q.execute()
        end = time.time()
        print('Done in ',end-start,'seconds')



        print("Many queries, no transaction")
        Message.drop_table()
        Message.create_table()

        start = time.time()
        if len(idList) > 0:
            for id in idList:
                Message.create(messId=id, messState=state)
        end = time.time()
        print('Done in ',end-start,'seconds')



        print("Many queries, using transaction")
        Message.drop_table()
        Message.create_table()

        start = time.time()
        if len(idList) > 0:
            with db.transaction():
                for id in idList:
                    Message.create(messId=id, messState=state)
        end = time.time()
        print('Done in ',end-start,'seconds')


if __name__ == "__main__":
    n = int(input('How many lines to insert? '))  # int() keeps this working on Python 3, where input() returns a string
    Message.massAdditionTest(range(n), 'test')

So I was wrong about one thing: the 'transaction' method is much better than the one without a transaction.
But it is also true that the first method (one big query) is by far the quickest.

Is there a way to do it with peewee without using the RawQuery ?
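For context, the multi-row VALUES pattern that the RawQuery builds can be sketched with the standard-library sqlite3 module alone. This is illustrative and independent of peewee; the table and column names simply mirror the script above, and the row count is kept small so the statement stays under older SQLite variable limits.

```python
import sqlite3

# A minimal stand-in for the "One Big Query" experiment: one multi-row
# INSERT built from placeholder groups, executed in a single transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (messId INTEGER PRIMARY KEY, messState TEXT)")

id_list = range(400)
state = "test"

# Build '(?,?),(?,?),...' and a flat argument list, mirroring the RawQuery.
placeholders = ",".join("(?,?)" for _ in id_list)
args = []
for i in id_list:
    args += [i, state]

with conn:  # one transaction for the whole statement
    conn.execute(
        "INSERT INTO message (messId, messState) VALUES " + placeholders, args
    )

print(conn.execute("SELECT COUNT(*) FROM message").fetchone()[0])  # 400
```

The speedup comes from the same two things in either API: a single statement (one parse, one round trip) and a single transaction (one commit).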

@coleifer (Owner)

@themantalope

Hey, did you ever finish the page for bulk inserts? I'm having very poor performance inserting bulk data with atomic and insert_many on my model.

@coleifer (Owner)

http://docs.peewee-orm.com/en/latest/peewee/querying.html#bulk-inserts

What database? Poor compared to what?

@themantalope

themantalope commented May 28, 2016

I'm using a SQLite database. Even for a relatively simple insert of about 2000 rows of 5 ints per row, it takes peewee about 5 seconds.

@themantalope

I upgraded my SQLite version. Before, I was only able to insert about 200 rows per transaction due to that version's SQLITE_MAX_VARIABLE_NUMBER limit. I can now insert over 40,000 rows in a single with db.atomic(): block, but it can take as long as 80 seconds to complete the insert. Any ideas on what I can do with peewee to speed this up? I have millions of rows that I would like to insert.
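The usual workaround for SQLITE_MAX_VARIABLE_NUMBER is to derive a batch size from the variable limit and the column count, so each multi-row statement stays legal. A sketch with the stdlib sqlite3 module (not peewee-specific; peewee's insert_many accepts similarly pre-chunked row lists, and the 999 limit shown is the historical default):

```python
import sqlite3

MAX_VARS = 999                 # SQLITE_MAX_VARIABLE_NUMBER on older SQLite builds
NUM_COLS = 5                   # columns per row
BATCH = MAX_VARS // NUM_COLS   # rows per statement: 199 (199 * 5 = 995 <= 999)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT, e INT)")

rows = [(i, i, i, i, i) for i in range(2500)]

with conn:  # one transaction wrapping all the batches
    for start in range(0, len(rows), BATCH):
        chunk = rows[start:start + BATCH]
        placeholders = ",".join("(?,?,?,?,?)" for _ in chunk)
        flat = [v for row in chunk for v in row]
        conn.execute("INSERT INTO t VALUES " + placeholders, flat)

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2500
```

Keeping the transaction outside the batch loop matters: each batch is one statement, but all batches still share a single commit.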

@themantalope

I'm really sorry to bother you, @coleifer, but I have one more question. Based on what I've been reading, it seems that for large bulk inserts it's better to insert the data into a table without an index and then build the index afterwards. Is this possible to do with peewee? If this isn't the right place for this question, let me know and I'll post it elsewhere. I wasn't able to find anything on SO covering this topic for peewee.

@coleifer (Owner)

There's a great StackOverflow thread on speeding up SQLite inserts. Summary of tips:

  • Create your indexes afterwards.
  • Use PRAGMA synchronous=0; when inserting
  • Use bulk insert syntax
  • Use transactions
  • Batch it up.
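These tips can be sketched together with the stdlib sqlite3 module (illustrative table and index names; note that PRAGMA synchronous=0 trades crash durability for speed, so it only makes sense for a bulk load you could re-run):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA synchronous=0")        # no fsync per commit during the load
conn.execute("CREATE TABLE item (n INT)")   # deliberately no index yet

rows = [(i,) for i in range(10000)]

with conn:  # one transaction around the whole load
    conn.executemany("INSERT INTO item VALUES (?)", rows)  # bulk insert syntax

# Build the index only after the data is in place.
conn.execute("CREATE INDEX idx_item_n ON item (n)")

print(conn.execute("SELECT COUNT(*) FROM item").fetchone()[0])  # 10000
```

Deferring the CREATE INDEX means SQLite sorts the column once at the end instead of updating the B-tree on every insert.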

For peewee, you can create the tables and indexes separately by running:

db = SqliteDatabase(...)

db.create_table(ModelClass)
db.create_table(Model2)

# do insert

ModelClass.create_indexes()
Model2.create_indexes()

@themantalope

themantalope commented Jun 1, 2016

Sorry to get back to you so late, I didn't see this email. Thanks for the response, I really appreciate it. I was able to speed up the insert by turning off auto-indexing and doing the other things you mentioned from the SO discussions. I may be doing another large insert again this week and I'll try your suggestions regarding peewee's create_indexes method. Thanks again for your input!

