
Bulk insertion: avoid committing each time a triple is added #9

Closed
fconil opened this issue Feb 20, 2014 · 5 comments

Comments

@fconil
Contributor

fconil commented Feb 20, 2014

Hello,

The current behaviour of the plugin makes SQLAlchemy work in autocommit mode.

Each time a triple is added, it is committed, which makes bulk insertion very slow: 1 min 30 s for 500 triples with SQLite, 25 seconds with MySQL (using the triples files from rdflib-benchmark).

The old rdflib-mysql plugin did not issue a commit on each triple insertion, but only when the store's commit method was called.

I made a quick and dirty change to the plugin to test the impact on performance: begin a transaction when the store is opened and commit only when the store's commit method is called.
In this context, 500 triples are added in 0.3 seconds with SQLite and 1.15 seconds with MySQL.

https://github.com/ktbs/rdflib-sqlalchemy/blob/avoid_autocommit/rdflib_sqlalchemy/SQLAlchemy.py
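
For anyone trying to reproduce the measurement, here is a minimal sketch of the comparison. It follows the store-setup pattern from the rdflib-sqlalchemy README (registerplugins() and the "SQLAlchemy" plugin name); the in-memory SQLite URI and example URIs are placeholders, and with the store behaving as described above, the single-commit path only pays off once the per-add commit is removed:

```python
import time

from rdflib import Graph, Literal, URIRef, plugin
from rdflib.store import Store

# Assumes rdflib-sqlalchemy is installed; registerplugins() registers the
# "SQLAlchemy" store plugin as shown in the project's README.
from rdflib_sqlalchemy import registerplugins

registerplugins()

IDENT = URIRef("http://example.org/graph")  # illustrative graph identifier
DB_URI = Literal("sqlite://")               # in-memory SQLite; swap for a file or MySQL URI


def insert_triples(n, commit_per_triple):
    store = plugin.get("SQLAlchemy", Store)(identifier=IDENT)
    graph = Graph(store, identifier=IDENT)
    graph.open(DB_URI, create=True)
    start = time.perf_counter()
    for i in range(n):
        graph.add((URIRef("http://example.org/s/%d" % i),
                   URIRef("http://example.org/p"),
                   Literal(i)))
        if commit_per_triple:
            graph.commit()   # commit after every triple
    if not commit_per_triple:
        graph.commit()       # single commit at the end
    elapsed = time.perf_counter() - start
    graph.close()
    return elapsed


print("commit per triple:", insert_triples(500, True))
print("single commit:    ", insert_triples(500, False))
```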

Maybe autocommit could be a store parameter?

Regards

@gromgull
Member

Have you tried the Graph.addN method? I've not really worked on the SQLAlchemy store, but hopefully it has been overridden to add triples in one transaction?
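
For reference, Graph.addN takes an iterable of quads (triple plus context), so a bulk load through it looks roughly like the sketch below. The store setup is as in the previous example, the URIs are illustrative, and whether the store performs the whole call in a single transaction is exactly what this issue is about:

```python
from rdflib import Graph, Literal, URIRef, plugin
from rdflib.store import Store
from rdflib_sqlalchemy import registerplugins

registerplugins()

ident = URIRef("http://example.org/graph")
graph = Graph(plugin.get("SQLAlchemy", Store)(identifier=ident), identifier=ident)
graph.open(Literal("sqlite://"), create=True)  # in-memory SQLite for illustration

# addN expects (subject, predicate, object, context) quads, so the store
# receives all 500 triples in one call instead of 500 separate add() calls.
graph.addN(
    (URIRef("http://example.org/s/%d" % i),
     URIRef("http://example.org/p"),
     Literal(i),
     graph)
    for i in range(500)
)
graph.commit()
graph.close()
```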

@fconil
Contributor Author

fconil commented Feb 20, 2014

I have submitted a pull request for the Graph.addN method, which currently does not write the content to the database.
I do not know whether I can give addN an arbitrary set of triples (for instance, if I try to split the initial set into batches of 100 triples).
We have an application that writes only a few triples at a time, but it may have many small sets to write, so the commit time has a large impact on insertion time.
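
If batching does turn out to be allowed, a generic helper along these lines could split an arbitrary iterable of triples into fixed-size chunks and push each chunk through addN with one commit per chunk. The helper name and the batch size of 100 are illustrative, not part of the store's API, and the benefit depends on the store actually committing once per addN call:

```python
from itertools import islice


def add_in_batches(graph, triples, batch_size=100):
    """Feed an arbitrary iterable of (s, p, o) triples to graph.addN in
    fixed-size batches, committing once per batch."""
    it = iter(triples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        graph.addN((s, p, o, graph) for s, p, o in batch)
        graph.commit()
```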

@pchampin

For the record, I think @fconil touched on a broader issue here, which I tried to summarize as issue #357 on rdflib.

@PonteIneptique
Contributor

I am bumping this quite old issue as it is relevant to my use cases as well. Currently, the performance drop when using the SQLAlchemy store is mostly in execute/commit time for my project. I am trying to see whether it would be possible to propose a simple fix for this issue based on the earlier code by @fconil.

@mwatts15
Collaborator

I'm closing this issue since the stated problem of bulk insertion is well supported by addN().

I do, however, see the need to support more flexible transaction management for other use cases involving multiple reads and writes to the triple store, and for external transaction management. Currently, there is a mix of manual transaction management logic and the absence thereof, which needs to be removed in favour of SQLAlchemy's existing transaction management. This does require additional application logic to manage transactions, but I see that as just the cost of working with a transactional store.
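
For comparison, application-managed transactions in plain SQLAlchemy look like the sketch below. This is generic SQLAlchemy usage rather than the store's API, and the table and column names are illustrative:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory SQLite, standing in for the real database

# engine.begin() opens one transaction for the whole block: SQLAlchemy commits
# on a clean exit and rolls back if an exception escapes the block.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE kv (k TEXT, v INTEGER)"))
    conn.execute(
        text("INSERT INTO kv (k, v) VALUES (:k, :v)"),
        [{"k": "a", "v": 1}, {"k": "b", "v": 2}],
    )
```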
