Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new version of sqlite, heaps faster #56

Closed
mwotton opened this issue Oct 8, 2014 · 15 comments
Closed

new version of sqlite, heaps faster #56

mwotton opened this issue Oct 8, 2014 · 15 comments

Comments

@mwotton
Copy link
Contributor

mwotton commented Oct 8, 2014

http://permalink.gmane.org/gmane.comp.db.sqlite.general/90549

exciting we should probably upgrade:)

@nurpax
Copy link
Collaborator

nurpax commented Oct 8, 2014

Sure, when it's out of alpha. Would be interesting to run my benchmarks with and without the new version.

@mkscrg
Copy link

mkscrg commented Oct 5, 2015

Anyone working on this upgrade? We'd be happy to jump in if not!

@IreneKnapp
Copy link
Owner

Just as a note, #47 describes the procedure to upgrade the bundled sqlite (we have a "useful" issue label just to make that findable easily). We've had to do this a bunch of times now, I'm not sure if anyone has talked to upstream about whether they know about the issue and what they plan to do about it? Volunteer efforts could productively be spent on that communication.

@vdukhovni
Copy link

May I plead for another sqlite3 upgrade (or see below for an alternative approach)?

I have a schema that describes objects in DNS that can have CNAME chains. I am storing data about domains, the MX hosts of domains, and the IP and TLSA records of MX hosts. This introduces opportunities for nested CNAME chains:

a.example. IN CNAME b.example.
b.example. IN CNAME c.example.
c.example IN MX 0 d.example.
d.example. IN CNAME e.example.
e.example. IN CNAME f.example.
f.example. IN A 192.0.2.1
_25._tcp.f.example. IN CNAME _dane.g.example.
_dane.g.example. IN CNAME. _dane.h.example.
h.example. IN TLSA 3 1 1 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

To process such data, I need to construct a temporary table that represents the transitive closure of the all the CNAME aliases, that stores for each alias its final target.

       CREATE TABLE IF NOT EXISTS "domain"
       ( "qname" VARCHAR(255)
       , "rtype" INTEGER
       , "cname" VARCHAR(255)
       , "dnssec" INTEGER
       , PRIMARY KEY ("qname", "rtype")
       ) WITHOUT ROWID;

       CREATE TEMP TABLE "canonical"
       ( "alias" VARCHAR(255)
       , "rtype" INTEGER
       , "owner" VARCHAR(255)
       , PRIMARY KEY ("alias", "rtype", "owner")
       ) WITHOUT ROWID;

      WITH RECURSIVE
        "dealias"(a,r,o) AS (
            SELECT D."qname" as a, D."rtype" as r, D."qname" as o
            FROM "domain" D
            WHERE D."cname" is NULL
            UNION
            SELECT C."qname" as a,
                   T.r as r,
                   T.o as o
            FROM "dealias" T, "domain" C
            WHERE C."cname" = T.a
              AND C."rtype" = T.r
        )
      INSERT INTO "canonical" ("alias", "rtype", "owner")
      SELECT T.a, T.r, T.o
      FROM "dealias" T

Using the sqlite3 command-line, with my dataset of ~100k domains, this query takes 0.2 seconds on my laptop. However, with the SQLite bundled with Direct-Sqlite, it takes ~16 minutes!

After checking that the problem is not in my code, I ended up building a custom copy of direct-sqlite that uses the system (really homebrew) sqlite libraries, and the query became (unsurprisingly) as fast as in the sqlite3 CLI.

If constant requests for upgrades are a pain, there is perhaps a better way:

I know that upstream SQLite recommends that applications bundle their own copy of SQLite so that they get a stable platform, but I am not convinced that this makes sense for libraries. It seems to make more sense for the choice of SQLite to come from the top level application. So that if I build something that depends on Database.SQLite.Simple -> Database.SQLite3.Direct -> SQLite,
I can choose to bundle my own Database.SQLite3.Direct and a suitable SQLite version, or otherwise just go with the system version.

So a better long-term strategy, rather than constant upgrades of the bundled SQLite in Database.SQLite3.Direct, may be to change the cabal file to have "systemlib" on by default, and allow applications with special requirements to specify a frozen version of their choice. This is easy with stack, right now I have:

extra-deps:
- direct-sqlite-2.3.17
flags:
    direct-sqlite:
        systemlib: true

But I think that should be the default, and one would then use some explicit libsqlite3.a for applicationst that want that sort of stability, which versions is right, rather depends on the application!

I think this will considerably reduce the wear-and-tear on the package, by leaving the choice of the SQLite library to the application.

Thoughts?

@nurpax
Copy link
Collaborator

nurpax commented Oct 31, 2016

@vdukhovni & others, Sorry for being so passive as a maintainer for such a long time. A bunch of real life things changed, so I had a long hiatus from Haskell programming. But I'm getting back into it now and I think I should be able to support this library better too.

Regarding systemlib vs embedding the native library as source. I don't have a strong preference either way. If I can keep up with more frequent releases (and maybe get a backup maintainer), then maybe the default is fine? My concern with systemlib default is how easy it will be to use this library on Windows. With the cbits embedded, it's easy to build without any configuration. Using any type of OSS shared libraries on Windows can be a PITA to get building.

@nurpax
Copy link
Collaborator

nurpax commented Oct 31, 2016

@vdukhovni @mwotton is sqlite 3.15.0 what we should upgrade to?

@nurpax
Copy link
Collaborator

nurpax commented Oct 31, 2016

@mwotton @mkscrg @vdukhovni @IreneKnapp Upgraded sqlite3 to 3.15.0. The PR is up here:

https://github.com/IreneKnapp/direct-sqlite/compare/sqlite-upgrade-3.15.0?expand=1

I'll let that soak in for a day and then prepare a merge & hackage release.

While at it, I also fixed the test suite which was failing on GHC 8.0.

@vdukhovni
Copy link

3.15.0 will do I guess... The idea of using the system libraries by default (can the default be platform dependent?) may still merit further thought/discussion. I am not saying that I'm sure that's the right answer, but I do think it is worth considering.

@vdukhovni
Copy link

Oh, and thanks for the upcoming update!

@nurpax nurpax closed this as completed in 8648ca7 Nov 1, 2016
@nurpax
Copy link
Collaborator

nurpax commented Nov 1, 2016

I just released direct-sqlite-2.3.18 on hackage.

@nurpax
Copy link
Collaborator

nurpax commented Nov 1, 2016

The idea of using the system libraries by default (can the default be platform dependent?)

@vdukhovni Sure, it makes sense to discuss the pros and cons, of course.

The pros I see for the current "cbits is the default" are:

  • cbits builds out of the box without extra configuration, regardless of whether the target system has sqlite3 and/or sqlite3 headers installed (I would imagine missing sqlite3.h to be common, even if the library was installed.)
  • direct-sqlite is currently easy to install on Windows, just build it and it works. IME it's not easy to install dev packages on Windows and build against them.
  • a specific version of direct-sqlite will always have the exact same functionality regardless of what's already installed on the target system. Ie., if I build software on top of direct-sqlite-2.3.18 and install it on several systems, it should behave exactly the same (as far as sqlite3 is concerned) regardless of what the target OS has installed. With systemlibs, the application might work on some systems, but on some systems with an older version of sqlite native library, it might break or have poor performance.

IMO the current default is a more reliable default than relying on what the system happens to have installed. At the risk of using made up statistics, I would imagine most users of this library are content with what's been packaged into cbits and can live with a version that's a even a few releases old (and probably don't even care what sqlite3 is embedded, as long as the package itself works). OTOH, those who need a specific version or want to override the systemlib variable, are probably more expert users and can deal with any potential problems that might arise.

Btw, an alternative (not mutually exclusive) to systemlib would be to provide a way for direct-sqlite to be built against user provided cbits. So if you have a project where you want a specific version of sqlite, you could git clone the C source into your own project and set things up with stack and cabal such that direct-sqlite builds with sqlite coming in from your local, out-of-package location. Not sure if that's easy to do with cabal, but listing this option here for completeness.

@vdukhovni
Copy link

FWIW, SQLIte 3.15.2 is out: https://www.sqlite.org/changes.html and fixes a number of bugs, so at some point another upgrade may be in order...

@nurpax
Copy link
Collaborator

nurpax commented Dec 10, 2016

@vdukhovni thanks! Will be updating shortly. Most likely tonight.

@nurpax
Copy link
Collaborator

nurpax commented Dec 10, 2016

@vdukhovni Upgraded to sqlite 3.15.2 in direct-sqlite 2.3.19. I just released it to hackage.

@vdukhovni
Copy link

Super! Thanks, much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants