Support round-tripping between SQLite and CLDF #58

Closed
Anaphory opened this issue Feb 23, 2018 · 9 comments
Comments

@Anaphory

I have just found that although I had no idea about PyQt5 yesterday, I was easily able to write a sortable, filterable word list viewer in it today using QSqlDatabase.addDatabase("QSQLITE"). This gives me an editable interface to my FormTable, which has me wondering how I can feed the changes I make back into CLDF.
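For reference, the viewer is essentially just a QSqlTableModel behind a QSortFilterProxyModel; a stripped-down sketch (file and table names are placeholders and depend on how the CLDF data ends up in SQLite):

```python
import sys

from PyQt5.QtCore import QSortFilterProxyModel, Qt
from PyQt5.QtSql import QSqlDatabase, QSqlTableModel
from PyQt5.QtWidgets import QApplication, QTableView

app = QApplication(sys.argv)

# Open the SQLite file holding the CLDF data.
db = QSqlDatabase.addDatabase("QSQLITE")
db.setDatabaseName("dataset.sqlite")
db.open()

# Editable model on the (assumed) FormTable, wrapped in a proxy model
# so the view can be sorted and filtered.
model = QSqlTableModel()
model.setTable("FormTable")
model.select()

proxy = QSortFilterProxyModel()
proxy.setSourceModel(model)
proxy.setFilterCaseSensitivity(Qt.CaseInsensitive)
proxy.setFilterKeyColumn(-1)  # filter across all columns

view = QTableView()
view.setModel(proxy)
view.setSortingEnabled(True)
view.show()

sys.exit(app.exec_())
```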

Does it make sense to have a cldf fromdb DATABASEFILE DIRECTORY subcommand? Could it work so that if DIRECTORY already contains CLDF datasets of those types, their table URLs (and other data that might not be in the DB but only in the JSON) are used instead of defaults, making it round-trip safe?

@xrotwang

I guess that would be cool. It would require a lot of tests, though.

@xrotwang changed the title from "Going back from SQLite to CLDF" to "Support round-tripping between SQLite and CLDF" on Mar 21, 2018
@xrotwang self-assigned this on Mar 21, 2018
@xrotwang

@Anaphory I think round-trip-ability to SQLite would be a big win - in particular considering use cases like the one you outline. I guess the best way to do this would be via very close mirroring of the CLDF schema in the db, i.e.

  • storing the metadata (i.e. common and inherited properties) for all objects, probably using JSON columns
  • using CLDF Ontology URLs as table(?) and column names (if this seems unwieldy, I'd propose a bit of Python convention: use CLDF Ontology local names for tables and columns, and __-prefixed names for all other objects, e.g. non-CLDF tables and columns); a rough sketch of this convention follows below
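Purely for illustration, a hypothetical schema under that convention (not the actual schema pycldf generates; table and column names chosen here only as examples):

```python
# CLDF Ontology local names for CLDF tables/columns, "__"-prefixed names for
# everything not covered by the ontology, and a JSON column for the remaining
# CSVW metadata.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LanguageTable (
    id TEXT PRIMARY KEY,              -- terms.rdf#id
    name TEXT,                        -- terms.rdf#name
    __metadata TEXT                   -- JSON blob with common/inherited properties
);
CREATE TABLE FormTable (
    id TEXT PRIMARY KEY,              -- terms.rdf#id
    languageReference TEXT,           -- terms.rdf#languageReference
    form TEXT,                        -- terms.rdf#form
    __tone TEXT,                      -- dataset-specific, non-CLDF column
    __metadata TEXT,                  -- JSON blob with common/inherited properties
    FOREIGN KEY (languageReference) REFERENCES LanguageTable (id)
);
""")
```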

@xrotwang

Alternatively, all the metadata could be kept out of the database, and the original JSON description could be consulted only upon serialising the dataset as CLDF again. This could create a bit of ambiguity, though, because in CLDF we have this shortcut for many-to-many relations without additional properties on the association table: These can be specified by using a list-valued column as foreign key. But this could also be deduced from the DB schema - leaving just the ambiguity of where to put the foreign key.
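To illustrate the shortcut (all names below are hypothetical): in the CSV, a forms row may carry several source references in one list-valued cell, whereas the natural relational representation is an association table.

```python
# CSV side of the shortcut, e.g. in forms.csv:
#
#   ID,Form,Source
#   f1,haus,"smith2003;meier2010"
#
# Relational side: an association table without additional properties.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FormTable (id TEXT PRIMARY KEY, form TEXT);
CREATE TABLE SourceTable (id TEXT PRIMARY KEY, bibtex TEXT);
CREATE TABLE FormTable_SourceTable (
    FormTable_id TEXT REFERENCES FormTable (id),
    SourceTable_id TEXT REFERENCES SourceTable (id)
);
""")
# Going back to CLDF, the association table's rows would be folded into a
# list-valued Source column again; the ambiguity is only about which table
# that list-valued foreign key should live on.
```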

@Anaphory any ideas/preferences regarding this issue?

@xrotwang

Sorry, I'll keep my internal monologue going: the above isn't really true. If we consult the original JSON when writing the DB back to CLDF, we'll know what to do.

@xrotwang

I'm starting to lean towards keeping the JSON metadata out of the DB. Basically, SQLite would become an alternative backend for the data, while the description would always reside in the JSON file. The alternative SQLite backend could even be used to speed up validating a dataset.

@xrotwang

xrotwang commented Apr 16, 2019

@Anaphory @lmaurits I have a prototype for round-tripping CSV / SQLite data storage for CSVW TableGroups ready. What remains to be done is

  • adding lots of tests
  • figuring out the best modularization of the functionality with respect to re-use for CLDF; this means deciding
    • when and how to add a Sources table
    • when and how to add relations between other tables and sources
    • how to translate back and forth between local CSV file/column names and CLDF ontology terms (see the sketch below).
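A rough sketch of the kind of name translation involved (not the actual pycldf code): CSVW metadata ties a local column name to a CLDF Ontology term via propertyUrl, so a lookup built from the tableSchema is enough to rename columns when moving data between CSV and the database.

```python
TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"

def local_by_term(columns):
    """Map CLDF Ontology local names to the column names used in the CSV file.

    `columns` is assumed to be a list of CSVW column descriptions (dicts).
    """
    mapping = {}
    for col in columns:
        url = col.get("propertyUrl", "")
        if url.startswith(TERMS):
            mapping[url[len(TERMS):]] = col["name"]
    return mapping

# E.g. a column named "Word_Form" in forms.csv declared with propertyUrl
# ...terms.rdf#form maps to the ontology term "form":
print(local_by_term([{"name": "Word_Form", "propertyUrl": TERMS + "form"}]))
# -> {'form': 'Word_Form'}
```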

@xrotwang

@Anaphory so with what I implemented now, your workflow could look like:

  1. Create a JSON metadata description for your dataset.
  2. Run cldf createdb MD.JSON DATASET.sqlite.
  3. Edit the data in the SQLite DB.
  4. Run cldf dumpdb MD.JSON DATASET.sqlite to write the changed/added data back to "proper" CLDF files.
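Step 3 can be as simple as a few statements with Python's built-in sqlite3 module (the table and column names below are assumptions about how the FormTable is mirrored in the database, so check the generated schema first):

```python
import sqlite3

conn = sqlite3.connect("DATASET.sqlite")
conn.execute("PRAGMA foreign_keys = ON")  # have SQLite enforce the FK constraints
conn.execute(
    "UPDATE FormTable SET cldf_form = ? WHERE cldf_id = ?",
    ("kitabu", "form-42"),
)
conn.commit()
conn.close()
```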

@xrotwang

@Anaphory note that the SQLite DB contains:

  • SQL CHECK constraints as specified by the datatype descriptions (as much as possible)
  • Foreign key constraints
  • A SourceTable which will be written to BibTeX when dumped.

So this database should be fairly safe and complete for adding data (as long as you don't want to add tables or columns - that would have to be done in the JSON before running createdb).
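To illustrate what the constraints buy you when adding rows by hand (names again assumed, not guaranteed to match the generated schema; note that SQLite only enforces foreign keys when the pragma is switched on for the connection):

```python
import sqlite3

conn = sqlite3.connect("DATASET.sqlite")
conn.execute("PRAGMA foreign_keys = ON")
try:
    # A form pointing to a non-existent language (or a value violating one of
    # the datatype CHECK constraints) is rejected instead of silently
    # corrupting the dataset.
    conn.execute(
        "INSERT INTO FormTable (cldf_id, cldf_languageReference, cldf_form) "
        "VALUES (?, ?, ?)",
        ("f-new", "no-such-language", "haus"),
    )
    conn.commit()
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```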

@xrotwang

xrotwang commented May 2, 2019

Near-idempotency confirmed for somewhat complex examples (Grambank and the Tseltal dictionary). Not fully round-trippable:

  • BibTeX formatting
  • Precision of floating point numbers

@xrotwang xrotwang closed this as completed May 2, 2019