Support round-tripping between SQLite and CLDF #58

Closed
Anaphory opened this issue Feb 23, 2018 · 9 comments
Comments

@Anaphory

I have just found that although I had no idea about PyQt5 yesterday, I was easily able to write a sortable, filterable word list viewer in it today using QSqlDatabase.addDatabase("QSQLITE"). This gives me an editable interface to my FormTable, which has me wondering how I can feed the changes I make back into CLDF.
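For reference, the viewer is essentially just a QSqlTableModel behind a QSortFilterProxyModel; a stripped-down sketch (file and table names are placeholders and depend on how the CLDF data ends up in SQLite):

```python
import sys

from PyQt5.QtCore import QSortFilterProxyModel, Qt
from PyQt5.QtSql import QSqlDatabase, QSqlTableModel
from PyQt5.QtWidgets import QApplication, QTableView

app = QApplication(sys.argv)

# Open the SQLite file holding the CLDF data.
db = QSqlDatabase.addDatabase("QSQLITE")
db.setDatabaseName("dataset.sqlite")
db.open()

# Editable model on the (assumed) FormTable, wrapped in a proxy model
# so the view can be sorted and filtered.
model = QSqlTableModel()
model.setTable("FormTable")
model.select()

proxy = QSortFilterProxyModel()
proxy.setSourceModel(model)
proxy.setFilterCaseSensitivity(Qt.CaseInsensitive)
proxy.setFilterKeyColumn(-1)  # filter across all columns

view = QTableView()
view.setModel(proxy)
view.setSortingEnabled(True)
view.show()

sys.exit(app.exec_())
```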

Does it make sense to have a cldf fromdb DATABASEFILE DIRECTORY subcommand? Could it work so that if DIRECTORY already contains CLDF datasets of those types, their table URLs (and other data that might not be in the DB but only in the JSON) are used instead of defaults, making it round-trip safe?

@xrotwang

I guess that would be cool. It would require a lot of tests, though.

@xrotwang changed the title from "Going back from SQLite to CLDF" to "Support round-tripping between SQLite and CLDF" on Mar 21, 2018
@xrotwang self-assigned this on Mar 21, 2018
@xrotwang

@Anaphory I think round-trip-ability to SQLite would be a big win - in particular considering use cases like the one you outline. I guess the best way to do this would be via very close mirroring of the CLDF schema in the db, i.e.

  • storing the metadata (i.e. common and inherited properties) for all objects, probably using JSON columns
  • using CLDF Ontology URLs as table(?) and column names (if this seems unwieldy, I'd propose a bit of Python convention: use CLDF Ontology local names for tables and columns, and __-prefixed names for all other objects, e.g. non-CLDF tables and columns); a rough sketch of this convention follows below
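Purely for illustration, a hypothetical schema under that convention (not the actual schema pycldf generates; table and column names chosen here only as examples):

```python
# CLDF Ontology local names for CLDF tables/columns, "__"-prefixed names for
# everything not covered by the ontology, and a JSON column for the remaining
# CSVW metadata.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LanguageTable (
    id TEXT PRIMARY KEY,              -- terms.rdf#id
    name TEXT,                        -- terms.rdf#name
    __metadata TEXT                   -- JSON blob with common/inherited properties
);
CREATE TABLE FormTable (
    id TEXT PRIMARY KEY,              -- terms.rdf#id
    languageReference TEXT,           -- terms.rdf#languageReference
    form TEXT,                        -- terms.rdf#form
    __tone TEXT,                      -- dataset-specific, non-CLDF column
    __metadata TEXT,                  -- JSON blob with common/inherited properties
    FOREIGN KEY (languageReference) REFERENCES LanguageTable (id)
);
""")
```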

@xrotwang

Alternatively, all the metadata could be kept out of the database, and the original JSON description could be consulted only upon serialising the dataset as CLDF again. This could create a bit of ambiguity, though, because in CLDF we have this shortcut for many-to-many relations without additional properties on the association table: These can be specified by using a list-valued column as foreign key. But this could also be deduced from the DB schema - leaving just the ambiguity of where to put the foreign key.
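To illustrate the shortcut (all names below are hypothetical): in the CSV, a forms row may carry several source references in one list-valued cell, whereas the natural relational representation is an association table.

```python
# CSV side of the shortcut, e.g. in forms.csv:
#
#   ID,Form,Source
#   f1,haus,"smith2003;meier2010"
#
# Relational side: an association table without additional properties.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FormTable (id TEXT PRIMARY KEY, form TEXT);
CREATE TABLE SourceTable (id TEXT PRIMARY KEY, bibtex TEXT);
CREATE TABLE FormTable_SourceTable (
    FormTable_id TEXT REFERENCES FormTable (id),
    SourceTable_id TEXT REFERENCES SourceTable (id)
);
""")
# Going back to CLDF, the association table's rows would be folded into a
# list-valued Source column again; the ambiguity is only about which table
# that list-valued foreign key should live on.
```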

@Anaphory any ideas/preferences regarding this issue?

@xrotwang

Sorry, I'll keep my internal monologue going: the above isn't really true. If we consult the original JSON when writing the DB back to CLDF, we'll know what to do.

@xrotwang

I'm starting to lean towards keeping the JSON metadata out of the DB. Basically, SQLite would become an alternative backend for the data, while the description would always reside in the JSON file. The alternative SQLite backend could even be used to speed up validating a dataset.

@xrotwang

xrotwang commented Apr 16, 2019

@Anaphory @lmaurits I have a prototype for round-tripping CSV / SQLite data storage for CSVW TableGroups ready. What remains to be done is

  • adding lots of tests
  • figuring out the best modularization of the functionality with respect to re-use for CLDF; this means deciding
    • when and how to add a Sources table
    • when and how to add relations between other tables and sources
    • how to translate back and forth between local CSV file/column names and CLDF ontology terms (see the sketch below).
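A rough sketch of the kind of name translation involved (not the actual pycldf code): CSVW metadata ties a local column name to a CLDF Ontology term via propertyUrl, so a lookup built from the tableSchema is enough to rename columns when moving data between CSV and the database.

```python
TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"

def local_by_term(columns):
    """Map CLDF Ontology local names to the column names used in the CSV file.

    `columns` is assumed to be a list of CSVW column descriptions (dicts).
    """
    mapping = {}
    for col in columns:
        url = col.get("propertyUrl", "")
        if url.startswith(TERMS):
            mapping[url[len(TERMS):]] = col["name"]
    return mapping

# E.g. a column named "Word_Form" in forms.csv declared with propertyUrl
# ...terms.rdf#form maps to the ontology term "form":
print(local_by_term([{"name": "Word_Form", "propertyUrl": TERMS + "form"}]))
# -> {'form': 'Word_Form'}
```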

@xrotwang

@Anaphory so with what I implemented now, your workflow could look like:

  1. Create a JSON metadata description for your dataset.
  2. Run cldf createdb MD.JSON DATASET.sqlite.
  3. Edit the data in the SQLite DB.
  4. Run cldf dumpdb MD.JSON DATASET.sqlite to write the changed/added data back to "proper" CLDF files.
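Step 3 can be as simple as a few statements with Python's built-in sqlite3 module (the table and column names below are assumptions about how the FormTable is mirrored in the database, so check the generated schema first):

```python
import sqlite3

conn = sqlite3.connect("DATASET.sqlite")
conn.execute("PRAGMA foreign_keys = ON")  # have SQLite enforce the FK constraints
conn.execute(
    "UPDATE FormTable SET cldf_form = ? WHERE cldf_id = ?",
    ("kitabu", "form-42"),
)
conn.commit()
conn.close()
```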

@xrotwang

@Anaphory note that the SQLite DB contains:

  • SQL CHECK constraints as specified by the datatype descriptions (as much as possible)
  • Foreign key constraints
  • A SourceTable which will be written to BibTeX when dumped.

So this database should be fairly safe and complete for adding data (as long as you don't want to add tables or columns - that would have to be done in the JSON before running createdb).
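To illustrate what the constraints buy you when adding rows by hand (names again assumed, not guaranteed to match the generated schema; note that SQLite only enforces foreign keys when the pragma is switched on for the connection):

```python
import sqlite3

conn = sqlite3.connect("DATASET.sqlite")
conn.execute("PRAGMA foreign_keys = ON")
try:
    # A form pointing to a non-existent language (or a value violating one of
    # the datatype CHECK constraints) is rejected instead of silently
    # corrupting the dataset.
    conn.execute(
        "INSERT INTO FormTable (cldf_id, cldf_languageReference, cldf_form) "
        "VALUES (?, ?, ?)",
        ("f-new", "no-such-language", "haus"),
    )
    conn.commit()
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```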

@xrotwang

xrotwang commented May 2, 2019

Near-idempotency confirmed for somewhat complex examples (Grambank and the Tseltal dictionary). Not fully round-trippable:

  • BibTeX formatting
  • Precision of floating point numbers

@xrotwang xrotwang closed this as completed May 2, 2019