Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

publish sqlite DB in addition to CSV? #19

Open
cldellow opened this issue Jan 1, 2024 · 1 comment
Open

publish sqlite DB in addition to CSV? #19

cldellow opened this issue Jan 1, 2024 · 1 comment

Comments

@cldellow
Copy link

cldellow commented Jan 1, 2024

Hello, thank you for making qrank! It saves a lot of labour and pipeline fiddling.

Would it be possible to also publish qrank as a sqlite DB in addition to a CSV file?

The SQLite format has been stable since 2004. IMO it can serve a similar role as CSV, but with the benefit of having fast random-access lookups when the table is indexed.

The SQLite library is widely available on modern OSes, with bindings for all popular programming languages. Some environments, such as Python and Bun, bundle SQLite so that it can be immediately used without installing any additional packages.

If the DB is created with the full Wikidata ID (e.g., the string Q12345), a SQLite DB is approx 1.1GB - approx 3 times the size as the CSV.

If created with the ID converted to a number (e.g., the number 12345), a SQLite DB is approx 350MB -- about the same size as the CSV.

Creating the DB is straight-forward, e.g. this bash script:

do_the_thing() {
	cat <<EOF
.mode csv
.import qrank.csv qrank_raw
create table qrank(id integer not null primary key, rank integer not null);
insert into qrank(id, rank) select substr(Entity, 2), QRank FROM qrank_raw;
drop table qrank_raw;
vacuum;
EOF
}

rm -f qrank.db
do_the_thing | sqlite3 qrank.db

I understand if this is out of scope for the project -- but if you're open to serving an extra file, I suspect it'd eliminate one more step from many people's pipelines.

@cldellow
Copy link
Author

If anyone else is interested in this, I've started republishing the QRank dataset as an SQLite artifact weekly. It can be fetched from https://github.com/hikeratlas/qrank/releases/download/latest/qrank.db

Hope that's OK - if not, let me know and I'll happily take it down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant