Phoible 3.0 #35

xrotwang · 2021-04-09T06:20:34Z

I tried to make the CLDF creation and dataset as transparent as possible, by

pulling in the PHOIBLE curation repos as submodules: https://github.com/cldf-datasets/phoible/tree/phoible-3.0/raw
creating a human-readable version of the CLDF metadata: https://github.com/cldf-datasets/phoible/blob/phoible-3.0/cldf/README.md
closes Refactor repo to be cldfbench compliant #32
closes Make CLDF creation independent of phoible-scripts #34

xrotwang · 2021-04-09T06:31:26Z

Currently I'm working on translating the PHOIBLE FAQ to work from the CLDF data. Since I'm not much of an R coder, I'm doing it with SQL - but it would be awesome if we also had an R version. This shouldn't be too difficult, I think. All it may take is joining data from a couple of CSV files to get the representation the FAQ starts with, here http://phoible.github.io/faq/#how-do-i-get-the-data
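To illustrate the kind of join meant here, a minimal sketch in Python (standard library only) rather than R. The column names (`ID`, `Language_ID`, `Parameter_ID`, `Value`, `Name`, `Glottocode`) follow common CLDF conventions but are assumptions, and the rows are invented toy data, not PHOIBLE content.

```python
import csv
import io

# Toy stand-ins for two CLDF tables; real files would be read from cldf/.
# Column names are assumed and the rows are invented for illustration.
LANGUAGES_CSV = """ID,Name,Glottocode
lang1,Language One,abcd1234
lang2,Language Two,efgh5678
"""

VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,lang1,p,p
2,lang1,a,a
3,lang2,t,t
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_values_with_languages(values, languages):
    """Left-join each value row with its language row on Language_ID."""
    by_id = {lang["ID"]: lang for lang in languages}
    joined = []
    for row in values:
        lang = by_id.get(row["Language_ID"], {})
        joined.append({
            "Language_ID": row["Language_ID"],
            "LanguageName": lang.get("Name", ""),
            "Glottocode": lang.get("Glottocode", ""),
            "Phoneme": row["Value"],
        })
    return joined


flat = join_values_with_languages(read_rows(VALUES_CSV), read_rows(LANGUAGES_CSV))
for record in flat:
    print(record)
```

An R version would presumably do the same thing with a merge or join on the language identifier.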

xrotwang · 2021-04-09T07:24:33Z

I don't have a nice SQL solution for the sampling examples (maybe this should be left for downstream analysis anyway, and SQL be confined to basic data assembling), but the rest seems straightforward: https://github.com/cldf-datasets/phoible/blob/206ea83807e259bd0d52c016c657962daa62a7f6/faq.md
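A rough sketch of that split, assuming a hypothetical minimal schema: SQL assembles the basic per-inventory rows, and the sampling step happens downstream in ordinary code. The table names, column names, toy rows, and the choice of "one random inventory per language" as the sampling scheme are all assumptions for illustration, not taken from the FAQ.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and invented rows; not the actual PHOIBLE CLDF layout.
conn.executescript("""
CREATE TABLE inventories (ID TEXT PRIMARY KEY, Language_ID TEXT);
CREATE TABLE "values" (ID TEXT PRIMARY KEY, Inventory_ID TEXT, Value TEXT);
INSERT INTO inventories VALUES ('inv1', 'lang1'), ('inv2', 'lang1'), ('inv3', 'lang2');
INSERT INTO "values" VALUES
    ('1', 'inv1', 'p'), ('2', 'inv1', 'a'),
    ('3', 'inv2', 'p'), ('4', 'inv2', 'a'), ('5', 'inv2', 'i'),
    ('6', 'inv3', 't');
""")

# Basic data assembly in SQL: number of segments per inventory.
rows = conn.execute("""
    SELECT i.Language_ID, i.ID, COUNT(v.ID) AS n_segments
    FROM inventories AS i
    JOIN "values" AS v ON v.Inventory_ID = i.ID
    GROUP BY i.Language_ID, i.ID
    ORDER BY i.Language_ID, i.ID
""").fetchall()


def sample_one_inventory_per_language(rows, seed=0):
    """Downstream sampling: pick one inventory per language at random."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_language = {}
    for language_id, inventory_id, n_segments in rows:
        by_language.setdefault(language_id, []).append((inventory_id, n_segments))
    return {lang: rng.choice(cands) for lang, cands in sorted(by_language.items())}


sample = sample_one_inventory_per_language(rows)
print(sample)
```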

xrotwang · 2021-04-10T08:01:29Z

Just moved the faq here https://github.com/cldf-datasets/phoible/blob/15ffa0203e12baca7039bfbc73ae18bba7de9f6f/doc/faq.md

bambooforest · 2021-04-10T08:39:40Z

@xrotwang -- so there will be a FAQ page on the website and another FAQ page in the CLDF repo?

xrotwang · 2021-04-10T09:37:58Z

@bambooforest I'm not sure. There's definitely potential for confusion. So here's what I'd like to see:

a version of the FAQ (AFAIC this could be the only version) which accesses the data from released versions of the CLDF dataset
a CLDF dataset that is fairly self-contained, i.e. can be used and understood without needing other resources.

Having an FAQ in this repo would meet these requirements.

xrotwang · 2021-04-10T09:41:03Z

OTOH, maybe a smaller/shorter description of the dataset with one usage example would be sufficient to meet the "used and understood by itself" requirement. But then, I'd like the "official" FAQ to access "official" data, i.e. longterm archived, versioned releases at Zenodo.

bambooforest · 2021-04-10T10:19:50Z

I'd rather not have two FAQs, as you point out @xrotwang , because it may cause confusion.

On the other hand, perhaps there's already confusion between the "not official" CSV data that we create here:

https://github.com/phoible/dev/tree/master/data

and that we have released on GitHub and Zenodo:

in contrast to the "official" CLDF version:

https://zenodo.org/record/2677911#.YHF6bxQzYaY

Maybe best then to keep them separate and have two FAQs -- one for each data type?

@drammock ?

drammock · 2021-04-10T12:41:11Z

I'd vote for updating our FAQ to start from the clld version of the data. The SQL version is cool and I'm considering keeping it as an "orphan" page (not accessible from the nav, only linked from the main R FAQ). But this adds some overhead each time we want to update the FAQ content... 🤔

xrotwang · 2021-04-10T13:01:15Z

Ok, if the R version would use the CLDF data that would be perfect from my POV. In that case, I'd reduce the SQL variant to just

a description of how to get started and then
one example of a "translation" of the equivalent R code
and one example highlighting how SQLite can be integrated in shell pipelines, possibly using the nice termgraph package.

Then, there should be minimal overlap between the text/content of the two pages, and thus not much overhead in terms of maintenance.

bambooforest · 2021-04-12T07:08:45Z

So just to be clear, the updated R code should read the CLDF CSV files from here (assuming these will be the standardized file names moving forward):

https://github.com/cldf-datasets/phoible/tree/phoible-3.0/cldf

and I suppose joined into the CSV file that we use and maintain in dev for simplicity's sake (won't have to update the rest of the code to work with different CLDF CSV files).

We also need to add a field for categories specifying source gaps (#333) and there are a few data issues in dev's issue trackers that I would like to fix before releasing an official 3.0.

xrotwang · 2021-04-12T07:12:34Z

@bambooforest yes, that's what I propose. While this slightly complicates some things, it simplifies others:

the links to the bibliographic sources are already in inventories.csv,
Glottolog metadata of a specified version is already in languages.csv.

xrotwang · 2021-04-12T07:14:00Z

I think values.csv, with languages.csv and inventories.csv joined, is basically an enriched phoible.csv as in phoible/dev.
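A minimal sketch of that three-way join, using SQLite from Python. The link columns (`Inventory_ID`, `Language_ID`), the `Source` column, the view name `phoible_flat`, and the toy rows are assumptions for illustration; the actual column names in the CLDF files may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring the three CSV files; rows are invented.
conn.executescript("""
CREATE TABLE languages (ID TEXT PRIMARY KEY, Name TEXT, Glottocode TEXT);
CREATE TABLE inventories (ID TEXT PRIMARY KEY, Language_ID TEXT, Source TEXT);
CREATE TABLE "values" (ID TEXT PRIMARY KEY, Inventory_ID TEXT, Value TEXT);

INSERT INTO languages VALUES ('lang1', 'Language One', 'abcd1234');
INSERT INTO inventories VALUES ('inv1', 'lang1', 'src2000');
INSERT INTO "values" VALUES ('1', 'inv1', 'p'), ('2', 'inv1', 'a');

-- One flat row per segment, enriched with inventory and language metadata.
CREATE VIEW phoible_flat AS
SELECT v.Inventory_ID AS InventoryID,
       l.Glottocode,
       l.Name AS LanguageName,
       i.Source,
       v.Value AS Phoneme
FROM "values" AS v
JOIN inventories AS i ON i.ID = v.Inventory_ID
JOIN languages AS l ON l.ID = i.Language_ID;
""")

flat = conn.execute(
    "SELECT * FROM phoible_flat ORDER BY InventoryID, Phoneme"
).fetchall()
for row in flat:
    print(row)
```

Defining the join as a view keeps the flat, phoible.csv-like shape available to later queries without duplicating the underlying tables.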

bambooforest · 2021-04-12T07:23:44Z

OK, will look into it. It would also motivate users to use the CLDF version instead of the dev CSV if it contains additional Glottolog metadata, which we typically just merge into the dev CSV anyway.

xrotwang · 2021-04-12T07:27:06Z

Yes, considering that both PHOIBLE and Glottolog are moving targets, getting full transparency about particular versions makes the added complexity worthwhile.

bambooforest · 2021-10-27T10:48:23Z

I'm going to merge this and I created an issue (#36) to update the R code in the FAQ, since I need to do this elsewhere and having this PR merged gives me the submodule, etc.

xrotwang added 2 commits April 9, 2021 08:14

update

7cde4cb

CLDF creation via cldfbench

92b1bd1

xrotwang requested a review from bambooforest April 9, 2021 06:20

xrotwang marked this pull request as draft April 9, 2021 06:26

wip faq

206ea83

more work on faq

15ffa02

bambooforest marked this pull request as ready for review October 27, 2021 10:48

bambooforest merged commit b261644 into master Oct 27, 2021

bambooforest deleted the phoible-3.0 branch October 27, 2021 10:48
