Phoible 3.0 #35
Currently I'm working on translating the PHOIBLE FAQ to work from the CLDF data. Since I'm not much of an R coder, I'm doing it with SQL, but it would be awesome if we also had an R version. This shouldn't be too difficult, I think: all it may take is joining data from a couple of CSV files to get the representation the FAQ starts with, here http://phoible.github.io/faq/#how-do-i-get-the-data
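For illustration, here is a minimal Python sketch of such a join, using only the standard-library `csv` module. It assumes a CLDF StructureDataset layout with `values.csv`, `languages.csv` and `parameters.csv`, and it assumes the column names listed in the docstring; the function names and the output keys are hypothetical and should be checked against the dataset's actual headers before relying on them.

```python
import csv
from typing import Dict, Iterable, List


def read_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines (an open file or a list of strings) into row dicts."""
    return list(csv.DictReader(lines))


def index_by_id(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each row's ID column to the row; assumes IDs are unique."""
    return {row["ID"]: row for row in rows}


def join_values(
    values: Iterable[str],
    languages: Iterable[str],
    parameters: Iterable[str],
) -> List[Dict[str, str]]:
    """Attach language and parameter metadata to every value row.

    Assumed (hypothetical) columns, to be checked against the real headers:
      values.csv:     Language_ID, Parameter_ID, Value, Contribution_ID
      languages.csv:  ID, Name, Glottocode
      parameters.csv: ID, Name
    """
    langs = index_by_id(read_table(languages))
    params = index_by_id(read_table(parameters))
    joined = []
    for row in read_table(values):
        lang = langs[row["Language_ID"]]      # KeyError if the language is missing
        param = params[row["Parameter_ID"]]   # KeyError if the parameter is missing
        joined.append({
            "Contribution_ID": row["Contribution_ID"],
            "Glottocode": lang["Glottocode"],
            "LanguageName": lang["Name"],
            "ParameterName": param["Name"],
            "Value": row["Value"],
        })
    return joined
```

With real files, each argument can be a handle opened with `open(path, newline="", encoding="utf-8")`; the file paths are assumptions. Failing loudly on a dangling `Language_ID` or `Parameter_ID` is a deliberate choice here, since a silent drop would hide referential problems in the data.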
I don't have a nice SQL solution for the sampling examples (maybe this should be left to downstream analysis anyway, with SQL confined to basic data assembly), but the rest seems straightforward: https://github.com/cldf-datasets/phoible/blob/206ea83807e259bd0d52c016c657962daa62a7f6/faq.md
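As an example of keeping SQL to basic data assembly, the following Python sketch uses the standard-library `sqlite3` module to load rows into an in-memory database and count segments per inventory with a plain join and `GROUP BY`. The table names, column names and tuple layouts are hypothetical and are not taken from the linked `faq.md`; the table is called `phvalues` because `VALUES` is an SQL keyword.

```python
import sqlite3
from typing import Iterable, List, Tuple


def segments_per_inventory(
    values: Iterable[Tuple[str, str, str]],
    languages: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, int]]:
    """Count segments per inventory with a plain SQL join and GROUP BY.

    values:    (Contribution_ID, Language_ID, Value) tuples (hypothetical layout)
    languages: (ID, Name) tuples (hypothetical layout)
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE languages (ID TEXT PRIMARY KEY, Name TEXT)")
        con.execute(
            "CREATE TABLE phvalues (Contribution_ID TEXT, Language_ID TEXT, Value TEXT)"
        )
        con.executemany("INSERT INTO languages VALUES (?, ?)", languages)
        con.executemany("INSERT INTO phvalues VALUES (?, ?, ?)", values)
        query = """
            SELECT v.Contribution_ID, l.Name, COUNT(*) AS n_segments
            FROM phvalues AS v
            JOIN languages AS l ON l.ID = v.Language_ID
            GROUP BY v.Contribution_ID, l.Name
            ORDER BY v.Contribution_ID
        """
        return [(cid, name, n) for cid, name, n in con.execute(query)]
    finally:
        con.close()
```

Anything beyond this kind of counting, such as the sampling examples, is probably easier to do on the returned rows in the analysis language of choice. Note that `ORDER BY` on a `TEXT` column sorts lexically, so numeric IDs like `10` sort before `2`.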
@xrotwang -- so there will be a FAQ page on the website and another FAQ page in the CLDF repo?
@bambooforest I'm not sure. There's definitely potential for confusion. So here's what I'd like to see:
Having an FAQ in this repo would meet these requirements.
OTOH, maybe a shorter description of the dataset with one usage example would be sufficient to meet the "used and understood by itself" requirement. But then, I'd like the "official" FAQ to access "official" data, i.e. long-term archived, versioned releases at Zenodo.
I'd rather not have two FAQs, as you point out @xrotwang, because it may cause confusion. On the other hand, perhaps there's already confusion between the "not official" CSV data that we create here: and that we release on GitHub and Zenodo: in contrast to the "official" CLDF version: Maybe it's best, then, to keep them separate and have two FAQs -- one for each data type?
I'd vote for updating our FAQ to start from the clld version of the data. The SQL version is cool and I'm considering keeping it as an "orphan" page (not accessible from the nav, only linked from the main R FAQ). But this adds some overhead each time we want to update the FAQ content... 🤔
OK, if the R version used the CLDF data, that would be perfect from my POV. In that case, I'd reduce the SQL variant to just
Then there should be minimal overlap between the text/content of the two pages, and thus not much maintenance overhead.
So just to be clear: the updated R code should read the CLDF CSV files from here (assuming these will be the standardized file names moving forward): https://github.com/cldf-datasets/phoible/tree/phoible-3.0/cldf and, I suppose, join them into the CSV file that we use and maintain in dev, for simplicity's sake (so we won't have to update the rest of the code to work with different CLDF CSV files). We also need to add a field for categories specifying source gaps (#333), and there are a few data issues in dev's issue tracker that I would like to fix before releasing an official 3.0.
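Turning joined CLDF rows back into the dev CSV layout can be sketched as a column rename followed by `csv.DictWriter`. The mapping below is hypothetical: the dev-side names (`InventoryID`, `Glottocode`, `LanguageName`, `Phoneme`) are assumed to match the current dev CSV header, the CLDF-side names are assumptions too, and the source-gap field from #333 is not modeled.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical mapping from joined CLDF column names to dev-CSV column names.
# Dict order determines the column order of the written CSV.
CLDF_TO_DEV: Dict[str, str] = {
    "Contribution_ID": "InventoryID",
    "Glottocode": "Glottocode",
    "LanguageName": "LanguageName",
    "Value": "Phoneme",
}


def to_dev_rows(
    rows: Iterable[Dict[str, str]], mapping: Dict[str, str] = CLDF_TO_DEV
) -> List[Dict[str, str]]:
    """Keep only the mapped columns and rename them to the dev-CSV names."""
    return [{dev: row[cldf] for cldf, dev in mapping.items()} for row in rows]


def write_dev_csv(
    rows: Iterable[Dict[str, str]], mapping: Dict[str, str] = CLDF_TO_DEV
) -> str:
    """Serialize the renamed rows as CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(mapping.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(to_dev_rows(rows, mapping))
    return buffer.getvalue()
```

Keeping the mapping in one dictionary means the rest of the existing code can keep working against the familiar dev-CSV column names, while any renaming on the CLDF side only requires updating this single table.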
@bambooforest Yes, that's what I propose. While this slightly complicates some things, it simplifies others:
I think,
OK, will look into it. It would also motivate users to use the CLDF version instead of the dev CSV if it contains additional Glottolog metadata, which we typically just merge into the dev CSV anyway.
Yes, considering that both PHOIBLE and Glottolog are moving targets, getting full transparency about particular versions makes the added complexity worthwhile.
I'm going to merge this. I created an issue (#36) to update the R code in the FAQ; since I need to do this elsewhere, having this PR merged gives me the submodule, etc.
I tried to make the CLDF creation and dataset as transparent as possible, by
pulling in the PHOIBLE curation repos as submodule: https://github.com/cldf-datasets/phoible/tree/phoible-3.0/raw
creating a human-readable version of the CLDF metadata: https://github.com/cldf-datasets/phoible/blob/phoible-3.0/cldf/README.md
closes Refactor repo to be cldfbench compliant #32
closes Make CLDF creation independent of phoible-scripts #34