Minimum supported DDI Codebook version #5

Open
pitkant opened this issue Apr 16, 2024 · 3 comments

Comments

@pitkant

pitkant commented Apr 16, 2024

Hi, thank you for the useful package.

The DDI website states that DDIwR is used to "create, edit and validate a DDI Codebook version 2.6, using R script commands." However, in the package documentation it is hard to find any reference to which DDI versions the package actually supports.

In the getDNS() function in internals.R, the package seems to look for the version 2.5 namespace and otherwise throws an error:

DDIwR/R/internals.R

Lines 784 to 790 in 34dab97

`getDNS` <- function(xml) {
    xmlns <- xml2::xml_ns(xml)
    wns <- which(xmlns == "ddi:codebook:2_5")
    if (length(wns) == 0) {
        admisc::stopError("The XML document does not contain a DDI namespace.")
    }
    return(paste0(names(xmlns)[wns[1]], ":"))

This was a problem when a certain data repository was still using DDI 2.0, which is valid DDI but is not recognised as such by the package. It is reasonable for the package to support only DDI 2.5; I just wish it were more transparent about it.
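One way to make the namespace check more informative is sketched below. It is a minimal illustration, not DDIwR's code: `find_ddi_prefix()` is a hypothetical helper, and the pattern `^ddi:codebook:` is an assumption about how Codebook namespaces are named. Whether older files (such as DDI 2.0) declare a namespace that matches it is not established here. The function takes a named character vector like the one `xml2::xml_ns()` returns and reports what it found, or lists the declared namespaces when nothing matches.

```r
# Hypothetical helper, not part of DDIwR.
# xmlns: named character vector (prefix = URI), e.g. from xml2::xml_ns().
# pattern: assumed naming scheme for DDI Codebook namespaces.
find_ddi_prefix <- function(xmlns, pattern = "^ddi:codebook:") {
    uris <- as.character(xmlns)
    wns <- which(grepl(pattern, uris))
    if (length(wns) == 0) {
        # Tell the user what was actually declared, instead of a generic error
        stop(
            "No DDI Codebook namespace found. Declared namespaces: ",
            paste(uris, collapse = ", "),
            call. = FALSE
        )
    }
    list(
        prefix = paste0(names(xmlns)[wns[1]], ":"),
        uri = uris[wns[1]]
    )
}

# Hand-built stand-in for the result of xml2::xml_ns() on a 2.5 file
ns_25 <- c(
    d1 = "ddi:codebook:2_5",
    xsi = "http://www.w3.org/2001/XMLSchema-instance"
)
res <- find_ddi_prefix(ns_25)
```

Returning the matched URI alongside the prefix would let the caller decide how to treat each version, for example by warning about versions that are read but not fully supported.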

The DDIC object is defined in DDI_Codebook_2.6.R and is used in the checkXMList() function in internals.R:

DDIwR/R/internals.R

Lines 263 to 285 in 34dab97

`checkXMList` <- function(xmlist) {
    nms <- c()
    extractNames <- function(x) {
        if (is.list(x)) {
            nmsx <- names(x)
            indexes <- seq_along(nmsx)
            wextra <- which(nmsx == ".extra")
            if (length(wextra)) {
                indexes <- indexes[-wextra]
            }
            nms <<- unique(c(nms, setdiff(nmsx, c(".extra", ""))))
            lx <- lapply(x[indexes], extractNames)
        }
    }
    extractNames(xmlist)
    if (!all(is.element(nms, names(DDIC)))) {
        admisc::stopError(
            "This XML file contains elements that are not part of the DDI Codebook standard."
        )
    }
}

When I tried to use the getMetadata() function on a DDI-C 2.5 file, it threw an error on lines 280-284 because the check does not accept the p and ExtLink elements in my file, although they are in the 2.5 namespace. At least in the case of ExtLink, there is a deprecation note in the R file:

"Note also that the DDI contains a linking mechanism permitting arbitrary links between internal elements (See Link) and from internal elements to external sources (See ExtLink). Note that the use of these two elements has been DEPRECATED in version 2.6."

However, when I commented this check out, the getMetadata() function seemed to produce a sensible result without errors. With DDI Codebook 2.6 still being in the draft phase (?), this check seems too stringent. Alternatively, if the package only supports version 2.6, that could be communicated to the end user more explicitly.
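A less stringent variant of the check could name the offending elements and optionally warn instead of stopping. The sketch below is only an illustration under stated assumptions: `collect_names()` and `check_elements()` are hypothetical helpers, the `known` vector stands in for `names(DDIC)`, and the nested list is a made-up stand-in for a parsed codebook.

```r
# Hypothetical helpers, not part of DDIwR.

# Recursively collect element names from a nested list, skipping ".extra"
collect_names <- function(x) {
    if (!is.list(x)) {
        return(character(0))
    }
    nmsx <- names(x)
    if (is.null(nmsx)) {
        nmsx <- rep("", length(x))
    }
    keep <- nmsx != ".extra"
    nms <- setdiff(nmsx, c(".extra", ""))
    below <- unlist(lapply(x[keep], collect_names), use.names = FALSE)
    unique(c(nms, below))
}

# Report unknown element names; stop when strict, otherwise only warn
check_elements <- function(xmlist, known, strict = TRUE) {
    unknown <- setdiff(collect_names(xmlist), known)
    if (length(unknown) > 0) {
        msg <- paste0(
            "Elements not found in the reference list: ",
            paste(unknown, collapse = ", ")
        )
        if (strict) {
            stop(msg, call. = FALSE)
        }
        warning(msg, call. = FALSE)
    }
    invisible(unknown)
}

# Made-up reference list and document, for illustration only
known <- c("codeBook", "stdyDscr", "citation", "titlStmt", "titl")
doc <- list(
    codeBook = list(
        stdyDscr = list(
            citation = list(
                titlStmt = list(titl = list("Example study")),
                ExtLink = list("link")
            ),
            p = list("A paragraph")
        ),
        .extra = list(ignored = "x")
    )
)
unknown <- suppressWarnings(check_elements(doc, known, strict = FALSE))
```

With `strict = FALSE` the file would still be read, and the message would tell the user exactly which elements were not recognised.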

@dusadrian
Owner

Hello @pitkant,

Yes indeed, the DDIwR package is also transitioning towards Codebook 2.6 but, as you rightfully observe, the standard is still not final.
Last year, when the 2.6 elements were introduced, I thought the final version would be released by the end of the year (at least this is how it was advertised on the DDI Alliance website).
This was not the case, hence the DDIwR package also needs to wait until that is final.

Parts of the code, such as checking the namespace, are also in stand-by mode for the same reason.
But I think the main strength of the package is less about reading a DDI 2.6 Codebook and more about actually producing 2.6 Codebooks using plain R commands.

The versions should not really matter, as the Codebook is specifically engineered to be backwards compatible. If my understanding is correct, elements from version 2.0 should still be part of the Codebook 2.6, even if deprecated.
If the (now deprecated) ExtLink is not found in version 2.6, this should be raised to the DDI Alliance Technical Committee, because such an oversight breaks the backwards compatibility.

My intention is to be 100% consistent with the version from the DDI Alliance. However, there might be some modifications in the latest development versions of the 2.6 Codebook that I have not implemented. I think it is best to wait for the final version before updating the DDIwR package. In that version, the package will absolutely be fully transparent about what is supported.

I hope this explains the situation, thank you so much for your very helpful review of the code!
Best,
Adrian

@pitkant
Author

pitkant commented Apr 17, 2024

So I take it that the list object DDIC defined in DDI_Codebook_2.6.R is generated from codebook.xsd (note: that can be downloaded from DDI-Alliance's Atlassian here)? The file seems to have some additional mentions of the PHRASE element "ExtLink" and the FORM element "p" that were not carried over to the DDIC object, but maybe that was the intention.

I agree that if generating 2.6 Codebooks is the goal of this package, then it is probably sufficient. However, if there is the option to read existing codebooks, then other users might encounter the same problems as I did, so I just wanted to write out these few paragraphs here for future reference.
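The comparison described above, between the elements a schema declares and those in a reference list, could be sketched roughly as follows. This assumes the xml2 package is installed. The inline schema is a tiny made-up stand-in, not the real codebook.xsd, and `reference` is a hypothetical substitute for `names(DDIC)`.

```r
library(xml2)

# Tiny illustrative schema, NOT the real codebook.xsd
xsd_text <- '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="codeBook">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="titl"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="titl" type="xs:string"/>
  <xs:element name="ExtLink" type="xs:string"/>
  <xs:element name="p" type="xs:string"/>
</xs:schema>'

xsd <- read_xml(xsd_text)

# Elements the schema declares (those carrying a name attribute);
# references such as <xs:element ref="titl"/> are not matched
declared <- unique(xml_attr(xml_find_all(xsd, "//xs:element[@name]"), "name"))

# Hypothetical reference list, standing in for names(DDIC)
reference <- c("codeBook", "titl")

# Declared in the schema but absent from the reference list
missing_from_reference <- setdiff(declared, reference)
```

Running the same comparison against the actual schema file would show whether "ExtLink" and "p" are declared as elements there or only mentioned in its documentation text.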

@dusadrian
Owner

dusadrian commented Apr 17, 2024

Indeed.
I should also confess that I never understood the "p" elements from the .xsd file. I always thought they were just <p> (HTML paragraphs) that were needed to compile the web page documentation. I probably thought the same about ExtLink, but I will revisit the schema definition to make sure I am not missing anything obvious.
The long-term goal is to also read Codebook files (all versions). For the time being, producing a DDI 2.6 Codebook seems to me like a very good outcome. Taking an existing dataset and exporting such a codebook for archiving / publishing purposes has never been so easy since the days of Nesstar.
And I am committed to making it better and better. Help is always welcome, even if only to make the documentation clearer and more transparent.
