Duplicate file description #32
Comments
If anything, it's probably a bug or at least a weirdness in the Dataverse API, which shows "description" twice. Here's a screenshot from https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi%3A10.7910/DVN/WOT075
@adam3smith I'd encourage you to create an issue at https://github.com/IQSS/dataverse/issues, but I'd be afraid that if we deleted one of the "description" fields from the Dataverse API, an integration would break. It's probably better to think of this as a wart in the Dataverse API, something to fix in v2 or whatever. 😄
The columns also get duplicated when binding here (both have the Line 137 in ac67f0f
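As a rough illustration of how binding can produce the duplicate, here is a minimal base R sketch. The two data frames (`file_meta`, `data_file`) and their contents are hypothetical stand-ins, not the package's actual internals; the only assumption is that both pieces carry a "description" column before they are bound.

```r
# Hypothetical stand-ins for two pieces of per-file metadata that both
# carry a "description" column (names and values are made up).
file_meta <- data.frame(
  label = c("a.tab", "b.tab"),
  description = c("first file", "second file"),
  stringsAsFactors = FALSE
)
data_file <- data.frame(
  id = c(1L, 2L),
  description = c("first file", "second file"),
  stringsAsFactors = FALSE
)

# cbind() on data frames does not repair names, so both columns survive.
bound <- cbind(file_meta, data_file)
any(duplicated(names(bound)))

# One possible fix: keep only the columns of the second piece that the
# first piece does not already have.
new_cols <- !names(data_file) %in% names(file_meta)
deduped <- cbind(file_meta, data_file[, new_cols, drop = FALSE])
names(deduped)
```

Dropping by name assumes the two "description" columns hold the same values; if they could differ, renaming one of them would be the safer choice.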
In my fork (kuriwaki@49fd9e5), I've removed the duplicate and it works:
Sys.setenv("DATAVERSE_KEY" = "5b514e42-1260-4b78-b395-e27de83d3115")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
library(tibble)
library(dataverse) # devtools::install_github("kuriwaki/dataverse-client-r")
# description of each file in the dataset
obrien_files <- get_dataset("doi:10.7910/DVN/WOT075")[['files']]
any(duplicated(colnames(obrien_files)))
#> [1] FALSE
# non-duplicated column names make the tibble conversion possible
as_tibble(obrien_files)
#> # A tibble: 6 x 22
#> label restricted version datasetVersionId categories id persistentId
#> <chr> <lgl> <int> <int> <list> <int> <chr>
#> 1 Geog… FALSE 1 178559 <chr [1]> 3.64e6 doi:10.7910…
#> 2 Land… FALSE 1 178559 <chr [1]> 3.64e6 doi:10.7910…
#> 3 Land… FALSE 1 178559 <chr [1]> 3.64e6 doi:10.7910…
#> 4 Prop… FALSE 1 178559 <chr [1]> 3.64e6 doi:10.7910…
#> 5 Road… FALSE 1 178559 <chr [1]> 3.64e6 doi:10.7910…
#> 6 Road… FALSE 1 178559 <chr [1]> 3.64e6 doi:10.7910…
#> # … with 16 more variables: pidURL <chr>, filename <chr>, contentType <chr>,
#> # filesize <int>, description <chr>, storageIdentifier <chr>,
#> # rootDataFileId <int>, md5 <chr>, checksum$type <chr>, $value <chr>,
#> # creationDate <chr>, originalFileFormat <chr>, originalFormatLabel <chr>,
#> # originalFileSize <int>, UNF <chr>, tabularTags <list>
Created on 2019-12-16 by the reprex package (v0.3.0)
The duplicate column was manually removed after the fact in PR #39, in commit kuriwaki@49fd9e5.
library("dataverse")
## code goes here
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
obrien_files <- get_dataset("doi:10.7910/DVN/WOT075")[['files']]
colnames(obrien_files)
#> [1] "label" "restricted" "version"
#> [4] "datasetVersionId" "categories" "id"
#> [7] "persistentId" "pidURL" "filename"
#> [10] "contentType" "filesize" "description"
#> [13] "storageIdentifier" "rootDataFileId" "md5"
#> [16] "checksum" "creationDate" "originalFileFormat"
#> [19] "originalFormatLabel" "originalFileSize" "originalFileName"
#> [22] "UNF" "tabularTags"
any(duplicated(colnames(obrien_files)))
#> [1] FALSE
Created on 2020-12-28 by the reprex package (v0.3.0)
The "description" for files is repeated, resulting in a duplicate data.frame column name which causes all sorts of issues. Not sure if this is a problem with the API or the R-package, but figured I'd start here. CC @pdurbin
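To make the kind of trouble concrete, here is a small base R sketch of a data frame with a repeated "description" column. The data frame `files` and its values are invented for illustration; it only assumes that the package returned two columns with the same name.

```r
# Invented example: a data frame with two columns named "description".
files <- data.frame(
  label = c("a.tab", "b.tab"),
  description = c("outer", "outer"),
  description = c("inner", "inner"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Detect which column names are repeated.
dup_names <- unique(names(files)[duplicated(names(files))])

# Selecting by name silently picks the first match, so the second
# "description" column is unreachable by name.
first_match <- files[["description"]]

# A caller-side workaround: make the names unique before further processing.
repaired <- files
names(repaired) <- make.unique(names(repaired))
names(repaired)
```

This is only a workaround on the caller's side; it does not address where the duplicate comes from.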