Support searching on original file formats (i.e. RData) when tabular data is successfully ingested #2707
Labels
Feature: Search/Browse
Help Wanted: Code
Mentor: pdurbin
Type: Suggestion
User Role: Curator
http://economics.stackexchange.com/questions/8922/examples-of-applied-micro-paper-with-r-code-and-data-in-public-repository is an example of someone looking specifically for data in RData format. We should support this use case.
Here's the problem. As of #2038 (comment) you can search for RData with "fileContentType:application/x-rlang-transport" ( https://dataverse.harvard.edu/dataverse/harvard?q=fileContentType%3Aapplication%2Fx-rlang-transport ) (good!) but only if the RData file failed to ingest. https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/21042 for example has a mix of RData files where some of them were ingested but many were not. Only the ones that were not ingested are searchable with the query above. To search specifically within that dataverse:
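As a rough sketch of scripting such a scoped search (this is not the original link from this issue), the same query can be sent to the Dataverse Search API. It assumes that API's `q`, `type`, and `subtree` parameters; the helper name `build_search_url` and the `harvard` alias are placeholders for illustration, so substitute the alias of the dataverse you actually want to search.

```python
from urllib.parse import urlencode

# MIME type reported for RData files that were NOT ingested as tabular data.
RDATA_TYPE = "application/x-rlang-transport"


def build_search_url(base_url, content_type, subtree=None):
    """Build a Search API URL that matches files by fileContentType.

    `subtree` is the alias of a dataverse to restrict the search to;
    when omitted, the whole installation is searched.
    """
    params = {"q": f"fileContentType:{content_type}", "type": "file"}
    if subtree:
        params["subtree"] = subtree
    # urlencode percent-encodes ':' and '/' in the query value.
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# "harvard" is a placeholder alias taken from the URL quoted above.
url = build_search_url("https://dataverse.harvard.edu", RDATA_TYPE, subtree="harvard")
print(url)
```

Because of the indexing behavior described in this issue, a query like this would be expected to return only the RData files that failed to ingest.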
So what's going on? When an RData file is successfully ingested, what gets indexed at index time is "fileContentType":"text/tab-separated-values" rather than "fileContentType":"application/x-rlang-transport". Here's an example Solr document from https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/VJ1Y9K where an RData file was successfully ingested: