Support searching on original file formats (i.e. RData) when tabular data is successfully ingested #2707

Closed
pdurbin opened this issue Oct 29, 2015 · 3 comments

Comments


pdurbin commented Oct 29, 2015

http://economics.stackexchange.com/questions/8922/examples-of-applied-micro-paper-with-r-code-and-data-in-public-repository is an example of someone looking specifically for data in RData format. We should support this use case.

Here's the problem. As of #2038 (comment) you can search for RData with "fileContentType:application/x-rlang-transport" ( https://dataverse.harvard.edu/dataverse/harvard?q=fileContentType%3Aapplication%2Fx-rlang-transport ) (good!) but only if the RData file failed to ingest. https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/21042 for example has a mix of RData files where some were ingested but many were not. Only the ones that were not ingested are searchable with the query above. To search specifically within that dataverse:

[image: screen shot 2015-10-29 at 10 07 25 am]
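
For anyone who wants to build that search link in a script, here is a minimal sketch. It assumes only the URL pattern visible in the link above (a dataverse page that takes a `q` parameter); the helper name `build_search_url` is made up for illustration and is not part of Dataverse.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, content_type: str) -> str:
    """Return a search URL that filters on the fileContentType field.

    base_url is the dataverse page to search within, e.g. the
    Harvard Dataverse URL quoted in this issue.
    """
    query = f"fileContentType:{content_type}"
    # urlencode percent-encodes ":" and "/" so the query survives in a URL.
    return f"{base_url}?{urlencode({'q': query})}"


url = build_search_url(
    "https://dataverse.harvard.edu/dataverse/harvard",
    "application/x-rlang-transport",
)
print(url)
```

As the comment above describes, this query only returns RData files that failed to ingest.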

So what's going on? When an RData file is successfully ingested, at index time what's being indexed is "fileContentType":"text/tab-separated-values" rather than "fileContentType":"application/x-rlang-transport". Here's an example Solr document from https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/VJ1Y9K where an RData file was successfully ingested:

[image: screen shot 2015-10-29 at 9 55 44 am]

  {
    "entityId":1430,
    "dataverseVersionIndexedBy_s":"4.2",
    "identifier":"1430",
    "persistentUrl":"http://dx.doi.org/10.5072/FK2/VJ1Y9K",
    "dvObjectType":"files",
    "fileNameWithoutExtension":["Multiplication Table-Multiplication Table"],
    "fileName":["Multiplication Table-Multiplication Table",
      "Multiplication Table-Multiplication Table.tab"],
    "name":"Multiplication Table-Multiplication Table.tab",
    "nameSort":"Multiplication Table-Multiplication Table.tab",
    "datasetVersionId":463,
    "fileAccess":["Public"],
    "dateSort":"2015-10-21T13:54:56.908Z",
    "dateFriendly":"Oct 21, 2015",
    "publicationStatus":["Published"],
    "publicationDate":"2015",
    "dsPublicationDate":"2015",
    "id":"datafile_1430",
    "fileTypeDisplay":"Tab-Delimited",
    "fileContentType":"text/tab-separated-values",
    "fileType":["Tab-Delimited",
      "tabulardata"],
    "fileTypeGroupFacet":"tabulardata",
    "fileSizeInBytes":499,
    "fileMd5":"e33a7b3ef797d8945d5fa9d175518cd1",
    "unf":"UNF:6:SHOdLOf1LEtmuZnBTFRilg==",
    "subtreePaths":["/1424"],
    "parentId":"1425",
    "parentIdentifier":"doi:10.5072/FK2/VJ1Y9K",
    "parentCitation":"Liscouski, Amanda, 2015, \"Pretium\", http://dx.doi.org/10.5072/FK2/VJ1Y9K,  Demo Dataverse,  DRAFT VERSION ",
    "parentName":"Pretium",
    "variableName":["x",
      "0",
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "7",
      "8",
      "9",
      "10",
      "11",
      "12"],
    "variableLabel":["x",
      "0",
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "7",
      "8",
      "9",
      "10",
      "11",
      "12"],
    "_version_":1515655756533202944}
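
To make the mismatch concrete, here is a rough sketch that imitates an exact-match filter on `fileContentType` over two pared-down documents. It is plain Python, not Solr. The first document copies `id` and `fileContentType` from the example above; the second (`datafile_9999`) is a hypothetical stand-in for an RData file that failed to ingest.

```python
# Pared-down stand-ins for indexed file documents.
docs = [
    # From the example above: an RData file that WAS ingested, so it is
    # indexed with the tabular content type.
    {"id": "datafile_1430", "fileContentType": "text/tab-separated-values"},
    # Hypothetical: an RData file that failed to ingest and therefore
    # keeps its original content type in the index.
    {"id": "datafile_9999", "fileContentType": "application/x-rlang-transport"},
]


def match_content_type(documents, content_type):
    """Return ids of documents whose fileContentType equals content_type."""
    return [d["id"] for d in documents if d.get("fileContentType") == content_type]


rdata_hits = match_content_type(docs, "application/x-rlang-transport")
print(rdata_hits)
```

The ingested file never shows up in `rdata_hits`, because nothing in its indexed document records that it started out as RData.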

pdurbin commented Oct 29, 2015

Related: Advanced Search: make File Type a dropdown driven by uploaded data #543

@mercecrosas mercecrosas modified the milestone: In Review Nov 30, 2015
@scolapasta scolapasta modified the milestone: Not Assigned to a Release Jan 28, 2016
@pdurbin pdurbin removed the zTriaged label Jun 28, 2017

pdurbin commented Jun 28, 2017

A recent comment by @landreev at #2822 (comment) gives me hope that we can fix this some day.

@pdurbin pdurbin added the "Help Wanted: Code", "Mentor: pdurbin", and "User Role: Curator" labels Jun 28, 2017

pdurbin commented Jun 28, 2018

Closing in favor of #2822.

@pdurbin pdurbin closed this as completed Jun 28, 2018
3 participants