text/tsv is not a valid media type (text/tab-separated-values is) #11505

@pdurbin

Before #4854 we only had text/tab-separated-values. In retrospect, I don't think we should ever have added text/tsv, since it's not listed in the IANA media type registry: https://www.iana.org/assignments/media-types/media-types.xhtml

Here's where text/tsv is used in the code base, as of d8a55a9:

src/test/java/edu/harvard/iq/dataverse/api/FileTypeDetectionIT.java
106:                .body("data.files[0].dataFile.contentType", equalTo("text/tsv"))

src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java
141:        STATISTICAL_FILE_EXTENSION.put("tsv", "text/tsv");
154:    public static final String MIME_TYPE_TSV     = "text/tsv";
843:        } else if (fileType.equalsIgnoreCase("text/tsv") || fileType.equalsIgnoreCase("text/tab-separated-values")) {

src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java
496:                        // "text/tsv" should be used instead: 

src/main/java/edu/harvard/iq/dataverse/dataaccess/StoredOriginalFile.java
112:        } else if (fileType.equalsIgnoreCase("text/tsv") || fileType.equalsIgnoreCase("text/tab-separated-values")) {

src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
87:    private static final String MIME_TYPE_TSV   = "text/tsv";

src/main/java/META-INF/mime.types
10:text/tsv tab TAB tsv TSV

src/main/java/propertyFiles/MimeTypeDisplay.properties
87:text/tsv=Tab-Separated Values

src/main/java/propertyFiles/MimeTypeFacets.properties
85:text/tsv=Data

In particular, we should look at IngestServiceBean.java. Here's a bit more of the code added in #6517:

} else if (FileUtil.MIME_TYPE_INGESTED_FILE.equals(dataFile.getContentType())) {
    // Make sure no *uningested* tab-delimited files are saved with the type "text/tab-separated-values"!
    // "text/tsv" should be used instead: 
    dataFile.setContentType(FileUtil.MIME_TYPE_TSV);
}
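To make the idea concrete, here is a minimal sketch of how both spellings could be funneled through one helper instead of repeating the `equalsIgnoreCase` pair seen in FileUtil.java and StoredOriginalFile.java. The class `TsvMediaTypes` and its methods are hypothetical and do not exist in the code base. The sketch also deliberately ignores the ingested vs. uningested distinction that the comment above describes, which would need its own answer before text/tsv could be dropped.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of the Dataverse code base) that maps the
 * unregistered "text/tsv" alias onto the IANA-registered media type.
 */
final class TsvMediaTypes {

    /** IANA-registered media type for tab-separated values. */
    static final String REGISTERED_TSV = "text/tab-separated-values";

    /** Unregistered alias currently used in the code base. */
    static final String LEGACY_TSV = "text/tsv";

    private TsvMediaTypes() {
    }

    /** True for either spelling, ignoring case and any ";charset=..." parameters. */
    static boolean isTsv(String contentType) {
        if (contentType == null) {
            return false;
        }
        String base = stripParameters(contentType);
        return base.equals(REGISTERED_TSV) || base.equals(LEGACY_TSV);
    }

    /**
     * Returns the registered type for either TSV spelling (parameters are
     * dropped in that case); any other value is returned unchanged.
     */
    static String normalize(String contentType) {
        return isTsv(contentType) ? REGISTERED_TSV : contentType;
    }

    /** Lower-cases the type and cuts off everything from the first ';'. */
    private static String stripParameters(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

With something like this, a check such as `TsvMediaTypes.isTsv(fileType)` could stand in for the duplicated conditions, and `TsvMediaTypes.normalize(...)` could be applied when reading content types that were already stored as text/tsv.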

Also related:

Some context for this issue: I was considering making a pull request to add text/tsv as an alternative to text/tab-separated-values at https://github.com/mlcommons/croissant/blob/v1.0.17/python/mlcroissant/mlcroissant/_src/operation_graph/operations/read.py#L126. But again, since text/tsv doesn't seem to be a registered media type, I don't think I should.
