Allow line-based importer to specify a custom line-delimiter sequence for custom data #4103

@thadguidry

Description

This might need better streaming support in our line-based importer as a prerequisite; I'm not sure.
I often end up with large byte arrays streamed out to a single file that use a custom sequence of characters (*%%*) as the record delimiter, and I would like to treat that sequence simply as a line delimiter when importing the file into OpenRefine. The files are typically under 4GB, usually only 1GB or 2GB, and I often have over 20GB of system memory available to give to the Java heap.
Other tools allow reading a file as a stream of characters and splitting it into lines on a custom character sequence.
I would like our OpenRefine line-based importer, or a new importer, to handle this use case.

Proposed solution

Allow the line-based importer to have an option to use a custom line-delimiter character sequence (overriding the defaults of \n or \r\n).
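
To illustrate the behaviour being requested, here is a minimal sketch of splitting a character stream on a custom delimiter without loading the whole file into memory. It is written in Python for brevity and is not OpenRefine code; the function name `iter_records` and the `chunk_size` parameter are hypothetical, chosen only for this example.

```python
import io
from typing import Iterator, TextIO


def iter_records(stream: TextIO, delimiter: str,
                 chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield records from `stream`, treating `delimiter` as the line separator.

    Hypothetical helper for illustration only, not part of OpenRefine.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Every piece before the last one is a complete record. The last
        # piece may be a partial record or end in a partial delimiter, so it
        # is carried over and joined with the next chunk.
        *complete, buffer = buffer.split(delimiter)
        yield from complete
    # Emit whatever follows the final delimiter; nothing if the input ended
    # exactly on a delimiter.
    if buffer:
        yield buffer


# Usage: \n is ordinary data here, only *%%* ends a record.
sample = io.StringIO("first*%%*second\nstill second*%%*third")
for record in iter_records(sample, "*%%*", chunk_size=8):
    print(repr(record))
```

Because the unfinished tail is kept between reads, a delimiter that straddles two chunks is still recognised. When reading a real file, opening it with `newline=""` keeps Python from translating `\r\n`, so embedded line breaks reach the records unchanged.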

Alternatives considered

I have to use other tools.

Additional context

Labels: Difficulty: Intermediate, Type: Feature Request, import, new data format