Allow line-based importer to specify a custom line-delimiter sequence for custom data #4103
Labels: Difficulty: Intermediate, import, new data format, Type: Feature Request
This might require better streaming support in our line-based importer as a prerequisite; I'm not sure.
I often end up with large byte arrays streamed out to a single file that use a custom sequence of characters (`*%%*`) as record delimiters, which I would like to treat simply as line delimiters when importing the file into OpenRefine. The files are typically under 4GB, usually only 1GB or 2GB, and I often have over 20GB of system memory available to give to the Java heap. Other tools allow reading a file as a stream of characters and splitting it into new lines based on a custom character sequence.
I would like our OpenRefine line-based importer or a new importer to handle this use case.
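To make the requested behaviour concrete, here is a minimal sketch in plain Java of splitting a character stream into records on a custom delimiter. It is not OpenRefine's importer API: the class `CustomDelimiterSplitter` and the method `splitRecords` are hypothetical names used only for illustration. It relies on `java.util.Scanner`, which reads from the `Reader` incrementally, although this sketch still collects every record into a list in memory.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of OpenRefine.
final class CustomDelimiterSplitter {

    // Reads the whole stream and returns one entry per record,
    // where records are separated by the literal delimiter string.
    static List<String> splitRecords(Reader reader, String delimiter) {
        List<String> records = new ArrayList<>();
        // try-with-resources closes the Scanner and the underlying Reader.
        try (Scanner scanner = new Scanner(reader)) {
            // Pattern.quote treats the delimiter literally, so regex
            // metacharacters such as '*' in "*%%*" are not interpreted.
            scanner.useDelimiter(Pattern.quote(delimiter));
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }
}
```

For example, calling `splitRecords(new StringReader("a\nb*%%*c*%%*d"), "*%%*")` would treat `\n` as ordinary content inside the first record rather than as a line break. A real importer option would presumably hand each record to the row-creation step as it is read instead of building a list first.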
Proposed solution
Allow the line-based importer to have an option to use a custom line-delimiter character sequence (overriding the defaults of `\n` or `\r\n`).
Alternatives considered
I have to use other tools.
Additional context