Allow line-based importer to specify a custom line-delimiter sequence for custom data #4103
Closed
Labels: Difficulty: Intermediate, Type: Feature Request, import, new data format
This might need better streaming support in our line-based importer as a prerequisite; I'm not sure.
I often end up with large byte arrays streamed out to a single file that use a custom sequence of characters (`*%%*`) as the record delimiter, which I would like to treat simply as a line delimiter when importing the file into OpenRefine. The files are typically under 4 GB, usually only 1 GB or 2 GB, and I often have over 20 GB of system memory available to give to the Java heap.
Other tools allow reading a file as a stream of characters and splitting it into lines based on a custom character sequence.
I would like our OpenRefine line-based importer or a new importer to handle this use case.
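To illustrate the behavior I mean, here is a minimal Java sketch that reads a character stream and splits it on a literal delimiter sequence. It is not OpenRefine's importer code: the class `CustomDelimiterSplitter` and the method `splitRecords` are hypothetical names used only for this example, and it relies on the standard `java.util.Scanner` rather than anything in the existing importer.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of OpenRefine.
final class CustomDelimiterSplitter {

    // Reads characters from the reader and returns the records that are
    // separated by the given literal delimiter sequence.
    static List<String> splitRecords(Reader reader, String delimiter) {
        List<String> records = new ArrayList<>();
        // Pattern.quote makes characters such as '*' match literally
        // instead of being interpreted as regex operators.
        try (Scanner scanner = new Scanner(reader).useDelimiter(Pattern.quote(delimiter))) {
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Newlines inside a record are kept; only "*%%*" ends a record.
        Reader input = new StringReader("first\nrecord*%%*second record*%%*third");
        for (String record : splitRecords(input, "*%%*")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

The sketch collects every record into a list for simplicity. An importer handling files of 1 GB or more would presumably hand each record to the row-creation step as it is read, rather than holding them all at once.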
Proposed solution
Allow the line-based importer to have an option to use a custom line-delimiter character sequence (overriding the defaults of `\n` or `\r\n`).
Alternatives considered
I have to use other tools.
Additional context