
Provide info of exact processed bytes to resume a large file processing #446

@tmokmss


Summary

I want to know exactly how many bytes the parser had read when it parsed the current record, i.e. the byte offset at which that record starts. This information is useful when a process is suspended and needs to be restarted from the last record without reading the target CSV from the beginning again, which can take a very long time just to reach the last position.

Motivation

We can resume processing a CSV file from a given position by using createReadStream's start option:

const parser = createReadStream('foo.csv', { start: startPosition }).pipe(parse({}));

However, to use this option we need to know the exact position (in bytes) of the first byte of the last record. CSV parse does not currently expose this information (ref), which makes it difficult to resume processing.
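To make the request concrete, here is a minimal sketch of the bookkeeping such a feature would need. It is not part of CSV parse: RecordOffsetTracker and its push method are hypothetical names, and the sketch assumes RFC 4180-style quoting (a quote inside a field is written as two quotes), \n or \r\n record delimiters, and ASCII-compatible input such as UTF-8. It keeps an absolute byte position and the quote state across chunks, so the reported offsets do not depend on where the stream happens to split the data.

```javascript
// Hypothetical sketch, not part of CSV parse: reports the absolute byte offset
// at which each CSV record starts, across arbitrary chunk boundaries.
// Assumes RFC 4180-style quoting and '\n' or '\r\n' record delimiters.
// Scanning raw bytes is safe for UTF-8 because every byte of a multi-byte
// sequence is >= 0x80 and can never be mistaken for '"' (0x22) or '\n' (0x0A).
class RecordOffsetTracker {
  constructor(start = 0) {
    this.position = start; // absolute offset of the first byte of the next chunk
    this.inQuotes = false; // true while scanning inside a quoted field
  }

  // Scans one Buffer chunk and returns the offsets where new records begin,
  // i.e. the byte right after each '\n' that is outside quotes.
  // The first record (at `start`) is not reported; a trailing newline at EOF
  // yields an offset equal to the file size.
  push(chunk) {
    const starts = [];
    for (let i = 0; i < chunk.length; i++) {
      const byte = chunk[i];
      if (byte === 0x22) {
        // An escaped quote ('""') toggles twice, so it leaves the state unchanged.
        this.inQuotes = !this.inQuotes;
      } else if (byte === 0x0a && !this.inQuotes) {
        starts.push(this.position + i + 1);
      }
    }
    this.position += chunk.length;
    return starts;
  }
}

// Usage: a quoted newline and escaped quotes must not be treated as boundaries.
const csv = Buffer.from('id,name\n1,"multi\nline"\n2,"say ""hi"""\n', 'utf8');
const tracker = new RecordOffsetTracker();
const offsets = [];
// Feed the data in small 5-byte chunks to show that chunking does not matter.
for (let i = 0; i < csv.length; i += 5) {
  offsets.push(...tracker.push(csv.subarray(i, i + 5)));
}
console.log(offsets); // → [ 8, 23, 38 ]
```

If the parser exposed an offset like this alongside each record, the caller could persist it and later pass it unchanged as createReadStream's start option.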

Alternative

  1. parser.info.bytes seems to hold the number of bytes read so far, but because the parser reads the file eagerly, this value does not necessarily correspond to the position where a record starts, so it cannot be used for this purpose.

  2. The parse function has the from and from_line options, but they still read the file from the beginning, so they do not shorten the processing time at all.
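By contrast, a record-aligned byte offset lets the stream skip the already-processed prefix entirely. The following is a minimal sketch in plain Node.js (no CSV parse involved); readFrom and resumeDemo are hypothetical helpers, and the checkpoint is assumed to have been saved by a previous run as the offset of the first unprocessed record.

```javascript
// Hypothetical sketch in plain Node.js: resume from a saved, record-aligned
// byte offset so that the already-processed prefix is never read again.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Reads `file` from byte `start` to the end; resolves to the text and byte count.
function readFrom(file, start) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(file, { start })
      .on('data', (chunk) => chunks.push(chunk))
      .on('error', reject)
      .on('end', () => {
        const data = Buffer.concat(chunks);
        resolve({ text: data.toString('utf8'), bytesRead: data.length });
      });
  });
}

async function resumeDemo() {
  const file = path.join(os.tmpdir(), `resume-demo-${process.pid}.csv`);
  const records = ['id,name\n', '1,alice\n', '2,bob\n', '3,carol\n'];
  fs.writeFileSync(file, records.join(''));
  // Assume a previous run handled the header and "1,alice" before stopping,
  // and saved the byte offset where the next record ("2,bob") begins.
  const checkpoint = Buffer.byteLength(records.slice(0, 2).join(''), 'utf8');
  try {
    return { checkpoint, ...(await readFrom(file, checkpoint)) };
  } finally {
    fs.unlinkSync(file); // clean up the temporary file
  }
}

const demo = resumeDemo();
demo.then((result) => console.log(result));
```

Only the bytes from the checkpoint onward are read, whereas skipping records with from or from_line still has to scan everything before them; the missing piece is a reliable way to obtain the checkpoint from the parser.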

