Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Deno.lines #2101

Closed
wants to merge 1 commit into from

Conversation

6 participants
@bartlomieju
Copy link
Contributor

bartlomieju commented Apr 12, 2019

This PR adds Deno.lines method, mentioned in #2081

Alternative implementation by @johnsonjo4531.

if (newLineIndex > -1) {
const line = scratch.slice(0, newLineIndex);
scratch = scratch.slice(newLineIndex + 1);
yield line;

This comment has been minimized.

Copy link
@hayd

hayd Apr 12, 2019

Contributor

doesn't only one yield per chunk mean scratch could grow linearly?
I think this newLineIndex/yield should be in a while loop.

const decoder = new TextDecoder();
let scratch = "";

for await (const chunk of toAsyncIterator(r)) {

This comment has been minimized.

Copy link
@ry

ry Apr 12, 2019

Collaborator

Ideally we could import code from deno_std here. The correct way to solve this is to use BufReader in std/io/bufio.ts
https://github.com/denoland/deno_std/blob/master/io/bufio.ts#L220

In order to use that code, we need to change //js/ to use extensions in the imports... (cc @kitsonk )

This comment has been minimized.

Copy link
@kitsonk

kitsonk Apr 12, 2019

Contributor

Yeah, that is complicated (using extensions). I was planning to tackle it in the medium term. Essentially we have to replace rollup.

This comment has been minimized.

Copy link
@ry

ry Apr 12, 2019

Collaborator

Maybe we should move bufio into //js then (problem is that it will likely bring all of std/io with it as dependencies)

This comment has been minimized.

Copy link
@ry

ry Apr 12, 2019

Collaborator

Or maybe "lines" should just be in deno_std instead of core..

This comment has been minimized.

Copy link
@bartlomieju

bartlomieju Apr 13, 2019

Author Contributor

It looks like BufReader provides a nice way to solve lines 👍

Reading line by line is such a common operation that IMHO it should be Deno.lines instead of living in deno_std.

It's greatly illustrated by @ry's example. This is super simple:

cat /etc/passwd |  deno eval " \
  for await (let line of Deno.lines(Deno.stdin) { \
    console.log(line.split(':')[0]) }"

This is not:

cat /etc/passwd |  deno eval " \
  import { lines } from "https://deno.land/x/io/mod.ts; \
  for await (let line of lines(Deno.stdin) { \
    console.log(line.split(':')[0]) }"

This comment has been minimized.

Copy link
@ry

ry Apr 13, 2019

Collaborator

@bartlomieju I agree it would be great to have built in to the CLI... but I really want to ensure that we use the proper bufio infrastructure to solve these problems rather than having a bunch of one-off algorithms for buffering. So either we need to first be able to import deno_std code into the CLI (ideal) or move bufio from deno_std into the main repo (less ideal)

Closing this one for now. Thanks @bartlomieju !

for await (const chunk of toAsyncIterator(r)) {
scratch += decoder.decode(chunk);

let newLineIndex = scratch.indexOf("\n");

This comment has been minimized.

Copy link
@johnsonjo4531

johnsonjo4531 Apr 12, 2019

Don't forget about windows carriage return line feeds \r\n. Maybe you could keep the code for finding the index of \n, but also check the index in scratch before the \n and check if it's the \r character then skip both the characters if that is the case.

This comment has been minimized.

Copy link
@zekth

zekth Apr 13, 2019

Contributor

Using something like this for the remaining lines too.

let crLf = false;
let idx = newLineIndex;
if(scratch[idx-1]=="\r") {
  idx--;
  crlf = true;
}
const line = scratch.slice(0, idx);

...

const remainingLines;
if(crLF) {
  remainingLines = scratch.split("\n");
} else {
  remainingLines = scratch.split("\r\n");
}

This comment has been minimized.

Copy link
@bartlomieju

bartlomieju Apr 13, 2019

Author Contributor

@johnsonjo4531 you're right, that's where your implementation is better :) let's wait on BufReader it should solve this issue.

@johnsonjo4531

This comment has been minimized.

Copy link

johnsonjo4531 commented Apr 12, 2019

Great job @bartlomieju your code is very clean and simple! One of the differences between my implementation and Bartek's is I tried not to decode mine as a string and kept it in bytes (i.e. Uint8Arrays). I should also remark I am not used to working with byte data, so I could have made some very newbie mistakes, so feel free to leave some comments on my commits in my repo. @bartlomieju also used a lot better tools from Deno for the job.

Maybe it would be good to not only have a method for getting the lines as strings (as in @bartlomieju's implemenation), but maybe also have the ability to get the lines as Uint8Arrays (as in my implementation)? I'm not sure the benefit other than preventing overhead of decoding the data as a string just throwing ideas out there. Honestly if we did use both we might just make the string lines implementation wrap the bytes lines implementation just to save build size. It could look like this. Yet again just throwing ideas out there.

async function* lines (r: Deno.Reader) {
  const encoder = new TextEncoder();
  for await(const line of lines_bytes(r)) {
    yield encoder.encode(line);
  }
}

johnsonjo4531 added a commit to johnsonjo4531/read_lines that referenced this pull request Apr 13, 2019

Restructuring v1.0.0
Renamed the main file. Decided to follow @ry's suggestion in
denoland/deno#2101 (comment)
Added two extra examples. Decided not to open and close files for the user,
but instead use a Reader.
@bartlomieju

This comment has been minimized.

Copy link
Contributor Author

bartlomieju commented Apr 13, 2019

@johnsonjo4531 thanks I appreciate it.

I agree with you, we can solve it simply with: Deno.byteLines and Deno.lines (where lines just decodes lines yielded by byteLines

@ry ry closed this Apr 13, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.