let preston cat dereference content line ranges #128
Crazy idea - perhaps we could specify disjoint sets of lines, e.g. L1,10 or L1,10-20. This would make it simple to pair any row of a TSV with the header row - without the header, any other row's meaning can be difficult to parse. e.g.
And, if there's any use for it, this could also be extended to |
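To make the proposed notation concrete, here is a minimal Python sketch of how a selector such as L1,10 or L1,10-20 could be parsed and applied. It is not preston's implementation; the function names `parse_line_selector` and `select_lines` are hypothetical, and the sketch assumes 1-indexed line numbers with inclusive ranges.

```python
def parse_line_selector(selector):
    """Parse a selector such as 'L1,10-20' into inclusive (first, last) ranges.

    Hypothetical helper: assumes 1-indexed lines and inclusive range ends.
    """
    if not selector.startswith("L"):
        raise ValueError(f"expected selector to start with 'L': {selector!r}")
    ranges = []
    for part in selector[1:].split(","):
        start, sep, end = part.partition("-")
        first = int(start)
        last = int(end) if sep else first  # a bare number is a one-line range
        if first < 1 or last < first:
            raise ValueError(f"invalid line range: {part!r}")
        ranges.append((first, last))
    return ranges


def select_lines(lines, selector):
    """Yield the lines matched by the selector, in document order."""
    ranges = parse_line_selector(selector)
    for number, line in enumerate(lines, start=1):
        if any(first <= number <= last for first, last in ranges):
            yield line


# Pair the header row of a small TSV with its third line.
rows = ["id\tname\n", "1\tapple\n", "2\tpear\n", "3\tplum\n"]
print("".join(select_lines(rows, "L1,3")), end="")
```

Selecting L1 together with any other line keeps the header next to the row it describes, which is the use case mentioned above.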
@mielliott Introducing the A more general approach would be to say something like: here's the schema definition associated with this piece of content. With DwC-A, this schema definition would be a fragment of the meta.xml. And... your proposed notation sounds neat for reasons other than just getting a sense for a schema. I imagine that subsetting a disjoint range of records from a (potentially) giant dataset would be very useful. |
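For the giant-dataset case, one possible approach is to read lazily and stop as soon as the last requested line has been emitted, so the rest of the file is never touched. The sketch below is an illustration under that assumption, not a description of how preston reads content; `stream_selected` is a hypothetical name, and ranges are given as inclusive, 1-indexed (first, last) tuples.

```python
import io


def stream_selected(reader, ranges):
    """Lazily yield lines whose 1-indexed number falls in any inclusive range.

    Hypothetical helper: stops reading once the last requested line is reached.
    """
    if not ranges:
        return
    last_wanted = max(last for _, last in ranges)
    for number, line in enumerate(reader, start=1):
        if any(first <= number <= last for first, last in ranges):
            yield line
        if number >= last_wanted:
            break  # nothing further can match, so stop reading here


data = io.StringIO("header\na\nb\nc\nd\n")
selected = list(stream_selected(data, [(1, 1), (3, 4)]))
print(selected)  # ['header\n', 'b\n', 'c\n']
```

Because the loop exits right after line 4, the final line of `data` is left unread.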
Still gotta squash some bugs. Bear with me. |
Tests are passing in my IntelliJ (Windows) but failing when running |
@mielliott sounds good. Sometimes I just commit known failures to share the joys of fixing the bug or test error... |
@mielliott Thanks for sharing your code. After fixing the test case, I tried:
However, when I tried:
the command didn't complete, it just seemed stuck. Instead, I expected:
Any idea what is going on? |
Yeah, I think I figured it out. Just a sec. |
There were some weird bugs due to some inherited code in the SelectedLinesReader class, but it's sorted out now:
And you can also enjoy lists of lines
as well as lists of line ranges
|
Note that the trailing newline character is not printed. This was the existing behavior for single-line queries, so it is preserved for multi-line queries. For example, |
Without the trailing newline character, some possibly unexpected things happen. e.g.
Maybe we should include the "\n" at the end of lines. Is it part of the line? I feel like it is. @jhpoelen thoughts? Also note that
Notice where the $ appears after running each command. |
@mielliott neat examples. I'd say, follow the wisdom of cat: cat doesn't add a newline, but echo without -n does -
with cut appending newline except when using -z
|
Agreed, I don't think we should be adding \n where there isn't one. I was more wondering about preston's current behavior of removing the \n at the end of a line. For example, in catting
So I suggest that we have preston print the \n if there is one |
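The suggestion can be sketched in Python as follows. This only illustrates the proposed rule (emit the line terminator when the source line has one, never add one that is missing); `cat_lines` and its `keep_newline` flag are hypothetical and do not reflect preston's actual code.

```python
import io


def cat_lines(content, wanted, keep_newline=True):
    """Concatenate the 1-indexed lines listed in `wanted`.

    With keep_newline=True each line keeps its own '\n' if it has one;
    with keep_newline=False the terminator is stripped.
    """
    out = []
    for number, line in enumerate(io.StringIO(content), start=1):
        if number in wanted:
            out.append(line if keep_newline else line.rstrip("\n"))
    return "".join(out)


content = "one\ntwo\nthree"  # the final line has no trailing newline

print(repr(cat_lines(content, {1}, keep_newline=False)))  # 'one'
print(repr(cat_lines(content, {1, 2})))                   # 'one\ntwo\n'
print(repr(cat_lines(content, {3})))                      # 'three'
```

With the newline kept, the output of a line query matches the bytes of the source, so a shell prompt only lands on the same line as the output when the source itself lacks a final newline, as with cat.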
sounds good! |
@mielliott thanks for implementing the line range feature! I just tried:
and
and
very neat! |
@mielliott Just installed preston 0.3.0 and found that
https://deeplinker.bio/cat/line:zip:hash://sha256/29d30b566f924355a383b13cd48c3aa239d42cba0a55f4ccfc2930289b88b43c!/occurrence.txt!/L1
works like a charm (see attached screenshot). Note that the hash refers to the (huge) eBird dataset.
I had the urge to use a line range, e.g., L1-2. Is that something you had in mind too?
Originally posted by @jhpoelen in #109 (comment)
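As an illustration of how the query above is put together, the following Python sketch splits a `line:` query into the nested content identifier and the trailing line selector. It only assumes what is visible in the example URL (a `line:` prefix and a final `!/L...` segment); `split_line_query` is a hypothetical helper, not part of preston.

```python
def split_line_query(query):
    """Split 'line:<content id>!/<selector>' into (content id, selector).

    Hypothetical helper based only on the shape of the example query.
    """
    prefix = "line:"
    if not query.startswith(prefix):
        raise ValueError(f"not a line query: {query!r}")
    # The selector is the part after the last '!/' separator.
    content_id, sep, selector = query[len(prefix):].rpartition("!/")
    if not sep or not selector.startswith("L"):
        raise ValueError(f"missing line selector: {query!r}")
    return content_id, selector


query = (
    "line:zip:hash://sha256/"
    "29d30b566f924355a383b13cd48c3aa239d42cba0a55f4ccfc2930289b88b43c"
    "!/occurrence.txt!/L1"
)
content_id, selector = split_line_query(query)
print(selector)  # L1
```

A range such as L1-2 would fit in the same trailing position without changing the rest of the query.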