SeekableReader not available from readers over InputStream #17
Comments
@tgregg Why was this closed? There are good use cases for this, and at least some scenarios where we can support seeking.
Hello, I'm trying to learn more about what this issue covers. I'm looking for a way to seek to a specific item inside an Ion file. In the text form, I can seek to the position and start reading there, and it works fine, but I can't figure out how to do the same using the binary format. Is this issue related to a feature like that? Or is there already some way to seek to and read a specific entry using the binary format? Thanks.
@wilkerlucio Can you share more information about what you're trying to do? Are you trying to skip forward in a stream until you find the value you're looking for, or do you need to be able to seek back to a value you've seen previously? Can you share the code that you mentioned works with text Ion but not binary Ion?
Sure. I have a system that indexes records stored in data files. When I look something up (say, the record with ID 123), the index tells me that record 123 is stored at offset 452315 of a given file. With the offset in hand, I open the file, skip to that offset, and read the record that starts at that point. This works fine with the text format. Here is a snippet demonstrating it:

```java
package com.amazon.ion.encode.demo;

import com.amazon.ion.IonReader;
import com.amazon.ion.IonType;
import com.amazon.ion.IonWriter;
import com.amazon.ion.system.IonReaderBuilder;
import com.amazon.ion.system.IonTextWriterBuilder;
import java.io.*;

public class IonSkipReadDemo {
    IonReaderBuilder readerBuilder = IonReaderBuilder.standard();
    IonTextWriterBuilder textWriterBuilder = IonTextWriterBuilder.standard();

    public static void main(String[] args) {
        try {
            IonSkipReadDemo demo = new IonSkipReadDemo();
            demo.writeFile();
            demo.readSkipping(53); // 53 is the byte offset of the record with "world 4"
        } catch (Throwable e) {
            System.out.println(e.getMessage());
        }
    }

    void writeFile() throws IOException {
        try (OutputStream out = new FileOutputStream("file-java.txt");
             IonWriter textWriter = textWriterBuilder.build(out)) {
            for (long i = 1; i < 1000; i++) {
                writeHelloWorld(textWriter, "world " + i);
            }
        }
    }

    void readSkipping(long offset) throws IOException {
        try (InputStream in = new FileInputStream("file-java.txt")) {
            // Note: skip() may skip fewer bytes than requested; a robust version would loop.
            in.skip(offset);
            try (IonReader reader = readerBuilder.build(in)) {
                readHelloWorld(reader);
            }
        }
    }

    void writeHelloWorld(IonWriter writer, String value) throws IOException {
        writer.stepIn(IonType.STRUCT); // step into a struct
        writer.setFieldName("hello");  // set the field name for the next value to be written
        writer.writeString(value);     // write the next value
        writer.stepOut();              // step out of the struct
    }

    void readHelloWorld(IonReader reader) {
        reader.next();                            // position the reader at the first value, a struct
        reader.stepIn();                          // step into the struct
        reader.next();                            // position the reader at the first value in the struct
        String fieldName = reader.getFieldName(); // retrieve the current value's field name
        String value = reader.stringValue();      // retrieve the current value's String value
        reader.stepOut();                         // step out of the struct
        System.out.println(fieldName + " " + value); // prints the field name and value, e.g. "hello world 4"
    }
}
```

But I can't figure out how to do the same with the binary format, because the reader must read the header at the beginning, and I'm also not sure how it would handle local symbol tables in this case (although, for my case, I can just use shared tables, if that helps). Is there a way to do the same with the binary format?
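One detail in the snippet above is worth a sketch: `InputStream.skip` is allowed to skip fewer bytes than requested, so a single call does not guarantee the stream lands on the indexed offset. The helper below is hypothetical (the class and method names are not part of ion-java or the original snippet) and only shows one way to skip an exact number of bytes with the standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of ion-java.
class SkipFully {

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                // skip() made no progress: consume one byte to distinguish
                // a stream that is merely slow from one that has ended.
                remaining--;
            } else {
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("0123456789".getBytes(StandardCharsets.US_ASCII));
        skipFully(in, 4);
        System.out.println((char) in.read()); // prints 4
    }
}
```

On Java 12 and later, `InputStream.skipNBytes` offers the same guarantee without a hand-written loop.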
Thank you for the illustration. I understand what you're trying to do.

This works with text because (in general) text Ion does not require a symbol table. If you seek a text Ion […]

For binary Ion, you correctly identified part of the problem. Seeking past a symbol table throws away context that the binary […] Therefore, rather than calling […]

You can still quickly seek past binary Ion values, however; the Ion 1.0 specification optimizes for this case by requiring all values to be prefixed with their length. Skipping a value simply requires the reader to parse the value's length from its header, then seek ahead by that length. The […]

To achieve something similar to what you're attempting above using the functionality available in the library today, I recommend recording value index rather than byte position during your indexing pass. Then, create your […] For example:

```java
void readSkipping(long valueIndex) throws IOException {
    try (IonReader reader = readerBuilder.build(new FileInputStream("file-java.10n"))) {
        for (int i = 0; i < valueIndex; i++) {
            reader.next();
        }
        readHelloWorld(reader);
    }
}
```

This assumes you only have one index to revisit; if you have more than one, visiting them all in order using the same […]
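To make the length-prefix idea concrete, here is a minimal sketch using a toy format, assuming each record is a 4-byte big-endian length followed by that many UTF-8 bytes. This is not the Ion binary encoding, and the class and method names are made up for illustration; it only shows why a reader can step over a value by parsing its length and skipping ahead, without decoding the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Toy format for illustration only: [4-byte length][UTF-8 payload] per record.
// This is NOT how Ion binary encodes values.
class LengthPrefixedSkipDemo {

    /** Encodes each string as a length-prefixed record. */
    static byte[] writeRecords(String... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String value : values) {
                byte[] payload = value.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        }
        return bytes.toByteArray();
    }

    /** Returns the record at the zero-based valueIndex, skipping earlier payloads unread. */
    static String readAt(byte[] data, int valueIndex) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < valueIndex; i++) {
                int remaining = in.readInt(); // parse only the length prefix
                while (remaining > 0) {       // then skip the payload without decoding it
                    int skipped = in.skipBytes(remaining);
                    if (skipped <= 0) {
                        throw new EOFException("record " + i + " is truncated");
                    }
                    remaining -= skipped;
                }
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new String(payload, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeRecords("world 1", "world 2", "world 3", "world 4");
        System.out.println(readAt(data, 3)); // prints world 4
    }
}
```

Skipping this way still touches every length prefix before the target, so the cost grows with the value index even though no payload is decoded.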
Hello @tgregg, thanks for the detailed response. Yes, skipping forward value by value is valid, but it only performs well up to a point. In my use case I need low latency, I have huge files (gigabytes of data in a single file), and the file system is a networked one. In this scenario, if a record sits close to the end of a gigabyte-sized file and I have to keep skipping over a networked file system, it won't perform well enough for my requirements. I'm looking forward to the […]
Imported from ION-243,IONJAVA-102