Added support for an S3 reader for the Parquet format in Go #14683
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on JIRA (https://issues.apache.org/jira/browse/ARROW)? Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also:
```go
func (f *Reader) parseMetaData() error {
	if f.footerOffset <= int64(footerSize) {
		return fmt.Errorf("parquet: file too small (size=%d)", f.footerOffset)
	}

	buf := make([]byte, footerSize)
	// back up 8 bytes to read the metadata length (first 4 bytes) and the magic bytes (last 4 bytes)
	n, err := f.r.ReadAt(buf, f.footerOffset-int64(footerSize))
	if err != nil {
		return fmt.Errorf("parquet: could not read footer: %w", err)
	}
	// ...
```
All of this logic already exists in the file package of the Parquet module, so there is no need to reproduce it here. All you need is something that satisfies the parquet.ReaderAtSeeker interface, and everything else will be handled automatically.
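```go
type S3file struct {
	source.ParquetFile
}

// ReadAt seeks from the start (off is absolute, so io.SeekStart, not
// io.SeekCurrent) and fills p completely, as io.ReaderAt requires.
// Seek followed by Read is not safe for concurrent ReadAt calls.
func (rdr S3file) ReadAt(p []byte, off int64) (n int, err error) {
	if _, err := rdr.Seek(off, io.SeekStart); err != nil {
		return 0, err
	}
	return io.ReadFull(rdr, p)
}
```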
You could just pass a pointer to this to file.NewParquetReader and it would work. The rest of this file then becomes unnecessary, since it only duplicates existing code.
zeroshade left a comment
Ultimately, I don't think there's a need to add specific support for S3: anything that implements the io.ReaderAt and io.Seeker interfaces can already be used with the existing library, and several different S3 libraries can provide that interface.
Going to close this for now. Feel free to reopen and tag me if needed.
No description provided.