Skip to content

Conversation

@writoblocknaut
Copy link

No description provided.

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

Comment on lines +166 to +176
func (f *Reader) parseMetaData() error {
if f.footerOffset <= int64(footerSize) {
return fmt.Errorf("parquet: file too small (size=%d)", f.footerOffset)
}

buf := make([]byte, footerSize)
// backup 8 bytes to read the footer size (first four bytes) and the magic bytes (last 4 bytes)
n, err := f.r.ReadAt(buf, f.footerOffset-int64(footerSize))
if err != nil {
return fmt.Errorf("parquet: could not read footer: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this logic already exists in the file package of the Parquet module and is unnecessary to reproduce. All you need to do is have something that meets the parquet.ReaderAtSeeker interface and everything else will be handled automatically.

Comment on lines +47 to +54
type S3file struct {
source.ParquetFile
}

func (rdr S3file) ReadAt(p []byte, off int64) (n int, err error) {
rdr.Seek(off, io.SeekCurrent)
return rdr.Read(p)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could like just pass a pointer to this to file.OpenParquetFile and it would work. The rest of this file is then unnecessary and just copying code.

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, I don't think there's a need to add a specific support for S3 as anything that can support the io.ReaderAt + io.Seeker interfaces can be used with the existing library and there are multiple different S3 libraries which can provide that interface.

@zeroshade
Copy link
Member

Going to close this for now, feel free to reopen and tag me if needed

@zeroshade zeroshade closed this Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants