Added support for S3 reader for Parquet format in Go. #14683

writoblocknaut · 2022-11-21T00:59:15Z

No description provided.

github-actions · 2022-11-21T00:59:34Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

zeroshade · 2022-11-28T15:38:14Z

go/parquet/s3/s3_reader.go

+func (f *Reader) parseMetaData() error {
+	if f.footerOffset <= int64(footerSize) {
+		return fmt.Errorf("parquet: file too small (size=%d)", f.footerOffset)
+	}
+
+	buf := make([]byte, footerSize)
+	// backup 8 bytes to read the footer size (first four bytes) and the magic bytes (last 4 bytes)
+	n, err := f.r.ReadAt(buf, f.footerOffset-int64(footerSize))
+	if err != nil {
+		return fmt.Errorf("parquet: could not read footer: %w", err)
+	}


All of this logic already exists in the file package of the Parquet module and is unnecessary to reproduce. All you need to do is have something that meets the parquet.ReaderAtSeeker interface and everything else will be handled automatically.
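As an illustration of the point above, here is a minimal sketch of a type that satisfies the io.ReaderAt + io.Seeker combination using only the standard library. The names `readerAtSeeker`, `rangeFetcher`, `remoteObject`, and `newInMemoryObject` are hypothetical and not part of the Parquet module; `rangeFetcher` stands in for whatever ranged read an object-store client offers, and the in-memory fetcher exists only so the sketch can run without a network.

```go
package main

import (
	"errors"
	"io"
)

// readerAtSeeker mirrors the interface combination named in the review
// (io.ReaderAt + io.Seeker). It is declared locally for illustration.
type readerAtSeeker interface {
	io.ReaderAt
	io.Seeker
}

// rangeFetcher is a hypothetical stand-in for a ranged read against an
// object store: it returns length bytes starting at off.
type rangeFetcher func(off, length int64) ([]byte, error)

// remoteObject adapts a rangeFetcher to readerAtSeeker.
type remoteObject struct {
	fetch rangeFetcher
	size  int64 // total object size, known up front in this sketch
	pos   int64 // cursor maintained only for Seek
}

// ReadAt reads len(p) bytes at absolute offset off without touching pos.
func (o *remoteObject) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("remoteObject: negative offset")
	}
	if off >= o.size {
		return 0, io.EOF
	}
	want := int64(len(p))
	if off+want > o.size {
		want = o.size - off // clamp the request to the end of the object
	}
	data, err := o.fetch(off, want)
	if err != nil {
		return 0, err
	}
	n := copy(p, data)
	if int64(n) < want {
		return n, io.ErrUnexpectedEOF // backend returned fewer bytes than asked
	}
	if n < len(p) {
		return n, io.EOF // request ran past the end of the object
	}
	return n, nil
}

// Seek only updates the local cursor; no remote call is needed.
func (o *remoteObject) Seek(offset int64, whence int) (int64, error) {
	var abs int64
	switch whence {
	case io.SeekStart:
		abs = offset
	case io.SeekCurrent:
		abs = o.pos + offset
	case io.SeekEnd:
		abs = o.size + offset
	default:
		return 0, errors.New("remoteObject: invalid whence")
	}
	if abs < 0 {
		return 0, errors.New("remoteObject: negative position")
	}
	o.pos = abs
	return abs, nil
}

// Compile-time check that the adapter satisfies the interface.
var _ readerAtSeeker = (*remoteObject)(nil)

// newInMemoryObject backs the adapter with a byte slice so the sketch
// can be exercised locally.
func newInMemoryObject(data []byte) *remoteObject {
	fetch := func(off, length int64) ([]byte, error) {
		return data[off : off+length], nil
	}
	return &remoteObject{fetch: fetch, size: int64(len(data))}
}
```

A real adapter would replace the in-memory fetcher with a ranged request from the S3 client of choice and obtain `size` from the object's metadata.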

zeroshade · 2022-11-28T15:39:58Z

go/parquet/s3/s3_reader.go

+type S3file struct {
+	source.ParquetFile
+}
+
+func (rdr S3file) ReadAt(p []byte, off int64) (n int, err error) {
+	rdr.Seek(off, io.SeekCurrent)
+	return rdr.Read(p)
+}


You could likely just pass a pointer to this to file.OpenParquetFile and it would work. The rest of this file is then unnecessary and just duplicates existing code.
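For reference, a minimal sketch of a ReadAt built on top of a Read + Seek source, assuming the embedded type provides both. It uses io.ReadSeeker as a stand-in for that type; `seekingReaderAt` and `newFromString` are hypothetical names. Unlike the snippet above, it seeks from io.SeekStart (ReadAt offsets are absolute), uses a pointer receiver, and restores the cursor afterwards. If I recall the module correctly, file.NewParquetReader is the constructor that accepts such a reader while file.OpenParquetFile takes a filename, but treat that as an assumption to check against the package documentation.

```go
package main

import (
	"io"
	"strings"
	"sync"
)

// seekingReaderAt adapts any io.ReadSeeker to io.ReaderAt.
type seekingReaderAt struct {
	mu  sync.Mutex // serializes access because Seek+Read share one cursor
	src io.ReadSeeker
}

// ReadAt reads len(p) bytes starting at absolute offset off.
func (r *seekingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	// Remember the current cursor so it can be restored afterwards.
	cur, err := r.src.Seek(0, io.SeekCurrent)
	if err != nil {
		return 0, err
	}
	defer r.src.Seek(cur, io.SeekStart) // best effort; error ignored in this sketch

	// off is measured from the start of the file, so seek absolutely.
	if _, err := r.src.Seek(off, io.SeekStart); err != nil {
		return 0, err
	}

	// A single Read may return fewer bytes than requested; ReadFull loops.
	n, err := io.ReadFull(r.src, p)
	if err == io.ErrUnexpectedEOF {
		err = io.EOF // ReaderAt convention for a short read at end of input
	}
	return n, err
}

// newFromString wraps an in-memory string so the sketch can be run locally.
func newFromString(s string) *seekingReaderAt {
	return &seekingReaderAt{src: strings.NewReader(s)}
}
```

The mutex matters because callers of io.ReaderAt may issue parallel reads, while the underlying source has a single shared position.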

zeroshade

Ultimately, I don't think there's a need to add specific support for S3, as anything that supports the io.ReaderAt + io.Seeker interfaces can be used with the existing library, and there are multiple different S3 libraries which can provide that interface.
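To show why those two interfaces are enough, here is a small sketch of generic code that locates the Parquet footer through them: Seek finds the file size, and ReadAt fetches the last 8 bytes (a 4-byte little-endian metadata length followed by the "PAR1" magic). The library already does this internally; `footerLength` and `fakeParquetTail` are hypothetical names used only for illustration, and encrypted-footer files are not handled.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// readerAtSeeker is declared locally; any local file, in-memory buffer,
// or object-store adapter with these two methods can be passed in.
type readerAtSeeker interface {
	io.ReaderAt
	io.Seeker
}

// tailSize is the 4-byte metadata length plus the 4-byte magic.
const tailSize = 8

// footerLength returns the metadata length recorded in the file tail.
func footerLength(r readerAtSeeker) (uint32, error) {
	// Seeking to the end is how the total size is discovered.
	size, err := r.Seek(0, io.SeekEnd)
	if err != nil {
		return 0, fmt.Errorf("seek to end: %w", err)
	}
	if size < tailSize {
		return 0, fmt.Errorf("file too small (size=%d)", size)
	}

	tail := make([]byte, tailSize)
	// ReadAt may report io.EOF alongside a full read, so check n, not err.
	n, err := r.ReadAt(tail, size-tailSize)
	if n < tailSize {
		return 0, fmt.Errorf("short read of file tail: %w", err)
	}
	if !bytes.Equal(tail[4:], []byte("PAR1")) {
		return 0, errors.New("missing PAR1 magic at end of file")
	}
	return binary.LittleEndian.Uint32(tail[:4]), nil
}

// fakeParquetTail builds an in-memory buffer whose last 8 bytes look like
// a Parquet tail. The body is zero padding, not real Parquet data.
func fakeParquetTail(metaLen uint32) *bytes.Reader {
	buf := []byte("PAR1")
	buf = append(buf, make([]byte, 16)...)
	var lenBytes [4]byte
	binary.LittleEndian.PutUint32(lenBytes[:], metaLen)
	buf = append(buf, lenBytes[:]...)
	buf = append(buf, "PAR1"...)
	return bytes.NewReader(buf)
}
```

Nothing in `footerLength` knows where the bytes come from, which is the reason a storage-specific reader package is not needed.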

zeroshade · 2022-12-06T21:04:04Z

Going to close this for now. Feel free to reopen and tag me if needed.

Added support for S3 reader for Parquet format in Go.

7bee4e3

github-actions bot added the Component: Go label Nov 21, 2022

zeroshade reviewed Nov 28, 2022

zeroshade closed this Dec 6, 2022
