Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

close function instead of mutable reference #1969

Closed
liyongjing opened this issue Jun 30, 2022 · 4 comments
Closed

close function instead of mutable reference #1969

liyongjing opened this issue Jun 30, 2022 · 4 comments
Labels
parquet Changes to the parquet crate question Further information is requested

Comments

@liyongjing
Copy link

Describe the bug
A clear and concise description of what the bug is.

use super::builder;
use arrow::record_batch::RecordBatch;
use parquet::{
    arrow::ArrowWriter,
    basic::Compression,
    file::properties::{WriterProperties, WriterVersion},
};
use std::{fs::File, path::Path};

pub struct Writer {
    writer: ArrowWriter<File>,
}

impl Writer {
    pub fn new<P: AsRef<Path>>(path: P) -> Self {
        let schema = builder::get_schema();
        let file = File::create(&path).unwrap();
        let props = WriterProperties::builder().build();
        let writer = ArrowWriter::try_new(file, schema, Some(props)).unwrap();
        Self { writer }
    }

    pub fn write(&mut self, batch: &RecordBatch) {
        self.writer.write(batch).expect("Writing batch");
    }

    pub fn close(&mut self) {
        // writer must be closed to write footer
        self.writer.close().unwrap();
    }
}

To Reproduce
Steps to reproduce the behavior:

   |         self.writer.close().unwrap();
   |         ^^^^^^^^^^^ move occurs because `self.writer` has type `ArrowWriter<std::fs::File>`, which does not implement the `Copy` trait

Expected behavior
A clear and concise description of what you expected to happen.

write and close use the same parameters

parquet-17.0.0/src/arrow/arrow_writer/mod.rs 222 line

    /// Close and finalize the underlying Parquet writer
    pub fn close(&mut self) -> Result<parquet_format::FileMetaData> {
        self.flush()?;
        self.writer.close()
    }

parquet-17.0.0/src/file/writer.rs 167 line

    pub fn close(&mut self) -> Result<parquet::FileMetaData> {
        self.assert_previous_writer_closed()?;
        let metadata = self.write_metadata()?;
        Ok(metadata)
    }

Additional context
Add any other context about the problem here.

@liyongjing liyongjing added the bug label Jun 30, 2022
@tustvold
Copy link
Contributor

This was an intentional change, to truncate the lifetime of the writer on calling close. See this thread for more context #1719 (comment)

The change is more obviously beneficial on the ColumnWriter, etc... but you can also see it on ArrowWriter. Consider

let file = File::create(&path).unwrap();
let writer = ArrowWriter::try_new(&mut file, ...);

...

writer.close();

file.rewind().unwrap();

Without this change you need to manually drop the writer in order to reuse the file. We also benefit from the borrow checker making sure we can't call close multiple times.

If you are integrating with some system that requires &mut self for some reason, Option::take should allow you to get the previous semantics.

@tustvold tustvold added question Further information is requested and removed bug labels Jun 30, 2022
@liyongjing
Copy link
Author

I call write multiple times, call close one times using static HashMap.

fn global() -> &'static Mutex<HashMap<String, Writer>> {
    static INSTANCE: OnceCell<Mutex<HashMap<String, Writer>>> = OnceCell::new();
    INSTANCE.get_or_init(|| Mutex::new(HashMap::new()))
}
pub fn write(path: &str, val: Vec<u8>) {
    let mut g = global_channel().lock().unwrap();
    let w = g.entry(path.to_string()).or_insert(Writer::new(path));
    let batch =  ...;
    w.write(&batch);
}
pub fn close(path: &str) {
    let mut g = global_channel().lock().unwrap();
    if let Some(v) = g.get_mut(path) {
        // *v don't copy
        v.close();
    }
    g.remove(path);
}

@tustvold
Copy link
Contributor

tustvold commented Jul 1, 2022

How about?

pub fn close(path: &str) -> Result<()> {
    let mut g = global_channel().lock().unwrap()
    if let Some(v) = g.remove(path) {
        v.close()?;
    }
}

@liyongjing
Copy link
Author

thanks for you comment

@alamb alamb added the parquet Changes to the parquet crate label Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants