panicked at 'called Option::unwrap() on a None value' Arrow Parquet writer with huge arrays #139

Closed
alamb opened this issue Apr 26, 2021 · 1 comment · Fixed by #3564
Labels
bug parquet Changes to the parquet crate

Comments


alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10920

I stumbled across this by chance. I am not too surprised that this fails, but I would expect it to fail gracefully and not with a segmentation fault.

 

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::StringBuilder;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::Result;
use arrow::record_batch::RecordBatch;

use parquet::arrow::ArrowWriter;

fn main() -> Result<()> {
    let schema = Schema::new(vec![
        Field::new("c0", DataType::Utf8, false),
        Field::new("c1", DataType::Utf8, true),
    ]);
    let batch_size = 2500000;
    let repeat_count = 140;
    let file = File::create("/tmp/test.parquet")?;
    let mut writer = ArrowWriter::try_new(file, Arc::new(schema.clone()), None).unwrap();
    let mut c0_builder = StringBuilder::new(batch_size);
    let mut c1_builder = StringBuilder::new(batch_size);

    println!("Start of loop");
    for i in 0..batch_size {
        let c0_value = format!("{:032}", i);
        let c1_value = c0_value.repeat(repeat_count);
        c0_builder.append_value(&c0_value)?;
        c1_builder.append_value(&c1_value)?;
    }

    println!("Finish building c0");
    let c0 = Arc::new(c0_builder.finish());

    println!("Finish building c1");
    let c1 = Arc::new(c1_builder.finish());

    println!("Creating RecordBatch");
    let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![c0, c1])?;

    // write the batch to parquet
    println!("Writing RecordBatch");
    writer.write(&batch).unwrap();

    println!("Closing writer");
    writer.close().unwrap();

    Ok(())
}
```

```
Start of loop
Finish building c0
Finish building c1
Creating RecordBatch
Writing RecordBatch
Segmentation fault (core dumped)
```
@alamb alamb added the arrow Changes to the arrow crate label Apr 26, 2021
@jorgecarleitao jorgecarleitao added bug parquet Changes to the parquet crate security and removed arrow Changes to the arrow crate labels Apr 26, 2021

alamb commented Oct 29, 2021

When I run this test program, I get an error (not super helpful, but not a segfault either):

```
warning: `rust_parquet` (bin "rust_parquet") generated 2 warnings
    Finished release [optimized + debuginfo] target(s) in 1m 22s
     Running `/Volumes/RAMDisk/df-target/release/rust_parquet`
Start of loop
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /Users/alamb/Software/arrow-rs/arrow/src/array/builder.rs:925:71
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
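
For a sense of scale, here is a rough, dependency-free sketch of how much string data the reproduction appends to each column. Arrow's `Utf8` type stores value offsets as 32-bit signed integers, so the total exceeding `i32::MAX` is a plausible trigger for the panic inside the builder; that link is an assumption, not something confirmed in this thread. The helper names `total_value_bytes`, `fits_in_i32_offsets`, and `max_values_before_overflow` are hypothetical and not part of the arrow crate.

```rust
/// Total bytes of string data appended to one column.
fn total_value_bytes(rows: u64, bytes_per_value: u64) -> u64 {
    rows * bytes_per_value
}

/// Whether a cumulative byte length is representable as an i32 offset.
fn fits_in_i32_offsets(total_bytes: u64) -> bool {
    total_bytes <= i32::MAX as u64
}

/// Number of equally sized values that fit before the offset would exceed i32::MAX.
fn max_values_before_overflow(bytes_per_value: u64) -> u64 {
    i32::MAX as u64 / bytes_per_value
}

fn main() {
    // Values taken from the reproduction above.
    let batch_size: u64 = 2_500_000;
    let repeat_count: u64 = 140;
    // format!("{:032}", i) is zero-padded to 32 bytes for every i in the loop.
    let c0_value_len: u64 = 32;
    let c1_value_len = c0_value_len * repeat_count;

    let c0_total = total_value_bytes(batch_size, c0_value_len);
    let c1_total = total_value_bytes(batch_size, c1_value_len);

    println!("c0: {} bytes, fits in i32 offsets: {}", c0_total, fits_in_i32_offsets(c0_total));
    println!("c1: {} bytes, fits in i32 offsets: {}", c1_total, fits_in_i32_offsets(c1_total));
    println!(
        "c1 values that fit before offsets exceed i32::MAX: {}",
        max_values_before_overflow(c1_value_len)
    );
}
```

Under these numbers `c1` needs about 11.2 GB of value data, roughly five times what 32-bit offsets can address, while `c0` stays well within range. That would be consistent with the program failing inside the loop, before "Finish building c0" is printed.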

@alamb alamb changed the title Segmentation fault in Arrow Parquet writer with huge arrays panicked at 'called Option::unwrap() on a None value' Arrow Parquet writer with huge arrays Oct 29, 2021
@alamb alamb removed the security label Oct 29, 2021