[C++][Parquet] Failure decoding sample dict-encoded file from parquet-compatibility project #42298

@asfimport

See the attached file. Reading it with parquet_reader throws an exception:

$ debug/parquet_reader nation.dict.parquet 
File statistics:
Version: 1
Created By: parquet-mr
Total rows: 25
Number of RowGroups: 1
Number of Real Columns: 4
Number of Columns: 4
Number of Selected Columns: 4
Column 0: nation_key (INT32)
Column 1: name (BYTE_ARRAY)
Column 2: region_key (INT32)
Column 3: comment_col (BYTE_ARRAY)
--- Row Group 0 ---
--- Total Bytes 0 ---
  rows: 25---
Column 0
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 125, compressed size: 125
Column 1
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 322, compressed size: 322
Column 2
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 125, compressed size: 125
Column 3
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 2002, compressed size: 2002
nation_key              name                    region_key              comment_col             
0                       Parquet error: Unexpected end of stream.

However, I checked that I can read this file with Impala:

In [13]: hdfs.put('/tmp/nation-dict-test/test.parq', 'nation.dict.parquet')
Out[13]: '/tmp/nation-dict-test/test.parq'

In [14]: pf = con.parquet_file('/tmp/nation-dict-test')

In [15]: pf.execute()
Out[15]: 
    nation_key            name  region_key  \
0            0         ALGERIA           0   
1            1       ARGENTINA           1   
2            2          BRAZIL           1   
3            3          CANADA           1   
4            4           EGYPT           4   
5            5        ETHIOPIA           0   
6            6          FRANCE           3   
7            7         GERMANY           3   
8            8           INDIA           2   
9            9       INDONESIA           2   
10          10            IRAN           4   
11          11            IRAQ           4   
12          12           JAPAN           2   
13          13          JORDAN           4   
14          14           KENYA           0   
15          15         MOROCCO           0   
16          16      MOZAMBIQUE           0   
17          17            PERU           1   
18          18           CHINA           2   
19          19         ROMANIA           3   
20          20    SAUDI ARABIA           4   
21          21         VIETNAM           2   
22          22          RUSSIA           3   
23          23  UNITED KINGDOM           3   
24          24   UNITED STATES           1   

                                          comment_col  
0    haggle. carefully final deposits detect slyly...  
1   al foxes promise slyly according to the regula...  
2   y alongside of the pending deposits. carefully...  
3   eas hang ironic, silent packages. slyly regula...  
4   y above the carefully unusual theodolites. fin...  
5                     ven packages wake quickly. regu  
6              refully final requests. regular, ironi  
7   l platelets. regular accounts x-ray: unusual, ...  
8   ss excuses cajole slyly across the packages. d...  
9    slyly express asymptotes. regular deposits ha...  
10  efully alongside of the slyly final dependenci...  
11  nic deposits boost atop the quickly final requ...  
12               ously. final, express gifts cajole a  
13  ic deposits are blithely about the carefully r...  
14   pending excuses haggle furiously deposits. pe...  
15  rns. blithely bold courts among the closely re...  
16      s. ironic, unusual asymptotes wake blithely r  
17  platelets. blithely pending dependencies use f...  
18  c dependencies. furiously express notornis sle...  
19  ular asymptotes are about the furious multipli...  
20  ts. silent requests haggle. closely express pa...  
21     hely enticingly express accounts. even, final   
22   requests against the platelets use never acco...  
23  eans boost carefully special requests. account...  
24  y final packages. slow foxes cajole quickly. q...  

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Related issues:

Original Issue Attachments:

Note: This issue was originally created as PARQUET-816. Please see the migration documentation for further details.
