Closed
Description
See the attached file, nation.dict.parquet. Reading it with parquet_reader throws an exception:
$ debug/parquet_reader nation.dict.parquet
File statistics:
Version: 1
Created By: parquet-mr
Total rows: 25
Number of RowGroups: 1
Number of Real Columns: 4
Number of Columns: 4
Number of Selected Columns: 4
Column 0: nation_key (INT32)
Column 1: name (BYTE_ARRAY)
Column 2: region_key (INT32)
Column 3: comment_col (BYTE_ARRAY)
--- Row Group 0 ---
--- Total Bytes 0 ---
rows: 25---
Column 0
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 125, compressed size: 125
Column 1
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 322, compressed size: 322
Column 2
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 125, compressed size: 125
Column 3
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 2002, compressed size: 2002
nation_key name region_key comment_col
0 Parquet error: Unexpected end of stream.
However, I checked that I can read this file with Impala:
In [13]: hdfs.put('/tmp/nation-dict-test/test.parq', 'nation.dict.parquet')
Out[13]: '/tmp/nation-dict-test/test.parq'
In [14]: pf = con.parquet_file('/tmp/nation-dict-test')
In [15]: pf.execute()
Out[15]:
nation_key name region_key \
0 0 ALGERIA 0
1 1 ARGENTINA 1
2 2 BRAZIL 1
3 3 CANADA 1
4 4 EGYPT 4
5 5 ETHIOPIA 0
6 6 FRANCE 3
7 7 GERMANY 3
8 8 INDIA 2
9 9 INDONESIA 2
10 10 IRAN 4
11 11 IRAQ 4
12 12 JAPAN 2
13 13 JORDAN 4
14 14 KENYA 0
15 15 MOROCCO 0
16 16 MOZAMBIQUE 0
17 17 PERU 1
18 18 CHINA 2
19 19 ROMANIA 3
20 20 SAUDI ARABIA 4
21 21 VIETNAM 2
22 22 RUSSIA 3
23 23 UNITED KINGDOM 3
24 24 UNITED STATES 1
comment_col
0 haggle. carefully final deposits detect slyly...
1 al foxes promise slyly according to the regula...
2 y alongside of the pending deposits. carefully...
3 eas hang ironic, silent packages. slyly regula...
4 y above the carefully unusual theodolites. fin...
5 ven packages wake quickly. regu
6 refully final requests. regular, ironi
7 l platelets. regular accounts x-ray: unusual, ...
8 ss excuses cajole slyly across the packages. d...
9 slyly express asymptotes. regular deposits ha...
10 efully alongside of the slyly final dependenci...
11 nic deposits boost atop the quickly final requ...
12 ously. final, express gifts cajole a
13 ic deposits are blithely about the carefully r...
14 pending excuses haggle furiously deposits. pe...
15 rns. blithely bold courts among the closely re...
16 s. ironic, unusual asymptotes wake blithely r
17 platelets. blithely pending dependencies use f...
18 c dependencies. furiously express notornis sle...
19 ular asymptotes are about the furious multipli...
20 ts. silent requests haggle. closely express pa...
21 hely enticingly express accounts. even, final
22 requests against the platelets use never acco...
23 eans boost carefully special requests. account...
24 y final packages. slow foxes cajole quickly. q...
Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm
Related issues:
- Segfaults and encoding issues in Python Parquet reads (is related to)
Original Issue Attachments:
Note: This issue was originally created as PARQUET-816. Please see the migration documentation for further details.
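Since the reader reports "Unexpected end of stream" while Impala reads the same file, one quick way to rule out a file truncated in transit is to check its outer framing. The sketch below is not part of the original report. It relies only on the Parquet file layout (a 4-byte `PAR1` magic at both ends, with a 4-byte little-endian footer length just before the trailing magic). The function name `check_parquet_framing` and the result keys are hypothetical, chosen for illustration.

```python
import struct
import sys

MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


def check_parquet_framing(data: bytes) -> dict:
    """Check the outer framing of a Parquet file held in memory.

    Expected layout: MAGIC, column data, footer metadata,
    4-byte little-endian footer length, MAGIC.
    (Hypothetical helper, not part of parquet-cpp or Impala.)
    """
    result = {"header_ok": False, "trailer_ok": False,
              "footer_len": None, "footer_fits": False}
    if len(data) < 12:  # two magics plus the length field
        return result
    result["header_ok"] = data[:4] == MAGIC
    result["trailer_ok"] = data[-4:] == MAGIC
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    result["footer_len"] = footer_len
    # The footer must fit between the leading magic and the length field.
    result["footer_fits"] = footer_len <= len(data) - 12
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_framing.py nation.dict.parquet
    with open(sys.argv[1], "rb") as f:
        print(check_parquet_framing(f.read()))
```

If all checks pass, the file is at least not cut off at the end, which would point toward the reader's page or dictionary decoding instead. Passing does not prove that the file is valid, since the footer and page contents are not parsed here.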