7z archives error unpacking #70968

@Delphin1

Description

Reading data packed in a 7z archive fails, both from S3 and from the local filesystem.

SELECT *, _file AS file
FROM file('tr_cyp_kosmos2023-12-25.7z', 'CSV')
LIMIT 100
SETTINGS
    format_csv_delimiter = ' ',
    input_format_csv_detect_header = false,
    input_format_csv_allow_whitespace_or_tab_as_delimiter = true;



Query id: ac28d786-85c0-4128-b67b-62c50d09a318


Elapsed: 0.163 sec.

Received exception from server (version 24.9.1):
Code: 27. DB::Exception: Received from host:9000. DB::Exception: Cannot parse input: expected ' ' before: 'q#e.\0\0\0\0\0�\0\0\0\0\0\0\0�\'j�����\r]\0\f�i�t[��]�Ƥ=�ko�)k�/T۟��G��L�N\fo�\baʿBn�w����{�����\0�mkr��Ps����?�ĸ���kZ:!���G�Ն�E,~ݽ�����v��\'�mzx[': (at row 1)
:
Row 1:
Column 0,   name: ticker_quote_time,         type: DateTime64(3),  parsed text: "7z��<SINGLE QUOTE><0x1C><ASCII NUL><0x04>w�"
ERROR: garbage after DateTime64(3): "q#e.<0x01><ASCII NUL><ASCII NUL><ASCII NUL><ASCII NUL><ASCII NUL>"

: (in file/uri /var/lib/clickhouse/user_files/ticks/tr_cyp_kosmos2023-12-25.7z): While executing ParallelParsingBlockInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)
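
The parsed text in the error above starts with `7z`, two non-printable bytes, a single quote and `0x1C`, which matches the 7z file signature (37 7A BC AF 27 1C). That suggests the CSV parser was given the raw archive bytes rather than the unpacked contents. Below is a minimal sketch for checking the first bytes of the local file. `is_7z` is a hypothetical helper name, and the sketch assumes `head -c`, `od` and `tr` are available.

```shell
# Hypothetical helper: succeeds if the file starts with the 7z signature
# 37 7A BC AF 27 1C ("7z" followed by four binary bytes).
is_7z() {
    # Dump the first 6 bytes as hex and strip spaces/newlines.
    sig=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "377abcaf271c" ]
}

# Usage (run from the directory that holds the archive):
#   is_7z tr_cyp_kosmos2023-12-25.7z && echo "7z archive" || echo "not a 7z archive"
```

A file that fails this check (for example, an HTML page saved by a download) is not a 7z archive at all. A file that passes was read by the query without being unpacked. The `file` table function documents a `path_to_archive :: path_inside_archive` form for reading a file inside an archive; whether that form handles this archive is not confirmed here.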


How to reproduce
cd /var/lib/clickhouse/user_files
wget https://github.com/Delphin1/clickhouse-7z-issue/raw/main/tr_cyp_kosmos2023-12-25.7z
Run the SELECT query above.

ClickHouse server version: 24.9.1.3278

Expected behavior
The archive is unpacked and the data is returned as a parsed table.

It looks like some codecs are not supported. The archive itself passes the 7-Zip integrity test:

7z t tr_cyp_kosmos2023-12-25.7z

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs LE)

Scanning the drive for archives:
1 file, 77575 bytes (76 KiB)

Testing archive: tr_cyp_kosmos2023-12-25.7z
--
Path = tr_cyp_kosmos2023-12-25.7z
Type = 7z
Physical Size = 77575
Headers Size = 162
Method = LZMA2:1536k
Solid = -
Blocks = 1

Everything is Ok

Size:       1284855
Compressed: 77575
