Input value ... of a column ... is out of allowed Date32 range, which is [25567, 120530]: While executing ParquetBlockInputFormat #51402

Closed
filimonov opened this issue Jun 26, 2023 · 2 comments · Fixed by #55696
Labels
comp-formats Input / output formats feature

Comments

@filimonov
Contributor

from datetime import date
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Column 'b' holds datetime.date objects; the last one is far outside the Date32 range.
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [date.fromisoformat('1900-10-01'),
          date.fromisoformat('2019-12-04'),
          date.fromisoformat('9999-12-31')],
})
table = pa.Table.from_pandas(df)
pq.write_table(table, 'date.parquet')
clickhouse-local --stacktrace
ClickHouse local version 23.5.1.988 (official build).

laptop-5591 :) select * from file('date.parquet','Parquet','a int, b Date32')

SELECT *
FROM file('date.parquet', 'Parquet', 'a int, b Date32')

Query id: 121daf93-4a9e-4d9e-a3ca-8888021fcacf


0 rows in set. Elapsed: 0.181 sec. 

Received exception:
Code: 321. DB::Exception: Input value 2932896 of a column "b" is out of allowed Date32 range, which is [25567, 120530]: While executing ParquetBlockInputFormat: While executing File. (VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE), Stack trace (when copying this message, always include the lines below):

0. ./build_docker/./src/Common/Exception.cpp:92: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000e144795 in /usr/bin/clickhouse
1. ./build_docker/./contrib/llvm-project/libcxx/include/string:1499: DB::Exception::Exception<int&, String const&, int, int>(int, FormatStringHelperImpl<std::type_identity<int&>::type, std::type_identity<String const&>::type, std::type_identity<int>::type, std::type_identity<int>::type>, int&, String const&, int&&, int&&) @ 0x0000000014b4b927 in /usr/bin/clickhouse
2. ./build_docker/./src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp:283: DB::readColumnFromArrowColumn(std::shared_ptr<arrow::ChunkedArray>&, String const&, String const&, bool, std::unordered_map<String, DB::ArrowColumnToCHColumn::DictionaryInfo, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, DB::ArrowColumnToCHColumn::DictionaryInfo>>>&, bool, bool, bool&, std::shared_ptr<DB::IDataType const>, bool) @ 0x0000000014b41b5b in /usr/bin/clickhouse
3. ./build_docker/./contrib/boost/boost/smart_ptr/intrusive_ptr.hpp:115: DB::ArrowColumnToCHColumn::arrowColumnsToCHChunk(DB::Chunk&, std::unordered_map<String, std::shared_ptr<arrow::ChunkedArray>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::shared_ptr<arrow::ChunkedArray>>>>&, unsigned long, DB::BlockMissingValues*) @ 0x0000000014b447e0 in /usr/bin/clickhouse
4. ./build_docker/./contrib/llvm-project/libcxx/include/__hash_table:1473: DB::ArrowColumnToCHColumn::arrowTableToCHChunk(DB::Chunk&, std::shared_ptr<arrow::Table>&, unsigned long, DB::BlockMissingValues*) @ 0x0000000014b440ae in /usr/bin/clickhouse
5. ./build_docker/./contrib/llvm-project/libcxx/include/__mutex_base:203: DB::ParquetBlockInputFormat::decodeOneChunk(unsigned long, std::unique_lock<std::mutex>&) @ 0x0000000014c37572 in /usr/bin/clickhouse
6. ./build_docker/./src/Processors/Formats/Impl/ParquetBlockInputFormat.cpp:212: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::ParquetBlockInputFormat::scheduleRowGroup(unsigned long)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x0000000014c3ac65 in /usr/bin/clickhouse
7. ./build_docker/./base/base/../base/wide_integer_impl.h:796: ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000e21cd07 in /usr/bin/clickhouse
8. ./build_docker/./src/Common/ThreadPool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, long, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000e21f535 in /usr/bin/clickhouse
9. ./build_docker/./base/base/../base/wide_integer_impl.h:796: ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x000000000e218b34 in /usr/bin/clickhouse
10. ./build_docker/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:302: void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, long, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000e21e3a1 in /usr/bin/clickhouse
11. ? @ 0x00007f327f694b43 in ?
12. ? @ 0x00007f327f726a00 in ?
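
For reference, the numbers in the error message can be reproduced with plain date arithmetic. The sketch below is not part of the original report; it assumes the Parquet column stores dates as days since 1970-01-01 (the Arrow date32 encoding), and `days_since_epoch` is a hypothetical helper name. Under that assumption, 2932896 is 9999-12-31, and 25567 is the number of days between 1900-01-01 and 1970-01-01.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Return the signed number of days between 1970-01-01 and d."""
    return (d - EPOCH).days


# The three dates written to date.parquet in the reproduction script.
for text in ('1900-10-01', '2019-12-04', '9999-12-31'):
    print(text, days_since_epoch(date.fromisoformat(text)))

# Distance between 1900-01-01 and the epoch, which appears in the error message.
print('1900-01-01', days_since_epoch(date(1900, 1, 1)))
```

The last date in the script is the one that produces the value quoted in the exception.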

Similar to #39249

Maybe we should introduce a setting? (throw / overflow silently / clamp to the minimum or maximum of the supported range)
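
The three options could behave roughly as in the sketch below. This is only an illustration in Python, not ClickHouse code: the function name `apply_overflow_policy` and the policy names `"throw"`, `"ignore"` and `"saturate"` are hypothetical, the bounds assume the documented Date32 range of 1900-01-01 to 2299-12-31, and "silent overflow" is modeled simply as passing the value through unchecked.

```python
from datetime import date

EPOCH = date(1970, 1, 1)
# Assumed Date32 bounds, expressed as days since 1970-01-01.
MIN_DAYS = (date(1900, 1, 1) - EPOCH).days
MAX_DAYS = (date(2299, 12, 31) - EPOCH).days


def apply_overflow_policy(days: int, policy: str) -> int:
    """Handle a day count according to a hypothetical overflow policy."""
    if MIN_DAYS <= days <= MAX_DAYS:
        return days  # in range: every policy returns the value unchanged
    if policy == "throw":
        raise ValueError(
            f"Input value {days} is out of range [{MIN_DAYS}, {MAX_DAYS}]"
        )
    if policy == "saturate":
        # Clamp to the nearest bound of the supported range.
        return max(MIN_DAYS, min(days, MAX_DAYS))
    if policy == "ignore":
        # Modeled here as "do not check"; the stored result is then undefined.
        return days
    raise ValueError(f"unknown policy: {policy}")


print(apply_overflow_policy(2932896, "saturate"))
```

With `"saturate"`, the 9999-12-31 row from the example would load as the last supported date instead of failing the whole query.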

@jonashaag

Same problem here.

@zvonand
Contributor

zvonand commented Oct 27, 2023

This behavior was left as the default (to avoid breaking anyone's existing workflow).
#55696 added a setting that also controls the behavior in this place.
