I was trying to read a subset of my Parquet files using the ParquetDataset object with a predefined schema. When it tries to validate the schema, to_arrow_schema is called, and the schema does not support this. I don't know what is happening; this is a sample.
If we check the type of the schema as defined above we get:
type(schema)
pyarrow.lib.Schema
But the required type according to the docs is pyarrow.parquet.Schema. I don't know how to produce an object of this type, since we are forbidden from using the Schema constructor directly.
If we check the implementation on GitHub, we land directly on this line:
dataset_schema = self.schema.to_arrow_schema()
Is this a problem in the schema builder or the parquet dataset object?
Wes McKinney / @wesm:
This isn't a bug. The intention of the schema parameter is to pass a Parquet schema object obtained from the metadata of a particular file.
Krisztian Szucs / @kszucs:
It is not a regression, and the docstring indicates that a ParquetSchema must be passed, so I wouldn't consider it a blocker.
Environment:
_libgcc_mutex 0.1 main
arrow-cpp 0.15.1 py37h982ac2c_6 conda-forge
attrs 19.3.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
bleach 3.1.0 py_0 conda-forge
boost-cpp 1.70.0 h8e57a91_2 conda-forge
brotli 1.0.7 he1b5a44_1000 conda-forge
bzip2 1.0.8 h516909a_2 conda-forge
c-ares 1.15.0 h516909a_1001 conda-forge
ca-certificates 2019.11.28 hecc5488_0 conda-forge
certifi 2019.11.28 py37_0 conda-forge
decorator 4.4.1 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
double-conversion 3.1.5 he1b5a44_2 conda-forge
entrypoints 0.3 py37_1000 conda-forge
gflags 2.2.2 he1b5a44_1002 conda-forge
glog 0.4.0 he1b5a44_1 conda-forge
grpc-cpp 1.25.0 h213be95_2 conda-forge
icu 64.2 he1b5a44_1 conda-forge
importlib_metadata 1.4.0 py37_0 conda-forge
inflect 4.0.0 py37_1 conda-forge
ipykernel 5.1.4 py37h5ca1d4c_0 conda-forge
ipython 7.11.1 py37h5ca1d4c_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jaraco.itertools 5.0.0 py_0 conda-forge
jedi 0.16.0 py37_0 conda-forge
jinja2 2.10.3 py_0 conda-forge
jsonschema 3.2.0 py37_0 conda-forge
jupyter_client 5.3.4 py37_1 conda-forge
jupyter_core 4.6.1 py37_0 conda-forge
ld_impl_linux-64 2.33.1 h53a641e_7
libblas 3.8.0 14_openblas conda-forge
libcblas 3.8.0 14_openblas conda-forge
libedit 3.1.20181209 hc058e9b_0
libevent 2.1.10 h72c5cf5_0 conda-forge
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_4 conda-forge
liblapack 3.8.0 14_openblas conda-forge
libopenblas 0.3.7 h5ec1e0e_6 conda-forge
libprotobuf 3.11.0 h8b12597_0 conda-forge
libsodium 1.0.17 h516909a_0 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
lz4-c 1.8.3 he1b5a44_1001 conda-forge
markupsafe 1.1.1 py37h516909a_0 conda-forge
mistune 0.8.4 py37h516909a_1000 conda-forge
more-itertools 8.1.0 py_0 conda-forge
nbconvert 5.6.1 py37_0 conda-forge
nbformat 5.0.4 py_0 conda-forge
ncurses 6.1 he6710b0_1
notebook 6.0.3 py37_0 conda-forge
numpy 1.17.5 py37h95a1406_0 conda-forge
openssl 1.1.1d h516909a_0 conda-forge
pandas 0.25.3 py37hb3f55d8_0 conda-forge
pandoc 2.9.1.1 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.6.0 py_0 conda-forge
pexpect 4.8.0 py37_0 conda-forge
pickleshare 0.7.5 py37_1000 conda-forge
pip 20.0.2 py37_0
prometheus_client 0.7.1 py_0 conda-forge
prompt_toolkit 3.0.2 py_0 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
pyarrow 0.15.1 py37h8b68381_1 conda-forge
pygments 2.5.2 py_0 conda-forge
pyrsistent 0.15.7 py37h516909a_0 conda-forge
python 3.7.6 h0371630_2
python-dateutil 2.8.1 py_0 conda-forge
pytz 2019.3 py_0 conda-forge
pyzmq 18.1.1 py37h1768529_0 conda-forge
re2 2020.01.01 he1b5a44_0 conda-forge
readline 7.0 h7b6447c_5
send2trash 1.5.0 py_0 conda-forge
setuptools 45.1.0 py37_0
six 1.14.0 py37_0 conda-forge
snappy 1.1.7 he1b5a44_1003 conda-forge
sqlite 3.30.1 h7b6447c_0
terminado 0.8.3 py37_0 conda-forge
testpath 0.4.4 py_0 conda-forge
thrift-cpp 0.12.0 hf3afdfd_1004 conda-forge
tk 8.6.8 hbc83047_0
tornado 6.0.3 py37h516909a_0 conda-forge
traitlets 4.3.3 py37_0 conda-forge
uriparser 0.9.3 he1b5a44_1 conda-forge
wcwidth 0.1.8 py_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.33.6 py37_0
xz 5.2.4 h14c3975_4
zeromq 4.3.2 he1b5a44_2 conda-forge
zipp 2.1.0 py_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.4.4 h3b9ef0a_1 conda-forge
Reporter: Otávio Vasques
Note: This issue was originally created as ARROW-7727. Please see the migration documentation for further details.