Closed
Labels: enhancement (new feature or request), minor release (will be addressed in the next minor release)
Description
Describe the bug
wr.athena.read_sql_query fails on a "SELECT * FROM table" query, while wr.s3.read_parquet_table reads the same table without error (see screenshot). The query raises:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 3: invalid continuation byte
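To make the error message easier to read, here is a minimal standard-library sketch of the same class of failure. It does not reproduce the awswrangler code path. The `sample` bytes and the `describe_decode_error` helper are hypothetical, chosen only so that byte 0xd7 sits at position 3 and is followed by a byte that is not a valid UTF-8 continuation byte.

```python
def describe_decode_error(raw: bytes, encoding: str = "utf-8") -> str:
    """Return a short description of the first decoding failure, or 'ok'."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start/exc.end delimit the offending bytes; exc.reason is the codec's explanation.
        bad = raw[exc.start:exc.end]
        return f"byte 0x{bad.hex()} at position {exc.start}: {exc.reason}"
    return "ok"


# Hypothetical payload: 0xd7 opens a two-byte UTF-8 sequence,
# but the next byte ("d") is plain ASCII rather than a continuation byte.
sample = b"abc\xd7d"
print(describe_decode_error(sample))

# The same bytes decode cleanly as Latin-1, where 0xd7 is the multiplication sign.
print(sample.decode("latin-1"))
```

An error of this shape usually means that bytes in some other encoding (or binary data) are being decoded as UTF-8 text. Whether that is what happens inside wr.athena.read_sql_query here is not established by this report.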
The table was originally created using wr.s3.to_parquet:
wr.s3.to_parquet(
    df=df,
    path='s3://xxxxxxxxx/external/ph3a/sample',
    dataset=True,
    database='default',  # Athena/Glue database
    table='ph3a_sample',  # Athena/Glue table
    dtype={c: "string" for c in null_columns},
    mode='overwrite'
)
To Reproduce
Not sure yet. I can't share the file I'm using because it contains PII. I will try to reproduce the error with random data.
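One possible starting point for a PII-free reproduction is sketched below, using only the standard library. The `random_rows` helper, the column names, and the choice of non-ASCII characters are all assumptions for illustration. It has not been confirmed that non-ASCII text is what triggers the error.

```python
import random
import string


def random_rows(n_rows, n_cols=3, seed=0):
    """Build a list of dicts of random strings that mix ASCII and non-ASCII characters."""
    rng = random.Random(seed)  # seeded so the same data can be regenerated later
    # Assumption: a few non-ASCII characters are enough to exercise encoding paths.
    alphabet = string.ascii_letters + "×éüñ"
    columns = [f"col_{i}" for i in range(n_cols)]
    rows = []
    for _ in range(n_rows):
        rows.append({c: "".join(rng.choice(alphabet) for _ in range(8)) for c in columns})
    return rows


rows = random_rows(100)
print(rows[0])
```

The resulting list could be turned into a DataFrame with pandas.DataFrame(rows) and written with the same wr.s3.to_parquet call as in the report, pointing at a scratch path and table.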
I ran the wr.s3.to_parquet and wr.athena.read_sql_query calls in a notebook using this Jupyter Docker image: https://hub.docker.com/r/jupyter/pyspark-notebook/
! python -V
Python 3.7.6
! pip freeze
alembic==1.4.2
async-generator==1.10
athenacli==1.2.0
attrs==19.3.0
awscli==1.18.44
awswrangler==1.0.4
backcall==0.1.0
beautifulsoup4==4.8.2
bleach==3.1.4
blinker==1.4
bokeh==2.0.1
boto3==1.12.44
botocore==1.15.44
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
cli-helpers==1.2.1
click==7.1.1
cloudpickle==1.3.0
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
configobj==5.0.6
cryptography==2.8
cycler==0.10.0
Cython==0.29.16
cytoolz==0.10.1
dask==2.14.0
decorator==4.4.2
defusedxml==0.6.0
dill==0.3.1.1
distributed==2.14.0
docutils==0.15.2
entrypoints==0.3
fastcache==1.1.0
fsspec==0.7.2
future==0.18.2
gmpy2==2.1.0b1
h5py==2.10.0
HeapDict==1.0.1
idna==2.9
imageio==2.8.0
importlib-metadata==1.6.0
ipykernel==5.2.0
ipympl==0.5.6
ipython==7.13.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.0
Jinja2==2.11.2
jmespath==0.9.5
joblib==0.14.1
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.0.1
jupyterlab-server==1.1.1
jupyters3==0.0.43
kiwisolver==1.2.0
llvmlite==0.31.0
locket==0.2.0
Mako==1.1.0
MarkupSafe==1.1.1
matplotlib==3.2.1
mistune==0.8.4
mpmath==1.1.0
msgpack==1.0.0
nbconvert==5.6.1
nbformat==5.0.6
networkx==2.4
notebook==6.0.3
numba==0.48.0
numexpr==2.7.1
numpy==1.18.1
oauthlib==3.0.1
olefile==0.46
packaging==20.1
pamela==1.0.0
pandas==1.0.3
pandocfilters==1.4.2
parso==0.7.0
partd==1.1.0
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.1
prometheus-client==0.7.1
prompt-toolkit==2.0.10
protobuf==3.11.4
psutil==5.7.0
psycopg2-binary==2.8.5
ptyprocess==0.6.0
pyarrow==0.16.0
pyasn1==0.4.8
PyAthena==1.10.4
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
PyMySQL==0.9.3
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyrsistent==0.16.0
PySocks==1.7.1
pyspark==2.4.5
python-dateutil==2.8.1
python-dotenv==0.13.0
python-editor==1.0.4
python-json-logger==0.1.11
pytz==2019.3
PyWavelets==1.1.1
PyYAML==5.3.1
pyzmq==19.0.0
requests==2.23.0
rsa==3.4.2
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
s3fs==0.4.2
s3transfer==0.3.3
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
seaborn==0.10.0
Send2Trash==1.5.0
six==1.14.0
sortedcontainers==2.1.0
soupsieve==1.9.4
SQLAlchemy==1.3.13
sqlalchemy-redshift==0.7.7
sqlparse==0.3.1
statsmodels==0.11.1
sympy==1.5.1
tabulate==0.8.7
tblib==1.6.0
tenacity==6.1.0
terminado==0.8.3
terminaltables==3.1.0
testpath==0.4.4
toolz==0.10.0
tornado==6.0.4
tqdm==4.45.0
traitlets==4.3.3
typing-extensions==3.7.4.1
urllib3==1.25.9
vincent==0.4.4
wcwidth==0.1.9
webencodings==0.5.1
widgetsnbextension==3.5.1
xlrd==1.2.0
zict==2.0.0
zipp==3.1.0
