
UnicodeDecodeError with wr.athena.read_sql_query #201

@pmleveque


Describe the bug

wr.athena.read_sql_query fails when running "SELECT * FROM table", while wr.s3.read_parquet_table works fine on the same table (see screenshot).

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 3: invalid continuation byte

[Screenshot]
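The message means Python's UTF-8 decoder met byte 0xd7 at offset 3 of some value, and the byte after it was not a valid UTF-8 continuation byte. The minimal stdlib sketch below reproduces the same message and locates such values. It assumes, without confirmation from this report, that some raw value in the data is not valid UTF-8 (for example text encoded as Latin-1, where 0xd7 is the "×" character). The helper `find_non_utf8` is hypothetical, not part of awswrangler.

```python
from typing import Iterable, List, Tuple


def find_non_utf8(values: Iterable[bytes]) -> List[Tuple[int, int, str]]:
    """Return (index, byte offset, reason) for each value that is not valid UTF-8."""
    bad = []
    for i, raw in enumerate(values):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the offending byte; exc.reason is the codec's explanation
            bad.append((i, exc.start, exc.reason))
    return bad


# "×" is the single byte 0xd7 in Latin-1, but UTF-8 reads 0xd7 as the start of a
# two-byte sequence, so the next byte ("d", 0x64) is rejected as a continuation byte.
sample = ["abc×d".encode("latin-1"), "abc×d".encode("utf-8"), b"plain"]

try:
    sample[0].decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xd7 in position 3: invalid continuation byte

print(find_non_utf8(sample))  # [(0, 3, 'invalid continuation byte')]
```

The same value encoded as UTF-8 (the second sample) decodes cleanly, so only the first entry is reported.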

The table was originally created with wr.s3.to_parquet:

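import awswrangler as wr

# df (a pandas DataFrame) and null_columns (a list of column names) are defined elsewhere; not shown here
wr.s3.to_parquet(
    df=df,
    path='s3://xxxxxxxxx/external/ph3a/sample',
    dataset=True,
    database='default',  # Athena/Glue database
    table='ph3a_sample',  # Athena/Glue table
    dtype={c: "string" for c in null_columns},  # explicit Athena type for these columns
    mode='overwrite'
)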
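The `dtype` argument above gives `null_columns` an explicit Athena type, presumably because those columns contain only nulls and their type cannot be inferred from the data. The report does not show how `null_columns` was built, so here is a minimal stdlib sketch under the assumption that rows are a list of dicts; `find_null_columns` is a hypothetical helper. With pandas, `df.columns[df.isnull().all()]` would play the same role.

```python
from typing import Any, Dict, List


def find_null_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the columns whose value is None in every row (hypothetical helper)."""
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # preserve first-seen column order
    # A column missing from a row counts as null in that row.
    return [c for c in columns if all(row.get(c) is None for row in rows)]


rows = [
    {"id": 1, "name": "Ana", "phone": None},
    {"id": 2, "name": "Luc", "phone": None},
]
null_columns = find_null_columns(rows)
# Same shape as the dtype argument passed to wr.s3.to_parquet above.
dtype = {c: "string" for c in null_columns}
print(dtype)  # {'phone': 'string'}
```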

To Reproduce
Not sure yet. I can't share the file I'm using because it contains PII; I will try to reproduce the issue with random data.
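For a PII-free reproduction, a minimal stdlib sketch that generates random rows of a broadly similar shape: text columns mixing ASCII and a few non-ASCII characters, plus one always-null column. The column names, the alphabet, and the idea that non-ASCII text matters are all assumptions, since the real schema is not shown.

```python
import random
import string
from typing import Any, Dict, List

# Hypothetical alphabet: ASCII letters plus a few non-ASCII characters,
# in case the failure depends on multi-byte text.
ALPHABET = string.ascii_letters + "éèçà×ü"


def random_text(rng: random.Random, length: int = 8) -> str:
    """Return a random string of `length` characters drawn from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_rows(n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate n fake rows: an id, two text columns, and one always-null column."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [
        {
            "id": i,
            "first_name": random_text(rng),
            "city": random_text(rng, 12),
            "comment": None,  # mirrors the columns typed explicitly via dtype
        }
        for i in range(n)
    ]


rows = make_rows(3)
for row in rows:
    print(row)
```

With pandas installed, `pd.DataFrame(make_rows(1000))` gives a frame that could be passed as `df` to the same wr.s3.to_parquet call, with `null_columns = ["comment"]`.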

I'm running this code in the Jupyter Docker image https://hub.docker.com/r/jupyter/pyspark-notebook/

! python -V
Python 3.7.6

! pip freeze
alembic==1.4.2
async-generator==1.10
athenacli==1.2.0
attrs==19.3.0
awscli==1.18.44
awswrangler==1.0.4
backcall==0.1.0
beautifulsoup4==4.8.2
bleach==3.1.4
blinker==1.4
bokeh==2.0.1
boto3==1.12.44
botocore==1.15.44
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
cli-helpers==1.2.1
click==7.1.1
cloudpickle==1.3.0
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
configobj==5.0.6
cryptography==2.8
cycler==0.10.0
Cython==0.29.16
cytoolz==0.10.1
dask==2.14.0
decorator==4.4.2
defusedxml==0.6.0
dill==0.3.1.1
distributed==2.14.0
docutils==0.15.2
entrypoints==0.3
fastcache==1.1.0
fsspec==0.7.2
future==0.18.2
gmpy2==2.1.0b1
h5py==2.10.0
HeapDict==1.0.1
idna==2.9
imageio==2.8.0
importlib-metadata==1.6.0
ipykernel==5.2.0
ipympl==0.5.6
ipython==7.13.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.0
Jinja2==2.11.2
jmespath==0.9.5
joblib==0.14.1
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.0.1
jupyterlab-server==1.1.1
jupyters3==0.0.43
kiwisolver==1.2.0
llvmlite==0.31.0
locket==0.2.0
Mako==1.1.0
MarkupSafe==1.1.1
matplotlib==3.2.1
mistune==0.8.4
mpmath==1.1.0
msgpack==1.0.0
nbconvert==5.6.1
nbformat==5.0.6
networkx==2.4
notebook==6.0.3
numba==0.48.0
numexpr==2.7.1
numpy==1.18.1
oauthlib==3.0.1
olefile==0.46
packaging==20.1
pamela==1.0.0
pandas==1.0.3
pandocfilters==1.4.2
parso==0.7.0
partd==1.1.0
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.1
prometheus-client==0.7.1
prompt-toolkit==2.0.10
protobuf==3.11.4
psutil==5.7.0
psycopg2-binary==2.8.5
ptyprocess==0.6.0
pyarrow==0.16.0
pyasn1==0.4.8
PyAthena==1.10.4
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
PyMySQL==0.9.3
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyrsistent==0.16.0
PySocks==1.7.1
pyspark==2.4.5
python-dateutil==2.8.1
python-dotenv==0.13.0
python-editor==1.0.4
python-json-logger==0.1.11
pytz==2019.3
PyWavelets==1.1.1
PyYAML==5.3.1
pyzmq==19.0.0
requests==2.23.0
rsa==3.4.2
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
s3fs==0.4.2
s3transfer==0.3.3
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
seaborn==0.10.0
Send2Trash==1.5.0
six==1.14.0
sortedcontainers==2.1.0
soupsieve==1.9.4
SQLAlchemy==1.3.13
sqlalchemy-redshift==0.7.7
sqlparse==0.3.1
statsmodels==0.11.1
sympy==1.5.1
tabulate==0.8.7
tblib==1.6.0
tenacity==6.1.0
terminado==0.8.3
terminaltables==3.1.0
testpath==0.4.4
toolz==0.10.0
tornado==6.0.4
tqdm==4.45.0
traitlets==4.3.3
typing-extensions==3.7.4.1
urllib3==1.25.9
vincent==0.4.4
wcwidth==0.1.9
webencodings==0.5.1
widgetsnbextension==3.5.1
xlrd==1.2.0
zict==2.0.0
zipp==3.1.0
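Of the packages above, the ones most likely to matter for this code path are awswrangler, pyarrow, pandas, PyAthena, boto3 and botocore (an assumption; the report does not narrow it down). A minimal sketch that prints only those versions. It uses `importlib.metadata`, which is in the standard library from Python 3.8, and falls back to the `importlib_metadata` backport (the importlib-metadata package listed above) on Python 3.7. `relevant_versions` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

RELEVANT = ("awswrangler", "pyarrow", "pandas", "PyAthena", "boto3", "botocore")


def relevant_versions(names: Iterable[str] = RELEVANT) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in relevant_versions().items():
    print(f"{name}=={ver}")
```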

Labels: enhancement (New feature or request); minor release (Will be addressed in the next minor release)