[BUG] Error when reading table with hive cursor that does not happen with hdfs #1563

Open
lucharo opened this issue Jun 8, 2021 · 0 comments
Labels
bug Something isn't working

Comments


lucharo commented Jun 8, 2021

Describe the bug
I get the following error when creating a table with a pyhive cursor:

from getpass import getuser

from blazingsql import BlazingContext
from pyhive import hive

# {hive_edge_node_url} is a placeholder for the Hive edge node host
cursor = hive.Connection(
        host="{hive_edge_node_url}",
        username=getuser(),
        auth='KERBEROS',
        kerberos_service_name="hive",
        configuration={'hive.execution.engine': "tez", 'tez.queue.name': "group1"}
    ).cursor()

bc = BlazingContext()

bc.create_table('bliblu',
                cursor, 
                hive_table_name = 'transuk2m2019_mini',
                hive_database_name = 'chavesrl')

Error:

ERROR: Could not get partition values for file: hdfs://anahnn/visa/user/chavesrl/chavesrl.db/transuk2m2019_mini/000000_0
ERROR: Could not get partition values for file: hdfs://anahnn/visa/user/chavesrl/chavesrl.db/transuk2m2019_mini/000001_0
ERROR: Could not get partition values for file: hdfs://anahnn/visa/user/chavesrl/chavesrl.db/transuk2m2019_mini/000002_0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-edf72ca4fd46> in <module>
      4     hive_table_name = 'transuk2m2019_mini',
      5     hive_database_name = 'chavesrl',
----> 6     file_format = 'parquet'
      7 )

/projects/gds/chavesrl/condapv/envs/visaverse-gpu/lib/python3.7/site-packages/pyblazing/apiv2/context.py in create_table(self, table_name, input, **kwargs)
   2458             ):
   2459                 parsedMetadata = self._parseMetadata(
-> 2460                     file_format_hint, table.slices, parsedSchema, kwargs
   2461                 )
   2462 

/projects/gds/chavesrl/condapv/envs/visaverse-gpu/lib/python3.7/site-packages/pyblazing/apiv2/context.py in _parseMetadata(self, file_format_hint, currentTableNodes, schema, kwargs)
   2714         schema["names"] = [i.encode() for i in schema["names"]]
   2715         if "names" in kwargs:
-> 2716             kwargs["names"] = [i.encode() for i in kwargs["names"]]
   2717 
   2718         if self.dask_client:

/projects/gds/chavesrl/condapv/envs/visaverse-gpu/lib/python3.7/site-packages/pyblazing/apiv2/context.py in <listcomp>(.0)
   2714         schema["names"] = [i.encode() for i in schema["names"]]
   2715         if "names" in kwargs:
-> 2716             kwargs["names"] = [i.encode() for i in kwargs["names"]]
   2717 
   2718         if self.dask_client:

AttributeError: 'bytes' object has no attribute 'encode'

The table I am trying to read is Parquet, but specifying file_format='parquet' does not help either (the traceback above comes from that call). Stepping through with the debugger, I found that i.encode() is being called on i, which is already a byte string.
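The failure can be reproduced outside BlazingSQL. This is a minimal sketch, assuming the column names reach the list comprehension as `bytes`; the names `col_a` and `col_b` are made up for illustration:

```python
# Column names that are already byte strings, as observed in the debugger
names = [b"col_a", b"col_b"]

message = None
try:
    # Same pattern as the failing line in pyblazing's _parseMetadata
    encoded = [i.encode() for i in names]
except AttributeError as exc:
    message = str(exc)

print(message)  # 'bytes' object has no attribute 'encode'
```

`bytes` only has `decode()`, so calling `encode()` a second time on an already-encoded name raises the same AttributeError as in the traceback.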

Expected behavior
Column names are read properly. Perhaps pyblazing could detect that the names are already encoded (bytes) and skip encoding them again.
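One way to tolerate both forms would be to encode only values that are still `str`. This is a hypothetical sketch, not pyblazing's actual code: `ensure_bytes` is an illustrative helper name, and the `kwargs` dict only mimics the one handled in `_parseMetadata` in the traceback:

```python
def ensure_bytes(name):
    """Return `name` as bytes, leaving values that are already bytes untouched."""
    if isinstance(name, bytes):
        return name
    return name.encode()


# Hypothetical kwargs with a mix of already-encoded and plain str names
kwargs = {"names": [b"col_a", "col_b"]}
if "names" in kwargs:
    kwargs["names"] = [ensure_bytes(i) for i in kwargs["names"]]

print(kwargs["names"])  # [b'col_a', b'col_b']
```

Making the conversion idempotent means it no longer matters whether an earlier code path (such as the Hive cursor path) has already encoded the names.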

Environment overview

  • Environment location: Bare metal
  • Method of BlazingSQL install: conda
  • BlazingSQL Version which can be obtained by doing as follows:
    import blazingsql
    print(blazingsql.__info__())
    
BlazingSQL version (git hash): ff4ece0366a4d76bf533baeb03dd03bdfc5232be
BlazingSQL branch name: HEAD
BlazingSQL branch tag: v0.19.0
BlazingSQL build id: 0
BlazingSQL compiler version: GNU /usr/bin/c++ 7.5.0
BlazingSQL cuda flags: -Xcompiler -Wno-parentheses -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Xcompiler -Wall,-Wno-error=deprecated-declarations --default-stream=per-thread -DHT_DEFAULT_ALLOCATOR
BlazingSQL Operating system kernel: Linux-5.4.0-1038-aws
BlazingSQL Operating system architecture: x86_64
BlazingSQL Linux Operating system release: NAME=Ubuntu|VERSION=16.04.7 LTS (Xenial Xerus)|ID=ubuntu|ID_LIKE=debian|PRETTY_NAME=Ubuntu 16.04.7 LTS|VERSION_ID=16.04|HOME_URL=http://www.ubuntu.com/|SUPPORT_URL=http://help.ubuntu.com/|BUG_REPORT_URL=http://bugs.launchpad.net/ubuntu/|VERSION_CODENAME=xenial|UBUNTU_CODENAME=xenial
None


lucharo added the "? - Needs Triage" (needs team to review and classify) and "bug" (Something isn't working) labels Jun 8, 2021
wmalpica removed the "? - Needs Triage" (needs team to review and classify) label Jun 11, 2021