Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inserting NULL values into Hive #55

Closed
PedroMDuarte opened this issue Jun 6, 2016 · 2 comments
Closed

Inserting NULL values into Hive #55

PedroMDuarte opened this issue Jun 6, 2016 · 2 comments

Comments

@PedroMDuarte
Copy link

Hi, thanks for making this software available to all. I am reaching out to see if I can get help with an issue I am having. I am trying to upload a pandas dataframe to Hive, but I run into a problem when the dataframe has None values.

from sqlalchemy import Column, Table, MetaData, types
from sqlalchemy.engine import create_engine
import contextlib

import pandas as pd
df = pd.DataFrame([['a', 'b', 'c'],['d', None, 'f'],['g', 'h', 'i']], 
                  columns=['col1', 'col2', 'col3'])

engine = create_engine('hive://user@host:10000/default')
try:
    with contextlib.closing(engine.connect()) as connection:

        cols = []
        for name, dtype in df.dtypes.iteritems():
            cols.append(Column(name, getattr(sqlalchemy.types, 'String')))

        table = Table('test_table', MetaData(bind=engine), *cols, schema='default')
        table.drop(checkfirst=True)
        table.create()

        ins = table.insert(df.to_dict('records'))
        connection.execute(ins)

        result = table.select().execute().fetchall()
        print result
finally:
    engine.dispose()

The code above results in the following error:

ProgrammingError: (pyhive.exc.ProgrammingError) Unsupported object None [SQL: u'INSERT INTO TABLE `default`.`test_table` VALUES (%(col1_0)s, %(col2_0)s, %(col3_0)s), (%(col1_1)s, %(col2_1)s, %(col3_1)s), (%(col1_2)s, %(col2_2)s, %(col3_2)s)'] [parameters: {u'col2_2': 'h', u'col2_1': None, u'col2_0': 'b', u'col1_0': 'a', u'col1_1': 'd', u'col1_2': 'g', u'col3_2': 'i', u'col3_0': 'c', u'col3_1': 'f'}]

Any help would be greatly appreciated. Thanks!

@PedroMDuarte
Copy link
Author

I just found a simple fix to this issue:

def escape_item(self, item):
    if isinstance(item, (int, float)):
        return self.escape_number(item)
    elif isinstance(item, basestring):
        return self.escape_string(item)
    elif item is None:
        return 'NULL'
    else:
        raise exc.ProgrammingError("Unsupported object {}".format(item))

import pyhive
pyhive.common.ParamEscaper.escape_item = escape_item

Please let me know if this would be an acceptable fix and I can contribute a pull request.

Thanks,
Pedro

@jingw jingw closed this as completed in 276e272 Jun 8, 2016
@jingw
Copy link
Contributor

jingw commented Jun 8, 2016

Thanks for the report! This is fixed.

fbertsch pushed a commit to fbertsch/PyHive that referenced this issue Oct 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants