
INSERT/executemany fails #50

Open
mschmill opened this issue May 27, 2016 · 7 comments

@mschmill

mschmill commented May 27, 2016

I am trying to use pandas to insert a batch of data into a Hive table, and it bombs after the first insert.
PyHive appears to try to fetch a result set after each INSERT, finds none, and breaks the executemany:

File "/usr/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 1160, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 571, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 1250, in to_sql
    table.insert(chunksize)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 770, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 745, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1116, in _execute_context
    context)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 447, in do_executemany
    cursor.executemany(statement, parameters)
  File "/usr/anaconda2/lib/python2.7/site-packages/pyhive/common.py", line 84, in executemany
    self._fetch_more()
  File "/usr/anaconda2/lib/python2.7/site-packages/pyhive/hive.py", line 228, in _fetch_more
    raise ProgrammingError("No result set")
sqlalchemy.exc.ProgrammingError: (pyhive.exc.ProgrammingError) No result set [SQL: u'INSERT INTO TABLE

But there is another issue: the way the batch insert is generated is not performant at all. It emits a separate INSERT statement per row, which causes Hive to launch a MapReduce job for each row. Is there a better way to handle a batch insert like this?

@jingw
Contributor

jingw commented May 31, 2016

Yeah this is a bug, though I'm more inclined to just delete executemany since it's pointlessly inefficient as you mention.

You can insert a batch of rows using this syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

It would be neat to handle that automatically, but it can't be done cleanly in executemany because that would require parsing and rewriting the input query. Maybe there's some SQLAlchemy flag to generate a single INSERT ... VALUES statement instead of using executemany. I couldn't find anything obvious from Googling that.
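As a sketch of the multi-row VALUES syntax from the Hive wiki, the statement can be built by hand and sent through a single cursor.execute(). The table name, helper names, and the naive escaping below are illustrative only; production code needs proper quoting of identifiers and values:

```python
def format_value(v):
    """Render a Python value as a HiveQL literal (illustrative; handles only None, numbers, and strings)."""
    if v is None:
        return "NULL"
    if isinstance(v, (int, float)):
        return str(v)
    # Naive string escaping -- adequate for a sketch, not for untrusted input.
    return "'" + str(v).replace("'", "\\'") + "'"

def build_batch_insert(table, rows):
    """Build one INSERT ... VALUES statement covering all rows, so Hive runs a single job instead of one per row."""
    values = ", ".join(
        "(" + ", ".join(format_value(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO TABLE {} VALUES {}".format(table, values)

sql = build_batch_insert("my_table", [(1, "a"), (2, "b")])
# With a live connection you would then run: cursor.execute(sql)
```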

@aa3222119

aa3222119 commented May 9, 2019

I have the same problem on Python 3.7: executemany fails.

/usr/local/python37/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_executemany(self, cursor, statement, parameters, context)
531
532 def do_executemany(self, cursor, statement, parameters, context=None):
---> 533 cursor.executemany(statement, parameters)
534
535 def do_execute(self, cursor, statement, parameters, context=None):

/usr/local/python37/lib/python3.7/site-packages/pyhive/common.py in executemany(self, operation, seq_of_parameters)
88 self.execute(operation, parameters)
89 while self._state != self._STATE_FINISHED:
---> 90 self._fetch_more()
91 if seq_of_parameters:
92 self.execute(operation, seq_of_parameters[-1])

/usr/local/python37/lib/python3.7/site-packages/pyhive/hive.py in _fetch_more(self)
378 assert(self._operationHandle is not None), "Should have an op handle in _fetch_more"
379 if not self._operationHandle.hasResultSet:
---> 380 raise ProgrammingError("No result set")
381 req = ttypes.TFetchResultsReq(
382 operationHandle=self._operationHandle,

ProgrammingError: (pyhive.exc.ProgrammingError) No result set [SQL: 'INSERT INTO TABLE bi_middles.tem_st_regions_by_kh VALUES (%(stncode)s, %(stnname1)s, %(region1)s, %(sm_num1)s)'] [parameters: ({'..........})] (Background on this error at: http://sqlalche.me/e/f405)

In the end, only one row made it into the Hive table.

@morganics

morganics commented Nov 6, 2019

See #250 for a monkey patch, which works, but is hopelessly inefficient, as mentioned.

You can also generate a multi-row SQL statement and execute it with the cursor, which is what I ended up doing (though I couldn't find a way to parameterise the string). Probably the best option, though, is to write your CSV into HDFS and use Hive's CSV batch import.
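A sketch of that last approach, assuming a comma-delimited target table; the file path, table name, and helper names here are placeholders, and copying the file into HDFS (e.g. with `hdfs dfs -put`) happens outside this snippet:

```python
import csv

def write_rows_for_load(path, rows):
    """Write rows as a CSV that Hive can ingest with LOAD DATA (assumes the table's field delimiter is ',')."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def load_data_statement(hdfs_path, table):
    """Build the LOAD DATA statement for a file already sitting in HDFS."""
    return "LOAD DATA INPATH '{}' INTO TABLE {}".format(hdfs_path, table)

# After uploading the CSV to HDFS, run the statement through a PyHive cursor:
#   cursor.execute(load_data_statement("/tmp/batch.csv", "my_table"))
```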

taogeYT added a commit to taogeYT/PyHive that referenced this issue May 14, 2020
taogeYT added a commit to taogeYT/PyHive that referenced this issue May 15, 2020
@wilberh

wilberh commented Jul 1, 2020

Any fix or workaround for inserting a batch of data into Hive, other than uploading to HDFS or an S3 bucket, or monkey-patching pyhive.hive.Cursor's _fetch_more() method?

@gordthompson

Another workaround using to_sql's method="multi" option is described here:

#250 (comment)
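For anyone landing here, method="multi" makes pandas emit one multi-row INSERT per chunk instead of one statement per row. A minimal sketch (demonstrated against an in-memory SQLite engine so it runs anywhere; with PyHive you would build the engine from a hive:// URL instead, and the table and column names here are made up):

```python
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({"stncode": [1, 2, 3], "stnname": ["a", "b", "c"]})

# With PyHive this would be something like create_engine("hive://user@host:10000/default");
# SQLite is used here only so the sketch is self-contained.
engine = create_engine("sqlite://")

# method="multi" batches each chunk's rows into a single INSERT ... VALUES statement.
df.to_sql("regions", engine, index=False, method="multi", chunksize=1000)

print(pd.read_sql("SELECT COUNT(*) AS n FROM regions", engine)["n"][0])
```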

@zhujinqiu

Another workaround using to_sql's method="multi" option is described here:

#250 (comment)

It works! Thanks!
