
INSERT/executemany fails #50

Open
mschmill opened this issue May 27, 2016 · 7 comments

@mschmill

mschmill commented May 27, 2016

I am trying to use pandas to insert a batch of data into a Hive table, and it bombs after the first insert.
PyHive appears to try to fetch a result set after each INSERT, finds none, and breaks the executemany:

File "/usr/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 1160, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 571, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 1250, in to_sql
    table.insert(chunksize)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 770, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/usr/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 745, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1116, in _execute_context
    context)
  File "/usr/anaconda2/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 447, in do_executemany
    cursor.executemany(statement, parameters)
  File "/usr/anaconda2/lib/python2.7/site-packages/pyhive/common.py", line 84, in executemany
    self._fetch_more()
  File "/usr/anaconda2/lib/python2.7/site-packages/pyhive/hive.py", line 228, in _fetch_more
    raise ProgrammingError("No result set")
sqlalchemy.exc.ProgrammingError: (pyhive.exc.ProgrammingError) No result set [SQL: u'INSERT INTO TABLE

But there is another issue: the way the batch insert is generated is not performant at all. It emits a separate INSERT statement per row, which causes Hive to launch a MapReduce job for each row. Is there a better way to handle a batch insert like this?

@jingw
Contributor

jingw commented May 31, 2016

Yeah this is a bug, though I'm more inclined to just delete executemany since it's pointlessly inefficient as you mention.

You can insert a batch of rows using this syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

It would be neat to handle that automatically, but it can't be done cleanly in executemany because that would require parsing and rewriting the input query. Maybe there's some SQLAlchemy flag to generate a single INSERT ... VALUES statement instead of using executemany. I couldn't find anything obvious from Googling that.
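As a sketch of the multi-row VALUES syntax from the Hive wiki, the statement can be built by hand and sent through a single cursor.execute(). The table name, helper names, and the naive escaping below are illustrative only; production code needs proper quoting of identifiers and values:

```python
def format_value(v):
    """Render a Python value as a HiveQL literal (illustrative; handles only None, numbers, and strings)."""
    if v is None:
        return "NULL"
    if isinstance(v, (int, float)):
        return str(v)
    # Naive string escaping -- adequate for a sketch, not for untrusted input.
    return "'" + str(v).replace("'", "\\'") + "'"

def build_batch_insert(table, rows):
    """Build one INSERT ... VALUES statement covering all rows, so Hive runs a single job instead of one per row."""
    values = ", ".join(
        "(" + ", ".join(format_value(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO TABLE {} VALUES {}".format(table, values)

sql = build_batch_insert("my_table", [(1, "a"), (2, "b")])
# With a live connection you would then run: cursor.execute(sql)
```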

@aa3222119

aa3222119 commented May 9, 2019

I have the same problem on Python 3.7: executemany fails.

/usr/local/python37/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_executemany(self, cursor, statement, parameters, context)
531
532 def do_executemany(self, cursor, statement, parameters, context=None):
---> 533 cursor.executemany(statement, parameters)
534
535 def do_execute(self, cursor, statement, parameters, context=None):

/usr/local/python37/lib/python3.7/site-packages/pyhive/common.py in executemany(self, operation, seq_of_parameters)
88 self.execute(operation, parameters)
89 while self._state != self._STATE_FINISHED:
---> 90 self._fetch_more()
91 if seq_of_parameters:
92 self.execute(operation, seq_of_parameters[-1])

/usr/local/python37/lib/python3.7/site-packages/pyhive/hive.py in _fetch_more(self)
378 assert(self._operationHandle is not None), "Should have an op handle in _fetch_more"
379 if not self._operationHandle.hasResultSet:
---> 380 raise ProgrammingError("No result set")
381 req = ttypes.TFetchResultsReq(
382 operationHandle=self._operationHandle,

ProgrammingError: (pyhive.exc.ProgrammingError) No result set [SQL: 'INSERT INTO TABLE bi_middles.tem_st_regions_by_kh VALUES (%(stncode)s, %(stnname1)s, %(region1)s, %(sm_num1)s)'] [parameters: ({'..........})] (Background on this error at: http://sqlalche.me/e/f405)

In the end, only one row made it into the Hive table.

@morganics

morganics commented Nov 6, 2019

See #250 for a monkey patch, which works, but is hopelessly inefficient, as mentioned.

You can also generate a multi-row SQL statement and execute it with the cursor, which is what I ended up doing (though I couldn't find a way to parameterise the string). Probably the best option, though, is to write your CSV into HDFS and use Hive's CSV batch import.
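A sketch of that last approach, assuming a comma-delimited target table; the file path, table name, and helper names here are placeholders, and copying the file into HDFS (e.g. with `hdfs dfs -put`) happens outside this snippet:

```python
import csv

def write_rows_for_load(path, rows):
    """Write rows as a CSV that Hive can ingest with LOAD DATA (assumes the table's field delimiter is ',')."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def load_data_statement(hdfs_path, table):
    """Build the LOAD DATA statement for a file already sitting in HDFS."""
    return "LOAD DATA INPATH '{}' INTO TABLE {}".format(hdfs_path, table)

# After uploading the CSV to HDFS, run the statement through a PyHive cursor:
#   cursor.execute(load_data_statement("/tmp/batch.csv", "my_table"))
```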

taogeYT added a commit to taogeYT/PyHive that referenced this issue May 14, 2020
taogeYT added a commit to taogeYT/PyHive that referenced this issue May 15, 2020
@wilberh

wilberh commented Jul 1, 2020

Any fix or workaround for inserting a batch of data into Hive, other than uploading to HDFS or an S3 bucket, or monkey-patching pyhive.hive.Cursor's _fetch_more() method?

@gordthompson

Another workaround using to_sql's method="multi" option is described here:

#250 (comment)
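For anyone landing here, method="multi" makes pandas emit one multi-row INSERT per chunk instead of one statement per row. A minimal sketch (demonstrated against an in-memory SQLite engine so it runs anywhere; with PyHive you would build the engine from a hive:// URL instead, and the table and column names here are made up):

```python
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({"stncode": [1, 2, 3], "stnname": ["a", "b", "c"]})

# With PyHive this would be something like create_engine("hive://user@host:10000/default");
# SQLite is used here only so the sketch is self-contained.
engine = create_engine("sqlite://")

# method="multi" batches each chunk's rows into a single INSERT ... VALUES statement.
df.to_sql("regions", engine, index=False, method="multi", chunksize=1000)

print(pd.read_sql("SELECT COUNT(*) AS n FROM regions", engine)["n"][0])
```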

@zhujinqiu

Another workaround using to_sql's method="multi" option is described here:

#250 (comment)

It works! Thanks!
