Failure on cursor.fetchall() #380

Open
jcortes-tealium opened this issue Mar 28, 2024 · 2 comments

@jcortes-tealium

I recently started having issues with the fetchall() method. This exact code was working fine last week, but now the same query is throwing the error shown below.

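def databricks_sql_count(column, catalog, schema, table, where=""):
    # open_connection() and close_connection() are helpers not shown here.
    connection_cursor = open_connection()
    cursor = connection_cursor["cursor"]
    connection = connection_cursor["connection"]
    query = f'SELECT COUNT({column}) FROM `{catalog}`.`{schema}`.`{table}`'
    if where:
        # Only append WHERE when a filter is given; an empty clause is invalid SQL.
        query += f' WHERE {where}'
    try:
        cursor.execute(query)
        response = cursor.fetchall()  # this call raises the TypeError shown below
    finally:
        # Release the cursor and connection even if the query or fetch fails.
        close_connection(cursor, connection)
    return response[0][0]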

initial_count = databricks_sql_count(
    'visitor_id',
    catalog,
    schema,
    table,
    f"created_at >= '{TEST_START_DATE}'",
)

integration_test/utils/databricks/databricks_sql.py:89: in databricks_sql_count
response = cursor.fetchall()
/usr/local/lib/python3.10/dist-packages/databricks/sql/client.py:670: in fetchall
return self.active_result_set.fetchall()
/usr/local/lib/python3.10/dist-packages/databricks/sql/client.py:944: in fetchall
return self._convert_arrow_table(self.fetchall_arrow())
/usr/local/lib/python3.10/dist-packages/databricks/sql/client.py:884: in _convert_arrow_table
res = df.to_numpy(na_value=None)
/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py:1981: in to_numpy
result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)


self = BlockManager
Items: Index(['0'], dtype='object')
Axis 1: RangeIndex(start=0, stop=1, step=1)
ExtensionBlock: slice(0, 1, 1), 1 x 1, dtype: Int64
dtype = None, copy = True, na_value = None

  def as_array(
      self,
      dtype: np.dtype | None = None,
      copy: bool = False,
      na_value: object = lib.no_default,
  ) -> np.ndarray:
      """
      Convert the blockmanager data into an numpy array.
  
      Parameters
      ----------
      dtype : np.dtype or None, default None
          Data type of the return array.
      copy : bool, default False
          If True then guarantee that a copy is returned. A value of
          False does not guarantee that the underlying data is not
          copied.
      na_value : object, default lib.no_default
          Value to be used as the missing value sentinel.
  
      Returns
      -------
      arr : ndarray
      """
      passed_nan = lib.is_float(na_value) and isna(na_value)
  
      if len(self.blocks) == 0:
          arr = np.empty(self.shape, dtype=float)
          return arr.transpose()
  
      if self.is_single_block:
          blk = self.blocks[0]
  
          if na_value is not lib.no_default:
              # We want to copy when na_value is provided to avoid
              # mutating the original object
              if lib.is_np_dtype(blk.dtype, "f") and passed_nan:
                  # We are already numpy-float and na_value=np.nan
                  pass
              else:
                  copy = True
  
          if blk.is_extension:
              # Avoid implicit conversion of extension blocks to object
  
              # error: Item "ndarray" of "Union[ndarray, ExtensionArray]" has no
              # attribute "to_numpy"
              arr = blk.values.to_numpy(  # type: ignore[union-attr]
                  dtype=dtype,
                  na_value=na_value,
                  copy=copy,
              ).reshape(blk.shape)
          else:
              arr = np.array(blk.values, dtype=dtype, copy=copy)
  
          if using_copy_on_write() and not copy:
              arr = arr.view()
              arr.flags.writeable = False
      else:
          arr = self._interleave(dtype=dtype, na_value=na_value)
          # The underlying data was copied within _interleave, so no need
          # to further copy if copy=True or setting na_value
  
      if na_value is lib.no_default:
          pass
      elif arr.dtype.kind == "f" and passed_nan:
          pass
      else:
          arr[isna(arr)] = na_value

E TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

/usr/local/lib/python3.10/dist-packages/pandas/core/internals/managers.py:1701: TypeError

@jcortes-tealium
Author

On version 2.9.3 of databricks-sql-connector.

Collaborator

benc-db commented Mar 28, 2024

What version of pandas do you have installed? We've seen this issue with pandas 2.2.0+, so the latest versions of this library pin pandas below 2.2.0.
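
To answer the version question quickly, something like the following could be run in the failing environment. This is only a sketch: the helper names (`major_minor`, `pandas_may_be_affected`, `installed_version`) are made up for illustration and are not part of the connector, and the 2.2.0 threshold is taken from the comment above rather than from a confirmed root cause.

```python
from importlib import metadata
from typing import Optional, Tuple

# Threshold taken from the comment above: the issue was seen with pandas 2.2.0+.
FIRST_AFFECTED = (2, 2)


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.2.1' or '2.2.0rc0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def pandas_may_be_affected(version: str) -> bool:
    """Return True if the given pandas version is 2.2.0 or newer."""
    return major_minor(version) >= FIRST_AFFECTED


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print both versions so they can be pasted into the issue.
    for name in ("pandas", "databricks-sql-connector"):
        print(name, installed_version(name))

    pandas_version = installed_version("pandas")
    if pandas_version and pandas_may_be_affected(pandas_version):
        print("pandas is 2.2.0 or newer; fetchall() may hit the TypeError reported here")
```

If the check flags the installed pandas, two options consistent with the comment above are installing a pandas release below 2.2.0 or upgrading the connector to a release that pins pandas itself.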
