
IncrementalCursorInvalidCoercion when trying to specify an initial value in incremental column #2460

@trymzet

Description

dlt version

1.6.1

Describe the problem

Specifying the initial_value parameter in dlt.sources.incremental() (as described, e.g., here: https://dlthub.com/docs/general-usage/incremental-loading#incremental-loading-with-a-cursor-field) results in the error below. This may be related to using the pyarrow backend.

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/incremental/transform.py", line 408, in __call__
    start_value_scalar = to_arrow_scalar(self.start_value, cursor_data_type)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/common/libs/pyarrow.py", line 525, in to_arrow_scalar
    return pyarrow.scalar(value, type=arrow_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/scalar.pxi", line 1212, in pyarrow.lib.scalar
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: object of type <class 'str'> cannot be converted to int

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/dlt/pipeline/pipeline.py", line 470, in extract
    self._extract_source(
  File "/usr/local/lib/python3.12/site-packages/dlt/pipeline/pipeline.py", line 1238, in _extract_source
    load_id = extract.extract(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/extract.py", line 435, in extract
    self._extract_single_source(
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/extract.py", line 358, in _extract_single_source
    for pipe_item in pipes:
                     ^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/pipe_iterator.py", line 228, in __next__
    next_item = step(item, meta=pipe_item.meta)  # type: ignore
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/incremental/__init__.py", line 788, in __call__
    return self._incremental(item, meta)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/incremental/__init__.py", line 561, in __call__
    rows = self._transform_item(transformer, rows)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/incremental/__init__.py", line 387, in _transform_item
    row, self.start_out_of_range, self.end_out_of_range = transformer(row)
                                                          ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dlt/extract/incremental/transform.py", line 410, in __call__
    raise IncrementalCursorInvalidCoercion(
dlt.extract.incremental.exceptions.IncrementalCursorInvalidCoercion: In processing pipe Transaction: Could not coerce start_value/initial_value with value 2025-03-25 and type <class 'str'> to actual data item <arrow column> at path ModifiedDateUTC with type timestamp[us, tz=UTC]: object of type <class 'str'> cannot be converted to int. You need to use different data type for start_value/initial_value or cast your data ie. by using `add_map` on this resource.
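The error message suggests using a different data type for initial_value. A minimal workaround sketch follows, assuming the pyarrow coercion accepts a timezone-aware `datetime` where it rejects a `str`. `to_utc_datetime` is a hypothetical helper, not part of dlt. It also assumes that date strings without an offset are meant as UTC, which matches the `timestamp[us, tz=UTC]` cursor column in the traceback.

```python
from datetime import datetime, timezone


def to_utc_datetime(value: str) -> datetime:
    """Parse an ISO 8601 date or datetime string into a timezone-aware UTC datetime.

    Hypothetical helper: converts a string initial_value up front, because the
    pyarrow coercion in the traceback rejects plain strings.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit offset for older interpreters.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # a date-only string parses as midnight
    if parsed.tzinfo is None:
        # Assumption: naive values are meant as UTC.
        return parsed.replace(tzinfo=timezone.utc)
    # Aware values are normalized to UTC.
    return parsed.astimezone(timezone.utc)
```

Both spellings from this report ("2025-03-25" in the error message and "2025-03-25T00:00:00Z" in the steps below) resolve to the same UTC instant. The result would then be passed as, e.g., `dlt.sources.incremental("ModifiedDateUTC", initial_value=to_utc_datetime("2025-03-25"))`. The cursor path is taken from the traceback, and it has not been verified here whether this avoids the error on dlt 1.6.1.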

Expected behavior

No response

Steps to reproduce

  • Use the pyarrow backend.
  • Pass a string initial_value to an incremental cursor column (e.g. "2025-03-25T00:00:00Z").

Operating system

Linux

Runtime environment

Local

Python version

3.12

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response

Status

Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions