Salesforce Stream "Account" cannot be processed by REST or BULK API #20703

Closed

koconder opened this issue Dec 20, 2022 · 20 comments

@koconder
Contributor

Environment

  • Airbyte version: 0.40.25
  • OS Version / Instance: Kubernetes
  • Deployment: Kubernetes via Helm Chart
  • Source Connector and version: Salesforce (1.0.27)
  • Destination Connector and version: Amazon S3 (0.1.26)
  • Step where error happened: Sync Job

Current Behavior

When I connect to my Salesforce instance, I am unable to fetch metadata unless I use connection filters to exclude the object "Account".

Error in pod logs:

Exception: Stream Account cannot be processed by REST or BULK API.
2022-12-20 08:01:18 ERROR i.a.w.i.DefaultAirbyteStreamFactory(internalLog):116 - Stream Account cannot be processed by REST or BULK API.
Traceback (most recent call last):
  File "/airbyte/integration_code/main.py", line 13, in <module>
    launch(source, sys.argv[1:])
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 131, in launch
    for message in source_entrypoint.run(parsed_args):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 116, in run
    catalog = self.source.discover(self.logger, config)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 70, in discover
    streams = [stream.as_airbyte_stream() for stream in self.streams(config=config)]
  File "/airbyte/integration_code/source_salesforce/source.py", line 110, in streams
    streams = self.generate_streams(config, stream_objects, sf)
  File "/airbyte/integration_code/source_salesforce/source.py", line 95, in generate_streams
    raise Exception(f"Stream {stream_name} cannot be processed by REST or BULK API.")

Expected Behavior

Data should sync, metadata should be shown.

@koconder
Contributor Author

@poolmaster

The issue seems to be that neither the Bulk API nor the REST API can cover tables with "many" columns and "special" data types. The REST API has a request-size limit, while the Bulk API has its own list of unsupported types and does NOT support the "object" and "base64" types.

Can we exclude certain columns to bypass the issue? I specified a limited number of columns in the ConfiguredCatalog, but that does not seem to change the behavior.

@koconder
Contributor Author

koconder commented Jan 3, 2023

@poolmaster is there some query I could run in Salesforce Workbench (or elsewhere) to output the columns and their data types, so we can diagnose the issue in relation to the Bulk vs. REST API calls?

Is there any way we can update the logic to ignore columns beyond a certain point for now, to work around the issue temporarily?

@poolmaster

@koconder I don't have one, but I'm pretty sure that was the root cause. It basically blocks the connector from working for some critical tables.
I think we need to support one of the following:

  1. Ignore columns beyond a certain point, which would allow the REST API to always work.
  2. Let the client configure which columns to pull (based on the ConfiguredCatalog) — see the sketch below.

Option 2 seems better IMHO.
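
A rough sketch of what option 2 could look like: intersect the discovered properties with the fields selected in the ConfiguredCatalog before choosing between REST and BULK. The helper below is illustrative only (the function and argument names are hypothetical, not the connector's actual API):

from typing import Any, Dict, Optional

def select_properties(
    stream_name: str,
    discovered_properties: Dict[str, Any],
    configured_catalog: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Keep only the properties listed in the configured stream's JSON schema,
    # so very wide objects stay under the REST API's request-size limit.
    if not configured_catalog:
        return discovered_properties
    for configured_stream in configured_catalog.get("streams", []):
        stream = configured_stream.get("stream", {})
        if stream.get("name") != stream_name:
            continue
        selected = set(stream.get("json_schema", {}).get("properties", {}))
        if selected:
            return {k: v for k, v in discovered_properties.items() if k in selected}
    return discovered_properties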

@koconder
Contributor Author

koconder commented Jan 5, 2023

@poolmaster are you willing to raise a PR for option 2? I'm happy to help review, test and push it through.

@YowanR added the bounty reward-100 and difficulty - ⭐ labels and removed the type/bug (Something isn't working) label Jan 24, 2023
@yepher

yepher commented Feb 9, 2023

This is my first time using Airbyte (testing it as a potential solution for our infrastructure), and I am bumping into this same issue. Our SFDC schema is very complex and has a lot of fields. I think this is the bug I am running into:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/source_salesforce/source.py#L69

I changed that code to this:

    @classmethod
    def _get_api_type(cls, stream_name, properties):
        # Salesforce BULK API currently does not support loading fields with data type base64 and compound data
        properties_not_supported_by_bulk = {
            key: value for key, value in properties.items() if value.get("format") == "base64" or "object" in value["type"]
        }
        properties_length = len(",".join(p for p in properties))

        rest_required = stream_name in UNSUPPORTED_BULK_API_SALESFORCE_OBJECTS or properties_not_supported_by_bulk
        # If we have a lot of properties we can overcome REST API URL length and get an error: "reason: URI Too Long".
        # For such cases connector tries to use BULK API because it uses POST request and passes properties in the request body.
        bulk_required = properties_length + 2000 > Salesforce.REQUEST_SIZE_LIMITS
        print(f"xxxxxxxxxx bulk_required: {bulk_required} properties_lengthL {properties_length} + LIMIT: {Salesforce.REQUEST_SIZE_LIMITS}")
        if rest_required and not bulk_required:
            return "rest"
        if not rest_required:
            return "bulk"

And this is the output:

2023-02-09 05:45:12 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):125 - xxxxxxxxxx bulk_required: True properties_lengthL 14846 + LIMIT: 16384
airbyte-worker                    | 2023-02-09 05:45:12 ERROR i.a.w.i.DefaultAirbyteStreamFactory(internalLog):163 - None: Stream Account cannot be processed by REST or BULK API.
airbyte-worker                    | Traceback (most recent call last):
airbyte-worker                    |   File "/airbyte/integration_code/main.py", line 13, in <module>
airbyte-worker                    |     launch(source, sys.argv[1:])
airbyte-worker                    |   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 131, in launch
airbyte-worker                    |     for message in source_entrypoint.run(parsed_args):
airbyte-worker                    |   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 116, in run
airbyte-worker                    |     catalog = self.source.discover(self.logger, config)
airbyte-worker                    |   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 75, in discover
airbyte-worker                    |     streams = [stream.as_airbyte_stream() for stream in self.streams(config=config)]
airbyte-worker                    |   File "/airbyte/integration_code/source_salesforce/source.py", line 110, in streams
airbyte-worker                    |     streams = self.generate_streams(config, stream_objects, sf)
airbyte-worker                    |   File "/airbyte/integration_code/source_salesforce/source.py", line 95, in generate_streams
airbyte-worker                    |     raise Exception(f"{api_type}: Stream {stream_name} cannot be processed by REST or BULK API.")

The return boolean logic is just a little off. It should return "bulk" but is returning None in this case because the code falls through. This is not the perfect solution, but to test, I changed the return to this, and it worked for me:

        rest_required = stream_name in UNSUPPORTED_BULK_API_SALESFORCE_OBJECTS or properties_not_supported_by_bulk
        # If we have a lot of properties we can overcome REST API URL length and get an error: "reason: URI Too Long".
        # For such cases connector tries to use BULK API because it uses POST request and passes properties in the request body.
        bulk_required = properties_length + 2000 > Salesforce.REQUEST_SIZE_LIMITS
        print(f"xxxxxxxxxx bulk_required: {bulk_required} properties_lengthL {properties_length} + LIMIT: {Salesforce.REQUEST_SIZE_LIMITS}")
        if rest_required and not bulk_required:
            return "rest"
        if not rest_required:
            return "bulk"
        return "bulk" # this case needs handled better

@yepher

yepher commented Feb 9, 2023

@koconder
Contributor Author

Seems like #22597 has added new issues:

airbytehq/oncall/issues/1403
cc @davydov-d

2023-02-12 22:50:25 source > Syncing stream: Account 
2023-02-12 22:50:25 source > <h1>Bad Message 431</h1><pre>reason: Request Header Fields Too Large</pre>
2023-02-12 22:50:25 source > Encountered an exception while reading stream Account
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
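
What the traceback shows is the server answering the oversized request with an HTML error page (the 431 message above) rather than JSON, and the connector then calling response.json() on it. A minimal sketch of a more defensive pattern, purely illustrative and not the connector's actual code:

import json
import requests

def parse_json_response(response: requests.Response) -> dict:
    # The 431 page above is HTML, so .json() raises JSONDecodeError; surface
    # the status code and a snippet of the body instead of a bare "Expecting value".
    try:
        return response.json()
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"Non-JSON response (HTTP {response.status_code}): {response.text[:200]!r}"
        ) from exc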

@davydov-d
Collaborator

@koconder may I ask you to share your credentials if possible so I could reproduce this? or at least the log file of the failed sync. Thanks!

nevermind, I have found the root cause and prepared a patch for that: #22896

@koconder
Contributor Author

Still an issue @davydov-d

    raise e
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 120, in read
    yield from self._read_stream(
  File "/airbyte/integration_code/source_salesforce/source.py", line 138, in _read_stream
    yield from super()._read_stream(logger, stream_instance, configured_stream, state_manager, internal_config)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 189, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 256, in _read_incremental
    for message_counter, record_data_or_message in enumerate(records, start=1):
  File "/airbyte/integration_code/source_salesforce/streams.py", line 162, in read_records
    yield from self._read_pages(
  File "/airbyte/integration_code/source_salesforce/streams.py", line 209, in _read_pages
    chunk_page_records = {record[self.primary_key]: record for record in chunk_page_records}
  File "/airbyte/integration_code/source_salesforce/streams.py", line 209, in <dictcomp>
    chunk_page_records = {record[self.primary_key]: record for record in chunk_page_records}
KeyError: 'Id'
2023-02-17 04:30:21 destination > Airbyte message consumer: succeeded.
2023-02-17 04:30:21 destination > executing on success close procedure.
2023-02-17 04:30:21 destination > Flushing all 0 current buffers (0 bytes in total)
2023-02-17 04:30:21 destination > Completed integration: io.airbyte.integrations.destination.s3.S3Destination
2023-02-17 04:30:21 destination > Completed destination: io.airbyte.integrations.destination.s3.S3Destination

I will send you the full log on Slack

@koconder koconder reopened this Feb 17, 2023
@davydov-d davydov-d self-assigned this Feb 17, 2023
@davydov-d
Collaborator

oh dang, I think I know what the problem is
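
The eventual fix (see the commit message below, "include primary key in every chunk") makes sure the primary key is part of every property chunk, so the per-chunk records can be re-joined by Id. A minimal sketch of that idea with hypothetical names, not the connector's actual implementation:

from typing import Any, Dict, Iterator, List

def chunk_properties(
    properties: Dict[str, Any], primary_key: str, chunk_size: int
) -> Iterator[Dict[str, Any]]:
    # Split a wide schema into chunks, forcing the primary key into each chunk
    # so records fetched per chunk can later be merged by that key (e.g. "Id").
    pk_schema = properties.get(primary_key, {"type": "string"})  # assumes the key exists
    other_names: List[str] = [name for name in properties if name != primary_key]
    for i in range(0, len(other_names), chunk_size):
        chunk = {primary_key: pk_schema}
        chunk.update({name: properties[name] for name in other_names[i : i + chunk_size]})
        yield chunk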

davydov-d and lazebnyi added commits that referenced this issue Feb 20–23, 2023
davydov-d added a commit that referenced this issue Feb 23, 2023
* #20703 source salesforce: include primary key in every chunk

* #20703 source salesforce: upd changelog

* #20703 source salesforce: review fix

* auto-bump connector version

* #20703 source salesforce: upd connectors.md

---------

Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
@davydov-d
Collaborator

#23190

@koconder
Contributor Author

Still buggy, new bug introduced:

Inconsistent record with primary key 0014a000007hTpOAAU found. It consists of 1 chunks instead of 2. Skipping it.

The warning above repeats for every row in the first object, Account, which was the original subject of this issue.

@koconder koconder reopened this Feb 28, 2023
@davydov-d
Collaborator

#23610

@davydov-d
Collaborator

the fix has been released

@koconder
Contributor Author

koconder commented Mar 3, 2023

@davydov-d I'm testing now... so far the chunking has started, fingers crossed.
One error so far for each chunk:

2023-03-03 09:25:26 ERROR c.n.s.DateTimeValidator(tryParse):82 - Invalid date-time: Invalid timezone offset: +0000

@davydov-d
Collaborator

@koconder that is more of a warning, and it used to be logged even before chunks were introduced. If it turns out to be something critical, please open a new issue, since it's not related to this one.
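
For illustration only: strict ISO-8601 parsers differ on whether the offset needs a colon, which is why "+0000" can be flagged as an invalid timezone offset. A quick check in plain Python (unrelated to the Java DateTimeValidator in the log; behavior depends on the Python version):

from datetime import datetime

# Python 3.9's fromisoformat() requires a colon in the offset ("+00:00");
# "+0000" raises ValueError there, while Python 3.11+ accepts both forms.
print(datetime.fromisoformat("2023-03-03T09:25:26+00:00"))
try:
    datetime.fromisoformat("2023-03-03T09:25:26+0000")
except ValueError as exc:
    print("rejected:", exc)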

@davydov-d
Collaborator

https://github.com/airbytehq/oncall/issues/1571 confirmed it works
