Source SalesForce: fix encoding guess (#29538)
artem1205 committed Aug 18, 2023
1 parent c1bc88e commit 6ba7c03
Showing 4 changed files with 23 additions and 3 deletions.
@@ -13,5 +13,5 @@ RUN pip install .

 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=2.1.3
+LABEL io.airbyte.version=2.1.4
 LABEL io.airbyte.name=airbyte/source-salesforce
@@ -5,7 +5,7 @@ data:
 connectorSubtype: api
 connectorType: source
 definitionId: b117307c-14b6-41aa-9422-947e34922962
-dockerImageTag: 2.1.3
+dockerImageTag: 2.1.4
 dockerRepository: airbyte/source-salesforce
 githubIssueLabel: source-salesforce
 icon: salesforce.svg
@@ -441,6 +441,25 @@ def filter_null_bytes(self, b: bytes):
             self.logger.warning("Filter 'null' bytes from string, size reduced %d -> %d chars", len(b), len(res))
         return res

+    def get_response_encoding(self, headers) -> str:
+        """Returns encodings from given HTTP Header Dict.
+        :param headers: dictionary to extract encoding from.
+        :rtype: str
+        """
+
+        content_type = headers.get("content-type")
+
+        if not content_type:
+            return self.encoding
+
+        content_type, params = requests.utils._parse_content_type_header(content_type)
+
+        if "charset" in params:
+            return params["charset"].strip("'\"")
+
+        return self.encoding
+
     def download_data(self, url: str, chunk_size: int = 1024) -> tuple[str, str, dict]:
         """
         Retrieves binary data result from successfully `executed_job`, using chunks, to avoid local memory limitations.
@@ -453,8 +472,8 @@ def download_data(self, url: str, chunk_size: int = 1024) -> tuple[str, str, dic
         with closing(self._send_http_request("GET", url, headers={"Accept-Encoding": "gzip"}, stream=True)) as response, open(
             tmp_file, "wb"
         ) as data_file:
-            response_encoding = response.encoding or self.encoding
             response_headers = response.headers
+            response_encoding = self.get_response_encoding(response_headers)
             for chunk in response.iter_content(chunk_size=chunk_size):
                 data_file.write(self.filter_null_bytes(chunk))
         # check the file exists
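
For illustration, the header-based lookup that `get_response_encoding` performs can be sketched as a standalone function. This is only a sketch under assumptions: it parses the header with the standard library's `email.message.Message` instead of the private `requests.utils._parse_content_type_header` helper used in the commit, it takes a plain `dict` with a lowercase `content-type` key rather than the case-insensitive headers object from `requests`, and the `DEFAULT_ENCODING` value of `"utf-8"` is a hypothetical stand-in for `self.encoding`, whose value is not shown in this diff.

```python
from email.message import Message

# Hypothetical stand-in for the connector's `self.encoding`;
# the real default is not visible in this diff.
DEFAULT_ENCODING = "utf-8"


def get_response_encoding(headers: dict, default: str = DEFAULT_ENCODING) -> str:
    """Return the charset declared in Content-Type, or `default` if none is declared."""
    content_type = headers.get("content-type")
    if not content_type:
        # No Content-Type header at all: use the configured default.
        return default

    # Let the stdlib parse "type/subtype; key=value" parameters.
    message = Message()
    message["content-type"] = content_type
    charset = message.get_param("charset")

    # get_param returns None when the parameter is absent and may return
    # a tuple for RFC 2231 encoded values; only plain strings are used here.
    if isinstance(charset, str) and charset:
        return charset.strip("'\"")

    # Content-Type present but without a charset: use the configured default.
    return default


if __name__ == "__main__":
    print(get_response_encoding({"content-type": "text/csv; charset=utf-16"}))
    print(get_response_encoding({"content-type": "text/csv"}))
    print(get_response_encoding({}))
```

A likely motivation for the change, though the commit does not state it, is that `response.encoding` in `requests` falls back to ISO-8859-1 for `text/*` responses that declare no charset, so the old `response.encoding or self.encoding` expression would rarely reach the connector's own default.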
1 change: 1 addition & 0 deletions docs/integrations/sources/salesforce.md
@@ -150,6 +150,7 @@ Now that you have set up the Salesforce source connector, check out the followin

 | Version | Date | Pull Request | Subject |
 |:--------|:-----------|:---------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------|
+| 2.1.4 | 2023-08-17 | [29538](https://github.com/airbytehq/airbyte/pull/29538) | Fix encoding guess |
 | 2.1.3 | 2023-08-17 | [29500](https://github.com/airbytehq/airbyte/pull/29500) | handle expired refresh token error |
 | 2.1.2 | 2023-08-10 | [28781](https://github.com/airbytehq/airbyte/pull/28781) | Fix pagination for BULK API jobs; Add option to force use BULK API |
 | 2.1.1 | 2023-07-06 | [28021](https://github.com/airbytehq/airbyte/pull/28021) | Several Vulnerabilities Fixes; switched to use alpine instead of slim, CVE-2022-40897, CVE-2023-29383, CVE-2023-31484, CVE-2016-2781 |
