Skip to content

Commit

Permalink
🐛 Source Smartsheets: fix scrambled columns bug (#11911)
Browse files Browse the repository at this point in the history
* #5520 fix scrambled columns bug

* #5520 source smartsheets: add changelog item

* #5520 move pytest to optional setup requirements

* #5520 source smartsheets: add two unit tests

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
  • Loading branch information
2 people authored and suhomud committed May 23, 2022
1 parent 370dab4 commit 58d6db8
Show file tree
Hide file tree
Showing 9 changed files with 95 additions and 70 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -737,7 +737,7 @@
- name: Smartsheets
sourceDefinitionId: 374ebc65-6636-4ea0-925c-7d35999a8ffc
dockerRepository: airbyte/source-smartsheets
dockerImageTag: 0.1.8
dockerImageTag: 0.1.9
documentationUrl: https://docs.airbyte.io/integrations/sources/smartsheets
icon: smartsheet.svg
sourceType: api
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7868,7 +7868,7 @@
oauthFlowOutputParameters:
- - "access_token"
- - "refresh_token"
- dockerImage: "airbyte/source-smartsheets:0.1.8"
- dockerImage: "airbyte/source-smartsheets:0.1.9"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/sources/smartsheets"
connectionSpecification:
Expand Down
6 changes: 3 additions & 3 deletions airbyte-integrations/connectors/source-smartsheets/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,13 @@ RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
ENV CODE_PATH="source_smartsheets"

WORKDIR /airbyte/integration_code
COPY $CODE_PATH ./$CODE_PATH
COPY main.py ./
COPY setup.py ./
RUN pip install .
COPY main.py ./
COPY $CODE_PATH ./$CODE_PATH

ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.1.8
LABEL io.airbyte.version=0.1.9
LABEL io.airbyte.name=airbyte/source-smartsheets
Original file line number Diff line number Diff line change
Expand Up @@ -2,35 +2,18 @@
"streams": [
{
"stream": {
"name": "test",
"name": "aws_s3_sample",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"0": {
"type": ["null", "string", "number"]
},
"First Name": {
"type": ["null", "string"]
},
"Last Name": {
"type": ["null", "string"]
},
"Gender": {
"type": ["null", "number", "string"]
},
"Country": {
"type": ["null", "number", "string"]
},
"Age": {
"type": ["null", "string", "number"]
},
"Date": {
"type": ["null", "string"]
},
"Id": {
"type": ["null", "string", "number"]
}
"id": { "type": "string" },
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"gender": { "type": "string" },
"ip_address": { "type": "string" },
"primary_email": { "type": "string" },
"dob": { "type": "string", "format": "date" }
}
},
"supported_sync_modes": ["full_refresh"]
Expand Down
9 changes: 8 additions & 1 deletion airbyte-integrations/connectors/source-smartsheets/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,19 @@

from setuptools import find_packages, setup

MAIN_REQUIREMENTS = ["airbyte-cdk~=0.1", "smartsheet-python-sdk==2.105.1"]
TEST_REQUIREMENTS = ["pytest==6.1.2", "pytest-mock~=3.6.1"]


setup(
name="source_smartsheets",
description="Source implementation for Smartsheets.",
author="Nate Nowack",
author_email="contact@airbyte.io",
packages=find_packages(),
install_requires=["airbyte-cdk~=0.1", "pytest==6.1.2", "smartsheet-python-sdk==2.105.1"],
install_requires=MAIN_REQUIREMENTS,
extras_require={
"tests": TEST_REQUIREMENTS,
},
package_data={"": ["*.json"]},
)
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

import json
from datetime import datetime
from typing import Dict, Generator
from typing import Dict, Generator, List

import smartsheet
from airbyte_cdk import AirbyteLogger
Expand All @@ -19,8 +19,6 @@
Status,
Type,
)

# helpers
from airbyte_cdk.sources import Source


Expand All @@ -33,8 +31,14 @@ def get_prop(col_type: str) -> Dict[str, any]:
return props.get(col_type, {"type": "string"})


def get_json_schema(sheet: Dict) -> Dict:
column_info = {i["title"]: get_prop(i["type"]) for i in sheet["columns"]}
def construct_record(sheet_columns: List[Dict], row_cells: List[Dict]) -> Dict:
# convert all data to string as it is only expected format in schema
values_column_map = {cell["columnId"]: str(cell.get("value", "")) for cell in row_cells}
return {column["title"]: values_column_map[column["id"]] for column in sheet_columns}


def get_json_schema(sheet_columns: List[Dict]) -> Dict:
column_info = {column["title"]: get_prop(column["type"]) for column in sheet_columns}
json_schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
Expand All @@ -43,7 +47,6 @@ def get_json_schema(sheet: Dict) -> Dict:
return json_schema


# main class definition
class SourceSmartsheets(Source):
def check(self, logger: AirbyteLogger, config: json) -> AirbyteConnectionStatus:
try:
Expand Down Expand Up @@ -74,8 +77,7 @@ def discover(self, logger: AirbyteLogger, config: json) -> AirbyteCatalog:
try:
sheet = smartsheet_client.Sheets.get_sheet(spreadsheet_id)
sheet = json.loads(str(sheet)) # make it subscriptable
sheet_json_schema = get_json_schema(sheet)

sheet_json_schema = get_json_schema(sheet["columns"])
logger.info(f"Running discovery on sheet: {sheet['name']} with {spreadsheet_id}")

stream = AirbyteStream(name=sheet["name"], json_schema=sheet_json_schema)
Expand All @@ -97,35 +99,23 @@ def read(

for configured_stream in catalog.streams:
stream = configured_stream.stream
properties = stream.json_schema["properties"]
if isinstance(properties, list):
columns = tuple(key for dct in properties for key in dct.keys())
elif isinstance(properties, dict):
columns = tuple(i for i in properties.keys())
else:
logger.error("Could not read properties from the JSONschema in this stream")
name = stream.name

try:
sheet = smartsheet_client.Sheets.get_sheet(spreadsheet_id)
sheet = json.loads(str(sheet)) # make it subscriptable
logger.info(f"Starting syncing spreadsheet {sheet['name']}")
logger.info(f"Row count: {sheet['totalRowCount']}")

for row in sheet["rows"]:
# convert all data to string as it is only expected format in schema
values = tuple(str(i["value"]) if "value" in i else "" for i in row["cells"])
try:
data = dict(zip(columns, values))

record = construct_record(sheet["columns"], row["cells"])
yield AirbyteMessage(
type=Type.RECORD,
record=AirbyteRecordMessage(stream=name, data=data, emitted_at=int(datetime.now().timestamp()) * 1000),
record=AirbyteRecordMessage(stream=stream.name, data=record, emitted_at=int(datetime.now().timestamp()) * 1000),
)
except Exception as e:
logger.error(f"Unable to encode row into an AirbyteMessage with the following error: {e}")

except Exception as e:
logger.error(f"Could not read smartsheet: {name}")
logger.error(f"Could not read smartsheet: {stream.name}")
raise e
logger.info(f"Finished syncing spreadsheet with ID: {spreadsheet_id}")
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#
# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
#

import json
from itertools import permutations
from unittest.mock import Mock

import pytest
from source_smartsheets.source import SourceSmartsheets


@pytest.fixture
def config():
return {"access_token": "token", "spreadsheet_id": "id"}


@pytest.fixture(name="catalog")
def configured_catalog():
stream_mock = Mock()
stream_mock.name = "test" # cannot be used in __init__
return Mock(streams=[Mock(stream=stream_mock)])


_columns = [
{"id": "1101932201830276", "title": "id", "type": "TEXT_NUMBER"},
{"id": "5605531829200772", "title": "first_name", "type": "TEXT_NUMBER"},
{"id": "3353732015515524", "title": "last_name", "type": "TEXT_NUMBER"},
]


_cells = [
{"columnId": "1101932201830276", "value": "11"},
{"columnId": "5605531829200772", "value": "Leonardo"},
{"columnId": "3353732015515524", "value": "Dicaprio"},
]


@pytest.mark.parametrize(("row", "columns"), (*((perm, _columns) for perm in permutations(_cells)), ([], _columns), ([], [])))
def test_different_cell_order_produces_one_result(mocker, config, catalog, row, columns):
sheet = json.dumps({"name": "test", "totalRowCount": 3, "columns": columns, "rows": [{"cells": row}] if row else []})
mocker.patch("smartsheet.Smartsheet", Mock(return_value=Mock(Sheets=Mock(get_sheet=Mock(return_value=sheet)))))
source = SourceSmartsheets()
records = [message.record.data for message in source.read(logger=Mock(), config=config, catalog=catalog, state={})]
expected_records = [] if not row else [{"id": "11", "first_name": "Leonardo", "last_name": "Dicaprio"}]
assert list(records) == expected_records
31 changes: 15 additions & 16 deletions docs/integrations/sources/smartsheets.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,24 +25,24 @@ The data type mapping adopted by this connector is based on the Smartsheet [docu

**NOTE**: For any column datatypes interpreted by Smartsheets beside `DATE` and `DATETIME`, this connector's source schema generation assumes a `string` type, in which case the `format` field is not required by Airbyte.

| Integration Type | Airbyte Type | Airbyte Format |
| :--- | :--- | :--- |
| `TEXT_NUMBER` | `string` | |
| `DATE` | `string` | `format: date` |
| `DATETIME` | `string` | `format: date-time` |
| `anything else` | `string` | |
| Integration Type | Airbyte Type | Airbyte Format |
|:-----------------|:-------------|:---------------------|
| `TEXT_NUMBER` | `string` | |
| `DATE` | `string` | `format: date` |
| `DATETIME` | `string` | `format: date-time` |
| `anything else` | `string` | |

The remaining column datatypes supported by Smartsheets are more complex types \(e.g. Predecessor, Dropdown List\) and are not supported by this connector beyond its `string` representation.

### Features

This source connector only supports Full Refresh Sync. Since Smartsheets only allows 5000 rows per sheet, it's likely that the Full Refresh Sync Mode will suit the majority of use-cases.

| Feature | Supported? |
| :--- | :--- |
| Full Refresh Sync | Yes |
| Incremental Sync | No |
| Namespaces | No |
| Feature | Supported? |
|:------------------|:-----------|
| Full Refresh Sync | Yes |
| Incremental Sync | No |
| Namespaces | No |

### Performance considerations

Expand Down Expand Up @@ -86,8 +86,7 @@ To setup your new Smartsheets source, Airbyte will need:

## Changelog

| Version | Date | Pull Request | Subject |
|:--------|:-----------| :--- |:--------------------|
| 0.1.8 | 2022-02-04 | [9792](https://github.com/airbytehq/airbyte/pull/9792) | Added oauth support |


| Version | Date | Pull Request | Subject |
|:--------|:-----------|:---------------------------------------------------------|:--------------------------|
| 0.1.9 | 2022-04-12 | [11911](https://github.com/airbytehq/airbyte/pull/11911) | Bugfix: scrambled columns |
| 0.1.8 | 2022-02-04 | [9792](https://github.com/airbytehq/airbyte/pull/9792) | Added oauth support |

0 comments on commit 58d6db8

Please sign in to comment.