Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source Smartsheets: fix scrambled columns bug #11911

Merged
merged 6 commits into from
Apr 18, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions airbyte-integrations/connectors/source-smartsheets/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,13 @@ RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
ENV CODE_PATH="source_smartsheets"

WORKDIR /airbyte/integration_code
COPY $CODE_PATH ./$CODE_PATH
COPY main.py ./
COPY setup.py ./
RUN pip install .
COPY main.py ./
COPY $CODE_PATH ./$CODE_PATH
davydov-d marked this conversation as resolved.
Show resolved Hide resolved

ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.1.8
LABEL io.airbyte.version=0.1.9
LABEL io.airbyte.name=airbyte/source-smartsheets
Original file line number Diff line number Diff line change
Expand Up @@ -2,35 +2,18 @@
"streams": [
{
"stream": {
"name": "test",
"name": "aws_s3_sample",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"0": {
"type": ["null", "string", "number"]
},
"First Name": {
"type": ["null", "string"]
},
"Last Name": {
"type": ["null", "string"]
},
"Gender": {
"type": ["null", "number", "string"]
},
"Country": {
"type": ["null", "number", "string"]
},
"Age": {
"type": ["null", "string", "number"]
},
"Date": {
"type": ["null", "string"]
},
"Id": {
"type": ["null", "string", "number"]
}
"id": { "type": "string" },
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"gender": { "type": "string" },
"ip_address": { "type": "string" },
"primary_email": { "type": "string" },
"dob": { "type": "string", "format": "date" }
}
},
"supported_sync_modes": ["full_refresh"]
Expand Down
9 changes: 8 additions & 1 deletion airbyte-integrations/connectors/source-smartsheets/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,19 @@

from setuptools import find_packages, setup

MAIN_REQUIREMENTS = ["airbyte-cdk~=0.1", "smartsheet-python-sdk==2.105.1"]
TEST_REQUIREMENTS = ["pytest==6.1.2", "pytest-mock~=3.6.1"]


setup(
name="source_smartsheets",
description="Source implementation for Smartsheets.",
author="Nate Nowack",
author_email="contact@airbyte.io",
packages=find_packages(),
install_requires=["airbyte-cdk~=0.1", "pytest==6.1.2", "smartsheet-python-sdk==2.105.1"],
install_requires=MAIN_REQUIREMENTS,
extras_require={
"tests": TEST_REQUIREMENTS,
},
package_data={"": ["*.json"]},
)
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

import json
from datetime import datetime
from typing import Dict, Generator
from typing import Dict, Generator, List

import smartsheet
from airbyte_cdk import AirbyteLogger
Expand All @@ -19,8 +19,6 @@
Status,
Type,
)

# helpers
from airbyte_cdk.sources import Source


Expand All @@ -33,8 +31,14 @@ def get_prop(col_type: str) -> Dict[str, any]:
return props.get(col_type, {"type": "string"})


def get_json_schema(sheet: Dict) -> Dict:
column_info = {i["title"]: get_prop(i["type"]) for i in sheet["columns"]}
def construct_record(sheet_columns: List[Dict], row_cells: List[Dict]) -> Dict:
# convert all data to string as it is only expected format in schema
values_column_map = {cell["columnId"]: str(cell.get("value", "")) for cell in row_cells}
return {column["title"]: values_column_map[column["id"]] for column in sheet_columns}


def get_json_schema(sheet_columns: List[Dict]) -> Dict:
column_info = {column["title"]: get_prop(column["type"]) for column in sheet_columns}
json_schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
Expand All @@ -43,7 +47,6 @@ def get_json_schema(sheet: Dict) -> Dict:
return json_schema


# main class definition
class SourceSmartsheets(Source):
def check(self, logger: AirbyteLogger, config: json) -> AirbyteConnectionStatus:
try:
Expand Down Expand Up @@ -74,8 +77,7 @@ def discover(self, logger: AirbyteLogger, config: json) -> AirbyteCatalog:
try:
sheet = smartsheet_client.Sheets.get_sheet(spreadsheet_id)
sheet = json.loads(str(sheet)) # make it subscriptable
sheet_json_schema = get_json_schema(sheet)

sheet_json_schema = get_json_schema(sheet["columns"])
logger.info(f"Running discovery on sheet: {sheet['name']} with {spreadsheet_id}")

stream = AirbyteStream(name=sheet["name"], json_schema=sheet_json_schema)
Expand All @@ -97,35 +99,23 @@ def read(

for configured_stream in catalog.streams:
stream = configured_stream.stream
properties = stream.json_schema["properties"]
if isinstance(properties, list):
columns = tuple(key for dct in properties for key in dct.keys())
elif isinstance(properties, dict):
columns = tuple(i for i in properties.keys())
else:
logger.error("Could not read properties from the JSONschema in this stream")
name = stream.name

try:
sheet = smartsheet_client.Sheets.get_sheet(spreadsheet_id)
sheet = json.loads(str(sheet)) # make it subscriptable
logger.info(f"Starting syncing spreadsheet {sheet['name']}")
logger.info(f"Row count: {sheet['totalRowCount']}")

for row in sheet["rows"]:
# convert all data to string as it is only expected format in schema
values = tuple(str(i["value"]) if "value" in i else "" for i in row["cells"])
try:
data = dict(zip(columns, values))

record = construct_record(sheet["columns"], row["cells"])
yield AirbyteMessage(
type=Type.RECORD,
record=AirbyteRecordMessage(stream=name, data=data, emitted_at=int(datetime.now().timestamp()) * 1000),
record=AirbyteRecordMessage(stream=stream.name, data=record, emitted_at=int(datetime.now().timestamp()) * 1000),
)
except Exception as e:
logger.error(f"Unable to encode row into an AirbyteMessage with the following error: {e}")

except Exception as e:
logger.error(f"Could not read smartsheet: {name}")
logger.error(f"Could not read smartsheet: {stream.name}")
raise e
logger.info(f"Finished syncing spreadsheet with ID: {spreadsheet_id}")
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
#
# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
#

import json
from itertools import permutations
from unittest.mock import Mock

import pytest
from source_smartsheets.source import SourceSmartsheets


@pytest.fixture
def config():
return {"access_token": "token", "spreadsheet_id": "id"}


@pytest.fixture(name="catalog")
def configured_catalog():
stream_mock = Mock()
stream_mock.name = "test" # cannot be used in __init__
return Mock(streams=[Mock(stream=stream_mock)])


columns = [
{"id": "1101932201830276", "title": "id", "type": "TEXT_NUMBER"},
{"id": "5605531829200772", "title": "first_name", "type": "TEXT_NUMBER"},
{"id": "3353732015515524", "title": "last_name", "type": "TEXT_NUMBER"},
]


cells = [
{"columnId": "1101932201830276", "value": "11"},
{"columnId": "5605531829200772", "value": "Leonardo"},
{"columnId": "3353732015515524", "value": "Dicaprio"},
]


davydov-d marked this conversation as resolved.
Show resolved Hide resolved
@pytest.mark.parametrize("row", permutations(cells))
def test_different_cell_order_produces_one_result(mocker, row, config, catalog):
sheet = json.dumps({"name": "test", "totalRowCount": 3, "columns": columns, "rows": [{"cells": row}]})
mocker.patch("smartsheet.Smartsheet", Mock(return_value=Mock(Sheets=Mock(get_sheet=Mock(return_value=sheet)))))
source = SourceSmartsheets()
records = [message.record.data for message in source.read(logger=Mock(), config=config, catalog=catalog, state={})]
assert list(records) == [{"id": "11", "first_name": "Leonardo", "last_name": "Dicaprio"}]
31 changes: 15 additions & 16 deletions docs/integrations/sources/smartsheets.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,24 +25,24 @@ The data type mapping adopted by this connector is based on the Smartsheet [docu

**NOTE**: For any column datatypes interpreted by Smartsheets beside `DATE` and `DATETIME`, this connector's source schema generation assumes a `string` type, in which case the `format` field is not required by Airbyte.

| Integration Type | Airbyte Type | Airbyte Format |
| :--- | :--- | :--- |
| `TEXT_NUMBER` | `string` | |
| `DATE` | `string` | `format: date` |
| `DATETIME` | `string` | `format: date-time` |
| `anything else` | `string` | |
| Integration Type | Airbyte Type | Airbyte Format |
|:-----------------|:-------------|:---------------------|
| `TEXT_NUMBER` | `string` | |
| `DATE` | `string` | `format: date` |
| `DATETIME` | `string` | `format: date-time` |
| `anything else` | `string` | |

The remaining column datatypes supported by Smartsheets are more complex types \(e.g. Predecessor, Dropdown List\) and are not supported by this connector beyond its `string` representation.

### Features

This source connector only supports Full Refresh Sync. Since Smartsheets only allows 5000 rows per sheet, it's likely that the Full Refresh Sync Mode will suit the majority of use-cases.

| Feature | Supported? |
| :--- | :--- |
| Full Refresh Sync | Yes |
| Incremental Sync | No |
| Namespaces | No |
| Feature | Supported? |
|:------------------|:-----------|
| Full Refresh Sync | Yes |
| Incremental Sync | No |
| Namespaces | No |

### Performance considerations

Expand Down Expand Up @@ -86,8 +86,7 @@ To setup your new Smartsheets source, Airbyte will need:

## Changelog

| Version | Date | Pull Request | Subject |
|:--------|:-----------| :--- |:--------------------|
| 0.1.8 | 2022-02-04 | [9792](https://github.com/airbytehq/airbyte/pull/9792) | Added oauth support |


| Version | Date | Pull Request | Subject |
|:--------|:-----------|:---------------------------------------------------------|:--------------------------|
| 0.1.9 | 2022-04-12 | [11911](https://github.com/airbytehq/airbyte/pull/11911) | Bugfix: scrambled columns |
| 0.1.8 | 2022-02-04 | [9792](https://github.com/airbytehq/airbyte/pull/9792) | Added oauth support |