Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: Whisky Hunter API [low-code CDK] #17918

Merged
merged 10 commits into from
Oct 21, 2022

Conversation

jrhizor
Copy link
Contributor

@jrhizor jrhizor commented Oct 12, 2022

What

Creates a source for airbytehq/connector-contest#140 (part of the connector contest)

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@github-actions github-actions bot added the area/connectors Connector related issues label Oct 12, 2022
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Oct 12, 2022
@jrhizor
Copy link
Contributor Author

jrhizor commented Oct 12, 2022

Since there is no config required, for testing you can use {} as the contents of the secrets/config.json file.

@@ -360,7 +360,8 @@ def _validate_records_structure(records: List[AirbyteRecordMessage], configured_
continue
record_fields = set(get_object_structure(record.data))
common_fields = set.intersection(record_fields, schema_pathes)
assert common_fields, f" Record from {record.stream} stream should have some fields mentioned by json schema, {schema_pathes}"

assert common_fields, f" Record {record} from {record.stream} stream with fields {record_fields} should have some fields mentioned by json schema: {schema_pathes}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating this because it's hard to debug without being able to compare both schemas and the contents of the record.

@@ -39,7 +39,8 @@ def _prepare_volumes(self, config: Optional[Mapping], state: Optional[Mapping],
self.input_folder.mkdir(parents=True)
self.output_folder.mkdir(parents=True)

if config:
# using "is not None" to allow falsey config objects like {} to still write
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug in the testing code that prohibited the user of empty or falsey config objects (does this count as related to the Low-Code CDK? 💸 ).

The connector works in the UI and from the CLI regardless, this only hurts testing.

@@ -149,7 +150,7 @@ def read(cls, container: Container, command: str = None, with_ext: bool = True)
raise
if exit_status["StatusCode"]:
error = exit_status["Error"] or exception or line
logging.error(f"Docker container was failed, " f'code {exit_status["StatusCode"]}, error:\n{error}')
logging.error(f"Docker container failed, " f'code {exit_status["StatusCode"]}, error:\n{error}')
Copy link
Contributor Author

@jrhizor jrhizor Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads better this way.

@jrhizor
Copy link
Contributor Author

jrhizor commented Oct 12, 2022

I have some usability feedback for my experience writing this:

(LOW) Failing endpoints give confusing messages

If an endpoint is getting a 404 error (like if you specified the wrong url like I did initially), if you're following the tutorial steps, the first error you reach is:

❯ python main.py check --config secrets/config.json
{"type": "LOG", "log": {"level": "ERROR", "message": "Check failed"}}
{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "'Unable to connect to stream auctions_info - Expecting value: line 4 column 1 (char 3)'"}}

This is because the connection status doesn't check for return codes and instead first fails with the parsing error for the response. Since the wrong URL wasn't JSON-friendly, I get this message, when it could just say that it was a 404.

(LOW) Guide on inferring JSON Schema from endpoints

For something that doesn't publish an OpenAPI spec, it would be useful to point the user at a tool like https://json-schema-inferrer.herokuapp.com/ to quickly summarize the endpoints. I did this and then had to unwrap the top level due to extractors, and it saves a lot of time. (I'm not sure if something like this is in the other docs, but it would probably make sense to show that as part of the exchangerates nocode tutorial).

(HIGH - but fixed in this PR) No such file or directory errors for '/data/tap_config.json' were hard to debug

I fixed this in my current PR (it turned out to be a testing harness bug around empty/falsey dictionaries for connector configs. It's weird to get something like this for the easiest to implement connectors without settings or auth:

ERROR    root:connector_runner.py:152 Docker container was failed, code 1, error:
{"type": "TRACE", "trace": {"type": "ERROR", "emitted_at": 1665606910365.525, "error": {"message": "Something went wrong in the connector. See the logs for more details.", "internal_message": "[Errno 2] No such file or directory: '/data/tap_config.json'", "stack_trace": "Traceback (most recent call last):\n  File \"/airbyte/integration_code/main.py\", line 13, in <module>\n    launch(source, sys.argv[1:])\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\", line 123, in launch\n    for message in source_entrypoint.run(parsed_args):\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\", line 84, in run\n    raw_config = self.source.read_config(parsed_args.config)\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/connector.py\", line 51, in read_config\n    with open(config_path, \"r\") as file:\nFileNotFoundError: [Errno 2] No such file or directory: '/data/tap_config.json'\n", "failure_type": "system_error"}}}

When searching for this it looks like #15880 ran into a problem with similar outputs (I'm not sure if it's 100% the same underlying problem).

(MEDIUM - but fixed in this PR) schema error messages are hard to read

When debugging schema errors, it would be really helpful if there was some output to compare. "this schema" failed against "this record".

Instead of just getting

E           AssertionError:  Record from auctions_data stream should have some fields mentioned by json schema, {'/[]/auction_lots_count', '/[]/auction_name', '/[]/winning_bid_max', '/[]/all_auctions_lots_count', '/[]/winning_bid_mean', '/[]/auction_trading_volume', '/[]/dt', '/[]/winning_bid_min', '/[]/auction_slug'}
E           assert set()

and then having to find the problem yourself.

After adding the comments in the PR it has output like this instead:

E           AssertionError:  Record namespace=None stream='auctions_data' data={'dt': '2022-09-01', 'winning_bid_max': 12392.4, 'winning_bid_min': 10.0, 'winning_bid_mean': 301.02, 'auction_trading_volume': 155930.6, 'auction_lots_count': 518, 'all_auctions_lots_count': 38194, 'auction_name': 'Australian Whisky Auctions', 'auction_slug': 'australian-whisky-auctions'} emitted_at=1665610911626 from auctions_data stream with fields {'/winning_bid_max', '/all_auctions_lots_count', '/winning_bid_min', '/auction_lots_count', '/dt', '/auction_name', '/auction_trading_volume', '/auction_slug', '/winning_bid_mean'} should have some fields mentioned by json schema: {'/[]/winning_bid_max', '/[]/auction_slug', '/[]/winning_bid_min', '/[]/winning_bid_mean', '/[]/auction_name', '/[]/all_auctions_lots_count', '/[]/auction_lots_count', '/[]/dt', '/[]/auction_trading_volume'}

which is much more helpful.

(MEDIUM) Easy to forget that you need to rebuild the docker image for some operations

When most of the testing you're doing is using python commands like python main.py discover --config secrets/config.json, it's hard to remember as a new or returning user that python -m pytest integration_tests -p integration_tests.acceptance requires the Docker image to be rebuilt. Maybe this build should run automatically when this happens or show a big warning that you have changes that haven't been included yet?

(LOW) Some logged error messages are confusing

For example:

 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestBasicRead.test_read[inputs0] ✓                                         92% █████████▎{"type": "LOG", "log": {"level": "ERROR", "message": "Docker container failed, code 1, error:\n{\"type\": \"TRACE\", \"trace\": {\"type\": \"ERROR\", \"emitted_at\": 1665611155019.032, \"error\": {\"message\": \"Something went wrong in the connector. See the logs for more details.\", \"internal_message\": \"2 validation errors for ConfiguredAirbyteCatalog\\nstreams -> 0 -> sync_mode\\n  value is not a valid enumeration member; permitted: 'full_refresh', 'incremental' (type=type_error.enum; enum_values=[<SyncMode.full_refresh: 'full_refresh'>, <SyncMode.incremental: 'incremental'>])\\nstreams -> 0 -> destination_sync_mode\\n  value is not a valid enumeration member; permitted: 'append', 'overwrite', 'append_dedup' (type=type_error.enum; enum_values=[<DestinationSyncMode.append: 'append'>, <DestinationSyncMode.overwrite: 'overwrite'>, <DestinationSyncMode.append_dedup: 'append_dedup'>])\", \"stack_trace\": \"Traceback (most recent call last):\\n  File \\\"/airbyte/integration_code/main.py\\\", line 13, in <module>\\n    launch(source, sys.argv[1:])\\n  File \\\"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\\\", line 123, in launch\\n    for message in source_entrypoint.run(parsed_args):\\n  File \\\"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\\\", line 111, in run\\n    config_catalog = self.source.read_catalog(parsed_args.catalog)\\n  File \\\"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/source.py\\\", line 90, in read_catalog\\n    return ConfiguredAirbyteCatalog.parse_obj(self.read_config(catalog_path))\\n  File \\\"/usr/local/lib/python3.9/site-packages/pydantic/main.py\\\", line 521, in parse_obj\\n    return cls(**obj)\\n  File \\\"/usr/local/lib/python3.9/site-packages/pydantic/main.py\\\", line 341, in __init__\\n    raise validation_error\\npydantic.error_wrappers.ValidationError: 2 validation errors for ConfiguredAirbyteCatalog\\nstreams -> 0 -> sync_mode\\n  value is not a valid enumeration member; permitted: 'full_refresh', 'incremental' (type=type_error.enum; enum_values=[<SyncMode.full_refresh: 'full_refresh'>, <SyncMode.incremental: 'incremental'>])\\nstreams -> 0 -> destination_sync_mode\\n  value is not a valid enumeration member; permitted: 'append', 'overwrite', 'append_dedup' (type=type_error.enum; enum_values=[<DestinationSyncMode.append: 'append'>, <DestinationSyncMode.overwrite: 'overwrite'>, <DestinationSyncMode.append_dedup: 'append_dedup'>])\\n\", \"failure_type\": \"system_error\"}}}\n"}}

This shows an error, but the test itself passes and the actual connector appears to function correctly in the UI. Should this error message be hidden? Similarly, I get messages like:

Pulling docker image airbyte/source-whisky-hunter:latest
{"type": "LOG", "log": {"level": "WARN", "message": "\n We did not find the airbyte/source-whisky-hunter:latest image for this connector. This probably means this version has not yet been published to an accessible docker registry like DockerHub."}}

when running the acceptance tests. Why is it trying to pull latest instead of dev at all? The tests end up working and using dev properly, but it's confusing to show something like this when the tutorials show dev-tagged info only.

(LOW) lack of generation

Why don't the doc pages automatically get generated with TODOs? Why don't the indices / build pages have entries automatically added? Why are some instructions for doc generations like the link in "Connector's bootstrap.md. See description and examples" broken?

It feels like the checklist could be decreased significantly or at least easier to look for TODOs in the PR to resolve.


Hope this is helpful.

@sajarin sajarin added new-connector bounty-XL Maintainer program: claimable extra large bounty PR labels Oct 13, 2022
@sajarin sajarin added internal and removed bounty bounty-XL Maintainer program: claimable extra large bounty PR labels Oct 13, 2022
@marcosmarxm
Copy link
Member

Thanks for the contribution @jrhizor! 💯 the connector implementation looks good I'll test and release later. I also asked someone to take a look in your feedback.

@marcosmarxm marcosmarxm changed the title 🎉 New Source: Whisky Hunter API 🎉 New Source: Whisky Hunter API [low-code CDK] Oct 13, 2022
@jrhizor
Copy link
Contributor Author

jrhizor commented Oct 14, 2022

Hey @sajarin I'm not internal💰

@marcosmarxm marcosmarxm self-assigned this Oct 17, 2022
Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry the delay @jrhizor I'll test/publish this connector tomorrow. Thanks for the contribution. Also I'll split your feedback into issues.

@marcosmarxm
Copy link
Member

marcosmarxm commented Oct 19, 2022

/test connector=connectors/source-whisky-hunter

🕑 connectors/source-whisky-hunter https://github.com/airbytehq/airbyte/actions/runs/3285250117
❌ connectors/source-whisky-hunter https://github.com/airbytehq/airbyte/actions/runs/3285250117
🐛 https://gradle.com/s/pww4fuwgdaric

Build Failed

Test summary info:

	 =========================== short test summary info ============================
	 FAILED unit_tests/test_core.py::test_read[schema2-record2-True] - AssertionEr...
	 FAILED unit_tests/test_core.py::test_read[schema4-record4-True] - AssertionEr...
	 
	 Results (258.59s):
	 �[32m     339 passed�[0m
	 �[31m       2 failed�[0m
	          - �[36m�[0mtest_core.py�[0m:238 �[31mtest_read[schema2-record2-True]�[0m
	          - �[36m�[0mtest_core.py�[0m:238 �[31mtest_read[schema4-record4-True]�[0m
	 /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/source-acceptance-test/.venv/lib/python3.9/site-packages/coverage/control.py:788: CoverageWarning: No data was collected. (no-data-collected)
	   self._warn("No data was collected.", slug="no-data-collected")

> Task :airbyte-integrations:bases:source-acceptance-test:_unitTestCoverage FAILED

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings

Execution optimizations have been disabled for 1 invalid unit(s) of work during this build to ensure correctness.
Please consult deprecation warnings for more details.
42 actionable tasks: 29 executed, 13 up-to-date

Publishing build scan...
https://gradle.com/s/pww4fuwgdaric

@marcosmarxm
Copy link
Member

marcosmarxm commented Oct 19, 2022

/test connector=bases/source-acceptance-test

🕑 bases/source-acceptance-test https://github.com/airbytehq/airbyte/actions/runs/3285271461
✅ bases/source-acceptance-test https://github.com/airbytehq/airbyte/actions/runs/3285271461
No Python unittests run

Build Passed

Test summary info:

All Passed

@marcosmarxm
Copy link
Member

marcosmarxm commented Oct 21, 2022

/test connector=connectors/source-whisky-hunter

🕑 connectors/source-whisky-hunter https://github.com/airbytehq/airbyte/actions/runs/3294066804
✅ connectors/source-whisky-hunter https://github.com/airbytehq/airbyte/actions/runs/3294066804
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 377-379, 382, 442-451, 480-481, 487, 490, 523-533, 546-571, 576-580
	 source_acceptance_test/tests/test_incremental.py       145     20    86%   21-23, 29-31, 36-43, 48-61, 224
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     10    87%   15-16, 24-30, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-68, 71-73, 76-78, 81-83, 86-88, 91-93, 96-114, 148-150
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1351    451    67%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:60: Skipping TestIncremental.test_two_sequential_reads because not found in the config
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:51: The previous connector image could not be retrieved.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:229: The previous connector image could not be retrieved.
======================== 23 passed, 3 skipped in 35.80s ========================

@marcosmarxm
Copy link
Member

marcosmarxm commented Oct 21, 2022

/publish connector=connectors/source-whisky-hunter

🕑 Publishing the following connectors:
connectors/source-whisky-hunter
https://github.com/airbytehq/airbyte/actions/runs/3294118896


Connector Did it publish? Were definitions generated?
connectors/source-whisky-hunter

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@marcosmarxm marcosmarxm merged commit e3ff75f into airbytehq:master Oct 21, 2022
@marcosmarxm
Copy link
Member

@RealChrisSean a new hacktober contribution

@RealChrisSean
Copy link

WOOOO!

jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
* initial commit for source-whisky-hunter

* better logs

* finish up connector

* fix catalog

* fix bug in connector_runner.py that doesn't allow falsey configs

* add bootstrap.md

* add docs

* fix typo

* fix source-accept test

* auto-bump connector version

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
@girarda girarda mentioned this pull request Nov 21, 2022
49 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants