Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: Gutendex API [low-code CDK] #18075

Merged
merged 9 commits into from
Oct 20, 2022

Conversation

pabloescoder
Copy link
Collaborator

@pabloescoder pabloescoder commented Oct 17, 2022

What

How

Using the low-code CDK.

Recommended reading order

  1. docs\integrations\sources\gutendex.md
  2. airbyte-integrations\connectors\source-gutendex\source_gutendex\spec.yaml
  3. airbyte-integrations\connectors\source-gutendex\source_gutendex\gutendex.yaml
  4. airbyte-integrations\connectors\source-gutendex\source_gutendex\schemas\results.json

🚨 User Impact 🚨

None. No breaking changes.

Pre-merge Checklist

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret - No secrets/authorization required
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Unit
Integration
Acceptance

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Oct 17, 2022
Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some commments! thanks for the contribution

docs/integrations/sources/gutendex.md Show resolved Hide resolved
Comment on lines +46 to +53
sort:
type: string
description: (Optional) Use this to sort books - ascending for Project Gutenberg ID numbers from lowest to highest, descending for IDs highest to lowest, or popular (the default) for most popular to least popular by number of downloads.
pattern: ^(ascending|descending|popular)$
examples:
- ascending
- descending
- popular
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is a parameter in the configuration and not used to make the connector incremental?

Copy link
Collaborator Author

@pabloescoder pabloescoder Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort is a query parameter that is used to sort the results. I'm not exactly sure how that can be used to make the connector incremental.

After going through the API response, what I noticed is that it only returns a maximum of 32 books per response (even though the total books are more than that) and for the next 32 records, we need to make a request with a query param of page=2 and so on.

So, in order to read data about all the books, I implemented pagination by using a DefaultPaginator to read each and every record, instead of just the first 32 records.

Attaching a video demonstrating the same (the parameters I chose in config.json returned 147 books - I purposely formed a query that returned a small number of books to make the video short - and all the 147 records were read. Similarly, all the records are read even if the query returns 50000+ books - thanks to DefaultPaginator)

Gutendex.Controller.with.Pagination.mp4

@marcosmarxm
Copy link
Member

marcosmarxm commented Oct 20, 2022

/test connector=connectors/source-gutendex

🕑 connectors/source-gutendex https://github.com/airbytehq/airbyte/actions/runs/3292255627
✅ connectors/source-gutendex https://github.com/airbytehq/airbyte/actions/runs/3292255627
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_incremental.py       145     20    86%   21-23, 29-31, 36-43, 48-61, 224
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     10    87%   15-16, 24-30, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-67, 70-72, 75-77, 80-82, 85-87, 90-92, 95-113, 147-149
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1351    451    67%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:60: Skipping TestIncremental.test_two_sequential_reads because not found in the config
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:51: The previous connector image could not be retrieved.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:229: The previous connector image could not be retrieved.
================== 24 passed, 3 skipped in 171.70s (0:02:51) ===================

@marcosmarxm
Copy link
Member

marcosmarxm commented Oct 20, 2022

/publish connector=connectors/source-gutendex

🕑 Publishing the following connectors:
connectors/source-gutendex
https://github.com/airbytehq/airbyte/actions/runs/3292509810


Connector Did it publish? Were definitions generated?
connectors/source-gutendex

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@marcosmarxm
Copy link
Member

@RealChrisSean another one :)

@RealChrisSean
Copy link

WOOO congrats!

jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
* Initial commit for source-gutendex

* Clean up unnecessary code

* solve conflict

* Fix schema, Rename results_stream to books_stream

* Add parameters in gutendex.md, change doc url to official Airbyte url

* Add pagination, read each and every record instead of just 32

* Add bootstrap.md, add badge in builds.md

* add source definition to airbyte-config

* auto-bump connector version

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants