Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: Newsdata [low-code CDK] #18576

Merged
merged 13 commits into from Nov 10, 2022
Merged

🎉 New Source: Newsdata [low-code CDK] #18576

merged 13 commits into from Nov 10, 2022

Conversation

Xabilahu
Copy link
Contributor

What

New Source: Newsdata API. newsdata.io

Screenshot from 2022-10-28 01-06-36

Notes

I'm using a free API key, click here to get one for CI integration tests.

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Screenshot from 2022-10-28 01-13-59

@Xabilahu
Copy link
Contributor Author

secrets/config.json being used for integ tests is:

{
  "api_key": "XXX",
  "country": ["fr"],
  "category": ["health"],
  "language": ["en"]
}

@marcosmarxm
Copy link
Member

Thanks for the contribution @Xabilahu !

Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

Comment on lines 17 to 18
paginator:
type: NoPagination
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xabilahu can you check the pagination for this connector?
image

type: "CursorPagination"
cursor_value: "{{ response['nextPage'] }}"
page_size: 10
# page_size: !!int "{% if config['is_free_tier'] %}{{ 10 }}{% else %}{{ 50 }}{% endif %}"
Copy link
Contributor Author

@Xabilahu Xabilahu Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API sets the page_size dependent on the subscription you have.

  • Free tier gets page_size = 10
  • Any paid tier gets page_size = 50

The jinja interpolation works, but in this case, as it interpolates after !!int is executed, it fails to convert the string to an int. We need to cast it to int, as it is the type of page_size property. Any hint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda any advice here?

Copy link
Contributor

@girarda girarda Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest setting the page_size in the spec instead of the is_free_tier field. I'll create an issue for the pagination strategy to handle dynamic page_size

issue: https://github.com/airbytehq/airbyte/issues/18783

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda, but wouldn't we still need string interpolation if handled in the spec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right. There's no way to dynamically set the page size today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, so for now I will remove is_free_tier from spec.yaml, and set the page_size = 10, so that this connector works for both, free and paid tiers.

We will leave the dynamic page_size for a follow up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xabilahu agree.

Comment on lines +7 to +13
base_requester:
url_base: "https://newsdata.io/api/1"
http_method: "GET"
authenticator:
type: ApiKeyAuthenticator
header: "X-ACCESS-KEY"
api_token: "{{ config['api_key'] }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the API documentation:

To obtain an API key you have to create an account. You get a free API key while you are in development. When making REST API requests, your API key has to be provided as a parameter in the URL (e.g. apikey=YOUR_API_KEY).

I tried to read data but no records were read.

{"type": "LOG", "log": {"level": "INFO", "message": "Read 0 records from latest stream"}}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the authentication.

Copy link
Contributor Author

@Xabilahu Xabilahu Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the request would not have been authenticated, you would have gotten a 401 - Unauthorized response.

From Newsdata API documentation:

Your Newsdata.io API Key. Alternatively you can provide this via the X-ACCESS-KEY HTTP header.

Could you share your config.json?

Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xabilahu reading the API documentation I saw one endpoint News Archive where you can read all news from a specific date. This can be really valuable to users and you can implement a Incremental sync.

@Xabilahu
Copy link
Contributor Author

Xabilahu commented Nov 2, 2022

@Xabilahu reading the API documentation I saw one endpoint News Archive where you can read all news from a specific date. This can be really valuable to users and you can implement a Incremental sync.

After taking a look, I remembered why I didn't implement support for that endpoint. It is because it is only accessible to paid tiers, which I don't have access to, and I wouldn't be able to test my implementation :( Same happens for the Crypto News endpoint...

Should I implement them anyways?

@marcosmarxm
Copy link
Member

Not a problem!

Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Xabilahu I'll publish later this week.

@marcosmarxm
Copy link
Member

marcosmarxm commented Nov 4, 2022

Hello! Your PR is approved but didn't have the time to publish and merge it this week. As you can check in Chris' comment all PRs submitted before 2-nov are eligible to win the prize. I'll be out of the office on Friday and return Monday to start publishing your contribution. Any question you can send a message in #hacktoberfest-2022 channel in Airbyte's Slack.

Have a good weekend and thank you for this amazing contribution for Hacktoberfest 🎉

@Xabilahu
Copy link
Contributor Author

Xabilahu commented Nov 4, 2022

Hello! Your PR is approved but didn't have the time to publish and merge it this week. As you can check in Chris' comment all PRs submitted before 2-nov are eligible to win the prize. I'll be out of the office on Friday and return Monday to start publish it. Any question you can send a message in #hacktoberfest-2022 in Airbyte's Slack.

Have a good weekend and thank you for this amazing contribution for Hacktoberfest tada

More than happy to contribute ❤️ Have a nice weekend! 🥳

@marcosmarxm
Copy link
Member

marcosmarxm commented Nov 10, 2022

/test connector=connectors/source-newsdata

🕑 connectors/source-newsdata https://github.com/airbytehq/airbyte/actions/runs/3439969771
✅ connectors/source-newsdata https://github.com/airbytehq/airbyte/actions/runs/3439969771
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       139      5    96%   87, 93, 235, 239-240
	 source_acceptance_test/conftest.py                     196     92    53%   35, 41-43, 48, 54, 60, 66, 72-74, 93, 98-100, 106-108, 114-115, 120-121, 126, 132, 141-150, 156-161, 176, 200, 231, 237, 243-248, 256-261, 269-282, 287-293, 300-311, 318-334
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              345    110    68%   53, 64-72, 77-84, 88-89, 93-94, 178, 216-233, 242-250, 254-259, 265, 298-303, 341-348, 391-393, 396, 461-469, 481-484, 489, 545-546, 552, 555, 591-601, 614-639
	 source_acceptance_test/tests/test_incremental.py       158     14    91%   52-59, 64-77, 240
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-68, 71-73, 76-78, 81-83, 86-88, 91-93, 96-114, 148-150
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1508    348    77%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: This connector does not implement incremental sync
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:65: The previous connector image could not be retrieved.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:243: The previous connector image could not be retrieved.
======================== 24 passed, 3 skipped in 24.65s ========================

@marcosmarxm
Copy link
Member

marcosmarxm commented Nov 10, 2022

/publish connector=connectors/source-newsdata

🕑 Publishing the following connectors:
connectors/source-newsdata
https://github.com/airbytehq/airbyte/actions/runs/3440066583


Connector Did it publish? Were definitions generated?
connectors/source-newsdata

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@marcosmarxm marcosmarxm merged commit c2cd11d into airbytehq:master Nov 10, 2022
@Xabilahu Xabilahu deleted the newsdata branch November 10, 2022 20:29
@Xabilahu
Copy link
Contributor Author

Thanks for all the help @marcosmarxm 🥳😁

akashkulk pushed a commit that referenced this pull request Dec 2, 2022
* Initial Newsdata source connector implementation

* Update docs, source_definitions and upload icon

* Set single input for `sources` stream

* Remove unused definition

* Update `spec.yaml` config

* Support Pagination

* Remove `is_free_tier`, because as of now, dynamic `page_size` is not supported

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

6 participants