
🐛 Source Linkedin Ads: fix changing next_page_token stopping criteria #34166

Conversation

FVidalCarneiro (Contributor) commented Jan 11, 2024

What

This pull request changes the criterion used by the Airbyte LinkedIn Ads source connector to decide when to stop requesting further pages from the LinkedIn API. Currently, the criterion is: stop requesting a new page when the page size is smaller than the maximum allowed records per page. We propose to change this criterion to: stop requesting a new page when the page size is equal to zero.

Fixed #34164

How

This is done by modifying the next_page_token method in both the LinkedinAdsStream class and the LinkedInAdsAnalyticsStream class.
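For illustration, here is a minimal sketch of the proposed criterion. The field and attribute names (elements, paging, self.records_limit) match those quoted later in this thread; the method signature, the stand-in class, and the returned token shape are simplified assumptions rather than a copy of streams.py.

```python
from typing import Any, Mapping, Optional

import requests


class LinkedinAdsStream:  # simplified stand-in for the real stream class in streams.py
    records_limit = 100  # illustrative page size, not the connector's actual value

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        parsed_response = response.json()
        # Old criterion: stop when a page holds fewer records than records_limit.
        # A broken creative missing from an otherwise full page ends pagination too early.
        # Proposed criterion: stop only when the returned page is empty.
        if len(parsed_response.get("elements", [])) == 0:
            return None
        return {"start": parsed_response["paging"]["start"] + self.records_limit}
```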

Recommended reading order

  1. streams.py

🚨 User Impact 🚨

The source connector will behave as previously expected; there are no breaking changes. Because one extra API call is made per account, this might slightly affect source connector performance, but the difference is negligible. The version change should therefore only be a patch, bringing the connector version to 0.6.5.

Pre-merge Actions

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

octavia-squidington-iii added the area/connectors (Connector related issues) label on Jan 11, 2024


github-actions bot commented Jan 11, 2024

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.


CLAassistant commented Jan 11, 2024

CLA assistant check
All committers have signed the CLA.

FVidalCarneiro (Contributor, Author)

This PR solves related issue #34164

marcosmarxm (Member) left a comment


Thanks @FVidalCarneiro, the issue you opened made the change you're proposing much clearer. With this change the connector will, in most cases, always request the N+1 page (a page that doesn't exist). What is the API response in that case? Can you confirm it won't return a 404 error?

Meanwhile, I've asked the connector team to take a look at your change.

FVidalCarneiro (Contributor, Author) commented Jan 19, 2024

Thank you @marcosmarxm for your review.

Indeed, with the suggested implementation, we will always obtain the N+1 page, where N is the last page of the object. As you can see in the screenshot below, the API responds with an empty page and status code 200:

[Screenshot "last-page-ok": the request for the page after the last one returns an empty page with HTTP 200]

This means the current suggested implementation works.

If we wanted to avoid this "useless" call (which is nonetheless required when there is a broken creative; please see the initial description of this PR), another solution would be to add the condition parsed_response.get("paging")["start"] + self.records_limit > parsed_response.get("paging")["total"] to the pagination-ending criteria, making the full condition:
if len(parsed_response.get("elements")) < self.records_limit and (parsed_response.get("paging")["start"] + self.records_limit > parsed_response.get("paging")["total"]):. In this case we would keep the original condition but refine it with a check confirming that N is indeed the last page, since the next page would request more elements than exist in total. We could likewise apply only the condition parsed_response.get("paging")["start"] + self.records_limit > parsed_response.get("paging")["total"], which would keep the current connector behavior while still avoiding the issue we identified.

Even though the currently suggested implementation works, this last suggestion would also protect the connector against a potential future LinkedIn API change in which a non-existing page returns a 404 instead of a 200 HTTP status code.

If you (and/or the connector team) give me your approval, I can add this last suggested implementation (bearing in mind that the downside is that it requires more parsing of the response elements). Do you have a preference between these two solutions?
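A minimal sketch of this second option, combining the original element-count check with the paging-total check quoted above. As in the earlier sketch, the method signature, stand-in class, and token shape are assumptions for illustration, not the connector's exact code.

```python
from typing import Any, Mapping, Optional

import requests


class LinkedinAdsStream:  # simplified stand-in for the real stream class in streams.py
    records_limit = 100  # illustrative page size, not the connector's actual value

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        parsed_response = response.json()
        paging = parsed_response.get("paging", {})
        # A page can hold fewer than records_limit records when a creative is broken
        # (see issue #34164), so the element-count check alone is not a reliable
        # last-page signal; the paging total confirms nothing is left to fetch.
        is_last_page = (
            len(parsed_response.get("elements", [])) < self.records_limit
            and paging["start"] + self.records_limit > paging["total"]
        )
        if is_last_page:
            return None
        return {"start": paging["start"] + self.records_limit}
```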

lazebnyi (Collaborator) left a comment


Hey @FVidalCarneiro,
Thanks for your PR. Let's keep len(parsed_response.get("elements")) < self.records_limit and (parsed_response.get("paging")["start"] + self.records_limit > parsed_response.get("paging")["total"]) as the condition for the last page, and add a comment with a link to the issue explaining why we need the additional check.

FVidalCarneiro (Contributor, Author)

Following the engineering team's feedback (available here), I have complemented the pagination-ending condition with a check that uses the total record count field to confirm there are no further pages to request. This will both:

  1. Allow the connector to handle broken creatives that are missing from paginated responses (a page returning 99 records instead of 100).
  2. Avoid sending a request for an empty page (the last request will be for the last page).

Ready for review @lazebnyi, many thanks!

octavia-squidington-iii added the area/documentation (Improvements or additions to documentation) label on Feb 9, 2024
FVidalCarneiro (Contributor, Author)

Hi @lazebnyi, I tried to follow the provided documentation to grant edit access to maintainers, but I don't think this is possible given that my PR comes from an organization-owned GitHub project. I found this issue, which describes the problem.

What is the recommended course of action? Thanks!

lazebnyi (Collaborator) commented Feb 9, 2024

@FVidalCarneiro Can you pull changes from #35046 to your branch?

FVidalCarneiro (Contributor, Author)

I merged your changes into this branch @lazebnyi, thank you. Let me know if it looks good on your end.

FVidalCarneiro (Contributor, Author)

Hi @lazebnyi, I recently noticed a small merge conflict to resolve in docs/integrations/sources/linkedin-ads.md; it should be good now. Let me know if there are any further actions on my side. Thank you for the support!

FVidalCarneiro requested review from a team as code owners on February 27, 2024 10:12
marcosmarxm (Member)

It was merged in #37421

FVidalCarneiro mentioned this pull request on Jul 8, 2024