Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Source Salesforce: add optional parameters to filter properties within streams #9931

Closed
wants to merge 6 commits into from

Conversation

jkaelin
Copy link

@jkaelin jkaelin commented Jan 31, 2022

What

Allows a user to exclude specific fields and/or data-types from the stream.

Current Issues

There are at least a few issues directly requesting the generic ability to exclude specific fields from streams:

I acknowledge that this PR is specific to the Salesforce Source and does not fully address the above issues.

General Use Cases

Some use cases that may be faciliated by this PR:

  • Exclude sensitive fields or data types from streams in order to comply with legal/privacy regulations
  • Exclude fields or data types which may prohibit the stream from syncing successfully
  • Exclude datatypes which may invoke the fallback to the non-Bulk API (e.g. base64/object)

Specific Use Case

The immediate use case being solved for with this implementation is the ability to exclude certain calculated fields within Salesforce objects that may generate exceptions on evaluation. Although these exceptions are effectively masked in the UI, they will cause the Salesforce Bulk API to fail with only a generic error message.

How

  • Add two optional string array parameters to the spec
  • Inject evaluation of these user exclusions against the fields returned by generate_schema

Recommended reading order

  1. spec.json
    • Added two optional parameters exclude_fields & exclude_types
  2. source.py
    • get_user_excluded_fields & get_user_excluded_types
      • Added two helper functions to evaluate the above parameters
    • generate_streams
      • Pass parameters on to api.py:generate_schema
  3. api.py
    • generate_schema
      • add optional parameters for user exclusions
      • exclude fields from schema based on optional parameters
  4. unit_test.py
    • test_bulk_sync_pagination
      • No functional changes
      • Forced to add the #noqa E203 directive due to conflict between black & flake8 on line 143
    • test_schema_with_user_excluded_fields
      • parameterized unit test for named fields
    • test_schema_with_user_excluded_types
      • parameterized unit test for data types

User Impact

  • Users will see two new optional parameters on the New Source page
    • Fields to Exclude
      msedge_EY1gpJuSHa
    • Data Types to Exclude
      msedge_zwrrmF7FuN
  • Sync logging will explicitly list the fields which are being skipped

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions) -- will do
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@CLAassistant
Copy link

CLAassistant commented Jan 31, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Jan 31, 2022
@harshithmullapudi
Copy link
Contributor

Hey, thanks for the contribution can you sourceAcceptanceTests screenshot?

@jkaelin
Copy link
Author

jkaelin commented Feb 4, 2022

@harshithmullapudi
Is this what you need for the sourceAcceptanceTests?
image

@jkaelin
Copy link
Author

jkaelin commented Feb 4, 2022

Question for maintainers
The pre-commit hooks often formats/flags code that I didn't write.
Is it correct behavior to submit these changes with the PR?
I had to use the noqa directive due to conflicting black/flake behavior.
Is that the correct way to resolve this type of issue?

@marcosmarxm
Copy link
Member

There is some conflicts because other code was merged, can you solved it @jkaelin ?

@jkaelin
Copy link
Author

jkaelin commented Feb 12, 2022

@marcosmarxm made the changes you requested & rebased to latest.

@jkaelin
Copy link
Author

jkaelin commented Feb 15, 2022

@marcosmarxm anything else you need me to address here?

@marcosmarxm
Copy link
Member

I'll test your code tomorrow @jkaelin !

@alafanechere alafanechere self-assigned this Feb 24, 2022
@alafanechere alafanechere mentioned this pull request Feb 24, 2022
@alafanechere alafanechere temporarily deployed to more-secrets February 24, 2022 14:21 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets February 24, 2022 15:16 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets February 24, 2022 15:16 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets February 24, 2022 15:16 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets February 24, 2022 15:17 Inactive
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution! I slightly changed the custom exception you tried to raise, which broke the test.
@misteryeo / @sherifnada, please give this a quick review as the spec.json was updated: two new optional fields.

try:
exclude_stream_name, exclude_field_name = stream_field.split(".")
except ValueError:
raise f"Format for excluded field [{stream_field}] is incorrect. Expected format is Stream.Field"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise f"Format for excluded field [{stream_field}] is incorrect. Expected format is Stream.Field"
raise Exception(f"Format for excluded field [{stream_field}] is incorrect. Expected format is Stream.Field")

This leads to TypeError: exceptions must derive from BaseException, please raise a valid Exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change in 8092db2

Comment on lines 41 to 42
if config.get("exclude_fields") and stream_name is not None:
for stream_field in config["exclude_fields"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if config.get("exclude_fields") and stream_name is not None:
for stream_field in config["exclude_fields"]:
if stream_name:
for stream_field in config.get("exclude_fields",[]):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change in 8092db2

Copy link
Contributor

@misteryeo misteryeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@misteryeo
Copy link
Contributor

Don't think this is blocking but I'm curious why the implementation here is just a field and not the variable input that we currently have for "Streams Filter Criteria" https://share.getcloudapp.com/P8u6njow

@jkaelin
Copy link
Author

jkaelin commented Feb 28, 2022

@misteryeo - I just wanted to address your question directly based on my experience as a user. I did implement this feature using the compound-type input initially but the management was overly cumbersome. The main issue I have with that style input is the lack of a meaningful preview (e.g. seeing 0 as the label).

Having three of these compound-type inputs takes up a lot of screen real estate to effectively show a count to the user. If it offered some type of auto-complete or dynamic drop-down schema discovery then I'd be all over it.

I started working on a spec extension to drive some ability to define how labels should be rendered. But when I saw this array implementation, it looked like an ideal UX for these parameters. The pattern validation attribute ensures the input matches the expected format which should offer the same benefits that the compound-style input would provide.

Here's an example from one of my sources to illustrate the usefulness and accessibility of these fields:
image

@misteryeo
Copy link
Contributor

Thanks, appreciate the insight @jkaelin - we'll definitely look to improve the way the variable input module works. cc: @timroes

@alafanechere
Copy link
Contributor

Hey @jkaelin an other PR was merged on this connector, leading to multiple conflicts, do you mind fixing these?

@alafanechere
Copy link
Contributor

Hi @jkaelin, we closed this PR as we did not receive updates following our latest interactions. You are welcome to re-open it as soon as you want. We’ll be happy to give additional reviews. Do not hesitate to reach our team on Slack if you need support finishing this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants