Sentry: Obfuscation and dealing with sensitive data #7958

Closed
Tracked by #7956
keu opened this issue Nov 14, 2021 · 4 comments · Fixed by #8248
Labels
area/connectors Connector related issues area/reliability type/enhancement New feature or request

Comments

@keu
Contributor

keu commented Nov 14, 2021

Here we explore all the options Sentry provides for obfuscation and for dealing with sensitive data.

@keu keu added the type/enhancement New feature or request label Nov 14, 2021
@keu keu mentioned this issue Nov 14, 2021
@sherifnada sherifnada added area/connectors Connector related issues area/reliability labels Nov 15, 2021
@sherifnada sherifnada added this to the Connectors Nov 26 2021 milestone Nov 15, 2021
@avida avida self-assigned this Nov 24, 2021
@avida
Contributor

avida commented Nov 24, 2021

PII Data filtering

This report covers an investigation of PII data sent to Sentry.

Sentry could expose PII data in the following places:

  • Data set directly by the application via context, tags, or log messages
  • Data captured by default integrations
  • Frame variables captured in the stack trace of an unhandled exception
  • A transaction's span name, description, or context

Sensitive information can be handled on either of two sides:

  • Client side: scrubbing data before it is sent to the sentry.io server
  • Server side: scrubbing data after it has been sent to sentry.io

There is also the option of sending data to a middle layer before it is sent to Sentry.

Client-side scrubbing

Using hooks

Client-side data scrubbing can be implemented by passing before_send and before_breadcrumb hooks to the init method:

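import sentry_sdk

def before_send(event, hint):
    # "sensitive-info" is a placeholder for whatever field must not leave the process
    event.pop("sensitive-info", None)
    return event  # returning None instead would drop the whole event

def before_breadcrumb(crumb, hint):
    # Breadcrumbs are plain dicts as well; returning None drops the breadcrumb
    crumb.pop("sensitive-info", None)
    return crumb

sentry_sdk.init(before_send=before_send,
                before_breadcrumb=before_breadcrumb)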

The before_send hook is called before an event is sent to the Sentry server, so PII filtering can be applied to the event first. An event can be a message, a handled or unhandled exception, or transaction data.
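As a more concrete sketch of a before_send hook, the function below masks local variables captured in stack trace frames (the surveymonkey case in the Example section). It assumes the usual Sentry event layout exception.values[].stacktrace.frames[].vars and a hypothetical, hand-picked list of name fragments; it is not the hook from the Airbyte PR.

```python
from typing import Any, Dict, Optional

# Hypothetical name fragments treated as secrets; extend per connector as needed.
SENSITIVE_NAME_PARTS = ("secret", "token", "password", "api_key", "apikey")


def is_sensitive(name: str) -> bool:
    """Return True if a variable name contains one of the sensitive fragments."""
    lowered = name.lower()
    return any(part in lowered for part in SENSITIVE_NAME_PARTS)


def before_send(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Mask sensitive local variables captured in stack trace frames."""
    for exc in (event.get("exception") or {}).get("values") or []:
        for frame in (exc.get("stacktrace") or {}).get("frames") or []:
            frame_vars = frame.get("vars") or {}
            for name in list(frame_vars):
                if is_sensitive(name):
                    frame_vars[name] = "[Filtered]"
    return event
```

It would be registered with sentry_sdk.init(before_send=before_send). Matching on name fragments is coarse: it can miss secrets stored under innocuous names and can mask harmless variables.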

before_breadcrumb is called before a breadcrumb is added to the scope. A breadcrumb is a record of an earlier event that is sent along with an exception to give a clue about what happened before it.
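A before_breadcrumb hook could, for example, strip secrets from the query strings of recorded HTTP requests. This sketch assumes the HTTP integration stores the request URL under crumb["data"]["url"] and uses a hypothetical set of parameter names; it rewrites the URL and keeps the breadcrumb.

```python
from typing import Any, Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters whose values should never reach Sentry.
SENSITIVE_PARAMS = {"api_key", "apikey", "token", "access_token", "email"}


def scrub_url(url: str) -> str:
    """Replace the values of sensitive query parameters with a placeholder."""
    parts = urlsplit(url)
    query = [
        (key, "[Filtered]" if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def before_breadcrumb(crumb: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Scrub the URL of HTTP breadcrumbs; other breadcrumbs pass through unchanged."""
    data = crumb.get("data")
    if isinstance(data, dict) and isinstance(data.get("url"), str):
        data["url"] = scrub_url(data["url"])
    return crumb
```

Note that urlencode re-encodes the whole query string, so the placeholder appears percent-encoded (%5BFiltered%5D) in the scrubbed URL.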

Disabling default integrations

Some events (such as HTTP requests or log records) can be sent automatically by default integrations. We can disable or override default integrations to minimize the data sent to Sentry.
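A minimal sketch of the corresponding init options, assuming the sentry_sdk option names default_integrations, integrations and send_default_pii; build_sentry_options is a hypothetical helper that keeps the options inspectable in a test.

```python
from typing import Any, Callable, Dict, Optional


def build_sentry_options(
    dsn: Optional[str], before_send: Optional[Callable] = None
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for sentry_sdk.init() with minimal automatic collection."""
    return {
        "dsn": dsn,
        # Do not install the integrations the SDK would otherwise enable automatically;
        # any integration that is still wanted must be listed under "integrations".
        "default_integrations": False,
        "integrations": [],
        # Already the SDK default; stated explicitly so the intent is visible.
        "send_default_pii": False,
        "before_send": before_send,
    }
```

Usage would look like sentry_sdk.init(**build_sentry_options(dsn)). Turning off all default integrations also turns off automatic capture of unhandled exceptions, so the integrations that are still needed would have to be re-added explicitly.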

Server-side scrubbing

By default, the Sentry server applies server-side scrubbing based on regular expressions and on field names that suggest secrets (password, key, access_token, etc.). See the Example section for details.
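To preview locally what name-based scrubbing would catch, a rough approximation can be run on an event payload before anything is sent. The regular expression below is only loosely modeled on the field names listed above and is not Sentry's actual server-side rule set; "[Filtered]" is chosen to resemble the placeholder Sentry shows for scrubbed values.

```python
import re
from typing import Any

# Rough approximation loosely modeled on the field names above; NOT Sentry's real rule set.
SENSITIVE_KEY = re.compile(r"passw(or)?d|secret|token|api_?key|auth|credential", re.IGNORECASE)


def scrub_by_key(value: Any) -> Any:
    """Return a copy of value with the values under sensitive-looking keys replaced."""
    if isinstance(value, dict):
        return {
            key: "[Filtered]" if isinstance(key, str) and SENSITIVE_KEY.search(key) else scrub_by_key(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub_by_key(item) for item in value]
    return value
```

Key-based rules only look at field names, so a secret stored under a neutral name (or embedded in a URL string) passes through, which is exactly what the iterable and pipedrive examples show.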

More details on default server-side scrubbing: https://docs.sentry.io/product/data-management-settings/scrubbing/server-side-scrubbing/

Server-side scrubbing can also be tuned with custom regular expressions and with rules that filter out emails, PEM keys, IP addresses, SSNs, and so on (details: https://docs.sentry.io/product/data-management-settings/scrubbing/advanced-datascrubbing/).
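Pattern-based scrubbing works on values rather than keys. The sketch below applies two illustrative regular expressions (emails and IPv4 addresses) recursively to a payload; the patterns are simplified assumptions, not the rules Sentry's advanced data scrubbing uses, and could equally run client-side.

```python
import re
from typing import Any

# Illustrative, simplified patterns; Sentry's advanced data scrubbing has its own built-in rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scrub_values(value: Any) -> Any:
    """Replace pattern matches in every string of a nested payload with a [name] marker."""
    if isinstance(value, str):
        for name, pattern in PATTERNS.items():
            value = pattern.sub(f"[{name}]", value)
        return value
    if isinstance(value, dict):
        return {key: scrub_values(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub_values(item) for item in value]
    return value
```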

Example

Here are some examples of server-side scrubbing from running Sentry on existing connectors (no client-side filtering was applied):

  • asana-headers-pii: the Authorization header is scrubbed from context data (asana source)
  • iterable-integration-pii: the URL is filtered from an integration's breadcrumb and context (iterable source)
  • iterable-no-pii: the API key inside the URL of a transaction event is not filtered (iterable source)
  • pipedrive-sampling-no-pii: the API key and email inside the URL of a transaction event are not filtered (pipedrive source)
  • surveymonkey-config-pii: client_secret and access_token are filtered out of stack trace frame variables (surveymonkey source)
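The iterable and pipedrive cases show that secrets embedded in URLs inside transaction data are not caught by the default server-side rules, so they would need client-side handling. Below is a hypothetical sketch that masks name=value pairs in the transaction name and span descriptions, assuming the event layout event["spans"][i]["description"]. How it is wired in depends on the SDK version (newer sentry_sdk releases offer a separate before_send_transaction option); that wiring is left as an assumption.

```python
import re
from typing import Any, Dict

# Hypothetical parameter names; matches name=value pairs inside free text such as span descriptions.
QUERY_SECRET = re.compile(r"(?i)\b(api_key|apikey|token|email)=[^&\s]+")


def scrub_text(text: str) -> str:
    """Keep the parameter name but replace its value with a placeholder."""
    return QUERY_SECRET.sub(lambda m: f"{m.group(1)}=[Filtered]", text)


def scrub_transaction(event: Dict[str, Any]) -> Dict[str, Any]:
    """Scrub the transaction name and every span description in place."""
    if isinstance(event.get("transaction"), str):
        event["transaction"] = scrub_text(event["transaction"])
    for span in event.get("spans") or []:
        if isinstance(span.get("description"), str):
            span["description"] = scrub_text(span["description"])
    return event
```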

@sherifnada
Contributor

sherifnada commented Dec 3, 2021

@avida I see that Sentry does server side filtering, but in my opinion this is not sufficiently safe. As an Airbyte user, I would be pretty shocked if Airbyte sent my API keys to a 3rd party and let them handle the filtering. I would require that secrets are not sent at all. What are our options for implementing that? It seems like it might be difficult since each source would have to implement this in a custom way?

Could the next step here be to demo this for one of the connectors you showed in the example, e.g. Iterable?

How could we be 100% certain that the filtering is working correctly?

@avida
Contributor

avida commented Dec 10, 2021

@sherifnada I've updated the Sentry PR with client-side sensitive data scrubbing. It is implemented at the CDK level and works for every connector.
We can do a demo on today's sync call.
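The PR itself is not reproduced here. As a hypothetical sketch of how connector-independent scrubbing could work at the CDK level, the code below assumes secrets are identified through the "airbyte_secret": true annotation in a connector's spec, collects the matching values from the running config, and masks those exact values anywhere in an event. secret_paths, secret_values and mask_secrets are illustrative names, not CDK functions.

```python
from typing import Any, Dict, Iterator, List, Tuple

Path = Tuple[str, ...]


def secret_paths(schema: Dict[str, Any], prefix: Path = ()) -> Iterator[Path]:
    """Yield config paths of properties marked "airbyte_secret": true in a spec schema."""
    for name, prop in (schema.get("properties") or {}).items():
        path = prefix + (name,)
        if prop.get("airbyte_secret"):
            yield path
        yield from secret_paths(prop, path)
    # oneOf branches (e.g. alternative auth methods) describe the same config level.
    for branch in schema.get("oneOf") or []:
        yield from secret_paths(branch, prefix)


def secret_values(schema: Dict[str, Any], config: Dict[str, Any]) -> List[str]:
    """Collect the non-empty string values found at the secret paths of a config."""
    values = []
    for path in secret_paths(schema):
        node: Any = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:
            if isinstance(node, str) and node:
                values.append(node)
    return values


def mask_secrets(value: Any, secrets: List[str]) -> Any:
    """Replace every occurrence of a secret value anywhere in a nested payload."""
    if isinstance(value, str):
        # Longest first, so a secret that contains another one is masked whole.
        for secret in sorted(secrets, key=len, reverse=True):
            value = value.replace(secret, "****")
        return value
    if isinstance(value, dict):
        return {key: mask_secrets(item, secrets) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_secrets(item, secrets) for item in value]
    return value
```

Matching on the actual secret values, rather than on field names, also catches a key that ends up inside a URL or an error message. The trade-off is that very short secret values could mask unrelated text.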

How could we be 100% certain that the filtering is working correctly?

I've written unit tests that override the HTTP transport and try to cover every way sensitive data could be sent (including transactions, contexts, different events, and integrations). So far everything appears to work correctly.
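The PR's tests are not shown here. As a hypothetical sketch of the idea, the harness below replaces the network transport with one that only records serialized payloads, runs events through a scrubber, and then searches every payload for the raw secret values. Searching the serialized form means a secret hidden in an unexpected field (context, tag, frame variable, span) is still caught. CapturingTransport, run_pipeline and find_leaks are illustrative names, and the SDK is left out so the sketch runs on its own.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Optional

Event = Dict[str, Any]


class CapturingTransport:
    """Hypothetical test double: stores serialized payloads instead of sending them."""

    def __init__(self) -> None:
        self.payloads: List[str] = []

    def send(self, event: Event) -> None:
        self.payloads.append(json.dumps(event, default=str))


def run_pipeline(events: Iterable[Event], scrub: Callable[[Event], Optional[Event]],
                 transport: CapturingTransport) -> None:
    """Apply the scrubber to each event and hand whatever survives to the transport."""
    for event in events:
        scrubbed = scrub(event)
        if scrubbed is not None:  # None means the event was dropped
            transport.send(scrubbed)


def find_leaks(transport: CapturingTransport, secrets: Iterable[str]) -> List[str]:
    """Return the secrets that appear anywhere in a captured payload."""
    leaks = []
    for secret in secrets:
        # Also search for the JSON-escaped form, in case the secret contains quotes etc.
        needles = {secret, json.dumps(secret)[1:-1]}
        if any(needle in payload for payload in transport.payloads for needle in needles):
            leaks.append(secret)
    return leaks
```

With the real SDK, sentry_sdk.init accepts a transport option, which is where a recording transport like this would be plugged in.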

@sherifnada
Contributor

demo sounds great! Let's do it
