Skip to content

Conversation

@Amruta-Ranade
Copy link
Contributor

@Amruta-Ranade Amruta-Ranade commented Oct 19, 2020

Closes #7490 and #8395

Documented the --redact-logs flag for debug-zip and debug-merge-logs commands. Did not document the redactable-logs flag for cockroach start, cockroach start-single-node, and cockroach demo because the flag is now enabled by default and no longer needs user input.

Added an example to redact logs and the sample output. Not sure if we need to elaborate on the logging format - IMO, it needs to be a follow-up docs project that's bigger than just redacted logs.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Amruta-Ranade Amruta-Ranade requested a review from knz October 19, 2020 18:43
@cockroach-teamcity
Copy link
Member

@cockroach-teamcity
Copy link
Member

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade)


v20.2/cockroach-debug-merge-logs.md, line 34 at r2 (raw file):

`--from` | Start time for the time range filter.
`--to` | End time for the time range filter.
`--redact-logs` | Redact sensitive PII data from the log files. Note that this flag removes sensitive information only from the log files. The other items (listed ablve) collected by the `debug zip` command may still contain sensitive information.
  1. I think we want to remove the abbreviation "PII". Back when we implemented this we did not fully understand that customers only care about "sensitive" data. Whether PII is sensitive or not depends on the application, and it's not a relevant distinction from CRL's perspective.

  2. in any case the phrase "sensitive data" is not an industry standard and thus needs to be defined. I would recommend that you create a callout include with the definition, and include it from here and the other place.

Copy link
Contributor Author

@Amruta-Ranade Amruta-Ranade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


v20.2/cockroach-debug-merge-logs.md, line 34 at r2 (raw file):

Previously, knz (kena) wrote…
  1. I think we want to remove the abbreviation "PII". Back when we implemented this we did not fully understand that customers only care about "sensitive" data. Whether PII is sensitive or not depends on the application, and it's not a relevant distinction from CRL's perspective.

  2. in any case the phrase "sensitive data" is not an industry standard and thus needs to be defined. I would recommend that you create a callout include with the definition, and include it from here and the other place.

I agree. How do we define sensitive data? Do we have a writeup about it somewhere?

@jseldess
Copy link
Contributor

@Amruta-Ranade, I think we need to do more here than just document the debug flags. If people know to look at those commands, this is great, but how can we expose this functionality to users who might not know it exists? Please think about this. There are a few places where we describe logging options. We may also want to mention this ability in the context of security in a place or two.

@knz
Copy link
Contributor

knz commented Oct 24, 2020

I agree. How do we define sensitive data? Do we have a writeup about it somewhere?

We say that information is "sensitive" (also called "unsafe") when it does not fit clearly into the definition of “safe for reporting”. ("reporting" as in "can be reported automatically via telemetry to CRL")

So the right way to explain this is to start by defining what we consider “safe” then explain that everything else is considered unsafe/sensitive.

What is safe:

  • text strings and messages that are encoded inside the cockroachdb source code or its dependencies. For example, the fixed portion of a log or error message which is always the same regardless of cluster so is considered safe.

  • the value of numeric variables, durations and certain other data types inside the control code of CockroachDB (for example, a range ID is considered safe) but excluding logical SQL data from these types (e.g. a number stored in a SQL table is not considered safe).

  • certain text strings composed from a mix of cockroachdb-internal data and customer data, when these combinations have been vetted by the cockroachdb team explicitly to be “safe”, using the following criteria:

    • it does not reveal logical data stored in a user's cluster (i.e. SQL data is never safe)
    • it cannot be used to de-anonymize a user (i.e. command-line arguments, IP addresses and table/schema names are not safe)

    An example such mix is the numeric ID of a database or table. This has been designated as safe: it is part of the customer's metadata but it does not reveal the logical data stored in the databases/tables, nor does it help identify customers from each other.

Except for these specific items which are designated as safe, everything else is assumed to be unsafe/sensitive. This includes (but not exclusively):

  • stored values
  • the text of SQL statements, especially the values embedded therein
  • result rows
  • the dynamic part of error messages that includes applicaiton-provided parameters.
  • IP addresses or hostnames
  • database, schema, table or column names
  • cluster IDs
  • file names of stored data or log files
  • user/role names

@Amruta-Ranade
Copy link
Contributor Author

@Amruta-Ranade, I think we need to do more here than just document the debug flags. If people know to look at those commands, this is great, but how can we expose this functionality to users who might not know it exists? Please think about this. There are a few places where we describe logging options. We may also want to mention this ability in the context of security in a place or two.

I don't think this feature affects any other docs. It redacts sensitive information only from the output of cockroach debug zip but not from the logs written on disk using the logging flags. I searched through the docs and didn't find instances where we ask users to run cockroach debug zip. Let me know if you think I missed any :)

@jseldess
Copy link
Contributor

@Amruta-Ranade, that make sense. But do you think we should mention this feature in places like https://www.cockroachlabs.com/docs/v20.1/debug-and-error-logs.html? Does it apply to sql logs (https://www.cockroachlabs.com/docs/v20.1/sql-faqs.html#how-do-i-log-sql-queries)?

@Amruta-Ranade
Copy link
Contributor Author

Amruta-Ranade commented Oct 26, 2020

Yes, I updated the debug-and-errors-log doc. SQL logs and hence the SQL FAQs doc are unaffected by this change.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: with two nits

Reviewed 1 of 3 files at r3, 1 of 2 files at r4, 1 of 1 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Amruta-Ranade and @knz)


v20.2/cockroach-debug-merge-logs.md, line 12 at r5 (raw file):

{{site.data.alerts.callout_danger}}
The file produced by `cockroach debug zip` can contain highly [sensitive, unanonymized information](debug-and-error-logs.html#redacted-logs), such as usernames, hashed passwords, and possibly your table's data. You can use the [`redact`](#example) flag to redact the sensitive data out of log files and crash reports before sharing them with Cockroach Labs.

I believe the common word for "unanomymized" is "identifiable" or "nominal".


v20.2/cockroach-debug-merge-logs.md, line 34 at r5 (raw file):

`--from` | Start time for the time range filter.
`--to` | End time for the time range filter.
`--redact` | Redact [sensitive data](debug-and-error-logs.html#redacted-logs) from the log files. Note that this flag removes sensitive information only from the log files. The other items (listed above) collected by the `debug zip` command may still contain sensitive information.

in the command cockroach debug merge-log, there is no need to refer to "only log files" or debug zip. It's a separate command.


v20.2/debug-and-error-logs.md, line 117 at r5 (raw file):

## Redacted logs

If you contact CockroachDB Support for troubleshooting help, you might be asked to run [`cockroach debug zip`](cockroach-debug-zip.html) and share the resulting file with the CockroachDB team. The log files created by `cockroach debug zip` may contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data.

see my comment above about the word "unanonymized"

Copy link
Contributor Author

@Amruta-Ranade Amruta-Ranade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz)


v20.2/cockroach-debug-merge-logs.md, line 34 at r2 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

I agree. How do we define sensitive data? Do we have a writeup about it somewhere?

Done.


v20.2/cockroach-debug-merge-logs.md, line 12 at r5 (raw file):

Previously, knz (kena) wrote…

I believe the common word for "unanomymized" is "identifiable" or "nominal".

Done.


v20.2/cockroach-debug-merge-logs.md, line 34 at r5 (raw file):

Previously, knz (kena) wrote…

in the command cockroach debug merge-log, there is no need to refer to "only log files" or debug zip. It's a separate command.

Done.


v20.2/debug-and-error-logs.md, line 117 at r5 (raw file):

Previously, knz (kena) wrote…

see my comment above about the word "unanonymized"

Done.

Copy link
Contributor

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple suggestions!


{{site.data.alerts.callout_danger}}
The file produced by `cockroach debug merge-log` can contain highly sensitive, unanonymized information, such as usernames, passwords, and possibly your table's data. You should share this data only with Cockroach Labs developers and only after determining the most secure method of delivery.
The file produced by `cockroach debug zip` can contain highly [sensitive, identifiable information](debug-and-error-logs.html#redacted-logs), such as usernames, hashed passwords, and possibly your table's data. You can use the [`redact`](#example) flag to redact the sensitive data out of log files and crash reports before sharing them with Cockroach Labs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest calling this "the --redact flag"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


{{site.data.alerts.callout_danger}}
The file produced by `cockroach debug zip` can contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data. You should share this data only with Cockroach Labs developers and only after determining the most secure method of delivery.
The file produced by `cockroach debug zip` can contain highly [sensitive, identifiable information](debug-and-error-logs.html#redacted-logs), such as usernames, hashed passwords, and possibly your table's data. You can use the [`redact-logs`](#redact-sensitive-information-from-the-logs) flag to redact the sensitive data out of log files and crash reports before sharing them with Cockroach Labs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest calling this "the --redact-logs flag" (easier to search for)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


## Redacted logs

If you contact CockroachDB Support for troubleshooting help, you might be asked to run [`cockroach debug zip`](cockroach-debug-zip.html) and share the resulting file with the CockroachDB team. The log files created by `cockroach debug zip` may contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest working in the phrase "personally identifiable information (PII)" since that is a standard term that will also be searched for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raphael recommended not using the term (PII) and saying "sensitive, unanonymized information" instead. I agree with you though -- IMO, PII is a standard term that should feature somewhere in the doc. cc @knz for approval.


<span class="version-tag">New in v20.2</span> You can run `cockroach debug zip` with the [`redact-logs` flag](cockroach-debug-zip.html#redact-sensitive-information-from-the-logs) to redact the sensitive data out of log files and crash reports before sharing them with Cockroach Labs. Redactable sensitive data includes but is not limited to:

- Stored values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better yet, the PII term could go here

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Amruta-Ranade and @rmloveland)


v20.2/debug-and-error-logs.md, line 117 at r6 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

Raphael recommended not using the term (PII) and saying "sensitive, unanonymized information" instead. I agree with you though -- IMO, PII is a standard term that should feature somewhere in the doc. cc @knz for approval.

  1. I wrote in my comments above that the word "unanonymized" is a strange word. For one, I can't find it in any online dictionary. I would encourage you to use a word that actually exists. I made a few suggestions earlier, maybe you have a better opinion.

  2. Regarding the phrase "PII" or "personally identifiable information". This is only one kind of sensitive data. For example, our customer's IP address if often not PII, but it is most definitely sensitive.

So you may wish to write "sensitive information, included but not limited to PII" or something to that effect.

@Amruta-Ranade
Copy link
Contributor Author

Reviewed 2 of 2 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Amruta-Ranade and @rmloveland)

v20.2/debug-and-error-logs.md, line 117 at r6 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

  1. I wrote in my comments above that the word "unanonymized" is a strange word. For one, I can't find it in any online dictionary. I would encourage you to use a word that actually exists. I made a few suggestions earlier, maybe you have a better opinion.
  2. Regarding the phrase "PII" or "personally identifiable information". This is only one kind of sensitive data. For example, our customer's IP address if often not PII, but it is most definitely sensitive.

So you may wish to write "sensitive information, included but not limited to PII" or something to that effect.

Hmm.. I did change it to "sensitive, identifiable information" in the other files, but forgot to fix it in the debug-and-error-logs file. Fixed.

@Amruta-Ranade Amruta-Ranade merged commit a60c7c5 into master Nov 2, 2020
@lnhsingh lnhsingh deleted the redactable_logs branch June 23, 2022 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

document redactable logs + logging format

5 participants