Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up logs 3.x.v2 #3380

Merged
merged 4 commits into from Mar 24, 2021
Merged

Clean up logs 3.x.v2 #3380

merged 4 commits into from Mar 24, 2021

Conversation

iilyak
Copy link
Contributor

@iilyak iilyak commented Feb 23, 2021

Overview

This PR started as a port of #3031 (Clean up logs) into 3.x branch. The main goal of the PR is to remove sensitive data from the logs.
However as it was pointed out by reviewers it might be not the right approach to convert shape of gen_server state for a CouchDB version which is already widely deployed. Another problem with the scope of the PR is lack of definition of sensitive data. There are multiple points of view on the sensitive data:

  • user credentials
  • user documents
  • design documents

This means that there is no generic solution which would fit all. In order to solve this problem for the purposes of this PR the sensitive data are defined as "user credentials".

Before going further down with description of the solution presented in the PR. I wanted to mention the requirements as I understood them:

  1. sys:get_status/1 should return state as is (wrapped in [{data, [{"State", Term}]}]) see http://erlang.org/doc/man/gen_server.html#Module:format_status-2
  2. the couch_log_formatter should be able to call a configurable sanitizer module which would deal with changing the shape of log entries and removing user's data from terms
  3. "user credentials" should be removed from state for both sys:get_status/1 use case and termination of a process.

In order to achieve 1) we need to handle :format_status(normal, ...) and :format_status(terminate, ...) differently.

In order to achieve 2) we need to pass the callback module name so it is available in couch_log_formatter. We also pass the process dictionary so sanitizer module could get some meta information about the process (for example tracing span_id). The process dictionary is not logged by default. However sanitizer module can include elements from process dictionary in a term it returns.

The approach to format_status/2 is to return term as is when first argument is normal in order for sys:get_status to return original term. In both cases we remove "sensitive data" before returning the term from format_status/2. The terminate clause of a format_status/2 returns {Term, Ctx :: map()}. The Ctx includes module and dictionary keys for now. We need this information so sanitizer could distinguish messages coming from different gen_servers.

Testing recommendations

make eunit

Related Issues or Pull Requests

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

@iilyak iilyak mentioned this pull request Feb 23, 2021
4 tasks
@@ -71,7 +71,9 @@ handle_node_req(#httpd{method='PUT', path_parts=[_, Node, <<"_config">>, Section
Value = couch_util:trim(chttpd:json_body(Req)),
Persist = chttpd:header_value(Req, "X-Couch-Persist") /= "false",
OldValue = call_node(Node, config, get, [Section, Key, ""]),
case call_node(Node, config, set, [Section, Key, ?b2l(Value), Persist]) of
IsSensitive = Section == <<"admins">>,
Opts = #{persisit => Persist, sensitive => IsSensitive},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
Opts = #{persisit => Persist, sensitive => IsSensitive},
Opts = #{persist => Persist, sensitive => IsSensitive},

Copy link
Member

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I closely reviewed the first few cases and skimmed the rest as they seemed the same idea but adapted for the gen_server in question.

I like the structure of the code but I am concerned about the choices made. This is a great deal of redaction and most of it appears not to be related to credentials or sensitive information (reducing the btree record to just the size of the btree, for example).

our logs are an important source of debugging information, particularly for users who aren't comfortable with digging around in a remsh (or are in an environment where they are prohibited from doing so).

I don't think redaction beyond obvious things like passwords can be the default behaviour of CouchDB. I'm less sure if there should be an option for the kind of deep "cleanup" here. With my couchdb support hat on, would I be happy to diagnose a problem from a CouchDB log this heavily modified?

@@ -71,7 +71,9 @@ handle_node_req(#httpd{method='PUT', path_parts=[_, Node, <<"_config">>, Section
Value = couch_util:trim(chttpd:json_body(Req)),
Persist = chttpd:header_value(Req, "X-Couch-Persist") /= "false",
OldValue = call_node(Node, config, get, [Section, Key, ""]),
case call_node(Node, config, set, [Section, Key, ?b2l(Value), Persist]) of
IsSensitive = Section == <<"admins">>,
Opts = #{persisit => Persist, sensitive => IsSensitive},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@@ -173,3 +173,8 @@ get_body(Req) ->
couch_log:notice("Body Fail: ~p~n", [Else]),
couch_httpd:send_error(Req, 400, <<"bad_request">>, <<"Missing JSON body'">>)
end.

remove_sensitive(KVList0) ->
KVList1 = lists:keyreplace(<<"username">>, 1, KVList0, {<<"username">>, <<"****">>}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

username isn't sensitive (or shouldn't be!)

remove_sensitive(KVList0) ->
KVList1 = lists:keyreplace(<<"username">>, 1, KVList0, {<<"username">>, <<"****">>}),
KVList2 = lists:keyreplace(<<"password">>, 1, KVList1, {<<"password">>, <<"****">>}),
KVList2.
Copy link
Member

@rnewson rnewson Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a style point (and I guess you're anticipating other fields in future) but I would just return the lists:keyreplace result

@@ -192,6 +193,44 @@ handle_db_updater_info({'DOWN', Ref, _, _, _}, #st{fd_monitor=Ref} = St) ->
{stop, normal, St#st{fd=undefined, fd_monitor=closed}}.


format_status(normal, [_PDict, #st{} = St]) ->
[{data, [{"State", St}]}];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like that we don't interfere with a sys:get_status call from a remsh (etc).

[{data, [{"State", reduce_record(St)}]}].


reduce_record(#st{} = St) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think redact is a better word as we have a quite different meaning for reduce in couchdb.

Copy link
Contributor Author

@iilyak iilyak Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format_status/2 was added to prevent huge btree structures hitting logs. Since the reduction of terms is not longer a goal I am thinking about removing this function.

purge_seq_tree = reduce_record(PurgeSeqTree, undefined)
};

reduce_record(Header, _) when element(1, Header) =:= db_header ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be when is_record(Header, db_header).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_record/2 works only for records in scope of a module (defined locally or included). The db_header is a private record defined in couch_bt_engine_header.erl.

couchdb/src/couch/src/couch_bt_engine.erl:224: record db_header undefined
Compiling couchdb/src/couch/src/couch_bt_engine.erl failed:
ERROR: compile failed while processing couchdb/src/couch: rebar_abort

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, true.

reduce_record(Header, _) when element(1, Header) =:= db_header ->
couch_bt_engine_header:reduce_record(Header);

reduce_record(Tree, _) when element(1, Tree) =:= btree ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be when is_record(Tree, btree).

%% We just need a correctly shaped btree record,
%% so we can pass Fd = nil
{ok, Tree} = couch_btree:open(TreeState, nil),
couch_btree:size(Tree).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

pids=Pids
} = State,
Scrubbed = State#state{
pids={length, length(Pids)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pids aren't sensitive.

funs = Funs
} = State,
Scrubbed = State#evstate{
ddocs = {dict_size, dict:size(DDocs)},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguably not sensitive either

@iilyak
Copy link
Contributor Author

iilyak commented Feb 23, 2021

I don't think redaction beyond obvious things like passwords can be the default behaviour of CouchDB. I'm less sure if there should be an option for the kind of deep "cleanup" here. With my couchdb support hat on, would I be happy to diagnose a problem from a CouchDB log this heavily modified?

Personally I hit the same problem over and over. I have a bug and go to check logs. The log contains huge term which we truncate and all important information about other fields of a record omitted.

@rnewson
Copy link
Member

rnewson commented Feb 23, 2021

Ah, sure, I get that point, thank you.

@janl
Copy link
Member

janl commented Feb 23, 2021

Hey hey, just casually following this, but the diff is a bit complex. I think this might have a better chance if we focus on redacting sensitive data first, and tackle garbage cleanup later.

@iilyak
Copy link
Contributor Author

iilyak commented Feb 23, 2021

I think this might have a better chance if we focus on redacting sensitive data first, and tackle garbage cleanup later.

Agree. I'll update the PR to decouple garbage cleanup out of this PR.

@rnewson
Copy link
Member

rnewson commented Mar 9, 2021

I don't understand these;

format_status(normal, [_PDict, State]) ->
    [{data, [{"State", State}]}];

format_status(terminate, [PDict, State]) ->
    {State, #{module => ?MODULE, dictionary => PDict}}.

Why so many conversions to map?

@iilyak
Copy link
Contributor Author

iilyak commented Mar 9, 2021

I don't understand these;

format_status(normal, [_PDict, State]) ->
    [{data, [{"State", State}]}];

format_status(terminate, [PDict, State]) ->
    {State, #{module => ?MODULE, dictionary => PDict}}.

Why so many conversions to map?

There is no conversion to map for the state of the gen_server. The state is returned as is. The only thing I do here is return additional context. I.e. return {State, Ctx} instead of State. Then the couch_log_formatter passes the Ctx to plug-able sanitizer module. If sanitizer is not specified Ctx is not used or logged.

@rnewson
Copy link
Member

rnewson commented Mar 9, 2021

I am very uncomfortable to see that pattern cut and paste so many times, it strongly implies a bad design. I think I see how you got here though.

@rnewson
Copy link
Member

rnewson commented Mar 9, 2021

we also can't use the ableist slur "sanitize" but that's a mechanical change ("scrub" perhaps?). will ponder if the boilerplating can be reduced/removed (my concern is not just the duplication but also the deviation from OTP and the need to remember to do this for every new gen_server).

@iilyak
Copy link
Contributor Author

iilyak commented Mar 9, 2021

The alternative to map() for the Ctx is a KV list. However map is superior because it can be matched in the function clause. The KV list preserve the order of elements so cannot be matched against. Which simplify dispatch in couch_log_formatter and sanitizer module.

format_status(terminate, [PDict, State]) ->
    {State, #{module => ?MODULE, dictionary => PDict}}.
    
sanitizer:sanitize_state(State, #{module := couch_server}) ->
    ....;
sanitizer:sanitize_state(State, #{module := smoosh_server}) ->
    ....;

compared to

format_status(terminate, [PDict, State]) ->
    {State, [{module, ?MODULE}, {dictionary, PDict}]}.
    
sanitizer:sanitize_state(State, Ctx) ->
    Module = case lists:keyfind(module, 1, Ctx) -> of
        {module, M} -> M;
        false -> undefined
     end,
    PDict = case lists:keyfind(dictionary, 1, Ctx) -> of
        {dictionary, D} -> D;
        false -> []
     end,
     sanitize_state(State, Module, PDict).
     
sanitizer:sanitize_state(State, smoosh_server, PDict) ->
    ....;

sanitizer:sanitize_state(State, couch_server, PDict) ->
    ....;

@rnewson
Copy link
Member

rnewson commented Mar 9, 2021

now you've explained I'm fine with the "context" item being a map. What's really awkward is having to alter every gen_server in an identical way in order to pass it.

@iilyak
Copy link
Contributor Author

iilyak commented Mar 9, 2021

I am very uncomfortable to see that pattern cut and paste so many times, it strongly implies a bad design. I think I see how you got here though.

The alternative is a common module somewhere.

format_status(terminate, [PDict, State]) ->
      Scrubed = ...
      some_module:format_status(Scrubbed, PDict).

However the gen_servers we update are in different applications. We don't have an application which is included in all the others. I could create a new app which would hold the module. This app would be only for utility functions and couldn't depend on other apps to avoid circular dependencies.

@rnewson
Copy link
Member

rnewson commented Mar 9, 2021

could this be done with a parse transform? When we used lager it added a parse transform (https://github.com/apache/couchdb-lager/blob/master/src/lager_transform.erl) which captures module name among other things.

@iilyak
Copy link
Contributor Author

iilyak commented Mar 9, 2021

In fact my first take on this problem was a parse transform. The problem with this approach is use of rebar. Rebar has bugs the parse transform configured in main application is not passed to included application for umbrella apps. We would have to update rebar.config for each app we include.

@rnewson
Copy link
Member

rnewson commented Mar 9, 2021

changing all the rebar.configs seems ok, and would have the advantage that any new gen_servers are automatically covered. Did you show the parse_transform version? I've lost track.

@iilyak
Copy link
Contributor Author

iilyak commented Mar 9, 2021

Did you show the parse_transform version? I've lost track.

I didn't

@iilyak
Copy link
Contributor Author

iilyak commented Mar 9, 2021

I think I would need to create a new repo for the parse_transform code. Because it need to be used as dependency for other applications which are not part of the monolithic repository.

I am trying to choose between

  1. a repository specifically for this transform
  2. a repository which we would use to hold all transforms we ever implement
  3. a repository with utility functions which would also have parse transforms in there

@janl
Copy link
Member

janl commented Mar 10, 2021

Good cleanup @iilyak! This is a lot easer to comprehend.

The one place where this could do with a comment about what happens is https://github.com/apache/couchdb/pull/3380/files#diff-33000329e1e5eb2554d72f4675fb32c2bf3e0599c981f044683467cb25efc02aR378 which seems very meta and it is not clear to me what is happening there, or if we still need it, now that ddocs are out of scope for redacting.

I agree with @rnewson on the duplication and agree with methods to mitigate. A separate repo sounds good. I would go with option 1. until we have more transforms, at which point we can move things around.

@iilyak
Copy link
Contributor Author

iilyak commented Mar 16, 2021

I rolled back the controversial changes and included only the changes needed to prevent leaking of passwords. I think we can have separate PR if we would want to prevent leaking of users' data or minimize state terms.

Copy link
Member

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iilyak this is very tidy, thank you for persevering. +1 on merging though do please squash beforehand.

@iilyak iilyak merged commit 663b314 into apache:3.x Mar 24, 2021
@iilyak iilyak deleted the clean-up-logs-3.x.v2 branch March 24, 2021 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants