Skip to content

Add escape json for logging#8886

Merged
ywkaras merged 9 commits intoapache:masterfrom
hnakamur:json_access_log
Jun 9, 2022
Merged

Add escape json for logging#8886
ywkaras merged 9 commits intoapache:masterfrom
hnakamur:json_access_log

Conversation

@hnakamur
Copy link
Copy Markdown
Contributor

@hnakamur hnakamur commented Jun 4, 2022

This pull requests adds an optional escape attribute in format entries in logging.yaml.
The possible values are json and none.

An example logging.yaml

logging:
  formats:
    - name: "mycustom"
      escape: "json"
      format: '{"chi":"%<chi>","caun":"%<caun>","cqtn":"%<cqtn>","cqtx":"%<cqtx>","pssc":"%<pssc>","pscl":"%<pscl>","sssc":"%<sssc>","sscl":"%<sscl>","cqcl":"%<cqcl>","pqcl":"%<pqcl>","cqhl":"%<cqhl>","pshl":"%<pshl>","pqhl":"%<pqhl>","sshl":"%<sshl>","tts":"%<tts>","phr":"%<phr>","cfsc":"%<cfsc>","pfsc":"%<pfsc>","crc":"%<crc>","my_header":"%<{My-Header}cqh>","my_header_slice":"%<{My-Header}cqh[2:-3]>"}'

  logs:
    - filename: custom
      format: mycustom
      mode: ascii

An example request:

curl -v -H $'My-Header: a\tbcde\x1dfghijkl\x1em' 'localhost:8080?a=c\td'

An example log output:

{"chi":"127.0.0.1","caun":"-","cqtn":"04/Jun/2022:10:46:55 +0900","cqtx":"GET http://127.0.0.1/?a=c\\td HTTP/1.1","pssc":"200","pscl":"34","sssc":"200","sscl":"34","cqcl":"0","pqcl":"0","cqhl":"125","pshl":"242","pqhl":"256","sshl":"236","tts":"0","phr":"DIRECT","cfsc":"FIN","pfsc":"FIN","crc":"TCP_MISS","my_header":"a\tbcde\u001dfghijkl\u001em","my_header_slice":"bcde\u001df"}

@masaori335 masaori335 added this to the 10.0.0 milestone Jun 6, 2022
@bryancall
Copy link
Copy Markdown
Contributor

[approve ci]

@bryancall bryancall requested a review from ywkaras June 6, 2022 23:24
@ywkaras
Copy link
Copy Markdown
Contributor

ywkaras commented Jun 7, 2022

We had a fairly long conversation about this in the weekly Monday meeting. We talked about whether there might be a more general solution, but couldn't immediately think of one.

Does the escaping assume the message has no deviations from the IETF standards?

Comment thread proxy/logging/LogAccess.cc Outdated

for (int i = 0; i < len; i++) {
char c = buf[i];
switch (c) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are two alternatives to a switch statement: https://godbolt.org/z/G3fz3469v

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion.
I tried to modify escape_json using bitset but I found that I need the mapping like \b to b.

I made a modified version at
https://github.com/hnakamur/json-escape-cpp-experiment/blob/main/test.cpp

This version properly escapes the DEL 0x7f character too.

Can I replace escape_json with this one?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard headers with a .h extension are deprecated and replaced in C++, so it's best to use rather than <string.h>, and std::strlen rather than strlen. https://stackoverflow.com/questions/8380805/difference-between-string-h-and-cstring .

Copy link
Copy Markdown
Contributor

@ywkaras ywkaras Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

namespace
{
class EscLookup
{
public:
  static const char NO_ESCAPE{'\0'};
  static const char LONG_ESCAPE{'\x01'};

  static char result(char c)
  {
    return _lu.table[static_cast<unsigned char>(c)];
  }

private:
  struct _LUT
  {
    _LUT();

    char table[1 << 8];
  };

  inline static _LUT const _lu;
};

EscLookup::_LUT::_LUT()
{
  for (unsigned i = 0; i < ' '; ++i) {
    table[i] = LONG_ESCAPE;
  }
  for (unsigned i = '\x7f'; i < sizeof(table); ++i) {
    table[i] = LONG_ESCAPE;
  }

  // Short escapes.
  //
  table['\b'] = 'b';
  table['\t'] = 't';
  table['\n'] = 'n';
  table['\f'] = 'f';
  table['\r'] = 'r';
  table['\\'] = '\\';
  table['\"'] = '\"';
}

char nibble(int nib)
{
  return nib >= 0xa ? 'a' + (nib - 0xa) : '0' + nib;
}

} // end anonymous namespace

static int
escape_json(char *dest, const char *buf, int len)
{
  int escaped_len = 0;

  for (int i = 0; i < len; i++) {
    char c = buf[i];
    char ec = EscLookup::result(c);
    if (__builtin_expect(EscLookup::NO_ESCAPE == ec, 1)) {
      if (dest) {
        *dest++ = c;
      }
      escaped_len++;

    } else if (EscLookup::LONG_ESCAPE == ec) {
      if (dest) {
        *dest++ = '\\';
        *dest++ = 'u';
        *dest++ = '0';
        *dest++ = '0';
        *dest++ = nibble(static_cast<unsigned char>(c) >> 4);
        *dest++ = nibble(c & 0x0f);
      }
      escaped_len += 6;

    } else { // Short escape.
      if (dest) {
        *dest++ = '\\';
        *dest++ = ec;
      }
      escaped_len += 2;
    }
  } // end for
  return escaped_len;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added commit ab97729 before noticing your code.
I'll update it later.

Also I noticed slicing is done after escape now and produces incorrect strings like \u0 or so.
I'll looking into it later this too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the code at #8886 (comment) to a65b8b4 with specifying the author ywkaras. Is it OK with you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the code at #8886 (comment) to a65b8b4 with specifying the author ywkaras. Is it OK with you?

Yes OK.

@hnakamur
Copy link
Copy Markdown
Contributor Author

hnakamur commented Jun 8, 2022

Thanks for taking time to discuss log escape in the meeting.
escape_json function in this pull request was based on esc_json_out in the traffic_dump plugin.

Does the escaping assume the message has no deviations from the IETF standards?

Could you tell me what do you mean by "the IETF standards"?
I would like to do the string escape defined in https://www.json.org/json-en.html

hnakamur added a commit to hnakamur/json-escape-cpp-experiment that referenced this pull request Jun 8, 2022
@ywkaras
Copy link
Copy Markdown
Contributor

ywkaras commented Jun 8, 2022

That page seems to say / should also be escaped. It doesn't say that " should be escaped as \ . Seems like that would confuse parses of the log, if \ and " both are mapped to \.

I looked through the IETF standards a little, they are not very clear. I think the function should handle bytes in the 0x80 to 0xff .

traffic_dump is only uses once in a while. It would not do harm to make the function more efficient, since logging is often done for each and every transaction.

@hnakamur
Copy link
Copy Markdown
Contributor Author

hnakamur commented Jun 8, 2022

  • " must be escaped as \", not as \\
  • It seems ok / not escaped

ECMA-404 The JSON Data Interchange Standard (linked from https://www.json.org/json-en.html) p.11

All code points may be placed within the quotation marks except for the code points that must be escaped: quotation mark
(U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F.

Comment thread proxy/logging/LogAccess.cc Outdated

char *p = dest;

// int res1 = unmarshal_http_method (buf, p, len);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not remove this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Removed it at 672e340

@ywkaras
Copy link
Copy Markdown
Contributor

ywkaras commented Jun 9, 2022

As well as fixing the compiler warnings, I think we need some Au test coverage of this new feature before it can be merged.

@ywkaras ywkaras closed this Jun 9, 2022
@ywkaras ywkaras reopened this Jun 9, 2022
@ywkaras
Copy link
Copy Markdown
Contributor

ywkaras commented Jun 9, 2022

  • " must be escaped as \", not as \\

Sorry I read the code wrong.

  • It seems ok / not escaped

The goal is that someone could write a script to parse a log without having to read the source code to find the details of how the escaping works. If the escaping standard has different versions and options, we should say in the ATS documentation which version and options ATS implements.

@hnakamur
Copy link
Copy Markdown
Contributor Author

hnakamur commented Jun 9, 2022

Changed to escape forward slash at 110ce54

The tasks left:

  • Fix slicing
  • Add tests

Comment thread proxy/logging/LogAccess.cc Outdated

// Short escapes.
//
table['\b'] = 'b';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our PR checks are now tricky to use. After you click on details, you must then again find the URL for the check in the list of URLs, and click on it. Then you can click Console Output.

With the change above, you can change these to be like:

table('\\b') = 'b';

to get rid of the warning that is treated as an error.

@ywkaras ywkaras merged commit d187218 into apache:master Jun 9, 2022
@hnakamur
Copy link
Copy Markdown
Contributor Author

hnakamur commented Jun 9, 2022

Thanks for your review!

@hnakamur hnakamur deleted the json_access_log branch June 9, 2022 23:04
zwoop pushed a commit that referenced this pull request Jun 15, 2022
* Add escape json for logging

* Modify escape_json to escape DEL (0x7f) correctly

* Modify escape_json for logging

(Copied from #8886 (comment))

Co-authored-by: Walt Karas <wkaras@yahooinc.com>

* Fix escape of doublequote in escape_json

* Escape forward slash in escape_json

* Removed a comment-outed line in LogAccess::unmarshal_http_text_json

* Fix slicing for escape_json in LogAccess.cc

* Cast char index to int to avoid char-subscripts warnings

* Add AuTest for json log escape

Co-authored-by: Walt Karas <wkaras@verizonmedia.com>
Co-authored-by: Walt Karas <wkaras@yahooinc.com>
(cherry picked from commit d187218)
@zwoop
Copy link
Copy Markdown
Contributor

zwoop commented Jun 15, 2022

Cherry-picked to v9.2.x

@zwoop zwoop modified the milestones: 10.0.0, 9.2.0 Jun 15, 2022
@hnakamur hnakamur restored the json_access_log branch December 25, 2022 12:16
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Feb 21, 2023
* asf/9.2.x:
  Updated ChangeLog
  Modifying array outside bounds (apache#8806)
  Add escape json for logging (apache#8886)
  Removes remaining vestiges to the backdoor port (apache#8793)
  Add Au test for prefetch plugin in simple mode. (apache#8810)
  uri_signing plugin: Fix missing payload validation for the iss field. (apache#8901)
  Cleanup SNIConfig (apache#8892)
  Fix autest uses of File exists parameter (apache#8906)
  Do not modify Transfer-Encoding header on retry (apache#8899)
  Fix plugin stats_over_http OK reason phrase (apache#8902)
  Add AuTest for stats-over-http plugin (apache#8422)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants