flatten_json converts all values in that nested blob to string #13888

Closed
jivepig opened this issue Nov 4, 2022 · 11 comments · Fixed by #13947

Comments

@jivepig

jivepig commented Nov 4, 2022

As per the discussion in this thread:
https://community.graylog.org/t/parsing-nested-json-message-in-field-with-parent-object-in-pipeline/26292/19

Expected Behavior

When flatten_json encounters values of different types (numbers, strings, and so on), the map produced by to_map(flatten_json(...)) should carry those field types into the result.

Current Behavior

Currently, flatten_json appears to convert all values to strings whenever it has to flatten a nested object.

Possible Solution

Carry the field types through flatten_json so that the map created by to_map keeps them.

Steps to Reproduce (for bugs)

Parse Nginx logs as JSON with the following rule:

rule "extract json"
when 
  regex("(\\{.*\\})", to_string($message.message)).matches == true
then
  let json = regex("(\\{.*\\})", to_string($message.message), ["json"])["json"];
  // set_field("json", json);
  set_fields(to_map(flatten_json(value: to_string(json), array_handler: "json")));
end

Context

Can field types be carried forward into the to_map result when parsing with flatten_json?

Your Environment

  • Graylog Version: 4.3
  • Java Version: 11
  • Elasticsearch Version: 7.10.2
  • MongoDB Version: 4.0
  • Operating System: n/a
  • Browser version: n/a

Raw Data:
{"@timestamp": "2022-11-03T20:39:07+00:00", "source": "router", "nginx": {"remote_addr": "xx.xx.12.123", "remote_user": "39942", "body_bytes_sent": 0, "request_length": 656, "request_time": 0.464, "status": 202, "request": "PATCH /xxxxxx/emapi/v1/enablement/53815 HTTP/1.1", "request_method": "PATCH", "http_origin": "-", "http_referrer": "-", "site": "xxxxx.com", "port": 443, "http_user_agent": "python-requests/2.28.1" }}
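
To make the expected and reported results concrete, here is a minimal Python sketch (not Graylog code) that flattens a shortened copy of the raw data above. The `flatten` helper and the `_` separator are assumptions chosen to match the `nginx_*` field names used in this issue, and stringifying every value is a simplification of the reported behavior.

```python
import json

# Shortened copy of the raw data above; values keep their JSON types.
RAW = (
    '{"source": "router", "nginx": {"remote_user": "39942", '
    '"body_bytes_sent": 0, "request_time": 0.464, "status": 202, "port": 443}}'
)


def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts into one level, keeping each leaf's original type."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Expected behavior: numbers stay numbers, strings stay strings.
expected = flatten(json.loads(RAW))

# Reported behavior (simplified): every flattened value ends up as a string.
reported = {name: str(value) for name, value in expected.items()}

for name in expected:
    print(name, type(expected[name]).__name__, type(reported[name]).__name__)
```

In the expected result `nginx_status` is an integer and `nginx_request_time` is a float, while in the reported result both are strings.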

@jivepig jivepig added the bug label Nov 4, 2022
@jivepig
Author

jivepig commented Nov 4, 2022

image

@jivepig
Author

jivepig commented Nov 4, 2022

image

@bernd bernd added the triaged label Nov 7, 2022
@brijeshkalavadia

Do we have any update on this?

@brijeshkalavadia

OK, so in the meantime I tried the rule below. It seems it only converted the type of the "nginx_body_bytes_sent" field and left the rest of the fields as "string". Am I doing anything wrong?

rule "extract json"
when
  regex("(\\{.*\\})", to_string($message.message)).matches == true
then
  let json = regex("(\\{.*\\})", to_string($message.message), ["json"])["json"];
  // set_field("json", json);
  set_fields(to_map(flatten_json(value: to_string(json), array_handler: "json")));
  set_field("nginx_body_bytes_sent", to_double($message.nginx_body_bytes_sent));
  set_field("nginx_bytes_sent", to_double($message.nginx_bytes_sent));
  set_field("nginx_port", to_long($message.nginx_port));
  set_field("nginx_request_length", to_double($message.nginx_request_length));
  set_field("nginx_request_time", to_double($message.nginx_request_time));
  set_field("nginx_site", to_ip($message.nginx_site));
  set_field("nginx_status", to_long($message.nginx_status));
end
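
The workaround in the rule above amounts to casting selected fields after flattening. A rough Python sketch of that idea follows. The `CASTS` table and the `apply_casts` helper are hypothetical names for illustration only, and the sketch says nothing about how Graylog or the search backend store the result.

```python
# Hypothetical cast table: field name -> converter, mirroring the
# to_double / to_long calls in the rule above.
CASTS = {
    "nginx_body_bytes_sent": float,
    "nginx_port": int,
    "nginx_request_length": float,
    "nginx_request_time": float,
    "nginx_status": int,
}


def apply_casts(fields, casts=CASTS):
    """Return a copy of fields with the listed values converted.

    Fields that are absent are skipped, and values that fail to convert
    are left unchanged.
    """
    out = dict(fields)
    for name, convert in casts.items():
        if name in out:
            try:
                out[name] = convert(out[name])
            except (TypeError, ValueError):
                pass
    return out


# A stringified flat map, as described in this issue.
stringified = {
    "nginx_body_bytes_sent": "0",
    "nginx_port": "443",
    "nginx_request_time": "0.464",
    "nginx_status": "202",
    "nginx_site": "xxxxx.com",
}
typed = apply_casts(stringified)
```

Here `nginx_port` and `nginx_status` become integers, `nginx_request_time` becomes a float, and `nginx_site` stays a string because it has no entry in the cast table.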

@patrickmann
Contributor

Keep in mind that Elastic does not change field types after initially creating a mapping. If a field is first created as string, it cannot be mapped to e.g. numeric later on.
That said - the function currently does indeed emit everything as string. It's a simple change to preserve the types.

@boosty
Contributor

boosty commented Nov 11, 2022

@patrickmann Good point. We could add a parameter allowing users to keep the legacy behaviour. What do you think?

@patrickmann
Contributor

@boosty

> We could add a parameter allowing users to keep the legacy behaviour. What do you think?

I don't think many people are using it yet since it was new in 4.3. We are moving to a new major version anyway, can we just do a breaking change?
Otherwise yes, easy to add an optional legacy flag which we can plan to drop again later.

@boosty
Contributor

boosty commented Nov 11, 2022

Yes, I think we could fix the default behaviour for 5.0, but ideally before the RC.

> Otherwise yes, easy to add an optional legacy flag which we can plan to drop again later.

Instead of calling it legacy or so, we could name the parameter stringify_values (false by default) and just keep it, unless it's a bigger burden to keep the legacy implementation.
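
As a rough illustration of that proposal, here is a Python sketch of a flattening function with a `stringify_values` flag that defaults to false. Only the parameter name comes from the comment above. The function body, the `_` separator, and the sample document are assumptions, and this is not Graylog's implementation.

```python
import json


def flatten_json(value, stringify_values=False, sep="_"):
    """Parse a JSON object string and flatten it to one level.

    With stringify_values=False (the default) each leaf keeps its type.
    With stringify_values=True every leaf is converted to a string,
    imitating the legacy behavior discussed in this thread.
    """
    flat = {}
    stack = [("", json.loads(value))]
    while stack:
        prefix, node = stack.pop()
        for key, child in node.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            if isinstance(child, dict):
                stack.append((name, child))
            else:
                flat[name] = str(child) if stringify_values else child
    return flat


doc = '{"nginx": {"status": 202, "request_time": 0.464}}'
typed = flatten_json(doc)                          # types preserved
legacy = flatten_json(doc, stringify_values=True)  # everything a string
```

Keeping the flag as a regular parameter, rather than naming it "legacy", lets both behaviors coexist without implying that one of them will be removed.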

@patrickmann patrickmann self-assigned this Nov 11, 2022
@patrickmann
Contributor

@boosty Do we want to backport to 4.3? If so, I assume we would not want to break existing rules. That is, in 4.3 the optional parameter would opt in to the new behavior, whereas in 5.0 it would opt in to the legacy behavior.

@boosty
Contributor

boosty commented Nov 14, 2022

@patrickmann Yes, I think backporting the new parameter to 4.3 so that users can opt-in to the new behaviour would be good.

@brijeshkalavadia

> Keep in mind that Elastic does not change field types after initially creating a mapping. If a field is first created as string, it cannot be mapped to e.g. numeric later on. That said - the function currently does indeed emit everything as string. It's a simple change to preserve the types.

Agreed, and that is the behavior we expect: keep the original type.

5 participants