

Superskyyy
Member

This is a work in progress.

Reporting logs as JSON over HTTP to http://oap/v3/logs now works.

But I'm not sure whether the oap/v3/logs endpoint is meant for this kind of usage (it seems designed for fluent-bit batch reporting?). Reporting logs one by one over HTTP may not be ideal performance-wise. I'm also not sure whether the Java agent implements only a gRPC reporter intentionally, for this reason.

Please advise.

Signed-off-by: YihaoChen <Superskyyy@outlook.com>
@kezhenxu94 kezhenxu94 added the feature New feature label Aug 10, 2021
@kezhenxu94 kezhenxu94 added this to the 0.7.0 milestone Aug 10, 2021
@tom-pytel
Contributor

But I'm not sure whether the oap/v3/logs endpoint is meant for this kind of usage (it seems designed for fluent-bit batch reporting?). Reporting logs one by one over HTTP may not be ideal performance-wise. I'm also not sure whether the Java agent implements only a gRPC reporter intentionally, for this reason.

Not sure what the INTENDED usage of that endpoint was, but the same per-item inefficiency applies to how spans are currently sent one at a time to /v3/segment instead of being batched to /v3/segments; this should be optimized at some point. The hit is not that bad, though, since the HTTP protocol uses requests.Session, which should keep persistent connections, as per https://docs.python-requests.org/en/master/user/advanced/:

"The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection)."

In any case, if it works, even if it is not yet optimal, it is a step forward.
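To illustrate the connection-reuse point without an OAP, here is a stdlib-only sketch. It swaps requests/urllib3 for `http.client`, which exposes the same HTTP/1.1 keep-alive mechanism directly, and posts three logs one at a time to a toy local server (`LogSink`, a hypothetical stand-in for the OAP /v3/logs route). If the connection is reused, all three requests arrive from the same client port.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List, Set


class LogSink(BaseHTTPRequestHandler):
    """Toy local endpoint standing in for the OAP /v3/logs route (illustration only)."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 lets the connection stay open between requests
    client_ports: Set[int] = set()  # one distinct client port per TCP connection
    received: List[list] = []  # decoded JSON bodies, in arrival order

    def do_POST(self) -> None:
        length = int(self.headers["Content-Length"])
        LogSink.received.append(json.loads(self.rfile.read(length)))
        LogSink.client_ports.add(self.client_address[1])
        self.send_response(200)
        self.send_header("Content-Length", "0")  # tells the client the response body is empty
        self.end_headers()

    def log_message(self, *args) -> None:
        pass  # keep the demo quiet


def post_one_by_one(port: int, logs: List[dict]) -> None:
    """Send each log as its own request, reusing a single HTTPConnection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        for log in logs:
            conn.request("POST", "/v3/logs", body=json.dumps([log]),
                         headers={"Content-Type": "application/json"})
            conn.getresponse().read()  # drain the response before reusing the socket
    finally:
        conn.close()


server = ThreadingHTTPServer(("127.0.0.1", 0), LogSink)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
post_one_by_one(server.server_address[1], [{"text": f"line {i}"} for i in range(3)])
server.shutdown()
server.server_close()

print(len(LogSink.received), "requests over", len(LogSink.client_ports), "connection(s)")
```

Reuse saves a TCP handshake per request, but each log still costs a full request/response round trip, which is why batching still matters.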

def report(self, generator):
    for log_data in generator:
        json_string = json_format.MessageToJson(log_data)
        res = self.session.post(self.url_report, json=[json.loads(json_string)])
Contributor

Actually, looking at this, it seems the /v3/logs endpoint can take an array of logs, so a batch send could be done. Change to:

def report(self, generator):
    # MessageToDict yields dicts; MessageToJson yields strings, which would be double-encoded
    logs = [json_format.MessageToDict(log_data) for log_data in generator]
    res = self.session.post(self.url_report, json=logs)

Your call if you want to do it now.
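A minimal sketch of the difference between the two `report` variants above, assuming plain dicts stand in for the `MessageToDict` output (the LogData-style field names are illustrative) and using a hypothetical `RecordingSession` in place of `requests.Session`, so no OAP is needed:

```python
from json import dumps, loads
from typing import Any, Iterable, Iterator, List


class RecordingSession:
    """Hypothetical stand-in for requests.Session that records POST bodies instead of sending them."""

    def __init__(self) -> None:
        self.bodies: List[str] = []

    def post(self, url: str, json: Any = None) -> None:
        # requests serializes the `json=` argument to a JSON body; mimic that with json.dumps
        self.bodies.append(dumps(json))


def report_one_by_one(session: Any, url: str, logs: Iterable[dict]) -> None:
    for log in logs:
        session.post(url, json=[log])  # one HTTP request per log, each a one-element array


def report_batched(session: Any, url: str, logs: Iterable[dict]) -> None:
    batch = list(logs)  # drain the generator into a single JSON array
    if batch:  # assumption: skip the request entirely when there is nothing to send
        session.post(url, json=batch)


def sample_logs() -> Iterator[dict]:
    # Illustrative LogData-shaped dicts, i.e. roughly what MessageToDict would produce
    return ({"service": "demo", "body": {"text": {"text": f"line {i}"}}} for i in range(3))


one_by_one, batched = RecordingSession(), RecordingSession()
report_one_by_one(one_by_one, "http://oap/v3/logs", sample_logs())
report_batched(batched, "http://oap/v3/logs", sample_logs())
print(len(one_by_one.bodies), len(batched.bodies))  # requests sent by each variant
print(len(loads(batched.bodies[0])))  # logs carried by the single batched body
```

Both variants put the same log objects on the wire; batching only changes how many requests carry them.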

Member Author

Umm, should I add a new config entry that lets the user choose whether to batch or not?

Contributor

No, batch is the right way to go (assuming this endpoint does take arrays, which is what you should check).

Member Author

Got it, it does take arrays.

@kezhenxu94
Member

But I'm not sure whether the oap/v3/logs endpoint is meant for this kind of usage (it seems designed for fluent-bit batch reporting?).

Yes, it's the correct usage. The HTTP protocol is provided for languages that don't support gRPC (or find it hard to).

Reporting logs one by one over HTTP may not be ideal performance-wise. I'm also not sure whether the Java agent implements only a gRPC reporter intentionally, for this reason.

Yes. The Java agent can eliminate all possible side effects by shading the gRPC libs, but for Python we still need the HTTP protocol in case users' applications depend on a different (incompatible) gRPC package version. That's why we decided to implement the HTTP protocol in the Python agent from day one, to give users a second choice.
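One way to keep the HTTP path free of gRPC version conflicts is to import grpc lazily, only when the gRPC reporter is selected. The sketch below assumes a hypothetical `protocol` setting and hypothetical `HttpLogReporter`/`GrpcLogReporter` classes; it is not how the agent is actually wired. It only covers the runtime import; whether grpc gets installed at all is a packaging question this sketch does not address.

```python
import importlib
from typing import Any


class HttpLogReporter:
    """Hypothetical HTTP reporter: depends only on an HTTP client, never on grpc."""

    protocol = "http"


class GrpcLogReporter:
    """Hypothetical gRPC reporter: grpc is imported only when this class is instantiated."""

    protocol = "grpc"

    def __init__(self) -> None:
        # Deferred import: an app pinned to an incompatible grpc version never runs
        # this line unless it explicitly chooses the gRPC protocol.
        self.grpc: Any = importlib.import_module("grpc")


def make_log_reporter(protocol: str) -> Any:
    """Pick a reporter from a hypothetical `protocol` config value ("grpc" or "http")."""
    if protocol == "http":
        return HttpLogReporter()
    if protocol == "grpc":
        return GrpcLogReporter()
    raise ValueError(f"unsupported protocol: {protocol!r}")


reporter = make_log_reporter("http")  # works even when grpc is not installed
print(reporter.protocol)
```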

@Humbertzhang
Member

So far looks good to me, looking forward to your Kafka part, thank you.

@tom-pytel
Contributor

I would say merge this first then do Kafka as a separate PR.

@kezhenxu94
Member

I would say merge this first then do Kafka as a separate PR.

I agree. @Superskyyy, let's do one thing at a time, in a single PR.

@Superskyyy
Member Author

I would say merge this first then do Kafka as a separate PR.

I agree. @Superskyyy, let's do one thing at a time, in a single PR.

No problem, but let me add a commit for the batch reporting first. Then let's merge.

@Superskyyy Superskyyy changed the title Enable HTTP Kafka log reporting Enable HTTP log reporting Aug 12, 2021
@Superskyyy Superskyyy closed this Aug 12, 2021
@Superskyyy Superskyyy deleted the HTTP-Kafka-logging branch August 12, 2021 03:54
@Superskyyy Superskyyy restored the HTTP-Kafka-logging branch August 12, 2021 03:56
@Superskyyy
Member Author

Oops, messed up a bit.

@Superskyyy Superskyyy reopened this Aug 12, 2021
@kezhenxu94
Member

Oops, messed up a bit.

It's ok, we will squash the commits into one when merging

@kezhenxu94
Member

@Superskyyy ping me when it's ready to merge

Signed-off-by: YihaoChen <Superskyyy@outlook.com>
@Superskyyy Superskyyy marked this pull request as ready for review August 12, 2021 05:40
@Superskyyy
Member Author

@kezhenxu94 Checks done, ready to merge.

@kezhenxu94 kezhenxu94 merged commit 8039d8b into apache:master Aug 12, 2021
@kezhenxu94
Member

@kezhenxu94 Checks done, ready to merge.

Thank you @Superskyyy very much 🙇🏻 , excellent work!

@Superskyyy Superskyyy deleted the HTTP-Kafka-logging branch August 12, 2021 06:22
Labels

feature New feature

4 participants