[Website] A journey with Apache Arrow - Part 1 - POST #340
Thanks @lquerel -- this looks great. I hope to review this PR sometime this week.
@@ -0,0 +1,382 @@
---
layout: post
title: "A journey with Apache Arrow (part 1)"
How about adding something related to OpenTelemetry to the title?
https://arrow.apache.org/ is the Apache Arrow web site, so all of its content is related to Apache Arrow. In that context, the current title may be too generic. Adding some context to the title may help readers.
I think this is a great idea -- maybe something like "Storing tracing, metrics and logs efficiently with OpenTelemetry and Apache Arrow" 🤔
Thank you @lquerel -- I read this entire article carefully and found it well done, informative, and a great primer on some of the aspects to consider when mapping data to the Arrow model.
I left some suggestions that I think would help strengthen the piece, but I don't think any of them are necessary to publish this article.
Unless there are objections, I plan to merge this PR (which will publish the article) next week (April 11, 2023), so that anyone else who would like to review it, or to request more time to do so, has sufficient time.
Again, thank you very much.
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Looking very nice 👌
@@ -24,7 +24,7 @@ limitations under the License.
{% endcomment %}
-->

-Apache Arrow is a technology widely adopted in big data, analytics, and machine learning applications. This article discusses our journey with Arrow, specifically its application to telemetry, and the challenges we encountered while optimizing the OpenTelemetry protocol to significantly reduce bandwidth costs. The promising results we achieved inspired us to share our insights. This article specifically focuses on transforming data from an XYZ format into an efficient Arrow representation that optimizes both compression ratio, transport, and data processing. Our benchmarks thus far have shown promising results, with compression ratio improvements ranging from 1.5x to 6x, depending on the data type (metrics, logs, traces) and distribution. The approaches presented for addressing these challenges may be applicable to other Arrow domains as well. This article serves as the first installment in a two-part series.
+Apache Arrow is a technology widely adopted in big data, analytics, and machine learning applications. In this article, we share F5's experience with Arrow, specifically its application to telemetry, and the challenges we encountered while optimizing the OpenTelemetry protocol to significantly reduce bandwidth costs. The promising results we achieved inspired us to share our insights. This article specifically focuses on transforming relatively complex data structure from various formats into an efficient Arrow representation that optimizes both compression ratio, transport, and data processing. We also explore the trade-offs between different mapping and normalization strategies, as well as the nuances of streaming and batch communication using Arrow and Arrow Flight. Our benchmarks thus far have shown promising results, with compression ratio improvements ranging from 1.5x to 6x, depending on the data type (metrics, logs, traces) and distribution. The approaches presented for addressing these challenges may be applicable to other Arrow domains as well. This article serves as the first installment in a two-part series.
👍
Edit: I see the date is set to 4/11 -- thus I will plan to publish it tomorrow.
Thanks.
Looks good to me -- thanks again @lquerel
The content is live: https://arrow.apache.org/blog/2023/04/11/our-journey-at-f5-with-apache-arrow-part-1/
Thank you all for your reviews and suggestions. I will follow the same process for the second article.
--
Laurent Quérel
This PR is a markdown version of the article proposed on the mailing list (see https://lists.apache.org/thread/jxpypxwjh4jhpk2xvj0z3woy7yr0z0sk).
The `author` field currently contains my full name and not the `apacheId` because I was unable to get `jekyll` to take into account the change I made in `contributors.yml`.