[Website] A journey with Apache Arrow - Part 1 - POST #340

lquerel · 2023-04-03T01:05:36Z

This PR is a markdown version of the article proposed on the mailing list (see https://lists.apache.org/thread/jxpypxwjh4jhpk2xvj0z3woy7yr0z0sk).

The author field currently contains my full name rather than the apacheId because I was unable to get Jekyll to pick up the change I made in contributors.yml.

alamb · 2023-04-03T12:50:35Z

Thanks @lquerel -- this looks great. I hope to review this PR sometime this week

kou · 2023-04-03T20:41:14Z

_posts/2023-04-02-a-journey-with-apache-arrow-part-1.md

@@ -0,0 +1,382 @@
+---
+layout: post
+title: "A journey with Apache Arrow (part 1)"


How about adding something related to OpenTelemetry to the title?
https://arrow.apache.org/ is the Apache Arrow website, so all of its content is related to Apache Arrow. In that context, the current title may be too generic. Adding some context to the title may help readers.


alamb

Thank you @lquerel -- I read the entire article carefully and found it well done, informative, and a great primer on some of the aspects to consider when mapping data to the Arrow model.

I left some suggestions that I think would help strengthen the piece, but I don't think any of them are necessary to publish this article.

Unless there are objections, I plan to merge this PR (which will publish the article) next week (April 11, 2023), so that anyone else who would like to review it, or request more time to do so, has sufficient time.

Again, thank you very much.


alamb · 2023-04-04T18:11:53Z

_posts/2023-04-02-a-journey-with-apache-arrow-part-1.md

@@ -0,0 +1,382 @@
+---
+layout: post
+title: "A journey with Apache Arrow (part 1)"


I think this is a great idea -- maybe something like "Storing tracing, metrics and logs efficiently with OpenTelemetry and Apache Arrow" 🤔


alamb

Looking very nice 👌

alamb · 2023-04-05T12:49:35Z

_posts/2023-04-02-a-journey-with-apache-arrow-part-1.md

@@ -24,7 +24,7 @@ limitations under the License.
 {% endcomment %}
 -->

-Apache Arrow is a technology widely adopted in big data, analytics, and machine learning applications. This article discusses our journey with Arrow, specifically its application to telemetry, and the challenges we encountered while optimizing the OpenTelemetry protocol to significantly reduce bandwidth costs. The promising results we achieved inspired us to share our insights. This article specifically focuses on transforming data from an XYZ format into an efficient Arrow representation that optimizes both compression ratio, transport, and data processing. Our benchmarks thus far have shown promising results, with compression ratio improvements ranging from 1.5x to 6x, depending on the data type (metrics, logs, traces) and distribution. The approaches presented for addressing these challenges may be applicable to other Arrow domains as well. This article serves as the first installment in a two-part series.
+Apache Arrow is a technology widely adopted in big data, analytics, and machine learning applications. In this article, we share F5's experience with Arrow, specifically its application to telemetry, and the challenges we encountered while optimizing the OpenTelemetry protocol to significantly reduce bandwidth costs. The promising results we achieved inspired us to share our insights. This article specifically focuses on transforming relatively complex data structure from various formats into an efficient Arrow representation that optimizes both compression ratio, transport, and data processing. We also explore the trade-offs between different mapping and normalization strategies, as well as the nuances of streaming and batch communication using Arrow and Arrow Flight. Our benchmarks thus far have shown promising results, with compression ratio improvements ranging from 1.5x to 6x, depending on the data type (metrics, logs, traces) and distribution. The approaches presented for addressing these challenges may be applicable to other Arrow domains as well. This article serves as the first installment in a two-part series.


alamb · 2023-04-10T13:21:29Z

~~I plan to update the date on this post and publish it shortly~~

Edit: I see the date is set to 4/11 -- thus I will plan to publish it tomorrow

lquerel · 2023-04-10T17:35:14Z

> I see the date is set to 4/11 -- thus I will plan to publish it tomorrow

Thanks.
FYI, I have updated the performance section to reflect my latest tests.

alamb

Looks good to me -- thanks again @lquerel

alamb · 2023-04-11T11:28:27Z

The content is live: https://arrow.apache.org/blog/2023/04/11/our-journey-at-f5-with-apache-arrow-part-1/

lquerel · 2023-04-11T14:25:25Z

Thank you all for your reviews and suggestions. I will follow the same process for the second article.


-- Laurent Quérel

lquerel added 4 commits April 2, 2023 16:54

add blog post A journey with Apache Arrow - Part 1

3fa6346

update blog post A journey with Apache Arrow - Part 1

2dc8e46

update blog post A journey with Apache Arrow - Part 1

f431617

update blog post A journey with Apache Arrow - Part 1

7481537

update blog post A journey with Apache Arrow - Part 1

1e55a78

kou reviewed Apr 3, 2023


alamb approved these changes Apr 4, 2023


lquerel and others added 5 commits April 4, 2023 15:52

Update _posts/2023-04-02-a-journey-with-apache-arrow-part-1.md

7a92130

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

update categories arrow -> application

3931578

update intro based @alamb feedback

f06490d

add F5's link

f7eb353

rename file with new title and correct date

0a45577

alamb reviewed Apr 5, 2023


fix typos and grammatical issues

42d4c7d

update performance results

e551289

alamb approved these changes Apr 11, 2023


alamb merged commit fd7fc44 into apache:main Apr 11, 2023

zeroshade mentioned this pull request Apr 12, 2023

Add CloudQuery to powered by arrow #343

Merged
