@andreyvelich
Contributor

Which issue does this PR close?

No issue

Rationale for this change

Hi Folks, thanks for driving DataFusion forward!

We've recently released support for a distributed data cache in Kubeflow Trainer.
It allows users to stream massive datasets directly to distributed training nodes and optimize GPU utilization.

Docs and public talks are available in this guide: https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/

I've added Kubeflow Trainer to the list of known DataFusion users.

cc @akshaychitneni @bigsur0 @comphead @andygrove

What changes are included in this PR?

Update DataFusion docs to include Kubeflow Trainer.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the documentation label Nov 25, 2025
Contributor

@comphead comphead left a comment

Thanks @andreyvelich, LGTM.
Pending formatting CI.

@comphead
Contributor

@alamb cc

@alamb
Contributor

alamb commented Nov 25, 2025

Thanks @andreyvelich and @comphead

@comphead comphead added this pull request to the merge queue Nov 25, 2025
Merged via the queue into apache:main with commit 0871741 Nov 25, 2025
5 checks passed
@andreyvelich andreyvelich deleted the add-kubeflow-trainer branch November 25, 2025 22:13