Data pipelines #57

Closed
blairdrummond opened this issue May 14, 2020 · 3 comments
Labels
component/kubeflow Kubeflow Related component/storage Persistence related (e.g. Minio, cloud, or user storage)

Comments

@blairdrummond
Contributor

  • Figure out best practices for building pipelines
  • How do users create containers (or avoid needing them at all)?
  • How can data be passed effectively between containers?
  • Establish patterns for ingress/egress of data from pipelines to/from storage.

This is just a starting list of objectives, but the overall goal of the epic is to make kf-pipelines and/or Pachyderm a polished experience for data scientists building pipelines.

@blairdrummond blairdrummond added component/kubeflow Kubeflow Related component/storage Persistence related (e.g. Minio, cloud, or user storage) size/L 4-5 days component/pachyderm labels May 14, 2020
@chritter

chritter commented May 15, 2020

@blairdrummond Good idea to open these points for discussion. Could you schedule a discussion in Teams or Slack audio?

@brendangadd brendangadd removed the size/L 4-5 days label May 27, 2020
@brendangadd brendangadd changed the title [Epic] Kf pipelines and/or Pachyderm workflow [Epic] Data pipelines Jun 24, 2020
@brendangadd
Contributor

With Pachyderm's change to a non-open-source licence, we need to evaluate alternatives. Key features we're looking for in data pipelines:

  • Orchestration (schedule, trigger)
  • Parallelization (multi-container)
  • Historical metadata (what ran when with what parameters)
  • Data snapshotting/lineage (what data was consumed/produced, maybe diffed)
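To make the last two points concrete, here is a minimal sketch of the kind of record a pipeline tool could keep per step. It is not taken from Pachyderm, Kubeflow Pipelines, or any other tool; `record_run`, `RunRecord`, `changed_outputs`, and the JSON-lines log are hypothetical names chosen for illustration. It assumes inputs and outputs are local files and uses SHA-256 content hashes as a cheap stand-in for real data snapshots.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a file, used here as a cheap snapshot identifier."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class RunRecord:
    """Hypothetical metadata for one step: what ran, when, with what, on what data."""
    step: str
    recorded_at: str
    params: dict
    inputs: dict = field(default_factory=dict)   # path -> sha256
    outputs: dict = field(default_factory=dict)  # path -> sha256


def record_run(step, params, inputs, outputs, log_path) -> RunRecord:
    """Hash the consumed/produced files and append one JSON line to the run log."""
    record = RunRecord(
        step=step,
        recorded_at=datetime.now(timezone.utc).isoformat(),
        params=dict(params),
        inputs={str(p): sha256_of(Path(p)) for p in inputs},
        outputs={str(p): sha256_of(Path(p)) for p in outputs},
    )
    # Append-only: earlier history is never rewritten.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
    return record


def changed_outputs(old: RunRecord, new: RunRecord) -> list:
    """A crude 'diff': outputs whose content changed between two runs."""
    return sorted(p for p, h in new.outputs.items() if old.outputs.get(p) != h)
```

A real tool would store snapshots rather than just hashes, so that the data itself can be retrieved and diffed, but the record shape (step, time, parameters, inputs, outputs) covers "what ran when with what parameters" and "what data was consumed/produced".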

@ca-scribner
Contributor

From @justbert: https://dvc.org/blog/cml-release looks interesting.

@brendangadd brendangadd added the kind/epic An epic label May 20, 2021
@brendangadd brendangadd changed the title [Epic] Data pipelines Data pipelines Apr 6, 2022
@brendangadd brendangadd removed the kind/epic An epic label Apr 6, 2022
@wg102 wg102 mentioned this issue Jul 12, 2022