CodeCutTech/dvc-demo

DVC Demo

A demonstration of Data Version Control (DVC) for versioning data and managing ML pipelines.

What is DVC?

DVC is an open-source version control system for machine learning projects. It helps you:

  • Version control large files, data sets, machine learning models, and metrics
  • Track ML experiments
  • Create reproducible ML pipelines
  • Collaborate with team members

Project Structure

.
├── data/              # Raw and processed data files
│   └── raw.dvc        # DVC pointer file that tracks the raw data
├── src/               # Source code for data processing and model training
├── config/            # Configuration files
├── .dvc/              # DVC internal files
├── dvc.yaml           # DVC pipeline definition
├── dvc.lock           # DVC lock file for reproducible pipelines
└── .dvcignore         # Files/directories to be ignored by DVC
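
To see how dvc.yaml ties these directories together, the pipeline can be inspected from the command line once DVC is installed (see Setup). The sketch below wraps two DVC commands in a helper; the function name `show_pipeline` is hypothetical and not part of this repository.

```shell
# Sketch: print the stages defined in dvc.yaml and how they depend on each other.
# Assumes `dvc` is on PATH and the current directory is the repository root.
show_pipeline() {
  dvc stage list || return 1   # one line per stage in dvc.yaml
  dvc dag        || return 1   # text graph of the stage dependencies
}
```

Call `show_pipeline` from the repository root; the output depends on the stages this project defines.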

Setup

  1. Install project dependencies using uv:
uv sync
  2. Pull the data from remote storage:
dvc pull
  3. Reproduce the pipeline (DVC reruns only the stages whose dependencies have changed):
dvc repro
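
The setup steps can also be chained so that a failure stops the sequence early. This is a minimal sketch, not a script shipped with the repository: the function name `setup_project` is hypothetical, and it assumes DVC is installed into the project environment by uv, which is why the DVC commands go through `uv run`. If DVC is installed globally, plain `dvc pull` and `dvc repro` work the same way.

```shell
# Sketch: run the Setup steps in order and stop at the first failure.
# Assumption: DVC is a project dependency, so it is invoked through `uv run`.
setup_project() {
  if ! command -v uv >/dev/null 2>&1; then
    echo "uv is not installed" >&2
    return 1
  fi
  uv sync          || return 1   # create or update the project environment
  uv run dvc pull  || return 1   # download DVC-tracked data from the configured remote
  uv run dvc repro || return 1   # rerun pipeline stages that are out of date
}
```

Run `setup_project` from the repository root. `dvc pull` needs read access to the remote configured for this project.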

Version Control

  • Track data files: dvc add <file>
  • Push data to remote storage: dvc push
  • Pull data from remote storage: dvc pull
  • Check status: dvc status
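
These commands are usually combined with Git: `dvc add` writes a small `.dvc` pointer file and a `.gitignore` entry next to the data, and those two files are what gets committed. The sketch below shows one way to chain the steps. The function name `track_data` and the commit message are hypothetical, and `data/raw` in the usage line is only a guess based on the `raw.dvc` entry in the project tree.

```shell
# Sketch: track a path with DVC, commit its pointer file to Git, then upload the data.
# Usage: track_data <path>    e.g. track_data data/raw   (path is an assumption)
# Assumes `dvc` and `git` are on PATH and a DVC remote is already configured.
track_data() {
  target="${1:-}"
  if [ -z "$target" ]; then
    echo "usage: track_data <path>" >&2
    return 2
  fi

  dvc add "$target" || return 1                                     # writes "$target.dvc" and updates .gitignore
  git add "$target.dvc" "$(dirname "$target")/.gitignore" || return 1
  git commit -m "Track $target with DVC" || return 1                # Git stores only the pointer file
  dvc push || return 1                                              # the data itself goes to the DVC remote
}
```

Collaborators who check out that commit can then fetch the matching data with `dvc pull`.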
