Using DataPackages for Daytona.io samples management #1564

@s-celles

Hi,

DataPackages provide a standardized format to describe and share collections of data using JSON metadata. This specification includes schemas for data structure and validation rules, making it ideal for managing development environment samples.

https://frictionlessdata.io/
https://datapackage.org/

Basic Sample Management

A simple DataPackage can list all Daytona samples with their essential metadata like name, description, and repository URL. This approach provides consistent structure and validation while enabling automated tooling integration.

Key benefits include:

  • Standardized sample metadata
  • Built-in validation
  • Tooling support
  • Clear documentation

External Index References

A DataPackage resource can point to a remote file by URL through its "path" property, so a main index can reference sample indexes that are hosted and maintained elsewhere. This enables:

Distribution Benefits

  • Teams maintain their own sample indexes
  • Independent versioning per language/framework
  • Reduced coordination overhead
  • Selective loading of sample collections

Management Benefits

  • Modular organization by language
  • Simplified updates
  • Improved maintainability
  • Team autonomy
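To make the external reference idea more concrete, here is a rough Python sketch of how a client could resolve a main index into a flat list of samples. The names `load_samples` and `fetch_json` are hypothetical and not part of Daytona or any Data Package library. It also assumes each referenced file is itself an index with its own "resources" list, which is a convention this proposal would have to define.

```python
import json
from typing import Callable, Iterable, Optional
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Default fetcher: download a URL and parse the body as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def load_samples(
    index: dict,
    only: Optional[Iterable[str]] = None,
    fetch: Callable[[str], dict] = fetch_json,
) -> list:
    """Collect sample entries from every resource of an index.

    Resources with inline "data" are used directly. Resources with a
    "path" are fetched and treated as nested indexes (an assumption of
    this sketch). `only` restricts loading to the named top-level
    resources, so unneeded collections are never downloaded.
    """
    wanted = None if only is None else set(only)
    samples = []
    for resource in index.get("resources", []):
        if wanted is not None and resource.get("name") not in wanted:
            continue
        if "data" in resource:
            samples.extend(resource["data"])
        elif "path" in resource:
            nested = fetch(resource["path"])
            # Nested indexes are loaded in full; the filter applies
            # only to the top-level resource names.
            samples.extend(load_samples(nested, fetch=fetch))
    return samples
```

Passing `only={"python-samples"}` would fetch just the Python index, which is the selective loading mentioned above. The sketch does not guard against indexes that reference each other in a cycle, so a real implementation would need a depth limit or a visited set.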

Implementation Plan

  1. Define base schema for sample metadata
  2. Add support for external index references
  3. Create validation tooling
  4. Update documentation and workflows
  5. Migrate existing samples
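Steps 1 and 3 could start from something like the sketch below. The schema is written in the style of a Table Schema descriptor (a "fields" list with "constraints"), but the choice of required fields is my assumption, and `validate_samples` is a hypothetical helper that checks only the few rules shown here. It is not a full Table Schema validator.

```python
# Hypothetical base schema for one sample entry, in Table Schema style.
# Which fields are required is an assumption made for illustration.
SAMPLE_SCHEMA = {
    "fields": [
        {"name": "name", "type": "string", "constraints": {"required": True}},
        {"name": "description", "type": "string"},
        {
            "name": "gitUrl",
            "type": "string",
            "format": "uri",
            "constraints": {"required": True},
        },
    ]
}


def validate_samples(rows, schema=SAMPLE_SCHEMA):
    """Return a list of readable problems; an empty list means valid."""
    errors = []
    for i, row in enumerate(rows):
        for field in schema["fields"]:
            name = field["name"]
            required = field.get("constraints", {}).get("required", False)
            value = row.get(name)
            if value is None or value == "":
                if required:
                    errors.append(f"row {i}: missing required field '{name}'")
                continue
            if field["type"] == "string" and not isinstance(value, str):
                errors.append(f"row {i}: field '{name}' must be a string")
            elif field.get("format") == "uri" and not value.startswith(
                ("http://", "https://")
            ):
                # Simplified URI check: only http(s) URLs are accepted.
                errors.append(f"row {i}: field '{name}' must be an http(s) URL")
    return errors
```

With this schema, the Python entry from the basic sample index passes, while the framework entries that carry no "gitUrl" would be reported as incomplete. Whether "gitUrl" should really be mandatory is one of the metadata questions to settle.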

Discussion Points

  • Validation across referenced indexes
  • Version compatibility handling
  • Caching and availability strategy
  • Additional metadata requirements

Examples

  1. Basic Sample Index
{
  "name": "daytona-base-samples",
  "resources": [{
    "name": "base",
    "data": [
      {
        "name": "Python",
        "description": "Develop Python applications.",
        "gitUrl": "https://github.com/daytonaio/sample-python"
      }
    ]
  }]
}
  2. Organized by Framework
{
  "name": "daytona-python-samples",
  "resources": [{
    "name": "python-frameworks",
    "data": [
      {
        "name": "Python/Flask - AI Playlist Generator",
        "type": "flask",
        "description": "Generates playlists based on user emotions"
      },
      {
        "name": "Python/Django - CrisisMonitor", 
        "type": "django",
        "description": "Natural disaster tracking dashboard"
      }
    ]
  }]
}
  3. External Reference Pattern
{
  "name": "daytona-main-index",
  "resources": [
    {
      "name": "python-samples",
      "path": "https://raw.githubusercontent.com/org/python-samples/index.json"
    },
    {
      "name": "nodejs-samples", 
      "path": "https://raw.githubusercontent.com/org/nodejs-samples/index.json"
    }
  ]
}

The structure allows for organized sample discovery and distributed maintenance.

Any opinions?
