Open
Labels: enhancement (new feature or request), epic (large feature to be broken down into user stories)
Description
User Story
When creating a new dataset, the user can pick from a list of types. The built-in type is the current one and stores everything locally. The new type is an S3 bucket mirror. The user provides S3 credentials as part of the dataset creation step. After that, the dataset functions like the built-in type, except that users cannot add, remove, or edit files and folders (maybe we can support this in the future?).
Implementation
An S3 bucket dataset has the following attributes:
- Files and folders are not stored locally, but in the S3 bucket.
- Add a webhook to Clowder 2 that receives S3 events. When things are modified in the bucket, Clowder 2 can update its state.
- The alternative is to always query the bucket live.
- Files and folders need to be extended to support the concept of remote versions.
- S3 bucket credentials are provided by the user when creating a dataset. Credentials are kept in a vault (for example HashiCorp Vault / OpenBao).
- Adding local files and folders is disabled. Users can still add metadata and run extractors on the files in the bucket.
- File versions are ignored? Updated through events?
- Download endpoints will stream data from the S3 bucket (phase 1). We could eventually have extractors download directly from S3 buckets (phase 2).
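To make the webhook option more concrete, here is a minimal sketch of how an incoming S3 event notification could be applied to Clowder 2's view of the bucket. The payload fields (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`, and so on) follow the standard S3 event notification format. The function name `apply_s3_event` and the in-memory `index` dict are assumptions for illustration only; a real implementation would write to the database instead.

```python
from urllib.parse import unquote_plus


def apply_s3_event(index: dict, event: dict) -> dict:
    """Apply one S3 event notification payload to an in-memory file index.

    `index` maps (bucket, key) -> metadata and stands in for the state
    Clowder 2 would keep in its database (an assumption for this sketch).
    """
    for record in event.get("Records", []):
        event_name = record["eventName"]  # e.g. "ObjectCreated:Put"
        bucket = record["s3"]["bucket"]["name"]
        obj = record["s3"]["object"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(obj["key"])

        if event_name.startswith("ObjectCreated:"):
            index[(bucket, key)] = {
                "size": obj.get("size"),
                "etag": obj.get("eTag"),
                # Only present when bucket versioning is enabled.
                "version_id": obj.get("versionId"),
            }
        elif event_name.startswith("ObjectRemoved:"):
            index.pop((bucket, key), None)
        # Other event types (restore, replication, ...) are ignored here.
    return index
```

Keeping the handler a pure function of the payload would also make it easy to replay missed events, or to reuse the same logic when reconciling against a live bucket listing.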
Updates to models
- The dataset model stays the same, apart from a new type field. The type drives everything about the dataset's behavior.
- File already includes StorageType. Folder does not.
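As a rough illustration of "type drives everything", the sketch below adds a type field to a dataset and derives the "can files be changed?" rule from it. All names here (`DatasetType`, its values, `Dataset`, `allows_file_changes`) are hypothetical and use plain dataclasses; they are not the existing Clowder 2 models.

```python
from dataclasses import dataclass
from enum import Enum


class DatasetType(str, Enum):
    # Hypothetical values; the actual names are still to be decided.
    BUILT_IN = "built-in"
    S3_BUCKET = "s3-bucket"


@dataclass
class Dataset:
    name: str
    # Existing datasets would default to the current, local behavior.
    type: DatasetType = DatasetType.BUILT_IN


def allows_file_changes(dataset: Dataset) -> bool:
    """Only built-in datasets accept add/remove/edit of files and folders."""
    return dataset.type is DatasetType.BUILT_IN
```

Defaulting the field to the built-in type means existing datasets would need no migration beyond the new field. Folders could derive their storage location from the parent dataset's type rather than gaining their own StorageType, though that is a design choice left open here.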
Other Thoughts
- If at any point the credentials stop working, the system should send a clear message to the user and allow them to update credentials.
- When running an extractor on a file, the extractor could download directly from the bucket. This will require more development.
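For the credentials case, one possible approach is to map the error code returned by S3 to a user-facing message. The codes listed below are standard S3 error codes; with boto3 they are typically read from `exc.response["Error"]["Code"]` on a `ClientError`. The function name and message wording are assumptions, and `AccessDenied` can also indicate a bucket policy problem rather than bad credentials.

```python
from typing import Optional

# S3 error codes that usually mean the stored credentials no longer work.
CREDENTIAL_ERROR_CODES = {
    "InvalidAccessKeyId",
    "SignatureDoesNotMatch",
    "ExpiredToken",
    "AccessDenied",
}


def credential_problem_message(error_code: str, bucket: str) -> Optional[str]:
    """Return a message asking the user to update credentials, or None.

    None means the error is not credential-related and should be
    handled by the normal error path.
    """
    if error_code in CREDENTIAL_ERROR_CODES:
        return (
            f"Clowder could not access bucket '{bucket}' ({error_code}). "
            "Please update the S3 credentials for this dataset."
        )
    return None
```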