Design a complete data structure that includes file and artifacts information, can be further composed, and can then be searched #59

Closed
2 of 3 tasks
web3nomad opened this issue Apr 25, 2024 · 3 comments
Labels
feature New features

Comments

@web3nomad
Member

web3nomad commented Apr 25, 2024

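  • Define the format of artifacts.json, which holds the various intermediate results of processing a file; the format must remain stable across subsequent versions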

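Next steps

  • Make artifacts composable
  • Make artifacts searchable (for example, by also including the information stored in sqlite and qdrant), then build a database similar to sanity.

However, the two points above 👆 may not be the responsibility of artifacts; a new data structure should probably be defined for them.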

@web3nomad web3nomad added the feature New features label Apr 25, 2024
@web3nomad
Member Author

web3nomad commented May 28, 2024

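A file's artifacts (except thumbnail.jpg) are recorded in a single artifacts.json.
That file contains the model selected for each task and the results each task produced under each model.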

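When artifacts are shared elsewhere, the model selection can be used to determine which tasks need to be re-triggered and which results can be reused directly.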
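The reuse decision described above could look roughly like the following. This is only a minimal sketch, not the project's implementation: the function name `split_tasks` and the model name `some-other-caption-model` are made up for illustration, and it assumes result keys start with the model name followed by a colon, as in the example artifacts.json in this comment.

```python
def split_tasks(artifacts, wanted_models):
    """Split tasks into (reusable, to_rerun) for the given model selection.

    Hypothetical helper: `artifacts` is a parsed artifacts.json and
    `wanted_models` maps task name -> model name on the receiving side.
    """
    reusable, to_rerun = [], []
    results = artifacts.get("results", {})
    for task, model in wanted_models.items():
        task_results = results.get(task, {})
        # A task counts as reusable when some result key starts with "<model>:".
        # Tasks without a model use "" and therefore match the key ":".
        if any(key.split(":", 1)[0] == model for key in task_results):
            reusable.append(task)
        else:
            to_rerun.append(task)
    return reusable, to_rerun


# Trimmed-down copy of the shared artifacts.json from this comment.
shared = {
    "results": {
        "audio": {
            ":": {
                "dir": "audio-3c8c7880-aa6c-466f-973b-6e0b93854824",
                "files": ["audio.wav"],
            }
        },
        "transcript": {
            "whisper-small:audio-3c8c7880-aa6c-466f-973b-6e0b93854824": {
                "dir": "transcript-df1e28ac-b50c-475c-8d53-d897dbc480cf",
                "files": ["output.json"],
            }
        },
    }
}

# Assumed receiving library: same transcript model, a different caption model.
local_models = {
    "audio": "",
    "transcript": "whisper-small",
    "frame-caption": "some-other-caption-model",
}
reusable, to_rerun = split_tasks(shared, local_models)
print(reusable, to_rerun)
```

Note that this sketch only compares model names. It does not check whether a result's input directory still matches the current upstream output, which a real implementation would probably also need.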

Example artifacts.json

{
    // The model selected for each task (updated whenever the library settings change)
    "models": {
        "transcript-embedding": "stella-base-zh-v3-1792d",
        "frame-caption-embedding": "stella-base-zh-v3-1792d",
        "audio": "",
        "transcript": "whisper-small",
        "frame-caption": "blip-base",
        "frame": "",
        "frame-content-embedding": "clip-multilingual-v1"
    },
    // The results of each task under each model
    "results": {
        // For the task frame-content-embedding
        "frame-content-embedding": {
            // Key naming rule: {model name}:{input directory name}[:{other input directory names}]
            "clip-multilingual-v1:frames": {
                // Output directory
                "dir": "frame-content-embedding-3623e254-90bd-4586-a2cf-c24da23fdd48",
                // The files in that directory
                "files": [
                "files": [
                    "0.json",
                    "4000.json",
                    "1000.json",
                    "2000.json",
                    "3000.json"
                ]
            }
        },
        "frame-caption": {
            "blip-base:frames": {
                "dir": "frame-caption-e886b05b-6589-4129-ab6e-9becbce7d6ba",
                "files": [
                    "0.json",
                    "4000.json",
                    "1000.json",
                    "2000.json",
                    "3000.json"
                ]
            }
        },
        "transcript-embedding": {
            "stella-base-zh-v3-1792d:transcript-df1e28ac-b50c-475c-8d53-d897dbc480cf": {
                "dir": "transcript-embedding-1c011476-9b13-474f-9b96-7c063a0e6d1f",
                "files": [
                    "0-4020.json"
                ]
            }
        },
        "transcript": {
            "whisper-small:audio-3c8c7880-aa6c-466f-973b-6e0b93854824": {
                "dir": "transcript-df1e28ac-b50c-475c-8d53-d897dbc480cf",
                "files": [
                    "output.json"
                ]
            }
        },
        "audio": {
            ":": {
                "dir": "audio-3c8c7880-aa6c-466f-973b-6e0b93854824",
                "files": [
                    "audio.wav"
                ]
            }
        },
        "frame": {
            ":": {
                "dir": "frames",
                "files": [
                    "4000.jpg",
                    "2000.jpg",
                    "3000.jpg",
                    "1000.jpg",
                    "0.jpg"
                ]
            }
        },
        "frame-caption-embedding": {
            "stella-base-zh-v3-1792d:frame-caption-e886b05b-6589-4129-ab6e-9becbce7d6ba": {
                "dir": "frame-caption-embedding-ffb246a5-7601-4675-ac34-8d4a5a2f6b5c",
                "files": [
                    "0.json",
                    "4000.json",
                    "1000.json",
                    "2000.json",
                    "3000.json"
                ]
            }
        }
    }
}
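The key naming rule in the example, `{model name}:{input directory name}[:{other input directory names}]`, can be sketched as a pair of helpers plus a lookup that follows the current model selection. The helper names (`make_result_key`, `parse_result_key`, `current_result`) are hypothetical, and the sketch assumes that neither model names nor directory names contain a colon.

```python
def make_result_key(model, *input_dirs):
    """Build a result key: model name, then each input directory, joined by ':'."""
    # A task with no model and no inputs yields ":" as in the example,
    # so keep at least one (possibly empty) input slot.
    return ":".join([model, *(input_dirs or ("",))])


def parse_result_key(key):
    """Split a result key back into (model, [input_dirs])."""
    model, *input_dirs = key.split(":")
    return model, [d for d in input_dirs if d]


def current_result(artifacts, task, *input_dirs):
    """Look up the result of `task` for the model currently selected in `models`."""
    model = artifacts["models"].get(task, "")
    key = make_result_key(model, *input_dirs)
    return artifacts["results"].get(task, {}).get(key)


# Trimmed-down copy of the example above.
artifacts = {
    "models": {"audio": "", "transcript": "whisper-small"},
    "results": {
        "audio": {
            ":": {
                "dir": "audio-3c8c7880-aa6c-466f-973b-6e0b93854824",
                "files": ["audio.wav"],
            }
        },
        "transcript": {
            "whisper-small:audio-3c8c7880-aa6c-466f-973b-6e0b93854824": {
                "dir": "transcript-df1e28ac-b50c-475c-8d53-d897dbc480cf",
                "files": ["output.json"],
            }
        },
    },
}

# Follow the chain: the audio output directory is the transcript task's input.
audio = current_result(artifacts, "audio")
transcript = current_result(artifacts, "transcript", audio["dir"])
print(transcript["dir"])
```

Because each key embeds the upstream output directory, a downstream result only matches while both its model and its inputs are unchanged, which appears to be what makes selective reuse possible.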

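@web3nomad web3nomad changed the title from "A complete data structure is needed, e.g. one that includes file and artifacts, can be further composed, and can then be searched" to "Design a complete data structure that includes file and artifacts information, can be further composed, and can then be searched" May 28, 2024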
@web3nomad
Member Author

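If the thumbnail format is to be changed to webp (#72),
artifacts must record the thumbnail image so that multiple thumbnail formats can coexist.

Recording the thumbnail in artifacts is reasonable anyway, so we might as well add it.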
@zhuojg
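One possible way to record the thumbnail, sketched purely as an assumption: treat it as another model-less task with the key ":" (like audio and frame in the example artifacts.json), and fall back to thumbnail.jpg when no entry exists. The task name "thumbnail", the entry layout, and the helper `thumbnail_files` are all hypothetical and not taken from the issue or the linked PRs.

```python
DEFAULT_THUMBNAIL = "thumbnail.jpg"  # the file name used before thumbnails were recorded


def thumbnail_files(artifacts):
    """Return the recorded thumbnail file names, or the legacy default.

    Assumes a hypothetical "thumbnail" task stored under the model-less key ":".
    """
    entry = artifacts.get("results", {}).get("thumbnail", {}).get(":")
    if not entry or not entry.get("files"):
        # Older artifacts.json without a thumbnail entry: assume the jpg exists.
        return [DEFAULT_THUMBNAIL]
    return list(entry["files"])


# Artifacts written before the change: nothing recorded.
old_artifacts = {"results": {}}

# Artifacts where both formats coexist (hypothetical entry layout).
new_artifacts = {
    "results": {
        "thumbnail": {":": {"files": ["thumbnail.webp", "thumbnail.jpg"]}}
    }
}

print(thumbnail_files(old_artifacts))
print(thumbnail_files(new_artifacts))
```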

@web3nomad
Member Author

The two PRs, content base #84 and surrealdb #94, have already addressed this requirement.
