Other playable models-Text2Image #1

Open
Wulx2050 opened this issue Jun 24, 2022 · 3 comments

@Wulx2050

Wulx2050 commented Jun 24, 2022

Playable models

  1. dalle-mini & craiyon
    https://github.com/borisdayma/dalle-mini

  2. CogView2
    https://github.com/THUDM/CogView2

To be added


No pretrained models

  1. imagen
    https://github.com/lucidrains/imagen-pytorch

  2. Wenxin (文心) ERNIE-ViLG
    https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/

To be added

@HighCWu
Collaborator

HighCWu commented Jun 24, 2022

If we have enough time, we will try to migrate these models. However, I hope Baidu will officially release an open-source text-to-image model on PaddlePaddle.
I also know of a popular model trained by Tsinghua University, although it is also a PyTorch implementation.
CogView2: https://github.com/THUDM/CogView2

@Wulx2050
Author

Wulx2050 commented Jun 25, 2022

> If we have enough time, we will try to migrate these models. However, I hope Baidu will officially release an open-source text-to-image model on PaddlePaddle. I also know of a popular model trained by Tsinghua University, although it is also a PyTorch implementation. CogView2: https://github.com/THUDM/CogView2

I just looked it up: the text-to-image capability of Wenxin ERNIE-ViLG was evaluated on MS-COCO, an open-domain public dataset. The evaluation metric is FID (lower is better). In both the zero-shot and fine-tuned settings, Wenxin ERNIE-ViLG achieved the best results, far surpassing models such as DALL-E released by OpenAI. They provide an entry point for trying the ERNIE-ViLG API. Maybe you could contact the author team and ask them for the pretrained model?
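For reference, FID (Fréchet Inception Distance) is usually defined as below. This is the standard textbook form, not something taken from the ERNIE-ViLG page; the symbols are assumptions for illustration: $\mu_r, \Sigma_r$ are the mean and covariance of Inception features of real images, and $\mu_g, \Sigma_g$ are the same statistics for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Both terms are non-negative and vanish when the two feature distributions have identical mean and covariance, which is why a lower value indicates generated images that are statistically closer to real ones.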

Wenxin (文心) ERNIE-ViLG
https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/
Paper:
https://arxiv.org/pdf/2112.15283.pdf

@Wulx2050
Author

Another project with code and models

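  • ERNIE-SAT
    Category: Wenxin cross-modal large model
    Applications: speech editing, speech generation, voice cloning, and speech-to-speech translation with voice cloning

ERNIE-SAT is pretrained on Chinese and English datasets using joint speech-text training. This lets the model learn the alignment between speech and text, so it generates more accurate spectrograms and higher-quality synthesized audio.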

https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_sat/
