
Converting the PyTorch model to TF degrades the output #12

Open
Biaocsu opened this issue Dec 8, 2021 · 8 comments

Biaocsu commented Dec 8, 2021

Hi, as the title says: I followed your script to convert the distill model from PyTorch to TF, but the converted model's output is much worse. Do you know where it might have gone wrong?

The script is unchanged except that it still loads the model on GPU.

Loading model cost 0.702 seconds.
Prefix dict has been built successfully.
(1, 1, 30000) (12, 1, 2, 12, 1, 64)
(1, 1, 30000) (12, 1, 2, 12, 1, 64)
tf.Tensor(
[[  837   259   497   788 22707 22707 22707 22707 22707 22707 22707 22707
  22707 22707 22707 22707 22707 22707 22707 22707]], shape=(1, 20), dtype=int64)
今天天气 不错 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵
tf.Tensor(
[[  837   259   497   788 24672  6655  7254  6123 22707  2779  8494 28689
  20220 28689  2779  2779 28689 22707  2779  5469]], shape=(1, 20), dtype=int64)
今天天气 不错 裁定穷 驾清楚 猥亵脑 10000畫 电饭畫脑脑畫 猥亵脑 一方
qhduan (Collaborator) commented Dec 8, 2021

Judging from this output, the model hasn't just gotten worse; the conversion itself is wrong. Some parameter or step in the middle must have gone off.

My ipynb prints several other intermediate outputs. Try comparing them one by one against what you get, to narrow down where the first divergence appears. I can't tell the exact cause from here either.
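One way to follow this advice is to dump the same intermediate tensor from both the PyTorch and TF models and compare them numerically. A minimal sketch (`close_enough` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def close_enough(pt_array, tf_array, atol=1e-4):
    """Compare a PyTorch and a TF intermediate output converted to numpy.

    Returns False on a shape mismatch, otherwise checks element-wise
    closeness within the given absolute tolerance.
    """
    a = np.asarray(pt_array, dtype=np.float64)
    b = np.asarray(tf_array, dtype=np.float64)
    if a.shape != b.shape:
        return False
    return bool(np.allclose(a, b, atol=atol))

print(close_enough([1.0, 2.0], [1.0, 2.00001]))  # True: within tolerance
```

Running this check layer by layer (embeddings first, then each transformer block) localizes the first point where the converted weights diverge.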

Biaocsu (Author) commented Dec 9, 2021

Let me check again, thanks.

Biaocsu (Author) commented Dec 9, 2021

On my side, every weight loads under a gpt_2 prefix, whereas in your output they are all under gpt. I suspect this mismatch is what breaks the later conversion, but I don't know the cause. My CPM-distill model was downloaded from the official source.

# Intended to print the global weights plus layer 0 only. But because the
# names here are prefixed "gpt_2/..." rather than "gpt/...", the substring
# test never matches, so the else branch prints every weight:
for x in gpt.weights:
    if 'gpt/layer' in x.name:
        if 'gpt/layer00' in x.name:
            print(x.name, x.shape)
    else:
        print(x.name, x.shape)
gpt_2/embedding_2/embeddings:0 (30000, 768)
position_embeddings:0 (1024, 768)
gpt_2/layer00/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/query_layer/bias:0 (768,)
gpt_2/layer00/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/key_layer/bias:0 (768,)
gpt_2/layer00/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/value_layer/bias:0 (768,)
gpt_2/layer00/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer00/intermediate/kernel:0 (768, 3072)
gpt_2/layer00/intermediate/bias:0 (3072,)
gpt_2/layer00/output/kernel:0 (3072, 768)
gpt_2/layer00/output/bias:0 (768,)
gpt_2/layer01/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/query_layer/bias:0 (768,)
gpt_2/layer01/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/key_layer/bias:0 (768,)
gpt_2/layer01/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/value_layer/bias:0 (768,)
gpt_2/layer01/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer01/intermediate/kernel:0 (768, 3072)
gpt_2/layer01/intermediate/bias:0 (3072,)
gpt_2/layer01/output/kernel:0 (3072, 768)
gpt_2/layer01/output/bias:0 (768,)
gpt_2/layer02/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/query_layer/bias:0 (768,)
gpt_2/layer02/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/key_layer/bias:0 (768,)
gpt_2/layer02/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/value_layer/bias:0 (768,)
gpt_2/layer02/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer02/intermediate/kernel:0 (768, 3072)
gpt_2/layer02/intermediate/bias:0 (3072,)
gpt_2/layer02/output/kernel:0 (3072, 768)
gpt_2/layer02/output/bias:0 (768,)
gpt_2/layer03/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/query_layer/bias:0 (768,)
gpt_2/layer03/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/key_layer/bias:0 (768,)
gpt_2/layer03/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/value_layer/bias:0 (768,)
gpt_2/layer03/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer03/intermediate/kernel:0 (768, 3072)
gpt_2/layer03/intermediate/bias:0 (3072,)
gpt_2/layer03/output/kernel:0 (3072, 768)
gpt_2/layer03/output/bias:0 (768,)
gpt_2/layer04/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/query_layer/bias:0 (768,)
gpt_2/layer04/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/key_layer/bias:0 (768,)
gpt_2/layer04/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/value_layer/bias:0 (768,)
gpt_2/layer04/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer04/intermediate/kernel:0 (768, 3072)
gpt_2/layer04/intermediate/bias:0 (3072,)
gpt_2/layer04/output/kernel:0 (3072, 768)
gpt_2/layer04/output/bias:0 (768,)
gpt_2/layer05/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/query_layer/bias:0 (768,)
gpt_2/layer05/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/key_layer/bias:0 (768,)
gpt_2/layer05/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/value_layer/bias:0 (768,)
gpt_2/layer05/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer05/intermediate/kernel:0 (768, 3072)
gpt_2/layer05/intermediate/bias:0 (3072,)
gpt_2/layer05/output/kernel:0 (3072, 768)
gpt_2/layer05/output/bias:0 (768,)
gpt_2/layer06/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/query_layer/bias:0 (768,)
gpt_2/layer06/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/key_layer/bias:0 (768,)
gpt_2/layer06/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/value_layer/bias:0 (768,)
gpt_2/layer06/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer06/intermediate/kernel:0 (768, 3072)
gpt_2/layer06/intermediate/bias:0 (3072,)
gpt_2/layer06/output/kernel:0 (3072, 768)
gpt_2/layer06/output/bias:0 (768,)
gpt_2/layer07/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/query_layer/bias:0 (768,)
gpt_2/layer07/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/key_layer/bias:0 (768,)
gpt_2/layer07/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/value_layer/bias:0 (768,)
gpt_2/layer07/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer07/intermediate/kernel:0 (768, 3072)
gpt_2/layer07/intermediate/bias:0 (3072,)
gpt_2/layer07/output/kernel:0 (3072, 768)
gpt_2/layer07/output/bias:0 (768,)
gpt_2/layer08/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/query_layer/bias:0 (768,)
gpt_2/layer08/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/key_layer/bias:0 (768,)
gpt_2/layer08/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/value_layer/bias:0 (768,)
gpt_2/layer08/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer08/intermediate/kernel:0 (768, 3072)
gpt_2/layer08/intermediate/bias:0 (3072,)
gpt_2/layer08/output/kernel:0 (3072, 768)
gpt_2/layer08/output/bias:0 (768,)
gpt_2/layer09/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/query_layer/bias:0 (768,)
gpt_2/layer09/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/key_layer/bias:0 (768,)
gpt_2/layer09/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/value_layer/bias:0 (768,)
gpt_2/layer09/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer09/intermediate/kernel:0 (768, 3072)
gpt_2/layer09/intermediate/bias:0 (3072,)
gpt_2/layer09/output/kernel:0 (3072, 768)
gpt_2/layer09/output/bias:0 (768,)
gpt_2/layer10/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/query_layer/bias:0 (768,)
gpt_2/layer10/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/key_layer/bias:0 (768,)
gpt_2/layer10/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/value_layer/bias:0 (768,)
gpt_2/layer10/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer10/intermediate/kernel:0 (768, 3072)
gpt_2/layer10/intermediate/bias:0 (3072,)
gpt_2/layer10/output/kernel:0 (3072, 768)
gpt_2/layer10/output/bias:0 (768,)
gpt_2/layer11/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/query_layer/bias:0 (768,)
gpt_2/layer11/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/key_layer/bias:0 (768,)
gpt_2/layer11/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/value_layer/bias:0 (768,)
gpt_2/layer11/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer11/intermediate/kernel:0 (768, 3072)
gpt_2/layer11/intermediate/bias:0 (3072,)
gpt_2/layer11/output/kernel:0 (3072, 768)
gpt_2/layer11/output/bias:0 (768,)
gpt_2/LayerNorm_final_norm/gamma:0 (768,)
gpt_2/LayerNorm_final_norm/beta:0 (768,)

qhduan (Collaborator) commented Dec 9, 2021

The GPT model must only be initialized once per process. If you run the initialization a second time, the weights come out under gpt_2; run it once more and they can come out under gpt_3.

Biaocsu (Author) commented Dec 9, 2021

I see, haha. Does initializing it multiple times affect the CPM conversion? And how can I make sure it is initialized only once?

qhduan (Collaborator) commented Dec 9, 2021

If you are using a notebook, restart the kernel after every change to guarantee the model is initialized only once.

You could of course also make this part more robust: as long as every weight is matched up correctly, the exact names don't really matter.
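One way to make the matching robust, as suggested above, is to normalize the name prefix before pairing weights, so gpt_2/… and gpt/… compare equal. A hypothetical sketch (`normalize` is not part of the repo):

```python
import re

def normalize(name):
    # Strip a Keras-added numeric suffix from the model prefix:
    # "gpt_2/layer00/..." -> "gpt/layer00/..." ; "gpt/..." is left as-is.
    return re.sub(r'^gpt(_\d+)?/', 'gpt/', name)

print(normalize('gpt_2/layer00/attention/query_layer/kernel:0'))
```

With both sides normalized this way, the weight-copying loop no longer depends on how many times the model was initialized.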

realTaki commented

> I see, haha. Does initializing it multiple times affect the CPM conversion? And how can I make sure it is initialized only once?

@Biaocsu Hi, do you still have the original distill model? The official link is dead; if you kept a copy, could you please share it?

447428054 commented

@xingyaoww @Biaocsu Could you share the original CPM model?

https://cpm.baai.ac.cn/

This link no longer opens.
