
Converting the PyTorch model to TF degrades the output #12

Open
Biaocsu opened this issue Dec 8, 2021 · 8 comments

Biaocsu commented Dec 8, 2021

Hi, as the title says: I followed your script to convert the distill model from PyTorch to TF, but the converted model's output is much worse. Do you know where it might have gone wrong?

The script is unchanged except that it still loads the model on GPU.

Loading model cost 0.702 seconds.
Prefix dict has been built successfully.
(1, 1, 30000) (12, 1, 2, 12, 1, 64)
(1, 1, 30000) (12, 1, 2, 12, 1, 64)
tf.Tensor(
[[  837   259   497   788 22707 22707 22707 22707 22707 22707 22707 22707
  22707 22707 22707 22707 22707 22707 22707 22707]], shape=(1, 20), dtype=int64)
今天天气 不错 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵 猥亵
tf.Tensor(
[[  837   259   497   788 24672  6655  7254  6123 22707  2779  8494 28689
  20220 28689  2779  2779 28689 22707  2779  5469]], shape=(1, 20), dtype=int64)
今天天气 不错 裁定穷 驾清楚 猥亵脑 10000畫 电饭畫脑脑畫 猥亵脑 一方
qhduan (Collaborator) commented Dec 8, 2021

Judging from this output, the model hasn't just gotten worse; the conversion itself is wrong. Some parameter or step in the middle must have gone off.

My ipynb prints several other intermediate outputs. Try comparing them one by one against what you get, to narrow down where the first divergence appears. I can't tell the exact cause from here either.
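One way to follow this advice is to dump the same intermediate tensor from both the PyTorch and TF models and compare them numerically. A minimal sketch (`close_enough` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def close_enough(pt_array, tf_array, atol=1e-4):
    """Compare a PyTorch and a TF intermediate output converted to numpy.

    Returns False on a shape mismatch, otherwise checks element-wise
    closeness within the given absolute tolerance.
    """
    a = np.asarray(pt_array, dtype=np.float64)
    b = np.asarray(tf_array, dtype=np.float64)
    if a.shape != b.shape:
        return False
    return bool(np.allclose(a, b, atol=atol))

print(close_enough([1.0, 2.0], [1.0, 2.00001]))  # True: within tolerance
```

Running this check layer by layer (embeddings first, then each transformer block) localizes the first point where the converted weights diverge.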

Biaocsu (Author) commented Dec 9, 2021

Let me check again, thanks.

Biaocsu (Author) commented Dec 9, 2021

On my side, every weight loads under a gpt_2 prefix, whereas in your output they are all under gpt. I suspect this mismatch is what breaks the later conversion, but I don't know the cause. My CPM-distill model was downloaded from the official source.

# Intended to print the global weights plus layer 0 only. But because the
# names here are prefixed "gpt_2/..." rather than "gpt/...", the substring
# test never matches, so the else branch prints every weight:
for x in gpt.weights:
    if 'gpt/layer' in x.name:
        if 'gpt/layer00' in x.name:
            print(x.name, x.shape)
    else:
        print(x.name, x.shape)
gpt_2/embedding_2/embeddings:0 (30000, 768)
position_embeddings:0 (1024, 768)
gpt_2/layer00/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/query_layer/bias:0 (768,)
gpt_2/layer00/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/key_layer/bias:0 (768,)
gpt_2/layer00/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/value_layer/bias:0 (768,)
gpt_2/layer00/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer00/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer00/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer00/intermediate/kernel:0 (768, 3072)
gpt_2/layer00/intermediate/bias:0 (3072,)
gpt_2/layer00/output/kernel:0 (3072, 768)
gpt_2/layer00/output/bias:0 (768,)
gpt_2/layer01/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/query_layer/bias:0 (768,)
gpt_2/layer01/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/key_layer/bias:0 (768,)
gpt_2/layer01/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/value_layer/bias:0 (768,)
gpt_2/layer01/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer01/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer01/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer01/intermediate/kernel:0 (768, 3072)
gpt_2/layer01/intermediate/bias:0 (3072,)
gpt_2/layer01/output/kernel:0 (3072, 768)
gpt_2/layer01/output/bias:0 (768,)
gpt_2/layer02/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/query_layer/bias:0 (768,)
gpt_2/layer02/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/key_layer/bias:0 (768,)
gpt_2/layer02/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/value_layer/bias:0 (768,)
gpt_2/layer02/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer02/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer02/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer02/intermediate/kernel:0 (768, 3072)
gpt_2/layer02/intermediate/bias:0 (3072,)
gpt_2/layer02/output/kernel:0 (3072, 768)
gpt_2/layer02/output/bias:0 (768,)
gpt_2/layer03/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/query_layer/bias:0 (768,)
gpt_2/layer03/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/key_layer/bias:0 (768,)
gpt_2/layer03/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/value_layer/bias:0 (768,)
gpt_2/layer03/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer03/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer03/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer03/intermediate/kernel:0 (768, 3072)
gpt_2/layer03/intermediate/bias:0 (3072,)
gpt_2/layer03/output/kernel:0 (3072, 768)
gpt_2/layer03/output/bias:0 (768,)
gpt_2/layer04/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/query_layer/bias:0 (768,)
gpt_2/layer04/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/key_layer/bias:0 (768,)
gpt_2/layer04/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/value_layer/bias:0 (768,)
gpt_2/layer04/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer04/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer04/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer04/intermediate/kernel:0 (768, 3072)
gpt_2/layer04/intermediate/bias:0 (3072,)
gpt_2/layer04/output/kernel:0 (3072, 768)
gpt_2/layer04/output/bias:0 (768,)
gpt_2/layer05/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/query_layer/bias:0 (768,)
gpt_2/layer05/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/key_layer/bias:0 (768,)
gpt_2/layer05/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/value_layer/bias:0 (768,)
gpt_2/layer05/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer05/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer05/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer05/intermediate/kernel:0 (768, 3072)
gpt_2/layer05/intermediate/bias:0 (3072,)
gpt_2/layer05/output/kernel:0 (3072, 768)
gpt_2/layer05/output/bias:0 (768,)
gpt_2/layer06/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/query_layer/bias:0 (768,)
gpt_2/layer06/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/key_layer/bias:0 (768,)
gpt_2/layer06/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/value_layer/bias:0 (768,)
gpt_2/layer06/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer06/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer06/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer06/intermediate/kernel:0 (768, 3072)
gpt_2/layer06/intermediate/bias:0 (3072,)
gpt_2/layer06/output/kernel:0 (3072, 768)
gpt_2/layer06/output/bias:0 (768,)
gpt_2/layer07/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/query_layer/bias:0 (768,)
gpt_2/layer07/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/key_layer/bias:0 (768,)
gpt_2/layer07/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/value_layer/bias:0 (768,)
gpt_2/layer07/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer07/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer07/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer07/intermediate/kernel:0 (768, 3072)
gpt_2/layer07/intermediate/bias:0 (3072,)
gpt_2/layer07/output/kernel:0 (3072, 768)
gpt_2/layer07/output/bias:0 (768,)
gpt_2/layer08/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/query_layer/bias:0 (768,)
gpt_2/layer08/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/key_layer/bias:0 (768,)
gpt_2/layer08/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/value_layer/bias:0 (768,)
gpt_2/layer08/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer08/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer08/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer08/intermediate/kernel:0 (768, 3072)
gpt_2/layer08/intermediate/bias:0 (3072,)
gpt_2/layer08/output/kernel:0 (3072, 768)
gpt_2/layer08/output/bias:0 (768,)
gpt_2/layer09/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/query_layer/bias:0 (768,)
gpt_2/layer09/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/key_layer/bias:0 (768,)
gpt_2/layer09/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/value_layer/bias:0 (768,)
gpt_2/layer09/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer09/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer09/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer09/intermediate/kernel:0 (768, 3072)
gpt_2/layer09/intermediate/bias:0 (3072,)
gpt_2/layer09/output/kernel:0 (3072, 768)
gpt_2/layer09/output/bias:0 (768,)
gpt_2/layer10/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/query_layer/bias:0 (768,)
gpt_2/layer10/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/key_layer/bias:0 (768,)
gpt_2/layer10/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/value_layer/bias:0 (768,)
gpt_2/layer10/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer10/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer10/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer10/intermediate/kernel:0 (768, 3072)
gpt_2/layer10/intermediate/bias:0 (3072,)
gpt_2/layer10/output/kernel:0 (3072, 768)
gpt_2/layer10/output/bias:0 (768,)
gpt_2/layer11/attention/query_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/query_layer/bias:0 (768,)
gpt_2/layer11/attention/key_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/key_layer/bias:0 (768,)
gpt_2/layer11/attention/value_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/value_layer/bias:0 (768,)
gpt_2/layer11/attention/context_projection_layer/kernel:0 (768, 768)
gpt_2/layer11/attention/context_projection_layer/bias:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln0/gamma:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln0/beta:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln1/gamma:0 (768,)
gpt_2/layer11/LayerNorm_mlp_ln1/beta:0 (768,)
gpt_2/layer11/intermediate/kernel:0 (768, 3072)
gpt_2/layer11/intermediate/bias:0 (3072,)
gpt_2/layer11/output/kernel:0 (3072, 768)
gpt_2/layer11/output/bias:0 (768,)
gpt_2/LayerNorm_final_norm/gamma:0 (768,)
gpt_2/LayerNorm_final_norm/beta:0 (768,)

qhduan (Collaborator) commented Dec 9, 2021

The GPT model must only be initialized once per process. If you run the initialization a second time, the weights come out under gpt_2; run it once more and they can come out under gpt_3.

Biaocsu (Author) commented Dec 9, 2021

I see, haha. Does initializing it multiple times affect the CPM conversion? And how can I make sure it is initialized only once?

qhduan (Collaborator) commented Dec 9, 2021

If you are using a notebook, restart the kernel after every change to guarantee the model is initialized only once.

You could of course also make this part more robust: as long as every weight is matched up correctly, the exact names don't really matter.
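One way to make the matching robust, as suggested above, is to normalize the name prefix before pairing weights, so gpt_2/… and gpt/… compare equal. A hypothetical sketch (`normalize` is not part of the repo):

```python
import re

def normalize(name):
    # Strip a Keras-added numeric suffix from the model prefix:
    # "gpt_2/layer00/..." -> "gpt/layer00/..." ; "gpt/..." is left as-is.
    return re.sub(r'^gpt(_\d+)?/', 'gpt/', name)

print(normalize('gpt_2/layer00/attention/query_layer/kernel:0'))
```

With both sides normalized this way, the weight-copying loop no longer depends on how many times the model was initialized.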

realTaki commented

> I see, haha. Does initializing it multiple times affect the CPM conversion? And how can I make sure it is initialized only once?

@Biaocsu Hi, do you still have the original distill model? The official link is dead; if you kept a copy, could you please share it?

447428054 commented

@xingyaoww @Biaocsu Could you share the original CPM model?

https://cpm.baai.ac.cn/

This link no longer opens.
