change lod tensor to absolute offsets #4952
Conversation
…/lod-absolute-offset-to-relative-offset
LGTM!
 *
 * 0 2 3
 * 0 2 4 7
 * 0 2 5 7 10 12 15 20
Could you please add a detailed explanation here about what this example means, i.e. the 3-level LoD?
lod.push_back(std::vector<size_t>{0, 10, 20});
lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20});
lod.push_back(std::vector<size_t>{0, 2, 3});
lod.push_back(std::vector<size_t>{0, 2, 5, 8});
lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
Could line 33 here be computed directly from line 35? That is, except for the bottom-most LoD level, compute every upper level directly from the bottom level rather than from the level immediately below it. That way, retrieving any single level would only depend on two levels, instead of on every level beneath it.
lod.push_back(std::vector<size_t>{0, 4, 8});
That would likely be problematic; it would be the same as before and unable to represent empty sequences.
Empty sequences are needed in beam search: the candidate set at each time step (treated as a variable-length sequence) and the translation results are both stored in LoDTensors, and empty candidate sets also need to be stored.
@luotao1
In this PR, the second level is already computed directly from the bottom level's relative positions. If that computation works for the second level, why can't the top level, when computed directly from the bottom level's relative positions, represent empty sequences?
lod.push_back(std::vector<size_t>{0, 4, 8})
Here 4 is the element at index 4 of line 35's {0, 2, 5, 7, 10, 12, 15, 17, 20}, i.e. 10; 8 is the element at index 8, i.e. 20.
This way, retrieving any level would only depend on the bottom-most level, rather than on every level beneath it.
That would also work, but it is rather tricky.
In the current design, relative offsets are used uniformly from top to bottom, and the bottom level is also stored as relative offsets; since each instance of the tensor is treated as having length 1, absolute and relative offsets coincide at the bottom level.
The concept stays uniform, with no mixing of absolute and relative offsets.
Compared with the matrix computations, this overhead should not be significant; we can consider optimizing it later if it ever becomes a bottleneck.
@luotao1
OK. Also, @qingqing01 pointed out that with the current representation it is easy to get the number of sentences at each level: for example, from {0, 2, 3}
we immediately know the first sentence has 2 sub-sentences and the second sentence has 1. If it were stored as {0, 4, 8}
, computing lengths would be more cumbersome.
You can use ToAbsOffset
to convert it back to the previous format.
fixes: #4945