Top-K and Top-p sampling #7

boblee22 · 2022-10-19T04:39:42Z

Hi, thanks for your great work!

I have a question about the sampling process. When both top-K and top-p are enabled (e.g., https://github.com/allenai/RL4LMs/blob/main/scripts/training/task_configs/common_gen/t5_nlpo.yml#L44-L51), isn't top-p just ignored because the K most likely next words are filtered and the probability mass is redistributed among only those K next words? Please correct me if my understanding is wrong. Thank you!

rajcscw · 2022-10-22T08:24:00Z

The top-p mask here is quite different from typical top-p sampling; it is specific to the NLPO algorithm. Before sampling, we generate a top-p mask from the mask policy (a copy of the policy from previous epochs). Depending on the generation kwargs, top-k is then applied on top of this mask. For details, please refer to our paper.
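
To make the distinction concrete, here is a minimal sketch of the idea described above, written in plain Python. It is not the RL4LMs implementation: the function names (`top_p_mask`, `top_k_filter`, `masked_next_token_probs`) and the toy logits are hypothetical, and the exact ordering and details in the library may differ. The sketch assumes the mask is computed from the mask policy's distribution and then applied to the current policy's logits before top-k filtering.

```python
import math


def softmax(logits):
    """Numerically stable softmax; -inf logits get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_mask(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return keep


def top_k_filter(logits, top_k):
    """Set every logit outside the top_k largest to -inf."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = set(order[:top_k])
    return [x if i in kept else -math.inf for i, x in enumerate(logits)]


def masked_next_token_probs(policy_logits, mask_policy_logits, top_p, top_k):
    """Hypothetical helper: mask from the mask policy, then top-k on the policy."""
    # Assumption: the top-p mask comes from the mask policy, not the current policy.
    mask = top_p_mask(softmax(mask_policy_logits), top_p)
    masked = [x if keep else -math.inf for x, keep in zip(policy_logits, mask)]
    # Top-k from the generation kwargs is applied on top of the masked logits.
    return softmax(top_k_filter(masked, top_k))


# Toy example with a 5-token vocabulary (made-up numbers).
policy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]
mask_policy_logits = [0.0, 3.0, 2.5, 0.1, -2.0]
print(masked_next_token_probs(policy_logits, mask_policy_logits, top_p=0.9, top_k=3))
```

In this toy example, token 0 is the current policy's most likely token, but it falls outside the mask policy's top-p set, so it ends up with zero probability even though `top_k=3` alone would have kept it. Under these assumptions, top-p is not made redundant by top-k, because the two filters are driven by different distributions.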
