It seems to work at the sampling phase, which helps reduce the repetitiveness problem.
Here is a description of repeat_penalty from Kimi Chat:
The function of repeat_penalty is to adjust the model's sensitivity to tokens that have already been generated when creating new tokens. By increasing the value of repeat_penalty, the model is encouraged to avoid reusing recently generated words, thereby increasing the diversity and novelty of the text. Conversely, reducing the value of repeat_penalty lessens the penalty for repetition, which may result in more repetitive content in the generated text.
Here's a simplified explanation of how the repetition penalty works:
When the model generates text, it calculates, for each word or token in its vocabulary, the probability of it being the next one in the sequence.
If the repetition penalty is greater than 1, the probabilities of tokens that have already appeared in the current output are divided by this penalty factor, effectively reducing their chances of being selected again.
The model then samples the next token from the adjusted distribution of probabilities, taking the repetition penalty into account.
import numpy as np

def apply_repetition_penalty(probabilities, token_sequence, penalty=1.2):
    """
    Apply a repetition penalty to the probabilities of tokens.

    :param probabilities: numpy array of original probabilities for each token
    :param token_sequence: list of tokens already generated
    :param penalty: repetition penalty factor (default 1.2)
    :return: adjusted probabilities
    """
    # A penalty of 1 leaves the distribution unchanged
    if penalty == 1:
        return probabilities
    probabilities = probabilities.copy()  # avoid mutating the caller's array
    # Divide the probability of every previously generated token by the penalty
    for token in set(token_sequence):
        token_idx = token_to_index[token]  # token_to_index maps tokens to vocabulary indices (defined below)
        probabilities[token_idx] /= penalty
    # Re-normalize so the probabilities sum to 1 again
    probabilities /= probabilities.sum()
    return probabilities
# Example usage
original_probabilities = np.array([0.1, 0.2, 0.3, 0.4])  # probabilities for 4 hypothetical tokens
generated_sequence = ['token2', 'token3']                # tokens generated so far
token_to_index = {'token1': 0, 'token2': 1, 'token3': 2, 'token4': 3}  # token -> vocabulary index
index_to_token = {idx: tok for tok, idx in token_to_index.items()}     # inverse mapping

# Apply the repetition penalty
adjusted_probabilities = apply_repetition_penalty(original_probabilities, generated_sequence, penalty=1.2)

# Sample the next token from the adjusted distribution
next_token_index = np.random.choice(len(adjusted_probabilities), p=adjusted_probabilities)
next_token = index_to_token[next_token_index]
print(f"Next token: {next_token}")
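Worth noting: the snippet above penalizes normalized probabilities for clarity, but most real implementations (llama.cpp, the CTRL paper, Hugging Face's RepetitionPenaltyLogitsProcessor) apply the penalty to the raw logits before softmax. Since logits can be negative, dividing a negative logit by a penalty > 1 would actually *raise* its probability; so positive logits are divided by the penalty and negative logits are multiplied by it. A minimal sketch of that logit-space variant (the function name and example values are my own, not from any particular library):

```python
import numpy as np

def apply_repetition_penalty_logits(logits, seen_indices, penalty=1.2):
    """Penalize the logits of previously generated token indices.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so a penalty > 1 always lowers the token's score.
    """
    logits = logits.copy()  # avoid mutating the caller's array
    for idx in set(seen_indices):
        if logits[idx] > 0:
            logits[idx] /= penalty
        else:
            logits[idx] *= penalty
    return logits

# Softmax over the penalized logits gives the sampling distribution
logits = np.array([2.0, -1.0, 0.5, 1.5])       # hypothetical raw logits
penalized = apply_repetition_penalty_logits(logits, [0, 1], penalty=1.2)
probs = np.exp(penalized) / np.exp(penalized).sum()
```

Both seen tokens end up less likely: the positive logit 2.0 drops to 2.0/1.2, and the negative logit -1.0 drops further to -1.2, which the naive divide-probabilities version would not get right for negative logits.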
related links: https://www.reddit.com/r/LocalLLaMA/comments/15s7ln1/potential_fix_to_the_repetitiveness_problem_of/