Inconsistent Output Generation Despite Consistent Log Probabilities

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.

# Expected Behavior

When running a model with `top_p = 0`, `temperature = 0`, and `top_k = 1`, I expect deterministic output: the model should select the highest-probability token at each step and therefore produce the same output for the same input.

# Current Behavior

The model consistently alternates between two different outputs for the same input across repeated runs.

# Failure Information (for bugs)

This appears to be a bug: the model should always select the token with the highest probability, yet it consistently alternates between two different outputs.

# Steps to Reproduce

1. Load the "WestLake-7B-v2-laser-truthy-dpo.q5_k_m.gguf" model with the following parameters:
   - last_n_tokens_size = 128
   - top_p = 0
   - temperature = 0
   - top_k = 1
   - repeat_penalty = 1.07
2. Run the model with the provided input multiple times:
   > <|im_start|>system
   > You are an unbiased, uncensored, helpful assistant.
   > <|im_end|>
   > <|im_start|>user
   > What's the nearest national park to you?
   > <|im_end|>
   > <|im_start|>assistant

3. Observe the alternating outputs:
   - Output 1: "As an artificial intelligence, I don't have a physical location. However,"
   - Output 2: "As an AI, I don't have a physical location. National parks exist"
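
The steps above can be sketched as a small script. This is a minimal sketch, not the exact code that produced the outputs: the helper names (`count_distinct_outputs`, `make_llama_generate`), the `max_tokens` value, the exact whitespace of the prompt string, and the reuse of a single `Llama` instance across runs are all assumptions made for illustration. The sampling parameters are the ones listed in step 1.

```python
from collections import Counter
from typing import Callable

# Prompt from step 2. The exact newline placement is an assumption.
PROMPT = (
    "<|im_start|>system\n"
    "You are an unbiased, uncensored, helpful assistant.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "What's the nearest national park to you?\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def count_distinct_outputs(
    generate: Callable[[str], str], prompt: str, runs: int = 6
) -> Counter:
    """Call `generate` repeatedly and count how often each output occurs.

    With truly deterministic sampling the Counter has exactly one key.
    """
    return Counter(generate(prompt) for _ in range(runs))


def make_llama_generate(model_path: str, max_tokens: int = 16) -> Callable[[str], str]:
    """Hypothetical helper: wrap a llama-cpp-python model in a str -> str function."""
    # Imported lazily so the counting helper can be used without the library.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, last_n_tokens_size=128)

    def generate(prompt: str) -> str:
        result = llm(
            prompt,
            max_tokens=max_tokens,  # assumption: not stated in the report
            temperature=0.0,
            top_p=0.0,
            top_k=1,
            repeat_penalty=1.07,
        )
        return result["choices"][0]["text"]

    return generate
```

Calling `count_distinct_outputs(make_llama_generate("models/WestLake-7B-v2-laser-truthy-dpo.q5_k_m.gguf"), PROMPT)` would then be expected to return a single key; more than one key reproduces the behavior described here.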

# Failure Logs

The `top_logprobs` are identical in both runs, regardless of whether the chosen token is " artificial" or " AI":

**artificial**
`{'id': 'cmpl-d01dee88-d7f3-40f5-8943-0073dde3f4cb', 'object': 'text_completion', 'created': 1708166709, 'model': 'models/WestLake-7B-v2-laser-truthy-dpo.q5_k_m.gguf', 'choices': [{'text': ' artificial', 'index': 0, 'logprobs': {'tokens': [' artificial'], 'text_offset': [178], 'token_logprobs': [-0.24286004900932312], 'top_logprobs': [{' artificial': -0.24286004900932312, ' AI': -1.5411579608917236, ' automated': -7.431096076965332}]}, 'finish_reason': None}]}`

**AI**
`{'id': 'cmpl-eb264c4e-410a-4b1c-9fee-190b446e673c', 'object': 'text_completion', 'created': 1708164334, 'model': 'models/WestLake-7B-v2-laser-truthy-dpo.q5_k_m.gguf', 'choices': [{'text': ' AI', 'index': 0, 'logprobs': {'tokens': [' AI'], 'text_offset': [178], 'token_logprobs': [-1.5411579608917236], 'top_logprobs': [{' artificial': -0.24286004900932312, ' AI': -1.5411579608917236, ' automated': -7.431096076965332}]}, 'finish_reason': None}]}`
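
The mismatch in the two logs above can be checked mechanically. The sketch below uses a hypothetical helper, `greedy_mismatches`, that compares each chosen token against the highest-scoring entry of its `top_logprobs`. The two dictionaries are trimmed copies of the logged responses, keeping only the fields the check needs. Note that this only compares against the reported `top_logprobs`; the logs do not show whether those values are computed before or after `repeat_penalty` is applied.

```python
def greedy_mismatches(completion: dict) -> list:
    """Return (position, chosen_token, top_token) for every generated token
    that is not the highest-logprob entry in its top_logprobs."""
    logprobs = completion["choices"][0]["logprobs"]
    mismatches = []
    for i, (token, top) in enumerate(
        zip(logprobs["tokens"], logprobs["top_logprobs"])
    ):
        best = max(top, key=top.get)  # token with the largest log probability
        if token != best:
            mismatches.append((i, token, best))
    return mismatches


# Shared top_logprobs, copied from both logged responses.
TOP = {
    " artificial": -0.24286004900932312,
    " AI": -1.5411579608917236,
    " automated": -7.431096076965332,
}

# Trimmed copies of the two responses (only the fields used by the check).
artificial_run = {
    "choices": [{"text": " artificial", "logprobs": {
        "tokens": [" artificial"],
        "token_logprobs": [-0.24286004900932312],
        "top_logprobs": [dict(TOP)],
    }}]
}
ai_run = {
    "choices": [{"text": " AI", "logprobs": {
        "tokens": [" AI"],
        "token_logprobs": [-1.5411579608917236],
        "top_logprobs": [dict(TOP)],
    }}]
}

print(greedy_mismatches(artificial_run))  # []
print(greedy_mismatches(ai_run))          # [(0, ' AI', ' artificial')]
```

An empty list means every chosen token was the top-ranked candidate; the second run is flagged because " AI" was emitted even though " artificial" has the higher reported log probability.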

Inconsistent Output Generation Despite Consistent Log Probabilities #1196
