Hello. Thank you for your work.
In this line:
https://github.com/codelion/optillm/blob/12ac7863cf713c5cd417f81ec1e52ed28017bc16/optillm/plugins/longcepo/mapreduce.py#L224
The token budget is computed as `max_context_window - tok_len(collapse_prompt) - max_output_tokens` to estimate how many tokens are available to fit all the answers produced in the MAP stage. However, it should probably be `max_context_window - tok_len(reduce_prompt) - max_output_tokens`, because the combined answers are actually passed to the reduce prompt in the next step, so it is the reduce prompt's length that needs to be reserved.
Here is a "pseudopatch" to demonstrate what I mean:
```diff
 num_tokens = get_prompt_length(format_chunk_list(context_chunks), tokenizer)
 token_budget = (
     longcepo_config.max_context_window
-    - get_prompt_length(longcepo_config.collapse_prompt, tokenizer)
+    - get_prompt_length(longcepo_config.reduce_prompt, tokenizer)
     - longcepo_config.max_output_tokens
 )
 logger.info(f"Pre-collapse length of chunks {num_tokens}, allowed {token_budget}")
```
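For reference, here is a minimal sketch of the constraint this budget is meant to guarantee at the REDUCE stage. It reuses `get_prompt_length` and `longcepo_config` from the snippet above; the `fits_reduce_stage` function itself is hypothetical and not part of the codebase:

```python
# Hypothetical sketch, not actual optillm/LongCePO code: it only spells out
# the constraint that the collapse-stage token budget is meant to satisfy.
# Assumes get_prompt_length is the same helper used in mapreduce.py above.
def fits_reduce_stage(combined_answers: str, longcepo_config, tokenizer) -> bool:
    # The final prompt at the REDUCE stage is roughly reduce_prompt plus the
    # combined MAP answers, and max_output_tokens must still be left free
    # for generation.
    final_prompt_tokens = (
        get_prompt_length(longcepo_config.reduce_prompt, tokenizer)
        + get_prompt_length(combined_answers, tokenizer)
    )
    return (
        final_prompt_tokens + longcepo_config.max_output_tokens
        <= longcepo_config.max_context_window
    )

# Rearranging the inequality gives the budget for the combined answers:
#   tok_len(combined_answers) <= max_context_window - tok_len(reduce_prompt) - max_output_tokens
# which is why the collapse loop should subtract reduce_prompt rather than collapse_prompt.
```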