
How to run Llama-3 or Phi with more than 4096 prompt tokens? #2171

Open
baleksey opened this issue May 7, 2024 · 0 comments

Comments

baleksey commented May 7, 2024

Could you please show me an example where a Llama-3 model is used (ideally GGUF-quantized) and the initial prompt is more than 4096 tokens long? Or better, 16–64K tokens long (for RAG). Currently everything I try ends with an error.
In this code:
let logits = model.forward(&input, 0)?; // input is > 4096 tokens

Error:
narrow invalid args start + len > dim_len: [4096, 64], dim: 0, start: 0, len:4240

Model used:
https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF

Thanks a lot in advance!
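
For reference, the error is consistent with the quantized Llama implementation precomputing its rotary-embedding and attention-mask caches for a fixed maximum of 4096 positions (candle's quantized_llama module appears to expose this as a MAX_SEQ_LEN constant), so any prompt longer than that overruns the cache regardless of the GGUF metadata. Below is a minimal sketch of chunked prefill, assuming candle's quantized Llama API (ModelWeights::forward taking a tensor and an absolute position offset). The helper name and chunk size are illustrative, and chunking alone does not lift the limit: the library's internal maximum sequence length still has to cover the full context.

use candle_core::{Device, Result, Tensor};
use candle_transformers::models::quantized_llama::ModelWeights;

// Hypothetical helper: prefill a long prompt in fixed-size chunks, passing the
// absolute position of each chunk's first token as `index_pos` so the KV cache
// and rotary embeddings stay aligned across calls. Note: this does NOT by
// itself raise the 4096-position ceiling; the model's precomputed caches must
// cover the full context length.
fn prefill_in_chunks(model: &mut ModelWeights, tokens: &[u32], device: &Device) -> Result<Tensor> {
    const CHUNK: usize = 512; // arbitrary prefill block size
    let mut logits = None;
    for (i, chunk) in tokens.chunks(CHUNK).enumerate() {
        let input = Tensor::new(chunk, device)?.unsqueeze(0)?; // shape (1, chunk_len)
        logits = Some(model.forward(&input, i * CHUNK)?);
    }
    logits.ok_or_else(|| candle_core::Error::Msg("empty prompt".into()))
}

With a helper like this, the original call site would become something like let logits = prefill_in_chunks(&mut model, &prompt_tokens, &device)?; and generation would then continue token by token with index_pos starting from prompt_tokens.len().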
