Optional logprobs & fix llama eos/stop token #40
Conversation
Have you implemented per-request cache clearing? Great work here as usual 😃
Do we need to clear the KV cache for each request? I thought the KV cache could be reused within a session (each session may have multiple requests) for in-context learning.
If users deploy
Yes, there is a KV cache leak problem because only one cache_engine instance is used for the entire lifetime of llm_engine. One solution is to create the cache_engine on a per-session basis, allowing multiple requests within a session (from a single user) to share the KV cache. This cache should be cleared after the session ends, or automatically once the server's cache is nearly full. Another strategy, as you suggested, is to clear the KV cache after each request and rebuild the session context using prefix caching (this is new to me) or by recomputing the session's chat history.
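As a rough illustration of the per-request option, here is a minimal Rust sketch assuming a hypothetical `CacheEngine` with a `clear()` method. The names, fields, and structure are illustrative only and not the project's actual `llm_engine`/`cache_engine` code.

```rust
// Minimal sketch, assuming a hypothetical `CacheEngine` that owns the
// key/value buffers; names and methods here are illustrative, not the
// crate's actual API.
struct CacheEngine {
    // One flattened key/value buffer per layer (greatly simplified).
    gpu_cache: Vec<Vec<f32>>,
}

impl CacheEngine {
    /// Drop all cached key/value blocks so the next request starts
    /// from an empty context.
    fn clear(&mut self) {
        for layer in self.gpu_cache.iter_mut() {
            layer.clear();
        }
    }
}

struct LlmEngine {
    cache_engine: CacheEngine,
}

impl LlmEngine {
    /// Option A (per-request clearing): free the KV cache right after each
    /// request and rebuild session context next time from chat history or
    /// prefix caching.
    fn generate(&mut self, prompt: &str) -> String {
        let output = self.run_pipeline(prompt);
        self.cache_engine.clear();
        output
    }

    fn run_pipeline(&mut self, _prompt: &str) -> String {
        // Placeholder for the actual decode loop.
        String::new()
    }
}

fn main() {
    let mut engine = LlmEngine {
        cache_engine: CacheEngine { gpu_cache: vec![Vec::new(); 32] },
    };
    let _ = engine.generate("Hello");
}
```

The per-session alternative would instead key a `CacheEngine` by session id and call `clear()` only when the session ends or the server's cache pressure is high.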
I'll probably add the KV cache clearing in a later PR. Thank you!
Make logprobs optional (no `logprobs` field in the chat completion response by default; see the sketch below),
Fix the EOS/stop-token problem for the LLaMA pipeline.
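As an illustration of the first item, an optional `logprobs` field can simply be omitted from the serialized response when it is `None`. The sketch below assumes serde/serde_json for JSON output; the struct and field names are placeholders, not the project's actual response types.

```rust
// Sketch of an optional logprobs field that is dropped from the JSON
// when not requested; types here are illustrative, not the real ones.
use serde::Serialize;

#[derive(Serialize)]
struct ChatChoice {
    index: usize,
    message: String,
    // When `logprobs` is None, the key is omitted from the output entirely,
    // so clients that did not ask for logprobs never see it.
    #[serde(skip_serializing_if = "Option::is_none")]
    logprobs: Option<Vec<f32>>,
    finish_reason: String,
}

fn main() {
    let choice = ChatChoice {
        index: 0,
        message: "Hello!".to_string(),
        logprobs: None,
        finish_reason: "stop".to_string(),
    };
    // Prints the choice without a "logprobs" key.
    println!("{}", serde_json::to_string(&choice).unwrap());
}
```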
Tested cases:
First request:
First response:
Second request:
Second response:
Apparently, the KV cache from the previous generation was reused by the second request for in-context learning. However, the second response contains some redundant output (no EOS token was found in the second response).
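For reference, the kind of check the EOS fix is about could be sketched as below: stop decoding as soon as the sampled token equals the model's EOS (or configured stop) id, so the response is not padded with redundant output. The sampler closure and token ids are placeholders, not the actual pipeline code.

```rust
// Sketch of an EOS/stop-token check in the decode loop; the sampler
// closure and token ids are placeholders, not the real pipeline code.
fn decode_loop(
    eos_token_id: u32,
    max_tokens: usize,
    mut sample_next: impl FnMut() -> u32,
) -> Vec<u32> {
    let mut output = Vec::new();
    for _ in 0..max_tokens {
        let token = sample_next();
        // Stop as soon as EOS is sampled so generation ends with
        // finish_reason "stop" instead of running to the length limit.
        if token == eos_token_id {
            break;
        }
        output.push(token);
    }
    output
}

fn main() {
    // Toy sampler that emits a few tokens and then an EOS id (2 here).
    let mut stream = vec![10u32, 11, 12, 2, 99, 100].into_iter();
    let tokens = decode_loop(2, 16, || stream.next().unwrap_or(2));
    assert_eq!(tokens, vec![10, 11, 12]); // nothing after EOS is kept
}
```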