Optional logprobs & fix llama eos/stop token #40

Merged: 3 commits, Jun 19, 2024

Conversation

guoqingbao
Collaborator

Make logprobs optional (by default, the chat completion response includes no logprobs).
Fix the EOS handling in the LLaMA pipeline.
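
A minimal sketch of the "optional logprobs" idea, not the code from this PR: the `TokenLogprob` and `Choice` types, the `build_choice` function, and the `want_logprobs` flag are all hypothetical names chosen for illustration. The point is only that the log-probabilities are collected when the client asks for them and left as `None` otherwise.

```rust
/// Hypothetical per-token entry; the real response type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenLogprob {
    pub token: String,
    pub logprob: f32,
}

/// Hypothetical choice type: `logprobs` stays `None` unless requested.
#[derive(Debug)]
pub struct Choice {
    pub content: String,
    pub logprobs: Option<Vec<TokenLogprob>>,
}

/// Build a choice from sampled (token text, logprob) pairs.
/// `want_logprobs` stands in for a request-level flag (an assumption).
pub fn build_choice(sampled: &[(&str, f32)], want_logprobs: bool) -> Choice {
    // The message content is always the concatenation of the token texts.
    let content: String = sampled.iter().map(|(tok, _)| *tok).collect();

    // Only materialize the per-token details when the client asked for them.
    let logprobs = if want_logprobs {
        Some(
            sampled
                .iter()
                .map(|(tok, lp)| TokenLogprob {
                    token: tok.to_string(),
                    logprob: *lp,
                })
                .collect(),
        )
    } else {
        None
    };

    Choice { content, logprobs }
}
```

With `want_logprobs` set to `false`, the returned `Choice` has `logprobs == None`, which a JSON serializer would typically emit as `null` or omit entirely.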

Tested cases:

First request:

curl -X POST "http://127.0.0.1:65320/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{
           "model": "llama7b",
           "messages": [
               {"role": "user", "content": "Explain how to best learn Rust."}
           ],
           "temperature": 0.7,
           "max_tokens": 1024,
           "stop": {"Single":"</s>"}
       }'
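
The `"stop": {"Single":"</s>"}` shape in the request looks like an externally tagged Rust enum. The sketch below assumes such an enum; only `Single` appears in this thread, so the `Multi` variant and the `truncate_at_stop` helper are hypothetical. It shows one way stop strings could be applied to decoded text.

```rust
/// Assumed shape of the request's `stop` field. Only `Single` is shown in
/// the requests in this thread; `Multi` is a hypothetical companion variant.
#[derive(Debug, Clone)]
pub enum StopTokens {
    Single(String),
    Multi(Vec<String>),
}

impl StopTokens {
    /// View the configured stop strings as a uniform list.
    fn as_vec(&self) -> Vec<&str> {
        match self {
            StopTokens::Single(s) => vec![s.as_str()],
            StopTokens::Multi(v) => v.iter().map(String::as_str).collect(),
        }
    }
}

/// If any stop string occurs in `text`, return the text cut just before the
/// earliest occurrence; return `None` when no stop string is present.
pub fn truncate_at_stop<'a>(text: &'a str, stop: &StopTokens) -> Option<&'a str> {
    stop.as_vec()
        .into_iter()
        .filter(|s| !s.is_empty()) // an empty stop string would match everywhere
        .filter_map(|s| text.find(s))
        .min()
        .map(|idx| &text[..idx])
}
```

For example, `truncate_at_stop("Done.</s> extra", &StopTokens::Single("</s>".to_string()))` yields `Some("Done.")`.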

First response:

{"id":"cmpl-88fad42f-271f-49b6-ad8c-b6dc8ddda132","choices":[{"message":{"content":" Learning any programming language requires a combination of theory, practice, and dedication. Here are some steps and resources to help you learn Rust effectively:\n\n1. Start with the basics:\n\t* Understand the syntax and basic structure of Rust programs.\n\t* Learn about variables, data types, loops, and control structures.\n\t* Familiarize yourself with Rust's ownership system and borrowing mechanism.\n2. Read the Rust book:\n\t* The Rust book is an official resource that provides a comprehensive introduction to the language.\n\t* It covers topics such as syntax, type system, lifetimes, and error handling.\n\t* Read the book from start to finish, and take breaks to practice what you've learned.\n3. Experiment and build projects:\n\t* Start by writing simple programs to get a feel for the language.\n\t* Gradually move on to more complex projects, such as building a command-line tool or a small library.\n\t* Experiment with different libraries and frameworks to see how they can help you build more robust and efficient programs.\n4. Join the Rust community:\n\t* Participate in online forums, such as the Rust subreddit or the Rust Discord server.\n\t* Attend Rust meetups or conferences to connect with other Rust users and learn from their experiences.\n\t* Contribute to open-source Rust projects to gain practical experience and build your reputation in the community.\n5. Take online courses or tutorials:\n\t* There are many online courses and tutorials available that cover various aspects of Rust programming.\n\t* Some popular resources include the Rust by Example course, the Rust Programming Language course on Pluralsight, and the Rust course on Udemy.\n6. 
Read Rust-related articles and blogs:\n\t* Stay up-to-date with the latest developments in the Rust ecosystem by reading articles and blog posts on Rust-related topics.\n\t* Some popular Rust blogs include the Rust Bytes blog, the Rust Weekly newsletter, and the Rustlings blog.\n7. Learn C and other programming languages:\n\t* Rust is built on top of the C programming language, so having a solid understanding of C is essential for mastering Rust.\n\t* Familiarize yourself with other programming languages, such as Python, JavaScript, or Java, as they can provide a different perspective on programming and help you understand Rust's design choices.\n8. Practice, practice, practice:\n\t* The best way to learn any programming language is by writing code.\n\t* Set aside time each day or week to practice writing Rust code, and gradually increase the complexity of your projects.\n9. Learn error handling and debugging:\n\t* Rust has a strong focus on error handling and debugging, so it's essential to learn how to use the language's error handling mechanisms effectively.\n\t* Practice using the `?` operator, `match` statements, and `Result` types to handle errors in your code.\n10. 
Learn advanced topics:\n\t* Once you have a solid foundation in the basics, start exploring more advanced topics such as:\n\t\t+ Macros: Rust's macro system allows you to extend the language itself, which can be a powerful tool for building libraries and frameworks.\n\t\t+ Traits: Traits are a mechanism for defining a set of functions or methods that can be implemented by multiple types.\n\t\t+ Smart pointers: Rust's ownership system is based on smart pointers, which allow you to manage memory safely and efficiently.\nBy following these steps and resources, you can gain a deep understanding of the Rust programming language and become proficient in building robust and efficient software.","role":"[INST]"},"finish_reason":"stop","index":0,"logprobs":null}],"created":1718781072,"model":"llama7b","object":"chat.completion","usage":{"completion_tokens":834,"prompt_tokens":29,"total_tokens":863}}

Second request:

curl -X POST "http://127.0.0.1:65320/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{
           "model": "llama7b",
           "messages": [
               {"role": "user", "content": "How to practice?"}
           ],
           "temperature": 0.7,
           "max_tokens": 1024,
           "stop": {"Single":"</s>"}
       }'

Second response:

{"id":"cmpl-b1523b98-b71e-49de-979d-70667a5963f8","choices":[{"message":{"content":" Learning Rust can be a challenging but rewarding experience. Here are some steps you can take to practice and improve your skills in Rust:\n1. Start with the basics: Understand the syntax, variables, data types, loops, and control structures in Rust. The official Rust book is a great resource for this.\n2. Build small projects: Start by building small projects, such as a calculator, a to-do list app, or a simple game. This will help you get comfortable with the language and its ecosystem.\n3. Experiment with libraries and frameworks: Rust has a rich ecosystem of libraries and frameworks that can help you build more complex applications. Experiment with different libraries and frameworks to see how they can help you build your projects.\n4. Learn error handling: Error handling is an important aspect of Rust programming. Learn how to use Result, Option, and other error handling mechanisms to handle errors in your code.\n5. Learn about ownership and borrowing: Rust's ownership and borrowing system is one of its most unique features. Understand how to manage ownership and borrowing in your code to avoid common errors.\n6. Learn about advanced topics: Once you have a good grasp of the basics, start exploring more advanced topics such as lifetimes, borrowing, and smart pointers. These topics can help you write more efficient and safe code.\n7. Practice, practice, practice: The best way to learn Rust is by writing code. Start with small projects and gradually work your way up to more complex applications.\n8. Join a community: Join online communities such as the Rust subreddit, Rust Discord, or Rust forums to connect with other Rust users, ask questions, and learn from their experiences.\n9. Read Rust documentation: The Rust documentation is a wealth of information, and it's a great resource to learn about different aspects of the language.\n10. 
Take online courses or tutorials: There are many online courses or tutorials that can help you learn Rust, such as the Rust by Example course, or the Rust programming language tutorial on Pluralsight.\n111. Read Rust books: There are many great books available to learn Rust, such as \"The Rust Programming Language\", \"Rust Crash Course\", or \"Rust by Example\".\n12. Participate in coding challenges: Participate in coding challenges such as Rust's \"Rust By Example\" challenges, or the \"Rust Community Book\" to practice your skills and learn from others.\n13. Learn about advanced topics: Once you have a good grasp of the basics, start exploring more advanced topics such as lifetimes, borrowing, and smart pointers. These topics can help you write more efficient and safe code.\n14. Learn about Rust's type system: Rust's type system is one of its most powerful features. Learn how to use enums, generics, and other type-related features to write more robust and safe code.\n15. Learn about Rust's concurrency model: Rust has a unique concurrency model that allows you to write concurrent code that is both safe and efficient. 
Learn how to use async/await, futures, and other concurrency-related features to write concurrent code that is both safe and efficient.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","role":"[INST]"},"finish_reason":"length","index":0,"logprobs":null}],"created":1718781138,"model":"llama7b","object":"chat.completion","usage":{"completion_tokens":1025,"prompt_tokens":39,"total_tokens":1064}}

Apparently, the KV cache from the previous generation was reused in the second request, which gave it in-context learning. However, the second response contains redundant output: no EOS token was produced, so generation ran until the token limit (finish_reason is "length").
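
The difference between the two `finish_reason` values above can be illustrated with a small decode loop. This is a sketch under assumptions, not the pipeline's actual code: `generate`, `FinishReason`, and the token ids are hypothetical, and the sampler is replaced by a plain iterator of token ids.

```rust
/// Why generation ended, mirroring the "stop" / "length" values seen above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FinishReason {
    Stop,
    Length,
}

/// Consume sampled token ids until the EOS id appears or `max_tokens` tokens
/// have been kept. Returns the kept tokens (EOS excluded) and the reason.
pub fn generate<I>(sampled: I, eos_token_id: u32, max_tokens: usize) -> (Vec<u32>, FinishReason)
where
    I: IntoIterator<Item = u32>,
{
    let mut out = Vec::new();
    let mut stream = sampled.into_iter();

    while out.len() < max_tokens {
        match stream.next() {
            // EOS seen: end cleanly without emitting the EOS token itself.
            Some(tok) if tok == eos_token_id => return (out, FinishReason::Stop),
            Some(tok) => out.push(tok),
            // The stand-in sampler ran dry; treated like a length stop here.
            None => break,
        }
    }

    // No EOS within the budget, which is the situation in the second response.
    (out, FinishReason::Length)
}
```

If the EOS id is never sampled, the loop runs to the budget and reports `Length`, which matches the run of trailing newlines in the second response.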

@EricLBuehler
Owner

EricLBuehler commented Jun 19, 2024

> Apparently, the KV cache from the previous generation was reused in the second request, which gave it in-context learning. However, the second response contains redundant output: no EOS token was produced, so generation ran until the token limit (finish_reason is "length").

Have you implemented per-request cache clearing? Great work here as usual 😃

@guoqingbao
Collaborator Author

> > Apparently, the KV cache from the previous generation was reused in the second request, which gave it in-context learning. However, the second response contains redundant output: no EOS token was produced, so generation ran until the token limit (finish_reason is "length").
>
> Have you implemented per-request cache clearing? Great work here as usual 😃

Do we need to clear the KV cache for each request? I thought the KV cache could be kept for each session (a session may span multiple requests) to enable in-context learning.

@EricLBuehler
Owner

> Do we need to clear the KV cache for each request? I thought the KV cache could be kept for each session (a session may span multiple requests) to enable in-context learning.

If users deploy candle-vllm to a system with multiple chat interactions interleaved, we do not want the KV cache to be shared across requests. I think the right way to do this is to clear the KV cache when a sequence finishes (this is what we do in mistral.rs). There is also prefix caching, which can be used to allow faster in-context learning.
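
As a rough illustration of "clear the KV cache when a sequence finishes", here is a toy block allocator. It is a sketch only: `BlockPool` and its methods are hypothetical and do not reflect the cache engine in candle-vllm or mistral.rs, and integer block ids stand in for real KV tensors.

```rust
use std::collections::HashMap;

/// Toy block allocator standing in for a paged KV cache (names hypothetical).
pub struct BlockPool {
    free: Vec<usize>,                // block ids available for allocation
    owned: HashMap<u64, Vec<usize>>, // sequence id -> block ids it holds
}

impl BlockPool {
    pub fn new(num_blocks: usize) -> Self {
        Self {
            free: (0..num_blocks).collect(),
            owned: HashMap::new(),
        }
    }

    /// Reserve one block for `seq_id`; `None` means the cache is full.
    pub fn allocate(&mut self, seq_id: u64) -> Option<usize> {
        let block = self.free.pop()?;
        self.owned.entry(seq_id).or_default().push(block);
        Some(block)
    }

    /// Called when a sequence finishes: hand its blocks back to the pool so
    /// no later request can observe them. Returns how many were released.
    pub fn finish_sequence(&mut self, seq_id: u64) -> usize {
        match self.owned.remove(&seq_id) {
            Some(blocks) => {
                let released = blocks.len();
                self.free.extend(blocks);
                released
            }
            None => 0,
        }
    }

    pub fn free_blocks(&self) -> usize {
        self.free.len()
    }
}
```

Because ownership is tracked per sequence id, interleaved requests never share blocks, and finishing one sequence leaves the others untouched.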

@guoqingbao
Collaborator Author

> > Do we need to clear the KV cache for each request? I thought the KV cache could be kept for each session (a session may span multiple requests) to enable in-context learning.
>
> If users deploy candle-vllm to a system with multiple chat interactions interleaved, we do not want the KV cache to be shared across requests. I think the right way to do this is to clear the KV cache when a sequence finishes (this is what we do in mistral.rs). There is also prefix caching, which can be used to allow faster in-context learning.

Yes, there is a KV cache leak because a single cache_engine instance is used for the entire lifetime of llm_engine. One solution is to create the cache_engine per session, so that multiple requests within a session (from a single user) share the KV cache. This cache should be cleared when the session ends, or automatically when the server's cache is nearly full. Another strategy, as you suggested, is to clear the KV cache after each request and rebuild the session context using prefix caching (this is new to me) or by recomputing the session's chat history.
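
The per-session option could look roughly like the sketch below. Everything here is an assumption made for illustration: `SessionCache`, its token-count bookkeeping, and the least-recently-used eviction policy are hypothetical, and token counts stand in for real KV tensors.

```rust
use std::collections::HashMap;

/// Hypothetical per-session KV bookkeeping (token counts replace tensors).
pub struct SessionCache {
    /// Evict other sessions once total cached tokens exceed this threshold.
    high_watermark_tokens: usize,
    /// Logical clock used to order sessions by last use.
    clock: u64,
    /// session id -> (tokens cached, tick of last use)
    sessions: HashMap<String, (usize, u64)>,
}

impl SessionCache {
    pub fn new(high_watermark_tokens: usize) -> Self {
        Self { high_watermark_tokens, clock: 0, sessions: HashMap::new() }
    }

    /// Total tokens currently cached across all sessions.
    pub fn used(&self) -> usize {
        self.sessions.values().map(|(n, _)| *n).sum()
    }

    /// Record `new_tokens` more cached tokens for `session`, then evict the
    /// least recently used other sessions while the cache is nearly full.
    /// Returns the ids of the evicted sessions.
    pub fn append(&mut self, session: &str, new_tokens: usize) -> Vec<String> {
        self.clock += 1;
        let entry = self.sessions.entry(session.to_string()).or_insert((0, 0));
        entry.0 += new_tokens;
        entry.1 = self.clock;

        let mut evicted = Vec::new();
        while self.used() > self.high_watermark_tokens {
            // Pick the stalest session, never the one being served right now.
            let victim = self
                .sessions
                .iter()
                .filter(|(id, _)| id.as_str() != session)
                .min_by_key(|(_, (_, tick))| *tick)
                .map(|(id, _)| id.clone());
            match victim {
                Some(id) => {
                    self.sessions.remove(&id);
                    evicted.push(id);
                }
                None => break, // only the active session is left
            }
        }
        evicted
    }

    /// Explicit end of a session: drop its cache. Returns whether it existed.
    pub fn end_session(&mut self, session: &str) -> bool {
        self.sessions.remove(session).is_some()
    }
}
```

An evicted session would need its context rebuilt on its next request, for example by recomputing the chat history as mentioned above.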

Owner

@EricLBuehler EricLBuehler left a comment

I'll probably add the KV cache clearing in a later PR. Thank you!

@EricLBuehler EricLBuehler merged commit 266c2da into EricLBuehler:master Jun 19, 2024
5 checks passed