
Add new chat cli with auto backend feature #1276

Merged: 10 commits into InternLM:main on Mar 26, 2024

Conversation

@RunningLeon (Collaborator) commented on Mar 12, 2024

Motivation

Support the auto backend feature in the chat CLI.

Modification

  • Combine lmdeploy chat torch and lmdeploy chat turbomind into a single lmdeploy chat command. Note that the old commands still work.
  • Support the auto backend feature in the CLI: lmdeploy chat.

BC-breaking (Optional)

No BC-breaking changes.

Use cases (Optional)

pytorch backend

lmdeploy chat internlm/internlm-chat-7b --backend pytorch

turbomind backend (default)

lmdeploy chat internlm/internlm-chat-7b
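
For illustration, here is a minimal argparse sketch of how a combined chat command with an optional --backend flag might be wired up. This is a hypothetical outline, not the actual lmdeploy CLI code:

import argparse

# Hypothetical sketch: one `chat` subcommand with an optional --backend
# flag; when the flag is omitted, auto backend picks the engine.
parser = argparse.ArgumentParser(prog='lmdeploy')
subparsers = parser.add_subparsers(dest='command')

chat = subparsers.add_parser('chat', help='chat with a model')
chat.add_argument('model_path', help='model name or local path')
chat.add_argument('--backend', choices=['pytorch', 'turbomind'],
                  default=None,
                  help='force an engine; omit to auto-detect')

args = parser.parse_args(['chat', 'internlm/internlm-chat-7b'])
print(args.model_path, args.backend)  # -> internlm/internlm-chat-7b None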

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@RunningLeon marked this pull request as ready for review on March 12, 2024 07:14
@zhyncs (Contributor) commented on Mar 20, 2024

Hi @RunningLeon After the recent auto backend feature, there have been a few mishaps. For example, I compiled TurboMind using Python 3.9, so the turbomind.so is for Python 3.9. If I try to import lmdeploy with a version other than Python 3.9, TurboMind won't work. Previously, without auto backend, it would directly throw an error. Now it doesn't show an error but falls back to PyTorch instead. In most cases, this isn't a problem. However, if a user actually wants to use TurboMind and is unaware of the automatic fallback, unexpected results may occur. Therefore, I am thinking that making auto backend an option might be more appropriate so users can explicitly force it off.

@RunningLeon (Collaborator, Author) replied

@zhyncs hi, thanks for your feedback. There is a warning for this case:

try:
    from lmdeploy.turbomind.supported_models import \
        is_supported as is_supported_turbomind
    turbomind_has = is_supported_turbomind(model_path)
except ImportError:
    logger.warning(
        'Lmdeploy with turbomind engine is not installed correctly. '
        'You may need to install lmdeploy from pypi or build from source '
        'for turbomind engine.')

@zhyncs (Contributor) commented on Mar 20, 2024

Hi @RunningLeon Yes, I did see this warning in the source code, but in actual usage the warning did not attract attention. Should we consider raising the log level to error or, as mentioned above, adding an option to force auto backend off?

@lvhan028 (Collaborator) commented
Hi, @zhyncs
Seamlessly falling back to the PyTorch engine when a model isn't supported by TurboMind is a key feature of LMDeploy.
Our goal is to offer LMDeploy as a comprehensive suite rather than just an individual engine, ensuring users a hassle-free experience when deploying models without having to worry about engine-specific issues.
We appreciate your suggestion regarding the log level and the potential for an additional optional argument. However, after careful consideration, we have decided to maintain our current approach.
Could you try to set log_level='warning' when you use lmdeploy?
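
For anyone unsure how to surface the warning, a minimal sketch, assuming lmdeploy's loggers live under the standard 'lmdeploy' name in Python's logging hierarchy:

import logging

# Make sure warnings (including the fallback warning quoted above) are
# printed to the console; assumes lmdeploy uses the standard logging
# hierarchy under the 'lmdeploy' logger name.
logging.basicConfig()
logging.getLogger('lmdeploy').setLevel(logging.WARNING)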

@lvhan028 self-requested a review on March 20, 2024 11:53
@lvhan028 (Collaborator) commented
@RunningLeon please add logs when falling back to the pytorch engine.

try:
    from lmdeploy.turbomind.supported_models import \
        is_supported as is_supported_turbomind
    turbomind_has = is_supported_turbomind(model_path)
except ImportError:
    logger.warning(
        'Lmdeploy with turbomind engine is not installed correctly. '
        'You may need to install lmdeploy from pypi or build from source '
        'for turbomind engine.')

If turbomind is installed but the model is not supported, there is no log showing that the engine falls back to the pytorch engine.
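
A hedged sketch of the kind of log being requested here; the function name and message wording are illustrative, not the merged implementation:

import logging

logger = logging.getLogger('lmdeploy')

def warn_pytorch_fallback(model_path: str, is_turbomind_installed: bool,
                          turbomind_has: bool) -> None:
    # Emit the missing log: turbomind is installed but cannot serve the
    # model, so the silent fallback to pytorch becomes visible.
    if is_turbomind_installed and not turbomind_has:
        logger.warning(
            f'{model_path} is not explicitly supported by the turbomind '
            'engine. Falling back to the pytorch engine.')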

@lvhan028 (Collaborator) commented
This case failed

CUDA_VISIBLE_DEVICES=1 lmdeploy chat /workspace/models-140/Qwen/Qwen-7B-Chat/ --backend pytorch

lmdeploy/archs.py (review comment on outdated code):
    f' Try to run with lmdeploy pytorch engine.')
try_run_msg = (f'Try to run with pytorch engine because `{model_path}`'
               f' is not explicitly supported by lmdeploy. ')
if is_turbomind_installed:
Collaborator:

I think these warnings should only appear when the user intends to use turbomind (for example, by sending a turbomind config).

@RunningLeon (Collaborator, Author) replied:

The default is to use turbomind. Auto backend works only when the backend config is None or a TurbomindConfig.
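
To make that rule concrete, a minimal sketch of the dispatch described above; the class and function names are assumptions for illustration, not the merged lmdeploy code:

class TurbomindConfig:
    pass

class PytorchEngineConfig:
    pass

def is_supported_by_turbomind(model_path: str) -> bool:
    # Placeholder for the real capability check (see the
    # is_supported_turbomind snippet quoted earlier in this thread).
    return False

def choose_backend(model_path: str, backend_config=None) -> str:
    if isinstance(backend_config, PytorchEngineConfig):
        return 'pytorch'  # explicit request: auto backend is skipped
    # backend_config is None or a TurbomindConfig: auto-detect, falling
    # back to pytorch when turbomind cannot serve the model.
    if is_supported_by_turbomind(model_path):
        return 'turbomind'
    return 'pytorch'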

lmdeploy/cli/utils.py (review comment on outdated code, resolved)
@lvhan028 merged commit 893a574 into InternLM:main on Mar 26, 2024
4 of 5 checks passed