Add deepseek vl #1335
Conversation
Conflicts: lmdeploy/serve/vl_async_engine.py
Please change the two places below, then check whether it still crashes or hangs. `cache_max_entry_count` can be set smaller: `backend_config=TurbomindEngineConfig(tp=2, session_len=8192, cache_max_entry_count=0.5)`

lmdeploy/lmdeploy/vl/model/deepseek.py Line 24 in c9b61e3

Change this to `cuda:0`:
with torch.device('cuda:0'):
    time_start = time.perf_counter()
    outputs = self.model.forward(inputs)
    time_end = time.perf_counter()
    logger.info(f'ImageEncoder forward {len(inputs)} images, '
                f'cost {time_end - time_start:.3f}s')
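The timing pattern in the snippet above can be reproduced standalone. A minimal sketch, where the forward callable and logger name are placeholders rather than lmdeploy's actual objects:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('ImageEncoder')


def timed_forward(forward, inputs):
    """Time a forward call with perf_counter and log the cost,
    mirroring the logging style used in the snippet above."""
    time_start = time.perf_counter()
    outputs = forward(inputs)
    time_end = time.perf_counter()
    logger.info('ImageEncoder forward %d images, cost %.3fs',
                len(inputs), time_end - time_start)
    return outputs


# Toy stand-in for the image encoder's forward pass.
outs = timed_forward(lambda xs: [x * 2 for x in xs], [1, 2, 3])
```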
lmdeploy/vl/model/deepseek.py
Outdated
with torch.device('cpu'):
    model = AutoModelForCausalLM.from_pretrained(
        self.model_path, trust_remote_code=True)
May use `init_empty_weights` to accelerate loading.
Tried that, but the model's outputs seemed to be wrong.
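A plausible explanation: `init_empty_weights` (from Hugging Face accelerate) constructs parameters on the meta device with no real storage, so unless every weight is later materialized from the checkpoint, the model computes on uninitialized data. A stdlib-only sketch of that failure mode, with hypothetical names standing in for the real API:

```python
class LazyLinear:
    """Stand-in for a layer whose weights start out unallocated,
    mimicking a module built under an empty-weights context."""

    def __init__(self, n: int):
        self.n = n
        self.weight = None  # "meta" weight: no real data yet

    def materialize(self, values):
        # Analogous to loading the real checkpoint values afterwards.
        self.weight = list(values)

    def forward(self, xs):
        # Guard instead of silently computing on garbage, which is
        # what happens when materialization is skipped.
        if self.weight is None:
            raise RuntimeError('weights not materialized')
        return sum(w * x for w, x in zip(self.weight, xs))


layer = LazyLinear(3)
layer.materialize([1.0, 2.0, 3.0])
result = layer.forward([1.0, 1.0, 1.0])  # 6.0
```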
Accelerating model loading is very important. Please investigate.

ValueError: Could not find the operator torchvision::nms. Please make sure you have already registered the operator and (if registered from C++) loaded it via torch.ops.load_library.

torch 2.1.2+cu118
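This `torchvision::nms` failure typically indicates that torch and torchvision builds do not match (for example, a CPU-only torchvision installed next to a CUDA torch). A small helper to collect the installed versions for a bug report; `versions` is a hypothetical name, not part of lmdeploy:

```python
import importlib


def versions():
    """Report installed torch/torchvision versions (None if missing),
    to help spot a build mismatch like the one in the error above."""
    found = {}
    for name in ('torch', 'torchvision'):
        try:
            found[name] = importlib.import_module(name).__version__
        except ImportError:
            found[name] = None
    return found


print(versions())
```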
#1321