Dockerfile for cloud conversion of LLM models #319
Conversation
@@ -0,0 +1,41 @@
FROM python:3.12-slim AS base
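The `AS base` on this line suggests a multi-stage build, where later stages reuse a shared base layer. A minimal sketch of that pattern; the stage name `openvino` and the packages below are illustrative assumptions, not the actual file contents:

```dockerfile
# Shared base stage: system tools every environment needs.
FROM python:3.12-slim AS base
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# A hypothetical target stage building on the shared base.
FROM base AS openvino
RUN pip install --no-cache-dir openvino
```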
If we plan to use a prebuilt Docker image, make sure to release it in an official place.
Yes. I think a prebuilt Docker image is necessary, since otherwise the job needs to install the requirements every time, which takes a long time.
Working on an official release. For this PR, this is just a place to put the Dockerfile.
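The usual way to avoid the per-job install is to bake the requirements into an image layer. A minimal sketch, assuming a `requirements.txt` at the build context root:

```dockerfile
FROM python:3.12-slim
# Copy only the requirements first so this layer is cached across builds;
# conversion jobs then start with dependencies already installed.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
WORKDIR /workspace
```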
LGTM. Make sure we version the Dockerfile for each new requirements.txt.
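One way to tie the image version to the requirements file, sketched with an assumed build argument and label name:

```dockerfile
FROM python:3.12-slim
# Record which requirements revision this image was built from
# (the argument name and label key are assumptions for illustration).
ARG REQUIREMENTS_VERSION=unversioned
LABEL org.opencontainers.image.version=$REQUIREMENTS_VERSION
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```

The image could then be built and tagged with the same value, e.g. `docker build --build-arg REQUIREMENTS_VERSION=2 -t llm-conversion:2 .`.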
This Dockerfile creates an image that contains the environment for cloud conversion of LLM models; the environments are listed here, with a hedged stage sketch after the list:

- Python 3.10 with AutoGPTQ installed.
- Python 3.12 with nightly ort-qnn installed. Shared with QNN.
- Python 3.12 with OpenVINO installed.
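A sketch of how these three environments could be expressed as separate stages. The stage names, package pins, and the nightly feed URL are assumptions (the feed URL follows the publicly documented ONNX Runtime nightly index); the real files may be organized differently:

```dockerfile
FROM python:3.10-slim AS autogptq
# AutoGPTQ environment (PyPI package name: auto-gptq).
RUN pip install --no-cache-dir auto-gptq

FROM python:3.12-slim AS qnn
# Nightly ort-qnn, pulled from the assumed ORT nightly feed.
RUN pip install --no-cache-dir --pre onnxruntime-qnn \
    --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/

FROM python:3.12-slim AS openvino
# OpenVINO environment for Intel conversion.
RUN pip install --no-cache-dir openvino
```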