
Dockerfile for cloud conversion of LLM models #319


Merged: ssss141414 merged 6 commits into dev from shzhen/docker on Jun 26, 2025

Conversation

@ssss141414 (Contributor) commented Jun 19, 2025

This Dockerfile creates an image containing the environments for cloud conversion of LLM models:

  • QNN:
    Python 3.10 with autogptq installed.
    Python 3.12 with nightly ort-qnn installed.
  • AMD: shared with the QNN environment.
  • Intel:
    Python 3.12 with openvino installed.
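
For orientation, here is a minimal sketch of that layout. The diff below only shows the first line of the real file; the stage names, package pins, and install commands in this sketch are assumptions for illustration, not the merged contents.

```dockerfile
# Minimal sketch of the layout described above, NOT the merged file:
# stage names and install commands are illustrative assumptions.
FROM python:3.12-slim AS base
RUN python -m pip install --upgrade pip

# QNN (also reused for AMD): Python 3.12 with a pre-release onnxruntime-qnn.
# A real nightly build would come from an --extra-index-url package feed.
FROM base AS qnn
RUN python -m pip install --pre onnxruntime-qnn

# Companion Python 3.10 environment for autogptq (PyPI name: auto-gptq).
FROM python:3.10-slim AS qnn-py310
RUN python -m pip install auto-gptq

# Intel: Python 3.12 with OpenVINO.
FROM base AS intel
RUN python -m pip install openvino
```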

@@ -0,0 +1,41 @@
FROM python:3.12-slim AS base
Contributor

If we plan to use a prebuilt Docker image, make sure to release it in an official place.

Contributor Author

Yes, I think a prebuilt Docker image is necessary; otherwise the job needs to install requirements every time, which takes a long time.

Working on the official release. For this PR, this is just a place to put the Dockerfile.
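
As a sketch, the build-and-publish flow for such a prebuilt image could look like this; the registry name, tag, and target stage are placeholders, not the official location discussed above.

```bash
# Hypothetical publish flow; registry, tag, and --target are placeholders.
docker build --target qnn -t example.azurecr.io/cloud-conversion:qnn-v1 .
docker push example.azurecr.io/cloud-conversion:qnn-v1
```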

@ssss141414 changed the title from "Dockerfile for cloud conversion of QNN" to "Dockerfile for cloud conversion of LLM models" on Jun 23, 2025
@ssss141414 marked this pull request as ready for review on June 25, 2025 06:35
Contributor

@xieofxie left a comment


LGTM. Make sure we version the Dockerfile whenever requirements.txt changes.
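
One way to tie the image version to requirements.txt, sketched here as a Dockerfile fragment with placeholder values:

```dockerfile
# Illustrative versioning scheme (ARG/LABEL values are placeholders):
# bump REQUIREMENTS_VERSION whenever requirements.txt changes so the
# published image label (and its tag) changes with it.
ARG REQUIREMENTS_VERSION=2025.06.26
LABEL org.opencontainers.image.version=${REQUIREMENTS_VERSION}
COPY requirements.txt /tmp/requirements.txt
RUN python -m pip install -r /tmp/requirements.txt
```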

@ssss141414 merged commit 6710400 into dev on Jun 26, 2025
1 check passed
@ssss141414 deleted the shzhen/docker branch on June 26, 2025 01:42
swatDong added a commit that referenced this pull request Aug 4, 2025
* nit

* add amd llm phi

* update parameters like isLLM

* add evalRuntime

* use runtime

* add back isGPURequired

* update

* update

* wrong phi

* use copy

* add execute ep

* fix model list (#255)

* update phi silica

* intel npu (#257)

* update intel npu

* fix

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* add og to amd

* nit

* Fangyangci/convert to intel npu (#256)

* resnet

* bert-base-multilingual-cased

* 1

* 1

* vit

* 1

* fix inference_sample for intel

* remove

* fix lf

* fix intel npu bugs (#259)

* fix inference_sample bug

* add intelNpu

* add intelnpu runtime

* fix lf

* fix lf

* llm intel (#261)

* fix inference_sample bug

* deepseek

* add intelNpu

* update optimum

* add intelnpu runtime

* fix lf

* fix lf

* llm intel model

* fix diff

* fix check

* code style

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* clip intel (#262)

* 1

* clip

* 1

* 1

* naming

* add "library": "transformers"

* \n

* \n

* \n

* \n

* \n

* redundant file

* use with open_ex (#264)

* use open_ex

* add comment

* remove dup

* fix clip copy

* update olive to latest

* onnxruntime 1.22.0

* onnxruntime does not have 1.22.0 for windows x64, weird

* remove QNN in readme

* update more readme

* forget

* add check

* use default name pixel_values

* rename to ov_model_st_quant

* fix

* default qnn

* strange name for clip

* update olive; rollback qnn

* update ov name

* add deps to resnet

* gpu down back to 1.21.0

* fix

* change workload profile

* loginrequiredmodelids (#273)

* Change nonllm model wcr sample code. (#274)

* change wcr sample

* fix comments

* intel evaluations (#276)

* vit

* fix default size

* fix sanitize.py

* update wcr for evaluation (#277)

* add olive, genai to wcr

* update WCR

* personal update 377e233de4814b1bc92e173d6dbb503f1f94dc04

* intel use cpu version

* add npu to open vino

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* add genai and fix sample (#275)

* Hualxie/intel wcr (#278)

* update olive version

* update py

* all use new olive

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* evaluations (#279)

* vit

* fix default size

* intel bert

* fix intel bert

* fix intel bert size

* clip

* run sanitize.py

* update copy, intel bert for intel

* simplify ov sample

* fix bert

* rollback for label incorrectness

* add evaluate / scikit-learn

* google bert

* google bert

* remove unused

* 1

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* qnn still use official

* update test

* Hualxie/add qnn (#283)

* add vit qnn

* add google bert

* add intel bert

* update resnet

* rename qdq

* nit

* 512

* fix data_config

* 512

* fix data_config again

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* update clip 16

* add clip32

* add clip32, laion

* fix

* fix

* change to genai_winml for qnn llm (#285)

* change genai winml version

* Hualxie/more fixes (#288)

* fix samples

* olive-ai==0.9.1

* use AutoImageProcessor

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* use mini-imagenet to align with training data

* use mini

* intel clip accuracy (#287)

* intel clip accuracy

* fix metrics

* add requirement

* fix laion bug

* handle transpose

* 4.48 work for resnet auto processor (#290)

* 4.48 work for resnet auto processor

* add use_fast = False

* add transpose for amd

* remove clean_cache

* fix onnx

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* Fix llama sample and bert recipe. (#291)

* fix llama sample

* fix intel bert and google bert max_length

* fix

* add EP and check

* remove unused & update error logic (#294)

Co-authored-by: hualxie <hualxie@microsoft.com>

* change genai sample (#292)

* change genai sample

* remove unused statement

* test (#295)

Co-authored-by: hualxie <hualxie@microsoft.com>

* Hualxie/add deps (#296)

* update install_freeze

* update

* all use separate installation

* comment

* revert

* update

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* use final url (#297)

Co-authored-by: hualxie <hualxie@microsoft.com>

* remove transformers in system

* to lf

* lf

* install separately (#299)

Co-authored-by: hualxie <hualxie@microsoft.com>

* clean up reqs & all lf (#300)

* clean up reqs

* all lf

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* update torch version to support RTX50** (#301)

* update torch version to support RTX50**

* fix download

* fix url

* update other version

* fix version

* 1

* fix version

* fix version

* use 2.6.0 in intelNPU

* fix NvidiaGpu-AutoGptq

* 1.22.0 & 0.8.0 (#302)

Co-authored-by: hualxie <hualxie@microsoft.com>

* unify workflow name (#303)

* add

* add phi3.5

* add inference_model.json

* fix all inference_model.json

* Hualxie/passes check (#306)

* add olive pass check

* test some

* nit

* add more

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* Revert "Merge pull request #304 from microsoft/fangyangci/addInferenceModel" (#308)

This reverts commit cf1a9f6, reversing changes made to 31e5ed8.

* Hualxie/more config check against Olive (#307)

* check more

* more

* more

* more

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* Hualxie/update and comment pass check (#309)

* they use default value, clean up

* comment

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* add contribute/get-started (#311)

* add  openai/clip-vit-large-patch14

* data

* revert

* in the middle

* use debugInfo

* update

* some thoughts

* remove empty debugInfo

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* fix error

* fix sanitize.py print

* change name

* revert

* fix mistake

* remove

* add version in modelproject

* modelinfo version sync

* do not exit when error

* revert some change

* move errors to end

* fix mistake

* do not exit when error occurs

* fix naming

* revert miss merge

* Update readmes (#315)

* Update readmes

* fix

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* fix sanitize print (#313)

* fix error

* fix sanitize.py print

* change name

* revert

* fix mistake

* do not exit when error occurs

* fix naming

* revert miss merge

* fix

* revert test code

* use default version -1

* Hualxie/update contribute guide (#318)

* add  openai/clip-vit-large-patch14

* data

* update docs

* check if exist

* nit

* remove openai/clip-vit-large-patch14

* already set from config

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* revert to open load model in playground (#323)

* update to 1.22.0.post1 (#322)

Co-authored-by: hualxie <hualxie@microsoft.com>

* add name to templates

* updates

* add name

* all lf

* nit

* remove

* Dockerfile for cloud conversion of LLM models (#319)

* gpu dockerfile

* update docker files

* intel docker image

* reuse

* fix

* add readme

* Refactor the code in `sanitize.py`

* 1

* 1

* use displayName (#325)

* use displayName

* update

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* manual fix

* manual fix

* add format

* remove import in __init__.py

* add print tip

* move auto formatter to file

* try fix rename diff

* backup: rename original sanitize.py to sanitize_old.py

* 1

* try fix rename

* backup: rename original sanitize.py to sanitize_old.py

* add new sanitize.py

* add comment

* feat: add more checks (#329)

* dump checks to file

* commit

* update

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* feat: add llm eval config (#328)

* add LLM Evaluator Template

* fix

* use fallbackValue

* add description

* -

* update

* merge

* clean

* fix

* add req

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* runtime = ep+device

* add runtime in passes

* add phi 4 mini for open vino (#327)

* add phi 4 mini for open vino

* fix

* update transformers for phi4

* ?

* use features

* revert

* nit

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* use action

* rename

* remove any

* fix naming

* add display name in conversion

* fix naming

* phi4

* fix naming

* fix naming

* fix naming  check path

* remove readonly

* fix tab

* rtx recipe

* fix install_freeze

* delete copy

* fix comments

* fix comments

* fix conflicts

* fix name

* fix name

* rename intel recipe
remove reuse_cache to fix intelGpu/intelNpu cache model name mismatch

* fix recipe name

* Add biceps (#321)

* revert

* update image

* delete dockerfile

* add pyEnvRuntimeFeatures (#334)

Co-authored-by: hualxie <hualxie@microsoft.com>

* update resource (#337)

* add dml recipes; update onnxruntime-genai-winml==0.8.3 (#336)

* will it work?

* add bert dml

* ignore

* ?

* copy OrtTransformersOptimization

* correct target

* add llm ones

* update data

* add latency

* ds

* llama

* qwen

* add others

* update

* nit

* update sanitize

* DirectML

* rename

* 0.8.3

* add pyEnvRuntimeFeatures

* add eval nightly for olive

* vit

* add more

* add clips

* save_as_external_data = true

* more samples

* clean up

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* new recipe for Mistral-7B-Instruct-v0.3

* remove usecache

* remove dml

* feat: write line endings (#339)

* use []

* updated

* nit

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* feat: fix clip (#341)

* need another pr

* remove Hide models

* fix

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* add qwen2.5 7b

* feat: add line endings (#344)

* line endings

* .

* add back for 6033 error

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* fix

* add qwen other models

* fix name

* wirh

* remove status and hide

* revert

* Update cloud conversion bicep workload (#347)

* update bicep

* update wcr

* sort

* add GetSortKey

* add deepseek

* fix

* add requirements.txt

* fix

* default index

* remove genai

* fix string

* update readme for DML

* phi4

* fix README.md

* feat: finalize project update scenario (#352)

* version consideration

* align files

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

* phi3-mini

* add left intel gpu

* remove some mistake

* format inference_model.json (#355)

Co-authored-by: hualxie <hualxie@microsoft.com>

* add DisplayNameToRuntimeRPC map (#354)

* add DisplayNameToRuntimeRPC map

* rename

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Charles Zhang <progzhangchao@163.com>
Co-authored-by: Yue Sun <yuesu@microsoft.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: fangyangci <133664123+fangyangci@users.noreply.github.com>
Co-authored-by: Yue Sun <2015.apro@gmail.com>
Co-authored-by: Chao Zhang <zhangchao@microsoft.com>
Co-authored-by: fangyangci <fangyangci@microsoft.com>
Co-authored-by: ssss141414 <407748083@qq.com>