GPU mode segfaults on Arch Linux #227

qryxip · 2022-08-09T17:32:29Z

Bug description

Trying to use GPU mode on Arch Linux causes a SEGV. The old C++ implementation works normally.

Symptoms / logs

❯ sha1sum ./libcore.so ./libonnxruntime.so.1.11.1
3d9669ed2994d03c03ea490e65fdacdf239f9b2c  ./libcore.so
bd1621ce029720b048811c5b9e8041143fce4421  ./libonnxruntime.so.1.11.1
❯ ./voicevox
[02:19:14.176] [info]  Starting 1 engine/s...
[02:19:14.179] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d: Start launching
[02:19:14.179] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d: Starting process
[02:19:14.180] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d mode: GPU
[02:19:14.180] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d path: /home/ryo/Downloads/VOICEVOX/run
[02:19:14.180] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d args: ["--use_gpu"]
[02:19:14.988] [error] ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDERR: Warning: cpu_num_threads is set to 0. ( The library leaves the decision to the synthesis runtime )

[02:19:14.995] [error] ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDERR: INFO:     Started server process [232236]

[02:19:14.996] [error] ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDERR: INFO:     Waiting for application startup.
INFO:     Application startup complete.

[02:19:14.997] [error] ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDERR: INFO:     Uvicorn running on http://127.0.0.1:50021 (Press CTRL+C to quit)

[02:19:15.452] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39440 - "GET /version HTTP/1.1" 200 OK

[02:19:15.464] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39440 - "GET /speakers HTTP/1.1" 200 OK

[02:19:15.498] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39456 - "GET /speaker_info?speaker_uuid=35b2c544-660e-401e-b503-0e14c635303a HTTP/1.1" 200 OK

[02:19:15.504] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39474 - "GET /speaker_info?speaker_uuid=b1a81618-b27b-40d2-b0ea-27a9ad408c4b HTTP/1.1" 200 OK

[02:19:15.512] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39472 - "GET /speaker_info?speaker_uuid=3474ee95-c274-47f9-aa1a-8322163d96f1 HTTP/1.1" 200 OK

[02:19:15.516] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39478 - "GET /speaker_info?speaker_uuid=c30dc15a-0992-4f8d-8bb8-ad3b314e6a6f HTTP/1.1" 200 OK

[02:19:15.542] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39440 - "GET /speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff HTTP/1.1" 200 OK

[02:19:15.551] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39456 - "GET /speaker_info?speaker_uuid=e5020595-5c5d-4e87-b849-270a518d0dcf HTTP/1.1" 200 OK

[02:19:15.578] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39442 - "GET /speaker_info?speaker_uuid=388f246b-8c41-4ac1-8e2d-5d79f3ff56d9 HTTP/1.1" 200 OK

[02:19:15.587] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39474 - "GET /speaker_info?speaker_uuid=4f51116a-d9ee-4516-925d-21f183e2afad HTTP/1.1" 200 OK

[02:19:15.596] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39472 - "GET /speaker_info?speaker_uuid=8eaad775-3119-417e-8cf4-2a10bfd592c8 HTTP/1.1" 200 OK

[02:19:15.605] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39456 - "GET /speaker_info?speaker_uuid=9f3ee141-26ad-437e-97bd-d22298d02ad2 HTTP/1.1" 200 OK

[02:19:15.611] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39440 - "GET /speaker_info?speaker_uuid=1a17ca16-7ee5-4ea5-b191-2f02ace24d21 HTTP/1.1" 200 OK

[02:19:15.633] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39478 - "GET /speaker_info?speaker_uuid=481fb609-6446-4870-9f46-90c4dd623403 HTTP/1.1" 200 OK

[02:19:15.973] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:39478 - "POST /audio_query?text=&speaker=2 HTTP/1.1" 200 OK

[02:19:24.278] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d STDOUT: INFO:     127.0.0.1:36752 - "OPTIONS /synthesis?speaker=2&enable_interrogative_upspeak=false HTTP/1.1" 200 OK

[02:19:29.862] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d: Process terminated due to receipt of signal SIGSEGV
[02:19:29.862] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d: Process exited with code null
[02:19:32.344] [error] Error: Failed to fetch
[02:19:32.346] [error] Error: Failed to fetch Failed to fetch AccentPhrases for the text "めたんめたんめたん".
[02:19:32.347] [error] TypeError: Failed to fetch
    at Te.next (app://./js/webpack:/src/openapi/runtime.ts:89:62)
    at Generator.next (<anonymous>)
    at app://./js/webpack:/node_modules/tslib/tslib.es6.js:74:71
    at new Promise (<anonymous>)
    at o (app://./js/webpack:/node_modules/tslib/tslib.es6.js:70:12)
    at Te.fetchApi (app://./js/webpack:/src/openapi/runtime.ts:79:65)
    at Te.next (app://./js/webpack:/src/openapi/runtime.ts:49:37)
    at Generator.next (<anonymous>)
    at app://./js/webpack:/node_modules/tslib/tslib.es6.js:74:71
    at new Promise (<anonymous>)
[02:19:41.697] [info]  All windows closed. Quitting app
[02:19:41.697] [info]  Checking ENGINE status before app quit
[02:19:41.697] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d: last exit code: null, signal: SIGSEGV
[02:19:41.697] [info]  ENGINE 074fc39e-678b-4c13-8916-ffca8d505d1d: Process already closed
[02:19:41.697] [info]  ENGINE 1 / 1 processes killed

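Steps to reproduce

Put the Rust libcore.so and libonnxruntime.so.1.11.1 into the release build of VOICEVOX 0.12.5, then enter any text in GPU mode.

Expected behavior

It runs without a SEGV.

OS / distribution / version

Arch Linux (kernel: 5.15.58-2-lts)
Official NVIDIA driver 515.65.01

Other

I don't know where the SEGV happens yet, so I'll try to narrow it down around tomorrow.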

Hiroshiba · 2022-08-09T18:00:40Z

Thanks for the report!! Narrowing it down would be a big help...!!!

qryxip · 2022-08-09T18:24:02Z

I ran simple_tts with use_gpu turned on and confirmed that it appears to fail in load_model (as a regular error exit, not a SEGV).

qwerty2501 · 2022-08-10T01:04:18Z

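The Windows GPU build works fine, and the implementation does not differ between Windows and Linux, so it's a mystery why it crashes.
Since the C++ version works, it doesn't look like a different onnxruntime is being downloaded either.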

qwerty2501 · 2022-08-10T04:47:08Z

Come to think of it, only the Rust version always uses the CPU for the light models, so that might be related.
https://github.com/VOICEVOX/voicevox_core/blob/main/crates/voicevox_core/src/status.rs#L116-L117

qryxip · 2022-08-10T04:54:36Z

(I'm on my phone right now, but) as far as I remember, it was failing while loading decode_model.

PickledChair · 2022-08-10T05:11:22Z

@qwerty2501

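> Come to think of it, only the Rust version always uses the CPU for the light models, so that might be related.
> https://github.com/VOICEVOX/voicevox_core/blob/main/crates/voicevox_core/src/status.rs#L116-L117

The C++ version also always uses the CPU for the light models (although that branch did not exist when work on the Rust version began). So this is not a point where the two versions differ. But in that case, where is the difference... 🤔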

voicevox_core/core/src/core.cpp

Lines 90 to 103 in 7ff2e7f

    // Light models run faster on the CPU
    light_session_options.SetInterOpNumThreads(cpu_num_threads).SetIntraOpNumThreads(cpu_num_threads);
    // Heavy models run faster on the GPU
    heavy_session_options.SetInterOpNumThreads(cpu_num_threads).SetIntraOpNumThreads(cpu_num_threads);
    if (use_gpu) {
    #ifdef DIRECTML
      heavy_session_options.DisableMemPattern().SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
      Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_DML(heavy_session_options, 0));
    #else
      const OrtCUDAProviderOptions cuda_options;
      heavy_session_options.AppendExecutionProvider_CUDA(cuda_options);
    #endif
    }

qwerty2501 · 2022-08-10T05:11:53Z

Oh, you're right.

qwerty2501 · 2022-08-10T05:16:59Z

Ah, I found a difference: where the C++ version only calls AppendExecutionProvider_CUDA, the Rust version calls both with_disable_mem_pattern and with_append_execution_provider_cuda.
In the C++ version, disabling the memory pattern happens only for DirectML, so this may be why the Windows build was succeeding.
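
The branching described above can be sketched roughly as follows. This is a minimal illustration and not the actual voicevox_core code: `SessionOptions`, `GpuBackend`, and `heavy_session_options` are hypothetical stand-ins that only record which options get applied, following the structure of the C++ snippet quoted earlier (memory pattern disabled only under DirectML).

```rust
/// Hypothetical stand-in for an ONNX Runtime session builder. It only records
/// which options were applied so that the branching can be inspected.
#[derive(Debug, Default, PartialEq)]
pub struct SessionOptions {
    pub mem_pattern_disabled: bool,
    pub execution_provider: Option<&'static str>,
}

/// Hypothetical enum naming the two GPU backends discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GpuBackend {
    DirectMl,
    Cuda,
}

/// Builds the options for the heavy model, mirroring the C++ branches:
/// the memory pattern is disabled for DirectML only, never for CUDA.
pub fn heavy_session_options(gpu: Option<GpuBackend>) -> SessionOptions {
    let mut options = SessionOptions::default();
    match gpu {
        Some(GpuBackend::DirectMl) => {
            options.mem_pattern_disabled = true;
            options.execution_provider = Some("DirectML");
        }
        Some(GpuBackend::Cuda) => {
            options.execution_provider = Some("CUDA");
        }
        // CPU only: leave the defaults untouched.
        None => {}
    }
    options
}
```

Calling `heavy_session_options(Some(GpuBackend::Cuda))` in this sketch yields options with the CUDA provider set and the memory pattern left enabled, which is the behavior the C++ version has.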

PickledChair · 2022-08-10T05:20:59Z

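> Ah, I found a difference: where the C++ version only calls AppendExecutionProvider_CUDA, the Rust version calls both with_disable_mem_pattern and with_append_execution_provider_cuda. In the C++ version, disabling the memory pattern happens only for DirectML, so this may be why the Windows build was succeeding.

You're right! I had overlooked that. I'd like to fix it and see whether that resolves the problem...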

qryxip · 2022-08-10T17:36:41Z

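Following up on this:

With ORT_USE_CUDA=1 set and libonnxruntime_providers_{cuda,shared}.so in place, GPU mode worked.

As for the "it still fails further along" part, that turned out to be a problem specific to simple_tts, and working around it got everything running.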

#228 (comment)
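
A quick way to check for the two provider libraries named above might look like the sketch below. It rests on assumptions: `missing_provider_libs` is a hypothetical helper, and `lib_dir` is assumed to be the directory that holds libonnxruntime.so. It only checks that the files exist, so it cannot tell whether they match the ONNX Runtime version in use.

```rust
use std::path::{Path, PathBuf};

/// File names taken from this thread.
pub const REQUIRED_PROVIDER_LIBS: [&str; 2] = [
    "libonnxruntime_providers_cuda.so",
    "libonnxruntime_providers_shared.so",
];

/// Returns the full paths of the provider libraries missing from `lib_dir`.
/// Hypothetical helper: `lib_dir` is assumed to hold libonnxruntime.so.
pub fn missing_provider_libs(lib_dir: &Path) -> Vec<PathBuf> {
    REQUIRED_PROVIDER_LIBS
        .iter()
        .map(|name| lib_dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

An empty result means both files are present. Anything else lists the paths that still need to be supplied before trying GPU mode.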

qwerty2501 · 2022-08-10T17:40:49Z

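I see.
The providers shared libraries are currently not part of what we distribute, so it looks like we need to add them.

> that turned out to be a problem specific to simple_tts, and working around it got everything running.

Does this need a code fix?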

qryxip · 2022-08-10T17:44:48Z

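With the release build of VOICEVOX 0.12.5 plus the Rust core with voice data, from before with_disable_mem_pattern was removed for CUDA (the setup described in this issue's summary), adding libonnxruntime_providers_{cuda,shared}.so from ORT 1.11.1 made it run normally, with no errors and no SEGV.
I don't think the core itself needs any code changes.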

qryxip · 2022-08-10T17:47:12Z

The SEGV with the full VOICEVOX set was probably caused by it loading old versions of libonnxruntime_providers_{cuda,shared}.so...

qryxip · 2022-08-10T18:00:47Z

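No changes are needed as far as CUDA is concerned, but running simple_tts may still need some kind of fix.

> I see. The providers shared libraries are currently not part of what we distribute, so it looks like we need to add them.
>
> > that turned out to be a problem specific to simple_tts, and working around it got everything running.
>
> Does this need a code fix?

When initialize is called with use_gpu=true and load_all_models=true, execution reaches the line below while self.initialized is still false, so it fails every time. That is what the problem looked like to me.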

voicevox_core/crates/voicevox_core/src/internal.rs

Line 210 in 18dc02d

let _ = self.decode_forward(LENGTH, PHONEME_SIZE, &f0, &phoneme, speaker_id)?;
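
The failure mode described above can be sketched in a few lines. This is a simplified illustration, not the actual internal.rs code: `Core` and both `initialize_*` methods are hypothetical, and it assumes that `decode_forward` refuses to run while `initialized` is false. The second method shows one possible reordering, not necessarily the fix that was adopted.

```rust
/// Hypothetical, simplified stand-in for the core's internal state.
#[derive(Default)]
pub struct Core {
    initialized: bool,
}

impl Core {
    /// Stand-in for `decode_forward`. Assumption: it rejects calls made
    /// before initialization has completed.
    fn decode_forward(&self) -> Result<(), &'static str> {
        if !self.initialized {
            return Err("not initialized");
        }
        Ok(())
    }

    /// Ordering as described in the thread: the decode call runs while
    /// `initialized` is still false, so the GPU + load-all path always fails.
    pub fn initialize_decode_first(
        &mut self,
        use_gpu: bool,
        load_all_models: bool,
    ) -> Result<(), &'static str> {
        if use_gpu && load_all_models {
            self.decode_forward()?;
        }
        self.initialized = true;
        Ok(())
    }

    /// One possible reordering: set the flag first, and roll it back if the
    /// decode call fails.
    pub fn initialize_flag_first(
        &mut self,
        use_gpu: bool,
        load_all_models: bool,
    ) -> Result<(), &'static str> {
        self.initialized = true;
        if use_gpu && load_all_models {
            if let Err(e) = self.decode_forward() {
                self.initialized = false;
                return Err(e);
            }
        }
        Ok(())
    }
}
```

In this sketch, `initialize_decode_first(true, true)` returns an error on a fresh `Core`, while the same call with `load_all_models` set to false succeeds. That matches the report that only the use_gpu=true, load_all_models=true combination fails.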

Hiroshiba · 2022-08-10T19:33:35Z

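> I don't think the core itself needs any code changes.

Not calling with_disable_mem_pattern for CUDA isn't strictly required, but skipping it should be faster, so I'd like to go ahead with that fix...!

> No changes are needed as far as CUDA is concerned, but running simple_tts may still need some kind of fix.

I haven't been able to trace the cause myself, but if you don't mind, I'd be glad if you could open an issue or go straight to a PR!!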

qryxip · 2022-08-10T23:55:40Z

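> Not calling with_disable_mem_pattern for CUDA isn't strictly required, but skipping it should be faster, so I'd like to go ahead with that fix...!

Ah, no, I also think it's better not to call with_disable_mem_pattern for CUDA. I didn't word it well. What I meant was "[as far as CUDA is concerned, beyond that] I don't think any code changes are needed."

> I haven't been able to trace the cause myself, but if you don't mind, I'd be glad if you could open an issue or go straight to a PR!!

Sure, I'll go ahead and make a PR.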

Hiroshiba · 2022-08-11T08:56:33Z

Ah, I see! Thanks for the PR as well, I'm looking forward to it!!

qwerty2501 mentioned this issue Aug 10, 2022

Follow up on the Rust migration #213

Open

17 tasks

qwerty2501 mentioned this issue Aug 10, 2022

Remove with_disable_mem_pattern (code that did not exist in the C++ version) #228

Merged

qryxip closed this as completed Aug 10, 2022

qryxip mentioned this issue Aug 11, 2022

initialize always fails when called with use_gpu=true, load_all_models=true #230

Closed
