ONNXRuntimeErrors trying to run or train Splendor #5

Closed
lumi-a opened this issue May 8, 2024 · 4 comments
lumi-a commented May 8, 2024

Heya, thank you for your awesome additions to alpha-zero!

I tried to run your code, but unfortunately ran into some errors that look similar to the ones in #3.
First, I updated all the dependencies:

pip3 install -U onnxruntime numba tqdm colorama coloredlogs
pip3 install -U torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu

In particular, I'm using:

colorama             0.4.6
coloredlogs          15.0.1
numba                0.59.1
onnxruntime          1.17.3
torch                2.3.0+cpu
torchvision          0.18.0+cpu
tqdm                 4.66.4

I then tried the commands from the README:

python ./pit.py splendor splendor/pretrained_2players.pt human -n 1

which still printed the initial game board, but then threw:

Error log
Turn 1 Player 0: Traceback (most recent call last):
  File "D:\alpha-zero-general\pit.py", line 252, in <module>
    main()
  File "D:\alpha-zero-general\pit.py", line 246, in main
    play(args)
  File "D:\alpha-zero-general\pit.py", line 71, in play
    result = arena.playGames(args.num_games, initial_state=args.state, verbose=args.display or human)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\Arena.py", line 123, in playGames
    gameResult = self.playGame(verbose=verbose, initial_state=initial_state, other_way=not one_vs_two)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\Arena.py", line 74, in playGame
    action = players[curPlayer](canonical_board, it)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\pit.py", line 59, in <lambda>
    player = lambda x, n: np.argmax(mcts.getActionProb(x, temp=(0.5 if n <= 6 else 0.), force_full_search=True)[0])
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\MCTS.py", line 65, in getActionProb
    self.search(canonicalBoard, dirichlet_noise=dir_noise, forced_playouts=forced_playouts)
  File "D:\alpha-zero-general\MCTS.py", line 144, in search
    Ps, v = self.nnet.predict(canonicalBoard, Vs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 100, in predict
    self.switch_target('inference')
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 290, in switch_target
    self.export_and_load_onnx()
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 338, in export_and_load_onnx
    self.ort_session = ort.InferenceSession(temporary_file, sess_options=opts, providers=['CPUExecutionProvider'])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2

And running:
python main.py splendor -m 800 -f 0.1 -l 0.0003 -D 0.3 -C ../results/mytest -V 74

Yielded:

Error log
Traceback (most recent call last):
  File "C:\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Python\Python312\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 142, in predict_server
    self.switch_target('inference')
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 290, in switch_target
    self.export_and_load_onnx()
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 338, in export_and_load_onnx
    self.ort_session = ort.InferenceSession(temporary_file, sess_options=opts, providers=['CPUExecutionProvider'])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2

Unrelated, but I think the game argument splendor is missing from line 108 of the README.


cestpasphoto commented May 10, 2024

I've tried different versions of torch and onnxruntime, and it looks like the export of some networks (including the Splendor one) isn't compatible with onnxruntime>=1.17.0, whatever the export settings are. That is super weird because the exported file seems fine (it passes onnx.checker, is viewable with netron, etc.). So the only workaround I've found is to force onnxruntime 1.16.3.
Also, using pytorch 2.x requires installing onnx and onnxscript.

I'll update the README accordingly.
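
For reference, a pinned install following this workaround might look like the commands below (a sketch; only the onnxruntime pin is the essential part, the rest mirrors the commands quoted above):

pip3 install onnxruntime==1.16.3 onnx onnxscript numba tqdm colorama coloredlogs
pip3 install -U torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu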

@cestpasphoto

I also raised microsoft/onnxruntime#20641


cestpasphoto commented Jun 14, 2024

Hi @lumi-a,
I have seen your answer on the ticket above: there isn't a proper fix, but there is a workaround. I've added it to the code, which should now support all versions of onnxruntime. I quickly tried your two commands, and they work with the latest version of onnxruntime.
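
For anyone stuck on an older checkout, one possible mitigation (a sketch only, not necessarily the exact change that went into the repo) is to keep ONNX Runtime's graph optimizer from running the fusion that fails:

import onnxruntime as ort

# Blunt mitigation: skip graph optimizations entirely so the failing
# MatMulBnFusion pass never runs (at the cost of some inference speed).
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession(
    "splendor.onnx",  # hypothetical path to the exported network
    sess_options=opts,
    providers=["CPUExecutionProvider"],
    # A more targeted alternative would be the disabled_optimizers argument
    # visible in the traceback, e.g. disabled_optimizers=["MatMulBnFusion"],
    # but that optimizer name is only guessed from the failing node's name.
)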

Please try and let me know if you still have issues or if we can close the ticket.


lumi-a commented Jun 15, 2024

It works now, thank you very much for taking the time!

lumi-a closed this as completed Jun 15, 2024