mlx5_init: IB device not found #6

Closed
sctb512 opened this issue Apr 21, 2021 · 4 comments

sctb512 commented Apr 21, 2021

Hello, I tried to run this project on my nodes and got the following error:

mlx5_init: IB device not found

I found that this error is raised in ./shenango/runtime/net/directpath/mlx5/mlx5_init.c, in the following function:

int mlx5_common_init(struct hardware_q **rxq_out, struct direct_txq **txq_out,
	             unsigned int nr_rxq, unsigned int nr_txq, bool use_rss)

The value of dev_list[0] is NULL:

dev_list[i]: (nil)

It looks like I can't get the device list.

Question 1: Why is dev_list[0] NULL? Is there any way to solve this problem?
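One way to narrow this down outside the runtime is to look at which RDMA devices the kernel exposes. The following is a minimal sketch, assuming the devices appear under /sys/class/infiniband (the usual sysfs location on Linux). The helper name list_rdma_devices is hypothetical and is not part of Shenango or libibverbs.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Shenango or libibverbs): print every
 * entry found in a sysfs class directory and return how many were found.
 * Returns -1 if the directory cannot be opened.
 */
static int list_rdma_devices(const char *class_dir)
{
	DIR *dir = opendir(class_dir);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* Skip "." and ".." (and any other hidden entry). */
		if (ent->d_name[0] == '.')
			continue;
		printf("found device: %s\n", ent->d_name);
		count++;
	}

	closedir(dir);
	return count;
}
```

Calling list_rdma_devices("/sys/class/infiniband") on the node shows what is visible. A result of -1 or 0 would suggest that no RDMA driver is loaded, which is one possible reason for an empty device list. On a ConnectX-3 node the names would typically start with mlx4_ rather than mlx5_.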

Then I found that there is only an mlx5 directory in ./shenango/runtime/net/directpath/:

common.c common.d common.o defs.h mlx5

but my nodes use mlx4:

02:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

If I change CONFIG_DIRECTPATH=y to CONFIG_DIRECTPATH=n in shared.mk, the runtime does not work.

Question 2: Is there only an mlx5 implementation? If I want to run this project on ConnectX-3 devices, can you give me some advice? (I was unable to get a CloudLab account.)

Thanks!

BinZlP commented Apr 22, 2021

There's a build option for mlx4 devices in shenango/shared.mk.
Modify it as below and rebuild shenango:

CONFIG_MLX5=n
CONFIG_MLX4=y

I'm not sure it works, but you can try.

sctb512 commented Apr 22, 2021

Thanks for your reply. I had already modified these settings in shenango/shared.mk, which is how I was able to build successfully.
Before modifying them, I got the following error:

iokernel/mlx.h:5:10: fatal error: mlx5_custom.h: No such file or directory

zainryan commented Jun 5, 2021

> Thanks for your reply. Before I was able to build successfully, I had modified these places in shenango/shared.mk.
> Before modifying, I would get the error as follows:
>
> iokernel/mlx.h:5:10: fatal error: mlx5_custom.h: No such file or directory

Sorry for the late reply. The error is caused by intermediate mlx5 files left over from your first compilation with CONFIG_MLX5=y. You can simply clone a fresh copy of the repo, set CONFIG_MLX5=n, CONFIG_MLX4=y, and CONFIG_DIRECTPATH=n, and recompile.

zainryan commented Jun 5, 2021

In addition, our project was mostly developed on mlx5 NICs and has only limited support for mlx4, so you may observe reduced performance with mlx4. When running the program, you have to delete the enable_directpath 1 line from all config files in AIFM/aifm/configs/. Please let me know if you have any further questions; I'm happy to answer.
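The enable_directpath cleanup could also be done programmatically rather than by hand. The following is a minimal sketch, assuming each config file holds one option per line with the key first. The helper name strip_directpath is hypothetical and is not part of AIFM or Shenango.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of AIFM or Shenango): copy src_path to
 * dst_path, dropping every line whose first token starts with
 * "enable_directpath". Returns the number of dropped lines, or -1 on an
 * I/O error. Assumes config lines are shorter than the buffer below.
 */
static int strip_directpath(const char *src_path, const char *dst_path)
{
	static const char key[] = "enable_directpath";
	char line[1024];
	int dropped = 0;
	FILE *in, *out;

	in = fopen(src_path, "r");
	if (!in)
		return -1;

	out = fopen(dst_path, "w");
	if (!out) {
		fclose(in);
		return -1;
	}

	while (fgets(line, sizeof(line), in)) {
		/* Ignore leading spaces or tabs before comparing the key. */
		const char *p = line + strspn(line, " \t");

		if (strncmp(p, key, sizeof(key) - 1) == 0) {
			dropped++;
			continue;
		}
		fputs(line, out);
	}

	fclose(in);
	if (fclose(out) != 0)
		return -1;
	return dropped;
}
```

The helper writes a filtered copy instead of editing in place, so the original file stays untouched until the copy has been checked and swapped in. It would need to be applied to each file in AIFM/aifm/configs/ in turn.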

zainryan closed this as completed Jun 5, 2021