Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please submit this for inclusion in the Linux kernel #1

Closed
bootc opened this issue Jun 28, 2016 · 7 comments
Closed

Please submit this for inclusion in the Linux kernel #1

bootc opened this issue Jun 28, 2016 · 7 comments

Comments

@bootc
Copy link

bootc commented Jun 28, 2016

Maintaining an out of tree driver is unhelpful to you, unhelpful to your users and unhelpful to Linux kernel developers. Having this included in the kernel means everyone can benefit from using it on AWS without having to deliberately go and jump through hoops to install it. Please, we'll all love you more for doing so!

@mswilson
Copy link

Hi Chris,

The driver was initially submitted on June 13th. See http://thread.gmane.org/gmane.linux.network/416446

It was also posted a while back while in development: http://thread.gmane.org/gmane.linux.network/404282

A v2 in response to reviews is being prepared.

Kind regards,

Matt

@james-coder
Copy link

james-coder commented Nov 26, 2018

A v2 in response to reviews is being prepared.

Was any progress towards including this in the Linux kernel ever made? It seems like a big hurdle to have to recompile my Linux kernel every time there's a new version of AWS ENA drivers released. I'm using Debian, if there's some kind of sudo apt-get install amzn-drivers-ena I can do, I would also like that.

@netanelbelgazal-zz
Copy link

A v2 in response to reviews is being prepared.

Was any progress towards including this in the Linux kernel ever made? It seems like a big hurdle to have to recompile my Linux kernel every time there's a new version of AWS ENA drivers released. I'm using Debian, if there's some kind of sudo apt-get install amzn-drivers-ena I can do, I would also like that.

Yes,
We constantly sync the ENA driver to the Linux kernel. (ENAv2 was already merged and it is expected to be part of kernel 4.20).
Beside that, you don't have to recompile the entire kernel every time there is a new release.
You can compile the driver as a module and use DKMS for automatic compilation.

@bootc
Copy link
Author

bootc commented Nov 27, 2018

There isn't a DKMS package for ENA in Debian, sadly, and by the time buster is released I would expect it to mostly be unnecessary. ENA 1.0.2 is in Debian's stable kernels, so old but still generally usable, and ENA 1.5.0K is in the current 4.18 kernel in buster (but that will almost certainly change by the time buster is released).

To be fair Amazon deserve plenty of credit for upstreaming this driver in the way they have.

@bootc
Copy link
Author

bootc commented Nov 27, 2018

I should add if Amazon don't think the 1.0.2 driver in Debian is adequate then they should reach out to the appropriate people in Debian to get it updated. I'm not the right person, but I can probably find the right person/people to have that discussion with if necessary.

@netanelbelgazal-zz
Copy link

If you want to use DKMS to build the ENA driver you can follow the instructions propose by @akiyano
#88

@bootc
We recommend upgrading the ENA driver from 1.0.2.
We'll appreciate if you can connect us to the right people inside Debian.

@nmeyerhans
Copy link

@bootc, @NetanelBelgazal I've opened a merge request to get the 2.0.2 driver backported to the linux 4.19 kernel that should be included in buster. It was a pretty straightforward change. Please feel free to leave feedback there if you have access, or here otherwise.

I hope to put something similar together for stretch, but with its 4.9 kernel it's going to be a bit more of a challenge.

tych0 added a commit to tych0/amzn-drivers that referenced this issue Sep 12, 2023
Otherwise we end up with splats like this:

[   74.385177] ------------[ cut here ]------------
[   74.385179] WARNING: CPU: 96 PID: 1289 at net/devlink/core.c:39 devl_assert_locked+0x29/0x30
[   74.385183] Modules linked in: sch_htb act_mirred cls_matchall sch_ingress ifb overlay xfs libcrc32c intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm tls nvme_fabrics irqbypass sch_fq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd rapl sunrpc binfmt_misc ena(OE) ipmi_ssif ipmi_devintf ahci ccp sp5100_tco ipmi_msghandler libahci k10temp i2c_piix4 8250_dw tcp_bbr sch_fq_codel drm ramoops efi_pstore drm_panel_orientation_quirks reed_solomon configfs bpf_preload ip_tables x_tables autofs4
[   74.385255] CPU: 96 PID: 1289 Comm: kworker/96:1 Tainted: G        W  OE      6.4.15netflix-gf8b10483bd29 amzn#1
[   74.385258] Hardware name: Amazon EC2 m7a.metal-48xl/Not Specified, BIOS 1.0 10/16/2017
[   74.385261] Workqueue: events work_for_cpu_fn
[   74.385291] RIP: 0010:devl_assert_locked+0x29/0x30
[   74.385295] Code: 00 0f 1f 44 00 00 8b 05 45 f0 0e 01 85 c0 75 05 e9 cc ba 08 00 48 81 c7 70 02 00 00 be ff ff ff ff e8 2b 35 07 00 85 c0 75 e6 <0f> 0b e9 b0 ba 08 00 0f 1f 44 00 00 48 81 c7 70 02 00 00 be ff ff
[   74.385298] RSP: 0018:ffffc2d01e647c78 EFLAGS: 00010246
[   74.385302] RAX: 0000000000000000 RBX: ffff9e9b3976a800 RCX: 0000000000000001
[   74.385304] RDX: 0000000000000000 RSI: ffff9e9b3976aa70 RDI: ffff9e9b0be3be68
[   74.385307] RBP: ffff9e9aed82d000 R08: 0000000000000001 R09: 00000000001f0cc0
[   74.385309] R10: ffffc2d01e647bd0 R11: 0000000000000001 R12: ffff9e9b10b96ba0
[   74.385312] R13: 0000000000000028 R14: ffff9e9b90000000 R15: 0000000000000000
[   74.385314] FS:  0000000000000000(0000) GS:ffff9ef831800000(0000) knlGS:0000000000000000
[   74.385317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.385319] CR2: 000056419aee2650 CR3: 0000000178a12001 CR4: 0000000000770ee0
[   74.385322] PKRU: 55555554
[   74.385325] Call Trace:
[   74.385327]  <TASK>
[   74.385330]  ? __warn+0x81/0x160
[   74.385335]  ? devl_assert_locked+0x29/0x30
[   74.385339]  ? report_bug+0xf8/0x1e0
[   74.385347]  ? handle_bug+0x44/0x80
[   74.385351]  ? exc_invalid_op+0x13/0x60
[   74.385355]  ? asm_exc_invalid_op+0x16/0x20
[   74.385366]  ? devl_assert_locked+0x29/0x30
[   74.385372]  ? devl_assert_locked+0x25/0x30
[   74.385375]  devlink_param_notify.constprop.98+0x19/0xd0
[   74.385381]  ena_devlink_alloc+0xa2/0x100 [ena]
[   74.385388]  ? dma_direct_alloc+0x41/0x160
[   74.385393]  ? dma_direct_alloc+0x41/0x160
[   74.385400]  ena_probe+0x221/0xac0 [ena]
[   74.385406]  ? local_clock+0xb/0xd0
[   74.385411]  ? srso_alias_return_thunk+0x5/0x7f
[   74.385414]  ? __lock_acquire.isra.28+0x164/0x6a0
[   74.385420]  ? srso_alias_return_thunk+0x5/0x7f
[   74.385423]  ? srso_alias_return_thunk+0x5/0x7f
[   74.385428]  ? srso_alias_return_thunk+0x5/0x7f
[   74.385432]  ? local_clock+0xb/0xd0
[   74.385436]  ? srso_alias_return_thunk+0x5/0x7f
[   74.385440]  ? lock_release+0x24e/0x390
[   74.385446]  ? srso_alias_return_thunk+0x5/0x7f
[   74.385450]  ? _raw_spin_unlock_irqrestore+0x23/0x50
[   74.385456]  local_pci_probe+0x42/0xb0
[   74.385463]  work_for_cpu_fn+0x13/0x20
[   74.385466]  process_one_work+0x2b5/0x5a0
[   74.385475]  ? process_one_work+0x5a0/0x5a0
[   74.385480]  worker_thread+0x208/0x3b0
[   74.385485]  ? process_one_work+0x5a0/0x5a0
[   74.385489]  kthread+0xe4/0x120
[   74.385492]  ? kthread_complete_and_exit+0x20/0x20
[   74.385498]  ret_from_fork+0x1f/0x30
[   74.385512]  </TASK>
[   74.385514] ---[ end trace 0000000000000000 ]---

The documentation says:

https://github.com/torvalds/linux/blob/65d6e954e37872fd9afb5ef3fc0481bb3c2f20f4/Documentation/networking/devlink/index.rst?plain=1#L12-L17

to use devl_(un)lock() for devl_* functions, while devlink_* functions
acquire the lock on their own. Let's wrap our two devl_* calls in the lock.

Signed-off-by: Tycho Andersen <tandersen@netflix.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants