udev rule not running after loading fpga image #560

Open
cmoore1776 opened this issue May 18, 2022 · 6 comments

cmoore1776 commented May 18, 2022

Summary

The udev rule created by add_udev_rules.sh does not match the device ID used after loading an fpga image.

The rule, which is deployed to /etc/udev/rules.d/9999-presistent-fpga.rules, only matches on:

ATTR{device}=="0x1041"
ATTR{device}=="0x1042"

but it needs to also match on:

ATTR{device}=="0xf001"
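A quick way to confirm the mismatch is to check whether the rules file has a match clause for a given device ID. This is a minimal sketch; `rules_cover_device` is a hypothetical helper, not part of the aws-fpga SDK, and it only looks for the literal `ATTR{device}=="..."` clause.

```shell
# Hypothetical helper (not part of the aws-fpga SDK): succeed if the rules
# file contains a match clause for the given PCI device ID.
rules_cover_device() {
    rules_file=$1
    device_id=$2
    # -F: literal match, -i: ignore hex digit case, -q: exit status only
    grep -qiF "ATTR{device}==\"${device_id}\"" "$rules_file"
}

# Example (path taken from the report above):
# rules_cover_device /etc/udev/rules.d/9999-presistent-fpga.rules 0xf001 \
#     || echo "no rule for 0xf001"
```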

Reproduction steps

  1. Launch an F1 instance on the latest AL2 FPGA Developer AMI
  2. Deploy the aws-fpga SDK
  3. Load an fpga image, e.g.
fpga-load-local-image -S 0 -I agfi-xxxxxSOMExIDxxxxx
  4. Note that the permissions on /sys/devices/pci0000:00/0000:00:1d.0/resource* are 444:
$ ls -lah /sys/devices/pci0000\:00/0000\:00\:1d.0/resource*
-r--r--r-- 1 root root 4.0K Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource
-r--r--r-- 1 root root  32M Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource0
-r--r--r-- 1 root root 2.0M Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource1
-r--r--r-- 1 root root  64K Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource2
-r--r--r-- 1 root root  64K Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource2_wc
-r--r--r-- 1 root root 128G Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource4
-r--r--r-- 1 root root 128G Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource4_wc

Also note the device ID after loading the image:

$ sudo udevadm info -a -p /devices/pci0000:00/0000:00:1d.0 | grep "ATTR{device}"
ATTR{device}=="0xf001"
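The two checks above (device ID and resource permissions) can also be read directly from the device's sysfs directory. A minimal sketch, assuming GNU `stat` is available; `show_fpga_state` is a hypothetical name, and the example path is the one from this report.

```shell
# Hypothetical helper: print the vendor/device IDs and the octal mode of each
# resource file under a PCI device's sysfs directory.
show_fpga_state() {
    dev_dir=$1
    printf 'vendor=%s device=%s\n' \
        "$(cat "$dev_dir/vendor")" "$(cat "$dev_dir/device")"
    for f in "$dev_dir"/resource*; do
        # stat -c '%a' prints the permission bits in octal (GNU coreutils)
        printf '%s %s\n' "$(stat -c '%a' "$f")" "${f##*/}"
    done
}

# Example:
# show_fpga_state /sys/devices/pci0000:00/0000:00:1d.0
```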

Fix

Add the following two lines to /etc/udev/rules.d/9999-presistent-fpga.rules:

ATTR{vendor}=="0x1d0f", ATTR{device}=="0xf001", RUN+="/opt/aws/bin/change-fpga-perm.sh %k"
ATTR{vendor}=="0x1d0f", ATTR{device}=="0xf001", ACTION=="add", RUN+="/opt/aws/bin/change-fpga-perm.sh %k"

With these lines added, permissions on the resourceN files are 666 after loading an image:

$ ls -lah /sys/devices/pci0000\:00/0000\:00\:1d.0/resourc*
-r--r--r-- 1 root root 4.0K May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource
-rw-rw-rw- 1 root root  32M May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource0
-rw-rw-rw- 1 root root 2.0M May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource1
-rw-rw-rw- 1 root root  64K May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource2
-rw-rw-rw- 1 root root  64K May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource2_wc
-rw-rw-rw- 1 root root 128G May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource4
-rw-rw-rw- 1 root root 128G May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource4_wc
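
The manual edit described under Fix can be sketched as an idempotent helper that appends the `ACTION=="add"` rule only if it is not already present. `add_fpga_rule` is a hypothetical name; the rule text is copied from the lines above, and the `udevadm` reload step is an assumption about how the edited file would be picked up.

```shell
# Hypothetical helper: append the rule for a device ID to a rules file unless
# an identical line already exists.
add_fpga_rule() {
    rules_file=$1
    device_id=$2
    rule="ATTR{vendor}==\"0x1d0f\", ATTR{device}==\"${device_id}\", ACTION==\"add\", RUN+=\"/opt/aws/bin/change-fpga-perm.sh %k\""
    grep -qF -- "$rule" "$rules_file" 2>/dev/null \
        || printf '%s\n' "$rule" >> "$rules_file"
}

# Example (run as root, then reload udev rules; assumption, not from the report):
# add_fpga_rule /etc/udev/rules.d/9999-presistent-fpga.rules 0xf001
# udevadm control --reload-rules
```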

jacobmgn commented May 25, 2022

Thanks for reporting this.
For reproduction step 3

fpga-load-local-image -S 0 -I agfi-xxxxxSOMExIDxxxxx

Does the image loaded specify a device ID as per https://github.com/aws/aws-fpga/blob/4750aacb4dac9d464b099b27e4337220cf0b0713/hdk/cl/examples/cl_dram_dma_hlx/README.md#create-example-design-gui ?

set ::env(device_id) "0xF001"
set ::env(vendor_id) "0x1D0F"
set ::env(subsystem_id) "0x1D51"
set ::env(subsystem_vendor_id) "0xFEDC"

For example, the cl_dram_dma example is configured to use 0xf001.

If so, which device_id is specified?

@cmoore1776
Author

> Does the image loaded specify a device ID as per https://github.com/aws/aws-fpga/blob/4750aacb4dac9d464b099b27e4337220cf0b0713/hdk/cl/examples/cl_dram_dma_hlx/README.md#create-example-design-gui ?

Yes, 0xf001 is based on using the device_id provided in the example.


jacobmgn commented May 26, 2022

I think I understand the issue, so let me rephrase.

When a user follows the steps in the HOW TO, sets a device ID of "0xF001", and then runs the udev permission script, the permissions are not properly applied to the PCIe device.

Therefore:

  • The udev script should be corrected to include the default example device ID
  • The documentation should be updated to note that when using a non-default device ID, the udev script should be patched by the user to enable non-root access to the FPGA device
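
For the second point, a user with a non-default device ID would need one extra rule line. A minimal sketch of generating it: `fpga_rule_for` is a hypothetical helper, the rule format is copied from the fix proposed earlier in this thread, and the ID is lowercased because sysfs reports it in lowercase (`0xf001`) while the design files write it as `0xF001`, and udev string matches are case-sensitive.

```shell
# Hypothetical helper: print a udev rule line for a custom PCI device ID,
# normalized to the lowercase form that sysfs reports.
fpga_rule_for() {
    device_id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # %%k emits a literal %k, which udev expands to the kernel device name
    printf 'ATTR{vendor}=="0x1d0f", ATTR{device}=="%s", ACTION=="add", RUN+="/opt/aws/bin/change-fpga-perm.sh %%k"\n' "$device_id"
}

# Example:
# fpga_rule_for 0xF001
```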


jacobmgn commented Jun 10, 2022

Notes:

  • For reproduction, add export AWS_FPGA_ALLOW_NON_ROOT=y to the setup step
  • For reproduction, add export AWS_FPGA_SDK_OTHERS=y to the setup step
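
Taken together, the two notes amount to exporting both variables before the SDK setup step. A minimal sketch, assuming the setup step is `sdk_setup.sh` sourced from the aws-fpga checkout; the thread does not name the script, so that line is left commented out.

```shell
# Variables named in the notes above; set them before running SDK setup.
export AWS_FPGA_ALLOW_NON_ROOT=y
export AWS_FPGA_SDK_OTHERS=y

# Assumption: setup is done by sourcing sdk_setup.sh from the aws-fpga checkout.
# source sdk_setup.sh
```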

@jacobmgn

Hello @shamelesscookie,

I have been trying to reproduce the issue you described, along with the fix in PR #561.
I haven't been able to reproduce the device permissions you list under step 4.

[centos@ip-172-31-83-184 ~]$ ls -lah /sys/devices/pci0000\:00/0000\:00\:1d.0/resource*
-r--r--r-- 1 root root 4.0K Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource
-rw------- 1 root root  32M Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource0
-rw------- 1 root root 2.0M Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource1
-rw------- 1 root root  64K Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource2
-rw------- 1 root root  64K Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource2_wc
-rw------- 1 root root 128G Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource4
-rw------- 1 root root 128G Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource4_wc
[centos@ip-172-31-83-184 ~]$ sudo udevadm info -a -p /devices/pci0000:00/0000:00:1d.0 | grep "ATTR{device}"
    ATTR{device}=="0xf001"

Are you using any environment variables that are not listed in your reproduction steps?

As a note, I have been using the public cl_dram_dma AGFI (agfi-0b5c35827af676702) with a PCI Device ID of 0xF001.

https://github.com/aws/aws-fpga/blob/4750aacb4dac9d464b099b27e4337220cf0b0713/hdk/cl/examples/cl_examples_list.md
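
The listings in this thread differ in the permission bits for other users: 444 before the fix and 666 after it in the original report, 600 in the listing above. A minimal sketch of checking that digit per file, assuming GNU `stat`; `others_can_rw` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the "others" permission digit of a file
# grants both read and write (6 or 7).
others_can_rw() {
    mode=$(stat -c '%a' "$1")
    others=${mode#"${mode%?}"}   # keep only the last octal digit
    case $others in
        6|7) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example:
# for f in /sys/devices/pci0000:00/0000:00:1d.0/resource?*; do
#     others_can_rw "$f" || echo "not accessible to non-root users: $f"
# done
```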

@AWSjoeluc

Hello!

Is there anything AWS can do to help resolve this issue? If the issue is resolved, we're curious to know the resolution.

Thank you!
