@abhsahu abhsahu commented Apr 28, 2025

The Realtek R8127 driver can be downloaded from https://www.realtek.com/Download/List?cate_id=584, where it is maintained as an out-of-tree module.

This patch series adds this out-of-tree module under drivers/net/ethernet/realtek/r8127 and updates the Makefile and Kconfig so that the R8127 driver is built by the kernel build system.

Once Realtek upstreams these changes, these patches can be reverted.

sourabgupta3 and others added 30 commits April 18, 2025 09:42
BugLink: https://bugs.launchpad.net/bugs/2059814

With this change, the NFS driver is enabled to support GPUDirect Storage (GDS).
The change is around frwr_map and frwr_unmap in the NFS driver: the IO request
is first intercepted to check for GDS pages. If it is a GDS page, the
request is served by the GDS driver component called nvidia-fs;
otherwise the request is served by the standard NFS driver code.
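
As a rough illustration of the intercept described above, the dispatch amounts to a single check in front of the standard path. This is a user-space sketch, not the actual patch: `example_request`, its `gds_page` flag, and the `example_path` values are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor; the real driver inspects the pages of the IO. */
struct example_request {
    bool gds_page; /* assumed flag: true when the request targets a GDS page */
};

/* Hypothetical labels for the two code paths named in the commit message. */
enum example_path {
    EXAMPLE_PATH_GDS,      /* served by the GDS component (nvidia-fs) */
    EXAMPLE_PATH_STANDARD  /* served by the standard driver code */
};

/* Intercept first: GDS pages go to the GDS path, everything else falls through. */
static enum example_path example_map_request(const struct example_request *req)
{
    assert(req != NULL);
    if (req->gds_page)
        return EXAMPLE_PATH_GDS;
    return EXAMPLE_PATH_STANDARD;
}
```

The point of the shape is that non-GDS requests see no behavior change: they reach the standard path exactly as before.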

Signed-off-by: Sourab Gupta <sougupta@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Ian May <ian.may@canonical.com>
Signed-off-by: Ian May <ian.may@canonical.com>
(cherry picked from commit 5cf0699 noble:linux-nvidia/main-next)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…on Linux 6.8 Kernel

BugLink: https://bugs.launchpad.net/bugs/2059814

With this change, the NVMe and NVMeOF drivers are enabled to support GPUDirect Storage (GDS).
The change is around the nvme/nvme rdma map_data()
and unmap_data(): the IO request is
first intercepted to check for GDS pages.
If it is a GDS page, the request is served
by the GDS driver component called nvidia-fs;
otherwise the request is served by the standard NVMe driver code.

Signed-off-by: Sourab Gupta <sougupta@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Ian May <ian.may@canonical.com>
Signed-off-by: Ian May <ian.may@canonical.com>
(cherry picked from commit 3ea8193 noble:linux-nvidia/main-next)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2059814

Signed-off-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Ian May <ian.may@canonical.com>
Signed-off-by: Ian May <ian.may@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2059316

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2061930
BugLink: https://bugs.launchpad.net/bugs/2067106

There are systems in production that don't have
firmware that supports coresight_etm4x.  Instead of
removing completely, blacklist coresight_etm4x so
systems with the correct firmware can use the module.

Signed-off-by: Ian May <ian.may@canonical.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2068544

On Linux kernel 6.6 and above, __symbol_get() on the registration functions
from nvidia-fs was failing because GPL modules are no longer allowed to
__symbol_get() non-GPL exported symbols. This change fixes that issue for NFS.

Signed-off-by: Sourab Gupta <sougupta@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 01d274f noble:linux-nvidia/main-next)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ions as GPL

BugLink: https://bugs.launchpad.net/bugs/2068544

On Linux kernel 6.6 and above, __symbol_get() on the registration functions
from nvidia-fs was failing because GPL modules are no longer allowed to
__symbol_get() non-GPL exported symbols. This change fixes that issue.

Signed-off-by: Sourab Gupta <sougupta@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 6148092 noble:linux-nvidia/main-next)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2067111

NVIDIA provides a way to flash the UEFI via the capsule loader on arm64.
CAPSULE_LOADER is also built in to the L4T kernel, so for ease of use
CAPSULE_LOADER needs to be built in on arm64.

Nvidia-BugLink: https://nvbugspro.nvidia.com/bug/4601764

Signed-off-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2084598

Set the following configs on x86 and arm64:

CONFIG_MANA_INFINIBAND=m
CONFIG_MICROSOFT_MANA=m

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Guoqing Jiang <guoqing.jiang@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…VER package

BugLink: https://bugs.launchpad.net/bugs/2084598

Include mana.ko in linux-modules-ABIVER, rather than
linux-modules-extra-ABIVER.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Guoqing Jiang <guoqing.jiang@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2084704
Properties: no-test-build
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ernel-versions (adhoc/d2024.10.14)

BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Remove these architectures from the list to stop failing build attempts from
automation.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2085443
Properties: no-test-build
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ping_range()""

BugLink: https://bugs.launchpad.net/bugs/2091887

This reverts commit "UBUNTU: SAUCE: Revert "vfio/pci: Use
unmap_mapping_range()"".  This is needed to be compatible with the huge
pfnmap support patches.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ma on mmap'd MMIO fault""

BugLink: https://bugs.launchpad.net/bugs/2091887

This reverts commit "UBUNTU: SAUCE: Revert "vfio/pci: Insert full vma on
mmap'd MMIO fault"". This is required to be compatible with the huge
pfnmap support patches.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
… macros for mmap_lock""

BugLink: https://bugs.launchpad.net/bugs/2091887

This reverts commit "UBUNTU: SAUCE: Revert "mm: use rwsem assertion
macros for mmap_lock"". This is required to be compatible with the huge
pfnmap support patches.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2091887

This reverts commit "UBUNTU: SAUCE: Revert "mm: remove follow_pfn"" with
the intention of restoring the original "mm: remove follow_pfn" commit.

This was originally reverted to resolve NVIDIA graphics driver build
failures in K6.11, but this build issue has since been resolved in the
graphics driver. The original commit "mm: remove follow_pfn" is expected
to be present by the "mm: replace follow_page() by folio_walk"
backports, hence why we want to restore it here.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…_LEAVES

BugLink: https://bugs.launchpad.net/bugs/2091887

Patch series "mm: replace follow_page() by folio_walk".

Looking into a way of moving the last folio_likely_mapped_shared() call in
add_folio_for_migration() under the PTL, I found myself removing
follow_page().  This paves the way for cleaning up all the FOLL_, follow_*
terminology to just be called "GUP" nowadays.

The new page table walker will lookup a mapped folio and return to the
caller with the PTL held, such that the folio cannot get unmapped
concurrently.  Callers can then conditionally decide whether they really
want to take a short-term folio reference or whether they can simply unlock
the PTL and be done with it.

folio_walk is similar to page_vma_mapped_walk(), except that we don't know
the folio we want to walk to and that we are only walking to exactly one
PTE/PMD/PUD.

folio_walk provides access to the pte/pmd/pud (and the referenced folio
page because things like KSM need that), however, as part of this series
no page table modifications are performed by users.

We might be able to convert some other walk_page_range() users that really
only walk to one address, such as DAMON with
damon_mkold_ops/damon_young_ops.  It might make sense to extend folio_walk
in the future to optionally fault in a folio (if applicable), such that we
can replace some get_user_pages() users that really only want to lookup a
single page/folio under PTL without unconditionally grabbing a folio
reference.

I have plans to extend the approach to a range walker that will try
batching various page table entries (not just folio pages) to be a better
replacement for walk_page_range() -- and users will be able to opt in which
type of page table entries they want to process -- but that will require
more work and more thought.

KSM seems to work just fine (ksm_functional_tests selftests) and
move_pages seems to work (migration selftest).  I tested the leaf
implementation excessively using various hugetlb sizes (64K, 2M, 32M, 1G)
on arm64 using move_pages and did some more testing on x86-64.  Cross
compiled on a bunch of architectures.

This patch (of 11):

We want to make use of vm_normal_page_pmd() in generic page table walking
code where we might walk hugetlb folios that are mapped by PMDs even
without CONFIG_TRANSPARENT_HUGEPAGE.

So let's expose vm_normal_page_pmd() + vm_normal_folio_pmd() with
CONFIG_PGTABLE_HAS_HUGE_LEAVES.

Link: https://lkml.kernel.org/r/20240802155524.517137-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240802155524.517137-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3523a37)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2091887

We want to get rid of follow_page(), and have a more reasonable way to
just lookup a folio mapped at a certain address, perform some checks while
still under PTL, and then only conditionally grab a folio reference if
really required.

Further, we might want to get rid of some walk_page_range*() users that
really only want to temporarily lookup a single folio at a single address.

So let's add a new page table walker that does exactly that, similarly to
GUP also being able to walk hugetlb VMAs.

Add folio_walk_end() as a macro for now: the compiler is not easy to
please with the pte_unmap()->kunmap_local().

Note that one difference between follow_page() and get_user_pages(1) is
that follow_page() will not trigger faults to get something mapped.  So
folio_walk is at least currently not a replacement for get_user_pages(1),
but could likely be extended/reused to achieve something similar in the
future.

Link: https://lkml.kernel.org/r/20240802155524.517137-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit aa39ca6)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…_walk

BugLink: https://bugs.launchpad.net/bugs/2091887

Let's use folio_walk instead, so we can avoid taking a folio reference
just to read the nid and get rid of another follow_page()/FOLL_DUMP user.
Use FW_ZEROPAGE so we can return "-EFAULT" for it as documented.

The possible return values for follow_page() were confusing, especially
with FOLL_DUMP set.  We'll handle it like documented in the man page:

* -EFAULT: This is a zero page or the memory area is not mapped by the
   process.
* -ENOENT: The page is not present.

We'll keep setting -ENOENT for ZONE_DEVICE.  Maybe not the right thing to
do, but it likely doesn't really matter (just like for weird devmap,
whereby we fake "not present").
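
The status mapping described above can be sketched as a small lookup. This is a user-space illustration under stated assumptions: the `example_lookup` outcomes and `example_page_status()` are hypothetical names, and returning the node id for a normal page is inferred from the "read the nid" wording rather than taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes of looking up the folio mapped at one address. */
enum example_lookup {
    EXAMPLE_UNMAPPED,     /* memory area is not mapped by the process */
    EXAMPLE_ZERO_PAGE,    /* the shared zero page is mapped */
    EXAMPLE_NOT_PRESENT,  /* nothing present at this address */
    EXAMPLE_ZONE_DEVICE,  /* device memory, reported as "not present" */
    EXAMPLE_NORMAL        /* an ordinary folio on node `nid` */
};

/* Map a lookup outcome to the status value reported to the caller. */
static int example_page_status(enum example_lookup result, int nid)
{
    switch (result) {
    case EXAMPLE_UNMAPPED:
    case EXAMPLE_ZERO_PAGE:
        return -EFAULT;
    case EXAMPLE_NOT_PRESENT:
    case EXAMPLE_ZONE_DEVICE:
        return -ENOENT;
    case EXAMPLE_NORMAL:
        return nid; /* assumption: status carries the node id */
    }
    return -EFAULT; /* defensive default for out-of-range values */
}
```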

Note that the other errors (-EACCES, -EBUSY, -EIO, -EINVAL, -ENOMEM) so
far only applied when actually moving pages, not when only querying stats.

We'll effectively drop the "secretmem" check we had in follow_page(), but
that shouldn't really matter here, we're not accessing folio/page content
after all.

Link: https://lkml.kernel.org/r/20240802155524.517137-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 46d6a9b)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…lio_walk

BugLink: https://bugs.launchpad.net/bugs/2091887

Let's use folio_walk instead, so we can avoid taking a folio reference
when we won't even be trying to migrate the folio and to get rid of
another follow_page()/FOLL_DUMP user.  Use FW_ZEROPAGE so we can return
"-EFAULT" for it as documented.

We now perform the folio_likely_mapped_shared() check under PTL, which is
what we want: relying on the mapcount and friends after dropping the PTL
does not make too much sense, as the page can get unmapped concurrently
from this process.

Further, we perform the folio isolation under PTL, similar to how we
handle it for MADV_PAGEOUT.

The possible return values for follow_page() were confusing, especially
with FOLL_DUMP set. We'll handle it like documented in the man page:
 * -EFAULT: This is a zero page or the memory area is not mapped by the
    process.
 * -ENOENT: The page is not present.

We'll keep setting -ENOENT for ZONE_DEVICE.  Maybe not the right thing to
do, but it likely doesn't really matter (just like for weird devmap,
whereby we fake "not present").

The other errors are left as is, and match the documentation in the man
page.

While at it, rename add_page_for_migration() to add_folio_for_migration().

We'll lose the "secretmem" check, but that shouldn't really matter because
these folios cannot ever be migrated.  Should vma_migratable() refuse
these VMAs?  Maybe.

Link: https://lkml.kernel.org/r/20240802155524.517137-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7dff875)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2091887

Let's use folio_walk instead, for example avoiding taking temporary folio
references if the folio does not even apply and getting rid of one more
follow_page() user.

Note that zeropages obviously don't apply: old code could just have
specified FOLL_DUMP.  Anon folios are never secretmem, so we don't care
about losing the check in follow_page().

Link: https://lkml.kernel.org/r/20240802155524.517137-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 184e916)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
sudeep-holla and others added 13 commits April 28, 2025 06:40
…support

BugLink: https://bugs.launchpad.net/bugs/2109019

Currently, the framework notifications are not supported at all.
handle_notif_callbacks() doesn't handle them though it is called with
framework bitmap. Make that explicit by adding checks for the same.

Also, we need to further classify the framework notifications as Secure
Partition Manager (SPM) and Non-Secure Hypervisor (NS_HYP). Extend/change
the notify_type enumeration to accommodate all 4 types and rejig the
values so that they can be reused in the bitmap enable mask macros.

While at this, move ffa_notify_type_get() so that it can be used in
notifier_hash_node_get() in the future.

No functional change.

Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <20250217-ffa_updates-v3-14-bd1d9de615e7@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 07b760e)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…r_cb_info

BugLink: https://bugs.launchpad.net/bugs/2109019

Currently, we store the type of the notification in the notifier_cb_info
structure that is put into the hash list to identify whether the notification
block is for the secure partition or the non-secure VM.

To let framework notifications reuse the hash list rather than creating a
separate one each time, we need to store the ffa_device pointer itself, as
the same notification ID in framework notifications can be registered by
multiple FF-A devices.

Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <20250217-ffa_updates-v3-15-bd1d9de615e7@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit a3d73fe)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…ifications

BugLink: https://bugs.launchpad.net/bugs/2109019

Framework notifications are doorbells that are rung by the partition
managers to signal common events to an endpoint. These doorbells cannot
be rung by an endpoint directly. A partition manager can signal a
Framework notification in response to an FF-A ABI invocation by an
endpoint.

Two additional notify_ops interfaces are being added for any FF-A
device/driver to register and unregister for such framework notifications.

Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <20250217-ffa_updates-v3-16-bd1d9de615e7@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit c10debf)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2109019

Currently, the FF-A specification defines only one framework notification:
the RX buffer full notification. This notification is signaled by the
partition manager during transmission of a partition message through
indirect messaging to:

1. Notify an endpoint that it has a pending message in its Rx buffer.
2. Inform the message receiver’s scheduler via the schedule receiver
   interrupt that the receiver must be run.

In response to an FFA_MSG_SEND2 invocation by a sender endpoint, the
framework performs the following actions after the message is copied
from the Tx buffer of the sender to the Rx buffer of the receiver:

1. The notification is pended in the framework notification bitmap of
   the receiver.
2. The partition manager of the endpoint that contains receiver’s
   scheduler pends the schedule receiver interrupt for this endpoint.

The receiver receives the notification and copies out the message from
its Rx buffer.

Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <20250217-ffa_updates-v3-17-bd1d9de615e7@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 285a5ea)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
… callback

BugLink: https://bugs.launchpad.net/bugs/2109019

A partition can implement multiple UUIDs and currently we successfully
register each UUID service as a FF-A device. However when adding the
same partition info to the XArray which tracks the SRI callbacks more
than once, it fails.

In order to allow multiple UUIDs per partition to register SRI callbacks,
the partition information stored in the XArray needs to be extended to
a linked list.

A function to remove the list of partition information in the XArray
is not added as there are no users at the time. All the partitions are
added at probe/initialisation and removed at cleanup stage.
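
The data-structure change can be pictured with a user-space sketch in which each slot holds a list of entries instead of a single entry. A fixed array stands in for the XArray, and `example_part_info`, `example_add()`, and `example_count()` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_MAX_PARTS 8

/* Hypothetical per-UUID entry; entries for one partition are chained together. */
struct example_part_info {
    uint32_t uuid_tag;               /* stand-in for the service UUID */
    struct example_part_info *next;  /* next entry for the same partition ID */
};

/* Stand-in for the XArray: the index is the partition ID. */
static struct example_part_info *example_slots[EXAMPLE_MAX_PARTS];

/* Adding a second UUID for the same partition now extends the list instead of failing. */
static int example_add(unsigned int part_id, uint32_t uuid_tag)
{
    struct example_part_info *info;

    if (part_id >= EXAMPLE_MAX_PARTS)
        return -1;
    info = malloc(sizeof(*info));
    if (info == NULL)
        return -1;
    info->uuid_tag = uuid_tag;
    info->next = example_slots[part_id];
    example_slots[part_id] = info;
    return 0;
}

/* Count the entries registered under one partition ID. */
static size_t example_count(unsigned int part_id)
{
    size_t n = 0;

    assert(part_id < EXAMPLE_MAX_PARTS);
    for (struct example_part_info *p = example_slots[part_id]; p != NULL; p = p->next)
        n++;
    return n;
}
```

Like the commit, the sketch provides no per-entry removal; entries would be torn down together at cleanup.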

Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <20250217-ffa_updates-v3-18-bd1d9de615e7@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit be61da9)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…F-A instance

BugLink: https://bugs.launchpad.net/bugs/2109019

Currently it is assumed that the driver always calls ffa_notification_get()
at the NS physical FF-A instance to request the SPMC to return pending
SP or SPM Framework notifications. However, in order to support the driver
invoking ffa_notification_get() at virtual FF-A instance, we need to make
sure correct bits are enabled in the bitmaps enable flag.

The hypervisor framework and VM notification bitmaps are expected to be
zero at the non-secure physical FF-A instance.

Message-Id: <20250217-ffa_updates-v3-19-bd1d9de615e7@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 9472fe2)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…re comparison

BugLink: https://bugs.launchpad.net/bugs/2109019

The return value ver.a0 is unsigned long type and FFA_RET_NOT_SUPPORTED
is a negative value.

Since the return value from the firmware can be just 32-bit even on
64-bit systems as FFA specification mentions it as int32 error code in
w0 register, explicitly casting to s32 ensures correct sign interpretation
when comparing against a signed error code FFA_RET_NOT_SUPPORTED.

Without casting, comparison between unsigned long and a negative
constant could lead to unintended results due to type promotions.
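
The pitfall can be reproduced in user space. In this sketch a 64-bit register value stands in for ver.a0, and `EXAMPLE_RET_NOT_SUPPORTED` is a hypothetical stand-in for the driver's negative error constant; the numeric value -1 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a negative firmware error code. */
#define EXAMPLE_RET_NOT_SUPPORTED (-1)

/*
 * Without a narrowing cast, the negative constant is converted to the
 * unsigned 64-bit type, i.e. to 0xFFFFFFFFFFFFFFFF.
 */
static bool matches_uncast(uint64_t a0)
{
    return a0 == (uint64_t)EXAMPLE_RET_NOT_SUPPORTED;
}

/*
 * Narrowing to a signed 32-bit value first keeps only the low 32 bits and
 * interprets them as signed (the conversion is implementation-defined in
 * older C standards; GCC and Clang wrap modulo 2^32).
 */
static bool matches_with_s32_cast(uint64_t a0)
{
    return (int32_t)a0 == EXAMPLE_RET_NOT_SUPPORTED;
}
```

If the firmware writes the error only to the 32-bit w0 register, the 64-bit value can read as 0x00000000FFFFFFFF: the uncast comparison misses it, while the s32 comparison matches whether or not the upper bits are set.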

Fixes: 3bbfe98 ("firmware: arm_ffa: Add initial Arm FFA driver support")
Reported-by: Andrei Homescu <ahomescu@google.com>
Message-Id: <20250221095633.506678-1-sudeep.holla@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit cecf6a5)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…O_GET

BugLink: https://bugs.launchpad.net/bugs/2109019

The return value ret.a2 is of type unsigned long and FFA_RET_NO_DATA is
a negative value.

Since the return value from the firmware can be just 32-bit even on
64-bit systems as FFA specification mentions it as int32 error code in
w0 register, explicitly casting to s32 ensures correct sign interpretation
when comparing against a signed error code FFA_RET_NO_DATA.

Without casting, comparison between unsigned long and a negative
constant could lead to unintended results due to type promotions.

Fixes: 3522be4 ("firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface")
Reported-by: Andrei Homescu <ahomescu@google.com>
Message-Id: <20250221095633.506678-2-sudeep.holla@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 3e282f4)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2109019

The FF-A notification id list received in response to the call
FFA_NOTIFICATION_INFO_GET is encoded as: partition ID followed by 0 or
more vCPU ID. The count includes all of them.

Fix the issue by skipping the first/partition ID so that only the list
of vCPU IDs is processed for a given partition ID. The first/partition
ID is read before the start of the loop.
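
A minimal sketch of the corrected iteration, assuming 16-bit IDs held in a plain array (`example_collect_vcpu_ids()` and its parameters are hypothetical, not the driver's code): the first element is consumed as the partition ID, and the loop starts at index 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ids[0] is the partition ID; ids[1..count-1] are vCPU IDs (possibly none).
 * `count` covers all of them. Copies the vCPU IDs to `out` and returns how
 * many were copied.
 */
static size_t example_collect_vcpu_ids(const uint16_t *ids, size_t count,
                                       uint16_t *part_id,
                                       uint16_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(ids != NULL && part_id != NULL && count >= 1);

    /* Read the partition ID before the loop ... */
    *part_id = ids[0];

    /* ... and start at index 1 so it is not mistaken for a vCPU ID. */
    for (size_t i = 1; i < count && n < out_cap; i++)
        out[n++] = ids[i];

    return n;
}
```

With a count of 1 the loop body never runs, which corresponds to a partition entry that carries no vCPU IDs.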

Fixes: 3522be4 ("firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface")
Reported-by: Andrei Homescu <ahomescu@google.com>
Message-Id: <20250223213909.1197786-1-sudeep.holla@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit c67c233)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2109019

Set dma_mask for FFA devices; otherwise DMA allocation using the device pointer
leads to the following warning:

WARNING: CPU: 1 PID: 1 at kernel/dma/mapping.c:597 dma_alloc_attrs+0xe0/0x124

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <e3dd8042ac680bd74b6580c25df855d092079c18.1737107520.git.viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit cc0aac7)
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
The Realtek R8127 driver can be downloaded from
https://www.realtek.com/Download/List?cate_id=584,
where it is maintained as an out-of-tree module.

This patch adds the extracted content of r8127-11.014.00.tar.bz2 in
the folder drivers/net/ethernet/realtek/r8127.

4bd62fc87de32760fb1f3b9cd3ec14e933035623  r8127-11.014.00.tar.bz2

All the clean-up, makefile and Kconfig related changes will be
done in the subsequent commits. The source code contains a GPL2
compatible license. All the license information and Realtek
copyright notice will be maintained in each file and newly added files.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
These files are not needed to build r8127 as part of the kernel
source build, so remove them.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
This commit moves all files from the src folder to its parent folder.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
clsotog commented Apr 28, 2025

Do you have the dmesg output from loading this driver on the system?

Also, the last line of drivers/net/ethernet/realtek/r8127/Kconfig is missing an "r". The driver module will be called r8127.

nvmochs commented Apr 28, 2025

@abhsahu Couple of questions...

  • What are the upstream plans for this driver?

  • This series is still leaving a lot of OOT driver support in place, e.g. there are a lot of ifdefs to handle different Linux kernel versions and features within the driver. Is the thought that this is carried until the driver is upstream and this series is reverted?

  • In "NVIDIA: SAUCE: Add r8127 in kernel build":

    • Nit: The newly added Kconfig refers to this as a “fast ethernet” adapter. Isn’t this a 10GbE NIC?
    • How was it decided on which features to leave enabled? Was it just based on the “= y” from the OOT Makefile? That thought process should be captured in the commit message.
  • In "NVIDIA: SAUCE: r8127: remove unused files":

    • How was it determined that these files are not required? The commit message should include more details as to what is being removed and why.

terjebergstrom and others added 2 commits April 29, 2025 07:25
In the original code, the r8127 driver was built as an out-of-tree module.
This commit adds a Kconfig and updates the Makefile so that it is built
as part of the kernel build.

The r8127 driver internally uses various config flags, which were set
through EXTRA_CFLAGS. These config flags are now set in the Makefile
with ccflags-y. All flags that were enabled by default in the original
code are enabled in ccflags-y; this commit does not enable any
additional flags.

Compilation of some files depends on a particular flag. Since only the
default flags are now set, those files become unused, and this commit
removes them.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
…127 module

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
@abhsahu abhsahu force-pushed the r8127_ethernet_driver branch from b9d03d3 to bdd2e14 Compare April 29, 2025 15:50
@abhsahu
Author

abhsahu commented Apr 29, 2025

Do you have the dmesg output from loading this driver on the system?

I will get the system again and share the dmesg logs.

Also, the last line of drivers/net/ethernet/realtek/r8127/Kconfig is missing an "r". The driver module will be called r8127.

I have fixed this and updated.

@abhsahu
Author

abhsahu commented Apr 29, 2025

@abhsahu Couple of questions...

  • What are the upstream plans for this driver?

I checked with Realtek, and their tentative timeline is to start the upstreaming process next month (May 2025).

  • This series is still leaving a lot of OOT driver support in place, e.g. there are a lot of ifdefs to handle different Linux kernel versions and features within the driver. Is the thought that this is carried until the driver is upstream and this series is reverted?

Given that we plan to revert this series once the driver is upstreamed, we have planned to leave those ifdefs. It will leave us with unused code for some time but it should not cause any issue. Correct?

  • In "NVIDIA: SAUCE: Add r8127 in kernel build":

    • Nit: The newly added Kconfig refers to this as a “fast ethernet” adapter. Isn’t this a 10GbE NIC?

I have updated the Kconfig description and fixed that.

  • How was it decided on which features to leave enabled? Was it just based on the “= y” from the OOT Makefile? That thought process should be captured in the commit message.

Yes. It is based on "=y" from the OOT Makefile. I have updated the commit message to mention it.

  • In "NVIDIA: SAUCE: r8127: remove unused files":

    • How was it determined that these files are not required? The commit message should include more details as to what is being removed and why.

I have squashed this commit into the previous commit, where all the other flags are removed.
These files are built only when those removed flags are set, so the unused files can be removed in the same commit.
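
As a rough illustration of the selection rule described above (keep only the options set to "y" in the OOT Makefile and carry them over as ccflags-y defines), here is a minimal sketch. The Makefile excerpt and the OPTION_* names are made-up placeholders, not the real r8127 options, and the one-to-one mapping from an option to a -D macro is an assumption.

```python
import re

# Hypothetical excerpt in the style of an out-of-tree driver Makefile.
# The option names are placeholders, not the real r8127 options.
SAMPLE_MAKEFILE = """\
OPTION_A = y
OPTION_B = n
OPTION_C=y
# OPTION_D = y
OPTION_E = n
"""

# Matches lines such as "NAME = y" or "NAME=n"; commented lines do not match.
ASSIGNMENT = re.compile(r"^\s*([A-Z0-9_]+)\s*[:?]?=\s*([yn])\s*$")


def default_enabled(makefile_text):
    """Return the option names assigned 'y', in file order."""
    enabled = []
    for line in makefile_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(2) == "y":
            enabled.append(match.group(1))
    return enabled


def ccflags_line(options):
    """Render a ccflags-y line that defines each enabled option as a macro."""
    return "ccflags-y += " + " ".join("-D" + name for name in options)


print(default_enabled(SAMPLE_MAKEFILE))
print(ccflags_line(default_enabled(SAMPLE_MAKEFILE)))
```

Options left at "n" produce no define, which is why source files guarded only by those options end up unused.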

@abhsahu
Author

abhsahu commented Apr 29, 2025

Thanks @clsotog and @nvmochs for your comments.
I have addressed them and updated the PR.

@nvmochs
Collaborator

nvmochs commented Apr 29, 2025

Given that we plan to revert this series once the driver is upstreamed, we have planned to leave those ifdefs. It will leave us with unused code for some time but it should not cause any issue. Correct?

Correct, it should not cause issues and is more of an annoyance, i.e. when grepping through the code. Hopefully the driver is upstreamed quickly and we can move to that.


Thanks for the other updates, no further questions from me.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@abhsahu
Author

abhsahu commented Apr 30, 2025

The following are the dmesg logs after loading this module for r8127.

[ 5.208893] r8127 0000:01:00.0: Adding to iommu group 16
[ 5.209071] r8127 Ethernet controller driver 11.014.00-NAPI loaded
[ 5.209119] r8127 0000:01:00.0: enabling device (0000 -> 0003)
[ 5.223622] r8127 0000:01:00.0 (unnamed net_device) (uninitialized): Invalid ether addr 00:00:00:00:00:00
[ 5.223635] r8127 0000:01:00.0 (unnamed net_device) (uninitialized): Random ether addr 8e:8b:ad:ab:cd:c5
[ 5.224067] r8127: This product is covered by one or more of the following patents: US6,570,884, US6,115,776, and US6,327,625.
[ 5.224131] r8127 Copyright (C) 2025 Realtek NIC software team nicfae@realtek.com
This program comes with ABSOLUTELY NO WARRANTY; for details, please see http://www.gnu.org/licenses/.
This is free software, and you are welcome to redistribute it under certain conditions; see http://www.gnu.org/licenses/.
[ 5.261720] r8127 0000:01:00.0 enp1s0: renamed from eth1
[ 62.830328] r8127: enp1s0: link up

lspci output:

0000:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8127 (rev 05)
Subsystem: Realtek Semiconductor Co., Ltd. Device 8127
Physical Slot: 0
Flags: bus master, fast devsel, latency 0, IRQ 161, IOMMU group 16
I/O ports at 1000 [size=256]
Memory at 67100000 (64-bit, non-prefetchable) [size=256K]
Memory at 67140000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Virtual Channel
Capabilities: [164] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [174] Secondary PCI Express
Capabilities: [184] Physical Layer 16.0 GT/s
Capabilities: [1a8] Lane Margining at the Receiver
Capabilities: [244] Latency Tolerance Reporting
Capabilities: [24c] L1 PM Substates
Capabilities: [25c] Data Link Feature
Capabilities: [268] Precision Time Measurement
Capabilities: [274] Vendor Specific Information: ID=0003 Rev=1 Len=054
Kernel driver in use: r8127
Kernel modules: r8127

ifconfig output:

ifconfig enp1s0

enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.179.156 netmask 255.255.252.0 broadcast 172.16.179.255
inet6 fe80::453b:e7f8:9f41:6a66 prefixlen 64 scopeid 0x20<link>
ether 8a:5c:32:58:3f:77 txqueuelen 1000 (Ethernet)
RX packets 2620486 bytes 2789836571 (2.7 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1388 bytes 162044 (162.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 161

@clsotog
Collaborator

clsotog commented Apr 30, 2025

Do these cards not have a MAC address?
Do you get a different MAC address every time you boot?
Will the vendor provide a way to get a permanent MAC address?
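
For context on these questions: the dmesg log reports an all-zero address followed by a random one (8e:8b:ad:ab:cd:c5), while the ifconfig output shows a different address (8a:5c:32:58:3f:77). Both non-zero addresses have the locally administered bit set, which is consistent with generated rather than burned-in addresses, though it does not by itself show what changes across boots. A minimal sketch of that check follows; the helper names are illustrative only.

```python
def parse_mac(text):
    """Parse 'aa:bb:cc:dd:ee:ff' into a list of six integer octets."""
    octets = [int(part, 16) for part in text.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError("not a MAC address: " + text)
    return octets


def is_zero(text):
    """True for the all-zero address reported as invalid in dmesg."""
    return all(o == 0 for o in parse_mac(text))


def is_locally_administered(text):
    """Bit 1 of the first octet marks a locally administered address."""
    return bool(parse_mac(text)[0] & 0x02)


def is_multicast(text):
    """Bit 0 of the first octet marks a multicast (group) address."""
    return bool(parse_mac(text)[0] & 0x01)


for mac in ("00:00:00:00:00:00", "8e:8b:ad:ab:cd:c5", "8a:5c:32:58:3f:77"):
    print(mac, is_zero(mac), is_locally_administered(mac), is_multicast(mac))
```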

@clsotog
Collaborator

clsotog commented Apr 30, 2025

Acked-by: Carol L Soto (csoto@nvidia.com)

My last question can be answered in an email. It's more a question about the final product.

@nvidia-bfigg nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.11-next branch from 318d674 to 2fd4cb0 Compare May 2, 2025 15:01
@nvmochs
Collaborator

nvmochs commented May 2, 2025

@nvmochs nvmochs closed this May 2, 2025
nvidia-bfigg pushed a commit that referenced this pull request Jan 16, 2026
mlx5e_netdev_change_profile can fail to attach a new profile and can
fail to rollback to old profile, in such case, we could end up with a
dangling netdev with a fully reset netdev_priv. A retry to change
profile, e.g. another attempt to call mlx5e_netdev_change_profile via
switchdev mode change, will crash trying to access the now NULL
priv->mdev.

This fix allows mlx5e_netdev_change_profile() to handle previous
failures and an empty priv, by not assuming priv is valid.

Pass netdev and mdev to all flows requiring
mlx5e_netdev_change_profile() and avoid passing priv.
In mlx5e_netdev_change_profile() check if current priv is valid, and if
not, just attach the new profile without trying to access the old one.
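
The control flow described above can be modeled roughly as follows. This is a toy Python model, not the driver code: the Netdev class, the attach() helper, the failure switches, and the "nic"/"rep" profile labels are all hypothetical, and only the guard on an empty priv mirrors the description.

```python
ENOMEM = 12  # same errno value as the err=-12 in the log


class Netdev:
    """Toy stand-in for a net_device; priv=None models a reset netdev_priv."""

    def __init__(self):
        self.priv = None


def attach(netdev, mdev, profile, fail=False):
    """Attach a profile; on failure leave priv empty and return -ENOMEM."""
    if fail:
        netdev.priv = None
        return -ENOMEM
    netdev.priv = {"mdev": mdev, "profile": profile}
    return 0


def change_profile(netdev, mdev, new_profile, fail_new=False, fail_rollback=False):
    """Switch profiles without assuming the current priv is valid."""
    if netdev.priv is None:
        # An earlier change failed and left nothing to detach or roll back to.
        return attach(netdev, mdev, new_profile, fail=fail_new)
    old_profile = netdev.priv["profile"]
    netdev.priv = None  # detach the old profile
    err = attach(netdev, mdev, new_profile, fail=fail_new)
    if err:
        # Best-effort rollback; if it also fails, priv stays empty.
        attach(netdev, mdev, old_profile, fail=fail_rollback)
    return err


dev = Netdev()
assert change_profile(dev, "mdev0", "nic") == 0
# First attempt: attaching the new profile and the rollback both fail.
assert change_profile(dev, "mdev0", "rep", fail_new=True, fail_rollback=True) == -ENOMEM
assert dev.priv is None
# Retry: there is no valid priv, so the new profile is attached directly.
assert change_profile(dev, "mdev0", "rep") == 0
print(dev.priv["profile"])
```

Passing mdev explicitly, instead of reading it from priv, is what lets the retry succeed in this model even though priv was emptied by the earlier failure.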

This fixes the following oops, when enabling switchdev mode for the 2nd
time after first time failure:

 ## Enabling switchdev mode first time:

mlx5_core 0012:03:00.1: E-Switch: Supported tc chains and prios offload
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
                                                                         ^^^^^^^^
mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)

 ## retry: Enabling switchdev mode 2nd time:

mlx5_core 0000:00:03.0: E-Switch: Supported tc chains and prios offload
BUG: kernel NULL pointer dereference, address: 0000000000000038
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 13 UID: 0 PID: 520 Comm: devlink Not tainted 6.18.0-rc4+ #91 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:mlx5e_detach_netdev+0x3c/0x90
Code: 50 00 00 f0 80 4f 78 02 48 8b bf e8 07 00 00 48 85 ff 74 16 48 8b 73 78 48 d1 ee 83 e6 01 83 f6 01 40 0f b6 f6 e8 c4 42 00 00 <48> 8b 45 38 48 85 c0 74 08 48 89 df e8 cc 47 40 1e 48 8b bb f0 07
RSP: 0018:ffffc90000673890 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881036a89c0 RCX: 0000000000000000
RDX: ffff888113f63800 RSI: ffffffff822fe720 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000002dcd R09: 0000000000000000
R10: ffffc900006738e8 R11: 00000000ffffffff R12: 0000000000000000
R13: 0000000000000000 R14: ffff8881036a89c0 R15: 0000000000000000
FS:  00007fdfb8384740(0000) GS:ffff88856a9d6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 0000000112ae0005 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 mlx5e_netdev_change_profile+0x45/0xb0
 mlx5e_vport_rep_load+0x27b/0x2d0
 mlx5_esw_offloads_rep_load+0x72/0xf0
 esw_offloads_enable+0x5d0/0x970
 mlx5_eswitch_enable_locked+0x349/0x430
 ? is_mp_supported+0x57/0xb0
 mlx5_devlink_eswitch_mode_set+0x26b/0x430
 devlink_nl_eswitch_set_doit+0x6f/0xf0
 genl_family_rcv_msg_doit+0xe8/0x140
 genl_rcv_msg+0x18b/0x290
 ? __pfx_devlink_nl_pre_doit+0x10/0x10
 ? __pfx_devlink_nl_eswitch_set_doit+0x10/0x10
 ? __pfx_devlink_nl_post_doit+0x10/0x10
 ? __pfx_genl_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x52/0x100
 genl_rcv+0x28/0x40
 netlink_unicast+0x282/0x3e0
 ? __alloc_skb+0xd6/0x190
 netlink_sendmsg+0x1f7/0x430
 __sys_sendto+0x213/0x220
 ? __sys_recvmsg+0x6a/0xd0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x50/0x1f0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fdfb8495047

Fixes: c4d7eb5 ("net/mxl5e: Add change profile method")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260108212657.25090-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>