hugetlb: handle truncate racing with page faults
When page fault code needs to allocate and instantiate a new hugetlb page (hugetlb_no_page), it checks early to determine if the fault is beyond i_size. When this is discovered early, it is easy to abort the fault and return an error. However, it becomes much more difficult to handle when discovered later, after allocating the page, consuming reservations and adding to the page cache. Backing out changes in such instances is difficult and error prone.

Instead of trying to catch and back out all such races, use the hugetlb fault mutex to handle truncate racing with page faults. The most significant change is modification of the routine remove_inode_hugepages such that it will take the fault mutex for EVERY index in the truncated range (or hole in the case of hole punch). Since remove_inode_hugepages is called in the truncate path after updating i_size, we can experience races as follows:
- truncate code updates i_size and takes the fault mutex before a racing fault. After fault code takes the mutex, it will notice the fault is beyond i_size and abort early.
- fault code obtains the mutex, and truncate updates i_size after the early checks in fault code. fault code will add a page beyond i_size. When truncate code takes the mutex for that page/index, it will remove the page.
- truncate updates i_size, but fault code obtains the mutex first. If fault code sees the updated i_size, it will abort early. If fault code does not see the updated i_size, it will add a page beyond i_size and truncate code will remove the page when it obtains the fault mutex.

Note, for performance reasons remove_inode_hugepages will still use filemap_get_folios for bulk folio lookups. For indices not returned in the bulk lookup, it will need to look up individual folios to check for races with page fault.
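The three orderings listed above can be replayed in a small user-space model. This is only a sketch and not the kernel code: struct model_inode, fault_check, fault_add, truncate_set_size, truncate_remove and the scenario_* helpers are all hypothetical names, the "page cache" is a plain array, and no real locking is done. Each fault mutex critical section is modelled as a run of calls that is not interleaved with the other side's critical section for the same index. The point it illustrates is that, because truncate visits every index in the truncated range after updating i_size, no page is left beyond i_size in any of the orderings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_INDEX 8	/* hypothetical file length, in huge pages */

/* Hypothetical stand-in for an inode plus its page cache. */
struct model_inode {
	size_t i_size;		/* file size, in huge-page units */
	bool cached[NR_INDEX];	/* true if a page is cached at that index */
};

/* Fault side: early i_size check, done with the fault mutex held. */
static bool fault_check(const struct model_inode *ino, size_t idx)
{
	return idx < ino->i_size;
}

/* Fault side: instantiate the page, still with the fault mutex held. */
static void fault_add(struct model_inode *ino, size_t idx)
{
	assert(idx < NR_INDEX);
	ino->cached[idx] = true;
}

/* Truncate side: i_size is updated before any fault mutex is taken. */
static void truncate_set_size(struct model_inode *ino, size_t new_size)
{
	ino->i_size = new_size;
}

/* Truncate side: visit EVERY index in the truncated range. */
static void truncate_remove(struct model_inode *ino)
{
	for (size_t idx = ino->i_size; idx < NR_INDEX; idx++) {
		/* the fault mutex for idx would be taken here */
		ino->cached[idx] = false;
		/* and dropped here */
	}
}

/* True if any page is still cached at or beyond i_size. */
static bool page_beyond_size(const struct model_inode *ino)
{
	for (size_t idx = ino->i_size; idx < NR_INDEX; idx++)
		if (ino->cached[idx])
			return true;
	return false;
}

/* Ordering 1: truncate updates i_size and takes the mutex first. */
bool scenario_truncate_first(void)
{
	struct model_inode ino = { .i_size = NR_INDEX };

	truncate_set_size(&ino, 4);
	truncate_remove(&ino);
	if (fault_check(&ino, 6))	/* fails: 6 is beyond the new i_size */
		fault_add(&ino, 6);
	return page_beyond_size(&ino);
}

/* Ordering 2: i_size changes after the fault's early check. */
bool scenario_fault_first(void)
{
	struct model_inode ino = { .i_size = NR_INDEX };
	bool ok = fault_check(&ino, 6);	/* passes against the old i_size */

	truncate_set_size(&ino, 4);
	if (ok)
		fault_add(&ino, 6);	/* page now sits beyond i_size */
	truncate_remove(&ino);		/* truncate gets the mutex, removes it */
	return page_beyond_size(&ino);
}

/* Ordering 3: i_size updated, but the fault wins the mutex. */
bool scenario_size_then_fault(bool fault_sees_stale_size)
{
	struct model_inode ino = { .i_size = NR_INDEX };

	truncate_set_size(&ino, 4);
	if (fault_sees_stale_size || fault_check(&ino, 6))
		fault_add(&ino, 6);
	truncate_remove(&ino);
	return page_beyond_size(&ino);
}
```

In this model every scenario function returns false, meaning no page survives beyond i_size. The model deliberately ignores reservations, hole punch and the bulk filemap_get_folios lookup.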
Link: https://lkml.kernel.org/r/20220824175757.20590-5-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Showing 2 changed files with 152 additions and 73 deletions.