
Commits on Jun 22, 2021

  1. RFC: drm/amdgpu: Implement a proper implicit fencing uapi

    WARNING: Absolutely untested beyond "gcc isn't dying in agony".
    
    Implicit fencing done properly needs to treat the implicit fencing
    slots like a funny kind of IPC mailbox. In other words it needs to be
    explicit. This is the only way it will mesh well with explicit
    fencing userspace like vk, and it's also the bare minimum required to
    be able to manage anything else that wants to use the same buffer on
    multiple engines in parallel, and still be able to share it through
    implicit sync.
    
    amdgpu completely lacks such an uapi. Fix this.
    
    Luckily the concept of ignoring implicit fences exists already, and
    takes care of all the complexities of making sure that non-optional
    fences (like bo moves) are not ignored. This support was added in
    
    commit 177ae09
    Author: Andres Rodriguez <andresx7@gmail.com>
    Date:   Fri Sep 15 20:44:06 2017 -0400
    
        drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2
    
    Unfortunately it's the wrong semantics, because it's a bo flag and
    disables implicit sync on an allocated buffer completely.
    
    We _do_ want implicit sync, but control it explicitly. For this we
    need a flag on the drm_file, so that a given userspace (like vulkan)
    can manage the implicit sync slots explicitly. The other side of the
    pipeline (compositor, other process or just different stage in a media
    pipeline in the same process) can then either do the same, or fully
    participate in the implicit sync as implemented by the kernel by
    default.
    
    By building on the existing flag for buffers we avoid opening up
    any additional security concerns - anything this new flag allows
    was already possible.
    
    All drivers which support this concept of a userspace-specific
    opt-out of implicit sync have a flag in their CS ioctl, but in reality
    that turned out to be a bit too inflexible. See the discussion below;
    let's try to do a bit better for amdgpu.
    
    This alone only allows us to completely avoid any stalls due to
    implicit sync, it does not yet allow us to use implicit sync as a
    strange form of IPC for sync_file.
    
    For that we need two more pieces:
    
    - a way to get the current implicit sync fences out of a buffer. Could
      be done in a driver ioctl, but everyone needs this, and generally a
      dma-buf is involved anyway to establish the sharing. So an ioctl on
      the dma-buf makes a ton more sense:
    
      https://lore.kernel.org/dri-devel/20210520190007.534046-4-jason@jlekstrand.net/
    
      Current drivers in upstream solve this by having the opt-out flag
      on their CS ioctl. This has the downside that very often the CS
      which must actually stall for the implicit fence runs a while
      after the implicit fence point was logically sampled per the api
      spec (vk passes an explicit syncobj around for that afaiui), and so
      results in oversync. Converting the implicit sync fences into a
      snapshot sync_file is actually accurate.
    
    - Similarly we need to be able to set the exclusive implicit fence.
      Current drivers again do this with a CS ioctl flag, with again the
      same problem that by the time the CS happens additional dependencies
      have been added. An explicit ioctl to only insert a sync_file (while
      respecting the rules for how exclusive and shared fence slots must
      be updated in struct dma_resv) is much better. This is proposed here:
    
      https://lore.kernel.org/dri-devel/20210520190007.534046-5-jason@jlekstrand.net/
    
    These three pieces together allow userspace to fully control implicit
    fencing and remove all unnecessary stall points due to them.
    
    Well, as much as the implicit fencing model fundamentally allows:
    there is only one set of fences, and you can only choose to sync
    against writers (exclusive slot) or against everyone. Hence
    suballocating multiple buffers or anything else like this is
    fundamentally not possible, and can only be fixed by a proper
    explicit fencing model.
    
    Aside from that caveat this model gets implicit fencing as close to
    explicit fencing semantics as possible.
    
    On the actual implementation I opted for a simple setparam ioctl, no
    locking (just atomic reads/writes) for simplicity. There is a nice
    flag parameter in the VM ioctl which we could use, except:
    - it's not checked, so userspace likely passes garbage
    - there's already a comment that userspace _does_ pass garbage in the
      priority field
    So yeah unfortunately this flag parameter for setting vm flags is
    useless, and we need to hack up a new one.
    
    v2: Explain why a new SETPARAM (Jason)
    
    v3: Bas noticed I forgot to hook up the dependency-side shortcut. We
    need both, or this doesn't do much.
    
    v4: Rebase over the amdgpu patch to always set the implicit sync
    fences.
    
    Cc: mesa-dev@lists.freedesktop.org
    Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
    Cc: Dave Airlie <airlied@gmail.com>
    Cc: Rob Clark <robdclark@chromium.org>
    Cc: Kristian H. Kristensen <hoegsberg@google.com>
    Cc: Michel Dänzer <michel@daenzer.net>
    Cc: Daniel Stone <daniels@collabora.com>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Deepak R Varma <mh12gx2825@gmail.com>
    Cc: Chen Li <chenli@uniontech.com>
    Cc: Kevin Wang <kevin1.wang@amd.com>
    Cc: Dennis Li <Dennis.Li@amd.com>
    Cc: Luben Tuikov <luben.tuikov@amd.com>
    Cc: linaro-mm-sig@lists.linaro.org
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  2. drm/gem: Tiny kernel clarification for drm_gem_fence_array_add

    Spotted while trying to convert panfrost to these.
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: Lucas Stach <l.stach@pengutronix.de>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  3. drm/tiny: drm_gem_simple_display_pipe_prepare_fb is the default

    Goes through all the drivers and deletes the explicit hook
    assignment since it's the default now.
    
    Acked-by: David Lechner <david@lechnology.com>
    Acked-by: Noralf Trønnes <noralf@tronnes.org>
    Acked-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
    Acked-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Joel Stanley <joel@jms.id.au>
    Cc: Andrew Jeffery <andrew@aj.id.au>
    Cc: "Noralf Trønnes" <noralf@tronnes.org>
    Cc: Linus Walleij <linus.walleij@linaro.org>
    Cc: Emma Anholt <emma@anholt.net>
    Cc: David Lechner <david@lechnology.com>
    Cc: Kamlesh Gurudasani <kamlesh.gurudasani@gmail.com>
    Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: Sam Ravnborg <sam@ravnborg.org>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: linux-aspeed@lists.ozlabs.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  4. drm/simple-helper: drm_gem_simple_display_pipe_prepare_fb as default

    It's tedious to review this all the time, and my audit showed that
    arcpgu actually forgot to set this.
    
    Make this the default and stop worrying.
    
    Again I sprinkled WARN_ON_ONCE on top to make sure we don't have
    strange combinations of hooks: cleanup_fb without prepare_fb doesn't
    make sense, and since simpler drivers are all new they better be GEM
    based drivers.
    
    v2: Warn and bail when it's _not_ a GEM driver (Noralf)
    
    Cc: Noralf Trønnes <noralf@tronnes.org>
    Acked-by: Noralf Trønnes <noralf@tronnes.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  5. drm/omap: Follow implicit fencing in prepare_fb

    I guess no one ever tried running omap together with lima or panfrost,
    not even sure that's possible. Anyway for consistency, fix this.
    
    Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Tomi Valkeinen <tomba@kernel.org>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  6. drm/vram-helpers: Create DRM_GEM_VRAM_PLANE_HELPER_FUNCS

    Like we have for the shadow helpers too, and roll it out to drivers.
    
    Acked-by: Tian Tao <tiantao6@hisilicon.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Dave Airlie <airlied@redhat.com>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: Hans de Goede <hdegoede@redhat.com>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Tian Tao <tiantao6@hisilicon.com>
    Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  7. drm/armada: Remove prepare/cleanup_fb hooks

    All they do is refcount the fb, which the atomic helpers already do.
    
    This was necessary with the legacy helpers and I guess was just
    carried over in the conversion. drm_plane_state always has a full
    reference for its ->fb pointer during its entire lifetime,
    see __drm_atomic_helper_plane_destroy_state().
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Russell King <linux@armlinux.org.uk>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  8. drm/<driver>: drm_gem_plane_helper_prepare_fb is now the default

    No need to set it explicitly.
    
    Acked-by: Heiko Stuebner <heiko@sntech.de>
    Acked-by: Paul Cercueil <paul@crapouillou.net>
    Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
    Acked-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
    Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Acked-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
    Acked-by: Philippe Cornu <philippe.cornu@foss.st.com>
    Acked-by: Lucas Stach <l.stach@pengutronix.de>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
    Cc: Lucas Stach <l.stach@pengutronix.de>
    Cc: Shawn Guo <shawnguo@kernel.org>
    Cc: Sascha Hauer <s.hauer@pengutronix.de>
    Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
    Cc: Fabio Estevam <festevam@gmail.com>
    Cc: NXP Linux Team <linux-imx@nxp.com>
    Cc: Philipp Zabel <p.zabel@pengutronix.de>
    Cc: Paul Cercueil <paul@crapouillou.net>
    Cc: Chun-Kuang Hu <chunkuang.hu@kernel.org>
    Cc: Matthias Brugger <matthias.bgg@gmail.com>
    Cc: Neil Armstrong <narmstrong@baylibre.com>
    Cc: Kevin Hilman <khilman@baylibre.com>
    Cc: Jerome Brunet <jbrunet@baylibre.com>
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Cc: Marek Vasut <marex@denx.de>
    Cc: Stefan Agner <stefan@agner.ch>
    Cc: Sandy Huang <hjc@rock-chips.com>
    Cc: "Heiko Stübner" <heiko@sntech.de>
    Cc: Yannick Fertre <yannick.fertre@foss.st.com>
    Cc: Philippe Cornu <philippe.cornu@foss.st.com>
    Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
    Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
    Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Chen-Yu Tsai <wens@csie.org>
    Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
    Cc: Jyri Sarha <jyri.sarha@iki.fi>
    Cc: Tomi Valkeinen <tomba@kernel.org>
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mips@vger.kernel.org
    Cc: linux-mediatek@lists.infradead.org
    Cc: linux-amlogic@lists.infradead.org
    Cc: linux-rockchip@lists.infradead.org
    Cc: linux-stm32@st-md-mailman.stormreply.com
    Cc: linux-sunxi@lists.linux.dev
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  9. drm/atomic-helper: make drm_gem_plane_helper_prepare_fb the default

    There's a bunch of atomic drivers who don't do this quite correctly,
    luckily most of them aren't in wide use or people would have noticed
    the tearing.
    
    By making this the default we avoid the constant audit pain and can
    additionally remove a ton of lines from vfuncs for a bit more clarity
    in smaller drivers.
    
    While at it complain if there's a cleanup_fb hook but no prepare_fb
    hook, because that makes no sense. I haven't found any driver which
    violates this, but better safe than sorry.
    
    Subsequent patches will reap the benefits.
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  10. drm/panfrost: Fix implicit sync

    Currently this has no practical relevance I think because there's not
    many who can pull off a setup with panfrost and another gpu in the
    same system. But the rules are that if you're setting an exclusive
    fence, indicating a gpu write access in the implicit fencing system,
    then you need to wait for all fences, not just the previous exclusive
    fence.
    
    panfrost against itself has no problem, because it always sets the
    exclusive fence (but that's probably something that will need to be
    fixed for vulkan and/or multi-engine gpus, or you'll suffer badly).
    Also no problem with that against display.
    
    With the prep work done to switch over to the dependency helpers this
    is now a one-liner.
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Rob Herring <robh@kernel.org>
    Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
    Cc: Steven Price <steven.price@arm.com>
    Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: linux-media@vger.kernel.org
    Cc: linaro-mm-sig@lists.linaro.org
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  11. drm/panfrost: Use xarray and helpers for dependency tracking

    More consistency and prep work for the next patch.
    
    Aside: I wonder whether we shouldn't just move this entire xarray
    business into the scheduler so that not everyone has to reinvent the
    same wheels. Cc'ing some scheduler people for this too.
    
    v2: Correctly handle sched_lock since Lucas pointed out it's needed.
    
    v3: Rebase, dma_resv_get_excl_unlocked got renamed
    
    v4: Don't leak job references on failure (Steven).
    
    Cc: Lucas Stach <l.stach@pengutronix.de>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: Luben Tuikov <luben.tuikov@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Lee Jones <lee.jones@linaro.org>
    Cc: Steven Price <steven.price@arm.com>
    Cc: Rob Herring <robh@kernel.org>
    Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
    Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: linux-media@vger.kernel.org
    Cc: linaro-mm-sig@lists.linaro.org
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  12. drm/panfrost: Shrink sched_lock

    drm/scheduler requires a lock between _init and _push_job, but the
    reservation lock dance doesn't. So shrink the critical section a
    notch.
    
    v2: Lucas pointed out how this should really work, I got it all wrong
    in v1.
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Lucas Stach <l.stach@pengutronix.de>
    Cc: Rob Herring <robh@kernel.org>
    Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
    Cc: Steven Price <steven.price@arm.com>
    Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  13. dma-buf: Document dma-buf implicit fencing/resv fencing rules

    Docs for struct dma_resv are fairly clear:
    
    "A reservation object can have attached one exclusive fence (normally
    associated with write operations) or N shared fences (read
    operations)."
    
    https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects
    
    Furthermore, here's a review across all of upstream.
    
    First of render drivers and how they set implicit fences:
    
    - nouveau follows this contract, see in validate_fini_no_ticket()
    
    			nouveau_bo_fence(nvbo, fence, !!b->write_domains);
    
      and that last boolean controls whether the exclusive or shared fence
      slot is used.
    
    - radeon follows this contract by setting
    
    		p->relocs[i].tv.num_shared = !r->write_domain;
    
      in radeon_cs_parser_relocs(), which ensures that the call to
      ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the
      right thing.
    
    - vmwgfx seems to follow this contract with the shotgun approach of
      always setting ttm_val_buf->num_shared = 0, which means
      ttm_eu_fence_buffer_objects() will only use the exclusive slot.
    
    - etnaviv follows this contract, as can be trivially seen by looking
      at submit_attach_object_fences()
    
    - i915 is a bit of a convoluted maze with multiple paths leading to
      i915_vma_move_to_active(). Which sets the exclusive flag if
      EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for
      softpin mode, or through the write_domain when using relocations. It
      follows this contract.
    
    - lima follows this contract, see lima_gem_submit() which sets the
      exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that
      bo
    
    - msm follows this contract, see msm_gpu_submit() which sets the
      exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer
    
    - panfrost follows this contract with the shotgun approach of just
      always setting the exclusive fence, see
      panfrost_attach_object_fences(). Benefits of a single engine I guess
    
    - v3d follows this contract with the same shotgun approach in
      v3d_attach_fences_and_unlock_reservation(), but it has at least an
      XXX comment that maybe this should be improved
    
    - vc4 uses the same shotgun approach of always setting an exclusive
      fence, see vc4_update_bo_seqnos()
    
    - vgem also follows this contract, see vgem_fence_attach_ioctl() and
      the VGEM_FENCE_WRITE. This is used in some igts to validate prime
      sharing with i915.ko without the need of a 2nd gpu
    
    - virtio follows this contract again with the shotgun approach of
      always setting an exclusive fence, see virtio_gpu_array_add_fence()
    
    This covers the setting of the exclusive fences when writing.
    
    Synchronizing against the exclusive fence is a lot more tricky, and I
    only spot checked a few:
    
    - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all
      implicit dependencies (which is used by vulkan)
    
    - etnaviv does this. Implicit dependencies are collected in
      submit_fence_sync(), again with an opt-out flag
      ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in
      etnaviv_sched_dependency which is the
      drm_sched_backend_ops->dependency callback.
    
    - vc4 seems to not do much here, maybe gets away with it by not having
      a scheduler and only a single engine. Since all newer broadcom chips than
      the OG vc4 use v3d for rendering, which follows this contract, the
      impact of this issue is fairly small.
    
    - v3d does this using the drm_gem_fence_array_add_implicit() helper,
      which its drm_sched_backend_ops->dependency callback
      v3d_job_dependency() then picks up.
    
    - panfrost is nice here and tracks the implicit fences in
      panfrost_job->implicit_fences, which again the
      drm_sched_backend_ops->dependency callback panfrost_job_dependency()
      picks up. It is mildly questionable though since it only picks up
      exclusive fences in panfrost_acquire_object_fences(), but not buggy
      in practice because it also always sets the exclusive fence. It
      should pick up both sets of fences, just in case there's ever going
      to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a
      pcie port and a real gpu, which might actually happen eventually. A
      bug, but easy to fix. Should probably use the
      drm_gem_fence_array_add_implicit() helper.
    
    - lima is nice and easy, uses drm_gem_fence_array_add_implicit() and
      the same schema as v3d.
    
    - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT,
      but because it doesn't use the drm/scheduler it handles fences from
      the wrong context with a synchronous dma_fence_wait. See
      submit_fence_sync() leading to msm_gem_sync_object(). Investing into
      a scheduler might be a good idea.
    
    - all the remaining drivers are ttm based, where I hope they do
      appropriately obey implicit fences already. I didn't do the full
      audit there because a) not following the contract would confuse ttm
      quite badly and b) reading non-standard scheduler and submit code
      which isn't based on drm/scheduler is a pain.
    
    Onwards to the display side.
    
    - Any driver using the drm_gem_plane_helper_prepare_fb() helper will
      do this correctly. Overwhelmingly most drivers get this right,
      except a few totally don't. I'll follow up with a patch to make
      this the default and avoid a bunch of bugs.
    
    - I didn't audit the ttm drivers, but given that dma_resv started
      there I hope they get this right.
    
    In conclusion this IS the contract, both as documented and
    overwhelmingly implemented, specifically as implemented by all render
    drivers except amdgpu.
    
    Amdgpu tried to fix this already in
    
    commit 049aca4
    Author: Christian König <christian.koenig@amd.com>
    Date:   Wed Sep 19 16:54:35 2018 +0200
    
        drm/amdgpu: fix using shared fence for exported BOs v2
    
    but this fix falls short in a number of areas:
    
    - It's racy, by the time the buffer is shared it might be too late. To
      make sure there's definitely never a problem we need to set the
      fences correctly for any buffer that's potentially exportable.
    
    - It's breaking uapi: dma-buf fds support poll() and differentiate
      between read and write access, which was introduced in
    
    	commit 9b495a5
    	Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
    	Date:   Tue Jul 1 12:57:43 2014 +0200
    
    	    dma-buf: add poll support, v3
    
    - Christian König wants to nack new uapi building further on this
      dma_resv contract because it breaks amdgpu, quoting
    
      "Yeah, and that is exactly the reason why I will NAK this uAPI change.
    
      "This doesn't works for amdgpu at all for the reasons outlined above."
    
      https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/
    
      Rejecting new development because your own driver is broken and
      violates established cross driver contracts and uapi is really not
      how upstream works.
    
    Now this patch will have a severe performance impact on anything that
    runs on multiple engines. So we can't just merge it outright, but need
    a bit of a plan:
    
    - amdgpu needs a proper uapi for handling implicit fencing. The funny
      thing is that to do it correctly, implicit fencing must be treated
      as a very strange IPC mechanism for transporting fences, where both
      setting the fence and dependency intercepts must be handled
      explicitly. Current best practice is a per-bo flag to indicate
      writes, and a per-bo flag to skip implicit fencing in the CS
      ioctl as a new chunk.
    
    - Since amdgpu has been shipping with broken behaviour we need an
      opt-out flag from the butchered implicit fencing model to enable
      the proper, explicitly managed implicit fencing model.
    
    - for kernel memory fences due to bo moves at least the i915 idea is
      to use ttm_bo->moving. amdgpu probably needs the same.
    
    - since the current p2p dma-buf interface assumes the kernel memory
      fence is in the exclusive dma_resv fence slot we need to add a new
      fence slot for kernel fences, which must never be ignored. Since
      currently only amdgpu supports this there's no real problem here
      yet, until amdgpu gains a NO_IMPLICIT CS flag.
    
    - New userspace needs to ship in enough desktop distros so that users
      won't notice the perf impact. I think we can ignore LTS distros who
      upgrade their kernels but not their mesa3d snapshot.
    
    - Then when this is all in place we can merge this patch here.
    
    What is not a solution to this problem here is trying to make the
    dma_resv rules in the kernel more clever. The fundamental issue here
    is that the amdgpu CS uapi is the least expressive one across all
    drivers (only equalled by panfrost, which has an actual excuse) by not
    allowing any userspace control over how implicit sync is conducted.
    
    Until this is fixed it's completely pointless to make the kernel more
    clever to improve amdgpu, because all we're doing is papering over
    this uapi design issue. amdgpu needs to attain the status quo
    established by other drivers first, once that's achieved we can tackle
    the remaining issues in a consistent way across drivers.
    
    v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I
    entirely missed.
    
    This is great because it means the amdgpu specific piece for proper
    implicit fence handling exists already, and has for a while. The
    only thing that's now missing is
    - fishing the implicit fences out of a shared object at the right time
    - setting the exclusive implicit fence slot at the right time.
    
    Jason has a patch series to fill that gap with a bunch of generic
    ioctl on the dma-buf fd:
    
    https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/
    
    v3: Since Christian has fixed amdgpu now in
    
    commit 8c505bd (drm-misc/drm-misc-next)
    Author: Christian König <christian.koenig@amd.com>
    Date:   Wed Jun 9 13:51:36 2021 +0200
    
        drm/amdgpu: rework dma_resv handling v3
    
    Use the audit covered in this commit message as the excuse to update
    the dma-buf docs around dma_buf.resv usage across drivers.
    
    Since dynamic importers have different rules also hammer these in
    again while we're at it.
    
    Cc: mesa-dev@lists.freedesktop.org
    Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
    Cc: Dave Airlie <airlied@gmail.com>
    Cc: Rob Clark <robdclark@chromium.org>
    Cc: Kristian H. Kristensen <hoegsberg@google.com>
    Cc: Michel Dänzer <michel@daenzer.net>
    Cc: Daniel Stone <daniels@collabora.com>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Deepak R Varma <mh12gx2825@gmail.com>
    Cc: Chen Li <chenli@uniontech.com>
    Cc: Kevin Wang <kevin1.wang@amd.com>
    Cc: Dennis Li <Dennis.Li@amd.com>
    Cc: Luben Tuikov <luben.tuikov@amd.com>
    Cc: linaro-mm-sig@lists.linaro.org
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  14. dma-buf: Switch to inline kerneldoc

    Also review & update everything while we're at it.
    
    This is prep work to smash a ton of stuff into the kerneldoc for
    @resv.
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Dave Airlie <airlied@redhat.com>
    Cc: Nirmoy Das <nirmoy.das@amd.com>
    Cc: Deepak R Varma <mh12gx2825@gmail.com>
    Cc: Chen Li <chenli@uniontech.com>
    Cc: Kevin Wang <kevin1.wang@amd.com>
    Cc: linux-media@vger.kernel.org
    Cc: linaro-mm-sig@lists.linaro.org
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  15. dma-resv: Fix kerneldoc

    Oversight from
    
    commit 6edbd6a
    Author: Christian König <christian.koenig@amd.com>
    Date:   Mon May 10 16:14:09 2021 +0200
    
        dma-buf: rename and cleanup dma_resv_get_excl v3
    
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: "Christian König" <christian.koenig@amd.com>
    Cc: linux-media@vger.kernel.org
    Cc: linaro-mm-sig@lists.linaro.org
    danvet authored and intel-lab-lkp committed Jun 22, 2021
  16. Merge remote-tracking branch 'drm/drm-next' into drm-tip

    # Conflicts:
    #	drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
    #	drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
    #	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
    #	drivers/gpu/drm/vc4/vc4_hdmi.c
    ChristianKoenigAMD committed Jun 22, 2021
  17. drm/amdgpu: wait for moving fence after pinning

    We actually need to wait for the moving fence after pinning
    the BO to make sure that the pin is completed.
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    References: https://lore.kernel.org/dri-devel/20210621151758.2347474-1-daniel.vetter@ffwll.ch/
    CC: stable@kernel.org
    Link: https://patchwork.freedesktop.org/patch/msgid/20210622114506.106349-3-christian.koenig@amd.com
    ChristianKoenigAMD committed Jun 22, 2021
  18. drm/radeon: wait for moving fence after pinning

    We actually need to wait for the moving fence after pinning
    the BO to make sure that the pin is completed.
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    References: https://lore.kernel.org/dri-devel/20210621151758.2347474-1-daniel.vetter@ffwll.ch/
    CC: stable@kernel.org
    Link: https://patchwork.freedesktop.org/patch/msgid/20210622114506.106349-2-christian.koenig@amd.com
    ChristianKoenigAMD committed Jun 22, 2021
  19. drm/nouveau: wait for moving fence after pinning v2

    We actually need to wait for the moving fence after pinning
    the BO to make sure that the pin is completed.
    
    v2: grab the lock while waiting
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    References: https://lore.kernel.org/dri-devel/20210621151758.2347474-1-daniel.vetter@ffwll.ch/
    CC: stable@kernel.org
    Link: https://patchwork.freedesktop.org/patch/msgid/20210622114506.106349-1-christian.koenig@amd.com
    ChristianKoenigAMD committed Jun 22, 2021
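    The pattern shared by the three "wait for moving fence after pinning" commits above can be sketched in plain C. This is a toy model, not the real TTM/dma_resv API: `dummy_bo`, `dummy_fence` and the helpers are illustrative stand-ins, and a real driver waits on `bo->moving` (a `dma_fence`) while holding the reservation lock, as the nouveau v2 notes.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for a BO and its "moving fence". */
    struct dummy_fence { bool signaled; };
    struct dummy_bo {
        int pin_count;
        struct dummy_fence *moving; /* NULL when no move is in flight */
    };

    static void fence_wait(struct dummy_fence *f)
    {
        /* a real dma_fence_wait() blocks; here the move just "completes" */
        f->signaled = true;
    }

    /* Pin and then wait for the moving fence, mirroring the fix:
     * the pin is only usable once the pending move has finished. */
    static int bo_pin_and_wait(struct dummy_bo *bo)
    {
        bo->pin_count++;
        if (bo->moving && !bo->moving->signaled)
            fence_wait(bo->moving);
        return 0;
    }

    int main(void)
    {
        struct dummy_fence move = { .signaled = false };
        struct dummy_bo bo = { .pin_count = 0, .moving = &move };

        bo_pin_and_wait(&bo);
        assert(bo.pin_count == 1);
        assert(move.signaled); /* move done before anyone relies on the pin */
        printf("pin complete\n");
        return 0;
    }
    ```

    The point of the fix is the ordering: without the wait, a caller could use the pinned address while the buffer is still physically migrating.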
  20. drm/i915/dsc: abstract helpers to get bigjoiner primary/secondary crtc

    Add a single point of truth for figuring out the primary/secondary crtc
    for bigjoiner instead of duplicating the magic pipe +/- 1 in multiple
    places.
    
    Also fix the pipe validity checks to properly take non-contiguous pipes
    into account. The current checks may theoretically overflow
    i915->pipe_to_crtc_mapping[pipe], albeit with a warning, due to fused
    off pipes, as INTEL_NUM_PIPES() returns the actual number of pipes on
    the platform, and the check is for INTEL_NUM_PIPES() == pipe + 1.
    
    Prefer primary/secondary terminology going forward.
    
    v2:
    - Improved abstractions for pipe validity etc.
    
    Fixes: 8a029c1 ("drm/i915/dp: Modify VDSC helpers to configure DSC for Bigjoiner slave")
    Fixes: d961eb2 ("drm/i915/bigjoiner: atomic commit changes for uncompressed joiner")
    Cc: Animesh Manna <animesh.manna@intel.com>
    Cc: Manasi Navare <manasi.d.navare@intel.com>
    Cc: Vandita Kulkarni <vandita.kulkarni@intel.com>
    Reviewed-by: Manasi Navare <manasi.dl.navare@intel.com>
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210610090528.20511-1-jani.nikula@intel.com
    jnikula committed Jun 22, 2021
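    The "single point of truth" idea from the commit above can be sketched as a pair of helpers. All names and the pipe count here are hypothetical, not the real i915 API; the essential part is that the `pipe + 1` / `pipe - 1` relationship lives in exactly one place and is validated against the pipes that actually exist, instead of checking `INTEL_NUM_PIPES() == pipe + 1` at scattered call sites.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define NUM_PIPES 3 /* pipes actually present; fused-off pipes reduce this */

    static bool pipe_valid(int pipe)
    {
        return pipe >= 0 && pipe < NUM_PIPES;
    }

    /* Hypothetical helpers: derive the bigjoiner secondary pipe from the
     * primary (and vice versa) in one place, rejecting pipes that would
     * index pipe_to_crtc_mapping[] out of bounds. */
    static int bigjoiner_secondary_pipe(int primary)
    {
        int secondary = primary + 1;
        return pipe_valid(secondary) ? secondary : -1;
    }

    static int bigjoiner_primary_pipe(int secondary)
    {
        int primary = secondary - 1;
        return pipe_valid(primary) ? primary : -1;
    }

    int main(void)
    {
        assert(bigjoiner_secondary_pipe(0) == 1);
        assert(bigjoiner_primary_pipe(1) == 0);
        /* pipe + 1 past the last pipe must be rejected, not used to
         * overflow the mapping array */
        assert(bigjoiner_secondary_pipe(NUM_PIPES - 1) == -1);
        return 0;
    }
    ```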
  21. drm/amdgpu: rework dma_resv handling v3

    Drop the workaround and instead implement a better solution.
    
    Basically we are now chaining all submissions using a dma_fence_chain
    container and adding them as exclusive fence to the dma_resv object.
    
    This way other drivers can still sync to the single exclusive fence
    while amdgpu only syncs to fences from different processes.
    
    v3: add the shared fence first before the exclusive one
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210614174536.5188-2-christian.koenig@amd.com
    ChristianKoenigAMD committed Jun 22, 2021
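    The chaining described above can be modeled with a toy container (illustrative types only, not the real `dma_fence_chain`): each new submission is wrapped in a node that links to the previous exclusive fence, so the reservation object still exposes a single exclusive fence while every prior submission stays reachable through the chain.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    struct toy_fence { int seqno; };
    struct toy_chain {
        struct toy_fence *fence;
        struct toy_chain *prev; /* older part of the chain */
    };

    struct toy_resv { struct toy_chain *excl; };

    /* Chain the new submission fence on top of the old exclusive fence
     * and install the chain node as the new exclusive fence. */
    static void resv_add_excl_chained(struct toy_resv *resv, struct toy_fence *f)
    {
        struct toy_chain *node = malloc(sizeof(*node));
        node->fence = f;
        node->prev = resv->excl;
        resv->excl = node;
    }

    static int chain_depth(const struct toy_chain *c)
    {
        int n = 0;
        for (; c; c = c->prev)
            n++;
        return n;
    }

    int main(void)
    {
        struct toy_resv resv = { .excl = NULL };
        struct toy_fence a = { .seqno = 1 }, b = { .seqno = 2 };

        resv_add_excl_chained(&resv, &a);
        resv_add_excl_chained(&resv, &b);

        /* one exclusive slot, but both submissions are covered */
        assert(resv.excl->fence->seqno == 2);
        assert(chain_depth(resv.excl) == 2);
        return 0;
    }
    ```

    A foreign driver that only understands "the exclusive fence" waits on the chain head and thereby on everything beneath it.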
  22. drm/amdgpu: unwrap fence chains in the explicit sync fence

    Unwrap the explicit fence if it is a dma_fence_chain and
    sync to the first fence not matching the owner rules.
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210614174536.5188-1-christian.koenig@amd.com
    ChristianKoenigAMD committed Jun 22, 2021
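    The unwrapping step can be sketched the same way (again with toy types, not the real `dma_fence_chain_for_each()` iterator): walk the chain links and return the first fence whose owner does not match the submitting context, skipping fences the context never needs to wait for.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct toy_fence { int owner; int seqno; };
    struct toy_chain {
        struct toy_fence *fence;
        struct toy_chain *prev;
    };

    /* Walk the chain; the first fence with a foreign owner is the one
     * this submission actually has to sync to. */
    static struct toy_fence *sync_unwrap(struct toy_chain *c, int my_owner)
    {
        for (; c; c = c->prev) {
            if (c->fence->owner != my_owner)
                return c->fence;
        }
        return NULL; /* everything in the chain is our own work */
    }

    int main(void)
    {
        struct toy_fence mine  = { .owner = 1, .seqno = 10 };
        struct toy_fence other = { .owner = 2, .seqno = 9 };
        struct toy_chain older = { .fence = &other, .prev = NULL };
        struct toy_chain newer = { .fence = &mine, .prev = &older };

        /* skip our own fence, sync to the foreign one */
        assert(sync_unwrap(&newer, 1) == &other);
        /* for owner 2 the head itself is already foreign */
        assert(sync_unwrap(&newer, 2) == &mine);
        return 0;
    }
    ```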
  23. Revert "drm: add a locked version of drm_is_current_master"

    This reverts commit 1815d9c.
    
    Unfortunately this inverts the locking hierarchy, so back to the
    drawing board. Full lockdep splat below:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    5.13.0-rc7-CI-CI_DRM_10254+ #1 Not tainted
    ------------------------------------------------------
    kms_frontbuffer/1087 is trying to acquire lock:
    ffff88810dcd01a8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_is_current_master+0x1b/0x40
    but task is already holding lock:
    ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #2 (&dev->mode_config.mutex){+.+.}-{3:3}:
           __mutex_lock+0xab/0x970
           drm_client_modeset_probe+0x22e/0xca0
           __drm_fb_helper_initial_config_and_unlock+0x42/0x540
           intel_fbdev_initial_config+0xf/0x20 [i915]
           async_run_entry_fn+0x28/0x130
           process_one_work+0x26d/0x5c0
           worker_thread+0x37/0x380
           kthread+0x144/0x170
           ret_from_fork+0x1f/0x30
    -> #1 (&client->modeset_mutex){+.+.}-{3:3}:
           __mutex_lock+0xab/0x970
           drm_client_modeset_commit_locked+0x1c/0x180
           drm_client_modeset_commit+0x1c/0x40
           __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
           drm_fb_helper_set_par+0x34/0x40
           intel_fbdev_set_par+0x11/0x40 [i915]
           fbcon_init+0x270/0x4f0
           visual_init+0xc6/0x130
           do_bind_con_driver+0x1e5/0x2d0
           do_take_over_console+0x10e/0x180
           do_fbcon_takeover+0x53/0xb0
           register_framebuffer+0x22d/0x310
           __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
           intel_fbdev_initial_config+0xf/0x20 [i915]
           async_run_entry_fn+0x28/0x130
           process_one_work+0x26d/0x5c0
           worker_thread+0x37/0x380
           kthread+0x144/0x170
           ret_from_fork+0x1f/0x30
    -> #0 (&dev->master_mutex){+.+.}-{3:3}:
           __lock_acquire+0x151e/0x2590
           lock_acquire+0xd1/0x3d0
           __mutex_lock+0xab/0x970
           drm_is_current_master+0x1b/0x40
           drm_mode_getconnector+0x37e/0x4a0
           drm_ioctl_kernel+0xa8/0xf0
           drm_ioctl+0x1e8/0x390
           __x64_sys_ioctl+0x6a/0xa0
           do_syscall_64+0x39/0xb0
           entry_SYSCALL_64_after_hwframe+0x44/0xae
    other info that might help us debug this:
    Chain exists of: &dev->master_mutex --> &client->modeset_mutex --> &dev->mode_config.mutex
     Possible unsafe locking scenario:
           CPU0                    CPU1
           ----                    ----
      lock(&dev->mode_config.mutex);
                                   lock(&client->modeset_mutex);
                                   lock(&dev->mode_config.mutex);
      lock(&dev->master_mutex);
    *** DEADLOCK ***
    1 lock held by kms_frontbuffer/1087:
     #0: ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0
    stack backtrace:
    CPU: 7 PID: 1087 Comm: kms_frontbuffer Not tainted 5.13.0-rc7-CI-CI_DRM_10254+ #1
    Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
    Call Trace:
     dump_stack+0x7f/0xad
     check_noncircular+0x12e/0x150
     __lock_acquire+0x151e/0x2590
     lock_acquire+0xd1/0x3d0
     __mutex_lock+0xab/0x970
     drm_is_current_master+0x1b/0x40
     drm_mode_getconnector+0x37e/0x4a0
     drm_ioctl_kernel+0xa8/0xf0
     drm_ioctl+0x1e8/0x390
     __x64_sys_ioctl+0x6a/0xa0
     do_syscall_64+0x39/0xb0
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Note that this broke the intel-gfx CI pretty much across the board
    because it has to reboot machines after it hits a lockdep splat.
    
    Testcase: igt/debugfs_test/read_all_entries
    Acked-by: Petri Latvala <petri.latvala@intel.com>
    Fixes: 1815d9c ("drm: add a locked version of drm_is_current_master")
    Cc: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
    Cc: Emil Velikov <emil.l.velikov@gmail.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210622075409.2673805-1-daniel.vetter@ffwll.ch
    danvet committed Jun 22, 2021
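    The splat above is the classic ABBA inversion: the ioctl path takes `mode_config.mutex` then `master_mutex`, while fbdev setup had established the reverse dependency chain. A minimal pthreads sketch (lock names are stand-ins for the real DRM mutexes) shows the fix in principle: once every path takes the locks in one agreed order, the cycle lockdep complains about cannot form.

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t mode_config = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t master = PTHREAD_MUTEX_INITIALIZER;
    static int work_done;

    /* Both threads use the same hierarchy: mode_config, then master.
     * The deadlock in the splat needs one thread to do the opposite. */
    static void *worker(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&mode_config);
        pthread_mutex_lock(&master);
        work_done++;
        pthread_mutex_unlock(&master);
        pthread_mutex_unlock(&mode_config);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        assert(work_done == 2);
        printf("consistent lock order: no deadlock\n");
        return 0;
    }
    ```

    The reverted commit made `drm_is_current_master()` take `master_mutex` under `mode_config.mutex`, closing exactly such a cycle through the fbdev chain, hence the revert rather than a local fix.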
  24. drm: bridge: ti-sn65dsi83: Retrieve the display mode from the state

    Instead of storing a copy of the display mode in the sn65dsi83
    structure, retrieve it from the atomic state in
    sn65dsi83_atomic_enable(). This allows the removal of the .mode_set()
    operation, and completes the transition to the atomic API.
    
    Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Acked-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210621125518.13715-6-laurent.pinchart@ideasonboard.com
    pinchartl authored and robertfoss committed Jun 22, 2021
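    The shape of this change can be sketched with toy types (not the real `drm_bridge` API): instead of caching a copy of the display mode in the device struct at `.mode_set()` time, the enable hook looks the mode up from the atomic state it is handed.

    ```c
    #include <assert.h>

    struct toy_mode { int hdisplay, vdisplay; };
    struct toy_crtc_state { struct toy_mode adjusted_mode; };
    struct toy_atomic_state { struct toy_crtc_state crtc_state; };

    struct toy_bridge { int enabled; }; /* note: no cached mode field */

    static const struct toy_mode *
    state_get_mode(const struct toy_atomic_state *state)
    {
        return &state->crtc_state.adjusted_mode;
    }

    /* Program timings from the state passed in, so there is no stale
     * copy that .mode_set() would have had to keep in sync. */
    static void toy_atomic_enable(struct toy_bridge *bridge,
                                  const struct toy_atomic_state *state)
    {
        const struct toy_mode *mode = state_get_mode(state);
        bridge->enabled = (mode->hdisplay > 0 && mode->vdisplay > 0);
    }

    int main(void)
    {
        struct toy_atomic_state state = {
            .crtc_state = { .adjusted_mode = { 1920, 1080 } },
        };
        struct toy_bridge bridge = { .enabled = 0 };

        toy_atomic_enable(&bridge, &state);
        assert(bridge.enabled == 1);
        return 0;
    }
    ```

    Dropping the cached copy is what lets the `.mode_set()` operation go away entirely.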
  25. drm: bridge: ti-sn65dsi83: Retrieve output format from bridge state

    The driver currently iterates over all connectors to get the bus format,
    used to configure the LVDS output format. This causes several issues:
    
    - If other connectors than the LVDS output are present, the format used
      by the driver may end up belonging to an entirely different output.
    
    - The code can crash if some connectors are not connected, as bus_format
      may then be NULL.
    
    - There's no guarantee that the bus format on the connector at the
      output of the pipeline matches the output of the sn65dsi83, as there
      may be other bridges in the pipeline.
    
    Solve this by retrieving the format from the bridge state instead, which
    provides the format corresponding to the output of the bridge.
    
    The struct sn65dsi83 lvds_format_24bpp and lvds_format_jeida fields are
    moved to local variables in sn65dsi83_atomic_enable() as they're now
    used in that function only.
    
    Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Acked-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210621125518.13715-5-laurent.pinchart@ideasonboard.com
    pinchartl authored and robertfoss committed Jun 22, 2021
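    The fix above can be modeled as follows (illustrative enum values and types, not the real media-bus format codes): the LVDS output format is read from the bridge's own output-side state rather than scanned off arbitrary connectors, and the two format flags become locals of the enable function instead of cached struct fields.

    ```c
    #include <assert.h>

    enum toy_bus_fmt {
        TOY_FMT_NONE = 0,
        TOY_FMT_RGB888_SPWG,   /* 24bpp VESA/SPWG */
        TOY_FMT_RGB888_JEIDA,  /* 24bpp JEIDA */
    };

    struct toy_bridge_state { enum toy_bus_fmt output_bus_cfg; };

    /* lvds_format_24bpp / lvds_format_jeida are locals now, derived
     * from our own output state, so a foreign or disconnected
     * connector can no longer feed us the wrong (or a NULL) format. */
    static int toy_atomic_enable(const struct toy_bridge_state *state)
    {
        int lvds_format_24bpp;
        int lvds_format_jeida;

        switch (state->output_bus_cfg) {
        case TOY_FMT_RGB888_SPWG:
            lvds_format_24bpp = 1;
            lvds_format_jeida = 0;
            break;
        case TOY_FMT_RGB888_JEIDA:
            lvds_format_24bpp = 1;
            lvds_format_jeida = 1;
            break;
        default:
            return -1; /* unknown format on our own output */
        }
        return (lvds_format_24bpp << 1) | lvds_format_jeida;
    }

    int main(void)
    {
        struct toy_bridge_state spwg = { .output_bus_cfg = TOY_FMT_RGB888_SPWG };
        struct toy_bridge_state jeida = { .output_bus_cfg = TOY_FMT_RGB888_JEIDA };

        assert(toy_atomic_enable(&spwg) == 2);
        assert(toy_atomic_enable(&jeida) == 3);
        return 0;
    }
    ```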
  26. drm: bridge: ti-sn65dsi83: Switch to atomic operations

    Use the atomic version of the enable/disable operations to continue the
    transition to the atomic API, started with the introduction of
    .atomic_get_input_bus_fmts(). This will be needed to access the mode
    from the atomic state.
    
    Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Acked-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210621125518.13715-4-laurent.pinchart@ideasonboard.com
    pinchartl authored and robertfoss committed Jun 22, 2021
  27. drm: bridge: ti-sn65dsi83: Pass mode explicitly to helper functions

    Pass the display mode explicitly to the sn65dsi83_get_lvds_range() and
    sn65dsi83_get_dsi_range() functions to prepare for its removal from the
    sn65dsi83 structure. This is not meant to bring any functional change.
    
    Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Acked-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210621125518.13715-3-laurent.pinchart@ideasonboard.com
    pinchartl authored and robertfoss committed Jun 22, 2021