Is there a reason to have a separate function, `derivative`, for the gradient with respect to a scalar value? I find that it forces me to add special cases, whereas my code would just work for both vectors and scalars if there were a definition like:
```julia
AD.gradient(ad::AD.AbstractBackend, f, x::Number) = AD.derivative(ad, f, x)
```
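To illustrate the kind of special case this would remove, here is a self-contained sketch. It does not load AbstractDifferentiation: `AbstractBackend`, `derivative` and `gradient` below are hypothetical stand-ins for the `AD.` versions, `CentralDiff` is a toy finite-difference backend invented for the example, and `descent_step` is a hypothetical piece of user code. Only the one-line fallback corresponds to the proposal above.

```julia
# Hypothetical stand-ins for AD.AbstractBackend, AD.derivative and AD.gradient;
# CentralDiff is a toy finite-difference backend, not part of AbstractDifferentiation.
abstract type AbstractBackend end

struct CentralDiff <: AbstractBackend
    h::Float64
end
CentralDiff() = CentralDiff(1e-6)

# Scalar input: central difference, returned as a 1-tuple to mirror AD's tuple-returning style.
derivative(ad::CentralDiff, f, x::Number) = ((f(x + ad.h) - f(x - ad.h)) / (2 * ad.h),)

# Vector input: central difference along each coordinate, returned as a 1-tuple.
function gradient(ad::CentralDiff, f, x::AbstractVector)
    g = map(eachindex(x)) do i
        e = zero(float.(x))  # perturbation vector with the same indices as x
        e[i] = ad.h
        (f(x .+ e) - f(x .- e)) / (2 * ad.h)
    end
    return (g,)
end

# The proposed fallback: the gradient with respect to a scalar is just the derivative.
gradient(ad::AbstractBackend, f, x::Number) = derivative(ad, f, x)

# Generic user code: one gradient call, no `x isa Number` branch.
descent_step(ad, f, x; lr = 0.1) = x .- lr .* only(gradient(ad, f, x))
```

With the fallback in place, `descent_step(CentralDiff(), x -> x^2, 3.0)` and `descent_step(CentralDiff(), x -> sum(abs2, x), [1.0, 2.0])` go through the same `gradient` call. Without it, the scalar call throws a `MethodError`, and the user code needs an explicit branch that calls `derivative` instead.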