The SciML ecosystem strongly prefers Lux.jl (see the README at https://github.com/SciML/DiffEqFlux.jl). It would be nice if Transformers.jl could also be built on top of Lux.jl.