
SoftMax: Multiply by 1 / expSum instead of dividing by expSum #125269

Open
BarionLP wants to merge 1 commit into dotnet:main from BarionLP:barion-softmax-multiply



BarionLP commented Mar 6, 2026

System.Numerics.Tensors.TensorPrimitives.SoftMax:
We can replace the per-element divisions with multiplications by calculating 1 / expSum once and then multiplying each element by it.

I assume this will slightly change the output of SoftMax, but since floating-point results aren't guaranteed to be identical across platforms anyway, this should not matter.
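
To make the proposed change concrete, here is a minimal scalar sketch of both variants. It is not the vectorized code in TensorPrimitives; the helper names `SoftMaxDivide` and `SoftMaxMultiply` are made up for illustration, and the loops only assume the usual definition of softmax (exponentiate, sum, normalize).

```csharp
using System;

// Hypothetical scalar baseline: one division per element.
static void SoftMaxDivide(ReadOnlySpan<float> x, Span<float> destination)
{
    float expSum = 0f;
    for (int i = 0; i < x.Length; i++)
    {
        destination[i] = MathF.Exp(x[i]);
        expSum += destination[i];
    }

    for (int i = 0; i < x.Length; i++)
    {
        destination[i] /= expSum;
    }
}

// Hypothetical variant of the proposal: one division in total, then one multiplication per element.
static void SoftMaxMultiply(ReadOnlySpan<float> x, Span<float> destination)
{
    float expSum = 0f;
    for (int i = 0; i < x.Length; i++)
    {
        destination[i] = MathF.Exp(x[i]);
        expSum += destination[i];
    }

    float reciprocal = 1f / expSum;
    for (int i = 0; i < x.Length; i++)
    {
        destination[i] *= reciprocal;
    }
}

float[] input = { 1f, 2f, 3f, 4f };
float[] byDivide = new float[input.Length];
float[] byMultiply = new float[input.Length];

SoftMaxDivide(input, byDivide);
SoftMaxMultiply(input, byMultiply);

// Print both results side by side; they may differ in the last bit.
for (int i = 0; i < input.Length; i++)
{
    Console.WriteLine($"{byDivide[i]:R}  {byMultiply[i]:R}");
}
```

The two outputs should agree to within rounding error for ordinary inputs, which is the "slight change" mentioned above.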

Benchmark results

I assume these numbers largely depend on the hardware.

| Method  | Count   | Mean         | Error       | StdDev      | Ratio | Allocated | Alloc Ratio |
|-------- |-------- |-------------:|------------:|------------:|------:|----------:|------------:|
| BuiltIn | 1000    |     242.2 ns |     0.65 ns |     0.61 ns |  1.00 |         - |          NA |
| Mine    | 1000    |     228.1 ns |     0.06 ns |     0.05 ns |  0.94 |         - |          NA |
|         |         |              |             |             |       |           |             |
| BuiltIn | 1000000 | 309,299.3 ns |   736.81 ns |   689.21 ns |  1.00 |         - |          NA |
| Mine    | 1000000 | 301,680.0 ns | 1,214.74 ns | 1,136.27 ns |  0.98 |         - |          NA |

Benchmark code: https://gist.github.com/BarionLP/aff1bca0d507dfb16f52bb715e3a58a2

Late follow-up to #111615.

dotnet-policy-service bot added the community-contribution label Mar 6, 2026
@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.


BarionLP commented Mar 7, 2026

I think the build failures are unrelated?


BarionLP commented Mar 7, 2026

Would it make sense to change the implementation of Divide too?

    public static void Divide(ReadOnlySpan<float> x, float y, Span<float> destination) =>
        InvokeSpanScalarIntoSpan<MultiplyOperator_Single>(x, 1 / y, destination);

Is there a reason SoftMax calls InvokeSpanScalarIntoSpan directly instead of using Divide?
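
One thing to check before changing Divide: x / y and x * (1 / y) do not always round to the same float. The sketch below counts how often they differ for a given divisor. `CountMismatches` is a hypothetical helper, not part of TensorPrimitives, and the input range is arbitrary.

```csharp
using System;

// Hypothetical helper: counts the elements for which x[i] / y and
// x[i] * (1 / y) produce different float results.
static int CountMismatches(ReadOnlySpan<float> x, float y)
{
    float reciprocal = 1f / y;
    int mismatches = 0;
    for (int i = 0; i < x.Length; i++)
    {
        if (x[i] / y != x[i] * reciprocal)
        {
            mismatches++;
        }
    }
    return mismatches;
}

// Arbitrary test data: the integers 1..1000 as floats.
float[] values = new float[1000];
for (int i = 0; i < values.Length; i++)
{
    values[i] = i + 1;
}

int differingForThree = CountMismatches(values, 3f);
int differingForTwo = CountMismatches(values, 2f);

Console.WriteLine($"y = 3: {differingForThree} of {values.Length} results differ");
Console.WriteLine($"y = 2: {differingForTwo} of {values.Length} results differ");
```

Dividing by a power of two is unaffected because its reciprocal is exact; for a divisor like 3 the reciprocal is already rounded, so some products land on a neighboring float. A general-purpose Divide would therefore change observable results, whereas SoftMax may be able to tolerate it.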


BarionLP commented Mar 7, 2026

I just realized that if x and y are very large, the difference between x / y and x * (1/y) can be quite big (x * (1/y) could even become zero), so maybe this is not a good idea after all.
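
A small sketch of the edge cases, assuming default IEEE 754 behavior with subnormals enabled (on hardware that flushes subnormals to zero, the first product would be zero instead). The second case, a very small divisor, is not mentioned above but fails for the mirror-image reason: the reciprocal overflows.

```csharp
using System;

// Very large divisor: 1 / y falls into the subnormal range and loses precision.
float big = float.MaxValue;
float bigQuotient = big / big;
float bigReciprocal = 1f / big;
float bigProduct = big * bigReciprocal;

Console.WriteLine($"big / big       = {bigQuotient:R}");
Console.WriteLine($"big * (1 / big) = {bigProduct:R}");
Console.WriteLine($"1 / big is subnormal: {float.IsSubnormal(bigReciprocal)}");

// Very small divisor: 1 / y overflows to infinity, so the product is infinite as well.
float tiny = float.Epsilon;
float tinyQuotient = tiny / tiny;
float tinyReciprocal = 1f / tiny;
float tinyProduct = tiny * tinyReciprocal;

Console.WriteLine($"tiny / tiny       = {tinyQuotient:R}");
Console.WriteLine($"tiny * (1 / tiny) = {tinyProduct:R}");
```

For SoftMax, this matters only if expSum itself reaches such extremes, for example when every input is so negative that the exponentials underflow.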


Labels

area-System.Numerics.Tensors, community-contribution
