## Overview
Add SIMD optimization for `np.where(condition, x, y)` using `ILKernelGenerator` to improve performance for contiguous arrays.
## Problem
The current `np.where(condition, x, y)` implementation uses NDIterator-based sequential access for all cases. For large contiguous arrays this is significantly slower than SIMD-optimized code; NumPy itself uses vectorized operations internally.
## Proposal
Add a SIMD fast path using `Vector256.ConditionalSelect` while keeping the iterator fallback for non-contiguous arrays.
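To make the mechanism concrete, here is a minimal standalone sketch (not NumSharp code) of how `Vector256.ConditionalSelect` blends two vectors lane-wise: a lane mask of all ones (`-1` for `int`) selects from `x`, all zeros selects from `y`.

```csharp
using System;
using System.Runtime.Intrinsics;

class ConditionalSelectDemo
{
    static void Main()
    {
        // Per-bit select: result = (mask & x) | (~mask & y).
        // An int lane of -1 (all bits set) picks x; 0 picks y.
        var mask = Vector256.Create(-1, 0, -1, 0, -1, 0, -1, 0);
        var x    = Vector256.Create(10, 11, 12, 13, 14, 15, 16, 17);
        var y    = Vector256.Create(20, 21, 22, 23, 24, 25, 26, 27);

        var r = Vector256.ConditionalSelect(mask, x, y);
        Console.WriteLine(r); // lanes: 10, 21, 12, 23, 14, 25, 16, 27
    }
}
```

This is the primitive the fast path builds on; the remaining work is producing the lane mask from the `bool[]` condition, covered below.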
## Implementation
- `ILKernelGenerator.Where.cs` with SIMD helpers
- `np.where.cs` to dispatch to the SIMD path when eligible
### Dtype Support
All 12 NumSharp types are supported:
| Type | Path | Reason |
|------|------|--------|
| Boolean, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double | SIMD | 1-8 byte types, vectorizable |
| Decimal | Iterator | 16 bytes, not vectorizable |
### SIMD Eligibility Criteria

```csharp
bool canSimd = ILKernelGenerator.Enabled &&
               outType != NPTypeCode.Decimal &&
               cond.typecode == NPTypeCode.Boolean &&
               cond.Shape.IsContiguous &&
               xArr.Shape.IsContiguous &&
               yArr.Shape.IsContiguous;
```
### Bool Mask Expansion Challenge
The condition array is `bool[]` (1 byte per element), but x/y can be any dtype (1-8 bytes):
| Type | Element Size | V256 Elements | Bools to Load |
|------|--------------|---------------|---------------|
| byte | 1 | 32 | 32 |
| int/float | 4 | 8 | 8 |
| long/double | 8 | 4 | 4 |
Solution: load N bools, expand them to an N-element lane mask, then `ConditionalSelect`.
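A hedged sketch of that expansion for 4-byte elements (the `Expand8` helper name and the scalar widening loop are illustrative; a generated kernel would likely use widening and compare intrinsics instead). It loads 8 condition bytes, widens each to `int`, then compares against zero so every true lane becomes all ones, the form `ConditionalSelect` expects:

```csharp
using System;
using System.Runtime.Intrinsics;

static class MaskExpandSketch
{
    // Expand 8 one-byte bools (as bytes: 0 or 1) into an 8-lane int32 mask
    // of 0 / -1, usable as the selector for Vector256.ConditionalSelect.
    public static Vector256<int> Expand8(ReadOnlySpan<byte> condBytes, int offset)
    {
        Span<int> widened = stackalloc int[8];
        for (int j = 0; j < 8; j++)
            widened[j] = condBytes[offset + j];       // widen byte -> int32

        var v = Vector256.Create<int>(widened);        // requires .NET 8+
        // lane == 0 -> 0 after Equals gives -1, so invert: nonzero -> -1
        return ~Vector256.Equals(v, Vector256<int>.Zero);
    }
}
```

For `byte` elements the same idea loads 32 bools directly; for `long`/`double` it loads 4 and widens to 8-byte lanes, matching the table above.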
## Evidence
Implemented in commit 3162df0c. All 83 tests pass:
- 36 existing `np.where` tests
- 21 battle tests
- 26 new SIMD correctness tests
## Scope / Non-goals
- Broadcast arrays: use the iterator path (stride=0 is not contiguous)
- Non-bool conditions: use the iterator path (they need truthiness conversion)
## Related Files
- `src/NumSharp.Core/Backends/Kernels/ILKernelGenerator.Where.cs`
- `src/NumSharp.Core/APIs/np.where.cs`
- `test/NumSharp.UnitTest/Backends/Kernels/WhereSimdTests.cs`