
[Random, Alignment] Refactor np.random.* to match NumPy parity #600

@Nucs


Overview

Major refactor of NumSharp's random sampling module to achieve 1-to-1 parity with NumPy 2.x. This includes replacing the RNG engine, implementing 25 missing distributions, and standardizing the API surface.

Problem

NumSharp's np.random.* had several critical misalignments with NumPy:

  1. Different RNG algorithm — NumSharp used .NET's Subtractive Generator (Knuth), while NumPy uses Mersenne Twister (MT19937). Same seed produced completely different sequences.

  2. Missing distributions — Only ~10 of NumPy's 35+ distributions were implemented.

  3. Inconsistent API — Size parameters, scalar returns, and validation differed from NumPy behavior.

  4. No long indexing — Size parameters used int instead of long, limiting array sizes.

Proposal

RNG Engine Replacement

  • Implement MT19937 (Mersenne Twister) to match NumPy's random state
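  • Add NextGaussian() for normal sampling via the polar form of the Box-Muller transform, which is what NumPy's legacy sampler uses (the second value of each pair is cached for the next call)
  • Verify seed compatibility: after np.random.seed(42), NumSharp produces the same sequence as NumPy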
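A minimal Python sketch of the engine described above, assuming the reference MT19937 algorithm (init_genrand seeding, 624-word twist, standard tempering), the 53-bit double built from two 32-bit draws that NumPy's legacy RandomState uses for rand(), and the polar Box-Muller step behind randn(). The class and method names (MT19937, next_uint32, next_double, next_gaussian) are illustrative only, not NumSharp's C# API.

```python
import math
from typing import List


class MT19937:
    """Sketch of MT19937 seeded the way NumPy's legacy np.random.seed(int) does."""

    N, M = 624, 397
    MATRIX_A = 0x9908B0DF
    UPPER_MASK = 0x80000000
    LOWER_MASK = 0x7FFFFFFF

    def __init__(self, seed: int) -> None:
        # init_genrand from the reference implementation
        self.mt: List[int] = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N       # forces a twist before the first draw
        self.has_gauss = False    # cached second value from the polar method
        self.gauss = 0.0

    def _twist(self) -> None:
        # In-place regeneration of all 624 state words
        for i in range(self.N):
            y = (self.mt[i] & self.UPPER_MASK) | (self.mt[(i + 1) % self.N] & self.LOWER_MASK)
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ (y >> 1) ^ (self.MATRIX_A if y & 1 else 0)
        self.index = 0

    def next_uint32(self) -> int:
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF

    def next_double(self) -> float:
        # 27 + 26 random bits combined into a double in [0, 1) (genrand_res53)
        a = self.next_uint32() >> 5
        b = self.next_uint32() >> 6
        return (a * 67108864.0 + b) / 9007199254740992.0

    def next_gaussian(self) -> float:
        # Polar (Marsaglia) Box-Muller: returns f * x2 now, caches f * x1
        if self.has_gauss:
            self.has_gauss = False
            return self.gauss
        while True:
            x1 = 2.0 * self.next_double() - 1.0
            x2 = 2.0 * self.next_double() - 1.0
            r2 = x1 * x1 + x2 * x2
            if 0.0 < r2 < 1.0:
                break
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        self.gauss = f * x1
        self.has_gauss = True
        return f * x2
```

With seed 42, next_double() should reproduce the 0.3745401188 rand() value in the Evidence table, and the first next_gaussian() call the 0.4967141530 randn() value, which makes those two numbers a quick parity check for a port.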

Missing Distributions (25 new)

  • weibull, standard_cauchy, vonmises, f, logseries
  • standard_normal, standard_t, triangular
  • dirichlet, gumbel, hypergeometric, laplace, logistic
  • multinomial, multivariate_normal, negative_binomial
  • noncentral_chisquare, noncentral_f, pareto, power
  • rayleigh, standard_exponential, standard_gamma
  • wald, zipf
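Several of the continuous distributions above reduce to a transform of one uniform draw. The Python sketch below uses the standard inverse-CDF formulas; NumPy's legacy standard_exponential is -log(1 - U) and its pareto is the Lomax (Pareto II) form, but each formula should still be checked against NumPy's legacy C sources before relying on bit-for-bit parity. The function names and the injected `uniform` callable (standing in for the MT19937 double output) are assumptions for illustration.

```python
import math
from typing import Callable

# Any source of doubles in [0, 1); in a port this would be the MT19937 res53 output.
Uniform = Callable[[], float]


def _open_uniform(uniform: Uniform) -> float:
    # Redraw until u is in (0, 1) so log(u) and log(1 - u) are finite
    u = uniform()
    while u == 0.0:
        u = uniform()
    return u


def standard_exponential(uniform: Uniform) -> float:
    # Inverse CDF of Exp(1); finite because U < 1
    return -math.log(1.0 - uniform())


def weibull(uniform: Uniform, a: float) -> float:
    # E ** (1/a) ~ Weibull(a) when E ~ Exp(1); assumes a > 0
    return standard_exponential(uniform) ** (1.0 / a)


def pareto(uniform: Uniform, a: float) -> float:
    # Lomax (Pareto II): exp(E / a) - 1
    return math.exp(standard_exponential(uniform) / a) - 1.0


def gumbel(uniform: Uniform, loc: float = 0.0, scale: float = 1.0) -> float:
    # Inverse CDF: loc - scale * log(-log(U)), rejecting U == 1
    u = 1.0 - uniform()
    while u >= 1.0:
        u = 1.0 - uniform()
    return loc - scale * math.log(-math.log(u))


def laplace(uniform: Uniform, loc: float = 0.0, scale: float = 1.0) -> float:
    # Piecewise inverse CDF on either side of the median
    u = _open_uniform(uniform)
    if u >= 0.5:
        return loc - scale * math.log(2.0 - u - u)
    return loc + scale * math.log(u + u)


def logistic(uniform: Uniform, loc: float = 0.0, scale: float = 1.0) -> float:
    # Inverse CDF: loc + scale * log(u / (1 - u))
    u = _open_uniform(uniform)
    return loc + scale * math.log(u / (1.0 - u))


def rayleigh(uniform: Uniform, scale: float = 1.0) -> float:
    # scale * sqrt(2E) has CDF 1 - exp(-x^2 / (2 scale^2))
    return scale * math.sqrt(2.0 * standard_exponential(uniform))


def triangular(uniform: Uniform, left: float, mode: float, right: float) -> float:
    # Two-piece inverse CDF split at F(mode) = (mode - left) / (right - left)
    base = right - left
    u = uniform()
    if u <= (mode - left) / base:
        return left + math.sqrt(u * (mode - left) * base)
    return right - math.sqrt((1.0 - u) * (right - mode) * base)
```

Injecting the uniform source keeps each distribution testable with fixed inputs, independent of the generator.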

API Standardization

  • Migrate all distributions to UnmanagedMemoryBlock
  • Standardize size parameter overloads: int[] + params long[]
  • Align scalar returns (0D arrays) with NumPy
  • Fix size/axis/seed validation to match NumPy
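The size rules being matched can be sketched in Python: None maps to an empty shape (the 0-D/scalar case), an integer n to the 1-D shape (n,), a sequence to a tuple of dimensions, and negative dimensions raise ValueError. The element count also shows why 64-bit sizes matter: 50000 x 50000 already exceeds int.MaxValue. normalize_size and element_count are hypothetical helpers, not NumSharp or NumPy functions.

```python
import math
import operator
from typing import Sequence, Tuple, Union

SizeArg = Union[None, int, Sequence[int]]

INT32_MAX = 2**31 - 1  # int.MaxValue in .NET


def normalize_size(size: SizeArg) -> Tuple[int, ...]:
    """Map a NumPy-style `size` argument to a shape tuple (hypothetical helper)."""
    if size is None:
        return ()  # scalar / 0-D result
    try:
        dims: Tuple[int, ...] = (operator.index(size),)  # a single integer
    except TypeError:
        dims = tuple(operator.index(d) for d in size)  # a sequence of integers
    if any(d < 0 for d in dims):
        raise ValueError("negative dimensions are not allowed")
    return dims


def element_count(dims: Tuple[int, ...]) -> int:
    # Python ints do not overflow; in C# this product needs a long
    return math.prod(dims)
```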

Multivariate Normal Fix

  • Implement SVD-based algorithm for covariance decomposition
  • Match NumPy's eigenvector sign convention
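As we read NumPy's legacy source, multivariate_normal factors the covariance with (u, s, v) = svd(cov) and maps a vector z of standard normals to mean + z @ (np.sqrt(s)[:, None] * v). For a symmetric positive semi-definite covariance the SVD coincides with an eigen-decomposition, so the dependency-free Python sketch below swaps in cyclic Jacobi rotations for LAPACK's SVD. It reproduces the factor's defining property (factor^T @ factor == cov) but not LAPACK's eigenvalue order or eigenvector signs, which are what the sign-convention work must match for identical samples. jacobi_eigh, mvn_factor and mvn_transform are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def jacobi_eigh(a: Sequence[Sequence[float]], sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Eigen-decomposition of a symmetric matrix; eigenvectors are the columns of v."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def mvn_factor(cov: Sequence[Sequence[float]]) -> Matrix:
    # Row i is sqrt(lambda_i) * e_i^T, the analogue of np.sqrt(s)[:, None] * v
    lam, v = jacobi_eigh(cov)
    n = len(lam)
    return [[math.sqrt(max(lam[i], 0.0)) * v[j][i] for j in range(n)] for i in range(n)]


def mvn_transform(mean: Sequence[float], factor: Matrix, z: Sequence[float]) -> List[float]:
    # x = mean + z @ factor, where z holds independent standard normals
    n = len(mean)
    return [mean[j] + sum(z[i] * factor[i][j] for i in range(n)) for j in range(n)]
```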

Evidence

Before (seed=42, first value):

| Function | NumPy | NumSharp (old) |
|---|---|---|
| rand() | 0.3745401188 | 0.6681064659 ❌ |
| randn() | 0.4967141530 | Different ❌ |
| randint(0,10) | 6 | Different ❌ |

After:
All functions produce the same sequences as NumPy for a given seed.

Scope / Non-goals

In scope:

  • Legacy np.random.* API (RandomState-style)
  • All distributions available in NumPy 2.x legacy API

Not in scope:

  • NumPy's new Generator API (np.random.default_rng())
  • BitGenerator architecture
  • PCG64/Philox/SFC64 generators

Breaking Changes

| Change | Impact | Migration |
|---|---|---|
| RNG sequences differ | Code relying on specific random values with fixed seeds | Re-generate expected values or remove seed-dependent assertions |
| params int[] removed | rand(5, 10) no longer compiles | Use rand(5L, 10L) or rand(new Shape(5, 10)) |
| Scalar returns are 0D | rand().GetDouble() → rand().GetDouble(0) | Add index [0] or use .GetValue() |

Benchmark / Performance

  • MT19937 performance is comparable to .NET's RNG
  • UnmanagedMemoryBlock migration eliminates managed array allocations
  • No measurable regression in distribution sampling

Related Issues

  • Depends on longindexing branch for long array indexing support
  • Enables future work on Generator API (NumPy 1.17+ style)

Summary

  • 49 commits, 100 files changed, +20,930 / -666 lines
  • 25 new distributions implemented
  • MT19937 RNG engine for NumPy seed compatibility
  • Comprehensive test coverage
  • Documentation: NUMPY_RANDOM.md, RANDOM_MIGRATION_PLAN.md
