# NumPy Tips and Tricks

## Achieve Reproducibility with `np.random.RandomState()`

Reproducibility in Data Science projects is key.

For larger projects, use `numpy.random.RandomState()` to construct a random number generator.

Using `numpy.random.seed()` sets the global random seed, which affects every use of the `numpy.random.*` module.

Imported packages or other modules can reset the global random seed to a different value.

This can result in undesirable and unreproducible results across your project.

With `numpy.random.RandomState()`, you no longer rely on the global random state (which could be reset by other code).

It's a subtle, but important step to achieve reproducibility.

In [None]:
import numpy as np

rng = np.random.RandomState(1234)

print(rng.rand(3))
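
To make the difference concrete, here is a minimal sketch of the failure mode described above. `third_party_helper` is a hypothetical stand-in for any imported code that reseeds the global state; it is not a real library function.

```python
import numpy as np


def third_party_helper():
    # Hypothetical stand-in for an imported package that reseeds the global state.
    np.random.seed(0)


# Global seeding: draw three numbers with and without the helper running first.
np.random.seed(1234)
expected_global = np.random.rand(3)

np.random.seed(1234)
third_party_helper()  # silently replaces our seed
actual_global = np.random.rand(3)

# Local generator: the helper only touches the global state, not `rng`.
rng = np.random.RandomState(1234)
expected_local = rng.rand(3)

rng = np.random.RandomState(1234)
third_party_helper()
actual_local = rng.rand(3)

print("global stream reproducible:", np.array_equal(expected_global, actual_global))
print("local stream reproducible: ", np.array_equal(expected_local, actual_local))
```

The global draws differ once the helper has run, while the draws from `rng` stay identical. The same isolation holds for `np.random.default_rng(seed)`, which NumPy's documentation recommends for new code.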

## Fast Alternative to NumPy

I just stumbled upon a fast alternative to NumPy.

With **NumExpr** I transformed a sluggish 650 ms NumPy expression into a 60 ms calculation, and that was on a single thread!

Here’s how NumExpr turbocharges your array computations:

- **Chunked, in-cache execution**: Avoids building giant temporaries by splitting arrays into cache-sized blocks and streaming them through a lightweight virtual machine.
- **SIMD & VML acceleration**: Leverages single-instruction-multiple-data instructions and, when available, the Vector Math Library (VML) from Intel's Math Kernel Library for transcendental functions.
- **True multi-core scaling**: Automatically farms out chunks to all your CPU cores, delivering up to 5×–15× speed-ups on complex expressions.
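
The chunking idea from the first bullet can be sketched in plain NumPy. This is only an illustration of the principle, assuming a 1-D array and an arbitrary `block_size` of 4096 elements; `poly_chunked` is a hypothetical helper, not part of NumExpr, and it does not reproduce NumExpr's compiled virtual machine.

```python
import numpy as np


def poly_chunked(x, block_size=4096):
    """Evaluate 0.25*x**3 + 0.75*x**2 + 1.5*x - 2 one block at a time.

    Intermediate arrays are only `block_size` long, so they can stay in the
    CPU cache instead of being full-size temporaries.
    """
    out = np.empty_like(x)
    for start in range(0, x.size, block_size):
        xb = x[start:start + block_size]    # view into the input, no copy
        ob = out[start:start + block_size]  # view into the result buffer
        ob[:] = 0.25 * xb**3 + 0.75 * xb**2 + 1.5 * xb - 2
    return out


x = np.linspace(-1, 1, 100_001)
full = 0.25 * x**3 + 0.75 * x**2 + 1.5 * x - 2  # builds full-size temporaries
print(np.allclose(poly_chunked(x), full))
```

In pure Python the loop overhead can eat most of the benefit; NumExpr runs the per-block work in compiled code, which is where the speed-up comes from.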

In [None]:
!pip install numexpr

In [None]:
import numpy as np
import numexpr as ne

x = np.linspace(-1, 1, int(1e7))
expr = "0.25*x**3 + 0.75*x**2 + 1.5*x - 2"

# NumPy: ~650 ms (eval runs the string as an ordinary NumPy expression)
%timeit -n10 eval(expr)

# NumExpr @1 thread: ~60 ms  
ne.set_num_threads(1)
%timeit -n10 ne.evaluate(expr)