# FIRSTRank

We now demonstrate how to use `FIRSTRank` for factor importance ranking. If you have not installed `pyfirst`, please uncomment and run `%pip install pyfirst` below before proceeding. 

In [1]:
# %pip install pyfirst

## Imports

In [2]:
import numpy as np
from pyfirst import FIRSTRank

## Simulate Data

We simulate clean data from the Ishigami function 

$$
    y = f(X) = \sin(X_{1}) + 7\sin^2(X_{2}) + 0.1X_{3}^{4}\sin(X_{1}),
$$

where the inputs $X = (X_{1}, X_{2}, X_{3})$ are independent and uniformly distributed on $[-\pi,\pi]^{3}$.

In [3]:
def ishigami(x):
    # Map a point from the unit cube [0, 1]^3 to [-pi, pi]^3
    x = -np.pi + 2 * np.pi * x
    y = np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])
    return y

np.random.seed(43)
n = 10000
p = 3
X = np.random.uniform(size=(n,p))
y = np.apply_along_axis(ishigami, 1, X)
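
`np.apply_along_axis` calls the function once per row in Python, which can be slow for large `n`. As a minimal sketch, the same function can be evaluated on the whole array at once. The name `ishigami_vectorized` and the small demo array `X_demo` are illustrative and not part of `pyfirst`.

```python
import numpy as np

def ishigami(x):
    # Row-wise version, identical to the cell above
    x = -np.pi + 2 * np.pi * x
    return np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])

def ishigami_vectorized(X01):
    # X01 has shape (n, 3) with entries in [0, 1]; hypothetical helper name
    Z = -np.pi + 2 * np.pi * X01
    return np.sin(Z[:, 0]) + 7 * np.sin(Z[:, 1])**2 + 0.1 * Z[:, 2]**4 * np.sin(Z[:, 0])

# Local generator so the notebook's global random state is left untouched
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(1000, 3))

y_loop = np.apply_along_axis(ishigami, 1, X_demo)
y_vec = ishigami_vectorized(X_demo)
print(np.allclose(y_loop, y_vec))
```

Both versions should agree to floating-point precision, so either can be used to generate `y`.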

## Run FIRSTRank

In [5]:
FIRSTRank(X, y, noise=False)

{'ranking': array([1, 0, 2]),
 'explained_variance': array([0.44777553, 0.75116322, 1.        ])}

The factors are ranked, in decreasing order of importance, as factor 1, factor 0, and factor 2 (zero-based column indices of `X`). Factor 1 alone explains about 45% of the model variance, factors 1 and 0 together explain about 75%, and all three factors together explain all of it.
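
These numbers can be compared against the known closed-form variance decomposition of the Ishigami function. The sketch below assumes that the reported explained variance of a factor subset $u$ corresponds to $\mathrm{Var}(E[y \mid X_u]) / \mathrm{Var}(y)$, and that factor 0 is $X_{1}$ and factor 1 is $X_{2}$ (zero-based columns). The variable names are illustrative.

```python
import numpy as np

# Ishigami coefficients used in this notebook: y = sin(X1) + a sin^2(X2) + b X3^4 sin(X1)
a, b = 7.0, 0.1

# Closed-form total variance and first-order partial variances
total_var = 0.5 + a**2 / 8 + b * np.pi**4 / 5 + b**2 * np.pi**8 / 18
v1 = 0.5 * (1 + b * np.pi**4 / 5) ** 2   # variance of E[y | X1]
v2 = a**2 / 8                            # variance of E[y | X2]

# X1 and X2 do not interact, so Var(E[y | X1, X2]) = v1 + v2
explained = np.array([v2, v1 + v2, total_var]) / total_var
print(np.round(explained, 4))
```

The closed-form values are approximately 0.442, 0.756, and 1, which is close to the estimates 0.448, 0.751, and 1 returned by `FIRSTRank` above.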

## Noisy Data

We now examine the estimation performance on noisy data $y = f(X) + \epsilon$, where $\epsilon\sim\mathcal{N}(0,1)$ is the random error. For noisy data, `FIRSTRank` implements the noise-adjusted nearest-neighbor estimator of Huang and Joseph (2025), which corrects the bias that the nearest-neighbor estimator of Broto et al. (2020) incurs when applied to noisy data.
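
To see where the bias comes from, the following sketch uses a simplified one-dimensional stand-in for a nearest-neighbor estimator: it sorts the data by a single factor and uses squared differences between adjacent responses to estimate the residual variance. This is not the estimator implemented in `pyfirst`, and the helper names are hypothetical. With noise, the residual variance absorbs the noise variance, so the unadjusted estimate of explained variance shrinks.

```python
import numpy as np

def ishigami_rows(X01):
    # Ishigami function on inputs in [0, 1]^3, evaluated for all rows at once
    Z = -np.pi + 2 * np.pi * X01
    return np.sin(Z[:, 0]) + 7 * np.sin(Z[:, 1])**2 + 0.1 * Z[:, 2]**4 * np.sin(Z[:, 0])

def explained_by_one_factor(x, y):
    # Hypothetical helper: adjacent points in sorted order act as nearest neighbors in x.
    # Half the mean squared difference of their responses estimates E[Var(y | x)].
    order = np.argsort(x)
    residual_var = 0.5 * np.mean(np.diff(y[order]) ** 2)
    return 1.0 - residual_var / np.var(y)

rng = np.random.default_rng(0)
n = 10000
X_demo = rng.uniform(size=(n, 3))
f_demo = ishigami_rows(X_demo)
y_demo = f_demo + rng.normal(size=n)   # unit-variance Gaussian noise

clean = explained_by_one_factor(X_demo[:, 1], f_demo)   # factor 1, no noise
naive = explained_by_one_factor(X_demo[:, 1], y_demo)   # factor 1, noisy, unadjusted
print(clean, naive)
```

Using the closed-form Ishigami variances, the clean estimate targets about 0.442, while the unadjusted noisy estimate targets about 0.413, because the unit noise variance is added to both the residual and the total variance.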

In [6]:
np.random.seed(43)
n = 10000
p = 3
X = np.random.uniform(size=(n,p))
y = np.apply_along_axis(ishigami, 1, X) + np.random.normal(size=n)
# Append two constant columns that play the role of unimportant factors 3 and 4
X = np.hstack([X, np.zeros((n, 2))])

FIRSTRank(X, y, noise=True)

{'ranking': array([1, 0, 2, 3, 4]),
 'explained_variance': array([0.36974645, 0.67381678, 1.        , 1.        , 1.        ])}

Factors 3 and 4 are unimportant, so we expect factors 0, 1, and 2 to explain all of the model variance, which is what the output shows.

For more details about `FIRSTRank`, please see Huang and Joseph (2025).
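
As a small sketch of how the returned dictionary might be used downstream, the helper below picks the shortest prefix of the ranking whose explained variance reaches a chosen threshold. The function `factors_needed` and its `threshold` argument are hypothetical and not part of `pyfirst`. The dictionary values are copied from the output above.

```python
import numpy as np

# Values copied from the noisy-data output above
result = {
    "ranking": np.array([1, 0, 2, 3, 4]),
    "explained_variance": np.array([0.36974645, 0.67381678, 1.0, 1.0, 1.0]),
}

def factors_needed(result, threshold=0.99):
    # Hypothetical helper: shortest ranking prefix reaching the threshold
    ev = np.asarray(result["explained_variance"])
    reached = ev >= threshold
    if not reached.any():
        # Threshold never reached: keep every ranked factor
        return result["ranking"]
    k = int(np.argmax(reached)) + 1   # index of first True, plus one
    return result["ranking"][:k]

selected = factors_needed(result)
print(selected)
```

With a threshold of 0.99, this keeps factors 1, 0, and 2 and drops the two constant columns.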

## References

Huang, C., & Joseph, V. R. (2025). Factor Importance Ranking and Selection using Total Indices. Technometrics.

Sobol', I. M. (2001). Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3), 271-280.

Broto, B., Bachoc, F., & Depecker, M. (2020). Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 693-716.

Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P. E., Lomeli, M., Hosseini, L., & Jégou, H. (2024). The Faiss library. arXiv preprint arXiv:2401.08281.

Vakayil, A., & Joseph, V. R. (2022). Data twinning. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(5), 598-610.