# NIPALS in Chemometrics

## Introduction
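NIPALS (Non-linear Iterative Partial Least Squares) is an iterative algorithm, widely used in chemometrics, that computes principal components (or PLS latent variables) one at a time. Because it extracts only as many components as are requested, it is well suited to large datasets with many variables. The models built on these components are particularly useful for multicollinear data, where ordinary least squares regression becomes unstable because the predictor cross-product matrix is nearly singular.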

## Why Use NIPALS?
In chemometrics, datasets often contain highly collinear variables, making it challenging to extract meaningful information. NIPALS helps by reducing the dimensionality of the data while preserving the essential information. This makes it easier to interpret the results and build predictive models.
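The collinearity problem is easy to reproduce. The sketch below uses a hypothetical two-variable dataset in which the second variable is an almost exact copy of the first (the sample size, noise level and seed are arbitrary choices, not real measurements). It compares the condition number of $\mathbf{X}^\top\mathbf{X}$, the matrix that ordinary least squares has to invert, with the share of variance carried by each principal component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x2 is x1 plus a tiny perturbation, mimicking two
# neighbouring wavelengths in a spectrum that carry almost the same signal.
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)
Xc = np.column_stack([x1, x2])
Xc = Xc - Xc.mean(axis=0)                # center, as NIPALS does

XtX = Xc.T @ Xc
cond = np.linalg.cond(XtX)               # OLS must invert this matrix
eigvals = np.linalg.eigvalsh(XtX)[::-1]  # eigenvalues, largest first
explained = eigvals / eigvals.sum()      # variance share of each principal component

print(f"condition number of X^T X: {cond:.1e}")
print("variance explained by PC1, PC2:", explained)
```

A very large condition number means the OLS coefficients are extremely sensitive to noise, while a single component carries essentially all of the variance. Regressing on that component instead of on both raw variables is the idea behind PCR and PLS.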

## How NIPALS Works
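NIPALS extracts principal components from the data matrix one at a time. For each component, an inner loop alternates between estimating the loading vector and the score vector until the scores stop changing (within a specified tolerance). The component is then removed from the data (deflation), and the next one is computed from the residual matrix. Each new component explains the maximum variance remaining in the data, and the algorithm stops once the desired number of components has been extracted.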
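Why each component captures the maximum variance follows from the update rules. Writing $\mathbf{X}$ for the centered (and, for later components, deflated) data, $\mathbf{t}$ for the score vector and $\mathbf{p}$ for the unit-norm loading vector, one inner iteration is (dividing by $\mathbf{t}^\top\mathbf{t}$ before normalizing gives the same unit vector):

$$
\mathbf{p} \leftarrow \frac{\mathbf{X}^\top \mathbf{t}}{\lVert \mathbf{X}^\top \mathbf{t} \rVert},
\qquad
\mathbf{t} \leftarrow \mathbf{X}\mathbf{p}.
$$

At a fixed point, substituting the second update into the first gives

$$
\mathbf{X}^\top \mathbf{X}\,\mathbf{p} = \lambda\,\mathbf{p},
\qquad
\lambda = \lVert \mathbf{X}^\top \mathbf{t} \rVert = \mathbf{t}^\top \mathbf{t} = \mathbf{p}^\top \mathbf{X}^\top \mathbf{X}\,\mathbf{p},
$$

so $\mathbf{p}$ is an eigenvector of $\mathbf{X}^\top\mathbf{X}$ and the sum of squared scores equals its eigenvalue. The alternation is a power iteration: assuming the largest eigenvalue is distinct and the start vector is not orthogonal to its eigenvector, it converges to the direction of maximum variance. Deflation then removes exactly that direction,

$$
\mathbf{X} - \mathbf{t}\mathbf{p}^\top = \mathbf{X}\left(\mathbf{I} - \mathbf{p}\mathbf{p}^\top\right),
$$

so the next component is the dominant eigenvector of what remains.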

### Steps:
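1. **Center the Data**: Subtract the mean of each variable (column) from the dataset.
2. **Initialize**: Choose an initial estimate for the score vector $\mathbf{t}$, for example one column of the centered data.
3. **Iterate**: 
    - Calculate the loading vector: $\mathbf{p} = \mathbf{X}^\top \mathbf{t} / (\mathbf{t}^\top \mathbf{t})$.
    - Normalize the loading vector to unit length.
    - Update the score vector: $\mathbf{t} = \mathbf{X}\mathbf{p}$.
    - Check for convergence: stop when $\mathbf{t}$ changes by less than the tolerance; otherwise repeat step 3.
4. **Deflate**: Subtract the outer product of the score and loading vectors from the data matrix, $\mathbf{X} \leftarrow \mathbf{X} - \mathbf{t}\mathbf{p}^\top$.
5. **Repeat**: Go back to step 2 with the deflated matrix until the desired number of components is extracted.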
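The five steps can be sketched as a compact NumPy function. This is an illustrative variant, not the notebook's own `nipals` function: the helper name `nipals_pca`, the relative convergence test, the `max_iter` cap and the choice of the highest-variance column as the starting score are assumptions made here for robustness.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Hypothetical helper: NIPALS PCA following steps 1 to 5."""
    # Step 1: center each variable (column)
    E = X - X.mean(axis=0)
    n_samples, n_features = E.shape
    T = np.zeros((n_samples, n_components))
    P = np.zeros((n_features, n_components))

    for a in range(n_components):
        # Step 2: start from the residual column with the largest variance
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            # Step 3: loading vector, normalization, new score, convergence test
            p = E.T @ t / (t @ t)
            p /= np.linalg.norm(p)
            t_new = E @ p
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        # Step 4: deflate so the next component is found in what is left
        E = E - np.outer(t, p)
        T[:, a], P[:, a] = t, p
    # Step 5 is the outer loop; E holds the residual after the last component
    return T, P, E


# Usage on hypothetical data with clearly different column variances
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])
T, P, E = nipals_pca(X, n_components=3)
print(np.round(T.T @ T, 6))  # off-diagonal entries should be close to zero
```

Storing the score computed from the final loading ($\mathbf{t} = \mathbf{X}\mathbf{p}$) makes the deflation an exact projection, which keeps the loadings orthonormal to machine precision.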

## Applications in PLS and PCR
NIPALS is a foundational algorithm for more advanced techniques like Partial Least Squares (PLS) and Principal Component Regression (PCR).

### Partial Least Squares (PLS)
PLS is used to find the relationship between two matrices, typically the predictor and response matrices. NIPALS is employed to extract latent variables that maximize the covariance between these matrices, making PLS a powerful tool for predictive modeling in chemometrics.
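A minimal PLS1 sketch (one response variable) shows where the NIPALS steps enter. With a single $\mathbf{y}$ the inner NIPALS loop converges in one pass, so no explicit iteration is needed; with a multi-column $\mathbf{Y}$ the weight and score updates would be repeated until the Y-score converges. The helper name `pls1_nipals` and the synthetic data are hypothetical; the coefficients are mapped back to the original variables with $\mathbf{B} = \mathbf{W}(\mathbf{P}^\top\mathbf{W})^{-1}\mathbf{q}$.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Hypothetical helper: PLS1 (single response) with NIPALS-style deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean              # centered X and y, deflated below
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # X weights
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings

    for a in range(n_components):
        w = E.T @ f                            # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                              # X scores (latent variable)
        tt = t @ t
        P[:, a] = E.T @ t / tt                 # X loading
        q[a] = f @ t / tt                      # y loading
        E = E - np.outer(t, P[:, a])           # deflate X
        f = f - q[a] * t                       # deflate y
        W[:, a] = w

    # Regression vector in terms of the original (centered) variables
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, y_mean - x_mean @ B


# Usage on hypothetical data
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=40)
B, b0 = pls1_nipals(X, y, n_components=2)
y_hat = X @ B + b0
print("coefficients:", np.round(B, 3))
```

With as many components as (linearly independent) predictors, PLS1 reproduces the ordinary least squares fit; the stabilizing effect appears when fewer components are kept.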

### Principal Component Regression (PCR)
PCR combines principal component analysis (PCA) with regression. NIPALS is used to extract principal components from the predictor matrix, which are then used in a regression model to predict the response variable. This approach helps in dealing with multicollinearity and improving model stability.
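PCR can be sketched in the same style: NIPALS extracts a few components from $\mathbf{X}$, least squares is fitted on the scores, and the result is mapped back to coefficients on the original variables. The helper name `pcr_nipals`, its tolerance settings and the two-factor synthetic data are assumptions for illustration only.

```python
import numpy as np


def pcr_nipals(X, y, n_components, tol=1e-10, max_iter=1000):
    """Hypothetical helper: PCR = NIPALS PCA on X, then least squares on the scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    E = Xc.copy()
    P = np.zeros((X.shape[1], n_components))

    for a in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):              # NIPALS inner loop
            p = E.T @ t                        # dividing by t't is unnecessary before normalizing
            p /= np.linalg.norm(p)
            t_new = E @ p
            done = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if done:
                break
        E = E - np.outer(t, p)                 # deflation
        P[:, a] = p

    T = Xc @ P                                 # scores: uncorrelated, well-conditioned regressors
    c = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]
    B = P @ c                                  # back to coefficients on the original variables
    return B, y_mean - x_mean @ B


# Usage on hypothetical, highly collinear data: 6 variables driven by 2 latent factors
rng = np.random.default_rng(3)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(60, 6))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=60)
B, b0 = pcr_nipals(X, y, n_components=2)
y_hat = X @ B + b0
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 with 2 components: {r2:.4f}")
```

Because the regression only uses the span of the retained loadings, the same coefficients are obtained from any other PCA routine (for example an SVD) that finds the same components.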

## Conclusion
NIPALS is a crucial algorithm in chemometrics, enabling the analysis of complex, multicollinear datasets. Its application in PLS and PCR further extends its utility, making it an indispensable tool for chemometricians.


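```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

def nipals(X, n_components=2, tolerance=1e-3, max_iter=1000):
    """NIPALS PCA: returns scores t (n_samples x n_components) and unit-norm loadings p."""
    # Center the data
    X_mean = np.mean(X, axis=0)
    X_centered = X - X_mean

    n_samples, n_features = X_centered.shape

    t = np.zeros((n_samples, n_components))
    p = np.zeros((n_features, n_components))

    for i in range(n_components):
        # Initialization: first column of the (deflated) data as the starting score
        t[:, i] = X_centered[:, 0]

        for _ in range(max_iter):  # cap iterations instead of looping forever
            # Loading vector: project the data onto the current score, then normalize
            p[:, i] = (X_centered.T @ t[:, i]) / (t[:, i] @ t[:, i])
            p[:, i] = p[:, i] / np.linalg.norm(p[:, i])

            # New score vector
            t_new = X_centered @ p[:, i]

            # Convergence check on the change in the score vector
            converged = np.linalg.norm(t_new - t[:, i]) < tolerance
            t[:, i] = t_new  # keep the score consistent with the final loading
            if converged:
                break

        # Deflation: remove the extracted component before computing the next one
        X_centered = X_centered - np.outer(t[:, i], p[:, i])

    return t, p


# Load data from the .mat file (path used in the original Colab session)
data = loadmat('/content/DataFish.mat')
X = data['X0']  # predictor matrix stored under the variable name 'X0'

# Example usage: 7 components, convergence tolerance 1e-3
t, p = nipals(X, 7, 1e-3)
```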



```python
# Inspect the first score vector so it can be compared with the MATLAB results
first_column = t[:, 0]
first_column
```

```
array([ 757.34144439,  760.40646968,  754.51228919,  755.26560324,
        755.73356958,  407.07249032,  367.5095935 ,  404.58030768,
        410.42993855,  401.86541564,  263.50709859,  242.4282857 ,
        261.59857718,  254.34817696,  230.7510114 , -460.75815958,
       -476.80570021, -468.02355881, -500.0843069 , -532.33884463,
       -939.46394397, -948.51905608, -951.18065803, -892.78363975,
       -857.39240367])
```

## Notes

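Here is the first score column computed in MATLAB (the first three values and the last one):  
757.341447524204  
760.406472122233  
754.512291631646  
...  
-857.392391561138  

These agree with the Python values above to at least four decimal places.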

```python
# Check the sizes of X, the scores t and the loadings p
print("X=",X.shape)
print("t=", t.shape)
print("p=", p.shape)
```

```
X= (25, 7)
t= (25, 7)
p= (7, 7)
```
