## 1.0 Introduction

The main objective of this function is to make it possible to use two different dimensionality reduction methods (PCA and SVD) through the same API.

Two dimensionality reduction techniques are available:
- PCA (Principal Component Analysis), which is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the original set.
- SVD (Singular Value Decomposition), a matrix decomposition method that reduces a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.
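To make the difference between the two techniques concrete, here is a minimal NumPy-only sketch. It is not the gumly implementation: the helper name `reduce_sketch` is hypothetical, and it assumes that the PCA branch returns the data projected onto the top `k` principal axes while the SVD branch returns the first `k` left singular vectors.

```python
import numpy as np


def reduce_sketch(X, method="PCA", k=2):
    """Hypothetical helper illustrating PCA vs. SVD; not the gumly code."""
    X = np.asarray(X, dtype=float)
    if method == "PCA":
        # PCA: center each column, then project onto the top-k principal axes
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T
    if method == "SVD":
        # SVD: decompose the raw (uncentered) matrix and keep the
        # first k left singular vectors
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        return U[:, :k]
    raise ValueError(f"Unknown method: {method!r}")


X = np.array(
    [
        [-1, -1, 2, -2],
        [-2, -1, 3, -1],
        [-3, -2, 5, 1],
        [1, 1, 6, 1],
        [2, 1, 7, 1],
        [3, 2, 8, 1],
    ]
)

pca_out = reduce_sketch(X, method="PCA", k=2)
svd_out = reduce_sketch(X, method="SVD", k=2)
print(pca_out.shape, svd_out.shape)
```

The signs of individual components are not unique, so results from different libraries may differ by a factor of -1 per column.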

## 1.1 Import modules

In [1]:
# For dimensionality reduction
from gumly.dimensionality_reduction import dimensionality_reduction

# Others
import numpy as np
import pandas as pd

## 1.2 Using SVD with NumPy array

In [2]:
df = np.array(
    [
        [-1, -1, 2, -2],
        [-2, -1, 3, -1],
        [-3, -2, 5, 1],
        [1, 1, 6, 1],
        [2, 1, 7, 1],
        [3, 2, 8, 1],
    ]
)

df_out = dimensionality_reduction(df, decomposition_method="SVD", k=2)
df_out

array([[ 0.34073313,  0.10864144],
       [ 0.47650997,  0.17692445],
       [ 0.70110855,  0.31684481],
       [-0.0675579 ,  0.44724498],
       [-0.17133298,  0.52770822],
       [-0.36244574,  0.61481713]])

## 1.3 Using SVD with pandas DataFrame

In [3]:
def get_df(m, n):
    # Random m x n DataFrame; no seed is set, so the values change on every run
    df = pd.DataFrame(np.random.rand(m, n))
    return df

df = get_df(6, 5)
df_out = dimensionality_reduction(df, decomposition_method="SVD", k=3)
df_out

array([[-0.07250719, -0.33848518,  0.40055807],
       [-0.47128542,  0.42480268,  0.45936698],
       [-0.43574197, -0.13118455,  0.43320854],
       [ 0.59775086,  0.57853257,  0.29583586],
       [ 0.38624385, -0.59413792,  0.29942121],
       [ 0.27617261,  0.00782794,  0.5135111 ]])
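
Because `get_df` draws from NumPy's unseeded global generator, the values above will differ on every run. A minimal sketch of a reproducible variant follows; the helper name `get_seeded_df` and its `seed` parameter are hypothetical and not part of gumly.

```python
import numpy as np
import pandas as pd


def get_seeded_df(m, n, seed=0):
    """Hypothetical helper: random m x n DataFrame from a seeded generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((m, n)))


df_a = get_seeded_df(6, 5)
df_b = get_seeded_df(6, 5)

# The same seed yields identical data, so downstream results are repeatable
print(df_a.equals(df_b))  # True
```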

## 1.4 Using PCA with NumPy array

In [4]:
df = np.array(
    [
        [-1, -1, 2, -2],
        [-2, -1, 3, -1],
        [-3, -2, 5, 1],
        [1, 1, 6, 1],
        [2, 1, 7, 1],
        [3, 2, 8, 1],
    ]
)

df_out = dimensionality_reduction(df, decomposition_method="PCA", k=2)
df_out

array([[-3.55416303, -2.01120392],
       [-3.29181646, -0.37102401],
       [-2.55517195,  2.67830462],
       [ 1.76504972,  0.04940415],
       [ 2.9974995 ,  0.00646722],
       [ 4.63860223, -0.35194805]])

## 1.5 Using PCA with pandas DataFrame

In [5]:
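# df is still the NumPy array from the previous cell; convert it to a DataFrame first
df = pd.DataFrame(df)
df_out = dimensionality_reduction(df, decomposition_method="PCA", k=3)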
df_out

array([[-3.55416303, -2.01120392, -0.17043529],
       [-3.29181646, -0.37102401,  0.15484655],
       [-2.55517195,  2.67830462, -0.13828135],
       [ 1.76504972,  0.04940415,  0.50760573],
       [ 2.9974995 ,  0.00646722, -0.18007924],
       [ 4.63860223, -0.35194805, -0.1736564 ]])

## 2.0 Conclusion and library advantages

This implementation helps with dimensionality reduction during the development of ML models because it lets you switch between two techniques (PCA and SVD) by changing only the `decomposition_method` parameter. This allows the data scientist to run several tests and compare the results in terms of explained variance.
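
As a rough illustration of comparing results by explained variance, the sketch below computes the cumulative explained variance ratio for each possible `k` on the sample array from this notebook. It uses plain NumPy rather than gumly, and the helper `explained_variance_ratio` is a hypothetical name introduced here for illustration.

```python
import numpy as np

X = np.array(
    [
        [-1, -1, 2, -2],
        [-2, -1, 3, -1],
        [-3, -2, 5, 1],
        [1, 1, 6, 1],
        [2, 1, 7, 1],
        [3, 2, 8, 1],
    ],
    dtype=float,
)


def explained_variance_ratio(X):
    """Hypothetical helper: share of total variance captured by each component."""
    Xc = X - X.mean(axis=0)                  # center each column
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    variances = s**2 / (X.shape[0] - 1)      # variance along each principal axis
    return variances / variances.sum()


ratios = explained_variance_ratio(X)
cumulative = np.cumsum(ratios)

# Inspect how much variance is retained for each choice of k
for k, value in enumerate(cumulative, start=1):
    print(f"k={k}: cumulative explained variance = {value:.3f}")
```

A common heuristic is to pick the smallest `k` whose cumulative ratio exceeds a chosen threshold, for example 0.95.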

## References

[PCA scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

[SVD SciPy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html)