In [3]:
import pandas as pd

# Load the CSV file to examine its structure

data = pd.read_csv('100_Sales.csv')

# Display the first few rows, plus a summary of column dtypes and non-null counts
data.head(), data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Region          100 non-null    object 
 1   Country         100 non-null    object 
 2   Item_Type       100 non-null    object 
 3   Sales_Channel   100 non-null    object 
 4   Order_Priority  100 non-null    object 
 5   Ship_Date       100 non-null    object 
 6   Unit_Cost       100 non-null    float64
 7   Total_Revenue   100 non-null    float64
 8   Total_Profit    100 non-null    float64
 9   Unnamed: 9      0 non-null      float64
 10  Unnamed: 10     0 non-null      float64
dtypes: float64(5), object(6)
memory usage: 8.7+ KB


(                              Region                Country        Item_Type  \
 0              Australia and Oceania                 Tuvalu        Baby Food   
 1  Central America and the Caribbean                Grenada           Cereal   
 2                             Europe                 Russia  Office Supplies   
 3                 Sub_Saharan Africa  Sao Tome and Principe           Fruits   
 4                 Sub_Saharan Africa                 Rwanda  Office Supplies   
 
   Sales_Channel Order_Priority   Ship_Date  Unit_Cost  Total_Revenue  \
 0       Offline              H  27/06/2010     159.42     2533654.00   
 1        Online              C  15/09/2012     117.11      576782.80   
 2       Offline              L  05/08/2014     524.96     1158502.59   
 3        Online              C  07/05/2014       6.92       75591.66   
 4       Offline              L  02/06/2013     524.96     3296425.02   
 
    Total_Profit  Unnamed: 9  Unnamed: 10  
 0     951410.50         NaN

In [7]:
import numpy as np

# Select relevant numeric columns and create a matrix for SVD
numeric_columns = ["Unit_Cost", "Total_Revenue", "Total_Profit"]
matrix = data[numeric_columns].to_numpy()

# Apply SVD using NumPy; full_matrices=False returns the reduced (economy) form
U, Sigma, VT = np.linalg.svd(matrix, full_matrices=False)

# Reconstruct the matrix from the SVD components to validate the factorization
reconstructed_matrix = U @ np.diag(Sigma) @ VT

# Check the shape of components and a preview of the reconstructed matrix
(U.shape, Sigma.shape, VT.shape), reconstructed_matrix[:5]


(((100, 3), (3,), (3, 3)),
 array([[1.59420000e+02, 2.53365400e+06, 9.51410500e+05],
        [1.17110000e+02, 5.76782800e+05, 2.48406360e+05],
        [5.24960000e+02, 1.15850259e+06, 2.24598750e+05],
        [6.92000000e+00, 7.55916600e+04, 1.95258200e+04],
        [5.24960000e+02, 3.29642502e+06, 6.39077500e+05]]))

Singular value decomposition (SVD) factors a matrix $A$ into three matrices, $U$, $\Sigma$, and $V^T$, such that $A = U \Sigma V^T$. It is useful in applications such as dimensionality reduction, collaborative filtering, and data compression.
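As one illustration of the dimensionality-reduction and compression use cases, here is a minimal sketch of a rank-$k$ approximation built from the leading singular values. The random matrix `A`, the seed, and the choice `k = 2` are assumptions made for illustration; they stand in for the notebook's 100 x 3 numeric matrix and are not taken from the dataset.

```python
import numpy as np

# Hypothetical stand-in for a 100 x 3 numeric data matrix (not the sales data).
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))

# Reduced SVD: U is (100, 3), s is (3,), Vt is (3, 3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values and their singular vectors.
k = 2  # assumed target rank, chosen only for illustration
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius-norm error of the truncation equals the root of the
# sum of squares of the discarded singular values.
err = np.linalg.norm(A - A_k, ord="fro")
expected_err = np.sqrt(np.sum(s[k:] ** 2))

print(A_k.shape, np.isclose(err, expected_err))
```

The truncated matrix has the same shape as the original but rank at most `k`, which is what makes it a compressed representation.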

Perform SVD: Use NumPy's `np.linalg.svd()` to decompose the matrix into three components: $U$, $\Sigma$, and $V^T$.

Reconstruct the matrix: Multiply the decomposed matrices to validate the factorization.
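Comparing the reconstructed matrix with the original numerically is more reliable than eyeballing a preview of a few rows. The sketch below shows one way to do that with `np.allclose`. The random matrix and seed are assumptions used so the example runs without the CSV file; with the notebook's data, `matrix` would be the array built from the numeric columns.

```python
import numpy as np

# Hypothetical stand-in for the notebook's numeric matrix.
rng = np.random.default_rng(42)
matrix = rng.uniform(1.0, 1000.0, size=(100, 3))

U, Sigma, VT = np.linalg.svd(matrix, full_matrices=False)

# Scaling the columns of U by Sigma is equivalent to U @ np.diag(Sigma).
reconstructed = (U * Sigma) @ VT

# Check the factorization up to floating-point tolerance.
is_close = np.allclose(matrix, reconstructed)
max_abs_error = np.max(np.abs(matrix - reconstructed))

print(is_close, max_abs_error)
```

An exact equality check with `==` would usually fail here because of floating-point rounding, which is why a tolerance-based comparison is used.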

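$U$: (100, 3), a matrix with orthonormal columns (the left singular vectors). Its rows correspond to the rows of the data matrix. Because `full_matrices=False` returns the reduced form, $U$ is not square, so $U^T U = I$ holds but $U U^T = I$ does not.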

$\Sigma$: (3,), the singular values, returned as a 1-D array in descending order (the diagonal of $\Sigma$ in compact form).

$V^T$: (3, 3), an orthogonal matrix whose rows (the right singular vectors) correspond to the columns of the data matrix.
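The properties of the three components can be checked directly. The following is a minimal sketch, again using a random 100 x 3 matrix as an assumed stand-in for the notebook's data; the variable names for the checks are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the notebook's 100 x 3 numeric matrix.
rng = np.random.default_rng(1)
matrix = rng.normal(size=(100, 3))

U, Sigma, VT = np.linalg.svd(matrix, full_matrices=False)

# Columns of the reduced U are orthonormal: U^T U is the 3 x 3 identity.
u_cols_orthonormal = np.allclose(U.T @ U, np.eye(3))

# The reduced U is not square, so U U^T is NOT the 100 x 100 identity.
u_square_orthogonal = np.allclose(U @ U.T, np.eye(100))

# V^T is square and orthogonal in both directions.
vt_orthogonal = np.allclose(VT @ VT.T, np.eye(3)) and np.allclose(VT.T @ VT, np.eye(3))

# Singular values are non-negative and sorted in descending order.
sigma_sorted = bool(np.all(np.diff(Sigma) <= 0))
sigma_nonnegative = bool(np.all(Sigma >= 0))

print(u_cols_orthonormal, u_square_orthogonal, vt_orthogonal, sigma_sorted, sigma_nonnegative)
```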