# Session 3: SVD and PCA

Welcome to Session 3 of the practical Machine Learning sessions. In this session, we'll understand and implement SVD and PCA, which stand for **Singular Value Decomposition** and **Principal Component Analysis**. The two ideas are closely related and share the same end goal.

In this session, we'll only introduce these concepts. We won't go deeper here because SVD and PCA are rarely used on their own; they usually appear as one step inside a bigger model. So we'll build the concepts and intuition now, and use them later as part of other Machine Learning projects. These topics still deserve a session of their own: even though we won't build a standalone model from them, the idea behind them is important in Machine Learning, and it's not entirely intuitive, so it needs a little explanation. We will also still build interesting models using these concepts.

As you know, these sessions are more practical than theoretical. So we will not dive into most of the mathematics and theory behind the concepts, and will only touch on what is needed to write the code for our model. Our focus will be on understanding the functionality. You can find the inner workings of these concepts in many online resources, and in the theory lectures of this course.

So, let us first understand the motivation behind these concepts, and then what SVD and PCA are.

# The Curse of Dimensionality

Data, especially in recent times, often has a very high number of *dimensions*, or features. For example, satellite images sometimes contain thousands of frequency bands, called *channels*, each of which is a separate feature. In Natural Language Processing (the branch of AI that deals with language understanding: voice recognition as in Siri, Google Assistant, or Alexa; language-to-language translation; and Sentiment Analysis, such as identifying hate speech or violence), words are represented as individual features. So if you wanted to model the entire English language, you would be dealing with over a million features.
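To make "words are represented as individual features" concrete, here is a minimal sketch of a bag-of-words encoding in plain Python. The toy sentences and the helper name `bag_of_words` are made up for illustration; real NLP pipelines are more elaborate, but the point carries over: the feature count equals the vocabulary size, so it grows with every new distinct word.

```python
# Toy corpus, invented purely for illustration.
sentences = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]

# Every distinct word becomes one feature (one dimension).
vocabulary = sorted({word for s in sentences for word in s.split()})
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def bag_of_words(sentence):
    """Return a count vector with one entry per vocabulary word (hypothetical helper)."""
    vector = [0] * len(vocabulary)
    for word in sentence.split():
        vector[word_to_index[word]] += 1
    return vector


vectors = [bag_of_words(s) for s in sentences]

print(len(vocabulary))  # number of features per sentence
print(vectors[0])       # counts for the first sentence
```

Even these three short sentences need ten features each, and most entries are zero. Scaling the same idea to a full language is what leads to the very large feature counts mentioned above.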

In fact, think of the amount of data that your phone generates. Modern phones have many sensors, such as an accelerometer, gyroscope, magnetometer, light sensor, biometric sensor, and GPS, all of which generate data continuously and independently. This data is eventually transmitted to the company, which uses it to provide various services. Similarly, aeroplanes generate hundreds of billions of gigabytes of data every year, from multiple sensors and manually entered data. The list goes on and on.

Now, ideally, the goal of any model is to understand the data completely, in other words to extract *all* the information the data can provide. The utopian solution is to build a model over all the available data, including every feature.

But in practice, it's not efficient to use all the data.

In [1]:
import torch

In [34]:
A = torch.rand((10, 2))  # 10 samples, 2 features; values are random, so they change on every run

In [43]:
# center=False skips the mean-centering step, so u @ diag(s) @ v.T rebuilds A itself
u, s, v = torch.pca_lowrank(A, center=False)

In [36]:
u.shape, s.shape, v.shape

(torch.Size([10, 2]), torch.Size([2]), torch.Size([2, 2]))

In [37]:
# low-rank SVD of the same matrix, for comparison with the PCA factors
u1, s1, v1 = torch.svd_lowrank(A)

In [18]:
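u  # NOTE: cells In [18] to In [23] ran before A was regenerated in In [34], so these printed factors belong to an earlier random A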

tensor([[-0.3577,  0.1316],
        [ 0.2309, -0.0058],
        [ 0.0937, -0.4915],
        [ 0.3151,  0.6912],
        [-0.4855,  0.1157],
        [ 0.0844, -0.4373],
        [ 0.3858, -0.1735],
        [-0.5298, -0.0116],
        [ 0.1745,  0.1683],
        [ 0.0887,  0.0130]])

In [19]:
u1

tensor([[ 0.3830,  0.3136],
        [ 0.2460, -0.2374],
        [ 0.3272, -0.4082],
        [ 0.1582,  0.0980],
        [ 0.4171,  0.4068],
        [ 0.3244, -0.3694],
        [ 0.2225, -0.4584],
        [ 0.4405,  0.3687],
        [ 0.2438, -0.0916],
        [ 0.2804, -0.1127]])

In [20]:
s

tensor([0.9113, 0.4729])

In [21]:
s1

tensor([2.9020, 0.6642])

In [22]:
v

tensor([[-0.2747, -0.9615],
        [-0.9615,  0.2747]])

In [23]:
v1

tensor([[ 0.7845, -0.6202],
        [ 0.6202,  0.7845]])
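The singular values `s` and `s1` printed above differ noticeably. One plausible reason, assuming `pca_lowrank` was called with its default `center=True` when those cells ran, is mean-centering: by default `torch.pca_lowrank` subtracts the column means before decomposing, while `torch.svd_lowrank` works on the raw matrix. The sketch below checks this on a freshly seeded random matrix. The seed, the `float64` dtype and the explicit `q=2` are choices made here for repeatability, not part of the session's code.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable
A = torch.rand((10, 2), dtype=torch.float64)

# PCA with the default centering: column means are subtracted first.
_, s_pca, _ = torch.pca_lowrank(A, q=2, center=True)

# Exact singular values of the raw matrix and of the manually centered matrix.
s_raw = torch.linalg.svdvals(A)
s_centered = torch.linalg.svdvals(A - A.mean(dim=0, keepdim=True))

print(s_pca)       # singular values reported by centered PCA
print(s_centered)  # should agree with s_pca
print(s_raw)       # larger leading value, because the mean is still in the data
```

With centering, the leading singular value no longer has to account for the overall offset of the data, which is why it comes out smaller than in the uncentered decomposition.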

In [44]:
u @ torch.diag(s) @ v.T  # multiply the PCA factors back together; should match A

tensor([[0.1562, 0.2489],
        [0.7874, 0.3799],
        [0.9528, 0.9841],
        [0.8291, 0.5549],
        [0.7017, 0.8754],
        [0.8887, 0.5054],
        [0.7023, 0.8682],
        [0.5082, 0.1849],
        [0.5827, 0.0928],
        [0.5316, 0.6022]])

In [39]:
A

tensor([[0.1562, 0.2489],
        [0.7874, 0.3799],
        [0.9528, 0.9841],
        [0.8291, 0.5549],
        [0.7017, 0.8754],
        [0.8887, 0.5054],
        [0.7023, 0.8682],
        [0.5082, 0.1849],
        [0.5827, 0.0928],
        [0.5316, 0.6022]])

In [40]:
u1 @ torch.diag(s1) @ v1.T  # multiply the SVD factors back together; should match A

tensor([[0.1562, 0.2489],
        [0.7874, 0.3799],
        [0.9528, 0.9841],
        [0.8291, 0.5549],
        [0.7017, 0.8754],
        [0.8887, 0.5054],
        [0.7023, 0.8682],
        [0.5082, 0.1849],
        [0.5827, 0.0928],
        [0.5316, 0.6022]])
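Comparing the printed tensors by eye works for a 10 x 2 matrix, but a numerical check scales better. The sketch below, under the same assumptions as the cells above plus a fixed seed, `float64` and an explicit `q=2` (all chosen here for repeatability), verifies the reconstruction with `torch.allclose`. It then keeps only the first component to show the dimensionality-reduction idea: the matrix is approximated using one direction instead of two, and the Frobenius-norm error of that approximation equals the discarded singular value.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable
A = torch.rand((10, 2), dtype=torch.float64)

# Low-rank SVD; q=2 keeps both components of this 2-column matrix.
U, S, V = torch.svd_lowrank(A, q=2)

# Using every component rebuilds A up to floating-point error.
A_full = U @ torch.diag(S) @ V.T
print(torch.allclose(A_full, A, atol=1e-6))

# Keeping only the first component gives the best rank-1 approximation of A.
A_rank1 = S[0] * torch.outer(U[:, 0], V[:, 0])

# The error of the rank-1 approximation equals the dropped singular value.
error = torch.linalg.norm(A - A_rank1)
print(error, S[1])
```

The smaller the discarded singular values are relative to the ones kept, the less information is lost by dropping those dimensions.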