#**EXERCISE-3**

#Que.1
Given the regularized problem:

$
f_{\lambda}(x) = \lambda \|x\|_2^2 + \frac{1}{2} \|\mathbf{Ax} - \mathbf{y}\|_2^2
$

We want to express it in the form:

$
f_{\lambda}(x) = \sum_{i=1}^{N} f_i(x)
$

where $ f_i(x) $ represents individual loss terms.

The regularization term $ \lambda \|x\|_2^2 $ can be divided among $ N $ loss terms, so each $ f_i(x) $ will include $ \frac{\lambda}{N} \|x\|_2^2 $.

Thus, we need to express $ f_i(x) $ as:

$
f_i(x) = \frac{\lambda}{N} \|x\|_2^2 + \frac{1}{2} \left( (\mathbf{a}_i^T x - y_i)^2 \right)
$

Where:
- $ \mathbf{a}_i^T $ is the $ i $-th row of matrix $ A $ (so $ \mathbf{a}_i $ is that row written as a column vector), representing the features for the $ i $-th data point.
- $ y_i $ is the $ i $-th element of vector $ y $, representing the corresponding target value.
- $ \lambda $ is the regularization parameter.
- $ N $ is the number of data points.

This expression includes both the regularization term and the data loss term for each data point.
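
As a quick sanity check of this decomposition, the sketch below compares $\sum_i f_i(x)$ with $f_{\lambda}(x)$ on a small random problem. The sizes and the names `A_chk`, `y_chk`, `x_chk`, `lam_chk` are illustrative assumptions, not the data used later in this exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
N_chk, d_chk, lam_chk = 5, 3, 0.1          # small illustrative sizes (assumed)
A_chk = rng.standard_normal((N_chk, d_chk))
y_chk = rng.standard_normal(N_chk)
x_chk = rng.standard_normal(d_chk)

def f_full(x):
    # f_lambda(x) = lambda * ||x||^2 + 0.5 * ||Ax - y||^2
    return lam_chk * (x @ x) + 0.5 * np.sum((A_chk @ x - y_chk) ** 2)

def f_single(x, i):
    # f_i(x) = (lambda / N) * ||x||^2 + 0.5 * (a_i^T x - y_i)^2
    return (lam_chk / N_chk) * (x @ x) + 0.5 * (A_chk[i] @ x - y_chk[i]) ** 2

total = sum(f_single(x_chk, i) for i in range(N_chk))
print(np.isclose(total, f_full(x_chk)))
```

The check works because $\|Ax - y\|_2^2 = \sum_{i=1}^{N} (\mathbf{a}_i^T x - y_i)^2$ and the $N$ copies of $\frac{\lambda}{N}\|x\|_2^2$ add up to $\lambda\|x\|_2^2$.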



#Que.2

The $k^{th}$ component of the gradient $g_i(x)=\nabla_x f_i(x)$ can be written as:

$$[g_i(x)]_k=\frac{\partial f_i(x)}{\partial x_k}=\frac{2\lambda}{N}x_k+\left[\left(\sum_{j=1}^{d}A_{ij}x_j\right)-y_i\right]A_{ik}$$
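
Differentiating $f_i$ from Que.1 term by term gives $\frac{2\lambda}{N}x + (\mathbf{a}_i^T x - y_i)\,\mathbf{a}_i$ in vector form. A minimal sketch that checks this against central finite differences on a small random problem is given below; the sizes, the index `i_fd` and all `*_fd` names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N_fd, d_fd, lam_fd = 4, 6, 0.5             # small illustrative sizes (assumed)
A_fd = rng.standard_normal((N_fd, d_fd))
y_fd = rng.standard_normal(N_fd)
x_fd = rng.standard_normal(d_fd)
i_fd = 2                                   # which term f_i to check

def f_i(x):
    # f_i(x) = (lambda / N) * ||x||^2 + 0.5 * (a_i^T x - y_i)^2
    return (lam_fd / N_fd) * (x @ x) + 0.5 * (A_fd[i_fd] @ x - y_fd[i_fd]) ** 2

def grad_f_i(x):
    # Analytic gradient: (2 * lambda / N) * x + (a_i^T x - y_i) * a_i
    return (2 * lam_fd / N_fd) * x + (A_fd[i_fd] @ x - y_fd[i_fd]) * A_fd[i_fd]

h = 1e-6
# Central difference along each coordinate direction e_k
numeric = np.array([(f_i(x_fd + h * e) - f_i(x_fd - h * e)) / (2 * h)
                    for e in np.eye(d_fd)])
max_err = np.max(np.abs(numeric - grad_f_i(x_fd)))
print(max_err)
```

Because $f_i$ is quadratic, the central difference should agree with the analytic gradient up to rounding error.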

#Que.3

In [1]:
import numpy as np
import timeit
from tabulate import tabulate
np.random.seed(1000)

In [2]:
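N = 200             # number of data points
d = 10000           # dimension of x
lambda_reg = 0.001  # regularization parameter
eps = np.random.randn(N,1)   # additive noise in the targets

A = np.random.randn(N,d)
# Scale every column of A to unit Euclidean norm
for  j in range(A.shape[1]):
  A[:,j] = A[:,j]/np.linalg.norm(A[:,j])

xorig = np.ones((d,1))       # ground-truth vector used to generate y
y = np.dot(A,xorig) + eps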

In [3]:
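def evalg(x, i, lamda, d):
  # Gradient term for data point i, evaluated at x (uses the globals A, y and N)
  assert type(x) is np.ndarray
  A_i = np.reshape(A[i], (d, 1))          # i-th row of A as a column vector
  residual = np.matmul(A[i], x) - y[i]    # a_i^T x - y_i, shape (1,)
  # The regularization part is coded as (lamda/N)*x, as in the recorded runs;
  # the exact gradient of (lamda/N)*||x||^2 would be (2*lamda/N)*x.
  return A_i * residual + (lamda/N) * x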

In [4]:
from tqdm import tqdm

In [5]:
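x = np.zeros((d,1))   # start from the zero vector
epochs = 10**4
t = 1                 # step counter; the step length is 1/t
arr = np.arange(N)
table = [["Time Taken", "||Ax_alglab7 - y||^2", "||x_alglab7 - xorig||^2"]]
start = timeit.default_timer()
for epoch in tqdm(range(epochs)):
  np.random.shuffle(arr)   # visit the N data points in a new random order each epoch
  for i in np.nditer(arr):
    g_x = evalg(x, i, lambda_reg, d)
    x = np.subtract(x , (1/t)*g_x)
    t = t+1
    if t>1e4:              # restart the step-length schedule after 10^4 updates
      t = 1
alglab7time = timeit.default_timer() - start
x_alglab7 = x
# NOTE: in the last entry `**2` is inside the norm, so that column holds
# ||(x - xorig)**2||_2 (elementwise square), not ||x - xorig||_2^2.
table.append([alglab7time, np.linalg.norm(np.subtract(np.matmul(A, x_alglab7) , y))**2, np.linalg.norm(np.subtract(x_alglab7 , xorig)**2)])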

100%|██████████| 10000/10000 [01:29<00:00, 111.50it/s]


In [6]:
print(tabulate(table, headers = 'firstrow', tablefmt = 'fancy_grid'))

╒══════════════╤════════════════════════╤═══════════════════════════╕
│   Time Taken │   ||Ax_alglab7 - y||^2 │   ||x_alglab7 - xorig||^2 │
╞══════════════╪════════════════════════╪═══════════════════════════╡
│      89.6863 │            2.14994e-05 │                   101.695 │
╘══════════════╧════════════════════════╧═══════════════════════════╛
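
The last column of the table is produced by `np.linalg.norm(np.subtract(x_alglab7 , xorig)**2)`, where the elementwise square is taken before the norm. As a small illustration with a made-up vector `v` (not data from this exercise), the sketch below shows how that quantity differs from the squared Euclidean norm that the column header names.

```python
import numpy as np

v = np.array([3.0, 4.0])                      # hypothetical difference vector

squared_distance = np.linalg.norm(v) ** 2     # ||v||_2^2 = 3^2 + 4^2
norm_of_squares = np.linalg.norm(v ** 2)      # ||(9, 16)||_2 = sqrt(9^2 + 16^2)

print(squared_distance, norm_of_squares)
```

The two expressions differ only in where `**2` is placed, but they generally give different numbers, so the reported values should be read with this in mind.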


#Que.4

In [7]:
epoch_list = [10**3, 10**5]
table1 = [["epochs","Time Taken", "||Ax_alglab7 - y||^2", "||x_alglab7 - xorig||^2"]]
# x and t are not reset here, so every run continues from the previous iterate
for e in epoch_list:
  start = timeit.default_timer()
  for epoch in tqdm(range(int(e))):
    np.random.shuffle(arr)
    for i in np.nditer(arr):
      g_x = evalg(x, i, lambda_reg, d)
      x = np.subtract(x , (1/t)*g_x)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start
  x_alglab7 = x
  # Last entry: `**2` is inside the norm, i.e. ||(x - xorig)**2||_2
  table1.append([e,alglab7time, np.linalg.norm(np.subtract(np.matmul(A, x_alglab7) , y))**2, np.linalg.norm(np.subtract(x_alglab7 , xorig)**2)])

100%|██████████| 1000/1000 [00:27<00:00, 35.89it/s]
100%|██████████| 100000/100000 [20:45<00:00, 80.26it/s] 


In [8]:
print(tabulate(table1, headers = 'firstrow', tablefmt = 'fancy_grid'))

╒══════════╤══════════════╤════════════════════════╤═══════════════════════════╕
│   epochs │   Time Taken │   ||Ax_alglab7 - y||^2 │   ||x_alglab7 - xorig||^2 │
╞══════════╪══════════════╪════════════════════════╪═══════════════════════════╡
│     1000 │      27.8826 │            5.73464e-06 │                   101.695 │
├──────────┼──────────────┼────────────────────────┼───────────────────────────┤
│   100000 │    1245.88   │            1.92578e-05 │                   101.695 │
╘══════════╧══════════════╧════════════════════════╧═══════════════════════════╛


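From the table above, the time taken by the algorithm increases with the number of epochs (about 28 s for $10^3$ epochs versus about 1246 s for $10^5$ epochs). The value of the norm (||Ax_alglab7 - y||^2) is close to zero in every case, on the order of $10^{-6}$ to $10^{-5}$. Note that `x` and `t` are not re-initialized between runs, so each run continues from the iterate left by the previous one.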

#Que.5

In [9]:
from tqdm import tqdm

In [10]:
epochs = 10**5
lamda_lst = [1000, 100, 10, 1, 0, 0.1]
table2 = [["lamda","Time Taken", "||Ax_alglab7 - y||^2", "||x_alglab7 - xorig||^2"]]
# x and t carry over from the previous cells and from one lambda to the next
for lamda in lamda_lst:
  start = timeit.default_timer()
  print('current lambda is:',lamda)
  for epoch in tqdm(range(epochs)):
    np.random.shuffle(arr)
    for i in np.nditer(arr):
      g_x = evalg(x, i, lamda, d)
      x = np.subtract(x , (1/t)*g_x)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start
  x_alglab7 = x
  # Last entry: `**2` is inside the norm, i.e. ||(x - xorig)**2||_2
  table2.append([lamda,alglab7time, np.linalg.norm(np.subtract(np.matmul(A, x_alglab7) , y))**2, np.linalg.norm(np.subtract(x_alglab7 , xorig)**2)])


current lambda is: 1000


100%|██████████| 100000/100000 [18:28<00:00, 90.19it/s] 


current lambda is: 100


100%|██████████| 100000/100000 [11:26<00:00, 145.76it/s]


current lambda is: 10


100%|██████████| 100000/100000 [11:27<00:00, 145.44it/s]


current lambda is: 1


100%|██████████| 100000/100000 [11:22<00:00, 146.58it/s]


current lambda is: 0


100%|██████████| 100000/100000 [11:53<00:00, 140.13it/s]


current lambda is: 0.1


100%|██████████| 100000/100000 [11:44<00:00, 141.95it/s]


In [11]:
epochs = 10**5
lamda_lst = [0.01, 0.001]
# Two further lambda values, appended to table2; x and t again carry over
for lamda in lamda_lst:
  start = timeit.default_timer()
  for epoch in tqdm(range(epochs)):
    np.random.shuffle(arr)
    for i in np.nditer(arr):
      g_x = evalg(x, i, lamda, d)
      x = np.subtract(x , (1/t)*g_x)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start
  x_alglab7 = x
  table2.append([lamda,alglab7time, np.linalg.norm(np.subtract(np.matmul(A, x_alglab7) , y))**2, np.linalg.norm(np.subtract(x_alglab7 , xorig)**2)])


100%|██████████| 100000/100000 [11:25<00:00, 145.93it/s]
100%|██████████| 100000/100000 [11:30<00:00, 144.74it/s]


In [12]:
print(tabulate(table2, headers = 'firstrow', tablefmt = 'fancy_grid'))

╒══════════╤══════════════╤════════════════════════╤═══════════════════════════╕
│    lamda │   Time Taken │   ||Ax_alglab7 - y||^2 │   ||x_alglab7 - xorig||^2 │
╞══════════╪══════════════╪════════════════════════╪═══════════════════════════╡
│ 1000     │     1108.72  │         7965.58        │                   99.8482 │
├──────────┼──────────────┼────────────────────────┼───────────────────────────┤
│  100     │      686.038 │         3756.67        │                   99.4431 │
├──────────┼──────────────┼────────────────────────┼───────────────────────────┤
│   10     │      687.551 │          553.909       │                  101.005  │
├──────────┼──────────────┼────────────────────────┼───────────────────────────┤
│    1     │      682.212 │           10.2718      │                  101.604  │
├──────────┼──────────────┼────────────────────────┼───────────────────────────┤
│    0     │      713.637 │            9.53832e-26 │                  101.695  │
├──────────┼──────────────┼─

The time taken by the algorithm is roughly the same for each value of lambda (about 680 to 715 s), except for the first run with lambda = 1000, which took about 1109 s. The value of the norm (||Ax_alglab7 - y||^2) decreases as lambda is reduced from 1000 to 0 (from 7965.58 down to about 9.5e-26). For the next value of lambda it increases slightly, and then it decreases again and approaches zero.
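
For reference, the effect of lambda on the residual can also be seen from the exact minimizer of $f_{\lambda}$. Setting the gradient $2\lambda x + A^T(Ax - y)$ to zero gives $(A^TA + 2\lambda I)x = A^Ty$ for $\lambda > 0$. The sketch below solves this on a small random problem. It is a closed-form comparison, not the ALG-LAB7 iteration used above, and the sizes and the `*_r` names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_r, d_r = 20, 50                          # small sizes, not N=200, d=10000
A_r = rng.standard_normal((N_r, d_r))
A_r /= np.linalg.norm(A_r, axis=0)         # unit-norm columns, as in the setup cell
y_r = A_r @ np.ones(d_r) + rng.standard_normal(N_r)

def ridge_solution(lam):
    # Solve (A^T A + 2*lam*I) x = A^T y, valid for lam > 0
    return np.linalg.solve(A_r.T @ A_r + 2 * lam * np.eye(d_r), A_r.T @ y_r)

lams = [1000, 100, 10, 1, 0.1, 0.01, 0.001]
residuals = [np.linalg.norm(A_r @ ridge_solution(lam) - y_r) ** 2 for lam in lams]
for lam, r in zip(lams, residuals):
    print(f"lambda = {lam:<8} ||Ax - y||^2 = {r:.3e}")
```

For the exact minimizer, the residual shrinks as lambda is decreased, which matches the trend in the table for lambda going from 1000 down to 1.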

#Que.6

In the previous exercises, the failure dimension was d = 10,000, whereas in this exercise we saw that ALG-LAB7 works for d = 10,000.

#Que.7
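My understanding is that the ALG-LAB7 method is very similar to gradient descent, because we update $x$ by $x \leftarrow x-\frac{1}{t}\nabla f_i(x)$, where $\frac{1}{t}$ is the step length. The difference is that each update uses the gradient of a single term $f_i$ rather than the full gradient of $f_{\lambda}$, which makes it a stochastic gradient method.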

This method also works in higher dimensions.
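
The update rule can be sketched on a small problem as follows. This is a minimal illustration under stated assumptions, not the code that produced the tables above: the sizes are small, the rows of `A_s` are scaled to unit norm so that steps of length $\frac{1}{t}$ stay stable from the first update, the exact gradient of $f_i$ is used, and the step counter is never reset. All `*_s` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_s, d_s, lam_s = 20, 50, 0.001
A_s = rng.standard_normal((N_s, d_s))
A_s /= np.linalg.norm(A_s, axis=1, keepdims=True)   # unit-norm rows (assumption)
y_s = A_s @ np.ones(d_s) + 0.1 * rng.standard_normal(N_s)

def grad_f_i(x, i):
    # Gradient of f_i(x) = (lam/N)*||x||^2 + 0.5*(a_i^T x - y_i)^2
    return (2 * lam_s / N_s) * x + (A_s[i] @ x - y_s[i]) * A_s[i]

x_s = np.zeros(d_s)
initial_residual = np.linalg.norm(A_s @ x_s - y_s) ** 2

t_s = 1
for epoch in range(200):
    for i in rng.permutation(N_s):                   # one pass over shuffled data
        x_s = x_s - (1.0 / t_s) * grad_f_i(x_s, i)   # x <- x - (1/t) * grad f_i(x)
        t_s += 1

final_residual = np.linalg.norm(A_s @ x_s - y_s) ** 2
print(initial_residual, final_residual)
```

Each update touches only one row of the data, so its cost is proportional to the dimension rather than to the number of entries in the full matrix.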