Hi, in linear-classify.md you first introduce the linear classifier as
$$f(x_i, W, b) = W x_i + b $$
In the above equation, we are assuming that the image $x_i$ has all of its pixels flattened out to a single column vector of shape [D x 1]. The matrix $W$ (of size [K x D]) and the vector $b$ (of size [K x 1]) are the parameters of the function.
Later, in SVM loss you have
$$ L_i = \sum_{j\neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta) $$
where $w_j$ is the j-th row of $W$ reshaped as a column.
This sounds overly complex to me:
> $w_j$ is the j-th row of $W$ reshaped as a column
is the justification for the transpose in $w_j^T x_i$. But all it means, referring to your initial definition of $W$, is that you take the dot product of the j-th row of $W$ with $x_i$. So, given your initial definition, the loss could be defined as
$$ L_i = \sum_{j\neq y_i} \max(0, w_j x_i - w_{y_i} x_i + \Delta) $$
where $w_j$ is the j-th row of $W$.
Am I overlooking something?
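For what it's worth, here is a minimal NumPy sketch (toy sizes and random values, not from the notes) showing that the two notations compute the same loss: scoring with the reshaped column, $w_j^T x_i$, and scoring with the row directly, $w_j x_i$, give identical numbers.

```python
import numpy as np

# Toy sizes and values, assumed for illustration: K = 3 classes, D = 4 pixels.
np.random.seed(0)
K, D = 3, 4
W = np.random.randn(K, D)   # [K x D] weight matrix
x = np.random.randn(D, 1)   # image flattened to a [D x 1] column
y = 1                       # index of the correct class
delta = 1.0                 # SVM margin

# Notation 1: w_j is the j-th row of W reshaped as a [D x 1] column, scored as w_j^T x
L_reshaped = sum(
    max(0.0, (W[j].reshape(-1, 1).T @ x - W[y].reshape(-1, 1).T @ x).item() + delta)
    for j in range(K) if j != y
)

# Notation 2: w_j is the j-th row of W used directly, scored as w_j x
L_rows = sum(
    max(0.0, (W[j] @ x - W[y] @ x).item() + delta)
    for j in range(K) if j != y
)

print(L_reshaped, L_rows)   # identical values: only the notation differs
```

Either way, each score is just the j-th entry of $W x_i$.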
I think in linear algebra a vector generally refers to a column vector. So $w_j$ is a column vector (with $D$ elements) obtained by reshaping the $j$-th row of $W$, and by convention $w_j^T x_i$ gives a real number. Check out sections 1.1 and 2.1 in http://cs229.stanford.edu/section/cs229-linalg.pdf.
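To make the convention concrete, here is a tiny NumPy sketch (toy sizes assumed for illustration): under the column-vector convention, $w_j$ is a column, so the transpose is what turns it back into a row and makes $w_j^T x_i$ a real number.

```python
import numpy as np

# Toy sizes assumed for illustration: K = 3, D = 4.
W = np.arange(12.0).reshape(3, 4)   # [K x D]
x = np.ones((4, 1))                 # vectors are columns by convention: [D x 1]

w_j = W[1].reshape(-1, 1)           # j-th row of W reshaped into a [D x 1] column
print(w_j.T.shape)                  # (1, 4): the transpose is a row again
print((w_j.T @ x).item())           # w_j^T x: (1,D) @ (D,1) -> 22.0, a real number
```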
I think there's a typo. To obtain a scalar, the first operand must have shape (1, D) and the second shape (D, 1).
So if $w_j$ denotes the row vector itself (not reshaped into a column), it must not be transposed.
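Both readings agree numerically; a quick check (same toy shapes as above, assumed for illustration):

```python
import numpy as np

# Toy sizes assumed for illustration: K = 3, D = 4.
W = np.arange(12.0).reshape(3, 4)   # [K x D]
x = np.ones((4, 1))                 # [D x 1]

row_score = (W[0:1] @ x).item()                  # (1,D) @ (D,1): row, no transpose
col_score = (W[0].reshape(-1, 1).T @ x).item()   # column reshaped, then transposed
assert row_score == col_score                    # the same scalar either way
```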
There was a comment on this: "where $w_j$ is the j-th row of $W$ reshaped as a column" means that $w_j$ is a column vector obtained by reshaping the j-th row of $W$.
This point confused me for weeks. Intuitively, I think of $w_j$ as the j-th row of $W$, and I still find reshaping it into a column unnecessary.