
linear-classify.md - Redundant transposition #125

Open
cheind opened this issue Jan 4, 2017 · 3 comments

cheind commented Jan 4, 2017

Hi,

in linear-classify.md you first introduce the linear classifier as

$$f(x_i, W, b) = W x_i + b $$
In the above equation, we are assuming that the image (x_i) has all of its pixels flattened out to a single column vector of shape [D x 1]. The matrix W (of size [K x D]), and the vector b (of size [K x 1]) are the parameters of the function

Later, in SVM loss you have

$$ L_i = \sum_{j\neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta) $$
where (w_j) is the j-th row of (W) reshaped as a column.

This sounds overly complex to me:

(w_j) is the j-th row of (W) reshaped as a column

is the justification for the transpose in w_j^T x_i. But all it means, referring to your initial definition of W, is the dot product of the j-th row of W with x_i. So, given that initial definition, the loss could be written as

$$ L_i = \sum_{j\neq y_i} \max(0, w_j x_i - w_{y_i} x_i + \Delta) $$
where w_j is the j-th row of (W).

Am I overlooking something?
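To check this numerically (a small sketch with made-up shapes and values, not code from the notes), both readings of the SVM loss give the same number:

```python
import numpy as np

# With W of shape [K x D] and x_i a vector of D pixels, the j-th score can be
# written either as the plain row product W[j] @ x_i, or by reshaping the row
# into a column w_j and computing w_j^T x_i -- both yield the same scalar.
rng = np.random.default_rng(0)
K, D = 3, 5
W = rng.standard_normal((K, D))   # [K x D] weight matrix
x_i = rng.standard_normal(D)      # flattened image with D pixels
y_i, delta = 1, 1.0               # correct class index and margin

scores = W @ x_i                  # all K scores at once

# SVM loss using plain row indexing, as proposed above
L_row = sum(max(0.0, scores[j] - scores[y_i] + delta)
            for j in range(K) if j != y_i)

# Same loss under the "row reshaped as a column, then transposed" reading
L_col = sum(max(0.0, (W[j].reshape(-1, 1).T @ x_i).item()
                 - (W[y_i].reshape(-1, 1).T @ x_i).item() + delta)
            for j in range(K) if j != y_i)

assert np.isclose(L_row, L_col)
```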

@flyman3046
I think in linear algebra a vector generally refers to a column vector. So w_j is a column vector (with D elements) obtained by reshaping the j-th row of W, and by this convention w_j^T x_i gives a real number. See sections 1.1 and 2.1 in http://cs229.stanford.edu/section/cs229-linalg.pdf.
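A small numpy sketch of that convention (example values assumed, not from the notes):

```python
import numpy as np

# Reshape the j-th row of W into a column vector w_j, so that w_j^T x_i is a
# (1 x D) by (D x 1) product yielding a single number.
W = np.arange(6.0).reshape(2, 3)       # [K x D] with K=2, D=3
x_i = np.array([[1.0], [2.0], [3.0]])  # column vector of shape [D x 1]

w_j = W[0].reshape(-1, 1)              # j-th row reshaped as a [D x 1] column
score = (w_j.T @ x_i).item()           # (1 x D) @ (D x 1) -> scalar

assert score == W[0] @ x_i.ravel()     # same as the plain row dot-product
```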

nizza commented Mar 23, 2017

I think there's a typo: to obtain a scalar, the first operand must have shape (1, k) and the second shape (k, 1), so the row vector w_j must not be transposed.

@xuweichn
The note's comment — "where w_j is the j-th row of W reshaped as a column" — means that w_j is a column vector obtained by reshaping the j-th row of W.

This point confused me for weeks. Intuitively, I think of w_j as simply the j-th row of W, and I still find the reshaping step unnecessary.
