Sort singular values #46

Merged
13 commits merged into AtheMathmo:master on Sep 24, 2016

Conversation

@Andlon (Collaborator) commented Sep 18, 2016

Note: This is not a complete implementation! Intended to eventually solve #17.

My idea was to first extend the existing SVD tests to verify that the singular values were sorted. Once these were failing, I'd go ahead and implement the sort, which should restore them back to a passing state. As it happened, for the existing test data, the SVD algorithm in use already returned sorted singular values, so I constructed some new matrices for which I'd expect the algorithm not to return sorted singular values. However, it seems that for these test matrices, the SVD algorithm does not complete (i.e. it is stuck in an endless loop).

Hence, as long as the SVD algorithm does not finish, I'm also unable to complete the feature in question. I'm not at all familiar with the algorithm in use, so I can't really debug it without spending a lot of time diving into the algorithm and how it works. Unfortunately, at the moment, that is time I don't have, so I wanted to put the PR up here in case anybody has any idea how to fix the SVD.

See also commit messages.

Interestingly, the singular values just happen to be
ordered already for the tall and short matrices,
while they are not ordered correctly
for the square example. Before we implement the
sorting of the singular values, we at least want these tests
to fail, so we must construct new matrix examples
for which the singular values hopefully come out
unsorted.
Before implementing the sort functionality,
it is useful to have some tests that fail because
the singular values are not sorted. However,
it appears that the computation of the SVD
for these examples does not even (currently) finish!
@AtheMathmo (Owner)
Thanks for this! I was worried this might be the case somewhere... The algorithm should theoretically terminate but I'll have to dig around to figure out why it isn't - fortunately you've given me examples to test!

I suspect it is either numerical inaccuracy causing the break condition to not be met or a minor bug somewhere. I'll try to figure out which and get a fix in - it might be as simple as a safety check to stop us looping infinitely and return an Error instead.

@AtheMathmo (Owner) commented Sep 18, 2016

Hey @Andlon.

Good news! I'm pretty confident I found the cause of the infinite looping and it is an easy bug to fix. I'm happy for you to roll it into this PR. The offending line is number 350:

// This is incorrect
diag_abs_sum = T::min_positive_value() *
    (b_ii.abs() + *b.get_unchecked([i + 1, i + 1]));

// It should be:
diag_abs_sum = T::min_positive_value() *
    (b_ii.abs() + b.get_unchecked([i + 1, i + 1]).abs());

This is detailed in algorithm 1b in the source for this implementation.

Let me know how you want to handle this.

@AtheMathmo (Owner)
As a side note - we can also speed up convergence if we use a more sensible value for epsilon. Right now I'm using T::min_positive_value(). I put a PR in for the num crate which would give us a nicer alternative.

@Andlon (Collaborator, Author) commented Sep 19, 2016

That was quick. Great job! Thanks for looking into it.

I guess the easiest thing to do is for me to make the change and just credit you in the commit message? Or, if you want this fix in as soon as possible, you can include it separately and I can rebase. I might have some time tonight to continue work on this. If not, I should have some time during the week.

Also, your PR to num sounds useful indeed. Let's hope they're able to get it in within reasonable time (I see that they're discussing the fact that it's a breaking change).

@Andlon (Collaborator, Author) commented Sep 19, 2016

Also, a bit tangential, but I didn't want to open a new issue just yet. Currently, when writing tests, it's rather painful to compare approximate equality for matrices and vectors. I used the same style as I've seen in the existing code, which is to use iterators to check that the absolute difference for each element is less than some fairly arbitrary threshold.

I think we should have a standardized way of doing this, probably based on ULPS such that we can do something to the effect of

assert_approx_eq!(expected_matrix, actual_matrix);

The approx crate seems to be a good candidate. We should also be able to expose this behavior to the users of rulinalg, so that they can also easily use this functionality in their own tests. If we choose to use such a crate, we should probably re-export it within rulinalg, such that it can more easily be replaced if necessary.

What are your thoughts on this? In any case, I don't need it to complete the current issue, but it would certainly make testing more straightforward.
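For illustration, a minimal sketch of the element-wise comparison idea; the helper name and the absolute-tolerance check are hypothetical, not rulinalg's actual API, and a real version would likely be ULPS-based (e.g. via the approx crate):

// Hypothetical test helper: compare two equally-sized slices element-wise
// against an absolute tolerance, panicking with a useful message on mismatch.
fn assert_slices_approx_eq(expected: &[f64], actual: &[f64], tol: f64) {
    assert_eq!(expected.len(), actual.len(), "length mismatch");
    for (i, (e, a)) in expected.iter().zip(actual.iter()).enumerate() {
        assert!((e - a).abs() < tol,
                "element {} differs: expected {}, got {}", i, e, a);
    }
}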

@AtheMathmo (Owner)
I'm happy to leave the bug fix to you for now - if it turns out that you don't have as much time as you thought I'll get it in separately.

I agree that approximate equality would be a useful thing! I'll try to remember to write up an issue for that tomorrow.

Thanks to @AtheMathmo for figuring out the issue.
See AtheMathmo#46 for
a discussion of the problem.
@Andlon (Collaborator, Author) commented Sep 19, 2016

So I had a tiny bit of time to look more into it. For the short and tall tests, we expect singular values
[8.0, 6.0, 4.0, 2.0]. However, it turns out that the algorithm spits out [8.0, 6.0, 2.0, -4.0] (note the negative sign). Despite this, it passes the validate_svd assertions. I think this indicates that the sign of one of the corresponding singular vectors (i.e. left or right singular vector) is, or should be, flipped somehow?

You can see this for yourself if you incorporate the one-line fix you sent me and disable the tests against expected_values (i.e. comment out the three lines after validate_svd). The tests will pass, but if you inspect the value of b, you can see the negative sign. (validate_svd does not check for non-negativity)

@AtheMathmo (Owner)
Ah - interesting...

I'll have to dig into this a little. It looks like it may be worth adding a new issue to track this bug if it's non-trivial.

@AtheMathmo (Owner)
So after a little more reading it seems that non-negative singular values are a conventional requirement. The validate_svd check passes because this is a sort-of valid SVD. I'm not sure if the sign flip is a bug in my implementation or whether we should simply enforce this at the same time as ordering the values, i.e. have a new function to enforce the convention, process_svd or something. This would order the singular values (and rows/columns) and would also make sure all singular values are positive, multiplying one of the orthogonal rows/columns by -1 too.

I'll try to figure out if the algorithm should capture this - but if not, this doesn't add too much of a performance hit and produces a conventionally valid SVD.
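To make the convention concrete, here is a minimal sketch of the sign fix on plain vectors (the function name and data layout are illustrative assumptions, not rulinalg's code): negating a singular value together with the matching column of one orthogonal factor leaves the product U * Σ * V^T unchanged.

// Hypothetical sketch: enforce non-negative singular values by flipping the
// sign of the value and of the corresponding column of one orthogonal factor.
fn enforce_nonnegative(sigma: &mut [f64], u_columns: &mut [Vec<f64>]) {
    for (i, s) in sigma.iter_mut().enumerate() {
        if *s < 0.0 {
            *s = -*s;
            for x in u_columns[i].iter_mut() {
                *x = -*x;
            }
        }
    }
}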

@Andlon (Collaborator, Author) commented Sep 20, 2016

I skimmed the article you had posted earlier, and to me it wasn't entirely clear. However, they defined the SVD using non-negative diagonal entries, so I would expect them to explicitly notify the reader that the singular values produced by the algorithm could be negative, if this were the case...

I agree that we can fix this by post-processing, however if the algorithm is actually intended to produce non-negative singular values, the behavior could also hide other, more subtle bugs.

Is it possible to break the algorithm up more into individually testable pieces? I suspect this may be difficult. I could however write some tests for the Golub-Kahan step, and verify the properties of the output as described in the aforementioned paper.

@AtheMathmo (Owner)
I skimmed the article you had posted earlier, and to me it wasn't entirely clear. However, they defined the SVD using non-negative diagonal entries

It does seem surprising to me - but they also define the singular values as being ordered, and I'm fairly confident this isn't enforced by the algorithm. Nonetheless I think it warrants a little extra reading.

Is it possible to break the algorithm up more into individually testable pieces?

Yes, we can break it up a little more. Many of the algorithms (including the one I'm using) first reduce the matrix to a bidiagonal decomposition (which is already separate). Then we block the matrix to get the part we are still working on. We zero the off-diagonal entries for zero-valued singular values. And finally we apply some inner step (like Golub-Kahan) to reduce the rest of the matrix.

I think that we can break off each of these to test them individually. But to keep code tidy this might be better done in a new module (at least until we make a decision about custom decomposition types).

@Andlon (Collaborator, Author) commented Sep 20, 2016

Ah, you make a good point. In that case - at least until we determine otherwise - I think we should assume that the sign from Golub-Reinsch is not correct, and I'll proceed with an implementation that post-processes the result of the Golub-Reinsch algorithm. I think I'll structure it like the following:

pub fn svd(...) {
    // Call self.svd_unordered() and sort the results
}

// Whether this is public or not depends on the benchmark results,
// i.e. whether the overhead turns out to be practically negligible or not
pub fn svd_unordered() {
    // Call self.svd_golub_reinsch() and post-process the results to ensure non-negativity
}

fn svd_golub_reinsch() {
    // Equivalent to what today is .svd()
}

In the long run, we need much more extensive tests. I would be interested to look into using some form of property-based testing, like QuickCheck.

Thoughts?

@AtheMathmo (Owner) commented Sep 20, 2016

So I have managed to find one more bug! One of the steps I described:

We zero off the off-diagonal of zero-valued singular values.

is currently very wrong! Instead of applying a transformation to do this, I am essentially setting the value to zero. I would be surprised if this makes a significant enough difference to cause a sign change - but nonetheless I'll try to sort it. This probably motivates a new issue for the two SVD bugs?

Feel free to continue as you are planning to for now. I (optimistically) suspect that the new bug is only affecting convergence rates and accuracy.

Edit: I couldn't crack the bug on my own (yet). So I posted a Stack Exchange question - feel free to check it out.

Edit2: Sorry I missed this part. I like the idea of using QuickCheck too - I think especially to help us cover error handling. For things like SVD and other decompositions it seems incredibly valuable.

@AtheMathmo mentioned this pull request Sep 20, 2016
@Andlon (Collaborator, Author) commented Sep 20, 2016

Sounds good. I'm not well-versed enough in the details of these algorithms to be able to offer much assistance at the moment, and in many ways diving into them is too similar to my thesis work, in the sense that I'd like the things I do for rulinalg to be mostly recreational. Hence I'd rather like to refrain from that for the time being. That said, I'll otherwise offer any assistance that I can, and I'm generally happy to work on less technical things!

@AtheMathmo (Owner)
That's totally fair! I don't like leaving contributors to clean up messes I made in my code so I'll do my best to get it fixed up :).

@Andlon (Collaborator, Author) commented Sep 20, 2016

I think leaving bugs in the implementation of a very technical and difficult algorithm does not count as "making a mess", but maybe that's just me :P

Anyway, I'll see if maybe I have a little time tonight, although I was thinking I'd finish the macro PR first, since it's so simple.

@AtheMathmo (Owner)
I think leaving bugs in the implementation of a very technical and difficult algorithm does not count as "making a mess", but maybe that's just me :P

I think you're probably right! Fortunately I figured out the answer to the Stack Exchange question - I should be able to test out the fix later on today.

Anyway, I'll see if maybe I have a little time tonight, although I was thinking I'd finish the macro PR first, since it's so simple.

Great! The macro will be really good to have in there finally.

@Andlon (Collaborator, Author) commented Sep 20, 2016

All right, I think I'm back on track with this issue now. I've implemented the sign correction, so now I can turn my mind back to the sorting of the singular values again.

I just wanted to make a mental note that we really need to document the dimensions of the returned SVD! I was very surprised when I discovered that the U and V matrices were not both square. Of course, there are many equally adequate ways of representing the SVDs - we just need to document it!

@AtheMathmo (Owner)
I just wanted to make a mental note that we really need to document the dimensions of the returned SVD!

Right you are! I had a bit of a headache figuring this out myself so it's certainly bad that I didn't document it well at the time...

@AtheMathmo (Owner)
I managed to do a little digging into the negative values and auto-sorting. I don't have any super compelling evidence - but you can read about what I found in #48.

@AtheMathmo (Owner)
I need to look a little more but I think that the code looks solid to me.

I think at this point it is worth us splitting out the svd code into a new module as well. This means modifying the decompositions module to have a folder structure too. What do you think of this?

@Andlon (Collaborator, Author) commented Sep 22, 2016

I still need to set up the benchmarks and fix up the documentation a little. Hoping to have some time to work on that today.

Yeah, definitely. The decompositions module is getting a bit unwieldy. Are you thinking something like this?

decompositions/svd.rs
              /bidiagonal.rs
              /qr.rs
              /lup.rs
              /and_so_on.rs

I think, however, that we should complete this PR and treat the reorganization as a separate issue. This seems like a prerequisite to further work on #40, perhaps...?

@AtheMathmo (Owner)
All sounds good to me. Take your time with the docs etc.

@Andlon (Collaborator, Author) commented Sep 22, 2016

All right, so here's an update. I quickly hacked together a few benchmarks, and at the moment the SVD implementation converges so slowly that it doesn't make sense to talk about sorting vs not sorting. For example, within 5-10 minutes it hadn't finished the SVD of a 100x100 matrix, which took NumPy about 0.05 seconds.

I think I'll leave out the benchmarks for now, let sorted SVDs be the default, fix up the docs, then let you review for a merge. Then we'll leave the slow convergence to #48 instead. I think after we've got a reasonably fast SVD, we can revisit the topic of whether or not the ordering should be opt-out. What do you think?

@AtheMathmo (Owner)
Ha! 5-10 minutes is pretty terrible, looks like we do indeed need some more work.

I'm hoping this is a result of our incredibly harsh stopping criterion (< T::min_positive_value() ...). You could try modifying these to be something like (< T::from(1e-8) ...). If that makes no difference then we're back to the drawing board...

Your approach sounds good though. Let's get this in even if the performance remains terrible and we'll address other issues in #48.

Thanks

@Andlon (Collaborator, Author) commented Sep 22, 2016

Ah, of course. I forgot about that! I made a quick hack with your suggested changes to change the stopping criterion. Preliminary bench results look like this:

test linalg::svd::svd_100_100                      ... bench:  39,352,118 ns/iter (+/- 4,253,824)
test linalg::svd::svd_10_10                        ... bench:      80,026 ns/iter (+/- 1,361)
test linalg::svd::svd_golub_reinsch_100_100        ... bench:  38,152,409 ns/iter (+/- 1,494,442)
test linalg::svd::svd_golub_reinsch_10_10          ... bench:      79,257 ns/iter (+/- 2,199)
test linalg::svd::svd_unordered_100_100            ... bench:  37,881,354 ns/iter (+/- 103,384)
test linalg::svd::svd_unordered_10_10              ... bench:      79,246 ns/iter (+/- 1,337)

I'll work on it a bit more and see what happens. Not sure what to make of it just yet.

Edit: svd_golub_reinsch_* is without the overhead of flipping the signs. svd_unordered is flipping signs, but not sorting the singular values. svd also sorts the singular values.

@AtheMathmo (Owner)
Well those numbers look a lot more reasonable. In fact - they seem an order of magnitude faster than NumPy? I'm pretty sure NumPy only has SVD support via LAPACK, so that would be surprising (unless Python adds that much overhead?).

Either way this is good news (providing the accuracy isn't wrecked by this change).

@Andlon (Collaborator, Author) commented Sep 22, 2016

I didn't use the same matrix for NumPy and the rulinalg benches, just the same size to get a ballpark figure, which would explain a lot. Other than that, I only ran the NumPy test once, timing it manually with time.clock() in Python. I will report back once I've had time to go more in depth, but I agree it looks a lot more promising indeed :)

@Andlon (Collaborator, Author) commented Sep 22, 2016

I hadn't even thought thoroughly about it, but of course T::min_positive_value() is something like 1e-308 for f64, which of course doesn't work. T::from(3.0 * std::f64::EPSILON).unwrap() seems to work well, but that obviously only works for f64. Without the Float::epsilon() feature you submitted to num, what do you suggest I add here as a stopgap solution?

@Andlon (Collaborator, Author) commented Sep 22, 2016

I suppose I could create a temporary trait that exposes epsilon() for f32 and f64 and document it as such until epsilon lands in num?
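Something along these lines, as a rough sketch; the trait name is hypothetical, and it would be removed once Float::epsilon() lands in num:

// Hypothetical stopgap trait: expose machine epsilon for the float types
// rulinalg supports, until Float::epsilon() is available from the num crate.
trait MachineEpsilon {
    fn epsilon() -> Self;
}

impl MachineEpsilon for f32 {
    fn epsilon() -> f32 {
        std::f32::EPSILON
    }
}

impl MachineEpsilon for f64 {
    fn epsilon() -> f64 {
        std::f64::EPSILON
    }
}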

@AtheMathmo (Owner)
I'm not sure how best to handle it for different floats. I guess a simple approach is to use f32::EPSILON and try to pick a multiple that works well for both f64 and f32? We can track in a separate issue to get the num change later. It seems that change is desired but will break the API and so must either be modified or delayed.

@AtheMathmo (Owner)
We managed to get the epsilon PR into num - hopefully that means we'll be able to utilize it soon!

The idea is to remove this once epsilon() lands
in the next version of the num crate.
Since the benchmarks test module-internal methods,
they cannot reside in the benches/ subdirectory.
However, since benchmarks must be run on nightly, this
involves some complication. Hence, in order to run these
benchmarks, you must add the 'nightly' feature to the compilation,
like so:

`cargo bench --features nightly`
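A minimal sketch of what this gating looks like; the bench body is only a placeholder, and the 'nightly' feature is assumed to be declared in Cargo.toml:

// lib.rs sketch: only compile the benches when testing with the
// (assumed) 'nightly' cargo feature on a nightly toolchain.
#![cfg_attr(all(test, feature = "nightly"), feature(test))]

#[cfg(all(test, feature = "nightly"))]
extern crate test;

#[cfg(all(test, feature = "nightly"))]
mod benches {
    use test::Bencher;

    #[bench]
    fn bench_sum(b: &mut Bencher) {
        let v: Vec<f64> = (0..1000).map(|x| x as f64).collect();
        b.iter(|| v.iter().sum::<f64>());
    }
}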
@Andlon (Collaborator, Author) commented Sep 22, 2016

I'm currently using 3 times machine epsilon, and I got these numbers:

test matrix::decomposition::tests::benches::svd_100_100           ... bench:  40,232,457 ns/iter (+/- 133,909)
test matrix::decomposition::tests::benches::svd_10_10             ... bench:      92,791 ns/iter (+/- 521)
test matrix::decomposition::tests::benches::svd_unordered_100_100 ... bench:  40,058,111 ns/iter (+/- 129,457)
test matrix::decomposition::tests::benches::svd_unordered_10_10   ... bench:      92,757 ns/iter (+/- 2,657)

It seems to me that the ordering of the singular values has little impact on performance, although to be fair this is only for a single matrix. In my opinion, we should hide svd_unordered() from the public API (which I've done so far in the implementation), at least for now. It may be worthwhile to revisit some time in the future, but for now I think we don't have sufficient tests that verify the accuracy of the computations to warrant spending a lot of time on performance optimizations.

In order to incorporate these benches, I needed access to non-public methods of the API, so I had to embed the benchmarks in the decomposition.rs file. However, as you know, the benchmarks need to be run on nightly, and so I had to add some conditional compilation steps in order to get around this. As a result, in order to run the SVD benches, you need to append --features nightly to the cargo bench command. If you have some suggestions for better ways to handle this, I'd be very happy to improve it. However, after spending quite some time researching, it seems you'll have to use feature flags of some kind at the very least, so I'm not sure how much better it can get.

I consider the feature complete now, pending your review.

@Andlon (Collaborator, Author) commented Sep 22, 2016

Oh, and, yeah, it's great that the num PR was merged :)

@AtheMathmo (Owner) left a review comment

I think that the implementation looks good - and the new tests are a lot more comprehensive, thank you!

I would like to move these benches out (and remove the ones on private functions). Adding them like this will require CI changes and is one more (albeit minor) thing to be aware of when developing. If you disagree I am happy to hear you out!

let ref mut shortest_matrix = if u.rows() <= v.rows() { &mut u }
else { &mut v };
let column_length = shortest_matrix.rows();
let num_singular_values = cmp::min(b.rows(), b.cols());
@AtheMathmo (Owner)

I believe b should be square by this point. But we can keep this in for future-proofing.

@Andlon (Collaborator, Author)

You are right. I wrote this function when I thought b was rectangular. However, the implementation as is doesn't make any assumptions about the form of the SVD, and I think (although this isn't tested) it would also work for other forms of the SVD, so we might as well keep it as is? (i.e. if we swap the Golub-Reinsch algorithm for one that outputs a different kind of SVD)

@AtheMathmo (Owner)

I agree, it makes sense to keep it in whether we need it or not.


// When correcting the signs of the singular vectors, we can choose
// to correct EITHER u or v. We make the choice depending on which matrix has the
// least number of rows. Later we will need to multiply all elements in columns by
@AtheMathmo (Owner)

Maybe I'm just tired - but don't we want to multiply the rows of v by -1 instead of the columns? b * v is basically scaling the rows of v by the respective singular values?

If this is true then it becomes a little trickier to determine the faster approach. It will probably almost always be faster to flip the sign on the row due to cache access and vectorization.


Actually I just realised that we transpose v at the end so this is correct. I'm leaving this comment to highlight the fact that we may want to modify the algorithm so that we are operating on rows (not in this PR of course).

@Andlon (Collaborator, Author)

It's also possible to collect the indices of all columns that need to be flipped, and then iterate over each row, flipping the sign of each element as we go. The same could be done for the sorting, and it would likely be faster. However, the implementation would be slightly more complex, and since the overhead seems to be more or less negligible at this point, I opted for the simpler approach of just dealing with columns directly.

As you say, we may want to revisit this in the future!
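For reference, a minimal sketch of the row-wise variant described above; the function name and the flat row-major layout are assumptions for illustration, not rulinalg's internals:

// Hypothetical helper: flip the sign of selected columns while walking the
// matrix row by row, which is friendlier to row-major storage than flipping
// one whole column at a time.
fn flip_columns_row_wise(data: &mut [f64], rows: usize, cols: usize, flip: &[bool]) {
    assert_eq!(data.len(), rows * cols);
    assert_eq!(flip.len(), cols);
    for r in 0..rows {
        for c in 0..cols {
            if flip[c] {
                data[r * cols + c] = -data[r * cols + c];
            }
        }
    }
}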

@AtheMathmo (Owner)

Sounds good to me!

@Andlon (Collaborator, Author)

Although I just realized that a 100 x 100 matrix (which I used in my benchmarks) would still fit in L2 cache. Guess we should do some tests for matrices so big that they also don't fit in L3? Perhaps the results would be significantly different then...

@AtheMathmo (Owner)

Let's try that too. In particular the case n x m where n >> m, e.g. 10000x10. This is a bit more representative of cases we may see in rusty-machine.

@Andlon (Collaborator, Author) commented Sep 23, 2016

Will do that next chance I get. Thinking a little about it though, I suspect that we have the following performance characteristics. First, consider that if m >= n, SVD is usually something like O(m n^2) flops or so, and I'm not sure about memory accesses of Golub-Reinsch. The ordering of the singular vectors is roughly O(n^2) memory accesses in the worst case. Based on this and our current benchmarks, I would expect the following:

  • For small matrices, the matrix fits in cache and the column swaps are therefore still fast enough so that the cost is negligible.
  • For larger matrices for which the matrix does not fit in cache, the column swaps are presumably more expensive (though I expect due to pre-fetching it's perhaps not too bad, since the stride is uniform?), but the cost of the ordering is completely dwarfed by the complexity of the SVD computation.

Looking forward to trying the benchmarks out!

By the way, you may find the following interesting. I spent Fall 2014 and Spring 2015 at UC Berkeley. Among other things, I took some courses under Prof. James Demmel. Prof. Demmel is a leading figure in the research of what they call "communication-avoiding algorithms". In essence, the goal of these algorithms is to minimize not just the number of flops required for an algorithm; using a simple mathematical model of cache behavior, they also provably minimize (again in a big-O sense) the amount of communication, i.e. the number of expensive memory accesses. I mention this because I remember that at some point Prof. Demmel referred me to certain papers on these algorithms. Their publications can be found here. I think in particular this survey paper may be of interest. It's fairly up to date (published in 2014) and discusses the communication optimality of various algorithms in linear algebra.

However, I want again to stress that we should ensure correctness of the algorithms in rulinalg (by extensive testing against expected error bounds etc.) before this should become a topic. Still, thought I'd mention it, because I think there's a lot of interesting stuff in there :)

@AtheMathmo (Owner)

I must have missed this comment before. Thank you for pointing out those papers! They look like really interesting reads - though at a first glance pretty challenging. The survey paper looks like a really good start, it looks like I could learn a lot from it.

// Sorting a vector of indices simultaneously with the singular values
// gives us a mapping between old and new (final) column indices.
indexed_sorted_values.sort_by(|&(_, ref x), &(_, ref y)|
x.partial_cmp(y).expect("All singular values should be finite, and thus sortable.")
@AtheMathmo (Owner)

Would this panic be user error or an algorithmic failure? (Or both I guess...)

I'm wondering whether it is better to propagate this by returning a Result. It feels like a "something has gone terribly wrong" though so the panic is probably appropriate.

@Andlon (Collaborator, Author)

My reasoning for a panic was this:

We are using the successful result of svd_unordered (which in turn uses the result of svd_golub_reinsch), which means that if the singular values are not all finite, it must be a bug in one of these functions, in which case I think a panic is appropriate. Does this make sense to you?

@AtheMathmo (Owner)

Yes that does make sense, just wanted to check!
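For context, a minimal sketch of the index-sorting idea under discussion; the function name and the descending sort order are illustrative assumptions, not the PR's exact code:

// Sort (index, value) pairs by descending value; the resulting index order is
// the permutation that must also be applied to the corresponding columns of
// the singular-vector matrices.
fn sorted_order(values: &[f64]) -> (Vec<f64>, Vec<usize>) {
    let mut indexed: Vec<(usize, f64)> = values.iter().cloned().enumerate().collect();
    indexed.sort_by(|&(_, x), &(_, y)| {
        y.partial_cmp(&x)
            .expect("All singular values should be finite, and thus sortable.")
    });
    let order: Vec<usize> = indexed.iter().map(|&(i, _)| i).collect();
    let sorted: Vec<f64> = indexed.iter().map(|&(_, v)| v).collect();
    (sorted, order)
}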

@@ -1297,4 +1485,55 @@ mod tests {

let _ = a.lup_decomp();
}

#[cfg(all(test, feature = "nightly"))]
mod benches {
@AtheMathmo (Owner)

I would prefer to move these benches into the main benches folder and just avoid benching the svd_unordered. If we ever want to add this function to the public api we can add the benches then.

Of course it is nice to have it benchmarked but in this case I'd prefer to leave it out due to other feature changes. These mean I have to modify the CI for example.

@Andlon (Collaborator, Author)

I agree, it isn't worth the added complexity. I added it partly because I thought it would be useful to have a bench harness also for private functions. But I suppose this use case is few and far between at the moment. In any case, we should probably be more concerned with correctness than maximum performance going forward anyway. Perhaps by the time we need more fine-grained benchmarks again, there will be a benchmarking harness in stable.

I'll remove all the module internal benchmarks as you say!

@Andlon (Collaborator, Author) commented Sep 23, 2016

OK, so I tried running 10000x10, and it wouldn't finish in reasonable time. I figured maybe the benchmarking harness was running it a lot of times, so I made an example application that runs SVD a single time and measures the time. I'm seeing some interesting things. I have the following examples (this is just one run without any averaging, so they're not very accurate):

$ cargo run --release <rows> <cols>
...

$ cargo run --release 100 100
Computed ordered SVD in:   0.08540 seconds.
Computed unordered SVD in: 0.06431 seconds.

$ cargo run --release 1000 10
Computed ordered SVD in:   1.76212 seconds.
Computed unordered SVD in: 1.67813 seconds.

$ cargo run --release 10 1000
Computed ordered SVD in:   1.75135 seconds.
Computed unordered SVD in: 1.67570 seconds.

$ cargo run --release 1000 100
Computed ordered SVD in:   16.38357 seconds.
Computed unordered SVD in: 16.50339 seconds.

$ cargo run --release 100 1000
Computed ordered SVD in:   16.70569 seconds.
Computed unordered SVD in: 16.55411 seconds.

$ cargo run --release 10 100
Computed ordered SVD in:   0.00540 seconds.
Computed unordered SVD in: 0.00548 seconds.

$ cargo run --release 1 1000
Computed ordered SVD in:   0.24941 seconds.
Computed unordered SVD in: 0.17086 seconds.

$ cargo run --release 1 2000
Computed ordered SVD in:   1.36840 seconds.
Computed unordered SVD in: 1.31257 seconds.

$ cargo run --release 1 4000
Computed ordered SVD in:   10.30131 seconds.
Computed unordered SVD in: 10.25298 seconds.

$ cargo run --release 1 8000
Computed ordered SVD in:   81.76079 seconds.
Computed unordered SVD in: 81.26149 seconds.

It seems the performance scaling is rather less than optimal. I guess this might be due to the incorrect handling of the Givens rotations you were talking about?

Instead just bench the public .svd() method along
with the other benchmarks.
@Andlon (Collaborator, Author) commented Sep 23, 2016

@AtheMathmo: OK, I removed and moved out the benches as you requested. Think that should settle it, unless you see anything else that needs to be done!

@AtheMathmo (Owner)
Thanks for getting those timings. I agree that the scaling is a little worrying - but not totally unexpected. Did you try running 10000x10 out of the benching harness?

I'll look this over again quickly later today and get it merged. Thank you for all your hard work with it - I'm sorry also that the issue turned into much more than it began as!

@Andlon (Collaborator, Author) commented Sep 24, 2016

Thanks for getting those timings. I agree that the scaling is a little worrying - but not totally unexpected. Did you try running 10000x10 out of the benching harness?

Yeah, but it didn't finish in reasonable time (I didn't have the patience to let it run longer than 10 mins or so, sorry!).

I'll look this over again quickly later today and get it merged. Thank you for all your hard work with it - I'm sorry also that the issue turned into much more than it began as!

No need to be sorry. That is the nature of development :)

@Andlon (Collaborator, Author) commented Sep 24, 2016

By the way, note that for the tests where the dimensions are 1xN, the time scales almost perfectly with N^3. At most we would expect N^2 here.

let num_singular_values = cmp::min(b.rows(), b.cols());

for i in 0 .. num_singular_values {
if b[[i, i]] < T::zero() {
@AtheMathmo (Owner)

It's very minor but we can remove some bound checks here by using the get_unchecked functions. This will have a very negligible effect though so I'm happy to merge without.

@Andlon (Collaborator, Author)

I considered this, but I came to the conclusion that the effect here would presumably be completely negligible, in which case I think going unsafe is not worth it. I think it makes more sense to only use unsafe if you know that you have a potential bottleneck, preferably after profiling. Otherwise this would just fall into the category of premature optimization...

@AtheMathmo (Owner)

That sounds sensible - let's keep it as the safe bound checked version.

/// non-increasing order. U and V have orthonormal columns, and each column represents the
/// left and right singular vectors for the corresponding singular value in Σ, respectively.
///
/// Denoting the dimensions of self as M x N (rows x cols), the dimensions of the returned matrices
@AtheMathmo (Owner)

I think that there is a minor mistake here in the dimensions listed below. I think that Σ should be square with rows equal to the minimum of M and N. If the minimum is M then I think that U is M x M and V is N x M.

@Andlon (Collaborator, Author)

Oh! Yeah, you're completely right. We transpose the matrix before applying Golub-Reinsch. I only read the description in the paper you mentioned, and there it is assumed that m >= n, which I had forgotten. Will fix immediately.

@AtheMathmo (Owner)

Great, thanks!

You may want to verify that the dimensions I gave aren't wrong. I only skimmed the source without checking the output (I'll have some time to check them properly later today if you can't).

@Andlon (Collaborator, Author)

Yeah, I checked it by hand, as well as in the code. I think you are correct (it's easy to confuse). In the end I chose the verbose way of documenting it, as I found that to be clearer. Let me know if you want it phrased differently.

@AtheMathmo (Owner)

I think the verbose way is much better - it is pretty complex so taking more space to explain is sensible.

All looks good now so I'll merge. Thanks again for your work on this!

@AtheMathmo (Owner)
I just have two minor comments - only one of which needs a response (the one about the doc dimensions).

If I'm mistaken on that comment I'm happy to merge now.

@AtheMathmo merged commit e902890 into AtheMathmo:master on Sep 24, 2016
theotherphil pushed a commit to theotherphil/rulinalg that referenced this pull request Dec 4, 2016
* correct eigenmethods for 2-by-2 (and 1-by-1) matrices

In issue AtheMathmo#46, it was observed that the `eigenvalue` and `eigendecomp`
methods could give incorrect results. Rusty-machine creator and BDFL
James Lucas noted that the problem was specific to 2-by-2 matrices and
suggested special-casing them.

* private methods for general and small matrix eigencomputations, test

Conflicts:
	rusty-machine/src/linalg/matrix/decomposition.rs
	rusty-machine/src/linalg/matrix/mod.rs