Sort singular values #46

Merged
13 commits merged into AtheMathmo:master on Sep 24, 2016

Conversation

@Andlon (Collaborator) commented Sep 18, 2016

Note: This is not a complete implementation! Intended to eventually solve #17.

My idea was to first extend the existing SVD tests to verify that the singular values were sorted. Once these were failing, I'd go ahead and implement the sort, which should restore them back to a passing state. As it happened, for the existing test data, the SVD algorithm in use already returned sorted singular values, so I constructed some new matrices for which I'd expect the algorithm not to return sorted singular values. However, it seems that for these test matrices, the SVD algorithm does not complete (i.e. it is stuck in an endless loop).

Hence, as long as the SVD algorithm does not finish, I'm also unable to complete the feature in question. I'm not at all familiar with the algorithm in use, so I can't really debug it without spending a lot of time diving into the algorithm and how it works. Unfortunately, at the moment, that is time I don't have, so I wanted to put the PR up here in case anybody has any idea how to fix the SVD.

See also commit messages.

Interestingly, the singular values just happen to be
ordered already for the tall and short matrices,
while they are not ordered correctly
for the square example. Before we implement the
sorting of the singular values, we at least want these tests
to fail, so we must construct new matrix examples
for which the singular values hopefully come out
unsorted.
Before implementing the sort functionality,
it is useful to have some tests that fail because
the singular values are not sorted. However,
it appears that the computation of the SVD
for these examples does not even (currently) finish!
@AtheMathmo (Owner)
Thanks for this! I was worried this might be the case somewhere... The algorithm should theoretically terminate but I'll have to dig around to figure out why it isn't - fortunately you've given me examples to test!

I suspect it is either numerical inaccuracy causing the break condition to not be met or a minor bug somewhere. I'll try to figure out which and get a fix in - it might be as simple as a safety check to stop us looping infinitely and return an Error instead.

@AtheMathmo (Owner) commented Sep 18, 2016

Hey @Andlon.

Good news! I'm pretty confident I found the cause of the infinite looping and it is an easy bug to fix. I'm happy for you to roll it into this PR. The offending line is number 350:

// This is incorrect
diag_abs_sum = T::min_positive_value() *
    (b_ii.abs() + *b.get_unchecked([i + 1, i + 1]));

// It should be:
diag_abs_sum = T::min_positive_value() *
    (b_ii.abs() + b.get_unchecked([i + 1, i + 1]).abs());

This is detailed in algorithm 1b in the source for this implementation.

Let me know how you want to handle this.

@AtheMathmo (Owner)
As a side note - we can also speed up convergence if we use a more sensible value for epsilon. Right now I'm using T::min_positive_value(). I put a PR in for the num crate which would give us a nicer alternative.

@Andlon (Collaborator, Author) commented Sep 19, 2016

That was quick. Great job! Thanks for looking into it.

I guess the easiest thing to do is for me to make the change and just credit you in the commit message? Or, if you want this fix in as soon as possible, you can include it separately and I can rebase. I might have some time tonight to continue work on this. If not, I should have some time during the week.

Also, your PR to num sounds useful indeed. Let's hope they're able to get it in within reasonable time (I see that they're discussing the fact that it's a breaking change).

@Andlon (Collaborator, Author) commented Sep 19, 2016

Also, a bit tangential, but I didn't want to open a new issue just yet. Currently, when writing tests, it's rather painful to compare approximate equality for matrices and vectors. I used the same style as I've seen in the existing code, which is to use iterators to check that the absolute difference for each element is less than some fairly arbitrary threshold.

I think we should have a standardized way of doing this, probably based on ULPS such that we can do something to the effect of

assert_approx_eq!(expected_matrix, actual_matrix);

The approx crate seems to be a good candidate. We should also be able to expose this behavior to the users of rulinalg, so that they can also easily use this functionality in their own tests. If we choose to use such a crate, we should probably re-export it within rulinalg, such that it can more easily be replaced if necessary.

What are your thoughts on this? In any case, I don't need it to complete the current issue, but it would certainly make testing more straightforward.
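For illustration, a minimal sketch of the element-wise comparison idea; the helper name and the absolute-tolerance check are hypothetical, not rulinalg's actual API, and a real version would likely be ULPS-based (e.g. via the approx crate):

// Hypothetical test helper: compare two equally-sized slices element-wise
// against an absolute tolerance, panicking with a useful message on mismatch.
fn assert_slices_approx_eq(expected: &[f64], actual: &[f64], tol: f64) {
    assert_eq!(expected.len(), actual.len(), "length mismatch");
    for (i, (e, a)) in expected.iter().zip(actual.iter()).enumerate() {
        assert!((e - a).abs() < tol,
                "element {} differs: expected {}, got {}", i, e, a);
    }
}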

@AtheMathmo (Owner)
I'm happy to leave the bug fix to you for now - if it turns out that you don't have as much time as you thought I'll get it in separately.

I agree that approximate equality would be a useful thing! I'll try to remember to write up an issue for that tomorrow.

Thanks to @AtheMathmo for figuring out the issue.
See AtheMathmo#46 for
a discussion of the problem.
@Andlon (Collaborator, Author) commented Sep 19, 2016

So I had a tiny bit of time to look more into it. For the short and tall tests, we expect singular values
[8.0, 6.0, 4.0, 2.0]. However, it turns out that the algorithm spits out [8.0, 6.0, 2.0, -4.0] (note the negative sign). Despite this, it passes the validate_svd assertions. I think this indicates that the sign of one of the corresponding singular vectors (i.e. left or right singular vector) is, or should be, flipped somehow?

You can see this for yourself if you incorporate the one-line fix you sent me and disable the tests against expected_values (i.e. comment out the three lines after validate_svd). The tests will pass, but if you inspect the value of b, you can see the negative sign. (validate_svd does not check for non-negativity)

@AtheMathmo (Owner)
Ah - interesting...

I'll have to dig into this a little. It looks like it may be worth adding a new issue to track this bug if it's non-trivial.

@AtheMathmo (Owner)
So after a little more reading it seems that non-negative singular values are a conventional requirement. The validate_svd check passes because this is a sort-of valid SVD. I'm not sure if the sign flip is a bug in my implementation or whether we should simply enforce this at the same time as ordering the values, i.e. have a new function to enforce the convention, process_svd or something. This would order the singular values (and rows/columns) and would also make sure all singular values are positive, multiplying one of the orthogonal rows/columns by -1 too.

I'll try to figure out if the algorithm should capture this - but if not, this doesn't add too much of a performance hit and produces a conventionally valid SVD.
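To make the convention concrete, here is a minimal sketch of the sign fix on plain vectors (the function name and data layout are illustrative assumptions, not rulinalg's code): negating a singular value together with the matching column of one orthogonal factor leaves the product U * Σ * V^T unchanged.

// Hypothetical sketch: enforce non-negative singular values by flipping the
// sign of the value and of the corresponding column of one orthogonal factor.
fn enforce_nonnegative(sigma: &mut [f64], u_columns: &mut [Vec<f64>]) {
    for (i, s) in sigma.iter_mut().enumerate() {
        if *s < 0.0 {
            *s = -*s;
            for x in u_columns[i].iter_mut() {
                *x = -*x;
            }
        }
    }
}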

@Andlon (Collaborator, Author) commented Sep 20, 2016

I skimmed the article you had posted earlier, and to me it wasn't entirely clear. However, they defined the SVD using non-negative diagonal entries, so I would expect them to explicitly notify the reader that the singular values produced by the algorithm could be negative, if this were the case...

I agree that we can fix this by post-processing, however if the algorithm is actually intended to produce non-negative singular values, the behavior could also hide other, more subtle bugs.

Is it possible to break the algorithm up more into individually testable pieces? I suspect this may be difficult. I could however write some tests for the Golub-Kahan step, and verify the properties of the output as described in the aforementioned paper.

@AtheMathmo (Owner)
I skimmed the article you had posted earlier, and to me it wasn't entirely clear. However, they defined the SVD using non-negative diagonal entries

It does seem surprising to me - but they also define the singular values as being ordered, and I'm fairly confident this isn't enforced by the algorithm. Nonetheless I think it warrants a little extra reading.

Is it possible to break the algorithm up more into individually testable pieces?

Yes, we can break it up a little more. Many of the algorithms (including the one I'm using) first reduce the matrix to a bidiagonal decomposition (which is already separate). Then we block the matrix to get the part we are still working on. We zero the off-diagonal entries for zero-valued singular values. And finally we apply some inner step (like Golub-Kahan) to reduce the rest of the matrix.

I think that we can break off each of these to test them individually. But to keep code tidy this might be better done in a new module (at least until we make a decision about custom decomposition types).

@Andlon (Collaborator, Author) commented Sep 20, 2016

Ah, you make a good point. In that case - at least until we determine otherwise - I think we should assume that the sign from Golub-Reinsch is not correct, and I'll proceed with an implementation that post-processes the result of the Golub-Reinsch algorithm. I think I'll structure it like the following:

pub fn svd(...) {
    // Call self.svd_unordered() and sort the results
}

// Whether this is public or not depends on the benchmark results,
// i.e. whether the overhead turns out to be practically negligible or not
pub fn svd_unordered() {
    // Call self.svd_golub_reinsch() and post-process the results to ensure non-negativity
}

fn svd_golub_reinsch() {
    // Equivalent to what today is .svd()
}

In the long run, we need much more extensive tests. I would be interested to look into using some form of property-based testing, like QuickCheck.

Thoughts?

@AtheMathmo (Owner) commented Sep 20, 2016

So I have managed to find one more bug! One of the steps I described:

We zero off the off-diagonal of zero-valued singular values.

is currently very wrong! Instead of applying a transformation to do this, I am essentially setting the value to zero. I would be surprised if this makes a significant enough difference to cause a sign change - but nonetheless I'll try to sort it. This probably motivates a new issue for the two SVD bugs?

Feel free to continue as you are planning to for now. I (optimistically) suspect that the new bug is only affecting convergence rates and accuracy.

Edit: I couldn't crack the bug on my own (yet). So I posted a Stack Exchange question - feel free to check it out.

Edit2: Sorry I missed this part. I like the idea of using QuickCheck too - I think especially to help us cover error handling. For things like SVD and other decompositions it seems incredibly valuable.

@AtheMathmo mentioned this pull request Sep 20, 2016
@Andlon (Collaborator, Author) commented Sep 20, 2016

Sounds good. I'm not well-versed enough in the details of these algorithms to be able to offer much assistance at the moment, and in many ways diving into them is too similar to my thesis work, in the sense that I'd like the things I do for rulinalg to be mostly recreational. Hence I'd rather like to refrain from that for the time being. That said, I'll otherwise offer any assistance that I can, and I'm generally happy to work on less technical things!

@AtheMathmo (Owner)
That's totally fair! I don't like leaving contributors to clean up messes I made in my code so I'll do my best to get it fixed up :).

@Andlon (Collaborator, Author) commented Sep 20, 2016

I think leaving bugs in the implementation of a very technical and difficult algorithm does not count as "making a mess", but maybe that's just me :P

Anyway, I'll see if maybe I have a little time tonight, although I was thinking I'd finish the macro PR first, since it's so simple.

@AtheMathmo (Owner)
I think leaving bugs in the implementation of a very technical and difficult algorithm does not count as "making a mess", but maybe that's just me :P

I think you're probably right! Fortunately I figured out the answer to the Stack Exchange question - I should be able to test out the fix later on today.

Anyway, I'll see if maybe I have a little time tonight, although I was thinking I'd finish the macro PR first, since it's so simple.

Great! The macro will be really good to have in there finally.

@Andlon (Collaborator, Author) commented Sep 20, 2016

All right, I think I'm back on track with this issue now. I've implemented the sign correction, so now I can turn my mind back to the sorting of the singular values again.

I just wanted to make a mental note that we really need to document the dimensions of the returned SVD! I was very surprised when I discovered that the U and V matrices were not both square. Of course, there are many equally adequate ways of representing the SVDs - we just need to document it!

@AtheMathmo (Owner)
I just wanted to make a mental note that we really need to document the dimensions of the returned SVD!

Right you are! I had a bit of a headache figuring this out myself so it's certainly bad that I didn't document it well at the time...

@AtheMathmo (Owner)
I managed to do a little digging into the negative values and auto-sorting. I don't have any super compelling evidence - but you can read about what I found in #48.

@AtheMathmo (Owner)
I need to look a little more but I think that the code looks solid to me.

I think at this point it is worth us splitting out the svd code into a new module as well. This means modifying the decompositions module to have a folder structure too. What do you think of this?

@Andlon (Collaborator, Author) commented Sep 22, 2016

I still need to set up the benchmarks and fix up the documentation a little. Hoping to have some time to work on that today.

Yeah, definitely. The decompositions module is getting a bit unwieldy. Are you thinking something like this?

decompositions/svd.rs
              /bidiagonal.rs
              /qr.rs
              /lup.rs
              /and_so_on.rs

I think, however, that we should complete this PR and treat the reorganization as a separate issue. This seems like a prerequisite to further work on #40, perhaps...?

@AtheMathmo (Owner)
All sounds good to me. Take your time with the docs etc.

@Andlon (Collaborator, Author) commented Sep 22, 2016

All right, so here's an update. I quickly hacked together a few benchmarks, and at the moment the SVD implementation converges so slowly that it doesn't make sense to talk about sorting vs not sorting. For example, within 5-10 minutes it hadn't finished the SVD of a 100x100 matrix, which took NumPy about 0.05 seconds.

I think I'll leave out the benchmarks for now, let sorted SVDs be the default, fix up the docs, then let you review for a merge. Then we'll leave the slow convergence to #48 instead. I think after we've got a reasonably fast SVD, we can revisit the topic of whether or not the ordering should be opt-out. What do you think?

@AtheMathmo (Owner)
Ha! 5-10 minutes is pretty terrible, looks like we do indeed need some more work.

I'm hoping this is a result of our incredibly harsh stopping criterion (< T::min_positive_value() ...). You could try modifying these to be something like (< T::from(1e-8) ...). If that makes no difference then we're back to the drawing board...

Your approach sounds good though. Let's get this in even if the performance remains terrible and we'll address other issues in #48.

Thanks

@Andlon (Collaborator, Author) commented Sep 22, 2016

Ah, of course. I forgot about that! I made a quick hack with your suggested changes to change the stopping criterion. Preliminary bench results look like this:

test linalg::svd::svd_100_100                      ... bench:  39,352,118 ns/iter (+/- 4,253,824)
test linalg::svd::svd_10_10                        ... bench:      80,026 ns/iter (+/- 1,361)
test linalg::svd::svd_golub_reinsch_100_100        ... bench:  38,152,409 ns/iter (+/- 1,494,442)
test linalg::svd::svd_golub_reinsch_10_10          ... bench:      79,257 ns/iter (+/- 2,199)
test linalg::svd::svd_unordered_100_100            ... bench:  37,881,354 ns/iter (+/- 103,384)
test linalg::svd::svd_unordered_10_10              ... bench:      79,246 ns/iter (+/- 1,337)

I'll work on it a bit more and see what happens. Not sure what to make of it just yet.

Edit: svd_golub_reinsch_* is without the overhead of flipping the signs. svd_unordered is flipping signs, but not sorting the singular values. svd also sorts the singular values.

@AtheMathmo (Owner)
Well those numbers look a lot more reasonable. In fact - they seem an order of magnitude faster than NumPy? I'm pretty sure NumPy only has SVD support via LAPACK, so that would be surprising (unless Python adds that much overhead?).

Either way this is good news (providing the accuracy isn't wrecked by this change).

@Andlon (Collaborator, Author) commented Sep 22, 2016

I didn't use the same matrix for NumPy and the rulinalg benches, just the same size to get a ballpark figure, which would explain a lot. Other than that, I only ran the NumPy test once, timing it manually with time.clock() in Python. I will report back once I've had time to go more in depth, but I agree it looks a lot more promising indeed :)

@Andlon (Collaborator, Author) commented Sep 22, 2016

I hadn't even thought thoroughly about it, but of course T::min_positive_value() is something like 1e-308 for f64, which of course doesn't work. T::from(3.0 * std::f64::EPSILON).unwrap() seems to work well, but that obviously only works for f64. Without the Float::epsilon() feature you submitted to num, what do you suggest I add here as a stopgap solution?

@Andlon (Collaborator, Author) commented Sep 22, 2016

I suppose I could create a temporary trait that exposes epsilon() for f32 and f64 and document it as such until epsilon lands in num?
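Something along these lines, as a rough sketch; the trait name is hypothetical, and it would be removed once Float::epsilon() lands in num:

// Hypothetical stopgap trait: expose machine epsilon for the float types
// rulinalg supports, until Float::epsilon() is available from the num crate.
trait MachineEpsilon {
    fn epsilon() -> Self;
}

impl MachineEpsilon for f32 {
    fn epsilon() -> f32 {
        std::f32::EPSILON
    }
}

impl MachineEpsilon for f64 {
    fn epsilon() -> f64 {
        std::f64::EPSILON
    }
}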

@AtheMathmo (Owner)
I'm not sure how best to handle it for different floats. I guess a simple approach is to use f32::EPSILON and try to pick a multiple that works well for both f64 and f32? We can track in a separate issue to get the num change later. It seems that change is desired but will break the API and so must either be modified or delayed.

@AtheMathmo (Owner)
We managed to get the epsilon PR into num - hopefully that means we'll be able to utilize it soon!

The idea is to remove this once epsilon() lands
in the next version of the num crate.
Since the benchmarks test module-internal methods,
they cannot reside in the benches/ subdirectory.
However, since benchmarks must be run on nightly, this
involves some complication. Hence, in order to run these
benchmarks, you must add the 'nightly' feature to the compilation,
like so:

`cargo bench --features nightly`
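A minimal sketch of what this gating looks like; the bench body is only a placeholder, and the 'nightly' feature is assumed to be declared in Cargo.toml:

// lib.rs sketch: only compile the benches when testing with the
// (assumed) 'nightly' cargo feature on a nightly toolchain.
#![cfg_attr(all(test, feature = "nightly"), feature(test))]

#[cfg(all(test, feature = "nightly"))]
extern crate test;

#[cfg(all(test, feature = "nightly"))]
mod benches {
    use test::Bencher;

    #[bench]
    fn bench_sum(b: &mut Bencher) {
        let v: Vec<f64> = (0..1000).map(|x| x as f64).collect();
        b.iter(|| v.iter().sum::<f64>());
    }
}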
@Andlon (Collaborator, Author) commented Sep 22, 2016

I'm currently using 3 times machine epsilon, and I got these numbers:

test matrix::decomposition::tests::benches::svd_100_100           ... bench:  40,232,457 ns/iter (+/- 133,909)
test matrix::decomposition::tests::benches::svd_10_10             ... bench:      92,791 ns/iter (+/- 521)
test matrix::decomposition::tests::benches::svd_unordered_100_100 ... bench:  40,058,111 ns/iter (+/- 129,457)
test matrix::decomposition::tests::benches::svd_unordered_10_10   ... bench:      92,757 ns/iter (+/- 2,657)

It seems to me that the ordering of the singular values has little impact on performance, although to be fair this is only for a single matrix. In my opinion, we should hide svd_unordered() from the public API (which I've done so far in the implementation), at least for now. It may be worthwhile to revisit some time in the future, but for now I think we don't have sufficient tests that verify the accuracy of the computations to warrant spending a lot of time on performance optimizations.

In order to incorporate these benches, I needed access to non-public methods of the API, so I had to embed the benchmarks in the decomposition.rs file. However, as you know, the benchmarks need to be run on nightly, and so I had to add some conditional compilation steps in order to get around this. As a result, in order to run the SVD benches, you need to append --features nightly to the cargo bench command. If you have some suggestions for better ways to handle this, I'd be very happy to improve it. However, after spending quite some time researching, it seems you'll have to use feature flags of some kind at the very least, so I'm not sure how much better it can get.

I consider the feature complete now, pending your review.

@Andlon (Collaborator, Author) commented Sep 22, 2016

Oh, and, yeah, it's great that the num PR was merged :)

@AtheMathmo (Owner) left a review comment

I think that the implementation looks good - and the new tests are a lot more comprehensive, thank you!

I would like to move these benches out (and remove the ones on private functions). Adding them like this will require CI changes and is one more (albeit minor) thing to be aware of when developing. If you disagree I am happy to hear you out!

let ref mut shortest_matrix = if u.rows() <= v.rows() { &mut u }
else { &mut v };
let column_length = shortest_matrix.rows();
let num_singular_values = cmp::min(b.rows(), b.cols());
@AtheMathmo (Owner)

I believe b should be square by this point. But we can keep this in for future-proofing.

@Andlon (Collaborator, Author)

You are right. I wrote this function when I thought b was rectangular. However, the implementation as is doesn't make any assumptions about the form of the SVD, and I think (although this isn't tested) it would also work for other forms of the SVD, so we might as well keep it as is? (i.e. if we swap the Golub-Reinsch algorithm for one that outputs a different kind of SVD)

@AtheMathmo (Owner)

I agree, it makes sense to keep it in whether we need it or not.


// When correcting the signs of the singular vectors, we can choose
// to correct EITHER u or v. We make the choice depending on which matrix has the
// least number of rows. Later we will need to multiply all elements in columns by
@AtheMathmo (Owner)

Maybe I'm just tired - but don't we want to multiply the rows of v by -1 instead of the columns? b * v is basically scaling the rows of v by the respective singular values?

If this is true then it becomes a little trickier to determine the faster approach. It will probably almost always be faster to flip the sign on the row due to cache access and vectorization.


Actually I just realised that we transpose v at the end so this is correct. I'm leaving this comment to highlight the fact that we may want to modify the algorithm so that we are operating on rows (not in this PR of course).

@Andlon (Collaborator, Author)

It's also possible to collect the indices of all columns that need to be flipped, and then iterate over each row, flipping the sign of each element as we go. The same could be done for the sorting, and it would likely be faster. However, the implementation would be slightly more complex, and since the overhead seems to be more or less negligible at this point, I opted for the simpler approach of just dealing with columns directly.

As you say, we may want to revisit this in the future!
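For reference, a minimal sketch of the row-wise variant described above; the function name and the flat row-major layout are assumptions for illustration, not rulinalg's internals:

// Hypothetical helper: flip the sign of selected columns while walking the
// matrix row by row, which is friendlier to row-major storage than flipping
// one whole column at a time.
fn flip_columns_row_wise(data: &mut [f64], rows: usize, cols: usize, flip: &[bool]) {
    assert_eq!(data.len(), rows * cols);
    assert_eq!(flip.len(), cols);
    for r in 0..rows {
        for c in 0..cols {
            if flip[c] {
                data[r * cols + c] = -data[r * cols + c];
            }
        }
    }
}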

@AtheMathmo (Owner)

Sounds good to me!

@Andlon (Collaborator, Author)

Although I just realized that a 100 x 100 matrix (which I used in my benchmarks) would still fit in L2 cache. Guess we should do some tests for matrices so big that they also don't fit in L3? Perhaps the results would be significantly different then...

@AtheMathmo (Owner)

Let's try that too. In particular the case n x m where n >> m, e.g. 10000x10. This is a bit more representative of cases we may see in rusty-machine.

@Andlon (Collaborator, Author) commented Sep 23, 2016

Will do that next chance I get. Thinking a little about it though, I suspect that we have the following performance characteristics. First, consider that if m >= n, SVD is usually something like O(m n^2) flops or so, and I'm not sure about memory accesses of Golub-Reinsch. The ordering of the singular vectors is roughly O(n^2) memory accesses in the worst case. Based on this and our current benchmarks, I would expect the following:

  • For small matrices, the matrix fits in cache and the column swaps are therefore still fast enough so that the cost is negligible.
  • For larger matrices for which the matrix does not fit in cache, the column swaps are presumably more expensive (though I expect due to pre-fetching it's perhaps not too bad, since the stride is uniform?), but the cost of the ordering is completely dwarfed by the complexity of the SVD computation.

Looking forward to trying the benchmarks out!

By the way, you may find the following interesting. I spent Fall 2014 and Spring 2015 at UC Berkeley. Among other things, I took some courses under Prof. James Demmel. Prof. Demmel is a leading figure in the research of what they call "communication-avoiding algorithms". In essence, the goal of these algorithms is to minimize not just the number of flops required for an algorithm; using a simple mathematical model of cache behavior, they also provably minimize (again in a big-O sense) the amount of communication, i.e. the number of expensive memory accesses. I mention this because I remember that at some point Prof. Demmel referred me to certain papers on these algorithms. Their publications can be found here. I think in particular this survey paper may be of interest. It's fairly up to date (published in 2014) and discusses the communication optimality of various algorithms in linear algebra.

However, I want again to stress that we should ensure correctness of the algorithms in rulinalg (by extensive testing against expected error bounds etc.) before this should become a topic. Still, thought I'd mention it, because I think there's a lot of interesting stuff in there :)

@AtheMathmo (Owner)

I must have missed this comment before. Thank you for pointing out those papers! They look like really interesting reads - though at a first glance pretty challenging. The survey paper looks like a really good start, it looks like I could learn a lot from it.

// Sorting a vector of indices simultaneously with the singular values
// gives us a mapping between old and new (final) column indices.
indexed_sorted_values.sort_by(|&(_, ref x), &(_, ref y)|
x.partial_cmp(y).expect("All singular values should be finite, and thus sortable.")
@AtheMathmo (Owner)

Would this panic be user error or an algorithmic failure? (Or both I guess...)

I'm wondering whether it is better to propagate this by returning a Result. It feels like a "something has gone terribly wrong" though so the panic is probably appropriate.

@Andlon (Collaborator, Author)

My reasoning for a panic was this:

We are using the successful result of svd_unordered (which in turn uses the result of svd_golub_reinsch), which means that if the singular values are not all finite, it must be a bug in one of these functions, in which case I think a panic is appropriate. Does this make sense to you?

@AtheMathmo (Owner)

Yes that does make sense, just wanted to check!
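For context, a minimal sketch of the index-sorting idea under discussion; the function name and the descending sort order are illustrative assumptions, not the PR's exact code:

// Sort (index, value) pairs by descending value; the resulting index order is
// the permutation that must also be applied to the corresponding columns of
// the singular-vector matrices.
fn sorted_order(values: &[f64]) -> (Vec<f64>, Vec<usize>) {
    let mut indexed: Vec<(usize, f64)> = values.iter().cloned().enumerate().collect();
    indexed.sort_by(|&(_, x), &(_, y)| {
        y.partial_cmp(&x)
            .expect("All singular values should be finite, and thus sortable.")
    });
    let order: Vec<usize> = indexed.iter().map(|&(i, _)| i).collect();
    let sorted: Vec<f64> = indexed.iter().map(|&(_, v)| v).collect();
    (sorted, order)
}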

@@ -1297,4 +1485,55 @@ mod tests {

let _ = a.lup_decomp();
}

#[cfg(all(test, feature = "nightly"))]
mod benches {
@AtheMathmo (Owner)

I would prefer to move these benches into the main benches folder and just avoid benching the svd_unordered. If we ever want to add this function to the public api we can add the benches then.

Of course it is nice to have it benchmarked but in this case I'd prefer to leave it out due to other feature changes. These mean I have to modify the CI for example.

@Andlon (Collaborator, Author)

I agree, it isn't worth the added complexity. I added it partly because I thought it would be useful to have a bench harness also for private functions. But I suppose this use case is few and far between at the moment. In any case, we should probably be more concerned with correctness than maximum performance going forward anyway. Perhaps by the time we need more fine-grained benchmarks again, there will be a benchmarking harness in stable.

I'll remove all the module internal benchmarks as you say!

@Andlon (Collaborator, Author) commented Sep 23, 2016

OK, so I tried running 10000x10, and it wouldn't finish in reasonable time. I figured maybe the benchmarking harness was running it a lot of times, so I made an example application that runs SVD a single time and measures the time. I'm seeing some interesting things. I have the following examples (this is just one run without any averaging, so they're not very accurate):

$ cargo run --release <rows> <cols>
...

$ cargo run --release 100 100
Computed ordered SVD in:   0.08540 seconds.
Computed unordered SVD in: 0.06431 seconds.

$ cargo run --release 1000 10
Computed ordered SVD in:   1.76212 seconds.
Computed unordered SVD in: 1.67813 seconds.

$ cargo run --release 10 1000
Computed ordered SVD in:   1.75135 seconds.
Computed unordered SVD in: 1.67570 seconds.

$ cargo run --release 1000 100
Computed ordered SVD in:   16.38357 seconds.
Computed unordered SVD in: 16.50339 seconds.

$ cargo run --release 100 1000
Computed ordered SVD in:   16.70569 seconds.
Computed unordered SVD in: 16.55411 seconds.

$ cargo run --release 10 100
Computed ordered SVD in:   0.00540 seconds.
Computed unordered SVD in: 0.00548 seconds.

$ cargo run --release 1 1000
Computed ordered SVD in:   0.24941 seconds.
Computed unordered SVD in: 0.17086 seconds.

$ cargo run --release 1 2000
Computed ordered SVD in:   1.36840 seconds.
Computed unordered SVD in: 1.31257 seconds.

$ cargo run --release 1 4000
Computed ordered SVD in:   10.30131 seconds.
Computed unordered SVD in: 10.25298 seconds.

$ cargo run --release 1 8000
Computed ordered SVD in:   81.76079 seconds.
Computed unordered SVD in: 81.26149 seconds.

It seems the performance scaling is rather less than optimal. I guess this might be due to the incorrect handling of the Givens rotations you were talking about?

Instead just bench the public .svd() method along
with the other benchmarks.
@Andlon (Collaborator, Author) commented Sep 23, 2016

@AtheMathmo: OK, I removed and moved out the benches as you requested. Think that should settle it, unless you see anything else that needs to be done!

@AtheMathmo (Owner)
Thanks for getting those timings. I agree that the scaling is a little worrying - but not totally unexpected. Did you try running 10000x10 out of the benching harness?

I'll look this over again quickly later today and get it merged. Thank you for all your hard work with it - I'm sorry also that the issue turned into much more than it began as!

@Andlon (Collaborator, Author) commented Sep 24, 2016

Thanks for getting those timings. I agree that the scaling is a little worrying - but not totally unexpected. Did you try running 10000x10 out of the benching harness?

Yeah, but it didn't finish in reasonable time (I didn't have the patience to let it run longer than 10 mins or so, sorry!).

I'll look this over again quickly later today and get it merged. Thank you for all your hard work with it - I'm sorry also that the issue turned into much more than it began as!

No need to be sorry. That is the nature of development :)

@Andlon (Collaborator, Author) commented Sep 24, 2016

By the way, note that for the tests where the dimensions are 1xN, the time scales almost perfectly with N^3. At most we would expect N^2 here.

let num_singular_values = cmp::min(b.rows(), b.cols());

for i in 0 .. num_singular_values {
if b[[i, i]] < T::zero() {
@AtheMathmo (Owner)

It's very minor but we can remove some bound checks here by using the get_unchecked functions. This will have a very negligible effect though so I'm happy to merge without.

@Andlon (Collaborator, Author)

I considered this, but I came to the conclusion that the effect here would presumably be completely negligible, in which case I think going unsafe is not worth it. I think it makes more sense to only use unsafe if you know that you have a potential bottleneck, preferably after profiling. Otherwise this would just fall into the category of premature optimization...

@AtheMathmo (Owner)

That sounds sensible - let's keep it as the safe bound checked version.

/// non-increasing order. U and V have orthonormal columns, and each column represents the
/// left and right singular vectors for the corresponding singular value in Σ, respectively.
///
/// Denoting the dimensions of self as M x N (rows x cols), the dimensions of the returned matrices
@AtheMathmo (Owner)

I think that there is a minor mistake here in the dimensions listed below. I think that Σ should be square with rows equal to the minimum of M and N. If the minimum is M then I think that U is M x M and V is N x M.

@Andlon (Collaborator, Author)

Oh! Yeah, you're completely right. We transpose the matrix before applying Golub-Reinsch. I only read the description in the paper you mentioned, and there it is assumed that m >= n, which I had forgotten. Will fix immediately.

@AtheMathmo (Owner)

Great, thanks!

You may want to verify that the dimensions I gave aren't wrong. I only skimmed the source without checking the output (I'll have some time to check them properly later today if you can't).

@Andlon (Collaborator, Author)

Yeah, I checked it by hand, as well as in the code. I think you are correct (it's easy to confuse). In the end I chose the verbose way of documenting it, as I found that to be clearer. Let me know if you want it phrased differently.

@AtheMathmo (Owner)

I think the verbose way is much better - it is pretty complex so taking more space to explain is sensible.

All looks good now so I'll merge. Thanks again for your work on this!

@AtheMathmo (Owner)
I just have two minor comments - only one of which needs a response (the one about the doc dimensions).

If I'm mistaken on that comment I'm happy to merge now.

@AtheMathmo merged commit e902890 into AtheMathmo:master on Sep 24, 2016
theotherphil pushed a commit to theotherphil/rulinalg that referenced this pull request Dec 4, 2016
* correct eigenmethods for 2-by-2 (and 1-by-1) matrices

In issue AtheMathmo#46, it was observed that the `eigenvalue` and `eigendecomp`
methods could give incorrect results. Rusty-machine creator and BDFL
James Lucas noted that the problem was specific to 2-by-2 matrices and
suggested special-casing them.

* private methods for general and small matrix eigencomputations, test

Conflicts:
	rusty-machine/src/linalg/matrix/decomposition.rs
	rusty-machine/src/linalg/matrix/mod.rs