
Revise iteration in PreconditionChebyshev #7703

Merged. 5 commits merged into dealii:master from the chebyshev_cleanup branch on Feb 18, 2019.

Conversation

kronbichler (Member)

The old implementation of PreconditionChebyshev used an iteration that was more complicated than necessary because it carried a temporary vector representing x^{n-1} - x^{n}. I have rewritten the algorithm directly in terms of the underlying three-term recurrence x^{n+1} = x^{n} + f1 (x^{n} - x^{n-1}) + f2 P^{-1} (b - A x^{n}), where f1 and f2 are scalar factors. This is slightly faster because we no longer need to update the temporary vector and instead only read from x^{n-1}, i.e., we save one vector write: the memory access in the vector updates drops from five reads and two writes to five reads (x^{n}, x^{n-1}, the entries of P^{-1}, t = A x^{n}, and b) and one write (to x^{n+1}). How much this reduction of around 15% in memory traffic helps the final cost depends on the cost of the matrix-vector product; I measured up to 5% of the solution time with a multigrid algorithm, which is not bad.

Furthermore, the new implementation should be more comprehensible to someone familiar with Chebyshev polynomials. I adapted the variable names for the temporary vectors to make things clearer and have also stated the recurrence relation in the class description.

Since the algorithm should be mathematically equivalent to the one we had before, we should not need any test changes (checked on my machine).
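
As an illustration only (not the deal.II implementation), the following self-contained sketch runs the three-term recurrence on a stand-in diagonal matrix with the identity as preconditioner. The factors f1 and f2 follow the textbook Chebyshev recursion for eigenvalues assumed to lie in [lambda_min, lambda_max] and are not necessarily parameterized exactly as in deal.II.

#include <cstdio>
#include <vector>

int main()
{
  const std::size_t n = 5;
  // Stand-in operator: A = diag(1, 2, ..., n), so the eigenvalue bounds are
  // known exactly for this toy problem; the preconditioner is the identity.
  std::vector<double> diag_A(n);
  for (std::size_t i = 0; i < n; ++i)
    diag_A[i] = static_cast<double>(i + 1);
  const double lambda_min = 1.0, lambda_max = static_cast<double>(n);

  std::vector<double> b(n, 1.0); // right-hand side
  std::vector<double> x_old(n, 0.0), x(n, 0.0), x_next(n, 0.0);

  const double theta = 0.5 * (lambda_max + lambda_min);
  const double delta = 0.5 * (lambda_max - lambda_min);
  double rho_old = delta / theta;

  for (unsigned int it = 0; it < 20; ++it)
    {
      // Factors of the textbook Chebyshev recursion; it == 0 is the startup
      // step x^{1} = x^{0} + (1/theta) (b - A x^{0}) without the x^{n-1} term.
      const double rho =
        (it == 0) ? rho_old : 1. / (2. * theta / delta - rho_old);
      const double f1 = (it == 0) ? 0. : rho * rho_old;           // on (x^{n} - x^{n-1})
      const double f2 = (it == 0) ? 1. / theta : 2. * rho / delta; // on the residual

      // x^{n+1} = x^{n} + f1 (x^{n} - x^{n-1}) + f2 (b - A x^{n}):
      // reads x, x_old, b (and the diagonal), writes only x_next.
      for (std::size_t i = 0; i < n; ++i)
        x_next[i] =
          x[i] + f1 * (x[i] - x_old[i]) + f2 * (b[i] - diag_A[i] * x[i]);

      // Rotate the three vectors by pointer swaps (no data is copied), in the
      // spirit of the solution/solution_old swaps in the patch.
      x_old.swap(x);
      x.swap(x_next);
      rho_old = rho;
    }

  for (std::size_t i = 0; i < n; ++i)
    std::printf("x[%u] = %g (exact %g)\n", (unsigned)i, x[i], 1. / diag_A[i]);
  return 0;
}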

kronbichler (Member Author)

/rebuild

*/
void
do_transpose_chebyshev_loop(VectorType &dst, const VectorType &src) const;

kronbichler (Member Author)

For someone reviewing this part: we had previously put the code shared by vmult() and step(), as well as by Tvmult() and Tstep(), into these common functions. With the changes in this PR I would now also have to pass the iteration index. Since each function only calls two other functions and updates a scalar, I have put the code inline, because the common functions do not save us any lines in the end.

jodlbauer (Contributor)

Probably somewhat relevant: in a parallel setting, the initial vector is set to something like 1/norm, with the 0-th entry explicitly set to 0.
This destroys parallel consistency, since entry 0 may not represent the same DoF for different numbers of cores.

tjhei (Member) commented Feb 7, 2019

@tcclevenger FYI.

kronbichler (Member Author)

in a parallel setting, the initial vector is set to something like 1/norm with 0-th entry explicitly 0. This destroys the parallel consistency, since entry 0 may not represent the same dof for different number of cores.

This is something that has indeed bothered me for a while. For deal.II's own vectors, we set something like (-5.5, -4.5, -3.5, ..., 3.5, 4.5, 5.5), a vector with zero mean and some offset, which is consistent as long as the numbering of the unknowns is the same. For other vectors we cannot access all entries as easily, which is why we had the norm-type approach. If the numbering does change, there is probably little one can do at this abstract level because we have no notion of the numbering or the associated DoFHandler.
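
Purely as a hypothetical illustration (the exact deal.II formula is different and not shown here), such an index-based, zero-mean initial guess could be built like this; every entry depends only on the global index, so the vector is reproducible for any parallel partitioning as long as the DoF numbering itself is unchanged:

#include <vector>

// Hypothetical sketch, not the deal.II code: a deterministic, zero-mean
// initial guess whose entries depend only on the global index.
std::vector<double> deterministic_initial_guess(const std::size_t n_dofs)
{
  std::vector<double> v(n_dofs);
  for (std::size_t i = 0; i < n_dofs; ++i)
    v[i] = static_cast<double>(i % 12) - 5.5; // cycles -5.5, -4.5, ..., 5.5
  return v;
}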

My advice would be to pull out the eigenvalue computation into user code. To use this, set AdditionalData::eig_cg_n_iterations = 0, AdditionalData::max_eigenvalue to your desired value, and AdditionalData::smoothing_range = max_eigenvalue / min_eigenvalue. I guess we should update the documentation to describe this procedure better; the current version is not really clear.
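
A minimal sketch of that setup, assuming a SparseMatrix<double> named system_matrix, the default diagonal preconditioner, and externally computed bounds lambda_max_estimate and lambda_min_estimate (hypothetical names; only the AdditionalData fields are the ones referred to above):

#include <deal.II/lac/precondition.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

using namespace dealii;

using Cheb = PreconditionChebyshev<SparseMatrix<double>, Vector<double>>;

void setup_chebyshev(Cheb &preconditioner,
                     const SparseMatrix<double> &system_matrix,
                     const double lambda_max_estimate,
                     const double lambda_min_estimate)
{
  Cheb::AdditionalData data;
  data.degree              = 5;                   // polynomial degree of the smoother
  data.eig_cg_n_iterations = 0;                   // skip the built-in eigenvalue estimate
  data.max_eigenvalue      = lambda_max_estimate; // user-provided upper bound
  data.smoothing_range     = lambda_max_estimate / lambda_min_estimate;
  preconditioner.initialize(system_matrix, data);
}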

kronbichler (Member Author)

One thing we tried in the past was to use the right-hand side vector that is given the first time vmult() is called. However, those vectors often do not contain enough high-frequency content (i.e., they barely excite the upper end of the spectrum), resulting in rather bad eigenvalue estimates. Other libraries use random vectors, but those are not repeatable at all (in particular not across processor boundaries), so we did not try them.

tamiko (Member) commented Feb 8, 2019

@kronbichler Please rebase and resolve the merge conflict :-)

kronbichler (Member Author)

I rebased and also adjusted the Chebyshev degree in a new test that came in with #7696 and was not adjusted in #7708.

tjhei (Member) commented Feb 9, 2019

@kronbichler , what is this diagonal_matrix.h.2 file thing about?

kronbichler (Member Author)

what is this diagonal_matrix.h.2 file thing about?

Sorry, left over from some experiments. I removed it.

solution.swap(temp_vector1);
solution_old.swap(temp_vector1);
}
else if (iteration_index > 1)
tjhei (Member)

iteration_index is always positive, which makes this logic somewhat weird (as this check is unnecessary). Is there a reason you don't start with iteration=0? (here and above of course)

tjhei (Member) commented Feb 9, 2019

Wait, that is not true: you also call it with 0. Does it make sense to add an empty

if (iteration_index == 0)
  {
    // nothing to do here, because ....
  }

?

kronbichler (Member Author)

Yes, it would make sense to add this part so that the case reads the same way as the other branches of the if statement.

@@ -342,7 +342,7 @@ do_test(const DoFHandler<dim> &dof)
        ++level)
     {
       smoother_data[level].smoothing_range = 15.;
-      smoother_data[level].degree = 5;
+      smoother_data[level].degree = 6;
Member

this test change does not change the output?

kronbichler (Member Author)

This is necessary after #7708, which changed what the Chebyshev degree means. That PR was open at the same time as the test added in #7696, so this simply fixes an only remotely related issue; see also here:
https://cdash.kyomu.43-1.org/viewTest.php?onlydelta&buildid=7762

kronbichler force-pushed the chebyshev_cleanup branch 2 times, most recently from 90901ac to 8bb4172 on February 9, 2019 18:17
kronbichler (Member Author)

/rebuild

kronbichler (Member Author)

I rebased to restart the testers that were failing spuriously on the base/timer and base/thread_validity tests. Should be ready now.

@@ -976,7 +1018,7 @@ class PreconditionChebyshev : public Subscriptor
   /**
    * Constructor.
    */
-  AdditionalData(const unsigned int degree = 0,
+  AdditionalData(const unsigned int degree = 1,
Member

This is a follow-up change to #7708, isn't it?

kronbichler (Member Author)

Yes, I had forgotten that change there, and one of the exceptions I introduced here revealed this issue in a test.

masterleinad merged commit 53adee8 into dealii:master on Feb 18, 2019
kronbichler deleted the chebyshev_cleanup branch on June 3, 2019 13:32