New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage in CG #130

Merged
merged 3 commits into from Jun 26, 2017

Conversation

Projects
None yet
7 participants
@haampie
Collaborator

haampie commented Jun 15, 2017

  • Pre-allocates the vectors, improvement shown below.
  • Removes the dependency of CG on KrylovSubspace (makes little sense for a 3-term recurrence method IMHO). Also: the KrylovSubspace methods are the main reason for inefficient memory usage. For the moment it was easier to fix just CG.
  • Fixes a bug where the CG method starts while the initial guess is the actual solution (I still have to add a test for this)
  • Adds a simple benchmark for the CG method
  • Relies always on A_mul_B! for matrix-vector products. Since we're switching to LinearMaps this shouldn't be a problem.
  • Also: removes the check whether all elements of b are zero. I would say CG should not be responsible for this.

Benchmark results:

LinearSystemsBench.cg()

This PR

BenchmarkTools.Trial:
  memory estimate:  45.80 MiB
  allocs estimate:  437
  --------------
  minimum time:     2.561 s (0.16% GC)
  median time:      2.580 s (0.27% GC)
  mean time:        2.580 s (0.27% GC)
  maximum time:     2.598 s (0.38% GC)
  --------------
  samples:          2
  evals/sample:     1

Current implementation

  BenchmarkTools.Trial:
  memory estimate:  1.53 GiB
  allocs estimate:  1041
  --------------
  minimum time:     3.173 s (9.28% GC)
  median time:      3.246 s (11.98% GC)
  mean time:        3.246 s (11.98% GC)
  maximum time:     3.320 s (14.57% GC)
  --------------
  samples:          2
  evals/sample:     1
@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Jun 15, 2017

Codecov Report

Merging #130 into master will decrease coverage by 0.24%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   86.54%   86.29%   -0.25%     
==========================================
  Files          18       18              
  Lines        1390     1394       +4     
==========================================
  Hits         1203     1203              
- Misses        187      191       +4
Impacted Files Coverage Δ
src/cg.jl 91.3% <100%> (-6.32%) ⬇️
src/common.jl 78.94% <0%> (-5.27%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac7bbee...ca67d46. Read the comment docs.

codecov-io commented Jun 15, 2017

Codecov Report

Merging #130 into master will decrease coverage by 0.24%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   86.54%   86.29%   -0.25%     
==========================================
  Files          18       18              
  Lines        1390     1394       +4     
==========================================
  Hits         1203     1203              
- Misses        187      191       +4
Impacted Files Coverage Δ
src/cg.jl 91.3% <100%> (-6.32%) ⬇️
src/common.jl 78.94% <0%> (-5.27%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac7bbee...ca67d46. Read the comment docs.

@haampie

This comment has been minimized.

Show comment
Hide comment
@haampie

haampie Jun 15, 2017

Collaborator

Hmm, I missed the follow up of @shivin9 in #128, but still I think it is better to not use the KrylovSubspace functions and shoot just for A_mul_B!. CG does Krylov subspace things implicitly, and it feels hacky to initialize a KrylovSubspace of order 1.

I'll do the same bench against #128 soon.

Collaborator

haampie commented Jun 15, 2017

Hmm, I missed the follow up of @shivin9 in #128, but still I think it is better to not use the KrylovSubspace functions and shoot just for A_mul_B!. CG does Krylov subspace things implicitly, and it feels hacky to initialize a KrylovSubspace of order 1.

I'll do the same bench against #128 soon.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Jun 15, 2017

Contributor

Also: removes the check whether all elements of b are zero. I would say CG should not be responsible for this.

Since there's a test for this, it's fine to change as long as it passes the test. Without that check it used to return all-NaN, but presumably your fix of the starting-point-at-solution bug fixed this.

Contributor

timholy commented Jun 15, 2017

Also: removes the check whether all elements of b are zero. I would say CG should not be responsible for this.

Since there's a test for this, it's fine to change as long as it passes the test. Without that check it used to return all-NaN, but presumably your fix of the starting-point-at-solution bug fixed this.

@haampie

This comment has been minimized.

Show comment
Hide comment
@haampie

haampie Jun 15, 2017

Collaborator

I think the NaN result is a bug in the current implementation because the initial residual is not checked. With b = zeros(n) and x0 = zeros(n) you start out with a solution, so the iterations should not even start. The current implementation only inspects the residual after the first iteration; this PR does it before the first iteration.

If b = zeros(n) while x0 = rand(n), then CG works just fine.

Collaborator

haampie commented Jun 15, 2017

I think the NaN result is a bug in the current implementation because the initial residual is not checked. With b = zeros(n) and x0 = zeros(n) you start out with a solution, so the iterations should not even start. The current implementation only inspects the residual after the first iteration; this PR does it before the first iteration.

If b = zeros(n) while x0 = rand(n), then CG works just fine.

@lopezm94

This comment has been minimized.

Show comment
Hide comment
@lopezm94

lopezm94 Jun 15, 2017

Member

Removing KrylovSubspace should probably be in a new issue and left intact in this PR. Some solvers in this package already accept KrylovSubspace as input, the intention with making the cg implementation dependant was to enable it as input in the future. So if we are going to remove KrylovSubspace, it should be generalized in the whole package.

If I remember correctly there was an idea to remove KrylovSubspace from the package, right @jiahao? Something along the lines of using it only as a store of vectors and providing normalization functions. I could open a new issue about this. This could also improve readability for people who are more accustomed to see the matrix operations inside the solvers.

Member

lopezm94 commented Jun 15, 2017

Removing KrylovSubspace should probably be in a new issue and left intact in this PR. Some solvers in this package already accept KrylovSubspace as input, the intention with making the cg implementation dependant was to enable it as input in the future. So if we are going to remove KrylovSubspace, it should be generalized in the whole package.

If I remember correctly there was an idea to remove KrylovSubspace from the package, right @jiahao? Something along the lines of using it only as a store of vectors and providing normalization functions. I could open a new issue about this. This could also improve readability for people who are more accustomed to see the matrix operations inside the solvers.

@lopezm94

This comment has been minimized.

Show comment
Hide comment
@lopezm94

lopezm94 Jun 23, 2017

Member

@haampie can you leave KrylovSubspace dependency as it was and rebase? To not delay this PR's merge unnecessarily and then we can discuss KrylovSubspace in general for the package.

Member

lopezm94 commented Jun 23, 2017

@haampie can you leave KrylovSubspace dependency as it was and rebase? To not delay this PR's merge unnecessarily and then we can discuss KrylovSubspace in general for the package.

@haampie

This comment has been minimized.

Show comment
Hide comment
@haampie

haampie Jun 23, 2017

Collaborator

This PR doesn't remove the KrylovSubspace type, it removes its usage in cg.jl, so this PR can be merged I guess.

Collaborator

haampie commented Jun 23, 2017

This PR doesn't remove the KrylovSubspace type, it removes its usage in cg.jl, so this PR can be merged I guess.

@lopezm94

This comment has been minimized.

Show comment
Hide comment
@lopezm94

lopezm94 Jun 23, 2017

Member

Yeah the issue is about removing it's usage in cg, not about removing the type from the package. As I explained before:

Some solvers in this package already accept KrylovSubspace as input, the intention with making the cg implementation dependant was to enable it as input in the future. So if we are going to remove KrylovSubspace, it should be generalized in the whole package.

So the intention was to make cg accept krylovsubspace as input in the future. If we remove it then it can't be done and it should be in a separate issue.

Member

lopezm94 commented Jun 23, 2017

Yeah the issue is about removing it's usage in cg, not about removing the type from the package. As I explained before:

Some solvers in this package already accept KrylovSubspace as input, the intention with making the cg implementation dependant was to enable it as input in the future. So if we are going to remove KrylovSubspace, it should be generalized in the whole package.

So the intention was to make cg accept krylovsubspace as input in the future. If we remove it then it can't be done and it should be in a separate issue.

@haampie

This comment has been minimized.

Show comment
Hide comment
@haampie

haampie Jun 24, 2017

Collaborator

Alright, fixed that

Collaborator

haampie commented Jun 24, 2017

Alright, fixed that

@barche

This comment has been minimized.

Show comment
Hide comment
@barche

barche Jun 25, 2017

Just for reference, I did a comparison with Trilinos.jl, resulting in the following for CG:

  • Your code: 2.7 s
  • Trilinos: 1.633 s

I attempted to reproduce your benchmark as accurately as possible here:
https://github.com/barche/Trilinos.jl/blob/master/benchmarks/belos.jl

barche commented Jun 25, 2017

Just for reference, I did a comparison with Trilinos.jl, resulting in the following for CG:

  • Your code: 2.7 s
  • Trilinos: 1.633 s

I attempted to reproduce your benchmark as accurately as possible here:
https://github.com/barche/Trilinos.jl/blob/master/benchmarks/belos.jl

@ChrisRackauckas

This comment has been minimized.

Show comment
Hide comment
@ChrisRackauckas

ChrisRackauckas Jun 25, 2017

Member

Thanks @barche for taking the time to do that. It'll serve as a good performance goal to keep in mind! That shows we're already pretty close to Trillinos which is awesome!

Member

ChrisRackauckas commented Jun 25, 2017

Thanks @barche for taking the time to do that. It'll serve as a good performance goal to keep in mind! That shows we're already pretty close to Trillinos which is awesome!

@KristofferC

This comment has been minimized.

Show comment
Hide comment
@KristofferC

KristofferC Jun 25, 2017

Is Trilinos multithreading the sparse matrix vector multiplication? If it is, one could try using something like https://github.com/JuliaSparse/MKLSparse.jl or https://github.com/dcjones/RecursiveSparseBlocks.jl to see if it is the matvec that makes the difference.

KristofferC commented Jun 25, 2017

Is Trilinos multithreading the sparse matrix vector multiplication? If it is, one could try using something like https://github.com/JuliaSparse/MKLSparse.jl or https://github.com/dcjones/RecursiveSparseBlocks.jl to see if it is the matvec that makes the difference.

@haampie

This comment has been minimized.

Show comment
Hide comment
@haampie

haampie Jun 25, 2017

Collaborator

@barche did you run Julia with multithreading enabled? In my experience the Julia version runs somewhat faster if you do BLAS.set_num_threads(1) because of all the BLAS level 1 operations. On the other hand @KristofferC might have a good point as well.

Collaborator

haampie commented Jun 25, 2017

@barche did you run Julia with multithreading enabled? In my experience the Julia version runs somewhat faster if you do BLAS.set_num_threads(1) because of all the BLAS level 1 operations. On the other hand @KristofferC might have a good point as well.

@barche

This comment has been minimized.

Show comment
Hide comment
@barche

barche Jun 26, 2017

@KristofferC No, Trilinos was not multithreading. I'll do some more tests using that when I get back to one of my desktop machines.

@haampie I kept the Julia settings at their default, but switching off threading made no noticeable difference on this 2012 Macbook Air.

barche commented Jun 26, 2017

@KristofferC No, Trilinos was not multithreading. I'll do some more tests using that when I get back to one of my desktop machines.

@haampie I kept the Julia settings at their default, but switching off threading made no noticeable difference on this 2012 Macbook Air.

@lopezm94 lopezm94 merged commit eb031c0 into JuliaMath:master Jun 26, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
coverage/coveralls Coverage increased (+1.04%) to 87.582%
Details

haampie added a commit to haampie/IterativeSolvers.jl that referenced this pull request Jun 28, 2017

Reduce memory usage in CG (#130)
* Reduce memory usage in CG

* Bring back Krylov subspace again

* Fix deprecation warning

haampie added a commit to haampie/IterativeSolvers.jl that referenced this pull request Jul 10, 2017

Reduce memory usage in CG (#130)
* Reduce memory usage in CG

* Bring back Krylov subspace again

* Fix deprecation warning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment