Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to use MKL Pardiso, reduce number of tests #225

Merged
merged 30 commits into from
May 21, 2021
Merged

Conversation

ranjanan
Copy link
Member

No description provided.

@coveralls
Copy link

coveralls commented Feb 27, 2020

Pull Request Test Coverage Report for Build 609

  • 88 of 136 (64.71%) changed or added relevant lines in 9 files are covered.
  • 43 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-4.09%) to 81.323%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/run.jl 1 2 50.0%
src/raster/advanced.jl 22 30 73.33%
src/core.jl 46 61 75.41%
src/utils.jl 4 28 14.29%
Files with Coverage Reduction New Missed Lines %
src/config.jl 2 97.01%
src/core.jl 7 92.12%
src/run.jl 15 61.7%
src/utils.jl 19 60.16%
Totals Coverage Status
Change from base Build 582: -4.09%
Covered Lines: 1537
Relevant Lines: 1890

💛 - Coveralls

src/core.jl Outdated Show resolved Hide resolved
@ViralBShah
Copy link
Member

Do we always check norm(Ax-b) after solving a system to make sure we got a reasonable answer, for all the solver types?

@ranjanan ranjanan changed the title Add option to use MKL Pardiso Add option to use MKL Pardiso, reduce number of tests Mar 15, 2020
@ranjanan
Copy link
Member Author

Fixes #188

src/core.jl Outdated Show resolved Hide resolved
@ranjanan
Copy link
Member Author

No idea why Windows is failing when I'm running completely serial code. Looking at the log, it seems like there's some reordering and race conditions even though I'm running with only one process.

@ranjanan
Copy link
Member Author

Ok, the purpose of doing these experiments was the establish the following. If I did this:

  1. First I run a problem in parallel and get the answer
  2. I remove the processes using rmprocs
  3. I immediately run another problem after with 1 process.

Then the pmap seems to be causing some kind of "race condition" (even though it is not running in parallel) and affecting one of the variables, thereby changing the final answer and causing my test to fail.

@ViralBShah
Copy link
Member

ViralBShah commented Mar 18, 2020

Should we have a separate PR to fix the broken tests first and clean up the tests business, and then merge these the pardiso and gdal PRs?

@ViralBShah
Copy link
Member

@ranjanan If you can further describe the pmap related issue or point to it in the code, perhaps @tanmaykm may be able to help. Maybe worthwhile to do a call with him and show him what's happening?

I feel like it is worth stabilizing this codebase fully before introducing threading. Thoughts?

@ranjanan
Copy link
Member Author

I spoke to @shashi and we concluded that there's a race condition.

Currently, everytime we run Circuitscape, we run addprocs at the beginning depending on the value of max_parallel in the config file. We also create am Array of SharedArray's whose length is nprocs() so each process can safely accumulate into their own matrix. We then do the computation and then run rmprocs(1:workers()) at the end to get rid of the workers.

Now, the race condition happens because rmprocs from the previous test hasn't actually deleted workers but nprocs() returns 1. In the next text, when we create a single SharedArray (because nprocs() returns 1), we have multiple processes writing into a single SharedArray.

Perhaps this would just be fixed if I did wait(rmprocs(...)) instead so I know for sure the worker processes are removed.

@ViralBShah
Copy link
Member

So what do we need to do to get this in?

@vlandau
Copy link
Member

vlandau commented May 5, 2020

Probably doesn't make sense....but is there any sort of gc to handle actually deleting remote workers? I had a weird issue where I needed to do GC.gc() in runtests.jl for Omniscape to remove a lock in order to be able to remove a directory. Long shot, and not a well-informed idea by any means, but thought I'd mention it just in case.

@ViralBShah
Copy link
Member

Can we rebase this one on master?

@ViralBShah
Copy link
Member

@vlandau This is the pardiso branch - but needs to be rebased to master. Perhaps @ranjanan can rebase it and you can try it out and see the benefit?

@ViralBShah
Copy link
Member

Nice - tests are passing!

@ranjanan ranjanan merged commit 87c2fcc into master May 21, 2021
@ranjanan ranjanan deleted the pardiso branch May 21, 2021 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants