Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized ?trtrs #2252

Merged
merged 15 commits into from
Sep 12, 2019
Merged

Optimized ?trtrs #2252

merged 15 commits into from
Sep 12, 2019

Conversation

thrasibule
Copy link
Contributor

closes #2251
I've written it in the same model as the optimized ?getrs implementation. The idea is to just use ?trsv instead of ?trsm when the right hand side only has 1 column. I've checked that this give the expected speedup.
The only sure I'm not quite sure about, and someone more familiar with the codebase could probably help me with are these lines:
https://github.com/thrasibule/OpenBLAS/blob/trtrs/interface/lapack/trtrs.c#L142-L143.
I used the same parameters as in ?getrs
and here:
https://github.com/thrasibule/OpenBLAS/blob/trtrs/interface/lapack/trtrs.c#L165
Again I've used the same parameters as ?getrs. This looks like operation counts, so it probably should be half of that, but I'm not sure what these macros do.

This is a work in progress, I can reorganize the commits and maybe add some tests.

@brada4
Copy link
Contributor

brada4 commented Sep 9, 2019

quick exit on NOOP is missing from code -

      IF( N.EQ.0 )
     $   RETURN

@thrasibule
Copy link
Contributor Author

@martin-frbg
Copy link
Collaborator

The appveyor failure is unrelated (and my fault), but I do not yet understand what happened to the travis OSX builds.

@thrasibule
Copy link
Contributor Author

This should fix it... I'm surprised it even got picked up, I thought extended precision was disabled.

@brada4
Copy link
Contributor

brada4 commented Sep 9, 2019

@thrasibule extended precision is alive on *BSD , just that OSX is only of a kind in CI
The other NOOP shortcut - ir could be few lines earlier - and - does it qualify for upstream?

@martin-frbg martin-frbg added this to the 0.3.8 milestone Sep 10, 2019
@martin-frbg
Copy link
Collaborator

Unfortunately I am seeing a considerable number of test failures in the LAPACK EIG and LIN tests (available through make lapack-test) with this PR.

@thrasibule
Copy link
Contributor Author

I think I've fixed it all. I'm still seeing some error in lapack-test, but not more than before this PR.

@martin-frbg
Copy link
Collaborator

Thank you. The remaining error in STFSM/CTFSM in indeed unrelated to the PR (probably caused by some inaccuracy in the corresponding trsm, and definitely not a recent regression).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow ?trtrs implementation
3 participants