Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest update broke OpenBLAS on 10.6 (x86_64 and ppc): Makefile.conf:8: *** missing separator. Stop. #3989

Closed
barracuda156 opened this issue Apr 2, 2023 · 30 comments · Fixed by #3996

Comments

@barracuda156
Copy link
Contributor

See: https://trac.macports.org/ticket/66907
Log from 10.6.8 x86_64 with clang: https://build.macports.org/builders/ports-10.6_x86_64-builder/builds/149261/steps/install-port/logs/stdio

@martin-frbg
Copy link
Collaborator

martin-frbg commented Apr 2, 2023

You'll want to skip 0.3.22 for unrelated reasons, and I am not immediately aware of any relevant recent changes in the tools that generate Makefile.conf. From the log you posted, there appears to be a permission(?) problem with actions performed by c_check in relation to the identification of the C compiler - probably this is where the spurious dash comes from. the mktemp command on your system appears to require an additional argument, but I think this is not the root of the problem. The spurious dash seems to be spillover from the preceding CEXTRALIB line, but I do not see how this could happen unless the shell printf has an implicit line break at that line length. Does it work when you add USE_PERL=1 to employ the original perl-based tool ?

@barracuda156
Copy link
Contributor Author

You'll want to skip 0.3.22 for unrelated reasons

@martin-frbg Could you refer to those please? Perhaps Macports maintainer @cjones051073 would want to be aware of that.

I am not immediately aware of any relevant changes in the tools that generate Makefile.conf

It was failing for some time with builds from master branch (see Trac ticket). However 0.3.21 built fine.

@martin-frbg
Copy link
Collaborator

Mhh, I also notice that you are applying a number of patches, including to c_check ?

@barracuda156
Copy link
Contributor Author

Mhh, I also notice that you are applying a number of patches, including to c_check ?

Thank you for pointing that out. Those aren’t mine, I guess (I am not a maintainer of OpenBLAS), but I will look what they do.

Does it work when you add USE_PERL=1 to employ the original perl-based tool?

I will check that too.

@martin-frbg
Copy link
Collaborator

#3976 - a seemingly harmless change in GETRF/GETF2 that had unexpected consequences. 0.3.23 was released yesterday.

@martin-frbg
Copy link
Collaborator

Hm, could it be that your (cross)-compiler path length used to be shorter in the recent past ? That dash on line 8 definitely belongs on the end of line 7, there is no other output field between CROSS_SUFFIX and CROSS

@barracuda156
Copy link
Contributor Author

barracuda156 commented Apr 2, 2023

Hm, could it be that your (cross)-compiler path length used to be shorter in the recent past ? That dash on line 8 definitely belongs on the end of line 7, there is no other output field between CROSS_SUFFIX and CROSS

This is very much unlikely (same prefix, same port name, why would it be different). In fact, as I recall, compiler is literally the same, there were no recent updates. Also, that does not explain why builds are fine on other systems.

@martin-frbg
Copy link
Collaborator

I have no idea what your other systems do, I do notice that the whole _opt_var_macports path is unusually long and there is no reason (in the script) why it would be split across lines 7&8 to create a spurious dash on line 8.

@barracuda156
Copy link
Contributor Author

barracuda156 commented Apr 2, 2023

I have no idea what your other systems do, I do notice that the whole _opt_var_macports path is unusually long and there is no reason (in the script) why it would be split across lines 7&8 to create a spurious dash on line 8.

Well, in any case I cannot do anything about Macports paths – we need to fix OpenBLAS build for everyone, so local trickery with paths won’t solve the problem. But I am pretty sure nothing changed in this regard as compared to OpenBLAS 0.3.21.
I can actually revert locally to 0.3.21 and run configure to confirm it works with everything else identical. Gimme some time, machine is busy atm.

UPD. This commit broken it: aaaecdb
Reverting it fixes the build.

@martin-frbg
Copy link
Collaborator

Ah, thanks for the bisect - that is unfortunate, and I reckon it was at least the second attempt to support spaces here portably. Guess that line would have to be conditional on not being on macos, if nothing else

@martin-frbg
Copy link
Collaborator

So far I do not see why this would break, and I cannot reproduce the problem on either OSX-M1 or Linux-x86_64 even when faking your long compiler paths (unfortunately no easy osx-x86 access until next weekend, but the c_check is supposed to be universally applicable). I also do not get why a simple mktemp -d would fail with a usage message (either with or without replacing the original with gmktemp as one of your patches does - the d option and the behaviour in the absence of a template should be the same in either implementations)

(As an aside, your i386-Apple.diff seems to suggest that cpuid is available and reliable on 32bit ? Is that a specialty of the environment macports is expected to be used in, or could this patch be upstreamed here ?)

@kencu
Copy link
Contributor

kencu commented Apr 4, 2023

macports restricts the compiler selection for building openblas to newer clang and gcc versions.

It appears both of those compiler families support cpuid for i386.

So it’s upstreamable only if the appropriate compilers are used…

@martin-frbg
Copy link
Collaborator

@kencu thank you. Adding an ifdef based on compiler versions should not be a problem I think. But I still fail to reproduce the original issue here (now tested on OSX-x86_64 in Azure CI) - allowing space(s) within the CC string does not appear to be the actual cause in itself. Is there anything in your usual build environment that would make the OpenBLAS c_check script "think" you were actually trying to cross-compile ? (My understanding is/was that it is just a regular build on and for x86_64 ?)

@barracuda156
Copy link
Contributor Author

@kencu thank you. Adding an ifdef based on compiler versions should not be a problem I think. But I still fail to reproduce the original issue here (now tested on OSX-x86_64 in Azure CI) - allowing space(s) within the CC string does not appear to be the actual cause in itself. Is there anything in your usual build environment that would make the OpenBLAS c_check script "think" you were actually trying to cross-compile ? (My understanding is/was that it is just a regular build on and for x86_64 ?)

The error happens (for us in MacPorts) only on 10.6, for two archs (x86_64 and ppc). So it may not be easy to reproduce elsewhere.

@martin-frbg
Copy link
Collaborator

Oh, ok, this strongly suggests some bug or limitation in a system component, like maximum line length in the shelI I guess. Azure has versions 11 and 12 only - isn't 10.6 more than ten years old anyway ?

@barracuda156
Copy link
Contributor Author

Oh, ok, this strongly suggests some bug or limitation in a system component, like maximum line length in the shelI I guess. Azure has versions 11 and 12 only - isn't 10.6 more than ten years old anyway ?

We support everything from 10.4.x onwards :)

This error is, however, very untypical, AFAICT. I cannot recall anything of this kind, and we test thousands of ports. Given that a specific commit broken the build, likely it was a suboptimal solution anyway.

@martin-frbg
Copy link
Collaborator

I suspect is has to be a combination of the change from that commit and the rather long installation path name in your builds - if it only affects macports for version 10.6 (and possibly earlier), I wonder if it would be easier for you to carry a patch to revert the PR on the affected OS versions, or for me to add a MacOS version check and ifdef the changes inside c_check ?
(And as I mentioned above, building with make USE_PERL=1 should also be an acceptable workaround for MacOS <= 10.6, provided the old systems have perl installed - which they should have, as it used to be mandatory for building before the original c_check was replaced by a POSIX shell script)

@barracuda156
Copy link
Contributor Author

@martin-frbg It is very much likely that the same error gonna occur on <10.6 too, I just cannot verify it right now (and no build bots).

We have already reverted the breaking commit in a patch (it does no good to any macOS, but breaks some), but it is always preferable to have a fix in upstream – and that may also benefit users who do not install via Macports.

@martin-frbg
Copy link
Collaborator

I have not had a complaint from Homebrew yet , but maybe nobody there builds for Snow Leopard anymore. My only (inherited) Mac is 10.14.6 and none of the usual CI providers appears to offer anything that old, so I cannot debug this myself to identify the actual source of the problem. Also no idea if or why a Mac user would end up with a path containing spaces, but as far as I know it is not prohibited by the OS (?)

@barracuda156
Copy link
Contributor Author

I have not had a complaint from Homebrew yet , but maybe nobody there builds for Snow Leopard anymore. My only (inherited) Mac is 10.14.6 and none of the usual CI providers appears to offer anything that old, so I cannot debug this myself to identify the actual source of the problem. Also no idea if or why a Mac user would end up with a path containing spaces, but as far as I know it is not prohibited by the OS (?)

It is certainly possible to have file names with spaces and therefore paths, but that should not be the case for compilers, I believe, or any meaningful software installations.
I have no opinion re relative desirability: if someone believes paths with spaces are something relevant, then perhaps it can be made conditional for MacOS > 10.6? Users of <= 10.6 would certainly prefer a working build to a broken one with support for spaces :)

P. S. MacOS does have perl, but system one is pretty old in those OSs. 10.6.8 has 5.10.x, 10.6 PPC has 5.8.8, 10.5.x have 5.8.8 or earlier.

@saudet
Copy link

saudet commented Apr 7, 2023

FWIW, this happens on macOS 11 as well, when cross compiling for arm64:
https://github.com/bytedeco/javacpp-presets/actions/runs/4595357140/jobs/8115532225

@martin-frbg
Copy link
Collaborator

@saudet thanks, that is very useful information. I wonder why I did not see this in my cross-compile tests on Azure CI - are you using gcc and/or something from an Apple SDK for the build there, or clang -arch ? (Having some trouble interpreting your workflow files, sorry)

@martin-frbg
Copy link
Collaborator

nvm, found the cppbuild.sh - still wondering why my test did not trigger this though... but I guess I never tried an actual cross-compile with clang there.

@martin-frbg
Copy link
Collaborator

Hopefully fixed now by #3996 - can you please confirm ?

@martin-frbg martin-frbg reopened this Apr 7, 2023
@barracuda156
Copy link
Contributor Author

Hopefully fixed now by #3996 - can you please confirm ?

I will test tomorrow and let you know. Thank you.

@martin-frbg
Copy link
Collaborator

unfortunately #3996 only "worked" by way of a syntax error in a test statement effectively disabling the CROSS_SUFFIX line.

@barracuda156
Copy link
Contributor Author

unfortunately #3996 only "worked" by way of a syntax error in a test statement effectively disabling the CROSS_SUFFIX line.

Sorry, been too busy with other stuff. Do I get it correctly that it is unneeded to test #3996 now, as it has been confirmed not to work correctly?

@martin-frbg
Copy link
Collaborator

yes, currently testing #4010

@martin-frbg
Copy link
Collaborator

Assuming fixed by an embarassingly long series of PRs

@barracuda156
Copy link
Contributor Author

@martin-frbg We hope it is! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants