Skip to content

Conversation

@boberfly
Copy link
Contributor

@boberfly boberfly commented Sep 7, 2021

Build : Add more SCons 'Depends' onto the python modules, they will attempt to link before IECorePython can have enough time to generate a lib to link to on multiple job threads.

The woes of running on something that can spin 128 threads...

Related Issues

  • N/A

Dependencies

  • N/A

Breaking Changes

  • N/A

Checklist

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have tested my change(s) in the test suite, and added new test cases where necessary.
  • My code follows the Cortex project's prevailing coding style and conventions.
  • If my code made breaking changes, I applied the pr-majorVersion label to this PR.

@boberfly
Copy link
Contributor Author

boberfly commented Sep 7, 2021

Hmm standby, this seems to not solve the issue yet & needs a bit more investigating...

@boberfly boberfly force-pushed the corePythonModule_depends_fix branch from 78588f2 to cc158e3 Compare September 7, 2021 21:58
@boberfly
Copy link
Contributor Author

boberfly commented Sep 7, 2021

Alright I had to make it depend on the actual install rather than the module itself, seems to work now here.

Copy link
Member

@andrewkaufman andrewkaufman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took a while to get to this @boberfly. I've suggested a change that I think should be applied to all the other modules as well.

Happy to take that on if you don't have time.

corePythonModule = corePythonModuleEnv.SharedLibrary( "python/IECore/_IECore", corePythonModuleSources )
corePythonModuleEnv.Depends( corePythonModule, coreLibrary )
corePythonModuleEnv.Depends( corePythonModule, corePythonLibrary )
corePythonModuleEnv.Depends( corePythonModule, corePythonLibraryInstall )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this change. You should be able to run scons without anything actually installing, it all just compiles in-place, but this change would cause libIECorePython.so to actually install.

I think this line was correct before, and what you're needing is for corePythonModuleInstall to depend on corePythonLibraryInstall.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index ab0fa4b66..c482335fe 100644
--- a/SConstruct
+++ b/SConstruct
@@ -1763,10 +1763,11 @@ corePythonModuleEnv.Append( LIBS = os.path.basename( coreEnv.subst( "$INSTALL_LI
 corePythonModuleEnv.Append( LIBS = os.path.basename( corePythonEnv.subst( "$INSTALL_PYTHONLIB_NAME" ) ) )
 corePythonModule = corePythonModuleEnv.SharedLibrary( "python/IECore/_IECore", corePythonModuleSources )
 corePythonModuleEnv.Depends( corePythonModule, coreLibrary )
-corePythonModuleEnv.Depends( corePythonModule, corePythonLibraryInstall )
+corePythonModuleEnv.Depends( corePythonModule, corePythonLibrary )
 
 corePythonModuleInstall = corePythonModuleEnv.Install( "$INSTALL_PYTHON_DIR/IECore", corePythonScripts + corePythonModule )
 corePythonModuleEnv.AddPostAction( "$INSTALL_PYTHON_DIR/IECore", lambda target, source, env : makeSymLinks( corePythonEnv, corePythonEnv["INSTALL_PYTHON_DIR"] ) )
+corePythonModuleEnv.Depends( corePythonModuleInstall, corePythonLibraryInstall )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @andrewkaufman

I ran into this one again and the exact issue is: /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /dependencies-5e01d53f74c525306a27c497de35e1a0510d6546-source/gafferDependenciesBuild/lib/libIECorePython.so: file not recognized: file truncated

My memory might be wrong, but your suggestion I think didn't work for me until I used the corePythonLibraryInstall as the dependency, perhaps the dependent modules try to link to the installed one? Not sure...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying it now with a rebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an update, the requested change from @andrewkaufman still caused race conditions in the build for me, so I've kept the depends to corePythonLibraryInstall so this is just a rebase.

@boberfly boberfly force-pushed the corePythonModule_depends_fix branch 2 times, most recently from 20bd443 to 6b04bc5 Compare May 4, 2022 20:22
…ttempt to link before IECorePython can have enough time to generate a lib to link to on multiple job threads.
@boberfly boberfly force-pushed the corePythonModule_depends_fix branch from 6b04bc5 to b983f3e Compare May 10, 2022 00:16
@ivanimanishi ivanimanishi self-requested a review May 10, 2022 00:34
Copy link
Member

@ivanimanishi ivanimanishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @boberfly offline, my best guess is that in all such cases we technically need the libraries not only built, but also "installed" in order to be usable by dependent build operations.

It's therefore necessary to set all similar dependency relationships to the "Install" node instead of the "build" node for a library.

At a minimum, it does guarantee that the build operation is fully complete before starting the next build.

@ivanimanishi ivanimanishi merged commit f3f793a into ImageEngine:main May 10, 2022
@johnhaddon
Copy link
Member

Hang on, isn't there a reason that install is a separate target? So you can build and test and be satisfied that everything is working before committing anything to the filesystem?

@ivanimanishi
Copy link
Member

Right... I see your point. But then I don't really understand how @boberfly is getting the errors.
Do you know where do we look for the coreLibrary or corePythonLibrary when building the corePythonModule? I was assuming you'd only get the error because we were looking for it in the install location.

But if you are correct, then it would need to be looking for it in the build location, in which case I don't understand how the library wouldn't be available with the old code.

Andrew's last suggestion about "install depends on install" didn't work, and doesn't seem like it would anyway.

@johnhaddon
Copy link
Member

Do you know where do we look for the coreLibrary or corePythonLibrary when building the corePythonModule?

I think we should be getting them from where they're built, in ./lib, which is why we add that directory to the LIB build variable.

But then I don't really understand how @boberfly is getting the errors.

I don't either. I do -j 64 builds all the time, and don't seem to be hitting any issues.

@ivanimanishi
Copy link
Member

ivanimanishi commented May 10, 2022

@johnhaddon,
I'm thinking that the issue is that @boberfly is running this using the gafferDependencies command line, which specificies the LIBPATH to be the install location.

In that scenario, it may be that creating the dependency to install makes sense.

@ivanimanishi
Copy link
Member

No. The SConstruct still should add ./lib to the LIBPATH...

@ivanimanishi
Copy link
Member

I'm resetting the main branch to the previous state (before merging this PR), until we can sort this out.

@boberfly
Copy link
Contributor Author

@johnhaddon hmm I get this like 90% of the time, but my system is probably -j128, would it be better to just do a max limiter for cortex so it goes -j64 instead?

A quick search for bash shows that " -j $(({jobs}>64 ? 64 : ${jobs}))" could work as a clamp...

@ivanimanishi
Copy link
Member

Reverted here: 4cb69f0

@johnhaddon
Copy link
Member

would it be better to just do a max limiter for cortex so it goes -j64 instead?

We do want to get this sorted out properly in the SConstruct, but I don't think we're (or at least I'm) at a point where we truly understand the problem. Rather than tell SCons to wait until the installs are done, I think we want to make sure that SCons is picking up libIECorePython.so from ./lib instead of the install location. Does that make sense?

@johnhaddon
Copy link
Member

/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /dependencies-5e01d53f74c525306a27c497de35e1a0510d6546-source/gafferDependenciesBuild/lib/libIECorePython.so: file not recognized: file truncated

@boberfly, do you have the command line that triggered this error? I'm wondering if the linker is searching <installDir>/lib before ./lib, and so is seeing a lib that is part-way through being installed when we want it to pick the already-fully-formed one in ./lib instead. If that's the case, the solution might be to sort out the library search order, rather than make anything depend on the install target.

@johnhaddon
Copy link
Member

Aha! Today I fell victim to this same error too! And I can confirm that ./lib is coming after <installDir>/lib on the command line, which I think is the cause of the problem :

/opt/rh/devtoolset-9/root/usr/bin/g++ -o contrib/IECoreUSD/python/IECoreUSD/_IECoreUSD.so -Wl,--no-as-needed -shared -Wl,-fatal_warnings -L/dependencies-5.0.0a4-source/gafferDependenciesBuild/lib -lpython3.7m -lcrypt -lpthread -ldl -lutil -lm contrib/IECoreUSD/src/IECoreUSD/bindings/IEUSDModule.os -Llib -L/dependencies-5.0.0a4-source/gafferDependenciesBuild/lib -lpthread -lboost_filesystem -lboost_regex -lboost_iostreams -lboost_date_time -lboost_thread -lboost_timer -lboost_chrono -lboost_system -ltbb -lblosc -lIex -lImath -lIlmImf -lIlmThread -lHalf -lz -lboost_python37 -lusd_usd -lusd_usdGeom -lusd_usdLux -lusd_usdSkel -lusd_usdShade -lusd_sdf -lusd_tf -lusd_pcp -lusd_arch -lusd_gf -lusd_js -lusd_vt -lusd_ar -lusd_plug -lusd_trace -lusd_kind -lusd_work -lIECore -lIECorePython -lIECoreUSD

So I think we just need to get those -L arguments reordered somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants