Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Oracle OCCI APIs which work with std::string in C++11 ABI mode #17147

Merged
merged 1 commit into from Jan 23, 2017

Conversation

davidlt
Copy link
Contributor

@davidlt davidlt commented Jan 11, 2017

We have enabled C++11 ABI in GCC 6.3.0 architecture. This means that
things like std::string are not binary compatible between old "GCC 4"
ABI and new one (C++11, default since GCC 5). The changes to GNU C++
Standard Library were needed to meet C++ Standard Library requirements
for C++11. Note that GNU libstdc++ is dual-ABI.

Oracle Instant Client is not compiled with latest GCC and C++11 ABI. One
could avoid issues by using methods which do not use, e.g., std::string,
or use C-level API.

This lacks proper testings. All the tests within CMSSW are disabled for
unknown reason. Also these are probably not used within IB RelVals.

This does compile with GCC 6.3.0 + C++11 ABI.

Signed-off-by: David Abdurachmanov David.Abdurachmanov@cern.ch

We have enabled C++11 ABI in GCC 6.3.0 architecture. This means that
things like std::string are not binary compatible between old "GCC 4"
ABI and new one (C++11, default since GCC 5). The changes to GNU C++
Standard Library were needed to meet C++ Standard Library requirements
for C++11. Note that GNU libstdc++ is dual-ABI.

Oracle Instant Client is not compiled with latest GCC and C++11 ABI. One
could avoid issues by using methods which do not use, e.g., std::string,
or use C-level API.

This lacks proper testings. All the tests within CMSSW are disabled for
unknown reason. Also these are probably not used within IB RelVals.

This does compile with GCC 6.3.0 + C++11 ABI.

Signed-off-by: David Abdurachmanov <David.Abdurachmanov@cern.ch>
@cmsbuild
Copy link
Contributor

A new Pull Request was created by @davidlt for CMSSW_9_0_X.

It involves the following packages:

OnlineDB/CSCCondDB
OnlineDB/EcalCondDB

@ggovi, @cmsbuild, @davidlange6 can you please review it and eventually sign? Thanks.
@argiro this is something you requested to watch as well.
@davidlange6, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@davidlange6
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 13, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17264/console Started: 2017/01/13 09:09

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

@davidlt
Copy link
Contributor Author

davidlt commented Jan 19, 2017

ping^1

@ggovi
Copy link
Contributor

ggovi commented Jan 23, 2017

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_9_0_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @smuzaffar

@davidlange6
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 5979d3b into cms-sw:CMSSW_9_0_X Jan 23, 2017
@davidlt
Copy link
Contributor Author

davidlt commented Oct 27, 2017

This is a question to @davidlange6 . Also, could you provide full details on what kind of problem are you having? It's hard to understand what is happening from your original message. CC7 should not be a trigger as the same applies for SLC6. Thus there might be an extra element. Again, more details are needed to understand.

@ggovi
Copy link
Contributor

ggovi commented Oct 27, 2017

here follows the error message we get:
while validating the O2O code for the platform CC7 (slc7_amd64_gcc630) I got the following failure:

----- Begin Fatal Exception 04-Sep-2017 11:24:38 CEST-----------------------
An exception of category 'StdException' occurred while
[0] Calling endJob for module ExTestEcalDAQAnalyzer/'conf_o2o'
Exception Message:
A std::exception was thrown.
ERROR: Connection Failed: ORA-24960: the attribute OCI_ATTR_USERNAME is greater than the maximum allowable length of 255

The exception is thrown by the OCCI Environment::createConnection( … ) function.
This error has been obtained using the release CMSSW_9_2_10 with slc7/gcc630

The trigger of the error is not the slc7, but the gcc630 compiler.

@davidlt
Copy link
Contributor Author

davidlt commented Oct 27, 2017

Could you debug and confirm that ABI is causing this issue?

@ggovi
Copy link
Contributor

ggovi commented Oct 27, 2017

Well with gcc530 it works... Thanks probably to
https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_9_2_X/gcc530/gcc.spec#L271

@ggovi
Copy link
Contributor

ggovi commented Oct 30, 2017

any feedback on this? Can we restore the compilation option?

@davidlt
Copy link
Contributor Author

davidlt commented Oct 30, 2017

Okay. So I had to investigate myself. I took random login credentials publicly floating on the internet (not gonna share here). Using CMSSW_9_2_0 + slc6_amd64_gcc630 + C++17 it failed. Using -D_GLIBCXX_USE_CXX11_ABI=0 to revert compilation unit to use older ABI (GCC4-compatible) solved the issue with my test case. This is ABI issue and I somewhat see where it happens, but cannot debug anything as that proprietary Oracle software stack.

As mentioned in original comment I would highly suggest to use C-level API as that is safe. You can also use -D_GLIBCXX_USE_CXX11_ABI=0 as long as you provide good encapsulation. Hopefully Oracle release a update Instant Client OCCI.

@ggovi
Copy link
Contributor

ggovi commented Oct 30, 2017

The OCCI choice is non-standard in cms, the recommended interface is coral, which is based on OCI. So, for whoever is currently using OCCI in CMSSW, the possible fixes are obviously the two you are mentioning:

  1. migrate to coral or OCI
  2. switch the ABI compatibility on

Option 2. looks cheaper, I would be in favour of it. If there are reasons not to go for it, please let us know.
As soon as 630 without -D_GLIBCXX_USE_CXX11_ABI=0 will be official architecture for HLT, we will be forced to leave the O2O release behind.

@ggovi
Copy link
Contributor

ggovi commented Oct 30, 2017

I guess one relevant point could be the medium term availability ( for Run2 ) of this option in the compilers we use. Is it available in gcc700? Can you please check if it works?

@davidlt
Copy link
Contributor Author

davidlt commented Oct 30, 2017

You mean -D_GLIBCXX_USE_CXX11_ABI=0? That's part of GNU Standard C++ Library implementation and not the compiler. It does exist in GCC 7.2.1 and GCC 8 and will for more years.

Note that this is not magic bullet and you need to use it accordingly, refactoring of some code might be needed. This flag control ABI within GNU Standard C++ Library and is used mainly to build libraries with dual-ABI.

@ggovi
Copy link
Contributor

ggovi commented Oct 30, 2017

Let me make a step back on this. We have some CMSSW code that does not work since we moved to gcc630. We know the reason, we have an optimal solution ( get rid of OCCI ), but it requires some work. Is the workaround used in gcc530 still a solution until the end of run2? If for any reason related to the planning (in terms of compilers ) this is not the case, we need to tell to the owners of the affected packages that they have to migrate to OCI or coral.

@davidlange6
Copy link
Contributor

davidlange6 commented Oct 30, 2017 via email

@ggovi
Copy link
Contributor

ggovi commented Nov 2, 2017

I've asked the people to provide an estimation...

@ggovi
Copy link
Contributor

ggovi commented Nov 8, 2017

Hi,
the Ecal team already started to work on the migration. The SiStrip team looks in a more complex situation:
OCCI is used in 45 classes and ~30k lines of code are involved in a package external to CMSSW.
Should we invite them to some discussion ( ORP or a dedicate meeting )?
Please let me know...

@davidlange6
Copy link
Contributor

davidlange6 commented Nov 8, 2017 via email

@hqucms
Copy link
Contributor

hqucms commented Nov 8, 2017

Hi @davidlange6

The external package for SiStrip that uses OCCI is tkonline. It is part of the trackerDAQ software and is compiled as a standalone package online (without depending on CMSSW). Inside CMSSW, it is mainly used to access the online SiStrip configuration database for e.g., the Strips O2O, etc.

@davidlange6
Copy link
Contributor

davidlange6 commented Nov 8, 2017 via email

@davidlt
Copy link
Contributor Author

davidlt commented Nov 8, 2017

None of RelVals use that functionality thus not so visible. In general it's only use in ONLINE. E.g, AArch64 is using a fake tkonline (https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_10_0_X/gcc630/tkonlinesw-fake.spec) which just provides the interfaces needed by CMSSW to compile. If by accident is something called you would received exception saying that implementation is missing.

There seems to be a number of functions (not sure if public) which use thing like oracle::occi::Clob, std::string, etc.

@davidlange6
Copy link
Contributor

davidlange6 commented Nov 8, 2017 via email

@hqucms
Copy link
Contributor

hqucms commented Nov 8, 2017

Hi @davidlt

A naive question: I was wondering if it would be possible to patch tkonline somehow so that it does not communicate with the outside using std::string, and then compile tkonline with -D_GLIBCXX_USE_CXX11_ABI=0. Will that solve the problem?

@davidlt
Copy link
Contributor Author

davidlt commented Nov 8, 2017

It's enough to use C primitives instead of C++ standard classes to avoid C++ standard library ABI changes. Thus std::string has a different structure in different ABI configurations, but char * is all safe.

E.g., I am not sure if occi::Clob is affected. You can print object or class sizes under different ABI.

@hqucms
Copy link
Contributor

hqucms commented Nov 8, 2017

Thank you @davidlt ! I will take a look!

@hqucms
Copy link
Contributor

hqucms commented Nov 14, 2017

The tracker online team found a way to work around the failure with the following piece of code:

extern "C" {
#include <string.h>
int _ZNKSs6lengthEv (char **a)
{
  char *s = *a;
  return strlen (s);
}
}

One just needs to compile it into a shared library and preload it with LD_PRELOAD when running cmsRun for the O2Os.

Basically it is replacing the code for s.length() with strlen(s.c_str()) so occi still sees the correct length when getting the new cxx11 strings. This seems enough to fix the problem with occi, except when occi is throwing an exception: in that case the code catching it will crash badly while handling some string. I tested this today and the Strips O2O indeed ran correctly with it in CMSSW_9_4_0 with gcc630.

Any comments on this, @davidlt @davidlange6 ?

@davidlt
Copy link
Contributor Author

davidlt commented Nov 15, 2017

I see how this could work, but I am not (yet) fully sure if it's safe in all cases. I would probably prefer not to have it within CMSSW repository.

@hqucms
Copy link
Contributor

hqucms commented Nov 15, 2017

Hi @davidlt

I would probably prefer not to have it within CMSSW repository.

I think that's fine for us. We can compile the piece of code into a standalone library and preload it in the O2O scripts. What do you think, @ggovi ?

@ggovi
Copy link
Contributor

ggovi commented Nov 15, 2017

@hqucms
The standalone library needs to be in some repository - currently we run the O2O entirely from the CMSSW release... So the library needs to be built within CMSSW, otherwise we can't use it I'm afraid...

@hqucms
Copy link
Contributor

hqucms commented Nov 15, 2017

@ggovi
Can we treat it the same way as what we do with the steering scripts outside CMSSW?
Or can we add an "isolated" package (e.g., OnlineDB/OracleStringFix) in CMSSW with just this piece of code, @davidlt ?

@ggovi
Copy link
Contributor

ggovi commented Nov 16, 2017

@davidlt @davidlange6 @hqucms
Maybe better to move the entire discussion into a meeting? Can we propose next Monday? @hqucms can you check with the online team?

@hqucms
Copy link
Contributor

hqucms commented Nov 16, 2017

@ggovi Sure, I will ask them now.

@hqucms
Copy link
Contributor

hqucms commented Dec 6, 2017

@davidlt @davidlange6 @ggovi @boudoul

I just had a quick look at the new version of OCCI 12.2.0.1.0. Indeed the symbol _ZNKSs6lengthEv is gone. But then the hack no longer works anymore... Can we stay with 12.1 at least for 2018?

------ More details below ------

A simple example for reproducing it:

#include <iostream>
#include <string>
#include <sstream>

#include "occi.h"

using namespace oracle::occi;

extern "C" {
#include <string.h>
int _ZNKSs6lengthEv (char **a)
{
  char *s = *a;
  return strlen (s);
}
}

int main(){

    std::string user = "a";
    std::string pass = "b";
    std::string sid = "c";

    auto env = Environment::createEnvironment(Environment::OBJECT);
    auto conn = env->createConnection(user, pass, sid);
    auto stmt = conn->createStatement();

    return 0;
}

Compiling e.g.,

g++ occi.cc -I/cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/oracle/12.1.0.2.0/include -L/cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/oracle/12.1.0.2.0/lib -locci -lclntsh

With OCCI 12.1.0.2.0, it gives

terminate called after throwing an instance of 'oracle::occi::SQLException'
  what():  ORA-12154: TNS:could not resolve the connect identifier specified

while in the new OCCI 12.2.0.1.0, we are back to the original error related to the string length:

terminate called after throwing an instance of 'oracle::occi::SQLException'
  what():  ORA-24960: the attribute  OCI_ATTR_USERNAME is greater than the maximum allowable length of 255

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants