Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL queries return unicode strings while CMSSW only accepts byte strings #4497

Closed
vytjan opened this issue May 15, 2019 · 7 comments
Closed

Comments

@vytjan
Copy link
Contributor

vytjan commented May 15, 2019

While debugging job failures in the new T0 release replay, I found that T0 SQL queries return unicode strings in the new release instead of byte strings (in the old 2.1.4 release).
For instance, this one:
https://github.com/dmwm/T0/blob/master/src/python/T0/WMBS/Oracle/RunConfig/GetStreamDatasetTriggers.py

Since SetupCMSSWPset.py was throwing an exception, after some debugging I ended up at CMSSW cares about the string type and only accepts a byte string:
https://github.com/cms-sw/cmssw/blob/02d4198c0b6615287fd88e9a8ff650aea994412e/Configuration/DataProcessing/python/Repack.py#L57

I think this is caused by the recent sqlalchemy package update:
cms-sw/cmsdist@2187529#diff-c359e72e06d238c2f7e059ec6f8c3c3cL1
as starting with sqlalchemy 1.3.3 release the convert_unicode flag was deprecated
https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.String.params.convert_unicode

Even though WMCore always had it set to use unicode, I am not sure if this was also a case for T0:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Database/DBFactory.py#L13
as I don't see the DBFactory used by T0.

For now I simply added a quick fix to the GetStreamDatasetTriggers.py:
vytjan@684299a

However, I am not sure if that's a right place for these unicode string checks (as there may be some other cases for failures in the code). I am currently running a replay on vocms047 to check if that's the only place where unicode strings don't work properly.

@hufnagel
Copy link
Member

hufnagel commented May 15, 2019

T0 uses the same database access code as WMCore. With convert_unicode deprecated, how is this done now? Are we silently using unicode for any string return from a database call now and we only notice in the Tier0 since it passed some of these to CMSSW?

@vytjan
Copy link
Contributor Author

vytjan commented May 16, 2019

As I understand, the convert_unicode was deprecated in the latest SQLAlchemy upgrade in April
cms-sw/cmsdist@2187529#diff-c359e72e06d238c2f7e059ec6f8c3c3cL1

Therefore, all old T0 releases used byte strings returned from database call and everything worked well (since the old SQLAlchemy release was used).
Only starting from the new Tier0 release (the one currently being tested) we are using unicode strings from any database call.
And yes, we only noticed that in a release test replay as some of these unicode strings got passed to CMSSW and caused all the jobs to fail.

Overall, I am wondering whether we should review these database calls more carefully or should we just go with the fix I am testing now, if it works without errors?

@hufnagel
Copy link
Member

I think converting every single DAO where we pull strings out of a database individually is the wrong approach. Maybe some central place would be better. I would coordinate this with @amaltaro , since this really is a more general WMCore problem.

@amaltaro
Copy link
Contributor

I think the generic fix is what DBFormatter format functions provide. Since you have to format every single DAO output, you just call one of those methods.
We might still have to improve something there - like casting the key as well, if needed - but I'd rather have this conversion only done where we need; I fear having that conversion for every single DAO in the fetch* method, for instance, might degrade the agent performance by a non-negligible factor.

@vytjan
Copy link
Contributor Author

vytjan commented May 24, 2019

I agree with Alan that the DBFormatter is a more central/generic place for these checks.
I adjusted my changes accordingly, to use the formatDict method to convert results:
master...vytjan:deployment-updates
It seems the GetStreamDatasetTriggers DAO is the only one where we hit this unicode string incompatibility.
Maybe it could be enough to apply the patch I created and leave all the other DAOs as they are? @amaltaro Alan, what do you think?
In such case we wouldn't need to convert every single DAO result of unicode->byte string as Python handles them fine and there seems to be no other issues with T0 WMAgent of this kind.

@amaltaro
Copy link
Contributor

Your change looks fine to me, Vytas.

@vytjan
Copy link
Contributor Author

vytjan commented Jun 12, 2019

Thanks Alan. Tested and merged #4503

@vytjan vytjan closed this as completed Jun 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants