Errors when approving two FNAL requests (possible duplicate subscriptions) #954

Open
DAMason opened this issue Dec 16, 2013 · 7 comments

Comments

@DAMason

DAMason commented Dec 16, 2013

Greetings,

When I try to approve transfer requests 407431 and 407424 I get an error like:

"""
Apologies, looks like we have an internal server error, details of which below. If the problem persists, please
submit a bug report.

Error time=2013-12-14 17:26:36 UTC id=306eb01962ae825b712c8ab74db0a4fe

"""

Other requests that have come before and after these were fine. These seem to have been manually created by
Julian -- in the comments I see:

This subscription need to be manually created due to failures in WMAgent. They belong to the following workflows
pdmvserv_EXO-Fall13-00106_00026_v0__131206_200618_2283 pdmvserv_EXO-Fall13-00120_00026_v0__131206_200622_2530
pdmvserv_EXO-Fall13-00130_00026_v0__131206_200822_6140

Julian later reported that after he made these requests the agent recovered and made the subscriptions itself. These then became duplicates.

Thanks,

--Dave

@ghost ghost assigned TonyWildish Dec 16, 2013
@TonyWildish
Contributor

Hi Dave,

I'll take a look.

Cheers,
Tony.

@DAMason
Author

DAMason commented Dec 16, 2013

Thanks -- will leave the requests alone for now -- though would be nice to clean them up at some point when you no longer need them for debugging...

@DAMason
Author

DAMason commented Mar 25, 2014

FWIW we have another request like this:

Request #410835

The error I got this last time trying to approve:

Apologies, looks like we have an internal server error, details of which below. If the problem persists, please submit a bug report.

Error time=2014-03-25 03:57:07 UTC id=ed1af18271b345447c087fc949602b6b

This and the other two referenced here are kinda left hanging -- what should be done with them?

Thanks!

--Dave

@TonyWildish
Contributor

Hi Dave,

sorry for the delay on this, I've had no time at all to look into it. I hope to get to it by the end of this week.

Cheers,
Tony.


@DAMason
Author

DAMason commented Apr 12, 2014

OK -- seems we have another one -- in fact now about 4 of these guys stacked up at FNAL, the latest I just tried to approve again to give you a recent timestamp:

"""
Apologies, looks like we have an internal server error, details of which below. If the problem persists, please submit a bug report.

Error time=2014-04-12 14:41:07 UTC id=ed1af18271b345447c087fc949602b6b

This is from request 412473
"""

Apparently what's going on is that ops see the agent has no record of a subscription being made for some datasets, so they go and make the custodial subscription manually themselves. Currently the (FNAL) subscription requests I have in this state are the following:

407424
407431
410835
412473

Would be nice to at least know what can be done with them -- easiest is to just disapprove, but am leaving them around so that you might know what's going wonky here :)

Thanks!

@TonyWildish
Contributor

Hi Dave,

so, these are all indeed duplicate requests:
#407424
cannot request replica transfer:
/MuMinus_Pt-1to150_PositiveEndcap-gun/Fall13-POSTLS162_V1-v4/GEN-SIM
already subscribed to T1_US_FNAL_MSS as move

#407431
cannot request replica transfer:
/WprimeToENu_M_3800_Tune4C_13TeV_pythia8/Fall13-POSTLS162_V1-v1/GEN-SIM
already subscribed to T1_US_FNAL_MSS as move

#410835
cannot request replica transfer:
/QCD_Pt-120to170_MuEnrichedPt5_Tune4C_13TeV_pythia8/Fall13dr-tsg_PU20bx25_POSTLS162_V2-v1/AODSIM
already subscribed to T1_US_FNAL_MSS as move

#412473
/TZJetsTo3LNuB_FCNC_zeta_zut_8TeV_madgraph/Summer12_DR53X-PU_S10_START53_V19-v1/AODSIM
already subscribed to T1_US_FNAL_MSS with different custodiality

you should go ahead and disapprove them.

From my side, I need to examine the UpdateRequests API which is giving this error message. The API traps all errors and reports this generic error instead of the details, because it doesn't fully trust that the errors won't leak sensitive information. I can filter the useful error messages and just pass them on to the user.
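
As a rough illustration of that filtering idea, here is a minimal Python sketch. It is not the actual UpdateRequests code: the function name, the allowlist, and the patterns are all hypothetical, chosen only to match the messages quoted above. The approach assumed here is an allowlist: only errors matching a known-safe pattern are passed on, and everything else still gets the generic message.

```python
import re

# Hypothetical allowlist: error shapes assumed safe to show to a requestor.
# The patterns are modelled on the messages quoted in this thread.
SAFE_PATTERNS = [
    re.compile(r"^cannot request replica transfer:"),
    re.compile(r"already subscribed to \S+ (as move|with different custodiality)"),
]

# Stand-in for the generic text the API reports today.
GENERIC_MESSAGE = "Apologies, looks like we have an internal server error"


def user_facing_error(raw_error: str) -> str:
    """Return the raw error if it matches a safe pattern, else the generic message."""
    # Collapse newlines and repeated spaces so multi-line errors match cleanly.
    text = " ".join(raw_error.split())
    if any(pattern.search(text) for pattern in SAFE_PATTERNS):
        return text
    return GENERIC_MESSAGE
```

With this sketch, a duplicate-subscription error would reach the requestor in full, while anything unrecognised (a database error, say) would still be hidden behind the generic message.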

So I've updated the title of this issue and will leave it open until it's fixed, hopefully in the first release after Easter.

Cheers,
Tony.

@DAMason
Author

DAMason commented Apr 14, 2014

Hi Tony,

Thanks — yes passing a more instructive error message to the requestor would be the best thing here.

Thanks!

—Dave

