New dasgoclient version #6602

Closed
wants to merge 2 commits into from

Conversation

vkuznet
Contributor

@vkuznet vkuznet commented Jan 28, 2021

Fix for file block=/a/b/c#123 run=123 site=XXX query.

@cmsbuild
Contributor

A new Pull Request was created by @vkuznet (Valentin Kuznetsov) for branch IB/CMSSW_11_3_X/master.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8151b2/12588/summary.html
COMMIT: cc6a034
CMSSW: CMSSW_11_3_X_2021-01-28-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/6602/12588/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2716596
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2716573
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 156 log files, 37 edm output root files, 37 DQM output files

@smuzaffar
Contributor

smuzaffar commented Jan 28, 2021

@vkuznet, do you understand why the following queries give different results with the old and new dasgoclient?

For the following, the old client returns an empty list while the new client returns some LFNs (there are 177 such queries in total):

  • file dataset=/RelValWpToENu_M-2000_13/CMSSW_8_1_0_pre9_Geant4102-81X_mcRun2_asymptotic_v2-v1/GEN-SIM site=T2_CH_CERN
  • file dataset=/RelValSingleElectronPt35_UP15/CMSSW_8_1_0_pre9_Geant4102-81X_mcRun2_asymptotic_v2-v1/GEN-SIM site=T2_CH_CERN
  • file dataset=/RelValNuGun_UP15/CMSSW_8_1_0_pre9_Geant4102-81X_mcRun2_asymptotic_v2-v1/GEN-SIM site=T2_CH_CERN
  • file dataset=/RelValTTbar_13/CMSSW_7_6_0_pre7-76X_mcRun2_asymptotic_v9_realBS-v1/GEN-SIM site=T2_CH_CERN
  • file dataset=/RelValPhotonJets_Pt_10_13/CMSSW_8_1_0_pre9_Geant4102-81X_mcRun2_asymptotic_v2-v1/GEN-SIM site=T2_CH_CERN

For the following 3 queries, the old client returns some LFNs but the new client returns an empty list:

  • file dataset=/HIMinimumBias0/Tier0_REPLAY_vocms015-v214/RAW run=325174 site=T2_CH_CERN
  • file dataset=/SingleElectron/Run2012D-v1/RAW run=208307 site=T2_CH_CERN
  • file dataset=/HIMinimumBiasReducedFormat0/Tier0_REPLAY_vocms015-v214/RAW run=325174 site=T2_CH_CERN

@vkuznet
Contributor Author

vkuznet commented Jan 28, 2021

I can do a full analysis only next week as I travel tomorrow, but any site-related queries now reflect the difference between PhEDEx (used in the old dasgoclient) and Rucio (used in the new dasgoclient).

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

Pull request #6602 was updated.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8151b2/12596/summary.html
COMMIT: e7a3bb5
CMSSW: CMSSW_11_3_X_2021-01-28-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/6602/12596/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2716596
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2716573
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 156 log files, 37 edm output root files, 37 DQM output files

@smuzaffar
Contributor

hold
we need to understand the issue #6602 (comment)

@cmsbuild
Contributor

Pull request has been put on hold by @smuzaffar
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@cmsbuild cmsbuild added the hold label Jan 29, 2021
@vkuznet
Contributor Author

vkuznet commented Jan 29, 2021

@smuzaffar, the queries you referred to

file dataset=/HIMinimumBias0/Tier0_REPLAY_vocms015-v214/RAW run=325174 site=T2_CH_CERN
file dataset=/SingleElectron/Run2012D-v1/RAW run=208307 site=T2_CH_CERN
file dataset=/HIMinimumBiasReducedFormat0/Tier0_REPLAY_vocms015-v214/RAW run=325174 site=T2_CH_CERN

do not resolve to anything in Rucio. I suggest that we refer this question to the Rucio team.

@ericvaandering, could you please check why Rucio does not return any information for the above datasets? I query Rucio directly with the following curl call:

######### HERE IS MY rucio_curl script
#!/bin/bash
opt="-s -L -k --key $HOME/.globus/userkey.pem --cert $HOME/.globus/usercert.pem"
token=`curl $opt -v https://cms-rucio-auth.cern.ch/auth/x509 2>&1 | grep "X-Rucio-Auth-Token:" | sed -e "s,< X-Rucio-Auth-Token: ,,g"`
echo "$token"
curl $opt -H "X-Rucio-Auth-Token: $token" $@
###########

# and here is my query
rucio_curl -v -H "content-type: application/json" -d '{"dids":[{"name":"/SingleElectron/Run2012D-v1/RAW","scope":"cms"}],"domain":"all","rse_expression":"T2_CH_CERN"}' http://cms-rucio.cern.ch/replicas/list
valya-/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov-unknown-df0589e72e4f4e9aaddfc67c9c15d231
* Could not resolve host: application; Unknown error
* Closing connection 0
* About to connect() to cms-rucio.cern.ch port 80 (#1)
*   Trying 188.185.89.122...
* Connected to cms-rucio.cern.ch (188.185.89.122) port 80 (#1)
> POST /replicas/list HTTP/1.1
> User-Agent: curl/7.29.0
> Host: cms-rucio.cern.ch
> Accept: */*
> X-Rucio-Auth-Token: valya-/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov-unknown-df0589e72e4f4e9aaddfc67c9c15d231
> Content-Length: 112
>
* upload completely sent off: 112 out of 112 bytes
< HTTP/1.1 200 OK
< Server: nginx/1.17.10
< Date: Fri, 29 Jan 2021 13:29:06 GMT
< Content-Type: application/x-json-stream
< Transfer-Encoding: chunked
< Connection: keep-alive
< Access-Control-Allow-Origin: None
< Access-Control-Allow-Headers: None
< Access-Control-Allow-Methods: *
< Access-Control-Allow-Credentials: true
< X-Rucio-Host: cms-rucio.cern.ch
<
* Connection #1 to host cms-rucio.cern.ch left intact
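
The token-extraction step in the script above can be isolated into a small function, which makes it easier to check without a live request. This is a minimal sketch, not the script used in this thread: the function name `extract_rucio_token` is hypothetical, and it assumes the header appears in curl's verbose output as it does in the log (a line starting with `< X-Rucio-Auth-Token: `). It also strips a trailing carriage return as a defensive step, since HTTP header lines end in CRLF.

```shell
#!/bin/bash
# Hypothetical helper: read `curl -v` output on stdin and print the value of
# the X-Rucio-Auth-Token response header (first match only).
extract_rucio_token() {
    sed -n 's/^< X-Rucio-Auth-Token: //p' | head -n 1 | tr -d '\r'
}

# Example with canned input instead of a live request (the token value is made up).
printf '< HTTP/1.1 200 OK\r\n< X-Rucio-Auth-Token: user-abc123\r\n< Content-Length: 0\r\n' \
    | extract_rucio_token
```

Against the real service, the function would presumably sit at the end of the same pipeline the script already uses, in place of the `grep` and `sed` pair: `curl $opt -v https://cms-rucio-auth.cern.ch/auth/x509 2>&1 | extract_rucio_token`.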

@ericvaandering

The problem is the domain of all. I traced it through the code and while I can't be sure, I think it's requiring the file be accessible by both a wan protocol and a lan protocol. Instead, it's only available by WAN.

rucio list-file-replicas cms:/SingleElectron/Run2012D-v1/RAW --domain all --rse rse=T2_CH_CERN

provides no results, while

rucio list-file-replicas cms:/SingleElectron/Run2012D-v1/RAW --domain wan --rse rse=T2_CH_CERN

or

rucio list-file-replicas cms:/SingleElectron/Run2012D-v1/RAW --rse rse=T2_CH_CERN

provide the list of replicas.

My suggestion is for Valentin to just define {'domain': None} in the JSON he passes and take the default.

@vkuznet
Contributor Author

vkuznet commented Jan 30, 2021

@ericvaandering , two things:

  • please check that Rucio works with plain curl and tell me what should be supplied to curl. I can't make it work if I do:
rucio_curl -v -H "content-type: application/json" -d '{"dids":[{"name":"/SingleElectron/Run2012D-v1/RAW","scope":"cms"}],"domain":"wan","rse_expression":"T2_CH_CERN"}' http://cms-rucio.cern.ch/replicas/list

or

rucio_curl -v -H "content-type: application/json" -d '{"dids":[{"name":"/SingleElectron/Run2012D-v1/RAW","scope":"cms"}],"domain":null,"rse_expression":"T2_CH_CERN"}' http://cms-rucio.cern.ch/replicas/list

In both cases I get zero results.

  • Second, please define properly which data type should be supplied for the domain attribute. "all" and "wan" are strings; None is not a string. In a language with strict data types I can't use both None and a string for the same attribute; assigning arbitrary data types to the same attribute is a feature of dynamically typed languages such as Python. In a statically typed language, a strict data type means that I can't change it between calls.

I provided you with a shell script which obtains a token and passes it to a second curl call with the necessary parameters. So far I don't get any results back. Please check it and provide me with a working example, as I can't move forward: I don't use Python or the Rucio CLI tools.

@ericvaandering

You've got some problem in your curl since it's interpreting application/json as a host name. I spent 30 minutes trying to debug the curl before realizing that.

Here is what the json.dumps in the command I sent you is sending:

{"all_states": false, "domain": null, "rse_expression": "T2_CH_CERN", "resolve_archives": false, "resolve_parents": false, "dids": [{"scope": "cms", "name": "/SingleElectron/Run2012D-v1/RAW"}]}

I suspect you don't need the false and null values, but I'm not 100% sure.
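
One possible explanation for `application/json` being treated as a host name, offered as an assumption since the thread does not confirm it: the `rucio_curl` script forwards its arguments with an unquoted `$@`, so the shell splits `content-type: application/json` at the space and curl receives `application/json` as a separate argument. That would be consistent with the first log, where the POST request carries no content-type header at all. The sketch below reproduces the splitting without contacting any server; `count_args`, `forward_unquoted`, and `forward_quoted` are hypothetical names.

```shell
#!/bin/bash
# Print how many arguments were received.
count_args() { echo "$#"; }

# Forward arguments with an unquoted $@, as the rucio_curl script does:
# each argument is word-split again on whitespace.
forward_unquoted() { count_args $@; }

# Forward arguments with "$@": each argument is passed through unchanged.
forward_quoted() { count_args "$@"; }

forward_unquoted -H "content-type: application/json"   # prints 3
forward_quoted   -H "content-type: application/json"   # prints 2
```

If this is indeed the cause, writing `"$@"` on the last line of the script would keep the header in one piece.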

@vkuznet
Contributor Author

vkuznet commented Jan 30, 2021

Eric, please be more concrete. I can't guess what I should or should not supply to Rucio; I have zero knowledge of Rucio internals. What I need is a working curl example that I can translate into a plain HTTP call. If I use your JSON I still get nothing from Rucio. Here is the full script with an explicit HTTP POST request. Please adjust it so that it returns concrete results and post it to this ticket so that I can implement it in Go.

#!/bin/bash

#url=$1
#echo "$url"
opt="-v -s -L -k --key $HOME/.globus/userkey.pem --cert $HOME/.globus/usercert.pem"
token=`curl $opt -v https://cms-rucio-auth.cern.ch/auth/x509 2>&1 | grep "X-Rucio-Auth-Token:" | sed -e "s,< X-Rucio-Auth-Token: ,,g"`
echo "$token"
#curl $opt -H "X-Rucio-Auth-Token: $token" "$url"
curl $opt -H "X-Rucio-Auth-Token: $token" \
    -H "content-type: application/json" \
    -d '{"all_states": false, "domain": null, "rse_expression": "T2_CH_CERN", "resolve_archives": false, "resolve_parents": false, "dids": [{"scope": "cms", "name": "/SingleElectron/Run2012D-v1/RAW"}]}' \
    http://cms-rucio.cern.ch/replicas/list

valya-/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov-unknown-e70cbb60999f49458d78243e57404090
*   Trying 188.185.89.122:80...
* Connected to cms-rucio.cern.ch (188.185.89.122) port 80 (#0)
> POST /replicas/list HTTP/1.1
> Host: cms-rucio.cern.ch
> User-Agent: curl/7.74.0
> Accept: */*
> X-Rucio-Auth-Token: valya-/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov-unknown-e70cbb60999f49458d78243e57404090
> content-type: application/json
> Content-Length: 193
>
* upload completely sent off: 193 out of 193 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.17.10
< Date: Sat, 30 Jan 2021 19:15:40 GMT
< Content-Type: application/x-json-stream
< Transfer-Encoding: chunked
< Connection: keep-alive
< Access-Control-Allow-Origin: None
< Access-Control-Allow-Headers: None
< Access-Control-Allow-Methods: *
< Access-Control-Allow-Credentials: true
< X-Rucio-Host: cms-rucio.cern.ch
<
* Connection #0 to host cms-rucio.cern.ch left intact

@vkuznet
Contributor Author

vkuznet commented Jan 31, 2021

Eric,
I tried the following on lxplus:

time rucio list-file-replicas cms:/SingleElectron/Run2012D-v1/RAW --rse T2_CH_CERN --domain wan
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/py2-pyOpenSSL/18.0.0/lib/python2.7/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography import x509
+---------+--------+------------+-----------+----------------+
| SCOPE   | NAME   | FILESIZE   | ADLER32   | RSE: REPLICA   |
|---------+--------+------------+-----------+----------------|
+---------+--------+------------+-----------+----------------+

real    1m48.818s
user    0m2.653s
sys     0m1.015s

time rucio list-file-replicas cms:/HIMinimumBias0/Tier0_REPLAY_vocms015-v214/RAW --rse T2_CH_CERN --domain wan
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/py2-pyOpenSSL/18.0.0/lib/python2.7/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography import x509
2021-01-31 16:23:16,886 ERROR   Data identifier not found.
Details: Data identifier 'cms:/HIMinimumBias0/Tier0_REPLAY_vocms015-v214/RAW' not found

real    0m2.019s
user    0m0.321s
sys     0m0.484s

time rucio list-file-replicas cms:/HIMinimumBiasReducedFormat0/Tier0_REPLAY_vocms015-v214/RAW  --rse T2_CH_CERN --domain wan
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/py2-pyOpenSSL/18.0.0/lib/python2.7/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography import x509
2021-01-31 16:23:46,436 ERROR   Data identifier not found.
Details: Data identifier 'cms:/HIMinimumBiasReducedFormat0/Tier0_REPLAY_vocms015-v214/RAW' not found

real    0m0.848s
user    0m0.295s
sys     0m0.259s

I also tried omitting the --domain wan option, and the results are the same.

Bottom line, I need clear instructions for the following:

  • how to set up a working environment with the Rucio client (I used /cvmfs/cms.cern.ch/rucio/current/bin/rucio)
  • an example of a working Rucio query (as discussed here) via the Rucio client
  • an example of a working query via curl

Once I know the details of what Rucio does with HTTP requests I can implement this in DAS; otherwise I'm stuck.

It would also be extremely useful if the Rucio team could add a verbose mode to the Rucio client that dumps the underlying HTTP requests (including HTTP headers, body, etc.). That would solve all of these problems, as it would be clear how to place a proper HTTP request to Rucio.

@ericvaandering

Your latest script works great, the "problem" is that that dataset is no longer present at T2_CH_CERN. If you replace that with T0_CH_CERN_Tape, you'll see that. I don't know if @smuzaffar needs that data at CERN for some reason (I assume not) or if he was just using a test that worked before.

Your setup for Rucio is correct and what it was giving on the CLI was correct too, of course.

I will look into adding a debug option for dumping the actual request parameters. That would be very helpful.

@vkuznet
Contributor Author

vkuznet commented Feb 1, 2021

Eric, thanks for finding the site for the /SingleElectron/Run2012D-v1/RAW dataset, but this issue contains two other datasets for which Rucio returns a "Data identifier not found" error:

Data identifier 'cms:/HIMinimumBias0/Tier0_REPLAY_vocms015-v214/RAW' not found
and
Data identifier 'cms:/HIMinimumBiasReducedFormat0/Tier0_REPLAY_vocms015-v214/RAW' not found

What about those?

I think @smuzaffar is referring to a set of queries they set up for Jenkins tests. It may be that those datasets are not present in Rucio. What I think he needs is a set of reference datasets that can be used consistently in Jenkins tests to verify the changes we commit to the different clients. Can Rucio permanently keep some agreed datasets at certain sites so that they can be used in Jenkins tests?

@ericvaandering, bottom line: I'm changing the JSON that the DAS client sends from

{"dids":[{"name":"dataset-name","scope":"cms"}],"domain":"all","rse_expression":"site-name"}

to

{"dids":[{"name":"dataset-name","scope":"cms"}],"domain":null,"all_states":false,"resolve_archive":false,"resolve_parents":false,"rse_expression":"site-name"}

My question is: will it work for queries with all the different RSEs and all datasets? Please note that I need to construct this POST request in the DAS code and will only insert the dataset and site names into this JSON. Therefore I need to know the structure of the JSON.
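
The idea of a fixed JSON structure into which only the dataset and site names are inserted can be sketched in shell as follows. This is an illustration under stated assumptions, not the DAS implementation: the function name `rucio_replicas_payload` is hypothetical, the arguments are assumed to need no JSON escaping (no quotes or backslashes), and the key is spelled `resolve_archives` as in the JSON quoted earlier in the thread, whereas this comment writes `resolve_archive`. Which spelling the server expects is not settled here. The optional third argument shows one way to send either `null` or a string for `domain`.

```shell
#!/bin/bash
# Hypothetical helper: build the JSON body for a replicas/list request.
# Usage: rucio_replicas_payload DATASET SITE [DOMAIN]
# DOMAIN is emitted as JSON null when omitted or empty, otherwise as a string.
# Assumes the arguments need no JSON escaping (no quotes or backslashes).
rucio_replicas_payload() {
    local dataset="$1" site="$2" domain="${3:-}"
    local domain_json="null"
    if [ -n "$domain" ]; then
        domain_json="\"$domain\""
    fi
    printf '{"dids":[{"name":"%s","scope":"cms"}],"domain":%s,"all_states":false,"resolve_archives":false,"resolve_parents":false,"rse_expression":"%s"}\n' \
        "$dataset" "$domain_json" "$site"
}

# Dataset and site taken from the discussion above.
rucio_replicas_payload "/SingleElectron/Run2012D-v1/RAW" "T0_CH_CERN_Tape"
```

The result could then be passed to the curl call from the earlier script with `-d "$(rucio_replicas_payload ...)"`.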

@ericvaandering

Ah. Those two datasets (containers) don't exist in the production Rucio, but in the testbed (presumably Tier0 was running against that for some tests).

I think if you just want a functional test, your best bet is a RAW dataset at a tape site as those are never going to move unless the Tier1 withdraws from CMS. Next best would be for cmsbot itself to make some rules keeping some small datasets at a site (CERN?) but unless I'm missing something, that shouldn't be necessary.

I'm not quite sure what you mean about "all different RSEs and all datasets" except, "yes, of course". For any RSE/container (or block) combination that should give you what you are looking for. No output means it's not there either because the container or block doesn't exist or it's not at that RSE.
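
The "no output means it's not there" rule can be checked mechanically. The logs in this thread show the response as `Content-Type: application/x-json-stream`; the sketch below assumes that this means one JSON record per line, which the thread does not confirm. The function name `count_stream_records` is hypothetical and the sample records are made up.

```shell
#!/bin/bash
# Hypothetical helper: count non-empty lines on stdin.
# Assumes the response body carries one JSON record per line.
count_stream_records() {
    # grep exits non-zero when nothing matches; keep the function's status at 0.
    grep -c '[^[:space:]]' || true
}

printf '' | count_stream_records                                 # prints 0
printf '{"name":"f1"}\n{"name":"f2"}\n' | count_stream_records    # prints 2
```

A count of 0 would then correspond to the case described above: the container or block does not exist, or it is not at that RSE.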

@vkuznet
Contributor Author

vkuznet commented Feb 1, 2021

Eric, what I meant is that originally you told me to use the domain value "all", and it turns out that this does not work for some datasets. Now I have set domain to null and added the other parameters which I posted. Therefore, I need to know whether this JSON structure will work for all RSEs and all datasets, or whether it will require further changes.

@smuzaffar
Contributor

smuzaffar commented Feb 1, 2021

@vkuznet and @ericvaandering, thanks for looking into it. As mentioned in #6602 (comment), I would like to understand the differences between the results of the old and new DAS clients. Out of a total of 3490 queries that we run for CMSSW IB/PR tests, there are 180 for which the old and new clients give different results.

  1. For 177 queries the old DAS client does not show any files at T2_CH_CERN while the new DAS client returns some files. I think these will not break IBs and might allow us to start running tests for which DAS results were missing because the old DAS client returned nothing.
  2. For 3 queries the old DAS client does match some data at T2_CH_CERN while the new DAS client does not return anything. I am afraid this might break the workflows where we are using this data.

I just want to understand why there are these inconsistencies. They might indicate that there are some wrong entries that need a cleanup in either Rucio or PhEDEx.
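
The two groups above (old client empty, new client empty) can be told apart mechanically once the results of each client are saved per query. The following is a minimal sketch under assumptions: the function name `classify_results` is hypothetical, each file is assumed to hold one LFN per line, and how the files are produced by the two client versions is left out. The file paths in the example are made up.

```shell
#!/bin/bash
# Hypothetical helper: classify one query by comparing two result files,
# each holding one LFN per line (old client output, new client output).
classify_results() {
    local old="$1" new="$2"
    if [ "$(sort -u "$old")" = "$(sort -u "$new")" ]; then
        echo "same"          # identical sets of LFNs, order ignored
    elif [ ! -s "$old" ]; then
        echo "old-empty"     # only the new client returned files
    elif [ ! -s "$new" ]; then
        echo "new-empty"     # only the old client returned files
    else
        echo "differ"        # both returned files, but not the same set
    fi
}

# Example with made-up LFNs: the old client returned nothing.
tmp=$(mktemp -d)
: > "$tmp/old.txt"
printf '/store/a.root\n/store/b.root\n' > "$tmp/new.txt"
classify_results "$tmp/old.txt" "$tmp/new.txt"   # prints old-empty
rm -rf "$tmp"
```

Running this over all queries and counting the labels would give a breakdown like the one above.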

@vkuznet
Contributor Author

vkuznet commented Feb 1, 2021

@smuzaffar, it is up to the Rucio team to tell you what to do. DAS here is just a middleman and makes no judgment calls. My only concern is how to properly put the JSON into the DAS logic: either

{"dids":[{"name":"dataset-name","scope":"cms"}],"domain":"all","rse_expression":"site-name"}

or use this one:

{"dids":[{"name":"dataset-name","scope":"cms"}],"domain":null,"all_states":false,
 "resolve_archive":false,"resolve_parents":false,"rse_expression":"site-name"}

So far I have used the former, but it seems I should use the latter (in that case I'll need to prepare a new PR). Eric should clearly answer both questions.

@ericvaandering

ericvaandering commented Feb 1, 2021

> @vkuznet and @ericvaandering thanks for looking in to it. As mentioned #6602 (comment) , I would like to understand the differences between the old and new das clients results. Out of total 3490 queries, which we run for CMSSW IB/PR tests, there are 180 for which old and new client give different results.
>
>   1. For 177 queries old dasclient does not show any files at T2_CH_CERN while new das client returns some files. I think these will not break IBs and might allow us to start running tests for which das was missing due to empty results from old das client
>   2. For 3 queries old das client does match some data at T2_CH_CERN while new das client does not return any thing. I am afraid this might break the workflows where we are using/accessing this data.
>
> I just want to understand why there are these inconsistencies? This might be indicating that there might be some wrong entries which needs a cleanup either in rucio or Phedex

PhEDEx information, while still there, has not been updated since November, when we turned off everything but the data service. Comparisons between PhEDEx and Rucio are no longer useful. That DAS was still contacting PhEDEx was an oversight, and it was providing incorrect results.

@ericvaandering

> @smuzaffar , it is up to Rucio team to tell you what to do. DAS here is like a middle man which does not have any judging call. My only concern is how to properly put JSON in DAS logic, either
>
> {"dids":[{"name":"dataset-name","scope":"cms"}],"domain":"all","rse_expression":"site-name"}
>
> or use this one:
>
> {"dids":[{"name":"dataset-name","scope":"cms"}],"domain":null,"all_states":false,
>  "resolve_archive":false,"resolve_parents":false,"rse_expression":"site-name"}
>
> So far, I used the former, but it seems I should use the latter (in that case I'll need to prepare new PR). Eric should clearly answer both questions.

Please just use the latter. If you want to test it, I suspect you'll get the same with just the DIDS and RSE_EXP keys. If not, stick with what works.

@vkuznet vkuznet mentioned this pull request Feb 1, 2021
@vkuznet
Contributor Author

vkuznet commented Feb 1, 2021

@smuzaffar, based on Eric's input I created a new PR, #6608, to reflect the changes in the JSON which I pass to Rucio. At this point I declare that dasgoclient reflects the results provided by Rucio. How you treat your queries is up to you (and Eric) to decide.

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

closing in favor of #6608

@smuzaffar smuzaffar closed this Feb 1, 2021
@smuzaffar
Contributor

abort


4 participants