HLT-Validation tests failing in IBs of CMSSW <= 12_1_X #40013

Closed
missirol opened this issue Nov 8, 2022 · 36 comments · Fixed by #40365

Comments

@missirol
Contributor

missirol commented Nov 8, 2022

The HLT-Validation tests are failing in IBs of CMSSW_12_1_X and lower release cycles, e.g.
https://cmssdt.cern.ch/SDT/html/cmssdt-ib/#/ib/CMSSW_12_1_X

The main [*] reason for these failures is that the tests cannot access the EDM files they use (these files were removed from the CERN Tier-2 some days or weeks ago) [**].

TSG has copies of these files in its EOS area, so the fix is simple: we can replace the path to these files.

Questions to @cms-sw/orp-l2.

  • Do we want to fix this? Are PRs still accepted in release cycles as low as 5_3_X?

  • If yes, should we also make PRs to release cycles that do not appear on the IB dashboard (e.g. 12_2_X)?

[*] In 10_6_X (and likely some other cycles), these tests will continue to fail (even after fixing the file-access issue) due to other issues.

[**] It might be useful to know from experts if there are ways to cache these files similarly to what is done for the EDM files used in RelVal wfs and addOnTests.

@cmsbuild
Contributor

cmsbuild commented Nov 8, 2022

A new Issue was created by @missirol Marino Missiroli.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@missirol
Contributor Author

missirol commented Nov 8, 2022

assign hlt

@cmsbuild
Contributor

cmsbuild commented Nov 8, 2022

New categories assigned: hlt

@missirol,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@missirol
Contributor Author

missirol commented Nov 8, 2022

@cms-sw/orp-l2

HLT plans to go ahead with the PRs to fix these tests. If you have comments, please let us know.

@missirol
Contributor Author

missirol commented Nov 9, 2022

Corresponding PRs are listed below.

Script used to make these changes is copied below.

#!/bin/bash

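# Replace every occurrence of an old input-file path ($1) with the path of its
# copy in the TSG EOS area ($2, given relative to the STORM directory) in the
# three files that define the inputs of the HLT tests.
replFile(){
  sed -i "s|$1|root://eoscms.cern.ch//eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/$2|g" \
    "${CMSSW_BASE}"/src/HLTrigger/Configuration/test/cmsDriver.csh \
    "${CMSSW_BASE}"/src/Configuration/HLT/python/addOnTestsHLT.py \
    "${CMSSW_BASE}"/src/Utilities/ReleaseScripts/scripts/addOnTests.py
}

# First reduce any eoscms PFNs to plain /store/... paths, so that the patterns
# passed to replFile below match regardless of how the file was spelled.
sed -i "s|root://eoscms.cern.ch//eos/cms/store/data|/store/data|g" \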
  "${CMSSW_BASE}"/src/HLTrigger/Configuration/test/cmsDriver.csh \
  "${CMSSW_BASE}"/src/Configuration/HLT/python/addOnTestsHLT.py \
  "${CMSSW_BASE}"/src/Utilities/ReleaseScripts/scripts/addOnTests.py

sed -i "s|root://eoscms.cern.ch//eos/cms/store/hidata|/store/hidata|g" \
  "${CMSSW_BASE}"/src/HLTrigger/Configuration/test/cmsDriver.csh \
  "${CMSSW_BASE}"/src/Configuration/HLT/python/addOnTestsHLT.py \
  "${CMSSW_BASE}"/src/Utilities/ReleaseScripts/scripts/addOnTests.py

replFile \
/store/data/Run2011B/MinimumBias/RAW/v1/000/178/479/3E364D71-F4F5-E011-ABD2-001D09F29146.root \
RAW/Run2011B_MinimumBias_run177719/065F5CDD-E6EC-E011-ACBF-001D09F26509.root

replFile \
/store/hidata/HIRun2011/HIHighPt/RAW/v1/000/182/838/F20AAF66-F71C-E111-9704-BCAEC532971D.root \
RAW/HIRun2011_HIHighPt_run182838/F20AAF66-F71C-E111-9704-BCAEC532971D.root

replFile \
/store/data/Run2012A/MuEG/RAW/v1/000/191/718/14932935-E289-E111-830C-5404A6388697.root \
RAW/Run2012A_MuEG_run191718/14932935-E289-E111-830C-5404A6388697.root

replFile \
/store/data/Run2015D/MuonEG/RAW/v1/000/256/677/00000/80950A90-745D-E511-92FD-02163E011C5D.root \
RAW/Run2015D_MuonEG_run256677/80950A90-745D-E511-92FD-02163E011C5D.root

replFile \
/store/hidata/HIRun2015/HIHardProbes/RAW-RECO/HighPtJet-PromptReco-v1/000/263/689/00000/1802CD9A-DDB8-E511-9CF9-02163E0138CA.root \
RAW/HIRun2015_HIHardProbes_run263718/08057733-02A5-E511-9C7D-02163E014606.root

replFile \
/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root \
RAW/Run2016B_JetHT_run272762/C666CDE2-E013-E611-B15A-02163E011DBE.root

replFile \
/store/data/Run2017A/HLTPhysics4/RAW/v1/000/295/606/00000/36DE5E0A-3645-E711-8FA1-02163E01A43B.root \
RAW/Run2017A_HLTPhysics4_run295606/36DE5E0A-3645-E711-8FA1-02163E01A43B.root

replFile \
/store/data/Run2018D/EphemeralHLTPhysics1/RAW/v1/000/323/775/00000/2E066536-5CF2-B340-A73B-209640F29FF6.root \
RAW/Run2018D_EphemeralHLTPhysics1_run323775/2E066536-5CF2-B340-A73B-209640F29FF6.root

replFile \
/store/data/Run2018D/HIMinimumBias0/RAW/v1/000/325/112/00000/660F62BB-9932-D645-A4A4-0BBBDA3963E8.root \
RAW/Run2018D_HIMinimumBias0_run325112/660F62BB-9932-D645-A4A4-0BBBDA3963E8.root

replFile \
/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/853DBE29-53BA-9A44-9FDD-58E4E9064EB1.root \
RAW/HIRun2018A_HIHardProbes_run326479/0E2CC5D5-9D87-7348-9219-B00CD718C847.root

@perrotta
Contributor

@missirol if the intent is just to fix the IB tests, and there will be no need to turn them into new releases, I think the new PRs can all be merged, also taking into account that they were all tested successfully.
@missirol @cms-sw/core-l2 please sign them if you also agree with it

@missirol
Contributor Author

@perrotta

if the intent is just to fix the IB tests, and there will be no need to transform them into new releases

Yes, this is the case.

I think those PRs do the job, but I would like to implement whatever is the better solution according to Core-Sw. Let's see if we can clarify that (details below).

  1. This question hasn't really been answered yet. I can only think that one downside of the "TSG EOS area" approach is that, if one day somebody naively moves the files around in that area, the tests in the IBs will break (as unlikely as that may be, it is a weakness of that approach).

  2. The update by Shahzad to redirect the HLT-Val tests to the cmsbot cache already improved the situation (example: the HLT-Val tests of the latest 5_3_X IB passed), but this does not do the trick for all release cycles (example: the same tests still failed for the latest 9_4_X IB), because HLT's cmsDriver.csh is different across releases.

  3. In the PRs, I touched the addOnTests, but this was not really necessary; this change should likely be reverted regardless (this will remove the need for Core-Sw to sign), but it depends on the answer to 1).

  4. If the cmsbot cache is the preferred approach, I can update those PRs to do that, but in that case I think I need help from Core-Sw on this point.

@cms-sw/core-l2, please share your preference(s).

@Martin-Grunewald
Contributor

Martin-Grunewald commented Nov 15, 2022

Well, recall that this problem appeared because the statement 'RAW data is never deleted', which held true for many years, was changed/invalidated. Thus, I fear that a repeat, as alluded to in point 1) above, will happen, so I would rather have all the files in a TSG EOS area (the files for the MC tests are there already anyway). Overall these are only a few files, so they will not create a disk-space problem, and there are no complications in making sure the bot cache can be accessed (from outside and/or when running tests locally).
Also, I prefer the input files used for the addOnTests to be the same as those used for the TSG/IB tests, and to be taken from the same location! This is the case with the current PRs, so I would suggest integrating them without further change.

@missirol
Contributor Author

The IB tests keep failing, so it would be good to converge.

As I wrote in #40020 (comment), I think the better solution is to use the bot cache, like the vast majority of CMSSW wfs do (that mechanism is meant precisely to prevent issues like this one).

@Martin-Grunewald , if you are convinced the current approach is better, please sign the PRs (and ask Core-Sw to do the same). Otherwise, I will update them to use the bot cache (this 2nd option will not require signatures by Core-Sw).

PS.

(they are for the MC tests already anyway)

These MC files are also in the bot cache now, after #40020 (comment).

@Martin-Grunewald
Contributor

So in that case of bot usage, what is the procedure if we change any of the input files? We first need to copy them to the bot cache? Who has permissions? (Then we could copy them to TSG eos directly).
Sorry, but I feel like this becomes a complicated system to use a cache not under our control. Maybe I miss the point but what is the advantage?

@missirol
Contributor Author

So in that case of bot usage, what is the procedure if we change any of the input files?

I think it would have to be a file that is accessible via xrootd at that moment (e.g. it is at a T2); then once the PR gets merged, it runs in an IB, and the bot caches the file. It's also true that #40020 (comment) says the bot still needs to be instructed to look into the HLT-Val tests, so we should clarify that with Core-Sw if we decide to use the cache.

Sorry, but I feel like this becomes a complicated system to use a cache not under our control. Maybe I miss the point but what is the advantage?

There is no denying that we introduce a dependence on the cache, and the latter is not under our control (on the other hand, if the cache fails, most CMSSW tests will fail too). What is not so nice about the current approach is that if someone moves the files in the TSG EOS area (e.g. by renaming folders), this will break the IB tests (which is far from obvious, and it is likely to happen one day, because nothing prevents it other than some of us knowing). As long as the cache works, the advantage is that we just write the LFN of the files, and there is no other maintenance (or need to guard the TSG EOS area).

But to be clear, I don't have a strong opinion on these two solutions; I was waiting for a guideline from Core-Sw (#40020 (comment)), but maybe they don't have a strong opinion either. Tagging @smuzaffar , in case I'm missing something.

@smuzaffar
Contributor

So in that case of bot usage, what is the procedure if we change any of the input files? We first need to copy them to the bot cache? Who has permissions? (Then we could copy them to TSG eos directly).

First, I would suggest using LFNs (i.e. /store/something) instead of PFNs (e.g. root://eoscms.cern.ch//store/something). New files should be accessible via xrootd. Any file that is not in the bot cache is first read via the global xrootd redirector; the bot then caches it automatically and uses the cached copy.
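To make the LFN/PFN distinction concrete, here is a minimal sketch of the conversion. The helper name `pfn_to_lfn` is hypothetical (it is not part of CMSSW or cms-bot), and it assumes that the LFN is simply everything from the first `/store/` component onwards, as in the paths quoted in this thread.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: reduce a PFN to an LFN by
# dropping everything in front of the first "/store/" path component.
pfn_to_lfn() {
  local path="$1"
  case "$path" in
    */store/*) printf '/store/%s\n' "${path#*/store/}" ;;
    *)         printf '%s\n' "$path" ;;  # no /store/ component: leave untouched
  esac
}

# Example with one of the PFNs that appear in this thread.
pfn_to_lfn "root://eoscms.cern.ch//eos/cms/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root"
# prints /store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root
```

A path that is already an LFN passes through unchanged, so a helper like this could be applied to a mixed list of inputs.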

Sorry, but I feel like this becomes a complicated system to use a cache not under our control. Maybe I miss the point but what is the advantage?

Yes, it looks complicated, but it has worked fine for many years for relvals/addon and unit tests. Unused files are automatically deleted after 6 months. I also have no strong opinion: if you want to make use of the bot cache then I can help, but if you want to have full control over the files then feel free to maintain your own EOS area.

@Martin-Grunewald
Contributor

OK, so it looks like this works fine.

(The only caveat left is this 6-month deletion once unused: this would bite us in case a discontinued CMSSW release series gets resurrected after, say, 8 months or so because someone wants to add a PR, but some of the corresponding bot files have disappeared in the meantime; some TSG/IB tests will then fail unless the files are found in their original location...)

@missirol
Contributor Author

Okay, I will double-check things and update the PRs to use the cache in the next couple of days (if there are no issues).

@smuzaffar , regarding #40020 (comment), could you please teach the bot to parse the HLT-Validation logs to look for files to cache ?

@missirol
Contributor Author

missirol commented Nov 23, 2022

Thanks for the update, @traylenator . For reference, Steve also explained that a user can disable hepix as follows:

mkdir -p ~/.hepix
touch ~/.hepix/off

no idea why /etc/hepix/sh/GROUP/zh/group_rc.sh is not overriding it for bash

and that "csh source is loaded with every shell vs only on login for bash".

@missirol
Contributor Author

missirol commented Nov 23, 2022

I updated the PRs so that the HLT-Val tests can access EDM files from the cms-bot cache.

There is a separate issue (not discussed so far) for the CMSSW releases where the HLT-Val tests run in SLC6. There, the tests will continue to fail (what continues to fail is the "hlt-integration-tests" part of these tests). I reproduced the problem locally, and it looks like an incompatibility with the external files (e.g. JAVA .jar files) used to download the HLT configurations from the database. The stack trace I see in a local test is in [1].

Fwiw, #40004 has removed queries to ConfDB in IB tests, so this type of problem should never happen moving forward in recent releases.

[1]

*** glibc detected *** java: free(): invalid next size (fast): 0x00007f5aec328e40 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75e5e)[0x7f5af697ee5e]
/lib64/libc.so.6(+0x78cf0)[0x7f5af6981cf0]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/lib/amd64/libjava.so(+0x140dc)[0x7f5af2f9d0dc]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/lib/amd64/libjava.so(+0x13ebe)[0x7f5af2f9cebe]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/lib/amd64/libjava.so(+0x140aa)[0x7f5af2f9d0aa]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/lib/amd64/libjava.so(+0x142cd)[0x7f5af2f9d2cd]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/lib/amd64/libjava.so(+0x143ba)[0x7f5af2f9d3ba]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/lib/amd64/libjava.so(Java_java_util_TimeZone_getSystemTimeZoneID+0x5a)[0x7f5af2f9cbda]
[0x7f5add018427]
======= Memory map: ========
6e0000000-6e4aa0000 rw-p 00000000 00:00 0 
6e4aa0000-72aaa0000 ---p 00000000 00:00 0 
72aaa0000-734000000 rw-p 00000000 00:00 0 
734000000-7c0000000 ---p 00000000 00:00 0 
7c0000000-7c00e0000 rw-p 00000000 00:00 0 
7c00e0000-800000000 ---p 00000000 00:00 0 
5556ce2c9000-5556ce2ca000 r-xp 00000000 00:46 2265605                    /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/bin/java
5556ce4c9000-5556ce4ca000 r--p 00000000 00:46 2265605                    /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/bin/java
5556ce4ca000-5556ce4cb000 rw-p 00001000 00:46 2265605                    /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.265.b01-0.el6_10.x86_64/jre/bin/java

missirol added a commit to cms-tsg-storm/cmssw that referenced this issue Nov 24, 2022
HLT-integration tests cannot run with SLC6 architectures,
due to an incompatibility with the latest .jar files of ConfDB-v2.
For further details, see cms-sw#40013 (comment)
@missirol
Contributor Author

We agreed to switch off the HLT-integration tests for IBs using SLC6, to avoid false positives due to the issue described in #40013 (comment). The PRs to 8_0_X, 9_4_X, 10_2_X and 10_3_X have been updated accordingly (the update of the 5_3_X PR is not necessary, as those tests do not run in that earlier cycle).

I think all the PRs are now ready to go.

@missirol
Contributor Author

+hlt

With the expected exception of 10_6_X (#40026 (comment)), after the PRs above, the HLT-Val tests passed in the latest IBs of all active release cycles. The same tests were also updated in recent cycles (see #40147 and its backports), in order to use LFNs and the ibeos cache.

Thanks @smuzaffar for helping us make use of the ibeos cache.

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

@missirol
Contributor Author

please close

@Martin-Grunewald
Contributor

Hello @smuzaffar @missirol

Need to reopen this

I find in several tests that the jobs fail to find the input file, while other jobs succeed (130, 126, 125 were tested), running the tests in my own developer areas:

Succeeding tests use:
04-Dec-2022 19:46:00 CET Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2016B/JetHT/RAW/v/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root
04-Dec-2022 19:46:04 CET Successfully opened file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root

while failing tests use:
04-Dec-2022 19:46:02 CET Initiating request to open file root://eoscms.cern.ch//eos/cms/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root
%MSG-w XrdAdaptorInternal: file_open 04-Dec-2022 19:46:04 CET pre-events
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root?xrdcl.requuid=98f45b40-5b83-46d5-b709-2f05f86579c8.

IOW, the path prefix is different: the failing tests do not seem to use the bot cache (the /store/user/cmsbuild part in the path).

Where is the /store/user/cmsbuild part in the filepath inserted?
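For reference, a comparison like this can be scripted. The sketch below is only an illustration: `classify_opens` is a hypothetical helper, and it assumes nothing beyond the "Initiating request to open file" wording and the `/store/user/cmsbuild` prefix visible in the log lines quoted in this comment.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: read a job log on stdin and tag
# every file-open request by whether its path goes through the bot cache.
classify_opens() {
  # Keep only the URL that follows the "Initiating request" message.
  sed -n 's/.*Initiating request to open file \(.*\)$/\1/p' |
  while IFS= read -r url; do
    case "$url" in
      */store/user/cmsbuild/*) printf 'cache   %s\n' "$url" ;;
      *)                       printf 'direct  %s\n' "$url" ;;
    esac
  done
}

# Usage (the log file name is a placeholder):
#   classify_opens < job.log
```

Lines tagged `direct` are the ones that bypass the cache and depend on the file still existing at its original location.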

@Martin-Grunewald
Contributor

Need to check the tcsh problem mentioned above...

@smuzaffar
Contributor

The actual file in the IB EOS cache is /eos/cms/store/user/cmsbuild/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root (note the v1 instead of v in the path). Could there be a typo in the file name?

[a]

[lxplus713] ls -l /eos/cms/store/user/cmsbuild/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root
-rw-r--r--. 1 cmsbuild zh 7200145933 Dec  4 23:19 /eos/cms/store/user/cmsbuild/store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root

@Martin-Grunewald
Contributor

Ah, the typo was a cut-and-paste error.
So it turns out that the tcsh fix still needs to be deployed, and until then one has to set the PATHs by hand (which got lost in my tests).

@missirol
Contributor Author

missirol commented Dec 5, 2022

Re #40013 (comment), I'm trying to figure out how we can protect the HLT tests from the '6-months removal' rule of the cache.

  1. I thought about adding a unit test in master that accesses the files used by the HLT-Val tests in all (active) release cycles; with such a unit test, the cms-bot would continue to keep the files in the cache. This approach has a few shortcomings: (1) one more unit test to maintain, (2) it feels very much like a hack, (3) if we also consider closed releases, some of the files are not available anymore (we would need to manually copy them into the cache, or similar). The test would read the list of files from a text file that would be in CMSSW (tracked by git), and would have to be updated when needed, which is the maintenance part of this (I wouldn't do the git grep of other branches inside the unit test, as that seems even more of a hack).

  2. I realised that, after the bot was updated to keep track of the HLT-Val tests, the cache ended up having two versions of the same files (at least for now), example:

/eos/cms/store/user/cmsbuild//store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root 
/eos/cms/store/user/cmsbuild//store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/RAW/Run2016B_JetHT_run272762/C666CDE2-E013-E611-B15A-02163E011DBE.root 

I'm not convinced by the 1st option. The 2nd point might suggest a solution to the 6-months problem.
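As a rough sketch of what the unit test in item 1 could look like: the function below reads a list of LFNs from a text file and tries to open each one. Everything here is an assumption made for illustration. `touch_cached_files` is a hypothetical name, the list format (one LFN per line, `#` comments) is invented, and the probe command that actually opens a file is left as a parameter rather than naming a specific CMSSW tool.

```shell
#!/bin/bash
# Hypothetical sketch, for illustration only.
# $1    : text file with one LFN per line (empty lines and lines starting
#         with '#' are skipped)
# $2... : probe command; each LFN is appended as its last argument
# Returns non-zero if the probe fails for at least one file.
touch_cached_files() {
  local list="$1" lfn failed=0
  shift
  while IFS= read -r lfn || [ -n "$lfn" ]; do
    case "$lfn" in
      ''|'#'*) continue ;;
    esac
    # Detach stdin so the probe cannot consume the rest of the list.
    if ! "$@" "$lfn" < /dev/null; then
      echo "cannot open: $lfn" >&2
      failed=1
    fi
  done < "$list"
  return "$failed"
}

# Usage (both names are placeholders):
#   touch_cached_files hltval_inputs.txt my_open_file_command
```

The loop keeps going after a failure, so a single run reports every file that has gone missing rather than only the first one.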

  • If the HLT tests used inputFile=/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/[..] (note the absence of root://eoscms[..] at the beginning), this should work without grid certificate both (a) locally on machines in the CERN network, and (b) centrally in IBs:

    • if SITECONFIG_PATH=/cvmfs/cms.cern.ch/SITECONF/local, it would first look into the EOS TSG area at CERN T2;
    • if SITECONFIG_PATH=/cvmfs/cms-ib.cern.ch/SITECONF/local, it should first look into the cms-bot cache and then fall back to the TSG EOS area if needed.
  • If a cycle is re-opened again after more than 6 months, the HLT tests would continue to work if the files are still available in the TSG area (as that would be the fall back solution in IBs and PR tests, and they will then be re-cached);

  • If a file used in active releases is removed from the TSG area, the IB tests would not break, because they would first look into the bot cache, and find the file at /eos/cms/store/user/cmsbuild//store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/[..]

The disadvantages of this option are that the DAS name of the files is not obvious (but it's still possible to figure it out), that there is still some weak dependence on the TSG area (although that's intended, as a backup option), and that there may be some wasted space in the cache for a while because of a few identical files with different names.
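The lookup order described in the bullets above can be sketched as follows. This is an illustration only: in CMSSW the resolution is driven by the site configuration under SITECONFIG_PATH, not by a script, and the names `resolve` and `EOS_ROOT` as well as the "ib"/"local" mode labels are hypothetical. The only details taken from this thread are the `/eos/cms` location and the `store/user/cmsbuild` cache prefix.

```shell
#!/bin/bash
# Hypothetical sketch, for illustration only.
# $1: "ib"    -> look in the bot cache first, then in the original location
#     "local" -> look in the original location only
# $2: LFN, e.g. /store/group/dpg_trigger/[..]
# Prints the first existing candidate; returns non-zero if there is none.
resolve() {
  local mode="$1" lfn="$2" candidate
  local root="${EOS_ROOT:-/eos/cms}"        # assumed EOS mount point
  local cache="${root}/store/user/cmsbuild" # cache prefix seen in this thread
  if [ "$mode" = "ib" ]; then
    set -- "${cache}${lfn}" "${root}${lfn}"
  else
    set -- "${root}${lfn}"
  fi
  for candidate in "$@"; do
    if [ -r "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

In "ib" mode a file removed from the TSG area is still found through its cached copy, and a file that has expired from the cache is still found in the TSG area, which are the two cases the bullets describe.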

Thoughts?

@missirol
Contributor Author

@smuzaffar

Martin noticed that the following file became "unused" in the cache:
/eos/cms/store/user/cmsbuild//store/relval/CMSSW_8_0_11/RelValProdTTbar/GEN-SIM/80X_mcRun1_realistic_v4-v1/10000/06A6C86B-C634-E611-93A5-0CC47A74525A.root.unused
It's true that it wasn't being used in any release cycle, but I thought it was copied there only last month, so I was expecting it to stick around for a while (given the 6-months rule).

Based on #40013 (comment) , we will try to add a simple unit test to ensure that the few files we need are kept in the cache.

In the meantime, could you please make this file available again in the cache (removing "unused") ?

@aandvalenzuela
Contributor

Hello @missirol,

I just did it. Could you please confirm you can access it now?
Many thanks,
Andrea.

@missirol
Contributor Author

Thanks, Andrea. I see it.

It's true that it wasn't being used in any release cycle, but I thought it was copied there only last month, so I was expecting it to stick around for a while (given the 6-months rule).

Is it possible to figure out why it became 'unused' today? (I'm trying to understand better how the caching works, to avoid issues like this in the future.)

@missirol
Contributor Author

(Continuing to write here for completeness, even though the issue is closed.)

#40365 adds a unit test to keep the files cached, along the lines of what was described in item-1 of #40013 (comment).

I'm not convinced this is the better long-term solution, but it should at least avoid issues like #40013 (comment).

If others have any suggestions, I'd be happy to hear them.
