Initial commit to produce a list of (framework) files to be deleted from storage #71

Merged: 3 commits, Mar 4, 2016

OlivierBondu
Member

Addresses #25
Script that:

  • parses SAMADhi and looks for samples containing the username either in the author or in the path
  • parses the directories in /storage/data/cms/store/user/USERNAME
    • if there is a path not listed in the DB
    • AND it is more than 1 month old (by then the crab task has been deleted from the server anyway)
    • then adds it to the list of things that can be removed
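
The selection rule in the list above can be sketched as follows. This is a minimal illustration and not the code from this PR: `find_removable`, its arguments, and the 30-day approximation of "one month" are assumptions, and the SAMADhi query itself is left out (the known paths are simply passed in).

```python
import os
import time

ONE_MONTH_S = 30 * 24 * 3600  # assumption: "one month" taken as 30 days


def find_removable(task_dirs, db_paths, now=None, min_age_s=ONE_MONTH_S):
    """Return task directories absent from the DB and older than min_age_s.

    task_dirs: dict mapping directory path -> last-modification time (epoch s)
    db_paths:  iterable of paths already known to the database
    """
    now = time.time() if now is None else now
    known = {os.path.normpath(p) for p in db_paths}
    removable = []
    for path, mtime in sorted(task_dirs.items()):
        if os.path.normpath(path) in known:
            continue  # still referenced by a DB sample: keep it
        if now - mtime < min_age_s:
            continue  # too recent: the crab task may still be alive
        removable.append(path)
    return removable
```

Passing the directory listing and the DB paths as plain arguments keeps the rule itself easy to check without touching the storage or the database.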

Note that it currently lacks the following check listed in #25:

is some 'cp3_llbb' job (do not touch your favorite btag or jetmet or tracker crab3 jobs)

I am not sure there is a clean way to check this...

More eyes on the script are very welcome.

The output looks something like the following (yes, I know, I am deleting these):

##### Get the list of potential DB samples of interest

##### Get the list of user paths in /storage/data/cms/store/user/obondu
# Tasks older than 6 months
# timestamp=  150820122206
# totalSize=  1.5TB
# size= 20.0kB
rm -r /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-350_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-350_narrow_Asympt25ns/150727_164929/0000
# size= 15.0kB
rm -r /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-550_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-550_narrow_Asympt25ns/150727_165456/0000
(...)
# size= 23.0kB
rm -r /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-500_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-500_narrow_Asympt25ns/150727_165420/0000

# Tasks between 3 and 6 months old
# timestamp=  151121122206
# totalSize=  989.7GB
# size= 3.3GB
rm -r /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-300_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-300_narrow_Asympt25ns/150831_172610/0000
# size= 9.6MB
rm -r /storage/data/cms/store/user/obondu/SingleElectron/SingleElectron_Run2015C-PromptReco-v1_2015-09-04/150923_123205/0000
(...)
# size= 3.7GB
rm -r /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-900_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-900_narrow_Asympt25ns/150901_165030/0000

# Tasks between 1 and 3 months old
# timestamp=  160122122206
# totalSize=  27.9GB
# size= 9.1GB
rm -r /storage/data/cms/store/user/obondu/QCD_Pt-300toInf_EMEnriched_TuneCUETP8M1_13TeV_pythia8/QCD_Pt-300toInf_EMEnriched_TuneCUETP8M1_13TeV_pythia8_25ns/160111_152423/0000
# size= 18.8GB
rm -r /storage/data/cms/store/user/obondu/VVTo2L2Nu_13TeV_amcatnloFXFX_madspin_pythia8/VVTo2L2Nu_13TeV_amcatnloFXFX_madspin_pythia8_MiniAODv2/160112_151930/0000
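
The age grouping seen in the output above can be reproduced from the `YYMMDD_HHMMSS` directory that crab puts in each output path. The sketch below is only an illustration: the function names and the 30/90/180-day boundaries are assumptions, and the script in this PR may compute the boundaries differently.

```python
from datetime import datetime, timedelta


def crab_task_time(path):
    """Parse the YYMMDD_HHMMSS component of a crab output path."""
    for part in reversed(path.rstrip("/").split("/")):
        try:
            return datetime.strptime(part, "%y%m%d_%H%M%S")
        except ValueError:
            continue  # not a timestamp component, try the next one up
    raise ValueError("no crab timestamp found in %r" % path)


def age_bucket(path, now):
    """Assign a path to one of the age groups (day counts are assumptions)."""
    age = now - crab_task_time(path)
    if age > timedelta(days=180):
        return "older than 6 months"
    if age > timedelta(days=90):
        return "between 3 and 6 months"
    if age > timedelta(days=30):
        return "between 1 and 3 months"
    return "younger than 1 month"
```

Reading the date from the path avoids relying on file modification times, which can change when files are copied or touched.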

@blinkseb
Member

Thanks for that!

parse the directories in /storage/data/cms/store/user/USERNAME

I'm afraid USERNAME is not always the same between ingrid and crab / storage, so it needs to be configurable.
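
One possible way to make this configurable is a pair of command-line options, sketched below. The option names `--db-user` and `--storage-user` and the `storage_root` attribute are hypothetical and are not taken from the script in this PR.

```python
import argparse

STORAGE_PREFIX = "/storage/data/cms/store/user/"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="List crab output directories that could be removed")
    # Hypothetical option names, for illustration only
    parser.add_argument("--db-user", required=True,
                        help="username to look for in the database")
    parser.add_argument("--storage-user", default=None,
                        help="username on the storage (defaults to --db-user)")
    args = parser.parse_args(argv)
    if args.storage_user is None:
        args.storage_user = args.db_user  # same name on both sides
    args.storage_root = STORAGE_PREFIX + args.storage_user
    return args
```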

is some 'cp3_llbb' job (do not touch your favorite btag or jetmet or tracker crab3 jobs)

This is the hardest part, since there's basically no way to tell whether a task comes from the framework or not. We could think of adding a tag to the output path (like LLBBFramework or whatever) to be able to filter the tasks.

@blinkseb
Member

Actually, we could think of opening one ROOT file and checking whether the output contains a tree named t (then Framework) or Events (then CMSSW). It would slow everything down, but at least it's a way to discriminate between the tasks.
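
A rough sketch of that check, assuming PyROOT is available: the function names are made up for illustration, and the decision rule is kept separate from the file access so it can be tried without ROOT installed.

```python
def classify_by_trees(key_names):
    """Guess the origin of an output file from its top-level object names."""
    names = set(key_names)
    if "t" in names:
        return "framework"  # tree named 't': Framework output
    if "Events" in names:
        return "cmssw"      # tree named 'Events': plain CMSSW output
    return "unknown"


def classify_root_file(path):
    """Open one ROOT file and classify it (requires PyROOT)."""
    import ROOT  # imported lazily so classify_by_trees works without ROOT
    root_file = ROOT.TFile.Open(path)
    if not root_file or root_file.IsZombie():
        return "unknown"  # unreadable file: do not guess
    try:
        return classify_by_trees(
            key.GetName() for key in root_file.GetListOfKeys())
    finally:
        root_file.Close()
```

Checking a single file per task, rather than all of them, would limit the slowdown mentioned above.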

@OlivierBondu
Member Author

Or also, thinking about this: shouldn't there be a mention of the local submission path in some crab XML output?

We could then spot cp3_llbb/SOMEAnalysis?

@blinkseb
Member

I looked into one of the XML files, and there's no mention of the submission path, since it's only the job report (see /storage/data/cms/store/user/sbrochet/TTTo2L2Nu_13TeV-powheg/TTTo2L2Nu_13TeV-powheg_MiniAODv2/160210_181039/0000/log/cmsRun_1.log.tar.gz for example). However, one solution could be to look into the output log, but that will only work if the user chose to save the log files to the storage (which is the case when running GridIn).

@OlivierBondu
Member Author

PR updated:

  • the usernames can now be defined on the command line
  • the script tries to open the log files located in the storage (if any) and looks for keywords like cp3_llbb/Framework, HHAnalysis or hh_analyzer
  • if the crab location cannot be tagged as 'FW', some information is printed but nothing else is done
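
The log check described in the list above could look roughly like this. It is a sketch under assumptions, not the code from this PR: the function name is made up, the logs are assumed to be gzipped tar archives like the cmsRun_1.log.tar.gz mentioned earlier, and only the three keywords quoted above are searched for.

```python
import tarfile

# Keywords quoted in the PR description; matched as raw bytes
FW_KEYWORDS = (b"cp3_llbb/Framework", b"HHAnalysis", b"hh_analyzer")


def log_mentions_framework(tar_path, keywords=FW_KEYWORDS):
    """Return True if any file in the archive contains one of the keywords."""
    with tarfile.open(tar_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            handle = archive.extractfile(member)
            for line in handle:
                if any(keyword in line for keyword in keywords):
                    return True
    return False
```

A task whose logs are missing or contain none of the keywords stays undecided, which matches the "could not be asserted" group in the output below.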

In my case the additional checks seem to succeed:

# The following tasks could not be asserted to be cp3_llbb framework tasks or not... deal with them as you see fit:
# totalSize=  3.5GB
# size= 21.0kB  path= /storage/data/cms/store/user/obondu/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-270_narrow_13TeV-madgraph/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-270_narrow_Asympt25ns/150709_132032/0000
# size= 29.0kB  path= /storage/data/cms/store/user/obondu/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-650_narrow_13TeV-madgraph/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-650_narrow_Asympt25ns/150709_132031/0000
# size= 17.0kB  path= /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-600_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-600_narrow_Asympt25ns/150709_131458/0000
# size= 13.0kB  path= /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-400_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-400_narrow_Asympt25ns/150709_131737/0000
# size= 3.5GB   path= /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-900_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-900_narrow_Asympt25ns/150709_162459/0000
# size= 25.0kB  path= /storage/data/cms/store/user/obondu/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-800_narrow_13TeV-madgraph/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-800_narrow_Asympt25ns/150709_132009/0000
# size= 25.0kB  path= /storage/data/cms/store/user/obondu/GluGluToRadionToHHTo2B2VTo2L2Nu_M-270_narrow_13TeV-madgraph/GluGluToRadionToHHTo2B2VTo2L2Nu_M-270_narrow_Asympt25ns/150709_131753/0000
# size= 12.0kB  path= /storage/data/cms/store/user/obondu/WZZ_TuneCUETP8M1_13TeV-amcatnlo-pythia8/WZZ_TuneCUETP8M1_13TeV-amcatnlo-pythia8_MiniAODv2/160112_151930/0000
# size= 17.0kB  path= /storage/data/cms/store/user/obondu/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-600_narrow_13TeV-madgraph/GluGluToBulkGravitonToHHTo2B2VTo2L2Nu_M-600_narrow_Asympt25ns/150709_132029/0000

These are all old framework tasks that crashed violently a while ago before any framework sequence could be run... So in principle I don't know what these tasks are, at all (though in practice I was running only FW stuff at that time)

blinkseb added a commit that referenced this pull request Mar 4, 2016
Initial commit to produce a list of (framework) files to be deleted from storage
@blinkseb blinkseb merged commit 9b79a67 into cp3-llbb:master Mar 4, 2016
@OlivierBondu OlivierBondu deleted the delete_crab_tasks branch March 4, 2016 14:13
@OlivierBondu OlivierBondu mentioned this pull request Mar 7, 2016