Using time stamp with RunPostCrab? #61

Closed
sdevissc opened this issue Dec 11, 2015 · 6 comments

@sdevissc

Dear all,

Should we consider adding a time stamp (in addition to the tag related to the code version) when running runPostCrab, to distinguish two productions made with the same code version? It seems that this is not possible at the moment (is it?).

EDIT:

Actually, I think there is a bug. If you run twice with the same version of the code, the db shows something strange. Example: a first run on Dec 8th, then a second one today (Dec 11th). As you can see, the first line of the entry is correctly updated, but the path is not: it still points to the directory created on Dec 8th.

Sample #791 (created on 2015-12-11 21:58:43 by simon):
name: DoubleMuon_Run2015D-PromptReco-v4_2015-10-20_v1.1.0+7415-11-g330707b_ZAAnalysis_1694c06
path: /storage/data/cms/store/user/sdevissc/DoubleMuon/DoubleMuon_Run2015D-PromptReco-v4_2015-10-20/151208_213421/0000

@blinkseb
Member

I'm not sure I understand the issue here. What's the point of redoing a production with the same code version? You'll just duplicate what you've already done, no?

Currently, if you run the post-crab script for a sample that is already in the database (meaning the sample name is already in the database), the sample is updated; otherwise, a new one is added. By "updated", I mean that only the variables listed here [1] are updated, and the path is not one of them, since the logic was designed to handle updates of an existing task, not a completely new task with the same name. Moreover, the path is deprecated in favor of the list of files, which is correctly updated.
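A minimal Python sketch of the add-or-update behaviour described above, assuming a simplified in-memory store. `Sample`, `SampleStore`, `add_or_update` and the `files`/`nevents` fields are hypothetical stand-ins, not the actual runPostCrab.py code or database schema. It only illustrates why re-running under an existing name keeps the old path while refreshing the file list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    # Hypothetical, simplified record; the real schema has more (and possibly different) fields.
    id: int
    name: str
    path: str
    files: List[str] = field(default_factory=list)
    nevents: int = 0


class SampleStore:
    """In-memory stand-in for the sample table, keyed by id (names are NOT unique keys)."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Sample] = {}
        self._next_id = 1

    def find_by_name(self, name: str) -> Optional[Sample]:
        # The post-crab step only knows the sample name, so it looks the sample up by name.
        for sample in self._by_id.values():
            if sample.name == name:
                return sample
        return None

    def add_or_update(self, name: str, path: str, files: List[str], nevents: int) -> Sample:
        existing = self.find_by_name(name)
        if existing is None:
            # Unknown name: insert a brand-new sample with a fresh id.
            sample = Sample(self._next_id, name, path, list(files), nevents)
            self._by_id[sample.id] = sample
            self._next_id += 1
            return sample
        # Known name: refresh only a whitelist of fields.
        # `path` is deliberately NOT updated, so it keeps pointing at the first task.
        existing.files = list(files)
        existing.nevents = nevents
        return existing
```

With this logic, a second production under the same name reuses the first sample's id and path but gets the new file list, which is consistent with what the db showed in the report above.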

I'm also a bit surprised to see that the creation timestamp is updated; only the modified timestamp should change. I'll look at it when I have time. NVM, it's normal.

To summarize, if you relaunch a production with the exact same code, you really should delete the previous one from the database to avoid any issues. But indeed, we'll run into issues when launching the same code with a different python configuration (for example, for systematics), so we'll need to make the name unique in that case. Maybe that's your case? Would adding a hash of the python configuration to the sample name be enough to solve the issue?
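One way the hash suggestion could look, as a sketch only: `config_hash` and `unique_sample_name` are hypothetical helpers, and hashing the raw text of the python configuration with SHA-1 (truncated to 7 hex characters) is an assumption, not something the post-crab script currently does.

```python
import hashlib


def config_hash(config_text: str, length: int = 7) -> str:
    """Short, deterministic fingerprint of a python configuration's text (hypothetical helper)."""
    digest = hashlib.sha1(config_text.encode("utf-8")).hexdigest()
    return digest[:length]


def unique_sample_name(base_name: str, config_text: str) -> str:
    """Append the config fingerprint so that the same code with different configs gets distinct names."""
    return f"{base_name}_{config_hash(config_text)}"
```

Since this hashes the raw text, even a whitespace or comment change in the configuration produces a new name; hashing a normalized or dumped form of the configuration would avoid that, at the cost of more code.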

Also, just for information, it's not really an issue for the database to have samples with the same name, as samples are uniquely identified by their ids, and not by their name. It's however an issue for the post-crab script, since you only have access to the sample name. The timestamp is also already included in the database in the created and modified fields, so adding it into the name is not really needed.

[1] https://github.com/cp3-llbb/GridIn/blob/master/scripts/runPostCrab.py#L145-L167

@sdevissc
Author

Well, in my case it's something stupid: I made the changes to use the full lumi, and for some reason these changes were not propagated to the crab config files (which I don't understand, but perhaps it was a mistake on my side). Anyway, I wanted to run a second time with these changes taken into account, and that's when I discovered the "issue" I mentioned.

But ok I understand that I didn't manage my entries in the best way, I'll try to simply delete the wrong entries and recreate them with the correct prod.

Thanks

S.

@sdevissc
Author

Btw, are there any instructions on how to delete entries in the db?

@blinkseb
Member

Ok I see.

Thinking a bit about this, I don't think we even have an easy way to remove samples from the database, other than doing raw SQL queries...
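For illustration, this is roughly what a raw-SQL removal could look like, sketched with Python's built-in `sqlite3` module against a hypothetical two-table schema (`sample`, `file`). The real database engine and its table and column names are assumptions here and will differ. Because sample names are not unique, the sketch looks samples up by name but deletes strictly by id.

```python
import sqlite3
from typing import List

# Hypothetical, minimal schema for illustration only; the real tables/columns differ.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    path      TEXT
);
CREATE TABLE file (
    file_id   INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    lfn       TEXT NOT NULL
);
"""


def sample_ids_for_name(conn: sqlite3.Connection, name: str) -> List[int]:
    """Names are not unique keys, so a name lookup may return several ids."""
    rows = conn.execute(
        "SELECT sample_id FROM sample WHERE name = ? ORDER BY sample_id", (name,)
    ).fetchall()
    return [row[0] for row in rows]


def delete_sample(conn: sqlite3.Connection, sample_id: int) -> int:
    """Delete one sample (by id) and its file rows in a single transaction.

    Returns the number of sample rows removed (0 or 1).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        # Remove dependent file rows first, then the sample itself.
        conn.execute("DELETE FROM file WHERE sample_id = ?", (sample_id,))
        cur = conn.execute("DELETE FROM sample WHERE sample_id = ?", (sample_id,))
    return cur.rowcount
```

Checking the ids returned by the name lookup before deleting lets you pick the stale entry explicitly instead of removing every sample that shares the name.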

@sdevissc
Author

hmm, ok. Then perhaps for the time being I'll just hack the samples.json to indicate the correct addresses of the new prods, that's dirty but for now that is ok.

@blinkseb
Member

We solved the issue "offline" with Simon, but we should really provide a script to delete samples from the database. Closing this issue and opening a new one.
