Meeting August 28th #41

suenjedt · 2014-08-28T08:25:57Z

Metadata ingest will happen manually for the first 14 high-level datasets; for the mid-term future we will enable automated ingestion from a controlled list of sources.
CMS is preparing a "guided tour"/how-to document which will accompany every dataset and analysis. To start with, this document will be the same for all primary data sets (may change later), but it will be different for derived data sets (e.g. the instructions attached to the derived "pattuples" from Ana will point to the code and to the instructions on how to run it). However, the structure will be the same:
selection
validation
how to reuse
limitations

These texts are being prepared by CMS, with the support of Patricia. They should be linked (initially) on the right-hand side of the individual records in a dedicated box. Patricia will investigate whether parts of this information can be referenced in the metadata to enable a tailored, dataset-specific display. This additional documentation will, however, sit on an additional page and should be exportable as a PDF. It should be a record by itself and get a DOI, incl. a citation recommendation (action on Patricia to prepare that).

All of the datasets get a disclaimer that Kati will provide, i.e. concerning quality assurance. Its location on the record page is to be decided, possibly at the bottom of the page.
There will be a set of restricted files with trigger/selection details, not visible to external users (Kati, please correct the details here!).
There must be an export functionality for the file names of the 14 high-level datasets, enabling easy integration into the config files; this needs to include the ROOT file name.
A virtual image will be stored on the platform: it will become a standalone record with a DOI.
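
As a rough illustration of the export item above, the sketch below renders a list of ROOT file names as a `fileNames` fragment of the kind used in CMSSW Python configuration files. The function name `render_file_list` and the file names are hypothetical, and the real export format is still to be agreed; the code only produces text and does not depend on CMSSW.

```python
def render_file_list(file_names, indent="    "):
    """Render ROOT file names as a CMSSW-style fileNames fragment (text only)."""
    if not file_names:
        raise ValueError("at least one file name is required")
    # One quoted file name per line, separated by commas.
    quoted = [f'{indent}"{name}"' for name in file_names]
    body = ",\n".join(quoted)
    return f"fileNames = cms.untracked.vstring(\n{body}\n)"


# Hypothetical file names, for illustration only.
example = render_file_list([
    "root://example.invalid/dataset_a/file_0001.root",
    "root://example.invalid/dataset_a/file_0002.root",
])
print(example)
```

The resulting fragment could be pasted into the source definition of a config file, which is the kind of easy integration the notes ask for.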

Ana's analysis

is derived from two high-level (primary) datasets [the same is the case for Tom's examples]
is available on github:
a) exercise itself https://github.com/ayrodrig/OutreachExercise2010
b) the pattuples production https://github.com/ayrodrig/pattuples2010
Ana's code should become a record by itself, too - also with a DOI [following Zenodo's Github integration]
also these records will have their own "how to" in the box on the right [see 1-4 above]
there should be enough metadata to create such a record: the author lists are the same

Overall tasks and next steps

set up Laura's design
set up html-editing pages for additional info on github
prepare for additional information in separate menu so that we can prepare some nice additional documentation there
prepare the additional boxes on the right of a detailed record page
check export functionalities (see comment on titles above)
meeting beginning of next week for documentation sprint (with Achintya and Patricia)
meeting beginning of next week with Pamfilos for design sprint

UX/UI testing tasks

navigation on the portal
navigation from primary and reduced data
one task: can you reproduce the analysis? [is the user able to find all the related information, data, code, "how-to" for the particular analysis?]

Metadata related tasks

compile metadata for software
compile metadata for virtual image
populate the records for the 14 primary datasets
integrate Ana's analysis

katilp · 2014-08-28T08:41:26Z

Just a clarification for the second point: for the guided tour, to start with, this document will be the same for all primary data sets (may change later).
But it will be different for derived data sets (e.g. the instructions attached to the derived "pattuples" from Ana will point to the code and to the instructions on how to run it).

suenjedt · 2014-08-28T09:03:34Z

Thanks Kati, changed that :)

RaoOfPhysics · 2014-08-28T09:36:26Z

@pherterich, @suenjedt, @katilp: Please confirm a suitable time for the documentation sprint next week. Options for me: http://doodle.com/e5sgctgu5mxunuki#calendar

katilp · 2014-08-28T09:38:31Z

Further elaboration of the four information areas which should accompany each element on the portal:

where did this come from
how was it validated
how to use it
limitations

For primary data sets these would contain

trigger selections
a general statement on the data validation (eventually, in the future, validation plots, which will be needed if the data or the software need to be migrated)
a guided tour doc with explanations of the data content and of how to do an analysis
whatever needed....

For derived data sets

code that was used to produce them starting from the primary data sets
eventually an expected result to which to compare after step 3)
pointer to application (event display, histogramming, analysis example code) and the instructions
whatever needed (i.e. physics object selections may not be the official recommendations of CMS etc...)

For the VM image

some explanation of how the image was built (i.e. link to CernVM...)
Anssi's report
prerequisite text from https://twiki.cern.ch/twiki/bin/view/CMS/DPOAVMUserInstructions#Prerequisites
unsolved problems found by Anssi if any

For the CMSSW code example (i.e. those to produce the event display files, Ana's two levels)

statement that this code runs in CMSSW version N
eventually, a reference plot or a result or expected output from step 3)
instructions on how to run
whatever needed...

For the applications (i.e. histogramming, event display, else)

what are the underlying packages, tools
a reference plot/figure obtained after running step 3)
pointer to the source code and instructions on how to run (needed for "external" developers)
whatever needed
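
To make the four information areas concrete, here is a minimal sketch of how they could be held as one structure per portal element. The class name `HowToBox` and its field names are assumptions made for illustration, not an agreed metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class HowToBox:
    """The four information areas that accompany an element on the portal."""
    origin: list = field(default_factory=list)       # where did this come from
    validation: list = field(default_factory=list)   # how was it validated
    usage: list = field(default_factory=list)        # how to use it
    limitations: list = field(default_factory=list)  # limitations

    def missing_areas(self):
        """Return the names of the areas that have no entries yet."""
        return [name for name, items in vars(self).items() if not items]


# Example for a primary data set, using entries from the list above.
primary = HowToBox(
    origin=["trigger selections"],
    validation=["general statement on the data validation"],
    usage=["guided tour doc"],
)
print(primary.missing_areas())  # ['limitations']
```

A check like `missing_areas()` could help spot records whose box is still incomplete before they are published.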

RaoOfPhysics · 2014-08-28T10:22:28Z

@suenjedt, @katilp, @pherterich: We meet

on 2 September (Tuesday) at 15:00 in R1

(Unless you prefer a proper meeting room?)

TimSmithCH · 2014-08-28T11:52:26Z

In addition to the disclaimer, all data records should clearly display the copyright statement and the licence for reuse.

suenjedt · 2014-08-28T11:57:22Z

Indeed. The official label for CCZero, which is the one being used here (so far), is available at http://creativecommons.org/about/downloads

katilp · 2014-08-29T09:50:38Z

Do we already have an area for editing the Additional information text in github?

suenjedt · 2014-08-29T15:57:53Z

@tiborsimko : you mentioned this easy editing functionality for html stuff here we could use for the information texts. Could you point me/us to it so we can get started? Thanks!

tiborsimko · 2014-09-01T13:46:56Z

@suenjedt @katilp Thanks for the meeting write-up and further elaboration. It would be useful to turn these notes into a series of independent issues/tasks, so that:

we could assign issues to different persons depending on who will deal with the issue at hand;
we could plan issues to different milestones to monitor time-based progress.

Do you think you could split these into independent issues according to the topic?

As an example, I started independent tasks for VM images, see #47 and #48.

tiborsimko · 2014-09-01T14:00:44Z

you mentioned this easy editing functionality for html stuff here we could use for the information texts. Could you point me/us to it so we can get started? Thanks!

Here are quick instructions:

Say you'd like to edit "Visualise Events" page that is here:

http://open-data-demo.cern.ch/visualise/events

You'd locate this page in the source code under the base/templates directory, either by browsing that directory directly or by searching for strings that occur on the web page, which will bring you here:

https://github.com/tiborsimko/open-data.cern.ch/blob/pu/invenio_opendata/base/templates/visualise_events.html

Now you click on the Edit icon on the right-hand side, which will open a basic file editor on GitHub. The editor will let you edit the page source (in HTML), e.g. to copy/paste HTML text into the template.

Note that the GitHub editor will help you to edit the HTML, e.g. opening/closure of elements like <ul>...</ul>, but the "preview" button will not show you the page in action in any good format; for this one has to preview the page via the Invenio application. (*)
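
Since the GitHub preview does not render the template, a small local check for unbalanced tags may catch mistakes before saving. The sketch below uses Python's standard `html.parser`; the names `TagBalanceChecker` and `check_html` are made up for this example, and the check is deliberately strict (it also flags tags that HTML allows you to leave unclosed, such as `<li>`).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_html(text):
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.stack]


print(check_html("<ul><li>Visualise Events</li>"))  # ['unclosed <ul>']
```

Template directives in the page are treated as plain text by the parser, so only the HTML tags themselves are checked.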

You save your edits and issue a pull request that we'd check, review, and deploy. (Note that issuing a pull request assumes that you first forked this repository in your own space; just use "Fork" button in the top right.)

See also various GitHub guides like:

(*) Otherwise it may be easier to edit page body in some easy-to-use markup format, such as reStructuredText, which would contain a simple preview. However for this we'd have to change the layout of the templates in the repository. Perhaps you can give current HTML-only version a try and see if it is OK with you?

katilp · 2014-09-01T14:09:40Z

This requires that the page exists: we would need the following areas then, for the sake of clarity
I will make a separate issue with the list of pages that we think we need urgently

tiborsimko · 2014-09-01T14:16:26Z

This requires that the page exists: we would need the following areas then, for the sake of clarity
I will make a separate issue with the list of pages that we think we need urgently

Yes, thanks. In order for them to appear on the site, we'd need to create corresponding templates and add some "glue" to the system. Basically the pages will all appear flattened here:

https://github.com/tiborsimko/open-data.cern.ch/tree/pu/invenio_opendata/base/templates

suenjedt · 2014-09-01T15:25:07Z

OK - Will do tomorrow hopefully. Sorry for our absence today - Proposal submission tomorrow.


tiborsimko · 2014-09-12T11:42:45Z

Closing this "meta-topical issue" that had been further individualised into separate topical issues (which were either done or for which we are tracking progress elsewhere).

This was referenced Sep 1, 2014

Area for information material with easy editing #49

Closed

CMS: customise "CMS VM Images" collection page and record page #48

Closed

katilp mentioned this issue Sep 1, 2014

List of pages needed for additional material (continued from #41) #50

Closed

tiborsimko closed this as completed Sep 12, 2014

tiborsimko mentioned this issue Sep 15, 2014

Typos in the data selection fields #195

Closed
