Meeting August 28th #41

Closed
suenjedt opened this issue Aug 28, 2014 · 15 comments

@suenjedt
Member

  • metadata ingest will happen manually for the first 14 high-level datasets; for the mid-term future we will enable automated ingestion from a controlled list of sources
  • CMS is preparing a "guided tour"/how-to document which will accompany every dataset and analysis. To start with, this document will be the same for all primary data sets (may change later), but it will be different for derived data sets (e.g. the instructions connected to the derived "pattuples" from Ana will point to the code and to the instructions on how to run it). However, the structure will be the same:
  • selection
  • validation
  • how to reuse
  • limitations

These texts are being prepared by CMS, with the support of Patricia. They should be linked (initially) in a dedicated box on the right-hand side of the individual records. Patricia will investigate whether parts of this information can be referenced in the metadata to enable a tailored, dataset-specific display. This additional documentation will, however, sit on a separate page and should be exportable as a PDF. It should be a record by itself and get a DOI, incl. a citation recommendation (action on Patricia to prepare that).

  • all of the datasets get a disclaimer that Kati will provide, i.e. concerning quality assurance. Location on the record page to be decided, possibly at the bottom of the page
  • There will be a set of restricted files with trigger/selection details, not visible to external users (Kati, please correct the details here!)
  • there must be an export functionality for the file names of the 14 high-level datasets, enabling easy integration into the config files - this needs to include the root file name
  • A virtual image will be stored on the platform: it will become a standalone record with its own DOI
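
As a rough illustration of the file-name export mentioned above, here is a minimal sketch. It assumes the target config files are CMSSW Python configurations reading from a `PoolSource`; the function name and the example file names are hypothetical and are not taken from the portal.

```python
def cmssw_source_fragment(file_names):
    """Render a list of file names as plain text for a CMSSW-style source block.

    Assumption: the user pastes this text into a CMSSW Python config file.
    """
    quoted = ",\n".join(f"        '{name}'" for name in file_names)
    return (
        "process.source = cms.Source(\n"
        "    'PoolSource',\n"
        "    fileNames=cms.untracked.vstring(\n"
        f"{quoted}\n"
        "    )\n"
        ")\n"
    )


# Hypothetical file names, for illustration only.
example_files = [
    "root://example.org//store/data/primary_dataset_A/file_0001.root",
    "root://example.org//store/data/primary_dataset_A/file_0002.root",
]

print(cmssw_source_fragment(example_files))
```

The export only produces text, so it does not need CMSSW to be installed on the portal side.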

Ana's analysis

  • is derived from two high-level (primary) datasets [the same is the case for Tom's examples]
  • is available on github:
    a) exercise itself https://github.com/ayrodrig/OutreachExercise2010
    b) the pattuples production https://github.com/ayrodrig/pattuples2010
  • Ana's code should become a record by itself, too - also with a DOI [following Zenodo's Github integration]
  • also these records will have their own "how to" in the box on the right [see 1-4 above]
  • there should be enough metadata to create such a record: the author lists are the same

Overall tasks and next steps

  • set up Laura's design
  • set up of html-editing pages for additional info on github
  • prepare for additional information in separate menu so that we can prepare some nice additional documentation there
  • prepare the additional boxes on the right of a detailed record page
  • check export functionalities (see comment on titles above)
  • meeting beginning of next week for documentation sprint (with Achintya and Patricia)
  • meeting beginning of next week with Pamfilos for design sprint

UX/UI testing tasks

  • navigation on the portal
  • navigation from primary and reduced data
  • one task: can you reproduce the analysis? [is the user able to find all the related information, data, code, "how-to" for the particular analysis?]

Metadata related tasks

  • compile metadata for software
  • compile metadata for virtual image
  • populate the records for the 14 primary datasets
  • integrate Ana's analysis
@katilp
Member

katilp commented Aug 28, 2014

Just a clarification for the second point: for the guided tour, to start with, this document will be the same for all primary data sets (may change later).
But it will be different for derived data sets (e.g. the instructions connected to the derived "pattuples" from Ana will point to the code and to the instructions on how to run it)

@suenjedt
Member Author

Thanks Kati, changed that :)

@RaoOfPhysics
Member

@pherterich, @suenjedt, @katilp: Please confirm a suitable time for the documentation sprint next week. Options for me: http://doodle.com/e5sgctgu5mxunuki#calendar

@katilp
Member

katilp commented Aug 28, 2014

Further elaboration of the four information areas which should accompany each element on the portal:

  1. where did this come from
  2. how was it validated
  3. how to use it
  4. limitations

For primary data sets these would contain

  1. trigger selections
  2. general statement on the data validation (eventually, in the future, validation plots which will be needed if the data or the software need to be migrated)
  3. guided tour doc with explanations of the data content and of how to do an analysis
  4. whatever needed....

For derived data sets

  1. code that was used to produce them starting from the primary data sets
  2. eventually an expected result to which to compare after step 3)
  3. pointer to application (event display, histogramming, analysis example code) and the instructions
  4. whatever needed (i.e. physics object selections may not be the official recommendations of CMS etc...)

For the VM image

  1. some explanation of how the image was built (i.e. link to CernVM...)
  2. Anssi's report
  3. prerequisite text from https://twiki.cern.ch/twiki/bin/view/CMS/DPOAVMUserInstructions#Prerequisites
  4. unsolved problems found by Anssi if any

For the CMSSW code example (i.e. those to produce the event display files, Ana's two levels)

  1. statement that this code runs in CMSSW version N
  2. eventually, a reference plot or a result or expected output from step 3)
  3. instructions on how to run
  4. whatever needed...

For the applications (i.e. histogramming, event display, else)

  1. what are the underlying packages, tools
  2. a reference plot/figure obtained after running step 3)
  3. pointer to the source code and instructions on how to run (needed for "external" developers)
  4. whatever needed
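
To make the common four-area structure concrete, here is a minimal sketch of how the additional information for one portal element could be held and checked for completeness. The class and field names are hypothetical and only mirror the four areas listed above; they are not an agreed metadata schema.

```python
from dataclasses import dataclass


@dataclass
class AdditionalInfo:
    """The four information areas that accompany each element on the portal."""

    origin: str        # 1. where did this come from
    validation: str    # 2. how was it validated
    usage: str         # 3. how to use it
    limitations: str   # 4. limitations

    def missing_areas(self):
        """Return the names of the areas that still have no text."""
        return [name for name, text in vars(self).items() if not text.strip()]


# Example for a primary data set, using the wording from the list above.
primary = AdditionalInfo(
    origin="trigger selections",
    validation="general statement on the data validation",
    usage="guided tour doc",
    limitations="",
)

print(primary.missing_areas())  # ['limitations']
```

A check like this could flag records whose boxes are still empty before they go online.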

@RaoOfPhysics
Member

@suenjedt, @katilp, @pherterich: We meet

on 2 September (Tuesday) at 15:00 in R1

(Unless you prefer a proper meeting room?)

@TimSmithCH

In addition to the disclaimer, all data records should clearly display the copyright statement and the licence for reuse

@suenjedt
Member Author

Indeed. The official label for CCZero, which is the licence being used here (so far), is available here: http://creativecommons.org/about/downloads

@katilp
Member

katilp commented Aug 29, 2014

Do we already have an area for editing the Additional information text in github?

@suenjedt
Member Author

@tiborsimko: you mentioned this easy editing functionality for HTML stuff here that we could use for the information texts. Could you point me/us to it so we can get started? Thanks!

@tiborsimko
Member

@suenjedt @katilp Thanks for the meeting write-up and further elaboration. It would be useful to turn these notes into a series of independent issues/tasks, so that:

  • we could assign issues to different persons depending on who will deal with the issue at hand;
  • we could plan issues to different milestones to monitor time-based progress.

Do you think you could split these into independent issues according to the topic?

As an example, I started independent tasks for VM images, see #47 and #48.

@tiborsimko
Member

you mentioned this easy editing functionality for html stuff here we could use for the information texts. Could you point me/us to it so we can get started? Thanks!

Here are quick instructions:

Say you'd like to edit "Visualise Events" page that is here:

You'd locate this page in the source code under the base/templates directory, either by browsing that place directly, or by searching for strings that occur on the web page, which will bring you here:

Now you click on the Edit icon on the right-hand side, which will open a basic file editor on GitHub. The editor will permit you to edit the page source (in HTML), say to copy/paste HTML text into the template.

Note that the GitHub editor will help you to edit the HTML, e.g. opening/closure of elements like <ul>...</ul>, but the "preview" button will not show you the page in action in any good format; for this one has to preview the page via the Invenio application. (*)

You save your edits and issue a pull request that we'd check, review, and deploy. (Note that issuing a pull request assumes that you first forked this repository in your own space; just use "Fork" button in the top right.)

See also various GitHub guides like:

(*) Otherwise it may be easier to edit page body in some easy-to-use markup format, such as reStructuredText, which would contain a simple preview. However for this we'd have to change the layout of the templates in the repository. Perhaps you can give current HTML-only version a try and see if it is OK with you?
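
For those who work on a local clone of their fork rather than in the browser, the "searching for strings" step above can be sketched as follows. This is only an illustration: it assumes the templates are `.html` files under base/templates, and the function name is hypothetical.

```python
from pathlib import Path


def find_templates_containing(root, text):
    """Return the template files under `root` whose content includes `text`."""
    root = Path(root)
    if not root.is_dir():
        return []
    matches = []
    # Assumption: templates use the .html extension.
    for path in sorted(root.rglob("*.html")):
        if text in path.read_text(encoding="utf-8", errors="replace"):
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Run from the top of a local clone of the repository.
    for path in find_templates_containing("base/templates", "Visualise Events"):
        print(path)
```

The file found this way is the one to edit and to submit in the pull request.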

@katilp
Member

katilp commented Sep 1, 2014

This requires that the page exists: we would need the following areas then, for the sake of clarity.
I will make a separate issue with the list of pages that we think we need urgently

@tiborsimko
Member

This requires that the page exists: we would need the following areas then, for the sake of clarity.
I will make a separate issue with the list of pages that we think we need urgently

Yes, thanks. In order for them to appear on the site, we'd need to create corresponding templates and add some "glue" to the system. Basically the pages will all appear flattened here:

@suenjedt
Member Author

suenjedt commented Sep 1, 2014

OK - Will do tomorrow hopefully. Sorry for our absence today - Proposal submission tomorrow.



@tiborsimko
Member

Closing this "meta-topical issue" that had been further individualised into separate topical issues (which were either done or for which we are tracking progress elsewhere).
