Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc issue 78 #94

Merged
merged 14 commits into from
Aug 1, 2019
Merged

Doc issue 78 #94

merged 14 commits into from
Aug 1, 2019

Conversation

jklynch
Copy link
Contributor

@jklynch jklynch commented Jun 29, 2019

Move event-model documentation from bluesky to event-model.

Description

Merge https://blueskyproject.io/bluesky/documents.html into https://blueskyproject.io/event-model/data-model.html.

Move https://nsls-ii.github.io/bluesky/event_descriptors.html into the event-model project.

Motivation and Context

This PR addresses issue #78.

How Has This Been Tested?

@jklynch jklynch self-assigned this Jun 29, 2019
@jklynch
Copy link
Contributor Author

jklynch commented Jul 1, 2019

I left some text between "begin remove" ... "end remove" markers.

Data Model
**********
********************************
Data Model: Documents and Events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of Events as one of the kinds of Documents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have had some confusion with the event/Event terminology. In my mind there are event-based execution model "events" and then there are "Event Documents". I try not to think of the documents as "events" but as "Events". I think you are right, this heading should go back to just "Data Model".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, let's be clearer. Instead of starting with the next paragraph, start with something more direct.

In data acquisition, an *event* is a set of readings coordinated at a specific time.  A primary design goal of bluesky is to enable better research by recording rich metadata alongside the event data for use in later analysis.  Bluesky encapsulates all information from each event in a *Document*.  Other types of Document describe the events associated with an experimental run.


A *document* is our term for a Python dictionary with a schema --- that is,
organized in a
`formally specified <https://github.com/NSLS-II/event-model>`_ way --- created
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link is a cross-reference from bluesky to event-model. Now that we are in event-model, I think we can just remove this link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AND, leave the title of this section as "Data Model: Documents and Events" since it creates the necessary association with the data acquisition term event and the bluesky document type event.

Minimal nontrivial valid example:

.. code-block:: python

# 'stop' document
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should add a comment like this to all the examples? Most of them do not have one. I would rather apply it to none or all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them to all the examples.

Data Model: Documents and Events
********************************

*begin remove*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like this original first paragraph, doesn't seem exactly redundant to what follows. Maybe combine it with the next paragraph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time trying to combine the original first paragraph and the text from documents.rst. My problem is that the original is talking about little-e "events" and it seems confusing to talk about big-e "Events" at the same time.

I will try again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a new proposal: I added the original paragraph to the second new paragraph. I still think using the term "event-based" is confusing so I propose using "document-based" instead. I felt like the original text fits here since the following paragraph talks about subscribing to a stream of documents.

A primary design goal of bluesky is to enable better research by recording
rich metadata alongside measured data for use in later analysis. Documents are
how we do this. 

A *document* is our term for a Python dictionary with a schema created
by the RunEngine during plan execution.  All of the metadata and data generated
by executing the plan is organized into documents. Bluesky's document-based
data model supports complex, asynchronous data collection and enables
sophisticated live, prompt, streaming, and *post-facto* data analysis.

A :doc:"later section <callbacks>" describes how outside functions can
"subscribe" to a stream of these documents, visualizing, processing, or saving
them. This section provides an outline of documents themselves, aiming to give
a sense of the structure and familiarity with useful components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working on the third paragraph below....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I agree this is tricky. This line once made sense

created by the RunEngine during plan execution

but is now too limiting: we can "ingest" data files from non-bluesky beamlines or simulations to create documents now. I wouldn't want to give the wrong impression that the RunEngine is the only way to mint documents.

by the RunEngine during plan execution. All of the metadata and data generated
by executing the plan is organized into documents.

A :doc:"later section <callbacks>" describes how outside functions can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the bluesky internal references of the kind I think we should just remove. Maybe this whole paragraph can go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the mention of subscribing to a document stream. It convinces me that this is very useful information. I propose to link to the Bluesky documentation:

The `bluesky documentation <https://blueskyproject.io/bluesky/>`_ describes
how outside functions can "subscribe" to a stream of these documents,
visualizing, processing, or saving them. This section provides an outline of
documents themselves, aiming to give a sense of the structure and familiarity
with useful components.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense.

Event Descriptors
=================

In the section on :doc:`documents`, we gave an overview of the four kinds of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In the section on :doc:`documents`, we gave an overview of the four kinds of
In the section on :doc:`data-model`, we gave an overview of the types of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

=================

In the section on :doc:`documents`, we gave an overview of the four kinds of
document. We presented an example Run Start, Event, and Run Stop, but we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second sentence is no longer accurate. We do present an Event Descriptor example. We just down break it down and discuss it in detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check


.. code-block:: python

# 'event' document (same as above, shown again for reference)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"above" is out of date --- this is own its own page

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

Overview
========

The data model is composed of six types of Documents, which in Python are
represented as dictionaries but could be represented as nested mappings (e.g.
JSON) in any language. Each document class has a defined, but flexible, schema.

*begin remove*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised you marked this for removal. I figure it's good to have a very short summary of the meaning of each document somewhere. As someone coming to this fresh, do you think this is so succinct that it doesn't add much value, that we might as well go straight to the more detailed description?

I would be happy with any of the following:

  • Drop bluesky's version of this overview and keep event-model's.
  • Merge some of event-model's verbiage (e.g. "the who / why / what") into bluesky's.

It depends on how much detail we want to present this early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the similar summary of each document from documents.rst because there was a lot of detail, but I see the value of a shorter summary for each document. I will drop the bluesky version.


generates a typical 'run start' document like this:

.. code-block:: python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is older and less accurate that the version in event-model. We should keep the version in event-model. We could however reincorporate the example bluesky code if that seems helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

@jklynch
Copy link
Contributor Author

jklynch commented Jul 2, 2019

In the "Overview" section at the top of data-model.rst there is no summary for resource or datum documents. @danielballan if you would like to write those I will put them in.

@danielballan
Copy link
Member

They are there; they are just below the diagram since they aren't necessary for understanding the diagram.

image

Would it better to move them up?

@jklynch
Copy link
Contributor Author

jklynch commented Jul 9, 2019

The first sentence in the Overview section is "The data model is composed of six types of Documents...." so I feel let down because the list has only four. I would like to move the Resource and Datum documents up.

@danielballan
Copy link
Member

Sounds good.

@jklynch jklynch marked this pull request as ready for review July 23, 2019 20:44
@tacaswell
Copy link
Contributor

"Reference" is not much better than "Pointer" in terms of clarity. It's not really a uri because it is structured rather than a string.

It is "A bundle of data you can trade with another system to get the data you really want" (which is how I understand more-than-one-reference-deep pointers in c++ 😜 ).

@danielballan
Copy link
Member

I'm with @prjemian that "pointer" has the potential to confuse sophisticated programmers who, in first encountering our work, provisionally map "pointer" to "memory reference" and need more context to be sure that's not what we mean. I have some empirical evidence for this: I avoided using that term at first, then tried using it for awhile, and found that it confused people. Maybe "record that points to" in place of "pointer to"?

@jklynch
Copy link
Contributor Author

jklynch commented Jul 29, 2019

I suggest opening a new issue on the topic of "pointer" vs "reference" vs something else since this PR is focused on moving some specific documentation from the bluesky repo to the event-model repo.

Copy link
Member

@danielballan danielballan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this is getting close. Just a couple picky comments to make sure all our statements are up to date and not misleading in any way.

docs/source/data-model.rst Outdated Show resolved Hide resolved
docs/source/data-model.rst Outdated Show resolved Hide resolved
rich metadata alongside measured data for use in later analysis. Documents are
how we do this.

A *document* is our term for a Python dictionary with a schema created
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This language should be updated since the RunEngine is now not the only way to create documents. Something like:

A document is our term for a Python dictionary with a schema. The bluesky RunEngine emits documents during plan execution.

We might want to elaborate:

There are tools for composing documents directly, which can be useful for ingesting data that was not acquired with bluesky into this data model.

But that might be too much detail for this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am inclined to leave out the elaboration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that is too much for this context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I lean that way as well, though I included it for consideration.

Copy link
Contributor

@prjemian prjemian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the chance to review. Need another view before I can approve.

docs/source/data-model.rst Outdated Show resolved Hide resolved
docs/source/data-model.rst Outdated Show resolved Hide resolved
docs/source/data-model.rst Show resolved Hide resolved
docs/source/data-model.rst Show resolved Hide resolved
docs/source/data-model.rst Show resolved Hide resolved
docs/source/data-model.rst Show resolved Hide resolved
event_model/schemas/event_descriptor.json Outdated Show resolved Hide resolved
Copy link
Contributor

@prjemian prjemian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the one additional change (Globally unique ID of this run's 'start' document.), then I'm good.

rich metadata alongside measured data for use in later analysis. Documents are
how we do this.

A *document* is our term for a Python dictionary with a schema created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that is too much for this context.

docs/source/data-model.rst Outdated Show resolved Hide resolved
docs/source/data-model.rst Show resolved Hide resolved
event_model/schemas/event_descriptor.json Outdated Show resolved Hide resolved
@jklynch jklynch requested review from prjemian and removed request for prjemian August 1, 2019 12:50
Copy link
Contributor

@prjemian prjemian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's much better!

@jklynch
Copy link
Contributor Author

jklynch commented Aug 1, 2019

Sorry @prjemian I don't know why that change wasn't showing up.

Copy link
Member

@danielballan danielballan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for driving this, @jklynch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants