Data session and group fields in start doc #196

Merged
merged 5 commits into from
Dec 15, 2020


dylanmcreynolds
Contributor

Adds auth_session field to start document schema

Description

The optional auth_session field is intended to be a place to store filtering information that can be used by downstream systems when making decisions about authorization / privileges regarding the run.

Motivation and Context

While this could be done by individual beamlines, it is desirable to make this part of the standard event model. The next step after this will be to go to the suitcase repos and index this field so that downstream processes can query it efficiently.

I made this a list of strings because the same run might have multiple known contexts. I had in mind "beamline A" and "proposal B" as distinct auth filtering contexts that could benefit.

How Has This Been Tested?

New module test_auth.py includes tests for valid and invalid auth_sessions.

@stuartcampbell
Member

@dylanmcreynolds I thought we had decided to call this "Data Session"

@dylanmcreynolds
Contributor Author

@stuartcampbell hmmm, I didn't take notes and it came up as auth_session in subsequent conversations. I'm game to make it whatever, though this is the right time, since I have it in four repos right now (two in splash, this one, and suitcase-mongo). You want I should change it?

@danielballan
Member

For what it’s worth “data session” is what I remember us converging on too. There was something about maybe wanting “auth” for other uses, and this being particularly about who is authorized to see this data? But I have no strong views and will happily click the green button once a choice is made.

@dylanmcreynolds
Contributor Author

OK, I'll make this change tomorrow: data_session

@tacaswell
Contributor

Is the order in the list meant to be meaningful? It may be better to spell this as a dictionary that is {'naming_authority': "token"} so we do not have to search through a list going "does this look like something I would have handed out to identify a data session?"
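
To make the trade-off concrete, here is a minimal sketch of the two lookups. The helper names, the naming-authority keys, and the token values are hypothetical and only for illustration; nothing here is part of event-model.

```python
# Hypothetical helpers contrasting the two spellings of the field.

def token_from_dict(data_session, naming_authority):
    """Dictionary spelling: a consumer asks for its own authority's token."""
    return data_session.get(naming_authority)


def tokens_from_list(data_session, looks_like_mine):
    """List spelling: a consumer must recognize its tokens by inspection."""
    return [token for token in data_session if looks_like_mine(token)]


# Illustrative values only.
as_dict = {"als": "proposal1234", "nsls2": "pass-56789"}
as_list = ["als", "bl832", "proposal1234"]

from_dict = token_from_dict(as_dict, "als")
from_list = tokens_from_list(as_list, lambda t: t.startswith("proposal"))
```

With the dictionary, the lookup is a single keyed access; with the list, each consumer needs its own rule for deciding which entries it handed out.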

@dylanmcreynolds
Contributor Author

No, I was not envisioning order being significant. Yes, if that's important, we'd have to go to a dictionary.

I was going for simplicity and easy indexing in Mongo. I know that there are other serializations possible, but the use case I am concerned about is something along the lines of "search for runs that a currently logged-in user would be able to view." An example might look like:
data_session: ["als", "bl832", "proposal1234"]

A query based on user access would return the run if the user had access to any of those data sessions: for example, someone with privileges to see all data at ALS, someone with privileges to see all data from that beamline, etc.
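
A rough sketch of that "any match" rule, assuming data_session is a list of strings as in the example above. The function names and sample runs are hypothetical. The filter dict uses MongoDB's `$in` operator, which matches an array field when at least one element equals one of the listed values; the pure-Python check mirrors that behavior so the sketch runs without a database.

```python
# Hypothetical helper names; not part of event-model or suitcase-mongo.

def can_view(start_doc, user_sessions):
    """Return True if the user holds at least one of the run's data sessions."""
    run_sessions = start_doc.get("data_session", [])
    return not set(run_sessions).isdisjoint(user_sessions)


def mongo_filter(user_sessions):
    """Build a MongoDB filter document for the same rule."""
    return {"data_session": {"$in": sorted(user_sessions)}}


# Illustrative start documents (only the relevant keys are shown).
runs = [
    {"uid": "run-1", "data_session": ["als", "bl832", "proposal1234"]},
    {"uid": "run-2", "data_session": ["als", "bl733", "proposal9999"]},
    {"uid": "run-3"},  # no data_session, so this filter never returns it
]

beamline_user = {"bl832"}
facility_user = {"als"}

visible_to_beamline = [r["uid"] for r in runs if can_view(r, beamline_user)]
visible_to_facility = [r["uid"] for r in runs if can_view(r, facility_user)]
```

Treating a run with no data_session as invisible is an assumption of this sketch, not something decided in this PR.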

If we go with a dictionary, would it look like this?

data_session: {
   "als": "foo",
   "bl832": "bar",
   "proposal": "foo2"
}

Is it possible to index Mongo on keys in a dict?

@stuartcampbell
Member

The trouble is then, what is the key in the dictionary for the data session itself?

If you want to specify a 'group' of people that are not associated with a facility, beamline, proposal or (e)SAF what would go in this list or dictionary?

@dylanmcreynolds
Contributor Author

@stuartcampbell In this PR, I am not proposing enforcing any governance on what can go into the list or dict. If a beamline wanted to give access to their cats, it would be on them to decide that "catz" is a valid group to add. Or am I seriously misunderstanding your question?

@danielballan
Member

I posted a summary of a conversation with @stuartcampbell and @cryos before I learned that, after that conversation, @dylanmcreynolds and @stuartcampbell had come to an understanding. I have deleted my comment to avoid muddying the waters. I don't have strongly held views or deep experience in this area. I'm just aiming to un-block progress. 👍

@stuartcampbell
Member

So, following up: @dylanmcreynolds and I had a chat this afternoon and cleared up some confusion on my part. I had thought there was implied metadata storage in the items in the list, but now that I know it's just a list of 'groups' that are allowed access to this data, I am happy. I have no strong preference for a string or a list, so if @dylanmcreynolds has use cases for a list, I am happy with that.

@dylanmcreynolds
Contributor Author

I committed the change to "data_session" from "auth_session". As far as I know, this PR is good to go.

@danielballan
Member

danielballan commented Dec 14, 2020

Forgive me for one last round of questions here. Changes to this schema are hard to back out of and I want to make sure we get this right on the first try.

When we chose the name "data session" I think we had in mind a unique ID (something like a "visit" identifier, not a globally unique ID) as the value. The unique ID would have meaning to some external system that would map it to proposals / access groups / users.

This proposal writes the groups directly into the document, effectively removing a layer. Both approaches seem valid to me, and it may make sense to choose one or even both depending on the use case. Storing a list of groups is more direct and simpler in the case where the documents remain under the management of a system that understands those groups. Storing a unique ID better enables the use case where documents may be accessed or moved between uncoordinated systems. For example, proposal12345 may mean something different at NSLS-II than at ALS.
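
A minimal sketch of the unique-ID approach, assuming data_session holds a single ID string and each facility keeps its own mapping from session ID to groups. The registries, group names, and function names are hypothetical; in practice that mapping would live in whatever external system a facility uses.

```python
# Hypothetical per-facility registries: session ID -> groups with access.
NSLS2_SESSIONS = {"proposal12345": {"nsls2-staff", "team-a"}}
ALS_SESSIONS = {"proposal12345": {"als-staff", "bl832-users"}}


def groups_for(session_id, registry):
    """Resolve a session ID to access groups using one facility's registry."""
    return registry.get(session_id, set())


def can_access(start_doc, user_groups, registry):
    """Check access through the extra layer of indirection.

    Assumption of this sketch: a run with no data_session is not accessible.
    """
    session_id = start_doc.get("data_session")
    if session_id is None:
        return False
    return not groups_for(session_id, registry).isdisjoint(user_groups)


start_doc = {"uid": "run-1", "data_session": "proposal12345"}

at_nsls2 = can_access(start_doc, {"team-a"}, NSLS2_SESSIONS)
at_als = can_access(start_doc, {"team-a"}, ALS_SESSIONS)
```

The same ID resolves to different groups depending on which registry interprets it, which is why the document itself stays portable between systems.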

I propose that we support both in the document model: an optional unique ID for the session and an optional hard-coded list of groups that can access it. We could worry about the unique ID field in a future PR, but I bring it up now because I think the name "data session" fits better for that, i.e. for our original concept of data_session: unique ID. This PR has evolved into something that I think we might better call "data groups" or "access groups".


Edited: We had in mind a unique ID, not a UUID.

@dylanmcreynolds
Contributor Author

As discussed with @danielballan, I separated data_session (a string) and data_groups (list of strings).
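
For illustration, a start document carrying both fields might look like the sketch below. The values and the checker function are hypothetical, and the checker is a plain-Python stand-in, not the schema validation that event-model actually performs. It assumes both fields stay optional, as proposed earlier in this thread.

```python
def check_session_fields(start_doc):
    """Return a list of problems; an empty list means both fields look valid."""
    problems = []

    session = start_doc.get("data_session")
    if session is not None and not isinstance(session, str):
        problems.append("data_session must be a string")

    groups = start_doc.get("data_groups")
    if groups is not None:
        is_list_of_str = isinstance(groups, list) and all(
            isinstance(group, str) for group in groups
        )
        if not is_list_of_str:
            problems.append("data_groups must be a list of strings")

    return problems


# Illustrative values only (other start-document keys omitted).
good = {
    "uid": "run-1",
    "data_session": "visit-0042",
    "data_groups": ["als", "bl832", "proposal1234"],
}
bad = {"uid": "run-2", "data_session": ["visit-0042"], "data_groups": "als"}

good_problems = check_session_fields(good)
bad_problems = check_session_fields(bad)
```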

@danielballan
Member

For the record: this is exactly in line with the discussion that took place during the DAMA group meeting on Monday involving @stuartcampbell, @cryos, @tacaswell, and others present, so I will take the liberty of merging it. Thanks for your patient attention to detail and group consensus here, @dylanmcreynolds.

@danielballan danielballan merged commit f233f68 into bluesky:master Dec 15, 2020
@danielballan danielballan changed the title Auth session field in start doc Data session and group fields in start doc Dec 15, 2020