Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PARO-720] ENH Model sharing helpers #315

Merged
merged 10 commits into from
Aug 26, 2019

Conversation

stephen-hoover
Copy link

A CivisML model has File and Project run outputs, each of which need to be shared to effectively share the "model". This helper will transparently handle sharing the run outputs. Include JSONValue in case a future version of CivisML uses that. The functions are patterned after the autogenerated API sharing endpoints.

A CivisML model has File and Project run outputs, each of which need to be shared to effectively share the "model". This helper will transparently handle sharing the run outputs. Include JSONValue in case a future version of CivisML uses that. The functions are patterned after the autogenerated API sharing endpoints.
@stephen-hoover stephen-hoover changed the title ENH Model sharing helpers [PARO-720] ENH Model sharing helpers Aug 9, 2019
@stephen-hoover
Copy link
Author

@elsander , see what you think of the naming and other design choices here. It's not obvious what the right interface would be. I considered adding a list_models_shares function as well, but doing that right would be really complex, and I wasn't sure it was worth it.



def test_share_model_debug_log():
# Debug logs need "write" permission to be shared
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessarily the behavior I'd expect. Why do we only share debug logs to people with write permissions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was that the debug logs are more internal workings of the model. If you just want someone to be able to see your model, with the lowest levels of permissions, you might not want to expose all of the logging. I'm certainly open to discussion on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right in most cases, a person who only needs read permissions won't need access to the debug logs. But I don't think it would ever be a problem to expose the debug logs, and there are some cases where you might expect read permissions to provide log access. Probably the most common is CS sharing one of us on a client job so that we can examine the logs and debug, but I could also imagine clients sharing jobs to debug internally, before they escalate to CS. My preference is to share all run outputs for consistency, but I'm open to discussion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this makes the code simpler, so I'm happy to make the change. I'm not thinking of anything sensitive in the logs which wouldn't also be in the other model artifacts which would be shared at read level.

elif _output['object_type'] == 'Project':
_func = getattr(client.projects, "put_shares_" + entity_type)
if permission_level == 'read':
# Users must be able to add to projects to use the model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own understanding, why is this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CivisML training job keeps a "scoring jobs" project as a run output. CivisML scoring jobs add themselves to this project so that users (and the Civis Platform UI) can link from training jobs to all dependent scoring jobs. Users need "write" permission to add things to projects.

I reasoned that users would expect that "read" permission on a model would give them the ability to make their own scoring jobs based on it. Does that make sense? The project is the only thing which would prevent you from scoring with only read permissions, and the error you get is a bit cryptic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I didn't think about whether a "read" user should be able to score based off the original model rather than a clone. That reasoning makes sense to me, thanks.

Follow-up question: what permissions does the original user have on a scoring job created by the new user in this way?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original user wouldn't have any special permissions on the scoring job by default. If you don't have "read" permission on the scoring job, it wouldn't be visible to you in the project. I think this is expected behavior -- I could imagine the "scoring" project being filled with scoring jobs, and no one user being able to see all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I agree that this is expected behavior.

"put_shares_" + entity_type)
obj_permission = permission_level
else:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to issue a message to the debug log if you hit this condition?

Also, are there any outputs where we'd expect to hit this condition? If you write OOS scores to a table, does that show up as a run output?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a debug message definitely makes sense. I'll add that.

I don't expect to ever hit this condition. I believe that Tables are the only possible run output not covered in this loop, and you can't grant people permissions on tables through the API endpoints. CivisML doesn't add any tables as run outputs, AFAIK.

_func = getattr(client.projects, endpoint_name)
elif _output['object_type'] == 'JSONValue':
_func = getattr(client.json_values, endpoint_name)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note here, would it be useful to log skipped outputs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.


Returns
-------
readers : dict::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the type annotation here. It looks like "users" and "groups" are lists of dicts, right? Should this say list[dict] to specify that explicitly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copy-pasted this directly from the corresponding doc strings for other sharing endpoints. I'll take a look.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readers is a single dictionary. It contains keys users and groups, each of which is a list of dictionaries. In Python notation, I think you're right that the types of users and groups would be list[dict], but I think that what's show here is also correct, and consistent with the type notation used in the other API endpoints.

@elsander elsander assigned stephen-hoover and unassigned elsander Aug 20, 2019
@elsander
Copy link
Contributor

I think the function names make a lot of sense. I could see the list function being useful, but if it's complex to implement it's probably not worth doing right now. If we hear requests for it we could reevaluate.

@stephen-hoover
Copy link
Author

@elsander , I addressed your comments.

Copy link
Contributor

@elsander elsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elsander elsander assigned stephen-hoover and unassigned elsander Aug 26, 2019
@stephen-hoover stephen-hoover merged commit eded881 into civisanalytics:master Aug 26, 2019
@stephen-hoover stephen-hoover deleted the share-models branch August 26, 2019 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants