
[AiLab] prevent save of trained models if they don't pass share filtering #40915

Merged
merged 3 commits into staging from profanity-check-ml-models on Jun 7, 2021

Conversation


@Erin007 Erin007 commented Jun 2, 2021

We don't want students to be able to save trained machine learning models if they contain profanity or personally identifiable information (PII). Saved models can be imported into App Lab apps, which can be published or remixed, and we don't want indecent or unsafe information shared. Prior to saving a model, we now run its data through the share filter, which checks for profanity, emails, phone numbers, and street addresses, and prevents the model from saving if any are found.
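A minimal sketch of that flow, assuming a Rails-style controller action (the action name, param names, and `store_model` helper are placeholders for illustration, not the actual dashboard code):

```ruby
# Hypothetical save endpoint: run the share filter before persisting the model.
def save_trained_model
  model_data = params[:data]
  return head :bad_request if model_data.nil? || model_data == ""

  # ShareFiltering.find_failure returns nil when the text is clean, otherwise a
  # failure describing what was found (profanity, email, phone, or address).
  failure = ShareFiltering.find_failure(model_data.to_s, request.locale)
  if failure
    return render json: {id: params[:id], status: "failure", details: failure.type}
  end

  store_model(params[:id], model_data) # placeholder for the real storage call
  render json: {id: params[:id], status: "success"}
end
```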

We'll show an alternate fail message if profanity or PII is found:
[Screenshot: alternate fail message shown when profanity or PII is detected]

code-dot-org/ml-playground#218

@Erin007 Erin007 requested review from breville, made-line and a team June 2, 2021 20:45
```ruby
return head :bad_request if model_data.nil? || model_data == ""

# Run the serialized model data through the share filter before saving.
profanity_or_pii = ShareFiltering.find_failure(model_data.to_s, request.locale)
if profanity_or_pii
  render json: {id: model_id, status: "failure", details: profanity_or_pii.type}
end
```
A reviewer (Member) commented:

Out of curiosity, what are the possible values of profanity_or_pii.type? We want to make sure all possible values are unique to profanity filtering failures, so that the client code doesn't have to make too many assumptions about what type of error this is. Otherwise, we could return a more explicit error indicating that it relates to profanity filtering...

@Erin007 (Contributor, Author) replied:

The possible values for profanity_or_pii.type are email, phone, address, and profanity, as defined in the ShareFiltering source.
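As a quick illustration, a hedged sketch of branching on those type values (only ShareFiltering.find_failure and .type come from this PR; everything else is illustrative):

```ruby
failure = ShareFiltering.find_failure(model_data.to_s, request.locale)
case failure&.type
when "email", "phone", "address"
  # Personally identifiable information was detected in the model data.
when "profanity"
  # Profane language was detected in the model data.
when nil
  # No failure returned: the data passed the share filter.
end
```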

@Erin007 (Contributor, Author) replied:

I think I see what you mean; I updated `details` to be more specific.

@Erin007 (Contributor, Author) replied:

We decided that it wasn't important to specify the type of error, so I modified the response to render a new status, `piiProfanity`, which the code in AI Lab now handles to display the correct, generic message.
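For clarity, a rough sketch of what the updated render could look like (the exact JSON shape is an assumption; only the `piiProfanity` status comes from the comment above):

```ruby
if profanity_or_pii
  # Don't expose which filter matched; AI Lab shows a single generic message.
  return render json: {id: model_id, status: "piiProfanity"}
end
```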

@Erin007 Erin007 merged commit c269f4f into staging Jun 7, 2021
@Erin007 Erin007 deleted the profanity-check-ml-models branch June 7, 2021 22:01