Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AI Tutor] CT-432: Get profanity and PII filters working as expected #58083

Merged
merged 21 commits into from
May 2, 2024

Conversation

ebeastlake
Copy link
Contributor

@ebeastlake ebeastlake commented Apr 17, 2024

The following PR:

  • Gets the client registering PII and profanity violations from the server
  • Modifies the UserMessage component to correctly "redact" any flagged messages
  • Logs violations, if needed, so we can use the console data to troubleshoot per @dancodedotorg
  • Removes an unneeded constant from the shared constants file

Links

Jira ticket 1: https://codedotorg.atlassian.net/browse/CT-432
Jira ticket 2: https://codedotorg.atlassian.net/browse/CT-442

Testing story

Tested manually.

4_29_demo.mov

Here's what the log will look like.

Screenshot 2024-04-29 at 10 04 47 PM Screenshot 2024-04-29 at 10 04 57 PM

Deployment strategy

Follow-up work

This style of "redacting" messages is unique to AI Tutor. Gen AI is rendering dismissable "alerts." This is fine for now according to Mark, but I will reach out in the #ai-for-student-learning to continue the conversation.

PR Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

@ebeastlake ebeastlake changed the base branch from staging to emily/ct-427/students-provide-feedback April 17, 2024 22:08
@ebeastlake ebeastlake requested review from a team as code owners April 17, 2024 22:08
@ebeastlake ebeastlake changed the base branch from emily/ct-427/students-provide-feedback to staging April 17, 2024 22:09
@ebeastlake ebeastlake removed request for a team April 25, 2024 23:09
};
if (!response) return {status: Status.ERROR};

switch (response.status) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic maps the response.status from the existing ShareFilter API (which is the type of violation) to one of two non-generic error statuses profanity_violation or pii_violation we'll store in the db.

The previous checks would never have passed

response?.status === Status.PROFANITY_VIOLATION

and I switched to a switch statement because of the many-1 mapping of ShareFilter statuses to AITutorInteractionStatus.

@@ -105,7 +104,7 @@ export const askAITutor = createAsyncThunk(
if (chatApiResponse.assistantResponse) {
const assistantChatMessage: ChatCompletionMessage = {
role: Role.ASSISTANT,
status: Status.OK,
status: chatApiResponse.status,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing the chatApiResponse status from the server lets us style the messages appropriately later.

'This chat has been hidden because it is inappropriate.';
const PII_VIOLATION_USER_MESSAGE =
'This chat has been hidden because it contains personal information.';
const ERROR_USER_MESSAGE =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I standardized the variable names in this file so they all refer to "PII violation" and "profanity violation" instead of the more generic "inappropriate"/"personal."

return null;
};

const UserMessage: React.FC<UserMessageProps> = ({message}) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking this up into a functional component called displayUserMessage felt redundant, so I condensed it.

AI_TUTOR_TYPES = {
COMPILATION: 'compilation',
VALIDATION: 'validation',
GENERAL_CHAT: 'general_chat',
}.freeze

PII_TYPES = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use these anywhere on the backend, so I removed them. cc @bencodeorg

Copy link
Contributor

@molly-moen molly-moen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question on logging, otherwise looks good!

// Analogous to https://github.com/code-dot-org/ml-playground/pull/299
// We want to expose enough information to help troubleshoot false positives
const logViolationDetails = (response: OpenaiChatCompletionMessage) => {
console.info('Violation detected in chat completion response', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log this to cloudwatch instead? Or in addition to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. It'd be nice to not have to wait for users to report these to see how aggressive it's being. Thanks, Molly!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to log a generic event for now!

@ebeastlake
Copy link
Contributor Author

@molly-moen I added a generic metric so we can view frequency, but I'd want to check with Travis before I start logging student PII/profanity to Cloudwatch. Does that seem reasonable to you?

I also removed the use of the Lab2MetricsReporter per conversation with @sanchitmalhotra126.

Diff: efe22e3

@ebeastlake ebeastlake merged commit 827b55e into staging May 2, 2024
2 checks passed
@ebeastlake ebeastlake deleted the emily/ct-432/flag-pii branch May 2, 2024 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants