[AI Tutor] CT-432: Get profanity and PII filters working as expected #58083

ebeastlake · 2024-04-17T22:08:26Z

The following PR:

Gets the client to register PII and profanity violations reported by the server
Modifies the UserMessage component to correctly "redact" any flagged messages
Logs violations to the console so that, if needed, we can use that data to troubleshoot (per @dancodedotorg)
Removes an unneeded constant from the shared constants file

Links

Jira ticket 1: https://codedotorg.atlassian.net/browse/CT-432
Jira ticket 2: https://codedotorg.atlassian.net/browse/CT-442

Testing story

Tested manually.

4_29_demo.mov

Here's what the log will look like.

Deployment strategy

Follow-up work

This style of "redacting" messages is unique to AI Tutor; Gen AI instead renders dismissible "alerts." According to Mark, this is fine for now, but I will reach out in the #ai-for-student-learning channel to continue the conversation.

PR Checklist:

Tests provide adequate coverage
Privacy and Security impacts have been assessed
Code is well-commented
New features are translatable or updates will not break translations
Relevant documentation has been added or updated
User impact is well-understood and desirable
Pull Request is labeled appropriately
Follow-up work items (including potential tech debt) are tracked and linked

ebeastlake · 2024-04-30T04:42:53Z

apps/src/aiTutor/chatApi.ts

-    };
+  if (!response) return {status: Status.ERROR};
+
+  switch (response.status) {


This logic maps response.status from the existing ShareFilter API (i.e., the type of violation) to one of two specific error statuses, profanity_violation or pii_violation, which we store in the database.

The previous check would never have passed, since response.status holds the ShareFilter violation type rather than an AITutorInteractionStatus:

response?.status === Status.PROFANITY_VIOLATION

I switched to a switch statement because of the many-to-one mapping of ShareFilter statuses to AITutorInteractionStatus.
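A minimal sketch of the many-to-one mapping, for illustration only. The ShareFilter status strings ('profanity', 'email', 'phone', 'address'), the response shape, the function name toInteractionStatus, and the string values of Status.OK and Status.ERROR are assumptions; only the profanity_violation and pii_violation values come from this PR.

```typescript
// Hypothetical ShareFilter violation types; the real API's values may differ.
type ShareFilterStatus = 'profanity' | 'email' | 'phone' | 'address';

// Interaction statuses. The two violation values follow the PR description;
// 'ok' and 'error' are assumed values.
enum Status {
  OK = 'ok',
  ERROR = 'error',
  PROFANITY_VIOLATION = 'profanity_violation',
  PII_VIOLATION = 'pii_violation',
}

// Assumed response shape: a missing status means no violation was found.
interface ShareFilterResponse {
  status?: ShareFilterStatus;
}

// Hypothetical helper mapping a filter result to the status stored in the db.
function toInteractionStatus(response: ShareFilterResponse | null): Status {
  // Mirrors the diff: no response at all is treated as a generic error.
  if (!response) return Status.ERROR;

  switch (response.status) {
    case 'profanity':
      return Status.PROFANITY_VIOLATION;
    // Several PII categories collapse into one stored status (many-to-one).
    case 'email':
    case 'phone':
    case 'address':
      return Status.PII_VIOLATION;
    default:
      return Status.OK;
  }
}

console.log(toInteractionStatus({status: 'phone'})); // logs pii_violation
```

Listing the PII cases as fall-through labels keeps the many-to-one relationship visible in one place, which is harder to read as a chain of `===` checks.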

ebeastlake · 2024-04-30T04:44:10Z

apps/src/aiTutor/redux/aiTutorRedux.ts

@@ -105,7 +104,7 @@ export const askAITutor = createAsyncThunk(
    if (chatApiResponse.assistantResponse) {
      const assistantChatMessage: ChatCompletionMessage = {
        role: Role.ASSISTANT,
-        status: Status.OK,
+        status: chatApiResponse.status,


Storing the chatApiResponse status from the server lets us style the messages appropriately later.
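A sketch of why the status is carried on the message, assuming simplified versions of the message and chat API response types. The field chatMessageText, the helper buildAssistantMessage, and the enum string values are hypothetical; the real types in aiTutorRedux.ts may differ.

```typescript
// Assumed enums; names mirror the diff, string values are illustrative.
enum Role {
  USER = 'user',
  ASSISTANT = 'assistant',
}

enum Status {
  OK = 'ok',
  PROFANITY_VIOLATION = 'profanity_violation',
  PII_VIOLATION = 'pii_violation',
}

// Simplified message type; chatMessageText is a hypothetical field name.
interface ChatCompletionMessage {
  role: Role;
  status: Status;
  chatMessageText: string;
}

// Hypothetical shape of the object returned by the chat API helper.
interface ChatApiResponse {
  status: Status;
  assistantResponse?: string;
}

function buildAssistantMessage(res: ChatApiResponse): ChatCompletionMessage | null {
  if (!res.assistantResponse) return null;
  return {
    role: Role.ASSISTANT,
    // Carry the server's status instead of hard-coding Status.OK,
    // so the view can decide later whether to redact or restyle.
    status: res.status,
    chatMessageText: res.assistantResponse,
  };
}
```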

ebeastlake · 2024-04-30T04:47:30Z

apps/src/aiTutor/views/UserMessage.tsx

+  'This chat has been hidden because it is inappropriate.';
+const PII_VIOLATION_USER_MESSAGE =
+  'This chat has been hidden because it contains personal information.';
+const ERROR_USER_MESSAGE =


I standardized the variable names in this file so they all refer to "PII violation" and "profanity violation" instead of the more generic "inappropriate"/"personal."

ebeastlake · 2024-04-30T05:09:37Z

apps/src/aiTutor/views/UserMessage.tsx

-  return null;
-};
-
-const UserMessage: React.FC<UserMessageProps> = ({message}) => {


Breaking this up into a functional component called displayUserMessage felt redundant, so I condensed it.
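The condensed redaction logic can be sketched as a plain function that picks the text UserMessage renders. PROFANITY_VIOLATION_USER_MESSAGE is a hypothetical name (the diff truncates the real one), and getDisplayText and the message shape are illustrative assumptions, not the component's actual code. The two user-facing strings are copied from the diff.

```typescript
// Assumed status values matching the ones stored in the db.
enum Status {
  OK = 'ok',
  PROFANITY_VIOLATION = 'profanity_violation',
  PII_VIOLATION = 'pii_violation',
}

// Hypothetical constant name; the string itself comes from the diff.
const PROFANITY_VIOLATION_USER_MESSAGE =
  'This chat has been hidden because it is inappropriate.';
const PII_VIOLATION_USER_MESSAGE =
  'This chat has been hidden because it contains personal information.';

// Hypothetical message shape; the real UserMessageProps may differ.
interface ChatMessage {
  status: Status;
  chatMessageText: string;
}

// Returns the text to render: flagged messages are replaced ("redacted")
// by an explanation instead of showing the student's original text.
function getDisplayText(message: ChatMessage): string {
  switch (message.status) {
    case Status.PROFANITY_VIOLATION:
      return PROFANITY_VIOLATION_USER_MESSAGE;
    case Status.PII_VIOLATION:
      return PII_VIOLATION_USER_MESSAGE;
    default:
      return message.chatMessageText;
  }
}
```

Inside the component this would be used as something like `{getDisplayText(message)}`; keeping the decision in a single switch means a new violation type only needs one new case.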

ebeastlake · 2024-04-30T05:10:23Z

lib/cdo/shared_constants.rb

  AI_TUTOR_TYPES = {
    COMPILATION: 'compilation',
    VALIDATION: 'validation',
    GENERAL_CHAT: 'general_chat',
  }.freeze

-  PII_TYPES = {


We don't use these anywhere on the backend, so I removed them. cc @bencodeorg

molly-moen

One question on logging; otherwise, looks good!

molly-moen · 2024-04-30T17:57:29Z

apps/src/aiTutor/chatApi.ts

+// Analogous to https://github.com/code-dot-org/ml-playground/pull/299
+// We want to expose enough information to help troubleshoot false positives
+const logViolationDetails = (response: OpenaiChatCompletionMessage) => {
+  console.info('Violation detected in chat completion response', {


Should we log this to CloudWatch instead? Or in addition to this?

ebeastlake · May 1, 2024

Great idea. It'd be nice not to have to wait for users to report these to see how aggressive the filter is being. Thanks, Molly!

molly-moen · May 2, 2024

Seems reasonable to log a generic event for now!

ebeastlake · 2024-05-02T01:15:13Z

@molly-moen I added a generic metric so we can see how often violations occur, but I'd want to check with Travis before I start logging student PII/profanity to CloudWatch. Does that seem reasonable to you?

I also removed the use of the Lab2MetricsReporter, per a conversation with @sanchitmalhotra126.

Diff: efe22e3
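A sketch of the split discussed in this thread: detailed violation info goes only to the browser console, while the backend metric records just the violation type, so no student text is sent with it. MetricsReporter, incrementCounter, the metric name, the ViolationResponse shape, and the flaggedContent field are all hypothetical; this is neither the Lab2MetricsReporter API (which the PR stopped using) nor a real CloudWatch client.

```typescript
// Hypothetical response shape; the real OpenaiChatCompletionMessage may differ.
interface ViolationResponse {
  status: string; // violation type reported by the filter
  flaggedContent?: string; // text that triggered the filter (assumed field)
}

// Hypothetical metrics sink standing in for whatever reports to CloudWatch.
interface MetricsReporter {
  incrementCounter(name: string, dimensions: Record<string, string>): void;
}

function reportViolation(response: ViolationResponse, metrics: MetricsReporter): void {
  // Console only: includes the flagged content to help debug false positives.
  console.info('Violation detected in chat completion response', {
    status: response.status,
    flaggedContent: response.flaggedContent,
  });
  // Generic event: carries only the violation type, never student text.
  metrics.incrementCounter('AITutor.ViolationDetected', {violationType: response.status});
}

// Usage with an in-memory reporter, e.g. for local testing.
const counts = new Map<string, number>();
const inMemory: MetricsReporter = {
  incrementCounter(name, dims) {
    const key = `${name}:${dims.violationType}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  },
};
reportViolation({status: 'profanity', flaggedContent: 'example'}, inMemory);
```

Keeping the metric free of message content means frequency can be monitored now, while the question of logging the flagged text itself stays open pending the privacy check mentioned above.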

ebeastlake added 12 commits April 11, 2024 23:09

front end of feedback buttons, missing pop-up menu

a5d1ee3

start wiring up feedbacks backend

d037d21

tutor feedback successfully saving

3c9205f

work on feedbacks bug

b311ee8

resolve merge conflicts

b132298

remove message ids as keys

5f84e31

Merge branch 'staging' of github.com:code-dot-org/code-dot-org into emily/ct-427/students-provide-feedback

593b2c1

update handling of null

c148de4

self PR review

c3e4e62

add annotations for models/ai_tutor_interaction_feedback.rb

e532c0b

Merge branch 'staging' of github.com:code-dot-org/code-dot-org into emily/ct-432/flag-pii

8915be9

get PII and profanity UI working

576bb57

ebeastlake changed the base branch from staging to emily/ct-427/students-provide-feedback April 17, 2024 22:08

ebeastlake requested review from a team as code owners April 17, 2024 22:08

ebeastlake changed the base branch from emily/ct-427/students-provide-feedback to staging April 17, 2024 22:09

ebeastlake added 4 commits April 24, 2024 18:09

Merge branch 'staging' into emily/ct-432/flag-pii

9093453

clean up from staging merge

afa30d6

missed import when merging staging

d39c1a3

missing shared constants

05e20cc

ebeastlake removed request for a team April 25, 2024 23:09

ebeastlake added 2 commits April 25, 2024 21:12

remove manually added shared constants

f272ede

correct path for shared constants

d5bacdc

ebeastlake commented Apr 30, 2024

expose violation details

88e99e4

ebeastlake commented Apr 30, 2024

ebeastlake requested review from a team, bencodeorg and sanchitmalhotra126 April 30, 2024 05:18

molly-moen approved these changes Apr 30, 2024

ebeastlake added 2 commits May 1, 2024 18:09

log metric for violations

efe22e3

Merge branch 'staging' into emily/ct-432/flag-pii

bde2e81

ebeastlake merged commit 827b55e into staging May 2, 2024
2 checks passed

ebeastlake deleted the emily/ct-432/flag-pii branch May 2, 2024 18:57
