## Unwanted AI Actions by General Purpose LLMs

**Author(s):**

[Markus Hupfauer](mailto:markus@hupfauer.one)

### Description

Unwanted AI actions occur when AI systems built on general-purpose large language models (LLMs) are not sufficiently tailored to their specific application context.
Inadequately configured system prompts or insufficient fine-tuning can lead such systems to perform illegal or undesirable actions within their operational environment, exposing the operator to legal liability and reputational damage.

### Common Examples of Risk

1. A health insurance AI provides healthcare advice to patients, overstepping legal restrictions on who may give such advice.
2. A customer service AI, even in a business unrelated to financial services, offers financial advice when asked, breaching legal requirements that govern such counsel.
3. An AI system in a service sector that requires neutrality endorses specific products or firms, with potential legal or reputational repercussions.

### Prevention and Mitigation Strategies

1. Implement robust validation layers that check AI outputs against legal and organizational guidelines before they reach the user (a sketch follows this list).
2. Develop and enforce strict governance protocols for system prompt design, and tune system prompts thoroughly to align the model with its intended function.
3. Test AI models before deployment, after any change to the underlying LLM, and continuously thereafter to catch undesirable outputs; automate these tests to support ongoing validation (a test-suite sketch follows this list).
4. Use a secondary AI system to validate that model responses match the intended use case, discarding any that fall out of scope (see the judge sketch below).
5. Prepend explicit guardrail instructions to all user-generated prompts to constrain unintended model behavior (see the prompt-wrapping sketch below).
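
A minimal sketch of the validation layer from strategy 1, assuming a Python deployment: the deny-list topics, regex patterns, and the `validate_output` helper are illustrative assumptions for a customer-service assistant, not a reference implementation, and a real deployment would pair such rules with classifier-based compliance checks.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical deny-list for a customer-service assistant that must not give
# financial or medical advice; the topics and patterns are illustrative only.
DISALLOWED_TOPICS = {
    "financial advice": re.compile(r"\b(invest(ing|ment)?|portfolio|buy shares)\b", re.I),
    "medical advice": re.compile(r"\b(diagnos\w+|dosage|prescri\w+)\b", re.I),
}

SAFE_REFUSAL = "I'm sorry, I can only help with questions about our products and services."


def validate_output(response: str) -> str:
    """Return the model response only if it passes every deny-list check."""
    for topic, pattern in DISALLOWED_TOPICS.items():
        if pattern.search(response):
            # Fail closed: log the match for review and return a refusal instead.
            logger.warning("Blocked response: matched disallowed topic %r", topic)
            return SAFE_REFUSAL
    return response
```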
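
Strategy 3's automated tests can then target that layer directly. Below is a small pytest regression suite, assuming the `validate_output` helper and `SAFE_REFUSAL` constant from the sketch above live in a hypothetical `validation` module; the test cases themselves are illustrative and would be curated from incidents and compliance requirements in practice.

```python
import pytest

# Hypothetical import: assumes the validate_output sketch above is packaged
# as a module named `validation`.
from validation import SAFE_REFUSAL, validate_output

# Illustrative out-of-scope model responses that the layer must block.
BLOCKED_RESPONSES = [
    "You should buy shares in Acme Corp before the earnings call.",
    "The usual dosage for adults is 400 mg every six hours.",
]


@pytest.mark.parametrize("response", BLOCKED_RESPONSES)
def test_blocks_out_of_scope_responses(response):
    assert validate_output(response) == SAFE_REFUSAL


def test_passes_in_scope_response():
    ok = "Your order shipped yesterday and should arrive on Friday."
    assert validate_output(ok) == ok
```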
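
Strategy 4 can be illustrated as an LLM-as-judge check. The `judge` callable below stands in for whatever client the deployment uses to reach a secondary model, and the prompt wording and `IN_SCOPE`/`OUT_OF_SCOPE` labels are assumptions rather than a fixed protocol.

```python
from typing import Callable

# Hypothetical judge prompt; the scope description would be tailored per deployment.
JUDGE_PROMPT = (
    "You are a compliance checker for a customer-service assistant that may "
    "only discuss the company's own products and services. Reply with exactly "
    "IN_SCOPE or OUT_OF_SCOPE for the following response:\n\n{response}"
)

FALLBACK = "I'm sorry, I can't help with that request."


def validate_with_judge(response: str, judge: Callable[[str], str]) -> str:
    """Ask a secondary model whether a response stays within the use case.

    `judge` is any function that sends a prompt to a separate LLM and returns
    its text completion; the actual client call is deployment-specific.
    """
    verdict = judge(JUDGE_PROMPT.format(response=response)).strip().upper()
    if verdict != "IN_SCOPE":
        # Fail closed: discard anything the judge does not explicitly approve.
        return FALLBACK
    return response
```

Failing closed means borderline responses are discarded rather than shown, trading some helpfulness for compliance.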
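
Finally, strategy 5 amounts to a prompt-wrapping step. The policy text below is illustrative and would be drafted with compliance input for a real system.

```python
# Illustrative guardrail preamble; the policy wording is an assumption.
GUARDRAIL_PREAMBLE = (
    "Policy: answer only questions about our products and services. Do not "
    "provide financial, legal, or medical advice. If a request is out of "
    "scope, politely refuse.\n\nUser request: "
)


def wrap_user_prompt(user_input: str) -> str:
    """Prepend guardrail instructions to every user-generated prompt."""
    return GUARDRAIL_PREAMBLE + user_input
```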

### Example Attack Scenarios

- Scenario #1: A financial-services AI, queried about health insurance, offers insurance advice it is not permitted to give because of an inadequately configured system prompt, attracting regulatory scrutiny and potential fines.

- Scenario #2: A general-purpose LLM deployed in customer support inadvertently recommends competitors or alternative solutions due to insufficient guardrails, undermining business objectives and customer loyalty.

### Reference Links

1. Guha, Neel; Lawrence, Christie M.; et al. (Stanford University): [AI Regulation Has Its Own Alignment Problem](https://dho.stanford.edu/wp-content/uploads/AI_Regulation.pdf)
