
# Prompt Engineering for Audit and Compliance


In [None]:
!pip install openai --quiet

In [1]:
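import os

# Azure OpenAI connection settings; replace the placeholders with your own values.
os.environ['AZURE_OPENAI_ENDPOINT'] = "https://xxxxxx.openai.azure.com/"
os.environ['AZURE_OPENAI_API_KEY'] = ""  # paste your key here; do not commit it
os.environ['OPENAI_API_VERSION'] = "2025-03-01-preview"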

In [2]:
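from openai import AzureOpenAI

# With no arguments, the client reads the endpoint, key, and API version
# from the environment variables set above.
client = AzureOpenAI()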


In [3]:

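# Helper used by every example below: one prompt string in, one text reply out.
def generateResponse(prompt, model='gpt-4.1-mini', temperature=0):
    """Send a single user prompt to the model and return the reply as plain text."""
    # On Azure OpenAI, `model` is the name of your deployment.
    messages = [{"role": "user", "content": prompt}]
    response = client.responses.create(
        model=model,
        input=messages,
        temperature=temperature,  # 0 keeps the output as repeatable as possible
    )
    return response.output_text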



## 2) Prompt Engineering Fundamentals (Prompts-Only)

- **Be explicit & scoped**
  - State role/tone (e.g., “Act as a neutral auditor-style writer”).
  - State the **task** (summarize, extract, classify, rewrite).
  - State the **format** (bullets, JSON fields, table markdown).
  - State **constraints** (word/line count, vocabulary to use/avoid).

- **Use structure**
  - Headings, numbered steps, and bullets help the model organize output.
  - For machine-readability, ask for `JSON` with a fixed schema.

- **Guard for missing info**
  - Tell the model what to do if the info is insufficient (e.g., “respond with ‘Insufficient information’ and list what’s missing”).

- **Iterate**
  - Capture the output and **refine** via another prompt (“tighten tone”, “shorten to 5 bullets”, “add assumptions/open items”).

> All examples below call **`generateResponse(prompt)`** directly with a single string.

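The four habits above (role, task, format, constraints, plus a fallback for missing information) can be assembled mechanically. The following is a minimal sketch, not part of the original lab: `build_prompt` and its parameter names are hypothetical, and the resulting string would be passed to `generateResponse` as usual.

```python
def build_prompt(task, text, role=None, output_format=None,
                 constraints=None, fallback=None):
    """Assemble one prompt string from labeled parts (hypothetical helper)."""
    lines = []
    if role:
        lines.append(f"Act as {role}.")
    lines.append(f"Task: {task}")
    if output_format:
        lines.append(f"Format: {output_format}")
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {rule}" for rule in constraints)
    if fallback:
        # Tells the model what to do when the text does not contain the answer.
        lines.append(f"If the information is insufficient: {fallback}")
    lines.append("Text:")
    lines.append(text.strip())
    return "\n".join(lines)


example_prompt = build_prompt(
    task="Summarize the content.",
    text="Revenue is recognized upon delivery.",
    role="a neutral auditor-style writer",
    output_format="5 bullet points",
    constraints=["Do not add facts."],
    fallback="respond with 'Insufficient information' and list what is missing.",
)
print(example_prompt)
```

Keeping each part on its own labeled line makes it easy to change one element (for example the format) while holding the rest constant between experiments.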

## 3) Domain Sample Text (you can replace with your own)
We'll use a small synthetic excerpt that resembles audit/policy language.


In [6]:

sample_text = '''
Client: ACME Corp (FY2024). Revenue is recognized when control transfers upon delivery (FOB Destination).
Variable consideration is estimated and constrained. Discounts must be documented on sales orders and invoices.
Prior year finding: late shipment confirmations caused timing errors in revenue recognition.
Control (Order Entry): Objective—ensure shipped orders are authorized and recorded accurately; Frequency—Daily; Owner—Order Management; System—ERP-X.
'''
print(sample_text)



Client: ACME Corp (FY2024). Revenue is recognized when control transfers upon delivery (FOB Destination).
Variable consideration is estimated and constrained. Discounts must be documented on sales orders and invoices.
Prior year finding: late shipment confirmations caused timing errors in revenue recognition.
Control (Order Entry): Objective—ensure shipped orders are authorized and recorded accurately; Frequency—Daily; Owner—Order Management; System—ERP-X.



## 4) Pattern A — Role + Tone + Audience
**When to use:** You want consistent voice and professional style.

**Prompt template (bulleted):**
- Role: *senior associate supporting audit team*
- Goal: *summarize the text for a non-auditor IT audience*
- Tone: *neutral, concise, professional*
- Output: *5 bullet points, ≤15 words each*


In [8]:

prompt = f'''
Act as a senior associate supporting an audit team.
Audience: IT professionals who support auditors; minimal jargon.
Task: Summarize the following content into EXACTLY 5 bullet points (<=15 words each).
Tone: neutral, concise, professional. Do not add facts.
Text:
{sample_text}
'''
print(generateResponse(prompt))


- ACME Corp recognizes revenue at delivery under FOB Destination terms.  
- Variable consideration is estimated and constrained in revenue calculations.  
- Discounts require documentation on sales orders and invoices.  
- Prior year issue: late shipment confirmations caused revenue timing errors.  
- Order Entry control ensures authorized, accurate order recording daily via ERP-X.

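Because the prompt asks for an exact bullet count and a word limit, the reply can be checked in code before anyone relies on it. This is a small sketch under stated assumptions: `check_bullets` is a hypothetical helper, bullets are assumed to start with `- `, and words are counted by splitting on whitespace.

```python
def check_bullets(output, expected_count=5, max_words=15):
    """Return a list of constraint violations; an empty list means the reply passed."""
    bullets = [
        line.strip()[2:].strip()
        for line in output.splitlines()
        if line.strip().startswith("- ")
    ]
    problems = []
    if len(bullets) != expected_count:
        problems.append(f"expected {expected_count} bullets, found {len(bullets)}")
    for number, bullet in enumerate(bullets, start=1):
        words = len(bullet.split())
        if words > max_words:
            problems.append(f"bullet {number} has {words} words (limit {max_words})")
    return problems


sample_reply = (
    "- ACME Corp recognizes revenue at delivery under FOB Destination terms.  \n"
    "- Discounts require documentation on sales orders and invoices.  "
)
print(check_bullets(sample_reply, expected_count=2))
```

In the notebook, the same check could be applied to the live reply with `check_bullets(generateResponse(prompt))`.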

## 5) Pattern B — Structured JSON (Schema-Constrained Output)
**When to use:** You need machine-readable output to feed another tool.

**Prompt template (bulleted):**
- Task: *extract a control description*
- Output: **strict JSON** with fields:
  - `objective` (string)
  - `frequency` (one of: Daily/Weekly/Monthly/Quarterly/Annual/Ad-Hoc)
  - `owner` (string)
  - `system` (string)
- Rule: *if a field is missing, set it to `null` and list it under `missing_fields`*


In [10]:

prompt = f'''
Extract a SOX-style control description from the text below.

Output STRICT JSON with fields:
- objective: string
- frequency: one of ["Daily","Weekly","Monthly","Quarterly","Annual","Ad-Hoc"]
- owner: string
- system: string
- missing_fields: array of field names that were not found

If a field is not found in the text, set it to null and add its name to missing_fields.
Do not include any extra fields. Do not invent facts.

Text:
{sample_text}
'''
print(generateResponse(prompt))


```json
{
  "objective": "ensure shipped orders are authorized and recorded accurately",
  "frequency": "Daily",
  "owner": "Order Management",
  "system": "ERP-X",
  "missing_fields": []
}
```
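Note that the reply above arrives wrapped in a Markdown code fence, so passing it straight to `json.loads` would fail. The sketch below shows one way to unwrap and validate it before handing it to another tool; `parse_json_output` and `validate_control` are hypothetical helpers, and the checks mirror only the schema stated in the prompt.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

ALLOWED_FREQUENCIES = ("Daily", "Weekly", "Monthly", "Quarterly", "Annual", "Ad-Hoc")
EXPECTED_FIELDS = {"objective", "frequency", "owner", "system", "missing_fields"}


def parse_json_output(raw):
    """Parse model output as JSON, unwrapping a Markdown code fence if present."""
    match = FENCED_BLOCK.search(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload)


def validate_control(data):
    """Return a list of schema problems; an empty list means the record passed."""
    problems = []
    if set(data) != EXPECTED_FIELDS:
        problems.append(f"key mismatch: {sorted(set(data) ^ EXPECTED_FIELDS)}")
    frequency = data.get("frequency")
    if frequency is not None and frequency not in ALLOWED_FREQUENCIES:
        problems.append(f"frequency not in allowed list: {frequency!r}")
    if not isinstance(data.get("missing_fields"), list):
        problems.append("missing_fields must be an array")
    return problems


record = {
    "objective": "ensure shipped orders are authorized and recorded accurately",
    "frequency": "Daily",
    "owner": "Order Management",
    "system": "ERP-X",
    "missing_fields": [],
}
raw_output = FENCE + "json\n" + json.dumps(record, indent=2) + "\n" + FENCE
control = parse_json_output(raw_output)
print(validate_control(control))
```

Validating in code rather than trusting the word "STRICT" in the prompt is a deliberate choice: the prompt reduces malformed output, while the validator catches what still slips through.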


## 6) Pattern C — Classification with Controlled Vocabulary
**When to use:** You want consistent, low-variance labels.

**Prompt template (bulleted):**
- Task: *classify the excerpt into categories*
- Allowed labels: `["Revenue Recognition","Controls","Findings","Policy"]`
- Output: JSON with fields `labels[]` and `rationale` (one short sentence)
- Rule: If unsure, return `labels: []` and `rationale: "Insufficient information."`


In [11]:

prompt = f'''
Classify the following text using ONLY these labels:
["Revenue Recognition","Controls","Findings","Policy"]

Output JSON with:
- labels: array of chosen labels
- rationale: one short sentence

If unsure, return labels: [] and rationale: "Insufficient information."

Text:
{sample_text}
'''
print(generateResponse(prompt))


```json
{
  "labels": ["Revenue Recognition", "Controls", "Findings", "Policy"],
  "rationale": "The text discusses revenue recognition criteria, control objectives, a prior year finding, and documentation policy."
}
```
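A controlled vocabulary is only useful if labels outside it are rejected. As a rough sketch (the helper name `check_labels` is an assumption, and the input is assumed to be the already-parsed JSON object), the reply can be screened like this:

```python
ALLOWED_LABELS = ("Revenue Recognition", "Controls", "Findings", "Policy")


def check_labels(result, allowed=ALLOWED_LABELS):
    """Split the returned labels into accepted and rejected lists."""
    labels = result.get("labels") or []
    accepted = [label for label in labels if label in allowed]
    rejected = [label for label in labels if label not in allowed]
    return accepted, rejected


parsed = {
    "labels": ["Revenue Recognition", "Controls", "Governance"],
    "rationale": "Illustrative reply containing one label outside the list.",
}
accepted, rejected = check_labels(parsed)
print("accepted:", accepted)
print("rejected:", rejected)
```

The comparison is exact and case-sensitive on purpose, so a near miss such as `"controls"` is surfaced rather than silently accepted.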


## 7) Pattern D — Rewrite with Formatting & Lexical Constraints
**When to use:** You must standardize wording across many drafts.

**Prompt template (bulleted):**
- Task: rewrite into **auditor-style neutral prose**
- Format: **two paragraphs**, each ≤3 sentences
- Lexical constraints: avoid words *“leverage, cutting-edge, synergy”*
- Keep all facts; don’t add any


In [12]:

prompt = f'''
Rewrite the text into auditor-style neutral prose.

Constraints:
- Output exactly two paragraphs; each paragraph has <= 3 sentences.
- Avoid these words: leverage, cutting-edge, synergy.
- Keep all facts; add nothing new.

Text:
{sample_text}
'''
print(generateResponse(prompt))


ACME Corp recognizes revenue in FY2024 when control transfers upon delivery, following FOB Destination terms. Variable consideration is estimated and constrained, with discounts requiring documentation on sales orders and invoices. A prior year finding identified that late shipment confirmations led to timing errors in revenue recognition.

The control over order entry aims to ensure that shipped orders are authorized and recorded accurately. This control is performed daily by the Order Management team using the ERP-X system. The objective is to maintain accurate and timely recording of revenue transactions.

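The formatting and lexical constraints in this pattern are mechanical enough to verify automatically. The sketch below is illustrative only: `check_rewrite` is a hypothetical helper, and its sentence splitter is deliberately naive (it breaks on `.`, `!`, or `?` followed by whitespace, so abbreviations such as "Corp." would be miscounted). It also cannot detect whether the model added a fact, which still needs a human read.

```python
import re

BANNED_WORDS = ("leverage", "cutting-edge", "synergy")


def check_rewrite(text, paragraph_count=2, max_sentences=3, banned=BANNED_WORDS):
    """Return a list of violated constraints; an empty list means the rewrite passed."""
    problems = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    if len(paragraphs) != paragraph_count:
        problems.append(f"expected {paragraph_count} paragraphs, found {len(paragraphs)}")
    for number, paragraph in enumerate(paragraphs, start=1):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        if len(sentences) > max_sentences:
            problems.append(f"paragraph {number} has {len(sentences)} sentences")
    lowered = text.lower()
    for word in banned:
        if re.search(r"\b" + re.escape(word) + r"\b", lowered):
            problems.append(f"banned word used: {word}")
    return problems


good = "One fact. Another fact. A third fact.\n\nA fourth fact. A fifth fact."
bad = "We leverage synergy."
print(check_rewrite(good))
print(check_rewrite(bad))
```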

## 8) Pattern E — Checklist Generation (No Chain-of-Thought)
**When to use:** You need a concise, actionable checklist.

**Prompt template (bulleted):**
- Task: produce **a numbered checklist** of procedures to validate a policy/control
- Count: exactly **5** items
- Style: each item starts with a **verb** and ≤12 words
- Rule: if an item can’t be supported by the text, write “Review documentation for missing details.”


In [13]:

prompt = f'''
Create a numbered checklist of EXACTLY 5 procedures to validate the described control/policy.
Each item:
- starts with a verb
- has <= 12 words
Only use information consistent with the text.
If an item can't be supported, write: "Review documentation for missing details."

Text:
{sample_text}
'''
print(generateResponse(prompt))


1. Verify sales orders for documented discounts and authorization daily.  
2. Confirm shipment dates match FOB Destination delivery terms.  
3. Review revenue recognition timing against shipment confirmations.  
4. Check variable consideration estimates for proper constraint application.  
5. Review documentation for missing details.


## 9) Pattern F — Insufficient Information Handling
**When to use:** You want predictable behavior when data is missing.

**Prompt template (bulleted):**
- Task: extract `delivery_term` and `payment_term`
- Output: strict JSON with both fields and `open_items[]`
- Rule: if a field is unknown, set to `null` and add a question to `open_items`


In [14]:

prompt = f'''
Extract terms into STRICT JSON:
- delivery_term: string or null
- payment_term: string or null
- open_items: array of questions for the client

Rules:
- If a value is unknown, set it to null and add a clarifying question to open_items.
- Do not fabricate facts.

Text:
{sample_text}
'''
print(generateResponse(prompt))


```json
{
  "delivery_term": "FOB Destination",
  "payment_term": null,
  "open_items": [
    "What are the payment terms for ACME Corp?"
  ]
}
```
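Once the reply is parsed, the rule "null value implies a clarifying question" can be checked directly. This is a minimal sketch assuming the reply has already been converted to a Python dict; `review_terms` is a hypothetical helper, and it only compares counts, not whether each question actually addresses the right field.

```python
def review_terms(result, fields=("delivery_term", "payment_term")):
    """Summarize which fields are unknown and whether questions were raised for them."""
    missing = [name for name in fields if result.get(name) is None]
    questions = result.get("open_items") or []
    return {
        "missing": missing,
        "questions": questions,
        # Every unknown field should have produced at least one question.
        "consistent": len(questions) >= len(missing),
    }


parsed = {
    "delivery_term": "FOB Destination",
    "payment_term": None,  # JSON null becomes None after json.loads
    "open_items": ["What are the payment terms for ACME Corp?"],
}
review = review_terms(parsed)
print(review)
```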


## 10) Pattern G — Iterative Refinement
**When to use:** You want to improve style/format without changing facts.

**Step 1:** Generate a draft.  
**Step 2:** Feed it back with controlled edits (shorten, change audience, add headings).


In [16]:

draft = generateResponse(f'''
Act as a senior associate. Write a short summary (<=120 words) for IT readers.
Neutral tone. Keep facts; do not add new facts.

Text:
{sample_text}
''')
print("DRAFT:\n", draft)

refined = generateResponse(f'''
Tighten the following summary for executive readers (<=80 words).
Add a bold heading 'Summary' at the top. Keep facts unchanged.

Content:
{draft}
''')
print("\nREFINED:\n", refined)


DRAFT:
 ACME Corp recognizes revenue upon delivery under FOB Destination terms for FY2024. Variable consideration is estimated and constrained, with discounts requiring documentation on sales orders and invoices. A prior year issue involved late shipment confirmations, leading to revenue timing errors. The Order Entry control aims to ensure shipped orders are authorized and accurately recorded. This control operates daily, is managed by Order Management, and utilizes the ERP-X system.

REFINED:
 **Summary**  
ACME Corp recognizes revenue upon delivery under FOB Destination for FY2024, estimating and constraining variable consideration. Discounts require documented sales orders and invoices. A prior year issue with late shipment confirmations caused revenue timing errors. The daily Order Entry control, managed by Order Management via ERP-X, ensures shipped orders are authorized and accurately recorded.

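The two-step flow generalizes to a small loop that keeps refining until a length target is met. The sketch below is one possible shape, not the lab's method: `refine_to_length` is a hypothetical helper that accepts any function with the same "prompt in, text out" contract as `generateResponse`, and a stand-in function is used here so the example runs without an API call.

```python
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def refine_to_length(generate, draft, max_words, max_rounds=3):
    """Ask for a tighter version until the word limit is met or rounds run out."""
    current = draft
    for _ in range(max_rounds):
        if word_count(current) <= max_words:
            break
        current = generate(
            f"Tighten the following summary to <={max_words} words. "
            f"Keep facts unchanged.\n\nContent:\n{current}"
        )
    return current


def fake_generate(prompt):
    """Stand-in for generateResponse: returns the first 4 words of the content."""
    content = prompt.split("Content:\n", 1)[1]
    return " ".join(content.split()[:4])


shortened = refine_to_length(fake_generate, "a b c d e f g h i j", max_words=4)
print(shortened, "|", word_count(shortened), "words")
```

In the notebook, the real call would be `refine_to_length(generateResponse, draft, max_words=80)`. The `max_rounds` cap guards against looping indefinitely if the model never meets the limit.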

## 11) Domain Prompt Menu (Copy/Paste)

**Assurance — Planning Memo Skeleton**
- *Task:* Draft a planning memo section in bullet points
- *Constraints:* ≤7 bullets; each bullet ≤18 words; neutral, evidence-aware wording; include an **Open Items** subsection with 2 questions.
- *Prompt:*  
```
Act as a senior associate supporting audit planning.
Write a concise planning memo section in bullets based on the text.
Include sections: Background, Risks, Planned Procedures, Open Items.
Max 7 bullets overall; each bullet <=18 words. Neutral tone. No new facts.

Text:
'''{your_text_here}'''
```

**Tax — Policy/Eligibility Screen (Text-Only)**
- *Task:* Ask 5 yes/no questions to gauge credit eligibility; list missing documents.
- *Prompt:*  
```
Ask 5 yes/no questions to screen eligibility. Then list missing documents (3 items).
Keep domain-neutral language. No conclusions.

Text:
'''{your_text_here}'''
```

**Risk & IT Compliance — Control Normalization**
- *Task:* Normalize a free-text control into fields (Objective, Frequency, Owner, System, Evidence).
- *Prompt:*  
```
Normalize the control into JSON fields: objective, frequency, owner, system, evidence, missing_fields[].
If unknown, set null and add to missing_fields. No extra fields.

Text:
'''{your_text_here}'''
```
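To use any of the menu prompts from code, the `{your_text_here}` placeholder needs to be filled in. A minimal sketch follows; `fill_template` is a hypothetical helper, and it uses `str.replace` rather than `str.format` so that braces inside the source text (for example pasted JSON) do not raise an error.

```python
CONTROL_TEMPLATE = """Normalize the control into JSON fields: objective, frequency, owner, system, evidence, missing_fields[].
If unknown, set null and add to missing_fields. No extra fields.

Text:
'''{your_text_here}'''"""


def fill_template(template, text, placeholder="{your_text_here}"):
    """Insert the source text into a menu prompt at the placeholder."""
    if placeholder not in template:
        raise ValueError(f"template has no {placeholder} placeholder")
    return template.replace(placeholder, text.strip())


menu_prompt = fill_template(
    CONTROL_TEMPLATE,
    "Control (Order Entry): Frequency: Daily; Owner: Order Management.",
)
print(menu_prompt)
```

The filled string can then be passed to `generateResponse(menu_prompt)` like any other prompt in this notebook.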


## 12) Tips & Pitfalls (Prompts-Only)

- **Specify outputs**: count, length, section names, JSON schema.
- **Disallow invention**: “Do not add facts”; “If unknown, set null and list open items.”
- **Keep neutrality**: especially in assurance/risk narratives.
- **Use audience labels**: IT, executive, practitioner—this changes tone and detail level.
- **Iterate intentionally**: refine for length, tone, or format without changing content.
