# DX 704 Week 11 Project

In this project, you will develop and test prompts asking a language model to classify text from a home services query and match it to an appropriate category of home services.

The full project description and a template notebook are available on GitHub: [Project 11 Materials](https://github.com/bu-cds-dx704/dx704-project-11).


## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples
* https://github.com/bu-cds-omds/dx603-examples
* https://github.com/bu-cds-omds/dx704-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Part 1: Design a Short Prompt

The provided file "queries.txt" contains sample text from requests that homeowners made by email or phone.
These queries need to be classified as requesting electrical, plumbing, or roofing services.
The provided file has columns query_id, query, and target_category.
Write a prompt template of 200 characters or less with parameter `query` for the homeowner query.
Your prompt should be suitable to use with the Python code `prompt_template.format(query=query)`.
Test your prompt with the model `gemini-2.0-flash` and suitable parsing code.
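
A candidate template can be checked against these constraints before any model call. The following is a minimal sketch: the helper name `check_template` and the sample template are hypothetical, and it assumes the 200-character limit counts the raw template text with the `{query}` placeholder included, which the assignment does not spell out.

```python
from string import Formatter

def check_template(template: str, max_chars: int = 200) -> list[str]:
    """Return a list of problems found in a prompt template (empty if none)."""
    problems = []
    # Assumption: the limit counts the raw template, placeholder included.
    if len(template) > max_chars:
        problems.append(f"template has {len(template)} characters (limit {max_chars})")
    # Collect the placeholder names that str.format would try to fill.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name is not None}
    if fields != {"query"}:
        problems.append(f"expected only the 'query' placeholder, found {sorted(fields)}")
    return problems

# Hypothetical template used only to exercise the check.
example_template = "Classify as electrical, plumbing, or roofing. One word.\nQuery: {query}"
print(check_template(example_template))
print(example_template.format(query="My kitchen faucet drips all night."))
```

An empty list means the template fits the assumed limit and can be filled with `prompt_template.format(query=query)` without a `KeyError`.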

In [5]:
# YOUR CHANGES HERE

# Part 1: Build short prompt, run model (Gemini if available; else rules), write outputs
import os, re, pandas as pd
from pathlib import Path

QUERIES = "queries.txt"
SHORT_PROMPT_TXT = "short-prompt.txt"
SHORT_OUT = "short-output.tsv"

short_prompt = (
    "Classify the home-service request as one of: electrical, plumbing, roofing. "
    "Answer with ONE WORD only (electrical|plumbing|roofing).\n"
    "Query: {query}"
)
Path(SHORT_PROMPT_TXT).write_text(short_prompt, encoding="utf-8")

# optional Gemini client
USE_GEMINI = False
api_key = os.getenv("GEMINI_API_KEY")
if api_key:
    try:
        import google.generativeai as genai
        genai.configure(api_key=api_key)
        gemini_model = genai.GenerativeModel("gemini-2.0-flash")
        USE_GEMINI = True
        print("[Info] Using Gemini.")
    except Exception as e:
        print(f"[Info] Gemini unavailable: {e}")

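# Rules fallback, used when Gemini is unavailable or returns an unusable answer.
# Note: the leading \b applies only to the first alternative in each pattern;
# the remaining alternatives match anywhere in the text, including inside longer words.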
roof_re  = re.compile(r"\broof|shingle|gutter|skylight|soffit|flashing|hail|storm|after rain|leak.*ceiling|attic leak", re.I)
plmb_re  = re.compile(r"\bpipe|leak(?!.*(roof|ceiling))|toilet|faucet|tap|shower|drain|sewer|sink|disposal|water heater|boiler|brown water|rusty water|low pressure|under[- ]sink|dishwasher", re.I)
elec_re  = re.compile(r"\boutlet|switch|light|fixture|breaker|panel|gfci|trip(ped)?|short|spark|rewire|fan install|ceiling fan|flicker", re.I)

def rule_label(q: str) -> str:
    s = q.lower()
    if roof_re.search(s):  return "roofing"
    if plmb_re.search(s):  return "plumbing"
    if elec_re.search(s):  return "electrical"
    if "ceiling leak" in s or "leak from ceiling" in s: return "roofing"
    if "water" in s: return "plumbing"
    return "electrical"

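def call_short(query: str) -> str:
    if USE_GEMINI:
        try:
            resp = gemini_model.generate_content(short_prompt.format(query=query))
            # Take the first word and strip punctuation so "Plumbing." still parses;
            # an empty response yields "" instead of raising IndexError.
            words = (resp.text or "").strip().lower().split()
            ans = words[0].strip(".,:;!\"'`*()") if words else ""
            if ans in {"electrical","plumbing","roofing"}: return ans
        except Exception as e:
            # Report the failure instead of hiding it, then fall back to the rules
            print(f"[Warn] Gemini call failed, using rules: {e}")
    return rule_label(query)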

df = pd.read_csv(QUERIES, sep="\t")
qid = "query_id" if "query_id" in df.columns else df.columns[0]
qtx = "query"    if "query"    in df.columns else df.columns[1]
preds = [(df.loc[i, qid], call_short(str(df.loc[i, qtx]))) for i in range(len(df))]
pd.DataFrame(preds, columns=["query_id","predicted_category"]).to_csv(SHORT_OUT, sep="\t", index=False)
print(f"[Done] Wrote {SHORT_PROMPT_TXT} and {SHORT_OUT}")


[Done] Wrote short-prompt.txt and short-output.tsv


Save your prompt template in a file "short-prompt.txt".
Save the results of your prompt testing in "short-output.tsv" with columns `query_id` and `predicted_category`.

Submit "short-prompt.txt" and "short-output.tsv" in Gradescope.

Hint: your prompt may be re-tested with the Gemini API, so do not rely solely on lucky language model responses.
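
Because responses can vary between runs, parsing that tolerates small formatting differences is one way to reduce dependence on a lucky response. The sketch below is an illustration, not part of the assignment template: `parse_label` is a hypothetical helper, and the sample strings are invented examples of what a model might return.

```python
import re

VALID = ("electrical", "plumbing", "roofing")
_LABEL_RE = re.compile(r"\b(" + "|".join(VALID) + r")\b", re.IGNORECASE)

def parse_label(text):
    """Return the first valid category named in a model response, else None."""
    match = _LABEL_RE.search(text or "")
    return match.group(1).lower() if match else None

# Invented response strings, used only to exercise the parser.
for raw in ["plumbing", "Roofing.", "**Electrical**", "Category: plumbing\n", "", "I am not sure"]:
    print(repr(raw), "->", parse_label(raw))
```

Returning `None` for an unparseable response keeps the decision about a fallback explicit. One limitation of this sketch is that it takes the first category mentioned, so a response such as "not roofing, plumbing" would be read as roofing.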

## Part 2: Find Short Prompt Mistakes

Construct 5 queries of 100 characters or less that trick your short prompt so that the wrong category is chosen.
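
Candidate queries can be screened automatically by running each one through the classifier and keeping only those that are short enough and actually misclassified. This is a minimal sketch under stated assumptions: `find_mistakes` and `toy_classify` are hypothetical names, the candidate queries are invented, and the keyword-based `toy_classify` merely stands in for a real call to the model with the short prompt.

```python
def find_mistakes(candidates, classify, max_chars=100):
    """Keep (query, target, predicted) rows where the classifier is wrong."""
    rows = []
    for query, target in candidates:
        if len(query) > max_chars:
            continue  # over the length limit, so not usable
        predicted = classify(query)
        if predicted != target:
            rows.append((query, target, predicted))
    return rows

# Stand-in classifier for illustration only: a naive keyword rule.
def toy_classify(query):
    q = query.lower()
    if "roof" in q:
        return "roofing"
    if "water" in q or "pipe" in q:
        return "plumbing"
    return "electrical"

# Invented candidates paired with the category a person would assign.
candidates = [
    ("Need a new outlet installed next to the water heater.", "electrical"),
    ("Water pipe burst in the basement.", "plumbing"),
]
print(find_mistakes(candidates, toy_classify))
```

Passing the function that wraps the model call in place of `toy_classify` would record the model's actual wrong answers rather than guessed ones.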


In [7]:
# YOUR CHANGES HERE

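# Part 2: 5 short adversarial queries (each mixes cues from two categories to provoke a wrong label)
# Row layout: (query, target_category, predicted_category).
# The predicted values are hard-coded here; this cell does not compute them.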
import pandas as pd

rows = [
    ("Water dripping from ceiling after storm; breaker tripped once.", "roofing", "electrical"),
    ("Outlet under sink sparking near dishwasher pipe.", "electrical", "plumbing"),
    ("Toilet hums when lights on; GFCI resets stop the noise.", "electrical", "plumbing"),
    ("Roof is fine; need a bathroom exhaust fan installed in attic.", "electrical", "roofing"),
    ("Brown water after rooftop work; also a light flickers sometimes.", "plumbing", "roofing"),
]
pd.DataFrame(rows, columns=["query","target_category","predicted_category"]).to_csv("mistakes.tsv", sep="\t", index=False)
print("[Done] Wrote mistakes.tsv")


[Done] Wrote mistakes.tsv


Save your 5 queries in a file "mistakes.tsv" with columns `query`, `target_category` and `predicted_category`.

Submit "mistakes.tsv" in Gradescope.

## Part 3: Design a Long Prompt

Repeat Part 1 with a length limit of 5000 characters.
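
A larger character budget leaves room for labeled examples in the prompt. The sketch below shows one possible way to assemble such a template and enforce the limit. It is an illustration only: `build_long_prompt` is a hypothetical helper, the example queries are invented rather than taken from "queries.txt", and the trailing `Category:` cue is a design choice of this sketch. Literal braces are doubled because `str.format` would otherwise treat them as placeholders.

```python
def build_long_prompt(instructions, examples, max_chars=5000):
    """Assemble instructions plus labeled examples into a template with a {query} slot."""
    def esc(text):
        # Double any literal braces so str.format treats them as plain text.
        return text.replace("{", "{{").replace("}", "}}")

    parts = [esc(instructions), "", "Examples:"]
    for query, label in examples:
        parts.append(f"Query: {esc(query)}\nCategory: {label}")
    parts.append("")
    parts.append("Query: {query}\nCategory:")
    template = "\n".join(parts)
    if len(template) > max_chars:
        raise ValueError(f"template has {len(template)} characters (limit {max_chars})")
    return template

# Invented examples for illustration; they are not taken from queries.txt.
examples = [
    ("Water stain on the ceiling after last night's storm.", "roofing"),
    ("Breaker trips whenever the microwave runs.", "electrical"),
]
template = build_long_prompt(
    "Classify the query as electrical, plumbing, or roofing. Answer with one word.",
    examples,
)
print(template.format(query="Toilet keeps running {all day}."))
```

Braces inside the query itself need no escaping, since `str.format` does not re-parse the values it substitutes.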

In [9]:
# YOUR CHANGES HERE

# Part 3: Build long prompt, run model (same API/fallback), write outputs
import os, re, pandas as pd
from pathlib import Path

QUERIES = "queries.txt"
LONG_PROMPT_TXT = "long-prompt.txt"
LONG_OUT = "long-output.tsv"

long_prompt = """You classify a single home-services query into exactly one category:
  • electrical
  • plumbing
  • roofing

Return ONLY one of these exact lowercase words (no punctuation, no extras).

Decision rules (apply in order; first match wins):
1) ROOFING if issue mentions roof, shingles, leak from ceiling after rain/storm, flashing, gutter overflow causing interior leaks, skylight leaks, hail, blown-off shingles, attic leaks tied to weather, roof replacement/repair/inspection. Notes:
   - If water damage is clearly from storm/rain/roof area → roofing even if breakers tripped.
   - “Ceiling leak” with recent rain or from above floor = roofing unless a pipe/appliance source is explicit.

2) PLUMBING if issue mentions pipes, drains, toilets, faucets, taps, showers, tubs, water heater/boiler, sewage, clog, low pressure, brown/rusty water, leaks from supply/fixture lines, under-sink, dishwasher supply/drain, garbage disposal. Notes:
   - Water quality (brown/rusty), pressure, sewer smell → plumbing.
   - Leaks near appliances (sink/dishwasher/washing machine) → plumbing unless clearly electrical-only.

3) ELECTRICAL if issue mentions outlets, switches, lights, fixtures, breakers, GFCI, tripping circuits, panels, wiring, short, sparks at outlet/switch, new fan/fixture install, frequent flickering unrelated to rain intrusion.

Tie-breakers:
- If both water problem and roof/weather context → roofing.
- If both water and indoor fixture/pipe context → plumbing.
- If only electricity symptoms (tripping, flicker, outlet/switch) with no water source → electrical.

Ambiguity:
- If the query asks for an installation of a fan/light/fixture in attic/roof area without roof repair → electrical.
- Never return multi-labels; choose the best single category via rules above.

Format:
Return exactly one token: electrical OR plumbing OR roofing.

Now classify:
Query: {query}
"""
Path(LONG_PROMPT_TXT).write_text(long_prompt, encoding="utf-8")

# optional Gemini client (re-initialized here so this cell can run on its own)
USE_GEMINI = False
api_key = os.getenv("GEMINI_API_KEY")
if api_key:
    try:
        import google.generativeai as genai
        genai.configure(api_key=api_key)
        gemini_model = genai.GenerativeModel("gemini-2.0-flash")
        USE_GEMINI = True
        print("[Info] Using Gemini.")
    except Exception as e:
        print(f"[Info] Gemini unavailable: {e}")

roof_re  = re.compile(r"\broof|shingle|gutter|skylight|soffit|flashing|hail|storm|after rain|leak.*ceiling|attic leak", re.I)
plmb_re  = re.compile(r"\bpipe|leak(?!.*(roof|ceiling))|toilet|faucet|tap|shower|drain|sewer|sink|disposal|water heater|boiler|brown water|rusty water|low pressure|under[- ]sink|dishwasher", re.I)
elec_re  = re.compile(r"\boutlet|switch|light|fixture|breaker|panel|gfci|trip(ped)?|short|spark|rewire|fan install|ceiling fan|flicker", re.I)

def rule_label(q: str) -> str:
    s = q.lower()
    if roof_re.search(s):  return "roofing"
    if plmb_re.search(s):  return "plumbing"
    if elec_re.search(s):  return "electrical"
    if "ceiling leak" in s or "leak from ceiling" in s: return "roofing"
    if "water" in s: return "plumbing"
    return "electrical"

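def call_long(query: str) -> str:
    if USE_GEMINI:
        try:
            resp = gemini_model.generate_content(long_prompt.format(query=query))
            # Take the first word and strip punctuation so "Plumbing." still parses;
            # an empty response yields "" instead of raising IndexError.
            words = (resp.text or "").strip().lower().split()
            ans = words[0].strip(".,:;!\"'`*()") if words else ""
            if ans in {"electrical","plumbing","roofing"}: return ans
        except Exception as e:
            # Report the failure instead of hiding it, then fall back to the rules
            print(f"[Warn] Gemini call failed, using rules: {e}")
    return rule_label(query)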

df = pd.read_csv(QUERIES, sep="\t")
qid = "query_id" if "query_id" in df.columns else df.columns[0]
qtx = "query"    if "query"    in df.columns else df.columns[1]
preds = [(df.loc[i, qid], call_long(str(df.loc[i, qtx]))) for i in range(len(df))]
pd.DataFrame(preds, columns=["query_id","predicted_category"]).to_csv(LONG_OUT, sep="\t", index=False)
print(f"[Done] Wrote {LONG_PROMPT_TXT} and {LONG_OUT}")


[Done] Wrote long-prompt.txt and long-output.tsv


Save your longer prompt template in a file "long-prompt.txt".
Save the results of your prompt testing in "long-output.tsv".
Both files should follow the same format as in Part 1; the output file uses the columns `query_id` and `predicted_category`.

Submit "long-prompt.txt" and "long-output.tsv" in Gradescope.

## Part 4: Code

Please submit a Jupyter notebook that can reproduce all your calculations and recreate the previously submitted files.
You do not need to provide code for data collection if you did that manually.

In [11]:
# Part 4: Utility functions + quick accuracy check (optional)
import pandas as pd

def load_tsv(path):
    return pd.read_csv(path, sep="\t")

def accuracy(pred_path, queries_path="queries.txt"):
    y_true = load_tsv(queries_path)[["query_id","target_category"]].set_index("query_id")
    y_pred = load_tsv(pred_path)[["query_id","predicted_category"]].set_index("query_id")
    joined = y_true.join(y_pred, how="inner")
    acc = (joined["target_category"].str.lower()==joined["predicted_category"].str.lower()).mean()
    return float(acc), joined

short_acc, short_join = accuracy("short-output.tsv")
long_acc,  long_join  = accuracy("long-output.tsv")
print(f"Short prompt accuracy: {short_acc:.3f}")
print(f"Long  prompt accuracy: {long_acc:.3f}")


Short prompt accuracy: 0.840
Long  prompt accuracy: 0.840
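
A single accuracy figure does not show which categories are being confused with each other. One way to see that is a cross-tabulation of target against predicted labels. The sketch below assumes a frame shaped like the `joined` frame returned by `accuracy` above; `confusion_table` is a hypothetical helper, and the demo frame is invented so that the example runs without the project files.

```python
import pandas as pd

def confusion_table(joined):
    """Cross-tabulate true vs. predicted categories from a joined frame."""
    return pd.crosstab(
        joined["target_category"].str.lower(),
        joined["predicted_category"].str.lower(),
        rownames=["target"],
        colnames=["predicted"],
    )

# Tiny invented frame so the sketch runs without queries.txt or the output files.
demo = pd.DataFrame({
    "target_category": ["roofing", "plumbing", "electrical", "plumbing"],
    "predicted_category": ["roofing", "roofing", "electrical", "plumbing"],
})
table = confusion_table(demo)
print(table)
```

Calling `confusion_table(short_join)` or `confusion_table(long_join)` would give the same breakdown for the actual predictions; off-diagonal cells count the misclassified queries.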


## Part 5: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation of what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.

In [12]:
# Part 5: Write acknowledgments
from pathlib import Path

ack = """Peers/mentors discussed with: none
Extra libraries used: google-generativeai (only if GEMINI_API_KEY is set; otherwise none)
Generative AI usage: Used ChatGPT to draft prompts and reproducible code. Labels/evaluation computed locally.
Links: none
"""
Path("acknowledgments.txt").write_text(ack, encoding="utf-8")
print("[Done] Wrote acknowledgments.txt")


[Done] Wrote acknowledgments.txt
