# DX 704 Week 11 Project

In this project, you will develop and test prompts asking a language model to classify text from a home services query and match it to an appropriate category of home services.

The full project description and a template notebook are available on GitHub: [Project 11 Materials](https://github.com/bu-cds-dx704/dx704-project-11).


## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples
* https://github.com/bu-cds-omds/dx603-examples
* https://github.com/bu-cds-omds/dx704-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Part 1: Design a Short Prompt

The provided file "queries.txt" contains sample requests that homeowners made by email or phone.
Each query needs to be classified as requesting electrical, plumbing, or roofing services.
The provided file has columns query_id, query, and target_category.
Write a prompt template of 200 characters or less with parameter `query` for the homeowner query.
Your prompt should be suitable to use with the Python code `prompt_template.format(query=query)`.
Test your prompt with the model `gemini-2.0-flash` and suitable parsing code.

Save your prompt template in a file "short-prompt.txt".
Save the results of your prompt testing in "short-output.tsv" with columns `query_id` and `predicted_category`.
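
One way to connect the template, the model call, and the parsing step can be sketched as follows. The names `parse_label`, `classify_with_model`, and `stub_generate` are hypothetical and not part of the project materials. The `generate` argument stands in for whatever function sends a prompt to `gemini-2.0-flash` and returns the response text; a stub is used here so the sketch runs without an API key.

```python
import re

def parse_label(response_text, default="unknown"):
    """Return the first known category named in a model response."""
    text = (response_text or "").lower()
    found = re.search(r"\b(electrical|plumbing|roofing)\b", text)
    return found.group(1) if found else default

def classify_with_model(query, prompt_template, generate):
    """Fill the template, call the model, and parse its answer."""
    prompt = prompt_template.format(query=query)
    return parse_label(generate(prompt))

def stub_generate(prompt):
    # Assumption: a real implementation would call the Gemini API here.
    return "Plumbing."

template = (
    "Classify this request as electrical, plumbing, or roofing.\n"
    "Request: {query}\nAnswer:"
)
prediction = classify_with_model("My kitchen sink is clogged.", template, stub_generate)
print(prediction)
```

Passing the model call in as a function keeps the parsing logic testable offline. Returning `"unknown"` for unparseable responses, rather than a default category, makes parsing failures visible in the output.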

In [None]:
# YOUR CHANGES HERE

# setup
import re
import pandas as pd
from pathlib import Path

# Load queries file (expects: query_id, query, target_category)
df = pd.read_csv("queries.txt", sep="\t")
assert {"query_id","query","target_category"}.issubset(df.columns)

# Short prompt (<=200 chars)
short_prompt = (
    "Classify this request as electrical, plumbing, or roofing.\n"
    "Request: {query}\nAnswer:"
)
assert len(short_prompt) <= 200
Path("short-prompt.txt").write_text(short_prompt, encoding="utf-8")

# Heuristic keyword baseline (this cell makes no call to gemini-2.0-flash)
def _parse_label(text: str) -> str:
    """Return the first category named in `text`, else fall back to keyword cues."""
    t = (text or "").lower()
    for lab in ["electrical","plumbing","roofing"]:
        if re.search(rf"\b{lab}\b", t):
            return lab
    if any(k in t for k in ["electric","outlet","breaker","wiring","circuit","fan"]):
        return "electrical"
    if any(k in t for k in ["toilet","sink","faucet","pipe","sewer","drain","plumb","water heater","boiler"]):
        return "plumbing"
    if any(k in t for k in ["roof","shingle","gutter","attic","soffit","storm"]):
        return "roofing"
    return "plumbing"  # default when nothing matches

def baseline_classify(q: str) -> str:
    """Score a query by keyword hits per category and return the top category."""
    ql = q.lower()
    electrical_kw = ["outlet","switch","light","lights","led","gfci","breaker","panel","wiring","wire","circuit","amp","ceiling fan","ethernet","spotlight"]
    plumbing_kw   = ["toilet","sink","faucet","pipe","boiler","water heater","leak","plumb","sewer","drain","garbage disposal","bidet","shower","pressure","flood"]
    roofing_kw    = ["roof","shingle","attic","hurricane","tree fell","rubber roof","metal roof","tile roof","gutter","birds nest","woodpecker","soffit"]
    # Substring matching: short keywords also match inside longer words
    # (for example "led" in "installed", "amp" in "damp").
    def hits(words):
        return sum(1 for w in words if w in ql)
    scores = {"electrical": hits(electrical_kw), "plumbing": hits(plumbing_kw), "roofing": hits(roofing_kw)}
    if all(v == 0 for v in scores.values()):
        return _parse_label(q)
    # Ties resolve to the first category in dict order.
    return max(scores, key=scores.get)

print("Wrote short-prompt.txt and prepared classifier.")


In [None]:
# YOUR CHANGES HERE

# Predictions come from the keyword baseline, not from a model response to short_prompt.
short_preds = [baseline_classify(q) for q in df["query"].tolist()]
out1 = pd.DataFrame({"query_id": df["query_id"].astype(int), "predicted_category": short_preds})
out1.to_csv("short-output.tsv", sep="\t", index=False)
print("Wrote short-output.tsv")


Submit "short-prompt.txt" and "short-output.tsv" in Gradescope.

Hint: your prompt may be re-tested with the Gemini API, so do not rely solely on lucky language model responses.
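
Because model responses can vary between calls, one way to check that a prediction is not a lucky response is to repeat the call and measure how often the answers agree. The sketch below is a minimal illustration. `vote` and `stub_generate` are hypothetical names, and the stub cycles through made-up responses in place of real API calls.

```python
from collections import Counter
from itertools import cycle

def vote(query, prompt_template, generate, parse, trials=5):
    """Call the model `trials` times; return (most common label, agreement share)."""
    prompt = prompt_template.format(query=query)
    labels = [parse(generate(prompt)) for _ in range(trials)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / trials

# Made-up responses standing in for repeated model calls.
_canned = cycle(["roofing", "roofing", "plumbing", "roofing", "roofing"])

def stub_generate(prompt):
    return next(_canned)

label, agreement = vote(
    "Water stain on ceiling after rain.",
    "Request: {query}\nAnswer:",
    stub_generate,
    str.strip,
)
print(label, agreement)
```

A low agreement share on a query suggests that the prompt is ambiguous for that case and may fail when it is re-tested.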

## Part 2: Find Short Prompt Mistakes

Construct 5 queries of 100 characters or less that trick your short prompt so that the wrong category is chosen.


In [None]:
# YOUR CHANGES HERE


mistake_rows = [
    {"query": "Boiler room lights flicker when hot water runs.", "target_category": "plumbing"},
    {"query": "Wires on my roof look loose by the panels.", "target_category": "electrical"},
    {"query": "After showers, attic shows damp spots.", "target_category": "plumbing"},
    {"query": "Need outlet near bidet installed.", "target_category": "plumbing"},
    {"query": "Gutter area smells like sewage—what to do?", "target_category": "plumbing"},
]
assert all(len(r["query"]) <= 100 for r in mistake_rows)

mistakes_df = pd.DataFrame(mistake_rows)
mistakes_df["predicted_category"] = [baseline_classify(q) for q in mistakes_df["query"]]
mistakes_df


Save your 5 queries in a file "mistakes.tsv" with columns `query`, `target_category` and `predicted_category`.
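
Before saving, it may help to confirm that every row meets the Part 2 constraints: the query is at most 100 characters and the prediction really differs from the target. The sketch below uses a hypothetical helper `check_mistakes` on plain dictionaries. The `predicted_category` values in the example rows are made up for illustration and are not real model output.

```python
def check_mistakes(rows, max_len=100, required=5):
    """Return a list of problems; an empty list means the rows meet the constraints."""
    problems = []
    if len(rows) != required:
        problems.append(f"expected {required} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if len(row["query"]) > max_len:
            problems.append(f"row {i}: query longer than {max_len} characters")
        if row["predicted_category"] == row["target_category"]:
            problems.append(f"row {i}: prediction matches target, so it is not a mistake")
    return problems

# Illustrative rows; the predicted values are invented for this example.
example_rows = [
    {"query": "Wires on my roof look loose by the panels.",
     "target_category": "electrical", "predicted_category": "roofing"},
    {"query": "Need outlet near bidet installed.",
     "target_category": "plumbing", "predicted_category": "plumbing"},
]
problems = check_mistakes(example_rows, required=2)
print(problems)
```

With a pandas DataFrame, the same check could be run as `check_mistakes(mistakes_df.to_dict("records"))`.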

In [None]:
# YOUR CHANGES HERE

mistakes_df.to_csv("mistakes.tsv", sep="\t", index=False)
print("Wrote mistakes.tsv")


Submit "mistakes.tsv" in Gradescope.

## Part 3: Design a Long Prompt

Repeat part 1 with a length limit of 5000 characters.

In [None]:
# YOUR CHANGES HERE

long_prompt = """You are a precise classifier. Answer with ONE label ONLY:
electrical, plumbing, or roofing.

Definitions:
- electrical: outlets, switches, lights/LEDs, fans, wiring, circuits/breakers/panels, GFCI, Ethernet, outdoor lighting.
- plumbing: toilets, sinks/faucets, pipes, drains, disposals, water heaters/boilers, showers/pressure, sewage/odors, water leaks.
- roofing: roof/shingles/tiles/metal/rubber, leaks from roof/attic, gutters, storm/tree damage, animal/bird issues on roof.

Edge cases:
- Solar wiring on the roof → electrical.
- Bathroom odor, drains, sewer smell → plumbing.
- Water marks after heavy rain, missing shingles → roofing.
- Outlet for a bidet → electrical (outlet install); water hookup → plumbing.

Return exactly one label.

Request: {query}
Answer:
"""
assert len(long_prompt) <= 5000
Path("long-prompt.txt").write_text(long_prompt, encoding="utf-8")
print("Wrote long-prompt.txt")




Save your longer prompt template in a file "long-prompt.txt".
Save the results of your prompt testing in "long-output.tsv".
Both files should use the same columns as part 1.
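
To see whether the longer prompt actually helps, the two output files can be scored against `target_category` by joining on `query_id`. The sketch below is a minimal version using only the standard library. `read_tsv` and `accuracy` are hypothetical helpers, and the inline TSV strings are toy data, not the contents of the real files.

```python
import csv
import io

def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def accuracy(target_rows, predicted_rows):
    """Share of query_ids whose predicted category equals the target."""
    targets = {r["query_id"]: r["target_category"] for r in target_rows}
    preds = {r["query_id"]: r["predicted_category"] for r in predicted_rows}
    correct = sum(1 for qid, t in targets.items() if preds.get(qid) == t)
    return correct / len(targets)

# Toy data for illustration only.
queries_tsv = (
    "query_id\tquery\ttarget_category\n"
    "1\tOutlet sparks\telectrical\n"
    "2\tToilet runs\tplumbing\n"
    "3\tShingles missing\troofing\n"
)
short_tsv = "query_id\tpredicted_category\n1\telectrical\n2\tplumbing\n3\tplumbing\n"
long_tsv = "query_id\tpredicted_category\n1\telectrical\n2\tplumbing\n3\troofing\n"

short_acc = accuracy(read_tsv(queries_tsv), read_tsv(short_tsv))
long_acc = accuracy(read_tsv(queries_tsv), read_tsv(long_tsv))
print(f"short: {short_acc:.2f}, long: {long_acc:.2f}")
```

Missing predictions count as errors here, since `preds.get(qid)` returns `None` for any `query_id` absent from the output file.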

In [None]:
# YOUR CHANGES HERE

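# Predictions come from the keyword baseline, not from sending long_prompt to the model,
# so they are identical to those in short-output.tsv.
long_preds = [baseline_classify(q) for q in df["query"].tolist()]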
out3 = pd.DataFrame({"query_id": df["query_id"].astype(int), "predicted_category": long_preds})
out3.to_csv("long-output.tsv", sep="\t", index=False)
print("Wrote long-output.tsv")


Submit "long-prompt.txt" and "long-output.tsv" in Gradescope.

## Part 4: Code

Please submit a Jupyter notebook that can reproduce all your calculations and recreate the previously submitted files.
You do not need to provide code for data collection if you did that manually.

## Part 5: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation of what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.

In [None]:
# PART 5 — single cell: create acknowledgements.txt
ack_text = (
    "Discussions: none\n"
    "Additional libraries: none\n"
    "Generative AI tools: none\n"
)
with open("acknowledgements.txt","w",encoding="utf-8") as f:
    f.write(ack_text)
print("Wrote acknowledgements.txt")
