# CAH 30503 — Week 6: Real Users and Domain Fit

**Theme**: From "it's deployed" to "I know what real users think and where accountability lives."

---

This is the most examination-heavy week of the course. No new code, no new deployment. Instead, three lenses:

1. **User testing** — What do real users actually experience when they use your app?
2. **Accountability** — When your app gives wrong output, who is responsible?
3. **Domain fit** — Does this tool fit into a real workflow? What are the stakes when it's wrong?

By the end of today, you'll have a prioritized fix list that drives next week's hardening.

---

## Your Deployed App

Before we begin, confirm your app is live:

- **My app name**: 

- **My public URL**: 

- **Is it loading right now?** *(Visit the URL — free-tier Spaces sleep after inactivity. If it shows "Sleeping," wait 30 seconds.)*

If your Space isn't working, tell your instructor now. You need a live URL for today's activities.

---

## Activity 1: Design Your User Test Protocol

A user test is not "hey, try my app." It's a **structured observation**. Fill in the protocol below BEFORE testing.

### About My App

- **Name**: 

- **URL**: 

- **What it does** (one sentence, for someone who has never seen it): 

- **Who it's for** (imagine one specific person): 

### Test Tasks

Give each tester these tasks. Tasks are **goals**, not instructions.

Don't say "click the Analyze button" — say "find out what the main topic of this article is."

1. **Easy task** *(the app should handle this perfectly)*:


2. **Realistic task** *(a genuine use case — what the app is actually for)*:


3. **Edge case task** *(something that might break or confuse — how does the app handle it?)*:



### What I'm Watching For

- Can they figure out what the app does without being told?
- Do they use the examples or try their own input?
- Where do they pause or look confused?
- Do they understand the output?
- Do they encounter any errors?
- *(add your own observation criteria)*: 

### My Testers

- **Tester 1**: *(who? how close are they to the target audience?)*

- **Tester 2** (if available): *(who?)*


### Protocol Review

Swap your protocol with a partner. Check each other's:

- [ ] Tasks are goals, not instructions ("find out X" not "click Y")
- [ ] There are at least 3 different test scenarios
- [ ] The observation criteria are specific
- [ ] The testers represent the target audience (or close to it)

**Feedback from my partner**: 


**Changes I'll make**: 



---

## Activity 2: Conduct User Tests

### Rules

Read these before you start testing:

1. **Give the tester the URL and the first task.** Nothing else.
2. **Do not explain the app.** If they ask how to use it: "Just try what makes sense to you."
3. **Do not help.** If they get stuck, write it down. Don't fix it for them.
4. **Write down everything.** Actions, quotes, pauses, confusion, success.
5. **After all tasks**: Ask "What does this app do?" and "What was confusing?" Record the answers in the tester's own words.

### Tester 1 Observations

**Who**: 

**Task 1 — Easy**:
- What they did: 
- Completed? 
- Notes: 

**Task 2 — Realistic**:
- What they did: 
- Completed? 
- Notes: 

**Task 3 — Edge case**:
- What they did: 
- Completed? 
- Notes: 

**Post-test questions**:
- "What does this app do?" (their words): 

- "What was confusing?" (their words): 


### Tester 2 Observations (if available)

**Who**: 

**Task 1 — Easy**:
- What they did: 
- Completed? 
- Notes: 

**Task 2 — Realistic**:
- What they did: 
- Completed? 
- Notes: 

**Task 3 — Edge case**:
- What they did: 
- Completed? 
- Notes: 

**Post-test questions**:
- "What does this app do?" (their words): 

- "What was confusing?" (their words): 


---

## Activity 3: Categorize Feedback

Go through all your observations and sort every finding into one of four categories.

| Category | Definition |
|----------|------------|
| **Must fix** | The app is broken or unusable without this |
| **Should fix** | Significantly improves the experience |
| **Nice to have** | Would improve but isn't essential |
| **Out of scope** | Good idea, but not for this version |

**"Out of scope" doesn't mean "bad idea."** It means "not this version." Record it — it might be the first thing you improve next week.
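If you have more than a handful of findings, sorting them in a small script can make the ranking easier to see. The sketch below is one possible way to do it, not a required step. The `Finding` class, the `testers` count, and the sample findings are all hypothetical; only the four category names come from the table above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Category names follow the table above; rank follows the table's row order.
CATEGORY_RANK = {
    "must fix": 0,
    "should fix": 1,
    "nice to have": 2,
    "out of scope": 3,
}


@dataclass
class Finding:
    note: str          # what you observed, in plain words
    category: str      # one of the CATEGORY_RANK keys
    testers: int = 1   # hypothetical field: how many testers hit this issue


def group_findings(findings):
    """Group findings by category, most-reported first within each category."""
    groups = defaultdict(list)
    for finding in findings:
        if finding.category not in CATEGORY_RANK:
            raise ValueError(f"unknown category: {finding.category!r}")
        groups[finding.category].append(finding)
    return {
        category: sorted(groups[category], key=lambda f: -f.testers)
        for category in CATEGORY_RANK
    }


def top_priority(findings):
    """Return the first finding from the highest-ranked non-empty category.

    Out-of-scope items are never returned; the result is None if nothing else exists.
    """
    grouped = group_findings(findings)
    for category in CATEGORY_RANK:
        if category != "out of scope" and grouped[category]:
            return grouped[category][0]
    return None


# Made-up sample findings, for illustration only.
sample = [
    Finding("Output label unclear", "should fix", testers=2),
    Finding("Empty input shows a raw error", "must fix", testers=1),
    Finding("Wants a dark mode", "out of scope"),
]
print(top_priority(sample).note)
```

Ranking by how many testers hit an issue is only one heuristic. A problem that a single tester hit can still be your #1 fix if it made the app unusable.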

### Must Fix
*(Things that make the app broken or unusable)*

1. 
2. 

### Should Fix
*(Things that would significantly improve the experience)*

1. 
2. 

### Nice to Have
*(Would be better, but the app works without it)*

1. 
2. 

### Out of Scope
*(Good ideas for a future version, not this one)*

1. 
2. 

### My Priority for Week 7

**The #1 thing to fix**: 

**Why it's #1**: 


---

## Activity 4: Accountability Analysis

Your app produces output. Someone uses that output to make a decision. The output is wrong.

**Who is responsible?**

### Accountability Is Structural

Accountability isn't about finding one person to blame. It's about understanding where responsibility lives in the system:

| Node | Role |
|------|------|
| **The builder** (you) | Built the app, chose the model, designed the interface |
| **The model creator** | Trained the model, made decisions about training data |
| **The user** | Chose to use the app, decided to trust the output |
| **The affected person** | Impacted by decisions made based on the app's output |
| **The platform** (Hugging Face) | Hosts the app, provides the infrastructure |

### Three Accountability Questions

**1. When my app gives wrong output, who is most responsible?**

*(Name a role, not "everyone" or "nobody." Why that role?)*



**2. What evidence should a user see before trusting my app?**

Think about where your app sits on the **evidence ladder**:

| Level | Name | What It Means |
|-------|------|---------------|
| 1 | Demonstration | Someone showed it works once |
| 2 | Piloted | Tested with a few real people |
| 3 | Operationally Tested | Used by real users in real contexts |
| 4 | Institutionally Trusted | Accepted as standard practice |

My app is at level: 

Because: 


**3. What does "good enough" mean for my app?**

*(What error rate or behavior is acceptable? What would be unacceptable? A 95% accuracy rate means very different things depending on what the output is used for.)*
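
To make the 95% example concrete, it can help to turn the rate into a count of wrong outputs. The short sketch below does that arithmetic. The function name and the usage volumes are hypothetical, chosen only for illustration.

```python
def expected_wrong_outputs(accuracy, outputs):
    """Expected number of wrong outputs for a given accuracy and usage volume."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * outputs


# Hypothetical usage volumes: a class demo, a small pilot, a busy public tool.
for outputs in (20, 1_000, 50_000):
    wrong = expected_wrong_outputs(0.95, outputs)
    print(f"{outputs:>6} outputs at 95% accuracy -> about {wrong:.0f} wrong")
```

The same rate gives roughly 1 wrong output in a 20-person demo and roughly 2,500 at 50,000 uses. Whether either number is acceptable depends on what each wrong output costs the person who acts on it.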



### Your Accountability Statement

Use this template:

```
When my [app name] produces [type of error] in [context],
[who] is most responsible because [reason connected to their position in the system].
```

**My accountability statement**:

> When my **___** produces **___** in **___**,
> 
> **___** is most responsible because **___**.

### Accountability Check

- [ ] I named a specific role, not "the organization" or "the AI"
- [ ] I named a specific error type, not "something goes wrong"
- [ ] The reason connects to their position in the system
- [ ] If I say the user should check the output, my interface tells them to check

---

## Activity 5: Domain Fit and Stakes

### Workflow Integration

Where does your app sit in a real process? Map it:

```
What happens BEFORE    →    What does my    →    What does the user    →    Who CHECKS
someone uses my app?        APP do?              do with the output?        the output?
```

**BEFORE**: 

**APP**: 

**AFTER**: 

**CHECK**: 

### Domain Stakes Assessment

| Question | Your Answer |
|----------|-------------|
| **What's the most likely error my app could make?** | |
| **How likely is that error?** (Low / Medium / High) | |
| **How bad is it if that error happens?** (Low = minor inconvenience / Medium = wasted time, wrong decision / High = real harm) | |
| **What supervision does that imply?** | |

**Supervision levels:**
- Low stakes → Output-only check (user glances at result)
- Medium stakes → Human reviews before acting on it
- High stakes → Human verifies every output
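
The worksheet does not define how likelihood and severity combine into a stakes level, so the sketch below makes one up for illustration. It scores Low/Medium/High as 1/2/3, multiplies the two scores, and uses cut-offs at 2 and 4. The scoring, the cut-offs, and the function names are assumptions, not a course rule. Only the three supervision descriptions come from the list above.

```python
# ASSUMPTION: ordinal scores for the Low / Medium / High answers in the table.
LEVELS = {"low": 1, "medium": 2, "high": 3}

# Supervision descriptions copied from the list above.
SUPERVISION = {
    "low": "Output-only check (user glances at result)",
    "medium": "Human reviews before acting on it",
    "high": "Human verifies every output",
}


def stakes_level(probability, severity):
    """Combine likelihood and severity into a stakes level.

    ASSUMPTION: the product of the two scores, with hypothetical cut-offs:
    1-2 is low, 3-4 is medium, 6-9 is high.
    """
    score = LEVELS[probability.lower()] * LEVELS[severity.lower()]
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"


def supervision_for(probability, severity):
    """Look up the supervision description for the combined stakes level."""
    return SUPERVISION[stakes_level(probability, severity)]


print(supervision_for("Medium", "High"))
```

Under these cut-offs a rare but harmful error (Low likelihood, High severity) lands at medium stakes. If that feels too lenient for your domain, a reasonable alternative is to treat any High severity as high stakes regardless of likelihood.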

**My app's appropriate supervision level**: 

**Why**: 

---

## 6-Question Examination Protocol

Apply the protocol to this week's *testing and analysis experience* — the whole Week 6 process.

### 1. What did I ask it to do?
*(What was the purpose of user testing? What did you hope to learn?)*


### 2. What did it actually do?
*(What actually happened during the tests? What did you learn?)*


### 3. Where did it succeed?
*(What did testers find easy or useful?)*


### 4. Where did it fail or struggle?
*(What surprised you? What issues did you not anticipate?)*


### 5. Why might it have failed?
*(Model limitation? Interface confusing? Missing information?)*


### 6. What would I do differently next time?
*(About the testing process itself, not just the app)*



---

## DCS Question: Where Does Accountability Live When This System Is Wrong?

Trace the chain for your specific app:

- The **model creator** trained the model — they're accountable for...
- **I (the builder)** designed the app — I'm accountable for...
- The **user** chose to use the output — they're accountable for...
- The **affected person** experiences the consequences — they...

**Where is the gap?** *(Where is accountability unclear? Where is no one positioned to catch an error?)*


**Connect to your user test**: What did the tester assume about the output? Would they have acted on wrong output without checking?



---

## Record: CLAUDE.md Week 6 Entry

Add this to your CLAUDE.md file:

```
## Week 6: Real Users and Domain Fit

### User Testing Findings
- Tested with: [who — relationship to target audience]
- Key finding: [the most important thing testers revealed]
- Surprise: [something I didn't expect]
- Tester's description of my app (their words): [quote]

### Prioritized Fix List
- Must fix: [items, ranked]
- Should fix: [items]
- Nice to have: [items]
- Out of scope: [items]
- Top priority for Week 7: [#1 fix and why]

### Accountability Analysis
- Accountability statement: When my [app] produces [error],
  [who] is responsible because [reason].
- Evidence level: [Demonstration / Piloted / Operationally Tested / Institutionally Trusted]
- "Good enough" means: [threshold]
- Workflow: [BEFORE] → [APP] → [AFTER] → [CHECK]
- Stakes: Probability [level] × Severity [level] = [supervision level]

### DCS: Where Does Accountability Live?
[Trace the chain: model creator → builder → user → affected person.
 Where is the gap? What did my tester assume?]
```
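
If you want a quick check that your entry has every section from the template, a few lines of Python can compare headings. This is an optional sketch. The `missing_headings` helper is hypothetical, and it assumes you kept the template's heading text exactly as written above.

```python
# Headings copied from the Week 6 template above.
REQUIRED_HEADINGS = [
    "## Week 6: Real Users and Domain Fit",
    "### User Testing Findings",
    "### Prioritized Fix List",
    "### Accountability Analysis",
    "### DCS: Where Does Accountability Live?",
]


def missing_headings(text):
    """Return the template headings that do not appear as a line in the text."""
    present = {line.strip() for line in text.splitlines()}
    return [heading for heading in REQUIRED_HEADINGS if heading not in present]


# A deliberately incomplete entry, for illustration only.
draft = """## Week 6: Real Users and Domain Fit

### User Testing Findings
- Tested with: two classmates
"""
for heading in missing_headings(draft):
    print("Missing:", heading)
```

To run it on your real file, read the text first, for example with `pathlib.Path("CLAUDE.md").read_text()`, assuming the file sits in your current folder. The check only looks at headings. It cannot tell whether the bracketed placeholders were actually filled in.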

---

## What's Next

You know what users think and where accountability lives. You have a prioritized fix list.

**Next week**: What could go wrong — systematically. You'll catalog failure modes, ask whether a simpler alternative would work, implement safeguards, write a capability statement (what your app does well / what it gets wrong / what it should NOT be used for), and deploy v2.