# Qualitative Coding with LLMs

This notebook demonstrates how to use Large Language Models for qualitative data analysis tasks like thematic coding, code condensation, and reflexive analysis.

## Learning Objectives

- Perform **inductive thematic analysis** with LLMs on open-ended text
- Use **structured outputs** (JSON mode) for consistent coding results
- Apply **code condensation** strategies to reduce redundant themes
- Implement **reflexive coding** by having the LLM challenge its own interpretations
- Handle **long documents** (interview transcripts) through text chunking
- Compare thematic coding results from different LLM approaches

## Setup

### Running in Google Colab
1. Upload this notebook to Google Colab
2. Run the installation cell below
3. You'll be prompted to enter your OpenAI API key

### Running Locally
1. Install requirements: `pip install openai scikit-learn pandas numpy`
2. Set environment variable: `export OPENAI_API_KEY="your-key-here"`
3. Run notebook with Jupyter: `jupyter notebook week5_qualitative_coding_colab.ipynb`

In [None]:
# Install required packages (uncomment if needed)
# !pip install -q "openai>=1.40.0" scikit-learn pandas numpy

In [None]:
import os
import json
import re
import pandas as pd
import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans
import getpass

# Set your OpenAI API key
# For Colab: you'll be prompted to enter it
# For local: set OPENAI_API_KEY environment variable
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

client = OpenAI()  # reads OPENAI_API_KEY from environment

print("✓ Setup complete!")

**What this code does:**

Sets up the environment for qualitative coding with LLMs:

**Key libraries:**
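- **`openai`**: Official Python client for the OpenAI API (used here with `gpt-4o-mini`)
- **`sklearn`**: Machine learning library (`KMeans` is imported for optional embedding-based clustering of themes; the main workflow uses the LLM to aggregate themes)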
- **`pandas`**: Data manipulation for tabular results
- **`getpass`**: Secure API key input (won't display in output)

**API Key setup:**
- In Colab: You'll be prompted to paste your key
- Locally: Set `export OPENAI_API_KEY="sk-..."` before running
- Security: Never hardcode keys or commit them to git

**Why OpenAI vs other models:**
- GPT-4o-mini: A good balance of cost and quality for coding (about $0.15 per 1M input tokens and $0.60 per 1M output tokens at the time of writing; check current pricing)
- Can switch to Anthropic Claude or open-source models with minor code changes

**Expected output:** "✓ Setup complete!" means you're ready to make API calls.

---

## Part 1: Inductive Thematic Analysis

In qualitative research, **inductive thematic analysis** means deriving themes directly from the data rather than applying pre-defined categories. We'll ask an LLM to:

1. Read a collection of open-ended responses
2. Identify 3-6 recurring themes
3. Assign each text to one or more themes
4. Provide justifications using quotes from the data

We'll use **JSON mode** to get structured, parseable output.

In [None]:
# Sample dataset: short survey responses about community concerns
texts = [
    "I worry about rising rent and housing costs in my city.",
    "Public transit has improved, but buses are still unreliable.",
    "I love my neighborhood's community garden and local markets.",
    "Healthcare appointments take months to schedule.",
    "Street lighting got better and I feel safer walking at night.",
    "Childcare is unaffordable; I had to reduce my hours at work.",
    "The new bike lanes are great, but drivers don't respect them.",
    "Utility bills have gone up a lot this year.",
    "Local library events helped me meet new neighbors.",
    "The wait times at the public clinic are frustrating."
]

print(f"Dataset: {len(texts)} responses about community life\n")
for i, text in enumerate(texts[:3], 1):
    print(f"{i}. {text}")
print("...")

**What the next code cell does:**

This is the core of LLM-assisted inductive thematic analysis:

**Prompt structure:**
1. **System message:** Defines the LLM's role and output format
   - "Return valid JSON only" ensures parseable output
   - Specifies what keys to include (themes, assignments)
2. **User message:** Contains the actual task and data
   - Wrapped in JSON for clarity
   - Specifies desired number of themes (3-6)

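**Key parameter: `response_format={"type": "json_object"}`**
- Forces syntactically valid JSON output (no surrounding prose), but does not enforce a particular schema, so key names can still vary
- The word "JSON" must still appear somewhere in the messages, or the API returns an error
- Supported by newer models such as `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, and `gpt-3.5-turbo-1106` and later (not the original `gpt-4`)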

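**Temperature choice: 0.3**
- Lower than typical settings for creative tasks (0.7-1.0)
- Makes theme identification more consistent across runs
- For maximum reproducibility, use 0.0 (though it may be too rigid, and outputs are still not guaranteed to be identical across runs)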

**How to adapt this:**
- Change `n_themes_range` (and the matching "3–6" in the system instructions) to guide theme granularity
- Add domain-specific instructions (e.g., "focus on economic themes")
- Include codebook examples for few-shot learning

**Expected output:** A JSON object with structured themes and text-to-theme assignments.
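The prompt structure above can be sketched as a small helper that builds the `messages` list. This is a hypothetical helper (`build_theming_messages` is not part of the notebook or the OpenAI SDK); it assumes you want the theme range written once and reused in both the system and user messages so the two never disagree, and that pairing each text with an explicit index is acceptable in your prompt.

```python
import json

def build_theming_messages(texts, min_themes=3, max_themes=6):
    """Build system + user messages for inductive theming (hypothetical helper).

    The theme range is written into both messages from the same arguments,
    so changing it in one place keeps the prompts consistent.
    """
    theme_range = f"{min_themes}-{max_themes}"
    system = (
        "You are a careful qualitative analyst. Return valid JSON only. "
        f"Create {theme_range} concise themes with clear names and 1-2 sentence "
        "definitions. Then assign each input text an array of theme names "
        "(best matches) with a short justification. Keys: themes, assignments."
    )
    user = {
        "task": "Inductive theming of short survey responses",
        "n_themes_range": theme_range,
        # Pair each text with its 0-based index so assignments can refer back to it
        "texts": [{"idx": i, "text": t} for i, t in enumerate(texts)],
    }
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": json.dumps(user)},
    ]

example_messages = build_theming_messages(["Rent is too high.", "Buses are late."], 2, 4)
print(example_messages[0]["content"][:80])
print(json.loads(example_messages[1]["content"])["n_themes_range"])  # 2-4
```

The result can be passed as `messages=` to `client.chat.completions.create(...)` with the same `response_format` as in the next cell. Sending explicit indices is a design choice that makes the text-matching fallback in `get_text_index` less necessary.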

In [None]:
# System instructions for the LLM
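system_instructions = """You are a careful qualitative analyst.
Return valid JSON only. Create 3–6 concise themes with clear names and
1–2 sentence definitions. Then assign each input text an array of theme names
(best matches), and include a short justification using quotes when possible.
If uncertain, allow an empty array.
Keys: themes (array of objects with "name" and "definition") and
assignments (array of objects with "idx" = the 0-based position of the text
in the input list, "themes" = array of theme names, and "justification")."""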

# User prompt with the task and data
user_prompt = {
    "task": "Inductive theming of short survey responses",
    "n_themes_range": "3-6",
    "texts": texts
}

# Make API call with JSON mode
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_instructions},
        {"role": "user", "content": json.dumps(user_prompt)}
    ],
    response_format={"type": "json_object"},  # Force JSON output
    temperature=0.3  # Low temperature for more consistent coding
)

result = json.loads(response.choices[0].message.content)
print("✓ Thematic analysis complete")
print(f"Keys in response: {list(result.keys())}")

**What the next two code cells do:**

Parse and display the LLM's themes and theme assignments, with robust error handling:

**Why we need `get_text_index` function:**
- LLMs may return results in different formats
- Sometimes returns `"idx": 0`, sometimes `"text": "full text here"`
- This function handles multiple possible response structures

**Parsing strategies:**
1. **Check for explicit index keys:** `idx`, `index`, `i`, `text_id`
2. **Match by text content:** If no index, try to find the text in original list
3. **Substring matching:** Handle partial quotes or truncated text

**Note on variable key names:**
- LLMs may return different key names (e.g., "themes" vs "labels" vs "categories")
- The code checks several possible keys: `assignment.get("themes", assignment.get("labels", assignment.get("categories", [])))`
- Different models or temperature settings may format output differently, so always validate LLM outputs before trusting them

**How to use:**
- Run as-is for most cases
- If your LLM uses different keys, add them to the function
- For production, add logging to track which matching strategy succeeded

**Expected output:** Human-readable list showing which texts got which theme labels, with justifications.
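Validation can go one step further than tolerant parsing. The hypothetical checker below returns a list of problems instead of printing; it assumes themes are objects with a `name` and that assignments carry an integer `idx` and a `themes` list (ask for these explicitly in the system instructions). An empty list means nothing was flagged, not that the coding is correct.

```python
def validate_theming_result(result, n_texts):
    """Return a list of problems found in a theming result (empty list = OK).

    Hypothetical checker: assumes themes are objects with 'name' and that
    assignments carry an integer 'idx' and a 'themes' list.
    """
    problems = []
    themes = result.get("themes") or []
    if not isinstance(themes, list) or not themes:
        problems.append("no themes returned")
        themes = []
    theme_names = {t.get("name") for t in themes if isinstance(t, dict)}

    seen = set()
    for a in result.get("assignments") or []:
        idx = a.get("idx") if isinstance(a, dict) else None
        if not isinstance(idx, int) or not 0 <= idx < n_texts:
            problems.append(f"bad or missing idx: {a!r}")
            continue
        seen.add(idx)
        # Flag theme labels that were never defined in the themes list
        for name in a.get("themes") or []:
            if name not in theme_names:
                problems.append(f"text {idx}: undefined theme {name!r}")

    # Every input text should appear in at least one assignment
    missing = sorted(set(range(n_texts)) - seen)
    if missing:
        problems.append(f"texts never assigned: {missing}")
    return problems

sample = {
    "themes": [{"name": "Cost of living", "definition": "..."}],
    "assignments": [{"idx": 0, "themes": ["Cost of living"]},
                    {"idx": 1, "themes": ["Transit"]}],
}
print(validate_theming_result(sample, 3))
# ["text 1: undefined theme 'Transit'", 'texts never assigned: [2]']
```

After the API call cell has run, `validate_theming_result(result, len(texts))` checks the real response.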

In [None]:
# Display the themes identified by the LLM
print("\n=== IDENTIFIED THEMES ===\n")

themes = result.get("themes", [])
for i, theme in enumerate(themes, 1):
    if isinstance(theme, str):  # some responses return bare theme names
        theme = {"name": theme}
    name = theme.get("name", "Unnamed theme")
    definition = theme.get("definition", "No definition provided")
    print(f"{i}. {name}")
    print(f"   {definition}\n")

In [None]:
# Display assignments of texts to themes
def get_text_index(assignment, texts):
    """Return the 0-based index of the text an assignment refers to, or None."""
    if not isinstance(assignment, dict):
        return None

    # Try explicit index keys, ignoring non-numeric or out-of-range values
    for key in ("idx", "index", "i", "text_id"):
        if key in assignment:
            try:
                idx = int(assignment[key])
            except (TypeError, ValueError):
                continue
            if 0 <= idx < len(texts):
                return idx

    # Try to match by text content
    text_snippet = assignment.get("text")
    if isinstance(text_snippet, str) and text_snippet:
        # Exact match
        if text_snippet in texts:
            return texts.index(text_snippet)
        # Substring match on the first 30 characters (handles truncated quotes)
        for j, t in enumerate(texts):
            if text_snippet[:30] in t:
                return j
    return None

print("\n=== THEME ASSIGNMENTS ===\n")

assignments = result.get("assignments", [])
for assignment in assignments:
    idx = get_text_index(assignment, texts)
    if idx is None:
        # Report unmatched assignments instead of silently skipping them
        print(f"⚠ Could not match assignment to a text: {assignment}\n")
        continue

    # Get theme names (handle different possible keys)
    theme_names = assignment.get("themes",
                   assignment.get("labels",
                   assignment.get("categories", []))) or []

    justification = assignment.get("justification", "")

    print(f"Text {idx}: {texts[idx][:60]}...")
    print(f"  → Themes: {', '.join(str(t) for t in theme_names)}")
    if justification:
        print(f"  → Why: {justification}")
    print()

**What the code in Part 2 (below) does:**

Demonstrates **code condensation** - a key qualitative analysis technique:

**What is code condensation:**
- Start with many detailed codes (13 in this example)
- Group them into fewer, broader themes (3-5)
- Creates a hierarchical codebook structure

**Why this matters:**
- Mirrors iterative qualitative analysis workflow
- Helps manage complexity in large datasets
- Makes patterns more visible

**How the prompt works:**
1. Provides all initial codes as context
2. Asks for 3-5 broader categories
3. Requests mapping of initial → condensed codes
4. Wants definitions for each condensed theme

**Temperature: 0.2 (very low)**
- Code condensation should be consistent
- We want logical groupings, not creative ones
- Higher temperature might produce inconsistent hierarchies

**How to use this:**
- Start by coding 20-50 texts with detailed codes
- Feed those codes to this condensation step
- Iterate if the condensed themes don't feel right
- Use human judgment to validate the groupings

**Expected output:** 3-5 broader themes, each encompassing multiple initial codes, with clear definitions.
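To support the human-validation step, a hypothetical checker like the one below can flag initial codes that the condensed themes dropped, listed more than once, or invented. It assumes each condensed theme stores its codes under `codes` or `initial_codes`, the same keys the display cell in Part 2 reads.

```python
from collections import Counter

def check_condensation_coverage(initial_codes, condensed_themes):
    """Report initial codes that were dropped, duplicated, or invented.

    Hypothetical helper; assumes each condensed theme lists its codes under
    'codes' (falling back to 'initial_codes').
    """
    assigned = Counter()
    for theme in condensed_themes:
        assigned.update(theme.get("codes", theme.get("initial_codes", [])))

    known = set(initial_codes)
    return {
        # Counter returns 0 for unseen keys without inserting them
        "missing": [c for c in initial_codes if assigned[c] == 0],
        "duplicated": sorted(c for c, n in assigned.items() if n > 1),
        "unknown": sorted(c for c in assigned if c not in known),
    }

example = [
    {"name": "Cost of living", "codes": ["Rising rent concerns", "Childcare costs"]},
    {"name": "Care access", "codes": ["Childcare costs", "Clinic delays"]},
]
print(check_condensation_coverage(
    ["Rising rent concerns", "Childcare costs", "Bus unreliability"], example))
# {'missing': ['Bus unreliability'], 'duplicated': ['Childcare costs'], 'unknown': ['Clinic delays']}
```

After the condensation cell runs, `check_condensation_coverage(initial_codes, condensed_themes)` applies the check to the real output. A code listed under two themes is not necessarily an error; whether overlap is acceptable is an analytic decision.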

---

## Part 2: Code Condensation

In iterative qualitative coding, you often start with many codes and then **condense** them into higher-level themes. Let's simulate this by:

1. Creating an initial set of detailed codes
2. Asking the LLM to group them into broader categories
3. Producing a condensed codebook

In [None]:
# Simulate initial detailed codes from a first-pass analysis
initial_codes = [
    "Rising rent concerns",
    "Housing affordability crisis",
    "Childcare costs",
    "Utility bill increases",
    "Bus unreliability",
    "Bike lane infrastructure",
    "Healthcare access delays",
    "Public clinic wait times",
    "Street lighting improvements",
    "Neighborhood safety",
    "Community garden engagement",
    "Library social events",
    "Local market connections"
]

print(f"Initial codes: {len(initial_codes)} detailed codes\n")
for code in initial_codes:
    print(f"  • {code}")

In [None]:
# Ask LLM to condense codes into higher-level themes
condensation_prompt = f"""You are a qualitative researcher performing code condensation.

Given these {len(initial_codes)} initial codes from a first-pass analysis of survey responses:
{json.dumps(initial_codes, indent=2)}

Group them into 3-5 broader themes. For each theme:
- Provide a clear theme name
- List which initial codes it encompasses
- Write a brief definition

Return JSON with key 'condensed_themes' containing an array of objects with keys
'name', 'codes' (array of the initial codes it encompasses), and 'definition'."""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a qualitative researcher. Return valid JSON only."},
        {"role": "user", "content": condensation_prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0.2
)

condensed = json.loads(response.choices[0].message.content)
print("✓ Code condensation complete\n")

In [None]:
# Display condensed themes
print("=== CONDENSED THEMES ===\n")

condensed_themes = condensed.get("condensed_themes", [])
for i, theme in enumerate(condensed_themes, 1):
    name = theme.get("name", "Unnamed")
    definition = theme.get("definition", "")
    included_codes = theme.get("codes", theme.get("initial_codes", []))
    
    print(f"{i}. {name}")
    print(f"   Definition: {definition}")
    print(f"   Encompasses: {', '.join(included_codes)}")
    print()

**What the code in Part 3 (below) does:**

Implements **reflexive coding** - having the LLM critique its own analysis:

**What is reflexivity in qualitative research:**
- Questioning your own interpretations
- Acknowledging researcher positionality and bias
- Considering alternative readings of data
- Strengthening validity through self-critique

**The two-step process:**
1. **Initial coding:** LLM identifies themes (temperature 0.4)
2. **Self-challenge:** LLM questions its own themes (temperature 0.5)

**Why slightly higher temperature in step 2:**
- Need creativity to generate alternative interpretations
- Want to escape from initial framing
- Still structured enough for analysis

**What the challenge prompt asks:**
- What biases might the LLM have made?
- What alternative interpretations exist?
- What evidence contradicts the themes?
- What was overlooked?

**Sociological value:**
- Surfaces ambiguities in the data
- Reveals multiple possible readings
- Makes analysis more transparent and rigorous

**How to use:**
- Run on texts where interpretation is ambiguous
- Use the challenges to refine your own thinking
- Don't treat LLM challenges as "truth" - they're provocations
- Combine with human reflexive memos

**Expected output:** Critical reflections on initial coding, alternative themes, and a revised interpretation.
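The challenge step in Part 3 re-sends the first analysis inside a fresh prompt. An alternative (an assumption, not what the notebook's cells do) is to keep a single conversation and append the model's first reply as an `assistant` message before the challenge, so the model sees its own answer in context. The hypothetical helper below only builds that message list in the Chat Completions `role`/`content` format.

```python
import json

def build_challenge_conversation(system, initial_prompt, initial_reply, challenge):
    """Return a chat history: first request, the model's reply, then the challenge.

    `initial_reply` is the raw JSON string the model returned in step 1.
    Hypothetical helper; nothing is sent to the API here.
    """
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": initial_prompt},
        {"role": "assistant", "content": initial_reply},
        {"role": "user", "content": challenge},
    ]

history = build_challenge_conversation(
    "You are a reflexive qualitative researcher. Return valid JSON only.",
    "Identify 2-3 themes in this excerpt: 'People keep to themselves now.'",
    json.dumps({"themes": ["Loss of community"]}),
    "Critically examine your themes. Return JSON with keys challenges, "
    "alternative_themes, revised_interpretation.",
)
print([m["role"] for m in history])  # ['system', 'user', 'assistant', 'user']
```

Passing `history` as `messages=` to `client.chat.completions.create(...)` would run the challenge as a continuation of the first exchange.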

---

## Part 3: Reflexive Coding (Self-Challenge)

Good qualitative research involves **reflexivity**: questioning your own interpretations and biases. We can prompt the LLM to:

1. Analyze text and produce initial codes
2. Challenge its own coding decisions
3. Consider alternative interpretations

This helps surface ambiguities and encourages deeper analysis.

**What the chunking code in Part 4 does:**

Implements **text chunking** for handling long documents (like interview transcripts):

**Why chunking is necessary:**
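- LLMs have limited context windows (e.g., 128k tokens for GPT-4o and GPT-4o-mini)
- A single interview usually fits, but very long transcripts or many interviews at once can exceed this limit
- Smaller chunks = more focused coding
- A failed or malformed response affects only one chunk, so it can be retried cheaply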

**The `chunk_text` function:**
1. **Normalize input:** Handles both strings and lists
2. **Clean whitespace:** Collapses multiple spaces to one
3. **Split by words:** Chunks of ~200 words (adjustable)
4. **Filter short chunks:** Minimum 40 characters to avoid tiny fragments

**Parameters to adjust:**
- **`max_words=200`:** Smaller = more granular coding, but more API calls
  - Use 200-300 for detailed coding
  - Use 500-1000 for broader themes
- **`min_chars=40`:** Prevents tiny trailing chunks

**Alternative chunking strategies:**
- **Paragraph-based:** Split on `\n\n` (respects document structure)
- **Sentence-based:** Use spaCy or NLTK sentence tokenizer
- **Semantic:** Use embeddings to find natural break points
- **Speaker turns:** For multi-party interviews

**How to use:**
- For 5,000-word interview: ~25 chunks at 200 words each
- Each chunk gets coded separately by `code_chunks`
- Then aggregate themes across chunks (Part 5)

**Expected output:** List of text segments, each small enough for focused LLM coding.
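Paragraph-based chunking, listed above as an alternative, can be sketched as follows. `chunk_by_paragraph` is a hypothetical helper, not part of the notebook; it assumes paragraphs or speaker turns are separated by blank lines and packs whole paragraphs into chunks of up to `max_words`, so a question stays with its answer when they fit together.

```python
import re

def chunk_by_paragraph(text, max_words=200):
    """Pack whole paragraphs (split on blank lines) into chunks of <= max_words.

    A paragraph longer than max_words becomes its own oversized chunk rather
    than being cut mid-paragraph.
    """
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]
    paragraphs = [p for p in paragraphs if p]

    chunks, current, current_words = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget
        if current and current_words + n > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += n
    if current:
        chunks.append(" ".join(current))
    return chunks

transcript = ("Q: How long have you lived here?\n\nA: Fifteen years.\n\n"
              "Q: What changed?\n\nA: People keep to themselves now.")
for c in chunk_by_paragraph(transcript, max_words=10):
    print(c)
```

On this toy transcript the loop prints two chunks, each pairing a question with its answer. Because long paragraphs are kept whole, chunks can exceed `max_words`; such paragraphs could be split further with `chunk_text`.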

**What the `code_chunks` cell in Part 4 does:**

Applies thematic coding to each chunk separately, then aggregates results:

**The `code_chunks` function workflow:**
1. Loop through each text chunk
2. Send to LLM for theme identification (max 3 themes per chunk)
3. Collect all themes in a flat list
4. Print themes found in each chunk

**Key parameter: `max_themes_per_chunk=3`**
- Prevents theme explosion (too many themes)
- Forces LLM to prioritize most salient themes
- Adjust based on chunk size and content density

**Why this approach:**
- Each chunk gets focused attention
- Avoids overwhelming the LLM with too much text
- Natural for long interviews with multiple topics

**Potential issues:**
- **Duplicate themes across chunks:** "affordability" appears in chunks 1, 3, 5
- **Inconsistent naming:** Chunk 1 says "cost concerns", chunk 3 says "financial burden"
- Solution: Deduplication step (Part 5)

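**Cost considerations (rough estimates; check current pricing):**
- 25 chunks × ~$0.0002 per call ≈ $0.005 per interview (very cheap)
- Prompt tokens (the chunk text) make up most of the tokens, although output tokens are priced higher per token
- For 100 interviews: ~$0.50 at that rate; budget more if you add few-shot examples or a codebook to every prompt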

**How to improve:**
- Add few-shot examples in the prompt for consistency
- Use lower temperature (0.1) for more uniform theme names
- Include a preliminary codebook to guide naming

**Expected output:** List of all themes across all chunks, with some duplication that needs resolution.
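The cost figures above are back-of-the-envelope numbers. This hypothetical helper reproduces the arithmetic from token counts; the default prices (USD 0.15 per 1M input tokens, 0.60 per 1M output tokens) are assumptions based on gpt-4o-mini list prices at the time of writing, and the per-call token counts in the example are guesses.

```python
def estimate_cost(n_calls, input_tokens_per_call, output_tokens_per_call,
                  usd_per_1m_input=0.15, usd_per_1m_output=0.60):
    """Rough API cost in USD for n_calls with the given token counts.

    Default prices are assumptions; check current pricing before budgeting.
    """
    per_call = (input_tokens_per_call * usd_per_1m_input
                + output_tokens_per_call * usd_per_1m_output) / 1_000_000
    return n_calls * per_call

# Assumed: ~200 words ≈ 270 tokens of chunk text plus ~80 tokens of instructions
# in, and ~150 tokens of JSON out, per call
per_interview = estimate_cost(25, 350, 150)
print(f"One interview (25 chunks): ${per_interview:.4f}")
print(f"100 interviews: ${per_interview * 100:.2f}")
```

With these assumptions the example prints about $0.0036 per interview and $0.36 for 100 interviews, in line with the rough estimate above. Actual token counts are reported in `response.usage` for each call.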

In [None]:
# Sample interview excerpt for reflexive analysis
interview_excerpt = """I've lived in this neighborhood for fifteen years. 
When I first moved here, everyone knew each other. We'd have block parties, 
kids played outside together. Now? People keep to themselves. Everyone's 
always rushing somewhere. The new apartment buildings brought in a lot of 
young professionals who don't seem interested in community. But maybe that's 
just my generation talking. I don't know. Things change."""

print("Interview excerpt:\n")
print(interview_excerpt)

In [None]:
# Step 1: Initial coding
initial_analysis_prompt = f"""Analyze this interview excerpt and identify 2-3 themes.

Excerpt:
{interview_excerpt}

Return JSON with:
- themes: array of theme names
- interpretations: array of strings, one brief explanation per theme, in the same order
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a qualitative researcher. Return valid JSON only."},
        {"role": "user", "content": initial_analysis_prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0.4
)

initial_analysis = json.loads(response.choices[0].message.content)

print("=== INITIAL ANALYSIS ===\n")
for theme in initial_analysis.get("themes", []):
    print(f"  • {theme}")
print()
for interpretation in initial_analysis.get("interpretations", []):
    print(f"  - {interpretation}")

**What the deduplication cell in Part 5 does:**

Uses the LLM to **deduplicate and consolidate** similar themes:

**Why LLMs are good at this:**
- Can recognize semantic similarity ("cost concerns" ≈ "financial burden")
- Understand broader categories that encompass multiple themes
- Generate clear definitions for consolidated themes

**The deduplication prompt:**
1. Shows all unique themes from chunk coding
2. Asks LLM to merge similar/overlapping ones
3. Requests 3-5 final distinct themes
4. Wants mapping: which original themes → which final theme

**Temperature: 0.2 (low)**
- Consolidation should be logical and consistent
- Too high = creative but inconsistent groupings

**How this differs from code condensation:**
- **Code condensation:** Hierarchical grouping (initial → broader)
- **Deduplication:** Horizontal merging (similar themes → single theme)

**Alternative approaches:**
- **Embedding similarity:** Use cosine similarity on theme embeddings to auto-cluster
- **Manual review:** Export theme list and manually mark duplicates
- **Hybrid:** LLM suggests merges, human approves

**When to use:**
- After coding 20+ chunks (enough variation to cause duplicates)
- When you see obvious semantic overlap in theme names
- Before final codebook creation

**Expected output:** Clean, non-redundant final codebook with 3-5 themes, each with clear definition and list of merged original themes.
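The embedding-similarity alternative listed above can be sketched in plain Python. This is a greedy threshold grouping (each theme joins the first group whose seed theme is similar enough, otherwise it starts a new group), a swapped-in technique rather than anything the notebook runs. The toy vectors and the 0.85 threshold are assumptions; in practice the vectors would come from an embeddings model. The `KMeans` class the notebook imports is another option when you want a fixed number of clusters.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_similar_themes(names, vectors, threshold=0.85):
    """Greedy grouping: each theme joins the first group whose seed vector is
    at least `threshold` cosine-similar; otherwise it starts a new group."""
    groups = []  # list of (seed_vector, [member names])
    for name, vec in zip(names, vectors):
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]

# Toy 3-d "embeddings" for illustration only (real ones have hundreds of dimensions)
demo_names = ["cost concerns", "financial burden", "community loss"]
demo_vectors = [[0.9, 0.1, 0.0], [0.85, 0.2, 0.05], [0.0, 0.1, 0.95]]
print(group_similar_themes(demo_names, demo_vectors))
# [['cost concerns', 'financial burden'], ['community loss']]
```

Greedy grouping depends on input order and on the threshold, so the suggested merges are best treated as candidates for the "LLM suggests, human approves" style of review.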

In [None]:
# Step 2: Challenge the initial coding
challenge_prompt = f"""You previously identified these themes in an interview excerpt:
{json.dumps(initial_analysis, indent=2)}

Original excerpt:
{interview_excerpt}

Now, critically examine your own interpretation:
1. What biases or assumptions might you have made?
2. What alternative interpretations are possible?
3. What evidence contradicts your themes?
4. What did you overlook?

Return JSON with:
- challenges: array of critical reflections
- alternative_themes: array of alternative theme names
- revised_interpretation: your reconsidered analysis
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a reflexive qualitative researcher. Return valid JSON only."},
        {"role": "user", "content": challenge_prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0.5  # Slightly higher for creative challenges
)

reflexive_analysis = json.loads(response.choices[0].message.content)

print("\n=== REFLEXIVE ANALYSIS ===\n")

print("Critical reflections:")
for challenge in reflexive_analysis.get("challenges", []):
    print(f"  • {challenge}")

print("\nAlternative themes:")
for alt_theme in reflexive_analysis.get("alternative_themes", []):
    print(f"  • {alt_theme}")

print("\nRevised interpretation:")
print(f"  {reflexive_analysis.get('revised_interpretation', '')}")

**What the dialogue cell in Part 6 does:**

Creates an **interactive dialogue** where the LLM asks YOU questions about the analysis:

**What is member checking/collaborative coding:**
- Researcher presents analysis to participants
- Participants validate or challenge interpretations
- Iterative dialogue refines understanding

**How this simulates that:**
- LLM presents its themes
- LLM asks open-ended questions about:
  - Ambiguities in the text
  - Missing context
  - Alternative interpretations
- Researcher answers (in practice, you'd feed answers back to LLM)

**Temperature: 0.6 (moderate-high)**
- Need creativity to formulate good questions
- Want genuine probing, not just confirmation
- Not coding task, so higher temp is fine

**Practical workflow:**
1. Run this cell to get LLM's questions
2. Answer them based on your domain knowledge
3. Feed answers back in a new prompt: "Given these clarifications: [answers], revise your coding"
4. LLM produces refined themes

**Why this matters:**
- Makes LLM analysis more interactive
- Surfaces what the LLM is uncertain about
- Encourages researcher reflexivity
- Combines LLM capabilities with human expertise

**How to extend:**
- Create a loop: Ask questions → Get answers → Revise → Ask follow-ups
- Log the dialogue for transparency in publications
- Use for training human coders (shows what questions to ask)

**Expected output:** 2-3 thoughtful questions that probe interpretation, not just factual gaps.
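Step 3 of the practical workflow (feeding answers back) could look like the hypothetical helper below, which pairs each LLM question with the researcher's answer and asks for revised coding. The output keys `revised_themes` and `rationale` are illustrative choices, not something the notebook defines.

```python
import json

def build_revision_prompt(excerpt, themes, questions, answers):
    """Combine the LLM's questions with the researcher's answers into a
    follow-up prompt asking for revised coding (hypothetical helper)."""
    if len(questions) != len(answers):
        raise ValueError("Provide exactly one answer per question")
    qa = "\n".join(f"Q{i}: {q}\nA{i}: {a}"
                   for i, (q, a) in enumerate(zip(questions, answers), 1))
    return (
        f"Excerpt:\n{excerpt}\n\n"
        f"Your earlier themes: {json.dumps(themes)}\n\n"
        f"The researcher answered your questions:\n{qa}\n\n"
        "Given these clarifications, revise your coding. Return JSON with keys "
        "'revised_themes' (array of names) and 'rationale' (string)."
    )

revision_prompt = build_revision_prompt(
    "People keep to themselves now.",
    ["Loss of community"],
    ["Is the speaker describing a change over time?"],
    ["Yes, compared with fifteen years ago."],
)
print(revision_prompt)
```

The returned string can be sent as the user message of a `client.chat.completions.create(...)` call with `response_format={"type": "json_object"}`; it already mentions JSON, as JSON mode requires. Repeating this in a loop gives the ask, answer, revise cycle described above.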

---

## Part 4: Handling Long Documents (Chunking)

Interview transcripts can be very long. To handle them effectively:

1. **Chunk** the text into manageable segments
2. Code each segment separately
3. Aggregate themes across all segments

Here's a simple chunking strategy based on word count.

In [None]:
def chunk_text(text, max_words=200, min_chars=40):
    """
    Split text into chunks of roughly max_words each.
    
    Args:
        text: Input text (string or list of strings)
        max_words: Maximum words per chunk
        min_chars: Minimum characters for a standalone chunk; a shorter
            trailing fragment is merged into the previous chunk
    
    Returns:
        List of text chunks
    """
    # Normalize to string
    if isinstance(text, list):
        text = " ".join(str(t) for t in text)
    else:
        text = str(text)
    
    # Collapse whitespace
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    
    # Split into words and chunk
    words = text.split()
    chunks = []
    
    for i in range(0, len(words), max_words):
        chunk = " ".join(words[i:i+max_words]).strip()
        if len(chunk) >= min_chars:
            chunks.append(chunk)
        elif chunks:
            # Merge a short trailing fragment instead of silently dropping its text
            chunks[-1] += " " + chunk
    
    return chunks

# Test with a sample long text
sample_long_text = """Video games are a diverse medium. The audience who plays 
video games is similarly diverse. If you're trying to break into a specialist 
space, such as interactive fiction, there are very set definitions of what 
interactive fiction can be or can't be. Some people in the community believe 
interactive fiction is typing into a parser and nothing else. But there are 
people working to change that perception. Similar movements happen in other 
genres like roguelikes. Gaming is a big tent and we're seeing those voices 
being given more press as a testament to the maturity of our medium."""

chunks = chunk_text(sample_long_text, max_words=50)
print(f"Split text into {len(chunks)} chunks\n")
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i} ({len(chunk.split())} words):")
    print(f"  {chunk[:100]}...\n")

In [None]:
# Now let's code each chunk separately
def code_chunks(chunks, model="gpt-4o-mini", max_themes_per_chunk=3):
    """
    Apply thematic coding to each chunk and aggregate results.
    """
    all_themes = []
    
    for i, chunk in enumerate(chunks):
        prompt = f"""Identify up to {max_themes_per_chunk} themes in this text segment.
        
Segment:
{chunk}

Return JSON with:
- themes: array of theme names
- supporting_quotes: array of relevant quotes
"""
        
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a qualitative researcher. Return valid JSON only."},
                {"role": "user", "content": prompt}
            ],
            response_format={"type": "json_object"},
            temperature=0.3
        )
        
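        result = json.loads(response.choices[0].message.content)
        # Coerce each theme to a string (some responses return objects) so
        # themes can be counted and deduplicated later
        chunk_themes = [t.get("name", str(t)) if isinstance(t, dict) else str(t)
                        for t in result.get("themes", [])]
        all_themes.extend(chunk_themes)
        
        print(f"Chunk {i+1}: {chunk_themes}")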
    
    return all_themes

print("=== CODING CHUNKS ===\n")
all_themes = code_chunks(chunks)

print(f"\n✓ Total themes identified across chunks: {len(all_themes)}")
print(f"Unique themes: {len(set(all_themes))}")

---

## Part 5: Theme Aggregation and Deduplication

When coding in chunks, you often get **duplicate or overlapping themes**. Let's aggregate and deduplicate them.
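Before counting, theme names that differ only in case, spacing, or punctuation can be merged with a simple normalization step. `normalize_theme` is a hypothetical helper; it only catches surface variants, so semantically similar names such as "cost concerns" and "financial burden" still need the LLM consolidation below.

```python
import re
from collections import Counter

def normalize_theme(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace
    so trivially different spellings of the same theme count together."""
    name = re.sub(r"[^\w\s]", " ", str(name).lower())
    return re.sub(r"\s+", " ", name).strip()

raw = ["Cost Concerns", "cost concerns ", "Cost-concerns", "Community loss"]
print(Counter(normalize_theme(t) for t in raw))
# Counter({'cost concerns': 3, 'community loss': 1})
```

To apply it here, count `Counter(normalize_theme(t) for t in all_themes)` instead of `Counter(all_themes)`.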

In [None]:
# Count theme frequency
from collections import Counter

theme_counts = Counter(all_themes)

print("=== THEME FREQUENCY ===\n")
for theme, count in theme_counts.most_common():
    print(f"  {theme}: {count}x")

In [None]:
# Use LLM to deduplicate and merge similar themes
dedup_prompt = f"""You identified these themes across multiple text segments:
{json.dumps(list(theme_counts.keys()), indent=2)}

Some themes may be duplicates or highly similar. Consolidate them into a 
final list of 3-5 distinct themes. For each:
- Provide a clear theme name
- List which original themes it merges
- Give a 1-2 sentence definition

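Return JSON with key 'final_themes' containing an array of objects with keys
'name', 'merged_themes' (array of original theme names), and 'definition'.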
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a qualitative researcher. Return valid JSON only."},
        {"role": "user", "content": dedup_prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0.2
)

final_codebook = json.loads(response.choices[0].message.content)

print("\n=== FINAL CODEBOOK ===\n")
for i, theme in enumerate(final_codebook.get("final_themes", []), 1):
    name = theme.get("name", "Unnamed")
    definition = theme.get("definition", "")
    merged = theme.get("merged_themes", theme.get("merges", []))
    
    print(f"{i}. {name}")
    print(f"   {definition}")
    if merged:
        print(f"   Merged from: {', '.join(merged)}")
    print()

---

## Part 6: Interactive Reflexive Dialogue

One powerful feature of LLM-assisted coding is the ability to have a **dialogue** about the analysis. Let's create a function that allows the LLM to:

1. Present its coding
2. Ask for your opinion
3. Incorporate your feedback

This simulates collaborative coding or member checking.

In [None]:
# Create a dialogue where LLM asks for researcher input
dialogue_prompt = f"""You've coded this interview excerpt:

{interview_excerpt}

Your themes: {initial_analysis.get('themes', [])}

Now, formulate 2-3 questions to ask the researcher to validate or challenge 
your interpretation. These should be open-ended questions that:
- Probe ambiguities in the text
- Ask about context you might be missing
- Invite alternative interpretations

Return JSON with key 'questions'.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a reflexive qualitative researcher. Return valid JSON only."},
        {"role": "user", "content": dialogue_prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0.6
)

dialogue = json.loads(response.choices[0].message.content)

print("=== LLM QUESTIONS FOR RESEARCHER ===\n")
for i, question in enumerate(dialogue.get("questions", []), 1):
    print(f"{i}. {question}\n")

print("\n💡 In practice, you would answer these questions and feed the responses")
print("   back to the LLM to refine the coding iteratively.")

**A note on retry logic (not used in this notebook):**

Describes **robust JSON extraction with retry logic** for when parsing fails:

**The three-phase approach:**

**Phase 1: Initial request**
- Clear instructions: "Return only a JSON object"
- Low temperature (0.1) for consistency
- Specify exact schema in prompt

**Phase 2: Parse attempt**
- Try `json.loads()` on response
- If successful → return result
- If fails → proceed to Phase 3

**Phase 3: Retry with correction**
- Send original prompt + failed response + correction instruction
- Use temperature 0.0 (least random, though still not guaranteed to be deterministic)
- Try parsing again
- If still fails → raise exception for manual review

**Key features:**
- **`max_retries` parameter:** Control how many attempts (1 is usually enough)
- **Error logging:** Print failed output for debugging
- **Gradual temperature reduction:** 0.1 → 0.0 reduces variability on the retry

**Note about `get_labels_robust`:**
- The `get_labels_robust` function described here is not defined in this notebook; every cell above uses the JSON mode API instead
- JSON mode returns syntactically valid JSON (unless the response is cut off by the token limit), so this retry logic is rarely needed with it
- The pattern is useful when working with models that don't support JSON mode, or for understanding error handling

**When to use retry logic:**
- Prompt-only or few-shot JSON extraction without JSON mode
- Critical annotations (can't skip failures)
- Debugging schema issues

**When NOT needed:**
- Using JSON mode or function calling (output is already valid JSON)
- Batch processing (skip failures, review later)

**Success rates (rough, illustrative figures; not measured in this notebook, and they vary by model and prompt):**
- Without retry: ~85% (prompt-only) to ~95% (few-shot)
- With retry: ~98%
- Remaining ~2%: usually schema issues or model limitations

**Best practice:** Use JSON mode, as every cell in this notebook does, to avoid needing this complexity.
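The three phases can be sketched as follows. This is a hypothetical `get_json_with_retry`, not the `get_labels_robust` function mentioned above; to keep it testable without an API key, it takes an assumed `call_model(messages, temperature)` callable that returns the model's reply text (in practice, a thin wrapper around `client.chat.completions.create`).

```python
import json

def get_json_with_retry(call_model, prompt, max_retries=1):
    """Ask for JSON; on a parse failure, show the model its output and re-ask.

    `call_model(messages, temperature)` is an assumed callable returning text.
    Raises json.JSONDecodeError if the last attempt still fails to parse.
    """
    messages = [{"role": "user", "content": prompt + "\nReturn only a JSON object."}]
    reply = call_model(messages, temperature=0.1)  # Phase 1: initial request

    for attempt in range(max_retries + 1):
        try:
            return json.loads(reply)  # Phase 2: parse attempt
        except json.JSONDecodeError:
            print(f"Parse failed (attempt {attempt + 1}): {reply[:200]!r}")
            if attempt == max_retries:
                raise  # leave it for manual review
        # Phase 3: include the bad output and ask for a corrected version
        messages = messages + [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": "That was not valid JSON. "
                                        "Return only the corrected JSON object."},
        ]
        reply = call_model(messages, temperature=0.0)

# Fake model for illustration: first reply is broken, second is valid
replies = iter(['Sure! {"labels": ["housing"]', '{"labels": ["housing"]}'])
fake_model = lambda messages, temperature: next(replies)
print(get_json_with_retry(fake_model, "Label this text: rent is too high."))
```

The first fake reply fails to parse and is logged; the corrected second reply is returned as a dict.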