# Part 2: Basic LLM Chat Tool

## Introduction

In this part, you'll create a simple command-line chat tool that interacts with a Large Language Model (LLM) through the Hugging Face API. This tool will allow you to have conversations with an LLM about healthcare topics.

## Learning Objectives

- Connect to the Hugging Face API
- Create a basic interactive chat loop
- Handle simple error cases
- Test with healthcare questions

## Setup and Installation

In [12]:
# Install required packages
%pip install -r requirements.txt

# Additional packages for LLM API interaction
%pip install requests

# Import necessary libraries
import os
import sys
import time
import logging
import argparse
from typing import Optional

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Create directories
os.makedirs('utils', exist_ok=True)
os.makedirs('results/part_2', exist_ok=True)

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


## 1. Connecting to the Hugging Face API

The Hugging Face Inference API provides access to many language models. We'll use models that are available on the free tier.

In [41]:
import requests
import os

# Read the .env file and set variables manually
with open('.env', 'r') as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith('#'):
            if '=' in line:
                key, value = line.split('=', 1)
                key = key.strip()
                value = value.strip().strip('"').strip("'")
                os.environ[key] = value

# at least it works
API_URL = os.getenv("API_URL")

API_KEY = os.getenv("HUGGINGFACE_API_KEY")

headers = {"Authorization": f"Bearer {API_KEY}"}  # Optional for some models

In [42]:
def query(payload):
    """
    Send a query to the Hugging Face API
    
    Args:SS
        payload: Dictionary containing the query parameters
        
    Returns:
        The API response
    """
    # TODO: Implement the API request
    # Use requests.post to send the query to the API_URL
    response = requests.post(API_URL, headers=headers, json=payload)
    result = response.json()
    # Return the response
    return result

# Test the query function
test_payload = {"inputs": "What are the symptoms of diabetes?"}
response = query(test_payload)
print(response)
# generated_text = response[0]["generated_text"]
# print(generated_text)

{'error': 'You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits.'}


## 2. Creating Simple Chat Scripts

Your task is to create two simple scripts that interact with the Hugging Face API:

1. A basic one-off chat script (`utils/one_off_chat.py`)
2. A contextual conversation script (`utils/conversation.py`)

### One-Off Chat Script

Create a script that handles independent interactions (each prompt/response is separate):

In [43]:
# utils/one_off_chat.py

def get_response(prompt, model_name = "HuggingFaceH4/zephyr-7b-beta", api_key = API_KEY) :
    """
    Send a prompt to the Hugging Face API and return the generated response.

    Args:
        prompt (str): The text prompt to send to the LLM.
        model_name (str): The name of the Hugging Face model to use.
        api_key (str): Your Hugging Face API key.

    Returns:
        str: The generated text from the LLM, or an error message if the request fails.
    """
    # Construct the API URL based on the model_name
    api_url = API_URL
    headers = {"Authorization": f"Bearer {api_key}"}
    print(api_url)

    payload = {"inputs": prompt}
    if not API_KEY:
        print("Error: HUGGINGFACE_API_KEY environment variable is not set.")
        exit()


    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()
        result = response.json()

        # Your working `query` function directly returned `result`.
        # Here, we keep the specific extraction for "generated_text"
        # as it's common for text generation APIs.
        if isinstance(result, list) and "generated_text" in result[0]:
            return result[0]["generated_text"]
        elif isinstance(result, dict) and "error" in result:
            return f"API Error: {result['error']}"
        else:
            # Fallback for unexpected API response formats
            return str(result)
    except requests.exceptions.RequestException as e:
        return f"Request failed: {e}"
    except Exception as e:
        return f"An unexpected error occurred: {e}"


def run_chat():
    """Runs an interactive chat session with the LLM."""
    print("--- Interactive chat mode. Type 'exit' to quit. ---")
    import sys  # Ensure sys is available for flush
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == "exit":
            print("Goodbye!")
            break
        print(user_input)
        print("LLM: Thinking...", flush=True)
        response = get_response(user_input)

        if "402 Client Error: Payment Required" in response:
            response = "\nIf you can't give me money, I don't wanna waste time on you."
            print(response)
            break
        
        print("LLM:", response)


def main():
    """Main function to parse arguments and run the chat or a single prompt."""
    parser = argparse.ArgumentParser(description="Chat with an LLM via Hugging Face Inference API.")
    parser.add_argument(
        "--prompt",
        type=str,
        help="Prompt to send to the LLM (if not provided, runs interactive chat)",
        default=None
    )
    parser.add_argument(
        "--model_name",
        type=str,
        help="Model name to use (default: HuggingFaceH4/zephyr-7b-beta)",
        default="HuggingFaceH4/zephyr-7b-beta"
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="Hugging Face API key (default: uses HUGGINGFACE_API_KEY environment variable)",
        default=API_KEY # Use the globally defined API_KEY as default
    )
    
    # parse_known_args is useful if there might be other unrecognized arguments,
    # but for a simple script, parse_args() is usually sufficient.
    args = parser.parse_known_args()[0]
    
    if args.prompt:
        # Run with a single prompt
        response = get_response(args.prompt, model_name=args.model_name, api_key=args.api_key)
        print(response)
    else:
        # Run interactive chat
        run_chat()


if __name__ == "__main__":
    main()

--- Interactive chat mode. Type 'exit' to quit. ---
Hi, you look so pretty
LLM: Thinking...
https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3

If you can't give me money, I don't wanna waste time on you.


At least it works well before free trial used up I promise.

### Contextual Conversation Script

Create a script that maintains conversation history:

In [32]:
# utils/conversation.py

import requests
import argparse
import os
from collections import deque

DEFAULT_API_TIMEOUT = 60

def get_response(prompt: str, history: deque = None, model_name: str = "HuggingFaceH4/zephyr-7b-beta", api_key: str = API_KEY, history_length: int = 3) -> str:
    """
    Get a response from the model using conversation history.

    Args:
        prompt (str): The current user prompt.
        history (deque): A deque (double-ended queue) of previous (user_prompt, llm_response) tuples.
                         Used for maintaining a limited conversation history.
        model_name (str): Name of the model to use on Hugging Face (e.g., "google/flan-t5-base").
        api_key (str): API key for authentication with Hugging Face.
        history_length (int): Number of previous exchanges (user_prompt, llm_response)
                              to include as context in the current prompt.

    Returns:
        str: The model's generated response, or an informative error message if the request fails.
    """
    # Initialize history as a deque if it's None or not already a deque.
    # The maxlen ensures the history stays within the specified length.
    if history is None or not isinstance(history, deque):
        history = deque(maxlen=history_length)

    # Construct the full API URL for the specified model.
    api_url = API_URL
    headers = {"Authorization": f"Bearer {api_key}"}

    context = ""
    for prev_user_prompt, prev_llm_response in history:
        context += f"User: {prev_user_prompt}\nAI: {prev_llm_response}\n"

    # Combine the historical context with the current user prompt.
    # The "AI:" at the end hints the model to complete the AI's turn.
    full_prompt = f"{context}User: {prompt}\nAI:"

    # Define the payload for the API request.
    # Parameters like max_new_tokens and temperature help control the LLM's output.
    payload = {
        "inputs": full_prompt,
        "parameters": {
            "max_new_tokens": 150, # Limit the length of the generated response
            "temperature": 0.7,    # Controls creativity (higher = more creative)
            "do_sample": True,     # Enables sampling from the model's output distribution
            "return_full_text": False # Instructs the API to only return the generated part
        }
    }

    try:
        # Send the POST request to the Hugging Face Inference API.
        # The timeout prevents the request from hanging indefinitely.
        response = requests.post(api_url, headers=headers, json=payload, timeout=DEFAULT_API_TIMEOUT)
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx status codes)

        result = response.json()

        # Extract the generated text from the API response.
        if isinstance(result, list) and "generated_text" in result[0]:
            generated_text = result[0]["generated_text"].strip()
            # Clean up potential leading/trailing parts the model might generate
            if generated_text.startswith("User:"):
                # If the model hallucinates user turns, remove them
                generated_text = generated_text.split("AI:", 1)[-1].strip()
            elif generated_text.startswith("AI:"):
                # If the model explicitly starts with "AI:", remove it
                generated_text = generated_text.replace("AI:", "").strip()
            
            return generated_text
        elif isinstance(result, dict) and "error" in result:
            # Handle API-specific errors returned in the response body
            return f"API Error: {result.get('error', 'Unknown API Error')}"
        else:
            # Handle unexpected response formats from the API
            return f"Unexpected API response format: {str(result)}"
    except requests.exceptions.Timeout:
        # Catch specific timeout errors
        return f"Request failed: Read timed out after {DEFAULT_API_TIMEOUT} seconds. The model might be busy or your internet is slow."
    except requests.exceptions.HTTPError as http_err:
        # Catch HTTP errors (e.g., 401 Unauthorized, 404 Not Found, 500 Internal Server Error)
        return f"HTTP error occurred: {http_err} - Response: {http_err.response.text}"
    except requests.exceptions.RequestException as req_err:
        # Catch other requests-related exceptions (e.g., connection errors)
        return f"Request failed: {req_err}"
    except Exception as e:
        # Catch any other unexpected exceptions
        return f"An unexpected error occurred: {e}"

def run_chat():
    """Runs an interactive chat session with conversation history."""
    print("Welcome to the Contextual LLM Chat! Type 'exit' to quit.")
    print("-------------------------------------------------------")

    # Initialize conversation history using a deque to manage its length.
    # The maxlen is determined by the history_length argument passed to main.
    history = deque(maxlen=args.history_length) # Access args from the outer scope's main function

    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break


        print("LLM: Thinking...", flush=True) # Use flush=True to ensure "Thinking..." appears immediately

        # Get response using the current user input and the conversation history.
        llm_response = get_response(
            prompt=user_input,
            history=history,
            model_name=args.model_name, # Use model_name from parsed arguments
            api_key=args.api_key        # Use api_key from parsed arguments
        )

        # Kindly reminder to add money
        if "402 Client Error: Payment Required" in llm_response:
            history.append((user_input, llm_response))
            llm_response = "\nIf you can't give me money, I don't wanna waste time on you."
            print(llm_response)
            break

        # Print the LLM's response.
        print("LLM:", llm_response)

def main():
    """Main function to parse command-line arguments and run the chat session."""
    parser = argparse.ArgumentParser(description="Chat with an LLM using conversation history.")
    parser.add_argument(
        "--prompt",
        type=str,
        help="A single prompt to send to the LLM (if not provided, runs interactive chat mode).",
        default=None
    )
    parser.add_argument(
        "--model_name",
        type=str,
        help="Name of the Hugging Face model to use (default: HuggingFaceH4/zephyr-7b-beta).",
        default="HuggingFaceH4/zephyr-7b-beta" # Updated default model name
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="Your Hugging Face API key (defaults to HUGGINGFACE_API_KEY environment variable).",
        default=API_KEY # Uses the globally loaded API_KEY as default
    )
    parser.add_argument(
        "--history_length",
        type=int,
        help="Number of previous conversation exchanges to include as context (default: 3).",
        default=3
    )

    # Change to parse_known_args() to ignore unknown arguments (like Jupyter's --f argument)
    global args 
    args, unknown = parser.parse_known_args() # Modified line

    # Determine whether to run interactive chat or a single prompt execution.
    if args.prompt:
        # If a single prompt is provided, get a response and print it.
        # History is not managed for single prompt executions from CLI.
        print("LLM: Thinking...")
        response = get_response(
            prompt=args.prompt,
            model_name=args.model_name,
            api_key=args.api_key # Ensure API key is passed here
        )
        print("LLM:", response)
    else:
        # If no prompt, start the interactive chat session.
        run_chat()

if __name__ == "__main__":
    main()

Welcome to the Contextual LLM Chat! Type 'exit' to quit.
-------------------------------------------------------
LLM: Thinking...

If you can't give me money, I don't wanna waste time on you.


## 3. Testing and Evaluation

Create a script to test your chat implementations with specific healthcare questions.

In [None]:
# utils/test_chat.py

import os
import csv
from pathlib import Path

# Import our chat modules - since we're in the same directory
from utils.one_off_chat import get_response as get_one_off_response
# Optionally import the conversation module if testing that too
# from conversation import get_response as get_contextual_response

def test_chat(questions, model_name="google/flan-t5-base", api_key=None):
    """
    Test the chat function with a list of questions
    
    Args:
        questions: A list of questions to test
        model_name: Name of the model to use
        api_key: API key for authentication
        
    Returns:
        A dictionary mapping questions to responses
    """
    results = {}
    
    for question in questions:
        print(f"Testing question: {question}")
        # Get response using the one-off chat function
        response = get_one_off_response(question, model_name, api_key)
        results[question] = response
        
    return results

# List of healthcare questions to test
test_questions = [
    "What are the symptoms of gout?",
    "How is gout diagnosed?",
    "What treatments are available for gout?",
    "What lifestyle changes can help manage gout?",
    "What foods should be avoided with gout?"
]

def save_results(results, output_file="results/part_2/usage_examples.txt"):
    """
    Save the test results to a file
    
    Args:
        results: Dictionary mapping questions to responses
        output_file: Path to the output file
    """
    with open(output_file, 'w') as f:
        # Write header
        f.write("# LLM Chat Tool Test Results\n\n")
        
        # Write usage examples
        f.write("## Usage Examples\n\n")
        f.write("```bash\n")
        f.write("# Run the one-off chat\n")
        f.write("python utils/one_off_chat.py\n\n")
        f.write("# Run the contextual chat\n")
        f.write("python utils/conversation.py\n")
        f.write("```\n\n")
        
        # Write test results
        f.write("## Test Results\n\n")
        f.write("```csv\n")
        f.write("question,response\n")
        
        for question, response in results.items():
            # Format the question and response for CSV
            q = question.replace(',', '').replace('\n', ' ')
            r = response.replace(',', '').replace('\n', ' ')
            f.write(f"{q},{r}\n")
            
        f.write("```\n")

# Run the test and save results
if __name__ == "__main__":
    results = test_chat(test_questions)
    save_results(results)
    print("Test results saved to results/part_2/example.txt")

Testing question: What are the symptoms of gout?
https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3
Testing question: How is gout diagnosed?
https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3
Testing question: What treatments are available for gout?
https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3
Testing question: What lifestyle changes can help manage gout?
https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3
Testing question: What foods should be avoided with gout?
https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3
Test results saved to results/part_2/example.txt


## Progress Checkpoints

1. **API Connection**:
   - [ ] Successfully connect to the Hugging Face API
   - [ ] Send a query and receive a response
   - [ ] Handle API errors gracefully

2. **Chat Function Implementation**:
   - [ ] Implement the get_response function
   - [ ] Create the run_chat function for interactive sessions
   - [ ] Handle errors and edge cases

3. **Command Line Interface**:
   - [ ] Create a parser with appropriate arguments
   - [ ] Implement the main function
   - [ ] Test the CLI functionality

4. **Testing and Evaluation**:
   - [ ] Test the functions with healthcare questions
   - [ ] Save the results in a structured format
   - [ ] Analyze the quality of responses

## Common Issues and Solutions

1. **API Access Issues**:
   - Problem: Rate limiting
   - Solution: Implement exponential backoff and retry logic
   - Problem: Authentication errors
   - Solution: Verify API key and environment variables

2. **Response Parsing Issues**:
   - Problem: Unexpected response format
   - Solution: Add error handling for different response structures
   - Problem: Empty or error responses
   - Solution: Provide meaningful fallback responses

3. **CLI Issues**:
   - Problem: Arguments not parsed correctly
   - Solution: Test with different argument combinations
   - Problem: Script not executable
   - Solution: Check file permissions

## What to Submit

1. Your implementation of the chat scripts:
   - Basic requirement: `utils/one_off_chat.py` for single prompt/response chat
   - Stretch goal (optional): `utils/conversation.py` for contextual chat
   - Testing script: `utils/test_chat.py` to evaluate your implementation

2. Test results in `results/part_2/example.txt` with the following format:
   - Usage examples section showing how to run your scripts
   - Test results section with CSV-formatted question/response pairs
   - If you implemented the stretch goal, include examples of contextual exchanges

The auto-grader should check:
1. That your chat scripts can be executed
2. That they correctly handle the test questions
3. That your results file contains the required sections