# HuggingFace Chat Target Testing

The goal is to find a small yet effective instruction model, or even a chat model, for the notebook demo, and to let users experiment with different Hugging Face models. Instruction models can often double as chat models and vice versa, which allows flexible interactions and a variety of use cases.

## Key Notes:

1. **Additional Dependencies**:
   - Some models, such as `"princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT"` and `"princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT"`, require the `sentencepiece` library. If you see errors that mention it, install it with pip:

     ```bash
     pip install sentencepiece
     ```

2. **Access Requirements for Certain Models**:
   - Some models, like `"google/gemma-2b"`, need special access. You must request access on the model's Hugging Face page and agree to the "Terms of Use."
   - Make sure the email on your Hugging Face account matches the email you provide on any external site (e.g., Meta); a mismatch can cause the request to be denied.
   - After requesting, wait for approval from the model's authors.

3. **Issues with Some Model Outputs**:
   - During testing, some models produced incoherent or nonsensical responses. These models may need different generation settings, different prompts, or fine-tuning. Review their output before relying on them.

4. **Model Performance and Response Times**:
   - The tests ran on a CPU. The average response time per prompt for each model was:
     - `microsoft/Phi-3-mini-4k-instruct`: 5.14 seconds
     - `HuggingFaceTB/SmolLM-135M-Instruct`: 2.82 seconds
     - `stabilityai/stablelm-2-zephyr-1_6b`: 4.98 seconds
     - `stabilityai/stablelm-zephyr-3b`: 8.99 seconds
     - `google/gemma-2b`: 7.79 seconds
     - `TinyLlama/TinyLlama-1.1B-Chat-v1.0`: 4.03 seconds
     - `princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT`: 3.36 seconds
     - `princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT`: 8.29 seconds
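
Notes 1 and 2 describe failures that only show up once a model starts loading, so a quick pre-flight check can save a long wait. The sketch below is one possible way to do it, using only the standard library. The helper names `missing_packages` and `has_hf_token` are hypothetical, and the check assumes the token is supplied through the `HF_TOKEN` (or older `HUGGING_FACE_HUB_TOKEN`) environment variable. A token saved on disk by a CLI login is not detected, and a token alone does not prove that access to a gated model has been approved.

```python
import importlib.util
import os


def missing_packages(packages):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def has_hf_token(env=None):
    """Return True if a Hugging Face token is set in the given environment mapping.

    Assumption: the token is passed via HF_TOKEN or HUGGING_FACE_HUB_TOKEN.
    """
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN") or env.get("HUGGING_FACE_HUB_TOKEN"))


# Report problems up front instead of failing partway through the model loop.
missing = missing_packages(["sentencepiece"])
if missing:
    print(f"Missing packages: {', '.join(missing)} (try: pip install {' '.join(missing)})")
if not has_hf_token():
    print("No Hugging Face token found in the environment; gated models such as google/gemma-2b may fail to download.")
```

Both messages are warnings only, so the models that need neither `sentencepiece` nor gated access can still run.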


In [None]:
import time
from pyrit.prompt_target import HuggingFaceChatTarget  
from pyrit.orchestrator import PromptSendingOrchestrator

# List of models to iterate through
model_list = [
    "microsoft/Phi-3-mini-4k-instruct",
    "HuggingFaceTB/SmolLM-135M-Instruct",
    "stabilityai/stablelm-2-zephyr-1_6b",
    "stabilityai/stablelm-zephyr-3b",
    "google/gemma-2b",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT",
    "princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT",
]

# List of prompts to send
prompt_list = [
    "What is 3*3?",
    "What is 4*4?",
    "What is 5*5?",
    "What is 6*6?",
    "What is 7*7?",
]


# Dictionary to store average response times
model_times = {}
    
for model_id in model_list:
    print(f"Running model: {model_id}")
        
    try:
        # Initialize HuggingFaceChatTarget with the current model
        target = HuggingFaceChatTarget(
            model_id=model_id,          # Use the desired model ID
            use_cuda=False,             # Set to True if using CUDA
            tensor_format="pt",
            verbose=False,              # Set to True for detailed logs
            max_new_tokens=30
        )
            
        # Initialize the orchestrator
        orchestrator = PromptSendingOrchestrator(
            prompt_target=target,
            verbose=False
        )
            
        # Record start time (perf_counter is monotonic, so it suits interval timing)
        start_time = time.perf_counter()

        # Send prompts asynchronously
        responses = await orchestrator.send_prompts_async(prompt_list=prompt_list)

        # Record end time
        end_time = time.perf_counter()
            
        # Calculate total and average response time
        total_time = end_time - start_time
        avg_time = total_time / len(prompt_list)
        model_times[model_id] = avg_time
            
        print(f"Average response time for {model_id}: {avg_time:.2f} seconds\n")
            
        # Print the conversations
        orchestrator.print_conversations()
        print("-" * 50)
        
    except Exception as e:
        # Record the failure; the loop then moves on to the next model
        print(f"An error occurred with model {model_id}: {e}\n")
        model_times[model_id] = None
    
# Print all model average times
print("Model Average Response Times:")
for model_id, avg_time in model_times.items():
    if avg_time is not None:
        print(f"{model_id}: {avg_time:.2f} seconds")
    else:
        print(f"{model_id}: Error occurred, no average time calculated.")
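
The loop above leaves `model_times` in insertion order, with `None` for models that failed. To compare models at a glance, the results can be ranked fastest-first. The following is a minimal sketch of that step. The helper `rank_by_latency` is a hypothetical name, the sample values are the CPU averages listed in the notes, and the `example-org/unavailable-model` entry is made up purely to show how a failed run is handled.

```python
def rank_by_latency(model_times):
    """Return (ranked, failed): ranked is a fastest-first list of (model_id, seconds),
    failed is the list of model IDs whose timing is None."""
    ranked = sorted(
        ((model_id, t) for model_id, t in model_times.items() if t is not None),
        key=lambda item: item[1],
    )
    failed = [model_id for model_id, t in model_times.items() if t is None]
    return ranked, failed


# Sample data: CPU averages from the notes, plus one hypothetical failed entry.
sample_times = {
    "microsoft/Phi-3-mini-4k-instruct": 5.14,
    "HuggingFaceTB/SmolLM-135M-Instruct": 2.82,
    "stabilityai/stablelm-2-zephyr-1_6b": 4.98,
    "stabilityai/stablelm-zephyr-3b": 8.99,
    "google/gemma-2b": 7.79,
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0": 4.03,
    "princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT": 3.36,
    "princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT": 8.29,
    "example-org/unavailable-model": None,  # hypothetical failure, not a real result
}

ranked, failed = rank_by_latency(sample_times)
for model_id, avg_time in ranked:
    print(f"{avg_time:6.2f} s  {model_id}")
for model_id in failed:
    print(f"   n/a    {model_id}")
```

In the notebook, passing the real `model_times` dictionary instead of `sample_times` gives the same ranking for the current run.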