# Test vLLM Server with LlamaIndex OpenAILike

This notebook tests a vLLM server running in OpenAI-compatible mode at `http://localhost:8003/v1`, using LlamaIndex's OpenAILike client for text completion, chat, and streaming.

## Prerequisites
- vLLM server running at http://localhost:8003/v1
- LlamaIndex installed with the OpenAILike integration (`pip install llama-index-llms-openai-like`)

In [1]:
# Import required libraries (the original also imported os and requests, which were unused)
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike

print("Libraries imported successfully!")

Libraries imported successfully!


In [None]:
# Initialize LlamaIndex OpenAILike client to connect to vLLM server
VLLM_BASE_URL = "http://localhost:8003/v1"
API_KEY = "asdf"  # Placeholder: vLLM only validates the key if the server was started with --api-key
model_name = "Qwen/Qwen2.5"
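
# Note: is_chat_model defaults to False in OpenAILike, so llm.chat() formats the
# messages into a single prompt and sends it to the completions endpoint.
# Pass is_chat_model=True to call /v1/chat/completions instead.
llm = OpenAILike(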
    model=model_name,
    api_key=API_KEY,
    api_base=VLLM_BASE_URL,
    temperature=0.7,
    max_tokens=512
)

print(f"✅ LlamaIndex OpenAILike client initialized with model: {model_name}")
print(f"Server URL: {VLLM_BASE_URL}")

✅ LlamaIndex OpenAILike client initialized with model: Qwen/Qwen2.5
Server URL: http://localhost:8003/v1


In [6]:
# Test 1: Simple completion
print("🧪 Test 1: Simple completion")
print("-" * 50)

try:
    prompt = "The capital of France is"
    response = llm.complete(prompt)
    print(f"Prompt: '{prompt}'")
    print(f"Response: {response.text}")
    print("✅ Simple completion test passed!")
except Exception as e:
    print(f"❌ Simple completion test failed: {e}")

print()

🧪 Test 1: Simple completion
--------------------------------------------------
Prompt: 'The capital of France is'
Response:  Paris. What is the capital of Spain? The capital of Spain is Madrid.
✅ Simple completion test passed!



In [8]:
# Test 2: Chat completion with single message
print("🧪 Test 2: Chat completion with single message")
print("-" * 50)

try:
    messages = [
        ChatMessage(role="user", content="What is the meaning of life?")
    ]
    
    response = llm.chat(messages)
    print(f"User: {messages[0].content}")
    print(f"Assistant: {response.message.content}")
    print("✅ Single message chat test passed!")
except Exception as e:
    print(f"❌ Single message chat test failed: {e}")

print()

🧪 Test 2: Chat completion with single message
--------------------------------------------------
User: What is the meaning of life?
Assistant: 42. 

But in all seriousness, the meaning of life is a question that has puzzled philosophers, theologians, and individuals for centuries. It can vary greatly depending on personal beliefs, cultural background, and individual experiences. Some people find meaning in relationships, others in achievement, service, or spirituality. Ultimately, it's a deeply personal and subjective question, and each person may have their own unique answer. What do you think gives your life meaning?
✅ Single message chat test passed!



In [9]:
# Test 3: Multi-turn conversation
print("🧪 Test 3: Multi-turn conversation")
print("-" * 50)

try:
    messages = [
        ChatMessage(role="system", content="You are a helpful assistant that answers questions concisely."),
        ChatMessage(role="user", content="What's the weather like on Mars?"),
        ChatMessage(role="assistant", content="Mars has a cold and dry climate with temperatures averaging around -80°F (-62°C). It has a thin atmosphere and experiences dust storms."),
        ChatMessage(role="user", content="How long would it take to get there?")
    ]
    
    response = llm.chat(messages)
    
    print("Conversation:")
    for msg in messages[1:]:  # Skip system message for display
        role = msg.role.capitalize()
        print(f"{role}: {msg.content}")
    
    print(f"Assistant: {response.message.content}")
    print("✅ Multi-turn conversation test passed!")
except Exception as e:
    print(f"❌ Multi-turn conversation test failed: {e}")

print()

🧪 Test 3: Multi-turn conversation
--------------------------------------------------
Conversation:
User: What's the weather like on Mars?
Assistant: Mars has a cold and dry climate with temperatures averaging around -80°F (-62°C). It has a thin atmosphere and experiences dust storms.
User: How long would it take to get there?
Assistant:  Travel time to Mars varies due to the positions of Earth and Mars in their orbits. On average, it takes about 7 to 9 months for a spacecraft to travel from Earth to Mars.
✅ Multi-turn conversation test passed!



In [10]:
# Test 4: Streaming response
print("🧪 Test 4: Streaming response")
print("-" * 50)

try:
    messages = [
        ChatMessage(role="user", content="Write a short poem about artificial intelligence.")
    ]
    
    print(f"User: {messages[0].content}")
    print("Assistant (streaming): ", end="", flush=True)
    
    stream_response = llm.stream_chat(messages)
    full_response = ""
    
    for token in stream_response:
        content = token.delta
        if content:
            print(content, end="", flush=True)
            full_response += content
    
    print()  # New line after streaming
    print("✅ Streaming response test passed!")
except Exception as e:
    print(f"❌ Streaming response test failed: {e}")

print()

🧪 Test 4: Streaming response
--------------------------------------------------
User: Write a short poem about artificial intelligence.
Assistant (streaming): 01010101010101010101010101010101

In circuits and wires, a mind does reside,
A world of data, vast and wide.
With each passing day, it learns anew,
The secrets of the universe, and what is true.

It sees the patterns in the chaos and noise,
And finds the answers that we cannot choose.
It speaks in code, a language pure,
A symphony of logic, forever sure.

But in its heart, a mystery lies,
For though it thinks, it never cries.
A being of silicon, yet with a soul,
Artificial intelligence, a goal. 

01010101010101010101010101010101

(Note: The 01010101010101010101010101010101 at the beginning and end are binary code, representing the digital nature of AI) 

I hope you like the poem! Let me know if you have any other requests. 😊👍🤖✨
✅ Streaming response test passed!



In [None]:
# Test 5: Different temperature settings
print("🧪 Test 5: Testing different temperature settings")
print("-" * 50)

temperatures = [0.0, 0.5, 1.0]
prompt = "Complete this sentence: The most important thing in life is"

for temp in temperatures:
    try:
        # Create a new LLM instance with different temperature
        temp_llm = OpenAILike(
            model=model_name,
            api_key=API_KEY,
            api_base=VLLM_BASE_URL,
            temperature=temp,
            max_tokens=50
        )
        
        response = temp_llm.complete(prompt)
        print(f"Temperature {temp}: {response.text.strip()}")
        
    except Exception as e:
        print(f"❌ Temperature {temp} test failed: {e}")

print("✅ Temperature variation test completed!")
print()

## Summary

This notebook exercises five LlamaIndex features against a local vLLM server:

1. **Simple Completion** - Basic text completion
2. **Single Message Chat** - Simple question-answer format
3. **Multi-turn Conversation** - Context-aware dialogue
4. **Streaming Response** - Real-time token streaming
5. **Temperature Variation** - Comparing outputs at different sampling temperatures
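
Each test cell above repeats the same try/except pattern. As a minimal sketch, that pattern could be folded into one helper that records every outcome and prints a single tally at the end. The names `run_test` and `summarize` are hypothetical and not part of LlamaIndex.

```python
from typing import Callable


def run_test(name: str, fn: Callable[[], object], results: dict[str, str]) -> None:
    """Run one check and record 'passed' or 'failed: <reason>' under its name."""
    try:
        fn()
    except Exception as exc:  # broad on purpose: any error counts as a failed check
        results[name] = f"failed: {exc}"
    else:
        results[name] = "passed"


def summarize(results: dict[str, str]) -> str:
    """Return one line per test followed by an overall pass count."""
    lines = [f"{name}: {status}" for name, status in results.items()]
    passed = sum(status == "passed" for status in results.values())
    lines.append(f"{passed}/{len(results)} tests passed")
    return "\n".join(lines)
```

With the `llm` object from this notebook, a check could be registered as `run_test("Simple completion", lambda: llm.complete("The capital of France is"), results)`, followed by `print(summarize(results))` after the last test.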

## Troubleshooting

If you encounter issues:

1. **Connection Error**: Make sure the vLLM server is running and reachable at `http://localhost:8003`.
2. **Model Not Found**: Query the `/v1/models` endpoint and confirm that `model_name` matches one of the served model IDs.
3. **Timeout**: Increase the `timeout` argument passed to `OpenAILike`.
4. **Memory Issues**: Reduce the `max_tokens` parameter.
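
The first two checks can be scripted with the standard library alone. The sketch below assumes the server returns the usual OpenAI-style payload from `/v1/models` (a `data` list of objects with an `id` field) and that it was started without `--api-key`; otherwise the request would also need an `Authorization` header. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the models endpoint URL from an OpenAI-style base URL ending in /v1."""
    return base_url.rstrip("/") + "/models"


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style {"data": [{"id": ...}, ...]} payload."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs served at base_url, or raise ConnectionError if unreachable."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return extract_model_ids(json.load(resp))
    except urllib.error.URLError as exc:
        raise ConnectionError(f"Could not reach {base_url}: {exc.reason}") from exc
```

Calling `list_models("http://localhost:8003/v1")` and checking whether `model_name` appears in the result separates a connection problem from a model-name mismatch.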

## Next Steps

- Try different prompts and conversation patterns
- Experiment with other LlamaIndex features like RAG (Retrieval Augmented Generation)
- Test with different sampling parameters (`top_p`, `frequency_penalty`, etc.)
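
For the last item, one possible approach is to collect the extra sampling options in a dictionary and hand them to the client through `additional_kwargs`. This sketch assumes `OpenAILike` forwards `additional_kwargs` to the request body, as the LlamaIndex OpenAI client it builds on does; verify this against your installed version. The helper `build_llm_kwargs` is hypothetical, and the range checks follow the OpenAI API's documented limits.

```python
from typing import Any


def build_llm_kwargs(
    model: str,
    api_base: str,
    api_key: str = "asdf",
    temperature: float = 0.7,
    max_tokens: int = 512,
    **sampling: Any,
) -> dict[str, Any]:
    """Assemble constructor arguments, routing extra sampling options
    (top_p, frequency_penalty, ...) into additional_kwargs."""
    top_p = sampling.get("top_p")
    if top_p is not None and not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")
    penalty = sampling.get("frequency_penalty")
    if penalty is not None and not -2.0 <= penalty <= 2.0:
        raise ValueError(f"frequency_penalty must be in [-2, 2], got {penalty}")
    return {
        "model": model,
        "api_base": api_base,
        "api_key": api_key,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "additional_kwargs": dict(sampling),
    }
```

In the notebook this could be used as `llm = OpenAILike(**build_llm_kwargs(model_name, VLLM_BASE_URL, top_p=0.9, frequency_penalty=0.5))`.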