# Model Integration with Streaming and Batching

This notebook demonstrates how to integrate different AI models (Ollama, OpenAI, Google Gemini, and Groq) with LangChain, with a focus on streaming and batching. Streaming displays output while it is being generated, and batching processes several inputs in a single call.

### Model Integration with Ollama, OpenAI, Google Gemini, and Groq

### Ollama

In [None]:
from langchain_ollama import ChatOllama

OLLAMA_URL = "http://localhost:11434"  # Default local Ollama server; change this if yours runs elsewhere
OLLAMA_MODEL = "granite4:tiny-h"

model = ChatOllama(model=OLLAMA_MODEL, base_url=OLLAMA_URL)
response = model.invoke("What do you know about GenAI?")
print(response.content)

### OpenAI

In [None]:
from langchain_openai import ChatOpenAI

# Requires OPENAI_API_KEY to be set in the environment
model = ChatOpenAI(model="gpt-4.1")
response = model.invoke("What do you know about GenAI?")
print(response.content)

### Google Gemini

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Requires GOOGLE_API_KEY to be set in the environment
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")
response = model.invoke("What do you know about GenAI?")
print(response.content)

### Groq

In [None]:
from langchain_groq import ChatGroq

# Requires GROQ_API_KEY to be set in the environment
model = ChatGroq(model="qwen/qwen3-32b")
response = model.invoke("What do you know about GenAI?")
print(response.content)

### Initializing any of these models in a universal way with `init_chat_model`

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()

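# load_dotenv() copies the keys from your .env file into os.environ.
# Re-assigning them is unnecessary, and assigning None (a missing key) raises a TypeError,
# so just warn about any key that is missing.
for key in ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GROQ_API_KEY"):
    if not os.getenv(key):
        print(f"Warning: {key} is not set")
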
In [None]:
from langchain.chat_models import init_chat_model
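# Prefix the provider so it does not have to be guessed from the model name
# (guessing fails for names like "qwen/qwen3-32b"), e.g.
# "openai:gpt-4.1", "google_genai:gemini-2.5-flash-lite", "groq:qwen/qwen3-32b"
model = init_chat_model("openai:gpt-4.1")
response = model.invoke("Hello, how are you?")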
response.content
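LangChain splits a `"provider:model"` string at the first colon when the prefix is a known provider, so model ids that contain a slash (Groq) or a colon of their own (Ollama tags) stay intact. The helper below is a hypothetical, pure-Python illustration of that convention, not LangChain's own code; the provider list is an assumption trimmed to the providers used in this notebook.

```python
from typing import Optional, Tuple

# Hypothetical subset of provider names, limited to the ones used in this notebook
KNOWN_PROVIDERS = ("openai", "google_genai", "groq", "ollama")

def split_model_string(spec: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split 'provider:model' on the first colon only."""
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in KNOWN_PROVIDERS:
        return prefix, rest
    # No recognised provider prefix: the provider must be inferred or passed separately
    return None, spec

print(split_model_string("groq:qwen/qwen3-32b"))     # ('groq', 'qwen/qwen3-32b')
print(split_model_string("ollama:granite4:tiny-h"))  # ('ollama', 'granite4:tiny-h')
print(split_model_string("gpt-4.1"))                 # (None, 'gpt-4.1')
```

The Ollama case shows why only the first colon matters: tags such as `granite4:tiny-h` already contain one. LangChain also accepts the provider as a separate argument, for example `init_chat_model("qwen/qwen3-32b", model_provider="groq")`.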

### Streaming and Batching

### Streaming

Most models can stream their output while it is being generated. Displaying output progressively significantly improves the user experience, particularly for longer responses. Calling `stream()` returns an iterator that yields output chunks (`AIMessageChunk` objects) as they are produced; loop over it and print each chunk's `.content` to show the response in real time:
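To see the loop pattern without a server or API key, here is a minimal pure-Python sketch. `fake_stream` is a hypothetical stand-in for `model.stream()` that yields plain strings; with a real model you would print `chunk.content` instead of `chunk`.

```python
from typing import Iterator

def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for model.stream(): yields the reply word by word."""
    words = text.split(" ")
    for i, word in enumerate(words):
        # Re-attach the space removed by split(), except after the last word
        yield word if i == len(words) - 1 else word + " "

def consume_stream(chunks: Iterator[str]) -> str:
    """Print each chunk as soon as it arrives and return the accumulated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # progressive display
        parts.append(chunk)               # keep the chunk for later use
    print()
    return "".join(parts)

full_text = consume_stream(fake_stream("Qubits can be 0 and 1 at once."))
```

Keeping the chunks lets you use the complete answer after streaming finishes. In LangChain, `AIMessageChunk` objects support `+`, so `full = chunk if full is None else full + chunk` rebuilds the whole message inside the loop.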

#### Streaming with Ollama

In [None]:
from langchain_ollama import ChatOllama

# Initialize Ollama model for streaming
OLLAMA_URL = "http://localhost:11434"  # Default local Ollama server; change this if yours runs elsewhere
OLLAMA_MODEL = "granite4:tiny-h"

model = ChatOllama(model=OLLAMA_MODEL, base_url=OLLAMA_URL)

# Streaming example
print("Streaming response from Ollama:")
for chunk in model.stream("Explain quantum computing in simple terms:"):
    print(chunk.content, end="", flush=True)
print("\n")

#### Streaming with OpenAI

In [None]:
from langchain_openai import ChatOpenAI

# Initialize OpenAI model for streaming (make sure to set your API key)
model = ChatOpenAI(model="gpt-4.1")

# Streaming example
print("Streaming response from OpenAI:")
for chunk in model.stream("Explain quantum computing in simple terms:"):
    print(chunk.content, end="", flush=True)
print("\n")

#### Streaming with Google Gemini

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Initialize Google Gemini model for streaming (make sure to set your API key)
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")

# Streaming example
print("Streaming response from Google Gemini:")
for chunk in model.stream("Explain quantum computing in simple terms:"):
    print(chunk.content, end="", flush=True)
print("\n")

#### Streaming with Groq

In [None]:
from langchain_groq import ChatGroq

# Initialize Groq model for streaming (make sure to set your API key)
model = ChatGroq(model="qwen/qwen3-32b")

# Streaming example
print("Streaming response from Groq:")
for chunk in model.stream("Explain quantum computing in simple terms:"):
    print(chunk.content, end="", flush=True)
print("\n")

### Batching

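Batching lets you pass several inputs to a single `batch()` call. By default LangChain runs the underlying model calls concurrently on a thread pool and returns the results in input order, which is usually faster than calling `invoke()` once per input. This is particularly useful when you have several independent queries to process.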
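The speed-up comes from overlapping the waiting time of each call. The stdlib sketch below mimics that behaviour with a thread pool; `fake_invoke` and its 0.2 s latency are hypothetical stand-ins for a real model call, and `max_concurrency` mirrors the LangChain config key of the same name.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_invoke(question: str) -> str:
    """Hypothetical stand-in for model.invoke(): simulates network latency."""
    time.sleep(0.2)
    return f"Answer to: {question}"

def batch(inputs, max_concurrency=None):
    """Run fake_invoke on every input concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map yields results in input order, whichever call finishes first
        return list(pool.map(fake_invoke, inputs))

questions = ["Capital of France?", "Capital of Germany?", "Capital of Italy?"]
start = time.perf_counter()
answers = batch(questions, max_concurrency=3)
elapsed = time.perf_counter() - start
print(answers)
print(f"{elapsed:.2f}s for {len(questions)} calls")  # roughly 0.2s instead of 0.6s
```

With a real model the equivalent knob is `model.batch(messages, config={"max_concurrency": 2})`, which is handy for staying under a provider's rate limits.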

In [None]:
# Batching example
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage

# Initialize model
model = init_chat_model("gpt-4.1")  # Change as needed

# Create multiple messages for batching
messages = [
    [HumanMessage(content="What is the capital of France?")],
    [HumanMessage(content="What is the capital of Germany?")],
    [HumanMessage(content="What is the capital of Italy?")]
]

# Process all messages in a batch
print("Batch processing results:")
responses = model.batch(messages)
for i, response in enumerate(responses):
    print(f"Question {i+1}: {messages[i][0].content}")
    print(f"Answer: {response.content}\n")
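`batch()` returns only once every input is done. When some answers take much longer than others, recent `langchain_core` versions also offer `model.batch_as_completed(messages)`, which yields `(index, response)` pairs as each call finishes. The stdlib sketch below shows the same idea with `concurrent.futures.as_completed`; `fake_invoke` and its delays are hypothetical stand-ins for real model calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_invoke(question: str, delay: float) -> str:
    """Hypothetical stand-in for model.invoke() with a fixed latency."""
    time.sleep(delay)
    return f"Answer to: {question}"

jobs = [("Explain quantum computing in detail", 0.3), ("What is the capital of Italy?", 0.05)]

finished = []  # (input index, answer) pairs in completion order
with ThreadPoolExecutor() as pool:
    # Map each future back to the index of the input that produced it
    futures = {pool.submit(fake_invoke, q, d): i for i, (q, d) in enumerate(jobs)}
    for future in as_completed(futures):
        index = futures[future]
        finished.append((index, future.result()))
        print(f"Input {index} done: {future.result()}")
```

Returning the index with each result is what lets you match out-of-order answers back to their questions.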