# Example of Using a Chat Model in LangChain
Here are basic examples illustrating how to use chat models in LangChain, with Hugging Face and OpenAI models as backends:

Chat models are language models that take conversational messages as input and return messages as output. They suit interactive applications such as chatbots, virtual assistants, and other dialog-based interfaces.
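
To make "message-based" concrete, here is a minimal, library-free sketch of a conversation as a list of role/content messages. The `Message` class and `fake_chat_model` function are hypothetical stand-ins written for illustration; they are not LangChain classes, and the echo reply only mimics a model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # e.g. "system", "human", or "ai"
    content: str


def fake_chat_model(messages):
    """Hypothetical stand-in for a chat model: echoes the last human message."""
    last_human = [m for m in messages if m.role == "human"][-1]
    return Message(role="ai", content=f"You said: {last_human.content}")


# A conversation is an ordered list of messages
conversation = [
    Message("system", "You are a helpful assistant."),
    Message("human", "Hello!"),
]

# The model takes messages in and returns a message, which joins the history
reply = fake_chat_model(conversation)
conversation.append(reply)
print(reply.content)
```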

# 1. Basic Chat Model Setup with Hugging Face


In [None]:
pip install -qU langchain huggingface_hub


In [None]:
from langchain.llms import HuggingFaceHub
import os

# Set your Hugging Face API token (placeholder; replace with your own token)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "your-api-token"

# Initialize a language model hosted on the Hugging Face Hub
chat_model = HuggingFaceHub(repo_id="microsoft/DialoGPT-medium", model_kwargs={"temperature": 0.7})

# Send a message to simulate a conversation
response = chat_model.invoke("Hello! How can I assist you today?")
print(response)


# 2. Function and Tool Calling with Chat Models

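Chat models can invoke functions, or “tools”, based on the user's input. This is useful when an application needs a specific action, such as retrieving weather information or performing a calculation. The example below mimics that behavior with a plain keyword check in Python; no model is called.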

In [None]:
import re

def calculate_square(num):
    """Return the square of a number."""
    return num ** 2

# Answer "square of N" questions; fall back to a generic reply otherwise
def get_square_response(question):
    # Extract the integer that follows "square of" (e.g. "square of 5")
    match = re.search(r"square of\s+(-?\d+)", question)
    if match:
        number = int(match.group(1))
        result = calculate_square(number)
        return f"The square of {number} is {result}."
    else:
        return "I'm here to answer your questions!"

# Sample input
question = "What is the square of 5?"
response = get_square_response(question)
print(response)
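
In a real tool-calling flow, the model does not run the function itself. It returns a structured request naming a tool and its arguments, and the application executes that request. The dispatch step might look roughly like the sketch below. The `TOOLS` registry, the `run_tool_call` helper, and the shape of the `tool_call` dictionary are assumptions made for illustration, not LangChain's API.

```python
def calculate_square(num):
    return num ** 2


# Hypothetical registry mapping tool names to plain Python functions
TOOLS = {"calculate_square": calculate_square}


def run_tool_call(tool_call):
    """Look up the requested tool and call it with the supplied arguments."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**tool_call["args"])


# Assumed shape of a tool request as a model might emit it (hard-coded here)
tool_call = {"name": "calculate_square", "args": {"num": 5}}
tool_result = run_tool_call(tool_call)
print(tool_result)
```

Keeping the registry explicit means the application, not the model, decides which functions can be executed.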


# 3. Returning Structured Output
For applications that require structured responses, like JSON, you can guide the model to return structured data:

In [None]:
structured_prompt = "Provide the answer in JSON format with fields 'greeting' and 'response'."
response = chat_model.invoke(f"{structured_prompt} Hello!")
print(response)  # Possible output: {"greeting": "Hello", "response": "Hi there!"} (valid JSON is not guaranteed)
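
Because a prompt can only request JSON, not enforce it, it helps to validate the reply before using it. The following is a minimal sketch using only the standard library. The `parse_structured_reply` helper and the two sample replies are hypothetical, hard-coded so the example runs without an API call.

```python
import json

REQUIRED_FIELDS = ("greeting", "response")


def parse_structured_reply(raw_text):
    """Parse a JSON reply and check that the requested fields are present.

    Returns the parsed dict, or None if the text is not a JSON object
    or a required field is missing.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(field not in data for field in REQUIRED_FIELDS):
        return None
    return data


# Hypothetical model outputs: one well-formed, one free text
good_reply = '{"greeting": "Hello", "response": "Hi there!"}'
bad_reply = "Sure! Here is your answer: Hello"

parsed_good = parse_structured_reply(good_reply)
parsed_bad = parse_structured_reply(bad_reply)
print(parsed_good)
print(parsed_bad)
```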


# 4. Caching Model Responses
Caching saves responses for repeated queries, which can reduce latency and costs.

In [None]:
# Requires the langchain-openai package (pip install -qU langchain-openai)
from langchain_openai import ChatOpenAI

# Initialize the model (reads OPENAI_API_KEY from the environment)
chat_model = ChatOpenAI()

# Simple cache dictionary
cache = {}

# Caching wrapper function
def cached_invoke(prompt):
    if prompt in cache:
        print("Retrieving from cache...")
        return cache[prompt]
    else:
        print("Generating new response...")
        response = chat_model.invoke(prompt)
        cache[prompt] = response
        return response

# Testing the cache
response_1 = cached_invoke("Hello!")
response_2 = cached_invoke("Hello!")  # This should retrieve from cache
print(response_1)
print(response_2)
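
To check that a cache like this really avoids repeated model calls, the same pattern can be run against a stand-in that records each call. This is a sketch under stated assumptions: `fake_model`, `demo_cache`, and `cached_call` are hypothetical names, and `fake_model` replaces `chat_model.invoke` so no API key is needed.

```python
calls = []  # records every prompt that actually reaches the model


def fake_model(prompt):
    """Hypothetical stand-in for chat_model.invoke."""
    calls.append(prompt)
    return f"Echo: {prompt}"


demo_cache = {}


def cached_call(prompt):
    # Only call the model on a cache miss
    if prompt not in demo_cache:
        demo_cache[prompt] = fake_model(prompt)
    return demo_cache[prompt]


first = cached_call("Hello!")
second = cached_call("Hello!")   # served from demo_cache
third = cached_call("Goodbye!")  # new prompt, so the model runs again
print(len(calls))
```

The cache key is the exact prompt string, so prompts that differ only in whitespace or capitalization count as separate entries.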


# 5. Log Probabilities of Responses
Some chat models, such as the OpenAI models used below, can return token-level log probabilities, which are useful for debugging or for gauging how confident the model was in its response.

In [None]:
pip install -qU langchain-openai

In [None]:
import getpass
import os

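# Prompt for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")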

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini").bind(logprobs=True)

msg = llm.invoke(("human", "how are you today"))

msg.response_metadata["logprobs"]["content"][:5]

In [None]:
ct = 0
full = None
for chunk in llm.stream(("human", "how are you today")):
    if ct < 5:
        full = chunk if full is None else full + chunk
        if "logprobs" in full.response_metadata:
            print(full.response_metadata["logprobs"]["content"])
    else:
        break
    ct += 1
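
Log probabilities are easier to read once converted to plain probabilities. The sketch below does this with the standard library. The `sample_logprobs` list is made-up data, and it assumes each entry carries `token` and `logprob` keys like the entries under `response_metadata["logprobs"]["content"]`.

```python
import math

# Hypothetical sample; real entries may carry extra keys such as "bytes"
sample_logprobs = [
    {"token": "I", "logprob": -0.25},
    {"token": "'m", "logprob": -0.5},
    {"token": " fine", "logprob": -1.25},
]

# exp(logprob) gives each token's probability
token_probs = [(entry["token"], math.exp(entry["logprob"])) for entry in sample_logprobs]

# Log probabilities add, so the probability of the whole sequence is exp of their sum
total_logprob = sum(entry["logprob"] for entry in sample_logprobs)
sequence_prob = math.exp(total_logprob)

for token, prob in token_probs:
    print(f"{token!r}: {prob:.3f}")
print(f"sequence probability: {sequence_prob:.4f}")
```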

# 6. Creating a Custom Chat Model Class

In [None]:
pip install -qU transformers torch

In [None]:
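from transformers import AutoModelForCausalLM, AutoTokenizer
import torch  # PyTorch must be installed for return_tensors="pt"


class MyCustomChatModel:
    """Minimal wrapper around a Hugging Face causal LM that exposes invoke()."""

    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def invoke(self, message):
        # Append a marker to show where custom preprocessing would go
        customized_message = message + " (Customized)"
        inputs = self.tokenizer(customized_message, return_tensors="pt")
        # Use the EOS token for padding (GPT-2-style tokenizers such as DialoGPT's define no pad token)
        outputs = self.model.generate(**inputs, pad_token_id=self.tokenizer.eos_token_id)
        # Decode the full generated sequence (prompt plus continuation)
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)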

# Usage example
my_model = MyCustomChatModel("microsoft/DialoGPT-medium")
response = my_model.invoke("Hello!")
print(response)


# 7. Streaming Responses
For real-time applications, you can stream responses incrementally.

In [None]:
from langchain_openai import ChatOpenAI
import os

# Set up your API key
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = "your api key"

# Initialize the chat model with streaming enabled
chat_model = ChatOpenAI(streaming=True)

# Stream response chunks and print each chunk's text as it arrives
for chunk in chat_model.stream("Tell me something interesting!"):
    print(chunk.content, end="", flush=True)
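
The consuming side of streaming can be tried without an API key by swapping the model for a generator. In this sketch, `fake_stream` is a hypothetical stand-in for `chat_model.stream` that yields fixed-size pieces of a string. Real chunks are message objects and vary in size.

```python
def fake_stream(text, chunk_size=8):
    """Hypothetical stand-in for chat_model.stream: yields the text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


pieces = []
for piece in fake_stream("Chunks arrive one at a time."):
    print(piece, end="", flush=True)  # show each piece as soon as it arrives
    pieces.append(piece)
print()

# Joining the pieces reconstructs the complete response
full_text = "".join(pieces)
```

Collecting the pieces while printing them keeps the full response available for logging or caching once the stream ends.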
