# Models

This notebook walks through LLMs, chat models, and embedding models from different providers.

In [14]:
import warnings
warnings.filterwarnings("ignore")

## LLMs

- OpenAI

In [8]:
from langchain_openai import OpenAI
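# The completions-style `OpenAI` class expects an instruct model; "gpt-3.5-turbo" is a chat model.
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.7)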
try:
    result = llm.invoke("Explain LLMs in simple terms.")
    print(result)
except Exception as e:
    print(e)

Error code: 401 - {'error': {'message': 'Incorrect API key provided: YOUR_API_KEY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}


- Anthropic

In [15]:
from langchain_anthropic import AnthropicLLM
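# Note: AnthropicLLM wraps the legacy text-completions API; Claude 3 models are normally used through ChatAnthropic.
llm = AnthropicLLM(model="claude-3-sonnet-20240229", temperature=0.7)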
try:
    result = llm.invoke("Explain LLMs in simple terms.")
    print(result)
except Exception as e:
    print(e)

Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}


- Hugging Face

In [None]:
from langchain_huggingface import HuggingFaceEndpoint


repo_id = "tiiuae/falcon-rw-1b"
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    task="text-generation",
    temperature=0.5,
    max_new_tokens=128,
)
try:
    result = llm.invoke("Explain LLMs in simple terms.")
    print(result)
except Exception as e:
    print(e)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.



I don’t know what you mean by “simple”. I am not a mathematician or an engineer, and I am not a lawyer. I am a retired teacher.
I am not a lawyer, but I am a teacher. I am not a mathematician or an engineer, but I am a retired teacher. I am not a mathematician or an engineer, but I am a retired teacher.
I am not a mathematician or an engineer, but I am a retired teacher. I am not a mathematician or an engineer, but I am a retired teacher.
I am not a mathematician or an engineer, but


The free serverless API lets you prototype and iterate quickly, but it may be rate limited under heavy use because capacity is shared with other requests.
For enterprise workloads, Inference Endpoints (Dedicated) is the better fit. It provides fully managed infrastructure that offers more flexibility and speed, along with continuous support, uptime guarantees, and options such as autoscaling.
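
Because a shared serverless endpoint can throttle requests, one common way to cope is to retry with exponential backoff. The sketch below is only illustrative and uses the standard library alone: `RateLimitError`, `call_with_backoff`, and `flaky_call` are hypothetical names, not part of LangChain or Hugging Face. In practice you would catch whatever exception your client actually raises.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the throttling error a real client would raise."""


def call_with_backoff(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on RateLimitError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated endpoint that is throttled twice before it succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("too many requests")
    return "ok"


delays = []
# Record the delays instead of actually sleeping, so the sketch runs instantly.
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)
```

With a real model, `flaky_call` would be replaced by something like `lambda: llm.invoke(prompt)`.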

- Function calling

Function calling lets LLMs act more like agents, deciding when to invoke tools (APIs, functions, calculators, databases, etc.) based on natural language queries. It is useful for cases like:
    - Calling APIs (weather, flight, calendar)
    - Triggering internal business logic
    - Making structured outputs that are easier to parse downstream
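
To make the idea concrete, here is a minimal sketch of the application side of function calling: the model emits a structured request naming a tool and its arguments, and your code parses it and runs the matching function. Everything here is an assumption for illustration: `get_weather`, `TOOLS`, `dispatch`, and the hand-written JSON payload are hypothetical, and real providers each use their own response format.

```python
import json


def get_weather(location: str, unit: str = "fahrenheit") -> str:
    """Hypothetical tool; a real one would query a weather API."""
    return f"Weather for {location} in {unit}: (not fetched in this sketch)"


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"Unknown tool: {call['name']}")
    return tool(**call["arguments"])


# Assumed shape of a tool call; written by hand here, not produced by a model.
model_output = '{"name": "get_weather", "arguments": {"location": "San Francisco, CA"}}'
tool_result = dispatch(model_output)
print(tool_result)
```

Keeping the tools in a registry means the model can only trigger functions you have explicitly exposed.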



In [17]:
from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.pydantic_v1 import BaseModel, Field

class WeatherSearch(BaseModel):
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
    unit: str = Field("fahrenheit", description="The temperature unit to use")

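# Bind the schema as a `functions` argument. Whether it is honored depends on the model
# and endpoint; a plain text-generation model may simply ignore it and answer in free text.
llm = llm.bind(functions=[WeatherSearch.schema()])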


try:
    response = llm.invoke("What's the weather like in San Francisco?")
    print(response)
except Exception as e:
    print(e)


San Francisco is a beautiful city, with a temperate climate. The average temperature is around 50°F (10°C) in the summer and around 30°F (0°C) in the winter. The average annual rainfall is around 12 inches (30 cm).
The average annual rainfall is around 12 inches (30 cm).
What's the best time to visit San Francisco?
The best time to visit San Francisco is between May and October, when the weather is warm and sunny.
The best time to visit San Francisco is between May and October, when the weather is warm and sunny.
What's the


## Chat Models

In LangChain, an LLM handles plain text completion: one prompt string in, one response string out. A ChatModel is designed for multi-turn conversations and takes a sequence of messages (for example from the system, the user, or the assistant) as input. Use an LLM for simple prompts and a ChatModel when you need structured back-and-forth dialogue.
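
The difference in input shape can be sketched with plain Python data, with no model call involved. The `(role, content)` tuples and the `to_single_prompt` helper are illustrative assumptions; LangChain chat models generally accept message objects such as `HumanMessage`, and to my knowledge also role/content tuples like these.

```python
# Completion-style input: a single string.
prompt = "Explain the concept of recursion in programming."

# Chat-style input: an ordered list of (role, content) pairs, including earlier turns.
messages = [
    ("system", "You are a helpful AI assistant."),
    ("human", "Explain the concept of recursion in programming."),
    ("ai", "Recursion is when a function calls itself on a smaller input."),
    ("human", "Can you show a short example?"),
]


def to_single_prompt(history):
    """Flatten a chat history into one string, as a completion-style model would need."""
    return "\n".join(f"{role}: {content}" for role, content in history)


flat = to_single_prompt(messages)
print(flat)
```

The chat form keeps the roles and turn order explicit, which is what lets the model treat the last message as a follow-up rather than a fresh prompt.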

In [32]:
# OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatOpenAI(model="gpt-3.5-turbo")

messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="Explain the concept of recursion in programming.")
]

try:
    response = chat.invoke(messages)
    print(response)
except Exception as e:
    print(e)


Error code: 401 - {'error': {'message': 'Incorrect API key provided: YOUR_API_KEY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}


- Streaming responses

With streaming, the model returns its output incrementally as it is generated, instead of waiting for the entire response to complete before sending it back. This is useful for long responses or when you want to display results progressively.
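
The consumer side of streaming is just a loop over chunks. The sketch below imitates that with a generator so it runs without an API key; `fake_token_stream` is a hypothetical stand-in for a model. With a real LangChain chat model, the loop would typically iterate over `chat.stream(...)` and read each chunk's `content`.

```python
import time


def fake_token_stream(text, delay=0.0):
    """Yield the text word by word, imitating a model that streams its output."""
    for word in text.split(" "):
        time.sleep(delay)  # stands in for generation latency
        yield word + " "


received = []
for chunk in fake_token_stream("LLMs predict the next token"):
    print(chunk, end="", flush=True)  # show each piece as soon as it arrives
    received.append(chunk)
print()

# The complete answer is simply the concatenation of all chunks.
full_response = "".join(received).strip()
```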

In [33]:
# The handler lives in langchain_core; the older `langchain.callbacks` path is deprecated.
from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0
)

try:
    response = chat.invoke("Explain LLMs in simple terms.")
    print(response)
except Exception as e:
    print(e)

# Note: streaming is not supported for HuggingFaceEndpoint (ref: https://github.com/langchain-ai/langchain/issues/7785)

Error code: 401 - {'error': {'message': 'Incorrect API key provided: YOUR_API_KEY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}


## Text Embedding Models

In [34]:
from langchain_huggingface import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-MiniLM-L6-v2"
text = "Write a short poem about programming."
embeddings = HuggingFaceEmbeddings(model_name=model_name)

try:
    embedding_vector = embeddings.embed_query(text)
    print(embedding_vector[:10])
except Exception as e:
    print(e)
# Alternative provider: from langchain_openai import OpenAIEmbeddings

[-0.00012550536484923214, 0.05915382504463196, 0.01599189266562462, -0.028340505436062813, 0.006711236666887999, -0.005119623150676489, 0.13805529475212097, 0.013968885876238346, -0.04378935322165489, 0.023291287943720818]
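
Embedding vectors are usually compared rather than read directly, and cosine similarity is a common choice for that. The sketch below uses only the standard library and made-up 3-dimensional vectors in place of real embeddings; `cosine_similarity` is a helper defined here, not a LangChain function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (values invented for illustration).
v_query = [1.0, 0.0, 1.0]
v_similar = [2.0, 0.0, 2.0]      # same direction as v_query
v_unrelated = [0.0, 1.0, 0.0]    # orthogonal to v_query

sim_close = cosine_similarity(v_query, v_similar)
sim_far = cosine_similarity(v_query, v_unrelated)
print(sim_close, sim_far)
```

In the notebook, the same helper could be applied to two results of `embeddings.embed_query(...)` to compare two texts.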


## Create your own Model based on a client inference API

In this case we use Groq inference: https://groq.com/

In [None]:
import os
from typing import Any, List, Optional

from groq import Groq
from langchain_core.language_models.llms import LLM
from pydantic import Field

DEFAULT_MODEL = "llama3-8b-8192"
DEFAULT_TEMPERATURE = 0.3
DEFAULT_MAX_TOKENS = 512


class GroqLLM(LLM):
    """Custom LLM wrapper for Groq API."""

    model: str = Field(default=DEFAULT_MODEL)
    temperature: float = Field(default=DEFAULT_TEMPERATURE)
    max_tokens: int = Field(default=DEFAULT_MAX_TOKENS)
    api_key: Optional[str] = Field(default=None)

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        """
        Call the Groq API with the given prompt.

        Args:
            prompt: The prompt to send to the API
            stop: Optional list of stop sequences

        Returns:
            The response from the API
        """
        # Fall back to the GROQ_API_KEY environment variable when no key is passed explicitly.
        client = Groq(api_key=self.api_key or os.environ.get("GROQ_API_KEY"))
        chat_completion = client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model=self.model,
            temperature=self.temperature,
            max_tokens=self.max_tokens,
            stop=stop,  # forward stop sequences instead of silently dropping them
        )
        return chat_completion.choices[0].message.content

    @property
    def _llm_type(self) -> str:
        """Return the type of LLM."""
        return "groq-llm"

llm = GroqLLM()
