# Exploring LLMs and ChatModels for LLM Input / Output with LangChain

## Install OpenAI, HuggingFace and LangChain dependencies

In [1]:
!pip install langchain==0.3.11
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install huggingface_hub==0.26.5

Collecting huggingface_hub==0.26.5
  Downloading huggingface_hub-0.26.5-py3-none-any.whl.metadata (13 kB)
Collecting filelock (from huggingface_hub==0.26.5)
  Using cached filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting fsspec>=2023.5.0 (from huggingface_hub==0.26.5)
  Using cached fsspec-2025.5.1-py3-none-any.whl.metadata (11 kB)
Downloading huggingface_hub-0.26.5-py3-none-any.whl (447 kB)
Using cached fsspec-2025.5.1-py3-none-any.whl (199 kB)
Using cached filelock-3.18.0-py3-none-any.whl (16 kB)
Installing collected packages: fsspec, filelock, huggingface_hub
Successfully installed filelock-3.18.0 fsspec-2025.5.1 huggingface_hub-0.26.5


In [2]:
# Don't run if you want to use only chatgpt
# This is for accessing open LLMs from huggingface
!pip install transformers==4.46.3

Collecting transformers==4.46.3
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Collecting tokenizers<0.21,>=0.20 (from transformers==4.46.3)
  Downloading tokenizers-0.20.3-cp311-cp311-macosx_11_0_arm64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers==4.46.3)
  Downloading safetensors-0.5.3-cp38-abi3-macosx_11_0_arm64.whl.metadata (3.8 kB)
Downloading transformers-4.46.3-py3-none-any.whl (10.0 MB)
Downloading tokenizers-0.20.3-cp311-cp311-macosx_11_0_arm64.whl (2.6 MB)
Downloading safetensors-0.5.3-cp38-abi3-macosx_11_0_arm64.whl (418 kB)
Installing collected packages: safetensors, tokenizers, transformers

In [4]:
# dbutils.library.restartPython()

## Enter API Tokens

#### Enter your Open AI Key here

You can get the key from [here](https://platform.openai.com/api-keys) after creating an account or signing in

In [5]:
from getpass import getpass

OPENAI_KEY = getpass('Enter Open AI API Key: ')

In [6]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY

#### Enter your HuggingFace token here

You can get the key from [here](https://huggingface.co/settings/tokens) after creating an account or signing in. This is free.

In [8]:
# skip if only using chatgpt
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass('Please enter your HuggingFace Token here: ')

## Setup necessary system environment variables

In [9]:
import os

os.environ['HUGGINGFACEHUB_API_TOKEN'] = HUGGINGFACEHUB_API_TOKEN
os.environ['OPENAI_API_KEY'] = OPENAI_KEY
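
Before calling any provider, it can help to confirm that the tokens actually reached the environment. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of LangChain. It only reports which names are unset and never prints the secret values.

```python
import os


def missing_env_vars(names):
    """Return the names that are unset or empty, without revealing any values."""
    return [name for name in names if not os.environ.get(name)]


# The two variable names used in this notebook
required = ["OPENAI_API_KEY", "HUGGINGFACEHUB_API_TOKEN"]
missing = missing_env_vars(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required tokens are set.")
```

If you only use ChatGPT, you could drop `HUGGINGFACEHUB_API_TOKEN` from the `required` list.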

# Model I/O

In LangChain, the central part of any application is the language model. This module provides crucial tools for working effectively with any language model, ensuring it integrates smoothly and communicates well.

### Key Components of Model I/O

**LLMs and Chat Models (used interchangeably):**
- **LLMs:**
  - **Definition:** Pure text completion models.
  - **Input/Output:** Receives a text string and returns a text string.
- **Chat Models:**
  - **Definition:** Based on a language model but with different input and output types.
  - **Input/Output:** Takes a list of chat messages as input and produces a chat message as output.
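
The difference between the two interfaces comes down to the shape of the input and output. The sketch below uses toy stand-in functions (`toy_llm` and `toy_chat_model` are hypothetical and are not LangChain classes) purely to illustrate the two signatures: string in and string out, versus a list of messages in and one message out.

```python
from typing import Dict, List

# Hypothetical message shape used only for this illustration
Message = Dict[str, str]


def toy_llm(prompt: str) -> str:
    """Stand-in for a text-completion model: string in, string out."""
    return f"[completion for: {prompt}]"


def toy_chat_model(messages: List[Message]) -> Message:
    """Stand-in for a chat model: list of messages in, one message out."""
    last_user = messages[-1]["content"]
    return {"role": "assistant", "content": f"[reply to: {last_user}]"}


print(toy_llm("Explain Generative AI"))
print(toy_chat_model([{"role": "user", "content": "Explain Generative AI"}]))
```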


## Chat Models and LLMs

Large Language Models (LLMs) are a core component of LangChain. LangChain does not implement or build its own LLMs. Instead, it provides a standard API for interacting with models from many different providers.

There are many LLM providers (OpenAI, Hugging Face, etc.), and the LLM class is designed to provide a standard interface for all of them.

## Accessing Commercial LLMs like ChatGPT



### Accessing ChatGPT as an LLM

Here we show how to access a basic ChatGPT instruct model through the LLM interface. The ChatModel interface, which we will see later, is the better choice: the LLM API does not support chat models such as `gpt-3.5-turbo` and only supports the `instruct` models, which can respond to instructions but cannot hold a conversation with you.

In [10]:
from langchain_openai import OpenAI

chatgpt = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)

In [11]:
prompt = """Explain what is Generative AI in 3 bullet points"""
print(prompt)

Explain what is Generative AI in 3 bullet points


In [12]:
response = chatgpt.invoke(prompt)
print(response)



1. Generative AI is a subset of artificial intelligence that focuses on creating new and original content, rather than just analyzing and processing existing data.

2. It uses algorithms and machine learning techniques to generate new ideas, designs, or solutions based on a set of input data or parameters.

3. Generative AI has a wide range of applications, including creating art, music, and text, as well as assisting in product design and optimization. It has the potential to revolutionize industries by automating creative tasks and providing innovative solutions.


### Accessing ChatGPT as a Chat Model LLM

Here we show how to access the more advanced ChatGPT Turbo chat-based LLM. The ChatModel interface is better because it supports chat models such as `gpt-3.5-turbo`, which can respond to instructions as well as hold a conversation with you. We will look at the conversation aspect slightly later in the notebook.

In [13]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [14]:
prompt = """Explain what is Generative AI in 3 bullet points"""
print(prompt)

Explain what is Generative AI in 3 bullet points


In [17]:
response = chatgpt.invoke(prompt)
print(response)

content='- Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns and data it has been trained on.\n- It uses algorithms and neural networks to generate this content, often mimicking the style or characteristics of the data it has been exposed to.\n- Generative AI has a wide range of applications, from creating realistic images for video games to generating personalized recommendations for users based on their preferences.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 95, 'prompt_tokens': 19, 'total_tokens': 114, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run--b9172a50-a821-4094-bfbe-8

In [18]:
print(response.content)

- Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns and data it has been trained on.
- It uses algorithms and neural networks to generate this content, often mimicking the style or characteristics of the data it has been exposed to.
- Generative AI has a wide range of applications, from creating realistic images for video games to generating personalized recommendations for users based on their preferences.
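
The full response printed earlier carries more than the text: its `response_metadata` includes token counts, which are useful for tracking cost. The sketch below is one way to pull those out; the helper name `summarize_token_usage` is hypothetical, and the key layout is the one visible in the OpenAI output above, so other providers may differ. With a live response you would pass `response.response_metadata`.

```python
def summarize_token_usage(response_metadata):
    """Extract prompt/completion/total token counts, defaulting to 0 if absent."""
    usage = response_metadata.get("token_usage", {})
    return {
        "prompt": usage.get("prompt_tokens", 0),
        "completion": usage.get("completion_tokens", 0),
        "total": usage.get("total_tokens", 0),
    }


# Values copied from the response printed earlier in this notebook
example_metadata = {
    "token_usage": {"completion_tokens": 95, "prompt_tokens": 19, "total_tokens": 114},
    "model_name": "gpt-3.5-turbo-0125",
    "finish_reason": "stop",
}
print(summarize_token_usage(example_metadata))
```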


In [19]:
!pip install langchain_google_genai

Collecting langchain_google_genai
  Downloading langchain_google_genai-2.1.5-py3-none-any.whl.metadata (5.2 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain_google_genai)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting google-ai-generativelanguage<0.7.0,>=0.6.18 (from langchain_google_genai)
  Downloading google_ai_generativelanguage-0.6.18-py3-none-any.whl.metadata (9.8 kB)
Downloading langchain_google_genai-2.1.5-py3-none-any.whl (44 kB)
Downloading filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Downloading google_ai_generativelanguage-0.6.18-py3-none-any.whl (1.4 MB)
Installing collected packages: filetype, google-ai-generativelanguage, langchain_google_genai
  Attempting uninstall: google-ai-generativelanguage
    Found existing installation: google-ai-generativelanguage 0.6.10
    Uninstalling googl

In [20]:
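from langchain_google_genai import ChatGoogleGenerativeAI

# Needs a Google API key: set the GOOGLE_API_KEY environment variable
# (or pass google_api_key=...) before creating the model.
gemini = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)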

DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.

In [0]:
prompt = """Explain what is Generative AI in 3 bullet points"""
print(prompt)

In [0]:
response = gemini.invoke(prompt)
response

In [0]:
print(response.content)

## Accessing Open Source LLMs with HuggingFace and LangChain

### Accessing Open LLMs with HuggingFace Serverless API

The free [serverless API](https://huggingface.co/inference-api/serverless) lets you implement solutions and iterate quickly, but it may be rate limited for heavy use cases, since the load is shared with other requests.

For enterprise workloads, you can use dedicated Inference Endpoints, which are hosted on a specific cloud instance of your choice and have a cost associated with them. Here we use the free serverless API, which works quite well in most cases.

The advantage is that you do not need to download the models or run them locally on GPU infrastructure, which takes time and can be costly.

#### Accessing Microsoft Phi-3 Mini Instruct

Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters. It is trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. Check more details [here](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).

In [0]:
# !pip install langchain_huggingface

In [0]:
import os

from langchain_huggingface import HuggingFaceEndpoint

repo_id = "microsoft/Phi-3.5-mini-instruct"

phi3_params = {
                  "wait_for_model": True, # wait if the model is not yet available on Hugging Face servers
                  "do_sample": False, # greedy decoding
                  "return_full_text": False, # don't return the input prompt
                  "max_new_tokens": 1000, # maximum number of tokens in the answer
                }

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    # max_length=128,
    temperature=0.5, # NOTE: at odds with do_sample=False above; remove for greedy decoding
    # read the token set earlier instead of hard-coding it
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
    **phi3_params
)

In [0]:
# Phi3 expects input prompt to be formatted in a specific way
# check more details here: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
phi3_prompt = """<|user|>Explain what is Generative AI in 3 bullet points<|end|>
<|assistant|>"""
print(phi3_prompt)
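
Writing the special tokens by hand is error-prone, so the formatting can be wrapped in a small helper. This is a minimal sketch that reproduces exactly the single-turn layout used in the cell above; the function name `format_phi3_prompt` is hypothetical, and the model card remains the authority on the full template (for example, system turns).

```python
def format_phi3_prompt(user_text: str) -> str:
    """Wrap one user turn in the Phi-3 style markers used in this notebook."""
    return f"<|user|>{user_text}<|end|>\n<|assistant|>"


print(format_phi3_prompt("Explain what is Generative AI in 3 bullet points"))
```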

In [0]:
response = llm.invoke(phi3_prompt)
print(response)

#### Accessing Google Gemma 2B Instruct

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure. Check more details [here](https://huggingface.co/google/gemma-1.1-2b-it)

In [0]:
gemma_repo_id = "google/gemma-2b-it"

gemma_params = {
                  "wait_for_model": True, # wait if the model is not yet available on Hugging Face servers
                  "do_sample": False, # greedy decoding - temperature = 0
                  "return_full_text": False, # don't return the input prompt
                  "max_new_tokens": 1000, # maximum number of tokens in the answer
                }

llm = HuggingFaceEndpoint(
    repo_id=gemma_repo_id,
    **gemma_params
)

In [0]:
prompt

In [0]:
response = llm.invoke(prompt)
print(response)

### Accessing Local LLMs with HuggingFacePipeline API

Hugging Face models can be run locally through the `HuggingFacePipeline` class. Keep in mind that you need a good GPU to get fast inference.

The Hugging Face Model Hub hosts over 500K models, including 90K+ open LLMs.

These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the `HuggingFaceEndpoint` API we saw earlier.

To use it, you should have the `transformers` Python package installed, as well as `pytorch`.

The advantages are that the model runs completely locally, which gives high privacy and security. The main disadvantage is the need for good compute infrastructure, preferably with a GPU.

#### Accessing Google Gemma 2B and running it locally

In [0]:
from langchain_huggingface import HuggingFacePipeline

In [0]:
gemma_params = {
                  "do_sample": False, # greedy decoding - temperature = 0
                  "return_full_text": False, # don't return input prompt
                  "max_new_tokens": 1000, # maximum number of tokens in the answer
                }

local_llm = HuggingFacePipeline.from_model_id(
    model_id="google/gemma-1.1-2b-it",
    task="text-generation",
    pipeline_kwargs=gemma_params,
    # device=0 # when running on Colab selects the GPU, you can change this if you run it on your own instance if needed
)

In [0]:
local_llm

In [0]:
prompt

In [0]:
# Gemma2B when used locally expects input prompt to be formatted in a specific way
# check more details here: https://huggingface.co/google/gemma-1.1-2b-it#chat-template
gemma_prompt = """<bos><start_of_turn>user\n""" + prompt + """\n<end_of_turn>
<start_of_turn>model
"""
print(gemma_prompt)
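
For more than one turn, the same markers simply repeat. The sketch below builds a multi-turn Gemma prompt from `(role, text)` pairs; the helper name `format_gemma_prompt` is hypothetical, and the layout follows the chat template as I understand it from the model card (the assistant role is written as `model`, and `<end_of_turn>` directly follows the turn text), so treat it as an assumption to check against the card.

```python
def format_gemma_prompt(turns):
    """Build a Gemma-style prompt from (role, text) pairs, ending with an open model turn."""
    role_names = {"user": "user", "assistant": "model"}
    parts = ["<bos>"]
    for role, text in turns:
        parts.append(f"<start_of_turn>{role_names[role]}\n{text.strip()}<end_of_turn>\n")
    # Leave the final model turn open so generation continues from here
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(format_gemma_prompt([("user", "Explain what is Generative AI in 3 bullet points")]))
```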

In [0]:
response = local_llm.invoke(gemma_prompt)
print(response)

### Accessing Open LLMs in HuggingFace as a Chat Model LLM

Here we will show how to access open LLMs from HuggingFace like Google Gemma 2B and make them have a conversation with you. We will look at the conversation aspect slightly later in the notebook.

In [0]:
from langchain_huggingface import ChatHuggingFace

chat_gemma = ChatHuggingFace(llm=llm,
                             model_id='google/gemma-1.1-2b-it')

In [0]:
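# invoke the chat model first so that `response` is a chat message with a `.content` attribute
response = chat_gemma.invoke(prompt)
print(response.content)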

## Message Types for ChatModels and Conversational Prompting

Conversational prompting is basically you, the user, having a full conversation with the LLM. The conversation history is typically represented as a list of messages.

ChatModels process a list of messages, receiving them as input and responding with a message. Messages are characterized by a few distinct types and properties:

- **Role:** Indicates who is speaking in the message. LangChain offers different message classes for various roles.
- **Content:** The substance of the message, which can vary:
  - A string (commonly handled by most models)
  - A list of dictionaries (for multi-modal inputs, where each dictionary details the type and location of the input)

Additionally, messages have an `additional_kwargs` property, used for passing extra information that is specific to the message provider rather than general. A well-known example is `function_call` from OpenAI.

### Specific Message Types

- **HumanMessage:** A user-generated message, usually containing only content.
- **AIMessage:** A message from the model, potentially including `additional_kwargs`, like `tool_calls` for invoking OpenAI tools.
- **SystemMessage:** A message from the system instructing model behavior, typically containing only content. Not all models support this type.
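
LangChain message objects expose a short `type` string (`"system"`, `"human"`, `"ai"`), while OpenAI-style chat APIs expect role dictionaries. The sketch below shows one way the two vocabularies line up; the helper `to_role_dict` is hypothetical and works on plain strings, so it runs without LangChain installed.

```python
# Mapping from LangChain message type names to OpenAI-style role names
LANGCHAIN_TO_OPENAI_ROLE = {"system": "system", "human": "user", "ai": "assistant"}


def to_role_dict(message_type, content):
    """Convert a (type, content) pair into an OpenAI-style role dictionary."""
    return {"role": LANGCHAIN_TO_OPENAI_ROLE[message_type], "content": content}


conversation = [
    ("system", "Act as a helpful assistant."),
    ("human", "Can you explain what is Generative AI in 3 bullet points?"),
]
print([to_role_dict(kind, text) for kind, text in conversation])
```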


## Conversational Prompting with ChatGPT

Here we use the `ChatModel` API in `ChatOpenAI` to have a full conversation with ChatGPT while keeping the complete history of the conversation.

In [21]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [22]:
from langchain_core.messages import HumanMessage, SystemMessage

prompt = """Can you explain what is Generative AI in 3 bullet points?"""
sys_prompt = """Act as a helpful assistant and give meaningful examples in your responses."""
messages = [
    SystemMessage(content=sys_prompt),
    HumanMessage(content=prompt),
]

messages

[SystemMessage(content='Act as a helpful assistant and give meaningful examples in your responses.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Can you explain what is Generative AI in 3 bullet points?', additional_kwargs={}, response_metadata={})]

In [23]:
type(messages)

list

In [24]:
response = chatgpt.invoke(messages)
response

AIMessage(content='Certainly! Here are three key points that explain Generative AI:\n\n1. **Definition and Functionality**: Generative AI refers to a class of artificial intelligence models that can create new content, such as text, images, music, or even videos, by learning patterns from existing data. For example, models like GPT-3 can generate human-like text based on prompts, while DALL-E can create images from textual descriptions.\n\n2. **Applications**: Generative AI has a wide range of applications across various fields. In creative industries, it can assist in generating artwork, writing scripts, or composing music. In business, it can be used for generating marketing content, automating customer service responses, or even creating personalized product recommendations.\n\n3. **Ethical Considerations**: The use of Generative AI raises important ethical questions, such as issues of copyright, misinformation, and the potential for misuse. For instance, deepfake technology can cre

In [25]:
print(response.content)

Certainly! Here are three key points that explain Generative AI:

1. **Definition and Functionality**: Generative AI refers to a class of artificial intelligence models that can create new content, such as text, images, music, or even videos, by learning patterns from existing data. For example, models like GPT-3 can generate human-like text based on prompts, while DALL-E can create images from textual descriptions.

2. **Applications**: Generative AI has a wide range of applications across various fields. In creative industries, it can assist in generating artwork, writing scripts, or composing music. In business, it can be used for generating marketing content, automating customer service responses, or even creating personalized product recommendations.

3. **Ethical Considerations**: The use of Generative AI raises important ethical questions, such as issues of copyright, misinformation, and the potential for misuse. For instance, deepfake technology can create realistic but fake vi

In [26]:
# add the past conversation history into messages
messages.append(response)
# add the new prompt to the conversation history list
prompt = """What did we discuss so far?"""
messages.append(HumanMessage(content=prompt))
messages

[SystemMessage(content='Act as a helpful assistant and give meaningful examples in your responses.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Can you explain what is Generative AI in 3 bullet points?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Certainly! Here are three key points that explain Generative AI:\n\n1. **Definition and Functionality**: Generative AI refers to a class of artificial intelligence models that can create new content, such as text, images, music, or even videos, by learning patterns from existing data. For example, models like GPT-3 can generate human-like text based on prompts, while DALL-E can create images from textual descriptions.\n\n2. **Applications**: Generative AI has a wide range of applications across various fields. In creative industries, it can assist in generating artwork, writing scripts, or composing music. In business, it can be used for generating marketing content, automating customer service re

In [27]:
# send the conversation history along with the new prompt to chatgpt
response = chatgpt.invoke(messages)
response.content

'So far, we discussed the concept of Generative AI, highlighting three key points:\n\n1. **Definition and Functionality**: Generative AI creates new content by learning patterns from existing data, with examples like GPT-3 for text and DALL-E for images.\n  \n2. **Applications**: It has diverse applications in creative industries (art, writing, music) and business (marketing content, customer service, personalized recommendations).\n\n3. **Ethical Considerations**: The technology raises ethical issues, including copyright concerns, misinformation, and potential misuse, such as in the case of deepfakes.\n\nIf you have any further questions or need more information, feel free to ask!'
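
The append-then-invoke steps above form a loop that repeats for every turn: the model only "remembers" what is resent in the message list. Here is a minimal sketch of that loop with a stub in place of the real model; `chat_turn` and `counting_stub` are hypothetical names, and with LangChain you would call `chatgpt.invoke(messages)` instead of the stub.

```python
def chat_turn(model, history, user_text):
    """Append the user message, call the model on the full history, store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def counting_stub(history):
    """Hypothetical stand-in model: reports how many messages it received."""
    return f"I can see {len(history)} message(s)."


history = [{"role": "system", "content": "Act as a helpful assistant."}]
print(chat_turn(counting_stub, history, "Explain Generative AI"))
print(chat_turn(counting_stub, history, "What did we discuss so far?"))
```

The stub's replies grow with each turn, which shows that every call receives the whole conversation so far.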

## Conversational Prompting with Open LLMs via HuggingFace

Here we use the `ChatModel` API in `ChatHuggingFace` to have a full conversation with any open LLM while keeping the complete history of the conversation. In this example we use the Google Gemma 2B LLM.

In [30]:
!pip install langchain_huggingface

Collecting langchain_huggingface
  Downloading langchain_huggingface-0.3.0-py3-none-any.whl.metadata (996 bytes)
Collecting langchain-core<1.0.0,>=0.3.65 (from langchain_huggingface)
  Downloading langchain_core-0.3.66-py3-none-any.whl.metadata (5.8 kB)
Collecting huggingface-hub>=0.30.2 (from langchain_huggingface)
  Downloading huggingface_hub-0.33.1-py3-none-any.whl.metadata (14 kB)
Collecting langsmith>=0.3.45 (from langchain-core<1.0.0,>=0.3.65->langchain_huggingface)
  Downloading langsmith-0.4.2-py3-none-any.whl.metadata (15 kB)
Collecting hf-xet<2.0.0,>=1.1.2 (from huggingface-hub>=0.30.2->langchain_huggingface)
  Downloading hf_xet-1.1.5-cp37-abi3-macosx_11_0_arm64.whl.metadata (879 bytes)
Downloading langchain_huggingface-0.3.0-py3-none-any.whl (27 kB)
Downloading langchain_core-0.3.66-py3-none-any.whl (438 kB)
Downloading huggingface_hub-0.33.1-py3-none-any.whl (515 kB)
Downloading hf_xet-1.1.5-cp37-abi3-macosx_11_0_arm64.whl (2.6 MB)

In [31]:
# not needed if you are only running chatgpt
# requires `llm` (the HuggingFaceEndpoint created earlier) to be defined first
from langchain_huggingface import ChatHuggingFace

chat_gemma = ChatHuggingFace(llm=llm,
                             model_id='google/gemma-1.1-2b-it')

NameError: name 'llm' is not defined

In [0]:
# this runs prompts using the open LLM - however, Gemma doesn't support a system prompt
prompt = """Explain Deep Learning in 3 bullet points"""

messages = [
    HumanMessage(content=prompt),
]

response = chat_gemma.invoke(messages) # doesn't support system prompts
messages.append(response)
print(response.content)

In [0]:
messages

In [0]:
# formatting prompt is automatically done inside the chatmodel
# formats in this syntax: https://huggingface.co/google/gemma-1.1-2b-it#chat-template
print(chat_gemma._to_chat_prompt([messages[0]]))

In [0]:
prompt = """Now do the same for Machine learning"""
messages.append(HumanMessage(content=prompt))

response = chat_gemma.invoke(messages) # doesn't support system prompts
print(response.content)

In [0]:
from huggingface_hub import InferenceClient

In [0]:
import huggingface_hub

In [0]:
huggingface_hub.__version__

In [0]:
client = InferenceClient(model='google/gemma-1.1-2b-it')

In [0]:
for message in client.chat_completion(messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
                                      max_tokens=200, stream=True):
    # print tokens on one line as they arrive; some chunks may carry no text
    print(message.choices[0].delta.content or "", end="")
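
When streaming, you often want the final text as well as the live output. The sketch below collects streamed pieces into one string; `collect_stream` is a hypothetical helper, and the `simulated` list stands in for the `delta.content` values of a real stream so the example runs offline.

```python
def collect_stream(deltas):
    """Print text pieces as they arrive and return the joined result."""
    pieces = []
    for delta in deltas:
        if delta:  # skip None or empty chunks
            print(delta, end="", flush=True)
            pieces.append(delta)
    print()  # finish the line once the stream ends
    return "".join(pieces)


# Simulated chunks; a real stream would yield these one at a time
simulated = ["The capital", " of France", None, " is Paris."]
full_text = collect_stream(simulated)
```

With the real client, you could pass a generator expression over `message.choices[0].delta.content` for each streamed message.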