# **Models - LLM (Legacy) and ChatModel**

- A model can be either an **LLM (Legacy)** or a **ChatModel**.
- LLMs (Legacy) are text-in, text-out: they take a string prompt and return a string. They can handle language tasks such as translation, summarization, question answering, and content creation.
- Modern LLMs are typically accessed through a chat model interface that takes a list of messages as input and returns a message as output. Chat models are tailored for conversational use. **[Click Here](https://python.langchain.com/docs/integrations/chat/)** to see the complete list of chat model integrations available in LangChain.
- The output of a ChatModel is therefore a message (an `AIMessage`), not a plain string.

The newest generation of chat models offers additional capabilities:
1. [Tool calling](https://python.langchain.com/docs/concepts/tool_calling/): Many popular chat models offer a native tool calling API. This API allows developers to build rich applications that enable LLMs to interact with external services, APIs, and databases. Tool calling can also be used to extract structured information from unstructured data and perform various other tasks.
2. [Structured output](https://python.langchain.com/docs/concepts/structured_outputs/): A technique to make a chat model respond in a structured format, such as JSON that matches a given schema.
3. [Multimodality](https://python.langchain.com/docs/concepts/multimodality/): The ability to work with data other than text; for example, images, audio, and video.

### **Integrations**
LangChain has many chat model integrations that allow you to use a wide variety of models from different providers. These integrations are one of two types:

1. **Official models:** These are models that are officially supported by LangChain and/or the model provider. You can find these models in the **`langchain-<provider>`** packages.
2. **Community models:** These are models that are mostly contributed and supported by the community. You can find these models in the **`langchain-community`** package.

LangChain chat models are named with a convention that prefixes "Chat" to their class names (e.g., ChatOllama, ChatAnthropic, ChatOpenAI, etc.).


### **Key Methods**
The key methods of a chat model are:

1. `invoke`: The primary method for interacting with a chat model. It takes a string or a list of messages as input and returns a single message (an `AIMessage`) as output.
2. `stream`: Streams the output of a chat model as it is generated, yielding message chunks.
3. `batch`: Sends multiple inputs to a chat model together for more efficient processing and returns one output per input.
4. `bind_tools`: Binds one or more tools to a chat model so the model can request calls to them during execution.
5. `with_structured_output`: A wrapper around the `invoke` method for models that natively support structured output.

### **Standard Parameters**

Standard parameters are currently enforced only on integrations that have their own integration packages (e.g., langchain-openai, langchain-anthropic); they are not enforced on models in langchain-community.

| Parameter      | Description |
|--------------|-------------|
| model        | The name or identifier of the specific AI model you want to use (e.g., "gpt-3.5-turbo" or "gpt-4"). |
| temperature  | Controls the randomness of the model's output. A higher value (e.g., 1.0) makes responses more creative, while a lower value (e.g., 0.0) makes them more deterministic and focused. |
| timeout      | The maximum time (in seconds) to wait for a response from the model before canceling the request. Ensures the request doesn’t hang indefinitely. |
| max_tokens   | Limits the number of tokens the model may generate in its response. A token is a chunk of text (often part of a word) rather than a whole word, so this caps the output length in tokens, not words. |
| stop         | Specifies stop sequences that indicate when the model should stop generating tokens. For example, you might use specific strings to signal the end of a response. |
| max_retries  | The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits. |
| api_key      | The API key required for authenticating with the model provider. This is usually issued when you sign up for access to the model. |
| base_url     | The URL of the API endpoint where requests are sent. This is typically provided by the model's provider and is necessary for directing your requests. |
| rate_limiter | An optional BaseRateLimiter to space out requests to avoid exceeding rate limits. See rate-limiting below for more details. |


## **OpenAI Chat Model**

In [1]:
# ! pip install langchain-openai -U

In [2]:
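# Setup API Key
# Read the key from a local file; strip() removes the trailing newline
# that would otherwise become part of the key sent to the API.
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()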
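Reading the key from a file works, but `ChatOpenAI` also picks up the `OPENAI_API_KEY` environment variable automatically when `api_key` is not passed. The sketch below is a hypothetical helper (`load_api_key` is not part of LangChain) that prefers an environment variable, falls back to a key file such as the one used above, and strips stray whitespace in both cases.

```python
# Hypothetical helper, not part of LangChain: prefer an environment
# variable, fall back to a key file, and strip stray whitespace/newlines.
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_var: str, fallback_file: Optional[str] = None) -> str:
    """Return an API key from `env_var`, else from `fallback_file`."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    if fallback_file is not None and Path(fallback_file).is_file():
        value = Path(fallback_file).read_text().strip()
        if value:
            return value
    raise RuntimeError(f"No API key found in ${env_var} or {fallback_file!r}")


# Usage, mirroring the cell above:
# OPENAI_API_KEY = load_api_key("OPENAI_API_KEY", "keys/.openai_api_key.txt")
```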

In [3]:
# Import OpenAI ChatModel
from langchain_openai import ChatOpenAI

# Set the OpenAI Key and initialize a ChatModel
chat_model = ChatOpenAI(api_key=OPENAI_API_KEY, model="gpt-4o-mini", temperature=1)

prompt = "Tell me a short joke about Data Science"

chat_model.invoke(prompt)



AIMessage(content='Why did the data scientist break up with the statistician? \n\nBecause she found him mean and not very significant!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 15, 'total_tokens': 39, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_06737a9306', 'finish_reason': 'stop', 'logprobs': None}, id='run-2a2e01eb-97a8-4d93-ae2f-71e1e192a80e-0', usage_metadata={'input_tokens': 15, 'output_tokens': 24, 'total_tokens': 39, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

## **Google GenAI - Chat Model**

In [4]:
# ! pip install langchain-google-genai -U

In [5]:
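# Setup API Key
# Read the key from a local file; strip() removes the trailing newline
# that would otherwise become part of the key sent to the API.
with open('keys/.gemini.txt') as f:
    GOOGLE_API_KEY = f.read().strip()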

In [6]:
# Import Google ChatModel
from langchain_google_genai import ChatGoogleGenerativeAI

# Set the Google API Key and initialize a ChatModel
chat_model = ChatGoogleGenerativeAI(api_key=GOOGLE_API_KEY, model="gemini-2.0-flash-exp", temperature=1)

prompt = "Tell me a short joke about Data Science"

chat_model.invoke(prompt)



AIMessage(content='Why did the data scientist break up with the statistician?\n\nBecause they kept arguing about whether correlation implied causation!', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run-f32d6ce0-44bc-4ef4-9e88-2a7214939393-0', usage_metadata={'input_tokens': 8, 'output_tokens': 24, 'total_tokens': 32, 'input_token_details': {'cache_read': 0}})

## **HuggingFace - LLM (Legacy)**

In [10]:
# !pip install langchain-huggingface -U

In [11]:
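from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load GPT-2 and its tokenizer from the Hugging Face Hub (downloaded on first run)
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a local text-generation pipeline that produces up to 512 new tokens
pipe = pipeline(
    task="text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512
)

# Wrap the pipeline as a LangChain LLM (legacy): string in, string out
hf = HuggingFacePipeline(pipeline=pipe)

hf.invoke("What should I study to become a data scientist?")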

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


"What should I study to become a data scientist?\n\nAs for studying, I am sure that with enough time to study it. After the year 6, study 1 will help create some interesting data. It is very important for me to learn about things like the structure of data and to see some data as it changes in the course of the study.\n\nYou have a very interesting idea of how you want to go about learning data science. What will you do?\n\nIf I have good skills in the field I can come up with the way I want to learn in less time. It will not be a perfect and I can try very hard to understand what I want it to be like. But I hope that it doesn't make it impossible for me.\n\nIf I fail in the learning I can try to take my time but at the same time I will have a lot more time. Even then it isn't as good as working on data in school or study 1 with other people. Besides that, in my experience many people are unhappy with data. Even if I succeed I can still be successful, even if I fail I am still working 
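Because `HuggingFacePipeline` is a legacy LLM, `invoke` returns a plain string, and, as the output above shows, for this text-generation pipeline that string begins with the prompt itself. A small hypothetical helper (`strip_prompt_echo` is not part of LangChain or transformers) can remove the echoed prompt before the completion is used.

```python
# Hypothetical helper, not part of LangChain or transformers.
def strip_prompt_echo(prompt: str, generated: str) -> str:
    """Remove a leading copy of `prompt` from `generated`, if present."""
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    return generated.strip()


prompt = "What should I study to become a data scientist?"
# The beginning of the string returned in the cell above.
raw_output = prompt + "\n\nAs for studying, I am sure that with enough time to study it."
print(strip_prompt_echo(prompt, raw_output))
# As for studying, I am sure that with enough time to study it.
```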