## Token Setting


In [1]:
import os
from dotenv import load_dotenv


env_file = "../.env"
if os.path.exists(env_file):
    # Load the file that was just checked for, rather than relying on auto-discovery.
    load_dotenv(env_file)
else:
    os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"
    os.environ["HF_TOKEN"] = "<YOUR-HUGGINGFACE-API-KEY>"

## Model setting


In [2]:
from langchain_community.chat_models import ChatLiteLLM
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage


# openai_llm = ChatLiteLLM(model="openai/gpt-3.5-turbo")
openai_llm = ChatOpenAI(model="gpt-3.5-turbo")
messages = [HumanMessage(content="what model are you")]
openai_llm.invoke(messages).pretty_print()


I am a language model developed by OpenAI called GPT-3 (Generative Pre-trained Transformer 3).


In [3]:
hug_llm = ChatLiteLLM(
    model="huggingface/meta-llama/Llama-3.1-8B-Instruct",
    temperature=0.2,
)
messages = [HumanMessage(content="what model are you")]
hug_llm.invoke(messages).pretty_print()


I’m a large language model. When you ask me a question or provide me with a prompt, I analyze what you say and generate a response that is relevant and accurate. I'm constantly learning and improving, so over time I'll be even better at assisting you. Is there anything I can help you with?


### Supported parameters

Use the `litellm.get_supported_openai_params` function to check which parameters an LLM endpoint accepts.

In [4]:
from litellm import get_supported_openai_params

response = get_supported_openai_params("gpt-3.5-turbo")

print(response)

['frequency_penalty', 'logit_bias', 'logprobs', 'top_logprobs', 'max_tokens', 'max_completion_tokens', 'modalities', 'prediction', 'n', 'presence_penalty', 'seed', 'stop', 'stream', 'stream_options', 'temperature', 'top_p', 'tools', 'tool_choice', 'function_call', 'functions', 'max_retries', 'extra_headers', 'parallel_tool_calls', 'response_format', 'user']


`gpt-3.5-turbo` supports a variety of parameters, listed below. In this tutorial series we will look at how the tool-related ones work.
- `"frequency_penalty"`
- `"logit_bias"`
- `"logprobs"`
- `"top_logprobs"`
- `"max_tokens"`
- `"max_completion_tokens"`
- `"modalities"`
- `"prediction"`
- `"n"`
- `"presence_penalty"`
- `"seed"`
- `"stop"`
- `"stream"`
- `"stream_options"`
- `"temperature"`
- `"top_p"`
- `"tools"`
- `"tool_choice"`
- `"function_call"`
- `"functions"`
- `"max_retries"`
- `"extra_headers"`
- `"parallel_tool_calls"`
- `"response_format"`
- `"user"`

In [5]:
from litellm import get_supported_openai_params

response = get_supported_openai_params("huggingface/meta-llama/Llama-3.1-8B-Instruct")

print(response)
# ['stream', 'temperature', 'max_tokens', 'max_completion_tokens', 'top_p', 'stop', 'n', 'echo']

['stream', 'temperature', 'max_tokens', 'max_completion_tokens', 'top_p', 'stop', 'n', 'echo']


The Hugging Face endpoint, on the other hand, does not support the tool-related parameters.
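The gap between the two endpoints can be checked directly from the lists printed above. The following is a minimal sketch that copies both outputs verbatim; the variable names and the choice of which parameters count as "tool-related" are assumptions made for illustration, not something LiteLLM defines.

```python
# Parameter lists copied from the two cell outputs above.
openai_params = [
    "frequency_penalty", "logit_bias", "logprobs", "top_logprobs",
    "max_tokens", "max_completion_tokens", "modalities", "prediction", "n",
    "presence_penalty", "seed", "stop", "stream", "stream_options",
    "temperature", "top_p", "tools", "tool_choice", "function_call",
    "functions", "max_retries", "extra_headers", "parallel_tool_calls",
    "response_format", "user",
]
hf_params = [
    "stream", "temperature", "max_tokens", "max_completion_tokens",
    "top_p", "stop", "n", "echo",
]

# Assumption: these are the parameters we treat as "tool-related".
tool_related = {
    "tools", "tool_choice", "function_call", "functions", "parallel_tool_calls",
}

# Tool-related parameters that the Hugging Face endpoint does not list.
missing_on_hf = sorted(tool_related - set(hf_params))

# Parameters that both endpoints list.
shared = sorted(set(openai_params) & set(hf_params))

print("Missing on Hugging Face:", missing_on_hf)
print("Shared:", shared)
```

Note that `stop` appears in the shared list, which is why it can be compared across both endpoints.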

## `stop` function

This tutorial looks at the `stop` parameter.
Both OpenAI and Hugging Face support it.
When the model generates one of the given stop sequences, generation ends.

To use `stop` in LangChain, pass it through the `bind` method.
In practice, `bind` is used to attach structured outputs, tools, and functions.
These are passed as `**kwargs`, and each one needs its own definition.
`stop`, by contrast, needs nothing beyond the list of stop sequences.

- Reference
    - [Langchain document](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.bind)
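Conceptually, binding keyword arguments to a model resembles `functools.partial`: the arguments are stored once and passed along on every later call. The sketch below is only an analogy in plain Python. `fake_llm` is a hypothetical stand-in, not a LangChain object, and the sketch does not show how `bind` is implemented.

```python
from functools import partial


def fake_llm(prompt, **kwargs):
    """Hypothetical stand-in for a model call; it reports what it received."""
    return {"prompt": prompt, "kwargs": kwargs}


# Pre-attach stop=["three"], similar in spirit to openai_llm.bind(stop=["three"]).
bound_llm = partial(fake_llm, stop=["three"])

result = bound_llm("Repeat quoted words exactly: 'One two three four five.'")
print(result["kwargs"])
```

The caller only supplies the prompt; the bound `stop` value travels with every call.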

Without `stop`, the LLM generates all of the requested words.

In [6]:
from langchain_core.output_parsers import StrOutputParser

# Without bind.
chain = openai_llm | StrOutputParser()

response = chain.invoke("Repeat quoted words exactly: 'One two three four five.'")
print(response)
# Output is 'One two three four five.'

'One two three four five.'


When we bind `stop`, the LLM stops generating before the given word, so the stop word itself does not appear in the output.

In [7]:
# With bind.
chain = openai_llm.bind(stop=["three"]) | StrOutputParser()

response = chain.invoke("Repeat quoted words exactly: 'One two three four five.'")
print(response)
# Output is 'One two'

'One two 


Unlike OpenAI, the Hugging Face endpoint stops only after the given token has been generated, so the stop word is included in the output.

In [8]:
# Without bind.
chain = hug_llm | StrOutputParser()

response = chain.invoke("Repeat quoted words exactly: 'One two three four five.'")
print(response)
# Output is 'One two three four five.'

"One two three four five."


In [9]:
# With bind.
chain = hug_llm.bind(stop=["three"]) | StrOutputParser()

response = chain.invoke("Repeat quoted words exactly: 'One two three four five.'")
print(response)
# Output is "One two three" (the stop word is included)

"One two three

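The two behaviors observed above differ only in whether the stop sequence is kept. The following pure-Python sketch imitates both cases on a fixed string. `truncate_at_stop` and its `include_stop` flag are hypothetical names introduced here for illustration; this is not how either provider implements stopping.

```python
def truncate_at_stop(text, stop, include_stop=False):
    """Cut `text` at the earliest occurrence of any sequence in `stop`.

    include_stop=False drops the stop sequence (the OpenAI-style result above);
    include_stop=True keeps it (the Hugging Face-style result above).
    """
    earliest = None  # (start index, cut position) of the earliest match
    for seq in stop:
        idx = text.find(seq)
        if idx == -1:
            continue  # this stop sequence never appears
        end = idx + len(seq) if include_stop else idx
        if earliest is None or idx < earliest[0]:
            earliest = (idx, end)
    # With no match, the text is returned unchanged.
    return text if earliest is None else text[:earliest[1]]


full_text = "One two three four five."

excluded = truncate_at_stop(full_text, ["three"])
included = truncate_at_stop(full_text, ["three"], include_stop=True)

print(repr(excluded))
print(repr(included))
```

When writing code that must behave the same across providers, it may be safer to strip a trailing stop sequence from the response yourself rather than assume one convention.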