# Project start

I will use the OpenAI API, specifically the gpt-3.5-turbo-instruct model.

You can change the template as you like.

In [None]:
# The OpenAI completion wrapper is provided by the langchain_openai package
from langchain_openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo-instruct")

To query the model, pass a prompt string to the invoke() method:

In [2]:
prompt = "What is the capital of France?"
llm.invoke(prompt)

'\n\nThe capital of France is Paris.'

### Using the stream() method

You can call the stream() method inside a Python for loop to receive the model's output in chunks as it is generated, instead of waiting for the complete response:

In [4]:
prompt = "Create a haiku about the ocean."
for step in llm.stream(prompt):
    print(step, end="", flush=True)




Gentle waves caress
Whispers of secrets untold
Eternal embrace.
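When the complete text is also needed after streaming, the chunks can be collected while they are printed. The sketch below uses a hypothetical stand-in generator, fake_stream, in place of llm.stream(prompt) so that it runs without an API key. With the real completion model, each chunk is assumed to be a plain string, as in the loop above.

```python
from typing import Iterator


def fake_stream(text: str, size: int = 8) -> Iterator[str]:
    """Hypothetical stand-in for llm.stream(): yields the text in small pieces."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


haiku = "Gentle waves caress\nWhispers of secrets untold\nEternal embrace."

chunks = []
for step in fake_stream(haiku):
    print(step, end="", flush=True)  # show each piece as soon as it arrives
    chunks.append(step)              # keep it so the full text can be rebuilt
print()

# Joining the chunks reconstructs the complete response
full_text = "".join(chunks)
```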

You can also send several prompts to the model at once with the batch() method:

In [5]:
# Batch processing
questions = [
    "What is the capital of France?",
    "What is the capital of Germany?",
    "What is the capital of Italy?"
    ]
# This will return a list of responses
llm.batch(questions)


['\n\nThe capital of France is Paris.',
 '\n\nThe capital of Germany is Berlin.',
 '\n\nThe capital of Italy is Rome.']
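The responses come back in the same order as the prompts, so the two lists can be paired with zip(). This is a minimal sketch that uses the output shown above as literal data rather than calling the API; strip() removes the leading newlines that the completion model emits.

```python
questions = [
    "What is the capital of France?",
    "What is the capital of Germany?",
    "What is the capital of Italy?",
]

# Copied from the batch output above instead of calling llm.batch(questions)
responses = [
    "\n\nThe capital of France is Paris.",
    "\n\nThe capital of Germany is Berlin.",
    "\n\nThe capital of Italy is Rome.",
]

# Pair each question with its cleaned-up answer
answers_by_question = {
    question: answer.strip() for question, answer in zip(questions, responses)
}

for question, answer in answers_by_question.items():
    print(f"{question} -> {answer}")
```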

### Chat models

In LangChain, chat models are abstractions that provide a structured interface for interacting with conversational large language models (LLMs). Unlike traditional LLMs, which operate on plain text inputs and outputs, chat models handle sequences of messages, each associated with a role (e.g., "system", "user", "assistant"), which makes interactions more natural and context-aware.
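As a rough picture of this role-tagged structure, the sketch below models a conversation with a small dataclass. Message and to_payload are hypothetical names used only for illustration; they are not LangChain classes. The output format mirrors the role/content dictionaries used by the OpenAI chat API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a role plus its text."""
    role: str      # "system", "user", or "assistant"
    content: str


def to_payload(messages: List[Message]) -> List[Dict[str, str]]:
    """Convert the messages to a list of role/content dictionaries."""
    return [{"role": m.role, "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("user", "What is the capital of France?"),
]

# Appending the reply keeps the context available for the next turn
conversation.append(Message("assistant", "The capital of France is Paris."))

payload = to_payload(conversation)
for entry in payload:
    print(entry)
```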

In [7]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [8]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are an assistant who responds with irony"),
    HumanMessage(content="What is the role of cache memory?")
]

answers = chat.invoke(messages)

The model response:

In [11]:
answers.content

"Oh, cache memory? It's just there to take up space and make your computer look fancy. Totally not important for speeding up data access or anything like that."

The response also carries metadata, including token usage, the model name, and the finish reason:


In [12]:
answers.response_metadata

{'token_usage': {'completion_tokens': 33,
  'prompt_tokens': 27,
  'total_tokens': 60,
  'completion_tokens_details': {'accepted_prediction_tokens': 0,
   'audio_tokens': 0,
   'reasoning_tokens': 0,
   'rejected_prediction_tokens': 0},
  'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}},
 'model_name': 'gpt-3.5-turbo-0125',
 'system_fingerprint': None,
 'finish_reason': 'stop',
 'logprobs': None}
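These fields can be read like any other dictionary. The sketch below works on a literal copy of the output above, trimmed to the fields it uses, so it runs without an API call. It checks that the token counts add up and whether the reply was cut off by the token limit, which the OpenAI API signals with a finish reason of "length".

```python
# Literal copy of the relevant fields from answers.response_metadata
response_metadata = {
    "token_usage": {
        "completion_tokens": 33,
        "prompt_tokens": 27,
        "total_tokens": 60,
    },
    "model_name": "gpt-3.5-turbo-0125",
    "finish_reason": "stop",
}

usage = response_metadata["token_usage"]

# Prompt and completion tokens should sum to the reported total
counts_consistent = (
    usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
)

# "stop" means the model finished on its own; "length" means it hit the token limit
truncated = response_metadata["finish_reason"] == "length"

print(f"model: {response_metadata['model_name']}")
print(f"tokens: {usage['total_tokens']} (consistent: {counts_consistent})")
print(f"truncated: {truncated}")
```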

### Few-shot prompting

In LangChain (and in working with LLMs more broadly), few-shot prompting is a technique in which example input-output pairs are included in the prompt to guide the model toward more accurate or relevant responses. It is a form of few-shot learning: the model is shown a few examples before it generates its own response.

In [13]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI()

In [15]:
from langchain_core.messages import HumanMessage, AIMessage

messages = [
    HumanMessage(content="what is the first day of the week?"),
    AIMessage(content="sunday"),
    HumanMessage(content="what is the third day of the week?"),
    AIMessage(content="Tuesday"),
    HumanMessage(content="What is the last day of the week?"),
]

chat.invoke(messages)

AIMessage(content='Saturday', response_metadata={'token_usage': {'completion_tokens': 1, 'prompt_tokens': 53, 'total_tokens': 54, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-096078c2-709d-45e7-b900-958c85339f4a-0')
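With more than a handful of examples, the alternating message list can be generated from input-output pairs. In this minimal sketch, build_few_shot is a hypothetical helper, not part of LangChain. It emits (role, content) tuples, which LangChain chat models are assumed to accept in place of HumanMessage and AIMessage objects.

```python
from typing import List, Tuple


def build_few_shot(
    examples: List[Tuple[str, str]], question: str
) -> List[Tuple[str, str]]:
    """Interleave example questions and answers, then append the real question."""
    few_shot_messages = []
    for user_text, ai_text in examples:
        few_shot_messages.append(("human", user_text))
        few_shot_messages.append(("ai", ai_text))
    few_shot_messages.append(("human", question))
    return few_shot_messages


examples = [
    ("what is the first day of the week?", "sunday"),
    ("what is the third day of the week?", "Tuesday"),
]

few_shot = build_few_shot(examples, "What is the last day of the week?")
for role, content in few_shot:
    print(f"{role}: {content}")
```

The resulting list could then be passed to chat.invoke() in the same way as the hand-written messages.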

### LangChain debug mode

Setting langchain.debug = True makes LangChain print the inputs and outputs of each low-level step during execution.

In [16]:
import langchain

langchain.debug = True
chat.invoke(messages)

[32;1m[1;3m[llm/start][0m [1m[llm:ChatOpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Human: what is the first day of the week?\nAI: sunday\nHuman: what is the third day of the week?\nAI: Tuesday\nHuman: What is the last day of the week?"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[llm:ChatOpenAI] [1.08s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "Sunday",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "Sunday",
            "response_metadata": {
              "token_usage": {
                "completion_tokens": 1,
                "prompt_tokens": 53,
                "total_tokens": 54,
                "

AIMessage(content='Sunday', response_metadata={'token_usage': {'completion_tokens': 1, 'prompt_tokens': 53, 'total_tokens': 54, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-95eb2123-e4a1-409d-99c5-a43e2d09b9aa-0')

In [17]:
langchain.debug = False
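Because the flag is global, it is easy to leave it switched on by accident. One option is a small context manager that restores the previous value on exit, even if the call raises an error. The sketch below is generic: debug_enabled is a hypothetical helper, and a SimpleNamespace stands in for the langchain module so that the example runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def debug_enabled(module):
    """Turn module.debug on for the duration of the with block."""
    previous = getattr(module, "debug", False)
    module.debug = True
    try:
        yield
    finally:
        # Restore the earlier value, even if the body raised an exception
        module.debug = previous


# Stand-in for the real langchain module (assumption for this sketch)
fake_langchain = SimpleNamespace(debug=False)

with debug_enabled(fake_langchain):
    inside = fake_langchain.debug   # the model call would go here

after = fake_langchain.debug
print(f"inside the block: {inside}, after the block: {after}")
```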

## Caching

In LangChain, caching refers to storing the outputs of expensive LLM calls so they don’t need to be recomputed when the same input is used again. This can significantly improve performance, reduce latency, and save on API costs, especially during development or when working with large prompts.

In [18]:
from langchain_openai.chat_models import ChatOpenAI

chat = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are an ironic assistant."),
    HumanMessage(content="What is the fifth day of the week?")
]

In [20]:
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache

set_llm_cache(InMemoryCache())

In [21]:
%%time
chat.invoke(messages)

CPU times: total: 46.9 ms
Wall time: 1.17 s


AIMessage(content='The fifth day of the week is obviously... Thursday! Or is it Friday? Oh, the suspense!', response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 26, 'total_tokens': 47, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-21835b49-524f-422b-8c51-01260d5fb5dc-0')

The response is now cached in memory. Let's run the same call again.

In [26]:
%%time
chat.invoke(messages)

CPU times: total: 0 ns
Wall time: 2.99 ms


AIMessage(content='The fifth day of the week is obviously... Thursday! Or is it Friday? Oh, the suspense!', response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 26, 'total_tokens': 47, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-21835b49-524f-422b-8c51-01260d5fb5dc-0')

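The second call returned in about 3 ms instead of 1.17 s, and the identical run id shows that the response came from the cache rather than from a new API request.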
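The idea behind an in-memory cache can be sketched in a few lines. This is not LangChain's implementation: slow_model is a hypothetical stand-in that sleeps to imitate an API call, and the cache is a plain dictionary keyed by the prompt alone.

```python
import time

cache = {}
stats = {"calls": 0}


def slow_model(prompt: str) -> str:
    """Hypothetical stand-in for an API call that takes a noticeable time."""
    stats["calls"] += 1
    time.sleep(0.2)
    return f"answer to: {prompt}"


def cached_invoke(prompt: str) -> str:
    """Return the stored answer if the prompt was seen before."""
    if prompt not in cache:
        cache[prompt] = slow_model(prompt)
    return cache[prompt]


question = "What is the fifth day of the week?"

start = time.perf_counter()
first = cached_invoke(question)          # cache miss: calls the model
first_time = time.perf_counter() - start

start = time.perf_counter()
second = cached_invoke(question)         # cache hit: served from the dictionary
second_time = time.perf_counter() - start

print(f"first call:  {first_time:.4f} s")
print(f"second call: {second_time:.4f} s")
```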

We can also store the cache in a SQLite database, so that cached responses persist after the Python process exits.

In [27]:
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache


set_llm_cache(SQLiteCache(database_path="files/langchain_cache.sqlite"))

In [28]:
%%time
chat.invoke(messages)

CPU times: total: 15.6 ms
Wall time: 1.15 s


AIMessage(content='Ah, the elusive fifth day of the week that seems to evade many people\'s calendars. Some may call it "Friday," but who really knows for sure?', response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 26, 'total_tokens': 58, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5a5cf9a9-b737-4125-8dcd-53fe3aaf4afa-0')

The response is now saved in our database. Repeating the call reads it from there.

In [29]:
%%time
chat.invoke(messages)

CPU times: total: 406 ms
Wall time: 448 ms


AIMessage(content='Ah, the elusive fifth day of the week that seems to evade many people\'s calendars. Some may call it "Friday," but who really knows for sure?', response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 26, 'total_tokens': 58, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5a5cf9a9-b737-4125-8dcd-53fe3aaf4afa-0')
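A database-backed cache follows the same lookup-then-store pattern but keeps the entries in a table. The sketch below uses Python's built-in sqlite3 module with a hypothetical table layout; it does not reproduce the schema that SQLiteCache uses. It connects to an in-memory database so that it runs anywhere, whereas passing a file path would make the entries persist between runs. The stored response is placeholder text.

```python
import sqlite3
from typing import Optional

# ":memory:" keeps the example self-contained; a file path would persist the data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS llm_cache ("
    "prompt TEXT PRIMARY KEY, response TEXT NOT NULL)"
)


def lookup(prompt: str) -> Optional[str]:
    """Return the cached response for the prompt, or None on a miss."""
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ?", (prompt,)
    ).fetchone()
    return row[0] if row else None


def update(prompt: str, response: str) -> None:
    """Store or overwrite the response for the prompt."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO llm_cache (prompt, response) VALUES (?, ?)",
            (prompt, response),
        )


question = "What is the fifth day of the week?"

miss = lookup(question)        # nothing stored yet
update(question, "Friday")     # placeholder response for illustration
hit = lookup(question)         # now served from the database

print(f"before: {miss!r}, after: {hit!r}")
```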