## Memory
Large Language Models (LLMs) are stateless:
- Each conversation is independent.
- An LLM does not automatically remember what you said earlier.

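A chatbot appears to "have memory" only because:
- On every turn, the system sends the full conversation back into the model as context.
- The model simply "reads what was said earlier" and answers based on that context.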
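
A minimal sketch of this idea, assuming a hypothetical `fake_llm` function in place of a real model (it is not part of LangChain or Ollama). The function sees only the text passed to it, so the caller has to resend the earlier turns if they should be taken into account:

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a stateless model: it sees only `prompt`."""
    if "My name is Alice" in prompt and "What is my name?" in prompt:
        return "Your name is Alice."
    if "What is my name?" in prompt:
        return "I don't know your name."
    return "Nice to meet you!"


# Two independent calls: the second call carries no trace of the first.
fake_llm("My name is Alice")
stateless_answer = fake_llm("What is my name?")

# "Memory" is just the caller resending the earlier turns as context.
history: list[str] = []


def chat_turn(user_text: str) -> str:
    history.append(f"Human: {user_text}")
    reply = fake_llm("\n".join(history))  # the full conversation goes in every time
    history.append(f"AI: {reply}")
    return reply


chat_turn("My name is Alice")
remembered_answer = chat_turn("What is my name?")

print(stateless_answer)   # I don't know your name.
print(remembered_answer)  # Your name is Alice.
```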

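1. The LLM itself has no memory; every turn must supply the context.
2. The purpose of Memory is to store, accumulate, and inject that context for you automatically.
3. LangChain's design modularizes "memory" so that it can be plugged in easily.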

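As the conversation gets longer, the context sent to the model grows with it, which increases memory usage and steadily raises the LLM's compute cost. Keeping the entire conversation without any limit is clearly expensive, so LangChain also provides several ways to buffer or trim the stored conversation.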
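
To make the growth concrete, here is a small worked example. It assumes a hypothetical `count_tokens` helper that counts one token per whitespace-separated word (real tokenizers count differently) and made-up replies. It records how many tokens each request carries when the full history is resent every turn:

```python
def count_tokens(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


# (question, answer) pairs; the answers are illustrative, not real model output.
turns = [
    ("What does cosine mean?", "Cosine is the ratio of the adjacent side to the hypotenuse."),
    ("1 + 1", "The answer is 2."),
    ("And 2 + 2?", "The answer is 4."),
]

history: list[str] = []
tokens_sent_per_turn: list[int] = []

for question, answer in turns:
    # Each request carries the whole history plus the new question.
    prompt = history + [question]
    tokens_sent_per_turn.append(sum(count_tokens(m) for m in prompt))
    history.extend([question, answer])

print(tokens_sent_per_turn)  # [4, 18, 26]
```

Every request is larger than the one before it, even though each new question is only a few words long.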

In [2]:
from langchain_ollama.llms import OllamaLLM
chat = OllamaLLM(temperature=0.0, model="llama3")
chat

OllamaLLM(model='llama3', temperature=0.0)

In [None]:
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate,  MessagesPlaceholder
from pydantic import BaseModel, Field


class InMemoryHistory(BaseChatMessageHistory, BaseModel):
    """In memory implementation of chat message history."""

    messages: list[BaseMessage] = Field(default_factory=list)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add a list of messages to the store"""
        self.messages.extend(messages)

    def clear(self) -> None:
        self.messages = []


prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant who's good at {ability}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}"),
])

chain = prompt | chat
store = {}

def get_by_session_id(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_by_session_id,
    input_messages_key="question",
    history_messages_key="history",
)


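# General form of the call:
# RunnableWithMessageHistory(
#     chain, # the runnable to wrap (an LLMChain or a pipe such as prompt | chat)
#     get_by_session_id, # callback that returns the history (memory) instance for a session_id
#     input_messages_key="input", # key of the user input (the new message)
#     history_messages_key="chat_history", # name of the prompt variable that receives the history
#     history_factory_config=[] # optional; describes the parameters passed to the callback above
# )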

In [None]:
print(chain_with_history.invoke(
    {"ability": "math", "question": "What does cosine mean?"},
    config={"configurable": {"session_id": "foo"}}
))

Cosine is a fundamental concept in mathematics, and I'm happy to help you understand it.

In simple terms, the cosine of an angle (usually represented by the symbol cos) is a value that describes how much of the angle is "up" or "down" from the horizontal. Think of it like this: if you're standing on a hill, the cosine would tell you how steep the hill is.

Mathematically, the cosine of an angle θ (theta) is defined as:

cos(θ) = adjacent side / hypotenuse

In a right triangle with angles θ, φ, and γ, the cosine of θ is equal to the length of the adjacent side divided by the length of the hypotenuse.

For example, if you have a right triangle with an angle of 30 degrees (π/6 radians), the cosine of that angle would be approximately 0.866 (or √3/2). This means that the angle is "up" about 86.6% from the horizontal.

Cosine has many practical applications in fields like physics, engineering, and computer graphics, where it's used to calculate distances, angles, and shapes. It's also a fu

In [9]:
print(store)

{'foo': InMemoryHistory(messages=[HumanMessage(content='What does cosine mean?', additional_kwargs={}, response_metadata={}), AIMessage(content='Cosine is a fundamental concept in mathematics, and I\'m happy to help you understand it.\n\nIn simple terms, the cosine of an angle (usually represented by the symbol cos) is a value that describes how much of the angle is "up" or "down" from the horizontal. Think of it like this: if you\'re standing on a hill, the cosine would tell you how steep the hill is.\n\nMathematically, the cosine of an angle θ (theta) is defined as:\n\ncos(θ) = adjacent side / hypotenuse\n\nIn a right triangle with angles θ, φ, and γ, the cosine of θ is equal to the length of the adjacent side divided by the length of the hypotenuse.\n\nFor example, if you have a right triangle with an angle of 30 degrees (π/6 radians), the cosine of that angle would be approximately 0.866 (or √3/2). This means that the angle is "up" about 86.6% from the horizontal.\n\nCosine has man

In [None]:
print(chain_with_history.invoke(
    {"ability": "math", "question": "1 + 1"},
    config={"configurable": {"session_id": "foo"}}
))

I think there might be some confusion! You asked what cosine means, and I provided an explanation. Then, you asked "1 + 1", which is a basic arithmetic operation.

If you meant to ask about the result of 1 + 1, the answer is simply 2!

However, if you'd like to explore how cosine fits into this equation or apply it to a specific scenario, I'm here to help. Just let me know!


In [11]:
print(store) 

{'foo': InMemoryHistory(messages=[HumanMessage(content='What does cosine mean?', additional_kwargs={}, response_metadata={}), AIMessage(content='Cosine is a fundamental concept in mathematics, and I\'m happy to help you understand it.\n\nIn simple terms, the cosine of an angle (usually represented by the symbol cos) is a value that describes how much of the angle is "up" or "down" from the horizontal. Think of it like this: if you\'re standing on a hill, the cosine would tell you how steep the hill is.\n\nMathematically, the cosine of an angle θ (theta) is defined as:\n\ncos(θ) = adjacent side / hypotenuse\n\nIn a right triangle with angles θ, φ, and γ, the cosine of θ is equal to the length of the adjacent side divided by the length of the hypotenuse.\n\nFor example, if you have a right triangle with an angle of 30 degrees (π/6 radians), the cosine of that angle would be approximately 0.866 (or √3/2). This means that the angle is "up" about 86.6% from the horizontal.\n\nCosine has man
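
The flow above can be pictured with a plain-Python sketch. This is a conceptual illustration only, not LangChain's implementation: `invoke_with_history`, `get_history`, and `echo_model` are hypothetical names, and messages are simplified to `(role, content)` tuples. It shows the two things the wrapper does around each call (inject the stored history, then save the new exchange) and that each `session_id` has its own history:

```python
from typing import Callable

Message = tuple[str, str]  # (role, content)

store: dict[str, list[Message]] = {}


def get_history(session_id: str) -> list[Message]:
    """Return the message list for a session, creating it on first use."""
    return store.setdefault(session_id, [])


def echo_model(messages: list[Message]) -> str:
    """Hypothetical model: reports how many earlier messages it was shown."""
    return f"I can see {len(messages) - 1} earlier message(s)."


def invoke_with_history(
    model: Callable[[list[Message]], str], question: str, session_id: str
) -> str:
    history = get_history(session_id)
    # 1. Inject the stored history ahead of the new question.
    reply = model(history + [("human", question)])
    # 2. Save both sides of the exchange for the next call.
    history.extend([("human", question), ("ai", reply)])
    return reply


first = invoke_with_history(echo_model, "What does cosine mean?", "foo")
second = invoke_with_history(echo_model, "1 + 1", "foo")
other = invoke_with_history(echo_model, "1 + 1", "bar")

print(first)   # I can see 0 earlier message(s).
print(second)  # I can see 2 earlier message(s).
print(other)   # I can see 0 earlier message(s).
```

The second call in session `foo` sees the first exchange, while session `bar` starts from an empty history.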

## trim_messages

```python
%pip install transformers
```

In [22]:
from langchain_core.messages import (
    AIMessage,         # a message from the model (LLM)
    BaseMessage,       # base class for all message types
    HumanMessage,      # a message from the user
    SystemMessage,     # system instructions (role, rules)
    # trim_messages trims a message list that has grown too long (to stay under the token limit).
    # Here it keeps only the last n messages: once there are more than n, the oldest are dropped.
    # https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html
    trim_messages,
)

messages = [
    SystemMessage("you're a good assistant, you always respond with a joke."), # system instruction
    HumanMessage("i wonder why it's called langchain"), # user turn
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ), # LLM reply
    HumanMessage("and who is harrison chasing anyways"), # user turn
    AIMessage(
        "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ), # LLM reply
    HumanMessage("why is 42 always the answer?"),
    AIMessage(
        "Because it’s the only number that’s constantly right, even when it doesn’t add up!"
    ),
    HumanMessage("What did the cow say?"),
]


In [23]:

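def ollama_token_counter(message):
    # Convert the input to a string: use .content for a single message, otherwise fall back to str().
    # Note: trim_messages passes one message at a time only when the first parameter is annotated
    # as BaseMessage; without that annotation it passes the whole list, so the str() fallback runs.
    text = message.content if hasattr(message, "content") else str(message)
    return chat.get_num_tokens(text)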

"""
Like ConversationTokenBufferMemory 
"""
selected_messages_token_buffer = trim_messages(
    messages,           # the original message list (HumanMessage / AIMessage / SystemMessage, ...)
    # Measure length in tokens through chat.get_num_tokens() rather than with len(),
    # so the prompt can be kept under the model's input limit (e.g. about 4096 tokens for LLaMA 2).
    token_counter=ollama_token_counter,
    # Token budget for the kept messages; typical values are 512, 1024 or 2048.
    # Messages are added from the last one backwards while the total stays <= max_tokens.
    max_tokens=128,
    strategy="last",    # keep the most recent messages
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    # start_on="human" makes sure we produce a valid chat history
    start_on="human",
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    allow_partial=False # keep or drop each message whole; never keep part of one
)

for msg in selected_messages_token_buffer:
    msg.pretty_print()


you're a good assistant, you always respond with a joke.

why is 42 always the answer?

Because it’s the only number that’s constantly right, even when it doesn’t add up!

What did the cow say?


In [24]:

"""
Like ConversationBufferWindowMemory  
"""
selected_messages_buffer_windows = trim_messages(
    messages,           # the original message list (HumanMessage / AIMessage / SystemMessage, ...)
    token_counter=len,  # Python's len() on the list, so the "token" count is the number of messages
    max_tokens=5,       # keep at most 5 messages
    strategy="last",    # keep the most recent messages
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    # start_on="human" makes sure we produce a valid chat history
    start_on="human",
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    allow_partial=False # keep or drop each message whole; never keep part of one
)

for msg in selected_messages_buffer_windows:
    msg.pretty_print()


you're a good assistant, you always respond with a joke.

why is 42 always the answer?

Because it’s the only number that’s constantly right, even when it doesn’t add up!

What did the cow say?
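
The trimming rule used in both cells can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: `trim_last` and `word_count` are hypothetical helpers, messages are `(role, content)` tuples, and tokens are approximated as whitespace-separated words. It keeps the system message, adds messages from the newest backwards while the budget allows, and then drops leading messages until the history starts on a human turn:

```python
from typing import Callable

Message = tuple[str, str]  # (role, content); a simplified stand-in for BaseMessage


def trim_last(
    messages: list[Message],
    max_tokens: int,
    token_counter: Callable[[list[Message]], int],
) -> list[Message]:
    """Keep the leading system message plus the newest messages that fit the budget."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    kept: list[Message] = []
    # Walk backwards from the newest message, adding while the budget allows.
    for msg in reversed(rest):
        if token_counter(system + [msg] + kept) > max_tokens:
            break
        kept.insert(0, msg)
    # Drop leading non-human messages so the history starts on a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


def word_count(msgs: list[Message]) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return sum(len(content.split()) for _, content in msgs)


messages = [
    ("system", "Always answer with a joke."),
    ("human", "why is it called langchain"),
    ("ai", "WordRope was taken."),
    ("human", "why is 42 the answer"),
    ("ai", "It never adds up."),
    ("human", "What did the cow say?"),
]

by_tokens = trim_last(messages, 20, word_count)  # budget measured in approximate tokens
by_count = trim_last(messages, 5, len)           # budget measured in messages

for role, content in by_tokens:
    print(f"{role}: {content}")
```

With these inputs both calls keep the system message and the last three messages. In the message-count case a fifth message (an AI reply) fits the budget but is dropped so that the history starts with a human message.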


## References
- https://myapollo.com.tw/blog/langchain-configure-chain-at-runtime/