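## Why Streaming
- Faster responses: tokens are shown as they are generated instead of after the full completion
- Adaptive interaction
- Efficiency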

### 1. Using stdout
- Set `streaming=True`
- With `StreamingStdOutCallbackHandler`, each token is printed to stdout as it is generated

In [1]:
import os
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [2]:
import os
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

In [3]:
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    temperature=0.0,
    model_name="gpt-3.5-turbo",
    streaming=True,  # ! important
    callbacks=[StreamingStdOutCallbackHandler()]  # ! important
)


In [4]:
from langchain.schema import HumanMessage

messages = [HumanMessage(content="tell me a long story")]

llm(messages)


Once upon a time, in a small village nestled in the mountains, there lived a young girl named Elara. Elara was known throughout the village for her kindness and generosity, always willing to help those in need. She lived with her parents in a cozy cottage at the edge of the forest, where she spent her days exploring the woods and playing with the animals that called it home.

One day, while out on a walk, Elara stumbled upon a wounded deer. The poor creature had been caught in a trap set by hunters and was in desperate need of help. Without hesitation, Elara carefully freed the deer from the trap and tended to its wounds, using herbs and plants she had learned about from her mother, who was a skilled healer.

As the days passed, Elara nursed the deer back to health, and the two formed a deep bond. The deer, whom Elara named Luna, became her constant companion, following her wherever she went and protecting her from harm. Together, they roamed the forest, exploring its hidden wonders an

AIMessage(content="Once upon a time, in a small village nestled in the mountains, there lived a young girl named Elara. Elara was known throughout the village for her kindness and generosity, always willing to help those in need. She lived with her parents in a cozy cottage at the edge of the forest, where she spent her days exploring the woods and playing with the animals that called it home.\n\nOne day, while out on a walk, Elara stumbled upon a wounded deer. The poor creature had been caught in a trap set by hunters and was in desperate need of help. Without hesitation, Elara carefully freed the deer from the trap and tended to its wounds, using herbs and plants she had learned about from her mother, who was a skilled healer.\n\nAs the days passed, Elara nursed the deer back to health, and the two formed a deep bond. The deer, whom Elara named Luna, became her constant companion, following her wherever she went and protecting her from harm. Together, they roamed the forest, explorin

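#### Process
- Set up memory
- Set up the tool: `llm-math`
- Initialize the agent: with `early_stopping_method="generate"`, the agent generates a final answer when it stops at the iteration limit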

In [5]:
from langchain.memory import ConversationBufferWindowMemory
from langchain.agents import load_tools, AgentType, initialize_agent

# initialize conversational memory
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True,
    output_key="output"
)

# create a single tool to see how it impacts streaming
tools = load_tools(["llm-math"], llm=llm)

# initialize the agent
agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    memory=memory,
    verbose=True,
    max_iterations=3,
    early_stopping_method="generate",
    return_intermediate_steps=False
)


In [6]:
prompt = "Hello, how are you?"

agent(prompt)

> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?"
}
```
```json
{
    "action": "Final Answer",
    "action_input": "I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?"
}
```

> Finished chain.


{'input': 'Hello, how are you?',
 'chat_history': [],
 'output': "I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?"}

- The JSON formatting is printed along with the answer
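Because the agent wraps its reply in a fenced JSON blob, the answer text has to be pulled out of the `action_input` field before it can be shown on its own. A minimal sketch of that extraction follows; the helper name `extract_action_input` and the sample reply are hypothetical, and the regular expression assumes the blob is fenced the way it appears in the output above.

```python
import json
import re

FENCE = "`" * 3  # the triple-backtick fence the agent wraps its JSON in


def extract_action_input(raw: str) -> str:
    """Return the "action_input" field from a (possibly fenced) JSON reply."""
    # Match an opening fence with an optional "json" tag, the payload, and the closing fence
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, raw, flags=re.DOTALL)
    payload = match.group(1) if match else raw  # fall back to the raw text if unfenced
    return json.loads(payload)["action_input"]


# Hypothetical reply shaped like the agent output above
raw_reply = (
    FENCE + "json\n"
    '{\n    "action": "Final Answer",\n    "action_input": "Hello!"\n}\n'
    + FENCE
)
print(extract_action_input(raw_reply))
```

This only works once the full reply has arrived, which is why the streaming case needs a token-level filter instead.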

In [7]:
agent("what is the square root of 71?")

> Entering new AgentExecutor chain...
```json
{
    "action": "Calculator",
    "action_input": "square root of 71"
}
```
```json
{
    "action": "Calculator",
    "action_input": "square root of 71"
}
```
```text
71**0.5
```
...numexpr.evaluate("71**0.5")...

Observation: Answer: 8.426149773176359
Thought:
```json
{
    "action": "Final Answer",
    "action_input": "The square root of 71 is approximately 8.426149773176359."
}
```
```json
{
    "action": "Final Answer",
    "action_input": "The square root of 71 is approximately 8.426149773176359."
}
```

> Finished chain.


{'input': 'what is the square root of 71?',
 'chat_history': [HumanMessage(content='Hello, how are you?'),
  AIMessage(content="I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?")],
 'output': 'The square root of 71 is approximately 8.426149773176359.'}

In [8]:
new_llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    temperature=0.0,
    model_name="gpt-3.5-turbo",
    streaming=True, 
    callbacks=[StreamingStdOutCallbackHandler()]  # ! important
)

In [11]:
from langchain.callbacks.streaming_stdout_final_only import (
    FinalStreamingStdOutCallbackHandler,
)

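# Stream only the tokens that follow the "Final Answer" prefix.
# Note: `agent` above was initialized with `llm`, so these callbacks only
# affect calls that go through `new_llm`.
new_llm.callbacks = [
    FinalStreamingStdOutCallbackHandler(
        answer_prefix_tokens=["Final", "Answer"]
    )
]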

In [12]:
agent("what is the square root of 71?")

> Entering new AgentExecutor chain...
```json
{
    "action": "Calculator",
    "action_input": "square root of 71"
}
```
```json
{
    "action": "Calculator",
    "action_input": "square root of 71"
}
```
```text
71**0.5
```
...numexpr.evaluate("71**0.5")...

Observation: Answer: 8.426149773176359
Thought:
```json
{
    "action": "Final Answer",
    "action_input": "The square root of 71 is approximately 8.426149773176359."
}
```
```json
{
    "action": "Final Answer",
    "action_input": "The square root of 71 is approximately 8.426149773176359."
}
```

> Finished chain.


{'input': 'what is the square root of 71?',
 'chat_history': [HumanMessage(content='Hello, how are you?'),
  AIMessage(content="I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?"),
  HumanMessage(content='what is the square root of 71?'),
  AIMessage(content='The square root of 71 is approximately 8.426149773176359.')],
 'output': 'The square root of 71 is approximately 8.426149773176359.'}

### 2. Defining a class: capture the generated tokens and print only specific content to the console once the final answer is being processed

In [13]:
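import sys
from typing import Any

class CallbackHandler(StreamingStdOutCallbackHandler):
    """Streams only the text of the agent's final answer to stdout."""

    def __init__(self):
        super().__init__()
        self.content: str = ""            # tokens accumulated so far
        self.final_answer: bool = False   # True once "Final Answer" has been seen

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        self.content += token
        if "Final Answer" in self.content:
            # now we're in the final answer section, but don't print yet
            self.final_answer = True
            self.content = ""
        if self.final_answer:
            # print only once the "action_input" value has started
            if '"action_input": "' in self.content:
                if token not in ["}"]:
                    sys.stdout.write(token)  # equal to `print(token, end="")`
                    sys.stdout.flush()

# Attach the handler to the LLM. `agent` was built with `llm`, so this only
# affects calls that go through `new_llm`.
new_llm.callbacks = [CallbackHandler()]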
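To see what this filter lets through without calling the API, the same logic can be replayed over a hand-written token list. This is only a sketch: the class name `FinalAnswerFilter`, its `emitted` list, and the token split are hypothetical, and real tokenization will differ.

```python
class FinalAnswerFilter:
    """Hypothetical stand-in that mirrors the token filter, minus the LangChain base class."""

    def __init__(self) -> None:
        self.content = ""          # text accumulated so far
        self.final_answer = False  # set once "Final Answer" has been seen
        self.emitted = []          # tokens that would have been written to stdout

    def on_llm_new_token(self, token: str) -> None:
        self.content += token
        if "Final Answer" in self.content:
            # entering the final answer section: reset the buffer, print nothing yet
            self.final_answer = True
            self.content = ""
        if self.final_answer and '"action_input": "' in self.content:
            if token != "}":
                self.emitted.append(token)


# Hand-written token split of a final-answer JSON blob (an assumption, not real model output)
tokens = [
    "{", '\n    "', "action", '":', ' "', "Final", " Answer", '",',
    '\n    "', "action", "_input", '":', ' "',
    "The", " answer", " is", " 8.43", '."', "\n", "}",
]

handler = FinalAnswerFilter()
for tok in tokens:
    handler.on_llm_new_token(tok)

streamed = "".join(handler.emitted)
print(repr(streamed))
```

With this token split, the token that completes the `"action_input": "` prefix is itself emitted, so the streamed text keeps its surrounding quotes. The flags are also never reset, so a handler instance reused across several agent calls carries its state over.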

In [14]:
agent("what is the square root of 71?")

> Entering new AgentExecutor chain...
```json
{
    "action": "Calculator",
    "action_input": "square root of 71"
}
```
```json
{
    "action": "Calculator",
    "action_input": "square root of 71"
}
```
```text
71**0.5
```
...numexpr.evaluate("71**0.5")...

Observation: Answer: 8.426149773176359
Thought:
```json
{
    "action": "Final Answer",
    "action_input": "The square root of 71 is approximately 8.426149773176359."
}
```
```json
{
    "action": "Final Answer",
    "action_input": "The square root of 71 is approximately 8.426149773176359."
}
```

> Finished chain.


{'input': 'what is the square root of 71?',
 'chat_history': [HumanMessage(content='Hello, how are you?'),
  AIMessage(content="I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?"),
  HumanMessage(content='what is the square root of 71?'),
  AIMessage(content='The square root of 71 is approximately 8.426149773176359.'),
  HumanMessage(content='what is the square root of 71?'),
  AIMessage(content='The square root of 71 is approximately 8.426149773176359.')],
 'output': 'The square root of 71 is approximately 8.426149773176359.'}

### Using FastAPI with Agents

In [None]:
import requests

res = requests.get("http://localhost:8000/health")
res.json()

In [None]:
res = requests.get("http://localhost:8000/chat",
    json={"text": "hello there!"}
)
res

In [None]:
res.json()

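- `StreamingResponse` is used to send tokens through a generator function that feeds them to FastAPI, which requires an asynchronous approach.

- The callback handler accesses a queue and puts each token on it as the token is generated.
- The generator takes each token as soon as it sees one added and yields it to the `StreamingResponse`.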
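The queue hand-off described above can be sketched with plain `asyncio`, without FastAPI or an LLM. Everything named here (`QueueTokenHandler`, `token_generator`, `fake_llm`, the `None` sentinel) is a hypothetical illustration rather than the server's actual code. In a FastAPI app, an async generator like `token_generator` is the kind of object that would be handed to `StreamingResponse`.

```python
import asyncio
from typing import AsyncIterator, List


class QueueTokenHandler:
    """Hypothetical callback handler: puts each generated token on a queue."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def on_llm_new_token(self, token: str) -> None:
        await self.queue.put(token)

    async def on_llm_end(self) -> None:
        await self.queue.put(None)  # sentinel: no more tokens will arrive


async def token_generator(handler: QueueTokenHandler) -> AsyncIterator[str]:
    """Yield tokens as soon as they appear on the queue, until the sentinel."""
    while True:
        token = await handler.queue.get()
        if token is None:
            break
        yield token


async def fake_llm(handler: QueueTokenHandler, tokens: List[str]) -> None:
    """Stand-in for the model: emits tokens one at a time through the handler."""
    for tok in tokens:
        await asyncio.sleep(0)  # give the consumer a chance to run between tokens
        await handler.on_llm_new_token(tok)
    await handler.on_llm_end()


async def main() -> List[str]:
    handler = QueueTokenHandler()
    producer = asyncio.create_task(fake_llm(handler, ["Hello", ",", " world"]))
    received = [tok async for tok in token_generator(handler)]
    await producer
    return received


received = asyncio.run(main())
print(received)
```

The producer and the consumer run concurrently on one event loop, so the generator never has to wait for the whole completion before yielding the first token.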

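- To see this in action, define a streaming request function called `get_stream`, which does the following:

- Sends a request to the `/chat` endpoint of the local FastAPI server
- Streams the response
- Prints the streamed content chunk by chunk as it arrives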

In [None]:
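import codecs

def get_stream(query: str):
    # An incremental decoder keeps multi-byte UTF-8 characters intact
    # when they are split across chunks
    decoder = codecs.getincrementaldecoder("utf-8")()
    s = requests.Session()
    with s.get(
        "http://localhost:8000/chat",
        stream=True,
        json={"text": query}
    ) as r:
        # iter_content() yields raw byte chunks as they arrive
        for chunk in r.iter_content():
            print(decoder.decode(chunk), end="")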

- Uses FastAPI