# OpenAI Key Setting

In [6]:
import getpass

api_key = getpass.getpass("Please enter your OpenAI API key: ")
organization = getpass.getpass("Please enter your OpenAI organization ID: ")
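
When the notebook is re-run often, typing the key each time gets tedious. The sketch below checks an environment variable first and only prompts when it is missing. The helper name `resolve_secret` is hypothetical, and `OPENAI_API_KEY` is assumed to be the variable name you use.

```python
import os
from typing import Callable


def resolve_secret(env_name: str, prompt: Callable[[str], str]) -> str:
    """Return the value of env_name if it is set and non-empty, otherwise ask via prompt."""
    value = os.environ.get(env_name, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt (for example getpass.getpass).
    return prompt(f"Please enter {env_name}: ").strip()


# Usage in a notebook (commented out so the sketch runs without prompting):
# import getpass
# api_key = resolve_secret("OPENAI_API_KEY", getpass.getpass)
```

Passing the prompt function in as an argument keeps the helper easy to test without a terminal.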

# pip Install

In [16]:
!pip install --quiet openai \
  langchain langchain_community langchain_core langchain_openai langchainhub \
  python-dotenv \
  tenacity \
  google-search-results \
  unstructured \
  arxiv pymupdf \
  tiktoken \
  streamlit streamlit-folium wikipedia



In [4]:
from openai import OpenAI

client = OpenAI(  # uses the OpenAI API, not Ollama
    api_key=api_key,
    organization=organization
)

model_name = "gpt-3.5-turbo"

## 1. Chatting with the OpenAI API

In [7]:
# Generate a response with chat completion
response = client.chat.completions.create(
    model=model_name,
    messages=[
        # System prompt: describes the role the LLM should play or constraints that apply to the whole conversation.
        {"role": "system", "content": "You are a helpful assistant."},

        # Main prompt: user and assistant messages alternate.
        # The conversation usually starts with a user message.
        # In the example below, the LLM has already answered once.
        # The LLM's earlier answers must be written with the assistant role.

        ### The conversation asks which team won the 2020 World Series.
        ### The Dodgers beat the Rays 4-2 to win in 2020. (https://www.google.com/search?q=2020+world+series&oq=2020+world+series&sourceid=chrome&ie=UTF-8)
        ### (The model was trained after 2020, so it appears to have learned and remembered this information.)
        ### Let's additionally ask where the games were played.
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The LA Dodgers won in 2020."},
        {"role": "user", "content": "Where did the game start?"}
    ]
)

print(response.choices[0].message.content)

The 2020 World Series was held at a neutral site due to the COVID-19 pandemic. The games were played at Globe Life Field in Arlington, Texas.
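
The alternating user/assistant structure described in the comments above can be sketched as a small helper. `build_messages` is a hypothetical name, not part of the OpenAI SDK; it only assembles the list that would be passed as `messages`.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def build_messages(system: str, turns: List[Tuple[str, str]], next_user: str) -> List[Message]:
    """Build a chat history: system prompt, past (user, assistant) turns, then the new user message."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": next_user})
    return messages


messages = build_messages(
    "You are a helpful assistant.",
    [("Who won the world series in 2020?", "The LA Dodgers won in 2020.")],
    "Where did the game start?",
)
```

The resulting list has the same shape as the `messages` argument used in the cell above.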


In [8]:
# Let's continue with another conversation.
msgs = [
     {"role": "user", "content": "Recommend me 5 Korean foods for dinner."}
  ]

# Generate a response with chat completion
response = client.chat.completions.create(
    model=model_name,
    messages=msgs
)

print(response.choices[0].message.content)

1. Bibimbap - a mixed rice dish topped with assorted vegetables, meat (usually beef), and a fried egg, and served with spicy gochujang sauce.
2. Samgyeopsal - grilled pork belly often served with lettuce wraps, garlic, green onions, and various dipping sauces.
3. Kimchi jjigae - a hearty stew made with kimchi, pork, tofu, and vegetables, flavored with gochugaru (Korean chili pepper flakes) and gochujang.
4. Japchae - a stir-fried dish made with glass noodles, vegetables, and meat (usually beef) marinated in a sweet and savory sauce.
5. Sundubu jjigae - a spicy tofu stew made with soft tofu, vegetables, and sometimes seafood or meat, flavored with gochugaru and gochujang.


In [9]:
# Solving a math problem
# CoT, ICL
# CoT: https://arxiv.org/pdf/2205.11916
# Define the prompt with a one-shot example
# Following the many papers on in-context learning, providing examples alongside the prompt (few-shot examples) has become standard practice.
# On some tasks performance drops sharply without examples, which shows how much in-context learning changed LLM prompting.
prompt = [
    {"role": "user", "content": """
Solve the following math problem:

Example:
Problem: What is 7 + 5?
Solution: 7 + 5 = 12

Now solve this problem:
Problem: What is 8 * 6?

Think step by step.

Solution:
"""}
]

# Call the OpenAI API with the defined prompt
response = client.chat.completions.create(
    model=model_name,
    messages=prompt,
    temperature=0.2
)

# Extract and print the response
solution = response.choices[0].message.content.strip()
print(solution)


8 * 6 = 48

Therefore, 8 multiplied by 6 equals 48.
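
The one-shot prompt above is written by hand. A minimal sketch of generating the same layout from a list of examples follows; `build_few_shot_prompt` is a hypothetical helper, and the wording simply mirrors the prompt used in the cell above.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    problem: str,
    instruction: str = "Solve the following math problem:",
) -> str:
    """Assemble an instruction, worked examples, the new problem, and a step-by-step cue."""
    lines = [instruction, ""]
    for example_problem, example_solution in examples:
        lines += ["Example:", f"Problem: {example_problem}", f"Solution: {example_solution}", ""]
    lines += ["Now solve this problem:", f"Problem: {problem}", "", "Think step by step.", "", "Solution:"]
    return "\n".join(lines)


prompt_text = build_few_shot_prompt([("What is 7 + 5?", "7 + 5 = 12")], "What is 8 * 6?")
few_shot_messages = [{"role": "user", "content": prompt_text}]
```

Adding more tuples to the examples list turns the one-shot prompt into a few-shot prompt without changing the rest of the code.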


In [10]:
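# For reference, the training data of llama3-8B covers information up to March 2023.
# A model can only answer from what it saw before its training cutoff.
# https://huggingface.co/meta-llama/Meta-Llama-3-8B
msgs = [
     {"role": "user", "content": "How is the interest rate in US?"}
  ]
# Generate a response with chat completion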
response = client.chat.completions.create(
    model=model_name,
    messages=msgs
)

print(response.choices[0].message.content)

As of October 2021, the Federal Reserve has maintained the federal funds rate at a target range of 0% to 0.25%. This historically low interest rate is intended to stimulate economic activity and support the recovery from the impact of the COVID-19 pandemic. Mortgage rates, personal loan rates, and savings account interest rates are also currently at relatively low levels.


In [11]:
# https://www.bbc.com/news/articles/c1ddj7v9y97o
msgs = [
     {"role": "user", "content": """
The US Federal Reserve has signalled that it will cut its key interest rate just once this year despite inflation easing.
Back in March, the central bank had been expected to reduce borrowing costs three times by the end of 2024.
However, on Wednesday, new forecasts from Fed officials who make decisions on rates pencilled in a single reduction.
The new outlook emerged after the Fed voted to hold interest rates at their current 23-year high even as inflation ticked lower.
Inflation, which measures the pace of price rises, slowed to 3.3% in the year to May. That compares with 3.4% in the 12 months to April.
However, between April and May inflation was unchanged and it remains above the Fed's 2% target.
Jerome Powell, chair of the Federal Reserve, said that only "modest" progress had been made on hitting the target and the central bank would need to see "good inflation readings" before interest rates can be cut.
US interest rates were held at 5.25%-5.5%.
Anastassia Fedyk, assistant professor of finance at Haas Business School at the University of California Berkeley, told the BBC's Today programme: "We did get some good news in terms of better inflation numbers.
"But the Fed is still being pretty cautious so they are signalling that in the future they are going to be doing one, most likely, rate drop and not a very large one at that."
Some analysts suggested that the central bank would backtrack on the number of interest rate cuts this year.
Ian Shepherdson, chief economist at Pantheon Macroeconomics, said that reducing forecasts of interest rate cuts from three to one this year was "unnecessarily aggressive".
While economists at Wells Fargo said it would be a "close call" between making one or two reductions in 2024.
Officials at the US Fed were split over how many interest rate cuts they expected this year. Of the 19 policymakers who gave their outlook, four expected no cut, seven forecasted one reduction while eight thought there would be two.
US jobs surge casts doubt over interest rate cuts
Will the UK and US cut interest rates like Europe?
When will interest rates come down?
Forecasts from the US Fed signalled one modest cut to 5%-5.25%.
Mr Powell acknowledged that a reduction of this size would not have a major impact on the US economy.
But he said when a cut finally does come it would be “a consequential decision for the economy” which “you want to get right".
While inflation eased a little, the US employment market remains robust. Recent data showed that US employers added 272,000 jobs in May - far above the 185,000 expected.
Ms Fedyk said: "The Fed is trying to react to the data but not overreact to the data."
Some other major economies have cut interest rates, including the European Central Bank and the Bank of Canada.
But the US - and the UK - are yet to make a similar move. The Bank of England will meet next week and is widely expected to hold interest rates at 5.25%, their highest level for 16 years.
The Consumer Prices Index (CPI) measure of inflation has slowed significantly in the UK from a high of 11.1% in October 2022 to 2.3% currently.
However, some elements of inflation remain stubbornly high. At the same time, average wage growth in the UK remains strong compared to inflation.
Earlier this week, Ruth Gregory, deputy chief UK economist at Capital Economics said: "Overall, the stickiness of wage growth may not stop the Bank from cutting interest rates for the first time in August, as we are forecasting, as long as other indicators such as pay settlements data and next week’s CPI inflation release show decent progress."

How is the interest rate in US?
"""}
  ]
# Generate a response with chat completion
response = client.chat.completions.create(
    model=model_name,
    messages=msgs
)

print(response.choices[0].message.content)

The interest rate in the US is currently at 5.25%-5.5%. The Federal Reserve has signalled that there will be a single reduction in interest rates this year, but the exact timing and amount of the cut have not been confirmed.
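
In the cell above, the article is pasted in front of the question so the model can answer from it. A minimal sketch of that pattern follows. The helper name, the "Context:"/"Question:" labels, and the character limit are assumptions added for illustration; the original cell simply concatenates the article and the question.

```python
def build_grounded_prompt(context: str, question: str, max_context_chars: int = 20_000) -> str:
    """Put reference text in front of a question, trimming the text to a character budget."""
    trimmed = context.strip()[:max_context_chars]
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{trimmed}\n\n"
        f"Question: {question.strip()}"
    )


grounded = build_grounded_prompt(
    "US interest rates were held at 5.25%-5.5%.",
    "How is the interest rate in US?",
)
grounded_messages = [{"role": "user", "content": grounded}]
```

The character budget is a crude guard against overly long inputs; a token-based limit would be more precise.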


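## 2. Tools
After using an LLM for a while, you may start to wonder whether it can do everything.

But after a few hours of use, you realize that an LLM's ability is limited to generating text from the data it was trained on.

So how can we extend what an LLM can do?

Just as people do, an LLM can use tools specialized for the task.

It is like using scissors even though you could tear the paper by hand.

Would something like the following be possible, then?

1. Getting a summary of a paper published after the LLM was trained and asking about its key points
2. Having the LLM search for major destinations on its own and put together a travel plan

The following sessions cover two things:
1. We extend the LLM so that it can fetch a paper from arXiv, split it appropriately, produce a summary, and answer the user's questions.
2. You read the LangChain documentation yourself and try out the tools.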


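#### 2-1. [Arxiv] Summarizing an arXiv PDF

Many preprints are published on arXiv, so simply following the major papers posted there helps a great deal in keeping up with a field.

Because they are preprints, however, dozens of papers are posted every day.

Picking out the important ones, and then reading and organizing the ones that look important, both take a lot of time.

This is where an LLM can help.

Let's ask the LLM for a summary, the main contributions, and the key results.

These features could be implemented directly with the `chat.completions` API used earlier,
but doing so is tedious and error-prone.

LangChain removes much of that tedium and error handling.

Beyond the LLM itself, LangChain helps the LLM work with various tools to accomplish a goal.

To mitigate 'lost-in-the-middle' (the model forgetting the middle of a long document), it also offers a way to split a long document into shards, summarize each shard, and then summarize those summaries.

To learn more about LangChain, read the [langchain docs](https://python.langchain.com/v0.2/docs/introduction/).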
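
The "summarize each shard, then summarize the summaries" idea can be sketched in plain Python before bringing in LangChain. This is an illustration of the pattern only, not LangChain's implementation. `summarize` stands in for any function that calls an LLM, and `truncate_summary` is a toy replacement so the sketch runs offline.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    combined = "\n".join(partial_summaries)
    return summarize(combined)  # reduce step


def truncate_summary(text: str, limit: int = 30) -> str:
    """Toy stand-in for an LLM call: keep the first `limit` characters."""
    return text.strip()[:limit]


final_summary = map_reduce_summarize(
    ["First shard of a long paper ...", "Second shard of a long paper ..."],
    truncate_summary,
)
```

Because each call only sees one shard or the short partial summaries, no single prompt has to hold the whole document.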

In [17]:
# imports
import os
import warnings
from pprint import pprint

from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader, PDFMinerLoader, ArxivLoader, PyMuPDFLoader
from langchain_community.retrievers import ArxivRetriever

from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain

from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain

In [18]:
# Configuration setup.
load_dotenv()
VERBOSE = False
arxiv_id = "1706.03762" # Attention Is All You Need

In [19]:
# Read the PDF
# You could read a PDF file directly,
# but here we use the feature that fetches the paper from arXiv automatically.

# # load pdf
# loader = PyMuPDFLoader(pdf_name)
# pdf_pages = loader.load_and_split()

# load from arxiv
retriever = ArxivRetriever(load_max_docs=1)
retriever.get_full_documents = True
retriever.doc_content_chars_max = 1_000_000
arxiv_doc = retriever.invoke(arxiv_id)[0]  # look up the document by arxiv_id

In [39]:
print(arxiv_doc)

page_content='Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
e

In [40]:
# ========== 0 LLM setting ========== #
# Configure the LLM through LangChain.
llm = ChatOpenAI(
    api_key=api_key,
    organization=organization,
    temperature=0,
    model_name=model_name
)

In [41]:
# ========== 1 Document splitting ========== #
# Split the document into chunks.
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",  # split boundary
    chunk_size=3000,  # chunk size (in tokens)
    chunk_overlap=500,  # overlap between chunks (in tokens)
)

# Run the split
split_docs = text_splitter.split_documents([arxiv_doc])
# Total number of split documents
print(f'Total number of split documents: {len(split_docs)}')

Total number of split documents: 5
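
To see what the chunk size and chunk overlap settings mean, here is a simplified sliding-window splitter. It is only a sketch: it treats whitespace-separated words as stand-ins for tokens, whereas `CharacterTextSplitter` merges separator-delimited pieces and counts tiktoken tokens. The function name is hypothetical.

```python
from typing import List


def split_with_overlap(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Cut tokens into windows of chunk_size where neighbours share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


words = "a b c d e f g h i j".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
# chunks -> [['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j']]
```

The shared token at each boundary is what keeps a sentence that straddles two chunks from being lost to both.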


In [43]:
# Use Map-Reduce to process the split documents.
# The map step summarizes each split document.
# The reduce step merges those summaries into one.

# ========== 2 Map step ========== #

# Define the prompt used in the map step.
# This prompt is applied to each split document.
# The {pages} variable is filled with each split document in turn.
map_template = """The following is a page from the document:
{pages}
Please summarize the content of the page.
Response:"""

# Build the map prompt
map_prompt = PromptTemplate.from_template(map_template)

# Define the LLMChain that summarizes each split document in the map step
map_chain = LLMChain(llm=llm, prompt=map_prompt)  # deprecated API

In [60]:
map_example = '''
The quick brown fox jumps over the lazy dog.
The dog, named Rover, was very lazy and often slept through most of the day.
Despite this, Rover was a beloved pet who brought joy to his owner.
'''
map_chain.invoke(map_example)
# pages -> input, text -> LLM output

{'pages': '\nThe quick brown fox jumps over the lazy dog.\nThe dog, named Rover, was very lazy and often slept through most of the day.\nDespite this, Rover was a beloved pet who brought joy to his owner.\n',
 'text': 'The page describes a quick brown fox jumping over a lazy dog named Rover. Rover is described as being very lazy and sleeping most of the day, but still being a beloved pet who brings joy to his owner.'}

In [59]:
# ========== 3 Reduce step ========== #

# Define the prompt used in the reduce step
reduce_template = """These are partial summaries from each page of the documents:
{doc_summaries}
Please summarize the summaries into a single coherent summary.
Response:"""

# Build the reduce prompt
reduce_prompt = PromptTemplate.from_template(reduce_template)

# Define the LLMChain run in the reduce step
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

In [61]:
reduce_example = '''
1. A lazy dog named Rover brought joy to his owner despite sleeping most of the day.
2. The fox chased butterflies in a beautiful meadow full of flowers and birds, a place enjoyed by many animals.
3. As the sun set, the animals returned home. The fox went to its den and Rover to his bed, resting for the next day.
'''
reduce_chain.invoke(reduce_example)

{'doc_summaries': '\n1. A lazy dog named Rover brought joy to his owner despite sleeping most of the day.\n2. The fox chased butterflies in a beautiful meadow full of flowers and birds, a place enjoyed by many animals.\n3. As the sun set, the animals returned home. The fox went to its den and Rover to his bed, resting for the next day.\n',
 'text': 'Rover, a lazy dog, brought joy to his owner by sleeping most of the day. Meanwhile, a fox chased butterflies in a beautiful meadow filled with flowers and birds, a place enjoyed by many animals. As the sun set, the animals returned home, with the fox going to its den and Rover to his bed, resting for the next day.'}

In [30]:
# Takes a list of documents, joins them into a single string, and passes it to the LLMChain.
# https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.StuffDocumentsChain.html
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain,
    document_variable_name="doc_summaries"  # variable filled in the reduce prompt
)

# Combines the mapped documents and reduces them iteratively.
reduce_documents_chain = ReduceDocumentsChain(
    # https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.reduce.ReduceDocumentsChain.html
    # The final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # Used to collapse documents when they exceed the context of `StuffDocumentsChain`.
    collapse_documents_chain=combine_documents_chain,
    # Maximum number of tokens when grouping documents.
    token_max=4000,
)
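
The role of the maximum-token setting can be illustrated with a simplified sketch: summaries are packed into groups that fit a budget, each group is collapsed into one text, and the process repeats until everything fits. This is an illustration under assumptions, not LangChain's code. It measures length in characters by default, and `combine` stands in for a call to the collapse chain.

```python
from typing import Callable, List


def group_by_budget(texts: List[str], max_len: int, length_fn: Callable[[str], int] = len) -> List[List[str]]:
    """Greedily pack consecutive texts into groups whose total length stays within max_len."""
    groups, current, current_len = [], [], 0
    for text in texts:
        size = length_fn(text)
        if current and current_len + size > max_len:
            groups.append(current)
            current, current_len = [], 0
        current.append(text)
        current_len += size
    if current:
        groups.append(current)
    return groups


def collapse_until_fits(texts, max_len, combine, length_fn=len):
    """Collapse groups with `combine` until the texts fit the budget or no progress is made."""
    while len(texts) > 1 and sum(length_fn(t) for t in texts) > max_len:
        collapsed = [combine(group) for group in group_by_budget(texts, max_len, length_fn)]
        if len(collapsed) == len(texts):
            break  # every text is its own group; collapsing cannot help further
        texts = collapsed
    return texts
```

For example, with a budget of 5 characters, `group_by_budget(["aaa", "bb", "cccc", "d"], 5)` yields two groups, `["aaa", "bb"]` and `["cccc", "d"]`.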

In [64]:
# ========== 4 Combining Map and Reduce ========== #

# Maps a chain over the documents, then combines the results.
# https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.map_reduce.MapReduceDocumentsChain.html
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    # Step 1. We first call llm_chain on each document individually, passing in the page_content and any other kwargs. This is the map step.
    llm_chain=map_chain,

    # Reduce chain
    # Step 2. We then process the results of the map step in a reduce step.
    reduce_documents_chain=reduce_documents_chain,
    # Name of the llm_chain variable that receives each document (the variable defined in map_template)
    document_variable_name="pages",
    # Whether to return the map-step results in the output (disabled here).
    return_intermediate_steps=False,
    verbose=True
)


In [74]:
# ========== 5 Run and results ========== #

# Run the Map-Reduce chain
# Input: the split documents (the result of step 1)
result = map_reduce_chain.invoke(split_docs, {"show_progress": True})
# Print the summary
print(result["output_text"])




> Entering new MapReduceDocumentsChain chain...

> Finished chain.
The document discusses the Transformer model architecture, which relies on attention mechanisms without using recurrence or convolutions. It explains the concept of Multi-Head Attention, where linear projections of queries, keys, and values are performed multiple times to attend to different representation subspaces. The training process and results of the Transformer model in machine translation tasks are highlighted, showing superior performance compared to previous models. The success of attention-based models in translation tasks is emphasized, with plans to extend the Transformer model to handle tasks beyond text. The attention mechanism in the model is analyzed, showcasing its ability to capture long-distance dependencies and resolve anaphora in sentences. Additionally, the page mentions the difficulty in the registration or voting process in American governments since 2009.


#### Appendix. Running map-reduce with a prebuilt chain instead of writing the prompts by hand

In [75]:
# QnA: main contribution
# https://api.python.langchain.com/en/latest/chains/langchain.chains.question_answering.chain.load_qa_chain.html
chain = load_qa_chain(
    llm=llm,
    chain_type="map_reduce",
    verbose=True
)

query = "What are the main contributions of this paper?"
result_qa_chain = chain.run(input_documents=split_docs, question=query)
pprint(result_qa_chain)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following portion of a long document to see if any of the text is relevant to answer the question.
Return any relevant text verbatim.
______________________
Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutiona

In [77]:
# QnA: main methodology
query = "What is the main methodology of this paper?"
result_qa_chain = chain.run(input_documents=split_docs, question=query)
pprint(result_qa_chain)




In [78]:
# QnA: main results
query = "What are the important results to look at in this paper? Please provide with numbers. Provide all the results if possible."
result_qa_chain = chain.run(input_documents=split_docs, question=query)
pprint(result_qa_chain)




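# [Build it yourself]
https://python.langchain.com/v0.1/docs/integrations/tools/

The LangChain documentation lists a wide variety of tools, each documented so that it is easy to use.

Look at which tools are available and chain one, two, or more of them to extend the LLM's capabilities.

## [Examples]
1. Find trending topics and the value of related companies
   1. Use "google trends" to find which topics are currently trending.
   2. Use "google search" to find companies working on those topics.
   3. Use "google finance" to look up the value of those companies.

2. A math tutor
   1. Solve simple calculations directly.
   2. For graph problems, use "wolfram alpha" to draw the plot.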

In [None]:
import os

from langchain_community.tools.google_trends import GoogleTrendsQueryRun
from langchain_community.utilities.google_trends import GoogleTrendsAPIWrapper
from langchain_community.utilities import GoogleSerperAPIWrapper
from langchain_community.tools.google_finance import GoogleFinanceQueryRun
from langchain_community.utilities.google_finance import GoogleFinanceAPIWrapper

from langchain_core.runnables import chain
from langchain_core.output_parsers import StrOutputParser
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

In [None]:
# https://serpapi.com/users/welcome
os.environ["SERPAPI_API_KEY"] = ""
os.environ["SERP_API_KEY"] = ""
os.environ["SERPER_API_KEY"] = ""

google_trends = GoogleTrendsQueryRun(api_wrapper=GoogleTrendsAPIWrapper())
google_search = GoogleSerperAPIWrapper()
google_finance = GoogleFinanceQueryRun(api_wrapper=GoogleFinanceAPIWrapper())

In [None]:
# Parse the rising related queries out of the Google Trends output
@chain
def get_rising_queries(google_trend_output):
    trend_output = {}
    for line in google_trend_output.split("\n"):
        # Skip lines without a "key: value" shape; split on the first colon only.
        if ":" not in line:
            continue
        k, v = line.split(":", 1)
        trend_output[k.strip().lower()] = v.strip()
    rising = trend_output.get("rising related queries", "")
    related_words = [word for word in rising.split(", ") if word]
    print("\n Debug", " ".join(related_words))
    return related_words

In [None]:
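# Search for stocks in the related fields
@chain
def trending_stock_field_search(parsed_fields):
    # The output parser returns a dict such as {"fields": ...}; iterating over
    # the dict itself would search for the literal key "fields".
    fields = parsed_fields["fields"]
    if isinstance(fields, str):
        fields = [field.strip() for field in fields.split(",") if field.strip()]
    print("debug fields", fields)
    search_results = [google_search.run(field) for field in fields]
    print("debug", search_results)
    return {"fields": fields, "search_result": search_results}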

In [None]:
# Look up prices for the related stocks
@chain
def trending_stock_price_search(stock_codes_output):
    stock_codes = stock_codes_output["stock_code"]
    # The parser may return a comma-separated string instead of a list.
    if isinstance(stock_codes, str):
        stock_codes = stock_codes.split(",")
    stock_codes = [stock_code.strip() for stock_code in stock_codes]
    output = {}
    for stock_code in stock_codes:
        stock_info_raw = google_finance.run(stock_code)
        stock_info = {}
        for line in stock_info_raw.strip().split("\n"):
            if ":" in line:
                # Split on the first colon only so values keep their own colons.
                k, v = line.split(":", 1)
                stock_info[k.strip().lower()] = v.strip()
        output[stock_code] = stock_info
    return output

In [None]:
# Have the LLM infer the stock-market fields related to the trending queries
trending_stock_topic_return_schema = [
    ResponseSchema(name="fields", description="list of relevant fields in stock market"),
]
trending_stock_topic_output_parser = StructuredOutputParser.from_response_schemas(trending_stock_topic_return_schema)
trending_stock_topic_format_instructions = trending_stock_topic_output_parser.get_format_instructions()


trending_stock_topic_prompt = PromptTemplate(
    template="""Given the rising related queries, what field or sector of stocks are relevant to the queries?
Related Queries: {related_queries}
{format_instructions}
Response:""",
    input_variables=["related_queries"],
    partial_variables={"format_instructions": trending_stock_topic_format_instructions},
)

In [None]:
trending_stock_topic_prompt.format(related_queries="t!e!s!t!q!u!e!r!y")

'Given the rising related queries, what field or sector of stocks are relevant to the queries?\nRelated Queries: t!e!s!t!q!u!e!r!y\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"fields": string  // list of relevant fields in stock market\n}\n```\nResponse:'
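
The format instructions above ask the model to wrap its answer in a fenced JSON snippet. As a rough sketch of what parsing such a reply involves (this is not the `StructuredOutputParser` implementation), the standard library is enough. `parse_json_snippet` is a hypothetical helper, and the sample reply is made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def parse_json_snippet(text: str) -> dict:
    """Extract the JSON object from a fenced snippet, or parse the whole text if no fence is found."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Made-up model reply that follows the format instructions.
reply = "Sure!\n" + FENCE + 'json\n{"fields": "semiconductors, entertainment"}\n' + FENCE
parsed = parse_json_snippet(reply)
# parsed -> {'fields': 'semiconductors, entertainment'}
```

Note that the value under "fields" is a single string here, so downstream code has to split it before treating it as a list.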

In [None]:
trending_stock_search_return_schema = [
    ResponseSchema(name="stock_code", description="list of relevant stock code"),
]
trending_stock_search_output_parser = StructuredOutputParser.from_response_schemas(trending_stock_search_return_schema)
trending_stock_search_format_instructions = trending_stock_search_output_parser.get_format_instructions()


trending_stock_search_prompt = PromptTemplate(
    template="""Given the relevant fields and their search results, which stocks are relevant to the current KOSPI market?
Respond with the stock codes only.
Related Fields: {fields}
Search Result: {search_result}
{format_instructions}
Response:""",
    input_variables=["fields", "search_result"],
    partial_variables={"format_instructions": trending_stock_search_format_instructions},
)

In [None]:
total_chain = (
    # [tool] Fetch rising keywords
    google_trends
    # Parse the rising keywords out of the tool output
    | get_rising_queries

    # Infer the fields related to the rising keywords
    | trending_stock_topic_prompt
    | llm
    | trending_stock_topic_output_parser

    # [tool] Search for the related fields
    | trending_stock_field_search

    # Infer stock codes from the search results
    | trending_stock_search_prompt
    | llm
    | trending_stock_search_output_parser

    # [tool] Fetch information for each stock code
    | trending_stock_price_search
)
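
The pipe operator in the chain above passes each step's output to the next step. A toy sketch of how such an operator can be built in Python follows. It is not LangChain's `Runnable` implementation, and the `Step` class is hypothetical.

```python
from typing import Any, Callable


class Step:
    """Wrap a function so that steps can be composed with the | operator."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # The composed step runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


split_queries = Step(lambda text: text.split(", "))
count_items = Step(len)
pipeline = split_queries | count_items

result_count = pipeline.invoke("indonesia vs korea, liga voli korea, korea open 2023")
# result_count -> 3
```

Because every step takes one value and returns one value, the output type of each step has to match what the next step expects, which is why the parsing functions sit between the tools and the prompts.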

In [None]:
from pprint import pprint

output = total_chain.invoke({"query": "korea"}, verbose=True)
print("\n", output)

[32;1m[1;3mQuery: korea
Date From: Jul 9, 2023
Date To: Jul 13, 2024
Min Value: 43
Max Value: 100
Average Value: 49.660377358490564
Percent Change: -11.76470588235294%
Trend values: 51, 57, 53, 49, 49, 47, 47, 48, 47, 49, 51, 52, 47, 48, 48, 46, 43, 47, 47, 47, 46, 44, 44, 47, 50, 58, 51, 53, 60, 61, 54, 47, 46, 43, 46, 44, 51, 49, 46, 44, 48, 100, 52, 47, 46, 45, 47, 52, 53, 49, 46, 45, 45
Rising Related Queries: indonesia vs korea u23, indonesia vs korea selatan u23, korea selatan vs indonesia u-23, liga voli korea, travis king north korea, korea utara u-23 vs indonesia u-23, indonesia vs korea selatan, indonesia vs korea utara, yordania vs korea selatan, skor indonesia vs korea u23, piala afc u-23, indonesia vs korea selatan u17, jordan vs south korea, jordan vs korea, north korea south korea balloons, skor indonesia vs korea selatan, hasil indonesia vs korea selatan, boyhood korea, indonesia vs korea, korea vs bahrain, south korea u-23 vs indonesia u-23, korea open 2023, afc asia

In [None]:
print("\n", output["search_result"][0])


 Justin Fields: Football quarterback. Justin Skyler Fields is an American football quarterback for the Pittsburgh Steelers of the National Football League. Justin Fields Born: March 5, 1999 (age 25 years), Kennesaw, GA. Justin Fields Current team: Pittsburgh Steelers (#2 / Quarterback). Justin Fields School: The Ohio State University. Justin Fields Dates joined: March 16, 2024 (Pittsburgh Steelers), 2021 (Chicago Bears), January 2019 (Ohio State Buckeyes football), and more. Justin Fields Height: 6′ 3″. Justin Fields Parents: Ivant Fields and Gina Tobey. Justin Fields Siblings: Jaiden Fields. Welcome to Field's Steak & Oyster Bar, the premier dining destination in Bay St. Louis since April 24, 2019. Founded by Field Nicaud, a graduate of the ... Justin Fields. JustinFields. Pittsburgh Steelers. Steelers; #2; QB. HT/WT. 6' 3", 227 lbs. Birthdate. 3/5/1999 (25). College. Ohio State. Follow. 2023 regular ... Fields Steak & Oyster Bar serves up fresh seafood, waygu steaks, and fine cockta