### Financial News Scraper

In this exercise, we build an LLM pipeline that searches Google News for a stock and generates investment advice from the articles it finds.

### 1. Google Search function

The function below parses search results from Google News. Running it requires the following libraries.

- `Selenium` : pip install selenium
- `webdriver_manager` : pip install webdriver_manager
- `Newspaper` : pip install newspaper3k
- `lxml_html_clean` : pip install lxml_html_clean

In [1]:
# Import packages
from bs4 import BeautifulSoup
import requests
import urllib.parse  # the submodule must be imported explicitly to use quote/unquote
import re
import pandas as pd
from newspaper import Article
from selenium import webdriver

In [4]:
def search_from_google(asset: str, page_nums: int) -> pd.DataFrame:
    '''
    Search Google News and collect the matching articles, falling back to
    Selenium when a direct download fails.
    :param asset: asset (ticker) to search for
    :param page_nums: total number of result pages to crawl
    :return: DataFrame with the columns Title, Contents, and URL
    '''
    keyword = urllib.parse.quote_plus(f'{asset} buying reason')  # URL-encode the spaces
    rows = []

    # Selenium in headless mode
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    driver = webdriver.Chrome(options=options)

    try:
        for page_num in range(page_nums):
            # Google News crawling; `start` is a result offset (10 results per page)
            start = page_num * 10
            url = f'https://www.google.com/search?q={keyword}&sca_esv=1814fa2a4600643d&tbas=0&tbs=qdr:m&tbm=nws&ei=rE3pZeLxNeHX1e8PpdOcMA&start={start}&sa=N&ved=2ahUKEwji9-zrsuGEAxXha_UHHaUpBwYQ8tMDegQIBBAE&biw=2560&bih=1313&dpr=1'
            req = requests.get(url)
            soup = BeautifulSoup(req.content, 'html.parser')

            # Stop once a page returns no titles (last page reached)
            title_list = [t.text for t in soup.select('div.BNeawe.vvjwJb')]
            if not title_list:
                break

            # Pair each title with the URL of the link that contains it,
            # so titles and URLs cannot drift out of alignment
            pairs = []
            for u in soup.select('a'):
                for t in title_list:
                    if t in u.text:
                        href = urllib.parse.unquote(u.get('href', ''))
                        found = re.findall(r'http\S+&sa', href)
                        if found:
                            pairs.append((t, found[0][:-3]))

            # Download and parse each article
            for title, news_url in pairs:
                article = Article(url=news_url)
                try:
                    article.download()
                    article.parse()
                except Exception:  # e.g. SSL error: retry with the browser-rendered HTML
                    try:
                        driver.get(news_url)
                        article.download(input_html=driver.page_source)
                        article.parse()
                    except Exception:
                        continue  # skip articles that cannot be fetched at all
                rows.append([re.sub(r'\s+', ' ', title), article.text, news_url])
    finally:
        driver.quit()  # always release the browser

    return pd.DataFrame(rows, columns=['Title', 'Contents', 'URL'])
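
The search URL and the redirect links in the result page can also be handled with `urllib.parse` rather than string formatting and a regular expression. The following is a minimal sketch, not the notebook's method: `build_news_url` and `extract_target_url` are hypothetical helper names, the reduced parameter set (`q`, `tbm`, `tbs`, `start`) is taken from the URL used above, and the `/url?q=...&sa=...` link shape is an assumption inferred from the `http\S+&sa` pattern in the function.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def build_news_url(asset: str, page: int, per_page: int = 10) -> str:
    """Build a Google News search URL for one result page (hypothetical helper)."""
    params = {
        "q": f"{asset} buying reason",  # same keyword pattern as search_from_google
        "tbm": "nws",                   # news results
        "tbs": "qdr:m",                 # restrict to the past month
        "start": page * per_page,       # assumed: offset counted in results, not pages
    }
    # urlencode escapes spaces and other reserved characters in the values
    return "https://www.google.com/search?" + urlencode(params)


def extract_target_url(href: str) -> Optional[str]:
    """Return the article URL wrapped in a '/url?q=...' redirect link, if any."""
    query = parse_qs(urlparse(href).query)  # parse_qs also percent-decodes the values
    targets = query.get("q", [])
    if targets and targets[0].startswith("http"):
        return targets[0]
    return None


print(build_news_url("META", 1))
print(extract_target_url("/url?q=https://example.com/news%3Fid%3D1&sa=U&ved=abc"))
```

Parsing the query string avoids the `IndexError` that `re.findall(...)[0]` raises when a link does not match the expected pattern.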

In [6]:
search_result = search_from_google('META', 3)

In [8]:
# Check the output
search_result.head()

Unnamed: 0,Title,Contents,URL
0,Why Meta Platforms (META) Is the Best Blue Chi...,We recently published a list of 10 Best Blue C...,https://finance.yahoo.com/news/why-meta-platfo...
1,Should I buy Meta shares? How to profit from I...,Important information Your capital is at risk....,https://www.thetimes.com/money-mentor/investin...
2,Why Meta Platforms Stock Slumped on Monday,Meta Platforms (META 0.35%) received some disc...,https://www.fool.com/investing/2025/01/13/why-...
3,Nvidia Rises On Amazon's 'Obvious' Partnership...,Access to this page has been denied because we...,https://www.investors.com/research/nvda-stock-...
4,Elon Musk buying TikTok seems like a stretch —...,Could Elon Musk buy TikTok?\n\nSeveral reports...,https://www.businessinsider.com/elon-musk-buyi...
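
Row 3 in the output above contains a block page ("Access to this page has been denied...") rather than an article, so passing it to an LLM would waste a call. A rough way to flag such rows is sketched below; `is_blocked`, the marker list, and the 200-character minimum are illustrative assumptions, not part of the original notebook.

```python
BLOCK_MARKERS = (
    "access to this page has been denied",  # message seen in the sample output
)


def is_blocked(text: str, min_chars: int = 200) -> bool:
    """Heuristically flag scraped text that is a block page or too short to be an article."""
    normalized = " ".join(text.split()).lower()  # collapse whitespace, ignore case
    if len(normalized) < min_chars:
        return True
    # Block messages usually appear at the very top of the page text
    return any(marker in normalized[:300] for marker in BLOCK_MARKERS)


blocked_sample = "Access to this page has been denied because we believe you are a bot. " * 5
article_sample = "Meta Platforms reported strong quarterly revenue growth. " * 20
print(is_blocked(blocked_sample), is_blocked(article_sample))
```

With the DataFrame from this notebook, the filter could be applied as `search_result[~search_result['Contents'].apply(is_blocked)]`.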


### 2. Extracting buy reasons with an LLM

In [11]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI
import os
import openai

with open("../../config/api.key", "r") as f:
    lines = f.readlines()
    config = lines[0].strip()

openai.api_key = config
os.environ["OPENAI_API_KEY"] = config

Asset = 'META'

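def extract_reason(row):
    # `chain` is defined later in this cell, before this function is first called
    try:
        res = chain.predict(
            title=row["Title"],
            content=row["Contents"]
        )
    except Exception:  # e.g. the article is too long for the model's context window
        # Retry with the article truncated to its first 4096 characters
        res = chain.predict(
            title=row["Title"],
            content=row["Contents"][:4096]
        )
    return res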

# 🔹 Configure the LLM
llm = OpenAI(temperature = 0)

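# The prompt is written in Korean. In short: act as a professional investor and give
# exactly one reason to buy the Asset stock based on the article title and body
# (about 50 characters, plain declarative endings, focused on numeric data).
# If the article is unrelated to Asset, answer '연관 없는 내용입니다' ("unrelated content").
# Non-Korean articles are translated to Korean first; the translation is not printed.
template = f"""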
당신은 전문적인 금융 투자자입니다. 
제공하는 뉴스기사의 제목과 본문을 읽고 {Asset}종목의 매수를 추천하는 이유를 한 가지만 설명해주세요.
본문의 내용이 {Asset} 종목과 무관한 경우, '연관 없는 내용입니다' 라고 채워주세요.
이유는 50자 내외로 짧게 생성하며, 경어체가 아닌 ~이다. ~되었다. 으로 끝냅니다.
수치 데이터 중심으로 설명해 주세요.

예시 : 
2024년 4분기 수익성이 개선되었다.

한국어가 아닌 경우 한국어로 우선 번역합니다.
번역한 텍스트는 출력하지 않습니다.

""" + """
Title : {title}
Content : {content}

Answer : 
"""

prompt = PromptTemplate(
    input_variables = ['title','content'],
    template = template
)

chain = LLMChain(
    llm = llm,
    prompt = prompt
)

search_result['reason'] = search_result.apply(extract_reason, axis = 1)
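
The template above is built by concatenating an f-string with a plain string: the f-string fills in `Asset` immediately, while `{title}` and `{content}` in the plain string stay as placeholders for later substitution. The sketch below illustrates that mechanism with the standard library only; the shortened English prompt text is made up for the example, and `str.format` stands in for the substitution that the prompt template performs.

```python
asset = "META"

# f-string part: {asset} is substituted immediately, when the string is built
instructions = f"Give one reason to buy {asset}.\n"

# Plain-string part: {title} and {content} remain literal placeholders
slots = "Title : {title}\nContent : {content}\nAnswer : "

template = instructions + slots

# Equivalent single f-string: doubled braces escape the placeholders
template_single = (
    f"Give one reason to buy {asset}.\n"
    f"Title : {{title}}\nContent : {{content}}\nAnswer : "
)

# str.format stands in here for the later substitution step
filled = template.format(title="Example headline", content="Example body")
print(filled)
```

Either form works; splitting the string keeps the placeholders readable, while doubled braces keep the prompt in one literal.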

In [12]:
search_result

Unnamed: 0,Title,Contents,URL,reason
0,Why Meta Platforms (META) Is the Best Blue Chi...,We recently published a list of 10 Best Blue C...,https://finance.yahoo.com/news/why-meta-platfo...,2025����지 META 종목은 ����� 대�����의 �����적인 정���과...
1,Should I buy Meta shares? How to profit from I...,Important information Your capital is at risk....,https://www.thetimes.com/money-mentor/investin...,Meta shares have been volatile but have strong...
2,Why Meta Platforms Stock Slumped on Monday,Meta Platforms (META 0.35%) received some disc...,https://www.fool.com/investing/2025/01/13/why-...,메�� ��������의 주식이 ��요일에 하���한 이��는 �����의 최고��...
3,Nvidia Rises On Amazon's 'Obvious' Partnership...,Access to this page has been denied because we...,https://www.investors.com/research/nvda-stock-...,Nvidia의 수��성이 개��되��다. 이는 Amazon과의 파트����과 1조 ...
4,Elon Musk buying TikTok seems like a stretch —...,Could Elon Musk buy TikTok?\n\nSeveral reports...,https://www.businessinsider.com/elon-musk-buyi...,"Musk가 TikTok을 사는 것은 ���가능해 보이지만, 이상한 시대이기 때문에 ..."
5,What Zuckerberg Risks by Following Musk’s Lead,"On Tuesday, Meta CEO Mark Zuckerberg announced...",https://time.com/7205821/what-zuckerberg-risks...,Meta의 CEO Mark Zuckerberg는 Meta가 �����에서 제3자 사...
6,Why Mark Zuckerberg Is Ditching Human Fact-Che...,Michael Calore: Zoë's snapping her fingers in ...,https://www.wired.com/story/uncanny-valley-pod...,메��의 ��������� 모����이션 정���이 바���면서 ����� 정부의 ...
7,Why Did Apple and Meta Platforms Rise While Nv...,Jan. 27 was a wild day in the stock market as ...,https://www.nasdaq.com/articles/why-did-apple-...,AI 기술의 발전으로 인해 META 종목의 수익성이 개선되었다.
8,Meta Platforms Continues to Prove Why It's a P...,Meta Platforms Today META Meta Platforms $714....,https://www.marketbeat.com/originals/meta-plat...,Meta의 ���고 모��은 최�� AI 기���을 ��용하여 개��되��다. 이로...
9,Meta Platforms Reports Q4 Earnings on January ...,Switch the Market flag\n\nOpen the menu and sw...,https://www.barchart.com/story/news/30623015/m...,2021년 4분기 수익성이 개선되었다.
