In [2]:
import pandas as pd
import dotenv
dotenv.load_dotenv()


True

# Reading Data
Read in the news data, drop duplicate and empty articles, then derive the ISO week and ISO year from the published date Timestamp.

In [3]:
import datetime
import dateutil.parser
import os

if not os.path.isfile(".data/processed_news.pickle"):
    news = pd.read_pickle(".data/organized_news.pickle")
    # Drop the first column.
    news = news.drop(columns=news.columns[0])
    # drop_duplicates returns a new frame, so the result must be assigned.
    news = news.drop_duplicates(subset=["url"])
    # Keep only articles with a non-empty body.
    news = news[news["fetched_body"] != ""]
    # isocalendar() returns (ISO year, ISO week, weekday).
    news["week"] = news["published date"].apply(lambda x: x.isocalendar()[1])
    news["year"] = news["published date"].apply(lambda x: x.isocalendar()[0])
    sampled_news = news
else:
    # Resume from the checkpoint written by a previous run.
    sampled_news = pd.read_pickle(".data/processed_news.pickle")
sampled_news

Unnamed: 0,description,published date,url,publisher,fetched_title,fetched_body,Topic,week,year,gpt_results
0,Tech Concentration in the S&P 500 is Highest S...,2017-06-20 07:00:00+00:00,https://news.google.com/rss/articles/CBMiUWh0d...,"{'href': 'https://www.investopedia.com', 'titl...",Tech Concentration in the S&P 500 is Highest S...,It has been a good year for the technology sec...,AAPL,25,2017,Summary: The tech sector has been performing e...
1,'A watershed moment': Tech execs react to Uber...,2017-06-21 07:00:00+00:00,https://news.google.com/rss/articles/CBMiZWh0d...,"{'href': 'https://finance.yahoo.com', 'title':...",‘A watershed moment’: Tech execs react to Uber...,Former Uber CEO and co-founder Travis Kalanick...,AAPL,25,2017,Summary: Travis Kalanick has officially steppe...
2,Tim Cook offloads more than $43 million in AAP...,2017-08-28 07:00:00+00:00,https://news.google.com/rss/articles/CBMiVGh0d...,"{'href': 'https://9to5mac.com', 'title': '9to5...",Tim Cook offloads more than $43 million in AAP...,Apple CEO Tim Cook has sold some $43 million w...,AAPL,35,2017,Summary: Apple CEO Tim Cook has sold $43 milli...
3,Even an Apple store can't prevent the death of...,2017-09-02 07:00:00+00:00,https://news.google.com/rss/articles/CBMiTmh0d...,"{'href': 'https://qz.com', 'title': 'Quartz'}",Even an Apple store can’t prevent the death of...,"The American mall is dying, and not even Apple...",AAPL,35,2017,Summary: Despite the decline of American malls...
4,The Apple TV is not the 'future of TV' Apple p...,2017-08-29 07:00:00+00:00,https://news.google.com/rss/articles/CBMiTGh0d...,"{'href': 'https://www.businessinsider.com', 't...",The Apple TV is a mess — and hardly the 'futur...,I’ve owned the fourth-generation Apple TV for ...,AAPL,35,2017,Summary: The current Apple TV hardware is capa...
...,...,...,...,...,...,...,...,...,...,...
5025,Longer Term Outlooks Using NDX Options Nasdaq,2023-05-22 07:00:00+00:00,https://news.google.com/rss/articles/CBMiRmh0d...,"{'href': 'https://www.nasdaq.com', 'title': 'N...",Longer Term Outlooks Using NDX Options,We know all the rage is about short-dated opti...,^NDX,21,2023,Summary: Market participants are still using t...
5026,Is the NASDAQ 100 Close to Topping out? | inve...,2023-05-25 07:00:00+00:00,https://news.google.com/rss/articles/CBMiU2h0d...,"{'href': 'https://www.investing.com', 'title':...",Is the NASDAQ 100 Close to Topping out?,NDX -1.15% Add to/Remove from Watchlist By us...,^NDX,21,2023,Summary: The news discusses the use of the Ell...
5027,Indexed Finance (NDX) Receives a Very Bullish ...,2023-05-26 07:00:00+00:00,https://news.google.com/rss/articles/CBMiiQFod...,"{'href': 'https://www.investorsobserver.com', ...",Indexed Finance (NDX) Receives a Very Bullish ...,Indexed Finance has a Very Bullish sentiment r...,^NDX,21,2023,Summary: Indexed Finance has a Very Bullish se...
5028,NDX Option Numbers to Know Before Tomorrow's C...,2023-05-09 07:00:00+00:00,https://news.google.com/rss/articles/CBMiT2h0d...,"{'href': 'https://www.nasdaq.com', 'title': 'N...",NDX Option Numbers to Know Before Tomorrow’s CPI,Over the past year and a half or so the invest...,^NDX,19,2023,Summary: The upcoming release of the Consumer ...
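
The `week` and `year` columns come from `isocalendar()`, which returns the ISO year, ISO week number, and weekday. A minimal sketch using only the standard library's `datetime` (a pandas Timestamp exposes the same method) illustrates why both values should come from the same call: around New Year the ISO year can differ from the calendar year. The first date matches row 0 of the table; the second is an illustrative edge case, not taken from the data.

```python
import datetime

# isocalendar() returns (ISO year, ISO week, ISO weekday).
first_row = datetime.date(2017, 6, 20)
iso_year, iso_week, iso_weekday = first_row.isocalendar()
print(iso_year, iso_week)  # 2017 25

# Illustrative edge case: 1 January 2021 belongs to ISO week 53 of 2020.
edge = datetime.date(2021, 1, 1)
edge_year, edge_week, _ = edge.isocalendar()
print(edge_year, edge_week)  # 2020 53
```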


In [4]:

sampled_news["fetched_body"] = sampled_news["fetched_body"].convert_dtypes(convert_string=True)
sampled_news = sampled_news.dropna(inplace=False)
if "gpt_results" not in sampled_news:
    sampled_news["gpt_results"] = ""

In [5]:
sampled_news["fetched_body"].apply(lambda x: x.upper())

0       IT HAS BEEN A GOOD YEAR FOR THE TECHNOLOGY SEC...
1       FORMER UBER CEO AND CO-FOUNDER TRAVIS KALANICK...
2       APPLE CEO TIM COOK HAS SOLD SOME $43 MILLION W...
3       THE AMERICAN MALL IS DYING, AND NOT EVEN APPLE...
4       I’VE OWNED THE FOURTH-GENERATION APPLE TV FOR ...
                              ...                        
5025    WE KNOW ALL THE RAGE IS ABOUT SHORT-DATED OPTI...
5026    NDX -1.15% ADD TO/REMOVE FROM WATCHLIST\n\nBY ...
5027    INDEXED FINANCE HAS A VERY BULLISH SENTIMENT R...
5028    OVER THE PAST YEAR AND A HALF OR SO THE INVEST...
5030    NASDAQ 100 & USD/JPY FORECAST:\n\nNASDAQ 100 G...
Name: fetched_body, Length: 4717, dtype: object

In [6]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
import langchain
from openai import RateLimitError

In [7]:
gpt = ChatOpenAI(model="gpt-3.5-turbo")

# Run Sample through GPT (Remote Large Language Model)

In [8]:
news_summary_prompt = ChatPromptTemplate.from_template("Summarize the following noisy news data and identify keywords of the news relevant to an Apple stock analyst. Provide a formatted answer as such Summary: ..., Keywords: ... News: {news}")
news_summary_chain = news_summary_prompt | gpt | StrOutputParser()
gpt_results = news_summary_chain.invoke({"news": sampled_news["fetched_body"].iloc[-1]})
print(gpt_results)

Summary: Nasdaq 100 gains after soft U.S. inflation data, threatens to break major resistance. USD/JPY declines on lower U.S. Treasury yields post CPI report. Technical analysis suggests key levels to watch for both assets.

Keywords: Nasdaq 100, USD/JPY, technical resistance, U.S. inflation data, U.S. Treasury yields, CPI report, technical analysis, support, resistance.


Requests are sent one at a time, so the response latency of the remote LLM acts as a natural rate limiter and helps avoid hitting the GPT rate cap.
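
The idea of sequential calls plus a fixed wait when the rate limit is hit can be sketched as follows. This is only an illustration under stated assumptions: `FakeRateLimitError`, `call_with_retry`, and `flaky_upper` are hypothetical names standing in for the real API exception and call, and the `sleep` parameter is injected so the example runs without actually waiting.

```python
import time


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for the API's rate-limit exception."""


def call_with_retry(fn, arg, attempts=5, wait_seconds=20, sleep=time.sleep):
    """Call fn(arg), waiting and retrying when the rate limit is hit."""
    for attempt in range(attempts):
        try:
            return fn(arg)
        except FakeRateLimitError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            sleep(wait_seconds)


# Usage with a fake function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_upper(text):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise FakeRateLimitError()
    return text.upper()


waits = []  # records requested waits instead of sleeping
result = call_with_retry(flaky_upper, "apple", sleep=waits.append)
print(result, waits)  # APPLE [20, 20]
```

Re-raising after the last attempt makes a persistent failure visible instead of silently skipping the row.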

In [9]:
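def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final partial batch, but never an empty one.
    if batch:
        yield batch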
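
The same batching idea can also be sketched with `itertools.islice`. `chunked` is a hypothetical name, not part of the notebook; the two calls show the behaviour for an input that does not divide evenly by the batch size and for one that does.

```python
from itertools import islice


def chunked(iterable, batch_size):
    """Yield lists of up to batch_size items; never yields an empty list."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
print(list(chunked(range(4), 2)))  # [[0, 1], [2, 3]]
```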

In [12]:
import time

def column_apply_with_saving(table: pd.DataFrame, write_column, read_column, fn, file_to_save=None):
    """Apply fn to read_column for every row whose write_column is still empty,
    saving the table to file_to_save after each successful row."""
    if write_column not in table:
        table[write_column] = ""

    for article_index in table[table[write_column] == ""].index:
        # Truncate long inputs to 60,000 characters to limit the prompt size.
        row_data = table[read_column].loc[article_index][:15000 * 4]
        for _ in range(5):  # up to 5 attempts per row
            try:
                result = fn(row_data)
                table.loc[article_index, write_column] = result
                print(f"{article_index}: {result}")
                if file_to_save is not None:
                    table.to_pickle(file_to_save)
                break
            except RateLimitError:
                # Wait before retrying when the API rate cap is hit.
                time.sleep(20)
            except Exception as e:
                print(e)
                raise


# Write GPT summaries into "gpt_results", reading from "fetched_body".
column_apply_with_saving(sampled_news, "gpt_results", "fetched_body", fn=lambda v: news_summary_chain.invoke({"news": v}), file_to_save=".data/processed_news.pickle")
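
The save-after-every-row pattern means an interrupted run can resume without repeating finished calls. A minimal sketch of that idea, assuming a plain dict and the standard library's `pickle` in place of the notebook's DataFrame and empty-string marker; `apply_with_checkpoint`, `shout`, and the sample articles are hypothetical.

```python
import os
import pickle
import tempfile


def apply_with_checkpoint(items, fn, path):
    """Apply fn to each item not yet processed, saving results after each one."""
    results = {}
    if os.path.isfile(path):
        with open(path, "rb") as f:
            results = pickle.load(f)
    for key, value in items.items():
        if key in results:
            continue  # already done in an earlier run
        results[key] = fn(value)
        with open(path, "wb") as f:
            pickle.dump(results, f)
    return results


# Usage: the second run finds every key in the checkpoint and never calls fn.
articles = {0: "apple earnings", 1: "nasdaq rally"}
seen = []


def shout(text):
    seen.append(text)
    return text.upper()


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "results.pickle")
    first = apply_with_checkpoint(articles, shout, checkpoint)
    second = apply_with_checkpoint(articles, shout, checkpoint)

print(first == second, len(seen))  # True 2
```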

In [15]:
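# aggfunc="sum" concatenates the summary strings within each (year, week, Topic) group.
weekly_summaries = pd.pivot_table(sampled_news, values=["gpt_results"], index=["year", "week", "Topic"], aggfunc="sum")
# If a checkpoint exists, resume from it instead of the freshly built table.
if os.path.isfile(".data/summarized_weekly_news.pickle"):
    weekly_summaries = pd.read_pickle(".data/summarized_weekly_news.pickle")

# Summarize each group's concatenated summaries into the "Weeks Summary" column.
column_apply_with_saving(weekly_summaries, "Weeks Summary", "gpt_results", fn=lambda v: news_summary_chain.invoke({"news": v}), file_to_save=".data/summarized_weekly_news.pickle")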

(2021, 34, 'AAPL'): Summary: The news data focuses on Apple Inc.'s upcoming iPhone 13 release and its potential impact on the company's revenue and stock price. It also highlights CEO Tim Cook's successful tenure, market dominance, and the company's robust financial performance. Analysts are advised to consider long-term sales growth, stock appreciation trend, services revenue, and industry tailwinds when evaluating Apple's outlook.

Keywords: Apple Inc., AAPL, iPhone 13, revenue, stock price, Tim Cook, CEO, market dominance, services revenue, financial performance, industry tailwinds, stock analyst.
(2021, 34, 'AMZN'): Summary: Amazon has faced challenges with its NYC headquarters plans but continues to be a favorite among investors with strong fundamentals. The company is investing in growth, expanding operations, and optimizing its business. Additionally, Amazon is celebrating the evolution of its EC2 service and launching new security initiatives to protect against cybersecurity th