In [48]:
import pandas as pd
df = pd.read_csv("../data/all_articles.csv")
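# Inspect how many articles were published on each date
df["published_at"].value_counts()
# Keep only the articles published on 2024-07-25 or 2024-07-26
df = df[df["published_at"].isin(["2024-07-26", "2024-07-25"])]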

In [49]:
from dotenv import load_dotenv
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import DataFrameLoader
from langchain_openai import ChatOpenAI

# Load environment variables (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

loader = DataFrameLoader(df, page_content_column="content")
docs = loader.load()


#### Stuffing

In [42]:
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-1106")
chain = load_summarize_chain(llm, chain_type="stuff")
result = chain.invoke(docs)
print(result["output_text"])

# Does not work because the input surpasses the maximum context length allowed

Elon Musk's social media platform, X, has removed several cryptocurrency emojis, causing confusion and concern in the crypto community. The removal of the emojis, including Bitcoin, has led to speculation about its impact on the market. Meanwhile, the Web3 gaming industry is facing challenges with some projects failing to deliver on promises, while Nigeria's technology sector is developing the country's first artificial intelligence with the help of a local startup. JPMorgan Chase has launched an in-house AI product, and there are concerns about the impact of spot Ethereum ETFs on Bitcoin's price. Additionally, Hollywood video game performers are set to go on strike, and Google DeepMind's AI models have achieved breakthroughs in solving complex mathematics problems. Interpol has issued red notices for two Hong Kong crypto influencers wanted for theft, fraud, and money laundering. Lastly, executives at Metaplanet and Semler Scientific have followed MicroStrategy's Bitcoin playbook to re
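
Because stuffing sends every document in a single prompt, it can help to estimate the total size before calling the chain. The following is a minimal sketch, not part of the original notebook: the 4-characters-per-token ratio is only a rough heuristic for English text, and `CONTEXT_LIMIT` is an illustrative value that should be replaced with the limit documented for the model in use.

```python
from typing import List

# ASSUMPTIONS: both constants are illustrative, not taken from the notebook.
CHARS_PER_TOKEN = 4      # crude heuristic for English text
CONTEXT_LIMIT = 16_000   # placeholder; check the model's documented limit


def estimate_tokens(text: str) -> int:
    """Return a crude token estimate based on character count."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: List[str], limit: int = CONTEXT_LIMIT,
                    reserve: int = 1_000) -> bool:
    """True if all texts together still leave `reserve` tokens
    for the prompt template and the model's answer."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve <= limit


# Hypothetical sample data standing in for article bodies
articles = ["short article " * 50, "another article " * 50]
print(fits_in_context(articles))
```

With the notebook's data, the same check could be run as `fits_in_context([d.page_content for d in docs])`. A real tokenizer such as tiktoken would give exact counts; the heuristic is only meant as a quick guard.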

#### Map Reduce

In [43]:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain_text_splitters import CharacterTextSplitter

llm = ChatOpenAI(temperature=0)

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce
reduce_template = """The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary of the main themes. 
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

In [44]:
# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is the final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # Used to collapse documents if they exceed the context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=8000,
)

In [46]:
# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=True,
)

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)

In [47]:
result = map_reduce_chain.invoke(split_docs)

print(result["output_text"])

The main themes across the summaries include cryptocurrency developments, advancements in artificial intelligence, regulatory challenges, market trends, and the impact of key figures like Elon Musk and Donald Trump. These themes also touch on the intersection of technology, finance, and politics, highlighting the evolving landscape of digital assets, AI applications, and regulatory frameworks. Additionally, there is a focus on industry-specific issues such as labor rights in gaming, financial investments in AI research, and the implications of cryptocurrency fraud. Overall, the summaries reflect a dynamic and multifaceted ecosystem where innovation, regulation, and market dynamics intersect to shape the future of technology and finance.
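
To make the role of `token_max` and the collapse step more concrete, here is a toy sketch of the map, collapse, and reduce flow in plain Python. It is not LangChain's implementation: `toy_len`, `toy_summarize`, `group_by_budget`, and `map_reduce` are hypothetical helpers, a "token" is just a whitespace-separated word, and the "LLM" simply keeps the first three words of its input.

```python
from typing import List


def toy_len(text: str) -> int:
    """Stand-in token counter: one token per whitespace-separated word."""
    return len(text.split())


def toy_summarize(texts: List[str]) -> str:
    """Stand-in for an LLM call: keep the first three words of the joined input."""
    return " ".join(" ".join(texts).split()[:3])


def group_by_budget(texts: List[str], token_max: int) -> List[List[str]]:
    """Greedily pack texts into groups whose total size stays within token_max."""
    groups, current, used = [], [], 0
    for text in texts:
        n = toy_len(text)
        if current and used + n > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        groups.append(current)
    return groups


def map_reduce(chunks: List[str], token_max: int) -> str:
    # Map: summarize each chunk independently.
    summaries = [toy_summarize([c]) for c in chunks]
    # Collapse: while the summaries do not fit in one call, summarize them in groups.
    while len(summaries) > 1 and sum(toy_len(s) for s in summaries) > token_max:
        collapsed = [toy_summarize(g) for g in group_by_budget(summaries, token_max)]
        if len(collapsed) == len(summaries):
            break  # no progress is possible; stop instead of looping forever
        summaries = collapsed
    # Reduce: one final call over whatever remains.
    return toy_summarize(summaries)


chunks = [
    "alpha beta gamma delta",
    "epsilon zeta eta theta",
    "iota kappa lambda mu",
    "nu xi omicron pi",
    "rho sigma tau upsilon",
]
print(map_reduce(chunks, token_max=6))
```

The map calls are independent of each other, while each collapse round adds further calls. A smaller `token_max` therefore means more rounds, and each round compresses the intermediate summaries again.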


#### Refine

In [50]:
chain = load_summarize_chain(llm, chain_type="refine")
result = chain.invoke(split_docs)

print(result["output_text"])

The International Criminal Police Organization (Interpol) has issued red notices for two Hong Kong crypto influencers, Wong Ching Kit and Mok Tsun Ting, who are wanted for theft, fraud, and money laundering. Wong, known as "Coin Young Master," is wanted for fraud and theft in Hong Kong, while Mok is wanted for money laundering. Both individuals were involved in promoting the now-collapsed crypto exchange JPEX, which was at the center of a significant fraud case in Hong Kong. Wong has a history of legal controversies and was previously ordered to remove all videos related to JPEX from his social media. Mok was arrested in connection to the JPEX scandal but was released on bail. The exchange collapsed amid allegations of a Ponzi scheme, resulting in over 70 arrests and $206 million in losses for clients. Wong and Mok have also faced fraud charges in the past related to the sale of "mining rigs" for the cryptocurrency Filecoin. The investigation into JPEX began in September last year afte
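
Conceptually, the refine strategy is a sequential fold: the first chunk produces an initial summary, and every later chunk is used to update it. The sketch below only illustrates that control flow under stated assumptions; `refine`, `toy_initial`, and `toy_update` are hypothetical names, and the two toy functions stand in for the LLM calls.

```python
from typing import Callable, List


def refine(chunks: List[str],
           initial: Callable[[str], str],
           update: Callable[[str, str], str]) -> str:
    """Fold the chunks into one summary, one chunk at a time."""
    if not chunks:
        raise ValueError("refine needs at least one chunk")
    summary = initial(chunks[0])
    for chunk in chunks[1:]:
        # Each step sees only the running summary and the next chunk.
        summary = update(summary, chunk)
    return summary


def toy_initial(chunk: str) -> str:
    """Stand-in for the first LLM call: keep the first word."""
    return chunk.split()[0]


def toy_update(summary: str, chunk: str) -> str:
    """Stand-in for a refine call: append the first word of the new chunk."""
    return f"{summary}+{chunk.split()[0]}"


print(refine(["btc rallies", "ai funding", "interpol notice"],
             toy_initial, toy_update))
```

Because every step depends on the previous summary, the calls cannot run in parallel, and the number of calls grows linearly with the number of chunks.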