# Trending Wikipedia articles with current news context

At this point we have a good handle on articles that trend upon the death of a notable figure. But on Tuesday Jan 14, 2024 we had 5 articles trend:

1. [Aunt Jemima](https://en.wikipedia.org/wiki/Aunt_Jemima)
2. [Josie Totah](https://en.wikipedia.org/wiki/Josie_Totah)
3. [Dendrocnide moroides](https://en.wikipedia.org/wiki/Dendrocnide_moroides)
4. [Robin Zander](https://en.wikipedia.org/wiki/Robin_Zander)
5. [Zane Gonzalez](https://en.wikipedia.org/wiki/Zane_Gonzalez)

Not a single one of the reasons behind these trends was correctly identified, so it is high time to get up-to-date news tied into this thing!

Using [Serper](https://serper.dev/) I submitted a query for recent news pertaining to each article. I then passed this raw output to ChatGPT asking if it could see a reason for it to trend today. 

Initially I saw that the results for Dendrocnide moroides were not recent or current. I thought I'd have to iterate on that and pre-filter the results by date. But when I passed the entire list of out-of-date articles to ChatGPT, it correctly returned a negative result!

So now we know that Josie Totah is a former child star who kissed someone on TikTok, and that Robin Zander performed with his band Cheap Trick in the halftime show of a football game that Zane Gonzalez evidently played in.

We have absolutely no idea why Dendrocnide moroides trended... it's an Australian plant that... surprise surprise... can kill you. There's no current news for this, not even in my own search, so I think I'll have to link social media into this next.

Aunt Jemima... I'm not completely sold on the reason we got for why she's trending so far.

## Takeaways
- Very pleasantly surprised with how well ChatGPT handled a prompt with raw JSON from the news API
- Impressed with ChatGPT's ability to reply with an HTML structure

In [None]:
#!pip install langchain langchain-openai langchain-community openai

In [None]:
from dotenv import load_dotenv
import os

load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
THENEWS_API_KEY = os.environ["THENEWS_API_KEY"] 
SERPER_API_KEY = os.environ["SERPER_API_KEY"] 

In [None]:
# import urllib.parse

# # search_term = "aunt jemima"
# search_term = "Dendrocnide moroides"
# search_term = "Josie Totah"
# encoded_search_term = urllib.parse.quote(search_term)

# news_api_url= f"https://api.thenewsapi.com/v1/news/all?api_token={THENEWS_API_KEY}&search={encoded_search_term}"
# print(news_api_url)

In [None]:
import json
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

### Get trending Wikipedia articles

In [None]:
import requests
import datetime

today = datetime.datetime.now()
yesterday = today - datetime.timedelta(days=1)

date_to_query = yesterday
url = f"https://api.wikimedia.org/feed/v1/wikipedia/en/featured/{date_to_query.strftime('%Y/%m/%d')}"


response = requests.get(url)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error body
featured_feed = response.json()
print(f"API call: {url}")
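The URL building and the `mostread` lookup can be pulled into two small helpers. This is only a sketch: `FEED_BASE`, `build_feed_url` and `most_read_articles` are names I made up for illustration, and the empty-list fallback assumes the `mostread` section can be absent from a response.

```python
import datetime

# Same endpoint the cell above calls
FEED_BASE = "https://api.wikimedia.org/feed/v1/wikipedia/en/featured"


def build_feed_url(day: datetime.date) -> str:
    """Return the featured-feed URL for the given day (zero-padded YYYY/MM/DD)."""
    return f"{FEED_BASE}/{day.strftime('%Y/%m/%d')}"


def most_read_articles(feed: dict) -> list:
    """Return the 'mostread' article list, or an empty list if the section is missing."""
    return feed.get("mostread", {}).get("articles", [])


print(build_feed_url(datetime.date(2024, 1, 14)))
# prints https://api.wikimedia.org/feed/v1/wikipedia/en/featured/2024/01/14
```

With `most_read_articles(featured_feed)` the later loop simply iterates over nothing when the section is missing, instead of raising a `KeyError`.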

### Save raw Wikipedia data to file

In [None]:

# Ensure the 'data' folder exists
file_directory = "data"
os.makedirs(file_directory, exist_ok=True)

# Define the filename based on the date
base_file_name = date_to_query.strftime('%Y-%m-%d')
file_path = f'{file_directory}/{base_file_name}.json'

# Save to JSON file (overwrite if it already exists)
with open(file_path, 'w', encoding='utf-8') as file:
    json.dump(featured_feed, file, indent=4, ensure_ascii=False)

print(f'Saved Wikipedia response to {file_path}')

### Build data structure with all relevant information and placeholders for LLM responses

In [None]:
article_list = []


for item in featured_feed['mostread']['articles']:
    title = item['title']
    normalized_title = item['titles']['normalized']
    views = item['views']
    link = item['content_urls']['desktop']['page']
    extract = item['extract']
    thumbnail = item.get('thumbnail', {}).get('source', None)
    view_history = item['view_history']

    article={
        'title': title,
        'normalized_title': normalized_title,
        'views': views,
        'link': link,
        'thumbnail': thumbnail,
        'extract': extract,
        'text': '',
        'trending_reason': '',
        'memory_context': '',
        'view_history': view_history,
        'is_newly_trending': '',
        'raw_new_results': '',
        'news_relation': ''
    }

    article_list.append(article)

## Determine if the article is newly trending. If it is, add to new list

- If an article's views increased by more than a factor of 5 over the previous day, I'm calling it newly trending



In [None]:
def is_newly_trending(view_history):
    # Compare the two most recent days in the view history
    yesterdays_views = view_history[-2]['views']
    todays_views = view_history[-1]['views']

    # True when today's views are more than 5x yesterday's
    return todays_views > 5 * yesterdays_views

newly_trending_article_list = []

for article in article_list:
    print(f"title: {article['title']}")
    print(f"view_history: {article['view_history']}")

    newly_trending = is_newly_trending(article['view_history'])
    print(f"newly_trending: {newly_trending}")
    article['is_newly_trending'] = newly_trending
    
    if newly_trending:
        newly_trending_article_list.append(article)
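
The factor-of-5 rule can also be written defensively. A minimal sketch, assuming a history might have fewer than two entries or a missing `views` value (I have not confirmed the feed ever does this); `is_newly_trending_safe` and its `factor` parameter are hypothetical names, not part of the notebook above.

```python
def is_newly_trending_safe(view_history, factor=5):
    """Return True when the latest day's views exceed `factor` times the previous day's."""
    if len(view_history) < 2:
        return False  # not enough history to compare two days

    # Treat a missing or null 'views' value as zero (assumption)
    previous = view_history[-2].get("views") or 0
    latest = view_history[-1].get("views") or 0
    return latest > factor * previous


history = [
    {"date": "2024-01-13Z", "views": 1000},
    {"date": "2024-01-14Z", "views": 6000},
]
print(is_newly_trending_safe(history))  # prints True (6000 > 5 * 1000)
```

Keeping the threshold as a parameter makes it easy to try a looser or stricter definition of "newly trending" without touching the loop.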
  

In [None]:
for article in newly_trending_article_list:
    print(article['title'])
    print(article['is_newly_trending'])
    print(article['view_history'])
    print("")

### Get first 5000 characters of article

In [None]:
for article in newly_trending_article_list:
    # Download the raw wikitext of the article; passing the title via `params`
    # lets requests URL-encode titles that contain spaces, '&', '?', etc.
    response = requests.get(
        "https://en.wikipedia.org/w/index.php",
        params={"title": article['title'], "action": "raw"},
    )
    print(response.url)

    # Keep only the first 5000 characters
    article['text'] = response.text[:5000]

### Creating conversation chain with memory

In [None]:
trending_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful Wikipedia analyst and historian. 
            You speak concisely and, given the choice to say too much or too little, you say too little"""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

memory = ConversationBufferMemory(return_messages=True)

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

trending_conversation = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=trending_prompt,
    verbose=True
)

#### Loop through all articles in data structure
- Use LangChain/ChatGPT to give suggestions why each one is trending
- Save reason to structure

In [None]:

for article in newly_trending_article_list:

    title = article['normalized_title']
    text = article['text']

    prediction_prompt = f"""Act as a professional news summarizer. Based on your knowledge of {title} 
    and the following extract, explain in 1 concise and confident sentence why the {title} 
    article might be trending on Wikipedia on {date_to_query.strftime('%B %d, %Y')}:\n\n{text}"""

    response = trending_conversation.predict(input=prediction_prompt)
    print("trending_reason:", response)
    
    article['trending_reason'] =  response

#### Use conversation memory to derive more context

- Pass memory from first conversation into a new conversation 
- Search for cross context between today's articles

In [None]:
print(trending_conversation.memory)

In [None]:
memory_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful Wikipedia analyst and historian. 
            You speak concisely and, given the choice to say too much or too little, you say too little.
            If you do not know something, you say so."""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# todays_memory = load_memory()
memory_conversation = ConversationChain(
    llm=llm,
    memory=trending_conversation.memory,
    prompt=memory_prompt,
    verbose=True
)

llm_miss_response = "False"

for article in newly_trending_article_list[:1]:
    

    title = article['normalized_title']
    text = article['text']
    print(f"Analyzing {title}")
    # Named memory_question so it does not overwrite the memory_prompt template above
    memory_question = f"""Does {title} relate to any other trending article from today?
     If it does, give me a short description of the relation. If it does not, reply with '{llm_miss_response}'"""

    response = memory_conversation.predict(input=memory_question)
    print("memory_context:", response)
    
    article['memory_context'] =  response

In [None]:
# import requests
# import json

url = "https://google.serper.dev/news"

for article in newly_trending_article_list:

  title = article['normalized_title']
  payload = json.dumps({
    "q": title,
    "tbs": "qdr:w"
  })
  headers = {
    'X-API-KEY': SERPER_API_KEY,
    'Content-Type': 'application/json'
  }

  response = requests.request("POST", url, headers=headers, data=payload)

  article['raw_new_results'] = response.json()

  print(response.text)
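
Passing the raw JSON worked fine, but if prompt size ever becomes a concern, the results could be trimmed before they go to the model. A sketch under assumptions: the `news` key and the `title`, `link`, `snippet` and `date` fields are my reading of the response shape, so check them against an actual response first. `compact_news` is a hypothetical helper.

```python
def compact_news(raw_results, limit=10):
    """Keep only title, link, snippet and date for up to `limit` news items."""
    items = raw_results.get("news", [])[:limit]  # assumed top-level key
    keys = ("title", "link", "snippet", "date")  # assumed per-item fields
    return [{key: item.get(key, "") for key in keys} for item in items]


# Made-up sample in the assumed shape, not real API output
sample = {
    "searchParameters": {"q": "example"},
    "news": [
        {"title": "Example headline", "link": "https://example.com/a",
         "snippet": "Short summary", "date": "2 days ago", "position": 1},
    ],
}
print(compact_news(sample))
```

The trimmed list still carries everything the HTML reply asks for (link, title, snippet), and keeping `date` lets the model keep judging whether a story is actually recent.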

In [None]:
news_prompt_template = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful Wikipedia news analyst. 
            You speak concisely and, given the choice to say too much or too little, you say too little.
            If you do not know something, you say so."""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

news_memory = ConversationBufferMemory(return_messages=True)

# todays_memory = load_memory()
news_conversation = ConversationChain(
    llm=llm,
    prompt=news_prompt_template,
    memory=news_memory,
    verbose=True
)

llm_miss_response = "False"

for article in newly_trending_article_list:
    title = article['title']
    news = article['raw_new_results']

    # Print after assigning title so the log names the article being analyzed
    print(f"Analyzing {title}")

    news_prompt = f"""Does {title} relate to any current news found in this list {news}?
     If it does not, reply with '{llm_miss_response}'
     
     If it does, reply with a concise description with no leading text. For example:

     instead of 'The xxxxxx article might be trending on Wikipedia due to ..... [reason]'
     you will return: 'reason'

     You will follow this with 3 links to relevant articles in the html format:
    <br>
    <ul>
     <li><a href="link">Title</a> snippet</li>
     <li><a href="link">Title</a> snippet</li>
     <li><a href="link">Title</a> snippet</li>
    </ul>
     """

    response = news_conversation.predict(input=news_prompt)
    print("news_relation:", response)
    
    article['news_relation'] =  response

In [None]:

# Ensure the 'data' folder exists
file_directory = "data"
os.makedirs(file_directory, exist_ok=True)

# Define the filename based on the date
base_file_name = date_to_query.strftime('%Y-%m-%d')
file_path = f'{file_directory}/{base_file_name}_with_news.json'

# Save to JSON file (overwrite if it already exists)
with open(file_path, 'w', encoding='utf-8') as file:
    json.dump(newly_trending_article_list, file, indent=4, ensure_ascii=False)

print(f'Dumped trending list to {file_path}')

In [None]:
print(newly_trending_article_list)

#### Build HTML page to display the newly trending list, complete with
- title
- thumbnail
- trending reason
- relation to other articles

In [None]:
# Start building the HTML
# Single quotes inside the f-string keep this valid on Python versions before 3.12
html_title = f"<h1>Newly Trending on {date_to_query.strftime('%B %d, %Y')}</h1>\n"
if len(newly_trending_article_list) > 0:
    html_list = "<ol>\n"

    # Iterate through the data
    for item in newly_trending_article_list:
        title = item['normalized_title']
        link = item['link']
        views = item['views']
        thumbnail = item['thumbnail']
        trending_reason = item['trending_reason']
        news_relation = item['news_relation']
        
        memory_context = item['memory_context']
        extract = item['extract']
        

        # Handle null thumbnail
        if thumbnail:
            thumbnail_html = f'<img src="{thumbnail}" alt="Thumbnail for {title}"/><br>'
        else:
            thumbnail_html = ''
        

        # Handle the relation-to-others prompt returning a miss, or never having been run
        sanitized_memory_context = memory_context.strip().rstrip('.').lower()

        if not sanitized_memory_context or sanitized_memory_context == llm_miss_response.lower():
            article_relation_output = ''
        else:
            article_relation_output = f"<strong>Relation to other trending articles:</strong> {memory_context}<br><br>"

        # Same handling for the news prompt
        sanitized_news_relation = news_relation.strip().rstrip('.').lower()

        if not sanitized_news_relation or sanitized_news_relation == llm_miss_response.lower():
            news_relation_output = ''
        else:
            news_relation_output = f"<strong>News related to this:</strong> {news_relation}<br><br>"


        # Create a list item for each entry
        html_list += f"""
        <li>
            <h2>
            <a href="{link}" target="_blank">{title}</a><br>
            </h2>
            {thumbnail_html}
            <strong>Views:</strong> {views}<br><br>
            <strong>Reason for Trending:</strong> {trending_reason}<br><br>
            {article_relation_output}
            {news_relation_output}
            
        </li>
        """
        

    # Close the HTML list
    html_list += "\n</ol>"
else:
    html_list = "<p>No articles are trending today.</p>"
html_page = html_title + html_list
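
The two miss checks in the cell above follow the same pattern, so they could be folded into one helper. A minimal sketch: `is_miss` is a hypothetical name, and treating an empty reply or a quoted `'False'` as a miss is my assumption about how the model might answer, not something observed in the output.

```python
def is_miss(response, sentinel="False"):
    """True when the reply is empty or is just the sentinel, ignoring case,
    surrounding whitespace, quotes and periods."""
    cleaned = response.strip().strip("'\". ").lower()
    return cleaned in ("", sentinel.lower())


for reply in ["False", " false. ", "'False'.", "", "Both relate to the same game."]:
    print(repr(reply), is_miss(reply))
```

Inside the loop this would read `article_relation_output = '' if is_miss(memory_context) else ...`, and the same call covers `news_relation`.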



In [None]:
# Ensure the 'data' folder exists
file_directory = "data"
os.makedirs(file_directory, exist_ok=True)

# Define the filename based on the date
base_file_name = date_to_query.strftime('%Y-%m-%d')

# Save to html file (overwrite if it already exists)
file_path = f'{file_directory}/{base_file_name}.html'

with open(file_path, 'w', encoding='utf-8') as file:
    file.write(html_page)

# Prepend to the master file
master_file_path = f'{file_directory}/master.html'

# Read the existing content of the master file if it exists
if os.path.exists(master_file_path):
    with open(master_file_path, 'r', encoding='utf-8') as master_file:
        master_content = master_file.read()
else:
    master_content = ''

# Combine the new content with the old master content
updated_master_content = html_page + '\n' + master_content

# Save the updated content back to the master file
with open(master_file_path, 'w', encoding='utf-8') as master_file:
    master_file.write(updated_master_content)
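
The read-combine-write steps above can be wrapped in one function. A sketch using `pathlib`; `prepend_to_file` is a hypothetical helper name, and the demo writes to a temporary directory rather than the real `data` folder.

```python
import tempfile
from pathlib import Path


def prepend_to_file(path, new_content):
    """Write new_content followed by the file's existing content (if any).

    Returns the combined text that was written.
    """
    target = Path(path)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    combined = new_content + "\n" + existing
    target.write_text(combined, encoding="utf-8")
    return combined


# Demo in a throwaway directory: the newest page ends up on top
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master.html"
    prepend_to_file(master, "<h1>Day 1</h1>")
    demo_result = prepend_to_file(master, "<h1>Day 2</h1>")

print(demo_result)
```

Like the cell above, this does not check for duplicates, so running it twice for the same day adds that day's page twice.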

### Display generated html

In [287]:
# Display the HTML in the notebook (assuming Jupyter or similar)
from IPython.display import display, HTML
display(HTML(updated_master_content))