# Trending Wikipedia articles with current news context

At this point we have a good handle articles that trend upon the death of a notable figure. But on Tuesday Jan 14, 2024 we had 5 articles trend

1. [Aunt Jemima](https://en.wikipedia.org/wiki/Aunt_Jemima)
2. [Josie Totah](https://en.wikipedia.org/wiki/Josie_Totah)
3. [Dendrocnide moroides](https://en.wikipedia.org/wiki/Dendrocnide_moroides)
4. [Robin Zander](https://en.wikipedia.org/wiki/Robin_Zander)
5. [Zane Gonzalez](https://en.wikipedia.org/wiki/Zane_Gonzalez)

Not a single one of these reasons was correctly identified so it is due time to get up-to-date news tied into this thing!

Using [Serper](https://serper.dev/) I submitted a query for recent news pertaining to each article. I then passed this raw output to ChatGPT asking if it could see a reason for it to trend today. 

Initially I saw that the results for dendrocnide moroides were not recent or current. I thought I'd have to iterate on that and pre-filter the results by date. But by passing the entire list of out-of-date articles to chatgpt, it correctly returned a negative result!

So now we know that Josie is a former child star who kissed someone on TikTok, Robin Zander performed in a halftime show with his band Cheap Trick during a football game Zane Gonzalez evidently played in.

We have absolutely no idea why dendrocnide doroides trended... it's an Australian plant that... surprise surprise... can kill you. There's no current news for this.. not even in my own search so I think I'll have to link social media into this next. 

Aunt Jemima... I'm not completely sold on why she's trending so far..

## Takeaways
- Very pleasantly surprised with how well ChatGPT handled a prompt with raw json from the news api
- Impressed with ChatGPT's ability to reply with an html structure

In [4]:
#!pip install langchain langchain-openai langchain-community openai

In [3]:
from dotenv import load_dotenv
import os

load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
SERPER_API_KEY = os.environ["SERPER_API_KEY"] 

In [4]:
import json
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

### Get trending wikipedia articles

In [39]:
import requests
import datetime

today = datetime.datetime.now()
yesterday = today - datetime.timedelta(days=1)

date_to_query = yesterday
url = f"https://api.wikimedia.org/feed/v1/wikipedia/en/featured/{date_to_query.strftime('%Y/%m/%d')}"


response = requests.get(url)
featured_feed = response.json()
print(f"API call: {url}")

API call: https://api.wikimedia.org/feed/v1/wikipedia/en/featured/2025/02/23


### Save raw Wikipedia data to file

In [40]:

# Ensure the 'featured-feed' folder exists
file_directory = "data"
os.makedirs(file_directory, exist_ok=True)

# Define the filename based on the date
base_file_name = date_to_query.strftime('%Y-%m-%d')
file_path = f'{file_directory}/{base_file_name}.json'

# Save to JSON file (overwrite if it already exists)
with open(file_path, 'w', encoding='utf-8') as file:
    json.dump(featured_feed, file, indent=4, ensure_ascii=False)

print(f'Saved Wikipedia response to {file_path}')

Saved Wikipedia response to data/2025-02-23.json


### Build data structure with all relevant information and placeholders for LLM responses

In [41]:
article_list = []


for item in featured_feed['mostread']['articles']:
    title = item['title']
    normalized_title = item['titles']['normalized']
    views = item['views']
    link = item['content_urls']['desktop']['page']
    extract = item['extract']
    thumbnail = item.get('thumbnail', {}).get('source', None)
    view_history = item['view_history']

    article={
        'title': title,
        'normalized_title': normalized_title,
        'views': views,
        'link': link,
        'thumbnail': thumbnail,
        'extract': extract,
        'text': '',
        'trending_reason': '',
        'memory_context': '',
        'view_history': view_history,
        'is_newly_trending': '',
        'raw_new_results': '',
        'news_relation': ''
    }

    article_list.append(article)

## Determine if the article is newly trending. If it is, add to new list

- If the article views increased by a factor of 5 from the previous day I'm calling newly trending



In [42]:
def is_newly_trending(view_history):
    view_history_length = len(view_history)

    yesterdays_views = view_history[view_history_length-2]['views']
    todays_views = view_history[view_history_length-1]['views']

    return todays_views*0.2 > yesterdays_views

newly_trending_article_list = []

for article in article_list:
    print(f"title: {article['title']}")
    print(f"view_history: {article['view_history']}")

    newly_trending = is_newly_trending(article['view_history'])
    print(f"newly_trending: {newly_trending}")
    article['is_newly_trending'] = newly_trending
    
    if newly_trending:
        newly_trending_article_list.append(article)
  

title: Kash_Patel
view_history: [{'date': '2025-02-18Z', 'views': 20103}, {'date': '2025-02-19Z', 'views': 29523}, {'date': '2025-02-20Z', 'views': 251785}, {'date': '2025-02-21Z', 'views': 436366}, {'date': '2025-02-22Z', 'views': 387898}]
newly_trending: False
title: Chhaava
view_history: [{'date': '2025-02-18Z', 'views': 398228}, {'date': '2025-02-19Z', 'views': 364197}, {'date': '2025-02-20Z', 'views': 293562}, {'date': '2025-02-21Z', 'views': 279447}, {'date': '2025-02-22Z', 'views': 278858}]
newly_trending: False
title: Zero_Day_(American_TV_series)
view_history: [{'date': '2025-02-18Z', 'views': 18085}, {'date': '2025-02-19Z', 'views': 16629}, {'date': '2025-02-20Z', 'views': 166157}, {'date': '2025-02-21Z', 'views': 235842}, {'date': '2025-02-22Z', 'views': 270689}]
newly_trending: False
title: Sambhaji
view_history: [{'date': '2025-02-18Z', 'views': 332718}, {'date': '2025-02-19Z', 'views': 340163}, {'date': '2025-02-20Z', 'views': 203730}, {'date': '2025-02-21Z', 'views': 207

In [43]:
for article in newly_trending_article_list:
    print(article['title'])
    print(article['is_newly_trending'])
    print(article['view_history'])
    print("")

Charles_Q._Brown_Jr.
True
[{'date': '2025-02-18Z', 'views': 1106}, {'date': '2025-02-19Z', 'views': 1273}, {'date': '2025-02-20Z', 'views': 5855}, {'date': '2025-02-21Z', 'views': 7154}, {'date': '2025-02-22Z', 'views': 216291}]

Dan_Caine
True
[{'date': '2025-02-18Z', 'views': 17}, {'date': '2025-02-19Z', 'views': 557}, {'date': '2025-02-20Z', 'views': 480}, {'date': '2025-02-21Z', 'views': 112}, {'date': '2025-02-22Z', 'views': 207118}]

Christine_McVie
True
[{'date': '2025-02-18Z', 'views': 2424}, {'date': '2025-02-19Z', 'views': 2477}, {'date': '2025-02-20Z', 'views': 2759}, {'date': '2025-02-21Z', 'views': 2808}, {'date': '2025-02-22Z', 'views': 116156}]

Artur_Beterbiev
True
[{'date': '2025-02-18Z', 'views': 8263}, {'date': '2025-02-19Z', 'views': 8813}, {'date': '2025-02-20Z', 'views': 9556}, {'date': '2025-02-21Z', 'views': 13292}, {'date': '2025-02-22Z', 'views': 106065}]

Dmitry_Bivol
True
[{'date': '2025-02-18Z', 'views': 6876}, {'date': '2025-02-19Z', 'views': 8248}, {'date

### Get first 5000 characters of article

In [44]:
for article in newly_trending_article_list:
      # Download raw text of article
  url = f"https://en.wikipedia.org/w/index.php?title={article['title']}&action=raw"
  print(url)

  article_text = requests.get(url).text
  article_text_truncated = article_text[:5000]
  article['text'] =  article_text_truncated

https://en.wikipedia.org/w/index.php?title=Charles_Q._Brown_Jr.&action=raw
https://en.wikipedia.org/w/index.php?title=Dan_Caine&action=raw
https://en.wikipedia.org/w/index.php?title=Christine_McVie&action=raw
https://en.wikipedia.org/w/index.php?title=Artur_Beterbiev&action=raw
https://en.wikipedia.org/w/index.php?title=Dmitry_Bivol&action=raw
https://en.wikipedia.org/w/index.php?title=Can-can&action=raw
https://en.wikipedia.org/w/index.php?title=Fleetwood_Mac&action=raw
https://en.wikipedia.org/w/index.php?title=Lisa_Franchetti&action=raw
https://en.wikipedia.org/w/index.php?title=Lynne_Marie_Stewart&action=raw
https://en.wikipedia.org/w/index.php?title=Stevie_Nicks&action=raw
https://en.wikipedia.org/w/index.php?title=Chairman_of_the_Joint_Chiefs_of_Staff&action=raw


### Creating conversation chain with memory

In [45]:
trending_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful Wikipedia analyst and historian. 
            You speak consiseley and given the choice to say too much or too little, you say too little"""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

memory = ConversationBufferMemory(return_messages=True)

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

trending_conversation = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=trending_prompt,
    verbose=True
)

#### Loop through all articles in data structure
- Use LangChain/ChatGPT to give suggestions why each one is trending
- Save reason to structure

In [46]:

for article in newly_trending_article_list:

    title = article['normalized_title']
    text = article['text']

    prediction_prompt = f"""Act as a professional news summarizer. Based on your knowledge of {title} 
    and the following extract. In 1 concise and confident sentence, explain why the {title} 
    article might be trending on Wikipedia on #{date_to_query}:\n\n{text}"""

    response = trending_conversation.predict(input=prediction_prompt)
    print("trending_reason:", response)
    
    article['trending_reason'] =  response



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: You are a helpful Wikipedia analyst and historian. 
            You speak consiseley and given the choice to say too much or too little, you say too little
Human: Act as a professional news summarizer. Based on your knowledge of Charles Q. Brown Jr. 
    and the following extract. In 1 concise and confident sentence, explain why the Charles Q. Brown Jr. 
    article might be trending on Wikipedia on #2025-02-23 12:58:00.917193:

{{short description|US Air Force general (born 1962)}}
{{Use dmy dates|date=December 2024}}{{Use American English|date=December 2024}}
{{Infobox military person
|image        = CJCS Brown.jpg
|alt          = 
|caption      = Official portrait, 2023
|nickname     = CQ
|birth_date   = {{birth date and age|1962|3|2|df=y}}
|birth_place  = [[San Antonio]], Texas, U.S.
|death_date   = 
|death_place  = 
|allegiance   = United States
|branch       = [[United States Air For

#### Use conversation memory to derive more context from

- Pass memory from first conversation into a new conversation 
- Search for cross context between today's articles

In [47]:
print(trending_conversation.memory)

chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='Act as a professional news summarizer. Based on your knowledge of Charles Q. Brown Jr. \n    and the following extract. In 1 concise and confident sentence, explain why the Charles Q. Brown Jr. \n    article might be trending on Wikipedia on #2025-02-23 12:58:00.917193:\n\n{{short description|US Air Force general (born 1962)}}\n{{Use dmy dates|date=December 2024}}{{Use American English|date=December 2024}}\n{{Infobox military person\n|image        = CJCS Brown.jpg\n|alt          = \n|caption      = Official portrait, 2023\n|nickname     = CQ\n|birth_date   = {{birth date and age|1962|3|2|df=y}}\n|birth_place  = [[San Antonio]], Texas, U.S.\n|death_date   = \n|death_place  = \n|allegiance   = United States\n|branch       = [[United States Air Force]]\n|serviceyears = 1984–present\n|rank         = [[General (United States)|General]]\n|commands     = {{plainlist|\n*[[Chairman of the Joint Chiefs of Staff]]\n*[[Chief of

In [48]:
memory_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful Wikipedia analyst and historian. 
            You speak consiseley and given the choice to say too much or too little, you say too little.
            If you do not know something, you say so."""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# todays_memory = load_memory()
memory_conversation = ConversationChain(
    llm=llm,
    memory=trending_conversation.memory,
    prompt=memory_prompt,
    verbose=True
)

llm_miss_response = "False"

for article in newly_trending_article_list[:1]:
    

    title = article['normalized_title']
    text = article['text']
    print(f"Analyzing {title}")
    memory_prompt = f"""Does {title} relate to any other trending article from today?
     If it does, give me a short description of the relation. If it does not, reply with '{llm_miss_response}'"""

    response = memory_conversation.predict(input=memory_prompt)
    print("memory_context:", response)
    
    article['memory_context'] =  response

Analyzing Charles Q. Brown Jr.


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: You are a helpful Wikipedia analyst and historian. 
            You speak consiseley and given the choice to say too much or too little, you say too little.
            If you do not know something, you say so.
Human: Act as a professional news summarizer. Based on your knowledge of Charles Q. Brown Jr. 
    and the following extract. In 1 concise and confident sentence, explain why the Charles Q. Brown Jr. 
    article might be trending on Wikipedia on #2025-02-23 12:58:00.917193:

{{short description|US Air Force general (born 1962)}}
{{Use dmy dates|date=December 2024}}{{Use American English|date=December 2024}}
{{Infobox military person
|image        = CJCS Brown.jpg
|alt          = 
|caption      = Official portrait, 2023
|nickname     = CQ
|birth_date   = {{birth date and age|1962|3|2|df=y}}
|birth_place  = [[San Antonio]], Texas, U.S.
|death_date   = 


In [49]:
# import requests
# import json

url = "https://google.serper.dev/news"

for article in newly_trending_article_list:

  title = article['normalized_title']
  payload = json.dumps({
    "q": title,
    "tbs": "qdr:w"
  })
  headers = {
    'X-API-KEY': SERPER_API_KEY,
    'Content-Type': 'application/json'
  }

  response = requests.request("POST", url, headers=headers, data=payload)

  article['raw_new_results'] = response.json()

  print(response.text)

{"searchParameters":{"q":"Charles Q. Brown Jr.","type":"news","tbs":"qdr:w","engine":"google"},"news":[{"title":"Trump fires chairman of the Joint Chiefs of Staff Gen. Charles Q. Brown Jr.","link":"https://www.npr.org/2025/02/21/nx-s1-5305288/trump-fires-chairman-joint-chiefs-of-staff-charles-brown-pentagon","snippet":"President Trump has fired the chairman of the Joint Chiefs of Staff, Air Force Gen. Charles Q. Brown Jr., and announced he will nominate a retired...","date":"2 days ago","source":"NPR","imageUrl":"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTQFwPz2eeLJp_ycN9IpaZX1jboD5tQYttdTFtK1g4rFLlxfoX3-Okea9g&usqp=CAI&s","position":1},{"title":"Trump Fires Joint Chiefs Chairman Amid Flurry of Dismissals at Pentagon","link":"https://www.nytimes.com/2025/02/21/us/politics/trump-fires-cq-brown-pentagon.html","snippet":"The decision to fire Gen. Charles Q. Brown Jr., a four-star fighter pilot, broke a tradition in which the Joint Chiefs chairman remains in place with a new...

In [50]:
news_prompt_template = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful Wikipedia news analyst. 
            You speak consiseley and given the choice to say too much or too little, you say too little.
            If you do not know something, you say so."""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

news_memory = ConversationBufferMemory(return_messages=True)

# todays_memory = load_memory()
news_conversation = ConversationChain(
    llm=llm,
    prompt=news_prompt_template,
    memory=news_memory,
    verbose=True
)

llm_miss_response = "False"

for article in newly_trending_article_list:
    print(f"Analyzing {title}")

    title = article['title']
    news = article['raw_new_results']

    news_prompt = f"""Does {title} relate to any current news found in this list {news}?
     If it does not, reply with '{llm_miss_response}'
     
     If it does, reply with a consise description with no leading text. For example:

     instead of 'The xxxxxx article might be trending on Wikipedia due to ..... [reason]'
     you will return: 'reason'

     You will follow this with 3 links to relevant articles in the html format:
    <br>
    <ul>
     <li><a href="link">Title</a> snippet</li>
     <li><a href="link">Title</a> snippet</li>
     <li><a href="link">Title</a> snippet</li>
    </ul>
     """

    response = news_conversation.predict(input=news_prompt)
    print("news_relation:", response)
    
    article['news_relation'] =  response

Analyzing Chairman of the Joint Chiefs of Staff


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: You are a helpful Wikipedia news analyst. 
            You speak consiseley and given the choice to say too much or too little, you say too little.
            If you do not know something, you say so.
Human: Does Charles_Q._Brown_Jr. relate to any current news found in this list {'searchParameters': {'q': 'Charles Q. Brown Jr.', 'type': 'news', 'tbs': 'qdr:w', 'engine': 'google'}, 'news': [{'title': 'Trump fires chairman of the Joint Chiefs of Staff Gen. Charles Q. Brown Jr.', 'link': 'https://www.npr.org/2025/02/21/nx-s1-5305288/trump-fires-chairman-joint-chiefs-of-staff-charles-brown-pentagon', 'snippet': 'President Trump has fired the chairman of the Joint Chiefs of Staff, Air Force Gen. Charles Q. Brown Jr., and announced he will nominate a retired...', 'date': '2 days ago', 'source': 'NPR', 'imageUrl': 'https://encrypted-tbn0.gstatic.co

In [51]:

# Ensure the 'featured-feed' folder exists
file_directory = "data"
os.makedirs(file_directory, exist_ok=True)

# Define the filename based on the date
base_file_name = date_to_query.strftime('%Y-%m-%d')
file_path = f'{file_directory}/{base_file_name}_with_news.json'

# Save to JSON file (overwrite if it already exists)
with open(file_path, 'w', encoding='utf-8') as file:
    json.dump(newly_trending_article_list, file, indent=4, ensure_ascii=False)

print(f'Dumped trending list to {file_path}')

Dumped trending list to data/2025-02-23_with_news.json


In [52]:
print(newly_trending_article_list)

[{'title': 'Charles_Q._Brown_Jr.', 'normalized_title': 'Charles Q. Brown Jr.', 'views': 216291, 'link': 'https://en.wikipedia.org/wiki/Charles_Q._Brown_Jr.', 'thumbnail': 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Gen_Charles_Q._Brown_Jr._%283%29.jpg/320px-Gen_Charles_Q._Brown_Jr._%283%29.jpg', 'extract': 'Charles Quinton Brown Jr. is a United States Air Force general who served as the 21st chairman of the Joint Chiefs of Staff from 2023 to 2025.', 'text': '{{short description|US Air Force general (born 1962)}}\n{{Use dmy dates|date=December 2024}}{{Use American English|date=December 2024}}\n{{Infobox military person\n|image        = CJCS Brown.jpg\n|alt          = \n|caption      = Official portrait, 2023\n|nickname     = CQ\n|birth_date   = {{birth date and age|1962|3|2|df=y}}\n|birth_place  = [[San Antonio]], Texas, U.S.\n|death_date   = \n|death_place  = \n|allegiance   = United States\n|branch       = [[United States Air Force]]\n|serviceyears = 1984–present\n|rank

#### Build HTML Page to display the top 10 list complete with 
- title
- thumbnail
- trending reason
- relation to other articles

In [53]:
# Start building the HTML
html_title = f"<h1>Newly Trending on {date_to_query.strftime("%B %d, %Y")}</h1>\n"
if len(newly_trending_article_list) > 0:
    html_list = "<ol>\n"

    # Iterate through the data
    for item in newly_trending_article_list:
        title = item['normalized_title']
        link = item['link']
        views = item['views']
        thumbnail = item['thumbnail']
        trending_reason = item['trending_reason']
        news_relation = item['news_relation']
        
        memory_context = item['memory_context']
        extract = item['extract']
        

        # Handle null thumbnail
        if thumbnail:
            thumbnail_html = f'<img src="{thumbnail}" alt="Thumbnail for {title}"/><br>'
        else:
            thumbnail_html = ''
        

        # Handle relation to others prompt returning a miss, 
        sanitized_memory_context = memory_context.strip().rstrip('.').lower()

        if sanitized_memory_context == llm_miss_response.lower():
            article_relation_output = ''
        else:
            article_relation_output = f"<strong>Relation to other trending articles:</strong> {memory_context}<br><br>"

        sanitized_news_relation = news_relation.strip().rstrip('.').lower()

        if sanitized_news_relation == llm_miss_response.lower():
            news_relation_output = ''
        else:
            news_relation_output = f"<strong>News related to this:</strong> {news_relation}<br><br>"

        view_history_list = "<ul>"
        for view in item['view_history']:
            view_history_list += f"<li><strong>{view['date'].split("Z")[0]}:</strong> {view['views']:,}</li>"
        view_history_list += "</ul>"

        # Create a list item for each entry
        html_list += f"""
        <li>
            <h2>
            <a href="{link}" target="_blank">{title}</a><br>
            </h2>
            {thumbnail_html}
            <strong>Views:</strong><br>
            {view_history_list}<br><br>
            <strong>Reason for Trending:</strong> {trending_reason}<br><br>
            {article_relation_output}
            {news_relation_output}
            
        </li>
        """
        

    # Close the HTML list
    html_list += "\n</ol>"
else:
    html_list = "<p>No articles are trending today.</p>"
html_page = html_title + html_list



In [54]:
# Ensure the 'data' folder exists
file_directory = "data"
os.makedirs(file_directory, exist_ok=True)

# Define the filename based on the date
base_file_name = date_to_query.strftime('%Y-%m-%d')

# Save to html file (overwrite if it already exists)
file_path = f'{file_directory}/{base_file_name}.html'

with open(file_path, 'w', encoding='utf-8') as file:
    file.write(html_page)

# Prepend to the master file
master_file_path = f'{file_directory}/master.html'

# Read the existing content of the master file if it exists
if os.path.exists(master_file_path):
    with open(master_file_path, 'r', encoding='utf-8') as master_file:
        master_content = master_file.read()
else:
    master_content = ''

# Combine the new content with the old master content
updated_master_content = html_page + '\n' + master_content

# Save the updated content back to the master file
with open(master_file_path, 'w', encoding='utf-8') as master_file:
    master_file.write(updated_master_content)

### Display generated html

In [55]:
# Display the HTML in the notebook (assuming Jupyter or similar)
from IPython.display import display, HTML
display(HTML(updated_master_content))