# AI Gaming Newsletter Experiment -- Project Report

## Overview

This experimental project demonstrates an end‑to‑end workflow for
scraping gaming news, processing long‑form content, and generating a
polished newsletter using LLM‑based summarization. The notebook explores
data collection, text cleaning, large‑context summarization, and
newsletter prompt engineering.

The end goal of a production‑ready system would include:

-   A backend service (API) that automates scraping, processing, and summarization
-   A frontend interface to preview, edit, and send newsletters
-   A scheduled weekly pipeline (cron/job scheduler)
-   Automatic email delivery via SendGrid, SES, or similar
-   Support for multiple websites, RSS feeds, and additional content types
-   Error handling, logging, retries, and monitoring

This notebook focuses on the **experimental workflow**, using basic
Python scripting and an instruction‑tuned LLM that accepts long inputs
(prompts are truncated at 4096 tokens).

In [44]:
!pip install requests beautifulsoup4 lxml feedparser





In [45]:
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import feedparser
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

headers = {
    "User-Agent": "Mozilla/5.0"
}


## Data Collection & Scraping

Several scrapers were implemented, including:

-   **GameSpot Top 10 News Scraper**
-   Other website fetchers and RSS feed parsers
-   Functions for extracting article text, metadata, and paragraphs

Key techniques:

-   Using `requests` with a browser‑like User‑Agent header
-   Parsing pages using **BeautifulSoup**
-   Extracting `<article>`, `<p>`, and structured content blocks
-   Cleaning and joining multi‑paragraph articles

This establishes a foundation for multi‑source news ingestion.
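
The cleaning and joining step could look roughly like the sketch below. It is not code from the notebook: `clean_paragraphs`, `DISCLOSURE_PREFIX`, and the `min_length` default are hypothetical names and values chosen for illustration. The disclosure prefix is taken from the sample output shown later in this report, and a real pipeline would likely need one such rule per source.

```python
import re

# Hypothetical boilerplate marker, copied from the GameSpot sample output in this report.
DISCLOSURE_PREFIX = "GameSpot may receive revenue"


def clean_paragraphs(paragraphs, min_length=50):
    """Normalize whitespace, drop short, boilerplate, or duplicate paragraphs, then join."""
    cleaned = []
    seen = set()
    for text in paragraphs:
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = re.sub(r"\s+", " ", text).strip()
        if len(text) < min_length:
            continue  # likely a caption, button label, or stray fragment
        if text.startswith(DISCLOSURE_PREFIX):
            continue  # affiliate disclosure, not article content
        if text in seen:
            continue  # the same paragraph matched by more than one selector
        seen.add(text)
        cleaned.append(text)
    return "\n".join(cleaned)
```

Passing the list of paragraph strings extracted by BeautifulSoup through a helper like this keeps the affiliate disclosure and very short fragments out of the text that later reaches the model.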

In [46]:
import requests
from bs4 import BeautifulSoup
import time

def get_article_content(url):
    """Get content from any GameSpot article URL."""
    headers = {"User-Agent": "Mozilla/5.0"}
    
    try:
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()  # treat HTTP errors (404, 500, ...) as failed fetches
        soup = BeautifulSoup(r.text, "html.parser")
        
        # Try standard articles first
        paragraphs = soup.select("div.article-body p, div.content-entity-body p")
        if paragraphs:
            return "\n".join([p.get_text(strip=True) for p in paragraphs])
        
        # Try gallery/deal pages
        deck = soup.select_one("div.content-deck, div.article-deck")
        if deck:
            return deck.get_text(strip=True)
        
        # Fallback: get all paragraphs in main content
        main = soup.select_one("main, article")
        if main:
            paragraphs = main.find_all('p')
            return "\n".join([p.get_text(strip=True) for p in paragraphs if len(p.get_text(strip=True)) > 50])
        
        return None
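    except requests.RequestException:
        # Network or HTTP failure: report "no content" instead of crashing the run
        return None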


def scrape_gamespot(limit=10):
    """Scrape GameSpot news with content."""
    base_url = "https://www.gamespot.com"
    headers = {"User-Agent": "Mozilla/5.0"}
    
    r = requests.get(f"{base_url}/news/", headers=headers, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")
    
    articles = []
    for item in soup.select("article")[:limit]:
        title_tag = item.select_one("h3")
        link_tag = item.select_one("a")
        
        if not title_tag or not link_tag:
            continue
        
        title = title_tag.get_text(strip=True)
        link = link_tag.get("href")
        if not link:
            continue  # anchor without an href; nothing to fetch
        if link.startswith("/"):
            link = base_url + link
        
        content = get_article_content(link)
        
        articles.append({
            "title": title,
            "link": link,
            "content": content
        })
        
        time.sleep(0.5)  # Be nice to the server
    
    return articles


# Run it
articles = scrape_gamespot(limit=10)

# Display
for i, article in enumerate(articles, 1):
    print(f"\n[{i}] {article['title']}")
    print(f"Link: {article['link']}")
    if article['content']:
        print(f"Content: {article['content'][:300]}...")
    else:
        print("Content: [Not extracted]")
    print("-" * 80)


[1] The Best Games Of 2025 (So Far)
Link: https://www.gamespot.com/gallery/best-reviewed-games-2025/2900-6127/
Content: GameSpot may receive revenue from affiliate and advertising partnerships for sharing this content and from purchases through links.
It has been a great year so far for gamers, as 2025 has delivered several outstanding games already. The year isn't over yet, but so far we've had sleeper hits like Blu...
--------------------------------------------------------------------------------

[2] 2025 Upcoming Games Release Schedule
Link: https://www.gamespot.com/articles/2025-upcoming-games-release-schedule/1100-6526471/
Content: 2025 has been an interesting year for video games, as the current-gen PlayStation and Xbox systems are well into their lifespans while theNintendo Switch's successor, the Switch 2, is just beginning its life. And let's not forget all the exclusives we're sure to see across PC and mobile platforms--t...
------------------------------------------------

In [47]:
print("\n" + "="*80)
print("DETAILED VIEW - FIRST ARTICLE")
print("="*80)

if articles and len(articles) > 0:
    first = articles[0]
    print(f"\nTitle: {first['title']}")
    print(f"\nLink: {first['link']}")
    print(f"\nTotal Characters: {len(first['content']) if first['content'] else 0}")
    print(f"\n{'='*80}")
    print("FULL CONTENT:")
    print("="*80)
    if first['content']:
        # Show first 5000 characters (adjust as needed)
        print(first['content'][:5000])
        if len(first['content']) > 5000:
            print(f"\n\n... [Showing 5000 of {len(first['content'])} total characters]")
    else:
        print("[NO CONTENT EXTRACTED]")
else:
    print("No articles found!")


DETAILED VIEW - FIRST ARTICLE

Title: The Best Games Of 2025 (So Far)

Link: https://www.gamespot.com/gallery/best-reviewed-games-2025/2900-6127/

Total Characters: 32899

FULL CONTENT:
GameSpot may receive revenue from affiliate and advertising partnerships for sharing this content and from purchases through links.
It has been a great year so far for gamers, as 2025 has delivered several outstanding games already. The year isn't over yet, but so far we've had sleeper hits like Blue Prince and Clair Obscur: Expedition 33 alongside fun blockbuster titles like Assassin's Creed Shadows and Doom: The Dark Ages
On top of that, the Switch launched earlier this year and several of its exclusives have been some of the best games of the year. Mario Kart World is pure pedal-to-the-metal racing action, Donkey Kong Bananza is an exciting adventure for the ape wonder of the world, and we're still looking forward to more first-party releases throughout the year.
You can also count on indie games to

## Using Large Language Models for Summarization

A smaller summarization model was initially considered, but a **larger
model with extended context** was required because of:

-   **Very long article content**
-   Combined tokens from multiple sources
-   The need for **high‑quality, human‑readable summaries**

The notebook loads an instruction‑tuned causal language model from
Hugging Face (`meta-llama/Llama-3.2-3B-Instruct`):

``` python
model = AutoModelForCausalLM.from_pretrained(...)
tokenizer = AutoTokenizer.from_pretrained(...)
```

Technical notes:

-   The device is detected automatically (`cuda` if available)
-   FP16 is used on GPU to save memory
-   Text is tokenized and truncated to at most **4096 tokens**
-   Articles are summarized in a loop

Summaries are stored and combined downstream.
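
One way to store the summaries for the later newsletter step is sketched below. This is an illustration rather than the notebook's own code: `summarize_articles` and `first_sentence` are hypothetical names, and `summarize_fn` stands in for whatever wraps the tokenizer and `model.generate` call. The 8000‑character cap mirrors the slice used in the summarization cell.

```python
def summarize_articles(articles, summarize_fn, max_chars=8000):
    """Summarize each article that has content and collect the results as dicts."""
    results = []
    for article in articles:
        content = article.get("content")
        if not content:
            continue  # the scraper returned None or an empty string
        summary = summarize_fn(content[:max_chars]).strip()
        results.append({
            "title": article["title"],
            "link": article["link"],
            "summary": summary,
        })
    return results


def first_sentence(text):
    """Stand-in summarizer so the sketch runs without loading a model."""
    return text.split(". ")[0].rstrip(".") + "."


demo_articles = [
    {"title": "Story A", "link": "https://example.com/a", "content": "A is here. B is there."},
    {"title": "Story B", "link": "https://example.com/b", "content": None},
]
all_summaries = summarize_articles(demo_articles, first_sentence)
```

Keeping the model call behind a plain callable makes the loop easy to test and lets the model be swapped without touching the collection logic.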

In [48]:
hf_token = "*****"
model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name, token=hf_token, torch_dtype=torch.float16 if device == "cuda" else torch.float32).to(device)


In [49]:
# Summarize all articles
for i, article in enumerate(articles, 1):
    if not article['content']:
        print(f"\n[{i}] {article['title']}")
        print("No content")
        continue
    
    # Create prompt
    prompt = f"""Summarize this gaming article in 2-3 clear sentences, and output only the summarization without asking questions:

{article['content'][:8000]}

Summary:"""
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096).to(device)
    
    # Generate
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        do_sample=True,
        top_p=0.9
    )
    
    # Decode
    full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    summary = full_response.split("Summary:")[-1].strip()
    
    # Print
    print(f"\n[{i}] {article['title']}")
    print(f"Summary: {summary}")
    print("-" * 80)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



[1] The Best Games Of 2025 (So Far)
Summary: Several standout games have emerged in 2025, including Hades 2, Split Fiction, and Absolum, which have received perfect scores. These games showcase exceptional craftsmanship, engaging gameplay, and memorable storytelling. The year has also seen the release of innovative games like Arc Raiders, Blue Prince, and Cabernet, which offer unique experiences and challenges. Additionally, popular franchises like Elden Ring and Final Fantasy Tactics have been revitalized with new releases. Overall, 2025 has been a great year for gaming.
--------------------------------------------------------------------------------



[2] 2025 Upcoming Games Release Schedule
Summary: There are numerous games planned for release in 2025, covering various platforms such as PlayStation, Xbox, Nintendo Switch, PC, and mobile devices. The list includes a mix of first-party and third-party titles, as well as some indie and niche games. Some notable titles include The Legend of Cyber Cowboy, Ys Memoire: The Oath in Felghana, and Tales of Graces f Remastered. The release window for 2025 is quite extensive, spanning from January to December. 
Note: This summary is based on the provided text and may not reflect the entire scope of games planned for release in 2025.
--------------------------------------------------------------------------------



[3] Three Lego Star Wars UCS Sets Are Discounted In Walmart's Black Friday Sale
Summary: Walmart is offering discounts on several Lego Star Wars Ultimate Collector Series (UCS) models, including the Millennium Falcon, Jabba's Sail Barge, and Jango Fett's Firespray-Class Starship, with prices ranging from $40 to $765. The company is also offering discounts on smaller Lego Star Wars sets, including the Tantive IV Starship and the Executor Super Star Destroyer. Additionally, Walmart is offering a one-year membership to Walmart+, which includes free shipping and other benefits. Some deals include free Walmart Cash, and customers can claim their cash by clicking on a box on the store page.
--------------------------------------------------------------------------------



[4] Expedition 33 Actor Says He's "Truly Grateful" For Outpouring Of Support After Charlie Cox Spoke Out
Summary: Charlie Cox has expressed gratitude and support for Maxence Cazoria, a performance-capture actor in the game Clair Obscur: Expedition 33, after Cox's nomination for Best Performance in the game. Cazoria said he was touched by Cox's words, which he believes acknowledged the collaborative nature of their performance. The two actors will compete for the Best Performance award, which recognizes both voice-acting and performance-capture. Clair Obscur: Expedition 33 is one of the most-nominated games of the year, with 12 nominations including a Game of the Year nod. The Game Awards will take place on December 11.
--------------------------------------------------------------------------------



[5] Check Out Over 200 Nintendo Switch Game Deals For Black Friday
Summary: Nintendo Switch exclusives are on sale for up to 50% off, including Mario Kart World and Zelda: Tears of Kingdom. Walmart is offering the best deals on Switch exclusives, with nine games available for $30 each. In addition to Switch exclusives, many third-party games are discounted to all-time low prices, with some games available for as low as $5. The sale also includes deals on Switch 2 games, such as the Nintendo Switch 2 + Mario Kart World Bundle for $499. 
Nintendo Switch 2 games are also available at discounted prices, with some games available for as low as $15. The sale features a wide range of games across multiple genres, including action, adventure, and role-playing games. 
Nintendo Switch 2 games are also available at discounted prices, with some games available for as low as $15. The sale features a wide range of games across multiple genres, including action, adventure, and role-playing games. 
N


[6] Suikoden Remastered Collection Gets Sweet Black Friday Price Cut
Summary: The Suikoden I & II HD Remaster is on sale for $30 at Amazon during Black Friday 2025, with discounts available on Nintendo Switch, PS5, and Xbox Series X, as well as PC players through Fanatical. The remasters feature HD sprites and backgrounds, enhanced sound effects, and quality-of-life options. Suikoden II is widely regarded as one of the best RPGs of all time, and the first game is still a solid experience worth trying. Several other JRPGs and exclusive titles are also on sale for Black Friday.
--------------------------------------------------------------------------------



[7] Save Up To 70% On Atlus RPGs - Metaphor ReFantazio, Persona, SMT & more
Summary: Several Atlus RPGs are on sale for up to 70% off at Amazon, including Metaphor ReFantazio, Persona 3: Reload, Shin Megami Tensei V: Vengeance Edition, and Raidou Remastered. These deals include PS5, Xbox Series X, and Nintendo Switch versions, with prices starting at $18 for some titles. The games offer engaging stories, lovable characters, and enticing gameplay, making them great options for those looking for a big RPG experience.
--------------------------------------------------------------------------------



[8] Steam Machine Price: Valve May Have Just Given The Best Clue Yet
Summary: Valve has provided a potentially telling comment on the Steam Machine's price, suggesting it will be competitive with the cost of building a PC with similar performance. However, the company cannot disclose pricing due to various external factors. The Steam Machine aims to offer value comparable to that of a PC, but its price may be affected by factors such as tariffs and supply chain volatility. 
Valve is taking a more traditional PC pricing approach, aiming for a good deal rather than selling the system at a loss and making up for it with game and accessory sales. The company's focus is on offering a unique product with features like quiet operation, HDMI, and wireless connectivity, rather than just the cost. 
The Steam Machine's price is expected to be influenced by its small form factor, noise level, and integration of features like HDMI and CEC, which may justify a premium price. 
The exact price of the


[9] Call Of Duty Hasn't Been Outsold By Another Shooter In The US Since 2006
Summary: Battlefield 6 is reportedly outselling Call of Duty: Black Ops 7 in the US, with Battlefield 6 needing only 22 days to eclipse the lifetime sales of Battlefield 1 in the US. This marks the first time a Battlefield game has outsold a Call of Duty game in the US over a full year. The Call of Duty series has dominated the US gaming market for 13 of the past 16 years, but this year's competitive market and the release of Battlefield 6 may have changed the game. Battlefield 6 is performing better than Call of Duty's latest releases in the US, with first-month sales higher than any other game's first-month sales going back to 2022.
--------------------------------------------------------------------------------

[10] Metal Gear Solid Delta: Tactical Edition Is Only $30 For Black Friday
Summary: Metal Gear Solid Delta: Snake Eater is on sale for $30 (was $70) on PS5 and Xbox Series X, featuring the Tactical


## Newsletter Construction

Once summaries are generated, the notebook:

-   Builds a combined newsletter prompt
-   Instructs the model to produce a structured, polished newsletter
-   Includes formatting rules (highlights, main stories, short bullets)

Example logic:

``` python
articles_combined = "\n\n".join(article_summaries)

prompt = f'Create a polished gaming newsletter using the summaries below...'
```

This converts raw scraped data into a readable curated output.
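
The prompt assembly can also be wrapped in a small function, as in the sketch below. `build_newsletter_prompt` is a hypothetical helper, not part of the notebook, and its instruction text is shortened for illustration. It assumes each summary is a dict with `title`, `summary`, and `link` keys, and it derives the article count from the input rather than hard‑coding it.

```python
def build_newsletter_prompt(summaries):
    """Build one newsletter prompt from a list of summary dicts."""
    # Skip entries whose summary is missing or empty.
    usable = [s for s in summaries if s.get("summary")]
    blocks = [
        f"[{i}] {s['title']}\n{s['summary']}\nLink: {s['link']}"
        for i, s in enumerate(usable, 1)
    ]
    articles_text = "\n\n".join(blocks)
    return (
        "Write a friendly gaming newsletter email using the following "
        f"{len(usable)} article summaries.\n\n"
        f"Articles:\n{articles_text}\n\n"
        "Email:"
    )


demo_prompt = build_newsletter_prompt([
    {"title": "Story A", "summary": "A short summary.", "link": "https://example.com/a"},
    {"title": "Story B", "summary": "", "link": "https://example.com/b"},
])
```

Ending the prompt with the `Email:` marker keeps it compatible with extracting the generated text by splitting on that marker.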

In [60]:
# Build email prompt with all summaries
articles_text = ""
for i, item in enumerate(all_summaries, 1):
    articles_text += f"\n[{i}] {item['title']}\n{item['summary']}\nLink: {item['link']}\n"

email_prompt = f"""
Write a **friendly, lively, and engaging gaming newsletter email** using the following 10 article summaries. 
Instructions:
1. Start with a catchy introduction to excite the reader.
2. Present each article naturally, highlighting the most exciting ones with more detail.
3. Keep less exciting articles brief.
4. Conclude with a warm outro.
5. DO NOT add commentary about the newsletter or instructions.
6. Do NOT include repeated sign-offs, placeholders, or "ready to send" notes.

Articles:
{articles_text}

Write the complete newsletter below, 
Email:
"""



# Generate email
inputs = tokenizer(email_prompt, return_tensors="pt", truncation=True, max_length=4096).to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.75,
    do_sample=True,
    top_p=0.9
)

full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
email = full_response.split("Email:")[-1].strip()

# Print final email
print("\n" + "="*80)
print("YOUR GAMING NEWSLETTER")
print("="*80)
print(email)
print("\n" + "="*80)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



YOUR GAMING NEWSLETTER
Subject: Get Ready for Epic Gaming Moments in 2025!

Dear Fellow Gamers,

It's time to gear up for an epic year of gaming! We're excited to share the best games of 2025 so far, the most anticipated releases of 2025, and some awesome deals on Lego Star Wars sets. Let's dive in!

**Top Games of 2025 So Far**

2025 is shaping up to be an incredible year for gamers, and we're excited to highlight some of the standout titles that have already hit the market. Games like Hades 2, Split Fiction, Absolum, Arc Raiders, Blue Prince, Cabernet, Clair Obscur: Expedition 33, Donkey Kong Bananza, Elden Ring: Nightreign, Final Fantasy Tactics: The Ivalice Chronicles, and Ghost of Yotei have already received rave reviews and are considered some of the best of the year. With many more great games still to come, we can't wait to see what the rest of 2025 has in store!

**2025 Upcoming Games Release Schedule**

Get ready for the most epic gaming year ever! Several notable games are 


## Next Steps for a Production Pipeline

To transform this prototype into a real, reliable system:

### 1. Backend Automation

-   Convert notebook code into Python modules
-   Use FastAPI or Flask for a stable service
-   Implement scraping workers and API endpoints

### 2. Frontend Application

-   Admin dashboard to preview newsletters
-   Edit or override summaries
-   Trigger or schedule mailings

### 3. Scheduled Pipelines

-   Cron jobs or Airflow / Prefect
-   Automatic weekly scraping
-   Automatic newsletter generation

### 4. Email Delivery

-   Integration with SendGrid, AWS SES, or Mailgun
-   Support for HTML templates
-   Tracking and analytics
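
As a rough illustration of the HTML template point, the sketch below builds a message with a plain‑text body and an HTML alternative using only the standard library. `build_newsletter_email` is a hypothetical helper and the addresses are placeholders. Actual delivery through SendGrid, AWS SES, or Mailgun is not shown and would go through that provider's own client or SMTP endpoint.

```python
from email.message import EmailMessage
from html import escape


def build_newsletter_email(subject, body_text, sender, recipient):
    """Return an EmailMessage with a plain-text part and a simple HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body_text)  # plain-text part for clients that do not render HTML

    # Minimal "template": one <p> per blank-line-separated paragraph, HTML-escaped.
    paragraphs = [p.strip() for p in body_text.split("\n\n") if p.strip()]
    html_body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    msg.add_alternative(f"<html><body>{html_body}</body></html>", subtype="html")
    return msg


demo_msg = build_newsletter_email(
    "Weekly Gaming News",
    "Hello & welcome\n\nSecond paragraph",
    "news@example.com",
    "reader@example.com",
)
```

Escaping the model's output before placing it in HTML matters here because generated text can contain characters such as `&` or `<`.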

### 5. Multi‑Website Expansion

-   Add more gaming news sources
-   RSS feeds
-   Social posts or video metadata
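
For RSS sources, items could be mapped into the same `title` / `link` / `content` shape the GameSpot scraper produces. The notebook imports `feedparser`, but the sketch below uses the standard library's `xml.etree.ElementTree` instead so that it runs without extra dependencies. `parse_rss_items` and `SAMPLE_RSS` are hypothetical, and the feed content is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 document used only to exercise the parser below.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example Gaming Feed</title>
<item><title>First story</title><link>https://example.com/a</link><description>Short teaser.</description></item>
<item><title>Second story</title><link>https://example.com/b</link></item>
</channel></rss>"""


def parse_rss_items(xml_text, limit=10):
    """Extract up to `limit` items from an RSS 2.0 document as article dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        if len(items) >= limit:
            break
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # Use None when the feed has no description, matching the scraper's "no content" case.
            "content": (item.findtext("description") or "").strip() or None,
        })
    return items


feed_articles = parse_rss_items(SAMPLE_RSS)
```

Because the output matches the scraper's dictionaries, feed items and scraped articles could be concatenated into one list before summarization.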

------------------------------------------------------------------------

## Conclusion

This notebook serves as an experimental prototype showing:

-   Web scraping
-   Long‑context LLM summarization
-   Automated newsletter generation

While not production‑ready, it provides a strong starting point for
building a fully automated, scalable, multi‑source gaming newsletter
pipeline.