# Business Logic
---
This notebook is an experiment in natural language processing for finance. Using tools such as FinBERT (sentiment classification) and LDA (topic modeling), we use language models to inform people about what is going on around a specific company. The workflow extracts the news articles for a given stock and reuses the methodology shown here for analysis.
---


# Importing Libraries

In [25]:
!pip install feedparser



In [26]:
import yfinance as yf
import pandas as pd
from transformers import BertForSequenceClassification, BertTokenizer
import feedparser
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re
import torch

# Text extraction

In [27]:
class YahooFinanceFullArticleScraper:
    """
    Extracts full article content from Yahoo Finance RSS feeds
    Uses RSS for article discovery, then fetches full content from article URLs
    """

    def __init__(self):
        self.base_rss_url = "https://feeds.finance.yahoo.com/rss/2.0/headline?s={}&region=US&lang=en-US"
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

    def _extract_article_text(self, url):
        """Extract full article text from a Yahoo Finance article URL"""
        try:
            response = requests.get(url, headers=self.headers, timeout=10)
            response.raise_for_status()

            soup = BeautifulSoup(response.content, 'html.parser')

            # Find article content - Yahoo Finance uses various selectors
            article_content = None

            # Try common article content selectors
            selectors = [
                'article',
                '[data-module="ArticleBody"]',
                '.caas-body',
                '.article-body',
                '[class*="article"]',
                '[class*="content"]'
            ]

            for selector in selectors:
                article_content = soup.select_one(selector)
                if article_content:
                    break

            if not article_content:
                # Fallback: find main content area
                article_content = soup.find('main') or soup.find('article')

            if article_content:
                # Remove script and style elements
                for script in article_content(["script", "style", "nav", "footer", "header"]):
                    script.decompose()

                # Extract text and clean it
                text = article_content.get_text(separator=' ', strip=True)
                # Clean up multiple whitespaces
                text = re.sub(r'\s+', ' ', text).strip()
                return text

            return None

        except Exception:
            # Any network or parsing failure is treated as "no content"
            return None

    def get_full_articles_for_ticker(self, ticker, max_articles=10, verbose=False):
        """
        Get full article content for a ticker symbol

        Args:
            ticker (str): Stock ticker symbol
            max_articles (int): Maximum number of articles to fetch
            verbose (bool): Print progress messages

        Returns:
            list: List of articles with full text content
        """
        articles = []

        if verbose:
            print(f"Fetching RSS feed for {ticker}...")

        try:
            # Get RSS feed
            feed_url = self.base_rss_url.format(ticker.upper())
            feed = feedparser.parse(feed_url)

            if not feed.entries:
                if verbose:
                    print(f"  ⚠ No articles found for {ticker}")
                return articles

            if verbose:
                print(f"Found {len(feed.entries)} articles in RSS feed")
                print("Fetching full content (filtering for articles >150 words)...\n")

            # Process articles until we have max_articles that meet the word count requirement
            articles_processed = 0
            articles_skipped = 0

            for entry in feed.entries:
                # Stop if we have enough articles
                if len(articles) >= max_articles:
                    break

                articles_processed += 1
                article_url = entry.get('link', '').strip()
                title = entry.get('title', '').strip()

                if not article_url:
                    continue

                if verbose:
                    title_short = title[:60] + "..." if len(title) > 60 else title
                    print(f"[{articles_processed}] Fetching: {title_short}...")

                # Extract full article text
                full_text = self._extract_article_text(article_url)

                # Parse publication date
                published = entry.get('published', '')
                published_datetime = None
                if hasattr(entry, 'published_parsed') and entry.published_parsed:
                    try:
                        published_datetime = datetime(*entry.published_parsed[:6])
                    except (TypeError, ValueError):
                        pass

                # Calculate word count
                word_count = len(full_text.split()) if full_text else 0

                # Filter: Only keep articles with more than 150 words
                if word_count <= 150:
                    articles_skipped += 1
                    if verbose:
                        print(f"    ⚠ Skipped: {word_count} words (more than 150 required)")
                    continue

                article = {
                    'ticker': ticker.upper(),
                    'title': title,
                    'link': article_url,
                    'rss_description': entry.get('summary', '').strip(),
                    'published': published,
                    'published_datetime': published_datetime,
                    'guid': entry.get('guid', ''),
                    'full_text': full_text or '',
                    'word_count': word_count,
                    'has_full_text': full_text is not None and len(full_text) > 0
                }

                articles.append(article)

                if verbose and full_text:
                    print(f"    ✓ Retrieved {word_count} words")
                elif verbose:
                    print("    ⚠ Could not extract content")

            # Summary
            if verbose:
                print("\n📊 Summary:")
                print(f"   Articles processed: {articles_processed}")
                print(f"   Articles skipped (<=150 words): {articles_skipped}")
                print(f"   Articles returned: {len(articles)}")

            return articles

        except Exception as e:
            if verbose:
                print(f"  ✗ Error: {e}")
            return articles

# Initialize scraper
scraper = YahooFinanceFullArticleScraper()

# Get full articles for a ticker (e.g., 'AAPL')
# `articles_for_nlp` will hold a list of dictionaries, one per article, each with the full article content
articles_for_nlp = scraper.get_full_articles_for_ticker('AAPL', max_articles=5, verbose=False)
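
The scraper builds `published_datetime` with `datetime(*entry.published_parsed[:6])`, which yields a naive datetime. feedparser documents its `*_parsed` fields as UTC time tuples, so one option is to attach the timezone explicitly. The helper name `to_utc_datetime` is hypothetical; this is a minimal sketch, not part of the original scraper:

```python
import calendar
from datetime import datetime, timezone


def to_utc_datetime(parsed_time):
    """Convert a UTC time tuple (as in entry.published_parsed) to an aware datetime.

    Returns None when the tuple is missing or malformed.
    """
    if not parsed_time:
        return None
    try:
        # timegm interprets the tuple as UTC, unlike time.mktime (local time)
        timestamp = calendar.timegm(tuple(parsed_time))
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (TypeError, ValueError, OverflowError):
        return None


# Illustrative value only, not taken from a real feed entry
example = (2025, 12, 2, 23, 0, 0, 1, 336, 0)
print(to_utc_datetime(example))
```

An aware datetime makes it harder to mix up feed timestamps with the local times printed inside the article text (for example "5:00 PM CST").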

In [28]:
articles_for_nlp[0]['full_text']

"Apple's 6 straight records, bitcoin recovery: Market takeaways Yahoo Finance Video Tue, December 2, 2025 at 5:00 PM CST ^IXIC BTC-USD ^DJI ^GSPC DX-Y.NYB Yahoo Finance Markets and Data Editor Jared Blikre joins Asking for a Trend host Josh Lipton to discuss three key takeaways from Tuesday's trading session: Apple ( AAPL ) hitting six straight records, how the US dollar ( DX-Y.NYB ) is moving, and bitcoin ( BTC-USD ) beginning to make a recovery. To watch more expert insights and analysis on the latest market action, check out more Asking for a Trend . Video Transcript 00:00 Speaker A Let's uh focus on Apple first and this is a volatility here or the dollar. We'll get to that in a second, but we're going to talk about Apple's six straight records. In fact, it's been up seven days straight. Six of those, the last ones were record closes. So let's take a look. I'm going to show our uh Nasdaq 100 heat map. Let's do that. And we can see over the last 10 days, it's been up roughly 7%. and 

In [29]:
articles_for_nlp[4]['full_text']

"Top Research Reports for Apple, Tesla & Micron Technology Mark Vickery Tue, December 2, 2025 at 3:34 PM CST 8 min read AAPL TSLA MU KTCC HBB Tuesday, December 2, 2025 The Zacks Research Daily presents the best research output of our analyst team. Today's Research Daily features new research reports on 16 major stocks, including Apple Inc. (AAPL), Tesla, Inc. (TSLA) and Micron Technology, Inc. (MU), as well as two micro-cap stocks: Hamilton Beach Brands Holding Co. (HBB) and Key Tronic Corp. (KTCC). The Zacks microcap research is unique as our research content on these small and under-the-radar companies is the only research of its type in the country. These research reports have been hand-picked from the roughly 70 reports published by our analyst team today. You can see all of today’s research reports here >>> Ahead of Wall Street The daily 'Ahead of Wall Street' article is a must-read for all investors who would like to be ready for that day's trading action. The article comes out

## Model load

### FinBERT

In [30]:
model_name = "ProsusAI/finbert"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Sentiment analysis

### Apple (AAPL)

In [31]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[0]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Apple's 6 straight records, bitcoin recovery: Market takeaways Yahoo Finance Video Tue, December 2, 2025 at 5:00 PM CST ^IXIC BTC-USD ^DJI ^GSPC DX-Y.NYB Yahoo Finance Markets and Data Editor Jared Blikre joins Asking for a Trend host Josh Lipton to discuss three key takeaways from Tuesday's trading session: Apple ( AAPL ) hitting six straight records, how the US dollar ( DX-Y.NYB ) is moving, and bitcoin ( BTC-USD ) beginning to make a recovery. To watch more expert insights and analysis on the latest market action, check out more Asking for a Trend . Video Transcript 00:00 Speaker A Let's uh focus on Apple first and this is a volatility here or the dollar. We'll get to that in a second, but we're going to talk about Apple's six straight records. In fact, it's been up seven days straight. Six of those, the last ones were record closes. So let's take a look. I'm going to show our uh Nasdaq 100 heat map. Let's do that. And we can see over the last 10 days, it's been up rou
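
The tokenizer call above uses `truncation=True`, so text beyond the model's maximum input length is discarded and only the opening of a long article is scored. One way around this, sketched here, is to split the article into word windows, score each window, and average the probabilities. `chunk_words`, `average_probabilities`, and the 200-word window are hypothetical choices; `score_fn` stands in for any callable that returns a list of class probabilities for one chunk (for example, a wrapper around the FinBERT call above). Word counts only approximate token counts, so truncation should stay enabled inside such a wrapper.

```python
def chunk_words(text, window=200):
    """Split text into consecutive chunks of at most `window` words."""
    words = text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]


def average_probabilities(text, score_fn, window=200):
    """Score each chunk with score_fn and return the element-wise mean.

    score_fn is an assumed callable: str -> list of class probabilities.
    """
    chunks = chunk_words(text, window)
    if not chunks:
        return []
    scores = [score_fn(chunk) for chunk in chunks]
    n_classes = len(scores[0])
    return [sum(s[k] for s in scores) / len(scores) for k in range(n_classes)]


# Stand-in scorer for demonstration only; it is not a sentiment model
def fake_score(chunk):
    return [1.0, 0.0, 0.0] if "up" in chunk.split() else [0.0, 0.0, 1.0]


print(average_probabilities("up " * 3 + "flat " * 3, fake_score, window=3))
```

A plain mean weights every chunk equally; weighting by chunk length or dropping boilerplate chunks would be alternative design choices.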

In [32]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[1]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Boeing was the top-performing stock in the S&P 500 on Tuesday, Dec. 2, 2025. GIUSEPPE CACACE / AFP via Getty Images Close Key Takeaways An aerospace giant got a lift on Tuesday, Dec. 2, 2025, as an executive provided an optimistic forecast for deliveries and free cash flow, while a major chipmaker extended its rally. Boeing shares took off after the plane maker's chief financial officer offered a bright outlook for 2026. Intel stock added to its recent string of gains amid speculation about new business from Apple. Shares of an aircraft manufacturer skyrocketed as a top executive said deliveries and free cash flow would trend higher in 2026, while a large semiconductor player extended its hot streak. Major U.S. equities indexes moved higher Tuesday, recovering from a sell-off in the prior session. The S&P 500 advanced 0.3%, the Dow added 0.4%, and the Nasdaq gained 0.6%. In another bright sign for risk assets, the price of Bitcoin ( BTCUSD ) clawed back some of its recent

In [33]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[2]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Sentiment Probabilities: tensor([[0.5912, 0.0136, 0.3952]], grad_fn=<SoftmaxBackward0>)
Predicted Sentiment: positive
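
The output above pairs the largest probability (index 0) with `positive`; that pairing comes from `model.config.id2label`. The sketch below spells out the same lookup in plain Python. The labels for indices 1 and 2 are an assumption about the ProsusAI/finbert configuration and should be checked against `model.config.id2label` before relying on them. `top_label` is a hypothetical helper name.

```python
# Assumed label order; only index 0 -> "positive" is confirmed by the output above
ASSUMED_ID2LABEL = {0: "positive", 1: "negative", 2: "neutral"}


def top_label(probabilities, id2label):
    """Return (label, probability) for the highest-probability class."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return id2label[best_index], probabilities[best_index]


# Probabilities copied from the cell output above
label, prob = top_label([0.5912, 0.0136, 0.3952], ASSUMED_ID2LABEL)
print(label, prob)
```

With the tensor from the notebook, `probabilities[0].tolist()` would give the plain list this helper expects.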


In [34]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[3]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Shares of Intel ( INTC +8.65% ) gained on Tuesday, finishing the day up 8.7%. The rise came as the S&P 500 and the Nasdaq Composite jumped 0.2% and 0.5%, respectively. Intel's stock continued to climb today, driven by Friday's unconfirmed report that the chipmaker is set to begin manufacturing semiconductors for Apple 's MacBook Air and iPad Pro. Expand NASDAQ : INTC Intel Today's Change ( 8.65 %) $ 3.46 Current Price $ 43.47 Key Data Points Market Cap $191B Day's Range $ 40.05 - $ 43.68 52wk Range $ 17.66 - $ 43.68 Volume 167M Avg Vol 110M Gross Margin 35.58 % Dividend Yield N/A Intel could be partnering with Apple On Friday, TF International analyst Ming-Chi Kuo claimed on X that Intel will supply Apple with its lower-end M processors, which power the iPad Pro and MacBook Air. The first shipments are expected as early as the second quarter of 2027. If the report is confirmed, it would be a massive win for the embattled chipmaker. Intel once dominated the semiconductor i

In [35]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[4]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Top Research Reports for Apple, Tesla & Micron Technology Mark Vickery Tue, December 2, 2025 at 3:34 PM CST 8 min read AAPL TSLA MU KTCC HBB Tuesday, December 2, 2025 The Zacks Research Daily presents the best research output of our analyst team. Today's Research Daily features new research reports on 16 major stocks, including Apple Inc. (AAPL), Tesla, Inc. (TSLA) and Micron Technology, Inc. (MU), as well as two micro-cap stocks: Hamilton Beach Brands Holding Co. (HBB) and Key Tronic Corp. (KTCC). The Zacks microcap research is unique as our research content on these small and under-the-radar companies is the only research of its type in the country. These research reports have been hand-picked from the roughly 70 reports published by our analyst team today. You can see all of today’s research reports here >>> Ahead of Wall Street The daily 'Ahead of Wall Street' article is a must-read for all investors who would like to be ready for that day's trading action. The arti

### Bitcoin (BTC-USD)

In [40]:
articles_for_nlp = scraper.get_full_articles_for_ticker('BTC-USD', max_articles=5, verbose=False)

In [41]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[0]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Apple's 6 straight records, bitcoin recovery: Market takeaways Yahoo Finance Video Tue, December 2, 2025 at 5:00 PM CST ^IXIC BTC-USD ^DJI ^GSPC DX-Y.NYB Yahoo Finance Markets and Data Editor Jared Blikre joins Asking for a Trend host Josh Lipton to discuss three key takeaways from Tuesday's trading session: Apple ( AAPL ) hitting six straight records, how the US dollar ( DX-Y.NYB ) is moving, and bitcoin ( BTC-USD ) beginning to make a recovery. To watch more expert insights and analysis on the latest market action, check out more Asking for a Trend . Video Transcript 00:00 Speaker A Let's uh focus on Apple first and this is a volatility here or the dollar. We'll get to that in a second, but we're going to talk about Apple's six straight records. In fact, it's been up seven days straight. Six of those, the last ones were record closes. So let's take a look. I'm going to show our uh Nasdaq 100 heat map. Let's do that. And we can see over the last 10 days, it's been up rou
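
The first BTC-USD article is the same video transcript that the AAPL feed returned, so scoring both feeds counts it twice. A minimal sketch of de-duplicating on the `link` field that the scraper stores; the function name `dedupe_articles` and the sample dictionaries are hypothetical:

```python
def dedupe_articles(*article_lists):
    """Merge article lists, keeping the first occurrence of each link."""
    seen = set()
    unique = []
    for articles in article_lists:
        for article in articles:
            key = article.get("link")
            if key in seen:
                continue
            seen.add(key)
            unique.append(article)
    return unique


# Made-up entries that mimic the scraper's dictionary shape
aapl = [{"ticker": "AAPL", "link": "https://example.com/a"}]
btc = [{"ticker": "BTC-USD", "link": "https://example.com/a"},
       {"ticker": "BTC-USD", "link": "https://example.com/b"}]
print(len(dedupe_articles(aapl, btc)))
```

Keeping the first occurrence means the shared article stays attributed to whichever ticker was fetched first; whether a cross-ticker article should count for both is a modeling decision this sketch does not settle.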

In [42]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[1]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Vanguard Will Now Allow Crypto ETFs on Its Platform Emily Graffeo Tue, December 2, 2025 at 3:57 PM CST 2 min read BTC-USD GC=F BLK ETH-USD XRP-USD (Bloomberg) -- Vanguard Group, the world’s second-largest asset manager, has decided to allow ETFs and mutual funds that primarily hold cryptocurrencies to be traded on its platform, reversing a longstanding position. Starting on Tuesday, Vanguard will allow ETFs and mutual funds that primarily hold select cryptocurrencies, including Bitcoin, Ether, XRP, and Solana, to be eligible for trading on its platform. It’s a compromise that belies the firm’s long-standing view that digital assets are too volatile and speculative for serious portfolios and comes despite a more than $1 trillion drawdown in crypto market value since early October. Most Read from Bloomberg Steve Cohen, Bally’s, Genting Picked to Run Casinos in NYC Europe’s Largest Capital Without a Subway Is Finally Getting One Wealthy New Jersey Town’s Vote on 

In [43]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[2]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: This Trump-Linked Crypto Stock Just Plunged. Should You Buy the Dip as Shares Hit Deeply Oversold Levels? Wajeeh Khan - Barchart - Tue Dec 2, 3:57PM CST Share Crypto coins by Kanchanara via Unsplash American Bitcoin (ABTC) shares crashed as much as 49% on Dec. 2 amid broader macro-driven turmoil in the cryptocurrency market. Bitcoin (BTCUSD) is currently down some 30% versus its year-to-date high of over $126,000 in early October. This severe correction has triggered massive liquidations exceeding $19 billion, affecting over 1.6 million traders in what analysts describe as one of the most significant deleveraging events in crypto history. Versus its year-to-date high, ABTC stock is now down more than 85% , with its relative strength index (RSI) indicating deeply oversold territory. www.barchart.com Should You Load Up On ABTC Stock at Current Levels Despite a sharp pullback, American Bitcoin shares aren’t worth buying on the dip as the underlying business model faces unp

In [44]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[3]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: How major US stock indexes fared Tuesday, 12/2/2025 The Associated Press Tue, December 2, 2025 at 3:17 PM CST 1 min read ^GSPC BTC-USD BA ^DJI ^IXIC U.S. stocks bounced back as both bond yields and bitcoin stabilized. The S&P 500 rose 0.2% Tuesday, following its first loss in six days. The Dow Jones Industrial Average added 0.4%, and the Nasdaq composite climbed 0.6%. Boeing was one of the strongest forces lifting the market after it gave an encouraging forecast for how much cash it will produce next year. That helped offset losses for Signet Jewelers and Procter & Gamble, which highlighted potential challenges for U.S. households. Treasury yields eased following their jumps the day before. Bitcoin rose back above $91,000 after tumbling below $85,000 on Monday. On Tuesday: The S&P 500 rose 16.74 points, or 0.2%, to 6,829.37. The Dow Jones Industrial Average rose 185.13 points, or 0.4%, to 47,474.46. The Nasdaq composite rose 137.75 points, or 0.6%, to 23,413.67. The Russe

In [45]:
# Use the full scraped article text for this entry
article_text = articles_for_nlp[4]['full_text']

# Tokenize the extracted text
inputs = tokenizer(article_text, return_tensors="pt", padding=True, truncation=True)

# Pass tokenized input through the model
outputs = model(**inputs)

# Apply softmax to get probabilities
probabilities = torch.softmax(outputs.logits, dim=1)

# Get the predicted sentiment
predicted_class_id = probabilities.argmax().item()
sentiment = model.config.id2label[predicted_class_id]

print(f"Article Text: {article_text}")
print(f"Sentiment Probabilities: {probabilities}")
print(f"Predicted Sentiment: {sentiment}")

Article Text: Crypto Firm Tied to Trumps Sees Shares Sink as Lockup Ends Monique Mulima Tue, December 2, 2025 at 2:22 PM CST 3 min read BTC-USD ABTC (Bloomberg) -- American Bitcoin Corp. stock plunged on Tuesday after restricted shares of the crypto miner co-founded by Eric Trump were freed up to be traded. Most Read from Bloomberg Steve Cohen, Bally’s, Genting Picked to Run Casinos in NYC Europe’s Largest Capital Without a Subway Is Finally Getting One Wealthy New Jersey Town’s Vote on Fixing School Deficit Canceled The selloff was swift. Shares lost more than half of their value in less than 30 minutes as the equity lockup expired, triggering repeated trading halts. The stock pared declines later in the trading session, falling 35% to $2.33 as of 2:30 p.m. in New York. Shares from a private placement that took place before American Bitcoin merged with Gryphon Digital Mining Inc. became available on Tuesday, according to the crypto miner and a post on the social media platform X
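
Each cell above yields one label per article. For a ticker-level view, the per-article labels can be tallied into shares. This is a minimal sketch with a hypothetical helper name (`summarize_sentiment`) and made-up labels, since the notebook output does not show every prediction:

```python
from collections import Counter


def summarize_sentiment(labels):
    """Return each label's share of the articles, as fractions that sum to 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up labels for illustration; not results from the notebook
print(summarize_sentiment(["positive", "neutral", "positive", "negative"]))
```

Counting labels discards the model's confidence; averaging the probability vectors instead would keep it, at the cost of a less readable summary.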

# LDA

## Apple (AAPL)