# **News Sentiment Analysis**

This notebook retrieves financial news articles for a selected stock ticker over a configurable look-back window (this description targets the past 365 days; the code below defaults to `days_back=1825`, roughly five years). It runs sentiment analysis on each article's headline and description, then aggregates the results into a daily sentiment score that indicates how news coverage might influence investor sentiment.

# **Full Description and Details**


## **Data Sources & API Usage**
**Source**: Polygon.io News API

**API Call Strategy**: Data is fetched in 30-day batches to optimize API usage and ensure complete coverage.

**Ticker Selection**: The stock ticker is pulled from a CSV file containing the daily top 10 tickers (the first row is used).

**NASDAQ Data**: A reference file containing NASDAQ-listed stocks and their company names is loaded to ensure correct company associations.
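
The batching strategy can be illustrated with a small, standalone sketch. The helper name `date_windows` is hypothetical and not part of the notebook; it mirrors the windowing logic used in the fetch function further down, where each window ends 30 days after it starts and the next window begins the following day.

```python
from datetime import date, timedelta

def date_windows(start, end, interval_days=30):
    """Yield (window_start, window_end) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        # Clamp the final window so it never runs past the requested end date.
        window_end = min(current + timedelta(days=interval_days), end)
        yield current, window_end
        # Start the next window the day after this one ends, so no day is fetched twice.
        current = window_end + timedelta(days=1)

windows = list(date_windows(date(2020, 5, 6), date(2020, 7, 10)))
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

With these example dates the helper produces three windows, the last one shortened to end on the requested end date.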

## **Filters & Assumptions**
Only news articles that Polygon.io tags with the selected stock ticker (via the `ticker` query parameter) are included.

Each article is analyzed for sentiment using the FinBERT model, which categorizes text into positive, neutral, or negative sentiment.

Sentiment is calculated from the article's title and description joined into one string; a missing title or description is treated as empty, and the combined text is truncated to 512 characters.

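**Sentiment Score Calculation**:

$$\text{Sentiment Score} = \text{Positive Sentiment} - \text{Negative Sentiment}$$

Per article, a positive score suggests bullish sentiment, a negative score suggests bearish sentiment, and a score near zero suggests neutral sentiment.

The per-article scores are averaged by day and the daily mean is rescaled from the range -1 to 1 onto 0 to 100. In the exported file, values above 50 are therefore bullish, values below 50 are bearish, and values near 50 are neutral.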
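
As a worked illustration of the scoring, the sketch below reproduces the arithmetic the notebook code applies: a per-article score of `(pos - neg) / (1 - neu)`, a daily mean, and a rescale to 0 to 100. The function names and the three input values are hypothetical and chosen only to make the numbers easy to follow.

```python
from statistics import mean

def article_score(positive, negative, neutral):
    """Per-article score in [-1, 1], mirroring (pos - neg) / (1 - neu)."""
    denom = 1 - neutral
    return round((positive - negative) / denom, 4) if denom > 0 else 0.0

def rescale_to_100(score):
    """Map a score in [-1, 1] onto a 0-100 scale where 50 is neutral."""
    return (score + 1) / 2 * 100

# Hypothetical top-label outputs for three articles published on the same day.
scores = [
    article_score(0.9, 0.0, 0.0),  # confidently positive headline
    article_score(0.0, 0.6, 0.0),  # moderately negative headline
    article_score(0.0, 0.0, 0.8),  # neutral headline contributes 0
]
daily = rescale_to_100(mean(scores))
print(scores, daily)
```

Here the three article scores are 0.9, -0.6 and 0.0, their mean is about 0.1, and the rescaled daily value is about 55, slightly on the bullish side of neutral.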

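The daily sentiment CSV file is saved to Google Drive. The intended naming format is:

news_sentiment_{TICKER}_{YYYY-MM-DD}.csv

Note that the code below currently writes to the fixed file name `news_sentiment.csv` in the `ExportToGitHub` folder.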
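
A minimal sketch of how a file name in this format could be built is shown here. The helper `sentiment_csv_name` is hypothetical and does not appear in the notebook, and the date used is only an example.

```python
from datetime import date

def sentiment_csv_name(ticker, run_date):
    """Build news_sentiment_{TICKER}_{YYYY-MM-DD}.csv for a ticker and date."""
    return f"news_sentiment_{ticker.upper()}_{run_date:%Y-%m-%d}.csv"

# Example values only; the notebook would pass selected_ticker and the run date.
name = sentiment_csv_name("NVDA", date(2025, 5, 5))
print(name)  # news_sentiment_NVDA_2025-05-05.csv
```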


# **Code**

In [1]:
import requests
import pandas as pd
from datetime import datetime, timedelta
import time
from transformers import pipeline
from google.colab import drive

In [2]:
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [15]:
tickers_csv_path = "/content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/Top Ten Tickers.csv"
df_tickers = pd.read_csv(tickers_csv_path)
selected_ticker = df_tickers.iloc[0, 0]

In [16]:
print(f"📌 Selected Ticker: {selected_ticker}")

📌 Selected Ticker: NVDA


In [17]:
def get_nasdaq_symbols():
    file_path = '/content/drive/My Drive/StockDashboard_Automation/nasdaqlisted.txt'
    df = pd.read_csv(file_path, delimiter='\t')
    symbols_df = df[['Symbol', 'Security Name']]
    symbols_df.columns = ['Ticker', 'Company Name']
    return symbols_df

nasdaq_symbols = get_nasdaq_symbols()
print(f"Loaded {len(nasdaq_symbols)} NASDAQ companies.")

Loaded 4804 NASDAQ companies.


In [18]:
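matches = nasdaq_symbols.loc[nasdaq_symbols["Ticker"] == selected_ticker, "Company Name"]
# Fall back to the ticker itself if it is missing from the NASDAQ reference file.
company_name = matches.iloc[0] if not matches.empty else selected_ticker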
print(f"📌 Selected Company Name: {company_name}")

📌 Selected Company Name: NVIDIA Corporation - Common Stock


In [19]:
import os

# Read the key from the environment rather than hard-coding a secret in the notebook.
POLYGON_API_KEY = os.environ["POLYGON_API_KEY"]

In [20]:
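def fetch_polygon_news(ticker, company_name, days_back=1825, articles_per_request=1000):
    """Fetch Polygon.io news for `ticker` in 30-day windows, one row per article.

    `company_name` is accepted for context but is not used in the query.
    """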
    end_date = datetime.utcnow()
    start_date = end_date - timedelta(days=days_back)
    all_articles = []
    interval = 30
    current_start = start_date

    while current_start <= end_date:
        current_end = min(current_start + timedelta(days=interval), end_date)
        print(f"\n📅 Fetching news articles from {current_start.strftime('%Y-%m-%d')} to {current_end.strftime('%Y-%m-%d')}")

        url = "https://api.polygon.io/v2/reference/news"
        params = {
            "ticker": ticker,
            "published_utc.gte": current_start.strftime("%Y-%m-%dT00:00:00Z"),
            "published_utc.lte": current_end.strftime("%Y-%m-%dT23:59:59Z"),
            "limit": articles_per_request,
            "apiKey": POLYGON_API_KEY
        }

        while True:
            response = requests.get(url, params=params)
            time.sleep(1)

            if response.status_code == 200:
                data = response.json()
                articles = data.get("results", [])

                if not articles:
                    break

                for article in articles:
                    all_articles.append({
                        "date": article.get("published_utc", "").split("T")[0],
                        "title": article.get("title"),
                        "description": article.get("description", ""),
                        "source": article.get("publisher", {}).get("name", "Unknown"),
                        "url": article.get("article_url")
                    })

                if data.get("next_url"):
                    # next_url already carries the query and cursor; only the key is re-sent.
                    url = data["next_url"]
                    params = {"apiKey": POLYGON_API_KEY}
                else:
                    break

            else:
                print(f"❌ API Error: {response.status_code} - {response.text}")
                break

        current_start = current_end + timedelta(days=1)

    return pd.DataFrame(all_articles)
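
The function above stops paging a window as soon as any non-200 status comes back. One possible refinement, sketched here under the assumption that rate limiting is signalled with HTTP status 429, is to retry with exponential backoff before giving up. The helper `call_with_backoff` and the `FakeResponse` class are hypothetical; the fake responses stand in for `requests.get` so the sketch runs without network access.

```python
import time

def call_with_backoff(request_fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request_fn until it returns a non-429 status or attempts run out.

    request_fn is any zero-argument callable returning an object with a
    .status_code attribute, e.g. lambda: requests.get(url, params=params).
    """
    response = None
    for attempt in range(max_attempts):
        response = request_fn()
        if response.status_code != 429:
            return response
        # Wait base_delay, then 2x, then 4x ... before the next attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response

class FakeResponse:
    """Stand-in for a requests.Response, used only for this demonstration."""
    def __init__(self, status_code):
        self.status_code = status_code

statuses = iter([429, 429, 200])
delays = []  # records the requested waits instead of actually sleeping
result = call_with_backoff(lambda: FakeResponse(next(statuses)), sleep=delays.append)
print(result.status_code, delays)
```

In the notebook, the call `requests.get(url, params=params)` inside the inner loop could be wrapped in such a helper; whether that is worthwhile depends on the rate limits of the API plan in use.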

In [21]:
articles_df = fetch_polygon_news(selected_ticker, company_name)



📅 Fetching news articles from 2020-05-06 to 2020-06-05

📅 Fetching news articles from 2020-06-06 to 2020-07-06

📅 Fetching news articles from 2020-07-07 to 2020-08-06

📅 Fetching news articles from 2020-08-07 to 2020-09-06

📅 Fetching news articles from 2020-09-07 to 2020-10-07

📅 Fetching news articles from 2020-10-08 to 2020-11-07

📅 Fetching news articles from 2020-11-08 to 2020-12-08

📅 Fetching news articles from 2020-12-09 to 2021-01-08

📅 Fetching news articles from 2021-01-09 to 2021-02-08

📅 Fetching news articles from 2021-02-09 to 2021-03-11

📅 Fetching news articles from 2021-03-12 to 2021-04-11

📅 Fetching news articles from 2021-04-12 to 2021-05-12

📅 Fetching news articles from 2021-05-13 to 2021-06-12

📅 Fetching news articles from 2021-06-13 to 2021-07-13

📅 Fetching news articles from 2021-07-14 to 2021-08-13

📅 Fetching news articles from 2021-08-14 to 2021-09-13

📅 Fetching news articles from 2021-09-14 to 2021-10-14

📅 Fetching news articles from 2021-10-15 to 202

In [22]:
sentiment_pipeline = pipeline("sentiment-analysis", model="ProsusAI/finbert")

Device set to use cpu


In [23]:
def analyze_sentiment(text):
    """Analyze sentiment using FinBERT, truncating text to 512 characters.

    Only the top label returned by the pipeline receives a non-zero score.
    """
    if not text.strip():
        # NaN (rather than "-") keeps the columns numeric so the daily means work.
        nan = float("nan")
        return {"positive": nan, "neutral": nan, "negative": nan}

    result = sentiment_pipeline(text[:512])[0]
    label, score = result["label"], round(result["score"], 4)

    return {
        "positive": score if label == "positive" else 0,
        "neutral": score if label == "neutral" else 0,
        "negative": score if label == "negative" else 0
    }


def compute_sentiment_score(row):
    """Return (pos - neg) / (1 - neu), or NaN when the article had no text."""
    pos = row["positive_sentiment_news"]
    neg = row["negative_sentiment_news"]
    neu = row["neutral_sentiment_news"]

    if pd.isna(pos) and pd.isna(neg) and pd.isna(neu):
        return float("nan")

    return round((pos - neg) / (1 - neu) if (1 - neu) > 0 else 0, 4)

if not articles_df.empty:
    print("\n🔍 Performing Sentiment Analysis...")

    articles_df["sentiment"] = articles_df.apply(
        lambda x: analyze_sentiment((x["title"] or "") + " " + (x["description"] or "")), axis=1
    )

    articles_df["negative_sentiment_news"] = articles_df["sentiment"].apply(lambda x: x["negative"])
    articles_df["neutral_sentiment_news"] = articles_df["sentiment"].apply(lambda x: x["neutral"])
    articles_df["positive_sentiment_news"] = articles_df["sentiment"].apply(lambda x: x["positive"])

    articles_df["news_sentiment_score"] = articles_df.apply(compute_sentiment_score, axis=1)

    articles_df.drop(columns=["sentiment"], inplace=True)

    daily_sentiment = articles_df.groupby("date").agg(
        article_count=("title", "count"),  # Count number of articles per day
        negative_sentiment_news=("negative_sentiment_news", "mean"),
        neutral_sentiment_news=("neutral_sentiment_news", "mean"),
        positive_sentiment_news=("positive_sentiment_news", "mean"),
        news_sentiment_score=("news_sentiment_score", "mean")
    ).reset_index()

    daily_sentiment["news_sentiment_score"] = ((daily_sentiment["news_sentiment_score"] + 1) / 2) * 100

    daily_sentiment.fillna("-", inplace=True)

    daily_sentiment["ticker"] = selected_ticker

    print("✅ Full News Sentiment Data:")
    print(articles_df.head())

    print("\n📊 Daily Sentiment Breakdown:")
    print(daily_sentiment.head())

    export_path = "/content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/news_sentiment.csv"
    daily_sentiment.to_csv(export_path, index=False)

    print(f"✅ Exported Daily Sentiment to {export_path}")

else:
    print("⚠ No news articles found.")



🔍 Performing Sentiment Analysis...
✅ Full News Sentiment Data:
         date                                              title  \
0  2020-05-19  Dow, S&P Jump Over 3% on Encouraging Vaccine Data   
1  2020-08-22  S&P, NASDAQ Reach New Closing Highs to End a P...   
2  2020-08-20    Record-Setting Pace Cools Off after Fed Minutes   
3  2020-08-19   S&P Finally Breaks Through and Sets a New Record   
4  2020-08-18  NASDAQ Makes It Look Easy, Gains 1% to New Clo...   

                                         description  \
0  Dow, S&P Jump Over 3% on Encouraging Vaccine Data   
1  S&P, NASDAQ Reach New Closing Highs to End a P...   
2    Record-Setting Pace Cools Off after Fed Minutes   
3   S&P Finally Breaks Through and Sets a New Record   
4  NASDAQ Makes It Look Easy, Gains 1% to New Clo...   

                      source  \
0  Zacks Investment Research   
1  Zacks Investment Research   
2  Zacks Investment Research   
3  Zacks Investment Research   
4  Zacks Investment Research  
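
Because `analyze_sentiment` keeps only the top label, the `(1 - neu)` normalization in `compute_sentiment_score` never changes the result: the neutral score is zero whenever the positive or negative score is non-zero. The sketch below shows how the same formula behaves when all three class probabilities are available. It assumes the pipeline can be asked for every label's score (for example by passing `top_k=None`, which is an assumption about the installed `transformers` version) and that the result is a list of `{"label", "score"}` dictionaries. The function name and the example probabilities are hypothetical.

```python
def score_from_distribution(label_scores):
    """Collapse [{'label': ..., 'score': ...}, ...] into (pos - neg) / (1 - neu)."""
    probs = {item["label"].lower(): item["score"] for item in label_scores}
    pos = probs.get("positive", 0.0)
    neg = probs.get("negative", 0.0)
    neu = probs.get("neutral", 0.0)
    denom = 1 - neu
    # A fully neutral article (neu == 1) has no directional signal, so return 0.
    return round((pos - neg) / denom, 4) if denom > 0 else 0.0

# Hypothetical full distribution for one article.
example = [
    {"label": "positive", "score": 0.6},
    {"label": "neutral", "score": 0.3},
    {"label": "negative", "score": 0.1},
]
value = score_from_distribution(example)
print(value)
```

With the full distribution, the neutral probability rescales the directional signal: here the difference of 0.5 is divided by 0.7, giving roughly 0.71 instead of the 0.6 that the top-label approach would report.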

In [24]:
print(daily_sentiment['article_count'].sum())

17374


In [25]:
daily_sentiment = daily_sentiment.round(2)

In [26]:
export_path = "/content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/news_sentiment.csv"
daily_sentiment.to_csv(export_path, index=False)

print(f"✅ Exported Daily Sentiment to {export_path}")

✅ Exported Daily Sentiment to /content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/news_sentiment.csv
