<a href="https://colab.research.google.com/github/BassamTar99/StockPrediction/blob/News_Impact_Model/News_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install newsapi-python transformers torch pandas



In [2]:
import pandas as pd
import datetime
from newsapi import NewsApiClient
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

# Define the FinBERTNewsAnalyzer class
class FinBERTNewsAnalyzer:
    def __init__(self, api_key):
        self.newsapi = NewsApiClient(api_key=api_key)
        self.tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
        self.model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

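    def analyze_text(self, text):
        # Tokenize, truncating to the 512-token limit of the BERT-based model
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # Inference only: skip gradient tracking to save memory and time
        with torch.no_grad():
            outputs = self.model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        # Assumes index order 0=positive, 1=negative, 2=neutral;
        # verify against self.model.config.id2label
        return {
            'positive': predictions[0][0].item(),
            'negative': predictions[0][1].item(),
            'neutral': predictions[0][2].item()
        }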

    def get_stock_news(self, ticker, days=7):
        today = datetime.date.today()
        start_date = today - datetime.timedelta(days=days)

        articles = self.newsapi.get_everything(
            q=ticker,
            from_param=start_date.isoformat(),
            to=today.isoformat(),
            language='en',
            sort_by='relevancy',
            page_size=100
        )

        data = []
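        for article in articles['articles']:
            # NewsAPI can return null for title or description, so fall back to ''
            headline = article.get('title') or ''
            date = article['publishedAt'][:10]  # keep the YYYY-MM-DD part of the timestamp
            description = article.get('description') or ''
            text = f"{headline}. {description}"
            sentiment_scores = self.analyze_text(text)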
            dominant_sentiment = max(sentiment_scores.items(), key=lambda x: x[1])[0]

            data.append({
                'date': date,
                'headline': headline,
                'description': description,
                'url': article['url'],
                'sentiment': dominant_sentiment,
                'confidence': sentiment_scores[dominant_sentiment],
                'positive_score': sentiment_scores['positive'],
                'negative_score': sentiment_scores['negative'],
                'neutral_score': sentiment_scores['neutral']
            })

        if not data:
            print("No news found.")
            return pd.DataFrame(), pd.DataFrame()

        df = pd.DataFrame(data)

        daily = df.groupby('date').agg({
            'positive_score': 'mean',
            'negative_score': 'mean',
            'neutral_score': 'mean',
            'confidence': 'mean'
        }).reset_index()

        def get_daily_sentiment(row):
            scores = {
                'positive': row['positive_score'],
                'negative': row['negative_score'],
                'neutral': row['neutral_score']
            }
            return max(scores.items(), key=lambda x: x[1])[0]

        daily['sentiment'] = daily.apply(get_daily_sentiment, axis=1)
        return daily, df

    def get_sentiment_summary(self, ticker, days=7):
        daily_sentiment, detailed_news = self.get_stock_news(ticker, days)

        if daily_sentiment.empty:
            return "No news found for this period."

        summary = {
            'overall_sentiment': daily_sentiment['sentiment'].mode()[0],
            'average_confidence': daily_sentiment['confidence'].mean(),
            'positive_days': (daily_sentiment['sentiment'] == 'positive').sum(),
            'negative_days': (daily_sentiment['sentiment'] == 'negative').sum(),
            'neutral_days': (daily_sentiment['sentiment'] == 'neutral').sum(),
            'most_recent_news': detailed_news.sort_values('date', ascending=False).head(3)
        }

        return summary
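
The scoring in `analyze_text` and the label choice in `get_stock_news` come down to a softmax over three logits followed by an argmax. The following is a minimal pure-Python sketch of that arithmetic, assuming the label order positive, negative, neutral used in the class above. The names `softmax` and `dominant_label` and the example logits are hypothetical and chosen for illustration; they are not model output.

```python
import math

# Assumed label order, mirroring the indices used in analyze_text above.
LABELS = ("positive", "negative", "neutral")

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dominant_label(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    scores = dict(zip(labels, probs))
    label = max(scores, key=scores.get)
    return label, scores[label]

# Made-up logits, for illustration only.
label, confidence = dominant_label([0.2, -1.0, 1.5])
print(label, round(confidence, 3))
```

The returned probability plays the role of the `confidence` column: it is the softmax score of the winning class, not a calibrated measure of correctness.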


In [3]:
# Initialize the analyzer. The key is read from the NEWSAPI_KEY environment
# variable (an assumed name) so that it is never stored in the notebook.
import os
analyzer = FinBERTNewsAnalyzer(api_key=os.environ["NEWSAPI_KEY"])
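
A key typed into a notebook cell ends up in every saved or shared copy of the notebook. One way to keep it out is sketched below: look for the key in an environment variable and, if it is missing, prompt for it without echoing. The helper name `load_api_key` and the variable name `NEWSAPI_KEY` are assumptions made for this example, not part of the newsapi-python library.

```python
import os
from getpass import getpass

def load_api_key(env_var="NEWSAPI_KEY"):
    """Return the key from the environment, or prompt for it without echoing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to an interactive prompt; the typed value is not displayed.
        key = getpass(f"Enter your NewsAPI key ({env_var} is not set): ").strip()
    if not key:
        raise ValueError("A NewsAPI key is required.")
    return key
```

With this in place the analyzer could be created as `FinBERTNewsAnalyzer(api_key=load_api_key())`.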



In [4]:
# Ask the user for input (stock ticker and number of days for news)
ticker = input("Enter the stock ticker (e.g., AAPL, AMZN): ").strip().upper()
days = int(input("Enter the number of days of news to analyze (e.g., 7): ").strip())


Enter the stock ticker (e.g., AAPL, AMZN): aapl
Enter the number of days of news to analyze (e.g., 7): 1
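
The input cell above passes whatever is typed straight to `int()` and on to the API. A small validation layer, sketched here under stated assumptions, would reject empty tickers and non-positive day counts before any request is made. The function names and the `max_days=30` cap are hypothetical; the right cap depends on how far back your NewsAPI plan allows searches.

```python
def parse_ticker(raw):
    """Normalize a ticker: trim, upper-case, and allow letters, digits, '.' and '-'."""
    ticker = raw.strip().upper()
    if not ticker or not all(ch.isalnum() or ch in ".-" for ch in ticker):
        raise ValueError(f"Invalid ticker: {raw!r}")
    return ticker

def parse_days(raw, max_days=30):
    """Parse a positive whole number of days, capped at max_days (assumed limit)."""
    try:
        days = int(raw.strip())
    except ValueError:
        raise ValueError(f"Not a whole number: {raw!r}") from None
    if not 1 <= days <= max_days:
        raise ValueError(f"days must be between 1 and {max_days}, got {days}")
    return days

print(parse_ticker(" aapl "), parse_days("1"))
```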


In [5]:
# Run sentiment analysis for the user-specified stock
daily_sentiment, detailed_news = analyzer.get_stock_news(ticker, days)

# Show daily summary
print("\n📊 Daily Sentiment Summary:")
print(daily_sentiment)

# Show headline-level analysis
print("\n📰 Detailed News:")
print(detailed_news[['date', 'headline', 'sentiment', 'confidence']])

summary = analyzer.get_sentiment_summary(ticker, days)

print("\n🧠 Sentiment Summary:")
print(f"Overall Sentiment: {summary['overall_sentiment']}")
print(f"Avg. Confidence: {summary['average_confidence']:.2f}")
print(f"Positive Days: {summary['positive_days']}")
print(f"Negative Days: {summary['negative_days']}")
print(f"Neutral Days: {summary['neutral_days']}")
print("\n🗞️ Most Recent News:")
print(summary['most_recent_news'][['date', 'headline', 'sentiment']])

# Optionally, you can save the results to a CSV file
daily_sentiment.to_csv(f"{ticker}_daily_sentiment.csv", index=False)
detailed_news.to_csv(f"{ticker}_detailed_news.csv", index=False)

print(f"\nResults saved as {ticker}_daily_sentiment.csv and {ticker}_detailed_news.csv.")



📊 Daily Sentiment Summary:
         date  positive_score  negative_score  neutral_score  confidence  \
0  2025-04-24        0.328241        0.311476       0.360283    0.878001   

  sentiment  
0   neutral  

📰 Detailed News:
          date                                           headline sentiment  \
0   2025-04-24  Jim Cramer Questions Apple Inc. (AAPL) Tariff ...   neutral   
1   2025-04-24  Analyst Explains Threats to Apple (AAPL), Says...   neutral   
2   2025-04-24                Apple Q2 Earnings: Tariffs In Focus  negative   
3   2025-04-24  Logitech„ÅÆ„ÇØ„É™„Ç®„Ç§„Çø„ÉºÂêë„Åë„Ç´„Çπ„Çø„É†„Ç≠„Éº„Éá„Éê„Ç§„Çπ„ÄåMX Creative Consol...   neutral   
4   2025-04-24  CodeWeavers„ÄÅWin„Çí„Éô„Éº„Çπ„Å´Mac„ÇÑLinux‰∏ä„ÅßWindows„ÅÆx86„Ç¢„Éó„É™„Çí...   neutral   
5   2025-04-24  How Google Provides A Better Short Option Than...  positive   
6   2025-04-24  Analyst Explains Threats to Apple (AAPL), Says...   neutral   
7   2025-04-24  Jim Cramer Questions Apple Inc. (AAPL) Tariff ...  

In [6]:
!pip install joblib




In [7]:
import joblib

# Save the tokenizer and model separately.
# joblib pickles the Python objects, so loading them later requires compatible
# library versions; save_pretrained() is the more portable Hugging Face option.
joblib.dump(analyzer.tokenizer, 'finbert_tokenizer.joblib')
joblib.dump(analyzer.model, 'finbert_model.joblib')

# Do not write the API key to disk next to the model; keep it in an environment
# variable or a secret store and pass it in when constructing the analyzer.

print("Model components saved successfully.")


Model components saved successfully.
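
As an alternative to pickling, Transformers tokenizers and models can be written to a directory with `save_pretrained` and read back with `from_pretrained`. The sketch below wraps those two calls; the helper names and the `finbert_local` directory are assumptions for illustration. The `transformers` import is deferred to the loader so the save helper has no import-time dependency of its own.

```python
from pathlib import Path

def save_components(tokenizer, model, out_dir="finbert_local"):
    """Write tokenizer and model files into one directory via save_pretrained."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    tokenizer.save_pretrained(str(path))
    model.save_pretrained(str(path))
    return path

def load_components(out_dir="finbert_local"):
    """Reload both components from the directory written by save_components."""
    # Imported lazily; requires the transformers package at call time.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(out_dir)
    model = AutoModelForSequenceClassification.from_pretrained(out_dir)
    return tokenizer, model
```

In this notebook the call would look like `save_components(analyzer.tokenizer, analyzer.model)`, and a later session could restore both objects with `load_components()` without downloading the weights again.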
