In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch
import pandas as pd
from datasets import Dataset



In [2]:
print("PyTorch Version:", torch.__version__)
print("CUDA Version:", torch.version.cuda)
print("CUDA Available:", torch.cuda.is_available())

PyTorch Version: 2.3.1
CUDA Version: 12.1
CUDA Available: True


In [3]:
# Initialize model
model_name = "ProsusAI/finbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [4]:
# Single article to test
article_text = """When deciding whether to buy, sell, or hold a stock, investors often rely on analyst recommendations. Media reports about rating changes by these brokerage-firm-employed (or sell-side) analysts often influence a stock's price, but are they really important? 
Lets take a look at what these Wall Street heavyweights have to say about Apple (AAPL) before we discuss the reliability of brokerage recommendations and how to use them to your advantage.
Apple currently has an average brokerage recommendation (ABR) of 1.71, on a scale of 1 to 5 (Strong Buy to Strong Sell), calculated based on the actual recommendations (Buy, Hold, Sell, etc.) made by 29 brokerage firms. An ABR of 1.71 approximates between Strong Buy and Buy.
Of the 29 recommendations that derive the current ABR, 17 are Strong Buy and three are Buy. Strong Buy and Buy respectively account for 58.6% and 10.3% of all recommendations.
Brokerage Recommendation Trends for AAPL


Check price target & stock forecast for Apple here>>>

The ABR suggests buying Apple, but making an investment decision solely on the basis of this information might not be a good idea. According to several studies, brokerage recommendations have little to no success guiding investors to choose stocks with the most potential for price appreciation.
Are you wondering why? The vested interest of brokerage firms in a stock they cover often results in a strong positive bias of their analysts in rating it. Our research shows that for every "Strong Sell" recommendation, brokerage firms assign five "Strong Buy" recommendations.
This means that the interests of these institutions are not always aligned with those of retail investors, giving little insight into the direction of a stock's future price movement. It would therefore be best to use this information to validate your own analysis or a tool that has proven to be highly effective at predicting stock price movements.
With an impressive externally audited track record, our proprietary stock rating tool, the Zacks Rank, which classifies stocks into five groups, ranging from Zacks Rank #1 (Strong Buy) to Zacks Rank #5 (Strong Sell), is a reliable indicator of a stock's near -term price performance. So, validating the Zacks Rank with ABR could go a long way in making a profitable investment decision.
ABR Should Not Be Confused With Zacks Rank
Although both Zacks Rank and ABR are displayed in a range of 1-5, they are different measures altogether.
The ABR is calculated solely based on brokerage recommendations and is typically displayed with decimals (example: 1.28). In contrast, the Zacks Rank is a quantitative model allowing investors to harness the power of earnings estimate revisions. It is displayed in whole numbers -- 1 to 5.
Analysts employed by brokerage firms have been and continue to be overly optimistic with their recommendations. Since the ratings issued by these analysts are more favorable than their research would support because of the vested interest of their employers, they mislead investors far more often than they guide.
On the other hand, earnings estimate revisions are at the core of the Zacks Rank. And empirical research shows a strong correlation between trends in earnings estimate revisions and near-term stock price movements.
In addition, the different Zacks Rank grades are applied proportionately to all stocks for which brokerage analysts provide current-year earnings estimates. In other words, this tool always maintains a balance among its five ranks.
There is also a key difference between the ABR and Zacks Rank when it comes to freshness. When you look at the ABR, it may not be up-to-date. Nonetheless, since brokerage analysts constantly revise their earnings estimates to reflect changing business trends, and their actions get reflected in the Zacks Rank quickly enough, it is always timely in predicting future stock prices.
Is AAPL a Good Investment?
In terms of earnings estimate revisions for Apple, the Zacks Consensus Estimate for the current year has remained unchanged over the past month at $6.56.
Analysts' steady views regarding the company's earnings prospects, as indicated by an unchanged consensus estimate, could be a legitimate reason for the stock to perform in line with the broader market in the near term.
The size of the recent change in the consensus estimate, along with three other factors related to earnings estimates, has resulted in a Zacks Rank #3 (Hold) for Apple. You can see the complete list of today's Zacks Rank #1 (Strong Buy) stocks here >>>>
It may therefore be prudent to be a little cautious with the Buy-equivalent ABR for Apple.
Zacks Reveals ChatGPT "Sleeper" Stock
One little-known company is at the heart of an especially brilliant Artificial Intelligence sector. By 2030, the AI industry is predicted to have an internet and iPhone-scale economic impact of $15.7 Trillion.
As a service to readers, Zacks is providing a bonus report that names and explains this explosive growth stock and 4 other "must buys." Plus more.
Download Free ChatGPT Stock Report Right Now >>
Want the latest recommendations from Zacks Investment Research? Today, you can download 7 Best Stocks for the Next 30 Days. Click to get this free report
Apple Inc. (AAPL) : Free Stock Analysis Report
To read this article on Zacks.com click here.
Zacks Investment Research
The views and opinions expressed herein are the views and opinions of the author and do not necessarily reflect those of Nasdaq, Inc."""

In [5]:
# Tokenize the single article, truncating/padding to FinBERT's 512-token limit
inputs = tokenizer(
    article_text, 
    return_tensors='pt', 
    truncation=True, 
    max_length=512, 
    padding='max_length'
)
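The tokenizer call above truncates the article to at most 512 tokens and pads shorter inputs up to that length. It also returns an `attention_mask` that marks which positions hold real tokens. The plain-Python sketch below mimics that behaviour on a list of already-encoded token ids. It assumes the BERT uncased special-token ids (101 for [CLS], 102 for [SEP], 0 for [PAD]), which may not match every tokenizer. `truncate_and_pad` is a hypothetical helper, not part of transformers.

```python
from typing import List, Tuple

# Assumed special-token ids (BERT uncased vocabulary); check tokenizer.cls_token_id etc.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def truncate_and_pad(token_ids: List[int], max_length: int = 512) -> Tuple[List[int], List[int]]:
    """Hypothetical helper mimicking truncation=True + padding='max_length'
    for one sequence: keep room for [CLS] and [SEP], then pad to max_length."""
    body = token_ids[: max_length - 2]      # drop everything past the limit
    ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(ids)         # 1 = real token
    n_pad = max_length - len(ids)
    ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad           # 0 = padding, ignored by attention
    return ids, attention_mask


# A 700-token "article" is cut to 510 tokens plus the two special tokens
ids, mask = truncate_and_pad(list(range(1000, 1700)), max_length=512)
print(len(ids), sum(mask))  # 512 512

# A short input is padded instead
short_ids, short_mask = truncate_and_pad([2000, 2001, 2002], max_length=8)
print(short_ids)   # [101, 2000, 2001, 2002, 102, 0, 0, 0]
print(short_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

One practical consequence: anything past roughly the first 510 word-piece tokens never reaches FinBERT, so long articles are scored on their opening only.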

In [6]:
# Run classification
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).squeeze()
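`torch.softmax(logits, dim=1)` turns each row of raw logits into probabilities that sum to 1. Below is a minimal pure-Python version for a single row. It uses the usual max-subtraction trick for numerical stability. The logits in the example are illustrative, not FinBERT output.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits
    (what torch.softmax(logits, dim=1) computes per example)."""
    m = max(logits)                          # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])             # illustrative logits only
print([round(p, 4) for p in probs])          # [0.09, 0.2447, 0.6652]
print(softmax([1000.0, 1000.0]))             # [0.5, 0.5], no overflow
```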

In [7]:
# Map labels with the model's own id2label; FinBERT's class order is not
# negative/neutral/positive, so a hand-written mapping mislabels the output
label_mapping = model.config.id2label
predicted_label = label_mapping[probabilities.argmax().item()]
print("Predicted sentiment:", predicted_label)

Predicted sentiment: neutral


In [8]:
# Check FinBERT output probabilities (indexed by class id; see model.config.id2label)
probabilities

tensor([0.0751, 0.0173, 0.9076])
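The positions in this probability vector follow the model's class ids, so you can only read them correctly through `model.config.id2label`. The sketch below pairs each value with its label name. It assumes the ProsusAI/finbert config maps ids 0, 1, 2 to positive, negative, neutral, which you can verify by printing `model.config.id2label`. Under that assumption, the top class for this article is neutral. That agrees with the label the pipeline assigns to the same article in row 1 of the results further down.

```python
from typing import Dict, List, Tuple

# Assumed to match model.config.id2label for ProsusAI/finbert; verify before relying on it.
ID2LABEL: Dict[int, str] = {0: "positive", 1: "negative", 2: "neutral"}


def label_probabilities(probs: List[float], id2label: Dict[int, str]) -> Tuple[str, Dict[str, float]]:
    """Pair each class probability with its label name and return the top label."""
    by_label = {id2label[i]: p for i, p in enumerate(probs)}
    top = max(by_label, key=by_label.get)
    return top, by_label


# Values copied from the tensor printed above
top, by_label = label_probabilities([0.0751, 0.0173, 0.9076], ID2LABEL)
print(top)       # neutral
print(by_label)  # {'positive': 0.0751, 'negative': 0.0173, 'neutral': 0.9076}
```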

In [9]:
# load data
news_df = pd.read_csv('sp100_news_2018_2023.csv')



In [10]:
# reduce df to only necessary columns
news_df = news_df[['Date', 'Article_title', 'Stock_symbol', 'Article']]

In [11]:
# check
news_df.head()

,Date,Article_title,Stock_symbol,Article
0,2023-12-16 22:00:00,My 6 Largest Portfolio Holdings Heading Into 2...,AAPL,"After an absolute disaster of a year in 2022, ..."
1,2023-12-16 22:00:00,Brokers Suggest Investing in Apple (AAPL): Rea...,AAPL,"When deciding whether to buy, sell, or hold a ..."
2,2023-12-16 21:00:00,"Company News for Dec 19, 2023",AAPL,Shares of Apple Inc. AAPL lost 0.9% on China’s...
3,2023-12-16 21:00:00,NVIDIA (NVDA) Up 243% YTD: Will It Carry Momen...,AAPL,NVIDIA Corporation NVDA has witnessed a remark...
4,2023-12-16 21:00:00,"Pre-Market Most Active for Dec 19, 2023 : BMY,...",AAPL,The NASDAQ 100 Pre-Market Indicator is up 10.1...


In [12]:
# confirm shape
news_df.shape

(332788, 4)

In [None]:
# Create a sentiment pipeline on GPU (device=0)
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    device=0,         # Use GPU
    batch_size=64,
    truncation=True     
)

In [15]:
# Ensure the 'Article' column is clean and valid
news_df["Article"] = news_df["Article"].fillna("").astype(str)

# Convert to Hugging Face Dataset
hf_dataset = Dataset.from_pandas(news_df)

In [20]:
# Function to process batches
def analyze_batch(batch):
    # Ensure all inputs are strings (truncation to 512 tokens is done by the pipeline)
    texts = [str(text) for text in batch["Article"]]
    results = sentiment_pipeline(texts)
    batch["Sentiment_label"] = [res["label"] for res in results]
    batch["Sentiment_score"] = [res["score"] for res in results]
    return batch

# Apply the function to the dataset in batches
processed_dataset = hf_dataset.map(analyze_batch, batched=True, batch_size=64)

# Convert back to pandas DataFrame
processed_df = processed_dataset.to_pandas()

# Save the enriched DataFrame
processed_df.to_csv("news_with_sentiment.csv", index=False)

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
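The warning above appears because `Dataset.map` calls the pipeline once per 64-row batch. The simplified stand-in below shows the data flow of `map(..., batched=True, batch_size=...)`. Each column is sliced into chunks, the function receives a dict of lists, and the returned columns are concatenated. `map_batched` and `fake_classifier` are hypothetical names. `fake_classifier` is only a placeholder that returns the same `{'label', 'score'}` shape as the real pipeline.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List]


def map_batched(columns: Batch, fn: Callable[[Batch], Batch], batch_size: int = 64) -> Batch:
    """Simplified stand-in for Dataset.map(fn, batched=True, batch_size=...):
    slice every column into chunks, call fn on each chunk, concatenate results."""
    n = len(next(iter(columns.values())))
    out: Batch = {}
    for start in range(0, n, batch_size):
        batch = {k: v[start:start + batch_size] for k, v in columns.items()}
        result = fn(batch)
        for k, v in result.items():
            out.setdefault(k, []).extend(v)
    return out


def fake_classifier(texts: List[str]) -> List[Dict[str, object]]:
    """Hypothetical placeholder with the pipeline's output shape:
    one {'label', 'score'} dict per input text."""
    return [{"label": "positive" if "up" in t else "neutral", "score": 0.9} for t in texts]


def analyze_batch(batch: Batch) -> Batch:
    results = fake_classifier([str(t) for t in batch["Article"]])
    batch["Sentiment_label"] = [r["label"] for r in results]
    batch["Sentiment_score"] = [r["score"] for r in results]
    return batch


data = {"Article": ["shares up 5%", "flat day", "index up", "no news", "up again"]}
enriched = map_batched(data, analyze_batch, batch_size=2)
print(enriched["Sentiment_label"])
# ['positive', 'neutral', 'positive', 'neutral', 'positive']
```

For the GPU efficiency the warning asks for, transformers also provides `KeyDataset` (in `transformers.pipelines.pt_utils`). It lets the pipeline iterate over a dataset column directly, e.g. `sentiment_pipeline(KeyDataset(hf_dataset, "Article"))`. That alternative is untested here.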
                                                                     

In [21]:
check_sent_df = pd.read_csv("news_with_sentiment.csv")



In [22]:
check_sent_df.head(20)

,Date,Article_title,Stock_symbol,Article,Sentiment_label,Sentiment_score
0,2023-12-16 22:00:00,My 6 Largest Portfolio Holdings Heading Into 2...,AAPL,"After an absolute disaster of a year in 2022, ...",neutral,0.822993
1,2023-12-16 22:00:00,Brokers Suggest Investing in Apple (AAPL): Rea...,AAPL,"When deciding whether to buy, sell, or hold a ...",neutral,0.902589
2,2023-12-16 21:00:00,"Company News for Dec 19, 2023",AAPL,Shares of Apple Inc. AAPL lost 0.9% on China’s...,neutral,0.5505
3,2023-12-16 21:00:00,NVIDIA (NVDA) Up 243% YTD: Will It Carry Momen...,AAPL,NVIDIA Corporation NVDA has witnessed a remark...,positive,0.889521
4,2023-12-16 21:00:00,"Pre-Market Most Active for Dec 19, 2023 : BMY,...",AAPL,The NASDAQ 100 Pre-Market Indicator is up 10.1...,neutral,0.875821
5,2023-12-16 20:00:00,3 Artificial Intelligence (AI) Stocks for 2024...,AAPL,What was the top financial story of 2023? It h...,positive,0.944678
6,2023-12-16 20:00:00,AAPL Quantitative Stock Analysis,AAPL,Below is Validea's guru fundamental report for...,neutral,0.874864
7,2023-12-16 18:00:00,Should Vanguard S&P 500 ETF (VOO) Be on Your I...,AAPL,If you're interested in broad exposure to the ...,neutral,0.765141
8,2023-12-16 18:00:00,Is FlexShares Quality Dividend ETF (QDF) a Str...,AAPL,"Launched on 12/14/2012, the FlexShares Quality...",neutral,0.82034
9,2023-12-16 18:00:00,Is FlexShares STOXX US ESG Select Index Fund (...,AAPL,"Making its debut on 07/13/2016, smart beta exc...",neutral,0.855043


In [24]:
check_sent_df['Sentiment_label'].describe()

count      332788
unique          3
top       neutral
freq       178912
Name: Sentiment_label, dtype: object
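`describe()` reports only the most frequent label. By its numbers, neutral covers 178,912 of 332,788 articles, about 53.8%. In pandas, the full breakdown is `check_sent_df['Sentiment_label'].value_counts(normalize=True)`. The stdlib sketch below computes the same shares. Its CSV rows are made up for illustration and only mirror the column names of `news_with_sentiment.csv`.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each label's share of all rows, most common first,
    like value_counts(normalize=True)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


# Tiny in-memory stand-in for news_with_sentiment.csv (made-up rows)
sample_csv = io.StringIO(
    "Stock_symbol,Sentiment_label,Sentiment_score\n"
    "AAPL,neutral,0.82\n"
    "AAPL,neutral,0.90\n"
    "AAPL,positive,0.89\n"
    "AAPL,negative,0.71\n"
)
shares = label_shares(row["Sentiment_label"] for row in csv.DictReader(sample_csv))
print(shares)  # {'neutral': 0.5, 'positive': 0.25, 'negative': 0.25}
```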