This notebook performs TABSA (Targeted Aspect-Based Sentiment Analysis) on news data. To learn how to obtain the news data, see the **raw_news_sentiment.ipynb** notebook.

It uses the PyABSA library. PyABSA is prone to dependency issues; this notebook covers the necessary installations, but depending on your environment you may still need to resolve dependency and version errors yourself.

You also need a list of the terms (aspects) you are looking for in the data for the analysis to work properly.

[PyABSA GitHub](https://github.com/yangheng95/PyABSA)


In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import news_signals
import matplotlib.pyplot as plt 

This cell uses the yfinance library to get financial data, then classifies the market trend by checking whether the price has moved by more than a given percentage over a given time period.

It is currently set up for TSLA (Tesla) stock for the year 2023, with a 3-day rolling window and a 3% threshold. That is, it downloads Tesla's price data for all of 2023, slides a 3-day window over it, and labels each window *+1 for an upward trend, 0 for neutral (within the threshold), and -1 for a downward trend*. A window gets +1 or -1 if the price moved up or down by more than 3% from the first day's open to the last day's close and the move also exceeds the standard deviation of the window's daily returns; otherwise it gets a neutral 0.


Set **ticker** to the stock you want data for (e.g. TSLA, AAPL, JPM), **start_date** and **end_date** to the overall time period of the financial data, **window_size** to the number of days in the rolling window, and **percent_change** to the threshold expressed as a decimal (e.g. 0.03 for 3%).
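
The classification rule described above can be tried on toy numbers before running it on real prices. The following is a minimal sketch in plain Python rather than pandas; the function name `classify` and the sample prices are made up for illustration and are not part of the notebook.

```python
from statistics import stdev

def classify(opens, closes, threshold=0.03):
    """Label one window: +1 up, -1 down, 0 neutral (illustrative helper)."""
    # Return from the first day's open to the last day's close
    cumulative = (closes[-1] - opens[0]) / opens[0]
    # Open-to-close return of each day, and its sample standard deviation
    daily = [(c - o) / o for o, c in zip(opens, closes)]
    volatility = stdev(daily)
    if cumulative > threshold and cumulative > volatility:
        return 1
    if cumulative < -threshold and cumulative < -volatility:
        return -1
    return 0

# Hypothetical 3-day windows (prices invented for the example)
up = classify([100.0, 101.0, 103.0], [101.0, 103.0, 105.0])      # steady +5%
down = classify([100.0, 99.0, 97.0], [99.0, 97.0, 95.0])         # steady -5%
flat = classify([100.0, 100.5, 100.2], [100.5, 100.2, 101.0])    # only +1%
choppy = classify([100.0, 110.0, 95.0], [110.0, 95.0, 104.0])    # +4% but very volatile
print(up, down, flat, choppy)
```

The last window ends 4% higher, which is above the threshold, but its daily returns swing so widely that the move does not exceed the volatility, so it is still labelled neutral.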

In [None]:
ticker = "TSLA"              # Change ticker if needed
start_date = "2023-01-01"    # Start date for historical data
end_date = "2023-12-31"      # End date for historical data
window_size = 3              # 3-day rolling window
percent_change = 0.03        # Threshold as a decimal (0.03 = 3%)

# Download daily stock data
data = yf.download(ticker, start=start_date, end=end_date)
data.index = pd.to_datetime(data.index)

def classify_window(window, threshold=percent_change):
    """
    Classify one rolling window by its return from first open to last close.
        +1 if cumulative return > threshold and > volatility   (upward trend)
        -1 if cumulative return < -threshold and < -volatility (downward trend)
         0 otherwise (neutral)
    """
    first_open = float(window['Open'].iloc[0])
    last_close = float(window['Close'].iloc[-1])
    cumulative_return = (last_close - first_open) / first_open
    # Volatility is the standard deviation of the daily open-to-close returns
    daily_returns = (window['Close'] - window['Open']) / window['Open']
    volatility = float(daily_returns.std())

    if cumulative_return > threshold and cumulative_return > volatility:
        return 1
    elif cumulative_return < -threshold and cumulative_return < -volatility:
        return -1
    else:
        return 0


# Apply a rolling window to classify the trend for each period
trend_results = []
dates = []
for i in range(window_size - 1, len(data)):
    window = data.iloc[i - window_size + 1 : i + 1]
    trend = classify_window(window)
    trend_results.append(trend)
    dates.append(data.index[i])

# Create a DataFrame with the trend classifications (using the last day of each window as the index)
rolling_trend_df = pd.DataFrame({'Trend': trend_results}, index=dates)
print(rolling_trend_df)

Run this cell to check the distribution of the trend classes.

In [None]:
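# Reindex to a fixed class order so each bar colour always matches its class
class_distribution = (
    rolling_trend_df['Trend'].value_counts(normalize=True).reindex([-1, 0, 1], fill_value=0) * 100
)
print("Class Distribution (Percentage):")
print(class_distribution)

# Plot the class distribution as percentages (red = -1, blue = 0, green = +1)
class_distribution.plot(kind='bar', color=['red', 'blue', 'green'])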
plt.title('Class Distribution of Trends (Percentage)')
plt.xlabel('Trend')
plt.ylabel('Percentage')
plt.xticks(rotation=0)
plt.show()

Install the required packages. Depending on your setup, you may need to resolve additional dependency conflicts.

In [None]:
!pip install pyabsa

In [None]:
!pip install tf-keras

In [None]:
!pip install transformers==4.29.2
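
Since version conflicts are the most likely source of trouble here, it can help to check what actually got installed before loading the model. This is a small sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Packages installed by the cells above
for name in ("pyabsa", "tf-keras", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```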

Here we perform TABSA on our dataset.

**target_aspects**: the list of terms whose sentiment we check for. Feel free to modify it for your use case; if you do, update the matching **ASPECTS** list in Part 4 of the cell as well.

Sentiment mapping:

**Positive -> 1**

**Neutral -> 0**

**Negative -> -1**

We then attach the averaged sentiment score of each term to our financial trend data.
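
To make the mapping concrete, here is a minimal sketch of how labels turn into per-aspect scores when a term is mentioned more than once. The `(aspect, label)` pairs are hand-written for illustration and are not real PyABSA output; the helper name `average_aspect_scores` is also made up.

```python
LABEL_TO_SCORE = {"positive": 1, "neutral": 0, "negative": -1}

def average_aspect_scores(predictions, target_aspects):
    """Average the mapped scores of every prediction that mentions a target term."""
    totals = {aspect: 0.0 for aspect in target_aspects}
    counts = {aspect: 0 for aspect in target_aspects}
    for aspect_text, label in predictions:
        score = LABEL_TO_SCORE.get(label.lower(), 0)  # unknown labels count as neutral
        for target in target_aspects:
            if target in aspect_text.lower():          # substring match on the term
                totals[target] += score
                counts[target] += 1
    # Terms that were never mentioned keep a score of 0.0
    return {a: (totals[a] / counts[a] if counts[a] else 0.0) for a in target_aspects}

# Hypothetical predictions for one article
example = average_aspect_scores(
    [
        ("Production", "Positive"),
        ("production ramp", "Neutral"),
        ("earnings", "Negative"),
        ("battery", "Negative"),   # not a target term, so it is ignored
    ],
    ["production", "earnings", "delivery"],
)
print(example)
```

Note that a term that is never mentioned and a term that is mentioned with neutral sentiment both end up at 0, so the score alone cannot distinguish the two cases.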

In [None]:
import os
import pandas as pd
import yfinance as yf
from pyabsa import APCCheckpointManager


# =============================
# Part 1: Load ABSA Model for Analysis
# =============================

# Load the Aspect Polarity Classification model
apc_model = APCCheckpointManager.get_sentiment_classifier(
    checkpoint="English",
    dataset="None"
    )


def extract_aspect_sentiments(text):
    """
    Extract sentiment scores for Tesla-related aspects.
    Sentiment mapping:
    - Positive -> 1
    - Neutral -> 0
    - Negative -> -1
    """
    target_aspects = [
        "production", "delivery", "earnings", "innovation",
        "autopilot", "leadership", "supply chain", "sustainability"
    ]
    aspect_scores = {aspect: 0.0 for aspect in target_aspects}
    counts = {aspect: 0 for aspect in target_aspects}

    # Ensure input is a valid string
    if not isinstance(text, str) or not text.strip():
        return aspect_scores  # Return default zero scores if input is invalid

    try:
        # Run inference
        results = apc_model.predict(text, print_result=False)

        #  Debugging: Print results to inspect the structure
        print(f"DEBUG: PyABSA Output -> {results}")

        # Ensure results is in list format
        if isinstance(results, dict):  
            results = [results]  

        if not isinstance(results, list):
            print(f"Skipping unexpected format: {results}")
            return aspect_scores

        for result in results:
            if isinstance(result, dict):
                aspect_texts = result.get("aspect", [])
                sentiments = result.get("sentiment", [])

                # Ensure both are lists
                if not isinstance(aspect_texts, list):
                    aspect_texts = [aspect_texts]
                if not isinstance(sentiments, list):
                    sentiments = [sentiments]

                for aspect_text, sentiment in zip(aspect_texts, sentiments):
                    # Cast to str first so a non-string value does not raise AttributeError
                    aspect_text = str(aspect_text).lower()
                    sentiment = str(sentiment).lower()

                    score = {"positive": 1, "negative": -1, "neutral": 0}.get(sentiment, 0)

                    for target in target_aspects:
                        if target in aspect_text:
                            aspect_scores[target] += score
                            counts[target] += 1
            else:
                print(f"Skipping unexpected format: {result}")
                continue  

        # Average scores for aspects mentioned multiple times
        for target in target_aspects:
            if counts[target] > 0:
                aspect_scores[target] /= counts[target]

    except RuntimeError as e:
        print(f"RuntimeError in PyABSA: {e}")
        return aspect_scores

    return aspect_scores


# =============================
# Part 2: Process News Data and Extract Sentiments
# =============================

news_df = pd.read_csv("entity_news_processed_azure_reduced.csv")
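# Parse as UTC first so both timezone-aware and naive timestamps work, then drop the timezone
news_df["published_at"] = pd.to_datetime(news_df["published_at"], utc=True).dt.tz_convert(None)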
news_df["Processed_Article"] = news_df["Processed_Article"].fillna("").astype(str)

news_df["Aspect_Sentiments"] = news_df["Processed_Article"].apply(extract_aspect_sentiments)
aspect_scores_df = pd.json_normalize(news_df["Aspect_Sentiments"])
news_df = pd.concat([news_df, aspect_scores_df], axis=1)


# =============================
# Part 3: Prepare Financial Trend Data
# =============================

rolling_trend_df_reset = rolling_trend_df.reset_index().rename(columns={'index': 'Date'})
rolling_trend_df_reset["Date"] = pd.to_datetime(rolling_trend_df_reset["Date"])
rolling_trend_df_reset = rolling_trend_df_reset.sort_values("Date")


# =============================
# Part 4: Attach Sentiments to Financial Data
# =============================

FINANCE_START_DATE = pd.to_datetime(start_date)
ATTACHED_ASPECT_FEATURES = []
ASPECTS = [
    "production", "delivery", "earnings", "innovation",
    "autopilot", "leadership", "supply chain", "sustainability"
]

prev_date = FINANCE_START_DATE

for current_date in rolling_trend_df_reset["Date"]:
    mask = (news_df["published_at"] >= prev_date) & (news_df["published_at"] < current_date)
    window_news = news_df[mask]

    if not window_news.empty:
        avg_scores = window_news[ASPECTS].mean().to_dict()
    else:
        avg_scores = {aspect: 0.0 for aspect in ASPECTS}

    ATTACHED_ASPECT_FEATURES.append(avg_scores)
    prev_date = current_date

aspect_features_df = pd.DataFrame(ATTACHED_ASPECT_FEATURES)
final_financial_df = pd.concat([rolling_trend_df_reset.reset_index(drop=True), aspect_features_df], axis=1)


# =============================
# Part 5: Save Final Data
# =============================

OUTPUT_FILENAME = "financial_data_with_aspect_sentiments.csv"
final_financial_df.to_csv(OUTPUT_FILENAME, index=False)

print(f"Saved updated financial data with aspect sentiment features to {OUTPUT_FILENAME}")
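
The attachment step in Part 4 assigns each article to the half-open interval between the previous trend date and the current one, then averages the scores in that interval. The sketch below reproduces that logic in plain Python for a single aspect; the function name `attach_window_averages`, the dates, and the scores are invented for illustration, and it assumes the trend dates are midnight timestamps.

```python
from datetime import datetime

def attach_window_averages(trend_dates, articles, start):
    """For each trend date, average the scores of articles in [previous date, current date)."""
    features = []
    prev = start
    for current in trend_dates:
        in_window = [score for published, score in articles if prev <= published < current]
        # Windows without any news fall back to a neutral 0.0
        features.append(sum(in_window) / len(in_window) if in_window else 0.0)
        prev = current
    return features

# Hypothetical (published_at, score) pairs for one aspect
articles = [
    (datetime(2023, 1, 2, 9), 1.0),
    (datetime(2023, 1, 3, 15), 0.0),
    (datetime(2023, 1, 6, 8), -1.0),
]
trend_dates = [datetime(2023, 1, 5), datetime(2023, 1, 6), datetime(2023, 1, 9)]
features = attach_window_averages(trend_dates, articles, datetime(2023, 1, 1))
print(features)
```

Under this scheme, an article published during a trend date (here 6 January at 08:00) is counted toward the next trend date's features, not that day's.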


