<a href="https://colab.research.google.com/github/TEBIAN/Text-Trend-Forecasting-using-BERT/blob/main/Text_Trend_Forecasting_using_BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📈 Text Trend Forecasting from Social Media Sentiment

### 🧠 Project Overview

This project uses social media data (Twitter) to perform sentiment analysis and forecast emerging trends in public opinion over time. By combining BERT-based sentiment analysis with transformer-based time series forecasting, it helps predict how people feel about key topics, such as legal reforms, AI policy, or social issues.

### 🔧 Key Technologies


*   Python
*   Tweepy (Twitter API access)
*   Transformers (Hugging Face BERT)
*   Darts (Time series forecasting)
*   Pandas, Matplotlib, NumPy


### **🧪 Pipeline Steps**

### 1. Data Collection

*   Collect tweets using Twitter’s API (tweepy)
*   Filter by hashtag or keyword (e.g., #AI, contract law)
*   Store created_at and text fields
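
The keyword filter can also exclude retweets and restrict the language, which cuts down on duplicate and mixed-language text. A minimal sketch, assuming the Twitter API v2 search operators `-is:retweet` and `lang:`; the helper name `build_query` is hypothetical and not part of tweepy:

```python
def build_query(terms, lang="en", exclude_retweets=True):
    """Combine search terms into one query string (hypothetical helper)."""
    # Quote multi-word phrases so they match exactly, then OR the terms together.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    parts = ["(" + " OR ".join(quoted) + ")"]
    if exclude_retweets:
        parts.append("-is:retweet")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)

query = build_query(["#AI", "contract law"])
print(query)
```

The resulting string can be passed as the `query` argument of `client.search_recent_tweets`.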





In [None]:
# Install required libraries
!pip install tweepy

In [2]:
# Import libraries
import tweepy
import pandas as pd

In [3]:
# Read the Twitter API bearer token from an environment variable instead of
# hardcoding it (the variable name TWITTER_BEARER_TOKEN is an assumption)
import os
bearer_token = os.environ["TWITTER_BEARER_TOKEN"]

# Initialize Twitter API client
client = tweepy.Client(bearer_token=bearer_token)

# Define query and fetch tweets
query = "#AI"
response = client.search_recent_tweets(query=query, max_results=100, tweet_fields=["created_at", "text"])

# Process results (response.data is None when nothing matches)
tweets = []
for tweet in response.data or []:
    tweets.append([tweet.created_at, tweet.text])
df = pd.DataFrame(tweets, columns=["created_at", "text"])


In [7]:
# Display and save the dataset
print(df.head())
df.to_csv("tweets_16_may.csv", index=False)

                 created_at                                               text
0 2025-05-16 07:04:49+00:00  📌Despite the rule’s withdrawal, the US Commerc...
1 2025-05-16 07:04:47+00:00  🎯The #US has scrapped a #Biden-era export cont...
2 2025-05-16 07:04:46+00:00  RT @silencemokutora: 斑点的肉球啊啊啊啊啊\n#Furry #Ai #A...
3 2025-05-16 07:04:46+00:00  @0xBclub @bcgame ⚙️ Builders, this is your mom...
4 2025-05-16 07:04:45+00:00  RT @hantutama: ほうれん草か菜の花の胡麻和え\n\nそれにしてもなんでAIの画...


### 2. Sentiment Analysis (BERT)

*   Use a pretrained BERT sentiment model (e.g., nlptown/bert-base-multilingual-uncased-sentiment)
*   Assign a score from 1 to 5 based on tweet tone



In [10]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

def extract_score(result):
    label = result['label']  # Example: '4 stars'
    return int(label.split()[0])

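# Truncate at the token level; slicing to 512 characters does not guarantee at most 512 tokens
df['sentiment_score'] = df['text'].apply(lambda x: extract_score(classifier(x, truncation=True, max_length=512)[0]))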



In [11]:
df.head()

Unnamed: 0,created_at,text,sentiment_score
0,2025-05-16 07:04:49+00:00,"📌Despite the rule’s withdrawal, the US Commerc...",1
1,2025-05-16 07:04:47+00:00,🎯The #US has scrapped a #Biden-era export cont...,1
2,2025-05-16 07:04:46+00:00,RT @silencemokutora: 斑点的肉球啊啊啊啊啊\n#Furry #Ai #A...,1
3,2025-05-16 07:04:46+00:00,"@0xBclub @bcgame ⚙️ Builders, this is your mom...",5
4,2025-05-16 07:04:45+00:00,RT @hantutama: ほうれん草か菜の花の胡麻和え\n\nそれにしてもなんでAIの画...,2
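
Taking only the top label discards how confident the model is. One alternative is an expected star rating: the probability-weighted mean over all five labels. A minimal sketch, assuming the classifier is called with `top_k=None` so that it returns a score for every label; `expected_stars` is a hypothetical helper and the probabilities below are made up for illustration:

```python
def expected_stars(label_scores):
    """Probability-weighted mean star rating from a list of {'label', 'score'} dicts."""
    total = sum(item["score"] for item in label_scores)
    weighted = sum(int(item["label"].split()[0]) * item["score"] for item in label_scores)
    return weighted / total

# Illustrative distribution over the five star labels (not real model output)
example = [
    {"label": "1 star", "score": 0.05},
    {"label": "2 stars", "score": 0.10},
    {"label": "3 stars", "score": 0.20},
    {"label": "4 stars", "score": 0.40},
    {"label": "5 stars", "score": 0.25},
]

print(f"Expected rating: {expected_stars(example):.2f}")
```

A continuous score like this may give a smoother time series than integer labels, at the cost of being harder to interpret as a single class.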


### 3. Time Aggregation

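*   Average sentiment per time bucket (per minute in this sample, which spans only a few minutes; per hour or day for longer collections)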
*   Create a time series of sentiment over time

In [32]:
# Extract the minute and second from each timestamp
df['created_at'] = pd.to_datetime(df['created_at'])
df['mins'] = df['created_at'].dt.minute
df['seconds'] = df['created_at'].dt.second

In [35]:
df.head()

Unnamed: 0,created_at,text,sentiment_score,mins,seconds
0,2025-05-16 07:04:49+00:00,"📌Despite the rule’s withdrawal, the US Commerc...",1,4,49
1,2025-05-16 07:04:47+00:00,🎯The #US has scrapped a #Biden-era export cont...,1,4,47
2,2025-05-16 07:04:46+00:00,RT @silencemokutora: 斑点的肉球啊啊啊啊啊\n#Furry #Ai #A...,1,4,46
3,2025-05-16 07:04:46+00:00,"@0xBclub @bcgame ⚙️ Builders, this is your mom...",5,4,46
4,2025-05-16 07:04:45+00:00,RT @hantutama: ほうれん草か菜の花の胡麻和え\n\nそれにしてもなんでAIの画...,2,4,45


In [36]:
# Average sentiment per minute (despite its name, df_daily holds per-minute means)
df_daily = df.groupby('mins')['sentiment_score'].mean().reset_index()
df_daily.columns = ['mins', 'avg_sentiment']
df_daily.head()


Unnamed: 0,mins,avg_sentiment
0,1,3.636364
1,2,3.285714
2,3,3.441176
3,4,3.15
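
Grouping on the minute number alone merges tweets from the same minute of different hours once a collection spans more than an hour. A sketch of an alternative that floors each timestamp to the start of its bucket, assuming `created_at` has been parsed as datetimes; the sample rows and the `bucket` column name are invented for illustration:

```python
import pandas as pd

# Three made-up tweets: two in 07:04 and one in 08:04
sample = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2025-05-16 07:04:49+00:00",
        "2025-05-16 07:04:10+00:00",
        "2025-05-16 08:04:30+00:00",
    ]),
    "sentiment_score": [1, 5, 3],
})

# Floor to the start of each minute; pass "D" instead for daily buckets.
sample["bucket"] = sample["created_at"].dt.floor("min")
avg = (
    sample.groupby("bucket")["sentiment_score"]
    .mean()
    .reset_index(name="avg_sentiment")
)
print(avg)
```

Here the 07:04 and 08:04 tweets stay in separate buckets, whereas grouping on `.minute` would have averaged all three together.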


In [37]:
import numpy as np

def create_sequences(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
    return np.array(X), np.array(y)

window = 7
data = df_daily['avg_sentiment'].values
X, y = create_sequences(data, window)
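
To make the windowing concrete, here is the same function applied to a five-point toy series with a window of 2. A series no longer than the window yields no training pairs, which is what happens with the four-minute sample above and `window = 7`. The toy values are illustrative only:

```python
import numpy as np

def create_sequences(data, window_size):
    """Slide a fixed-size window over data; each window predicts the next value."""
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size])
        y.append(data[i + window_size])
    return np.array(X), np.array(y)

toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_toy, y_toy = create_sequences(toy, window_size=2)
print(X_toy.shape, y_toy)

# Four points with a window of 7 leave nothing to train on
X_short, y_short = create_sequences(toy[:4], window_size=7)
print(len(X_short))
```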


### 4. Forecasting

*   Use a transformer-based time series model (via darts)
*   Train on historical sentiment to forecast future values
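
A naive baseline is a useful reference point for any learned forecaster. A minimal sketch, not part of the original pipeline, that predicts the next value as the mean of the last `window` observations; `moving_average_forecast` is a hypothetical helper and the input values are made up:

```python
def moving_average_forecast(values, window):
    """Predict the next value as the mean of the most recent `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    recent = values[-window:]
    return sum(recent) / len(recent)

history = [3.0, 3.5, 4.0, 2.5]  # illustrative per-bucket averages
baseline = moving_average_forecast(history, window=2)
print(f"Baseline forecast: {baseline:.2f}")
```

If a trained model cannot beat this baseline on held-out data, the series is probably too short or too noisy for the model to add value.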

In [38]:
!pip install darts

from darts import TimeSeries
from darts.models import TransformerModel
# Scaler replaces ScalerWrapper, which current darts releases no longer export
from darts.dataprocessing.transformers import Scaler

# Build a series indexed by minute; training needs at least
# input_chunk_length + output_chunk_length points
series = TimeSeries.from_dataframe(df_daily, time_col="mins", value_cols="avg_sentiment")
scaler = Scaler()
series_scaled = scaler.fit_transform(series)

model = TransformerModel(
    input_chunk_length=window,
    output_chunk_length=1,
    n_epochs=100,
    model_name="bert-sentiment-transformer",
    random_state=42,
)

model.fit(series_scaled)
forecast = model.predict(n=1)
forecast = scaler.inverse_transform(forecast)

print(f"Forecasted sentiment: {forecast.values()[0][0]:.2f}")

### 5. Visualization

*   Plot real vs. forecasted sentiment
*   Highlight trend changes or anomalies

In [None]:
# Plot the unscaled history so it shares a scale with the inverse-transformed forecast
series.plot(label="Historical")
forecast.plot(label="Forecast")
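
One simple way to highlight anomalies is to flag points that deviate strongly from the series mean. A sketch using a z-score threshold, which is an assumption here rather than a method used in this notebook; the threshold of 2.0 and the sample values are illustrative:

```python
from statistics import mean, pstdev

def flag_anomalies(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

scores = [3.0, 3.1, 2.9, 3.0, 3.2, 3.0, 1.0, 3.1]  # made-up averages
print(flag_anomalies(scores))
```

The returned indices can then be marked on the plot, for example with a scatter overlay at those positions.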