We will fine-tune a BERT model to score the sentiment of stock-related news articles, feed historical stock price data into an LSTM, use an embedding layer for stock tickers, and finally concatenate the three outputs to make a prediction.

1. Preprocessing News Data for BERT: Use the processed_text field (preprocessed news) and fine-tune a BERT model to classify sentiment (positive/negative) or output a sentiment score.

2. Preprocessing Stock Data for LSTM: Use stock_data (e.g., stock prices, volume) to prepare time-series data for the LSTM.

3. Stock Ticker Embedding: Use an embedding layer to convert stock tickers (ticker_name, e.g., WCIL.NS) into dense vectors.

4. Combine Models: Concatenate the outputs of the BERT model (news sentiment) and the LSTM model (price movement) with the ticker embeddings, and pass the result through fully connected layers for the final prediction.
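Each training example has to carry all three inputs for the same news item, or the branches will learn from mismatched data. A minimal sketch of one aligned record, assuming a hypothetical `NewsPriceSample` dataclass whose field names (`price_window`, `label`) and placeholder values are illustrative and not part of the original dataset:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NewsPriceSample:
    """One aligned example for the combined model (hypothetical structure)."""
    processed_text: str              # cleaned news text -> BERT branch
    price_window: List[List[float]]  # lookback_window x n_features -> LSTM branch
    ticker_name: str                 # e.g. "WCIL.NS" -> embedding branch
    label: int                       # hypothetical target: 1 = price up, 0 = down

    def validate(self, lookback_window: int, n_features: int) -> None:
        # The LSTM window needs a fixed shape so samples can be stacked into a batch.
        if len(self.price_window) != lookback_window:
            raise ValueError("price_window has the wrong number of time steps")
        if any(len(row) != n_features for row in self.price_window):
            raise ValueError("price_window rows have the wrong number of features")
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")


sample = NewsPriceSample(
    processed_text="western carrier stock zoom securing r crore contract vedanta",
    price_window=[[0.1, 0.2, 0.3, 0.4, 0.5] for _ in range(3)],  # placeholder values
    ticker_name="WCIL.NS",
    label=1,
)
sample.validate(lookback_window=3, n_features=5)
```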

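1. Fine-tuning a BERT Model on News Text
We'll first use a BERT model to process the processed_text field (news articles) and output a sentiment score. For simplicity, sentiment is treated as binary (positive/negative), with labels derived from the finbert_analysis score that is already provided. The cell below starts from the FinBERT checkpoint yiyanghkust/finbert-tone, which is already fine-tuned to predict three tone classes (neutral, positive, negative), and only runs inference with it.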
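Turning the provided finbert_analysis score into binary training labels only needs a threshold. A minimal sketch, assuming (the dataset description does not say) that finbert_analysis is a signed score where values above 0 mean positive news; the `sentiment_label` helper and the scores are illustrative:

```python
def sentiment_label(finbert_analysis: float, threshold: float = 0.0) -> int:
    """Map a precomputed sentiment score to a binary label (1 = positive, 0 = negative).

    Assumes higher scores mean more positive news and that `threshold`
    separates the two classes; neither is confirmed by the dataset.
    """
    return 1 if finbert_analysis > threshold else 0


scores = [0.82, -0.40, 0.05, -0.91]  # illustrative values, not real data
labels = [sentiment_label(s) for s in scores]
print(labels)  # [1, 0, 1, 0]
```

Scores exactly at the threshold fall into the negative class here; a neutral band (for example, dropping scores near 0) is another reasonable choice.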

In [None]:
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load the pre-trained BERT model and tokenizer
model_name = 'yiyanghkust/finbert-tone'  # FinBERT already fine-tuned for financial tone (neutral/positive/negative)
tokenizer = BertTokenizer.from_pretrained(model_name)
model = TFBertForSequenceClassification.from_pretrained(model_name)

# Tokenizing the processed text (e.g., stock-related news)
text = "western carrier stock zoom securing r crore contract vedanta four yearlong agreement..."

inputs = tokenizer(text, return_tensors="tf", truncation=True, padding=True, max_length=512)

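# Perform inference only (no weights are updated here)
outputs = model(inputs, training=False)
logits = outputs.logits
probs = tf.nn.softmax(logits, axis=-1).numpy()[0]  # class probabilities for the single input text
predicted_class = int(tf.argmax(logits, axis=-1).numpy()[0])

print(f"Predicted Sentiment Class: {predicted_class} ({model.config.id2label[predicted_class]})")
print(f"Class probabilities: {probs}")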
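FinBERT returns one logit per class, while step 1 asks for a single sentiment score. One option (our choice, not something the checkpoint provides) is P(positive) - P(negative) after a softmax, which lies in [-1, 1]. The label order below is an assumption for yiyanghkust/finbert-tone; confirm it with `model.config.id2label`:

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def signed_sentiment(logits: Sequence[float], id2label: Dict[int, str]) -> float:
    """Return P(positive) - P(negative), a score in [-1, 1]."""
    probs = softmax(logits)
    by_label = {id2label[i].lower(): p for i, p in enumerate(probs)}
    return by_label["positive"] - by_label["negative"]


# Assumed label order for yiyanghkust/finbert-tone; verify with model.config.id2label.
ID2LABEL = {0: "Neutral", 1: "Positive", 2: "Negative"}
score = signed_sentiment([0.1, 2.3, -1.2], ID2LABEL)  # illustrative logits
print(round(score, 2))  # 0.85
```

With the cell above, this would be `signed_sentiment(logits.numpy()[0].tolist(), model.config.id2label)`, provided the config uses readable label names rather than generic `LABEL_i` ones.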



2. LSTM Model for Stock Price Prediction
Now, we'll feed the stock price data (stock_data fields such as avg_price_5d, volatility_5d, and avg_volume_5d) into an LSTM to capture temporal dependencies.

2.1 Prepare the Stock Data for LSTM
The LSTM expects input shaped (samples, time steps, features), so the historical stock data has to be organized into fixed-length sliding windows. Here's an example:

In [2]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Example stock data for the past 5 days (just for demonstration)
stock_data = {
    'avg_price_5d': [114.92, 115.99, 118.50, 120.45, 121.44],
    'volatility_5d': [0.77, 0.80, 0.75, 0.74, 0.77],
    'avg_volume_5d': [263441.75, 250000.45, 245000.50, 255000.33, 2463672.0],
    'open_price_news_day': [115.99, 116.20, 117.00, 118.10, 120.00],
    'close_price_news_day': [121.44, 119.50, 118.80, 121.00, 122.50]
}

# Convert into pandas DataFrame
df = pd.DataFrame(stock_data)

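# Normalize the stock data using MinMaxScaler.
# Note: fitting on the full dataset leaks test-period statistics into training;
# with real data, fit the scaler on the training rows only.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(df.values)

# Reshape the data into sliding windows for LSTM input.
# Each window of `lookback_window` rows predicts the next row, so there are
# len(scaled_data) - lookback_window windows; with only 5 demo rows, a lookback
# of 5 would produce no samples at all, so 3 is used here.
lookback_window = 3
target_col = df.columns.get_loc('close_price_news_day')  # index 4: the close price

X = []
y = []

for i in range(len(scaled_data) - lookback_window):
    X.append(scaled_data[i:i+lookback_window])
    y.append(scaled_data[i+lookback_window][target_col])  # next row's close price is the target

X = np.array(X)  # shape: (samples, lookback_window, n_features)
y = np.array(y)

# Split chronologically (no shuffling) into training and testing sets
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]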
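The windowing arithmetic explains why the number of rows must exceed the lookback: a series of n rows yields n - lookback windows. The sketch below generalizes the loop above into a hypothetical `make_windows` helper and fits the min/max scaling on the training rows only, so test-period prices do not leak into training. It uses a synthetic 10-day, 4-feature array rather than real data:

```python
import numpy as np


def make_windows(data: np.ndarray, lookback: int, target_col: int):
    """Split a (time, features) array into LSTM samples.

    X[i] holds rows i .. i+lookback-1 and y[i] is `target_col` on row i+lookback,
    so a series of n rows yields n - lookback samples.
    """
    n_samples = len(data) - lookback
    if n_samples <= 0:
        raise ValueError(f"need more than {lookback} rows, got {len(data)}")
    X = np.stack([data[i:i + lookback] for i in range(n_samples)])
    y = data[lookback:, target_col]
    return X, y


def minmax_scale_train_only(train: np.ndarray, test: np.ndarray):
    """Min-max scale both splits using statistics from the training rows only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (train - lo) / span, (test - lo) / span


raw = np.arange(40, dtype=float).reshape(10, 4)  # synthetic: 10 days x 4 features
split = 7                                          # first 7 days are the training period
train_scaled, test_scaled = minmax_scale_train_only(raw[:split], raw[split:])
combined = np.vstack([train_scaled, test_scaled])
X, y = make_windows(combined, lookback=3, target_col=3)
print(X.shape, y.shape)  # (7, 3, 4) (7,)
```

Test-period values may land outside [0, 1] under this scheme; that is expected, since the scaler never saw them.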


2.2 Define the LSTM Model

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Define the LSTM model for stock price prediction
lstm_model = Sequential()
lstm_model.add(tf.keras.Input(shape=(X_train.shape[1], X_train.shape[2])))  # (time steps, features)
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50, return_sequences=False))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(units=1))

lstm_model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
lstm_model.fit(X_train, y_train, epochs=10, batch_size=32)
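The LSTM is trained on MinMax-scaled targets, so its predictions come out in scaled units. Inverting `MinMaxScaler(feature_range=(0, 1))` for one column is x = x_scaled * (max - min) + min. A sketch with a hypothetical `inverse_minmax` helper, using the min (118.80) and max (122.50) of close_price_news_day from the demo data:

```python
def inverse_minmax(value_scaled: float, col_min: float, col_max: float) -> float:
    """Undo MinMaxScaler(feature_range=(0, 1)) for a single column."""
    return value_scaled * (col_max - col_min) + col_min


# Bounds of close_price_news_day in the demo data from section 2.1
close_min, close_max = 118.80, 122.50
print(round(inverse_minmax(0.5, close_min, close_max), 2))  # 120.65
```

With the fitted scaler from section 2.1, the bounds are available as `scaler.data_min_[col]` and `scaler.data_max_[col]`, where `col` is the target column's index (4 for close_price_news_day in `df`).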


3. Embedding Layers for Stock Tickers
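Next, use an embedding layer to process stock tickers (e.g., WCIL.NS). Each ticker is first mapped to an integer ID, and the embedding layer looks up a trainable dense vector for that ID, turning the categorical ticker into a continuous representation.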
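An embedding layer is essentially a trainable lookup table: row i of an (input_dim, output_dim) weight matrix is the vector for ticker ID i. A NumPy sketch of that lookup, with a random matrix standing in for the learned weights and illustrative ticker IDs 17 and 42:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tickers, embedding_dim = 9000, 50
# In Keras this matrix is the Embedding layer's trainable weight; here it is random.
embedding_matrix = rng.normal(size=(num_tickers, embedding_dim)).astype(np.float32)

ticker_ids = np.array([[17], [42]])                 # shape (batch, 1): integer ticker IDs
looked_up = embedding_matrix[ticker_ids]            # shape (batch, 1, 50), like Embedding's output
flattened = looked_up.reshape(len(ticker_ids), -1)  # shape (batch, 50), like Flatten
print(flattened.shape)  # (2, 50)
```

Training then adjusts only the rows for tickers that actually appear in a batch, which is why an embedding scales to thousands of tickers.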

3.1 Define the Embedding Layer for Stock Ticker

In [None]:
from tensorflow.keras.layers import Embedding, Input, Flatten

# Example: Use stock tickers as an input and create embeddings for them.
# The input holds one integer ticker ID per example (ticker strings must be mapped to IDs first).
ticker_input = Input(shape=(1,), dtype=tf.int32, name="ticker")
embedding = Embedding(input_dim=9000, output_dim=50)(ticker_input)  # IDs 0-8999 (up to 9,000 tickers), 50-dim vectors
ticker_embedding = Flatten()(embedding)  # (batch, 1, 50) -> (batch, 50)
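The Embedding layer only accepts integer IDs, so ticker strings such as WCIL.NS need to be mapped to indices first. A minimal sketch, assuming hypothetical `build_ticker_vocab` and `encode_tickers` helpers that reserve ID 0 for tickers not seen during training (AAA.NS, BBB.NS, and ZZZ.NS are made-up placeholders):

```python
from typing import Dict, Iterable, List


def build_ticker_vocab(tickers: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct ticker an integer ID, reserving 0 for unknown tickers."""
    vocab: Dict[str, int] = {}
    for t in tickers:
        if t not in vocab:
            vocab[t] = len(vocab) + 1  # IDs start at 1; 0 means "unknown"
    return vocab


def encode_tickers(tickers: Iterable[str], vocab: Dict[str, int]) -> List[List[int]]:
    # Shape (batch, 1) to match Input(shape=(1,)); unseen tickers map to 0.
    return [[vocab.get(t, 0)] for t in tickers]


vocab = build_ticker_vocab(["WCIL.NS", "AAA.NS", "WCIL.NS", "BBB.NS"])
ids = encode_tickers(["BBB.NS", "WCIL.NS", "ZZZ.NS"], vocab)
print(ids)  # [[3], [1], [0]]
```

Whatever scheme is used, `input_dim` must be at least the largest ID plus one (here `len(vocab) + 1`); Keras also provides a `StringLookup` layer that can perform this mapping inside the model.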



4. Final Model: Combining BERT, LSTM, and Ticker Embeddings
Finally, concatenate the outputs from BERT (news sentiment), the LSTM (stock price movement), and the ticker embeddings, then pass them through fully connected layers for the final prediction.
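Keras `concatenate` joins tensors along the last axis by default, so with a 1-unit BERT head, a 1-unit LSTM output, and a 50-dimensional ticker embedding, the dense head receives 52 features per example. The same shape bookkeeping in NumPy, with zero arrays as placeholders for the real branch outputs:

```python
import numpy as np

batch = 4
bert_out = np.zeros((batch, 1))     # sigmoid-squashed sentiment per article
lstm_out = np.zeros((batch, 1))     # predicted (scaled) close price
ticker_out = np.zeros((batch, 50))  # flattened ticker embedding

# All three branches must cover the same examples, so the batch sizes must match.
merged = np.concatenate([bert_out, lstm_out, ticker_out], axis=-1)
print(merged.shape)  # (4, 52)
```

Because the ticker embedding contributes 50 of the 52 features, it can dominate the first dense layer; widening the BERT and LSTM outputs is one way to rebalance.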

In [None]:
from tensorflow.keras.layers import concatenate

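# Define the inputs for BERT, LSTM, and stock ticker embedding
bert_input = Input(shape=(512,), dtype=tf.int32, name="bert_input")  # token IDs padded to exactly 512
lstm_input = Input(shape=(X_train.shape[1], X_train.shape[2]), name="lstm_input")  # For LSTM time-series data
ticker_input = Input(shape=(1,), dtype=tf.int32, name="ticker_input")  # For Stock Ticker embedding

# BERT branch: the pretrained FinBERT checkpoint from section 1 (its weights stay trainable)
bert_model = TFBertForSequenceClassification.from_pretrained(model_name)
# Derive the attention mask from the IDs so padding tokens are ignored
bert_mask = tf.cast(tf.not_equal(bert_input, tokenizer.pad_token_id), tf.int32)
bert_output = bert_model(bert_input, attention_mask=bert_mask).logits
bert_output = tf.keras.layers.Dense(1, activation="sigmoid")(bert_output)

# LSTM branch: the model defined and trained in section 2.2
lstm_output = lstm_model(lstm_input)

# Ticker branch: build the embedding on *this* ticker_input. Reusing ticker_embedding
# from section 3.1 would tie the graph to a different Input tensor, and building
# final_model would fail because the graph is disconnected.
ticker_embedding_output = Flatten()(Embedding(input_dim=9000, output_dim=50)(ticker_input))

# Concatenate the outputs of all branches
merged = concatenate([bert_output, lstm_output, ticker_embedding_output])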

# Fully connected layers for final prediction
fc = Dense(128, activation="relu")(merged)
fc = Dropout(0.5)(fc)
final_output = Dense(1, activation="sigmoid")(fc)  # Binary classification (up/down)

# Final Model
final_model = tf.keras.Model(inputs=[bert_input, lstm_input, ticker_input], outputs=final_output)

# Compile the model
final_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Model summary
final_model.summary()
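Because `bert_input` is declared with `shape=(512,)`, every example must be padded or truncated to exactly 512 token IDs. The tokenizer does this when called with `padding="max_length", truncation=True, max_length=512`. The hypothetical `pad_or_truncate` helper below shows what that produces, assuming a pad ID of 0 (check `tokenizer.pad_token_id`) and illustrative token IDs; unlike the real tokenizer, it does not keep the final `[SEP]` token when truncating:

```python
from typing import List, Tuple


def pad_or_truncate(token_ids: List[int], max_length: int, pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Force a token-ID sequence to exactly `max_length` and build its attention mask."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids)            # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


padded_ids, mask = pad_or_truncate([101, 2054, 102], max_length=6)
print(padded_ids, mask)  # [101, 2054, 102, 0, 0, 0] [1, 1, 1, 0, 0, 0]
```

Short news snippets end up mostly padding at length 512, which is why masking the pad positions matters.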


5. Train the Final Model
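Now you can train the final model on the combined data. All three inputs must be aligned row by row (the same news article, price window, and ticker for each example). Because the output layer is a sigmoid trained with binary cross-entropy, the targets must be 0/1 up/down labels rather than the scaled prices in y_train from section 2.1:

In [None]:
# Assume X_bert_input, X_lstm_input, and X_ticker_input are your aligned inputs,
# and y_direction (a hypothetical name) holds the matching 0/1 up/down labels
final_model.fit([X_bert_input, X_lstm_input, X_ticker_input], y_direction, epochs=10, batch_size=32)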
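One possible way to build the 0/1 targets, assuming "up" means the next day's close is strictly higher than the current day's (a definition the notebook does not spell out), with a hypothetical `direction_labels` helper applied to the close_price_news_day values from section 2.1:

```python
from typing import List, Sequence


def direction_labels(close_prices: Sequence[float]) -> List[int]:
    """Label day t as 1 if the next day's close is higher than day t's, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(close_prices, close_prices[1:])]


closes = [121.44, 119.50, 118.80, 121.00, 122.50]  # close_price_news_day from section 2.1
print(direction_labels(closes))  # [0, 0, 1, 1]
```

The last day has no next close, so n prices give n - 1 labels; the news, price-window, and ticker inputs must be trimmed to the same rows.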

Summary of the Model Architecture
BERT: A FinBERT model scores the sentiment of stock-related news (processed_text).
LSTM: Processes sliding windows of historical stock features to capture temporal relationships.
Stock Ticker Embedding: Learns a dense vector per ticker, a compact alternative to one-hot encoding thousands of tickers.
Fully Connected Layers: Combine the outputs from BERT, the LSTM, and the ticker embeddings to predict whether the price moves up or down.
This model combines news text and time-series stock data in a single network for stock-movement prediction.