**Sentiment Analysis and Stock Market Prediction**

This project analyzes the sentiment of news articles and social media posts related to specific companies and uses this sentiment data to predict their stock prices.

**Importing and Installing Libraries**

In [1]:
#importing necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer


Installing the Plotly library for interactive visualization.

In [2]:
# Install Plotly for interactive charts

!pip install plotly

# Plotly imports

import plotly.graph_objs as go
from plotly.subplots import make_subplots

**Data Collection**

*   Stock Data: Fetches historical stock price data for HCL Technologies (ticker `HCLTECH.NS`) from Yahoo Finance, covering January 1, 2020, through July 17, 2024 (the `end` date of July 18 is excluded by `yf.download`).



In [3]:
# Download HCL Technologies (HCLTECH.NS) stock price data

stock_data = yf.download('HCLTECH.NS', start='2020-01-01', end='2024-07-18')


[*********************100%%**********************]  1 of 1 completed
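
Depending on the installed yfinance version, `yf.download` may return two-level columns (price field, ticker) even for a single ticker, in which case `stock_data['Close']` is a DataFrame rather than a Series. The sketch below shows one way to normalize the columns before the later cells run. `flatten_price_columns` is a hypothetical helper, not part of yfinance, and the sample frame is made-up data used only to show the effect.

```python
import pandas as pd

def flatten_price_columns(frame):
    """Return a frame with single-level columns, keeping the first level (e.g. 'Close')."""
    if isinstance(frame.columns, pd.MultiIndex):
        frame = frame.copy()
        frame.columns = frame.columns.get_level_values(0)
    return frame

# Made-up two-row frame that mimics two-level (field, ticker) columns.
sample = pd.DataFrame(
    [[100.0, 1000], [101.5, 1200]],
    columns=pd.MultiIndex.from_tuples(
        [("Close", "HCLTECH.NS"), ("Volume", "HCLTECH.NS")]
    ),
)
flat = flatten_price_columns(sample)
print(list(flat.columns))  # ['Close', 'Volume']
```

In this notebook the helper would be applied as `stock_data = flatten_price_columns(stock_data)` right after the download. Frames that already have single-level columns pass through unchanged.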


**HCLTECH Data from News Articles & Tweets for Sentiment Analysis**



Text Data: Contains example text data, such as news articles or tweets related to the stock.

In [4]:
# Example text data (news headlines, tweets)

text_data = ["HCLTech Recognized as HPE Hybrid Cloud Partner of the Year 2024",
             "HCLTech and HPE have been at the forefront of co-creating and driving innovative solutions for decades",
             "HCLTech and HPE have also forged a strong relationship, emphasizing innovation and collaboration, to deliver advanced AI-led solutions that meet evolving business needs."]

**Text Preprocessing**

*   NLTK Downloads: Downloads NLTK resources needed for text processing.
*   Text Preprocessing: Cleans the text data by tokenizing, removing stopwords, and lemmatizing the words.



In [5]:
#Preprocess text data
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Tokenize, drop English stopwords (case-insensitively), and lemmatize
    words = word_tokenize(text)
    words = [w for w in words if w.lower() not in stop_words]
    words = [lemmatizer.lemmatize(w) for w in words]
    return ' '.join(words)

text_data = [preprocess_text(text) for text in text_data]

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


**Sentiment Analysis**

*   nltk (for sentiment analysis)
*   Sentiment Analysis: Uses NLTK's VADER sentiment analysis tool to compute sentiment scores for the text data. The compound score is a single normalized value between -1 (most negative) and +1 (most positive) that summarizes the overall sentiment of a text.



In [6]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

#Sentiment analysis using NLTK's VADER

nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()

def analyze_sentiment(text):
    return sid.polarity_scores(text)['compound']

sentiments = [analyze_sentiment(text) for text in text_data]

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
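
To make the compound score easier to read, it can be mapped to a coarse label. The sketch below uses the cutoffs of ±0.05 that are conventionally suggested for VADER. `label_compound` is a hypothetical helper, and the scores passed to it are made-up values, not results from this notebook.

```python
def label_compound(score, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Made-up scores for illustration; real values would come from
# sid.polarity_scores(text)['compound'].
for score in (0.62, 0.0, -0.31):
    print(score, label_compound(score))
```

Because VADER's rules draw on negations, punctuation, and capitalization, and NLTK's English stopword list includes words such as "not", it may be worth comparing scores on the raw text with scores on the preprocessed text.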


**Combining Sentiment Scores with Stock Data**

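*   Sentiment Integration: Pads the list of sentiment scores with a neutral default of 0 so that its length matches the number of rows in the stock data, then adds it as a `Sentiment` column. Because the example texts carry no dates, the three scores land on the first three trading days and every other day is treated as neutral.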



In [7]:
# Ensure the length of sentiment scores matches the number of rows in stock data
# Fill the remaining rows with a default sentiment score, e.g., 0 (neutral)
sentiments += [0] * (len(stock_data) - len(sentiments))

# Combine sentiment scores with stock price data
# (the padded list now has exactly one value per row)
stock_data['Sentiment'] = sentiments
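
If each text came with a publication date, the scores could be aligned to trading days by date instead of by position. The following is a minimal sketch under that assumption. The dates, the scores, and the `align_sentiment` helper are all hypothetical and are not part of the original data.

```python
import pandas as pd

def align_sentiment(trading_days, dated_scores, default=0.0):
    """Average scores per calendar day, then align them to the trading-day index.

    Days with no text receive the neutral default.
    """
    frame = pd.DataFrame(dated_scores, columns=["date", "score"])
    frame["date"] = pd.to_datetime(frame["date"])
    daily = frame.groupby("date")["score"].mean()
    return daily.reindex(trading_days, fill_value=default)

# Hypothetical trading days and (date, compound score) pairs.
trading_days = pd.DatetimeIndex(["2024-07-15", "2024-07-16", "2024-07-17"])
dated_scores = [("2024-07-16", 0.5), ("2024-07-16", 0.1), ("2024-07-17", -0.2)]

aligned = align_sentiment(trading_days, dated_scores)
print(aligned)
```

With real data, `trading_days` would be `stock_data.index`, and the result could be assigned directly to `stock_data['Sentiment']`.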

**Feature Engineering**


*   pandas (for data manipulation)
*   Feature Engineering: Adds a 5-day moving average feature to the stock data and removes rows with missing values.



In [8]:

# Generate additional features (e.g., moving averages)
stock_data['MA_5'] = stock_data['Close'].rolling(window=5).mean()

# Drop rows with any NaN values in the selected columns
stock_data = stock_data[['Close', 'Sentiment', 'MA_5']].dropna()
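
The effect of the rolling window can be seen on a small series: the first four rows have no complete 5-day window, so their average is NaN, which is why `dropna()` removes them. The prices below are made up for illustration. The lagged variant at the end is an optional assumption, not something the original cell does: since `MA_5` includes the same day's close, shifting it by one row would restrict the feature to information from earlier days.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # made-up closing prices

ma_5 = close.rolling(window=5).mean()
print(ma_5.isna().sum())  # 4: the first four rows lack a full 5-day window
print(ma_5.iloc[4])       # 12.0, the mean of 10, 11, 12, 13, 14

# Optional variant: yesterday's moving average as today's feature.
ma_5_lagged = ma_5.shift(1)
print(ma_5_lagged.isna().sum())  # 5: one more row is lost to the shift
```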


**Stock Price Prediction Model**

*   scikit-learn (for machine learning)
*   Dataset Preparation: Sets up features (Sentiment, MA_5) and target variable (Close). Splits the data chronologically, without shuffling, into an 80% training set and a 20% testing set.
*   Model Training: Trains a Linear Regression model on the training data.
*   Prediction and Evaluation: Predicts stock prices on the test data and evaluates the model using Mean Squared Error (MSE).
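
For time-ordered data, the split is usually kept chronological so that the model is trained on earlier rows and tested on later ones, which is what `shuffle=False` asks `train_test_split` to do. The idea can be sketched in plain Python. `chronological_split` is a hypothetical helper and the ten "days" are stand-in values.

```python
import math

def chronological_split(rows, test_size=0.2):
    """Split an ordered sequence into (train, test) without shuffling."""
    n_test = math.ceil(len(rows) * test_size)
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]

days = list(range(10))  # stand-in for 10 consecutive trading days
train, test = chronological_split(days)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```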





In [9]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Prepare dataset for modeling
features = ['Sentiment', 'MA_5']
X = stock_data[features]
y = stock_data['Close']

# Verify the length of X and y to ensure they are the same
print(f"Length of X: {len(X)}")
print(f"Length of y: {len(y)}")

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train a Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict stock prices
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

Length of X: 1120
Length of y: 1120
Mean Squared Error: 540.951703125026
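
MSE is expressed in squared price units, so its square root (RMSE) is often easier to interpret: the MSE of about 540.95 reported above corresponds to an RMSE of about 23.26 in the stock's price units. A number like this is most informative next to a simple reference point. The sketch below computes MSE by hand and applies it to a naive baseline that predicts each day's close as the previous day's close. The prices are made up, and `mean_squared_error_manual` is a hypothetical name for illustration.

```python
import math

def mean_squared_error_manual(actual, predicted):
    """Average of squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up closing prices, used only to illustrate the calculation.
closes = [100.0, 102.0, 101.0, 105.0]

# Naive baseline: predict each day's close as the previous day's close.
baseline_pred = closes[:-1]
baseline_actual = closes[1:]

baseline_mse = mean_squared_error_manual(baseline_actual, baseline_pred)
baseline_rmse = math.sqrt(baseline_mse)
print(baseline_mse)  # 7.0
print(baseline_rmse)
```

Running the same comparison on `y_test` would show whether the regression improves on simply carrying the last known price forward.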


**Data Visualization**

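*   plotly (for interactive plots)
*   Results Export: Saves the actual and predicted prices for the test period to an Excel file.
*   Visualization: Plots actual versus predicted prices on an interactive chart.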


In [10]:
# Prepare DataFrame for exporting to Excel
results = pd.DataFrame({
    'Date': stock_data.index[-len(y_test):],
    'Actual Price': y_test,
    'Predicted Price': y_pred
})

# Save the results to an Excel file (requires an Excel writer such as openpyxl)
results.to_excel('stock_price_predictions.xlsx', index=False)
print('Excel file with predictions saved as stock_price_predictions.xlsx')

# Interactive visualization with Plotly
fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Scatter(x=results['Date'], y=results['Actual Price'], name='Actual Prices'),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=results['Date'], y=results['Predicted Price'], name='Predicted Prices', line=dict(dash='dash')),
    secondary_y=False,
)

fig.update_layout(
    title_text='Actual vs. Predicted HCLTech Stock Prices'
)

fig.update_xaxes(title_text='Date')
fig.update_yaxes(title_text='Stock Price')

fig.show()

Excel file with predictions saved as stock_price_predictions.xlsx
