## Sentiment analysis for stock market trends with LightningChart Python

### 1. Introduction
Stock market sentiment analysis aims to anticipate market movements by analyzing the emotions and opinions expressed in news articles, social media, and other textual data sources. Python offers a flexible and efficient way to extract these sentiments and use them to inform trading strategies. This article explores how to perform stock market sentiment analysis in Python, with a focus on using LightningChart for visualization.

#### 1.1 What is Stock Market Sentiment Analysis?
Stock market sentiment analysis involves evaluating public sentiment to predict stock price movements. This process typically uses natural language processing (NLP) and machine learning techniques to analyze text from various sources, such as news articles, social media posts, and financial reports. By quantifying the sentiment expressed, analysts can gauge the market's mood and make informed predictions about stock price trends.
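
To make "quantifying the sentiment expressed" concrete, here is a minimal sketch of a lexicon-based scorer. It is only an illustration: `TOY_LEXICON`, `toy_sentiment`, and the headlines are hypothetical, and the averaging rule is far simpler than what a real analyzer such as VADER does.

```python
# Hypothetical mini-lexicon: word -> valence. Real lexicons contain thousands of entries.
TOY_LEXICON = {
    "beat": 2.0, "growth": 1.5, "strong": 1.5,
    "miss": -2.0, "downgrade": -2.5, "sell": -2.0,
}

def toy_sentiment(text, lexicon=TOY_LEXICON):
    """Average the valence of the known words; 0.0 when no word is recognised."""
    words = [w.strip(".,!?$#").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

headlines = [
    "Strong growth as earnings beat estimates",
    "Analysts downgrade the stock and tell clients to sell",
    "Company schedules annual shareholder meeting",
]
for headline in headlines:
    print(f"{toy_sentiment(headline):+.2f}  {headline}")
```

The first headline scores positive, the second negative, and the third stays at zero because none of its words are in the toy lexicon.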

#### 1.2 Importance of Sentiment Analysis in Stock Market Prediction
Sentiment analysis plays a crucial role in stock market prediction because emotions and opinions significantly influence investor behavior. Positive news can drive stock prices up, while negative sentiments can cause them to collapse. By analyzing these sentiments, traders and analysts can gain insights into potential market movements and adjust their strategies accordingly. This method complements traditional financial analysis, providing a more holistic view of the market.

#### 1.3 How Sentiment Analysis Impacts Stock Market Trends
Market sentiment can act as a leading indicator of price movements, and sentiment analysis is the tool for measuring it. For instance, a surge in positive sentiment on social media about a particular stock can signal an upcoming price increase, while a run of negative news can foreshadow a decline. By integrating sentiment analysis into their strategies, traders can anticipate and react to market changes more effectively.

### 2. LightningChart Python 

#### 2.1 Overview of LightningChart Python
LightningChart is a high-performance data visualization library that provides a wide range of chart types and features, ideal for visualizing complex data sets like those used in stock market sentiment analysis. Its Python version allows developers to create interactive, high-performance visualizations with ease.

#### 2.2 Features and Chart Types to be Used in the Project
LightningChart Python offers a variety of chart types, each designed to handle specific types of data visualization needs. In this project, we use the following chart types to visualize stock price prediction data:

- **XY Chart**: For visualizing data in two dimensions with series types such as Line Series, Point Line Series, and Area Series.
- **Line Chart**: Used for visualizing changes in stock prices over time.
- **Area Chart**: Fills the area beneath a line series, useful for emphasizing volume or cumulative values.
- **Bar Chart**: Used for visualizing categorical data as bars, making it easy to compare different categories side by side.
- **Grouped Bar Chart**: Similar to the bar chart, but groups bars together based on additional categories, facilitating comparison within groups.
- **Pie Chart**: This kind of chart visualizes proportions and percentages between categories by dividing a circle into proportional segments, providing a clear view of category distribution.
- **Box Plot**: Visualizes the distribution of data groups through statistical measures such as quartiles, median, and outliers, providing insights into the data spread and variability.
- **Pyramid Chart**: Visualizes proportions and percentages between categories by dividing a pyramid into proportional segments.
- **Spider Chart**: Chart for visualizing data in a radial form as dissected by named axes.

![LightningChart](./images/charts.png)

#### 2.3 Performance Characteristics
One of the standout aspects of LightningChart Python is its performance: it can handle millions of data points while keeping user interactions smooth. The library is optimized for handling large volumes of data with minimal latency, which is crucial for financial applications where data needs to be processed and visualized in real time to inform trading decisions.

### 3. Setting Up Python Environment

#### 3.1 Installing Python and Necessary Libraries
Install Python from the [official website](https://www.python.org/downloads/) and use pip to install necessary libraries including LightningChart Python from PyPI. To get the [documentation](https://lightningchart.com/python-charts/docs/) and the [license](https://lightningchart.com/python-charts/), please visit [LightningChart Website](https://lightningchart.com/).

In [None]:
# pip install lightningchart numpy pandas nltk yfinance scipy statsmodels

In [1]:
# Import the libraries and set the LightningChart license
import random
import time
from datetime import datetime

import numpy as np
import pandas as pd
import yfinance as yf
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

import lightningchart as lc

# Replace the placeholder with your own license key
lc.set_license('my-license-key')

# Download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\aomid\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!







#### 3.2 Overview of Libraries Used
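- **LightningChart**: Advanced data visualization.
- **NumPy**: Numerical computation.
- **Pandas**: Data manipulation and analysis.
- **NLTK**: Natural language processing; its VADER `SentimentIntensityAnalyzer` scores the tweets.
- **yfinance**: Downloads historical stock prices from Yahoo Finance.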

#### 3.3 Setting Up Your Development Environment
Recommended IDEs include Jupyter Notebook, PyCharm, or Visual Studio Code.

### 4. Loading and Processing Data

#### 4.1 How to Load the Data Files
To perform stock market sentiment analysis, you'll need historical stock price data and sentiment data from social media. For this example, we use Apple stock data from yfinance and a Twitter dataset of tweets about several large technology companies, from which the Apple tweets are selected later.

#### Load and Merge Twitter Data:
Use Pandas to load the two CSV files, one with the tweets and one mapping each tweet to a company ticker, and merge them on `tweet_id`:

In [2]:
# Load the tweets and the tweet-to-ticker mapping
tweets = pd.read_csv('./Tweet.csv')
company_tweet = pd.read_csv('./Company_Tweet.csv')

# Attach a ticker symbol to each tweet
tweets = tweets.merge(company_tweet, how='left', on='tweet_id')

# post_date is a Unix timestamp in seconds; split it into a date and a time column
posted = pd.to_datetime(tweets['post_date'], unit='s')
tweets['date'] = posted.dt.normalize()
tweets['time'] = posted.dt.time

In [3]:
tweets.head()

| | tweet_id | writer | post_date | body | comment_num | retweet_num | like_num | ticker_symbol | date | time |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 550441509175443456 | VisualStockRSRC | 1420070457 | lx21 made $10,008 on $AAPL -Check it out! htt... | 0 | 0 | 1 | AAPL | 2015-01-01 | 00:00:57 |
| 1 | 550441672312512512 | KeralaGuy77 | 1420070496 | Insanity of today weirdo massive selling. $aap... | 0 | 0 | 0 | AAPL | 2015-01-01 | 00:01:36 |
| 2 | 550441732014223360 | DozenStocks | 1420070510 | S&P100 #Stocks Performance $HD $LOW $SBUX $TGT... | 0 | 0 | 0 | AMZN | 2015-01-01 | 00:01:50 |
| 3 | 550442977802207232 | ShowDreamCar | 1420070807 | $GM $TSLA: Volkswagen Pushes 2014 Record Recal... | 0 | 0 | 1 | TSLA | 2015-01-01 | 00:06:47 |
| 4 | 550443807834402816 | i_Know_First | 1420071005 | Swing Trading: Up To 8.91% Return In 14 Days h... | 0 | 0 | 1 | AAPL | 2015-01-01 | 00:10:05 |
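
The `post_date` values are Unix timestamps in seconds, which is why the loading code passes `unit='s'`. As a standard-library sketch of the same conversion (the helper name `split_post_date` is hypothetical), the first row's timestamp splits into the date and time shown in the output:

```python
from datetime import datetime, timezone

def split_post_date(post_date_seconds):
    """Return (date, time) in UTC for a Unix timestamp given in seconds."""
    moment = datetime.fromtimestamp(post_date_seconds, tz=timezone.utc)
    return moment.date(), moment.time()

day, clock = split_post_date(1420070457)  # post_date of the first row
print(day.isoformat(), clock.isoformat())
```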


In [4]:
# Get the unique values in the 'ticker_symbol' column
unique_ticker_symbols = tweets['ticker_symbol'].unique()

# Print the number of unique values
print(f"Number of unique ticker symbols: {len(unique_ticker_symbols)}")

# Print the unique values
print("Unique ticker symbols:")
print(unique_ticker_symbols)

Number of unique ticker symbols: 6
Unique ticker symbols:
['AAPL' 'AMZN' 'TSLA' 'MSFT' 'GOOG' 'GOOGL']


### Visualizing Data with LightningChart using Grouped Bar chart

In [5]:
# Grouped Bar Chart
# Assuming 'tweets' DataFrame is already prepared and loaded with tweet data
tweets['date'] = pd.to_datetime(tweets['date'])
tweets['year'] = tweets['date'].dt.year

# Get the unique ticker symbols
unique_ticker_symbols = tweets['ticker_symbol'].unique()

# Initialize lists to store years and tweet counts for each ticker
years = sorted(tweets['year'].unique())
data = []

# Loop through each unique ticker symbol and get the tweet counts per year
for ticker in unique_ticker_symbols:
    ticker_tweets = tweets[tweets['ticker_symbol'] == ticker]
    tweet_counts = ticker_tweets.groupby('year').size().reindex(years, fill_value=0)
    data.append({'subCategory': ticker, 'values': tweet_counts.values.tolist()})

# Prepare year labels as strings
year_labels = [str(year) for year in years]

# Initialize LightningChart
chart = lc.BarChart(
    vertical=True,
    theme=lc.Themes.Dark,
    title='Number of Tweets per Year for Each Company'
)

# Set data for the chart
chart.set_data_grouped(
    year_labels,
    data
)

# Add a legend to the chart
legend = chart.add_legend()
legend.add(chart)

# Set sorting and display the chart
chart.set_sorting('alphabetical')
chart.open()
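
The grouped bar chart receives one entry per ticker, each holding a list of values aligned with the year labels. The following plain-Python sketch shows that shape on hypothetical data; the `rows` list stands in for the `tweets` DataFrame, and the dictionary keys simply mirror the ones used in the cell above.

```python
from collections import Counter

# Hypothetical (ticker, year) pairs standing in for the rows of the tweets DataFrame
rows = [
    ("AAPL", 2015), ("AAPL", 2015), ("AAPL", 2016),
    ("TSLA", 2016), ("TSLA", 2016), ("MSFT", 2015),
]

counts = Counter(rows)
years = sorted({year for _, year in rows})
tickers = sorted({ticker for ticker, _ in rows})

# One entry per ticker; missing (ticker, year) combinations count as 0
grouped = [
    {"subCategory": ticker, "values": [counts[(ticker, year)] for year in years]}
    for ticker in tickers
]
print([str(year) for year in years])
print(grouped)
```

Filling missing combinations with zero plays the same role as `reindex(years, fill_value=0)`: every ticker ends up with exactly one value per year label.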




### Visualizing Data with LightningChart using Pyramid chart

In [6]:
# Pyramid Chart
# Assuming 'tweets' DataFrame is already prepared and loaded with tweet data
tweets['date'] = pd.to_datetime(tweets['date'])
tweets['year'] = tweets['date'].dt.year

# Get the unique ticker symbols
unique_ticker_symbols = tweets['ticker_symbol'].unique()

# Initialize list to store total tweet counts for each ticker
pyramid_data = []

# Loop through each unique ticker symbol and get the total tweet counts
for ticker in unique_ticker_symbols:
    ticker_tweets = tweets[tweets['ticker_symbol'] == ticker]
    total_tweet_count = ticker_tweets.shape[0]
    pyramid_data.append({'name': ticker, 'value': total_tweet_count})


chart = lc.PyramidChart(
    slice_mode='height',
    theme=lc.Themes.Dark,
    title='Total Number of Tweets per Company'
)

# Add slices to the pyramid chart
chart.add_slices(pyramid_data)

# Add a legend to the chart
legend = chart.add_legend()
legend.add(chart).set_title('Companies')

# Open the chart
chart.open()




### Visualizing Data with LightningChart using Spider chart

In [7]:
# Spider Chart
# Assuming 'tweets' DataFrame is already prepared and loaded with tweet data
tweets['date'] = pd.to_datetime(tweets['date'])
tweets['year'] = tweets['date'].dt.year

# Get the unique ticker symbols and years
unique_ticker_symbols = tweets['ticker_symbol'].unique()
years = sorted(tweets['year'].unique())

chart = lc.SpiderChart(
    theme=lc.Themes.Dark,
    title='Total Number of Tweets per Company per Year'
)

chart.set_web_mode('circle')

# Add series for each year to the spider chart
for year in years:
    spider_data = []
    for ticker in unique_ticker_symbols:
        ticker_tweets = tweets[(tweets['ticker_symbol'] == ticker) & (tweets['year'] == year)]
        total_tweet_count = ticker_tweets.shape[0]
        spider_data.append({'axis': ticker, 'value': total_tweet_count})
    
    series = chart.add_series()
    series.add_points(spider_data)
    series.set_name(f'Year {year}')

# Add a legend to the chart
legend = chart.add_legend()
legend.add(chart).set_title('Years')

# Open the chart
chart.open()




### Visualizing Data with LightningChart using Box Plot

In [8]:
# Box Plot
# Assuming 'tweets' DataFrame is already prepared and loaded with tweet data
tweets['date'] = pd.to_datetime(tweets['date'])
tweets['year'] = tweets['date'].dt.year

# Get the unique ticker symbols
unique_ticker_symbols = tweets['ticker_symbol'].unique()

# Initialize lists to store data for each ticker
box_plot_data = []

# Loop through each unique ticker symbol and calculate box plot statistics
for idx, ticker in enumerate(unique_ticker_symbols):
    ticker_tweets = tweets[tweets['ticker_symbol'] == ticker]
    tweet_counts = ticker_tweets.groupby('year').size().values

    if len(tweet_counts) == 0:
        continue

    median = np.median(tweet_counts).item()
    lower_quartile = np.percentile(tweet_counts, 25).item()
    upper_quartile = np.percentile(tweet_counts, 75).item()
    lower_extreme = np.min(tweet_counts).item()
    upper_extreme = np.max(tweet_counts).item()
    
    box_plot_data.append({
        'start': idx + 0.6,  # Start position for the box on x-axis
        'end': idx + 1.4,  # End position for the box on x-axis
        'median': median,
        'lowerQuartile': lower_quartile,
        'upperQuartile': upper_quartile,
        'lowerExtreme': lower_extreme,
        'upperExtreme': upper_extreme,
        'label': ticker
    })

chart = lc.ChartXY(
    theme=lc.Themes.Dark,
    title='Box Plot of Tweet Counts per Year for Each Company'
)

# Add box series to the chart
series = chart.add_box_series()
series.add_multiple(box_plot_data)

# Set x-axis to display company names using custom ticks
x_axis = chart.get_default_x_axis()
x_axis.set_interval(start=0.5, end=len(unique_ticker_symbols) + 0.5)

for idx, ticker in enumerate(unique_ticker_symbols):
    custom_tick = x_axis.add_custom_tick()
    custom_tick.set_value(idx + 1)

# Add text boxes above each box plot to display the company name
for data in box_plot_data:
    text_box = chart.add_textbox()
    text_box.set_text(data['label'])
    text_box.set_position(
        x=(data['start'] + data['end']) / 2,
        y=data['upperExtreme'] + 5000  # Position above the upper extreme
    )

# Open the chart
chart.open()
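
Each box is described by five numbers. As a sketch of how they are derived, the standard-library version below reproduces the same statistics on hypothetical yearly counts; `five_number_summary` is an illustrative helper, and the `inclusive` method is chosen because it uses the same linear interpolation as `np.percentile` by default.

```python
import statistics

def five_number_summary(values):
    """Box-plot statistics for a sample; needs at least two values."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "lowerExtreme": min(values),
        "lowerQuartile": q1,
        "median": q2,
        "upperQuartile": q3,
        "upperExtreme": max(values),
    }

yearly_counts = [10, 20, 30, 40]  # hypothetical tweets-per-year counts for one ticker
summary = five_number_summary(yearly_counts)
print(summary)
```

With only a handful of yearly counts per ticker, the quartiles are mostly interpolated values, so the boxes describe a very small sample.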




#### 4.2 Handling and preprocessing the data
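After using Pandas to read the Twitter data into DataFrames, score the text of each tweet with the VADER `SentimentIntensityAnalyzer` from NLTK and bin the compound score into `bad`, `neutral`, and `good` labels.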

In [9]:
sia = SentimentIntensityAnalyzer()

def get_sentiment(tweets, ticker='AAPL', start='2018-01-01', end='2018-12-31'):
    # Subset the tweets for one ticker and date range; copy to avoid SettingWithCopyWarning
    mask = (tweets.ticker_symbol == ticker) & (tweets.date >= start) & (tweets.date <= end)
    df = tweets.loc[mask].copy()
    # Apply the SentimentIntensityAnalyzer and keep the compound score
    df['score'] = df['body'].apply(lambda x: sia.polarity_scores(x)['compound'])
    # Bin the compound score into three labels
    df['label'] = pd.cut(df['score'], bins=[-1, -0.66, 0.32, 1], right=True,
                         labels=["bad", "neutral", "good"])

    return df[["date", "score", "label", "tweet_id", "body"]]

print('apple misses earnings, analyst suggest downgrade , sell now ')
sia.polarity_scores('apple misses earnings, analyst suggest downgrade , sell now ')

apple misses earnings, analyst suggest downgrade , sell now 


{'neg': 0.213, 'neu': 0.787, 'pos': 0.0, 'compound': -0.2263}
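
The labels in `get_sentiment` come from right-closed bins with edges at -0.66 and 0.32. A plain-Python sketch of the same thresholds, using a hypothetical `label_score` helper, makes the boundaries explicit. Unlike `pd.cut`, this sketch does not reject values outside the outer edges of -1 and 1.

```python
from bisect import bisect_left

EDGES = [-0.66, 0.32]                 # inner bin edges used in get_sentiment
LABELS = ["bad", "neutral", "good"]

def label_score(score):
    """Right-closed bins: (-1, -0.66] bad, (-0.66, 0.32] neutral, (0.32, 1] good."""
    return LABELS[bisect_left(EDGES, score)]

for score in (0.8807, -0.3612, 0.0, -0.7845):
    print(score, label_score(score))
```

Note how wide the neutral band is: a compound score of -0.3612 still counts as neutral, and only scores at or below -0.66 are labelled bad.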

In [10]:
# Customizing Sentiment Analyzer with Financial Lexicon
positive_words='high profit Growth Potential Opportunity Bullish Strong Valuable Success Promising Profitable Win Winner Outstanding Record Earnings Breakthrough buy bull long support undervalued underpriced cheap upward rising trend moon rocket hold breakout call beat support buying holding'
negative_words='resistance squeeze cover seller Risk Loss Decline Bearish Weak Declining Uncertain Troubling Downturn Struggle Unstable Volatile Slump Disaster Plunge sell bear bubble bearish short overvalued overbought overpriced expensive downward falling sold sell low put miss'

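# Weight every listed word at +4 or -4 (VADER valences range from -4 to +4).
# Note: VADER lowercases each token before the lookup, so capitalized keys
# such as 'Growth' never match as written.
dictOfpos = { i : 4 for i in positive_words.split(" ") }
dictOfneg = { i : -4 for i in negative_words.split(" ")  }
Financial_Lexicon = {**dictOfpos, **dictOfneg}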

sia.lexicon.update(Financial_Lexicon)


print('apple misses earnings, analyst suggest downgrade , sell now ')
sia.polarity_scores('apple misses earnings, analyst suggest downgrade , sell now ')

apple misses earnings, analyst suggest downgrade , sell now 


{'neg': 0.535, 'neu': 0.465, 'pos': 0.0, 'compound': -0.7845}
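
The jump in the compound score can be checked by hand. VADER maps the summed word valences x to a compound score x / sqrt(x² + 15). The sketch below assumes the sentence summed to about -0.9 before the update; that figure is inferred from the printed score, not stated by the analyzer. Adding -4 for `sell` then gives the second result.

```python
import math

def normalize(valence_sum, alpha=15):
    """Map an unbounded valence sum into (-1, 1), as VADER does for its compound score."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

before = round(normalize(-0.9), 4)        # assumed valence sum before the lexicon update
after = round(normalize(-0.9 - 4.0), 4)   # same sentence with 'sell' weighted at -4
print(before, after)
```

Because the normalization saturates, a single word weighted at the maximum of ±4 moves the compound score most of the way towards ±1, which is worth keeping in mind when choosing custom weights.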

In [11]:
# Getting tweets
start='2018-01-01'
end='2018-12-31'
ticker='AAPL'
tw=get_sentiment(tweets,ticker,start,end)
tw.head()


| | date | score | label | tweet_id | body |
|---|---|---|---|---|---|
| 2516892 | 2018-01-01 | 0.8807 | good | 947619846124122113 | How is $AAPL @Apple going to get me to buy a #... |
| 2516898 | 2018-01-01 | -0.3612 | neutral | 947622772800450561 | $IBM settled -0.4% at $153.42, making for a 20... |
| 2516900 | 2018-01-01 | 0.0 | neutral | 947623169241821184 | $AAPL 2018, with 3 new handsets and possibly a... |
| 2516909 | 2018-01-01 | 0.8561 | good | 947626570226728960 | Start investing @RobinhoodApp and get a stock ... |
| 2516910 | 2018-01-01 | 0.4981 | good | 947626976642248704 | JOIN NOW! TALK STOCKS IN OUR GOAL ORIENTED CHA... |


### Visualizing Data with LightningChart using Pie chart

In [12]:
# Sample data preparation using Pie Chart
# Ensure the 'date' column is in datetime format
tweets['date'] = pd.to_datetime(tweets['date'])
tweets['year'] = tweets['date'].dt.year

# Filter tweets for AAPL
aapl_tweets = tweets[tweets['ticker_symbol'] == 'AAPL']

# Group by year and count the number of tweets
tweet_counts = aapl_tweets.groupby('year').size()

# Prepare data for LightningChart PieChart
years = tweet_counts.index.astype(str).tolist()
tweet_counts_values = tweet_counts.values.tolist()
data = [{'name': year, 'value': value} for year, value in zip(years, tweet_counts_values)]


chart = lc.PieChart(
    labels_inside_slices=False,
    title='Number of Tweets per Year for AAPL',
    theme=lc.Themes.Dark
)

# Add slices to the pie chart
chart.add_slices(data)
chart.set_inner_radius(50)

# Add a legend to the chart
legend = chart.add_legend()
legend.add(chart)

# Open the chart
chart.open()




### Visualizing Data with LightningChart using Bar chart

In [13]:
# Bar chart
# Assuming 'tweets' DataFrame is already prepared and loaded with tweet data
tweets['date'] = pd.to_datetime(tweets['date'])
tweets['year'] = tweets['date'].dt.year

# Filter tweets for AAPL
aapl_tweets = tweets[tweets['ticker_symbol'] == 'AAPL']

# Group by year and count the number of tweets
tweet_counts = aapl_tweets.groupby('year').size()

# Prepare data for LightningChart
years = tweet_counts.index.astype(str).tolist()
tweet_counts_values = tweet_counts.values.tolist()

# Initialize LightningChart
chart = lc.BarChart(
    vertical=True,
    theme=lc.Themes.Dark,
    title='Number of Tweets per Year for AAPL'
)

# Define colors for each bar
colors = ["#FF5733", "#33FF57", "#3357FF", "#F5A623", "#D0021B"]

# Prepare data with color information
data = []
for year, value, color in zip(years, tweet_counts_values, colors):
    data.append({
        'category': year,
        'value': value,
        'color': color
    })

# Set data with color information
chart.set_data(data)

# Open the chart
chart.set_sorting('alphabetical')
chart.open()




### Visualizing Data with LightningChart using Area chart

In [14]:
# Area chart
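# Assuming 'tw' DataFrame is already prepared and loaded with tweet sentiment data
tw['date'] = pd.to_datetime(tw['date'])
# 'label' is categorical; observed=False keeps all three labels and makes the grouping behaviour explicit
sentiment_counts = tw.groupby(['date', 'label'], observed=False).size().unstack(fill_value=0)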

# Create a LightningChart XY chart
chart = lc.ChartXY(theme=lc.Themes.Dark, title='AAPL Daily Sentiment Distribution by Date (2018)')
chart.get_default_x_axis().dispose()

# Set the base time to January 1, 2018
time_origin = pd.Timestamp("2018-01-01").timestamp() * 1000  # Convert to milliseconds
x_axis = chart.add_x_axis(axis_type='linear-highPrecision')
x_axis.set_tick_strategy('DateTime', time_origin=time_origin)
x_axis.set_interval(start=time_origin, end=time_origin + 365 * 24 * 3600 * 1000)

# Convert dates to milliseconds elapsed since the 2018-01-01 time origin (for the high-precision axis)
sentiment_counts.index = (sentiment_counts.index - pd.Timestamp("2018-01-01")) // pd.Timedelta('1ms')

# Adding area series for each sentiment and attempting a simple legend addition
legend = chart.add_legend()

# Define colors for each sentiment
colors = {
    'good': lc.Color("orange"),
    'neutral': lc.Color("gray"), 
    'bad': lc.Color("brown")
}

for sentiment in ['good', 'neutral', 'bad']:
    if sentiment in sentiment_counts.columns:
        series = chart.add_area_series()
        x_values = sentiment_counts.index.values
        y_values = sentiment_counts[sentiment].values
        series.add(x_values, y_values)
        series.set_name(sentiment.capitalize())
        series.set_fill_color(colors[sentiment])
        legend.add(series)

chart.open()
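
The x-values passed to the series are offsets from the time origin rather than absolute timestamps. A minimal standard-library sketch of that conversion follows; `ORIGIN` and `ms_since_origin` are illustrative names.

```python
from datetime import date, datetime, time, timedelta

ORIGIN = datetime(2018, 1, 1)

def ms_since_origin(day, origin=ORIGIN):
    """Milliseconds between midnight on `day` and the chosen time origin."""
    delta = datetime.combine(day, time()) - origin
    return delta // timedelta(milliseconds=1)

print(ms_since_origin(date(2018, 1, 1)))
print(ms_since_origin(date(2018, 1, 2)))
print(ms_since_origin(date(2018, 12, 31)))
```

One day is 86,400,000 ms, so a full year of daily points spans roughly 3.15e10 ms from the origin.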




In [15]:
# Convert the 'date' column to datetime objects
tw['date'] = pd.to_datetime(tw['date'])

# Group the data by date and calculate the average sentiment score for each day
daily_sentiment = tw.groupby(tw['date'].dt.date)['score'].mean()

daily_sentiment_df = pd.DataFrame({'Date': daily_sentiment.index, 'Average Score': daily_sentiment.values})

# Print the resulting DataFrame
daily_sentiment_df.head()

| | Date | Average Score |
|---|---|---|
| 0 | 2018-01-01 | 0.38036 |
| 1 | 2018-01-02 | 0.325566 |
| 2 | 2018-01-03 | 0.256721 |
| 3 | 2018-01-04 | 0.330698 |
| 4 | 2018-01-05 | 0.187864 |
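
The daily average is a plain group-and-mean operation. As a sketch of what the `groupby(...).mean()` call computes, here is the same step on a few hypothetical (date, score) pairs using only the standard library:

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (date, compound score) pairs standing in for the rows of tw
scored = [
    ("2018-01-01", 0.8807), ("2018-01-01", -0.3612), ("2018-01-01", 0.0),
    ("2018-01-02", 0.5), ("2018-01-02", 0.1),
]

by_day = defaultdict(list)
for day, score in scored:
    by_day[day].append(score)

# One average per day, in chronological order
daily_average = {day: fmean(scores) for day, scores in sorted(by_day.items())}
print(daily_average)
```

Days without any tweets simply do not appear in the result, so the averaged series can be shorter than the calendar range it covers.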


### Visualizing Data with LightningChart using Line chart

In [16]:
# Line chart
# Assuming 'tw' DataFrame is already prepared and loaded with tweet sentiment data
tw['date'] = pd.to_datetime(tw['date'])

# Group the data by date and calculate the average sentiment score for each day
daily_sentiment = tw.groupby(tw['date'].dt.date)['score'].mean()

# Create a LightningChart XY chart
chart = lc.ChartXY(
    theme=lc.Themes.Dark,
    title='AAPL Average Sentiment Score for Each Day for the year 2018'
)
chart.get_default_x_axis().dispose()

# Set the base time to January 1, 2018
time_origin = pd.Timestamp("2018-01-01").timestamp() * 1000
x_axis = chart.add_x_axis(axis_type='linear-highPrecision')
x_axis.set_tick_strategy('DateTime', time_origin=time_origin)
x_axis.set_title('Date')

y_axis = chart.get_default_y_axis()
y_axis.set_title('Average Sentiment Score')

# Convert dates to milliseconds elapsed since the 2018-01-01 time origin (for the high-precision axis)
x_values = ((pd.to_datetime(daily_sentiment.index) - pd.Timestamp("2018-01-01")) / np.timedelta64(1, 'ms')).astype(np.int64)
y_values = daily_sentiment.values

# Add line series to the chart
series = chart.add_line_series()
series.add(x_values, y_values)

# Customize series appearance
series.set_name('Average Sentiment Score')
series.set_line_thickness(2)

# Open the chart
chart.open()




### Fetching Historical Data for Apple and Visualizing Data with LightningChart using Line chart

In [17]:
# Line chart
# Fetch historical data for Apple
aapl = yf.Ticker("AAPL")
hist = aapl.history(period="max")

# Reset the index to make 'Date' a column
hist.reset_index(inplace=True)
hist['Date'] = pd.to_datetime(hist['Date'])

# Create a LightningChart XY chart
chart = lc.ChartXY(theme=lc.Themes.Dark, title="AAPL Open Price")
chart.get_default_x_axis().dispose()

# Set the base time to the minimum date in the dataset
time_origin = pd.Timestamp(hist['Date'].min()).timestamp() * 1000

# Set up a high precision date-time axis
x_axis = chart.add_x_axis(axis_type='linear-highPrecision')
x_axis.set_tick_strategy('DateTime', time_origin=time_origin)
x_axis.set_title('Date')

y_axis = chart.get_default_y_axis()
y_axis.set_title('Open Price ($)')

# Convert dates to milliseconds elapsed since the first date in the dataset (the time origin)
x_values = ((hist['Date'] - pd.Timestamp(hist['Date'].min())) / np.timedelta64(1, 'ms')).astype(np.int64).values
y_values = hist['Open'].values

# Add line series to the chart
series = chart.add_line_series()
series.add(x_values, y_values)

# Customize series appearance
series.set_name('AAPL Open Price')
series.set_line_thickness(2)

# Open the chart
chart.open()



### Visualizing Data with LightningChart using Line chart

In [18]:
# Line chart
# Define the stock symbol and date range
stock_symbol = "AAPL"
start_date = "2018-01-01"
end_date = "2018-12-31"

# Create a Ticker object for the stock
stock = yf.Ticker(stock_symbol)

# Fetch daily historical data for the specified date range
hist = stock.history(interval="1d", start=start_date, end=end_date)

# Reset the index to make 'Date' a column and ensure 'Date' is in datetime format
hist.reset_index(inplace=True)
hist['Date'] = pd.to_datetime(hist['Date'])

# Create a LightningChart XY chart
chart = lc.ChartXY(theme=lc.Themes.Dark, title=f"{stock_symbol} Close Price (Jan 1, 2018 - Dec 31, 2018)")
chart.get_default_x_axis().dispose()

# Set the base time to the minimum date in the dataset
time_origin = pd.Timestamp(hist['Date'].min()).timestamp() * 1000

# Set up a high precision date-time axis
x_axis = chart.add_x_axis(axis_type='linear-highPrecision')
x_axis.set_tick_strategy('DateTime', time_origin=time_origin)
x_axis.set_title('Date')

y_axis = chart.get_default_y_axis()
y_axis.set_title('Close Price ($)')

# Convert dates to milliseconds elapsed since the first date in the dataset (the time origin)
x_values = ((hist['Date'] - pd.Timestamp(hist['Date'].min())) / np.timedelta64(1, 'ms')).astype(np.int64).values
y_values = hist['Close'].values

# Add line series to the chart
series = chart.add_line_series()
series.add(x_values, y_values)

# Customize series appearance
series.set_name('AAPL Close Price')
series.set_line_thickness(2)

# Open the chart
chart.open()




In [19]:
# Define the stock symbol and date range
stock_symbol = "AAPL"
start_date = '2018-01-01'
end_date = '2018-12-31'

# Create a Ticker object for the stock
stock = yf.Ticker(stock_symbol)

# Fetch daily historical data for the specified date range
hist = stock.history(interval="1d", start=start_date, end=end_date)

# Make sure 'hist' has a datetime index
hist.index = pd.to_datetime(hist.index)

# Create a date range covering the full date range (including weekends and holidays)
date_range = pd.date_range(start=start_date, end=end_date)

# Create a new DataFrame with the full date range
full_hist = pd.DataFrame(index=date_range)

# Localize the new DataFrame index if necessary
if hist.index.tz is not None:
    full_hist.index = full_hist.index.tz_localize(hist.index.tzinfo)

# Merge or combine the new DataFrame with the existing 'hist' DataFrame
full_hist = full_hist.combine_first(hist)

# Fill NaN values if necessary
full_hist = full_hist.fillna(0)

# Setup display options to show all data in one row
pd.set_option('display.max_rows', None) 
pd.set_option('display.max_columns', None) 
pd.set_option('display.width', 1000) 

# Print the updated DataFrame 'full_hist'
print(full_hist.head())


                               Close  Dividends       High        Low       Open  Stock Splits       Volume
2018-01-01 00:00:00-05:00   0.000000        0.0   0.000000   0.000000   0.000000           0.0          0.0
2018-01-02 00:00:00-05:00  40.615879        0.0  40.625312  39.908532  40.120738           0.0  102223600.0
2018-01-03 00:00:00-05:00  40.608810        0.0  41.155827  40.545152  40.679546           0.0  118071600.0
2018-01-04 00:00:00-05:00  40.797440        0.0  40.901184  40.573447  40.681905           0.0   89738400.0
2018-01-05 00:00:00-05:00  41.261936        0.0  41.349175  40.802161  40.894116           0.0   94640000.0
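
As the first row shows, non-trading days end up with a price of 0 after `fillna(0)`. The sketch below contrasts that zero fill with a forward fill, which carries the last known close across weekends and holidays. It is an alternative to consider, not what the code above does; `fill_calendar` and the prices are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical closing prices on trading days only (a Friday and the following Monday)
closes = {date(2018, 1, 5): 41.26, date(2018, 1, 8): 41.11}

def fill_calendar(prices, start, end, strategy="zero"):
    """One value per calendar day: 0.0 for gaps ('zero') or the last known price ('ffill')."""
    filled, last, day = {}, None, start
    while day <= end:
        if day in prices:
            last = prices[day]
            filled[day] = last
        else:
            filled[day] = 0.0 if strategy == "zero" or last is None else last
        day += timedelta(days=1)
    return filled

zero_filled = fill_calendar(closes, date(2018, 1, 5), date(2018, 1, 8))
forward_filled = fill_calendar(closes, date(2018, 1, 5), date(2018, 1, 8), strategy="ffill")
print(list(zero_filled.values()))
print(list(forward_filled.values()))
```

The choice matters for any later statistics: zeros introduce artificial drops on every weekend, while forward-filled values keep the series continuous.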


### Visualizing Data with LightningChart and making comparison between Apple Close Price and Average Twitter Sentiment Score using Line chart

In [20]:
# Make comparison between Apple Close Price and Average Twitter Sentiment Score using line chart
chart = lc.ChartXY(theme=lc.Themes.Dark, title='Apple Close Price and Average Twitter Sentiment Score')

# Configure the default X-axis for date handling
x_axis = chart.get_default_x_axis()
x_axis.set_tick_strategy('DateTime')
x_axis.set_interval(start=pd.Timestamp(start_date).timestamp() * 1000, end=pd.Timestamp(end_date).timestamp() * 1000)

# Create a Y-axis for the stock prices
y_axis1 = chart.get_default_y_axis()
y_axis1.set_title('Stock Price ($)')

# Add a secondary Y-axis for the sentiment scores
y_axis2 = chart.add_y_axis(opposite=True)
y_axis2.set_title('Sentiment Score')

# Convert dates to milliseconds since Unix epoch
dates = pd.to_datetime(full_hist.index).astype(np.int64) // 10**6

# Add series for stock prices
stock_prices = full_hist['Close'].fillna(0)
stock_series = chart.add_line_series(y_axis=y_axis1)
stock_series.set_line_thickness(2)  # Slightly thicker line for the price series
stock_series.add(dates, stock_prices)
stock_series.set_name("Stock Price")
stock_series.set_line_color(lc.Color("gray"))

# Add series for sentiment scores
sentiment_scores = daily_sentiment_df['Average Score'].fillna(0)
sentiment_series = chart.add_line_series(y_axis=y_axis2)
sentiment_series.add(dates, sentiment_scores)
sentiment_series.set_name("Sentiment Score")
sentiment_series.set_line_color(lc.Color("orange"))

# Add a legend to the chart
legend = chart.add_legend()
legend.add(stock_series)
legend.add(sentiment_series)

# Open the chart
chart.open()
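
Plotting two series against one list of dates only works if both have a value for every date. A minimal sketch of aligning them on their shared dates first is shown below; the dictionaries are hypothetical stand-ins for `full_hist` and `daily_sentiment_df`, with rounded values.

```python
# Hypothetical daily values keyed by ISO date
close_by_day = {"2018-01-02": 40.62, "2018-01-03": 40.61, "2018-01-04": 40.80}
sentiment_by_day = {"2018-01-01": 0.38, "2018-01-02": 0.33, "2018-01-04": 0.33}

# Keep only the days present in both series, in chronological order
shared_days = sorted(close_by_day.keys() & sentiment_by_day.keys())
aligned = [(day, close_by_day[day], sentiment_by_day[day]) for day in shared_days]
for row in aligned:
    print(row)
```

An inner join like this drops days that are missing from either series, which keeps the x and y arrays the same length.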




## Validation of the Study
For the validation of the study, correlation coefficients and Granger causality tests were used.

### Correlation Observation:

- Visually, the sentiment score and the stock price appear to move together at times: some peaks in sentiment coincide with peaks in price, and some troughs with troughs.
- The alignment looks strongest in the initial months (January to March), where spikes in sentiment scores coincide with increases in stock prices.

The chart suggests a potential relationship between Twitter sentiment and stock price movements. Not all movements line up, but there are instances where high sentiment scores align with stock price increases. This indicates that monitoring social media sentiment could be a useful input when forecasting stock price movements, although a visual match alone does not establish a predictive link.

For a more rigorous analysis of whether these observations hold up, statistical methods such as correlation coefficients and Granger causality tests can be applied to quantify the relationship between sentiment scores and stock prices.
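
As a reference for what a correlation coefficient measures, here is a minimal pure-Python sketch of the Pearson coefficient on hypothetical toy series. `pearson_r` is an illustrative helper, not the SciPy implementation, and it does not compute a p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-movement of xs and ys scaled to the range [-1, 1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined if either series is constant

# Hypothetical toy series, not the article's data
sentiment = [0.1, 0.2, 0.3, 0.4]
close = [40.0, 41.0, 42.0, 43.0]
print(pearson_r(sentiment, close))        # series rising together
print(pearson_r(sentiment, close[::-1]))  # one rising while the other falls
```

A value near +1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relationship.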

In [21]:
from scipy.stats import pearsonr
from statsmodels.tsa.stattools import grangercausalitytests

pd.set_option('display.width', 1000)

# Ensure the Date column in daily_sentiment_df is in datetime format and set as index
print("Columns in daily_sentiment_df before renaming:")
print(daily_sentiment_df.columns)

# Check if 'Date' is in columns, otherwise print the columns available
if 'Date' not in daily_sentiment_df.columns:
    print("Column 'Date' not found. Available columns:")
    print(daily_sentiment_df.columns)
else:
    daily_sentiment_df['Date'] = pd.to_datetime(daily_sentiment_df['Date']).dt.tz_localize(None)
    daily_sentiment_df.set_index('Date', inplace=True)

# Ensure the index of full_hist is in datetime format without timezone information and remove the time component
full_hist.index = pd.to_datetime(full_hist.index).normalize().tz_localize(None)

# Check the data
print("Head of full_hist:")
print(full_hist.head())
print("Tail of full_hist:")
print(full_hist.tail())

print("Head of daily_sentiment_df:")
print(daily_sentiment_df.head())
print("Tail of daily_sentiment_df:")
print(daily_sentiment_df.tail())

# Merge the data on the date index
merged_df = pd.merge(full_hist[['Close']], daily_sentiment_df[['Average Score']], left_index=True, right_index=True, how='inner')
print(f"Number of overlapping data points: {len(merged_df)}")

# Check the merged data
print("Head of merged_df:")
print(merged_df.head())
print("Tail of merged_df:")
print(merged_df.tail())

# Correlation Analysis
if len(merged_df) >= 2:
    correlation_coefficient, p_value = pearsonr(merged_df['Close'], merged_df['Average Score'])
    print(f"Pearson correlation coefficient: {correlation_coefficient}")
    print(f"P-value: {p_value}")
else:
    print("Not enough data points for correlation analysis.")

# Granger Causality Test
# grangercausalitytests checks whether the SECOND column Granger-causes the FIRST column.
max_lag = 5
data = merged_df[['Average Score', 'Close']].dropna()
if len(data) > 3 * max_lag:  # require more than 3 observations per lag (15 rows for max_lag = 5)
    granger_result = grangercausalitytests(data, max_lag, verbose=True)
else:
    print("Not enough data points for Granger causality test.")


Columns in daily_sentiment_df before renaming:
Index(['Date', 'Average Score'], dtype='object')
Head of full_hist:
                Close  Dividends       High        Low       Open  Stock Splits       Volume
2018-01-01   0.000000        0.0   0.000000   0.000000   0.000000           0.0          0.0
2018-01-02  40.615879        0.0  40.625312  39.908532  40.120738           0.0  102223600.0
2018-01-03  40.608810        0.0  41.155827  40.545152  40.679546           0.0  118071600.0
2018-01-04  40.797440        0.0  40.901184  40.573447  40.681905           0.0   89738400.0
2018-01-05  41.261936        0.0  41.349175  40.802161  40.894116           0.0   94640000.0
Tail of full_hist:
                Close  Dividends       High        Low       Open  Stock Splits       Volume
2018-12-27  37.370178        0.0  37.518561  35.915102  37.295989           0.0  212468400.0
2018-12-28  37.389320        0.0  37.937370  36.987261  37.693261           0.0  169165600.0
2018-12-29   0.000000        



### A. Pearson Correlation Analysis:
* Pearson correlation coefficient: -0.5308
* P-value: 6.4821e-28

The Pearson correlation coefficient of -0.5308 indicates a moderate negative correlation between Apple's closing price and the average Twitter sentiment score: as the sentiment score increases, the closing price tends to decrease, and vice versa. The p-value is extremely low (6.48e-28), so the result is statistically significant. One caveat: `full_hist` contains rows with a closing price of 0.0, and any of those rows that remain in `merged_df` will influence the coefficient, so they should be checked before the sign is interpreted.
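
Rows with a closing price of 0.0, like the 2018-01-01 row in the `full_hist` output, can dominate a Pearson coefficient. The sketch below uses made-up numbers (not the article's data) and a hand-written `pearson_r` helper, both of which are assumptions for illustration. It compares the coefficient computed on all rows with the coefficient computed on rows where the price is positive.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: 0.0 marks a placeholder row for a non-trading day.
close = [40.0, 40.5, 0.0, 0.0, 41.0, 41.2, 40.8, 0.0, 0.0, 41.5]
sentiment = [0.10, 0.15, 0.30, 0.28, 0.12, 0.18, 0.09, 0.33, 0.31, 0.20]

# Correlation over every row, placeholders included.
r_all = pearson_r(close, sentiment)

# Correlation over rows with a real closing price only.
trading = [(c, s) for c, s in zip(close, sentiment) if c > 0]
r_trading = pearson_r([c for c, _ in trading], [s for _, s in trading])

print(f"all rows:     r = {r_all:.3f}")
print(f"trading days: r = {r_trading:.3f}")
```

In this toy data the sign of the coefficient flips once the placeholder rows are removed. Whether the same happens with the real data is not known from the output shown here. In the notebook, the equivalent check would be to filter `merged_df` to rows where `Close` is greater than 0 before calling `pearsonr`.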

### B. Granger Causality Tests:
The Granger causality test checks whether past values of one time series improve the prediction of another. We tested lags 1 to 5. `grangercausalitytests` tests whether the series in the second column Granger-causes the series in the first column, so with the column order used above (`['Average Score', 'Close']`) the statistics below measure whether past closing prices help predict the sentiment score. The null hypothesis of no Granger causality is rejected at every tested lag, with p-values well below 0.05.

The key results for each lag are:

* Lag 1: F=7.7905, p=0.0055
* Lag 2: F=30.3268, p=0.0000
* Lag 3: F=17.9410, p=0.0000
* Lag 4: F=25.6827, p=0.0000
* Lag 5: F=24.3062, p=0.0000

The p-values are below 0.01 at every lag (values printed as 0.0000 are smaller than 0.00005), which points to a consistent predictive relationship between the two series in the tested direction. To test whether the average Twitter sentiment score Granger-causes Apple's stock price, the test has to be rerun with `Close` as the first column and `Average Score` as the second.

#### Interpretation of results:
* The negative correlation suggests that higher sentiment scores are associated with lower stock prices, and vice versa.
* The Granger causality test indicates a predictive relationship between the two series. Granger causality means that one series helps forecast the other; it does not establish that one causes the other.

Overall, the results show a statistically significant negative correlation between Apple's stock price and the average Twitter sentiment score, and the Granger tests show that the two series carry predictive information about each other in the tested direction. Once the direction from sentiment to price is confirmed by a rerun, this kind of analysis can be valuable for traders and analysts looking to incorporate social media sentiment into their stock price prediction models.
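
A Granger test is directional, so it helps to make the "cause" and "effect" series explicit. The sketch below is a hand-rolled, lag-1 version of the F-test on synthetic data, written without statsmodels so the arithmetic is visible. The helpers `ols_rss` and `granger_f_lag1`, the seed, and the simulated series are all assumptions made for illustration. The effect series is simulated as returns rather than price levels, because Granger tests assume stationary inputs.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations, Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f_lag1(cause, effect):
    """F statistic for 'cause Granger-causes effect' with a single lag."""
    y = effect[1:]
    # Restricted model: effect[t] ~ const + effect[t-1]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    # Unrestricted model adds the lagged cause: ... + cause[t-1]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, y)
    rss_u = ols_rss(unrestricted, y)
    dof = len(y) - 3  # observations minus parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / dof)  # one restriction, so no extra divisor


# Synthetic data (hypothetical): yesterday's sentiment drives today's return.
rng = random.Random(0)
sentiment = [rng.gauss(0, 1) for _ in range(200)]
returns = [0.0] + [0.8 * sentiment[t - 1] + rng.gauss(0, 0.5) for t in range(1, 200)]

f_forward = granger_f_lag1(cause=sentiment, effect=returns)
f_reverse = granger_f_lag1(cause=returns, effect=sentiment)
print(f"sentiment -> returns: F = {f_forward:.2f}")
print(f"returns -> sentiment: F = {f_reverse:.2f}")
```

Because the simulated returns depend on the previous day's sentiment and not the other way round, the F statistic is large only in the sentiment-to-returns direction. With statsmodels, the same direction is selected by column order: `grangercausalitytests` expects the effect series in the first column and the candidate cause in the second, for example `merged_df[['Close', 'Average Score']]` to test whether sentiment helps predict the closing price.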

### 5. Visualizing Data with LightningChart

#### Sample result images:

![Chart](./images/11.png)
![Chart](./images/10.png)
![Chart](./images/9.png)
![Chart](./images/8.png)
![Chart](./images/7.png)
![Chart](./images/6.png)
![Chart](./images/5.png)
![Chart](./images/4.png)
![Chart](./images/3.png)
![Chart](./images/2.png)
![Chart](./images/1.png)

### 6. Conclusion

#### 6.1 Recap of creating the application
In this project, we covered the essentials of performing stock market sentiment analysis using Python and visualizing the results with LightningChart. We discussed setting up the Python environment, loading and preprocessing data, and creating insightful visualizations with LightningChart.

#### 6.2 Why is it useful?
Stock market sentiment analysis using Python provides traders and analysts with a deeper understanding of market trends driven by public sentiment. This approach complements traditional analysis methods, offering a more comprehensive market view.

#### 6.3 Benefits of using LightningChart Python for visualizing data
LightningChart Python stands out for its high performance and flexibility, making it well suited to visualizing large and complex datasets in real time. Its wide range of features and customization options enables users to create informative and visually appealing charts, enhancing the overall analysis process.

By combining sentiment analysis with advanced visualization tools like LightningChart, traders can gain valuable insights and make more informed decisions in the dynamic world of stock trading.