<div style="padding: 35px;color:white;margin:10;font-size:200%;text-align:center;display:fill;border-radius:10px;overflow:hidden;background-image: url(https://images.pexels.com/photos/7078619/pexels-photo-7078619.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:black'><strong>EABL STOCK PRICE PREDICTION </strong></span></b> </div> 

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;overflow:hidden;background-image: url(https://w0.peakpx.com/wallpaper/957/661/HD-wallpaper-white-marble-white-stone-texture-marble-stone-background-white-stone.jpg)"><b><span style='color:black'> Business Understanding</span></b> </div>
East African Breweries Limited (EABL) has a rich history rooted in East Africa's economic and social fabric. Established in 1922, EABL has grown to become a leading beverage company, contributing significantly to the region's economy. Over the years, EABL has built a portfolio of iconic brands, becoming synonymous with quality and innovation in the brewing industry. EABL holds a pivotal role in the East African beverage market, offering a diverse range of alcoholic and non-alcoholic products. Its flagship brands, including Tusker Lager and Guinness, have become household names, reflecting the company's commitment to quality craftsmanship.

Despite its historical success, EABL faces significant challenges from an evolving regulatory landscape. The government's recent directive to collect taxes within 24 hours after goods leave the store, influenced by the Finance Act of 2023, has introduced a new layer of complexity and prompted investors and stakeholders to reevaluate their strategies. To help navigate this uncertainty, our goal is to provide investors with a comprehensive analysis and forecasting model for EABL's stock prices, incorporating various economic indicators and sentiments.

### <b> <span style='color:#16C2D5'>|</span> Problem statement</b>
The dynamic regulatory environment and recent government directives have created uncertainty in the market, affecting EABL's stock performance. Investors are seeking ways to cushion themselves from potential market crashes and make informed decisions in the face of evolving economic conditions. To address this, we aim to develop a multifaceted analysis, including time series forecasting, sentiment analysis, volatility insights, abnormal trade volume investigation, dividends analysis, trend analysis, and lag analysis of market indicators.

### <b> <span style='color:#16C2D5'>|</span> Objectives</b>
1. **Time Series Forecasting:**
Objective: Develop an accurate time series forecasting model for EABL's stock prices. Incorporate Twitter (stocks) sentiments, inflation rates, exchange rates, yearly unemployment rates, and EABL dividend payouts.

2. **Sentiment Analysis:**
Objective 1: Perform a sentiment analysis of EABL products as well as market sentiments (Twitter).
Objective 2: Conduct sentiment analysis on news articles and Instagram.
Objective 3: Identify key sentiment drivers.

3. **Volatility Assessment:**
Objective 1: Uncover EABL's stock volatility patterns for risk assessment.
Objective 2: Develop a risk model to identify and quantify potential risks for managing investment strategies.
Objective 3: Investigate abnormal trade volume spikes and analyze their causes and implications.

4. **Dividends Analysis:**
Objective: Analyze the rates of EABL dividends payout.

5. **Trend Analysis:**
Objective: Analyze trends, seasonality, and autocorrelation patterns in EABL stock data.

6. **Lag Analysis:**
Objective: Analyze the lag effects of market indicators (inflation rates, GDP, USD exchange rates, unemployment rates) on EABL stocks.


### <b> <span style='color:#16C2D5'>|</span> Possible challenges</b>
1. Regulatory Uncertainty: The recent changes in tax collection directives pose a challenge for EABL's financial stability. Understanding and predicting the impact of these changes on stock prices is challenging but crucial.

2. Data Integration: Incorporating diverse data sources like Twitter sentiments, inflation rates, exchange rates, unemployment rates, and dividends payout requires efficient data integration and cleaning to ensure the accuracy and reliability of the analysis.

3. Sentiment Analysis: Analyzing sentiments from various sources (Twitter, news articles, Instagram) poses the challenge of dealing with unstructured data and ensuring that sentiments are accurately captured and interpreted.

4. Market Volatility: Predicting stock prices and identifying potential risks related to EABL's stock volatility demands a robust model capable of handling the inherent unpredictability in financial markets.

5. Complexity of Market Indicators: Understanding the lag effects of market indicators such as inflation rates, GDP, USD exchange rates, and unemployment rates on EABL stocks requires a sophisticated analytical approach.

6. Scanty data on market indicators: several indicators are only available quarterly.

7. Limited market sentiment data (Twitter).

### <b> <span style='color:#16C2D5'>|</span> Conclusion</b>
In conclusion, EABL's current challenges necessitate a comprehensive approach to provide investors with the insights needed to make informed decisions. By combining traditional financial indicators with emerging data sources and advanced analytics, we aim to offer a holistic view of the market dynamics surrounding EABL. The analysis will not only address current uncertainties but also equip investors with tools to anticipate and navigate future trends in the stock market.

In [1]:
# Data manipulation libraries
import pandas as pd 
import numpy as np
 
# visualization libraries
import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline 

# statistical libraries
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
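# ARIMA lives in statsmodels.tsa.arima.model; the old arima_model module was removed in statsmodels 0.13
from statsmodels.tsa.arima.model import ARIMA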

# machine learning libraries
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler 

#Finance visualization libraries. 
from ta.volatility import AverageTrueRange
import mplfinance as mpf

#Sentiment analysis libraries. 
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS




ModuleNotFoundError: No module named 'statsmodels'

In [None]:
# pip install statsmodels

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;overflow:hidden;background-image: url(https://w0.peakpx.com/wallpaper/957/661/HD-wallpaper-white-marble-white-stone-texture-marble-stone-background-white-stone.jpg)"><b><span style='color:black'> Data Understanding</span></b> </div>

In [None]:
# Load the final_merge.csv file into a Pandas DataFrame.
df = pd.read_csv("MergedData/final_merge.csv")
df.head()

In [None]:
# Check the summary information of the dataframe.
df.info()

In [None]:
# Check for missing values.
df.isna().sum()

In [None]:
df.columns

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;overflow:hidden;background-image: url(https://w0.peakpx.com/wallpaper/957/661/HD-wallpaper-white-marble-white-stone-texture-marble-stone-background-white-stone.jpg)"><b><span style='color:black'> Data Preprocessing and EDA</span></b> </div>

In [None]:
# Convert the 'Date' column to datetime format.
df['Date'] = pd.to_datetime(df['Date'])

In [None]:
df['PE_Ratio'] = df['Close'] / df['Earnings Per Share']
df['Dividend_Yield'] = (df['Dividends per share'] / df['Close']) * 100


In [None]:
#Moving Averages
df['50_Day_MA'] = df['Close'].rolling(window=50).mean()
df['200_Day_MA'] = df['Close'].rolling(window=200).mean()

In [None]:
#Percentage Changes
df['Daily_Percentage_Change'] = df['Close'].pct_change() * 100


### <b> <span style='color:#16C2D5'>|</span> Volatility Analysis</b>

In [None]:
volatility = df['Close'].std()
volatility

The standard deviation of 59.23 for EABL's closing prices measures how widely prices are dispersed around their mean over the whole sample, in the stock's trading currency. Because it is computed on price levels rather than returns, it reflects long-run trends and level shifts as well as day-to-day fluctuations, so a large value does not by itself show that prices move frequently. It is best read alongside return-based measures, such as the historical volatility below, and other risk metrics when assessing the stock's historical price dynamics and associated risks.

In [None]:
# Historical volatility
historical_volatility = df['Close'].pct_change().std()
historical_volatility

The historical volatility of approximately 2.2% is the standard deviation of the daily percentage changes in EABL's closing price over the sample period. It describes how far a typical day's return deviates from the average daily return, not the average daily change itself. This measure summarises past price fluctuations and serves as an indicator of market risk: a higher historical volatility suggests a more variable and potentially riskier stock.
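
To make the return-based measure concrete, the sketch below computes daily return volatility and scales it to an annual figure. It runs on a small made-up price series rather than the EABL data. The helper name `return_volatility` is hypothetical, and the 252-trading-days-per-year convention is an assumption that may not match the exchange's actual calendar.

```python
import numpy as np
import pandas as pd


def return_volatility(close, periods_per_year=252):
    """Return (daily, annualized) volatility of simple returns, as fractions.

    periods_per_year=252 is an assumed trading-day count, not taken from the data.
    """
    daily_returns = close.pct_change().dropna()
    daily_vol = daily_returns.std()  # sample standard deviation (ddof=1)
    annual_vol = daily_vol * np.sqrt(periods_per_year)
    return float(daily_vol), float(annual_vol)


# Illustrative prices only, not EABL data.
prices = pd.Series([100.0, 102.0, 99.0, 101.0, 103.0])
daily_vol, annual_vol = return_volatility(prices)
print(f"Daily volatility: {daily_vol:.2%}, annualized: {annual_vol:.2%}")
```

On the notebook's data, the same helper could be called as `return_volatility(df['Close'])`, assuming the rows are sorted by date.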

In [None]:
atr_window = 30  # ATR look-back window in trading periods; adjust as needed
atr = AverageTrueRange(high=df['High'], low=df['Low'], close=df['Close'], window=atr_window).average_true_range()

# Print the calculated ATR values
print(atr)

The Average True Range (ATR) measures market volatility on each date as a smoothed average of the true range, which is the largest of the day's high-low range and the absolute gaps between the previous close and the current high or low. ATR is expressed in price units and is never negative: higher values indicate wider price swings, and values near zero indicate little movement. The first rows, before the window has filled, carry no information and may appear as zero or missing depending on the library version. Tracking ATR over time helps identify periods of increased market activity and shows how volatility has changed over the period the dataset covers.
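
The definition of the true range can be checked by hand with a few lines of pandas. The sketch below is illustrative only. The helper names `true_range` and `simple_atr` are hypothetical, the bars are made-up values, and the ATR here is a plain rolling mean, which is a simplification: libraries commonly apply Wilder's smoothing instead, so the numbers will not match the `ta` output exactly.

```python
import pandas as pd


def true_range(high, low, close):
    """Largest of: high - low, |high - previous close|, |low - previous close|."""
    prev_close = close.shift(1)
    candidates = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    )
    # max() skips the NaNs produced by the missing previous close on the first row
    return candidates.max(axis=1)


def simple_atr(high, low, close, window=14):
    """Simplified ATR: rolling mean of the true range (not Wilder's smoothing)."""
    return true_range(high, low, close).rolling(window=window).mean()


# Illustrative bars only, not EABL data.
bars = pd.DataFrame({
    "High": [10.0, 11.0, 12.0, 11.5],
    "Low": [9.0, 10.0, 10.5, 10.0],
    "Close": [9.5, 10.5, 11.0, 10.2],
})
tr = true_range(bars["High"], bars["Low"], bars["Close"])
atr_simple = simple_atr(bars["High"], bars["Low"], bars["Close"], window=2)
print(pd.DataFrame({"true_range": tr, "atr": atr_simple}))
```

Every term in the true range is a difference of a high and a low or an absolute value, which is why the result cannot be negative.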

In [None]:
# Use the 'Date' column as the datetime index.
# (Converting the default integer index would yield 1970-era timestamps.)
df.index = pd.to_datetime(df['Date'].to_numpy())

# Calculate Average True Range (ATR) for volatility
df['atr'] = AverageTrueRange(high=df['High'], low=df['Low'], close=df['Close'], window=14).average_true_range()

# Time series plot with volatility
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Close'], label='EABL Stock Prices')
plt.plot(df.index, df['atr'], label='Volatility (ATR)', color='orange') 
plt.xlabel('Date')
plt.ylabel('Closing Price / Volatility')
plt.title('EABL Stock Prices Over Time with Volatility')
plt.legend()
plt.show()

In [None]:
# Volatility clustering plot
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Close'], label='EABL Stock Prices')
plt.plot(df.index, df['Close'].rolling(window=30).std(), label='Rolling Volatility (30 days)')
plt.xlabel('Date')
plt.ylabel('Closing Price / Volatility')
plt.title('Volatility Clustering in EABL Stock Prices')
plt.legend()
plt.show()

### <b> <span style='color:#16C2D5'>|</span> Stock trends</b>

In [None]:
# Stock Price Trends
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Close'], label='Close Price')
plt.title('Stock Price Trends Over Time')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()

In [None]:
mpf.plot(df.set_index('Date'), type='candle', style='yahoo', title='Candlestick Chart')


### <b> <span style='color:#16C2D5'>|</span> Moving averages</b>

The 50-day and 200-day moving averages of the 'Close' prices, stored in the '50_Day_MA' and '200_Day_MA' columns, are plotted against the closing price.

In [None]:
# Moving Averages
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Close'], label='Close Price')
plt.plot(df['Date'], df['50_Day_MA'], label='50-Day MA')
plt.plot(df['Date'], df['200_Day_MA'], label='200-Day MA')
plt.title('Moving Averages Over Time')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

### <b> <span style='color:#16C2D5'>|</span> Financial ratios</b>

The Price-to-Earnings (P/E) ratio and the Dividend Yield, stored in the 'PE_Ratio' and 'Dividend_Yield' columns, are plotted together over time.

In [None]:
#Financial Ratios Over Time
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['PE_Ratio'], label='P/E Ratio')
plt.plot(df['Date'], df['Dividend_Yield'], label='Dividend Yield')
plt.title('Financial Ratios Over Time')
plt.xlabel('Date')
plt.ylabel('Ratio')
plt.legend()
plt.show()


Calculate the Price to Earnings (P/E) ratio by dividing the 'Close' prices by the 'Earnings Per Share' and add it as a new column 'PE_Ratio'.
Compute the Dividend Yield by dividing the 'Dividends per share' by the 'Close' prices and multiplying by 100, adding it as a new column 'Dividend_Yield'.
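
As a small worked example of the two formulas, the sketch below applies them to made-up values. The helper `add_valuation_ratios` is hypothetical. Replacing zeros with NaN is an added assumption, not part of the notebook's original calculation: it keeps rows with zero earnings or a zero price from producing infinite ratios.

```python
import numpy as np
import pandas as pd


def add_valuation_ratios(frame):
    """Return a copy of frame with 'PE_Ratio' and 'Dividend_Yield' (%) columns."""
    out = frame.copy()
    # Assumption: treat zero denominators as missing rather than dividing by zero.
    eps = out["Earnings Per Share"].replace(0, np.nan)
    close = out["Close"].replace(0, np.nan)
    out["PE_Ratio"] = out["Close"] / eps
    out["Dividend_Yield"] = out["Dividends per share"] / close * 100
    return out


# Illustrative values only, not EABL data.
sample = pd.DataFrame({
    "Close": [150.0, 120.0],
    "Earnings Per Share": [10.0, 0.0],
    "Dividends per share": [7.5, 6.0],
})
ratios = add_valuation_ratios(sample)
print(ratios[["PE_Ratio", "Dividend_Yield"]])
```

In the first row, a price of 150 with earnings of 10 per share gives a P/E of 15, and a dividend of 7.5 on a price of 150 gives a yield of about 5%. The second row has zero earnings, so its P/E is left missing.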

In [None]:
# Plotting
plt.figure(figsize=(12, 8))

# P/E Ratio Plot (top panel of a 2-row grid)
plt.subplot(2, 1, 1)
plt.plot(df['Date'], df['PE_Ratio'], label='P/E Ratio', color='blue')
plt.title('P/E Ratio Over Time')
plt.xlabel('Date')
plt.ylabel('P/E Ratio')
plt.legend()

# Dividend Yield Plot (bottom panel)
plt.subplot(2, 1, 2)
plt.plot(df['Date'], df['Dividend_Yield'], label='Dividend Yield', color='green')
plt.title('Dividend Yield Over Time')
plt.xlabel('Date')
plt.ylabel('Dividend Yield (%)')
plt.legend()

plt.tight_layout()
plt.show()


### <b> <span style='color:#16C2D5'>|</span> Lag analysis</b> 

**Close prices:**

- Introduced a lagged feature, Close_Lag, capturing historical trends in closing prices.
- Lagged by 1 period to observe trends over consecutive time points.

**Additional variables:**

Lag features created for various financial indicators:

- Volume_Lag: lagged trading volumes.
- Average_Lag: lagged average values.
- Dividends_Lag: lagged dividends per share.
- Earnings_Lag: lagged earnings per share.

In [None]:
# Generate lag features for the 'Close' prices to capture historical trends.
lag_periods = 1  # Adjust the lag period as needed
df['Close_Lag'] = df['Close'].shift(lag_periods)

In [None]:
lag_periods = 1  # Adjust the lag period as needed

# Create lag features for additional variables
df['Volume_Lag'] = df['Volume'].shift(lag_periods)
df['Average_Lag'] = df['Average'].shift(lag_periods)
df['Dividends_Lag'] = df['Dividends per share'].shift(lag_periods)
df['Earnings_Lag'] = df['Earnings Per Share'].shift(lag_periods)

In [None]:
plt.figure(figsize=(14, 8))

# Original 'Close' and lagged 'Close'
plt.subplot(2, 3, 1)
plt.plot(df['Date'], df['Close'], label='Original Close', color='blue')
plt.plot(df['Date'], df['Close_Lag'], label='Close Lag (1 period)', linestyle='dashed', color='red')
plt.title('Close Prices with Lag')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()

# Original 'Volume' and lagged 'Volume'
plt.subplot(2, 3, 2)
plt.plot(df['Date'], df['Volume'], label='Original Volume', color='green')
plt.plot(df['Date'], df['Volume_Lag'], label='Volume Lag (1 period)', linestyle='dashed', color='orange')
plt.title('Volume with Lag')
plt.xlabel('Date')
plt.ylabel('Volume')
plt.legend()

# Original 'Average' and lagged 'Average'
plt.subplot(2, 3, 3)
plt.plot(df['Date'], df['Average'], label='Original Average', color='purple')
plt.plot(df['Date'], df['Average_Lag'], label='Average Lag (1 period)', linestyle='dashed', color='pink')
plt.title('Average with Lag')
plt.xlabel('Date')
plt.ylabel('Average')
plt.legend()

# Original 'Dividends per share' and lagged 'Dividends per share'
plt.subplot(2, 3, 4)
plt.plot(df['Date'], df['Dividends per share'], label='Original Dividends', color='cyan')
plt.plot(df['Date'], df['Dividends_Lag'], label='Dividends Lag (1 period)', linestyle='dashed', color='brown')
plt.title('Dividends with Lag')
plt.xlabel('Date')
plt.ylabel('Dividends per share')
plt.legend()

# Original 'Earnings Per Share' and lagged 'Earnings Per Share'
plt.subplot(2, 3, 5)
plt.plot(df['Date'], df['Earnings Per Share'], label='Original Earnings', color='gray')
plt.plot(df['Date'], df['Earnings_Lag'], label='Earnings Lag (1 period)', linestyle='dashed', color='yellow')
plt.title('Earnings with Lag')
plt.xlabel('Date')
plt.ylabel('Earnings Per Share')
plt.legend()

plt.tight_layout()
plt.show()

Interpretation:

**Close Prices with Lag**: A comparison of original closing prices with lagged prices.
Provides insights into how closing prices change over consecutive periods.

**Volume with Lag**: Examines trends in trading volumes by comparing original and lagged volumes.
Helps identify patterns and shifts in trading activity.

**Average with Lag**: Highlights trends in average values by comparing original and lagged averages.
Useful for understanding changes in average metrics.


**Dividends with Lag**: A comparison of original dividend payouts with lagged dividends.
Enables the identification of trends in dividend distributions.


**Earnings with Lag**: Examines trends in earnings per share by comparing original and lagged earnings.
Provides insights into historical earnings patterns.

Because each lagged series is simply the original shifted by one period, the overlaid lines mainly confirm that the shift was applied as intended. The lagged columns become informative when they are used as predictors or compared against the closing price, which supports more informed financial analysis of how past values of each indicator relate to current prices.
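
One way to quantify a lag effect, rather than judge it from overlaid lines, is to correlate the target with an indicator shifted by several candidate lags. The sketch below is a minimal illustration on synthetic data in which the target is constructed to follow the indicator with a two-period delay. The helper `lagged_correlations` is hypothetical, and correlation alone does not establish that an indicator drives the price.

```python
import pandas as pd


def lagged_correlations(target, indicator, max_lag=5):
    """Pearson correlation of target at time t with indicator at time t - lag."""
    return pd.Series(
        {lag: target.corr(indicator.shift(lag)) for lag in range(max_lag + 1)},
        name="correlation",
    )


# Synthetic data: the target is built from the indicator two periods earlier.
indicator = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0])
target = indicator.shift(2) * 2.0 + 1.0

corrs = lagged_correlations(target, indicator, max_lag=3)
best_lag = corrs.idxmax()
print(corrs.round(3))
print("Lag with the highest correlation:", best_lag)
```

On the notebook's data, a call such as `lagged_correlations(df['Close'], df['Volume'], max_lag=10)` would follow the same pattern, assuming the rows are in chronological order.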


### <b> <span style='color:#16C2D5'>|</span> Trend analysis</b> 

In [None]:
# Calculate and plot the percentage changes in stock prices
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'].pct_change() * 100, label='Daily Percentage Change', color='purple')
plt.title('Daily Percentage Change in Stock Prices')
plt.xlabel('Date')
plt.ylabel('Percentage Change')
plt.legend()
plt.show()

In [None]:
# df['Price_Diff'] = df['Close'].diff()  # Assuming 'Close' column is used for daily closing prices

# plt.figure(figsize=(12, 6))
# sns.lineplot(x='Date', y='Price_Diff', data=df, label='Daily Price Difference', color='green')
# plt.axhline(0, color='black', linestyle='--', linewidth=1, label='Zero Line')
# plt.title('EABL Stock Daily Price Differences Over Time')
# plt.xlabel('Date')
# plt.ylabel('Price Difference')
# plt.legend()
# plt.tight_layout()
# plt.show()