<div style="padding: 35px;color:white;margin:10;font-size:200%;text-align:center;display:fill;border-radius:10px;overflow:hidden;background-image: url(https://images.pexels.com/photos/7078619/pexels-photo-7078619.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:black'><strong>EABL STOCK PRICE PREDICTION </strong></span></b> </div> 

### <b> <span style='color:#16C2D5'>|</span> Business Objectives</b>
1. Build a robust time series model leveraging market indicators to forecast future EABL stock prices. 
2. Investigate the viability of investing in EABL stock. 
3. Build an anomaly detection system to identify unusual or unexpected patterns in EABL stock prices. 

In [2]:
import pandas as pd 
import numpy as np
 
import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline 

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA  # the older statsmodels.tsa.arima_model module is no longer available in recent releases

#### DATA UNDERSTANDING
1. Load and inspect the data

   Check the structure and completeness of the dataset.

2. Data preprocessing 

   Handle missing values, convert data types if necessary, and set the date column as the index for time series analysis.

3. Volatility Insights

   Investigate and model the volatility of EABL stock prices over time.

4. Abnormal Trade Volume Analysis

   Identify and analyze spikes or drops in trade volumes.

5. Dividend Analysis

   Examine the trend in dividend payouts.

6. Time Series Decomposition

   Decompose the time series into trend, seasonal, and residual components to better understand the underlying patterns.

7. Lag Analysis

   Investigate whether market indicators affect EABL stock prices with a time lag.

8. Stock Valuation

   Investigate how stock prices vary with quarterly unemployment rates.

In [3]:
# Loading and Previewing the datasets
merged_data = pd.read_csv('Data/final_merge.csv')

print("\nColumns and First 5 Rows in Merged Data:")
print(merged_data.head())



Columns and First 5 Rows in Merged Data:
   Unnamed: 0       Date    Open    High    Low  Close  Average  Volume  \
0           0  1/31/2024  104.00  111.00  104.0  110.0   106.00   42000   
1           1  1/30/2024  105.00  105.00  101.0  104.0   104.00   15600   
2           2  1/29/2024  105.00  105.00   99.0  103.5   100.00  596100   
3           3  1/26/2024  116.25  116.25  100.0  100.0   104.50   60500   
4           4  1/25/2024  119.75  120.00  118.0  118.0   118.25    5700   

   Month  Year  Day  Annual Average Inflation  12-Month Inflation   Mean  \
0      1  2024   31                       NaN                 6.9  161.0   
1      1  2024   30                       NaN                 6.9  161.0   
2      1  2024   29                       NaN                 6.9  162.0   
3      1  2024   26                       NaN                 6.9  162.0   
4      1  2024   25                       NaN                 6.9  163.0   

   Amount  Dividends per share  Earnings Per Share

In [4]:
# Count missing values per column in final_merge.csv
print("\nMissing Values:")
print(merged_data.isnull().sum())



Missing Values:
Unnamed: 0                    0
Date                          0
Open                          0
High                          0
Low                           0
Close                         0
Average                       0
Volume                        0
Month                         0
Year                          0
Day                           0
Annual Average Inflation     21
12-Month Inflation            0
Mean                         12
Amount                      112
Dividends per share           0
Earnings Per Share           21
dtype: int64


Four columns contain missing values. 'Annual Average Inflation' has 21 missing entries, so the average annual inflation rate is unavailable for those periods. 'Mean' has 12 missing values, meaning no mean exchange rate was recorded for those dates. 'Amount' has 112 missing values, so the monetary amount is not recorded for those rows. 'Earnings Per Share' has 21 missing values. All other columns are complete.

In [5]:
# Fill with the column mean; plain assignment avoids chained in-place fillna, which newer pandas versions warn about
for col in ['Annual Average Inflation', 'Mean', 'Earnings Per Share']:
    merged_data[col] = merged_data[col].fillna(merged_data[col].mean())

# Fill missing values in 'Amount' with the median, which is less sensitive to outliers
merged_data['Amount'] = merged_data['Amount'].fillna(merged_data['Amount'].median())

# Display the cleaned dataset info
print("Cleaned Dataset Info:")
print(merged_data.info())

Cleaned Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4353 entries, 0 to 4352
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Unnamed: 0                4353 non-null   int64  
 1   Date                      4353 non-null   object 
 2   Open                      4353 non-null   float64
 3   High                      4353 non-null   float64
 4   Low                       4353 non-null   float64
 5   Close                     4353 non-null   float64
 6   Average                   4353 non-null   float64
 7   Volume                    4353 non-null   int64  
 8   Month                     4353 non-null   int64  
 9   Year                      4353 non-null   int64  
 10  Day                       4353 non-null   int64  
 11  Annual Average Inflation  4353 non-null   float64
 12  12-Month Inflation        4353 non-null   float64
 13  Mean                      4353 non-null  

The missing values in 'Annual Average Inflation', 'Mean', 'Amount', and 'Earnings Per Share' have been imputed. 'Annual Average Inflation', 'Mean' (exchange rates), and 'Earnings Per Share' were filled with their column means, which preserves each column's overall average but not its variation over time. 'Amount', which holds monetary values, was filled with the median to limit the influence of potential outliers. The dataset now contains 4353 entries with no missing values in these columns, so the financial and economic indicators can be analyzed without gaps.
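
The choice between mean and median imputation can be illustrated with a small sketch. The values below are made up for illustration (they are not taken from final_merge.csv); the point is only that a single extreme value pulls the mean fill far from the typical value, while the median fill stays close to it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one extreme value and two gaps
amount = pd.Series([10.0, 12.0, 11.0, np.nan, 500.0, np.nan], name="Amount")

mean_filled = amount.fillna(amount.mean())      # gaps become 133.25, pulled up by 500
median_filled = amount.fillna(amount.median())  # gaps become 11.5, near the typical value

print(pd.DataFrame({"mean_fill": mean_filled, "median_fill": median_filled}))
```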

In [6]:
# Convert 'Date' column to datetime format
merged_data['Date'] = pd.to_datetime(merged_data['Date'])

### Feature Engineering

In [7]:
# Extract day of week, month, and year from the 'Date' column
merged_data['Day_of_Week'] = merged_data['Date'].dt.dayofweek
merged_data['Month'] = merged_data['Date'].dt.month
merged_data['Year'] = merged_data['Date'].dt.year

# Display the updated dataset info
print("Updated Dataset Info:")
print(merged_data.info())

Updated Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4353 entries, 0 to 4352
Data columns (total 18 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Unnamed: 0                4353 non-null   int64         
 1   Date                      4353 non-null   datetime64[ns]
 2   Open                      4353 non-null   float64       
 3   High                      4353 non-null   float64       
 4   Low                       4353 non-null   float64       
 5   Close                     4353 non-null   float64       
 6   Average                   4353 non-null   float64       
 7   Volume                    4353 non-null   int64         
 8   Month                     4353 non-null   int64         
 9   Year                      4353 non-null   int64         
 10  Day                       4353 non-null   int64         
 11  Annual Average Inflation  4353 non-null   float64       
 12

In [8]:
# Compute a 5-row rolling average of 'Close' to capture trends over time
window_size = 5
merged_data['Close_MA'] = merged_data['Close'].rolling(window=window_size).mean()

print("First few rows with Rolling Averages:")
print(merged_data[['Date', 'Close', 'Close_MA']].head())

First few rows with Rolling Averages:
        Date  Close  Close_MA
0 2024-01-31  110.0       NaN
1 2024-01-30  104.0       NaN
2 2024-01-29  103.5       NaN
3 2024-01-26  100.0       NaN
4 2024-01-25  118.0     107.1


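A 5-row rolling average smooths short-term fluctuations and highlights longer-term trends in the 'Close' values. Because the rows are ordered newest first, each window here covers the current row and the four later trading days above it: the value 107.1 on 2024-01-25 averages the closes from 2024-01-25 through 2024-01-31. Sorting by date in ascending order first gives a conventional trailing average.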
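
A trailing average can be sketched as follows, assuming the rows are first sorted from oldest to newest. The five closes are copied from the preview above; the `prices` frame is an illustrative stand-in for `merged_data`, not part of the original notebook.

```python
import pandas as pd

# First five rows of the preview above (stored newest first)
prices = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-01-31", "2024-01-30", "2024-01-29", "2024-01-26", "2024-01-25"]
    ),
    "Close": [110.0, 104.0, 103.5, 100.0, 118.0],
})

# Sort oldest first so each window covers the current day and the four days before it
prices = prices.sort_values("Date").reset_index(drop=True)
prices["Close_MA"] = prices["Close"].rolling(window=5).mean()

print(prices)
```

With this ordering the first complete window ends on 2024-01-31 rather than 2024-01-25, so no value depends on later prices.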

In [9]:
# Calculating the Volume Weighted Average Price (VWAP) which considers both price and volume
merged_data['VWAP'] = (merged_data['Close'] * merged_data['Volume']).cumsum() / merged_data['Volume'].cumsum()
merged_data['VWAP']

0       110.000000
1       108.375000
2       103.929555
3       103.596682
4       103.710724
           ...    
4348    218.960968
4349    218.952812
4350    218.938897
4351    218.929589
4352    218.927385
Name: VWAP, Length: 4353, dtype: float64

The 'VWAP' (Volume Weighted Average Price) values are the average price at which East African Breweries Limited (EABL) stock has traded, weighting each closing price by that day's trading volume. Because the calculation uses cumulative sums, each value is a running average over all rows up to that point in the file's order, which starts at the most recent date. 
VWAP helps traders and investors understand the average price level at which a stock has been transacted, factoring in trading activity.
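
As a sketch of how the calculation behaves, the example below computes a cumulative VWAP and, for comparison, a rolling VWAP on three rows taken from the preview above, arranged oldest first. The 2-day window and the column names `VWAP_cum` and `VWAP_roll` are assumptions made for illustration.

```python
import pandas as pd

trades = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-29", "2024-01-30", "2024-01-31"]),
    "Close": [103.5, 104.0, 110.0],
    "Volume": [596100, 15600, 42000],
}).set_index("Date")  # already oldest first

traded_value = trades["Close"] * trades["Volume"]

# Cumulative VWAP: one running average from the first row onward
trades["VWAP_cum"] = traded_value.cumsum() / trades["Volume"].cumsum()

# Rolling VWAP over a hypothetical 2-day window
window = 2
trades["VWAP_roll"] = (
    traded_value.rolling(window).sum() / trades["Volume"].rolling(window).sum()
)

print(trades)
```

The cumulative version is dominated by high-volume days early in the series, while the rolling version tracks recent trading only.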

In [12]:
# Computing the percentage change for relevant columns
merged_data['Close_Pct_Change'] = merged_data['Close'].pct_change() * 100
merged_data['Close_Pct_Change']

0             NaN
1       -5.454545
2       -0.480769
3       -3.381643
4       18.000000
          ...    
4348     0.714286
4349    -0.709220
4350     1.428571
4351    -1.408451
4352     2.142857
Name: Close_Pct_Change, Length: 4353, dtype: float64

The 'Close_Pct_Change' column contains the percentage change in the EABL closing price relative to the previous row. Because the rows run from newest to oldest, the previous row is the following trading day, so each value is the change going backward in time: -5.45% on 2024-01-30 compares the close of 104.0 with 110.0 on 2024-01-31. The first row is NaN (Not a Number) because it has no previous row. For conventional daily returns, sort the data by date in ascending order before calling pct_change. These changes are useful for analyzing volatility and trends in the stock's price movements over time.
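
The effect of row order on pct_change can be checked with a short sketch. The three closes are taken from the preview above; the variable names are illustrative.

```python
import pandas as pd

close = pd.Series(
    [110.0, 104.0, 103.5],
    index=pd.to_datetime(["2024-01-31", "2024-01-30", "2024-01-29"]),
    name="Close",
)

# Newest first (as stored): each value is compared with the following trading day
as_stored = close.pct_change() * 100

# Oldest first: each value is compared with the preceding trading day
chronological = close.sort_index().pct_change() * 100

print(as_stored.round(3))
print(chronological.round(3))
```

The same pair of days yields about -5.45% in the stored order and about +5.77% in chronological order, so the two are not simply sign-flipped versions of each other.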

In [16]:
# Convert 'Dividends per share' column to datetime
merged_data['Dividends per share'] = pd.to_datetime(merged_data['Dividends per share'])

# Subtract the converted column from 'Date' (caution: the column holds per-share amounts, not announcement dates)
merged_data['Days_Since_Last_Dividend'] = (merged_data['Date'] - merged_data['Dividends per share']).dt.days

# Display the new column
print("Days_Since_Last_Dividend:")
print(merged_data['Days_Since_Last_Dividend'])

Days_Since_Last_Dividend:
0       19753
1       19752
2       19751
3       19748
4       19747
        ...  
4348    13405
4349    13404
4350    13403
4351    13402
4352    13401
Name: Days_Since_Last_Dividend, Length: 4353, dtype: int64


The 'Days_Since_Last_Dividend' values above do not measure time since a dividend announcement. 'Dividends per share' holds per-share amounts, not dates, so pd.to_datetime interprets each number as an offset of a few nanoseconds from 1970-01-01, and the subtraction returns the number of days between each trading date and that epoch (19753 days for 2024-01-31). A meaningful version of this feature requires actual announcement or payment dates for East African Breweries Limited (EABL) dividends.
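
If announcement dates are not available, one possible proxy is the date on which the recorded per-share dividend last changed. The sketch below uses made-up data and a hypothetical column name, `Days_Since_Dividend_Change`; it illustrates the idea and is not the method used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: the per-share dividend changes on 2024-01-03
data = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-08"]
    ),
    "Dividends per share": [5.0, 5.0, 7.0, 7.0, 7.0],
}).sort_values("Date").reset_index(drop=True)

# Flag rows where the dividend value differs from the previous row
# (the first row counts as a change because it has no predecessor)
changed = data["Dividends per share"].ne(data["Dividends per share"].shift())

# Carry the most recent change date forward, then measure the gap in days
last_change = data["Date"].where(changed).ffill()
data["Days_Since_Dividend_Change"] = (data["Date"] - last_change).dt.days

print(data)
```

The proxy only detects changes in the recorded amount, so two consecutive dividends of the same size would be treated as one.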

In [18]:
# Interaction feature: closing price times volume, an approximation of the value traded that day
merged_data['Close_Volume_Product'] = merged_data['Close'] * merged_data['Volume']
merged_data['Close_Volume_Product']

0        4620000.0
1        1622400.0
2       61696350.0
3        6050000.0
4         672600.0
           ...    
4348    31880100.0
4349    15190000.0
4350    26980000.0
4351    17346000.0
4352     4361500.0
Name: Close_Volume_Product, Length: 4353, dtype: float64

The interaction feature 'Close_Volume_Product' multiplies the 'Close' and 'Volume' columns for each row. The result approximates the value of East African Breweries Limited (EABL) shares traded that day, valued at the closing price, and so combines price and trading volume into a single measure of daily market activity.

## EDA

#### 1. VOLATILITY ANALYSIS

In [None]:
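# Assumption: df (not defined in earlier cells) is the merged data, sorted by date and indexed by 'Date'
df = merged_data.sort_values('Date').set_index('Date')
volatility = df['Close'].std()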
volatility

The standard deviation of EABL's closing prices is 59.23, expressed in the stock's trading currency. This measures how widely closing price levels are spread around their mean over the whole sample, so it reflects long-term shifts in the price level as much as day-to-day movement. It is a useful first indication of price variability, but volatility for risk assessment is usually measured on returns rather than on price levels. Investors and analysts should read this figure alongside return-based risk metrics to form a fuller picture of the stock's historical price dynamics and associated risks.

In [None]:
# Historical volatility
historical_volatility = df['Close'].pct_change().std()
historical_volatility

The historical volatility of approximately 2.2% is the standard deviation of the daily percentage changes in EABL's closing price over the sample period. It describes the typical size of a daily move around the mean daily return, not the average change itself. The measure summarizes past price fluctuations and serves as an indicator of market risk: a higher value indicates more variable, and potentially riskier, price behavior.
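
Daily volatility is often quoted on an annual basis by scaling with the square root of the number of trading days. The sketch below uses made-up prices and assumes 252 trading days per year and roughly independent daily returns; both are conventions, not properties of the EABL data.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, oldest first
close = pd.Series([100.0, 102.0, 101.0, 103.0, 102.5, 104.0])

daily_returns = close.pct_change().dropna()
daily_vol = daily_returns.std()        # sample standard deviation (ddof=1)
annual_vol = daily_vol * np.sqrt(252)  # assumes 252 trading days per year

print(f"daily: {daily_vol:.4%}, annualised: {annual_vol:.4%}")
```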

In [None]:
# Average True Range
from ta.volatility import AverageTrueRange

atr_window = 30  # look-back window in trading days; adjust as needed
atr = AverageTrueRange(high=df['High'], low=df['Low'], close=df['Close'], window=atr_window).average_true_range()

# Print the ATR series
print(atr)

The Average True Range (ATR) measures market volatility on each date as a smoothed average of the true range, which is the largest of the day's high minus low, the absolute gap between the high and the previous close, and the absolute gap between the low and the previous close. ATR cannot be negative: larger values indicate more volatile periods, and values near zero indicate little price movement. The series helps identify periods of increased market activity and shows how volatility has changed over the period the dataset covers.
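
To make the ATR calculation concrete, the sketch below computes the true range by hand on made-up OHLC rows and averages it with a simple rolling mean. The simple mean is a simplification chosen for readability; to our understanding the `ta` implementation uses a Wilder-style recursive smoothing, so its values will differ somewhat.

```python
import pandas as pd

# Hypothetical OHLC rows, oldest first
bars = pd.DataFrame({
    "High":  [105.0, 108.0, 104.0, 110.0],
    "Low":   [100.0, 103.0,  99.0, 104.0],
    "Close": [104.0, 104.0, 100.0, 109.0],
})

prev_close = bars["Close"].shift(1)

# True range: the largest of the three candidate ranges for each bar
# (the first bar has no previous close, so only high minus low applies)
true_range = pd.concat(
    [
        bars["High"] - bars["Low"],
        (bars["High"] - prev_close).abs(),
        (bars["Low"] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1)

# Simple moving average of the true range over a hypothetical 3-bar window
atr_simple = true_range.rolling(window=3).mean()

print(pd.DataFrame({"TR": true_range, "ATR_3": atr_simple}))
```

Every candidate range is a difference of the form high minus low or an absolute value, which is why the true range and its average are never negative.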

In [None]:
# Ensure the index is a DatetimeIndex
df.index = pd.to_datetime(df.index)

# Calculate Average True Range (ATR) for volatility
df['atr'] = AverageTrueRange(high=df['High'], low=df['Low'], close=df['Close'], window=14).average_true_range()

# Time series plot with volatility
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Close'], label='EABL Stock Prices')
plt.plot(df.index, df['atr'], label='Volatility (ATR)', color='orange') 
plt.xlabel('Date')
plt.ylabel('Closing Price / Volatility')
plt.title('EABL Stock Prices Over Time with Volatility')
plt.legend()
plt.show()

In [None]:
# Volatility clustering plot
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Close'], label='EABL Stock Prices')
plt.plot(df.index, df['Close'].rolling(window=30).std(), label='Rolling Volatility (30 days)')
plt.xlabel('Date')
plt.ylabel('Closing Price / Volatility')
plt.title('Volatility Clustering in EABL Stock Prices')
plt.legend()
plt.show()

In [None]:
# Daily returns
df['Daily_Return'] = df['Close'].pct_change()

# Box plot for daily returns
plt.figure(figsize=(8, 5))
plt.boxplot(df['Daily_Return'].dropna())
plt.title('Box Plot of Daily Returns')
plt.ylabel('Daily Return (fraction)')
plt.show()

#### 2. LAG ANALYSIS

In [None]:
# DataFrame with 'Close' prices and 'Date' as the index
df['Daily_Return'] = df['Close'].pct_change()

# The lag period
lag_period = 5

# Create lagged version of the 'Daily_Return' column
df['Daily_Return_Lagged'] = df['Daily_Return'].shift(lag_period)

# Plot the original and lagged daily returns
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Daily_Return'], label='Original Daily Returns')
plt.plot(df.index, df['Daily_Return_Lagged'], label=f'Daily Returns Lagged by {lag_period} day(s)')
plt.xlabel('Date')
plt.ylabel('Percentage Change')
plt.title('Daily Returns and Lagged Daily Returns')
plt.legend()
plt.show()

In [None]:
# Scatter of each daily return against the return 5 trading days earlier
plt.scatter(df['Daily_Return'].shift(5), df['Daily_Return'], alpha=0.5)
plt.title('Scatter Plot of Daily Returns and Lagged Daily Returns')
plt.xlabel('Lagged Daily Returns (t-5)')
plt.ylabel('Daily Returns (t)')
plt.show()

In [None]:
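lag_period = 5  # Adjust the lag period as needed
# Pair each return with the return lag_period rows earlier and drop incomplete pairs so x and y have equal length
pairs = pd.concat([df['Daily_Return'].shift(lag_period), df['Daily_Return']], axis=1, keys=['lagged', 'current']).dropna()
plt.scatter(pairs['lagged'], pairs['current'], alpha=0.5)
plt.title(f'Time Lag Scatter Plot (Lag Period = {lag_period})')
plt.xlabel('Daily Returns (t - Lag Period)')
plt.ylabel('Daily Returns (t)')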
plt.show()
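
The lag scatter plots above can be summarized numerically with the correlation between the return series and a lagged copy of itself. The sketch below uses randomly generated returns as a stand-in for df['Daily_Return'], so the printed values say nothing about EABL; it only shows the calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns; in the notebook this would be df['Daily_Return']
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.02, size=500))

# Pearson correlation between the series and itself shifted by each lag
lag_corr = {lag: returns.autocorr(lag=lag) for lag in range(1, 6)}

for lag, corr in lag_corr.items():
    print(f"lag {lag}: {corr:+.3f}")
```

Values close to zero at every lag would suggest that past returns carry little linear information about later returns at those horizons.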