## SILVER TO GOLD LAYER

### GOLD LAYER - PROCESS DIVIDEND HISTORY

### Description
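This notebook processes stock dividend data. It loads holdings history from the GOLD layer and stock events from the SILVER layer, keeps only dividend events, merges them with holdings on `date` and `symbol`, computes each dividend amount as the event `value` multiplied by `holding_quantity`, and saves the non-zero results to a CSV file in the GOLD layer.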

In [1]:
# Import pandas and the GlobalPath utility

import pandas as pd
from StockETL import GlobalPath

In [2]:
# Run the shared utility notebook to load common helper functions
%run ../COMMON/common_utility.ipynb

In [3]:
# Define the input and output file paths via GlobalPath
holdingshistory_gold_file_path = GlobalPath(
    "DATA/GOLD/Holdings/HoldingsHistory_data.csv"
)
stockevents_silver_file_path = GlobalPath(
    "DATA/SILVER/StockEvents/StockEvents_data.csv"
)
dividend_gold_file_path = GlobalPath("DATA/GOLD/Dividend/Dividend_data.csv")

In [4]:
# Load holdings data from the GOLD layer
df_holdings = pd.read_csv(holdingshistory_gold_file_path)
df_holdings["date"] = pd.to_datetime(df_holdings["date"])
print(f"Loaded GOLD Layer holdings data from: {holdingshistory_gold_file_path}")

Loaded GOLD Layer holdings data from: /home/runner/work/StockETL/StockETL/DATA/GOLD/Holdings/HoldingsHistory_data.csv


In [5]:
# Load dividend data from the SILVER layer
df_dividends = pd.read_csv(stockevents_silver_file_path)
df_dividends["date"] = pd.to_datetime(df_dividends["date"])
print(
    f"Loaded SILVER Layer stock dividend data from: {stockevents_silver_file_path}"
)

Loaded SILVER Layer stock dividend data from: /home/runner/work/StockETL/StockETL/DATA/SILVER/StockEvents/StockEvents_data.csv
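
Both load cells assume the CSV files contain the columns used later (`date`, `symbol`, `event`, `value`). The following is a minimal sketch of a defensive loader that checks for required columns before parsing dates. The helper name `load_with_required_columns` and the sample CSV content are hypothetical and not part of StockETL.

```python
import io

import pandas as pd


def load_with_required_columns(source, required):
    """Read a CSV, fail early if columns are missing, then parse `date`.

    Hypothetical helper for illustration; not part of StockETL.
    """
    df = pd.read_csv(source)
    missing = set(required) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    df["date"] = pd.to_datetime(df["date"])
    return df


# Made-up sample standing in for the SILVER layer stock events file
sample_csv = io.StringIO(
    "date,symbol,event,value\n"
    "2024-01-10,AAA,Dividends,1.5\n"
)
df_sample = load_with_required_columns(
    sample_csv, ["date", "symbol", "event", "value"]
)
print(df_sample.dtypes)
```

Failing early with a clear message can be easier to debug than a `KeyError` raised later in the merge or filter step.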


In [6]:
# Filter for dividend events only
df_dividends = df_dividends[df_dividends["event"].str.upper() == "DIVIDENDS"]

# Merge dividend data with holdings data
df_dividend = pd.merge(
    df_holdings, df_dividends, on=["date", "symbol"], how="left"
)

# Calculate the dividend amount
df_dividend["dividend_amount"] = (
    df_dividend["value"].fillna(0) * df_dividend["holding_quantity"]
)
df_dividend["dividend_amount"] = df_dividend["dividend_amount"].round(2)

# Filter out rows where dividend amount is 0
df_dividend = df_dividend[df_dividend["dividend_amount"] != 0]
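
To make the filter, merge, and calculation steps concrete, here is a small worked example on made-up data. The column names mirror those used in this notebook, while the symbols, dates, and values are purely illustrative assumptions.

```python
import pandas as pd

# Hypothetical holdings: one row per date and symbol with the quantity held
holdings = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-10", "2024-01-10", "2024-01-11"]),
        "segment": ["EQ", "EQ", "EQ"],
        "symbol": ["AAA", "BBB", "AAA"],
        "holding_quantity": [10, 5, 10],
    }
)

# Hypothetical stock events: `value` is the amount attached to each event
events = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-10", "2024-01-11"]),
        "symbol": ["AAA", "BBB"],
        "event": ["Dividends", "Dividends"],
        "value": [1.5, 2.0],
    }
)

# Same steps as the cell above: filter, left merge, multiply, drop zeros
dividends = events[events["event"].str.upper() == "DIVIDENDS"]
merged = pd.merge(holdings, dividends, on=["date", "symbol"], how="left")
merged["dividend_amount"] = (
    merged["value"].fillna(0) * merged["holding_quantity"]
).round(2)
result = merged[merged["dividend_amount"] != 0].reset_index(drop=True)

print(result[["date", "segment", "symbol", "dividend_amount"]])
```

Only the `AAA` holding on 2024-01-10 has a dividend event with the same date and symbol, so a single row with `dividend_amount` 15.0 (1.5 × 10) remains. The other holdings rows get `NaN` for `value` after the left merge, become 0 through `fillna(0)`, and are then filtered out.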

In [7]:
# Sort and format the DataFrame
df_dividend = df_dividend.sort_values(
    by=["date", "segment", "symbol"]
).reset_index(drop=True)
df_dividend = df_dividend[["date", "segment", "symbol", "dividend_amount"]]

# Save the result to a new CSV file
df_dividend.to_csv(dividend_gold_file_path, index=False)
print(
    f"GOLD Layer CSV file for Dividend successfully created at: {dividend_gold_file_path}"
)

# Display DataFrame information
df_dividend.info()

GOLD Layer CSV file for Dividend successfully created at: /home/runner/work/StockETL/StockETL/DATA/GOLD/Dividend/Dividend_data.csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   date             16 non-null     datetime64[ns]
 1   segment          16 non-null     object        
 2   symbol           16 non-null     object        
 3   dividend_amount  16 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 644.0+ bytes
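
The `datetime64[ns]` dtype shown above is not preserved in the CSV file, because dates are written as plain text. A minimal sketch of the round trip follows, assuming a downstream reader wants datetimes back. It uses an in-memory buffer and made-up data in place of the real output file.

```python
import io

import pandas as pd

# Made-up frame with the same columns as the saved dividend output
df_out = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-10"]),
        "segment": ["EQ"],
        "symbol": ["AAA"],
        "dividend_amount": [15.0],
    }
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)
buffer.seek(0)

# Re-parse the date column on load; without parse_dates it is read as text
df_back = pd.read_csv(buffer, parse_dates=["date"])
print(df_back.dtypes)
```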
