# **Data Book Notebook**

* This notebook enriches and centralizes the data needed to run the project. Part of this data is derived through calculations on existing fields.

## **Initial Setup**

### Install Libs

In [1]:
%pip install scikit-learn -q --no-cache-dir

Note: you may need to restart the kernel to use updated packages.


### Import Packages

In [2]:
import os
from pathlib import Path
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer

### Define Default File Paths

In [3]:
file_path_cleaned = str(Path(os.getcwd()).parent / "data/cleaned")
file_path_book = str(Path(os.getcwd()).parent / "data/book")

## **Data Book**

* This section enriches the data acquired and cleaned by the notebooks "01_data_aquisition.ipynb" and "02_data_cleaning.ipynb", both by computing new fields from existing ones and, possibly, by acquiring new external data. Run both notebooks before this one: it reads the cleaned files they produce.

### Fundamentals Book

In [4]:
df_fundamentals_cleaned = pd.read_csv(file_path_cleaned + "/fundamentals_cleaned.csv")
df_fundamentals_cleaned.head()

|  | ticker | long_name | sector | industry | market_cap | enterprise_value | total_revenue | profit_margins | operating_margins | dividend_rate | ... | total_cash | total_cash_per_share | total_debt | earnings_quarterly_growth | revenue_growth | gross_margins | ebitda_margins | return_on_assets | return_on_equity | gross_profits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABCB4.SA | Banco ABC Brasil S.A. | Financial Services | Banks - Regional | 4265434000.0 | 14773390000.0 | 1941779000.0 | 0.41576 | 0.38826 | 1.56 | ... | 7774306000.0 | 35.162 | 18298460000.0 | 0.001 | 0.003 | 0.0 | 0.0 | 0.0153 | 0.1568 | 1973086000.0 |
| 1 | AGRO3.SA | BrasilAgro - Companhia Brasileira de Proprieda... | Consumer Defensive | Farm Products | 2466480000.0 | 2912933000.0 | 1249437000.0 | 0.21493 | 0.25031 | 3.21 | ... | 383837000.0 | 3.885 | 872075000.0 | 6.801 | 0.671 | 0.25252 | 0.21201 | 0.03839 | 0.1217 | 315504000.0 |
| 2 | RAIL3.SA | Rumo S.A. | Industrials | Railroads | 42288820000.0 | 55243050000.0 | 10317460000.0 | 0.07639 | 0.33544 | 0.07 | ... | 7656040000.0 | 4.132 | 21843200000.0 | 3.935 | 0.121 | 0.34493 | 0.43834 | 0.04252 | 0.05163 | 3146360000.0 |
| 3 | ALPA3.SA | Alpargatas S.A. | Consumer Cyclical | Footwear & Accessories | 5309793000.0 | 6482982000.0 | 4022153000.0 | -0.05671 | -0.06434 | 0.4 | ... | 414288000.0 | 0.614 | 1550341000.0 | 0.0 | -0.127 | 0.43246 | -5e-05 | -0.0091 | -0.04153 | 1968303000.0 |
| 4 | ALPA4.SA | Alpargatas S.A. | Consumer Cyclical | Footwear & Accessories | 5350758000.0 | 6395236000.0 | 4022153000.0 | -0.05671 | -0.06434 | 0.43 | ... | 414288000.0 | 0.614 | 1550341000.0 | 0.0 | -0.127 | 0.43246 | -5e-05 | -0.0091 | -0.04153 | 1968303000.0 |


#### Feature Creation

In [5]:
df_fundamentals_book_cols = df_fundamentals_cleaned.copy()

In [6]:
# Rough asset proxy built from the available fields
df_fundamentals_book_cols['total_assets_approx'] = df_fundamentals_book_cols['total_cash'] + df_fundamentals_book_cols['book_value']
df_fundamentals_book_cols['asset_turnover'] = df_fundamentals_book_cols['total_revenue'] / df_fundamentals_book_cols['total_assets_approx']
# Quarterly earnings growth expressed in percent
df_fundamentals_book_cols['earnings_growth_rate'] = df_fundamentals_book_cols['earnings_quarterly_growth'] * 100
# Proxy payout: trailing dividend rate divided by quarterly earnings growth (not dividends / EPS)
df_fundamentals_book_cols['dividend_payout_ratio'] = (df_fundamentals_book_cols['trailing_annual_dividend_rate'] / df_fundamentals_book_cols['earnings_quarterly_growth']) * 100
# Zero growth can produce ±inf: treat it as missing, then floor negative ratios at 0 (NaN stays NaN)
df_fundamentals_book_cols['dividend_payout_ratio'] = df_fundamentals_book_cols['dividend_payout_ratio'].replace([np.inf, -np.inf], np.nan).clip(lower=0)
# Equity approximated as the asset proxy minus total debt
df_fundamentals_book_cols['equity'] = df_fundamentals_book_cols['total_assets_approx'] - df_fundamentals_book_cols['total_debt']
df_fundamentals_book_cols['debt_to_equity'] = df_fundamentals_book_cols['total_debt'] / df_fundamentals_book_cols['equity']
# Revenue relative to enterprise value, used as an ROI proxy
df_fundamentals_book_cols['roi'] = df_fundamentals_book_cols['total_revenue'] / df_fundamentals_book_cols['enterprise_value']
# EBITDA relative to debt plus market cap, used as a ROCE proxy
df_fundamentals_book_cols['roce'] = df_fundamentals_book_cols['ebitda'] / (df_fundamentals_book_cols['total_debt'] + df_fundamentals_book_cols['market_cap'])

* The cell above derives new indicators from the company fundamentals. Three of them are central to the analysis. "earnings_growth_rate" expresses quarterly earnings growth as a percentage, which helps assess the company's potential for appreciation. "dividend_payout_ratio" is meant to show how much of the profit is distributed to shareholders; as computed here, however, it divides the trailing annual dividend rate by quarterly earnings growth, so it is a proxy rather than the standard dividends-per-share over earnings-per-share ratio. "roi", computed as total revenue over enterprise value, serves as a proxy for how efficiently the company uses its invested capital. These fields are the cornerstones of the analysis and guide the investment decisions.
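For comparison, the textbook payout ratio is dividends per share divided by earnings per share. A dividend yield is D/P and a trailing P/E is P/E, so their product is D/E, which means the standard ratio can be sketched from columns already in the table. This is a minimal sketch, assuming `trailing_annual_dividend_yield` is stored as a fraction (0.05 for 5%); the helper `payout_ratio_from_yield` and the toy rows are hypothetical, not project data.

```python
import pandas as pd


def payout_ratio_from_yield(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: payout ratio (%) = (D/P) * (P/E) * 100 = D/E * 100."""
    ratio = df["trailing_annual_dividend_yield"] * df["trailing_pe"] * 100
    # A non-positive P/E (loss-making company) makes the ratio meaningless, so mark it NaN
    return ratio.where(df["trailing_pe"] > 0)


# Illustrative rows only (hypothetical tickers and values)
sample = pd.DataFrame({
    "ticker": ["AAA3.SA", "BBB4.SA"],
    "trailing_annual_dividend_yield": [0.05, 0.02],
    "trailing_pe": [10.0, -4.0],
})
sample["payout_ratio_std"] = payout_ratio_from_yield(sample)
print(sample)
```

A 5% yield at a P/E of 10 means half of the earnings are paid out, so the first row gives 50%.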

In [7]:
df_fundamentals_book_cols.columns

Index(['ticker', 'long_name', 'sector', 'industry', 'market_cap',
       'enterprise_value', 'total_revenue', 'profit_margins',
       'operating_margins', 'dividend_rate', 'beta', 'ebitda', 'trailing_pe',
       'forward_pe', 'volume', 'average_volume', 'fifty_two_week_low',
       'fifty_two_week_high', 'price_to_sales_trailing_12_months',
       'fifty_day_average', 'two_hundred_day_average',
       'trailing_annual_dividend_rate', 'trailing_annual_dividend_yield',
       'book_value', 'price_to_book', 'total_cash', 'total_cash_per_share',
       'total_debt', 'earnings_quarterly_growth', 'revenue_growth',
       'gross_margins', 'ebitda_margins', 'return_on_assets',
       'return_on_equity', 'gross_profits', 'total_assets_approx',
       'asset_turnover', 'earnings_growth_rate', 'dividend_payout_ratio',
       'equity', 'debt_to_equity', 'roi', 'roce'],
      dtype='object')

In [8]:
df_fundamentals_book_cols.head(2)

|  | ticker | long_name | sector | industry | market_cap | enterprise_value | total_revenue | profit_margins | operating_margins | dividend_rate | ... | return_on_equity | gross_profits | total_assets_approx | asset_turnover | earnings_growth_rate | dividend_payout_ratio | equity | debt_to_equity | roi | roce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABCB4.SA | Banco ABC Brasil S.A. | Financial Services | Banks - Regional | 4265434000.0 | 14773390000.0 | 1941779000.0 | 0.41576 | 0.38826 | 1.56 | ... | 0.1568 | 1973086000.0 | 7774306000.0 | 0.249769 | 0.1 | 155000.0 | -10524160000.0 | -1.73871 | 0.131438 | 0.0 |
| 1 | AGRO3.SA | BrasilAgro - Companhia Brasileira de Proprieda... | Consumer Defensive | Farm Products | 2466480000.0 | 2912933000.0 | 1249437000.0 | 0.21493 | 0.25031 | 3.21 | ... | 0.1217 | 315504000.0 | 383837000.0 | 3.255124 | 680.1 | 47.640053 | -488238000.0 | -1.786168 | 0.428927 | 0.079343 |


#### Cleaning Missing Values

In [9]:
df_fundamentals_book_missing = df_fundamentals_book_cols.copy()

In [10]:
df_fundamentals_book_missing.isna().sum()

ticker                               0
long_name                            0
sector                               0
industry                             0
market_cap                           0
enterprise_value                     0
total_revenue                        0
profit_margins                       0
operating_margins                    0
dividend_rate                        0
beta                                 0
ebitda                               0
trailing_pe                          0
forward_pe                           0
volume                               0
average_volume                       0
fifty_two_week_low                   0
fifty_two_week_high                  0
price_to_sales_trailing_12_months    0
fifty_day_average                    0
two_hundred_day_average              0
trailing_annual_dividend_rate        0
trailing_annual_dividend_yield       0
book_value                           0
price_to_book                        0
total_cash               

In [11]:
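# Assign the result back instead of calling fillna(inplace=True) on a column selection
df_fundamentals_book_missing["dividend_payout_ratio"] = df_fundamentals_book_missing["dividend_payout_ratio"].fillna(0)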

In [12]:
df_fundamentals_book_missing.isna().sum()

ticker                               0
long_name                            0
sector                               0
industry                             0
market_cap                           0
enterprise_value                     0
total_revenue                        0
profit_margins                       0
operating_margins                    0
dividend_rate                        0
beta                                 0
ebitda                               0
trailing_pe                          0
forward_pe                           0
volume                               0
average_volume                       0
fifty_two_week_low                   0
fifty_two_week_high                  0
price_to_sales_trailing_12_months    0
fifty_day_average                    0
two_hundred_day_average              0
trailing_annual_dividend_rate        0
trailing_annual_dividend_yield       0
book_value                           0
price_to_book                        0
total_cash               

In [13]:
df_fundamentals_book = df_fundamentals_book_missing.copy()

In [14]:
df_fundamentals_book.head()

|  | ticker | long_name | sector | industry | market_cap | enterprise_value | total_revenue | profit_margins | operating_margins | dividend_rate | ... | return_on_equity | gross_profits | total_assets_approx | asset_turnover | earnings_growth_rate | dividend_payout_ratio | equity | debt_to_equity | roi | roce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABCB4.SA | Banco ABC Brasil S.A. | Financial Services | Banks - Regional | 4265434000.0 | 14773390000.0 | 1941779000.0 | 0.41576 | 0.38826 | 1.56 | ... | 0.1568 | 1973086000.0 | 7774306000.0 | 0.249769 | 0.1 | 155000.0 | -10524160000.0 | -1.73871 | 0.131438 | 0.0 |
| 1 | AGRO3.SA | BrasilAgro - Companhia Brasileira de Proprieda... | Consumer Defensive | Farm Products | 2466480000.0 | 2912933000.0 | 1249437000.0 | 0.21493 | 0.25031 | 3.21 | ... | 0.1217 | 315504000.0 | 383837000.0 | 3.255124 | 680.1 | 47.640053 | -488238000.0 | -1.786168 | 0.428927 | 0.079343 |
| 2 | RAIL3.SA | Rumo S.A. | Industrials | Railroads | 42288820000.0 | 55243050000.0 | 10317460000.0 | 0.07639 | 0.33544 | 0.07 | ... | 0.05163 | 3146360000.0 | 7656040000.0 | 1.347623 | 393.5 | 1.677255 | -14187160000.0 | -1.539646 | 0.186765 | 0.070519 |
| 3 | ALPA3.SA | Alpargatas S.A. | Consumer Cyclical | Footwear & Accessories | 5309793000.0 | 6482982000.0 | 4022153000.0 | -0.05671 | -0.06434 | 0.4 | ... | -0.04153 | 1968303000.0 | 414288000.0 | 9.708591 | 0.0 | 0.0 | -1136053000.0 | -1.364673 | 0.620417 | -2.9e-05 |
| 4 | ALPA4.SA | Alpargatas S.A. | Consumer Cyclical | Footwear & Accessories | 5350758000.0 | 6395236000.0 | 4022153000.0 | -0.05671 | -0.06434 | 0.43 | ... | -0.04153 | 1968303000.0 | 414288000.0 | 9.708591 | 0.0 | 0.0 | -1136053000.0 | -1.364673 | 0.62893 | -2.9e-05 |


#### Saving Files

In [15]:
Path(file_path_book).mkdir(parents=True, exist_ok=True)
df_fundamentals_book.to_csv(file_path_book + "/fundamentals_book.csv", index=False)

### Macroeconomic Book

In [16]:
df_macroeconomic_cleaned = pd.read_csv(file_path_cleaned + "/macroeconomic_cleaned.csv")
df_macroeconomic_cleaned.head()

|  | date | selic | confidence | pib | incc | ipca | dolar |
|---|---|---|---|---|---|---|---|
| 0 | 2019-01-31 | 6.5 | 128.64 | 578214.5 | 0.49 | 3.78 | 3.6513 |
| 1 | 2019-02-28 | 6.5 | 139.39 | 576089.7 | 0.09 | 3.89 | 3.7379 |
| 2 | 2019-03-31 | 6.5 | 125.53 | 601749.8 | 0.31 | 4.58 | 3.8961 |
| 3 | 2019-04-30 | 6.5 | 121.71 | 612918.4 | 0.38 | 4.94 | 3.9447 |
| 4 | 2019-05-31 | 6.5 | 117.01 | 615304.9 | 0.03 | 4.66 | 3.9401 |


#### Feature Creation

In [17]:
df_macroeconomic_book_cols = df_macroeconomic_cleaned.copy()

In [18]:
# Month-over-month relative changes (the first row becomes NaN)
df_macroeconomic_book_cols['monthly_inflation'] = df_macroeconomic_book_cols['ipca'].pct_change()
df_macroeconomic_book_cols['gdp_growth'] = df_macroeconomic_book_cols['pib'].pct_change()
df_macroeconomic_book_cols['dollar_growth'] = df_macroeconomic_book_cols['dolar'].pct_change()
# Additive approximation of the real interest rate: Selic minus IPCA
df_macroeconomic_book_cols['real_interest_rate'] = df_macroeconomic_book_cols['selic'] - df_macroeconomic_book_cols['ipca']
# Raw spread between IPCA and the confidence index
df_macroeconomic_book_cols['inflation_confidence_difference'] = df_macroeconomic_book_cols['ipca'] - df_macroeconomic_book_cols['confidence']

In [19]:
df_macroeconomic_book_cols.columns

Index(['date', 'selic', 'confidence', 'pib', 'incc', 'ipca', 'dolar',
       'monthly_inflation', 'gdp_growth', 'dollar_growth',
       'real_interest_rate', 'inflation_confidence_difference'],
      dtype='object')

* The cell above derives five indicators from the monthly macroeconomic series; together they feed the economic analysis that informs our decisions:
  - **monthly_inflation**: month-over-month relative change of the "ipca" column. It tracks how the reported IPCA figure moves from one month to the next, which helps in understanding price stability; it is a change in the reported rate, not the monthly inflation rate itself.
  - **gdp_growth**: month-over-month relative change of "pib" (GDP), an indicator of the country's economic health and expansion.
  - **dollar_growth**: month-over-month relative change of "dolar", the exchange rate quoted in reais per US dollar; a positive value means the real weakened against the dollar.
  - **real_interest_rate**: "selic" minus "ipca", an additive approximation of the inflation-adjusted interest rate, useful for assessing borrowing costs and investment opportunities.
  - **inflation_confidence_difference**: "ipca" minus "confidence". Because the two series are on different scales (a percentage rate versus an index level), this is a raw spread rather than a measure of inflation expectations.
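The subtraction used for `real_interest_rate` is the usual first-order approximation. The exact Fisher relation, (1 + nominal) = (1 + real)(1 + inflation), can be sketched as follows, assuming both `selic` and `ipca` are annual rates in percent; the function name `fisher_real_rate` is hypothetical, and the two input rows are copied from the table above.

```python
import pandas as pd


def fisher_real_rate(nominal_pct: pd.Series, inflation_pct: pd.Series) -> pd.Series:
    """Hypothetical helper: exact real rate (%) from annual nominal and inflation rates (%)."""
    return ((1 + nominal_pct / 100) / (1 + inflation_pct / 100) - 1) * 100


# First two rows of the macroeconomic table (selic and ipca, in percent)
macro = pd.DataFrame({"selic": [6.5, 6.5], "ipca": [3.78, 3.89]})
macro["real_rate_additive"] = macro["selic"] - macro["ipca"]
macro["real_rate_fisher"] = fisher_real_rate(macro["selic"], macro["ipca"])
print(macro.round(4))
```

For the first row this gives about 2.62% instead of the 2.72% obtained by subtraction; the gap widens as inflation rises.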

In [20]:
df_macroeconomic_book_cols.head(5)

|  | date | selic | confidence | pib | incc | ipca | dolar | monthly_inflation | gdp_growth | dollar_growth | real_interest_rate | inflation_confidence_difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-31 | 6.5 | 128.64 | 578214.5 | 0.49 | 3.78 | 3.6513 | NaN | NaN | NaN | 2.72 | -124.86 |
| 1 | 2019-02-28 | 6.5 | 139.39 | 576089.7 | 0.09 | 3.89 | 3.7379 | 0.029101 | -0.003675 | 0.023718 | 2.61 | -135.5 |
| 2 | 2019-03-31 | 6.5 | 125.53 | 601749.8 | 0.31 | 4.58 | 3.8961 | 0.177378 | 0.044542 | 0.042323 | 1.92 | -120.95 |
| 3 | 2019-04-30 | 6.5 | 121.71 | 612918.4 | 0.38 | 4.94 | 3.9447 | 0.078603 | 0.01856 | 0.012474 | 1.56 | -116.77 |
| 4 | 2019-05-31 | 6.5 | 117.01 | 615304.9 | 0.03 | 4.66 | 3.9401 | -0.05668 | 0.003894 | -0.001166 | 1.84 | -112.35 |


#### Cleaning Missing Values

In [21]:
df_macroeconomic_book_missing = df_macroeconomic_book_cols.copy()

In [22]:
df_macroeconomic_book_missing.isna().sum()

date                               0
selic                              0
confidence                         0
pib                                0
incc                               0
ipca                               0
dolar                              0
monthly_inflation                  1
gdp_growth                         1
dollar_growth                      1
real_interest_rate                 0
inflation_confidence_difference    0
dtype: int64

In [23]:
numerical_columns = ['selic', 'confidence', 'pib', 'incc', 'ipca', 'dolar', 'monthly_inflation', 'gdp_growth', 'dollar_growth', 'real_interest_rate', 'inflation_confidence_difference']

In [24]:
# Replace each NaN with the mean of that column over the 5 nearest rows (nan-aware Euclidean distance)
imputer = KNNImputer(n_neighbors=5)
# Assigning back in place keeps the 'date' column and the original column order
df_macroeconomic_book_missing[numerical_columns] = imputer.fit_transform(df_macroeconomic_book_missing[numerical_columns])

* In the cell above, the missing values in the "df_macroeconomic_book_missing" DataFrame are filled with KNNImputer, an imputation method based on the k-nearest neighbors algorithm. With n_neighbors set to 5, each missing value is replaced by the mean of that column over the five rows closest to it, with distance measured on the row's non-missing numerical columns. The only gaps are the first-row values produced by pct_change. The 'date' column is kept alongside the imputed values, so every row stays tied to its month for time-based analysis.
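KNNImputer measures distance on the raw values, so a large-magnitude column such as `pib` (hundreds of thousands) dominates the neighbor search over rates like `selic`. A common refinement is to standardize first and undo the scaling afterwards. The sketch below assumes scikit-learn's `StandardScaler`, which ignores NaN when fitting and keeps it when transforming; the helper name `knn_impute_scaled` is hypothetical, and the demo rows are the first four rows of the table above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler


def knn_impute_scaled(df: pd.DataFrame, columns: list, n_neighbors: int = 5) -> pd.DataFrame:
    """Hypothetical helper: KNN imputation on standardized columns, returned in original units."""
    out = df.copy()
    scaler = StandardScaler()  # NaNs are disregarded in fit and preserved in transform
    scaled = scaler.fit_transform(out[columns])
    imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(scaled)
    # The scaling is linear, so the neighbor mean maps back to a mean in original units
    out[columns] = scaler.inverse_transform(imputed)
    return out


demo = pd.DataFrame({
    "selic": [6.5, 6.5, 6.5, 6.5],
    "pib": [578214.5, 576089.7, 601749.8, 612918.4],
    "gdp_growth": [np.nan, -0.003675, 0.044542, 0.01856],
})
filled = knn_impute_scaled(demo, ["selic", "pib", "gdp_growth"], n_neighbors=2)
print(filled)
```

With `n_neighbors=2`, the first row's `gdp_growth` becomes the mean of the two rows whose `pib` is closest to it.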

In [25]:
df_macroeconomic_book_missing.isna().sum()

date                               0
selic                              0
confidence                         0
pib                                0
incc                               0
ipca                               0
dolar                              0
monthly_inflation                  0
gdp_growth                         0
dollar_growth                      0
real_interest_rate                 0
inflation_confidence_difference    0
dtype: int64

#### Saving Files

In [26]:
df_macroeconomic_book = df_macroeconomic_book_missing.copy()

In [27]:
Path(file_path_book).mkdir(parents=True, exist_ok=True)
df_macroeconomic_book.to_csv(file_path_book + "/macroeconomic_book.csv", index=False)

### Stocks Book

In [28]:
df_stocks_cleaned = pd.read_csv(file_path_cleaned + "/stocks_cleaned.csv")
df_stocks_cleaned.head()

|  | date | ticker | adj_close | close | high | low | open | volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-02 | AALR3.SA | 13.116831 | 13.25 | 13.5 | 13.25 | 13.31 | 264200.0 |
| 1 | 2019-01-02 | ABCB4.SA | 13.077144 | 17.120001 | 17.200001 | 16.35 | 16.469999 | 571700.0 |
| 2 | 2019-01-02 | ABEV3.SA | 13.950425 | 16.15 | 16.299999 | 15.4 | 15.4 | 18692900.0 |
| 3 | 2019-01-02 | ADHM3.SA | 1.243981 | 1.243981 | 1.243981 | 1.235687 | 1.235687 | 2170.0 |
| 4 | 2019-01-02 | AFLT3.SA | 4.572968 | 5.4 | 5.48 | 5.4 | 5.48 | 500.0 |


#### Feature Creation

In [29]:
df_stocks_book_cols = df_stocks_cleaned.copy()

In [30]:
# NOTE: the table stacks every ticker ordered by date, so the row-based
# calculations below (pct_change, diff, rolling) run across ticker boundaries.
df_stocks_book_cols['daily_return'] = df_stocks_book_cols['adj_close'].pct_change()
# Simple moving averages of the adjusted close over 10, 50 and 200 rows
df_stocks_book_cols['short_term_moving_average'] = df_stocks_book_cols['adj_close'].rolling(window=10).mean()
df_stocks_book_cols['moving_average_mean'] = df_stocks_book_cols['adj_close'].rolling(window=50).mean()
df_stocks_book_cols['long_term_moving_average'] = df_stocks_book_cols['adj_close'].rolling(window=200).mean()

# 14-row RSI from simple rolling means of gains and losses
window = 14
delta = df_stocks_book_cols['adj_close'].diff()
gain = delta.where(delta > 0, 0)
loss = -delta.where(delta < 0, 0)
average_gain = gain.rolling(window=window).mean()
average_loss = loss.rolling(window=window).mean()
rs = average_gain / average_loss
df_stocks_book_cols['rsi'] = 100 - (100 / (1 + rs))

# 20-row volatility of returns and 20-row average volume
df_stocks_book_cols['volatility'] = df_stocks_book_cols['daily_return'].rolling(window=20).std()
df_stocks_book_cols['average_volume'] = df_stocks_book_cols['volume'].rolling(window=20).mean()
df_stocks_book_cols['daily_range'] = df_stocks_book_cols['high'] - df_stocks_book_cols['low']
# Unadjusted open minus adjusted close
df_stocks_book_cols['open_close_difference'] = df_stocks_book_cols['open'] - df_stocks_book_cols['adj_close']
# Same formula as daily_range, so the two columns are identical
df_stocks_book_cols['high_low_difference'] = df_stocks_book_cols['high'] - df_stocks_book_cols['low']
# 5-row rolling mean of returns (short-term momentum)
df_stocks_book_cols['acceleration'] = df_stocks_book_cols['daily_return'].rolling(window=5).mean()
# Simplified flag: open above adjusted close and low less than 0.01 below adjusted close
df_stocks_book_cols['hammer'] = (df_stocks_book_cols['open'] > df_stocks_book_cols['adj_close']) & (df_stocks_book_cols['low'] > df_stocks_book_cols['adj_close'] - 0.01)
# A single value broadcast to every row: the mean row-to-row change of adj_close
df_stocks_book_cols['cagr'] = df_stocks_book_cols['daily_return'].mean()


The cell above adds the following indicators. All of them are computed over consecutive rows of the full table, which stacks every ticker ordered by date, so windows and differences span rows from different tickers. Window lengths are counted in rows.

- **daily_return**: Fractional change of `adj_close` relative to the previous row.

- **short_term_moving_average**: 10-row simple moving average of `adj_close`, used for short-term trend analysis.

- **moving_average_mean**: 50-row simple moving average of `adj_close`, a medium-term view of the price trend.

- **long_term_moving_average**: 200-row simple moving average of `adj_close`, used for long-term trend assessment.

- **rsi (Relative Strength Index)**: 14-row RSI built from simple rolling averages of gains and losses; it measures price momentum and helps identify overbought or oversold conditions.

- **volatility**: 20-row rolling standard deviation of daily returns, used for risk evaluation and market analysis.

- **average_volume**: 20-row average trading volume, an indicator of liquidity and investor interest.

- **daily_range**: Difference between the daily high and low, a simple risk measure.

- **open_close_difference**: Opening price minus adjusted close, a view of intraday movement; the open is unadjusted while `adj_close` is adjusted, so the difference mixes two price bases.

- **high_low_difference**: High minus low; it uses the same formula as `daily_range`, so the two columns are identical.

- **acceleration**: 5-row rolling mean of daily returns, a short-term momentum measure rather than a true rate of change of price velocity.

- **hammer**: Simplified candlestick flag, true when the open is above the adjusted close and the low is less than 0.01 below (or above) the adjusted close. It does not check the long lower shadow that characterizes a hammer pattern.

- **cagr (Compound Annual Growth Rate)**: Despite its name, the code assigns a single constant to every row: the mean row-to-row change of `adj_close` over the whole table, not a compound annual growth rate.
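Because the stacked table interleaves tickers, a per-ticker variant of a few of these features, plus a true CAGR, can be sketched with `groupby`. This is a minimal sketch, assuming a long-format DataFrame with `date`, `ticker` and `adj_close` columns like `df_stocks_cleaned`; the helpers `add_per_ticker_features` and `cagr_per_ticker`, the 365.25-day year, and the toy prices are illustrative assumptions, not the project's implementation.

```python
import pandas as pd


def add_per_ticker_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: returns and rolling features computed within each ticker."""
    out = (
        df.assign(date=pd.to_datetime(df["date"]))
        .sort_values(["ticker", "date"])
        .reset_index(drop=True)
    )
    prices = out.groupby("ticker")["adj_close"]
    # shift(1) works inside each group, so a ticker's first row has no previous price
    out["daily_return"] = out["adj_close"] / prices.shift(1) - 1
    # transform applies the rolling window per group and keeps row alignment
    out["short_term_moving_average"] = prices.transform(lambda s: s.rolling(10).mean())
    out["volatility"] = out.groupby("ticker")["daily_return"].transform(
        lambda s: s.rolling(20).std()
    )
    return out


def cagr_per_ticker(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: CAGR = (last / first) ** (1 / years) - 1, years from calendar days."""
    ordered = df.assign(date=pd.to_datetime(df["date"])).sort_values(["ticker", "date"])
    grouped = ordered.groupby("ticker")
    first_price = grouped["adj_close"].first()
    last_price = grouped["adj_close"].last()
    years = (grouped["date"].last() - grouped["date"].first()).dt.days / 365.25
    return (last_price / first_price) ** (1 / years) - 1


# Toy long-format prices, interleaved by date like the stocks table (illustrative values)
prices_long = pd.DataFrame({
    "date": ["2019-01-02", "2019-01-02", "2020-01-02", "2020-01-02", "2021-01-02", "2021-01-02"],
    "ticker": ["AAA3.SA", "BBB4.SA"] * 3,
    "adj_close": [100.0, 50.0, 110.0, 50.0, 121.0, 50.0],
})
features = add_per_ticker_features(prices_long)
growth = cagr_per_ticker(prices_long)
print(features[["date", "ticker", "daily_return"]])
print(growth)
```

In the toy data, the first BBB4.SA row gets NaN as its return instead of a return computed against another ticker's price, and AAA3.SA grows at roughly 10% per year.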


In [31]:
df_stocks_book_cols.head(2)

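|  | date | ticker | adj_close | close | high | low | open | volume | daily_return | short_term_moving_average | ... | long_term_moving_average | rsi | volatility | average_volume | daily_range | open_close_difference | high_low_difference | acceleration | hammer | cagr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-02 | AALR3.SA | 13.116831 | 13.25 | 13.5 | 13.25 | 13.31 | 264200.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.25 | 0.19317 | 0.25 | NaN | True | 2.691226 |
| 1 | 2019-01-02 | ABCB4.SA | 13.077144 | 17.120001 | 17.200001 | 16.35 | 16.469999 | 571700.0 | -0.003026 | NaN | ... | NaN | NaN | NaN | NaN | 0.85 | 3.392856 | 0.85 | NaN | True | 2.691226 |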
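The `hammer` flag computed above is a simplified rule. One common textbook formulation looks for a small real body near the top of the candle with a long lower shadow, and can be sketched on the unadjusted `open`, `high`, `low` and `close` columns. The thresholds here (lower shadow at least twice the body, upper shadow no longer than the body) and the helper name `hammer_flag` are assumptions; other sources use different cut-offs, and the prior-downtrend context a hammer usually requires is not checked.

```python
import pandas as pd


def hammer_flag(df: pd.DataFrame, shadow_to_body: float = 2.0) -> pd.Series:
    """Hypothetical helper: small body near the top plus a long lower shadow."""
    body = (df["close"] - df["open"]).abs()
    lower_shadow = df[["open", "close"]].min(axis=1) - df["low"]
    upper_shadow = df["high"] - df[["open", "close"]].max(axis=1)
    # Require a non-zero body, a long lower shadow and a short upper shadow
    return (body > 0) & (lower_shadow >= shadow_to_body * body) & (upper_shadow <= body)


# Two illustrative candles: the first is hammer-shaped, the second is a plain up day
candles = pd.DataFrame({
    "open": [10.0, 10.0],
    "high": [10.25, 11.5],
    "low": [9.5, 9.9],
    "close": [10.2, 11.0],
})
candles["hammer_strict"] = hammer_flag(candles)
print(candles)
```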


#### Cleaning Missing Values

In [32]:
df_stocks_book_missing = df_stocks_book_cols.copy()

In [33]:
df_stocks_book_missing.isna().sum()

date                           0
ticker                         0
adj_close                      0
close                          0
high                           0
low                            0
open                           0
volume                         0
daily_return                   1
short_term_moving_average      9
moving_average_mean           49
long_term_moving_average     199
rsi                           13
volatility                    20
average_volume                19
daily_range                    0
open_close_difference          0
high_low_difference            0
acceleration                   5
hammer                         0
cagr                           0
dtype: int64

In [34]:
# Impute every numeric column with KNN (5 neighbors)
numeral_columns = df_stocks_book_missing.select_dtypes(include=['number'])
numeral_values = df_stocks_book_missing[numeral_columns.columns]

imputer = KNNImputer(n_neighbors=5)
imputed_values = imputer.fit_transform(numeral_values)
numeral_values = pd.DataFrame(imputed_values, columns=numeral_values.columns)

# Non-numeric columns (date, ticker and the boolean hammer) are kept unchanged
categorias_columns = df_stocks_book_missing.select_dtypes(exclude=['number'])
categorias_values = df_stocks_book_missing[categorias_columns.columns]

# Re-join both parts side by side; both use the same 0..n-1 row index
df_stocks_book_missing = pd.concat([numeral_values, categorias_values], axis=1)
df_stocks_book_missing.head(2)

|  | adj_close | close | high | low | open | volume | daily_return | short_term_moving_average | moving_average_mean | long_term_moving_average | ... | volatility | average_volume | daily_range | open_close_difference | high_low_difference | acceleration | cagr | date | ticker | hammer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13.116831 | 13.25 | 13.5 | 13.25 | 13.31 | 264200.0 | -0.24484 | 24.336316 | 26.882742 | 40.073211 | ... | 2.608253 | 1350674.45 | 0.25 | 0.19317 | 0.25 | 1.183744 | 2.691226 | 2019-01-02 | AALR3.SA | True |
| 1 | 13.077144 | 17.120001 | 17.200001 | 16.35 | 16.469999 | 571700.0 | -0.003026 | 15.501511 | 16.783712 | 28.646179 | ... | 4.124473 | 1511679.6 | 0.85 | 3.392856 | 0.85 | 2.16821 | 2.691226 | 2019-01-02 | ABCB4.SA | True |


In [35]:
df_stocks_book_missing.isna().sum()

adj_close                    0
close                        0
high                         0
low                          0
open                         0
volume                       0
daily_return                 0
short_term_moving_average    0
moving_average_mean          0
long_term_moving_average     0
rsi                          0
volatility                   0
average_volume               0
daily_range                  0
open_close_difference        0
high_low_difference          0
acceleration                 0
cagr                         0
date                         0
ticker                       0
hammer                       0
dtype: int64

* The cell above applies KNN imputation to the stock dataset. The numeric columns of "df_stocks_book_missing" are imputed with KNNImputer (n_neighbors=5), which fills each missing value with the mean of that column over the five most similar rows. The missing values are the warm-up rows of the pct_change, diff and rolling calculations. The non-numeric columns ('date', 'ticker' and the boolean 'hammer') are concatenated back unchanged, so each imputed row keeps its date and ticker.

In [36]:
df_stocks_book = df_stocks_book_missing.copy()
df_stocks_book = df_stocks_book[['ticker','adj_close', 'close', 'high', 'low', 'open', 'volume', 'daily_return', 'short_term_moving_average', 'moving_average_mean', 'long_term_moving_average', 'rsi', 'volatility', 'average_volume', 'daily_range', 'open_close_difference', 'high_low_difference', 'acceleration', 'cagr', 'date', 'hammer']]

#### Saving Files

In [37]:
Path(file_path_book).mkdir(parents=True, exist_ok=True)
df_stocks_book.to_csv(file_path_book + "/stocks_book.csv", index=False)