# Yahoo Finance Dataset Analysis

In [None]:
# !pip install dtw-python
# !pip install yfinance
# !pip install pandas_datareader
# !pip install yahoo-fin

In [None]:
# Useful imports
import datetime as dt
import warnings

import numpy as np
import pandas as pd
import matplotlib as mpl
from matplotlib import pyplot as plt
from sklearn.preprocessing import StandardScaler
from dtw import *  # dtw-python: provides dtw(), rabinerJuangStepPattern(), ...
import yfinance as yf
# yahoo_fin gives quick access to the lists of tickers in different indices
from yahoo_fin import stock_info as si
from pandas_datareader import data as pdr

# Settings
# The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6
mpl.style.use('seaborn-v0_8')
# plt.rc("figure", figsize=(10,10))  # size of the figure
pd.set_option('display.max_rows', None)
warnings.filterwarnings("ignore")
# yf.pdr_override() routed pandas_datareader through yfinance; it is not needed here
# (the data is read from CSV) and is not available in recent yfinance releases
# yf.pdr_override()

## Set constant variables

In [None]:
DIR_DATA = "../dataset/yahoo_finance_dataset/AAPL.csv"
NORMALIZE = True

# Set up the start/end dates for the prices
num_of_years = 10
start = dt.date.today() - dt.timedelta(days = int(365.25*num_of_years))
end = dt.date.today()

# Get the list of tickers in the Dow Jones index (other available lists:
# tickers_ftse100, tickers_ftse250, tickers_ibovespa, tickers_nasdaq,
# tickers_nifty50, tickers_niftybank, tickers_other, tickers_sp500)
tickers = si.tickers_dow()

## Dataset

We use the Yahoo Finance dataset for Apple (AAPL), specifically the one with daily prices, so that we can analyze the day-to-day changes.

In [None]:
df = pd.read_csv(DIR_DATA)
df

The dataset has no missing values (we checked with `isna().sum()`), so no preprocessing is needed.
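A minimal sketch of that missing-value check follows. It runs on a small toy frame with made-up values (the column names mimic the usual Yahoo Finance daily layout, an assumption); on the real data the same call is simply `df.isna().sum()`. The helper `count_missing` is hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd

def count_missing(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number of missing values per column."""
    return frame.isna().sum()

# Toy frame with made-up values (NOT real AAPL prices); one Close is missing
toy = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Close": [1.0, np.nan, 3.0],
    "Volume": [10, 20, 30],
})

missing = count_missing(toy)
print(missing)

# Rows that contain at least one missing value
rows_with_nan = toy[toy.isna().any(axis=1)]
print(rows_with_nan)
```

If `missing` is zero for every column, as the text reports for the AAPL file, no imputation step is required.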

## Volatility

Sources:
- [Calculate the Volatility of Historic Stock Prices with Pandas and Python](https://www.learnpythonwithrune.org/calculate-the-volatility-of-historic-stock-prices-with-pandas-and-python/)
- [Volatility](https://corporatefinanceinstitute.com/resources/capital-markets/volatility-vol/)

Volatility is a measure of the rate of fluctuations in the price of a security over time. It indicates the level of risk associated with the price changes of a security. Investors and traders calculate the volatility of a security to assess past price variations and to anticipate future movements.

We create a new column called `Log_returns` with the daily log return of the Close price, $r_t = \ln(P_t / P_{t-1})$.
We use log returns instead of simple daily returns because log returns are additive over time: the log return over several days is the sum of the daily log returns, whereas simple returns compound multiplicatively and cannot simply be summed. For this reason log returns are used in most financial analyses.
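To make the additivity argument concrete, here is a small sketch on three made-up closing prices (illustration only, not AAPL data): the price goes up 10% and then down 10%, so the simple returns sum to about zero even though the position lost 1%, while the log returns sum exactly to the total log return.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for three consecutive days (illustration only)
close = pd.Series([100.0, 110.0, 99.0])

simple_returns = close.pct_change().dropna()           # about [0.10, -0.10]
log_returns = np.log(close / close.shift()).dropna()   # [ln 1.1, ln 0.9]

total_simple = close.iloc[-1] / close.iloc[0] - 1      # about -0.01
total_log = np.log(close.iloc[-1] / close.iloc[0])     # ln 0.99

# Log returns add up to the total log return ...
print(np.isclose(log_returns.sum(), total_log))        # True
# ... but simple returns do not add up to the total simple return
print(simple_returns.sum(), total_simple)
```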

In [None]:
# Compute the daily log return
df['Log_returns'] = np.log(df['Close']/df['Close'].shift())

To measure the volatility of the stock we need the **standard deviation** of the daily log returns computed above.

In [None]:
df['Log_returns'].std()
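It helps to know exactly what pandas computes here: `Series.std()` skips NaN values by default (`skipna=True`) and returns the *sample* standard deviation (`ddof=1`, dividing by $n-1$). The sketch below checks this against a manual formula on made-up returns; the leading NaN mimics the first row produced by `shift()`.

```python
import numpy as np
import pandas as pd

# Made-up daily log returns; the leading NaN mimics the first row produced by shift()
r = pd.Series([np.nan, 0.01, -0.02, 0.015, 0.005])

valid = r.dropna().to_numpy()
n = valid.size
mean = valid.mean()
# Sample standard deviation: divide by n - 1 (ddof=1), as pandas does by default
manual_std = np.sqrt(((valid - mean) ** 2).sum() / (n - 1))

print(np.isclose(r.std(), manual_std))  # True: pandas skips NaN and uses ddof=1
```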

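The first value of the column is NaN, because the first day has no previous close to compare with. `std()` skips NaN values by default, so the result above is not affected, but DTW cannot handle missing values, so we backfill it with the next available return.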

In [None]:
df['Log_returns'] = df['Log_returns'].bfill()  # fillna(method="bfill") is deprecated since pandas 2.1
df['Log_returns']

The standard deviation computed above is a **daily** value. Volatility is usually quoted as the annualized standard deviation, obtained by multiplying the daily value by $\sqrt{252}$, since a year has roughly 252 trading days.
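Under the standard simplifying assumption that daily log returns $r_t$ are independent and identically distributed with variance $\sigma_{\text{daily}}^2$ (an assumption, not something the data guarantees), the $\sqrt{252}$ factor follows from the additivity of log returns:

$$
\begin{aligned}
R_{\text{year}} &= \sum_{t=1}^{252} r_t \\
\operatorname{Var}(R_{\text{year}}) &= \sum_{t=1}^{252} \operatorname{Var}(r_t) = 252\,\sigma_{\text{daily}}^2 \\
\sigma_{\text{annual}} &= \sqrt{252}\,\sigma_{\text{daily}}
\end{aligned}
$$

If returns are autocorrelated, this square-root-of-time rule is only an approximation.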

In [None]:
volatility = df['Log_returns'].std() * 252**0.5  # annualize: roughly 252 trading days per year

In [None]:
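# Plot the distribution of daily log returns, with the annualized volatility in the title
# Format as a percentage with 2 decimals (round(x, 4)*100 can print float artifacts)
str_vol = f"{volatility * 100:.2f}"

fig, ax = plt.subplots()
df['Log_returns'].hist(ax=ax, bins=50, alpha=0.6, color='b')
ax.set_xlabel("Log return")
ax.set_ylabel("Frequency")
ax.set_title(f"AAPL volatility: {str_vol}%")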

## DTW
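Dynamic Time Warping (DTW) measures the similarity of two sequences by finding the monotonic alignment between their points that minimizes the total pointwise distance. A point may be matched to several consecutive points of the other sequence, so series that have a similar shape but are shifted or stretched in time still get a small distance. The `dtw` package computes this for us; as an illustration only, here is a minimal plain-NumPy sketch of the classic recurrence, assuming the absolute difference as local cost. The function `dtw_distance` is hypothetical: its recurrence corresponds to dtw-python's `symmetric1` step pattern, whereas `dtw()` uses `symmetric2` by default (diagonal steps weighted twice), so the two distances will generally differ.

```python
import numpy as np

def dtw_distance(x, y):
    """Hypothetical sketch: classic DTW with |x_i - y_j| as local cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # D[i, j] = cost of the best alignment of x[:i] with y[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # x advances, y's point is reused
                                 D[i, j - 1],      # y advances, x's point is reused
                                 D[i - 1, j - 1])  # both advance
    return D[n, m]

# Same shape, shifted by one step: DTW absorbs the shift, Euclidean distance does not
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(a, b))                          # 0.0
print(np.linalg.norm(np.array(a) - np.array(b)))   # 2.0
```

The double loop is $O(nm)$ in time and memory, which is fine for a sketch; the `dtw` package also supports windowing constraints for long series.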

In [None]:
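def compute_dtw(data, col1="Volume", col2="Log_returns"):
    """Align two columns of `data` with DTW and plot the results."""
    x = data[col1].to_numpy(dtype=float)
    y = data[col2].to_numpy(dtype=float)

    if NORMALIZE:
        # Standardize each series separately (zero mean, unit variance) so that
        # the very different scales of the two columns do not dominate the distance
        x = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()
        y = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

    # DTW Parameters: see https://dynamictimewarping.github.io/py-api/html/api/dtw.dtw.html#dtw.dtw

    # Default step pattern (symmetric2): three-way plot of the warping path
    plt.rc("figure", figsize=(10, 10))  # size of the figure
    alignment = dtw(x, y, keep_internals=True)
    alignment.plot(type="threeway", xlab=col1, ylab=col2)

    # Rabiner-Juang type-4 step pattern (limits the slope of the warping path):
    # two-way plot of the matched points, with y shifted down by `offset` for readability
    plt.rc("figure", figsize=(30, 8))  # size of the figure
    alignment_rj = dtw(x, y, keep_internals=True,
                       step_pattern=rabinerJuangStepPattern(ptype=4, slope_weighting="d"))
    alignment_rj.plot(type="twoway", offset=-9).legend((col1, col2), loc="upper left")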

### Analysis

In [None]:
compute_dtw(df, "Log_returns", "Volume")