
<p><img align="left" src="https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg" style="vertical-align: top; padding-top: 2px" width="08%"/>
<br/><br/><br/><br/><br/><br/></p>
<h1><font color="#306998"><left>Introduction to Financial Time Series</left></font></h1><hr/>
<p><strong>Getting Started with Python for Finance</strong></p>
<p><br/></p>
<p><strong>Kannan Singaravelu</strong><br/>
<a href="http://twitter.com/kannansi">@kannansi</a> | <a href="https://github.com/kannansingaravelu">https://github.com/kannansingaravelu</a> <br/></p>
<p><font size="4">February, 2022</font></p>



<h1 id="Financial-Data-Preprocessing">Financial Data Preprocessing<a class="anchor-link" href="#Financial-Data-Preprocessing">¶</a></h1><p>A time series is a sequence of data points indexed in time order. Financial data such as equity, commodity, and forex price series are typical examples. Most series are sampled at regular intervals. Depending on the sampling frequency, a time series may be hourly, daily, weekly, monthly, quarterly, or annual. It may also be recorded at finer resolutions such as minutes or seconds, or tick by tick, with one observation per trade or quote arriving at irregular intervals.</p>
<p>The first step in any data analysis is to parse the raw data: extract it from the source, then clean it and fill in any missing values. Although data come in many forms, Python packages make it easy to read time series data.</p>
<p>In this session, we will retrieve and store both end-of-day and intraday data using popular Python packages whose APIs are designed to make historical data easy to access. We will also read data stored locally in common formats such as Excel and CSV.</p>
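<p>As a minimal sketch of a regularly sampled series in pandas, the snippet below builds a short daily price series on a business-day calendar. The prices are made-up numbers, and <code>freq='B'</code> (Monday to Friday) is an assumption that ignores exchange holidays.</p>

```python
import pandas as pd

# Five business days starting on Monday 3 January 2022; 'B' skips weekends
dates = pd.date_range(start='2022-01-03', periods=5, freq='B')

# Hypothetical closing prices indexed by date
prices = pd.Series([100.0, 101.5, 100.8, 102.3, 103.0], index=dates, name='Close')

print(prices)
print(prices.index.freqstr)  # B
```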



<h2 id="Load-Libraries">Load Libraries<a class="anchor-link" href="#Load-Libraries">¶</a></h2><p>We’ll now import the required libraries that we’ll use in this example.</p>
<p>Refer to <code>installation.txt</code> and <code>requirements.txt</code> for package installation.</p>


In [None]:

# Import data manipulation libraries
import pandas as pd
import numpy as np

# Import yahoo finance library
import yfinance as yf

# Import cufflinks for visualization
import cufflinks as cf
cf.set_config_file(offline=True)

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')



In [None]:

# Check the package version
pd.__version__, cf.__version__, yf.__version__




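<h2 id="Docstring-or-Signature">Docstring or Signature<a class="anchor-link" href="#Docstring-or-Signature">¶</a></h2><p>Use <code>help()</code>, or the <code>?</code> suffix in Jupyter/IPython, to display a function's signature (its parameters and their defaults) and its docstring.</p>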


In [None]:

# help(yf.download)
# yf.download?
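<p>The standard-library <code>inspect</code> module offers a programmatic alternative to <code>help()</code>. The sketch below uses <code>pd.read_csv</code> only because it appears later in this notebook; the same call works on <code>yf.download</code>.</p>

```python
import inspect

import pandas as pd

# Retrieve the call signature without printing the whole docstring
sig = inspect.signature(pd.read_csv)

# Look up the defaults of parameters used later in this notebook
for name in ('index_col', 'parse_dates'):
    print(name, '=', sig.parameters[name].default)
```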




<h2 id="Data-Retrieval">Data Retrieval<a class="anchor-link" href="#Data-Retrieval">¶</a></h2><p>Retrieving end-of-day (EOD), intraday, and options data</p>



<h3 id="Retrieving-end-of-day-data-for-single-security">Retrieving end-of-day data for a single security<a class="anchor-link" href="#Retrieving-end-of-day-data-for-single-security">¶</a></h3><p>We'll retrieve historical data from Yahoo Finance using the <code>yfinance</code> library.</p>



<p><strong>Example 1</strong></p>


In [None]:

# Fetch data by specifying a lookback period (here '5d', the most recent five days)
df1 = yf.download('SPY', period='5d', progress=False)

# Display the first five rows of the dataframe to check the results. 
df1.head()




<p><strong>Example 2</strong></p>


In [None]:

# Fetch data by specifying the start and end dates
# (yfinance treats `end` as exclusive, so end='2022-01-01' includes 31 Dec 2021)
df2 = yf.download('SPY', start='2021-01-01', end='2022-01-01', progress=False)

# Display the first five rows of the dataframe to check the results. 
df2.head()




<p><strong>Example 3</strong></p>


In [None]:

# Fetch data for year to date (YTD)
df3 = yf.download('SPY', period='ytd', progress=False)

# Display the last five rows of the dataframe to check the results. 
df3.tail()




<h3 id="Retrieving-data-for-multiple-securities">Retrieving data for multiple securities<a class="anchor-link" href="#Retrieving-data-for-multiple-securities">¶</a></h3><p>We'll retrieve historical price data for five Nasdaq-listed stocks from Yahoo Finance.</p>



<p><strong>Example 4</strong></p>


In [None]:

# Specify stocks
nasdaq_stocks = ['INTC', 'INTU', 'NVDA', 'ZM', 'ZS']



In [None]:

nasdaq_stocks



In [None]:

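# Fetch adjusted close prices for multiple stocks at once
# (auto_adjust=False keeps the 'Adj Close' column in newer yfinance versions)
df4 = yf.download(nasdaq_stocks, period='ytd', auto_adjust=False, progress=False)['Adj Close']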

# Display dataframe
df4.tail()




<h3 id="Retrieving-multiple-fields-for-multiple-securities">Retrieving multiple fields for multiple securities<a class="anchor-link" href="#Retrieving-multiple-fields-for-multiple-securities">¶</a></h3><p>We'll now retrieve all price and volume fields (open, high, low, close, adjusted close, volume) for each security from Yahoo Finance.</p>



<p><strong>Example 5</strong></p>


In [None]:

# Fetch all fields for each stock with a dictionary comprehension: {symbol: DataFrame}
ohlcv = {symbol: yf.download(symbol, period='250d', auto_adjust=False, progress=False)
         for symbol in nasdaq_stocks}
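<p>The comprehension above sends one request per symbol. When a list of tickers is passed to a single <code>yf.download</code> call, as in Example 4, yfinance returns columns keyed by field and ticker by default, so the same per-symbol dictionary can be built from one download. The sketch below mimics that layout with made-up numbers; the exact column layout can vary across yfinance versions, so treat it as an assumption.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical multi-ticker frame shaped like a multi-ticker yf.download result:
# column level 0 = field, column level 1 = ticker
dates = pd.date_range('2022-01-03', periods=3, freq='B')
columns = pd.MultiIndex.from_product([['Adj Close', 'Close', 'Volume'], ['INTC', 'NVDA']])
wide = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), index=dates, columns=columns)

# Selecting one field gives a date-by-ticker frame, like df4
adj_close = wide['Adj Close']

# Splitting on the ticker level gives a {symbol: fields} dict, like ohlcv
per_symbol = {symbol: wide.xs(symbol, axis=1, level=1) for symbol in ['INTC', 'NVDA']}

print(adj_close)
print(per_symbol['NVDA'])
```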



In [None]:

ohlcv



In [None]:

# Display NVDA stock data
ohlcv['NVDA']



In [None]:

# Display NVDA adjusted close data
ohlcv['NVDA']['Adj Close']




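<h3 id="Retrieving-intraday-data">Retrieving intraday data<a class="anchor-link" href="#Retrieving-intraday-data">¶</a></h3><p>We'll now retrieve 1-minute intraday data from Yahoo Finance. Yahoo serves 1-minute bars only for a short, recent window, so keep the requested period short.</p>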



<p><strong>Example 6</strong></p>


In [None]:

# Retrieve intraday data for last five days
df6 = yf.download(tickers='SPY', period='5d', interval='1m', progress=False)

# Display last five rows of the dataframe
df6.tail()
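<p>Intraday bars are often downsampled to a coarser interval. Here is a minimal sketch, assuming a series of 1-minute closing prices like <code>df6['Close']</code> (made-up values here): <code>resample('5min').ohlc()</code> builds 5-minute open, high, low, and close bars. For a full OHLCV frame you would instead aggregate each column separately: first open, maximum high, minimum low, last close, and summed volume.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute closing prices for the first ten minutes of a session
idx = pd.date_range('2022-02-01 09:30', periods=10, freq='min')
close = pd.Series(np.arange(100.0, 110.0), index=idx)

# Aggregate into 5-minute bars: first, max, min and last price in each bin
bars = close.resample('5min').ohlc()
print(bars)
```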




<h3 id="Retrieving-option-chain">Retrieving option chain<a class="anchor-link" href="#Retrieving-option-chain">¶</a></h3><p>We'll now retrieve the SPY option chain for the 31 March 2022 expiration from Yahoo Finance and filter the output to display the first seven columns. The expiration date must be one of the dates listed in <code>spy.options</code>; expired dates are no longer available.</p>



<p><strong>Example 7</strong></p>


In [None]:

# Get SPY option chain
spy = yf.Ticker('SPY')
options = spy.option_chain('2022-03-31')



In [None]:

options



In [None]:

# Filter calls for strike above 440
df = options.calls[options.calls['strike']>440]
df.reset_index(drop=True, inplace=True)

# Check the filtered output
df.iloc[:,:7].head()
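<p>Boolean masks can combine several conditions on an option chain. Here is a minimal sketch on a made-up calls table that uses the <code>strike</code>, <code>bid</code>, and <code>ask</code> columns found in yfinance option chains. The spot price and the +/- 2.5% strike window are arbitrary assumptions for illustration.</p>

```python
import pandas as pd

# Hypothetical slice of a calls table with a few option-chain columns
calls = pd.DataFrame({
    'strike': [430.0, 440.0, 450.0, 460.0],
    'bid': [15.2, 8.9, 4.1, 1.5],
    'ask': [15.6, 9.2, 4.4, 1.7],
})
spot = 445.0  # assumed underlying price

# Mid price, then keep strikes within +/- 2.5% of spot (near the money)
calls['mid'] = (calls['bid'] + calls['ask']) / 2
near_money = calls[calls['strike'].between(spot * 0.975, spot * 1.025)].reset_index(drop=True)
print(near_money)
```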




<h3 id="Retrieving-Hypertext-Markup-Language-(HTML)">Retrieving Hypertext Markup Language (HTML)<a class="anchor-link" href="#Retrieving-Hypertext-Markup-Language-(HTML)">¶</a></h3><p>We'll now read tables from an HTML page with <code>pd.read_html</code>, which needs an HTML parser such as <code>lxml</code> to be installed, and extract a list of tickers.</p>



<h4 id="Retrieving-Nasdaq-100-Stocklist">Retrieving Nasdaq-100 Stocklist<a class="anchor-link" href="#Retrieving-Nasdaq-100-Stocklist">¶</a></h4>


In [None]:

# Read all tables on the Wikipedia page into a list of DataFrames
nasdaq100 = pd.read_html('https://en.wikipedia.org/wiki/Nasdaq-100')



In [None]:

nasdaq100



In [None]:

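# Select the constituents table and extract its tickers
# (the table's position in the list depends on the current page layout)
stocklist = list(nasdaq100[3]['Ticker'])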
stocklist[:10]




<h2 id="Data-Storage">Data Storage<a class="anchor-link" href="#Data-Storage">¶</a></h2><p>Storing retrieved data locally</p>



<h3 id="Storing-OHLCV-data-in-Excel-File">Storing OHLCV data in Excel File<a class="anchor-link" href="#Storing-OHLCV-data-in-Excel-File">¶</a></h3>


In [None]:

# Dataframe to Excel
from pandas import ExcelWriter



In [None]:

import os
os.makedirs('data', exist_ok=True)  # make sure the output folder exists

# Store each security in its own sheet; the context manager saves the file on exit
with ExcelWriter('data/stocks.xlsx') as writer:
    for symbol in nasdaq_stocks:
        ohlcv[symbol].to_excel(writer, sheet_name=symbol)




<h3 id="Storing-OHLCV-data-in-a-CSV--File">Storing OHLCV data in a CSV File<a class="anchor-link" href="#Storing-OHLCV-data-in-a-CSV--File">¶</a></h3>


In [None]:

# Save the OHLCV data for each security as data/<SYMBOL>.csv
for symbol in nasdaq_stocks:
    ohlcv[symbol].to_csv(f'data/{symbol}.csv')
print('*** data saved ***')




<h2 id="Data-Loading">Data Loading<a class="anchor-link" href="#Data-Loading">¶</a></h2><p>Loading locally stored data</p>



<h3 id="Reading-Microsfot-Excel-File">Reading Microsoft Excel File<a class="anchor-link" href="#Reading-Microsfot-Excel-File">¶</a></h3><p>We'll now read the locally stored Excel file using pandas.</p>


In [None]:

# Read the ZM sheet from the workbook, using the first column (dates) as the index
zoom = pd.read_excel('data/stocks.xlsx', sheet_name='ZM', index_col=0, parse_dates=True)

# Display the last five rows of the data frame to check the results
zoom.tail()




<h3 id="Reading-CSV-File">Reading CSV File<a class="anchor-link" href="#Reading-CSV-File">¶</a></h3><p>We'll now read a locally stored CSV file using pandas.</p>


In [None]:

# Read the Zscaler CSV file, parsing the first column as a DatetimeIndex
zscaler = pd.read_csv('data/ZS.csv', index_col=0, parse_dates=True)

# Display the last five rows of the data frame to check the results
zscaler.tail()
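<p>The <code>index_col=0</code> and <code>parse_dates=True</code> arguments are what restore the date index when the file is read back. A minimal round-trip sketch on a made-up frame, written to a temporary directory rather than to <code>data/</code>:</p>

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical OHLCV-style frame with a named DatetimeIndex
dates = pd.date_range('2022-01-03', periods=4, freq='B', name='Date')
frame = pd.DataFrame({'Close': [10.0, 10.5, 10.25, 11.0],
                      'Volume': [100, 120, 90, 150]}, index=dates)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    frame.to_csv(path)  # the index is written as the first column
    loaded = pd.read_csv(path, index_col=0, parse_dates=True)

# Without index_col/parse_dates the dates would come back as a plain text column
print(type(loaded.index).__name__, loaded.index.name)
print(np.allclose(loaded['Close'], frame['Close']))
```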




<h2 id="Interactive-Visualization-of-Time-Series">Interactive Visualization of Time Series<a class="anchor-link" href="#Interactive-Visualization-of-Time-Series">¶</a></h2><p>We use <code>cufflinks</code> for interactive visualization. Created by Jorge Santos, it is one of the most feature-rich third-party wrappers around Plotly, and it combines the power of <code>plotly</code> with the flexibility of <code>pandas</code> for easy plotting.</p>
<p>Once cufflinks is imported, every pandas DataFrame and Series gains an <code>.iplot()</code> method. It works much like pandas' built-in <code>.plot()</code> method but produces interactive Plotly charts.</p>



<h3 id="Plotting-Line-Chart">Plotting Line Chart<a class="anchor-link" href="#Plotting-Line-Chart">¶</a></h3><p>Next, we'll plot SPY closing prices for the last 30 trading days as a line chart.</p>


In [None]:

df3['Close'].iloc[-30:].iplot(kind='line', title='SPY Price')




<h3 id="Plotting-OHLC-Data">Plotting OHLC Data<a class="anchor-link" href="#Plotting-OHLC-Data">¶</a></h3><p>Next, we'll plot the same data as an OHLC (open-high-low-close) bar chart.</p>


In [None]:

df3.iloc[-30:].iplot(kind='ohlc', title='SPY Price')




<h3 id="Plotting-Candlestick">Plotting Candlestick<a class="anchor-link" href="#Plotting-Candlestick">¶</a></h3><p>Next, we'll plot an interactive candlestick chart.</p>


In [None]:

df3.iloc[-30:].iplot(kind='candle', title='SPY Price')




<h3 id="Plotting-Selected-Stocks">Plotting Selected Stocks<a class="anchor-link" href="#Plotting-Selected-Stocks">¶</a></h3><p>Next, we'll compare the Zoom and Zscaler adjusted close prices fetched from Yahoo Finance, plotting ZS on a secondary y-axis so that each series gets its own scale.</p>


In [None]:

# Use secondary axis
df4[['ZM', 'ZS']].iplot(title='Zoom Vs Zscaler', secondary_y='ZS')




<h3 id="Plotting-using-Subplots">Plotting using Subplots<a class="anchor-link" href="#Plotting-using-Subplots">¶</a></h3>


In [None]:

# Use subplots
df4[['ZM', 'ZS']].iplot(title='Zoom Vs Zscaler Price Movement', subplots=True)




<h3 id="Normalized-Plot">Normalized Plot<a class="anchor-link" href="#Normalized-Plot">¶</a></h3>


In [None]:

df4.normalize().iplot(title='Our Nasdaq-listed Stocks')
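<p><code>normalize()</code> is not a pandas method. It is a helper that cufflinks attaches to DataFrames, rebasing each series to a common starting level so that stocks with very different prices can share one axis. A plain-pandas sketch of the same idea, assuming a base of 100 on the first row and using made-up prices:</p>

```python
import pandas as pd

# Hypothetical prices for two stocks on very different price scales
prices = pd.DataFrame({'AAA': [50.0, 55.0, 60.0], 'BBB': [400.0, 380.0, 420.0]},
                      index=pd.date_range('2022-01-03', periods=3, freq='B'))

# Rebase every column to 100 on the first date: 100 * P_t / P_0
rebased = prices / prices.iloc[0] * 100
print(rebased)
```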




<h3 id="Visualising-Return-Series">Visualising Return Series<a class="anchor-link" href="#Visualising-Return-Series">¶</a></h3><p>We'll now compute the historical daily log returns in a single line of code, as the difference of log prices: <code>ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1})</code>.</p>


In [None]:

# Calculate daily log returns

# Log returns are the first difference of log prices: ln(P_t) - ln(P_{t-1})
daily_returns = np.log(df4).diff().dropna()

# Display the first five rows of the data frame to check the output
daily_returns.head(5)
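<p>The log return can be written three equivalent ways: <code>ln(P_t) - ln(P_{t-1})</code>, <code>ln(P_t / P_{t-1})</code>, or <code>ln(1 + R_t)</code>, where <code>R_t</code> is the simple return. A short check on made-up prices that the three forms agree:</p>

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.0, 101.0])

# Difference of log prices, log of the price ratio, and log1p of the simple return
r_diff = np.log(prices).diff().dropna()
r_ratio = np.log(prices / prices.shift(1)).dropna()
r_simple = np.log1p(prices / prices.shift(1) - 1).dropna()

print(np.allclose(r_diff, r_ratio), np.allclose(r_diff, r_simple))  # True True
```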




<h4 id="Plotting-Daily-Returns">Plotting Daily Returns<a class="anchor-link" href="#Plotting-Daily-Returns">¶</a></h4>


In [None]:

# Plot Returns
daily_returns[['ZM','ZS']].iplot(title='Daily Log Returns')




<h4 id="Plotting-Annual-Returns">Plotting Annual Returns<a class="anchor-link" href="#Plotting-Annual-Returns">¶</a></h4>


In [None]:

# Plot mean annualized log returns (mean daily log return x 252 trading days)
(daily_returns.mean()*252).iplot(kind='bar', title='Mean Annualized Log Returns')
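<p>Because log returns add up over time, multiplying the mean daily log return by the number of trading days gives an annualized log return. The figure of 252 trading days per year is a convention, not an exact count. The sketch below uses a synthetic year of exactly 252 returns to show that the scaled mean equals the total log return for the year.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical year of 252 daily log returns (fixed seed for reproducibility)
rng = np.random.default_rng(42)
daily = pd.Series(rng.normal(0.0005, 0.02, size=252))

# Mean daily log return scaled by the assumed 252 trading days per year
annual_mean = daily.mean() * 252

# With exactly 252 returns this equals the year's total log return, ln(P_end / P_start)
print(np.isclose(annual_mean, daily.sum()))  # True

# The corresponding simple (arithmetic) return over the year
print(np.exp(annual_mean) - 1)
```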




<h4 id="Plotting-Rolling-Returns">Plotting Rolling Returns<a class="anchor-link" href="#Plotting-Rolling-Returns">¶</a></h4>


In [None]:

# 5-day rolling returns: log returns are additive, so sum the daily log returns over each 5-day window
rolling_return = daily_returns.rolling(5).sum().dropna()

# Display the first five rows of the data frame to check the output
rolling_return.head(5)
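<p>A quick check of the additivity argument on made-up prices: the 5-day rolling sum of daily log returns matches the log of the price ratio over the same window, <code>ln(P_t / P_{t-5})</code>.</p>

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5, 101.0, 104.0, 105.5])
log_ret = np.log(prices).diff().dropna()

# 5-day rolling sum of daily log returns ...
rolling_sum = log_ret.rolling(5).sum().dropna()

# ... equals the log of the price ratio over the same 5-day window
direct = np.log(prices / prices.shift(5)).dropna()

print(np.allclose(rolling_sum, direct))  # True
```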




<h4 id="Visualising-a-Rolling-Return-Series">Visualising a Rolling Return Series<a class="anchor-link" href="#Visualising-a-Rolling-Return-Series">¶</a></h4><p>We'll now plot the 5-day rolling returns of NVDA using just one line of code.</p>


In [None]:

# Plot Rolling Returns
rolling_return['NVDA'].iplot(title='5-Day Rolling Returns of NVIDIA')




<h2 id="Time-Series-Statistics">Time Series Statistics<a name="top"></a><a class="anchor-link" href="#Time-Series-Statistics">¶</a></h2><p>Statistics is a branch of mathematics that deals with collecting, organizing, analyzing, and interpreting data. Its two main categories are descriptive statistics and inferential statistics.</p>
<p>Descriptive statistics summarize data in a meaningful way and are an important part of any data analysis. Inferential statistics, in contrast, use a sample of data to draw conclusions, such as trends or relationships, about the larger population it comes from.</p>


In [None]:

# Summary statistics of the daily returns (one row per stock)
daily_returns.describe().T




<h3 id="Log-Normal-Distribution">Log Normal Distribution<a class="anchor-link" href="#Log-Normal-Distribution">¶</a></h3><p>The normal distribution is the most common and widely used distribution in statistics, popularly referred to as the “bell curve” or “Gaussian curve”. A common modelling assumption in finance is that log returns are normally distributed, which is equivalent to prices following a log-normal distribution. In practice, daily returns usually show fatter tails than the normal distribution predicts.</p>
<p>Now that we have derived the daily log returns, we will plot their distribution and check whether they look approximately normal, as the log-normal price assumption implies.</p>


In [None]:

# Plot the distribution of daily log returns
daily_returns.iplot(kind='histogram', title='Histogram of Daily Returns', subplots=True)
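<p>A histogram gives a visual check. Two summary numbers address the same question: skewness and excess kurtosis, both close to zero for normally distributed data. The sketch compares a normal sample with a fat-tailed Student-t sample (synthetic data, with 3 degrees of freedom chosen arbitrarily). Calling <code>daily_returns.skew()</code> and <code>daily_returns.kurt()</code> on the real returns works the same way.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic "returns": a normal sample and a fat-tailed Student-t sample
samples = pd.DataFrame({
    'normal': rng.normal(0.0, 0.01, size=10_000),
    'fat_tailed': 0.01 * rng.standard_t(3, size=10_000),
})

# pandas' kurt() reports excess kurtosis, which is 0 for a normal distribution
print(samples.skew())
print(samples.kurt())
```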




<h3 id="Correlation">Correlation<a class="anchor-link" href="#Correlation">¶</a></h3><p>Correlation measures the strength and direction of the linear relationship between two random variables, on a scale from -1 to +1. As an example, we will check the correlation between the daily returns of our Nasdaq-listed stocks.</p>


In [None]:

# Plot correlation of returns
daily_returns.corr().iplot(kind='heatmap', title="Correlation Matrix", colorscale="Blues")
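<p><code>DataFrame.corr()</code> computes the Pearson correlation by default: the covariance of two series divided by the product of their standard deviations. A minimal sketch on made-up return series that checks the formula against pandas:</p>

```python
import numpy as np
import pandas as pd

x = pd.Series([0.010, -0.020, 0.015, 0.003, -0.007])
y = pd.Series([0.012, -0.015, 0.010, 0.001, -0.010])

# Pearson correlation: sum of co-deviations over the root of the product of squared deviations
manual = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

print(np.isclose(manual, x.corr(y)))  # True
```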




<h3 id="Pairwise-Correlation">Pairwise Correlation<a class="anchor-link" href="#Pairwise-Correlation">¶</a></h3>


In [None]:

# Correlation of each stock's daily returns with INTC's daily returns
daily_returns.corrwith(daily_returns['INTC'])




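<h2 id="Data-Resampling">Data Resampling<a class="anchor-link" href="#Data-Resampling">¶</a></h2><p>Next, we'll change the frequency of the retrieved time series by resampling, for example converting daily prices into weekly or monthly prices. Resampling is a routine step when working with financial time series, since an analysis often needs a different frequency from the one at which the data were recorded.</p>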



<h3 id="Weekly-Resampling">Weekly Resampling<a class="anchor-link" href="#Weekly-Resampling">¶</a></h3>


In [None]:

# Downsample daily prices to weekly: last price in each week (weeks end on Sunday)
df_weekly = df4[['INTU']].resample('W').last()

# Display the last five rows of the data frame to check the output
df_weekly.tail(5)




<h3 id="Resample-on-a-Specific-Day-of-a-Week">Resample on a Specific Day of a Week<a class="anchor-link" href="#Resample-on-a-Specific-Day-of-a-Week">¶</a></h3>


In [None]:

# Resample to weeks ending on Thursday: value as of each Thursday, forward-filled if that day had no trading
df_weekly_thu = df4[['INTU']].resample('W-THU').ffill()

# Display the last five rows of the data frame to check the output
df_weekly_thu.tail()
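<p>The weekly alias decides where each bin ends and which date labels it. Here is a minimal sketch on made-up prices. Labels can fall on non-trading days: a Sunday for <code>'W'</code>, or a future Thursday for the last, incomplete <code>'W-THU'</code> week. The sketch uses <code>.last()</code> for both; the notebook's <code>.ffill()</code> instead takes the value as of each label date, which typically gives the same numbers for daily prices.</p>

```python
import pandas as pd

# Hypothetical daily closes for two business weeks (Mon 3 Jan to Fri 14 Jan 2022)
dates = pd.date_range('2022-01-03', '2022-01-14', freq='B')
close = pd.Series(range(len(dates)), index=dates, dtype=float)

# 'W' is shorthand for 'W-SUN': bins end on Sunday and are labelled with that Sunday
weekly = close.resample('W').last()

# 'W-THU': bins run Friday to Thursday and are labelled with the Thursday
weekly_thu = close.resample('W-THU').last()

print(weekly)
print(weekly_thu)
```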




<h3 id="Monthly-Resampling-of-Data">Monthly Resampling of Data<a class="anchor-link" href="#Monthly-Resampling-of-Data">¶</a></h3>


In [None]:

# Downsample daily prices to monthly: last price in each calendar month
# ('M' = month end; pandas 2.2 and later prefer the alias 'ME')
df_monthly = df4[['INTU']].resample('M').last()

# Display the last five rows of the data frame to check the output
df_monthly.tail()




<h2 id="Box-Plot-Analysis">Box Plot Analysis<a class="anchor-link" href="#Box-Plot-Analysis">¶</a></h2><p>Next, we'll load daily S&amp;P 500 (SPX) data from a local Excel file and compute daily log returns in percent. We'll then reset the index and add year and month columns, and reshape the data into a month-by-year table of monthly returns.</p>


In [None]:

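# Load the S&P 500 data from a local Excel file and keep 2011-2020
spx = pd.read_excel('data/SP500.xlsx', index_col=0, parse_dates=True).loc['2011':'2020'].copy()

# Daily log returns in percent (the first day has no prior price, so it is set to 0)
spx['Change'] = 100.*np.log(spx['Adj Close']).diff().fillna(0)

# Output the first five rows
spx.head()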



In [None]:

# Create a copy of spx dataframe & reset index
spx_copy = spx.copy()
spx_copy.reset_index(inplace=True)

# Assign separate columns for month & year 
spx_copy['Year'] = spx_copy['Date'].dt.year
spx_copy['Month'] = spx_copy['Date'].dt.month



In [None]:

# Pivot into a month-by-year table; summing daily log returns gives monthly log returns (%)
newdf = pd.pivot_table(spx_copy,
                       index='Month',
                       columns='Year',
                       values='Change',
                       aggfunc='sum')

newdf
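<p>Summing daily log returns within a month gives that month's log return, which is why the pivot table sums. A minimal sketch on made-up data showing that <code>pivot_table</code> produces the same table as a groupby over (Month, Year) followed by <code>unstack</code>; either form works.</p>

```python
import pandas as pd

# Hypothetical daily % log returns spanning two months in two years
data = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-02', '2019-01-03', '2019-02-01',
                            '2020-01-02', '2020-02-03', '2020-02-04']),
    'Change': [1.0, -0.5, 0.25, 2.0, -1.0, 0.5],
})
data['Year'] = data['Date'].dt.year
data['Month'] = data['Date'].dt.month

# Month-by-year table of summed daily log returns (= monthly log return in %)
table = pd.pivot_table(data, index='Month', columns='Year', values='Change', aggfunc='sum')

# The same table via groupby + unstack
same = data.groupby(['Month', 'Year'])['Change'].sum().unstack('Year')

print(table)
```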



In [None]:

# Year-wise summary statistics of monthly SPX returns
newdf.describe()




<h3 id="Box-Plot">Box Plot<a class="anchor-link" href="#Box-Plot">¶</a></h3><p>In descriptive statistics, a box plot graphically depicts groups of numerical data through their quartiles. The spacing between the different parts of the box indicates the degree of dispersion (spread) and skewness in the data, and points beyond the whiskers are shown as outliers. Let’s now analyze the monthly SPX returns for each year using a box plot.</p>


In [None]:

# Visualize SPX Box Plot
newdf.iplot(kind='box', 
            title='SPX Return Analysis', 
            yTitle='Returns (%)', 
            legend=False, boxpoints='outliers')
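<p>Each box summarizes a year's column by its quartiles. Points far outside the box are commonly flagged with Tukey's 1.5 × IQR rule. Plotting libraries can differ in quartile method and whisker convention, so treat this sketch (made-up numbers) as an illustration of the rule rather than a reproduction of the chart above.</p>

```python
import pandas as pd

# Hypothetical monthly returns (%) with one extreme month
returns = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Quartiles (pandas' default linear interpolation) and interquartile range
q1, q3 = returns.quantile(0.25), returns.quantile(0.75)
iqr = q3 - q1

# Tukey fences: points more than 1.5 * IQR beyond the box are flagged as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = returns[(returns < lower) | (returns > upper)]
print(q1, q3, iqr, outliers.tolist())  # 2.0 4.0 2.0 [100.0]
```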




<p>We'll now use <code>idxmax</code> and <code>idxmin</code> to select the index value that corresponds to the maximum or minimum SPX percentage change for each year.</p>


In [None]:

# Row labels of the largest and smallest daily change in each year
max_min = {'Change': ['idxmax', 'idxmin']}
spx_copy.groupby(['Year']).agg(max_min).T



In [None]:

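# Check the rows with the largest and smallest daily change in 2020
rows_2020 = spx_copy.groupby('Year')['Change'].agg(['idxmax', 'idxmin']).loc[2020]
spx_copy.loc[rows_2020]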



In [None]:

# Check the year wise maximum values
spx_copy.loc[spx_copy.groupby('Year')['Change'].idxmax()]



In [None]:

# Check the year wise minimum values
spx_copy.loc[spx_copy.groupby('Year')['Change'].idxmin()]




<h2 id="References">References<a class="anchor-link" href="#References">¶</a></h2><ul>
<li><a href="https://github.com/kannansingaravelu/PythonResources">Python Resources</a></li>
<li><a href="https://pandas.pydata.org/">Pandas Documentation</a></li>
<li><a href="https://docs.scipy.org/doc/numpy/">Numpy Documentation</a></li>
<li><a href="https://github.com/ranaroussi/yfinance">YFinance Documentation</a></li>
<li><a href="https://github.com/santosjorge/cufflinks">Cufflinks Documentation</a> </li>
</ul>



