# Data Preprocessing

## The Data

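The data was pulled from Yahoo Finance with the `yfinance` package and covers three ETFs: ARKK, SPY, and FNGU. It spans 26 January 2018 through 30 March 2022 (the end date passed to the download is exclusive). All analysis uses the Adjusted Close price. This price corrects historical values for stock splits and for dividend or capital-gain distributions, so the series stays comparable across the whole period.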
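The toy sketch below shows why the adjusted series matters. It uses made-up prices (not ARKK, SPY, or FNGU data) with a hypothetical 2-for-1 split. Yahoo Finance's adjustment also accounts for dividends and distributions, but this example models only the split. It does so by dividing every pre-split price by the split ratio:

```python
import pandas as pd

# Synthetic prices (not real ETF data); a 2-for-1 split takes effect on day 3.
close = pd.Series(
    [100.0, 102.0, 51.5, 52.0],
    index=pd.date_range("2021-01-04", periods=4, freq="B"),
    name="Close",
)
split_ratio = 2.0
split_date = close.index[2]

# Back-adjust: scale every price *before* the split by 1 / split_ratio so the
# whole history is expressed in post-split share terms.
factor = pd.Series(1.0, index=close.index)
factor.loc[close.index < split_date] = 1.0 / split_ratio
adj_close = close * factor

# Daily simple returns on the raw vs. adjusted series
raw_returns = close.pct_change()
adj_returns = adj_close.pct_change()
print(raw_returns.round(4))
print(adj_returns.round(4))
```

On the split date, the raw series shows a spurious drop of roughly 50%. The adjusted series shows the actual move of about 1%.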

### Imports

```python
# Import modules
import yfinance as yf                  # Yahoo Finance download API
import pandas as pd                    # DataFrames
from pathlib import Path               # filesystem paths for saving output
from IPython.display import display    # render DataFrames in the notebook
from plotting import Plotter           # project's custom plotting class
```

```python
# Tickers used throughout the notebook
tickers = ['ARKK', 'SPY', 'FNGU']
```

### Extract, Transform and Load

```python
# Extract, transform and load ETF data (end date is exclusive;
# auto_adjust=False keeps the 'Adj Close' column in recent yfinance releases)
raw = yf.download(tickers, start='2018-01-26', end='2022-03-31', auto_adjust=False)
etfs = raw['Adj Close'][tickers].round(2)
```
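Building one DataFrame from several tickers aligns the prices on the union of their dates. A ticker with no price on a given date gets a NaN there. One useful check after the download is to count the missing values per column. The sketch below runs offline on synthetic prices. `combine_adj_close` is a hypothetical helper, and `ETF_A`/`ETF_B` are placeholder names:

```python
from typing import Mapping

import pandas as pd


def combine_adj_close(series_by_ticker: Mapping[str, pd.Series], decimals: int = 2) -> pd.DataFrame:
    """Hypothetical helper: align per-ticker price Series on dates and round."""
    # pd.DataFrame aligns the Series on the union of their indexes;
    # dates missing for a ticker become NaN.
    frame = pd.DataFrame(dict(series_by_ticker)).sort_index()
    return frame.round(decimals)


dates = pd.date_range("2018-01-26", periods=5, freq="B")
raw_prices = {
    "ETF_A": pd.Series([45.123, 45.678, 45.9, 46.004], index=dates[:4]),  # one day short
    "ETF_B": pd.Series([270.111, 271.999, 272.5, 273.3, 274.0], index=dates),
}
etfs_demo = combine_adj_close(raw_prices)
print(etfs_demo)
print(etfs_demo.isna().sum())  # missing values per ticker
```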

## Data Exploration

```python
# Show the first and last rows of a DataFrame
def display_head_tail(df):
    display(df.head(), df.tail())

# Create the custom Plotter used for all charts below
plotter = Plotter('Analysis')
```
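`display` is available only inside IPython/Jupyter. If the exploration runs as a plain script, a variant that returns the rows instead of rendering them works anywhere. `head_tail` below is a hypothetical helper and is not part of the project:

```python
import pandas as pd


def head_tail(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the first and last n rows as one DataFrame (no IPython needed)."""
    if len(df) <= 2 * n:
        # Short frames: head and tail would overlap, so return everything once
        return df.copy()
    return pd.concat([df.head(n), df.tail(n)])


demo = pd.DataFrame(
    {"ETF_A": range(20)},
    index=pd.date_range("2018-01-26", periods=20, freq="B"),
)
print(head_tail(demo, n=3))
```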

### ETFs

```python
# Display head/tail of the combined ETF DataFrame
display_head_tail(etfs)
```

```python
# Plot ETF adjusted closing prices
plotter.line(etfs, '')
```
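The three ETFs trade at quite different price levels. On a single chart of raw prices, the cheaper ones can look flat next to the more expensive ones. One common alternative is to rebase each series to 100 at its first available date, so every line shows growth relative to its own start. The sketch below is a minimal version on synthetic data. `rebase` is a hypothetical helper, and it only builds the DataFrame because the `Plotter` API isn't shown here:

```python
import numpy as np
import pandas as pd


def rebase(prices: pd.DataFrame, base: float = 100.0) -> pd.DataFrame:
    """Scale each column so its first non-missing value equals `base`."""
    # bfill().iloc[0] picks each column's first valid price
    first_valid = prices.bfill().iloc[0]
    return prices.div(first_valid, axis=1) * base


prices = pd.DataFrame(
    {
        "ETF_A": [50.0, 55.0, 45.0],
        "ETF_B": [250.0, 255.0, 260.0],
        "ETF_C": [np.nan, 20.0, 30.0],  # starts trading one day later
    },
    index=pd.date_range("2018-01-26", periods=3, freq="B"),
)
rebased = rebase(prices)
print(rebased)
```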

```python
# Plot heatmap for the ETFs
plotter.heatmap(etfs, '')
```
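The `Plotter.heatmap` implementation isn't shown, so it is unclear what it computes from the prices. Suppose the goal is to show how closely the ETFs move together. In that case, a common input is the correlation matrix of daily returns rather than of raw prices, because trending price levels tend to look highly correlated whatever the day-to-day behaviour. The sketch below runs on synthetic prices, and `return_correlation` is a hypothetical helper:

```python
import pandas as pd


def return_correlation(prices: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of daily simple returns."""
    # Compute returns explicitly so missing prices stay missing
    # instead of being forward-filled.
    returns = prices / prices.shift(1) - 1
    # DataFrame.corr excludes NaN pairwise (the first row is all NaN).
    return returns.corr()


prices = pd.DataFrame(
    {
        "ETF_A": [100.0, 101.0, 99.0, 102.0, 100.0],
        "ETF_B": [300.0, 303.0, 297.0, 306.0, 300.0],  # 3x ETF_A, identical returns
        "ETF_C": [50.0, 49.0, 50.5, 49.5, 51.0],       # moves opposite to ETF_A
    },
    index=pd.date_range("2018-01-26", periods=5, freq="B"),
)
corr = return_correlation(prices)
print(corr.round(3))
```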

```python
# Split into one single-column DataFrame per ETF
arkk, spy, fngu = [etfs[t].to_frame(t) for t in tickers]
```
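The head/tail views below can be hard to compare across ETFs. A compact numeric summary per single-column frame can sit alongside them. The sketch below uses a hypothetical `summarize` helper on synthetic prices and splits the frame the same way as the cell above:

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: date range, first/last price and total return (%)."""
    col = frame.columns[0]
    prices = frame[col].dropna()
    first, last = prices.iloc[0], prices.iloc[-1]
    return {
        "ticker": col,
        "start": prices.index[0].date(),
        "end": prices.index[-1].date(),
        "first_price": first,
        "last_price": last,
        "total_return_pct": round((last / first - 1) * 100, 2),
    }


etfs_demo = pd.DataFrame(
    {"ETF_A": [50.0, 55.0, 60.0], "ETF_B": [200.0, 190.0, 180.0]},
    index=pd.date_range("2018-01-26", periods=3, freq="B"),
)
a, b = [etfs_demo[t].to_frame(t) for t in ["ETF_A", "ETF_B"]]
summaries = [summarize(f) for f in (a, b)]
print(pd.DataFrame(summaries))
```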

### ARKK

```python
# Display head/tail of the ARKK ETF
display_head_tail(arkk)
```

```python
# Plot ARKK ETF adjusted closing prices
plotter.line(arkk, tickers[0])
```

### SPY

```python
# Display head/tail of the SPY ETF
display_head_tail(spy)
```

```python
# Plot SPY ETF adjusted closing prices
plotter.line(spy, tickers[1])
```

### FNGU

```python
# Display head/tail of the FNGU ETF
display_head_tail(fngu)
```

```python
# Plot FNGU ETF adjusted closing prices
plotter.line(fngu, tickers[2])
```

## Saving

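```python
# Save ETF data as CSV, creating the output folder if it doesn't exist
output_path = Path("./Resources/Data/etf_data.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
etfs.to_csv(output_path)
```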
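A CSV stores the date index as plain text. Any notebook that reloads the file should therefore restore it explicitly as a `DatetimeIndex`. The sketch below shows one way to do the round trip, assuming a `Date` index like the one yfinance returns for daily data. `save_prices` and `load_prices` are hypothetical helpers, and the file goes to a temporary folder rather than the project's `Resources` path:

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_prices(prices: pd.DataFrame, path: Path) -> None:
    """Write prices to CSV, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    prices.to_csv(path)


def load_prices(path: Path) -> pd.DataFrame:
    """Read prices back, using the first column as a parsed date index."""
    return pd.read_csv(path, index_col=0, parse_dates=True)


demo = pd.DataFrame(
    {"ETF_A": [270.5, 271.25]},
    index=pd.DatetimeIndex(["2018-01-26", "2018-01-29"], name="Date"),
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Data" / "etf_data.csv"
    save_prices(demo, path)
    restored = load_prices(path)

print(restored)
print(restored.index.dtype)
```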