# Fundamental Stock Data
This homework walks you through downloading free historical fundamentals for stocks. This data can be used to construct value and other fundamental strategies. There is no "solution" for this homework, as it is more of a guide.

### Packages
We will be using the <a href='https://pypi.org/project/simfin/' target="_blank" >simfin</a> and <a href='https://pypi.org/project/yfinance/' target="_blank" >yfinance</a> packages. If you haven't already, install them with "pip install simfin" and "pip install yfinance", then import them as below.

In [7]:
import yfinance as yf 
import simfin as sf


### Get Your SimFin API Key

First you must obtain an API key. Create an account at https://simfin.com, confirm it via email, and then go to https://simfin.com/data/api to get your API key.

In [None]:
# set your api key here
api_key = 'G1VF92qykCQUFf4Nw5ZIpdcH4BFXtJ5V'
sf.config.set_api_key(api_key=api_key)
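If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the helper `get_simfin_api_key` and the variable name `SIMFIN_API_KEY` are hypothetical names chosen here, not part of simfin.

```python
import os


def get_simfin_api_key(env_var="SIMFIN_API_KEY"):
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the variable was exported before starting the notebook):
# api_key = get_simfin_api_key()
# sf.config.set_api_key(api_key=api_key)
```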

### Set Your SimFin Data Directory

SimFin requires you to set a data directory where its data will be downloaded. The downloaded files are reused so that later loads are faster. The default location is a folder named simfin_data in the home directory.

In [None]:
# set simfin_data
sf.set_data_dir('~/simfin_data/')

### Download Historical Financial Data from SimFin
The three financial statements containing fundamental data are the quarterly income statement, balance sheet and cash flow statement. Data from these statements can be loaded for all US tickers from SimFin as below.

In [None]:
income = sf.load_income(variant='quarterly', market='us')

balance_sheet = sf.load_balance(variant='quarterly', market='us')

cash_flow = sf.load_cashflow(variant='quarterly', market='us')

Below we observe the shape of the data.

All of the results are multi-index DataFrames in which each row is indexed by (ticker, date) and each column is a financial statement item.
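To make the (ticker, date) layout concrete, here is a minimal sketch on a small synthetic DataFrame. The numbers are made up, and the index names `Ticker` and `Report Date` are assumptions about how the SimFin frames are labelled; check `income.index.names` on your own data.

```python
import pandas as pd

# Synthetic stand-in for a SimFin statement: rows indexed by (Ticker, Report Date).
index = pd.MultiIndex.from_tuples(
    [
        ("AAPL", pd.Timestamp("2020-03-31")),
        ("AAPL", pd.Timestamp("2020-06-30")),
        ("MSFT", pd.Timestamp("2020-03-31")),
    ],
    names=["Ticker", "Report Date"],  # assumed level names
)
toy = pd.DataFrame({"Revenue": [100.0, 110.0, 200.0]}, index=index)

# All rows for one ticker; the result is indexed by date only.
aapl = toy.loc["AAPL"]

# One item across all tickers on a single report date.
q1 = toy.xs(pd.Timestamp("2020-03-31"), level="Report Date")["Revenue"]
```

The same `.loc` and `.xs` calls should work on `income`, `balance_sheet` and `cash_flow` if their index levels match the assumption above.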

One of the important columns for backtesting is the Publish Date. This is the date the information was available to the public. 

For backtesting, we can assume the SimFin data is available 1 business day after the Publish Date. Companies sometimes publish after the market close, and depending on which data vendor you use, you may not be able to get the data immediately on publication for trading.

Because there are restatements to financial statements, even this will not be a fully "point-in-time" backtest where we only use information available at the time. It should be a reasonable approximation in many cases, however.

In [None]:
cash_flow.head()
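The one-business-day lag described above can be sketched with a pandas business-day offset. The data below is synthetic and the column name `Available Date` is a hypothetical name introduced here. Note that `BDay` skips weekends only, so exchange holidays are not handled in this sketch.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Synthetic example: one Thursday and one Friday publish date.
toy = pd.DataFrame(
    {
        "Publish Date": pd.to_datetime(["2020-04-30", "2020-07-31"]),
        "Net Income": [10.0, 12.0],
    }
)

# Assume the data becomes tradable one business day after publication.
# BDay skips weekends but knows nothing about market holidays.
toy["Available Date"] = toy["Publish Date"] + BDay(1)
```

In a backtest you would then use a row's values only on or after its `Available Date`.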

Let's observe which columns are available to us below from each financial statement.

In [None]:
# observe available income columns
income.columns

In [None]:
# observe available balance_sheet columns
balance_sheet.columns

In [None]:
# observe available cash_flow columns
cash_flow.columns

Below we observe the number of tickers available for each statement and the start and end dates.

In [None]:
def describe_data(data, data_name):
    size = len(set(data.index.get_level_values(0)))
    start_dt = data.index.get_level_values(1).min().strftime('%Y%m%d')
    end_dt = data.index.get_level_values(1).max().strftime('%Y%m%d')
    print(f'{data_name}: {size} tickers. Date range: {start_dt} to {end_dt}')

describe_data(income, 'Income Data')
describe_data(balance_sheet, 'Balance Sheet Data')
describe_data(cash_flow, 'Cash Flow Data')

### Yahoo Finance Data
As the date ranges above show, the SimFin data ends about a year ago: it is only available with a 1-year lag. To get the most recent financial statement information, you can use Yahoo Finance.

First you must obtain a yfinance.Ticker object for your desired ticker.

In [None]:
yf_ticker = yf.Ticker('AAPL')

Download information from the three financial statements as below.

In [None]:
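# Get the quarterly income, cash flow and balance sheet statements from yfinance.
# The _yf suffix keeps these separate from the SimFin DataFrames loaded earlier.

income_yf = yf_ticker.quarterly_financials

cash_flow_yf = yf_ticker.quarterly_cashflow

balance_sheet_yf = yf_ticker.quarterly_balance_sheet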

In [None]:
income_yf
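If you want to line the Yahoo Finance statements up with the SimFin layout, one possible approach is to transpose them. The sketch below uses a small synthetic frame and assumes yfinance returns line items as rows and period-end dates as columns; confirm this on your own `income_yf` output. The helper `to_long` and the index names are hypothetical, and item names generally differ between the two sources, so columns would still need to be mapped by hand.

```python
import pandas as pd

# Synthetic stand-in for a yfinance statement (assumed layout):
# line items as rows, period-end dates as columns, newest first.
wide = pd.DataFrame(
    {
        pd.Timestamp("2023-09-30"): [100.0, 20.0],
        pd.Timestamp("2023-06-30"): [90.0, 18.0],
    },
    index=["Total Revenue", "Net Income"],
)


def to_long(statement, ticker):
    """Return a (Ticker, Report Date)-indexed frame, one column per line item."""
    out = statement.T.sort_index()  # dates become rows, oldest first
    out.index = pd.MultiIndex.from_product(
        [[ticker], out.index], names=["Ticker", "Report Date"]
    )
    return out


long_frame = to_long(wide, "AAPL")
```

With real data, the call would look like `to_long(income_yf, 'AAPL')`.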