# Algorithmic Trading Model with ML4T `yfinance` Market and Fundamental Data Examples
### David Lowe
### June 28, 2022

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This script replicates the examples found in chapter one of the book Machine Learning for Algorithmic Trading by Stefan Jansen. It also serves to further validate the Python environment and package requirements for running these code examples. The eventual goal is to integrate the various example code segments into an end-to-end algorithmic trading system.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: US Equities and Fund Prices Hosted by Yahoo! Finance

Source and Further Discussion of the Code Examples: https://www.ml4trading.io/chapter/1

## Imports & Settings

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import yfinance as yf
from datetime import datetime

## How to work with a Ticker object

In [3]:
symbol = 'INTC'
ticker = yf.Ticker(symbol)

### Show ticker info

In [4]:
pd.Series(ticker.info)

zip                                                           95054-1549
sector                                                        Technology
fullTimeEmployees                                                 122900
longBusinessSummary    Intel Corporation engages in the design, manuf...
city                                                         Santa Clara
                                             ...                        
dayHigh                                                            38.12
regularMarketPrice                                                 36.97
preMarketPrice                                                      None
logo_url                             https://logo.clearbit.com/intel.com
trailingPegRatio                                                  2.1556
Length: 153, dtype: object

### Get market data

In [5]:
data = ticker.history(period='5d',
                      interval='1m',
                      start=datetime(2022, 6, 13),
                      end=datetime(2022, 6, 17),
                      actions=True,
                      auto_adjust=True,
                      back_adjust=False)
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1561 entries, 2022-06-13 09:30:00-04:00 to 2022-06-17 16:00:00-04:00
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1561 non-null   float64
 1   High          1561 non-null   float64
 2   Low           1561 non-null   float64
 3   Close         1561 non-null   float64
 4   Volume        1561 non-null   int64  
 5   Dividends     1561 non-null   int64  
 6   Stock Splits  1561 non-null   int64  
dtypes: float64(4), int64(3)
memory usage: 97.6 KB
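
One-minute bars like those above are often aggregated into wider bars before modeling. The following is a minimal sketch of that aggregation using only the standard library, so it runs without network access. The helper `resample_bars` and the sample quotes are hypothetical and are not part of `yfinance`; the sketch also assumes the bar width divides 60 evenly.

```python
from datetime import datetime, timedelta


def resample_bars(bars, minutes):
    """Aggregate (timestamp, open, high, low, close, volume) bars into wider bars.

    Assumes `minutes` divides 60, so flooring within the hour is enough.
    """
    buckets = {}
    for ts, o, h, l, c, v in sorted(bars, key=lambda bar: bar[0]):
        # Floor the timestamp to the start of its bucket.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        if start not in buckets:
            buckets[start] = [o, h, l, c, v]  # first bar sets the open
        else:
            agg = buckets[start]
            agg[1] = max(agg[1], h)  # highest high
            agg[2] = min(agg[2], l)  # lowest low
            agg[3] = c               # last close wins
            agg[4] += v              # volumes add up
    return [(start, *values) for start, values in sorted(buckets.items())]


# Made-up one-minute quotes (open, high, low, close, volume), for illustration only.
t0 = datetime(2022, 6, 13, 9, 30)
quotes = [
    (38.00, 38.10, 37.95, 38.05, 1200),
    (38.05, 38.20, 38.00, 38.15, 800),
    (38.15, 38.18, 38.02, 38.04, 500),
    (38.04, 38.09, 37.90, 37.95, 900),
    (37.95, 38.00, 37.92, 37.98, 600),
    (37.98, 38.07, 37.97, 38.06, 700),
    (38.06, 38.12, 38.01, 38.10, 300),
]
bars = [(t0 + timedelta(minutes=i), *q) for i, q in enumerate(quotes)]
five_min_bars = resample_bars(bars, 5)
for bar in five_min_bars:
    print(bar)
```

With a pandas DataFrame such as `data`, the same idea is usually expressed with `resample` and an aggregation mapping; the plain-Python version is shown only to make the rules explicit.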


### View company actions

In [6]:
# show actions (dividends, splits)
ticker.actions

Datetime,Dividends,Stock Splits


In [7]:
ticker.dividends

Series([], Name: Dividends, dtype: int64)

In [8]:
ticker.splits

Series([], Name: Stock Splits, dtype: int64)

### Annual and Quarterly Financial Statement Summary

In [9]:
ticker.financials

,2021-12-25,2020-12-26,2019-12-28,2018-12-29
Research Development,15190000000.0,13556000000.0,13362000000.0,13543000000.0
Effect Of Accounting Charges,,,,
Income Before Tax,21703000000.0,25078000000.0,24058000000.0,23317000000.0
Minority Interest,,,,
Net Income,19868000000.0,20899000000.0,21048000000.0,21053000000.0
Selling General Administrative,6543000000.0,6180000000.0,6350000000.0,6950000000.0
Gross Profit,43815000000.0,43612000000.0,42140000000.0,43737000000.0
Ebit,22082000000.0,23876000000.0,22428000000.0,23244000000.0
Operating Income,22082000000.0,23876000000.0,22428000000.0,23244000000.0
Other Operating Expenses,,,393000000.0,-72000000.0


In [10]:
ticker.quarterly_financials

,2022-04-02,2021-12-25,2021-09-25,2021-06-26
Research Development,4362000000.0,4049000000.0,3803000000.0,3715000000.0
Effect Of Accounting Charges,,,,
Income Before Tax,9661000000.0,5194000000.0,6858000000.0,5745000000.0
Minority Interest,,,,
Net Income,8113000000.0,4623000000.0,6823000000.0,5061000000.0
Selling General Administrative,1752000000.0,1942000000.0,1674000000.0,1599000000.0
Gross Profit,9244000000.0,11009000000.0,10746000000.0,11206000000.0
Ebit,3130000000.0,5018000000.0,5269000000.0,5892000000.0
Operating Income,3130000000.0,5018000000.0,5269000000.0,5892000000.0
Other Operating Expenses,,,,


### Annual and Quarterly Balance Sheet

In [11]:
ticker.balance_sheet

,2021-12-25,2020-12-26,2019-12-28,2018-12-29
Intangible Assets,7270000000.0,9026000000.0,10827000000.0,11836000000.0
Total Liab,73015000000.0,72053000000.0,59020000000.0,53400000000.0
Total Stockholder Equity,95391000000.0,81038000000.0,77504000000.0,74563000000.0
Other Current Liab,2555000000.0,3438000000.0,2330000000.0,1438000000.0
Total Assets,168406000000.0,153091000000.0,136524000000.0,127963000000.0
Common Stock,28006000000.0,25556000000.0,25261000000.0,25365000000.0
Other Current Assets,8897000000.0,7575000000.0,1637000000.0,2646000000.0
Retained Earnings,68265000000.0,56233000000.0,53523000000.0,50172000000.0
Other Liab,11748000000.0,13402000000.0,11402000000.0,11676000000.0
Good Will,26963000000.0,26971000000.0,26276000000.0,24513000000.0


In [12]:
ticker.quarterly_balance_sheet

,2022-04-02,2021-12-25,2021-09-25,2021-06-26
Intangible Assets,6813000000.0,7270000000.0,7684000000.0,8018000000.0
Total Liab,73220000000.0,73015000000.0,77875000000.0,69390000000.0
Total Stockholder Equity,103136000000.0,95391000000.0,90087000000.0,85207000000.0
Other Current Liab,625000000.0,2555000000.0,556000000.0,618000000.0
Total Assets,176356000000.0,168406000000.0,167962000000.0,154597000000.0
Common Stock,29244000000.0,28006000000.0,27592000000.0,26655000000.0
Other Current Assets,4076000000.0,8897000000.0,8260000000.0,8025000000.0
Retained Earnings,74894000000.0,68265000000.0,63642000000.0,59647000000.0
Other Liab,11110000000.0,11748000000.0,12693000000.0,12840000000.0
Good Will,27011000000.0,26963000000.0,26786000000.0,26768000000.0
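
A quick sanity check on balance sheet data is the accounting identity: total assets should equal total liabilities plus stockholder equity. The sketch below applies it to the annual figures printed above and also derives a simple liabilities-to-equity ratio. The helper names are hypothetical, and the check assumes no minority interest (which would otherwise show up as a gap).

```python
# Annual INTC figures copied from the ticker.balance_sheet output above (USD).
balance_sheet = {
    "2021-12-25": {"Total Assets": 168_406_000_000,
                   "Total Liab": 73_015_000_000,
                   "Total Stockholder Equity": 95_391_000_000},
    "2020-12-26": {"Total Assets": 153_091_000_000,
                   "Total Liab": 72_053_000_000,
                   "Total Stockholder Equity": 81_038_000_000},
    "2019-12-28": {"Total Assets": 136_524_000_000,
                   "Total Liab": 59_020_000_000,
                   "Total Stockholder Equity": 77_504_000_000},
    "2018-12-29": {"Total Assets": 127_963_000_000,
                   "Total Liab": 53_400_000_000,
                   "Total Stockholder Equity": 74_563_000_000},
}


def identity_gap(row):
    """Assets minus (liabilities + equity); zero when the identity holds."""
    return row["Total Assets"] - (row["Total Liab"] + row["Total Stockholder Equity"])


def liabilities_to_equity(row):
    """Total liabilities divided by stockholder equity."""
    return row["Total Liab"] / row["Total Stockholder Equity"]


gaps = {period: identity_gap(row) for period, row in balance_sheet.items()}
leverage = {period: liabilities_to_equity(row) for period, row in balance_sheet.items()}

for period in balance_sheet:
    print(period, "gap:", gaps[period], "liab/equity:", round(leverage[period], 3))
```

A nonzero gap would be a prompt to look for rows the summary omits, rather than proof of a data error.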


### Annual and Quarterly Cashflow Statement

In [13]:
ticker.cashflow

,2021-12-25,2020-12-26,2019-12-28,2018-12-29
Investments,-5287000000.0,-6891000000.0,2140000000.0,3856000000.0
Change To Liabilities,-393000000.0,224000000.0,-86000000.0,1578000000.0
Total Cashflows From Investing Activities,-25167000000.0,-20796000000.0,-14405000000.0,-11239000000.0
Net Borrowings,2474000000.0,5722000000.0,765000000.0,-2603000000.0
Total Cash From Financing Activities,-5862000000.0,-12917000000.0,-17565000000.0,-18607000000.0
Change To Operating Activities,974000000.0,-89000000.0,1682000000.0,41000000.0
Issuance Of Stock,1020000000.0,897000000.0,750000000.0,555000000.0
Net Income,19868000000.0,20899000000.0,21048000000.0,21053000000.0
Change In Cash,-1038000000.0,1671000000.0,1175000000.0,-414000000.0
Repurchase Of Stock,-2415000000.0,-14229000000.0,-13576000000.0,-10730000000.0


In [14]:
ticker.quarterly_cashflow

,2022-04-02,2021-12-25,2021-09-25,2021-06-26
Investments,-3919000000.0,2769000000.0,-6481000000.0,-3010000000.0
Change To Liabilities,-134000000.0,89000000.0,664000000.0,117000000.0
Total Cashflows From Investing Activities,-2640000000.0,-5034000000.0,-10682000000.0,-6904000000.0
Net Borrowings,-299000000.0,-2000000000.0,4974000000.0,-500000000.0
Total Cash From Financing Activities,-1863000000.0,-3806000000.0,3906000000.0,-2288000000.0
Change To Operating Activities,-3170000000.0,-68000000.0,2137000000.0,829000000.0
Issuance Of Stock,589000000.0,4000000.0,427000000.0,24000000.0
Net Income,8113000000.0,4623000000.0,6823000000.0,5061000000.0
Change In Cash,1388000000.0,-3043000000.0,3124000000.0,-446000000.0
Total Cash From Operating Activities,5891000000.0,5797000000.0,9900000000.0,8746000000.0
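
The quarterly cash flow rows can be reconciled against each other: operating, investing, and financing cash flows should sum to the reported change in cash. The sketch below runs that check on the figures printed above. The dictionary keys and the helper `unexplained_change` are hypothetical names; for companies with material exchange-rate effects on cash, a nonzero remainder would be expected.

```python
# Quarterly INTC figures copied from the ticker.quarterly_cashflow output above (USD).
quarterly_cashflow = {
    "2022-04-02": {"operating": 5_891_000_000, "investing": -2_640_000_000,
                   "financing": -1_863_000_000, "change_in_cash": 1_388_000_000},
    "2021-12-25": {"operating": 5_797_000_000, "investing": -5_034_000_000,
                   "financing": -3_806_000_000, "change_in_cash": -3_043_000_000},
    "2021-09-25": {"operating": 9_900_000_000, "investing": -10_682_000_000,
                   "financing": 3_906_000_000, "change_in_cash": 3_124_000_000},
    "2021-06-26": {"operating": 8_746_000_000, "investing": -6_904_000_000,
                   "financing": -2_288_000_000, "change_in_cash": -446_000_000},
}


def unexplained_change(row):
    """Reported change in cash minus the sum of the three cash flow sections."""
    explained = row["operating"] + row["investing"] + row["financing"]
    return row["change_in_cash"] - explained


remainders = {q: unexplained_change(row) for q, row in quarterly_cashflow.items()}
for quarter, remainder in remainders.items():
    print(quarter, "unexplained change in cash:", remainder)
```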


In [15]:
ticker.earnings

Year,Revenue,Earnings
2018,70848000000,21053000000
2019,71965000000,21048000000
2020,77867000000,20899000000
2021,79024000000,19868000000
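
The annual revenue and earnings figures lend themselves to two simple derived metrics: net profit margin and year-over-year revenue growth. The following sketch computes both from the numbers printed above; the variable and function names are illustrative only.

```python
# (revenue, earnings) per year, copied from the ticker.earnings output above (USD).
annual = {
    2018: (70_848_000_000, 21_053_000_000),
    2019: (71_965_000_000, 21_048_000_000),
    2020: (77_867_000_000, 20_899_000_000),
    2021: (79_024_000_000, 19_868_000_000),
}


def net_margin(revenue, earnings):
    """Earnings as a fraction of revenue."""
    return earnings / revenue


margins = {year: net_margin(*figures) for year, figures in annual.items()}

# Year-over-year revenue growth; the first year has no prior year to compare to.
revenue_growth = {
    year: annual[year][0] / annual[year - 1][0] - 1
    for year in sorted(annual)
    if year - 1 in annual
}

for year in sorted(annual):
    growth = revenue_growth.get(year)
    growth_text = "n/a" if growth is None else f"{growth:.2%}"
    print(year, f"margin={margins[year]:.2%}", f"revenue growth={growth_text}")
```

On these figures, revenue grows each year while the net margin declines from 2018 to 2021.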


In [16]:
ticker.quarterly_earnings

Quarter,Revenue,Earnings
2Q2021,19631000000,5061000000
3Q2021,19192000000,6823000000
4Q2021,20528000000,4623000000
2Q2022,18353000000,8113000000


### Sustainability: Environmental, Social and Governance (ESG)

In [17]:
ticker.sustainability

2022-5,Value
palmOil,False
controversialWeapons,False
gambling,False
socialScore,5.63
nuclear,False
furLeather,False
alcoholic,False
gmo,False
catholic,False
socialPercentile,


### Analyst Recommendations

In [18]:
ticker.recommendations.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 401 entries, 2012-04-18 06:30:00 to 2022-06-06 11:12:28
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Firm        401 non-null    object
 1   To Grade    401 non-null    object
 2   From Grade  401 non-null    object
 3   Action      401 non-null    object
dtypes: object(4)
memory usage: 15.7+ KB


In [19]:
ticker.recommendations.tail(10)

Date,Firm,To Grade,From Grade,Action
2022-01-27 12:30:47,Mizuho,Neutral,,main
2022-01-27 13:43:56,Credit Suisse,Outperform,,main
2022-01-27 15:27:50,UBS,Neutral,,main
2022-02-18 14:15:38,Barclays,Underweight,,main
2022-02-18 14:37:50,BMO Capital,Market Perform,,main
2022-02-23 10:26:16,Raymond James,Market Perform,Underperform,up
2022-03-03 11:13:10,Morgan Stanley,Underweight,Equal-Weight,down
2022-04-29 10:55:30,Needham,Buy,,main
2022-04-29 12:39:41,Wells Fargo,Equal-Weight,,main
2022-06-06 11:12:28,Citigroup,Neutral,,main
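
The recommendation history can be summarized by tallying the `Action` column and keeping the most recent grade per firm. The sketch below does this for the ten rows shown above using only the standard library. It assumes the action codes `main`, `up`, and `down` stand for a maintained rating, an upgrade, and a downgrade; that reading is inferred from the rows rather than taken from documentation.

```python
from collections import Counter

# (date, firm, to_grade, from_grade, action) copied from the tail(10) output above.
rows = [
    ("2022-01-27 12:30:47", "Mizuho", "Neutral", "", "main"),
    ("2022-01-27 13:43:56", "Credit Suisse", "Outperform", "", "main"),
    ("2022-01-27 15:27:50", "UBS", "Neutral", "", "main"),
    ("2022-02-18 14:15:38", "Barclays", "Underweight", "", "main"),
    ("2022-02-18 14:37:50", "BMO Capital", "Market Perform", "", "main"),
    ("2022-02-23 10:26:16", "Raymond James", "Market Perform", "Underperform", "up"),
    ("2022-03-03 11:13:10", "Morgan Stanley", "Underweight", "Equal-Weight", "down"),
    ("2022-04-29 10:55:30", "Needham", "Buy", "", "main"),
    ("2022-04-29 12:39:41", "Wells Fargo", "Equal-Weight", "", "main"),
    ("2022-06-06 11:12:28", "Citigroup", "Neutral", "", "main"),
]

# How often each action code occurs.
action_counts = Counter(action for *_, action in rows)

# Latest grade per firm; ISO-formatted timestamps sort chronologically as strings.
latest_grade = {}
for _date, firm, to_grade, _from_grade, _action in sorted(rows):
    latest_grade[firm] = to_grade

# Rows where the grade changed, i.e. a previous grade is recorded.
grade_changes = [(firm, from_grade, to_grade)
                 for _date, firm, to_grade, from_grade, _action in rows
                 if from_grade]

print(action_counts)
print(grade_changes)
```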


### Upcoming Events

In [20]:
ticker.calendar

,0,1
Earnings Date,2022-07-20 20:00:00,2022-07-25 20:00:00
Earnings Average,0.71,0.71
Earnings Low,0.62,0.62
Earnings High,0.86,0.86
Revenue Average,18002200000,18002200000
Revenue Low,17435000000,17435000000
Revenue High,18725800000,18725800000


### Option Expiration Dates

In [21]:
ticker.options

('2022-06-24',
 '2022-07-01',
 '2022-07-08',
 '2022-07-15',
 '2022-07-22',
 '2022-07-29',
 '2022-08-19',
 '2022-09-16',
 '2022-10-21',
 '2022-11-18',
 '2022-12-16',
 '2023-01-20',
 '2023-06-16',
 '2024-01-19')

In [22]:
expiration = ticker.options[0]
options = ticker.option_chain(expiration)
options.calls.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   contractSymbol     46 non-null     object             
 1   lastTradeDate      46 non-null     datetime64[ns, UTC]
 2   strike             46 non-null     float64            
 3   lastPrice          46 non-null     float64            
 4   bid                46 non-null     float64            
 5   ask                46 non-null     float64            
 6   change             46 non-null     float64            
 7   percentChange      42 non-null     float64            
 8   volume             46 non-null     int64              
 9   openInterest       42 non-null     float64            
 10  impliedVolatility  46 non-null     float64            
 11  inTheMoney         46 non-null     bool               
 12  contractSize       46 non-null     object           

In [23]:
options.calls.head()

,contractSymbol,lastTradeDate,strike,lastPrice,bid,ask,change,percentChange,volume,openInterest,impliedVolatility,inTheMoney,contractSize,currency
0,INTC220624C00025000,2022-06-17 17:13:48+00:00,25.0,12.25,11.9,12.1,0.18,1.491303,4,3.0,1.546877,True,REGULAR,USD
1,INTC220624C00030000,2022-06-16 16:34:14+00:00,30.0,7.4,6.9,7.1,0.0,0.0,100,112.0,0.890626,True,REGULAR,USD
2,INTC220624C00032000,2022-06-17 19:02:17+00:00,32.0,4.95,4.95,5.1,4.95,,1,,0.730471,True,REGULAR,USD
3,INTC220624C00033500,2022-06-17 18:47:24+00:00,33.5,3.5,3.45,3.65,3.5,,1,,0.582035,True,REGULAR,USD
4,INTC220624C00034000,2022-06-17 16:06:02+00:00,34.0,2.82,3.0,3.15,-0.58,-17.058828,11,9.0,0.550786,True,REGULAR,USD
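
The `contractSymbol` values above appear to follow the OCC option symbol layout without space padding: the underlying root, a six-digit expiration (YYMMDD), `C` or `P`, and the strike multiplied by 1000 as eight digits. Under that assumption, the sketch below decodes a symbol and can be cross-checked against the `strike` column. `parse_contract_symbol` and `OptionContract` are hypothetical names, not part of `yfinance`.

```python
from datetime import date, datetime
from typing import NamedTuple


class OptionContract(NamedTuple):
    root: str
    expiration: date
    option_type: str
    strike: float


def parse_contract_symbol(symbol):
    """Decode an option symbol, assuming the unpadded OCC layout."""
    # The last 15 characters are fixed width: YYMMDD + C/P + 8-digit strike.
    root, tail = symbol[:-15], symbol[-15:]
    expiration = datetime.strptime(tail[:6], "%y%m%d").date()
    option_type = {"C": "call", "P": "put"}[tail[6]]
    strike = int(tail[7:]) / 1000  # strike is stored in thousandths
    return OptionContract(root, expiration, option_type, strike)


contract = parse_contract_symbol("INTC220624C00025000")
print(contract)
```

Decoding the first row this way gives the INTC root, the 2022-06-24 expiration selected in the earlier cell, and a strike of 25.0, matching the table.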


In [24]:
options.puts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   contractSymbol     48 non-null     object             
 1   lastTradeDate      48 non-null     datetime64[ns, UTC]
 2   strike             48 non-null     float64            
 3   lastPrice          48 non-null     float64            
 4   bid                48 non-null     float64            
 5   ask                48 non-null     float64            
 6   change             48 non-null     float64            
 7   percentChange      40 non-null     float64            
 8   volume             44 non-null     float64            
 9   openInterest       40 non-null     float64            
 10  impliedVolatility  48 non-null     float64            
 11  inTheMoney         48 non-null     bool               
 12  contractSize       48 non-null     object           

## Downloading multiple symbols

In [25]:
tickers = yf.Tickers('msft aapl googl')

In [26]:
tickers

yfinance.Tickers object <MSFT,AAPL,GOOGL>

In [27]:
pd.Series(tickers.tickers['MSFT'].info)

zip                                                           98052-6399
sector                                                        Technology
fullTimeEmployees                                                 181000
longBusinessSummary    Microsoft Corporation develops, licenses, and ...
city                                                             Redmond
                                             ...                        
dayHigh                                                            250.5
regularMarketPrice                                                247.65
preMarketPrice                                                      None
logo_url                         https://logo.clearbit.com/microsoft.com
trailingPegRatio                                                  1.6973
Length: 153, dtype: object

In [28]:
tickers.tickers['AAPL'].history(period="1mo")

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2022-05-18,146.850006,147.360001,139.899994,140.820007,109742900,0,0
2022-05-19,139.880005,141.660004,136.600006,137.350006,136095600,0,0
2022-05-20,139.089996,140.699997,132.610001,137.589996,137426100,0,0
2022-05-23,137.789993,143.259995,137.649994,143.110001,117726300,0,0
2022-05-24,140.809998,141.970001,137.330002,140.360001,104132700,0,0
2022-05-25,138.429993,141.789993,138.339996,140.520004,92482700,0,0
2022-05-26,137.389999,144.339996,137.139999,143.779999,90601500,0,0
2022-05-27,145.389999,149.679993,145.259995,149.639999,90978500,0,0
2022-05-31,149.070007,150.660004,146.839996,148.839996,103718400,0,0
2022-06-01,149.899994,151.740005,147.679993,148.710007,74286600,0,0


In [29]:
tickers.history(period='1mo').stack(-1)

[*********************100%***********************]  3 of 3 completed


Date,,Close,Dividends,High,Low,Open,Stock Splits,Volume
2022-05-18,AAPL,140.820007,0.00,147.360001,139.899994,146.850006,0,109742900
2022-05-18,GOOGL,2237.989990,0.00,2308.000000,2231.110107,2300.000000,0,1756300
2022-05-18,MSFT,254.080002,0.62,263.600006,252.770004,263.000000,0,31356000
2022-05-19,AAPL,137.350006,0.00,141.660004,136.600006,139.880005,0,136095600
2022-05-19,GOOGL,2207.679932,0.00,2260.199951,2200.000000,2228.629883,0,1707200
...,...,...,...,...,...,...,...,...
2022-06-16,GOOGL,2120.669922,0.00,2172.969971,2102.760010,2144.419922,0,2584200
2022-06-16,MSFT,244.970001,0.00,247.419998,243.020004,245.979996,0,33169200
2022-06-17,AAPL,131.559998,0.00,133.080002,129.809998,130.070007,0,134118500
2022-06-17,GOOGL,2142.870117,0.00,2173.989990,2100.919922,2120.669922,0,2555300


In [30]:
data = yf.download("SPY AAPL", start="2020-01-01", end="2020-01-05")

[*********************100%***********************]  2 of 2 completed


In [31]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2020-01-02 to 2020-01-03
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   (Adj Close, AAPL)  2 non-null      float64
 1   (Adj Close, SPY)   2 non-null      float64
 2   (Close, AAPL)      2 non-null      float64
 3   (Close, SPY)       2 non-null      float64
 4   (High, AAPL)       2 non-null      float64
 5   (High, SPY)        2 non-null      float64
 6   (Low, AAPL)        2 non-null      float64
 7   (Low, SPY)         2 non-null      float64
 8   (Open, AAPL)       2 non-null      float64
 9   (Open, SPY)        2 non-null      float64
 10  (Volume, AAPL)     2 non-null      int64  
 11  (Volume, SPY)      2 non-null      int64  
dtypes: float64(10), int64(2)
memory usage: 208.0 bytes


In [32]:
data = yf.download(
        tickers = "SPY AAPL MSFT", # list or string

        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period = "5d",

        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval = "1m",

        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by = 'ticker',

        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust = True,

        # download pre/post regular market hours data
        # (optional, default is False)
        prepost = True,

        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads = True,

        # proxy URL scheme to use when downloading?
        # (optional, default is None)
        proxy = None
    )

[*********************100%***********************]  3 of 3 completed


In [33]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4724 entries, 2022-06-13 04:00:00-04:00 to 2022-06-17 19:59:00-04:00
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   (SPY, Open)     4709 non-null   float64
 1   (SPY, High)     4709 non-null   float64
 2   (SPY, Low)      4709 non-null   float64
 3   (SPY, Close)    4709 non-null   float64
 4   (SPY, Volume)   4709 non-null   float64
 5   (AAPL, Open)    4723 non-null   float64
 6   (AAPL, High)    4723 non-null   float64
 7   (AAPL, Low)     4723 non-null   float64
 8   (AAPL, Close)   4723 non-null   float64
 9   (AAPL, Volume)  4723 non-null   float64
 10  (MSFT, Open)    4608 non-null   float64
 11  (MSFT, High)    4608 non-null   float64
 12  (MSFT, Low)     4608 non-null   float64
 13  (MSFT, Close)   4608 non-null   float64
 14  (MSFT, Volume)  4608 non-null   float64
dtypes: float64(15)
memory usage: 590.5 KB
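
The non-null counts above differ by ticker even though all columns share one index of 4724 rows. A likely explanation, offered here as an assumption, is that the combined frame uses the union of all timestamps, so a ticker with no bar for a given pre-market or after-hours minute gets a missing value there. That would also explain why `Volume` is `float64`, since pandas upcasts integer columns that contain NaN. The sketch below illustrates the alignment with made-up data and then counts the gaps implied by the output; `align` is a hypothetical helper.

```python
def align(series_by_ticker):
    """Align per-ticker {timestamp: value} dicts on the union of timestamps."""
    index = sorted(set().union(*(s.keys() for s in series_by_ticker.values())))
    aligned = {ticker: [series.get(ts) for ts in index]
               for ticker, series in series_by_ticker.items()}
    return index, aligned


# Made-up pre-market volumes, for illustration only.
sample = {
    "SPY": {"04:00": 120, "04:01": 80, "04:02": 50},
    "MSFT": {"04:00": 30, "04:02": 10},  # no bar at 04:01
}
index, aligned = align(sample)
print(index, aligned)

# Gaps implied by the data.info() output above.
total_rows = 4724
non_null = {"SPY": 4709, "AAPL": 4723, "MSFT": 4608}
missing = {ticker: total_rows - count for ticker, count in non_null.items()}
print(missing)
```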


In [34]:
from pandas_datareader import data as pdr

import yfinance as yf
yf.pdr_override()

# download dataframe
data = pdr.get_data_yahoo('SPY',
                          start='2017-01-01',
                          end='2019-04-30',
                          auto_adjust=False)

[*********************100%***********************]  1 of 1 completed


In [35]:
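# NOTE: `data` was downloaded with auto_adjust=False above, so 'Adj Close' is still a separate column;
# with auto_adjust=True the OHLC values would be adjusted and 'Adj Close' would not appear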
data.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2019-04-23,290.679993,293.140015,290.420013,292.880005,277.617126,52246600
2019-04-24,292.790009,293.160004,292.070007,292.230011,277.000946,50392900
2019-04-25,292.119995,292.779999,290.730011,292.049988,276.830292,57770900
2019-04-26,292.100006,293.48999,291.23999,293.410004,278.119446,50916400
2019-04-29,293.51001,294.450012,293.410004,293.869995,278.55545,57197700


In [36]:
# auto_adjust = False
data.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2019-04-23,290.679993,293.140015,290.420013,292.880005,277.617126,52246600
2019-04-24,292.790009,293.160004,292.070007,292.230011,277.000946,50392900
2019-04-25,292.119995,292.779999,290.730011,292.049988,276.830292,57770900
2019-04-26,292.100006,293.48999,291.23999,293.410004,278.119446,50916400
2019-04-29,293.51001,294.450012,293.410004,293.869995,278.55545,57197700
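
The gap between `Close` and `Adj Close` can be expressed as a per-day adjustment factor, `Adj Close / Close`. The sketch below derives that factor from three of the rows above and applies it to the other price columns, which is the assumed behavior of `auto_adjust=True` (OHLC rescaled by the factor, with the adjusted close replacing `Close`). The function `auto_adjust_row` is a hypothetical stand-in, not the library's implementation.

```python
# (date, open, high, low, close, adj_close) copied from the data.tail() output above.
rows = [
    ("2019-04-23", 290.679993, 293.140015, 290.420013, 292.880005, 277.617126),
    ("2019-04-26", 292.100006, 293.48999, 291.23999, 293.410004, 278.119446),
    ("2019-04-29", 293.51001, 294.450012, 293.410004, 293.869995, 278.55545),
]


def adjustment_factor(close, adj_close):
    """Ratio that maps raw prices onto the adjusted price scale."""
    return adj_close / close


def auto_adjust_row(open_, high, low, close, adj_close):
    """Rescale open/high/low by the factor; the adjusted close replaces close."""
    factor = adjustment_factor(close, adj_close)
    return open_ * factor, high * factor, low * factor, adj_close


factors = {day: adjustment_factor(close, adj) for day, _o, _h, _l, close, adj in rows}
adjusted = {day: auto_adjust_row(o, h, l, close, adj) for day, o, h, l, close, adj in rows}

for day in factors:
    print(day, round(factors[day], 5), [round(x, 4) for x in adjusted[day]])
```

For SPY over this window the factor is below one, which is consistent with dividends paid after these dates being folded into the adjusted prices.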
