# SimFin Tutorial 07 - Stock Screener

[Original repository on GitHub](https://github.com/simfin/simfin-tutorials)

This tutorial was originally written by [Hvass Labs](https://github.com/Hvass-Labs)

----

"I am King Arthur and these are my knights of the round table. Go and tell your master that we have been charged by God with a quest for the Holy Grail and he can join us. Well I can ask him, but I don't think he'll be very keen, because he has already got one, you see."
&ndash; [Monty Python's Holy Grail](https://www.youtube.com/watch?v=M9DCAFUerzs)


## Introduction

A stock-screener is a common tool used to search for stocks that meet certain criteria, e.g. low valuation ratios, high sales-growth, etc. This tutorial shows how to make a basic stock-screener using the signals calculated in the previous tutorial. It is assumed you are already familiar with the previous tutorials on the basics of SimFin.

## Imports

In [1]:
%matplotlib inline
import pandas as pd
from datetime import datetime, timedelta

# Import the main functionality from the SimFin Python API.
import simfin as sf

# Import names used for easy access to SimFin's data-columns.
from simfin.names import *

In [2]:
# Version of the SimFin Python API.
sf.__version__

'0.3.0'

## Config

In [3]:
# SimFin data-directory.
sf.set_data_dir('~/simfin_data/')

In [4]:
# SimFin load API key or use free data.
sf.load_api_key(path='~/simfin_api_key.txt', default_key='free')

## Data Hub

In these examples, we will use stock-data for the USA. It is very easy to load and process the data using the `sf.StockHub` class. We instruct it to refresh the financial data every 30 days, and to refresh the share-price data daily.

In [5]:
hub = sf.StockHub(market='us',
                  refresh_days=30,
                  refresh_days_shareprices=1)

The data-hub not only makes the syntax very simple, but it also takes care of downloading the required datasets from the SimFin server and loading them into Pandas DataFrames. Furthermore, the data-hub's slow processing functions use a disk-cache to save the results for the next time the functions are called, and the cache automatically gets refreshed when new datasets are downloaded from the SimFin server. Lastly, the data-hub's functions are also RAM-cached, so the second time you call them, they return the results nearly instantly. Altogether, the data-hub makes it much easier and faster to work with the data.
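The two cache-layers described above can be illustrated with a small stand-alone sketch. This is not SimFin's implementation: the decorator `disk_cache`, the function `slow_signals` and the file-naming are hypothetical and only imitate the idea of a RAM-cache on top of a disk-cache. The sketch also leaves out the automatic refresh when new datasets are downloaded.

```python
import functools
import hashlib
import os
import pickle
import tempfile

def disk_cache(cache_dir):
    """Hypothetical decorator that saves a function's result as a pickle-file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Name the cache-file after the function and a hash of its arguments.
            key = hashlib.md5(repr(args).encode()).hexdigest()[:8]
            path = os.path.join(cache_dir, f"{func.__name__}-{key}.pickle")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator

cache_dir = tempfile.mkdtemp()
calls = []  # Records how often the slow function really runs.

@functools.lru_cache(maxsize=None)  # RAM-cache (outer layer).
@disk_cache(cache_dir)              # Disk-cache (inner layer).
def slow_signals(variant):
    calls.append(variant)
    return {"variant": variant, "value": 42}

first = slow_signals("latest")   # Runs the function and writes the cache-file.
second = slow_signals("latest")  # Served from RAM.
slow_signals.cache_clear()       # Simulate a fresh Python session.
third = slow_signals("latest")   # Served from the cache-file on disk.
```

The slow function only runs once, even though its result is requested three times.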

## Financial Signals

First we calculate financial signals for the stocks, such as the Current Ratio, Debt Ratio, Net Profit Margin, Return on Assets, etc. These are calculated using data from the financial reports: Income Statements, Balance Sheets and Cash-Flow Statements, which are automatically downloaded and loaded by the data-hub.

Note that we set `variant='latest'` because we only want the most recent signals for each stock, not the signals from several years ago.

In [6]:
%%time
df_fin_signals = hub.fin_signals(variant='latest')

Dataset "us-income-ttm" on disk (4 days old).
- Loading from disk ... Done!
Dataset "us-balance-ttm" on disk (4 days old).
- Loading from disk ... Done!
Dataset "us-shareprices-latest" on disk (2 days old).
- Downloading ... 100.0%
- Extracting zip-file ... Done!
- Loading from disk ... Done!
Cache-file 'fin_signals-7ed2b132.pickle' not on disk.
- Running function fin_signals() ... Done!
- Saving cache-file to disk ... Done!
CPU times: user 12.7 s, sys: 83.7 ms, total: 12.7 s
Wall time: 14.5 s


In [7]:
df_fin_signals.dropna().head()

| Ticker | Date | Current Ratio | Debt Ratio | Gross Profit Margin | Net Profit Margin | Return on Assets | Return on Equity |
|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 2.131319 | 0.208464 | 0.54558 | 0.210609 | 0.128399 | 0.234676 |
| AA | 2019-12-20 | 1.48391 | 0.123108 | 0.204391 | -0.068689 | -0.048131 | -0.106932 |
| AAL | 2019-12-20 | 0.48949 | 0.408137 | 0.264137 | 0.033458 | 0.028638 | -1.734177 |
| AAOI | 2019-12-20 | 3.118895 | 0.2283 | 0.291388 | -0.057816 | -0.030991 | -0.042937 |
| AAP | 2019-12-20 | 1.363659 | 0.123951 | 0.439722 | 0.045177 | 0.049414 | 0.119358 |
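To illustrate what keeping only the most recent signals means, here is a minimal sketch using toy data. It assumes that the latest signals are simply the last row for each ticker in a DataFrame indexed by Ticker and Date; the numbers are made up, and the data-hub may select the rows differently.

```python
import pandas as pd

# Toy signals for two tickers on two dates each (made-up numbers).
index = pd.MultiIndex.from_arrays(
    [["A", "A", "AAC", "AAC"],
     pd.to_datetime(["2019-12-19", "2019-12-20", "2019-11-01", "2019-11-04"])],
    names=["Ticker", "Date"])
df_daily = pd.DataFrame({"Current Ratio": [2.10, 2.13, 0.90, 0.95]}, index=index)

# Keep only the last (most recent) row for each ticker.
df_latest = df_daily.groupby(level="Ticker").tail(1)
```

Note that the latest date can differ between tickers, which is why each row keeps its own date.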


We then pass the argument `func=sf.avg_ttm_2y` to `hub.fin_signals`, so as to calculate 2-year averages of the financial signals. We then get another DataFrame with the 2-year average Current Ratio, Net Profit Margin, Return on Assets, etc.

In [8]:
%%time
df_fin_signals_2y = hub.fin_signals(variant='latest',
                                    func=sf.avg_ttm_2y)

Cache-file 'fin_signals-2f49531e.pickle' not on disk.
- Running function fin_signals() ... Done!
- Saving cache-file to disk ... Done!
CPU times: user 17.8 s, sys: 16.1 ms, total: 17.9 s
Wall time: 17.8 s


In [9]:
df_fin_signals_2y.dropna().head()

| Ticker | Date | Current Ratio | Debt Ratio | Gross Profit Margin | Net Profit Margin | Return on Assets | Return on Equity |
|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 2.873845 | 0.211969 | 0.545716 | 0.136288 | 0.082236 | 0.149652 |
| AA | 2019-12-20 | 1.439858 | 0.117903 | 0.227904 | -0.034798 | -0.024413 | -0.054205 |
| AAL | 2019-12-20 | 0.519986 | 0.432832 | 0.275318 | 0.035501 | 0.029637 | -0.647169 |
| AAOI | 2019-12-20 | 3.290255 | 0.182644 | 0.360077 | 0.051187 | 0.061052 | 0.080428 |
| AAP | 2019-12-20 | 1.498598 | 0.141038 | 0.437761 | 0.050989 | 0.056415 | 0.145199 |
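The idea of a 2-year average of TTM data can be sketched as follows. We assume here that TTM data has 4 data-points per year, so that the 2-year average can be formed from the current TTM value and the TTM value from 4 quarters earlier. The function `avg_ttm_2y_sketch` and the numbers are hypothetical, and `sf.avg_ttm_2y` may be implemented differently.

```python
import pandas as pd

# Toy TTM Net Income for a single company over 8 quarters (made-up numbers).
quarters = ["2018-Q1", "2018-Q2", "2018-Q3", "2018-Q4",
            "2019-Q1", "2019-Q2", "2019-Q3", "2019-Q4"]
ttm_income = pd.Series([100.0, 104.0, 108.0, 112.0,
                        120.0, 124.0, 128.0, 132.0], index=quarters)

def avg_ttm_2y_sketch(series):
    """Average of the current TTM value and the TTM value 4 quarters earlier."""
    return 0.5 * (series + series.shift(4))

avg_2y = avg_ttm_2y_sketch(ttm_income)
```

The first 4 quarters become NaN because there is no data from one year earlier, which is one reason why some tickers may be missing from the 2-year average signals.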


## Growth Signals

Now we calculate growth signals for the stocks, such as Earnings Growth, FCF Growth, Sales Growth, etc. These are also calculated using data from the financial reports: Income Statements, Balance Sheets and Cash-Flow Statements, which are automatically downloaded and loaded by the data-hub.

In [10]:
%%time
df_growth_signals = hub.growth_signals(variant='latest')

Dataset "us-income-quarterly" on disk (28 days old).
- Loading from disk ... Done!
Dataset "us-cashflow-ttm" on disk (28 days old).
- Loading from disk ... Done!
Dataset "us-cashflow-quarterly" on disk (24 days old).
- Loading from disk ... Done!
Cache-file 'growth_signals-7ed2b132.pickle' not on disk.
- Running function growth_signals() ... Done!
- Saving cache-file to disk ... Done!
CPU times: user 14.8 s, sys: 84.1 ms, total: 14.9 s
Wall time: 14.9 s


In [11]:
df_growth_signals.dropna().head()

| Ticker | Date | Earnings Growth | Earnings Growth QOQ | Earnings Growth YOY | FCF Growth | FCF Growth QOQ | FCF Growth YOY | Sales Growth | Sales Growth QOQ | Sales Growth YOY |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 2.597315 | -0.638889 | -0.112195 | 0.076364 | 0.224138 | -0.164706 | 0.058432 | -0.035826 | 0.026534 |
| AA | 2019-12-20 | 63.916667 | -2.04878 | -1.219388 | -0.940171 | 0.878641 | 0.264706 | -0.142976 | -0.013569 | 0.05356 |
| AAL | 2019-12-20 | -0.077723 | 2.578378 | 0.169611 | 4.191781 | -1.147518 | -1.282609 | 0.034902 | 0.130008 | 0.027227 |
| AAN | 2019-12-20 | -0.358158 | -0.0668 | -0.089639 | -0.082894 | 0.564822 | 0.146337 | 0.058293 | -0.004476 | 0.011266 |
| AAOI | 2019-12-20 | -1.261895 | 0.222456 | -5.940566 | -60.530462 | -0.587227 | -0.067186 | -0.274372 | -0.091334 | -0.19191 |
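As a rough illustration of the two kinds of quarterly growth, the following sketch calculates quarter-over-quarter (QOQ) and year-over-year (YOY) growth from toy quarterly sales. It assumes that growth is the relative change between two periods; the numbers are made up, and the data-hub's exact formulas may differ.

```python
import pandas as pd

# Toy quarterly sales for a single company (made-up numbers).
quarters = ["2018-Q1", "2018-Q2", "2018-Q3", "2018-Q4",
            "2019-Q1", "2019-Q2", "2019-Q3", "2019-Q4"]
sales = pd.Series([100.0, 110.0, 105.0, 120.0,
                   115.0, 121.0, 126.0, 150.0], index=quarters)

# Quarter-over-quarter growth compares with the previous quarter.
growth_qoq = sales / sales.shift(1) - 1

# Year-over-year growth compares with the same quarter one year earlier.
growth_yoy = sales / sales.shift(4) - 1
```

For the last quarter, the YOY growth is 150 / 120 - 1 = 0.25, while the QOQ growth is 150 / 126 - 1, which is about 0.19.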


We then pass the argument `func=sf.avg_ttm_2y` to `hub.growth_signals` so as to calculate 2-year averages of the growth signals. We then get another DataFrame with the 2-year average Earnings Growth, FCF Growth, Sales Growth, etc.

In [12]:
%%time
df_growth_signals_2y = hub.growth_signals(variant='latest',
                                          func=sf.avg_ttm_2y)

Cache-file 'growth_signals-2f49531e.pickle' not on disk.
- Running function growth_signals() ... Done!
- Saving cache-file to disk ... Done!
CPU times: user 21.3 s, sys: 44.1 ms, total: 21.4 s
Wall time: 21.3 s


In [13]:
df_growth_signals_2y.dropna().head()

| Ticker | Date | Earnings Growth | Earnings Growth QOQ | Earnings Growth YOY | FCF Growth | FCF Growth QOQ | FCF Growth YOY | Sales Growth | Sales Growth QOQ | Sales Growth YOY |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 1.034045 | -1.139757 | 0.068902 | 0.172797 | 0.43465 | 0.013441 | 0.07644 | -0.019977 | 0.060454 |
| AAL | 2019-12-20 | -0.142836 | 2.310695 | -0.062766 | 1.806873 | -0.897175 | -0.765026 | 0.046774 | 0.12471 | 0.037837 |
| AAN | 2019-12-20 | 0.462982 | 0.034378 | 0.317814 | 0.450612 | 0.46824 | 0.311686 | 0.093859 | 0.011348 | 0.073692 |
| AAOI | 2019-12-20 | -0.593424 | -0.203328 | -3.416718 | -30.736077 | -1.901287 | -3.713263 | -0.064059 | -0.137183 | -0.256959 |
| AAP | 2019-12-20 | 0.128676 | -0.131118 | 0.206472 | 0.510563 | 0.94815 | 0.215537 | 0.009474 | -0.200179 | 0.015101 |


## Valuation Signals

Now we calculate valuation signals for the stocks, such as P/E, P/Sales, etc. These are calculated from the share-prices and data from the financial reports. Because the data-hub has already loaded the required datasets in the function-calls above, the data is merely reused here, and the data-hub can proceed directly to computing the signals.

In [14]:
%%time
df_val_signals = hub.val_signals(variant='latest')

Cache-file 'val_signals-1eb3539e.pickle' not on disk.
- Running function val_signals() ... Done!
- Saving cache-file to disk ... Done!
CPU times: user 4.46 s, sys: 24.2 ms, total: 4.48 s
Wall time: 3.88 s


In [15]:
df_val_signals.dropna().head()

| Ticker | Date | Dividend Yield | Earnings Yield | FCF Yield | Market-Cap | P/Book | P/E | P/FCF | P/NCAV | P/NetNet | P/Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 0.007221 | 0.03914 | 0.032222 | 27388580000.0 | 5.769662 | 25.549053 | 31.03487 | -65.056021 | -24.004018 | 5.380861 |
| AAL | 2019-12-20 | 0.013756 | 0.1139 | 0.028645 | 13230910000.0 | -601.4052 | 8.779638 | 34.910064 | -0.253762 | -0.243795 | 0.293753 |
| AAN | 2019-12-20 | 0.002254 | 0.049381 | 0.063567 | 4055631000.0 | 2.17151 | 20.250612 | 15.731449 | 15.519846 | -9.283742 | 1.03007 |
| AAP | 2019-12-20 | 0.001514 | 0.037873 | 0.054283 | 11527980000.0 | 3.140024 | 26.403734 | 18.422006 | -7.054318 | -2.793535 | 1.192837 |
| AAPL | 2019-12-20 | 0.010868 | 0.042534 | 0.045336 | 1299092000000.0 | 14.356514 | 23.510429 | 22.057394 | -15.245951 | -10.131468 | 4.993167 |
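The relation between a price-ratio and the corresponding yield can be sketched with toy data. We assume here that P/E is the Market-Cap divided by the TTM Earnings and that the Earnings Yield is the reciprocal; the tickers and numbers are made up. This agrees with the output above, where ticker A has a P/E of 25.549053 and an Earnings Yield of 0.03914, which is roughly 1 / 25.549053.

```python
import pandas as pd

# Toy data for two hypothetical companies X and Y (made-up numbers).
market_cap = pd.Series({"X": 1.0e9, "Y": 5.0e8})
earnings_ttm = pd.Series({"X": 5.0e7, "Y": -2.5e7})

# Price-to-Earnings ratio and its reciprocal, the Earnings Yield.
pe = market_cap / earnings_ttm
earnings_yield = earnings_ttm / market_cap
```

Company Y has negative earnings, so its P/E ratio is negative as well. This is worth remembering when writing screener-criteria such as P/E < 20, because a negative P/E also satisfies that condition.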


We then pass the argument `func=sf.avg_ttm_2y` to `hub.val_signals` so as to calculate the valuation signals using 2-year averages of the financial data. We then get another DataFrame with e.g. P/E and P/Sales ratios calculated from the 2-year average Earnings and Sales.

In [16]:
%%time
df_val_signals_2y = hub.val_signals(variant='latest',
                                    func=sf.avg_ttm_2y)

Cache-file 'val_signals-613b35a2.pickle' not on disk.
- Running function val_signals() ... Done!
- Saving cache-file to disk ... Done!
CPU times: user 13.6 s, sys: 52.1 ms, total: 13.7 s
Wall time: 13.1 s


In [17]:
df_val_signals_2y.dropna().head()

| Ticker | Date | Dividend Yield | Earnings Yield | FCF Yield | Market-Cap | P/Book | P/E | P/FCF | P/NCAV | P/NetNet | P/Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 0.006894 | 0.02501 | 0.031079 | 27388580000.0 | 5.880534 | 39.983336 | 32.176258 | -102.387234 | -28.38563 | 5.533606 |
| AAL | 2019-12-20 | 0.014058 | 0.118699 | 0.017081 | 13230910000.0 | -29.699022 | 8.424651 | 58.543869 | -0.274347 | -0.262333 | 0.298791 |
| AAN | 2019-12-20 | 0.001904 | 0.063159 | 0.06644 | 4055631000.0 | 2.233793 | 15.833061 | 15.051235 | 10.703386 | -13.273795 | 1.059243 |
| AAP | 2019-12-20 | 0.001533 | 0.042144 | 0.055706 | 11527980000.0 | 3.145745 | 23.728342 | 17.951413 | -24.686124 | -3.958762 | 1.208126 |
| AAPL | 2019-12-20 | 0.010712 | 0.04418 | 0.047347 | 1299092000000.0 | 13.146378 | 22.634832 | 21.120532 | -12.229743 | -8.62803 | 4.941684 |


## Combine Signals

We now combine all the basic signals into a single DataFrame:

In [18]:
# Combine the DataFrames.
dfs = [df_fin_signals, df_growth_signals, df_val_signals]
df_signals = pd.concat(dfs, axis=1)

# Show the result.
df_signals.head()

| Ticker | Date | Current Ratio | Debt Ratio | Gross Profit Margin | Net Profit Margin | Return on Assets | Return on Equity | Earnings Growth | Earnings Growth QOQ | Earnings Growth YOY | FCF Growth | ... | Dividend Yield | Earnings Yield | FCF Yield | Market-Cap | P/Book | P/E | P/FCF | P/NCAV | P/NetNet | P/Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 2.131319 | 0.208464 | 0.54558 | 0.210609 | 0.128399 | 0.234676 | 2.597315 | -0.638889 | -0.112195 | 0.076364 | ... | 0.007221 | 0.03914 | 0.032222 | 27388580000.0 | 5.769662 | 25.549053 | 31.03487 | -65.056021 | -24.004018 | 5.380861 |
| AA | 2019-12-20 | 1.48391 | 0.123108 | 0.204391 | -0.068689 | -0.048131 | -0.106932 | 63.916667 | -2.04878 | -1.219388 | -0.940171 | ... | NaN | -0.195698 | 0.012212 | 3980625000.0 | 0.621002 | -5.109917 | 81.887143 | -0.862167 | -0.647519 | 0.350994 |
| AAC | 2019-11-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| AAL | 2019-12-20 | 0.48949 | 0.408137 | 0.264137 | 0.033458 | 0.028638 | -1.734177 | -0.077723 | 2.578378 | 0.169611 | 4.191781 | ... | 0.013756 | 0.1139 | 0.028645 | 13230910000.0 | -601.4052 | 8.779638 | 34.910064 | -0.253762 | -0.243795 | 0.293753 |
| AAME | 2019-12-20 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
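The way `pd.concat` with `axis=1` lines up the rows can be seen in a small sketch with toy data. The DataFrames and numbers here are made up and only indexed by Ticker for brevity. Rows are matched by their index-labels, and a ticker that is missing from one of the DataFrames gets NaN in that DataFrame's columns.

```python
import pandas as pd

# Two toy signal DataFrames that only partly cover the same tickers.
df_a = pd.DataFrame({"Current Ratio": [2.1, 1.5]},
                    index=pd.Index(["A", "AA"], name="Ticker"))
df_b = pd.DataFrame({"P/E": [25.5, 8.8]},
                    index=pd.Index(["A", "AAL"], name="Ticker"))

# Combine the columns side-by-side, aligning the rows on the index.
df_combined = pd.concat([df_a, df_b], axis=1)
```

The result has rows for all three tickers, where AA has NaN for P/E and AAL has NaN for Current Ratio.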


Then we combine all the signals for multi-year averages into another DataFrame:

In [19]:
# Combine the DataFrames.
dfs = [df_fin_signals_2y, df_growth_signals_2y, df_val_signals_2y]
df_signals_2y = pd.concat(dfs, axis=1)

# Show the result.
df_signals_2y.head()

| Ticker | Date | Current Ratio | Debt Ratio | Gross Profit Margin | Net Profit Margin | Return on Assets | Return on Equity | Earnings Growth | Earnings Growth QOQ | Earnings Growth YOY | FCF Growth | ... | Dividend Yield | Earnings Yield | FCF Yield | Market-Cap | P/Book | P/E | P/FCF | P/NCAV | P/NetNet | P/Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2019-12-20 | 2.873845 | 0.211969 | 0.545716 | 0.136288 | 0.082236 | 0.149652 | 1.034045 | -1.139757 | 0.068902 | 0.172797 | ... | 0.006894 | 0.02501 | 0.031079 | 27388580000.0 | 5.880534 | 39.983336 | 32.176258 | -102.387234 | -28.38563 | 5.533606 |
| AA | 2019-12-20 | 1.439858 | 0.117903 | 0.227904 | -0.034798 | -0.024413 | -0.054205 | 31.4375 | -2.391647 | -0.325694 | NaN | ... | NaN | -0.099356 | 0.108163 | 3980625000.0 | 0.581325 | -10.064791 | 9.245323 | -0.853845 | -0.6405 | 0.32397 |
| AAC | 2019-11-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| AAL | 2019-12-20 | 0.519986 | 0.432832 | 0.275318 | 0.035501 | 0.029637 | -0.647169 | -0.142836 | 2.310695 | -0.062766 | 1.806873 | ... | 0.014058 | 0.118699 | 0.017081 | 13230910000.0 | -29.699022 | 8.424651 | 58.543869 | -0.274347 | -0.262333 | 0.298791 |
| AAME | 2019-12-20 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


## Screener for Net-Net Stocks

This is an old investment strategy used by Ben Graham, who was the teacher of Warren Buffett. Buffett also used the strategy when he started investing. The idea is to buy stocks that are cheaper than a conservative estimate of their liquidation value. In normal market conditions, few companies have stocks that trade at such low prices, and there may be very good reasons why the stocks are so cheap. But during market panics, it is sometimes possible to buy decent stocks at such low prices.

The Net-Net formula is:

$$
    NetNet = Cash\ \&\ Equiv + 0.75 \cdot Receivables \\
    + 0.5 \cdot Inventories - Total\ Liabilities
$$

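The P/NetNet ratio compares the market price of the stock to this Net-Net value. This means P/NetNet ratios between 0 and 1 indicate the stocks are trading at a discount to their estimated liquidation values. The lower the P/NetNet ratio, the cheaper the stock is. A negative P/NetNet ratio means the Net-Net value itself is negative, so the screener requires the ratio to be above 0.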
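As a worked example of the Net-Net formula, the following sketch calculates the Net-Net value and P/NetNet ratio for two hypothetical companies. The function `net_net` and all the numbers are made up for illustration, and we assume the P/NetNet ratio is the Market-Cap divided by the Net-Net value.

```python
def net_net(cash, receivables, inventories, total_liabilities):
    """Net-Net value using the formula above."""
    return cash + 0.75 * receivables + 0.5 * inventories - total_liabilities

# Hypothetical company 1 (numbers in millions of USD).
netnet_1 = net_net(cash=50.0, receivables=40.0,
                   inventories=20.0, total_liabilities=60.0)
p_netnet_1 = 24.0 / netnet_1  # Market-Cap of 24 million.

# Hypothetical company 2 with large liabilities.
netnet_2 = net_net(cash=10.0, receivables=20.0,
                   inventories=10.0, total_liabilities=100.0)
p_netnet_2 = 35.0 / netnet_2  # Market-Cap of 35 million.
```

Company 1 has a Net-Net value of 50 + 30 + 10 - 60 = 30 and a P/NetNet ratio of 24 / 30 = 0.8, so it trades at a discount to its estimated liquidation value. Company 2 has a Net-Net value of -70 and a P/NetNet ratio of -0.5, so it does not qualify.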

We create the stock-screener for Net-Net stocks, by making a boolean mask for the rows in the DataFrame with signals that meet the criteria: P/NetNet > 0 and P/NetNet < 1

In [20]:
mask = (df_signals[P_NETNET] > 0) & (df_signals[P_NETNET] < 1)

Rows that satisfy the screener-condition have a value of `True` and rows that do not meet the condition have a value of `False`.

In [21]:
mask.head()

Ticker  Date      
A       2019-12-20    False
AA      2019-12-20    False
AAC     2019-11-04    False
AAL     2019-12-20    False
AAME    2019-12-20    False
Name: P/NetNet, dtype: bool

We can then use the boolean mask to select matching rows in the signal DataFrame, and show the P/NetNet signal:

In [22]:
df_signals.loc[mask, P_NETNET]

Ticker  Date      
ADIL    2019-12-20    0.476734
AEHR    2019-12-20    0.007397
ALPN    2019-12-20    0.317445
AMDA    2019-12-20    0.175974
AVGR    2019-12-20    0.290389
AXGN    2019-12-20    0.340247
CGA     2019-12-20    0.048812
CLBS    2019-12-20    0.598675
CLRB    2019-12-20    0.769379
CUO     2019-12-20    0.516363
CYCC    2019-12-20    0.290360
CYIG    2019-12-19    0.105220
IFON    2018-07-10    0.036616
KKR     2019-12-20    0.634716
MN      2019-12-20    0.310562
NLNK    2019-12-20    0.659181
NSPR    2019-12-20    0.247912
PESI    2019-12-20    0.781220
RKDA    2019-12-20    0.857185
SOHU    2019-12-20    0.615808
SPRT    2019-12-20    0.837657
SRRA    2019-12-20    0.393623
SURF    2019-12-20    0.778481
SVT     2019-12-20    0.003373
TMED    2019-12-19    0.887624
TOCA    2019-12-20    0.356863
TROV    2019-12-20    0.610781
UMRX    2019-12-20    0.748683
WSTL    2019-12-20    0.596703
Name: P/NetNet, dtype: float64

Note that some of the dates are not recent, so we can remove all rows with dates that are older than e.g. 30 days, by creating another boolean mask.

In [23]:
# Oldest date that is allowed for a row.
date_limit = datetime.now() - timedelta(days=30)

# Load the latest share-prices from the data-hub.
df_prices_latest = hub.load_shareprices(variant='latest')

# Boolean mask for the tickers that satisfy this condition.
mask_date_limit = (df_prices_latest.reset_index(DATE)[DATE] > date_limit)

# Show the result.
mask_date_limit.head()

Ticker
A        True
AA       True
AAC     False
AAL      True
AAME     True
Name: Date, dtype: bool

We can then combine the screener-mask and the date-mask:

In [24]:
mask &= mask_date_limit
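A possible alternative is to build the date-mask directly from the dates in the index of the signals, so both masks have exactly the same index. The following is a minimal sketch using three P/NetNet values copied from the output above. The date for "now" is fixed to 2020-01-10 here only to make the example reproducible, and this is not necessarily equivalent to using the latest share-prices from the data-hub.

```python
from datetime import datetime, timedelta
import pandas as pd

# Fixed date for "now" so the example is reproducible (assumption).
now = datetime(2020, 1, 10)
date_limit = now - timedelta(days=30)

# Three P/NetNet values copied from the output above.
index = pd.MultiIndex.from_arrays(
    [["ADIL", "IFON", "A"],
     pd.to_datetime(["2019-12-20", "2018-07-10", "2019-12-20"])],
    names=["Ticker", "Date"])
p_netnet = pd.Series([0.476734, 0.036616, -24.004018], index=index)

# Boolean mask for the rows whose dates are recent enough.
dates = p_netnet.index.get_level_values("Date")
mask_recent = pd.Series(dates > date_limit, index=p_netnet.index)

# Combine with the Net-Net screener-condition.
mask_toy = (p_netnet > 0) & (p_netnet < 1) & mask_recent
```

Only ADIL remains, because IFON has an old date and A has a negative P/NetNet ratio.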

And then we can show the stocks with recent share-prices that are trading at Net-Net discounts:

In [25]:
df_signals.loc[mask, P_NETNET]

Ticker  Date      
ADIL    2019-12-20    0.476734
AEHR    2019-12-20    0.007397
ALPN    2019-12-20    0.317445
AMDA    2019-12-20    0.175974
AVGR    2019-12-20    0.290389
AXGN    2019-12-20    0.340247
CGA     2019-12-20    0.048812
CLBS    2019-12-20    0.598675
CLRB    2019-12-20    0.769379
CUO     2019-12-20    0.516363
CYCC    2019-12-20    0.290360
CYIG    2019-12-19    0.105220
KKR     2019-12-20    0.634716
MN      2019-12-20    0.310562
NLNK    2019-12-20    0.659181
NSPR    2019-12-20    0.247912
PESI    2019-12-20    0.781220
RKDA    2019-12-20    0.857185
SOHU    2019-12-20    0.615808
SPRT    2019-12-20    0.837657
SRRA    2019-12-20    0.393623
SURF    2019-12-20    0.778481
SVT     2019-12-20    0.003373
TMED    2019-12-19    0.887624
TOCA    2019-12-20    0.356863
TROV    2019-12-20    0.610781
UMRX    2019-12-20    0.748683
WSTL    2019-12-20    0.596703
Name: P/NetNet, dtype: float64

## Screener for Many Criteria

It is very easy to combine many criteria in the stock-screener. Let us start with the condition that the Market Capitalization must be more than USD 1 billion:

In [26]:
mask = (df_signals[MARKET_CAP] > 1e9)

Then let us add criteria for the Current Ratio and Debt Ratio calculated from the latest financial reports, as well as the quarterly sales-growth year-over-year.

We combine all these criteria simply by generating the corresponding boolean masks, and doing the logical-and with the previous mask, thereby accumulating multiple criteria.

In [27]:
mask &= (df_signals[CURRENT_RATIO] > 2)
mask &= (df_signals[DEBT_RATIO] < 0.5)
mask &= (df_signals[SALES_GROWTH_YOY] > 0.1)
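When many criteria are accumulated like this, it can be useful to see how many stocks remain after each step, e.g. to find out which criterion is the most restrictive. The following is a minimal sketch using toy data, where the tickers, numbers and variable names are made up for illustration.

```python
import pandas as pd

# Toy signals for four hypothetical tickers.
df = pd.DataFrame({"Market-Cap": [2e9, 5e8, 3e9, 4e9],
                   "Current Ratio": [2.5, 3.0, 1.2, 2.2],
                   "Debt Ratio": [0.2, 0.1, 0.3, 0.7]},
                  index=pd.Index(["W", "X", "Y", "Z"], name="Ticker"))

# Screener-criteria as pairs of a label and a boolean mask.
criteria = [("Market-Cap > 1e9", df["Market-Cap"] > 1e9),
            ("Current Ratio > 2", df["Current Ratio"] > 2),
            ("Debt Ratio < 0.5", df["Debt Ratio"] < 0.5)]

# Accumulate the criteria and count the remaining stocks after each step.
mask_toy = pd.Series(True, index=df.index)
remaining = {}
for label, criterion in criteria:
    mask_toy &= criterion
    remaining[label] = int(mask_toy.sum())
```

Here 3 stocks remain after the first criterion, 2 after the second, and only ticker W satisfies all three criteria.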

We can also create screener-criteria using the 2-year average signals, e.g. the P/E and P/FCF ratios which use 2-year average Earnings and FCF. We can combine screener-criteria from `df_signals` and `df_signals_2y` because their indices are compatible.

In [28]:
mask &= (df_signals_2y[PE] < 20)
mask &= (df_signals_2y[PFCF] < 20)
mask &= (df_signals_2y[ROA] > 0.03)
mask &= (df_signals_2y[ROE] > 0.15)
mask &= (df_signals_2y[NET_PROFIT_MARGIN] > 0.0)
mask &= (df_signals_2y[SALES_GROWTH] > 0.1)

Finally we can ensure that we only get the stocks with recent share-prices:

In [29]:
mask &= mask_date_limit

These are the stocks and signals matching all these criteria:

In [30]:
df_signals[mask]

| Ticker | Date | Current Ratio | Debt Ratio | Gross Profit Margin | Net Profit Margin | Return on Assets | Return on Equity | Earnings Growth | Earnings Growth QOQ | Earnings Growth YOY | FCF Growth | ... | Dividend Yield | Earnings Yield | FCF Yield | Market-Cap | P/Book | P/E | P/FCF | P/NCAV | P/NetNet | P/Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COP | 2019-12-20 | 2.560922 | 0.212113 | 0.622158 | 0.241477 | 0.118161 | 0.25989 | 0.39695 | 0.003761 | 0.183027 | -0.555488 | ... | 0.018196 | 0.223966 | 0.097003 | 37224420000.0 | 1.056341 | 4.464965 | 10.309012 | -1.872079 | -1.556595 | 1.078187 |
| CORT | 2019-12-20 | 8.251327 | 0.297553 | 0.980215 | 0.304139 | 0.2896 | 0.334462 | -0.427886 | 0.240097 | -0.776182 | 0.906388 | ... | NaN | 0.055923 | 0.072273 | 1552273000.0 | 4.642297 | 17.881679 | 13.836442 | 6.614874 | 7.103126 | 5.438518 |
| EXEL | 2019-12-20 | 8.639725 | 0.013915 | 0.967346 | 0.704107 | 0.704089 | 0.827823 | 0.984815 | 0.043114 | -0.096601 | 1.653536 | ... | NaN | 0.110481 | 0.093192 | 5806749000.0 | 3.900728 | 9.051322 | 10.730566 | 7.233291 | 7.678089 | 6.373101 |
| JAZZ | 2019-12-20 | 3.153672 | 0.302764 | 1.0 | 0.248815 | 0.09198 | 0.174551 | 0.08714 | -0.465724 | 0.852558 | 0.134549 | ... | NaN | 0.052497 | 0.070225 | 9263464000.0 | 3.39331 | 19.048554 | 14.239895 | -7.442339 | -6.401259 | 4.739569 |
| PRGO | 2019-12-20 | 3.18386 | 0.360024 | 0.3648 | 0.120889 | 0.108734 | 0.227802 | 0.025325 | -0.059288 | 0.054924 | 0.245239 | ... | 0.006568 | 0.086487 | 0.093229 | 5176512000.0 | 2.095839 | 11.562457 | 10.726299 | -7.706583 | -3.878554 | 1.397773 |


We can also show a sub-set of all the signals and sort e.g. by the P/FCF ratios:

In [31]:
columns = [PFCF, PE, ROA, ROE, CURRENT_RATIO, DEBT_RATIO]
df_signals.loc[mask, columns].sort_values(by=PFCF, ascending=True)

| Ticker | Date | P/FCF | P/E | Return on Assets | Return on Equity | Current Ratio | Debt Ratio |
|---|---|---|---|---|---|---|---|
| COP | 2019-12-20 | 10.309012 | 4.464965 | 0.118161 | 0.25989 | 2.560922 | 0.212113 |
| PRGO | 2019-12-20 | 10.726299 | 11.562457 | 0.108734 | 0.227802 | 3.18386 | 0.360024 |
| EXEL | 2019-12-20 | 10.730566 | 9.051322 | 0.704089 | 0.827823 | 8.639725 | 0.013915 |
| CORT | 2019-12-20 | 13.836442 | 17.881679 | 0.2896 | 0.334462 | 8.251327 | 0.297553 |
| JAZZ | 2019-12-20 | 14.239895 | 19.048554 | 0.09198 | 0.174551 | 3.153672 | 0.302764 |


## Handling NaN Signal-Values

The signals are calculated from various financial data using mathematical formulas. If any data-item in the formula is NaN (Not-a-Number) then the result of the entire formula is also NaN, and then the screener-condition automatically evaluates to False, so the company is excluded from the results of the stock-screener.

For example, the Debt Ratio (`DEBT_RATIO`) is calculated from Short Term Debt (`ST_DEBT`), Long Term Debt (`LT_DEBT`) and Total Assets (`TOTAL_ASSETS`). If just one of these numbers is NaN, then the resulting Debt Ratio is also NaN and the screener-condition for this signal will always evaluate to False, so the company is excluded from the screener's results.
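This behaviour can be demonstrated with toy data. The sketch assumes that the Debt Ratio is the sum of the Short Term and Long Term Debt divided by the Total Assets, which may differ from the exact formula used by the data-hub. The tickers and numbers are made up.

```python
import numpy as np
import pandas as pd

# Toy data where the Short Term Debt is missing for company Y.
st_debt = pd.Series({"X": 1.0e8, "Y": np.nan})
lt_debt = pd.Series({"X": 3.0e8, "Y": 2.0e8})
total_assets = pd.Series({"X": 2.0e9, "Y": 1.0e9})

# Assumed formula for the Debt Ratio. The NaN propagates to the result.
debt_ratio = (st_debt + lt_debt) / total_assets

# Two opposite screener-conditions.
mask_low_debt = (debt_ratio < 0.5)
mask_high_debt = (debt_ratio >= 0.5)
```

Company Y gets a Debt Ratio of NaN, and it is `False` in both masks, so it is excluded regardless of which of the two conditions the screener uses.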

You might think that a solution would simply be to use `fillna(0)` on all the data-items before calculating the signals. This may work for some formulas and for some uses of the signals, but it is not a generally valid solution, as it may severely distort the signals.
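The distortion can be seen in a small sketch with made-up numbers for a single hypothetical company, where the Long Term Debt is missing in one of three quarterly reports. For simplicity, the ratio here only uses the Long Term Debt and the Total Assets.

```python
import numpy as np
import pandas as pd

# Toy data for three quarters, where the debt is missing in Q2.
quarters = ["Q1", "Q2", "Q3"]
lt_debt = pd.Series([2.0e9, np.nan, 2.2e9], index=quarters)
total_assets = pd.Series([1.0e10, 1.0e10, 1.1e10], index=quarters)

# The missing value is kept as NaN.
ratio_nan = lt_debt / total_assets

# The missing value is replaced with zero.
ratio_zero = lt_debt.fillna(0) / total_assets
```

With `fillna(0)` the company appears to be debt-free in Q2, although the debt is about 20% of the assets in the quarters before and after. A screener-condition such as Debt Ratio < 0.1 would then match the company in Q2 for the wrong reason.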

Consider for example the ticker AMZN, where all data for Short Term Debt is missing in all the reports, while the Long Term Debt is only missing in some reports. If you look at the data, it seems most likely that this is a data-error, and the Long Term Debt should actually be several billions of dollars. If we were to replace these missing values with zeros, then we would get very misleading Debt Ratios.

In this example, AMZN had actually not reported these numbers in some of their quarterly reports. That is why the values are missing in the data.

In [32]:
# Load the TTM Balance Sheets from the data-hub.
df_balance_ttm = hub.load_balance(variant='ttm')

# Show the relevant data.
columns = [ST_DEBT, LT_DEBT, TOTAL_ASSETS]
df_balance_ttm.loc['AMZN', columns]['2010':'2013']

| Report Date | Short Term Debt | Long Term Debt | Total Assets |
|---|---|---|---|
| 2010-03-31 | NaN | 131000000.0 | 12042000000 |
| 2010-06-30 | NaN | 132000000.0 | 12397000000 |
| 2010-09-30 | NaN | 164000000.0 | 14162000000 |
| 2010-12-31 | NaN | NaN | 18797000000 |
| 2011-03-31 | NaN | NaN | 16882000000 |
| 2011-06-30 | NaN | NaN | 17941000000 |
| 2011-09-30 | NaN | NaN | 19054000000 |
| 2011-12-31 | NaN | 255000000.0 | 25278000000 |
| 2012-03-31 | NaN | NaN | 20339000000 |
| 2012-06-30 | NaN | NaN | 21022000000 |


A simple solution is to ignore signals that are NaN. For example, we could have the following criteria:

In [33]:
# Start the screener with a market-cap condition.
mask = (df_signals[MARKET_CAP] > 1e9)

# Ensure share-prices are recent.
mask &= mask_date_limit

# Screener criteria where NaN signals are ignored.
mask &= ((df_signals[CURRENT_RATIO] > 2) | (df_signals[CURRENT_RATIO].isnull()))
mask &= ((df_signals[DEBT_RATIO] < 0.5) | (df_signals[DEBT_RATIO].isnull()))
mask &= ((df_signals[PE] < 20) | (df_signals[PE].isnull()))
mask &= ((df_signals[PFCF] < 20) | (df_signals[PFCF].isnull()))
mask &= ((df_signals[ROA] > 0.03) | (df_signals[ROA].isnull()))
mask &= ((df_signals[ROE] > 0.15) | (df_signals[ROE].isnull()))
mask &= ((df_signals[NET_PROFIT_MARGIN] > 0.0) | (df_signals[NET_PROFIT_MARGIN].isnull()))
mask &= ((df_signals[SALES_GROWTH] > 0.1) | (df_signals[SALES_GROWTH].isnull()))

The following shows the stocks whose signals match these criteria. You can see that some of the signals are NaN, but the stocks are still included in the results, because the screener just ignores NaN values:

In [34]:
columns = [PFCF, PE, ROA, ROE, CURRENT_RATIO, DEBT_RATIO]
df_signals.loc[mask, columns].sort_values(by=PE, ascending=True)

| Ticker | Date | P/FCF | P/E | Return on Assets | Return on Equity | Current Ratio | Debt Ratio |
|---|---|---|---|---|---|---|---|
| HFC | 2019-12-20 | 12.914176 | 6.202886 | 0.145555 | 0.270254 | 2.503984 | NaN |
| NUE | 2019-12-20 | 9.605797 | 7.10808 | 0.155596 | 0.269057 | 3.368494 | 0.238033 |
| EXEL | 2019-12-20 | 10.730566 | 9.051322 | 0.704089 | 0.827823 | 8.639725 | 0.013915 |
| PRGO | 2019-12-20 | 10.726299 | 11.562457 | 0.108734 | 0.227802 | 3.18386 | 0.360024 |
| PATK | 2019-12-20 | 7.007622 | 11.879858 | 0.086688 | 0.255609 | 2.007255 | 0.435674 |
| FII | 2019-12-20 | 16.921757 | 12.722681 | 0.165239 | 0.246805 | 2.165258 | NaN |
| IRBT | 2019-12-20 | -42.088276 | 16.009366 | 0.130783 | 0.179615 | 2.98146 | NaN |
| ALXN | 2019-12-20 | 16.561572 | 16.845427 | 0.106208 | 0.160343 | 3.978372 | 0.170996 |
| HDS | 2019-12-20 | 12.486342 | 16.928598 | 0.090317 | 0.258706 | 2.054779 | 0.43587 |
| GNTX | 2019-12-20 | NaN | 17.684374 | NaN | NaN | 5.416747 | NaN |
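The repeated pattern of a criterion combined with `isnull()` can be wrapped in a small helper-function. The following is a minimal sketch using toy data, where the function `passes_or_nan`, the tickers and the numbers are hypothetical and not part of the SimFin API.

```python
import numpy as np
import pandas as pd

def passes_or_nan(signal, criterion):
    """Boolean mask that is True where the criterion holds or the signal is NaN."""
    return criterion | signal.isnull()

# Toy signals for three hypothetical tickers.
df = pd.DataFrame({"P/E": [6.2, 25.0, np.nan],
                   "Debt Ratio": [np.nan, 0.2, 0.6]},
                  index=pd.Index(["X", "Y", "Z"], name="Ticker"))

# Screener-criteria where NaN signals are ignored.
mask_toy = passes_or_nan(df["P/E"], df["P/E"] < 20)
mask_toy &= passes_or_nan(df["Debt Ratio"], df["Debt Ratio"] < 0.5)

# Number of NaN signals for each of the matching stocks.
num_nan = df[mask_toy].isnull().sum(axis=1)
```

Only ticker X remains, and it has one NaN signal. Counting the NaN signals like this shows how many criteria were effectively skipped for each stock. For example, GNTX in the output above has NaN in four of the six columns shown, so only a few of the criteria were actually tested for that stock.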


## License (MIT)

This is published under the
[MIT License](https://github.com/simfin/simfin-tutorials/blob/master/LICENSE.txt)
which allows very broad use for both academic and commercial purposes.

You are very welcome to modify and use this source-code in your own project. Please keep a link to the [original repository](https://github.com/simfin/simfin-tutorials).
