<a id='top'></a>
# How to slice and dice the data
Below is a series of examples showing how to slice and dice the data stored in the *.sqlite* file generated by the [MorningStar.com](https://www.morningstar.com) scraper. 

##### NOTE: 
- The data used to generate the outputs below comes from the *.sqlite* file that the scraper generates once it has been installed and run locally on your machine. See the [README]() for instructions on how to install and run the scraper.
- Navigation links, such as the ones in the list of contents below and other links throughout this document, will only work if you are using [Jupyter](https://jupyter.org/) to view this document.


**Content** 

1. [Required modules and matplotlib backend](#modules)
1. [Creating a master (bridge table) DataFrame instance using the DataFrames class](#master)
1. [Creating DataFrame instances with the dataframes methods](#methods)
1. [Data statistics and sample code](#stats)
1. [Applying various criteria to filter common stocks](#value) *(in progress)*
1. [Additional sample / test code](#additional) *(in progress)*

<a id="modules"></a>
## Required modules and matplotlib backend

In [1]:
%matplotlib notebook

In [2]:
import matplotlib.pyplot as plt
import matplotlib

In [3]:
from importlib import reload
import pandas as pd
import numpy as np

# Import dataframes module from project folder.
# This module contains a class that reads the database tables and assigns the data to pandas.DataFrame objects
import dataframes
reload(dataframes)  # Reload if changes have been made to the module file

<module 'dataframes' from '/home/cbrandao/lib/python/msTables/dataframes.py'>

[return to the top](#top)
<a id="master"></a>
## Creating a master DataFrame instance using the DataFrames class
The `DataFrames` class is part of the [dataframes module](dataframes.py).

In [4]:
db_file_name = 'mstables2' # Change the file name here as needed
df = dataframes.DataFrames('db/{}.sqlite'.format(db_file_name))

Creating intial DataFrames from file db/mstables2.sqlite...
Creating DataFrame 'colheaders' ...
Creating DataFrame 'timerefs' ...
Creating DataFrame 'urls' ...
Creating DataFrame 'securitytypes' ...
Creating DataFrame 'tickers' ...
Creating DataFrame 'sectors' ...
Creating DataFrame 'industries' ...
Creating DataFrame 'stockstyles' ...
Creating DataFrame 'exchanges' ...
Creating DataFrame 'countries' ...
Creating DataFrame 'companies' ...
Creating DataFrame 'currencies' ...
Creating DataFrame 'stocktypes' ...
Creating DataFrame 'master' ...
Initial DataFrames created.
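
The log above suggests that each database table is loaded into its own DataFrame. The actual implementation lives in `dataframes.py`; the block below is only a minimal sketch of that general pattern, assuming an in-memory SQLite database with two made-up tables and a hypothetical helper named `load_tables`.

```python
import sqlite3
import pandas as pd


def load_tables(conn):
    """Hypothetical helper: read every table of a SQLite DB into a dict of DataFrames."""
    names = pd.read_sql_query(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", conn
    )["name"]
    return {name: pd.read_sql_query(f'SELECT * FROM "{name}"', conn) for name in names}


# Toy in-memory database standing in for the scraper's .sqlite file (made-up rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickers (id INTEGER PRIMARY KEY, ticker TEXT);
    CREATE TABLE exchanges (id INTEGER PRIMARY KEY, exchange TEXT);
    INSERT INTO tickers VALUES (1, 'AAA'), (2, 'BBB');
    INSERT INTO exchanges VALUES (10, 'Exchange X');
""")
tables = load_tables(conn)
conn.close()

# Number of rows loaded per table
print({name: len(frame) for name, frame in tables.items()})
```

Keeping the frames in a dictionary is a design choice of this sketch; the `DataFrames` class exposes them as attributes such as `df.tickers` instead.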


### Creating Master DataFrame instance from reference tables
The master DataFrame is created by merging `df.master` (the *Master* bridge table) with the other reference tables (e.g. `df.tickers`, `df.exchanges`, etc.).
##### DataFrame Instance

In [5]:
# Merge Tables
df_master0 = (df.master
# Ticker Symbols
 .merge(df.tickers, left_on='ticker_id', right_on='id').drop(['id'], axis=1)
# Company / Security Name
 .merge(df.companies, left_on='company_id', right_on='id').drop(['id', 'company_id'], axis=1)
# Exchanges
 .merge(df.exchanges, left_on='exchange_id', right_on='id').drop(['id'], axis=1)
# Industries
 .merge(df.industries, left_on='industry_id', right_on='id').drop(['id', 'industry_id'], axis=1)
# Sectors
 .merge(df.sectors, left_on='sector_id', right_on='id').drop(['id', 'sector_id'], axis=1)
# Countries
 .merge(df.countries, left_on='country_id', right_on='id').drop(['id', 'country_id'], axis=1)
# Security Types
 .merge(df.securitytypes, left_on='security_type_id', right_on='id').drop(['id', 'security_type_id'], axis=1)
# Stock Types
 .merge(df.stocktypes, left_on='stock_type_id', right_on='id').drop(['id', 'stock_type_id'], axis=1)
# Stock Style Types
 .merge(df.styles, left_on='style_id', right_on='id').drop(['id', 'style_id'], axis=1)
# Quote Header Info
 .merge(df.quoteheader(), on=['ticker_id', 'exchange_id']).rename(columns={'fpe':'Forward_PE'})
# Currency
 .merge(df.currencies, left_on='currency_id', right_on='id').drop(['id', 'currency_id'], axis=1)
# Fiscal Year End
 .merge(df.timerefs, left_on='fyend_id', right_on='id').drop(['fyend_id'], axis=1)
             .rename(columns={'dates':'fy_end'})
# Updated Date
 .merge(df.timerefs, left_on='update_date_id', right_on='id').drop(['update_date_id'], axis=1)
             .rename(columns={'dates':'updated_date'})
)

# Convert date columns to datetime
df_master0['fy_end'] = pd.to_datetime(df_master0['fy_end'])
df_master0['updated_date'] = pd.to_datetime(df_master0['updated_date'])

# Create df_master and apply filters
df_master = df_master0.copy()

df_master[['lastprice', 'day_hi', 'day_lo', '_52wk_hi', '_52wk_lo', 'yield', 'aprvol', 'avevol']] = (
    df_master[['lastprice', 'day_hi', 'day_lo', '_52wk_hi', '_52wk_lo', 'yield', 'aprvol', 'avevol']]
    .fillna(value=0.0))

df_master = (df_master.where((df_master['openprice'] > 0.0) & (df_master['lastprice'] > 0.0))
             .dropna(axis=0, how='all'))
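
The merge chain above repeats one pattern: look up a foreign key in a reference table, then drop the reference table's own `id` column. The following sketch shows that pattern in isolation, assuming three made-up tables that are far smaller than the real ones.

```python
import pandas as pd

# Toy stand-ins for the bridge table and two reference tables (made-up values)
master = pd.DataFrame({"ticker_id": [1, 2, 3], "exchange_id": [10, 10, 20]})
tickers = pd.DataFrame({"id": [1, 2, 3], "ticker": ["AAA", "BBB", "CCC"]})
exchanges = pd.DataFrame({"id": [10, 20], "exchange": ["Exchange X", "Exchange Y"]})

merged = (
    master
    # Look up the ticker symbol, then drop the reference table's own key
    .merge(tickers, left_on="ticker_id", right_on="id").drop(columns="id")
    # Look up the exchange name the same way
    .merge(exchanges, left_on="exchange_id", right_on="id").drop(columns="id")
)
print(merged)
```

Because `merge` defaults to an inner join, any bridge row whose key is missing from a reference table is dropped from the result.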

In [6]:
df_master.head()

Unnamed: 0,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,aprvol,avevol,Forward_PE,pb,ps,pc,currency,currency_code,fy_end,updated_date
0,1.0,374.0,OGCP,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,1000.0,3498.0,,2.4,6.4,16.7,United States Dollar,USD,2019-12-31,2019-04-07
1,2.0,374.0,FISK,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,1300.0,3014.0,,2.4,6.4,16.8,United States Dollar,USD,2019-12-31,2019-04-07
2,3.0,374.0,ESBA,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,7245.0,17288.0,,2.4,6.4,16.8,United States Dollar,USD,2019-12-31,2019-04-07
3,19437.0,302.0,PSB,PS Business Parks Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,483944.0,90418.0,,5.1,9.9,14.9,United States Dollar,USD,2019-12-31,2019-04-07
4,20371.0,302.0,STOR,STORE Capital Corp,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,15000000.0,17000000.0,,1.9,12.7,17.5,United States Dollar,USD,2019-12-31,2019-04-07
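
The ID columns in the output above are shown as floats (`1.0`, `374.0`). One plausible cause, assuming the ID columns were integers before filtering, is the `where(...).dropna(...)` filter: `where` fills non-matching rows with `NaN`, which forces integer columns to float. The sketch below uses made-up data to compare that filter with plain boolean indexing.

```python
import pandas as pd

# Made-up quote data: only the first row has both prices above zero
quotes = pd.DataFrame({
    "ticker_id": [1, 2, 3],
    "openprice": [10.0, 0.0, 5.0],
    "lastprice": [11.0, 0.0, 0.0],
})
has_prices = (quotes["openprice"] > 0.0) & (quotes["lastprice"] > 0.0)

# where() blanks out non-matching rows with NaN, so integer columns become float
via_where = quotes.where(has_prices).dropna(axis=0, how="all")

# Boolean indexing drops the rows directly and leaves the dtypes untouched
via_mask = quotes[has_prices]

print(via_where.dtypes["ticker_id"], via_mask.dtypes["ticker_id"])
```

Both filters keep the same rows; they differ only in the resulting column dtypes.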


##### DataFrame Length

In [7]:
print('Master DataFrame contains {:,.0f} records.'.format(len(df_master)))

Master DataFrame contains 103,545 records.


##### DataFrame Columns

In [8]:
df_master.columns

Index(['ticker_id', 'exchange_id', 'ticker', 'company', 'exchange',
       'exchange_sym', 'industry', 'sector', 'country', 'country_c2',
       'country_c3', 'security_type_code', 'security_type', 'stock_type',
       'style', 'openprice', 'lastprice', 'day_hi', 'day_lo', '_52wk_hi',
       '_52wk_lo', 'yield', 'aprvol', 'avevol', 'Forward_PE', 'pb', 'ps', 'pc',
       'currency', 'currency_code', 'fy_end', 'updated_date'],
      dtype='object')

<br></br>
[return to the top](#top)
<a id='methods'></a>
## Creating DataFrame instances with dataframes methods
The `DataFrames` class from [dataframes.py](dataframes.py) contains the following methods, each of which returns a `pd.DataFrame` object for the specified database table:

- `quoteheader` - [MorningStar (MS) Quote Header](#quote)
- `valuation` - [MS Valuation table with Price Ratios (P/E, P/S, P/B, P/C) for the past 10 yrs](#val)
- `keyratios` - [MS Ratio - Key Financial Ratios & Values](#keyratios)
- `finhealth` - [MS Ratio - Financial Health](#finhealth)
- `profitability` - [MS Ratio - Profitability](#prof)
- `growth` - [MS Ratio - Growth](#growth)
- `cfhealth` - [MS Ratio - Cash Flow Health](#cfh)
- `efficiency` - [MS Ratio - Efficiency](#eff)
- `annualIS` - [MS Annual Income Statements](#isa)
- `quarterlyIS` - [MS Quarterly Income Statements](#isq)
- `annualBS` - [MS Annual Balance Sheets](#bsa)
- `quarterlyBS` - [MS Quarterly Balance Sheets](#bsq)
- `annualCF` - [MS Annual Cash Flow Statements](#cfa)
- `quarterlyCF` - [MS Quarterly Cash Flow Statements](#cfq)
- `pricehistory` - MS Price History

[return to the top](#top)
<a id='quote'></a>
### Quote Header 
##### DataFrame Instance

In [9]:
df_quote = df.quoteheader()

In [10]:
df_quote.head()

Unnamed: 0,ticker_id,exchange_id,openprice,lastprice,day_hi,day_lo,_52wk_hi,_52wk_lo,yield,aprvol,avevol,fpe,pb,ps,pc,currency_id
0,1,374,15.73,15.72,15.73,15.72,17.72,12.16,2.67,1000.0,3498.0,,2.4,6.4,16.7,104.0
1,2,374,15.75,15.77,15.78,15.73,17.68,13.68,2.66,1300.0,3014.0,,2.4,6.4,16.8,104.0
2,3,374,15.75,15.75,15.9,15.75,17.79,11.99,2.67,7245.0,17288.0,,2.4,6.4,16.8,104.0
3,4,482,95.65,95.85,96.12,95.05,115.11,87.87,1.25,738930.0,804691.0,19.8,3.3,3.9,20.1,104.0
4,5,1,0.0,0.0,0.0,0.0,0.0,0.0,,184.0,184.0,,,,,104.0


##### DataFrame Length

In [11]:
print('DataFrame contains {:,.0f} records.'.format(len(df_quote)))

DataFrame contains 117,669 records.
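
The quote header holds more records than the filtered master DataFrame. To check which rows an inner merge would discard, a left merge with `indicator=True` can help. The sketch below uses made-up tables rather than the real database, so the counts are illustrative only.

```python
import pandas as pd

# Made-up stand-ins for a quote table and a bridge table
quotes = pd.DataFrame({"ticker_id": [1, 2, 3, 4], "exchange_id": [10, 10, 20, 30]})
master = pd.DataFrame({"ticker_id": [1, 2, 3], "exchange_id": [10, 10, 20]})

# A left merge with indicator=True labels each row 'both' or 'left_only'
check = quotes.merge(master, on=["ticker_id", "exchange_id"], how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]

print(len(unmatched), "quote row(s) have no match in the bridge table")
```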


<a id='val'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Price Ratios (P/E, P/S, P/B, P/C)
##### DataFrame Instance

In [12]:
df_vals = df.valuation().reset_index()

In [13]:
df_vals

Unnamed: 0,exchange_id,ticker_id,PE_2009,PE_2010,PE_2011,PE_2012,PE_2013,PE_2014,PE_2015,PE_2016,...,PC_2010,PC_2011,PC_2012,PC_2013,PC_2014,PC_2015,PC_2016,PC_2017,PC_2018,PC_TTM
0,374,1,,,,,,18.6,69.0,57.8,...,,,,48.3,-153.8,24.3,25.8,26.3,21.5,16.7
1,374,2,,,,,,18.2,69.0,58.8,...,,,,50.5,-151.5,24.3,26.2,26.4,20.3,16.8
2,374,3,,,,,,18.3,69.4,58.8,...,,,,48.3,-153.8,24.4,26.2,26.7,20.6,16.8
3,482,4,,22.2,17.4,16.6,27.0,29.7,26.6,31.7,...,16.2,11.1,12.9,19.6,23.1,19.9,26.8,52.9,21.8,20.1
4,1,5,,,,,,,,,...,,,,,,,,,,
5,1,6,,,,,,,,,...,,,,,,,,,,
6,1,7,,,,,,,,,...,,,,,,,,,,
7,1,8,,,,,,,,,...,,,,,,,,,,
8,1,9,,,,,,,,,...,-0.8,,-0.1,,,,,,,
9,1,10,,,,,1.1,,,,...,-0.8,-0.1,0.0,11.2,-0.4,0.0,0.0,,,


##### DataFrame Length

In [14]:
print('DataFrame contains {:,.0f} records.'.format(len(df_vals)))

DataFrame contains 80,010 records.


##### DataFrame Columns

In [15]:
df_vals.columns

Index(['exchange_id', 'ticker_id', 'PE_2009', 'PE_2010', 'PE_2011', 'PE_2012',
       'PE_2013', 'PE_2014', 'PE_2015', 'PE_2016', 'PE_2017', 'PE_2018',
       'PE_TTM', 'PS_2009', 'PS_2010', 'PS_2011', 'PS_2012', 'PS_2013',
       'PS_2014', 'PS_2015', 'PS_2016', 'PS_2017', 'PS_2018', 'PS_TTM',
       'PB_2009', 'PB_2010', 'PB_2011', 'PB_2012', 'PB_2013', 'PB_2014',
       'PB_2015', 'PB_2016', 'PB_2017', 'PB_2018', 'PB_TTM', 'PC_2009',
       'PC_2010', 'PC_2011', 'PC_2012', 'PC_2013', 'PC_2014', 'PC_2015',
       'PC_2016', 'PC_2017', 'PC_2018', 'PC_TTM'],
      dtype='object')
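
The valuation table stores one column per ratio and period (`PE_2010`, `PC_TTM`, and so on). For plotting or grouping, a long layout is often easier to work with. The sketch below reshapes a small excerpt with `melt`; the first row copies values from the sample output for `ticker_id` 4, while the second row and the name `tidy` are made up for illustration.

```python
import pandas as pd

# Small excerpt in the wide layout (second row is made up)
wide = pd.DataFrame({
    "exchange_id": [482, 1],
    "ticker_id": [4, 99],
    "PE_2010": [22.2, None],
    "PE_2011": [17.4, 12.0],
    "PC_2010": [16.2, None],
    "PC_2011": [11.1, 8.5],
})

# One row per (security, column) pair
tidy = wide.melt(id_vars=["exchange_id", "ticker_id"], var_name="column", value_name="value")

# Split 'PE_2010' into the ratio ('PE') and the period ('2010')
tidy[["ratio", "period"]] = tidy["column"].str.split("_", n=1, expand=True)

# Drop the helper column and the empty cells
tidy = tidy.drop(columns="column").dropna(subset=["value"])
print(tidy.sort_values(["ticker_id", "ratio", "period"]))
```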

<a id='keyratios'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Key Ratios
##### DataFrame Instance

In [16]:
df_keyratios = (df_master.merge(df.keyratios(), on=['ticker_id', 'exchange_id']))

In [17]:
df_keyratios.head()

Unnamed: 0,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8,Y9,Y10
0,1.0,374.0,OGCP,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
1,2.0,374.0,FISK,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
2,3.0,374.0,ESBA,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
3,19437.0,302.0,PSB,PS Business Parks Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
4,20371.0,302.0,STOR,STORE Capital Corp,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM


##### DataFrame Length

In [18]:
print('DataFrame contains {:,.0f} records.'.format(len(df_keyratios)))

DataFrame contains 68,348 records.


##### DataFrame Columns

In [19]:
df_labels_kratios = (df_keyratios
                     .loc[0, [col for col in df_keyratios.columns if 'Y' not in col and col.startswith('i')]]
                     .replace(df.colheaders['header']))
df_labels_kratios

industry          REIT - Diversified
i0                           Revenue
i1                      Gross_Margin
i2                  Operating_Income
i3                  Operating_Margin
i4                        Net_Income
i5                Earnings_Per_Share
i6                         Dividends
i91                     Payout_Ratio
i7                            Shares
i8              Book_Value_Per_Share
i9               Operating_Cash_Flow
i10                     Cap_Spending
i11                   Free_Cash_Flow
i90         Free_Cash_Flow_Per_Share
i80                  Working_Capital
Name: 0, dtype: object
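
The `.replace(df.colheaders['header'])` call above works as a lookup: each cell holds a header id, and the id is swapped for its readable label. The sketch below shows the same idea on made-up ids, assuming a lookup table indexed by header id; the real `colheaders` table may be laid out differently.

```python
import pandas as pd

# Hypothetical lookup table: index = header id, 'header' = readable label
colheaders = pd.DataFrame(
    {"header": ["Revenue", "Gross_Margin", "Net_Income"]},
    index=[101, 102, 103],
)

# One row of label columns, each cell holding a header id (made-up ids)
row = pd.Series({"i0": 101, "i1": 102, "i4": 103})

# replace() with a dict swaps each id for its label
labels = row.replace(colheaders["header"].to_dict())
print(labels)
```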

<a id='finhealth'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Financial Health
##### DataFrame Instance

In [20]:
df_finhealth = df.finhealth()

In [21]:
df_finhealth.head()

Unnamed: 0,ticker_id,exchange_id,fh_balsheet,i45,i45_fh_Y0,i45_fh_Y1,i45_fh_Y2,i45_fh_Y3,i45_fh_Y4,i45_fh_Y5,...,fh_Y1,fh_Y2,fh_Y3,fh_Y4,fh_Y5,fh_Y6,fh_Y7,fh_Y8,fh_Y9,fh_Y10
0,1,374,324,325,,,,4.89,2.45,1.39,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
1,2,374,324,325,,,,4.89,2.45,1.39,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
2,3,374,324,325,,,,4.89,2.45,1.39,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
3,4,482,324,325,67.09,21.17,41.16,40.02,50.12,38.53,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
4,23,1,324,325,37.05,83.93,79.97,86.32,99.22,22.23,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr


##### DataFrame Length

In [22]:
print('DataFrame contains {:,.0f} records.'.format(len(df_finhealth)))

DataFrame contains 77,298 records.
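
Each ratio table keeps the values in columns such as `i45_fh_Y1` and the matching period labels in columns such as `fh_Y1`. Assuming that the `Yn` suffix links a value to its period, the sketch below pairs the two for a single row. The values are copied from the sample output for `ticker_id` 4, and the name `cash_pct` is made up for illustration.

```python
import pandas as pd

# One row in the wide layout, reduced to three periods
row = pd.Series({
    "i45_fh_Y1": 21.17, "i45_fh_Y2": 41.16, "i45_fh_Y3": 40.02,
    "fh_Y1": "2010-12", "fh_Y2": "2011-12", "fh_Y3": "2012-12",
})

# Match each value column 'i45_fh_Yn' with its period column 'fh_Yn'
prefix = "i45_"
cash_pct = pd.Series(
    {row[col[len(prefix):]]: row[col] for col in row.index if col.startswith(prefix)},
    name="Cash & Short-Term Investments",
)
print(cash_pct)
```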


##### DataFrame Columns

In [23]:
(df_finhealth.loc[0, [col for col in df_finhealth.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

fh_balsheet         Balance Sheet Items (in %)
i45              Cash & Short-Term Investments
i46                        Accounts Receivable
i47                                  Inventory
i48                       Other Current Assets
i49                       Total Current Assets
i50                                   Net PP&E
i51                                Intangibles
i52                     Other Long-Term Assets
i53                               Total Assets
i54                           Accounts Payable
i55                            Short-Term Debt
i56                              Taxes Payable
i57                        Accrued Liabilities
i58               Other Short-Term Liabilities
i59                  Total Current Liabilities
i60                             Long-Term Debt
i61                Other Long-Term Liabilities
i62                          Total Liabilities
i63                  Total Stockholders Equity
i64                 Total Liabilities & Equity
lfh_liquidity

<a id='prof'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Profitability
##### DataFrame Instance

In [24]:
df_profitability = df.profitability()

In [25]:
df_profitability.head()

Unnamed: 0,ticker_id,exchange_id,pr_margins,i12,i12_pr_Y0,i12_pr_Y1,i12_pr_Y2,i12_pr_Y3,i12_pr_Y4,i12_pr_Y5,...,pr_Y1,pr_Y2,pr_Y3,pr_Y4,pr_Y5,pr_Y6,pr_Y7,pr_Y8,pr_Y9,pr_Y10
0,1,374,279,202,,,100.0,100.0,100.0,100.0,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2,374,279,202,,,100.0,100.0,100.0,100.0,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3,374,279,202,,,100.0,100.0,100.0,100.0,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,4,482,279,202,100.0,100.0,100.0,100.0,100.0,100.0,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,23,1,279,202,100.0,100.0,100.0,100.0,100.0,100.0,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [26]:
print('DataFrame contains {:,.0f} records.'.format(len(df_profitability)))

DataFrame contains 77,298 records.


##### DataFrame Columns

In [27]:
# Profitability DataFrame Columns
(df_profitability.loc[0, [col for col in df_profitability.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

pr_margins              Margins % of Sales
i12                                Revenue
i13                                   COGS
i14                           Gross Margin
i15                                   SG&A
i16                                    R&D
i17                                  Other
i18                       Operating Margin
i19                    Net Int Inc & Other
i20                             EBT Margin
pr_profit                    Profitability
i21                             Tax Rate %
i22                           Net Margin %
i23               Asset Turnover (Average)
i24                     Return on Assets %
i25           Financial Leverage (Average)
i26                     Return on Equity %
i27           Return on Invested Capital %
i95                      Interest Coverage
Name: 0, dtype: object

<a id='growth'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Growth
##### DataFrame Instance

In [28]:
df_growth = df.growth()

In [29]:
df_growth.head()

Unnamed: 0,ticker_id,exchange_id,gr_revenue,i28,i28_gr_Y0,i28_gr_Y1,i28_gr_Y2,i28_gr_Y3,i28_gr_Y4,i28_gr_Y5,...,gr_Y1,gr_Y2,gr_Y3,gr_Y4,gr_Y5,gr_Y6,gr_Y7,gr_Y8,gr_Y9,gr_Y10
0,1,374,298,299,,,,-11.7,19.81,103.73,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
1,2,374,298,299,,,,-11.7,19.81,103.73,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
2,3,374,298,299,,,,-11.7,19.81,103.73,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
3,4,482,298,299,2.23,2.59,16.25,0.83,11.65,7.9,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr
4,23,1,298,299,238.49,16.43,16.43,,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,Latest Qtr


##### DataFrame Length

In [30]:
print('DataFrame contains {:,.0f} records.'.format(len(df_growth)))

DataFrame contains 77,298 records.


##### DataFrame Columns

In [31]:
# Growth DataFrame Columns
(df_growth.loc[0, [col for col in df_growth.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

gr_revenue               Revenue %
i28                 Year over Year
i29                 3-Year Average
i30                 5-Year Average
i31                10-Year Average
gr_operating    Operating Income %
i32                 Year over Year
i33                 3-Year Average
i34                 5-Year Average
i35                10-Year Average
gr_ni                 Net Income %
i81                 Year over Year
i82                 3-Year Average
i83                 5-Year Average
i84                10-Year Average
gr_eps                       EPS %
i36                 Year over Year
i37                 3-Year Average
i38                 5-Year Average
i39                10-Year Average
Name: 0, dtype: object

<a id='cfh'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Cash Flow Health
##### DataFrame Instance

In [32]:
df_cfhealth = df.cfhealth()

In [33]:
df_cfhealth

Unnamed: 0,ticker_id,exchange_id,cf_cashflow,i40,i40_cf_Y0,i40_cf_Y1,i40_cf_Y2,i40_cf_Y3,i40_cf_Y4,i40_cf_Y5,...,cf_Y1,cf_Y2,cf_Y3,cf_Y4,cf_Y5,cf_Y6,cf_Y7,cf_Y8,cf_Y9,cf_Y10
0,1,374,318,319,,,,97.88,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2,374,318,319,,,,97.88,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3,374,318,319,,,,97.88,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,4,482,318,319,-31.63,19.64,50.56,-1.28,11.89,17.06,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,23,1,318,319,,51.78,51.78,,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
5,33,16,318,319,93.61,-15.19,-67.91,127.18,-35.68,10.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
6,33,17,318,319,93.61,-15.19,-67.91,127.18,-35.68,10.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
7,33,138,318,319,93.61,-15.19,-67.91,127.18,-35.68,10.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
8,33,141,318,319,93.61,-15.19,-67.91,127.18,-35.68,10.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
9,33,142,318,319,93.61,-15.19,-67.91,127.18,-35.68,10.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [34]:
print('DataFrame contains {:,.0f} records.'.format(len(df_cfhealth)))

DataFrame contains 77,298 records.


##### DataFrame Columns

In [35]:
# Cash Flow Health DataFrame Columns
(df_cfhealth.loc[0, [col for col in df_cfhealth.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

cf_cashflow                    Cash Flow Ratios
i40            Operating Cash Flow Growth % YOY
i41                 Free Cash Flow Growth % YOY
i42                      Cap Ex as a % of Sales
i43                      Free Cash Flow/Sales %
i44                   Free Cash Flow/Net Income
Name: 0, dtype: object

<a id='eff'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Efficiency
##### DataFrame Instance

In [36]:
df_efficiency = df.efficiency()

In [37]:
df_efficiency.head()

Unnamed: 0,ticker_id,exchange_id,ef_efficiency,i69,i69_ef_Y0,i69_ef_Y1,i69_ef_Y2,i69_ef_Y3,i69_ef_Y4,i69_ef_Y5,...,ef_Y1,ef_Y2,ef_Y3,ef_Y4,ef_Y5,ef_Y6,ef_Y7,ef_Y8,ef_Y9,ef_Y10
0,1,374,350,351,,,,82.34,85.86,61.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2,374,350,351,,,,82.34,85.86,61.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3,374,350,351,,,,82.34,85.86,61.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,4,482,350,351,25.66,28.47,27.05,29.65,30.48,32.01,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,23,1,350,351,93.87,58.87,58.87,11.21,11.21,84.13,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [38]:
print('DataFrame contains {:,.0f} records.'.format(len(df_efficiency)))

DataFrame contains 77,298 records.


##### DataFrame Columns

In [39]:
# Efficiency DataFrame Columns
(df_efficiency.loc[0, [col for col in df_efficiency.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

ef_efficiency                Efficiency
i69              Days Sales Outstanding
i70                      Days Inventory
i71                     Payables Period
i72               Cash Conversion Cycle
i73                Receivables Turnover
i74                  Inventory Turnover
i75               Fixed Assets Turnover
i76                      Asset Turnover
Name: 0, dtype: object

<a id='isa'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Annual Income Statement
##### DataFrame Instance

In [40]:
df_annualIS0 = df.annualIS()

In [41]:
df_annualIS = (df_master 
 .merge(df_annualIS0, on=['ticker_id', 'exchange_id'])
 .merge(df.timerefs, left_on='Year_Y_6', right_on='id').drop('Year_Y_6', axis=1).rename(columns={'dates':'Y6'})
 .merge(df.timerefs, left_on='Year_Y_5', right_on='id').drop('Year_Y_5', axis=1).rename(columns={'dates':'Y5'})
 .merge(df.timerefs, left_on='Year_Y_4', right_on='id').drop('Year_Y_4', axis=1).rename(columns={'dates':'Y4'})
 .merge(df.timerefs, left_on='Year_Y_3', right_on='id').drop('Year_Y_3', axis=1).rename(columns={'dates':'Y3'})
 .merge(df.timerefs, left_on='Year_Y_2', right_on='id').drop('Year_Y_2', axis=1).rename(columns={'dates':'Y2'})
 .merge(df.timerefs, left_on='Year_Y_1', right_on='id').drop('Year_Y_1', axis=1).rename(columns={'dates':'Y1'})
)
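# Y6 holds the 'TTM' label, so only Y5 through Y1 are converted to datetime
date_cols = ['Y5', 'Y4', 'Y3', 'Y2', 'Y1']
df_annualIS[date_cols] = df_annualIS[date_cols].apply(pd.to_datetime)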
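
The six near-identical `timerefs` merges above can also be written as a loop. The sketch below is one possible alternative, not the notebook's method: it assumes made-up tables and uses `Series.map` instead of `merge` to swap each `Year_Y_n` id for its date.

```python
import pandas as pd

# Made-up stand-ins: 'timerefs' maps an id to a date string
timerefs = pd.DataFrame({"id": [1, 2, 3], "dates": ["2016-12", "2017-12", "2018-12"]})
statements = pd.DataFrame({
    "ticker_id": [4, 5],
    "Year_Y_1": [1, 2],
    "Year_Y_2": [2, 3],
})

id_to_date = timerefs.set_index("id")["dates"]
for n in (1, 2):
    # Replace the id column 'Year_Y_n' with a datetime column 'Yn'
    statements[f"Y{n}"] = pd.to_datetime(statements.pop(f"Year_Y_{n}").map(id_to_date))

print(statements)
```

Unlike an inner merge, `map` keeps rows whose id has no match and leaves a missing value in the date column, which may or may not be what you want.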

In [42]:
df_annualIS.head()

Unnamed: 0,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,label_tts4,label_tts5,currency_id,fye_month,Y6,Y5,Y4,Y3,Y2,Y1
0,1.0,374.0,OGCP,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,TTM,2018-12-01,2017-12-01,2016-12-01,2015-12-01,2014-12-01
1,2.0,374.0,FISK,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,TTM,2018-12-01,2017-12-01,2016-12-01,2015-12-01,2014-12-01
2,3.0,374.0,ESBA,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,TTM,2018-12-01,2017-12-01,2016-12-01,2015-12-01,2014-12-01
3,19437.0,302.0,PSB,PS Business Parks Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,TTM,2018-12-01,2017-12-01,2016-12-01,2015-12-01,2014-12-01
4,20371.0,302.0,STOR,STORE Capital Corp,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,TTM,2018-12-01,2017-12-01,2016-12-01,2015-12-01,2014-12-01


##### DataFrame Length

In [43]:
print('DataFrame contains {:,.0f} records.'.format(len(df_annualIS)))

DataFrame contains 68,356 records.


##### DataFrame Columns

In [44]:
labels = [col for col in df_annualIS if 'label' in col]
labels = [[label, header] for label in labels 
          for header in df_annualIS[label].unique().tolist() if pd.notna(header)]

df_labels_aIS = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int')
                )

df_labels_aIS['value'] = df_labels_aIS['value'].replace(df.colheaders['header'])
df_labels_aIS[df_labels_aIS['value'].astype('str').str.contains('ncome')].sort_values(by='value')

df_labels_aIS.sort_values(by='value')

Unnamed: 0_level_0,value
header,Unnamed: 1_level_1
label_i46,Advertising and market...
label_i36,Advertising and market...
label_i15,Advertising and promot...
label_i47,Amortization of intang...
label_i24,Asset impairment
label_i4,Asset mgmt and securit...
label_i85,Basic
label_i83,Basic
label_s2,"Benefits, claims and e..."
label_i14,Borrowed funds
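
The cell above collects, for every `label_*` column, the distinct header ids that occur in it and translates them into readable labels. The sketch below shows the same idea on made-up data, assuming the ids are stored as numbers with `NaN` where a line item is absent; the `headers` dictionary stands in for the real `colheaders` table.

```python
import pandas as pd

# Made-up stand-ins: label columns hold header ids, NaN where a line item is absent
statements = pd.DataFrame({
    "label_i1": [101, 101, None],
    "label_i2": [102, None, 103],
})
headers = {101: "Revenue", 102: "Cost of revenue", 103: "Interest expense"}

# One (column, label) pair per distinct id found in each label column
pairs = [
    (col, headers[int(header_id)])
    for col in statements.columns if col.startswith("label")
    for header_id in statements[col].dropna().unique()
]
labels = pd.DataFrame(pairs, columns=["column", "label"]).set_index("column")
print(labels)
```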


<a id='isq'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Quarterly Income Statements
##### DataFrame Instance

In [126]:
df_quarterlyIS0 = df.quarterlyIS()

In [127]:
df_quarterlyIS0.head()

##### DataFrame Length

In [None]:
print('DataFrame contains {:,.0f} records.'.format(len(df_quarterlyIS0)))

##### DataFrame Columns

In [None]:
labels = [col for col in df_quarterlyIS0 if 'label' in col]
labels = [[label, header] for label in labels 
          for header in df_quarterlyIS0[label].unique().tolist() if pd.notna(header)]

df_labels_qIS = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int')
                )

df_labels_qIS['value'] = df_labels_qIS['value'].replace(df.colheaders['header'])
df_labels_qIS[df_labels_qIS['value'].astype('str').str.contains('ncome')].sort_values(by='value')

df_labels_qIS.sort_values(by='value')

<a id='bsa'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Annual Balance Sheet
##### DataFrame Instance

In [45]:
df_bsa0 = df.annualBS()

In [46]:
df_bsa0.head()

Unnamed: 0,ticker_id,exchange_id,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,data_g1_Y_1,data_g1_Y_2,data_g1_Y_3,...,label_ttg2,label_ttg5,label_ttg8,label_ttgg1,label_ttgg2,label_ttgg3,label_ttgg5,label_ttgg6,label_tts1,label_tts2
0,1,374,2014-12,2015-12,2016-12,2017-12,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
1,2,374,2014-12,2015-12,2016-12,2017-12,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
2,3,374,2014-12,2015-12,2016-12,2017-12,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
3,4,482,2014-12,2015-12,2016-12,2017-12,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
4,5,1,2003-12,2004-12,2005-12,2006-06,2007-06,0.0,19946.0,167141.0,...,Total non-current asse...,Total liabilities,Total stockholders eq...,Total cash,"Net property, plant an...",,Total current liabilit...,Total non-current liab...,Total assets,Total liabilities and ...


##### DataFrame Length

In [47]:
print('DataFrame contains {:,.0f} records.'.format(len(df_bsa0)))

DataFrame contains 76,311 records.


##### DataFrame Columns

In [48]:
# Annual Balance Sheet DataFrame Columns
(df_bsa0.loc[0, [col for col in df_bsa0.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

label_g1                             NaN
label_g2                             NaN
label_g5                     Liabilities
label_g8             Stockholders equity
label_gg1                            NaN
label_gg2                            NaN
label_gg3                            NaN
label_gg5                            NaN
label_gg6                            NaN
label_i1          Real estate properties
label_i10                            NaN
label_i11                            NaN
label_i12                            NaN
label_i13                            NaN
label_i14                            NaN
label_i15                            NaN
label_i16                            NaN
label_i17                            NaN
label_i18                            NaN
label_i19                            NaN
label_i2       Accumulated depreciati...
label_i21                            NaN
label_i3       Real estate properties...
label_i30                            NaN
label_i4       C

<a id='bsq'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Quarterly Balance Sheet
##### DataFrame Instance

In [49]:
df_bsq0 = df.quarterlyBS()

In [50]:
df_bsq0.head()

Unnamed: 0,ticker_id,exchange_id,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,data_g1_Y_1,data_g1_Y_2,data_g1_Y_3,...,label_ttg2,label_ttg5,label_ttg8,label_ttgg1,label_ttgg2,label_ttgg3,label_ttgg5,label_ttgg6,label_tts1,label_tts2
0,1,374,2017-12,2018-03,2018-06,2018-09,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
1,2,374,2017-12,2018-03,2018-06,2018-09,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
2,3,374,2017-12,2018-03,2018-06,2018-09,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
3,4,482,2017-12,2018-03,2018-06,2018-09,2018-12,,,,...,,Total liabilities,Total stockholders eq...,,,,,,Total assets,Total liabilities and ...
4,5,1,2007-03,2007-06,2007-09,2007-12,2008-03,1100958.0,669928.0,272986.0,...,Total non-current asse...,Total liabilities,Total stockholders eq...,Total cash,"Net property, plant an...",,Total current liabilit...,Total non-current liab...,Total assets,Total liabilities and ...


##### DataFrame Length

In [51]:
print('DataFrame contains {:,.0f} records.'.format(len(df_bsq0)))

DataFrame contains 76,216 records.


##### DataFrame Columns

In [52]:
# Quarterly Balance Sheet DataFrame Columns
(df_bsq0.loc[0, [col for col in df_bsq0.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

label_g1                             NaN
label_g2                             NaN
label_g5                     Liabilities
label_g8             Stockholders equity
label_gg1                            NaN
label_gg2                            NaN
label_gg3                            NaN
label_gg5                            NaN
label_gg6                            NaN
label_i1          Real estate properties
label_i10                            NaN
label_i11                            NaN
label_i12                            NaN
label_i13                            NaN
label_i14                            NaN
label_i15                            NaN
label_i16                            NaN
label_i17                            NaN
label_i18                            NaN
label_i19                            NaN
label_i2       Accumulated depreciati...
label_i21                            NaN
label_i3       Real estate properties...
label_i30                            NaN
label_i4       C

<a id='cfa'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Annual Cash Flow Statement
##### DataFrame Instance

In [133]:
df_cfa0 = df.annualCF()

In [134]:
df_cfa0.head()

Unnamed: 0,ticker_id,exchange_id,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,Year_Y_6,data_i1_Y_1,data_i1_Y_2,...,label_i98,label_i99,label_s1,label_s2,label_s3,label_tts1,label_tts2,label_tts3,currency_id,fye_month
0,1,374,11,12,13,1,5,21,,,...,143.0,144.0,88,109,121,108,120,133,104.0,12.0
1,2,374,11,12,13,1,5,21,,,...,143.0,144.0,88,109,121,108,120,133,104.0,12.0
2,3,374,11,12,13,1,5,21,,,...,143.0,144.0,88,109,121,108,120,133,104.0,12.0
3,4,482,11,12,13,1,5,21,,,...,143.0,144.0,88,109,121,108,120,133,104.0,12.0
4,5,1,668,669,670,671,664,21,-129926.0,-209185.0,...,,,88,109,121,108,120,133,104.0,6.0


##### DataFrame Length

In [135]:
print('DataFrame contains {:,.0f} records.'.format(len(df_cfa0)))

DataFrame contains 76,767 records.


##### DataFrame Columns

In [136]:
# Annual Cash Flow Statement DataFrame Columns
(df_cfa0.loc[0, [col for col in df_cfa0.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

label_g7                 Free Cash Flow
label_g8      Supplemental schedule ...
label_i1                     Net income
label_i10     Stock based compensati...
label_i100          Operating cash flow
label_i11                           NaN
label_i12                           NaN
label_i13                           NaN
label_i15                           NaN
label_i16           Accounts receivable
label_i17                     Inventory
label_i18              Prepaid expenses
label_i19              Accounts payable
label_i2      Depreciation & amortiz...
label_i20           Accrued liabilities
label_i21              Interest payable
label_i22          Income taxes payable
label_i23         Other working capital
label_i24                           NaN
label_i25                           NaN
label_i26                           NaN
label_i27                           NaN
label_i28                           NaN
label_i29                           NaN
label_i3      Amortization of debt d...
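
In the `head()` output above the `label_*` columns hold numeric ids, and the cell above turns them into readable text with `.replace(df.colheaders['header'])`. A minimal sketch of that kind of lookup follows; the `headers` dictionary is a made-up stand-in for `df.colheaders['header']`, and the ids and texts are illustrative only, not taken from the database.

```python
import numpy as np
import pandas as pd

# Hypothetical id -> header text lookup, standing in for df.colheaders['header']
headers = {1: 'Net income', 2: 'Inventory', 3: 'Free cash flow'}

# One row of label ids, similar in shape to df_cfa0.loc[0, label_columns]
row = pd.Series({'label_i1': 1.0, 'label_i17': 2.0,
                 'label_i11': np.nan, 'label_i97': 3.0})

# Series.replace swaps every value found among the mapping keys; NaN stays NaN
labels = row.replace(headers)
print(labels)
```

Ids that have no entry in the mapping are left unchanged, so an incomplete lookup table shows up as leftover numbers rather than as an error.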


<a id='cfq'></a>
[return to list of methods](#methods),
[return to the top](#top)
### Quarterly Cash Flow Statement
##### DataFrame Instance

In [57]:
df_cfq0 = df.quarterlyCF()

In [58]:
df_cfq0.head()

Unnamed: 0,ticker_id,exchange_id,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,Year_Y_6,data_i1_Y_1,data_i1_Y_2,...,label_i96,label_i97,label_i98,label_i99,label_s1,label_s2,label_s3,label_tts1,label_tts2,label_tts3
0,1,374,2017-12,2018-03,2018-06,2018-09,2018-12,TTM,,,...,Capital expenditure,Free cash flow,Cash paid for income t...,Cash paid for interest,Cash Flows From Operat...,Cash Flows From Invest...,Cash Flows From Financ...,Net cash provided by o...,Net cash used for inve...,Net cash provided by (...
1,2,374,2017-12,2018-03,2018-06,2018-09,2018-12,TTM,,,...,Capital expenditure,Free cash flow,Cash paid for income t...,Cash paid for interest,Cash Flows From Operat...,Cash Flows From Invest...,Cash Flows From Financ...,Net cash provided by o...,Net cash used for inve...,Net cash provided by (...
2,3,374,2017-12,2018-03,2018-06,2018-09,2018-12,TTM,,,...,Capital expenditure,Free cash flow,Cash paid for income t...,Cash paid for interest,Cash Flows From Operat...,Cash Flows From Invest...,Cash Flows From Financ...,Net cash provided by o...,Net cash used for inve...,Net cash provided by (...
3,4,482,2017-12,2018-03,2018-06,2018-09,2018-12,TTM,,,...,Capital expenditure,Free cash flow,Cash paid for income t...,Cash paid for interest,Cash Flows From Operat...,Cash Flows From Invest...,Cash Flows From Financ...,Net cash provided by o...,Net cash used for inve...,Net cash provided by (...
4,5,1,2007-03,2007-06,2007-09,2007-12,2008-03,TTM,-429976.0,-980707.0,...,Capital expenditure,Free cash flow,,,Cash Flows From Operat...,Cash Flows From Invest...,Cash Flows From Financ...,Net cash provided by o...,Net cash used for inve...,Net cash provided by (...


##### DataFrame Length

In [59]:
print('DataFrame contains {:,.0f} records.'.format(len(df_cfq0)))

DataFrame contains 76,331 records.


##### DataFrame Columns

In [60]:
# Quarterly Cash Flow Statement DataFrame Columns
(df_cfq0.loc[0, [col for col in df_cfq0.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

label_g7                 Free Cash Flow
label_g8      Supplemental schedule ...
label_i1                     Net income
label_i10     Stock based compensati...
label_i100          Operating cash flow
label_i11                           NaN
label_i12                           NaN
label_i13                           NaN
label_i15                           NaN
label_i16           Accounts receivable
label_i17                     Inventory
label_i18              Prepaid expenses
label_i19              Accounts payable
label_i2      Depreciation & amortiz...
label_i20           Accrued liabilities
label_i21              Interest payable
label_i22          Income taxes payable
label_i23         Other working capital
label_i24                           NaN
label_i25                           NaN
label_i26                           NaN
label_i27                           NaN
label_i28                           NaN
label_i29                           NaN
label_i3      Amortization of debt d...


<a id="stats"></a>
[return to list of methods](#methods),
[return to the top](#top)
## Data statistics and sample code

### Count of database records
**1.** Total number of records **before** merging reference tables *(length of `df.master`)*

In [17]:
print('DataFrame df.master contains {:,.0f} records.'.format(len(df.master)))

DataFrame df.master contains 117,711 records.


**2.** Total number of records **after** merging reference tables *(length of `df_master0`)*

In [18]:
print('DataFrame df_master0 contains {:,.0f} records.'.format(len(df_master0)))

DataFrame df_master0 contains 112,157 records.


**3.** Total number of records **after** merging reference tables where the following filters apply:
- $openprice > 0$
- $lastprice > 0$

In [19]:
print('DataFrame df_master contains {:,.0f} records.'.format(len(df_master)))

DataFrame df_master contains 103,507 records.
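
A sketch of how the two price filters in item 3 could be applied with boolean indexing. The toy frame stands in for `df_master0`; only the `openprice` and `lastprice` column names come from this notebook, and the values are made up.

```python
import pandas as pd

# Toy stand-in for df_master0 (illustrative values only)
toy_master0 = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'openprice': [10.5, 0.0, 3.2, None],
    'lastprice': [10.7, 1.1, 0.0, 4.0],
})

# Keep rows where both prices are strictly positive.
# A missing price compares as False, so those rows are dropped as well.
mask = (toy_master0['openprice'] > 0) & (toy_master0['lastprice'] > 0)
toy_master = toy_master0[mask]

print('{:,.0f} of {:,.0f} records kept.'.format(len(toy_master), len(toy_master0)))
```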


### Last updated dates
Dates on which the database records were last updated (returned as the index of a `pd.DataFrame`). 
The values indicate the number of records updated on each date.

In [20]:
(df_master[['updated_date', 'ticker']].groupby(by='updated_date').count().sort_index(ascending=False)
 .rename(columns={'ticker':'ticker_count'}))

Unnamed: 0_level_0,ticker_count
updated_date,Unnamed: 1_level_1
2019-03-27,54927
2019-03-26,45772
2019-03-25,2808


### Number of records by Type

In [21]:
(df_master[['security_type', 'ticker']].groupby(by='security_type').count()
 .rename(columns={'ticker':'ticker_count'}))

Unnamed: 0_level_0,ticker_count
security_type,Unnamed: 1_level_1
Closed-End Fund,1201
Exchange-Traded Fund,6186
Index,503
Money Market Fund,198
Open-End Fund,25781
Stock,69638


### Number of records by Country, based on the location of exchanges *(see next table)*

In [22]:
(df_master[['country', 'country_c3', 'ticker']]
 .groupby(by=['country', 'country_c3']).count().rename(columns={'ticker':'ticker_count'})
)

Unnamed: 0_level_0,Unnamed: 1_level_0,ticker_count
country,country_c3,Unnamed: 2_level_1
Australia,AUS,2153
Belgium,BEL,169
Canada,CAN,4184
China,CHN,3844
Finland,FIN,2
France,FRA,1201
Germany,DEU,36942
Hong Kong,HKG,2375
Luxembourg,LUX,949
Netherlands,NLD,225


### Number of records per Exchange
Where $ticker\_count > 100$

In [23]:
cols = ['country', 'country_c3', 'exchange', 'exchange_sym', 'ticker']
df_exchanges = df_master[cols].groupby(by=cols[:-1]).count().rename(columns={'ticker':'ticker_count'})
df_exchanges[df_exchanges['ticker_count'] > 100]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,ticker_count
country,country_c3,exchange,exchange_sym,Unnamed: 4_level_1
Australia,AUS,ASX - ALL MARKETS,XASX,2153
Belgium,BEL,EURONEXT - EURONEXT BRUSSELS,XBRU,169
Canada,CAN,CANADIAN NATIONAL STOCK EXCHANGE,XCNQ,418
Canada,CAN,TORONTO STOCK EXCHANGE,XTSE,2021
Canada,CAN,TSX VENTURE EXCHANGE,XTSX,1685
China,CHN,SHANGHAI STOCK EXCHANGE,XSHG,1608
China,CHN,SHENZHEN STOCK EXCHANGE,XSHE,2236
France,FRA,EURONEXT - EURONEXT PARIS,XPAR,1201
Germany,DEU,BOERSE BERLIN,XBER,8182
Germany,DEU,BOERSE DUESSELDORF,XDUS,2170


### Number of Stocks by Country of Exchange

In [24]:
(df_master
 .where(df_master['security_type'] == 'Stock').dropna(axis=0, how='all')[['country', 'country_c3', 'ticker']]
 .groupby(by=['country', 'country_c3']).count().rename(columns={'ticker':'ticker_count'})
 .sort_values(by='ticker_count', ascending=False))

Unnamed: 0_level_0,Unnamed: 1_level_0,ticker_count
country,country_c3,Unnamed: 2_level_1
Germany,DEU,36293
United States,USA,16559
United Kingdom,GBR,4010
China,CHN,3649
Canada,CAN,3372
Hong Kong,HKG,2280
Australia,AUS,1842
France,FRA,838
Luxembourg,LUX,453
Belgium,BEL,168


### Number of Stocks by Sector

In [25]:
(df_master
 .where((df_master['security_type'] == 'Stock') & (df_master['sector'] != '—')).dropna(axis=0, how='all')
 .groupby(by='sector').count()
 .rename(columns={'ticker':'stock_count'}))['stock_count'].sort_values(ascending=False)

sector
Basic Materials           11512
Industrials                9784
Technology                 9570
Consumer Cyclical          8792
Financial Services         7921
Healthcare                 7520
Energy                     3827
Consumer Defensive         3697
Real Estate                3607
Utilities                  1948
Communication Services     1460
Name: stock_count, dtype: int64

### Number of Stocks by Industry

In [77]:
(df_master[['sector', 'industry', 'ticker']]
 .where((df_master['security_type'] == 'Stock') & (df_master['industry'] != '—')).dropna(axis=0, how='all')
 .groupby(by=['sector', 'industry']).count().rename(columns={'ticker':'stock_count'}))

Unnamed: 0_level_0,Unnamed: 1_level_0,stock_count
sector,industry,Unnamed: 2_level_1
Basic Materials,Agricultural Inputs,287
Basic Materials,Aluminum,139
Basic Materials,Building Materials,815
Basic Materials,Chemicals,713
Basic Materials,Coal,376
Basic Materials,Copper,290
Basic Materials,Gold,1845
Basic Materials,Industrial Metals & Minerals,5235
Basic Materials,Lumber & Wood Production,136
Basic Materials,Paper & Paper Products,292


### Mean Price Ratios (P/E, P/S, P/B, P/CF) of Common Stocks by Sectors

Merge *Master* and *Valuation* DataFrames

In [81]:
df_valuation = (df_master
                .where(df_master['security_type'] == 'Stock').dropna(axis=0, how='all')
                .merge(df_vals, on=['ticker_id', 'exchange_id'])
                .drop(['ticker_id', 'exchange_id'], axis=1))

#### Mean Price Ratios for all stocks:

In [82]:
df_valuation_mean = (df_valuation[['sector', 'company']].groupby('sector').count()
                     .rename(columns={'company':'count'})
                     .merge(df_valuation.groupby('sector').mean().round(1), on='sector')
                     .drop(['aprvol', 'avevol'], axis=1))

In [83]:
df_valuation_mean.columns

Index(['count', 'openprice', 'lastprice', 'day_hi', 'day_lo', '_52wk_hi',
       '_52wk_lo', 'yield', 'Forward_PE', 'pb', 'ps', 'pc', 'PE_2009',
       'PE_2010', 'PE_2011', 'PE_2012', 'PE_2013', 'PE_2014', 'PE_2015',
       'PE_2016', 'PE_2017', 'PE_2018', 'PE_TTM', 'PS_2009', 'PS_2010',
       'PS_2011', 'PS_2012', 'PS_2013', 'PS_2014', 'PS_2015', 'PS_2016',
       'PS_2017', 'PS_2018', 'PS_TTM', 'PB_2009', 'PB_2010', 'PB_2011',
       'PB_2012', 'PB_2013', 'PB_2014', 'PB_2015', 'PB_2016', 'PB_2017',
       'PB_2018', 'PB_TTM', 'PC_2009', 'PC_2010', 'PC_2011', 'PC_2012',
       'PC_2013', 'PC_2014', 'PC_2015', 'PC_2016', 'PC_2017', 'PC_2018',
       'PC_TTM'],
      dtype='object')

In [84]:
print('For a total of {:,.0f} stock records:'.format(len(df_valuation)))
df_valuation_mean[['count', 'Forward_PE', 'PE_TTM', 'PB_TTM', 'PS_TTM', 'PC_TTM']]

For a total of 69,661 stock records:


Unnamed: 0_level_0,count,Forward_PE,PE_TTM,PB_TTM,PS_TTM,PC_TTM
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Basic Materials,11506,17.1,28.3,495.0,331.1,26.1
Communication Services,1461,29.7,41.8,3.7,5.2,45.7
Consumer Cyclical,8800,24.8,35.2,5.5,31.7,26.8
Consumer Defensive,3706,21.1,34.0,5.9,23.9,45.8
Energy,3827,48.6,31.1,4.8,192.8,12.5
Financial Services,7916,13.7,29.8,41.7,410.6,263.9
Healthcare,7521,42.7,68.7,18.6,935.9,59.5
Industrials,9791,20.7,46.5,8.9,7796.3,33.1
Real Estate,3617,24.5,34.0,22.4,35.6,139.2
Technology,9569,42.9,61.6,10.2,179.4,132.7


#### Mean Price Ratios for USA stocks:

In [85]:
df_valuation_usa = df_valuation[df_valuation['country_c3'] == 'USA']

In [86]:
df_valuation_mean_usa = (df_valuation_usa[['sector', 'company']].groupby('sector').count()
                         .rename(columns={'company':'count'})
                         .merge(df_valuation_usa.groupby('sector').mean().round(1), on='sector')
                         .drop(['aprvol', 'avevol'], axis=1))

In [87]:
df_valuation_mean_usa.columns

Index(['count', 'openprice', 'lastprice', 'day_hi', 'day_lo', '_52wk_hi',
       '_52wk_lo', 'yield', 'Forward_PE', 'pb', 'ps', 'pc', 'PE_2009',
       'PE_2010', 'PE_2011', 'PE_2012', 'PE_2013', 'PE_2014', 'PE_2015',
       'PE_2016', 'PE_2017', 'PE_2018', 'PE_TTM', 'PS_2009', 'PS_2010',
       'PS_2011', 'PS_2012', 'PS_2013', 'PS_2014', 'PS_2015', 'PS_2016',
       'PS_2017', 'PS_2018', 'PS_TTM', 'PB_2009', 'PB_2010', 'PB_2011',
       'PB_2012', 'PB_2013', 'PB_2014', 'PB_2015', 'PB_2016', 'PB_2017',
       'PB_2018', 'PB_TTM', 'PC_2009', 'PC_2010', 'PC_2011', 'PC_2012',
       'PC_2013', 'PC_2014', 'PC_2015', 'PC_2016', 'PC_2017', 'PC_2018',
       'PC_TTM'],
      dtype='object')

In [88]:
print('For a total of {:,.0f} stock records:'.format(len(df_valuation_usa)))
df_valuation_mean_usa[['count', 'Forward_PE', 'PE_TTM', 'PB_TTM', 'PS_TTM', 'PC_TTM']]

For a total of 16,592 stock records:


Unnamed: 0_level_0,count,Forward_PE,PE_TTM,PB_TTM,PS_TTM,PC_TTM
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Basic Materials,2320,17.4,23.6,2683.7,355.4,33.8
Communication Services,319,34.2,38.7,6.5,11.5,28.3
Consumer Cyclical,1853,23.7,36.0,8.4,125.0,22.1
Consumer Defensive,891,21.2,37.0,13.7,84.8,41.9
Energy,987,26.5,42.7,5.4,175.5,20.7
Financial Services,2506,15.1,28.4,4.8,265.2,35.2
Healthcare,1974,50.7,70.7,36.0,1924.7,35.1
Industrials,2256,26.1,45.4,18.6,55.0,37.4
Real Estate,945,26.8,43.5,2.0,13.5,37.0
Technology,2060,43.4,68.1,21.8,617.2,87.3


#### Mean Price Ratios for DEU (Germany) stocks:

In [89]:
df_valuation_deu = df_valuation[df_valuation['country_c3'] == 'DEU']

In [90]:
df_valuation_mean_deu = (df_valuation_deu[['sector', 'company']].groupby('sector').count()
                         .rename(columns={'company':'count'})
                         .merge(df_valuation_deu.groupby('sector').mean().round(1), on='sector')
                         .drop(['aprvol', 'avevol'], axis=1))

In [91]:
df_valuation_mean_deu.columns

Index(['count', 'openprice', 'lastprice', 'day_hi', 'day_lo', '_52wk_hi',
       '_52wk_lo', 'yield', 'Forward_PE', 'pb', 'ps', 'pc', 'PE_2009',
       'PE_2010', 'PE_2011', 'PE_2012', 'PE_2013', 'PE_2014', 'PE_2015',
       'PE_2016', 'PE_2017', 'PE_2018', 'PE_TTM', 'PS_2009', 'PS_2010',
       'PS_2011', 'PS_2012', 'PS_2013', 'PS_2014', 'PS_2015', 'PS_2016',
       'PS_2017', 'PS_2018', 'PS_TTM', 'PB_2009', 'PB_2010', 'PB_2011',
       'PB_2012', 'PB_2013', 'PB_2014', 'PB_2015', 'PB_2016', 'PB_2017',
       'PB_2018', 'PB_TTM', 'PC_2009', 'PC_2010', 'PC_2011', 'PC_2012',
       'PC_2013', 'PC_2014', 'PC_2015', 'PC_2016', 'PC_2017', 'PC_2018',
       'PC_TTM'],
      dtype='object')

In [92]:
print('For a total of {:,.0f} stock records:'.format(len(df_valuation_deu)))
df_valuation_mean_deu[['count', 'Forward_PE', 'PE_TTM', 'PB_TTM', 'PS_TTM', 'PC_TTM']]

For a total of 36,286 stock records:


Unnamed: 0_level_0,count,Forward_PE,PE_TTM,PB_TTM,PS_TTM,PC_TTM
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Basic Materials,5877,17.1,22.4,4.6,125.9,18.2
Communication Services,910,27.6,39.4,3.0,3.8,55.9
Consumer Cyclical,4620,27.1,32.0,4.4,11.7,19.5
Consumer Defensive,1953,21.1,26.2,3.7,6.5,35.5
Energy,2054,51.1,23.5,5.2,165.3,9.1
Financial Services,3698,13.0,29.2,2.2,8.7,27.8
Healthcare,4225,41.9,74.2,7.8,699.5,40.0
Industrials,5003,18.8,45.4,6.5,155.6,23.6
Real Estate,1741,25.7,30.3,1.6,45.0,192.9
Technology,5176,45.3,53.5,6.9,72.6,161.8
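
The three blocks above (all stocks, USA, DEU) repeat the same count-and-mean steps. One possible way to factor them into a helper is sketched below. `mean_ratios_by_sector` is a hypothetical name, not part of the `dataframes` module, and the toy frame only mimics a few `df_valuation` columns with made-up values. `numeric_only=True` is passed explicitly because pandas 2.0 and later no longer drop non-numeric columns silently when averaging; dropping `aprvol` and `avevol`, as the original cells do, is left out here.

```python
import pandas as pd

def mean_ratios_by_sector(df_valuation, country_c3=None):
    """Return the record count and the mean of all numeric columns per sector.

    If country_c3 is given (e.g. 'USA'), only rows for that country are used.
    """
    if country_c3 is not None:
        df_valuation = df_valuation[df_valuation['country_c3'] == country_c3]
    counts = (df_valuation[['sector', 'company']].groupby('sector').count()
              .rename(columns={'company': 'count'}))
    means = df_valuation.groupby('sector').mean(numeric_only=True).round(1)
    return counts.merge(means, on='sector')

# Toy data, values are made up
toy = pd.DataFrame({
    'sector': ['Energy', 'Energy', 'Technology', 'Technology'],
    'company': ['A', 'B', 'C', 'D'],
    'country_c3': ['USA', 'DEU', 'USA', 'USA'],
    'PE_TTM': [10.0, 20.0, 30.0, 50.0],
})

all_stocks = mean_ratios_by_sector(toy)
usa_stocks = mean_ratios_by_sector(toy, 'USA')
print(all_stocks)
print(usa_stocks)
```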


#### Plots of TTM P/E by Sectors
*All Stocks*

In [93]:
pe = df_valuation_mean['PE_TTM']

In [94]:
fig_pe, ax_pe = plt.subplots(1)
x = [x*3 for x in range(len(pe))]
y = pe.sort_values(ascending=True)
bars = ax_pe.bar(x, y, width=2)
plt.xticks(ticks=x, labels=y.index.tolist(), fontsize=9)
for tick in ax_pe.xaxis.get_ticklabels():
    tick.set_rotation(90)
plt.subplots_adjust(bottom=0.4)
plt.title('Average TTM P/E by Sector for all Stocks', fontsize=10, fontweight='bold')
plt.yticks([])
for bar in bars:
    ax_pe.text(bar.get_x()+1, bar.get_height()+1, '{:.1f}'.format(bar.get_height()), 
               color='black', ha='center', fontsize=9)
plt.axis([-3, len(x)*3, 0, 80])
# Hide the left, right and top spines by name instead of by position in get_children()
for side in ('left', 'right', 'top'):
    ax_pe.spines[side].set_visible(False)

<IPython.core.display.Javascript object>

*USA*

In [95]:
pe = df_valuation_mean_usa['PE_TTM']

In [96]:
fig_pe, ax_pe = plt.subplots(1)
x = [x*3 for x in range(len(pe))]
y = pe.sort_values(ascending=True)
bars = ax_pe.bar(x, y, width=2)
plt.xticks(ticks=x, labels=y.index.tolist(), fontsize=9)
for tick in ax_pe.xaxis.get_ticklabels():
    tick.set_rotation(90)
plt.subplots_adjust(bottom=0.4)
plt.title('Average TTM P/E by Sector for all US Stocks', fontsize=10, fontweight='bold')
plt.yticks([])
for bar in bars:
    ax_pe.text(bar.get_x()+1, bar.get_height()+1, '{:.1f}'.format(bar.get_height()), 
               color='black', ha='center', fontsize=9)
plt.axis([-3, len(x)*3, 0, 80])
# Hide the left, right and top spines by name instead of by position in get_children()
for side in ('left', 'right', 'top'):
    ax_pe.spines[side].set_visible(False)

<IPython.core.display.Javascript object>

*DEU*

In [43]:
pe = df_valuation_mean_deu['PE_TTM']

In [44]:
fig_pe, ax_pe = plt.subplots(1)
x = [x*3 for x in range(len(pe))]
y = pe.sort_values(ascending=True)
bars = ax_pe.bar(x, y, width=2)
plt.xticks(ticks=x, labels=y.index.tolist(), fontsize=9)
for tick in ax_pe.xaxis.get_ticklabels():
    tick.set_rotation(90)
plt.subplots_adjust(bottom=0.4)
plt.title('Average TTM P/E by Sector for all German Stocks', fontsize=10, fontweight='bold')
plt.yticks([])
for bar in bars:
    ax_pe.text(bar.get_x()+1, bar.get_height()+1, '{:.1f}'.format(bar.get_height()), 
               color='black', ha='center', fontsize=9)
plt.axis([-3, len(x)*3, 0, 80])
# Hide the left, right and top spines by name instead of by position in get_children()
for side in ('left', 'right', 'top'):
    ax_pe.spines[side].set_visible(False)

<IPython.core.display.Javascript object>

### Stocks in the Cannabis Industry
Using stocks listed on [marijuanaindex.com](http://marijuanaindex.com/stock-quotes/north-american-marijuana-index/) under North America

In [45]:
import json

with open('input/pot_stocks.json') as file:
    pot_symbols = json.load(file)
    
pot_stocks = (pd.DataFrame(pot_symbols, columns=['ticker', 'country_c3'])
               .merge(df_master, how='left', on=['ticker', 'country_c3']).drop('country', axis=1)
               .rename(columns={'country_c3':'country', 'exchange_sym':'exch'}))

pot_stocks = (pot_stocks.where(((pot_stocks['country'] == 'USA') | 
                                (pot_stocks['country'] == 'CAN')) &
                               (pot_stocks['sector'] != '—'))
              .dropna(axis=0, how='all').sort_values(by='company'))

In [46]:
msg = 'Below are the {} stocks listed on marijuanaindex.com for North America.'
print(msg.format(len(pot_stocks['company'].unique())))

pot_stocks[['country', 'ticker', 'exch', 'company', 'sector', 'industry']]

Below are the 46 stocks listed on marijuanaindex.com for North America.


Unnamed: 0,country,ticker,exch,company,sector,industry
29,CAN,TGIF,XCNQ,1933 Industries Inc,Healthcare,Drug Manufacturers - Specialty & Generic
1,USA,ACRGF,PINX,Acreage Holdings Inc Ordinary Shares (Sub Voting),Healthcare,Drug Manufacturers - Specialty & Generic
2,CAN,ACRG.U,XCNQ,Acreage Holdings Inc Ordinary Shares (Sub Voting),Healthcare,Drug Manufacturers - Specialty & Generic
37,USA,APHA,XNYS,Aphria Inc,Healthcare,Drug Manufacturers - Specialty & Generic
0,CAN,ACB,XTSE,Aurora Cannabis Inc,Healthcare,Drug Manufacturers - Specialty & Generic
36,CAN,XLY,XTSX,Auxly Cannabis Group Inc,Healthcare,Drug Manufacturers - Specialty & Generic
40,USA,CVSI,PINX,CV Sciences Inc,Healthcare,Drug Manufacturers - Specialty & Generic
32,CAN,TRST,XTSE,CannTrust Holdings Inc,Healthcare,Drug Manufacturers - Specialty & Generic
4,CAN,CNNX,XCNQ,Cannex Capital Holdings Inc,Healthcare,Drug Manufacturers - Specialty & Generic
38,USA,CGC,XNYS,Canopy Growth Corp,Healthcare,Drug Manufacturers - Specialty & Generic


<a id="value"></a>
[return to the top](#top)

## Applying various criteria to filter common stocks

- **[Rule 1](#rule1): No earnings deficit (loss) for past 5 years**
- **[Rule 2](#rule2): Uninterrupted and increasing Dividends for past 7 yrs**
- **[Rule 3](#rule3): P/E Ratio of 25 or less for the past 7 yrs and less than 20 for TTM**
- **[Rule 4](#rule4): P/B Ratio of 1 or less for TTM**
- **[Rule 5](#rule5): Filter for "bargain issues"**
- **[Rule 6](#rule6): Operating Cash Flow growth for the past 7 yrs**
- **[Rule 7](#rule7): Owner earnings growth rate > 6% over past 7 years**
- **[Rule 8](#rule5): **
<a id="rule1"></a>

### Rule 1. No earnings deficit (loss) for past 5 years

**a. Identify *Net Income* column labels in** `df_annualIS`

In [66]:
data = 'Net income'
df_labels = df_labels_aIS[df_labels_aIS['value'] == data].sort_values(by='value')
df_labels

Unnamed: 0_level_0,value
header,Unnamed: 1_level_1
label_i50,Net income
label_i70,Net income
label_i80,Net income


**b. Get column headers for 'Net income' values for the past 5 yrs**

In [67]:
i_ids = [(label[-3:] + '_') for label in df_labels.index]

def get_icols(col):
    for i_id in i_ids:
        if i_id in col:
            return True
    return False

main_cols1 = ['ticker_id', 'exchange_id', 
             'country', 'exchange_sym', 'ticker', 'company', 
             'sector', 'industry', 'stock_type', 'style', 
             'Y6', 'Y5', 'Y4', 'Y3', 'Y2', 'Y1']
data_cols = sorted(list(filter(get_icols, df_annualIS.columns)), key=lambda r: (r[-1], r[5:8]), reverse=True)
print('The following columns contain \'{}\' values:\n{}'.format(data, data_cols))

The following columns contain 'Net income' values:
['data_i80_Y_6', 'data_i70_Y_6', 'data_i50_Y_6', 'data_i80_Y_5', 'data_i70_Y_5', 'data_i50_Y_5', 'data_i80_Y_4', 'data_i70_Y_4', 'data_i50_Y_4', 'data_i80_Y_3', 'data_i70_Y_3', 'data_i50_Y_3', 'data_i80_Y_2', 'data_i70_Y_2', 'data_i50_Y_2', 'data_i80_Y_1', 'data_i70_Y_1', 'data_i50_Y_1']


**c. Create 'Net Income' DataFrame**

In [68]:
df_netinc5 = (df_annualIS
              .where((df_annualIS['security_type'] == 'Stock') & 
                     (df_annualIS['Y5'] >= pd.to_datetime('2018-01')))
              .dropna(axis=0, how='all')
              .drop(['country'], axis=1)
              .rename(columns={'country_c3':'country'})
             )[main_cols1 + data_cols]

np_netinc = df_netinc5[data_cols].values
netinc_cols = [('Net_Income_Y' + data_cols[i * 3][-1], (i * 3, i * 3 + 1, i * 3 + 2))
               for i in range(int(len(data_cols)/3))]

vals = []
for row in np_netinc:
    row_vals = []
    for i in range(len(netinc_cols)):
        val = None
        for col in netinc_cols[i][1]:
            if not np.isnan(row[col]):
                val = row[col]
                break
        row_vals.append(val)
    vals.append(row_vals)
    
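# Reuse the filtered frame's index so the join lines up row by row, and use a
# separate name so the valuation DataFrame `df_vals` is not overwritten
df_netinc_vals = pd.DataFrame(vals, columns=list(zip(*netinc_cols))[0], index=df_netinc5.index)
df_netinc5 = df_netinc5[main_cols1].join(df_netinc_vals)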
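
The loop above picks, for each year, the first non-missing value among the three 'Net income' columns. As an alternative sketch, the same first-non-NaN selection can be written with `bfill(axis=1)`. The toy frame below borrows the `data_i80/i70/i50` column names from step b, with made-up numbers and only two years.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'data_i80_Y_6': [np.nan, 5.0, np.nan],
    'data_i70_Y_6': [2.0, np.nan, np.nan],
    'data_i50_Y_6': [3.0, 7.0, np.nan],
    'data_i80_Y_5': [1.0, np.nan, np.nan],
    'data_i70_Y_5': [np.nan, np.nan, 4.0],
    'data_i50_Y_5': [np.nan, 9.0, 6.0],
}, index=[10, 11, 15])  # non-contiguous index, as left behind by a filter

net_income = pd.DataFrame(index=toy.index)
for year in ['6', '5']:
    year_cols = [col for col in toy.columns if col.endswith('_Y_' + year)]
    # Back-fill along each row, then take the first column: this yields the
    # first non-NaN value in column order (NaN if all three are missing)
    net_income['Net_Income_Y' + year] = toy[year_cols].bfill(axis=1).iloc[:, 0]

print(net_income)
```

Because the result is built on `toy.index`, it stays aligned with the filtered rows and does not depend on a fresh 0..n-1 index.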

In [69]:
df_rule1 = df_netinc5.where((df_netinc5['Net_Income_Y6'] > 0) & 
                            ((df_netinc5['Net_Income_Y5'] > 0) | (df_netinc5['Net_Income_Y5'].isna())) & 
                            ((df_netinc5['Net_Income_Y4'] > 0) | (df_netinc5['Net_Income_Y4'].isna())) & 
                            ((df_netinc5['Net_Income_Y3'] > 0) | (df_netinc5['Net_Income_Y3'].isna())) & 
                            ((df_netinc5['Net_Income_Y2'] > 0) | (df_netinc5['Net_Income_Y2'].isna())) & 
                            ((df_netinc5['Net_Income_Y1'] > 0) | (df_netinc5['Net_Income_Y1'].isna()))
                           ).dropna(axis=0, how='all')

In [70]:
df_rule1

Unnamed: 0,ticker_id,exchange_id,country,exchange_sym,ticker,company,sector,industry,stock_type,style,...,Y4,Y3,Y2,Y1,Net_Income_Y6,Net_Income_Y5,Net_Income_Y4,Net_Income_Y3,Net_Income_Y2,Net_Income_Y1
0,1.0,374.0,USA,ARCX,OGCP,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,1.164020e+08,1.164020e+08,1.182530e+08,1.072500e+08,7.992800e+07,7.021000e+07
1,2.0,374.0,USA,ARCX,FISK,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,1.164020e+08,1.164020e+08,1.182530e+08,1.072500e+08,7.992800e+07,7.021000e+07
2,3.0,374.0,USA,ARCX,ESBA,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,1.164020e+08,1.164020e+08,1.182530e+08,1.072500e+08,7.992800e+07,7.021000e+07
3,19437.0,302.0,USA,XNYS,PSB,PS Business Parks Inc,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,2.267020e+08,2.267020e+08,1.550370e+08,1.280290e+08,1.304750e+08,1.739710e+08
5,19275.0,302.0,USA,XNYS,LPT,Liberty Property Trust,Real Estate,REIT - Office,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,4.796070e+08,4.796070e+08,2.823400e+08,3.568170e+08,2.380390e+08,2.179100e+08
6,19240.0,302.0,USA,XNYS,KRC,Kilroy Realty Corp,Real Estate,REIT - Office,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,2.584150e+08,2.584150e+08,1.646120e+08,2.937880e+08,2.340810e+08,1.802190e+08
7,19108.0,302.0,USA,XNYS,RHP,Ryman Hospitality Properties Inc,Real Estate,REIT - Hotel & Motel,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,2.646700e+08,2.646700e+08,1.761000e+08,1.593660e+08,1.115110e+08,1.264520e+08
8,18925.0,302.0,USA,XNYS,NNN,National Retail Properties Inc,Real Estate,REIT - Retail,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,2.924470e+08,2.924470e+08,2.649730e+08,2.395000e+08,1.978360e+08,1.906010e+08
9,16870.0,44.0,USA,XNAS,REG,Regency Centers Corp,Real Estate,REIT - Retail,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,2.491270e+08,2.491270e+08,1.760770e+08,1.649220e+08,1.500560e+08,1.873900e+08
10,18726.0,302.0,USA,XNYS,AIV,Apartment Investment & Management Co,Real Estate,REIT - Residential,Hard Asset,Mid Core,...,2017-12-01,2016-12-01,2015-12-01,2014-12-01,6.739660e+08,6.739660e+08,3.235380e+08,4.376490e+08,2.556530e+08,3.157460e+08


[return to top of this section](#value),
[return to the top](#top)
<a id="rule2"></a>
### Rule 2. Uninterrupted and increasing *Dividends* for past 7 yrs

**a. Identify *Dividends* column label in** `df_keyratios`

In [71]:
icol = df_labels_kratios[df_labels_kratios == 'Dividends'].index[0]
icol

'i6'

**b. Get column headers for *Dividends* for TTM and the past 7 yrs**

In [72]:
main_cols2 = ['ticker_id', 'exchange_id', 
             #'country_c3', 'exchange_sym', 'ticker', 'company', 
             #'sector', 'industry', 'stock_type', 'style', 
             'Y10', 'Y9', 'Y8', 'Y7', 'Y6', 'Y5']
icols = sorted([col for col in df_keyratios.columns if icol + '_' in col], 
               key=lambda col: int(col[4:]), reverse=True)[:8]
icols

['i6_Y10', 'i6_Y9', 'i6_Y8', 'i6_Y7', 'i6_Y6', 'i6_Y5', 'i6_Y4', 'i6_Y3']

**c. Create 'Dividends' DataFrame**

In [73]:
df_rule2 = (df_keyratios
            .where((df_keyratios['security_type'] == 'Stock') & 
                   (df_keyratios['Y9'] >= pd.to_datetime('2018-01')) & 
                   (df_keyratios['i6_Y10'].notna()) & (df_keyratios['i6_Y9'].notna()) &
                   (df_keyratios['i6_Y8'].notna()) & (df_keyratios['i6_Y7'].notna()) &
                   (df_keyratios['i6_Y6'].notna()) & (df_keyratios['i6_Y5'].notna()) & 
                   (df_keyratios['i6_Y4'].notna()) & (df_keyratios['i6_Y3'].notna()) & 
                   (df_keyratios['i6_Y10'] >= df_keyratios['i6_Y9']) & 
                   (df_keyratios['i6_Y9'] >= df_keyratios['i6_Y8']) & 
                   (df_keyratios['i6_Y8'] >= df_keyratios['i6_Y7']) & 
                   (df_keyratios['i6_Y7'] >= df_keyratios['i6_Y6']) & 
                   (df_keyratios['i6_Y6'] >= df_keyratios['i6_Y5']) & 
                   (df_keyratios['i6_Y5'] >= df_keyratios['i6_Y4']) & 
                   (df_keyratios['i6_Y4'] >= df_keyratios['i6_Y3']))
            .dropna(axis=0, how='all').sort_values(by='Y9', ascending=False))[main_cols2 + icols]

df_rule2.columns = main_cols2 + [col.replace('i6', 'Dividend') for col in icols]
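
The chained comparisons above require every dividend value to be present and each period to be at least as large as the one before it. A compact sketch of the same test on a toy frame follows (made-up values, and only four of the `i6_*` columns). The columns must be ordered oldest to newest for `diff` to be read this way.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'i6_Y7': [0.50, 1.00, 0.20, np.nan],
    'i6_Y8': [0.50, 0.90, 0.25, 0.30],
    'i6_Y9': [0.60, 1.10, 0.30, 0.40],
    'i6_Y10': [0.60, 1.20, 0.35, 0.50],
})

cols = ['i6_Y7', 'i6_Y8', 'i6_Y9', 'i6_Y10']  # oldest to newest
complete = toy[cols].notna().all(axis=1)

# Period-over-period change; the first column has no predecessor, so skip it
changes = toy[cols].diff(axis=1).iloc[:, 1:]
non_decreasing = (changes >= 0).all(axis=1)

toy_rule2 = toy[complete & non_decreasing]
print(toy_rule2)
```

As in the original cell, `>=` lets flat dividends pass; a strictly increasing test would use `>` instead.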

In [74]:
df_rule2

Unnamed: 0,ticker_id,exchange_id,Y10,Y9,Y8,Y7,Y6,Y5,Dividend_Y10,Dividend_Y9,Dividend_Y8,Dividend_Y7,Dividend_Y6,Dividend_Y5,Dividend_Y4,Dividend_Y3
53529,19104.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.52,1.52,1.52,1.48,1.44,1.32,1.10,0.80
16733,18794.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.80,1.80,1.36,1.12,0.92,0.72,0.68,0.62
15950,16865.0,44.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.52,0.52,0.52,0.52,0.52,0.50,0.48,0.42
55932,18820.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28
16650,19178.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,4.12,4.12,3.56,2.76,2.36,1.88,1.56,1.16
16719,19556.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.48,1.48,1.24,1.04,0.88,0.72,0.60,0.48
16720,19249.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,2.44,2.44,2.20,2.00,1.80,1.56,1.40,1.28
16721,19393.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.48,1.48,1.48,1.48,1.48,1.32,1.20,1.08
16722,19276.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,2.40,2.40,2.40,2.40,2.00,1.36,1.20,1.00
16725,16940.0,44.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.32,0.32,0.29,0.28,0.26,0.24,0.24,0.15


[return to top of this section](#value),
[return to the top](#top)
<a id="rule3"></a>
### Rule 3. P/E Ratio of 25 or less for the past 7 yrs and less than 20 for TTM

In [75]:
pe_cols = [col for col in df_vals.columns if 'PE_' in col]
pe_cols = pe_cols[::-1][:8]  # reverse the column order, then keep the first 8 columns
pe_cols

In [None]:
df_rule3 = (df_vals[['ticker_id', 'exchange_id'] + pe_cols]
            .where((df_vals['PE_TTM'] <= 20) & (df_vals['PE_2018'] <= 25) &
                   (df_vals['PE_2017'] <= 25) & (df_vals['PE_2016'] <= 25) &
                   (df_vals['PE_2015'] <= 25) & (df_vals['PE_2014'] <= 25) &
                   (df_vals['PE_2013'] <= 25) & (df_vals['PE_2012'] <= 25)).dropna(axis=0, how='all'))

In [None]:
df_rule3
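
When the same threshold applies to several columns, the per-column conditions can be collapsed with `DataFrame.le(...).all(axis=1)`. The following is a minimal sketch on hypothetical data (`sample`, `annual_cols` and the values are assumptions, and only two annual columns are shown to keep it short).

```python
import pandas as pd

# Hypothetical sample standing in for df_vals; values are illustrative only.
sample = pd.DataFrame({
    'ticker_id': [1, 2, 3],
    'PE_TTM': [18.0, 22.0, 15.0],
    'PE_2018': [24.0, 20.0, 30.0],
    'PE_2017': [21.0, 19.0, 12.0],
})

annual_cols = ['PE_2018', 'PE_2017']

# TTM must be 20 or less, and every annual P/E must be 25 or less.
mask = sample['PE_TTM'].le(20) & sample[annual_cols].le(25).all(axis=1)
low_pe = sample[mask]
print(low_pe)
```

Extending the check to more years only requires adding column names to `annual_cols`.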

[return to top of this section](#value),
[return to the top](#top)
<a id="rule4"></a>
### Rule 4. P/B Ratio of 1 or less for TTM

In [None]:
pb_cols = [col for col in df_vals.columns if 'PB_' in col]
pb_cols = pb_cols[::-1][:6]  # reverse the column order, then keep the first 6 columns
pb_cols

In [None]:
df_rule4 = (df_vals[['ticker_id', 'exchange_id'] + pb_cols]
            .where(df_vals['PB_TTM'] <= 1).dropna(axis=0))

In [None]:
df_rule4

[return to top of this section](#value),
[return to the top](#top)
<a id="rule5"></a>
### Rule 5. Filtering for *bargain issues* as described by Benjamin Graham in *The Intelligent Investor*:

In his book, Graham defines bargain issues as common stocks where:

- $stock\ price\ < \frac{2}{3}\ net\ current\ asset\ value\ per\ share\ (NCAVPS)$

- $net\ current\ asset\ value\ (NCAV)\ = current\ assets\ - total\ liabilities\ - preferred\ stock$
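
The two conditions above could be applied with pandas roughly as follows. This is a sketch under stated assumptions: the column names (`price`, `current_assets`, `total_liabilities`, `preferred_stock`, `shares_outstanding`) and all figures are hypothetical and do not come from the scraper's tables.

```python
import pandas as pd

# Hypothetical balance sheet figures; column names and values are assumptions.
sample = pd.DataFrame({
    'ticker': ['AAA', 'BBB'],
    'price': [3.0, 12.0],
    'current_assets': [1000.0, 1000.0],
    'total_liabilities': [300.0, 300.0],
    'preferred_stock': [100.0, 100.0],
    'shares_outstanding': [100.0, 100.0],
})

# NCAV = current assets - total liabilities - preferred stock
sample['ncav'] = (sample['current_assets']
                  - sample['total_liabilities']
                  - sample['preferred_stock'])

# NCAV per share
sample['ncavps'] = sample['ncav'] / sample['shares_outstanding']

# Bargain issue: price below two thirds of NCAVPS
bargains = sample[sample['price'] < (2 / 3) * sample['ncavps']]
print(bargains[['ticker', 'price', 'ncavps']])
```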

[return to top of this section](#value),
[return to the top](#top)
<a id="rule6"></a>
### Rule 6. Operating Cash Flow growth for the past 7 yrs

[return to top of this section](#value),
[return to the top](#top)
<a id="rule7"></a>
### Rule 7. *Owner earnings* growth rate > 6% over past 7 years

$owner\ earnings = net\ income + amortization\ and\ depreciation\ - normal\ capital\ expenditures$

Benjamin Graham mentions in *The Intelligent Investor* that *owner earnings* is a better measure than reported net income because it adjusts for entries such as amortization and depreciation that do not affect the company's cash balances.
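
A possible way to evaluate this rule is sketched below. It assumes that "growth rate" means the compound annual growth rate between the first and last year of the window. The helper `owner_earnings`, the column names and the figures are all hypothetical.

```python
import pandas as pd


def owner_earnings(net_income, dep_amort, capex):
    """Net income plus depreciation and amortization, minus normal capital expenditures."""
    return net_income + dep_amort - capex


# Hypothetical figures for the first and last year of the window (assumed column names).
sample = pd.DataFrame({
    'ticker': ['AAA', 'BBB'],
    'ni_first': [100.0, 80.0], 'da_first': [20.0, 20.0], 'capex_first': [30.0, 10.0],
    'ni_last': [160.0, 90.0], 'da_last': [30.0, 25.0], 'capex_last': [40.0, 15.0],
})

years = 7  # number of annual periods between the first and last observation

oe_first = owner_earnings(sample['ni_first'], sample['da_first'], sample['capex_first'])
oe_last = owner_earnings(sample['ni_last'], sample['da_last'], sample['capex_last'])

# Compound annual growth rate; only meaningful when both values are positive.
sample['oe_cagr'] = (oe_last / oe_first) ** (1 / years) - 1

# Keep companies whose owner earnings grew by more than 6% per year.
growing = sample[sample['oe_cagr'] > 0.06]
print(growing[['ticker', 'oe_cagr']])
```

A stricter reading of the rule would require growth above 6% in every individual year, which would need one comparison per pair of adjacent years instead of a single start-to-end rate.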

[return to top of this section](#value),
[return to the top](#top)
<a id="rule8"></a>
### Rule 8. Long-term debt < 50% of total capital

[return to top of this section](#value),
[return to the top](#top)
<a id="rule9"></a>
### Rule 9. NAV per share > Stock Price

The definition of NAV, as described by Graham in *The Intelligent Investor*, is:

$net\ asset\ value\ (NAV)\ = total\ assets\ - intangible\ assets\ (patents,\ goodwill)\ - total\ liabilities$
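
As a rough illustration of the formula above, the following sketch computes NAV per share and keeps the stocks trading below it. All column names and figures are hypothetical and are not taken from the scraper's tables.

```python
import pandas as pd

# Hypothetical balance sheet figures; column names and values are assumptions.
sample = pd.DataFrame({
    'ticker': ['AAA', 'BBB'],
    'price': [10.0, 10.0],
    'total_assets': [5000.0, 5000.0],
    'intangible_assets': [500.0, 2500.0],
    'total_liabilities': [2500.0, 2000.0],
    'shares_outstanding': [100.0, 100.0],
})

# NAV = total assets - intangible assets - total liabilities
sample['nav'] = (sample['total_assets']
                 - sample['intangible_assets']
                 - sample['total_liabilities'])
sample['nav_per_share'] = sample['nav'] / sample['shares_outstanding']

# Rule 9: NAV per share must exceed the stock price.
below_nav = sample[sample['nav_per_share'] > sample['price']]
print(below_nav[['ticker', 'price', 'nav_per_share']])
```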

[return to top of this section](#value),
[return to the top](#top)
<a id="rulex"></a>
### Rule x. Positive ratio of earnings to fixed charges

### Merging DataFrames

In [87]:
df_rules = (df_rule1
            .merge(df_rule2, on=['ticker_id', 'exchange_id'])
            .merge(df_rule3, on=['ticker_id', 'exchange_id'])
            .merge(df_rule4, on=['ticker_id', 'exchange_id'])
           )

In [88]:
df_rules.columns.values

array(['ticker_id', 'exchange_id', 'country', 'exchange_sym', 'ticker',
       'company', 'sector', 'industry', 'stock_type', 'style', 'Y6_x',
       'Y5_x', 'Y4', 'Y3', 'Y2', 'Y1', 'Net_Income_Y6', 'Net_Income_Y5',
       'Net_Income_Y4', 'Net_Income_Y3', 'Net_Income_Y2', 'Net_Income_Y1',
       'Y10', 'Y9', 'Y8', 'Y7', 'Y6_y', 'Y5_y', 'Dividend_Y10',
       'Dividend_Y9', 'Dividend_Y8', 'Dividend_Y7', 'Dividend_Y6',
       'Dividend_Y5', 'Dividend_Y4', 'Dividend_Y3', 'PE_TTM', 'PE_2018',
       'PE_2017', 'PE_2016', 'PE_2015', 'PE_2014', 'PE_2013', 'PE_2012',
       'PB_TTM', 'PB_2018', 'PB_2017', 'PB_2016', 'PB_2015', 'PB_2014'],
      dtype=object)
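
The `Y6_x`, `Y5_x`, `Y6_y` and `Y5_y` names in the output above are the default suffixes that pandas appends when both sides of a merge contain a non-key column with the same name. The sketch below shows how the `suffixes` argument of `DataFrame.merge` could give such columns more descriptive names. The frames `left` and `right`, their values and the suffix strings are hypothetical.

```python
import pandas as pd

# Two hypothetical frames that share the key columns and a non-key column 'Y6'.
left = pd.DataFrame({
    'ticker_id': [1, 2],
    'exchange_id': [302, 44],
    'Y6': ['2016-01-01', '2016-01-01'],
})
right = pd.DataFrame({
    'ticker_id': [1, 3],
    'exchange_id': [302, 302],
    'Y6': ['2015-01-01', '2015-01-01'],
})

# Inner join (the default) on both keys, with explicit suffixes for the clashing column.
merged = left.merge(right, on=['ticker_id', 'exchange_id'],
                    suffixes=('_income', '_dividend'))
print(merged.columns.tolist())
```

Because `merge` performs an inner join by default, only the key pairs present in both frames survive, which is what makes the chained merges act as a combined filter across the rules.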

In [89]:
df_rules.groupby('company').count()['ticker_id']

company
AP (Thailand) PCL DR                                                     1
APT Satellite Holdings Ltd                                               1
Acme United Corp                                                         1
Akbank TAS ADR                                                           2
Apollo Commercial Real Estate Finance Inc                                1
Ares Capital Corp                                                        1
Associated Banc-Corp                                                     1
Assured Guaranty Ltd                                                     1
Bangkok Bank PCL DR                                                      1
Bangkok Bank PCL Shs Foreign Registered                                  5
BankMuscat (SAOG) GDR                                                    1
Bar Harbor Bankshares Inc                                                1
Bayerische Motoren Werke AG                                              7
Bayerische Motore

<a id="additional"></a>
[return to the top](#top)

## Additional sample / test code

In [16]:
df = None # Set df variable to none to close db connection 

Database connection for file db/mstables2.sqlite closed.
