<a id='top'></a>
# How to slice and dice the data
Below is a series of examples showing how to slice and dice the data stored in the *.sqlite* file generated by the [MorningStar.com](https://www.morningstar.com) web scraper.

##### NOTE: 
- The data used in the code below comes from the *.sqlite* file that the web scraper generates automatically once it has been installed and run locally on your machine. See [README]() for instructions on how to install and run the scraper.
- Navigation links only work with [Jupyter](https://jupyter.org/).


**Content** 

1. [Required modules and matplotlib backend](#modules)
1. [Creating a master (bridge table) DataFrame instance using the DataFrames class](#master)
1. [Methods for creating DataFrame instances](#methods)
    1. `quoteheader` - [MorningStar (MS) Quote Header](#quote)
    1. `valuation` - [MS Valuation table with Price Ratios (P/E, P/S, P/B, P/C) for the past 10 yrs](#val)
    1. `keyratios` - [MS Ratio - Key Financial Ratios & Values](#keyratios)
    1. `finhealth` - [MS Ratio - Financial Health](#finhealth)
    1. `profitability` - [MS Ratio - Profitability](#prof)
    1. `growth` - [MS Ratio - Growth](#growth)
    1. `cfhealth` - [MS Ratio - Cash Flow Health](#cfh)
    1. `efficiency` - [MS Ratio - Efficiency](#eff)
    1. `annualIS` - [MS Annual Income Statements](#isa)
    1. `quarterlyIS` - [MS Quarterly Income Statements](#isq)
    1. `annualBS` - [MS Annual Balance Sheets](#bsa)
    1. `quarterlyBS` - [MS Quarterly Balance Sheets](#bsq)
    1. `annualCF` - [MS Annual Cash Flow Statements](#cfa)
    1. `quarterlyCF` - [MS Quarterly Cash Flow Statements](#cfq)
1. [Performing statistical analysis](#stats)
    1. [Count of database records](#stats)
    1. [Last updated dates](#lastupdate)
    1. [Number of records by security type](#type)
    1. [Number of records by country, based on the location of exchanges](#country)
    1. [Number of records per exchange](#exchange)
    1. [Number of stocks by sector](#sector)
    1. [Number of stocks by industry](#industry)
    1. [Mean price ratios (P/E, P/S, P/B, P/CF) of stocks by sectors](#meanpr)
1. [Applying various criteria to filter common stocks](#value) *(in progress)*
1. [Additional sample / test code](#additional) *(in progress)*

<a id="modules"></a>
## Required modules and matplotlib backend

In [1]:
%matplotlib notebook

In [2]:
import matplotlib.pyplot as plt
import matplotlib

In [3]:
from importlib import reload
import pandas as pd
import numpy as np

# Import dataframes module from project folder.
# This module contains a class that reads the database tables and assigns the data to pandas.DataFrame objects
import dataframes
reload(dataframes)  # reload in case the module file has changed

<module 'dataframes' from '/home/cbrandao/lib/python/msTables/dataframes.py'>

[return to the top](#top)
<a id="master"></a>
## Creating a master DataFrame instance using the DataFrames class
The DataFrames class is part of the [dataframes module](dataframes.py)

In [4]:
db_file_name = 'mstables' # Change the file name here as needed
df = dataframes.DataFrames('db/{}.sqlite'.format(db_file_name))

Creating intial DataFrames from file db/mstables.sqlite...
Creating DataFrame 'colheaders' ...
Creating DataFrame 'timerefs' ...
Creating DataFrame 'urls' ...
Creating DataFrame 'securitytypes' ...
Creating DataFrame 'tickers' ...
Creating DataFrame 'sectors' ...
Creating DataFrame 'industries' ...
Creating DataFrame 'stockstyles' ...
Creating DataFrame 'exchanges' ...
Creating DataFrame 'countries' ...
Creating DataFrame 'companies' ...
Creating DataFrame 'currencies' ...
Creating DataFrame 'stocktypes' ...
Creating DataFrame 'master' ...
Initial DataFrames created.


### Creating Master DataFrame instance from reference tables
`df.master` combines the *Master* bridge table with other reference tables (e.g. `df.tickers`, `df.exchanges`, etc.), so each row carries ticker, company, exchange, sector and quote fields.
##### DataFrame Instance

In [5]:
# Create df_master from a copy of the master table and apply filters
df_master = df.master.copy()

# Treat missing price, yield and volume values as 0.0
fill_cols = ['lastprice', 'day_hi', 'day_lo', '_52wk_hi', '_52wk_lo', 'yield', 'aprvol', 'avevol']
df_master[fill_cols] = df_master[fill_cols].fillna(value=0.0)

# Keep only rows with a positive opening and last price:
# where() turns all other rows into NaN, and dropna() then removes them
df_master = (df_master.where((df_master['openprice'] > 0.0) & (df_master['lastprice'] > 0.0))
             .dropna(axis=0, how='all'))
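The `where(...)`/`dropna(how='all')` combination above keeps only rows with a positive opening and last price. It works by first filling every rejected row with NaN, and that step is why integer columns such as `ticker_id` appear as floats (`1.0`) in the output below. Below is a minimal sketch of the same filter written with a boolean mask, which keeps the original dtypes. It runs on a small hypothetical frame rather than the real `df.master`.

```python
import numpy as np
import pandas as pd

# Columns whose missing values are treated as 0.0, as in the cell above
FILL_COLS = ['lastprice', 'day_hi', 'day_lo', '_52wk_hi', '_52wk_lo',
             'yield', 'aprvol', 'avevol']


def filter_master(master: pd.DataFrame) -> pd.DataFrame:
    """Fill missing price/volume values, then keep rows with positive open and last prices."""
    out = master.copy()
    out[FILL_COLS] = out[FILL_COLS].fillna(0.0)
    mask = (out['openprice'] > 0.0) & (out['lastprice'] > 0.0)
    # Boolean indexing drops rows without converting integer columns to float
    return out.loc[mask]


# Hypothetical toy data: only the second row has a positive open and last price
toy = pd.DataFrame({'ticker_id': [1, 2, 3], 'openprice': [0.0, 15.75, np.nan]})
for col in FILL_COLS:
    toy[col] = np.nan
toy['lastprice'] = [np.nan, 15.77, 3.0]

filtered = filter_master(toy)
print(filtered['ticker_id'].tolist(), filtered['ticker_id'].dtype)
```

A NaN `openprice` compares as `False`, so such rows are excluded exactly as in the `where` version.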

In [6]:
df_master.head()

Unnamed: 0,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,aprvol,avevol,Forward_PE,pb,ps,pc,currency,currency_code,fy_end,updated_date
0,1.0,374.0,OGCP,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,11.0,3498.0,,2.4,6.4,16.7,United States Dollar,USD,2019-12-31,2019-04-11
1,2.0,374.0,FISK,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,1300.0,3014.0,,2.4,6.4,16.8,United States Dollar,USD,2019-12-31,2019-04-11
2,3.0,374.0,ESBA,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,4243.0,17288.0,,2.4,6.4,16.8,United States Dollar,USD,2019-12-31,2019-04-11
3,18686.0,302.0,ARE,Alexandria Real Estate Equities Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Office,Real Estate,United States,US,...,417090.0,695255.0,57.1,2.2,11.2,26.1,United States Dollar,USD,2019-12-31,2019-04-11
4,19275.0,302.0,LPT,Liberty Property Trust,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Office,Real Estate,United States,US,...,432637.0,843316.0,,2.2,10.3,18.9,United States Dollar,USD,2019-12-31,2019-04-11


##### DataFrame Length

In [7]:
print('Master DataFrame contains {:,.0f} records.'.format(len(df_master)))

Master DataFrame contains 103,490 records.


##### DataFrame Columns

In [8]:
df_master.columns

Index(['ticker_id', 'exchange_id', 'ticker', 'company', 'exchange',
       'exchange_sym', 'industry', 'sector', 'country', 'country_c2',
       'country_c3', 'security_type_code', 'security_type', 'stock_type',
       'style', 'openprice', 'lastprice', 'day_hi', 'day_lo', '_52wk_hi',
       '_52wk_lo', 'yield', 'aprvol', 'avevol', 'Forward_PE', 'pb', 'ps', 'pc',
       'currency', 'currency_code', 'fy_end', 'updated_date'],
      dtype='object')

<br></br>
[return to the top](#top)
<a id='methods'></a>
## Creating DataFrame instances with dataframes methods
The DataFrames class from [dataframes.py](dataframes.py) contains the following methods, each of which returns a pd.DataFrame object for the specified database table:

- `quoteheader` - [MorningStar (MS) Quote Header](#quote)
- `valuation` - [MS Valuation table with Price Ratios (P/E, P/S, P/B, P/C) for the past 10 yrs](#val)
- `keyratios` - [MS Ratio - Key Financial Ratios & Values](#keyratios)
- `finhealth` - [MS Ratio - Financial Health](#finhealth)
- `profitability` - [MS Ratio - Profitability](#prof)
- `growth` - [MS Ratio - Growth](#growth)
- `cfhealth` - [MS Ratio - Cash Flow Health](#cfh)
- `efficiency` - [MS Ratio - Efficiency](#eff)
- `annualIS` - [MS Annual Income Statements](#isa)
- `quarterlyIS` - [MS Quarterly Income Statements](#isq)
- `annualBS` - [MS Annual Balance Sheets](#bsa)
- `quarterlyBS` - [MS Quarterly Balance Sheets](#bsq)
- `annualCF` - [MS Annual Cash Flow Statements](#cfa)
- `quarterlyCF` - [MS Quarterly Cash Flow Statements](#cfq)
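Each of these methods is called without arguments in the cells below, so several tables can be loaded in one pass with `getattr`. Below is a minimal sketch. `load_tables` is a hypothetical helper, and `DummyFrames` is a hypothetical stand-in for the real `DataFrames` instance so the snippet runs without the *.sqlite* file. Its values are made up except the 18.6 P/E, which is taken from the valuation output.

```python
import pandas as pd


def load_tables(frames, method_names):
    """Call each named zero-argument method on `frames` and collect the results."""
    return {name: getattr(frames, name)() for name in method_names}


class DummyFrames:
    """Hypothetical stand-in for dataframes.DataFrames, for illustration only."""

    def quoteheader(self):
        return pd.DataFrame({'ticker_id': [1, 2], 'lastprice': [15.72, 15.77]})

    def valuation(self):
        return pd.DataFrame({'ticker_id': [1], 'PE_2014': [18.6]})


tables = load_tables(DummyFrames(), ['quoteheader', 'valuation'])
for name, frame in tables.items():
    print(name, len(frame))
```

With the real instance, the call would look like `load_tables(df, ['finhealth', 'profitability', 'growth'])`.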

<a id='quote'></a>
### Quote Header 
##### DataFrame Instance

In [9]:
df_quote = df.quoteheader()

In [10]:
df_quote.head()

Unnamed: 0,ticker_id,exchange_id,openprice,lastprice,day_hi,day_lo,_52wk_hi,_52wk_lo,yield,aprvol,avevol,fpe,pb,ps,pc,currency_id
0,1,374,15.73,15.72,15.73,15.72,17.72,12.16,2.67,11.0,3498.0,,2.4,6.4,16.7,104.0
1,2,374,15.75,15.77,15.78,15.73,17.68,13.68,2.66,1300.0,3014.0,,2.4,6.4,16.8,104.0
2,3,374,15.76,15.71,15.76,15.63,17.79,11.99,2.67,4243.0,17288.0,,2.4,6.4,16.8,104.0
3,4,482,96.19,95.9,96.48,95.35,115.11,87.87,1.25,192854.0,804691.0,19.8,3.3,3.9,20.1,104.0
4,5,1,0.0,0.0,0.0,0.0,0.0,0.0,,184.0,184.0,,,,,104.0


##### DataFrame Length

In [12]:
print('DataFrame contains {:,.0f} records.'.format(len(df_quote)))

DataFrame contains 118,338 records.
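The quote header holds enough fields to compute where the last price sits within its 52-week range: (lastprice - 52-week low) / (52-week high - 52-week low). Below is a sketch using the column names shown in `df_quote.head()` above. Rows whose range is zero, such as the all-zero row for `ticker_id` 5, are set to NaN instead of being divided by zero. The two sample rows copy the values for `ticker_id` 1 and 5 from the output above.

```python
import pandas as pd


def pct_of_52wk_range(quote: pd.DataFrame) -> pd.Series:
    """Position of lastprice within the 52-week range: 0 = at the low, 1 = at the high."""
    span = quote['_52wk_hi'] - quote['_52wk_lo']
    # Zero-width ranges (e.g. placeholder rows of zeros) become NaN to avoid dividing by zero
    span = span.where(span > 0)
    return (quote['lastprice'] - quote['_52wk_lo']) / span


sample = pd.DataFrame({
    'ticker_id': [1, 5],
    'lastprice': [15.72, 0.0],
    '_52wk_hi': [17.72, 0.0],
    '_52wk_lo': [12.16, 0.0],
})
print(pct_of_52wk_range(sample).round(3).tolist())
```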


<a id='val'></a>
[return to the top](#top)
### Price Ratios (P/E, P/S, P/B, P/C)
##### DataFrame Instance

In [11]:
df_vals = df.valuation().reset_index()

In [12]:
df_vals.head()

Unnamed: 0,exchange_id,ticker_id,PE_2009,PE_2010,PE_2011,PE_2012,PE_2013,PE_2014,PE_2015,PE_2016,...,PC_2010,PC_2011,PC_2012,PC_2013,PC_2014,PC_2015,PC_2016,PC_2017,PC_2018,PC_TTM
0,374,1,,,,,,18.6,69.0,57.8,...,,,,48.3,-153.8,24.3,25.8,26.3,21.5,16.7
1,374,2,,,,,,18.2,69.0,58.8,...,,,,50.5,-151.5,24.3,26.2,26.4,20.3,16.8
2,374,3,,,,,,18.3,69.4,58.8,...,,,,48.3,-153.8,24.4,26.2,26.7,20.6,16.8
3,482,4,,22.2,17.4,16.6,27.0,29.7,26.6,31.7,...,16.2,11.1,12.9,19.6,23.1,19.9,26.8,52.9,21.8,20.1
4,1,5,,,,,,,,,...,,,,,,,,,,


##### DataFrame Length

In [13]:
print('DataFrame contains {:,.0f} records.'.format(len(df_vals)))

DataFrame contains 80,567 records.


##### DataFrame Columns

In [14]:
df_vals.columns

Index(['exchange_id', 'ticker_id', 'PE_2009', 'PE_2010', 'PE_2011', 'PE_2012',
       'PE_2013', 'PE_2014', 'PE_2015', 'PE_2016', 'PE_2017', 'PE_2018',
       'PE_TTM', 'PS_2009', 'PS_2010', 'PS_2011', 'PS_2012', 'PS_2013',
       'PS_2014', 'PS_2015', 'PS_2016', 'PS_2017', 'PS_2018', 'PS_TTM',
       'PB_2009', 'PB_2010', 'PB_2011', 'PB_2012', 'PB_2013', 'PB_2014',
       'PB_2015', 'PB_2016', 'PB_2017', 'PB_2018', 'PB_TTM', 'PC_2009',
       'PC_2010', 'PC_2011', 'PC_2012', 'PC_2013', 'PC_2014', 'PC_2015',
       'PC_2016', 'PC_2017', 'PC_2018', 'PC_TTM'],
      dtype='object')
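The yearly ratio columns follow the pattern `<ratio>_<year>`, plus a `<ratio>_TTM` column. This means `DataFrame.filter` with a regular expression can pick out one ratio's history. Below is a sketch that computes each security's average P/E over the fiscal years, excluding TTM and skipping missing years. In the toy frame, the 2014 to 2016 P/E values are copied from the first and fourth rows of `head()` above. The TTM values are made up.

```python
import numpy as np
import pandas as pd


def mean_ratio(vals: pd.DataFrame, ratio: str = 'PE') -> pd.Series:
    """Row-wise mean of <ratio>_<yyyy> columns, skipping NaN and the TTM column."""
    yearly = vals.filter(regex=rf'^{ratio}_\d{{4}}$')
    return yearly.mean(axis=1)


toy_vals = pd.DataFrame({
    'ticker_id': [1, 4],
    'PE_2014': [18.6, 29.7],
    'PE_2015': [69.0, 26.6],
    'PE_2016': [57.8, 31.7],
    'PE_TTM': [np.nan, 20.0],  # made-up values; excluded by the regex
})
print(mean_ratio(toy_vals).round(2).tolist())
```

Negative ratios (e.g. the P/C of -153.8 above) pull such averages down, so a median or a filter on positive values may suit some screens better.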

<a id='keyratios'></a>
[return to the top](#top)
### Key Ratios
##### DataFrame Instance

In [18]:
df_keyratios = (df_master.merge(df.keyratios(), on=['ticker_id', 'exchange_id']))

In [19]:
df_keyratios.head()

Unnamed: 0,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8,Y9,Y10
0,1.0,374.0,OGCP,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
1,2.0,374.0,FISK,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
2,3.0,374.0,ESBA,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
3,18686.0,302.0,ARE,Alexandria Real Estate Equities Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Office,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
4,19275.0,302.0,LPT,Liberty Property Trust,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Office,Real Estate,United States,US,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM


##### DataFrame Length

In [20]:
print('DataFrame contains {:,.0f} records.'.format(len(df_keyratios)))

DataFrame contains 68,394 records.
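`DataFrame.merge` performs an inner join by default. As a result, `df_keyratios` keeps only the 68,394 master rows that have a matching `(ticker_id, exchange_id)` pair in `df.keyratios()`, out of the 103,490 rows in `df_master`. Below is a sketch of how to see which rows are dropped, using `how='left'` with `indicator=True`. The frames are small hypothetical stand-ins, and the ticker `XYZ` is made up.

```python
import pandas as pd

# Hypothetical stand-ins; master keys are floats, as in df_master after where()
master = pd.DataFrame({'ticker_id': [1.0, 2.0, 3.0],
                       'exchange_id': [374.0, 374.0, 302.0],
                       'ticker': ['OGCP', 'FISK', 'XYZ']})
ratios = pd.DataFrame({'ticker_id': [1, 2],
                       'exchange_id': [374, 374],
                       'Y1': ['2010-12-01', '2010-12-01']})

# A left join keeps every master row; _merge records where each row was found
check = master.merge(ratios, on=['ticker_id', 'exchange_id'], how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'ticker']
print(check['_merge'].value_counts())
print(missing.tolist())
```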


##### DataFrame Columns

In [31]:
# Keep only header-code columns such as 'i0', 'i91' (startswith('i') alone would also match 'industry')
df_keyratios_cols = (df_keyratios
                     .loc[0, [col for col in df_keyratios.columns
                              if col.startswith('i') and col[1:].isdigit()]]
                     .replace(df.colheaders['header']))
df_keyratios_cols

i0                           Revenue
i1                      Gross_Margin
i2                  Operating_Income
i3                  Operating_Margin
i4                        Net_Income
i5                Earnings_Per_Share
i6                         Dividends
i91                     Payout_Ratio
i7                            Shares
i8              Book_Value_Per_Share
i9               Operating_Cash_Flow
i10                     Cap_Spending
i11                   Free_Cash_Flow
i90         Free_Cash_Flow_Per_Share
i80                  Working_Capital
Name: 0, dtype: object
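In these **Columns** cells, the first row holds numeric header codes, and `.replace(df.colheaders['header'])` swaps each code for its text label. Any value without a matching code is left unchanged. Below is the same lookup run on a hypothetical two-entry `colheaders` frame, with the mapping passed as a plain dict. The codes 279 and 202 are taken from `df_profitab.head()` further down; code 999 is made up to show an unmatched value.

```python
import pandas as pd

# Hypothetical lookup table: index = header code, 'header' = display text
colheaders = pd.DataFrame({'header': ['Margins % of Sales', 'Revenue']},
                          index=[279, 202])

# First row of a ratio table: column name -> header code
row = pd.Series({'pr_margins': 279, 'i12': 202, 'i99': 999})

decoded = row.replace(colheaders['header'].to_dict())
print(decoded)
```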

<a id='finhealth'></a>
[return to the top](#top)
### Financial Health
##### DataFrame Instance

In [22]:
df_finhealth = df.finhealth()

In [23]:
df_finhealth.head()

Unnamed: 0,ticker_id,exchange_id,fh_balsheet,i45,i45_fh_Y0,i45_fh_Y1,i45_fh_Y2,i45_fh_Y3,i45_fh_Y4,i45_fh_Y5,...,fh_Y1,fh_Y2,fh_Y3,fh_Y4,fh_Y5,fh_Y6,fh_Y7,fh_Y8,fh_Y9,fh_Y10
0,1,374,324,325,,,,4.89,2.45,1.39,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
1,2,374,324,325,,,,4.89,2.45,1.39,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
2,3,374,324,325,,,,4.89,2.45,1.39,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
3,4,482,324,325,67.09,21.17,41.16,40.02,50.12,38.53,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
4,23,1,324,325,37.05,83.93,79.97,86.32,99.22,22.23,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr


##### DataFrame Length

In [24]:
print('DataFrame contains {:,.0f} records.'.format(len(df_finhealth)))

DataFrame contains 77,855 records.


##### DataFrame Columns

In [32]:
df_finhealth_cols = (df_finhealth.loc[0, [col for col in df_finhealth.columns 
                                          if 'Y' not in col and '_id' not in col]]
                     .replace(df.colheaders['header']))
df_finhealth_cols

fh_balsheet         Balance Sheet Items (in %)
i45              Cash & Short-Term Investments
i46                        Accounts Receivable
i47                                  Inventory
i48                       Other Current Assets
i49                       Total Current Assets
i50                                   Net PP&E
i51                                Intangibles
i52                     Other Long-Term Assets
i53                               Total Assets
i54                           Accounts Payable
i55                            Short-Term Debt
i56                              Taxes Payable
i57                        Accrued Liabilities
i58               Other Short-Term Liabilities
i59                  Total Current Liabilities
i60                             Long-Term Debt
i61                Other Long-Term Liabilities
i62                          Total Liabilities
i63                  Total Stockholders Equity
i64                 Total Liabilities & Equity
lfh_liquidity
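Each row stores one item's history across wide columns. Judging by the `head()` output, the values sit in `<item>_fh_Y<k>` and the matching period labels sit in `fh_Y<k>`. Below is a sketch of a hypothetical helper that turns one row into a Series indexed by period, assuming that naming pattern holds. The toy row uses periods 1 to 3 for Cash & Short-Term Investments (`i45`), with values copied from the row for `ticker_id` 4 above.

```python
import pandas as pd


def item_history(row: pd.Series, item: str, prefix: str, periods) -> pd.Series:
    """Hypothetical helper: one item's values for one row, indexed by period label."""
    labels = [row[f'{prefix}_Y{k}'] for k in periods]
    values = [row[f'{item}_{prefix}_Y{k}'] for k in periods]
    return pd.Series(values, index=labels, name=item)


toy_row = pd.Series({
    'i45_fh_Y1': 21.17, 'i45_fh_Y2': 41.16, 'i45_fh_Y3': 40.02,
    'fh_Y1': '2010-12-01', 'fh_Y2': '2011-12-01', 'fh_Y3': '2012-12-01',
})
cash = item_history(toy_row, 'i45', 'fh', range(1, 4))
print(cash)
```

On the real frame, pass the range of `k` values actually present in the columns.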

<a id='prof'></a>
[return to the top](#top)
### Profitability
##### DataFrame Instance

In [33]:
df_profitab = df.profitability()

In [34]:
df_profitab.head()

Unnamed: 0,ticker_id,exchange_id,pr_margins,i12,i12_pr_Y0,i12_pr_Y1,i12_pr_Y2,i12_pr_Y3,i12_pr_Y4,i12_pr_Y5,...,pr_Y1,pr_Y2,pr_Y3,pr_Y4,pr_Y5,pr_Y6,pr_Y7,pr_Y8,pr_Y9,pr_Y10
0,1,374,279,202,,,100.0,100.0,100.0,100.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
1,2,374,279,202,,,100.0,100.0,100.0,100.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
2,3,374,279,202,,,100.0,100.0,100.0,100.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
3,4,482,279,202,100.0,100.0,100.0,100.0,100.0,100.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
4,23,1,279,202,100.0,100.0,100.0,100.0,100.0,100.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM


##### DataFrame Length

In [35]:
print('DataFrame contains {:,.0f} records.'.format(len(df_profitab)))

DataFrame contains 77,855 records.


##### DataFrame Columns

In [37]:
# Profitability DataFrame Columns
df_profitab_cols = (df_profitab.loc[0, [col for col in df_profitab.columns 
                                        if 'Y' not in col and '_id' not in col]]
                    .replace(df.colheaders['header']))
df_profitab_cols

pr_margins              Margins % of Sales
i12                                Revenue
i13                                   COGS
i14                           Gross Margin
i15                                   SG&A
i16                                    R&D
i17                                  Other
i18                       Operating Margin
i19                    Net Int Inc & Other
i20                             EBT Margin
pr_profit                    Profitability
i21                             Tax Rate %
i22                           Net Margin %
i23               Asset Turnover (Average)
i24                     Return on Assets %
i25           Financial Leverage (Average)
i26                     Return on Equity %
i27           Return on Invested Capital %
i95                      Interest Coverage
Name: 0, dtype: object
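Using the codes listed above, the yearly Return on Equity values (`i26`) can be pulled for every row at once with a regex on the column names. This assumes the columns follow the `<item>_pr_Y<k>` pattern seen in `head()`. Below is a sketch that ranks rows by their average ROE, run on a hypothetical three-row frame.

```python
import numpy as np
import pandas as pd


def average_item(frame: pd.DataFrame, item: str, prefix: str = 'pr') -> pd.Series:
    """Row-wise mean of the <item>_<prefix>_Y<k> columns, ignoring missing years."""
    cols = frame.filter(regex=rf'^{item}_{prefix}_Y\d+$')
    return cols.mean(axis=1)


toy_prof = pd.DataFrame({
    'ticker_id': [10, 11, 12],
    'i26_pr_Y0': [12.0, 5.0, np.nan],
    'i26_pr_Y1': [14.0, 7.0, 20.0],
})
roe = average_item(toy_prof, 'i26').rename('avg_roe')
ranked = toy_prof[['ticker_id']].join(roe).sort_values('avg_roe', ascending=False)
print(ranked)
```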

<a id='growth'></a>
[return to the top](#top)
### Growth
##### DataFrame Instance

In [27]:
df_growth = df.growth()

In [28]:
df_growth.head()

Unnamed: 0,ticker_id,exchange_id,gr_revenue,i28,i28_gr_Y0,i28_gr_Y1,i28_gr_Y2,i28_gr_Y3,i28_gr_Y4,i28_gr_Y5,...,gr_Y1,gr_Y2,gr_Y3,gr_Y4,gr_Y5,gr_Y6,gr_Y7,gr_Y8,gr_Y9,gr_Y10
0,1,374,298,299,,,,-11.7,19.81,103.73,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
1,2,374,298,299,,,,-11.7,19.81,103.73,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
2,3,374,298,299,,,,-11.7,19.81,103.73,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
3,4,482,298,299,2.23,2.59,16.25,0.83,11.65,7.9,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
4,23,1,298,299,238.49,16.43,16.43,,,,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr


##### DataFrame Length

In [29]:
print('DataFrame contains {:,.0f} records.'.format(len(df_growth)))

DataFrame contains 77,855 records.


##### DataFrame Columns

In [38]:
# Growth DataFrame Columns
df_growth_cols = (df_growth.loc[0, [col for col in df_growth.columns if 'Y' not in col and '_id' not in col]]
                  .replace(df.colheaders['header']))
df_growth_cols

gr_revenue               Revenue %
i28                 Year over Year
i29                 3-Year Average
i30                 5-Year Average
i31                10-Year Average
gr_operating    Operating Income %
i32                 Year over Year
i33                 3-Year Average
i34                 5-Year Average
i35                10-Year Average
gr_ni                 Net Income %
i81                 Year over Year
i82                 3-Year Average
i83                 5-Year Average
i84                10-Year Average
gr_eps                       EPS %
i36                 Year over Year
i37                 3-Year Average
i38                 5-Year Average
i39                10-Year Average
Name: 0, dtype: object
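These growth codes make it possible to screen rows on several growth measures at once. Below is a sketch, assuming the value columns follow the `<item>_gr_Y<k>` pattern from `head()`. Which `Y<k>` is most recent should be checked against the `gr_Y<k>` label columns; in the output above, `gr_Y10` is `Latest Qtr`. The toy frame and the 5% threshold are hypothetical.

```python
import numpy as np
import pandas as pd


def passes_all(frame: pd.DataFrame, columns, threshold: float) -> pd.Series:
    """True where every listed column exceeds threshold; NaN counts as a failure."""
    return (frame[columns] > threshold).all(axis=1)


toy_growth = pd.DataFrame({
    'ticker_id': [1, 2, 3],
    'i30_gr_Y10': [8.0, 12.0, np.nan],  # Revenue %, 5-Year Average
    'i38_gr_Y10': [6.0, 3.0, 9.0],      # EPS %, 5-Year Average
})
cols = ['i30_gr_Y10', 'i38_gr_Y10']
screened = toy_growth[passes_all(toy_growth, cols, 5.0)]
print(screened['ticker_id'].tolist())
```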

<a id='cfh'></a>
[return to the top](#top)
### Cash Flow Health
##### DataFrame Instance

In [33]:
df_cfhealth = df.cfhealth()

In [34]:
df_cfhealth.head()

Unnamed: 0,ticker_id,exchange_id,cf_cashflow,i40,i40_cf_Y0,i40_cf_Y1,i40_cf_Y2,i40_cf_Y3,i40_cf_Y4,i40_cf_Y5,...,cf_Y1,cf_Y2,cf_Y3,cf_Y4,cf_Y5,cf_Y6,cf_Y7,cf_Y8,cf_Y9,cf_Y10
0,1,374,318,319,,,,97.88,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2,374,318,319,,,,97.88,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3,374,318,319,,,,97.88,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,4,482,318,319,-31.63,19.64,50.56,-1.28,11.89,17.06,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,23,1,318,319,,51.78,51.78,,,,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [35]:
print('DataFrame contains {:,.0f} records.'.format(len(df_cfhealth)))

DataFrame contains 77,855 records.


##### DataFrame Columns

In [36]:
# Cash Flow Health DataFrame Columns
(df_cfhealth.loc[0, [col for col in df_cfhealth.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

cf_cashflow                    Cash Flow Ratios
i40            Operating Cash Flow Growth % YOY
i41                 Free Cash Flow Growth % YOY
i42                      Cap Ex as a % of Sales
i43                      Free Cash Flow/Sales %
i44                   Free Cash Flow/Net Income
Name: 0, dtype: object
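Many cells in these tables are empty (see the leading NaNs in `head()` above). A quick way to gauge coverage before analysing an item is to compute the share of non-missing values in each period column. Below is a sketch assuming the `<item>_cf_Y<k>` naming seen above. In the toy frame, the first three rows mirror `head()` for `ticker_id` 1, 4 and 23; the fourth row is made up.

```python
import numpy as np
import pandas as pd


def coverage(frame: pd.DataFrame, item: str, prefix: str) -> pd.Series:
    """Share of rows with a non-missing value in each <item>_<prefix>_Y<k> column."""
    cols = frame.filter(regex=rf'^{item}_{prefix}_Y\d+$')
    return cols.notna().mean()


toy_cf = pd.DataFrame({
    'i40_cf_Y0': [np.nan, -31.63, np.nan, np.nan],
    'i40_cf_Y1': [np.nan, 19.64, 51.78, np.nan],
    'i40_cf_Y2': [np.nan, 50.56, 51.78, 20.0],
})
print(coverage(toy_cf, 'i40', 'cf'))
```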

<a id='eff'></a>
[return to the top](#top)
### Efficiency
##### DataFrame Instance

In [37]:
df_efficiency = df.efficiency()

In [38]:
df_efficiency.head()

Unnamed: 0,ticker_id,exchange_id,ef_efficiency,i69,i69_ef_Y0,i69_ef_Y1,i69_ef_Y2,i69_ef_Y3,i69_ef_Y4,i69_ef_Y5,...,ef_Y1,ef_Y2,ef_Y3,ef_Y4,ef_Y5,ef_Y6,ef_Y7,ef_Y8,ef_Y9,ef_Y10
0,1,374,350,351,,,,82.34,85.86,61.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2,374,350,351,,,,82.34,85.86,61.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3,374,350,351,,,,82.34,85.86,61.96,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,4,482,350,351,25.66,28.47,27.05,29.65,30.48,32.01,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,23,1,350,351,93.87,58.87,58.87,11.21,11.21,84.13,...,2010-12,2011-12,2012-12,2013-12,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [39]:
print('DataFrame contains {:,.0f} records.'.format(len(df_efficiency)))

DataFrame contains 77,855 records.


##### DataFrame Columns

In [40]:
# Efficiency DataFrame Columns
(df_efficiency.loc[0, [col for col in df_efficiency.columns if 'Y' not in col and '_id' not in col]]
 .replace(df.colheaders['header']))

ef_efficiency                Efficiency
i69              Days Sales Outstanding
i70                      Days Inventory
i71                     Payables Period
i72               Cash Conversion Cycle
i73                Receivables Turnover
i74                  Inventory Turnover
i75               Fixed Assets Turnover
i76                      Asset Turnover
Name: 0, dtype: object
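The cash conversion cycle is conventionally defined as Days Sales Outstanding + Days Inventory - Payables Period. With the codes above, that corresponds to `i72 = i69 + i70 - i71`. Below is a sketch that recomputes it to sanity-check the reported value. The `<item>_ef_Y<k>` column pattern is assumed from `head()`, and the numbers are made up.

```python
import pandas as pd


def cash_conversion_cycle(frame: pd.DataFrame, k: int) -> pd.Series:
    """DSO + days inventory - payables period for period Y<k> (codes i69, i70, i71)."""
    return (frame[f'i69_ef_Y{k}'] + frame[f'i70_ef_Y{k}']
            - frame[f'i71_ef_Y{k}'])


toy_eff = pd.DataFrame({
    'i69_ef_Y3': [30.0, 45.0],   # Days Sales Outstanding
    'i70_ef_Y3': [60.0, 0.0],    # Days Inventory
    'i71_ef_Y3': [40.0, 20.0],   # Payables Period
    'i72_ef_Y3': [50.0, 25.0],   # reported Cash Conversion Cycle
})
recomputed = cash_conversion_cycle(toy_eff, 3)
print((recomputed - toy_eff['i72_ef_Y3']).abs().max())
```

A missing Days Inventory value (common for companies without inventory) makes the recomputed cycle NaN. Filling it with 0 first is a judgment call.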

<a id='isa'></a>
[return to the top](#top)
### Annual Income Statement
##### DataFrame Instance

In [41]:
df_annualIS0 = df.annualIS()

In [42]:
df_annualIS = (df_master 
 .merge(df_annualIS0, on=['ticker_id', 'exchange_id']))

In [43]:
df_annualIS.head()

Unnamed: 0,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,label_tts4,label_tts5,currency_id,fye_month,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,Year_Y_6
0,1.0,374.0,OGCP,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2.0,374.0,FISK,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3.0,374.0,ESBA,Empire State Realty OP LP Operating Partnershi...,NYSE ARCA,ARCX,REIT - Diversified,Real Estate,United States,US,...,,,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,18686.0,302.0,ARE,Alexandria Real Estate Equities Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Office,Real Estate,United States,US,...,,,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,19275.0,302.0,LPT,Liberty Property Trust,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Office,Real Estate,United States,US,...,,,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [44]:
print('DataFrame contains {:,.0f} records.'.format(len(df_annualIS)))

DataFrame contains 68,393 records.


##### DataFrame Columns

In [45]:
labels = [col for col in df_annualIS if 'label' in col]
labels = [[label, header] for label in labels 
          for header in df_annualIS[label].unique().tolist() if pd.notna(header)]

df_labels_aIS = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int')
                )

df_labels_aIS['value'] = df_labels_aIS['value'].replace(df.colheaders['header'])
df_labels_aIS[df_labels_aIS['value'].astype('str').str.contains('ncome')].sort_values(by='value')

sorted(list(zip(df_labels_aIS.values.tolist(), df_labels_aIS.index)))

[(['Advertising and market...'], 'label_i36'),
 (['Advertising and market...'], 'label_i46'),
 (['Advertising and promot...'], 'label_i15'),
 (['Amortization of intang...'], 'label_i47'),
 (['Asset impairment'], 'label_i24'),
 (['Asset mgmt and securit...'], 'label_i4'),
 (['Basic'], 'label_i83'),
 (['Basic'], 'label_i85'),
 (['Benefits, claims and e...'], 'label_s2'),
 (['Borrowed funds'], 'label_i14'),
 (['Commissions and fees'], 'label_i22'),
 (['Compensation and benef...'], 'label_i13'),
 (['Compensation and benef...'], 'label_i32'),
 (['Compensation and benef...'], 'label_i42'),
 (['Cost of revenue'], 'label_i6'),
 (['Costs and expenses'], 'label_g3'),
 (['Credit card income'], 'label_i27'),
 (['Cumulative effect of a...'], 'label_i43'),
 (['Cumulative effect of a...'], 'label_i66'),
 (['Cumulative effect of a...'], 'label_i73'),
 (['Deposits'], 'label_i12'),
 (['Deposits with banks'], 'label_i3'),
 (['Depreciation and amort...'], 'label_i12'),
 (['Depreciation and amort...'], 'la
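The list comprehension above collects distinct (label column, header code) pairs and then decodes them. Below is a sketch of the same lookup written with `melt`. It assumes, as the cell above does, that every `label_*` column holds integer header codes, possibly mixed with NaN. `label_lookup` is a hypothetical helper, and the codes and header texts in the toy data are made up. Unlike `.replace`, `.map` turns codes that are missing from `colheaders` into NaN, which makes them easy to spot.

```python
import numpy as np
import pandas as pd


def label_lookup(statement: pd.DataFrame, colheaders: pd.Series) -> pd.DataFrame:
    """Distinct (label column, header text) pairs from the label_* columns."""
    return (statement.filter(like='label')
            .melt(var_name='header', value_name='value')
            .dropna()
            .drop_duplicates()
            .assign(value=lambda d: d['value'].astype('int').map(colheaders))
            .set_index('header')
            .sort_values('value'))


# Hypothetical header codes and texts
colheaders = pd.Series({6: 'Cost of revenue', 27: 'Credit card income'}, name='header')
toy_is = pd.DataFrame({
    'ticker_id': [1, 2, 3],
    'label_i6': [6.0, 6.0, np.nan],
    'label_i27': [np.nan, 27.0, 27.0],
})
lookup = label_lookup(toy_is, colheaders)
print(lookup)
```

On the real data, this would be called as `label_lookup(df_annualIS, df.colheaders['header'])`.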

<a id='isq'></a>
[return to the top](#top)
### Quarterly Income Statements
##### DataFrame Instance

In [46]:
df_quarterlyIS = df.quarterlyIS()

In [47]:
df_quarterlyIS.head()

Unnamed: 0,ticker_id,exchange_id,data_g1_Y_1,data_g1_Y_2,data_g1_Y_3,data_g1_Y_4,data_g1_Y_5,data_g1_Y_6,data_g2_Y_1,data_g2_Y_2,...,label_tts4,label_tts5,currency_id,fye_month,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,Year_Y_6
0,1,374,,,,,,,124910000.0,133107000.0,...,,,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
1,2,374,,,,,,,124910000.0,133107000.0,...,,,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
2,3,374,,,,,,,124910000.0,133107000.0,...,,,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
3,4,482,,,,,,,,,...,1213.0,,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
4,23,1,,,,,,,,,...,1213.0,,19.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM


##### DataFrame Length

In [48]:
print('DataFrame contains {:,.0f} records.'.format(len(df_quarterlyIS)))

DataFrame contains 57,725 records.
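For income-statement items, the TTM figure is normally the sum of the four most recent quarters. In `df_quarterlyIS.head()` above, `Year_Y_2` through `Year_Y_5` are the four most recent quarters (2018-03 to 2018-12) and `Year_Y_6` is TTM. Below is a sketch of a hypothetical check for one item, assuming the `data_<code>_Y_<k>` naming. The numbers are made up.

```python
import numpy as np
import pandas as pd


def ttm_from_quarters(frame: pd.DataFrame, code: str, quarters=(2, 3, 4, 5)) -> pd.Series:
    """Sum the data_<code>_Y_<k> columns for the given quarters; NaN unless all are present."""
    cols = [f'data_{code}_Y_{k}' for k in quarters]
    return frame[cols].sum(axis=1, min_count=len(cols))


toy_q = pd.DataFrame({
    'data_g2_Y_2': [100.0, 10.0],
    'data_g2_Y_3': [110.0, np.nan],
    'data_g2_Y_4': [120.0, 12.0],
    'data_g2_Y_5': [130.0, 13.0],
    'data_g2_Y_6': [460.0, 50.0],  # reported TTM
})
ttm = ttm_from_quarters(toy_q, 'g2')
print(ttm.tolist())
```

`min_count` keeps a row with a missing quarter as NaN, so an incomplete sum is not silently compared with the reported TTM.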


##### DataFrame Columns

In [49]:
labels = [col for col in df_quarterlyIS if 'label' in col]
labels = [[label, header] for label in labels 
          for header in df_quarterlyIS[label].unique().tolist() if pd.notna(header)]

df_labels_qIS = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int')
                )

df_labels_qIS['value'] = df_labels_qIS['value'].replace(df.colheaders['header'])
df_labels_qIS[df_labels_qIS['value'].astype('str').str.contains('ncome')].sort_values(by='value')

sorted(list(zip(df_labels_qIS.values.tolist(), df_labels_qIS.index)))

[(['Advertising and market...'], 'label_i36'),
 (['Advertising and market...'], 'label_i46'),
 (['Advertising and promot...'], 'label_i15'),
 (['Amortization of intang...'], 'label_i47'),
 (['Asset impairment'], 'label_i24'),
 (['Asset mgmt and securit...'], 'label_i4'),
 (['Basic'], 'label_i83'),
 (['Basic'], 'label_i85'),
 (['Benefits, claims and e...'], 'label_s2'),
 (['Borrowed funds'], 'label_i14'),
 (['Commissions and fees'], 'label_i22'),
 (['Compensation and benef...'], 'label_i13'),
 (['Compensation and benef...'], 'label_i32'),
 (['Compensation and benef...'], 'label_i42'),
 (['Cost of revenue'], 'label_i6'),
 (['Costs and expenses'], 'label_g3'),
 (['Credit card income'], 'label_i27'),
 (['Cumulative effect of a...'], 'label_i43'),
 (['Cumulative effect of a...'], 'label_i66'),
 (['Cumulative effect of a...'], 'label_i73'),
 (['Deposits'], 'label_i12'),
 (['Deposits with banks'], 'label_i3'),
 (['Depreciation and amort...'], 'label_i12'),
 (['Depreciation and amort...'], 'la

<a id='bsa'></a>
[return to the top](#top)
### Annual Balance Sheet
##### DataFrame Instance

In [50]:
df_annualBS = df.annualBS()

In [51]:
df_annualBS.head()

Unnamed: 0,ticker_id,exchange_id,data_g1_Y_1,data_g1_Y_2,data_g1_Y_3,data_g1_Y_4,data_g1_Y_5,data_g2_Y_1,data_g2_Y_2,data_g2_Y_3,...,label_ttgg6,label_tts1,label_tts2,currency_id,fye_month,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5
0,1,374,,,,,,,,,...,,10.0,29,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12
1,2,374,,,,,,,,,...,,10.0,29,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12
2,3,374,,,,,,,,,...,,10.0,29,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12
3,4,482,,,,,,,,,...,,10.0,29,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12
4,23,1,,,,,,,,,...,,10.0,29,19.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12


##### DataFrame Length

In [52]:
print('DataFrame contains {:,.0f} records.'.format(len(df_annualBS)))

DataFrame contains 77,984 records.


##### DataFrame Columns

In [53]:
labels = [col for col in df_annualBS if 'label' in col]
labels = [[label, header] for label in labels 
          for header in df_annualBS[label].unique().tolist() if pd.notna(header)]

df_labels_aBS = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int')
                )

df_labels_aBS['value'] = df_labels_aBS['value'].replace(df.colheaders['header'])
df_labels_aBS[df_labels_aBS['value'].astype('str').str.contains('ncome')].sort_values(by='value')

sorted(list(zip(df_labels_aBS.values.tolist(), df_labels_aBS.index)))

[(['Accounts payable'], 'label_i41'),
 (['Accounts payable'], 'label_i42'),
 (['Accounts payable'], 'label_i43'),
 (['Accrued expenses and l...'], 'label_i46'),
 (['Accrued expenses and l...'], 'label_i47'),
 (['Accrued investment inc...'], 'label_i8'),
 (['Accrued liabilities'], 'label_i45'),
 (['Accrued liabilities'], 'label_i46'),
 (['Accrued liabilities'], 'label_i53'),
 (['Accumulated Depreciati...'], 'label_i10'),
 (['Accumulated Depreciati...'], 'label_i14'),
 (['Accumulated depreciati...'], 'label_i2'),
 (['Accumulated other comp...'], 'label_i89'),
 (['Additional paid-in cap...'], 'label_i84'),
 (['Allowance for loan los...'], 'label_i12'),
 (['Allowance for loan los...'], 'label_i9'),
 (['Assets'], 'label_g1'),
 (['Assets'], 'label_s1'),
 (['Buildings and improvem...'], 'label_i10'),
 (['Capital leases'], 'label_i42'),
 (['Capital leases'], 'label_i43'),
 (['Capital leases'], 'label_i51'),
 (['Cash'], 'label_gg1'),
 (['Cash and cash equivale...'], 'label_i1'),
 (['Cash and ca
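Balance-sheet items are point-in-time values, so a natural transformation is the year-over-year change across the `Y_1` to `Y_5` columns (2014-12 to 2018-12 in the output above). Below is a sketch that uses NumPy arrays to divide each year by the previous one, assuming the `data_<code>_Y_<k>` naming. The values are hypothetical.

```python
import numpy as np
import pandas as pd


def yoy_change(frame: pd.DataFrame, code: str, n_years: int = 5) -> pd.DataFrame:
    """Fractional change from Y_<k-1> to Y_<k> for the data_<code>_Y_<k> columns."""
    cols = [f'data_{code}_Y_{k}' for k in range(1, n_years + 1)]
    values = frame[cols].to_numpy(dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        change = values[:, 1:] / values[:, :-1] - 1.0
    change[~np.isfinite(change)] = np.nan  # zero or missing base year -> NaN
    return pd.DataFrame(change, index=frame.index, columns=cols[1:])


toy_bs = pd.DataFrame({
    'data_i1_Y_1': [100.0, 0.0],
    'data_i1_Y_2': [110.0, 5.0],
    'data_i1_Y_3': [99.0, np.nan],
})
print(yoy_change(toy_bs, 'i1', n_years=3))
```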

<a id='bsq'></a>
[return to the top](#top)
### Quarterly Balance Sheet
##### DataFrame Instance

In [54]:
df_quarterlyBS = df.quarterlyBS()

In [55]:
df_quarterlyBS.head()

Unnamed: 0,ticker_id,exchange_id,data_g1_Y_1,data_g1_Y_2,data_g1_Y_3,data_g1_Y_4,data_g1_Y_5,data_g2_Y_1,data_g2_Y_2,data_g2_Y_3,...,label_ttgg6,label_tts1,label_tts2,currency_id,fye_month,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5
0,1,374,,,,,,,,,...,,10.0,29,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12
1,2,374,,,,,,,,,...,,10.0,29,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12
2,3,374,,,,,,,,,...,,10.0,29,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12
3,4,482,,,,,,,,,...,,10.0,29,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12
4,23,1,,,,,,,,,...,,10.0,29,19.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12


##### DataFrame Length

In [56]:
print('DataFrame contains {:,.0f} records.'.format(len(df_quarterlyBS)))

DataFrame contains 77,753 records.


##### DataFrame Columns

In [70]:
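# Pair every label_* column with each distinct, non-null header id it contains
labels = [col for col in df_quarterlyBS if 'label' in col]
labels = [[label, header] for label in labels
          for header in df_quarterlyBS[label].unique().tolist() if pd.notna(header)]

df_labels_qBS = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int'))

# Replace the numeric header ids with their text from the df.colheaders lookup table
df_labels_qBS['value'] = df_labels_qBS['value'].replace(df.colheaders['header'])
# Optional: preview only the labels containing 'ncome' (uncomment and run on its own)
# df_labels_qBS[df_labels_qBS['value'].astype('str').str.contains('ncome')].sort_values(by='value')

# (header text, label column) pairs, sorted alphabetically
sorted(zip(df_labels_qBS.values.tolist(), df_labels_qBS.index))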

[(['Accounts payable'], 'label_i41'),
 (['Accounts payable'], 'label_i42'),
 (['Accounts payable'], 'label_i43'),
 (['Accrued expenses and l...'], 'label_i46'),
 (['Accrued expenses and l...'], 'label_i47'),
 (['Accrued investment inc...'], 'label_i8'),
 (['Accrued liabilities'], 'label_i45'),
 (['Accrued liabilities'], 'label_i46'),
 (['Accrued liabilities'], 'label_i53'),
 (['Accumulated Depreciati...'], 'label_i10'),
 (['Accumulated Depreciati...'], 'label_i14'),
 (['Accumulated depreciati...'], 'label_i2'),
 (['Accumulated other comp...'], 'label_i89'),
 (['Additional paid-in cap...'], 'label_i84'),
 (['Allowance for loan los...'], 'label_i12'),
 (['Allowance for loan los...'], 'label_i9'),
 (['Assets'], 'label_g1'),
 (['Assets'], 'label_s1'),
 (['Buildings and improvem...'], 'label_i10'),
 (['Capital leases'], 'label_i42'),
 (['Capital leases'], 'label_i43'),
 (['Capital leases'], 'label_i51'),
 (['Cash'], 'label_gg1'),
 (['Cash and cash equivale...'], 'label_i1'),
 (['Cash and ca
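
Each *DataFrame Columns* cell in this notebook follows the same two steps: collect the distinct header ids stored in the `label_*` columns, then look those ids up in `df.colheaders`. The sketch below reproduces that pattern on made-up data so it can run on its own. `statements` and `colheaders` are hypothetical stand-ins for `df_quarterlyBS` and `df.colheaders`, and `Series.map` is used instead of `replace`, so an id with no match becomes `NaN` rather than silently staying a number.

```python
import pandas as pd

# Hypothetical miniature statement table: each label_* column stores numeric header ids
statements = pd.DataFrame({'label_i1': [10, 10, None],
                           'label_i2': [29, None, 31]})

# Hypothetical stand-in for df.colheaders: header id (index) -> header text
colheaders = pd.DataFrame({'header': ['Cash', 'Total assets', 'Inventories']},
                          index=[10, 29, 31])

# One (column, header id) pair per distinct, non-null id in every label_* column
pairs = [(col, hid) for col in statements if col.startswith('label')
         for hid in statements[col].dropna().unique()]

labels = pd.DataFrame(pairs, columns=['column', 'header_id']).astype({'header_id': 'int64'})
# Series.map looks each id up in the index of colheaders['header']
labels['header'] = labels['header_id'].map(colheaders['header'])
print(labels)
```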

<a id='cfa'></a>
[return to the top](#top)
### Annual Cash Flow Statement
##### DataFrame Instance

In [58]:
df_annualCF = df.annualCF()

In [59]:
df_annualCF.head()

Unnamed: 0,ticker_id,exchange_id,data_i1_Y_1,data_i1_Y_2,data_i1_Y_3,data_i1_Y_4,data_i1_Y_5,data_i1_Y_6,data_i10_Y_1,data_i10_Y_2,...,label_tts2,label_tts3,currency_id,fye_month,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,Year_Y_6
0,1,374,,,,,,,3720000.0,5483000.0,...,120,133,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
1,2,374,,,,,,,3720000.0,5483000.0,...,120,133,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
2,3,374,,,,,,,3720000.0,5483000.0,...,120,133,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
3,4,482,,,,,,,,,...,120,133,104.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM
4,23,1,,,,,,,,227000.0,...,120,133,19.0,12.0,2014-12,2015-12,2016-12,2017-12,2018-12,TTM


##### DataFrame Length

In [60]:
print('DataFrame contains {:,.0f} records.'.format(len(df_annualCF)))

DataFrame contains 77,444 records.


##### DataFrame Columns

In [72]:
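# Pair every label_* column with each distinct, non-null header id it contains
labels = [col for col in df_annualCF if 'label' in col]
labels = [[label, header] for label in labels
          for header in df_annualCF[label].unique().tolist() if pd.notna(header)]

df_labels_aCF = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int'))

# Replace the numeric header ids with their text from the df.colheaders lookup table
df_labels_aCF['value'] = df_labels_aCF['value'].replace(df.colheaders['header'])
# Optional: preview only the labels containing 'ncome' (uncomment and run on its own)
# df_labels_aCF[df_labels_aCF['value'].astype('str').str.contains('ncome')].sort_values(by='value')

# (header text, label column) pairs, sorted alphabetically
sorted(zip(df_labels_aCF.values.tolist(), df_labels_aCF.index))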

[(['(Gain) Loss from disco...'], 'label_i3'),
 (['(Gain) Loss from disco...'], 'label_i7'),
 (['(Gains) loss on dispos...'], 'label_i11'),
 (['(Gains) loss on dispos...'], 'label_i12'),
 (['Accounts payable'], 'label_i19'),
 (['Accounts payable'], 'label_i20'),
 (['Accounts receivable'], 'label_i16'),
 (['Accrued liabilities'], 'label_i20'),
 (['Accrued liabilities'], 'label_i21'),
 (['Accrued liabilities'], 'label_i30'),
 (['Acquisitions and dispo...'], 'label_i33'),
 (['Acquisitions and dispo...'], 'label_i36'),
 (['Acquisitions and dispo...'], 'label_i43'),
 (['Acquisitions, net'], 'label_i33'),
 (['Amortization of debt a...'], 'label_i7'),
 (['Amortization of debt d...'], 'label_i10'),
 (['Amortization of debt d...'], 'label_i3'),
 (['Amortization of debt d...'], 'label_i7'),
 (['Capital expenditure'], 'label_i96'),
 (['Capitalization of defe...'], 'label_i24'),
 (['Cash Flows From Financ...'], 'label_s3'),
 (['Cash Flows From Invest...'], 'label_s2'),
 (['Cash Flows From Operat...

<a id='cfq'></a>
[return to the top](#top)
### Quarterly Cash Flow Statement
##### DataFrame Instance

In [62]:
df_quarterlyCF = df.quarterlyCF()

In [63]:
df_quarterlyCF.head()

Unnamed: 0,ticker_id,exchange_id,data_i1_Y_1,data_i1_Y_2,data_i1_Y_3,data_i1_Y_4,data_i1_Y_5,data_i1_Y_6,data_i10_Y_1,data_i10_Y_2,...,label_tts2,label_tts3,currency_id,fye_month,Year_Y_1,Year_Y_2,Year_Y_3,Year_Y_4,Year_Y_5,Year_Y_6
0,1,374,,,,,,,3292000.0,4555000.0,...,120,133,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
1,2,374,,,,,,,3292000.0,4555000.0,...,120,133,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
2,3,374,,,,,,,3292000.0,4555000.0,...,120,133,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
3,4,482,,,,,,,0.0,400000.0,...,120,133,104.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM
4,23,1,,,,,,,-84000.0,-128000.0,...,120,133,19.0,12.0,2017-12,2018-03,2018-06,2018-09,2018-12,TTM


##### DataFrame Length

In [64]:
print('DataFrame contains {:,.0f} records.'.format(len(df_quarterlyCF)))

DataFrame contains 77,919 records.


##### DataFrame Columns

In [73]:
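# Pair every label_* column with each distinct, non-null header id it contains
labels = [col for col in df_quarterlyCF if 'label' in col]
labels = [[label, header] for label in labels
          for header in df_quarterlyCF[label].unique().tolist() if pd.notna(header)]

df_labels_qCF = (pd.DataFrame(labels, columns=['header', 'value'])
                 .set_index('header')
                 .astype('int'))

# Replace the numeric header ids with their text from the df.colheaders lookup table
df_labels_qCF['value'] = df_labels_qCF['value'].replace(df.colheaders['header'])
# Optional: preview only the labels containing 'ncome' (uncomment and run on its own)
# df_labels_qCF[df_labels_qCF['value'].astype('str').str.contains('ncome')].sort_values(by='value')

# (header text, label column) pairs, sorted alphabetically
sorted(zip(df_labels_qCF.values.tolist(), df_labels_qCF.index))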

[(['(Gain) Loss from disco...'], 'label_i3'),
 (['(Gain) Loss from disco...'], 'label_i7'),
 (['(Gains) loss on dispos...'], 'label_i11'),
 (['(Gains) loss on dispos...'], 'label_i12'),
 (['Accounts payable'], 'label_i19'),
 (['Accounts payable'], 'label_i20'),
 (['Accounts receivable'], 'label_i16'),
 (['Accrued liabilities'], 'label_i20'),
 (['Accrued liabilities'], 'label_i21'),
 (['Accrued liabilities'], 'label_i30'),
 (['Acquisitions and dispo...'], 'label_i33'),
 (['Acquisitions and dispo...'], 'label_i36'),
 (['Acquisitions and dispo...'], 'label_i43'),
 (['Acquisitions, net'], 'label_i33'),
 (['Amortization of debt a...'], 'label_i7'),
 (['Amortization of debt d...'], 'label_i10'),
 (['Amortization of debt d...'], 'label_i3'),
 (['Amortization of debt d...'], 'label_i7'),
 (['Capital expenditure'], 'label_i96'),
 (['Capitalization of defe...'], 'label_i24'),
 (['Cash Flows From Financ...'], 'label_s3'),
 (['Cash Flows From Invest...'], 'label_s2'),
 (['Cash Flows From Operat...

<a id="stats"></a>
[return to the top](#top)
## Performing statistical analysis
### Count of database records
**1.** Total number of records **before** merging reference tables *(length of `df.master`)*

In [74]:
print('DataFrame df.master contains {:,.0f} records.'.format(len(df.master)))

DataFrame df.master contains 117,989 records.


**2.** Total number of records **after** merging reference tables *(length of `df_master0`)*

In [75]:
print('DataFrame df_master0 contains {:,.0f} records.'.format(len(df_master0)))

DataFrame df_master0 contains 112,085 records.


**3.** Total number of records **after** merging reference tables where the following filters apply:
- `openprice > 0`
- `lastprice > 0`

In [76]:
print('DataFrame df_master contains {:,.0f} records.'.format(len(df_master)))

DataFrame df_master contains 103,490 records.
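
The price filter behind this third count can be sketched as a boolean mask. This assumes `df_master0` stores the prices in columns literally named `openprice` and `lastprice` (the names come from the conditions above and were not checked against the scraper's schema); the rows are made up.

```python
import pandas as pd

# Toy stand-in for df_master0; the openprice/lastprice column names are assumptions
df_master0 = pd.DataFrame({'ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
                           'openprice': [10.5, 0.0, 3.2, None],
                           'lastprice': [10.7, 1.1, 0.0, 2.0]})

# Keep rows where both prices are strictly positive (comparisons with NaN are False)
mask = (df_master0['openprice'] > 0) & (df_master0['lastprice'] > 0)
df_master = df_master0[mask]
print('DataFrame df_master contains {:,.0f} records.'.format(len(df_master)))
```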


<a id="lastupdate"></a>
[return to the top](#top)
### Last updated dates
List of the dates (as a DataFrame) on which the database records were last updated, newest first.
The `ticker_count` column gives the number of records updated on each date.

In [77]:
(df_master[['updated_date', 'ticker']].groupby(by='updated_date').count().sort_index(ascending=False)
 .rename(columns={'ticker':'ticker_count'}))

Unnamed: 0_level_0,ticker_count
updated_date,Unnamed: 1_level_1
2019-04-12,1
2019-04-11,50341
2019-04-10,40590
2019-04-09,12496
2019-04-08,62
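
`Series.value_counts` produces the same per-date tally in a single call. One difference to keep in mind: `groupby(...).count()` skips rows whose `ticker` is null, whereas `value_counts` counts every row, so the two agree only when no ticker is missing. A sketch on made-up dates:

```python
import pandas as pd

df_master = pd.DataFrame({'updated_date': ['2019-04-11', '2019-04-10', '2019-04-11', '2019-04-12'],
                          'ticker': ['AAA', 'BBB', 'CCC', 'DDD']})

# Number of records per update date, newest date first
updates = (df_master['updated_date'].value_counts()
           .sort_index(ascending=False)
           .rename('ticker_count'))
print(updates)
```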


<a id="type"></a>
[return to the top](#top)
### Number of records by Security Type

In [78]:
(df_master[['security_type', 'ticker']].groupby(by='security_type').count()
 .rename(columns={'ticker':'ticker_count'}))

Unnamed: 0_level_0,ticker_count
security_type,Unnamed: 1_level_1
Closed-End Fund,1209
Exchange-Traded Fund,6207
Index,502
Money Market Fund,194
Open-End Fund,25680
Stock,69698


<a id="country"></a>
[return to the top](#top)
### Number of records by Country, based on the location of exchanges

In [79]:
(df_master[['country', 'country_c3', 'ticker']]
 .groupby(by=['country', 'country_c3']).count().rename(columns={'ticker':'ticker_count'})
)

Unnamed: 0_level_0,Unnamed: 1_level_0,ticker_count
country,country_c3,Unnamed: 2_level_1
Australia,AUS,2147
Belgium,BEL,169
Canada,CAN,4179
China,CHN,3823
France,FRA,1195
Germany,DEU,35433
Hong Kong,HKG,2370
Ireland,IRL,1753
Italy,ITA,630
Luxembourg,LUX,23


<a id="exchange"></a>
[return to the top](#top)
### Number of records per exchange
Only exchanges with more than 100 records (`ticker_count > 100`) are listed.

In [81]:
cols = ['country', 'country_c3', 'exchange', 'exchange_sym', 'ticker']
df_exchanges = df_master[cols].groupby(by=cols[:-1]).count().rename(columns={'ticker':'ticker_count'})
df_exchanges[df_exchanges['ticker_count'] > 100]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,ticker_count
country,country_c3,exchange,exchange_sym,Unnamed: 4_level_1
Australia,AUS,ASX - ALL MARKETS,XASX,2147
Belgium,BEL,EURONEXT - EURONEXT BRUSSELS,XBRU,169
Canada,CAN,CANADIAN NATIONAL STOCK EXCHANGE,XCNQ,419
Canada,CAN,TORONTO STOCK EXCHANGE,XTSE,2017
Canada,CAN,TSX VENTURE EXCHANGE,XTSX,1683
China,CHN,SHANGHAI STOCK EXCHANGE,XSHG,1596
China,CHN,SHENZHEN STOCK EXCHANGE,XSHE,2227
France,FRA,EURONEXT - EURONEXT PARIS,XPAR,1195
Germany,DEU,BOERSE BERLIN,XBER,8140
Germany,DEU,BOERSE DUESSELDORF,XDUS,2162
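
`groupby(...).size()` counts the rows in each group directly and returns a Series, so the `ticker_count > 100` threshold becomes a single boolean filter. A sketch on toy data, with the threshold lowered to 1 so the effect is visible:

```python
import pandas as pd

df_master = pd.DataFrame({'country_c3': ['CAN', 'CAN', 'CAN', 'FRA'],
                          'exchange_sym': ['XTSE', 'XTSE', 'XCNQ', 'XPAR'],
                          'ticker': ['AAA', 'BBB', 'CCC', 'DDD']})

min_count = 1  # the cell above uses 100
# Rows per (country, exchange) group
counts = df_master.groupby(['country_c3', 'exchange_sym']).size().rename('ticker_count')
busy = counts[counts > min_count]
print(busy)
```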



[return to the top](#top)
### Number of Stocks by Country of Exchange

In [82]:
(df_master
 .where(df_master['security_type'] == 'Stock').dropna(axis=0, how='all')[['country', 'country_c3', 'ticker']]
 .groupby(by=['country', 'country_c3']).count().rename(columns={'ticker':'ticker_count'})
 .sort_values(by='ticker_count', ascending=False))

Unnamed: 0_level_0,Unnamed: 1_level_0,ticker_count
country,country_c3,Unnamed: 2_level_1
Germany,DEU,35403
United States,USA,16586
United Kingdom,GBR,4145
China,CHN,3623
Canada,CAN,3365
Hong Kong,HKG,2275
Australia,AUS,1836
Ireland,IRL,839
France,FRA,833
Italy,ITA,451
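
Several cells here select stocks with `where(...).dropna(axis=0, how='all')`. Indexing with the boolean mask selects the same rows but leaves integer columns untouched, whereas `where` first fills the rejected rows with `NaN` and so converts integer ids to floats (which is likely why ids such as `ticker_id` show up as `1.0` in later outputs). A toy comparison:

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

df_master = pd.DataFrame({'ticker_id': [1, 2, 3],
                          'security_type': ['Stock', 'Index', 'Stock'],
                          'country_c3': ['USA', 'USA', 'DEU']})
is_stock = df_master['security_type'] == 'Stock'

via_where = df_master.where(is_stock).dropna(axis=0, how='all')
via_mask = df_master[is_stock]

# Same rows are selected...
print(via_where.index.equals(via_mask.index))
# ...but where() has upcast the integer ids to float to make room for NaN
print(via_where['ticker_id'].dtype, via_mask['ticker_id'].dtype)
```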


<a id="sector"></a>
[return to the top](#top)
### Number of stocks by sector

In [83]:
(df_master
 .where((df_master['security_type'] == 'Stock') & (df_master['sector'] != '—')).dropna(axis=0, how='all')
 .groupby(by='sector').count()
 .rename(columns={'ticker':'stock_count'}))['stock_count'].sort_values(ascending=False)

sector
Basic Materials           11494
Industrials                9830
Technology                 9556
Consumer Cyclical          8783
Financial Services         7898
Healthcare                 7602
Energy                     3825
Consumer Defensive         3710
Real Estate                3600
Utilities                  1931
Communication Services     1468
Name: stock_count, dtype: int64

<a id="industry"></a>
[return to the top](#top)
### Number of stocks by industry

In [84]:
(df_master[['sector', 'industry', 'ticker']]
 .where((df_master['security_type'] == 'Stock') & (df_master['industry'] != '—')).dropna(axis=0, how='all')
 .groupby(by=['sector', 'industry']).count().rename(columns={'ticker':'stock_count'}))

Unnamed: 0_level_0,Unnamed: 1_level_0,stock_count
sector,industry,Unnamed: 2_level_1
Basic Materials,Agricultural Inputs,286
Basic Materials,Aluminum,139
Basic Materials,Building Materials,813
Basic Materials,Chemicals,724
Basic Materials,Coal,382
Basic Materials,Copper,298
Basic Materials,Gold,1930
Basic Materials,Industrial Metals & Minerals,5087
Basic Materials,Lumber & Wood Production,141
Basic Materials,Paper & Paper Products,292


<a id="meanpr"></a>
[return to the top](#top)
### Mean price ratios (P/E, P/S, P/B, P/CF) of stocks by sector

Merge *Master* and *Valuation* DataFrames

In [203]:
df_valuation = (df_master
                .where((df_master['security_type'] == 'Stock') & (df_master['sector'] != '—'))
                .dropna(axis=0, how='all')
                .merge(df_vals, on=['ticker_id', 'exchange_id'])
                .drop(['ticker_id', 'exchange_id'], axis=1))

#### Mean Price Ratios for all stocks:

In [201]:
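# Stock count per sector, joined with the sector means of every numeric column
# (numeric_only=True skips text columns, which recent pandas versions would otherwise reject)
df_valuation_mean = (df_valuation[['sector', 'company']].groupby('sector').count()
                     .rename(columns={'company':'count'})
                     .merge(df_valuation.groupby('sector').mean(numeric_only=True).round(1), on='sector')
                     .drop(['aprvol', 'avevol'], axis=1))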

In [21]:
print('For a total of {:,.0f} stock records:'.format(len(df_valuation)))
df_pratios_TTM = df_valuation_mean[['count', 'Forward_PE', 'PE_TTM', 'PB_TTM', 'PS_TTM', 'PC_TTM']]
df_pratios_TTM

For a total of 69,696 stock records:


Unnamed: 0_level_0,count,Forward_PE,PE_TTM,PB_TTM,PS_TTM,PC_TTM
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Basic Materials,11494,18.1,29.6,495.0,366.2,27.2
Communication Services,1468,31.3,42.8,3.7,5.3,21.0
Consumer Cyclical,8783,25.3,38.7,5.5,31.4,29.1
Consumer Defensive,3710,22.7,36.0,6.0,27.5,47.3
Energy,3825,49.5,35.3,5.0,192.3,12.9
Financial Services,7898,14.6,32.0,41.7,414.6,280.8
Healthcare,7602,44.1,68.2,97.5,927.9,84.4
Industrials,9829,22.5,43.9,13.5,7432.9,33.0
Real Estate,3600,25.0,36.6,6.8,13.1,76.5
Technology,9556,43.5,62.6,8.9,177.8,149.8
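
A few extreme ratios dominate some of these means (for example, a mean TTM P/S of 7432.9 for Industrials). If a more outlier-resistant summary is wanted, `median(numeric_only=True)` can stand in for `mean`. The sketch below uses made-up numbers to show how far the two can diverge:

```python
import pandas as pd

# Made-up ratios: one extreme P/S value in the Industrials group
df_valuation = pd.DataFrame({'sector': ['Industrials'] * 4 + ['Energy'] * 2,
                             'company': list('ABCDEF'),
                             'PS_TTM': [1.0, 2.0, 3.0, 10000.0, 1.5, 2.5]})

means = df_valuation.groupby('sector').mean(numeric_only=True).round(1)
medians = df_valuation.groupby('sector').median(numeric_only=True).round(1)
print(pd.concat({'mean': means['PS_TTM'], 'median': medians['PS_TTM']}, axis=1))
```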


#### Mean Price Ratios for USA stocks:

In [22]:
df_valuation_usa = df_valuation[df_valuation['country_c3'] == 'USA']

In [24]:
# Stock count per sector, joined with the sector means of every numeric column
df_valuation_mean_usa = (df_valuation_usa[['sector', 'company']].groupby('sector').count()
                         .rename(columns={'company':'count'})
                         .merge(df_valuation_usa.groupby('sector').mean(numeric_only=True).round(1), on='sector')
                         .drop(['aprvol', 'avevol'], axis=1))

In [25]:
print('For a total of {:,.0f} stock records:'.format(len(df_valuation_usa)))
df_pratios_TTM_USA = df_valuation_mean_usa[['count', 'Forward_PE', 'PE_TTM', 'PB_TTM', 'PS_TTM', 'PC_TTM']]
df_pratios_TTM_USA

For a total of 16,586 stock records:


Unnamed: 0_level_0,count,Forward_PE,PE_TTM,PB_TTM,PS_TTM,PC_TTM
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Basic Materials,2314,18.0,23.9,2692.3,378.4,33.8
Communication Services,321,35.7,39.2,6.4,11.6,27.8
Consumer Cyclical,1848,24.0,33.9,8.6,124.7,22.2
Consumer Defensive,891,22.1,37.3,13.7,96.7,41.9
Energy,985,26.3,43.0,5.5,172.8,20.9
Financial Services,2504,15.3,29.9,4.9,266.6,35.3
Healthcare,1978,50.8,68.4,377.0,1943.8,36.2
Industrials,2260,26.2,45.7,43.5,54.6,37.0
Real Estate,938,26.9,44.0,2.0,14.0,37.6
Technology,2066,43.3,67.9,15.8,653.3,88.3


#### Mean Price Ratios for DEU (Germany) stocks:

In [27]:
df_valuation_deu = df_valuation[df_valuation['country_c3'] == 'DEU']

In [28]:
# Stock count per sector, joined with the sector means of every numeric column
df_valuation_mean_deu = (df_valuation_deu[['sector', 'company']].groupby('sector').count()
                         .rename(columns={'company':'count'})
                         .merge(df_valuation_deu.groupby('sector').mean(numeric_only=True).round(1), on='sector')
                         .drop(['aprvol', 'avevol'], axis=1))

In [29]:
print('For a total of {:,.0f} stock records:'.format(len(df_valuation_deu)))
df_pratios_TTM_DEU = df_valuation_mean_deu[['count', 'Forward_PE', 'PE_TTM', 'PB_TTM', 'PS_TTM', 'PC_TTM']]
df_pratios_TTM_DEU

For a total of 35,403 stock records:


Unnamed: 0_level_0,count,Forward_PE,PE_TTM,PB_TTM,PS_TTM,PC_TTM
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Basic Materials,5807,18.2,22.8,4.7,149.0,18.5
Communication Services,873,28.9,40.6,3.0,4.0,18.7
Consumer Cyclical,4470,27.9,37.6,4.4,10.1,20.7
Consumer Defensive,1909,23.0,29.1,3.7,8.1,34.9
Energy,2015,52.8,27.7,3.4,168.1,9.5
Financial Services,3567,14.0,31.6,2.1,8.8,53.9
Healthcare,4179,44.2,74.3,8.5,693.0,38.9
Industrials,4896,21.1,40.2,6.4,152.6,22.4
Real Estate,1684,26.3,33.7,1.6,12.5,95.1
Technology,5012,44.7,52.9,6.5,52.7,161.8


#### Plots of TTM P/E by Sector

In [194]:
pe = df_valuation_mean['PE_TTM']
pe_usa = df_valuation_mean_usa['PE_TTM']
pe_deu = df_valuation_mean_deu['PE_TTM']

In [204]:
fig_pe, (ax_pe, ax_pe_usa, ax_pe_deu) = plt.subplots(3, 1, sharex=True, sharey=True, figsize=(8, 5))

In [205]:
# Bars are placed every 3 x-units and labelled with the sector names
x = [i * 3 for i in range(len(pe))]
y = pe
plt.subplots_adjust(bottom=0.4)
plt.xticks(ticks=x, labels=y.index.tolist(), fontsize=9)
plt.axis([-3, len(x)*3, 0, 80])
plt.suptitle('Average TTM P/E of Stocks by Sector', fontsize=11, fontweight='bold')
plt.yticks([])
ax_pe.set_title('All Stocks', loc='left', fontsize=9)
ax_pe_usa.set_title('USA', loc='left', fontsize=9)
ax_pe_deu.set_title('DEU', loc='left', fontsize=9)
fig_pe.subplots_adjust(hspace=1)

*All Stocks*

In [206]:
x = [i * 3 for i in range(len(pe))]
y = pe
bars = ax_pe.bar(x, y, width=2)
# Print each bar's value above it (bars are 2 wide, so get_x() + 1 is the centre)
for bar in bars:
    ax_pe.text(bar.get_x() + 1, bar.get_height() + 1.5, '{:.1f}'.format(bar.get_height()),
               color='black', ha='center', fontsize=9)
# Hide the left, right and top frame lines so only the x-axis remains
for side in ('left', 'right', 'top'):
    ax_pe.spines[side].set_visible(False)

*USA*

In [207]:
x = [i * 3 for i in range(len(pe_usa))]
y = pe_usa
bars = ax_pe_usa.bar(x, y, width=2)
# Print each bar's value above it (bars are 2 wide, so get_x() + 1 is the centre)
for bar in bars:
    ax_pe_usa.text(bar.get_x() + 1, bar.get_height() + 1.5, '{:.1f}'.format(bar.get_height()),
                   color='black', ha='center', fontsize=9)
# Hide the left, right and top frame lines so only the x-axis remains
for side in ('left', 'right', 'top'):
    ax_pe_usa.spines[side].set_visible(False)

*DEU*

In [208]:
x = [i * 3 for i in range(len(pe_deu))]
y = pe_deu
bars = ax_pe_deu.bar(x, y, width=2)
# Print each bar's value above it (bars are 2 wide, so get_x() + 1 is the centre)
for bar in bars:
    ax_pe_deu.text(bar.get_x() + 1, bar.get_height() + 1.5, '{:.1f}'.format(bar.get_height()),
                   color='black', ha='center', fontsize=9)
# Hide the left, right and top frame lines so only the x-axis remains
for side in ('left', 'right', 'top'):
    ax_pe_deu.spines[side].set_visible(False)
# Rotate the sector names on the bottom subplot so they do not overlap
for tick in ax_pe_deu.xaxis.get_ticklabels():
    tick.set_rotation(90)
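
The per-bar `ax.text` loops above can also be written with `Axes.bar_label` (available from matplotlib 3.4), which places a formatted value on every bar of a bar container, while `ax.spines[side]` addresses each frame line by name. A hedged sketch: `plot_sector_pe` is a hypothetical helper, the three P/E values are taken from the *All Stocks* table above, and the `Agg` backend is selected only so the sketch also runs outside Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_sector_pe(ax, pe, title):
    """Hypothetical helper: one labelled bar per sector, frame reduced to the x-axis."""
    x = [i * 3 for i in range(len(pe))]
    bars = ax.bar(x, pe.values, width=2)
    # bar_label writes each bar's height above it, formatted with one decimal
    labels = ax.bar_label(bars, fmt='%.1f', fontsize=9, padding=2)
    ax.set_xticks(x)
    ax.set_xticklabels(pe.index, rotation=90, fontsize=9)
    ax.set_yticks([])
    ax.set_title(title, loc='left', fontsize=9)
    for side in ('left', 'right', 'top'):
        ax.spines[side].set_visible(False)
    return labels


pe = pd.Series([29.6, 42.8, 38.7],
               index=['Basic Materials', 'Communication Services', 'Consumer Cyclical'])
fig, ax = plt.subplots(figsize=(8, 3))
labels = plot_sector_pe(ax, pe, 'All Stocks')
```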

### Stocks in the Cannabis Industry
Uses the stocks listed under North America on [marijuanaindex.com](http://marijuanaindex.com/stock-quotes/north-american-marijuana-index/).

In [73]:
import json

# Ticker symbols and country codes taken from marijuanaindex.com
with open('input/pot_stocks.json') as file:
    pot_symbols = json.load(file)

pot_stocks = (pd.DataFrame(pot_symbols, columns=['ticker', 'country_c3'])
               .merge(df_master, how='left', on=['ticker', 'country_c3']).drop('country', axis=1)
               .rename(columns={'country_c3':'country', 'exchange_sym':'exch'}))

pot_stocks = (pot_stocks.where(((pot_stocks['country'] == 'USA') | 
                                (pot_stocks['country'] == 'CAN')) &
                               (pot_stocks['sector'] != '—'))
              .dropna(axis=0, how='all').sort_values(by='company'))

In [74]:
msg = 'Below are the {} stocks listed on marijuanaindex.com for North America.'
print(msg.format(len(pot_stocks['company'].unique())))

pot_stocks[['country', 'ticker', 'exch', 'company', 'sector', 'industry']]

Below are the 46 stocks listed on marijuanaindex.com for North America.


Unnamed: 0,country,ticker,exch,company,sector,industry
29,CAN,TGIF,XCNQ,1933 Industries Inc,Healthcare,Drug Manufacturers - Specialty & Generic
1,USA,ACRGF,PINX,Acreage Holdings Inc Ordinary Shares (Sub Voting),Healthcare,Drug Manufacturers - Specialty & Generic
2,CAN,ACRG.U,XCNQ,Acreage Holdings Inc Ordinary Shares (Sub Voting),Healthcare,Drug Manufacturers - Specialty & Generic
37,USA,APHA,XNYS,Aphria Inc,Healthcare,Drug Manufacturers - Specialty & Generic
0,CAN,ACB,XTSE,Aurora Cannabis Inc,Healthcare,Drug Manufacturers - Specialty & Generic
36,CAN,XLY,XTSX,Auxly Cannabis Group Inc,Healthcare,Drug Manufacturers - Specialty & Generic
40,USA,CVSI,PINX,CV Sciences Inc,Healthcare,Drug Manufacturers - Specialty & Generic
32,CAN,TRST,XTSE,CannTrust Holdings Inc,Healthcare,Drug Manufacturers - Specialty & Generic
4,CAN,CNNX,XCNQ,Cannex Capital Holdings Inc,Healthcare,Drug Manufacturers - Specialty & Generic
38,USA,CGC,XNYS,Canopy Growth Corp,Healthcare,Drug Manufacturers - Specialty & Generic
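
Symbols from the JSON file that have no match in `df_master` still come through the left merge, only with empty company fields. Passing `indicator=True` to `merge` adds a `_merge` column that flags those unmatched symbols. A sketch assuming the JSON holds `[ticker, country code]` pairs; both inputs are hypothetical stand-ins with made-up rows:

```python
import pandas as pd

# Hypothetical stand-ins for the JSON ticker list and df_master
pot_symbols = [['ACB', 'CAN'], ['CGC', 'USA'], ['ZZZZ', 'USA']]
df_master = pd.DataFrame({'ticker': ['ACB', 'CGC'],
                          'country_c3': ['CAN', 'USA'],
                          'company': ['Aurora Cannabis Inc', 'Canopy Growth Corp']})

merged = pd.DataFrame(pot_symbols, columns=['ticker', 'country_c3']).merge(
    df_master, how='left', on=['ticker', 'country_c3'], indicator=True)

# '_merge' is 'left_only' for symbols that are missing from df_master
unmatched = merged.loc[merged['_merge'] == 'left_only', 'ticker'].tolist()
print(unmatched)
```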


<a id="value"></a>
[return to the top](#top)

## Applying various criteria to filter common stocks

Below is a list of different rules that can be applied to the data to screen stocks (development of *italicized rules* is still in progress)

- *[Rule 1](#rule1): No earnings deficit (loss) for past 5 or 7 years*
- **[Rule 2](#rule2): Uninterrupted and increasing Dividends for past 7 yrs**
- **[Rule 3](#rule3): P/E Ratio of 25 or less for the past 7 yrs and less than 20 for TTM**
- *[Rule 4](#rule4): P/B Ratio of 1.2 or less for TTM*
- *[Rule 5](#rule5): Filter for "bargain issues"*
- *[Rule 6](#rule6): Operating Cash Flow growth for the past 7 yrs*
- *[Rule 7](#rule7): Owner earnings growth rate > 6% over past 7 years*
- *[Rule 8](#rule8): Long-term debt < 50% of total capital*
- *[Rule 9](#rule9): NAV per share > Stock Price*
- *[Rule 10](#rule10): Growth Stocks as described by Benjamin Graham in _The Intelligent Investor_*
- *[Rule 11](#rule11): Positive ratio of earnings to fixed charge*
- *[Rule 12](#rule12): CAN SLIM*
- *[Rule 13](#rule13):*
- *[Rule 14](#rule14):*
- *[Rule 15](#rule15):*

*[Merge DataFrames](#mergerules) created for different rules to screen stocks.*
<a id="rule1"></a>

### Rule 1. No earnings deficit (loss) for past 5 or 7 years
Criteria: Find companies with positive earnings per share growth during the past five years with no earnings deficits. Earnings need to be higher in the most recent year than five years ago. Avoiding companies with earnings deficits during the past five years will help you stay clear of high-risk companies. [(Source)](https://cabotwealth.com/daily/value-investing/benjamin-grahams-value-stock-criteria/)

#### 5 Years:
*a. Identify Net Income column labels in* `df_annualIS`

In [105]:
data = 'Net income'
df_labels = df_labels_aIS[df_labels_aIS['value'] == data].sort_values(by='value')
df_labels

Unnamed: 0_level_0,value
header,Unnamed: 1_level_1
label_i50,Net income
label_i70,Net income
label_i80,Net income


*b. Get column headers for 'Net income' values for the past 5 yrs*

In [106]:
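# Id fragments such as '_i50_' for each 'Net income' label (works for ids of any length)
i_ids = ['_' + label.split('_', 1)[1] + '_' for label in df_labels.index]

def get_icols(col):
    """Return True if the column holds values for one of the 'Net income' ids."""
    return any(i_id in col for i_id in i_ids)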

main_cols1 = ['ticker_id', 'exchange_id', 
             'country', 'exchange_sym', 'ticker', 'company', 
             'sector', 'industry', 'stock_type', 'style', 
             'Year_Y_6', 'Year_Y_5', 'Year_Y_4', 'Year_Y_3', 'Year_Y_2', 'Year_Y_1']
data_cols = sorted(list(filter(get_icols, df_annualIS.columns)), key=lambda r: (r[-1], r[5:8]), reverse=True)
print('The following columns contain \'{}\' values:\n{}'.format(data, data_cols))

The following columns contain 'Net income' values:
['data_i80_Y_6', 'data_i70_Y_6', 'data_i50_Y_6', 'data_i80_Y_5', 'data_i70_Y_5', 'data_i50_Y_5', 'data_i80_Y_4', 'data_i70_Y_4', 'data_i50_Y_4', 'data_i80_Y_3', 'data_i70_Y_3', 'data_i50_Y_3', 'data_i80_Y_2', 'data_i70_Y_2', 'data_i50_Y_2', 'data_i80_Y_1', 'data_i70_Y_1', 'data_i50_Y_1']


*c. Create 'Net Income' DataFrame*

In [107]:
df_annualIS['Year_Y_5'] = pd.to_datetime(df_annualIS['Year_Y_5'])

In [126]:
df_netinc5 = (df_annualIS
              .where((df_annualIS['security_type'] == 'Stock') & 
                     (df_annualIS['Year_Y_5'] >= pd.to_datetime('2018-01')))
              .dropna(axis=0, how='all')
              .drop(['country'], axis=1)
              .rename(columns={'country_c3':'country'})
             )[main_cols1 + data_cols]

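# data_cols holds 3 candidate 'Net income' columns per year
np_netinc = df_netinc5[data_cols].values
# (new column name, positions of that year's 3 candidates), e.g. ('Net_Income_Y6', (0, 1, 2))
netinc_cols = [('Net_Income_Y' + data_cols[i * 3][-1], (i * 3, i * 3 + 1, i * 3 + 2))
               for i in range(int(len(data_cols)/3))]

# For every row and year, keep the first non-NaN candidate value
vals = []
for row in np_netinc:
    row_vals = []
    for i in range(len(netinc_cols)):
        val = None
        for col in netinc_cols[i][1]:
            if not np.isnan(row[col]):
                val = row[col]
                break
        row_vals.append(val)
    vals.append(row_vals)

# Reuse df_netinc5's row index so that join() matches each value to its own row
df_netinc_vals = pd.DataFrame(vals, columns=list(zip(*netinc_cols))[0], index=df_netinc5.index)
df_netinc5 = df_netinc5[main_cols1].join(df_netinc_vals)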
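
The row-by-row loop above picks, for every year, the first non-null value among the candidate *Net income* columns. The same result can be obtained column-wise with `bfill(axis=1)`, which also keeps the original row index. A sketch on a toy frame: the column layout mirrors `data_cols` (three candidate ids per year, newest id first), but the values and `n_ids` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data_i80_Y_2': [np.nan, 5.0], 'data_i70_Y_2': [7.0, np.nan], 'data_i50_Y_2': [1.0, 2.0],
                   'data_i80_Y_1': [np.nan, np.nan], 'data_i70_Y_1': [np.nan, 3.0], 'data_i50_Y_1': [4.0, np.nan]},
                  index=[4, 9])  # non-contiguous index, as after dropna()
data_cols = list(df.columns)
n_ids = 3  # candidate 'Net income' columns per year

net_income = {}
for start in range(0, len(data_cols), n_ids):
    year_cols = data_cols[start:start + n_ids]
    # bfill(axis=1) pulls the first non-null value of the row into the first column
    net_income['Net_Income_Y' + year_cols[0][-1]] = df[year_cols].bfill(axis=1).iloc[:, 0]

df_net_income = pd.DataFrame(net_income)
print(df_net_income)
```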

In [127]:
df_rule1_5 = df_netinc5.where((df_netinc5['Net_Income_Y6'] > 0) & 
                            ((df_netinc5['Net_Income_Y5'] > 0) | (df_netinc5['Net_Income_Y5'].isna())) & 
                            ((df_netinc5['Net_Income_Y4'] > 0) | (df_netinc5['Net_Income_Y4'].isna())) & 
                            ((df_netinc5['Net_Income_Y3'] > 0) | (df_netinc5['Net_Income_Y3'].isna())) & 
                            ((df_netinc5['Net_Income_Y2'] > 0) | (df_netinc5['Net_Income_Y2'].isna())) & 
                            ((df_netinc5['Net_Income_Y1'] > 0) | (df_netinc5['Net_Income_Y1'].isna()))
                           ).dropna(axis=0, how='all')

In [128]:
df_rule1_5

Unnamed: 0,ticker_id,exchange_id,country,exchange_sym,ticker,company,sector,industry,stock_type,style,...,Year_Y_4,Year_Y_3,Year_Y_2,Year_Y_1,Net_Income_Y6,Net_Income_Y5,Net_Income_Y4,Net_Income_Y3,Net_Income_Y2,Net_Income_Y1
0,1.0,374.0,USA,ARCX,OGCP,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,1.164020e+08,1.164020e+08,1.182530e+08,1.072500e+08,7.992800e+07,7.021000e+07
1,2.0,374.0,USA,ARCX,FISK,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,1.164020e+08,1.164020e+08,1.182530e+08,1.072500e+08,7.992800e+07,7.021000e+07
2,3.0,374.0,USA,ARCX,ESBA,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,1.164020e+08,1.164020e+08,1.182530e+08,1.072500e+08,7.992800e+07,7.021000e+07
4,19275.0,302.0,USA,XNYS,LPT,Liberty Property Trust,Real Estate,REIT - Office,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,4.796070e+08,4.796070e+08,2.823400e+08,3.568170e+08,2.380390e+08,2.179100e+08
5,19727.0,302.0,USA,XNYS,VNO,Vornado Realty Trust,Real Estate,REIT - Office,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2011-12,4.499540e+08,4.499540e+08,2.274160e+08,9.069170e+08,7.604340e+08,6.623020e+08
7,19849.0,302.0,USA,XNYS,DEI,Douglas Emmett Inc,Real Estate,REIT - Office,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,1.160860e+08,1.160860e+08,9.444300e+07,8.539700e+07,5.838400e+07,4.462100e+07
8,19335.0,302.0,USA,XNYS,MAA,Mid-America Apartment Communities Inc,Real Estate,REIT - Residential,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,2.228990e+08,2.228990e+08,3.283790e+08,2.122220e+08,3.322870e+08,1.479800e+08
9,19694.0,302.0,USA,XNYS,UDR,UDR Inc,Real Estate,REIT - Residential,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,2.031060e+08,2.031060e+08,1.215580e+08,2.927180e+08,3.403830e+08,1.543340e+08
10,19021.0,302.0,USA,XNYS,EGP,EastGroup Properties Inc,Real Estate,REIT - Industrial,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,8.850600e+07,8.850600e+07,8.318300e+07,9.550900e+07,4.786600e+07,4.794100e+07
11,19575.0,302.0,USA,XNYS,LSI,Life Storage Inc,Real Estate,REIT - Industrial,Hard Asset,Mid Core,...,2017-12,2016-12,2015-12,2014-12,2.065900e+08,2.065900e+08,9.636500e+07,8.522500e+07,1.125240e+08,8.853100e+07
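
The chained conditions in the *5 Years* filter can be written more compactly: require a positive most recent value and, for every earlier year, accept either a positive value or a missing one. The quoted criterion also asks for earnings that are higher in the most recent year than five years ago, which the cells above do not check. The sketch adds that comparison as an assumption, treating `Net_Income_Y6` as the most recent value and `Net_Income_Y1` as the oldest. The values are made up.

```python
import numpy as np
import pandas as pd

ni_cols = ['Net_Income_Y6', 'Net_Income_Y5', 'Net_Income_Y4',
           'Net_Income_Y3', 'Net_Income_Y2', 'Net_Income_Y1']
df_netinc5 = pd.DataFrame([[120.0, 110.0, np.nan, 90.0, 80.0, 70.0],  # passes
                           [120.0, 110.0, -5.0, 90.0, 80.0, 70.0],    # one loss year
                           [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]],    # positive but shrinking
                          columns=ni_cols, index=['AAA', 'BBB', 'CCC'])

latest, earlier = df_netinc5[ni_cols[0]], df_netinc5[ni_cols[1:]]
# Positive latest value; earlier years must be positive or missing
no_deficit = latest.gt(0) & (earlier.gt(0) | earlier.isna()).all(axis=1)
# Assumed extra condition from the quoted criterion: latest value above the oldest one
growing = latest > df_netinc5['Net_Income_Y1']

print(df_netinc5[no_deficit].index.tolist())
print(df_netinc5[no_deficit & growing].index.tolist())
```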


#### 7 Years:

In [112]:
cols = ['country_c3', 'exchange_sym', 'ticker', 'company', 'sector', 'industry'] + \
        [col for col in df_keyratios.columns if col.startswith('i4_') or col.startswith('Y')]

df_rule1_7 = (df_keyratios
              .where((df_keyratios['security_type'] == 'Stock') & 
                     (df_keyratios['Y9'] >= pd.to_datetime('2018-01')) & (df_keyratios['i4_Y10'] > 0) & 
                     ((df_keyratios['i4_Y9'] > 0) | (df_keyratios['i4_Y9'].isna())) & 
                     ((df_keyratios['i4_Y8'] > 0) | (df_keyratios['i4_Y8'].isna())) & 
                     ((df_keyratios['i4_Y7'] > 0) | (df_keyratios['i4_Y7'].isna())) & 
                     ((df_keyratios['i4_Y6'] > 0) | (df_keyratios['i4_Y6'].isna())) & 
                     ((df_keyratios['i4_Y5'] > 0) | (df_keyratios['i4_Y5'].isna())) & 
                     ((df_keyratios['i4_Y4'] > 0) | (df_keyratios['i4_Y4'].isna())) & 
                     ((df_keyratios['i4_Y3'] > 0) | (df_keyratios['i4_Y3'].isna())))
              .dropna(axis=0, how='all'))[cols]

In [113]:
df_rule1_7

Unnamed: 0,country_c3,exchange_sym,ticker,company,sector,industry,i4_Y0,i4_Y1,i4_Y2,i4_Y3,...,Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8,Y9,Y10
0,USA,ARCX,OGCP,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,,,57.0,49.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
1,USA,ARCX,FISK,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,,,57.0,49.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
2,USA,ARCX,ESBA,Empire State Realty OP LP Operating Partnershi...,Real Estate,REIT - Diversified,,,57.0,49.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
4,USA,XNYS,LPT,Liberty Property Trust,Real Estate,REIT - Office,56.0,128.0,184.0,137.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
5,USA,XNYS,VNO,Vornado Realty Trust,Real Estate,REIT - Office,721.0,636.0,395.0,106.0,...,2007-12-01,2008-12-01,2009-12-01,2010-12-01,2011-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
7,USA,XNYS,DEI,Douglas Emmett Inc,Real Estate,REIT - Office,-27.0,-26.0,1.0,23.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
8,USA,XNYS,MAA,Mid-America Apartment Communities Inc,Real Estate,REIT - Residential,37.0,30.0,49.0,105.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
9,USA,XNYS,UDR,UDR Inc,Real Estate,REIT - Residential,-88.0,-103.0,20.0,212.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
10,USA,XNYS,EGP,EastGroup Properties Inc,Real Estate,REIT - Industrial,27.0,18.0,22.0,32.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM
11,USA,XNYS,LSI,Life Storage Inc,Real Estate,REIT - Industrial,20.0,41.0,31.0,55.0,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,TTM


[return to top of this section](#value),
[return to the top](#top)
<a id="rule2"></a>
### Rule 2. Uninterrupted and increasing *Dividends* for past 7 yrs

**a. Identify *Dividends* column label in** `df_keyratios`

In [114]:
icol = df_labels_kratios[df_labels_kratios == 'Dividends'].index[0]
icol

'i6'

**b. Get column headers for *Dividends* for TTM and the past 7 yrs**

In [115]:
main_cols2 = ['ticker_id', 'exchange_id', 
             #'country_c3', 'exchange_sym', 'ticker', 'company', 
             #'sector', 'industry', 'stock_type', 'style', 
             'Y10', 'Y9', 'Y8', 'Y7', 'Y6', 'Y5']
icols = sorted([col for col in df_keyratios.columns if icol + '_' in col], 
               key=lambda col: int(col[4:]), reverse=True)[:8]
icols

['i6_Y10', 'i6_Y9', 'i6_Y8', 'i6_Y7', 'i6_Y6', 'i6_Y5', 'i6_Y4', 'i6_Y3']

**c. Create *Dividends* DataFrame**

In [116]:
df_rule2 = (df_keyratios
            .where((df_keyratios['security_type'] == 'Stock') & 
                   (df_keyratios['Y9'] >= pd.to_datetime('2018-01')) & 
                   (df_keyratios['i6_Y10'].notna()) & (df_keyratios['i6_Y9'].notna()) &
                   (df_keyratios['i6_Y8'].notna()) & (df_keyratios['i6_Y7'].notna()) &
                   (df_keyratios['i6_Y6'].notna()) & (df_keyratios['i6_Y5'].notna()) & 
                   (df_keyratios['i6_Y4'].notna()) & (df_keyratios['i6_Y3'].notna()) & 
                   (df_keyratios['i6_Y10'] >= df_keyratios['i6_Y9']) & 
                   (df_keyratios['i6_Y9'] >= df_keyratios['i6_Y8']) & 
                   (df_keyratios['i6_Y8'] >= df_keyratios['i6_Y7']) & 
                   (df_keyratios['i6_Y7'] >= df_keyratios['i6_Y6']) & 
                   (df_keyratios['i6_Y6'] >= df_keyratios['i6_Y5']) & 
                   (df_keyratios['i6_Y5'] >= df_keyratios['i6_Y4']) & 
                   (df_keyratios['i6_Y4'] >= df_keyratios['i6_Y3']))
            .dropna(axis=0, how='all').sort_values(by='Y9', ascending=False))[main_cols2 + icols]

df_rule2.columns = main_cols2 + [col.replace('i6', 'Dividend') for col in icols]

In [117]:
df_rule2

,ticker_id,exchange_id,Y10,Y9,Y8,Y7,Y6,Y5,Dividend_Y10,Dividend_Y9,Dividend_Y8,Dividend_Y7,Dividend_Y6,Dividend_Y5,Dividend_Y4,Dividend_Y3
30977,19768.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.72,1.72,1.56,1.48,1.40,1.32,1.24,0.88
31125,19325.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.72,0.72,0.72,0.72,0.72,0.72,0.72,0.72
31114,19143.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.90,0.90,0.90,0.90,0.90,0.90,0.80,0.80
31115,19428.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.36,1.36,1.08,1.08,1.00,0.84,0.72,0.60
31116,16616.0,44.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.18
31118,16890.0,44.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.90,0.90,0.64,0.54,0.47,0.40,0.34,0.28
31121,19104.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.52,1.52,1.52,1.48,1.44,1.32,1.10,0.80
31122,18997.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.40,0.40,0.34,0.28,0.26,0.24,0.22,0.20
31128,19393.0,302.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,1.48,1.48,1.48,1.48,1.48,1.32,1.20,1.08
55497,700189.0,16.0,TTM,2019-01-01,2018-01-01,2017-01-01,2016-01-01,2015-01-01,0.54,0.54,0.50,0.46,0.40,0.38,0.38,0.32
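
The chained comparisons in In [116] can also be built in a loop over consecutive columns. The sketch below uses a small made-up DataFrame (tickers and values are illustrative only) with the same `i6_Y10 … i6_Y3` column names. Because any comparison involving `NaN` evaluates to `False`, the `>=` checks already reject rows with a missing year, so the separate `notna()` tests are redundant. Note that `>=` also accepts flat payouts, such as the constant 0.72 row above; the extra `some_increase` test is one possible stricter reading of "increasing", not part of the original rule.

```python
import numpy as np
import pandas as pd

# Illustrative data only: dividend columns ordered newest (TTM) to oldest.
div_cols = ['i6_Y10', 'i6_Y9', 'i6_Y8', 'i6_Y7', 'i6_Y6', 'i6_Y5', 'i6_Y4', 'i6_Y3']
df = pd.DataFrame(
    [[1.72, 1.72, 1.56, 1.48, 1.40, 1.32, 1.24, 0.88],    # rising
     [0.72, 0.72, 0.72, 0.72, 0.72, 0.72, 0.72, 0.72],    # flat
     [0.90, 0.90, 0.80, 0.80, np.nan, 0.70, 0.70, 0.60],  # one missing year
     [0.50, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60]],   # cut in the latest year
    index=['A', 'B', 'C', 'D'], columns=div_cols)

# Non-decreasing: each column must be >= the next (older) one.
# NaN comparisons are False, so a missing year fails automatically.
non_decreasing = pd.Series(True, index=df.index)
for newer, older in zip(div_cols[:-1], div_cols[1:]):
    non_decreasing &= df[newer] >= df[older]

# For a non-decreasing row, newest > oldest means at least one year rose.
some_increase = df[div_cols[0]] > df[div_cols[-1]]

print(non_decreasing.tolist())                    # [True, True, False, False]
print((non_decreasing & some_increase).tolist())  # [True, False, False, False]
```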


[return to top of this section](#value),
[return to the top](#top)
<a id="rule3"></a>
### Rule 3. P/E Ratio of 25 or less for the past 7 yrs and 20 or less for TTM

In [134]:
pe_cols = [col for col in df_vals.columns if 'PE_' in col]
pe_cols = ['ticker_id', 'exchange_id'] + pe_cols[::-1][:8]  # newest first: TTM + past 7 yrs
pe_cols

['ticker_id',
 'exchange_id',
 'PE_TTM',
 'PE_2018',
 'PE_2017',
 'PE_2016',
 'PE_2015',
 'PE_2014',
 'PE_2013',
 'PE_2012']

In [136]:
df_rule3 = (df_vals[pe_cols]
            .where((df_vals['PE_TTM'] <= 20) & (df_vals['PE_2018'] <= 25) &
                   (df_vals['PE_2017'] <= 25) & (df_vals['PE_2016'] <= 25) &
                   (df_vals['PE_2015'] <= 25) & (df_vals['PE_2014'] <= 25) &
                   (df_vals['PE_2013'] <= 25) & (df_vals['PE_2012'] <= 25)).dropna(axis=0, how='all'))

In [137]:
df_rule3

,ticker_id,exchange_id,PE_TTM,PE_2018,PE_2017,PE_2016,PE_2015,PE_2014,PE_2013,PE_2012
643,644.0,1.0,14.3,15.0,15.2,15.1,13.6,17.7,11.8,10.4
700,704.0,1.0,6.4,5.4,9.0,15.5,14.5,14.7,17.1,16.2
878,885.0,1.0,15.9,16.2,20.4,21.4,18.3,15.2,14.1,14.5
888,895.0,1.0,7.5,7.3,8.2,7.2,11.7,14.9,24.3,17.8
946,954.0,1.0,5.9,4.4,14.3,9.1,3.5,6.4,4.5,5.2
1054,1063.0,17.0,5.7,4.3,9.4,11.0,6.6,6.4,6.0,12.4
1055,1063.0,141.0,5.5,4.2,8.8,10.6,6.3,6.9,6.7,13.6
1080,1087.0,1.0,4.3,4.1,7.4,5.8,4.7,5.0,6.4,6.0
1144,1152.0,1.0,11.1,12.2,12.9,13.3,18.9,8.5,4.4,4.7
1149,1158.0,1.0,15.1,10.9,16.5,15.9,16.1,15.3,13.6,8.0
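
The seven `<= 25` comparisons in In [136] can be collapsed with `DataFrame.le(...).all(axis=1)`. The sketch below also uses plain boolean indexing (`.loc[mask]`) instead of `.where(...).dropna(...)`: `.where` fills non-matching rows with `NaN`, which upcasts integer columns such as `ticker_id` to float, whereas boolean indexing keeps the original dtypes. The small frame is made up for illustration.

```python
import pandas as pd

# Illustrative P/E history (newest to oldest); values are made up.
df_vals_demo = pd.DataFrame({
    'ticker_id':   [1, 2, 3],
    'exchange_id': [1, 1, 17],
    'PE_TTM':  [14.3, 22.0, 5.7],
    'PE_2018': [15.0, 18.0, 4.3],
    'PE_2017': [15.2, 19.5, 30.1],
})

hist_cols = ['PE_2018', 'PE_2017']  # in the notebook: PE_2018 ... PE_2012

# TTM P/E <= 20 and every historical P/E <= 25 (NaN compares as False).
mask = df_vals_demo['PE_TTM'].le(20) & df_vals_demo[hist_cols].le(25).all(axis=1)

passed = df_vals_demo.loc[mask]                                # keeps int dtypes
via_where = df_vals_demo.where(mask).dropna(axis=0, how='all')  # upcasts to float

print(passed['ticker_id'].tolist())  # [1]
print(passed['ticker_id'].dtype.kind, via_where['ticker_id'].dtype.kind)  # i f
```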


[return to top of this section](#value),
[return to the top](#top)
<a id="rule4"></a>
### Rule 4. P/B Ratio of 1 or less for TTM

In [138]:
pb_cols = [col for col in df_vals.columns if 'PB_' in col]
pb_cols = pb_cols[::-1][:6]  # newest first: TTM + past 5 yrs
pb_cols

['PB_TTM', 'PB_2018', 'PB_2017', 'PB_2016', 'PB_2015', 'PB_2014']

In [139]:
df_rule4 = (df_vals[['ticker_id', 'exchange_id'] + pb_cols]
            .where(df_vals['PB_TTM'] <= 1).dropna(axis=0))

In [140]:
df_rule4

,ticker_id,exchange_id,PB_TTM,PB_2018,PB_2017,PB_2016,PB_2015,PB_2014
144,139.0,1.0,0.7,0.6,0.7,0.2,0.6,0.8
148,143.0,1.0,0.1,0.1,0.3,0.6,0.2,0.5
308,302.0,1.0,0.9,1.0,3.6,-14.1,0.9,0.1
340,335.0,1.0,0.8,0.6,1.8,0.3,0.2,0.8
366,361.0,1.0,0.5,0.5,1.2,0.9,2.1,8.5
400,396.0,1.0,0.6,0.6,1.4,2.8,5.1,4.5
513,511.0,1.0,1.0,0.9,1.9,2.1,1.0,1.0
601,602.0,1.0,0.3,0.4,0.9,2.4,0.2,0.2
672,675.0,1.0,0.8,0.6,1.5,2.1,1.7,3.1
686,689.0,1.0,0.2,0.1,0.8,0.9,0.7,1.0


[return to top of this section](#value),
[return to the top](#top)
<a id="rule5"></a>
### Rule 5. Filtering for *bargain issues* as described by Benjamin Graham in *The Intelligent Investor*

In his book, Graham defines bargain issues as common stocks that sell for less than two-thirds of their net current asset value per share (NCAVPS):

- $stock\ price < \frac{2}{3}\ NCAVPS, \quad NCAVPS = \frac{NCAV}{shares\ outstanding}$

- $net\ current\ asset\ value\ (NCAV) = current\ assets - total\ liabilities - preferred\ stock$

In [141]:
# df_master columns
cols1 = ['ticker_id', 'exchange_id', 'country_c3', 'exchange_sym', 'ticker', 'company',
       'industry', 'sector', 'security_type', 'lastprice']

# df_quarterlyBS columns
cols2 = ['ticker_id', 'exchange_id', 'Year_Y_5', 
         'label_i81', 'data_i81_Y_5', # Preferred stock
         #'label_g1', 'data_g1_Y_5', # Assets / Current assets
         'label_s1', 'data_s1_Y_5', # Assets /
         #'label_ttg1', 'data_ttg1_Y_5', # Total assets / Total current assets
         #'label_tts1', 'data_tts1_Y_5', # Total assets
         #'label_gg5', 'data_gg5_Y_5', # Current liabilities
         #'label_ttgg5', 'data_ttgg5_Y_5', # Total current liabilit...
         'label_g5', 'data_g5_Y_5', # Liabilities
         #'label_ttg5', 'data_ttg5_Y_5' # Total liabilities
         'currency_id'
        ]

# df_finhealth columns
cols3 = ['ticker_id', 'exchange_id', 
        #'i53_fh_Y5', # Total Assets
        'i49_fh_Y10',  # Total Current Assets (% of total assets)
        #'i62_fh_Y5', # Total Assets
        #'i59_fh_Y5' # Total Current Liabilities
       ]

# df_quarterlyIS columns
cols4 = ['ticker_id', 'exchange_id', 
         'data_i85_Y_5' # Basic Outstanding Shares
        ]


df_bs0 = (df_master[cols1]
          .where(df_master['security_type'] == 'Stock')
          .dropna(axis=0, how='all')  # drop the all-NaN rows left by .where()
          .merge(df_quarterlyBS[cols2], on=['ticker_id', 'exchange_id'])
          .merge(df_finhealth[cols3], on=['ticker_id', 'exchange_id'])
          .merge(df_quarterlyIS[cols4], on=['ticker_id', 'exchange_id'])
          #.rename(columns={'data_g1_Y_5':'current_assets'})
        )

# Map Column Headers
for col in [col for col in df_bs0.columns if 'label_' in col]:
    df_bs0[col] = df_bs0[col].map(df.colheaders['header'])
    
# Map Currency Name
df_bs0['currency'] = df_bs0['currency_id'].map(df.currencies.set_index('id')['currency'])

In [142]:
df_bs = (df_bs0.where((df_bs0['label_i81'] == 'Preferred stock') & 
                     ((df_bs0['label_s1'] == 'Assets') & (df_bs0['data_s1_Y_5'].notna())) & 
                     ((df_bs0['label_g5'] == 'Liabilities') & (df_bs0['data_g5_Y_5'].notna()))
                    ).dropna(axis=0, how='all'))

In [143]:
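# Estimate current assets: total assets x current-asset share (i49 is a percentage)
df_bs['current_assets'] = df_bs['data_s1_Y_5'] * df_bs['i49_fh_Y10'] / 100

# Treat a missing preferred-stock value as zero
df_bs['data_i81_Y_5'] = df_bs['data_i81_Y_5'].fillna(0)

# NCAV = current assets - total liabilities - preferred stock
df_bs['NCAV'] = df_bs['current_assets'] - df_bs['data_g5_Y_5'] - df_bs['data_i81_Y_5']

# NCAV per share, using basic outstanding shares
df_bs['NCAVPS'] = df_bs['NCAV'] / df_bs['data_i85_Y_5']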

In [144]:
df_bs[df_bs['NCAV'] > 0]

,ticker_id,exchange_id,country_c3,exchange_sym,ticker,company,industry,sector,security_type,lastprice,...,data_s1_Y_5,label_g5,data_g5_Y_5,currency_id,i49_fh_Y10,data_i85_Y_5,currency,current_assets,NCAV,NCAVPS
16,19367.0,302.0,USA,XNYS,NOV,National Oilwell Varco Inc,Oil & Gas Equipment & Services,Energy,Stock,28.32,...,1.979600e+10,Liabilities,5.977000e+09,104.0,36.77,3.780000e+08,United States Dollar,7.278989e+09,1.301989e+09,3.444416
36,1226.0,1.0,USA,GREY,BSMAF,Bursa Malaysia Bhd,Financial Exchanges,Financial Services,Stock,1.84,...,2.434560e+09,Liabilities,1.559402e+09,62.0,74.91,8.074820e+08,Malaysia Ringgit,1.823729e+09,2.643269e+08,0.327347
45,10933.0,25.0,USA,PINX,PCDVF,Pacific Century Regional Developments Ltd,Asset Management,Financial Services,Stock,0.31,...,1.424342e+09,Liabilities,1.900800e+07,88.0,2.04,2.649740e+09,Singapore Dollar,2.905658e+07,1.004858e+07,0.003792
62,13351.0,25.0,USA,PINX,FUPBY,Fuchs Petrolub SE ADR,Chemicals,Basic Materials,Stock,10.55,...,1.891000e+09,Liabilities,4.360000e+08,34.0,53.83,5.501146e+08,Euro Member Countries,1.017925e+09,5.819253e+08,1.057825
63,8275.0,25.0,USA,PINX,FUPEF,Fuchs Petrolub SE,Chemicals,Basic Materials,Stock,44.20,...,1.891000e+09,Liabilities,4.360000e+08,34.0,53.83,1.375287e+08,Euro Member Countries,1.017925e+09,5.819253e+08,4.231302
64,10782.0,25.0,USA,PINX,KURRF,Kuraray Co Ltd,Chemicals,Basic Materials,Stock,12.60,...,9.471100e+11,Liabilities,3.910900e+11,52.0,42.10,3.486830e+08,Japan Yen,3.987333e+11,7.643310e+09,21.920512
65,12178.0,25.0,USA,PINX,KURRY,Kuraray Co Ltd ADR,Chemicals,Basic Materials,Stock,40.23,...,9.471100e+11,Liabilities,3.910900e+11,52.0,42.10,1.162277e+08,Japan Yen,3.987333e+11,7.643310e+09,65.761537
67,16781.0,44.0,USA,XNAS,PAAS,Pan American Silver Corp,Silver,Basic Materials,Stock,13.43,...,1.937476e+09,Liabilities,4.292640e+08,104.0,28.30,1.533540e+08,United States Dollar,5.483057e+08,1.190417e+08,0.776254
75,709.0,1.0,USA,GREY,GDMOF,DMG Mori Aktiengesellschaft,Tools & Accessories,Industrials,Stock,54.45,...,2.440499e+09,Liabilities,1.247264e+09,34.0,68.94,7.881799e+07,Euro Member Countries,1.682480e+09,4.352160e+08,5.521785
79,2541259.0,25.0,USA,PINX,SAABY,Saab AB ADR,Aerospace & Defense,Industrials,Stock,17.00,...,5.612800e+10,Liabilities,3.671600e+10,93.0,69.46,2.378995e+08,Sweden Krona,3.898651e+10,2.270509e+09,9.543983
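
In [144] keeps only companies with a positive NCAV; it does not yet apply Graham's price test. The sketch below adds that last step on made-up rows that reuse the `df_bs` column names. It assumes the price and NCAVPS are in the same currency and refer to the same share class, which does not hold for many rows above (for example, the Kuraray ADR's `lastprice` is quoted in US dollars while its NCAVPS is derived from yen-denominated statements). As a simplifying assumption, the comparison is restricted to rows whose statement currency is `'United States Dollar'`; currency and ADR-ratio conversion would be needed to screen the rest.

```python
import pandas as pd

# Made-up rows using the same column names as df_bs.
df_bs_demo = pd.DataFrame({
    'ticker':    ['AAA', 'BBB', 'CCC'],
    'lastprice': [2.00, 28.32, 1.00],
    'NCAVPS':    [3.60, 3.44, 50.0],
    'currency':  ['United States Dollar', 'United States Dollar', 'Japan Yen'],
})

# Assumption: only same-currency rows are comparable without FX conversion.
same_ccy = df_bs_demo['currency'] == 'United States Dollar'

# Graham's bargain test: price < 2/3 * NCAVPS
bargain = same_ccy & (df_bs_demo['lastprice'] < 2 / 3 * df_bs_demo['NCAVPS'])

print(df_bs_demo.loc[bargain, 'ticker'].tolist())  # ['AAA']
```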


[return to top of this section](#value),
[return to the top](#top)
<a id="rule6"></a>
### Rule 6. Operating Cash Flow growth for the past 7 yrs

[return to top of this section](#value),
[return to the top](#top)
<a id="rule7"></a>
### Rule 7. *Owner earnings* growth rate > 6% over the past 7 years

$owner\ earnings = net\ income + depreciation\ and\ amortization - normal\ capital\ expenditures$

Benjamin Graham mentions in *The Intelligent Investor* that, because it adjusts for entries such as depreciation and amortization that do not affect the company's cash balance, *owner earnings* is a better measure than reported net income.
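
A minimal sketch of the formula above, followed by the 6% test measured as a compound annual growth rate over 7 yearly periods. All figures are invented, and the `capex` column stands in for *normal* (maintenance) capital expenditures, which statements usually do not report separately; using total capex is an assumption. The growth rate is only defined when the first and last values are positive.

```python
import pandas as pd

# Invented annual figures, oldest to newest (8 fiscal years = 7 growth periods).
cf = pd.DataFrame({
    'net_income': [100, 105, 112, 118, 125, 133, 140, 150],
    'dep_amort':  [ 20,  21,  22,  23,  24,  25,  26,  27],
    'capex':      [ 30,  30,  31,  32,  33,  34,  35,  36],  # assumed "normal" capex
}, index=list(range(2011, 2019)))

# Owner earnings = net income + depreciation & amortization - normal capex
cf['owner_earnings'] = cf['net_income'] + cf['dep_amort'] - cf['capex']

first, last = cf['owner_earnings'].iloc[0], cf['owner_earnings'].iloc[-1]
n = len(cf) - 1  # yearly periods between the first and last value
growth = (last / first) ** (1 / n) - 1 if first > 0 and last > 0 else float('nan')

print(cf['owner_earnings'].tolist())  # [90, 96, 103, 109, 116, 124, 131, 141]
print(round(growth, 4), growth > 0.06)  # roughly 0.0662 True
```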

[return to top of this section](#value),
[return to the top](#top)
<a id="rule8"></a>
### Rule 8. Long-term debt < 50% of total capital
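
No code accompanies this rule yet. One way to express it is sketched below with made-up figures, assuming *total capital* means long-term debt plus shareholders' equity (some definitions also include short-term debt). The column names are hypothetical and do not correspond to the actual `df_annualBS` columns.

```python
import pandas as pd

# Hypothetical column names and invented values.
bs = pd.DataFrame({
    'ticker':         ['AAA', 'BBB', 'CCC'],
    'long_term_debt': [200.0, 900.0, 0.0],
    'total_equity':   [800.0, 600.0, 500.0],
})

# Assumption: total capital = long-term debt + shareholders' equity.
total_capital = bs['long_term_debt'] + bs['total_equity']
bs['ltd_to_capital'] = bs['long_term_debt'] / total_capital

# Long-term debt below 50% of total capital; guard against non-positive capital.
passed = bs.loc[(total_capital > 0) & (bs['ltd_to_capital'] < 0.5), 'ticker']
print(passed.tolist())  # ['AAA', 'CCC']
```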

[return to top of this section](#value),
[return to the top](#top)
<a id="rule9"></a>
### Rule 9. NAV per share > Stock Price

The definition of NAV, as described by Graham in *The Intelligent Investor*:

$net\ asset\ value\ (NAV) = total\ assets - intangible\ assets\ (patents,\ goodwill) - total\ liabilities$
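
The formula translates directly into a per-share test. A sketch with invented values and hypothetical column names; which Morningstar balance-sheet rows should count as *intangible assets* is left open here, and the price is assumed to be in the same currency as the statements.

```python
import pandas as pd

# Hypothetical columns, invented values.
df_nav = pd.DataFrame({
    'ticker':       ['AAA', 'BBB'],
    'total_assets': [1_000.0, 1_000.0],
    'intangibles':  [  100.0,   400.0],  # patents, goodwill, ...
    'total_liab':   [  500.0,   500.0],
    'shares_out':   [   10.0,    10.0],
    'price':        [   35.0,    35.0],
})

# NAV = total assets - intangible assets - total liabilities
df_nav['NAV'] = df_nav['total_assets'] - df_nav['intangibles'] - df_nav['total_liab']
df_nav['NAVPS'] = df_nav['NAV'] / df_nav['shares_out']

# Rule 9: NAV per share greater than the stock price
print(df_nav.loc[df_nav['NAVPS'] > df_nav['price'], 'ticker'].tolist())  # ['AAA']
```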

[return to top of this section](#value),
[return to the top](#top)
<a id="rule10"></a>
### Rule 10. Growth Stocks as defined in *The Intelligent Investor*

*'The term “growth stock” is applied to one which has increased its per-share earnings in the past at well above the rate for common stocks generally and is expected to continue to do so in the future. (Some authorities would say that a true growth stock should be expected at least to double its per-share earnings in ten years—i.e., to increase them at a __compounded annual rate of over 7.1%__.)'* - Benjamin Graham, *The Intelligent Investor*

$$Compounded\ Annual\ Growth\ Rate\ (CAGR)\ = \left(\frac{EV}{BV}\right)^{\frac{1}{n}}-1 > 7.1\%$$

$$EV = Ending\ value$$

$$BV = Beginning\ value$$

$$n = Number\ of\ periods\ (months,\ years,\ etc.)$$
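
Doubling in ten years corresponds to $2^{1/10} - 1 \approx 7.18\%$ a year, consistent with Graham's "over 7.1%". The sketch applies the CAGR formula row-wise to a made-up EPS table; the column names are placeholders, and rows with a non-positive beginning or ending value are set to `NaN` because the formula is undefined there.

```python
import pandas as pd

# Made-up EPS history: beginning value (BV) and ending value (EV) ten years apart.
eps = pd.DataFrame({
    'EPS_2008': [1.00, 2.00, -0.50],
    'EPS_2018': [2.10, 2.50,  1.20],
}, index=['AAA', 'BBB', 'CCC'])

n = 10  # yearly periods between BV and EV
bv, ev = eps['EPS_2008'], eps['EPS_2018']

# CAGR = (EV / BV) ** (1 / n) - 1; NaN where BV or EV is not positive
ratio = (ev / bv).where((bv > 0) & (ev > 0))
cagr = ratio ** (1 / n) - 1

print(round(2 ** (1 / n) - 1, 4))          # 0.0718, the rate that doubles EPS in 10 yrs
print(cagr.index[cagr > 0.071].tolist())   # ['AAA']
```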

[return to top of this section](#value),
[return to the top](#top)
<a id="rule11"></a>
### Rule 11. Positive ratio of earnings to fixed charges

[return to top of this section](#value),
[return to the top](#top)
<a id="rule12"></a>
### Rule 12. CAN SLIM

CAN SLIM is an acronym developed by the American stock research and education company Investor's Business Daily (IBD). IBD claims CAN SLIM represents the seven characteristics that top-performing stocks often share before making their biggest price gains. It was developed in the 1950s by Investor's Business Daily founder William O'Neil. The method was named the top-performing investment strategy from 1998 to 2009 by the American Association of Individual Investors. In 2015, an exchange-traded fund (ETF) was launched focusing on the companies listed on the IBD 50, a computer-generated list published by Investor's Business Daily that highlights stocks based on the CAN SLIM investment criteria. [(Source)](https://en.m.wikipedia.org/wiki/CAN_SLIM?wprov=sfla1)

[return to top of this section](#value),
[return to the top](#top)
<a id="rule13"></a>
### Rule 13. 

In [39]:
df_growth_cols

gr_revenue               Revenue %
i28                 Year over Year
i29                 3-Year Average
i30                 5-Year Average
i31                10-Year Average
gr_operating    Operating Income %
i32                 Year over Year
i33                 3-Year Average
i34                 5-Year Average
i35                10-Year Average
gr_ni                 Net Income %
i81                 Year over Year
i82                 3-Year Average
i83                 5-Year Average
i84                10-Year Average
gr_eps                       EPS %
i36                 Year over Year
i37                 3-Year Average
i38                 5-Year Average
i39                10-Year Average
Name: 0, dtype: object

In [54]:
cols = ['ticker_id', 'exchange_id'] + [col for col in df_growth 
                                       if col.startswith('i28_') or col.startswith('gr_Y')]

(df_growth[cols]
 .where((df_growth['gr_Y9'] > pd.to_datetime('2018-01-01')) & 
        (df_growth['i28_gr_Y10'] > 0.0) & (df_growth['i28_gr_Y9'] > 0.0) &
        (df_growth['i28_gr_Y8'] > 0.0) & (df_growth['i28_gr_Y7'] > 0.0) & (df_growth['i28_gr_Y6'] > 0.0)
       )
 .dropna(axis=0, how='all')
)

,ticker_id,exchange_id,i28_gr_Y0,i28_gr_Y1,i28_gr_Y2,i28_gr_Y3,i28_gr_Y4,i28_gr_Y5,i28_gr_Y6,i28_gr_Y7,...,gr_Y1,gr_Y2,gr_Y3,gr_Y4,gr_Y5,gr_Y6,gr_Y7,gr_Y8,gr_Y9,gr_Y10
0,1.0,374.0,,,,-11.70,19.81,103.73,3.51,3.10,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
1,2.0,374.0,,,,-11.70,19.81,103.73,3.51,3.10,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
2,3.0,374.0,,,,-11.70,19.81,103.73,3.51,3.10,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
3,4.0,482.0,2.23,2.59,16.25,0.83,11.65,7.90,2.81,3.53,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
25,506.0,1.0,6.83,36.47,14.10,12.08,35.21,28.79,24.80,17.89,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
27,568.0,1.0,26.87,9.52,-0.15,10.17,9.30,7.92,17.10,17.09,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
28,615.0,1.0,-4.54,-41.80,33.62,6.32,-5.73,4.71,2.62,24.94,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
30,618.0,1.0,-16.75,-12.27,83.95,54.87,50.96,6.04,40.07,46.81,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
43,703.0,1.0,2.43,7.77,16.85,2.32,-2.12,7.52,16.06,9.59,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr
56,796.0,1.0,32.72,33.27,27.12,0.10,2.65,9.42,0.33,20.15,...,2010-12-01,2011-12-01,2012-12-01,2013-12-01,2014-12-01,2015-12-01,2016-12-01,2017-12-01,2018-12-01,Latest Qtr


[return to top of this section](#value),
[return to the top](#top)
<a id="rule14"></a>
### Rule 14. 

[return to top of this section](#value),
[return to the top](#top)
<a id="rule15"></a>
### Rule 15. 

[return to top of this section](#value),
[return to the top](#top)
<a id="mergerules"></a>
### Merging DataFrames

In [145]:
df_rules = (df_rule1
            .merge(df_rule2, on=['ticker_id', 'exchange_id'])
            .merge(df_rule3, on=['ticker_id', 'exchange_id'])
            .merge(df_rule4, on=['ticker_id', 'exchange_id'])
           )
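
Each `merge` in In [145] is an inner join (the pandas default), so `df_rules` keeps only the `(ticker_id, exchange_id)` pairs that pass every rule. As more rules are added, the chain can be folded with `functools.reduce`; passing `validate='one_to_one'` makes pandas raise a `MergeError` if a key pair is unexpectedly duplicated in either frame. The three small frames below are invented stand-ins for the rule DataFrames.

```python
from functools import reduce

import pandas as pd

keys = ['ticker_id', 'exchange_id']

# Invented screen results; each row is a (ticker_id, exchange_id) pair that passed.
rule_a = pd.DataFrame({'ticker_id': [1, 2, 3], 'exchange_id': [1, 1, 1], 'a': [0.1, 0.2, 0.3]})
rule_b = pd.DataFrame({'ticker_id': [2, 3, 4], 'exchange_id': [1, 1, 1], 'b': [10, 20, 30]})
rule_c = pd.DataFrame({'ticker_id': [3, 2],    'exchange_id': [1, 1],    'c': ['x', 'y']})

# Inner-join all screens on the composite key; fail loudly on duplicate keys.
combined = reduce(
    lambda left, right: left.merge(right, on=keys, how='inner', validate='one_to_one'),
    [rule_a, rule_b, rule_c],
)

print(sorted(combined['ticker_id'].tolist()))  # [2, 3]
```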

In [146]:
df_rules.columns.values

array(['ticker_id', 'exchange_id', 'country', 'exchange_sym', 'ticker',
       'company', 'sector', 'industry', 'stock_type', 'style', 'Year_Y_6',
       'Year_Y_5', 'Year_Y_4', 'Year_Y_3', 'Year_Y_2', 'Year_Y_1',
       'Net_Income_Y6', 'Net_Income_Y5', 'Net_Income_Y4', 'Net_Income_Y3',
       'Net_Income_Y2', 'Net_Income_Y1', 'Y10', 'Y9', 'Y8', 'Y7', 'Y6',
       'Y5', 'Dividend_Y10', 'Dividend_Y9', 'Dividend_Y8', 'Dividend_Y7',
       'Dividend_Y6', 'Dividend_Y5', 'Dividend_Y4', 'Dividend_Y3',
       'PE_TTM', 'PE_2018', 'PE_2017', 'PE_2016', 'PE_2015', 'PE_2014',
       'PE_2013', 'PE_2012', 'PB_TTM', 'PB_2018', 'PB_2017', 'PB_2016',
       'PB_2015', 'PB_2014'], dtype=object)

In [160]:
dfr = (df_rules
       .where(df_rules['country'] == 'USA')
       .groupby(['country', 'exchange_sym', 'ticker', 'company', 'sector', 'industry'])
       .mean(numeric_only=True)  # average numeric columns only; pandas >= 2.0 raises on text columns otherwise
       .sort_values(by='exchange_sym', ascending=True)
      )

In [161]:
dfr

country,exchange_sym,ticker,company,sector,industry,ticker_id,exchange_id,Net_Income_Y6,Net_Income_Y5,Net_Income_Y4,Net_Income_Y3,Net_Income_Y2,Net_Income_Y1,Dividend_Y10,Dividend_Y9,...,PE_2015,PE_2014,PE_2013,PE_2012,PB_TTM,PB_2018,PB_2017,PB_2016,PB_2015,PB_2014
USA,PINX,BMWYY,Bayerische Motoren Werke AG ADR,Consumer Cyclical,Auto Manufacturers,11637.0,25.0,4165809000.0,4165809000.0,3845990000.0,3548578000.0,2880615000.0,2558010000.0,1.35,1.35,...,10.3,9.9,10.8,9.5,0.8,0.8,1.1,1.3,1.5,1.6
USA,PINX,CCOHF,China State Construction International Holdings Ltd,Industrials,Engineering & Construction,9413.0,25.0,3881063000.0,3881063000.0,3520646000.0,3196395000.0,2876989000.0,2591129000.0,0.35,0.35,...,13.9,13.5,22.2,15.8,0.8,0.8,2.0,2.0,2.5,2.5
USA,PINX,CLLDY,CapitaLand Ltd ADR,Real Estate,Real Estate - General,7082.0,25.0,4302000000.0,4302000000.0,3807000000.0,2063000000.0,860000000.0,1412000000.0,0.24,0.24,...,12.1,15.5,13.5,17.0,0.8,0.7,0.8,0.8,0.8,0.9
USA,PINX,CWYCY,China Railway Construction Corp Ltd ADR,Industrials,Engineering & Construction,11682.0,25.0,2897300000.0,2897300000.0,2984600000.0,2802500000.0,2727400000.0,2375300000.0,1.91,1.91,...,8.4,9.0,7.1,10.3,0.7,0.8,0.7,1.0,1.0,1.1
USA,PINX,HNGKY,Hongkong Land Holdings Ltd ADR,Real Estate,Real Estate - General,8462.0,25.0,5211093000.0,5211093000.0,8618423000.0,8043335000.0,15838850000.0,7710624000.0,1.0,1.0,...,12.8,13.9,9.9,11.4,0.4,0.4,0.5,0.5,0.6,0.6
USA,PINX,ISBA,Isabella Bank Corp,Financial Services,Banks - Regional - US,6100.0,25.0,3190058.0,3190058.0,1937088.0,1851480.0,2015214.0,2062000.0,1.04,1.04,...,15.5,13.5,15.5,13.9,0.9,0.9,1.1,1.1,1.3,1.0
USA,PINX,NBNKF,Nordea Bank Abp,Financial Services,Banks - Regional - Europe,9351.0,25.0,38459120000.0,38459120000.0,34338250000.0,40174100000.0,39473640000.0,46153410000.0,0.68,0.68,...,10.7,11.9,12.0,9.5,1.0,0.9,1.3,1.4,1.3,1.3
USA,PINX,NSANY,Nissan Motor Co Ltd ADR,Consumer Cyclical,Auto Manufacturers,6268.0,25.0,219472000.0,202270000.0,179328000.0,145610000.0,101904000.0,172357000.0,108.39,105.11,...,9.7,10.0,10.4,11.2,0.7,0.6,0.9,1.1,1.1,1.0
USA,PINX,QNTO,Quaint Oak Bancorp Inc,Financial Services,Banks - Regional - US,9752.0,25.0,12174000.0,12174000.0,8984000.0,8301000.0,8867000.0,8019000.0,0.26,0.26,...,17.9,15.5,23.5,10.0,1.0,1.0,1.1,1.1,1.2,1.0
USA,XASE,DIT,Amcon Distributing Co,Consumer Defensive,Food Distribution,15692.0,547.0,69993000.0,69993000.0,301278000.0,170216000.0,91848000.0,45033000.0,0.72,0.72,...,9.4,11.8,10.3,6.8,0.8,1.0,1.0,1.2,0.8,0.9


In [154]:
dfr.columns

Index(['ticker_id', 'exchange_id', 'Net_Income_Y6', 'Net_Income_Y5',
       'Net_Income_Y4', 'Net_Income_Y3', 'Net_Income_Y2', 'Net_Income_Y1',
       'Dividend_Y10', 'Dividend_Y9', 'Dividend_Y8', 'Dividend_Y7',
       'Dividend_Y6', 'Dividend_Y5', 'Dividend_Y4', 'Dividend_Y3', 'PE_TTM',
       'PE_2018', 'PE_2017', 'PE_2016', 'PE_2015', 'PE_2014', 'PE_2013',
       'PE_2012', 'PB_TTM', 'PB_2018', 'PB_2017', 'PB_2016', 'PB_2015',
       'PB_2014'],
      dtype='object')

<a id="additional"></a>
[return to the top](#top)

## Additional sample / test code

In [164]:
(df_master
 .where(df_master['avevol'] > 1000000).dropna(axis=0, how='all')
)

,ticker_id,exchange_id,ticker,company,exchange,exchange_sym,industry,sector,country,country_c2,...,aprvol,avevol,Forward_PE,pb,ps,pc,currency,currency_code,fy_end,updated_date
4,20371.0,302.0,STOR,STORE Capital Corp,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Diversified,Real Estate,United States,US,...,1.500000e+07,1.700000e+07,,1.9,12.7,17.5,United States Dollar,USD,2019-12-31,2019-04-08
14,19993.0,302.0,PEB,Pebblebrook Hotel Trust,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Hotel & Motel,Real Estate,United States,US,...,1.100000e+07,1.200000e+07,36.8,1.1,2.9,17.5,United States Dollar,USD,2019-12-31,2019-04-08
15,18925.0,302.0,NNN,National Retail Properties Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Retail,Real Estate,United States,US,...,6.255050e+05,1.100000e+07,,2.5,13.5,17.8,United States Dollar,USD,2019-12-31,2019-04-08
20,18726.0,302.0,AIV,Apartment Investment & Management Co,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Residential,Real Estate,United States,US,...,6.082250e+05,1.300000e+07,181.8,4.8,7.9,19.4,United States Dollar,USD,2019-12-31,2019-04-08
24,19694.0,302.0,UDR,UDR Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Residential,Real Estate,United States,US,...,1.100000e+07,1.500000e+07,,4.5,11.7,21.9,United States Dollar,USD,2019-12-31,2019-04-08
25,20221.0,302.0,AMH,American Homes 4 Rent Class A,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Residential,Real Estate,United States,US,...,9.166240e+05,2.000000e+07,20.6,1.3,6.3,16.5,United States Dollar,USD,2019-12-31,2019-04-08
26,20509.0,302.0,INVH,Invitation Homes Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Residential,Real Estate,United States,US,...,2.700000e+07,3.400000e+07,19.8,1.6,7.4,22.7,United States Dollar,USD,2019-12-31,2019-04-08
31,19677.0,302.0,CUBE,CubeSmart,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Industrial,Real Estate,United States,US,...,8.983220e+05,1.400000e+07,,3.5,10.0,19.6,United States Dollar,USD,2019-12-31,2019-04-08
32,19013.0,302.0,DRE,Duke Realty Corp,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Industrial,Real Estate,United States,US,...,1.700000e+07,2.100000e+07,57.1,2.4,11.7,22.9,United States Dollar,USD,2019-12-31,2019-04-08
34,19162.0,302.0,HCP,HCP Inc,"NEW YORK STOCK EXCHANGE, INC.",XNYS,REIT - Healthcare Facilities,Real Estate,United States,US,...,2.600000e+07,3.200000e+07,69.4,2.5,8.3,17.5,United States Dollar,USD,2019-12-31,2019-04-08


In [16]:
df = None  # Set df to None to close the db connection

Database connection for file db/mstables2.sqlite closed.
