# Value Investing Program
# Introduction

Inspired by Sean Seah Book -- Gone Fishing with Warren Buffetthttp://www.aceprofitsacademy.com/wp-content/uploads/2016/09/Gone-Fishing-with-Buffett.pdf

In here we are going to try to scrape financial data:
Input: List of the companies

Web scraping: 
1. Find the shareprice by year and the following metrics:
    a. EPS
    b. ROE
    c. ROA
    d. Long term debt
    e. Total Income
    f. Debt to Equity
    g. Interest Coverage Ratio

Methods:
1. Given list of the companies, find out the feasibility to invest
    a. Been in market minimal 10 years
    b. Have the track records (EPS per year)
    c. Have efficiency (ROE > 15%) -- Net income / shareholder equity
    d. Determine manipulation (ROA > 7%) -- Net income / Total Asset
    e. Have small long term debt (Long term debt <5* total income)
    f. Low Debt to Equity
    g. Ability to pay interest: (Interest Coverage Ratio >3) -- EBIT / Interest expenses

Outputs:
1. Ranking of each company in terms of return rate given the value investing methodology
    a. Find EPS Annual Compounded Growth Rate
    b. Estimate EPS 10 years from now
    c. Estimate stock price 10 years from now (Stock Price EPS * Average PE)
    d. Determine target by price today based on returns(discount rate 15%/20%)
    e. Add margin of safety (Safety net 15%)

Additional:
1. Qualitative Assessment of the companies
    a. Advantages in business (product differentiation, branding, low price producer, high switching cost, legal barriers to entry)
    b. Ability of foolhardy management (even a fool can run)
    c. Avoid price competitive business    

# Web scraping

## Scraping Wikipedia SP500 Data Using Beautiful Soup

In [160]:
import bs4 as bs
import pickle
import requests

# This will keep tickers + gics industries & sub industries
def save_sp500_stocks_info():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    stocks_info=[]
    tickers = []
    gics_industries = []
    gics_sub_industries = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        gics_industry = row.findAll('td')[3].text
        gics_sub_industry = row.findAll('td')[4].text

        tickers.append(ticker.lower())
        gics_industries.append(gics_industry.lower())
        gics_sub_industries.append(gics_sub_industry.lower())
    
    stocks_info.append(tickers)
    stocks_info.append(gics_industries)
    stocks_info.append(gics_sub_industries)
    return stocks_info

stocks_info = save_sp500_stocks_info()
stocks_info_df = pd.DataFrame(stocks_info).T
stocks_info_df.columns=['tickers','gics_industry','gics_sub_industry']
stocks_info_df.set_index('tickers',inplace=True)

# Extract just the tickers list
tickers= stocks_info[0]

In [251]:
tickersshort = tickers[:10]
tickersmedium = tickers[:50]
tickersmediumlarge = tickers[:300]

## Scraping marketwatch Data Using Beautiful Soup

In [425]:
%%time

from urllib import urlopen
from bs4 import BeautifulSoup

dflist = []

for ticker in tickersshort: 
    try:
        urlfinancials = 'http://www.marketwatch.com/investing/stock/'+ticker+'/financials'
        urlbalancesheet = 'http://www.marketwatch.com/investing/stock/'+ticker+'/financials/balance-sheet'

        text_soup_financials = BeautifulSoup(urlopen(urlfinancials).read()) #read in
        text_soup_balancesheet = BeautifulSoup(urlopen(url).read()) #read in

        # Income statement
        titlesfinancials = text_soup_financials.findAll('td', {'class': 'rowTitle'})
        epslist=[]
        netincomelist = []
        longtermdebtlist = [] 
        interestexpenselist = []
        ebitdalist= []

        for title in titlesfinancials:
            if 'EPS (Basic)' in title.text:
                epslist.append ([td.text for td in title.findNextSiblings(attrs={'class': 'valueCell'}) if td.text])
            if 'Net Income' in title.text:
                netincomelist.append ([td.text for td in title.findNextSiblings(attrs={'class': 'valueCell'}) if td.text])
            if 'Interest Expense' in title.text:
                interestexpenselist.append ([td.text for td in title.findNextSiblings(attrs={'class': 'valueCell'}) if td.text])
            if 'EBITDA' in title.text:
                ebitdalist.append ([td.text for td in title.findNextSiblings(attrs={'class': 'valueCell'}) if td.text])


        # Balance sheet
        titlesbalancesheet = text_soup_balancesheet.findAll('td', {'class': 'rowTitle'})
        equitylist=[]
        for title in titlesbalancesheet:
            if 'Total Shareholders\' Equity' in title.text:
                equitylist.append( [td.text for td in title.findNextSiblings(attrs={'class': 'valueCell'}) if td.text])
            if 'Long-Term Debt' in title.text:
                longtermdebtlist.append( [td.text for td in title.findNextSiblings(attrs={'class': 'valueCell'}) if td.text])

        # Variables        
        eps = epslist[0]
        epsgrowth = epslist[1]
        netincome = netincomelist[0]
        shareholderequity = equitylist[0]
        roa = equitylist[1]

        longtermdebt = longtermdebtlist[0]
        interestexpense = interestexpenselist[0]
        ebitda = ebitdalist[0]
        # Don't forget to add in roe, interest coverage ratio

        ## Make it into Dataframes
        df= pd.DataFrame({'eps': eps,'epsgrowth': epsgrowth,'netincome': netincome,'shareholderequity': shareholderequity,'roa': 
                      roa,'longtermdebt': longtermdebt,'interestexpense': interestexpense,'ebitda': ebitda},index=[2012,2013,2014,2015,2016])

        # Format all the number in dataframe
        dfformatted = df.apply(format)

        # Adding roe, interest coverage ratio
        dfformatted['roe'] = dfformatted.netincome/dfformatted.shareholderequity
        dfformatted['interestcoverageratio'] = dfformatted.ebitda/dfformatted.interestexpense

    #     Insert ticker and df
        dflist.append((ticker,dfformatted))
    except:
        print(ticker,' ticker is not found')

Wall time: 41.2 s


In [426]:
len(dflist)

10

## Time
40 seconds for 10 tickers

3 minutes for 50 tickers


## Formatting all the values to numerical

In [427]:
def format(list):
    newlist=[]
    posornegnumber = 1
    for text in list:
        if text.endswith(')'):
            text = text[1:-1] # remove the parentheses
            posornegnumber = -1
            
        if text.endswith('%'):
#             Then please make it into comma float
            endtext = float(text[:-1])/100.0 * posornegnumber 
        elif text.endswith('B'):
#             Then please times 1000000000
#             Change it into integer
            endtext = int(float(text[:-1])*1000000000)* posornegnumber 
        elif text.endswith('M'):
#             Then please times 1000000
#             Change it into integer
            endtext = int(float(text[:-1])*1000000)* posornegnumber 
        elif text.endswith('-'):
#             Insert 0
            endtext = 0
        else:
#             change to float
            endtext = float(text)* posornegnumber 
        newlist.append(endtext)
    return newlist   

## Determining legibility
Find whether this particular stocks is legitimate using this and filter accordingly
    1. EPS increases over the year (consistent)
    2. ROE > 0.15
    3. ROA > 0.07 (also consider debt to equity cause Assets = liabilities + equity)
    4. Long term debt < 5 * income
    5. Interest Coverage Ratio > 3

In [298]:
def eligibilitycheck(df):
    ticker,dfformatted = df
    
    legiblestock = True
    reasonlist=[]

    # EPS increases over the year (consistent)
    for growth in dfformatted.epsgrowth:
        if growth<0:
            legiblestock = False
            reasonlist.append('there is negative growth '+str(growth))
            break
    # ROE > 0.15
    if dfformatted.roe.mean()<0.13:
            legiblestock = False
            reasonlist.append('roe mean is less than 0.13 '+ str(dfformatted.roe.mean()))
    # ROA > 0.07 (also consider debt to equity cause Assets = liabilities + equity)
    if dfformatted.roa.mean()<0.07:
            legiblestock = False
            reasonlist.append('roa mean is less than 0.07 ' + str(dfformatted.roa.mean()))
    # Long term debt < 5 * income
    if dfformatted.longtermdebt.tail(1).values[0]>5*dfformatted.netincome.tail(1).values[0]:
            legiblestock = False
            reasonlist.append('longtermdebt is 5 times the netincome ')
    # Interest Coverage Ratio > 3
    if dfformatted.interestcoverageratio.tail(1).values[0]<3:
            legiblestock = False
            reasonlist.append('Interestcoverageratio is less than 3 ')
#     print ticker,legiblestock,reasonlist
    return ticker,legiblestock

In [429]:
dflist[:2]

[(u'mmm',
            ebitda   eps  epsgrowth  interestexpense  longtermdebt   netincome  \
  2012  7760000000  6.40     0.0000        221000000    2990000000  4510000000   
  2013  8029999999  6.83     0.0672                0    2240000000  4720000000   
  2014  8529999999  7.63     0.1171        101000000    3230000000  5000000000   
  2015  8310000000  7.73     0.0125        117000000    2000000000  4840000000   
  2016  8490000000  8.35     0.0809        207000000    3940000000  5060000000   
  
           roa  shareholderequity       roe  interestcoverageratio  
  2012  0.7646        71720000000  0.062883              35.113122  
  2013  0.7871        87310000000  0.054060                    inf  
  2014  0.8040       103860000000  0.048142              84.455446  
  2015  0.8160       120330000000  0.040223              71.025641  
  2016  0.8301       139040000000  0.036392              41.014493  ),
 (u'abt',
            ebitda   eps  epsgrowth  interestexpense  longtermdebt   

In [300]:
selectiondflist = []
for df in dflist:
    if eligibilitycheck(df)[1]:
        selectiondflist.append(df)

mmm False ['roe mean is less than 0.13 0.0483401071703']
abt False ['there is negative growth -0.5638', 'roe mean is less than 0.13 0.0151214555543']
abbv False ['there is negative growth -0.2287', 'roe mean is less than 0.13 0.0445496491725']
acn False ['there is negative growth -0.0866', 'roe mean is less than 0.13 0.0338117326132']
atvi False ['there is negative growth -0.0495', 'roe mean is less than 0.13 0.00983340539427']
ayi False ['roe mean is less than 0.13 0.00172609663005', 'longtermdebt is 5 times the netincome ']
adbe False ['there is negative growth -0.6548', 'roe mean is less than 0.13 0.00623277670333']
amd False ['there is negative growth -3.77', 'roe mean is less than 0.13 -0.0060686317473', 'longtermdebt is 5 times the netincome ', 'Interestcoverageratio is less than 3 ']
aap False ['there is negative growth -0.0444', 'roe mean is less than 0.13 0.00436678953041', 'longtermdebt is 5 times the netincome ']
aes False ['there is negative growth -0.6113', 'roe mean is le

In [301]:
len(dflist)

36

In [302]:
len(selectiondflist)

2

In [307]:
# What are the tickers of these?
print [x[0] for x in selectiondflist]

[u'googl', u'goog']


## Scraping for latest shareprice using selectiondflist

In [428]:
import pandas as pd
import datetime
import pandas_datareader.data as web
from pandas import Series, DataFrame


days_per_year = 365.24

# start = datetime.datetime.now()-datetime.timedelta(days=(5*days_per_year))
start = datetime.datetime.now()-datetime.timedelta(days=1)
end = datetime.datetime.now()

# To pull individual stock
# df = web.DataReader("AAPL", 'google', start, end)
# df.tail()

# To pull group stocks
dfcomp = web.DataReader(dfrank.index,'google',
                               start=start, 
                               end=end)['Close']

# Using selectiondflist to calculate stocks price value

Outputs:
1. Ranking of each company in terms of return rate given the value investing methodology
    a. Find EPS Annual Compounded Growth Rate
    b. Estimate EPS 10 years from now
    c. Estimate stock price 10 years from now (Stock Price EPS * Average PE)
    d. Determine target by price today based on returns(discount rate 15%/20%)
    e. Add margin of safety (Safety net 15%)


In [416]:
import numpy as np
dfrank = pd.DataFrame(columns =['ticker','annualgrowthrate','lasteps','futureeps'])
i=0
for tuple in selectiondflist:
    ticker, df = tuple
    
    # Find EPS Annual Compounded Growth Rate
    annualgrowthrate =  df.epsgrowth.mean() #growth rate
    
    # Estimate stock price 10 years from now (Stock Price EPS * Average PE)
    lasteps = df.eps.tail(1).values[0] #presentvalue
    years  = 10 #period
    
    futureeps = abs(np.fv(annualgrowthrate,years,0,lasteps))
        
    dfrank.loc[i] = [ticker,annualgrowthrate,lasteps,futureeps]
    i+=1
    
dfrank.set_index('ticker',inplace=True)

In [418]:
dfrank['lastshareprice']=dfcomp.tail().T.values
dfrank['peratio'] = dfrank['lastshareprice']/dfrank['lasteps']
dfrank['futureshareprice'] = dfrank['futureeps']*dfrank['peratio']

In [423]:
discountrate = 0.20
margin = 0.15

dfrank['presentshareprice'] = abs(np.pv(discountrate,years,0,fv=dfrank['futureshareprice']))
dfrank['marginalizedprice'] = dfrank['presentshareprice']*(1-0.15) 

In [424]:
dfrank

Unnamed: 0_level_0,annualgrowthrate,lasteps,futureeps,lastshareprice,peratio,futureshareprice,presentshareprice,marginalizedprice
ticker,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
googl,0.11778,28.32,86.229645,965.59,34.095692,2940.05941,474.836009,403.610607
goog,0.11778,28.32,86.229645,986.09,34.819562,3002.478467,484.917035,412.17948
