# Lesson 8: Intermediate Python for Data Analytics (Financial Performance and Fraud Detection)
# Introduction

In this lesson we scrape three kinds of financial statement data:
* Income statement
* Balance sheet
* Cash flow statement

## Objective

* To use Tweepy (a Python library for the Twitter API) to load data from Twitter
* To set up a Twitter Developer account linked to a profile
* To initialize connections and extract the presidential election tweets
* To explore the presidential election (Trump vs Hillary) using Twitter data
    * Viewing the data
    * Search Term Analysis
    * Exploring Twitter Trends
* Sentiment Analysis
    * Generating Sentiment Analysis
    * Plotting out Sentiment Analysis
    * How often does the news media mention the election candidates?
* Topic Analysis
    * Generating Topic with LDA
    * Plotting out Topic Analysis
* Challenges:
    * Analysing fake news
    * Analysing sentiment by geographic location using charts
    * Applying these techniques for companies, commodities, and stocks
* Next lesson:
    * Lesson 6 Basic Python for Data Analytics (Optimization Model for Operations Management)

# Scraping Wikipedia SP500 Data Using Beautiful Soup

In [56]:
import bs4 as bs
import pickle
import requests

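def save_sp500_tickers():
    ## download the Wikipedia page that lists the S&P 500 companies
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    ## find the first table with class 'wikitable sortable'
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    ## skip the header row; the ticker symbol is in the first cell of each row
    for row in table.findAll('tr')[1:]:
        ## strip surrounding whitespace (the cell text may end with a newline)
        ticker = row.findAll('td')[0].text.strip()
        tickers.append(ticker.lower())

    return tickers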

tickers = save_sp500_tickers()

In [58]:
tickers[:5]

[u'mmm', u'abt', u'abbv', u'acn', u'atvi']
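
The cell above imports `pickle` and names the function `save_sp500_tickers`, yet the list is never written to disk. A minimal sketch of caching it so the Wikipedia page does not have to be downloaded on every run, assuming a hypothetical file name `sp500tickers.pickle` and two hypothetical helpers, `cache_tickers` and `load_tickers`, that are not part of the original notebook:

```python
import pickle

def cache_tickers(tickers, path="sp500tickers.pickle"):
    """Write the ticker list to disk (the default path is a hypothetical name)."""
    with open(path, "wb") as f:
        pickle.dump(tickers, f)

def load_tickers(path="sp500tickers.pickle"):
    """Read a previously cached ticker list back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in for the scraped list, so the sketch runs without network access.
sample_tickers = ["mmm", "abt", "abbv", "acn", "atvi"]
cache_tickers(sample_tickers)
restored = load_tickers()
```

In the notebook, `cache_tickers(tickers)` could be called once after scraping, and `load_tickers()` used in later sessions.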

# Scraping Financial Data
We use Selenium to scrape the quarterly financial statements from pages such as:
http://www.nasdaq.com/symbol/aapl/financials?query=income-statement&data=quarterly

## Repeat the scraping for multiple companies

In [59]:
import pandas as pd
from numpy import nan
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--disable-extensions")

## return nan values if elements not found, and convert the webelements to text
def get_elements(xpath):
    ## find the elements (find_elements_by_xpath was removed in Selenium 4.3)
    elements = browser.find_elements(By.XPATH, xpath)
    ## if we do not get exactly 4 quarters, return all nan values
    if len(elements) != 4:
        return [nan] * 4
    ## otherwise, return just the text of each element
    return [e.text for e in elements]
 
## create a pandas dataframe to store the scraped data
df = pd.DataFrame(index=range(400),
                  columns=['company', 'quarter', 'quarter_ending', 
                           'total_revenue', 'gross_profit', 'net_income', 
                           'total_assets', 'total_liabilities', 'total_equity', 
                           'net_cash_flow'])
 
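## launch the Chrome browser (Selenium 4 syntax: the driver path goes in a Service object)
from selenium.webdriver.chrome.service import Service
my_path = "C:\\Users\\vincentt.2013\\Downloads\\chromedriver.exe"
browser = webdriver.Chrome(service=Service(my_path), options=chrome_options)
browser.maximize_window()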
 
url_form = "http://www.nasdaq.com/symbol/{}/financials?query={}&data=quarterly" 
financials_xpath = "//tbody/tr/th[text() = '{}']/../td[contains(text(), '$')]"
 
## company ticker symbols
symbols = tickers[:5]
 
for i, symbol in enumerate(symbols):
    try:
        ## navigate to income statement quarterly page    
        url = url_form.format(symbol, "income-statement")
        browser.get(url)

        company_xpath = "//h1[contains(text(), 'Company Financials')]"
        company = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, company_xpath))).text

        quarters_xpath = "//thead/tr[th[1][text() = 'Quarter:']]/th[position()>=3]"
        quarters = get_elements(quarters_xpath)

        quarter_endings_xpath = "//thead/tr[th[1][text() = 'Quarter Ending:']]/th[position()>=3]"
        quarter_endings = get_elements(quarter_endings_xpath)

        total_revenue = get_elements(financials_xpath.format("Total Revenue"))
        gross_profit = get_elements(financials_xpath.format("Gross Profit"))
        net_income = get_elements(financials_xpath.format("Net Income"))

        ## navigate to balance sheet quarterly page 
        url = url_form.format(symbol, "balance-sheet")
        browser.get(url)

        total_assets = get_elements(financials_xpath.format("Total Assets"))
        total_liabilities = get_elements(financials_xpath.format("Total Liabilities"))
        total_equity = get_elements(financials_xpath.format("Total Equity"))

        ## navigate to cash flow quarterly page 
        url = url_form.format(symbol, "cash-flow")
        browser.get(url)

        net_cash_flow = get_elements(financials_xpath.format("Net Cash Flow"))

        ## fill the dataframe with the scraped data, 4 rows per company
        for j in range(4):  
            row = i*4 + j
            df.loc[row, 'company'] = company
            df.loc[row, 'quarter'] = quarters[j]
            df.loc[row, 'quarter_ending'] = quarter_endings[j]
            df.loc[row, 'total_revenue'] = total_revenue[j]
            df.loc[row, 'gross_profit'] = gross_profit[j]
            df.loc[row, 'net_income'] = net_income[j]
            df.loc[row, 'total_assets'] = total_assets[j]
            df.loc[row, 'total_liabilities'] = total_liabilities[j]
            df.loc[row, 'total_equity'] = total_equity[j]
            df.loc[row, 'net_cash_flow'] = net_cash_flow[j]
    ## catch scraping errors only, so that Ctrl+C can still interrupt the loop
    except Exception:
        print("symbol is missing information: "+symbol)
browser.quit()
 
## create a csv file in our working directory with our scraped data
df.to_csv("test.csv", index=False)

symbol is missing information: acn


In [63]:
quarter_endings

[u'9/30/2016', u'6/30/2016', u'3/31/2016', u'12/31/2015']
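
The quarter-ending values are scraped as plain strings in month/day/year form, so they sort alphabetically rather than chronologically. A small sketch of converting them to `date` objects with the standard library; the helper name `parse_quarter_ending` is hypothetical and the format string assumes every value looks like the four shown above:

```python
from datetime import datetime

def parse_quarter_ending(text):
    """Parse a month/day/year string such as '9/30/2016' into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# The four values printed by the cell above.
quarter_endings = ['9/30/2016', '6/30/2016', '3/31/2016', '12/31/2015']
parsed = [parse_quarter_ending(q) for q in quarter_endings]

# Dates compare chronologically, so sorting puts the oldest quarter first.
oldest_first = sorted(parsed)
```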

In [60]:
df = df.dropna(how='all')

In [61]:
df

| | company | quarter | quarter_ending | total_revenue | gross_profit | net_income | total_assets | total_liabilities | total_equity | net_cash_flow |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MMM Company Financials | 4th | 12/31/2016 | $7,329,000 | $3,613,000 | $1,155,000 | $32,906,000 | $22,608,000 | $10,298,000 | $90,000 |
| 1 | MMM Company Financials | 3rd | 9/30/2016 | $7,709,000 | $3,862,000 | $1,329,000 | $34,051,000 | $22,049,000 | $12,002,000 | $620,000 |
| 2 | MMM Company Financials | 2nd | 6/30/2016 | $7,662,000 | $3,863,000 | $1,291,000 | $33,235,000 | $21,341,000 | $11,894,000 | $351,000 |
| 3 | MMM Company Financials | 1st | 3/31/2016 | $7,409,000 | $3,731,000 | $1,275,000 | $32,982,000 | $21,249,000 | $11,733,000 | ($461,000) |
| 4 | ABT Company Financials | 4th | 12/31/2016 | $5,333,000 | $3,021,000 | $798,000 | $52,666,000 | $32,128,000 | $20,538,000 | $16,120,000 |
| 5 | ABT Company Financials | 3rd | 9/30/2016 | $5,302,000 | $3,017,000 | ($329,000) | $39,497,000 | $18,721,000 | $20,776,000 | ($78,000) |
| 6 | ABT Company Financials | 2nd | 6/30/2016 | $5,333,000 | $3,046,000 | $615,000 | $39,831,000 | $19,156,000 | $20,675,000 | ($756,000) |
| 7 | ABT Company Financials | 1st | 3/31/2016 | $4,885,000 | $2,745,000 | $316,000 | $39,637,000 | $18,915,000 | $20,722,000 | ($1,667,000) |
| 8 | ABBV Company Financials | 4th | 12/31/2016 | $6,796,000 | $5,241,000 | $1,391,000 | $66,099,000 | $61,463,000 | $4,636,000 | ($1,118,000) |
| 9 | ABBV Company Financials | 3rd | 9/30/2016 | $6,432,000 | $4,928,000 | $1,598,000 | $66,626,000 | $60,157,000 | $6,469,000 | ($109,000) |
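
The scraped amounts are still text: they carry a dollar sign and thousands separators, and negative amounts appear in parentheses, as in `($461,000)`. Before any arithmetic they need to become numbers. A minimal sketch, assuming every cell follows one of those two patterns; `parse_money` is a hypothetical helper, not part of the original notebook:

```python
def parse_money(text):
    """Convert strings such as '$7,329,000' or '($461,000)' to an int.

    Parentheses are read as a negative amount (accounting notation).
    """
    cleaned = text.strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    # Drop the parentheses, the dollar sign and the thousands separators.
    digits = cleaned.strip("()").replace("$", "").replace(",", "")
    value = int(digits)
    return -value if negative else value

revenue = parse_money("$7,329,000")
cash_flow = parse_money("($461,000)")
```

Applied to the dataframe, this could look like `df['net_cash_flow'].map(parse_money)`. Rows containing `nan` would need to be dropped or handled first, since the helper expects a string.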


# Generating Financial Ratios data

A ratio analysis is a quantitative analysis of information contained in a company’s financial statements. Ratio analysis is based on line items in financial statements like the balance sheet, income statement and cash flow statement; the ratios of one item, or a combination of items, to another item or combination are then calculated. Ratio analysis is used to evaluate various aspects of a company’s operating and financial performance such as its efficiency, liquidity, profitability and solvency.

Source: Ratio Analysis Definition, Investopedia, http://www.investopedia.com/terms/r/ratioanalysis.asp
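
As an illustration of the ratio analysis described above, the sketch below computes four common ratios from the first MMM row of the scraped table (quarter ending 12/31/2016). The choice of ratios and the helper name `ratio` are assumptions made for this example. Debt-to-equity is taken here as total liabilities divided by total equity, which is one of several definitions in use, and the return on assets is a single-quarter figure, not an annualized one.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator

# Figures copied from the first MMM row of the scraped table,
# with the dollar signs and thousands separators removed.
mmm_q4 = {
    "total_revenue": 7329000,
    "gross_profit": 3613000,
    "net_income": 1155000,
    "total_assets": 32906000,
    "total_liabilities": 22608000,
    "total_equity": 10298000,
}

ratios = {
    # Profitability: share of revenue left after cost of goods sold.
    "gross_margin": ratio(mmm_q4["gross_profit"], mmm_q4["total_revenue"]),
    # Profitability: share of revenue left as net income.
    "net_margin": ratio(mmm_q4["net_income"], mmm_q4["total_revenue"]),
    # Solvency: liabilities relative to shareholders' equity.
    "debt_to_equity": ratio(mmm_q4["total_liabilities"], mmm_q4["total_equity"]),
    # Efficiency: net income generated per dollar of assets (one quarter).
    "return_on_assets": ratio(mmm_q4["net_income"], mmm_q4["total_assets"]),
}

for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Returning `None` for a zero denominator keeps a loop over many companies from stopping on a single bad row; the caller then decides how to treat the missing ratio.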