### **Name:** Python Automated Financial Analysis
### **University:** University of Texas at San Antonio
### **Class:** DA6223 - Data Tools & Techniques
### **Author:** Rudy Martinez
### **Last Updated:** February 4, 2021

<br>

### **Project Steps**
#### **1.** Scrape S&P 500 companies from Wikipedia and create a CSV file with the scraped company information
#### **2.** Create a dataframe from the CSV and randomly select 1 security (ticker symbol)
#### **3.** Scrape the Yahoo Finance site for key statistics, financial statements and stock price history for the selected security 
#### **4.** Create a multi-layered stock screen to determine the company's financial strength
#### **5.** Scrape Yahoo News articles for language associated with "positive" sentiment for the security
#### **6.** Can financial strength and positive sentiment predict stock price?

<br>

#### **Python Packages and Modules**

In [1]:
import random
import re
import json
import csv
from io import StringIO
from bs4 import BeautifulSoup
import requests
import pandas as pd

<br>

#### **1. Scrape S&P 500 Companies from Wikipedia**

In [2]:
#Acquires Wikipedia page content for S&P500 companies
wiki_url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
response_1 = requests.get(wiki_url)
company_page_content = BeautifulSoup(response_1.text, 'html.parser')


#Stores the table with company information into the company_table variable
table_id = "constituents"
company_table = company_page_content.find('table', attrs={'id': table_id})


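#Creates a dataframe with company information and writes to csv
#(read_html returns a list of dataframes; StringIO avoids the deprecation warning for literal HTML strings)
df = pd.read_html(StringIO(str(company_table)))
df[0].to_csv('S&P500 Company Information.csv', index=False)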
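
`pd.read_html` depends on an optional parser such as `lxml`. As a minimal sketch of what the cell above does, the same kind of dataframe can be built with BeautifulSoup alone. The HTML below is a hypothetical two-row stand-in for the Wikipedia table, and the names `sample_html` and `sample_df` are assumptions for illustration only.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the Wikipedia table with id "constituents"
sample_html = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M</td></tr>
  <tr><td>IFF</td><td>International Flavors &amp; Fragrances</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
table = soup.find('table', attrs={'id': 'constituents'})

# find() returns None when the id is missing, e.g. after a page redesign
if table is None:
    raise ValueError('No table with id "constituents" was found')

# The first row holds the column names; the remaining rows hold the data
rows = table.find_all('tr')
headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
records = [
    [td.get_text(strip=True) for td in row.find_all('td')]
    for row in rows[1:]
]

sample_df = pd.DataFrame(records, columns=headers)
```

Checking for `None` before parsing gives a clearer error than the one raised later if the table lookup silently fails.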

<br>

#### **2. Randomly Select 1 Company Ticker Symbol**

In [42]:
#Creates a new dataframe using the CSV that was generated
csv_df = pd.read_csv('S&P500 Company Information.csv')


#Creates a list of companies
company_list = csv_df['Symbol'].to_list()


#Randomly select 1 company from company_list
random_company = random.sample(company_list,1)


#Establishes the randomly selected stock variable as a string
stock = ''.join(random_company)

random_company

['IFF']
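
Because the selection is random, each run of the notebook may analyze a different company. If a repeatable run is wanted, one option is a seeded generator, sketched below. The ticker list and the seed value are hypothetical, and `random.choice` is used here in place of `random.sample` because it returns the ticker as a string directly.

```python
import random

# Hypothetical subset of tickers standing in for the full company_list
company_list = ['MMM', 'IFF', 'AAPL', 'MSFT']

# A dedicated generator with a fixed seed (the value 42 is arbitrary)
# yields the same selection on every run
rng = random.Random(42)

# choice() returns a single element, so no ''.join() step is needed
stock = rng.choice(company_list)
```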

<br>

#### **3. Scrape Yahoo Finance for Financial Statements** 

In [47]:
#Establishes the URL template for the Financial Statements page
url_financial_statements = 'https://finance.yahoo.com/quote/{}/financials?p={}'


#Establishes the randomly selected stock variable as a string
stock = ''.join(random_company)


#Acquires company financials page content
response_2 = requests.get(url_financial_statements.format(stock, stock))
fin_content = BeautifulSoup(response_2.text, 'html.parser')


#Creates a regular expression that matches the ' -- Data -- ' marker inside the script tag holding the financials
pattern = re.compile(r'\s--\sData\s--\s')
script_data = fin_content.find('script', string=pattern).contents[0]


#Finds the boundaries of the slice where the financials are stored
start = script_data.find("context")-2
json_data = json.loads(script_data[start:-12])


#Dictionary keys of financials to help create variables
key_list = json_data['context']['dispatcher']['stores']['QuoteSummaryStore'].keys()


#Creates financial statements variables
annual_is = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['incomeStatementHistory']['incomeStatementHistory']
annual_cf = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['cashflowStatementHistory']['cashflowStatements']
annual_bs = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['balanceSheetHistory']['balanceSheetStatements']


#Extracts the 'raw' value of each line item, skipping entries that have none
def extract_raw_values(statements):
    extracted = []
    for line in statements:
        statement = {}
        for key, val in line.items():
            try:
                statement[key] = val['raw']
            except (TypeError, KeyError):
                continue
        extracted.append(statement)
    return extracted


#Consolidates financial statement variables into lists
annual_inc_statement = extract_raw_values(annual_is)
annual_cf_statement = extract_raw_values(annual_cf)
annual_bs_statement = extract_raw_values(annual_bs)
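
The slice boundaries used to load the JSON in this cell are easiest to follow on a small example. The sketch below builds a hypothetical script string, assuming the JSON object is followed by a 12-character wrapper (`;\n}(this));\n`), and applies the same `find("context") - 2` and `[:-12]` arithmetic. The payload values and the `root.App.main` prefix are made up for illustration.

```python
import json

# Hypothetical payload shaped like the nested structure navigated above
payload = {
    'context': {'dispatcher': {'stores': {'QuoteSummaryStore': {
        'incomeStatementHistory': {'incomeStatementHistory': [
            {'totalRevenue': {'raw': 1000, 'fmt': '1k'}, 'maxAge': 1},
        ]}
    }}}}
}

# Assumed wrapper: JavaScript before the JSON and a 12-character tail after it
prefix = '\n(function (root) {\n/* -- Data -- */\nroot.App.main = '
suffix = ';\n}(this));\n'
script_data = prefix + json.dumps(payload) + suffix

# find() lands on the 'c' of "context"; stepping back 2 characters
# reaches the opening brace that precedes the quoted key
start = script_data.find('context') - 2

# Dropping the last 12 characters removes the assumed JavaScript tail
json_data = json.loads(script_data[start:-12])

store = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']
first_line = store['incomeStatementHistory']['incomeStatementHistory'][0]
```

Hard-coded offsets like these break as soon as the page wrapper changes, so a `json.JSONDecodeError` from this step usually points to a layout change rather than a problem with the ticker.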

<br>

#### **3. Scrape Yahoo Finance for Stock Price History** 

In [44]:
#Acquires stock price history for the selected stock
stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'


#Parameters for 5 years of daily stock price history
params = {
    'range': '5y',
    'interval': '1d',
    'events':'history'
}


#Acquire the data from the page given the above params
response_3 = requests.get(stock_url.format(stock), params=params)


#Puts the stock price data into a list of rows
price_file = StringIO(response_3.text)
reader = csv.reader(price_file)
data = list(reader)


#Creates a stock price dataframe, using the first row as column names, and writes it to CSV
price_df = pd.DataFrame(data[1:], columns=data[0])

price_df.to_csv(f'{stock} Stock Price - 5 Year Historical.csv', index=False)
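
Rows read with `csv.reader` arrive as strings, so prices need converting before any arithmetic. As an alternative sketch, `pd.read_csv` can parse the downloaded text directly and infer numeric and date types. The CSV text below is hypothetical sample data laid out in the columns a price history download is assumed to have.

```python
from io import StringIO
import pandas as pd

# Hypothetical price history text; the values are made up
sample_csv = (
    'Date,Open,High,Low,Close,Adj Close,Volume\n'
    '2021-02-01,100.0,102.5,99.5,101.0,101.0,1500000\n'
    '2021-02-02,101.0,103.0,100.5,102.5,102.5,1200000\n'
)

# StringIO lets read_csv treat the text as a file; dates become the index
sample_prices = pd.read_csv(
    StringIO(sample_csv),
    parse_dates=['Date'],
    index_col='Date',
)

# Numeric columns are ready for arithmetic without further conversion
daily_range = sample_prices['High'] - sample_prices['Low']
```

With a real response, `StringIO(response_3.text)` would take the place of `StringIO(sample_csv)`.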

<br>

#### **3. Scrape Yahoo Finance for Stock Statistics** 

In [45]:
#Reads all tables from the Key Statistics page into a list of dataframes
stats = pd.read_html(f'https://finance.yahoo.com/quote/{stock}/key-statistics?p={stock}')


#Creates a dataframe from the first table of statistics
key_stats = stats[0]
stats_df = pd.DataFrame(key_stats)

#Writes the dataframe to csv
stats_df.to_csv(f'{stock} Statistics.csv')