### **Name:** Python Automated Financial Analysis
### **University:** University of Texas at San Antonio
### **Class:** DA6223 - Data Tools & Techniques
### **Author:** Rudy Martinez
### **Last Updated:** February 4, 2021

<br>

### **Project Steps**
#### **1.** Scrape S&P 500 companies from Wikipedia and create a CSV file with the scraped company information
#### **2.** Create a dataframe from the CSV and randomly select 1 security (ticker symbol)
#### **3.** Scrape the Yahoo Finance site for key statistics, financial statements and stock price history for the selected security 
#### **4.** Create a multi-layered stock screen to determine company's financial strength (fundamental analysis via Piotroski F-Score)
#### **5.** Scrape Yahoo News Articles for langauge associated with "positive" sentiment for the security
#### **6.** Can financial strength and positive sentiment predict stock price?

<br>

#### **Python Packages and Modules**

In [85]:
import random
from bs4 import BeautifulSoup
import requests
import pandas as pd

<br>

#### **1. Scrape S&P 500 companies from Wikipedia and create a CSV file with the scraped company information**

In [86]:
#Acquires Wikipedia page content for S&P500 companies
wiki_url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
response_1 = requests.get(wiki_url)
company_page_content = BeautifulSoup(response_1.text, 'html.parser')


#Stores the table with company information into the company_table variable
table_id = "constituents"
company_table = company_page_content.find('table', attrs={'id': table_id})


#Creates a dataframe with company information and writes to csv
df = pd.read_html(str(company_table))
df[0].to_csv('00. S&P500 Company Information.csv')

<br>

#### **2. Create a dataframe from the CSV and randomly select 1 security (ticker symbol)**

In [87]:
#Reads the CSV that was generated
csv_df = pd.read_csv('00. S&P500 Company Information.csv')


#Creates a list of companies
company_list = csv_df['Symbol'].to_list()


#Randomly select 1 company from company_list
random_company = random.sample(company_list,1)


#Establishes the randomly selected stock variable as a string
stock = ''.join(random_company)

random_company

['RCL']

<br>

#### **3. Scrape the Yahoo Finance site for financial statements for the selected security**

In [88]:
#Funtion that makes all values numerical - will be needed in the below financial statement scraping
def conv_to_num(column):
    first_col = [val.replace(',','') for val in column]
    second_col = [val.replace('-','') for val in first_col]
    final_col = pd.to_numeric(second_col)
    
    return final_col

In [89]:
#Establishes the randomly selected stock variable as a string
stock = ''.join(random_company)


#Establishes URLs for Financial Statements and places them in a list
url_inc_statement = 'https://finance.yahoo.com/quote/{}/financials?p={}'
url_bs_statement = 'https://finance.yahoo.com/quote/{}/balance-sheet?p={}'
url_cf_statement = 'https://finance.yahoo.com/quote/{}/cash-flow?p={}'
url_list = [url_inc_statement, url_bs_statement, url_cf_statement]

statement_count = 0

for statement in url_list:
    #Acquires company financial statement page content 
    response_2 = requests.get(statement.format(stock, stock))
    fin_content = BeautifulSoup(response_2.text, 'html.parser')
    fin_data = fin_content.find_all('div', class_= 'D(tbr)')

    headers = []
    temp_list = []
    label_list = []
    final = []
    index = 0

    #Creates Headers for statement
    for item in fin_data[0].find_all('div', class_= 'D(ib)'):
        headers.append(item.text)


    #Statement Contents
    while index <= len(fin_data) - 1:
        temp = fin_data[index].find_all('div', class_= 'D(tbc)')
        for line in temp:
            temp_list.append(line.text)
        final.append(temp_list)
        temp_list = []
        index += 1
    
    
    #Places statement contents into a dataframe
    df = pd.DataFrame(final[1:])
    df.columns = headers
    
    
    #Makes all values numerical and removes na 
    for column in headers[1:]:
        df[column] = conv_to_num(df[column])
    
    final_df = df.fillna('-')
    
    
    #Used as a naming input for the csv export below
    statement_count += 1
    
    
    #Writes to csv for each financial statement
    if statement_count == 1:
        final_df.to_csv(f'01. {stock} Income Statement.csv')
    elif statement_count == 2:
        final_df.to_csv(f'02. {stock} Balance Sheet.csv')
    else:
        final_df.to_csv(f'03. {stock} Cash Flow Statement.csv')

<br>

#### **3. Scrape the Yahoo Finance site for stock price history for the selected security**

In [90]:
#Acquires stock price history for the selected stock
stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'


#Parameters for 5 yeares of stock history
params = {
    'range': '5y',
    'interval': '1d',
    'events':'history'
}


#Acquire the data from the page given the above params
response_3 = requests.get(stock_url.format(stock), params=params)


#Puts the stock price data into a list
price_file = StringIO(response_3.text)
reader = csv.reader(price_file)
data = list(reader)


#Creates a stock price data frame and write to CSV
price_df = pd.DataFrame(data)
price_df.to_csv(f'04. {stock} Stock Price - 5 Year Historical.csv')

<br>

#### **3. Scrape the Yahoo Finance site for key statistics for the selected security**

In [91]:
#Established url for Key Statistics
stats = pd.read_html(f'https://finance.yahoo.com/quote/{stock}/key-statistics?p={stock}')


#Create dataframe with statistics
key_stats = stats[0]
stats_df = pd.DataFrame(key_stats)

#Write the dataframe to csv
stats_df.to_csv(f'05. {stock} Statistics.csv')

<br>

#### **4. Create a multi-layered stock screen to determine company's financial strength (fundamental analysis via Piotroski F-score)**

In [135]:
#Reads in the financial statement CSVs that were generated as well as the years
income_statement = pd.read_csv(f'01. {stock} Income Statement.csv')
balance_sheet = pd.read_csv(f'02. {stock} Balance Sheet.csv')
cashflow_statement = pd.read_csv(f'03. {stock} Cash Flow Statement.csv')
years = list(income_statement.columns[3:6])

#Remove first column from dataframes
income_statement.drop(income_statement.columns[0], axis = 1, inplace = True)
balance_sheet.drop(balance_sheet.columns[0], axis = 1, inplace = True)
cashflow_statement.drop(cashflow_statement.columns[0], axis = 1, inplace = True)

#Initialize scoring trackers
profitability_score = 0
leverage_liquidity_score = 0
operating_efficiency_score = 0


#Profitability
# def profitability():
net_inc_cy = income_statement[(income_statement['Breakdown'] == "Net Income Common Stockholders")]
net_inc_cy

# #Leverage and Liquidity
# def leverage():

    
# #Operating Effeciency
# def operating_efficiency():

Unnamed: 0,Breakdown,ttm,12/31/2019,12/31/2018,12/31/2017
8,Net Income Common Stockholders,4157391.0,1878887.0,1811042.0,1625133.0
