## Gather Company Info
Other code in this repository gathers OHLC data and calculates the value of portfolio holdings at different dates. To balance a portfolio, however, we also need information about each company, such as its sector and industry.

The following code pulls the unique holdings currently in the portfolio, obtains basic information for each company, and writes this data to a SQL database.

In [95]:
# necessary when running from JupyterLab, e.g. in a Docker image
# !pip install sqlalchemy_utils psycopg2-binary yfinance

In [148]:
import pandas as pd
import numpy as np
from datetime import date
import yfinance as yf

from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
import psycopg2

# connect to Docker SQL database
# (uncomment and fill in the masked credentials; later cells rely on `engine`)
#engine = create_engine('postgresql+psycopg2://postgres:########@###.###.##.##:54320/finance')

#### Get a list of the S&P 500 companies

In [99]:
import bs4 as bs
import pickle
import requests

def getSP500():
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
        
    with open("sp500tickers.pickle","wb") as f:
        pickle.dump(tickers,f)
        
    return tickers

sp = getSP500()

# strip the trailing newline that each scraped table cell carries
sp = [ele.replace('\n', '') for ele in sp]

# fix Berkshire and others (the period can't be read, a hyphen is needed, e.g. BRK.B -> BRK-B)
# a single list comprehension is enough; no outer loop is required
sp = [s.replace('.', '-') for s in sp]

# find problematic strings (symbols that contain a hyphen)
matching = [s for s in sp if "-" in s]

# remove the problems
sp = [s for s in sp if s not in matching]

# check what is returned
sp[0:20]

['MMM',
 'ABT',
 'ABBV',
 'ABMD',
 'ACN',
 'ATVI',
 'ADBE',
 'AMD',
 'AAP',
 'AES',
 'AFL',
 'A',
 'APD',
 'AKAM',
 'ALK',
 'ALB',
 'ARE',
 'ALXN',
 'ALGN',
 'ALLE']
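
The cleanup steps above (strip the newline, swap the period for a hyphen) can be folded into one small helper. This is only a sketch: `normalize_ticker` is a hypothetical name that does not appear in the notebook, the sample symbols are illustrative, and unlike the cell above it keeps hyphenated symbols instead of dropping them.

```python
def normalize_ticker(raw):
    """Strip surrounding whitespace and turn periods into hyphens (BRK.B -> BRK-B)."""
    return raw.strip().replace('.', '-')


# illustrative raw values, shaped like the text scraped from the table cells
raw_tickers = ['MMM\n', 'BRK.B\n', 'BF.B\n', 'ABT\n']
cleaned = [normalize_ticker(t) for t in raw_tickers]
print(cleaned)
```

Whether to keep or drop the hyphenated symbols afterwards is a separate decision; the helper only normalizes the text.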

#### Get a list of equities in the portfolio

In [105]:
# read Postgresql data into python as Pandas df
stocks = pd.read_sql_table(table_name = 'equities', schema='public', con=engine)
portfolio = list(stocks['ticker'])
portfolio[0:20]

['AAPL',
 'ABBV',
 'ADM',
 'AFL',
 'AFRM',
 'AQN',
 'BABA',
 'BRK-B',
 'BX',
 'CAG',
 'CASY',
 'CHRW',
 'CMI',
 'COST',
 'CTVA',
 'CURLF',
 'CVNA',
 'DG',
 'ELY',
 'ETSY']

In [110]:
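# get the equity symbols that appear in exactly one of the S&P 500 list or our portfolio
# (symmetric difference: symbols present in both lists, such as ABBV and AFL, are excluded)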
# https://stackoverflow.com/questions/28444561/get-only-unique-elements-from-two-lists
symbols_list = list(set(sp).symmetric_difference(set(portfolio)))
symbols_list.sort()
symbols_list[0:20]

['A',
 'AAL',
 'AAP',
 'ABC',
 'ABMD',
 'ABT',
 'ACN',
 'ADBE',
 'ADI',
 'ADP',
 'ADSK',
 'AEE',
 'AEP',
 'AES',
 'AFRM',
 'AIG',
 'AIZ',
 'AJG',
 'AKAM',
 'ALB']
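
`symmetric_difference` keeps only the symbols found in exactly one of the two lists, which is why ABBV and AFL (present in both lists shown earlier) are missing from the output above. If the intent is every symbol that appears in either list, a union is the operation to reach for. A small sketch with made-up subsets of both lists:

```python
# illustrative subsets only, not the full index or the full portfolio
sp_sample = ['A', 'ABBV', 'AFL', 'MMM']
portfolio_sample = ['ABBV', 'AFL', 'AFRM']

# symbols in exactly one list
only_one = sorted(set(sp_sample).symmetric_difference(portfolio_sample))
# symbols in at least one list
either = sorted(set(sp_sample) | set(portfolio_sample))

print(only_one)  # ['A', 'AFRM', 'MMM']
print(either)    # ['A', 'ABBV', 'AFL', 'AFRM', 'MMM']
```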

In [144]:
# compare the current holdings with the existing info table in SQL;
# keep only the tickers that still need to be added to equity_info
missing_tickers = pd.read_sql("""
    SELECT ticker AS missing FROM public.equities
    EXCEPT
    SELECT symbol AS missing FROM public.equity_info
""", con=engine)
missing_tickers = list(missing_tickers['missing'])
missing_tickers

['PPG', 'CASY']
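
The `EXCEPT` query returns the tickers that exist in `equities` but have no matching `symbol` in `equity_info`. The sketch below reproduces that behaviour on an in-memory SQLite database so it can be run without the Postgres instance. The table contents are made up, and the `public` schema prefix is dropped because SQLite does not use it.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE equities (ticker TEXT);
    CREATE TABLE equity_info (symbol TEXT);
    INSERT INTO equities VALUES ('AAPL'), ('CASY'), ('PPG');
    INSERT INTO equity_info VALUES ('AAPL');
""")

# tickers held in the portfolio that have no row in the info table yet
rows = conn.execute(
    "SELECT ticker FROM equities EXCEPT SELECT symbol FROM equity_info ORDER BY 1"
).fetchall()
missing = [r[0] for r in rows]
print(missing)
conn.close()
```

Because `EXCEPT` is a set operation, each missing ticker appears once even if it occurs several times in `equities`.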

#### Get basic info for each company and save to SQL

In [113]:
def yahooInfo(symbols_list):

    # start with empty dataframe
    info_df = pd.DataFrame()
    
    # grab Yahoo info on each company
    for i in symbols_list:
        try:
            df = pd.DataFrame.from_dict(yf.Ticker(i).info, orient='index').T
            df = df.set_index('symbol')

            # concat keeps the union of columns (a full join), since the fields do not always match
            info_df = pd.concat([info_df, df])

        except Exception as err:
            # report the failure and continue with the next symbol
            print('Error obtaining info for ' + str(i) + ': ' + str(err))
            
    # reduce to only columns of interest
    info_cols = ['shortName','longName','exchange','market','sector','industry','quoteType','longBusinessSummary','country','city','state','zip']
    info_df = info_df[info_cols]
            
    return info_df
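
Since the fields returned for each company do not always match, one option is to pick the fields of interest from each record before building the frame, filling in `None` where a field is absent. The following is a sketch under assumptions: `pick_fields` is a hypothetical helper, the field list is shortened, and the sample record is made up to stand in for a `yf.Ticker(...).info` dictionary.

```python
# shortened field list for the example
INFO_COLS = ['shortName', 'longName', 'sector', 'industry']


def pick_fields(info, fields=INFO_COLS):
    """Return only the requested fields, using None where a key is absent."""
    return {name: info.get(name) for name in fields}


# made-up record standing in for a yf.Ticker(...).info dict
sample = {'symbol': 'XYZ', 'shortName': 'Example Corp', 'sector': 'Technology'}
row = pick_fields(sample)
print(row)
```

Every record then has the same keys, so the later column selection does not depend on which fields happened to be returned.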

In [145]:
df = yahooInfo(missing_tickers)
df

| symbol | shortName | longName | exchange | market | sector | industry | quoteType | longBusinessSummary | country | city | state | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PPG | PPG Industries, Inc. | PPG Industries, Inc. | NYQ | us_market | Basic Materials | Specialty Chemicals | EQUITY | PPG Industries, Inc. manufactures and distribu... | United States | Pittsburgh | PA | 15272 |
| CASY | Caseys General Stores, Inc. | Casey's General Stores, Inc. | NMS | us_market | Consumer Defensive | Grocery Stores | EQUITY | Casey's General Stores, Inc., together with it... | United States | Ankeny | IA | 50021 |


In [146]:
# when finished, write the data to the SQL database
df.to_sql(name = 'equity_info', schema = 'public', con=engine, if_exists='append')

In [147]:
# check if it worked
pd.read_sql_table(table_name = 'equity_info', schema='public', con=engine)

|  | symbol | shortName | longName | exchange | market | sector | industry | quoteType | longBusinessSummary | country | city | state | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | Apple Inc. | Apple Inc. | NMS | us_market | Technology | Consumer Electronics | EQUITY | Apple Inc. designs, manufactures, and markets ... | United States | Cupertino | CA | 95014 |
| 1 | ABBV | AbbVie Inc. | AbbVie Inc. | NYQ | us_market | Healthcare | Drug Manufacturers—General | EQUITY | AbbVie Inc. discovers, develops, manufactures,... | United States | North Chicago | IL | 60064-6400 |
| 2 | ADM | Archer-Daniels-Midland Company | Archer-Daniels-Midland Company | NYQ | us_market | Consumer Defensive | Farm Products | EQUITY | Archer-Daniels-Midland Company procures, trans... | United States | Chicago | IL | 60601 |
| 3 | AFL | AFLAC Incorporated | Aflac Incorporated | NYQ | us_market | Financial Services | Insurance—Life | EQUITY | Aflac Incorporated, through its subsidiaries, ... | United States | Columbus | GA | 31999 |
| 4 | AFRM | Affirm Holdings, Inc. | Affirm Holdings, Inc. | NMS | us_market | Technology | Information Technology Services | EQUITY | Affirm Holdings, Inc. operates a platform for ... | United States | San Francisco | CA | 94108-2716 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73 | WM | Waste Management, Inc. | Waste Management, Inc. | NYQ | us_market | Industrials | Waste Management | EQUITY | Waste Management, Inc., through its subsidiari... | United States | Houston | TX | 77002 |
| 74 | WMT | Walmart Inc. | Walmart Inc. | NYQ | us_market | Consumer Defensive | Discount Stores | EQUITY | Walmart Inc. engages in the operation of retai... | United States | Bentonville | AR | 72716 |
| 75 | XYL | Xylem Inc. | Xylem Inc. | NYQ | us_market | Industrials | Specialty Industrial Machinery | EQUITY | Xylem Inc., together with its subsidiaries, en... | United States | Rye Brook | NY | 10573 |
| 76 | PPG | PPG Industries, Inc. | PPG Industries, Inc. | NYQ | us_market | Basic Materials | Specialty Chemicals | EQUITY | PPG Industries, Inc. manufactures and distribu... | United States | Pittsburgh | PA | 15272 |
