# There is a wealth of online data sets

For the more business-minded attendees, let's switch briefly from flowers to stocks.

## Example: grabbing data from the Yahoo Finance API

"Yahoo_fin is a Python 3 package designed to scrape historical stock price data, as well as to provide current information on market caps, dividend yields, and which stocks comprise the major exchanges. Additional functionality includes scraping income statements, balance sheets, cash flows, holder information, and analyst data. The package includes the ability to scrape live (real-time) stock prices, capture cryptocurrency data, and get the most actively traded stocks on a current trading day. Yahoo_fin also contains a module for retrieving option prices and expiration dates." 

-- [yahoo_fin documentation](http://theautomatic.net/yahoo_fin-documentation/)

In [None]:
import yahoo_fin.stock_info as si

I will come back to the following libraries later, but include them now for a bit of fun.

In [None]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets

### Which companies comprise the Dow Jones Industrial Average?

* Use the Yahoo Finance API to access the Dow's stock ticker symbols.

In [None]:
dow_list = si.tickers_dow()
dow_list

The Dow comprises 30 companies -- check.

We can also access APIs on the web with the requests library.

"Requests is an elegant and simple HTTP library for Python, built for human beings.  Requests allows you to send HTTP/1.1 requests extremely easily." 
<br>-- https://requests.readthedocs.io/en/master/ 

In [None]:
# Look up the company name for a ticker symbol
def get_symbol(symbol):
    url = "http://d.yimg.com/autoc.finance.yahoo.com/autoc?query={}&region=1&lang=en".format(symbol)
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on an HTTP error
    for x in response.json()['ResultSet']['Result']:
        if x['symbol'] == symbol:
            return x['name']
    return symbol  # no exact match: fall back to the ticker itself
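
The lookup inside `get_symbol` can be checked without a network call. The sketch below runs the same matching loop over a hand-written dictionary that mimics the `ResultSet`/`Result` layout the function expects. The payload is made up for illustration and is not real API output, and `find_name` is a hypothetical helper name.

```python
# Hand-written stand-in for the JSON the function expects
# (illustrative values, not real API output).
sample_response = {
    "ResultSet": {
        "Query": "MSFT",
        "Result": [
            {"symbol": "MSFT", "name": "Example Software Corp."},
            {"symbol": "MSFT.XX", "name": "Example Software Corp. (other listing)"},
        ],
    }
}


def find_name(payload, symbol):
    """Return the name of the entry whose symbol matches exactly, else None."""
    for entry in payload["ResultSet"]["Result"]:
        if entry["symbol"] == symbol:
            return entry["name"]
    return None


print(find_name(sample_response, "MSFT"))
print(find_name(sample_response, "ZZZZ"))
```

Only an exact symbol match counts, so a query that returns related listings still resolves to the single requested ticker.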

In [None]:
get_symbol('MSFT')

In [None]:
for i in dow_list:
    print(i,'\t',get_symbol(i))

### What is the historical value of DJIA stocks over 2020 to date?

In [None]:
few_days = si.get_data('aapl', start_date = '01/01/2020', end_date = '08/17/2020')

In [None]:
type(few_days)

In [None]:
get_symbol('AAPL')

In [None]:
few_days.head()

It's easy to plot with Pandas.

In [None]:
few_days['high'].plot();

For more customization, Pandas uses matplotlib as a base library.

In [None]:
fig,ax = plt.subplots(figsize=(8,5))
few_days['high'].plot(ax=ax)  # draw on the axes created above
ax.set_title(get_symbol('AAPL'))
plt.show()
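
Passing the axes explicitly makes the link between pandas and matplotlib visible. This is a minimal sketch on a small made-up price series (the values and the title are placeholders, not market data). The `Agg` backend line is only an assumption so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily highs, indexed by date.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
prices = pd.DataFrame({"high": [10.0, 10.5, 10.2, 11.0, 11.3]}, index=dates)

fig, ax = plt.subplots(figsize=(8, 5))
prices["high"].plot(ax=ax)         # pandas draws onto the axes we created
ax.set_title("Placeholder Corp.")  # any matplotlib Axes method works from here
ax.set_ylabel("high")
fig.autofmt_xdate()                # tilt the date labels so they do not overlap
```

Once pandas has drawn onto `ax`, the usual matplotlib calls for titles, labels, and tick formatting apply unchanged.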

In [None]:
def plotstock(ticker='AAPL'):
    few_days = si.get_data(ticker, start_date = '01/01/2020', end_date = '08/17/2020')
    fig,ax = plt.subplots(figsize=(8,5))
    few_days['high'].plot(ax=ax)  # draw on the axes created above
    ax.set_title(get_symbol(ticker))
    plt.show()

# Build a dropdown of Dow tickers; the plot redraws on each selection
ipywidgets.interact(plotstock,ticker=dow_list);

### The matplotlib equivalent version

In [None]:
plt.plot(few_days.index,few_days.high)

In [None]:
fig,ax = plt.subplots(figsize=(8,5))
ax.plot(few_days.index,few_days.high)
fig.autofmt_xdate()

In [None]:
fig,ax = plt.subplots(figsize=(8,5))
ax.plot(few_days.index,few_days.high)
ax.set_title(get_symbol('AAPL'))
fig.autofmt_xdate()

In [None]:
def plotstock(ticker='AAPL'):
    few_days = si.get_data(ticker, start_date = '01/01/2020', end_date = '08/17/2020')
    fig,ax = plt.subplots(figsize=(8,5))
    ax.plot(few_days.index,few_days.high)
    ax.set_title(get_symbol(ticker))
    fig.autofmt_xdate()
    
ipywidgets.interact(plotstock,ticker=dow_list);

### Apple is one of the companies on the Dow: Who are the largest holders of Apple stock (by percentage)?

In [None]:
apple_holders = si.get_holders('aapl')
apple_holders.keys()

The call returns a dictionary that includes pandas DataFrames.

In [None]:
print(type(apple_holders))
print(type(apple_holders['Major Holders']))
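
Because the result is an ordinary dictionary, it can be walked like any other. The sketch below uses a hypothetical stand-in dictionary with made-up tables (the keys and values are placeholders, not the real `get_holders` output) to list each table's name and shape.

```python
import pandas as pd

# Stand-in for a dict of DataFrames; keys and contents are invented.
tables = {
    "Table A": pd.DataFrame({"Holder": ["Fund X", "Fund Y"],
                             "% Out": ["7.50%", "6.25%"]}),
    "Table B": pd.DataFrame({"Holder": ["Fund Z"],
                             "% Out": ["1.00%"]}),
}

# Map each table name to its (rows, columns) shape.
summary = {name: df.shape for name, df in tables.items()}
for name, shape in summary.items():
    print(f"{name}: {shape[0]} rows x {shape[1]} columns")
```

A quick overview like this helps decide which table to open before displaying each one in full.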

In [None]:
apple_holders['Major Holders']

In [None]:
apple_holders['Direct Holders (Forms 3 and 4)']

In [None]:
apple_holders['Top Institutional Holders']

### Do these holders own stock in other DJIA companies?

First, let's look just at the Vanguard Group, Inc.

In [None]:
comp = []
for i in dow_list:
    comp.append(get_symbol(i))
compdf = pd.DataFrame({'Company':comp})
compdf

In [None]:
icomp = 'Vanguard Group, Inc. (The)'
compdf[icomp] = 0.0  # float column; companies without a match stay at 0

for i in dow_list:

    gh = si.get_holders(i)

    if 'Direct Holders (Forms 3 and 4)' in gh.keys():
        ghdf = gh['Direct Holders (Forms 3 and 4)']
        if icomp in ghdf.values:
            # '% Out' holds strings ending in '%': strip the sign and convert
            compdf.loc[compdf['Company']==get_symbol(i),icomp] = float(
                ghdf[ghdf['Holder']==icomp]['% Out'].iloc[0].replace('%',''))

compdf
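
The percentage strings can also be converted for a whole column at once instead of one cell at a time. A minimal sketch, assuming the column holds strings that end in a percent sign; the holder names and numbers below are made up, and `share_of` is a hypothetical helper.

```python
import pandas as pd

# Made-up holders table with the same two column names used above.
holders = pd.DataFrame({
    "Holder": ["Fund X", "Fund Y", "Fund Z"],
    "% Out": ["7.50%", "6.25%", "1.00%"],
})

# Strip the trailing percent sign, then convert the whole column to floats.
holders["pct_out"] = holders["% Out"].str.rstrip("%").astype(float)


def share_of(df, holder):
    """Return one holder's share, falling back to 0.0 when the holder is absent."""
    match = df.loc[df["Holder"] == holder, "pct_out"]
    return float(match.iloc[0]) if not match.empty else 0.0


print(share_of(holders, "Fund X"))
print(share_of(holders, "Fund Q"))
```

Keeping the numeric column next to the original strings makes it easy to sort or plot without re-parsing.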

In [None]:
compdf.sort_values('Vanguard Group, Inc. (The)').plot.barh(x='Company',y='Vanguard Group, Inc. (The)',figsize=(7,7));

The top 6 holders of Apple:

In [None]:
top6 = apple_holders['Direct Holders (Forms 3 and 4)']['Holder'].values[:6]
top6

In [None]:
for h in top6:
    compdf[h] = 0.0  # float column so percentage values fit without a dtype change
    
for i in dow_list:
    gh = si.get_holders(i)
    if 'Direct Holders (Forms 3 and 4)' in gh.keys():
        compname = get_symbol(i)
        for h in top6:            
            if h in gh['Direct Holders (Forms 3 and 4)'].values:
                ghdf = gh['Direct Holders (Forms 3 and 4)']
                compdf.loc[compdf['Company']==compname,h] = float(
                    ghdf[ghdf['Holder']==h]['% Out'].iloc[0].replace('%',''))

In [None]:
compdf

In [None]:
compdf.sort_values('Vanguard Group, Inc. (The)',ascending=False)[:5].plot.barh(x='Company',figsize=(10,7),cmap='gist_rainbow')

In [None]:
compdf.sort_values('Vanguard Group, Inc. (The)').plot.bar(x='Company',
                figsize=(15,7),
                cmap='gist_rainbow')

In [None]:
compdf.sort_values('Vanguard Group, Inc. (The)')[8:].plot.bar(x='Company',
                figsize=(15,7),
                cmap='gist_rainbow')