# yFinance Data Extraction

Task:

- Install a data package of your choice (e.g. yfinance) and extract the tickers ("Symbol") of the relevant S&P 500 companies.
- Create a pandas DataFrame with daily closing prices for all relevant companies for the test period 2015-2022.
- Consider which data you want to use for training and analysing your approach.

## Imports

In [13]:
import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import plotly.offline as pyo
pyo.init_notebook_mode(connected=True)
pd.options.plotting.backend = "plotly"

## Pandas-datareader & yFinance

In [93]:
end = dt.datetime(2022,12,31)
start = dt.datetime(2015,1,1)

print("Start:", start, "|", "End:", end)

Start: 2015-01-01 00:00:00 | End: 2022-12-31 00:00:00


### Extracting the S&P 500 tickers

In [3]:
# Read and print the stock tickers that make up S&P500
tickers = pd.read_html(
    'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
tickers.head()

| | Symbol | Security | GICS Sector | GICS Sub-Industry | Headquarters Location | Date added | CIK | Founded |
|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1957-03-04 | 66740 | 1902 |
| 1 | AOS | A. O. Smith | Industrials | Building Products | Milwaukee, Wisconsin | 2017-07-26 | 91142 | 1916 |
| 2 | ABT | Abbott | Health Care | Health Care Equipment | North Chicago, Illinois | 1957-03-04 | 1800 | 1888 |
| 3 | ABBV | AbbVie | Health Care | Biotechnology | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| 4 | ACN | Accenture | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
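
One possible reading of "relevant" companies is those that were already index members when the test period starts. The sketch below filters on the `Date added` column shown in the table above. The helper name `members_before` and the small sample frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def members_before(constituents: pd.DataFrame, cutoff: str) -> list:
    """Return symbols whose 'Date added' is on or before `cutoff`.

    Rows with a missing or unparseable date are excluded
    (an assumption of this sketch).
    """
    added = pd.to_datetime(constituents["Date added"], errors="coerce")
    mask = added <= pd.Timestamp(cutoff)
    return constituents.loc[mask, "Symbol"].to_list()

# Small sample mirroring the first rows of the Wikipedia table
sample = pd.DataFrame({
    "Symbol": ["MMM", "AOS", "ABT", "ABBV", "ACN"],
    "Date added": ["1957-03-04", "2017-07-26", "1957-03-04",
                   "2012-12-31", "2011-07-06"],
})
early_members = members_before(sample, "2015-01-01")
print(early_members)
```

Index membership at a cutoff date is not the same as the listing date: a stock added to the index later may still have price history for the whole period, so this filter is only one of several reasonable choices.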


### Pandas DataFrame with data from yFinance

In [65]:
# Get the data for these tickers from Yahoo Finance.
# auto_adjust=True returns prices adjusted for splits and dividends.
data = yf.download(tickers.Symbol.to_list(), start=start, end=end, auto_adjust=True)['Close']
data.head()
data.to_csv("sp500_data.csv")

[*********************100%%**********************]  503 of 503 completed

4 Failed downloads:
['BF.B']: Exception('%ticker%: No price data found, symbol may be delisted (1d 2015-01-01 00:00:00 -> 2022-12-31 00:00:00)')
['BRK.B']: Exception('%ticker%: No timezone found, symbol may be delisted')
['KVUE', 'VLTO']: Exception("%ticker%: Data doesn't exist for startDate = 1420088400, endDate = 1672462800")
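
Two of the failures above (`BF.B`, `BRK.B`) are consistent with Yahoo Finance writing share classes with a hyphen rather than a dot, while `KVUE` and `VLTO` simply have no data in the requested window. A minimal sketch of a cleanup, assuming that naming convention holds: `to_yahoo_symbol` and `drop_empty_columns` are hypothetical helpers, and the demo frame is synthetic rather than downloaded.

```python
import numpy as np
import pandas as pd

def to_yahoo_symbol(symbol: str) -> str:
    """Replace the share-class dot with a hyphen, e.g. 'BRK.B' -> 'BRK-B'."""
    return symbol.replace(".", "-")

def drop_empty_columns(prices: pd.DataFrame) -> pd.DataFrame:
    """Drop tickers that have no price at all in the selected period."""
    return prices.dropna(axis="columns", how="all")

symbols = [to_yahoo_symbol(s) for s in ["MMM", "BF.B", "BRK.B"]]
print(symbols)

# Synthetic stand-in for the downloaded closing prices
idx = pd.date_range("2015-01-02", periods=3, freq="B")
prices = pd.DataFrame(
    {"MMM": [1.0, 2.0, 3.0], "KVUE": [np.nan] * 3},
    index=idx,
)
cleaned = drop_empty_columns(prices)
print(cleaned.columns.to_list())
```

The normalized list could then be passed to `yf.download` in place of `tickers.Symbol.to_list()`, and `drop_empty_columns(data)` applied before writing the CSV.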


## Plot

In [72]:
def return_topx_data(statistic="mean", top_x=10, ascending=False):
    # Rank all tickers by the chosen summary statistic from describe()
    ranked = data.describe().T[statistic].sort_values(ascending=ascending)

    # Keep the first top_x tickers of the ranking
    top_x_tickers = ranked.head(top_x).index.to_list()
    if not ascending:
        print("Top", top_x, "Companies/Tickers with highest", statistic)
    else:
        print("Top", top_x, "Companies/Tickers with lowest", statistic)
    return data[top_x_tickers]

In [73]:
return_topx_data().plot(title="Top S&P 500 Stocks with highest mean price")

Top 10 Companies/Tickers with highest mean


In [74]:
return_topx_data(ascending=True).plot(title="Top S&P 500 Stocks with lowest mean price")

Top 10 Companies/Tickers with lowest mean
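
The same ranking by mean price can also be obtained without `describe()`, using `nlargest` / `nsmallest` on the column means. A minimal sketch on synthetic prices; `top_by_mean` is an illustrative name and takes the frame as an argument instead of reading the global `data`:

```python
import pandas as pd

def top_by_mean(prices: pd.DataFrame, n: int = 10, lowest: bool = False) -> pd.DataFrame:
    """Return the columns of `prices` with the n highest (or lowest) mean."""
    means = prices.mean()
    picked = means.nsmallest(n) if lowest else means.nlargest(n)
    return prices[picked.index.to_list()]

# Synthetic prices: column means are A=11, B=105, C=1.5
prices = pd.DataFrame({
    "A": [10.0, 12.0],
    "B": [100.0, 110.0],
    "C": [1.0, 2.0],
})
highest = top_by_mean(prices, n=2).columns.to_list()
lowest = top_by_mean(prices, n=2, lowest=True).columns.to_list()
print(highest, lowest)
```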


In [95]:
data["AAPL"].pct_change().plot(kind="hist", title="Histogram of daily percentage change, AAPL")
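
`pct_change()` computes the simple return between consecutive rows, `r_t = P_t / P_(t-1) - 1`, so the first value is always `NaN`. A small sketch with made-up prices (not AAPL data) that checks the built-in against the manual formula:

```python
import pandas as pd

# Made-up closing prices, for illustration only
prices = pd.Series([100.0, 110.0, 99.0], name="close")

# Built-in daily return
returns = prices.pct_change()

# Same quantity written out: today's price over yesterday's, minus one
manual = prices / prices.shift(1) - 1

print(returns)
```

Here the price rises by 10 % on the second day and falls by 10 % on the third; the histogram above shows the distribution of exactly these values for AAPL.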

---

- Source: https://gist.github.com/quantra-go-algo/ac5180bf164a7894f70969fa563627b2

---