# EURO STOXX 50 index scraper

This .ipynb file describes in more detail the first part of our project: scraping the data used for the analysis. First, the tickers of the companies currently included in the EURO STOXX 50 index are downloaded. Next, important financial data and ratios for those tickers are scraped from Yahoo Finance and compiled, together with the tickers, into two datasets.

## Packages


In [4]:
#Necessary packages for this part
!pip install yahoo_fin
!pip install requests_html



In [6]:
#Import packages
import yahoo_fin.stock_info as si #Get data
import pandas as pd #Data manipulation
from tqdm import tqdm #Make a progress bar because that's cool...
from bs4 import BeautifulSoup # to parse external data
import requests # to get data
from requests_html import HTMLSession
#import matplotlib.pyplot as plt
import plotly.express as px
import pickle

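### EURO STOXX 50 ticker scraper
Tickers are scraped from the Wikipedia page of the index, which is updated regularly whenever the composition of the index changes. The code is wrapped in a function so that it can be reused later (the Yahoo Finance functions call it when the ticker list needs to be refreshed). However, a simple for loop without a function would work just as well.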
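To show what the `find_parent("table")` / `find_all('tr')[1:]` chain extracts without requesting the live Wikipedia page, here is a minimal, dependency-free sketch. It swaps BeautifulSoup for the standard library's `html.parser` and runs on a small made-up table. The class `FirstColumnParser` and the sample HTML are hypothetical; the only assumption carried over from `getEURSTX50tickers()` is that the ticker sits in the first `<td>` of each row, while the header row uses `<th>` cells and is therefore skipped.

```python
from html.parser import HTMLParser


class FirstColumnParser(HTMLParser):
    """Hypothetical helper: collect the text of the first <td> in every table row."""

    def __init__(self):
        super().__init__()
        self.rows = []            # first-cell text of every data row
        self._in_row = False
        self._cell_index = -1     # index of the current <td> within its row
        self._in_first_td = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self._cell_index = -1
        elif tag == "td" and self._in_row:
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_first_td = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_first_td:
            # Same clean-up as .text.strip() in the notebook
            self.rows.append("".join(self._buffer).strip())
            self._in_first_td = False
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until </td>
        if self._in_first_td:
            self._buffer.append(data)


# Made-up table mimicking the structure the notebook expects
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>Ticker</th><th>Company</th></tr>
    <tr><td>ADS.DE</td><td>Adidas</td></tr>
    <tr><td> ASML.AS </td><td>ASML Holding</td></tr>
  </tbody>
</table>
"""

parser = FirstColumnParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)  # ['ADS.DE', 'ASML.AS']
```

BeautifulSoup remains the more convenient choice for the real page; the sketch only makes the row-by-row, first-cell logic explicit.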

In [7]:
def getEURSTX50tickers():
    #Get the source code of the Wikipedia page of the index
    resp = requests.get('https://en.wikipedia.org/wiki/EURO_STOXX_50')
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, 'lxml')
    #Locate the table containing the "Ticker" header and skip its header row
    tableEURSTX = soup.find(string="Ticker").find_parent("table").find('tbody').find_all('tr')[1:]

    #Simply iterate over the rows in tableEURSTX, get the ticker as text and store it
    EURSTX_tickers = []
    for row in tableEURSTX:
        ticker1 = row.find_all('td')[0].text.strip()
        EURSTX_tickers.append(ticker1)
    #Store the list as a byte stream using pickle; it is later used when processing Yahoo Finance info
    with open("EURSTX50tickers.pickle", "wb") as f:
        pickle.dump(EURSTX_tickers, f)
    return EURSTX_tickers

ER_tickers = getEURSTX50tickers()
ER_tickers
#stored tickers

['ADS.DE',
 'ADYEN.AS',
 'AD.AS',
 'AI.PA',
 'AIR.PA',
 'ALV.DE',
 'ABI.BR',
 'ASML.AS',
 'CS.PA',
 'BAS.DE',
 'BAYN.DE',
 'BBVA.MC',
 'SAN.MC',
 'BMW.DE',
 'BNP.PA',
 'CRG.IR',
 'DAI.DE',
 'BN.PA',
 'DB1.DE',
 'DPW.DE',
 'DTE.DE',
 'ENEL.MI',
 'ENI.MI',
 'EL.PA',
 'FLTR.IR',
 'IBE.MC',
 'ITX.MC',
 'IFX.DE',
 'INGA.AS',
 'ISP.MI',
 'KER.PA',
 'KNEBV.HE',
 'OR.PA',
 'LIN.DE',
 'MC.PA',
 'MUV2.DE',
 'RI.PA',
 'PHIA.AS',
 'PRX.AS',
 'SAF.PA',
 'SAN.PA',
 'SAP.DE',
 'SU.PA',
 'SIE.DE',
 'STLA.MI',
 'TTE.PA',
 'UMG.AS',
 'DG.PA',
 'VOW.DE',
 'VNA.DE']
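`getEURSTX50tickers()` pickles the ticker list so that the Yahoo Finance functions can load it instead of scraping Wikipedia again. The sketch below shows that caching pattern in isolation. `load_tickers` and `fake_scrape` are hypothetical stand-ins, and unlike the notebook's functions, this version also scrapes when the pickle file does not exist yet (with `reload_EURSTX50=False`, the notebook's functions raise `FileNotFoundError` in that case).

```python
import os
import pickle
import tempfile


def load_tickers(path, scrape, reload=False):
    """Hypothetical helper: return cached tickers from `path`, calling `scrape()` only when needed.

    `scrape` stands in for getEURSTX50tickers() and must return a list of tickers.
    """
    if reload or not os.path.exists(path):
        tickers = scrape()
        with open(path, "wb") as f:
            pickle.dump(tickers, f)   # cache for the next call
        return tickers
    with open(path, "rb") as f:
        return pickle.load(f)         # reuse the cached list


calls = []


def fake_scrape():
    """Hypothetical scraper that records how often it is called."""
    calls.append(1)
    return ["ADS.DE", "SAP.DE"]


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "EURSTX50tickers.pickle")
    first = load_tickers(cache, fake_scrape)    # file missing: scrape and write the pickle
    second = load_tickers(cache, fake_scrape)   # file present: read it back
    print(first == second, len(calls))          # True 1
```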

### Yahoo Finance 
Stats and ratios are downloaded using two functions from the yahoo_fin package. Unfortunately, Yahoo Finance seems to maintain less complete information on European stocks than on stocks listed on the NYSE or NASDAQ. As a result, while the exchanges are closed, Yahoo Finance sometimes shows the financial ratios of European stocks as N/A (unlike US stocks, whose values are always shown, even though they do not change while the exchange is closed). This caused issues with our code, since the loop could not match a ticker with its data from Yahoo Finance. We solved this by wrapping each request in a try/except block that skips the failing ticker. However, for the best results we recommend running the code while the exchanges are open, because every ticker without available information is omitted.
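A `try`/`except` that just passes drops tickers silently, so afterwards it is hard to tell which companies are missing. Below is a minimal sketch, assuming a generic `fetch(ticker)` callable in place of `si.get_stats_valuation`, that keeps the successful results and records each skipped ticker together with its error. `fetch_all`, `fake_fetch` and the ticker `MISSING.XX` are hypothetical.

```python
def fetch_all(tickers, fetch):
    """Hypothetical helper: call fetch(ticker) for each ticker, keep successes, record failures."""
    results, failed = {}, {}
    for ticker in tickers:
        try:
            results[ticker] = fetch(ticker)
        except Exception as exc:  # narrower than a bare `except:`
            failed[ticker] = repr(exc)
    return results, failed


def fake_fetch(ticker):
    """Stand-in for a Yahoo Finance request; simulates a ticker without data."""
    if ticker == "MISSING.XX":
        raise ValueError("no valuation table")
    return {"Trailing P/E": "12.3"}


ok, skipped = fetch_all(["ADS.DE", "MISSING.XX", "SAP.DE"], fake_fetch)
print(sorted(ok))      # ['ADS.DE', 'SAP.DE']
print(list(skipped))   # ['MISSING.XX']
```

Printing `skipped` after a run shows at a glance whether the data was collected while the exchanges were closed.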

In [8]:
#First function to get Yahoo Finance data: valuation measures
def valuation_measures(reload_EURSTX50=False):
    #Ticker reload argument: re-scrape the tickers, or load them from the pickle file
    if reload_EURSTX50:
        EURSTX_tickers = getEURSTX50tickers()
    else:
        with open("EURSTX50tickers.pickle","rb") as f:
            EURSTX_tickers = pickle.load(f)
    #Loop to match tickers with Yahoo Finance valuation ratios, omitting tickers with no data on YF
    ticker_stats = {}
    for ticker in EURSTX_tickers:
        try:
            df = si.get_stats_valuation(ticker)
            df = df.iloc[:,:2]
            df.columns = ["Attribute", "Recent"]
            ticker_stats[ticker] = df
        except Exception:
            pass
    #Adjust the dataframe and save it as .csv
    dat = pd.concat(ticker_stats)
    dat = dat.reset_index()
    dat = dat.dropna()
    del dat["level_1"]
    dat.columns = ["Ticker", "Attribute", "Recent"]
    dat.to_csv('df1.csv')
    return dat
    

In [9]:
df1 = valuation_measures()
df1


| | Ticker | Attribute | Recent |
|---|---|---|---|
| 0 | ADS.DE | Market Cap (intraday) | 46.54B |
| 1 | ADS.DE | Enterprise Value | 47.38B |
| 2 | ADS.DE | Trailing P/E | 32.69 |
| 3 | ADS.DE | Forward P/E | 24.94 |
| 4 | ADS.DE | PEG Ratio (5 yr expected) | 0.59 |
| ... | ... | ... | ... |
| 444 | VNA.DE | Forward P/E | 19.57 |
| 446 | VNA.DE | Price/Sales (ttm) | 9.45 |
| 447 | VNA.DE | Price/Book (mrq) | 1.41 |
| 448 | VNA.DE | Enterprise Value/Revenue | 11.12 |
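The reshaping at the end of `valuation_measures()` relies on how `pd.concat` treats a dict of DataFrames: the dict keys become the outer index level and each frame's own row numbers become the inner one, which `reset_index()` turns into the columns `level_0` and `level_1`. The toy example below uses made-up tickers and values. It also shows that `dropna()` keeps the original row labels, which is why the printed index skips numbers such as 445 and why the result has fewer rows (376) than the last index suggests.

```python
import pandas as pd

# Two tiny frames standing in for si.get_stats_valuation() output (made-up values)
frames = {
    "AAA.DE": pd.DataFrame({"Attribute": ["Trailing P/E", "Forward P/E"],
                            "Recent": ["10.0", None]}),
    "BBB.PA": pd.DataFrame({"Attribute": ["Trailing P/E", "Forward P/E"],
                            "Recent": ["20.0", "18.0"]}),
}

dat = pd.concat(frames)          # dict keys become the outer index level
dat = dat.reset_index()          # index levels become columns level_0 and level_1
print(list(dat.columns))         # ['level_0', 'level_1', 'Attribute', 'Recent']
dat = dat.dropna()               # drops the AAA.DE row with a missing value; labels are kept
del dat["level_1"]               # the per-ticker row number is not needed
dat.columns = ["Ticker", "Attribute", "Recent"]
print(dat)
```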


In [10]:
#Second function to get data, same principle as the first function. This function gets more statistics from yahoo_fin
def extra_stats(reload_EURSTX50=False):
    #Ticker reload argument: re-scrape the tickers, or load them from the pickle file
    if reload_EURSTX50:
        EURSTX_tickers = getEURSTX50tickers()
    else:
        with open("EURSTX50tickers.pickle","rb") as f:
            EURSTX_tickers = pickle.load(f)
    #Loop over the loaded tickers with a progress bar, omitting tickers with no data on YF
    ticker_extra_stats = {}
    for ticker in tqdm(EURSTX_tickers):
        try:
            ticker_extra_stats[ticker] = si.get_stats(ticker)
        except Exception:
            pass
    #Adjust the dataframe and save it as .csv
    dat2 = pd.concat(ticker_extra_stats)
    dat2 = dat2.reset_index()
    dat2 = dat2.dropna()
    del dat2["level_1"]
    dat2.columns = ["Ticker", "Attribute", "Value"]
    dat2.to_csv('df2.csv')
    return dat2

In [11]:
df2 = extra_stats()
df2


| | Ticker | Attribute | Value |
|---|---|---|---|
| 0 | ADS.DE | Beta (5Y Monthly) | 0.87 |
| 1 | ADS.DE | 52-Week Change 3 | -13.79% |
| 2 | ADS.DE | S&P500 52-Week Change 3 | 14.94% |
| 3 | ADS.DE | 52 Week High 3 | 336.25 |
| 4 | ADS.DE | 52 Week Low 3 | 231.55 |
| ... | ... | ... | ... |
| 2545 | VNA.DE | Total Debt/Equity (mrq) | 162.57 |
| 2546 | VNA.DE | Current Ratio (mrq) | 0.39 |
| 2547 | VNA.DE | Book Value Per Share (mrq) | 47.85 |
| 2548 | VNA.DE | Operating Cash Flow (ttm) | 1.44B |


In [12]:
print(df1)

     Ticker                  Attribute  Recent
0    ADS.DE      Market Cap (intraday)  46.54B
1    ADS.DE           Enterprise Value  47.38B
2    ADS.DE               Trailing P/E   32.69
3    ADS.DE                Forward P/E   24.94
4    ADS.DE  PEG Ratio (5 yr expected)    0.59
..      ...                        ...     ...
444  VNA.DE                Forward P/E   19.57
446  VNA.DE          Price/Sales (ttm)    9.45
447  VNA.DE           Price/Book (mrq)    1.41
448  VNA.DE   Enterprise Value/Revenue   11.12
449  VNA.DE    Enterprise Value/EBITDA    4.13

[376 rows x 3 columns]


In [13]:
print(df2)

      Ticker                     Attribute    Value
0     ADS.DE             Beta (5Y Monthly)     0.87
1     ADS.DE              52-Week Change 3  -13.79%
2     ADS.DE       S&P500 52-Week Change 3   14.94%
3     ADS.DE                52 Week High 3   336.25
4     ADS.DE                 52 Week Low 3   231.55
...      ...                           ...      ...
2545  VNA.DE       Total Debt/Equity (mrq)   162.57
2546  VNA.DE           Current Ratio (mrq)     0.39
2547  VNA.DE    Book Value Per Share (mrq)    47.85
2548  VNA.DE     Operating Cash Flow (ttm)    1.44B
2549  VNA.DE  Levered Free Cash Flow (ttm)    8.66B

[2073 rows x 3 columns]
