# Financial Time Series
# Data Collection

## Objective

This notebook downloads historical stock data for IBEX 35 companies using Yahoo Finance.

The goal is to build a clean, reproducible dataset of daily prices and volumes, which will later be used for return calculations, modeling, and portfolio optimization.


In [1]:
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import warnings

## Define IBEX 35 Tickers

We use the list of IBEX 35 components (as of July 2025) from the English Wikipedia page, read with `pandas.read_html`.

In [2]:
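url = 'https://en.wikipedia.org/wiki/IBEX_35#Components'
# read_html (needs lxml, or beautifulsoup4 + html5lib) returns one DataFrame per <table> on the page;
# the components table was the third one ([2]) as of July 2025, so update the index if the page changes
IBEX35 = pd.read_html(url, header=0)[2]
IBEX35.head()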

|   | Ticker | Company | Sector |
|---|--------|---------|--------|
| 0 | ACS.MC | ACS | Construction |
| 1 | ACX.MC | Acerinox | Steel |
| 2 | AMS.MC | Amadeus IT Group | Tourism |
| 3 | ANA.MC | Acciona | Construction |
| 4 | ANE.MC | Acciona Energía | Energy |
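Because the components table is picked by position (`[2]`), a change in the page layout could silently return a different table. A cheap sanity check can catch this before any download. The sketch below assumes that a correct scrape has a `Ticker` column with the expected number of unique tickers (35 by default) and that every ticker carries the `.MC` suffix Yahoo Finance uses for Madrid-listed shares; `validate_tickers` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def validate_tickers(components: pd.DataFrame, expected: int = 35) -> list[str]:
    """Return a list of problems found in a scraped components table (empty if none)."""
    if "Ticker" not in components.columns:
        return ["no 'Ticker' column: read_html probably returned the wrong table"]

    tickers = components["Ticker"].astype(str).str.strip()
    problems = []
    if len(tickers) != expected:
        problems.append(f"expected {expected} tickers, got {len(tickers)}")
    duplicated = tickers[tickers.duplicated()].tolist()
    if duplicated:
        problems.append(f"duplicated tickers: {duplicated}")
    # Yahoo Finance uses the .MC suffix for shares listed in Madrid
    no_suffix = tickers[~tickers.str.endswith(".MC")].tolist()
    if no_suffix:
        problems.append(f"tickers without the .MC suffix: {no_suffix}")
    return problems


# The five rows shown above
components_sample = pd.DataFrame({
    "Ticker": ["ACS.MC", "ACX.MC", "AMS.MC", "ANA.MC", "ANE.MC"],
    "Company": ["ACS", "Acerinox", "Amadeus IT Group", "Acciona", "Acciona Energía"],
    "Sector": ["Construction", "Steel", "Tourism", "Construction", "Energy"],
})
print(validate_tickers(components_sample))  # -> ['expected 35 tickers, got 5']
```

On the five-row sample only the row count is flagged; on a correct full scrape the function should return an empty list.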


## Download Data from Yahoo Finance

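We retrieve daily adjusted **Close** prices and trading **Volume** from May 10, 2015 to July 14, 2025. Because yfinance treats the `end` date as exclusive, the last row is the last trading day before July 14, 2025.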

Prices are adjusted for splits and dividends (`auto_adjust=True`).
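To make "adjusted" concrete, the sketch below back-adjusts a raw close series under one common convention: prices before a split's ex-date are divided by the split ratio, and prices before a dividend's ex-date are multiplied by `1 - D / C`, where `D` is the cash dividend and `C` the last close before the ex-date. This is an illustrative assumption, not a description of how Yahoo Finance or yfinance compute their adjusted prices; `back_adjust` and the toy prices are hypothetical.

```python
import pandas as pd


def back_adjust(close: pd.Series, splits: dict, dividends: dict) -> pd.Series:
    """Back-adjust a raw close series for splits and cash dividends.

    splits:    {ex_date: ratio}, e.g. 2.0 for a 2-for-1 split
    dividends: {ex_date: cash amount per share}
    Only prices before each ex-date are rescaled, so the latest prices stay raw.
    """
    factor = pd.Series(1.0, index=close.index)
    for ex_date, ratio in splits.items():
        before = close.index < pd.Timestamp(ex_date)
        factor.loc[before] /= ratio
    for ex_date, amount in dividends.items():
        before = close.index < pd.Timestamp(ex_date)
        last_close = close.loc[before].iloc[-1]  # raw close on the day before the ex-date
        factor.loc[before] *= 1.0 - amount / last_close
    return close * factor


# Hypothetical raw prices with a 2-for-1 split on 2024-01-04
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"])
raw = pd.Series([100.0, 102.0, 51.0, 50.0], index=dates)
print(back_adjust(raw, splits={"2024-01-04": 2.0}, dividends={}).tolist())  # -> [50.0, 51.0, 51.0, 50.0]
```

Because only past prices are rescaled, the most recent adjusted price equals the raw one, and day-over-day changes are no longer distorted by the split.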

In [3]:
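# auto_adjust=True returns prices adjusted for splits and dividends; `end` is exclusive
data = yf.download(IBEX35["Ticker"].tolist(), start='2015-05-10', end='2025-07-14', auto_adjust=True)
# Columns form a (field, ticker) MultiIndex: keep only the Close and Volume fields
data = data.loc[:, ['Close', 'Volume']]
# Flatten the MultiIndex into single names such as 'Close_ACS.MC'
data.columns = ['{}_{}'.format(price_type, ticker) for price_type, ticker in data.columns]
data.head()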

[*********************100%***********************]  35 of 35 completed


| Date | Close_ACS.MC | Close_ACX.MC | Close_AENA.MC | Close_AMS.MC | Close_ANA.MC | Close_ANE.MC | Close_BBVA.MC | Close_BKT.MC | Close_CABK.MC | Close_CLNX.MC | ... | Volume_PUIG.MC | Volume_RED.MC | Volume_REP.MC | Volume_ROVI.MC | Volume_SAB.MC | Volume_SAN.MC | Volume_SCYR.MC | Volume_SLR.MC | Volume_TEF.MC | Volume_UNI.MC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-05-11 | 16.940952 | 7.748485 | 87.81897 | 35.313824 | 49.867825 | NaN | 5.24839 | 2.825591 | 2.65447 | 11.113306 | ... | NaN | 5608440 | 5718597 | 8943 | 26396456 | 67946568 | 10477838 | 221773 | 25738355 | NaN |
| 2015-05-12 | 16.728851 | 7.705486 | 88.067459 | 35.695255 | 49.743042 | NaN | 5.219527 | 2.861327 | 2.655066 | 11.187394 | ... | NaN | 5672096 | 7923122 | 14322 | 36956820 | 74136179 | 8688260 | 211602 | 127886879 | NaN |
| 2015-05-13 | 16.633675 | 7.929081 | 88.624107 | 35.461189 | 49.632118 | NaN | 5.214909 | 2.900387 | 2.644336 | 11.483747 | ... | NaN | 2489748 | 9626474 | 9754 | 32088105 | 58436689 | 8717111 | 73722 | 90652907 | NaN |
| 2015-05-14 | 16.995333 | 8.055215 | 93.196457 | 35.331165 | 50.068893 | NaN | 5.287067 | 2.926564 | 2.668777 | 11.706014 | ... | NaN | 2958900 | 5289364 | 8238 | 35337405 | 33497163 | 6514814 | 62359 | 69659502 | NaN |
| 2015-05-15 | 16.573853 | 8.055215 | 93.733208 | 35.491539 | 49.749981 | NaN | 5.236268 | 2.894154 | 2.635991 | 11.409656 | ... | NaN | 2967068 | 10968719 | 4615 | 21808416 | 72000982 | 4285704 | 1070468 | 70445819 | NaN |
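Each flattened name joins a field and a ticker with the first underscore (the `.MC` tickers themselves contain none), so the wide table can be reshaped into a long, tidy layout with one row per date, ticker and field, which is often easier for per-asset processing. A minimal sketch, assuming the `Field_Ticker` naming produced above; `to_long` is a hypothetical helper and the two columns are copied from the first rows shown above:

```python
import pandas as pd


def to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape 'Field_Ticker' columns into long rows of (Date, Ticker, Field, Value)."""
    long_df = (
        wide.rename_axis("Date")
        .reset_index()
        .melt(id_vars="Date", var_name="Column", value_name="Value")
    )
    # Split on the first underscore only: 'Close_ACS.MC' -> 'Close', 'ACS.MC'
    long_df[["Field", "Ticker"]] = long_df["Column"].str.split("_", n=1, expand=True)
    return long_df[["Date", "Ticker", "Field", "Value"]]


wide = pd.DataFrame(
    {"Close_ACS.MC": [16.940952, 16.728851], "Volume_REP.MC": [5718597, 7923122]},
    index=pd.to_datetime(["2015-05-11", "2015-05-12"]),
)
print(to_long(wide))

# Fields side by side, indexed by (Date, Ticker); missing combinations become NaN
print(to_long(wide).pivot(index=["Date", "Ticker"], columns="Field", values="Value"))
```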


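Counting missing values per column shows that three assets, PUIG.MC, ANE.MC and UNI.MC, have incomplete data, most likely because their series begin later than the rest of the sample (all three are already empty in the first rows above). These assets are excluded before saving.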

In [4]:
missing_counts = data.isna().sum()
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)
print(missing_counts)

Close_PUIG.MC     2301
Volume_PUIG.MC    2301
Close_ANE.MC      1574
Volume_ANE.MC     1574
Close_UNI.MC       551
Volume_UNI.MC      551
dtype: int64
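The save step below uses `dropna(axis=1)`, which removes any column with at least one missing value. Before committing to that rule, it can help to see when each series starts and what share of rows it covers. The sketch below does this and, as an alternative this notebook does not use, keeps only columns above a coverage threshold. `coverage_report`, `keep_well_covered`, the 0.95 cutoff and the toy frame are all hypothetical.

```python
import numpy as np
import pandas as pd


def coverage_report(df: pd.DataFrame) -> pd.DataFrame:
    """First non-missing date and share of non-missing rows for each column."""
    report = pd.DataFrame(
        {
            "first_valid": df.apply(lambda col: col.first_valid_index()),
            "coverage": df.notna().mean(),
        }
    )
    return report.sort_values("coverage")


def keep_well_covered(df: pd.DataFrame, min_coverage: float = 0.95) -> pd.DataFrame:
    """Keep only the columns whose share of non-missing rows is at least min_coverage."""
    return df.loc[:, df.notna().mean() >= min_coverage]


# Hypothetical toy frame: Close_B starts two days after Close_A
dates = pd.date_range("2024-01-01", periods=4, freq="D")
toy = pd.DataFrame(
    {"Close_A": [1.0, 1.1, 1.2, 1.3], "Close_B": [np.nan, np.nan, 2.0, 2.1]},
    index=dates,
)
print(coverage_report(toy))
print(keep_well_covered(toy).columns.tolist())  # -> ['Close_A']
```

A threshold keeps series with a few scattered gaps that `dropna(axis=1)` would discard, at the cost of handling the remaining missing values later.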


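## Save the Data

The incomplete columns are dropped and the remaining data is stored as a CSV file in the `../data/` directory, which is created if it does not already exist.

In [5]:
import os
data = data.dropna(axis=1)  # drop every column with at least one missing value (PUIG.MC, ANE.MC, UNI.MC)
os.makedirs('../data', exist_ok=True)  # create the output directory if needed
data.to_csv('../data/financial_data.csv')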
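`to_csv` writes the `Date` index as the first column, so later notebooks should read it back with `index_col` and `parse_dates`; otherwise the dates come back as plain strings. A minimal round-trip sketch, writing to a temporary directory instead of `../data/` so it runs anywhere, with two columns copied from the first rows shown above:

```python
import os
import tempfile

import pandas as pd

# Two columns from the first rows above, with the 'Date' index name shown in the download output
sample_data = pd.DataFrame(
    {"Close_ACS.MC": [16.940952, 16.728851], "Volume_REP.MC": [5718597, 7923122]},
    index=pd.DatetimeIndex(["2015-05-11", "2015-05-12"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "financial_data.csv")
    sample_data.to_csv(path)
    # index_col selects the Date column as the index; parse_dates turns it into a DatetimeIndex
    loaded = pd.read_csv(path, index_col="Date", parse_dates=True)

print(loaded.index)
print(loaded.dtypes)
```

For the real file, the equivalent call would be `pd.read_csv('../data/financial_data.csv', index_col='Date', parse_dates=True)`.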

## Summary

- Data collected for 32 IBEX 35 assets over 10+ years.
- Stored the adjusted daily `Close` price and `Volume` of each asset.
- Saved to `../data/financial_data.csv` for further preprocessing.

The next step is to compute log returns and fit distributions, which will be handled in the `Data_exploration_&_volatility.ipynb` notebook.
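As a preview of that step, the daily log return of a price series is `r_t = ln(P_t / P_{t-1})`. The sketch below computes it for every `Close_` column of a frame shaped like the saved file; it is an assumed illustration, not the code in `Data_exploration_&_volatility.ipynb`. `log_returns` is a hypothetical helper and the prices are copied from the first rows shown above.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Daily log returns r_t = ln(P_t / P_{t-1}) for every Close_ column."""
    close = prices.filter(like="Close_")
    # shift(1) aligns each price with the previous day's; the first row has no predecessor
    return np.log(close / close.shift(1)).iloc[1:]


# Three rows from the download output above; the Volume column is ignored
prices_sample = pd.DataFrame(
    {
        "Close_ACS.MC": [16.940952, 16.728851, 16.633675],
        "Close_ACX.MC": [7.748485, 7.705486, 7.929081],
        "Volume_RED.MC": [5608440, 5672096, 2489748],
    },
    index=pd.to_datetime(["2015-05-11", "2015-05-12", "2015-05-13"]),
)
print(log_returns(prices_sample))
```

Log returns add over time, so summing them over a window gives the log of the total price ratio for that window.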