<a href="https://colab.research.google.com/github/JonatanSiracusa/download-historical-series/blob/main/download_hist_series.ipynb" target="_blank">
	<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Historical prices download


In this notebook we download the historical price series for a list of stocks.

1. BYMA prices, downloaded from Yahoo Finance.


To get the desired results, follow these steps:

1. Open the Excel file named 'tickers.xlsx', located in the same folder as this notebook:
	* Complete the `'ticker_byma'` column.
	* Complete the `'ticker_yahoo'` column.
2. Set the `start_date` variable in section 1 of this notebook.
3. Set the `NOMBRE_OUTPUT_1` and `NOMBRE_OUTPUT_2` variables in section 1 of this notebook. The data series are saved under the names set in these variables. Set `EXPORTAR_DATOS` to `True` to actually write the files.
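
The two columns in 'tickers.xlsx' pair each BYMA ticker with its Yahoo Finance symbol. Judging from the ticker lists printed in section 2, the Yahoo symbol is the BYMA ticker plus a `.BA` suffix, and the index row maps to `^MERV`. Below is a minimal sketch of that convention, assuming it holds for every ticker you add. The helper `to_yahoo_ticker` and the `SPECIAL_CASES` table are hypothetical names, not part of this notebook.

```python
# Hypothetical helper: derive the 'ticker_yahoo' value from a 'ticker_byma' value.
# Assumption: Buenos Aires listings carry the '.BA' suffix on Yahoo Finance,
# as seen in the ticker lists printed in section 2.
SPECIAL_CASES = {"Index": "^MERV"}  # the index row uses its own symbol


def to_yahoo_ticker(ticker_byma: str) -> str:
    """Return the Yahoo Finance symbol assumed for a BYMA ticker."""
    ticker_byma = ticker_byma.strip()
    if ticker_byma in SPECIAL_CASES:
        return SPECIAL_CASES[ticker_byma]
    return f"{ticker_byma.upper()}.BA"


rows = [(t, to_yahoo_ticker(t)) for t in ["Index", "GGAL", "YPFD"]]
for byma, yahoo in rows:
    print(f"{byma:<6} -> {yahoo}")
```

It is still worth checking each derived symbol on Yahoo Finance before adding it to the Excel file, since the suffix rule is only inferred from the existing rows.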


This notebook is organized into the following sections:

1. Kick-off: Libraries Importing, Variables Setup and Functions.

2. Data Loading

3. Data Cleaning

4. Data Transformation

5. Results Saving


***************************



# 1. Kick-off: Libraries Importing, Variables Setup and Functions

In [11]:
import numpy as np
import pandas as pd
import time
import datetime as dt
from datetime import datetime

import yfinance as yf

import warnings

In [12]:
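# Measure the program's execution time
star_time = time.time()

# Download window: from start_date up to now
start_date = dt.datetime(1994, 1, 1)
end_date = dt.datetime.now()

RUEDAS_ANIO = 252  # trading sessions per year, used to annualize volatility
NOMBRE_OUTPUT_1 = 'historical-Adj_prices-byma'       # prices only
NOMBRE_OUTPUT_2 = 'historical-Adj_prices_plus-byma'  # prices plus returns and volatility
EXPORTAR_DATOS = False  # set to True to write the .csv and .xlsx files

# Silence all warnings
warnings.simplefilter("ignore")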

In [13]:
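# These helpers read the global `tickers` list, which is defined in section 2
def ticker_simple_return():
	"""Column names for the simple returns: one '<ticker>_sr' per ticker."""
	return [ticker + '_sr' for ticker in tickers]

def ticker_log_return():
	"""Column names for the log returns: one '<ticker>_lr' per ticker."""
	return [ticker + '_lr' for ticker in tickers]

def ticker_volat():
	"""Column names for the 40-session volatility: one '<ticker>_v40' per ticker."""
	return [ticker + '_v40' for ticker in tickers]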


In [14]:
def convert_to_dataframe(data):
    """
    Take an input `data` and check whether it is a pandas Series or a DataFrame.
    A Series is converted to a DataFrame and its column is renamed to the first ticker.
    A DataFrame is returned unchanged.
    """
    if isinstance(data, pd.Series):
        # Convert the Series to a one-column DataFrame named after the first ticker
        return data.to_frame(name=tickers[0])
    elif isinstance(data, pd.DataFrame):
        # Already a DataFrame: nothing to do
        return data
    else:
        raise ValueError("The input is neither a pandas.Series nor a pandas.DataFrame.")


def get_volatility(ticker, df):
	"""
	Look up the given ticker in the DF and return its latest annualized volatility, computed over the last 40 sessions.
	"""
	indice = tickers.index(ticker)
	variable = ticker_volat()[indice]
	valor = df.loc[:, variable].iloc[-1]
	return valor
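
`convert_to_dataframe` normalizes its input so that later steps can always assume a DataFrame. The sketch below shows the same idea with `Series.to_frame`, taking the column name as a parameter instead of reading the global `tickers` list. The function name `as_frame` and the sample values are illustrative assumptions, not part of this notebook.

```python
import pandas as pd


def as_frame(data, column_name):
    """Hypothetical variant of convert_to_dataframe with an explicit column name."""
    if isinstance(data, pd.Series):
        # to_frame builds a one-column DataFrame and names the column in one step
        return data.to_frame(name=column_name)
    if isinstance(data, pd.DataFrame):
        # Already a DataFrame: returned unchanged
        return data
    raise ValueError("The input is neither a pandas.Series nor a pandas.DataFrame.")


# Illustrative single-ticker input, as a Series
prices = pd.Series([811.0, 811.0, 810.0], name="ALUA.BA")
frame = as_frame(prices, "ALUA")
print(type(frame).__name__, list(frame.columns))
```

Passing the name explicitly keeps the helper usable before `tickers` has been loaded, at the cost of one extra argument.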


# 2. Data Loading

In [15]:
# Fill the `tickers` and `tickers_yahoo` lists from the Excel file
xlsx = pd.ExcelFile('./tickers.xlsx')
df1 = pd.read_excel(xlsx, 'Hoja1')

tickers = df1.iloc[:, 0].tolist()
tickers_yahoo = df1.iloc[:, 1].tolist()

df1.info()
print(tickers)
print(tickers_yahoo)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ticker_byma   22 non-null     object
 1   ticker_yahoo  22 non-null     object
dtypes: object(2)
memory usage: 484.0+ bytes
['Index', 'ALUA', 'BBAR', 'BMA', 'BYMA', 'CEPU', 'COME', 'CRES', 'CVH', 'EDN', 'GGAL', 'LOMA', 'MIRG', 'PAMP', 'SUPV', 'TECO2', 'TGNO4', 'TGSU2', 'TRAN', 'TXAR', 'VALO', 'YPFD']
['^MERV', 'ALUA.BA', 'BBAR.BA', 'BMA.BA', 'BYMA.BA', 'CEPU.BA', 'COME.BA', 'CRES.BA', 'CVH.BA', 'EDN.BA', 'GGAL.BA', 'LOMA.BA', 'MIRG.BA', 'PAMP.BA', 'SUPV.BA', 'TECO2.BA', 'TGNO4.BA', 'TGSU2.BA', 'TRAN.BA', 'TXAR.BA', 'VALO.BA', 'YPFD.BA']


In [16]:
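# Download the adjusted closing prices of all tickers, rounded to 2 decimals
# Note: depending on the yfinance version, prices may be adjusted by default and the
# 'Adj Close' column may be missing; passing auto_adjust=False is expected to restore it.
data = round(yf.download(tickers_yahoo, start=start_date, end=end_date)['Adj Close'], 2)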

data.info()

[*********************100%***********************]  22 of 22 completed

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7006 entries, 1996-10-08 00:00:00+00:00 to 2024-11-14 00:00:00+00:00
Data columns (total 22 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   ALUA.BA   5534 non-null   float64
 1   BBAR.BA   6199 non-null   float64
 2   BMA.BA    6199 non-null   float64
 3   BYMA.BA   1803 non-null   float64
 4   CEPU.BA   6202 non-null   float64
 5   COME.BA   6199 non-null   float64
 6   CRES.BA   6199 non-null   float64
 7   CVH.BA    1757 non-null   float64
 8   EDN.BA    4298 non-null   float64
 9   GGAL.BA   6052 non-null   float64
 10  LOMA.BA   1714 non-null   float64
 11  MIRG.BA   6199 non-null   float64
 12  PAMP.BA   5152 non-null   float64
 13  SUPV.BA   2074 non-null   float64
 14  TECO2.BA  6199 non-null   float64
 15  TGNO4.BA  4393 non-null   float64
 16  TGSU2.BA  6199 non-null   float64
 17  TRAN.BA   6047 non-null   float64
 18  TXAR.BA   6199 non-null   float64
 19  VALO.BA   4165 non-nu




# 3. Data Cleaning

In [17]:
# The download comes with a date index. Format the index dates as 'YYYY-MM-DD' strings
data.index = pd.to_datetime(data.index).strftime('%Y-%m-%d')

data = convert_to_dataframe(data)

# Rename the columns from the data source's tickers to the names we want
data.rename(columns=dict(zip(tickers_yahoo, tickers)), inplace=True)

# Add a running counter "n" as the first column
data.insert(0, 'n', 1, allow_duplicates=False)
data['n'] = data['n'].cumsum()

# Move the first ticker's column to position 1
col_1 = data.pop(tickers[0])
data.insert(1, tickers[0], col_1)

# Replace missing values and negative prices with 0
data.fillna(0, inplace=True)
data[data < 0] = 0

prices_v2 = data.copy()
prices_v2.info()
prices_v2

<class 'pandas.core.frame.DataFrame'>
Index: 7006 entries, 1996-10-08 to 2024-11-14
Data columns (total 23 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   n       7006 non-null   int64  
 1   Index   7006 non-null   float64
 2   ALUA    7006 non-null   float64
 3   BBAR    7006 non-null   float64
 4   BMA     7006 non-null   float64
 5   BYMA    7006 non-null   float64
 6   CEPU    7006 non-null   float64
 7   COME    7006 non-null   float64
 8   CRES    7006 non-null   float64
 9   CVH     7006 non-null   float64
 10  EDN     7006 non-null   float64
 11  GGAL    7006 non-null   float64
 12  LOMA    7006 non-null   float64
 13  MIRG    7006 non-null   float64
 14  PAMP    7006 non-null   float64
 15  SUPV    7006 non-null   float64
 16  TECO2   7006 non-null   float64
 17  TGNO4   7006 non-null   float64
 18  TGSU2   7006 non-null   float64
 19  TRAN    7006 non-null   float64
 20  TXAR    7006 non-null   float64
 21  VALO    7006 non-null   flo

Date,n,Index,ALUA,BBAR,BMA,BYMA,CEPU,COME,CRES,CVH,...,MIRG,PAMP,SUPV,TECO2,TGNO4,TGSU2,TRAN,TXAR,VALO,YPFD
1996-10-08,1,590.00,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1996-10-09,2,583.00,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1996-10-10,3,585.00,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1996-10-11,4,584.00,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1996-10-14,5,584.00,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-11-08,7002,1964487.00,802.0,5900.0,9200.0,324.5,1345.0,264.50,1170.0,5700.0,...,23375.0,3245.0,2245.0,2670.0,3360.0,5400.0,2030.0,764.0,367.5,33675.0
2024-11-11,7003,1987846.00,811.0,5980.0,9170.0,327.5,1365.0,261.00,1160.0,5640.0,...,23625.0,3300.0,2270.0,2605.0,3380.0,5480.0,2020.0,789.0,369.5,34675.0
2024-11-12,7004,2012041.00,811.0,6120.0,9260.0,336.5,1430.0,250.50,1180.0,5700.0,...,23750.0,3360.0,2305.0,2515.0,3495.0,5680.0,2070.0,790.0,373.5,34850.0
2024-11-13,7005,2042551.00,810.0,6180.0,9190.0,335.0,1520.0,246.00,1230.0,5780.0,...,23750.0,3435.0,2315.0,2655.0,3560.0,5960.0,2125.0,790.0,373.0,35550.0
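
One side effect of replacing missing prices with 0 is worth checking before the returns are computed in section 4. The first real quote after a run of zeros is divided by 0, which pandas evaluates as `inf`, and `fillna(0)` does not replace `inf`. The sketch below reproduces this on a made-up four-day series and shows one possible alternative: keeping the gaps as `NaN` until the returns have been computed. It is an illustration under those assumptions, not the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Illustrative series: two days without quotes, then two real prices
raw = pd.Series([np.nan, np.nan, 100.0, 110.0])

# Approach used in this notebook: missing values become 0 before the returns
zero_filled = raw.fillna(0)
ret_zero = (zero_filled / zero_filled.shift(1) - 1).fillna(0)

# Alternative (assumption): compute the returns on the NaN series, then fill
ret_nan = (raw / raw.shift(1) - 1).fillna(0)

print(ret_zero.tolist())
print(ret_nan.tolist())
```

In the first result the return on the first quoted day is `inf`; in the second it is 0. Which behavior is preferable depends on how the exported series will be used downstream.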


# 4. Data Transformation

For each asset we add the following calculations:
* Simple (discrete) return, relative to the previous session.
* Logarithmic (continuous) return, relative to the previous session.
* Volatility (standard deviation) of the last 40 sessions, annualized.
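
As a worked example of these three calculations, the sketch below applies them to a made-up price series. With price `P_t`, the simple return is `P_t / P_{t-1} - 1`, the log return is `ln(P_t / P_{t-1})`, and the annualized volatility is the rolling standard deviation of the log returns multiplied by `sqrt(252)`. The window is shortened to 3 sessions here, an assumption made for readability; the notebook uses 40.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # same value as RUEDAS_ANIO
WINDOW = 3          # shortened for the example; the notebook uses 40

prices = pd.Series([100.0, 110.0, 99.0, 99.0, 108.9])  # illustrative prices

simple_ret = prices / prices.shift(1) - 1
log_ret = np.log(prices / prices.shift(1))
# Rolling sample standard deviation of the log returns, scaled to one year
vol = log_ret.rolling(window=WINDOW).std() * np.sqrt(TRADING_DAYS)

print(pd.DataFrame({"price": prices, "sr": simple_ret, "lr": log_ret, "vol": vol}))
```

The volatility column stays empty until the window holds `WINDOW` valid log returns, which is why the first rows of every `_v40` column have no value before they are filled with 0.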

In [18]:
# Start transforming the data on a new version of the DF
prices_v3 = prices_v2.copy()

In [19]:
# Insert the return and volatility calculations
for i in range(len(tickers)):

	# Select the ticker
	asset = tickers[i]

	# Find the position of the ticker's column
	pos = prices_v3.columns.get_loc(asset)

	# Names of the new columns
	col_sr = ticker_simple_return()[i]
	col_lr = ticker_log_return()[i]
	col_v40 = ticker_volat()[i]

	# Insert the new columns right after the ticker, named after the calculation and the ticker
	prices_v3.insert(pos+1, col_sr, np.nan)
	prices_v3.insert(pos+2, col_lr, np.nan)
	prices_v3.insert(pos+3, col_v40, np.nan)

	# Daily simple returns
	prices_v3[col_sr] = (prices_v3[asset] / prices_v3[asset].shift(1)) - 1

	# Daily log returns
	prices_v3[col_lr] = np.log(prices_v3[asset] / prices_v3[asset].shift(1))

	# Annualized standard deviation of the log returns over the last 40 sessions
	prices_v3[col_v40] = (prices_v3[col_lr].rolling(window=40).std()) * (RUEDAS_ANIO ** (1/2))

prices_v3.fillna(0, inplace=True)

print(get_volatility('GGAL', prices_v3))
prices_v3

0.34325317274429545


Date,n,Index,Index_sr,Index_lr,Index_v40,ALUA,ALUA_sr,ALUA_lr,ALUA_v40,BBAR,...,TXAR_lr,TXAR_v40,VALO,VALO_sr,VALO_lr,VALO_v40,YPFD,YPFD_sr,YPFD_lr,YPFD_v40
1996-10-08,1,590.00,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
1996-10-09,2,583.00,-0.011864,-0.011935,0.000000,0.0,0.000000,0.000000,0.000000,0.0,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
1996-10-10,3,585.00,0.003431,0.003425,0.000000,0.0,0.000000,0.000000,0.000000,0.0,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
1996-10-11,4,584.00,-0.001709,-0.001711,0.000000,0.0,0.000000,0.000000,0.000000,0.0,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
1996-10-14,5,584.00,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-11-08,7002,1964487.00,-0.025338,-0.025665,0.252490,802.0,-0.047506,-0.048671,0.280893,5900.0,...,-0.022007,0.274744,367.5,0.011004,0.010944,0.214380,33675.0,-0.008830,-0.008869,0.369220
2024-11-11,7003,1987846.00,0.011891,0.011820,0.251469,811.0,0.011222,0.011159,0.282781,5980.0,...,0.032199,0.287412,369.5,0.005442,0.005427,0.214107,34675.0,0.029696,0.029263,0.374726
2024-11-12,7004,2012041.00,0.012171,0.012098,0.252336,811.0,0.000000,0.000000,0.281563,6120.0,...,0.001267,0.279490,373.5,0.010825,0.010767,0.212373,34850.0,0.005047,0.005034,0.372687
2024-11-13,7005,2042551.00,0.015164,0.015050,0.254170,810.0,-0.001233,-0.001234,0.281391,6180.0,...,0.000000,0.279543,373.0,-0.001339,-0.001340,0.212043,35550.0,0.020086,0.019887,0.372523


# 5. Results Saving

We save two data series in .csv and .xlsx format: one with prices only, and one with prices plus the return and volatility calculations.
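
A quick way to confirm an export is to read the file back and compare it with the DataFrame in memory. The sketch below does this for the CSV case, using a small made-up frame and a temporary folder. The frame contents are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for prices_v2: string dates as index, a counter and one price column
sample = pd.DataFrame(
    {"n": [1, 2], "GGAL": [100.0, 101.5]},
    index=pd.Index(["2024-11-12", "2024-11-13"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "historical-Adj_prices-byma.csv"
    sample.to_csv(path)
    # index_col=0 restores the date column as the index
    restored = pd.read_csv(path, index_col=0)

print(restored)
```

The same check could be applied to the .xlsx files with `pd.read_excel`, which needs an Excel engine such as openpyxl to be installed.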

In [20]:
if EXPORTAR_DATOS:
	# Save the price-only series as .csv
	prices_v2.to_csv('./' + NOMBRE_OUTPUT_1 + '.csv')

	# Save the price-only series as .xlsx
	with pd.ExcelWriter('./' + NOMBRE_OUTPUT_1 + '.xlsx') as writer:
		prices_v2.to_excel(writer, sheet_name='Sheet1', index=True)

	# Save the series with returns and volatility as .csv
	prices_v3.to_csv('./' + NOMBRE_OUTPUT_2 + '.csv')

	# Save the series with returns and volatility as .xlsx
	with pd.ExcelWriter('./' + NOMBRE_OUTPUT_2 + '.xlsx') as writer:
		prices_v3.to_excel(writer, sheet_name='Sheet1', index=True)

	# Report once both writers have been closed
	ahora = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
	print(f'Data has been exported. Date: {ahora}.')

else:
	ahora = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
	print(f'No data has been exported. Date: {ahora}.')


end_time = time.time()
execution_time = end_time - star_time
print(f'\nExecution time: {round(execution_time, 2)} seconds.')

No data has been exported. Date: 2024-11-14 20:01:52.

Execution time: 4.14 seconds.
