<a href="https://colab.research.google.com/github/JonatanSiracusa/download-historical-series/blob/main/download_hist_series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Historical prices download


In this notebook we download the historical price series for a list of stocks.

1. BYMA prices, downloaded from Yahoo Finance.


To get the desired results, follow these steps:

1. Open the Excel file named 'tickers.xlsx', located in the same folder as this program:
	* Complete the `'ticker_byma'` column.
	* Complete the `'ticker_yahoo'` column.
2. Set the `start_date` variable in section 1 of this program.
3. Set the `NOMBRE_OUTPUT_1` and `NOMBRE_OUTPUT_2` variables in section 1 of this program. The output files are named after the values of these variables.
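
As a minimal sketch of what the two columns are expected to hold, the snippet below builds the same two-column table in memory instead of reading it from Excel. It assumes, based on the ticker lists printed in section 2, that each Yahoo Finance symbol is the BYMA ticker plus a `.BA` suffix. The three tickers and the `tickers_table` name are only illustrative.

```python
import pandas as pd

# Illustrative subset of BYMA tickers (the real list lives in tickers.xlsx)
byma = ['ALUA', 'GGAL', 'YPFD']

# Assumption: the Yahoo Finance symbol is the BYMA ticker plus '.BA'
tickers_table = pd.DataFrame({
    'ticker_byma': byma,
    'ticker_yahoo': [t + '.BA' for t in byma],
})

print(tickers_table)
```

The column order matters, because section 2 reads the first column as the BYMA tickers and the second as the Yahoo tickers.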


The ***Project*** is implemented in the following steps:

1. Kick-off: Libraries Importing, Variables Setup and Functions.

2. Data Loading

3. Data Cleaning

4. Data Transformation

5. Results Saving


***************************



# 1. Kick-off: Libraries Importing, Variables Setup and Functions

In [1]:
import numpy as np
import pandas as pd
import scipy
import math
import random
import datetime as dt
from datetime import datetime

import yfinance as yf

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import warnings

In [2]:
start_date = dt.datetime(1994, 1, 1)
end_date = dt.datetime.now()

RUEDAS_ANIO = 252  # trading sessions ("ruedas") per year, used to annualize volatility
NOMBRE_OUTPUT_1 = 'historical-Adj_prices-byma'
NOMBRE_OUTPUT_2 = 'historical-Adj_prices_plus-byma'

warnings.simplefilter("ignore")

In [3]:
def ticker_simple_return():
	return list(map(lambda elem: elem + '_sr', tickers))

def ticker_log_return(): 
	return list(map(lambda elem: elem + '_lr', tickers))

def ticker_volat(): 
	return list(map(lambda elem: elem + '_v40', tickers))
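
Note that these helpers read the global `tickers` list, which is only defined in section 2, so they can be called only after the data-loading cell has run. A possible alternative, shown here purely as a sketch with a hypothetical `add_suffix` helper that is not part of the notebook, takes the list as an argument:

```python
def add_suffix(names, suffix):
    """Return a new list with `suffix` appended to every name."""
    return [name + suffix for name in names]


example_tickers = ['GGAL', 'YPFD']          # illustrative subset
print(add_suffix(example_tickers, '_sr'))   # simple-return column names
print(add_suffix(example_tickers, '_v40'))  # 40-session volatility column names
```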


In [4]:
def convert_to_dataframe(data):
    """
    Take `data` and check whether it is a pandas Series or a DataFrame.
    If it is a Series, convert it to a DataFrame and rename its column.
    If it is a DataFrame, return it unchanged.
    """
    if isinstance(data, pd.Series):
        # Convert the Series to a one-column DataFrame named after the first ticker
        return data.to_frame(name=tickers[0])
    elif isinstance(data, pd.DataFrame):
        # Already a DataFrame: nothing to do
        return data
    else:
        raise ValueError("The input is neither a pandas.Series nor a pandas.DataFrame.")


def get_volatility(ticker, df):
	"""
	Look up the given ticker in the DF and return its most recent annualized volatility over the last 40 trading sessions.
	"""
	indice = tickers.index(ticker)
	variable = ticker_volat()[indice]
	valor = df.loc[:, variable].iloc[-1]
	return valor


# 2. Data Loading

In [5]:
# Fill in the tickers and tickers_yahoo lists
xlsx = pd.ExcelFile('./tickers.xlsx')
df1 = pd.read_excel(xlsx, 'Hoja1')

tickers = df1.iloc[:, 0].tolist()
tickers_yahoo = df1.iloc[:, 1].tolist()

df1.info()
print(tickers)
print(tickers_yahoo)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ticker_byma   21 non-null     object
 1   ticker_yahoo  21 non-null     object
dtypes: object(2)
memory usage: 468.0+ bytes
['ALUA', 'BBAR', 'BMA', 'BYMA', 'CEPU', 'COME', 'CRES', 'CVH', 'EDN', 'GGAL', 'LOMA', 'MIRG', 'PAMP', 'SUPV', 'TECO2', 'TGNO4', 'TGSU2', 'TRAN', 'TXAR', 'VALO', 'YPFD']
['ALUA.BA', 'BBAR.BA', 'BMA.BA', 'BYMA.BA', 'CEPU.BA', 'COME.BA', 'CRES.BA', 'CVH.BA', 'EDN.BA', 'GGAL.BA', 'LOMA.BA', 'MIRG.BA', 'PAMP.BA', 'SUPV.BA', 'TECO2.BA', 'TGNO4.BA', 'TGSU2.BA', 'TRAN.BA', 'TXAR.BA', 'VALO.BA', 'YPFD.BA']


In [6]:
# Download the prices for all tickers
data = round(yf.download(tickers_yahoo, start=start_date, end=end_date)['Adj Close'], 2)

data.info()

[*********************100%***********************]  21 of 21 completed

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6184 entries, 2000-01-03 00:00:00+00:00 to 2024-10-21 00:00:00+00:00
Data columns (total 21 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   ALUA.BA   5516 non-null   float64
 1   BBAR.BA   6181 non-null   float64
 2   BMA.BA    6181 non-null   float64
 3   BYMA.BA   1785 non-null   float64
 4   CEPU.BA   6184 non-null   float64
 5   COME.BA   6181 non-null   float64
 6   CRES.BA   6181 non-null   float64
 7   CVH.BA    1739 non-null   float64
 8   EDN.BA    4280 non-null   float64
 9   GGAL.BA   6034 non-null   float64
 10  LOMA.BA   1696 non-null   float64
 11  MIRG.BA   6181 non-null   float64
 12  PAMP.BA   5134 non-null   float64
 13  SUPV.BA   2056 non-null   float64
 14  TECO2.BA  6181 non-null   float64
 15  TGNO4.BA  4375 non-null   float64
 16  TGSU2.BA  6181 non-null   float64
 17  TRAN.BA   6029 non-null   float64
 18  TXAR.BA   6181 non-null   float64
 19  VALO.BA   4147 non-nu




# 3. Data Cleaning

In [7]:
# The download comes with a date index. Set the date format of the index
data.index = pd.to_datetime(data.index).strftime('%Y-%m-%d')

data = convert_to_dataframe(data)

# Rename the tickers accepted by the data source to the names we want
for ticker_y, name in zip(tickers_yahoo, tickers):
	data.rename(columns={ticker_y: name}, inplace=True)

# Add a running counter "n" as the first column
data.insert(0, 'n', 1, allow_duplicates=False)
data['n'] = data['n'].cumsum()

# Pop the column of the first ticker and re-insert it at position 1
col_1 = data.pop(tickers[0])
data.insert(1, tickers[0], col_1)

# Replace missing and negative values with 0
data.fillna(0, inplace=True)
data[data < 0] = 0

prices_v2 = data.copy()
prices_v2.info()
prices_v2

<class 'pandas.core.frame.DataFrame'>
Index: 6184 entries, 2000-01-03 to 2024-10-21
Data columns (total 22 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   n       6184 non-null   int64  
 1   ALUA    6184 non-null   float64
 2   BBAR    6184 non-null   float64
 3   BMA     6184 non-null   float64
 4   BYMA    6184 non-null   float64
 5   CEPU    6184 non-null   float64
 6   COME    6184 non-null   float64
 7   CRES    6184 non-null   float64
 8   CVH     6184 non-null   float64
 9   EDN     6184 non-null   float64
 10  GGAL    6184 non-null   float64
 11  LOMA    6184 non-null   float64
 12  MIRG    6184 non-null   float64
 13  PAMP    6184 non-null   float64
 14  SUPV    6184 non-null   float64
 15  TECO2   6184 non-null   float64
 16  TGNO4   6184 non-null   float64
 17  TGSU2   6184 non-null   float64
 18  TRAN    6184 non-null   float64
 19  TXAR    6184 non-null   float64
 20  VALO    6184 non-null   float64
 21  YPFD    6184 non-null   flo

Date,n,ALUA,BBAR,BMA,BYMA,CEPU,COME,CRES,CVH,EDN,...,MIRG,PAMP,SUPV,TECO2,TGNO4,TGSU2,TRAN,TXAR,VALO,YPFD
2000-01-03,1,0.0,4.11,1.49,0.0,0.0,0.10,0.60,0.0,0.0,...,1.21,0.0,0.0,4.29,0.0,0.82,0.0,0.1,0.0,24.36
2000-01-04,2,0.0,3.91,1.40,0.0,0.0,0.09,0.57,0.0,0.0,...,1.17,0.0,0.0,4.11,0.0,0.80,0.0,0.1,0.0,24.36
2000-01-05,3,0.0,3.96,1.47,0.0,0.0,0.09,0.57,0.0,0.0,...,1.17,0.0,0.0,4.16,0.0,0.79,0.0,0.1,0.0,24.42
2000-01-06,4,0.0,3.98,1.44,0.0,0.0,0.09,0.56,0.0,0.0,...,1.17,0.0,0.0,4.08,0.0,0.79,0.0,0.1,0.0,24.19
2000-01-07,5,0.0,3.82,1.47,0.0,0.0,0.09,0.56,0.0,0.0,...,1.17,0.0,0.0,4.01,0.0,0.81,0.0,0.1,0.0,24.59
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-10-15,6180,849.0,4715.00,8850.00,323.5,1250.0,252.00,1110.00,5000.0,1375.0,...,21800.00,3100.0,1880.0,1920.00,3095.0,4905.00,1795.0,800.0,329.5,30375.00
2024-10-16,6181,843.0,4525.00,8400.00,313.5,1195.0,245.75,1065.00,4995.0,1345.0,...,21950.00,3055.0,1825.0,2030.00,2985.0,4815.00,1775.0,801.0,329.0,29625.00
2024-10-17,6182,838.0,4530.00,8560.00,318.5,1195.0,248.50,1080.00,5090.0,1395.0,...,22150.00,3225.0,1860.0,2055.00,3045.0,4900.00,1820.0,799.0,324.5,30050.00
2024-10-18,6183,835.0,4645.00,8720.00,318.0,1230.0,245.25,1070.00,5080.0,1370.0,...,22825.00,3250.0,1905.0,2020.00,3080.0,5040.00,1810.0,797.0,326.0,29950.00


# 4. Data Transformation

For each asset we add the following calculations:
* Simple (or discrete) return relative to the previous trading session.
* Logarithmic (or continuous) return relative to the previous trading session.
* Volatility (standard deviation) of the last 40 trading sessions, annualized.
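
The three calculations can be illustrated on a toy price series. This is a sketch rather than the notebook's own code: the prices are made up, and a 3-session window replaces the 40-session one so the result fits in a few rows. `WINDOW` and `toy` are hypothetical names.

```python
import numpy as np
import pandas as pd

RUEDAS_ANIO = 252   # trading sessions per year, as in section 1
WINDOW = 3          # hypothetical short window (the notebook uses 40)

toy = pd.DataFrame({'price': [100.0, 102.0, 101.0, 105.0, 104.0]})

# Simple return: P_t / P_{t-1} - 1
toy['sr'] = toy['price'] / toy['price'].shift(1) - 1

# Log return: ln(P_t / P_{t-1})
toy['lr'] = np.log(toy['price'] / toy['price'].shift(1))

# Rolling sample standard deviation of the log returns, annualized by sqrt(252)
toy['vol'] = toy['lr'].rolling(window=WINDOW).std() * np.sqrt(RUEDAS_ANIO)

print(toy)
```

The log return equals `ln(1 + simple return)`, so for small daily moves the two values are close. The volatility column stays empty until the window holds `WINDOW` valid log returns.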

In [8]:
# Start transforming the data on a new version of the DF
prices_v3 = prices_v2.copy()

In [9]:
# Insert the return and volatility calculations
for i in range(len(tickers)):
	
	# Select the ticker
	asset = tickers[i]

	# Find the position of the ticker's column
	pos = prices_v3.columns.get_loc(asset)

	# Names for the new columns
	col_sr = ticker_simple_return()[i]
	col_lr = ticker_log_return()[i]
	col_v40 = ticker_volat()[i]

	# Insert the new columns right after the ticker, named after the calculation and the ticker
	prices_v3.insert(pos+1, col_sr, np.nan)
	prices_v3.insert(pos+2, col_lr, np.nan)
	prices_v3.insert(pos+3, col_v40, np.nan)

	# Daily simple returns
	prices_v3[col_sr] = (prices_v3[asset] / prices_v3[asset].shift(1)) - 1

	# Daily log returns
	prices_v3[col_lr] = np.log(prices_v3[asset] / prices_v3[asset].shift(1))

	# Annualized standard deviation of the log returns over the last 40 trading sessions
	prices_v3[col_v40] = (prices_v3[col_lr].rolling(window=40).std()) * (RUEDAS_ANIO ** (1/2))

prices_v3.fillna(0, inplace=True)

print(get_volatility('GGAL', prices_v3))
prices_v3

0.384471794450412


Date,n,ALUA,ALUA_sr,ALUA_lr,ALUA_v40,BBAR,BBAR_sr,BBAR_lr,BBAR_v40,BMA,...,TXAR_lr,TXAR_v40,VALO,VALO_sr,VALO_lr,VALO_v40,YPFD,YPFD_sr,YPFD_lr,YPFD_v40
2000-01-03,1,0.0,0.000000,0.000000,0.000000,4.11,0.000000,0.000000,0.000000,1.49,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,24.36,0.000000,0.000000,0.000000
2000-01-04,2,0.0,0.000000,0.000000,0.000000,3.91,-0.048662,-0.049886,0.000000,1.40,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,24.36,0.000000,0.000000,0.000000
2000-01-05,3,0.0,0.000000,0.000000,0.000000,3.96,0.012788,0.012707,0.000000,1.47,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,24.42,0.002463,0.002460,0.000000
2000-01-06,4,0.0,0.000000,0.000000,0.000000,3.98,0.005051,0.005038,0.000000,1.44,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,24.19,-0.009419,-0.009463,0.000000
2000-01-07,5,0.0,0.000000,0.000000,0.000000,3.82,-0.040201,-0.041031,0.000000,1.47,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,24.59,0.016536,0.016401,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-10-15,6180,849.0,-0.012791,-0.012873,0.257710,4715.00,-0.001059,-0.001060,0.466289,8850.00,...,-0.002497,0.296678,329.5,0.024883,0.024579,0.243999,30375.00,0.018441,0.018273,0.371006
2024-10-16,6181,843.0,-0.007067,-0.007092,0.257776,4525.00,-0.040297,-0.041131,0.478400,8400.00,...,0.001249,0.295601,329.0,-0.001517,-0.001519,0.243925,29625.00,-0.024691,-0.025001,0.377042
2024-10-17,6182,838.0,-0.005931,-0.005949,0.246935,4530.00,0.001105,0.001104,0.472552,8560.00,...,-0.002500,0.291968,324.5,-0.013678,-0.013772,0.243597,30050.00,0.014346,0.014244,0.372015
2024-10-18,6183,835.0,-0.003580,-0.003586,0.234305,4645.00,0.025386,0.025069,0.448260,8720.00,...,-0.002506,0.291270,326.0,0.004622,0.004612,0.243649,29950.00,-0.003328,-0.003333,0.368471
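
One caveat, offered as an observation rather than a change to the notebook: section 3 replaces missing prices with 0, so on the first session where an asset has a quote, a positive price is divided by 0 and the return comes out as `inf`, which `fillna(0)` does not replace. The toy example below illustrates the effect and one possible workaround, masking non-positive prices before dividing. The series and the names `naive_sr`, `valid` and `masked_sr` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy series: the asset has no quotes for the first two sessions (stored as 0)
prices = pd.Series([0.0, 0.0, 10.0, 11.0], name='price')

# Same formula as the loop above: a zero previous price yields inf
naive_sr = prices / prices.shift(1) - 1

# Sketch of an alternative: treat non-positive prices as missing before dividing
valid = prices.where(prices > 0)
masked_sr = valid / valid.shift(1) - 1

print(pd.DataFrame({'price': prices, 'naive_sr': naive_sr, 'masked_sr': masked_sr}))
```

With the mask, returns stay missing until two consecutive valid prices exist, and a later `fillna(0)` would then set them to 0.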


# 5. Results Saving

We save, in .csv and .xlsx format, both the price-only data series and the series with prices plus the return and volatility calculations.

In [10]:
# Save the price-only series to a .csv
prices_v2.to_csv('./' + NOMBRE_OUTPUT_1 + '.csv')

# Save the price-only series to an .xlsx
with pd.ExcelWriter('./' + NOMBRE_OUTPUT_1 + '.xlsx') as writer:
	prices_v2.to_excel(writer, sheet_name='Sheet1', index=True)


# Save the series with returns and volatility to a .csv
prices_v3.to_csv('./' + NOMBRE_OUTPUT_2 + '.csv')

# Save the series with returns and volatility to an .xlsx
with pd.ExcelWriter('./' + NOMBRE_OUTPUT_2 + '.xlsx') as writer:
	prices_v3.to_excel(writer, sheet_name='Sheet1', index=True)


print('Data has already been downloaded.')

Data has already been downloaded.
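
To check a saved file, the CSV can be read back with the date column restored as the index. The sketch below does this on a toy frame shaped like `prices_v2`, written to a temporary directory so that no real output file is touched. The file name `toy-prices.csv` and the values are made up.

```python
import os
import tempfile

import pandas as pd

# Toy frame shaped like prices_v2: date strings as index, counter 'n', one ticker
toy = pd.DataFrame(
    {'n': [1, 2], 'GGAL': [10.5, 10.75]},
    index=pd.Index(['2024-10-17', '2024-10-18'], name='Date'),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy-prices.csv')   # hypothetical file name
    toy.to_csv(path)
    # index_col=0 restores the first CSV column (the dates) as the index
    restored = pd.read_csv(path, index_col=0)

print(restored)
```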
