<div style="text-align: center; font-family: 'charter bt pro roman'; color: rgb(0, 65, 75);">
    <h1>
    GDP Intermediate Revisions and Horizon Datasets
    </h1>
</div>

<div style="text-align: center; font-family: 'charter bt pro roman'; color: rgb(0, 65, 75);">
<h3>
Documentation
<br>
____________________
<br>
</h3>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    This 
    <span style="color: rgb(0, 65, 75);">Jupyter notebook</span>
    provides a step-by-step guide to <b>data building</b> for the project <b>'Revisiones y sesgos en las estimaciones preliminares del PBI en el Perú'</b>. It covers the creation of the GDP intermediate revisions dataset for each sector. A key step is the parallel construction of what we call the “t+h structure”. This dataset has the same layout as the GDP growth vintages by sector, but instead of growth rates it contains labels of the form “t+h”, where <b>h</b> indicates how many months have passed since the preliminary growth rate was first published. In other words, this Jupyter notebook also covers the creation of vintage datasets in which each growth rate is associated with a horizon (<b>h</b>).
</div>

<div style="text-align: center; font-family: 'PT Serif Pro Book'; color: rgb(0, 65, 75); font-size: 16px;">
    Jason Cruz
    <br>
    <a href="mailto:jj.cruza@up.edu.pe" style="color: rgb(0, 153, 123); font-size: 16px;">
        jj.cruza@up.edu.pe
    </a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;line-height: 1.5;">
<span style="font-size: 34px;">&#128452;</span> The 't+h' structure should be available for all sectors and frequencies.
    <br>
    <span style="font-size: 24px;">&#8987;</span> Available since <b>1994-2024</b> (Table 1) and since <b>1997-2024</b> (Table 2). 
    <br>
</div>

<div style="font-family: Amaya; text-align: left; color: rgb(0, 65, 75); font-size:16px">The following <b>outline is functional</b>. Use its links to jump to any section of this notebook.</div>

<div id="outilne">
   <!-- Contenido de la celda de destino -->
</div>

<div style="background-color: #292929; padding: 10px; line-height: 1.5; font-family: 'PT Serif Pro Book';">
    <h2 style="text-align: left; color: #E0E0E0;">
        Outline
    </h2>
    <br>
    <a href="#libraries" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        Libraries</a>
    <br>
    <a href="#setup" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        Initial set-up</a>
    <br>
    <a href="#1" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        1. Economic sector and data frequency selector</a>
    <br>
    <a href="#2" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        2. Create horizon datasets</a>
    <br>
    <a href="#2.1." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        2.1. Loading growth rate datasets from PostgreSQL.</a>
    <br>
    <a href="#2.2." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        2.2. Functions for creating horizon datasets.</a>
    <br>
    <a href="#2.3." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        2.3. Creating horizon dataset step by step.</a> 
    <br>
    <a href="#3" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        3. Create intermediate revisions datasets</a>
    <br>
    <a href="#3.1." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        3.1. Functions for creating intermediate revisions dataset.</a>
    <br>
    <a href="#3.2." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        3.2. Creating intermediate revisions dataset step by step.</a>
    <br>
    <a href="#3.3." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        3.3. Cleaning up the intermediate revisions dataset (final version to be loaded to SQL).</a>
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    For questions or issues regarding the code, please email Jason Cruz <a href="mailto:jj.cruza@alum.up.edu.pe" style="color: rgb(0, 153, 123); text-decoration: none;"><span style="font-size: 24px;">&#x2709;</span>
    </a>.
    </div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    If any of the libraries below is missing, install it with a command like the one in the following cell.
    </div>

In [None]:
#!pip install pandas numpy sqlalchemy psycopg2-binary  # Remove the leading "#" to install any missing third-party library ("os" and "re" ship with Python).

<div id="libraries">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'charter'; color: dark;">
    <h2>
    Libraries
    </h2>
    </div>

In [1]:
# PostgreSQL connection
import os
from sqlalchemy import create_engine

# HORIZON DATASETS
import pandas as pd
import numpy as np
import re


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="setup">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark;">
    <h2>
    Initial set-up
    </h2>
    </div>

<p style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px"> The following function establishes a connection to the <code>gdp_revisions_datasets</code> database in <code>PostgreSQL</code>. The <b>input data</b> used in this Jupyter notebook are loaded from this database, and all <b>output data</b> generated by the notebook are stored in it. Once you have obtained the required permissions, set the environment variables <code>CIUP_SQL_USER</code>, <code>CIUP_SQL_PASS</code> and <code>CIUP_SQL_HOST</code> so that the function can reach the server.</p>
    
<p style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
To request permissions, please email Jason Cruz <a href="mailto:jj.cruza@alum.up.edu.pe" style="color: rgb(0, 153, 123); text-decoration: none;"> <span style="font-size: 24px;">&#x2709;</span>
    </a>.
</p>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    <span style="font-size: 24px; color: #FFA823; font-weight: bold;">&#9888;</span>
    Enter your user credentials to access the SQL server.
    </div>

In [2]:
def create_sqlalchemy_engine():
    """
    Function to create an SQLAlchemy engine using environment variables.
    
    Returns:
        engine: SQLAlchemy engine object.
    """
    # Get environment variables
    user = os.environ.get('CIUP_SQL_USER')  # Get the SQL user from environment variables
    password = os.environ.get('CIUP_SQL_PASS')  # Get the SQL password from environment variables
    host = os.environ.get('CIUP_SQL_HOST')  # Get the SQL host from environment variables
    port = 5432  # Set the SQL port to 5432
    database = 'gdp_revisions_datasets'  # Set the database name 'gdp_revisions_datasets' from SQL

    # Check if all environment variables are defined
    if not all([host, user, password]):
        raise ValueError("Some environment variables are missing (CIUP_SQL_HOST, CIUP_SQL_USER, CIUP_SQL_PASS)")

    # Create connection string
    connection_string = f"postgresql://{user}:{password}@{host}:{port}/{database}"

    # Create SQLAlchemy engine
    engine = create_engine(connection_string)
    
    return engine
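
<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    If the password stored in <code>CIUP_SQL_PASS</code> contains characters such as <code>@</code> or <code>/</code>, interpolating it directly into the connection string can produce a URL that is parsed incorrectly. The sketch below shows one possible safeguard that uses only the standard library. The helper name <code>build_connection_string</code> and the credentials are hypothetical; they are not part of this project.
</div>

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port=5432,
                            database="gdp_revisions_datasets"):
    """Return a PostgreSQL URL with user and password percent-encoded.

    Hypothetical helper (an assumption, not part of the project's code).
    """
    # quote_plus escapes characters reserved in URLs, e.g. '@' becomes '%40'
    safe_user = quote_plus(user)
    safe_password = quote_plus(password)
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Made-up credentials, for illustration only
print(build_connection_string("analyst", "p@ss/word", "localhost"))
```

The resulting string could be passed to <code>create_engine</code> in place of the plain f-string.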

<div style="text-align: left;">
    <span style="font-size: 24px; color: rgb(255, 32, 78); font-weight: bold;">&#9888;</span>
    <span style="font-family: PT Serif Pro Book; color: black; font-size: 16px;">
        Import all other functions required by this jupyter notebook.
    </span>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px"> Please check the script <code>gdp_inter_revisions_datasets_functions.py</code>, which contains all the functions required by this Jupyter notebook. The functions there are ordered according to the <a href="#outilne" style="color: #3d30a2;">sections</a> of this notebook.</div>

In [3]:
from gdp_inter_revisions_datasets_functions import *

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="1">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;; color: dark;">1.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Economic sector and data frequency selector</span></h1>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;; color: dark;">1.1.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Economic sector</span></h2>

In [4]:
# Call the function to show the popup window
sector = show_option_window()
print("Selected economic sector:", sector)

Selected economic sector: gdp


<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">1.2. </span> <span style = "color: dark; font-family: charter;">Frequency</span></h2>

In [5]:
# Call the function to show the popup window
frequency = show_frequency_window()
print("Selected frequency:", frequency)

Selected frequency: quarterly


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="2">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;; color: dark;">2.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Create horizon datasets</span></h1>

<div id="2.1.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">2.1. </span> <span style = "color: dark; font-family: charter;">Loading growth rate datasets from <code>PostgreSQL</code></span></h2>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Connect to the PostgreSQL database.
    </div>

In [62]:
engine = create_sqlalchemy_engine()

In [63]:
# SQL Query
query = f"SELECT * FROM {sector}_{frequency}_growth_rates;"

# Load data into DataFrame
globals()[f'{sector}_{frequency}_growth_rates'] = pd.read_sql(query, engine)
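
<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    The table name in the query above is built from the <code>sector</code> and <code>frequency</code> selections. Table names cannot be passed as bound SQL parameters, so a basic check before interpolation may be worthwhile. The sketch below assumes that valid names contain only lowercase letters, digits and underscores; the helper <code>build_table_name</code> is hypothetical.
</div>

```python
import re

# Assumption: sector and frequency are lowercase identifiers such as 'gdp' or 'quarterly'
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def build_table_name(sector, frequency):
    """Compose '<sector>_<frequency>_growth_rates' after a basic sanity check."""
    for part in (sector, frequency):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"Unexpected value in table name: {part!r}")
    return f"{sector}_{frequency}_growth_rates"


print(build_table_name("gdp", "quarterly"))  # gdp_quarterly_growth_rates
```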

In [64]:
#pd.set_option('display.max_columns', None)

In [65]:
globals()[f'{sector}_{frequency}_growth_rates'].head(40)

Unnamed: 0,year,id_ns,date,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3,2011_4,2012_1,2012_2,2012_3,2012_4,2013_1,2013_2,2013_3,2013_4,2014_1,2014_2,2014_3,2014_4,2015_1,2015_2,2015_3,2015_4,2016_1,2016_2,2016_3,2016_4,2017_1,2017_2,2017_3,2017_4,2018_1,2018_2,2018_3,2018_4,2019_1,2019_2,2019_3,2019_4,2020_1,2020_2,2020_3,2020_4,2021_1,2021_2,2021_3,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4
0,2013,1,2013-01-04,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2013,2,2013-01-11,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,2013,3,2013-01-18,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,2013,4,2013-01-25,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,2013,5,2013-02-01,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,2013,6,2013-02-08,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,2013,7,2013-02-15,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,2013,8,2013-02-22,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,2013,9,2013-03-01,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,2013,10,2013-03-08,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="2.2.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">2.2.</span>
    <span style = "color: dark; font-family: charter;">
    Functions for creating horizon dataset
    </span>
    </h2>

<div style="font-family: charter; text-align: left; color:dark">
    <b>Setting the main horizon rows in the growth rate datasets</b>
    </div>

In [66]:
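def replace_horizon(df, start_row):
    """
    Label the rows where a new release appears with 't+h' horizon strings.

    A row is relabelled when its last non-NaN column differs from that of the
    previous row (a new target period was published). Labels are assigned from
    right to left in steps of 3 (consecutive quarterly columns are three months apart).
    """
    # Cast DataFrame to object dtype to allow storing strings
    df = df.astype(object)

    def replace_row(row, last_non_nan_indices):
        new_row = row.copy()
        last_non_nan_index = row.last_valid_index()
        h = 3  # The first processed row labels its newest column t+3
        if last_non_nan_indices:
            if last_non_nan_indices[-1] != last_non_nan_index:
                # A new column received its first value: it starts at t+1
                new_row[last_non_nan_index] = "t+1"
                h = 1
            else:
                h += 1
        else:
            new_row[last_non_nan_index] = "t+1"
        # Walk from the newest column to the oldest, adding 3 to h at each non-null value
        for i in range(len(row) - 1, -1, -1):
            if pd.notnull(row.iloc[i]) and (not last_non_nan_indices or last_non_nan_indices[-1] != last_non_nan_index):
                new_row.iloc[i] = f"t+{h}"
                h += 3
        last_non_nan_indices.append(last_non_nan_index)
        return new_row

    # Rows before start_row are left untouched
    first_part = df.iloc[:start_row]
    last_non_nan_indices = []
    second_part = df.iloc[start_row:].apply(lambda x: replace_row(x, last_non_nan_indices), axis=1)
    return pd.concat([first_part, second_part])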
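
<div style="font-family: charter; text-align: left; color:dark">
    To make the right-to-left labelling in <code>replace_horizon</code> easier to follow, here is a minimal sketch of the same idea on a plain Python list. The function <code>label_release_row</code> and its arguments are hypothetical, and the sketch assumes quarterly data (a step of 3 months between columns); it is not a replacement for the function above.
</div>

```python
def label_release_row(values, first_h, step=3):
    """Label the non-missing entries of one row from right to left.

    values  : list of growth rates, with None for periods not yet published
    first_h : horizon given to the newest published period (e.g. 1)
    step    : months between consecutive columns (3 assumes quarterly data)
    """
    labels = list(values)
    h = first_h
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            labels[i] = f"t+{h}"
            h += step
    return labels


# Row in which the newest period has just been released
print(label_release_row([6.0, 6.4, 6.8, 5.9, None], first_h=1))
# ['t+10', 't+7', 't+4', 't+1', None]
```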

<div style="font-family: charter; text-align: left; color:dark">
    <b>Converting columns to string type</b>
    </div>

In [67]:
def columns_str(df):
    # Convert values to strings from the fourth column onward (NaN becomes '')
    return df.apply(lambda x: x if x.name in df.columns[:3] else x.map(lambda y: str(y) if pd.notnull(y) else ''))

<div style="font-family: charter; text-align: left; color:dark">
    <b>Filling the remaining rows with the horizon t+h</b>
    </div>

In [68]:
def replace_horizon_1(df):
    # Names of the columns that must not be processed
    excluded_columns = ['year', 'id_ns', 'date', 'month']

    # Iterate over each column
    for col in df.columns:
        # Skip the excluded columns
        if col in excluded_columns:
            continue

        # Dictionary storing the 't+h' value found for each year-month of 'date' in this column
        dict_columna = {}
        last_t_h_date = None
        last_t_h_value = None

        # Iterate over each row of the current column
        for i, val in enumerate(df[col]):
            # If the value has the form 't+h', store it under its year-month of 'date'
            if re.match(r't\+\d+', str(val)):
                date_val = df.at[i, 'date'].strftime('%Y-%m')  # Year and month of the date
                dict_columna[date_val] = val
                last_t_h_date = df.at[i, 'date']
                last_t_h_value = val

        # Iterate over each row of the current column to replace values
        for i, val in enumerate(df[col]):
            # If the value is neither 't+h' nor empty (''), replace it with the 't+h' recorded for its year-month
            if not re.match(r't\+\d+', str(val)) and val != '':
                date_val = df.at[i, 'date'].strftime('%Y-%m')  # Year and month of the date
                if date_val in dict_columna:
                    df.at[i, col] = dict_columna[date_val]

        # Third pass: for months without a recorded 't+h', extend the last 't+h' by the month difference (same year only)
        for i, val in enumerate(df[col]):
            if re.match(r'[-+]?\d+\.\d+', str(val)) and val != '':
                date_val = df.at[i, 'date'].strftime('%Y-%m')
                if date_val not in dict_columna and last_t_h_date:
                    current_date = df.at[i, 'date']
                    if current_date.year == last_t_h_date.year:
                        month_diff = current_date.month - last_t_h_date.month
                        if month_diff > 0:
                            h = int(re.search(r'\d+', last_t_h_value).group()) + month_diff
                            df.at[i, col] = f't+{h}'

        # Additional pass: look upward for a 't+h' in the same month, or one or two months earlier
        for i in range(len(df[col])):
            if re.match(r'[-+]?\d+\.\d+', str(df.at[i, col])) and df.at[i, col] != '':
                current_date = df.at[i, 'date']
                if last_t_h_date and current_date.year == last_t_h_date.year:
                    for j in range(i - 1, -1, -1):
                        prev_date = df.at[j, 'date']
                        if current_date.month == prev_date.month:
                            if re.match(r't\+\d+', str(df.at[j, col])):
                                h = int(re.search(r'\d+', df.at[j, col]).group())
                                df.at[i, col] = f't+{h}'
                                break
                        else:
                            month_diff = current_date.month - prev_date.month
                            if month_diff == 1 or month_diff == 2:
                                if re.match(r't\+\d+', str(df.at[j, col])):
                                    h = int(re.search(r'\d+', df.at[j, col]).group()) + month_diff
                                    df.at[i, col] = f't+{h}'
                                    break

    return df

<div style="font-family: charter; text-align: left; color:dark">
    <b>Filling rare values with horizon t+h</b>
    </div>

In [69]:
def replace_horizon_2(df):
    # Helper that finds the last 't+h' value above a run of decimal values
    def find_last_t_plus_h(columna, indice):
        for i in range(indice - 1, -1, -1):
            if re.match(r't\+\d+', str(df.iloc[i, columna])):
                return df.iloc[i, columna]
        return None

    # Iterate over each column of the DataFrame
    for columna in df.columns:
        ultimo_t_mas_h = None
        for indice, valor in df[columna].items():
            if re.match(r'[-+]?\d+\.\d+', str(valor)):  # Decimal value with an optional sign
                # Find the last 't+h' value above the run of decimal values
                if ultimo_t_mas_h is None:
                    ultimo_t_mas_h = find_last_t_plus_h(df.columns.get_loc(columna), indice)
                if ultimo_t_mas_h is not None:
                    # Extract the number from 't+h' and add one
                    nuevo_h = int(re.search(r'\d+', ultimo_t_mas_h).group()) + 1
                    # Build the new value 't+(h+1)'
                    nuevo_valor = re.sub(r'[-+]?\d+\.\d+', f't+{nuevo_h}', str(valor))
                    # Write the new value into the DataFrame
                    df.at[indice, columna] = nuevo_valor
            else:
                ultimo_t_mas_h = None

    return df
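
<div style="font-family: charter; text-align: left; color:dark">
    The behaviour of <code>replace_horizon_2</code> on a single column can be sketched with a plain list: every run of leftover decimal strings receives the label that follows the last 't+h' found above the run. The function <code>fill_runs</code> and the sample values are hypothetical and only mirror the logic above.
</div>

```python
import re

T_H = re.compile(r"t\+(\d+)")
DECIMAL = re.compile(r"[-+]?\d+\.\d+")


def fill_runs(column):
    """Replace each run of decimal strings with 't+(h+1)', where 't+h' is the
    last horizon label found above the start of the run."""
    out = list(column)
    anchor = None  # h of the label found above the current run, if any
    for i, val in enumerate(column):
        if DECIMAL.match(str(val)):
            if anchor is None:
                # Search upward for the closest 't+h' label
                for j in range(i - 1, -1, -1):
                    m = T_H.match(str(out[j]))
                    if m:
                        anchor = int(m.group(1))
                        break
            if anchor is not None:
                out[i] = f"t+{anchor + 1}"
        else:
            anchor = None  # The run ended; the next run searches again
    return out


print(fill_runs(["t+1", "t+1", "5.9", "5.9", "", "t+4", "6.1"]))
# ['t+1', 't+1', 't+2', 't+2', '', 't+4', 't+5']
```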

<div style="font-family: charter; text-align: left; color:dark">
    <b>Exporting to an Excel file</b>
    </div>

In [70]:
#def export_to_excel(df, filename):
#    # Export the DataFrame as an Excel file
#    df.to_excel(filename, index=False)

<div style="color: rgb(61, 48, 162); font-size: 12px;">
    Back to the
    <a href="#outilne" style="color: #687EFF;">
    outline.
    </a>
    </div>

<div id="2.3.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">2.3.</span>
    <span style = "color: dark; font-family: charter;">
    Creating horizon dataset step by step
    </span>
    </h2>

<h3 style = "color: dark; font-family: charter;">
    <b>0.</b> Choose the row to start replacing
  </h3>

In [71]:
start_row = 0

In [72]:
pd.set_option('display.max_columns', None)

<h3 style = "color: dark; font-family: charter;">
    <b>1.</b> Replace decimal values with "t+h" values only in the rows that mark a new release step
  </h3>

In [73]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'] = replace_horizon(globals()[f'{sector}_{frequency}_growth_rates'].iloc[:, 3:], start_row)

In [74]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'].head(30)

Unnamed: 0,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3,2011_4,2012_1,2012_2,2012_3,2012_4,2013_1,2013_2,2013_3,2013_4,2014_1,2014_2,2014_3,2014_4,2015_1,2015_2,2015_3,2015_4,2016_1,2016_2,2016_3,2016_4,2017_1,2017_2,2017_3,2017_4,2018_1,2018_2,2018_3,2018_4,2019_1,2019_2,2019_3,2019_4,2020_1,2020_2,2020_3,2020_4,2021_1,2021_2,2021_3,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4
0,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,t+1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


<h3 style = "color: dark; font-family: charter;">
    <b>2. Concatenate the first 3 columns: year, id_ns, date</b>
  </h3>

In [75]:
# Get the first three columns of the original DataFrame
first_3_columns = globals()[f'{sector}_{frequency}_growth_rates'].iloc[:, :3]

# Concatenate the first three columns with the horizon DataFrame
globals()[f'{sector}_{frequency}_growth_rates_horizon'] = pd.concat([first_3_columns, globals()[f'{sector}_{frequency}_growth_rates_horizon']], axis=1)

In [76]:
print(globals()[f'{sector}_{frequency}_growth_rates_horizon'].head(40))

    year id_ns       date 2010_1 2010_2 2010_3 2010_4 2011_1 2011_2 2011_3  \
0   2013    01 2013-01-04   t+33   t+30   t+27   t+24   t+21   t+18   t+15   
1   2013    02 2013-01-11    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
2   2013    03 2013-01-18    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
3   2013    04 2013-01-25    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
4   2013    05 2013-02-01    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
5   2013    06 2013-02-08    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
6   2013    07 2013-02-15    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
7   2013    08 2013-02-22   t+34   t+31   t+28   t+25   t+22   t+19   t+16   
8   2013    09 2013-03-01    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
9   2013    10 2013-03-08    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
10  2013    11 2013-03-15    6.2   10.0    9.6    9.2    8.8    6.9    6.7   
11  2013    12 2013-03-22    6.2   10.0    9.6    9.2    8.8    

<h3 style = "color: dark; font-family: charter;">
    <b>3. Convert columns to string type</b>
  </h3>

In [77]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'] = columns_str(globals()[f'{sector}_{frequency}_growth_rates_horizon'])
globals()[f'{sector}_{frequency}_growth_rates_horizon'].head(10)

Unnamed: 0,year,id_ns,date,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3,2011_4,2012_1,2012_2,2012_3,2012_4,2013_1,2013_2,2013_3,2013_4,2014_1,2014_2,2014_3,2014_4,2015_1,2015_2,2015_3,2015_4,2016_1,2016_2,2016_3,2016_4,2017_1,2017_2,2017_3,2017_4,2018_1,2018_2,2018_3,2018_4,2019_1,2019_2,2019_3,2019_4,2020_1,2020_2,2020_3,2020_4,2021_1,2021_2,2021_3,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4
0,2013,1,2013-01-04,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2013,2,2013-01-11,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,2013,3,2013-01-18,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,2013,4,2013-01-25,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,2013,5,2013-02-01,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,2013,6,2013-02-08,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,2013,7,2013-02-15,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.3,6.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,2013,8,2013-02-22,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,t+1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,2013,9,2013-03-01,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,2013,10,2013-03-08,6.2,10.0,9.6,9.2,8.8,6.9,6.7,5.5,6.0,6.4,6.8,5.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


<h3 style = "color: dark; font-family: charter;">
    <b>4. Spread the "t+h" values over the remaining decimal values</b>
  </h3>

In [82]:
pd.set_option('display.max_rows', None)

In [83]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'] = transform_dataframe(globals()[f'{sector}_{frequency}_growth_rates_horizon'])

In [84]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'].head(80)

Unnamed: 0,year,id_ns,date,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3,2011_4,2012_1,2012_2,2012_3,2012_4,2013_1,2013_2,2013_3,2013_4,2014_1,2014_2,2014_3,2014_4,2015_1,2015_2,2015_3,2015_4,2016_1,2016_2,2016_3,2016_4,2017_1,2017_2,2017_3,2017_4,2018_1,2018_2,2018_3,2018_4,2019_1,2019_2,2019_3,2019_4,2020_1,2020_2,2020_3,2020_4,2021_1,2021_2,2021_3,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4
0,2013,1,2013-01-04,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2013,2,2013-01-11,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,2013,3,2013-01-18,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,2013,4,2013-01-25,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,2013,5,2013-02-01,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,2013,6,2013-02-08,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,2013,7,2013-02-15,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,2013,8,2013-02-22,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,t+1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,2013,9,2013-03-01,t+35,t+32,t+29,t+26,t+23,t+20,t+17,t+14,t+11,t+8,t+5,t+2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,2013,10,2013-03-08,t+35,t+32,t+29,t+26,t+23,t+20,t+17,t+14,t+11,t+8,t+5,t+2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
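
<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    In the output above, the <code>2012_4</code> column reads t+1 on 2013-02-22 and t+2 from 2013-03-01, which is consistent with h increasing by one per calendar month. The sketch below expresses that reading as a formula, assuming h = 1 in the month of first publication. The function <code>horizon_in_months</code> is hypothetical, and unlike the project's functions it also spans year boundaries.
</div>

```python
from datetime import date


def horizon_in_months(first_release, vintage):
    """Return h such that the vintage date falls at 't+h'.

    Assumption: the month of first publication counts as h = 1 and h grows
    by one with every later calendar month.
    """
    months_elapsed = (vintage.year - first_release.year) * 12 \
        + (vintage.month - first_release.month)
    return months_elapsed + 1


first = date(2013, 2, 22)                          # first release of 2012_4 in the output above
print(horizon_in_months(first, date(2013, 2, 22)))  # 1
print(horizon_in_months(first, date(2013, 3, 1)))   # 2
```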


<h3 style = "color: dark; font-family: charter;">
    <b>5. Spread the "t+h" values over the remaining rare decimal values</b>
  </h3>

In [29]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'] = replace_horizon_2(globals()[f'{sector}_{frequency}_growth_rates_horizon'])

In [30]:
globals()[f'{sector}_{frequency}_growth_rates_horizon'].head(60)

Unnamed: 0,year,id_ns,date,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3,2011_4,2012_1,2012_2,2012_3,2012_4,2013_1,2013_2,2013_3,2013_4,2014_1,2014_2,2014_3,2014_4,2015_1,2015_2,2015_3,2015_4,2016_1,2016_2,2016_3,2016_4,2017_1,2017_2,2017_3,2017_4,2018_1,2018_2,2018_3,2018_4,2019_1,2019_2,2019_3,2019_4,2020_1,2020_2,2020_3,2020_4,2021_1,2021_2,2021_3,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4
0,2013,1,2013-01-04,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2013,2,2013-01-11,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,2013,3,2013-01-18,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,2013,4,2013-01-25,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,t+3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,2013,5,2013-02-01,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,2013,6,2013-02-08,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,2013,7,2013-02-15,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,2013,8,2013-02-22,t+34,t+31,t+28,t+25,t+22,t+19,t+16,t+13,t+10,t+7,t+4,t+1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,2013,9,2013-03-01,t+35,t+32,t+29,t+26,t+23,t+20,t+17,t+14,t+11,t+8,t+5,t+2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,2013,10,2013-03-08,t+35,t+32,t+29,t+26,t+23,t+20,t+17,t+14,t+11,t+8,t+5,t+2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


<div style="color: rgb(61, 48, 162); font-size: 12px;">
    Back to the
    <a href="#outilne" style="color: #687EFF;">
    outline.
    </a>
    </div>

<div id="3">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: charter;">3.</span> <span style = "color: dark; font-family: charter;">Create intermediate revisions datasets</span></h1>

<div id="3.1.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">3.1.</span>
    <span style = "color: dark; font-family: charter;">
    Functions for creating intermediate revisions dataset
    </span>
    </h2>

<div style="font-family: charter; text-align: left; color:dark">
    <b>Getting the last row index of each t+h value in each column</b>
    </div>

In [None]:
import re

def get_last_index_h(df):
    # Dictionary of records: {column: {'t+h' label: last row index}}
    registros = {}

    # Iterate over every column of the DataFrame except 'year', 'date' and 'id_ns'
    for columna in df.columns.drop(['year', 'date', 'id_ns']):
        # Row index of each 't+h' label found in the current column
        registros_columna = {}
        # Iterate over the unique non-NaN values of the current column
        for valor in df[columna].dropna().unique():
            # Keep only values that contain the pattern 't+<digit>';
            # str() guards against non-string cells
            if re.search(r't\+\d', str(valor)):
                # Index of the last row in which this label appears in the column
                ultimo_indice = df[df[columna] == valor].index.max()
                # Store the label and its last row index
                registros_columna[valor] = ultimo_indice
        # Add the records of this column to the main dictionary
        registros[columna] = registros_columna

    # Return the records
    return registros
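
<div style="font-family: charter; text-align: left; color:dark">
    To illustrate what this step produces, here is a minimal sketch on a toy horizon table. The data and the helper name <code>last_index_per_label</code> are hypothetical and not part of the project; the helper is a compact stand-in for <code>get_last_index_h</code> so that the example runs on its own.
</div>

```python
import pandas as pd

# Hypothetical toy horizon table: two target periods observed over four releases
toy_horizon = pd.DataFrame({
    'year': [2013] * 4,
    'id_ns': ['01', '02', '03', '04'],
    'date': ['2013-01-04', '2013-01-11', '2013-02-01', '2013-02-08'],
    'ene_2013': [None, None, 't+1', 't+1'],
    'dic_2012': ['t+1', 't+1', 't+2', 't+2'],
})

def last_index_per_label(df, id_columns=('year', 'date', 'id_ns')):
    """Return {column: {'t+h' label: last row index where the label appears}}."""
    result = {}
    for column in df.columns.drop(list(id_columns)):
        # Keep only non-missing cells that look like 't+<digits>'
        labels = df[column].dropna().astype(str)
        labels = labels[labels.str.contains(r't\+\d')]
        # For each distinct label, keep the largest row index
        result[column] = {
            label: int(labels.index[labels == label].max())
            for label in labels.unique()
        }
    return result

records = last_index_per_label(toy_horizon)
print(records)
```

<div style="font-family: charter; text-align: left; color:dark">
    Each inner dictionary maps a horizon label to the last row in which that label appears, which is the row that the revision step reads from.
</div>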

<div style="font-family: charter; text-align: left; color:dark">
    <b> Computing intermediate revisions</b>
    </div>

In [None]:
import pandas as pd

def computing_inter_revisions(df, registros):
    # Columns of df, excluding 'year', 'date' and 'id_ns'
    columnas = df.columns.drop(['year', 'date', 'id_ns'])

    # The largest h determines the number of rows of the revisions DataFrame
    max_h = max(int(valor.split('+')[1]) for columna in registros.values() for valor in columna)
    num_filas = max_h - 1

    # Empty DataFrame (all NaN) to hold the intermediate revisions
    revisiones_intermedias = pd.DataFrame(columns=columnas, index=range(num_filas))

    # Iterate over each horizon h; row h - 2 holds the revision "t+h - t+1"
    for h in range(2, max_h + 1):
        # Iterate over each column of df
        for columna in columnas:
            # Row indices of the last 't+h' and the last 't+1' (None if absent)
            indice_h = registros[columna].get(f"t+{h}")
            indice_t1 = registros[columna].get("t+1")

            # If either index is missing, the cell keeps its NaN
            if indice_h is None or indice_t1 is None:
                continue

            # Revision: value at t+h minus value at t+1
            resultado = df.at[indice_h, columna] - df.at[indice_t1, columna]
            # Store the result in the row that corresponds to horizon h
            revisiones_intermedias.at[h - 2, columna] = resultado

    return revisiones_intermedias
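
<div style="font-family: charter; text-align: left; color:dark">
    In formula terms, the revision at horizon h is the growth rate read at the last row labelled <code>t+h</code> minus the growth rate read at the last row labelled <code>t+1</code>. The following minimal sketch works through that arithmetic for a single target period; all numbers and variable names are hypothetical.
</div>

```python
import pandas as pd

# Hypothetical growth rates of one target period across five releases
growth = pd.Series([5.0, 5.0, 5.4, 5.4, 5.1], name='ene_2013')
# Horizon label attached to each release (same row order as `growth`)
horizon = pd.Series(['t+1', 't+1', 't+2', 't+2', 't+3'], name='ene_2013')

# Last row index at which each horizon label appears
last_index = {
    label: int(horizon.index[horizon == label].max())
    for label in horizon.unique()
}

# Revision at horizon h: value at the last 't+h' row minus value at the last 't+1' row
base = growth.loc[last_index['t+1']]
revisions = {
    f'{label} - t+1': round(float(growth.loc[idx] - base), 10)
    for label, idx in last_index.items()
    if label != 't+1'
}
print(revisions)
```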

<div style="font-family: charter; text-align: left; color:dark">
    <b> Transpose intermediate revisions dataset</b>
    </div>

In [None]:
def transpose_inter_revisions(revisiones_intermedias):
    # Transpose the DataFrame: one row per target period, one column per horizon
    revisiones_transpuestas = revisiones_intermedias.T

    # Rename the columns; note that 'sector' is read from the notebook's global scope
    revisiones_transpuestas.columns = [f'{sector}_revision_{i+1}' for i in range(len(revisiones_transpuestas.columns))]

    # Reset the index so that the period labels become a regular column
    revisiones_transpuestas = revisiones_transpuestas.reset_index()

    # Rename that column to 'inter_revision_date'
    revisiones_transpuestas = revisiones_transpuestas.rename(columns={'index': 'inter_revision_date'})

    return revisiones_transpuestas

<div id="3.2.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">3.2.</span>
    <span style = "color: dark; font-family: charter;">
    Creating intermediate revisions dataset step by step
    </span>
    </h2>

<span style = "color: dark; font-family: charter;">
    <b>1. Generating a dictionary with the row indices and their t+h values</b>
  </span>

In [None]:
dictionary = get_last_index_h(globals()[f'{sector}_{frequency}_growth_rates_horizon'])

<span style = "color: dark; font-family: charter;">
    <b>2. Computing intermediate revisions</b>
  </span>

In [None]:
# Use the function to compute the intermediate revisions
globals()[f'{sector}_inter_revisions'] = computing_inter_revisions(globals()[f'{sector}_{frequency}_growth_rates'], dictionary)
globals()[f'{sector}_inter_revisions']

<span style = "color: dark; font-family: charter;">
    <b>3. Transpose the intermediate revisions dataset</b>
  </span>

In [None]:
# Use the function to transpose the intermediate revisions
globals()[f"{sector}_{frequency}_inter_revisions"] = transpose_inter_revisions(globals()[f'{sector}_inter_revisions'])

# Show the result
print("Transposed intermediate revisions:")
globals()[f"{sector}_{frequency}_inter_revisions"].head(20)

<div style="color: rgb(61, 48, 162); font-size: 12px;">
    Back to the
    <a href="#outilne" style="color: #687EFF;">
    outline.
    </a>
    </div>

<div id="3.3.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">3.3.</span>
    <span style = "color: dark; font-family: charter;">
    Clean-up intermediate revisions dataset (last version to be loaded to SQL)
    </span>
    </h2>

In [None]:
import pandas as pd

# Extract the month and the year from the 'inter_revision_date' column
globals()[f"{sector}_{frequency}_inter_revisions"]['month'] = globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'].str.split('_').str[0]
globals()[f"{sector}_{frequency}_inter_revisions"]['year'] = globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'].str.split('_').str[1]

# Map the Spanish month abbreviations to their two-digit numbers
month_mapping = {
    'ene': '01', 'feb': '02', 'mar': '03', 'abr': '04',
    'may': '05', 'jun': '06', 'jul': '07', 'ago': '08',
    'sep': '09', 'oct': '10', 'nov': '11', 'dic': '12'
}

globals()[f"{sector}_{frequency}_inter_revisions"]['month'] = globals()[f"{sector}_{frequency}_inter_revisions"]['month'].map(month_mapping)

# Rebuild the date as a 'YYYY-MM' string
globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'] = globals()[f"{sector}_{frequency}_inter_revisions"]['year'] + '-' + globals()[f"{sector}_{frequency}_inter_revisions"]['month']

# Convert 'inter_revision_date' to datetime (the day defaults to the first of the month)
globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'] = pd.to_datetime(globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'], format='%Y-%m')

# Drop the temporary 'month' and 'year' columns
globals()[f"{sector}_{frequency}_inter_revisions"].drop(['month', 'year'], axis=1, inplace=True)

# Show the result
globals()[f"{sector}_{frequency}_inter_revisions"]
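
<div style="font-family: charter; text-align: left; color:dark">
    The clean-up above relies on labels of the form <code>month-abbreviation_year</code> (for example <code>ene_2013</code>), as implied by the split on <code>_</code> and the month mapping. Here is a minimal sketch of the same conversion written as a reusable function. The name <code>parse_period_label</code> and the sample labels are hypothetical, and the use of <code>errors='coerce'</code> for unknown abbreviations is an assumption rather than the notebook's behaviour.
</div>

```python
import pandas as pd

# Spanish month abbreviations mapped to two-digit month numbers
MONTH_MAPPING = {
    'ene': '01', 'feb': '02', 'mar': '03', 'abr': '04',
    'may': '05', 'jun': '06', 'jul': '07', 'ago': '08',
    'sep': '09', 'oct': '10', 'nov': '11', 'dic': '12',
}

def parse_period_label(labels):
    """Convert labels like 'ene_2013' into timestamps (first day of the month)."""
    parts = labels.str.split('_', expand=True)
    months = parts[0].map(MONTH_MAPPING)
    # Unknown abbreviations map to NaN, so the concatenated string is NaN and
    # errors='coerce' turns it into NaT instead of raising
    return pd.to_datetime(parts[1] + '-' + months, format='%Y-%m', errors='coerce')

sample = pd.Series(['ene_2013', 'dic_2012', 'xyz_2012'])
dates = parse_period_label(sample)
print(dates)
```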

<span style = "color: dark; font-family: charter;">
    <b>Change date format</b>
  </span>

In [None]:
print(globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'].dtype)

In [None]:
globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date'] = pd.to_datetime(globals()[f"{sector}_{frequency}_inter_revisions"]['inter_revision_date']).dt.date

<div style="color: rgb(61, 48, 162); font-size: 12px;">
    Back to the
    <a href="#outilne" style="color: #687EFF;">
    outline.
    </a>
    </div>

<div id="4.">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: charter;">4.</span>
    <span style = "color: dark; font-family: charter;">
    Loading to SQL
    </span>
    </h1>

In [None]:
from sqlalchemy import create_engine

# Check that all connection settings read from the environment are defined
if not all([host, user, password]):
    raise ValueError("Some environment variables are missing (CIUP_SQL_HOST, CIUP_SQL_USER, CIUP_SQL_PASS)")

# Create connection string
connection_string = f"postgresql://{user}:{password}@{host}:{port}/{database}"

# Create SQLAlchemy engine
engine = create_engine(connection_string)

# REVISIONS

#globals()[f'{sector}_{frequency}_growth_rates_horizon'].to_sql(f'{sector}_{frequency}_growth_rates_horizon', engine, index=False, if_exists='replace')
#globals()[f"{sector}_{frequency}_inter_revisions"].to_sql(f'{sector}_{frequency}_inter_revisions', engine, index=False, if_exists='replace')
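
<div style="font-family: charter; text-align: left; color:dark">
    The check above suggests that the credentials come from the environment variables <code>CIUP_SQL_HOST</code>, <code>CIUP_SQL_USER</code> and <code>CIUP_SQL_PASS</code>. Here is a minimal sketch of reading them and building the connection string, percent-encoding the user and password so that characters such as <code>@</code> or <code>:</code> do not break the URL. The function name, the default port <code>5432</code> and the database name <code>example_db</code> are assumptions, not values taken from the project.
</div>

```python
import os
from urllib.parse import quote_plus

def build_connection_string(env=os.environ, port=5432, database='example_db'):
    """Build a PostgreSQL URL from CIUP_SQL_* variables (port and database are placeholders)."""
    host = env.get('CIUP_SQL_HOST')
    user = env.get('CIUP_SQL_USER')
    password = env.get('CIUP_SQL_PASS')
    if not all([host, user, password]):
        raise ValueError(
            "Some environment variables are missing "
            "(CIUP_SQL_HOST, CIUP_SQL_USER, CIUP_SQL_PASS)"
        )
    # quote_plus escapes characters such as '@', ':' and '/'
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"

# Usage with a hypothetical mapping instead of the real environment
demo_env = {
    'CIUP_SQL_HOST': 'localhost',
    'CIUP_SQL_USER': 'analyst',
    'CIUP_SQL_PASS': 'p@ss:word',
}
demo_url = build_connection_string(demo_env)
print(demo_url)
```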

<div style="color: rgb(61, 48, 162); font-size: 12px;">
    Back to the
    <a href="#outilne" style="color: #687EFF;">
    outline.
    </a>
    </div>

In [57]:
import pandas as pd
import re

# Original DataFrame
data = {
    'year': [2013]*22,
    'id_ns': [f'{i:02d}' for i in range(1, 23)],
    'date': ['2013-01-04', '2013-01-11', '2013-01-18', '2013-01-25', '2013-02-01', '2013-02-08', '2013-02-15', '2013-02-22',
             '2013-03-01', '2013-03-08', '2013-03-15', '2013-03-22', '2013-04-05', '2013-04-12', '2013-04-19', '2013-04-26',
             '2013-05-03', '2013-05-10', '2013-05-17', '2013-05-24', '2013-05-31', '2013-06-07'],
    '2010_1': ['t+33', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', 't+34', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', '6.2', 't+37', '6.2', 'NaN'],
    '2010_2': ['t+30', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', 't+31', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', '10.0', 't+34', '10.0', 'NaN'],
    '2010_3': ['t+27', '9.6', '9.6', '9.6', '9.6', '9.6', '9.6', 't+28', '9.6', '9.6', '9.6', '9.6', '9.6', '9.6', '9.6', '9.6', '9.5', '9.5', '9.5', 't+31', '9.5', 'NaN'],
    '2010_4': ['t+24', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', 't+25', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', '9.2', 't+28', '9.2', 'NaN'],
    '2011_1': ['t+21', '8.8', '8.8', '8.8', '8.8', '8.8', '8.8', 't+22', '8.8', '8.8', '8.8', '8.8', '8.8', '8.8', '8.8', '8.8', '8.6', '8.6', '8.6', 't+25', '8.6', '8.6'],
    '2011_2': ['t+18', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', 't+19', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', '6.9', 't+22', '6.9', '6.9'],
    '2011_3': ['t+15', '6.7', '6.7', '6.7', '6.7', '6.7', '6.7', 't+16', '6.7', '6.7', '6.7', '6.7', '6.7', '6.7', '6.7', '6.7', '6.6', '6.6', '6.6', 't+19', '6.6', '6.6']
}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])

df

Unnamed: 0,year,id_ns,date,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3
0,2013,1,2013-01-04,t+33,t+30,t+27,t+24,t+21,t+18,t+15
1,2013,2,2013-01-11,6.2,10.0,9.6,9.2,8.8,6.9,6.7
2,2013,3,2013-01-18,6.2,10.0,9.6,9.2,8.8,6.9,6.7
3,2013,4,2013-01-25,6.2,10.0,9.6,9.2,8.8,6.9,6.7
4,2013,5,2013-02-01,6.2,10.0,9.6,9.2,8.8,6.9,6.7
5,2013,6,2013-02-08,6.2,10.0,9.6,9.2,8.8,6.9,6.7
6,2013,7,2013-02-15,6.2,10.0,9.6,9.2,8.8,6.9,6.7
7,2013,8,2013-02-22,t+34,t+31,t+28,t+25,t+22,t+19,t+16
8,2013,9,2013-03-01,6.2,10.0,9.6,9.2,8.8,6.9,6.7
9,2013,10,2013-03-08,6.2,10.0,9.6,9.2,8.8,6.9,6.7


In [60]:
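def transform_dataframe(df):
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])
    columns = df.columns[3:]  # skip 'year', 'id_ns' and 'date'

    for col in columns:
        base_t = None  # current horizon h for this column

        for i in range(len(df)):
            current_value = df.at[i, col]

            # Missing cells and existing 't+h' labels are left untouched
            if pd.isna(current_value) or str(current_value) == '' or re.match(r't\+\d+', str(current_value)):
                if re.match(r't\+\d+', str(current_value)):
                    # An explicit label resets the horizon counter
                    base_t = int(current_value.split('+')[1])
                continue

            if base_t is not None:
                # Months elapsed since the previous row; the year term keeps
                # the count correct across a December-January boundary
                prev_date, curr_date = df.at[i-1, 'date'], df.at[i, 'date']
                month_diff = (curr_date.year - prev_date.year) * 12 + (curr_date.month - prev_date.month)
                base_t += month_diff
            else:
                base_t = 0  # In case base_t was not set, we start with t+0 for the first replacement.

            # Replace decimal growth rates with their horizon label
            if re.match(r'[-+]?\d+\.\d+', str(current_value)):
                df.at[i, col] = f't+{base_t}'

    return df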
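
<div style="font-family: charter; text-align: left; color:dark">
    <code>transform_dataframe</code> advances the horizon by the number of calendar months between consecutive rows. Here is a minimal sketch of that month count as a standalone helper (the name <code>months_between</code> is hypothetical). Including the year term keeps the difference correct when two consecutive releases fall in December and January.
</div>

```python
import pandas as pd

def months_between(previous_date, current_date):
    """Number of calendar months from previous_date to current_date."""
    return (current_date.year - previous_date.year) * 12 + (current_date.month - previous_date.month)

# Two releases within the same month: the horizon does not advance
same_month = months_between(pd.Timestamp('2013-01-04'), pd.Timestamp('2013-01-11'))
# Releases in consecutive months: the horizon advances by one
next_month = months_between(pd.Timestamp('2013-01-25'), pd.Timestamp('2013-02-01'))
# December to January: a plain month subtraction would give -11 here
across_year = months_between(pd.Timestamp('2013-12-27'), pd.Timestamp('2014-01-03'))
print(same_month, next_month, across_year)
```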


In [61]:
# Apply the transformation to the DataFrame
transformed_df = transform_dataframe(df)
transformed_df

Unnamed: 0,year,id_ns,date,2010_1,2010_2,2010_3,2010_4,2011_1,2011_2,2011_3
0,2013,1,2013-01-04,t+33,t+30,t+27,t+24,t+21,t+18,t+15
1,2013,2,2013-01-11,t+33,t+30,t+27,t+24,t+21,t+18,t+15
2,2013,3,2013-01-18,t+33,t+30,t+27,t+24,t+21,t+18,t+15
3,2013,4,2013-01-25,t+33,t+30,t+27,t+24,t+21,t+18,t+15
4,2013,5,2013-02-01,t+34,t+31,t+28,t+25,t+22,t+19,t+16
5,2013,6,2013-02-08,t+34,t+31,t+28,t+25,t+22,t+19,t+16
6,2013,7,2013-02-15,t+34,t+31,t+28,t+25,t+22,t+19,t+16
7,2013,8,2013-02-22,t+34,t+31,t+28,t+25,t+22,t+19,t+16
8,2013,9,2013-03-01,t+35,t+32,t+29,t+26,t+23,t+20,t+17
9,2013,10,2013-03-08,t+35,t+32,t+29,t+26,t+23,t+20,t+17
