<div style="text-align: center; font-family: 'charter bt pro roman'; color: rgb(0, 65, 75);">
    <h1>
    GDP Vintages and Releases datasets
    </h1>
</div>

<div style="text-align: center; font-family: 'charter bt pro roman'; color: rgb(0, 65, 75);">
<h3>
Documentation
<br>
____________________
<br>
</h3>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    This 
    <span style="color: rgb(0, 65, 75);">jupyter notebook</span>
    provides a step-by-step guide to <b>data building</b> for the project <b>'Revisiones y sesgos en las estimaciones preliminares del PBI en el Perú'</b> ('Revisions and biases in preliminary GDP estimates in Peru'). It covers the creation of the GDP mid-term revision dataset for each sector. A key step is the parallel construction of what we call “the ‘t+h’ structure”. This dataset mirrors the GDP growth vintages dataset by sector, but instead of growth rates it holds values of the form “t+h”, where <b>h</b> is the number of months elapsed since the preliminary growth rate was first published. The notebook therefore also covers the creation of vintage datasets of growth rates associated with a horizon (<b>h</b>).
</div>

<div style="text-align: center; font-family: 'PT Serif Pro Book'; color: rgb(0, 65, 75); font-size: 16px;">
    Jason Cruz
    <br>
    <a href="mailto:jj.cruza@up.edu.pe" style="color: rgb(0, 153, 123); font-size: 16px;">
        jj.cruza@up.edu.pe
    </a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;line-height: 1.5;">
<span style="font-size: 34px;">&#128452;</span> The 't+h' structure should be available for all sectors and frequencies.
    <br>
    <span style="font-size: 24px;">&#8987;</span> Available for <b>1994-2024</b> (Table 1) and for <b>1997-2024</b> (Table 2). 
    <br>
</div>

<div style="font-family: Amaya; text-align: left; color: rgb(0, 65, 75); font-size:16px">The following <b>outline is clickable</b>. Use its links to jump to any section of this notebook.</div>

<div id="outilne">
   <!-- Contenido de la celda de destino -->
</div>

<div style="background-color: #292929; padding: 10px; line-height: 1.5; font-family: 'PT Serif Pro Book';">
    <h2 style="text-align: left; color: #E0E0E0;">
        Outline
    </h2>
    <br>
    <a href="#libraries" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        Libraries</a>
    <br>
    <a href="#setup" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        Initial set-up</a>
    <br>
    <a href="#1" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        1. Economic sector and data frequency selector</a>
    <br>
    <a href="#2" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        2. Create horizon datasets</a>
    <br>
    <a href="#2.1." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        2.1. Loading growth rate datasets from <code>PostgreSQL</code>.</a>
    <br>
    <a href="#2.2." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        2.2. Creating horizon dataset step by step.</a> 
    <br>
    <a href="#3" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        3. Create base year datasets</a>
    <br>
    <a href="#3.1." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        3.1. Loading base year dataset from <code>PostgreSQL</code>.</a>
    <br>
    <a href="#3.2." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        3.2. Creating base year dataset.</a> 
    <br>
    <a href="#4" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        4. Remove observations affected by base year</a>
    <br>
    <a href="#5" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        5. Create datasets with dummy-seasonal values of revisions</a>
    <br>
    <a href="#5.1." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        5.1. Loading merged irregular calendar dataset from <code>PostgreSQL</code></a>
    <br>
    <a href="#5.2." style="color: #94FFD8; font-size: 16px; margin-left: 20px;">
        5.2. Creating datasets with dummy-seasonal values of revisions</a>
    <br>
    <a href="#6" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        6. Create growth rates by horizon dataset</a>
    <br>
    <a href="#7" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        7. Create vintages and releases datasets</a>
    <br>
    <a href="#8" style="color: #E0E0E0; font-size: 18px; margin-left: 0px;">
        8. Loading to SQL</a>
    <br>
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    For any questions or issues regarding the code, please email Jason Cruz <a href="mailto:jj.cruza@alum.up.edu.pe" style="color: rgb(0, 153, 123); text-decoration: none;"><span style="font-size: 24px;">&#x2709;</span>
    </a>.
    </div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    If any of the libraries below are missing, use the following code (as an example) to install them.
    </div>

In [None]:
#!pip install sqlalchemy psycopg2-binary pandas numpy # Uncomment to install missing libraries ("os" and "re" ship with Python; psycopg2 is SQLAlchemy's default PostgreSQL driver).

<div id="libraries">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark;">
    <h2>
    Libraries
    </h2>
    </div>

In [20]:
# POSTGRESQL
import os
from sqlalchemy import create_engine

# HORIZON DATASETS
import pandas as pd
import numpy as np
import re


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="setup">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark;">
    <h2>
    Initial set-up
    </h2>
    </div>

<p style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px"> The following function establishes a connection to the <code>gdp_revisions_datasets</code> database in <code>PostgreSQL</code>. All <b>input data</b> used in this jupyter notebook is loaded from this database, and all <b>output data</b> it generates is stored there. Once you have obtained the required permissions, set the environment variables <code>CIUP_SQL_USER</code>, <code>CIUP_SQL_PASS</code> and <code>CIUP_SQL_HOST</code> to access the server.</p>
    
<p style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
To request permissions, please email Jason Cruz <a href="mailto:jj.cruza@alum.up.edu.pe" style="color: rgb(0, 153, 123); text-decoration: none;"> <span style="font-size: 24px;">&#x2709;</span>
    </a>.
</p>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    <span style="font-size: 24px; color: #FFA823; font-weight: bold;">&#9888;</span>
    Enter your user credentials to access SQL.
    </div>

In [21]:
def create_sqlalchemy_engine():
    """
    Function to create an SQLAlchemy engine using environment variables.
    
    Returns:
        engine: SQLAlchemy engine object.
    """
    # Get environment variables
    user = os.environ.get('CIUP_SQL_USER')  # Get the SQL user from environment variables
    password = os.environ.get('CIUP_SQL_PASS')  # Get the SQL password from environment variables
    host = os.environ.get('CIUP_SQL_HOST')  # Get the SQL host from environment variables
    port = 5432  # Set the SQL port to 5432
    database = 'gdp_revisions_datasets'  # Set the database name 'gdp_revisions_datasets' from SQL

    # Check if all environment variables are defined
    if not all([host, user, password]):
        raise ValueError("Some environment variables are missing (CIUP_SQL_HOST, CIUP_SQL_USER, CIUP_SQL_PASS)")

    # Create connection string
    connection_string = f"postgresql://{user}:{password}@{host}:{port}/{database}"

    # Create SQLAlchemy engine
    engine = create_engine(connection_string)
    
    return engine
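
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    A password containing characters such as <code>@</code>, <code>:</code> or <code>/</code> can break a connection string built by plain interpolation. The sketch below is one possible way to guard against this with the standard library; the helper name <code>build_postgres_url</code> and the credentials are hypothetical and are not part of this project.
</div>

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port=5432,
                       database="gdp_revisions_datasets"):
    """Build a PostgreSQL URL, percent-encoding the user and password."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be read as URL delimiters.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials, for illustration only
url = build_postgres_url("analyst", "p@ss:word/1", "localhost")
print(url)
```

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The resulting string could then be passed to <code>create_engine</code> in place of <code>connection_string</code>.
</div>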

<div style="text-align: left;">
    <span style="font-size: 24px; color: rgb(255, 32, 78); font-weight: bold;">&#9888;</span>
    <span style="font-family: PT Serif Pro Book; color: black; font-size: 16px;">
        Import all other functions required by this jupyter notebook.
    </span>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px"> Please check the script <code>gdp_revisions_datasets_functions.py</code>, which contains all the functions required by this jupyter notebook. The functions there are ordered according to the <a href="#outilne" style="color: #3d30a2;">sections</a> of this jupyter notebook.</div>

In [22]:
from gdp_revisions_datasets_functions import *

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="1">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book; color: dark;">1.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Economic sector and data frequency selector</span></h1>

<div id="steps-1">
   <!-- Contenido de la celda de destino -->
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <a href="#step-1-1" style="text-decoration: none; color: #006769"> <span style="font-size: 24px; color: rgb(0, 65, 75);">&#10122;</span> Select economic sector</a>
    <br>
    <a href="#step-1-2" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> Select frequency</a>
</div>

<div id="step-1-1">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10122;</span> <span>Select economic sector</span>
  </div>

In [151]:
# Call the function to show the popup window
sector = show_option_window()
print("Selected economic sector:", sector)

Selected economic sector: fishing


<div id="step-1-2">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> <span>Select frequency</span>
  </div>

In [152]:
# Call the function to show the popup window
frequency = show_frequency_window()
print("Selected frequency:", frequency)

Selected frequency: quarterly


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#steps-1" style="color: #006769; text-decoration: none;">⮝</a>
    </span> 
    <a href="#steps-1" style="color: #006769; text-decoration: none;">Back to steps.</a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="2">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book; color: dark;">2.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Create horizon datasets</span></h1>

<div id="2.1.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">2.1. </span> <span style = "color: dark; font-family: PT Serif Pro Book;">Loading growth rate datasets from <code>PostgreSQL</code></span></h2>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Connect to SQL.
    </div>

In [153]:
# Connect to SQL
engine = create_sqlalchemy_engine()

# SQL Query
query = f"SELECT * FROM {sector}_{frequency}_growth_rates;"  # Adjust this PostgreSQL query as needed
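
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Because the table name is built by interpolating <code>sector</code> and <code>frequency</code>, it may be worth validating both values before running the query. The following is a minimal sketch; the helper <code>growth_rates_table</code> and the naming rule for sectors are assumptions, not part of <code>gdp_revisions_datasets_functions.py</code>.
</div>

```python
import re

# Frequencies used elsewhere in this notebook
VALID_FREQUENCIES = {"monthly", "quarterly", "annual"}


def growth_rates_table(sector, frequency):
    """Return '<sector>_<frequency>_growth_rates' after basic validation."""
    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    # Assumed rule: sector names use lowercase letters, digits and underscores
    if not re.fullmatch(r"[a-z][a-z0-9_]*", sector):
        raise ValueError(f"Unsafe sector name: {sector!r}")
    return f"{sector}_{frequency}_growth_rates"


table = growth_rates_table("fishing", "quarterly")
example_query = f"SELECT * FROM {table};"
print(example_query)
```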

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Uncomment the code below to display all rows and columns of the dataframe. Leave it commented to keep the default (truncated) display.
    </div>

In [154]:
#pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Check the loaded dataframe.
    </div>

In [155]:
# Read growth rates dataset as DataFrame
globals()[f'{sector}_{frequency}_growth_rates'] = pd.read_sql(query, engine)
growth_rates = globals()[f'{sector}_{frequency}_growth_rates']
growth_rates_df = growth_rates.copy()
growth_rates_df.head(10)

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
1,1997,2,1997-01-17,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
2,1997,3,1997-01-24,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
3,1997,4,1997-01-31,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
4,1997,5,1997-02-07,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
5,1997,6,1997-02-14,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
6,1997,7,1997-02-21,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
7,1997,8,1997-02-28,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
8,1997,9,1997-03-07,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
9,1997,10,1997-03-14,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,


In [156]:
# Find duplicates of id_ns within the same year
duplicated_rows = growth_rates_df[growth_rates_df.duplicated(subset=['year', 'id_ns'], keep=False)]
duplicated_rows

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1


In [157]:
growth_rates_df.iloc[:10,:5]

Unnamed: 0,year,id_ns,date,1994_1,1994_2
0,1997,1,1997-01-10,14.6,37.4
1,1997,2,1997-01-17,14.6,37.4
2,1997,3,1997-01-24,14.6,37.4
3,1997,4,1997-01-31,14.6,37.4
4,1997,5,1997-02-07,14.6,37.4
5,1997,6,1997-02-14,14.6,37.4
6,1997,7,1997-02-21,14.6,37.4
7,1997,8,1997-02-28,14.6,37.4
8,1997,9,1997-03-07,14.6,37.4
9,1997,10,1997-03-14,14.6,37.4


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="2.2.">
   <!-- Contenido de la celda de destino -->
</div>

<div id="2.3.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">2.2.</span>
    <span style = "color: dark; font-family: PT Serif Pro Book;">
    Creating horizon dataset step by step
    </span>
    </h2>

<div id="steps-2">
   <!-- Contenido de la celda de destino -->
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <a href="#step-2-1" style="text-decoration: none; color: #006769"> <span style="font-size: 24px; color: rgb(0, 65, 75);">&#10122;</span> Replace decimal values by “t+h” values only in the rows representing a new rung</a>
    <br>
    <a href="#step-2-2" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> Concatenate the first 3 columns: year, id_ns, date </a>
    <br>
    <a href="#step-2-3" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10124;</span> Convert columns to string type</a>
    <br>
    <a href="#step-2-4" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10125;</span> Spread the "t+h" values over the remaining decimal values </a>
    <br>
    <a href="#step-2-5" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10126;</span> Export to Excel file </a>
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
   <span>Set key variables to fill the growth_rates with 't+h' values</span>
  </div>

In [160]:
# Set h_initial (in months) according to the selected frequency
if frequency == "monthly":
    h_initial = 1
elif frequency == "quarterly":
    h_initial = 3
elif frequency == "annual":
    h_initial = 12
else:
    h_initial = None

print(h_initial)

# suggested: monthly (1), quarterly (3), annual (12)

3


In [161]:
# Row from which the 't+h' values start being filled
start_row = 0 # Change according to your preferences

In [162]:
# Define the mapping of frequencies to h_counter values
frequency_mapping = {
    'monthly': 1,
    'quarterly': 3,
    'annual': 12
}

# Get the appropriate h_counter value based on the selected frequency
h_counter = frequency_mapping.get(frequency)
print("Selected h_counter:", h_counter)

Selected h_counter: 3
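
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The two cells above assign the same number of months to <code>h_initial</code> and <code>h_counter</code>. As a possible simplification, both could come from a single mapping that fails loudly on an unknown frequency instead of returning <code>None</code>. The helper <code>horizon_parameters</code> below is a hypothetical sketch, not a function of this project.
</div>

```python
FREQUENCY_TO_MONTHS = {"monthly": 1, "quarterly": 3, "annual": 12}


def horizon_parameters(frequency):
    """Return (h_initial, h_counter) in months for the given frequency."""
    try:
        months = FREQUENCY_TO_MONTHS[frequency]
    except KeyError:
        raise ValueError(f"Unknown frequency: {frequency!r}") from None
    # Both parameters take the same value in the cells above
    return months, months


example_h_initial, example_h_counter = horizon_parameters("quarterly")
print(example_h_initial, example_h_counter)
```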


<div id="step-2-1">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10122;</span> <span>Replace decimal values by “t+h” values only in the rows representing a new rung</span>
  </div>

In [118]:
horizon = replace_horizon(growth_rates_df.iloc[:, 3:], start_row, h_initial, h_counter)
horizon_df = horizon.copy()
horizon_df.head(10)

Unnamed: 0,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,1995_4,1996_1,1996_2,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,t+33,t+30,t+27,t+24,t+21,t+18,t+15,t+12,t+9,t+6,...,,,,,,,,,,
1,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.8,7.5,...,,,,,,,,,,
2,11.6,27.5,4.8,7.2,10.6,7.7,8.7,2.7,0.8,8.0,...,,,,,,,,,,
3,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.8,7.5,...,,,,,,,,,,
4,11.6,27.5,4.8,7.2,10.6,7.7,8.7,2.7,0.8,8.0,...,,,,,,,,,,
5,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.8,7.5,...,,,,,,,,,,
6,11.6,27.5,4.8,7.2,10.6,7.7,8.7,2.7,0.8,8.0,...,,,,,,,,,,
7,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.8,7.5,...,,,,,,,,,,
8,11.6,27.5,4.8,7.2,10.6,7.7,8.7,2.7,0.8,8.0,...,,,,,,,,,,
9,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.8,7.5,...,,,,,,,,,,
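
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The labels in the first row of the output follow a simple pattern: the most recent published period appears to receive <code>h_initial</code>, and each earlier period adds <code>h_counter</code>. The sketch below reproduces that pattern for a single row. It is inferred from the displayed output only; the function <code>label_row</code> is hypothetical and may differ from what <code>replace_horizon</code> actually does.
</div>

```python
def label_row(values, h_initial, h_counter):
    """Replace each non-missing value with a 't+h' label.

    The most recent non-missing column gets h_initial; every earlier
    column adds h_counter. Missing values (None) are left untouched.
    """
    filled = [i for i, v in enumerate(values) if v is not None]
    if not filled:
        return list(values)
    newest = filled[-1]
    labels = list(values)
    for i in filled:
        labels[i] = f"t+{h_initial + (newest - i) * h_counter}"
    return labels


# Toy quarterly vintage: three published quarters, one still unpublished
toy_row = [14.6, 37.4, 16.0, None]
toy_labels = label_row(toy_row, h_initial=3, h_counter=3)
print(toy_labels)
```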


<div id="step-2-2">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> <span>Concatenate the first 3 columns: year, id_ns, date</span>
  </div>

In [119]:
# Get the first three columns of the original DataFrame
first_3_columns = growth_rates_df.iloc[:, :3]

# Concatenate the first three columns with the horizon DataFrame
horizon_df = pd.concat([first_3_columns, horizon_df], axis=1)
horizon_df.head(20)

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
1,1997,1,1997-01-10,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
2,1997,2,1997-01-17,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
3,1997,2,1997-01-17,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
4,1997,3,1997-01-24,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
5,1997,3,1997-01-24,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
6,1997,4,1997-01-31,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
7,1997,4,1997-01-31,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
8,1997,5,1997-02-07,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
9,1997,5,1997-02-07,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,


<div id="step-2-3">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10124;</span> <span>Convert columns to string type</span>
  </div>

In [120]:
horizon_df = columns_str(horizon_df)
horizon_df.head(20)

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
1,1997,1,1997-01-10,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
2,1997,2,1997-01-17,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
3,1997,2,1997-01-17,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
4,1997,3,1997-01-24,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
5,1997,3,1997-01-24,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
6,1997,4,1997-01-31,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
7,1997,4,1997-01-31,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
8,1997,5,1997-02-07,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
9,1997,5,1997-02-07,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,


<div id="step-2-4">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10125;</span> Spread the "t+h" values over the remaining decimal values
  </div>

In [121]:
horizon_df = replace_horizon_1(horizon_df)
horizon_df.head(20)

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
1,1997,1,1997-01-10,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
2,1997,2,1997-01-17,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
3,1997,2,1997-01-17,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
4,1997,3,1997-01-24,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
5,1997,3,1997-01-24,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
6,1997,4,1997-01-31,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
7,1997,4,1997-01-31,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
8,1997,5,1997-02-07,t+34,t+31,t+28,t+25,t+22,t+19,t+16,...,,,,,,,,,,
9,1997,5,1997-02-07,t+34,t+31,t+28,t+25,t+22,t+19,t+16,...,,,,,,,,,,
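
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    In the output above, the labels stay at <code>t+33</code> for every January 1997 row and move to <code>t+34</code> in February. One reading of this is that each label advances by the number of calendar months elapsed since the row where it was set. The sketch below illustrates that reading; it is an assumption drawn from the displayed output, and <code>months_between</code> and <code>shift_label</code> are hypothetical helpers rather than the logic of <code>replace_horizon_1</code>.
</div>

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end, ignoring the day."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def shift_label(label, rung_date, row_date):
    """Advance a 't+h' label by the months elapsed since the rung row."""
    h = int(label.split("+")[1])
    return f"t+{h + months_between(rung_date, row_date)}"


rung = date(1997, 1, 10)
same_month = shift_label("t+33", rung, date(1997, 1, 31))
next_month = shift_label("t+33", rung, date(1997, 2, 7))
print(same_month, next_month)
```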


<div id="step-2-5">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10126;</span> Export to Excel file
  </div>

In [122]:
# Export to excel file
#with pd.ExcelWriter('gdp_monthly_growth_rates_h.xlsx') as writer:
#    horizon_df.to_excel(writer, sheet_name='gdp_monthly_growth_rates_h', index=False) # optional, to inspect the generated data

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Create copies of <code>horizon_df</code> to be used as arguments in other functions later on.
    </div>

In [123]:
horizon_df_copy_1 = horizon_df.copy()
horizon_df_copy_2 = horizon_df.copy()

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#steps-2" style="color: #006769; text-decoration: none;">⮝</a>
    </span> 
    <a href="#steps-2" style="color: #006769; text-decoration: none;">Back to steps.</a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="3">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">3.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Create base year datasets</span></h1>

<div id="3.1.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">3.1. </span> <span style = "color: dark; font-family: PT Serif Pro Book;">Loading base year dataset from <code>PostgreSQL</code></span></h2>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Connect to SQL.
    </div>

In [124]:
# Connect to SQL
engine = create_sqlalchemy_engine()

# SQL Query
query = "SELECT * FROM ns_base_year;"  # Adjust this PostgreSQL query as needed

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Check the loaded dataframe.
    </div>

In [125]:
# Read base year dataset as DataFrame
ns_base_year = pd.read_sql(query, engine)
ns_base_year_df = ns_base_year.copy()
ns_base_year_df.head(10)

Unnamed: 0,year,id_ns,date,base_year
0,1994,1,1994-01-10,1990
1,1994,2,1994-01-17,1990
2,1994,3,1994-01-24,1990
3,1994,4,1994-02-01,1990
4,1994,5,1994-02-07,1990
5,1994,6,1994-02-14,1990
6,1994,7,1994-02-21,1990
7,1994,8,1994-02-28,1990
8,1994,9,1994-03-04,1990
9,1994,10,1994-03-14,1990


<div id="3.2.">
   <!-- Contenido de la celda de destino -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">3.2.</span>
    <span style = "color: dark; font-family: PT Serif Pro Book;">
    Creating base year dataset
    </span>
    </h2>

In [126]:
base_year_df = replace_floats_with_base_year(ns_base_year_df, growth_rates_df)

In [127]:
base_year_df.head(10)

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
1,1997,1,1997-01-10,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
2,1997,2,1997-01-17,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
3,1997,2,1997-01-17,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
4,1997,3,1997-01-24,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
5,1997,3,1997-01-24,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
6,1997,4,1997-01-31,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
7,1997,4,1997-01-31,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
8,1997,5,1997-02-07,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
9,1997,5,1997-02-07,1990,1990,1990,1990,1990,1990,1990,...,,,,,,,,,,
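
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Judging by the output above, every published growth rate is replaced by the base year in force for that row's Weekly Note, while missing cells stay missing. The sketch below shows this idea on plain Python structures. It assumes the base year is looked up by <code>(year, id_ns)</code>; the function <code>replace_with_base_year</code> and the toy data are hypothetical and do not reproduce <code>replace_floats_with_base_year</code>.
</div>

```python
def replace_with_base_year(rows, base_years,
                           id_columns=("year", "id_ns", "date")):
    """Replace every non-missing growth rate with the row's base year.

    rows: list of dicts, one per Weekly Note.
    base_years: dict mapping (year, id_ns) to a base year.
    """
    result = []
    for row in rows:
        base_year = base_years[(row["year"], row["id_ns"])]
        new_row = {}
        for column, value in row.items():
            if column in id_columns or value is None:
                new_row[column] = value      # keep identifiers and gaps
            else:
                new_row[column] = base_year  # growth rate -> base year
        result.append(new_row)
    return result


# Toy data, for illustration only
toy_rows = [{"year": 1997, "id_ns": 1, "date": "1997-01-10",
             "1994_1": 14.6, "1994_2": 37.4, "2024_1": None}]
toy_base_years = {(1997, 1): 1990}
toy_result = replace_with_base_year(toy_rows, toy_base_years)
print(toy_result)
```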


<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Treat the code below as provisional. Ideally, no data is exported to the current directory; all data should be uploaded to SQL.
    </div>

In [128]:
# Export to excel file
#with pd.ExcelWriter('base_year_df.xlsx') as writer:
#    base_year_df.to_excel(writer, sheet_name='base_year_df', index=False) # optional, to inspect the generated data

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="4">
   <!-- Contenido de la celda de destino -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">4.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Remove observations affected by base year</span></h1>

<div id="steps-4">
   <!-- Contenido de la celda de destino -->
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <a href="#st-4-1" style="text-decoration: none; color: #006769"> <span style="font-size: 24px; color: rgb(0, 65, 75);">&#10122;</span> Generating a dictionary to match observations affected by base year</a>
    <br>
    <a href="#st-4-2" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> Remove observations affected by base year</a>
    <br>
</div>

<div id="st-4-1">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10122;</span> <span>Generating a dictionary to match observations affected by base year</span>
  </div>

In [129]:
base_year_dictionary = create_dic_base_year(base_year_df)

In [130]:
base_year_dictionary

{'1998_1': {350,
  351,
  352,
  353,
  354,
  355,
  356,
  357,
  358,
  359,
  360,
  361,
  362,
  363,
  364,
  365,
  366,
  367,
  368,
  369,
  370,
  371,
  372,
  373,
  374,
  375,
  376,
  377,
  378,
  379,
  380,
  381,
  382,
  383,
  384,
  385,
  386,
  387,
  388,
  389,
  390,
  391,
  392,
  393,
  394,
  395,
  396,
  397,
  398,
  399,
  400,
  401,
  402,
  403,
  404,
  405,
  406,
  407,
  408,
  409,
  410,
  411,
  412,
  413,
  414,
  415,
  416,
  417,
  418,
  419,
  420,
  421,
  422,
  423,
  424,
  425,
  426,
  427,
  428,
  429,
  430,
  431},
 '1998_2': {350,
  351,
  352,
  353,
  354,
  355,
  356,
  357,
  358,
  359,
  360,
  361,
  362,
  363,
  364,
  365,
  366,
  367,
  368,
  369,
  370,
  371,
  372,
  373,
  374,
  375,
  376,
  377,
  378,
  379,
  380,
  381,
  382,
  383,
  384,
  385,
  386,
  387,
  388,
  389,
  390,
  391,
  392,
  393,
  394,
  395,
  396,
  397,
  398,
  399,
  400,
  401,
  402,
  403,
  404,
  405,
  406,
  407,

<div id="st-4-2">
   <!-- Contenido de la celda de destino -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> <span>Remove observations affected by base year</span>
  </div>

In [131]:
input_growth_rates_df = remove_base_year_affected_obs(base_year_dictionary, growth_rates_df)

In [132]:
input_growth_rates_df.head(30)

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
1,1997,1,1997-01-10,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
2,1997,2,1997-01-17,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
3,1997,2,1997-01-17,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
4,1997,3,1997-01-24,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
5,1997,3,1997-01-24,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
6,1997,4,1997-01-31,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
7,1997,4,1997-01-31,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
8,1997,5,1997-02-07,11.6,27.5,4.8,7.2,10.6,7.7,8.7,...,,,,,,,,,,
9,1997,5,1997-02-07,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,...,,,,,,,,,,
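
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The dictionary built in the previous step maps each column (for example <code>'1998_1'</code>) to a set of integers. Assuming those integers are row positions and that "removing" an observation means setting the cell to missing, the step can be sketched as follows. The function <code>blank_affected</code> and the toy data are hypothetical; the actual behaviour of <code>remove_base_year_affected_obs</code> is not shown in this notebook.
</div>

```python
def blank_affected(rows, affected):
    """Set to None every cell listed in `affected`.

    rows: list of dicts (one per row, in index order).
    affected: dict mapping a column name to a set of row positions.
    """
    cleaned = [dict(row) for row in rows]  # copy, leave the input intact
    for column, indices in affected.items():
        for i in indices:
            if 0 <= i < len(cleaned) and column in cleaned[i]:
                cleaned[i][column] = None
    return cleaned


# Toy data, for illustration only
toy_growth = [{"1998_1": 1.2, "1998_2": 0.4},
              {"1998_1": 1.5, "1998_2": 0.6},
              {"1998_1": 1.1, "1998_2": 0.9}]
toy_affected = {"1998_1": {1, 2}, "1998_2": {2}}
toy_cleaned = blank_affected(toy_growth, toy_affected)
print(toy_cleaned)
```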


<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Treat the code below as provisional. Ideally, no data is exported to the current directory; all data should be uploaded to SQL.
    </div>

In [133]:
# Export to excel file
#with pd.ExcelWriter('input_growth_rates_df.xlsx') as writer:
#    input_growth_rates_df.to_excel(writer, sheet_name='input_growth_rates_df', index=False) # optional, to inspect the generated data

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#steps-4" style="color: #006769; text-decoration: none;">⮝</a>
    </span> 
    <a href="#steps-4" style="color: #006769; text-decoration: none;">Back to steps.</a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="5">
   <!-- Content of the target cell -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">5.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Create datasets with dummy-seasonal values of revisions</span></h1>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    <span style="font-size: 24px; color: rgb(255, 32, 78); font-weight: bold;">&#9888;</span>
    This section must be applied only to the <b>monthly</b> frequency.
    </div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    <span style="font-size: 24px; color: #FFA823; font-weight: bold;">&#9888;</span>
    In this section we replace the GDP growth rates with 1s and 0s. Since each row corresponds to one Weekly Note (<i>Nota Semanal</i>, NS), a row full of ones indicates that both Table 1 (monthly growth rates) and Table 2 (quarterly and annual growth rates) were revised in the same NS.
    </div>
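
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The helper <code>replace_strings_with_dummies</code> used in this section is defined elsewhere in the project, so the block below is only a minimal sketch of the idea. It assumes a hypothetical calendar table with one row per NS and a hypothetical 0/1 column named <code>revised</code>; the function name and the toy data are illustrative as well. Every non-missing 't+h' cell takes the flag of its NS, and missing cells stay missing.
</div>

```python
import numpy as np
import pandas as pd


def strings_to_revision_dummies(horizon_df, calendar_df,
                                keys=("year", "id_ns"),
                                id_cols=("year", "id_ns", "date")):
    """Replace 't+h' strings with the 0/1 revision flag of their NS.

    Assumption: calendar_df has unique `keys` and a 0/1 column `revised`.
    """
    keys = list(keys)
    # A left merge keeps the row order and row count of horizon_df
    merged = horizon_df[keys].merge(
        calendar_df[keys + ["revised"]], on=keys, how="left"
    )
    # An NS that is absent from the calendar is treated as not revised
    flags = merged["revised"].fillna(0).to_numpy(dtype=float)

    out = horizon_df.copy()
    for col in out.columns:
        if col in id_cols:
            continue
        has_label = horizon_df[col].notna().to_numpy()
        # Cells without a 't+h' label remain NaN
        out[col] = np.where(has_label, flags, np.nan)
    return out


# Toy data (illustrative only)
horizon_demo = pd.DataFrame({
    "year": [1997, 1997],
    "id_ns": [1, 2],
    "date": ["1997-01-10", "1997-01-17"],
    "1994_1": ["t+33", "t+33"],
    "1995_1": ["t+21", None],
})
calendar_demo = pd.DataFrame({"year": [1997, 1997], "id_ns": [1, 2], "revised": [1, 0]})
dummies_demo = strings_to_revision_dummies(horizon_demo, calendar_demo)
```

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Under these assumptions the output is already of float type, so a separate conversion step would not be needed for the sketch.
</div>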

<div id="5.1.">
   <!-- Content of the target cell -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">5.1. </span> <span style = "color: dark; font-family: PT Serif Pro Book;">Loading the merged irregular calendar dataset from <code>PostgreSQL</code></span></h2>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Connect to SQL and define the query.
    </div>

In [134]:
# Connect to SQL
#engine = create_sqlalchemy_engine()

# SQL query (plain string; no f-string is needed because nothing is interpolated)
#irregular_calendar_query = "SELECT * FROM revisions_irregular_calendar_merged;"

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Check the loaded dataframe.
    </div>

In [135]:
# Read the irregular calendar dataset as a DataFrame
#irregular_calendar = pd.read_sql(irregular_calendar_query, engine)
#irregular_calendar.head(10)

<div id="5.2.">
   <!-- Content of the target cell -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">5.2.</span>
    <span style = "color: dark; font-family: PT Serif Pro Book;">
    Creating datasets with dummy-seasonal values of revisions
    </span>
    </h2>

In [136]:
#dummies_df = replace_strings_with_dummies(irregular_calendar, horizon_df_copy_1)

In [137]:
#dummies_df.head(10)

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Convert columns to float type.
    </div>

In [138]:
#input_dummies_df = convert_columns_to_float(dummies_df)
#input_dummies_df.head(10)

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The code below is provisional. No data should be exported to the current directory; all data should be uploaded to SQL instead.
    </div>

In [139]:
# Export to an Excel file (optional, only to inspect the generated data)
#with pd.ExcelWriter('dummies_df.xlsx') as writer:
#    dummies_df.to_excel(writer, sheet_name='dummies_df', index=False)

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#steps-5" style="color: #006769; text-decoration: none;">⮝</a>
    </span> 
    <a href="#steps-5" style="color: #006769; text-decoration: none;">Back to steps.</a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="6">
   <!-- Content of the target cell -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">6.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Create growth rates by horizon dataset</span></h1>

<div id="steps-6">
   <!-- Content of the target cell -->
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <a href="#st-6-1" style="text-decoration: none; color: #006769"> <span style="font-size: 24px; color: rgb(0, 65, 75);">&#10122;</span> Generating a dictionary with the row indices and their t+h values</a>
    <br>
    <a href="#st-6-2" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> Collapse growth rates by horizon ('t+h') </a>
    <br>
</div>

<div id="st-6-1">
   <!-- Content of the target cell -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10122;</span> <span>Generating a dictionary with the row indices and their t+h values</span>
  </div>

In [140]:
horizon_df_copy_2.head(5) # Check horizon data

Unnamed: 0,year,id_ns,date,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,1997,1,1997-01-10,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
1,1997,1,1997-01-10,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
2,1997,2,1997-01-17,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
3,1997,2,1997-01-17,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,
4,1997,3,1997-01-24,t+33,t+30,t+27,t+24,t+21,t+18,t+15,...,,,,,,,,,,


In [141]:
horizon_dictionary = get_last_index_h(horizon_df_copy_2)
horizon_dictionary

{'1994_1': {'t+33': 7, 't+34': 15, 't+35': 21, 't+36': 29, 't+37': 45},
 '1994_2': {'t+30': 7, 't+31': 15, 't+32': 21, 't+33': 29, 't+34': 45},
 '1994_3': {'t+27': 7, 't+28': 15, 't+29': 21, 't+30': 29, 't+31': 45},
 '1994_4': {'t+24': 7, 't+25': 15, 't+26': 21, 't+27': 29, 't+28': 45},
 '1995_1': {'t+21': 7,
  't+22': 15,
  't+23': 21,
  't+24': 29,
  't+25': 47,
  't+26': 55,
  't+27': 63,
  't+28': 71,
  't+29': 81,
  't+30': 89,
  't+31': 97,
  't+32': 105,
  't+33': 113,
  't+34': 121,
  't+35': 129,
  't+36': 137,
  't+37': 139},
 '1995_2': {'t+18': 7,
  't+19': 15,
  't+20': 21,
  't+21': 29,
  't+22': 47,
  't+23': 55,
  't+24': 63,
  't+25': 71,
  't+26': 81,
  't+27': 89,
  't+28': 97,
  't+29': 105,
  't+30': 113,
  't+31': 121,
  't+32': 129,
  't+33': 137,
  't+34': 139},
 '1995_3': {'t+15': 7,
  't+16': 15,
  't+17': 21,
  't+18': 29,
  't+19': 47,
  't+20': 55,
  't+21': 63,
  't+22': 71,
  't+23': 81,
  't+24': 89,
  't+25': 97,
  't+26': 105,
  't+27': 113,
  't+28': 1
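
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The dictionary above appears to map, for every target-period column, each 't+h' label to the last row index where that label occurs. <code>get_last_index_h</code> is defined elsewhere, so the block below is only a minimal sketch of that reading; the function name <code>last_index_per_horizon</code>, the <code>id_cols</code> parameter and the toy data are assumptions.
</div>

```python
import pandas as pd


def last_index_per_horizon(horizon_df, id_cols=("year", "id_ns", "date")):
    """Map each 't+h' label to the last row index where it appears, per column."""
    result = {}
    for col in horizon_df.columns:
        if col in id_cols:
            continue
        labels = horizon_df[col].dropna()
        # Later rows overwrite earlier ones, so each label keeps its last index
        result[col] = {label: idx for idx, label in labels.items()}
    return result


# Toy data (illustrative only)
horizon_toy = pd.DataFrame({
    "year": [1997] * 5,
    "1994_1": ["t+33", "t+33", "t+34", "t+34", None],
    "1994_2": ["t+30", "t+30", "t+31", "t+31", "t+32"],
})
horizon_toy_dictionary = last_index_per_horizon(horizon_toy)
```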

<div id="st-6-2">
   <!-- Content of the target cell -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> <span>Collapse growth rates by horizon <code>'t+h'</code></span>
  </div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Note that <code>'t+h'</code> denotes the growth rate as published <code>h</code> months after the target date <code>t</code>.
    </div>
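
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    As a rough illustration of the collapse step, the sketch below picks, for each target-period column, the growth rate stored at the row index recorded for each horizon, and stacks the results into one row per 't+h'. It is not the project's <code>filter_df_by_indices</code> (defined elsewhere); the function name <code>collapse_by_horizon</code> and the toy data are assumptions.
</div>

```python
import pandas as pd


def collapse_by_horizon(growth_rates_df, horizon_dictionary):
    """Build one row per horizon 't+h' and one column per target period."""
    columns = {
        col: pd.Series(
            {label: growth_rates_df.loc[idx, col] for label, idx in mapping.items()},
            dtype=float,
        )
        for col, mapping in horizon_dictionary.items()
    }
    # Series are aligned on the union of their 't+h' labels
    out = pd.DataFrame(columns)
    # Sort numerically by h so that 't+10' comes after 't+2'
    ordered = sorted(out.index, key=lambda label: int(label.split("+")[1]))
    out = out.loc[ordered]
    out.index.name = "horizon"
    return out.reset_index()


# Toy data (illustrative only)
growth_toy = pd.DataFrame({
    "1994_1": [1.0, 2.0, 3.0, 4.0],
    "1994_2": [5.0, 6.0, 7.0, 8.0],
})
dictionary_toy = {
    "1994_1": {"t+2": 1, "t+10": 3},
    "1994_2": {"t+1": 0, "t+2": 2},
}
collapsed_toy = collapse_by_horizon(growth_toy, dictionary_toy)
```

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Horizons that a given target period never reaches are left as missing values, which matches the empty cells in the collapsed table.
</div>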

In [142]:
filtered_h_df = filter_df_by_indices(growth_rates_df, horizon_dictionary)
filtered_h_df

Unnamed: 0,horizon,1994_1,1994_2,1994_3,1994_4,1995_1,1995_2,1995_3,1995_4,1996_1,...,2021_4,2022_1,2022_2,2022_3,2022_4,2023_1,2023_2,2023_3,2023_4,2024_1
0,t+1,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.8,...,-7.4,-26.2,-12.8,5.9,-18.4,22.4,-61.0,-8.3,-3.6,-29.5
1,t+2,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.3,...,-7.4,-26.2,-12.8,5.9,-18.4,22.4,-61.0,-8.3,-3.6,-29.5
2,t+3,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.3,...,-7.4,-26.2,-12.8,5.9,-18.4,22.4,-61.0,-8.3,-3.6,
3,t+4,14.6,37.4,16.0,20.8,0.5,-13.6,-24.1,-23.9,-19.3,...,-7.4,-33.2,-12.8,14.2,-18.4,15.9,-61.0,-8.3,-3.6,
4,t+5,19.5,44.5,25.0,24.4,-5.4,-20.4,-30.4,-23.4,-19.7,...,-7.4,-33.2,-12.8,14.2,-18.4,15.9,-61.0,-8.3,-3.6,
5,t+6,,,,,-5.4,-20.4,-30.4,-23.4,-19.7,...,-7.4,-33.2,-12.8,14.2,-18.4,15.9,-61.0,-8.3,,
6,t+7,,,,,-5.4,-20.4,-30.4,-23.4,-19.7,...,-3.0,-33.2,-10.1,14.2,-17.7,15.9,-61.0,-8.3,,
7,t+8,,,,,-5.4,-20.4,-30.4,-23.4,-19.7,...,-3.0,-33.2,-10.1,14.2,-17.7,15.9,-61.0,-8.3,,
8,t+9,,,,,-5.4,-20.4,-30.4,-23.4,-19.7,...,-3.0,-33.2,-10.1,14.2,-17.7,15.9,-61.0,,,
9,t+10,,,,,-5.4,-20.4,-30.4,-23.4,-19.7,...,-3.1,-26.8,-10.1,17.4,-17.7,15.9,-61.0,,,


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#steps-6" style="color: #006769; text-decoration: none;">⮝</a>
    </span> 
    <a href="#steps-6" style="color: #006769; text-decoration: none;">Back to steps.</a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="7">
   <!-- Content of the target cell -->
</div>

<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">7.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Create vintages and releases datasets</span></h1>

<div id="steps-7">
   <!-- Content of the target cell -->
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <a href="#st-7-1" style="text-decoration: none; color: #006769"> <span style="font-size: 24px; color: rgb(0, 65, 75);">&#10122;</span> Create vintages</a>
    <br>
    <a href="#st-7-2" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> Clean-up vintages dataframe </a>
    <br>
    <a href="#st-7-3" style="text-decoration: none; color: #006769"><span style="font-size: 24px; color: rgb(0, 65, 75)">&#10124;</span> Create releases </a>
    <br>
</div>

<div id="st-7-1">
   <!-- Content of the target cell -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10122;</span> <span>Create vintages</span>
  </div>

<div style="text-align: left;">
    <span style="font-size: 24px; color: rgb(255, 32, 78); font-weight: bold;">&#9888;</span>
    <span style="font-family: PT Serif Pro Book; color: black; font-size: 16px;">
        Replace the input with <code>benchmark_growth_rates_df</code> to analyze benchmark revisions.
    </span>
</div>

In [143]:
vintages = create_vintages(input_growth_rates_df)
#vintages = create_vintages(input_dummies_df) # To generate vintages with dummy values instead of growth rates
#vintages = create_vintages(benchmark_growth_rates_df)
#vintages.head(10)

<div id="st-7-2">
   <!-- Content of the target cell -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10123;</span> <span>Clean-up vintages dataframe</span>
  </div>

In [144]:
# Call the appropriate clean-up function based on the selected frequency
if frequency == 'monthly':
    vintages_df = process_monthly(vintages)
elif frequency == 'quarterly':
    vintages_df = process_quarterly(vintages)
elif frequency == 'annual':
    vintages_df = process_annual(vintages)
else:
    # Fail early instead of leaving vintages_df undefined
    raise ValueError(f"Unknown frequency: {frequency!r}")

TypeError: arg must be a list, tuple, 1-d array, or Series
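
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The message above matches the one raised by <code>pd.to_numeric</code> when it receives a two-dimensional object. The internals of the <code>process_*</code> functions are not shown here, so the following is only one possible cause, sketched on toy data: selecting a column label that is duplicated returns a DataFrame rather than a Series.
</div>

```python
import pandas as pd

# Toy frame with a duplicated column label (illustrative only)
toy = pd.DataFrame([[1.0, 2.0]], columns=["x", "x"])

# Selecting a duplicated label returns a DataFrame, not a Series
selected = toy["x"]
is_two_dimensional = isinstance(selected, pd.DataFrame)

try:
    pd.to_numeric(selected)
    message = ""
except TypeError as exc:
    message = str(exc)

# Listing duplicated labels before conversion helps to rule this cause out
duplicated_labels = toy.columns[toy.columns.duplicated()].tolist()
```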

In [None]:
vintages_df.head(30)

<div id="st-7-3">
   <!-- Content of the target cell -->
</div>

<div style="text-align: left; font-family: 'PT Serif Pro Book'; font-size:22px">
    <span style="font-size: 24px; color: rgb(0, 65, 75)">&#10124;</span> <span>Create releases</span>
  </div>

In [145]:
releases_df = create_releases(vintages_df, sector)
releases_df.head(30)

aux,vintages_date,fishing_release_1,fishing_release_2,fishing_release_3,fishing_release_4,fishing_release_5,fishing_release_6,fishing_release_7,fishing_release_8,fishing_release_9,...,fishing_release_32,fishing_release_33,fishing_release_34,fishing_release_35,fishing_release_36,fishing_release_37,fishing_release_38,fishing_release_39,fishing_release_40,fishing_most_recent
0,1994-03-01,11.6,11.6,11.6,11.6,11.6,,,,,...,,,,,,,,,,11.6
1,1994-06-01,27.5,27.5,27.5,27.5,27.5,,,,,...,,,,,,,,,,27.5
2,1994-09-01,4.8,4.8,4.8,4.8,4.8,,,,,...,,,,,,,,,,4.8
3,1994-12-01,7.2,7.2,7.2,7.2,7.2,,,,,...,,,,,,,,,,7.2
4,1995-03-01,10.6,10.6,10.6,10.6,10.6,10.7,10.7,10.7,10.7,...,,,,,,,,,,10.7
5,1995-06-01,7.7,7.7,7.7,7.7,7.7,7.8,7.8,7.8,7.8,...,,,,,,,,,,7.8
6,1995-09-01,8.7,8.8,8.8,8.8,8.8,8.8,8.8,8.8,8.8,...,,,,,,,,,,8.8
7,1995-12-01,2.7,2.7,2.7,2.7,2.7,2.8,2.8,2.8,2.8,...,,,,,,,,,,2.8
8,1996-03-01,0.8,-0.1,-0.1,-0.1,-0.1,1.4,1.4,1.4,1.4,...,,,,,,,,,,1.2
9,1996-06-01,8.0,8.2,8.2,8.2,8.2,8.1,8.1,8.1,8.1,...,,,,,,,,,,7.8


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#steps-7" style="color: #006769; text-decoration: none;">⮝</a>
    </span> 
    <a href="#steps-7" style="color: #006769; text-decoration: none;">Back to steps.</a>
</div>

#### Check column type

In [104]:
print(releases_df[f'{sector}_release_1'].dtype)

float64


In [105]:
print(releases_df['vintages_date'].dtype)

datetime64[ns]
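
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    The two checks above inspect one column each. A possible way to cover every release column at once is sketched below; <code>check_release_dtypes</code> is a hypothetical helper (not part of the project), and it assumes that all release columns start with the sector name, as in the output shown earlier.
</div>

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype


def check_release_dtypes(releases_df, sector, date_col="vintages_date"):
    """Return the names of columns whose dtype is not the expected one."""
    problems = []
    if not is_datetime64_any_dtype(releases_df[date_col]):
        problems.append(date_col)
    # Assumption: release columns are prefixed with the sector name
    release_cols = [c for c in releases_df.columns if c.startswith(f"{sector}_")]
    problems += [c for c in release_cols if not is_float_dtype(releases_df[c])]
    return problems


# Toy data (illustrative only): the last column is deliberately stored as text
releases_toy = pd.DataFrame({
    "vintages_date": pd.to_datetime(["1994-03-01", "1994-06-01"]),
    "fishing_release_1": [11.6, 27.5],
    "fishing_most_recent": ["11.6", "27.5"],
})
dtype_problems = check_release_dtypes(releases_toy, "fishing")
```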


<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>

<div id="8">
   <!-- Content of the target cell -->
</div>

<h2><span style = "color: rgb(0, 65, 75); font-family: charter;">8.</span>
    <span style = "color: dark; font-family: charter;">
    Loading to SQL
    </span>
    </h2>

In [106]:
horizon_df.to_sql(f'{sector}_{frequency}_growth_rates_horizon', engine, index=False, if_exists='replace')
#base_year_df.to_sql(f'{sector}_{frequency}_growth_rates_base_year', engine, index=False, if_exists='replace')
#filtered_h_df.to_sql(f'{sector}_{frequency}_h_benchmark', engine, index=False, if_exists='replace')
vintages_df.to_sql(f'{sector}_{frequency}_vintages', engine, index=False, if_exists='replace')
releases_df.to_sql(f'{sector}_{frequency}_releases', engine, index=False, if_exists='replace')

121
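
<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    To try the upload pattern without a PostgreSQL server, the sketch below writes a small table to an in-memory SQLite database and reads the row count back. The SQLite connection stands in for the notebook's <code>engine</code>; the helper <code>upload_tables</code>, the chosen sector and frequency, and the two-row table are illustrative assumptions.
</div>

```python
import sqlite3

import pandas as pd


def upload_tables(tables, connection):
    """Write each DataFrame to SQL and return the stored row count per table."""
    counts = {}
    for name, frame in tables.items():
        # Same options as in the cell above: no index column, overwrite if present
        frame.to_sql(name, connection, index=False, if_exists="replace")
        stored = pd.read_sql(f'SELECT COUNT(*) AS n FROM "{name}"', connection)
        counts[name] = int(stored["n"].iloc[0])
    return counts


# Illustrative values; dates are kept as text for simplicity
sector_demo, frequency_demo = "fishing", "quarterly"
releases_demo = pd.DataFrame({
    "vintages_date": ["1994-03-01", "1994-06-01"],
    "fishing_release_1": [11.6, 27.5],
})

connection = sqlite3.connect(":memory:")
row_counts = upload_tables(
    {f"{sector_demo}_{frequency_demo}_releases": releases_demo}, connection
)
connection.close()
```

<div style="text-align: left; font-family: 'PT Serif Pro Book'; color: dark; font-size:16px">
    Reading the count back is a cheap way to confirm that <code>if_exists='replace'</code> overwrote the table rather than appending to it.
</div>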

Loading the releases of revision dummies (irregular calendar) to SQL

<div style="text-align: left;">
    <span style="font-size: 24px; color: rgb(255, 32, 78); font-weight: bold;">&#9888;</span>
    <span style="font-family: PT Serif Pro Book; color: black; font-size: 16px;">
        Run the code below only if you want to generate releases in terms of dummies for the irregular calendar of revisions.
    </span>
</div>

In [107]:
#releases_df.to_sql(f'{sector}_{frequency}_releases_dummies', engine, index=False, if_exists='replace')

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 20px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#step-1-1" style="color: rgb(255, 32, 78); text-decoration: none;">⮝</a>
    </span> 
    <a href="#step-1-1" style="color: rgb(255, 32, 78); text-decoration: none;">Back to <b>select economic sector</b>.</a>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color: dark; font-size: 16px;">
    <span style="font-size: 30px; color: rgb(255, 32, 78); font-weight: bold;">
        <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">&#11180;</a>
    </span> 
    <a href="#outilne" style="color: rgb(0, 153, 123); text-decoration: none;">Back to the outline.</a>
</div>