# New GDP Real-Time Dataset

> **Author:** Jason Cruz  
  **Last updated:** 11/13/2025  
  **Python version:** 3.12  
  **Project:** Rationality and Nowcasting on Peruvian GDP Revisions  

---

## 📌 Summary
Welcome to the **Peruvian GDP Real-Time Dataset (RTD)** construction notebook! This notebook will guide you through the **step-by-step process** of creating your own RTD using GDP revisions from the **Central Reserve Bank of Peru** (BCRP). Whether you are a researcher, policymaker, or analyst, this notebook helps you construct real-time data of monthly GDP growth for Peru, starting from scratch.

### What will this notebook help you achieve?
1. **Downloading PDFs** from the BCRP Weekly Reports (WR).
2. **Generating PDF inputs** by shortening them to focus on key pages containing GDP growth rate tables.
3. **Cleaning up extracted data** so that it is usable, and building the RTD.
4. **Concatenating RTD** from different years and frequencies (monthly, quarterly, annual).
5. **Updating metadata** to record base-year changes and other revision-related information.
6. **Converting RTD** to a releases dataset for econometric analysis.

🌐 **Main Data Source:** [BCRP Weekly Report](https://www.bcrp.gob.pe/publicaciones/nota-semanal.html) (📰 WR, from here on)  
For any questions or issues, feel free to reach out via email: [Jason 📨](mailto:jj.cruza@up.edu.pe)

---

### ⚙️ Initial Set-up

Before preprocessing the new GDP releases data, we need to perform some initial set-up steps:

1. 🧰 **Import helper functions** from `gdp_rtd_pipeline.py` that are required for this notebook.
2. 🛢️ **Connect to the PostgreSQL database** that will contain GDP revisions datasets. _(This step is pending: direct access will be provided via ODBC or other methods, allowing users to connect from any software or programming language.)_
3. 📂 **Create necessary folders** to store inputs, outputs, logs, and screenshots.


> 🚧 Although the second step (database connection) is pending, the notebook currently works with **flat files (CSV)**. These CSV files are listed in `.gitignore`, so they are **not saved to GitHub** and no data is stored publicly. The notebook **generates the CSV files automatically** on your own system, where they can be saved locally for further use.

### 🧰 Import helper functions

This notebook relies on a set of helper functions found in the script `gdp_rtd_pipeline.py`. These functions will be used throughout the notebook, so please ensure you have them ready by running the line of code below.

In [1]:
from gdp_rtd_pipeline import *

pygame 2.5.2 (SDL 2.28.3, Python 3.12.1)
Hello from the pygame community. https://www.pygame.org/contribute.html


> 🛠️ **Libraries:** Before you begin, please ensure that you have the required libraries installed and imported. See all the libraries you need section by section in `gdp_rtd_pipeline.py`.

In [2]:
# `os` ships with the Python standard library, so it needs no installation.
# For a missing third-party library, uncomment and adapt: !pip install <library-name>

**Check out Python information**

In [3]:
import sys
import platform

print("🐍 Python Information")
print(f"  Version  : {sys.version.split()[0]}")
print(f"  Compiler : {platform.python_compiler()}")
print(f"  Build    : {platform.python_build()}")
print(f"  OS       : {platform.system()} {platform.release()}")

🐍 Python Information
  Version  : 3.12.1
  Compiler : MSC v.1916 64 bit (AMD64)
  Build    : ('main', 'Jan 19 2024 15:44:08')
  OS       : Windows 10


### 📂 Create necessary folders

We will start by creating the necessary folders to store the data at various stages of processing. The following code ensures all required directories exist, and if not, it creates them.

In [6]:
import os  # Used below for os.path.join and os.makedirs.
from pathlib import Path  # Importing Path module from pathlib to handle file and directory paths in a cross-platform way.

# Get current working directory
PROJECT_ROOT = Path.cwd()  # Get the current working directory where the notebook is being executed.

# User input for folder location
user_input = input("Enter relative path (default='.'): ").strip() or "."  # Prompt user to input the folder path or use the default value "."
target_path = (PROJECT_ROOT / user_input).resolve()  # Combine the project root directory with user input to get the full target path.

# Create the necessary directories if they don't already exist
target_path.mkdir(parents=True, exist_ok=True)  # Creates the target folder and any necessary parent directories.
print(f"Using path: {target_path}")  # Print out the path being used for confirmation.

# Define paths for saving data and PDFs
pdf_folder = 'new_weekly_reports'  # This folder will store the new Weekly Reports (post-2013), which are in PDF format.
raw_pdf_subfolder = os.path.join(pdf_folder, 'raw')  # Subfolder for saving the raw PDFs exactly as downloaded from the BCRP website.
input_pdf_subfolder = os.path.join(pdf_folder, 'input')  # Subfolder for saving reduced PDFs that contain only the selected pages with GDP growth tables.

data_folder = 'data'  # Main folder for storing all data files.
input_data_subfolder = os.path.join(data_folder, 'input')  # Folder for storing preprocessed data throughout all periods (NEW+OLD data).
output_data_subfolder = os.path.join(data_folder, 'output')  # Folder for storing final RTD datasets and releases after processing.

# Create all folders if they don't exist yet
for folder in [pdf_folder, raw_pdf_subfolder, input_pdf_subfolder, data_folder, input_data_subfolder, output_data_subfolder]:
    os.makedirs(folder, exist_ok=True)  # Create each folder in the list if it doesn't already exist.
    print(f"📂 {folder} created")  # Print confirmation for each folder created.

# Additional folders for metadata, records, and alert tracking
metadata_folder = 'metadata'  # Folder for storing metadata files like wr_metadata.csv.
record_folder = 'record'  # Folder for storing .txt files that track the files already processed to avoid reprocessing them.
alert_track_folder = 'alert_track'  # Folder for saving download notifications and alerts.

# Create additional required folders
for folder in [metadata_folder, pdf_folder, input_pdf_subfolder, record_folder]:
    os.makedirs(folder, exist_ok=True)  # Create the additional folders if they don't exist.
    print(f"📂 {folder} created")  # Print confirmation for each of these additional folders.


Enter relative path (default='.'):  .


Using path: C:\Users\Jason Cruz\OneDrive\Documentos\RA\CIUP\GDP Revisions\GitHub\peru_gdp_revisions\gdp_revisions_datasets
📂 new_wr created
📂 new_wr\raw created
📂 new_wr\input created
📂 data created
📂 data\input created
📂 data\output created
📂 metadata created
📂 new_wr created
📂 new_wr\input created
📂 record created


---

## 1. Downloading PDFs

---

The **BCRP Weekly Report** is our primary source of data collection for constructing the Peruvian GDP Real-Time Dataset (RTD). This report, published weekly by the **Central Reserve Bank of Peru (BCRP)**, is an official document that contains critical macroeconomic statistics, including GDP growth rates.

The two main tables we focus on in this project are:
- **Table 1:** Monthly GDP growth rates (real GDP, 12-month percentage changes)
- **Table 2:** Quarterly/Annual GDP growth rates (real GDP, 12-month percentage changes)
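
For reference, a 12-month percentage change compares each month with the same month one year earlier. The sketch below is only an illustration with made-up index levels; the function name `twelve_month_growth` is hypothetical and is not part of `gdp_rtd_pipeline.py`.

```python
def twelve_month_growth(levels):
    """Return 12-month percentage changes for a list of monthly levels.

    The first 12 entries have no year-earlier value, so they are None.
    """
    growth = []
    for i, value in enumerate(levels):
        if i < 12:
            growth.append(None)
        else:
            growth.append(100.0 * (value / levels[i - 12] - 1.0))
    return growth


# Made-up index levels: flat at 100 for a year, then 103 in month 13.
example_levels = [100.0] * 12 + [103.0]
print(twelve_month_growth(example_levels)[-1])
```

The WR tables already report growth rates, so this calculation is not needed to build the RTD; it only clarifies what the published numbers measure.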

This section automates the process of downloading the **BCRP Weekly Report PDFs** directly from the official BCRP website, ensuring that we can collect the most up-to-date data for our analysis.

---

### 🛠️ What the Scraper Bot Does:

1. **Opens the official BCRP Weekly Report page** at [this link](https://www.bcrp.gob.pe/publicaciones/nota-semanal.html).
2. **Finds and collects all PDF links** for the reports.
3. **Downloads the PDFs** in order of publication date (from newest to oldest).
4. Optionally, plays a **notification sound** after every batch of downloads.
5. **Organizes** the downloaded PDFs into year-based folders.
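
As a rough illustration of step 2, the sketch below collects PDF links from a static HTML fragment using only the standard library. It is not the scraper's actual implementation (which drives a real browser); the class name `PdfLinkCollector`, the sample HTML, and the `example.org` URLs are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href="...pdf"> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


# Made-up HTML fragment standing in for the downloaded page source.
sample_html = """
<ul>
  <li><a href="/docs/ns-01-2024.pdf">WR 1</a></li>
  <li><a href="/docs/ns-02-2024.pdf">WR 2</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

collector = PdfLinkCollector("https://www.example.org/")
collector.feed(sample_html)
print(collector.pdf_links)
```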

---

#### ⚠️ Important Notes:

- **CAPTCHA Handling**: If a CAPTCHA appears during the download process, you'll need to manually solve it in the browser window and then **re-run the Scraper Bot**. The Scraper Bot cannot bypass CAPTCHA verification.
  
- **Automatic WebDriver Management**: This script uses `webdriver-manager` to automatically handle browser drivers (by default, it uses Chrome). **No need to manually download ChromeDriver or GeckoDriver**. If you wish to use a different browser, you can modify the `browser` parameter in the `init_driver()` function.
  
- **Custom Notification Sound**: To be notified when each batch of downloads finishes, place an MP3 file in the `alert_track` folder. A default warning track (in .mp3 format) is provided on GitHub; if you prefer your own, these sites offer free .mp3 files:
  - [Pixabay Audio](https://pixabay.com/music/) 🎵
  - [FreeSound](https://freesound.org/) 🎶
  - [FreePD](https://freepd.com/) 🎼

---

### 📥 Scraper Bot for BCRP Weekly Reports

In [None]:
# Run the function to start the scraper bot
pdf_downloader(
    bcrp_url = "https://www.bcrp.gob.pe/publicaciones/nota-semanal.html",  # URL of the BCRP Weekly Report
    raw_pdf_folder = raw_pdf_subfolder,  # Folder to save the raw downloaded PDFs
    download_record_folder = record_folder,  # Folder to store download logs
    download_record_txt = '1_downloaded_pdfs.txt',  # Record of downloaded PDFs
    alert_track_folder = alert_track_folder,  # Folder for MP3 alert sound
    max_downloads = 60,  # Maximum number of PDFs to download
    downloads_per_batch = 6,  # Number of PDFs to download per batch
    headless = False  # Run in browser window (set to True for headless mode)
)

### 🗂️ Organize Downloaded PDFs

After downloading the PDFs, it is essential to organize them into year-based folders to keep everything structured. This will help in later stages of data extraction and cleaning.

Run the following code to organize the downloaded PDFs. It'll happen in the blink of an eye.

In [None]:
# Get the list of files in the directory
files = os.listdir(raw_pdf_subfolder)

# Call the function to organize files by year
organize_files_by_year(raw_pdf_subfolder)
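
The helper `organize_files_by_year` lives in `gdp_rtd_pipeline.py` and is not reproduced here. To show the general idea, the sketch below assumes file names of the form `ns-08-2017.pdf` (as in the examples later in this notebook) and moves each file into a folder named after its year. The function name `organize_by_year_sketch` is hypothetical, and the real helper may behave differently.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed naming scheme: the four-digit year sits right before ".pdf".
YEAR_PATTERN = re.compile(r"-(\d{4})\.pdf$", re.IGNORECASE)


def organize_by_year_sketch(folder):
    """Move every '...-YYYY.pdf' file in `folder` into a 'YYYY' subfolder."""
    folder = Path(folder)
    moved = []
    for pdf in sorted(folder.glob("*.pdf")):
        match = YEAR_PATTERN.search(pdf.name)
        if match is None:
            continue  # Leave files without a recognizable year untouched.
        year_dir = folder / match.group(1)
        year_dir.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(year_dir / pdf.name))
        moved.append(f"{match.group(1)}/{pdf.name}")
    return moved


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["ns-08-2017.pdf", "ns-23-2019.pdf", "notes.pdf"]:
        (Path(tmp) / name).touch()
    print(organize_by_year_sketch(tmp))
```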

### 🔧 Handling Defective PDFs

Occasionally you may encounter defective PDFs (for example, corrupted files or incomplete downloads). The function below replaces them with valid ones.

🔄 Replace Defective PDFs:

For each replacement, specify the year, the name of the defective PDF, and the PDF to use in its place.

In [None]:
# Replace specific defective PDFs (friendly outputs with icons)
replace_defective_pdfs(
    items=[
        ("2017", "ns-08-2017.pdf", "ns-07-2017"), # Replace a defective PDF in 2017 folder
        ("2019", "ns-23-2019.pdf", "ns-22-2019"), # Replace a defective PDF in 2019 folder
    ],
    root_folder=input_pdf_subfolder,  # Base folder containing year-based folders
    record_folder=record_folder,  # Folder where downloaded PDF logs are stored
    download_record_txt = '1_downloaded_pdfs.txt',  # Log of downloaded PDFs
    quarantine=os.path.join(input_pdf_subfolder, "_quarantine")  # Folder to store defective PDFs (set to None to delete them)
)

> ⚡ **Troubleshooting Tip:** If you run into problems during the data cleaning step (Section 3) and suspect a defective PDF, replace it with the function above to avoid errors in later sections. You can also download another Weekly Report from the same month and use it in place of the faulty one.

> 🚀 **Next Steps**: With the PDFs downloaded and organized, we can move on to trimming them down to the pages we need. This is covered in the next section of the notebook.

## 2. Generating PDF Inputs

Now that we have successfully downloaded the **BCRP Weekly Reports (WR)**, it is important to note that each PDF file contains over 100 pages. However, not all pages are relevant to this project.

For this analysis, we only need a **few key pages** from each WR:
- **Table 1**: Monthly real GDP growth (12-month percentage changes)
- **Table 2**: Annual and quarterly real GDP growth

The goal of this section is to **trim the PDFs**, retaining just the necessary pages for analysis: the key tables and the cover page that provides the publication date and serial number for identification.

The following steps will guide you through the process of generating these trimmed PDF files.

---

### 🛠️ What This Step Does:

1. **Extracts key pages** from each WR, focusing on the pages that contain **Table 1** and **Table 2**.
2. **Retains the cover page** that provides metadata, such as publication date and serial number.
3. **Creates new PDFs** containing only the relevant pages, ensuring efficiency by reducing file sizes.
4. Organizes these **trimmed PDFs** into year-based subfolders for easy access.
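
The page-selection logic behind steps 1 and 2 can be sketched as follows. This is a simplified illustration, not the code of `pdf_input_generator`: it assumes the page texts have already been extracted from the PDF, that the cover is the first page, and that keyword matching ignores case. The function name `select_pages` and the sample page texts are made up.

```python
def select_pages(page_texts, keywords):
    """Return indices of pages to keep: the cover (index 0) plus keyword hits."""
    keep = [0] if page_texts else []
    for index, text in enumerate(page_texts):
        if index == 0:
            continue  # The cover is already kept.
        upper = text.upper()
        if any(keyword.upper() in upper for keyword in keywords):
            keep.append(index)
    return keep


# Made-up page texts standing in for the text extracted from one WR.
pages = [
    "Weekly Report No. 4 - cover page",
    "Monetary accounts",
    "GROSS DOMESTIC PRODUCT BY ECONOMIC SECTORS (monthly)",
    "Balance of payments",
    "Gross domestic product by economic sectors (quarterly)",
]
print(select_pages(pages, ["ECONOMIC SECTORS"]))
```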

---

The `_quarantine` folder is skipped by the input PDF generator.

In [None]:
# Run the function to generate trimmed PDFs for input
pdf_input_generator(
    raw_pdf_folder = raw_pdf_subfolder,
    input_pdf_folder = input_pdf_subfolder,
    input_pdf_record_folder = record_folder,
    input_pdf_record_txt = '2_generated_input_pdfs.txt',
    keywords = ["ECONOMIC SECTORS"]
)

As before, the trimmed WR files (now only a few pages each) may be stored unsorted in the `input_pdf_folder` folder. The following code sorts them into year subfolders, placing each WR (which now includes only the key tables) according to its year of publication. This happens in the **"blink of an eye"**.  

In [None]:
# Get the list of files in the directory
files = os.listdir(input_pdf_subfolder)

# Call the function to organize files
organize_files_by_year(input_pdf_subfolder)

## 3. Data cleaning

<div style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px">
<p>     
Since we already have the PDFs <span style="font-size: 24px;">&#128462;</span> with just the tables required for this project, we can start extracting them. Then we can proceed with data cleaning.
</p>  
</div>

### 3.2 Extracting tables and data cleanup

<div style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px">
<p>     
The main library used for extracting tables from PDFs <span style="font-size: 24px;">&#128462;</span> is <code>pdfplumber</code>. You can review the official documentation by clicking <a href="https://github.com/jsvine/pdfplumber" style="color: rgb(0, 153, 123); font-size: 16px;">here</a>.
</p>
    
<p>     
    The functions in <b>Section 3</b> of the <code>"new_gdp_datasets_functions.py"</code> script were built to handle the issues that arise when tables are extracted from PDFs. An interesting exercise is to compare the original tables (the ones in the PDF <span style="font-size: 24px;">&#128462;</span>) with the cleaned tables (produced by the cleanup code below). To make this possible, the cleanup code for <a href="#3-2-1" style="color: rgb(0, 153, 123); font-size: 16px;">Table 1</a> and <a href="#3-2-1" style="color: rgb(0, 153, 123); font-size: 16px;">Table 2</a> returns, among its outputs, two dictionaries: the first stores the raw tables, that is, the original tables from the PDF <span style="font-size: 24px;">&#128462;</span> extracted by the <code>pdfplumber</code> library, while the second stores the fully cleaned tables.
</p>
</div>

<div style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px">
    The code iterates through each PDF <span style="font-size: 24px;">&#128462;</span> and extracts the two required tables from each. The extracted information is then transformed into dataframes and the columns and values are cleaned up to conform to Python conventions (pythonic).
    </div>
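
To give a flavor of what "pythonic" column names can mean, here is a minimal sketch that turns raw table headers into lowercase snake_case identifiers. The actual cleaning rules live in the helper script and are not shown in this notebook, so treat the function `to_snake_case` and the sample headers as illustrative assumptions.

```python
import re
import unicodedata


def to_snake_case(label):
    """Normalize a raw table header into a lowercase snake_case identifier."""
    # Strip accents: 'Ganadería' -> 'Ganaderia'.
    ascii_label = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", ascii_label).strip("_")
    return cleaned.lower()


# Made-up headers resembling what a PDF table extraction might return.
for raw in ["Economic Sectors", " Agricultura y Ganadería 2/", "PBI (Var. %)"]:
    print(repr(raw), "->", to_snake_case(raw))
```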

<h3><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">3.2.1.</span>
    <span style = "color: dark; font-family: PT Serif Pro Book;">
    <span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;">Table 1.</span> Extraction and cleaning of data from tables on monthly real GDP growth rates.
    </span>
    </h3>

<div style="font-family: PT Serif Pro Book; text-align: left; color:dark; font-size:16px">
<p>     
Tables are selected for extraction by keyword: any table that contains the following keywords qualifies for extraction (a sufficient condition).
</p>
</div>

If for some reason you run the code in Section 3 and do not go on to the next section, rest assured that a record file has logged what was processed. The next time you open this notebook, simply start again from Section 3 (after deleting the .txt record) to regenerate the dataframes that were not saved anywhere; they are essential inputs for Section 4. Alternatively, you can save all the generated dataframes to a folder as a backup and resume from Section 4 by loading them.
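
The record mechanism can be pictured with the sketch below: a plain text file lists the items already processed, and only items missing from it are handled on the next run. This is an assumption about how such a record typically works, not the pipeline's actual code; the function names and the file `demo_record.txt` are hypothetical.

```python
import tempfile
from pathlib import Path


def load_record(record_path):
    """Return the set of names already listed in the record file."""
    path = Path(record_path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def process_new(names, record_path):
    """Process only names missing from the record, then append them to it."""
    done = load_record(record_path)
    new_names = [name for name in names if name not in done]
    with open(record_path, "a", encoding="utf-8") as record:
        for name in new_names:
            # Real processing (table extraction, cleaning) would happen here.
            record.write(name + "\n")
    return new_names


with tempfile.TemporaryDirectory() as tmp:
    record_file = Path(tmp) / "demo_record.txt"
    print(process_new(["ns-01-2024.pdf", "ns-02-2024.pdf"], record_file))
    print(process_new(["ns-02-2024.pdf", "ns-03-2024.pdf"], record_file))
```

Deleting the record file makes every item count as new again, which is why removing the .txt forces a full rerun.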




If you plan to stop after this section and resume later from Section 4, set the `persist` option to `True` in the cleaner calls below.

# Table 1 data into *row-based* vintage format

In [None]:
# Define base folder for saving vintages data (.csv)
data_folder = 'data'

# Define subfolder for preprocessed data across all periods (NEW+OLD data)
input_data_subfolder = os.path.join(data_folder, 'input')

# Define subfolder for final RTD datasets and releases
output_data_subfolder = os.path.join(data_folder, 'output')

# Create all required folders (if they do not already exist) and confirm creation
for folder in [data_folder, input_data_subfolder, output_data_subfolder]:
    os.makedirs(folder, exist_ok=True)
    print(f"📂 {folder} created")

In [None]:
raw_1, clean_1, vintages_1 = new_table_1_cleaner(
    input_pdf_folder = input_pdf_subfolder,
    record_folder = record_folder,
    record_txt = 'new_created_rtd_tab_1.txt',
    persist = True,
    persist_folder = input_data_subfolder,
    pipeline_version = "s3.0.0",
)


In [None]:
raw_1.keys()

In [None]:
clean_1.keys()

In [None]:
vintages_1.keys()

In [None]:
raw_1['ns_11_2024_1']

In [None]:
clean_1['ns_11_2024_1']

In [None]:
vintages_1['ns_11_2024_1']

# Checking the pipeline version

In [None]:
df100 = vintages_1["ns_04_2022_1"]
print(df100.attrs)
# {'pipeline_version': 's3.0.0'}


In [None]:
vintages_1["ns_04_2022_1"].attrs

# Table 2 data into *row-based* vintage format

In [None]:
raw_2, clean_2, vintages_2 = new_table_2_cleaner(
    input_pdf_folder = input_pdf_subfolder,
    record_folder = record_folder,
    record_txt = 'new_created_rtd_tab_2.txt',
    persist = True,
    persist_folder = input_data_subfolder,
    pipeline_version = "s3.0.0",
)


In [None]:
raw_2['ns_04_2022_2']

In [None]:
clean_2['ns_04_2022_2']

In [None]:
vintages_2['ns_04_2022_2']

In [None]:
df200 = vintages_2["ns_04_2022_2"]
print(df200.attrs)
# {'pipeline_version': 's3.0.0'}


In [None]:
vintages_2["ns_04_2022_2"].attrs

## 4. Concatenating RTD across years by frequency

**Connect to the PostgreSQL database**

The following function creates a connection to the `gdp_revisions_datasets` database in `PostgreSQL`. Once database access is enabled, the **input data** used in this Jupyter notebook will be loaded from this database and all **output data** generated by the notebook will be stored there; until then, the notebook runs on local CSV files. Make sure you set the necessary parameters to access the server once you have obtained the required permissions.

> 💡 **Tip:** To request permissions, please email [Jason 📨](mailto:jj.cruza@alum.up.edu.pe)  
> ⚠️ **Warning:** Make sure you have set your SQL credentials as environment variables before proceeding.  

In [None]:
from sqlalchemy import create_engine
import os

In [None]:
def create_sqlalchemy_engine(database="gdp_revisions_datasets", port=5432):
    """
    Create an SQLAlchemy engine to connect to the PostgreSQL database.
    
    Environment Variables Required:
        CIUP_SQL_USER: SQL username
        CIUP_SQL_PASS: SQL password
        CIUP_SQL_HOST: SQL host address

    Args:
        database (str): Name of the database. Default is 'gdp_revisions_datasets'.
        port (int): Port number. Default is 5432.

    Returns:
        engine (sqlalchemy.engine.Engine): SQLAlchemy engine object.
    
    Raises:
        ValueError: If required environment variables are missing.

    Example:
        engine = create_sqlalchemy_engine()
    """
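    from urllib.parse import quote_plus  # Local import: escapes special characters in credentials.

    user = os.environ.get('CIUP_SQL_USER')
    password = os.environ.get('CIUP_SQL_PASS')
    host = os.environ.get('CIUP_SQL_HOST')

    if not all([host, user, password]):
        raise ValueError("❌ Missing environment variables: CIUP_SQL_HOST, CIUP_SQL_USER, CIUP_SQL_PASS")

    # URL-encode user and password so characters such as '@' or ':' do not break the URL.
    connection_string = f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
    engine = create_engine(connection_string)

    # create_engine() is lazy: the actual connection opens on first use.
    print(f"🔗 Engine ready for PostgreSQL database: {database} at {host}:{port}")
    return engine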

In [None]:
engine = create_sqlalchemy_engine()
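
Before creating the engine, it can help to see exactly which credentials are missing. The sketch below is a small optional check using only the standard library; the helper name `missing_env_vars` is hypothetical, while the variable names are the ones required by the function above.

```python
import os

REQUIRED_VARS = ("CIUP_SQL_USER", "CIUP_SQL_PASS", "CIUP_SQL_HOST")


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All SQL credentials are set.")
```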

In [None]:
concatenated_1 = concatenate_table_1(
    input_data_subfolder=input_data_subfolder,
    record_folder=record_folder,
    record_txt="4_concatenated_rtd_tab_1.txt",
    persist=True,
    persist_folder=output_data_subfolder,
    csv_file_label="monthly_gdp_rtd.csv",   # your custom name
)

In [None]:
concatenated_1.keys()

In [None]:
concatenated_1.head(10)

In [None]:
concatenated_2 = concatenate_table_2(
    input_data_subfolder=input_data_subfolder,
    record_folder=record_folder,
    record_txt="4_concatenated_rtd_tab_2.txt",
    persist=True,
    persist_folder=output_data_subfolder,
    csv_file_label="quarterly_annual_gdp_rtd.csv",  # your custom name
)

In [None]:
concatenated_2.head(10)

## 5. Metadata

### Revision Calendar

In [None]:
# Define folder for metadata files (e.g., wr_metadata.csv)
metadata_folder = 'metadata'

# Define base folder for saving all digital PDFs
# (note: this overrides the `pdf_folder` set in the initial set-up cell)
pdf_folder = 'pdf'

# Define subfolder for saving reduced PDFs containing only selected pages with GDP growth tables (monthly, quarterly, and annual frequencies)
input_pdf_subfolder = os.path.join(pdf_folder, 'input')

# Define folder for saving .txt files with download and dataframe record
record_folder = 'record'

# Create all required folders (if they do not already exist) and confirm creation
for folder in [metadata_folder, pdf_folder, input_pdf_subfolder, record_folder]:
    os.makedirs(folder, exist_ok=True)
    print(f"📂 {folder} created")

In [None]:
# Define the base_year_list for mapping base years (modify or extend this list as needed)
base_year_list = [
    {"year": 1994, "wr": 1, "base_year": 1990},
    {"year": 2000, "wr": 28, "base_year": 1994},
    {"year": 2014, "wr": 11, "base_year": 2007},
    {"year": 2022, "wr": 20, "base_year": 2019},
    # Add more mappings if needed
]
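
One way to read this list is as a set of change points: each entry marks the first Weekly Report (year and issue number) from which a base year applies. Under that assumption, which is an interpretation and is not confirmed by the notebook, a lookup could be sketched as below. The function `base_year_for` is hypothetical and is not part of `update_metadata`.

```python
def base_year_for(year, wr, mapping):
    """Return the base year in force for Weekly Report `wr` of `year`.

    Assumes each mapping entry marks the first report (year, wr) from which
    its base_year applies. Returns None before the earliest entry.
    """
    result = None
    for entry in sorted(mapping, key=lambda e: (e["year"], e["wr"])):
        if (year, wr) >= (entry["year"], entry["wr"]):
            result = entry["base_year"]
    return result


# Same entries as the list above, repeated so the sketch runs on its own.
demo_mapping = [
    {"year": 1994, "wr": 1, "base_year": 1990},
    {"year": 2000, "wr": 28, "base_year": 1994},
    {"year": 2014, "wr": 11, "base_year": 2007},
    {"year": 2022, "wr": 20, "base_year": 2019},
]
print(base_year_for(2014, 10, demo_mapping))
print(base_year_for(2014, 11, demo_mapping))
```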

In [None]:
# Call the function to update the metadata
updated_df = update_metadata(
    metadata_folder = metadata_folder,
    input_pdf_folder = input_pdf_subfolder,
    record_folder = record_folder,
    record_txt = "wr_metadata.txt",
    wr_metadata_csv = "wr_metadata.csv",
    base_year_list = base_year_list
)

In [None]:
updated_df.iloc[-30:]   # last 30 rows

In [None]:
print(updated_df["benchmark_revision"].dtype)

In [None]:
print(updated_df["base_year"].dtype)

### 5.1 Generating adjusted RTDs by removing revisions affected by base years (based on metadata)

# Drop base year

In [None]:
base_year_list_2 = [
    "2000m7",   # 1990 -> 1994
    "2014m3",   # 1994 -> 2007
]

In [None]:
# Process both monthly and quarterly GDP files and save them with new names
adjusted_rtd = apply_base_year_sentinel(
    base_year_list=base_year_list_2,
    sentinel=-999999.0,
    output_data_subfolder=output_data_subfolder,
    csv_file_labels=["monthly_gdp_rtd.csv", "quarterly_annual_gdp_rtd.csv"]
)

In [None]:
# Access the processed data (adjusted CSV files)
adjusted_monthly_rtd = adjusted_rtd["by_adjusted_monthly_gdp_rtd.csv"]
adjusted_quarterly_rtd = adjusted_rtd["by_adjusted_quarterly_annual_gdp_rtd.csv"]
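
When working with the adjusted files, entries equal to the sentinel should not enter any statistic as if they were growth rates. The sketch below shows one possible way to mask them; it assumes the sentinel simply flags entries to exclude, which is an interpretation of `apply_base_year_sentinel` and is not confirmed by the notebook. The helper `mask_sentinel` and the sample row are made up.

```python
import math

SENTINEL = -999999.0  # Same value passed as `sentinel` in the call above.


def mask_sentinel(values, sentinel=SENTINEL):
    """Replace sentinel entries with NaN so they can be skipped later."""
    return [math.nan if value == sentinel else value for value in values]


# Made-up row with one flagged entry.
row = [2.1, -999999.0, 3.4]
masked = mask_sentinel(row)
valid = [v for v in masked if not math.isnan(v)]
print(sum(valid) / len(valid))
```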

### 5.2 Generating benchmark RTD for revisions affected by benchmarking procedures (based on metadata)

# Bench

In [None]:

csv_file_labels = [
    "monthly_gdp_rtd",
    "quarterly_annual_gdp_rtd",
    "by_adjusted_monthly_gdp_rtd",
    "by_adjusted_quarterly_annual_gdp_rtd"
]
benchmark_dataset_csv = [
    "monthly_gdp_benchmark",
    "quarterly_annual_gdp_benchmark",
    "by_adjusted_monthly_gdp_benchmark",
    "by_adjusted_quarterly_annual_gdp_benchmark"
]
record_txt = "_converted_to_benchmark.txt"

In [None]:
wr_metadata_csv = "wr_metadata.csv"

In [None]:
processed_datasets = convert_to_benchmark_dataset(
    output_data_subfolder=output_data_subfolder,
    csv_file_labels=csv_file_labels,
    metadata_folder=metadata_folder,
    wr_metadata_csv=wr_metadata_csv,
    record_folder=record_folder,
    record_txt=record_txt,
    benchmark_dataset_csv=benchmark_dataset_csv
)


In [None]:
# Access the processed results
processed_datasets.keys()

In [None]:
processed_datasets['monthly_gdp_benchmark']

## 6. Releases

In [None]:
csv_file_labels = [
    "monthly_gdp_rtd",
    "quarterly_annual_gdp_rtd",
    "by_adjusted_monthly_gdp_rtd",
    "by_adjusted_quarterly_annual_gdp_rtd",
    "monthly_gdp_benchmark",
    "quarterly_annual_gdp_benchmark",
    "by_adjusted_monthly_gdp_benchmark",
    "by_adjusted_quarterly_annual_gdp_benchmark"
]
releases_dataset_csv = [
    "monthly_gdp_releases",
    "quarterly_annual_gdp_releases",
    "by_adjusted_monthly_gdp_releases",
    "by_adjusted_quarterly_annual_gdp_releases",
    "monthly_gdp_benchmark_releases",
    "quarterly_annual_gdp_benchmark_releases",
    "by_adjusted_monthly_gdp_benchmark_releases",
    "by_adjusted_quarterly_annual_gdp_benchmark_releases"
]
record_txt = "5_converted_to_releases.txt"

In [None]:
# Run the conversion function
releases_df = convert_to_releases_dataset(
    output_data_subfolder=output_data_subfolder,
    csv_file_labels=csv_file_labels,
    record_folder=record_folder,
    record_txt=record_txt,
    releases_dataset_csv=releases_dataset_csv
)

In [None]:
# Displaying the converted releases dataset for "by_adjusted_monthly_gdp_releases"
releases_df["by_adjusted_monthly_gdp_releases"]
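
Conceptually, a releases dataset reorders real-time data so that, for each target period, the first published value is the first release, the second published value is the second release, and so on. The sketch below illustrates that idea on a tiny made-up example; the data layout and the function `to_releases` are assumptions for illustration and do not reproduce `convert_to_releases_dataset`.

```python
def to_releases(rtd):
    """Turn {vintage: {target_period: value}} into {target_period: [releases]}.

    Vintages must be supplied in publication order; the k-th value observed
    for a target period is treated as its k-th release.
    """
    releases = {}
    for vintage, observations in rtd.items():
        for period, value in observations.items():
            releases.setdefault(period, []).append(value)
    return releases


# Made-up growth rates: three vintages covering three target months.
demo_rtd = {
    "2024m3": {"2024m1": 1.4},
    "2024m4": {"2024m1": 1.5, "2024m2": 2.8},
    "2024m5": {"2024m1": 1.5, "2024m2": 2.9, "2024m3": -0.3},
}
for period, values in to_releases(demo_rtd).items():
    print(period, values)
```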
