# Old GDP Real-Time Dataset

**Author:** Jason Cruz  
**Last updated:** 08/13/2025  
**Python version:** 3.12  
**Project:** Rationality and Nowcasting on Peruvian GDP Revisions  

---
## 📌 Summary
This notebook documents the step-by-step **construction of real-time datasets** for **Peruvian GDP revisions** from 2013 to the present. It covers:

1. **Downloading PDFs** (the Weekly Reports, WR) from the Central Reserve Bank of Peru's website.
2. **Generating input PDFs** by trimming each report down to the key pages that contain the GDP growth rate tables.
3. **Cleaning up the data** extracted from the input PDFs.
4. **Concatenating real-time datasets across years by frequency.**
5. **Storing the real-time datasets (RTD) in SQL** so they are available to users on request and for further analysis.

🌐 **Main Data Source:** [BCRP Weekly Report](https://www.bcrp.gob.pe/publicaciones/nota-semanal.html) (📰 WR, from here on)  
For any questions or issues regarding the code, please email [Jason 📨](mailto:jj.cruza@up.edu.pe)  

---

## 🛠️ Libraries

If any of the libraries below is missing, install it with a command like the one in the following cell.

In [None]:
#!pip install <package>  # Uncomment and replace <package> with the missing library. Note that `os` ships with Python and never needs installing.
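
Before installing anything, it can help to see which modules are actually missing. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_modules` and the module list are illustrative assumptions, not the notebook's actual requirements.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Illustrative list only; replace it with the libraries the notebook really needs.
candidates = ["os", "pathlib", "a_module_that_does_not_exist"]
print(missing_modules(candidates))
```

Only the names returned by the helper would need a `pip install`.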

Check the Python environment information

In [1]:
import sys
import platform

print("🐍 Python Information")
print(f"  Version  : {sys.version.split()[0]}")
print(f"  Compiler : {platform.python_compiler()}")
print(f"  Build    : {platform.python_build()}")
print(f"  OS       : {platform.system()} {platform.release()}")

🐍 Python Information
  Version  : 3.12.1
  Compiler : MSC v.1916 64 bit (AMD64)
  Build    : ('main', 'Jan 19 2024 15:44:08')
  OS       : Windows 10


**Import helper functions**

> ⚠️ Please check the script `new_gdp_datasets_functions.py`, which contains all the functions required by this _Jupyter notebook_. The functions there are ordered according to the sections of this notebook.

In [2]:
from gdp_rtd_pipeline import *

pygame 2.5.2 (SDL 2.28.3, Python 3.12.1)
Hello from the pygame community. https://www.pygame.org/contribute.html


## ⚙️ Initial set-up

Before preprocessing the data from new GDP releases, we will:

* **Create necessary folders** for storing inputs, outputs, logs, and screenshots.
* **Connect to the PostgreSQL database** containing GDP revisions datasets.
* **Import helper functions** from `new_gdp_datasets_functions.py`.
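
The database step can be pictured with a small stand-in. The sketch below uses Python's built-in `sqlite3` with an in-memory database in place of PostgreSQL, so it is not the project's actual connection code. The helper `store_rows`, the table name `demo_rtd`, the column names, and the values are all hypothetical placeholders.

```python
import sqlite3


def store_rows(conn, table, rows):
    """Create `table` if needed and insert (vintage, period, growth) rows."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(vintage TEXT, period TEXT, growth REAL)"
    )
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?)', rows)
    conn.commit()


# In-memory SQLite database used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Made-up placeholder values, not actual BCRP figures.
store_rows(conn, "demo_rtd", [("v1", "p1", 1.0), ("v2", "p1", 1.5)])

count = conn.execute('SELECT COUNT(*) FROM "demo_rtd"').fetchone()[0]
print(count)
```

With a real PostgreSQL server, the connection object would come from a PostgreSQL driver instead, and the placeholder syntax may differ from SQLite's `?`.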

**Create necessary folders**

In [3]:
from pathlib import Path

PROJECT_ROOT = Path.cwd()
user_input = input("Enter relative path (default='.'): ").strip() or "."
target_path = (PROJECT_ROOT / user_input).resolve()
target_path.mkdir(parents=True, exist_ok=True)
print(f"📂 Using path: {target_path}")

Enter relative path (default='.'):  .


📂 Using path: C:\Users\Jason Cruz\OneDrive\Documentos\RA\CIUP\GDP Revisions\GitHub\peru_gdp_revisions\gdp_revisions_datasets


<h1><span style = "color: rgb(0, 65, 75); font-family: PT Serif Pro Book;; color: dark;">2.</span> <span style = "color: dark; font-family: PT Serif Pro Book;">Data cleaning</span></h1>

In [5]:
import os  # Explicit import in case the star import above does not expose `os`

# Define base folder for saving all digital PDFs
pdf_folder = 'pdf'

# Define subfolder for saving the original PDFs as downloaded from the BCRP website
raw_pdf_subfolder = os.path.join(pdf_folder, 'raw')

# Define subfolder for saving reduced PDFs containing only selected pages with GDP growth tables (monthly, quarterly, and annual frequencies)
input_pdf_subfolder = os.path.join(pdf_folder, 'input')

# Define folder for saving .txt files with download and dataframe record
record_folder = 'record'

# Define folder for saving warning bells. This is for download notifications (see section 1).
alert_track_folder = 'alert_track'

# Create all required folders (if they do not already exist) and confirm creation
for folder in [pdf_folder, raw_pdf_subfolder, input_pdf_subfolder, record_folder, alert_track_folder]:
    os.makedirs(folder, exist_ok=True)
    print(f"📂 {folder} created")

📂 pdf created
📂 pdf\raw created
📂 pdf\input created
📂 record created
📂 alert_track created
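
The loop above prints "created" even when a folder already existed, because `exist_ok=True` silently accepts existing folders. As a minimal sketch of an alternative (the helper name `ensure_folders` is an assumption, and a temporary directory is used so nothing is written to the project), `pathlib` can report which folders were newly created:

```python
import tempfile
from pathlib import Path


def ensure_folders(base, relative_paths):
    """Create each folder under `base`; return {relative path: True if newly created}."""
    status = {}
    for rel in relative_paths:
        folder = Path(base) / rel
        existed = folder.is_dir()
        folder.mkdir(parents=True, exist_ok=True)
        status[rel] = not existed
    return status


# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_folders(tmp, ["pdf/raw", "pdf/input", "record"])
    second = ensure_folders(tmp, ["pdf/raw", "pdf/input", "record"])

print(first)   # every folder is new on the first call
print(second)  # nothing is new on the second call
```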


In [None]:
# Define base folder containing the old WR tables (.csv) read by the cleaners
old_wr_folder = 'old_wr'

# Define subfolder for the table 1 files
old_wr_subfolder_1 = os.path.join(old_wr_folder, 'table_1')

# Define subfolder for the table 2 files
old_wr_subfolder_2 = os.path.join(old_wr_folder, 'table_2')

# Define base folder for saving vintages data (.csv)
data_folder = 'data'

# Define subfolder for saving input data (passed as `persist_folder` to the cleaners)
input_data_subfolder = os.path.join(data_folder, 'input')

# Define subfolder for saving output data
output_data_subfolder = os.path.join(data_folder, 'output')

# Create all required folders (if they do not already exist) and confirm creation
for folder in [old_wr_folder, old_wr_subfolder_1, old_wr_subfolder_2, data_folder, input_data_subfolder, output_data_subfolder]:
    os.makedirs(folder, exist_ok=True)
    print(f"📂 {folder} created")

In [None]:
old_raw_1, old_clean_1, old_vintages_1 = old_table_1_cleaner(
    input_csv_folder = old_wr_subfolder_1,
    record_folder = record_folder,
    record_txt = 'old_created_rtd_tab_1.txt',
    persist = True,
    persist_folder = input_data_subfolder,
    pipeline_version = "s3.0.0",
    sep = ";",
)
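
To illustrate what the `sep = ";"` argument implies, here is a minimal sketch of reading a folder of semicolon-separated CSV files into a dictionary keyed by file name. It is not the implementation of `old_table_1_cleaner`. The helper `load_csv_folder`, the idea that keys come from file stems, and the demo file are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def load_csv_folder(folder, sep=";"):
    """Read every .csv file in `folder` into {file stem: list of rows}."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.reader(handle, delimiter=sep))
    return tables


# Demonstration with a throwaway file holding placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_table.csv").write_text("a;b\n1;2\n", encoding="utf-8")
    tables = load_csv_folder(tmp)

print(tables)
```

Passing the wrong separator would leave each row as a single unsplit field, which is why the cleaner call states it explicitly.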

In [None]:
old_raw_1.keys()

In [None]:
old_clean_1.keys()

In [None]:
old_vintages_1.keys()

In [None]:
old_raw_1['ns_51_1995_1']

In [None]:
old_clean_1['ns_51_1995_1']

In [None]:
old_vintages_1['ns_13_1995_1']
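
The dictionary keys used above appear to follow the pattern `ns_<issue>_<year>_<table>`. This is an assumption inferred from keys such as `ns_51_1995_1`, not something the notebook states. Under that assumption, a minimal sketch for splitting a key into its parts (the helper `parse_key` is hypothetical) could be:

```python
import re

# Assumed key layout: ns_<issue number>_<four-digit year>_<table number>
KEY_PATTERN = re.compile(r"^ns_(\d+)_(\d{4})_(\d+)$")


def parse_key(key):
    """Split a key like 'ns_51_1995_1' into issue, year, and table numbers."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"Key does not match the assumed pattern: {key!r}")
    issue, year, table = (int(group) for group in match.groups())
    return {"issue": issue, "year": year, "table": table}


print(parse_key("ns_51_1995_1"))
```

Parsing keys this way makes it easy to sort the datasets by year and issue rather than alphabetically.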

# Checking the pipeline version

In [None]:
df100 = old_vintages_1["ns_04_1995_1"]
print(df100.attrs)
# {'pipeline_version': 's3.0.0'}


In [None]:
old_vintages_1["ns_04_1995_1"].attrs

### 2.1.1. Table 1. Extraction and cleaning of data from tables on monthly real GDP growth rates.

In [None]:
old_raw_2, old_clean_2, old_vintages_2 = old_table_2_cleaner(
    input_csv_folder = old_wr_subfolder_2,
    record_folder = record_folder,
    record_txt = 'old_created_rtd_tab_2.txt',
    persist = True,
    persist_folder = input_data_subfolder,
    pipeline_version = "s3.0.0",
    sep = ";",
)

In [None]:
old_raw_2.keys()

In [None]:
old_clean_2.keys()

In [None]:
old_vintages_2.keys()

In [None]:
old_raw_2["ns_20_2012_2"]

In [None]:
old_clean_2["ns_20_2012_2"]

Here `_d` indicates the month, and it is simply mapped in the order in which it appears in its folder.

In [None]:
old_vintages_2["ns_20_2012_2"]