# 📦 13_load_playlist_items_manual_static_to_bigquery

## 🎯 Objective

Load the structural table `playlist_items_manual_static` into BigQuery.

It is not a historical table and has no `snapshot_date` column.  
It is fully rewritten on every pipeline run.  
It acts as a bridge table between playlists and videos.

Destination: youtube-datasets-360.angelgarciadatablog.playlist_items_manual_static
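
To illustrate the bridge-table role, here is a minimal sketch in pandas. The `playlists` and `videos` frames, their title columns, and all the IDs are hypothetical stand-ins and are not tables from this project. Only the `playlist_id`, `video_id`, and `position` columns mirror the real table.

```python
import pandas as pd

# Hypothetical dimension tables (illustrative names and values only).
playlists = pd.DataFrame(
    {"playlist_id": ["PL_a", "PL_b"], "playlist_title": ["Intro", "Advanced"]}
)
videos = pd.DataFrame(
    {"video_id": ["v1", "v2", "v3"], "video_title": ["First", "Second", "Third"]}
)

# Bridge table: one row per (playlist, video) pair, with the video's position.
# A video may belong to several playlists (v2 appears twice).
playlist_items = pd.DataFrame(
    {
        "playlist_id": ["PL_a", "PL_a", "PL_b", "PL_b"],
        "video_id": ["v1", "v2", "v2", "v3"],
        "position": [0, 1, 0, 1],
    }
)

# Resolve the many-to-many relationship by joining through the bridge.
playlist_videos = (
    playlist_items.merge(playlists, on="playlist_id", how="left")
    .merge(videos, on="video_id", how="left")
    .sort_values(["playlist_id", "position"])
    .reset_index(drop=True)
)
print(playlist_videos[["playlist_title", "position", "video_title"]])
```

Because the bridge holds only keys and the position, it can be dropped and rebuilt on every run without losing any history.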

In [1]:
from dotenv import load_dotenv
import os
from google.cloud import bigquery

In [2]:
load_dotenv()

PROJECT_ID = os.getenv("GCP_PROJECT")
DATASET_ID = "angelgarciadatablog"
TABLE_ID = "playlist_items_manual_static"

FULL_TABLE_ID = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"

client = bigquery.Client(project=PROJECT_ID)

print("Destination configured:", FULL_TABLE_ID)


Destination configured: youtube-datasets-360.angelgarciadatablog.playlist_items_manual_static


## 🧱 Load snapshot from Parquet (temporary)

⚠️ Temporary note:
During the notebook phase, the DataFrame is loaded from Parquet, which serves as the exchange mechanism between notebooks.
In the production version (`.py` scripts), the DataFrame will be passed directly, with no intermediate storage.
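
As a rough sketch of what "passed directly" could look like in the production scripts, the build step and the load step can be plain functions that hand the DataFrame to each other in memory. The names `build_playlist_items` and `run_load` are hypothetical, and the loader here is a stand-in that only records what it receives.

```python
from typing import Callable

import pandas as pd


def build_playlist_items() -> pd.DataFrame:
    """Hypothetical stand-in for the upstream step that builds the table."""
    return pd.DataFrame(
        {
            "playlist_id": ["PL_a", "PL_a"],
            "video_id": ["v1", "v2"],
            "position": [0, 1],
        }
    )


def run_load(
    build: Callable[[], pd.DataFrame],
    load: Callable[[pd.DataFrame], None],
) -> int:
    """Build the DataFrame and pass it straight to the loader (no Parquet file)."""
    df = build()
    load(df)
    return len(df)


# Stand-in loader: collects the frames it is given instead of calling BigQuery.
received = []
rows_loaded = run_load(build_playlist_items, received.append)
print("Rows handed to the loader:", rows_loaded)
```

In a real script the `load` argument would wrap the BigQuery load shown later in this notebook.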

In [3]:
import pandas as pd
from pathlib import Path

PROJECT_ROOT = Path.cwd().parents[0]
PROCESSED_PATH = PROJECT_ROOT / "data" / "processed" / "youtube"

df_playlist_items_manual_static = pd.read_parquet(
    PROCESSED_PATH / "playlist_items_manual_static.parquet"
)

df_playlist_items_manual_static.head()


Unnamed: 0,playlist_id,video_id,position,added_at,extracted_at
0,PLV4oS06_KpqbsY_I8iR4HRvb6w3vXUBIM,7bwkNrRpgw0,0,2026-01-23 01:51:06+00:00,2026-02-14 22:47:41.027713+00:00
1,PLV4oS06_KpqbsY_I8iR4HRvb6w3vXUBIM,HDyKUodeuNw,1,2026-01-23 01:37:27+00:00,2026-02-14 22:47:41.027713+00:00
2,PLV4oS06_KpqZGwOHo-tsdIiaZts7qaqql,Zj6uiqMvFOU,0,2026-01-17 15:02:37+00:00,2026-02-14 22:47:41.027713+00:00
3,PLV4oS06_KpqZGwOHo-tsdIiaZts7qaqql,RiYjYfMTGvw,1,2026-01-11 18:05:55+00:00,2026-02-14 22:47:41.027713+00:00
4,PLV4oS06_KpqZGwOHo-tsdIiaZts7qaqql,0VmI47XeOuE,2,2026-01-11 18:05:33+00:00,2026-02-14 22:47:41.027713+00:00


In [4]:
df_playlist_items_manual_static.dtypes

playlist_id                     str
video_id                        str
position                      int64
added_at        datetime64[us, UTC]
extracted_at    datetime64[us, UTC]
dtype: object
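
Since the load overwrites the whole table, a quick integrity check before loading can be worthwhile. The sketch below is one possible check, not part of the original pipeline. The function name `validate_playlist_items` is hypothetical, and the rule that each `(playlist_id, position)` pair is unique is an assumption inferred from the sample rows.

```python
import pandas as pd

REQUIRED_COLUMNS = ["playlist_id", "video_id", "position", "added_at", "extracted_at"]


def validate_playlist_items(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if df[["playlist_id", "video_id"]].isna().any().any():
        problems.append("null playlist_id or video_id")
    # Assumption: a playlist has at most one video per position.
    if df.duplicated(subset=["playlist_id", "position"]).any():
        problems.append("duplicate (playlist_id, position) pairs")
    if (df["position"] < 0).any():
        problems.append("negative position")
    return problems


# Two rows copied from the head() output above.
sample = pd.DataFrame(
    {
        "playlist_id": ["PLV4oS06_KpqbsY_I8iR4HRvb6w3vXUBIM"] * 2,
        "video_id": ["7bwkNrRpgw0", "HDyKUodeuNw"],
        "position": [0, 1],
        "added_at": pd.to_datetime(
            ["2026-01-23 01:51:06+00:00", "2026-01-23 01:37:27+00:00"], utc=True
        ),
        "extracted_at": pd.to_datetime(
            ["2026-02-14 22:47:41.027713+00:00"] * 2, utc=True
        ),
    }
)
print(validate_playlist_items(sample))
```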

## 🏗 Create the table with the DataFrame's schema


In [5]:
from google.api_core.exceptions import NotFound
from google.cloud.bigquery import SchemaField

schema = [
    SchemaField("playlist_id", "STRING"),
    SchemaField("video_id", "STRING"),
    SchemaField("position", "INT64"),
    SchemaField("added_at", "TIMESTAMP"),
    SchemaField("extracted_at", "TIMESTAMP"),
]

try:
    client.get_table(FULL_TABLE_ID)
    print("Table already exists.")

except NotFound:
    table = bigquery.Table(FULL_TABLE_ID, schema=schema)
    client.create_table(table)
    print("Table created.")



Table created.
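
The schema above is written by hand, so it can drift from the DataFrame's dtypes. The sketch below compares the two without touching BigQuery. The mapping in `infer_bq_type` is a simplified assumption that covers only the dtypes seen in this notebook; it is not the mapping the BigQuery client applies internally, and both function names are hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes

# Column -> BigQuery type, copied from the schema defined above.
EXPECTED_BQ_TYPES = {
    "playlist_id": "STRING",
    "video_id": "STRING",
    "position": "INT64",
    "added_at": "TIMESTAMP",
    "extracted_at": "TIMESTAMP",
}


def infer_bq_type(series: pd.Series) -> str:
    """Simplified dtype -> BigQuery type guess (assumption, see lead-in)."""
    dtype = series.dtype
    if ptypes.is_integer_dtype(dtype):
        return "INT64"
    if isinstance(dtype, pd.DatetimeTZDtype):
        return "TIMESTAMP"
    if dtype == object or isinstance(dtype, pd.StringDtype):
        return "STRING"
    return "UNKNOWN"


def schema_mismatches(df: pd.DataFrame) -> list:
    """Return (column, expected, actual) tuples for every disagreement."""
    mismatches = []
    for column, expected in EXPECTED_BQ_TYPES.items():
        actual = infer_bq_type(df[column]) if column in df.columns else "MISSING"
        if actual != expected:
            mismatches.append((column, expected, actual))
    return mismatches


# One row copied from the head() output above.
sample_row = pd.DataFrame(
    {
        "playlist_id": ["PLV4oS06_KpqbsY_I8iR4HRvb6w3vXUBIM"],
        "video_id": ["7bwkNrRpgw0"],
        "position": [0],
        "added_at": pd.to_datetime(["2026-01-23 01:51:06+00:00"], utc=True),
        "extracted_at": pd.to_datetime(["2026-02-14 22:47:41.027713+00:00"], utc=True),
    }
)
print(schema_mismatches(sample_row))
```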


## 📌 Load the Parquet data into BigQuery

In [None]:
# 2️⃣ Load the data from the DataFrame into BigQuery. WRITE_TRUNCATE overwrites the existing data.
job_config = bigquery.LoadJobConfig(
    write_disposition="WRITE_TRUNCATE"
)

job = client.load_table_from_dataframe(
    df_playlist_items_manual_static,
    FULL_TABLE_ID,
    job_config=job_config
)

job.result()

print("Table playlist_items_manual_static replaced successfully.")



Table playlist_items_manual_static replaced successfully.
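
Because `WRITE_TRUNCATE` replaces the whole table, one simple post-load check is to compare the table's row count with the DataFrame's length. The sketch below wraps the load and the check in a hypothetical helper, `replace_and_verify`. It relies only on `load_table_from_dataframe`, `job.result()`, `get_table`, and the table's `num_rows` property. The stub client exists only so the example runs without BigQuery access.

```python
from types import SimpleNamespace


def replace_and_verify(client, df, table_id, job_config=None):
    """Load df into table_id, wait for the job, then compare row counts."""
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    job.result()  # blocks until the load job finishes; raises on failure

    loaded_rows = client.get_table(table_id).num_rows
    if loaded_rows != len(df):
        raise RuntimeError(
            f"Row count mismatch for {table_id}: "
            f"loaded {loaded_rows}, expected {len(df)}"
        )
    return loaded_rows


# Dry run with a stand-in client (hypothetical stub, no BigQuery access needed).
stub_job = SimpleNamespace(result=lambda: None)
stub_client = SimpleNamespace(
    load_table_from_dataframe=lambda df, table_id, job_config=None: stub_job,
    get_table=lambda table_id: SimpleNamespace(num_rows=5),
)
rows = replace_and_verify(stub_client, list(range(5)), "project.dataset.table")
print("Rows verified:", rows)
```

In the notebook itself the call would presumably be `replace_and_verify(client, df_playlist_items_manual_static, FULL_TABLE_ID, job_config)`.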
