# 🧪 EDA + CSV → Firestore migration

Notebook ready for **Google Colab**. Upload `movies.csv` and your **Service Account JSON** key.
This workflow covers:
1) Quick EDA
2) Migrating the CSV into Firestore (collection `movies`)
3) Reading the data back from Firestore into a `DataFrame`


In [12]:

import json

import pandas as pd
from google.cloud import firestore
from google.oauth2 import service_account

CSV_PATH = "movies.csv"
SERVICE_ACCOUNT_JSON = "proyectomovies.json"
PROJECT_ID = None  # Set your project_id here if it is not in the JSON key

# Load the service-account credentials
with open(SERVICE_ACCOUNT_JSON, "r") as f:
    creds_dict = json.load(f)
credentials = service_account.Credentials.from_service_account_info(creds_dict)
project_id = PROJECT_ID or creds_dict["project_id"]

# Firestore client
db = firestore.Client(credentials=credentials, project=project_id)

# Quick EDA
# Key change: encoding="latin-1" instead of the default UTF-8
df = pd.read_csv(CSV_PATH, encoding="latin-1")
display(df.head())  # display() is available as a built-in in Colab/Jupyter
print("Rows, columns:", df.shape)
print("Columns:", list(df.columns))



| | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8000000.0 | Columbia Pictures Corporation | USA | Rob Reiner | Adventure | 52287414.0 | Stand by Me | R | 1986-08-22 | 89 | 8.1 | Wil Wheaton | 299174 | Stephen King | 1986 |
| 1 | 6000000.0 | Paramount Pictures | USA | John Hughes | Comedy | 70136369.0 | Ferris Bueller's Day Off | PG-13 | 1986-06-11 | 103 | 7.8 | Matthew Broderick | 264740 | John Hughes | 1986 |
| 2 | 15000000.0 | Paramount Pictures | USA | Tony Scott | Action | 179800601.0 | Top Gun | PG | 1986-05-16 | 110 | 6.9 | Tom Cruise | 236909 | Jim Cash | 1986 |
| 3 | 18500000.0 | Twentieth Century Fox Film Corporation | USA | James Cameron | Action | 85160248.0 | Aliens | R | 1986-07-18 | 137 | 8.4 | Sigourney Weaver | 540152 | James Cameron | 1986 |
| 4 | 9000000.0 | Walt Disney Pictures | USA | Randal Kleiser | Adventure | 18564613.0 | Flight of the Navigator | PG | 1986-08-01 | 90 | 6.9 | Joey Cramer | 36636 | Mark H. Baker | 1986 |


Rows, columns: (6820, 15)
Columns: ['budget', 'company', 'country', 'director', 'genre', 'gross', 'name', 'rating', 'released', 'runtime', 'score', 'star', 'votes', 'writer', 'year']
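The EDA cell above only shows the first rows, the shape and the column names. A slightly fuller first pass could also count missing values, distinct values and the most frequent genres. The following is a minimal sketch, assuming the column names shown above; `quick_eda` is a hypothetical helper, and the demo uses only a subset of the five rows printed by `df.head()`. In the notebook it would be called as `quick_eda(df)`.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame, top_n: int = 3) -> dict:
    """Hypothetical helper: a compact first-pass summary of a movies DataFrame."""
    summary = {
        "shape": df.shape,
        # Number of missing values per column
        "missing": df.isna().sum().to_dict(),
        # Number of distinct values per column
        "n_unique": df.nunique().to_dict(),
    }
    # Most frequent genres, only if the column exists
    if "genre" in df.columns:
        summary["top_genres"] = df["genre"].value_counts().head(top_n).to_dict()
    # Mean, min and max of every numeric column
    summary["numeric"] = df.describe().loc[["mean", "min", "max"]]
    return summary


# Demo with the five rows shown by df.head() above (subset of columns)
sample = pd.DataFrame({
    "name": ["Stand by Me", "Ferris Bueller's Day Off", "Top Gun", "Aliens",
             "Flight of the Navigator"],
    "genre": ["Adventure", "Comedy", "Action", "Action", "Adventure"],
    "budget": [8000000.0, 6000000.0, 15000000.0, 18500000.0, 9000000.0],
    "score": [8.1, 7.8, 6.9, 8.4, 6.9],
})
report = quick_eda(sample)
print(report["shape"])  # (5, 4)
print(report["top_genres"])
print(report["numeric"])
```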


In [13]:
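# Optional: mount Google Drive if movies.csv and the JSON key are stored there.
# CSV_PATH and SERVICE_ACCOUNT_JSON are relative paths, so point them at the
# files under /content/drive when reading from Drive.
from google.colab import drive
drive.mount('/content/drive')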

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [14]:

# MIGRATION CSV -> Firestore (collection "movies")
# Uses collection.document() for auto-generated IDs, written in batches.
# Expected fields: name, director, genre, company.
# If your columns have different names, adjust the mapping.
# Note: re-running this cell creates duplicates (new IDs on every run).

collection = db.collection("movies")
batch_size = 500  # writes per batch commit
batch = db.batch()
count = 0

for _, row in df.iterrows():
    doc = {}
    for field in ("name", "director", "genre", "company"):
        value = row.get(field)
        # pandas reads missing values as NaN; store them as null instead
        doc[field] = None if pd.isna(value) else value
    ref = collection.document()  # auto-generated ID
    batch.set(ref, doc)
    count += 1
    if count % batch_size == 0:
        batch.commit()
        batch = db.batch()

# Final commit for the remaining documents (skipped if that batch is empty)
if count % batch_size:
    batch.commit()
print(f"Migrated {count} documents to the 'movies' collection.")


Migrated 6820 documents to the 'movies' collection.
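Two details of the migration cell are easy to get wrong: missing values (pandas reads them as `NaN`) and the fact that auto-generated IDs make every re-run add another copy of each movie. The sketch below isolates both concerns in plain Python so they can be checked without a Firestore connection. It assumes, hypothetically, that the pair `name` + `year` identifies a movie; `clean_doc`, `make_doc_id` and `chunked` are illustrative names, not part of the Firestore client. With a deterministic ID, `batch.set(collection.document(make_doc_id(...)), doc)` overwrites the same document on a second run instead of creating a new one.

```python
import hashlib
import math
from itertools import islice
from typing import Any, Iterable, Iterator

FIELDS = ("name", "director", "genre", "company")


def clean_doc(record: dict, fields: Iterable[str] = FIELDS) -> dict:
    """Keep only the mapped fields and turn float NaN into None (null in Firestore)."""
    doc = {}
    for field in fields:
        value = record.get(field)
        if isinstance(value, float) and math.isnan(value):
            value = None
        doc[field] = value
    return doc


def make_doc_id(name: str, year: Any) -> str:
    """Hypothetical deterministic ID: SHA-1 hex digest of 'name|year'.

    Assumes name + year is unique; two movies sharing both would collide.
    """
    key = f"{name}|{year}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()


def chunked(items: Iterable[Any], size: int) -> Iterator[list]:
    """Yield lists of at most `size` items, e.g. one list per batch commit."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


records = [
    {"name": "Stand by Me", "director": "Rob Reiner", "genre": "Adventure",
     "company": "Columbia Pictures Corporation", "year": 1986},
    # Hypothetical record with a missing director, to show the NaN cleanup
    {"name": "Example movie", "director": float("nan"), "genre": "Drama",
     "company": "Example Studio", "year": 2000},
]
docs = [(make_doc_id(r["name"], r["year"]), clean_doc(r)) for r in records]
print(docs[1][1]["director"])  # None
print([len(c) for c in chunked(range(6820), 500)][-1])  # 320
```

In the notebook, `df.to_dict("records")` would feed `chunked(..., 500)`, with one `db.batch()` and one `commit()` per chunk.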


In [15]:

# Read Firestore -> DataFrame
docs = db.collection("movies").stream()
rows = []
for d in docs:
    data = d.to_dict() or {}
    data["doc_id"] = d.id  # keep the Firestore document ID as a column
    rows.append(data)
movies_df = pd.DataFrame(rows)
display(movies_df.head(20))
print("Total documents:", movies_df.shape[0])


| | name | company | director | genre | doc_id |
|---|---|---|---|---|---|
| 0 | Maggie | Lionsgate | Henry Hobson | Drama | 00YwDjXCMtClsen3kizS |
| 1 | Solarbabies | Brooksfilms | Alan Johnson | Action | 01hmOxlwAqRLEDm9ErLS |
| 2 | The Bone Collector | Columbia Pictures Corporation | Phillip Noyce | Crime | 01pH48ssMKjxiPvvm86B |
| 3 | The Seeker: The Dark Is Rising | Twentieth Century Fox Film Corporation | David L. Cunningham | Adventure | 01pbaDrgLqV6xNJMjdqJ |
| 4 | The Shallows | Columbia Pictures | Jaume Collet-Serra | Drama | 01rcVhqhc0qmrbAZYrvx |
| 5 | Vida salvaje | C.O.R.E. Feature Animation | Steve 'Spaz' Williams | Animation | 02m3wNcPZnumDE0XrDHU |
| 6 | Zoolander 2 | Panorama Films | Ben Stiller | Comedy | 02tXZGaoaWXtCOLFVzYS |
| 7 | Seraphim Falls | Icon Productions | David Von Ancken | Action | 03aFov131h9ulkzs4IsV |
| 8 | Kids in the Hall: Brain Candy | Paramount Pictures | Kelly Makin | Comedy | 03l6kwi5z9xogRIBBAQY |
| 9 | American Gangster | Universal Pictures | Ridley Scott | Biography | 03nfyekiy7myRrFIZT56 |


Total documents: 6820
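Matching document counts (6820 on both sides) do not show whether the field values survived the round trip. A minimal sketch, assuming both frames share the four migrated columns, compares them as multisets with `pandas.merge(..., how="outer", indicator=True)`; `migration_diff` is a hypothetical helper. In the notebook it would be called as `migration_diff(df, movies_df)`, and an empty result means every CSV row has exactly one Firestore counterpart.

```python
import pandas as pd

FIELDS = ["name", "director", "genre", "company"]


def migration_diff(source: pd.DataFrame, migrated: pd.DataFrame,
                   fields=FIELDS) -> pd.DataFrame:
    """Hypothetical helper: rows present on only one side, counting duplicates."""
    def with_occurrence(frame: pd.DataFrame) -> pd.DataFrame:
        out = frame[fields].astype(object)
        # Treat NaN (from the CSV) and None (null from Firestore) alike
        out = out.where(out.notna(), "")
        # Number each repeat of an identical row so duplicates are compared too
        out["_occurrence"] = out.groupby(fields).cumcount()
        return out

    merged = with_occurrence(source).merge(
        with_occurrence(migrated),
        on=fields + ["_occurrence"],
        how="outer",
        indicator=True,
    )
    # "left_only": in the CSV but not in Firestore; "right_only": the reverse
    return merged[merged["_merge"] != "both"].drop(columns="_occurrence")


# Demo with three rows from the Firestore output above
source = pd.DataFrame({
    "name": ["Maggie", "Solarbabies", "Zoolander 2"],
    "director": ["Henry Hobson", "Alan Johnson", "Ben Stiller"],
    "genre": ["Drama", "Action", "Comedy"],
    "company": ["Lionsgate", "Brooksfilms", "Panorama Films"],
})
migrated = source.iloc[:2].copy()  # simulate one document that was not written
diff = migration_diff(source, migrated)
print(diff[["name", "_merge"]])
```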
