## CELDA 1: Configuraci√≥n de Rutas

In [1]:
import sys
import os

# Agregar rutas del proyecto
sys.path.append(os.path.abspath(".."))  # ra√≠z del proyecto
sys.path.append(os.path.abspath("../src"))  # carpeta src

# Verificar rutas
print("Rutas configuradas:")
for path in sys.path[-3:]:
    print(f"  - {path}")

Rutas configuradas:
  - /home/arturoallen/proyecto_52_sistemas/lib/python3.12/site-packages
  - /home/arturoallen/proyecto_52_sistemas/proyecto_sistemas
  - /home/arturoallen/proyecto_52_sistemas/proyecto_sistemas/src


## CELDA 2: Descargar Recursos de NLTK

In [2]:
import nltk

print("Descargando recursos de NLTK...")
nltk.download('punkt_tab', quiet=True)
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
print("‚úì NLTK configurado")

Descargando recursos de NLTK...
‚úì NLTK configurado


## CELDA 3: Inicializar Apache Spark

In [3]:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ProyectoFinal_MapReduce") \
    .master("local[2]") \
    .config("spark.driver.memory", "4g") \
    .config("spark.executor.memory", "4g") \
    .config("spark.sql.shuffle.partitions", "4") \
    .getOrCreate()

sc = spark.sparkContext
sc.setLogLevel("WARN")

print("‚úì Spark inicializado")
print(f"  Version: {spark.version}")
print(f"  Master: {sc.master}")

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/12/10 22:48:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


‚úì Spark inicializado
  Version: 4.0.1
  Master: local[2]


## CELDA 4: Importar Funciones de Utilidad

In [4]:
from src.utils import (
    read_txt,
    strip_gutenberg_headers,
    preprocess_text,
    load_all_books
)

print("‚úì Utilidades importadas")

‚úì Utilidades importadas


## CELDA 5 Cargar 100 Libros

In [6]:
data_dir = "../data"
books = load_all_books(data_dir, max_books=100)

print(f"\n Resumen de carga:")
print(f"   Total de libros: {len(books)}")
if books:
    ejemplo = books[0]
    print(f"   Ejemplo - ID: {ejemplo[0]}, Tokens: {len(ejemplo[3])}")


üìö Cargando 100 libros desde ../data/
  ‚úì Procesados 10/100 libros
  ‚úì Procesados 20/100 libros
  ‚úì Procesados 30/100 libros
  ‚úì Procesados 40/100 libros
  ‚úì Procesados 50/100 libros
  ‚úì Procesados 60/100 libros
  ‚úì Procesados 70/100 libros
  ‚úì Procesados 80/100 libros
  ‚úì Procesados 90/100 libros
  ‚úì Procesados 100/100 libros
‚úÖ Total de libros cargados exitosamente: 100


üìä Resumen de carga:
   Total de libros: 100
   Ejemplo - ID: A Christmas Carol - Charles Dickens, Tokens: 13396


## CELDA 6: Crear RDD de Libros

In [7]:
print("\n" + "="*80)
print("CREANDO RDD DE SPARK")
print("="*80)

# Crear RDD con estructura: (book_id, title, tokens)
books_rdd = sc.parallelize([
    (book[0], book[1], book[3])  # (id, filename, tokens)
    for book in books
])

print(f"‚úì RDD creado con {books_rdd.count()} documentos")
print(f"  Particiones: {books_rdd.getNumPartitions()}")



üî• CREANDO RDD DE SPARK


25/12/10 22:50:10 WARN TaskSetManager: Stage 0 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
[Stage 0:>                                                          (0 + 2) / 2]

‚úì RDD creado con 100 documentos
  Particiones: 2


                                                                                

## CELDA 7: CALCULAR TF (Term Frequency) CON MAPREDUCE


In [8]:
print("\n" + "="*80)
print("CALCULAR TF (TERM FREQUENCY) CON MAPREDUCE")
print("="*80)
print("\nF√≥rmula: TF(d,t) = frecuencia de t√©rmino t en documento d\n")

# MAP: Para cada documento, emitir pares ((doc_id, palabra), 1)
print("üîπ MAP: Emitiendo pares ((doc_id, palabra), 1)...")

tf_pairs = books_rdd.flatMap(
    lambda x: [((x[0], word), 1) for word in x[2]]  # ((book_id, word), 1)
)

print(f"  ‚úì {tf_pairs.count():,} pares emitidos")

# REDUCE: Sumar frecuencias por (doc_id, palabra)
print("\nüîπ REDUCE: Sumando frecuencias por (documento, palabra)...")

tf_counts = tf_pairs.reduceByKey(lambda a, b: a + b)

print(f"  ‚úì {tf_counts.count():,} t√©rminos √∫nicos procesados")




CALCULAR TF (TERM FREQUENCY) CON MAPREDUCE

F√≥rmula: TF(d,t) = frecuencia de t√©rmino t en documento d

üîπ MAP: Emitiendo pares ((doc_id, palabra), 1)...


25/12/10 22:50:12 WARN TaskSetManager: Stage 1 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
                                                                                

  ‚úì 6,760,238 pares emitidos

üîπ REDUCE: Sumando frecuencias por (documento, palabra)...


25/12/10 22:50:15 WARN TaskSetManager: Stage 2 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.

  ‚úì 812,971 t√©rminos √∫nicos procesados


                                                                                

## CELDA 8: CALCULAR DF (Document Frequency) CON MAPEO REDUCCION

In [9]:
print("\n" + "="*80)
print("CALCULAR DF (DOCUMENT FREQUENCY) CON MAPREDUCE")
print("="*80)
print("\nF√≥rmula: DF(t) = n√∫mero de documentos que contienen el t√©rmino t\n")

# MAP: Emitir (palabra, 1) para cada aparici√≥n √∫nica en un documento
print("üîπ MAP: Emitiendo pares (palabra, 1)...")

df_pairs = tf_counts.map(
    lambda x: (x[0][1], 1)  # (word, 1) - solo la palabra
)

# REDUCE: Contar en cu√°ntos documentos aparece cada palabra
print("\nüîπ REDUCE: Contando documentos por palabra...")

df_counts = df_pairs.reduceByKey(lambda a, b: a + b)

print(f"  ‚úì {df_counts.count():,} palabras √∫nicas en el vocabulario")

'''
# Verificaci√≥n
ejemplo_df = df_counts.take(5)
print("\nüîç Ejemplos de DF:")
for word, count in ejemplo_df:
    print(f"  Palabra '{word}' ‚Üí DF = {count} documentos")
'''



üìä PASO 2: CALCULAR DF (DOCUMENT FREQUENCY) CON MAPREDUCE

F√≥rmula: DF(t) = n√∫mero de documentos que contienen el t√©rmino t

üîπ MAP: Emitiendo pares (palabra, 1)...

üîπ REDUCE: Contando documentos por palabra...


[Stage 5:>                                                          (0 + 2) / 2]

  ‚úì 116,707 palabras √∫nicas en el vocabulario


                                                                                

'\n# Verificaci√≥n\nejemplo_df = df_counts.take(5)\nprint("\nüîç Ejemplos de DF:")\nfor word, count in ejemplo_df:\n    print(f"  Palabra \'{word}\' ‚Üí DF = {count} documentos")\n'

## CELDA 9: CALCULAR IDF CON MAP

In [10]:
print("\n" + "="*80)
print("CALCULAR IDF (INVERSE DOCUMENT FREQUENCY) CON MAP")
print("="*80)
print("\nF√≥rmula: IDF(t) = log(N / DF(t))")
print(f"  donde N = {books_rdd.count()} (total de documentos)\n")

import math

N = books_rdd.count()

# MAP: Calcular IDF para cada palabra
print("üîπ MAP: Calculando IDF = log(N / DF) para cada palabra...")

idf_values = df_counts.map(
    lambda x: (x[0], math.log(N / x[1]))  # (word, IDF)
)

print(f"  ‚úì IDF calculado para {idf_values.count():,} palabras")
'''
# Verificaci√≥n
ejemplo_idf = idf_values.take(5)
print("\n Ejemplos de IDF:")
for word, idf in ejemplo_idf:
    df = N / math.exp(idf)  # Recuperar DF
    print(f"  '{word}': DF={int(df)} docs ‚Üí IDF = log({N}/{int(df)}) = {idf:.4f}")
'''


25/12/10 22:50:26 WARN TaskSetManager: Stage 7 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.



üìä PASO 3: CALCULAR IDF (INVERSE DOCUMENT FREQUENCY) CON MAP

F√≥rmula: IDF(t) = log(N / DF(t))


                                                                                

  donde N = 100 (total de documentos)



25/12/10 22:50:27 WARN TaskSetManager: Stage 8 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
                                                                                

üîπ MAP: Calculando IDF = log(N / DF) para cada palabra...
  ‚úì IDF calculado para 116,707 palabras


'\n# Verificaci√≥n\nejemplo_idf = idf_values.take(5)\nprint("\nüîç Ejemplos de IDF:")\nfor word, idf in ejemplo_idf:\n    df = N / math.exp(idf)  # Recuperar DF\n    print(f"  \'{word}\': DF={int(df)} docs ‚Üí IDF = log({N}/{int(df)}) = {idf:.4f}")\n'

## CELDA 10: CALCULAR TF-IDF CON JOIN + MAP

In [11]:
print("\n" + "="*80)
print("CALCULAR TF-IDF = TF √ó IDF CON JOIN Y MAP")
print("="*80)
print("\nF√≥rmula: TF-IDF(d,t) = TF(d,t) √ó IDF(t)\n")

# Preparar datos para JOIN
# TF: ((doc_id, word), count) ‚Üí (word, (doc_id, count))
print("üîπ Preparando datos para JOIN...")

tf_for_join = tf_counts.map(
    lambda x: (x[0][1], (x[0][0], x[1]))  # (word, (doc_id, tf_count))
)

# IDF: (word, idf) - ya est√° en formato correcto

# JOIN: Combinar TF con IDF por palabra
print("\nüîπ JOIN: Combinando TF con IDF...")

tf_idf_joined = tf_for_join.join(idf_values)
# Resultado: (word, ((doc_id, tf_count), idf))

print(f"  ‚úì {tf_idf_joined.count():,} combinaciones TF-IDF")

# MAP: Calcular TF-IDF = TF √ó IDF
print("\nüîπ MAP: Multiplicando TF √ó IDF...")

tfidf_scores = tf_idf_joined.map(
    lambda x: ((x[1][0][0], x[0]), x[1][0][1] * x[1][1])
    # ((doc_id, word), tf * idf)
)

print(f"  ‚úì {tfidf_scores.count():,} valores TF-IDF calculados")

'''
# Verificaci√≥n detallada
print("\nüîç Verificaci√≥n de c√°lculo TF-IDF:")
print("="*80)

ejemplo_libro = books[0][0]  # Primer libro
ejemplos_completos = tfidf_scores.filter(
    lambda x: x[0][0] == ejemplo_libro
).take(5)
'''


CALCULAR TF-IDF = TF √ó IDF CON JOIN Y MAP

F√≥rmula: TF-IDF(d,t) = TF(d,t) √ó IDF(t)

üîπ Preparando datos para JOIN...

üîπ JOIN: Combinando TF con IDF...


                                                                                

  ‚úì 812,971 combinaciones TF-IDF

üîπ MAP: Multiplicando TF √ó IDF...


                                                                                

  ‚úì 812,971 valores TF-IDF calculados


'\n# Verificaci√≥n detallada\nprint("\nüîç Verificaci√≥n de c√°lculo TF-IDF:")\nprint("="*80)\n\nejemplo_libro = books[0][0]  # Primer libro\nejemplos_completos = tfidf_scores.filter(\n    lambda x: x[0][0] == ejemplo_libro\n).take(5)\n'

## CELDA 11: NORMALIZAR TF-IDF CON MAPREDUCE

In [12]:
print("\n" + "="*80)
print("NORMALIZAR VECTORES TF-IDF CON MAPREDUCE")
print("="*80)
print("\nF√≥rmula: TF-IDF_norm(d,t) = TF-IDF(d,t) / ||TF-IDF(d)||")
print("  donde ||v|| = ‚àö(v‚ÇÅ¬≤ + v‚ÇÇ¬≤ + ... + v‚Çô¬≤)\n")

# MAP: Calcular cuadrado de cada TF-IDF ‚Üí (doc_id, tfidf¬≤)
print("üîπ MAP: Calculando cuadrados de TF-IDF...")

squared_tfidf = tfidf_scores.map(
    lambda x: (x[0][0], x[1] ** 2)  # (doc_id, tfidf¬≤)
)

# REDUCE: Sumar cuadrados por documento ‚Üí norma¬≤
print("\nüîπ REDUCE: Sumando cuadrados por documento...")

norms_squared = squared_tfidf.reduceByKey(lambda a, b: a + b)

print(f"  ‚úì Normas calculadas para {norms_squared.count()} documentos")

# MAP: Calcular ra√≠z cuadrada ‚Üí norma
print("\nüîπ MAP: Calculando ra√≠ces cuadradas (normas)...")

norms = norms_squared.map(
    lambda x: (x[0], math.sqrt(x[1]))  # (doc_id, ||TF-IDF||)
)

'''
# Verificaci√≥n de normas
ejemplo_normas = norms.take(3)
print("\nüîç Ejemplos de normas:")
for doc_id, norm in ejemplo_normas:
    print(f"  Documento '{doc_id}' ‚Üí ||TF-IDF|| = {norm:.2f}")
'''
# JOIN: Combinar TF-IDF con su norma
print("\nüîπ JOIN: Combinando TF-IDF con normas...")

tfidf_with_norm = tfidf_scores.map(
    lambda x: (x[0][0], (x[0][1], x[1]))  # (doc_id, (word, tfidf))
).join(norms)
# Resultado: (doc_id, ((word, tfidf), norm))

# MAP: Normalizar dividiendo por norma
print("\nüîπ MAP: Normalizando (dividiendo por norma)...")

tfidf_normalized = tfidf_with_norm.map(
    lambda x: ((x[0], x[1][0][0]), x[1][0][1] / x[1][1])
    # ((doc_id, word), tfidf_normalized)
)

print(f"  ‚úì {tfidf_normalized.count():,} valores normalizados")
print("\nTF-IDF normalizado completado")



NORMALIZAR VECTORES TF-IDF CON MAPREDUCE

F√≥rmula: TF-IDF_norm(d,t) = TF-IDF(d,t) / ||TF-IDF(d)||
  donde ||v|| = ‚àö(v‚ÇÅ¬≤ + v‚ÇÇ¬≤ + ... + v‚Çô¬≤)

üîπ MAP: Calculando cuadrados de TF-IDF...

üîπ REDUCE: Sumando cuadrados por documento...


                                                                                

  ‚úì Normas calculadas para 100 documentos

üîπ MAP: Calculando ra√≠ces cuadradas (normas)...

üîπ JOIN: Combinando TF-IDF con normas...

üîπ MAP: Normalizando (dividiendo por norma)...




  ‚úì 812,971 valores normalizados

‚úÖ TF-IDF normalizado completado


                                                                                

##  CELDA 12: CREAR VOCABULARIO Y MAPEOS

In [13]:
print("\n" + "="*80)
print("CREANDO VOCABULARIO Y MAPEOS")
print("="*80)

# Obtener vocabulario √∫nico
print("\nüîπ Extrayendo vocabulario √∫nico...")

vocab_rdd = tfidf_normalized.map(lambda x: x[0][1]).distinct().sortBy(lambda x: x)
vocabulary = vocab_rdd.collect()

print(f"  ‚úì Vocabulario: {len(vocabulary)} palabras √∫nicas")

# Crear mapeos palabra ‚Üî √≠ndice
word_to_idx = {word: idx for idx, word in enumerate(vocabulary)}
idx_to_word = {idx: word for word, idx in word_to_idx.items()}
'''
print(f"  ‚úì Mapeos creados")
print(f"\nüîç Top 10 palabras del vocabulario:")
for i, word in enumerate(vocabulary[:10], 1):
    print(f"  {i}. '{word}'")
'''
# Crear mapeo de IDs de libros
book_ids = books_rdd.map(lambda x: x[0]).collect()
book_id_to_idx = {bid: idx for idx, bid in enumerate(book_ids)}
idx_to_book_id = {idx: bid for bid, idx in book_id_to_idx.items()}

print(f"\nüìñ Total de libros: {len(book_ids)}")



üìö CREANDO VOCABULARIO Y MAPEOS

üîπ Extrayendo vocabulario √∫nico...


25/12/10 22:50:46 WARN TaskSetManager: Stage 53 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  ‚úì Vocabulario: 116707 palabras √∫nicas

üìñ Total de libros: 100


                                                                                

## CELDA 13: CALCULAR SIMILITUD COSENO CON MAPREDUCE

In [14]:
print("\n" + "="*80)
print(" CALCULAR SIMILITUD COSENO CON MAPREDUCE")
print("="*80)
print("\nF√≥rmula: sim(d‚ÇÅ, d‚ÇÇ) = Œ£(TF-IDF‚ÇÅ(t) √ó TF-IDF‚ÇÇ(t))")
print("  (suma de productos de TF-IDF normalizados)\n")

# Convertir a √≠ndices num√©ricos
print("üîπ Convirtiendo a √≠ndices num√©ricos...")

tfidf_indexed = tfidf_normalized.map(
    lambda x: ((book_id_to_idx[x[0][0]], word_to_idx[x[0][1]]), x[1])
    # ((doc_idx, word_idx), tfidf_norm)
)

# Crear pares de documentos para producto cartesiano
print("\nüîπ Preparando para producto cartesiano...")

# Formato: (word_idx, (doc_idx, tfidf))
tfidf_by_word = tfidf_indexed.map(
    lambda x: (x[0][1], (x[0][0], x[1]))  # (word_idx, (doc_idx, tfidf))
)

print("\nüîπ JOIN: Auto-uniendo por palabra (producto cartesiano)...")

# Self-join: combinar documentos que comparten palabras
word_pairs = tfidf_by_word.join(tfidf_by_word)
# Resultado: (word_idx, ((doc1_idx, tfidf1), (doc2_idx, tfidf2)))

print(f"  ‚úì {word_pairs.count():,} pares de (documento, palabra) generados")

# MAP: Calcular producto de TF-IDFs
# IMPORTANTE: Filtrar duplicados y auto-similitudes
print("\nüîπ MAP: Calculando productos de TF-IDF (eliminando duplicados)...")

similarity_contributions = word_pairs.filter(
    # Solo tomar pares donde doc1_idx < doc2_idx (evita duplicados y auto-similitud)
    lambda x: x[1][0][0] < x[1][1][0]
).map(
    lambda x: ((x[1][0][0], x[1][1][0]), x[1][0][1] * x[1][1][1])
    # ((doc1_idx, doc2_idx), tfidf1 * tfidf2)
)

print(f"  ‚úì {similarity_contributions.count():,} contribuciones √∫nicas calculadas")

# REDUCE: Sumar contribuciones por par de documentos
print("\nüîπ REDUCE: Sumando similitudes por par de documentos...")

similarity_scores = similarity_contributions.reduceByKey(lambda a, b: a + b)

print(f"  ‚úì {similarity_scores.count():,} similitudes calculadas")

'''
# Verificaci√≥n
print("\nüîç Top 5 pares m√°s similares:")
top_similar = similarity_scores.takeOrdered(5, key=lambda x: -x[1])
for (idx1, idx2), sim in top_similar:
    bid1 = idx_to_book_id[idx1]
    bid2 = idx_to_book_id[idx2]
    if bid1 != bid2:  # Excluir auto-similitud
        print(f"  '{bid1}' ‚Üî '{bid2}' ‚Üí similitud = {sim:.4f}")
'''
print("\n Matriz de similitud calculada con MapReduce")



üéØ CALCULAR SIMILITUD COSENO CON MAPREDUCE

F√≥rmula: sim(d‚ÇÅ, d‚ÇÇ) = Œ£(TF-IDF‚ÇÅ(t) √ó TF-IDF‚ÇÇ(t))
  (suma de productos de TF-IDF normalizados)

üîπ Convirtiendo a √≠ndices num√©ricos...

üîπ Preparando para producto cartesiano...

üîπ JOIN: Auto-uniendo por palabra (producto cartesiano)...


                                                                                

  ‚úì 29,774,579 pares de (documento, palabra) generados

üîπ MAP: Calculando productos de TF-IDF (eliminando duplicados)...


                                                                                

  ‚úì 14,480,804 contribuciones √∫nicas calculadas

üîπ REDUCE: Sumando similitudes por par de documentos...




  ‚úì 4,946 similitudes calculadas

‚úÖ Matriz de similitud calculada con MapReduce


                                                                                

## CELDA 14: FUNCI√ìN DE RECOMENDACI√ìN

In [15]:
print("\n" + "="*80)
print("FUNCI√ìN DE RECOMENDACI√ìN")
print("="*80)

# Recolectar similitudes como diccionario
print("\nüîπ Recolectando similitudes...")
similarity_dict = similarity_scores.collectAsMap()
print(f"  ‚úì {len(similarity_dict):,} similitudes almacenadas")

def recomendar_libros_mapreduce(libro_id, N=5):
    """
    Recomienda N libros similares usando las similitudes calculadas con MapReduce.
    """
    # Convertir ID a √≠ndice
    if libro_id not in book_id_to_idx:
        raise ValueError(f" Libro '{libro_id}' no encontrado")
    
    doc_idx = book_id_to_idx[libro_id]
    
    # Buscar todas las similitudes con este documento
    similitudes = []
    for (idx1, idx2), score in similarity_dict.items():
        # Como ahora solo guardamos idx1 < idx2, buscar en ambos lados
        if idx1 == doc_idx:
            similitudes.append((idx_to_book_id[idx2], score))
        elif idx2 == doc_idx:
            similitudes.append((idx_to_book_id[idx1], score))
    
    # Ordenar por similitud (de mayor a menor)
    similitudes.sort(key=lambda x: x[1], reverse=True)
    
    # Eliminar duplicados manteniendo el score m√°s alto
    seen = set()
    similitudes_unicas = []
    for book_id, score in similitudes:
        if book_id not in seen:
            seen.add(book_id)
            similitudes_unicas.append((book_id, score))
    
    return similitudes_unicas[:N]

print("\n‚úì Funci√≥n recomendar_libros_mapreduce() creada")



üé¨ CREANDO FUNCI√ìN DE RECOMENDACI√ìN

üîπ Recolectando similitudes...


                                                                                

  ‚úì 4,946 similitudes almacenadas

‚úì Funci√≥n recomendar_libros_mapreduce() creada


## CELDA 15: FUNCI√ìN DE PALABRAS IMPORTANTES

In [16]:
print("\n" + "="*80)
print(" CREANDO FUNCI√ìN DE PALABRAS IMPORTANTES")
print("="*80)

# Recolectar TF-IDF normalizado
print("\nüîπ Recolectando TF-IDF normalizado...")
tfidf_dict = tfidf_normalized.collectAsMap()
print(f"  ‚úì {len(tfidf_dict):,} valores TF-IDF almacenados")

def palabras_importantes_mapreduce(libro_id, M=10):
    """
    Encuentra las M palabras m√°s importantes de un libro.
    """
    if libro_id not in book_ids:
        raise ValueError(f" Libro '{libro_id}' no encontrado")
    
    # Filtrar palabras de este documento
    palabras = [
        (word, score)
        for (doc, word), score in tfidf_dict.items()
        if doc == libro_id
    ]
    
    # Ordenar por score
    palabras.sort(key=lambda x: x[1], reverse=True)
    
    return palabras[:M]

print("\n‚úì Funci√≥n palabras_importantes_mapreduce() creada")



üî§ CREANDO FUNCI√ìN DE PALABRAS IMPORTANTES

üîπ Recolectando TF-IDF normalizado...


                                                                                

  ‚úì 812,971 valores TF-IDF almacenados

‚úì Funci√≥n palabras_importantes_mapreduce() creada


## CELDA 16: CAT√ÅLOGO COMPLETO

In [17]:
print("\n" + "="*80)
print(" CAT√ÅLOGO COMPLETO DE LIBROS")
print("="*80)
print(f"\nTotal: {len(book_ids)} libros\n")

for i, book_id in enumerate(book_ids, 1):
    filename = books_rdd.filter(lambda x: x[0] == book_id).first()[1]
    print(f"{i:3}. [ID: {book_id}] {filename}")

print("\n Proyecto MapReduce completado")
print("="*80)



üìö CAT√ÅLOGO COMPLETO DE LIBROS

Total: 100 libros



25/12/10 22:51:29 WARN TaskSetManager: Stage 90 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  1. [ID: A Christmas Carol - Charles Dickens] A Christmas Carol - Charles Dickens.txt


25/12/10 22:51:30 WARN TaskSetManager: Stage 91 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:30 WARN TaskSetManager: Stage 92 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  2. [ID: A Dictionary of the Art of Printing - William Savage] A Dictionary of the Art of Printing - William Savage.txt
  3. [ID: A Doll's House - Henrik Ibsen] A Doll's House - Henrik Ibsen.txt


25/12/10 22:51:31 WARN TaskSetManager: Stage 93 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  4. [ID: A Modest Proposal - Jonathan Swift] A Modest Proposal - Jonathan Swift.txt


25/12/10 22:51:31 WARN TaskSetManager: Stage 94 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:32 WARN TaskSetManager: Stage 95 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  5. [ID: A Study in Scarlet - Arthur Conan Doyle] A Study in Scarlet - Arthur Conan Doyle.txt


25/12/10 22:51:33 WARN TaskSetManager: Stage 96 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  6. [ID: A Tale of Two Cities - Charles Dickens] A Tale of Two Cities - Charles Dickens.txt


25/12/10 22:51:33 WARN TaskSetManager: Stage 97 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  7. [ID: Adventures of Huckleberry Finn - Mark Twain] Adventures of Huckleberry Finn - Mark Twain.txt
  8. [ID: Alice's Adventures in Wonderland - Lewis Carroll] Alice's Adventures in Wonderland - Lewis Carroll.txt


25/12/10 22:51:34 WARN TaskSetManager: Stage 98 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  9. [ID: Anna Karenina - Leo Tolstoy] Anna Karenina - Leo Tolstoy.txt


25/12/10 22:51:35 WARN TaskSetManager: Stage 99 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 10. [ID: Anne of Green Gables - L M Montgomery] Anne of Green Gables - L M Montgomery.txt


25/12/10 22:51:35 WARN TaskSetManager: Stage 100 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 11. [ID: Beowulf - An Anglo-Saxon Epic Poem] Beowulf - An Anglo-Saxon Epic Poem.txt


25/12/10 22:51:36 WARN TaskSetManager: Stage 101 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 12. [ID: Beyond Good and Evil - Friedrich Nietzsche] Beyond Good and Evil - Friedrich Nietzsche.txt


25/12/10 22:51:36 WARN TaskSetManager: Stage 102 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 13. [ID: Bleak House - Charles Dickens] Bleak House - Charles Dickens.txt


25/12/10 22:51:37 WARN TaskSetManager: Stage 103 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 14. [ID: Blue Trousers - Murasaki Shikibu] Blue Trousers - Murasaki Shikibu.txt


25/12/10 22:51:38 WARN TaskSetManager: Stage 104 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:38 WARN TaskSetManager: Stage 105 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 15. [ID: Chi Pei Ou Than - Shizhen Wang] Chi Pei Ou Than - Shizhen Wang.txt


                                                                                

 16. [ID: Complete Works of Shakespeare] Complete Works of Shakespeare.txt


25/12/10 22:51:39 WARN TaskSetManager: Stage 106 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:40 WARN TaskSetManager: Stage 107 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 17. [ID: Cranford - Elizabeth Cleghorn Gaskell] Cranford - Elizabeth Cleghorn Gaskell.txt


25/12/10 22:51:40 WARN TaskSetManager: Stage 108 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 18. [ID: Crime and Punishment - Fyodor Dostoyevsky] Crime and Punishment - Fyodor Dostoyevsky.txt


25/12/10 22:51:41 WARN TaskSetManager: Stage 109 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 19. [ID: Doctrina Christiana] Doctrina Christiana.txt


25/12/10 22:51:42 WARN TaskSetManager: Stage 110 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 20. [ID: Don Quixote - Miguel de Cervantes] Don Quixote - Miguel de Cervantes.txt
 21. [ID: Dr Jekyll and Mr Hyde - Robert Louis Stevenson] Dr Jekyll and Mr Hyde - Robert Louis Stevenson.txt


25/12/10 22:51:42 WARN TaskSetManager: Stage 111 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 22. [ID: Dracula - Bram Stoker] Dracula - Bram Stoker.txt


25/12/10 22:51:43 WARN TaskSetManager: Stage 112 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 23. [ID: Frankenstein - Mary Wollstonecraft Shelley] Frankenstein - Mary Wollstonecraft Shelley.txt


25/12/10 22:51:43 WARN TaskSetManager: Stage 113 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:44 WARN TaskSetManager: Stage 114 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 24. [ID: Golden Days for Boys and Girls Vol XII] Golden Days for Boys and Girls Vol XII.txt


25/12/10 22:51:45 WARN TaskSetManager: Stage 115 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 25. [ID: Great Expectations - Charles Dickens] Great Expectations - Charles Dickens.txt


25/12/10 22:51:45 WARN TaskSetManager: Stage 116 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 26. [ID: Grimms' Fairy Tales - Jacob and Wilhelm Grimm] Grimms' Fairy Tales - Jacob and Wilhelm Grimm.txt
 27. [ID: Gulliver's Travels - Jonathan Swift] Gulliver's Travels - Jonathan Swift.txt


25/12/10 22:51:46 WARN TaskSetManager: Stage 117 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 28. [ID: Harriet Martineau - Florence Fenwick Miller] Harriet Martineau - Florence Fenwick Miller.txt


25/12/10 22:51:47 WARN TaskSetManager: Stage 118 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 29. [ID: History of Tom Jones - Henry Fielding] History of Tom Jones - Henry Fielding.txt


25/12/10 22:51:47 WARN TaskSetManager: Stage 119 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 30. [ID: How to Observe Morals and Manners - Harriet Martineau] How to Observe Morals and Manners - Harriet Martineau.txt


25/12/10 22:51:48 WARN TaskSetManager: Stage 120 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:48 WARN TaskSetManager: Stage 121 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 31. [ID: Jane Eyre - Charlotte Bronte] Jane Eyre - Charlotte Bronte.txt


25/12/10 22:51:49 WARN TaskSetManager: Stage 122 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 32. [ID: LITTLE WOMEN] LITTLE WOMEN.txt


25/12/10 22:51:50 WARN TaskSetManager: Stage 123 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 33. [ID: Leibniz's New Essays Concerning Human Understanding - John Dewey] Leibniz's New Essays Concerning Human Understanding - John Dewey.txt
 34. [ID: Leng Yan Guan - Junqing Wang] Leng Yan Guan - Junqing Wang.txt


25/12/10 22:51:50 WARN TaskSetManager: Stage 124 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 35. [ID: Les Miserables - Victor Hugo] Les Miserables - Victor Hugo.txt


25/12/10 22:51:51 WARN TaskSetManager: Stage 125 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 36. [ID: Leviathan - Thomas Hobbes] Leviathan - Thomas Hobbes.txt


25/12/10 22:51:52 WARN TaskSetManager: Stage 126 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:52 WARN TaskSetManager: Stage 127 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 37. [ID: Little Women - Louisa May Alcott] Little Women - Louisa May Alcott.txt


25/12/10 22:51:53 WARN TaskSetManager: Stage 128 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 38. [ID: Marjorie at Seacote - Carolyn Wells] Marjorie at Seacote - Carolyn Wells.txt


25/12/10 22:51:53 WARN TaskSetManager: Stage 129 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 39. [ID: Meditations - Marcus Aurelius] Meditations - Marcus Aurelius.txt


25/12/10 22:51:54 WARN TaskSetManager: Stage 130 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 40. [ID: Metamorphosis - Franz Kafka] Metamorphosis - Franz Kafka.txt


25/12/10 22:51:55 WARN TaskSetManager: Stage 131 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 41. [ID: Middlemarch - George Eliot] Middlemarch - George Eliot.txt
 42. [ID: Moby Dick - Herman Melville] Moby Dick - Herman Melville.txt


25/12/10 22:51:55 WARN TaskSetManager: Stage 132 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 43. [ID: Moby Multiple Language Lists of Common Words - Grady Ward] Moby Multiple Language Lists of Common Words - Grady Ward.txt


25/12/10 22:51:56 WARN TaskSetManager: Stage 133 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 44. [ID: My Life Vol 1 - Richard Wagner] My Life Vol 1 - Richard Wagner.txt


25/12/10 22:51:57 WARN TaskSetManager: Stage 134 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 45. [ID: Oliver Twist - Charles Dickens] Oliver Twist - Charles Dickens.txt


25/12/10 22:51:57 WARN TaskSetManager: Stage 135 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 46. [ID: On Liberty - John Stuart Mill] On Liberty - John Stuart Mill.txt


25/12/10 22:51:58 WARN TaskSetManager: Stage 136 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:51:58 WARN TaskSetManager: Stage 137 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 47. [ID: Paradise Lost - John Milton] Paradise Lost - John Milton.txt
 48. [ID: Pride and Prejudice - Jane Austen] Pride and Prejudice - Jane Austen.txt


25/12/10 22:51:59 WARN TaskSetManager: Stage 138 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 49. [ID: Puvis de Chavannes - Francois Crastre] Puvis de Chavannes - Francois Crastre.txt


25/12/10 22:52:00 WARN TaskSetManager: Stage 139 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 50. [ID: Right Ho Jeeves - P G Wodehouse] Right Ho Jeeves - P G Wodehouse.txt


25/12/10 22:52:00 WARN TaskSetManager: Stage 140 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:01 WARN TaskSetManager: Stage 141 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 51. [ID: Romantic Castles and Palaces] Romantic Castles and Palaces.txt


25/12/10 22:52:01 WARN TaskSetManager: Stage 142 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:02 WARN TaskSetManager: Stage 143 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:02 WARN TaskSetManager: Stage 144 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 52. [ID: Romeo and Juliet - William Shakespeare] Romeo and Juliet - William Shakespeare.txt


25/12/10 22:52:03 WARN TaskSetManager: Stage 145 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 53. [ID: Second Treatise of Government - John Locke] Second Treatise of Government - John Locke.txt


25/12/10 22:52:04 WARN TaskSetManager: Stage 146 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:04 WARN TaskSetManager: Stage 147 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 54. [ID: Sense and Sensibility - Jane Austen] Sense and Sensibility - Jane Austen.txt


25/12/10 22:52:05 WARN TaskSetManager: Stage 148 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:05 WARN TaskSetManager: Stage 149 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 55. [ID: Shakespeare's Family - C C Stopes] Shakespeare's Family - C C Stopes.txt


25/12/10 22:52:06 WARN TaskSetManager: Stage 150 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:06 WARN TaskSetManager: Stage 151 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 56. [ID: Society in America Vol 1 - Harriet Martineau] Society in America Vol 1 - Harriet Martineau.txt


25/12/10 22:52:07 WARN TaskSetManager: Stage 152 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:08 WARN TaskSetManager: Stage 153 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 57. [ID: Tess of the d'Urbervilles - Thomas Hardy] Tess of the d'Urbervilles - Thomas Hardy.txt


25/12/10 22:52:08 WARN TaskSetManager: Stage 154 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:09 WARN TaskSetManager: Stage 155 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 58. [ID: The Adventures of Ferdinand Count Fathom - T Smollett] The Adventures of Ferdinand Count Fathom - T Smollett.txt


25/12/10 22:52:09 WARN TaskSetManager: Stage 156 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:10 WARN TaskSetManager: Stage 157 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:10 WARN TaskSetManager: Stage 158 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 59. [ID: The Adventures of Roderick Random - T Smollett] The Adventures of Roderick Random - T Smollett.txt


25/12/10 22:52:11 WARN TaskSetManager: Stage 159 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 60. [ID: The Adventures of Sherlock Holmes - Arthur Conan Doyle] The Adventures of Sherlock Holmes - Arthur Conan Doyle.txt


25/12/10 22:52:12 WARN TaskSetManager: Stage 160 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:12 WARN TaskSetManager: Stage 161 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 61. [ID: The Adventures of Tom Sawyer - Mark Twain] The Adventures of Tom Sawyer - Mark Twain.txt


25/12/10 22:52:13 WARN TaskSetManager: Stage 162 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:13 WARN TaskSetManager: Stage 163 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 62. [ID: The Blue Castle - L M Montgomery] The Blue Castle - L M Montgomery.txt


25/12/10 22:52:14 WARN TaskSetManager: Stage 164 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:14 WARN TaskSetManager: Stage 165 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 63. [ID: The Brothers Karamazov - Fyodor Dostoyevsky] The Brothers Karamazov - Fyodor Dostoyevsky.txt


25/12/10 22:52:15 WARN TaskSetManager: Stage 166 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:16 WARN TaskSetManager: Stage 167 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 64. [ID: The Confessions of St Augustine] The Confessions of St Augustine.txt


25/12/10 22:52:16 WARN TaskSetManager: Stage 168 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:17 WARN TaskSetManager: Stage 169 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:17 WARN TaskSetManager: Stage 170 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 65. [ID: The Count of Monte Cristo - Alexandre Dumas] The Count of Monte Cristo - Alexandre Dumas.txt


25/12/10 22:52:18 WARN TaskSetManager: Stage 171 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 66. [ID: The Country Seats of the United States - William Birch] The Country Seats of the United States - William Birch.txt


25/12/10 22:52:18 WARN TaskSetManager: Stage 172 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:19 WARN TaskSetManager: Stage 173 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
                                                                                

 67. [ID: The Diary of Samuel Pepys - Samuel Pepys] The Diary of Samuel Pepys - Samuel Pepys.txt


25/12/10 22:52:20 WARN TaskSetManager: Stage 174 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:20 WARN TaskSetManager: Stage 175 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 68. [ID: The Divine Comedy - Dante Alighieri] The Divine Comedy - Dante Alighieri.txt


25/12/10 22:52:21 WARN TaskSetManager: Stage 176 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:22 WARN TaskSetManager: Stage 177 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 69. [ID: The Enchanted April - Elizabeth Von Arnim] The Enchanted April - Elizabeth Von Arnim.txt


25/12/10 22:52:22 WARN TaskSetManager: Stage 178 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:23 WARN TaskSetManager: Stage 179 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 70. [ID: The Expedition of Humphry Clinker - T Smollett] The Expedition of Humphry Clinker - T Smollett.txt


25/12/10 22:52:23 WARN TaskSetManager: Stage 180 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:24 WARN TaskSetManager: Stage 181 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 71. [ID: The Great Gatsby - F Scott Fitzgerald] The Great Gatsby - F Scott Fitzgerald.txt


25/12/10 22:52:25 WARN TaskSetManager: Stage 182 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:25 WARN TaskSetManager: Stage 183 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 72. [ID: The Hero] The Hero.txt


25/12/10 22:52:26 WARN TaskSetManager: Stage 184 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:26 WARN TaskSetManager: Stage 185 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 73. [ID: The Iliad - Homer] The Iliad - Homer.txt


25/12/10 22:52:27 WARN TaskSetManager: Stage 186 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:28 WARN TaskSetManager: Stage 187 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 74. [ID: The Importance of Being Earnest - Oscar Wilde] The Importance of Being Earnest - Oscar Wilde.txt


25/12/10 22:52:28 WARN TaskSetManager: Stage 188 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:29 WARN TaskSetManager: Stage 189 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 75. [ID: The Itching PalmA Study of the Habit of Tipping in Americ] The Itching PalmA Study of the Habit of Tipping in Americ.txt


25/12/10 22:52:29 WARN TaskSetManager: Stage 190 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:30 WARN TaskSetManager: Stage 191 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 76. [ID: The Kama Sutra of Vatsyayana] The Kama Sutra of Vatsyayana.txt


25/12/10 22:52:31 WARN TaskSetManager: Stage 192 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:31 WARN TaskSetManager: Stage 193 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 77. [ID: The King in Yellow - Robert W Chambers] The King in Yellow - Robert W Chambers.txt


25/12/10 22:52:32 WARN TaskSetManager: Stage 194 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:32 WARN TaskSetManager: Stage 195 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 78. [ID: The Lesser Key of Solomon - Goetia] The Lesser Key of Solomon - Goetia.txt


25/12/10 22:52:33 WARN TaskSetManager: Stage 196 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:33 WARN TaskSetManager: Stage 197 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 79. [ID: The Odyssey - Homer] The Odyssey - Homer.txt


25/12/10 22:52:34 WARN TaskSetManager: Stage 198 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:35 WARN TaskSetManager: Stage 199 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 80. [ID: The Part Borne by the Dutch in the Discovery of Australia - J E Heeres] The Part Borne by the Dutch in the Discovery of Australia - J E Heeres.txt


25/12/10 22:52:35 WARN TaskSetManager: Stage 200 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:36 WARN TaskSetManager: Stage 201 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 81. [ID: The Philosophy of Auguste Comte - Lucien Levy-Bruhl] The Philosophy of Auguste Comte - Lucien Levy-Bruhl.txt


25/12/10 22:52:36 WARN TaskSetManager: Stage 202 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:37 WARN TaskSetManager: Stage 203 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 82. [ID: The Picture of Dorian Gray - Oscar Wilde] The Picture of Dorian Gray - Oscar Wilde.txt


25/12/10 22:52:37 WARN TaskSetManager: Stage 204 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:38 WARN TaskSetManager: Stage 205 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 83. [ID: The Prince - Niccolo Machiavelli] The Prince - Niccolo Machiavelli.txt


25/12/10 22:52:39 WARN TaskSetManager: Stage 206 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:39 WARN TaskSetManager: Stage 207 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 84. [ID: The Republic - Plato] The Republic - Plato.txt


25/12/10 22:52:40 WARN TaskSetManager: Stage 208 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:40 WARN TaskSetManager: Stage 209 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 85. [ID: The Romance of Lust - Anonymous] The Romance of Lust - Anonymous.txt


25/12/10 22:52:41 WARN TaskSetManager: Stage 210 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:42 WARN TaskSetManager: Stage 211 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:42 WARN TaskSetManager: Stage 212 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 86. [ID: The Scarlet Letter - Nathaniel Hawthorne] The Scarlet Letter - Nathaniel Hawthorne.txt


25/12/10 22:52:43 WARN TaskSetManager: Stage 213 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 87. [ID: The Souls of Black Folk - W E B Du Bois] The Souls of Black Folk - W E B Du Bois.txt


25/12/10 22:52:43 WARN TaskSetManager: Stage 214 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:44 WARN TaskSetManager: Stage 215 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:45 WARN TaskSetManager: Stage 216 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 88. [ID: The Tragical History of Doctor Faustus - Christopher Marlowe] The Tragical History of Doctor Faustus - Christopher Marlowe.txt


25/12/10 22:52:45 WARN TaskSetManager: Stage 217 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:46 WARN TaskSetManager: Stage 218 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 89. [ID: The Wars of Religion in France 1559-1576 - James Westfall Thompson] The Wars of Religion in France 1559-1576 - James Westfall Thompson.txt


25/12/10 22:52:46 WARN TaskSetManager: Stage 219 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:47 WARN TaskSetManager: Stage 220 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 90. [ID: The Wonderful Wizard of Oz - L Frank Baum] The Wonderful Wizard of Oz - L Frank Baum.txt


25/12/10 22:52:48 WARN TaskSetManager: Stage 221 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 91. [ID: The Yellow Wallpaper - Charlotte Perkins Gilman] The Yellow Wallpaper - Charlotte Perkins Gilman.txt


25/12/10 22:52:48 WARN TaskSetManager: Stage 222 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:49 WARN TaskSetManager: Stage 223 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 92. [ID: Thus Spake Zarathustra - Friedrich Nietzsche] Thus Spake Zarathustra - Friedrich Nietzsche.txt


25/12/10 22:52:49 WARN TaskSetManager: Stage 224 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:50 WARN TaskSetManager: Stage 225 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 93. [ID: Treasure Island - Robert Louis Stevenson] Treasure Island - Robert Louis Stevenson.txt


25/12/10 22:52:50 WARN TaskSetManager: Stage 226 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:51 WARN TaskSetManager: Stage 227 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 94. [ID: Twas the Night before Christmas - Clement Clarke Moore] Twas the Night before Christmas - Clement Clarke Moore.txt


25/12/10 22:52:52 WARN TaskSetManager: Stage 228 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:52 WARN TaskSetManager: Stage 229 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 95. [ID: Twenty Years After - Alexandre Dumas] Twenty Years After - Alexandre Dumas.txt


25/12/10 22:52:53 WARN TaskSetManager: Stage 230 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:53 WARN TaskSetManager: Stage 231 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 96. [ID: Ulysses - James Joyce] Ulysses - James Joyce.txt


25/12/10 22:52:54 WARN TaskSetManager: Stage 232 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:55 WARN TaskSetManager: Stage 233 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 97. [ID: Walden and Civil Disobedience - Henry David Thoreau] Walden and Civil Disobedience - Henry David Thoreau.txt


25/12/10 22:52:55 WARN TaskSetManager: Stage 234 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:56 WARN TaskSetManager: Stage 235 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:56 WARN TaskSetManager: Stage 236 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


 98. [ID: War and Peace - Leo Tolstoy] War and Peace - Leo Tolstoy.txt


25/12/10 22:52:57 WARN TaskSetManager: Stage 237 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


 99. [ID: White Nights and Other Stories - Fyodor Dostoyevsky] White Nights and Other Stories - Fyodor Dostoyevsky.txt


25/12/10 22:52:57 WARN TaskSetManager: Stage 238 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/10 22:52:58 WARN TaskSetManager: Stage 239 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


100. [ID: Wuthering Heights - Emily Bronte] Wuthering Heights - Emily Bronte.txt

‚úÖ Proyecto MapReduce completado


## CELDA 17: QUITANDO WARNINGS

In [19]:
print("\n" + "="*80)
print("‚ö° OPTIMIZANDO ACCESO A DATOS (ELIMINAR WARNINGS)")
print("="*80)

# Crear diccionario local de t√≠tulos (evita serializar RDD completo)
print("\nüîπ Creando mapeo de t√≠tulos...")
book_titles = {book_id: title for book_id, title, _ in books_rdd.collect()}

print(f"  ‚úì {len(book_titles)} t√≠tulos mapeados en memoria")
print("\n Los warnings de 29816 KiB deber√≠an desaparecer ahora")



üîπ Creando mapeo de t√≠tulos...


25/12/10 22:52:59 WARN TaskSetManager: Stage 240 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.


  ‚úì 100 t√≠tulos mapeados en memoria



## CELDA 18: RECOMENDACIONES PERSONALIZADAS

In [20]:
print("\n" + "="*80)
print("RECOMENDAR LIRBRO")
print("="*80)

libro_input = input("\nIngresa el ID del libro o n√∫mero de la lista: ").strip()

try:
    # Si es un n√∫mero, buscar por √≠ndice
    if libro_input.isdigit():
        idx = int(libro_input) - 1
        if 0 <= idx < len(book_ids):
            libro_seleccionado = book_ids[idx]
        else:
            raise ValueError(f"N√∫mero fuera de rango (1-{len(book_ids)})")
    else:
        libro_seleccionado = libro_input
    
    # Pedir cantidad de recomendaciones
    n_recs_input = input("¬øCu√°ntas recomendaciones deseas? (default: 5): ").strip()
    n_recs = int(n_recs_input) if n_recs_input else 5
    
    print(f"\n{'='*80}")
    print(f"üìñ BUSCANDO RECOMENDACIONES...")
    print(f"{'='*80}\n")
    
    # CAMBIO AQU√ç: Usar diccionario en lugar de filter
    filename = book_titles[libro_seleccionado]
    print(f"Libro seleccionado: {filename}")
    print(f"ID: {libro_seleccionado}\n")
    
    # Generar recomendaciones
    recomendaciones = recomendar_libros_mapreduce(libro_seleccionado, N=n_recs)
    
    print(f"TOP {n_recs} RECOMENDACIONES:\n")
    print(f"{'#':<4} {'SIMILITUD':<12} {'LIBRO'}")
    print("="*80)
    
    for i, (book_id, score) in enumerate(recomendaciones, 1):
        # CAMBIO AQU√ç: Usar diccionario en lugar de filter
        rec_filename = book_titles[book_id]
        
        # Crear barra visual
        bar_length = int(score * 40)
        bar = "‚ñà" * bar_length
        
        print(f"{i:<4} {score:>6.4f} ({score*100:>5.1f}%)  {rec_filename}")
        print(f"     {bar}")
    
    print("\n Recomendaciones generadas con MapReduce")
    
except ValueError as e:
    print(f"\n Error: {e}")
except Exception as e:
    print(f"\n Error inesperado: {e}")



üé¨ SISTEMA DE RECOMENDACIONES INTERACTIVO



üìñ Ingresa el ID del libro o n√∫mero de la lista:  39
¬øCu√°ntas recomendaciones deseas? (default: 5):  10



üìñ BUSCANDO RECOMENDACIONES...

üìö Libro seleccionado: Meditations - Marcus Aurelius.txt
üÜî ID: Meditations - Marcus Aurelius

üéØ TOP 10 RECOMENDACIONES:

#    SIMILITUD    LIBRO
1    0.5848 ( 58.5%)  The Confessions of St Augustine.txt
     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
2    0.3542 ( 35.4%)  Paradise Lost - John Milton.txt
     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
3    0.3234 ( 32.3%)  The Divine Comedy - Dante Alighieri.txt
     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
4    0.3096 ( 31.0%)  Complete Works of Shakespeare.txt
     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
5    0.1932 ( 19.3%)  The Lesser Key of Solomon - Goetia.txt
     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
6    0.1815 ( 18.1%)  Thus Spake Zarathustra - Friedrich Nietzsche.txt
     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
7    0.1070 ( 10.7%)  The Iliad - Homer.txt
     ‚ñà‚ñà‚ñà‚ñà
8    0.1065 ( 10.6%)  Romeo and Juliet - William Shakespeare.txt
     ‚ñà‚ñà‚ñà‚ñà
9    0.0861 (  8.6%)  The Republic - Plato.t

## CELDA 21: PALABRAS CARACTER√çSTICAS 

In [22]:
print("\n" + "="*80)
print("PESO DE LAS PALABRAS")
print("="*80)

print("\n Analiza las palabras m√°s importantes de un libro")

libro_input = input("\nüìñ Ingresa el ID del libro o n√∫mero: ").strip()

try:
    # Si es un n√∫mero, buscar por √≠ndice
    if libro_input.isdigit():
        idx = int(libro_input) - 1
        if 0 <= idx < len(book_ids):
            libro_seleccionado = book_ids[idx]
        else:
            raise ValueError(f"N√∫mero fuera de rango (1-{len(book_ids)})")
    else:
        libro_seleccionado = libro_input
    
    # Pedir cantidad de palabras
    m_palabras_input = input("¬øCu√°ntas palabras mostrar? (default: 10): ").strip()
    m_palabras = int(m_palabras_input) if m_palabras_input else 10
    
    print(f"\n{'='*80}")
    print(f"ANALIZANDO PALABRAS CARACTER√çSTICAS...")
    print(f"{'='*80}\n")
    
    # Obtener nombre del libro
    filename = books_rdd.filter(lambda x: x[0] == libro_seleccionado).first()[1]
    print(f"Libro: {filename}")
    print(f"ID: {libro_seleccionado}\n")
    
    # Obtener palabras importantes
    palabras = palabras_importantes_mapreduce(libro_seleccionado, M=m_palabras)
    
    if not palabras:
        print("No se encontraron palabras para este libro")
    else:
        print(f"TOP {len(palabras)} PALABRAS M√ÅS IMPORTANTES:\n")
        print(f"{'#':<4} {'PALABRA':<20} {'TF-IDF':<12} {'IMPORTANCIA'}")
        print("="*80)
        
        max_score = palabras[0][1]  # Score m√°s alto para normalizar barras
        
        for i, (palabra, score) in enumerate(palabras, 1):
            # Barra visual proporcional
            bar_length = int((score / max_score) * 40)
            bar = "‚ñà" * bar_length
            
            print(f"{i:<4} {palabra:<20} {score:>8.4f}     {bar}")
        
        print("\nAn√°lisis de palabras completado")
        print("\nLas palabras con mayor TF-IDF son las m√°s caracter√≠sticas del libro")
        
except ValueError as e:
    print(f"\n Error: {e}")
except Exception as e:
    print(f"\n Error inesperado: {e}")



PESO DE LAS PALABRAS

 Analiza las palabras m√°s importantes de un libro



üìñ Ingresa el ID del libro o n√∫mero:  100
¬øCu√°ntas palabras mostrar? (default: 10):  11



ANALIZANDO PALABRAS CARACTER√çSTICAS...



25/12/11 00:32:11 WARN TaskSetManager: Stage 243 contains a task of very large size (29816 KiB). The maximum recommended task size is 1000 KiB.
25/12/11 00:32:12 WARN TaskSetManager: Stage 244 contains a task of very large size (29072 KiB). The maximum recommended task size is 1000 KiB.


Libro: Wuthering Heights - Emily Bronte.txt
ID: Wuthering Heights - Emily Bronte

TOP 11 PALABRAS M√ÅS IMPORTANTES:

#    PALABRA              TF-IDF       IMPORTANCIA
1    heathcliff             0.6685     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
2    linton                 0.5725     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
3    hareton                0.2558     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
4    earnshaw               0.1802     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
5    cathy                  0.1773     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
6    catherine              0.1632     ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
7    hindley                0.0975     ‚ñà‚ñà‚ñà‚ñà‚ñà
8    wuthering              0.0901     ‚ñà‚ñà‚ñà‚ñà‚ñà
9    nelly                  0.0863     ‚ñà‚ñà‚ñà‚ñà‚ñà
10   edgar                  0.0801     ‚ñà‚ñà‚ñà‚ñà
11   grange          