This notebook builds a representative sample of the reader population based on key variables from the `books_rating` dataset.

The variables selected as representative of the population's behavior are described below:

| Variable            | Domain                                     | Known statistics             | Additional comments                         |
|---------------------|--------------------------------------------|------------------------------|---------------------------------------------|
| `review/score`      | [1.0 – 5.0]                                | Mode: 5.0, approx. mean: 4.2 | Satisfaction scale                          |
| `review/helpfulness`| `helpful/total` strings (e.g. 0/0, 10/10)  | Typical range 1–10           | Requires cleaning and conversion to integer |
| `user_id`           | Unique IDs per user                        | Variable frequency           | Classified by volume of participation       |
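
The `review/helpfulness` values are strings of the form `helpful/total`. As a minimal sketch in plain Python (the helper name `parse_helpfulness` is hypothetical and not part of the notebook), the cleaning step can be thought of as splitting on `/` and converting both parts to integers, returning `None` for anything malformed:

```python
from typing import Optional, Tuple


def parse_helpfulness(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Split a 'helpful/total' string into two integers; None if malformed."""
    if value is None:
        return None
    parts = value.strip().split("/")
    if len(parts) != 2:
        return None
    try:
        helpful, total = int(parts[0]), int(parts[1])
    except ValueError:
        # Non-numeric pieces (for example text spilled from another column).
        return None
    return helpful, total


print(parse_helpfulness("10/11"))  # (10, 11)
print(parse_helpfulness("n/a"))    # None
```

The notebook itself keeps only the first component (the helpful-vote count); keeping the total as well would allow a ratio to be computed later if needed.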


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, split, count
from pyspark.sql.types import FloatType, IntegerType

spark = SparkSession.builder.appName('MuestreoBooksRating').getOrCreate()

# Load the dataset
df = spark.read.option('header', 'true').csv('Books_rating.csv')
df.printSchema()
df.select('review/score', 'review/helpfulness', 'user_id').show(5)  # Sample of the participation variables



root
 |-- Id: string (nullable = true)
 |-- Title: string (nullable = true)
 |-- Price: string (nullable = true)
 |-- User_id: string (nullable = true)
 |-- profileName: string (nullable = true)
 |-- review/helpfulness: string (nullable = true)
 |-- review/score: string (nullable = true)
 |-- review/time: string (nullable = true)
 |-- review/summary: string (nullable = true)
 |-- review/text: string (nullable = true)

+------------+------------------+--------------+
|review/score|review/helpfulness|       user_id|
+------------+------------------+--------------+
|         4.0|               7/7| AVCGYZL8FQQTD|
|         5.0|             10/10|A30TK6U7DNS82R|
|         5.0|             10/11|A3UH4UZ4RSVO82|
|         4.0|               7/7|A2MVUWT453QH61|
|         4.0|               3/3|A22X4XUPKF66MR|
+------------+------------------+--------------+
only showing top 5 rows





In [2]:
# Suppress Spark warnings by raising the log level
spark.sparkContext.setLogLevel("ERROR")

In [3]:
# Data cleaning: drop rows with nulls in the key columns
df_clean = df.filter((col("review/score").isNotNull()) & 
                     (col("review/helpfulness").isNotNull()) & 
                     (col("user_id").isNotNull()))

In [4]:
df = df_clean

In [5]:
# Convert the relevant columns to numeric types (score to float, helpful-vote count to int)
df = df.withColumn("Score_num", col("review/score").cast(FloatType()))
df = df.withColumn("Helpfulness_num", split(col("review/helpfulness"), "/")[0].cast(IntegerType()))

# Classify score: "Alta" (high) if >= 4, otherwise "Baja" (low)
df = df.withColumn("score_group", when(col("Score_num") >= 4, "Alta").otherwise("Baja"))
# Classify helpfulness: "Alta" (high) if >= 8 helpful votes, otherwise "Baja" (low)
df = df.withColumn("helpfulness_group", when(col("Helpfulness_num") >= 8, "Alta").otherwise("Baja"))

In [6]:
# Drop rows whose values failed to convert
df = df.filter((col("Score_num").isNotNull()) & (col("Helpfulness_num").isNotNull()))

In [7]:
# Compute the number of reviews per user
user_reviews = df.groupBy("user_id").agg(count("user_id").alias("review_count"))
df = df.join(user_reviews, on="user_id", how="left")


In [8]:
# More detailed classification of users by participation frequency
df = df.withColumn(
    "user_group_detailed",
    when(col("review_count") >= 20, "20+ reseñas")
    .when((col("review_count") >= 10) & (col("review_count") < 20), "10-19 reseñas")
    .when((col("review_count") >= 5) & (col("review_count") < 10), "5-9 reseñas")
    .when((col("review_count") >= 2) & (col("review_count") < 5), "2-4 reseñas")
    .otherwise("1 reseña")
)

In [9]:
# Show the distribution with thousands separators
from pyspark.sql.functions import format_number
distribution = df.groupBy("user_group_detailed").count().orderBy("count", ascending=False)
distribution = distribution.withColumn("count", format_number(col("count"), 0))
distribution.show(truncate=False, n=100)

                                                                                

+-------------------+-------+
|user_group_detailed|count  |
+-------------------+-------+
|1 reseña           |690,877|
|2-4 reseñas        |580,802|
|20+ reseñas        |540,951|
|5-9 reseñas        |338,868|
|10-19 reseñas      |268,931|
+-------------------+-------+



In [10]:
# Review the joint distribution of the characterization variables
df.groupBy("score_group", "helpfulness_group", "user_group_detailed").count().orderBy("count", ascending=False).show()


+-----------+-----------------+-------------------+------+
|score_group|helpfulness_group|user_group_detailed| count|
+-----------+-----------------+-------------------+------+
|       Alta|             Baja|           1 reseña|482065|
|       Alta|             Baja|        2-4 reseñas|402632|
|       Alta|             Baja|        20+ reseñas|356614|
|       Alta|             Baja|        5-9 reseñas|243589|
|       Alta|             Baja|      10-19 reseñas|190730|
|       Baja|             Baja|           1 reseña|104137|
|       Baja|             Baja|        2-4 reseñas| 97797|
|       Alta|             Alta|        20+ reseñas| 85625|
|       Baja|             Baja|        20+ reseñas| 78467|
|       Alta|             Alta|           1 reseña| 72828|
|       Baja|             Baja|        5-9 reseñas| 53827|
|       Alta|             Alta|        2-4 reseñas| 53819|
|       Baja|             Baja|      10-19 reseñas| 44771|
|       Baja|             Alta|           1 reseña| 3184

                                                                                

In [11]:
# Count occurrences per combination and convert them to probabilities
comb_counts = df.groupBy("score_group", "helpfulness_group", "user_group_detailed").count()
total = df.count()
comb_probs = comb_counts.withColumn("probabilidad", (col("count") / total))
comb_probs.orderBy(col("probabilidad").desc()).show(truncate=False)


+-----------+-----------------+-------------------+------+--------------------+
|score_group|helpfulness_group|user_group_detailed|count |probabilidad        |
+-----------+-----------------+-------------------+------+--------------------+
|Alta       |Baja             |1 reseña           |482065|0.1991651066815015  |
|Alta       |Baja             |2-4 reseñas        |402632|0.1663473706520621  |
|Alta       |Baja             |20+ reseñas        |356614|0.14733503854068844 |
|Alta       |Baja             |5-9 reseñas        |243589|0.1006387710608326  |
|Alta       |Baja             |10-19 reseñas      |190730|0.07880008048160057 |
|Baja       |Baja             |1 reseña           |104137|0.04302419116611146 |
|Baja       |Baja             |2-4 reseñas        |97797 |0.0404048207982965  |
|Alta       |Alta             |20+ reseñas        |85625 |0.035375960212012   |
|Baja       |Baja             |20+ reseñas        |78467 |0.03241863322576287 |
|Alta       |Alta             |1 reseña 
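
Each value in the `probabilidad` column is the stratum count divided by the total number of rows. As a quick sanity check in plain Python, using counts copied from the outputs above (the total is taken as the sum of the five `user_group_detailed` counts), the top rows can be reproduced:

```python
# Counts per user group, copied from the user_group_detailed distribution above.
group_counts = {
    "1 reseña": 690_877,
    "2-4 reseñas": 580_802,
    "20+ reseñas": 540_951,
    "5-9 reseñas": 338_868,
    "10-19 reseñas": 268_931,
}
total = sum(group_counts.values())

# Two strata from the joint distribution (score_group=Alta, helpfulness_group=Baja).
strata = {"1 reseña": 482_065, "2-4 reseñas": 402_632}
probs = {name: n / total for name, n in strata.items()}

print(total)  # 2420429
for name, p in probs.items():
    print(name, round(p, 4))
```

The rounded values agree with the first two rows of the table, which confirms that the probabilities are simple relative frequencies over the cleaned dataset.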

                                                                                

In [12]:
# Sample every possible combination (2 x 2 x 5 = 20 combinations)
from itertools import product
selected_columns = ["user_id", "review/helpfulness", "review/score", "review/summary", "Score_num", "user_group_detailed"]

for s, h in product(["Alta", "Baja"], repeat=2):
    for u in ["20+ reseñas", "10-19 reseñas", "5-9 reseñas", "2-4 reseñas", "1 reseña"]:
        subset = df.filter((col("score_group") == s) &
                           (col("helpfulness_group") == h) &
                           (col("user_group_detailed") == u))
        count_subset = subset.count()
        print(f"Muestra: Score={s}, Helpfulness={h}, Usuario={u}, Registros={count_subset}")
        muestra = subset.select(*selected_columns).sample(False, 0.1, seed=42)
        muestra.show(5, truncate=False)

                                                                                

Muestra: Score=Alta, Helpfulness=Alta, Usuario=20+ reseñas, Registros=85625


                                                                                

+--------------+------------------+------------+------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                            |Score_num|user_group_detailed|
+--------------+------------------+------------+------------------------------------------+---------+-------------------+
|A1075MZNVRMSEO|17/19             |4.0         |A great book about destructive temptations|4.0      |20+ reseñas        |
|A1075MZNVRMSEO|24/24             |5.0         |A great mid-size dictionary               |5.0      |20+ reseñas        |
|A1075MZNVRMSEO|10/10             |5.0         |A wonderful textbook for serious students.|5.0      |20+ reseñas        |
|A11L4SBY7NCSZU|35/36             |5.0         |Fantastic                                 |5.0      |20+ reseñas        |
|A11LNPG39A2ZV4|28/30             |4.0         |Best for Beginners                        |4.0      |20+ reseñas        |
+--------------+--------

                                                                                

Muestra: Score=Alta, Helpfulness=Alta, Usuario=10-19 reseñas, Registros=23537


                                                                                

+--------------+------------------+------------+---------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                               |Score_num|user_group_detailed|
+--------------+------------------+------------+---------------------------------------------+---------+-------------------+
|A109DZXQULGEUK|22/24             |5.0         |A modern classic                             |5.0      |10-19 reseñas      |
|A10FBJXMQPI0LL|10/13             |4.0         |I really enjoyed this book!                  |4.0      |10-19 reseñas      |
|A10MR5DEPRCX98|10/16             |5.0         |FORGET HAMLET, A NEW GUY IS IN TOWN          |5.0      |10-19 reseñas      |
|A142SWWCTIKL0H|42/42             |5.0         |Hard to believe this was written decades ago!|5.0      |10-19 reseñas      |
|A14YHC72SHZHRT|27/27             |4.0         |Delicious food; a couple of minor gripes     |4.0      |10-19 reseñas      |


                                                                                

Muestra: Score=Alta, Helpfulness=Alta, Usuario=5-9 reseñas, Registros=27862


                                                                                

+--------------+------------------+------------+--------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                          |Score_num|user_group_detailed|
+--------------+------------------+------------+--------------------------------------------------------+---------+-------------------+
|A109VVAISTEUKY|35/37             |5.0         |Great Book                                              |5.0      |5-9 reseñas        |
|A10NXI59GUCNOL|10/27             |5.0         |Most detailed East Germanic-Eastern Europe early history|5.0      |5-9 reseñas        |
|A10NXI59GUCNOL|10/27             |5.0         |Most detailed East Germanic-Eastern Europe early history|5.0      |5-9 reseñas        |
|A12EGV286BQ9TS|13/19             |5.0         |Justice as a norm, charity by choice                    |5.0      |5-9 reseñas        |
|A12PHF2F5JMM90|12/16             |5.0         |

                                                                                

Muestra: Score=Alta, Helpfulness=Alta, Usuario=2-4 reseñas, Registros=53819


                                                                                

+--------------+------------------+------------+-------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                   |Score_num|user_group_detailed|
+--------------+------------------+------------+-------------------------------------------------+---------+-------------------+
|A10AAANQU1PBEL|8/11              |5.0         |If you want to UNDERSTAND PROFITS look no further|5.0      |2-4 reseñas        |
|A10LPLG0L3MQ62|33/33             |4.0         |Immigrant Kids - A Must Read!                    |4.0      |2-4 reseñas        |
|A10Q8XL2TD9DFS|15/17             |5.0         |Tis' will touch your heart                       |5.0      |2-4 reseñas        |
|A11I0KEN069MP4|31/38             |5.0         |It's About Time!                                 |5.0      |2-4 reseñas        |
|A11RL6JBZ1JFH6|16/18             |5.0         |Best motivational series I've read               

                                                                                

Muestra: Score=Alta, Helpfulness=Alta, Usuario=1 reseña, Registros=72828


                                                                                

+--------------+------------------+------------+---------------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                                 |Score_num|user_group_detailed|
+--------------+------------------+------------+---------------------------------------------------------------+---------+-------------------+
|A105WH4V8GMM1 |9/10              |4.0         |Provides indepth analysis of the issues involved...            |4.0      |1 reseña           |
|A10DLEX52H8P2M|21/21             |5.0         |Excellent Ideas                                                |5.0      |1 reseña           |
|A10FATH5XNEQ2D|17/25             |5.0         |READ THIS BOOK!!! America's future may depend on it!           |5.0      |1 reseña           |
|A10Y596I3SL84S|57/58             |5.0         |Keats lives                                                    |5.0      |1 reseña           |

                                                                                

Muestra: Score=Alta, Helpfulness=Baja, Usuario=20+ reseñas, Registros=356614


                                                                                

+--------------+------------------+------------+-----------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                           |Score_num|user_group_detailed|
+--------------+------------------+------------+-----------------------------------------+---------+-------------------+
|A101DG7P9E26PW|0/0               |5.0         |I want more!!!                           |5.0      |20+ reseñas        |
|A101DG7P9E26PW|0/0               |5.0         |Transports You to Paris in the 1920s     |5.0      |20+ reseñas        |
|A101DG7P9E26PW|0/0               |4.0         |A Shocking Journey                       |4.0      |20+ reseñas        |
|A10J604M0KKBAS|0/3               |5.0         |Excellent!                               |5.0      |20+ reseñas        |
|A10J604M0KKBAS|2/2               |4.0         |Everything you could ask for in a novel!!|4.0      |20+ reseñas        |
+--------------+----------------

                                                                                

Muestra: Score=Alta, Helpfulness=Baja, Usuario=10-19 reseñas, Registros=190730


                                                                                

+--------------------+------------------+------------+----------------------------------------------------+---------+-------------------+
|user_id             |review/helpfulness|review/score|review/summary                                      |Score_num|user_group_detailed|
+--------------------+------------------+------------+----------------------------------------------------+---------+-------------------+
|A0469729ADTHXTW0CPIS|0/0               |5.0         |Loved it                                            |5.0      |10-19 reseñas      |
|A0919846H34XADJMF99R|1/1               |5.0         |A staple in any library                             |5.0      |10-19 reseñas      |
|A0919846H34XADJMF99R|1/1               |5.0         |A staple in any library                             |5.0      |10-19 reseñas      |
|A1042BIXF6ZMAC      |0/0               |5.0         |A favorite growing up and great memories as an adult|5.0      |10-19 reseñas      |
|A105L4AE1HAC4Y      |1/2         

                                                                                

Muestra: Score=Alta, Helpfulness=Baja, Usuario=5-9 reseñas, Registros=243589


                                                                                

+---------------------+------------------+------------+------------------------------------------------------------+---------+-------------------+
|user_id              |review/helpfulness|review/score|review/summary                                              |Score_num|user_group_detailed|
+---------------------+------------------+------------+------------------------------------------------------------+---------+-------------------+
|A00540411RKGTDNU543WS|1/1               |5.0         |GREAT!                                                      |5.0      |5-9 reseñas        |
|A07084061WTSSXN6VLV92|0/0               |5.0         |One of Oscar Wilde's Best Plays                             |5.0      |5-9 reseñas        |
|A100UD67AHFODS       |0/0               |5.0         |Fantastic book ~ Tonst of information & fabulous photographs|5.0      |5-9 reseñas        |
|A109FU7LXFNVFA       |0/0               |5.0         |A classic                                                   |5.

                                                                                

Muestra: Score=Alta, Helpfulness=Baja, Usuario=2-4 reseñas, Registros=402632


                                                                                

+---------------------+------------------+------------+-------------------------+---------+-------------------+
|user_id              |review/helpfulness|review/score|review/summary           |Score_num|user_group_detailed|
+---------------------+------------------+------------+-------------------------+---------+-------------------+
|A037265110S4AB0GTCV06|0/1               |5.0         |Excellent                |5.0      |2-4 reseñas        |
|A100XA0T0MQBNI       |0/0               |5.0         |Could'nt put it down.    |5.0      |2-4 reseñas        |
|A1012EDQVQFYBT       |2/3               |5.0         |Excellent                |5.0      |2-4 reseñas        |
|A105L53Q4T23EY       |0/0               |5.0         |Always a hit at Storytime|5.0      |2-4 reseñas        |
|A106M5V1P0JHKE       |0/0               |4.0         |Regency Romance!         |4.0      |2-4 reseñas        |
+---------------------+------------------+------------+-------------------------+---------+-------------

                                                                                

Muestra: Score=Alta, Helpfulness=Baja, Usuario=1 reseña, Registros=482065


                                                                                

+---------------------+------------------+------------+----------------------------------------------------------------------------+---------+-------------------+
|user_id              |review/helpfulness|review/score|review/summary                                                              |Score_num|user_group_detailed|
+---------------------+------------------+------------+----------------------------------------------------------------------------+---------+-------------------+
|A074169924XKZ8IJ310GN|0/0               |4.0         |Review                                                                      |4.0      |1 reseña           |
|A101AERBB9U25Y       |0/0               |5.0         |My 16month old boy LOVES this book. It is the perfect size for little hands.|5.0      |1 reseña           |
|A101I9JWTDAE66       |1/1               |5.0         |Twisty. I loved how you went back in time.                                  |5.0      |1 reseña           |
|A10557PVOBSBUZ       

                                                                                

Muestra: Score=Baja, Helpfulness=Alta, Usuario=20+ reseñas, Registros=20245


                                                                                

+--------------+------------------+------------+-----------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                             |Score_num|user_group_detailed|
+--------------+------------------+------------+-----------------------------------------------------------+---------+-------------------+
|A127B67SDWCONL|11/20             |2.0         |Mere Lewis                                                 |2.0      |20+ reseñas        |
|A140XH16IKR4B0|17/19             |3.0         |Great info, ruined by tone                                 |3.0      |20+ reseñas        |
|A140XH16IKR4B0|49/96             |2.0         |Positively a health risk!                                  |2.0      |20+ reseñas        |
|A18BI74KN3ZVW5|26/36             |3.0         |Should Morals and Philosophy Guide Our Society and Economy?|3.0      |20+ reseñas        |
|A18URP1YKAD79S|9/9        

                                                                                

Muestra: Score=Baja, Helpfulness=Alta, Usuario=10-19 reseñas, Registros=9893


                                                                                

+--------------+------------------+------------+------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                        |Score_num|user_group_detailed|
+--------------+------------------+------------+------------------------------------------------------+---------+-------------------+
|A11TYILTAFKPR3|21/23             |3.0         |Good read                                             |3.0      |10-19 reseñas      |
|A14307VEWHLNF3|13/14             |3.0         |Decent Compendium but Very Shallow Understanding      |3.0      |10-19 reseñas      |
|A14307VEWHLNF3|9/15              |3.0         |California Energy Crisis and Progressive Republicanism|3.0      |10-19 reseñas      |
|A17X8U4KYC0LKP|8/45              |1.0         |Dreadful                                              |1.0      |10-19 reseñas      |
|A195QTF4JPPBOX|16/20             |2.0         |Save yourself 

                                                                                

Muestra: Score=Baja, Helpfulness=Alta, Usuario=5-9 reseñas, Registros=13590


                                                                                

+--------------+------------------+------------+-------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                             |Score_num|user_group_detailed|
+--------------+------------------+------------+-------------------------------------------+---------+-------------------+
|A10EJF7MDTG17Y|15/21             |1.0         |Shame on Thomas Nelson for Publishing This.|1.0      |5-9 reseñas        |
|A122CC2BBCN31O|9/10              |1.0         |Look for a different edition....           |1.0      |5-9 reseñas        |
|A122CC2BBCN31O|9/10              |1.0         |Look for a different edition....           |1.0      |5-9 reseñas        |
|A15D9BPFAZTC2B|19/22             |3.0         |Well performed reading of a Gothic classic |3.0      |5-9 reseñas        |
|A16IMM180JAOQU|25/45             |3.0         |Reviews                                    |3.0      |5-9 reseñas        |
+--------------+

                                                                                

Muestra: Score=Baja, Helpfulness=Alta, Usuario=2-4 reseñas, Registros=26554


                                                                                

+--------------+------------------+------------+---------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                         |Score_num|user_group_detailed|
+--------------+------------------+------------+---------------------------------------+---------+-------------------+
|A1086X5YV9M4QF|9/27              |2.0         |Glad I don't serve the God of this book|2.0      |2-4 reseñas        |
|A111DVWFAZPOO1|17/36             |1.0         |Complainsong                           |1.0      |2-4 reseñas        |
|A117795JJCCM61|21/44             |1.0         |shockingly bad                         |1.0      |2-4 reseñas        |
|A12X63TUXQK3L7|14/22             |1.0         |A Derivitave Cluster (Insert Expletive)|1.0      |2-4 reseñas        |
|A1342PFIFCMGHV|16/37             |1.0         |This book is a pack of lies.           |1.0      |2-4 reseñas        |
+--------------+------------------+------------+

                                                                                

Muestra: Score=Baja, Helpfulness=Alta, Usuario=1 reseña, Registros=31847


                                                                                

+--------------+------------------+------------+---------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                               |Score_num|user_group_detailed|
+--------------+------------------+------------+---------------------------------------------+---------+-------------------+
|A111Y9QLUBHKNW|18/21             |3.0         |Disappointing                                |3.0      |1 reseña           |
|A113QTA2TO2AOZ|26/27             |2.0         |Poor Value                                   |2.0      |1 reseña           |
|A12I2VC7GL3WGP|75/86             |3.0         |Planar Exploration                           |3.0      |1 reseña           |
|A12VHPMRKM923V|11/22             |2.0         |Interesting But The Book Title Is Misleading!|2.0      |1 reseña           |
+--------------+------------------+------------+---------------------------------------------+---------+-------------------+


                                                                                

Muestra: Score=Baja, Helpfulness=Baja, Usuario=20+ reseñas, Registros=78467


                                                                                

+--------------+------------------+------------+--------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                        |Score_num|user_group_detailed|
+--------------+------------------+------------+--------------------------------------+---------+-------------------+
|A10J604M0KKBAS|1/1               |3.0         |Good, but spare us the husband's story|3.0      |20+ reseñas        |
|A10O4LYO967IZ |0/0               |3.0         |Governess without a clue              |3.0      |20+ reseñas        |
|A10O4LYO967IZ |0/1               |3.0         |Frivolity of the Victorian Age Abounds|3.0      |20+ reseñas        |
|A123PD1D2BJUR5|0/1               |3.0         |More Tell and fewer Sacketts, please  |3.0      |20+ reseñas        |
|A123PD1D2BJUR5|0/2               |2.0         |L'Amour misfires                      |2.0      |20+ reseñas        |
+--------------+------------------+------------+--------

                                                                                

Muestra: Score=Baja, Helpfulness=Baja, Usuario=10-19 reseñas, Registros=44771


                                                                                

+---------------------+------------------+------------+----------------------------------------------+---------+-------------------+
|user_id              |review/helpfulness|review/score|review/summary                                |Score_num|user_group_detailed|
+---------------------+------------------+------------+----------------------------------------------+---------+-------------------+
|A00891092QIVH4W1YP46A|1/1               |2.0         |I didn't care for this book                   |2.0      |10-19 reseñas      |
|A00891092QIVH4W1YP46A|1/1               |2.0         |I didn't care for this book                   |2.0      |10-19 reseñas      |
|A00891092QIVH4W1YP46A|1/1               |2.0         |I didn't care for this book                   |2.0      |10-19 reseñas      |
|A12BUBPS6ZNZ82       |5/8               |2.0         |Well-written, but the plot is exceedingly dull|2.0      |10-19 reseñas      |
|A13910TC3NZ6LE       |6/15              |2.0         |Flowery Prose 

                                                                                

Muestra: Score=Baja, Helpfulness=Baja, Usuario=5-9 reseñas, Registros=53827


                                                                                

+--------------+------------------+------------+-------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                         |Score_num|user_group_detailed|
+--------------+------------------+------------+-------------------------------------------------------+---------+-------------------+
|A109VVAISTEUKY|1/10              |3.0         |Would have been more enjoyable if it had more direction|3.0      |5-9 reseñas        |
|A10NYSAQCUJCWY|1/1               |2.0         |Kindle edition can't do this Justice                   |2.0      |5-9 reseñas        |
|A10NYSAQCUJCWY|1/1               |2.0         |Kindle edition can't do this Justice                   |2.0      |5-9 reseñas        |
|A11C08FBE0Q9VI|3/6               |3.0         |Did not work for me                                    |3.0      |5-9 reseñas        |
|A11MEX1N34Y5JT|4/13              |1.0         |won't d

                                                                                

Muestra: Score=Baja, Helpfulness=Baja, Usuario=2-4 reseñas, Registros=97797


                                                                                

+--------------+------------------+------------+-------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                         |Score_num|user_group_detailed|
+--------------+------------------+------------+-------------------------------------------------------+---------+-------------------+
|A104EP737ZROHQ|4/10              |1.0         |UGH!!                                                  |1.0      |2-4 reseñas        |
|A105XBOOCZNISQ|7/19              |1.0         |Self published, unreadable, apologia for a child rapist|1.0      |2-4 reseñas        |
|A1086X5YV9M4QF|0/0               |2.0         |Coffin to Heaven                                       |2.0      |2-4 reseñas        |
|A10SJ1NNSQ1CB |3/16              |1.0         |Too much bias to be objective.                         |1.0      |2-4 reseñas        |
|A10W5NFZ9PLX4K|3/10              |1.0         |A pure,

                                                                                

Muestra: Score=Baja, Helpfulness=Baja, Usuario=1 reseña, Registros=104137




+--------------+------------------+------------+------------------------------------------------------------+---------+-------------------+
|user_id       |review/helpfulness|review/score|review/summary                                              |Score_num|user_group_detailed|
+--------------+------------------+------------+------------------------------------------------------------+---------+-------------------+
|A102LSCF3BSKUR|0/8               |1.0         |New Spring by Robert Jordan                                 |1.0      |1 reseña           |
|A107ZPHSRVHZ00|1/1               |2.0         |Kindle Download/Publisher Proofreading Errors               |2.0      |1 reseña           |
|A109PJ5IB5R6N2|4/6               |3.0         |Out of date content! However, good explanation, good CD-ROM.|3.0      |1 reseña           |
|A10R2Z2FQLPYEY|1/2               |1.0         |kindle version full of typos                                |1.0      |1 reseña           |
|A10Y0YM81YDLIT|1/1 

                                                                                

**Technique applied**: *Stratified sampling* over combinations of score, helpfulness, and user type.

This technique is used because sampling every stratum at the same rate (10% here) keeps each subgroup represented in roughly its population proportion.

**Rationale:**
- Very active users can dominate the sample if it is not stratified.
- Helpful, highly rated reviews have greater influence on recommendations.
- Cross-analyzing these variables captures different reader behaviors.
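
As a rough illustration of the idea in plain Python (this is not the notebook's Spark code; `stratified_sample` and the toy records are hypothetical), a stratified sample groups records by a stratum key and draws the same fraction from each group, so every stratum keeps roughly its share of the population:

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=42):
    """Draw about `fraction` of the records from every stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for members in strata.values():
        # Take at least one record per stratum so small strata are never lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy population: 80 "Alta" and 20 "Baja" reviews.
population = [{"id": i, "score_group": "Alta" if i < 80 else "Baja"} for i in range(100)]
sample = stratified_sample(population, key=lambda r: r["score_group"], fraction=0.1)

counts = defaultdict(int)
for rec in sample:
    counts[rec["score_group"]] += 1
print(dict(counts))  # {'Alta': 8, 'Baja': 2}
```

Unlike this sketch, Spark's `DataFrame.sample` does not guarantee an exact fraction, so the per-stratum sample sizes in the notebook are only approximately 10%. `DataFrame.sampleBy` is an alternative worth considering when stratifying on a single column.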