# Amazon Products Data Analysis

This notebook loads every JSON file in the `data/extractions/amazon/` folder and converts each one into a pandas DataFrame for analysis.

In [1]:
import pandas as pd
import json
from pathlib import Path

## Load all JSON files

In [2]:
# Directory containing the JSON files
data_dir = Path('data/extractions/amazon')

# Find all JSON files
json_files = list(data_dir.glob('*.json'))

print(f"📁 JSON files found: {len(json_files)}")
for file in json_files:
    print(f"  - {file.name}")

📁 JSON files found: 3
  - amazon_cafe.json
  - amazon_monitor_gaming.json
  - amazon_leche_de_vaca.json


## Convert each JSON file to a DataFrame

In [3]:
# Dictionary that stores all DataFrames, keyed by file name
dataframes = {}

for json_file in json_files:
    # Read the JSON file
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    # Convert to a DataFrame
    df = pd.DataFrame(data)
    
    # Use the file name (without extension) as the key
    df_name = json_file.stem
    dataframes[df_name] = df
    
    print(f"✅ {df_name}: {len(df)} products, {len(df.columns)} columns")

print(f"\n📊 Total DataFrames created: {len(dataframes)}")

✅ amazon_cafe: 50 products, 26 columns
✅ amazon_monitor_gaming: 50 products, 26 columns
✅ amazon_leche_de_vaca: 50 products, 26 columns

📊 Total DataFrames created: 3
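
Several of the 26 columns hold nested values rather than scalars: `product_overview` is a dict and `specifications` is a list of label/value dicts. The following is a minimal sketch of one way to flatten a dict column with `pd.json_normalize`. The records are hypothetical stand-ins, and the sketch assumes every `product_overview` value is a dict (possibly empty).

```python
import pandas as pd

# Hypothetical records shaped like the scraped products (illustrative values only)
records = [
    {"asin": "A1", "product_overview": {"Brand Name": "X", "Item Form": "Whole Bean"}},
    {"asin": "A2", "product_overview": {"Brand Name": "Y"}},
    {"asin": "A3", "product_overview": {}},
]
df = pd.DataFrame(records)

# Expand each dict into its own columns; keys missing from a record become NaN
overview = pd.json_normalize(df["product_overview"].tolist()).add_prefix("overview_")

# Replace the nested column with the flattened columns
flat = pd.concat([df.drop(columns="product_overview"), overview], axis=1)
print(flat)
```

The prefix keeps the new columns from colliding with existing ones such as `brand`.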


## Explore the DataFrames

### Overview of each DataFrame

In [4]:
# Show a summary of each DataFrame
for name, df in dataframes.items():
    print(f"\n{'='*60}")
    print(f"📦 {name.upper()}")
    print(f"{'='*60}")
    print(f"Number of products: {len(df)}")
    print(f"Columns: {list(df.columns)}")
    print("\nFirst 3 rows:")
    display(df.head(3))


📦 AMAZON_CAFE
Number of products: 50
Columns: ['asin', 'title', 'brand', 'price', 'original_price', 'discount', 'rating', 'reviews_count', 'has_prime', 'free_shipping', 'availability', 'seller', 'options', 'additional_specs', 'url', 'image_url', 'search_term', 'position', 'specifications', 'product_overview', 'nutrition_facts', 'ingredients', 'description', 'features', 'dimensions', 'weight']

First 3 rows:


Unnamed: 0,asin,title,brand,price,original_price,discount,rating,reviews_count,has_prime,free_shipping,...,search_term,position,specifications,product_overview,nutrition_facts,ingredients,description,features,dimensions,weight
0,B0BHSRPJMJ,"By Amazon Colombian Coffee Bean, 100% Arabica,...",by Amazon,€18.92,€18.92,10% off your first subscription order,4.0 out of 5 stars,0,False,False,...,cafe,1,"[{'label': 'Product Dimensions', 'value': '20...","{'Brand Name': 'by Amazon', 'Item Form': 'Whol...",{},,"At our Selection by Amazon, we pride ourselves...","[2 packs: 2 x 500g, total 1kg, Suitable for al...",20.5 x 18.8 x 14.8 cm; 1.12 kg,1.12 kg
1,B08J3DZ3FF,Incapto Specialty Coffee Bean 1kg | Origin Bra...,Incapto,€33.95,€33.95,€3.00 off your first subscription order,4.1 out of 5 stars,0,False,False,...,cafe,2,"[{'label': 'Product Dimensions', 'value': '10...","{'Brand Name': 'Incapto', 'Item Form': 'Whole ...",{},Safety Information:\n\nPut the necessary amoun...,,[💚 TASTING NOTES - This coffee from Brazil is ...,10 x 22.5 x 33.5 cm; 1 kg,1 kg
2,B0D2Y7VHCG,"CLASSIC ARABICA COFFEE ORGANIC BEAN, 1 KG (NAT...",Naturela,€24.96,€24.96,,4.1 out of 5 stars,0,False,False,...,cafe,3,"[{'label': 'Package Dimensions', 'value': '30...","{'Brand Name': 'Naturela', 'Item Form': 'Whole...",{},Safety Information:\n\nConsumption not recomme...,"CLASSIC ARABICA COFFEE ORGANIC BEAN, 1 KG (NAT...",[100% ORGANIC ARABICA COFFEE BEANS - Sourced f...,30.7 x 14.2 x 13.3 cm; 1 kg,1 kg



📦 AMAZON_MONITOR_GAMING
Number of products: 50
Columns: ['asin', 'title', 'brand', 'price', 'original_price', 'discount', 'rating', 'reviews_count', 'has_prime', 'free_shipping', 'availability', 'seller', 'options', 'additional_specs', 'url', 'image_url', 'search_term', 'position', 'specifications', 'product_overview', 'nutrition_facts', 'ingredients', 'description', 'features', 'dimensions', 'weight']

First 3 rows:


Unnamed: 0,asin,title,brand,price,original_price,discount,rating,reviews_count,has_prime,free_shipping,...,search_term,position,specifications,product_overview,nutrition_facts,ingredients,description,features,dimensions,weight
0,B0D54DTG3X,ASUS ROG Swift OLED PG39WCDM - 39-inch Curved ...,ASUS,"€1,499.00","€1,599.00",,3.7 out of 5 stars,"Price, product page\n€1,499.00\n€1,499\n.\n00 ...",False,False,...,monitor gaming,1,"[{'label': 'Brand', 'value': 'ASUS'}, {'label...","{'Brand Name': 'ASUS', 'Screen Size': '39 Inch...",{},,,[39-inch curved (3440 x 1440) OLED gaming moni...,31.8 x 89 x 57.5 cm; 9.61 kg,
1,B0C23M7Y9V,"KOORUI 24E6CA Curved Gaming Monitor 24 Inch, 1...",KOORUI,€94.99,€99.99,,4.5 out of 5 stars,100+ bought in past month,False,False,...,monitor gaming,2,"[{'label': 'Brand', 'value': 'KOORUI'}, {'lab...","{'Brand Name': 'KOORUI', 'Screen Size': '24.0'...",{},,,[[24'' Full HD Ultra-Thin Curved Monitor 1500R...,6.8 x 53.7 x 31.9 cm; 4.89 kg,4.89 kg
2,B0FGX5KPJ5,KTC 27-inch Gaming Monitor | 2K@200Hz | Built-...,KTC,€199.99,,,4.5 out of 5 stars,"Price, product page\n€199.99\n€199\n.\n99",False,False,...,monitor gaming,3,"[{'label': 'Brand', 'value': 'KTC'}, {'label'...","{'Brand Name': 'KTC', 'Screen Size': '27 Inche...",{},,,[[2K QHD / 200Hz / 1ms] This 27-inch gaming mo...,21.3 x 61.6 x 53.6 cm; 8.3 kg,8.3 kg



📦 AMAZON_LECHE_DE_VACA
Number of products: 50
Columns: ['asin', 'title', 'brand', 'price', 'original_price', 'discount', 'rating', 'reviews_count', 'has_prime', 'free_shipping', 'availability', 'seller', 'options', 'additional_specs', 'url', 'image_url', 'search_term', 'position', 'specifications', 'product_overview', 'nutrition_facts', 'ingredients', 'description', 'features', 'dimensions', 'weight']

First 3 rows:


Unnamed: 0,asin,title,brand,price,original_price,discount,rating,reviews_count,has_prime,free_shipping,...,search_term,position,specifications,product_overview,nutrition_facts,ingredients,description,features,dimensions,weight
0,B01ITRIBGU,"Central Lechera Asturiana Semi-skimmed milk, P...",Central Lechera Asturiana,€6.60,€1.10,,4.6 out of 5 stars,0,False,False,...,leche de vaca,1,"[{'label': 'Package Dimensions', 'value': '26...","{'Brand Name': 'Central Lechera Asturiana', 'F...","{'Energy': '190kJ', 'Fat': '1.55g', 'Saturates...",Safety Information:\n\nKeep Refrigerated,Traditional Brik Milk 1L Semi-skimmed. With a ...,[Semi-skimmed cow's milk preserving the necess...,26.9 x 22.4 x 13.8 cm; 6.42 kg,6 kg
1,B01DUW8BPA,Pascual - Pascual Classic Whole Milk. Format 6...,Pascual,€7.44,€1.24,,4.6 out of 5 stars,0,False,False,...,leche de vaca,2,"[{'label': 'Package Dimensions', 'value': '23...","{'Brand Name': 'Pascual', 'Flavour': 'Whole Mi...",{},Safety Information:\n\nStore in refrigerator a...,Pack of 6 Pascual Skimmed Animal Welfare Milk ...,"[MILK LOVERS: Your usual milk, now packaged in...",23 x 22.5 x 16 cm; 6.4 kg,
2,B00CAKL1C4,Lauki Semi-skimmed milk - Pack of 6 x 1 L- Tot...,Lauki,,,,4.7 out of 5 stars,0,False,False,...,leche de vaca,3,"[{'label': 'Product Dimensions', 'value': '22...","{'Brand Name': 'Lauki', 'Flavour': 'Tasteless'...",{},"Safety Information:\n\nStore in a cool, dry pl...",Lauki Semi-skimmed milk - Pack 6 x 1 L - Total...,[High quality semi-skimmed milk with Animal We...,22.5 x 15 x 21.6 cm; 6.5 kg,6.5 kg
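
The previews show several kinds of "missing" values: blank cells, empty `{}` dicts, and (in the brand analysis further down) an `'N/A'` placeholder. A quick audit of how sparse each column is can help decide which fields are worth analysing. The sketch below is one possible approach; the helper `is_empty` and the sample frame are hypothetical, not part of the original notebook.

```python
import pandas as pd

def is_empty(value) -> bool:
    """Treat None/NaN, blank strings, the 'N/A' placeholder and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, float) and pd.isna(value):
        return True
    if isinstance(value, str):
        return value.strip() in ("", "N/A")
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False

# Hypothetical sample mimicking the shapes seen in the previews
sample = pd.DataFrame({
    "price": ["€6.60", "", None],
    "nutrition_facts": [{"Energy": "190kJ"}, {}, {}],
    "brand": ["Lauki", "N/A", "Pascual"],
})

# Fraction of empty values per column
empty_share = sample.apply(lambda col: col.map(is_empty).mean())
print(empty_share)
```

Applied to one of the real frames, the same expression would be `dataframes['amazon_cafe'].apply(lambda col: col.map(is_empty).mean())`.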


## Accessing each DataFrame individually

You can access each DataFrame by its name:

In [5]:
# Access examples:
# df_cafe = dataframes.get('amazon_cafe')
# df_leche = dataframes.get('amazon_leche_de_vaca')
# df_monitor = dataframes.get('amazon_monitor_gaming')

# Show the available DataFrames
print("Available DataFrames:")
for name in dataframes:
    print(f"  - dataframes['{name}']")

Available DataFrames:
  - dataframes['amazon_cafe']
  - dataframes['amazon_monitor_gaming']
  - dataframes['amazon_leche_de_vaca']


## Basic price analysis

In [6]:
for name, df in dataframes.items():
    print(f"\n📊 Price analysis - {name}")
    print("-" * 50)
    
    # Clean and convert prices
    if 'price' in df.columns:
        # Strip the currency symbol and convert to a number.
        # Note: every ',' is replaced with '.', so a value with a thousands
        # separator such as '€1,499.00' becomes '1.499.00' and is coerced to NaN.
        df['price_numeric'] = df['price'].str.replace('€', '').str.replace(',', '.').str.strip()
        df['price_numeric'] = pd.to_numeric(df['price_numeric'], errors='coerce')
        
        # Summary statistics
        if df['price_numeric'].notna().any():
            print(f"Average price: €{df['price_numeric'].mean():.2f}")
            print(f"Minimum price: €{df['price_numeric'].min():.2f}")
            print(f"Maximum price: €{df['price_numeric'].max():.2f}")
            print(f"Median: €{df['price_numeric'].median():.2f}")
        else:
            print("No valid prices found")


📊 Price analysis - amazon_cafe
--------------------------------------------------
Average price: €17.46
Minimum price: €2.60
Maximum price: €49.95
Median: €15.18

📊 Price analysis - amazon_monitor_gaming
--------------------------------------------------
Average price: €143.38
Minimum price: €69.00
Maximum price: €799.00
Median: €119.00

📊 Price analysis - amazon_leche_de_vaca
--------------------------------------------------
Average price: €8.24
Minimum price: €1.17
Maximum price: €28.99
Median: €7.14
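
The price cell replaces every comma with a period, so a string with a thousands separator such as `€1,499.00` turns into `1.499.00` and is coerced to `NaN`. That appears to explain why the monitor statistics top out at €799.00 even though the preview lists a €1,499.00 product. Below is a minimal sketch of a more tolerant parser. The function name `parse_price` is hypothetical, and it assumes a comma is always a thousands separator and a period the decimal mark, which matches the price strings shown in the previews.

```python
import re
import pandas as pd

def parse_price(text):
    """Parse strings like '€1,499.00' or '€94.99' into a float.

    Returns None when the input is not a string or contains no number.
    Assumes ',' groups thousands and '.' marks decimals.
    """
    if not isinstance(text, str):
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

# Illustrative inputs: two formats from the previews, a blank, and a missing value
prices = pd.Series(["€1,499.00", "€94.99", "", None])
numeric = pd.to_numeric(prices.map(parse_price), errors="coerce")
print(numeric.tolist())
```

In the notebook this would replace the two assignment lines with `df['price_numeric'] = pd.to_numeric(df['price'].map(parse_price), errors='coerce')`. The printed statistics would then change, since rows that are currently dropped would be included.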


## Ratings analysis

In [7]:
for name, df in dataframes.items():
    print(f"\n⭐ Ratings analysis - {name}")
    print("-" * 50)
    
    if 'rating' in df.columns:
        # Extract the numeric value from strings like "4.5 out of 5 stars"
        df['rating_numeric'] = df['rating'].str.extract(r'(\d+\.?\d*)')[0]
        df['rating_numeric'] = pd.to_numeric(df['rating_numeric'], errors='coerce')
        
        if df['rating_numeric'].notna().any():
            print(f"Average rating: {df['rating_numeric'].mean():.2f} stars")
            print(f"Highest rating: {df['rating_numeric'].max():.2f} stars")
            print(f"Lowest rating: {df['rating_numeric'].min():.2f} stars")
            print("\nTop 3 highest-rated products:")
            top_rated = df.nlargest(3, 'rating_numeric')[['title', 'rating_numeric', 'price']]
            display(top_rated)
        else:
            print("No valid ratings found")


⭐ Ratings analysis - amazon_cafe
--------------------------------------------------
Average rating: 4.32 stars
Highest rating: 4.70 stars
Lowest rating: 3.80 stars

Top 3 highest-rated products:


Unnamed: 0,title,rating_numeric,price
15,NESCAFÉ Dolce Gusto Café con Leche - Coffee Ca...,4.7,€25.65
32,STARBUCKS Nespresso Roast Espresso Variety Pac...,4.7,€39.90
33,Delta Cafés - Coffee in Gold Bean - 2 Packs of...,4.7,€43.87



⭐ Ratings analysis - amazon_monitor_gaming
--------------------------------------------------
Average rating: 4.43 stars
Highest rating: 4.80 stars
Lowest rating: 3.70 stars

Top 3 highest-rated products:


Unnamed: 0,title,rating_numeric,price
27,"MSI mag 244F, 24"",FHD,1920x1080 - Gaming Monit...",4.8,€123.58
3,"MSI mag 27C6F 27"" Curved Gaming Monitor FHD, 1...",4.6,€109.00
4,"Lenovo Legion R24e - FHD 23.8"" Gaming Monitor ...",4.6,€84.99



⭐ Ratings analysis - amazon_leche_de_vaca
--------------------------------------------------
Average rating: 4.46 stars
Highest rating: 5.00 stars
Lowest rating: 1.00 stars

Top 3 highest-rated products:


Unnamed: 0,title,rating_numeric,price
32,"Larsa Leche Semidesnatada, Pack 6 x 1L",5.0,€6.84
2,Lauki Semi-skimmed milk - Pack of 6 x 1 L- Tot...,4.7,
9,"Central Lechera Asturiana Whole Milk, Pack 6 x...",4.7,€2.25
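
The pattern `(\d+\.?\d*)` captures the first number anywhere in the string, so stray text that lands in the `rating` field would also yield a "rating". The `reviews_count` preview shows that scraped fields can contain such text (for example `100+ bought in past month`). The following sketch, offered as one possible tightening rather than the notebook's method, only accepts a number followed by "out of 5" and discards values outside the 0 to 5 scale. The sample strings are illustrative.

```python
import pandas as pd

# Illustrative inputs: two valid ratings, a placeholder, a missing value, stray text
ratings = pd.Series([
    "4.5 out of 5 stars",
    "5.0 out of 5 stars",
    "N/A",
    None,
    "100+ bought in past month",
])

# Capture the number only when it is followed by "out of 5"
rating_numeric = pd.to_numeric(
    ratings.str.extract(r"(\d+(?:\.\d+)?)\s+out of 5", expand=False),
    errors="coerce",
)

# Keep only values on the 0-5 scale; everything else becomes NaN
rating_numeric = rating_numeric.where(rating_numeric.between(0, 5))
print(rating_numeric.tolist())
```

With the looser pattern, the last sample string would have been read as a rating of 100.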


## Brand analysis

In [8]:
for name, df in dataframes.items():
    print(f"\n🏷️ Brand analysis - {name}")
    print("-" * 50)
    
    if 'brand' in df.columns:
        # Count products per brand, ignoring the 'N/A' placeholder
        brand_counts = df['brand'].value_counts()
        print(f"Number of unique brands: {brand_counts[brand_counts.index != 'N/A'].count()}")
        print("\nTop 5 brands:")
        print(brand_counts[brand_counts.index != 'N/A'].head())


🏷️ Brand analysis - amazon_cafe
--------------------------------------------------
Number of unique brands: 19

Top 5 brands:
brand
by Amazon        9
Delta Cafés      8
SAULA PREMIUM    4
Lavazza          4
CAFETERÍA        3
Name: count, dtype: int64

🏷️ Brand analysis - amazon_monitor_gaming
--------------------------------------------------
Number of unique brands: 14

Top 5 brands:
brand
AOC       8
ASUS      6
LG        6
KOORUI    6
MSI       5
Name: count, dtype: int64

🏷️ Brand analysis - amazon_leche_de_vaca
--------------------------------------------------
Number of unique brands: 14

Top 5 brands:
brand
Central Lechera Asturiana    15
Pascual                       8
Puleva                        5
Lauki                         2
by Amazon                     2
Name: count, dtype: int64


## Export the combined DataFrame (optional)

In [None]:
# To combine all DataFrames into a single one:
combined_df = pd.concat(dataframes.values(), ignore_index=True)

print("📊 Combined DataFrame created:")
print(f"  - Total products: {len(combined_df)}")
print(f"  - Columns: {len(combined_df.columns)}")

# Optional: save as CSV
# combined_df.to_csv('data/all_products.csv', index=False, encoding='utf-8')
# print("✅ Saved to data/all_products.csv")
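
Once the frames are concatenated with `ignore_index=True`, the dictionary key each row came from is no longer attached to it. The data already carries a `search_term` column that serves a similar purpose, but when such a column is absent, passing the dict itself to `pd.concat` labels every row with its key. The sketch below uses small hypothetical stand-ins for the real frames and a hypothetical column name `source`.

```python
import pandas as pd

# Hypothetical stand-ins for the per-search DataFrames
dataframes = {
    "amazon_cafe": pd.DataFrame({"asin": ["A1", "A2"], "price_numeric": [18.92, 33.95]}),
    "amazon_monitor_gaming": pd.DataFrame({"asin": ["B1"], "price_numeric": [94.99]}),
}

# Passing the dict makes its keys the outer index level, which is then
# moved into a regular column named "source"
combined = (
    pd.concat(dataframes, names=["source", "row"])
    .reset_index(level="source")
    .reset_index(drop=True)
)

# Per-source summary on the combined frame
summary = combined.groupby("source")["price_numeric"].agg(["count", "mean"])
print(summary)
```

The same `groupby` works on `search_term` in the real data, for example `combined_df.groupby('search_term')['price_numeric'].median()`.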