# Analysis of Simulated Digital Product Sales

This project analyzes consumption patterns in the digital products market (ebooks, online courses, templates, music/licenses) using a simulated dataset.

## Project outline
```mermaid
graph TD
    B[1 <br> Load data] --> C[2 <br> Basic validation]
    C --> D[3 <br> Cleaning]
    D --> E[4 <br> Temporal enrichment]
    E --> F[5 <br> Global KPIs]
    F --> G[6 <br> EDA by dimension]
    G --> H[7 <br> Cohorts and repeat purchases]
    H --> I[8 <br> Simple segmentation]
    I --> J[9 <br> Key visualizations]
    J --> K[10 <br> Findings and recommendations]
    K --> L[11 <br> Dashboard]
```
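The outline reads naturally as a chain of DataFrame transformations. The sketch below shows one possible way to wire steps 2 to 4 with `DataFrame.pipe`. The function names (`validate`, `clean`, `enrich_time`) and the rules inside them (filling a missing `region` with `"Unknown"`, deriving `month` and `weekday` from `purchase_dt`) are hypothetical placeholders, not the project's actual cleaning or enrichment logic.

```python
import pandas as pd

# Hypothetical step functions mirroring part of the outline; names and rules
# are assumptions for illustration only.
def validate(df: pd.DataFrame) -> pd.DataFrame:
    # Step 2: fail fast if key columns are absent or order_id repeats
    required = {"order_id", "customer_id", "purchase_dt", "net_revenue_usd"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if not df["order_id"].is_unique:
        raise ValueError("duplicate order_id values")
    return df

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Step 3 (assumed rule): label missing regions instead of dropping rows
    return df.assign(region=df["region"].fillna("Unknown"))

def enrich_time(df: pd.DataFrame) -> pd.DataFrame:
    # Step 4 (assumed rule): parse the timestamp and derive calendar fields
    dt = pd.to_datetime(df["purchase_dt"])
    return df.assign(purchase_dt=dt,
                     month=dt.dt.to_period("M"),
                     weekday=dt.dt.day_name())

# Two rows taken from the head() output of the real dataset
raw = pd.DataFrame({
    "order_id": [6824, 6387],
    "customer_id": [2, 1],
    "purchase_dt": ["2024-09-23 00:32:55", "2024-09-23 01:00:03"],
    "net_revenue_usd": [53.61, 112.79],
    "region": [None, "LATAM"],
})

prepared = raw.pipe(validate).pipe(clean).pipe(enrich_time)
print(prepared[["order_id", "region", "month", "weekday"]])
```

Keeping each step as a pure function that takes and returns a DataFrame makes it easy to rerun the chain from the raw file whenever a rule changes.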

```python
# 1. Load data
from pathlib import Path

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Path relative to the project root (digital-sales-analytics/); the original
# notebook used an absolute Windows path on the author's machine
DATA_PATH = Path("data") / "digital_products_sales_simulated.csv"

data = pd.read_csv(DATA_PATH)
data.head()
```

| | order_id | customer_id | product_name | category | price_usd | quantity | discount_rate | gross_amount_usd | net_revenue_usd | purchase_dt | region | channel | payment_method |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6824 | 2 | Musica-44 | Musica | 53.61 | 1 | 0.0 | 53.61 | 53.61 | 2024-09-23 00:32:55 | NaN | Email | PayPal |
| 1 | 6387 | 1 | Curso-27 | Curso | 118.73 | 1 | 0.05 | 118.73 | 112.79 | 2024-09-23 01:00:03 | LATAM | Marketplace | Card |
| 2 | 1623 | 126 | Curso-03 | Curso | 98.43 | 1 | 0.0 | 98.43 | 98.43 | 2024-09-23 03:01:31 | EU | SocialAds | Card |
| 3 | 1146 | 2 | Plantilla-06 | Plantilla | 10.32 | 1 | 0.0 | 10.32 | 10.32 | 2024-09-23 03:03:24 | EU | Marketplace | Card |
| 4 | 2146 | 2 | Plantilla-46 | Plantilla | 23.09 | 2 | 0.05 | 46.19 | 43.88 | 2024-09-23 03:27:10 | LATAM | Website | Card |

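`read_csv` loads `purchase_dt` as a plain `object` column, so any time-based step has to convert it first. A minimal sketch of parsing it at load time is shown below, run on two rows copied from the `head()` output instead of the real file. It assumes the real CSV has the same header that `info()` reports; storing the low-cardinality text columns as `category` is an optional choice made here for illustration, not something the notebook does.

```python
import io

import pandas as pd

# Two rows copied from the head() output, standing in for the real CSV file
csv_text = """order_id,customer_id,product_name,category,price_usd,quantity,discount_rate,gross_amount_usd,net_revenue_usd,purchase_dt,region,channel,payment_method
6824,2,Musica-44,Musica,53.61,1,0.0,53.61,53.61,2024-09-23 00:32:55,,Email,PayPal
6387,1,Curso-27,Curso,118.73,1,0.05,118.73,112.79,2024-09-23 01:00:03,LATAM,Marketplace,Card
"""

sales = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["purchase_dt"],  # datetime64 instead of object
    dtype={  # optional: repeated labels stored as categoricals
        "category": "category",
        "channel": "category",
        "payment_method": "category",
        "region": "category",
    },
)
print(sales.dtypes)
print("missing regions:", sales["region"].isna().sum())
```

The empty `region` field in the first row is read as a missing value, the same way it appears as `NaN` in the `head()` output.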

```python
# 2. Basic validation
# Column dtypes, non-null counts and memory usage
data.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8000 entries, 0 to 7999
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   order_id          8000 non-null   int64  
 1   customer_id       8000 non-null   int64  
 2   product_name      8000 non-null   object 
 3   category          8000 non-null   object 
 4   price_usd         8000 non-null   float64
 5   quantity          8000 non-null   int64  
 6   discount_rate     8000 non-null   float64
 7   gross_amount_usd  8000 non-null   float64
 8   net_revenue_usd   8000 non-null   float64
 9   purchase_dt       8000 non-null   object 
 10  region            5172 non-null   object 
 11  channel           8000 non-null   object 
 12  payment_method    8000 non-null   object 
dtypes: float64(4), int64(3), object(6)
memory usage: 812.6+ KB
```
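In the `info()` output, `region` is the only column with fewer than 8000 non-null values: 5172 present, so 2828 missing, about 35% of rows. A small table of missing counts and shares per column makes this easier to read than the raw `info()` listing. The sketch below uses a four-row stand-in frame built from the `head()` rows; on the real data the same two expressions would run on `data`.

```python
import pandas as pd

# Four-row stand-in built from the head() output; use `data` on the real dataset
df = pd.DataFrame({
    "order_id": [6824, 6387, 1623, 1146],
    "region": [None, "LATAM", "EU", "EU"],
    "channel": ["Email", "Marketplace", "SocialAds", "Marketplace"],
})

# Missing-value count and share per column
nulls = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "share_missing": df.isna().mean(),
})

# Keep only the columns that actually have gaps
nulls = nulls[nulls["n_missing"] > 0]
print(nulls)
```

Whether to drop, impute, or label the missing regions is a cleaning decision (step 3); this table only measures the gap.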


```python
data.describe()
```

| | order_id | customer_id | price_usd | quantity | discount_rate | gross_amount_usd | net_revenue_usd |
|---|---|---|---|---|---|---|---|
| count | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 |
| mean | 4000.5 | 140.232375 | 44.573959 | 1.208 | 0.01285 | 47.446583 | 46.83088 |
| std | 2309.54541 | 364.348402 | 49.647697 | 0.499016 | 0.026715 | 48.612188 | 48.01961 |
| min | 1.0 | 1.0 | 3.0 | 1.0 | 0.0 | 3.0 | 2.55 |
| 25% | 2000.75 | 2.0 | 11.8375 | 1.0 | 0.0 | 13.45 | 13.29 |
| 50% | 4000.5 | 8.0 | 18.67 | 1.0 | 0.0 | 26.01 | 25.715 |
| 75% | 6000.25 | 62.0 | 60.625 | 1.0 | 0.0 | 65.02 | 64.05 |
| max | 8000.0 | 2498.0 | 246.83 | 3.0 | 0.15 | 246.83 | 236.28 |
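The summary shows that at least 75% of orders carry no discount (the 75th percentile of `discount_rate` is 0), and the column names suggest two derived amounts: gross as `price_usd * quantity` and net revenue as `gross_amount_usd * (1 - discount_rate)`. Those relationships are an assumption, not documented in the notebook. A minimal sketch for checking them recomputes both amounts with a one-cent tolerance, shown here on the five `head()` rows; on the full dataset the same comparisons would run against `data`.

```python
import numpy as np
import pandas as pd

# The five rows from the head() output
df = pd.DataFrame({
    "price_usd": [53.61, 118.73, 98.43, 10.32, 23.09],
    "quantity": [1, 1, 1, 1, 2],
    "discount_rate": [0.0, 0.05, 0.0, 0.0, 0.05],
    "gross_amount_usd": [53.61, 118.73, 98.43, 10.32, 46.19],
    "net_revenue_usd": [53.61, 112.79, 98.43, 10.32, 43.88],
})

# Assumed relationships; a ~1-cent absolute tolerance absorbs rounding
# (e.g. 23.09 * 2 = 46.18 while 46.19 is stored).
tol = 0.011
gross_ok = np.isclose(df["price_usd"] * df["quantity"],
                      df["gross_amount_usd"], rtol=0, atol=tol)
net_ok = np.isclose(df["gross_amount_usd"] * (1 - df["discount_rate"]),
                    df["net_revenue_usd"], rtol=0, atol=tol)

print(f"gross mismatches: {(~gross_ok).sum()}, net mismatches: {(~net_ok).sum()}")
```

Any rows flagged on the full dataset would point either to a different pricing rule or to simulation noise worth documenting before computing revenue KPIs.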


```python
data.shape
```

```text
(8000, 13)
```
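`describe()` treats `order_id` and `customer_id` as ordinary numbers, so their means and standard deviations carry no meaning, but their quantiles are informative. `order_id` spans 1 to 8000 across 8000 rows, which fits (without proving) one unique ID per order. `customer_id` has a 25th percentile of 2 and a median of 8, so at least a quarter of orders belong to customer IDs 1 or 2 and at least half to IDs 1 through 8, a concentration that matters for the cohort and repeat-purchase step. A minimal sketch of both checks on the five `head()` rows follows; on the full data, replace `df` with `data`.

```python
import pandas as pd

# order_id / customer_id pairs from the head() output
df = pd.DataFrame({
    "order_id": [6824, 6387, 1623, 1146, 2146],
    "customer_id": [2, 1, 126, 2, 2],
})

# Integrity: each order should appear exactly once
assert df["order_id"].is_unique, "order_id should identify a single row"
assert not df.duplicated().any(), "fully duplicated rows found"

# Orders per customer, from most to least active
orders_per_customer = df["customer_id"].value_counts()
top_share = orders_per_customer.iloc[0] / len(df)

print(orders_per_customer)
print(f"Top customer share of orders: {top_share:.0%}")
```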