# E-commerce Business Analytics (Interactive D3.js Version)

This version (`EDA_Refactored_v2.ipynb`) introduces interactive visualizations built with **D3.js v7** to enrich the exploration of key metrics: revenue, categories, geography, customer satisfaction, and logistics performance.

---
## Table of Contents
1. Introduction and Objectives
2. Parameter Configuration
3. Data Loading and Processing
4. Business Metrics Calculation
5. D3 Integration Helpers
6. Theme Toggle (Light/Dark)
7. Interactive KPIs
8. Monthly Revenue Line
9. Top Categories
10. Choropleth Map
11. Review Distribution
12. Delivery vs Review (Optional)
13. Insights and Recommendations
14. TODO Checklist / Next Steps

---
### 1. Introduction and Objectives
This notebook replicates the analytical logic of the original notebook and adds:
- Interactive visualizations built directly with D3.js.
- Animations and smooth transitions.
- Light/dark theme support.
- Basic accessibility (role / aria-label / focus).
- Modular structure: data in Python → embedded JSON → rendering in JS.
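
The last bullet (Python data, embedded JSON, JavaScript rendering) can be illustrated with a minimal sketch. This is not the notebook's actual export cell: `script_tag` is a hypothetical helper name, the sample values are made up, and escaping `</` is an added precaution so that a string value cannot close the `<script>` element early.

```python
import json


def script_tag(var_name, data):
    """Return a <script> element that assigns `data` to window.<var_name>.

    Hypothetical helper for illustration only.
    """
    payload = json.dumps(data, ensure_ascii=False)
    # "</" inside a string value could terminate the <script> element early,
    # so emit it as "<\/", which JavaScript reads as the same two characters.
    payload = payload.replace("</", "<\\/")
    return f"<script>window.{var_name} = {payload};</script>"


# Made-up sample values, only to show the resulting markup.
tag = script_tag("kpiMetrics", {"total_orders": 10000, "note": "</script>"})
print(tag)
```

Keeping the serialization in one function means every chart receives its data through the same path, which makes it easier to add checks or escaping later.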


In [6]:
# 2. Parameter Configuration
ANALYSIS_YEAR = 2023
COMPARISON_YEAR = 2022  # May be None
ANALYSIS_MONTH = None   # 1-12 or None
DATA_PATH = "ecommerce_data/"

print(f"Analysis: {ANALYSIS_YEAR} | Comparison: {COMPARISON_YEAR if COMPARISON_YEAR else 'N/A'} | Month: {ANALYSIS_MONTH if ANALYSIS_MONTH else 'Full year'}")

Analysis: 2023 | Comparison: 2022 | Month: Full year


In [7]:
# 3. Data Loading and Processing
import pandas as pd
import json
from data_loader import EcommerceDataLoader, load_and_process_data
from business_metrics import BusinessMetricsCalculator

loader, processed_data = load_and_process_data(DATA_PATH)
summary = loader.get_data_summary()
print("Datasets loaded:")
for k, v in summary.items():
    print(f"- {k}: {v['rows']} rows, {v['columns']} columns")

Loaded orders: 10,000 records
Loaded order_items: 16,047 records
Loaded products: 6,000 records
Loaded customers: 8,000 records
Loaded reviews: 6,571 records
Loaded payments: 14,091 records
Datasets loaded:
- orders: 10000 rows, 11 columns
- order_items: 16047 rows, 8 columns
- reviews: 6571 rows, 7 columns


In [8]:
# 4. Business Metrics Calculation
sales_data = loader.create_sales_dataset(year_filter=ANALYSIS_YEAR,
                                         month_filter=ANALYSIS_MONTH,
                                         status_filter='delivered')
comparison_data = None
if COMPARISON_YEAR:
    comparison_data = loader.create_sales_dataset(year_filter=COMPARISON_YEAR,
                                                  month_filter=ANALYSIS_MONTH,
                                                  status_filter='delivered')

if comparison_data is None:
    combined_data = sales_data
else:
    # Load every year, then keep only the analysis and comparison years.
    combined_data = loader.create_sales_dataset(month_filter=ANALYSIS_MONTH, status_filter='delivered')
    combined_data = combined_data[combined_data['purchase_year'].isin([ANALYSIS_YEAR, COMPARISON_YEAR])]

calc = BusinessMetricsCalculator(combined_data)
report = calc.generate_comprehensive_report(current_year=ANALYSIS_YEAR,
                                            previous_year=COMPARISON_YEAR)

print("Computed metrics: keys ->", list(report.keys()))

Computed metrics: keys -> ['analysis_period', 'comparison_period', 'revenue_metrics', 'monthly_trends', 'product_performance', 'geographic_performance', 'customer_satisfaction', 'delivery_performance']


In [9]:
# 5. Extract variables to export to D3.js
# The top-level keys below match the report keys printed above; adjust the
# nested key names to the actual structure of your 'report'.
revenue_monthly = report.get('monthly_trends', [])
product_perf = report.get('product_performance', [])
geo_data = report.get('geographic_performance', [])
review_stats = report.get('customer_satisfaction', {})
delivery_stats = report.get('delivery_performance', {})
# Not among the report keys printed above, so these export as empty lists.
payment_methods = report.get('payment_methods', [])
customer_segments = report.get('customer_segments', [])

# snake_case keys, as read by the KPI cards from window.kpiMetrics
revenue_metrics = report.get('revenue_metrics', {})
kpi_metrics = {
    'total_revenue': revenue_metrics.get('total_revenue'),
    'total_orders': revenue_metrics.get('total_orders'),
    'avg_order_value': revenue_metrics.get('average_order_value'),
    'revenue_growth_pct': revenue_metrics.get('revenue_growth_pct'),
    'fast_delivery_pct': report.get('delivery_performance', {}).get('fast_delivery_percentage')
}

In [10]:
import numpy as np
from IPython.display import HTML, display

def to_python_types(obj):
    """Recursively convert pandas/NumPy values to JSON-serializable Python types.

    Assumed implementation: this helper was referenced but never defined.
    """
    if isinstance(obj, pd.DataFrame):
        return to_python_types(obj.to_dict(orient='records'))
    if isinstance(obj, pd.Series):
        return to_python_types(obj.to_dict())
    if isinstance(obj, np.ndarray):
        return to_python_types(obj.tolist())
    if isinstance(obj, dict):
        return {str(k): to_python_types(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_python_types(v) for v in obj]
    if isinstance(obj, np.generic):
        obj = obj.item()
    if isinstance(obj, float) and obj != obj:  # NaN is not valid JSON
        return None
    return obj

# Render JSON data for D3: one <script> per window variable
exports = {
    'kpiMetrics': kpi_metrics,
    'revenueMonthly': revenue_monthly,
    'productPerf': product_perf,
    'reviewStats': review_stats,
    'deliveryStats': delivery_stats,
    'paymentMethods': payment_methods,
    'customerSegments': customer_segments,
    'geoData': geo_data,
}
blocks = [
    f'<script>window.{name} = {json.dumps(to_python_types(value), ensure_ascii=False, default=str)};</script>'
    for name, value in exports.items()
]

display(HTML("\n".join(blocks)))
display(HTML("<script>window.dispatchEvent(new Event('d3-data-ready'));</script>"))
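
The chart cells further down read specific fields from these `window` variables (`month` and `revenue`, `top_categories`, `score_counts`, `delivery_review_correlation`) and fall back to a "no data" message when a field is missing. A quick shape check on the Python side can surface such mismatches before anything is rendered. The sketch below is only an illustration: `check_export_shapes` and the sample values are hypothetical, the field names are taken from the JavaScript in this notebook rather than from `BusinessMetricsCalculator`, and the inputs are assumed to be plain Python lists and dicts already.

```python
def _missing_fields(records, required):
    """Return the required fields absent from at least one record."""
    missing = set()
    for rec in records:
        if not isinstance(rec, dict):
            return set(required)
        missing |= set(required) - set(rec)
    return missing


def check_export_shapes(revenue_monthly, product_perf, review_stats, delivery_stats):
    """Collect readable problems; an empty list means every chart has its fields."""
    problems = []

    if not isinstance(revenue_monthly, list) or not revenue_monthly:
        problems.append("revenueMonthly: expected a non-empty list")
    elif _missing_fields(revenue_monthly, {"month", "revenue"}):
        problems.append("revenueMonthly: records need 'month' and 'revenue'")

    top = product_perf.get("top_categories") if isinstance(product_perf, dict) else None
    if not top:
        problems.append("productPerf: missing 'top_categories'")
    elif _missing_fields(top, {"category", "revenue"}):
        problems.append("productPerf: categories need 'category' and 'revenue'")

    if not isinstance(review_stats, dict) or not review_stats.get("score_counts"):
        problems.append("reviewStats: missing 'score_counts'")

    points = (delivery_stats.get("delivery_review_correlation")
              if isinstance(delivery_stats, dict) else None)
    if not points:
        problems.append("deliveryStats: missing 'delivery_review_correlation'")
    elif _missing_fields(points, {"delivery_days", "review_score"}):
        problems.append("deliveryStats: points need 'delivery_days' and 'review_score'")

    return problems


# Made-up sample values, only to exercise the check.
problems = check_export_shapes(
    revenue_monthly=[{"month": 1, "revenue": 1200.0}],
    product_perf={"top_categories": [{"category": "books", "revenue": 300.0}]},
    review_stats={},
    delivery_stats={"delivery_review_correlation": [{"delivery_days": 3}]},
)
for problem in problems:
    print(problem)
```

Returning a list of messages instead of raising keeps the notebook running, so the remaining charts still render while the mismatched ones are reported.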

### 6. Theme Toggle (Light/Dark)
The following cell inserts a switch to toggle the theme. D3 charts should read the container's class to adjust their colors.

In [None]:
from IPython.display import HTML, display

display(HTML('''
<style>
  .toggle-container { text-align: center; margin: 20px 0; }
  .theme-toggle { background: #3498db; color: white; border: none; padding: 10px 20px; border-radius: 5px; cursor: pointer; }
  .theme-toggle:hover { background: #2980b9; }
  .d3-dark .theme-toggle { background: #e74c3c; }
  :root {
    --bg-color: #ffffff;
    --text-color: #333333;
    --card-bg: #f8f9fa;
    --border-color: #dee2e6;
    --accent-color: #3498db;
  }
  .d3-dark {
    --bg-color: #1a1a1a;
    --text-color: #e8e8e8;
    --card-bg: #2c2c2c;
    --border-color: #444444;
    --accent-color: #74b9ff;
  }
  body { background: var(--bg-color); color: var(--text-color); transition: all 0.3s ease; }
  .kpi-card { background: var(--card-bg); border: 1px solid var(--border-color); padding: 15px; margin: 10px; border-radius: 8px; }
</style>
<div class="toggle-container">
  <button class="theme-toggle" onclick="toggleTheme()">🌙 Dark Mode</button>
</div>
<script>
  function toggleTheme() {
    document.body.classList.toggle('d3-dark');
    const isDark = document.body.classList.contains('d3-dark');
    localStorage.setItem('d3-theme', isDark ? 'dark' : 'light');
    document.querySelector('.theme-toggle').textContent = isDark ? '☀️ Light Mode' : '🌙 Dark Mode';
  }
  // Load saved theme
  if (localStorage.getItem('d3-theme') === 'dark') {
    document.body.classList.add('d3-dark');
    document.querySelector('.theme-toggle').textContent = '☀️ Light Mode';
  }
</script>
'''))

### 7. Interactive KPIs
Base container for KPI cards animated with D3.

In [None]:
from IPython.display import HTML, display

display(HTML('''
<div class="kpi-grid" style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 15px; margin: 20px 0;">
  <div id="kpi-total-orders" class="kpi-card">
    <h4>Total Orders</h4>
    <div class="kpi-value" style="font-size: 2em; color: var(--accent-color); font-weight: bold;">---</div>
  </div>
  <div id="kpi-total-revenue" class="kpi-card">
    <h4>Total Revenue</h4>
    <div class="kpi-value" style="font-size: 2em; color: var(--accent-color); font-weight: bold;">---</div>
  </div>
  <div id="kpi-avg-order" class="kpi-card">
    <h4>Avg Order Value</h4>
    <div class="kpi-value" style="font-size: 2em; color: var(--accent-color); font-weight: bold;">---</div>
  </div>
  <div id="kpi-customers" class="kpi-card">
    <h4>Active Customers</h4>
    <div class="kpi-value" style="font-size: 2em; color: var(--accent-color); font-weight: bold;">---</div>
  </div>
</div>

<script src="https://d3js.org/d3.v7.min.js"></script>
<script>
  function initKPIs() {
    console.log('Initializing KPIs...');
    console.log('window.kpiMetrics:', window.kpiMetrics);
    
    if (!window.kpiMetrics) {
      console.warn('No KPI data available');
      return;
    }
    
    // Animate numbers using D3's interpolation
    const kpis = [
      { id: 'kpi-total-orders', value: window.kpiMetrics.total_orders, format: d3.format(',') },
      { id: 'kpi-total-revenue', value: window.kpiMetrics.total_revenue, format: d3.format('$,.0f') },
      { id: 'kpi-avg-order', value: window.kpiMetrics.avg_order_value, format: d3.format('$,.2f') },
      { id: 'kpi-customers', value: window.kpiMetrics.unique_customers, format: d3.format(',') }
    ];
    
    kpis.forEach(kpi => {
      const element = document.querySelector(`#${kpi.id} .kpi-value`);
      if (element && kpi.value != null) { // skip both undefined and JSON null
        // Animate from 0 to final value
        const interpolator = d3.interpolateNumber(0, kpi.value);
        d3.select(element)
          .transition()
          .duration(1500)
          .tween('text', function() {
            return function(t) {
              this.textContent = kpi.format(interpolator(t));
            };
          });
      }
    });
  }
  
  // Initialize when data is ready
  if (window.kpiMetrics) {
    initKPIs();
  } else {
    window.addEventListener('d3-data-ready', initKPIs);
  }
</script>
'''))

In [None]:
from IPython.display import HTML, display

display(HTML('''
<div class="kpi-card">
  <h3>Monthly Revenue Trend</h3>
  <div id="revenue-line-chart"></div>
</div>

<script>
  function initRevenueLine() {
    console.log('Initializing Revenue Line Chart...');
    console.log('window.revenueMonthly:', window.revenueMonthly);
    
    const container = d3.select('#revenue-line-chart');
    container.selectAll('*').remove(); // Clear previous
    
    if (!window.revenueMonthly || !Array.isArray(window.revenueMonthly) || window.revenueMonthly.length === 0) {
      console.warn('No monthly revenue data available');
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('No monthly revenue data available');
      return;
    }
    
    const data = window.revenueMonthly;
    const margin = { top: 20, right: 30, bottom: 40, left: 70 };
    const width = 600 - margin.left - margin.right;
    const height = 300 - margin.top - margin.bottom;
    
    const svg = container.append('svg')
      .attr('width', width + margin.left + margin.right)
      .attr('height', height + margin.top + margin.bottom)
      .attr('role', 'img')
      .attr('aria-label', 'Monthly revenue trend line chart');
    
    const g = svg.append('g')
      .attr('transform', `translate(${margin.left},${margin.top})`);
    
    // Scales
    const xScale = d3.scaleLinear()
      .domain(d3.extent(data, d => d.month))
      .range([0, width]);
    
    const yScale = d3.scaleLinear()
      .domain([0, d3.max(data, d => d.revenue)])
      .nice()
      .range([height, 0]);
    
    // Line generator
    const line = d3.line()
      .x(d => xScale(d.month))
      .y(d => yScale(d.revenue))
      .curve(d3.curveMonotoneX);
    
    // Axes
    g.append('g')
      .attr('transform', `translate(0,${height})`)
      .call(d3.axisBottom(xScale).tickFormat(d => `Month ${d}`));
    
    g.append('g')
      .call(d3.axisLeft(yScale).tickFormat(d3.format('.2s')));
    
    // Line path
    g.append('path')
      .datum(data)
      .attr('fill', 'none')
      .attr('stroke', 'var(--accent-color)')
      .attr('stroke-width', 2)
      .attr('d', line);
    
    // Points
    g.selectAll('.dot')
      .data(data)
      .enter().append('circle')
      .attr('class', 'dot')
      .attr('cx', d => xScale(d.month))
      .attr('cy', d => yScale(d.revenue))
      .attr('r', 4)
      .attr('fill', 'var(--accent-color)')
      .attr('tabindex', '0')
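      .on('mouseover', function(event, d) {
        const label = d.year ? `${d.year}` : 'Revenue';
        // Position the absolutely positioned tooltip next to the pointer
        tooltip.style('opacity', 1)
          .style('left', `${event.pageX + 10}px`)
          .style('top', `${event.pageY - 28}px`)
          .html(`${label}<br>Month ${d.month}: $${d3.format('.2s')(d.revenue)}`);
      })
      .on('mouseout', () => tooltip.style('opacity', 0));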
    
    // Tooltip
    const tooltip = d3.select('body').append('div')
      .attr('class', 'd3-tooltip')
      .style('opacity', 0)
      .style('position', 'absolute')
      .style('background', 'var(--card-bg)')
      .style('border', '1px solid var(--border-color)')
      .style('border-radius', '4px')
      .style('padding', '8px')
      .style('font-size', '12px')
      .style('pointer-events', 'none');
  }
  
  // Initialize when data is ready
  if (window.revenueMonthly) {
    initRevenueLine();
  } else {
    window.addEventListener('d3-data-ready', initRevenueLine);
  }
</script>
'''))

In [None]:
from IPython.display import HTML, display

display(HTML('''
<div class="kpi-card">
  <h3>Top Product Categories</h3>
  <div id="categories-bar"></div>
</div>

<script>
  function initCategoriesBar() {
    console.log('Initializing Categories Bar Chart...');
    console.log('window.productPerf:', window.productPerf);
    
    const container = d3.select('#categories-bar');
    container.selectAll('*').remove(); // Clear previous
    
    if (!window.productPerf || !window.productPerf.top_categories) {
      console.warn('No product performance data');
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('No category data available');
      return;
    }
    
    const data = window.productPerf.top_categories.slice(0, 10); // Top 10
    
    const margin = { top: 20, right: 30, bottom: 100, left: 70 };
    const width = 600 - margin.left - margin.right;
    const height = 350 - margin.top - margin.bottom;
    
    const svg = container.append('svg')
      .attr('width', width + margin.left + margin.right)
      .attr('height', height + margin.top + margin.bottom)
      .attr('role', 'img')
      .attr('aria-label', 'Top product categories bar chart');
    
    const g = svg.append('g')
      .attr('transform', `translate(${margin.left},${margin.top})`);
    
    // Scales
    const xScale = d3.scaleBand()
      .domain(data.map(d => d.category))
      .range([0, width])
      .padding(0.1);
    
    const yScale = d3.scaleLinear()
      .domain([0, d3.max(data, d => d.revenue)])
      .nice()
      .range([height, 0]);
    
    // Axes
    g.append('g')
      .attr('transform', `translate(0,${height})`)
      .call(d3.axisBottom(xScale))
      .selectAll('text')
      .style('text-anchor', 'end')
      .attr('dx', '-.8em')
      .attr('dy', '.15em')
      .attr('transform', 'rotate(-45)');
    
    g.append('g')
      .call(d3.axisLeft(yScale).tickFormat(d3.format('.2s')));
    
    // Bars
    g.selectAll('.bar')
      .data(data)
      .enter().append('rect')
      .attr('class', 'bar')
      .attr('x', d => xScale(d.category))
      .attr('width', xScale.bandwidth())
      .attr('y', height)
      .attr('height', 0)
      .attr('fill', 'var(--accent-color)')
      .attr('tabindex', '0')
      .transition()
      .duration(800)
      .attr('y', d => yScale(d.revenue))
      .attr('height', d => height - yScale(d.revenue));
  }
  
  // Initialize when data is ready
  if (window.productPerf) {
    initCategoriesBar();
  } else {
    window.addEventListener('d3-data-ready', initCategoriesBar);
  }
</script>
'''))

### 10. Choropleth (Geographic Performance)

Placeholder: this section will render a D3 choropleth from a TopoJSON map (e.g., Brazil states or US states) joined with the `geoPerformance` dataset.

Data keys expected:
- geoPerformance: array of objects with keys `region`, `revenue`, `avgReview`, `orders`.

Implementation Steps (TODO):
1. Load TopoJSON via `fetch` (single-shot, cache in `window.__topology`).
2. Convert to GeoJSON features.
3. Create a color scale (quantize or sequential) based on revenue.
4. Append `<path>` elements, set `d` attribute with geoPath.
5. Tooltip on hover: region + revenue (formatted) + avgReview + orders.
6. Keyboard navigation: `tabindex=0` on each path; focus style (stroke highlight).
7. Legend: horizontal gradient or discrete boxes with labels.
8. Theme adaptation: stroke color / background.

Performance Consideration:
- Use a simplified TopoJSON file (< 500KB) to keep path complexity low.
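
The expected `geoPerformance` record shape and the color bucketing from step 3 can be prototyped on the Python side before any D3 code exists. The sketch below rests on stated assumptions: the input rows, the `state` / `price` / `review_score` / `order_id` field names, and both helper names are hypothetical. `quantize_bin` assigns equal-width bins over the revenue range, in the spirit of a quantize color scale, so legend breaks can be checked in Python first.

```python
from collections import defaultdict


def build_geo_performance(rows):
    """Aggregate order rows into records with region, revenue, avgReview, orders."""
    acc = defaultdict(lambda: {"revenue": 0.0, "orders": set(), "scores": []})
    for row in rows:
        bucket = acc[row["state"]]
        bucket["revenue"] += row["price"]
        bucket["orders"].add(row["order_id"])  # count each order once
        if row.get("review_score") is not None:
            bucket["scores"].append(row["review_score"])
    records = []
    for region, b in sorted(acc.items()):
        avg = sum(b["scores"]) / len(b["scores"]) if b["scores"] else None
        records.append({"region": region, "revenue": round(b["revenue"], 2),
                        "avgReview": avg, "orders": len(b["orders"])})
    return records


def quantize_bin(value, lo, hi, n_bins):
    """Index of the equal-width bin containing value, clamped to [0, n_bins - 1]."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return max(0, min(n_bins - 1, idx))


# Made-up rows, only to show the shapes involved.
rows = [
    {"order_id": "o1", "state": "CA", "price": 100.0, "review_score": 5},
    {"order_id": "o1", "state": "CA", "price": 50.0, "review_score": 5},
    {"order_id": "o2", "state": "TX", "price": 30.0, "review_score": 3},
    {"order_id": "o3", "state": "NY", "price": 90.0, "review_score": None},
]
geo = build_geo_performance(rows)
revenues = [r["revenue"] for r in geo]
lo, hi = min(revenues), max(revenues)
for r in geo:
    r["colorBin"] = quantize_bin(r["revenue"], lo, hi, n_bins=4)
    print(r)
```

Computing the bin index in Python is optional, since D3 can derive it from `revenue` alone, but it gives a reference value to compare against once the map is implemented.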

In [None]:
from IPython.display import HTML, display

display(HTML('''
<div class="kpi-card">
  <h3>Review Score Distribution</h3>
  <div id="reviews-dist"></div>
</div>

<script>
  function initReviewsHistogram() {
    console.log('Initializing Reviews Histogram...');
    console.log('window.reviewStats:', window.reviewStats);
    
    const container = d3.select('#reviews-dist');
    container.selectAll('*').remove(); // Clear previous
    
    if (!window.reviewStats) {
      console.warn('reviewStats is not defined');
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('No review stats');
      return;
    }
    
    let counts = [];
    if (window.reviewStats.score_counts) {
      counts = Object.entries(window.reviewStats.score_counts).map(([score, count]) => ({ score: +score, count }));
    } else {
      console.warn('Unexpected reviewStats structure');
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('Unexpected reviewStats structure');
      return;
    }
    
    if (!counts.length) {
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('No review score counts');
      return;
    }
    
    const margin = { top: 20, right: 30, bottom: 40, left: 50 };
    const width = 500 - margin.left - margin.right;
    const height = 300 - margin.top - margin.bottom;
    
    const svg = container.append('svg')
      .attr('width', width + margin.left + margin.right)
      .attr('height', height + margin.top + margin.bottom)
      .attr('role', 'img')
      .attr('aria-label', 'Review score distribution histogram');
    
    const g = svg.append('g')
      .attr('transform', `translate(${margin.left},${margin.top})`);
    
    // Scales
    const xScale = d3.scaleBand()
      .domain(counts.map(d => d.score))
      .range([0, width])
      .padding(0.1);
    
    const yScale = d3.scaleLinear()
      .domain([0, d3.max(counts, d => d.count)])
      .nice()
      .range([height, 0]);
    
    // Axes
    g.append('g')
      .attr('transform', `translate(0,${height})`)
      .call(d3.axisBottom(xScale));
    
    g.append('g')
      .call(d3.axisLeft(yScale));
    
    // Bars
    g.selectAll('.bar')
      .data(counts)
      .enter().append('rect')
      .attr('class', 'bar')
      .attr('x', d => xScale(d.score))
      .attr('width', xScale.bandwidth())
      .attr('y', height)
      .attr('height', 0)
      .attr('fill', 'var(--accent-color)')
      .transition()
      .duration(800)
      .attr('y', d => yScale(d.count))
      .attr('height', d => height - yScale(d.count));
  }
  
  // Initialize when data is ready
  if (window.reviewStats) {
    initReviewsHistogram();
  } else {
    window.addEventListener('d3-data-ready', initReviewsHistogram);
  }
</script>
'''))

In [None]:
from IPython.display import HTML, display

display(HTML('''
<div class="kpi-card">
  <h3>Delivery Time vs Review Score</h3>
  <div id="delivery-review-scatter"></div>
</div>

<script>
  function initDeliveryScatter() {
    console.log('Initializing Delivery Scatter Plot...');
    console.log('window.deliveryStats:', window.deliveryStats);
    
    const container = d3.select('#delivery-review-scatter');
    container.selectAll('*').remove(); // Clear previous
    
    if (!window.deliveryStats) {
      console.warn('deliveryStats is not defined');
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('No delivery stats');
      return;
    }
    
    const points = window.deliveryStats.delivery_review_correlation || [];
    if (!points.length) {
      console.warn('No points available for the scatter plot');
      container.append('div')
        .style('color', '#999')
        .style('margin-top', '8px')
        .text('No scatter points');
      return;
    }
    
    const margin = { top: 20, right: 30, bottom: 50, left: 60 };
    const width = 500 - margin.left - margin.right;
    const height = 350 - margin.top - margin.bottom;
    
    const svg = container.append('svg')
      .attr('width', width + margin.left + margin.right)
      .attr('height', height + margin.top + margin.bottom)
      .attr('role', 'img')
      .attr('aria-label', 'Delivery time vs review score scatter plot');
    
    const g = svg.append('g')
      .attr('transform', `translate(${margin.left},${margin.top})`);
    
    // Scales
    const xScale = d3.scaleLinear()
      .domain(d3.extent(points, d => d.delivery_days))
      .nice()
      .range([0, width]);
    
    const yScale = d3.scaleLinear()
      .domain(d3.extent(points, d => d.review_score))
      .nice()
      .range([height, 0]);
    
    // Axes
    g.append('g')
      .attr('transform', `translate(0,${height})`)
      .call(d3.axisBottom(xScale))
      .append('text')
      .attr('x', width / 2)
      .attr('y', 35)
      .attr('fill', 'var(--text-color)')
      .style('text-anchor', 'middle')
      .text('Delivery Days');
    
    g.append('g')
      .call(d3.axisLeft(yScale))
      .append('text')
      .attr('transform', 'rotate(-90)')
      .attr('y', -40)
      .attr('x', -height / 2)
      .attr('fill', 'var(--text-color)')
      .style('text-anchor', 'middle')
      .text('Review Score');
    
    // Points
    g.selectAll('.dot')
      .data(points)
      .enter().append('circle')
      .attr('class', 'dot')
      .attr('cx', d => xScale(d.delivery_days))
      .attr('cy', d => yScale(d.review_score))
      .attr('r', 3)
      .attr('fill', 'var(--accent-color)')
      .attr('opacity', 0.6);
  }
  
  // Initialize when data is ready
  if (window.deliveryStats) {
    initDeliveryScatter();
  } else {
    window.addEventListener('d3-data-ready', initDeliveryScatter);
  }
</script>
'''))

### 13. Insights & Recommendations (Preliminary)

Initial observations (auto + manual refinement pending):
- Revenue growth vs comparison year visible in line chart (validate % sign in KPI card).
- Category revenue concentration: top 5 categories likely dominate majority share (validate Pareto ~80/20 once data rendered).
- Review distribution skew will inform whether to prioritize quality or logistics improvements.
- Delivery vs Review scatter: identify thresholds where additional delivery speed no longer improves review scores.
- Geographic disparities (once choropleth implemented) can guide regional logistics or marketing focus.

Next refinement pass should:
1. Quantify top-category cumulative share.
2. Compute the correlation between delivery days and review score; report Pearson's r.
3. Flag underperforming regions (below average revenue & below average review).
4. Add filter controls (month range / category) to dynamically recalc JSON exports.
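
The first three refinement items are simple enough to prototype with the standard library. A hedged sketch follows: the function names and every sample number are hypothetical, and the field names (`revenue`, `avgReview`, `delivery_days`, `review_score`) are borrowed from the chart code and the choropleth notes rather than from the real report.

```python
from math import sqrt


def cumulative_share(categories):
    """Return (category, cumulative revenue share) pairs, largest category first."""
    ordered = sorted(categories, key=lambda c: c["revenue"], reverse=True)
    total = sum(c["revenue"] for c in ordered)
    running, out = 0.0, []
    for c in ordered:
        running += c["revenue"]
        out.append((c["category"], running / total if total else 0.0))
    return out


def pearson_r(xs, ys):
    """Pearson correlation coefficient; None if it is undefined for the input."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


def underperforming(regions):
    """Regions below the mean on both revenue and average review."""
    rated = [r for r in regions if r["avgReview"] is not None]
    if not rated:
        return []
    mean_rev = sum(r["revenue"] for r in rated) / len(rated)
    mean_review = sum(r["avgReview"] for r in rated) / len(rated)
    return [r["region"] for r in rated
            if r["revenue"] < mean_rev and r["avgReview"] < mean_review]


# Made-up sample values, only to exercise the three helpers.
shares = cumulative_share([
    {"category": "a", "revenue": 50},
    {"category": "b", "revenue": 30},
    {"category": "c", "revenue": 20},
])
points = [{"delivery_days": d, "review_score": s}
          for d, s in [(1, 5), (2, 4), (3, 3), (4, 2)]]
r_value = pearson_r([p["delivery_days"] for p in points],
                    [p["review_score"] for p in points])
flagged = underperforming([
    {"region": "CA", "revenue": 150, "avgReview": 4.5},
    {"region": "TX", "revenue": 30, "avgReview": 3.0},
    {"region": "NY", "revenue": 90, "avgReview": 3.5},
])
print(shares, r_value, flagged)
```

On real data the same calculations would more likely be done with pandas on the sales dataset; the pure-Python form is used here only to keep the logic visible.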

### 14. TODO / Backlog Checklist

- [ ] Implement Choropleth (TopoJSON fetch + color legend).
- [ ] Add dynamic filtering (month range slider) and re-render charts.
- [ ] Compute & embed correlation coefficient for delivery vs review.
- [ ] Accessibility audit (keyboard traversal order, ARIA roles, contrast in dark mode).
- [ ] Performance pass (minify inline JS, consider external JS bundling if size grows).
- [ ] Add download button for aggregated JSON metrics.
- [ ] Unit-test Python data transformations (business rules) separately.
- [ ] Document theming API and data export interface.

---
End of current interactive D3 EDA scaffold. Proceed with backlog items to achieve full spec compliance from `prompt_d3.md`. ✅