# E-commerce Business Analytics (Interactive D3.js Version)

This version (`EDA_Refactored_v2.ipynb`) introduces interactive visualizations built with **D3.js v7** to enrich the exploration of key metrics: revenue, categories, geography, customer satisfaction, and logistics performance.

---
## Table of Contents
1. Introduction and Objectives
2. Parameter Configuration
3. Data Loading and Processing
4. Business Metrics Calculation
5. D3 Integration Helpers
6. Theme Toggle (Light/Dark)
7. Interactive KPIs
8. Monthly Revenue Line
9. Top Categories
10. Choropleth Map
11. Review Distribution
12. Delivery vs Review (Optional)
13. Insights and Recommendations
14. TODO Checklist / Next Steps

---
### 1. Introduction and Objectives
This notebook replicates the analytical logic of the original notebook and adds:
- Interactive visualizations built directly with D3.js.
- Animations and smooth transitions.
- Light/dark theme support.
- Basic accessibility (role / aria-label / focus).
- A modular structure: data in Python → embedded JSON → rendering in JS.
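
As a rough illustration of the Python → embedded JSON → JS flow listed above, the sketch below serializes a Python payload into a `<script>` tag that assigns it to `window`. The helper names (`js_global`, `_clean`, `_fallback`) are hypothetical and not part of the notebook's modules, and the sketch assumes the payload is built from dicts, lists, and scalars.

```python
import json
import math


def _clean(value):
    """Recursively replace NaN/inf floats with None so JS receives null."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {str(k): _clean(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_clean(v) for v in value]
    return value


def _fallback(obj):
    """Called by json.dumps for types it cannot encode natively."""
    # Assumption: NumPy-style scalars expose .item(); anything else is stringified.
    if hasattr(obj, "item"):
        return obj.item()
    return str(obj)


def js_global(var_name, payload):
    """Return a <script> tag that assigns the payload to window.<var_name>."""
    body = json.dumps(_clean(payload), default=_fallback)
    # Escape "</" so a string such as "</script>" cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f"<script>window.{var_name} = {body};</script>"


print(js_global("kpiData", {"totalRevenue": 1234.5, "revenueGrowthPct": float("nan")}))
```

Mapping missing values to `null` keeps the JavaScript side simple: a check such as `v == null` then covers both absent and non-finite metrics.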


In [None]:
# 2. Parameter Configuration
ANALYSIS_YEAR = 2023
COMPARISON_YEAR = 2022  # May be None
ANALYSIS_MONTH = None   # 1-12, or None for the full year
DATA_PATH = "ecommerce_data/"

print(f"Analysis: {ANALYSIS_YEAR} | Comparison: {COMPARISON_YEAR or 'N/A'} | Month: {ANALYSIS_MONTH or 'Full year'}")

In [None]:
# 3. Data Loading and Processing
import pandas as pd
import json
from data_loader import EcommerceDataLoader, load_and_process_data
from business_metrics import BusinessMetricsCalculator

loader, processed_data = load_and_process_data(DATA_PATH)
summary = loader.get_data_summary()
print("Datasets loaded:")
for k, v in summary.items():
    print(f"- {k}: {v['rows']} rows, {v['columns']} columns")

In [None]:
# 4. Business Metrics Calculation
# Delivered orders for the analysis period.
sales_data = loader.create_sales_dataset(year_filter=ANALYSIS_YEAR,
                                         month_filter=ANALYSIS_MONTH,
                                         status_filter='delivered')
comparison_data = None
combined_data = sales_data
if COMPARISON_YEAR:
    comparison_data = loader.create_sales_dataset(year_filter=COMPARISON_YEAR,
                                                  month_filter=ANALYSIS_MONTH,
                                                  status_filter='delivered')
    # Growth metrics need both years in one frame: load all years, then keep the two of interest.
    all_years = loader.create_sales_dataset(month_filter=ANALYSIS_MONTH, status_filter='delivered')
    combined_data = all_years[all_years['purchase_year'].isin([ANALYSIS_YEAR, COMPARISON_YEAR])]

calc = BusinessMetricsCalculator(combined_data)
report = calc.generate_comprehensive_report(current_year=ANALYSIS_YEAR,
                                            previous_year=COMPARISON_YEAR)

print("Metrics computed: keys ->", list(report.keys()))

In [None]:
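# 5. D3 Integration Helpers
from IPython.display import HTML, display
import json

def d3_embed_json(var_name: str, df):
    """Return a <script> block that exposes the DataFrame records as window.<var_name>."""
    # Assign to window instead of declaring a top-level const: const bindings are not
    # properties of window (the chart cells check window.<name>), and re-running the
    # cell would raise a redeclaration error.
    return f"<script>window.{var_name} = {json.dumps(df.to_dict(orient='records'))};</script>"

# Prepare derived datasets to pass to D3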
revenue_monthly = report['monthly_trends'].to_dict(orient='records') if 'monthly_trends' in report else []
product_perf = []
if 'product_performance' in report and 'top_categories' in report['product_performance']:
    product_perf = report['product_performance']['top_categories'].head(15).to_dict(orient='records')
geo_perf = []
if 'geographic_performance' in report and 'error' not in report['geographic_performance'].columns:
    geo_perf = report['geographic_performance'].to_dict(orient='records')
review_stats = report.get('customer_satisfaction', {})
delivery_stats = report.get('delivery_performance', {})

kpi_data = {
    'totalRevenue': report['revenue_metrics']['total_revenue'],
    'totalOrders': report['revenue_metrics']['total_orders'],
    'averageOrderValue': report['revenue_metrics']['average_order_value'],
    'revenueGrowthPct': report['revenue_metrics'].get('revenue_growth_rate'),
    'fastDeliveryPct': delivery_stats.get('fast_delivery_percentage')
}

# Export JS variables. They are assigned to window because the chart cells check
# window.kpiData, window.revenueMonthly, etc., and a top-level const is not a window property.
blocks = [
    f"<script>window.revenueMonthly = {json.dumps(revenue_monthly)};</script>",
    f"<script>window.topCategories = {json.dumps(product_perf)};</script>",
    f"<script>window.geoPerformance = {json.dumps(geo_perf)};</script>",
    f"<script>window.reviewStats = {json.dumps(review_stats)};</script>",
    f"<script>window.deliveryStats = {json.dumps(delivery_stats)};</script>",
    f"<script>window.kpiData = {json.dumps(kpi_data)};</script>"
]
HTML("\n".join(blocks))

### 6. Theme Toggle (Light/Dark)
The following cell inserts a button that switches between themes. D3 charts should check for the `d3-dark` class on `<body>`, or listen for the `d3-theme-change` event, to adjust their colors.

In [None]:
from IPython.display import HTML
HTML('''
<style>
  .theme-wrapper { font-family: system-ui, sans-serif; margin: 0.5rem 0 1rem; }
  .theme-wrapper button { padding: 6px 12px; border-radius: 4px; border: 1px solid #888; cursor: pointer; background: var(--btn-bg,#f5f5f5); }
  /* Theme variables live on <body>; the script below toggles the d3-dark class. */
  body { --bg:#ffffff; --fg:#222222; --panel:#fafafa; }
  body.d3-dark { --bg:#121212; --fg:#e0e0e0; --panel:#1e1e1e; background: var(--bg); color: var(--fg); }
  .d3-chart text { font-family: system-ui, sans-serif; }
</style>
<div class="theme-wrapper">
  <button id="toggle-theme">Toggle Theme</button>
</div>
<script>
(function(){
  const clsDark = 'd3-dark';
  const btn = document.getElementById('toggle-theme');
  function apply(pref){
    if(pref === 'dark'){ document.body.classList.add(clsDark); }
    else { document.body.classList.remove(clsDark); }
    localStorage.setItem('d3Theme', pref);
    window.dispatchEvent(new CustomEvent('d3-theme-change', { detail: { theme: pref } }));
  }
  const stored = localStorage.getItem('d3Theme') || 'light';
  apply(stored);
  btn.onclick = () => {
    const next = document.body.classList.contains(clsDark) ? 'light' : 'dark';
    apply(next);
  };
})();
</script>
''')

### 7. Interactive KPIs
Base container for KPI cards animated with D3.

In [None]:
from IPython.display import HTML
HTML('''
<div id="kpi-container" class="d3-chart" style="display:flex;gap:16px;flex-wrap:wrap;margin:8px 0;">
  <!-- KPI cards are generated by D3 -->
</div>
<script src="https://cdn.jsdelivr.net/npm/d3@7"></script>
<script>
(function(){
  if(!window.kpiData){ console.warn('kpiData is not defined'); return; }
  const root = d3.select('#kpi-container');
  const fmtAbbr = d3.format('.2s');
  const items = [
    { key: 'totalRevenue', label: 'Total Revenue', format: v => '$'+fmtAbbr(v) },
    { key: 'totalOrders', label: 'Total Orders', format: v => fmtAbbr(v) },
    { key: 'averageOrderValue', label: 'AOV', format: v => '$'+d3.format('.2f')(v) },
    { key: 'revenueGrowthPct', label: 'Rev Growth %', format: v => (v==null? '—' : d3.format('+.2f')(v)+'%') },
    { key: 'fastDeliveryPct', label: 'Fast Delivery %', format: v => (v==null? '—' : d3.format('.1f')(v)+'%') }
  ];
  const card = root.selectAll('.kpi-card').data(items).enter().append('div')
    .attr('class','kpi-card')
    .style('padding','12px 16px')
    .style('border','1px solid #ccc')
    .style('border-radius','8px')
    .style('min-width','160px')
    .style('background','var(--panel, #fff)')
    .style('box-shadow','0 1px 3px rgba(0,0,0,0.08)');
  card.append('div').text(d=>d.label)
    .style('font-size','0.75rem')
    .style('letter-spacing','.5px')
    .style('text-transform','uppercase')
    .style('opacity',0.7);
  const value = card.append('div').attr('class','kpi-value')
    .style('font-size','1.6rem')
    .style('font-weight','600')
    .style('line-height','1.2')
    .text('0');
  // Simple count-up animation
  value.each(function(d){
    const target = kpiData[d.key];
    if(target == null || isNaN(target)){ d3.select(this).text('—'); return; }
    const i = d3.interpolateNumber(0, target);
    d3.select(this)
      .transition()
      .duration(900)
      .tween('text', function(){
        return t => { this.textContent = d.format(i(t)); };
      });
  });
  // Update card borders when the theme changes
  window.addEventListener('d3-theme-change', e => {
    const isDark = e.detail.theme === 'dark';
    card.style('border-color', isDark? '#555':'#ccc');
  });
})();
</script>
''')

In [None]:
# 8. Monthly Revenue Line
from IPython.display import HTML
HTML('''
<div class="d3-chart" id="revenue-line" style="margin-top:28px;">
  <h4 style="margin:4px 0 8px;">Monthly Revenue vs Comparison Year</h4>
  <svg width="880" height="360" role="img" aria-label="Monthly Revenue Line Chart"></svg>
</div>
<script>
(function(){
  if(!window.revenueMonthly){ console.warn('revenueMonthly is not defined'); return; }
  const svg = d3.select('#revenue-line svg');
  const margin = {top:28,right:32,bottom:40,left:64};
  const width = +svg.attr('width') - margin.left - margin.right;
  const height = +svg.attr('height') - margin.top - margin.bottom;
  const g = svg.append('g').attr('transform',`translate(${margin.left},${margin.top})`);
  // Prepare data
  const parseM = d => +d.month;
  const months = d3.range(1,13);
  const currentYear = revenueMonthly.filter(d=>d.year==='ANALYSIS').map(d=>({month:parseM(d), revenue:d.revenue}));
  const compYear = revenueMonthly.filter(d=>d.year==='COMPARISON').map(d=>({month:parseM(d), revenue:d.revenue}));
  const maxY = d3.max([d3.max(currentYear,d=>d.revenue), d3.max(compYear,d=>d.revenue)]) || 0;
  const x = d3.scalePoint().domain(months).range([0,width]).padding(0.5);
  const y = d3.scaleLinear().domain([0, maxY*1.1]).range([height,0]);
  const line = d3.line().x(d=>x(d.month)).y(d=>y(d.revenue)).curve(d3.curveMonotoneX);
  // Axes
  g.append('g').attr('transform',`translate(0,${height})`).call(d3.axisBottom(x).tickFormat(d=>d3.timeFormat('%b')(new Date(2024,d-1,1))));
  g.append('g').call(d3.axisLeft(y).ticks(6).tickFormat(d3.format('.2s')));
  g.append('text').attr('x',-40).attr('y',-8).attr('fill','currentColor').style('font-size','0.75rem').text('Revenue');
  // Lines
  const pathCurrent = g.append('path').datum(currentYear).attr('fill','none').attr('stroke','#2563eb').attr('stroke-width',2).attr('d',line).attr('stroke-dasharray','0,1');
  const pathComp = g.append('path').datum(compYear).attr('fill','none').attr('stroke','#94a3b8').attr('stroke-width',2).attr('d',line).attr('stroke-dasharray','0,1');
  function animatePath(p){
    const len = p.node().getTotalLength();
    p.attr('stroke-dasharray',`${len},${len}`).attr('stroke-dashoffset',len)
     .transition().duration(1200).ease(d3.easeLinear).attr('stroke-dashoffset',0);
  }
  animatePath(pathCurrent); animatePath(pathComp);
  // Points with tooltip
  const tooltip = d3.select('#revenue-line').append('div')
    .style('position','absolute').style('pointer-events','none').style('background','rgba(0,0,0,0.75)')
    .style('color','#fff').style('padding','4px 8px').style('border-radius','4px')
    .style('font-size','0.75rem').style('opacity',0);
  function addDots(data,color,label){
    g.selectAll('.dot-'+label).data(data).enter().append('circle')
      .attr('class','dot-'+label).attr('cx',d=>x(d.month)).attr('cy',d=>y(d.revenue)).attr('r',4)
      .attr('fill',color).attr('tabindex',0).on('mouseenter focus', function(e,d){
        tooltip.style('opacity',1).html(`${label}<br>Month ${d.month}: $${d3.format('.2s')(d.revenue)}`);
      }).on('mousemove', function(e){
        tooltip.style('left',(e.pageX+12)+'px').style('top',(e.pageY-28)+'px');
      }).on('mouseleave blur', ()=>tooltip.style('opacity',0));
  }
  addDots(currentYear,'#2563eb','Current');
  addDots(compYear,'#94a3b8','Comparison');
  // Legend
  const legend = svg.append('g').attr('transform',`translate(${width-40},10)`);
  [['#2563eb','Current Year'],['#94a3b8','Comparison']].forEach((d,i)=>{
    const lg = legend.append('g').attr('transform',`translate(0,${i*18})`);
    lg.append('rect').attr('width',12).attr('height',12).attr('fill',d[0]);
    lg.append('text').attr('x',16).attr('y',10).style('font-size','0.7rem').text(d[1]);
  });
})();
</script>
''')

In [None]:
# 9. Top Categories
from IPython.display import HTML
HTML('''
<div id="categories-bar" class="d3-chart" style="margin-top:42px;">
  <h4 style="margin:4px 0 8px;">Top Categories by Revenue</h4>
  <svg width="760" height="360" role="img" aria-label="Top Categories Bar Chart"></svg>
</div>
<script>
(function(){
  if(!window.topCategories){ console.warn('topCategories is not defined'); return; }
  const svg = d3.select('#categories-bar svg');
  const margin = {top:20,right:20,bottom:40,left:180};
  const width = +svg.attr('width') - margin.left - margin.right;
  const height = +svg.attr('height') - margin.top - margin.bottom;
  const g = svg.append('g').attr('transform',`translate(${margin.left},${margin.top})`);
  const data = topCategories.map(d=>({ category: d.category, revenue: +d.revenue })).sort((a,b)=>b.revenue-a.revenue).slice(0,15);
  const y = d3.scaleBand().domain(data.map(d=>d.category)).range([0,height]).padding(0.15);
  const x = d3.scaleLinear().domain([0, d3.max(data,d=>d.revenue)||0]).range([0,width]);
  g.append('g').call(d3.axisLeft(y).tickSize(0)).selectAll('text').style('font-size','0.7rem').call(sel=>sel.each(function(){this.setAttribute('tabindex',0);}));
  g.append('g').attr('transform',`translate(0,${height})`).call(d3.axisBottom(x).ticks(6).tickFormat(d3.format('.2s')));
  const bars = g.selectAll('.bar').data(data).enter().append('rect')
    .attr('class','bar').attr('x',0).attr('y',d=>y(d.category))
    .attr('height',y.bandwidth())
    .attr('width',0)
    .attr('fill','#16a34a')
    .attr('rx',3).attr('ry',3)
    .attr('aria-label',d=>`${d.category} revenue ${d.revenue}`)
    .attr('tabindex',0);
  bars.transition().delay((d,i)=>i*30).duration(900).attr('width',d=>x(d.revenue));
  // Value labels
  g.selectAll('.val').data(data).enter().append('text')
    .attr('class','val')
    .attr('x',d=>x(d.revenue)+6)
    .attr('y',d=>y(d.category)+y.bandwidth()/2+4)
    .style('font-size','0.65rem')
    .style('opacity',0)
    .text(d=>d3.format('.2s')(d.revenue))
    .transition().delay((d,i)=>400+i*30).duration(600).style('opacity',0.9);
  // Accessible focus highlight
  d3.select('#categories-bar').selectAll('rect.bar').on('focus', function(e,d){
    d3.select(this).attr('stroke','#000').attr('stroke-width',1.5);
  }).on('blur', function(){ d3.select(this).attr('stroke','none'); });
})();
</script>
''')

### 10. Choropleth Map (Geographic Performance)

Placeholder: this section will render a D3 choropleth from a TopoJSON map (e.g., Brazilian states or US states) joined with the `geoPerformance` dataset.

Data keys expected:
- geoPerformance: array of objects with keys `region`, `revenue`, `avgReview`, `orders`.
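
Before the map is implemented, the expected record shape can be checked on the Python side. This is a minimal sketch under stated assumptions: `find_missing_geo_keys` is a hypothetical helper, the sample records are made up, and the report's real column names may differ from the keys listed above.

```python
# Keys the choropleth expects on each geoPerformance record (taken from the list above).
EXPECTED_GEO_KEYS = ("region", "revenue", "avgReview", "orders")


def find_missing_geo_keys(records):
    """Return {record_index: [missing keys]} for records lacking an expected key."""
    problems = {}
    for i, rec in enumerate(records):
        missing = [k for k in EXPECTED_GEO_KEYS if k not in rec]
        if missing:
            problems[i] = missing
    return problems


# Made-up sample: the second record lacks two of the expected keys.
sample = [
    {"region": "SP", "revenue": 1.0, "avgReview": 4.2, "orders": 3},
    {"region": "RJ", "revenue": 2.0},
]
print(find_missing_geo_keys(sample))
```

Running a check like this before exporting `geo_perf` surfaces key mismatches in Python rather than as silent `undefined` values in the tooltip.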

Implementation Steps (TODO):
1. Load TopoJSON via `fetch` (single-shot, cache in `window.__topology`).
2. Convert to GeoJSON features.
3. Create a color scale (quantize or sequential) based on revenue.
4. Append `<path>` elements, set `d` attribute with geoPath.
5. Tooltip on hover: region + revenue (formatted) + avgReview + orders.
6. Keyboard navigation: `tabindex=0` on each path; focus style (stroke highlight).
7. Legend: horizontal gradient or discrete boxes with labels.
8. Theme adaptation: stroke color / background.
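
Step 3 can be prototyped in Python before any D3 code exists. The sketch below uses equal-width binning, the same idea as a quantize scale: the revenue range is cut into `n_bins` segments of equal width and each value is mapped to a segment index, which would then select a color. `quantize_bins` is a hypothetical helper and the revenue figures are made up.

```python
def quantize_bins(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate range: every value falls in the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


revenues = [0, 25, 50, 75, 100]  # made-up values
print(quantize_bins(revenues, 4))
```

Equal-width bins are easy to label in a discrete legend, but a few very high-revenue regions can push most regions into the lowest bin; a quantile-based split is the usual alternative in that case.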

Performance Consideration:
- Use a simplified TopoJSON file (ideally under 500 KB) to keep path complexity and load time low.

In [None]:
# 11. Review Distribution
from IPython.display import HTML
HTML('''
<div id="reviews-dist" class="d3-chart" style="margin-top:42px;">
  <h4 style="margin:4px 0 8px;">Review Score Distribution</h4>
  <svg width="560" height="320" role="img" aria-label="Histogram of review scores"></svg>
</div>
<script>
(function(){
  if(!window.reviewStats){ console.warn('reviewStats is not defined'); return; }
  // Prefer precomputed reviewStats.reviewScoreCounts; otherwise derive counts from reviewStats.scores.
  let counts = [];
  if(reviewStats.reviewScoreCounts){
    counts = reviewStats.reviewScoreCounts; // [{score: X, count: Y}]
  } else if(reviewStats.scores){
    const by = d3.rollup(reviewStats.scores, v=>v.length, d=>d);
    counts = Array.from(by, ([score,count])=>({score:+score,count}));
  } else { console.warn('Unexpected reviewStats structure'); return; }
  counts = counts.sort((a,b)=>a.score-b.score);
  const svg = d3.select('#reviews-dist svg');
  const margin = {top:20,right:16,bottom:40,left:48};
  const width = +svg.attr('width') - margin.left - margin.right;
  const height = +svg.attr('height') - margin.top - margin.bottom;
  const g = svg.append('g').attr('transform',`translate(${margin.left},${margin.top})`);
  const x = d3.scaleBand().domain(counts.map(d=>d.score)).range([0,width]).padding(0.2);
  const y = d3.scaleLinear().domain([0, d3.max(counts,d=>d.count)||0]).nice().range([height,0]);
  g.append('g').attr('transform',`translate(0,${height})`).call(d3.axisBottom(x));
  g.append('g').call(d3.axisLeft(y).ticks(5).tickFormat(d3.format('.2s')));
  const bar = g.selectAll('.revbar').data(counts).enter().append('rect')
    .attr('class','revbar').attr('x',d=>x(d.score)).attr('y',height)
    .attr('width',x.bandwidth()).attr('height',0).attr('fill','#f59e0b')
    .attr('rx',2).attr('role','img').attr('aria-label',d=>`Score ${d.score} count ${d.count}`)
    .attr('tabindex',0);
  bar.transition().duration(900).attr('y',d=>y(d.count)).attr('height',d=>height - y(d.count));
  g.selectAll('.revlabel').data(counts).enter().append('text')
    .attr('x',d=>x(d.score)+x.bandwidth()/2).attr('y',d=>y(d.count)-4)
    .attr('text-anchor','middle').style('font-size','0.65rem').style('opacity',0)
    .text(d=>d.count)
    .transition().delay(600).duration(500).style('opacity',0.9);
  // Focus highlight
  bar.on('focus', function(){ d3.select(this).attr('stroke','#000').attr('stroke-width',1.5); })
     .on('blur', function(){ d3.select(this).attr('stroke','none'); });
})();
</script>
''')

In [None]:
# 12. Delivery vs Review (Optional)
from IPython.display import HTML
HTML('''
<div id="delivery-review-scatter" class="d3-chart" style="margin-top:42px;">
  <h4 style="margin:4px 0 8px;">Delivery Time vs Review Score</h4>
  <svg width="640" height="380" role="img" aria-label="Scatter plot delivery time vs review score"></svg>
</div>
<script>
(function(){
  if(!window.deliveryStats){ console.warn('deliveryStats is not defined'); return; }
  // Expects deliveryStats.points = [{delivery_days: X, review_score: Y}]
  let points = [];
  if(deliveryStats.points){ points = deliveryStats.points; }
  else if(deliveryStats.raw){ points = deliveryStats.raw.map(d=>({delivery_days:+d.delivery_days, review_score:+d.review_score})); }
  if(!points.length){ console.warn('No points available for the scatter plot'); return; }
  const svg = d3.select('#delivery-review-scatter svg');
  const margin = {top:24,right:32,bottom:48,left:56};
  const width = +svg.attr('width') - margin.left - margin.right;
  const height = +svg.attr('height') - margin.top - margin.bottom;
  const g = svg.append('g').attr('transform',`translate(${margin.left},${margin.top})`);
  const x = d3.scaleLinear().domain(d3.extent(points,d=>d.delivery_days)).nice().range([0,width]);
  const y = d3.scaleLinear().domain([0,5]).range([height,0]);
  g.append('g').attr('transform',`translate(0,${height})`).call(d3.axisBottom(x));
  g.append('g').call(d3.axisLeft(y).ticks(5));
  g.append('text').attr('x',width/2).attr('y',height+40).attr('text-anchor','middle').style('font-size','0.75rem').text('Delivery Days');
  g.append('text').attr('transform','rotate(-90)').attr('x',-height/2).attr('y',-42).attr('text-anchor','middle').style('font-size','0.75rem').text('Review Score');
  const color = d3.scaleSequential(d3.interpolateRdYlGn).domain([1,5]);
  const rScale = d3.scaleSqrt().domain([0, d3.max(points,d=>d.review_score)||5]).range([3,9]);
  const dots = g.selectAll('.dot').data(points).enter().append('circle')
    .attr('class','dot').attr('cx',d=>x(d.delivery_days)).attr('cy',d=>y(d.review_score))
    .attr('r',0).attr('fill',d=>color(d.review_score)).attr('fill-opacity',0.85)
    .attr('tabindex',0).attr('aria-label',d=>`Delivery ${d.delivery_days} days review ${d.review_score}`);
  dots.transition().duration(800).attr('r',d=>rScale(d.review_score));
  const tooltip = d3.select('#delivery-review-scatter').append('div')
    .style('position','absolute').style('pointer-events','none').style('background','rgba(0,0,0,0.75)')
    .style('color','#fff').style('padding','4px 8px').style('border-radius','4px')
    .style('font-size','0.7rem').style('opacity',0);
  dots.on('mouseenter focus', function(e,d){
    tooltip.style('opacity',1).html(`Delivery: ${d.delivery_days}d<br>Review: ${d.review_score}`);
  }).on('mousemove', function(e){ tooltip.style('left',(e.pageX+12)+'px').style('top',(e.pageY-28)+'px'); })
    .on('mouseleave blur', ()=> tooltip.style('opacity',0));
})();
</script>
''')

### 13. Insights and Recommendations (Preliminary)

Initial observations (automated; manual refinement pending):
- Revenue growth versus the comparison year should be visible in the line chart (validate the sign of the percentage in the KPI card).
- Category revenue concentration: the top 5 categories likely account for the majority of revenue (validate against a Pareto ~80/20 split once the data is rendered).
- The skew of the review distribution will inform whether to prioritize quality or logistics improvements.
- Delivery vs Review scatter: identify the threshold beyond which faster delivery no longer improves review scores.
- Geographic disparities (once the choropleth is implemented) can guide regional logistics or marketing focus.

The next refinement pass should:
1. Quantify the cumulative revenue share of the top categories.
2. Compute the correlation between delivery days and review score, and report the coefficient (r).
3. Flag underperforming regions (below-average revenue and below-average review score).
4. Add filter controls (month range / category) that recalculate the JSON exports dynamically.
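
The first three items can be sketched with the standard library alone. The helpers below (`cumulative_share`, `pearson_r`, `underperforming_regions`) are hypothetical names, the sample rows are made up, and the record keys are assumed to match the ones the charts read (`category`, `revenue`, `region`, `avgReview`).

```python
import math


def cumulative_share(records, key="revenue"):
    """Sort records by `key` (descending) and attach the running share of the total."""
    ordered = sorted(records, key=lambda r: r[key], reverse=True)
    total = sum(r[key] for r in ordered)
    running, out = 0.0, []
    for r in ordered:
        running += r[key]
        out.append({**r, "cum_share": running / total if total else 0.0})
    return out


def pearson_r(xs, ys):
    """Pearson correlation coefficient; None when either series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def underperforming_regions(records):
    """Regions below the mean on both revenue and avgReview."""
    if not records:
        return []
    mean_rev = sum(r["revenue"] for r in records) / len(records)
    mean_review = sum(r["avgReview"] for r in records) / len(records)
    return [r["region"] for r in records
            if r["revenue"] < mean_rev and r["avgReview"] < mean_review]


# Made-up rows for illustration only.
categories = [{"category": "b", "revenue": 30},
              {"category": "a", "revenue": 60},
              {"category": "c", "revenue": 10}]
regions = [{"region": "A", "revenue": 100, "avgReview": 4.5},
           {"region": "B", "revenue": 10, "avgReview": 3.0},
           {"region": "C", "revenue": 40, "avgReview": 4.0}]
print([c["category"] for c in cumulative_share(categories)])
print(underperforming_regions(regions))
```

Because review scores are ordinal (1 to 5), a rank-based coefficient may describe the delivery/review relationship better than Pearson's r; the sketch uses Pearson only because it is the simplest to compute by hand.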

### 14. TODO / Backlog Checklist

- [ ] Implement Choropleth (TopoJSON fetch + color legend).
- [ ] Add dynamic filtering (month range slider) and re-render charts.
- [ ] Compute & embed correlation coefficient for delivery vs review.
- [ ] Accessibility audit (keyboard traversal order, ARIA roles, contrast in dark mode).
- [ ] Performance pass (minify inline JS, consider external JS bundling if size grows).
- [ ] Add download button for aggregated JSON metrics.
- [ ] Unit-test Python data transformations (business rules) separately.
- [ ] Document theming API and data export interface.

---
End of current interactive D3 EDA scaffold. Proceed with backlog items to achieve full spec compliance from `prompt_d3.md`. ✅