# Study of a Hardware Store's Sales Data

Author: Diana Chacón Ocariz

## Context:

This is a small hardware store that carries just over 3,000 distinct products. It uses generic management software that produces a large number of reports, essentially tables of numbers that are difficult to analyze (a single report can run to several dozen pages).


## Business objectives:

**Gain better visibility into sales in order to improve the purchasing process and decision-making in general:**

    - Analyze sales objectively
    - Determine which products could run out of stock by the end of the period
    - Identify the least-sold products
    - Identify patterns in sales behavior in order to make sales forecasts
    

## Academic objectives:

    - Study a real case, with real data, whose results can help someone solve a problem
    - Show that data science can also help SMEs
    - Learn and practice the use of data science tools
    
## Data sources:

The data come from reports exported from the company's management software. They are .xls files that contain only the data from the reports on sales by product (2021 and 2022) and the stock at the end of the period.

# Notebook 2: EDA: Analysis and Visualization

We load the (already cleaned) data from the **parquet** files to begin a deeper analysis: answering business questions, checking whether there are patterns in sales, and preparing the data for use in predictive models.

We will use several visualization libraries in order to compare them: **Matplotlib, Seaborn and Altair**

In [1]:
# Libraries used

import os
import glob
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import altair as alt

In [2]:
BASE_DIR = Path.cwd()
BASE_DIR

PosixPath('/home/diana/Documentos/Ciencia de Datos/Proyecto Ventas')

In [3]:
%%time 
df_ventas = pd.read_parquet(BASE_DIR / 'datos/out/ventas.parquet', engine='fastparquet')
df_ventas

CPU times: user 3.7 s, sys: 80.2 ms, total: 3.78 s
Wall time: 3.85 s


Unnamed: 0,num,fecha_comp,cliente,vendedor,monto,tipo,cod,producto,cantidad,fecha,tasa_dolar,monto_dolar
0,2020-0000000001-ne,2020-01-07 11:04:00,18018450.0,13,170601.20,ne,,,0.0,2020-01-07,67581.00,2.524396
1,2020-0000000022-fa,2020-01-07 08:57:00,10747595.0,11,1377605.59,fa,,,0.0,2020-01-07,67581.00,20.384510
2,2020-0000000023-fa,2020-01-07 07:52:00,14281493.0,7,623407.20,fa,,,0.0,2020-01-07,67581.00,9.224593
3,2020-0000000024-fa,2020-01-07 09:25:00,19339734.0,7,8520323.59,fa,,,0.0,2020-01-07,67581.00,126.075725
4,2020-0000000025-fa,2020-01-07 09:20:00,16788717.0,13,490901.59,fa,,,0.0,2020-01-07,67581.00,7.263899
...,...,...,...,...,...,...,...,...,...,...,...,...
40916,2022-0000006497-fa,2022-02-05 11:55:00,13763788.0,13,1.19,fa,00809,CONFITERIA TORONTO SAVOY,1.0,2022-02-05,4.60,0.258696
40917,2022-0000006497-fa,2022-02-05 11:55:00,13763788.0,13,1.66,fa,01398,CONFITERIA PASTILLAS CHAO SANDIA/ CEREZA,1.0,2022-02-05,4.60,0.360870
40918,2022-0000006497-fa,2022-02-05 11:55:00,13763788.0,13,6.90,fa,01404,CONFITERIA OREO TUBITO,1.0,2022-02-05,4.60,1.500000
40919,2022-0000006498-fa,2022-02-05 11:58:00,10743720.0,1,69.75,fa,04072,CERRADURA MANILLA RECTA ALUMINIO TOC,1.0,2022-02-05,4.60,15.163043


# Sales Analysis

## Sales Variation over Time

In [4]:
df_ventas_fecha = df_ventas.loc[:,['fecha', 'num', 'cantidad', 'monto_dolar']]
df_ventas_fecha.set_index('fecha', inplace=True)
df_ventas_fecha

fecha,num,cantidad,monto_dolar
2020-01-07,2020-0000000001-ne,0.0,2.524396
2020-01-07,2020-0000000022-fa,0.0,20.384510
2020-01-07,2020-0000000023-fa,0.0,9.224593
2020-01-07,2020-0000000024-fa,0.0,126.075725
2020-01-07,2020-0000000025-fa,0.0,7.263899
...,...,...,...
2022-02-05,2022-0000006497-fa,1.0,0.258696
2022-02-05,2022-0000006497-fa,1.0,0.360870
2022-02-05,2022-0000006497-fa,1.0,1.500000
2022-02-05,2022-0000006498-fa,1.0,15.163043
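
With `fecha` as the index, pandas allows date ranges to be selected with partial date strings, which is convenient when looking at a specific stretch of the period. The sketch below uses a small invented frame (not the store's data) with the same column names, and assumes the index is a sorted `DatetimeIndex`.

```python
import pandas as pd

# Toy data with the same column names as df_ventas_fecha (values are invented)
toy = pd.DataFrame(
    {
        "fecha": pd.to_datetime(
            ["2021-02-27", "2021-03-01", "2021-04-15", "2021-06-30", "2021-07-01"]
        ),
        "num": ["a-fa", "b-fa", "c-fa", "d-fa", "e-fa"],
        "cantidad": [1.0, 2.0, 3.0, 4.0, 5.0],
        "monto_dolar": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
).set_index("fecha").sort_index()

# Partial-string slicing: both endpoints are inclusive and cover whole months
marzo_junio = toy.loc["2021-03":"2021-06"]
total_periodo = marzo_junio["monto_dolar"].sum()
print(marzo_junio)
print(total_periodo)
```

The same slice applied to the real frame would isolate any range of months for closer inspection.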


### Conclusions:

    1) On average, 409 products are sold each day
    2) Average daily sales are 673 dollars
    3) On average, 41 invoices are issued per day
    4) Sales increase noticeably from the second half of 2021 onward
    5) The lowest sales of the period studied were recorded between March and June 2021
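
The notebook does not show how the daily averages above were obtained. One plausible way, sketched here on invented line-item data with the notebook's column names, is to aggregate to one row per day and then average those daily totals. The names `toy`, `diario` and `promedios` are illustrative only.

```python
import pandas as pd

# Toy line-item data (invented values) using the notebook's column names
toy = pd.DataFrame(
    {
        "fecha": pd.to_datetime(
            ["2021-01-04", "2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"]
        ),
        "num": ["f1", "f1", "f2", "f3", "f3"],
        "cantidad": [2.0, 1.0, 4.0, 3.0, 2.0],
        "monto_dolar": [5.0, 2.5, 10.0, 7.5, 5.0],
    }
)

# One row per day: units sold, sales in $, and number of distinct invoices
diario = toy.groupby("fecha").agg(
    cantidad=("cantidad", "sum"),
    monto_dolar=("monto_dolar", "sum"),
    facturas=("num", "nunique"),
)

# Averaging the daily totals gives per-day averages of each measure
promedios = diario.mean()
print(promedios)
```

Note that this averages only over days that have at least one sale; days with no records do not appear in `diario` and therefore do not pull the averages down.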

## Month-by-Month Sales Evolution

In [5]:
# Use nunique so that each invoice reference is counted only once
df_ventas_ag = df_ventas.pivot_table(index='fecha', values=['num', 'cantidad', 'monto_dolar'], 
                                        aggfunc={'num':'nunique', 'cantidad':sum, 'monto_dolar':sum })
df_ventas_ag.reset_index(inplace=True)
df_ventas_ag

Unnamed: 0,fecha,cantidad,monto_dolar,num
0,2020-01-06,0.00,104.610817,21
1,2020-01-07,0.00,739.473869,59
2,2020-01-08,0.00,485.385935,64
3,2020-01-09,0.00,515.495960,65
4,2020-01-10,0.00,823.755195,66
...,...,...,...,...
597,2022-02-10,31.00,816.187364,6
598,2022-02-11,111.00,252.264642,8
599,2022-02-12,139.90,176.760259,22
600,2022-02-14,126.10,471.782609,15
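
The heading refers to a month-by-month view, while `df_ventas_ag` holds one row per day. One possible way to roll the daily totals up to months is `resample`. The sketch below uses invented daily values shaped like `df_ventas_ag` and assumes `fecha` is a datetime column; `toy_ag`, `mensual` and `variacion` are hypothetical names.

```python
import pandas as pd

# Invented daily totals shaped like df_ventas_ag
toy_ag = pd.DataFrame(
    {
        "fecha": pd.to_datetime(
            ["2021-01-04", "2021-01-20", "2021-02-01", "2021-02-15", "2021-03-03"]
        ),
        "cantidad": [10.0, 20.0, 5.0, 15.0, 8.0],
        "monto_dolar": [100.0, 200.0, 50.0, 150.0, 80.0],
        "num": [3, 4, 2, 5, 1],
    }
)

# "MS" labels each monthly bin with the first day of its month
mensual = toy_ag.set_index("fecha").resample("MS").sum()

# Month-over-month change in dollar sales, expressed as a fraction
mensual["variacion"] = mensual["monto_dolar"].pct_change()
print(mensual)
```

Summing the daily `num` counts assumes an invoice number never spans two days; if that does not hold, the monthly invoice count would need to be recomputed with `nunique` on the line-item data.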


In [6]:
df_ventas_ag['mes_anio'] = df_ventas_ag.fecha.dt.strftime('%m-%Y')
df_ventas_ag['dia_semana'] = df_ventas_ag.fecha.dt.weekday
df_ventas_ag['dia_mes'] = df_ventas_ag.fecha.dt.day
df_ventas_ag

Unnamed: 0,fecha,cantidad,monto_dolar,num,mes_anio,dia_semana,dia_mes
0,2020-01-06,0.00,104.610817,21,01-2020,0,6
1,2020-01-07,0.00,739.473869,59,01-2020,1,7
2,2020-01-08,0.00,485.385935,64,01-2020,2,8
3,2020-01-09,0.00,515.495960,65,01-2020,3,9
4,2020-01-10,0.00,823.755195,66,01-2020,4,10
...,...,...,...,...,...,...,...
597,2022-02-10,31.00,816.187364,6,02-2022,3,10
598,2022-02-11,111.00,252.264642,8,02-2022,4,11
599,2022-02-12,139.90,176.760259,22,02-2022,5,12
600,2022-02-14,126.10,471.782609,15,02-2022,0,14


In [7]:
lineas = alt.Chart(df_ventas_ag).mark_line().encode(
    x='fecha:T',
    y='monto_dolar:Q',
    color=alt.Color('yearmonth(fecha):O', scale=alt.Scale(scheme='goldgreen')),
    tooltip=[
        alt.Tooltip('fecha:T', title='Date'),
        alt.Tooltip('monto_dolar:Q', title='Sales in $')
    ]).properties(width=800, height=200)

lineas

In [8]:
lineas = alt.Chart(df_ventas_ag).mark_line().encode(
    x='fecha:T',
    y='cantidad:Q',
    color=alt.Color('yearmonth(fecha):O', scale=alt.Scale(scheme='purpleblue')),
    tooltip=[
        alt.Tooltip('fecha:T', title='Date'),
        alt.Tooltip('cantidad:Q', title='Sales Volume')
    ]).properties(width=800, height=200)

lineas

In [9]:
lineas = alt.Chart(df_ventas_ag).mark_line().encode(
    x='fecha:T',
    y='num:Q',
    color=alt.Color('yearmonth(fecha):O', scale=alt.Scale(scheme='goldorange')),
    tooltip=[
        alt.Tooltip('fecha:T', title='Date'),
        alt.Tooltip('num:Q', title='No. of Invoices')
    ]).properties(width=800, height=200)

lineas

In [10]:
df_ventas_dia_semana = df_ventas_ag.pivot_table(index='mes_anio',
                                            columns='dia_semana', aggfunc={'monto_dolar': sum},
                                            fill_value=0)
df_ventas_dia_semana 

monto_dolar by mes_anio (rows) and dia_semana (columns)
mes_anio,0,1,2,3,4,5,6
01-2020,1531.657232,2730.974894,1557.202647,2207.919301,2123.907868,82.358816,333.857234
01-2021,2950.556683,4039.964339,2166.060704,2864.239439,2884.332234,996.294727,0.0
01-2022,5993.008193,6345.862615,4211.000312,3849.191008,4383.430025,4456.175386,0.0
02-2020,1563.44772,2453.31183,2397.020698,1701.697084,4424.442111,0.0,697.923364
02-2021,1559.886233,3585.569561,3141.89273,3451.844984,3229.538314,1140.054082,0.0
02-2022,549.006036,496.764451,473.216378,894.183072,644.988555,194.042868,0.0
03-2020,3220.645464,2652.192226,2903.512438,1892.565451,1937.839969,0.0,1693.88965
03-2021,3298.064658,2713.807813,2568.880136,2812.007238,1635.795558,480.829274,0.0
04-2020,1027.444224,1645.236462,2942.52572,1801.916525,3949.732841,0.0,0.0
04-2021,2468.683163,3202.946005,2310.648747,2619.331624,2941.064728,437.554386,0.0
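
In the table above the rows appear as `01-2020`, `01-2021`, `01-2022`, `02-2020`, and so on, because `mes_anio` is a string in `%m-%Y` format and strings sort character by character. A minimal sketch of one way to obtain chronological order, assuming a year-first key is acceptable (monthly periods here; `strftime('%Y-%m')` would also sort correctly). The data and the names `toy_ag`, `periodo` and `por_periodo` are invented for illustration.

```python
import pandas as pd

fechas = pd.to_datetime(["2020-01-06", "2021-01-04", "2020-02-03", "2022-01-03"])
toy_ag = pd.DataFrame({"fecha": fechas, "monto_dolar": [1.0, 2.0, 3.0, 4.0]})

# Month-first strings sort lexicographically, which interleaves the years
toy_ag["mes_anio"] = toy_ag["fecha"].dt.strftime("%m-%Y")
orden_texto = sorted(toy_ag["mes_anio"].unique())

# Monthly periods sort chronologically
toy_ag["periodo"] = toy_ag["fecha"].dt.to_period("M")
por_periodo = toy_ag.groupby("periodo")["monto_dolar"].sum()

print(orden_texto)
print(por_periodo.index.astype(str).tolist())
```

Using the period (or a `%Y-%m` string) as the pivot index keeps each month next to its chronological neighbors, which makes trends easier to read row by row.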


In [11]:
mapa = alt.Chart(df_ventas_ag).mark_rect().encode(
                x=alt.X('day(fecha):T', title='Day of the Week', axis = alt.Axis(labelAngle=0, labelFontSize=14)),
                y=alt.Y('yearmonth(fecha):T', title = 'Month and Year', scale=alt.Scale(zero=False), 
                  axis = alt.Axis(grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10)),
                color=alt.Color('sum(monto_dolar):Q', scale=alt.Scale(scheme='goldgreen'), title='Sales in $'),
                tooltip=[
                    alt.Tooltip('day(fecha):T', title='Day'),
                    alt.Tooltip('yearmonth(fecha):T', title='Month and Year'),
                    alt.Tooltip('sum(monto_dolar):Q', title='Sales in $')]
                ).properties(title='Sales in $ by Day of the Week',
                             width=300, 
                             height=300
                ).configure_title(
                    fontSize = 16,
                    anchor = 'middle'
                ).interactive()

mapa

In [12]:
barras = alt.Chart(df_ventas_ag).mark_bar().encode(
                x=alt.X('day(fecha):T', title='Day of the Week', axis = alt.Axis(labelAngle=0, labelFontSize=14)),
                y=alt.Y('sum(monto_dolar):Q', title = 'Sales in $', scale=alt.Scale(zero=False), 
                  axis = alt.Axis(grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10)),
                color=alt.Color(
                    'sum(monto_dolar):Q', scale=alt.Scale(scheme='goldgreen'), title='Sales in $'),
                tooltip=[
                    alt.Tooltip('day(fecha):T', title='Day'),
                    alt.Tooltip('sum(monto_dolar):Q', title='Sales in $')],
                ).properties(title='Sales in $ by Day of the Week',
                             width=600, 
                             height=300
                ).configure_title(
                    fontSize = 16,
                    anchor = 'middle'
                ).interactive()
barras

In [13]:
df_ventas_dia_mes = df_ventas_ag.pivot_table(index='mes_anio',
                                            columns='dia_mes', aggfunc={'monto_dolar': sum},
                                            fill_value=0)
df_ventas_dia_mes

monto_dolar by mes_anio (rows) and dia_mes (columns)
mes_anio,1,2,3,4,5,6,7,8,9,10,...,22,23,24,25,26,27,28,29,30,31
01-2020,0.0,0.0,0.0,0.0,0.0,104.610817,739.473869,485.385935,515.49596,823.755195,...,386.757714,398.197755,476.281492,0.0,87.300732,386.07131,180.092693,380.879095,881.460704,348.088407
01-2021,0.0,0.0,0.0,0.0,1025.174371,338.556452,559.099981,406.877705,113.560197,0.0,...,704.632136,241.391261,0.0,1082.488334,1511.613371,545.241111,741.883988,719.801425,228.342753,0.0
01-2022,0.0,0.0,1628.923241,773.260504,854.5,1579.416327,1191.730392,698.201474,0.0,1482.0,...,357.679167,0.0,1219.419624,937.280335,953.359244,1157.382664,916.773784,538.966173,0.0,782.501057
02-2020,0.0,228.427405,591.898212,840.911939,434.875294,582.694724,568.515668,0.0,288.046326,395.750196,...,0.0,0.0,0.0,0.0,1062.540813,514.902254,2931.814066,0.0,0.0,0.0
02-2021,597.742987,928.112308,448.788414,676.658349,1448.390864,285.706649,0.0,528.333392,1052.128996,945.011517,...,433.809853,1605.328256,1137.02534,620.210707,722.559393,0.0,0.0,0.0,0.0,0.0
02-2022,24.791045,109.06823,77.995708,392.723913,17.282609,0.0,77.223427,224.206522,364.148148,816.187364,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
03-2020,102.471183,1569.100584,740.15696,1213.58669,581.002611,646.227529,0.0,1359.571616,940.063855,918.230849,...,0.0,146.73494,377.625383,959.703056,434.917167,396.910467,0.0,0.0,564.746086,616.179035
03-2021,1282.797023,327.117997,521.856075,1258.638821,606.867712,138.317334,0.0,566.644612,696.262891,616.013044,...,286.7857,644.981047,305.002041,691.916384,422.453511,125.785476,0.0,389.73321,387.051264,492.064458
04-2020,429.947573,343.727378,1865.040038,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,748.34513,501.504998,885.166617,0.0,0.0,130.564393,205.237018,543.355017,457.353527,0.0
04-2021,0.0,0.0,0.0,0.0,310.550924,577.616637,288.305906,468.976941,244.816049,144.322024,...,1097.080988,986.918616,170.528262,0.0,1268.441429,874.563691,684.912882,369.473585,1222.950739,0.0


In [14]:
mapa = alt.Chart(df_ventas_ag).mark_rect().encode(
                x=alt.X('dia_mes:O', title='Day of the Month', axis = alt.Axis(labelAngle=0, labelFontSize=14)),
                y=alt.Y('yearmonth(fecha):T', title = 'Month and Year', scale=alt.Scale(zero=False), 
                  axis = alt.Axis(grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10)),
                color=alt.Color(
                    'sum(monto_dolar):Q', scale=alt.Scale(scheme='goldgreen'), title='Sales in $'),
                tooltip=[
                    alt.Tooltip('dia_mes:O', title='Day'),
                    alt.Tooltip('yearmonth(fecha):T', title='Month and Year'),
                    alt.Tooltip('sum(monto_dolar):Q', format=',.2f', title='Sales in $'),
                    alt.Tooltip('cantidad:Q', format=',.2f', title='Sales Volume'),
                    alt.Tooltip('num:Q', title='No. of Invoices')]
                ).properties(title='Sales in $ by Day of the Month',
                             width=800,  
                             height=300
                ).configure_title(
                    fontSize = 16,
                    anchor = 'middle'
                ).interactive()

mapa

In [15]:
barras = alt.Chart(df_ventas_ag).mark_bar().encode(
                x=alt.X('dia_mes:O', title='Day of the Month', axis = alt.Axis(labelAngle=0, labelFontSize=14)),
                y=alt.Y('sum(monto_dolar):Q', title = 'Sales in $', scale=alt.Scale(zero=False), 
                  axis = alt.Axis(grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10)),
                color=alt.Color(
                    'sum(monto_dolar):Q', scale=alt.Scale(scheme='goldgreen'), title='Sales in $'),
                tooltip=[
                    alt.Tooltip('dia_mes:O', title='Day'),
                    alt.Tooltip('sum(monto_dolar):Q', title='Sales in $')],
                ).properties(title='Sales in $ by Day of the Month',
                             width=800, 
                             height=300
                ).configure_title(
                    fontSize = 16,
                    anchor = 'middle'
                ).interactive()
barras

Mondays and Tuesdays are the days with the highest sales in $; Fridays have the highest sales volume.

Thursdays are the days with the lowest sales, whether measured in volume or in $.
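
Weekday statements like these can be checked numerically as well as read off the charts. The following is a minimal sketch on invented daily totals shaped like `df_ventas_ag`; the variable names are hypothetical and the values were chosen only so that the result has the same shape as the observations above.

```python
import pandas as pd

# Invented daily totals shaped like df_ventas_ag
toy_ag = pd.DataFrame(
    {
        "fecha": pd.to_datetime(
            ["2021-03-01", "2021-03-02", "2021-03-04", "2021-03-05", "2021-03-08"]
        ),
        "cantidad": [10.0, 12.0, 3.0, 30.0, 11.0],
        "monto_dolar": [500.0, 450.0, 100.0, 300.0, 520.0],
    }
)

# day_name() returns English weekday names by default
toy_ag["dia"] = toy_ag["fecha"].dt.day_name()
por_dia = toy_ag.groupby("dia")[["cantidad", "monto_dolar"]].sum()

# Weekday with the highest / lowest totals for each measure
mejor_en_dolares = por_dia["monto_dolar"].idxmax()
mejor_en_volumen = por_dia["cantidad"].idxmax()
peor_en_dolares = por_dia["monto_dolar"].idxmin()
print(mejor_en_dolares, mejor_en_volumen, peor_en_dolares)
```

Totals favor weekdays that occur more often in the period or on which the store opens more regularly; comparing `mean()` per weekday instead of `sum()` is an alternative when the number of open days differs.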

### Analysis by Month

In [16]:
lineas = alt.Chart(df_ventas_ag).mark_bar().encode(
    x='dia_mes:O',
    y='monto_dolar:Q',
    color=alt.Color('yearmonth(fecha):T', scale=alt.Scale(scheme='goldgreen')),
    column=alt.Column('yearmonth(fecha):T', title='yearmonth(fecha)'),
    tooltip=[
        alt.Tooltip('fecha:T', title='Date'),
        alt.Tooltip('monto_dolar:Q', title='Sales in $'),
    ]).properties(width=800, height=200)

lineas