# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work.

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per supplier that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.
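The two aggregates can be sketched as plain `groupby` sums. This is only an illustration on a tiny made-up sample: it assumes the first aggregate groups on the `SUPPLIER` column and that "item" means `ITEM CODE` (grouping on `ITEM TYPE` would work the same way), and it sums only the three sales columns.

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names (values are illustrative only).
sample = pd.DataFrame({
    "SUPPLIER": ["A", "A", "B"],
    "ITEM CODE": ["1", "2", "1"],
    "RETAIL SALES": [1.0, 2.0, 3.0],
    "RETAIL TRANSFERS": [0.0, 1.0, 1.0],
    "WAREHOUSE SALES": [5.0, 0.0, 2.0],
})

# Columns that make sense to add up (YEAR and MONTH are deliberately left out).
value_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# One row per supplier with the summed value columns.
per_supplier = sample.groupby("SUPPLIER", as_index=False)[value_cols].sum()

# One row per item with the same summed columns (assumption: item = ITEM CODE).
per_item = sample.groupby("ITEM CODE", as_index=False)[value_cols].sum()

print(per_supplier)
print(per_item)
```

Selecting the value columns before calling `sum()` keeps text columns from being concatenated and avoids meaningless totals for `YEAR` and `MONTH`.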

You can import the dataset `warehouse_and_retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per supplier.
    - A table for the aggregate per item.
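One possible way to write the three tables is `DataFrame.to_sql` over a SQLite connection. The sketch below is a minimal example under several assumptions: the local database is SQLite, the table names (`sales_clean`, `sales_by_supplier`, `sales_by_item`) are hypothetical, the data is a made-up sample, and an in-memory database stands in for a real file.

```python
import sqlite3

import pandas as pd

# Made-up stand-in for the cleaned data (values are illustrative only).
cleaned = pd.DataFrame({
    "SUPPLIER": ["A", "A", "B"],
    "ITEM CODE": ["1", "2", "1"],
    "RETAIL SALES": [1.0, 2.0, 3.0],
})
per_supplier = cleaned.groupby("SUPPLIER", as_index=False)[["RETAIL SALES"]].sum()
per_item = cleaned.groupby("ITEM CODE", as_index=False)[["RETAIL SALES"]].sum()

# Hypothetical table names; pick whatever fits your schema.
tables = {
    "sales_clean": cleaned,
    "sales_by_supplier": per_supplier,
    "sales_by_item": per_item,
}

# ":memory:" keeps the sketch self-contained; use a file path for a persistent database.
con = sqlite3.connect(":memory:")
for name, table in tables.items():
    # if_exists="replace" overwrites the table each time the process runs.
    table.to_sql(name, con, if_exists="replace", index=False)

# Read the row counts back to confirm that every table was populated.
row_counts = {
    name: con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    for name in tables
}
con.close()
print(row_counts)
```

For a daily load, `if_exists="append"` may be a better fit for the cleaned table, while the aggregates would need to be recomputed or merged; which one applies depends on how the daily file is defined.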

## Instructions
* Read the CSV file you can find in Ironhack's database.
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [1]:
# your code here

import pandas as pd
import numpy as np

In [2]:
# Create the DataFrame and display it
df = pd.read_csv('Warehouse_and_Retail_Sales_20240205.csv')

df

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2020,1,REPUBLIC NATIONAL DISTRIBUTING CO,100009,BOOTLEG RED - 750ML,WINE,0.00,0.0,2.00
1,2020,1,PWSWN INC,100024,MOMENT DE PLAISIR - 750ML,WINE,0.00,1.0,4.00
2,2020,1,RELIABLE CHURCHILL LLLP,1001,S SMITH ORGANIC PEAR CIDER - 18.7OZ,BEER,0.00,0.0,1.00
3,2020,1,LANTERNA DISTRIBUTORS INC,100145,SCHLINK HAUS KABINETT - 750ML,WINE,0.00,0.0,1.00
4,2020,1,DIONYSOS IMPORTS INC,100293,SANTORINI GAVALA WHITE - 750ML,WINE,0.82,0.0,0.00
...,...,...,...,...,...,...,...,...,...
307640,2020,9,DOPS INC,97896,ST PETERS ORGANIC ENG ALE NR 12/CS - 16.9OZ,BEER,0.00,0.0,1.00
307641,2020,9,ANHEUSER BUSCH INC,97918,STELLA ARTOIS 2/12 NR - 11.2OZ,BEER,372.45,315.0,3586.88
307642,2020,9,HEINEKEN USA,97942,TECATE 4/6 LNNR - 12OZ,BEER,7.79,0.0,4.00
307643,2020,9,RELIABLE CHURCHILL LLLP,97950,S SMITH WINTER WELCOME NR 12/CS - 18.7OZ,BEER,0.00,0.0,2.00


In [3]:
# Check whether there are null values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307645 entries, 0 to 307644
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   YEAR              307645 non-null  int64  
 1   MONTH             307645 non-null  int64  
 2   SUPPLIER          307478 non-null  object 
 3   ITEM CODE         307645 non-null  object 
 4   ITEM DESCRIPTION  307645 non-null  object 
 5   ITEM TYPE         307644 non-null  object 
 6   RETAIL SALES      307642 non-null  float64
 7   RETAIL TRANSFERS  307645 non-null  float64
 8   WAREHOUSE SALES   307645 non-null  float64
dtypes: float64(3), int64(2), object(4)
memory usage: 21.1+ MB


In [4]:
# Percentage of missing values per column
nan_cols = df.isna().sum() / df.shape[0] * 100

In [5]:
nan_cols[nan_cols>0]

SUPPLIER        0.054283
ITEM TYPE       0.000325
RETAIL SALES    0.000975
dtype: float64

In [6]:
df.shape

(307645, 9)

In [7]:
# Rows with a missing value in any of the three affected columns
nan_df = df[df['SUPPLIER'].isna() | df['ITEM TYPE'].isna() | df['RETAIL SALES'].isna()]

In [8]:
nan_df.head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
106,2020,1,,107,JIGGER MEASURE SHOT GLASS,STR_SUPPLIES,14.69,18.0,0.0
188,2020,1,,113,BARTENDERS BLACK BOOK,STR_SUPPLIES,0.4,0.0,0.0
231,2020,1,,115,PLASTIC SHOT GLASS PACK,STR_SUPPLIES,5.71,6.0,0.0
252,2020,1,,117,WHISKEY TASTING JOURNAL,STR_SUPPLIES,0.08,0.0,0.0
261,2020,1,,118,PLASTIC WINE GLASS PACK,STR_SUPPLIES,7.4,10.0,0.0


In [9]:
# Drop every row that has at least one missing value (168 rows here)
df = df.dropna()

In [10]:
df.shape

(307477, 9)
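Dropping rows removes 168 records (307645 down to 307477), and the rows previewed in `nan_df` are store supplies that have sales but no supplier. If those sales should still count in the per-item totals, filling the gaps is one alternative. The sketch below uses a made-up sample, and the placeholder `'UNKNOWN'` and the `0.0` default are assumptions, not values taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same kinds of gaps as the real data (values are illustrative only).
sample = pd.DataFrame({
    "SUPPLIER": ["A", np.nan, "B"],
    "ITEM TYPE": ["WINE", "STR_SUPPLIES", np.nan],
    "RETAIL SALES": [1.0, 14.69, np.nan],
})

# Per-column replacement values (assumed placeholders).
filled = sample.fillna({
    "SUPPLIER": "UNKNOWN",
    "ITEM TYPE": "UNKNOWN",
    "RETAIL SALES": 0.0,
})

print(filled)
```

Whether to drop or fill depends on how the aggregates will be used; filling keeps every row but introduces an artificial `UNKNOWN` group.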

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 307477 entries, 0 to 307644
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   YEAR              307477 non-null  int64  
 1   MONTH             307477 non-null  int64  
 2   SUPPLIER          307477 non-null  object 
 3   ITEM CODE         307477 non-null  object 
 4   ITEM DESCRIPTION  307477 non-null  object 
 5   ITEM TYPE         307477 non-null  object 
 6   RETAIL SALES      307477 non-null  float64
 7   RETAIL TRANSFERS  307477 non-null  float64
 8   WAREHOUSE SALES   307477 non-null  float64
dtypes: float64(3), int64(2), object(4)
memory usage: 23.5+ MB


In [12]:
nan_df.head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
106,2020,1,,107,JIGGER MEASURE SHOT GLASS,STR_SUPPLIES,14.69,18.0,0.0
188,2020,1,,113,BARTENDERS BLACK BOOK,STR_SUPPLIES,0.4,0.0,0.0
231,2020,1,,115,PLASTIC SHOT GLASS PACK,STR_SUPPLIES,5.71,6.0,0.0
252,2020,1,,117,WHISKEY TASTING JOURNAL,STR_SUPPLIES,0.08,0.0,0.0
261,2020,1,,118,PLASTIC WINE GLASS PACK,STR_SUPPLIES,7.4,10.0,0.0


In [13]:
df.groupby('SUPPLIER')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1216a8190>

In [14]:
df.groupby('SUPPLIER').sum().head()

SUPPLIER,YEAR,MONTH,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
8 VINI INC,18154,74,3313483313483313483313483283823313483313483313...,SECOLI RIPASSO DELIO VAL DOC - 750MLSECOLI RIP...,WINEWINEWINEWINEWINEWINEWINEWINEWINE,2.53,2.0,1.0
A HARDY USA LTD,8068,33,44262442624426244262,BUNRATTY POTCHEEN GLASS - 750MLBUNRATTY POTCHE...,LIQUORLIQUORLIQUORLIQUOR,0.56,0.0,0.0
A I G WINE & SPIRITS,143285,450,1192291202003354521192291202003354521192291202...,TROCADERO SPARK(BRUT) - 750MLDOM DES FONTANELL...,WINEWINEWINEWINEWINEWINEWINEWINEWINEWINEWINELI...,13.24,4.92,197.0
A VINTNERS SELECTIONS,20162849,65284,1487171015321015671030551036081056511086341087...,YAEGAKI SAKE - 18LHATSUMAGO SAKE JUN MAI SHU -...,WINEWINEWINEWINEWINEWINEWINELIQUORWINEWINEWINE...,9482.87,8238.29,35241.97
A&E INC,84732,311,3118433126453130433387524628931264531304331184...,ORLANDO ABRIGO BARBERA D'ALBA - 750MLVILLA VIS...,WINEWINEWINEWINEWINEWINEWINEWINEWINEWINEWINEWI...,11.49,0.08,0.0


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 307477 entries, 0 to 307644
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   YEAR              307477 non-null  int64  
 1   MONTH             307477 non-null  int64  
 2   SUPPLIER          307477 non-null  object 
 3   ITEM CODE         307477 non-null  object 
 4   ITEM DESCRIPTION  307477 non-null  object 
 5   ITEM TYPE         307477 non-null  object 
 6   RETAIL SALES      307477 non-null  float64
 7   RETAIL TRANSFERS  307477 non-null  float64
 8   WAREHOUSE SALES   307477 non-null  float64
dtypes: float64(3), int64(2), object(4)
memory usage: 23.5+ MB


In [16]:
df.groupby('SUPPLIER').mean(numeric_only=True).head()

SUPPLIER,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
8 VINI INC,2017.111111,8.222222,0.281111,0.222222,0.111111
A HARDY USA LTD,2017.0,8.25,0.14,0.0,0.0
A I G WINE & SPIRITS,2018.098592,6.338028,0.186479,0.069296,2.774648
A VINTNERS SELECTIONS,2017.495397,6.532319,0.948856,0.824324,3.526313
A&E INC,2017.428571,7.404762,0.273571,0.001905,0.0


In [17]:
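# _get_numeric_data() is a private pandas method; select_dtypes('number') in the next cell is the public equivalent
df._get_numeric_data().head()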

Unnamed: 0,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2020,1,0.0,0.0,2.0
1,2020,1,0.0,1.0,4.0
2,2020,1,0.0,0.0,1.0
3,2020,1,0.0,0.0,1.0
4,2020,1,0.82,0.0,0.0


In [18]:
df.select_dtypes('number').head()

Unnamed: 0,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2020,1,0.0,0.0,2.0
1,2020,1,0.0,1.0,4.0
2,2020,1,0.0,0.0,1.0
3,2020,1,0.0,0.0,1.0
4,2020,1,0.82,0.0,0.0


In [19]:
df.select_dtypes('object').head()

Unnamed: 0,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE
0,REPUBLIC NATIONAL DISTRIBUTING CO,100009,BOOTLEG RED - 750ML,WINE
1,PWSWN INC,100024,MOMENT DE PLAISIR - 750ML,WINE
2,RELIABLE CHURCHILL LLLP,1001,S SMITH ORGANIC PEAR CIDER - 18.7OZ,BEER
3,LANTERNA DISTRIBUTORS INC,100145,SCHLINK HAUS KABINETT - 750ML,WINE
4,DIONYSOS IMPORTS INC,100293,SANTORINI GAVALA WHITE - 750ML,WINE


In [20]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
YEAR,307477.0,2018.438238,1.08308,2017.0,2017.0,2019.0,2019.0,2020.0
MONTH,307477.0,6.424064,3.461853,1.0,3.0,7.0,9.0,12.0
RETAIL SALES,307477.0,7.003644,30.387012,-6.49,0.0,0.32,3.26,1816.49
RETAIL TRANSFERS,307477.0,6.938177,30.244239,-38.49,0.0,0.0,3.0,1990.83
WAREHOUSE SALES,307477.0,25.375561,249.500572,-4996.0,0.0,1.0,5.0,18317.0


In [21]:
df.describe(include='object').T

Unnamed: 0,count,unique,top,freq
SUPPLIER,307477,396,REPUBLIC NATIONAL DISTRIBUTING CO,20994
ITEM CODE,307477,34039,81321,24
ITEM DESCRIPTION,307477,34805,BURGANS ALBARINO - 750ML,44
ITEM TYPE,307477,8,WINE,187640


In [22]:
df['ITEM CODE'][0]

'100009'

In [23]:
df.groupby('SUPPLIER').mean(numeric_only=True).head()

SUPPLIER,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
8 VINI INC,2017.111111,8.222222,0.281111,0.222222,0.111111
A HARDY USA LTD,2017.0,8.25,0.14,0.0,0.0
A I G WINE & SPIRITS,2018.098592,6.338028,0.186479,0.069296,2.774648
A VINTNERS SELECTIONS,2017.495397,6.532319,0.948856,0.824324,3.526313
A&E INC,2017.428571,7.404762,0.273571,0.001905,0.0


In [24]:
df.groupby('SUPPLIER').sum(numeric_only=False).head()

SUPPLIER,YEAR,MONTH,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
8 VINI INC,18154,74,3313483313483313483313483283823313483313483313...,SECOLI RIPASSO DELIO VAL DOC - 750MLSECOLI RIP...,WINEWINEWINEWINEWINEWINEWINEWINEWINE,2.53,2.0,1.0
A HARDY USA LTD,8068,33,44262442624426244262,BUNRATTY POTCHEEN GLASS - 750MLBUNRATTY POTCHE...,LIQUORLIQUORLIQUORLIQUOR,0.56,0.0,0.0
A I G WINE & SPIRITS,143285,450,1192291202003354521192291202003354521192291202...,TROCADERO SPARK(BRUT) - 750MLDOM DES FONTANELL...,WINEWINEWINEWINEWINEWINEWINEWINEWINEWINEWINELI...,13.24,4.92,197.0
A VINTNERS SELECTIONS,20162849,65284,1487171015321015671030551036081056511086341087...,YAEGAKI SAKE - 18LHATSUMAGO SAKE JUN MAI SHU -...,WINEWINEWINEWINEWINEWINEWINELIQUORWINEWINEWINE...,9482.87,8238.29,35241.97
A&E INC,84732,311,3118433126453130433387524628931264531304331184...,ORLANDO ABRIGO BARBERA D'ALBA - 750MLVILLA VIS...,WINEWINEWINEWINEWINEWINEWINEWINEWINEWINEWINEWI...,11.49,0.08,0.0


In [25]:
grupo_supplier = df.groupby('SUPPLIER').agg({'YEAR': 'max', 
                            'MONTH': 'min', 
                            'ITEM CODE': 'first', 
                            'ITEM DESCRIPTION': 'last',
                            'ITEM TYPE': 'max', 
                            'RETAIL SALES': 'mean', 
                            'RETAIL TRANSFERS':'mean', 
                            'WAREHOUSE SALES': 'mean'})

In [26]:
grupo_supplier

SUPPLIER,YEAR,MONTH,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
8 VINI INC,2018,1,331348,SECOLI RIPASSO DELIO VAL DOC - 750ML,WINE,0.281111,0.222222,0.111111
A HARDY USA LTD,2017,6,44262,BUNRATTY POTCHEEN GLASS - 750ML,LIQUOR,0.140000,0.000000,0.000000
A I G WINE & SPIRITS,2020,1,119229,DOM DES FONTANELLE S/BLC - 750ML,WINE,0.186479,0.069296,2.774648
A VINTNERS SELECTIONS,2019,1,148717,ROOT 1 CAB - 750ML,WINE,0.948856,0.824324,3.526313
A&E INC,2019,1,311843,TENUTE RUBINO SUSUMANIELLO OLTREME 13 - 750ML,WINE,0.273571,0.001905,0.000000
...,...,...,...,...,...,...,...,...
WITH MALUS AFORETHOUGHT LLC,2020,9,356900,LIFE'S A PEACH 6/4 CANS,BEER,0.000000,0.000000,7.500000
YOUNG WON TRADING INC,2020,1,10345,HAKUSHIKA JUNMAI GINJO - 900ML,WINE,1.733575,1.693499,3.627084
YUENGLING BREWERY,2020,1,23733,YUENGLING LAGER 1/2K,KEGS,50.169281,48.806383,292.261133
Z WINE GALLERY IMPORTS LLC,2020,1,332104,ALAIN GEOFFREY CHAB - 750ML,WINE,0.440909,0.414773,0.681818


In [27]:
grupo_supplier = grupo_supplier.reset_index()

In [28]:
grupo_supplier.head()

Unnamed: 0,SUPPLIER,YEAR,MONTH,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,8 VINI INC,2018,1,331348,SECOLI RIPASSO DELIO VAL DOC - 750ML,WINE,0.281111,0.222222,0.111111
1,A HARDY USA LTD,2017,6,44262,BUNRATTY POTCHEEN GLASS - 750ML,LIQUOR,0.14,0.0,0.0
2,A I G WINE & SPIRITS,2020,1,119229,DOM DES FONTANELLE S/BLC - 750ML,WINE,0.186479,0.069296,2.774648
3,A VINTNERS SELECTIONS,2019,1,148717,ROOT 1 CAB - 750ML,WINE,0.948856,0.824324,3.526313
4,A&E INC,2019,1,311843,TENUTE RUBINO SUSUMANIELLO OLTREME 13 - 750ML,WINE,0.273571,0.001905,0.0


In [29]:
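# Note: .head() keeps only the first five suppliers, so grupo_supplier_2 (and the CSV written from it below) is a 5-row sample
grupo_supplier_2 = df.groupby('SUPPLIER').agg({'RETAIL SALES': ['min', 'mean', 'std', 'max']}).head()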

In [30]:
grupo_supplier_2

,RETAIL SALES,RETAIL SALES,RETAIL SALES,RETAIL SALES
SUPPLIER,min,mean,std,max
8 VINI INC,0.0,0.281111,0.340971,1.07
A HARDY USA LTD,0.08,0.14,0.04,0.16
A I G WINE & SPIRITS,0.0,0.186479,0.340676,1.67
A VINTNERS SELECTIONS,-1.84,0.948856,3.280589,80.88
A&E INC,0.0,0.273571,0.276933,1.41


In [31]:
grupo_supplier_2.columns

MultiIndex([('RETAIL SALES',  'min'),
            ('RETAIL SALES', 'mean'),
            ('RETAIL SALES',  'std'),
            ('RETAIL SALES',  'max')],
           )

In [32]:
grupo_supplier_2[('RETAIL SALES', 'mean')]

SUPPLIER
8 VINI INC               0.281111
A HARDY USA LTD          0.140000
A I G WINE & SPIRITS     0.186479
A VINTNERS SELECTIONS    0.948856
A&E INC                  0.273571
Name: (RETAIL SALES, mean), dtype: float64
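Passing a list of functions to `agg` produces the two-level column index shown above, which is why a column has to be selected with a tuple. If flat column names are preferred, named aggregation is one alternative. The sketch below uses a made-up sample, and the output column names are hypothetical.

```python
import pandas as pd

# Made-up rows (values are illustrative only).
sample = pd.DataFrame({
    "SUPPLIER": ["A", "A", "B"],
    "RETAIL SALES": [1.0, 3.0, 5.0],
})

# Each keyword becomes a flat output column: name=(source column, function).
stats = sample.groupby("SUPPLIER").agg(
    retail_sales_min=("RETAIL SALES", "min"),
    retail_sales_mean=("RETAIL SALES", "mean"),
    retail_sales_max=("RETAIL SALES", "max"),
).reset_index()

print(stats)
```

With flat names, a column is selected the usual way, for example `stats["retail_sales_mean"]`.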

In [34]:
df.head()['ITEM TYPE'].to_list()

['WINE', 'WINE', 'BEER', 'WINE', 'WINE']

In [36]:
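from collections import Counter

def moda(lst):
    """Return the most frequent value in a Series; ties go to the value seen first."""
    counts = Counter(lst.to_list())
    # most_common(1) yields [(value, count)]; fall back to '' for an empty Series
    return counts.most_common(1)[0][0] if counts else ''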

In [37]:
df.groupby('SUPPLIER').agg({'ITEM TYPE': moda}).head()

SUPPLIER,ITEM TYPE
8 VINI INC,WINE
A HARDY USA LTD,LIQUOR
A I G WINE & SPIRITS,WINE
A VINTNERS SELECTIONS,WINE
A&E INC,WINE


In [38]:
grupo_supplier_2.reset_index().to_csv('supplier_group.csv')

In [39]:
pd.read_csv('supplier_group.csv')

Unnamed: 0.1,Unnamed: 0,SUPPLIER,RETAIL SALES,RETAIL SALES.1,RETAIL SALES.2,RETAIL SALES.3
0,,,min,mean,std,max
1,0.0,8 VINI INC,0.0,0.28111111111111114,0.3409708361592104,1.07
2,1.0,A HARDY USA LTD,0.08,0.14,0.04000000000000001,0.16
3,2.0,A I G WINE & SPIRITS,0.0,0.18647887323943663,0.3406762123925603,1.67
4,3.0,A VINTNERS SELECTIONS,-1.84,0.9488563137882731,3.2805886859304656,80.88
5,4.0,A&E INC,0.0,0.2735714285714286,0.27693286523395866,1.41


In [40]:
pd.read_csv('supplier_group.csv').to_csv('supplier_group.csv')

In [41]:
pd.read_csv('supplier_group.csv')

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,SUPPLIER,RETAIL SALES,RETAIL SALES.1,RETAIL SALES.2,RETAIL SALES.3
0,0,,,min,mean,std,max
1,1,0.0,8 VINI INC,0.0,0.28111111111111114,0.3409708361592104,1.07
2,2,1.0,A HARDY USA LTD,0.08,0.14,0.04000000000000001,0.16
3,3,2.0,A I G WINE & SPIRITS,0.0,0.18647887323943663,0.3406762123925603,1.67
4,4,3.0,A VINTNERS SELECTIONS,-1.84,0.9488563137882731,3.2805886859304656,80.88
5,5,4.0,A&E INC,0.0,0.2735714285714286,0.27693286523395866,1.41
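The extra `Unnamed: 0` columns appear because `to_csv` writes the row index by default, and the `min`/`mean`/`std`/`max` labels end up as a data row because the two-level header is written as two lines. A sketch of one way to get a clean round trip follows: flatten the header, then write with `index=False`. The sample data is made up, and an in-memory buffer stands in for the file so nothing is written to disk.

```python
import io

import pandas as pd

# Made-up rows (values are illustrative only).
sample = pd.DataFrame({
    "SUPPLIER": ["A", "A", "B"],
    "RETAIL SALES": [1.0, 3.0, 5.0],
})
stats = sample.groupby("SUPPLIER").agg({"RETAIL SALES": ["min", "max"]})

# Join the two header levels into a single name, e.g. "RETAIL SALES_min".
stats.columns = ["_".join(col) for col in stats.columns]
flat = stats.reset_index()

# index=False stops the row index from being written as an extra column.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)

# Reading the CSV back gives the same columns, with no "Unnamed" leftovers.
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip)
```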


In [42]:
df.groupby('ITEM TYPE')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1223d2fd0>

In [43]:
df.groupby('ITEM TYPE').sum(numeric_only=True)

ITEM TYPE,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
BEER,85610635,270898,574220.53,566714.0,6527236.51
DUNNAGE,145332,456,0.0,0.0,-121307.0
KEGS,20478724,64836,0.0,-1.0,118431.0
LIQUOR,131015604,414463,802691.43,794735.71,94906.27
NON-ALCOHOL,3833287,12026,27150.31,26666.38,26149.59
REF,159440,505,663.63,388.92,0.0
STR_SUPPLIES,641905,2028,2234.9,10207.66,0.0
WINE,378738407,1210040,746498.59,734618.04,1156984.91


In [44]:
df[df['WAREHOUSE SALES']<0].head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
822,2020,1,PREMIUM DISTRIBUTORS INC,175,EMPTY 1/2 KEG (30.00),DUNNAGE,0.0,0.0,-3999.0
1011,2020,1,ANHEUSER BUSCH INC,205,EMPTY 1/6 KEG (30.00),DUNNAGE,0.0,0.0,-934.0
1552,2020,1,FIVE GRAPES LLC,236502,J DE TELMONT SANS SOUFRE AJOUTE BRUT CHAMP - 7...,WINE,0.0,0.0,-1.0
2144,2020,1,LEGENDS LTD,25049,ERDINGER HEFE WEISSE 50L 1/2K,KEGS,0.0,0.0,-1.0
2305,2020,1,MILLER BREWING COMPANY,26350,LEINENKUGEL SNOW DRIFT VANILLA PRTR 1/2 KG,KEGS,0.0,0.0,-1.0
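To see which item types the negative values come from, the boolean mask can be counted per group. The sketch below uses a made-up sample whose values only mimic the rows above; it shows the counting technique and says nothing about the real distribution.

```python
import pandas as pd

# Made-up rows (values are illustrative only).
sample = pd.DataFrame({
    "ITEM TYPE": ["DUNNAGE", "DUNNAGE", "WINE", "KEGS"],
    "WAREHOUSE SALES": [-3999.0, -934.0, 2.0, -1.0],
})

# True where the warehouse figure is negative.
is_negative = sample["WAREHOUSE SALES"] < 0

# Summing booleans per group counts the negative rows for each item type.
negative_counts = is_negative.groupby(sample["ITEM TYPE"]).sum()

print(negative_counts)
```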


In [45]:
df.groupby('ITEM TYPE').agg({'SUPPLIER': 'first',
                             'YEAR': 'min',
                             'MONTH': 'max',
                             'WAREHOUSE SALES': 'mean'
                            }).reset_index()

Unnamed: 0,ITEM TYPE,SUPPLIER,YEAR,MONTH,WAREHOUSE SALES
0,BEER,RELIABLE CHURCHILL LLLP,2017,12,153.897072
1,DUNNAGE,PREMIUM DISTRIBUTORS INC,2017,12,-1684.819444
2,KEGS,LEGENDS LTD,2017,12,11.672679
3,LIQUOR,JIM BEAM BRANDS CO,2017,12,1.462121
4,NON-ALCOHOL,AMERICAN BEVERAGE MARKETERS,2017,12,13.77019
5,REF,Default,2017,12,0.0
6,STR_SUPPLIES,Default,2017,12,0.0
7,WINE,REPUBLIC NATIONAL DISTRIBUTING CO,2017,12,6.165982


In [47]:
df_item = df.groupby('ITEM TYPE').agg({'SUPPLIER': ['first', 'last'],
                             'YEAR': ['min', 'max'],
                             'MONTH': ['min', 'max'],
                             'WAREHOUSE SALES': ['mean', 'std']
                            }).reset_index()

In [48]:
df_item.to_excel('item.xlsx')

In [49]:
df.groupby('ITEM TYPE').sum(numeric_only=True)

ITEM TYPE,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
BEER,85610635,270898,574220.53,566714.0,6527236.51
DUNNAGE,145332,456,0.0,0.0,-121307.0
KEGS,20478724,64836,0.0,-1.0,118431.0
LIQUOR,131015604,414463,802691.43,794735.71,94906.27
NON-ALCOHOL,3833287,12026,27150.31,26666.38,26149.59
REF,159440,505,663.63,388.92,0.0
STR_SUPPLIES,641905,2028,2234.9,10207.66,0.0
WINE,378738407,1210040,746498.59,734618.04,1156984.91
