# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: your team has just been hired by a major retail company! So get ready for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per supplier that sums the remaining numeric values.
* One aggregate per item that sums the remaining numeric values.

You can import the dataset `warehouse_and_retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per supplier.
    - A table for the aggregate per item.

## Instructions
* Read the CSV file available in Ironhack's database.
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.
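
Since the file arrives daily, the first step can be sketched as a helper that picks the newest file in a folder. This is only a sketch under an assumption: that every daily file follows the pattern `Warehouse_and_Retail_Sales_YYYYMMDD.csv`, like the sample file read in this notebook. The function name `latest_daily_file` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_daily_file(folder: str,
                      prefix: str = "Warehouse_and_Retail_Sales_") -> Optional[Path]:
    """Return the CSV in `folder` with the most recent YYYYMMDD suffix, or None.

    Assumes (hypothetically) that daily files are named <prefix>YYYYMMDD.csv.
    """
    candidates = []
    for path in Path(folder).glob(f"{prefix}*.csv"):
        stamp = path.stem[len(prefix):]
        # Keep only files whose suffix looks like an 8-digit date.
        if len(stamp) == 8 and stamp.isdigit():
            candidates.append((stamp, path))
    if not candidates:
        return None
    # YYYYMMDD strings sort chronologically, so the largest stamp is the newest file.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can then be passed to `pd.read_csv`. Returning `None` when nothing matches lets the caller decide whether a missing file is an error.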

In [1]:
# your code here
import numpy as np
import pandas as pd

In [2]:
sales = pd.read_csv('Warehouse_and_Retail_Sales_20240205.csv')
sales.head()

,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2020,1,REPUBLIC NATIONAL DISTRIBUTING CO,100009,BOOTLEG RED - 750ML,WINE,0.0,0.0,2.0
1,2020,1,PWSWN INC,100024,MOMENT DE PLAISIR - 750ML,WINE,0.0,1.0,4.0
2,2020,1,RELIABLE CHURCHILL LLLP,1001,S SMITH ORGANIC PEAR CIDER - 18.7OZ,BEER,0.0,0.0,1.0
3,2020,1,LANTERNA DISTRIBUTORS INC,100145,SCHLINK HAUS KABINETT - 750ML,WINE,0.0,0.0,1.0
4,2020,1,DIONYSOS IMPORTS INC,100293,SANTORINI GAVALA WHITE - 750ML,WINE,0.82,0.0,0.0


In [3]:
sales.isna().sum()

YEAR                  0
MONTH                 0
SUPPLIER            167
ITEM CODE             0
ITEM DESCRIPTION      0
ITEM TYPE             1
RETAIL SALES          3
RETAIL TRANSFERS      0
WAREHOUSE SALES       0
dtype: int64

In [4]:
sales[sales['SUPPLIER'].isna()]




,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
106,2020,1,,107,JIGGER MEASURE SHOT GLASS,STR_SUPPLIES,14.69,18.0,0.0
188,2020,1,,113,BARTENDERS BLACK BOOK,STR_SUPPLIES,0.40,0.0,0.0
231,2020,1,,115,PLASTIC SHOT GLASS PACK,STR_SUPPLIES,5.71,6.0,0.0
252,2020,1,,117,WHISKEY TASTING JOURNAL,STR_SUPPLIES,0.08,0.0,0.0
261,2020,1,,118,PLASTIC WINE GLASS PACK,STR_SUPPLIES,7.40,10.0,0.0
...,...,...,...,...,...,...,...,...,...
307312,2019,11,,BC,BEER CREDIT,REF,0.00,0.0,-123.0
307313,2019,11,,WC,WINE CREDIT,REF,0.00,0.0,-275.0
307323,2020,9,,2,ICE,NON-ALCOHOL,1445.00,0.0,0.0
307431,2020,9,,3,COUPON,NON-ALCOHOL,,0.0,0.0


In [5]:
# Rows without a supplier get the placeholder 'otros' ("others").
sales['SUPPLIER'] = sales['SUPPLIER'].fillna('otros')
sales.isna().sum()

YEAR                0
MONTH               0
SUPPLIER            0
ITEM CODE           0
ITEM DESCRIPTION    0
ITEM TYPE           1
RETAIL SALES        3
RETAIL TRANSFERS    0
WAREHOUSE SALES     0
dtype: int64

In [6]:
sales[sales['ITEM TYPE'].isna()]

# The single row without an item type gets the placeholder 'otro' ("other").
sales['ITEM TYPE'] = sales['ITEM TYPE'].fillna('otro')
sales.isna().sum()

YEAR                0
MONTH               0
SUPPLIER            0
ITEM CODE           0
ITEM DESCRIPTION    0
ITEM TYPE           0
RETAIL SALES        3
RETAIL TRANSFERS    0
WAREHOUSE SALES     0
dtype: int64

In [7]:
# Drop the rows still missing a value (the 3 with no RETAIL SALES).
sales = sales.dropna()

In [8]:
sales.isna().sum()

YEAR                0
MONTH               0
SUPPLIER            0
ITEM CODE           0
ITEM DESCRIPTION    0
ITEM TYPE           0
RETAIL SALES        0
RETAIL TRANSFERS    0
WAREHOUSE SALES     0
dtype: int64
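
For a process that runs every day, the cleaning cells above could be collected into a single function. The following is a minimal sketch, not a definitive implementation: `clean_sales` is a hypothetical name, and the small sample frame is made up only to exercise it.

```python
import numpy as np
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps to a raw sales frame."""
    cleaned = df.copy()
    # Same placeholders as in the cells above: 'otros' / 'otro' ("other").
    cleaned["SUPPLIER"] = cleaned["SUPPLIER"].fillna("otros")
    cleaned["ITEM TYPE"] = cleaned["ITEM TYPE"].fillna("otro")
    # Rows that are still missing a value (e.g. RETAIL SALES) are dropped.
    return cleaned.dropna()


# Made-up sample: one complete row, one without supplier and item type,
# and one without RETAIL SALES.
sample = pd.DataFrame({
    "SUPPLIER": ["PWSWN INC", None, "PWSWN INC"],
    "ITEM TYPE": ["WINE", None, "WINE"],
    "RETAIL SALES": [0.82, 1.0, np.nan],
})
cleaned_sample = clean_sales(sample)
print(cleaned_sample)
```

Working on a copy keeps the raw frame untouched, which makes it easier to compare the data before and after cleaning.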

In [15]:
# grouped
sales.groupby('SUPPLIER').mean(numeric_only=True).head()  # without numeric_only=True this raises an error



SUPPLIER,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
8 VINI INC,2017.111111,8.222222,0.281111,0.222222,0.111111
A HARDY USA LTD,2017.0,8.25,0.14,0.0,0.0
A I G WINE & SPIRITS,2018.098592,6.338028,0.186479,0.069296,2.774648
A VINTNERS SELECTIONS,2017.495397,6.532319,0.948856,0.824324,3.526313
A&E INC,2017.428571,7.404762,0.273571,0.001905,0.0


In [16]:
sales.columns


Index(['YEAR', 'MONTH', 'SUPPLIER', 'ITEM CODE', 'ITEM DESCRIPTION',
       'ITEM TYPE', 'RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES'],
      dtype='object')

In [25]:
suppliergroup = sales.groupby('SUPPLIER').agg({'YEAR': 'max', 'MONTH': 'min', 'ITEM CODE': 'first', 'ITEM DESCRIPTION': 'last',
       'ITEM TYPE': 'min', 'RETAIL SALES': 'mean', 'RETAIL TRANSFERS': 'mean', 'WAREHOUSE SALES': 'mean'})
# Turn SUPPLIER back into a regular column so it is kept when exporting with index=False.
suppliergroup = suppliergroup.reset_index()

suppliergroup.head()


,SUPPLIER,YEAR,MONTH,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,8 VINI INC,2018,1,331348,SECOLI RIPASSO DELIO VAL DOC - 750ML,WINE,0.281111,0.222222,0.111111
1,A HARDY USA LTD,2017,6,44262,BUNRATTY POTCHEEN GLASS - 750ML,LIQUOR,0.14,0.0,0.0
2,A I G WINE & SPIRITS,2020,1,119229,DOM DES FONTANELLE S/BLC - 750ML,LIQUOR,0.186479,0.069296,2.774648
3,A VINTNERS SELECTIONS,2019,1,148717,ROOT 1 CAB - 750ML,BEER,0.948856,0.824324,3.526313
4,A&E INC,2019,1,311843,TENUTE RUBINO SUSUMANIELLO OLTREME 13 - 750ML,WINE,0.273571,0.001905,0.0


In [36]:
Itemcode_group=sales.groupby('ITEM CODE').sum(numeric_only=True)
Itemcode_group.head()

ITEM CODE,YEAR,MONTH,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
100002,2017,7,0.17,0.0,0.0
100007,8076,30,0.0,0.0,4.0
100008,6051,23,0.0,0.0,3.0
100009,26244,77,1.72,0.0,18.0
100011,6055,25,0.0,0.0,3.0
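
In the table above, `YEAR` and `MONTH` are summed along with the sales figures, which gives values such as 8076 that carry no meaning. One possible alternative, shown here only as a sketch, is to total just the three sales columns. The helper `sum_by`, the constant `VALUE_COLUMNS`, and the sample data are hypothetical and not part of the original notebook.

```python
import pandas as pd

# The three measures that can meaningfully be added up.
VALUE_COLUMNS = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]


def sum_by(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Total the three sales columns for each value of `key`."""
    # as_index=False keeps `key` as a regular column, ready for export.
    return df.groupby(key, as_index=False)[VALUE_COLUMNS].sum()


# Made-up sample, only to exercise the helper.
sample = pd.DataFrame({
    "YEAR": [2020, 2020, 2019],
    "SUPPLIER": ["A", "A", "B"],
    "ITEM CODE": ["100009", "100024", "100009"],
    "RETAIL SALES": [1.0, 2.0, 0.5],
    "RETAIL TRANSFERS": [0.0, 1.0, 0.0],
    "WAREHOUSE SALES": [2.0, 4.0, 1.0],
})
per_supplier = sum_by(sample, "SUPPLIER")
per_item = sum_by(sample, "ITEM CODE")
print(per_supplier)
print(per_item)
```

The same helper covers both aggregates the task asks for, so the supplier and item tables stay consistent with each other.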


In [37]:
sales.to_csv('salesclean.csv', index=False)
suppliergroup.to_csv('suppliergroup.csv',index=False)
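# ITEM CODE is the index of Itemcode_group; reset it so the column is not lost with index=False.
Itemcode_group.reset_index().to_csv('Itemcode_group.csv', index=False)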
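
The task asks for three tables in a local database, while the cell above writes CSV files. One way to do the database step is sketched below, assuming SQLite as the local database (the notebook does not say which engine to use). The function `write_tables`, the table names, and the demo data are all hypothetical.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def write_tables(tables: dict, db_path: str) -> None:
    """Write each DataFrame in `tables` to a SQLite table named after its key."""
    conn = sqlite3.connect(db_path)
    try:
        for name, frame in tables.items():
            # if_exists="replace" rebuilds the table on every run.
            frame.to_sql(name, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()


# Made-up demo data standing in for the cleaned sales and the two aggregates.
demo_clean = pd.DataFrame({
    "SUPPLIER": ["A", "A", "B"],
    "ITEM CODE": ["1", "2", "1"],
    "RETAIL SALES": [1.0, 2.0, 0.5],
})
demo_tables = {
    "sales_clean": demo_clean,
    "supplier_aggregate": demo_clean.groupby("SUPPLIER", as_index=False)[["RETAIL SALES"]].sum(),
    "item_aggregate": demo_clean.groupby("ITEM CODE", as_index=False)[["RETAIL SALES"]].sum(),
}
# A temporary folder keeps the demo from leaving a database file in the working directory.
demo_db = os.path.join(tempfile.mkdtemp(), "retail_demo.db")
write_tables(demo_tables, demo_db)
```

In the notebook itself, the dictionary would hold `sales`, `suppliergroup`, and `Itemcode_group`, with the grouping key kept as a regular column before writing. Replacing the tables on each run is one design choice; appending the daily data with `if_exists="append"` is another, depending on how the history should be kept.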