# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: you just got hired by a major retail company! So, let's get prepared for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per supplier that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.

You can import the dataset `warehouse_and_retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per supplier.
    - A table for the aggregate per item.

## Instructions
* Read the CSV file you can find in Ironhack's database.
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [6]:
# your code here
import pandas as pd
import numpy as np

# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"C:\IronHack\labs\semana3\lab_df_calculations-main\lab_df_calculations-main\your-code\Warehouse_and_Retail_Sales_20240205.csv")
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307645 entries, 0 to 307644
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   YEAR              307645 non-null  int64  
 1   MONTH             307645 non-null  int64  
 2   SUPPLIER          307478 non-null  object 
 3   ITEM CODE         307645 non-null  object 
 4   ITEM DESCRIPTION  307645 non-null  object 
 5   ITEM TYPE         307644 non-null  object 
 6   RETAIL SALES      307642 non-null  float64
 7   RETAIL TRANSFERS  307645 non-null  float64
 8   WAREHOUSE SALES   307645 non-null  float64
dtypes: float64(3), int64(2), object(4)
memory usage: 21.1+ MB


In [3]:
# Percentage of missing values per column
nan_cols = df.isna().sum() / len(df) * 100

# Show only the columns that contain missing values
nan_cols[nan_cols > 0]

SUPPLIER        0.054283
ITEM TYPE       0.000325
RETAIL SALES    0.000975
dtype: float64

In [4]:
nan_df = df[df.SUPPLIER.isna() | df["ITEM TYPE"].isna() | df["RETAIL SALES"].isna()]

In [5]:
nan_df.head()

|  | YEAR | MONTH | SUPPLIER | ITEM CODE | ITEM DESCRIPTION | ITEM TYPE | RETAIL SALES | RETAIL TRANSFERS | WAREHOUSE SALES |
|---|---|---|---|---|---|---|---|---|---|
| 106 | 2020 | 1 | NaN | 107 | JIGGER MEASURE SHOT GLASS | STR_SUPPLIES | 14.69 | 18.0 | 0.0 |
| 188 | 2020 | 1 | NaN | 113 | BARTENDERS BLACK BOOK | STR_SUPPLIES | 0.4 | 0.0 | 0.0 |
| 231 | 2020 | 1 | NaN | 115 | PLASTIC SHOT GLASS PACK | STR_SUPPLIES | 5.71 | 6.0 | 0.0 |
| 252 | 2020 | 1 | NaN | 117 | WHISKEY TASTING JOURNAL | STR_SUPPLIES | 0.08 | 0.0 | 0.0 |
| 261 | 2020 | 1 | NaN | 118 | PLASTIC WINE GLASS PACK | STR_SUPPLIES | 7.4 | 10.0 | 0.0 |


In [7]:
df = df.dropna()
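
Dropping every row with a missing value also discards rows whose sales figures are intact, such as the `STR_SUPPLIES` rows above that only lack a supplier. A possible alternative, sketched here on a tiny made-up frame, is to label missing text fields and drop only the rows that lack a numeric value. The `"UNKNOWN"` placeholder and the `clean` helper are assumptions for illustration, not part of the lab.

```python
import pandas as pd

# Made-up rows that reuse the column names of the sample file.
sample = pd.DataFrame({
    "SUPPLIER": ["A CO", None, "B CO"],
    "ITEM TYPE": ["WINE", "STR_SUPPLIES", "BEER"],
    "RETAIL SALES": [1.5, 14.69, None],
    "RETAIL TRANSFERS": [0.0, 18.0, 1.0],
    "WAREHOUSE SALES": [2.0, 0.0, 0.0],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows that only lack a text label."""
    out = frame.copy()
    # Assumption: a placeholder label is acceptable for missing text fields.
    out["SUPPLIER"] = out["SUPPLIER"].fillna("UNKNOWN")
    out["ITEM TYPE"] = out["ITEM TYPE"].fillna("UNKNOWN")
    # Rows without a sales figure cannot be summed meaningfully, so drop them.
    return out.dropna(subset=["RETAIL SALES"])


cleaned = clean(sample)
print(cleaned)
```

Whether a placeholder or a drop is the better choice depends on how the aggregates will be used downstream.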

In [9]:
df.to_csv(r"C:\IronHack\labs\semana3\lab_df_calculations-main\lab_df_calculations-main\your-code\cleared_data.csv", index=False)

In [15]:
df.groupby("SUPPLIER").head()


|  | YEAR | MONTH | SUPPLIER | ITEM CODE | ITEM DESCRIPTION | ITEM TYPE | RETAIL SALES | RETAIL TRANSFERS | WAREHOUSE SALES |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 1 | REPUBLIC NATIONAL DISTRIBUTING CO | 100009 | BOOTLEG RED - 750ML | WINE | 0.00 | 0.0 | 2.0 |
| 1 | 2020 | 1 | PWSWN INC | 100024 | MOMENT DE PLAISIR - 750ML | WINE | 0.00 | 1.0 | 4.0 |
| 2 | 2020 | 1 | RELIABLE CHURCHILL LLLP | 1001 | S SMITH ORGANIC PEAR CIDER - 18.7OZ | BEER | 0.00 | 0.0 | 1.0 |
| 3 | 2020 | 1 | LANTERNA DISTRIBUTORS INC | 100145 | SCHLINK HAUS KABINETT - 750ML | WINE | 0.00 | 0.0 | 1.0 |
| 4 | 2020 | 1 | DIONYSOS IMPORTS INC | 100293 | SANTORINI GAVALA WHITE - 750ML | WINE | 0.82 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 303111 | 2020 | 9 | SNR HOLDINGS LLC | 42210 | MISFIT PEACH CHARD 750ML | WINE | 5.80 | 0.0 | 0.0 |
| 303114 | 2020 | 9 | SNR HOLDINGS LLC | 42214 | MISFIT DRAGONFRUIT RASPBERRY SHIRAZ 750ML | WINE | 5.71 | 0.0 | 0.0 |
| 304980 | 2020 | 9 | CROOK & MARKER LLC | 69670 | CROOK & MARKER SPIKED TEA VARIETY 3/8PK CANS | BEER | 0.00 | 0.0 | 2.0 |
| 304981 | 2020 | 9 | CROOK & MARKER LLC | 69671 | CROOK & MARKER SPIKED LEMONADE VARIETY 3/8PK C | BEER | 0.00 | 0.0 | 3.0 |


In [21]:
supplier = df.groupby("SUPPLIER").agg({
                           "ITEM CODE": "first",
                           "ITEM DESCRIPTION": "last",
                           "ITEM TYPE": "last",
                           "RETAIL SALES": "mean",
                           "RETAIL TRANSFERS": "std",
                           "WAREHOUSE SALES": "median"}).reset_index()

In [23]:
supplier.to_csv(r"C:\IronHack\labs\semana3\lab_df_calculations-main\lab_df_calculations-main\your-code\supplier.csv", index=False)
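
The challenge text asks for an aggregate per supplier "that adds up the rest of the values", while the cell above mixes `first`, `last`, `mean`, `std` and `median`. If a plain sum is what is wanted, a minimal sketch could look like the following. The rows are made up, and `value_cols` and `supplier_totals` are illustrative names rather than names used in the notebook.

```python
import pandas as pd

# Made-up rows that reuse the column names of the sample file.
sample = pd.DataFrame({
    "SUPPLIER": ["A CO", "A CO", "B CO"],
    "RETAIL SALES": [1.0, 2.5, 4.0],
    "RETAIL TRANSFERS": [0.0, 1.0, 2.0],
    "WAREHOUSE SALES": [3.0, 0.0, 1.0],
})

# Only the numeric measures are summed; text columns are left out.
value_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# as_index=False keeps SUPPLIER as a regular column, like reset_index() would.
supplier_totals = sample.groupby("SUPPLIER", as_index=False)[value_cols].sum()
print(supplier_totals)
```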

In [25]:
item = df.groupby("ITEM TYPE").agg({"ITEM CODE": "first",
                                   "ITEM DESCRIPTION": "last",
                                   "RETAIL SALES": "max",
                                   "RETAIL TRANSFERS": "mean",
                                   "WAREHOUSE SALES": "std"}).reset_index()

In [27]:
item.head()

|  | ITEM TYPE | ITEM CODE | ITEM DESCRIPTION | RETAIL SALES | RETAIL TRANSFERS | WAREHOUSE SALES |
|---|---|---|---|---|---|---|
| 0 | BEER | 1001 | S SMITH WINTER WELCOME 4/6NR - 12OZ | 1494.0 | 13.361799 | 648.60819 |
| 1 | DUNNAGE | 1279 | EMPTY 1/4 KEG (30.00) | 0.0 | 0.0 | 1604.198062 |
| 2 | KEGS | 10387 | DELIRIUM NOCTURNUM 1/6 KEG | 0.0 | -9.9e-05 | 41.60337 |
| 3 | LIQUOR | 10103 | ROMANA SAMBUCA - 50ML | 1816.49 | 12.243656 | 9.278896 |
| 4 | NON-ALCOHOL | 166661 | BITBURGER NA 4/6 NR - 11.2OZ | 2739.0 | 13.97609 | 37.448419 |
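
The cell above groups by `ITEM TYPE`, which yields one row per category rather than one row per item. If the intended aggregate is per individual item with the values added up, grouping by `ITEM CODE` is one way to do it. This is a sketch on made-up rows, and it assumes each item code maps to a single description and type.

```python
import pandas as pd

# Made-up rows that reuse the column names of the sample file.
sample = pd.DataFrame({
    "ITEM CODE": ["1001", "1001", "100009"],
    "ITEM DESCRIPTION": ["PEAR CIDER", "PEAR CIDER", "BOOTLEG RED"],
    "ITEM TYPE": ["BEER", "BEER", "WINE"],
    "RETAIL SALES": [1.0, 2.0, 0.5],
    "RETAIL TRANSFERS": [0.0, 1.0, 0.0],
    "WAREHOUSE SALES": [4.0, 6.0, 2.0],
})

# Keep one label per item and add up the numeric measures.
item_totals = sample.groupby("ITEM CODE", as_index=False).agg({
    "ITEM DESCRIPTION": "first",
    "ITEM TYPE": "first",
    "RETAIL SALES": "sum",
    "RETAIL TRANSFERS": "sum",
    "WAREHOUSE SALES": "sum",
})
print(item_totals)
```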


In [26]:
item.to_csv(r"C:\IronHack\labs\semana3\lab_df_calculations-main\lab_df_calculations-main\your-code\item.csv", index=False)
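
The task asks for three tables in a local database, whereas the cells above write CSV files. One way to cover that step, assuming SQLite is an acceptable local database, is `DataFrame.to_sql` with a `sqlite3` connection. The table names and the made-up rows below are illustrative, and `":memory:"` is used only to keep the sketch self-contained; a file path would be needed to persist the tables.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the cleaned DataFrame.
cleaned = pd.DataFrame({
    "SUPPLIER": ["A CO", "A CO", "B CO"],
    "ITEM CODE": ["1001", "1002", "1001"],
    "RETAIL SALES": [1.0, 2.5, 4.0],
    "RETAIL TRANSFERS": [0.0, 1.0, 2.0],
    "WAREHOUSE SALES": [3.0, 0.0, 1.0],
})
value_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Hypothetical table names, one per table requested in the task.
tables = {
    "cleaned_data": cleaned,
    "supplier_totals": cleaned.groupby("SUPPLIER", as_index=False)[value_cols].sum(),
    "item_totals": cleaned.groupby("ITEM CODE", as_index=False)[value_cols].sum(),
}

conn = sqlite3.connect(":memory:")
try:
    for name, frame in tables.items():
        # if_exists="replace" lets a daily run overwrite the previous load.
        frame.to_sql(name, conn, if_exists="replace", index=False)
    # Read the row counts back to confirm that each table was written.
    row_counts = {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in tables
    }
finally:
    conn.close()

print(row_counts)
```

For a daily process that should accumulate history instead of overwriting it, `if_exists="append"` on the cleaned table would be the option to look at.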