# Automation for the analyst

A team of analysts prepares a monthly report on the prices of the product selected by the Board. Knowing that you work with Python, they have asked you to automate the process. After talking to the team, you have agreed on the following business conditions that make the automation possible:

Three report parameters are available:
- **product_group_id**,
- **product**,
- **date**.

Assumptions for each parameter:

1. A parameter may have at most one value,
1. If the parameter is empty, we return all records from the group,
1. We assume that the file is always prepared correctly (we want to practice report automation, not error handling).
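
The second assumption boils down to one rule: an empty parameter adds no filter at all. A minimal sketch of that rule follows. The helper `build_mask` and the tiny sample frame are hypothetical and made up for illustration; they do not come from the analysts' files.

```python
import pandas as pd


def build_mask(df, **params):
    """Return a boolean mask; parameters set to None are ignored (hypothetical helper)."""
    # Start with every row selected, sharing the frame's index
    mask = pd.Series(True, index=df.index)
    for column, value in params.items():
        if value is not None:  # empty parameter: no restriction on this column
            mask &= df[column] == value
    return mask


# Made-up sample data, for illustration only
sample = pd.DataFrame({
    "product_group_id": [1, 1, 2],
    "product": ["apple juice", "bacon", "apple juice"],
})

# Only product_group_id is set; product is empty, so it does not filter anything
selected = sample[build_mask(sample, product_group_id=1, product=None)]
print(selected)
```

With both parameters left empty, the mask stays all `True` and the whole frame is returned.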

Based on the above requirements:

1. load the **config.xlsx** file using `openpyxl`,
1. prepare appropriate conditions to filter data from **product_prices_cleaned.csv**,
1. filter the frame based on those conditions,
1. aggregate the data using a **pivot_table**:
   a) index: product, province,
   b) columns: dates,
   c) values: average product price,
   d) remember to remove 0,
1. save the file to a spreadsheet any way you want.
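
The aggregation step can be tried out on a toy frame before touching the real data. The sketch below uses made-up prices and assumes that "remove 0" means dropping rows with a zero price before averaging, so that they do not pull the mean down.

```python
import pandas as pd

# Made-up prices, for illustration only; 0 stands for a missing price
toy = pd.DataFrame({
    "product": ["bacon", "bacon", "bacon", "juice"],
    "province": ["LUBLIN", "LUBLIN", "SILESIA", "LUBLIN"],
    "date": pd.to_datetime(
        ["1999-01-01", "1999-01-01", "1999-01-01", "1999-02-01"]
    ),
    "value": [8.0, 9.0, 0.0, 2.0],
})

# Drop zero prices first so they are not included in the average
toy = toy[toy["value"] != 0]

# Rows: (product, province); columns: dates; cells: mean price
report = toy.pivot_table(
    index=["product", "province"],
    columns="date",
    values="value",
    aggfunc="mean",
)
print(report)
```

Combinations with no data for a given date show up as `NaN` in the result, and a product/province pair whose only price was 0 disappears entirely.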

Hints:

1. You can store each filtering condition in its own variable and then combine them to filter the `DataFrame`; this is equivalent to writing them all inline as before, i.e. `df.loc[var1 & var2]`.
1. If you decide to save with Pandas, pay attention to the parameters passed to the function (what happens if you set `index=False`?). See the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html).
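
To see what the `index` parameter does to a pivot-like frame, the sketch below writes a small made-up frame twice. It uses `to_csv` with an in-memory buffer purely as a stand-in so the result can be printed; `to_excel` accepts the same `index` argument, but the Excel output itself is not shown here.

```python
import io

import pandas as pd

# Made-up pivot-like frame with a (product, province) row index
frame = pd.DataFrame(
    {"1999-01-01": [2.92, 7.89]},
    index=pd.MultiIndex.from_tuples(
        [("canned pork", "LUBLIN"), ("smoked bacon", "SILESIA")],
        names=["product", "province"],
    ),
)

with_index = io.StringIO()
frame.to_csv(with_index)  # index=True is the default

without_index = io.StringIO()
frame.to_csv(without_index, index=False)  # row labels are not written

print(with_index.getvalue())
print(without_index.getvalue())
```

Because a pivot table keeps the product and province in its index, writing it without the index leaves only the bare prices.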

In [1]:
import pandas as pd
import openpyxl

# Load the config.xlsx file using openpyxl
config_wb = openpyxl.load_workbook('../../01_Data/config.xlsx')
config_sheet = config_wb.active

# Load the product_prices_cleaned.csv data
df = pd.read_csv('../../01_Data/product_prices_cleaned.csv', sep=';')
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')

In [3]:
# Extracting report parameters
product_group_id = config_sheet['B2'].value
product = config_sheet['B3'].value
date = config_sheet['B4'].value

# product_group_id = 2
# product = None
# date = '1999-01-01'

In [7]:
# Preparing conditions to filter data
# Start from a boolean Series of True values that shares df's index,
# so every row is selected until a condition narrows the selection down
condition = pd.Series(True, index=df.index)

# pd.Series is a Pandas object that represents a one-dimensional labeled array.

if product_group_id:
    condition &= (df['product_group_id'] == product_group_id)
if product:
    condition &= (df['product'] == product)
if date:
    condition &= (df['date'] == pd.to_datetime(date))

# Filtering the data based on conditions
filtered_df = df[condition]
filtered_df.head()

# In summary, this code filters the DataFrame based on the conditions given for product_group_id, product and date.
# Filtering works by updating the condition Series and using it to select rows from the original DataFrame,
# which yields the filtered DataFrame named filtered_df.

| | province | product_types | currency | product_group_id | product_line | value | date | product | month | quarter | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | HOLY CROSS | whole pickled cucumbers 0.9l - per 1pc. | PLN | 1 | | 0.28 | 2010-04-01 | whole pickled cucumbers 0.9l - per 1pc. | 4 | 2 | 2010 |
| 12 | POMERANIA | 30% tomato concentrate - per 1kg | PLN | 1 | | 7.46 | 1999-10-01 | 30% tomato concentrate - per 1kg | 10 | 4 | 1999 |
| 15 | POLAND | whole pickled cucumbers 0.9l - per 1pc. | PLN | 1 | | 2.36 | 2004-12-01 | whole pickled cucumbers 0.9l - per 1pc. | 12 | 4 | 2004 |
| 26 | LOWER SILESIA | frozen carrot and pea mix - per 1kg | PLN | 1 | | 2.78 | 2005-07-01 | frozen carrot and pea mix - per 1kg | 7 | 3 | 2005 |
| 37 | MASOVIA | apple juice, boxed - per 1l | PLN | 1 | | 1.91 | 2007-08-01 | apple juice, boxed - per 1l | 8 | 3 | 2007 |


In [43]:
# Remove zero prices so they do not distort the average
filtered_df = filtered_df[filtered_df['value'] != 0]

# Aggregating data using a pivot table
pivot_table = filtered_df.pivot_table(index=['product', 'province'], 
                                      columns='date', 
                                      values='value', 
                                      aggfunc='mean')

pivot_table

| product | province | date: 1999-01-01 |
|---|---|---|
| Backpacker's canned pork meat - per 300 g | GREATER POLAND | 2.92 |
| Backpacker's canned pork meat - per 300 g | HOLY CROSS | 2.69 |
| Backpacker's canned pork meat - per 300 g | KUYAVIA-POMERANIA | 2.56 |
| Backpacker's canned pork meat - per 300 g | LESSER POLAND | 2.71 |
| Backpacker's canned pork meat - per 300 g | LUBLIN | 2.77 |
| ... | ... | ... |
| smoked bacon with ribs - per 1kg | SILESIA | 7.89 |
| smoked bacon with ribs - per 1kg | SUBCARPATHIA | 8.10 |
| smoked bacon with ribs - per 1kg | WARMIA-MASURIA | 8.03 |
| smoked bacon with ribs - per 1kg | WEST POMERANIA | 8.94 |


In [None]:
# Save the file to a spreadsheet
pivot_table.to_excel('../../01_Data/filtered_report.xlsx', engine='openpyxl')