# Automation for the analyst

A team of analysts prepares a monthly report on the prices of the product selected by the Board. Knowing that you work with Python, they have asked you to automate the process. After talking to the team, you have agreed on the following business rules that make automation possible:

Three report parameters are available:
- **product_group_id**,
- **product**,
- **date**.

Assumptions for each parameter:

1. A parameter may have at most one value,
1. If a parameter is empty, we return all records from the group,
1. We assume that the file is always prepared correctly (we want to practice report automation, not error handling).

Based on the above requirements:

1. load the **config.xlsx** file using `openpyxl`,
1. prepare appropriate conditions to filter data from **product_prices_cleaned.csv**,
1. based on the conditions filter the frame,
1. aggregate the data using a **pivot_table**:
   - index: product, province,
   - columns: dates,
   - values: average product price,
   - remember to remove zeros,
1. save the result to a spreadsheet any way you like.

Hints:

1. You can store individual filtering conditions in variables and then combine them to filter the `DataFrame`. This is equivalent to writing all the conditions inline, e.g. `df.loc[var1 & var2]`.
1. If you decide to save with pandas, pay attention to the parameters passed to the function (what happens if you set `index=False`?). See the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html).
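
The first hint can be illustrated with a minimal sketch. The toy frame below is invented for the example; only the column names follow the report parameters.

```python
import pandas as pd

# Toy data: column names mirror the report parameters, values are invented.
df = pd.DataFrame({
    "product_group_id": [1, 1, 2],
    "product": ["apple juice", "tomato concentrate", "apple juice"],
    "value": [1.91, 7.46, 2.10],
})

# Each condition is a boolean Series aligned with the frame's index.
is_group_1 = df["product_group_id"] == 1
is_juice = df["product"] == "apple juice"

# Combining the stored masks with & selects the same rows as the inline form.
combined = df.loc[is_group_1 & is_juice]
inline = df.loc[(df["product_group_id"] == 1) & (df["product"] == "apple juice")]
```

Storing each mask under a descriptive name keeps the final `df.loc[...]` call short and makes it easy to skip a condition when its parameter is empty.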

In [1]:
import pandas as pd
from openpyxl import load_workbook

In [3]:
config_file = "./../../01_Data/config.xlsx"
config_wb = load_workbook(config_file)
config_ws = config_wb.active

In [19]:
# Extract parameters from config.xlsx (values sit in column B, rows 2-4)
product_group_id = config_ws.cell(row=2, column=2).value  # row 2, column B
product = config_ws.cell(row=3, column=2).value           # row 3, column B
date_filter = config_ws.cell(row=4, column=2).value       # row 4, column B
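
As an alternative to hard-coded cell coordinates, the parameters could be read as name/value pairs. The sketch below builds a stand-in workbook in memory; the layout (header in row 1, parameter names in column A, values in column B) is an assumption, since the real **config.xlsx** is not shown here.

```python
from openpyxl import Workbook

# Stand-in for config.xlsx. Assumed layout: header in row 1,
# parameter names in column A, parameter values in column B.
wb = Workbook()
ws = wb.active
ws.append(["parameter", "value"])
ws.append(["product_group_id", 1])
ws.append(["product", None])
ws.append(["date", None])

# Read name/value pairs from row 2 onward; empty cells come back as None.
params = {
    name: value
    for name, value in ws.iter_rows(min_row=2, max_col=2, values_only=True)
    if name is not None
}
```

With a dictionary, adding a fourth parameter to the file would not require touching the row numbers in the code.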

In [20]:
print(product_group_id)
print(product)
print(date_filter)

1
None
None


In [25]:
data = pd.read_csv(
  '../../01_Data/product_prices_cleaned.csv',
  sep=',',
  encoding='UTF-8',
  decimal='.'
)

In [27]:
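# Prepare conditions for filtering; an empty parameter (None) adds no condition.
# Comparing with None explicitly keeps falsy but valid values such as 0.
conditions = []
if product_group_id is not None:
    conditions.append(data["product_group_id"] == product_group_id)
if product is not None:
    conditions.append(data["product"] == product)
if date_filter is not None:
    conditions.append(data["date"] == date_filter)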
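
One point worth checking for the date condition: `openpyxl` typically returns a `datetime` object for a date-formatted cell, while `read_csv` without `parse_dates` leaves the `date` column as text, so a direct comparison may match nothing. A minimal sketch of aligning the two types, using invented toy rows and a hypothetical `date_filter` value:

```python
import datetime as dt

import pandas as pd

# Toy frame with the date column stored as text, as read_csv would leave it.
data = pd.DataFrame({
    "date": ["2010-04-01", "1999-10-01"],
    "value": [0.28, 7.46],
})

# Hypothetical parameter value, shaped like what openpyxl returns for a date cell.
date_filter = dt.datetime(2010, 4, 1)

# Parse the column and normalise the parameter so both sides are timestamps.
parsed_dates = pd.to_datetime(data["date"])
date_match = parsed_dates == pd.Timestamp(date_filter)
```

If the config cell holds plain text instead of a date, the original string comparison works as it is.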

In [29]:
# Apply filters: keep rows where every condition holds;
# with no conditions, keep all rows
if conditions:
    filtered_df = data.loc[pd.concat(conditions, axis=1).all(axis=1)]
else:
    filtered_df = data
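
The same filtering step can also be written by folding the list of masks with `&`. This is a sketch of an alternative, not the approach used in the cell above; the helper name `apply_conditions` and the toy data are made up for illustration.

```python
from functools import reduce
import operator

import pandas as pd

# Toy data, invented for the example.
data = pd.DataFrame({
    "product_group_id": [1, 1, 2],
    "product": ["a", "b", "a"],
})

def apply_conditions(frame, conditions):
    """Return rows matching every condition; no conditions means all rows."""
    if not conditions:
        return frame
    # Fold the list of boolean Series into one mask: c1 & c2 & ... & cn.
    mask = reduce(operator.and_, conditions)
    return frame.loc[mask]

conditions = [data["product_group_id"] == 1, data["product"] == "a"]
filtered = apply_conditions(data, conditions)
unfiltered = apply_conditions(data, [])
```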

In [31]:
filtered_df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product,year,month,quarter
5,HOLY CROSS,whole pickled cucumbers 0.9l - per 10pcs.,PLN,1,,0.28,2010-04-01,whole pickled cucumbers 0.9l - per 10pcs.,2010,4,2
12,POMERANIA,30% tomato concentrate - per 1kg,PLN,1,,7.46,1999-10-01,30% tomato concentrate - per 1kg,1999,10,4
15,POLAND,whole pickled cucumbers 0.9l - per 10pcs.,PLN,1,,2.36,2004-12-01,whole pickled cucumbers 0.9l - per 10pcs.,2004,12,4
26,LOWER SILESIA,frozen carrot and pea mix - per 1kg,PLN,1,,2.78,2005-07-01,frozen carrot and pea mix - per 1kg,2005,7,3
37,MASOVIA,"apple juice, boxed - per 1l",PLN,1,,1.91,2007-08-01,"apple juice, boxed - per 1l",2007,8,3


In [33]:
# Pivot table
pivot_table = pd.pivot_table(
    filtered_df,
    values="value",
    index=["product", "province"],
    columns=["date"],
    aggfunc="mean"
)
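
If a price of 0 stands for a missing observation, it may matter whether zeros are removed before or after aggregation, because a 0 left in the source rows pulls the mean down. The sketch below compares both orders on invented rows; `mean_price_pivot` is a hypothetical helper wrapping the same `pd.pivot_table` call.

```python
import pandas as pd

# Invented rows: two observations for August (one of them 0), one 0 for September.
toy = pd.DataFrame({
    "product": ["juice", "juice", "juice"],
    "province": ["MASOVIA", "MASOVIA", "MASOVIA"],
    "date": ["2007-08-01", "2007-08-01", "2007-09-01"],
    "value": [2.0, 0.0, 0.0],
})

def mean_price_pivot(frame):
    """Average price per (product, province) row and date column."""
    return pd.pivot_table(
        frame,
        values="value",
        index=["product", "province"],
        columns=["date"],
        aggfunc="mean",
    )

# Zeros kept: the August mean is averaged over 2.0 and 0.0.
with_zeros = mean_price_pivot(toy)

# Zeros dropped first: the August mean uses only the non-zero observation.
without_zeros = mean_price_pivot(toy[toy["value"] != 0])
```

Which order is correct depends on what a 0 means in the data, so it is worth confirming with the analysts.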

In [36]:
pivot_table

product,province,1999-01-01,1999-02-01,1999-03-01,1999-04-01,1999-05-01,1999-06-01,1999-07-01,1999-08-01,1999-09-01,1999-10-01,...,2019-03-01,2019-04-01,2019-05-01,2019-06-01,2019-07-01,2019-08-01,2019-09-01,2019-10-01,2019-11-01,2019-12-01
30% tomato concentrate - per 1kg,GREATER POLAND,7.56,7.09,7.32,7.84,6.84,7.41,7.28,8.27,7.23,7.02,...,6.78,5.67,1.14,3.69,9.71,5.78,4.96,6.12,8.33,6.03
30% tomato concentrate - per 1kg,HOLY CROSS,3.81,2.67,0.06,0.57,2.07,0.01,2.56,1.82,2.80,1.27,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
30% tomato concentrate - per 1kg,KUYAVIA-POMERANIA,6.72,7.02,6.37,6.14,6.39,6.78,7.06,6.59,7.03,6.76,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
30% tomato concentrate - per 1kg,LESSER POLAND,6.23,4.35,3.41,1.07,4.87,0.93,1.19,1.55,4.40,4.39,...,3.25,0.37,6.83,3.16,9.40,0.06,5.13,0.74,3.15,7.93
30% tomato concentrate - per 1kg,LOWER SILESIA,6.01,5.52,6.22,6.04,5.93,5.72,6.27,5.87,6.31,5.69,...,2.68,0.87,5.90,2.22,2.09,6.66,3.50,6.38,3.63,6.56
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
whole pickled cucumbers 0.9l - per 10pcs.,SILESIA,2.22,2.01,2.11,2.41,2.21,2.01,2.39,2.43,2.33,2.04,...,1.41,0.38,3.30,0.66,1.12,2.81,1.97,2.49,1.51,1.98
whole pickled cucumbers 0.9l - per 10pcs.,SUBCARPATHIA,2.46,2.50,2.53,2.26,2.53,2.34,2.54,2.37,2.43,2.56,...,3.49,2.86,3.18,3.72,3.28,3.86,3.44,3.91,3.46,3.42
whole pickled cucumbers 0.9l - per 10pcs.,WARMIA-MASURIA,2.11,0.38,1.03,1.81,0.95,1.90,0.17,1.27,0.72,1.28,...,1.41,0.26,1.32,1.67,1.36,1.46,1.22,0.20,1.45,0.03
whole pickled cucumbers 0.9l - per 10pcs.,WEST POMERANIA,2.15,0.50,1.45,2.02,1.99,0.11,0.14,1.96,1.05,1.61,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00


In [37]:
# Replace zeros with missing values, then drop rows that are empty for every date
pivot_table = pivot_table.replace(0, pd.NA).dropna(how="all")
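
Replacing with `pd.NA` may change the column dtype depending on the pandas version. One alternative, sketched here on a small invented frame, is `DataFrame.mask`, which writes `NaN` wherever the condition holds and leaves the columns as floats.

```python
import pandas as pd

# Small stand-in for the pivot table; the values are toy numbers.
pivot = pd.DataFrame(
    {"2019-11-01": [8.33, 0.0, 3.15], "2019-12-01": [6.03, 0.0, 0.0]},
    index=["GREATER POLAND", "HOLY CROSS", "LESSER POLAND"],
)

# mask() puts NaN where the condition is True; dropna(how="all") then
# removes rows in which every date column is missing.
cleaned = pivot.mask(pivot == 0).dropna(how="all")
```

Rows with at least one real price are kept, with the zero cells shown as missing.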

In [38]:
pivot_table

product,province,1999-01-01,1999-02-01,1999-03-01,1999-04-01,1999-05-01,1999-06-01,1999-07-01,1999-08-01,1999-09-01,1999-10-01,...,2019-03-01,2019-04-01,2019-05-01,2019-06-01,2019-07-01,2019-08-01,2019-09-01,2019-10-01,2019-11-01,2019-12-01
30% tomato concentrate - per 1kg,GREATER POLAND,7.56,7.09,7.32,7.84,6.84,7.41,7.28,8.27,7.23,7.02,...,6.78,5.67,1.14,3.69,9.71,5.78,4.96,6.12,8.33,6.03
30% tomato concentrate - per 1kg,HOLY CROSS,3.81,2.67,0.06,0.57,2.07,0.01,2.56,1.82,2.8,1.27,...,,,,,,,,,,
30% tomato concentrate - per 1kg,KUYAVIA-POMERANIA,6.72,7.02,6.37,6.14,6.39,6.78,7.06,6.59,7.03,6.76,...,,,,,,,,,,
30% tomato concentrate - per 1kg,LESSER POLAND,6.23,4.35,3.41,1.07,4.87,0.93,1.19,1.55,4.4,4.39,...,3.25,0.37,6.83,3.16,9.4,0.06,5.13,0.74,3.15,7.93
30% tomato concentrate - per 1kg,LOWER SILESIA,6.01,5.52,6.22,6.04,5.93,5.72,6.27,5.87,6.31,5.69,...,2.68,0.87,5.9,2.22,2.09,6.66,3.5,6.38,3.63,6.56
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
whole pickled cucumbers 0.9l - per 10pcs.,SILESIA,2.22,2.01,2.11,2.41,2.21,2.01,2.39,2.43,2.33,2.04,...,1.41,0.38,3.3,0.66,1.12,2.81,1.97,2.49,1.51,1.98
whole pickled cucumbers 0.9l - per 10pcs.,SUBCARPATHIA,2.46,2.5,2.53,2.26,2.53,2.34,2.54,2.37,2.43,2.56,...,3.49,2.86,3.18,3.72,3.28,3.86,3.44,3.91,3.46,3.42
whole pickled cucumbers 0.9l - per 10pcs.,WARMIA-MASURIA,2.11,0.38,1.03,1.81,0.95,1.9,0.17,1.27,0.72,1.28,...,1.41,0.26,1.32,1.67,1.36,1.46,1.22,0.2,1.45,0.03
whole pickled cucumbers 0.9l - per 10pcs.,WEST POMERANIA,2.15,0.5,1.45,2.02,1.99,0.11,0.14,1.96,1.05,1.61,...,,,,,,,,,,


In [39]:
# Save the output to Excel
output_file = "report.xlsx"
pivot_table.to_excel(output_file, sheet_name="Report")
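
To see what `index=False` would do here, the sketch below writes a tiny invented report to an in-memory buffer and reads the column headers back. It assumes an Excel engine such as `openpyxl` is installed; `columns_after_round_trip` is a hypothetical helper for the demonstration.

```python
import io

import pandas as pd

# Tiny stand-in for the report, with (product, province) as the row index.
report = pd.DataFrame(
    {"1999-01-01": [7.56, 3.81]},
    index=pd.MultiIndex.from_tuples(
        [
            ("tomato concentrate", "GREATER POLAND"),
            ("tomato concentrate", "HOLY CROSS"),
        ],
        names=["product", "province"],
    ),
)

def columns_after_round_trip(frame, **to_excel_kwargs):
    """Write the frame to an in-memory workbook and return the headers read back."""
    buffer = io.BytesIO()
    frame.to_excel(buffer, sheet_name="Report", **to_excel_kwargs)
    buffer.seek(0)
    return list(pd.read_excel(buffer, sheet_name="Report").columns)

with_index = columns_after_round_trip(report)
without_index = columns_after_round_trip(report, index=False)
```

Because the pivot table keeps `product` and `province` in its index, writing with `index=False` would leave only the price columns in the sheet, so the default `index=True` is the safer choice for this report.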