# Merge with conditions

Using the raw data from the **product_prices_cleaned.csv** file, find out how many times a lower price was quoted in the past for a given product, province, and month. Follow these steps:

1. Merge the table with itself. What type of join should be used?
1. Filter the merged data to keep observations from earlier years whose value is lower than the current-year value in the same province.
1. Group the data accordingly.

Which product(s) had the most such occurrences?

> Note what happens to the names of columns that appear in both tables but are not used as merge keys.
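
As a minimal sketch of the hint above, the following uses a small made-up table (the `toy` frame and its values are illustrative and not taken from the CSV). In a self-merge on `product` and `province`, the key columns appear once, while every other shared column is duplicated with the default suffixes `_x` and `_y`:

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "product": ["ham", "ham", "ham"],
    "province": ["OPOLE", "OPOLE", "OPOLE"],
    "date": pd.to_datetime(["2010-01-01", "2011-01-01", "2012-01-01"]),
    "value": [15.0, 16.5, 16.0],
})

# Inner self-merge: each row is paired with every row sharing the same keys.
pairs = toy.merge(toy, on=["product", "province"], how="inner")

# Key columns stay unsuffixed; 'date' and 'value' become *_x and *_y.
print(pairs.columns.tolist())
print(len(pairs))  # 3 rows in one group give 3 * 3 pairs
```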

In [2]:
import pandas as pd

In [3]:
data = pd.read_csv(
  '../../01_Data/product_prices_cleaned.csv',
  sep=';',
  encoding='UTF-8',
  decimal='.'
)

In [4]:
data['date'] = pd.to_datetime(data['date'])

In [5]:
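# Merge the table with itself: pair every row with every other row
# for the same product and province (inner join, which is the default)
merged = pd.merge(
    data,
    data,
    on=["product", "province"],
    how="inner",
    suffixes=("_x", "_y")  # applied to shared columns that are not merge keys
)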
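
A self-merge on `product` and `province` pairs every row with every row in the same group, so the result grows with the square of each group's size. The sketch below estimates that size before merging; `self_merge_row_count` is a hypothetical helper, and the toy values are invented for illustration:

```python
import pandas as pd

def self_merge_row_count(df, keys):
    """Number of rows an inner self-merge of df on keys would produce."""
    # dropna=False keeps groups with missing keys, which merge also pairs up.
    group_sizes = df.groupby(keys, dropna=False).size()
    return int((group_sizes ** 2).sum())

# Toy data: one group of 3 rows and one group of 2 rows.
toy = pd.DataFrame({
    "product": ["ham", "ham", "ham", "juice", "juice"],
    "province": ["OPOLE"] * 5,
    "value": [15.0, 16.5, 16.0, 3.0, 3.2],
})

keys = ["product", "province"]
estimated = self_merge_row_count(toy, keys)
actual = len(toy.merge(toy, on=keys))
print(estimated, actual)
```

Calling the helper on the real `data` frame with the same keys would give the expected length of `merged` without building it.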

In [7]:
# Keep pairs where the _y row is older and cheaper than the _x row
filtered = merged[
    (merged['date_x'] > merged['date_y']) &  # _y is an earlier date (full dates are compared, not only years)
    (merged['value_y'] < merged['value_x'])  # _y is a lower price
]
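
The filter above compares full dates, so an earlier month of the same year also counts as historical. If the task is read strictly (same calendar month, earlier year), one possible variant adds the month to the merge keys and compares years. This is a sketch on invented toy data; the `year` and `month` helper columns are assumptions and are not part of the original table:

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "product": ["ham"] * 4,
    "province": ["OPOLE"] * 4,
    "date": pd.to_datetime(
        ["2010-03-01", "2011-03-01", "2012-03-01", "2012-04-01"]
    ),
    "value": [15.0, 16.5, 16.0, 14.0],
})

# Helper columns (assumed names) derived from the date.
toy["year"] = toy["date"].dt.year
toy["month"] = toy["date"].dt.month

# Pair rows only within the same product, province and calendar month.
pairs = toy.merge(toy, on=["product", "province", "month"])

# Keep pairs where the _y row comes from an earlier year and is cheaper.
lower_same_month = pairs[
    (pairs["year_y"] < pairs["year_x"])
    & (pairs["value_y"] < pairs["value_x"])
]
print(len(lower_same_month))
```

Merging on the month as well also keeps the intermediate table much smaller, because rows are only paired within the same calendar month.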

In [9]:
filtered.head(5)

|  | province | product_types_x | currency_x | product_group_id_x | product_line_x | value_x | date_x | product | product_types_y | currency_y | product_group_id_y | product_line_y | value_y | date_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SUBCARPATHIA |  | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg |  | PLN | 2 | pork ham cooked - per 1kg | 15.76 | 2007-01-01 |
| 3 | SUBCARPATHIA |  | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg |  | PLN | 2 | pork ham cooked - per 1kg | 16.26 | 2010-04-01 |
| 4 | SUBCARPATHIA |  | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg |  | PLN | 2 | pork ham cooked - per 1kg | 17.52 | 2011-05-01 |
| 5 | SUBCARPATHIA |  | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg |  | PLN | 2 | pork ham cooked - per 1kg | 15.57 | 2003-08-01 |
| 7 | SUBCARPATHIA |  | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg |  | PLN | 2 | pork ham cooked - per 1kg | 19.47 | 2012-01-01 |


In [10]:
# Group by product and province and count occurrences
result = (
    filtered
    .groupby(['product', 'province'])
    .size()
    .reset_index(name='lower_price_count')
)

In [11]:
result

|  | product | province | lower_price_count |
|---|---|---|---|
| 0 | 30% tomato concentrate - per 1kg | GREATER POLAND | 21035 |
| 1 | 30% tomato concentrate - per 1kg | HOLY CROSS | 75 |
| 2 | 30% tomato concentrate - per 1kg | KUYAVIA-POMERANIA | 6329 |
| 3 | 30% tomato concentrate - per 1kg | LESSER POLAND | 25248 |
| 4 | 30% tomato concentrate - per 1kg | LOWER SILESIA | 2734 |
| ... | ... | ... | ... |
| 437 | whole pickled cucumbers 0.9l - per 10pcs. | SILESIA | 21883 |
| 438 | whole pickled cucumbers 0.9l - per 10pcs. | SUBCARPATHIA | 24499 |
| 439 | whole pickled cucumbers 0.9l - per 10pcs. | WARMIA-MASURIA | 2635 |
| 440 | whole pickled cucumbers 0.9l - per 10pcs. | WEST POMERANIA | 2731 |


In [12]:
most_occurrences = result.sort_values(by='lower_price_count', ascending=False)

In [13]:
most_occurrences

|  | product | province | lower_price_count |
|---|---|---|---|
| 149 | beef with bone (rump steak) - per 1kg | WEST POMERANIA | 29651 |
| 183 | boneless beef (sirloin) - per 1kg | WEST POMERANIA | 29516 |
| 134 | beef with bone (rump steak) - per 1kg | GREATER POLAND | 29104 |
| 176 | boneless beef (sirloin) - per 1kg | OPOLE | 28986 |
| 62 | Italian head cheese - per 1kg | POMERANIA | 28868 |
| ... | ... | ... | ... |
| 8 | 30% tomato concentrate - per 1kg | OPOLE | 48 |
| 286 | natural chocolate plain - per 1kg | WEST POMERANIA | 44 |
| 102 | apple juice, boxed - per 1l | HOLY CROSS | 31 |
| 106 | apple juice, boxed - per 1l | LUBLIN | 29 |


In [14]:
print("Products with the most occurrences of lower historical prices:")
print(most_occurrences.head())

Products with the most occurrences of lower historical prices:
                                   product        province  lower_price_count
149  beef with bone (rump steak) - per 1kg  WEST POMERANIA              29651
183      boneless beef (sirloin) - per 1kg  WEST POMERANIA              29516
134  beef with bone (rump steak) - per 1kg  GREATER POLAND              29104
176      boneless beef (sirloin) - per 1kg           OPOLE              28986
62           Italian head cheese - per 1kg       POMERANIA              28868
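
The ranking above is per product and province pair, while the question asks about products. One way to answer it at product level is to sum the counts across provinces and keep every product tied for the maximum. This is only a sketch: `result_toy` is a hypothetical stand-in for `result`, and its counts are invented.

```python
import pandas as pd

# Stand-in for the `result` table; the counts are invented for illustration.
result_toy = pd.DataFrame({
    "product": ["beef", "beef", "juice"],
    "province": ["OPOLE", "SILESIA", "OPOLE"],
    "lower_price_count": [10, 7, 3],
})

# Sum the per-province counts for each product and rank them.
per_product = (
    result_toy
    .groupby("product")["lower_price_count"]
    .sum()
    .sort_values(ascending=False)
)

# Keep all products tied for the top count, since several may qualify.
top_products = per_product[per_product == per_product.max()]
print(top_products)
```

Whether summing over provinces or ranking the pairs is the intended reading depends on how "product(s)" in the question is interpreted.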
