# Adding a column

Building on the solution to the previous exercise, create a new column, **product**, from the **product_types** and **product_line** columns:

1. Check that the **product_types** and **product_line** columns are complementary (an empty value in one column entails a non-empty value in the other).
1. Create a new column, **product**, using the values from the **product_types** column, e.g. `df['product'] = df['product_types']`.
1. Find the non-empty values in the **product_line** column and enter them into the **product** column.
1. Use a method of your choice to check that all values in the **product** column are non-empty.
1. Remove duplicates from the table.
1. Using the `to_csv` method, save the data (we are going to use it later in the course) with `sep=';'` and `index=False`. Save the file as `product_prices_cleaned.csv`.

The `to_csv` method is one of several `DataFrame` methods that write data to a file; it saves a `DataFrame` as a **csv** file. Within the scope of this exercise, we are interested in the following parameters:
- `sep`: field (column) separator (default: `','`),
- `index`: whether the index of the table (row numbers by default) is written to the file as well (default: `True`).

Sample call:
```
df.to_csv(
    'filepath',
    sep=';', # separator setting
    index=False
)
```
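
To see what `sep` and `index` change in the output, the sketch below writes a small DataFrame to a string instead of a file (when no path is given, `to_csv` returns the CSV text). The frame `demo` and its values are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Made-up data, used only to illustrate the two parameters.
demo = pd.DataFrame({"product": ["bread", "ham"], "value": [2.5, 21.37]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = demo.to_csv(sep=";")                  # index written as the first, unnamed column
without_index = demo.to_csv(sep=";", index=False)  # only the data columns

print(with_index)
print(without_index)
```

With `index=True` the header line starts with an empty field, because the default index has no name; with `index=False` that column is omitted entirely.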

In [12]:
import pandas as pd
df = pd.read_csv(
    '../../01_Data/product_prices_renamed.csv',
    sep=';',
    decimal=','
)

df['value'] = pd.to_numeric(df['value'], errors='coerce')

df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3


In [13]:
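# Checking if product_types and product_line columns are complementary.
# The filter keeps rows where both columns are empty or both are filled,
# so an empty result means the columns are complementary.

df[df['product_types'].isna() == df['product_line'].isna()]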
# df['product_types'].isna() == df['product_line'].isna()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date
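
The comparison above selects rows where both columns are empty or both are filled, so an empty result (only the header row) means no row breaks the rule. A minimal sketch on made-up data, where the third row deliberately violates it; the frame `toy` and its values are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Made-up data: the last row has both columns empty and breaks the rule.
toy = pd.DataFrame({
    "product_types": ["cucumbers", np.nan, np.nan],
    "product_line": [np.nan, "bread", np.nan],
})

# True where the two columns agree on emptiness (both empty or both filled).
same_state = toy["product_types"].isna() == toy["product_line"].isna()
violations = toy[same_state]
print(violations)

# Equivalent yes/no check: exactly one of the two is empty in every row.
is_complementary = (toy["product_types"].isna() ^ toy["product_line"].isna()).all()
print(is_complementary)
```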


In [14]:
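# Creating a new column: product

df['product'] = df['product_types']
# Where product is still empty, take the value from product_line
# (the assignment aligns rows by index label)
df.loc[df['product'].isna(), 'product'] = df['product_line']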

df.sample(10)

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
137619,LUBUSZ,,PLN,2,"salted herring, non-dressed - per 1kg",0.0,2018-2,"salted herring, non-dressed - per 1kg"
5941,WEST POMERANIA,,PLN,2,beef with bone (rump steak) - per 1kg,32.27,2017-4,beef with bone (rump steak) - per 1kg
44218,LESSER POLAND,,PLN,2,dressed chickens - per 1kg,6.43,2014-4,dressed chickens - per 1kg
32843,LUBUSZ,,PLN,4,bread - per 1kg,,2001-8,bread - per 1kg
26015,SUBCARPATHIA,,PLN,2,Backpacker's canned pork meat - per 300 g,2.94,2012-8,Backpacker's canned pork meat - per 300 g
106146,SILESIA,whole pickled cucumbers 0.9l - per 1pc.,PLN,1,,2.34,2006-3,whole pickled cucumbers 0.9l - per 1pc.
146642,POMERANIA,frozen carrot and pea mix - per 1kg,EUR,1,,0.616867,2005-10,frozen carrot and pea mix - per 1kg
129542,LESSER POLAND,,PLN,2,pork meat with bone (shoulder) - per 1kg,9.72,2009-6,pork meat with bone (shoulder) - per 1kg
13452,LOWER SILESIA,,PLN,2,barley groats sausage - per 1kg,8.84,2014-2,barley groats sausage - per 1kg
136679,SILESIA,,PLN,2,pork belly cooked - per 1kg,15.48,2010-4,pork belly cooked - per 1kg
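
The same column can also be built in one step with `Series.fillna`, which accepts another Series and fills the gaps by matching index labels. This is an alternative to the two-step assignment, sketched here on made-up data (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up data with complementary columns.
toy = pd.DataFrame({
    "product_types": ["cucumbers", np.nan],
    "product_line": [np.nan, "bread"],
})

# Take product_types where present, otherwise fall back to product_line.
toy["product"] = toy["product_types"].fillna(toy["product_line"])
print(toy["product"].tolist())
```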


In [15]:
# Checking if all values in the product column are non-empty
df['product'].isna().sum()

0
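
Note that `isna()` counts only missing values (`NaN` or `None`); an empty or whitespace-only string would not be counted. If that could occur in the data, a stricter check might look like the sketch below, shown on a made-up Series `s`:

```python
import numpy as np
import pandas as pd

# Made-up values: one real entry, one empty string, one NaN, one whitespace-only string.
s = pd.Series(["bread", "", np.nan, "  "])

# Counts only NaN/None.
missing = s.isna().sum()

# Treats missing values and blank strings alike as empty.
blank = s.fillna("").str.strip().eq("").sum()

print(missing, blank)
```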

In [16]:
# Removing duplicates from the table
df = df.drop_duplicates()
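
By default, `drop_duplicates` compares all columns and keeps the first occurrence of each repeated row; the index labels of the surviving rows are left unchanged. A short sketch on made-up data (`toy` is hypothetical) that counts the duplicates before removing them:

```python
import pandas as pd

# Made-up data: rows 0 and 1 are identical.
toy = pd.DataFrame({
    "product": ["bread", "bread", "ham"],
    "value": [2.5, 2.5, 21.37],
})

# duplicated() marks every repeat after the first occurrence.
n_duplicates = toy.duplicated().sum()
deduped = toy.drop_duplicates()

print(n_duplicates)
print(deduped.index.tolist())
```

If a contiguous index is wanted afterwards, `drop_duplicates(ignore_index=True)` or a subsequent `reset_index(drop=True)` renumbers the rows.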

In [17]:
# Saving the DataFrame to a CSV file
df.to_csv('../../01_Data/product_prices_cleaned_test.csv', sep=';', index=False)

In [19]:
df.describe().round(2)

Unnamed: 0,product_group_id,value
count,128520.0,119952.0
mean,2.33,7.35
std,0.91,36.4
min,1.0,0.0
25%,2.0,0.0
50%,2.0,4.24
75%,3.0,12.08
max,4.0,3000.0
