# Adding a column

Building on the solution to the previous exercise, create a new column, **product**, from **product_types** and **product_line**:

1. Check that the **product_types** and **product_line** columns are complementary (an empty value in one column implies a non-empty value in the other).
1. Create a new column, **product**, from the values in the **product_types** column, e.g. `df['product'] = df['product_types']`.
1. Find the non-empty values in the **product_line** column and copy them into the **product** column.
1. Use a method of your choice to check that all values in the **product** column are non-empty.
1. Remove duplicates from the table.
1. Using the `to_csv` method, save the data (we are going to use it later in the course) with `sep=';'` and `index=False`.<br>
Save the file as `product_prices_cleaned.csv`.

The `to_csv` method saves a `DataFrame` as a **csv** file. Within the scope of this exercise we are interested in the following parameters:
- `sep`: field separator placed between the values in a row (default `','`),
- `index`: whether the index (row numbers by default) of the table is saved as well (default: `True`).

Sample call:
```
df.to_csv(
    'filepath',
    sep=';',  # separator setting
    index=False,  # do not write the index column
)
```
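
To see what the two parameters do, here is a minimal sketch on a made-up two-row table (the `toy` frame and its values are assumptions for illustration, not the course data). It writes the CSV to a string instead of a file and reads it back with the same separator.

```python
import io

import pandas as pd

# Hypothetical toy data; column names mirror the exercise, values are made up.
toy = pd.DataFrame({
    "product": ["bread - per 1kg", "apple juice, boxed - per 1l"],
    "value": [2.37, 3.11],
})

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = toy.to_csv(sep=";", index=False)
print(csv_text)

# Reading the data back needs the same separator.
restored = pd.read_csv(io.StringIO(csv_text), sep=";")
print(restored.columns.tolist())
```

Because `index=False` was used, the restored table has no extra index column. Product names containing commas, such as `beet sugar white, bagged - per 1kg`, are written without quoting when the separator is `;`.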

In [45]:
import pandas as pd

In [46]:
df = pd.read_csv('../../01_Data/product_prices_renamed_almost_cleaned.csv')
df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3


In [47]:
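# Checking if product_types and product_line columns are complementary:
# select rows where both columns are empty or both are filled.
# An empty result means the columns are complementary.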
df[df['product_types'].isna() == df['product_line'].isna()]

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date
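
The same check can also be reduced to a single boolean, which is easier to assert on than an empty table. The sketch below is one possible variant, shown on a hypothetical `toy` frame whose third row deliberately breaks the rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the last row has both columns empty, so it is a violation.
toy = pd.DataFrame({
    "product_types": ["fresh chicken eggs - per 10pcs.", np.nan, np.nan],
    "product_line": [np.nan, "bread - per 1kg", np.nan],
})

# Complementary means exactly one of the two columns is missing in each row (XOR).
exactly_one_missing = toy["product_types"].isna() ^ toy["product_line"].isna()

violations = toy[~exactly_one_missing]  # rows that break the rule
is_complementary = bool(exactly_one_missing.all())
print(is_complementary)
print(violations)
```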


In [48]:
# Creating a new column: product
df['product'] = df['product_types']
# Fill the gaps from product_line; the assignment aligns rows by index label
df.loc[df['product'].isna(), 'product'] = df['product_line']

df.sample(10)

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
99179,LUBUSZ,"beet sugar white, bagged - per 1kg",PLN,3,,0.0,2016-6,"beet sugar white, bagged - per 1kg"
56941,KUYAVIA-POMERANIA,,PLN,4,plain mixed bread (wheat-rye) - per 1kg,2.37,2004-4,plain mixed bread (wheat-rye) - per 1kg
129638,LOWER SILESIA,whole pickled cucumbers 0.9l - per 1pc.,PLN,1,,0.0,2011-6,whole pickled cucumbers 0.9l - per 1pc.
148306,POLAND,,PLN,2,haddock fillets frozen - per 1kg,9.3,2005-10,haddock fillets frozen - per 1kg
119358,SUBCARPATHIA,"apple juice, boxed - per 1l",PLN,1,,0.0,2017-4,"apple juice, boxed - per 1l"
21936,LESSER POLAND,,PLN,2,smoked bacon with ribs - per 1kg,18.23,2017-8,smoked bacon with ribs - per 1kg
22102,POMERANIA,fresh chicken eggs - per 10pcs.,PLN,3,,0.26,2009-1,fresh chicken eggs - per 10pcs.
11299,LESSER POLAND,,PLN,2,pork meat with bone (shoulder) - per 1kg,10.64,2018-5,pork meat with bone (shoulder) - per 1kg
47774,HOLY CROSS,frozen carrot and pea mix - per 1kg,PLN,1,,0.0,2010-6,frozen carrot and pea mix - per 1kg
41470,LESSER POLAND,"beet sugar white, bagged - per 1kg",PLN,3,,3.11,2010-9,"beet sugar white, bagged - per 1kg"
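
An alternative way to build the column, sketched here on a hypothetical two-row `toy` frame, is `Series.fillna` with another `Series` as the fill value. pandas aligns the two by index, so each missing entry in **product_types** is taken from the same row of **product_line**.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with complementary columns.
toy = pd.DataFrame({
    "product_types": ["fresh chicken eggs - per 10pcs.", np.nan],
    "product_line": [np.nan, "bread - per 1kg"],
})

# Take product_types where present, otherwise fall back to product_line.
toy["product"] = toy["product_types"].fillna(toy["product_line"])
print(toy["product"].tolist())
```

This is a one-step version of the copy-then-fill approach used in the cell above, and both should give the same result when the columns are complementary.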


In [49]:
# Checking if all values in the product column are non-empty
df['product'].isna().sum()

0

In [50]:
# Removing duplicates from the table
df = df.drop_duplicates()
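
Before dropping duplicates it can help to count them first. The following is a small sketch on a hypothetical `toy` frame: `duplicated()` marks every row that repeats an earlier one, and `drop_duplicates()` keeps the first occurrence by default.

```python
import pandas as pd

# Hypothetical toy data: the second row repeats the first.
toy = pd.DataFrame({
    "province": ["POLAND", "POLAND", "LUBUSZ"],
    "value": [1.0, 1.0, 2.0],
})

n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row
deduplicated = toy.drop_duplicates()        # keeps the first occurrence

print(n_duplicates)
print(len(deduplicated))
```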

In [51]:
# Saving the DataFrame to a CSV file
df.to_csv('../../01_Data/product_prices_cleaned.csv', sep=';', index=False)

---

In [52]:
df.describe()

Unnamed: 0,product_group_id,value
count,128503.0,119935.0
mean,2.33351,6.981268
std,0.906699,7.43212
min,1.0,0.0
25%,2.0,0.0
50%,2.0,4.26
75%,3.0,12.08
max,4.0,36.68
