# Adding a column

Building on the solution to the previous exercise, create a new column, **product**, from **product_types** and **product_line**:

1. check that the **product_types** and **product_line** columns are complementary (an empty value in one column entails a non-empty value in the other),
1. create a new column, **product**, using the values from the **product_types** column, e.g. `df['product'] = df['product_types']`,
1. find the non-empty values in the **product_line** column and enter them into the **product** column,
1. use a method of your choice to check that all values in the **product** column are non-empty,
1. remove duplicates from the table,
1. using the `to_csv` method, save the data as `product_prices_cleaned.csv` (we are going to use it later in the course), setting `sep=';'` and `index=False`.

The `to_csv` method saves a `DataFrame` as a **csv** file. Within the scope of this exercise we are interested in the following parameters:
- `sep`: field (column) separator (default `','`),
- `index`: whether the index (row numbers by default) of the table is written to the file as well (default: `True`).

Sample call:
```
df.to_csv(
    'filepath',
    sep=';',  # separator setting
    index=False
)
```
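To see what the two parameters do, here is a minimal round-trip sketch. The toy table and its values are made up for illustration, and the data is written to an in-memory buffer rather than to the course file.

```python
import io

import pandas as pd

# Toy data standing in for the course table (values are invented).
toy = pd.DataFrame({
    "product": ["bread - per 1kg", "sugar - per 1kg"],
    "value": [3.5, 2.25],
})

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
toy.to_csv(buffer, sep=";", index=False)
csv_text = buffer.getvalue()

# The header line contains only the data columns, joined by ';'.
header = csv_text.splitlines()[0]

# Reading back with the same separator restores the same columns.
restored = pd.read_csv(io.StringIO(csv_text), sep=";")
```

With `index=True` (the default) the file would gain an extra leading column holding the row labels, which `read_csv` would then load as an unnamed column.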

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv(
  '../../01_Data/product_prices_renamed.csv',
  sep=';',
  encoding='UTF-8',
  decimal='.'
)

In [3]:
df.loc[df["date"] == "1888-0", "date"] = "1999-1"
df.loc[df["date"] == "2099-13", "date"] = "2019-1"

In [4]:
df.loc[df["currency"] == "EUR", "value"] = df["value"] * 4.15

In [5]:
df.loc[df["currency"] == "EUR", "currency"] = "PLN"

In [6]:
df = df[~(df["value"] == 3000)]

In [7]:
df

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3
...,...,...,...,...,...,...,...
149935,KUYAVIA-POMERANIA,,PLN,2,pork meat (raw bacon) - per 1kg,12.15,2016-11
149936,ŁÓDŹ,"beet sugar white, bagged - per 1kg",PLN,3,,0.00,2012-5
149937,LESSER POLAND,,PLN,4,plain mixed bread (wheat-rye) - per 1kg,3.05,2008-6
149938,WARMIA-MASURIA,,PLN,2,boneless beef (sirloin) - per 1kg,11.87,2000-11


In [8]:
complementary = ((df['product_types'].isnull() & df['product_line'].notnull()) |
                 (df['product_types'].notnull() & df['product_line'].isnull()))
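The mask above is computed but never evaluated, so the check still needs a final step. A possible way to finish it, shown here on hypothetical toy rows (the third row deliberately breaks the rule), is to reduce the mask with `.all()` and list the offending rows:

```python
import pandas as pd

# Hypothetical toy rows: the last one has both columns empty.
toy = pd.DataFrame({
    "product_types": [None, "beet sugar - per 1kg", None],
    "product_line": ["bread - per 1kg", None, None],
})

# Exactly one column is filled when the two "is missing" flags differ.
complementary = toy["product_types"].isna() != toy["product_line"].isna()

all_complementary = bool(complementary.all())  # True only if every row passes
violations = toy[~complementary]               # rows to inspect when it fails
```

On the course data the same two lines would be applied to `df`; an empty `violations` table means the columns are complementary.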

In [9]:
df['product'] = df['product_types']

In [10]:
df['product'] = df['product'].fillna(df['product_line'])
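Calling `fillna(..., inplace=True)` on a column selection is deprecated in recent pandas versions, so it is safer to assign the result back. The sketch below, on made-up toy rows, shows two equivalent ways to fill the gaps in one column from another; `combine_first` is offered as an alternative, not as the course's method.

```python
import pandas as pd

# Made-up rows: each product name lives in exactly one of the two columns.
toy = pd.DataFrame({
    "product_types": [None, "beet sugar - per 1kg"],
    "product_line": ["bread - per 1kg", None],
})

# Option 1: fillna returns a new Series, which is assigned to the new column.
toy["product"] = toy["product_types"].fillna(toy["product_line"])

# Option 2: combine_first keeps product_types and falls back to product_line.
alternative = toy["product_types"].combine_first(toy["product_line"])
```

Both options align the two columns on the index, so each row is filled from its own `product_line` value.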


In [11]:
df

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg
...,...,...,...,...,...,...,...,...
149935,KUYAVIA-POMERANIA,,PLN,2,pork meat (raw bacon) - per 1kg,12.15,2016-11,pork meat (raw bacon) - per 1kg
149936,ŁÓDŹ,"beet sugar white, bagged - per 1kg",PLN,3,,0.00,2012-5,"beet sugar white, bagged - per 1kg"
149937,LESSER POLAND,,PLN,4,plain mixed bread (wheat-rye) - per 1kg,3.05,2008-6,plain mixed bread (wheat-rye) - per 1kg
149938,WARMIA-MASURIA,,PLN,2,boneless beef (sirloin) - per 1kg,11.87,2000-11,boneless beef (sirloin) - per 1kg


In [12]:
if df['product'].isnull().any():
    print("There are empty values in the 'product' column")
else:
    print("All values in the 'product' column are non-empty.")

All values in the 'product' column are non-empty.
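When the check fails, a yes/no answer is less useful than knowing how many values are missing and where. A small sketch on invented toy data, assuming the column is called `product` as in the exercise:

```python
import pandas as pd

# Invented toy column with one missing value.
toy = pd.DataFrame({"product": ["bread - per 1kg", None, "sugar - per 1kg"]})

missing_mask = toy["product"].isna()
missing_count = int(missing_mask.sum())  # number of empty cells
missing_rows = toy[missing_mask]         # the rows that need attention
```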


In [13]:
df.drop_duplicates(inplace=True)
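Before dropping duplicates it can help to count them, so the change in row count is not a surprise. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows: the second row repeats the first.
toy = pd.DataFrame({
    "province": ["SILESIA", "SILESIA", "LESSER POLAND"],
    "value": [4.5, 4.5, 3.05],
})

n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row
deduplicated = toy.drop_duplicates()        # keeps the first occurrence by default
```

Note that `drop_duplicates` keeps the original index labels, so the index of the result can have gaps; `reset_index(drop=True)` renumbers it if needed.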

In [14]:
df.to_csv('product_prices_cleaned.csv', sep=';', index=False)

In [15]:
df

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg
...,...,...,...,...,...,...,...,...
149933,SILESIA,,PLN,2,smoked bacon with ribs - per 1kg,15.95,2015-9,smoked bacon with ribs - per 1kg
149934,SILESIA,,PLN,2,barley groats sausage - per 1kg,4.50,2004-8,barley groats sausage - per 1kg
149935,KUYAVIA-POMERANIA,,PLN,2,pork meat (raw bacon) - per 1kg,12.15,2016-11,pork meat (raw bacon) - per 1kg
149936,ŁÓDŹ,"beet sugar white, bagged - per 1kg",PLN,3,,0.00,2012-5,"beet sugar white, bagged - per 1kg"
