# Removing errors
Building on the solutions from the previous class and the **product_prices_renamed.csv** file, use `loc` to correct the following errors in the dataset:

1. In the **date** column, the invalid value '1888-0' (year 1888) appears; change it to '1999-1'.
1. In the **date** column, the invalid value '2099-13' (year 2099) appears; change it to '2019-1'.
1. There is a spelling error in the **product_types** column; correct it. The number of pieces should be '10pcs.'. Check whether the fix was applied correctly.
1. Use `loc` to convert the values given in `EUR` to `PLN` at an exchange rate of 4.15.
1. Filter out the rows where the product price is 3000.

Hint: Instead of writing the same condition inside `loc` twice, first build the condition for rows where **currency** is `EUR` and save it to a variable.

> Remember that assigning through `loc` modifies the DataFrame in place; the overwritten values cannot be recovered without reloading the file.
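
Because a `loc` assignment overwrites cells in place, it can help to keep a copy of the frame before editing. The following is a minimal sketch of that idea; the `toy` frame and the `backup` name are made up for illustration and are not part of the exercise data.

```python
import pandas as pd

# Hypothetical toy data, not taken from product_prices_renamed.csv
toy = pd.DataFrame({'date': ['1888-0', '2013-3', '2099-13']})

# Keep an independent copy before editing, so the raw values stay available
backup = toy.copy()

# loc with a boolean mask and a column label overwrites the matching cells in place
toy.loc[toy['date'] == '1888-0', 'date'] = '1999-1'

print(toy['date'].tolist())     # edited frame
print(backup['date'].tolist())  # untouched copy
```

The copy costs memory, so for large files reloading from disk may be the more practical way to get the original values back.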

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('../../01_Data/product_prices_renamed.csv', sep=';', decimal='.')
df.head()

| Unnamed: 0 | province | product_types | currency | product_group_id | product_line | value | date |
|---|---|---|---|---|---|---|---|
| 0 | SUBCARPATHIA | NaN | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-3 |
| 1 | ŁÓDŹ | NaN | PLN | 4 | bread - per 1kg | NaN | 2018-2 |
| 2 | KUYAVIA-POMERANIA | NaN | PLN | 2 | barley groats sausage - per 1kg | 3.55 | 2019-12 |
| 3 | LOWER SILESIA | NaN | PLN | 2 | dressed chickens - per 1kg | 6.14 | 2019-2 |
| 4 | WARMIA-MASURIA | NaN | PLN | 2 | Italian head cheese - per 1kg | 5.63 | 2002-3 |


In [4]:
df.loc[df['date'] == '1888-0', 'date'] = '1999-1'
df.loc[df['date'] == '2099-13', 'date'] = '2019-1'

In [6]:
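# Fix the misspelled product name and the wrong piece count in one assignment
df.loc[df['product_types'] == 'fresh chichen egges - per 666pcs.', 'product_types'] = 'fresh chicken eggs - per 10pcs.'

# Catch any remaining 'per 666pcs.' entries; regex=False matches the '.' literally
df['product_types'] = df['product_types'].str.replace('per 666pcs.', 'per 10pcs.', regex=False)

# Check the result: the misspelled entry should no longer be listed
df['product_types'].unique()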

array([nan, 'whole pickled cucumbers 0.9l - per 1pc.',
       'fresh chicken eggs - per 10pcs.',
       '30% tomato concentrate - per 1kg',
       'frozen carrot and pea mix - per 1kg',
       'beet sugar white, bagged - per 1kg',
       'apple juice, boxed - per 1l', 'white table salt bagged - per 1kg',
       'natural chocolate plain - per 1kg'], dtype=object)
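
The default value of the `regex` argument of `Series.str.replace` depends on the pandas version (older releases treated the pattern as a regular expression, pandas 2.0 and later treat it literally), so passing it explicitly avoids surprises with patterns that contain a `.`. The sketch below shows the difference on two made-up strings.

```python
import pandas as pd

# Hypothetical strings: the second has another character where the '.' would be
s = pd.Series(['eggs - per 666pcs.', 'eggs - per 666pcsX'])

# regex=True: '.' is a wildcard, so both strings are rewritten
as_regex = s.str.replace('per 666pcs.', 'per 10pcs.', regex=True)

# regex=False: the pattern is matched literally, so only the first string changes
as_literal = s.str.replace('per 666pcs.', 'per 10pcs.', regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```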

In [7]:
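exchange_rate = 4.15
# Build the EUR mask once and reuse it for both assignments
eur_condition = df['currency'] == 'EUR'

# Only the masked rows are written; pandas aligns the right-hand side on the index
df.loc[eur_condition, 'value'] = df['value'] * exchange_rate
# df.loc[eur_condition, 'value'] = df['value'].apply(lambda x: x * exchange_rate)

# Relabel the converted rows
df.loc[eur_condition, 'currency'] = 'PLN'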
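
The conversion cell relies on index alignment: the right-hand side is computed for every row, but `loc` writes only the rows selected by the mask. A minimal sketch on made-up data (the `toy` frame and `eur_rows` name are illustrative only):

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    'currency': ['PLN', 'EUR', 'PLN', 'EUR'],
    'value': [10.0, 2.0, 30.0, 100.0],
})
exchange_rate = 4.15

eur_rows = toy['currency'] == 'EUR'

# The right-hand side covers every row, but loc aligns on the index
# and writes only the rows selected by the mask
toy.loc[eur_rows, 'value'] = toy['value'] * exchange_rate
toy.loc[eur_rows, 'currency'] = 'PLN'

print(toy)
```

Note that the mask is computed before the currency label changes. Re-running only the multiplication with a stale mask would convert the same rows a second time.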

In [11]:
# Keep only the rows whose value is not 3000 (rows with a missing value are kept)
df = df[df['value'] != 3000]

---

In [12]:
# Performing a check for all the DataFrame edits made

# Checking if the date '1888-0' has been changed to '1999-1'
check_date_1888 = df[df['date'].astype(str) == '1888-0']

# Checking if the date '2099-13' has been changed to '2019-1'
check_date_2099 = df[df['date'].astype(str) == '2099-13']

# Checking if the product_types 'fresh chichen egges - per 666pcs.' has been changed to 'fresh chicken eggs - per 10pcs.'
check_product_types = df[df['product_types'] == 'fresh chichen egges - per 666pcs.']

# Checking if the values in EUR have been converted (Note: Since all EUR rows were converted to PLN, we expect no rows with EUR)
check_currency_conversion = df[df['currency'] == 'EUR']

# Checking if values equal to 3000 have been removed
check_value_3000 = df[df['value'] == 3000]

len(check_date_1888), len(check_date_2099), len(check_product_types), len(check_currency_conversion), len(check_value_3000)


(0, 0, 0, 0, 0)
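
The checks above confirm that the specific bad values are gone. A complementary check, sketched here on made-up strings, looks for any date that does not fit the `YYYY-M` shape with a month between 1 and 12. The pattern is an assumption about the intended format, inferred from values such as `2013-3` and `2019-12`.

```python
import pandas as pd

# Hypothetical sample of date strings in the style used by the dataset
dates = pd.Series(['2013-3', '2019-12', '1888-0', '2099-13', '1999-1'])

# Assumed format: four-digit year, a dash, then a month from 1 to 12 without zero padding
valid = dates.str.fullmatch(r'\d{4}-(1[0-2]|[1-9])')

# Everything that fails the pattern is a candidate for correction
invalid = dates[~valid]
print(invalid.tolist())
```

This check only validates the month; a year such as 2099 still passes, so an explicit year range would be needed to catch it.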

In [13]:
df.to_csv('../../01_Data/product_prices_renamed_almost_cleaned.csv', index=False)
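
The input file was read with `sep=';'`, while `to_csv` writes commas by default, so the saved file has to be read back with the default separator. The sketch below demonstrates the round trip on a made-up frame, using an in-memory buffer instead of a real file.

```python
import io

import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({'province': ['ŁÓDŹ'], 'value': [3.55]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)   # default separator is ','; index=False skips the index column
buffer.seek(0)

restored = pd.read_csv(buffer)    # the default sep=',' matches what to_csv wrote
print(restored.columns.tolist())
```

Writing with `index=False` also avoids an extra `Unnamed: 0` column appearing when the file is loaded again.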