# Formatting dates

Load the _product_prices_cleaned.csv_ file. The _date_ column records the month for which each value was reported.

1. What format are the dates stored in?
1. Change the column format to `datetime64`.
1. Add a new column, _month_, to the DataFrame, holding the month extracted from the _date_ column.
1. Add a new column, _quarter_, to the DataFrame, holding the quarter extracted from the _date_ column.
1. Add a new column, _year_, to the DataFrame, holding the year extracted from the _date_ column.
1. Using the `dt.strftime` method (see the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.strftime.html)), convert the _date_ column to the _YYYY-MM-01_ format and overwrite its values, e.g.:
```
df['date'] = df['date'].dt.strftime('format')  # replace 'format' with the target pattern
```
1. Overwrite the _product_prices_cleaned.csv_ file. It will be used in the following sections to analyze the data set.


In [1]:
import pandas as pd

In [2]:
dates = ['2020-02', '2013-11', '2029-06']
extraction_example = pd.Series(pd.to_datetime(dates, format='%Y-%m'))
extraction_example.dt.month  # extracting months
extraction_example.dt.year  # extracting only years

0    2020
1    2013
2    2029
dtype: int32
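
A notebook cell displays only the value of its last expression, which is why the output above shows the years but not the months. A small sketch that prints each part explicitly; the `parts` frame and its column names are illustrative and not part of the exercise.

```python
import pandas as pd

dates = ['2020-02', '2013-11', '2029-06']
example = pd.Series(pd.to_datetime(dates, format='%Y-%m'))

# Collect the date parts side by side so none of them is hidden
# by the "last expression only" display rule of a notebook cell.
parts = pd.DataFrame({
    'year': example.dt.year,
    'month': example.dt.month,
})
print(parts)
```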

In [3]:
data = pd.read_csv(
  '../../01_Data/product_prices_cleaned.csv',
  sep=';',
  encoding='UTF-8',
  decimal='.'
)

In [4]:
data.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg


In [5]:
data['date'].head()

0     2013-3
1     2018-2
2    2019-12
3     2019-2
4     2002-3
Name: date, dtype: object
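
The output answers the first question: the column has dtype `object`, and the values are strings made of a four-digit year, a hyphen, and a month without zero padding. One way to confirm this pattern is a regular-expression check. This is only a sketch on the five values shown above (the `sample` series is a stand-in for `data['date']`).

```python
import pandas as pd

# Sample values copied from the output above.
sample = pd.Series(['2013-3', '2018-2', '2019-12', '2019-2', '2002-3'], name='date')

# Four-digit year, a hyphen, then a one- or two-digit month.
matches = sample.str.fullmatch(r'\d{4}-\d{1,2}')
all_match = bool(matches.all())
print(all_match)
```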

In [6]:
data['date'] = pd.to_datetime(data['date'], format='%Y-%m')

In [7]:
data.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-02-01,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg
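
Because the strings carry no day, each parsed timestamp falls on the first day of its month. The sketch below checks the result of such a conversion on a small stand-in series. The `'not a date'` entry and the use of `errors='coerce'` are assumptions added for illustration; the notebook itself parses the column without them.

```python
import pandas as pd

raw = pd.Series(['2013-3', '2019-12', 'not a date'])

# errors='coerce' turns values that do not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format='%Y-%m', errors='coerce')

is_datetime = pd.api.types.is_datetime64_any_dtype(parsed)
unparsed_count = int(parsed.isna().sum())
print(is_datetime, unparsed_count)
```

Counting the `NaT` values after the conversion is a quick way to see whether any rows did not follow the expected pattern.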


"""Add a new column, month, to the DataFrame, holding the month extracted from the date column.
Add a new column, quarter, to the DataFrame, holding the quarter extracted from the date column.
Add a new column, year, to the DataFrame, holding the year extracted from the date column."""

In [8]:
data_year = data['date'].dt.year

In [9]:
data_year.head()

0    2013
1    2018
2    2019
3    2019
4    2002
Name: date, dtype: int32

In [10]:
data['year'] = data_year

In [11]:
data_month = data['date'].dt.month

In [12]:
data['month'] = data_month

In [13]:
data.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product,year,month
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg,2013,3
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-02-01,bread - per 1kg,2018,2
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg,2019,12
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg,2019,2
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg,2002,3


In [14]:
data_quarter = data['date'].dt.quarter

In [15]:
data_quarter.head()

0    1
1    1
2    4
3    1
4    1
Name: date, dtype: int32

In [16]:
data['quarter'] = data_quarter

In [17]:
data.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product,year,month,quarter
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg,2013,3,1
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-02-01,bread - per 1kg,2018,2,1
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg,2019,12,4
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg,2019,2,1
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg,2002,3,1
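
The three columns can also be added in a single step. Below is a minimal sketch using `DataFrame.assign` on a small stand-in frame; the `frame` name and its three sample dates are assumptions for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'date': pd.to_datetime(['2013-3', '2019-12', '2012-5'], format='%Y-%m'),
})

# assign returns a new DataFrame with the extra columns appended.
frame = frame.assign(
    year=frame['date'].dt.year,
    month=frame['date'].dt.month,
    quarter=frame['date'].dt.quarter,
)
print(frame)
```

Each quarter equals `(month - 1) // 3 + 1`, so March falls in quarter 1 and December in quarter 4, matching the output above.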


In [18]:
data['date'] = data['date'].dt.strftime('%Y-%m-%d')

In [19]:
data

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product,year,month,quarter
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg,2013,3,1
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-02-01,bread - per 1kg,2018,2,1
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg,2019,12,4
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg,2019,2,1
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg,2002,3,1
...,...,...,...,...,...,...,...,...,...,...,...
128515,SILESIA,,PLN,2,smoked bacon with ribs - per 1kg,15.95,2015-09-01,smoked bacon with ribs - per 1kg,2015,9,3
128516,SILESIA,,PLN,2,barley groats sausage - per 1kg,4.50,2004-08-01,barley groats sausage - per 1kg,2004,8,3
128517,KUYAVIA-POMERANIA,,PLN,2,pork meat (raw bacon) - per 1kg,12.15,2016-11-01,pork meat (raw bacon) - per 1kg,2016,11,4
128518,ŁÓDŹ,"beet sugar white, bagged - per 1kg",PLN,3,,0.00,2012-05-01,"beet sugar white, bagged - per 1kg",2012,5,2
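
Note that `dt.strftime` returns text, so after this step the _date_ column no longer holds timestamps. The sketch below illustrates this on a stand-in series. It also writes the day as a literal `01` in the pattern, an alternative to `%d` that matches the _YYYY-MM-01_ wording of the task; both give the same result here because every parsed date already falls on the first of the month.

```python
import pandas as pd

months = pd.Series(pd.to_datetime(['2013-3', '2019-12'], format='%Y-%m'))

# The literal "01" in the pattern pins the day regardless of the timestamp.
as_text = months.dt.strftime('%Y-%m-01')
still_datetime = pd.api.types.is_datetime64_any_dtype(as_text)
print(as_text.tolist(), still_datetime)
```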


In [20]:
data.to_csv('../../01_Data/product_prices_cleaned.csv', index=False)
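
The file was read with `sep=';'` in cell In [3], while `to_csv` writes commas unless `sep` is given. If the later sections expect the semicolon-separated layout (an assumption, since they are not shown here), passing the same separator keeps the file readable in the same way. Below is a minimal round-trip sketch on an in-memory buffer with a small stand-in frame.

```python
import io

import pandas as pd

frame = pd.DataFrame({
    'value': [21.37, 3.55],
    'date': ['2013-03-01', '2019-12-01'],
})

# Write with the same separator that is used for reading.
buffer = io.StringIO()
frame.to_csv(buffer, sep=';', index=False)

# CSV stores dates as text, so they are parsed again on load.
buffer.seek(0)
restored = pd.read_csv(buffer, sep=';', parse_dates=['date'])
print(restored.dtypes)
```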