# Formatting dates

Load the _product_prices_cleaned.csv_ file. The _date_ column records the month for which each value was reported.

1. What date format is it?
1. Change the column format to `datetime64`.
1. Add a new column, _month_, to the DataFrame, holding the month extracted from the _date_ column.
1. Add a new column, _quarter_, to the DataFrame, holding the quarter extracted from the _date_ column.
1. Add a new column, _year_, to the DataFrame, holding the year extracted from the _date_ column.
1. Using the `dt.strftime` method ([link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.strftime.html) to the documentation), convert the _date_ column to the _YYYY-MM-01_ format and overwrite its values, e.g.:
```
df['date'] = df['date'].dt.strftime('format')
```
1. Overwrite the _product_prices_cleaned.csv_ file. It will be used in the following sections to analyze the data set.
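
For the first question, one way to check the format is to test the raw strings against a pattern before converting them. The sketch below is only an illustration: `sample` is a hypothetical series that copies the values visible in `df.head()`, and the regular expression is an assumption about what the column contains.

```python
import pandas as pd

# Hypothetical sample mirroring the values shown by df.head()
sample = pd.Series(['2013-3', '2018-2', '2019-12', '2002-3'])

# True where a value is a four-digit year, a hyphen, and a 1-2 digit month
matches = sample.str.fullmatch(r'\d{4}-\d{1,2}')
all_match = bool(matches.all())

# Distinct string lengths: 6 for single-digit months, 7 for two-digit months
lengths = sorted(sample.str.len().unique().tolist())
print(all_match, lengths)
```

If every value matches, the column is in a year-month format in which the month is not zero-padded, which `%Y-%m` should be able to parse.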


In [18]:
import pandas as pd

In [19]:
df = pd.read_csv('../../01_Data/product_prices_cleaned.csv', sep=';')
# df = df[['province', 'product', 'value', 'date']]
df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128503 entries, 0 to 128502
Data columns (total 8 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   province          128503 non-null  object 
 1   product_types     34255 non-null   object 
 2   currency          128503 non-null  object 
 3   product_group_id  128503 non-null  int64  
 4   product_line      94248 non-null   object 
 5   value             119935 non-null  float64
 6   date              128503 non-null  object 
 7   product           128503 non-null  object 
dtypes: float64(1), int64(1), object(6)
memory usage: 7.8+ MB


In [21]:
# Values look like '2013-3' (year-month, month not zero-padded); unparseable entries become NaT
df['date'] = pd.to_datetime(df['date'], format='%Y-%m', errors='coerce')
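
Because `errors='coerce'` silently replaces unparseable values with `NaT`, it may be worth counting how many rows were affected. A minimal sketch on a hypothetical series, where `'n/a'` stands in for a malformed entry that is not taken from the real file:

```python
import pandas as pd

# Hypothetical sample; 'n/a' stands in for a malformed entry
raw = pd.Series(['2013-3', '2019-12', 'n/a'])

parsed = pd.to_datetime(raw, format='%Y-%m', errors='coerce')

# Rows that held a value before parsing but are NaT afterwards
failed = raw[parsed.isna() & raw.notna()]
n_failed = len(failed)
print(n_failed, failed.tolist())
```

Applying the same comparison to the original _date_ column before overwriting it would show whether any rows were lost during the conversion.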

In [22]:
# Adding new columns – month, quarter, and year
df['month'] = df['date'].dt.month
df['quarter'] = df['date'].dt.quarter
df['year'] = df['date'].dt.year

# Converting the 'date' column to the YYYY-MM-01 format; the day is fixed at 01
# because the source values carry only a year and a month
df['date'] = df['date'].dt.strftime('%Y-%m-01')


df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product,month,quarter,year
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg,3,1,2013
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-02-01,bread - per 1kg,2,1,2018
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg,12,4,2019
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg,2,1,2019
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg,3,1,2002
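
The order of operations in the cell above matters: `dt.strftime` returns text rather than datetimes, so _month_, _quarter_ and _year_ have to be extracted before _date_ is overwritten. A small sketch on hypothetical values illustrates this:

```python
import pandas as pd

# Hypothetical two-value series in the same year-month format as the file
dates = pd.to_datetime(pd.Series(['2013-3', '2019-12']), format='%Y-%m')

as_text = dates.dt.strftime('%Y-%m-01')
formatted = as_text.tolist()
print(formatted)

# The formatted column holds strings, so the .dt accessor no longer applies to it
is_datetime = pd.api.types.is_datetime64_any_dtype(as_text)
print(is_datetime)
```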


In [23]:
df.to_csv('../../01_Data/product_prices_cleaned.csv', sep=';', index=False)
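
CSV files do not store column types, so the _date_ column will come back as text when the file is loaded again. One option, sketched below under the assumption that later sections need real datetimes, is to pass `parse_dates` to `pd.read_csv`. The example uses an in-memory buffer and a hypothetical two-row frame instead of the real file.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the saved file
frame = pd.DataFrame({'value': [21.37, 3.55],
                      'date': ['2013-03-01', '2019-12-01']})

# Write to an in-memory buffer with the same separator as the notebook
buffer = io.StringIO()
frame.to_csv(buffer, sep=';', index=False)
buffer.seek(0)

# parse_dates restores a datetime dtype for the listed columns on load
reloaded = pd.read_csv(buffer, sep=';', parse_dates=['date'])
months = reloaded['date'].dt.month.tolist()
print(reloaded['date'].dtype, months)
```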