# Formatting dates

Load the _product_prices_cleaned.csv_ file. Its _date_ column records the month in which each value was reported.

1. What format are the dates stored in?
1. Convert the column to the `datetime64` dtype.
1. Add a new column, _month_, to the DataFrame to isolate the month from the _date_ column.
1. Add a new column, _quarter_, to the DataFrame to isolate the quarter from the _date_ column.
1. Add a new column, _year_, to the DataFrame to isolate the year from the _date_ column.
1. Using the `dt.strftime` method ([link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.strftime.html) to the documentation), convert the _date_ column to the _YYYY-MM-01_ format and overwrite its values, e.g.:
```
df['date'] = df['date'].dt.strftime('format')
```
1. Overwrite the _product_prices_cleaned.csv_ file. It will be used in the following sections to analyze the dataset.
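
For the first question, one way to check the date format is to test the raw strings against a pattern before parsing them. The sketch below assumes the values look like `2013-03-01`, as in the sample rows of this notebook; the small `dates` series is a made-up stand-in for the real column.

```python
import pandas as pd

# Hypothetical sample standing in for df['date'] (values are illustrative only)
dates = pd.Series(['2013-03-01', '2018-02-01', None, '2019-12-01'])

# Check whether every non-missing value matches the YYYY-MM-DD pattern
non_null = dates.dropna()
matches = non_null.str.fullmatch(r'\d{4}-\d{2}-\d{2}')
print(matches.all())

# Parse with an explicit format; unparseable or missing values become NaT
parsed = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')
print(parsed.dtype)
print(parsed.isna().sum())
```

Passing `format` explicitly avoids relying on pandas to guess the layout, which can matter for ambiguous day/month orderings.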


In [1]:
import pandas as pd

df = pd.read_csv(
    '../../01_Data/product_prices_cleaned.csv',
    sep=',',
    decimal=','
)

df['value'] = pd.to_numeric(df['value'], errors='coerce')

df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product2,product,month,quarter,year
0,SUBCARPATHIA,,PLN,2.0,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg,pork ham cooked - per 1kg,3.0,1.0,2013.0
1,ŁÓDŹ,,PLN,4.0,bread - per 1kg,,2018-02-01,bread - per 1kg,bread - per 1kg,2.0,1.0,2018.0
2,KUYAVIA-POMERANIA,,PLN,2.0,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg,barley groats sausage - per 1kg,12.0,4.0,2019.0
3,LOWER SILESIA,,PLN,2.0,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg,dressed chickens - per 1kg,2.0,1.0,2019.0
4,WARMIA-MASURIA,,PLN,2.0,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg,Italian head cheese - per 1kg,3.0,1.0,2002.0


In [2]:
df['date'] = pd.to_datetime(df['date'], errors='coerce')


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128503 entries, 0 to 128502
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   province          128503 non-null  object        
 1   product_types     25687 non-null   object        
 2   currency          111367 non-null  object        
 3   product_group_id  111367 non-null  float64       
 4   product_line      85680 non-null   object        
 5   value             102799 non-null  float64       
 6   date              111367 non-null  datetime64[ns]
 7   product2          111367 non-null  object        
 8   product           111367 non-null  object        
 9   month             111367 non-null  float64       
 10  quarter           111367 non-null  float64       
 11  year              111367 non-null  float64       
dtypes: datetime64[ns](1), float64(5), object(6)
memory usage: 11.8+ MB


In [4]:
# Add a new column: month
df['month'] = df['date'].dt.month

# Add a new column: quarter
df['quarter'] = df['date'].dt.quarter

# Add a new column: year
df['year'] = df['date'].dt.year

df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product2,product,month,quarter,year
0,SUBCARPATHIA,,PLN,2.0,pork ham cooked - per 1kg,21.37,2013-03-01,pork ham cooked - per 1kg,pork ham cooked - per 1kg,3.0,1.0,2013.0
1,ŁÓDŹ,,PLN,4.0,bread - per 1kg,,2018-02-01,bread - per 1kg,bread - per 1kg,2.0,1.0,2018.0
2,KUYAVIA-POMERANIA,,PLN,2.0,barley groats sausage - per 1kg,3.55,2019-12-01,barley groats sausage - per 1kg,barley groats sausage - per 1kg,12.0,4.0,2019.0
3,LOWER SILESIA,,PLN,2.0,dressed chickens - per 1kg,6.14,2019-02-01,dressed chickens - per 1kg,dressed chickens - per 1kg,2.0,1.0,2019.0
4,WARMIA-MASURIA,,PLN,2.0,Italian head cheese - per 1kg,5.63,2002-03-01,Italian head cheese - per 1kg,Italian head cheese - per 1kg,3.0,1.0,2002.0
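
The `3.0`, `1.0`, and `2013.0` values above are floats because rows with a missing date produce `NaN`, which forces a float dtype. If whole numbers are preferred, one option is the nullable `Int64` dtype. A minimal sketch on a hypothetical three-row series (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample with one missing date (illustrative values only)
dates = pd.to_datetime(pd.Series(['2013-03-01', None, '2019-12-01']))

# With a missing date present, the month comes back as float64 with NaN
month_float = dates.dt.month
print(month_float.dtype)

# Casting to the nullable integer dtype keeps whole numbers and marks gaps as <NA>
month_int = dates.dt.month.astype('Int64')
print(month_int)
```

The same cast could be applied to the _quarter_ and _year_ columns; whether it is worth doing depends on how the later sections consume them.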


In [5]:
# Convert the date column to the YYYY-MM-01 format
df['date'] = df['date'].dt.strftime('%Y-%m-01')
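
Note that `dt.strftime` returns text, so after this step the _date_ column no longer has a datetime dtype. If a datetime column snapped to the first day of the month is wanted instead, one possible alternative is a round trip through monthly periods. The sketch below uses a made-up `dates` series, not the real data:

```python
import pandas as pd

# Hypothetical sample with mid-month days and one missing value
dates = pd.to_datetime(pd.Series(['2013-03-17', '2018-02-28', None]))

# strftime yields strings; the missing date stays missing
as_text = dates.dt.strftime('%Y-%m-01')
print(as_text.tolist())

# Alternative: keep a datetime dtype while snapping to the month start
month_start = dates.dt.to_period('M').dt.to_timestamp()
print(month_start.dtype)
```

The string form is convenient for writing to CSV, while the period-based form keeps the `.dt` accessors available for further work.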

In [9]:
# Save the updated data (note: this writes to product_prices_cleaned_2.csv, not the original file)
df.to_csv('../../01_Data/product_prices_cleaned_2.csv', sep=',', index=False)
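
Because CSV stores dates as plain text, the _date_ column has to be parsed again whenever the file is reloaded. A small sketch of that round trip, using an in-memory buffer and a made-up two-row frame in place of the real file:

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the cleaned data
sample = pd.DataFrame({
    'date': ['2013-03-01', '2018-02-01'],
    'value': [21.37, 3.55],
})

# Write without the index so no extra unnamed column appears on reload
buffer = io.StringIO()
sample.to_csv(buffer, sep=',', index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading the file back
reloaded = pd.read_csv(buffer, sep=',', parse_dates=['date'])
print(reloaded.dtypes)
```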

In [7]:
df.columns

Index(['province', 'product_types', 'currency', 'product_group_id',
       'product_line', 'value', 'date', 'product2', 'product', 'month',
       'quarter', 'year'],
      dtype='object')