**Table of contents**<a id='toc0_'></a>    
- [Formatting dates in pandas](#toc1_)    
  - [Transforming column dtype to `datetime`](#toc1_1_)    
  - [Extracting day, month, year](#toc1_2_)    
  - [Extracting weekday, weekend](#toc1_3_)    
  - [Extracting hour, minute](#toc1_5_)    
    - [Extra: Create groups - morning, afternoon, evening](#toc1_5_1_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Formatting dates in pandas](#toc0_)

In [5]:
import pandas as pd

sales = pd.read_csv("https://github.com/data-bootcamp-v4/data/raw/refs/heads/main/supermarket_sales.csv")
sales.head()

| | Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 750-67-8428 | A | Yangon | Member | Female | Health and beauty | 74.69 | 7 | 26.1415 | 548.9715 | 1/5/2019 | 13:08 | Ewallet | 522.83 | 4.761905 | 26.1415 | 9.1 |
| 1 | 226-31-3081 | C | Naypyitaw | Normal | Female | Electronic accessories | 15.28 | 5 | 3.82 | 80.22 | 3/8/2019 | 10:29 | Cash | 76.4 | 4.761905 | 3.82 | 9.6 |
| 2 | 631-41-3108 | A | Yangon | Normal | Male | Home and lifestyle | 46.33 | 7 | 16.2155 | 340.5255 | 3/3/2019 | 13:23 | Credit card | 324.31 | 4.761905 | 16.2155 | 7.4 |
| 3 | 123-19-1176 | A | Yangon | Member | Male | Health and beauty | 58.22 | 8 | 23.288 | 489.048 | 1/27/2019 | 20:33 | Ewallet | 465.76 | 4.761905 | 23.288 | 8.4 |
| 4 | 373-73-7910 | A | Yangon | Normal | Male | Sports and travel | 86.31 | 7 | 30.2085 | 634.3785 | 2/8/2019 | 10:37 | Ewallet | 604.17 | 4.761905 | 30.2085 | 5.3 |


In [2]:
# Review dataframe characteristics
sales.dtypes

Invoice ID                  object
Branch                      object
City                        object
Customer type               object
Gender                      object
Product line                object
Unit price                 float64
Quantity                     int64
Tax 5%                     float64
Total                      float64
Date                        object
Time                        object
Payment                     object
cogs                       float64
gross margin percentage    float64
gross income               float64
Rating                     float64
dtype: object

## <a id='toc1_1_'></a>[Transforming column dtype to `datetime`](#toc0_)

In [8]:
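# Convert Date from strings such as "1/27/2019" (month/day/year) to datetime.
# An explicit format avoids guessing the day/month order.
display(sales["Date"])
sales["Date"] = pd.to_datetime(sales["Date"], format="%m/%d/%Y")
display(sales["Date"])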

0       1/5/2019
1       3/8/2019
2       3/3/2019
3      1/27/2019
4       2/8/2019
         ...    
995    1/29/2019
996     3/2/2019
997     2/9/2019
998    2/22/2019
999    2/18/2019
Name: Date, Length: 1000, dtype: object

0     2019-01-05
1     2019-03-08
2     2019-03-03
3     2019-01-27
4     2019-02-08
         ...    
995   2019-01-29
996   2019-03-02
997   2019-02-09
998   2019-02-22
999   2019-02-18
Name: Date, Length: 1000, dtype: datetime64[ns]
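
If the column might contain malformed entries, one option is to combine an explicit format with `errors="coerce"`, so that unparseable values become `NaT` instead of raising. The sketch below uses a small made-up series (the values are not taken from the dataset) to show the idea:

```python
import pandas as pd

# Hypothetical sample values; the last one is deliberately malformed.
raw_dates = pd.Series(["1/5/2019", "3/8/2019", "1/27/2019", "not a date"])

# The explicit format documents the month/day/year order;
# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")

print(parsed)
print("Unparseable rows:", parsed.isna().sum())
```

Counting the `NaT` values afterwards is a quick way to check how many rows failed to parse.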

In [10]:
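# Convert Time. The strings hold no date, so pandas fills in a placeholder date
# (here apparently the day the cell was run); only the time part is meaningful.
display(sales["Time"])
sales["Time"] = pd.to_datetime(sales["Time"], format="mixed")
display(sales["Time"])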

0      13:08
1      10:29
2      13:23
3      20:33
4      10:37
       ...  
995    13:46
996    17:16
997    13:22
998    15:33
999    13:28
Name: Time, Length: 1000, dtype: object

0     2024-11-05 13:08:00
1     2024-11-05 10:29:00
2     2024-11-05 13:23:00
3     2024-11-05 20:33:00
4     2024-11-05 10:37:00
              ...        
995   2024-11-05 13:46:00
996   2024-11-05 17:16:00
997   2024-11-05 13:22:00
998   2024-11-05 15:33:00
999   2024-11-05 13:28:00
Name: Time, Length: 1000, dtype: datetime64[ns]
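
The date part in the output above does not come from the data. One way to avoid it, sketched here on a hypothetical two-row sample that still holds the raw strings, is to join the date and time strings before parsing, so that each timestamp carries the real purchase date. The column names `Timestamp` and `Clock` are illustrative and not part of the dataset.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the raw Date and Time columns.
sample = pd.DataFrame({
    "Date": ["1/5/2019", "3/8/2019"],
    "Time": ["13:08", "10:29"],
})

# Joining the strings before parsing keeps the real purchase date
# instead of a placeholder date.
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%m/%d/%Y %H:%M"
)

# The time of day alone is available as datetime.time objects.
sample["Clock"] = sample["Timestamp"].dt.time

print(sample)
```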

## <a id='toc1_2_'></a>[Extracting day, month, year](#toc0_)

In [11]:
sales["Day"] = sales["Date"].dt.day
sales["Month"] = sales["Date"].dt.month
sales["Year"] = sales["Date"].dt.year

sales[["Date", "Day", "Month", "Year"]]

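| | Date | Day | Month | Year |
|---|---|---|---|---|
| 0 | 2019-01-05 | 5 | 1 | 2019 |
| 1 | 2019-03-08 | 8 | 3 | 2019 |
| 2 | 2019-03-03 | 3 | 3 | 2019 |
| 3 | 2019-01-27 | 27 | 1 | 2019 |
| 4 | 2019-02-08 | 8 | 2 | 2019 |
| ... | ... | ... | ... | ... |
| 995 | 2019-01-29 | 29 | 1 | 2019 |
| 996 | 2019-03-02 | 2 | 3 | 2019 |
| 997 | 2019-02-09 | 9 | 2 | 2019 |
| 998 | 2019-02-22 | 22 | 2 | 2019 |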


In [None]:
def get_season(month):
    # Add your code here ;)
    return
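
One possible way to fill in the `get_season` stub is sketched below. It assumes Northern Hemisphere meteorological seasons (December to February is winter, and so on); this convention is an assumption, not something the notebook specifies, and other definitions are equally valid.

```python
import pandas as pd


def get_season(month):
    """Map a month number (1-12) to a season name.

    Assumes Northern Hemisphere meteorological seasons.
    """
    if month in (12, 1, 2):
        return "winter"
    elif month in (3, 4, 5):
        return "spring"
    elif month in (6, 7, 8):
        return "summer"
    else:
        return "autumn"


# Made-up month numbers to try the function on.
months = pd.Series([1, 3, 2, 7, 10])
print(months.apply(get_season).tolist())
```

Applied to the notebook's data, the same function could be used as `sales["Month"].apply(get_season)`.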

## <a id='toc1_3_'></a>[Extracting weekday, weekend](#toc0_)

In [12]:
# Extract weekday as an integer (Monday=0, Sunday=6)
sales["Weekday"] = sales["Date"].dt.weekday

In [13]:
# Check unique vals
sales["Weekday"].unique()

array([5, 4, 6, 0, 3, 2, 1])
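
The integers follow the pandas convention of Monday=0 through Sunday=6. When readable labels are more useful, `dt.day_name()` returns the weekday names. A small sketch using three dates taken from the first rows of the dataset:

```python
import pandas as pd

# Three dates from the first rows of the dataset.
dates = pd.to_datetime(pd.Series(["2019-01-05", "2019-03-08", "2019-03-03"]))

summary = pd.DataFrame({
    "Date": dates,
    "Weekday": dates.dt.weekday,      # Monday=0 ... Sunday=6
    "Day name": dates.dt.day_name(),  # English names by default
})

print(summary)
```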

In [15]:
# Flag weekends: Saturday (5) and Sunday (6)
sales["Weekend"] = sales["Weekday"] >= 5

sales[["Date", "Weekday", "Weekend"]]

| | Date | Weekday | Weekend |
|---|---|---|---|
| 0 | 2019-01-05 | 5 | True |
| 1 | 2019-03-08 | 4 | False |
| 2 | 2019-03-03 | 6 | True |
| 3 | 2019-01-27 | 6 | True |
| 4 | 2019-02-08 | 4 | False |
| ... | ... | ... | ... |
| 995 | 2019-01-29 | 1 | False |
| 996 | 2019-03-02 | 5 | True |
| 997 | 2019-02-09 | 5 | True |
| 998 | 2019-02-22 | 4 | False |


## <a id='toc1_5_'></a>[Extracting hour, minute](#toc0_)

In [16]:
sales["Hour"] = sales["Time"].dt.hour
sales["Minute"] = sales["Time"].dt.minute

sales[["Time", "Hour", "Minute"]]

| | Time | Hour | Minute |
|---|---|---|---|
| 0 | 2024-11-05 13:08:00 | 13 | 8 |
| 1 | 2024-11-05 10:29:00 | 10 | 29 |
| 2 | 2024-11-05 13:23:00 | 13 | 23 |
| 3 | 2024-11-05 20:33:00 | 20 | 33 |
| 4 | 2024-11-05 10:37:00 | 10 | 37 |
| ... | ... | ... | ... |
| 995 | 2024-11-05 13:46:00 | 13 | 46 |
| 996 | 2024-11-05 17:16:00 | 17 | 16 |
| 997 | 2024-11-05 13:22:00 | 13 | 22 |
| 998 | 2024-11-05 15:33:00 | 15 | 33 |


In [18]:
# Count sales per hour, ordered by hour
sales["Hour"].value_counts().sort_index()

Hour
10    101
11     90
12     89
13    103
14     83
15    102
16     77
17     74
18     93
19    113
20     75
Name: count, dtype: int64

### <a id='toc1_5_1_'></a>[Extra: Create groups - morning, afternoon, evening](#toc0_)

In [19]:
def get_time_of_day(hour):
    if hour < 12:
        return "morning"
    elif hour < 17:
        return "afternoon"
    else:
        return "evening"

sales["Time of Day"] = sales["Hour"].apply(get_time_of_day)
sales["Time of Day"].value_counts()

Time of Day
afternoon    454
evening      355
morning      191
Name: count, dtype: int64
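
As an alternative to a row-wise `apply`, the same grouping can be sketched with `pd.cut`, which assigns each hour to a labelled bin in one vectorized step. The sample hours below are made up for illustration; the bin edges mirror the `hour < 12` and `hour < 17` thresholds used above.

```python
import pandas as pd

# Made-up sample hours covering each group boundary.
hours = pd.Series([10, 11, 12, 16, 17, 20])

# right=False makes the bins [0, 12), [12, 17), [17, 24),
# matching the hour < 12 and hour < 17 thresholds.
time_of_day = pd.cut(
    hours,
    bins=[0, 12, 17, 24],
    labels=["morning", "afternoon", "evening"],
    right=False,
)

print(time_of_day.tolist())
```

The result is a categorical series, so the labels keep their morning, afternoon, evening order when sorted.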