# Working with datetime in Pandas DataFrame

This is a notebook for the Medium article [Working with datetime in Pandas DataFrame](https://medium.com/@bindiatwork/working-with-datetime-in-pandas-dataframe-663f7af6c587).

Please check out the article for instructions.

## Copyright

This notebook and the medium article are Copyright (c) B. Chen. You are

* free for study
* free for making copies of the notebooks or forking this repo for your own private use
* **not allowed for commercial or broadcast use**

In [1]:
import pandas as pd
import numpy as np

### 1. Convert strings to datetime

#### Default arguments

In [2]:
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])

In [3]:
df

|   | date       | value |
|---|------------|-------|
| 0 | 2000-03-10 | 2     |
| 1 | 2000-03-11 | 3     |
| 2 | 2000-03-12 | 4     |
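
A quick dtype check shows what `pd.to_datetime()` actually changes. The minimal sketch below reuses the strings from the cell above. It confirms two things: the parsed Series gets a datetime64 dtype, which makes the `.dt` accessor available, and a single string comes back as a scalar `Timestamp`, read month first by default.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', '3/11/2000', '3/12/2000'])  # plain strings
parsed = pd.to_datetime(raw)

# The parsed Series has a datetime64 dtype, so .dt attributes work on it
print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(parsed.dt.day.tolist())                        # [10, 11, 12]

# A single string is parsed into a scalar Timestamp, month first by default
ts = pd.to_datetime('3/10/2000')
print(type(ts).__name__, ts.month, ts.day)           # Timestamp 3 10
```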


#### Day first format

In [4]:
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

In [5]:
df

|   | date       | value |
|---|------------|-------|
| 0 | 2000-10-03 | 2     |
| 1 | 2000-11-03 | 3     |
| 2 | 2000-12-03 | 4     |
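
`dayfirst=True` is a preference, not a strict rule. The pandas documentation warns that it is not strict. When the layout is known, spelling it out with `format` is the more explicit option. The minimal sketch below compares both approaches on the same three strings. The `'%d/%m/%Y'` format is an assumption that matches these particular inputs.

```python
import pandas as pd

dates = pd.Series(['3/10/2000', '3/11/2000', '3/12/2000'])

# dayfirst=True: prefer day/month order, but not strictly enforced
by_hint = pd.to_datetime(dates, dayfirst=True)

# format='%d/%m/%Y': the exact layout, day then month then 4-digit year
by_format = pd.to_datetime(dates, format='%d/%m/%Y')

print(by_format.dt.strftime('%Y-%m-%d').tolist())  # ['2000-10-03', '2000-11-03', '2000-12-03']
print(by_hint.tolist() == by_format.tolist())      # True
```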


#### Custom format 

In [6]:
df = pd.DataFrame({'date': ['2016-6-10 20:30:0', 
                            '2016-7-1 19:45:30', 
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")

In [7]:
df

|   | date                | value |
|---|---------------------|-------|
| 0 | 2016-10-06 20:30:00 | 2     |
| 1 | 2016-01-07 19:45:30 | 3     |
| 2 | 2013-12-10 04:05:01 | 4     |
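
Because `%d` comes before `%m` in this format string, the second field is read as the day and the third as the month. The small sketch below applies the same format to one string to make that visible. It then formats the result back with `Timestamp.strftime`, which uses the same directives but zero-pads each field.

```python
import pandas as pd

# Same format string as the cell above: year, then DAY, then MONTH
fmt = "%Y-%d-%m %H:%M:%S"

ts = pd.to_datetime("2016-6-10 20:30:0", format=fmt)
print(ts)                  # 2016-10-06 20:30:00
print(ts.month, ts.day)    # 10 6

# Formatting with the same directives reverses the parse (zero-padded)
print(ts.strftime(fmt))    # 2016-06-10 20:30:00
```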


#### Speedup parsing with infer_datetime_format

In [8]:
df = pd.DataFrame({'date': ['3/11/2000', '3/12/2000', '3/13/2000'] * 1000 })
df.head()

|   | date      |
|---|-----------|
| 0 | 3/11/2000 |
| 1 | 3/12/2000 |
| 2 | 3/13/2000 |
| 3 | 3/11/2000 |
| 4 | 3/12/2000 |


In [9]:
%timeit pd.to_datetime(df['date'], infer_datetime_format=True)

1.26 ms ± 36.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [10]:
%timeit pd.to_datetime(df['date'], infer_datetime_format=False)

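1.2 ms ± 8.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In this run the two timings are practically identical, so `infer_datetime_format=True` gave no measurable speedup on this data. Since pandas 2.0 the argument is deprecated, because a strict version of format inference is now the default.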
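
When every string in a column shares one layout, passing `format` explicitly skips inference entirely and behaves the same on older and newer pandas versions. The minimal sketch below uses the same 3,000-row column and assumes a month/day/year layout. It makes no timing claim; measure on your own data if speed matters.

```python
import pandas as pd

df = pd.DataFrame({'date': ['3/11/2000', '3/12/2000', '3/13/2000'] * 1000})

# An explicit format means pandas never has to guess the layout.
# In a notebook, time it with: %timeit pd.to_datetime(df['date'], format='%m/%d/%Y')
parsed = pd.to_datetime(df['date'], format='%m/%d/%Y')
print(parsed.head(3).tolist())
print(parsed.isna().sum())  # 0
```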


#### Handle parsing error

In [11]:
df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
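df['date'] = pd.to_datetime(df['date'])  # raises: 'a/11/2000' is not a valid date

ParserError: Unknown string format: a/11/2000

In [None]:
# errors='ignore' returns the input unchanged when parsing fails
# (deprecated since pandas 2.2; catch the exception explicitly instead)
df['date'] = pd.to_datetime(df['date'], errors='ignore')
df

In [None]:
# errors='coerce' turns every unparseable value into NaT
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df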
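
After `errors='coerce'`, it is often useful to see which inputs were dropped to `NaT`. The minimal sketch below selects the rows that held a value but failed to parse. The explicit `'%m/%d/%Y'` format is an assumption about this data.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# Unparseable strings become NaT instead of raising
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

# Rows that had a value but could not be parsed
failed = raw[parsed.isna() & raw.notna()]
print(failed)
```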

### 2. Assemble a datetime from multiple columns

In [None]:
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})

df['date'] = pd.to_datetime(df)

In [None]:
df
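
The assembly also accepts time-part columns such as `hour`. Every column passed in must be a recognised date or time part, though; an unrelated column makes `pd.to_datetime()` raise a `ValueError`. The sketch below illustrates both points on made-up data. The `hour` and `value` columns are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5],
                   'hour': [9, 17],
                   'value': [10, 20]})

# Select only the date/time-part columns before assembling
df['date'] = pd.to_datetime(df[['year', 'month', 'day', 'hour']])
print(df['date'].tolist())

# Passing an unrelated column ('value') is rejected
try:
    pd.to_datetime(df[['year', 'month', 'day', 'value']])
except ValueError as err:
    print('ValueError:', err)
```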

### 3. Get year, month and day

In [None]:
df = pd.DataFrame({'name': ['Tom', 'Andy', 'Lucas'],
                   'DoB': ['08-05-1997', '04-28-1996', '12-16-1995']})
df['DoB'] = pd.to_datetime(df['DoB'])

In [None]:
df['year'] = df['DoB'].dt.year
df['month'] = df['DoB'].dt.month
df['day'] = df['DoB'].dt.day
df
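
The `.dt` accessor exposes more calendar fields than year, month and day. The minimal sketch below adds the quarter and the English month name for the same three dates of birth. The explicit `'%m-%d-%Y'` format is an assumption matching these strings.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Tom', 'Andy', 'Lucas'],
                   'DoB': ['08-05-1997', '04-28-1996', '12-16-1995']})
df['DoB'] = pd.to_datetime(df['DoB'], format='%m-%d-%Y')

df['quarter'] = df['DoB'].dt.quarter            # 1 to 4
df['month_name'] = df['DoB'].dt.month_name()    # English month names by default
print(df[['name', 'quarter', 'month_name']])
```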

### 4. Get the week of year, the day of week and leap year

In [None]:
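# dt.week was removed in pandas 2.0; isocalendar() returns the ISO year, week and day
df['week_of_year'] = df['DoB'].dt.isocalendar().week
df['day_of_week'] = df['DoB'].dt.dayofweek  # Monday=0 ... Sunday=6
df['is_leap_year'] = df['DoB'].dt.is_leap_year
df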
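
ISO week numbers have one subtlety. An ISO week belongs to an ISO year, and near January 1 that ISO year can differ from the calendar year. `dt.isocalendar()` returns the ISO year, week and day together, so the three stay consistent. The minimal sketch below uses dates around the 2020/2021 boundary to show the difference.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2020-12-31', '2021-01-01', '2021-01-04']))

iso = s.dt.isocalendar()            # columns: year, week, day (ISO 8601, Monday = 1)
print(iso)
print(s.dt.year.tolist())                 # [2020, 2021, 2021]  calendar year
print(iso['year'].astype(int).tolist())   # [2020, 2020, 2021]  ISO year
print(iso['week'].astype(int).tolist())   # [53, 53, 1]
```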

In [None]:
dw_mapping = {
    0: 'Monday',
    1: 'Tuesday',
    2: 'Wednesday',
    3: 'Thursday',
    4: 'Friday',
    5: 'Saturday',
    6: 'Sunday'
}
df['day_of_week_name'] = df['DoB'].dt.weekday.map(dw_mapping)
df
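
pandas also provides `dt.day_name()`, which returns the weekday name directly without a mapping dictionary. The minimal sketch below checks that it agrees with mapping `dt.weekday` through a dictionary like the one above. It assumes the same three dates of birth, parsed month first.

```python
import pandas as pd

dob = pd.to_datetime(pd.Series(['08-05-1997', '04-28-1996', '12-16-1995']),
                     format='%m-%d-%Y')

# Built-in weekday names (English by default)
names = dob.dt.day_name()
print(names.tolist())    # ['Tuesday', 'Sunday', 'Saturday']

# Same result as mapping dt.weekday (Monday=0 ... Sunday=6) through a dict
dw_mapping = dict(enumerate(['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                             'Friday', 'Saturday', 'Sunday']))
print(names.tolist() == dob.dt.weekday.map(dw_mapping).tolist())  # True
```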

### 5. Get the age from the date of birth

In [None]:
# Naive age: calendar-year difference only, ignoring whether the birthday has passed yet
today = pd.to_datetime('today')
df['age'] = today.year - df['DoB'].dt.year

df

In [None]:
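# Year difference
today = pd.to_datetime('today')
diff_y = today.year - df['DoB'].dt.year
# True where the birthday has not happened yet this year
b_md = df['DoB'].apply(lambda x: (x.month, x.day))
no_birthday = b_md > (today.month, today.day)

# Subtracting the boolean Series takes 1 year off where the birthday is still to come
df['age'] = diff_y - no_birthday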
df
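
The same check can be written with vectorized month and day comparisons instead of building a tuple per row with `apply`. The sketch below is a minimal version that takes a fixed reference date, so its output is reproducible. The helper name `age_on` is hypothetical.

```python
import pandas as pd

def age_on(dob: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Completed years from each date of birth to ref (hypothetical helper)."""
    years = ref.year - dob.dt.year
    # True where the birthday has not yet come round in ref's year
    not_yet = (dob.dt.month > ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day > ref.day))
    return years - not_yet.astype(int)

dob = pd.to_datetime(pd.Series(['08-05-1997', '04-28-1996', '12-16-1995']),
                     format='%m-%d-%Y')
print(age_on(dob, pd.Timestamp('2020-08-05')).tolist())  # [23, 24, 24]
print(age_on(dob, pd.Timestamp('2020-08-04')).tolist())  # [22, 24, 24]
```

In the notebook this would be `df['age'] = age_on(df['DoB'], pd.to_datetime('today'))`. With this rule, a February 29 birthday is counted on March 1 in non-leap years.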

### 6. Improve performance by setting date column as the index 

In [None]:
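df = pd.read_csv('data/city_sales.csv', parse_dates=['date'])
df.info()

In [None]:
# Use the parsed dates as a DatetimeIndex; sorting keeps date-range slicing predictable
df = df.set_index('date').sort_index()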

In [None]:
df
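
`data/city_sales.csv` is not reproduced here, so the sketch below uses a tiny in-memory CSV as a stand-in. Its rows are made up, and its columns (`date`, `city`, `num`) are assumed from the names used in the cells that follow. It shows that `parse_dates` yields a datetime column and that `set_index` plus `sort_index` produce a sorted `DatetimeIndex`.

```python
import io
import pandas as pd

# Made-up stand-in for data/city_sales.csv; columns assumed to be date, city, num
csv_text = """date,city,num
2018-05-02 11:15,Paris,2
2018-05-02 10:40,London,7
2017-12-31 23:00,London,5
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(df.dtypes)  # 'date' is parsed to a datetime64 column

# Move the dates into the index and sort it
df = df.set_index('date').sort_index()
print(isinstance(df.index, pd.DatetimeIndex))  # True
print(df.index.is_monotonic_increasing)        # True
print(df['num'].tolist())                      # [5, 7, 2]
```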

### 7. Select data with a specific year and perform aggregation

In [None]:
df.loc['2018']

In [None]:
df.loc['2018','num'].sum()

In [None]:
# Use .loc: selecting rows with plain df['2018'] was removed in pandas 2.0
df.loc['2018'].groupby('city').sum()
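
Slicing one year is handy, but grouping on the index's year aggregates every year in one step. The minimal sketch below runs both on a few made-up rows. The `city` and `num` column names are assumed from the cells above.

```python
import pandas as pd

# Made-up rows; column names mirror those used above (city, num)
df = pd.DataFrame(
    {'city': ['London', 'Paris', 'London', 'Paris', 'London'],
     'num': [5, 3, 7, 2, 4]},
    index=pd.to_datetime(['2017-12-31 23:00', '2018-01-02 10:00',
                          '2018-05-02 10:40', '2018-05-02 11:15',
                          '2019-01-01 09:00']),
)

print(df.loc['2018', 'num'].sum())                    # 12
print(df.loc['2018'].groupby('city')['num'].sum())    # London 7, Paris 5
# All years at once: group on the year of the index instead of slicing
print(df.groupby(df.index.year)['num'].sum())         # 2017: 5, 2018: 12, 2019: 4
```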

### 8. Select data with a specific month or a specific day of the month

In [None]:
df.loc['2018-5']

In [None]:
df.loc['2018-5-1']

In [None]:
cond = df.index.month == 2  # February of any year
df[cond]
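
A partial string such as `'2018-05'` selects one specific month. A condition on `df.index.month` matches that month in every year. Conditions combine with `&`, with parentheses around each comparison. The minimal sketch below shows the three selections on four made-up rows.

```python
import pandas as pd

# Made-up rows, one value per date
df = pd.DataFrame(
    {'num': [1, 2, 3, 4]},
    index=pd.to_datetime(['2017-05-01', '2018-02-14', '2018-05-01', '2018-05-20']),
).sort_index()

print(len(df.loc['2018-05']))           # 2: May 2018 only
print(len(df[df.index.month == 5]))     # 3: May of any year
print(len(df[df.index.day == 1]))       # 2: the 1st of any month
# Both conditions at once: the 1st of May in any year
print(df[(df.index.month == 5) & (df.index.day == 1)]['num'].tolist())  # [1, 3]
```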

### 9. Select data between two dates

In [None]:
df.loc['2016' : '2018']

In [None]:
df.loc['2018-5-2 10':'2018-5-2 11']

In [None]:
df.loc['2018-5-2 10:30':'2018-5-2 10:45']

In [None]:
# Rows between 10:30 and 10:45 on every day, not just 2018-05-02
df.between_time('10:30', '10:45')
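
Two details are easy to miss here. First, a partial-string end label covers its whole period, so `'... 11'` includes every row in the 11 o'clock hour. Second, `between_time` ignores the date and filters by time of day across all days. The minimal sketch below shows both on three made-up rows.

```python
import pandas as pd

df = pd.DataFrame(
    {'num': [1, 2, 3]},
    index=pd.to_datetime(['2018-05-01 10:35', '2018-05-02 10:40', '2018-05-02 11:00']),
)

# Slice within one day; the end label '... 11' covers the whole 11 o'clock hour
print(df.loc['2018-05-02 10':'2018-05-02 11']['num'].tolist())        # [2, 3]
# Slice on minutes within that same day
print(df.loc['2018-05-02 10:30':'2018-05-02 10:45']['num'].tolist())  # [2]
# between_time keeps 10:30-10:45 on every day
print(df.between_time('10:30', '10:45')['num'].tolist())             # [1, 2]
```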

### 10. Handle missing values

In [None]:
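# Rolling sum of num over a 3-row window; the first 2 rows have no full window and are NaN
df['rolling_sum'] = df['num'].rolling(3).sum()
df.head()

In [None]:
# Back-fill the leading NaNs with the next valid value (fillna(method=...) is deprecated)
df['rolling_sum_backfilled'] = df['rolling_sum'].bfill()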
df.head()
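
Back-filling is one option. Another is to avoid the leading gaps altogether by letting the first windows be partial with `min_periods`. The minimal sketch below compares both on four made-up values.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3, 4]},
                  index=pd.date_range('2018-05-01', periods=4, freq='D'))

# A 3-row window leaves the first two rows as NaN
df['rolling_sum'] = df['num'].rolling(3).sum()
# Option 1: back-fill the gaps with the next valid value
df['rolling_sum_backfilled'] = df['rolling_sum'].bfill()
# Option 2: allow partial windows at the start (at least 1 observation)
df['rolling_sum_partial'] = df['num'].rolling(3, min_periods=1).sum()
print(df)
```

The two options differ in meaning. Back-filling copies a later 3-row total into earlier rows, while `min_periods=1` reports the sum of whatever rows exist so far.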

### That's it