# Converting numbers and strings to datetime

This notebook accompanies the Medium article [10 tricks for converting numbers and strings to datetime in Pandas](https://bindichen.medium.com/10-tricks-for-converting-numbers-and-strings-to-datetime-in-pandas-82a4645fc23d).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)


In [1]:
import pandas as pd
import numpy as np

## 1. Converting numbers to datetime

In [2]:
df = pd.DataFrame({'date': [1470195805, 1480195805, 1490195805],
                   'value': [2, 3, 4]})
df

         date  value
0  1470195805      2
1  1480195805      3
2  1490195805      4


In [3]:
pd.to_datetime(df['date'], unit='s')

0   2016-08-03 03:43:25
1   2016-11-26 21:30:05
2   2017-03-22 15:16:45
Name: date, dtype: datetime64[ns]

In [4]:
df['date'].astype('datetime64[s]')

0   2016-08-03 03:43:25
1   2016-11-26 21:30:05
2   2017-03-22 15:16:45
Name: date, dtype: datetime64[ns]

In [5]:
df = pd.DataFrame({'date': [1470, 1480, 1490],
                   'value': [2, 3, 4]})
df

   date  value
0  1470      2
1  1480      3
2  1490      4


In [6]:
pd.to_datetime(df['date'], unit='D')

0   1974-01-10
1   1974-01-20
2   1974-01-30
Name: date, dtype: datetime64[ns]

In [7]:
df['date'].astype('datetime64[D]')

0   1974-01-10
1   1974-01-20
2   1974-01-30
Name: date, dtype: datetime64[ns]
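Unix timestamps are often stored in milliseconds rather than seconds. The following is a minimal sketch, assuming hypothetical values (the timestamps from the first cell multiplied by 1000), of how the same `pd.to_datetime()` call handles them when `unit='ms'` is passed:

```python
import pandas as pd

# Hypothetical millisecond timestamps: the first cell's values times 1000.
ms = pd.Series([1470195805000, 1480195805000, 1490195805000], name='date')

# unit='ms' tells pandas the integers count milliseconds since the Unix epoch.
converted = pd.to_datetime(ms, unit='ms')
print(converted)
```

The result should match the seconds-based conversion above, since only the unit of the input changed.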

## 2. Converting strings to datetime

In [8]:
df = pd.DataFrame({'date': ['3/10/2015', '3/11/2015', '3/12/2015'],
                   'value': [2, 3, 4]})
df

        date  value
0  3/10/2015      2
1  3/11/2015      3
2  3/12/2015      4


In [9]:
df.dtypes

date     object
value     int64
dtype: object

In [10]:
pd.to_datetime(df['date'])

0   2015-03-10
1   2015-03-11
2   2015-03-12
Name: date, dtype: datetime64[ns]

In [11]:
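# An explicit unit is used here because recent pandas versions reject the unit-less 'datetime64'
df['date'].astype('datetime64[ns]')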

0   2015-03-10
1   2015-03-11
2   2015-03-12
Name: date, dtype: datetime64[ns]

## 3. Handling day first format

In [12]:
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})

df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df

        date  value
0 2000-10-03      2
1 2000-11-03      3
2 2000-12-03      4
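To make the effect of `dayfirst` concrete, here is a small sketch that parses the same strings both ways. The variable names are illustrative only and are not part of the original notebook:

```python
import pandas as pd

dates = pd.Series(['3/10/2000', '3/11/2000', '3/12/2000'])

# Default behaviour: the first number is read as the month.
month_first = pd.to_datetime(dates)

# dayfirst=True: the first number is read as the day.
day_first = pd.to_datetime(dates, dayfirst=True)

print(pd.DataFrame({'month_first': month_first, 'day_first': day_first}))
```

With the default, `'3/10/2000'` is read as 10 March 2000. With `dayfirst=True`, it is read as 3 October 2000.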


## 4. Handling custom datetime format

In [13]:
df = pd.DataFrame({'date': ['2016-6-10 20:30:0', 
                            '2016-7-1 19:45:30', 
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})

df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
df

                 date  value
0 2016-10-06 20:30:00      2
1 2016-01-07 19:45:30      3
2 2013-12-10 04:05:01      4


## 5. Handling invalid values

In [14]:
df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df

        date  value
0  3/10/2000      2
1  a/11/2000      3
2  3/12/2000      4


In [15]:
# Raises ParserError because 'a/11/2000' is not a valid date
df['date'] = pd.to_datetime(df['date'])

ParserError: Unknown string format: a/11/2000

In [16]:
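# errors='ignore' returns the input unchanged when parsing fails
# (this option is deprecated in recent pandas versions)
df['date'] = pd.to_datetime(df['date'], errors='ignore')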
df

        date  value
0  3/10/2000      2
1  a/11/2000      3
2  3/12/2000      4


In [17]:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df

        date  value
0 2000-03-10      2
1        NaT      3
2 2000-03-12      4
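Because `errors='coerce'` silently turns bad values into `NaT`, it can help to check which inputs failed. A minimal sketch, using illustrative variable names that are not in the original notebook:

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# Unparseable strings become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors='coerce')

# Rows that had a value originally but came back as NaT failed to parse.
bad_rows = raw[parsed.isna() & raw.notna()]
print(bad_rows)
```

Filtering on `raw.notna()` separates genuine parse failures from values that were already missing.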


## 6. Handling missing values

In [18]:
df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                   'value': [2, 3, 4]})
df

        date  value
0  3/10/2000      2
1        NaN      3
2  3/12/2000      4


In [19]:
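# An explicit unit is used here because recent pandas versions reject the unit-less 'datetime64'
df['date'].astype('datetime64[ns]')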

0   2000-03-10
1          NaT
2   2000-03-12
Name: date, dtype: datetime64[ns]

In [20]:
pd.to_datetime(df['date'])

0   2000-03-10
1          NaT
2   2000-03-12
Name: date, dtype: datetime64[ns]

In [21]:
df.dtypes

date     object
value     int64
dtype: object

In [22]:
# replacing NaN
df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                   'value': [2, 3, 4]})

df['date'] = df['date'].fillna('3/11/2000').astype('datetime64[ns]')

In [23]:
df

        date  value
0 2000-03-10      2
1 2000-03-11      3
2 2000-03-12      4


## 7. Assembling a datetime from multiple columns

In [24]:
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5],
                   'hour': [10,11]
                  })
df

   year  month  day  hour
0  2015      2    4    10
1  2016      3    5    11


In [25]:
# Create a datetime column from a subset of columns
pd.to_datetime(df[['month','day','year']])

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

In [26]:
# Create a datetime column from the entire DataFrame
pd.to_datetime(df)

0   2015-02-04 10:00:00
1   2016-03-05 11:00:00
dtype: datetime64[ns]

## 8. Converting multiple columns at once

In [27]:
df = pd.DataFrame({'date_start': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'date_end': ['3/11/2000', '3/12/2000', '3/13/2000'],
                   'value': [2, 3, 4]})

In [28]:
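# An explicit unit is used here because recent pandas versions reject the unit-less 'datetime64'
df = df.astype({
    'date_start': 'datetime64[ns]',
    'date_end': 'datetime64[ns]'
})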

In [29]:
df

  date_start   date_end  value
0 2000-03-10 2000-03-11      2
1 2000-03-11 2000-03-12      3
2 2000-03-12 2000-03-13      4
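Another way to convert several columns at once is to apply `pd.to_datetime()` column by column. This is a sketch of an alternative, not the approach used in the article, and `date_cols` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'date_start': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'date_end': ['3/11/2000', '3/12/2000', '3/13/2000'],
                   'value': [2, 3, 4]})

date_cols = ['date_start', 'date_end']

# DataFrame.apply calls pd.to_datetime once per selected column.
df[date_cols] = df[date_cols].apply(pd.to_datetime)
print(df.dtypes)
```

This form also allows passing options such as `errors='coerce'` through `apply`, which `astype()` does not offer.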


## 9. Parsing a date column when reading a CSV file

In [30]:
df = pd.read_csv(
    'dataset.csv',
    dtype={
        # dtype does not accept datetime64[ns]; use parse_dates for date columns
        'value': 'float16'
    },
    parse_dates=['date']
)
df

        date  value
0 2000-03-10    2.0
1 2000-03-11    3.0
2 2000-03-12    4.0
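Since `dataset.csv` is not included here, the following sketch reproduces the idea with an in-memory file. The CSV contents are an assumption, reconstructed from the output shown above:

```python
import io
import pandas as pd

# Assumed contents of dataset.csv, inferred from the displayed output.
csv_text = "date,value\n2000-03-10,2\n2000-03-11,3\n2000-03-12,4\n"

# parse_dates converts the listed columns to datetime while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(df.dtypes)
```

The `date` column should come back with a datetime dtype rather than `object`.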


## 10. `astype()` vs. `to_datetime()`

In [31]:
# int64 nanosecond timestamps, spaced 6 seconds apart
s = pd.Series(np.arange(1352500101000000000, 1420438546000000000, 6000000000))

In [32]:
%timeit pd.to_datetime(s)

1.4 s ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [33]:
%timeit s.astype('datetime64[ns]')

23.4 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [34]:
%timeit pd.to_datetime(s, unit='ns')

46.1 ms ± 185 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
