# Working with Dates and Times

## Summary
In this notebook, we'll be covering:
- [The Datetime Library](#The-Datetime-Library)
- [Datetimes in Pandas](#Datetimes-in-Pandas)

Working with dates and times in Python can often be confusing at first. For example, you know how to use Python to evaluate 4 - 3 (which equals 1, of course). But how can we use Python to compute the number of days between Jan 7, 2012 and March 23, 2017? 

Thankfully, computers are good at these sorts of calculations, and Python provides a library for working with dates and times that makes them very straightforward. If you don't intend to work with date and time information in this way, you can safely skip this section; if you do, it will save you a lot of time and frustration.

### The Datetime Library
Python provides a library for handling dates and times called "datetime". Not only is the datetime library useful on its own, but it also integrates nicely with pandas.

First, we need to import datetime. By convention, datetime is typically imported and renamed to "dt".

In [1]:
import datetime as dt

Next, let's create two datetime objects that we can play with.

In [2]:
present = dt.datetime.now()
print(present)
past = dt.datetime(year=1997, month=2, day=18, hour=2, minute=17, second=39)
print(past)

2025-11-13 08:45:37.393261
1997-02-18 02:17:39


Above, we created a variable `present` that holds the time at the moment you ran that cell. If you run the cell again, `present` will change, because it is reset to the new current time. We also created a variable `past` set to 2:17:39 am on February 18th, 1997. (Hours use a 24-hour clock, so we could set it to 2 pm with `hour=14`, or, if you don't love 24-hour time, with `hour=2 + 12`.)
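To make the 2 pm remark concrete, here is a minimal sketch. It rebuilds the same February 18th, 1997 timestamp with `hour=14`, and also uses `replace()`, a standard method on datetime objects that returns a copy with only the named fields changed. The variable names `afternoon` and `also_afternoon` are just for illustration.

```python
import datetime as dt

# 2:17:39 am, as in the cell above
past = dt.datetime(year=1997, month=2, day=18, hour=2, minute=17, second=39)

# 2:17:39 pm: hours run from 0 to 23, so 2 pm is hour 14 (that is, 2 + 12)
afternoon = dt.datetime(year=1997, month=2, day=18, hour=14, minute=17, second=39)

# replace() returns a new datetime with only the given field changed;
# the original object is left untouched
also_afternoon = past.replace(hour=2 + 12)

print(afternoon)                    # 1997-02-18 14:17:39
print(also_afternoon == afternoon)  # True
print(past.hour)                    # 2
```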

You can also now see why the datetime library is imported as "dt": the `datetime` library contains a class that is also called `datetime`! When we ran `dt.datetime.now()`, we called the `now()` method of the `datetime` class inside the `datetime` library, which we imported under the alias `dt`. Code is easier to read when one of these is `dt` and the other is `datetime`.

Let's do some simple math now to determine how long it has been since 2:17:39 am on February 18th, 1997.

In [3]:
diff = present - past
print(diff)

10495 days, 6:27:58.393261


`diff` is a `timedelta` object, which represents a duration: the difference between two datetime objects. It has three attributes we can access: `days`, `seconds`, and `microseconds`. For instance, if we just want the number of days since `past`, we can write:

In [4]:
diff.days

10495
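The same subtraction-then-`.days` pattern answers the question from the start of this notebook. A small sketch: since only calendar dates matter there, it uses `dt.date`, the date-only counterpart of `dt.datetime`. Subtracting two dates also produces a `timedelta`.

```python
import datetime as dt

start = dt.date(year=2012, month=1, day=7)
end = dt.date(year=2017, month=3, day=23)

gap = end - start  # a timedelta, exactly as with datetimes
print(type(gap))   # <class 'datetime.timedelta'>
print(gap.days)    # 1902 (the leap days in 2012 and 2016 are counted automatically)
```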

If we want seconds, this will work:

In [5]:
diff.seconds

23278

Note, however, that `diff.seconds` only gave us the seconds that were NOT already counted in the days! If we actually wanted the total number of seconds since `past`, we would need to write this:

In [6]:
seconds = diff.seconds + (diff.days * 24 * 60 * 60)
print(seconds)

906791278
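Timedelta objects also have a `total_seconds()` method that does this conversion for you, and it keeps the microseconds that the manual formula drops. The sketch below rebuilds `diff` from the values printed above, so it gives the same answer no matter when it is run.

```python
import datetime as dt

# Rebuild the difference printed above: 10495 days, 6:27:58.393261
diff = dt.timedelta(days=10495, seconds=23278, microseconds=393261)

# Manual approach from the cell above: whole seconds only
manual = diff.seconds + (diff.days * 24 * 60 * 60)

# Built-in approach: a float that also includes the microseconds
total = diff.total_seconds()

print(manual)  # 906791278
print(total)   # 906791278.393261
```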


#### Quickly, just for practice, make another variable that uses `dt.datetime.now()` to capture the current time, and print how many seconds have passed since `present`.

In [7]:
# your code goes here


### Datetimes in Pandas
Let's now work with some datetime objects in a pandas dataframe. The code below creates a dataframe with a randomly generated intake date and discharge date for 50 people, presumably at a hospital. Because the values are random, your output will differ from the output shown here.

In [8]:
import random
import pandas as pd

date_dict = {'Intake': [], 'Discharge': []}

for _ in range(50):  # one row per patient
    year = random.randint(1990, 2020)
    # Day is capped at 28 so that every month (including February) is valid
    intake = dt.datetime(year=year, month=random.randint(1, 12), day=random.randint(1, 28),
                         hour=random.randint(0, 23))
    # Length of stay: 0 to 30 days, plus 1 to 23 hours expressed in seconds
    discharge = intake + dt.timedelta(days=random.randint(0, 30), seconds=random.randint(60 * 60, 23 * 60 * 60))
    date_dict['Intake'].append(intake)
    date_dict['Discharge'].append(discharge)

date_frame = pd.DataFrame(date_dict)
date_frame.head()

|   | Intake | Discharge |
|---|---|---|
| 0 | 1992-08-22 10:00:00 | 1992-08-25 12:01:15 |
| 1 | 2018-11-11 22:00:00 | 2018-11-27 17:01:40 |
| 2 | 1997-06-23 22:00:00 | 1997-06-25 20:29:08 |
| 3 | 2008-01-23 07:00:00 | 2008-02-19 21:59:17 |
| 4 | 2004-06-26 05:00:00 | 2004-07-13 20:30:50 |
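The generator cell builds each discharge by adding a `dt.timedelta` to the intake. A short sketch of that arithmetic, hand-entering row 0 of the output shown above: adding 3 days and 7275 seconds (2 hours, 1 minute, 15 seconds) to the intake lands exactly on the discharge time. It also shows that `timedelta` normalizes its inputs, which is why `.seconds` is always less than one day's worth.

```python
import datetime as dt

intake = dt.datetime(year=1992, month=8, day=22, hour=10)
stay = dt.timedelta(days=3, seconds=7275)  # 7275 s = 2:01:15

discharge = intake + stay
print(discharge)  # 1992-08-25 12:01:15

# timedelta normalizes: 90000 seconds becomes 1 day plus 3600 seconds
long_gap = dt.timedelta(seconds=90000)
print(long_gap.days, long_gap.seconds)  # 1 3600
```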


Let's check the dtypes.

In [9]:
date_frame.dtypes

Intake       datetime64[ns]
Discharge    datetime64[ns]
dtype: object

As you can see, the dtypes are `datetime64[ns]`. These are pandas datetime columns (each individual value is a pandas `Timestamp`), and you can do math with them to get timedeltas, just like we did above using the datetime library. Technically, they are slightly different from Python's datetime objects, but the concepts are very similar.
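A small sketch of that relationship, using a two-row frame built by hand (the column names mirror `date_frame`, but the values are made up for illustration). A single element is a `pd.Timestamp`, which is a subclass of Python's `datetime`, and subtracting two datetime columns gives a column with a `timedelta64` dtype.

```python
import datetime as dt
import pandas as pd

frame = pd.DataFrame({
    'Intake': [dt.datetime(2020, 1, 1, 8), dt.datetime(2020, 1, 5, 20)],
    'Discharge': [dt.datetime(2020, 1, 3, 9), dt.datetime(2020, 1, 6, 2)],
})

first = frame['Intake'][0]
print(type(first))                     # a pandas Timestamp
print(isinstance(first, dt.datetime))  # True: Timestamp subclasses datetime

stays = frame['Discharge'] - frame['Intake']
print(stays.dtype)  # timedelta64[ns] in pandas 2.x; the unit may differ in other versions
print(stays[0])     # 2 days 01:00:00
```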

One difference is how we access the year, month, day, and so on. For a pandas column (a Series), we need to go through the `.dt` accessor, writing `.dt.year` instead of just `.year` (and likewise for the other time periods). The code below gets the year of every intake.

In [10]:
date_frame['Intake'].dt.year

0     1992
1     2018
2     1997
3     2008
4     2004
5     2006
6     2015
7     1999
8     2002
9     1997
10    1998
11    2003
12    1996
13    1990
14    2013
15    2003
16    1992
17    2018
18    2014
19    2016
20    1999
21    1995
22    1997
23    2014
24    1991
25    1998
26    2019
27    1991
28    2016
29    1998
30    2006
31    2016
32    1995
33    2009
34    2017
35    2017
36    2009
37    1996
38    2010
39    2009
40    2006
41    2011
42    2017
43    2008
44    2016
45    2015
46    1993
47    2018
48    1997
49    1990
Name: Intake, dtype: int32
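The `.dt` accessor is what makes this work on a whole column; asking a Series for `.year` directly fails, because the Series itself has no such attribute. A short sketch with a hand-built Series (values are illustrative), also showing a few of the other component attributes:

```python
import pandas as pd

intakes = pd.Series(pd.to_datetime(['1992-08-22 10:00', '2018-11-11 22:00']))

print(intakes.dt.year.tolist())   # [1992, 2018]
print(intakes.dt.month.tolist())  # [8, 11]
print(intakes.dt.hour.tolist())   # [10, 22]

try:
    intakes.year  # no .dt, so pandas looks for a Series attribute called year
except AttributeError as err:
    print('AttributeError:', err)
```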

The pandas version also has some nice extra features. What if you suspected that the care patients got was different on weekdays versus weekends, or that certain sorts of patients were more likely to wait until a weekend to get care? You could do a lot of math, or you could do this:

In [11]:
date_frame['Intake'].dt.weekday

0     5
1     6
2     0
3     2
4     5
5     0
6     3
7     0
8     6
9     1
10    6
11    1
12    4
13    1
14    0
15    1
16    3
17    5
18    6
19    0
20    2
21    5
22    0
23    5
24    1
25    5
26    5
27    4
28    3
29    5
30    5
31    2
32    4
33    1
34    0
35    5
36    6
37    2
38    6
39    4
40    6
41    6
42    1
43    0
44    3
45    1
46    1
47    5
48    0
49    2
Name: Intake, dtype: int32

That's the day of the week, starting at Monday (0) and ending on Sunday (6). We could write out a list of day names and use these numbers as list indices to look up each name (something like `weekday_names[date_frame['Intake'].dt.weekday]`, although a plain Python list can't actually be indexed by a whole Series), but pandas anticipates this need with the `day_name()` method.

In [12]:
date_frame['Intake'].dt.day_name()

0      Saturday
1        Sunday
2        Monday
3     Wednesday
4      Saturday
5        Monday
6      Thursday
7        Monday
8        Sunday
9       Tuesday
10       Sunday
11      Tuesday
12       Friday
13      Tuesday
14       Monday
15      Tuesday
16     Thursday
17     Saturday
18       Sunday
19       Monday
20    Wednesday
21     Saturday
22       Monday
23     Saturday
24      Tuesday
25     Saturday
26     Saturday
27       Friday
28     Thursday
29     Saturday
30     Saturday
31    Wednesday
32       Friday
33      Tuesday
34       Monday
35     Saturday
36       Sunday
37    Wednesday
38       Sunday
39       Friday
40       Sunday
41       Sunday
42      Tuesday
43       Monday
44     Thursday
45      Tuesday
46      Tuesday
47     Saturday
48       Monday
49    Wednesday
Name: Intake, dtype: object

Note that unlike `weekday`, which is an attribute, `day_name` is a method, and so it must be called as a function (`day_name()`).
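Returning to the weekday-versus-weekend question, here is a sketch of two follow-ups. The "list of names" idea can be done with `Series.map`, which looks up each number one at a time; and because Saturday and Sunday are 5 and 6, a weekend flag is a single comparison. The dates are the first three intakes shown above, and `weekday_names` is a name chosen for this example.

```python
import pandas as pd

intakes = pd.Series(pd.to_datetime(['1992-08-22', '2018-11-11', '1997-06-23']))

weekday_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Map each weekday number (0-6) to its name by using it as a list index
manual_names = intakes.dt.weekday.map(lambda n: weekday_names[n])

# Saturday (5) and Sunday (6) are the only values >= 5
is_weekend = intakes.dt.weekday >= 5

print(manual_names.tolist())  # ['Saturday', 'Sunday', 'Monday']
print(is_weekend.tolist())    # [True, True, False]
```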

When working with timedelta columns in pandas, we again use the `.dt` accessor. Like a regular timedelta, a pandas timedelta has days, seconds, and microseconds, but these are accessed via `.dt.days`, `.dt.seconds`, and `.dt.microseconds`.

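Below, we create a column called Seconds of Hospitalization from the length of each stay. To use dot notation on the calculated timedelta, we need to enclose the subtraction in parentheses; that tells Python to finish the calculation first and then take `.dt.seconds` from the result. Be careful, though: just as with `diff.seconds` earlier, `.dt.seconds` only counts the seconds left over after the whole days are removed, so this column is not the total length of the stay. (In row 0 below, for example, the stay also includes 3 full days.)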

In [13]:
date_frame['Seconds of Hospitalization'] = (date_frame['Discharge'] - date_frame['Intake']).dt.seconds
date_frame.head()

|   | Intake | Discharge | Seconds of Hospitalization |
|---|---|---|---|
| 0 | 1992-08-22 10:00:00 | 1992-08-25 12:01:15 | 7275 |
| 1 | 2018-11-11 22:00:00 | 2018-11-27 17:01:40 | 68500 |
| 2 | 1997-06-23 22:00:00 | 1997-06-25 20:29:08 | 80948 |
| 3 | 2008-01-23 07:00:00 | 2008-02-19 21:59:17 | 53957 |
| 4 | 2004-06-26 05:00:00 | 2004-07-13 20:30:50 | 55850 |
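To get the full length of each stay in seconds, timedelta columns also offer `.dt.total_seconds()`, which, like `day_name()`, is a method and needs parentheses. A minimal sketch using only row 0 of the output above, entered by hand so it doesn't depend on the random data:

```python
import pandas as pd

frame = pd.DataFrame({
    'Intake': pd.to_datetime(['1992-08-22 10:00:00']),
    'Discharge': pd.to_datetime(['1992-08-25 12:01:15']),
})

stay = frame['Discharge'] - frame['Intake']

print(stay.dt.seconds.tolist())          # [7275]: leftover seconds only
print(stay.dt.total_seconds().tolist())  # [266475.0]: 3 * 86400 + 7275
```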


#### Below, create a new column that has the number of days the person was in the hospital. (Note the name of the dataframe, as well. Unlike in the last notebook, it isn't `df`.)

In [14]:
# your code goes here


What about dates that come as text? The dataframe below has dates in three formats, all text: month-day-year, day-month-year, and year-month-day.

In [15]:
text_dict = {'Month First': [], 'Day First': [], 'Year First': []}

for _ in range(50):
    # The same random date, written three ways without zero-padding (e.g. 7-10-2008)
    year = str(random.randint(1990, 2020))
    month = str(random.randint(1, 12))
    day = str(random.randint(1, 28))
    text_dict['Month First'].append(month + '-' + day + '-' + year)
    text_dict['Day First'].append(day + '-' + month + '-' + year)
    text_dict['Year First'].append(year + '-' + month + '-' + day)

text_frame = pd.DataFrame(text_dict)
text_frame.head()

|   | Month First | Day First | Year First |
|---|---|---|---|
| 0 | 7-10-2008 | 10-7-2008 | 2008-7-10 |
| 1 | 11-17-2018 | 17-11-2018 | 2018-11-17 |
| 2 | 4-21-1994 | 21-4-1994 | 1994-4-21 |
| 3 | 6-23-2010 | 23-6-2010 | 2010-6-23 |
| 4 | 2-12-2010 | 12-2-2010 | 2010-2-12 |


Pandas provides the `to_datetime` function, which converts text to datetime format. Let's run it on the month-first data.

In [16]:
pd.to_datetime(text_frame['Month First'])

0    2008-07-10
1    2018-11-17
2    1994-04-21
3    2010-06-23
4    2010-02-12
5    2008-09-07
6    1995-02-10
7    2016-06-23
8    2020-10-26
9    2006-09-09
10   1997-04-25
11   2020-04-02
12   1998-07-09
13   2013-02-08
14   2000-06-12
15   2014-02-06
16   2008-11-02
17   1998-07-03
18   2012-06-01
19   2003-05-04
20   2000-06-19
21   2011-01-19
22   1992-09-09
23   2003-11-22
24   2003-02-25
25   2017-10-11
26   2005-02-28
27   2000-03-19
28   1993-08-05
29   2006-10-10
30   1995-01-05
31   2011-04-26
32   2014-04-12
33   2012-06-07
34   2009-02-01
35   1999-07-19
36   1991-02-09
37   2010-03-06
38   2003-05-09
39   1991-05-01
40   2007-10-01
41   2019-01-07
42   1991-12-08
43   2020-11-11
44   1995-09-11
45   2019-06-09
46   2002-07-02
47   2017-05-18
48   1996-06-28
49   1999-05-19
Name: Month First, dtype: datetime64[ns]
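When you already know the layout, you can spell it out with the `format` argument (using the same `%` codes as Python's `strptime`) instead of letting pandas guess, and `errors='coerce'` turns any string that doesn't fit into `NaT` (pandas' marker for a missing time) rather than raising an error. A sketch with made-up strings:

```python
import pandas as pd

raw = pd.Series(['7-10-2008', '11-17-2018', 'not a date'])

# %m = month, %d = day, %Y = four-digit year; unpadded values like 7 are accepted
parsed = pd.to_datetime(raw, format='%m-%d-%Y', errors='coerce')

print(parsed)
```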

To parse a day-first column, we only need to pass the keyword argument `dayfirst=True`.

In [17]:
pd.to_datetime(text_frame['Day First'], dayfirst=True)

0    2008-07-10
1    2018-11-17
2    1994-04-21
3    2010-06-23
4    2010-02-12
5    2008-09-07
6    1995-02-10
7    2016-06-23
8    2020-10-26
9    2006-09-09
10   1997-04-25
11   2020-04-02
12   1998-07-09
13   2013-02-08
14   2000-06-12
15   2014-02-06
16   2008-11-02
17   1998-07-03
18   2012-06-01
19   2003-05-04
20   2000-06-19
21   2011-01-19
22   1992-09-09
23   2003-11-22
24   2003-02-25
25   2017-10-11
26   2005-02-28
27   2000-03-19
28   1993-08-05
29   2006-10-10
30   1995-01-05
31   2011-04-26
32   2014-04-12
33   2012-06-07
34   2009-02-01
35   1999-07-19
36   1991-02-09
37   2010-03-06
38   2003-05-09
39   1991-05-01
40   2007-10-01
41   2019-01-07
42   1991-12-08
43   2020-11-11
44   1995-09-11
45   2019-06-09
46   2002-07-02
47   2017-05-18
48   1996-06-28
49   1999-05-19
Name: Day First, dtype: datetime64[ns]
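Why does `dayfirst` matter? A string like `7-10-2008` is valid either way, so pandas cannot tell on its own whether it means July 10 or October 7. A small sketch of the two readings, written with explicit formats so the intent is visible:

```python
import pandas as pd

ambiguous = pd.Series(['7-10-2008'])

month_first = pd.to_datetime(ambiguous, format='%m-%d-%Y')
day_first = pd.to_datetime(ambiguous, format='%d-%m-%Y')

print(month_first[0].date())  # 2008-07-10
print(day_first[0].date())    # 2008-10-07
```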

#### Now, write code that converts the Year First column and stores the result in a new column of `text_frame`, called Date, in datetime format. Year-first dates do not require any special arguments.

In [18]:
# your code goes here


What about reassembling datetimes from their components? The dataframe below (`separate_frame`) has four columns, the year, month, day, and hour of each intake, that together describe a datetime object.

In [19]:
separate_frame = pd.DataFrame()
separate_frame['Intake Year'] = date_frame['Intake'].dt.year
separate_frame['Intake Month'] = date_frame['Intake'].dt.month
separate_frame['Intake Day'] = date_frame['Intake'].dt.day
separate_frame['Intake Hour'] = date_frame['Intake'].dt.hour
separate_frame.head()

|   | Intake Year | Intake Month | Intake Day | Intake Hour |
|---|---|---|---|---|
| 0 | 1992 | 8 | 22 | 10 |
| 1 | 2018 | 11 | 11 | 22 |
| 2 | 1997 | 6 | 23 | 22 |
| 3 | 2008 | 1 | 23 | 7 |
| 4 | 2004 | 6 | 26 | 5 |


It turns out that `pd.to_datetime` can also convert a selection of columns into a pandas Series of datetime objects, as long as the column names are time units it recognizes, like "year", "month", and "day" (and, optionally, "hour", "minute", and "second"). Capitalization, in this particular case, does not matter, so "Month" works too. At minimum, you need a year, a month, and a day.
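A quick sketch of those rules with a hand-made frame (the values are illustrative): capitalized names like `Year` are accepted, an `Hour` column is picked up as well, and leaving out the day raises a `ValueError`.

```python
import pandas as pd

parts = pd.DataFrame({'Year': [1992, 2018], 'Month': [8, 11],
                      'Day': [22, 11], 'Hour': [10, 22]})

# Column names are matched to time units without regard to case
assembled = pd.to_datetime(parts)
print(assembled)

# Year and month alone are not enough: a day is required
try:
    pd.to_datetime(parts[['Year', 'Month']])
except ValueError as err:
    print('ValueError:', err)
```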

The code below selects Intake Year, Intake Month, and Intake Day and renames them "year", "month", and "day". This is NOT an in-place operation: `rename` returns a new dataframe and leaves `separate_frame` unchanged, because this new object is only meant to be temporary.

In [20]:
separate_frame[['Intake Year', 'Intake Month', 'Intake Day']].rename(
    columns={'Intake Year': 'year', 'Intake Month': 'month', 'Intake Day': 'day'})

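|   | year | month | day |
|---|---|---|---|
| 0 | 1992 | 8 | 22 |
| 1 | 2018 | 11 | 11 |
| 2 | 1997 | 6 | 23 |
| 3 | 2008 | 1 | 23 |
| 4 | 2004 | 6 | 26 |
| 5 | 2006 | 3 | 20 |
| 6 | 2015 | 8 | 27 |
| 7 | 1999 | 7 | 5 |
| 8 | 2002 | 11 | 17 |
| 9 | 1997 | 3 | 18 |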


Below, we take the whole operation above and wrap it in a call to `pd.to_datetime`.

In [21]:
pd.to_datetime(separate_frame[['Intake Year', 'Intake Month', 'Intake Day']].rename(
    columns={'Intake Year': 'year', 'Intake Month': 'month', 'Intake Day': 'day'}))

0    1992-08-22
1    2018-11-11
2    1997-06-23
3    2008-01-23
4    2004-06-26
5    2006-03-20
6    2015-08-27
7    1999-07-05
8    2002-11-17
9    1997-03-18
10   1998-02-15
11   2003-09-02
12   1996-03-08
13   1990-01-09
14   2013-08-12
15   2003-12-16
16   1992-10-15
17   2018-02-03
18   2014-07-27
19   2016-08-08
20   1999-06-16
21   1995-11-04
22   1997-05-19
23   2014-08-16
24   1991-04-02
25   1998-01-10
26   2019-09-07
27   1991-11-01
28   2016-07-28
29   1998-04-18
30   2006-07-22
31   2016-05-18
32   1995-05-19
33   2009-11-24
34   2017-05-22
35   2017-05-06
36   2009-02-22
37   1996-12-04
38   2010-07-11
39   2009-12-18
40   2006-12-03
41   2011-06-12
42   2017-12-19
43   2008-11-17
44   2016-05-05
45   2015-08-18
46   1993-07-13
47   2018-12-15
48   1997-04-07
49   1990-05-02
dtype: datetime64[ns]

If that were assigned to a column we would have a new, assembled, datetime column.

#### Below, make a new column in the separate_frame dataframe that holds the datetime assembled from all four columns together, including the hour.

In [22]:
# your code goes here


And that's it. While dates and times can be very tricky if you treat them like numbers, converting them to datetime objects can make them easy to deal with.